# Address Validation Service Design

## Overview

A separate Docker service for validating and standardizing addresses using a hybrid approach that prioritizes privacy and minimizes external API calls.

## Architecture

### Service Design

- **Standalone FastAPI service** running on port 8001
- **SQLite database** containing USPS ZIP+4 data (~500MB)
- **USPS API integration** for street-level validation when needed
- **Redis cache** for validated addresses
- **Internal HTTP API** for communication with main legal application

### Data Flow

```
1. Legal App → Address Service (POST /validate)
2. Address Service checks local ZIP database
3. If ZIP/city/state valid → return immediately
4. If street validation needed → call USPS API
5. Cache result in Redis
6. Return standardized address to Legal App
```

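The flow above can be sketched in plain Python. This is a minimal illustration, not the service's implementation: the `Address` type, the `validate` function, the injected `zip_is_valid` and `usps_lookup` callables, and the in-memory `dict` standing in for Redis are all hypothetical names. It also assumes that the cache is consulted first and that the `strict` flag decides whether street validation is needed, neither of which the flow spells out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip: str


def validate(
    address: Address,
    zip_is_valid: Callable[[Address], bool],               # step 2: local ZIP database
    usps_lookup: Callable[[Address], Optional[Address]],   # step 4: remote USPS call
    cache: dict,                                           # stand-in for Redis
    strict: bool = False,
) -> dict:
    key = (address.street.upper(), address.city.upper(),
           address.state.upper(), address.zip, strict)

    # Assumption: a cached result short-circuits everything else.
    if key in cache:
        return {**cache[key], "source": "cached"}

    # Step 2: offline check of ZIP/city/state.
    if not zip_is_valid(address):
        return {"valid": False, "source": "local", "standardized": None}

    if not strict:
        # Step 3: the local answer is enough.
        result = {"valid": True, "source": "local", "standardized": address}
    else:
        # Step 4: street-level validation via the remote API.
        standardized = usps_lookup(address)
        result = {
            "valid": standardized is not None,
            "source": "usps_api",
            "standardized": standardized,
        }

    cache[key] = result  # step 5
    return result        # step 6
```

Passing the lookups in as callables keeps the sketch testable without a database or network access.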
## Technical Requirements

### Dependencies

- FastAPI framework
- SQLAlchemy for database operations
- SQLite for ZIP+4 database storage
- Redis for caching validated addresses
- httpx for USPS API calls
- Pydantic for request/response validation

### Database Schema

```sql
-- ZIP+4 Database (from USPS monthly files)
CREATE TABLE zip_codes (
    zip_code TEXT,
    plus4 TEXT,
    city TEXT,
    state TEXT,
    county TEXT,
    delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);

CREATE INDEX idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX idx_city_state ON zip_codes(city, state);
```

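As a rough illustration of how this schema could answer the offline ZIP/city/state check, the sketch below loads it into an in-memory SQLite database using the standard-library `sqlite3` module (the service itself lists SQLAlchemy). The `zip_matches` helper and the sample row are hypothetical, and the uppercase normalization assumes the USPS data stores city and state in uppercase.

```python
import sqlite3

SCHEMA = """
CREATE TABLE zip_codes (
    zip_code TEXT,
    plus4 TEXT,
    city TEXT,
    state TEXT,
    county TEXT,
    delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);
CREATE INDEX idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX idx_city_state ON zip_codes(city, state);
"""


def zip_matches(conn: sqlite3.Connection, zip_code: str, city: str, state: str) -> bool:
    """Return True if the ZIP code belongs to the given city and state."""
    row = conn.execute(
        # Parameter placeholders keep user input out of the SQL text.
        "SELECT 1 FROM zip_codes "
        "WHERE zip_code = ? AND city = ? AND state = ? LIMIT 1",
        (zip_code.strip(), city.strip().upper(), state.strip().upper()),
    ).fetchone()
    return row is not None


# Illustrative data only; real rows would come from the monthly USPS files.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO zip_codes VALUES (?, ?, ?, ?, ?, ?)",
    ("90210", "1234", "ANYTOWN", "CA", "SAMPLE COUNTY", "12"),
)

print(zip_matches(conn, "90210", "Anytown", "ca"))
print(zip_matches(conn, "90210", "Othertown", "CA"))
```

The query filters on `zip_code` and `city`, which is the column pair covered by `idx_zip_city`.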
### API Endpoints

#### POST /validate
Validate and standardize an address.

**Request:**
```json
{
  "street": "123 Main St",
  "city": "Anytown",
  "state": "CA",
  "zip": "90210",
  "strict": false
}
```

`strict` is optional; set it to `true` to require an exact match.

**Response:**
```json
{
  "valid": true,
  "confidence": 0.95,
  "source": "local",
  "standardized": {
    "street": "123 MAIN ST",
    "city": "ANYTOWN",
    "state": "CA",
    "zip": "90210",
    "plus4": "1234",
    "delivery_point": "12"
  },
  "corrections": [
    "Standardized street abbreviation ST"
  ]
}
```

`source` is one of `"local"`, `"usps_api"`, or `"cached"`.

#### GET /health
Health check endpoint.

#### POST /batch-validate
Batch validation for multiple addresses (up to 50).

#### GET /stats
Service statistics (cache hit rate, API usage, etc.).

## Privacy & Security Features

### Data Minimization

- Only street numbers/names sent to USPS API when necessary
- ZIP/city/state validation happens offline first
- Validated addresses cached to avoid repeat API calls
- No logging of personal addresses

### Rate Limiting

- USPS API limited to 5 requests/second
- Internal queue system for burst requests
- Fallback to local-only validation when rate limited

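One common way to enforce a per-second limit is a token bucket. The sketch below is an assumption about how the limit might be implemented, not a description of the service: `TokenBucket` and `choose_source` are hypothetical names, and the internal queue for bursts is left out. The clock is injectable so the behavior can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float = 5.0,
        capacity: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def choose_source(bucket: TokenBucket) -> str:
    """Use the USPS API while under the limit, otherwise fall back to local-only validation."""
    return "usps_api" if bucket.try_acquire() else "local_zip_only"
```

With the default settings, a sixth call within the same instant gets `"local_zip_only"`, and the bucket is full again one second later.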
### Caching Strategy

- Redis cache with 30-day TTL for validated addresses
- Cache key: SHA256 hash of normalized address
- Cache hit ratio target: >80% after initial warmup

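A cache key of this kind could be derived as follows. The normalization rules (trim, uppercase, collapse whitespace), the field separator, and the `addr:` prefix are assumptions made for illustration; the design only specifies "SHA256 hash of normalized address".

```python
import hashlib
import re


def normalize(street: str, city: str, state: str, zip_code: str) -> str:
    """Canonical form: trimmed, uppercased, single spaces, fields joined by '|'."""
    parts = (street, city, state, zip_code)
    return "|".join(re.sub(r"\s+", " ", p.strip().upper()) for p in parts)


def cache_key(street: str, city: str, state: str, zip_code: str, prefix: str = "addr:") -> str:
    """Return a key that does not expose the address itself."""
    digest = hashlib.sha256(normalize(street, city, state, zip_code).encode("utf-8")).hexdigest()
    return prefix + digest


# Variants of the same address map to the same key.
a = cache_key("123 Main St", "Anytown", "CA", "90210")
b = cache_key("  123  main st ", "ANYTOWN", "ca", "90210")
print(a == b)
```

Hashing keeps raw addresses out of key names, which fits the data-minimization goals above. The cached values themselves still contain address data and need the same care.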
## Data Sources

### USPS ZIP+4 Database

- **Source:** USPS Address Management System
- **Update frequency:** Monthly
- **Size:** ~500MB compressed, ~2GB uncompressed
- **Format:** Fixed-width text files (legacy format)
- **Download:** Automated monthly sync via USPS FTP

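Fixed-width records are typically parsed by slicing each line at known column offsets. The sketch below shows the technique only: the `LAYOUT` offsets are hypothetical and do not reflect the actual USPS file layout, which would have to be taken from the USPS file specification.

```python
from typing import Dict, Iterable, Iterator

# HYPOTHETICAL layout: (field name, start offset, end offset).
# The real column positions come from the USPS file documentation.
LAYOUT = (
    ("zip_code", 0, 5),
    ("plus4", 5, 9),
    ("state", 9, 11),
    ("city", 11, 39),
)


def parse_line(line: str) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def parse_lines(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per non-empty line."""
    for line in lines:
        if line.strip():
            yield parse_line(line.rstrip("\n"))


# Illustrative record built to match the hypothetical layout.
sample = "90210" + "1234" + "CA" + "ANYTOWN".ljust(28)
print(parse_line(sample))
```

Because `parse_lines` is a generator, a multi-gigabyte file can be streamed into SQLite without loading it into memory.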
### USPS Address Validation API

- **Endpoint:** https://secure.shippingapis.com/ShippingAPI.dll
- **Rate limit:** 5 requests/second, 10,000/day free
- **Authentication:** USPS Web Tools User ID required
- **Response format:** XML (convert to JSON internally)

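The XML-to-JSON conversion could look roughly like the sketch below, which uses the standard-library `xml.etree.ElementTree`. The sample payload and its tag names (`Address2`, `City`, `State`, `Zip5`, `Zip4`, `Error`) are assumptions about the response shape and should be checked against the current USPS documentation; `parse_usps_response` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative payload only; not captured from a real API call.
SAMPLE_XML = """<AddressValidateResponse>
  <Address ID="0">
    <Address2>123 MAIN ST</Address2>
    <City>ANYTOWN</City>
    <State>CA</State>
    <Zip5>90210</Zip5>
    <Zip4>1234</Zip4>
  </Address>
</AddressValidateResponse>"""


def parse_usps_response(xml_text: str) -> dict:
    """Convert an XML response into a JSON-serializable dict."""
    root = ET.fromstring(xml_text)
    address = root.find("Address")
    # Treat a missing address or an embedded error element as "not valid".
    if address is None or address.find("Error") is not None:
        return {"valid": False, "standardized": None}

    def text(tag: str) -> str:
        return (address.findtext(tag) or "").strip()

    return {
        "valid": True,
        "standardized": {
            "street": text("Address2"),
            "city": text("City"),
            "state": text("State"),
            "zip": text("Zip5"),
            "plus4": text("Zip4"),
        },
    }


print(parse_usps_response(SAMPLE_XML))
```

The returned dict uses the same field names as the `standardized` object in the `/validate` response.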
## Implementation Phases

### Phase 1: Basic Service (1-2 days)

- FastAPI service setup
- Basic ZIP code validation using downloaded USPS data
- Docker containerization
- Simple /validate endpoint

### Phase 2: USPS Integration (1 day)

- USPS API client implementation
- Street-level validation
- Error handling and fallbacks

### Phase 3: Caching & Optimization (1 day)

- Redis integration
- Performance optimization
- Batch validation endpoint

### Phase 4: Data Management (1 day)

- Automated USPS data downloads
- Database update procedures
- Monitoring and alerting

### Phase 5: Integration (0.5 day)

- Update legal app to use address service
- Form validation integration
- Error handling in UI

## Docker Configuration

### Dockerfile

```dockerfile
FROM python:3.11-slim

WORKDIR /app
# Copy requirements first so the dependency layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
EXPOSE 8001

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]
```

### Docker Compose Addition

```yaml
services:
  address-service:
    build: ./address-service
    ports:
      - "8001:8001"
    environment:
      - USPS_USER_ID=${USPS_USER_ID}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    volumes:
      - ./address-service/data:/app/data
```

## Configuration

### Environment Variables

- `USPS_USER_ID`: USPS Web Tools user ID
- `REDIS_URL`: Redis connection string
- `ZIP_DB_PATH`: Path to SQLite ZIP database
- `UPDATE_SCHEDULE`: Cron schedule for data updates
- `API_RATE_LIMIT`: USPS API rate limit (default: 5/second)
- `CACHE_TTL`: Cache time-to-live in seconds (default: 2592000 = 30 days)

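These variables could be loaded into a typed settings object at startup. The sketch below uses only the standard library; the `Settings` class and `load_settings` function are hypothetical, and the defaults for `ZIP_DB_PATH` and `UPDATE_SCHEDULE` are assumptions, since the list above gives defaults only for `API_RATE_LIMIT` and `CACHE_TTL`. It also assumes `API_RATE_LIMIT` is a plain integer of requests per second.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    usps_user_id: str
    redis_url: str
    zip_db_path: str
    update_schedule: str
    api_rate_limit: int   # requests per second
    cache_ttl: int        # seconds


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment (or any mapping, for tests)."""
    env = os.environ if env is None else env
    return Settings(
        usps_user_id=env["USPS_USER_ID"],                          # required: KeyError if unset
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        zip_db_path=env.get("ZIP_DB_PATH", "/app/data/zip.db"),    # assumed default
        update_schedule=env.get("UPDATE_SCHEDULE", "0 3 1 * *"),   # assumed default: monthly
        api_rate_limit=int(env.get("API_RATE_LIMIT", "5")),
        cache_ttl=int(env.get("CACHE_TTL", str(30 * 24 * 3600))),  # 2592000 seconds
    )
```

Failing fast on a missing `USPS_USER_ID` surfaces a misconfigured container at startup rather than on the first street-level lookup.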
## Monitoring & Metrics

### Key Metrics

- Cache hit ratio
- USPS API usage/limits
- Response times (local vs API)
- Validation success rates
- Database update status

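A few of these metrics can be derived from simple per-source counters. The sketch below is one possible shape for the data behind `GET /stats`; the `ValidationStats` class and its field names are hypothetical, and a production service would more likely use a metrics library than in-process counters.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ValidationStats:
    """Counts validations by the source that answered them."""

    by_source: Counter = field(default_factory=Counter)

    def record(self, source: str) -> None:
        # `source` is expected to be "local", "usps_api", or "cached".
        self.by_source[source] += 1

    @property
    def total(self) -> int:
        return sum(self.by_source.values())

    def cache_hit_ratio(self) -> float:
        """Share of requests served from cache; 0.0 before any traffic."""
        return self.by_source["cached"] / self.total if self.total else 0.0

    def snapshot(self) -> dict:
        return {
            "total": self.total,
            "cache_hit_ratio": self.cache_hit_ratio(),
            "usps_api_calls": self.by_source["usps_api"],
        }
```

Counting by `source` reuses the field already present in the `/validate` response, so no extra bookkeeping is needed at the call site.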
### Health Checks

- Service availability
- Database connectivity
- Redis connectivity
- USPS API connectivity
- Disk space for ZIP database

## Error Handling

### Graceful Degradation

1. USPS API down → Fall back to local ZIP validation only
2. Redis down → Skip caching, direct validation
3. ZIP database corrupt → Use USPS API only
4. All systems down → Return input address with warning

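Cases 1, 3, and 4 amount to trying validators in order and using the first one that responds. The sketch below illustrates that pattern under stated assumptions: `validate_with_fallback`, the result fields, and the `ADDRESS_NOT_VALIDATED` warning code are hypothetical, and a real service would catch narrower exception types than `Exception`.

```python
from typing import Callable, List, Sequence, Tuple

Validator = Callable[[dict], bool]


def validate_with_fallback(address: dict, validators: Sequence[Tuple[str, Validator]]) -> dict:
    """Try each (name, validator) pair in order; the first one that does not raise wins."""
    skipped: List[str] = []
    for name, validator in validators:
        try:
            return {
                "valid": validator(address),
                "source": name,
                "address": address,
                "skipped": skipped,
            }
        except Exception:
            # This backend is unavailable; remember it and move on.
            skipped.append(name)

    # Case 4: nothing could validate, so hand the input back with a warning.
    return {
        "valid": False,
        "source": None,
        "address": address,
        "skipped": skipped,
        "warning": "ADDRESS_NOT_VALIDATED",
    }
```

Returning the list of skipped backends lets the caller populate a field such as `fallback_used` without inspecting exceptions.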
### Error Responses

```json
{
  "valid": false,
  "error": "USPS_API_UNAVAILABLE",
  "message": "Street validation temporarily unavailable",
  "fallback_used": "local_zip_only"
}
```

## Testing Strategy

### Unit Tests

- Address normalization functions
- ZIP database queries
- USPS API client
- Caching logic

### Integration Tests

- Full validation workflow
- Error handling scenarios
- Performance benchmarks
- Data update procedures

### Load Testing

- Concurrent validation requests
- Cache performance under load
- USPS API rate limiting behavior

## Security Considerations

### Input Validation

- Sanitize all address inputs
- Prevent SQL injection in ZIP queries
- Validate against malicious payloads

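Input checks of this kind might look like the sketch below, which would run before any database or API call. The `clean_address_input` helper, the 100-character limit, and the exact patterns are assumptions for illustration. SQL injection is primarily prevented by parameterized queries; these shape checks are an additional layer, not a replacement.

```python
import re

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")   # 5-digit ZIP, optional +4
STATE_RE = re.compile(r"[A-Z]{2}")       # two-letter state code
CONTROL_CHARS_RE = re.compile(r"[\x00-\x1f\x7f]")
MAX_FIELD_LENGTH = 100                   # assumed limit


def clean_address_input(street: str, city: str, state: str, zip_code: str) -> dict:
    """Return cleaned fields or raise ValueError on malformed input."""
    fields = {"street": street, "city": city, "state": state, "zip": zip_code}
    cleaned = {}
    for name, value in fields.items():
        # Replace control characters, then collapse runs of whitespace.
        value = CONTROL_CHARS_RE.sub(" ", value)
        value = re.sub(r"\s+", " ", value).strip()
        if not value or len(value) > MAX_FIELD_LENGTH:
            raise ValueError(f"invalid {name}")
        cleaned[name] = value

    cleaned["state"] = cleaned["state"].upper()
    if not STATE_RE.fullmatch(cleaned["state"]):
        raise ValueError("invalid state")
    if not ZIP_RE.fullmatch(cleaned["zip"]):
        raise ValueError("invalid zip")
    return cleaned
```

Error messages name only the offending field, never its value, which is in line with the no-logging-of-addresses rule.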
### Network Security

- Internal service communication only
- No direct external access to service
- HTTPS for USPS API calls
- Redis authentication if exposed

### Data Protection

- No persistent logging of addresses
- Secure cache key generation
- Regular security updates for dependencies

## Future Enhancements

### Phase 2 Features

- International address validation (Google/SmartyStreets)
- Address autocomplete suggestions
- Geocoding integration
- Delivery route optimization

### Performance Optimizations

- Database partitioning by state
- Compressed cache storage
- Async batch processing
- CDN for static ZIP data

## Cost Analysis

### Infrastructure Costs

- Additional container resources: ~$10/month
- Redis cache: ~$5/month
- USPS ZIP data storage: Minimal
- USPS API: Free tier (10K requests/day)

### Development Time

- Initial implementation: 3-5 days
- Testing and refinement: 1-2 days
- Documentation and deployment: 0.5 day
- **Total: 4.5-7.5 days**

### ROI

- Improved data quality
- Reduced shipping errors
- Better client communication
- Compliance with data standards
- Foundation for future location-based features