Address Validation Service Design
Overview
A separate Docker service for validating and standardizing addresses using a hybrid approach that prioritizes privacy and minimizes external API calls.
Architecture
Service Design
- Standalone FastAPI service running on port 8001
- SQLite database containing USPS ZIP+4 data (~500MB)
- USPS API integration for street-level validation when needed
- Redis cache for validated addresses
- Internal HTTP API for communication with main legal application
Data Flow
1. Legal App → Address Service (POST /validate)
2. Address Service checks local ZIP database
3. If ZIP/city/state are valid and no street-level validation is needed → return immediately
4. If street validation needed → call USPS API
5. Cache result in Redis
6. Return standardized address to Legal App
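The flow above can be sketched as a single function. This is a minimal illustration, not the service's actual code: the cache is a plain dict standing in for Redis, `zip_is_valid` and `usps_verify` are hypothetical callables injected by the caller, and consulting the cache before the local lookup is an assumption (the flow only states when results are written to it).

```python
import hashlib
from typing import Callable, Optional

FIELDS = ("street", "city", "state", "zip")


def _cache_key(address: dict) -> str:
    # SHA256 of the normalized address, as in the caching strategy.
    normalized = "|".join(address[f].strip().upper() for f in FIELDS)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def validate_address(
    address: dict,
    cache: dict,  # stand-in for Redis
    zip_is_valid: Callable[[str, str, str], bool],  # hypothetical local lookup
    usps_verify: Callable[[dict], Optional[dict]],  # hypothetical USPS client
    street_check: bool = True,
) -> dict:
    key = _cache_key(address)
    if key in cache:  # assumption: cache is checked first
        return {"valid": True, "source": "cached", "standardized": cache[key]}

    # Offline ZIP/city/state check against the local database.
    if not zip_is_valid(address["zip"], address["city"], address["state"]):
        return {"valid": False, "source": "local", "standardized": None}
    if not street_check:
        standardized = {f: address[f].strip().upper() for f in FIELDS}
        return {"valid": True, "source": "local", "standardized": standardized}

    # Street-level validation through the external API.
    standardized = usps_verify(address)
    if standardized is None:
        return {"valid": False, "source": "usps_api", "standardized": None}
    cache[key] = standardized  # only validated addresses are cached
    return {"valid": True, "source": "usps_api", "standardized": standardized}
```

Passing the lookups in as callables keeps the flow testable without a database, Redis, or network access.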
Technical Requirements
Dependencies
- FastAPI framework
- SQLAlchemy for database operations
- SQLite for ZIP+4 database storage
- Redis for caching validated addresses
- httpx for USPS API calls
- Pydantic for request/response validation
Database Schema
-- ZIP+4 Database (from USPS monthly files)
CREATE TABLE zip_codes (
    zip_code TEXT,
    plus4 TEXT,
    city TEXT,
    state TEXT,
    county TEXT,
    delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);
CREATE INDEX idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX idx_city_state ON zip_codes(city, state);
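A lookup against this schema might look like the sketch below. It uses the standard-library `sqlite3` module rather than SQLAlchemy to stay self-contained, runs against an in-memory database, and the inserted row is made-up sample data. The function name `zip_matches` is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE zip_codes (
    zip_code TEXT, plus4 TEXT, city TEXT, state TEXT,
    county TEXT, delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);
CREATE INDEX idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX idx_city_state ON zip_codes(city, state);
"""


def zip_matches(conn: sqlite3.Connection, zip_code: str, city: str, state: str) -> bool:
    """Return True if the ZIP/city/state combination exists locally."""
    # Parameterized query: user input never becomes part of the SQL text.
    row = conn.execute(
        "SELECT 1 FROM zip_codes "
        "WHERE zip_code = ? AND city = ? AND state = ? LIMIT 1",
        (zip_code.strip(), city.strip().upper(), state.strip().upper()),
    ).fetchone()
    return row is not None


# In-memory database with one made-up row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO zip_codes VALUES (?, ?, ?, ?, ?, ?)",
    ("90210", "1234", "ANYTOWN", "CA", "EXAMPLE COUNTY", "12"),
)
```

The sketch assumes city and state are stored in upper case, so inputs are upper-cased before comparison.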
API Endpoints
POST /validate
Validate and standardize an address.
Request:
{
  "street": "123 Main St",
  "city": "Anytown",
  "state": "CA",
  "zip": "90210",
  "strict": false // Optional: require exact match
}
Response:
{
  "valid": true,
  "confidence": 0.95,
  "source": "local", // "local", "usps_api", "cached"
  "standardized": {
    "street": "123 MAIN ST",
    "city": "ANYTOWN",
    "state": "CA",
    "zip": "90210",
    "plus4": "1234",
    "delivery_point": "12"
  },
  "corrections": [
    "Standardized street abbreviation ST"
  ]
}
GET /health
Health check endpoint.
POST /batch-validate
Batch validation for multiple addresses (up to 50).
GET /stats
Service statistics (cache hit rate, API usage, etc.).
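As an illustration of the 50-address limit on POST /batch-validate, the helper below rejects oversized batches and isolates per-address failures. It is a sketch under assumptions: `validate_one` is a hypothetical single-address validator, and the `VALIDATION_FAILED` error code is not defined elsewhere in this design.

```python
from typing import Callable

MAX_BATCH_SIZE = 50  # limit stated for /batch-validate


def validate_batch(
    addresses: list[dict],
    validate_one: Callable[[dict], dict],  # hypothetical single-address validator
) -> list[dict]:
    """Validate each address in order; one failure does not abort the batch."""
    if len(addresses) > MAX_BATCH_SIZE:
        raise ValueError(f"batch size {len(addresses)} exceeds {MAX_BATCH_SIZE}")
    results = []
    for address in addresses:
        try:
            results.append(validate_one(address))
        except Exception as exc:
            # Report only the exception type so no address data leaks out.
            results.append({
                "valid": False,
                "error": "VALIDATION_FAILED",  # hypothetical error code
                "message": type(exc).__name__,
            })
    return results
```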
Privacy & Security Features
Data Minimization
- Only street numbers/names sent to USPS API when necessary
- ZIP/city/state validation happens offline first
- Validated addresses cached to avoid repeat API calls
- No logging of personal addresses
Rate Limiting
- USPS API limited to 5 requests/second
- Internal queue system for burst requests
- Fallback to local-only validation when rate limited
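One way to enforce the 5 requests/second limit is a token bucket, sketched below. The design does not name an algorithm, so the token bucket is an assumption, as are the class and function names. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `rate` calls per second, with bursts up to `capacity`."""

    def __init__(
        self,
        rate: float = 5.0,
        capacity: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def choose_validation_mode(bucket: TokenBucket) -> str:
    """Fall back to local-only validation when the bucket is empty."""
    return "usps_api" if bucket.try_acquire() else "local_zip_only"
```

A non-blocking `try_acquire` matches the fallback bullet above; a queue for burst requests would instead wait for the next token.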
Caching Strategy
- Redis cache with 30-day TTL for validated addresses
- Cache key: SHA256 hash of normalized address
- Cache hit ratio target: >80% after initial warmup
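The cache key derivation can be sketched as follows. The normalization rules (collapse whitespace, upper-case, join with `|`) and the `addr:` prefix are assumptions; the design only specifies a SHA256 hash of the normalized address.

```python
import hashlib
import re

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days = 2592000 seconds


def normalize_address(street: str, city: str, state: str, zip_code: str) -> str:
    """Collapse whitespace and upper-case each field (assumed rules)."""
    parts = (street, city, state, zip_code)
    return "|".join(re.sub(r"\s+", " ", p).strip().upper() for p in parts)


def cache_key(street: str, city: str, state: str, zip_code: str,
              prefix: str = "addr:") -> str:
    """Build a key that never contains the address in clear text."""
    normalized = normalize_address(street, city, state, zip_code)
    return prefix + hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Because the key is a hash, inputs that differ only in case or spacing map to the same cache entry, and the raw address does not appear in key listings.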
Data Sources
USPS ZIP+4 Database
- Source: USPS Address Management System
- Update frequency: Monthly
- Size: ~500MB compressed, ~2GB uncompressed
- Format: Fixed-width text files (legacy format)
- Download: Automated monthly sync via USPS FTP
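Loading fixed-width files usually comes down to slicing each line at known column offsets. The sketch below shows the technique only: the column layout in `LAYOUT` is entirely hypothetical and must be replaced with the offsets from the USPS file specification.

```python
# HYPOTHETICAL column layout (start, end), NOT the real USPS record format.
LAYOUT = {
    "zip_code": (0, 5),
    "plus4": (5, 9),
    "state": (9, 11),
    "city": (11, 39),
}


def parse_record(line: str) -> dict:
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, (start, end) in LAYOUT.items()}


# Made-up sample line built to match the hypothetical layout above.
sample_line = "90210" + "1234" + "CA" + "ANYTOWN".ljust(28)
record = parse_record(sample_line)
```

Keeping the layout in one table means a change to the file format touches only `LAYOUT`, not the parsing code.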
USPS Address Validation API
- Endpoint: https://secure.shippingapis.com/ShippingAPI.dll
- Rate limit: 5 requests/second, 10,000/day free
- Authentication: USPS Web Tools User ID required
- Response format: XML (convert to JSON internally)
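Converting the XML response to a JSON-friendly dict could look like the sketch below, using only the standard library. The element names in `SAMPLE_XML` are an assumption about the response shape and should be checked against the USPS documentation; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed response shape, for illustration only.
SAMPLE_XML = """<AddressValidateResponse>
  <Address ID="0">
    <Address2>123 MAIN ST</Address2>
    <City>ANYTOWN</City>
    <State>CA</State>
    <Zip5>90210</Zip5>
    <Zip4>1234</Zip4>
  </Address>
</AddressValidateResponse>"""


def parse_usps_response(xml_text: str) -> dict:
    """Convert an address validation response into a plain dict."""
    root = ET.fromstring(xml_text)
    address = root.find("Address")
    if address is None:
        raise ValueError("no <Address> element in response")
    error = address.find("Error")
    if error is not None:
        return {"valid": False, "message": error.findtext("Description", default="")}
    return {
        "valid": True,
        "standardized": {
            "street": address.findtext("Address2", default=""),
            "city": address.findtext("City", default=""),
            "state": address.findtext("State", default=""),
            "zip": address.findtext("Zip5", default=""),
            "plus4": address.findtext("Zip4", default=""),
        },
    }
```

The returned dict mirrors the `standardized` object in the /validate response, so it can be serialized to JSON directly.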
Implementation Phases
Phase 1: Basic Service (1-2 days)
- FastAPI service setup
- Basic ZIP code validation using downloaded USPS data
- Docker containerization
- Simple /validate endpoint
Phase 2: USPS Integration (1 day)
- USPS API client implementation
- Street-level validation
- Error handling and fallbacks
Phase 3: Caching & Optimization (1 day)
- Redis integration
- Performance optimization
- Batch validation endpoint
Phase 4: Data Management (1 day)
- Automated USPS data downloads
- Database update procedures
- Monitoring and alerting
Phase 5: Integration (0.5 day)
- Update legal app to use address service
- Form validation integration
- Error handling in UI
Docker Configuration
Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8001
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]
Docker Compose Addition
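services:
  address-service:
    build: ./address-service
    # Publishes 8001 on the host. For internal-only access (see Network
    # Security), drop this mapping or bind it to 127.0.0.1.
    ports:
      - "8001:8001"
    environment:
      - USPS_USER_ID=${USPS_USER_ID}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    volumes:
      - ./address-service/data:/app/data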
Configuration
Environment Variables
- USPS_USER_ID: USPS Web Tools user ID
- REDIS_URL: Redis connection string
- ZIP_DB_PATH: Path to SQLite ZIP database
- UPDATE_SCHEDULE: Cron schedule for data updates
- API_RATE_LIMIT: USPS API rate limit (default: 5/second)
- CACHE_TTL: Cache time-to-live in seconds (default: 2592000 = 30 days)
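These variables could be loaded into a typed settings object at startup, as sketched below. The defaults for `ZIP_DB_PATH` and `UPDATE_SCHEDULE` are hypothetical, and treating `API_RATE_LIMIT` as an integer number of requests per second is an assumption about its format.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    usps_user_id: str
    redis_url: str
    zip_db_path: str
    update_schedule: str
    api_rate_limit: int  # assumed: requests per second
    cache_ttl: int  # seconds


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, failing fast on a missing user ID."""
    env = os.environ if env is None else env
    return Settings(
        usps_user_id=env["USPS_USER_ID"],  # required, raises KeyError if unset
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        zip_db_path=env.get("ZIP_DB_PATH", "/app/data/zip4.sqlite"),  # hypothetical default
        update_schedule=env.get("UPDATE_SCHEDULE", "0 3 1 * *"),  # hypothetical default
        api_rate_limit=int(env.get("API_RATE_LIMIT", "5")),
        cache_ttl=int(env.get("CACHE_TTL", "2592000")),
    )
```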
Monitoring & Metrics
Key Metrics
- Cache hit ratio
- USPS API usage/limits
- Response times (local vs API)
- Validation success rates
- Database update status
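A small in-process counter is enough to derive the cache hit ratio and API usage reported by GET /stats. The sketch below is illustrative; the class and field names are hypothetical, and a production service might export the same numbers to a metrics system instead.

```python
from dataclasses import dataclass


@dataclass
class ValidationStats:
    cache_hits: int = 0
    cache_misses: int = 0
    usps_calls: int = 0

    def record_lookup(self, hit: bool) -> None:
        """Count one cache lookup as a hit or a miss."""
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    @property
    def cache_hit_ratio(self) -> float:
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0

    def as_dict(self) -> dict:
        """Shape suitable for returning from a stats endpoint."""
        return {
            "cache_hit_ratio": self.cache_hit_ratio,
            "usps_calls": self.usps_calls,
        }
```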
Health Checks
- Service availability
- Database connectivity
- Redis connectivity
- USPS API connectivity
- Disk space for ZIP database
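The checks above can be aggregated into a single GET /health payload. In this sketch each check is a hypothetical zero-argument callable returning True or False; the `ok`/`degraded` status values are assumptions, and only the disk space check is shown concretely.

```python
import shutil
from typing import Callable


def has_free_disk(path: str, min_free_bytes: int) -> bool:
    """True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def run_health_checks(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run every check; a check that raises counts as failed."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}
```

Catching exceptions per check means an unreachable Redis or USPS endpoint is reported as a failed check rather than crashing the health endpoint itself.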
Error Handling
Graceful Degradation
- USPS API down → Fall back to local ZIP validation only
- Redis down → Skip caching, direct validation
- ZIP database corrupt → Use USPS API only
- All systems down → Return input address with warning
Error Responses
{
  "valid": false,
  "error": "USPS_API_UNAVAILABLE",
  "message": "Street validation temporarily unavailable",
  "fallback_used": "local_zip_only"
}
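The "USPS API down" and "all systems down" cases can be sketched as a fallback chain. `usps_verify` and `zip_is_valid` are hypothetical callables. The `ALL_BACKENDS_UNAVAILABLE` code and the `local_zip_valid` field are assumptions added for illustration; the other fields follow the error response shown above.

```python
from typing import Callable


def validate_with_fallback(
    address: dict,
    usps_verify: Callable[[dict], dict],  # hypothetical USPS client call
    zip_is_valid: Callable[[dict], bool],  # hypothetical local ZIP lookup
) -> dict:
    """Try USPS first, then local ZIP data, then return the input with a warning."""
    try:
        return usps_verify(address)
    except Exception:
        pass  # USPS unavailable: fall through to local validation

    try:
        zip_ok = zip_is_valid(address)
    except Exception:
        # Nothing is reachable: hand the input back with a warning.
        return {
            "valid": False,
            "error": "ALL_BACKENDS_UNAVAILABLE",  # hypothetical error code
            "message": "Address could not be validated; input returned unchanged",
            "standardized": address,
        }
    return {
        "valid": False,  # the street was not confirmed
        "error": "USPS_API_UNAVAILABLE",
        "message": "Street validation temporarily unavailable",
        "fallback_used": "local_zip_only",
        "local_zip_valid": zip_ok,  # hypothetical extra field
    }
```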
Testing Strategy
Unit Tests
- Address normalization functions
- ZIP database queries
- USPS API client
- Caching logic
Integration Tests
- Full validation workflow
- Error handling scenarios
- Performance benchmarks
- Data update procedures
Load Testing
- Concurrent validation requests
- Cache performance under load
- USPS API rate limiting behavior
Security Considerations
Input Validation
- Sanitize all address inputs
- Prevent SQL injection in ZIP queries
- Validate against malicious payloads
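Input sanitization could be done before any lookup, roughly as below. The field length cap and the exact patterns are assumptions. SQL injection is prevented separately by using parameterized queries, not by this cleaning step.

```python
import re

STATE_RE = re.compile(r"[A-Z]{2}")
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")
MAX_FIELD_LENGTH = 100  # hypothetical cap
REQUIRED_FIELDS = ("street", "city", "state", "zip")


def sanitize_address(raw: dict) -> dict:
    """Return a cleaned copy of the address or raise ValueError."""
    cleaned = {}
    for field in REQUIRED_FIELDS:
        value = raw.get(field)
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a string")
        # Replace control characters, then collapse runs of whitespace.
        value = re.sub(r"[\x00-\x1f\x7f]", " ", value)
        value = re.sub(r"\s+", " ", value).strip()
        if not value or len(value) > MAX_FIELD_LENGTH:
            raise ValueError(f"{field} is empty or too long")
        cleaned[field] = value
    cleaned["state"] = cleaned["state"].upper()
    if not STATE_RE.fullmatch(cleaned["state"]):
        raise ValueError("state must be a two-letter code")
    if not ZIP_RE.fullmatch(cleaned["zip"]):
        raise ValueError("zip must be 5 digits or ZIP+4")
    return cleaned
```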
Network Security
- Internal service communication only
- No direct external access to service
- HTTPS for USPS API calls
- Redis authentication if exposed
Data Protection
- No persistent logging of addresses
- Secure cache key generation
- Regular security updates for dependencies
Future Enhancements
Phase 2 Features
- International address validation (Google/SmartyStreets)
- Address autocomplete suggestions
- Geocoding integration
- Delivery route optimization
Performance Optimizations
- Database partitioning by state
- Compressed cache storage
- Async batch processing
- CDN for static ZIP data
Cost Analysis
Infrastructure Costs
- Additional container resources: ~$10/month
- Redis cache: ~$5/month
- USPS ZIP data storage: Minimal
- USPS API: Free tier (10K requests/day)
Development Time
- Initial implementation: 3-5 days
- Testing and refinement: 1-2 days
- Documentation and deployment: 0.5 day
- Total: 4.5-7.5 days
ROI
- Improved data quality
- Reduced shipping errors
- Better client communication
- Compliance with data standards
- Foundation for future location-based features