# Address Validation Service Design ## Overview A separate Docker service for validating and standardizing addresses using a hybrid approach that prioritizes privacy and minimizes external API calls. ## Architecture ### Service Design - **Standalone FastAPI service** running on port 8001 - **SQLite database** containing USPS ZIP+4 data (~500MB) - **USPS API integration** for street-level validation when needed - **Redis cache** for validated addresses - **Internal HTTP API** for communication with main legal application ### Data Flow ``` 1. Legal App → Address Service (POST /validate) 2. Address Service checks local ZIP database 3. If ZIP/city/state valid → return immediately 4. If street validation needed → call USPS API 5. Cache result in Redis 6. Return standardized address to Legal App ``` ## Technical Requirements ### Dependencies - FastAPI framework - SQLAlchemy for database operations - SQLite for ZIP+4 database storage - Redis for caching validated addresses - httpx for USPS API calls - Pydantic for request/response validation ### Database Schema ```sql -- ZIP+4 Database (from USPS monthly files) CREATE TABLE zip_codes ( zip_code TEXT, plus4 TEXT, city TEXT, state TEXT, county TEXT, delivery_point TEXT, PRIMARY KEY (zip_code, plus4) ); CREATE INDEX idx_zip_city ON zip_codes(zip_code, city); CREATE INDEX idx_city_state ON zip_codes(city, state); ``` ### API Endpoints #### POST /validate Validate and standardize an address. **Request:** ```json { "street": "123 Main St", "city": "Anytown", "state": "CA", "zip": "90210", "strict": false // Optional: require exact match } ``` **Response:** ```json { "valid": true, "confidence": 0.95, "source": "local", // "local", "usps_api", "cached" "standardized": { "street": "123 MAIN ST", "city": "ANYTOWN", "state": "CA", "zip": "90210", "plus4": "1234", "delivery_point": "12" }, "corrections": [ "Standardized street abbreviation ST" ] } ``` #### GET /health Health check endpoint. #### POST /batch-validate Batch validation for multiple addresses (up to 50). #### GET /stats Service statistics (cache hit rate, API usage, etc.). ## Privacy & Security Features ### Data Minimization - Only street numbers/names sent to USPS API when necessary - ZIP/city/state validation happens offline first - Validated addresses cached to avoid repeat API calls - No logging of personal addresses ### Rate Limiting - USPS API limited to 5 requests/second - Internal queue system for burst requests - Fallback to local-only validation when rate limited ### Caching Strategy - Redis cache with 30-day TTL for validated addresses - Cache key: SHA256 hash of normalized address - Cache hit ratio target: >80% after initial warmup ## Data Sources ### USPS ZIP+4 Database - **Source:** USPS Address Management System - **Update frequency:** Monthly - **Size:** ~500MB compressed, ~2GB uncompressed - **Format:** Fixed-width text files (legacy format) - **Download:** Automated monthly sync via USPS FTP ### USPS Address Validation API - **Endpoint:** https://secure.shippingapis.com/ShippingAPI.dll - **Rate limit:** 5 requests/second, 10,000/day free - **Authentication:** USPS Web Tools User ID required - **Response format:** XML (convert to JSON internally) ## Implementation Phases ### Phase 1: Basic Service (1-2 days) - FastAPI service setup - Basic ZIP code validation using downloaded USPS data - Docker containerization - Simple /validate endpoint ### Phase 2: USPS Integration (1 day) - USPS API client implementation - Street-level validation - Error handling and fallbacks ### Phase 3: Caching & Optimization (1 day) - Redis integration - Performance optimization - Batch validation endpoint ### Phase 4: Data Management (1 day) - Automated USPS data downloads - Database update procedures - Monitoring and alerting ### Phase 5: Integration (0.5 day) - Update legal app to use address service - Form validation integration - Error handling in UI ## Docker Configuration ### Dockerfile ```dockerfile FROM python:3.11-slim WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY . . EXPOSE 8001 CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"] ``` ### Docker Compose Addition ```yaml services: address-service: build: ./address-service ports: - "8001:8001" environment: - USPS_USER_ID=${USPS_USER_ID} - REDIS_URL=redis://redis:6379 depends_on: - redis volumes: - ./address-service/data:/app/data ``` ## Configuration ### Environment Variables - `USPS_USER_ID`: USPS Web Tools user ID - `REDIS_URL`: Redis connection string - `ZIP_DB_PATH`: Path to SQLite ZIP database - `UPDATE_SCHEDULE`: Cron schedule for data updates - `API_RATE_LIMIT`: USPS API rate limit (default: 5/second) - `CACHE_TTL`: Cache time-to-live in seconds (default: 2592000 = 30 days) ## Monitoring & Metrics ### Key Metrics - Cache hit ratio - USPS API usage/limits - Response times (local vs API) - Validation success rates - Database update status ### Health Checks - Service availability - Database connectivity - Redis connectivity - USPS API connectivity - Disk space for ZIP database ## Error Handling ### Graceful Degradation 1. USPS API down → Fall back to local ZIP validation only 2. Redis down → Skip caching, direct validation 3. ZIP database corrupt → Use USPS API only 4. All systems down → Return input address with warning ### Error Responses ```json { "valid": false, "error": "USPS_API_UNAVAILABLE", "message": "Street validation temporarily unavailable", "fallback_used": "local_zip_only" } ``` ## Testing Strategy ### Unit Tests - Address normalization functions - ZIP database queries - USPS API client - Caching logic ### Integration Tests - Full validation workflow - Error handling scenarios - Performance benchmarks - Data update procedures ### Load Testing - Concurrent validation requests - Cache performance under load - USPS API rate limiting behavior ## Security Considerations ### Input Validation - Sanitize all address inputs - Prevent SQL injection in ZIP queries - Validate against malicious payloads ### Network Security - Internal service communication only - No direct external access to service - HTTPS for USPS API calls - Redis authentication if exposed ### Data Protection - No persistent logging of addresses - Secure cache key generation - Regular security updates for dependencies ## Future Enhancements ### Phase 2 Features - International address validation (Google/SmartyStreets) - Address autocomplete suggestions - Geocoding integration - Delivery route optimization ### Performance Optimizations - Database partitioning by state - Compressed cache storage - Async batch processing - CDN for static ZIP data ## Cost Analysis ### Infrastructure Costs - Additional container resources: ~$10/month - Redis cache: ~$5/month - USPS ZIP data storage: Minimal - USPS API: Free tier (10K requests/day) ### Development Time - Initial implementation: 3-5 days - Testing and refinement: 1-2 days - Documentation and deployment: 0.5 day - **Total: 4.5-7.5 days** ### ROI - Improved data quality - Reduced shipping errors - Better client communication - Compliance with data standards - Foundation for future location-based features