Files
delphi-database/docs/ADDRESS_VALIDATION_SERVICE.md
2025-08-14 21:40:49 -05:00

7.2 KiB

Address Validation Service Design

Overview

A separate Docker service for validating and standardizing addresses using a hybrid approach that prioritizes privacy and minimizes external API calls.

Architecture

Service Design

  • Standalone FastAPI service running on port 8001
  • SQLite database containing USPS ZIP+4 data (~500MB)
  • USPS API integration for street-level validation when needed
  • Redis cache for validated addresses
  • Internal HTTP API for communication with main legal application

Data Flow

1. Legal App → Address Service (POST /validate)
2. Address Service checks local ZIP database
3. If ZIP/city/state valid → return immediately
4. If street validation needed → call USPS API
5. Cache result in Redis
6. Return standardized address to Legal App

Technical Requirements

Dependencies

  • FastAPI framework
  • SQLAlchemy for database operations
  • SQLite for ZIP+4 database storage
  • Redis for caching validated addresses
  • httpx for USPS API calls
  • Pydantic for request/response validation

Database Schema

-- ZIP+4 Database (from USPS monthly files)
CREATE TABLE zip_codes (
    zip_code TEXT,
    plus4 TEXT,
    city TEXT,
    state TEXT,
    county TEXT,
    delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);

CREATE INDEX idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX idx_city_state ON zip_codes(city, state);

API Endpoints

POST /validate

Validate and standardize an address.

Request:

{
    "street": "123 Main St",
    "city": "Anytown",
    "state": "CA", 
    "zip": "90210",
    "strict": false  // Optional: require exact match
}

Response:

{
    "valid": true,
    "confidence": 0.95,
    "source": "local",  // "local", "usps_api", "cached"
    "standardized": {
        "street": "123 MAIN ST",
        "city": "ANYTOWN",
        "state": "CA",
        "zip": "90210",
        "plus4": "1234",
        "delivery_point": "12"
    },
    "corrections": [
        "Standardized street abbreviation ST"
    ]
}

GET /health

Health check endpoint.

POST /batch-validate

Batch validation for multiple addresses (up to 50).

GET /stats

Service statistics (cache hit rate, API usage, etc.).

Privacy & Security Features

Data Minimization

  • Only street numbers/names sent to USPS API when necessary
  • ZIP/city/state validation happens offline first
  • Validated addresses cached to avoid repeat API calls
  • No logging of personal addresses

Rate Limiting

  • USPS API limited to 5 requests/second
  • Internal queue system for burst requests
  • Fallback to local-only validation when rate limited

Caching Strategy

  • Redis cache with 30-day TTL for validated addresses
  • Cache key: SHA256 hash of normalized address
  • Cache hit ratio target: >80% after initial warmup

Data Sources

USPS ZIP+4 Database

  • Source: USPS Address Management System
  • Update frequency: Monthly
  • Size: ~500MB compressed, ~2GB uncompressed
  • Format: Fixed-width text files (legacy format)
  • Download: Automated monthly sync via USPS FTP

USPS Address Validation API

Implementation Phases

Phase 1: Basic Service (1-2 days)

  • FastAPI service setup
  • Basic ZIP code validation using downloaded USPS data
  • Docker containerization
  • Simple /validate endpoint

Phase 2: USPS Integration (1 day)

  • USPS API client implementation
  • Street-level validation
  • Error handling and fallbacks

Phase 3: Caching & Optimization (1 day)

  • Redis integration
  • Performance optimization
  • Batch validation endpoint

Phase 4: Data Management (1 day)

  • Automated USPS data downloads
  • Database update procedures
  • Monitoring and alerting

Phase 5: Integration (0.5 day)

  • Update legal app to use address service
  • Form validation integration
  • Error handling in UI

Docker Configuration

Dockerfile

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .
EXPOSE 8001

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]

Docker Compose Addition

services:
  address-service:
    build: ./address-service
    ports:
      - "8001:8001"
    environment:
      - USPS_USER_ID=${USPS_USER_ID}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    volumes:
      - ./address-service/data:/app/data

Configuration

Environment Variables

  • USPS_USER_ID: USPS Web Tools user ID
  • REDIS_URL: Redis connection string
  • ZIP_DB_PATH: Path to SQLite ZIP database
  • UPDATE_SCHEDULE: Cron schedule for data updates
  • API_RATE_LIMIT: USPS API rate limit (default: 5/second)
  • CACHE_TTL: Cache time-to-live in seconds (default: 2592000 = 30 days)

Monitoring & Metrics

Key Metrics

  • Cache hit ratio
  • USPS API usage/limits
  • Response times (local vs API)
  • Validation success rates
  • Database update status

Health Checks

  • Service availability
  • Database connectivity
  • Redis connectivity
  • USPS API connectivity
  • Disk space for ZIP database

Error Handling

Graceful Degradation

  1. USPS API down → Fall back to local ZIP validation only
  2. Redis down → Skip caching, direct validation
  3. ZIP database corrupt → Use USPS API only
  4. All systems down → Return input address with warning

Error Responses

{
    "valid": false,
    "error": "USPS_API_UNAVAILABLE",
    "message": "Street validation temporarily unavailable",
    "fallback_used": "local_zip_only"
}

Testing Strategy

Unit Tests

  • Address normalization functions
  • ZIP database queries
  • USPS API client
  • Caching logic

Integration Tests

  • Full validation workflow
  • Error handling scenarios
  • Performance benchmarks
  • Data update procedures

Load Testing

  • Concurrent validation requests
  • Cache performance under load
  • USPS API rate limiting behavior

Security Considerations

Input Validation

  • Sanitize all address inputs
  • Prevent SQL injection in ZIP queries
  • Validate against malicious payloads

Network Security

  • Internal service communication only
  • No direct external access to service
  • HTTPS for USPS API calls
  • Redis authentication if exposed

Data Protection

  • No persistent logging of addresses
  • Secure cache key generation
  • Regular security updates for dependencies

Future Enhancements

Phase 2 Features

  • International address validation (Google/SmartyStreets)
  • Address autocomplete suggestions
  • Geocoding integration
  • Delivery route optimization

Performance Optimizations

  • Database partitioning by state
  • Compressed cache storage
  • Async batch processing
  • CDN for static ZIP data

Cost Analysis

Infrastructure Costs

  • Additional container resources: ~$10/month
  • Redis cache: ~$5/month
  • USPS ZIP data storage: Minimal
  • USPS API: Free tier (10K requests/day)

Development Time

  • Initial implementation: 3-5 days
  • Testing and refinement: 1-2 days
  • Documentation and deployment: 0.5 day
  • Total: 4.5-7.5 days

ROI

  • Improved data quality
  • Reduced shipping errors
  • Better client communication
  • Compliance with data standards
  • Foundation for future location-based features