# Address Validation Service Design

## Overview

A separate Docker service for validating and standardizing addresses using a hybrid approach that prioritizes privacy and minimizes external API calls.

## Architecture

### Service Design

- **Standalone FastAPI service** running on port 8001
- **SQLite database** built from USPS ZIP+4 data (source files ~500MB compressed, ~2GB uncompressed)
- **USPS API integration** for street-level validation when needed
- **Redis cache** for validated addresses
- **Internal HTTP API** for communication with the main legal application

### Data Flow

```
1. Legal App → Address Service (POST /validate)
2. Address Service checks local ZIP database
3. If ZIP/city/state valid and no street-level check is needed → return immediately
4. If street validation needed → check Redis cache; on a miss, call USPS API
5. Cache USPS result in Redis
6. Return standardized address to Legal App
```

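The flow above can be sketched as one function. This is a minimal sketch, not the service's actual code: it assumes each backend (local ZIP database, Redis, USPS API) is passed in as a plain callable, that a hypothetical `street_check` flag decides whether the street-level path runs, and that the cache is consulted before any USPS call. The names `validate`, `cache_key`, and `street_check` are illustrative.

```python
import hashlib
import json
from typing import Callable, Optional


def cache_key(address: dict) -> str:
    # Normalize case and whitespace so equivalent inputs share one cache entry.
    normalized = {k: " ".join(str(v).upper().split()) for k, v in address.items()}
    return hashlib.sha256(json.dumps(normalized, sort_keys=True).encode("utf-8")).hexdigest()


def validate(
    address: dict,
    zip_is_valid: Callable[[str, str, str], bool],
    cache_get: Callable[[str], Optional[dict]],
    cache_set: Callable[[str, dict], None],
    usps_verify: Callable[[dict], dict],
    street_check: bool,
) -> dict:
    # Offline ZIP/city/state check against the local ZIP database.
    if not zip_is_valid(address["zip"], address["city"], address["state"]):
        return {"valid": False, "source": "local"}

    # Nothing street-level to verify, so answer from local data immediately.
    if not street_check:
        return {"valid": True, "source": "local", "standardized": address}

    # Assumed: consult the cache before spending a USPS API call.
    key = cache_key(address)
    cached = cache_get(key)
    if cached is not None:
        return {**cached, "source": "cached"}

    result = {"valid": True, "source": "usps_api", "standardized": usps_verify(address)}
    # Cache the USPS result so repeat requests skip the API.
    cache_set(key, result)
    return result
```

With a plain dict standing in for Redis (`store.get`, `store.__setitem__`), the first street-level request for an address reports `"usps_api"` and an identical second request reports `"cached"`.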
## Technical Requirements

### Dependencies

- FastAPI framework
- SQLAlchemy for database operations
- SQLite for ZIP+4 database storage
- Redis for caching validated addresses
- httpx for USPS API calls
- Pydantic for request/response validation

### Database Schema

```sql
-- ZIP+4 database (loaded from USPS monthly files)
CREATE TABLE zip_codes (
    -- SQLite does not enforce NOT NULL on non-INTEGER primary key
    -- columns unless it is declared explicitly.
    zip_code       TEXT NOT NULL,
    plus4          TEXT NOT NULL,
    city           TEXT,
    state          TEXT,
    county         TEXT,
    delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);

CREATE INDEX idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX idx_city_state ON zip_codes(city, state);
```

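The offline ZIP/city/state check is a single indexed lookup against this table. A minimal sketch using Python's built-in `sqlite3` module rather than SQLAlchemy (to keep it self-contained); it assumes city and state are stored uppercase, and `create_schema` and `zip_city_state_valid` are hypothetical helper names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS zip_codes (
    zip_code       TEXT NOT NULL,
    plus4          TEXT NOT NULL,
    city           TEXT,
    state          TEXT,
    county         TEXT,
    delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);
CREATE INDEX IF NOT EXISTS idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX IF NOT EXISTS idx_city_state ON zip_codes(city, state);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def zip_city_state_valid(conn: sqlite3.Connection, zip_code: str, city: str, state: str) -> bool:
    # Values are bound as parameters, never formatted into the SQL string,
    # which is what prevents SQL injection in ZIP queries.
    row = conn.execute(
        "SELECT 1 FROM zip_codes WHERE zip_code = ? AND city = ? AND state = ? LIMIT 1",
        (zip_code.strip(), city.strip().upper(), state.strip().upper()),
    ).fetchone()
    return row is not None
```

The `(zip_code, city)` prefix of the query is served by `idx_zip_city`, so the lookup stays fast even on the full ~2GB dataset.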
### API Endpoints

#### POST /validate

Validate and standardize an address.

**Request:**

```json
{
  "street": "123 Main St",
  "city": "Anytown",
  "state": "CA",
  "zip": "90210",
  "strict": false
}
```

`strict` is optional; set it to `true` to require an exact match.

**Response:**

```json
{
  "valid": true,
  "confidence": 0.95,
  "source": "local",
  "standardized": {
    "street": "123 MAIN ST",
    "city": "ANYTOWN",
    "state": "CA",
    "zip": "90210",
    "plus4": "1234",
    "delivery_point": "12"
  },
  "corrections": [
    "Standardized street abbreviation ST"
  ]
}
```

`source` is one of `"local"`, `"usps_api"`, or `"cached"`.

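The `standardized.street` and `corrections` fields suggest a normalization step that uppercases the street line and abbreviates its suffix. A minimal sketch, assuming only a trailing suffix is rewritten; the mapping below is a small illustrative subset of the USPS Publication 28 suffix abbreviations, and `standardize_street` is a hypothetical name.

```python
import re

# Illustrative subset of USPS Publication 28 street-suffix abbreviations.
SUFFIX_ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "DRIVE": "DR",
    "ROAD": "RD",
    "LANE": "LN",
}


def standardize_street(street: str) -> tuple[str, list[str]]:
    """Uppercase, drop periods/commas, collapse spaces, abbreviate a trailing suffix."""
    words = re.sub(r"[.,]", "", street).upper().split()
    corrections: list[str] = []
    if words and words[-1] in SUFFIX_ABBREVIATIONS:
        abbreviation = SUFFIX_ABBREVIATIONS[words[-1]]
        words[-1] = abbreviation
        corrections.append(f"Standardized street abbreviation {abbreviation}")
    return " ".join(words), corrections
```

For example, `standardize_street("123 Main Street")` yields `("123 MAIN ST", ["Standardized street abbreviation ST"])`. Directionals and unit designators (APT, STE) would need their own tables.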
#### GET /health

Health check endpoint.

#### POST /batch-validate

Batch validation for multiple addresses (up to 50 per request).

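A batch handler mostly needs to enforce the 50-address limit and avoid spending USPS calls on duplicates within one batch. A minimal sketch, assuming a per-address `validate_one` callable exists; `batch_validate` and the case/whitespace-insensitive dedupe are illustrative choices, not part of the spec.

```python
from typing import Callable

MAX_BATCH_SIZE = 50  # limit stated for POST /batch-validate


def batch_validate(addresses: list[dict], validate_one: Callable[[dict], dict]) -> list[dict]:
    """Validate up to MAX_BATCH_SIZE addresses, checking each distinct address once."""
    if len(addresses) > MAX_BATCH_SIZE:
        raise ValueError(f"batch of {len(addresses)} exceeds the limit of {MAX_BATCH_SIZE}")

    results_by_key: dict[tuple, dict] = {}
    results = []
    for address in addresses:
        # Case/whitespace-insensitive key so near-duplicates share one lookup.
        key = tuple(sorted((k, " ".join(str(v).upper().split())) for k, v in address.items()))
        if key not in results_by_key:
            results_by_key[key] = validate_one(address)
        results.append(results_by_key[key])
    return results
```

Results come back in input order, so the caller can zip them with the submitted list. The endpoint would translate the `ValueError` into a 4xx response.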
#### GET /stats

Service statistics (cache hit rate, API usage, etc.).

## Privacy & Security Features

### Data Minimization

- Only street numbers/names sent to USPS API when necessary
- ZIP/city/state validation happens offline first
- Validated addresses cached to avoid repeat API calls
- No logging of personal addresses

### Rate Limiting

- USPS API limited to 5 requests/second
- Internal queue system for burst requests
- Fallback to local-only validation when rate limited

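A token bucket is one common way to hold outbound USPS calls to 5 requests/second while still allowing short bursts. A minimal sketch, assuming a non-blocking check where the caller falls back to local-only validation (or enqueues the request) when no token is available; `TokenBucket` is a hypothetical class, and the clock is injectable so it can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows `rate` requests per second with bursts of up to `capacity`."""

    def __init__(self, rate: float = 5.0, capacity: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # rate limited: caller uses local-only validation
```

`try_acquire` never awaits, so one instance can be shared by coroutines in a single event loop without a lock. Each uvicorn worker process would hold its own bucket, though, so a limit shared across workers would have to live in Redis.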
### Caching Strategy

- Redis cache with 30-day TTL for validated addresses
- Cache key: SHA256 hash of normalized address
- Cache hit ratio target: >80% after initial warmup

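The caching rules above translate into a normalization function, a SHA-256 key, and a `SET` with an expiry. A minimal sketch, assuming a redis-py style client (its `set(name, value, ex=...)` and `get(name)` methods) and a hypothetical `addr:` key prefix; the normalization rules (uppercase, strip punctuation, collapse whitespace) are an assumption about what "normalized" means.

```python
import hashlib
import json
from typing import Optional

CACHE_TTL = 2_592_000  # 30 days in seconds, matching the CACHE_TTL default
FIELDS = ("street", "city", "state", "zip")


def normalize(address: dict) -> str:
    parts = []
    for field in FIELDS:
        value = str(address.get(field, "")).upper()
        # Keep letters, digits and spaces; then collapse runs of whitespace.
        cleaned = "".join(ch for ch in value if ch.isalnum() or ch.isspace())
        parts.append(" ".join(cleaned.split()))
    return "|".join(parts)


def cache_key(address: dict) -> str:
    return "addr:" + hashlib.sha256(normalize(address).encode("utf-8")).hexdigest()


def cache_result(client, address: dict, result: dict, ttl: int = CACHE_TTL) -> None:
    client.set(cache_key(address), json.dumps(result), ex=ttl)


def cached_result(client, address: dict) -> Optional[dict]:
    raw = client.get(cache_key(address))
    return json.loads(raw) if raw is not None else None
```

Because addresses are low-entropy, a plain SHA-256 key can be reversed by hashing candidate addresses. If that matters for "secure cache key generation", `hmac.new(secret, message, hashlib.sha256)` with a server-side secret is a drop-in replacement for the hash.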
## Data Sources

### USPS ZIP+4 Database

- **Source:** USPS Address Management System
- **Update frequency:** Monthly
- **Size:** ~500MB compressed, ~2GB uncompressed
- **Format:** Fixed-width text files (legacy format)
- **Download:** Automated monthly sync via USPS FTP

### USPS Address Validation API

- **Endpoint:** https://secure.shippingapis.com/ShippingAPI.dll
- **Rate limit:** 5 requests/second, 10,000/day free
- **Authentication:** USPS Web Tools User ID required
- **Response format:** XML (converted to JSON internally)

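Building the XML request and converting the XML response can both be done with the standard library. A minimal sketch under stated assumptions: the element names (`AddressValidateRequest`, `Revision`, `Address ID="0"`, `Address1` for the unit, `Address2` for the street line, `Zip5`, `Zip4`, `DeliveryPoint`) reflect the legacy Web Tools "Verify" API as the author of this sketch understands it and should be confirmed against current USPS documentation. `build_verify_url`, `parse_verify_response`, and the `USPS_ERROR` code are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

USPS_ENDPOINT = "https://secure.shippingapis.com/ShippingAPI.dll"


def build_verify_url(user_id: str, street: str, city: str, state: str, zip5: str) -> str:
    """Build a GET URL for the (assumed) Web Tools Verify request."""
    root = ET.Element("AddressValidateRequest", USERID=user_id)
    ET.SubElement(root, "Revision").text = "1"
    address = ET.SubElement(root, "Address", ID="0")
    # Assumed convention: Address1 = apartment/suite, Address2 = street line.
    fields = (("Address1", ""), ("Address2", street), ("City", city),
              ("State", state), ("Zip5", zip5), ("Zip4", ""))
    for tag, value in fields:
        ET.SubElement(address, tag).text = value
    xml = ET.tostring(root, encoding="unicode")
    return f"{USPS_ENDPOINT}?{urlencode({'API': 'Verify', 'XML': xml})}"


def parse_verify_response(xml_text: str) -> dict:
    """Convert a Verify XML response into the service's JSON-ready response shape."""
    root = ET.fromstring(xml_text)
    # Errors may be top-level (e.g. bad credentials) or nested inside <Address>.
    error = root if root.tag == "Error" else root.find(".//Error")
    if error is not None:
        message = (error.findtext("Description") or "").strip()
        return {"valid": False, "error": "USPS_ERROR", "message": message}  # hypothetical code

    address = root.find("Address")
    if address is None:
        raise ValueError("unexpected USPS response: no <Address> element")
    return {
        "valid": True,
        "source": "usps_api",
        "standardized": {
            "street": address.findtext("Address2", default=""),
            "city": address.findtext("City", default=""),
            "state": address.findtext("State", default=""),
            "zip": address.findtext("Zip5", default=""),
            "plus4": address.findtext("Zip4", default=""),
            "delivery_point": address.findtext("DeliveryPoint", default=""),
        },
    }
```

The URL would be fetched with httpx. Since it embeds both the user ID and the address, it should never be logged. For parsing responses from an external service, the `defusedxml` package is a hardened alternative to `xml.etree`.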
## Implementation Phases

### Phase 1: Basic Service (1-2 days)

- FastAPI service setup
- Basic ZIP code validation using downloaded USPS data
- Docker containerization
- Simple /validate endpoint

### Phase 2: USPS Integration (1 day)

- USPS API client implementation
- Street-level validation
- Error handling and fallbacks

### Phase 3: Caching & Optimization (1 day)

- Redis integration
- Performance optimization
- Batch validation endpoint

### Phase 4: Data Management (1 day)

- Automated USPS data downloads
- Database update procedures
- Monitoring and alerting

### Phase 5: Integration (0.5 day)

- Update legal app to use address service
- Form validation integration
- Error handling in UI

## Docker Configuration

### Dockerfile

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8001

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]
```

### Docker Compose Addition

```yaml
services:
  address-service:
    build: ./address-service
    # No "ports:" mapping, so the service is not published on the host.
    # Other services on the Compose network reach it at address-service:8001.
    expose:
      - "8001"
    environment:
      - USPS_USER_ID=${USPS_USER_ID}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    volumes:
      - ./address-service/data:/app/data
```

## Configuration

### Environment Variables

- `USPS_USER_ID`: USPS Web Tools user ID
- `REDIS_URL`: Redis connection string
- `ZIP_DB_PATH`: Path to SQLite ZIP database
- `UPDATE_SCHEDULE`: Cron schedule for data updates
- `API_RATE_LIMIT`: USPS API rate limit (default: 5/second)
- `CACHE_TTL`: Cache time-to-live in seconds (default: 2592000 = 30 days)

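Loading these variables once at startup into a typed object makes missing or malformed configuration fail fast. A minimal sketch using a frozen dataclass; the defaults for `ZIP_DB_PATH` and `UPDATE_SCHEDULE` are hypothetical placeholders, while the `REDIS_URL`, `API_RATE_LIMIT`, and `CACHE_TTL` defaults come from this document. Accepting both `5` and `5/second` for `API_RATE_LIMIT` is an assumption about its format.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    usps_user_id: str
    redis_url: str
    zip_db_path: str
    update_schedule: str
    api_rate_limit: float  # USPS requests per second
    cache_ttl: int         # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    raw_rate = env.get("API_RATE_LIMIT", "5")
    return Settings(
        usps_user_id=env["USPS_USER_ID"],  # required: KeyError at startup if missing
        redis_url=env.get("REDIS_URL", "redis://redis:6379"),
        zip_db_path=env.get("ZIP_DB_PATH", "/app/data/zip4.sqlite"),  # hypothetical default
        update_schedule=env.get("UPDATE_SCHEDULE", "0 3 1 * *"),  # hypothetical: 03:00 on the 1st
        api_rate_limit=float(raw_rate.split("/")[0]),  # accepts "5" or "5/second"
        cache_ttl=int(env.get("CACHE_TTL", "2592000")),  # 30 days
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.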
## Monitoring & Metrics

### Key Metrics

- Cache hit ratio
- USPS API usage/limits
- Response times (local vs API)
- Validation success rates
- Database update status

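Most of these metrics can be derived from a few counters, which is also what `GET /stats` would report. A minimal in-process sketch; it assumes the cache is consulted only on the street-level path, so a `"cached"` answer counts as a hit and a `"usps_api"` answer as a miss. `ServiceStats` is a hypothetical name, and counters are per worker process.

```python
from collections import Counter
from typing import Optional


class ServiceStats:
    """In-process counters behind GET /stats (per worker; not shared across processes)."""

    def __init__(self) -> None:
        self.by_source: Counter = Counter()  # answers from "local", "cached", "usps_api"
        self.outcomes: Counter = Counter()   # "valid" / "invalid"
        self.total_ms: Counter = Counter()   # summed response time per source

    def record(self, source: str, valid: bool, elapsed_ms: float) -> None:
        self.by_source[source] += 1
        self.outcomes["valid" if valid else "invalid"] += 1
        self.total_ms[source] += elapsed_ms

    def snapshot(self) -> dict:
        hits, misses = self.by_source["cached"], self.by_source["usps_api"]
        total = sum(self.outcomes.values())
        ratio: Optional[float] = hits / (hits + misses) if hits + misses else None
        return {
            "cache_hit_ratio": ratio,
            "usps_api_calls": misses,
            "validation_success_rate": self.outcomes["valid"] / total if total else None,
            "avg_response_ms": {s: self.total_ms[s] / n for s, n in self.by_source.items()},
        }
```

Four cache hits and one USPS call give a hit ratio of 0.8, exactly the >80% warmup target from the caching strategy.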
### Health Checks

- Service availability
- Database connectivity
- Redis connectivity
- USPS API connectivity
- Disk space for ZIP database

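`GET /health` can run each check independently and report which component failed. A minimal sketch, assuming each check is a zero-argument callable returning a bool (for example, `lambda: conn.execute("SELECT 1").fetchone() == (1,)` for SQLite, or redis-py's `Redis.ping` for Redis); `run_health_checks` and `disk_space_ok` are hypothetical names. Because the service degrades gracefully, a failing check reports `"degraded"` rather than down.

```python
import shutil
from typing import Callable


def run_health_checks(checks: dict[str, Callable[[], bool]]) -> dict:
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:  # a crashing check counts as a failure, not a 500
            results[name] = f"error: {type(exc).__name__}"
    status = "healthy" if all(r == "ok" for r in results.values()) else "degraded"
    return {"status": status, "checks": results}


def disk_space_ok(path: str, min_free_bytes: int) -> bool:
    # Enough headroom to download and unpack the next monthly ZIP+4 update.
    return shutil.disk_usage(path).free >= min_free_bytes
```

A USPS connectivity check should be cheap or cached so that frequent health probes do not consume the daily API quota.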
## Error Handling

### Graceful Degradation

1. USPS API down → Fall back to local ZIP validation only
2. Redis down → Skip caching, direct validation
3. ZIP database corrupt → Use USPS API only
4. All systems down → Return input address with warning

### Error Responses

```json
{
  "valid": false,
  "error": "USPS_API_UNAVAILABLE",
  "message": "Street validation temporarily unavailable",
  "fallback_used": "local_zip_only"
}
```

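The four degradation cases and the error response above fit into one fallback chain. A minimal sketch, assuming every backend signals an outage by raising a hypothetical `BackendUnavailable` exception and that backends are passed in as callables; the `ALL_BACKENDS_UNAVAILABLE` code and the `input` field for case 4 are illustrative, not part of the documented response format.

```python
from typing import Callable, Optional


class BackendUnavailable(Exception):
    """Raised by a backend (ZIP database, Redis, USPS API) that cannot be used."""


def validate_with_fallback(
    address: dict,
    zip_check: Callable[[dict], bool],
    usps_verify: Callable[[dict], dict],
    cache_get: Optional[Callable[[dict], Optional[dict]]] = None,
) -> dict:
    # Case 2: Redis down -> skip the cache and validate directly.
    if cache_get is not None:
        try:
            cached = cache_get(address)
            if cached is not None:
                return cached
        except BackendUnavailable:
            pass

    zip_ok: Optional[bool] = None
    try:
        zip_ok = zip_check(address)
    except BackendUnavailable:
        pass  # Case 3: ZIP database corrupt -> rely on the USPS API only.
    if zip_ok is False:
        return {"valid": False, "source": "local"}

    try:
        return {"valid": True, "source": "usps_api", "standardized": usps_verify(address)}
    except BackendUnavailable:
        if zip_ok:
            # Case 1: USPS API down -> local ZIP validation only.
            return {
                "valid": False,
                "error": "USPS_API_UNAVAILABLE",
                "message": "Street validation temporarily unavailable",
                "fallback_used": "local_zip_only",
            }
        # Case 4: everything down -> echo the input with a warning.
        return {
            "valid": False,
            "error": "ALL_BACKENDS_UNAVAILABLE",  # hypothetical error code
            "message": "Address could not be validated",
            "input": address,
        }
```

Keeping outage detection in one exception type lets each backend client decide what counts as "down" (timeouts, connection errors, integrity-check failures) without the fallback logic needing to know.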
## Testing Strategy

### Unit Tests

- Address normalization functions
- ZIP database queries
- USPS API client
- Caching logic

### Integration Tests

- Full validation workflow
- Error handling scenarios
- Performance benchmarks
- Data update procedures

### Load Testing

- Concurrent validation requests
- Cache performance under load
- USPS API rate limiting behavior

## Security Considerations

### Input Validation

- Sanitize all address inputs
- Prevent SQL injection in ZIP queries
- Validate against malicious payloads

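Input sanitization can reject malformed requests before they reach the ZIP database or the USPS API. A minimal sketch; the 100-character field limit is a hypothetical value, and `sanitize_address` is an illustrative name. SQL injection is not handled here: it is prevented by binding values as query parameters in the ZIP queries.

```python
import re

MAX_FIELD_LENGTH = 100  # hypothetical per-field limit
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
STATE_RE = re.compile(r"[A-Z]{2}")
ZIP_RE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")  # [0-9], not \d, to exclude non-ASCII digits


def sanitize_address(raw: dict) -> dict:
    """Reject malformed input before it reaches the ZIP database or the USPS API."""
    clean = {}
    for field in ("street", "city", "state", "zip"):
        value = raw.get(field, "")
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a string")
        # Replace control characters, then collapse runs of whitespace.
        value = " ".join(CONTROL_CHARS.sub(" ", value).split())
        if len(value) > MAX_FIELD_LENGTH:
            raise ValueError(f"{field} exceeds {MAX_FIELD_LENGTH} characters")
        clean[field] = value
    clean["state"] = clean["state"].upper()
    if not STATE_RE.fullmatch(clean["state"]):
        raise ValueError("state must be a two-letter code")
    if not ZIP_RE.fullmatch(clean["zip"]):
        raise ValueError("zip must be 5 digits or ZIP+4 (12345-6789)")
    return clean
```

In the FastAPI service, the same constraints could be declared on the Pydantic request model so that invalid input is rejected before the handler runs.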
### Network Security

- Internal service communication only
- No direct external access to service
- HTTPS for USPS API calls
- Redis authentication if exposed

### Data Protection

- No persistent logging of addresses
- Secure cache key generation
- Regular security updates for dependencies

## Future Enhancements

### Additional Features

- International address validation (Google/SmartyStreets)
- Address autocomplete suggestions
- Geocoding integration
- Delivery route optimization

### Performance Optimizations

- Database partitioning by state
- Compressed cache storage
- Async batch processing
- CDN for static ZIP data

## Cost Analysis

### Infrastructure Costs

- Additional container resources: ~$10/month
- Redis cache: ~$5/month
- USPS ZIP data storage: Minimal
- USPS API: Free tier (10K requests/day)

### Development Time

- Initial implementation: 3-5 days
- Testing and refinement: 1-2 days
- Documentation and deployment: 0.5 day
- **Total: 4.5-7.5 days**

### ROI

- Improved data quality
- Reduced shipping errors
- Better client communication
- Compliance with data standards
- Foundation for future location-based features