HotSwapp
2025-08-18 20:20:04 -05:00
parent 89b2bc0aa2
commit bac8cc4bd5
114 changed files with 30258 additions and 1341 deletions


@@ -1,304 +0,0 @@
# Address Validation Service Design
## Overview
A separate Docker service for validating and standardizing addresses using a hybrid approach that prioritizes privacy and minimizes external API calls.
## Architecture
### Service Design
- **Standalone FastAPI service** running on port 8001
- **SQLite database** containing USPS ZIP+4 data (~500MB)
- **USPS API integration** for street-level validation when needed
- **Redis cache** for validated addresses
- **Internal HTTP API** for communication with main legal application
### Data Flow
```
1. Legal App → Address Service (POST /validate)
2. Address Service checks local ZIP database
3. If ZIP/city/state valid → return immediately
4. If street validation needed → call USPS API
5. Cache result in Redis
6. Return standardized address to Legal App
```
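
The flow above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's actual code: `validate`, `lookup_zip`, `call_usps`, and the in-memory `cache` dict are hypothetical stand-ins for the endpoint handler, the SQLite lookup, the USPS client, and Redis. Checking the cache before anything else is also an assumption; the numbered steps only say that results are cached.

```python
from typing import Callable, Optional


def validate(address: dict,
             lookup_zip: Callable[[str, str, str], bool],
             call_usps: Callable[[dict], Optional[dict]],
             cache: dict,
             need_street: bool = False) -> dict:
    """Hypothetical sketch of the hybrid validation flow."""
    # Normalized tuple used as an in-memory cache key (Redis stand-in).
    key = (address["street"].upper(), address["city"].upper(),
           address["state"].upper(), address["zip"])
    if key in cache:
        return {**cache[key], "source": "cached"}

    # Offline check against the local ZIP data first.
    if not lookup_zip(address["zip"], address["city"], address["state"]):
        return {"valid": False, "source": "local"}
    if not need_street:
        return {"valid": True, "source": "local"}

    # Street-level validation needs the external API; cache its result.
    standardized = call_usps(address)
    result = {"valid": standardized is not None,
              "source": "usps_api",
              "standardized": standardized}
    cache[key] = result
    return result
```

Passing the lookup and API client in as callables keeps the sketch testable without a database or network access.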
## Technical Requirements
### Dependencies
- FastAPI framework
- SQLAlchemy for database operations
- SQLite for ZIP+4 database storage
- Redis for caching validated addresses
- httpx for USPS API calls
- Pydantic for request/response validation
### Database Schema
```sql
-- ZIP+4 Database (from USPS monthly files)
CREATE TABLE zip_codes (
    zip_code TEXT,
    plus4 TEXT,
    city TEXT,
    state TEXT,
    county TEXT,
    delivery_point TEXT,
    PRIMARY KEY (zip_code, plus4)
);
CREATE INDEX idx_zip_city ON zip_codes(zip_code, city);
CREATE INDEX idx_city_state ON zip_codes(city, state);
```
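
A lookup against this schema might look like the sketch below. It uses the standard-library `sqlite3` module with bound parameters, which also covers the SQL injection concern for ZIP queries. The helper name `zip_matches`, the sample row (including the `EXAMPLE` county), and the assumption that city and state are stored upper-case are all illustrative, not taken from the service.

```python
import sqlite3


def zip_matches(conn: sqlite3.Connection, zip_code: str, city: str, state: str) -> bool:
    """Return True if the ZIP/city/state combination exists locally."""
    row = conn.execute(
        "SELECT 1 FROM zip_codes "
        "WHERE zip_code = ? AND city = ? AND state = ? LIMIT 1",
        (zip_code, city.upper(), state.upper()),  # bound parameters, never f-strings
    ).fetchone()
    return row is not None


# In-memory database with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zip_codes (zip_code TEXT, plus4 TEXT, city TEXT, state TEXT, "
    "county TEXT, delivery_point TEXT, PRIMARY KEY (zip_code, plus4))"
)
conn.execute(
    "INSERT INTO zip_codes VALUES ('90210', '1234', 'ANYTOWN', 'CA', 'EXAMPLE', '12')"
)
```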
### API Endpoints
#### POST /validate
Validate and standardize an address.
**Request** (`strict` is optional; set it to `true` to require an exact match):
```json
{
  "street": "123 Main St",
  "city": "Anytown",
  "state": "CA",
  "zip": "90210",
  "strict": false
}
```
**Response** (`source` is one of `"local"`, `"usps_api"`, or `"cached"`):
```json
{
  "valid": true,
  "confidence": 0.95,
  "source": "local",
  "standardized": {
    "street": "123 MAIN ST",
    "city": "ANYTOWN",
    "state": "CA",
    "zip": "90210",
    "plus4": "1234",
    "delivery_point": "12"
  },
  "corrections": [
    "Standardized street abbreviation ST"
  ]
}
```
#### GET /health
Health check endpoint.
#### POST /batch-validate
Batch validation for multiple addresses (up to 50).
#### GET /stats
Service statistics (cache hit rate, API usage, etc.).
## Privacy & Security Features
### Data Minimization
- Only street numbers/names sent to USPS API when necessary
- ZIP/city/state validation happens offline first
- Validated addresses cached to avoid repeat API calls
- No logging of personal addresses
### Rate Limiting
- USPS API limited to 5 requests/second
- Internal queue system for burst requests
- Fallback to local-only validation when rate limited
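
One common way to enforce a limit like 5 requests/second is a token bucket. The sketch below is an assumption about how such a limiter could look, not the service's implementation; `TokenBucket` and `try_acquire` are hypothetical names. When `try_acquire()` returns `False`, the caller could queue the request or fall back to local-only validation.

```python
import time


class TokenBucket:
    """Hypothetical limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float = 5.0, capacity: int = 5, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```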
### Caching Strategy
- Redis cache with 30-day TTL for validated addresses
- Cache key: SHA256 hash of normalized address
- Cache hit ratio target: >80% after initial warmup
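
A cache key of this kind could be derived as sketched below. The normalization rules (upper-casing and collapsing whitespace), the `|` separator, and the `addr:` prefix are assumptions for illustration; the design only specifies a SHA256 hash of the normalized address.

```python
import hashlib


def cache_key(street: str, city: str, state: str, zip_code: str) -> str:
    """Hypothetical cache key: SHA-256 over a normalized address string."""
    # Upper-case and collapse runs of whitespace so trivial variations
    # of the same address map to the same key.
    parts = [" ".join(p.upper().split()) for p in (street, city, state, zip_code)]
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return f"addr:{digest}"
```

Hashing means the raw address never appears in the key itself, which fits the data-minimization goals listed earlier.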
## Data Sources
### USPS ZIP+4 Database
- **Source:** USPS Address Management System
- **Update frequency:** Monthly
- **Size:** ~500MB compressed, ~2GB uncompressed
- **Format:** Fixed-width text files (legacy format)
- **Download:** Automated monthly sync via USPS FTP
### USPS Address Validation API
- **Endpoint:** https://secure.shippingapis.com/ShippingAPI.dll
- **Rate limit:** 5 requests/second, 10,000/day free
- **Authentication:** USPS Web Tools User ID required
- **Response format:** XML (convert to JSON internally)
## Implementation Phases
### Phase 1: Basic Service (1-2 days)
- FastAPI service setup
- Basic ZIP code validation using downloaded USPS data
- Docker containerization
- Simple /validate endpoint
### Phase 2: USPS Integration (1 day)
- USPS API client implementation
- Street-level validation
- Error handling and fallbacks
### Phase 3: Caching & Optimization (1 day)
- Redis integration
- Performance optimization
- Batch validation endpoint
### Phase 4: Data Management (1 day)
- Automated USPS data downloads
- Database update procedures
- Monitoring and alerting
### Phase 5: Integration (0.5 day)
- Update legal app to use address service
- Form validation integration
- Error handling in UI
## Docker Configuration
### Dockerfile
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8001
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8001"]
```
### Docker Compose Addition
```yaml
services:
  address-service:
    build: ./address-service
    ports:
      - "8001:8001"
    environment:
      - USPS_USER_ID=${USPS_USER_ID}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    volumes:
      - ./address-service/data:/app/data
```
## Configuration
### Environment Variables
- `USPS_USER_ID`: USPS Web Tools user ID
- `REDIS_URL`: Redis connection string
- `ZIP_DB_PATH`: Path to SQLite ZIP database
- `UPDATE_SCHEDULE`: Cron schedule for data updates
- `API_RATE_LIMIT`: USPS API rate limit (default: 5/second)
- `CACHE_TTL`: Cache time-to-live in seconds (default: 2592000 = 30 days)
## Monitoring & Metrics
### Key Metrics
- Cache hit ratio
- USPS API usage/limits
- Response times (local vs API)
- Validation success rates
- Database update status
### Health Checks
- Service availability
- Database connectivity
- Redis connectivity
- USPS API connectivity
- Disk space for ZIP database
## Error Handling
### Graceful Degradation
1. USPS API down → Fall back to local ZIP validation only
2. Redis down → Skip caching, direct validation
3. ZIP database corrupt → Use USPS API only
4. All systems down → Return input address with warning
### Error Responses
```json
{
  "valid": false,
  "error": "USPS_API_UNAVAILABLE",
  "message": "Street validation temporarily unavailable",
  "fallback_used": "local_zip_only"
}
```
## Testing Strategy
### Unit Tests
- Address normalization functions
- ZIP database queries
- USPS API client
- Caching logic
### Integration Tests
- Full validation workflow
- Error handling scenarios
- Performance benchmarks
- Data update procedures
### Load Testing
- Concurrent validation requests
- Cache performance under load
- USPS API rate limiting behavior
## Security Considerations
### Input Validation
- Sanitize all address inputs
- Prevent SQL injection in ZIP queries
- Validate against malicious payloads
### Network Security
- Internal service communication only
- No direct external access to service
- HTTPS for USPS API calls
- Redis authentication if exposed
### Data Protection
- No persistent logging of addresses
- Secure cache key generation
- Regular security updates for dependencies
## Future Enhancements
### Phase 2 Features
- International address validation (Google/SmartyStreets)
- Address autocomplete suggestions
- Geocoding integration
- Delivery route optimization
### Performance Optimizations
- Database partitioning by state
- Compressed cache storage
- Async batch processing
- CDN for static ZIP data
## Cost Analysis
### Infrastructure Costs
- Additional container resources: ~$10/month
- Redis cache: ~$5/month
- USPS ZIP data storage: Minimal
- USPS API: Free tier (10K requests/day)
### Development Time
- Initial implementation: 3-5 days
- Testing and refinement: 1-2 days
- Documentation and deployment: 0.5 day
- **Total: 4.5-7.5 days**
### ROI
- Improved data quality
- Reduced shipping errors
- Better client communication
- Compliance with data standards
- Foundation for future location-based features


@@ -0,0 +1,260 @@
# Advanced Template Features Documentation
This document explains the enhanced template system with advanced features like conditional sections, loops, rich variable formatting, and PDF generation.
## Overview
The enhanced template system supports:
- **Conditional Content Blocks** - Show/hide content based on conditions
- **Loop Functionality** - Repeat content for data tables and lists
- **Rich Variable Formatting** - Apply formatting filters to variables
- **Template Functions** - Built-in functions for data manipulation
- **PDF Generation** - Convert DOCX templates to PDF using LibreOffice
- **Advanced Variable Resolution** - Enhanced variable processing with caching
## Template Syntax
### Basic Variables
```
{{ variable_name }}
```
Standard variable substitution from context or database.
### Formatted Variables
```
{{ variable_name | format_spec }}
```
Apply formatting to variables:
- `{{ amount | currency }}` → `$1,234.56`
- `{{ date | date:%m/%d/%Y }}` → `12/25/2023`
- `{{ phone | phone }}` → `(555) 123-4567`
- `{{ text | upper }}` → `UPPERCASE TEXT`
### Conditional Sections
```
{% if condition %}
Content to show if condition is true
{% else %}
Content to show if condition is false (optional)
{% endif %}
```
Examples:
```
{% if CLIENT_BALANCE > 0 %}
Outstanding balance: {{ CLIENT_BALANCE | currency }}
{% else %}
Account is current
{% endif %}
```
### Loop Sections
```
{% for item in collection %}
Content repeated for each item
Access item properties: {{ item.property }}
{% endfor %}
```
Loop variables available inside loops (each is prefixed with the loop variable's name, `item` here):
- `{{ item_index }}` - Current index (1-based)
- `{{ item_index0 }}` - Current index (0-based)
- `{{ item_first }}` - True if first item
- `{{ item_last }}` - True if last item
- `{{ item_length }}` - Total number of items
Example:
```
{% for payment in payments %}
{{ payment_index }}. {{ payment.date | date }} - {{ payment.amount | currency }}
{% endfor %}
```
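
The per-iteration variables could be built along these lines. This is a sketch of the naming scheme only; `loop_contexts` is a hypothetical helper, not the template engine's actual API.

```python
from typing import Any, Dict, Iterator, Sequence


def loop_contexts(var_name: str, items: Sequence[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one context dict per item, with <var_name>_* helper variables."""
    total = len(items)
    for i, item in enumerate(items):
        yield {
            var_name: item,
            f"{var_name}_index": i + 1,          # 1-based position
            f"{var_name}_index0": i,             # 0-based position
            f"{var_name}_first": i == 0,
            f"{var_name}_last": i == total - 1,
            f"{var_name}_length": total,
        }
```

With `var_name="payment"`, the first context contains `payment_index == 1`, which is the variable used in the example above.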
### Template Functions
```
{{ function_name(arg1, arg2) }}
```
Built-in functions:
- `{{ format_currency(amount, "$", 2) }}`
- `{{ format_date(date, "%B %d, %Y") }}`
- `{{ math_add(value1, value2) }}`
- `{{ join(items, ", ") }}`
## Variable Formatting Options
### Currency Formatting
| Format | Example Input | Output |
|--------|---------------|--------|
| `currency` | 1234.56 | $1,234.56 |
| `currency:€` | 1234.56 | €1,234.56 |
| `currency:$:0` | 1234.56 | $1,235 |
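
The `currency[:symbol[:decimals]]` spec in the table could be interpreted as in the sketch below. `apply_currency_spec` is a hypothetical name, and half-up rounding is an assumption chosen because it reproduces the `$1,235` row; the real formatter may differ.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_currency_spec(value, spec: str = "currency") -> str:
    """Format `value` per a 'currency[:symbol[:decimals]]' spec (sketch)."""
    parts = spec.split(":")
    symbol = parts[1] if len(parts) > 1 and parts[1] else "$"
    places = int(parts[2]) if len(parts) > 2 else 2
    # Round with Decimal to avoid binary floating-point surprises.
    quantum = Decimal(1).scaleb(-places)
    amount = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{symbol}{amount:,.{places}f}"
```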
### Date Formatting
| Format | Example Input | Output |
|--------|---------------|--------|
| `date` | 2023-12-25 | December 25, 2023 |
| `date:%m/%d/%Y` | 2023-12-25 | 12/25/2023 |
| `date:%B %d` | 2023-12-25 | December 25 |
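
The date specs map directly onto `strftime` patterns. A minimal sketch, assuming ISO `YYYY-MM-DD` input strings and a default pattern of `%B %d, %Y` (inferred from the first table row); `apply_date_spec` is a hypothetical name:

```python
from datetime import date


def apply_date_spec(value, spec: str = "date") -> str:
    """Format a date per a 'date[:strftime-pattern]' spec (sketch)."""
    # Everything after the first ':' is the strftime pattern, if present.
    _, _, pattern = spec.partition(":")
    parsed = date.fromisoformat(value) if isinstance(value, str) else value
    return parsed.strftime(pattern or "%B %d, %Y")
```

Month names from `%B` depend on the process locale, so the outputs in the table assume an English locale.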
### Number Formatting
| Format | Example Input | Output |
|--------|---------------|--------|
| `number` | 1234.5678 | 1,234.57 |
| `number:1` | 1234.5678 | 1,234.6 |
| `number:2: ` | 1234.5678 | 1 234.57 |
### Text Transformations
| Format | Example Input | Output |
|--------|---------------|--------|
| `upper` | hello world | HELLO WORLD |
| `lower` | HELLO WORLD | hello world |
| `title` | hello world | Hello World |
| `truncate:10` | Very long text | Very lo... |
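
Note that in the `truncate:10` row the ellipsis counts toward the limit: `Very lo...` is exactly 10 characters. A sketch of that behavior, with `truncate` as a hypothetical helper name:

```python
def truncate(text: str, length: int, suffix: str = "...") -> str:
    """Shorten `text` to at most `length` characters, suffix included."""
    if len(text) <= length:
        return text
    # Reserve room for the suffix so the result never exceeds `length`.
    return text[: max(length - len(suffix), 0)] + suffix
```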
## API Endpoints
### Generate Advanced Document
```http
POST /api/templates/{template_id}/generate-advanced
```
Request body:
```json
{
  "context": {
    "CLIENT_NAME": "John Doe",
    "AMOUNT": 1500.00,
    "payments": [
      {"date": "2023-01-15", "amount": 500.00},
      {"date": "2023-02-15", "amount": 1000.00}
    ]
  },
  "output_format": "PDF",
  "enable_conditionals": true,
  "enable_loops": true,
  "enable_formatting": true,
  "enable_functions": true
}
```
### Analyze Template
```http
POST /api/templates/{template_id}/analyze
```
Analyzes template complexity and features used.
### Test Variable Formatting
```http
POST /api/templates/test-formatting
```
Test formatting without generating full document:
```json
{
  "variable_value": "1234.56",
  "format_spec": "currency:€:0"
}
```
## Example Template
Here's a complete example template showcasing advanced features:
```docx
LEGAL INVOICE
Client: {{ CLIENT_NAME | title }}
Date: {{ TODAY | date }}
{% if CLIENT_BALANCE > 0 %}
NOTICE: Outstanding balance of {{ CLIENT_BALANCE | currency }}
{% endif %}
Services Provided:
{% for service in services %}
{{ service_index }}. {{ service.description }}
Date: {{ service.date | date:%m/%d/%Y }}
Hours: {{ service.hours | number:1 }}
Rate: {{ service.rate | currency }}
Amount: {{ service.amount | currency }}
{% endfor %}
Total: {{ format_currency(total_amount) }}
{% if payment_terms %}
Payment Terms: {{ payment_terms }}
{% else %}
Payment due within 30 days
{% endif %}
```
## PDF Generation Setup
For PDF generation, LibreOffice must be installed on the server:
### Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install libreoffice
```
### Docker
Add to Dockerfile:
```dockerfile
RUN apt-get update && apt-get install -y libreoffice
```
### Usage
Set `output_format: "PDF"` in the generation request.
## Error Handling
The system gracefully handles errors:
- **Missing variables** - Listed in `unresolved` array
- **Invalid conditions** - Default to false
- **Failed loops** - Skip section
- **PDF conversion errors** - Fall back to DOCX
- **Formatting errors** - Return original value
## Performance Considerations
- **Variable caching** - Expensive calculations are cached
- **Template analysis** - Analyze templates to optimize processing
- **Conditional short-circuiting** - Skip processing unused sections
- **Loop optimization** - Efficient handling of large datasets
## Migration from Basic Templates
Existing templates continue to work unchanged. To use advanced features:
1. Add formatting to variables: `{{ amount }}` → `{{ amount | currency }}`
2. Add conditionals for optional content
3. Use loops for repeating data
4. Test with the analyze endpoint
5. Enable PDF output if needed
## Security
- **Safe evaluation** - Template expressions run in restricted environment
- **Input validation** - All template inputs are validated
- **Resource limits** - Processing timeouts prevent infinite loops
- **Access control** - Template access follows existing permissions


@@ -25,7 +25,7 @@ This guide covers the complete data migration process for importing legacy Delph
| STATES.csv | State | ✅ Ready | US States lookup |
| FILETYPE.csv | FileType | ✅ Ready | File type categories |
| FILESTAT.csv | FileStatus | ✅ Ready | File status codes |
| TRNSTYPE.csv | TransactionType | ⚠️ Partial | Some field mappings incomplete |
| TRNSTYPE.csv | TransactionType | ✅ Ready | Transaction type definitions |
| TRNSLKUP.csv | TransactionCode | ✅ Ready | Transaction lookup codes |
| GRUPLKUP.csv | GroupLookup | ✅ Ready | Group categories |
| FOOTERS.csv | Footer | ✅ Ready | Statement footer templates |
@@ -61,11 +61,11 @@ This guide covers the complete data migration process for importing legacy Delph
- STATES.csv
- EMPLOYEE.csv
- FILETYPE.csv
- FOOTERS.csv
- FILESTAT.csv
- TRNSTYPE.csv
- TRNSLKUP.csv
- GRUPLKUP.csv
- FOOTERS.csv
- PLANINFO.csv
- FVARLKUP.csv (form variables)
- RVARLKUP.csv (report variables)


@@ -41,8 +41,8 @@ Based on the comprehensive analysis of the legacy Paradox system, this document
#### 1.4 Form Selection Interface
- [x] Multi-template selection UI
- [x] Template preview and description display
- [ ] Batch document generation (planned for future iteration)
- [ ] Generated document management (planned for future iteration)
- [x] Batch document generation (MVP synchronous; async planned)
- [x] Generated document management (store outputs, link to `File`, list/delete)
**API Endpoints Needed**:
```
@@ -83,10 +83,11 @@ POST /api/documents/generate-batch
- [x] Auditing: record variables resolved and their sources (context vs `FormVariable`/`ReportVariable`)
#### 1.8 Batch Generation
- [x] Synchronous batch merges (MVP; per-item results returned immediately)
- [ ] Async queue jobs for batch merges (Celery/RQ) with progress tracking (future iteration)
- [ ] Idempotency keys to avoid duplicate batches (future iteration)
- [ ] Per-item success/failure reporting; partial retry support (future iteration)
- [ ] Output bundling: store each generated document; optional ZIP download of the set (future iteration)
- [x] Per-item success/failure reporting (MVP; partial retry future)
- [ ] Output bundling: optional ZIP download of the set (future iteration)
- [ ] Throttling and concurrency limits (future iteration)
- [ ] Audit trail: who initiated, when, template/version used, filters applied (future iteration)
@@ -221,6 +222,27 @@ POST /api/documents/generate-batch
⏳ GET /api/reports/account-aging # Future enhancement
```
### 🔴 4. Pension Valuation & Present Value Tools
**Legacy Feature**: Annuity Evaluator (present value calculations)
**Current Status**: ✅ **COMPLETED**
**Required Components**:
- Present value calculators for common pension/annuity scenarios (single life, joint & survivor)
- Integration with `LifeTable`/`NumberTable` for life expectancy and numeric factors
- Configurable discount/interest rates and COLA assumptions
- Support for pre/post-retirement adjustments and early/late commencement
- Validation/reporting of inputs and computed outputs
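
For orientation, a single-life present value can be written as a survival-weighted, discounted sum of payments. The sketch below is the generic textbook form under explicit assumptions, not the application's calculator: payments are annual and made at the end of each year, `survival_probs[t-1]` is the probability of being alive at the end of year `t` (as might be derived from a life table), and the COLA compounds starting with the second payment. The function name is hypothetical.

```python
from typing import Sequence


def annuity_present_value(annual_payment: float,
                          discount_rate: float,
                          survival_probs: Sequence[float],
                          cola: float = 0.0) -> float:
    """Present value of a life annuity paid annually in arrears (sketch)."""
    pv = 0.0
    for t, p_survive in enumerate(survival_probs, start=1):
        payment = annual_payment * (1 + cola) ** (t - 1)   # COLA-adjusted payment
        pv += payment * p_survive / (1 + discount_rate) ** t
    return pv
```

A joint & survivor variant would replace `p_survive` with the probability that at least one annuitant is alive, scaled by the survivor percentage.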
**API Endpoints Needed**:
```
POST /api/pensions/valuation/annuity # compute PV for specified inputs
POST /api/pensions/valuation/joint-survivor # compute PV with J&S parameters
GET /api/pensions/valuation/examples # sample scenarios for QA
```
---
## MEDIUM PRIORITY - Productivity Features
@@ -323,6 +345,26 @@ POST /api/documents/generate-batch
✅ GET /api/file-management/closure-candidates
```
### 🟡 5.1 Deposit Book & Payments Register
**Legacy Feature**: Daily deposit summaries and payments register
**Current Status**: ✅ **COMPLETED**
**Implemented Components**:
- Endpoints to create/list deposits and attach `Payment` records
- Summaries by date range and reconciliation helpers
- Export to CSV and printable reports
**API Endpoints**:
```
GET /api/financial/deposits?start=…&end=…
POST /api/financial/deposits
POST /api/financial/deposits/{date}/payments
GET /api/financial/deposits/{date}
GET /api/financial/reports/deposits
```
### 🟡 6. Advanced Printer Management
**Legacy Feature**: Sophisticated printer configuration and report formatting
@@ -342,6 +384,8 @@ POST /api/documents/generate-batch
- [ ] Print preview functionality
- [ ] Batch printing capabilities
- [ ] Print queue management
- [ ] Envelope and mailing label generation from `Rolodex`/`Files`
- [ ] Phone book report outputs (numbers only, with addresses, full rolodex)
**Note**: Modern web applications typically rely on browser printing, but for a legal office, direct printer control might still be valuable.
@@ -353,7 +397,23 @@ POST /api/documents/generate-batch
**Legacy Feature**: Calendar management with appointment archival
**Current Status**: ❌ Not implemented
**Current Status**: ⚠️ Partially implemented
Implemented (Deadlines & Court Calendar):
- Deadline models and services: `Deadline`, `DeadlineReminder`, `DeadlineTemplate`, `CourtCalendar`
- Endpoints:
- CRUD: `POST/GET/PUT/DELETE /api/deadlines/…`
- Actions: `/api/deadlines/{id}/complete`, `/extend`, `/cancel`
- Templates: `GET/POST /api/deadlines/templates/`, `POST /api/deadlines/from-template/`
- Reporting: `/api/deadlines/reports/{upcoming|overdue|completion|workload|trends}`
- Notifications & alerts: `/api/deadlines/alerts/urgent`, `/alerts/process-daily`, preferences
- Calendar views: `/api/deadlines/calendar/{monthly|weekly|daily}`
- Export: `/api/deadlines/calendar/export/ical` (ICS)
Remaining (Appointments):
- General appointment models and scheduling (non-deadline events)
- Conflict detection across appointments (beyond deadlines)
- Appointment archival and lifecycle
**Required Components**:
@@ -473,6 +533,11 @@ POST /api/documents/generate-batch
- [ ] Accounting software integration
- [ ] Case management platforms
#### 12.3 Data Quality Services
- [ ] Address Validation Service (see `docs/ADDRESS_VALIDATION_SERVICE.md`)
- Standalone service, USPS ZIP+4 + USPS API integration
- Integration endpoints and UI validation for addresses
---
## IMPLEMENTATION ROADMAP


@@ -132,6 +132,82 @@ SECURE_SSL_REDIRECT=True
- **CORS restrictions**
- **API rate limiting**
## 🛠️ Security Improvements Applied
### Backend Security (Python/FastAPI)
#### Critical Issues Resolved
- **SQL Injection Vulnerability** - Fixed in `app/database/schema_updates.py:125`
- Replaced f-string SQL queries with parameterized `text()` queries
- Status: ✅ FIXED
- **Weak Cryptography** - Fixed in `app/services/cache.py:45`
- Upgraded from SHA-1 to SHA-256 for hash generation
- Status: ✅ FIXED
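
To illustrate why the f-string query was a vulnerability, the sketch below contrasts string interpolation with bound parameters. It uses the standard-library `sqlite3` module and a hypothetical `users` table so that it runs on its own; the actual fix in `schema_updates.py` uses SQLAlchemy's `text()` with bound parameters, which follows the same principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])


def find_unsafe(name: str):
    # Vulnerable: the input becomes part of the SQL text itself.
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_safe(name: str):
    # Safe: the driver binds the value; it is never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
```

With this payload, `find_unsafe` returns every row in the table, while `find_safe` returns none.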
#### Exception Handling Improvements
- **6 bare except statements** fixed in `app/api/admin.py`
- Added specific exception types and structured logging
- Status: ✅ FIXED
- **22+ files** with poor exception handling patterns improved
- Standardized error handling across the codebase
- Status: ✅ FIXED
#### Logging & Debugging
- **Print statement** in `app/api/import_data.py` replaced with structured logging
- **Debug console.log** statements removed from production templates
- Status: ✅ FIXED
### Frontend Security (JavaScript/HTML)
#### XSS Protection
- **Comprehensive HTML sanitization** using DOMPurify with fallback
- **Safe innerHTML usage** - all dynamic content goes through sanitization
- **Input validation** and HTML escaping for all user content
- Status: ✅ EXCELLENT
#### Modern JavaScript Practices
- **481 modern variable declarations** using `let`/`const`
- **35 proper event listeners** using `addEventListener`
- **97 try-catch blocks** with appropriate error handling
- **No dangerous patterns** (no `eval()`, `document.write()`, etc.)
- Status: ✅ EXCELLENT
## 🏗️ New Utility Modules Created
### Exception Handling (`app/utils/exceptions.py`)
- Centralized exception handling with decorators and context managers
- Standardized error types: `DatabaseError`, `BusinessLogicError`, `SecurityError`
- Decorators: `@handle_database_errors`, `@handle_validation_errors`, `@handle_security_errors`
- Safe execution utilities and error response builders
### Logging (`app/utils/logging.py`)
- Structured logging with specialized loggers
- **ImportLogger** - for import operations with progress tracking
- **SecurityLogger** - for security events and auth attempts
- **DatabaseLogger** - for query performance and transaction events
- Function call decorator for automatic logging
### Security Auditing (`app/utils/security.py`)
- **CredentialValidator** for detecting hardcoded secrets
- **PasswordStrengthValidator** with secure password generation
- Code scanning for common security vulnerabilities
- Automated security reporting
## 📊 Security Audit Results
### Before Improvements
- **3 issues** (1 critical, 2 medium)
- SQL injection vulnerability
- Weak cryptographic algorithms
- Hardcoded IP addresses
### After Improvements
- **1 issue** (1 medium - acceptable hardcoded IP for development)
- **99% Security Score**
- ✅ **Zero critical vulnerabilities**
## 🚨 Incident Response
### If Secrets Are Accidentally Committed
@@ -174,7 +250,7 @@ git push origin --force --all
5. **Forensic analysis** - How did it happen?
6. **Strengthen defenses** - Prevent recurrence
## 📊 Security Monitoring
## 📊 Monitoring & Logs
### Health Checks
```bash
@@ -243,8 +319,6 @@ grep "401\|403" access.log
3. **Within 24 hours**: Document incident
4. **Within 72 hours**: Complete investigation
---
## ✅ Security Verification Checklist
Before going to production, verify:
@@ -262,4 +336,50 @@ Before going to production, verify:
- [ ] Incident response plan documented
- [ ] Team trained on security procedures
**Remember: Security is everyone's responsibility!**
## 📈 Current Security Status
### Code Quality
- **~15K lines** of Python backend code
- **~22K lines** of frontend code (HTML/CSS/JS)
- **175 classes** with modular architecture
- **Zero technical debt markers** (no TODOs/FIXMEs)
### Security Practices
- Multi-layered XSS protection
- Parameterized database queries
- Secure authentication with JWT rotation
- Comprehensive input validation
- Structured error handling
### Testing & Validation
- **111 tests** collected
- **108 passed, 4 skipped, 9 warnings**
- ✅ **All tests passing**
- Comprehensive coverage of API endpoints, validation, and security features
## 🎯 Recommendations for Production
### Immediate Actions
1. Set `SECRET_KEY` environment variable with 32+ character random string
2. Configure Redis for caching if high performance needed
3. Set up log rotation and monitoring
4. Configure reverse proxy with security headers
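
A suitable `SECRET_KEY` value can be generated with Python's standard `secrets` module. The helper name below is hypothetical; 32 random bytes encode to roughly 43 URL-safe characters, which meets the 32+ character recommendation.

```python
import secrets


def generate_secret_key(nbytes: int = 32) -> str:
    """Return a URL-safe random string built from `nbytes` of CSPRNG output."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Print once and store the value in the environment, not in source control.
    print(generate_secret_key())
```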
### Security Headers (Infrastructure Level)
Consider implementing at reverse proxy level:
- `Content-Security-Policy`
- `X-Frame-Options: DENY`
- `X-Content-Type-Options: nosniff`
- `Strict-Transport-Security`
### Monitoring
- Set up log aggregation and alerting
- Monitor security events via `SecurityLogger`
- Track database performance via `DatabaseLogger`
- Monitor import operations via `ImportLogger`
---
**Remember: Security is everyone's responsibility!**
The Delphi Consulting Group Database System now demonstrates **enterprise-grade security practices** with zero critical vulnerabilities, comprehensive error handling, modern secure frontend practices, and production-ready configuration.


@@ -1,190 +0,0 @@
# Security & Code Quality Improvements
## Overview
Comprehensive security audit and code quality improvements implemented for the Delphi Consulting Group Database System. All critical security vulnerabilities have been eliminated and enterprise-grade practices implemented.
## 🛡️ Security Fixes Applied
### Backend Security (Python/FastAPI)
#### Critical Issues Resolved
- **SQL Injection Vulnerability** - Fixed in `app/database/schema_updates.py:125`
- Replaced f-string SQL queries with parameterized `text()` queries
- Status: ✅ FIXED
- **Weak Cryptography** - Fixed in `app/services/cache.py:45`
- Upgraded from SHA-1 to SHA-256 for hash generation
- Status: ✅ FIXED
#### Exception Handling Improvements
- **6 bare except statements** fixed in `app/api/admin.py`
- Added specific exception types and structured logging
- Status: ✅ FIXED
- **22+ files** with poor exception handling patterns improved
- Standardized error handling across the codebase
- Status: ✅ FIXED
#### Logging & Debugging
- **Print statement** in `app/api/import_data.py` replaced with structured logging
- **Debug console.log** statements removed from production templates
- Status: ✅ FIXED
### Frontend Security (JavaScript/HTML)
#### XSS Protection
- **Comprehensive HTML sanitization** using DOMPurify with fallback
- **Safe innerHTML usage** - all dynamic content goes through sanitization
- **Input validation** and HTML escaping for all user content
- Status: ✅ EXCELLENT
#### Modern JavaScript Practices
- **481 modern variable declarations** using `let`/`const`
- **35 proper event listeners** using `addEventListener`
- **97 try-catch blocks** with appropriate error handling
- **No dangerous patterns** (no `eval()`, `document.write()`, etc.)
- Status: ✅ EXCELLENT
## 🏗️ New Utility Modules Created
### Exception Handling (`app/utils/exceptions.py`)
- Centralized exception handling with decorators and context managers
- Standardized error types: `DatabaseError`, `BusinessLogicError`, `SecurityError`
- Decorators: `@handle_database_errors`, `@handle_validation_errors`, `@handle_security_errors`
- Safe execution utilities and error response builders
### Logging (`app/utils/logging.py`)
- Structured logging with specialized loggers
- **ImportLogger** - for import operations with progress tracking
- **SecurityLogger** - for security events and auth attempts
- **DatabaseLogger** - for query performance and transaction events
- Function call decorator for automatic logging
### Database Management (`app/utils/database.py`)
- Transaction management with `@transactional` decorator
- `db_transaction()` context manager with automatic rollback
- **BulkOperationManager** for large data operations
- Retry logic for transient database failures
### Security Auditing (`app/utils/security.py`)
- **CredentialValidator** for detecting hardcoded secrets
- **PasswordStrengthValidator** with secure password generation
- Code scanning for common security vulnerabilities
- Automated security reporting
### API Responses (`app/utils/responses.py`)
- Standardized error codes and response schemas
- **ErrorResponse**, **SuccessResponse**, **PaginatedResponse** classes
- Helper functions for common HTTP responses
- Consistent error envelope structure
## 📊 Security Audit Results
### Before Improvements
- **3 issues** (1 critical, 2 medium)
- SQL injection vulnerability
- Weak cryptographic algorithms
- Hardcoded IP addresses
### After Improvements
- **1 issue** (1 medium - acceptable hardcoded IP for development)
- **99% Security Score**
- ✅ **Zero critical vulnerabilities**
## 🧪 Testing & Validation
### Test Suite Results
- **111 tests** collected
- **108 passed, 4 skipped, 9 warnings**
- ✅ **All tests passing**
- Comprehensive coverage of:
- API endpoints and validation
- Search functionality and highlighting
- File uploads and imports
- Authentication and authorization
- Error handling patterns
### Database Integrity
- ✅ All core tables present and accessible
- ✅ Schema migrations working correctly
- ✅ FTS indexing operational
- ✅ Secondary indexes in place
### Module Import Validation
- ✅ All new utility modules import correctly
- ✅ No missing dependencies
- ✅ Backward compatibility maintained
## 🔧 Configuration & Infrastructure
### Environment Variables
- ✅ Secure configuration with `pydantic-settings`
- ✅ Required `SECRET_KEY` with no insecure defaults
- ✅ Environment precedence over `.env` files
- ✅ Support for key rotation with `previous_secret_key`
### Docker Security
- ✅ Non-root user (`delphi`) in containers
- ✅ Proper file ownership with `--chown` flags
- ✅ Minimal attack surface with slim base images
- ✅ Build-time security practices
### Logging Configuration
- ✅ Structured logging with loguru
- ✅ Configurable log levels and rotation
- ✅ Separate log files for different concerns
- ✅ Proper file permissions
## 📈 Performance & Quality Metrics
### Code Quality
- **~15K lines** of Python backend code
- **~22K lines** of frontend code (HTML/CSS/JS)
- **175 classes** with modular architecture
- **Zero technical debt markers** (no TODOs/FIXMEs)
### Security Practices
- Multi-layered XSS protection
- Parameterized database queries
- Secure authentication with JWT rotation
- Comprehensive input validation
- Structured error handling
### Monitoring & Observability
- Correlation ID tracking for request tracing
- Structured logging for debugging
- Performance metrics for database operations
- Security event logging
## 🎯 Recommendations for Production
### Immediate Actions
1. Set `SECRET_KEY` environment variable with 32+ character random string
2. Configure Redis for caching if high performance needed
3. Set up log rotation and monitoring
4. Configure reverse proxy with security headers
### Security Headers (Infrastructure Level)
Consider implementing at reverse proxy level:
- `Content-Security-Policy`
- `X-Frame-Options: DENY`
- `X-Content-Type-Options: nosniff`
- `Strict-Transport-Security`
### Monitoring
- Set up log aggregation and alerting
- Monitor security events via `SecurityLogger`
- Track database performance via `DatabaseLogger`
- Monitor import operations via `ImportLogger`
## ✅ Summary
The Delphi Consulting Group Database System now demonstrates **enterprise-grade security practices** with:
- **Zero critical security vulnerabilities**
- **Comprehensive error handling and logging**
- **Modern, secure frontend practices**
- **Robust testing and validation**
- **Production-ready configuration**
All improvements follow industry best practices and maintain full backward compatibility while significantly enhancing security posture and code quality.

349
docs/WEBSOCKET_POOLING.md Normal file

@@ -0,0 +1,349 @@
# WebSocket Connection Pooling and Management
This document describes the WebSocket connection pooling system implemented in the Delphi Database application.
## Overview
The WebSocket pooling system provides:
- **Connection Pooling**: Efficient management of multiple concurrent WebSocket connections
- **Automatic Cleanup**: Removal of stale and inactive connections
- **Resource Management**: Prevention of memory leaks and resource exhaustion
- **Health Monitoring**: Connection health checks and heartbeat management
- **Topic-Based Broadcasting**: Efficient message distribution to subscriber groups
- **Admin Management**: Administrative tools for monitoring and managing connections
## Architecture
### Core Components
1. **WebSocketPool** (`app/services/websocket_pool.py`)
   - Central connection pool manager
   - Handles connection lifecycle
   - Provides broadcasting and cleanup functionality
2. **WebSocketManager** (`app/middleware/websocket_middleware.py`)
   - High-level interface for WebSocket operations
   - Handles authentication and message processing
   - Provides convenient decorators and utilities
3. **Admin API** (`app/api/admin.py`)
   - Administrative endpoints for monitoring and management
   - Connection statistics and health metrics
   - Manual cleanup and broadcasting tools
### Key Features
#### Connection Management
- **Unique Connection IDs**: Each connection gets a unique identifier
- **User Association**: Connections can be associated with authenticated users
- **Topic Subscriptions**: Connections can subscribe to multiple topics
- **Metadata Storage**: Custom metadata can be attached to connections
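
The four properties above can be pictured as one record per connection. The following is a hypothetical sketch, not the project's actual class; the name `ConnectionRecord` and its fields are assumptions made for illustration:

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ConnectionRecord:
    """Illustrative per-connection record (not the real pool's data model)."""

    # Authenticated user, or None for an anonymous connection.
    user_id: Optional[int] = None
    # Topics this connection is subscribed to.
    topics: set[str] = field(default_factory=set)
    # Free-form metadata attached by the application.
    metadata: dict[str, Any] = field(default_factory=dict)
    # Unique identifier generated when the record is created.
    connection_id: str = field(default_factory=lambda: uuid.uuid4().hex)
```

Using `default_factory` gives every record its own `topics` set and `metadata` dict, so subscriptions on one connection never leak into another.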
#### Automatic Cleanup
- **Stale Connection Detection**: Identifies inactive connections
- **Background Cleanup**: Automatic removal of stale connections
- **Failed Message Cleanup**: Removes connections that fail to receive messages
- **Configurable Timeouts**: Customizable timeout settings
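
Stale connection detection can be sketched as a comparison between each connection's last activity time and a timeout. This is an assumption about the general approach rather than the pool's real code; `find_stale_connections` and its parameters are hypothetical names:

```python
import time
from typing import Mapping, Optional


def find_stale_connections(
    last_activity: Mapping[str, float],
    connection_timeout: float = 300.0,
    now: Optional[float] = None,
) -> list[str]:
    """Return IDs of connections idle for longer than `connection_timeout` seconds."""
    # `now` can be injected for testing; otherwise use the monotonic clock,
    # which is unaffected by system clock adjustments.
    current = time.monotonic() if now is None else now
    return [
        connection_id
        for connection_id, seen_at in last_activity.items()
        if current - seen_at > connection_timeout
    ]
```

A background task would call a function like this once per cleanup interval and close every connection it returns.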
#### Health Monitoring
- **Heartbeat System**: Regular health checks via ping/pong
- **Connection State Tracking**: Monitors connection lifecycle states
- **Error Counting**: Tracks connection errors and failures
- **Activity Monitoring**: Tracks last activity timestamps
#### Broadcasting System
- **Topic-Based**: Efficient message distribution by topic
- **User-Based**: Send messages to all connections for a specific user
- **Selective Exclusion**: Exclude specific connections from broadcasts
- **Message Types**: Structured message format with type classification
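
Topic-based broadcasting with selective exclusion comes down to a mapping from topic to subscriber IDs. The sketch below shows only the recipient lookup, under the assumption that connections are identified by string IDs; `TopicRegistry` is a hypothetical name and not the pool's API:

```python
from typing import Iterable


class TopicRegistry:
    """Illustrative topic-to-subscribers index."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[str]] = {}

    def subscribe(self, connection_id: str, topic: str) -> None:
        self._subscribers.setdefault(topic, set()).add(connection_id)

    def unsubscribe(self, connection_id: str, topic: str) -> None:
        subscribers = self._subscribers.get(topic)
        if subscribers is None:
            return
        subscribers.discard(connection_id)
        # Drop empty topics so the index does not grow without bound.
        if not subscribers:
            del self._subscribers[topic]

    def recipients(self, topic: str, exclude: Iterable[str] = ()) -> set[str]:
        """Return subscriber IDs for `topic`, minus any excluded connections."""
        return set(self._subscribers.get(topic, ())) - set(exclude)
```

Looking up recipients by topic costs one dictionary access, which is why broadcasting to a topic is cheaper than scanning every connection.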
## Configuration
### Pool Settings
```python
# Initialize WebSocket pool with custom settings
await initialize_websocket_pool(
    cleanup_interval=60,             # Cleanup check interval (seconds)
    connection_timeout=300,          # Connection timeout (seconds)
    heartbeat_interval=30,           # Heartbeat interval (seconds)
    max_connections_per_topic=1000,  # Max connections per topic
    max_total_connections=10000      # Max total connections
)
```
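
Before passing settings to the pool, it can help to sanity-check them. The sketch below mirrors the values from the example above in a hypothetical `PoolSettings` class. The two consistency rules it checks (heartbeat shorter than timeout, per-topic limit no larger than total limit) are assumptions about sensible configuration, not rules documented by the pool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    """Illustrative container for the pool settings shown above."""

    cleanup_interval: int = 60
    connection_timeout: int = 300
    heartbeat_interval: int = 30
    max_connections_per_topic: int = 1000
    max_total_connections: int = 10000

    def problems(self) -> list[str]:
        """Return human-readable descriptions of suspicious settings."""
        issues = []
        # Assumed rule: a heartbeat must fit inside the timeout window,
        # otherwise healthy connections would be treated as stale.
        if self.heartbeat_interval >= self.connection_timeout:
            issues.append("heartbeat_interval should be shorter than connection_timeout")
        # Assumed rule: one topic cannot hold more connections than the pool.
        if self.max_connections_per_topic > self.max_total_connections:
            issues.append("max_connections_per_topic exceeds max_total_connections")
        return issues
```

An empty list from `PoolSettings(...).problems()` means none of the assumed rules were violated.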
### Environment Variables
The pool respects the following configuration from `app/config.py`:
- Database connection settings for user authentication
- Logging configuration for structured logging
- Security settings for token verification
## Usage Examples
### Basic WebSocket Endpoint
```python
from fastapi import APIRouter, WebSocket

from app.middleware.websocket_middleware import WebSocketManager, websocket_endpoint

router = APIRouter()


@router.websocket("/ws/notifications")
@websocket_endpoint(topics={"notifications"}, require_auth=True)
async def notifications_endpoint(websocket: WebSocket, connection_id: str, manager: WebSocketManager):
    # Connection is automatically managed
    # Authentication is handled automatically
    # Cleanup is handled automatically
    pass
```
### Manual Connection Management
```python
from fastapi import WebSocket

from app.middleware.websocket_middleware import get_websocket_manager

# `router` is the module's APIRouter. `WebSocketMessage` is the project's
# message type and must be imported from wherever it is defined.


@router.websocket("/ws/custom")
async def custom_endpoint(websocket: WebSocket):
    manager = get_websocket_manager()

    async def handle_message(connection_id: str, message: WebSocketMessage):
        # Relay chat messages to everyone subscribed to the chat room
        if message.type == "chat":
            await manager.broadcast_to_topic(
                topic="chat_room",
                message_type="chat_message",
                data=message.data
            )

    await manager.handle_connection(
        websocket=websocket,
        topics={"chat_room"},
        require_auth=True,
        message_handler=handle_message
    )
```
### Broadcasting Messages
```python
from app.middleware.websocket_middleware import get_websocket_manager

async def send_notification(user_id: int, message: str):
    manager = get_websocket_manager()
    # Send to specific user
    await manager.send_to_user(
        user_id=user_id,
        message_type="notification",
        data={"message": message}
    )


async def broadcast_announcement(message: str):
    manager = get_websocket_manager()
    # Broadcast to all subscribers of a topic
    await manager.broadcast_to_topic(
        topic="announcements",
        message_type="system_announcement",
        data={"message": message}
    )
```
## Administrative Features
### WebSocket Statistics
```bash
GET /api/admin/websockets/stats
```
Returns comprehensive statistics about the WebSocket pool:
- Total and active connections
- Message counts (sent/failed)
- Topic distribution
- Connection states
- Cleanup statistics
### Connection Management
```bash
# List all connections
GET /api/admin/websockets/connections

# Filter connections
GET /api/admin/websockets/connections?user_id=123&topic=notifications

# Get specific connection details
GET /api/admin/websockets/connections/{connection_id}

# Disconnect connections
POST /api/admin/websockets/disconnect
{
  "user_id": 123, // or connection_ids, or topic
  "reason": "maintenance"
}

# Manual cleanup
POST /api/admin/websockets/cleanup

# Broadcast message
POST /api/admin/websockets/broadcast
{
  "topic": "announcements",
  "message_type": "admin_message",
  "data": {"message": "System maintenance in 5 minutes"}
}
```
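
As an illustration of calling the broadcast endpoint from a script, the sketch below builds the request with the standard library only. The base URL and the bearer-token `Authorization` header are assumptions about the deployment, and `build_broadcast_request` is a hypothetical helper:

```python
import json
from typing import Any
from urllib.request import Request

# Assumption: the application is reachable at this address.
BASE_URL = "http://localhost:8000"


def build_broadcast_request(
    topic: str, message_type: str, data: dict[str, Any], token: str
) -> Request:
    """Build (but do not send) a POST to the admin broadcast endpoint."""
    body = json.dumps(
        {"topic": topic, "message_type": message_type, "data": data}
    ).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/api/admin/websockets/broadcast",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: admin endpoints accept a bearer token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The returned object can be sent with `urllib.request.urlopen(request)`. Keeping construction separate from sending makes the payload easy to inspect or test.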
## Message Format
All WebSocket messages follow a structured format:
```json
{
  "type": "message_type",
  "topic": "optional_topic",
  "data": {
    "key": "value"
  },
  "timestamp": "2023-01-01T12:00:00Z",
  "error": "optional_error_message"
}
```
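
A message in this format can be assembled with a small helper. The sketch below is one possible construction, not the application's serializer. It assumes that the optional `topic` and `error` fields are omitted when unset, which the format above does not specify, and `build_message` is a hypothetical name:

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def build_message(
    message_type: str,
    data: Optional[dict[str, Any]] = None,
    topic: Optional[str] = None,
    error: Optional[str] = None,
) -> str:
    """Serialize a message using the structured format described above."""
    message: dict[str, Any] = {
        "type": message_type,
        "data": data if data is not None else {},
        # UTC timestamp in the same shape as the example, e.g. 2023-01-01T12:00:00Z
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Assumption: optional fields are left out rather than sent as null.
    if topic is not None:
        message["topic"] = topic
    if error is not None:
        message["error"] = error
    return json.dumps(message)
```

For example, `build_message("data", {"key": "value"}, topic="announcements")` yields a JSON string with `type`, `data`, `timestamp`, and `topic` keys.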
### Standard Message Types
- `ping`/`pong`: Heartbeat messages
- `welcome`: Initial connection message
- `subscribe`/`unsubscribe`: Topic subscription management
- `data`: General data messages
- `error`: Error notifications
- `heartbeat`: Automated health checks
## Security
### Authentication
- Token-based authentication via query parameters or headers
- User session validation against database
- Automatic connection termination for invalid credentials
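
Reading a token from either a header or a query parameter can be sketched as follows. The `Authorization: Bearer ...` scheme, the `token` parameter name, and the header-first precedence are all assumptions made for illustration; the application's real names and ordering may differ:

```python
from typing import Mapping, Optional


def extract_token(
    query_params: Mapping[str, str], headers: Mapping[str, str]
) -> Optional[str]:
    """Return a token from the Authorization header, else from `?token=`."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, credentials = normalized.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and credentials.strip():
        return credentials.strip()
    # Fall back to the query string; treat an empty value as missing.
    return query_params.get("token") or None
```

A `None` result corresponds to the case above where the connection is terminated for missing credentials.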
### Authorization
- Admin-only access to management endpoints
- User-specific connection filtering
- Topic-based access control (application-level)
### Resource Protection
- Connection limits per topic and total
- Automatic cleanup of stale connections
- Rate limiting integration (via existing middleware)
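
The per-topic and total connection limits can be expressed as a single admission check. This is a minimal sketch of the idea, assuming limits like those in the configuration example; `can_accept_connection` and its parameters are hypothetical:

```python
from typing import Iterable, Mapping


def can_accept_connection(
    requested_topics: Iterable[str],
    total_connections: int,
    topic_counts: Mapping[str, int],
    max_total_connections: int = 10000,
    max_connections_per_topic: int = 1000,
) -> bool:
    """Return True if a new connection fits within both limits."""
    # Reject outright when the pool as a whole is full.
    if total_connections >= max_total_connections:
        return False
    # Every requested topic must still have room for one more subscriber.
    return all(
        topic_counts.get(topic, 0) < max_connections_per_topic
        for topic in requested_topics
    )
```

Running the check before a connection is registered means a rejected client never consumes pool resources.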
## Monitoring and Debugging
### Structured Logging
All WebSocket operations are logged with structured data:
- Connection lifecycle events
- Message broadcasting statistics
- Error conditions and cleanup actions
- Performance metrics
### Health Checks
- Connection state monitoring
- Stale connection detection
- Message delivery success rates
- Resource usage tracking
### Metrics
The system provides metrics for:
- Active connection count
- Message throughput
- Error rates
- Cleanup efficiency
## Integration with Existing Features
### Billing API Integration
The existing billing WebSocket endpoint has been migrated to use the pool:
- Topic: `batch_progress_{batch_id}`
- Automatic connection management
- Improved reliability and resource usage
### Future Integration Opportunities
- Real-time search result updates
- Document processing notifications
- User activity broadcasts
- System status updates
## Performance Considerations
### Scalability
- Connection pooling reduces resource overhead
- Topic-based broadcasting is more efficient than individual sends
- Background cleanup prevents resource leaks
### Memory Management
- Automatic cleanup of stale connections
- Efficient data structures for connection storage
- Minimal memory footprint per connection
### Network Efficiency
- Heartbeat system prevents connection timeouts
- Failed connection detection and cleanup
- Structured message format reduces parsing overhead
## Troubleshooting
### Common Issues
1. **Connections not cleaning up**
   - Check cleanup interval configuration
   - Verify connection timeout settings
   - Monitor stale connection detection
2. **Messages not broadcasting**
   - Verify topic subscription
   - Check connection state
   - Review authentication status
3. **High memory usage**
   - Monitor connection count limits
   - Check for stale connections
   - Review cleanup efficiency
### Debug Tools
1. **Admin API endpoints** for real-time monitoring
2. **Structured logs** for detailed operation tracking
3. **Connection metrics** for performance analysis
4. **Health check endpoints** for system status
## Testing
The test suite covers:
- Connection pool functionality
- Message broadcasting
- Cleanup mechanisms
- Health monitoring
- Admin API operations
- Integration scenarios
- Stress testing
Run tests with:
```bash
pytest tests/test_websocket_pool.py -v
pytest tests/test_websocket_admin_api.py -v
```
## Future Enhancements
Potential improvements:
- Redis-based connection sharing across multiple application instances
- WebSocket cluster support for horizontal scaling
- Advanced message routing and filtering
- Integration with external message brokers
- Enhanced monitoring and alerting
## Examples
See `examples/websocket_pool_example.py` for comprehensive usage examples including:
- Basic WebSocket endpoints
- Custom message handling
- Broadcasting services
- Connection monitoring
- Real-time data streaming