# Upsert Fix for Reference Table Imports
## Issue

When attempting to re-import CSV files for reference tables (like `trnstype`, `trnslkup`, `footers`, etc.), the application encountered UNIQUE constraint errors because the import functions tried to insert duplicate records:

```
Fatal error: (sqlite3.IntegrityError) UNIQUE constraint failed: trnstype.t_type
```
## Root Cause
The original import functions used `bulk_save_objects()`, which only performs INSERT operations. When the same CSV was imported multiple times (e.g., during development, testing, or a data refresh), the function attempted to insert records whose primary keys already existed in the database.
## Solution
Implemented **upsert logic** (INSERT or UPDATE) for all reference table import functions:
### Modified Functions
1. `import_trnstype()` - Transaction types
2. `import_trnslkup()` - Transaction lookup codes
3. `import_footers()` - Footer templates
4. `import_filestat()` - File status definitions
5. `import_employee()` - Employee records
6. `import_gruplkup()` - Group lookup codes
7. `import_filetype()` - File type definitions
8. `import_fvarlkup()` - File variable lookups
9. `import_rvarlkup()` - Rolodex variable lookups
### Key Changes
#### Before (Insert-Only)

```python
record = TrnsType(
    t_type=t_type,
    t_type_l=clean_string(row.get('T_Type_L')),
    header=clean_string(row.get('Header')),
    footer=clean_string(row.get('Footer'))
)
batch.append(record)

if len(batch) >= BATCH_SIZE:
    db.bulk_save_objects(batch)
    db.commit()
```
#### After (Upsert Logic)

```python
# Check if record already exists
existing = db.query(TrnsType).filter(TrnsType.t_type == t_type).first()

if existing:
    # Update existing record
    existing.t_type_l = clean_string(row.get('T_Type_L'))
    existing.header = clean_string(row.get('Header'))
    existing.footer = clean_string(row.get('Footer'))
    result['updated'] += 1
else:
    # Insert new record
    record = TrnsType(
        t_type=t_type,
        t_type_l=clean_string(row.get('T_Type_L')),
        header=clean_string(row.get('Header')),
        footer=clean_string(row.get('Footer'))
    )
    db.add(record)
    result['inserted'] += 1

result['success'] += 1

# Commit in batches for performance
if result['success'] % BATCH_SIZE == 0:
    db.commit()
```
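For context, here is a minimal sketch of the loop that would surround this per-row logic, including the per-row error handling described under Benefits below and the final commit that flushes the last partial batch. The function signature, the `T_Type` column name, the `errors` key, and the file-handling details are assumptions for illustration; `clean_string` and `BATCH_SIZE` are the existing helpers used in the snippets above.

```python
import csv

def import_trnstype(db, csv_path):
    """Hypothetical outline of one import function (signature assumed)."""
    result = {'success': 0, 'inserted': 0, 'updated': 0, 'errors': []}

    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            try:
                t_type = clean_string(row.get('T_Type'))  # CSV column name assumed
                if not t_type:
                    continue  # skip rows without a primary key value

                # ... per-row upsert block from the "After" snippet above,
                # which updates the result counts and commits every BATCH_SIZE rows ...

            except Exception as exc:
                # A bad row is recorded but does not abort the whole import;
                # depending on the failure, a db.rollback() may also be needed here.
                result['errors'].append(str(exc))

    db.commit()  # flush the last partial batch
    return result
```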
## Benefits
1. **Idempotent Imports**: Can safely re-run imports without errors
2. **Data Updates**: Automatically updates existing records with new data from CSV
3. **Better Tracking**: Result dictionaries now include:
   - `inserted`: Count of new records added
   - `updated`: Count of existing records updated
   - `success`: Total successful operations
4. **Error Handling**: Individual row errors don't block the entire import
## Testing
To verify the fix works:

1. Import a CSV file (e.g., `trnstype.csv`)
2. Import the same file again
3. The second import should succeed, with its `updated` count matching the first import's `inserted` count
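A minimal, pytest-style sketch of that check, assuming each import function accepts a database session and a CSV path and returns the result dict described above (the exact signature and fixture setup in the codebase may differ):

```python
def test_reimport_is_idempotent(db):
    """Hypothetical check: importing the same CSV twice must not raise."""
    first = import_trnstype(db, 'trnstype.csv')
    second = import_trnstype(db, 'trnstype.csv')

    # Everything inserted on the first pass should be updated on the second,
    # and the re-import should not create any new rows.
    assert second['updated'] == first['inserted']
    assert second['inserted'] == 0
```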
## Performance Considerations
- Still uses batch commits (every `BATCH_SIZE` operations)
- Individual record checks are necessary to prevent constraint violations
- For large datasets, this is slightly slower than bulk insert but provides reliability
## Future Enhancements
Consider implementing database-specific upsert operations for better performance:

- SQLite: `INSERT OR REPLACE`
- PostgreSQL: `INSERT ... ON CONFLICT DO UPDATE`
- MySQL: `INSERT ... ON DUPLICATE KEY UPDATE`
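With SQLAlchemy 1.4+, the SQLite variant can be expressed through the dialect-specific `insert()` construct rather than raw SQL. The sketch below is illustrative only: the helper name and the batch-of-dicts input are assumptions, while `TrnsType` and its columns come from the snippets above. Native SQLite upsert also requires SQLite 3.24+.

```python
from sqlalchemy.dialects.sqlite import insert as sqlite_insert

def upsert_trnstype_rows(db, rows):
    """Sketch: one INSERT ... ON CONFLICT DO UPDATE per batch of rows.

    `rows` is assumed to be a list of dicts whose keys match the TrnsType
    columns (t_type, t_type_l, header, footer).
    """
    stmt = sqlite_insert(TrnsType).values(rows)
    stmt = stmt.on_conflict_do_update(
        index_elements=['t_type'],  # the primary key column
        set_={
            't_type_l': stmt.excluded.t_type_l,
            'header': stmt.excluded.header,
            'footer': stmt.excluded.footer,
        },
    )
    db.execute(stmt)
    db.commit()
```

This would replace the per-row SELECT with a single statement per batch, which is where most of the potential speedup over the current check-then-write approach would come from.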