Fixes EPUB corruption caused by unsafe UTF-8 decoding and DOM reserialization. Introduces encoding detection and preservation: UTF-16 is honored only when a BOM or the XML declaration declares it, low-confidence decodes are logged, and text is written using per-file target encodings. HTML charset updates now preserve existing http-equiv tags, and safe mode removes stray `<img>` tags without full reserialization. Adds `kindle_epub_fixer_aggressive` (default off) to the settings UI and schema so riskier transforms are opt-in.
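The detection order described above could look roughly like the sketch below. `detect_encoding` is a hypothetical helper, not the project's actual function, and the `latin-1` fallback for undecodable bytes is an assumption made only so the example always returns something.

```python
import codecs
import re

# Byte-order marks that are trusted outright (hypothetical ordering).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)
# An ASCII-readable XML declaration at the very start of the file.
_XML_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def detect_encoding(data: bytes):
    """Return (encoding, confident) for raw file bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, True
    match = _XML_DECL.match(data)
    if match:
        declared = match.group(1).decode("ascii").lower()
        try:
            codecs.lookup(declared)  # ignore declarations naming unknown codecs
            return declared, True
        except LookupError:
            pass
    try:
        data.decode("utf-8")
        return "utf-8", True
    except UnicodeDecodeError:
        # Assumed fallback: latin-1 never fails, so flag it as low confidence.
        return "latin-1", False
```

A caller would log a warning whenever the second element is `False` and write the file back using the same encoding it was read with.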
Fix archived book count mismatch by deleting archived_book rows when a book is deleted.
Add TaskCleanArchivedBooks to purge stale archived references safely in batches.
Schedule cleanup via CWA settings (default 03:00 local) and expose schedule controls in CWA Settings UI.
Add new cwa_settings defaults/schema fields for archived cleanup timing.
[bug] Deleting An Archived Book Doesn't Remove Archived Book Entry From app.db's archived_book table
Fixes #8243
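A batched purge of stale archived references might be sketched as follows. The `archived_book` columns (`id`, `book_id`), the function name, and the batch size are assumptions for illustration; the real TaskCleanArchivedBooks may differ.

```python
import sqlite3


def purge_stale_archived(conn, valid_book_ids, batch_size=500):
    """Delete archived_book rows whose book_id is not in valid_book_ids.

    Rows are removed in batches so each DELETE stays well under SQLite's
    bound-parameter limit and each transaction stays short.
    """
    rows = conn.execute("SELECT id, book_id FROM archived_book").fetchall()
    stale = [row_id for row_id, book_id in rows if book_id not in valid_book_ids]
    for start in range(0, len(stale), batch_size):
        batch = stale[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        conn.execute(
            f"DELETE FROM archived_book WHERE id IN ({placeholders})", batch
        )
        conn.commit()
    return len(stale)
```

The return value is the number of rows removed, which is convenient for a log line after each scheduled run.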
Problems:
- Sidecar manifest files were being treated as ingest targets, causing premature deletion.
- Ignored/temporary ingest artifacts could be deleted too early when readiness checks timed out.
- Stale temp cleanup was hardcoded, not user-configurable, and required restarts to change behavior.
Solutions:
- Filtered manifest files in the ingest watcher and added processor guards to skip them.
- Added skip-delete handling for ignored/temporary files on readiness timeout to preserve artifacts.
- Implemented robust stale temp cleanup with age and interval settings.
- Persisted cleanup settings in the CWA database with sane defaults and validation.
- Exposed new cleanup controls in the settings UI and made the ingest service read live values from the database instead of environment variables.
Other changes:
- Centralized integer parsing and defaulting logic for the new settings.
- Added clear UI descriptions and bounds for the new cleanup options.
- Improved observability with explicit log messages for skip-delete behavior and cleanup timing.
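Centralized integer parsing with defaults and bounds can be as small as the sketch below. The function name and the example bounds are hypothetical; only the idea (parse, fall back to a default, clamp to a range) comes from the notes above.

```python
def parse_int_setting(raw, default, minimum, maximum):
    """Parse a setting value, falling back to default and clamping to bounds."""
    try:
        value = int(str(raw).strip())
    except (TypeError, ValueError):
        # Covers None, empty strings, and non-numeric input such as "1.5".
        return default
    return max(minimum, min(maximum, value))
```

For example, `parse_int_setting(form_value, 60, 1, 1440)` would keep a cleanup interval between one minute and one day regardless of what the form submits.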
Major Features:
- Auto-duplicate resolution: 6 strategies (newest, oldest, merge, highest_quality_format, most_metadata, largest_file_size)
- Automatic cancellation of pending tasks and scheduled jobs when books are deleted by resolution
- Settings UI for enabling/configuring auto-resolution with cooldown periods
- Enhanced Duplicates Manager UI with clickable book covers, titles, and Edit/Archive buttons
Performance Fixes:
- Fixed critical application hang: Pass pre-scanned duplicate groups to auto_resolve_duplicates() to avoid expensive re-scan
- Fixed deadlock in cancel_tasks_for_book(): Access queue/dequeued directly instead of using .tasks property to prevent recursive lock
- Optimized incremental scan to include last scanned book (>= instead of >)
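The deadlock pattern behind the cancel_tasks_for_book() fix can be illustrated with a toy queue. `TaskQueue` and its attributes are hypothetical stand-ins, not the actual worker class: the point is only that a property which takes a non-reentrant lock must not be called while that lock is already held.

```python
import threading


class TaskQueue:
    """Toy queue guarded by a non-reentrant lock (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._queue = []

    @property
    def tasks(self):
        # Takes the lock, so it must never be called while the lock is held.
        with self._lock:
            return list(self._queue)

    def add(self, book_id, name):
        with self._lock:
            self._queue.append({"book_id": book_id, "name": name})

    def cancel_tasks_for_book(self, book_id):
        with self._lock:
            # Reading self.tasks here would try to acquire self._lock a
            # second time and hang; read the underlying list directly.
            kept = [t for t in self._queue if t["book_id"] != book_id]
            cancelled = len(self._queue) - len(kept)
            self._queue = kept
        return cancelled
```

Switching to `threading.RLock` would be another way out, at the cost of hiding accidental re-entry elsewhere.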
Implementation Details:
- cps/duplicates.py: auto_resolve_duplicates() with dry-run preview, backup, deletion, and audit logging
- cps/tasks/duplicate_scan.py: Pass found_duplicate_groups to resolution, added comprehensive debug logging
- cps/services/worker.py: cancel_tasks_for_book() method with deadlock prevention
- scripts/cwa_db.py: scheduled_cancel_for_book() to cancel pending auto-send/scheduled jobs
- cps/templates/duplicates.html: Fixed blueprint endpoints, added clickable UI elements
- cps/templates/cwa_settings.html: Uncommented and fixed auto-resolution settings section
Bug Fixes:
- Fixed template crash from wrong blueprint endpoint ('editbook' vs 'edit-book')
- Fixed settings page overwriting format lists with duplicate_auto_resolve_cooldown_minutes
- Fixed permission errors by bypassing user context check for automatic deletions
- Fixed SQL query debugging output for hybrid prefilter
Resolves SQL syntax error: near '>>': syntax error
The merge conflict marker '>>>>>>> origin/main' at line 95 was causing:
- cwa-update-notification-service failures
- translation-notification-service failures
- ingest_processor crashes
- Failure of all services that depend on CWA_DB initialization
This was missed during the merge conflict resolution in commit 646544b.
default hybrid detection + SQL prefilter; updated defaults in schema
duplicate scans now run as background tasks with progress + cancel
debounced after-import scans and cron-based scheduled scans
settings UI updated for cron, defaults, and explanations + next run display
duplicates page shows progress bar + next scheduled run
task queue fixes + better error/confirmation messaging
cron validation on save and ISO task date formatting
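Cron validation on save could be approximated with the standard library alone, as in the sketch below. It accepts only a simplified subset of five-field expressions (numbers, ranges, lists, `*`, and `/step`) and treats day-of-week 0 to 7 as valid; named months or weekdays are rejected. The real validation may rely on a cron library instead, so treat this as an assumption.

```python
import re

# Allowed (low, high) per field: minute, hour, day of month, month, weekday.
_FIELD_RANGES = ((0, 59), (0, 23), (1, 31), (1, 12), (0, 7))
_PART = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def is_valid_cron(expr):
    """Return True if expr is a valid five-field cron string (subset only)."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (low, high) in zip(fields, _FIELD_RANGES):
        for part in field.split(","):
            match = _PART.match(part)
            if not match:
                return False
            base, step = match.groups()
            if step is not None and int(step) == 0:
                return False
            if base != "*":
                bounds = [int(n) for n in base.split("-")]
                if any(n < low or n > high for n in bounds):
                    return False
                if len(bounds) == 2 and bounds[0] > bounds[1]:
                    return False
    return True
```

Rejecting the value at save time keeps an invalid schedule from ever reaching the scheduler.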
Implements ability to keep original book formats after conversion to target format.
Users can now select which formats to retain via CWA settings UI.
Features:
- New auto_convert_retained_formats setting with checkbox grid UI
- Automatic conflict prevention (target format always retained)
- Database migration support for backward compatibility
- Enhanced ingest processor with robust format addition logic
Credit to @angelicadvocate for original implementation concept in PR #284.
Fixes edge cases including race conditions, UI state handling, and iteration safety.
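The "target format always retained" rule can be sketched as a small normalization step. The function name and the lowercase normalization are hypothetical; how auto_convert_retained_formats is actually stored is not shown in the notes above.

```python
def resolve_retained_formats(selected, target_format):
    """Return the sorted formats to keep after conversion.

    The target format is always added, so a user selection can never
    cause the freshly converted file to be dropped.
    """
    retained = {fmt.strip().lower() for fmt in selected if fmt.strip()}
    retained.add(target_format.strip().lower())
    return sorted(retained)
```

Any original format not in the returned list would be a candidate for removal once the conversion has succeeded.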
Fix issue where books in different languages were incorrectly grouped as duplicates
by implementing a comprehensive configurable duplicate detection system.
Key Changes:
Database Schema:
- Add 6 new duplicate detection settings to cwa_schema.sql:
  - duplicate_detection_title/author/language (default: enabled)
  - duplicate_detection_series/publisher/format (default: disabled)
Frontend UI:
- Add "CWA Duplicate Detection Criteria" section to cwa_settings.html
- Implement checkbox grid for configuring detection criteria
- Include explanatory text and validation warnings
Core Logic Rewrite:
- Replace hardcoded (title, author) matching with configurable criteria
- Support dynamic key generation based on selected metadata fields
- Add comprehensive error handling and edge case coverage
Robustness Improvements:
- Handle missing/null metadata gracefully with fallback values
- Add safety checks for empty collections and corrupt data
- Include CWA database connection error handling
- Performance warnings for large libraries (50k+ books)
Issue Resolution:
- Books in different languages no longer considered duplicates (language included by default)
- Users can now fully customize duplicate detection criteria
- Maintains backward compatibility with existing duplicate manager
- Comprehensive error handling prevents crashes on edge cases
Technical Details:
- Follows established CWA settings patterns for seamless integration
- Boolean settings automatically handled by existing backend logic
- Added datetime import for timestamp sorting fallbacks
- Extensive null/empty validation throughout duplicate detection pipeline
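Dynamic key generation from the enabled criteria might look like the sketch below. The book dictionaries, the `"unknown"` fallback value, and the function names are assumptions for illustration; only the six `duplicate_detection_*` setting names come from the notes above.

```python
from collections import defaultdict

# Metadata fields that have a matching duplicate_detection_<field> setting.
FIELDS = ("title", "author", "language", "series", "publisher", "format")


def duplicate_key(book, settings):
    """Build a comparison key from the fields enabled in settings."""
    enabled = [f for f in FIELDS if settings.get(f"duplicate_detection_{f}")]
    # Missing or empty metadata falls back to a placeholder (assumed value).
    return tuple(str(book.get(f) or "unknown").strip().lower() for f in enabled)


def find_duplicate_groups(books, settings):
    """Group book ids that share a key; groups of one are not duplicates."""
    groups = defaultdict(list)
    for book in books:
        key = duplicate_key(book, settings)
        if key:  # no criteria enabled: nothing can be compared
            groups[key].append(book["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

With language enabled, an English and a German edition of the same title produce different keys and are no longer grouped together.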
## Major Features Added
### 📧 Auto-Send System
- Automatically emails newly ingested books to users' eReaders
- Configurable delay (1-60 minutes) to allow for processing
- Supports multiple formats (EPUB, MOBI, AZW3, KEPUB, PDF)
- Integrates with existing Calibre-Web email configuration
- Respects user preferences and access controls
### 🏷️ Auto-Metadata Fetch System Enhancements
- Enhanced metadata fetching with multiple provider support
- Added smart metadata application mode with intelligent criteria
- Moved control from user-level to admin-only configuration
- Implemented provider hierarchy with drag-and-drop interface
- Added quality-based metadata replacement logic
## Database Schema Changes
### CWA Settings (scripts/cwa_schema.sql)
- Added auto_metadata_smart_application SMALLINT DEFAULT 0
- Enables intelligent vs direct metadata replacement modes
## User Interface Updates
### Admin Interface (cps/templates/cwa_settings.html)
- Added smart metadata application toggle with detailed tooltip
- Enhanced provider hierarchy management
### User Interface (cps/templates/user_edit.html)
- Removed auto_metadata_fetch controls (now admin-only)
- Cleaned up user profile interface
## Smart Metadata Application Logic
### Direct Replacement Mode (Default)
- Takes metadata from preferred provider exactly as provided
- Complete replacement of existing metadata
- Philosophy: "Just take the metadata as is"
### Smart Application Mode (Optional)
- Intelligent criteria for metadata replacement:
* Titles: Only replace if longer/more descriptive
* Descriptions: Only replace if longer/more detailed
* Publishers: Only replace if current field is empty
* Covers: Only replace if higher resolution
* Authors: Always update for consistency
* Tags/Series: Always add for discoverability
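The smart application criteria above can be read as a field-by-field merge. The sketch below is one interpretation, not the code in _apply_metadata_to_book(): the dictionary shape, the `cover` sub-dictionary with `width` and `height`, and the handling of series (set whenever the provider supplies one) are all assumptions.

```python
def _pixels(cover):
    """Pixel count of a cover dict, or 0 when there is no cover."""
    return cover["width"] * cover["height"] if cover else 0


def smart_merge(current, fetched):
    """Merge fetched metadata into current using the smart-mode rules."""
    merged = dict(current)
    # Titles and descriptions: replace only if the new text is longer.
    for field in ("title", "description"):
        new = fetched.get(field) or ""
        if len(new) > len(current.get(field) or ""):
            merged[field] = new
    # Publishers: fill in only when the current field is empty.
    if not current.get("publisher") and fetched.get("publisher"):
        merged["publisher"] = fetched["publisher"]
    # Covers: replace only if the fetched image has more pixels.
    if _pixels(fetched.get("cover")) > _pixels(current.get("cover")):
        merged["cover"] = fetched["cover"]
    # Authors: always update when the provider returns any.
    if fetched.get("authors"):
        merged["authors"] = list(fetched["authors"])
    # Tags: always add (union); series: take the provider's value if given.
    merged["tags"] = sorted(
        set(current.get("tags") or []) | set(fetched.get("tags") or [])
    )
    if fetched.get("series"):
        merged["series"] = fetched["series"]
    return merged
```

Direct replacement mode would simply overwrite every field the provider returns, with none of these checks.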
## Technical Implementation
### Metadata Helper (cps/metadata_helper.py)
- Enhanced _apply_metadata_to_book() with smart application logic
- Updated fetch_and_apply_metadata() for admin-only control
- Integrated CWA_DB settings checking for both modes
### Ingest Processor (scripts/ingest_processor.py)
- Removed user-based metadata checking
- Streamlined to use admin settings only
- Improved processing pipeline integration
### Form Processing (cps/cwa_functions.py)
- Auto-detection of boolean settings from schema
- Automatic handling of auto_metadata_smart_application
## Provider System Enhancements
- Google Books, Internet Archive, DNB, ComicVine, Douban support
- Priority-based searching with first-success-wins logic
- Quality criteria evaluation for metadata selection
- Configurable provider hierarchy with drag-and-drop interface
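Priority-based searching with first-success-wins logic reduces to a loop over the ordered providers. In the sketch below each provider is a plain `(name, search_function)` pair, which is a hypothetical shape and not the actual provider interface.

```python
def fetch_first_success(providers, query):
    """Try providers in priority order and return the first non-empty result.

    providers is an ordered list of (name, search) pairs, where search is a
    callable taking the query. Returns (name, result) or None.
    """
    for name, search in providers:
        try:
            result = search(query)
        except Exception:
            # A failing provider should not block lower-priority ones.
            continue
        if result:
            return name, result
    return None
```

The drag-and-drop hierarchy in the settings UI would determine the order of the `providers` list passed in.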
## Documentation
### Wiki Pages Created
- Auto-Send-System.md: Comprehensive user and admin guide
- Auto-Metadata-Fetch-System.md: Detailed configuration and usage
- Enhanced with relevant emojis for improved readability
- Covers troubleshooting, best practices, and technical details
## Integration & Compatibility
- Maintains backward compatibility with existing email settings
- Integrates seamlessly with auto-convert and ingest systems
- Respects existing access controls and user permissions
- No breaking changes to existing functionality
## Testing Notes
- Database schema updates will apply automatically on app startup
- Settings form processing handles new boolean field automatically
- Metadata fetching now controlled entirely by admin settings
- User interface cleaned of deprecated metadata controls
This implementation provides a complete automated book delivery and metadata enhancement system while maintaining the principle of admin-controlled automation and user-friendly operation.
New setting is stored in CWA Settings as
'auto_ingest_automerge' and can be set to 'ignore',
'overwrite' or 'new_record' (which is the default and
should match previous behaviour). Descriptions for each
value should broadly match the calibredb docs.
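As a rough illustration, the setting could be mapped onto the `--automerge` option of `calibredb add` when the command line is assembled. The helper name and the fallback to `new_record` for unexpected values are assumptions; how the ingest processor actually builds its command is not shown here.

```python
VALID_AUTOMERGE = ("ignore", "overwrite", "new_record")


def build_calibredb_add(book_path, library_path, automerge="new_record"):
    """Return the argument list for a calibredb add call (not executed)."""
    if automerge not in VALID_AUTOMERGE:
        # Assumed behaviour: unknown values fall back to the default.
        automerge = "new_record"
    return [
        "calibredb",
        "add",
        book_path,
        f"--automerge={automerge}",
        f"--library-path={library_path}",
    ]
```

The resulting list can be handed to `subprocess.run` without a shell, which avoids quoting problems with unusual file names.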
- kindle_epub_fixer.py has been completely rewritten to more closely replicate the function of the original JS tool written by innocenat
- Auto backup of files processed by kindle_epub_fixer is now added, enabled by default and able to be turned on or off by the user in the settings panel
- Entries for files that have been processed by kindle_epub_fixer are now automatically added to cwa.db in a new table called epub_fixes. These entries can also be viewed on the CWA Stats page, accessible via the Admin Panel
- The CWA Stats page has been rearranged so that the stats for the functions users care most about are at the top, and a bug was fixed in the "see more" pages where every page showed the enforcement statistics title regardless of what was being shown
- Creation of the /config/processed_books archive folders, as well as the setting of their permissions, is now the sole responsibility of the CWA Auto Zipper service
- Major improvements to exception handling in both convert_library and ingest_processor when handling files, improving performance and reliability and making it much easier to diagnose user errors
- The default CWA settings are now pulled from the schema instead of being hardcoded into the CWA_DB class
- Plus general refactoring and tidying up of the codebase
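One way to pull defaults from the schema instead of hardcoding them is to load the schema into an in-memory SQLite database and read each column's default via `PRAGMA table_info`. This is a sketch under that assumption; the CWA_DB class may do it differently, and the helper name is hypothetical.

```python
import sqlite3


def defaults_from_schema(schema_sql, table):
    """Return {column: default} for columns that declare a DEFAULT.

    table must be a trusted identifier: PRAGMA arguments cannot be bound
    as parameters. Escaped quotes inside string defaults are not handled.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        defaults = {}
        # Row layout: cid, name, type, notnull, dflt_value, pk
        for _cid, name, _type, _notnull, dflt, _pk in conn.execute(
            f"PRAGMA table_info({table})"
        ):
            if dflt is None:
                continue
            text = dflt.strip()
            if len(text) >= 2 and text[0] == text[-1] == "'":
                defaults[name] = text[1:-1]  # quoted string literal
            else:
                try:
                    defaults[name] = int(text)
                except ValueError:
                    defaults[name] = text  # leave other expressions as text
        return defaults
    finally:
        conn.close()
```

Keeping the schema file as the single source of truth means a new setting only has to be declared once.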
- Originally developed by [innocenat](https://github.com/innocenat/kindle-epub-fix), this tool corrects the following potential issues for every EPUB processed by CWA:
  - Fixes UTF-8 encoding problems by adding a UTF-8 declaration if no encoding is specified
  - Fixes a hyperlink problem (which results in Amazon rejecting the EPUB) that occurs when the NCX table of contents links to `<body>` with an ID hash
  - Detects an invalid and/or missing language tag in the metadata and prompts the user to select a new language
  - Removes stray `<img>` tags with no source field
- This ensures maximum compatibility for each EPUB file with the Amazon Send-to-Kindle service and, for those who don't use Amazon devices, has the side benefit of cleaning up lower quality files
- This feature is on by default and is able to be toggled on and off by the user in the CWA Settings panel
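Two of the fixes listed above are simple enough to sketch with regular expressions, which also avoids reserializing the whole document. The function names are hypothetical and the patterns are deliberately simple (for instance, an attribute value containing `>` would confuse them); this is not the code in kindle_epub_fixer.py.

```python
import re

_XML_DECL = re.compile(r"^\s*<\?xml\b[^>]*\?>")
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# "src=" as its own attribute, not the tail of something like data-src.
_HAS_SRC = re.compile(r"(?<![\w-])src\s*=", re.IGNORECASE)


def ensure_utf8_declaration(text):
    """Add or replace the XML declaration when it names no encoding."""
    match = _XML_DECL.match(text)
    if match and "encoding" in match.group(0):
        return text
    body = text[match.end():] if match else text
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body.lstrip("\n")


def remove_stray_imgs(html):
    """Drop <img> tags that carry no src attribute; keep everything else."""
    return _IMG_TAG.sub(
        lambda m: m.group(0) if _HAS_SRC.search(m.group(0)) else "", html
    )
```

Because only the matched tags are touched, the rest of the markup is written back byte-for-byte as it was read.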
Minor Changes:
- All CWA python scripts now conform to the snake_case naming convention
- Minor refactoring of ingest_processor script
Also rearranged the CWA Settings page to make the layout a bit more logical, etc.
Added end format to conversion history DB as well as original format
Added lock file for Cover & Metadata Enforcement service
- CWA Settings page checkbox labels are now clickable, and the target format can no longer be selected as an ignored format
- Users can now also set certain formats to be ignored by the auto importer