38 Commits

Author SHA1 Message Date
crocodilestick 32156431df Make EPUB fixer safer by default + add aggressive toggle
Fixes EPUB corruption caused by unsafe UTF-8 decoding and DOM reserialization. Introduces encoding detection/preservation, with UTF-16 honored only when a BOM or the XML declaration indicates it; logs low-confidence decodes; and writes text using per-file target encodings. HTML charset updates now preserve existing http-equiv tags, and safe mode removes stray `<img>` tags without full reserialization. Adds `kindle_epub_fixer_aggressive` (default off) to the settings UI and schema so riskier transforms are opt-in.
2026-01-29 23:13:22 +01:00
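
The encoding rule described in this commit could look roughly like the following. This is a minimal sketch, not the code from `kindle_epub_fixer.py`: the function name `detect_encoding`, the regular expression, and the `latin-1` fallback are assumptions made for illustration.

```python
import codecs
import re

# Hypothetical pattern: matches an ASCII-compatible XML declaration
# that names an encoding at the very start of the file.
_XML_DECL_RE = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def detect_encoding(data: bytes) -> str:
    """Guess a text encoding; UTF-16 is only returned when a BOM or the
    XML declaration says so, never inferred from the byte pattern."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _XML_DECL_RE.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Low-confidence fallback (assumed); a real tool would log this case.
        return "latin-1"
```

The returned name can then be used both to decode the file and to encode it again on write, so a file that arrived in a non-UTF-8 encoding is not silently converted.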
crocodilestick be6cb19cb3 Fix archived_book cleanup and add scheduled maintenance
Fix archived book count mismatch by deleting archived_book rows when a book is deleted.
Add TaskCleanArchivedBooks to purge stale archived references safely in batches.
Schedule cleanup via CWA settings (default 03:00 local) and expose schedule controls in CWA Settings UI.
Add new cwa_settings defaults/schema fields for archived cleanup timing.

[bug] Deleting An Archived Book Doesn't Remove Archived Book Entry From app.db's archived_book table
Fixes #8243
2026-01-29 21:14:38 +01:00
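
A batched purge of stale rows, as this commit describes, might be sketched as below. The column names (`archived_book.id`, `archived_book.book_id`, `books.id`) and the function name are assumptions, and the sketch keeps both tables in one SQLite database for simplicity, which may not match the real layout.

```python
import sqlite3


def purge_stale_archived(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Delete archived_book rows whose book no longer exists, one batch at a time.

    Returns the total number of rows removed.
    """
    total = 0
    while True:
        # Collect at most batch_size stale row ids per pass.
        rows = conn.execute(
            "SELECT id FROM archived_book "
            "WHERE book_id NOT IN (SELECT id FROM books) LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        conn.executemany("DELETE FROM archived_book WHERE id = ?", rows)
        conn.commit()  # committing per batch keeps each write lock short
        total += len(rows)
```

Committing after each batch is the design choice here: other writers get a chance to run between batches instead of waiting for one large delete.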
crocodilestick 8a889050ab cwa: harden ingest flow, manifest handling, and stale temp cleanup
Problems:

- Sidecar manifest files were being treated as ingest targets, causing premature deletion.
- Ignored/temporary ingest artifacts could be deleted too early when readiness checks timed out.
- Stale temp cleanup was hardcoded, not user-configurable, and required restarts to change behavior.

Solutions:

- Filtered manifest files in the ingest watcher and added processor guards to skip them.
- Added skip-delete handling for ignored/temporary files on readiness timeout to preserve artifacts.
- Implemented robust stale temp cleanup with age and interval settings.
- Persisted cleanup settings in the CWA database with sane defaults and validation.
- Exposed new cleanup controls in the settings UI and made the ingest service read live values from the database instead of environment variables.

Other changes:

- Centralized integer parsing and defaulting logic for the new settings.
- Added clear UI descriptions and bounds for the new cleanup options.
- Improved observability with explicit log messages for skip-delete behavior and cleanup timing.
2026-01-29 20:06:57 +01:00
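
Centralized integer parsing with defaults and bounds, as mentioned under "Other changes", could be as small as the helper below. The name `parse_int_setting` and its signature are hypothetical, not taken from the CWA codebase.

```python
def parse_int_setting(raw, default: int, minimum: int, maximum: int) -> int:
    """Parse a setting value as an int, falling back to a default and
    clamping the result to the allowed range."""
    try:
        value = int(str(raw).strip())
    except (TypeError, ValueError):
        # Missing or non-numeric input: use the default unchanged.
        return default
    return max(minimum, min(maximum, value))
```

For example, `parse_int_setting("999", 10, 1, 60)` clamps to `60`, while `parse_int_setting(None, 10, 1, 60)` falls back to `10`.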
crocodilestick fe60df7bf6 Fixed duplicate detection notifications not displaying reliably immediately after the ingest of multiple books 2026-01-29 13:55:34 +01:00
crocodilestick e3db8b2152 feat: Add auto-duplicate resolution with task cancellation and fix critical deadlocks
Major Features:
- Auto-duplicate resolution: 6 strategies (newest, oldest, merge, highest_quality_format, most_metadata, largest_file_size)
- Automatic cancellation of pending tasks and scheduled jobs when books are deleted by resolution
- Settings UI for enabling/configuring auto-resolution with cooldown periods
- Enhanced Duplicates Manager UI with clickable book covers, titles, and Edit/Archive buttons

Performance Fixes:
- Fixed critical application hang: Pass pre-scanned duplicate groups to auto_resolve_duplicates() to avoid expensive re-scan
- Fixed deadlock in cancel_tasks_for_book(): Access queue/dequeued directly instead of using .tasks property to prevent recursive lock
- Optimized incremental scan to include last scanned book (>= instead of >)

Implementation Details:
- cps/duplicates.py: auto_resolve_duplicates() with dry-run preview, backup, deletion, and audit logging
- cps/tasks/duplicate_scan.py: Pass found_duplicate_groups to resolution, added comprehensive debug logging
- cps/services/worker.py: cancel_tasks_for_book() method with deadlock prevention
- scripts/cwa_db.py: scheduled_cancel_for_book() to cancel pending auto-send/scheduled jobs
- cps/templates/duplicates.html: Fixed blueprint endpoints, added clickable UI elements
- cps/templates/cwa_settings.html: Uncommented and fixed auto-resolution settings section

Bug Fixes:
- Fixed template crash from wrong blueprint endpoint ('editbook' vs 'edit-book')
- Fixed settings page overwriting format lists with duplicate_auto_resolve_cooldown_minutes
- Fixed permission errors by bypassing user context check for automatic deletions
- Fixed SQL query debugging output for hybrid prefilter
2026-01-25 01:24:04 +01:00
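
The recursive-lock deadlock mentioned under "Performance Fixes" can be illustrated with a toy queue. The `TaskQueue` class below is a hypothetical stand-in, not the worker in `cps/services/worker.py`; it only shows why a method that already holds a non-reentrant lock must not call a property that takes the same lock.

```python
import threading


class TaskQueue:
    """Hypothetical stand-in for a worker queue; not the actual CWA class."""

    def __init__(self):
        self._lock = threading.Lock()  # plain Lock: not re-entrant
        self._queue = []

    @property
    def tasks(self):
        # Public snapshot; takes the lock on every read.
        with self._lock:
            return list(self._queue)

    def add(self, book_id, name):
        with self._lock:
            self._queue.append((book_id, name))

    def cancel_tasks_for_book(self, book_id):
        with self._lock:
            # Calling self.tasks here would try to take self._lock again
            # and block forever, so the list is read directly instead.
            remaining = [t for t in self._queue if t[0] != book_id]
            cancelled = len(self._queue) - len(remaining)
            self._queue = remaining
        return cancelled
```

Switching to `threading.RLock` would be another way to avoid the hang; reading the underlying list directly, as the commit describes, avoids the second acquisition altogether.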
crocodilestick 52fb64455e Fix: Remove leftover git merge conflict marker from cwa_schema.sql
Resolves SQL syntax error: near '>>': syntax error

The merge conflict marker '>>>>>>> origin/main' at line 95 was causing:
- cwa-update-notification-service failures
- translation-notification-service failures
- ingest_processor crashes
- All services depending on CWA_DB initialization to fail

This was missed during the merge conflict resolution in commit 646544b.
2026-01-24 21:37:37 +01:00
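
A leftover marker like this one can be caught before it reaches a release with a small check. The sketch below is an illustration only; the function name and the idea of running it over `cwa_schema.sql` are assumptions, not part of the repository.

```python
import re

# Git writes conflict markers as exactly seven '<', '=' or '>' characters
# at the start of a line, optionally followed by a space and a label.
CONFLICT_MARKER_RE = re.compile(r"^(<{7}|={7}|>{7})( |$)")


def find_conflict_markers(text: str) -> list[int]:
    """Return the 1-based line numbers that look like git conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER_RE.match(line)
    ]
```

Running it on a file's contents and failing the build when the list is non-empty would flag a line such as `>>>>>>> origin/main` while leaving ordinary SQL like `SELECT 1 >> 2;` alone.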
crocodilestick 646544b820 Merge main into auto-hardcover-id - sync with latest changes and resolve conflicts 2026-01-24 21:25:37 +01:00
crocodilestick 1011dd509a Phase 3: incremental duplicate scans, debounced scheduling, and metadata-safe title normalization 2026-01-15 16:48:50 +01:00
crocodilestick 689d223195 feat(duplicates): finalize Phase 2 scanning with hybrid detection, background tasking, cron scheduling, and UI feedback
- default hybrid detection + SQL prefilter; updated defaults in schema
- duplicate scans now run as background tasks with progress + cancel
- debounced after-import scans and cron-based scheduled scans
- settings UI updated for cron, defaults, and explanations + next run display
- duplicates page shows progress bar + next scheduled run
- task queue fixes + better error/confirmation messaging
- cron validation on save and ISO task date formatting
2026-01-15 14:07:24 +01:00
crocodilestick ce3c452dca Implemented performant & more reliable SQL query + python hybrid system 2026-01-15 11:51:33 +01:00
crocodilestick 1491589bbf Added SQL dupe search as fallback for python dupe search as I couldn't get it to work as reliably. Disabled for now, might come back to in the future 2026-01-14 18:57:58 +01:00
crocodilestick 63f2cbbecc Added V1 of auto-resolving duplicate detection system 2026-01-14 17:07:16 +01:00
crocodilestick a4cff97a7c Added a notification system for duplicates that can be disabled in the cwa settings panel 2026-01-08 19:08:07 +01:00
crocodilestick 378b2facff Implement Hardcover Auto-Fetch feature
- Add confidence scoring algorithm with Levenshtein distance and Jaccard similarity
- Create background worker (TaskAutoHardcoverID) with rate limiting and exponential backoff
- Add database schema: hardcover_match_queue and hardcover_auto_fetch_stats tables
- Implement comprehensive scheduling system (10 options: 15min intervals through monthly)
- Build settings UI with token validation and dynamic schedule selectors
- Add manual trigger button in admin panel
- Create review queue UI with gradient status hero cards
- Integrate stats dashboard with auto-matched, pending review, and manually reviewed counts
- Add text similarity utilities (normalized_levenshtein_similarity, author_list_similarity)
- Enhance Hardcover provider with calculate_confidence_score method
- Extend MetaRecord with confidence_score and match_reason fields
- Add IntervalTrigger support to background scheduler
2026-01-06 00:20:40 +01:00
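
The two similarity measures named in this commit are standard and can be sketched in plain Python. The helpers below are generic textbook versions with hypothetical names; they are not the repository's `normalized_levenshtein_similarity` or `author_list_similarity`, and how the real code weights them into a confidence score is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map edit distance to a 0..1 score, ignoring case and outer spaces."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)
```

A title comparison would typically use `normalized_similarity`, while author lists fit `jaccard` because order does not matter there.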
crocodilestick 1949261110 Version 1 User Activity Stats page 2025-12-29 16:31:14 +01:00
crocodilestick 123f105fe7 feat(ui): Add Bootstrap toast notifications to scheduled jobs UI
Enhance user feedback with toast notifications:

tasks.html:
- Add showNotification() helper for consistent toast display
- Enhanced cancelScheduled() with success/error feedback

cwa_convert_library.html & cwa_epub_fixer.html:
- Convert schedule links to AJAX buttons
- Add scheduleConvertLibrary(5|15) and scheduleEpubFixer(5|15) functions
- Show success/error notifications for all scheduling actions
- Maintain existing manual trigger functionality

All notifications:
- Positioned top-right with 4-second auto-dismiss
- Use Bootstrap's alert-success/alert-danger styling
- Provide clear action confirmation to users
2025-11-17 14:39:55 +01:00
crocodilestick 7d3f411da4 feat: implement global enablement for metadata providers with UI controls 2025-09-12 23:14:18 +02:00
crocodilestick ebd68c3091 Added metadata_providers_enabled to schema to resolve conflict with PR #632 2025-09-12 23:07:10 +02:00
crocodilestick 42e7aabe5f feat: Add retained formats functionality for auto-conversion
Implements ability to keep original book formats after conversion to target format.
Users can now select which formats to retain via CWA settings UI.

Features:
- New auto_convert_retained_formats setting with checkbox grid UI
- Automatic conflict prevention (target format always retained)
- Database migration support for backward compatibility
- Enhanced ingest processor with robust format addition logic

Credit to @angelicadvocate for original implementation concept in PR #284.

Fixes edge cases including race conditions, UI state handling, and iteration safety.
2025-09-12 21:34:32 +02:00
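
The "target format always retained" rule can be expressed in a few lines. This is a sketch with a hypothetical helper name; how the real setting is stored or parsed is not described in the commit.

```python
def normalize_retained_formats(selected, target: str) -> list[str]:
    """Lower-case and de-duplicate the selected formats, and always
    include the conversion target so it can never be dropped."""
    retained = {fmt.strip().lower() for fmt in selected if fmt.strip()}
    retained.add(target.strip().lower())
    return sorted(retained)
```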
crocodilestick 7c58906e53 feat: Implement configurable duplicate detection system (#604)
Fix issue where books in different languages were incorrectly grouped as duplicates
by implementing a comprehensive configurable duplicate detection system.

Key Changes:

Database Schema:
- Add 6 new duplicate detection settings to cwa_schema.sql:
  - duplicate_detection_title/author/language (default: enabled)
  - duplicate_detection_series/publisher/format (default: disabled)

Frontend UI:
- Add "CWA Duplicate Detection Criteria" section to cwa_settings.html
- Implement checkbox grid for configuring detection criteria
- Include explanatory text and validation warnings

Core Logic Rewrite:
- Replace hardcoded (title, author) matching with configurable criteria
- Support dynamic key generation based on selected metadata fields
- Add comprehensive error handling and edge case coverage

Robustness Improvements:
- Handle missing/null metadata gracefully with fallback values
- Add safety checks for empty collections and corrupt data
- Include CWA database connection error handling
- Performance warnings for large libraries (50k+ books)

Issue Resolution:
- Books in different languages no longer considered duplicates (language included by default)
- Users can now fully customize duplicate detection criteria
- Maintains backward compatibility with existing duplicate manager
- Comprehensive error handling prevents crashes on edge cases

Technical Details:
- Follows established CWA settings patterns for seamless integration
- Boolean settings automatically handled by existing backend logic
- Added datetime import for timestamp sorting fallbacks
- Extensive null/empty validation throughout duplicate detection pipeline
2025-09-12 16:26:22 +02:00
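
Dynamic key generation from the selected criteria might look like the sketch below. Books are modeled as plain dictionaries and the function names are hypothetical; the real implementation works against the Calibre database and is not reproduced here.

```python
from collections import defaultdict


def duplicate_key(book: dict, fields: list[str]) -> tuple:
    """Build a comparison key from the enabled fields only.

    Missing or null values fall back to an empty string.
    """
    return tuple(str(book.get(field) or "").strip().casefold() for field in fields)


def group_duplicates(books: list[dict], fields: list[str]) -> list[list[dict]]:
    """Group books that share the same key; keep groups with 2+ members."""
    groups = defaultdict(list)
    for book in books:
        groups[duplicate_key(book, fields)].append(book)
    return [group for group in groups.values() if len(group) > 1]
```

With `fields=["title", "author"]` an English and a German edition of the same book fall into one group; adding `"language"` to the list separates them, which is the behavior change this commit describes.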
crocodilestick 8402e3225e Added the ability to select which fields can be overwritten by the automatic metadata fetching service, for both the smart and verbatim modes 2025-09-05 15:46:42 +02:00
crocodilestick 3e3d2e5e4d feat: Implement auto-send and enhanced auto-metadata fetch systems
## Major Features Added

### 📧 Auto-Send System
- Automatically emails newly ingested books to users' eReaders
- Configurable delay (1-60 minutes) to allow for processing
- Supports multiple formats (EPUB, MOBI, AZW3, KEPUB, PDF)
- Integrates with existing Calibre-Web email configuration
- Respects user preferences and access controls

### 🏷️ Auto-Metadata Fetch System Enhancements
- Enhanced metadata fetching with multiple provider support
- Added smart metadata application mode with intelligent criteria
- Moved control from user-level to admin-only configuration
- Implemented provider hierarchy with drag-and-drop interface
- Added quality-based metadata replacement logic

## Database Schema Changes

### CWA Settings (scripts/cwa_schema.sql)
- Added auto_metadata_smart_application SMALLINT DEFAULT 0
- Enables intelligent vs direct metadata replacement modes

## User Interface Updates

### Admin Interface (cps/templates/cwa_settings.html)
- Added smart metadata application toggle with detailed tooltip
- Enhanced provider hierarchy management

### User Interface (cps/templates/user_edit.html)
- Removed auto_metadata_fetch controls (now admin-only)
- Cleaned up user profile interface

## Smart Metadata Application Logic

### Direct Replacement Mode (Default)
- Takes metadata from preferred provider exactly as provided
- Complete replacement of existing metadata
- Philosophy: "Just take the metadata as is"

### Smart Application Mode (Optional)
- Intelligent criteria for metadata replacement:
  * Titles: Only replace if longer/more descriptive
  * Descriptions: Only replace if longer/more detailed
  * Publishers: Only replace if current field is empty
  * Covers: Only replace if higher resolution
  * Authors: Always update for consistency
  * Tags/Series: Always add for discoverability
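
The replacement criteria listed above can be read as a small decision function. The sketch below is an interpretation, not the logic in `cps/metadata_helper.py`: the function name, the field names, and the representation of a cover as a `(width, height)` tuple are all assumptions. Tags and series are additive rather than replaced, so they are left out.

```python
def should_replace(field: str, current, candidate) -> bool:
    """Decide whether fetched metadata should replace the existing value."""
    if not candidate:
        return False  # nothing useful was fetched
    if field in ("title", "description"):
        # Only replace when the new text is longer.
        return len(candidate) > len(current or "")
    if field == "publisher":
        # Only fill in a publisher when none is set.
        return not current
    if field == "cover":
        # Compare pixel counts of (width, height) tuples.
        cur_w, cur_h = current or (0, 0)
        new_w, new_h = candidate
        return new_w * new_h > cur_w * cur_h
    if field == "authors":
        return True  # always updated for consistency
    return False
```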

## Technical Implementation

### Metadata Helper (cps/metadata_helper.py)
- Enhanced _apply_metadata_to_book() with smart application logic
- Updated fetch_and_apply_metadata() for admin-only control
- Integrated CWA_DB settings checking for both modes

### Ingest Processor (scripts/ingest_processor.py)
- Removed user-based metadata checking
- Streamlined to use admin settings only
- Improved processing pipeline integration

### Form Processing (cps/cwa_functions.py)
- Auto-detection of boolean settings from schema
- Automatic handling of auto_metadata_smart_application

## Provider System Enhancements
- Google Books, Internet Archive, DNB, ComicVine, Douban support
- Priority-based searching with first-success-wins logic
- Quality criteria evaluation for metadata selection
- Configurable provider hierarchy with drag-and-drop interface
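
Priority-based searching with first-success-wins logic can be sketched as a loop over providers in their configured order. The function name and the shape of a provider (a name plus a callable returning a dict or `None`) are assumptions for illustration.

```python
from typing import Callable, Optional


def fetch_first_success(
    query: str,
    providers: list[tuple[str, Callable[[str], Optional[dict]]]],
) -> Optional[tuple[str, dict]]:
    """Try each provider in priority order and return the first hit."""
    for name, search in providers:
        try:
            result = search(query)
        except Exception:
            # A failing provider should not block the ones after it.
            continue
        if result:
            return name, result
    return None
```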

## Documentation

### Wiki Pages Created
- Auto-Send-System.md: Comprehensive user and admin guide
- Auto-Metadata-Fetch-System.md: Detailed configuration and usage
- Enhanced with relevant emojis for improved readability
- Covers troubleshooting, best practices, and technical details

## Integration & Compatibility
- Maintains backward compatibility with existing email settings
- Integrates seamlessly with auto-convert and ingest systems
- Respects existing access controls and user permissions
- No breaking changes to existing functionality

## Testing Notes
- Database schema updates will apply automatically on app startup
- Settings form processing handles new boolean field automatically
- Metadata fetching now controlled entirely by admin settings
- User interface cleaned of deprecated metadata controls

This implementation provides a complete automated book delivery and metadata enhancement system while maintaining the principle of admin-controlled automation and user-friendly operation.
2025-09-04 18:22:54 +02:00
crocodilestick 955320d648 Added ability to change ingest timeout duration in CWA Settings 2025-09-02 15:37:38 +02:00
crocodilestick 4dfbcba34d Reverted unwanted changes 2025-08-09 23:42:20 +02:00
crocodilestick 0c891d259f REFACTOR - Moved audiobook.py, auto_library.py, auto_zip.py, convert_library.py, cover_enforcer.py, cwa_db.py, cwa_functions.py, cwa_schema.sql, ingest_processor.py & kindle_epub_fixer.py to the cps dir, changing dependencies as necessary 2025-08-09 20:21:58 +02:00
crocodilestick 91f727529d Added "sexy-background-blur" option to cwa-settings to give mobile dark mode users the same cover incorporated background blur effect they have on desktop (can be disabled in cwa settings) 2025-08-09 11:56:30 +02:00
crocodilestick f9a22ea94d Added system to prompt users using CWA in languages other than English that have missing translations, to help complete the translations for that language. These notifications can be disabled in the CWA settings panel 2025-08-06 16:02:42 +02:00
have-a-boy 76947e5ad3 Add configurable setting for automerge param
New setting is stored in CWA Settings as
'auto_ingest_automerge' and can be set to 'ignore',
'overwrite' or 'new_record' (which is the default and
should match previous behaviour). Descriptions for each
value should broadly match the calibredb docs.
2025-07-09 18:04:36 +02:00
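
Turning the stored setting into a `calibredb add` argument could look like this. The helper name is hypothetical; the three values and the `new_record` default come from the commit message, and `--automerge` is the `calibredb add` option they map to.

```python
AUTOMERGE_MODES = ("ignore", "overwrite", "new_record")
DEFAULT_AUTOMERGE = "new_record"


def automerge_args(setting) -> list[str]:
    """Return the calibredb arguments for the stored automerge setting,
    falling back to the default for missing or unknown values."""
    mode = setting if setting in AUTOMERGE_MODES else DEFAULT_AUTOMERGE
    return ["--automerge", mode]
```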
crocodilestick 47e8cf0d66 Major Changes -
- kindle_epub_fixer.py has been completely rewritten to more closely replicate the function of the original JS tool written by innocenat
- Auto backup of files processed by kindle_epub_fixer is now added, enabled by default and able to be turned on or off by the user in the settings panel
- Entries for Files that have been processed by kindle_epub_fixer are now automatically added to cwa.db in a new table called epub_fixes. These entries are now also available to view in the CWA Stats page accessible via the Admin Panel
- CWA Stats page has been rearranged to have the stats for the functions users care most about at the top and a bug was fixed in the see more pages where the title for all pages was for the enforcement statistics regardless of what was being shown
- Creation of the /config/processed_books archive folders as well as the setting of their permissions is now the sole responsibility of the CWA Auto Zipper service
- Major improvements to exception handling in both convert_library and ingest_processor when handling files to improve performance, reliability as well as making it much easier to diagnose user errors
- The default CWA settings are now pulled from the schema instead of being hardcoded into the CWA_DB class
- Plus general refactoring and tidying up of the codebase
2025-01-05 21:26:32 +01:00
crocodilestick 1a63a51c1f NEW FEATURE - Added Kindle-EPUB-Fixer
- Originally developed by [innocenat](https://github.com/innocenat/kindle-epub-fix), this tool corrects the following potential issues for every EPUB processed by CWA:
    - Fixes UTF-8 encoding problem by adding UTF-8 declaration if no encoding is specified
    - Fixes hyperlink problem (which results in Amazon rejecting the EPUB) when the NCX table of contents links to `<body>` with an ID hash.
    - Detect invalid and/or missing language tag in metadata, and prompt user to select new language.
    - Remove stray `<img>` tags with no source field.
- This ensures maximum compatibility for each EPUB file with the Amazon Send-to-Kindle service and, for those who don't use Amazon devices, has the side benefit of cleaning up your lower quality files
- This feature is on by default and is able to be toggled on and off by the user in the CWA Settings panel

Minor Changes:
- All CWA python scripts now conform to the snake_case naming convention
- Minor refactoring of ingest_processor script
2024-12-11 13:59:53 +00:00
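
Two of the fixes listed above are simple enough to sketch with regular expressions on the raw text. This is a simplified illustration, not the code in `kindle_epub_fixer.py` or the original JS tool: the function names are hypothetical, and the declaration check only looks for any XML declaration rather than inspecting its encoding attribute.

```python
import re

IMG_TAG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# 'src=' as its own attribute, so 'data-src=' does not count.
SRC_ATTR_RE = re.compile(r"(?<![\w-])src\s*=", re.IGNORECASE)
XML_DECLARATION = '<?xml version="1.0" encoding="utf-8"?>\n'


def ensure_xml_declaration(text: str) -> str:
    """Prepend a UTF-8 XML declaration when the file has none at all."""
    if text.lstrip().startswith("<?xml"):
        return text
    return XML_DECLARATION + text


def remove_stray_images(text: str) -> str:
    """Drop <img> tags that have no src attribute, leaving all other
    markup byte-for-byte as it was."""
    def keep_or_drop(match: re.Match) -> str:
        tag = match.group(0)
        return tag if SRC_ATTR_RE.search(tag) else ""

    return IMG_TAG_RE.sub(keep_or_drop, text)
```

Editing the text in place like this avoids parsing and reserializing the whole document, at the cost of being less thorough than a DOM-based pass.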
crocodilestick 5bbd15af62 Finished work on making Cover & Metadata Enforcement service compatible with multiple formats and the presence of multiple formats for each book.
Also rearranged CWA Settings page to make the layout a bit more logical, etc.

Added end format to conversion history DB as well as original format

Added lock file for Cover & Metadata Enforcement service
2024-11-22 16:03:45 +00:00
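
A lock file for a single-instance service is commonly created with an exclusive open, so that two processes cannot both succeed. The sketch below shows that general pattern with hypothetical function names; the path and the behavior of the actual enforcement service's lock are not described in the commit.

```python
import os


def acquire_lock(path: str) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        # O_EXCL makes creation fail when the file is already there.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(path: str) -> None:
    """Remove the lock file; a missing file is not an error."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A caller would wrap its work in `if acquire_lock(path): try: ... finally: release_lock(path)` so the lock is released even when processing fails.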
crocodilestick fc949d281c Made major changes to the CWA Cover & Metadata Enforcement service. Like the ingest service, it now also supports multiple formats (currently limited to EPUB & AZW3 due to limitations of the calibre ebook-polish function) and can also now be disabled in the CWA Settings panel. The majority of the necessary work has been done to achieve these goals but these changes are currently untested 2024-11-21 17:29:53 +00:00
crocodilestick 06a5e0fd96 IN PROGRESS - Testing additional ingest functionality (User definable output formats & behaviour) 2024-11-18 11:35:38 +00:00
crocodilestick fc188873a4 - Fixed numerous issues with the cwa_db that were resulting in the db occasionally becoming locked for some users
- CWA Settings page checkbox labels are now clickable and the target format is unable to be selected to be ignored
- Users can now also set certain formats to be ignored by the auto importer
2024-11-15 11:47:20 +00:00
crocodilestick d6b2caa7f8 IN PROGRESS - Continuing work on CWA Settings system for upcoming features 2024-11-14 15:50:21 +00:00
crocodilestick 3befb24e95 Fixed errors in schema, the adding of new settings to existing databases and fixed jinja errors in cwa_settings.html 2024-11-11 15:52:42 +00:00
crocodilestick e6ae408268 Working on adding support to give the user more control over auto-conversion, formats, etc. 2024-11-11 13:43:26 +00:00
crocodilestick 3b1da04a16 Implementing user toggleable CWA settings for update notifications and auto-backup behaviour during ingest 2024-09-25 09:43:12 +00:00