254 Commits

Author SHA1 Message Date
crocodilestick d11a1c45b7 Updated translations.
Major Update to 4.0.0/4.0.1 messes up translations
Fixes #960
2026-01-30 20:58:44 +01:00
crocodilestick e6b0c38cb5 Update copyright years in all relevant files to 2026 for Calibre-Web contributors and Calibre-Web Automated contributors 2026-01-29 23:56:43 +01:00
crocodilestick 4a5774fead Harden app.db resolution and startup checks for auto-send migration issues
Fix mismatched app.db usage across services by resolving the settings DB from CWA_APP_DB_PATH or CALIBRE_DBPATH, and refuse non-app.db file paths.
Align ingest pipeline and Calibre init helpers to the same app.db resolver to avoid stale/wrong DB usage (e.g., accidentally using cwa.db).
Add clear error logging when the auto_send_enabled column migration fails so permission/lock/path issues surface instead of silently breaking startup.
Introduce a lightweight app.db healthcheck at startup to flag missing, non-writable, or locked DBs early.
Skip PRAGMA quick_check when NETWORK_SHARE_MODE=true to avoid network-share stalls or spurious lock errors.
2026-01-29 23:53:37 +01:00
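The app.db resolution described in 4a5774fead can be sketched roughly as follows. This is a minimal illustration, not the project's actual resolver: the function names, the precedence of `CWA_APP_DB_PATH` over `CALIBRE_DBPATH`, the treatment of suffix-less values as directories, and the `/config` fallback are all assumptions.

```python
import os
from pathlib import Path

DEFAULT_CONFIG_DIR = "/config"  # assumption: hypothetical fallback directory


def resolve_app_db_path(env=None):
    """Return the settings DB path from CWA_APP_DB_PATH or CALIBRE_DBPATH."""
    env = os.environ if env is None else env
    raw = env.get("CWA_APP_DB_PATH") or env.get("CALIBRE_DBPATH") or DEFAULT_CONFIG_DIR
    path = Path(raw)
    # A value without a file suffix is treated as the folder that holds app.db.
    if path.suffix == "":
        return path / "app.db"
    # A file path is accepted only when it names app.db (so cwa.db is refused).
    if path.name != "app.db":
        raise ValueError(f"Refusing non-app.db settings path: {path}")
    return path


def should_run_quick_check(env=None):
    """PRAGMA quick_check is skipped when NETWORK_SHARE_MODE=true."""
    env = os.environ if env is None else env
    return env.get("NETWORK_SHARE_MODE", "false").strip().lower() != "true"
```

Passing the environment as a plain mapping keeps the sketch easy to exercise without touching the real process environment.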
crocodilestick 32156431df Make EPUB fixer safer by default + add aggressive toggle
Fixes EPUB corruption caused by unsafe UTF‑8 decoding and DOM reserialization. Introduces encoding detection/preservation with UTF‑16 only honored when BOM/XML declares it, logs low‑confidence decodes, and writes text using per‑file target encodings. HTML charset updates now preserve existing http‑equiv tags, and safe mode removes stray `<img>` tags without full reserialization. Adds `kindle_epub_fixer_aggressive` (default off) to settings UI and schema so riskier transforms are opt‑in.
2026-01-29 23:13:22 +01:00
crocodilestick 9b2d59948c Fix auto-zip by updating backup mtime
Backups used shutil.copy2, preserving original file timestamps. The auto-zipper only zips files with today’s mtime, so many backups were skipped. Switch to shutil.copy and reset mtime to now so daily zips include new backups.

Fixes #260
2026-01-29 22:21:03 +01:00
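The mtime fix in 9b2d59948c comes down to the difference between `shutil.copy2` (which preserves timestamps) and `shutil.copy` followed by `os.utime`. A minimal sketch, where `backup_file` and `modified_today` are hypothetical names and `modified_today` only stands in for the auto-zipper's date filter:

```python
import os
import shutil
import time


def backup_file(src, dst):
    """Copy src to dst and stamp the copy with the current time."""
    shutil.copy(src, dst)  # copies data and permission bits, not timestamps
    os.utime(dst, None)    # explicitly set atime/mtime to "now"
    return dst


def modified_today(path):
    """Hypothetical stand-in for the auto-zipper's 'mtime is today' check."""
    return time.localtime(os.path.getmtime(path))[:3] == time.localtime()[:3]
```

With `copy2`, a backup of a ten-day-old book would carry the ten-day-old mtime and fail `modified_today`; with `backup_file` it passes.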
crocodilestick be6cb19cb3 Fix archived_book cleanup and add scheduled maintenance
Fix archived book count mismatch by deleting archived_book rows when a book is deleted.
Add TaskCleanArchivedBooks to purge stale archived references safely in batches.
Schedule cleanup via CWA settings (default 03:00 local) and expose schedule controls in CWA Settings UI.
Add new cwa_settings defaults/schema fields for archived cleanup timing.

[bug] Deleting An Archived Book Doesn't Remove Archived Book Entry From app.db's archived_book table
Fixes #8243
2026-01-29 21:14:38 +01:00
crocodilestick e3cea8cbaa fix(fixer): repair duplicate XML declarations in container.xml
The Kindle EPUB fixer now detects and removes duplicate XML declarations in META-INF/container.xml, validating the result before writing. This normalizes malformed EPUBs during ingest to reduce reader failures and potential Send‑to‑Kindle E999 ingestion errors.
2026-01-29 20:21:39 +01:00
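One way to approach the container.xml repair from e3cea8cbaa is sketched below. The function name and regex are assumptions made for illustration, and the validation step uses `xml.dom.minidom` only as an example parser; the fixer's real implementation may differ.

```python
import re
from xml.dom import minidom

# Matches an XML declaration (not other processing instructions such as
# <?xml-stylesheet ...?>) plus any whitespace that follows it.
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>\s*")


def dedupe_xml_declarations(text):
    """Keep the first XML declaration, drop repeats, and validate the result."""
    decls = XML_DECL.findall(text)
    if len(decls) <= 1:
        return text  # nothing to repair
    first = decls[0].strip()
    body = XML_DECL.sub("", text).lstrip()
    fixed = first + "\n" + body
    # Raises xml.parsers.expat.ExpatError if the repaired text is still
    # malformed, so the caller can skip writing it back.
    minidom.parseString(fixed.encode("utf-8"))
    return fixed
```

Validating before returning mirrors the "validating the result before writing" step in the commit message: an unparseable result is never handed back as if it were a fix.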
crocodilestick 8a889050ab cwa: harden ingest flow, manifest handling, and stale temp cleanup
Problems:

- Sidecar manifest files were being treated as ingest targets, causing premature deletion.
- Ignored/temporary ingest artifacts could be deleted too early when readiness checks timed out.
- Stale temp cleanup was hardcoded, not user-configurable, and required restarts to change behavior.

Solutions:

- Filtered manifest files in the ingest watcher and added processor guards to skip them.
- Added skip-delete handling for ignored/temporary files on readiness timeout to preserve artifacts.
- Implemented robust stale temp cleanup with age and interval settings.
- Persisted cleanup settings in the CWA database with sane defaults and validation.
- Exposed new cleanup controls in the settings UI and made the ingest service read live values from the database instead of environment variables.

Other changes:

- Centralized integer parsing and defaulting logic for the new settings.
- Added clear UI descriptions and bounds for the new cleanup options.
- Improved observability with explicit log messages for skip-delete behavior and cleanup timing.
2026-01-29 20:06:57 +01:00
crocodilestick 3a1af44d62 fix(cover-enforcer): avoid polling loop on failed logs
The metadata-change-detector polling watcher would re-process the same JSON log when calibredb/enforcement fails because the log was left in place. This change deletes failed logs and records a failure entry (“auto -log (failed)”) so the job is not retried indefinitely and admins can see failures in CWA stats.

[bug] Subprocess error in cover enforcer results in infinite loop with polling watcher
Fixes #829
2026-01-29 18:16:17 +01:00
crocodilestick b3c2230e3e fix(stats): resolve Unknown user dropdown entries
Cause: stats dropdown used activity table user_name and stale user_ids, leaving many unmatched entries shown as “Unknown”.
Fix: resolve names against app users by user_id, group all unmatched IDs under a single “Unknown User”, and allow list-based user filters in stats queries (with tests).

[bug] Many unknown users in stats
Fixes #942
2026-01-29 18:00:57 +01:00
crocodilestick fe60df7bf6 Fixed duplicate detection notifications not displaying reliably immediately after the ingest of multiple books 2026-01-29 13:55:34 +01:00
crocodilestick 8d280c246f fix(ingest): load GDrive settings for sync
The ingest worker never initialized CPS config, so config_use_google_drive stayed false and GDrive sync was skipped. Load minimal settings from app.db on startup so sync runs when enabled.

[bug] GDrive not syncing
Fixes #933.
2026-01-28 23:12:12 +01:00
crocodilestick bf3b0d42a8 Rolled back problematic new epub fixer functions to come back to later 2026-01-28 22:21:26 +01:00
crocodilestick fa2f2f70c2 Fix database locking and race conditions in cover_enforcer
Resolves issue where CWA hangs when editing metadata and cover
simultaneously, particularly on network shares or when NETWORK_SHARE_MODE
is enabled.

Changes:
- Increase SQLite connection timeout from 30s to 60s throughout
  cover_enforcer.py to handle slower network share operations
- Add retry logic with exponential backoff (3 attempts) to calibredb
  export, specifically handling "database is locked" errors
- Replace os.system() with subprocess.run() for ebook-polish operations:
  * Better error handling and logging
  * 120s timeout to prevent indefinite hangs
  * Capture stderr/stdout for debugging
- Add 0.5s delay before ebook-polish execution to ensure file buffers
  are flushed and locks released
- Explicit database connection closure before file modification operations

All changes occur in background service; no UI impact.

Fixes #904
2026-01-26 17:50:43 +01:00
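The retry and timeout behaviour listed in fa2f2f70c2 can be illustrated with a small helper. This is a sketch under assumptions: `run_with_lock_retry` is a hypothetical name, the 1-second base delay is invented, and the helper combines the calibredb retry idea with the 120s timeout that the commit applies to ebook-polish.

```python
import subprocess
import time


def run_with_lock_retry(cmd, attempts=3, base_delay=1.0, timeout=120, sleep=time.sleep):
    """Run cmd, retrying with exponential backoff while SQLite reports a lock."""
    result = None
    for attempt in range(attempts):
        # capture_output keeps stdout/stderr available for logging;
        # timeout raises subprocess.TimeoutExpired instead of hanging forever.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        if result.returncode == 0:
            return result
        if "database is locked" not in result.stderr.lower():
            break  # a different failure: retrying will not help
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # 1s, then 2s, ...
    return result
```

Unlike `os.system()`, `subprocess.run()` takes the command as a list, which avoids shell quoting problems with book paths, and returns the exit code and captured output together.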
crocodilestick 984647e00c fix(epub-fixer): improve Kindle compatibility and fix XML parsing order
- Reorder process() execution: fix_encoding() now runs BEFORE fix_book_language()
  to clean malformed XML declarations before parsing attempts
- Add try/except wrapper to fix_book_language() for graceful handling of
  non-standard EPUB structures (e.g., duplicate XML declarations)
- Replace minidom.toxml() with regex-based updates in fix_book_language() to
  preserve XML attribute order (prevents breaking Amazon's strict parser)
- Add strip_embedded_fonts() to remove TTF/OTF/WOFF fonts and @font-face CSS
  declarations, plus manifest entries (Kindle doesn't support custom fonts reliably)
- Add remove_javascript() to strip <script> tags (not supported on Kindle)
- Add validate_images() to check format/size compatibility and warn about issues
- Add validate_css() for non-invasive syntax validation with Kindle compatibility warnings
- Add strip_amazon_identifiers() function (implemented but not called by default
  per user preference to preserve complete metadata)
- Extend fix_encoding() malformed XML detection to cover OPF/XML/NCX files,
  not just HTML/XHTML

Tested with both working and failing EPUBs - execution order fix allows proper
language validation on files with initially malformed XML, and all new fixes
are non-destructive with proper error handling.

Resolves #920
2026-01-26 16:43:12 +01:00
crocodilestick aea35e27e8 Fix malformed XML declarations in OPF/XML files causing Amazon E999 rejections
Amazon's Send-to-Kindle service was rejecting some EPUBs with E999 errors due
to malformed XML declarations containing excessive whitespace. Example:
  <?xml version="1.0"  encoding="UTF-8"?>
                     ^^
              Double space breaks Amazon's strict XML parser

Root cause: fix_encoding() only processed HTML/XHTML files and missed OPF/XML/NCX
files entirely. Additionally, the validation regex incorrectly accepted malformed
declarations with multiple consecutive spaces.

Changes to kindle_epub_fixer.py:
- Extended fix_encoding() to process OPF, XML, and NCX files (not just HTML/XHTML)
- Added malformed XML declaration detection pattern (2+ consecutive spaces)
- Normalizes malformed declarations to proper single-space format
- Uses elif logic to ensure malformed fix runs before missing declaration check
- Preserves all existing behavior for properly formatted files

Fixes issue where EPUBs passed Python's lenient minidom parser but were rejected
by Amazon's strict validation.
2026-01-26 14:36:31 +01:00
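The declaration normalization described in aea35e27e8 might look something like the sketch below. The function name and regex are illustrative assumptions rather than the code in kindle_epub_fixer.py.

```python
import re

# The leading XML declaration, e.g. <?xml version="1.0"  encoding="UTF-8"?>
XML_DECL = re.compile(r"<\?xml\s[^?]*\?>")


def normalize_xml_declaration(text):
    """Collapse runs of whitespace inside the first XML declaration."""
    def _fix(match):
        inner = match.group(0)[2:-2]  # strip the "<?" and "?>" delimiters
        return "<?" + " ".join(inner.split()) + "?>"

    return XML_DECL.sub(_fix, text, count=1)
```

Applied to the example from the commit message, the double space between `version="1.0"` and `encoding="UTF-8"` becomes a single space, while an already well-formed declaration is returned unchanged.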
crocodilestick 0746472a8f Fix EPUB language tag handling to prevent Amazon Kindle E999 rejections (#920)
Amazon's Send-to-Kindle service was rejecting EPUBs with E999 errors when
language metadata didn't match content. Root cause: CWA was silently forcing
ALL invalid/unsupported language tags to 'en', causing German books with
"Unknown" tags to be mislabeled as English.

Changes to kindle_epub_fixer.py:
- Rewrote fix_book_language() with proper validation logic:
  * Validate format with regex before processing (prevents false positives)
  * Extract 2-character base code to match Amazon's behavior (de-DE → de)
  * Handle case-insensitive tags (en-us/EN-US → en)
  * Preserve valid language codes instead of forcing to English

- Added _detect_language_from_metadata() fallback:
  * Queries Calibre's metadata.db when EPUB tag is invalid
  * Extracts book_id from file path to find correct language

- Performance & code quality improvements:
  * Moved regex compilation to module-level constant (LANGUAGE_TAG_PATTERN)
  * Removed duplicate 'nb' entry from allowed_languages list
  * Cleaned up redundant imports scattered in functions
  * Added detailed comments explaining Amazon's 2-char limitation

Fixes #920
2026-01-26 14:21:35 +01:00
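The validation steps in 0746472a8f (validate the format, take the 2-character base code, ignore case, do not force English) can be sketched as below. `LANGUAGE_TAG_PATTERN` is named in the commit, but the pattern shown here, the function name, and the small allowed set are assumptions for illustration only.

```python
import re

# Assumption: a loose BCP 47-style shape check, not the project's actual pattern.
LANGUAGE_TAG_PATTERN = re.compile(r"^[A-Za-z]{2,3}(?:[-_][A-Za-z0-9]{2,8})*$")

# Assumption: illustrative subset; the real allowed_languages list is longer.
ALLOWED_LANGUAGES = {"en", "de", "fr", "es", "it", "nb"}


def normalize_language_tag(tag, allowed=ALLOWED_LANGUAGES):
    """Return the lowercase 2-letter base code, or None if the tag is unusable."""
    if not tag:
        return None
    tag = tag.strip()
    if not LANGUAGE_TAG_PATTERN.match(tag):
        return None  # e.g. "Unknown": caller should fall back to metadata.db
    base = re.split(r"[-_]", tag)[0].lower()
    return base if base in allowed else None
```

Returning `None` instead of `"en"` leaves the decision to the caller, which is where a metadata.db lookup such as `_detect_language_from_metadata()` would come in.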
crocodilestick e3db8b2152 feat: Add auto-duplicate resolution with task cancellation and fix critical deadlocks
Major Features:
- Auto-duplicate resolution: 6 strategies (newest, oldest, merge, highest_quality_format, most_metadata, largest_file_size)
- Automatic cancellation of pending tasks and scheduled jobs when books deleted by resolution
- Settings UI for enabling/configuring auto-resolution with cooldown periods
- Enhanced Duplicates Manager UI with clickable book covers, titles, and Edit/Archive buttons

Performance Fixes:
- Fixed critical application hang: Pass pre-scanned duplicate groups to auto_resolve_duplicates() to avoid expensive re-scan
- Fixed deadlock in cancel_tasks_for_book(): Access queue/dequeued directly instead of using .tasks property to prevent recursive lock
- Optimized incremental scan to include last scanned book (>= instead of >)

Implementation Details:
- cps/duplicates.py: auto_resolve_duplicates() with dry-run preview, backup, deletion, and audit logging
- cps/tasks/duplicate_scan.py: Pass found_duplicate_groups to resolution, added comprehensive debug logging
- cps/services/worker.py: cancel_tasks_for_book() method with deadlock prevention
- scripts/cwa_db.py: scheduled_cancel_for_book() to cancel pending auto-send/scheduled jobs
- cps/templates/duplicates.html: Fixed blueprint endpoints, added clickable UI elements
- cps/templates/cwa_settings.html: Uncommented and fixed auto-resolution settings section

Bug Fixes:
- Fixed template crash from wrong blueprint endpoint ('editbook' vs 'edit-book')
- Fixed settings page overwriting format lists with duplicate_auto_resolve_cooldown_minutes
- Fixed permission errors by bypassing user context check for automatic deletions
- Fixed SQL query debugging output for hybrid prefilter
2026-01-25 01:24:04 +01:00
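The recursive-lock deadlock fixed in e3db8b2152 is a common pattern and can be shown in miniature. The class below is a hypothetical stand-in, not the worker in cps/services/worker.py: it assumes a non-reentrant `threading.Lock` and a `tasks` property that takes that lock.

```python
import threading


class WorkerQueue:
    """Toy task queue guarded by a non-reentrant lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.queue = []

    @property
    def tasks(self):
        # Takes the lock, so it must never be read while the lock is held.
        with self._lock:
            return list(self.queue)

    def cancel_tasks_for_book(self, book_id):
        with self._lock:
            # Reading self.tasks here would try to acquire self._lock a second
            # time on the same thread and block forever, so the underlying
            # list is accessed directly instead.
            keep = [t for t in self.queue if t["book_id"] != book_id]
            cancelled = len(self.queue) - len(keep)
            self.queue = keep
        return cancelled
```

Switching to `threading.RLock` would be another way out of the same problem; direct access is shown here because that is what the commit message describes.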
crocodilestick 9571f89a62 [bug] Can't save CWA Settings on latest DEV-335 due to issues with duplicate detection settings
Fixes #903
2026-01-24 22:55:52 +01:00
crocodilestick 52fb64455e Fix: Remove leftover git merge conflict marker from cwa_schema.sql
Resolves SQL syntax error: near '>>': syntax error

The merge conflict marker '>>>>>>> origin/main' at line 95 was causing:
- cwa-update-notification-service failures
- translation-notification-service failures
- ingest_processor crashes
- All services depending on CWA_DB initialization to fail

This was missed during the merge conflict resolution in commit 646544b.
2026-01-24 21:37:37 +01:00
crocodilestick 646544b820 Merge main into auto-hardcover-id - sync with latest changes and resolve conflicts 2026-01-24 21:25:37 +01:00
CrocodileStick c4ac747797 Merge branch 'main' into checksum-split-awareness 2026-01-23 23:04:12 +01:00
crocodilestick d67c1db641 refactor: improve split library error handling and add test coverage
Improvements to the split library implementation in generate_book_checksums.py:

- Rename books_path() to get_books_path() for better naming convention
- Add NULL-safe handling for config_calibre_split_dir database field
- Check config_calibre_split flag before using split path (not just path existence)
- Replace sys.exit() with graceful warnings and fallback to library_path
- Remove lambda in favor of explicit, readable conditional logic
- Enhance logging to clearly indicate when split library mode is active
- Improve docstrings and inline comments for clarity

Add comprehensive test coverage (177 lines, 4 test cases):

- test_split_library_with_separate_paths - core split library functionality
- test_books_path_falls_back_to_library_path - invalid path handling
- test_books_path_with_none_value - default behavior without --books-path
- test_split_library_with_multiple_formats - multi-format books in split mode

All tests follow existing patterns using subprocess execution, helper functions,
and tmp_path isolation for consistency with the test suite.
2026-01-23 22:49:41 +01:00
crocodilestick 1011dd509a Phase 3: incremental duplicate scans, debounced scheduling, and metadata-safe title normalization 2026-01-15 16:48:50 +01:00
crocodilestick 689d223195 feat(duplicates): finalize Phase 2 scanning with hybrid detection, background tasking, cron scheduling, and UI feedback
default hybrid detection + SQL prefilter; updated defaults in schema
duplicate scans now run as background tasks with progress + cancel
debounced after-import scans and cron-based scheduled scans
settings UI updated for cron, defaults, and explanations + next run display
duplicates page shows progress bar + next scheduled run
task queue fixes + better error/confirmation messaging
cron validation on save and ISO task date formatting
2026-01-15 14:07:24 +01:00
crocodilestick ce3c452dca Implemented performant & more reliable SQL query + python hybrid system 2026-01-15 11:51:33 +01:00
crocodilestick 1491589bbf Added SQL dupe search as fallback for python dupe search as I couldn't get it to work as reliably. Disabled for now, might come back to in the future 2026-01-14 18:57:58 +01:00
crocodilestick 63f2cbbecc Added V1 of auto-resolving duplicate detection system 2026-01-14 17:07:16 +01:00
crocodilestick 897fe0ec34 fix(cover-enforcer): Add robust error handling for race conditions in metadata log processing
Resolves #892

- Add retry logic with 3 attempts and 0.5s delays in read_log() method
- Handle FileNotFoundError, JSONDecodeError, and file existence checks gracefully
- Return None instead of crashing when log files are missing or invalid
- Update main() to exit gracefully when log file is unavailable
- Update check_for_other_logs() to skip invalid entries and continue processing
- Prevent metadata-change-detector service crashes from timing issues

This prevents the server from crashing when the metadata-change-detector
service detects a log file that gets deleted or is not yet fully written
before cover_enforcer.py attempts to read it. The service now handles
these race conditions gracefully and continues operating without requiring
manual container restarts.
2026-01-13 16:33:40 +01:00
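The retry behaviour added to `read_log()` in 897fe0ec34 can be approximated as follows. The signature is an assumption (the real method lives on a class in cover_enforcer.py); the 3 attempts, 0.5s delay, and `None` return come from the commit message.

```python
import json
import time


def read_log(path, attempts=3, delay=0.5, sleep=time.sleep):
    """Read a JSON log, tolerating files that vanish or are still being written."""
    for attempt in range(attempts):
        try:
            with open(path, "r", encoding="utf-8") as fh:
                return json.load(fh)
        except (FileNotFoundError, json.JSONDecodeError):
            # The file was deleted, or the writer has not finished yet.
            if attempt < attempts - 1:
                sleep(delay)
    return None  # caller should exit gracefully instead of crashing
```

A caller such as `main()` can then check for `None` and return early, which is what keeps the watcher service alive through the race.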
Seth Milliken f575d3258e fix: add --books-path argument to generate_book_checksums.py
Enables `generate_book_checksums` to use `config_calibre_split_dir` setting.
2026-01-09 12:22:55 -08:00
crocodilestick 7f7948cd41 Fixed comment in DB schemas causing errors 2026-01-09 14:49:15 +01:00
crocodilestick a4cff97a7c Added a notification system for duplicates that can be disabled in the cwa settings panel 2026-01-08 19:08:07 +01:00
crocodilestick 378b2facff Implement Hardcover Auto-Fetch feature
- Add confidence scoring algorithm with Levenshtein distance and Jaccard similarity
- Create background worker (TaskAutoHardcoverID) with rate limiting and exponential backoff
- Add database schema: hardcover_match_queue and hardcover_auto_fetch_stats tables
- Implement comprehensive scheduling system (10 options: 15min intervals through monthly)
- Build settings UI with token validation and dynamic schedule selectors
- Add manual trigger button in admin panel
- Create review queue UI with gradient status hero cards
- Integrate stats dashboard with auto-matched, pending review, and manually reviewed counts
- Add text similarity utilities (normalized_levenshtein_similarity, author_list_similarity)
- Enhance Hardcover provider with calculate_confidence_score method
- Extend MetaRecord with confidence_score and match_reason fields
- Add IntervalTrigger support to background scheduler
2026-01-06 00:20:40 +01:00
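To make the confidence scoring in 378b2facff more concrete, here is a small sketch of a score built from Levenshtein distance and Jaccard similarity. `normalized_levenshtein_similarity` is named in the commit, but its implementation here, the `jaccard_similarity` and `confidence_score` helpers, and the 0.6/0.4 weighting are assumptions, not the project's algorithm.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalized_levenshtein_similarity(a, b):
    """1.0 for identical strings (ignoring case), 0.0 for entirely different."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(items_a, items_b):
    """Overlap of two collections of names: |A & B| / |A | B|."""
    sa = {x.strip().lower() for x in items_a}
    sb = {x.strip().lower() for x in items_b}
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def confidence_score(title_a, title_b, authors_a, authors_b, title_weight=0.6):
    """Hypothetical weighted blend of title and author similarity."""
    title = normalized_levenshtein_similarity(title_a, title_b)
    authors = jaccard_similarity(authors_a, authors_b)
    return title_weight * title + (1 - title_weight) * authors
```

A score like this can then be compared against a threshold to decide between auto-matching and sending the book to the review queue.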
crocodilestick 70d2e3a6c9 Fix #866: Validate book ID before adding format to prevent duplicates
- Add _validate_book_exists() to check book ID against metadata.db before add_format
- Enhance add_format_to_book() with upstream validation and improved error reporting
- Preserve failed manifests as .cwa.failed.json for debugging
- Add book existence check in editbooks.py before creating upload manifest
- Properly handle invalid book IDs by backing up to failed/ with clear error messages
2026-01-05 12:40:23 +01:00
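The existence check introduced in 70d2e3a6c9 could be as simple as the query below. This is a sketch: the function name and signature are assumptions (the commit's helper is `_validate_book_exists()`), and it relies only on Calibre's metadata.db having a `books` table keyed by `id`.

```python
import sqlite3


def validate_book_exists(metadata_db_path, book_id):
    """Return True if book_id is present in the books table of metadata.db."""
    try:
        book_id = int(book_id)
    except (TypeError, ValueError):
        return False  # non-numeric IDs can never match a book
    con = sqlite3.connect(metadata_db_path)
    try:
        row = con.execute("SELECT 1 FROM books WHERE id = ?", (book_id,)).fetchone()
    finally:
        con.close()
    return row is not None
```

Running this check before `add_format` means an invalid ID fails early, before anything is written to the library.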
CrocodileStick 509ada7a59 Merge pull request #863 from navels/fix/ingest-metadata-autosend
Fix auto-send and auto-metadata fetching
2026-01-03 14:05:39 +01:00
crocodilestick bb9d932706 Added stats tracking for magic shelves 2026-01-03 02:20:28 +01:00
Lee Nave 0f3c2efa5c Improve ingest metadata and auto-send 2026-01-02 21:16:15 +00:00
crocodilestick a62ad9e122 Fixed issues with the User Activity stats 2025-12-31 00:47:52 +01:00
crocodilestick 01229a750c Added Rating Statistics, Top Enforced Books & Import Source Flows to Library Stats tab 2025-12-30 21:22:53 +01:00
crocodilestick 1d502a291b Added API Usage stats tab 2025-12-30 20:02:02 +01:00
crocodilestick 78ec7d35d8 Added Average Session Duration, Search Success Rate & Shelf activity stats to User Activity tab 2025-12-30 17:49:44 +01:00
crocodilestick 52f1f5a2e4 Added Largest Series, Publication Year Distribution & Most Fixed Books sections to Library Stats tab 2025-12-30 16:13:12 +01:00
crocodilestick b97506153b Added tab for library stats 2025-12-30 15:29:09 +01:00
crocodilestick bcf604017f Added stat sections for Book Discovery Methods, Device Breakdown & Failed Auth Attempts 2025-12-30 14:04:28 +01:00
crocodilestick 65eb2dcc68 Added Peak Usage Hours heat map, reading velocity & format preferences by user 2025-12-30 13:52:23 +01:00
crocodilestick 880aff67e9 Added user specific filtering to user stats dashboard 2025-12-30 13:05:37 +01:00
crocodilestick 0e199f5fc5 Top Users usernames now link to user edit page 2025-12-30 12:31:42 +01:00
crocodilestick 4288ee2bb8 added user selectable time period functionality 2025-12-30 11:15:22 +01:00
crocodilestick 1949261110 Version 1 User Activity Stats page 2025-12-29 16:31:14 +01:00
crocodilestick d2fe1f1657 - Fixes #818: Internal API calls (ingest, library conversion, etc.) now respect SSL configuration instead of forcing HTTP.
- Added `get_internal_api_url` helper to dynamically construct localhost URLs based on cert/key presence.
- Updated `ingest_processor.py`, `cwa_functions.py`, and `tasks/ops.py` to use the new helper.
- Added unit tests for internal API URL generation.
- Updated Dockerfile HEALTHCHECK to fall back to HTTPS if HTTP fails.
- Upgraded external links to HTTPS in db.py (ISFDB) and kobo.py (Rakuten/Help).

[bug] DB fails to update when adding new book if using TLS/SSL
Fixes #818
2025-12-08 22:15:30 +01:00
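A helper like the `get_internal_api_url` mentioned in d2fe1f1657 might be shaped roughly as follows. Only the name and the idea of choosing the scheme from cert/key presence come from the commit; the parameters, the default port of 8083, and the example endpoint are assumptions.

```python
import os


def get_internal_api_url(path, port=8083, certfile=None, keyfile=None):
    """Build a localhost URL, using HTTPS only when both cert and key exist."""
    use_tls = bool(
        certfile and keyfile
        and os.path.isfile(certfile)
        and os.path.isfile(keyfile)
    )
    scheme = "https" if use_tls else "http"
    return f"{scheme}://localhost:{port}/{path.lstrip('/')}"
```

For example, `get_internal_api_url("/some/endpoint")` yields an `http://` URL when no certificate is configured, so existing non-TLS setups keep working.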