- Use CollectionSchema instead of ReturnType<typeof collectionsStore.getCollection>
in AddRoms.vue and RemoveRoms.vue (simpler, per gantoine review)
- Wrap bulk INSERT in a savepoint so a concurrent duplicate-key violation
is caught via IntegrityError and ignored rather than aborting the transaction
- Only bump Collection.updated_at in remove_roms_from_collection when rows
were actually deleted (rowcount > 0), matching add_roms_to_collection behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace full rom_ids list replacement with atomic POST/DELETE endpoints
that add or remove individual ROMs from a collection. This prevents
concurrent rapid clicks from overwriting each other (last-write-wins).
Also fix missing session.flush() in add_rom_user() and add collection
endpoint tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pylance infers dict[str, Unknown] from the {**old, "highest_award_kind": ...}
spread, which then fails to assign to list[RAUserGameProgression]. Wrap in
typing.cast so the TypedDict type survives the reassignment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Callers now pass the full platform dict and rom.fs_extension; the service
normalizes the extension (optional leading dot, case-insensitive) before
checking the compressed-archive skip set, so ROMs stored with bare
extensions like "zip" correctly hit the skip path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RAHasher was being spawned for every hashable ROM regardless of file
type. When the source file is a zip/7z/tar and the RA platform needs
an on-disk disc image (PSX, PS2, PSP, Saturn, Dreamcast, Sega CD,
3DO, PC-FX, Neo Geo CD, TurboGrafx CD, Atari Jaguar CD, Wii), the
subprocess fails with "Unsupported console for buffer hash: {id}"
after paying full process-spawn overhead per ROM — a serious slowdown
when indexing large zipped collections (e.g. myrient PS2/PSP sets).
calculate_hash now short-circuits those combinations with a debug log
and no subprocess. Raw disc images (.iso, .chd, .cue/.bin) and
archives on cartridge platforms still go through RAHasher as before.
Also centralize COMPRESSED_FILE_EXTENSIONS in utils/filesystem.py so
roms_handler (is_compressed_file / hashing), rahasher (skip logic),
and feeds (PKGi passthrough) share one source of truth. The shared
set adds .rar, which is_compressed_file now recognizes too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
get_rom now also fetches Named_Snaps, Named_Titles, and Named_Logos
when the matching MetadataMediaType (SCREENSHOT, TITLE_SCREEN, LOGO)
is in SCAN_MEDIA. Box art is still fetched unconditionally — it drives
url_cover and libretro_id. Matching extras are appended to
url_screenshots so the scan_handler artwork priority loop picks them
up without further changes. All enabled listings are fetched
concurrently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SGDB's url_cover used to clobber whatever the artwork priority loop
picked, which meant user-set priorities (e.g. libretro > sgdb) and
manually uploaded covers were silently overridden. SGDB now only
replaces the current url_cover when it outranks every other source
that produced one under SCAN_ARTWORK_PRIORITY, and never over a
manual cover preserved on UPDATE/UNMATCHED scans. Default artwork
priority gains sgdb at the top so existing "SGDB wins" behavior is
preserved for default configs.
On the /search/roms endpoint, libretro is now an enrichment source
alongside SGDB instead of a primary match source: it decorates
entries resolved by IGDB/Moby/SS/Flashpoint/Launchbox with
libretro_id and libretro_url_cover, mirroring how SGDB works.
get_matched_roms_by_name is removed from the libretro handler since
nothing else calls it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Commit 3991e1b6e removed `@with_details` from `get_roms_by_fs_name` but
left the body using the `query` parameter that decorator was supposed
to inject, so every scan hit `'NoneType' object has no attribute
'filter'` and crashed the platform identification task.
Make the function self-contained: build `select(Rom)` directly and
eager-load only `Rom.platform`, the one relationship the scan loop
actually needs (via `rom.platform_slug` / `rom.platform.fs_slug`).
Keeps the prior commit's intent of avoiding the heavy `with_details`
eager-load on every batch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collapse the two parallel id lists and their mirrored chunked-update
loops into a `flips: dict[bool, list[int]]` keyed by desired state, and
drop unused rom assignments in the related tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the libretro thumbnail repository as a first-class artwork source so
region-correct box art (PAL/Europe, Japan, etc.) can be matched directly
to ROM filenames, addressing rommapp/romm#3239.
Implementation follows the SGDB handler pattern (artwork-only, no game
metadata): MetadataSource enum entry, scan-time fetch wired into the
SCAN_ARTWORK_PRIORITY loop, /search/roms integration, MatchRom dialog
chip + cover selection, and a heartbeat flag.
Matching is exact case-insensitive against the directory listing first
(so a ROM named "(Europe)" lands on the (Europe) artwork), with a
JaroWinkler fuzzy fallback at 0.8 that strips parenthetical tags from
both sides. Listings are cached in Redis with a 24h TTL.
`libretro_id` is persisted on the Rom model as the SHA1 hex of the
matched libretro filename — stable across scans, distinct per region,
indexed for lookup. Migration 0077 adds the column.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The scan was spending excessive time on large platforms even when all ROMs
were already scanned. Root causes: per-ROM UPDATE queries for skipped ROMs
(10k individual writes), missing composite index on (platform_id, fs_name)
causing full table scans, NOT IN clauses with 10k+ values in
mark_missing_roms(), and redundant filesystem reads.
Changes:
- Add bulk_mark_present() for batch-updating skipped ROMs in one query
- Move skip detection from _identify_rom to the batch loop so skipped ROMs
never enter the async scan pipeline, and report progress for them
- Add composite index idx_roms_platform_id_fs_name via migration 0077
- Rewrite mark_missing_roms() with flip-based approach: mark all missing,
then un-mark present ones in chunks of 1000
- Cache filesystem reads in scan_platforms() to avoid double directory
traversal (precounting + scanning)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem
_check_content_type used the full Content-Type header string (lowercased) and matched it with startswith(...) against allowed prefixes.
That is mostly fine when the server sends a bare type like application/pdf. It breaks down when vendors send parameters on the same header (e.g. name="…", charset=…). In theory application/force-download; name="…" should still start with application/force-download, but in practice you can get:
Leading whitespace or a UTF‑8 BOM before the type token, so the string no longer starts with your prefix even though the MIME type is correct.
Confusing logs: logging only the lowercased full header is fine, but the decision should be based on the standardized MIME essence (type + subtype, no parameters), which is what other stacks use for “what is this?”
So the fix is to parse the header the usual way and only then apply your allowlist.
What changed
_content_type_essence(header_value)
Takes everything before the first ; (the essence).
Strips whitespace, lowercases, strips a leading BOM (\ufeff) so odd clients/proxies don’t break the check.
_check_content_type
Reads the raw content-type header once.
Runs startswith on the essence, not on the full header with parameters.
Rejects if the essence is empty (missing or useless header).
Logging uses the raw header string (or (missing header)), so operators still see exactly what the server sent.
Call sites and allowed prefixes (image/, application/pdf, etc.) are unchanged; only how the string is normalized before comparison changes.
Security / SSRF
This does not replace URL / SSRF controls; it only makes post-fetch type checking consistent with how Content-Type is defined (essence vs parameters). You are not widening the allowlist—same prefixes, stricter handling of “empty” and clearer matching on the actual type token.
Risk / regression
Low: same allowed prefixes, strictly more tolerant of benign formatting (whitespace, BOM, parameters). The only stricter case is empty essence after strip (e.g. malformed header), which correctly fails the check.
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
I have reviewed the proposal and these edits will handle cases where the string we match against for the content_type is cleaned up more before comparing against the allow list of content_types.
I have tested this, and confirm that I do not get any errors loading PDFs for game manuals using this. Please consider this, as this should be compatible with the existing content type allowlist, and easily work with any new types added to it.