Compare commits

...

237 Commits

Author SHA1 Message Date
panni 2006ebb244 2.0.20.1364 RC9 2017-05-24 21:47:51 +02:00
panni 58c852cdba submod: OCR update eng data 2017-05-24 21:41:51 +02:00
panni 9e77a8e304 update guessit to d96859d056864b8956cbeb8c8f5bb6875d270e39 2017-05-24 21:40:12 +02:00
panni e9817f1e0d bump version 2017-05-24 18:03:53 +02:00
panni 123dde7b8f don't verify hashes of specials 2017-05-24 18:02:23 +02:00
panni c1b84eabdb improve specials support (opensubtitles), mostly for manual listing 2017-05-24 16:24:23 +02:00
panni c7ececde77 add doc 2017-05-23 23:06:12 +02:00
panni 6f305d636e make legandastv subtitle picklable for availablesubsforitem 2017-05-23 22:37:00 +02:00
panni d25990895c something's wrong with the menu history key here; add error debug 2017-05-23 22:12:22 +02:00
panni d406ced759 bump version 2017-05-23 18:08:28 +02:00
panni b858b56120 add hearing_impaired_verifiable per provider/subtitle and only bail out on force-non-hi if necessary; #289 2017-05-23 18:07:56 +02:00
panni c94fe81dbf bump dev version 2017-05-23 17:54:49 +02:00
panni a67bbebb84 Merge remote-tracking branch 'origin/develop-2.0' into develop-2.0 2017-05-23 13:00:38 +02:00
panni cf577c81e1 submod: OCR fixes: compile new dictionaries 2017-05-23 13:00:27 +02:00
panni ad236be02c submod: OCR fixes: and more. 2017-05-23 12:59:37 +02:00
panni 3412e379d6 submod: better unopened/unclosed font tag handling 2017-05-23 12:49:46 +02:00
panni 95f240ab07 submod: HI: HI_before_colon broke font style tags 2017-05-23 12:17:54 +02:00
panni 0c8ae3f45b submod: update eng OCR fix data 2017-05-23 11:58:08 +02:00
pannal fe87944049 Update README.md 2017-05-22 05:48:50 +02:00
pannal d7918b1714 Update README.md 2017-05-22 05:16:07 +02:00
panni c147c29756 add wiki notice to notify_executable pref 2017-05-22 02:33:18 +02:00
panni 5a4a50bc9d add note about enforce_encoding 2017-05-22 02:30:49 +02:00
panni 55ea4009c9 rename exotic_ext prefs to reflect its current function 2017-05-22 02:28:59 +02:00
panni 536fd7dfe4 bump dev version 2017-05-22 02:13:12 +02:00
panni a1f6568b84 only use the first video stream #270 2017-05-22 02:11:35 +02:00
panni 6a9112f03c add more known info about the media file/streams; resolves #270 2017-05-22 02:10:26 +02:00
panni 89b4305ccb don't query plex item twice in case of movies 2017-05-22 01:26:26 +02:00
panni e2756e85b7 2.0.19.1337 RC8 2017-05-21 15:52:37 +02:00
panni 0f7bc36e86 add fixme 2017-05-21 15:50:12 +02:00
panni 5e20032976 fix findbetter 2017-05-21 15:40:37 +02:00
panni c7dbac05a9 update guessit to 8d56c9f 2017-05-21 15:35:06 +02:00
panni a0a5adb807 remove info log 2017-05-21 06:19:41 +02:00
panni ac6a43f6e5 re-up recently to 2 weeks and 1000 items 2017-05-21 06:13:59 +02:00
panni 91f57da735 fix findallrecentlymissing 2017-05-21 06:13:29 +02:00
panni 488ac604f9 better debug info for findbettersubtitles 2017-05-21 04:09:33 +02:00
panni 70ab3e456f add missing info to hints and video_info 2017-05-21 04:00:36 +02:00
panni d0017d2ab8 fix 2017-05-21 03:41:44 +02:00
panni 9633abc09e ditch OMDB refiner support for now. all needed info comes from the PMS 2017-05-21 01:49:51 +02:00
panni 8f608acc71 submod: OCR update data 2017-05-20 22:31:28 +02:00
panni dbce582bdf submod: skip empty line post processors when not needed 2017-05-20 22:24:29 +02:00
panni 62f03bcf11 submod: fix not opened/closed font tags after modification 2017-05-20 16:20:27 +02:00
panni 530eb9ef66 adapt 1.4 readme 2017-05-20 05:22:39 +02:00
panni 497a94e3a5 submod: update dictionaries from SE 2017-05-20 04:07:40 +02:00
panni e17082d27e task allrecentlymissing: fix logging 2017-05-20 02:17:41 +02:00
panni 2eefb8e225 fixes; lower default recently added to 1 week 2017-05-20 01:26:38 +02:00
panni 5d9b1a1810 don't re-guess encoding when saving modified subtitle 2017-05-20 00:42:38 +02:00
panni f274e76253 submod: simplify 2017-05-20 00:35:34 +02:00
panni 3bfef7f67b submod: break mods.modify up to make it smaller 2017-05-20 00:34:09 +02:00
panni 5d6651e00e submod: HI: remove obsolete fixme 2017-05-19 23:33:40 +02:00
panni f0ed0b7c41 submod: common: move CM_double_apostrophe further up the chain 2017-05-19 23:29:49 +02:00
panni 0d4bf7b6b3 submod: common: CM_uppercase_i_in_word: support "WeII" aswell 2017-05-19 23:23:55 +02:00
panni a5c7c656e6 set get my logs link as title2 also 2017-05-19 23:15:18 +02:00
panni fb3a937c81 submod: add performance debug 2017-05-19 23:11:24 +02:00
panni e50820abd0 submod: common: fix CM_uppercase_i_in_word 2017-05-19 23:03:17 +02:00
panni 083084136c don't fall back to utf-8, we should be good here 2017-05-19 22:57:20 +02:00
panni 0188b81220 clarify 2017-05-19 22:55:09 +02:00
panni c7468dbfb5 submod: OCR add more eng data 2017-05-19 22:53:19 +02:00
panni d92ba7125e in case of microdvd, try guessing the fps from the file, else suggest the FPS from our media file. add docs 2017-05-19 22:52:05 +02:00
panni 050d5dd063 add config.enforce_encoding to debug log 2017-05-19 21:54:37 +02:00
panni a860c57bd1 when force-utf8 is enabled, also store subtitle content in utf-8 2017-05-19 21:52:19 +02:00
panni 1b0b189c16 add more encodings for western, eastern and northern europe 2017-05-19 18:51:52 +02:00
panni 7d2b3d6663 add our pysubs2 to_unicode encoder to PatchedSubtitle; add iso-8859-2 for polish; 2017-05-19 18:42:31 +02:00
panni 2899d68973 add fps to napiprojekt subtitle for when it can't be guessed from the MicroDVD format contents 2017-05-19 18:28:14 +02:00
panni 0cc8238b1a don't trigger text conversion more than once in is_valid 2017-05-19 17:55:05 +02:00
panni f277751d86 don't blerg all of the subtitle content into stdout; log the traceback for pysubs2 2017-05-19 17:51:58 +02:00
panni 74d63a9144 2.0.19.1299 RC7 2017-05-19 14:51:22 +02:00
panni 07f7b4e7fb add fixme 2017-05-19 14:42:58 +02:00
panni 92fda093f7 submod: CM_spaces_in_numbers: don't break up ellipses 2017-05-19 14:38:33 +02:00
panni 714751d2d8 submod: merge mergeable mods; skip duplicate exclusive mods early; make offset args mergeable to avoid nasty stuff like negative offset first, then positive 2017-05-19 14:29:59 +02:00
panni 2c949192b2 submod: improve processing performance by adding some shortcuts 2017-05-19 14:08:36 +02:00
panni c0e3c6a0eb submod: improve processing performance by feeding line mods already cleaned-up lines 2017-05-19 13:43:30 +02:00
panni 764484f735 submod: add fixed order to line mods 2017-05-19 03:29:05 +02:00
panni 208bd4fcb2 reset last order change 2017-05-19 03:28:44 +02:00
panni ba53a5fa93 add more stuff to test.srt 2017-05-18 13:55:50 +02:00
panni 4d40da5661 submod: common: leading crocodile can also have a space in front 2017-05-18 13:49:36 +02:00
panni 4ab157e2a1 submod: re_processor: clean font style tags before processing the line 2017-05-18 13:47:36 +02:00
panni dbf64d2a2b submod: HI: make bracket detection more aggressive 2017-05-18 13:44:55 +02:00
panni 03d4ee3482 submod: HI: add HI_starting_upper_then_sentence 2017-05-18 13:17:43 +02:00
panni 959a061380 submod: set default order 2017-05-18 13:17:20 +02:00
panni f5432dfb9e submod: OCR: more eng default fixes 2017-05-18 13:16:59 +02:00
panni fb494a911d fix character ranges 2017-05-17 20:15:45 +02:00
panni bc9dec659c submod: update uppercase after dot to be less greedy 2017-05-17 20:12:20 +02:00
panni b68cc3f61e submod: use À-Ž instead of A-Z for patterns 2017-05-17 20:00:38 +02:00
panni 0db80add2c submod: common: fit non-uppercase after dot 2017-05-17 19:56:41 +02:00
panni 2a67632497 update OCR fix data 2017-05-17 19:17:04 +02:00
panni 5260b28c15 submod: HI: be less aggressive on removing text-before-colon 2017-05-17 19:14:26 +02:00
panni 4d365cba22 submod: don't fix countdown numbers 2017-05-17 19:02:42 +02:00
panni 8174a8efc3 submod fixes english: Âs='s 2017-05-17 19:01:07 +02:00
panni a5d8df35b6 more stuff for the readme 2017-05-17 18:50:06 +02:00
panni 0ad429ffaa add automation 2017-05-17 15:23:05 +02:00
panni 3108572387 move changelog for now 2017-05-17 15:17:28 +02:00
panni 98a406ff9e revert, preformatted looks better 2017-05-17 15:16:51 +02:00
panni 9257550e56 update readme for mods 2017-05-17 15:15:56 +02:00
panni ef19ed0a26 update readme for mods 2017-05-17 15:13:43 +02:00
panni 80daa8560d first version of the 2.0 readme 2017-05-17 15:06:53 +02:00
panni 797cc16a91 add cleanline processor; remove Mr->Mr. as it's valid in the UK 2017-05-17 14:26:19 +02:00
panni 771e0464d7 update OCR fixes 2017-05-17 13:51:48 +02:00
panni 715e9c0015 2.0.19.1267 RC6 2017-05-16 18:10:59 +02:00
panni d13a0c4fb3 submod: allow for more punctuation in spaced numbers; add more english OCR fixes 2017-05-16 17:55:49 +02:00
panni 2bb0517264 correctly handle partiallines 2017-05-16 17:46:56 +02:00
panni ac174673ef fix major whoopsie in item details 2017-05-16 14:22:31 +02:00
panni dacab5ece7 enzyme: fix logging; skip element without type 2017-05-16 14:22:22 +02:00
panni 69a5ef6f18 common fixes: test for leading ellipsis earlier to skip unnecessary CM_ellipsis_no_space 2017-05-16 14:15:19 +02:00
panni 47be8eef62 HI: improve all caps line matching (allow some punctuation) 2017-05-16 14:13:41 +02:00
panni fe7760e779 color mod: return the original line if color not found 2017-05-16 13:50:05 +02:00
panni 18dddaf0a1 add our own dictionaries to submod fixes 2017-05-16 13:44:23 +02:00
panni b32066e6f8 don't bother listing unexistant parts in item details menu 2017-05-16 13:37:02 +02:00
panni eca378c09e submod: fix patterns for beginlines/endlines 2017-05-16 13:34:28 +02:00
panni 2c3e4173f4 only append extension to jsonpath if necessary; bail out correctly 2017-05-16 13:00:59 +02:00
panni 488a65055b cache guessed encoding and don't re-guess every time 2017-05-15 18:47:52 +02:00
panni cb94f0c2c6 remove invalid comment 2017-05-15 18:09:31 +02:00
panni 8dc4cf8d63 subtitle history: don't fail on old dict data 2017-05-15 18:07:39 +02:00
panni 82ec5e0d5e only store subtitle info if save was successful 2017-05-15 18:02:33 +02:00
panni 91cebd2902 store encoding of subtitle in storage; store unicode version; add migration task 2017-05-15 18:00:51 +02:00
panni cecee18d8e implement new json/gzip based subtitle storage format; auto-migrate legacy data 2017-05-15 17:01:20 +02:00
panni 2b1ea2eb6f add json_tricks 3.9.0 2017-05-15 16:11:28 +02:00
panni bc67b380e5 Merge remote-tracking branch 'origin/develop-2.0' into develop-2.0 2017-05-14 02:53:36 +02:00
panni b7b784f442 clarify not found preferences.xml 2017-05-14 02:53:24 +02:00
pannal 6889effbb6 Update README.md 2017-05-14 02:44:43 +02:00
panni ae7865ecb8 2.0.18.1245 RC5 2017-05-14 02:31:25 +02:00
panni 83c9d4887b rename Auto-search to Force-find 2017-05-14 02:26:31 +02:00
panni 75da4dab70 clear up already decoded debug info 2017-05-14 02:25:14 +02:00
panni 07fccf9b52 shift_offset should be non-exclusive 2017-05-14 02:15:20 +02:00
panni 6cfafd60ef add full color range; add color submod menu 2017-05-14 02:13:12 +02:00
panni b24bd740c2 fix stupidity. add newline to subtitle line index 2017-05-14 01:36:38 +02:00
panni 6c81ee7b3a addic7ed: format also matches if release group was correct 2017-05-14 01:33:48 +02:00
panni cd00194819 add more debug 2017-05-14 01:24:19 +02:00
panni 0eda52e3b2 update readme 2017-05-13 16:47:29 +02:00
panni 56de3b5658 again 2017-05-13 15:00:37 +02:00
panni b8f31fc36f forgot version 2017-05-13 15:00:31 +02:00
panni 7354110d2f pre-release 2.0.15.1234 RC4 2017-05-13 14:59:15 +02:00
panni c08335b5a8 fail miserably when last-resort utf-8 encoding fails also 2017-05-13 14:49:43 +02:00
panni f4d9a3c65c add color mod; add to_unicode to submod 2017-05-13 06:32:40 +02:00
panni 174b73a5cb doc 2017-05-13 04:55:45 +02:00
panni 5df5123682 simplify data patterns 2017-05-13 04:32:29 +02:00
panni 1aef828fcd debug mods with repr; (um) = (?um) 2017-05-13 04:11:04 +02:00
panni 6401183eff increase searchallrecentlymissing wait to 5 seconds per request 2017-05-13 02:13:17 +02:00
panni 82757a2f0c apply correct path to env on non-windows 2017-05-13 02:05:15 +02:00
panni 736386bc31 try mitigating #27 2017-05-13 01:45:32 +02:00
panni 922bed81fa resolve #256 2017-05-13 01:34:20 +02:00
panni 708e8c5b14 also print SZ environment variables 2017-05-13 01:26:17 +02:00
panni 1e02082472 don't fail on metadata query timeout 2017-05-13 01:20:10 +02:00
panni 9599bcb70f searchallrecentlymissing: don't error on timeout; don't fail on no current mods 2017-05-13 01:17:48 +02:00
panni dad8460574 correctly handle multiple media files with multiple parts; honor physical ignore in missing subtitles 2017-05-12 18:23:53 +02:00
panni 021d12963f update provider test; add custom repr for napiprojektsubtitle 2017-05-12 16:30:24 +02:00
panni e5599650ac implement custom user agent (for OS) 2017-05-12 15:29:44 +02:00
panni 22a1eff98e backport provider download retry behaviour 2017-05-12 01:28:33 +02:00
panni 2e05eb91ca also discard provider 2017-05-12 01:18:43 +02:00
panni 031e035a50 2.0.15.1216 RC3 2017-05-08 17:56:25 +02:00
panni 02374575bc add missing thread.sleep 2017-05-08 17:54:57 +02:00
panni adef9e1014 only retry on specific RequestExceptions 2017-05-08 17:51:04 +02:00
panni 5bb3f15332 only retry on RequestException 2017-05-08 17:46:44 +02:00
panni 089e0d5d6c use WholeLineProcessor for WholeLines 2017-05-08 17:40:20 +02:00
panni 513bc2ae8b use correct sys.modules path; add non-refreshing local subtitle search 2017-05-08 06:01:14 +02:00
panni 8a1c61ac22 2.0.15.1209 RC2 2017-05-08 05:34:32 +02:00
panni 3e1910a28b 2.0.15.1209 RC2 2017-05-08 04:07:24 +02:00
panni b5e5341436 add generic back options in sub menus 2017-05-08 03:59:53 +02:00
panni 223ef16583 add back menu items for season/episodes 2017-05-08 03:40:07 +02:00
panni 114312e1e5 rename leeway to sleep_after_request 2017-05-08 02:30:36 +02:00
panni 1a49159b64 by default don't download better subtitles for manually modified ones 2017-05-08 02:22:47 +02:00
panni d0ee9badb2 don't cleanup matching custom or embedded tag 2017-05-08 02:08:34 +02:00
panni b9116c30ed debounce crucial items in advanced menu 2017-05-08 02:03:22 +02:00
panni d7e6436d8d stagger less 2017-05-08 01:41:40 +02:00
panni c039172880 stagger thread creation on scheduled and manual (GUI) triggered tasks; react faster on requested task run 2017-05-08 01:39:34 +02:00
panni bd5da47370 adjust leeway to 0.2s 2017-05-08 01:29:17 +02:00
panni e9aabe0a5e spawn scheduled tasks in separate threads 2017-05-08 01:26:59 +02:00
panni f3f09dbb9d stagger SearchAllRecentlyAddedMissing 2017-05-08 01:26:33 +02:00
panni 3cc8a98f67 stagger FindBetter by 1 second per item 2017-05-08 01:07:28 +02:00
panni 31e923c080 reduce sudmod shift minute range from -59/60 to -15/15 2017-05-07 22:39:49 +02:00
panni 39b3b4a0c2 move update_local_media before ignore list checking 2017-05-07 22:21:24 +02:00
panni 8470daa20f more debug info when loading stored sub info; delete invalid sub info when loading; don't fail apply_default_mods on invalid sub info 2017-05-07 06:17:03 +02:00
panni e852137baf rename titles for on-deck and recently added items menu items 2017-05-07 05:32:48 +02:00
panni 753c46d9fd move PartUnknownException to helpers; add items.set_mods_for_part; add ApplyDefaultMods and ReApplyMods to advanced menu 2017-05-07 05:32:23 +02:00
panni e06ca730a2 make amount of stored recently played items dynamic 2017-05-07 05:31:02 +02:00
panni f84e84b17b allow wrong subtitle FPS when manually listing subtitles 2017-05-07 05:16:12 +02:00
panni 4f927b272b log no better subtitles found 2017-05-07 04:41:36 +02:00
panni 662e1a93a9 store last 20 played items; shift last played item accordingly if already in last played list 2017-05-07 03:40:41 +02:00
panni e25a043457 return save_successful on save_subtitles 2017-05-07 02:47:06 +02:00
panni b32f923513 add subtitle modification debug setting; also apply mods on metadata-stored subtitles 2017-05-07 02:45:12 +02:00
panni ad8898266e mod: common: fix starting space dots 2017-05-07 02:22:37 +02:00
panni 51e87bdda5 don't crash the menu when no mods are applied on the current subtitle 2017-05-06 18:07:53 +02:00
panni f88677b0f6 fix common fixes description 2017-05-06 18:04:20 +02:00
panni fc71ec0250 remove unnecessary debounces 2017-05-06 18:00:40 +02:00
panni ca6089c220 Pre-Release 2.0.12.1180 RC1 2017-05-06 17:49:58 +02:00
panni 7cc051fd90 set default movie score to lowest (60) 2017-05-06 17:43:38 +02:00
panni 5b01fda526 adapt forced_only for new providers (disable them) 2017-05-06 17:37:31 +02:00
panni 585f6b8a4d rename config.use_activities to react_to_activities and act accordingly 2017-05-06 17:29:11 +02:00
panni 81aeba0874 use added icon instead of recent icon for recently added menu 2017-05-06 17:24:05 +02:00
panni d9133e2793 add recently played menu 2017-05-06 17:22:33 +02:00
panni 9ef740ae1f remove_HI: less aggressive bracket content matching 2017-05-06 16:53:32 +02:00
panni e54fe71e93 reduce addicted default boost to 21 2017-05-06 16:46:54 +02:00
panni 9df878b8e3 add common fixes as default; remove debug print 2017-05-06 16:46:22 +02:00
panni 1a59c267c1 remove doublequote processors, doesn't seem possible 2017-05-06 16:42:07 +02:00
panni f8a07d983b fix typo resolves #274 2017-05-06 15:28:40 +02:00
panni 1f1847f246 change doublequote regexes 2017-05-06 06:48:52 +02:00
panni a32dfd6b37 add common fixes 2017-05-06 06:14:58 +02:00
panni b1cce92e04 use positive lookahead for HI all caps line detection 2017-05-06 01:35:43 +02:00
panni fdf32439c9 don't remove dash-in-front on hearing impaired; skip empty lines properly 2017-05-06 01:26:17 +02:00
panni fc2208f9e5 bump version 2017-05-05 19:32:12 +02:00
panni 1a4eb366bb add helping indicator to FPS mod; add 30fps 2017-05-05 19:31:43 +02:00
panni b89c64a2c2 add modification management menu 2017-05-05 19:19:34 +02:00
panni 68e8f6e753 don't remove HI by default 2017-05-05 19:11:43 +02:00
panni f15cc4cb3c add offset shifter submod 2017-05-05 19:10:32 +02:00
panni 903273e3ef add advanced submods; add global (non-line) submods; test implementation of ChangeFPS mod 2017-05-05 15:39:18 +02:00
panni 1c9b744d31 move subtitle modification menu to separate file 2017-05-05 14:58:19 +02:00
panni 7c0fb29886 fix init_cache whoopsie 2017-05-05 14:58:06 +02:00
panni 2505a7510c enzyme: incorporate 0.4.2 fixes 2017-05-05 14:44:59 +02:00
panni 0a66db40a2 fix findbetter 2017-05-05 14:30:49 +02:00
panni 6c68893979 add mod.long_description; add remove_last action to subtitle modification menu 2017-05-04 20:10:35 +02:00
panni c512eab0b6 testcommit 2017-05-04 20:00:12 +02:00
panni 3cedd4bd0f try getting plex token from environment by default 2017-05-04 19:33:05 +02:00
panni 0759c5e4c6 add environment debug 2017-05-04 19:31:07 +02:00
panni ad6cf4be79 move config debug to better position; verify readability of log files 2017-05-04 19:15:38 +02:00
panni 23c3899fb2 add fixme 2017-05-04 14:30:25 +02:00
panni 1a6515a660 add platform and os to config debug 2017-05-04 14:20:29 +02:00
panni 58815a7650 use external ip fallback when logs were requested from plex.tv 2017-05-04 14:16:10 +02:00
panni c15ec9fefc disable get_logs when universal plex token is None 2017-05-04 13:49:02 +02:00
panni 0e18d59680 2.0.0.12 2017-05-03 23:12:42 +02:00
panni 2d88efa5b4 add doc 2017-05-03 23:12:26 +02:00
panni b3da7572f3 add PartialWordsAlways to OCR_fixes 2017-05-03 23:11:02 +02:00
panni 099ec4e85d remove debug print; add doc 2017-05-03 23:04:25 +02:00
panni ff88a15c61 reset initialized mods after load 2017-05-03 22:59:47 +02:00
panni 839791b0fa add OCR fixes as default; fix little whoopsie in SubtitleModifications.modify 2017-05-03 22:52:33 +02:00
panni 159a533731 add precompiled patterns to data dict; add more parsed data; add OCR fixes finally 2017-05-03 22:44:54 +02:00
panni fb5835baa4 separate ocr fix data further into line, word, partial 2017-05-03 15:19:52 +02:00
panni a3f05cd597 separat partial and full replace data 2017-05-03 15:16:22 +02:00
panni f3af1672f6 use memory cache on windows for now; add config debug logging 2017-05-03 13:33:29 +02:00
panni c984c9849b only add better subtitle if its score is higher than the minimum configured 2017-05-02 21:37:40 +02:00
panni e28d264125 language conversion test 2017-05-02 19:22:57 +02:00
panni 7166ab9502 use default mods in tasks as well 2017-05-02 18:47:58 +02:00
panni ab242c2ecb add current find/replace data 2017-05-02 18:43:45 +02:00
panni 6f829dd4c7 move xmls to xml/; add make_data and test_data script; 2017-05-02 18:43:35 +02:00
panni 3e0602cdf0 add OCRFixReplaceList dictionaries of SubtitleEdit; commit 4f43a84c354d53251614fe6fa4c1b9df92839f57; add second test srt 2017-05-02 18:03:17 +02:00
panni 67cdebfb67 make subtitle modifications a subpackage of subzero 2017-05-02 18:01:46 +02:00
panni 0f87973742 modify test.srt to accomodate for specials chars in text-before-colon; handle special chars in HI_before_colon better 2017-05-02 17:42:39 +02:00
panni 92317f7730 add task run info logging 2017-05-01 05:37:38 +02:00
panni ce936c2553 add task debug 2017-05-01 05:37:09 +02:00
105 changed files with 25128 additions and 735 deletions
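Commit cecee18d8e in the list above introduces a JSON/gzip based subtitle storage format, and 91cebd2902 stores the subtitle's encoding alongside a unicode version of its content. The sketch below illustrates the general idea using only the standard library (the plugin itself adds json_tricks 3.9.0). The function names, file name and record fields are assumptions made for illustration, not Sub-Zero's actual schema, and the code targets Python 3 rather than the plugin's Python 2.

```python
import gzip
import json
import os
import tempfile


def save_record(path, record):
    """Serialize a dict as UTF-8 JSON and write it gzip-compressed."""
    with gzip.open(path, "wb") as f:
        f.write(json.dumps(record, ensure_ascii=False).encode("utf-8"))


def load_record(path):
    """Read a gzip-compressed JSON file back into a dict."""
    with gzip.open(path, "rb") as f:
        return json.loads(f.read().decode("utf-8"))


# Hypothetical record layout: one stored subtitle with its encoding.
record = {"video_id": "1234", "encoding": "utf-8", "content": "Zażółć gęślą jaźń"}
path = os.path.join(tempfile.mkdtemp(), "1234.json.gz")
save_record(path, record)
restored = load_record(path)
```

Keeping the text as unicode inside the JSON and recording the original encoding separately means the content can be re-encoded on save without guessing the encoding again.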
+107
@@ -1,3 +1,110 @@
2.0.19.1337 RC8
- napiprojekt: fixed: couldn't convert microdvd to SRT on certain occasions
- core: when normalize to UTF-8 is enabled, also store the subtitle in UTF-8 encoding in the internal storage
- core: add more encodings for western/eastern/northern europe
- submod: OCR: update dictionaries from SubtitleEdit
- submod: common: be smarter about uppercase i's in words that should have lowercase L's
- submod: fix unopened/unclosed font style tags after modification
- core: re-enable OMDB support
- core: update guessit for better matching
- core: fix SearchAllRecentlyMissing (was broken since RC3)
2.0.19.1299 RC7
- submod: offset mods now get merged internally when applied multiple times (to avoid errors and increase performance)
- submod: improve performance
- submod: core mods (OCR, common, remove_HI) now are always applied in a fixed order internally, regardless of the order they were added in
- submod: CM_spaces_in_numbers: don't break up ellipses (30... 29... 28...)
- submod: CM_spaces_in_numbers: don't fix countdown numbers (30, 29, 28)
- submod: remove_HI: make bracket removal more aggressive
- submod: remove_HI: be less aggressive when removing text-before-colon
- submod: remove_HI: remove all-uppercase-before-sentence (THIS IS ALL UPPERCASE And here starts a sentence -> And here starts a sentence)
- submod: fix all character ranges to include non-ASCII characters
- add new README for 2.0
2.0.19.1267 RC6
- core: add new SZ subtitle storage format
- smaller data files and less cumbersome
- it will auto migrate when old data is accessed (to speed this up, use "Trigger subtitle storage migration (expensive)" in the advanced menu)
- core: performance optimizations
- addic7ed: when release group matches, assume the format matches, too (leftover change from RC5)
- submod: fix patterns for beginlines/endlines
- submod: add our own dictionaries to OCR fixes (english)
- submod: hearing impaired: also remove full-caps with punctuation inside
- submod: correctly handle partiallines
- submod: in numbers with spaces (incorrect), also allow for some punctuation (,.:')
2.0.18.1245 RC5
- core: add more debug info
- core: fix subtitle modifications (was broken in RC4, created non-usable subtitles)
- submod: add ANSI colors
- menu/submod: add color mod menu
- submod: exclusive mods now are mutually exclusive and get cleaned on duplicate
- menu/core: naming
For everyone who runs RC4: your subtitles are broken. Go to the advanced menu and trigger `Re-Apply mods of all stored subtitles` to fix them.
2.0.17.1234 RC4
- core: backport provider-download-retry implementation
- core: implement custom user agent (for OpenSubtitles)
- core/menu: correct handling of media with multiple files
- core: fix SearchAllRecentlyMissing; also wait 5 seconds between searches
- core: SearchAllRecentlyMissing: honor physical ignores
- submod: pattern fixes
- submod: better unicode handling
- submod: add color mod (only automatic for now)
2.0.15.1216 RC3
- core: fixes
- scheduler: revert some of the aggressive changes in RC2
- submod: be smarter about WholeLine matches
2.0.15.1209 RC2
- core: fixes
- core: submod-common: fix multiple dots at start of line
- core/menu: add subtitle modification debug setting
- core/menu: when manually listing available subtitles in menu, display those with wrong FPS also (opensubtitles), because you can fix them later
- core/menu: advanced-menu: add apply-all-default-mods menu item; add re-apply all mods menu item
- core: always look for currently (not-) existing subtitles when called; hopefully fixes #276
- scheduler/menu: be faster; also launch scheduled tasks in threads, not just manually launched ones
- core: don't delete subtitles with .custom or .embedded in their filenames when running auto cleanup, if the correct media file exists
- menu: add back-to-previous menu items
2.0.12.1180 RC1
- core: update subliminal to version 2
- core: update all dependencies
- core: add new providers: legendastv (pt-BR), napiprojekt (pl), shooter (cn), subscenter (heb)
- core: rewritten all subliminal patches for version 2
- menu: add icons for menu items; update main channel icon
- core: use SSL again for opensubtitles
- core: improved matching due to subliminal 2 (and SZ custom) tvdb/omdb refiners
- menu: add "Get my logs" function to the advanced menu, which zips up all necessary logs suitable for posting in the forums
- core: on non-windows systems, utilize a file-based cache database for provider media lists and subliminal refiner results
- core: add manual and automatic subtitle modification framework (fix common OCR issues, remove hearing impaired etc.)
- menu: add subtitle modifications (subtitle content fixes, offset-based shifting, framerate conversion)
- menu: add recently played menu
- improve almost everything Sub-Zero did in 1.4 :)
1.4.27.973
- core: ignore "obfuscated" and "scrambled" tags in filenames when searching for subtitles
- core: exotic embedded subtitles are now also considered when searching (and when the option is enabled); fixes #264
1.4.27.967
- core: remember the last 10 played items; only consider on_playback for "playing" state within the first 60 seconds of an item
1.4.27.965
- core: on_playback activity bugfixes
1.4.27.957
- core: correctly fall back to the next best subtitle if the current one couldn't be downloaded; hopefully fixes #231
- core: add "Scan: which external subtitles should be picked up?"-setting
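The RC7 entry above says offset mods are merged internally when applied multiple times, and the matching commit (714751d2d8) mentions avoiding "negative offset first, then positive". A minimal sketch of such a merge follows. The tuple representation of a mod and the function name are hypothetical; the plugin's real mod encoding is not shown in this changelog.

```python
from collections import OrderedDict


def merge_offsets(mods):
    """Collapse repeated offset mods into one entry per identifier.

    `mods` is a list of (identifier, seconds) tuples. This shape is a
    hypothetical stand-in for however the plugin encodes mod arguments.
    """
    merged = OrderedDict()
    for identifier, seconds in mods:
        # Sum every offset that targets the same mod identifier.
        merged[identifier] = merged.get(identifier, 0.0) + seconds
    # Drop offsets that cancel out entirely; they would be no-ops.
    return [(name, total) for name, total in merged.items() if total != 0.0]
```

Applying one summed offset instead of several sequential ones touches each subtitle line once, and it avoids intermediate states where an early negative shift would push timestamps below zero.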
+5 -12
@@ -24,12 +24,11 @@ import support
 import interface
 sys.modules["interface"] = interface
-from subliminal.cli import MutexLock
 from subzero.constants import OS_PLEX_USERAGENT, PERSONAL_MEDIA_IDENTIFIER
 from interface.menu import *
 from support.plex_media import media_to_videos, get_media_item_ids, scan_videos
 from support.subtitlehelpers import get_subtitles_from_metadata
-from support.storage import whack_missing_parts, save_subtitles, get_subtitle_storage
+from support.storage import whack_missing_parts, save_subtitles
 from support.items import is_ignored
 from support.config import config
 from support.lib import get_intent
@@ -43,13 +42,7 @@ def Start():
     HTTP.CacheTime = 0
     HTTP.Headers['User-agent'] = OS_PLEX_USERAGENT
-    try:
-        subliminal.region.configure('dogpile.cache.dbm', expiration_time=datetime.timedelta(days=30),
-                                    arguments={'filename': os.path.join(config.data_items_path, 'subzero.dbm'),
-                                               'lock_factory': MutexLock})
-    except:
-        Log.Warn("Not using file based cache!")
-        subliminal.region.configure('dogpile.cache.memory')
+    config.init_cache()
     # clear expired intents
     intent = get_intent()
@@ -191,6 +184,9 @@ class SubZeroAgent(object):
         config.init_subliminal_patches()
         videos = media_to_videos(media, kind=self.agent_type)
+        # find local media
+        update_local_media(metadata, media, media_type=self.agent_type)
         # media ignored?
         use_any_parts = False
         for video in videos:
@@ -211,9 +207,6 @@ class SubZeroAgent(object):
         set_refresh_menu_state(media, media_type=self.agent_type)
-        # find local media
-        update_local_media(metadata, media, media_type=self.agent_type)
         # scanned_video_part_map = {subliminal.Video: plex_part, ...}
         scanned_video_part_map = scan_videos(videos, kind=self.agent_type)
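The diff above replaces the inline cache setup in Start() with a single config.init_cache() call. The body of init_cache is not part of this excerpt, but the removed lines follow a "try a file-backed cache, fall back to memory" pattern. The sketch below shows that pattern in isolation. It uses the standard library's shelve module as a stand-in for dogpile.cache, all names in it are hypothetical, and it targets Python 3.

```python
import os
import shelve


def init_cache(directory):
    """Return (cache, kind): a file-backed shelf if possible, else a dict.

    shelve stands in for the dogpile.cache dbm backend here; this is an
    illustration of the fallback pattern, not the plugin's implementation.
    """
    try:
        cache = shelve.open(os.path.join(directory, "subzero_cache"))
        return cache, "file"
    except Exception:
        # e.g. the directory is missing or not writable: keep working,
        # but without persistence across restarts.
        return {}, "memory"
```

Returning the kind alongside the cache lets the caller log which backend is active, much as the removed code logged "Not using file based cache!".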
+3
@@ -18,3 +18,6 @@ sys.modules["interface.refresh_item"] = refresh_item
 import item_details
 sys.modules["interface.item_details"] = item_details
+import sub_mod
+sys.modules["interface.modification"] = sub_mod
+109 -5
@@ -3,19 +3,23 @@ import datetime
 import StringIO
 import glob
 import os
+import traceback
 import urlparse
 from zipfile import ZipFile, ZIP_DEFLATED
+from babelfish import Language
 from subzero.lib.io import FileIO
 from subzero.constants import PREFIX, PLUGIN_IDENTIFIER
-from menu_helpers import SubFolderObjectContainer, debounce, set_refresh_menu_state, ZipObject
+from menu_helpers import SubFolderObjectContainer, debounce, set_refresh_menu_state, ZipObject, ObjectContainer
 from main import fatality
 from support.helpers import timestamp, pad_title
 from support.config import config
 from support.lib import Plex
-from support.storage import reset_storage, log_storage
+from support.storage import reset_storage, log_storage, get_subtitle_storage
 from support.scheduler import scheduler
+from support.items import set_mods_for_part, get_item_kind_from_rating_key
 @route(PREFIX + '/advanced')
@route(PREFIX + '/advanced')
@@ -49,6 +53,18 @@ def AdvancedMenu(randomize=None, header=None, message=None):
         key=Callback(TriggerStorageMaintenance, randomize=timestamp()),
         title=pad_title("Trigger subtitle storage maintenance"),
     ))
+    oc.add(DirectoryObject(
+        key=Callback(TriggerStorageMigration, randomize=timestamp()),
+        title=pad_title("Trigger subtitle storage migration (expensive)"),
+    ))
+    oc.add(DirectoryObject(
+        key=Callback(ApplyDefaultMods, randomize=timestamp()),
+        title=pad_title("Apply configured default subtitle mods to all (active) stored subtitles"),
+    ))
+    oc.add(DirectoryObject(
+        key=Callback(ReApplyMods, randomize=timestamp()),
+        title=pad_title("Re-Apply mods of all stored subtitles"),
+    ))
     oc.add(DirectoryObject(
         key=Callback(LogStorage, key="tasks", randomize=timestamp()),
         title=pad_title("Log the plugin's scheduled tasks state storage"),
@@ -92,6 +108,7 @@ def Restart():
 @route(PREFIX + '/storage/reset', sure=bool)
+@debounce
 def ResetStorage(key, randomize=None, sure=False):
     if not sure:
         oc = SubFolderObjectContainer(no_history=True, title1="Reset subtitle storage", title2="Are you sure?")
@@ -127,6 +144,7 @@ def LogStorage(key, randomize=None):
 @route(PREFIX + '/triggerbetter')
+@debounce
 def TriggerBetterSubtitles(randomize=None):
     scheduler.dispatch_task("FindBetterSubtitles")
     return AdvancedMenu(
@@ -137,6 +155,7 @@ def TriggerBetterSubtitles(randomize=None):
 @route(PREFIX + '/triggermaintenance')
+@debounce
 def TriggerStorageMaintenance(randomize=None):
     scheduler.dispatch_task("SubtitleStorageMaintenance")
     return AdvancedMenu(
@@ -146,27 +165,111 @@ def TriggerStorageMaintenance(randomize=None):
     )
+@route(PREFIX + '/triggerstoragemigration')
+@debounce
+def TriggerStorageMigration(randomize=None):
+    scheduler.dispatch_task("MigrateSubtitleStorage")
+    return AdvancedMenu(
+        randomize=timestamp(),
+        header='Success',
+        message='MigrateSubtitleStorage triggered'
+    )
+def apply_default_mods(reapply_current=False):
+    storage = get_subtitle_storage()
+    subs_applied = 0
+    for fn in storage.get_all_files():
+        data = storage.load(None, filename=fn)
+        if data:
+            video_id = data.video_id
+            item_type = get_item_kind_from_rating_key(video_id)
+            if not item_type:
+                continue
+            for part_id, part in data.parts.iteritems():
+                for lang, subs in part.iteritems():
+                    current_sub = subs.get("current")
+                    if not current_sub:
+                        continue
+                    sub = subs[current_sub]
+                    if not sub.content:
+                        continue
+                    current_mods = sub.mods or []
+                    if not reapply_current:
+                        add_mods = list(set(config.default_mods).difference(set(current_mods)))
+                        if not add_mods:
+                            continue
+                    else:
+                        if not current_mods:
+                            continue
+                        add_mods = []
+                    try:
+                        set_mods_for_part(video_id, part_id, Language.fromietf(lang), item_type, add_mods, mode="add")
+                    except:
+                        Log.Error("Couldn't set mods for %s:%s: %s", video_id, part_id, traceback.format_exc())
+                        continue
+                    subs_applied += 1
+    Log.Debug("Applied mods to %i items" % subs_applied)
+@route(PREFIX + '/applydefaultmods')
+@debounce
+def ApplyDefaultMods(randomize=None):
+    Thread.CreateTimer(1.0, apply_default_mods)
+    return AdvancedMenu(
+        randomize=timestamp(),
+        header='Success',
+        message='This may take some time ...'
+    )
+@route(PREFIX + '/reapplyallmods')
+@debounce
+def ReApplyMods(randomize=None):
+    Thread.CreateTimer(1.0, apply_default_mods, reapply_current=True)
+    return AdvancedMenu(
+        randomize=timestamp(),
+        header='Success',
+        message='This may take some time ...'
+    )
 @route(PREFIX + '/get_logs_link')
 def GetLogsLink():
+    if not config.plex_token:
+        oc = ObjectContainer(title2="Download Logs", no_cache=True, no_history=True,
+                             header="Sorry, feature unavailable",
+                             message="Universal Plex token not available")
+        return oc
     # try getting the link base via the request in context, first, otherwise use the public ip
     req_headers = Core.sandbox.context.request.headers
+    get_external_ip = True
+    link_base = ""
     if "Origin" in req_headers:
         link_base = req_headers["Origin"]
         Log.Debug("Using origin-based link_base")
+        get_external_ip = False
     elif "Referer" in req_headers:
         parsed = urlparse.urlparse(req_headers["Referer"])
         link_base = "%s://%s:%s" % (parsed.scheme, parsed.hostname, parsed.port)
         Log.Debug("Using referer-based link_base")
+        get_external_ip = False
-    else:
+    if get_external_ip or "plex.tv" in link_base:
         ip = Core.networking.http_request("http://www.plexapp.com/ip.php", cacheTime=7200).content.strip()
         link_base = "https://%s:32400" % ip
         Log.Debug("Using ip-based fallback link_base")
-    logs_link = "%s%s?X-Plex-Token=%s" % (link_base, PREFIX + '/logs', config.universal_plex_token)
-    oc = ObjectContainer(title2="Download Logs", no_cache=True, no_history=True,
+    logs_link = "%s%s?X-Plex-Token=%s" % (link_base, PREFIX + '/logs', config.plex_token)
+    oc = ObjectContainer(title2=logs_link, no_cache=True, no_history=True,
                          header="Copy this link and open this in your browser, please",
                          message=logs_link)
     return oc
@@ -189,6 +292,7 @@ def DownloadLogs():
@route(PREFIX + '/invalidatecache')
@debounce
def InvalidateCache(randomize=None):
from subliminal.cache import region
region.invalidate()
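
The header-based lookup in GetLogsLink above can be sketched in isolation. This is a minimal sketch, assuming Python 3's `urllib.parse` (the plugin itself runs on Python 2 and uses `urlparse`). The names `derive_link_base`, `fallback_ip` and `default_port` are hypothetical and do not exist in the plugin. `urlparse` reports `port` as `None` when the URL carries no explicit port, so the sketch leaves the port out in that case; that is a choice made here, not something the diff does.

```python
from urllib.parse import urlparse


def derive_link_base(headers, fallback_ip=None, default_port=32400):
    """Build a base URL from request headers, else from a fallback IP.

    Hypothetical helper for illustration; not part of the plugin.
    """
    # Prefer the Origin header: it already is scheme://host[:port]
    if "Origin" in headers:
        return headers["Origin"]

    # Otherwise rebuild scheme://host[:port] from the Referer
    if "Referer" in headers:
        parsed = urlparse(headers["Referer"])
        if parsed.port is None:
            # no explicit port in the Referer, so do not append one
            return "%s://%s" % (parsed.scheme, parsed.hostname)
        return "%s://%s:%s" % (parsed.scheme, parsed.hostname, parsed.port)

    # Last resort: an externally determined IP (assumed to be supplied by the caller)
    if fallback_ip:
        return "https://%s:%s" % (fallback_ip, default_port)
    return ""
```

Keeping the lookup free of network calls makes the precedence (Origin, then Referer, then IP fallback) easy to check without a running server.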
+96 -112
@@ -1,23 +1,19 @@
# coding=utf-8
import os
import traceback
from babelfish import Language
from subzero.constants import PREFIX
from sub_mod import SubtitleModificationsMenu
from menu_helpers import debounce, SubFolderObjectContainer, default_thumb, add_ignore_options, get_item_task_data, \
set_refresh_menu_state
from refresh_item import RefreshItem
from subzero.constants import PREFIX
from support.config import config
from support.helpers import timestamp, cast_bool, df, get_language
from support.items import get_item_kind_from_rating_key, get_item, get_current_sub
from support.plex_media import get_plex_metadata, scan_videos
from support.lib import Plex
from support.storage import get_subtitle_storage, save_subtitles
from support.config import config
from support.plex_media import get_plex_metadata, scan_videos, PMSMediaProxy
from support.scheduler import scheduler
from subliminal_patch import PatchedSubtitle as Subtitle
from subzero.modification import registry as mod_registry
from support.storage import get_subtitle_storage
@route(PREFIX + '/item/{rating_key}/actions')
@@ -41,6 +37,29 @@ def ItemDetailsMenu(rating_key, title=None, base_title=None, item_title=None, ra
timeout = 30
oc = SubFolderObjectContainer(title2=title, replace_parent=True)
# add back to season for episode
if current_kind == "episode":
from interface.menu import MetadataMenu
show = get_item(item.show.rating_key)
season = get_item(item.season.rating_key)
oc.add(DirectoryObject(
key=Callback(MetadataMenu, rating_key=season.rating_key, title=season.title, base_title=show.title,
previous_item_type="show", previous_rating_key=show.rating_key,
display_items=True, randomize=timestamp()),
title=u"< Back to %s" % season.title,
summary="Back to %s > %s" % (show.title, season.title),
thumb=season.thumb or default_thumb
))
oc.add(DirectoryObject(
key=Callback(UpdateLocalMedia, rating_key=rating_key, title=title, item_title=item_title, base_title=base_title,
randomize=timestamp()),
title=u"Find local subtitles (doesn't refresh metadata)",
summary="Searches for locally available subtitles",
thumb=item.thumb or default_thumb
))
oc.add(DirectoryObject(
key=Callback(RefreshItem, rating_key=rating_key, item_title=item_title, randomize=timestamp(),
timeout=timeout * 1000),
@@ -51,7 +70,7 @@ def ItemDetailsMenu(rating_key, title=None, base_title=None, item_title=None, ra
oc.add(DirectoryObject(
key=Callback(RefreshItem, rating_key=rating_key, item_title=item_title, force=True, randomize=timestamp(),
timeout=timeout * 1000),
title=u"Auto-search: %s" % item_title,
title=u"Force-find subtitles: %s" % item_title,
summary="Issues a forced refresh, ignoring known subtitles and searching for new ones",
thumb=item.thumb or default_thumb
))
@@ -63,52 +82,76 @@ def ItemDetailsMenu(rating_key, title=None, base_title=None, item_title=None, ra
# get the plex item
plex_item = list(Plex["library"].metadata(rating_key))[0]
# get current media info for that item
media = plex_item.media
# look for subtitles for all available media parts and all of their languages
for part in media.parts:
filename = os.path.basename(part.file)
part_id = str(part.id)
has_multiple_parts = len(plex_item.media) > 1
part_index = 0
for media in plex_item.media:
for part in media.parts:
filename = os.path.basename(part.file)
if not os.path.exists(part.file):
continue
# iterate through all configured languages
for lang in config.lang_list:
lang_a2 = lang.alpha2
# ietf lang?
if cast_bool(Prefs["subtitles.language.ietf"]) and "-" in lang_a2:
lang_a2 = lang_a2.split("-")[0]
part_id = str(part.id)
part_index += 1
# get corresponding stored subtitle data for that media part (physical media item), for language
current_sub = stored_subs.get_any(part_id, lang_a2)
current_sub_id = None
current_sub_provider_name = None
# iterate through all configured languages
for lang in config.lang_list:
lang_a2 = lang.alpha2
# ietf lang?
if cast_bool(Prefs["subtitles.language.ietf"]) and "-" in lang_a2:
lang_a2 = lang_a2.split("-")[0]
summary = u"No current subtitle in storage"
current_score = None
if current_sub:
current_sub_id = current_sub.id
current_sub_provider_name = current_sub.provider_name
current_score = current_sub.score
# get corresponding stored subtitle data for that media part (physical media item), for language
current_sub = stored_subs.get_any(part_id, lang_a2)
current_sub_id = None
current_sub_provider_name = None
summary = u"Current subtitle: %s (added: %s, %s), Language: %s, Score: %i, Storage: %s" % \
(current_sub.provider_name, df(current_sub.date_added), current_sub.mode_verbose, lang,
current_sub.score, current_sub.storage_type)
part_index_addon = ""
part_summary_addon = ""
if has_multiple_parts:
part_index_addon = u"File %s: " % part_index
part_summary_addon = "%s " % filename
oc.add(DirectoryObject(
key=Callback(SubtitleOptionsMenu, rating_key=rating_key, part_id=part_id, title=title,
item_title=item_title, language=lang, language_name=lang.name, current_id=current_sub_id,
item_type=plex_item.type, filename=filename, current_data=summary,
randomize=timestamp(), current_provider=current_sub_provider_name,
current_score=current_score),
title=u"Actions for %s subtitle" % lang.name,
summary=summary
))
summary = u"%sNo current subtitle in storage" % part_summary_addon
current_score = None
if current_sub:
current_sub_id = current_sub.id
current_sub_provider_name = current_sub.provider_name
current_score = current_sub.score
summary = u"%sCurrent subtitle: %s (added: %s, %s), Language: %s, Score: %i, Storage: %s" % \
(part_summary_addon, current_sub.provider_name, df(current_sub.date_added),
current_sub.mode_verbose, lang, current_sub.score, current_sub.storage_type)
oc.add(DirectoryObject(
key=Callback(SubtitleOptionsMenu, rating_key=rating_key, part_id=part_id, title=title,
item_title=item_title, language=lang, language_name=lang.name, current_id=current_sub_id,
item_type=plex_item.type, filename=filename, current_data=summary,
randomize=timestamp(), current_provider=current_sub_provider_name,
current_score=current_score),
title=u"%sActions for %s subtitle" % (part_index_addon, lang.name),
summary=summary
))
add_ignore_options(oc, "videos", title=item_title, rating_key=rating_key, callback_menu=IgnoreMenu)
return oc
@route(PREFIX + '/item/update_local_media/{rating_key}', force=bool)
@debounce
def UpdateLocalMedia(**kwargs):
from support.localmedia import find_subtitles
rating_key = kwargs["rating_key"]
parts = PMSMediaProxy(rating_key).get_all_parts()
for part in parts:
find_subtitles(part)
kwargs.pop("randomize")
return ItemDetailsMenu(**kwargs)
@route(PREFIX + '/item/current_sub/{rating_key}/{part_id}', force=bool)
@debounce
def SubtitleOptionsMenu(**kwargs):
@@ -123,7 +166,7 @@ def SubtitleOptionsMenu(**kwargs):
oc.add(DirectoryObject(
key=Callback(ItemDetailsMenu, rating_key=kwargs["rating_key"], item_title=kwargs["item_title"],
title=kwargs["title"], randomize=timestamp()),
title=u"Back to: %s" % kwargs["title"],
title=u"< Back to %s" % kwargs["title"],
summary=kwargs["current_data"],
thumb=default_thumb
))
@@ -141,69 +184,6 @@ def SubtitleOptionsMenu(**kwargs):
return oc
@route(PREFIX + '/item/sub_mods/{rating_key}/{part_id}', force=bool)
@debounce
def SubtitleModificationsMenu(**kwargs):
rating_key = kwargs["rating_key"]
part_id = kwargs["part_id"]
language = kwargs["language"]
current_sub, stored_subs, storage = get_current_sub(rating_key, part_id, language)
kwargs.pop("randomize")
oc = SubFolderObjectContainer(title2=kwargs["title"], replace_parent=True)
for identifier, mod in mod_registry.mods.iteritems():
oc.add(DirectoryObject(
key=Callback(SubtitleApplyMod, mod_identifier=identifier, randomize=timestamp(), **kwargs),
title=mod.description
))
oc.add(DirectoryObject(
key=Callback(SubtitleApplyMod, mod_identifier=None, randomize=timestamp(), **kwargs),
title="Restore original version",
summary=u"Currently applied mods: %s" % (", ".join(current_sub.mods) if current_sub.mods else "none")
))
return oc
@route(PREFIX + '/item/sub_add_mod/{rating_key}/{part_id}/{mod_identifier}', force=bool)
@debounce
def SubtitleApplyMod(mod_identifier=None, **kwargs):
if mod_identifier is not None and mod_identifier not in mod_registry.mods:
raise NotImplementedError
rating_key = kwargs["rating_key"]
part_id = kwargs["part_id"]
lang_a2 = kwargs["language"]
item_type = kwargs["item_type"]
language = Language.fromietf(lang_a2)
current_sub, stored_subs, storage = get_current_sub(rating_key, part_id, language)
current_sub.add_mod(mod_identifier)
storage.save(stored_subs)
metadata = get_plex_metadata(rating_key, part_id, item_type)
scanned_parts = scan_videos([metadata], kind="series" if item_type == "episode" else "movie", ignore_all=True)
video, plex_part = scanned_parts.items()[0]
subtitle = Subtitle(language, mods=current_sub.mods)
subtitle.content = current_sub.content
subtitle.plex_media_fps = plex_part.fps
subtitle.page_link = "modify subtitles with: %s" % (", ".join(current_sub.mods) if current_sub.mods else "none")
subtitle.language = language
try:
save_subtitles(scanned_parts, {video: [subtitle]}, mode="m", bare_save=True)
Log.Debug("Modified %s subtitle for: %s:%s with: %s", language.name, rating_key, part_id,
", ".join(current_sub.mods) if current_sub.mods else "none")
except:
Log.Error("Something went wrong when modifying subtitle: %s", traceback.format_exc())
kwargs.pop("randomize")
return SubtitleModificationsMenu(randomize=timestamp(), **kwargs)
@route(PREFIX + '/item/search/{rating_key}/{part_id}', force=bool)
@debounce
def ListAvailableSubsForItemMenu(rating_key=None, part_id=None, title=None, item_title=None, filename=None,
@@ -223,7 +203,7 @@ def ListAvailableSubsForItemMenu(rating_key=None, part_id=None, title=None, item
oc = SubFolderObjectContainer(title2=unicode(title), replace_parent=True)
oc.add(DirectoryObject(
key=Callback(ItemDetailsMenu, rating_key=rating_key, item_title=item_title, title=title, randomize=timestamp()),
title=u"Back to: %s" % title,
title=u"< Back to %s" % title,
summary=current_data,
thumb=default_thumb
))
@@ -269,11 +249,15 @@ def ListAvailableSubsForItemMenu(rating_key=None, part_id=None, title=None, item
return oc
for subtitle in search_results:
wrong_fps_addon = ""
if subtitle.wrong_fps:
wrong_fps_addon = " (wrong FPS, sub: %s, media: %s)" % (subtitle.fps, plex_part.fps)
oc.add(DirectoryObject(
key=Callback(TriggerDownloadSubtitle, rating_key=rating_key, randomize=timestamp(), item_title=item_title,
subtitle_id=str(subtitle.id), language=language),
title=u"%s: %s, score: %s" % ("Available" if current_id != subtitle.id else "Current",
subtitle.provider_name, subtitle.score),
title=u"%s: %s, score: %s%s" % ("Available" if current_id != subtitle.id else "Current",
subtitle.provider_name, subtitle.score, wrong_fps_addon),
summary=u"Release: %s, Matches: %s" % (subtitle.release_info, ", ".join(subtitle.matches)),
thumb=default_thumb
))
+41 -8
@@ -2,11 +2,13 @@
from subzero.constants import PREFIX, TITLE, ART
from support.config import config
from support.helpers import pad_title, timestamp, df
from support.helpers import pad_title, timestamp, df, get_plex_item_display_title
from support.scheduler import scheduler
from support.ignore import ignore_list
from support.items import get_item_thumb, get_on_deck_items, get_all_items, get_items_info
from menu_helpers import main_icon, debounce, SubFolderObjectContainer, default_thumb, dig_tree, add_ignore_options
from support.items import get_item_thumb, get_on_deck_items, get_all_items, get_items_info, get_item, \
get_item_kind_from_item
from menu_helpers import main_icon, debounce, SubFolderObjectContainer, default_thumb, dig_tree, add_ignore_options,\
ObjectContainer
from item_details import ItemDetailsMenu
@@ -69,16 +71,24 @@ def fatality(randomize=None, force_title=None, header=None, message=None, only_r
oc.add(DirectoryObject(
key=Callback(OnDeckMenu),
title="On Deck items",
title="On-deck items",
summary="Shows the current on deck items and allows you to individually (force-) refresh their metadata/"
"subtitles.",
thumb=R("icon-ondeck.jpg")
))
if "last_played_items" in Dict and Dict["last_played_items"]:
oc.add(DirectoryObject(
key=Callback(RecentlyPlayedMenu),
title=pad_title("Recently played items"),
summary="Shows the %i recently played items and allows you to individually (force-) refresh their "
"metadata/subtitles." % config.store_recently_played_amount,
thumb=R("icon-played.jpg")
))
oc.add(DirectoryObject(
key=Callback(RecentlyAddedMenu),
title="Recently Added items",
title="Recently-added items",
summary="Shows the recently added items per section.",
thumb=R("icon-recent.jpg")
thumb=R("icon-added.jpg")
))
oc.add(DirectoryObject(
key=Callback(RecentMissingSubtitlesMenu, randomize=timestamp()),
@@ -168,6 +178,31 @@ def OnDeckMenu(message=None):
return mergedItemsMenu(title="Items On Deck", base_title="Items On Deck", itemGetter=get_on_deck_items)
@route(PREFIX + '/recently_played')
def RecentlyPlayedMenu():
base_title = "Recently Played"
oc = SubFolderObjectContainer(title2=base_title, replace_parent=True)
for item in [get_item(rating_key) for rating_key in Dict["last_played_items"]]:
kind = get_item_kind_from_item(item)
if kind not in ("episode", "movie"):
continue
if kind == "episode":
item_title = get_plex_item_display_title(item, "show", parent=item.season, section_title=None,
parent_title=item.show.title)
else:
item_title = get_plex_item_display_title(item, kind, section_title=None)
oc.add(DirectoryObject(
title=item_title,
key=Callback(ItemDetailsMenu, title=base_title + " > " + item.title, item_title=item.title,
rating_key=item.rating_key)
))
return oc
@route(PREFIX + '/recently_added')
def RecentlyAddedMenu(message=None):
"""
@@ -215,8 +250,6 @@ def RecentMissingSubtitlesMenu(force=False, randomize=None):
thumb=get_item_thumb(item) or default_thumb
))
scheduler.clear_task_data("MissingSubtitles")
return oc
+65 -11
@@ -1,5 +1,8 @@
# coding=utf-8
import locale
import logging
import os
import logger
from item_details import ItemDetailsMenu
@@ -11,10 +14,10 @@ from advanced import DispatchRestart
from subzero.constants import ART, PREFIX, DEPENDENCY_MODULE_NAMES
from support.scheduler import scheduler
from support.config import config
from support.helpers import timestamp, df
from support.helpers import timestamp, df
from support.ignore import ignore_list
from support.items import get_all_items, get_items_info, \
get_item_kind_from_rating_key
get_item_kind_from_rating_key, get_item
# init GUI
ObjectContainer.art = R(ART)
@@ -53,7 +56,7 @@ def FirstLetterMetadataMenu(rating_key, key, title=None, base_title=None, displa
@route(PREFIX + '/section/contents', display_items=bool)
def MetadataMenu(rating_key, title=None, base_title=None, display_items=False, previous_item_type=None,
previous_rating_key=None):
previous_rating_key=None, randomize=None):
"""
displays the contents of a section based on whether it has a deeper tree or not (movies->movie (item) list; series->series list)
:param rating_key:
@@ -72,6 +75,22 @@ def MetadataMenu(rating_key, title=None, base_title=None, display_items=False, p
current_kind = get_item_kind_from_rating_key(rating_key)
if display_items:
timeout = 30
# add back to series for season
if current_kind == "season":
timeout = 360
show = get_item(previous_rating_key)
oc.add(DirectoryObject(
key=Callback(MetadataMenu, rating_key=show.rating_key, title=show.title, base_title=show.section.title,
previous_item_type="section", display_items=True, randomize=timestamp()),
title=u"< Back to %s" % show.title,
thumb=show.thumb or default_thumb
))
elif current_kind == "series":
timeout = 1800
items = get_all_items(key="children", value=rating_key, base="library/metadata")
kind, deeper = get_items_info(items)
dig_tree(oc, items, MetadataMenu,
@@ -81,12 +100,6 @@ def MetadataMenu(rating_key, title=None, base_title=None, display_items=False, p
if should_display_ignore(items, previous=previous_item_type):
add_ignore_options(oc, "series", title=item_title, rating_key=rating_key, callback_menu=IgnoreMenu)
timeout = 30
if current_kind == "season":
timeout = 360
elif current_kind == "series":
timeout = 1800
# add refresh
oc.add(DirectoryObject(
key=Callback(RefreshItem, rating_key=rating_key, item_title=title, refresh_kind=current_kind,
@@ -147,7 +160,6 @@ def RefreshMissing(randomize=None):
@route(PREFIX + '/ValidatePrefs', enforce_route=True)
def ValidatePrefs():
Core.log.setLevel(logging.DEBUG)
Log.Debug("Validate Prefs called.")
# cache the channel state
update_dict = False
@@ -182,9 +194,51 @@ def ValidatePrefs():
Core.log.removeHandler(logger.console_handler)
Log.Debug("Stop logging to console")
Log.Debug("Validate Prefs called.")
# SZ config debug
Log.Debug("--- SZ Config-Debug ---")
for attr in [
"app_support_path", "data_path", "data_items_path", "enable_agent",
"enable_channel", "permissions_ok", "missing_permissions", "fs_encoding", "enforce_encoding",
"subtitle_destination_folder"]:
Log.Debug("config.%s: %s", attr, getattr(config, attr))
for attr in ["plugin_log_path", "server_log_path"]:
value = getattr(config, attr)
access = os.access(value, os.R_OK)
if Core.runtime.os == "Windows":
try:
f = open(value, "r")
f.read(1)
f.close()
except:
access = False
Log.Debug("config.%s: %s (accessible: %s)", attr, value, access)
for attr in [
"subtitles.save.filesystem", ]:
Log.Debug("Pref.%s: %s", attr, Prefs[attr])
# fixme: check existence of and os access of logs
Log.Debug("Platform: %s", Core.runtime.platform)
Log.Debug("OS: %s", Core.runtime.os)
Log.Debug("----- Environment -----")
for key, value in os.environ.iteritems():
if key.startswith("PLEX") or key.startswith("SZ_"):
if "TOKEN" in key:
outval = "xxxxxxxxxxxxxxxxxxx"
else:
outval = value
Log.Debug("%s: %s", key, outval)
Log.Debug("Locale: %s", locale.getdefaultlocale())
Log.Debug("-----------------------")
Log.Debug("Setting log-level to %s", Prefs["log_level"])
logger.register_logging_handler(DEPENDENCY_MODULE_NAMES, level=Prefs["log_level"])
Core.log.setLevel(logging.getLevelName(Prefs["log_level"]))
os.environ['U1pfT01EQl9LRVk'] = '789CF30DAC2C8B0AF433F5C9AD34290A712DF30D7135F12D0FB3E502006FDE081E'
return
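
The environment dump in ValidatePrefs above logs only variables with a PLEX or SZ_ prefix and hides any whose name contains TOKEN. A minimal sketch of that filter as a pure function, written for Python 3; `masked_environment`, `prefixes` and `mask` are hypothetical names introduced for illustration, and the mask string is an arbitrary placeholder.

```python
def masked_environment(environ, prefixes=("PLEX", "SZ_"), mask="x" * 19):
    """Return the loggable subset of environ with token values masked.

    Hypothetical helper for illustration; not part of the plugin.
    """
    out = {}
    for key, value in environ.items():
        # str.startswith accepts a tuple of prefixes
        if key.startswith(prefixes):
            # never log the real value of anything that looks like a token
            out[key] = mask if "TOKEN" in key else value
    return out
```

Returning a dict instead of logging directly keeps the masking rule testable, for example `masked_environment(dict(os.environ))` followed by one log call per entry.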
+8 -3
@@ -43,8 +43,8 @@ def add_ignore_options(oc, kind, callback_menu=None, title=None, rating_key=None
oc.add(DirectoryObject(
key=Callback(callback_menu, kind=use_kind, rating_key=rating_key, title=title),
title=u"%s %s \"%s\" %s the ignore list" % (
"Remove" if in_list else "Add", ignore_list.verbose(kind) if add_kind else "", unicode(title), "from" if in_list else "to")
title=u"%s %s \"%s\"" % (
"Un-Ignore" if in_list else "Ignore", ignore_list.verbose(kind) if add_kind else "", unicode(title))
)
)
@@ -157,7 +157,12 @@ def debounce(func):
return ObjectContainer()
else:
Dict["menu_history"][key] = datetime.datetime.now() + datetime.timedelta(days=1)
Dict.Save()
try:
Dict.Save()
except TypeError:
Log.Error("Can't save menu history for: %r", key)
del Dict["menu_history"][key]
return func(*args, **kwargs)
return wrap
+251
@@ -0,0 +1,251 @@
# coding=utf-8
import traceback
import types
from babelfish import Language
from menu_helpers import debounce, SubFolderObjectContainer, default_thumb
from subzero.modification import registry as mod_registry, SubtitleModifications
from subzero.constants import PREFIX
from support.plex_media import get_plex_metadata, scan_videos
from support.helpers import timestamp, pad_title
from support.items import get_current_sub, set_mods_for_part
@route(PREFIX + '/item/sub_mods/{rating_key}/{part_id}', force=bool)
@debounce
def SubtitleModificationsMenu(**kwargs):
rating_key = kwargs["rating_key"]
part_id = kwargs["part_id"]
language = kwargs["language"]
current_sub, stored_subs, storage = get_current_sub(rating_key, part_id, language)
kwargs.pop("randomize")
current_mods = current_sub.mods or []
oc = SubFolderObjectContainer(title2=kwargs["title"], replace_parent=True)
from interface.item_details import SubtitleOptionsMenu
oc.add(DirectoryObject(
key=Callback(SubtitleOptionsMenu, randomize=timestamp(), **kwargs),
title=u"< Back to subtitle options for: %s" % kwargs["title"],
summary=kwargs["current_data"],
thumb=default_thumb
))
for identifier, mod in mod_registry.mods.iteritems():
if mod.advanced:
continue
if mod.exclusive and identifier in current_mods:
continue
oc.add(DirectoryObject(
key=Callback(SubtitleSetMods, mods=identifier, mode="add", randomize=timestamp(), **kwargs),
title=pad_title(mod.description), summary=mod.long_description or ""
))
fps_mod = SubtitleModifications.get_mod_class("change_FPS")
oc.add(DirectoryObject(
key=Callback(SubtitleFPSModMenu, randomize=timestamp(), **kwargs),
title=pad_title(fps_mod.description), summary=fps_mod.long_description or ""
))
shift_mod = SubtitleModifications.get_mod_class("shift_offset")
oc.add(DirectoryObject(
key=Callback(SubtitleShiftModUnitMenu, randomize=timestamp(), **kwargs),
title=pad_title(shift_mod.description), summary=shift_mod.long_description or ""
))
color_mod = SubtitleModifications.get_mod_class("color")
oc.add(DirectoryObject(
key=Callback(SubtitleColorModMenu, randomize=timestamp(), **kwargs),
title=pad_title(color_mod.description), summary=color_mod.long_description or ""
))
if current_mods:
oc.add(DirectoryObject(
key=Callback(SubtitleSetMods, mods=None, mode="remove_last", randomize=timestamp(), **kwargs),
title=pad_title("Remove last applied mod (%s)" % current_mods[-1]),
summary=u"Currently applied mods: %s" % (", ".join(current_mods) if current_mods else "none")
))
oc.add(DirectoryObject(
key=Callback(SubtitleListMods, randomize=timestamp(), **kwargs),
title=pad_title("Manage applied mods"),
summary=u"Currently applied mods: %s" % (", ".join(current_mods))
))
oc.add(DirectoryObject(
key=Callback(SubtitleSetMods, mods=None, mode="clear", randomize=timestamp(), **kwargs),
title=pad_title("Restore original version"),
summary=u"Currently applied mods: %s" % (", ".join(current_mods) if current_mods else "none")
))
return oc
@route(PREFIX + '/item/sub_mod_fps/{rating_key}/{part_id}', force=bool)
def SubtitleFPSModMenu(**kwargs):
rating_key = kwargs["rating_key"]
part_id = kwargs["part_id"]
item_type = kwargs["item_type"]
kwargs.pop("randomize")
oc = SubFolderObjectContainer(title2=kwargs["title"], replace_parent=True)
oc.add(DirectoryObject(
key=Callback(SubtitleModificationsMenu, randomize=timestamp(), **kwargs),
title="< Back to subtitle modification menu"
))
metadata = get_plex_metadata(rating_key, part_id, item_type)
scanned_parts = scan_videos([metadata], kind="series" if item_type == "episode" else "movie", ignore_all=True)
video, plex_part = scanned_parts.items()[0]
target_fps = plex_part.fps
for fps in ["23.976", "24.000", "25.000", "29.970", "30.000", "50.000", "59.940", "60.000"]:
if float(fps) == float(target_fps):
continue
if float(fps) > float(target_fps):
indicator = "subs constantly getting faster"
else:
indicator = "subs constantly getting slower"
mod_ident = SubtitleModifications.get_mod_signature("change_FPS", **{"from": fps, "to": target_fps})
oc.add(DirectoryObject(
key=Callback(SubtitleSetMods, mods=mod_ident, mode="add", randomize=timestamp(), **kwargs),
title="%s fps -> %s fps (%s)" % (fps, target_fps, indicator)
))
return oc
POSSIBLE_UNITS = (("ms", "milliseconds"), ("s", "seconds"), ("m", "minutes"), ("h", "hours"))
POSSIBLE_UNITS_D = dict(POSSIBLE_UNITS)
@route(PREFIX + '/item/sub_mod_shift_unit/{rating_key}/{part_id}', force=bool)
def SubtitleShiftModUnitMenu(**kwargs):
oc = SubFolderObjectContainer(title2=kwargs["title"], replace_parent=True)
kwargs.pop("randomize")
oc.add(DirectoryObject(
key=Callback(SubtitleModificationsMenu, randomize=timestamp(), **kwargs),
title="< Back to subtitle modifications"
))
for unit, title in POSSIBLE_UNITS:
oc.add(DirectoryObject(
key=Callback(SubtitleShiftModMenu, unit=unit, randomize=timestamp(), **kwargs),
title="Adjust by %s" % title
))
return oc
@route(PREFIX + '/item/sub_mod_shift/{rating_key}/{part_id}/{unit}', force=bool)
def SubtitleShiftModMenu(unit=None, **kwargs):
if unit not in POSSIBLE_UNITS_D:
raise NotImplementedError
kwargs.pop("randomize")
oc = SubFolderObjectContainer(title2=kwargs["title"], replace_parent=True)
oc.add(DirectoryObject(
key=Callback(SubtitleShiftModUnitMenu, randomize=timestamp(), **kwargs),
title="< Back to unit selection"
))
rng = []
if unit == "h":
rng = range(-10, 11)
elif unit in ("m", "s"):
rng = range(-15, 15)
elif unit == "ms":
rng = range(-900, 1000, 100)
for i in rng:
if i == 0:
continue
mod_ident = SubtitleModifications.get_mod_signature("shift_offset", **{unit: i})
oc.add(DirectoryObject(
key=Callback(SubtitleSetMods, mods=mod_ident, mode="add", randomize=timestamp(), **kwargs),
title="%s %s" % (("%s" if i < 0 else "+%s") % i, unit)
))
return oc
@route(PREFIX + '/item/sub_mod_colors/{rating_key}/{part_id}', force=bool)
def SubtitleColorModMenu(**kwargs):
kwargs.pop("randomize")
color_mod = SubtitleModifications.get_mod_class("color")
oc = SubFolderObjectContainer(title2=kwargs["title"], replace_parent=True)
oc.add(DirectoryObject(
key=Callback(SubtitleModificationsMenu, randomize=timestamp(), **kwargs),
title="< Back to subtitle modification menu"
))
for color, code in color_mod.colors.iteritems():
mod_ident = SubtitleModifications.get_mod_signature("color", **{"name": color})
oc.add(DirectoryObject(
key=Callback(SubtitleSetMods, mods=mod_ident, mode="add", randomize=timestamp(), **kwargs),
title="%s (%s)" % (color, code)
))
return oc
@route(PREFIX + '/item/sub_set_mods/{rating_key}/{part_id}/{mods}/{mode}', force=bool)
@debounce
def SubtitleSetMods(mods=None, mode=None, **kwargs):
if not isinstance(mods, types.ListType) and mods:
mods = [mods]
rating_key = kwargs["rating_key"]
part_id = kwargs["part_id"]
lang_a2 = kwargs["language"]
item_type = kwargs["item_type"]
language = Language.fromietf(lang_a2)
set_mods_for_part(rating_key, part_id, language, item_type, mods, mode=mode)
kwargs.pop("randomize")
return SubtitleModificationsMenu(randomize=timestamp(), **kwargs)
@route(PREFIX + '/item/sub_list_mods/{rating_key}/{part_id}', force=bool)
@debounce
def SubtitleListMods(**kwargs):
rating_key = kwargs["rating_key"]
part_id = kwargs["part_id"]
language = kwargs["language"]
current_sub, stored_subs, storage = get_current_sub(rating_key, part_id, language)
kwargs.pop("randomize")
oc = SubFolderObjectContainer(title2=kwargs["title"], replace_parent=True)
oc.add(DirectoryObject(
key=Callback(SubtitleModificationsMenu, randomize=timestamp(), **kwargs),
title="< Back to subtitle modifications"
))
for identifier in current_sub.mods:
oc.add(DirectoryObject(
key=Callback(SubtitleSetMods, mods=identifier, mode="remove", randomize=timestamp(), **kwargs),
title="Remove: %s" % identifier
))
return oc
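
The offset entries built in SubtitleShiftModMenu above can be reproduced outside the Plex framework. A minimal sketch using the same ranges and label format as the diff; `shift_steps` and `shift_label` are hypothetical names. Because the upper bound of `range` is exclusive, minutes and seconds run from -15 to +14 here, while hours run from -10 to +10.

```python
def shift_steps(unit):
    """Return the non-zero offsets offered for a unit (ranges copied from the diff)."""
    if unit == "h":
        rng = range(-10, 11)
    elif unit in ("m", "s"):
        rng = range(-15, 15)  # upper bound is exclusive, so this ends at 14
    elif unit == "ms":
        rng = range(-900, 1000, 100)
    else:
        raise ValueError("unknown unit: %r" % unit)
    # a shift of zero would be a no-op, so it is skipped
    return [i for i in rng if i != 0]


def shift_label(i, unit):
    """Format an offset with an explicit sign, e.g. '+3 m' or '-5 s'."""
    return "%s %s" % (("%s" if i < 0 else "+%s") % i, unit)
```

For example, `[shift_label(i, "h") for i in shift_steps("h")]` yields the twenty hour entries that would appear in the menu.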
+1 -1
@@ -18,7 +18,7 @@ sys.modules["support.plex_media"] = plex_media
import localmedia
sys.modules["subzero.localmedia"] = localmedia
sys.modules["support.localmedia"] = localmedia
import subtitlehelpers
+15 -8
@@ -11,9 +11,9 @@ class PlexActivityManager(object):
def start(self):
activity_sources_enabled = None
if config.universal_plex_token:
if config.plex_token:
from plex import Plex
Plex.configuration.defaults.authentication(config.universal_plex_token)
Plex.configuration.defaults.authentication(config.plex_token)
activity_sources_enabled = ["websocket"]
Activity.on('websocket.playing', self.on_playing)
@@ -27,9 +27,6 @@ class PlexActivityManager(object):
@throttle(5, instance_method=True)
def on_playing(self, info):
if not config.use_activities:
return
# ignore non-playing states and anything too far in
if info["state"] != "playing" or info["viewOffset"] > 60000:
return
@@ -41,13 +38,22 @@ class PlexActivityManager(object):
return
rating_key = info["ratingKey"]
if rating_key not in Dict["last_played_items"]:
# new playing; store last 10 recently played items
if rating_key in Dict["last_played_items"] and rating_key != Dict["last_played_items"][0]:
# shift last played
Dict["last_played_items"].insert(0,
Dict["last_played_items"].pop(Dict["last_played_items"].index(rating_key)))
Dict.Save()
elif rating_key not in Dict["last_played_items"]:
# new playing; store last X recently played items
Dict["last_played_items"].insert(0, rating_key)
Dict["last_played_items"] = Dict["last_played_items"][:10]
Dict["last_played_items"] = Dict["last_played_items"][:config.store_recently_played_amount]
Dict.Save()
if not config.react_to_activities:
return
debug_msg = "Started playing %s. Refreshing it." % rating_key
key_to_refresh = None
@@ -108,4 +114,5 @@ class PlexActivityManager(object):
if ep.index == 1:
return ep
activity = PlexActivityManager()
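
The new on_playing bookkeeping above keeps a bounded, most-recent-first list of rating keys. A minimal sketch of that update on a plain list, without the Plex `Dict` storage; `touch_recently_played` and `limit` are hypothetical names, with `limit` standing in for `config.store_recently_played_amount`.

```python
def touch_recently_played(items, rating_key, limit=20):
    """Move rating_key to the front of items, or add it and trim to limit.

    Hypothetical helper for illustration; mutates and returns items.
    """
    if rating_key in items:
        # already known: move to the front unless it is there already
        if items[0] != rating_key:
            items.insert(0, items.pop(items.index(rating_key)))
    else:
        # new entry: put it first and drop whatever falls off the end
        items.insert(0, rating_key)
        del items[limit:]
    return items
```

In the plugin, each branch that changes the list is followed by `Dict.Save()`; the sketch leaves persistence to the caller.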
+60 -8
@@ -9,6 +9,7 @@ import datetime
import subliminal
import subliminal_patch
from babelfish import Language
from subliminal.cli import MutexLock
from subzero.lib.io import FileIO, get_viable_encoding
from subzero.constants import PLUGIN_NAME, PLUGIN_IDENTIFIER, MOVIE, SHOW
from lib import Plex
@@ -45,6 +46,7 @@ class Config(object):
data_path = None
data_items_path = None
universal_plex_token = None
plex_token = None
is_development = False
enable_channel = True
@@ -68,6 +70,9 @@ class Config(object):
sections = None
enabled_sections = None
remove_hi = False
fix_ocr = False
fix_common = False
colors = ""
enforce_encoding = False
chmod = None
forced_only = False
@@ -75,8 +80,12 @@ class Config(object):
treat_und_as_first = False
ext_match_strictness = False
default_mods = None
use_activities = False
debug_mods = False
react_to_activities = False
activity_mode = None
subtitles_save_to = None
store_recently_played_amount = 20
initialized = False
@@ -91,6 +100,9 @@ class Config(object):
self.data_path = getattr(Data, "_core").storage.data_path
self.data_items_path = os.path.join(self.data_path, "DataItems")
self.universal_plex_token = self.get_universal_plex_token()
self.plex_token = os.environ.get("PLEXTOKEN", self.universal_plex_token)
os.environ["SZ_USER_AGENT"] = self.get_user_agent()
self.set_plugin_mode()
self.set_plugin_lock()
@@ -98,6 +110,7 @@ class Config(object):
self.lang_list = self.get_lang_list()
self.subtitle_destination_folder = self.get_subtitle_destination_folder()
self.forced_only = cast_bool(Prefs["subtitles.only_foreign"])
self.providers = self.get_providers()
self.provider_settings = self.get_provider_settings()
self.max_recent_items_per_library = int_or_default(Prefs["scheduler.max_recent_items_per_library"], 2000)
@@ -109,15 +122,37 @@ class Config(object):
self.permissions_ok = self.check_permissions()
self.notify_executable = self.check_notify_executable()
self.remove_hi = cast_bool(Prefs['subtitles.remove_hi'])
self.fix_ocr = cast_bool(Prefs['subtitles.fix_ocr'])
self.fix_common = cast_bool(Prefs['subtitles.fix_common'])
self.colors = Prefs['subtitles.colors'] if Prefs['subtitles.colors'] != "don't change" else None
self.enforce_encoding = cast_bool(Prefs['subtitles.enforce_encoding'])
os.environ["SZ_ENFORCE_ENCODING"] = str(self.enforce_encoding)
self.chmod = self.check_chmod()
self.forced_only = cast_bool(Prefs["subtitles.only_foreign"])
self.exotic_ext = cast_bool(Prefs["subtitles.scan.exotic_ext"])
self.treat_und_as_first = cast_bool(Prefs["subtitles.language.treat_und_as_first"])
self.ext_match_strictness = self.determine_ext_sub_strictness()
self.default_mods = self.get_default_mods()
self.debug_mods = cast_bool(Prefs['log_debug_mods'])
self.subtitles_save_to = Prefs['subtitles.save.filesystem']
self.initialized = True
def init_cache(self):
use_fallback_cache = True
if Core.runtime.os != "Windows":
try:
subliminal.region.configure('dogpile.cache.dbm', expiration_time=datetime.timedelta(days=30),
arguments={'filename': os.path.join(config.data_items_path, 'subzero.dbm'),
'lock_factory': MutexLock})
use_fallback_cache = False
except:
pass
if use_fallback_cache:
Log.Warn("Not using file based cache!")
subliminal.region.configure('dogpile.cache.memory')
def set_log_paths(self):
# find log handler
for handler in Core.log.handlers:
@@ -142,7 +177,9 @@ class Config(object):
except:
Log.Warn("Couldn't determine Plex Token")
else:
Log("Did NOT find Preferences file - please check logfile and hierarchy. Aborting!")
Log("Did NOT find Preferences file - most likely Windows OS. Otherwise please check logfile and hierarchy.")
# fixme: windows
def set_plugin_mode(self):
if Prefs["plugin_mode"] == "only agent":
@@ -217,11 +254,17 @@ class Config(object):
return all_permissions_ok
def get_version(self):
return self.get_bare_version() + ("" if not self.is_development else " DEV")
def get_bare_version(self):
result = VERSION_RE.search(self.plugin_info)
add = "" if not self.is_development else " DEV"
if result:
return result.group(1) + add
return result.group(1)
return "2.x.x.x"
def get_user_agent(self):
return "Sub-Zero/%s" % (self.get_bare_version() + ("" if not self.is_development else "-dev"))
def get_dev_mode(self):
dev = DEV_RE.search(self.plugin_info)
@@ -347,10 +390,13 @@ class Config(object):
}
# ditch non-forced-subtitles-reporting providers
if cast_bool(Prefs['subtitles.only_foreign']):
if self.forced_only:
providers["addic7ed"] = False
providers["tvsubtitles"] = False
providers["legendastv"] = False
providers["napiprojekt"] = False
providers["shooter"] = False
providers["subscenter"] = False
return filter(lambda prov: providers[prov], providers)
@@ -412,16 +458,22 @@ class Config(object):
mods = []
if self.remove_hi:
mods.append("remove_HI")
if self.fix_ocr:
mods.append("OCR_fixes")
if self.fix_common:
mods.append("common")
if self.colors:
mods.append("color(name=%s)" % self.colors)
return mods
def set_activity_modes(self):
val = Prefs["activity.on_playback"]
if val == "never":
self.use_activities = False
self.react_to_activities = False
return
self.use_activities = True
self.react_to_activities = True
if val == "current media item":
self.activity_mode = "refresh"
elif val == "hybrid: current item or next episode":
+36 -11
@@ -9,15 +9,24 @@ import time
import re
import platform
import subprocess
from bs4 import UnicodeDammit
import sys
import chardet
from bs4 import UnicodeDammit
from babelfish import Language
from subzero.analytics import track_event
mswindows = (sys.platform == "win32")
if mswindows:
from subprocess import list2cmdline
quote_args = list2cmdline
else:
# POSIX
from pipes import quote
def quote_args(seq):
return ' '.join(quote(arg) for arg in seq)
# Unicode control characters can appear in ID3v2 tags but are not legal in XML.
RE_UNICODE_CONTROL = u'([\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff])' + \
u'|' + \
@@ -30,7 +39,7 @@ RE_UNICODE_CONTROL = u'([\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff])'
def cast_bool(value):
return str(value) in ("true", "True")
return str(value).strip() in ("true", "True")
# A platform independent way to split paths which might come in with different separators.
@@ -110,9 +119,9 @@ def str_pad(s, length, align='left', pad_char=' ', trim=False):
raise ValueError("Unknown align type, expected either 'left' or 'right'")
def pad_title(value):
def pad_title(value, width=49):
"""Pad a title to `width` characters (default 49) to force the 'details' view."""
return str_pad(value, 49, pad_char=' ')
return str_pad(value, width, pad_char=' ')
def get_plex_item_display_title(item, kind, parent=None, parent_title=None, section_title=None,
@@ -236,13 +245,13 @@ def get_item_hints(data):
:param data: video item dict of media_to_videos
:return:
"""
hints = {"title": data["title"], "type": "movie"}
hints = {"title": data["original_title"] or data["title"], "type": "movie"}
if data["type"] == "episode":
hints.update(
{
"type": "episode",
"episode_title": data["title"],
"title": data["series"],
"title": data["original_title"] or data["series"],
}
)
return hints
@@ -273,9 +282,21 @@ def notify_executable(exe_info, videos, subtitles, storage):
prepared_arguments = [arg % prepared_data for arg in arguments]
Log.Debug(u"Calling %s with arguments: %s" % (exe, prepared_arguments))
env = os.environ
if not mswindows:
env_path = {"PATH": os.pathsep.join(
[
"/usr/local/bin",
"/usr/bin",
os.environ.get("PATH", "")
]
)
}
env = dict(os.environ, **env_path)
try:
output = subprocess.check_output(subprocess.list2cmdline([exe] + prepared_arguments),
stderr=subprocess.STDOUT, shell=True)
output = subprocess.check_output(quote_args([exe] + prepared_arguments),
stderr=subprocess.STDOUT, shell=True, env=env)
except subprocess.CalledProcessError:
Log.Error(u"Calling %s failed: %s" % (exe, traceback.format_exc()))
else:
@@ -303,3 +324,7 @@ def dispatch_track_usage(*args, **kwargs):
def get_language(lang_short):
return Language.fromietf(lang_short)
class PartUnknownException(Exception):
pass
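The platform-dependent `quote_args` helper introduced in this file can be tried on its own. The sketch below is an approximation under assumptions: it prefers `shlex.quote` (the Python 3 location of the function) and falls back to `pipes.quote`, which is what the diff imports on POSIX; the branch is taken inside the function rather than at import time.

```python
import subprocess
import sys

try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2, as imported in the diff


def quote_args(seq):
    """Join arguments into a single shell-safe command string."""
    if sys.platform == "win32":
        # Windows: quote according to the MS C runtime rules
        return subprocess.list2cmdline(seq)
    # POSIX: quote each argument separately for the shell
    return " ".join(quote(arg) for arg in seq)
```

Quoting per platform matters here because the command string is passed to `subprocess.check_output(..., shell=True)`, where an unquoted space or metacharacter in a file path would change how the shell splits the command.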
+99 -19
@@ -2,12 +2,15 @@
import logging
import re
import traceback
import types
import os
from ignore import ignore_list
from helpers import is_recent, get_plex_item_display_title, query_plex
from helpers import is_recent, get_plex_item_display_title, query_plex, PartUnknownException
from lib import Plex, get_intent
from config import config, IGNORE_FN
from subliminal_patch.subtitle import ModifiedSubtitle
from subzero.modification import registry as mod_registry, SubtitleModifications
logger = logging.getLogger(__name__)
@@ -40,11 +43,11 @@ PLEX_API_TYPE_MAP = {
def get_item_kind_from_rating_key(key):
item = get_item(key)
return PLEX_API_TYPE_MAP[get_item_kind(item)]
return PLEX_API_TYPE_MAP.get(get_item_kind(item))
def get_item_kind_from_item(item):
return PLEX_API_TYPE_MAP[get_item_kind(item)]
return PLEX_API_TYPE_MAP.get(get_item_kind(item))
def get_item_thumb(item):
@@ -164,14 +167,17 @@ def get_recent_items():
"X-Plex-Container-Size": "%s" % config.max_recent_items_per_library
}
episode_re = re.compile(ur'ratingKey="(?P<key>\d+)"'
episode_re = re.compile(ur'(?su)ratingKey="(?P<key>\d+)"'
ur'.+?grandparentRatingKey="(?P<parent_key>\d+)"'
ur'.+?title="(?P<title>.*?)"'
ur'.+?grandparentTitle="(?P<parent_title>.*?)"'
ur'.+?index="(?P<episode>\d+?)"'
ur'.+?parentIndex="(?P<season>\d+?)".+?addedAt="(?P<added>\d+)"')
movie_re = re.compile(ur'ratingKey="(?P<key>\d+)".+?title="(?P<title>.*?)".+?addedAt="(?P<added>\d+)"')
available_keys = ("key", "title", "parent_key", "parent_title", "season", "episode", "added")
ur'.+?parentIndex="(?P<season>\d+?)".+?addedAt="(?P<added>\d+)"'
ur'.+?<Part.+? file="(?P<filename>[^"]+?)"')
movie_re = re.compile(ur'(?su)ratingKey="(?P<key>\d+)".+?title="(?P<title>.*?)'
ur'".+?addedAt="(?P<added>\d+)"'
ur'.+?<Part.+? file="(?P<filename>[^"]+?)"')
available_keys = ("key", "title", "parent_key", "parent_title", "season", "episode", "added", "filename")
recent = []
for section in Plex["library"].sections():
@@ -182,8 +188,10 @@ def get_recent_items():
continue
use_args = args.copy()
plex_item_type = "Movie"
if section.type == "show":
use_args["type"] = "4"
plex_item_type = "Episode"
url = "http://127.0.0.1:32400/library/sections/%s/all" % int(section.key)
response = query_plex(url, use_args)
@@ -198,6 +206,10 @@ def get_recent_items():
if data["key"] in ignore_list.videos:
Log.Debug(u"Skipping item: %s" % data["title"])
continue
if is_physically_ignored(data["filename"], plex_item_type):
Log.Debug(u"Skipping item: %s" % data["title"])
continue
if is_recent(int(data["added"])):
recent.append((int(data["added"]), section.type, section.title, data["key"]))
@@ -242,6 +254,16 @@ def is_ignored(rating_key, item=None):
return True
# physical/path ignore
if config.ignore_sz_files or config.ignore_paths:
for media in item.media:
for part in media.parts:
if is_physically_ignored(part.file, kind):
return True
return False
def is_physically_ignored(fn, kind):
if config.ignore_sz_files or config.ignore_paths:
# normally check current item folder and the library
check_ignore_paths = [".", "../"]
@@ -249,18 +271,15 @@ def is_ignored(rating_key, item=None):
# series/episode, we've got a season folder here, also
check_ignore_paths.append("../../")
for part in item.media.parts:
if config.ignore_paths and config.is_path_ignored(part.file):
Log.Debug("Item %s's path is manually ignored" % rating_key)
return True
if config.ignore_paths and config.is_path_ignored(fn):
Log.Debug("Item %s's path is manually ignored" % fn)
return True
if config.ignore_sz_files:
for sub_path in check_ignore_paths:
if config.is_physically_ignored(os.path.abspath(os.path.join(os.path.dirname(part.file), sub_path))):
Log.Debug("An ignore file exists in either the items or its parent folders")
return True
return False
if config.ignore_sz_files:
for sub_path in check_ignore_paths:
if config.is_physically_ignored(os.path.normpath(os.path.join(os.path.dirname(fn), sub_path))):
Log.Debug("An ignore file exists in either the items or its parent folders")
return True
def refresh_item(rating_key, force=False, timeout=8000, refresh_kind=None, parent_rating_key=None):
@@ -292,4 +311,65 @@ def get_current_sub(rating_key, part_id, language):
subtitle_storage = get_subtitle_storage()
stored_subs = subtitle_storage.load_or_new(item)
current_sub = stored_subs.get_any(part_id, language)
return current_sub, stored_subs, subtitle_storage
return current_sub, stored_subs, subtitle_storage
def set_mods_for_part(rating_key, part_id, language, item_type, mods, mode="add"):
from support.plex_media import get_plex_metadata, scan_videos
from support.storage import save_subtitles
current_sub, stored_subs, storage = get_current_sub(rating_key, part_id, language)
if mode == "add":
for mod in mods:
identifier, args = SubtitleModifications.parse_identifier(mod)
mod_class = SubtitleModifications.get_mod_class(identifier)
if identifier not in mod_registry.mods_available:
raise NotImplementedError("Mod unknown or not registered")
# clean exclusive mods
if mod_class.exclusive and current_sub.mods:
for current_mod in current_sub.mods[:]:
if current_mod.startswith(identifier):
current_sub.mods.remove(current_mod)
Log.Info("Removing superseded mod %s" % current_mod)
current_sub.add_mod(mod)
elif mode == "clear":
current_sub.add_mod(None)
elif mode == "remove":
for mod in mods:
current_sub.mods.remove(mod)
elif mode == "remove_last":
if current_sub.mods:
current_sub.mods.pop()
else:
raise NotImplementedError("Wrong mode given")
storage.save(stored_subs)
try:
metadata = get_plex_metadata(rating_key, part_id, item_type)
except PartUnknownException:
return
scanned_parts = scan_videos([metadata], kind="series" if item_type == "episode" else "movie", ignore_all=True)
video, plex_part = scanned_parts.items()[0]
subtitle = ModifiedSubtitle(language, mods=current_sub.mods)
subtitle.content = current_sub.content
if current_sub.encoding:
# thanks plex
setattr(subtitle, "_guessed_encoding", current_sub.encoding)
subtitle.plex_media_fps = plex_part.fps
subtitle.page_link = "modify subtitles with: %s" % (", ".join(current_sub.mods) if current_sub.mods else "none")
subtitle.language = language
subtitle.id = current_sub.id
try:
save_subtitles(scanned_parts, {video: [subtitle]}, mode="m", bare_save=True)
Log.Debug("Modified %s subtitle for: %s:%s with: %s", language.name, rating_key, part_id,
", ".join(current_sub.mods) if current_sub.mods else "none")
except:
Log.Error("Something went wrong when modifying subtitle: %s", traceback.format_exc())
+4 -4
@@ -108,7 +108,8 @@ def find_subtitles(part):
if ext.lower()[1:] in config.SUBTITLE_EXTS:
# get fn without forced/default/normal tag
split_tag = root.rsplit(".", 1)
if len(split_tag) > 1 and split_tag[1].lower() in ['forced', 'normal', 'default']:
if len(split_tag) > 1 and split_tag[1].lower() in ['forced', 'normal', 'default', 'embedded',
'custom']:
root = split_tag[0]
# get associated media file name without language
@@ -160,9 +161,8 @@ def find_subtitles(part):
# determine whether to pick up the subtitle based on our match strictness
elif not filename_matches_part:
if sz_config.ext_match_strictness == "strict" or (
sz_config.ext_match_strictness == "loose" and not filename_contains_part):
#Log.Debug("%s doesn't match %s, skipping" % (helpers.unicodize(local_filename),
sz_config.ext_match_strictness == "loose" and not filename_contains_part):
# Log.Debug("%s doesn't match %s, skipping" % (helpers.unicodize(local_filename),
# helpers.unicodize(part_basename)))
continue
+29 -23
@@ -1,5 +1,6 @@
# coding=utf-8
import traceback
import time
from support.config import config
from support.helpers import get_plex_item_display_title, cast_bool
@@ -8,8 +9,6 @@ from support.lib import Plex
def item_discover_missing_subs(rating_key, kind="show", added_at=None, section_title=None, internal=False, external=True, languages=()):
existing_subs = {"internal": [], "external": [], "count": 0}
item_id = int(rating_key)
item = get_item(rating_key)
@@ -18,36 +17,41 @@ def item_discover_missing_subs(rating_key, kind="show", added_at=None, section_t
else:
item_title = get_plex_item_display_title(item, kind, section_title=section_title)
video = item.media
missing = set()
languages_set = set(languages)
for media in item.media:
existing_subs = {"internal": [], "external": [], "count": 0}
for part in media.parts:
for stream in part.streams:
if stream.stream_type == 3:
if stream.index:
key = "internal"
else:
key = "external"
for part in video.parts:
for stream in part.streams:
if stream.stream_type == 3:
if stream.index:
key = "internal"
else:
key = "external"
existing_subs[key].append(Locale.Language.Match(stream.language_code or ""))
existing_subs["count"] = existing_subs["count"] + 1
existing_subs[key].append(Locale.Language.Match(stream.language_code or ""))
existing_subs["count"] = existing_subs["count"] + 1
missing_from_part = set(languages_set)
if existing_subs["count"]:
existing_flat = set((existing_subs["internal"] if internal else []) + (existing_subs["external"] if external else []))
if languages_set.issubset(existing_flat) or (len(existing_flat) >= 1 and Prefs['subtitles.only_one']):
# all subs found
#Log.Info(u"All subtitles exist for '%s'", item_title)
continue
missing = languages
if existing_subs["count"]:
existing_flat = (existing_subs["internal"] if internal else []) + (existing_subs["external"] if external else [])
languages_set = set(languages)
if languages_set.issubset(existing_flat) or (len(existing_flat) >= 1 and Prefs['subtitles.only_one']):
# all subs found
Log.Info(u"All subtitles exist for '%s'", item_title)
return
missing_from_part = languages_set - existing_flat
missing = languages_set - set(existing_flat)
Log.Info(u"Subs still missing for '%s': %s", item_title, missing)
if missing_from_part:
Log.Info(u"Subs still missing for '%s' (%s: %s): %s", item_title, rating_key, media.id,
missing_from_part)
missing.update(missing_from_part)
if missing:
return added_at, item_id, item_title, item, missing
def items_get_all_missing_subs(items):
def items_get_all_missing_subs(items, sleep_after_request=False):
missing = []
for added_at, kind, section_title, key in items:
try:
@@ -65,6 +69,8 @@ def items_get_all_missing_subs(items):
missing.append(state)
except:
Log.Error("Something went wrong when getting the state of item %s: %s", key, traceback.format_exc())
if sleep_after_request:
time.sleep(sleep_after_request)
return missing
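The reworked `item_discover_missing_subs` now checks every media version of an item and collects the union of languages missing from any of them. A minimal sketch of that set logic, assuming plain language-code strings instead of Plex stream objects; `missing_languages`, `media_subs` and `only_one` are hypothetical names (the last one standing in for `Prefs['subtitles.only_one']`):

```python
def missing_languages(wanted, media_subs, only_one=False):
    """Collect wanted languages that at least one media version lacks.

    wanted:     iterable of wanted language codes
    media_subs: one collection of existing subtitle languages per media version
    only_one:   treat a version that has any subtitle as complete
    """
    wanted = set(wanted)
    missing = set()
    for existing in media_subs:
        existing = set(existing)
        if wanted.issubset(existing) or (only_one and existing):
            # this version is complete; check the next one
            continue
        # remember what this version still lacks
        missing.update(wanted - existing)
    return missing
```

For example, `missing_languages(["en", "de"], [["en"], ["en", "de"]])` yields `{"de"}`, because the first version lacks German even though the second is complete.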
+171 -44
@@ -1,15 +1,14 @@
# coding=utf-8
import os
from urllib2 import URLError
import helpers
from config import config
from items import get_item
from lib import get_intent, Plex
from config import config
from subzero.video import parse_video
def get_metadata_dict(item, part, add):
data = {
"item": item,
@@ -22,6 +21,54 @@ def get_metadata_dict(item, part, add):
return data
imdb_guid_identifier = "com.plexapp.agents.imdb://"
tvdb_guid_identifier = "com.plexapp.agents.thetvdb://"
def get_plexapi_stream_info(plex_item, part_id=None):
d = {"stream": {}}
data = d["stream"]
# find current part
current_part = None
current_media = None
for media in plex_item.media:
for part in media.parts:
if not part_id or str(part.id) == part_id:
current_part = part
current_media = media
break
if current_part:
break
if not current_part:
return d
data["video_codec"] = current_media.video_codec
data["audio_codec"] = current_media.audio_codec.upper()
if data["audio_codec"] == "DCA":
data["audio_codec"] = "DTS"
if current_media.audio_channels == 8:
data["audio_channels"] = "7.1"
elif current_media.audio_channels == 6:
data["audio_channels"] = "5.1"
else:
data["audio_channels"] = "%s.0" % str(current_media.audio_channels)
# iter streams
for stream in current_part.streams:
if stream.stream_type == 1:
# video stream
data["resolution"] = "%s%s" % (current_media.video_resolution,
"i" if stream.scan_type != "progressive" else "p")
break
return d
def media_to_videos(media, kind="series"):
"""
iterates through media and returns the associated parts (videos)
@@ -31,36 +78,61 @@ def media_to_videos(media, kind="series"):
"""
videos = []
# this is a Show or a Movie object
plex_item = get_item(media.id)
year = plex_item.year
original_title = plex_item.title_original
if kind == "series":
for season in media.seasons:
season_object = media.seasons[season]
for episode in media.seasons[season].episodes:
ep = media.seasons[season].episodes[episode]
tvdb_id = None
series_tvdb_id = None
if tvdb_guid_identifier in ep.guid:
tvdb_id = ep.guid[len(tvdb_guid_identifier):].split("?")[0]
series_tvdb_id = tvdb_id.split("/")[0]
# get plex item via API for additional metadata
plex_episode = get_item(ep.id)
stream_info = get_plexapi_stream_info(plex_episode)
for item in media.seasons[season].episodes[episode].items:
for part in item.parts:
videos.append(
get_metadata_dict(plex_episode, part,
{"plex_part": part, "type": "episode", "title": ep.title,
"series": media.title, "id": ep.id,
"series_id": media.id, "season_id": season_object.id,
"episode": plex_episode.index, "season": plex_episode.season.index,
"section": plex_episode.section.title
})
dict(stream_info, **{"plex_part": part, "type": "episode",
"title": ep.title,
"series": media.title, "id": ep.id, "year": year,
"series_id": media.id,
"season_id": season_object.id,
"imdb_id": None, "series_tvdb_id": series_tvdb_id,
"tvdb_id": tvdb_id,
"original_title": original_title,
"episode": plex_episode.index,
"season": plex_episode.season.index,
"section": plex_episode.section.title
})
)
)
else:
plex_item = get_item(media.id)
stream_info = get_plexapi_stream_info(plex_item)
imdb_id = None
if imdb_guid_identifier in media.guid:
imdb_id = media.guid[len(imdb_guid_identifier):].split("?")[0]
for item in media.items:
for part in item.parts:
videos.append(
get_metadata_dict(plex_item, part, {"plex_part": part, "type": "movie",
"title": media.title, "id": media.id,
"series_id": None,
"season_id": None,
"section": plex_item.section.title})
get_metadata_dict(plex_item, part, dict(stream_info, **{"plex_part": part, "type": "movie",
"title": media.title, "id": media.id,
"series_id": None, "year": year,
"season_id": None, "imdb_id": imdb_id,
"original_title": original_title,
"series_tvdb_id": None, "tvdb_id": None,
"section": plex_item.section.title})
)
)
return videos
@@ -92,10 +164,10 @@ def get_media_item_ids(media, kind="series"):
return ids
def scan_video(plex_part, ignore_all=False, hints=None, rating_key=None):
def scan_video(pms_video_info, ignore_all=False, hints=None, rating_key=None):
"""
returns a subliminal/guessit-refined parsed video
:param plex_part:
:param pms_video_info:
:param ignore_all:
:param hints:
:param rating_key:
@@ -104,6 +176,8 @@ def scan_video(plex_part, ignore_all=False, hints=None, rating_key=None):
embedded_subtitles = not ignore_all and Prefs['subtitles.scan.embedded']
external_subtitles = not ignore_all and Prefs['subtitles.scan.external']
plex_part = pms_video_info["plex_part"]
if ignore_all:
Log.Debug("Force refresh intended.")
@@ -111,7 +185,10 @@ def scan_video(plex_part, ignore_all=False, hints=None, rating_key=None):
plex_part.file, external_subtitles, embedded_subtitles))
known_embedded = []
parts = list(Plex["library"].metadata(rating_key))[0].media.parts
parts = []
for media in list(Plex["library"].metadata(rating_key))[0].media:
parts += media.parts
plexpy_part = None
for part in parts:
if int(part.id) == int(plex_part.id):
@@ -139,7 +216,7 @@ def scan_video(plex_part, ignore_all=False, hints=None, rating_key=None):
try:
# get basic video info scan (filename)
video = parse_video(plex_part.file, hints, external_subtitles=external_subtitles,
video = parse_video(plex_part.file, pms_video_info, hints, external_subtitles=external_subtitles,
embedded_subtitles=embedded_subtitles, known_embedded=known_embedded,
forced_only=config.forced_only, video_fps=plex_part.fps)
@@ -165,7 +242,7 @@ def scan_videos(videos, kind="series", ignore_all=False):
hints = helpers.get_item_hints(video)
video["plex_part"].fps = get_stream_fps(video["plex_part"].streams)
scanned_video = scan_video(video["plex_part"], ignore_all=force_refresh or ignore_all, hints=hints,
scanned_video = scan_video(video, ignore_all=force_refresh or ignore_all, hints=hints,
rating_key=video["id"])
if not scanned_video:
@@ -179,49 +256,78 @@ def scan_videos(videos, kind="series", ignore_all=False):
return ret
class PartUnknownException(Exception):
pass
def get_plex_metadata(rating_key, part_id, item_type):
"""
uses the Plex 3rd party API accessor to get metadata information
:param rating_key:
:param rating_key: movie or episode
:param part_id:
:param item_type:
:return:
"""
plex_item = list(Plex["library"].metadata(rating_key))[0]
try:
plex_item = list(Plex["library"].metadata(rating_key))[0]
except URLError:
return None
# find current part
current_part = None
for part in plex_item.media.parts:
if str(part.id) == part_id:
current_part = part
for media in plex_item.media:
for part in media.parts:
if str(part.id) == part_id:
current_part = part
if not current_part:
raise PartUnknownException("Part unknown")
raise helpers.PartUnknownException("Part unknown")
stream_info = get_plexapi_stream_info(plex_item, part_id)
# get normalized metadata
# fixme: duplicated logic of media_to_videos
if item_type == "episode":
show = list(Plex["library"].metadata(plex_item.show.rating_key))[0]
year = show.year
tvdb_id = None
series_tvdb_id = None
original_title = show.title_original
if tvdb_guid_identifier in plex_item.guid:
tvdb_id = plex_item.guid[len(tvdb_guid_identifier):].split("?")[0]
series_tvdb_id = tvdb_id.split("/")[0]
metadata = get_metadata_dict(plex_item, current_part,
{"plex_part": current_part, "type": "episode", "title": plex_item.title,
"series": plex_item.show.title, "id": plex_item.rating_key,
"series_id": plex_item.show.rating_key,
"season_id": plex_item.season.rating_key,
"season": plex_item.season.index,
"episode": plex_item.index
})
dict(stream_info,
**{"plex_part": current_part, "type": "episode", "title": plex_item.title,
"series": plex_item.show.title, "id": plex_item.rating_key,
"series_id": plex_item.show.rating_key,
"season_id": plex_item.season.rating_key,
"imdb_id": None,
"year": year,
"tvdb_id": tvdb_id,
"series_tvdb_id": series_tvdb_id,
"original_title": original_title,
"season": plex_item.season.index,
"episode": plex_item.index
})
)
else:
metadata = get_metadata_dict(plex_item, current_part, {"plex_part": current_part, "type": "movie",
"title": plex_item.title, "id": plex_item.rating_key,
"series_id": None,
"season_id": None,
"season": None,
"episode": None,
"section": plex_item.section.title})
imdb_id = None
original_title = plex_item.title_original
if imdb_guid_identifier in plex_item.guid:
imdb_id = plex_item.guid[len(imdb_guid_identifier):].split("?")[0]
metadata = get_metadata_dict(plex_item, current_part,
dict(stream_info, **{"plex_part": current_part, "type": "movie",
"title": plex_item.title, "id": plex_item.rating_key,
"series_id": None,
"season_id": None,
"imdb_id": imdb_id,
"year": plex_item.year,
"tvdb_id": None,
"series_tvdb_id": None,
"original_title": original_title,
"season": None,
"episode": None,
"section": plex_item.section.title})
)
return metadata
@@ -257,3 +363,24 @@ class PMSMediaProxy(object):
break
m = m.children[0]
def get_all_parts(self):
"""
walk the mediatree and return all parts of the first media item found
:return:
"""
m = self.mediatree
parts = []
while 1:
if m.items:
media_item = m.items[0]
for part in media_item.parts:
parts.append(part)
break
if not m.children:
break
m = m.children[0]
return parts
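The TheTVDB id extraction that this file adds in both `media_to_videos` and `get_plex_metadata` can be illustrated as a standalone function. This is a sketch under assumptions: `parse_tvdb_guid` is a hypothetical name, the example guid in the usage note is an assumed shape rather than data from the diff, and the sketch uses `startswith` where the diff tests with `in`.

```python
TVDB_GUID_PREFIX = "com.plexapp.agents.thetvdb://"


def parse_tvdb_guid(guid):
    """Return (tvdb_id, series_tvdb_id), or (None, None) for other agents."""
    if not guid.startswith(TVDB_GUID_PREFIX):
        return None, None
    # strip the agent prefix and any "?..." query part
    tvdb_id = guid[len(TVDB_GUID_PREFIX):].split("?")[0]
    # the series id is the first path segment
    series_tvdb_id = tvdb_id.split("/")[0]
    return tvdb_id, series_tvdb_id
```

With an assumed guid such as `com.plexapp.agents.thetvdb://121361/1/2?lang=en`, this returns `("121361/1/2", "121361")`. Factoring the logic out like this would also address the `fixme: duplicated logic of media_to_videos` note in the diff, though that is a suggestion rather than something the commits do.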
+6 -1
@@ -168,6 +168,7 @@ class DefaultScheduler(object):
for args, kwargs in queue:
Log.Debug("Dispatching single task: %s, %s", args, kwargs)
Thread.Create(self.run_task, True, *args, **kwargs)
Thread.Sleep(5.0)
# scheduled tasks
for name, info in self.tasks.iteritems():
@@ -185,9 +186,13 @@ class DefaultScheduler(object):
continue
if not task.last_run or (task.last_run + datetime.timedelta(**{frequency_key: frequency_num}) <= now):
# fixme: scheduled tasks run synchronously. is this the best idea?
#Thread.Create(self.run_task, True, name)
#Thread.Sleep(5.0)
self.run_task(name)
Thread.Sleep(5.0)
Thread.Sleep(5.0)
Thread.Sleep(1)
scheduler = DefaultScheduler()
+7 -3
@@ -137,7 +137,8 @@ def save_subtitles_to_file(subtitles):
os.makedirs(fld)
subliminal.save_subtitles(video, video_subtitles, directory=fld, single=cast_bool(Prefs['subtitles.only_one']),
encode_with=force_utf8 if config.enforce_encoding else None,
chmod=config.chmod, forced_tag=config.forced_only, path_decoder=force_unicode)
chmod=config.chmod, forced_tag=config.forced_only, path_decoder=force_unicode,
debug_mods=config.debug_mods)
return True
@@ -145,7 +146,8 @@ def save_subtitles_to_metadata(videos, subtitles):
for video, video_subtitles in subtitles.items():
mediaPart = videos[video]
for subtitle in video_subtitles:
content = force_utf8(subtitle.text) if config.enforce_encoding else subtitle.content
content = force_utf8(subtitle.get_modified_text(debug=config.debug_mods)) if config.enforce_encoding else \
subtitle.get_modified_content(debug=config.debug_mods)
if not isinstance(mediaPart, Framework.api.agentkit.MediaPart):
# we're being handed a Plex.py model instance here, not an internal PMS MediaPart object.
@@ -204,6 +206,8 @@ def save_subtitles(scanned_video_part_map, downloaded_subtitles, mode="a", bare_
if not bare_save and save_successful and config.notify_executable:
notify_executable(config.notify_executable, scanned_video_part_map, downloaded_subtitles, storage)
if not bare_save:
if not bare_save and save_successful:
store_subtitle_info(scanned_video_part_map, downloaded_subtitles, storage, mode=mode)
return save_successful
+6 -4
@@ -129,9 +129,8 @@ class DefaultSubtitleHelper(SubtitleHelper):
default = '1'
# Attempt to extract the language from the filename (e.g. Avatar (2009).eng)
language = ""
# IETF support thanks to https://github.com/hpsbranco/LocalMedia.bundle/commit/4fad9aefedece78a1fa96401304351347f644369
# IETF support thanks to
# https://github.com/hpsbranco/LocalMedia.bundle/commit/4fad9aefedece78a1fa96401304351347f644369
language = Locale.Language.Match(match_ietf_language(file))
# skip non-SRT if wanted
@@ -194,7 +193,10 @@ def get_subtitles_from_metadata(part):
def force_utf8(content):
a = UnicodeDammit(content)
Log.Debug("detected encoding: %s (None: most likely already successfully decoded)" % a.original_encoding)
if a.original_encoding:
Log.Debug("detected encoding: %s (None: most likely already successfully decoded)" % a.original_encoding)
else:
Log.Debug("detected encoding: unicode (already decoded)")
# easy way out - already utf-8
if a.original_encoding and a.original_encoding == "utf-8":
+96 -25
@@ -4,6 +4,7 @@ import datetime
import time
import operator
import traceback
from urllib2 import URLError
from subliminal_patch.score import compute_score
from subliminal_patch.core import download_subtitles
@@ -16,8 +17,8 @@ from storage import save_subtitles, whack_missing_parts, get_subtitle_storage
from support.config import config
from support.items import get_recent_items, is_ignored, get_item
from support.lib import Plex
from support.helpers import track_usage, get_title_for_video_metadata, cast_bool
from support.plex_media import scan_videos, get_plex_metadata, PartUnknownException
from support.helpers import track_usage, get_title_for_video_metadata, cast_bool, PartUnknownException
from support.plex_media import scan_videos, get_plex_metadata
class Task(object):
@@ -80,14 +81,16 @@ class Task(object):
return
def run(self):
Log.Info(u"Task: running: %s", self.name)
self.time_start = datetime.datetime.now()
def post_run(self, data_holder):
self.running = False
self.last_run = datetime.datetime.now()
if self.time_start:
if self.time_start and self.last_run:
self.last_run_time = self.last_run - self.time_start
self.time_start = None
Log.Info(u"Task: ran: %s", self.name)
class SearchAllRecentlyAddedMissing(Task):
@@ -122,7 +125,7 @@ class SearchAllRecentlyAddedMissing(Task):
def prepare(self, *args, **kwargs):
self.items_done = []
recent_items = get_recent_items()
missing = items_get_all_missing_subs(recent_items)
missing = items_get_all_missing_subs(recent_items, sleep_after_request=0.2)
ids = set([id for added_at, id, title, item, missing_languages in missing if not is_ignored(id, item=item)])
self.items_searching = missing
self.items_searching_ids = ids
@@ -138,14 +141,19 @@ class SearchAllRecentlyAddedMissing(Task):
for added_at, item_id, title, item, missing_languages in self.items_searching:
Log.Debug(u"Task: %s, triggering refresh for %s (%s)", self.name, title, item_id)
refresh_item(item_id)
try:
refresh_item(item_id)
except URLError:
# timeout
pass
search_started = datetime.datetime.now()
tries = 1
while 1:
if item_id in self.items_done:
items_done_count += 1
Log.Debug(u"Task: %s, item %s done", self.name, item_id)
self.percentage = int(items_done_count * 100 / missing_count)
Log.Debug(u"Task: %s, item %s done (%s%%, %s/%s)", self.name, item_id, self.percentage,
items_done_count, missing_count)
break
# item considered stalled after self.stall_time seconds passed after last refresh
@@ -158,14 +166,18 @@ class SearchAllRecentlyAddedMissing(Task):
Log.Debug(u"Task: %s, item stalled for %s seconds: %s, retrying", self.name, self.stall_time,
item_id)
tries += 1
refresh_item(item_id)
try:
refresh_item(item_id)
except URLError:
pass
search_started = datetime.datetime.now()
time.sleep(1)
time.sleep(0.1)
# we can't hammer the PMS, otherwise requests will be stalled
time.sleep(1)
time.sleep(5)
Log.Debug("Task: %s, done. Failed items: %s", self.name, self.items_failed)
Log.Debug("Task: %s, done (%s%%, %s/%s). Failed items: %s", self.name, self.percentage,
items_done_count, missing_count, self.items_failed)
self.running = False
def post_run(self, task_data):
@@ -179,13 +191,11 @@ class SearchAllRecentlyAddedMissing(Task):
class SubtitleListingMixin(object):
def list_subtitles(self, rating_key, item_type, part_id, language):
def list_subtitles(self, rating_key, item_type, part_id, language, skip_wrong_fps=True):
metadata = get_plex_metadata(rating_key, part_id, item_type)
if item_type == "episode":
min_score = 240
else:
min_score = 60
if not metadata:
return
scanned_parts = scan_videos([metadata], kind="series" if item_type == "episode" else "movie", ignore_all=True)
if not scanned_parts:
@@ -195,9 +205,21 @@ class SubtitleListingMixin(object):
video, plex_part = scanned_parts.items()[0]
config.init_subliminal_patches()
provider_settings = config.provider_settings.copy()
if not skip_wrong_fps:
provider_settings = config.provider_settings.copy()
provider_settings["opensubtitles"]["skip_wrong_fps"] = False
if item_type == "episode":
min_score = 240
if video.is_special:
min_score = 180
else:
min_score = 60
available_subs = list_all_subtitles(scanned_parts, {Language.fromietf(language)},
providers=config.providers,
provider_configs=config.provider_settings,
provider_configs=provider_settings,
pool_class=config.provider_pool)
use_hearing_impaired = Prefs['subtitles.search.hearingImpaired'] in ("prefer", "force HI")
@@ -248,7 +270,7 @@ class DownloadSubtitleMixin(object):
if subtitle.content:
try:
whack_missing_parts(scanned_parts)
save_subtitles(scanned_parts, {video: [subtitle]}, mode=mode)
save_subtitles(scanned_parts, {video: [subtitle]}, mode=mode, mods=config.default_mods)
Log.Debug("Manually downloaded subtitle for: %s", rating_key)
download_successful = True
refresh_item(rating_key)
@@ -291,7 +313,13 @@ class AvailableSubsForItem(SubtitleListingMixin, Task):
super(AvailableSubsForItem, self).run()
self.running = True
track_usage("Subtitle", "manual", "list", 1)
self.data = self.list_subtitles(self.rating_key, self.item_type, self.part_id, self.language)
subs = self.list_subtitles(self.rating_key, self.item_type, self.part_id, self.language, skip_wrong_fps=False)
if not subs:
self.data = None
return
# we can't have nasty unpicklable stuff like ZipFile, BytesIO etc in self.data
self.data = [s.make_picklable() for s in subs]
def post_run(self, task_data):
super(AvailableSubsForItem, self).post_run(task_data)
@@ -362,13 +390,26 @@ class FindBetterSubtitles(DownloadSubtitleMixin, SubtitleListingMixin, Task):
return
now = datetime.datetime.now()
min_score_series = int(Prefs["subtitles.search.minimumTVScore2"].strip())
min_score_movies = int(Prefs["subtitles.search.minimumMovieScore2"].strip())
overwrite_manually_modified = cast_bool(
Prefs["scheduler.tasks.FindBetterSubtitles.overwrite_manually_modified"])
overwrite_manually_selected = cast_bool(
Prefs["scheduler.tasks.FindBetterSubtitles.overwrite_manually_selected"])
subtitle_storage = get_subtitle_storage()
recent_subs = subtitle_storage.load_recent_files(age_days=max_search_days)
viable_item_count = 0
for fn, stored_subs in recent_subs.iteritems():
video_id = stored_subs.video_id
cutoff = self.series_cutoff if stored_subs.item_type == "episode" else self.movies_cutoff
if stored_subs.item_type == "episode":
cutoff = self.series_cutoff
min_score = min_score_series
else:
cutoff = self.movies_cutoff
min_score = min_score_movies
# don't search for better subtitles until at least 30 minutes have passed
if stored_subs.added_at + datetime.timedelta(minutes=30) > now:
@@ -379,6 +420,7 @@ class FindBetterSubtitles(DownloadSubtitleMixin, SubtitleListingMixin, Task):
if stored_subs.added_at + datetime.timedelta(days=max_search_days) <= now:
continue
viable_item_count += 1
ditch_parts = []
# look through all stored subtitle data
@@ -398,14 +440,20 @@ class FindBetterSubtitles(DownloadSubtitleMixin, SubtitleListingMixin, Task):
# late cutoff met? skip
if current_score >= cutoff:
Log.Debug(u"Skipping finding better subs, cutoff met (current: %s, cutoff: %s): %s",
current_score, cutoff, stored_subs.title)
Log.Debug(u"Skipping finding better subs, cutoff met (current: %s, cutoff: %s): %s (%s)",
current_score, cutoff, stored_subs.title, video_id)
continue
# got manual subtitle but don't want to touch those?
if current_mode == "m" and \
not cast_bool(Prefs["scheduler.tasks.FindBetterSubtitles.overwrite_manually_selected"]):
Log.Debug(u"Skipping finding better subs, had manual: %s", stored_subs.title)
if current_mode == "m" and not overwrite_manually_selected:
Log.Debug(u"Skipping finding better subs, had manual: %s (%s)", stored_subs.title, video_id)
continue
# subtitle modifications different from default
if not overwrite_manually_modified and current.mods \
and set(current.mods).difference(set(config.default_mods)):
Log.Debug(u"Skipping finding better subs, it has manual modifications: %s (%s)",
stored_subs.title, video_id)
continue
try:
@@ -420,7 +468,7 @@ class FindBetterSubtitles(DownloadSubtitleMixin, SubtitleListingMixin, Task):
better_downloaded = False
better_tried_download = 0
for sub in subs:
if sub.score > current_score:
if sub.score > current_score and sub.score > min_score:
Log.Debug("Better subtitle found for %s, downloading", video_id)
better_tried_download += 1
ret = self.download_subtitle(sub, video_id, mode="b")
@@ -444,8 +492,13 @@ class FindBetterSubtitles(DownloadSubtitleMixin, SubtitleListingMixin, Task):
pass
subtitle_storage.save(stored_subs)
time.sleep(1)
if better_found:
Log.Debug("Task: %s, done. Better subtitles found for %s items", self.name, better_found)
Log.Debug("Task: %s, done. Better subtitles found for %s/%s items", self.name, better_found,
viable_item_count)
else:
Log.Debug("Task: %s, done. No better subtitles found for %s items", self.name, viable_item_count)
class SubtitleStorageMaintenance(Task):
@@ -465,9 +518,27 @@ class SubtitleStorageMaintenance(Task):
Log.Info("Nothing to do")
class MigrateSubtitleStorage(Task):
periodic = False
frequency = None
def run(self):
super(MigrateSubtitleStorage, self).run()
self.running = True
Log.Info("Running subtitle storage migration")
storage = get_subtitle_storage()
for fn in storage.get_all_files():
if fn.endswith(".json.gz"):
continue
Log.Debug("Migrating %s", fn)
storage.load(None, fn)
scheduler.register(SearchAllRecentlyAddedMissing)
scheduler.register(AvailableSubsForItem)
scheduler.register(DownloadSubtitleForItem)
scheduler.register(MissingSubtitles)
scheduler.register(FindBetterSubtitles)
scheduler.register(SubtitleStorageMaintenance)
scheduler.register(MigrateSubtitleStorage)
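The `FindBetterSubtitles` hunk above now downloads a candidate only when its score beats both the stored score and the configured minimum. A minimal standalone sketch of that gate follows; `Candidate` and `pick_better` are hypothetical names used for illustration and are not part of the plugin, and the sketch targets Python 3 rather than the plugin's Python 2 runtime.

```python
from collections import namedtuple

# Hypothetical stand-in for a provider subtitle; only the score matters here.
Candidate = namedtuple("Candidate", ["name", "score"])


def pick_better(candidates, current_score, min_score):
    """Keep candidates that beat the stored score AND the configured minimum.

    Both comparisons are strict, mirroring
    `sub.score > current_score and sub.score > min_score` in the diff.
    """
    return [c for c in candidates
            if c.score > current_score and c.score > min_score]


candidates = [Candidate("a", 50), Candidate("b", 65), Candidate("c", 90)]
better = pick_better(candidates, current_score=40, min_score=60)
print([c.name for c in better])  # ['b', 'c']
```

Because the comparison is strict, a candidate scoring exactly the minimum is skipped.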
+57 -7
@@ -258,13 +258,14 @@
"35",
"30",
"25",
"21",
"20",
"15",
"10",
"5",
"0"
],
"default": "25"
"default": "21"
},
{
"id": "provider.addic7ed.use_random_agents",
@@ -332,7 +333,7 @@
},
{
"id": "providers.multithreading",
"label": "Search enabled providers simuntaneously (multithreading)",
"label": "Search enabled providers simultaneously (multithreading)",
"type": "bool",
"default": "true"
},
@@ -356,7 +357,7 @@
},
{
"id": "subtitles.scan.exotic_ext",
"label": "Scan: include \"exotic\" external subtitle formats (anything else than .srt/.ssa/.ass)",
"label": "Scan: include \"exotic\" subtitle formats (anything else than .srt/.ssa/.ass; embedded or external)",
"type": "bool",
"default": "false"
},
@@ -381,7 +382,7 @@
"id": "subtitles.search.minimumMovieScore2",
"label": "Minimum score for movies (min: 60, def/sane: 69, min-ideal: 82; see http://v.ht/szscores)",
"type": "text",
"default": "69"
"default": "60"
},
{
"id": "subtitles.search.hearingImpaired",
@@ -399,14 +400,51 @@
"id": "subtitles.remove_hi",
"label": "Remove Hearing Impaired tags from downloaded subtitles",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.fix_common",
"label": "Fix common whitespace/punctuation issues in subtitles",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.fix_ocr",
"label": "Fix common OCR errors in downloaded subtitles",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.enforce_encoding",
"label": "Normalize subtitle encoding to UTF-8",
"label": "Normalize subtitle encoding to UTF-8 (highly recommended!)",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.colors",
"label": "Change colors of subtitles to",
"type": "enum",
"values": [
"don't change",
"white",
"light-grey",
"red",
"green",
"yellow",
"blue",
"magenta",
"cyan",
"black",
"dark-red",
"dark-green",
"dark-yellow",
"dark-blue",
"dark-magenta",
"dark-cyan",
"dark-grey"
],
"default": "don't change"
},
{
"id": "subtitles.save.filesystem",
"label": "Store subtitles next to media files (instead of metadata)",
@@ -498,7 +536,7 @@
"id": "scheduler.max_recent_items_per_library",
"label": "Scheduler: Recent items to consider per library",
"type": "text",
"default": "500"
"default": "1000"
},
{
"id": "scheduler.tasks.FindBetterSubtitles.frequency",
@@ -524,6 +562,12 @@
"type": "bool",
"default": "true"
},
{
"id": "scheduler.tasks.FindBetterSubtitles.overwrite_manually_modified",
"label": "Scheduler: Overwrite subtitles with non-default subtitle modifications when better found",
"type": "bool",
"default": "false"
},
{
"id": "history_size",
"label": "History: amount of items to store historical data for",
@@ -599,7 +643,7 @@
},
{
"id": "notify_executable",
"label": "Call this executable upon successful subtitle download",
"label": "Call this executable upon successful subtitle download (see Wiki for details)",
"type": "text",
"default": ""
},
@@ -622,6 +666,12 @@
],
"default": "WARNING"
},
{
"id": "log_debug_mods",
"label": "Log subtitle modification (debug)",
"type": "bool",
"default": "false"
},
{
"id": "log_console",
"label": "Log to console (for development/debugging)",
+3 -3
@@ -9,11 +9,11 @@
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleShortVersionString</key>
<string>2.0.0</string>
<string>2.0.20</string>
<key>CFBundleSignature</key>
<string>????</string>
<key>CFBundleVersion</key>
<string>2.0.0.10</string>
<string>2.0.20.1364</string>
<key>PlexFrameworkVersion</key>
<string>2</string>
<key>PlexPluginClass</key>
@@ -32,7 +32,7 @@
&lt;h1&gt;Sub-Zero for Plex&lt;/h1&gt;&lt;i&gt;Subtitles done right&lt;/i&gt;
Version 2.0.0.10 DEV
Version 2.0.20.1364 RC9
Originally based on @bramwalet's awesome &lt;a href=&quot;https://github.com/bramwalet/Subliminal.bundle&quot;&gt;Subliminal.bundle&lt;/a&gt;
+2 -1
@@ -369,7 +369,8 @@ class Chapter(object):
if chapterdisplays:
string = chapterdisplays[0].get('ChapString')
language = chapterdisplays[0].get('ChapLanguage')
return cls(start, hidden, enabled, end, string, language)
return cls(start, hidden, enabled, end, string, language)
return cls(start, hidden, enabled, end)
def __repr__(self):
return '<%s [%s, enabled=%s]>' % (self.__class__.__name__, self.start, self.enabled)
@@ -168,9 +168,13 @@ def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_na
while size is None or stream.tell() - start < size:
try:
element = parse_element(stream, specs)
if not element or not hasattr(element, "type"):
stream.seek(element.size, 1)
continue
if element.type is None:
logger.error('Element with id 0x%x is not in the specs' % element_id)
stream.seek(element_size, 1)
logger.error('Element with id 0x%x is not in the specs' % element.id)
stream.seek(element.size, 1)
continue
elif element.type in ignore_element_types or element.name in ignore_element_names:
logger.info('%s %s %s ignored', element.__class__.__name__, element.name, element.type)
@@ -39,12 +39,13 @@ def audio_codec():
rebulk.defaults(name="audio_codec", conflict_solver=audio_codec_priority)
rebulk.regex("MP3", "LAME", r"LAME(?:\d)+-?(?:\d)+", value="MP3")
rebulk.regex("Dolby", "DolbyDigital", "Dolby-Digital", "DDP?", value="DolbyDigital")
rebulk.regex("Dolby", "DolbyDigital", "Dolby-Digital", "DD", value="DolbyDigital")
rebulk.regex("DolbyAtmos", "Dolby-Atmos", "Atmos", value="DolbyAtmos")
rebulk.regex("AAC", value="AAC")
rebulk.string("AAC", value="AAC")
rebulk.regex("AC3D?", value="AC3")
rebulk.regex("Flac", value="FLAC")
rebulk.regex("DTS", value="DTS")
rebulk.string('EAC3', 'DDP', 'DD+', value="EAC3")
rebulk.string("Flac", value="FLAC")
rebulk.string("DTS", value="DTS")
rebulk.regex("True-?HD", value="TrueHD")
rebulk.defaults(name="audio_profile")
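The audio codec hunk above moves `DDP` out of the `DolbyDigital` pattern and adds `EAC3`, `DDP` and `DD+` as aliases of `EAC3`. The sketch below reproduces only that alias mapping with the standard `re` module. It does not use rebulk, and `normalize_audio_codec` is a hypothetical helper that expects a single, already isolated token.

```python
import re

# Alias table in priority order; an illustrative subset of the guessit rules.
AUDIO_CODEC_ALIASES = [
    (re.compile(r"EAC3|DDP|DD\+", re.IGNORECASE), "EAC3"),
    (re.compile(r"Dolby|Dolby-?Digital|DD", re.IGNORECASE), "DolbyDigital"),
]


def normalize_audio_codec(token):
    """Map one token to a canonical codec name, or None if unknown."""
    for pattern, value in AUDIO_CODEC_ALIASES:
        # fullmatch avoids treating "DD" inside a longer word as a codec.
        if pattern.fullmatch(token):
            return value
    return None


for token in ("DD", "DDP", "DD+", "Dolby-Digital", "AAC"):
    print(token, "->", normalize_audio_codec(token))
```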
@@ -34,15 +34,17 @@ def container():
'ogv', 'qt', 'ra', 'ram', 'rm', 'ts', 'wav', 'webm', 'wma', 'wmv',
'iso', 'vob']
torrent = ['torrent']
nzb = ['nzb']
rebulk.regex(r'\.'+build_or_pattern(subtitles)+'$', exts=subtitles, tags=['extension', 'subtitle'])
rebulk.regex(r'\.'+build_or_pattern(info)+'$', exts=info, tags=['extension', 'info'])
rebulk.regex(r'\.'+build_or_pattern(videos)+'$', exts=videos, tags=['extension', 'video'])
rebulk.regex(r'\.'+build_or_pattern(torrent)+'$', exts=torrent, tags=['extension', 'torrent'])
rebulk.regex(r'\.'+build_or_pattern(nzb)+'$', exts=nzb, tags=['extension', 'nzb'])
rebulk.defaults(name='container',
validator=seps_surround,
formatter=lambda s: s.upper(),
formatter=lambda s: s.lower(),
conflict_solver=lambda match, other: match
if other.name in ['format',
'video_codec'] or other.name == 'container' and 'extension' in other.tags
@@ -51,5 +53,6 @@ def container():
rebulk.string(*[sub for sub in subtitles if sub not in ['sub']], tags=['subtitle'])
rebulk.string(*videos, tags=['video'])
rebulk.string(*torrent, tags=['torrent'])
rebulk.string(*nzb, tags=['nzb'])
return rebulk
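The container hunk above switches the formatter from upper-case to lower-case, which is why the test fixtures further down change `MP4` to `mp4`. A rough sketch of the same idea with plain `re` follows. `or_pattern` is a hypothetical stand-in, not guessit's `build_or_pattern`, and the extension list is a small illustrative subset.

```python
import re

VIDEO_EXTS = ["mkv", "mp4", "avi", "wmv"]  # illustrative subset only


def or_pattern(values):
    """Build a non-capturing alternation from literal values."""
    return "(?:" + "|".join(re.escape(v) for v in values) + ")"


# Match a known extension at the very end of the name, in any letter case.
EXTENSION_RE = re.compile(r"\.(" + or_pattern(VIDEO_EXTS) + r")$", re.IGNORECASE)


def container_of(filename):
    """Return the lower-cased container extension, or None."""
    match = EXTENSION_RE.search(filename)
    return match.group(1).lower() if match else None


print(container_of("Breaking.Bad.S01E01.2008.BluRay.VC1.1080P.5.1.WMV"))  # wmv
```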
@@ -5,7 +5,7 @@ Episode title
"""
from collections import defaultdict
from rebulk import Rebulk, Rule, AppendMatch, RenameMatch, POST_PROCESS
from rebulk import Rebulk, Rule, AppendMatch, RemoveMatch, RenameMatch, POST_PROCESS
from ..common import seps, title_seps
from ..common.formatters import cleanup
@@ -19,8 +19,12 @@ def episode_title():
:return: Created Rebulk object
:rtype: Rebulk
"""
rebulk = Rebulk().rules(EpisodeTitleFromPosition,
AlternativeTitleReplace,
previous_names = ('episode', 'episode_details', 'episode_count',
'season', 'season_count', 'date', 'title', 'year')
rebulk = Rebulk().rules(RemoveConflictsWithEpisodeTitle(previous_names),
EpisodeTitleFromPosition(previous_names),
AlternativeTitleReplace(previous_names),
TitleToEpisodeTitle,
Filepart3EpisodeTitle,
Filepart2EpisodeTitle,
@@ -28,6 +32,62 @@ def episode_title():
return rebulk
class RemoveConflictsWithEpisodeTitle(Rule):
"""
Remove conflicting matches that might lead to wrong episode_title parsing.
"""
priority = 64
consequence = RemoveMatch
def __init__(self, previous_names):
super(RemoveConflictsWithEpisodeTitle, self).__init__()
self.previous_names = previous_names
self.next_names = ('streaming_service', 'screen_size', 'format',
'video_codec', 'audio_codec', 'other', 'container')
self.affected_if_holes_after = ('part', )
self.affected_names = ('part', 'year')
def when(self, matches, context):
to_remove = []
for filepart in matches.markers.named('path'):
for match in matches.range(filepart.start, filepart.end,
predicate=lambda m: m.name in self.affected_names):
before = matches.previous(match, index=0,
predicate=lambda m, fp=filepart: not m.private and m.start >= fp.start)
if not before or before.name not in self.previous_names:
continue
after = matches.next(match, index=0,
predicate=lambda m, fp=filepart: not m.private and m.end <= fp.end)
if not after or after.name not in self.next_names:
continue
group = matches.markers.at_match(match, predicate=lambda m: m.name == 'group', index=0)
def has_value_in_same_group(current_match, current_group=group):
"""Return true if current match has value and belongs to the current group."""
return current_match.value.strip(seps) and (
current_group == matches.markers.at_match(current_match,
predicate=lambda mm: mm.name == 'group', index=0)
)
holes_before = matches.holes(before.end, match.start, predicate=has_value_in_same_group)
holes_after = matches.holes(match.end, after.start, predicate=has_value_in_same_group)
if not holes_before and not holes_after:
continue
if match.name in self.affected_if_holes_after and not holes_after:
continue
to_remove.append(match)
if match.parent:
to_remove.append(match.parent)
return to_remove
class TitleToEpisodeTitle(Rule):
"""
If multiple different titles are found, convert the one following the episode number to episode_title.
@@ -65,12 +125,14 @@ class EpisodeTitleFromPosition(TitleBaseRule):
"""
dependency = TitleToEpisodeTitle
def __init__(self, previous_names):
super(EpisodeTitleFromPosition, self).__init__('episode_title', ['title'])
self.previous_names = previous_names
def hole_filter(self, hole, matches):
episode = matches.previous(hole,
lambda previous: any(name in previous.names
for name in ['episode', 'episode_details',
'episode_count', 'season', 'season_count',
'date', 'title', 'year']),
for name in self.previous_names),
0)
crc32 = matches.named('crc32')
@@ -88,9 +150,6 @@ class EpisodeTitleFromPosition(TitleBaseRule):
return False
return super(EpisodeTitleFromPosition, self).should_remove(match, matches, filepart, hole, context)
def __init__(self):
super(EpisodeTitleFromPosition, self).__init__('episode_title', ['title'])
def when(self, matches, context):
if matches.named('episode_title'):
return
@@ -104,6 +163,10 @@ class AlternativeTitleReplace(Rule):
dependency = EpisodeTitleFromPosition
consequence = RenameMatch
def __init__(self, previous_names):
super(AlternativeTitleReplace, self).__init__()
self.previous_names = previous_names
def when(self, matches, context):
if matches.named('episode_title'):
return
@@ -115,10 +178,7 @@ class AlternativeTitleReplace(Rule):
if main_title:
episode = matches.previous(main_title,
lambda previous: any(name in previous.names
for name in ['episode', 'episode_details',
'episode_count', 'season',
'season_count',
'date', 'title', 'year']),
for name in self.previous_names),
0)
crc32 = matches.named('crc32')
@@ -231,14 +231,16 @@ def episodes():
formatter={'season': int, 'other': lambda match: 'Complete'})
# 12, 13
rebulk.chain(tags=['bonus-conflict', 'weak-movie', 'weak-episode'], formatter={'episode': int, 'version': int}) \
rebulk.chain(tags=['bonus-conflict', 'weak-movie', 'weak-episode'], formatter={'episode': int, 'version': int},
disabled=lambda context: context.get('type') == 'movie') \
.defaults(validator=None) \
.regex(r'(?P<episode>\d{2})') \
.regex(r'v(?P<version>\d+)').repeater('?') \
.regex(r'(?P<episodeSeparator>[x-])(?P<episode>\d{2})').repeater('*')
# 012, 013
rebulk.chain(tags=['bonus-conflict', 'weak-movie', 'weak-episode'], formatter={'episode': int, 'version': int}) \
rebulk.chain(tags=['bonus-conflict', 'weak-movie', 'weak-episode'], formatter={'episode': int, 'version': int},
disabled=lambda context: context.get('type') == 'movie') \
.defaults(validator=None) \
.regex(r'0(?P<episode>\d{1,2})') \
.regex(r'v(?P<version>\d+)').repeater('?') \
@@ -246,7 +248,8 @@ def episodes():
# 112, 113
rebulk.chain(tags=['bonus-conflict', 'weak-movie', 'weak-episode'], formatter={'episode': int, 'version': int},
disabled=lambda context: not context.get('episode_prefer_number', False)) \
disabled=lambda context: (not context.get('episode_prefer_number', False) or
context.get('type') == 'movie')) \
.defaults(validator=None) \
.regex(r'(?P<episode>\d{3,4})') \
.regex(r'v(?P<version>\d+)').repeater('?') \
@@ -287,7 +290,8 @@ def episodes():
rebulk.chain(tags=['bonus-conflict', 'weak-movie', 'weak-episode', 'weak-duplicate'],
formatter={'season': int, 'episode': int, 'version': int},
conflict_solver=lambda match, other: match if other.name == 'year' else '__default__',
disabled=lambda context: context.get('episode_prefer_number', False)) \
disabled=lambda context: (context.get('episode_prefer_number', False) or
context.get('type') == 'movie')) \
.defaults(validator=None) \
.regex(r'(?P<season>\d{1,2})(?P<episode>\d{2})') \
.regex(r'v(?P<version>\d+)').repeater('?') \
@@ -460,8 +464,21 @@ class RemoveWeakIfMovie(Rule):
return context.get('type') != 'episode'
def when(self, matches, context):
if matches.named('year'):
return matches.tagged('weak-movie')
to_remove = []
to_ignore = set()
remove = False
for filepart in matches.markers.named('path'):
year = matches.range(filepart.start, filepart.end, predicate=lambda m: m.name == 'year', index=0)
if year:
remove = True
next_match = matches.next(year, predicate=lambda m, fp=filepart: m.private and m.end <= fp.end, index=0)
if next_match and not matches.at_match(next_match, predicate=lambda m: m.name == 'year'):
to_ignore.add(next_match.initiator)
if remove:
to_remove.extend(matches.tagged('weak-movie', predicate=lambda m: m.initiator not in to_ignore))
return to_remove
class RemoveWeakIfSxxExx(Rule):
@@ -39,8 +39,7 @@ COMMON_WORDS_STRICT = frozenset(['brazil'])
UNDETERMINED = babelfish.Language('und')
SYN = {('und', None): ['unknown', 'inconnu', 'unk'],
('ell', None): ['gr', 'greek'],
SYN = {('ell', None): ['gr', 'greek'],
('spa', None): ['esp', 'español', 'espanol'],
('fra', None): ['français', 'vf', 'vff', 'vfi', 'vfq'],
('swe', None): ['se'],
@@ -85,6 +85,7 @@ class ValidateWebsitePrefix(Rule):
"""
Validate website prefixes
"""
priority = 64
consequence = RemoveMatch
def when(self, matches, context):
@@ -1814,7 +1814,7 @@
format: HDTV
video_codec: h264
audio_codec: AAC
container: MP4
container: mp4
release_group: k3n
type: episode
@@ -1885,7 +1885,7 @@
? Breaking.Bad.S01E01.2008.BluRay.VC1.1080P.5.1.WMV-NOVO
: audio_channels: '5.1'
container: WMV
container: wmv
episode: 1
format: BluRay
release_group: NOVO
@@ -1922,9 +1922,7 @@
? Fear.The.Walking.Dead.S02E01.HDTV.x264.AAC.MP4-k3n.mp4
: audio_codec: AAC
container:
- MP4
- mp4
container: mp4
episode: 1
format: HDTV
mimetype: video/mp4
@@ -2242,7 +2240,7 @@
screen_size: 1080p
streaming_service: Amazon Prime
format: WEBRip
audio_codec: DolbyDigital
audio_codec: EAC3
audio_channels: '5.1'
video_codec: h264
type: episode
@@ -2692,7 +2690,7 @@
screen_size: 4K
streaming_service: Amazon Prime
format: WEBRip
audio_codec: DolbyDigital
audio_codec: EAC3
audio_channels: '5.1'
video_codec: h264
release_group: Group
@@ -3311,7 +3309,7 @@
screen_size: 720p
format: WEBRip
video_codec: h264
container: MKV
container: mkv
audio_codec: AC3
audio_channels: '5.1'
release_group: Ehhhh
@@ -3846,3 +3844,113 @@
release_group: 0SEC [GloDLS]
container: mkv
type: episode
? Anthony.Bourdain.Parts.Unknown.S09E01.Los.Angeles.720p.HDTV.x264-MiNDTHEGAP
: title: Anthony Bourdain Parts Unknown
season: 9
episode: 1
episode_title: Los Angeles
screen_size: 720p
format: HDTV
video_codec: h264
release_group: MiNDTHEGAP
type: episode
? -feud.s01e05.and.the.winner.is.(the.oscars.of.1963).720p.amzn.webrip.dd5.1.x264-casstudio.mkv
: year: 1963
? feud.s01e05.and.the.winner.is.(the.oscars.of.1963).720p.amzn.webrip.dd5.1.x264-casstudio.mkv
: title: feud
season: 1
episode: 5
episode_title: and the winner is
screen_size: 720p
streaming_service: Amazon Prime
format: WEBRip
audio_codec: DolbyDigital
audio_channels: '5.1'
video_codec: h264
release_group: casstudio
container: mkv
type: episode
? Adventure.Time.S08E16.Elements.Part.1.Skyhooks.720p.WEB-DL.AAC2.0.H.264-RTN.mkv
: title: Adventure Time
season: 8
episode: 16
season: 8
episode: 16
episode_title: Elements Part 1 Skyhooks
screen_size: 720p
format: WEB-DL
audio_codec: AAC
audio_channels: '2.0'
video_codec: h264
release_group: RTN
container: mkv
type: episode
? D:\TV\SITCOMS (CLASSIC)\That '70s Show\Season 07\That '70s Show - S07E22 - 2000 Light Years from Home.mkv
: title: That '70s Show
season: 7
episode: 22
episode_title: 2000 Light Years from Home
other: Classic
container: mkv
mimetype: video/x-matroska
type: episode
? Show.Name.S02E01.Super.Title.720p.WEB-DL.DD5.1.H.264-ABC.nzb
: title: Show Name
season: 2
episode: 1
episode_title: Super Title
screen_size: 720p
format: WEB-DL
audio_codec: DolbyDigital
audio_channels: '5.1'
video_codec: h264
release_group: ABC
container: nzb
type: episode
? "[SGKK] Bleach 312v1 [720p/mkv]-Group.mkv"
: title: Bleach
season: 3
episode: 12
version: 1
screen_size: 720p
release_group: Group
container: mkv
type: episode
? The.Expanse.S02E08.720p.WEBRip.x264.EAC3-KiNGS.mkv
: title: The Expanse
season: 2
episode: 8
screen_size: 720p
format: WEBRip
video_codec: h264
audio_codec: EAC3
release_group: KiNGS
container: mkv
type: episode
? Series_name.2005.211.episode.title.avi
: title: Series name
year: 2005
season: 2
episode: 11
episode_title: episode title
container: avi
type: episode
? the.flash.2014.208.hdtv-lol[ettv].mkv
: title: the flash
year: 2014
season: 2
episode: 8
format: HDTV
release_group: lol[ettv]
container: mkv
type: episode
@@ -644,7 +644,7 @@
- Timsit
- Lindon
screen_size: 1080p
container: MKV
container: mkv
format: HDTV
? some.movie.720p.bluray.x264-mind
@@ -1082,3 +1082,18 @@
format: BluRay
screen_size: 1080p
type: movie
? 10 Cloverfield Lane.[Blu-Ray 1080p].[MULTI]
: options: --type movie
title: 10 Cloverfield Lane
format: BluRay
screen_size: 1080p
language: Multiple languages
type: movie
? 007.Spectre.[HDTC.MD].[TRUEFRENCH]
: options: --type movie
title: 007 Spectre
format: HDTC
language: French
type: movie
@@ -10,10 +10,14 @@
? +DolbyDigital
? +DD
? +DDP
? +Dolby Digital
: audio_codec: DolbyDigital
? +DDP
? +DD+
? +EAC3
: audio_codec: EAC3
? +DolbyAtmos
? +Dolby Atmos
? +Atmos
@@ -146,7 +146,7 @@
? Show.Name.-.Season.1.to.3.-.Mp4.1080p
? Show.Name.-.Season.1~3.-.Mp4.1080p
? Show.Name.-.Saison.1.a.3.-.Mp4.1080p
: container: MP4
: container: mp4
screen_size: 1080p
season:
- 1
@@ -761,14 +761,15 @@
type: episode
video_codec: h264
# Episode title is indeed 'October 8, 2014'
# https://thetvdb.com/?tab=episode&seriesid=82483&seasonid=569935&id=4997362&lid=7
? The Soup - 11x41 - October 8, 2014.mp4
: container: mp4
episode: 41
episode_title: October 8
episode_title: October 8, 2014
season: 11
title: The Soup
type: episode
year: 2014
? Red.Rock.S02E59.WEB-DLx264-JIVE
: episode: 59
@@ -0,0 +1,24 @@
from .utils import hashodict, NoNumpyException, NoPandasException, get_scalar_repr, encode_scalars_inplace
from .comment import strip_comment_line_with_symbol, strip_comments
from .encoders import TricksEncoder, json_date_time_encode, class_instance_encode, json_complex_encode, \
numeric_types_encode, ClassInstanceEncoder, json_set_encode, pandas_encode, nopandas_encode, \
numpy_encode, NumpyEncoder, nonumpy_encode, NoNumpyEncoder
from .decoders import DuplicateJsonKeyException, TricksPairHook, json_date_time_hook, json_complex_hook, \
numeric_types_hook, ClassInstanceHook, json_set_hook, pandas_hook, nopandas_hook, json_numpy_obj_hook, \
json_nonumpy_obj_hook
from .nonp import dumps, dump, loads, load
try:
# find_module takes just as long as importing, so no optimization possible
import numpy
except ImportError:
NUMPY_MODE = False
# from .nonp import dumps, dump, loads, load, nonumpy_encode as numpy_encode, json_nonumpy_obj_hook as json_numpy_obj_hook
else:
NUMPY_MODE = True
# from .np import dumps, dump, loads, load, numpy_encode, NumpyEncoder, json_numpy_obj_hook
# from .np_utils import encode_scalars_inplace
@@ -0,0 +1,29 @@
from re import findall
def strip_comment_line_with_symbol(line, start):
parts = line.split(start)
counts = [len(findall(r'(?:^|[^"\\]|(?:\\\\|\\")+)(")', part)) for part in parts]
total = 0
for nr, count in enumerate(counts):
total += count
if total % 2 == 0:
return start.join(parts[:nr+1]).rstrip()
else:
return line.rstrip()
def strip_comments(string, comment_symbols=frozenset(('#', '//'))):
"""
:param string: A string containing json with comments started by comment_symbols.
:param comment_symbols: Iterable of symbols that start a line comment (default # or //).
:return: The string with the comments removed.
"""
lines = string.splitlines()
for k in range(len(lines)):
for symbol in comment_symbols:
lines[k] = strip_comment_line_with_symbol(lines[k], start=symbol)
return '\n'.join(lines)
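A short usage sketch for the comment stripper above. The two functions are reproduced from the hunk with their indentation restored, which is an assumption about the original layout since the diff view lost it. The sample input is made up for illustration; the result is parsed with the standard `json` module.

```python
import json
from re import findall


def strip_comment_line_with_symbol(line, start):
    parts = line.split(start)
    # Count unescaped double quotes in each piece between comment symbols.
    counts = [len(findall(r'(?:^|[^"\\]|(?:\\\\|\\")+)(")', part)) for part in parts]
    total = 0
    for nr, count in enumerate(counts):
        total += count
        # An even running total means the symbol sits outside any string.
        if total % 2 == 0:
            return start.join(parts[:nr + 1]).rstrip()
    else:
        return line.rstrip()


def strip_comments(string, comment_symbols=frozenset(('#', '//'))):
    lines = string.splitlines()
    for k in range(len(lines)):
        for symbol in comment_symbols:
            lines[k] = strip_comment_line_with_symbol(lines[k], start=symbol)
    return '\n'.join(lines)


text = '''{
    "url": "http://example.org/#top",  // trailing comment
    # full-line comment
    "n": 1
}'''
cleaned = strip_comments(text)
print(json.loads(cleaned))
```

The `#` and `//` inside the quoted URL survive because the quote count before them is odd, so they are treated as part of the string.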
@@ -0,0 +1,248 @@
from datetime import datetime, date, time, timedelta
from fractions import Fraction
from importlib import import_module
from collections import OrderedDict
from decimal import Decimal
from logging import warning
from json_tricks import NoPandasException, NoNumpyException
class DuplicateJsonKeyException(Exception):
""" Trying to load a json map which contains duplicate keys, but allow_duplicates is False """
class TricksPairHook(object):
"""
Hook that converts json maps to the appropriate python type (dict or OrderedDict)
and then runs any number of hooks on the individual maps.
"""
def __init__(self, ordered=True, obj_pairs_hooks=None, allow_duplicates=True):
"""
:param ordered: True if maps should retain their ordering.
:param obj_pairs_hooks: An iterable of hooks to apply to elements.
"""
self.map_type = OrderedDict
if not ordered:
self.map_type = dict
self.obj_pairs_hooks = []
if obj_pairs_hooks:
self.obj_pairs_hooks = list(obj_pairs_hooks)
self.allow_duplicates = allow_duplicates
def __call__(self, pairs):
if not self.allow_duplicates:
known = set()
for key, value in pairs:
if key in known:
raise DuplicateJsonKeyException(('Trying to load a json map which contains a' +
' duplicate key "{0:}" (but allow_duplicates is False)').format(key))
known.add(key)
map = self.map_type(pairs)
for hook in self.obj_pairs_hooks:
map = hook(map)
return map
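`TricksPairHook` is built to be passed as `object_pairs_hook`, which receives every JSON object as a list of key/value pairs before any dict is created. The sketch below shows the duplicate-key check in isolation with the standard `json` module. `DuplicateKeyError`, `reject_duplicates` and `load_strict` are hypothetical names and are not part of json_tricks.

```python
import json
from collections import OrderedDict


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a key (hypothetical name)."""


def reject_duplicates(pairs):
    """object_pairs_hook: fail on repeated keys, otherwise keep key order."""
    seen = set()
    for key, _value in pairs:
        if key in seen:
            raise DuplicateKeyError('duplicate key "{0}"'.format(key))
        seen.add(key)
    return OrderedDict(pairs)


def load_strict(text):
    return json.loads(text, object_pairs_hook=reject_duplicates)


print(load_strict('{"b": 1, "a": 2}'))
try:
    load_strict('{"a": 1, "a": 2}')
except DuplicateKeyError as err:
    print("rejected:", err)
```

Without such a hook, `json.loads` silently keeps the last value for a repeated key.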
def json_date_time_hook(dct):
"""
Convert an encoded date, time, datetime or timedelta to its Python representation, including an optional timezone.
:param dct: (dict) json encoded date, time, datetime or timedelta
:return: (date/time/datetime/timedelta obj) python representation of the above
"""
def get_tz(dct):
if 'tzinfo' not in dct:
return None
try:
import pytz
except ImportError as err:
raise ImportError(('Tried to load a json object which has a timezone-aware (date)time. '
'However, `pytz` could not be imported, so the object could not be loaded. '
'Error: {0:}').format(str(err)))
return pytz.timezone(dct['tzinfo'])
if isinstance(dct, dict):
if '__date__' in dct:
return date(year=dct.get('year', 0), month=dct.get('month', 0), day=dct.get('day', 0))
elif '__time__' in dct:
tzinfo = get_tz(dct)
return time(hour=dct.get('hour', 0), minute=dct.get('minute', 0), second=dct.get('second', 0),
microsecond=dct.get('microsecond', 0), tzinfo=tzinfo)
elif '__datetime__' in dct:
tzinfo = get_tz(dct)
return datetime(year=dct.get('year', 0), month=dct.get('month', 0), day=dct.get('day', 0),
hour=dct.get('hour', 0), minute=dct.get('minute', 0), second=dct.get('second', 0),
microsecond=dct.get('microsecond', 0), tzinfo=tzinfo)
elif '__timedelta__' in dct:
return timedelta(days=dct.get('days', 0), seconds=dct.get('seconds', 0),
microseconds=dct.get('microseconds', 0))
return dct
def json_complex_hook(dct):
"""
Convert an encoded complex number to its Python representation.
:param dct: (dict) json encoded complex number (__complex__)
:return: python complex number
"""
if isinstance(dct, dict):
if '__complex__' in dct:
parts = dct['__complex__']
assert len(parts) == 2
return parts[0] + parts[1] * 1j
return dct
def numeric_types_hook(dct):
if isinstance(dct, dict):
if '__decimal__' in dct:
return Decimal(dct['__decimal__'])
if '__fraction__' in dct:
return Fraction(numerator=dct['numerator'], denominator=dct['denominator'])
return dct
class ClassInstanceHook(object):
"""
This hook tries to convert json encoded by class_instance_encoder back to its original instance.
It only works if the environment is the same, e.g. the class is similarly importable and hasn't changed.
"""
def __init__(self, cls_lookup_map=None):
self.cls_lookup_map = cls_lookup_map or {}
def __call__(self, dct):
if isinstance(dct, dict) and '__instance_type__' in dct:
mod, name = dct['__instance_type__']
attrs = dct['attributes']
if mod is None:
try:
Cls = getattr((__import__('__main__')), name)
except (ImportError, AttributeError) as err:
if name not in self.cls_lookup_map:
raise ImportError(('class {0:s} seems to have been exported from the main file, which means '
'it has no module/import path set; you need to provide cls_lookup_map which maps names '
'to classes').format(name))
Cls = self.cls_lookup_map[name]
else:
imp_err = None
try:
module = import_module('{0:}'.format(mod, name))
except ImportError as err:
imp_err = ('encountered import error "{0:}" while importing "{1:}" to decode a json file; perhaps '
'it was encoded in a different environment where {1:}.{2:} was available').format(err, mod, name)
else:
if not hasattr(module, name):
imp_err = 'imported "{0:}" but could not find "{1:}" inside while decoding a json file (found {2:})'.format(
module, name, ', '.join(attr for attr in dir(module) if not attr.startswith('_')))
Cls = getattr(module, name)
if imp_err:
if name in self.cls_lookup_map:
Cls = self.cls_lookup_map[name]
else:
raise ImportError(imp_err)
try:
obj = Cls.__new__(Cls)
except TypeError:
raise TypeError(('problem while decoding instance of "{0:s}"; this instance has a special '
'__new__ method and can\'t be restored').format(name))
if hasattr(obj, '__json_decode__'):
obj.__json_decode__(**attrs)
else:
obj.__dict__ = dict(attrs)
return obj
return dct
def json_set_hook(dct):
"""
Return an encoded set to its python representation.
"""
if isinstance(dct, dict):
if '__set__' in dct:
return set((tuple(item) if isinstance(item, list) else item) for item in dct['__set__'])
return dct
def pandas_hook(dct):
if '__pandas_dataframe__' in dct or '__pandas_series__' in dct:
# todo: this is experimental
if not getattr(pandas_hook, '_warned', False):
pandas_hook._warned = True
warning('Pandas loading support in json-tricks is experimental and may change in future versions.')
if '__pandas_dataframe__' in dct:
try:
from pandas import DataFrame
except ImportError:
raise NoPandasException('Trying to decode a map which appears to represent a pandas data structure, but pandas appears not to be installed.')
from numpy import dtype, array
meta = dct.pop('__pandas_dataframe__')
indx = dct.pop('index') if 'index' in dct else None
dtypes = dict((colname, dtype(tp)) for colname, tp in zip(meta['column_order'], meta['types']))
data = OrderedDict()
for name, col in dct.items():
data[name] = array(col, dtype=dtypes[name])
return DataFrame(
data=data,
index=indx,
columns=meta['column_order'],
# mixed `dtypes` argument not supported, so use dict of numpy arrays
)
elif '__pandas_series__' in dct:
from pandas import Series
from numpy import dtype, array
meta = dct.pop('__pandas_series__')
indx = dct.pop('index') if 'index' in dct else None
return Series(
data=dct['data'],
index=indx,
name=meta['name'],
dtype=dtype(meta['type']),
)
return dct
def nopandas_hook(dct):
if isinstance(dct, dict) and ('__pandas_dataframe__' in dct or '__pandas_series__' in dct):
raise NoPandasException(('Trying to decode a map which appears to represent a pandas '
'data structure, but pandas support is not enabled, perhaps it is not installed.'))
return dct
def json_numpy_obj_hook(dct):
"""
Replace any numpy arrays previously encoded by NumpyEncoder to their proper
shape, data type and data.
:param dct: (dict) json encoded ndarray
:return: (ndarray) if input was an encoded ndarray
"""
if isinstance(dct, dict) and '__ndarray__' in dct:
try:
from numpy import asarray
import numpy as nptypes
except ImportError:
raise NoNumpyException('Trying to decode a map which appears to represent a numpy '
'array, but numpy appears not to be installed.')
order = 'A'
if 'Corder' in dct:
order = 'C' if dct['Corder'] else 'F'
if dct['shape']:
return asarray(dct['__ndarray__'], dtype=dct['dtype'], order=order)
else:
dtype = getattr(nptypes, dct['dtype'])
return dtype(dct['__ndarray__'])
return dct
def json_nonumpy_obj_hook(dct):
"""
This hook has no effect except to check if you're trying to decode numpy arrays without support, and give you a useful message.
"""
if isinstance(dct, dict) and '__ndarray__' in dct:
raise NoNumpyException(('Trying to decode a map which appears to represent a numpy array, '
'but numpy support is not enabled, perhaps it is not installed.'))
return dct
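The hooks above all follow one pattern: each receives a decoded map, returns a replacement object if it recognizes a marker key such as `__set__` or `__complex__`, and otherwise returns the map unchanged. The following is a minimal standard-library sketch of that pattern, not the library's implementation. `make_pair_hook` is a hypothetical stand-in for `TricksPairHook`, which is defined outside this section, and `set_hook` and `complex_hook` are simplified copies of the hooks shown in the diff.

```python
import json
from collections import OrderedDict


def set_hook(dct):
    # Simplified copy of json_set_hook: nested lists become tuples so they are hashable.
    if isinstance(dct, dict) and '__set__' in dct:
        return set(tuple(item) if isinstance(item, list) else item for item in dct['__set__'])
    return dct


def complex_hook(dct):
    # Simplified copy of json_complex_hook.
    if isinstance(dct, dict) and '__complex__' in dct:
        real, imag = dct['__complex__']
        return real + imag * 1j
    return dct


def make_pair_hook(hooks):
    # Hypothetical helper: build an `object_pairs_hook` that runs every hook in order.
    def pair_hook(pairs):
        obj = OrderedDict(pairs)
        for hook in hooks:
            obj = hook(obj)  # a hook that does not recognize `obj` returns it unchanged
        return obj
    return pair_hook


decoded = json.loads(
    '{"tags": {"__set__": [1, 2, [3, 4]]}, "z": {"__complex__": [2.0, 1.0]}}',
    object_pairs_hook=make_pair_hook([set_hook, complex_hook]),
)
```

Because every hook checks `isinstance(dct, dict)` first, an object already converted by an earlier hook passes through the later ones untouched.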
@@ -0,0 +1,311 @@
from datetime import datetime, date, time, timedelta
from fractions import Fraction
from logging import warning
from json import JSONEncoder
from sys import version
from decimal import Decimal
from .utils import hashodict, call_with_optional_kwargs, NoPandasException, NoNumpyException
class TricksEncoder(JSONEncoder):
"""
Encoder that runs any number of encoder functions or instances on
the objects that are being encoded.
Each encoder should make any appropriate changes and return an object,
changed or not. This will be passed to the other encoders.
"""
def __init__(self, obj_encoders=None, silence_typeerror=False, primitives=False, **json_kwargs):
"""
:param obj_encoders: An iterable of functions or encoder instances to try.
:param silence_typeerror: If set to True, ignore the TypeErrors that Encoder instances throw (default False).
"""
self.obj_encoders = []
if obj_encoders:
self.obj_encoders = list(obj_encoders)
self.silence_typeerror = silence_typeerror
self.primitives = primitives
super(TricksEncoder, self).__init__(**json_kwargs)
def default(self, obj, *args, **kwargs):
"""
This is the method of JSONEncoders that is called for each object; it calls
all the encoders with the previous one's output used as input.
It works for Encoder instances, but they are expected not to throw
`TypeError` for unrecognized types (the super method does that by default).
It never calls the `super` method so if there are non-primitive types
left at the end, you'll get an encoding error.
"""
prev_id = id(obj)
for encoder in self.obj_encoders:
if hasattr(encoder, 'default'):
#todo: write test for this scenario (maybe ClassInstanceEncoder?)
try:
obj = call_with_optional_kwargs(encoder.default, obj, primitives=self.primitives)
except TypeError as err:
if not self.silence_typeerror:
raise
elif hasattr(encoder, '__call__'):
obj = call_with_optional_kwargs(encoder, obj, primitives=self.primitives)
else:
raise TypeError('`obj_encoder` {0:} does not have `default` method and is not callable'.format(encoder))
if id(obj) == prev_id:
#todo: test
raise TypeError('Object of type {0:} could not be encoded by {1:} using encoders [{2:s}]'.format(
type(obj), self.__class__.__name__, ', '.join(str(encoder) for encoder in self.obj_encoders)))
return obj
def json_date_time_encode(obj, primitives=False):
"""
Encode a date, time, datetime or timedelta to a string or a json dictionary, including optional timezone.
:param obj: date/time/datetime/timedelta obj
:return: (dict) json primitives representation of date, time, datetime or timedelta
"""
if primitives and isinstance(obj, (date, time, datetime)):
return obj.isoformat()
if isinstance(obj, datetime):
dct = hashodict([('__datetime__', None), ('year', obj.year), ('month', obj.month),
('day', obj.day), ('hour', obj.hour), ('minute', obj.minute),
('second', obj.second), ('microsecond', obj.microsecond)])
if obj.tzinfo:
dct['tzinfo'] = obj.tzinfo.zone
elif isinstance(obj, date):
dct = hashodict([('__date__', None), ('year', obj.year), ('month', obj.month), ('day', obj.day)])
elif isinstance(obj, time):
dct = hashodict([('__time__', None), ('hour', obj.hour), ('minute', obj.minute),
('second', obj.second), ('microsecond', obj.microsecond)])
if obj.tzinfo:
dct['tzinfo'] = obj.tzinfo.zone
elif isinstance(obj, timedelta):
if primitives:
return obj.total_seconds()
else:
dct = hashodict([('__timedelta__', None), ('days', obj.days), ('seconds', obj.seconds),
('microseconds', obj.microseconds)])
else:
return obj
for key, val in tuple(dct.items()):
if not key.startswith('__') and not val:
del dct[key]
return dct
def class_instance_encode(obj, primitives=False):
"""
Encodes a class instance to json. Note that it can only be recovered if the environment allows the class to be
imported in the same way.
"""
if isinstance(obj, list) or isinstance(obj, dict):
return obj
if hasattr(obj, '__class__') and hasattr(obj, '__dict__'):
if not hasattr(obj, '__new__'):
raise TypeError('class "{0:s}" does not have a __new__ method; '.format(obj.__class__) +
('perhaps it is an old-style class not derived from `object`; add `object` as a base class to encode it.'
if (version[:2] == '2.') else 'this should not happen in Python3'))
try:
obj.__new__(obj.__class__)
except TypeError:
raise TypeError(('instance "{0:}" of class "{1:}" cannot be encoded because its __new__ method '
'cannot be called, perhaps it requires extra parameters').format(obj, obj.__class__))
mod = obj.__class__.__module__
if mod == '__main__':
mod = None
warning(('class {0:} seems to have been defined in the main file; unfortunately this means'
' that its module/import path is unknown, so you might have to provide cls_lookup_map when '
'decoding').format(obj.__class__))
name = obj.__class__.__name__
if hasattr(obj, '__json_encode__'):
attrs = obj.__json_encode__()
else:
attrs = hashodict(obj.__dict__.items())
if primitives:
return attrs
else:
return hashodict((('__instance_type__', (mod, name)), ('attributes', attrs)))
return obj
def json_complex_encode(obj, primitives=False):
"""
Encode a complex number as a json dictionary of its real and imaginary parts.
:param obj: complex number, e.g. `2+1j`
:return: (dict) json primitives representation of `obj`
"""
if isinstance(obj, complex):
if primitives:
return [obj.real, obj.imag]
else:
return hashodict(__complex__=[obj.real, obj.imag])
return obj
def numeric_types_encode(obj, primitives=False):
"""
Encode Decimal and Fraction.
:param primitives: Encode decimals and fractions as standard floats. You may lose precision. If you do this, you may need to enable `allow_nan` (decimals always allow NaNs but floats do not).
"""
if isinstance(obj, Decimal):
if primitives:
return float(obj)
else:
return {
'__decimal__': str(obj.canonical()),
}
if isinstance(obj, Fraction):
if primitives:
return float(obj)
else:
return hashodict((
('__fraction__', True),
('numerator', obj.numerator),
('denominator', obj.denominator),
))
return obj
class ClassInstanceEncoder(JSONEncoder):
"""
See `class_instance_encode`.
"""
# Not covered in tests since `class_instance_encode` is recommended way.
def __init__(self, obj, encode_cls_instances=True, **kwargs):
self.encode_cls_instances = encode_cls_instances
super(ClassInstanceEncoder, self).__init__(obj, **kwargs)
def default(self, obj, *args, **kwargs):
if self.encode_cls_instances:
obj = class_instance_encode(obj)
return super(ClassInstanceEncoder, self).default(obj, *args, **kwargs)
def json_set_encode(obj, primitives=False):
"""
Encode python sets as dictionary with key __set__ and a list of the values.
Try to sort the set to get a consistent json representation, use arbitrary order if the data is not ordinal.
"""
if isinstance(obj, set):
try:
repr = sorted(obj)
except Exception:
repr = list(obj)
if primitives:
return repr
else:
return hashodict(__set__=repr)
return obj
def pandas_encode(obj, primitives=False):
from pandas import DataFrame, Series
if isinstance(obj, (DataFrame, Series)):
#todo: this is experimental
if not getattr(pandas_encode, '_warned', False):
pandas_encode._warned = True
warning('Pandas dumping support in json-tricks is experimental and may change in future versions.')
if isinstance(obj, DataFrame):
repr = hashodict()
if not primitives:
repr['__pandas_dataframe__'] = hashodict((
('column_order', tuple(obj.columns.values)),
('types', tuple(str(dt) for dt in obj.dtypes)),
))
repr['index'] = tuple(obj.index.values)
for k, name in enumerate(obj.columns.values):
repr[name] = tuple(obj.ix[:, k].values)
return repr
if isinstance(obj, Series):
repr = hashodict()
if not primitives:
repr['__pandas_series__'] = hashodict((
('name', str(obj.name)),
('type', str(obj.dtype)),
))
repr['index'] = tuple(obj.index.values)
repr['data'] = tuple(obj.values)
return repr
return obj
def nopandas_encode(obj):
if ('DataFrame' in getattr(obj.__class__, '__name__', '') or 'Series' in getattr(obj.__class__, '__name__', '')) \
and 'pandas.' in getattr(obj.__class__, '__module__', ''):
raise NoPandasException(('Trying to encode an object of type {0:} which appears to be '
'a pandas data structure, but pandas support is not enabled, perhaps it is not installed.').format(type(obj)))
return obj
def numpy_encode(obj, primitives=False):
"""
Encodes numpy `ndarray`s as lists with meta data.
Encodes numpy scalar types as Python equivalents. Special encoding is not possible,
because int64 (in py2) and float64 (in py2 and py3) are subclasses of primitives,
which never reach the encoder.
:param primitives: If True, arrays are serialized as (nested) lists without meta info.
"""
from numpy import ndarray, generic
if isinstance(obj, ndarray):
if primitives:
return obj.tolist()
else:
dct = hashodict((
('__ndarray__', obj.tolist()),
('dtype', str(obj.dtype)),
('shape', obj.shape),
))
if len(obj.shape) > 1:
dct['Corder'] = obj.flags['C_CONTIGUOUS']
return dct
elif isinstance(obj, generic):
if NumpyEncoder.SHOW_SCALAR_WARNING:
NumpyEncoder.SHOW_SCALAR_WARNING = False
warning('json-tricks: numpy scalar serialization is experimental and may work differently in future versions')
return obj.item()
return obj
class NumpyEncoder(ClassInstanceEncoder):
"""
JSON encoder for numpy arrays.
"""
SHOW_SCALAR_WARNING = True # show a warning that numpy scalar serialization is experimental
def default(self, obj, *args, **kwargs):
"""
If input object is a ndarray it will be converted into a dict holding
data type, shape and the data. The object can be restored using json_numpy_obj_hook.
"""
warning('`NumpyEncoder` is deprecated, use `numpy_encode`') #todo
obj = numpy_encode(obj)
return super(NumpyEncoder, self).default(obj, *args, **kwargs)
def nonumpy_encode(obj):
"""
Raises an error for numpy arrays.
"""
if 'ndarray' in getattr(obj.__class__, '__name__', '') and 'numpy.' in getattr(obj.__class__, '__module__', ''):
raise NoNumpyException(('Trying to encode an object of type {0:} which appears to be '
'a numpy array, but numpy support is not enabled, perhaps it is not installed.').format(type(obj)))
return obj
class NoNumpyEncoder(JSONEncoder):
"""
See `nonumpy_encode`.
"""
def default(self, obj, *args, **kwargs):
warning('`NoNumpyEncoder` is deprecated, use `nonumpy_encode`') #todo
obj = nonumpy_encode(obj)
return super(NoNumpyEncoder, self).default(obj, *args, **kwargs)
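The encoder functions in this file share one contract: return a json-able replacement for types they recognize and return everything else unchanged, so they can be chained inside `JSONEncoder.default`. The following is a minimal standard-library sketch of that chaining idea under simplified assumptions. `ChainEncoder`, `set_encode` and `complex_encode` are illustrative names, not the library's classes, and the sketch omits the `primitives` option and the `silence_typeerror` handling of `TricksEncoder`.

```python
import json


def set_encode(obj):
    # Sets become a map with a `__set__` marker; sort when the items are orderable.
    if isinstance(obj, set):
        try:
            items = sorted(obj)
        except TypeError:
            items = list(obj)
        return {'__set__': items}
    return obj


def complex_encode(obj):
    # Complex numbers become a map holding [real, imag].
    if isinstance(obj, complex):
        return {'__complex__': [obj.real, obj.imag]}
    return obj


class ChainEncoder(json.JSONEncoder):
    """Illustrative encoder that feeds each object through a chain of functions."""

    def __init__(self, obj_encoders=(), **kwargs):
        self.obj_encoders = list(obj_encoders)
        super().__init__(**kwargs)

    def default(self, obj):
        original = obj
        for encoder in self.obj_encoders:
            obj = encoder(obj)
        if obj is original:
            # No encoder changed the object, so it cannot be serialized.
            raise TypeError('Object of type {0} could not be encoded'.format(type(obj)))
        return obj


text = ChainEncoder(obj_encoders=[set_encode, complex_encode], sort_keys=True).encode(
    {'tags': {3, 1, 2}, 'z': 2 + 1j})
```

`default` is only called for objects the standard encoder cannot handle, which is why subclasses of primitives (for example numpy's `float64`) never reach these functions, as the `numpy_encode` docstring notes.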
@@ -0,0 +1,207 @@
from gzip import GzipFile
from io import BytesIO
from json import loads as json_loads
from os import fsync
from sys import exc_info, version
from .utils import NoNumpyException # keep 'unused' imports
from .comment import strip_comment_line_with_symbol, strip_comments # keep 'unused' imports
from .encoders import TricksEncoder, json_date_time_encode, class_instance_encode, ClassInstanceEncoder, \
json_complex_encode, json_set_encode, numeric_types_encode, numpy_encode, nonumpy_encode, NoNumpyEncoder, \
nopandas_encode, pandas_encode # keep 'unused' imports
from .decoders import DuplicateJsonKeyException, TricksPairHook, json_date_time_hook, ClassInstanceHook, \
json_complex_hook, json_set_hook, numeric_types_hook, json_numpy_obj_hook, json_nonumpy_obj_hook, \
nopandas_hook, pandas_hook # keep 'unused' imports
from json import JSONEncoder
is_py3 = (version[:2] == '3.')
str_type = str if is_py3 else (basestring, unicode,)
ENCODING = 'UTF-8'
_cih_instance = ClassInstanceHook()
DEFAULT_ENCODERS = [json_date_time_encode, class_instance_encode, json_complex_encode, json_set_encode, numeric_types_encode,]
DEFAULT_HOOKS = [json_date_time_hook, _cih_instance, json_complex_hook, json_set_hook, numeric_types_hook,]
try:
import numpy
except ImportError:
DEFAULT_ENCODERS = [nonumpy_encode,] + DEFAULT_ENCODERS
DEFAULT_HOOKS = [json_nonumpy_obj_hook,] + DEFAULT_HOOKS
else:
# numpy encode needs to be before complex
DEFAULT_ENCODERS = [numpy_encode,] + DEFAULT_ENCODERS
DEFAULT_HOOKS = [json_numpy_obj_hook,] + DEFAULT_HOOKS
try:
import pandas
except ImportError:
DEFAULT_ENCODERS = [nopandas_encode,] + DEFAULT_ENCODERS
DEFAULT_HOOKS = [nopandas_hook,] + DEFAULT_HOOKS
else:
DEFAULT_ENCODERS = [pandas_encode,] + DEFAULT_ENCODERS
DEFAULT_HOOKS = [pandas_hook,] + DEFAULT_HOOKS
DEFAULT_NONP_ENCODERS = [nonumpy_encode,] + DEFAULT_ENCODERS # DEPRECATED
DEFAULT_NONP_HOOKS = [json_nonumpy_obj_hook,] + DEFAULT_HOOKS # DEPRECATED
def dumps(obj, sort_keys=None, cls=TricksEncoder, obj_encoders=DEFAULT_ENCODERS, extra_obj_encoders=(),
primitives=False, compression=None, allow_nan=False, conv_str_byte=False, **jsonkwargs):
"""
Convert a nested data structure to a json string.
:param obj: The Python object to convert.
:param sort_keys: Keep this False if you want order to be preserved.
:param cls: The json encoder class to use, defaults to `TricksEncoder`, which runs the encoders in `obj_encoders`.
:param obj_encoders: Iterable of encoders to use to convert arbitrary objects into json-able primitives.
:param extra_obj_encoders: Like `obj_encoders` but on top of them: use this to add encoders without replacing defaults. Since v3.5 these happen before default encoders.
:param allow_nan: Allow NaN and Infinity values, which is a (useful) violation of the JSON standard (default False).
:param conv_str_byte: Try to automatically convert between strings and bytes (assuming utf-8) (default False).
:return: The string containing the json-encoded version of obj.
Other arguments are passed on to `cls`. Note that `sort_keys` should be false if you want to preserve order.
"""
if not hasattr(extra_obj_encoders, '__iter__'):
raise TypeError('`extra_obj_encoders` should be a tuple in `json_tricks.dump(s)`')
encoders = tuple(extra_obj_encoders) + tuple(obj_encoders)
txt = cls(sort_keys=sort_keys, obj_encoders=encoders, allow_nan=allow_nan,
primitives=primitives, **jsonkwargs).encode(obj)
if not is_py3 and isinstance(txt, str):
txt = unicode(txt, ENCODING)
if not compression:
return txt
if compression is True:
compression = 5
txt = txt.encode(ENCODING)
sh = BytesIO()
with GzipFile(mode='wb', fileobj=sh, compresslevel=compression) as zh:
zh.write(txt)
gzstring = sh.getvalue()
return gzstring
def dump(obj, fp, sort_keys=None, cls=TricksEncoder, obj_encoders=DEFAULT_ENCODERS, extra_obj_encoders=(),
primitives=False, compression=None, force_flush=False, allow_nan=False, conv_str_byte=False, **jsonkwargs):
"""
Convert a nested data structure to a json string and write it to a file.
:param fp: File handle or path to write to.
:param compression: The gzip compression level, or None for no compression.
:param force_flush: If True, flush the file handle used and, where possible, also sync it in the operating system (default False).
The other arguments are identical to `dumps`.
"""
txt = dumps(obj, sort_keys=sort_keys, cls=cls, obj_encoders=obj_encoders, extra_obj_encoders=extra_obj_encoders,
primitives=primitives, compression=compression, allow_nan=allow_nan, conv_str_byte=conv_str_byte, **jsonkwargs)
if isinstance(fp, str_type):
fh = open(fp, 'wb+')
else:
fh = fp
if conv_str_byte:
try:
fh.write(b'')
except TypeError:
pass
# if not isinstance(txt, str_type):
# # Cannot write bytes, so must be in text mode, but we didn't get a text
# if not compression:
# txt = txt.decode(ENCODING)
else:
try:
fh.write(u'')
except TypeError:
if isinstance(txt, str_type):
txt = txt.encode(ENCODING)
try:
if 'b' not in getattr(fh, 'mode', 'b?') and not isinstance(txt, str_type) and compression:
raise IOError('If compression is enabled, the file must be opened in binary mode.')
try:
fh.write(txt)
except TypeError as err:
err.args = (err.args[0] + '. A possible reason is that the file is not opened in binary mode; '
'be sure to set file mode to something like "wb".',)
raise
finally:
if force_flush:
fh.flush()
try:
if fh.fileno() is not None:
fsync(fh.fileno())
except (ValueError,):
pass
if isinstance(fp, str_type):
fh.close()
return txt
def loads(string, preserve_order=True, ignore_comments=True, decompression=None, obj_pairs_hooks=DEFAULT_HOOKS,
extra_obj_pairs_hooks=(), cls_lookup_map=None, allow_duplicates=True, conv_str_byte=False, **jsonkwargs):
"""
Convert a json string to a nested data structure.
:param string: The string containing a json encoded data structure.
:param decode_cls_instances: True to attempt to decode class instances (requires the environment to be similar to the encoding one).
:param preserve_order: Whether to preserve order by using OrderedDicts or not.
:param ignore_comments: Remove comments (starting with # or //).
:param decompression: True to use gzip decompression, False to use raw data, None to automatically determine (default). Assumes utf-8 encoding!
:param obj_pairs_hooks: A list of dictionary hooks to apply.
:param extra_obj_pairs_hooks: Like `obj_pairs_hooks` but on top of them: use this to add hooks without replacing defaults. Since v3.5 these happen before default hooks.
:param cls_lookup_map: If set to a dict, for example ``globals()``, then classes encoded from __main__ are looked up in this dict.
:param allow_duplicates: If set to False, an error will be raised when loading a json-map that contains duplicate keys.
:param parse_float: A function to parse strings to floats (e.g. Decimal). There is also `parse_int`.
:param conv_str_byte: Try to automatically convert between strings and bytes (assuming utf-8) (default False).
:return: The decoded data structure.
Other arguments are passed on to `json.loads`.
"""
if not hasattr(extra_obj_pairs_hooks, '__iter__'):
raise TypeError('`extra_obj_pairs_hooks` should be a tuple in `json_tricks.load(s)`')
if decompression is None:
decompression = string[:2] == b'\x1f\x8b'
if decompression:
with GzipFile(fileobj=BytesIO(string), mode='rb') as zh:
string = zh.read()
string = string.decode(ENCODING)
if not isinstance(string, str_type):
if conv_str_byte:
string = string.decode(ENCODING)
else:
raise TypeError(('Cannot automatically decode object of type "{0:}" in `json_tricks.load(s)` since '
'the encoding is not known. You should instead decode the bytes to a string and pass that '
'string to `load(s)`, for example bytevar.decode("utf-8") if utf-8 is the encoding.').format(type(string)))
if ignore_comments:
string = strip_comments(string)
obj_pairs_hooks = tuple(obj_pairs_hooks)
_cih_instance.cls_lookup_map = cls_lookup_map or {}
hooks = tuple(extra_obj_pairs_hooks) + obj_pairs_hooks
hook = TricksPairHook(ordered=preserve_order, obj_pairs_hooks=hooks, allow_duplicates=allow_duplicates)
return json_loads(string, object_pairs_hook=hook, **jsonkwargs)
def load(fp, preserve_order=True, ignore_comments=True, decompression=None, obj_pairs_hooks=DEFAULT_HOOKS,
extra_obj_pairs_hooks=(), cls_lookup_map=None, allow_duplicates=True, conv_str_byte=False, **jsonkwargs):
"""
Load a json file and convert it to a nested data structure.
:param fp: File handle or path to load from.
The other arguments are identical to loads.
"""
try:
if isinstance(fp, str_type):
with open(fp, 'rb') as fh:
string = fh.read()
else:
string = fp.read()
except UnicodeDecodeError as err:
# todo: not covered in tests, is it relevant?
raise Exception('There was a problem decoding the file content. A possible reason is that the file is not ' +
'opened in binary mode; be sure to set file mode to something like "rb".').with_traceback(exc_info()[2])
return loads(string, preserve_order=preserve_order, ignore_comments=ignore_comments, decompression=decompression,
obj_pairs_hooks=obj_pairs_hooks, extra_obj_pairs_hooks=extra_obj_pairs_hooks, cls_lookup_map=cls_lookup_map,
allow_duplicates=allow_duplicates, conv_str_byte=conv_str_byte, **jsonkwargs)
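The `compression` and `decompression` handling in `dumps` and `loads` relies on gzip output always starting with the two magic bytes `\x1f\x8b`. The following is a minimal standard-library sketch of that round trip. `dumps_gz` and `loads_auto` are hypothetical helper names used only for illustration, and the sketch assumes UTF-8 like the code above.

```python
import json
from gzip import GzipFile
from io import BytesIO

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def dumps_gz(obj, level=5):
    # Serialize to json text, encode as UTF-8, then gzip into an in-memory buffer.
    raw = json.dumps(obj).encode('utf-8')
    buf = BytesIO()
    with GzipFile(mode='wb', fileobj=buf, compresslevel=level) as zh:
        zh.write(raw)
    return buf.getvalue()


def loads_auto(data):
    # Detect gzip by its magic bytes; otherwise treat the input as plain json.
    if isinstance(data, bytes) and data[:2] == GZIP_MAGIC:
        with GzipFile(fileobj=BytesIO(data), mode='rb') as zh:
            data = zh.read()
    if isinstance(data, bytes):
        data = data.decode('utf-8')
    return json.loads(data)


blob = dumps_gz({'a': [1, 2, 3]})
restored = loads_auto(blob)
```

Since the compressed result is bytes, a file that receives it has to be opened in binary mode, which is what the `IOError` check in `dump` enforces.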
@@ -0,0 +1,28 @@
"""
This file exists for backward compatibility reasons.
"""
from logging import warning
from .nonp import NoNumpyException, DEFAULT_ENCODERS, DEFAULT_HOOKS, dumps, dump, loads, load # keep 'unused' imports
from .utils import hashodict, NoPandasException
from .comment import strip_comment_line_with_symbol, strip_comments # keep 'unused' imports
from .encoders import TricksEncoder, json_date_time_encode, class_instance_encode, ClassInstanceEncoder, \
numpy_encode, NumpyEncoder # keep 'unused' imports
from .decoders import DuplicateJsonKeyException, TricksPairHook, json_date_time_hook, ClassInstanceHook, \
json_complex_hook, json_set_hook, json_numpy_obj_hook # keep 'unused' imports
try:
import numpy
except ImportError:
raise NoNumpyException('Could not load numpy, maybe it is not installed? If you do not want to use numpy encoding '
'or decoding, you can import the functions from json_tricks.nonp instead, which do not need numpy.')
# todo: warning('`json_tricks.np` is deprecated, you can import directly from `json_tricks`')
DEFAULT_NP_ENCODERS = [numpy_encode,] + DEFAULT_ENCODERS # DEPRECATED
DEFAULT_NP_HOOKS = [json_numpy_obj_hook,] + DEFAULT_HOOKS # DEPRECATED
@@ -0,0 +1,15 @@
"""
This file exists for backward compatibility reasons.
"""
from .utils import hashodict, get_scalar_repr, encode_scalars_inplace
from .nonp import NoNumpyException
from . import np
# try:
# from numpy import generic, complex64, complex128
# except ImportError:
# raise NoNumpyException('Could not load numpy, maybe it is not installed?')
@@ -0,0 +1,81 @@
from collections import OrderedDict
class hashodict(OrderedDict):
"""
This dictionary is hashable. It should NOT be mutated, or all kinds of weird
bugs may appear. This is not enforced though, it's only used for encoding.
"""
def __hash__(self):
return hash(frozenset(self.items()))
try:
from inspect import signature
except ImportError:
try:
from inspect import getfullargspec
except ImportError:
from inspect import getargspec
def get_arg_names(callable):
argspec = getargspec(callable)
return set(argspec.args)
else:
#todo: this is not covered in test case (py 3.3+ uses `signature`, py2 falls back to `getargspec`); consider removing it
def get_arg_names(callable):
argspec = getfullargspec(callable)
return set(argspec.args) | set(argspec.kwonlyargs)
else:
def get_arg_names(callable):
sig = signature(callable)
return set(sig.parameters.keys())
def call_with_optional_kwargs(callable, *args, **optional_kwargs):
accepted_kwargs = get_arg_names(callable)
use_kwargs = {}
for key, val in optional_kwargs.items():
if key in accepted_kwargs:
use_kwargs[key] = val
return callable(*args, **use_kwargs)
class NoNumpyException(Exception):
""" Trying to use numpy features, but numpy cannot be found. """
class NoPandasException(Exception):
""" Trying to use pandas features, but pandas cannot be found. """
def get_scalar_repr(npscalar):
return hashodict((
('__ndarray__', npscalar.item()),
('dtype', str(npscalar.dtype)),
('shape', ()),
))
def encode_scalars_inplace(obj):
"""
Searches a data structure of lists, tuples and dicts for numpy scalars
and replaces them by their dictionary representation, which can be loaded
by json-tricks. This happens in-place (the object is changed, use a copy).
"""
from numpy import generic, complex64, complex128
if isinstance(obj, (generic, complex64, complex128)):
return get_scalar_repr(obj)
if isinstance(obj, dict):
for key, val in tuple(obj.items()):
obj[key] = encode_scalars_inplace(val)
return obj
if isinstance(obj, list):
for k, val in enumerate(obj):
obj[k] = encode_scalars_inplace(val)
return obj
if isinstance(obj, (tuple, set)):
return type(obj)(encode_scalars_inplace(val) for val in obj)
return obj
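`call_with_optional_kwargs` lets `TricksEncoder` pass `primitives=...` to encoders that accept it while still supporting older encoders that take only the object. The following is a minimal sketch of the same idea for Python 3 only, assuming `inspect.signature` is available. The two example encoders are hypothetical and exist only to show the filtering.

```python
from inspect import signature


def call_with_optional_kwargs(func, *args, **optional_kwargs):
    # Keep only the keyword arguments that `func` declares as parameters.
    accepted = set(signature(func).parameters)
    use_kwargs = {key: val for key, val in optional_kwargs.items() if key in accepted}
    return func(*args, **use_kwargs)


def plain_encoder(obj):
    # Hypothetical encoder that does not know about `primitives`.
    return ('plain', obj)


def primitive_aware_encoder(obj, primitives=False):
    # Hypothetical encoder that accepts the optional `primitives` flag.
    return ('aware', obj, primitives)


plain_result = call_with_optional_kwargs(plain_encoder, 1, primitives=True)
aware_result = call_with_optional_kwargs(primitive_aware_encoder, 1, primitives=True)
```

One limitation of this approach is that a function declared with only `**kwargs` would not receive `primitives`, because the check looks at parameter names rather than at whether arbitrary keywords are accepted.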
@@ -23,6 +23,17 @@ class Media(Descriptor):
bitrate = Property(type=int)
duration = Property(type=int)
#@classmethod
#def from_node(cls, client, node):
# return cls.construct(client, cls.helpers.find(node, 'Media'), child=True)
@classmethod
def from_node(cls, client, node):
return cls.construct(client, cls.helpers.find(node, 'Media'), child=True)
items = []
for genre in cls.helpers.findall(node, 'Media'):
_, obj = Media.construct(client, genre, child=True)
items.append(obj)
return [], items
+8 -8
@@ -1,27 +1,27 @@
# addic7ed
python -c "import logging; logging.basicConfig(level=logging.DEBUG); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal import ProviderPool; ProviderPool(providers=['addic7ed'], provider_configs={'addic7ed': {'use_random_agents': True}})['addic7ed'].query('Game of Thrones', 2)"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; SZProviderPool(providers=['addic7ed'], provider_configs={'addic7ed': {'use_random_agents': True}})['addic7ed'].query('Game of Thrones', 2)"
# opensubtitles
python -c "import logging; logging.basicConfig(level=logging.DEBUG); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal import ProviderPool; from babelfish import Language; ProviderPool(providers=['opensubtitles'], )['opensubtitles'].query([Language('eng')], query='Game of Thrones', season=2, episode=1, tag='Game.of.Thrones.S06E01.The.Red.Woman.720p.WEB-DL.DD5.1.H.264-NTB.mkv', use_tag_search=True)"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; from babelfish import Language; from subzero.video import parse_video; SZProviderPool(providers=['opensubtitles'], )['opensubtitles'].list_subtitles(parse_video('FULL_PATH', {}, {'type': 'episode'}), languages=[Language('eng')])"
# podnapisi
python -c "import logging; logging.basicConfig(level=logging.DEBUG); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal import ProviderPool; from babelfish import Language; ProviderPool(providers=['podnapisi'], )['podnapisi'].query([Language('eng')], 'Game of Thrones', season=2, episode=1)"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; from babelfish import Language; SZProviderPool(providers=['podnapisi'], )['podnapisi'].query([Language('eng')], 'Game of Thrones', season=2, episode=1)"
# tvsubtitles
python -c "import logging; logging.basicConfig(level=logging.DEBUG); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal import ProviderPool; from babelfish import Language; ProviderPool(providers=['tvsubtitles'], )['tvsubtitles'].query('Game of Thrones', 2, 1)"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; from babelfish import Language; SZProviderPool(providers=['tvsubtitles'], )['tvsubtitles'].query('Game of Thrones', 2, 1)"
# napiprojekt:list
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal import ProviderPool; from babelfish import Language; from subliminal.core import scan_video; print ProviderPool(providers=['napiprojekt'], )['napiprojekt'].list_subtitles(scan_video('FULL_PATH'), languages=[Language('pol')])"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; from babelfish import Language; from subliminal.core import scan_video; print SZProviderPool(providers=['napiprojekt'], )['napiprojekt'].list_subtitles(scan_video('FULL_PATH'), languages=[Language('pol')])"
# napiprojekt:download
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import PatchedProviderPool; from subliminal import download_best_subtitles; from babelfish import Language; from subliminal.core import scan_video; subs = download_best_subtitles([scan_video('FULL_PATH')], languages={Language('eng')}, providers=['napiprojekt'], ); print subs.values()[0][0].is_valid()"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; from subliminal_patch.score import compute_score; from subliminal import download_best_subtitles; from babelfish import Language; from subliminal.core import scan_video; subs = download_best_subtitles([scan_video('FULL_PATH')], languages={Language('eng')}, providers=['napiprojekt'], pool_class=SZProviderPool, compute_score=compute_score); print subs.values()[0][0].is_valid()"
# shooter:list
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal import ProviderPool; from babelfish import Language; from subliminal.core import scan_video; print ProviderPool(providers=['shooter'], )['shooter'].list_subtitles(scan_video('FULL_PATH'), languages=[Language('zho')])"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; from babelfish import Language; from subliminal.core import scan_video; print SZProviderPool(providers=['shooter'], )['shooter'].list_subtitles(scan_video('FULL_PATH'), languages=[Language('zho')])"
# subscenter:list
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal import ProviderPool; from babelfish import Language; from subliminal.core import scan_video; print ProviderPool(providers=['subscenter'], )['subscenter'].list_subtitles(scan_video('FULL_PATH'), languages=[Language('heb')])"
python -c "import logging; logging.basicConfig(level=logging.DEBUG); logging.getLogger('rebulk').setLevel(logging.WARNING); import subliminal_patch, subliminal; subliminal.region.configure('dogpile.cache.memory'); from subliminal_patch.core import SZProviderPool; from babelfish import Language; from subliminal.core import scan_video; print SZProviderPool(providers=['subscenter'], )['subscenter'].list_subtitles(scan_video('FULL_PATH'), languages=[Language('heb')])"
# refining
@@ -9,12 +9,6 @@ from .providers import Provider
from .http import RetryingSession
subliminal.subtitle.Subtitle = PatchedSubtitle
try:
subliminal.provider_manager.register('napiprojekt = subliminal.providers.napiprojekt:NapiProjektProvider',)
except ValueError:
# already registered
pass
# add our patched base classes
for name in ("Addic7ed", "Podnapisi", "TVsubtitles", "OpenSubtitles", "LegendasTV", "NapiProjekt", "Shooter",
"SubsCenter"):
@@ -28,13 +22,18 @@ for name in ("Addic7ed", "Podnapisi", "TVsubtitles", "OpenSubtitles", "LegendasT
from .core import scan_video, search_external_subtitles, list_all_subtitles, save_subtitles, refine
from .score import compute_score
from .extensions import provider_manager
from .video import Video
# patch subliminal's core functions
subliminal.scan_video = subliminal.core.scan_video = scan_video
subliminal.core.search_external_subtitles = search_external_subtitles
subliminal.save_subtitles = subliminal.core.save_subtitles = save_subtitles
subliminal.refine = subliminal.core.refine = refine
subliminal.video.Video = subliminal.Video = Video
subliminal.video.Episode.__bases__ = (Video,)
subliminal.video.Movie.__bases__ = (Video,)
# add our own list_all_subtitles
subliminal.list_all_subtitles = subliminal.core.list_all_subtitles = list_all_subtitles
subliminal.provider_manager = subliminal.core.provider_manager = provider_manager
subliminal.provider_manager = subliminal.core.provider_manager = subliminal.extensions.provider_manager = \
provider_manager
@@ -102,14 +102,18 @@ class SZProviderPool(ProviderPool):
try:
self[subtitle.provider_name].download_subtitle(subtitle)
break
except (requests.Timeout, socket.timeout):
logger.error('Provider %r timed out', subtitle.provider_name)
except ProviderError:
logger.error('Unexpected error in provider %r, Traceback: %s', subtitle.provider_name,
traceback.format_exc())
except (requests.ConnectionError,
requests.exceptions.ProxyError,
requests.exceptions.SSLError,
requests.Timeout,
socket.timeout):
logger.error('Provider %r connection error', subtitle.provider_name)
except:
logger.exception('Unexpected error in provider %r, Traceback: %s', subtitle.provider_name,
traceback.format_exc())
self.discarded_providers.add(subtitle.provider_name)
return False
if tries == DOWNLOAD_TRIES:
self.discarded_providers.add(subtitle.provider_name)
@@ -121,6 +125,10 @@ class SZProviderPool(ProviderPool):
subtitle.provider_name, DOWNLOAD_RETRY_SLEEP)
time.sleep(DOWNLOAD_RETRY_SLEEP)
if os.environ.get("SZ_ENFORCE_ENCODING", "False") == "True":
logger.info("Enforcing encoding of %s from %s to %s", subtitle, subtitle.guess_encoding(), "utf-8")
subtitle.set_encoding("utf-8")
# check subtitle validity
if not subtitle.is_valid():
logger.error('Invalid subtitle')
@@ -192,7 +200,8 @@ class SZProviderPool(ProviderPool):
continue
# bail out if hearing_impaired was wrong
if "hearing_impaired" not in matches and hearing_impaired in ("force HI", "force non-HI"):
if subtitle.hearing_impaired_verifiable and "hearing_impaired" not in matches and \
hearing_impaired in ("force HI", "force non-HI"):
logger.debug('%r: Skipping subtitle with score %d because hearing-impaired set to %s', subtitle,
score, hearing_impaired)
continue
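The changed condition above only rejects a subtitle on a forced hearing-impaired setting when the subtitle can actually report its HI state. A minimal sketch of that decision as a standalone function follows; `should_skip_for_hi` and its parameter names are hypothetical and not part of the patch, and the sketch is written for Python 3.

```python
def should_skip_for_hi(matches, hearing_impaired, hi_verifiable):
    """Return True when a subtitle should be skipped because of its HI state.

    matches          -- set of match names computed for the subtitle
    hearing_impaired -- the user preference string, e.g. "force HI"
    hi_verifiable    -- whether the provider reports HI state reliably
    """
    forced = hearing_impaired in ("force HI", "force non-HI")
    # Only bail out if the setting is forced AND the state can be verified
    # AND the subtitle did not match the requested HI state.
    return bool(hi_verifiable and forced and "hearing_impaired" not in matches)


if __name__ == "__main__":
    print(should_skip_for_hi({"series"}, "force non-HI", True))
    print(should_skip_for_hi({"series"}, "force non-HI", False))
```

With an unverifiable provider the function never skips, which mirrors the intent of the `hearing_impaired_verifiable` flag introduced in this change.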
@@ -460,7 +469,7 @@ def get_subtitle_path(video_path, language=None, extension='.srt', forced_tag=Fa
def save_subtitles(video, subtitles, single=False, directory=None, encoding=None, encode_with=None, chmod=None,
forced_tag=False, path_decoder=None):
forced_tag=False, path_decoder=None, debug_mods=False):
"""Save subtitles on filesystem.
Subtitles are saved in the order of the list. If a subtitle with a language has already been saved, other subtitles
@@ -515,7 +524,8 @@ def save_subtitles(video, subtitles, single=False, directory=None, encoding=None
# save normalized subtitle if encoder or no encoding is given
if has_encoder or encoding is None:
content = encode_with(subtitle.get_modified_text()) if has_encoder else subtitle.get_modified_content()
content = encode_with(subtitle.get_modified_text(debug=debug_mods)) if has_encoder else \
subtitle.get_modified_content(debug=debug_mods)
with io.open(subtitle_path, 'wb') as f:
f.write(content)
@@ -3,7 +3,7 @@ import subliminal
import babelfish
from subliminal.extensions import RegistrableExtensionManager
provider_manager = RegistrableExtensionManager('subliminal.providers', [
provider_manager = RegistrableExtensionManager('subliminal_patch.providers', [
'addic7ed = subliminal_patch.providers.addic7ed:Addic7edProvider',
'legendastv = subliminal_patch.providers.legendastv:LegendasTVProvider',
'opensubtitles = subliminal_patch.providers.opensubtitles:OpenSubtitlesProvider',
@@ -19,4 +19,5 @@ provider_manager = RegistrableExtensionManager('subliminal.providers', [
babelfish.language_converters.unregister('addic7ed = subliminal.converters.addic7ed:Addic7edConverter')
babelfish.language_converters.register('addic7ed = subliminal_patch.language:PatchedAddic7edConverter')
subliminal.refiner_manager.register('sz_metadata = subliminal_patch.refiners.metadata:refine')
subliminal.refiner_manager.register('sz_omdb = subliminal_patch.refiners.omdb:refine')
@@ -4,7 +4,8 @@ from xmlrpclib import SafeTransport
import certifi
import ssl
import os
from requests import Session
import socket
from requests import Session, exceptions
from retry.api import retry_call
pem_file = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", certifi.where()))
@@ -23,7 +24,14 @@ class RetryingSession(Session):
self.verify = pem_file
def retry_method(self, method, *args, **kwargs):
return retry_call(getattr(super(RetryingSession, self), method), fargs=args, fkwargs=kwargs, tries=3, delay=1)
return retry_call(getattr(super(RetryingSession, self), method), fargs=args, fkwargs=kwargs, tries=3, delay=5,
exceptions=(exceptions.ConnectionError,
exceptions.ProxyError,
exceptions.SSLError,
exceptions.Timeout,
exceptions.ConnectTimeout,
exceptions.ReadTimeout,
socket.timeout))
def get(self, *args, **kwargs):
return self.retry_method("get", *args, **kwargs)
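The `retry_method` change above narrows retries to connection-type failures instead of retrying on any exception. The sketch below illustrates that idea with the standard library only; `simple_retry_call` is a hypothetical stand-in written for this example, not the `retry.api.retry_call` used by the patch, and it assumes Python 3.

```python
import time


def simple_retry_call(func, fargs=(), fkwargs=None, exceptions=(Exception,),
                      tries=3, delay=0.0):
    """Call func, retrying only on the listed exception types.

    Any other exception propagates immediately. The last failure is
    re-raised once `tries` attempts have been used up.
    """
    fkwargs = fkwargs or {}
    for attempt in range(1, tries + 1):
        try:
            return func(*fargs, **fkwargs)
        except exceptions:
            if attempt == tries:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice with a connection error, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(simple_retry_call(flaky, exceptions=(ConnectionError, TimeoutError)))
```

Restricting the exception tuple keeps programming errors (for example a `ValueError`) from being retried and silently delayed.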
@@ -5,3 +5,6 @@ from subliminal.providers import Provider as _Provider
class Provider(_Provider):
hash_verifiable = False
hearing_impaired_verifiable = False
skip_wrong_fps = True
@@ -17,6 +17,8 @@ series_year_re = re.compile(r'^(?P<series>[ \w\'.:(),&!?-]+?)(?: \((?P<year>\d{4
class Addic7edSubtitle(_Addic7edSubtitle):
hearing_impaired_verifiable = True
def __init__(self, language, hearing_impaired, page_link, series, season, episode, title, year, version,
download_link):
super(Addic7edSubtitle, self).__init__(language, hearing_impaired, page_link, series, season, episode,
@@ -28,6 +30,10 @@ class Addic7edSubtitle(_Addic7edSubtitle):
if not subliminal.score.episode_scores.get("addic7ed_boost"):
return matches
# if the release group matches, the format is most likely correct, as well
if "release_group" in matches:
matches.add("format")
if {"series", "season", "episode", "year"}.issubset(matches) and "format" in matches:
matches.add("addic7ed_boost")
logger.info("Boosting Addic7ed subtitle by %s" % subliminal.score.episode_scores.get("addic7ed_boost"))
@@ -40,6 +46,7 @@ class Addic7edSubtitle(_Addic7edSubtitle):
class Addic7edProvider(_Addic7edProvider):
USE_ADDICTED_RANDOM_AGENTS = False
hearing_impaired_verifiable = True
subtitle_class = Addic7edSubtitle
def __init__(self, username=None, password=None, use_random_agents=False):
@@ -13,6 +13,10 @@ class LegendasTVSubtitle(_LegendasTVSubtitle):
self.release_info = archive.name
self.page_link = archive.link
def make_picklable(self):
self.archive.content = None
return self
class LegendasTVProvider(_LegendasTVProvider):
subtitle_class = LegendasTVSubtitle
@@ -3,6 +3,7 @@
import re
import time
import logging
import traceback
logger = logging.getLogger(__name__)
@@ -33,10 +34,11 @@ class ProviderRetryMixin(object):
while i <= amount:
try:
return f()
except exc, e:
except exc:
formatted_exc = traceback.format_exc()
i += 1
if i == amount:
raise
logger.debug(u"Retrying %s, try: %i/%i, exception: %s" % (self.__class__.__name__, i, amount, e))
logger.debug(u"Retrying %s, try: %i/%i, exception: %s" % (self.__class__.__name__, i, amount, formatted_exc))
time.sleep(retry_timeout)
@@ -1,14 +1,50 @@
# coding=utf-8
import logging
from subliminal.providers.napiprojekt import NapiProjektProvider as _NapiProjektProvider, \
NapiProjektSubtitle as _NapiProjektSubtitle
NapiProjektSubtitle as _NapiProjektSubtitle, get_subhash
logger = logging.getLogger(__name__)
class NapiProjektSubtitle(_NapiProjektSubtitle):
def __init__(self, language, hash):
def __init__(self, language, hash, fps):
super(NapiProjektSubtitle, self).__init__(language, hash)
self.release_info = hash
self.plex_media_fps = float(fps)
def __repr__(self):
return '<%s %r [%s]>' % (
self.__class__.__name__, self.release_info, self.language)
class NapiProjektProvider(_NapiProjektProvider):
subtitle_class = NapiProjektSubtitle
def query(self, language, hash, fps):
params = {
'v': 'dreambox',
'kolejka': 'false',
'nick': '',
'pass': '',
'napios': 'Linux',
'l': language.alpha2.upper(),
'f': hash,
't': get_subhash(hash)}
logger.info('Searching subtitle %r', params)
response = self.session.get(self.server_url, params=params, timeout=10)
response.raise_for_status()
# handle subtitles not found and errors
if response.content[:4] == b'NPc0':
logger.debug('No subtitles found')
return None
subtitle = self.subtitle_class(language, hash, fps)
subtitle.content = response.content
logger.debug('Found subtitle %r', subtitle)
return subtitle
def list_subtitles(self, video, languages):
return [s for s in [self.query(l, video.hashes['napiprojekt'], video.fps) for l in languages] if s is not None]
@@ -16,10 +16,11 @@ logger = logging.getLogger(__name__)
class OpenSubtitlesSubtitle(_OpenSubtitlesSubtitle):
hash_verifiable = True
hearing_impaired_verifiable = True
def __init__(self, language, hearing_impaired, page_link, subtitle_id, matched_by, movie_kind, hash, movie_name,
movie_release_name, movie_year, movie_imdb_id, series_season, series_episode, query_parameters,
filename, encoding, fps):
filename, encoding, fps, skip_wrong_fps=True):
super(OpenSubtitlesSubtitle, self).__init__(language, hearing_impaired, page_link, subtitle_id,
matched_by, movie_kind, hash,
movie_name, movie_release_name, movie_year, movie_imdb_id,
@@ -27,6 +28,8 @@ class OpenSubtitlesSubtitle(_OpenSubtitlesSubtitle):
self.query_parameters = query_parameters or {}
self.fps = fps
self.release_info = movie_release_name
self.wrong_fps = False
self.skip_wrong_fps = skip_wrong_fps
def get_matches(self, video, hearing_impaired=False):
matches = super(OpenSubtitlesSubtitle, self).get_matches(video)
@@ -39,9 +42,14 @@ class OpenSubtitlesSubtitle(_OpenSubtitlesSubtitle):
# video has fps info, sub also, and sub's fps is greater than 0
if video.fps and sub_fps and (video.fps != self.fps):
logger.debug("Wrong FPS (expected: %s, got: %s, lowering score massively)", video.fps, self.fps)
# fixme: may be too harsh
return set()
self.wrong_fps = True
if self.skip_wrong_fps:
logger.debug("Wrong FPS (expected: %s, got: %s, lowering score massively)", video.fps, self.fps)
# fixme: may be too harsh
return set()
else:
logger.debug("Wrong FPS (expected: %s, got: %s, continuing)", video.fps, self.fps)
# matched by tag?
if self.matched_by == "tag":
@@ -57,8 +65,10 @@ class OpenSubtitlesProvider(ProviderRetryMixin, _OpenSubtitlesProvider):
only_foreign = True
subtitle_class = OpenSubtitlesSubtitle
hash_verifiable = True
hearing_impaired_verifiable = True
skip_wrong_fps = True
def __init__(self, username=None, password=None, use_tag_search=False, only_foreign=False):
def __init__(self, username=None, password=None, use_tag_search=False, only_foreign=False, skip_wrong_fps=True):
if username is not None and password is None or username is None and password is not None:
raise ConfigurationError('Username and password must be specified')
@@ -66,6 +76,7 @@ class OpenSubtitlesProvider(ProviderRetryMixin, _OpenSubtitlesProvider):
self.password = password or ''
self.use_tag_search = use_tag_search
self.only_foreign = only_foreign
self.skip_wrong_fps = skip_wrong_fps
if use_tag_search:
logger.info("Using tag/exact filename search")
@@ -81,7 +92,7 @@ class OpenSubtitlesProvider(ProviderRetryMixin, _OpenSubtitlesProvider):
# fixme: retry on SSLError
response = self.retry(
lambda: checked(
self.server.LogIn(self.username, self.password, 'eng', 'subliminal v%s' % __short_version__)
self.server.LogIn(self.username, self.password, 'eng', os.environ.get("SZ_USER_AGENT", "Sub-Zero/2"))
)
)
self.token = response['token']
@@ -101,6 +112,12 @@ class OpenSubtitlesProvider(ProviderRetryMixin, _OpenSubtitlesProvider):
query = video.series
season = video.season
episode = video.episode
if video.is_special:
season = None
episode = None
query = u"%s %s" % (video.series, video.title)
logger.info("%s: Searching for special: %r", self.__class__, query)
# elif ('opensubtitles' not in video.hashes or not video.size) and not video.imdb_id:
# query = video.name.split(os.sep)[-1]
else:
@@ -176,7 +193,7 @@ class OpenSubtitlesProvider(ProviderRetryMixin, _OpenSubtitlesProvider):
movie_kind,
hash, movie_name, movie_release_name, movie_year, movie_imdb_id,
series_season, series_episode, query_parameters, filename, encoding,
movie_fps)
movie_fps, skip_wrong_fps=self.skip_wrong_fps)
logger.debug('Found subtitle %r by %s', subtitle, matched_by)
subtitles.append(subtitle)
@@ -22,6 +22,7 @@ logger = logging.getLogger(__name__)
class PodnapisiSubtitle(_PodnapisiSubtitle):
provider_name = 'podnapisi'
hearing_impaired_verifiable = True
def __init__(self, language, hearing_impaired, page_link, pid, releases, title, season=None, episode=None,
year=None):
@@ -33,6 +34,7 @@ class PodnapisiSubtitle(_PodnapisiSubtitle):
class PodnapisiProvider(_PodnapisiProvider):
only_foreign = False
subtitle_class = PodnapisiSubtitle
hearing_impaired_verifiable = True
def __init__(self, only_foreign=False):
self.only_foreign = only_foreign
@@ -43,6 +45,10 @@ class PodnapisiProvider(_PodnapisiProvider):
super(PodnapisiProvider, self).__init__()
def list_subtitles(self, video, languages):
if video.is_special:
logger.info("%s can't search for specials right now, skipping", self)
return []
if isinstance(video, Episode):
return [s for l in languages for s in self.query(l, video.series, season=video.season,
episode=video.episode, year=video.year,
@@ -5,6 +5,8 @@ from subliminal.providers.subscenter import SubsCenterProvider as _SubsCenterPro
class SubsCenterSubtitle(_SubsCenterSubtitle):
hearing_impaired_verifiable = True
def __init__(self, language, hearing_impaired, page_link, series, season, episode, title, subtitle_id, subtitle_key,
subtitle_version, downloaded, releases):
super(SubsCenterSubtitle, self).__init__(language, hearing_impaired, page_link, series, season, episode, title,
@@ -20,3 +22,4 @@ class SubsCenterSubtitle(_SubsCenterSubtitle):
class SubsCenterProvider(_SubsCenterProvider):
subtitle_class = SubsCenterSubtitle
hearing_impaired_verifiable = True
@@ -0,0 +1,67 @@
# coding=utf-8
import os
import subliminal
import base64
import zlib
from subliminal import __short_version__
from subliminal.refiners.omdb import OMDBClient, refine
class SZOMDBClient(OMDBClient):
def __init__(self, version=1, session=None, headers=None, timeout=10):
super(SZOMDBClient, self).__init__(version=version, session=session, headers=headers, timeout=timeout)
def get_params(self, params):
self.session.params['apikey'] = \
zlib.decompress(base64.b16decode(os.environ['U1pfT01EQl9LRVk']))\
.decode('cm90MTM=\n'.decode("base64")) \
.decode('YmFzZTY0\n'.decode("base64")).split("x")[0]
return dict(self.session.params, **params)
def get(self, id=None, title=None, type=None, year=None, plot='short', tomatoes=False):
# build the params
params = {}
if id:
params['i'] = id
if title:
params['t'] = title
if not params:
raise ValueError('At least id or title is required')
params['type'] = type
params['y'] = year
params['plot'] = plot
params['tomatoes'] = tomatoes
# perform the request
r = self.session.get(self.base_url, params=self.get_params(params))
r.raise_for_status()
# get the response as json
j = r.json()
# check response status
if j['Response'] == 'False':
return None
return j
def search(self, title, type=None, year=None, page=1):
# build the params
params = {'s': title, 'type': type, 'y': year, 'page': page}
# perform the request
r = self.session.get(self.base_url, params=self.get_params(params))
r.raise_for_status()
# get the response as json
j = r.json()
# check response status
if j['Response'] == 'False':
return None
return j
omdb_client = SZOMDBClient(headers={'User-Agent': 'Subliminal/%s' % __short_version__})
subliminal.refiners.omdb.omdb_client = omdb_client
@@ -45,16 +45,18 @@ def compute_score(matches, subtitle, video, hearing_impaired=None):
# hash is error-prone, try to fix that
hash_valid_if = episode_hash_valid_if if is_episode else movie_hash_valid_if
if hash_valid_if <= set(matches):
# series, season and episode matched, hash is valid
logger.debug('%r: Using valid hash, as %s are correct (%r) and (%r)', subtitle, hash_valid_if, matches,
video)
matches &= {'hash', 'hearing_impaired'}
else:
# no match, invalidate hash
logger.debug('%r: Ignoring hash as other matches are wrong (missing: %r) and (%r)', subtitle,
hash_valid_if - matches, video)
matches -= {"hash"}
# don't validate hashes of specials, as season and episode tend to be wrong
if is_movie or not video.is_special:
if hash_valid_if <= set(matches):
# series, season and episode matched, hash is valid
logger.debug('%r: Using valid hash, as %s are correct (%r) and (%r)', subtitle, hash_valid_if, matches,
video)
matches &= {'hash'}
else:
# no match, invalidate hash
logger.debug('%r: Ignoring hash as other matches are wrong (missing: %r) and (%r)', subtitle,
hash_valid_if - matches, video)
matches -= {"hash"}
elif 'hash' in matches:
logger.debug('%r: Hash not verifiable for this provider. Keeping it', subtitle)
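The hash handling above can be read as three cases: specials keep their matches untouched, a confirmed hash reduces the match set to the hash alone, and an unconfirmed hash is dropped. A rough sketch under assumptions follows; `filter_hash_match` is a hypothetical helper, and the contents of `hash_valid_if` shown in the demo are an illustrative guess, not the values used by `subliminal_patch.score`.

```python
def filter_hash_match(matches, hash_valid_if, is_movie, is_special):
    """Return a new match set after validating a hash match.

    matches       -- set of match names, possibly containing "hash"
    hash_valid_if -- matches that must also be present for the hash to count
    """
    matches = set(matches)
    if "hash" not in matches:
        return matches
    # Specials often carry wrong season/episode numbers, so their hash
    # is neither confirmed nor invalidated.
    if not is_movie and is_special:
        return matches
    if hash_valid_if <= matches:
        # Hash confirmed: it alone determines the score.
        return matches & {"hash"}
    # Hash contradicted by the other matches: ignore it.
    return matches - {"hash"}


if __name__ == "__main__":
    required = {"series", "season", "episode"}  # assumed for illustration
    print(filter_hash_match({"hash", "series", "season", "episode"},
                            required, is_movie=False, is_special=False))
```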
@@ -75,6 +77,13 @@ def compute_score(matches, subtitle, video, hearing_impaired=None):
if 'series_tvdb_id' in matches:
logger.debug('Adding series_tvdb_id match equivalents')
matches |= {'series', 'year'}
# specials
if video.is_special and 'title' in matches and 'series' in matches \
and 'year' in matches:
logger.debug('Adding special title match equivalent')
matches |= {'season', 'episode'}
elif is_movie:
if 'imdb_id' in matches:
logger.debug('Adding imdb_id match equivalents')
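The specials rule added above treats a matching title, series and year as equivalent to a season and episode match. A small sketch of that rule in isolation, with `add_special_equivalents` as a hypothetical name chosen for this example (Python 3):

```python
def add_special_equivalents(matches, is_special):
    """Return a new match set with season/episode added for matched specials."""
    matches = set(matches)
    if is_special and {"title", "series", "year"} <= matches:
        # Season and episode numbers of specials are unreliable, so a
        # title match stands in for them.
        matches |= {"season", "episode"}
    return matches


if __name__ == "__main__":
    print(sorted(add_special_equivalents({"title", "series", "year"}, True)))
```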
@@ -2,13 +2,18 @@
import logging
import traceback
import re
import chardet
import pysrt
import pysubs2
from bs4 import UnicodeDammit
from subliminal import Subtitle
from pysubs2 import SSAStyle
from pysubs2.subrip import ms_to_timestamp, parse_tags
from subzero.modification import SubtitleModifications
from subliminal import Subtitle
logger = logging.getLogger(__name__)
@@ -18,8 +23,13 @@ class PatchedSubtitle(Subtitle):
release_info = None
matches = None
hash_verifiable = False
hearing_impaired_verifiable = False
mods = None
plex_media_fps = None
skip_wrong_fps = False
wrong_fps = False
_guessed_encoding = None
def __init__(self, language, hearing_impaired=False, page_link=None, encoding=None, mods=None):
super(PatchedSubtitle, self).__init__(language, hearing_impaired=hearing_impaired, page_link=page_link,
@@ -30,6 +40,21 @@ class PatchedSubtitle(Subtitle):
return '<%s %r [%s]>' % (
self.__class__.__name__, self.page_link, self.language)
def make_picklable(self):
"""
some subtitle instances might have unpicklable objects stored; clean them up here
:return: self
"""
return self
def set_encoding(self, encoding):
if encoding == self.guess_encoding():
return
unicontent = self.text
self.content = unicontent.encode(encoding)
self._guessed_encoding = encoding
def guess_encoding(self):
"""Guess encoding using the language, falling back on chardet.
@@ -37,11 +62,17 @@ class PatchedSubtitle(Subtitle):
:rtype: str
"""
if self._guessed_encoding:
logger.info('Encoding already guessed: %s', self._guessed_encoding)
return self._guessed_encoding
logger.info('Guessing encoding for language %s', self.language.alpha3)
encodings = ['utf-8']
# add language-specific encodings
# http://scratchpad.wikia.com/wiki/Character_Encoding_Recommendation_for_Languages
if self.language.alpha3 == 'zho':
encodings.extend(['gb18030', 'big5'])
elif self.language.alpha3 == 'jpn':
@@ -67,15 +98,15 @@ class PatchedSubtitle(Subtitle):
elif self.language.alpha3 in ('pol', 'cze', 'ces', 'slk', 'slo', 'slv', 'hun', 'bos', 'hbs', 'hrv', 'rsb',
'ron', 'rum', 'sqi', 'alb'):
# Eastern European Group 1
encodings.append('windows-1250')
encodings.extend(['iso-8859-2', 'windows-1250'])
# Bulgarian, Serbian and Macedonian
elif self.language.alpha3 in ('bul', 'srp', 'mkd', 'mac'):
# Bulgarian, Serbian, Macedonian, Ukrainian and Russian
elif self.language.alpha3 in ('bul', 'srp', 'mkd', 'mac', 'rus', 'ukr'):
# Eastern European Group 2
encodings.append('windows-1251')
encodings.extend(['iso-8859-5', 'windows-1251'])
else:
# Western European (windows-1252)
encodings.append('latin-1')
# Western European (windows-1252) / Northern European
encodings.extend(['iso-8859-15', 'iso-8859-9', 'iso-8859-4', 'iso-8859-1', 'latin-1'])
# try to decode
logger.debug('Trying encodings %r', encodings)
@@ -86,6 +117,7 @@ class PatchedSubtitle(Subtitle):
pass
else:
logger.info('Guessed encoding %s', encoding)
self._guessed_encoding = encoding
return encoding
logger.warning('Could not guess encoding from language')
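The language-based guessing above tries `utf-8` first and then a list of language-specific encodings in order, returning the first one that decodes without error. The sketch below shows only that loop; `candidate_encodings` and `guess_encoding_by_language` are hypothetical names, only a few of the language groups from the diff are reproduced, and the chardet/bs4 fallback and the `_guessed_encoding` cache are left out. It assumes Python 3.

```python
def candidate_encodings(alpha3):
    """Return encodings to try, in order, for a three-letter language code."""
    encodings = ["utf-8"]
    if alpha3 == "zho":
        encodings.extend(["gb18030", "big5"])
    elif alpha3 in ("pol", "ces", "slk", "hun", "hrv", "ron"):
        encodings.extend(["iso-8859-2", "windows-1250"])
    elif alpha3 in ("bul", "srp", "mkd", "rus", "ukr"):
        encodings.extend(["iso-8859-5", "windows-1251"])
    else:
        encodings.extend(["iso-8859-15", "iso-8859-1"])
    return encodings


def guess_encoding_by_language(content, alpha3):
    """Return the first candidate encoding that decodes content, else None."""
    for encoding in candidate_encodings(alpha3):
        try:
            content.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


if __name__ == "__main__":
    print(guess_encoding_by_language(u"za\u017c\u00f3\u0142\u0107".encode("utf-8"), "pol"))
```

Because single-byte encodings such as `iso-8859-2` accept any byte sequence, the order of the list matters: the first single-byte candidate wins for any content that is not valid UTF-8.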
@@ -102,9 +134,11 @@ class PatchedSubtitle(Subtitle):
Log.Debug("bs4 detected encoding: %s" % a.original_encoding)
if a.original_encoding:
self._guessed_encoding = a.original_encoding
return a.original_encoding
raise ValueError(u"Couldn't guess the proper encoding for %s" % self)
self._guessed_encoding = encoding
return encoding
def is_valid(self):
@@ -114,50 +148,95 @@ class PatchedSubtitle(Subtitle):
:rtype: bool
"""
if not self.text:
text = self.text
if not text:
return False
# valid srt
try:
pysrt.from_string(self.text, error_handling=pysrt.ERROR_RAISE)
except Exception, e:
logger.error("PySRT-parsing failed: %s, trying pysubs2", e)
pysrt.from_string(text, error_handling=pysrt.ERROR_RAISE)
except Exception:
logger.error("PySRT-parsing failed, trying pysubs2")
else:
return True
# something else, try to return srt
try:
logger.debug("Trying parsing with PySubs2")
subs = pysubs2.SSAFile.from_string(self.text)
self.content = subs.to_string("srt")
try:
# in case of microdvd, try parsing the fps from the subtitle
subs = pysubs2.SSAFile.from_string(text)
if subs.format == "microdvd":
logger.info("Got FPS from MicroDVD subtitle: %s", subs.fps)
except pysubs2.UnknownFPSError:
# if parsing failed, suggest our media file's fps
subs = pysubs2.SSAFile.from_string(text, fps=self.plex_media_fps)
if subs.format == "microdvd":
logger.info("No FPS info in subtitle. Using our own media FPS for the MicroDVD subtitle: %s",
subs.fps)
unicontent = self.pysubs2_to_unicode(subs)
self.content = unicontent.encode(self.guess_encoding())
except:
logger.exception("Couldn't convert subtitle %s to .srt format", self)
logger.exception("Couldn't convert subtitle %s to .srt format: %s", self, traceback.format_exc())
return False
return True
def get_modified_content(self):
@classmethod
def pysubs2_to_unicode(cls, sub):
def prepare_text(text, style):
body = []
for fragment, sty in parse_tags(text, style, sub.styles):
fragment = fragment.replace(ur"\h", u" ")
fragment = fragment.replace(ur"\n", u"\n")
fragment = fragment.replace(ur"\N", u"\n")
if sty.italic: fragment = u"<i>%s</i>" % fragment
if sty.underline: fragment = u"<u>%s</u>" % fragment
if sty.strikeout: fragment = u"<s>%s</s>" % fragment
body.append(fragment)
return re.sub(u"\n+", u"\n", u"".join(body).strip())
visible_lines = (line for line in sub if not line.is_comment)
out = []
for i, line in enumerate(visible_lines, 1):
start = ms_to_timestamp(line.start)
end = ms_to_timestamp(line.end)
text = prepare_text(line.text, sub.styles.get(line.style, SSAStyle.DEFAULT_STYLE))
out.append(u"%d\n" % i)
out.append(u"%s --> %s\n" % (start, end))
out.append(u"%s%s" % (text, "\n\n"))
return u"".join(out)
def get_modified_content(self, debug=False):
"""
:param language:
:param fps:
:return: string
"""
if not self.mods:
return self.content
encoding = self.guess_encoding()
submods = SubtitleModifications()
submods.load(content=self.text, fps=self.plex_media_fps)
submods = SubtitleModifications(debug=debug)
submods.load(content=self.text, language=self.language)
submods.modify(*self.mods)
return submods.to_string("srt", encoding=encoding).encode(encoding=encoding)
def get_modified_text(self):
return self.pysubs2_to_unicode(submods.f).encode(encoding=encoding)
def get_modified_text(self, debug=False):
"""
:param language:
:param fps:
:return: unicode
"""
content = self.get_modified_content()
content = self.get_modified_content(debug=debug)
if not content:
return
encoding = self.guess_encoding()
return content.decode(encoding=encoding)
class ModifiedSubtitle(PatchedSubtitle):
id = None
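The `pysubs2_to_unicode` method above builds SRT output by hand: a running index, a `start --> end` timestamp line, the text, and a blank line per cue. The following sketch reproduces that layout without pysubs2; `ms_to_srt_timestamp` and `cues_to_srt` are hypothetical helpers for illustration (the patch itself uses `pysubs2.subrip.ms_to_timestamp`), and cues are assumed to be plain `(start_ms, end_ms, text)` tuples.

```python
def ms_to_srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = max(0, int(ms))
    hours, ms = divmod(ms, 3600000)
    minutes, ms = divmod(ms, 60000)
    seconds, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, ms)


def cues_to_srt(cues):
    """Render (start_ms, end_ms, text) tuples as one SRT string."""
    out = []
    for index, (start, end, text) in enumerate(cues, 1):
        out.append("%d\n" % index)
        out.append("%s --> %s\n" % (ms_to_srt_timestamp(start),
                                    ms_to_srt_timestamp(end)))
        # Each cue is terminated by an empty line.
        out.append("%s\n\n" % text)
    return "".join(out)


if __name__ == "__main__":
    print(cues_to_srt([(0, 1500, "Hello"), (2000, 3500, "<i>World</i>")]))
```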
@@ -0,0 +1,7 @@
# coding=utf-8
from subliminal.video import Video as Video_
class Video(Video_):
is_special = False
@@ -1,7 +1,10 @@
# coding=utf-8
import sys
import logging
import sys
import codecs
from babelfish import Language
logger = logging.getLogger(__name__)
@@ -14,7 +17,18 @@ if debug:
logging.basicConfig(level=logging.DEBUG)
submod = SubMod(debug=debug)
submod.load(fn)
submod.modify("remove_HI")
submod.load(fn, language=Language.fromietf("eng"), encoding="utf-8")
submod.modify("remove_HI", "OCR_fixes", "common", "OCR_fixes", "shift_offset(s=20)", "OCR_fixes", "color(color=#FF0000)", "shift_offset(s=-5, ms=-350)")
#srt = submod.to_unicode()
#print submod.f.to_string("srt", encoding="utf-8")
#print repr(srt)
#f = codecs.open("testout.srt", "w+", encoding="latin-1")
#f.write(srt)
#f.close()
#print submod.f.to_string("srt")
#submod.modify("OCR_fixes")
#submod.modify("change_FPS(from=24,to=25)")
#submod.modify("common")
#print submod.f.to_string("srt")
@@ -3,6 +3,7 @@
import datetime
import logging
import traceback
import types
from constants import mode_map
@@ -71,9 +72,10 @@ class SubtitleHistory(object):
self.history_items = storage.LoadObject("subtitle_history") or []
except:
logger.error("Failed to load history storage: %s" % traceback.format_exc())
if not isinstance(self.history_items, types.ListType):
self.history_items = []
def add(self, item_title, rating_key, section_title=None, subtitle=None, mode="a", time=None):
# create copy
items = self.history_items
item = SubtitleHistoryItem(item_title, rating_key, section_title=section_title, subtitle=subtitle, mode=mode, time=time)
@@ -1,246 +0,0 @@
# coding=utf-8
import re
import traceback
from collections import OrderedDict
import pysubs2
import logging
logger = logging.getLogger(__name__)
class SubtitleModifications(object):
debug = False
def __init__(self, debug=False):
self.debug = debug
def load(self, fn=None, content=None, fps=None):
"""
:param fn: filename
:param content: unicode
:param fps:
:return:
"""
try:
if fn:
self.f = pysubs2.load(fn, fps=fps)
elif content:
self.f = pysubs2.SSAFile.from_string(content, fps=fps)
except (IOError,
UnicodeDecodeError,
pysubs2.exceptions.UnknownFPSError,
pysubs2.exceptions.UnknownFormatIdentifierError,
pysubs2.exceptions.FormatAutodetectionError):
if fn:
logger.exception("Couldn't load subtitle: %s: %s", fn, traceback.format_exc())
elif content:
logger.exception("Couldn't load subtitle: %s", traceback.format_exc())
def modify(self, *mods):
new_f = []
for line in self.f:
applied_mods = []
for identifier in mods:
if identifier in registry.mods:
mod = registry.mods[identifier]
# don't bother reapplying exclusive mods multiple times
if mod.exclusive and identifier in applied_mods:
continue
new_content = mod.modify(line.text, debug=self.debug)
if not new_content:
if self.debug:
logger.debug("%s: deleting %s", identifier, line)
continue
line.text = new_content
new_f.append(line)
applied_mods.append(identifier)
self.f.events = new_f
def to_string(self, format="srt", encoding="utf-8"):
return self.f.to_string(format, encoding=encoding)
def save(self, fn):
self.f.save(fn)
SubMod = SubtitleModifications
class SubtitleModRegistry(object):
mods = None
mods_available = None
def __init__(self):
self.mods = OrderedDict()
self.mods_available = []
def register(self, mod):
self.mods[mod.identifier] = mod
self.mods_available.append(mod.identifier)
registry = SubtitleModRegistry()
class Processor(object):
"""
Processor base class
"""
name = None
def __init__(self, name=None):
self.name = name
@property
def info(self):
return self.name
def process(self, content):
return content
def __repr__(self):
return "Processor <%s %s>" % (self.__class__.__name__, self.info)
def __str__(self):
return repr(self)
def __unicode__(self):
return unicode(repr(self))
class StringProcessor(Processor):
"""
String replacement processor base
"""
def __init__(self, search, replace, name=None):
super(StringProcessor, self).__init__(name=name)
self.search = search
self.replace = replace
def process(self, content):
return content.replace(self.search, self.replace)
class ReProcessor(Processor):
    """
    Regex processor
    """
    pattern = None
    replace_with = None

    def __init__(self, pattern, replace_with, name=None):
        super(ReProcessor, self).__init__(name=name)
        self.pattern = pattern
        self.replace_with = replace_with

    def process(self, content, debug=False):
        return self.pattern.sub(self.replace_with, content)


class NReProcessor(ReProcessor):
    """
    Single line regex processor
    """

    def process(self, content, debug=False):
        lines = []
        for line in content.split(r"\N"):
            a = super(NReProcessor, self).process(line, debug=debug)
            if not a:
                continue
            lines.append(a)
        return r"\N".join(lines)
class SubtitleModification(object):
    identifier = None
    description = None
    exclusive = False
    pre_processors = []
    processors = []
    post_processors = []

    @classmethod
    def _process(cls, content, processors, debug=False):
        if not content:
            return
        new_content = content
        for processor in processors:
            old_content = new_content
            new_content = processor.process(new_content, debug=debug)
            if not new_content:
                if debug:
                    logger.debug("Processor returned empty line: %s", processor)
                break
            if debug:
                if old_content == new_content:
                    continue
                logger.debug("%s: %s -> %s", processor, old_content, new_content)
        return new_content

    @classmethod
    def pre_process(cls, content, debug=False):
        return cls._process(content, cls.pre_processors, debug=debug)

    @classmethod
    def process(cls, content, debug=False):
        return cls._process(content, cls.processors, debug=debug)

    @classmethod
    def post_process(cls, content, debug=False):
        return cls._process(content, cls.post_processors, debug=debug)

    @classmethod
    def modify(cls, content, debug=False):
        new_content = content
        for method in ("pre_process", "process", "post_process"):
            new_content = getattr(cls, method)(new_content, debug=debug)
        return new_content
class SubtitleTextModification(SubtitleModification):
    post_processors = [
        # empty tag (the dash is escaped so that ",-_" is not read as a character range)
        ReProcessor(re.compile(r'({\\\w+1})[\s.,\-_!?]+({\\\w+0})'), "", name="empty_tag"),
        # empty line (needed?)
        NReProcessor(re.compile(r'^\s+$'), "", name="empty_line"),
        # empty dash line (needed?)
        NReProcessor(re.compile(r'(^[\s]*[\-]+[\s]*)$'), "", name="empty_dash_line"),
        # clean whitespace at start and end
        ReProcessor(re.compile(r'^\s*([^\s]+)\s*$'), r"\1", name="surrounding_whitespace"),
    ]
class HearingImpaired(SubtitleTextModification):
    identifier = "remove_HI"
    description = "Remove Hearing Impaired tags"
    exclusive = True
    processors = [
        # brackets
        NReProcessor(re.compile(r'(?sux)[([].+[)\]]'), "", name="HI_brackets"),
        # text before colon (and possible dash in front); A-Za-z avoids the punctuation inside the A-z range
        NReProcessor(re.compile(r'(?u)(^[A-Za-z\-]+[\w\s]*:[^0-9{2}][\s]*)'), "", name="HI_before_colon"),
        # all caps line (at least 3 chars)
        NReProcessor(re.compile(r'(?u)(^[A-Z]{3,}$)'), "", name="HI_all_caps"),
        # dash in front
        NReProcessor(re.compile(r'(?u)^\s*-\s*'), "", name="HI_starting_dash"),
    ]


registry.register(HearingImpaired)
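A minimal, self-contained sketch of how such a processor chain behaves, written for Python 3 (the code in this diff targets Python 2). The `demo_*` patterns and the `run_chain` helper are illustrative assumptions, simplified from the HI_brackets and HI_starting_dash ideas above; they are not the patterns registered by `HearingImpaired`.

```python
import re


class ReProcessor(object):
    """Regex processor: substitutes every match of `pattern` in the content."""

    def __init__(self, pattern, replace_with, name=None):
        self.pattern = pattern
        self.replace_with = replace_with
        self.name = name

    def process(self, content, debug=False):
        return self.pattern.sub(self.replace_with, content)


class NReProcessor(ReProcessor):
    """Applies the regex to each part between subtitle line breaks and drops emptied parts."""

    def process(self, content, debug=False):
        lines = []
        for line in content.split(r"\N"):
            result = super(NReProcessor, self).process(line, debug=debug)
            if not result:
                continue
            lines.append(result)
        return r"\N".join(lines)


def run_chain(content, processors):
    """Hypothetical helper: feed content through the processors, stop once it is empty."""
    for processor in processors:
        content = processor.process(content)
        if not content:
            break
    return content


# Illustrative patterns only (assumptions, not the ones from the diff).
demo_processors = [
    NReProcessor(re.compile(r"[(\[][^)\]]*[)\]]"), "", name="demo_brackets"),
    NReProcessor(re.compile(r"^\s*-\s*"), "", name="demo_starting_dash"),
    NReProcessor(re.compile(r"^\s+|\s+$"), "", name="demo_strip"),
]

cleaned = run_chain(r"[door slams]\N- Who is there?", demo_processors)
print(cleaned)
```

The bracketed part is emptied and dropped by the first processor, and the leading dash is removed by the second. A line that consists only of a bracketed cue comes back empty, which is the signal `modify` uses to delete the event.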
@@ -0,0 +1,5 @@
# coding=utf-8
from registry import registry
from mods import hearing_impaired, ocr_fixes, fps, offset, common, color
from main import SubtitleModifications, SubMod
@@ -0,0 +1,3 @@
# coding=utf-8
from data import data
File diff suppressed because one or more lines are too long
@@ -0,0 +1,98 @@
# coding=utf-8
import re
import os
import pprint
from collections import OrderedDict
from bs4 import BeautifulSoup

TEMPLATE = """\
import re
from collections import OrderedDict
data = """

TEMPLATE_END = """\

for lang, grps in data.iteritems():
    for grp in grps.iterkeys():
        if data[lang][grp]["pattern"]:
            data[lang][grp]["pattern"] = re.compile(data[lang][grp]["pattern"])
"""
SZ_FIX_DATA = {
    "eng": {
        "PartialWordsAlways": {
            u"°x°": u"%",
            u"compiete": u"complete",
            u"Âs": u"'s",
            u"ÃÂs": u"'s",
            u"a/ion": u"ation",
            u"at/on": u"ation",
            u"l/an": u"lian",
        },
        "WholeWords": {
            u"I'11": u"I'll",
            u"Tun": u"Run",
            u"pan'": u"part",
            u"al'": u"at",
            u"a re": u"are",
            u"wail'": u"wait",
            u"he)'": u"hey",
            u"He)'": u"Hey",
            u"Yea h": u"Yeah",
            u"yea h": u"yeah",
            u"h is": u"his",
            u" 're ": u"'re ",
            u"LAst": u"Last",
        }
    }
}
if __name__ == "__main__":
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    xml_dir = os.path.join(cur_dir, "xml")
    file_list = os.listdir(xml_dir)
    data = {}
    for fn in file_list:
        if fn.endswith("_OCRFixReplaceList.xml"):
            lang = fn.split("_")[0]
            soup = BeautifulSoup(open(os.path.join(xml_dir, fn)), "xml")
            fetch_data = (
                # group, item_name, pattern
                ("WholeLines", "Line", None),
                ("WholeWords", "Word", lambda d: (ur"(?um)\b(?:" + u"|".join([re.escape(k) for k in d.keys()])
                                                  + ur')\b') if d else None),
                ("PartialWordsAlways", "WordPart", None),
                ("PartialLines", "LinePart", lambda d: (ur"(?um)(?:(?<=\s)|(?<=^)|(?<=\b))(?:" +
                                                        u"|".join([re.escape(k) for k in d.keys()]) +
                                                        ur")(?:(?=\s)|(?=$)|(?=\b))") if d else None),
                ("BeginLines", "Beginning", lambda d: (ur"(?um)^(?:"+u"|".join([re.escape(k) for k in d.keys()])
                                                       + ur')') if d else None),
                ("EndLines", "Ending", lambda d: (ur"(?um)(?:" + u"|".join([re.escape(k) for k in d.keys()]) +
                                                  ur")$") if d else None,),
            )
            data[lang] = dict((grp, {"data": OrderedDict(), "pattern": None}) for grp, item_name, pattern in fetch_data)
            for grp, item_name, pattern in fetch_data:
                for grp_data in soup.find_all(grp):
                    for line in grp_data.find_all(item_name):
                        data[lang][grp]["data"][line["from"]] = line["to"]
                # add our own dictionaries
                if lang in SZ_FIX_DATA and grp in SZ_FIX_DATA[lang]:
                    data[lang][grp]["data"].update(SZ_FIX_DATA[lang][grp])
                if pattern:
                    data[lang][grp]["pattern"] = pattern(data[lang][grp]["data"])
    # the with block closes data.py even if writing fails part-way
    with open(os.path.join(cur_dir, "data.py"), "w+") as f:
        f.write(TEMPLATE)
        f.write(pprint.pformat(data, width=1))
        f.write(TEMPLATE_END)
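The "WholeWords" pattern built above is one alternation of escaped keys wrapped in word boundaries. A small sketch of how such a pattern could be compiled and applied, written for Python 3; the helper names `build_whole_words_pattern` and `apply_whole_words` are hypothetical, and substituting through a dictionary lookup is an assumption about how the generated data might be consumed, not something this diff shows.

```python
import re
from collections import OrderedDict

# Tiny sample taken from the SZ_FIX_DATA "WholeWords" entries above.
whole_words = OrderedDict([
    (u"I'11", u"I'll"),
    (u"Tun", u"Run"),
    (u"LAst", u"Last"),
])


def build_whole_words_pattern(replacements):
    """Build one alternation regex that matches any key as a whole word."""
    if not replacements:
        return None
    alternatives = u"|".join(re.escape(key) for key in replacements)
    return re.compile(r"(?um)\b(?:" + alternatives + r")\b")


def apply_whole_words(text, replacements):
    """Replace every whole-word match with its mapped correction."""
    pattern = build_whole_words_pattern(replacements)
    if pattern is None:
        return text
    return pattern.sub(lambda match: replacements[match.group(0)], text)


fixed = apply_whole_words(u"Tun! I'11 be the LAst one.", whole_words)
print(fixed)
```

Compiling a single alternation per group means each subtitle line is scanned once, and the `\b` anchors keep a key such as "Tun" from matching inside a longer word.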
@@ -0,0 +1,10 @@
# coding=utf-8
from babelfish import Language
from data import data
#for lang, data in data.iteritems():
# print Language.fromietf(lang).alpha2
for find, rep in data["dan"].iteritems():
    print find, rep
@@ -0,0 +1,638 @@
<OCRFixReplaceList>
<WholeWords>
<Word from="Haner" to="Han er" />
<Word from="JaveL" to="Javel" />
<Word from="Pa//e" to="Palle" />
<Word from="bffte" to="bitte" />
<Word from="Utro//gt" to="Utroligt" />
<Word from="Kommerdu" to="Kommer du" />
<Word from="smi/er" to="smiler" />
<Word from="/eg" to="leg" />
<Word from="harvinger" to="har vinger" />
<Word from="/et" to="let" />
<Word from="erjeres" to="er jeres" />
<Word from="hardet" to="har det" />
<Word from="tænktjer" to="tænkt jer" />
<Word from="erjo" to="er jo" />
<Word from="sti/" to="stil" />
<Word from="Iappe" to="lappe" />
<Word from="Beklagelç" to="Beklager," />
<Word from="vardet" to="var det" />
<Word from="afden" to="af den" />
<Word from="snupperjeg" to="snupper jeg" />
<Word from="ikkejeg" to="ikke jeg" />
<Word from="bliverjeg" to="bliver jeg" />
<Word from="hartravit" to="har travlt" />
<Word from="pandekagef/ag" to="pandekageflag" />
<Word from="Stormvarsell" to="Stormvarsel!" />
<Word from="stormvejn" to="stormvejr." />
<Word from="morgenkomp/et" to="morgenkomplet" />
<Word from="/yv" to="lyv" />
<Word from="varjo" to="var jo" />
<Word from="/eger" to="leger" />
<Word from="harjeg" to="har jeg" />
<Word from="havdejeg" to="havde jeg" />
<Word from="hvorjeg" to="hvor jeg" />
<Word from="nårjeg" to="når jeg" />
<Word from="gårvi" to="går vi" />
<Word from="atjeg" to="at jeg" />
<Word from="isine" to="i sine" />
<Word from="fårjeg" to="får jeg" />
<Word from="kærtighed" to="kærlighed" />
<Word from="skullejeg" to="skulle jeg" />
<Word from="laest" to="læst" />
<Word from="laese" to="læse" />
<Word from="gørjeg" to="gør jeg" />
<Word from="gørvi" to="gør vi" />
<Word from="angrerjo" to="angrer jo" />
<Word from="Hvergang" to="Hver gang" />
<Word from="erder" to="er der" />
<Word from="villetilgive" to="ville tilgive" />
<Word from="fieme" to="fjeme" />
<Word from="genopståri" to="genopstår i" />
<Word from="svigtejer" to="svigte jer" />
<Word from="kommernu" to="kommer nu" />
<Word from="nårman" to="når man" />
<Word from="erfire" to="er fire" />
<Word from="Hvorforfinderdu" to="Hvorfor finder du" />
<Word from="undertigt" to="underligt" />
<Word from="itroen" to="i troen" />
<Word from="erløgnt" to="er løgn!" />
<Word from="gørden" to="gør den" />
<Word from="forhelvede" to="for helvede" />
<Word from="hjpe" to="hjælpe" />
<Word from="togeti" to="toget i" />
<Word from="Måjeg" to="Må jeg" />
<Word from="savnerjer" to="savner jer" />
<Word from="erjeg" to="er jeg" />
<Word from="vaere" to="være" />
<Word from="geme" to="gerne" />
<Word from="trorpå" to="tror på" />
<Word from="forham" to="for ham" />
<Word from="afham" to="af ham" />
<Word from="harjo" to="har jo" />
<Word from="ovemafiet" to="overnattet" />
<Word from="begaefiighed" to="begærlighed" />
<Word from="syg" to="syg" />
<Word from="Imensjeg" to="Imens jeg" />
<Word from="bliverdu" to="bliver du" />
<Word from="fiser" to="fiser" />
<Word from="manipuierer" to="manipulerer" />
<Word from="forjeg" to="for jeg" />
<Word from="iivgivendefor" to="livgivende for" />
<Word from="formig" to="for mig" />
<Word from="Hardu" to="Har du" />
<Word from="fornold" to="forhold" />
<Word from="defrelste" to="de frelste" />
<Word from="Såjeg" to="Så jeg" />
<Word from="varjeg" to="var jeg" />
<Word from="gørved" to="gør ved" />
<Word from="kalderjeg" to="kalder jeg" />
<Word from="flytte" to="flytte" />
<Word from="handlerdet" to="handler det" />
<Word from="trorjeg" to="tror jeg" />
<Word from="flytter" to="flytter" />
<Word from="soverjeg" to="sover jeg" />
<Word from="finderud" to="finder ud" />
<Word from="naboerpå" to="naboer på" />
<Word from="ervildt" to="er vildt" />
<Word from="væreher" to="være her" />
<Word from="hyggerjer" to="hygger jer" />
<Word from="borjo" to="bor jo" />
<Word from="kommerikke" to="kommer ikke" />
<Word from="folkynde" to="forkynde" />
<Word from="farglad" to="far glad" />
<Word from="misterjeg" to="mister jeg" />
<Word from="fint" to="fint" />
<Word from="Harl" to="Har I" />
<Word from="bedejer" to="bede jer" />
<Word from="synesjeg" to="synes jeg" />
<Word from="vartil" to="var til" />
<Word from="eren" to="er en" />
<Word from="\Al" to="Vil" />
<Word from="\A" to="Vi" />
<Word from="fjeme" to="fjerne" />
<Word from="Iigefyldt" to="lige fyldt" />
<Word from="ertil" to="er til" />
<Word from="fafiigt" to="farligt" />
<Word from="finder" to="finder" />
<Word from="findes" to="findes" />
<Word from="irettesaefielse" to="irettesættelse" />
<Word from="ermed" to="er med" />
<Word from="èn" to="én" />
<Word from="gikjoi" to="gik jo i" />
<Word from="Hvisjeg" to="Hvis jeg" />
<Word from="ovemafier" to="overnatter" />
<Word from="hoident" to="holdent" />
<Word from="\Adne" to="Vidne" />
<Word from="fori" to="for i" />
<Word from="vei" to="vel" />
<Word from="savnerjerjo" to="savner jer jo" />
<Word from="elskerjer" to="elsker jer" />
<Word from="harløjet" to="har løjet" />
<Word from="eri" to="er i" />
<Word from="fiende" to="fjende" />
<Word from="derjo" to="der jo" />
<Word from="sigerjo" to="siger jo" />
<Word from="menerjeg" to="mener jeg" />
<Word from="Harjeg" to="Har jeg" />
<Word from="sigerjeg" to="siger jeg" />
<Word from="splitterjeg" to="splitter jeg" />
<Word from="erjournalist" to="er journalist" />
<Word from="erjoumalist" to="er journalist" />
<Word from="Forjeg" to="For jeg" />
<Word from="gârjeg" to="går jeg" />
<Word from="Nârjeg" to="Når jeg" />
<Word from="afllom" to="afkom" />
<Word from="farerjo" to="farer jo" />
<Word from="tagerjeg" to="tager jeg" />
<Word from="Virkerjeg" to="Virker jeg" />
<Word from="morerjer" to="morer jer" />
<Word from="kommerjo" to="kommer jo" />
<Word from="istand" to="i stand" />
<Word from="bøm" to="børn" />
<Word from="frygterjeg" to="frygter jeg" />
<Word from="kommerjeg" to="kommer jeg" />
<Word from="eriournalistelev" to="er journalistelev" />
<Word from="harfat" to="har fat" />
<Word from="fårfingre" to="får fingre" />
<Word from="slârjeg" to="slår jeg" />
<Word from="bam" to="barn" />
<Word from="erjournalistelev" to="er journalistelev" />
<Word from="politietjo" to="politiet jo" />
<Word from="elskerjo" to="elsker jo" />
<Word from="vari" to="var i" />
<Word from="fornemmerjeres" to="fornemmer jeres" />
<Word from="udklækketl" to="udklækket!" />
<Word from="í" to="i" />
<Word from="nyi" to="ny i" />
<Word from="Iumijelse" to="fornøjelse" />
<Word from="vures" to="vores" />
<Word from="I/Vashíngtan" to="Washington" />
<Word from="opleverjeg" to="oplever jeg" />
<Word from="PANTELÃNER" to="PANTELÅNER" />
<Word from="Gudmurgen" to="Godmorgen" />
<Word from="SKYDEVÃBEN" to="SKYDEVÅBEN" />
<Word from="PÃLIDELIG" to="PÅLIDELIG" />
<Word from="avertalte" to="overtalte" />
<Word from="Omsíder" to="Omsider" />
<Word from="lurtebåd" to="lortebåd" />
<Word from="Telrslning" to="Tekstning" />
<Word from="miUø" to="miljø" />
<Word from="gåri" to="går i" />
<Word from="Fan/el" to="Farvel" />
<Word from="abefiæs" to="abefjæs" />
<Word from="hartalt" to="har talt" />
<Word from="\Årkelig" to="Virkelig" />
<Word from="beklagerjeg" to="beklager jeg" />
<Word from="Nårjeg" to="Når jeg" />
<Word from="rnaend" to="mænd" />
<Word from="vaskebjorn" to="vaskebjørn" />
<Word from="Ivil" to="I vil" />
<Word from="besog" to="besøg" />
<Word from="Vaer" to="Vær" />
<Word from="Undersogte" to="Undersøgte" />
<Word from="modte" to="mødte" />
<Word from="toj" to="tøj" />
<Word from="fodt" to="født" />
<Word from="gore" to="gøre" />
<Word from="provede" to="prøvede" />
<Word from="forste" to="første" />
<Word from="igang" to="i gang" />
<Word from="ligenu" to="lige nu" />
<Word from="clet" to="det" />
<Word from="Strombell" to="Strombel!" />
<Word from="tmvlt" to="travlt" />
<Word from="studererjournalistik" to="studerer journalistik" />
<Word from="inforrnererjeg" to="informerer jeg" />
<Word from="omkfing" to="omkring" />
<Word from="tilAsgård" to="til Asgård" />
<Word from="Kederjeg" to="Keder jeg" />
<Word from="jaettetamp" to="jættetamp" />
<Word from="erjer" to="er jer" />
<Word from="atjulehygge" to="at julehygge" />
<Word from="Ueneste" to="tjeneste" />
<Word from="foltsaetter" to="fortsætter" />
<Word from="A/ice" to="Alice" />
<Word from="tvivlerjeg" to="tvivler jeg" />
<Word from="henterjer" to="henter jer" />
<Word from="forstårjeg" to="forstår jeg" />
<Word from="hvisjeg" to="hvis jeg" />
<Word from="/ært" to="lært" />
<Word from="vfgtrgt" to="vigtigt" />
<Word from="hurtigtjeg" to="hurtigt jeg" />
<Word from="kenderjo" to="kender jo" />
<Word from="seiv" to="selv" />
<Word from="/ægehuset" to="lægehuset" />
<Word from="herjo" to="her jo" />
<Word from="stolerjeg" to="stoler jeg" />
<Word from="digi" to="dig i" />
<Word from="taberi" to="taber i" />
<Word from="slårjeres" to="slår jeres" />
<Word from="laere" to="lære" />
<Word from="trænerwushu" to="træner wushu" />
<Word from="efterjeg" to="efter jeg" />
<Word from="efier" to="efter" />
<Word from="dui" to="du i" />
<Word from="afien" to="aften" />
<Word from="bliveri" to="bliver i" />
<Word from="acceptererjer" to="accepterer jer" />
<Word from="drikkerjo" to="drikker jo" />
<Word from="fianjin" to="Tianjin" />
<Word from="erlænge" to="er længe" />
<Word from="erikke" to="er ikke" />
<Word from="medjer" to="med jer" />
<Word from="Tmykke" to="Tillykke" />
<Word from="'fianjins" to="Tianjins" />
<Word from="Mesteri" to="Mester i" />
<Word from="sagdetil" to="sagde til" />
<Word from="indei" to="inde i" />
<Word from="ofie" to="ofte" />
<Word from="'filgiv" to="Tilgiv" />
<Word from="Lfår" to="I får" />
<Word from="viserjer" to="viser jer" />
<Word from="Rejsjerblot" to="Rejs jer blot" />
<Word from="'fillad" to="Tillad" />
<Word from="iiiiefinger" to="lillefinger" />
<Word from="VILOMFATTE" to="VIL OMFATTE" />
<Word from="mofio" to="motto" />
<Word from="gørjer" to="gør jer" />
<Word from="gifi" to="gift" />
<Word from="hardu" to="har du" />
<Word from="Iaeggerjeg" to="lægger jeg" />
<Word from="iet" to="i et" />
<Word from="sv/yte" to="svigte" />
<Word from="ti/" to="til" />
<Word from="Wdal" to="Vidal" />
<Word from="fiået" to="fået" />
<Word from="Hvo/for" to="Hvorfor" />
<Word from="hellerikke" to="heller ikke" />
<Word from="Wlle" to="Ville" />
<Word from="dr/ver" to="driver" />
<Word from="V\fllliam" to="William" />
<Word from="V\fllliams" to="Williams" />
<Word from="Vkfilliam" to="William" />
<Word from="vådejakke" to="våde jakke" />
<Word from="kæfll" to="kæft!" />
<Word from="sagdejeg" to="sagde jeg" />
<Word from="oven/ejet" to="overvejet" />
<Word from="karameisauce" to="karamelsauce" />
<Word from="Lfølgejødisk" to="Ifølge jødisk" />
<Word from="blevjo" to="blev jo" />
<Word from="asiateri" to="asiater i" />
<Word from="erV\fllliam" to="er William" />
<Word from="lidtflov" to="lidt flov" />
<Word from="sagdejo" to="sagde jo" />
<Word from="erlige" to="er lige" />
<Word from="Vtfilliam" to="William" />
<Word from="WfiII" to="Will" />
<Word from="afldarede" to="afklarede" />
<Word from="hjæiperjeg" to="hjælper jeg" />
<Word from="laderjeg" to="lader jeg" />
<Word from="Hândledsbeskyttere" to="Håndledsbeskyttere" />
<Word from="Lsabels" to="Isabels" />
<Word from="Gørjeg" to="Gør jeg" />
<Word from="mâjeg" to="må jeg" />
<Word from="ogjeg" to="og jeg" />
<Word from="gjordejeg" to="gjorde jeg" />
<Word from="villejeg" to="ville jeg" />
<Word from="Vlfllliams" to="Williams" />
<Word from="Dajeg" to="Da jeg" />
<Word from="iorden" to="i orden" />
<Word from="fandtjeg" to="fandt jeg" />
<Word from="Tilykke" to="Tillykke" />
<Word from="kørerjer" to="kører jer" />
<Word from="gøfjeg" to="gør jeg" />
<Word from="Selvflgelig" to="Selvfølgelig" />
<Word from="fdder" to="fadder" />
<Word from="bnfaldt" to="bønfaldt" />
<Word from="t\/ehovedede" to="tvehovedede" />
<Word from="EIler" to="Eller" />
<Word from="ringerjeg" to="ringer jeg" />
<Word from="blevvæk" to="blev væk" />
<Word from="stárjeg" to="står jeg" />
<Word from="varforbi" to="var forbi" />
<Word from="harfortalt" to="har fortalt" />
<Word from="iflere" to="i flere" />
<Word from="tørjeg" to="tør jeg" />
<Word from="kunnejeg" to="kunne jeg" />
<Word from="má" to="må" />
<Word from="hartænkt" to="har tænkt" />
<Word from="Fárjeg" to="Får jeg" />
<Word from="afdelingervar" to="afdelinger var" />
<Word from="0rd" to="ord" />
<Word from="pástá" to="påstå" />
<Word from="gráharet" to="gråharet" />
<Word from="varforbløffende" to="var forbløffende" />
<Word from="holdtjeg" to="holdt jeg" />
<Word from="hængerjo" to="hænger jo" />
<Word from="fikjeg" to="fik jeg" />
<Word from="fár" to="får" />
<Word from="Hvorforfølerjeg" to="Hvorfor føler jeg" />
<Word from="harfeber" to="har feber" />
<Word from="ándssvagt" to="åndssvagt" />
<Word from="0g" to="Og" />
<Word from="vartre" to="var tre" />
<Word from="abner" to="åbner" />
<Word from="garjeg" to="går jeg" />
<Word from="sertil" to="ser til" />
<Word from="hvorfin" to="hvor fin" />
<Word from="harfri" to="har fri" />
<Word from="forstarjeg" to="forstår jeg" />
<Word from="Sä" to="Så" />
<Word from="hvorfint" to="hvor fint" />
<Word from="mærkerjeg" to="mærker jeg" />
<Word from="ogsa" to="også" />
<Word from="nárjeg" to="når jeg" />
<Word from="Jasá" to="Jaså" />
<Word from="bándoptager" to="båndoptager" />
<Word from="bedárende" to="bedårende" />
<Word from="sá" to="så" />
<Word from="nár" to="når" />
<Word from="kunnejo" to="kunne jo" />
<Word from="Brammertil" to="Brammer til" />
<Word from="serjeg" to="ser jeg" />
<Word from="gikjeg" to="gik jeg" />
<Word from="udholderjeg" to="udholder jeg" />
<Word from="máneder" to="måneder" />
<Word from="vartræt" to="var træt" />
<Word from="dárligt" to="dårligt" />
<Word from="klaretjer" to="klaret jer" />
<Word from="pavirkelig" to="påvirkelig" />
<Word from="spekulererjeg" to="spekulerer jeg" />
<Word from="forsøgerjeg" to="forsøger jeg" />
<Word from="huskerjeg" to="husker jeg" />
<Word from="ifavnen" to="i favnen" />
<Word from="skullejo" to="skulle jo" />
<Word from="vartung" to="var tung" />
<Word from="varfuldstændig" to="var fuldstændig" />
<Word from="Paskedag" to="Påskedag" />
<Word from="turi" to="tur i" />
<Word from="spillerschumanns" to="spiller Schumanns" />
<Word from="forstárjeg" to="forstår jeg" />
<Word from="istedet" to="i stedet" />
<Word from="nárfrem" to="når frem" />
<Word from="habertrods" to="håber trods" />
<Word from="forførste" to="for første" />
<Word from="varto" to="var to" />
<Word from="overtil" to="over til" />
<Word from="forfem" to="for fem" />
<Word from="holdtjo" to="holdt jo" />
<Word from="passerjo" to="passer jo" />
<Word from="ellerto" to="eller to" />
<Word from="hartrods" to="har trods" />
<Word from="harfuldstændig" to="har fuldstændig" />
<Word from="gårjeg" to="går jeg" />
<Word from="giderjeg" to="gider jeg" />
<Word from="forjer" to="for jer" />
<Word from="erindrerjeg" to="erindrer jeg" />
<Word from="tænkerjeg" to="tænker jeg" />
<Word from="GAEt" to="GÅET" />
<Word from="hørerjo" to="hører jo" />
<Word from="forladerjeg" to="forlader jeg" />
<Word from="kosterjo" to="koster jo" />
<Word from="fortællerjeg" to="fortæller jeg" />
<Word from="Forstyrrerjeg" to="Forstyrrer jeg" />
<Word from="tjekkerjeg" to="tjekker jeg" />
<Word from="erjurist" to="er jurist" />
<Word from="tlLBUD" to="TILBUD" />
<Word from="serjo" to="ser jo" />
<Word from="bederjeg" to="beder jeg" />
<Word from="bilderjeg" to="bilder jeg" />
<Word from="ULVEtlME" to="ULVETIME" />
<Word from="skærerjo" to="skærer jo" />
<Word from="afjer" to="af jer" />
<Word from="ordnerjeg" to="ordner jeg" />
<Word from="giverjeg" to="giver jeg" />
<Word from="rejservi" to="rejser vi" />
<Word from="fangerjeg" to="fanger jeg" />
<Word from="erjaloux" to="er jaloux" />
<Word from="glemmerjeg" to="glemmer jeg" />
<Word from="Behøverjeg" to="Behøver jeg" />
<Word from="harvi" to="har vi" />
<Word from="ertyndere" to="er tyndere" />
<Word from="fårtordenvejr" to="får tordenvejr" />
<Word from="varfærdig" to="var færdig" />
<Word from="hørerfor" to="hører for" />
<Word from="varvel" to="var vel" />
<Word from="erforbi" to="er forbi" />
<Word from="AIle" to="Alle" />
<Word from="læserjo" to="læser jo" />
<Word from="Edgarer" to="Edgar er" />
<Word from="hartaget" to="har taget" />
<Word from="derer" to="der er" />
<Word from="stikkerfrem" to="stikker frem" />
<Word from="haraldrig" to="har aldrig" />
<Word from="ellerfar" to="eller far" />
<Word from="erat" to="er at" />
<Word from="turtil" to="tur til" />
<Word from="erfærdig" to="er færdig" />
<Word from="følerjeg" to="føler jeg" />
<Word from="jerfra" to="jer fra" />
<Word from="eralt" to="er alt" />
<Word from="harfaktisk" to="har faktisk" />
<Word from="harfundet" to="har fundet" />
<Word from="harvendt" to="har vendt" />
<Word from="Kunstneraf" to="Kunstner af" />
<Word from="ervel" to="er vel" />
<Word from="ståransigt" to="står ansigt" />
<Word from="Erjeg" to="Er jeg" />
<Word from="venterjeg" to="venter jeg" />
<Word from="Hvorvar" to="Hvor var" />
<Word from="varfint" to="var fint" />
<Word from="ervarmt" to="er varmt" />
<Word from="gårfint" to="går fint" />
<Word from="flyverforbi" to="flyver forbi" />
<Word from="Dervar" to="Der var" />
<Word from="dervar" to="der var" />
<Word from="meneråndeligt" to="mener åndeligt" />
<Word from="forat" to="for at" />
<Word from="herovertil" to="herover til" />
<Word from="soverfor" to="sover for" />
<Word from="begyndtejeg" to="begyndte jeg" />
<Word from="vendertilbage" to="vender tilbage" />
<Word from="erforfærdelig" to="er forfærdelig" />
<Word from="gøraltid" to="gør altid" />
<Word from="ertilbage" to="er tilbage" />
<Word from="harværet" to="har været" />
<Word from="bagoverellertil" to="bagover eller til" />
<Word from="hertaler" to="her taler" />
<Word from="vågnerjeg" to="vågner jeg" />
<Word from="vartomt" to="var tomt" />
<Word from="gårfrem" to="går frem" />
<Word from="talertil" to="taler til" />
<Word from="ertryg" to="er tryg" />
<Word from="ansigtervendes" to="ansigter vendes" />
<Word from="hervirkeligt" to="her virkeligt" />
<Word from="herer" to="her er" />
<Word from="drømmerjo" to="drømmer jo" />
<Word from="erfuldkommen" to="er fuldkommen" />
<Word from="hveren" to="hver en" />
<Word from="erfej" to="er fej" />
<Word from="datterforgæves" to="datter forgæves" />
<Word from="forsøgerjo" to="forsøger jo" />
<Word from="ertom" to="er tom" />
<Word from="vareftermiddag" to="var eftermiddag" />
<Word from="vartom" to="var tom" />
<Word from="angerellerforventninger" to="anger eller forventninger" />
<Word from="kørtejeg" to="kørte jeg" />
<Word from="Hvorforfortæller" to="Hvorfor fortæller" />
<Word from="gårtil" to="går til" />
<Word from="ringerefter" to="ringer efter" />
<Word from="søgertilflugt" to="søger tilflugt" />
<Word from="ertvunget" to="er tvunget" />
<Word from="megetjeg" to="meget jeg" />
<Word from="varikke" to="var ikke" />
<Word from="Derermange" to="Der er mange" />
<Word from="dervilhindre" to="der vil hindre" />
<Word from="erså" to="er så" />
<Word from="DetforstårLeggodt" to="Det forstår jeg godt" />
<Word from="ergodt" to="er godt" />
<Word from="vorventen" to="vor venten" />
<Word from="tagerfejl" to="tager fejl" />
<Word from="ellerer" to="eller er" />
<Word from="laverjeg" to="laver jeg" />
<Word from="0mgang" to="omgang" />
<Word from="afstár" to="afstår" />
<Word from="pá" to="på" />
<Word from="rejserjeg" to="rejser jeg" />
<Word from="ellertage" to="eller tage" />
<Word from="takkerjeg" to="takker jeg" />
<Word from="ertilfældigvis" to="er tilfældigvis" />
<Word from="fremstar" to="fremstår" />
<Word from="ertæt" to="er tæt" />
<Word from="ijeres" to="i jeres" />
<Word from="Sagdejeg" to="Sagde jeg" />
<Word from="overi" to="over i" />
<Word from="plukkerjordbær" to="plukker jordbær" />
<Word from="klarerjeg" to="klarer jeg" />
<Word from="jerfire" to="jer fire" />
<Word from="tábeligste" to="tåbeligste" />
<Word from="sigertvillingerne" to="siger tvillingerne" />
<Word from="erfaktisk" to="er faktisk" />
<Word from="gár" to="går" />
<Word from="harvasket" to="har vasket" />
<Word from="harplukketjordbærtil" to="har plukket jordbær til" />
<Word from="plukketjordbær" to="plukket jordbær" />
<Word from="klaverfirehændigt" to="klaver firehændigt" />
<Word from="erjævnaldrende" to="er jævnaldrende" />
<Word from="tierjeg" to="tier jeg" />
<Word from="Hvorerden" to="Hvor er den" />
<Word from="0veraltjeg" to="overalt jeg" />
<Word from="gårpå" to="går på" />
<Word from="finderjeg" to="finder jeg" />
<Word from="serhans" to="ser hans" />
<Word from="tiderbliver" to="tider bliver" />
<Word from="ellertrist" to="eller trist" />
<Word from="forstårjeres" to="forstår jeres" />
<Word from="Hvorsjælen" to="Hvor sjælen" />
<Word from="finderro" to="finder ro" />
<Word from="sidderjeg" to="sidder jeg" />
<Word from="tagerjo" to="tager jo" />
<Word from="efterjeres" to="efter jeres" />
<Word from="10O" to="100" />
<Word from="besluttedejeg" to="besluttede jeg" />
<Word from="varsket" to="var sket" />
<Word from="uadskillige" to="uadskillelige" />
<Word from="harjetlag" to="har jetlag" />
<Word from="lkke" to="Ikke" />
<Word from="lntet" to="Intet" />
<Word from="afslørerjeg" to="afslører jeg" />
<Word from="måjeg" to="må jeg" />
<Word from="Vl" to="VI" />
<Word from="atbygge" to="at bygge" />
<Word from="detmakabre" to="det makabre" />
<Word from="vilikke" to="vil ikke" />
<Word from="talsmandbekræfter" to="talsmand bekræfter" />
<Word from="vedatrenovere" to="ved at renovere" />
<Word from="forsøgeratforstå" to="forsøger at forstå" />
<Word from="ersket" to="er sket" />
<Word from="morderpå" to="morder på" />
<Word from="frifodiRosewood" to="fri fod i Rosewood" />
<Word from="holdtpressemøde" to="holdt pressemøde" />
<Word from="lngen" to="Ingen" />
<Word from="lND" to="IND" />
<Word from="henterjeg" to="henter jeg" />
<Word from="lsabel" to="Isabel" />
<Word from="lsabels" to="Isabels" />
<Word from="vinderjo" to="vinder jo" />
<Word from="rødmerjo" to="rødmer jo" />
<Word from="etjakkesæt" to="et jakkesæt" />
<Word from="glæderjeg" to="glæder jeg" />
<Word from="lgen" to="Igen" />
<Word from="lsær" to="Især" />
<Word from="iparken" to="i parken" />
<Word from="nårl" to="når I" />
<Word from="tilA1" to="til A1" />
<Word from="FBl" to="FBI" />
<Word from="viljo" to="vil jo" />
<Word from="detpå" to="det på" />
<Word from="KIar" to="Klar" />
<Word from="PIan" to="Plan" />
<Word from="EIIer" to="Eller" />
<Word from="FIot" to="Flot" />
<Word from="AIIe" to="Alle" />
<Word from="AIt" to="Alt" />
<Word from="KIap" to="Klap" />
<Word from="PIaza" to="Plaza" />
<Word from="SIap" to="Slap" />
<Word from="Iå" to="lå" />
<Word from="BIing" to="Bling" />
<Word from="GIade" to="Glade" />
<Word from="Iejrbålssange" to="lejrbålssange" />
<Word from="bedtjer" to="bedt jer" />
<Word from="hørerjeg" to="hører jeg" />
<Word from="Fårjeg" to="Får jeg" />
<Word from="fikJames" to="fik James" />
<Word from="atsnakke" to="at snakke" />
<Word from="varkun" to="var kun" />
<Word from="retterjeg" to="retter jeg" />
<Word from="ernormale" to="er normale" />
<Word from="viljeg" to="vil jeg" />
<Word from="Sætjer" to="Sæt jer" />
<Word from="udsatham" to="udsat ham" />
</WholeWords>
<PartialWordsAlways>
<WordPart from="¤" to="o" />
<WordPart from="IVI" to="M" />
<WordPart from="lVI" to="M" />
<WordPart from="IVl" to="M" />
<WordPart from="lVl" to="M" />
</PartialWordsAlways>
<PartialWords>
<!-- Will be used to check words not in dictionary -->
<!-- If new word(s) exists in spelling dictionary, it(they) is accepted -->
<WordPart from="fi" to="fi" />
<WordPart from="fl" to="fl" />
<WordPart from="/" to="l" />
<WordPart from="vv" to="w" />
<WordPart from="m" to="rn" />
<WordPart from="l" to="i" />
<WordPart from="€" to="e" />
<WordPart from="I" to="l" />
<WordPart from="c" to="o" />
<WordPart from="i" to="t" />
<WordPart from="cc" to="oo" />
<WordPart from="ii" to="tt" />
<WordPart from="n/" to="ry" />
<WordPart from="ae" to="æ" />
<!-- "f " will be two words -->
<WordPart from="f" to="f " />
<WordPart from="c" to="e" />
<WordPart from="o" to="e" />
<WordPart from="I" to="t" />
<WordPart from="n" to="o" />
<WordPart from="s" to="e" />
<WordPart from="\A" to="Vi" />
<WordPart from="n/" to="rv" />
<WordPart from="Ã" to="Å" />
<WordPart from="í" to="i" />
</PartialWords>
<PartialLines />
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines />
<RegularExpressions />
</OCRFixReplaceList>
@@ -0,0 +1,270 @@
<OCRFixReplaceList>
<WholeWords>
<Word from="@immatriculation" to="d'immatriculation" />
<Word from="acquer" to="acquér" />
<Word from="acteurjoue" to="acteur joue" />
<Word from="aerien" to="aérien" />
<Word from="agreable" to="agréable" />
<Word from="aientjamais" to="aient jamais" />
<Word from="AII" to="All" />
<Word from="aitjamais" to="ait jamais" />
<Word from="aitjus" to="ait jus" />
<Word from="alle" to="allé" />
<Word from="alles" to="allés" />
<Word from="appele" to="appelé" />
<Word from="apres" to="après" />
<Word from="aujourdhui" to="aujourd'hui" />
<Word from="aupres" to="auprès" />
<Word from="beaute" to="beauté" />
<Word from="cabossee" to="cabossée" />
<Word from="carj'" to="car j'" />
<Word from="Carj'" to="Car j'" />
<Word from="carla" to="car la" />
<Word from="CEdipe" to="Œdipe" />
<Word from="Cest" to="C'est" />
<Word from="c'etaient" to="c'étaient" />
<Word from="Cétaient" to="C'étaient" />
<Word from="c'etait" to="c'était" />
<Word from="C'etait" to="C'était" />
<Word from="Cétait" to="C'était" />
<Word from="choregraphiee" to="chorégraphiée" />
<Word from="cinema" to="cinéma" />
<Word from="cl'AIcatraz" to="d'Alcatraz" />
<Word from="cles" to="clés" />
<Word from="cœurjoie" to="cœur-joie" />
<Word from="completer" to="compléter" />
<Word from="costumiere" to="costumière" />
<Word from="cree" to="créé" />
<Word from="daccord" to="d'accord" />
<Word from="d'AIbert" to="d'Albert" />
<Word from="d'AIdous" to="d'Aldous" />
<Word from="d'AIec" to="d'Alec" />
<Word from="danniversaire" to="d'anniversaire" />
<Word from="d'Arra'bida" to="d'Arrabida" />
<Word from="d'autodérision" to="d'auto-dérision" />
<Word from="dautres" to="d'autres" />
<Word from="debattait" to="débattait" />
<Word from="decor" to="décor" />
<Word from="decorateurs" to="décorateurs" />
<Word from="decors" to="décors" />
<Word from="defi" to="défi" />
<Word from="dejà" to="déjà" />
<Word from="déjàm" to="déjà..." />
<Word from="dejeunait" to="déjeunait" />
<Word from="dengager" to="d'engager" />
<Word from="déquipement" to="d'équipement" />
<Word from="dérnièré" to="dernière" />
<Word from="Desole" to="Désolé" />
<Word from="dessayage" to="d'essayage" />
<Word from="dessence" to="d'essence" />
<Word from="détaient" to="c'étaient" />
<Word from="detail" to="détail" />
<Word from="dexcellents" to="d'excellents" />
<Word from="dexpérience" to="d'expérience" />
<Word from="dexpériences" to="d'expériences" />
<Word from="d'héro'l'ne" to="d'héroïne" />
<Word from="d'idees" to="d'idées" />
<Word from="d'intensite" to="d'intensité" />
<Word from="dontj" to="dont j" />
<Word from="doublaitAlfo" to="doublait Alfo" />
<Word from="DrNo" to="Dr No" />
<Word from="e'" to="é" />
<Word from="ecrit" to="écrit" />
<Word from="elegant" to="élégant" />
<Word from="Ellé" to="Elle" />
<Word from="én" to="en" />
<Word from="equipe" to="équipe" />
<Word from="erjus" to="er jus" />
<Word from="estjamais" to="est jamais" />
<Word from="ét" to="et" />
<Word from="etaient" to="étaient" />
<Word from="etait" to="était" />
<Word from="ete" to="été" />
<Word from="etiez" to="étiez" />
<Word from="etj'" to="et j'" />
<Word from="Etj'" to="Et j'" />
<Word from="etje" to="et je" />
<Word from="Etje" to="Et je" />
<Word from="EtsouvenL" to="Et souvent" />
<Word from="eviter" to="éviter" />
<Word from="Fabsence" to="l'absence" />
<Word from="fadapter" to="t'adapter" />
<Word from="fadore" to="j'adore" />
<Word from="Fâge" to="l'âge" />
<Word from="Fagent" to="l'agent" />
<Word from="faiessayé" to="j'ai essayé" />
<Word from="Failure" to="l'allure" />
<Word from="Fambiance" to="l'ambiance" />
<Word from="Famener" to="l'amener" />
<Word from="Fanniversaire" to="l'anniversaire" />
<Word from="Fapparence" to="l'apparence" />
<Word from="Fapres" to="l'après" />
<Word from="Faprès" to="l'après" />
<Word from="Farmée" to="l'armée" />
<Word from="Farrière" to="l'arrière" />
<Word from="Farrivée" to="l'arrivée" />
<Word from="Fascenseur" to="l'ascenseur" />
<Word from="Fascension" to="l'ascension" />
<Word from="Fassaut" to="l'assaut" />
<Word from="Fassomme" to="l'assomme" />
<Word from="Fatmosphère" to="l'atmosphère" />
<Word from="Fattention" to="l'attention" />
<Word from="Favalanche" to="l'avalanche" />
<Word from="Féclairage" to="l'éclairage" />
<Word from="Fécran" to="l'écran" />
<Word from="Fémotion" to="l'émotion" />
<Word from="Femplacement" to="l'emplacement" />
<Word from="Fendroit" to="l'endroit" />
<Word from="Fenseigne" to="l'enseigne" />
<Word from="Fensemble" to="l'ensemble" />
<Word from="Fentouraient" to="l'entouraient" />
<Word from="Fentrée" to="l'entrée" />
<Word from="Fépaisseur" to="l'épaisseur" />
<Word from="Fépoque" to="l'époque" />
<Word from="Féquipe" to="l'équipe" />
<Word from="Fespace" to="l'espace" />
<Word from="fespérais" to="j'espérais" />
<Word from="Fespère" to="l'espère" />
<Word from="Festhétique" to="l'esthétique" />
<Word from="Fetranger" to="l'étranger" />
<Word from="Févasion" to="l'évasion" />
<Word from="Févoque" to="l'évoque" />
<Word from="Fexpérience" to="l'expérience" />
<Word from="Fexplique" to="l'explique" />
<Word from="Fexplosion" to="l'explosion" />
<Word from="Fextérieur" to="l'extérieur" />
<Word from="Fhabituelle" to="l'habituelle" />
<Word from="Fhélicoptère" to="l'hélicoptère" />
<Word from="Fhéliport" to="l'héliport" />
<Word from="Fhélistation" to="l'hélistation" />
<Word from="Fhonneur" to="l'honneur" />
<Word from="Fhorloge" to="l'horloge" />
<Word from="Fidée" to="l'idée" />
<Word from="Fimage" to="l'image" />
<Word from="Fimportance" to="l'importance" />
<Word from="Fimpression" to="l'impression" />
<Word from="Finfluence" to="l'influence" />
<Word from="Finscription" to="l'inscription" />
<Word from="Fintérieur" to="l'intérieur" />
<Word from="Fintrigue" to="l'intrigue" />
<Word from="Fobjectif" to="l'objectif" />
<Word from="Foccasion" to="l'occasion" />
<Word from="Fordre" to="l'ordre" />
<Word from="Forigine" to="l'origine" />
<Word from="frêre" to="frère" />
<Word from="gaylns" to="gaijins" />
<Word from="general" to="général" />
<Word from="hawaïennel" to="hawaïenne" />
<Word from="hawa'l'en" to="hawaïen" />
<Word from="Ia" to="la" />
<Word from="Ià" to="là" />
<Word from="Iaryngotomie" to="laryngotomie" />
<Word from="idee" to="idée" />
<Word from="idees" to="idées" />
<Word from="Ie" to="le" />
<Word from="Ies" to="les" />
<Word from="Iester" to="Lester" />
<Word from="II" to="Il" />
<Word from="Iimit" to="limit" />
<Word from="IIs" to="Ils" />
<Word from="immediatement" to="immédiatement" />
<Word from="insufflee" to="insufflée" />
<Word from="integrer" to="intégrer" />
<Word from="interessante" to="intéressante" />
<Word from="Iogions" to="logions" />
<Word from="Iorsqu" to="lorsqu" />
<Word from="isee" to="isée" />
<Word from="Iumiere" to="lumière" />
<Word from="Iynchage" to="lynchage" />
<Word from="J'espere" to="J'espère" />
<Word from="Jessaie" to="J'essaie" />
<Word from="j'etais" to="j'étais" />
<Word from="J'etais" to="J'étais" />
<Word from="latéralémént" to="latéralement" />
<Word from="lci" to="Ici" />
<Word from="Lci" to="Ici" />
<Word from="lé-" to="là-" />
<Word from="lepidopteres" to="lépidoptères" />
<Word from="litteraire" to="littéraire" />
<Word from="ll" to="il" />
<Word from="Ll" to="Il" />
<Word from="lls" to="ils" />
<Word from="Lls" to="Ils" />
<Word from="maintenanu" to="maintenant" />
<Word from="maniere" to="manière" />
<Word from="mariee" to="mariée" />
<Word from="Mayer/ing" to="Mayerling" />
<Word from="meilleurjour" to="meilleur jour" />
<Word from="melange" to="mélange" />
<Word from="n'avaiént" to="n'avaient" />
<Word from="n'etait" to="n'était" />
<Word from="oitjamais" to="oit jamais" />
<Word from="oitjus" to="oit jus" />
<Word from="ontete" to="ont été" />
<Word from="operateur" to="opérateur" />
<Word from="ouvérté" to="ouverte" />
<Word from="Pépreuve" to="l'épreuve" />
<Word from="pere" to="père" />
<Word from="plateforme" to="plate-forme" />
<Word from="pourjouer" to="pour jouer" />
<Word from="precipice" to="précipice" />
<Word from="preferes" to="préférés" />
<Word from="premierjour" to="premier jour" />
<Word from="presenter" to="présenter" />
<Word from="prevu" to="prévu" />
<Word from="prevue" to="prévue" />
<Word from="propriete" to="propriété" />
<Word from="protègeraient" to="protégeraient" />
<Word from="qué" to="que" />
<Word from="qwangoissé" to="qu'angoissé" />
<Word from="realisateur" to="réalisateur" />
<Word from="reception" to="réception" />
<Word from="reévalu" to="réévalu" />
<Word from="repute" to="réputé" />
<Word from="reussi" to="réussi" />
<Word from="s'arrétait" to="s'arrêtait" />
<Word from="s'ave'rer" to="s'avérer" />
<Word from="scenario" to="scénario" />
<Word from="scene" to="scène" />
<Word from="scenes" to="scènes" />
<Word from="seances" to="séances" />
<Word from="sequence" to="séquence" />
<Word from="sflécrasa" to="s'écrasa" />
<Word from="speciale" to="spéciale" />
<Word from="Supen" to="Super" />
<Word from="torturee" to="torturée" />
<Word from="Uadmirable" to="L'admirable" />
<Word from="Uensemblier" to="L'ensemblier" />
<Word from="Uexplosion" to="L'explosion" />
<Word from="Uouvre" to="L'ouvre" />
<Word from="Vaise" to="l'aise" />
<Word from="vecu" to="vécu" />
<Word from="vehicules" to="véhicules" />
<Word from="Ÿappréciais" to="J'appréciais" />
<Word from="Ÿespère" to="J'espère" />
<Word from="ÿétrangle" to="s'étrangle" />
</WholeWords>
<PartialWordsAlways />
<PartialWords />
<PartialLines>
<LinePart from=" I'" to=" l'" />
<LinePart from=" |'" to=" l'" />
</PartialLines>
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines>
<Line from="&quot;D'ac:c:ord.&quot;" to="&quot;D'accord.&quot;" />
<Line from="“i QUÎ gagne, qui perd," to="ni qui gagne, qui perd," />
<Line from="L'ac:c:ent est mis &#xD;&#xA; &#xD;&#xA; sur son trajet jusqu'en Suisse." to="L'accent est mis &#xD;&#xA; &#xD;&#xA; sur son trajet jusqu'en Suisse." />
<Line from="C'est la plus gentille chose &#xD;&#xA; &#xD;&#xA; qu'Hitchc:oc:k m'ait jamais dite." to="C'est la plus gentille chose &#xD;&#xA; &#xD;&#xA; qu'Hitchcock m'ait jamais dite." />
<Line from="Tout le monde, en revanche, qualifie &#xD;&#xA; &#xD;&#xA; Goldfinger d'aventu re structurée," to="Tout le monde, en revanche, qualifie &#xD;&#xA; &#xD;&#xA; Goldfinger d'aventure structurée," />
<Line from="et le film Shadow of a man &#xD;&#xA; &#xD;&#xA; a lancé sa carrière au cinéma." to="et le film &lt;i&gt;Shadow of a man&lt;/i&gt; &#xD;&#xA; &#xD;&#xA; a lancé sa carrière au cinéma." />
<Line from="En 1948, Young est passé à la réalisation &#xD;&#xA; &#xD;&#xA; avec One night with you." to="En 1948, Young est passé à la réalisation &#xD;&#xA; &#xD;&#xA; avec &lt;i&gt;One night with you&lt;/i&gt;." />
<Line from="Il a construit tous ces véhicules &#xD;&#xA; &#xD;&#xA; à C)c:ala, en Floride." to="Il a construit tous ces véhicules &#xD;&#xA; &#xD;&#xA; à Ocala, en Floride." />
<Line from="Tokyo Pop et A Taxing Woman? Return." to="Tokyo Pop et A Taxing Woman's Return." />
<Line from="Peter H u nt." to="Peter Hunt." />
<Line from="&quot;C'est bien mieux dans Peau. &#xD;&#xA; &#xD;&#xA; On peut sfléclabousser, faire du bruit.&quot;" to="&quot;C'est bien mieux dans l'eau. &#xD;&#xA; &#xD;&#xA; On peut s'éclabousser, faire du bruit.&quot;" />
</WholeLines>
<RegularExpressions />
</OCRFixReplaceList>
@@ -0,0 +1,25 @@
<OCRFixReplaceList>
<WholeWords />
<PartialWordsAlways />
<PartialWords />
<PartialLines />
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines />
<RegularExpressions>
<!-- Capital I / lowercase l fixes -->
<RegEx find="([\x41-\x5a\x61-\x7a\xc1-\xfc])II" replaceWith="$1ll" />
<RegEx find="II([\x61-\x7a\xe1-\xfc])" replaceWith="ll$1" />
<RegEx find="([\x61-\x7a\xe1-\xfc])I" replaceWith="$1l" />
<RegEx find="([\x20])I([^aeou\x41-\x5a\xc1-\xdc])" replaceWith="$1l$2" />
<RegEx find="\bl([bcdfghjklmnpqrstvwxz])" replaceWith="I$1" />
<RegEx find="([\x41-\x5a\xc1-\xdc])I([\x61-\x7a\xe1-\xfc])" replaceWith="$1l$2" />
<RegEx find="([\x61-\x7a\xe1-\xfc][\-])I([\x61-\x7a\xe1-\xfc])" replaceWith="$1l$2" />
<RegEx find="([\x41-\x5a\xc1-\xdc])I([\-][\x41-\x5a\xc1-\xdc][\x61-\x7a\xe1-\xfc])" replaceWith="$1l$2" />
<RegEx find="\b([AEÜÓ])I([^\x41-\x5a\xc1-\xdc])" replaceWith="$1l$2" />
<RegEx find="\bI([aáeéiíoóöuúüy\xf5\xfb])" replaceWith="l$1" />
<RegEx find="\b(?:II|ll)" replaceWith="Il" />
<RegEx find="([\xf5\xfb])I" replaceWith="$1l" />
</RegularExpressions>
</OCRFixReplaceList>
@@ -0,0 +1,24 @@
<OCRFixReplaceList>
<WholeWords>
<Word from="ls" to="Is" />
<Word from="ln" to="In" />
<Word from="lk" to="Ik" />
<Word from="ledereen" to="Iedereen" />
<Word from="ledere" to="Iedere" />
<Word from="lemand" to="Iemand" />
</WholeWords>
<PartialWordsAlways />
<PartialWords />
<PartialLines />
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines />
<RegularExpressions>
<RegEx find="\blk(?=\p{Ll}{2})" replaceWith="Ik" />
<RegEx find="\bln(?=\p{Ll}{2})" replaceWith="In" />
<RegEx find="\bls(?=\p{Ll}{2})" replaceWith="Is" />
<RegEx find="\beIk" replaceWith="elk" />
<RegEx find="\bler(land|se|s|)\b" replaceWith="Ier$1" />
</RegularExpressions>
</OCRFixReplaceList>
@@ -0,0 +1,43 @@
<OCRFixReplaceList>
<WholeWords />
<PartialWordsAlways />
<PartialWords>
<!-- Will be used to check words not in dictionary -->
<!-- If new word(s) exists in spelling dictionary, it(they) is accepted -->
<WordPart from="¤" to="o" />
<WordPart from="ﬁ" to="fi" />
<WordPart from="ﬂ" to="fl" />
<WordPart from="/" to="l" />
<WordPart from="vv" to="w" />
<WordPart from="IVI" to="M" />
<WordPart from="lVI" to="M" />
<WordPart from="IVl" to="M" />
<WordPart from="lVl" to="M" />
<WordPart from="m" to="rn" />
<WordPart from="l" to="i" />
<WordPart from="€" to="e" />
<WordPart from="I" to="l" />
<WordPart from="c" to="o" />
<WordPart from="i" to="t" />
<WordPart from="cc" to="oo" />
<WordPart from="ii" to="tt" />
<WordPart from="n/" to="ry" />
<WordPart from="ae" to="æ" />
<!-- "f " will be two words -->
<WordPart from="f" to="f " />
<WordPart from="c" to="e" />
<WordPart from="I" to="t" />
<WordPart from="n" to="o" />
<WordPart from="s" to="e" />
<WordPart from="\A" to="Vi" />
<WordPart from="n/" to="rv" />
<WordPart from="Ã" to="Å" />
<WordPart from="í" to="i" />
</PartialWords>
<PartialLines />
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines />
<RegularExpressions />
</OCRFixReplaceList>
@@ -0,0 +1,508 @@
<OCRFixReplaceList>
<WholeWords>
<Word from="abitual" to="habitual" />
<Word from="àcerca" to="acerca" />
<Word from="acessor" to="assessor" />
<Word from="acólico" to="acólito" />
<Word from="açoreano" to="açoriano" />
<Word from="actuacao" to="actuação" />
<Word from="acucar" to="açúcar" />
<Word from="açucar" to="açúcar" />
<Word from="advinhar" to="adivinhar" />
<Word from="africa" to="África" />
<Word from="ajuisar" to="ajuizar" />
<Word from="album" to="álbum" />
<Word from="alcoolémia" to="alcoolemia" />
<Word from="aldião" to="aldeão" />
<Word from="algerino" to="argelino" />
<Word from="ameixeal" to="ameixial" />
<Word from="amiaça" to="ameaça" />
<Word from="analizar" to="analisar" />
<Word from="andáste" to="andaste" />
<Word from="anemona" to="anémona" />
<Word from="antartico" to="antárctico" />
<Word from="antártico" to="antárctico" />
<Word from="antepôr" to="antepor" />
<Word from="apárte" to="aparte" />
<Word from="apiadeiro" to="apeadeiro" />
<Word from="apiar" to="apear" />
<Word from="apreciacao" to="apreciação" />
<Word from="arctico" to="árctico" />
<Word from="arrazar" to="arrasar" />
<Word from="ártico" to="árctico" />
<Word from="artifice" to="artífice" />
<Word from="artifícial" to="artificial" />
<Word from="ascenção" to="ascensão" />
<!-- <Word from="assucar" to="açúcar" /> assucar is an existing word in the dictionary -->
<Word from="assúcar" to="açúcar" />
<Word from="aste" to="haste" />
<Word from="asterístico" to="asterisco" />
<Word from="averção" to="aversão" />
<Word from="avizar" to="avisar" />
<Word from="avulsso" to="avulso" />
<Word from="baínha" to="bainha" />
<Word from="banca-rota" to="bancarrota" />
<Word from="bandeija" to="bandeja" />
<Word from="bébé" to="bebé" />
<Word from="beige" to="bege" />
<Word from="benção" to="bênção" />
<Word from="beneficiência" to="beneficência" />
<Word from="beneficiente" to="beneficente" />
<Word from="benvinda" to="bem-vinda" />
<Word from="benvindo" to="bem-vindo" />
<Word from="boasvindas" to="boas-vindas" />
<Word from="borborinho" to="burburinho" />
<Word from="Brazil" to="Brasil" />
<Word from="bussula" to="bússola" />
<Word from="cabo-verdeano" to="cabo-verdiano" />
<Word from="caimbras" to="cãibras" />
<Word from="calcáreo" to="calcário" />
<Word from="calsado" to="calçado" />
<Word from="calvíce" to="calvície" />
<Word from="camoneano" to="camoniano" />
<Word from="campião" to="campeão" />
<Word from="cançacos" to="cansaços" />
<Word from="caracter" to="carácter" />
<Word from="caractéres" to="caracteres" />
<Word from="catequeze" to="catequese" />
<Word from="catequisador" to="catequizador" />
<Word from="catequisar" to="catequizar" />
<Word from="chícara" to="xícara" />
<Word from="ciclano" to="sicrano" />
<Word from="cicrano" to="sicrano" />
<Word from="cidadães" to="cidadãos" />
<Word from="cidadões" to="cidadãos" />
<Word from="cincoenta" to="cinquenta" />
<Word from="cinseiro" to="cinzeiro" />
<Word from="cinsero" to="sincero" />
<Word from="citacoes" to="citações" />
<Word from="coalizão" to="colisão" />
<Word from="côdia" to="côdea" />
<Word from="combóio" to="comboio" />
<Word from="compôr" to="compor" />
<Word from="concerteza" to="com certeza" />
<Word from="constituia" to="constituía" />
<Word from="constituíu" to="constituiu" />
<Word from="contato" to="contacto" />
<Word from="contensão" to="contenção" />
<Word from="contribuicoes" to="contribuições" />
<Word from="côr" to="cor" />
<Word from="corassão" to="coração" />
<Word from="corçario" to="corsário" />
<Word from="corçário" to="corsário" />
<Word from="cornprimidosinbo" to="comprimidozinho" />
<!-- <Word from="cota-parte" to="quota-parte" /> cota-parte is an existing word in the dictionary -->
<Word from="crâneo" to="crânio" />
<Word from="dE" to="de" />
<Word from="defenição" to="definição" />
<Word from="defenido" to="definido" />
<Word from="defenir" to="definir" />
<Word from="deficite" to="défice" />
<Word from="degladiar" to="digladiar" />
<Word from="deiche" to="deixe" />
<Word from="desinteria" to="disenteria" />
<Word from="despendio" to="dispêndio" />
<Word from="despêndio" to="dispêndio" />
<Word from="desplicência" to="displicência" />
<Word from="dificulidade" to="dificuldade" />
<Word from="dispender" to="despender" />
<Word from="dispendio" to="dispêndio" />
<Word from="distribuido" to="distribuído" />
<Word from="druída" to="druida" />
<Word from="écrã" to="ecrã" />
<Word from="ecran" to="ecrã" />
<Word from="écran" to="ecrã" />
<Word from="êle" to="ele" />
<Word from="elice" to="hélice" />
<Word from="élice" to="hélice" />
<Word from="emiratos" to="emirados" />
<Word from="engolis-te" to="engoliste" />
<Word from="engulir" to="engolir" />
<Word from="enguliste" to="engoliste" />
<Word from="entertido" to="entretido" />
<Word from="entitular" to="intitular" />
<Word from="entreterimento" to="entretenimento" />
<Word from="entreti-me" to="entretive-me" />
<Word from="envólucro" to="invólucro" />
<Word from="erói" to="herói" />
<Word from="escluir" to="excluir" />
<Word from="esclusão" to="exclusão" />
<Word from="escrivões" to="escrivães" />
<Word from="esqueiro" to="isqueiro" />
<Word from="esquesito" to="esquisito" />
<Word from="estacoes" to="estações" />
<Word from="esteje" to="esteja" />
<Word from="excavação" to="escavação" />
<Word from="excavar" to="escavar" />
<Word from="exdrúxula" to="esdrúxula" />
<Word from="exdrúxulas" to="esdrúxulas" />
<Word from="exitar" to="hesitar" />
<Word from="explicacoes" to="explicações" />
<Word from="exquisito" to="esquisito" />
<Word from="extende" to="estende" />
<Word from="extender" to="estender" />
<Word from="fàcilmenfe" to="facilmente" />
<Word from="fàcilmente" to="facilmente" />
<Word from="fariam-lhe" to="far-lhe-iam" />
<Word from="FARMÁClAS" to="FARMÁCIAS" />
<Word from="farmecêutico" to="farmacêutico" />
<Word from="fassa" to="faça" />
<Word from="fébre" to="febre" />
<Word from="fecula" to="fécula" />
<Word from="fémea" to="fêmea" />
<Word from="femenino" to="feminino" />
<Word from="femininismo" to="feminismo" />
<Word from="físiologista" to="fisiologista" />
<Word from="fizémos" to="fizemos" />
<Word from="fizes-te" to="fizeste" />
<Word from="flôr" to="flor" />
<Word from="forão" to="foram" />
<Word from="formalisar" to="formalizar" />
<Word from="fôro" to="foro" />
<Word from="fos-te" to="foste" />
<Word from="fragância" to="fragrância" />
<Word from="françês" to="francês" />
<Word from="frasqutnho" to="frasquinho" />
<Word from="frustado" to="frustrado" />
<Word from="furá" to="furar" />
<Word from="gaz" to="gás" />
<Word from="gáz" to="gás" />
<Word from="geito" to="jeito" />
<Word from="geneceu" to="gineceu" />
<Word from="geropiga" to="jeropiga" />
<Word from="glicémia" to="glicemia" />
<Word from="gorgeta" to="gorjeta" />
<Word from="grangear" to="granjear" />
<Word from="guizar" to="guisar" />
<Word from="hectar" to="hectare" />
<Word from="herméticamente" to="hermeticamente" />
<Word from="hernia" to="hérnia" />
<Word from="higiéne" to="higiene" />
<Word from="hilariedade" to="hilaridade" />
<Word from="hiperacídez" to="hiperacidez" />
<Word from="hontem" to="ontem" />
<Word from="igiene" to="higiene" />
<Word from="igienico" to="higiénico" />
<Word from="igiénico" to="higiénico" />
<Word from="igreija" to="igreja" />
<Word from="iguasu" to="iguaçu" />
<Word from="ilacção" to="ilação" />
<Word from="imbigo" to="umbigo" />
<Word from="impecilho" to="empecilho" />
<Word from="íncas" to="incas" />
<Word from="incêsto" to="incesto" />
<Word from="inclusivé" to="inclusive" />
<Word from="incômodos" to="incómodos" />
<Word from="incontestávelmente" to="incontestavelmente" />
<Word from="incontestàvelmente" to="incontestavelmente" />
<Word from="indespensáveis" to="indispensáveis" />
<Word from="indespensável" to="indispensável" />
<Word from="India" to="Índia" />
<Word from="indiguinação" to="indignação" />
<Word from="indiguinado" to="indignado" />
<Word from="indiguinar" to="indignar" />
<Word from="inflacção" to="inflação" />
<Word from="ingreja" to="igreja" />
<Word from="INSCRICOES" to="INSCRIÇÕES" />
<Word from="intensão" to="intenção" />
<Word from="intertido" to="entretido" />
<Word from="intoxica" to="Intoxica" />
<Word from="intrega" to="entrega" />
<Word from="inverosímel" to="inverosímil" />
<Word from="iorgute" to="iogurte" />
<Word from="ipopótamo" to="hipopótamo" />
<Word from="ipsilon" to="ípsilon" />
<Word from="ipslon" to="ípsilon" />
<Word from="isquesito" to="esquisito" />
<Word from="juíz" to="juiz" />
<Word from="juiza" to="juíza" />
<Word from="júniores" to="juniores" />
<Word from="justanzente" to="justamente" />
<Word from="juz" to="jus" />
<Word from="kilo" to="quilo" />
<Word from="laboratório-porque" to="laboratório porque" />
<Word from="ladravaz" to="ladrava" />
<Word from="lamentàvelmente" to="lamentavelmente" />
<Word from="lampeão" to="lampião" />
<Word from="largartixa" to="lagartixa" />
<Word from="largarto" to="lagarto" />
<Word from="lêm" to="lêem" />
<Word from="leucémia" to="leucemia" />
<Word from="licensa" to="licença" />
<Word from="linguísta" to="linguista" />
<Word from="lisongear" to="lisonjear" />
<Word from="logista" to="lojista" />
<Word from="maçajar" to="massajar" />
<Word from="Macfadden-o" to="Macfadden o" />
<Word from="mae" to="mãe" />
<Word from="magestade" to="majestade" />
<Word from="mãgua" to="mágoa" />
<Word from="mangerico" to="manjerico" />
<Word from="mangerona" to="manjerona" />
<Word from="manteem-se" to="mantêm-se" />
<Word from="mantega" to="manteiga" />
<Word from="mantem-se" to="mantém-se" />
<Word from="massiço" to="maciço" />
<Word from="massisso" to="maciço" />
<Word from="médica-Rio" to="médica Rio" />
<Word from="menistro" to="ministro" />
<Word from="merciaria" to="mercearia" />
<Word from="metrelhadora" to="metralhadora" />
<Word from="miscegenação" to="miscigenação" />
<Word from="misogenia" to="misoginia" />
<Word from="misogeno" to="misógino" />
<Word from="misógeno" to="misógino" />
<Word from="mº" to="º" />
<Word from="môlho" to="molho" />
<Word from="monumentânea" to="momentânea" />
<Word from="mortandela" to="mortadela" />
<Word from="morteIa" to="mortela" />
<Word from="muinto" to="muito" />
<Word from="nasaias" to="nasais" />
<Word from="nêle" to="nele" />
<Word from="nest" to="neste" />
<Word from="Nivea" to="Nívea" />
<Word from="nonagessimo" to="nonagésimo" />
<Word from="nonagéssimo" to="nonagésimo" />
<Word from="nornal" to="normal" />
<Word from="notàvelmente" to="notavelmente" />
<Word from="obcessão" to="obsessão" />
<Word from="obesidae" to="obesidade" />
<Word from="óbviamente" to="obviamente" />
<Word from="òbviamente" to="obviamente" />
<Word from="ofecina" to="oficina" />
<Word from="oje" to="hoje" />
<Word from="omem" to="homem" />
<Word from="opcoes" to="opções" />
<Word from="opóbrio" to="opróbrio" />
<Word from="opróbio" to="opróbrio" />
<Word from="orfão" to="órfão" />
<Word from="organigrama" to="organograma" />
<Word from="organisar" to="organizar" />
<Word from="orgão" to="órgão" />
<Word from="orta" to="horta" />
<Word from="ótima" to="óptima" />
<Word from="ótimos" to="óptimos" />
<Word from="paralização" to="paralisação" />
<Word from="paralizado" to="paralisado" />
<Word from="paralizar" to="paralisar" />
<Word from="paráste" to="paraste" />
<Word from="Pátria" to="pátria" />
<Word from="paúl" to="Paul" />
<Word from="pecalço" to="percalço" />
<Word from="pêga" to="pega" />
<Word from="periodo" to="período" />
<Word from="pertubar" to="perturbar" />
<Word from="perú" to="peru" />
<Word from="piqueno" to="pequeno" />
<Word from="pirinéus" to="Pirenéus" />
<Word from="poblema" to="problema" />
<Word from="pobrema" to="problema" />
<Word from="poden" to="podem" />
<Word from="poder-mos" to="pudermos" />
<Word from="ponteagudo" to="pontiagudo" />
<Word from="pontuacoes" to="pontuações" />
<Word from="prazeiroso" to="prazeroso" />
<Word from="precaridade" to="precariedade" />
<Word from="precizar" to="precisar" />
<Word from="preserverança" to="perseverança" />
<Word from="previlégio" to="privilégio" />
<Word from="primária-que" to="primária que" />
<Word from="priúdo" to="período" />
<Word from="probalidade" to="probabilidade" />
<Word from="progreso" to="progresso" />
<Word from="proibído" to="proibido" />
<Word from="proíbido" to="proibido" />
<Word from="própia" to="própria" />
<Word from="propiedade" to="propriedade" />
<Word from="propio" to="próprio" />
<Word from="própio" to="próprio" />
<Word from="provocacoes" to="provocações" />
<Word from="prsença" to="presença" />
<Word from="prustituta" to="prostituta" />
<Word from="pudérmos" to="pudermos" />
<Word from="púlico" to="público" />
<Word from="pús" to="pus" />
<Word from="pusémos" to="pusemos" />
<Word from="quadricomia" to="quadricromia" />
<Word from="quadriplicado" to="quadruplicado" />
<Word from="quaisqueres" to="quaisquer" />
<Word from="quer-a" to="quere-a" />
<Word from="quere-se" to="quer-se" />
<Word from="quer-o" to="quere-o" />
<Word from="químco" to="químico" />
<Word from="quises-te" to="quiseste" />
<Word from="quizer" to="quiser" />
<Word from="quizeram" to="quiseram" />
<Word from="quizesse" to="quisesse" />
<Word from="quizessem" to="quisessem" />
<Word from="raínha" to="rainha" />
<Word from="raíz" to="raiz" />
<Word from="raizes" to="raízes" />
<Word from="ratato" to="retrato" />
<Word from="raúl" to="raul" />
<Word from="razar" to="rasar" />
<Word from="rectaguarda" to="retaguarda" />
<Word from="rédia" to="rédea" />
<Word from="reestabelecer" to="restabelecer" />
<Word from="refeicoes" to="refeições" />
<Word from="refêrencia" to="referência" />
<Word from="regeitar" to="rejeitar" />
<Word from="regurjitar" to="regurgitar" />
<Word from="reinvidicação" to="reivindicação" />
<Word from="reinvidicar" to="reivindicar" />
<Word from="requer-a" to="requere-a" />
<Word from="requere-se" to="requer-se" />
<Word from="requer-o" to="requere-o" />
<Word from="requesito" to="requisito" />
<Word from="requisicoes" to="requisições" />
<Word from="RESIDENCIA" to="RESIDÊNCIA" />
<Word from="respiraçáo" to="respiração" />
<Word from="restablecer" to="restabelecer" />
<Word from="réstea" to="réstia" />
<Word from="ruborisar" to="ruborizar" />
<Word from="rúbrica" to="rubrica" />
<Word from="sàdia" to="sadia" />
<Word from="saiem" to="saem" />
<Word from="salchicha" to="salsicha" />
<Word from="salchichas" to="salsichas" />
<Word from="saloice" to="saloiice" />
<Word from="salvé" to="salve" />
<Word from="salve-raínha" to="salve-rainha" />
<Word from="salvé-rainha" to="salve-rainha" />
<Word from="salvé-raínha" to="salve-rainha" />
<Word from="sao" to="são" />
<Word from="sargeta" to="sarjeta" />
<Word from="seções" to="secções" />
<Word from="seija" to="seja" />
<Word from="seissentos" to="seiscentos" />
<Word from="seje" to="seja" />
<Word from="semiar" to="semear" />
<Word from="séniores" to="seniores" />
<Word from="sensibilidadc" to="sensibilidade" />
<Word from="sensívelmente" to="sensivelmente" />
<Word from="setessentos" to="setecentos" />
<Word from="siclano" to="sicrano" />
<Word from="Sifilis" to="Sífilis" />
<Word from="sifílis" to="sífilis" />
<Word from="sinão" to="senão" />
<Word from="sinmtoma" to="sintoma" />
<Word from="sintéticamente" to="sinteticamente" />
<Word from="sintetisa" to="sintetiza" />
<Word from="SÓ" to="só" />
<Word from="sôfra" to="sofra" />
<Word from="sôfregamente" to="sofregamente" />
<Word from="somáste" to="somaste" />
<Word from="sombracelha" to="sobrancelha" />
<Word from="sombrancelha" to="sobrancelha" />
<Word from="sombrancelhas" to="sobrancelhas" />
<Word from="suavisar" to="suavizar" />
<Word from="substituido" to="substituído" />
<Word from="suburbio" to="subúrbio" />
<!-- <Word from="sues" to="seus" /> "sues" is a valid word, e.g. "Cuidado, não sues muito." -->
<Word from="suI" to="sul" />
<Word from="Suiça" to="Suíça" />
<Word from="suiças" to="suíças" />
<Word from="suiço" to="suíço" />
<Word from="suiços" to="suíços" />
<Word from="supôr" to="supor" />
<Word from="tabeliões" to="tabeliães" />
<Word from="taínha" to="tainha" />
<Word from="tava" to="estava" />
<Word from="têem" to="têm" />
<Word from="telemovel" to="telemóvel" />
<Word from="telémovel" to="telemóvel" />
<Word from="terminacoes" to="terminações" />
<Word from="toráxico" to="torácico" />
<Word from="tou" to="estou" />
<Word from="transpôr" to="transpor" />
<Word from="trasnporte" to="transporte" />
<Word from="tumors" to="tumores" />
<Word from="úmida" to="húmida" />
<Word from="umidade" to="humidade" />
<Word from="vai-vem" to="vaivém" />
<Word from="vegilância" to="vigilância" />
<Word from="vegilante" to="vigilante" />
<Word from="ventoínha" to="ventoinha" />
<Word from="verosímel" to="verosímil" />
<Word from="video" to="vídeo" />
<Word from="virus" to="vírus" />
<Word from="visiense" to="viseense" />
<Word from="voçe" to="você" />
<Word from="voçê" to="você" />
<Word from="vôo" to="voo" />
<Word from="xadrês" to="xadrez" />
<Word from="xafariz" to="chafariz" />
<Word from="xéxé" to="xexé" />
<Word from="xilindró" to="chilindró" />
<Word from="zaíre" to="Zaire" />
<Word from="zepelin" to="zepelim" />
<Word from="zig-zag" to="ziguezague" />
<Word from="zoô" to="zoo" />
<Word from="zôo" to="zoo" />
<Word from="zuar" to="zoar" />
<Word from="zum-zum" to="zunzum" />
</WholeWords>
<PartialWordsAlways />
<PartialWords />
<PartialLines>
<LinePart from="IN 6-E" to="N 6 E" />
<LinePart from="in tegrar-se" to="integrar-se" />
<LinePart from="in teresse" to="interesse" />
<LinePart from="in testinos" to="intestinos" />
<LinePart from="indica ção" to="indicação" />
<LinePart from="inte tino" to="intestino" />
<LinePart from="intes tinos" to="intestinos" />
<LinePart from="L da" to="Lda" />
<LinePart from="mal estar" to="mal-estar" />
<LinePart from="mastiga çáo" to="mastigação" />
<LinePart from="médi cas" to="médicas" />
<LinePart from="mineo rais" to="minerais" />
<LinePart from="mola res" to="molares" />
<LinePart from="movi mentos" to="movimentos" />
<LinePart from="movimen to" to="movimento" />
<LinePart from="N 5-Estendido" to="Nº 5 Estendido" />
<LinePart from="oxigé nio" to="oxigénio" />
<LinePart from="pod mos" to="podemos" />
<LinePart from="poder-se ia" to="poder-se-ia" />
<LinePart from="pos sibilidade" to="possibilidade" />
<LinePart from="possibi lidades" to="possibilidades" />
<LinePart from="pro duto" to="produto" />
<LinePart from="procu rar" to="procurar" />
<LinePart from="Q u e" to="Que" />
<LinePart from="qualifi cam" to="qualificam" />
<LinePart from="R egião" to="Região" />
<LinePart from="unsuficien temente" to="insuficientemente" />
</PartialLines>
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines />
<RegularExpressions>
<!-- <RegEx find="\bi\b" replaceWith="I" /> just an example - do not use this regex -->
<RegEx find="([0-9]) +º" replaceWith="$1º" />
<RegEx find="\Bcao\b" replaceWith="ção" />
<RegEx find="\Bcoes\b" replaceWith="ções" />
<!-- <RegEx find="\Bccao\b" replaceWith="cção" /> no point having this as well as the rule above -->
<!-- <RegEx find="\Bccoes\b" replaceWith="cções" /> no point having this as well as the rule above -->
<RegEx find="\b(m|M)ae\b" replaceWith="$1ãe" />
<RegEx find="\Bdmnis\B" replaceWith="dminis" />
<RegEx find="\Blcól\B" replaceWith="lcoól" />
<RegEx find="\b(t|T)a[nm]b[eé]m\b" replaceWith="$1ambém" />
<RegEx find="\bzeppeli[mn]\b" replaceWith="zepelim" />
<RegEx find="\b(s|S)ufe?ciente\b" replaceWith="$1uficiente" />
<RegEx find="\b(n|N)ao\b" replaceWith="$1ão" />
<RegEx find="\b(B|b)elem\b" replaceWith="$1elém" />
<RegEx find="\b(s|S)u[íi]sso(s)?\b" replaceWith="$1uíço$2" />
<RegEx find="\b(s|S)u[íi]ssa(s)?\b" replaceWith="$1uíça$2" />
<RegEx find="\b(p|P)rivelig[ie]\p{Ll}d" replaceWith="$1rivilegiad" />
<RegEx find="\bpud(?:és|e-)se\b" replaceWith="pudesse" />
<RegEx find="\biquilíbr(?:e|i)o\b" replaceWith="equilíbrio" />
<RegEx find="\b(c|C)orregi\B" replaceWith="$1orrigi" />
<RegEx find="(?&lt;=A|a)ssociacao" replaceWith="ssociação" />
<RegEx find="(?&lt;=N|n)inguem" replaceWith="inguém" />
<RegEx find="(?&lt;=g|G)rat(?:uí|úi)to" replaceWith="ratuito" />
<RegEx find="(?&lt;=d|D)esiquilíbr[ei]o" replaceWith="esequilíbrio" />
<RegEx find="\b[kK]il(ogramas?|ómetros?)" replaceWith="qui$1" />
</RegularExpressions>
</OCRFixReplaceList>
@@ -0,0 +1,257 @@
<OCRFixReplaceList>
<WholeWords>
<Word from="НЄЙ" to="НЕЙ" />
<Word from="ОРГЗНИЗМОБ" to="ОРГАНИЗМОВ" />
<Word from="Чї0" to="ЧТО" />
<Word from="НЭ" to="НА" />
<Word from="СОСЄДНЮЮ" to="СОСЕДНЮЮ" />
<Word from="ПЛЗНЄТУ" to="ПЛАНЕТУ" />
<Word from="ЗЗГЭДОК" to="ЗАГАДОК" />
<Word from="СОТВОРЄНИЯ" to="СОТВОРЕНИЯ" />
<Word from="МИРЭ" to="МИРА" />
<Word from="ПОЯБЛЄНИЯ" to="ПОЯВЛЕНИЯ" />
<Word from="ЗЄМЛЄ" to="ЗЕМЛЕ" />
<Word from="ЄЩЄ" to="ЕЩЁ" />
<Word from="ТЄМНЬІХ" to="ТЕМНЫХ" />
<Word from="СЄРЬЄЗНЬІМ" to="СЕРЬЕЗНЫМ" />
<Word from="ПОШІІ0" to="ПОШЛО" />
<Word from="Пр0ИЗ0ШЄЛ" to="ПРОИЗОШЕЛ" />
<Word from="СЄКРЄТЭМИ" to="СЕКРЕТАМИ" />
<Word from="МЭТЄРИЗЛЬІ" to="МАТЕРИАЛЫ" />
<Word from="ПЯТЄН" to="ПЯТЕН" />
<Word from="ПЛаНЄїЄ" to="ПЛАНЕТЕ" />
<Word from="КЗТЭКЛИЗМ" to="КАТАКЛИЗМ" />
<Word from="ОКЗЗЗЛСЯ" to="ОКАЗАЛСЯ" />
<Word from="ДЭЛЬШЕ" to="ДАЛЬШЕ" />
<Word from="ТВК" to="ТАК" />
<Word from="ПЛЗНЄТЗ" to="ПЛАНЕТА" />
<Word from="ЧЄГО" to="ЧЕГО" />
<Word from="УЗНЭТЬ" to="УЗНАТЬ" />
<Word from="ПЛЭНЄТЄ" to="ПЛАНЕТЕ" />
<Word from="НЄМ" to="НЕМ" />
<Word from="БОЗМОЖНЗ" to="ВОЗМОЖНА" />
<Word from="СОБЄРШЄННО" to="СОВЕРШЕННО" />
<Word from="ИНЭЧЄ" to="ИНАЧЕ" />
<Word from="БСЄ" to="ВСЕ" />
<Word from="НЕДОСТЗТКИ" to="НЕДОСТАТКИ" />
<Word from="НОВЬІЄ" to="НОВЫЕ" />
<Word from="ВЄЛИКОЛЄПНЭЯ" to="ВЕЛИКОЛЕПНАЯ" />
<Word from="ОСТЭІІОСЬ" to="ОСТАЛОСЬ" />
<Word from="НЗЛИЧИЄ" to="НАЛИЧИЕ" />
<Word from="бЫ" to="бы" />
<Word from="ПРОЦВЕТВТЬ" to="ПРОЦВЕТАТЬ" />
<Word from="КЗК" to="КАК" />
<Word from="ВОДЗ" to="ВОДА" />
<Word from="НЗШЕЛ" to="НАШЕЛ" />
<Word from="НЄ" to="НЕ" />
<Word from="ТОЖЄ" to="ТОЖЕ" />
<Word from="ВУЛКЭНИЧЄСКОЙ" to="ВУЛКАНИЧЕСКОЙ" />
<Word from="ЭКТИБНОСТИ" to="АКТИВНОСТИ" />
<Word from="ПОЯВИЛЗСЬ" to="ПОЯВИЛАСЬ" />
<Word from="НОВЗЯ" to="НОВАЯ" />
<Word from="СТРЭТЄГИЯ" to="СТРАТЕГИЯ" />
<Word from="УСПЄШН0" to="УСПЕШНО" />
<Word from="ПОСЗДКУ" to="ПОСАДКУ" />
<Word from="ГОТОБЫ" to="ГОТОВЫ" />
<Word from="НЗЧЗТЬ" to="НАЧАТЬ" />
<Word from="ОХОТЭ" to="ОХОТА" />
<Word from="ПРИЗНЗКЗМИ" to="ПРИЗНАКАМИ" />
<Word from="Пр0ШЛОМ" to="ПРОШЛОМ" />
<Word from="НЭСТОЯЩЄМ" to="НАСТОЯЩЕМ" />
<Word from="ПУСТОТЗХ" to="ПУСТОТАХ" />
<Word from="БЛЗЖНОЙ" to="ВЛАЖНОЙ" />
<Word from="ПОЧБЄ" to="ПОЧВЕ" />
<Word from="МЬІ" to="МЫ" />
<Word from="СЄЙЧЗС" to="СЕЙЧАС" />
<Word from="ЄСЛИ" to="ЕСЛИ" />
<Word from="ЗЗТРОНЕМ" to="ЗАТРОНЕМ" />
<Word from="ОПЗСЗЄМСЯ" to="ОПАСАЕМСЯ" />
<Word from="СИЛЬН0" to="СИЛЬНО" />
<Word from="ОТЛИЧЗЄТСЯ" to="ОТЛИЧАЕТСЯ" />
<Word from="РЭНЬШЄ" to="РАНЬШЕ" />
<Word from="НЗЗЬІВЗЮТ" to="НАЗЫВАЮТ" />
<Word from="ТЄКЛ3" to="ТЕКЛА" />
<Word from="ОСЗДОЧНЫМИ" to="ОСАДОЧНЫМИ" />
<Word from="ПОСТЄПЄНН0" to="ПОСТЕПЕННО" />
<Word from="ИСПЭРЯЛЗСЬ" to="ИСПАРЯЛАСЬ" />
<Word from="ЄОЛЬШОЄ" to="БОЛЬШОЕ" />
<Word from="КОЛИЧЄСТБО" to="КОЛИЧЕСТВО" />
<Word from="ГЄМЗТИТЕ" to="ГЕМАТИТА" />
<Word from="ПОЛУЧЭЄТ" to="ПОЛУЧАЕТ" />
<Word from="НЄДОСТЗЧН0" to="НЕДОСТАТОЧНО" />
<Word from="ПИТЭНИЯ" to="ПИТАНИЯ" />
<Word from="ПОКЗ" to="ПОКА" />
<Word from="БЬІХОДИЛИ" to="ВЫХОДИЛИ" />
<Word from="ЗЄМІІЄ" to="ЗЕМЛЕ" />
<Word from="ВЄСЬІИЗ" to="ВЕСЬМА" />
<Word from="ЗЄМЛИ" to="ЗЕМЛИ" />
<Word from="бЬІЛО" to="БЫЛО" />
<Word from="КИЗНИ" to="ЖИЗНИ" />
<Word from="СТЗНОВИЛЗСЬ" to="СТАНОВИЛАСЬ" />
<Word from="СОЛЄНЄЄ" to="СОЛЁНЕЕ" />
<Word from="МЭГНИТНЫМ" to="МАГНИТНЫМ" />
<Word from="ЧТОбЬІ" to="ЧТОБЫ" />
<Word from="СОЗДЕТЬ" to="СОЗДАТЬ" />
<Word from="МЗГНИТНОЄ" to="МАГНИТНОЕ" />
<Word from="КЭЖУТСЯ" to="КАЖУТСЯ" />
<Word from="ОЗНЗЧЗЄТ" to="ОЗНАЧАЕТ" />
<Word from="МОГЛЗ" to="МОГЛА" />
<Word from="ИМЄТЬ" to="ИМЕТЬ" />
<Word from="КОСМОСЭ" to="КОСМОСА" />
<Word from="СОЛНЄЧНЗЯ" to="СОЛНЕЧНАЯ" />
<Word from="СИСТЄМЗ" to="СИСТЕМА" />
<Word from="ПОСІІУЖИЛО" to="ПОСЛУЖИЛО" />
<Word from="МЗГНИТНОГО" to="МАГНИТНОГО" />
<Word from="ПЛВНЄТЫ" to="ПЛАНЕТЫ" />
<Word from="ЛОКЗЛЬНЬІХ" to="ЛОКАЛЬНЫХ" />
<Word from="ПОЛЄЙ" to="ПОЛЕЙ" />
<Word from="КЗЖУТСЯ" to="КАЖУТСЯ" />
<Word from="КЗКОГО" to="КАКОГО" />
<Word from="СТРЗШНОГО" to="СТРАШНОГО" />
<Word from="СТОЛКНОЕЄНИЯ" to="СТОЛКНОВЕНИЯ" />
<Word from="МЕСТЗМИ" to="МЕСТАМИ" />
<Word from="СДЄЛЗТЬ" to="СДЕЛАТЬ" />
<Word from="СТЗЛО" to="СТАЛО" />
<Word from="МЭГНИТНОГО" to="МАГНИТНОГО" />
<Word from="ЗЗКЛЮЧЗВШЄЙСЯ" to="ЗАКЛЮЧАВШЕЙСЯ" />
<Word from="ЄГО" to="ЕГО" />
<Word from="ЯДРЄ" to="ЯДРЕ" />
<Word from="НЗ" to="НА" />
<Word from="ИСЧЄЗЛ3" to="ИСЧЕЗЛА" />
<Word from="СЧИТЗЮ" to="СЧИТАЮ" />
<Word from="ШЭНСЫ" to="ШАНСЫ" />
<Word from="ИНЗЧЄ" to="ИНАЧЕ" />
<Word from="СТЗЛ" to="СТАЛ" />
<Word from="ТРЗТИТЬ" to="ТРАТИТЬ" />
<Word from="НЗПРЗВЛЯЄТСЯ" to="НАПРАВЛЯЕТСЯ" />
<Word from="ОБЛЭСТИ" to="ОБЛАСТИ" />
<Word from="ЯВЛЯІОТСЯ" to="ЯВЛЯЮТСЯ" />
<Word from="ГЛЭВНОЙ" to="ГЛАВНОЙ" />
<Word from="ДОКЗЗЗТЄЛЬСТВ" to="ДОКАЗАТЕЛЬСТВ" />
<Word from="КИСЛОТЭМИ" to="КИСЛОТАМИ" />
<Word from="ОНЭ" to="ОНА" />
<Word from="ПРЗКТИЧЄСКИ" to="ПРАКТИЧЕСКИ" />
<Word from="ЛЄСУ" to="ЛЕСУ" />
<Word from="УСЛОБИЯМ" to="УСЛОВИЯМ" />
<Word from="СПЗСТИСЬ" to="СПАСТИСЬ" />
<Word from="РЗЗВИВЗЮЩИЄСЯ" to="РАЗВИВАЮЩИЕСЯ" />
<Word from="ШЭПКИ" to="ШАПКИ" />
<Word from="ЗНЗЄМ" to="ЗНАЕМ" />
<Word from="СООИРЭЄМСЯ" to="СОБИРАЕМСЯ" />
<Word from="БЫЯСНИТЬ" to="ВЫЯСНИТЬ" />
<Word from="СЗМ" to="САМ" />
<Word from="РЗСПОЗНЗТЬ" to="РАСПОЗНАТЬ" />
<Word from="УЗНЗТЬ" to="УЗНАТЬ" />
<Word from="КЭЖЄТСЯ" to="КАЖЕТСЯ" />
<Word from="ОРЄИТЗЛЬНЬІЄ" to="ОРБИТАЛЬНЫЕ" />
<Word from="ЛЄТЭТЄЛЬНЬІЄ" to="ЛЕТАТЕЛЬНЫЕ" />
<Word from="ЗППЗРЕТЬІ" to="АППАРАТЫ" />
<Word from="ЖЄ" to="ЖЕ" />
<Word from="ТЗКЗЯ" to="ТАКАЯ" />
<Word from="МЗЛЄНЬКЗЯ" to="МАЛЕНЬКАЯ" />
<Word from="ПЛЭНЄТЗ" to="ПЛАНЕТА" />
<Word from="СПЗІІЬКО" to="СТОЛЬКО" />
<Word from="бЬІЛ3" to="БЫЛА" />
<Word from="ЁЕСЧИСЛЄННОЄ" to="БЕСЧИСЛЕННОЕ" />
<Word from="МЗГНИїНЬІХ" to="МАГНИТНЫХ" />
<Word from="ПОСТраД3Л" to="ПОСТРАДАЛ" />
<Word from="ДЗЖЄ" to="ДАЖЕ" />
<Word from="РЗЗНЬІМИ" to="РАЗНЫМИ" />
<Word from="СУЩЄСТБОВЭНИЄ" to="СУЩЕСТВОВАНИЕ" />
<Word from="ПЛаНЄїЬІ" to="ПЛАНЕТЫ" />
<Word from="ПОДВЄРГЛЗСЬ" to="ПОДВЕРГЛАСЬ" />
<Word from="ОПЗСІ-ІОСТИ" to="ОПАСНОСТИ" />
<Word from="ПЛЗНЄТЄ" to="ПЛАНЕТЕ" />
<Word from="Н0" to="НО" />
<Word from="бЬІ" to="БЫ" />
<Word from="ОТДЗЛЄННЫЄ" to="ОТДАЛЁННЫЕ" />
<Word from="ПОЛЯРНЬІЄ" to="ПОЛЯРНЫЕ" />
<Word from="ЦЄЛЬІ-О" to="ЦЕЛЬЮ" />
<Word from="ПЄЩЄРЗХ" to="ПЕЩЕРАХ" />
<Word from="НЗПОЛНЄННЬІХ" to="НАПОЛНЕННЫХ" />
<Word from="ИСПЗРЄНИЯМИ" to="ИСПАРЕНИЯМИ" />
<Word from="МИНИЗТЮРНЬІЄ" to="МИНИАТЮРНЫЕ" />
<Word from="ТЭКЗЯ" to="ТАКАЯ" />
<Word from="ПрИСП0СОбИТЬСЯ" to="ПРИСПОСОБИТЬСЯ" />
<Word from="НЄОЄХОДИМЬІЄ" to="НЕОБХОДИМЫЕ" />
<Word from="ОРГВНИЧЄСКИЄ" to="ОРГАНИЧЕСКИЕ" />
<Word from="МЗРСИЗНСКИЄ" to="МАРСИАНСКИЕ" />
<Word from="МЄСТЄ" to="МЕСТЕ" />
<Word from="І\/ІАККЕЙШ" to="МАККЕЙН" />
<Word from="НЗХОДЯЩИЄСЯ" to="НАХОДЯЩИЕСЯ" />
<Word from="НЄЗКТИВНОМ" to="НЕАКТИВНОМ" />
<Word from="ЗЭСНЯТЬ" to="ЗАСНЯТЬ" />
<Word from="ОРГЗНИЗМЬІ" to="ОРГАНИЗМЫ" />
<Word from="ВЗЕИМОДЄЙСТВОВЕТЬ" to="ВЗАИМОДЕЙСТВОВАТЬ" />
<Word from="ПУТЄШЄСТБИЄ" to="ПУТЕШЕСТВИЕ" />
<Word from="ПуСїЬІННЫХ" to="ПУСТЫННЫХ" />
<Word from="ТЗКИХ" to="ТАКИХ" />
<Word from="ПЄРЄТЗСКИВЗЄМ" to="ПЕРЕТАСКИВАЕМ" />
<Word from="ЧТ0" to="ЧТО" />
<Word from="ВЄСЬМЗ" to="ВЕСЬМА" />
<Word from="ПОЛОСЗМИ" to="ПОЛОСАМИ" />
<Word from="ОрїЭНИЗМЬІ" to="ОРГАНИЗМЫ" />
<Word from="ОЁЛЗСТИ" to="ОБЛАСТИ" />
<Word from="ЯБЛЯЮТСЯ" to="ЯВЛЯЮТСЯ" />
<Word from="ЦЄЛЬЮ" to="ЦЕЛЬЮ" />
<Word from="ПОИСКОБ" to="ПОИСКОВ" />
<Word from="ДОКЗЗЗТЄІІЬСТВ" to="ДОКАЗАТЕЛЬСТВ" />
<Word from="МОЖЄТ" to="МОЖЕТ" />
<Word from="НЭХОДИТЬСЯ" to="НАХОДИТЬСЯ" />
<Word from="ОЧЄНЬ" to="ОЧЕНЬ" />
<Word from="СРЗВНИТЬ" to="СРАВНИТЬ" />
<Word from="ОЄНЗРУЖИЛ" to="ОБНАРУЖИЛ" />
<Word from="ЛЬДЗ" to="ЛЬДА" />
<Word from="ПОТЄПЛЄНИЄІИ" to="ПОТЕПЛЕНИЕМ" />
<Word from="ПОХОЛОДЗНИЄБД" to="ПОХОЛОДАНИЕМ" />
<Word from="КЭК" to="КАК" />
<Word from="ТЄЛО" to="ТЕЛО" />
<Word from="бОЛЬШЄ" to="БОЛЬШЕ" />
<Word from="НЭКЛОНЯЄТСЯ" to="НАКЛОНЯЕТСЯ" />
<Word from="СОІІНЦУ" to="СОЛНЦУ" />
<Word from="СТ3бИЛИЗИрОБЗТЬ" to="СТАБИЛИЗИРОВАТЬ" />
<Word from="СТЭБИЛЬНЭ" to="СТАБИЛЬНА" />
<Word from="МИЛІІИОНОВ" to="МИЛЛИОНОВ" />
<Word from="НЗЗЭД" to="НАЗАД" />
<Word from="ТЄПЛ0" to="ТЕПЛО" />
<Word from="ПОІІЯРНЫХ" to="ПОЛЯРНЫХ" />
<Word from="СОІІЕНЫМИ" to="СОЛЕНЫМИ" />
<Word from="КЕКИМИ" to="КАКИМИ" />
<Word from="кислютнюсггь" to="кислотность" />
<Word from="ТЗМ" to="ТАМ" />
<Word from="ОРГЗНИЗМЫ" to="ОРГАНИЗМЫ" />
<Word from="СУЩЄСТВОВЄТЬ" to="СУЩЕСТВОВАТЬ" />
<Word from="ВНИМЗНИЄ" to="ВНИМАНИЕ" />
<Word from="СДЄЛЗЄТ" to="СДЕЛАЕТ" />
<Word from="ПОЗНЭКОМИТЬСЯ" to="ПОЗНАКОМИТЬСЯ" />
<Word from="НЭШИМ" to="НАШИМ" />
<Word from="ДОКЗЗЭТЄЛЬСТБО" to="ДОКАЗАТЕЛЬСТВО" />
<Word from="ЩЗЗЩЄНИЯ" to="ВРАЩЕНИЯ" />
<Word from="бЬІЛ0" to="БЫЛО" />
<Word from="ОЄЛЕСТЯХ" to="ОБЛАСТЯХ" />
<Word from="бЬІЛИ" to="БЫЛИ" />
<Word from="РЭЗМЬІШЛЯІІИ" to="РАЗМЫШЛЯЛИ" />
<Word from="КОЛИЧЄСТБЄ" to="КОЛИЧЕСТВЕ" />
<Word from="ЩЄІІОЧНЫЄ" to="ЩЕЛОЧНЫЕ" />
<Word from="НЄКОТЩЗЬІЄ" to="НЕКОТОРЫЕ" />
<Word from="ПрИБІ1ЕКуї" to="ПРИВЛЕКУТ" />
<Word from="НЗЗЬІВЭЄМЫЄ" to="НАЗЫВАЕМЫЕ" />
<Word from="Чї06Ы" to="ЧТОБЫ" />
</WholeWords>
<PartialWordsAlways />
<PartialWords>
<WordPart from="Є" to="Е" />
<WordPart from="ЬІ" to="Ы" />
<WordPart from="КЗ" to="КА" />
<WordPart from="ЛЗ" to="ЛА" />
<WordPart from="НЗ" to="НА" />
<WordPart from="ШЗ" to="ША" />
<WordPart from="І\/І" to="М" />
</PartialWords>
<PartialLines />
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines />
<RegularExpressions />
</OCRFixReplaceList>
@@ -0,0 +1,946 @@
<OCRFixReplaceList>
<WholeWords>
<!-- Simple abbreviations -->
<Word from="KBs" to="kB" />
<Word from="Vd" to="Ud" />
<Word from="N°" to="N.°" />
<Word from="n°" to="n.°" />
<Word from="nro." to="n.°" />
<Word from="Nro." to="N.°" />
<!-- Basic spelling -->
<Word from="aca" to="acá" />
<Word from="actuas" to="actúas" />
<Word from="actues" to="actúes" />
<Word from="adios" to="adiós" />
<Word from="agarrenla" to="agárrenla" />
<Word from="agarrenlo" to="agárrenlo" />
<Word from="agarrandose" to="agarrándose" />
<Word from="algun" to="algún" />
<Word from="alli" to="allí" />
<Word from="alla" to="allá" />
<Word from="alejate" to="aléjate" />
<Word from="ahi" to="ahí" />
<Word from="angel" to="ángel" />
<Word from="angeles" to="ángeles" />
<Word from="apagala" to="apágala" />
<Word from="aqui" to="aquí" />
<Word from="asi" to="así" />
<Word from="bahia" to="bahía" />
<Word from="busqueda" to="búsqueda" />
<Word from="busquedas" to="búsquedas" />
<Word from="callate" to="cállate" />
<Word from="carcel" to="cárcel" />
<Word from="camara" to="cámara" />
<Word from="caido" to="caído" />
<Word from="cabron" to="cabrón" />
<Word from="camion" to="camión" />
<Word from="codigo" to="código" />
<Word from="codigos" to="códigos" />
<Word from="comence" to="comencé" />
<Word from="comprate" to="cómprate" />
<Word from="consegui" to="conseguí" />
<Word from="confias" to="confías" />
<Word from="convertira" to="convertirá" />
<Word from="corazon" to="corazón" />
<Word from="crei" to="creí" />
<Word from="creia" to="creía" />
<Word from="creido" to="creído" />
<Word from="creiste" to="creíste" />
<Word from="cubrenos" to="cúbrenos" />
<Word from="comio" to="comió" />
<Word from="dara" to="dará" />
<Word from="dia" to="día" />
<Word from="dias" to="días" />
<Word from="debio" to="debió" />
<Word from="demelo" to="démelo" />
<Word from="dimelo" to="dímelo" />
<Word from="denoslo" to="dénoslo" />
<Word from="deselo" to="déselo" />
<Word from="decia" to="decía" />
<Word from="decian" to="decían" />
<Word from="detras" to="detrás" />
<Word from="deberia" to="debería" />
<Word from="deberas" to="deberás" />
<Word from="deberias" to="deberías" />
<Word from="deberian" to="deberían" />
<Word from="deberiamos" to="deberíamos" />
<Word from="dejame" to="déjame" />
<Word from="dejate" to="déjate" />
<Word from="dejalo" to="déjalo" />
<Word from="dejarian" to="dejarían" />
<Word from="damela" to="dámela" />
<Word from="despues" to="después" />
<Word from="diciendome" to="diciéndome" />
<Word from="dificil" to="difícil" />
<Word from="dificiles" to="difíciles" />
<Word from="disculpate" to="discúlpate" />
<Word from="dolares" to="dólares" />
<Word from="hechar" to="echar" />
<Word from="examenes" to="exámenes" />
<Word from="empezo" to="empezó" />
<Word from="empujon" to="empujón" />
<Word from="empujalo" to="empújalo" />
<Word from="escondanme" to="escóndanme" />
<Word from="esperame" to="espérame" />
<Word from="estara" to="estará" />
<Word from="estare" to="estaré" />
<Word from="estaria" to="estaría" />
<Word from="estan" to="están" />
<Word from="estaran" to="estarán" />
<Word from="estabamos" to="estábamos" />
<Word from="estuvieramos" to="estuviéramos" />
<Word from="exito" to="éxito" />
<Word from="facil" to="fácil" />
<Word from="fiscalia" to="fiscalía" />
<Word from="fragil" to="frágil" />
<Word from="fragiles" to="frágiles" />
<Word from="frances" to="francés" />
<Word from="gustaria" to="gustaría" />
<Word from="habia" to="había" />
<Word from="habias" to="habías" />
<Word from="habian" to="habían" />
<Word from="habrian" to="habrían" />
<Word from="habrias" to="habrías" />
<Word from="hagalo" to="hágalo" />
<Word from="haria" to="haría" />
<Word from="increible" to="increíble" />
<Word from="incredulo" to="incrédulo" />
<Word from="intentalo" to="inténtalo" />
<Word from="ire" to="iré" />
<Word from="jovenes" to="jóvenes" />
<Word from="ladron" to="ladrón" />
<Word from="linea" to="línea" />
<Word from="llamame" to="llámame" />
<Word from="llevalo" to="llévalo" />
<Word from="mama" to="mamá" />
<Word from="maricon" to="maricón" />
<Word from="mayoria" to="mayoría" />
<Word from="metodo" to="método" />
<Word from="metodos" to="métodos" />
<Word from="mio" to="mío" />
<Word from="mostro" to="mostró" />
<Word from="morira" to="morirá" />
<Word from="muevete" to="muévete" />
<Word from="murio" to="murió" />
<Word from="numero" to="número" />
<Word from="numeros" to="números" />
<Word from="ningun" to="ningún" />
<Word from="oido" to="oído" />
<Word from="oidos" to="oídos" />
<Word from="oimos" to="oímos" />
<Word from="oiste" to="oíste" />
<Word from="pasale" to="pásale" />
<Word from="pasame" to="pásame" />
<Word from="paraiso" to="paraíso" />
<Word from="parate" to="párate" />
<Word from="pense" to="pensé" />
<Word from="peluqueria" to="peluquería" />
<Word from="platano" to="plátano" />
<Word from="plastico" to="plástico" />
<Word from="plasticos" to="plásticos" />
<Word from="policia" to="policía" />
<Word from="policias" to="policías" />
<Word from="poster" to="póster" />
<Word from="podia" to="podía" />
<Word from="podias" to="podías" />
<Word from="podria" to="podría" />
<Word from="podrian" to="podrían" />
<Word from="podrias" to="podrías" />
<Word from="podriamos" to="podríamos" />
<Word from="prometio" to="prometió" />
<Word from="proposito" to="propósito" />
<Word from="pideselo" to="pídeselo" />
<Word from="ponganse" to="pónganse" />
<Word from="prometeme" to="prométeme" />
<Word from="publico" to="público" />
<Word from="publicos" to="públicos" />
<Word from="publicamente" to="públicamente" />
<Word from="quedate" to="quédate" />
<Word from="queria" to="quería" />
<Word from="querrias" to="querrías" />
<Word from="querian" to="querían" />
<Word from="rapido" to="rápido" />
<Word from="rapidamente" to="rápidamente" />
<Word from="razon" to="razón" />
<Word from="rehusen" to="rehúsen" />
<Word from="rie" to="ríe" />
<Word from="rias" to="rías" />
<Word from="rindete" to="ríndete" />
<Word from="sacame" to="sácame" />
<Word from="sentian" to="sentían" />
<Word from="sientate" to="siéntate" />
<Word from="sera" to="será" />
<Word from="soplon" to="soplón" />
<Word from="sueltalo" to="suéltalo" />
<Word from="tambien" to="también" />
<Word from="teoria" to="teoría" />
<Word from="tendra" to="tendrá" />
<Word from="telefono" to="teléfono" />
<Word from="tipica" to="típica" />
<Word from="todavia" to="todavía" />
<Word from="tomalo" to="tómalo" />
<Word from="tonterias" to="tonterías" />
<Word from="torci" to="torcí" />
<Word from="traelos" to="tráelos" />
<Word from="traiganlo" to="tráiganlo" />
<Word from="traiganlos" to="tráiganlos" />
<Word from="trio" to="trío" />
<Word from="tuvieramos" to="tuviéramos" />
<Word from="union" to="unión" />
<Word from="ultimo" to="último" />
<Word from="ultima" to="última" />
<Word from="ultimos" to="últimos" />
<Word from="ultimas" to="últimas" />
<Word from="unica" to="única" />
<Word from="unico" to="único" />
<Word from="vamonos" to="vámonos" />
<Word from="vayanse" to="váyanse" />
<Word from="victima" to="víctima" />
<Word from="vivira" to="vivirá" />
<Word from="volvio" to="volvió" />
<Word from="volvia" to="volvía" />
<Word from="volvian" to="volvían" />
<!-- Most common words ending in eír/oír -->
<Word from="reir" to="reír" />
<Word from="freir" to="freír" />
<Word from="sonreir" to="sonreír" />
<Word from="hazmerreir" to="hazmerreír" />
<Word from="oir" to="oír" />
<Word from="oirlo" to="oírlo" />
<Word from="oirte" to="oírte" />
<Word from="oirse" to="oírse" />
<Word from="oirme" to="oírme" />
<Word from="oirle" to="oírle" />
<Word from="oirla" to="oírla" />
<Word from="oirles" to="oírles" />
<Word from="oirnos" to="oírnos" />
<Word from="oirlas" to="oírlas" />
<!-- Words that take no accent mark -->
<Word from="bién" to="bien" />
<Word from="crímen" to="crimen" />
<Word from="fué" to="fue" />
<Word from="fuí" to="fui" />
<Word from="quiéres" to="quieres" />
<Word from="tí" to="ti" />
<Word from="dí" to="di" />
<Word from="vá" to="va" />
<Word from="vé" to="ve" />
<Word from="ví" to="vi" />
<Word from="vió" to="vio" />
<Word from="ó" to="o" />
<Word from="clón" to="clon" />
<Word from="dió" to="dio" />
<Word from="guión" to="guion" />
<Word from="dón" to="don" />
<Word from="fé" to="fe" />
<Word from="áquel" to="aquel" />
<!-- Words where the diacritical accent can be omitted -->
<Word from="éste" to="este" />
<Word from="ésta" to="esta" />
<Word from="éstos" to="estos" />
<Word from="éstas" to="estas" />
<Word from="ése" to="ese" />
<Word from="ésa" to="esa" />
<Word from="ésos" to="esos" />
<Word from="ésas" to="esas" />
<Word from="sólo" to="solo" />
<!-- Errors unrelated to accent marks -->
<Word from="coktel" to="cóctel" />
<Word from="cocktel" to="cóctel" />
<Word from="conciente" to="consciente" />
<Word from="comenzé" to="comencé" />
<Word from="desilucionarte" to="desilusionarte" />
<Word from="dijieron" to="dijeron" />
<Word from="empezé" to="empecé" />
<Word from="hize" to="hice" />
<Word from="ilucionarte" to="ilusionarte" />
<Word from="inconciente" to="inconsciente" />
<Word from="quize" to="quise" />
<Word from="quizo" to="quiso" />
<Word from="verguenza" to="vergüenza" />
<!-- Errors in proper names or country names -->
<Word from="Nuñez" to="Núñez" />
<Word from="Ivan" to="Iván" />
<Word from="Japon" to="Japón" />
<Word from="Monica" to="Mónica" />
<Word from="Maria" to="María" />
<Word from="Jose" to="José" />
<Word from="Ramon" to="Ramón" />
<Word from="Garcia" to="García" />
<Word from="Gonzalez" to="González" />
<Word from="Jesus" to="Jesús" />
<Word from="Alvarez" to="Álvarez" />
<Word from="Damian" to="Damián" />
<Word from="Rene" to="René" />
<Word from="Nicolas" to="Nicolás" />
<Word from="Jonas" to="Jonás" />
<Word from="Lopez" to="López" />
<Word from="Hernandez" to="Hernández" />
<Word from="Bermudez" to="Bermúdez" />
<Word from="Fernandez" to="Fernández" />
<Word from="Suarez" to="Suárez" />
<Word from="Sofia" to="Sofía" />
<Word from="Seneca" to="Séneca" />
<Word from="Tokyo" to="Tokio" />
<Word from="Canada" to="Canadá" />
<Word from="Paris" to="París" />
<Word from="Turquia" to="Turquía" />
<Word from="Mexico" to="México" />
<Word from="Mejico" to="México" />
<Word from="Matias" to="Matías" />
<Word from="Valentin" to="Valentín" />
<Word from="mejicano" to="mexicano" />
<Word from="mejicanos" to="mexicanos" />
<Word from="mejicana" to="mexicana" />
<Word from="mejicanas" to="mexicanas" />
<!-- Created by SE -->
<Word from="io" to="lo" />
<Word from="ia" to="la" />
<Word from="ie" to="le" />
<Word from="Io" to="lo" />
<Word from="Ia" to="la" />
<Word from="AI" to="Al" />
<Word from="Ie" to="le" />
<Word from="EI" to="El" />
<Word from="subaﬂuente" to="subafluente" />
<Word from="aﬂójalo" to="aflójalo" />
<Word from="Aﬂójalo" to="Aflójalo" />
<Word from="perdi" to="perdí" />
<Word from="Podria" to="Podría" />
<Word from="confia" to="confía" />
<Word from="pasaria" to="pasaría" />
<Word from="Podias" to="Podías" />
<Word from="responsabke" to="responsable" />
<Word from="Todavia" to="Todavía" />
<Word from="envien" to="envíen" />
<Word from="Queria" to="Quería" />
<Word from="tio" to="tío" />
<Word from="traido" to="traído" />
<Word from="Asi" to="Así" />
<Word from="elegi" to="elegí" />
<Word from="habria" to="habría" />
<Word from="encantaria" to="encantaría" />
<Word from="leido" to="leído" />
<Word from="conocias" to="conocías" />
<Word from="harias" to="harías" />
<Word from="Aqui" to="Aquí" />
<Word from="decidi" to="decidí" />
<Word from="mia" to="mía" />
<Word from="Crei" to="Creí" />
<Word from="podiamos" to="podíamos" />
<Word from="avisame" to="avísame" />
<Word from="debia" to="debía" />
<Word from="pensarias" to="pensarías" />
<Word from="reuniamos" to="reuníamos" />
<Word from="POÏ" to="por" />
<Word from="vendria" to="vendría" />
<Word from="caida" to="caída" />
<Word from="venian" to="venían" />
<Word from="compañias" to="compañías" />
<Word from="leiste" to="leíste" />
<Word from="Leiste" to="Leíste" />
<Word from="fiaria" to="fiaría" />
<Word from="Hungria" to="Hungría" />
<Word from="fotografia" to="fotografía" />
<Word from="cafeteria" to="cafetería" />
<Word from="Digame" to="Dígame" />
<Word from="debias" to="debías" />
<Word from="tendria" to="tendría" />
<Word from="CÏGO" to="creo" />
<Word from="anteg" to="antes" />
<Word from="SóIo" to="Solo" />
<Word from="Ilamándola" to="llamándola" />
<Word from="Cáflaté" to="Cállate" />
<Word from="Ilamaste" to="llamaste" />
<Word from="daria" to="daría" />
<Word from="Iargaba" to="largaba" />
<Word from="Yati" to="Y a ti" />
<Word from="querias" to="querías" />
<Word from="Iimpiarlo" to="limpiarlo" />
<Word from="Iargado" to="largado" />
<Word from="galeria" to="galería" />
<Word from="Bartomeu" to="Bertomeu" />
<Word from="Iocalizarlo" to="localizarlo" />
<Word from="Ilámame" to="llámame" />
</WholeWords>
<PartialWordsAlways />
<PartialWords />
<PartialLines>
<!-- Miscellaneous -->
<LinePart from="de gratis" to="gratis" />
<LinePart from="si quiera" to="siquiera" />
<LinePart from="Cada una de los" to="Cada uno de los" />
<LinePart from="Cada uno de las" to="Cada una de las" />
<!-- Incorrect use of haber / a ver -->
<LinePart from="haber que" to="a ver qué" />
<LinePart from="haber qué" to="a ver qué" />
<LinePart from="Haber si" to="A ver si" />
<!-- Exclamatory or interrogative pronouns, part 1 -->
<LinePart from=" que hora" to=" qué hora" />
<LinePart from="yo que se" to="yo qué sé" />
<LinePart from="Yo que se" to="Yo qué sé" />
<!-- Accents on words before a closing exclamation mark -->
<LinePart from=" tu!" to=" tú!" />
<LinePart from=" si!" to=" sí!" />
<LinePart from=" mi!" to=" mí!" />
<LinePart from=" el!" to=" él!" />
<!-- Accents on words before a closing question mark -->
<LinePart from=" tu?" to=" tú?" />
<LinePart from=" si?" to=" sí?" />
<LinePart from=" mi?" to=" mí?" />
<LinePart from=" el?" to=" él?" />
<LinePart from=" aun?" to=" aún?" />
<LinePart from=" mas?" to=" más?" />
<LinePart from=" que?" to=" qué?" />
<LinePart from=" paso?" to=" pasó?" />
<LinePart from=" cuando?" to=" cuándo?" />
<LinePart from=" cuanto?" to=" cuánto?" />
<LinePart from=" cuanta?" to=" cuánta?" />
<LinePart from=" cuantas?" to=" cuántas?" />
<LinePart from=" cuantos?" to=" cuántos?" />
<LinePart from=" donde?" to=" dónde?" />
<LinePart from=" quien?" to=" quién?" />
<LinePart from=" como?" to=" cómo?" />
<LinePart from=" adonde?" to=" adónde?" />
<LinePart from=" cual?" to=" cuál?" />
<!-- Accents in complete questions -->
<LinePart from="¿Si?" to="¿Sí?" />
<LinePart from="¿esta bien?" to="¿está bien?" />
<!-- Sentences that are both interrogative and exclamatory -->
<LinePart from="¿Pero qué haces?" to="¡¿Pero qué haces?!" />
<LinePart from="¿pero qué haces?" to="¡¿pero qué haces?!" />
<LinePart from="¿Es que no me has escuchado?" to="¡¿Es que no me has escuchado?!" />
<LinePart from="¿es que no me has escuchado?" to="¡¿es que no me has escuchado?!" />
<!-- Accents after an opening question mark, lowercase -->
<LinePart from="¿aun" to="¿aún" />
<LinePart from="¿tu " to="¿tú " />
<LinePart from="¿que " to="¿qué " />
<LinePart from="¿sabes que" to="¿sabes qué" />
<LinePart from="¿sabes adonde" to="¿sabes adónde" />
<LinePart from="¿sabes cual" to="¿sabes cuál" />
<LinePart from="¿sabes quien" to="¿sabes quién" />
<LinePart from="¿sabes como" to="¿sabes cómo" />
<LinePart from="¿sabes cuan" to="¿sabes cuán" />
<LinePart from="¿sabes cuanto" to="¿sabes cuánto" />
<LinePart from="¿sabes cuanta" to="¿sabes cuánta" />
<LinePart from="¿sabes cuantos" to="¿sabes cuántos" />
<LinePart from="¿sabes cuantas" to="¿sabes cuántas" />
<LinePart from="¿sabes cuando" to="¿sabes cuándo" />
<LinePart from="¿sabes donde" to="¿sabes dónde" />
<LinePart from="¿sabe que" to="¿sabe qué" />
<LinePart from="¿sabe adonde" to="¿sabe adónde" />
<LinePart from="¿sabe cual" to="¿sabe cuál" />
<LinePart from="¿sabe quien" to="¿sabe quién" />
<LinePart from="¿sabe como" to="¿sabe cómo" />
<LinePart from="¿sabe cuan" to="¿sabe cuán" />
<LinePart from="¿sabe cuanto" to="¿sabe cuánto" />
<LinePart from="¿sabe cuanta" to="¿sabe cuánta" />
<LinePart from="¿sabe cuantos" to="¿sabe cuántos" />
<LinePart from="¿sabe cuantas" to="¿sabe cuántas" />
<LinePart from="¿sabe cuando" to="¿sabe cuándo" />
<LinePart from="¿sabe donde" to="¿sabe dónde" />
<LinePart from="¿saben que" to="¿saben qué" />
<LinePart from="¿saben adonde" to="¿saben adónde" />
<LinePart from="¿saben cual" to="¿saben cuál" />
<LinePart from="¿saben quien" to="¿saben quién" />
<LinePart from="¿saben como" to="¿saben cómo" />
<LinePart from="¿saben cuan" to="¿saben cuán" />
<LinePart from="¿saben cuanto" to="¿saben cuánto" />
<LinePart from="¿saben cuanta" to="¿saben cuánta" />
<LinePart from="¿saben cuantos" to="¿saben cuántos" />
<LinePart from="¿saben cuantas" to="¿saben cuántas" />
<LinePart from="¿saben cuando" to="¿saben cuándo" />
<LinePart from="¿saben donde" to="¿saben dónde" />
<LinePart from="¿de que" to="¿de qué" />
<LinePart from="¿de donde" to="¿de dónde" />
<LinePart from="¿de cual" to="¿de cuál" />
<LinePart from="¿de quien" to="¿de quién" />
<LinePart from="¿de cuanto" to="¿de cuánto" />
<LinePart from="¿de cuanta" to="¿de cuánta" />
<LinePart from="¿de cuantos" to="¿de cuántos" />
<LinePart from="¿de cuantas" to="¿de cuántas" />
<LinePart from="¿de cuando" to="¿de cuándo" />
<LinePart from="¿sobre que" to="¿sobre qué" />
<LinePart from="¿como " to="¿cómo " />
<LinePart from="¿cual " to="¿cuál " />
<LinePart from="¿en cual" to="¿en cuál" />
<LinePart from="¿cuando" to="¿cuándo" />
<LinePart from="¿hasta cual" to="¿hasta cuál" />
<LinePart from="¿hasta quien" to="¿hasta quién" />
<LinePart from="¿hasta cuanto" to="¿hasta cuánto" />
<LinePart from="¿hasta cuantas" to="¿hasta cuántas" />
<LinePart from="¿hasta cuantos" to="¿hasta cuántos" />
<LinePart from="¿hasta cuando" to="¿hasta cuándo" />
<LinePart from="¿hasta donde" to="¿hasta dónde" />
<LinePart from="¿hasta que" to="¿hasta qué" />
<LinePart from="¿hasta adonde" to="¿hasta adónde" />
<LinePart from="¿desde que" to="¿desde qué" />
<LinePart from="¿desde cuando" to="¿desde cuándo" />
<LinePart from="¿desde quien" to="¿desde quién" />
<LinePart from="¿desde donde" to="¿desde dónde" />
<LinePart from="¿cuanto" to="¿cuánto" />
<LinePart from="¿cuantos" to="¿cuántos" />
<LinePart from="¿donde" to="¿dónde" />
<LinePart from="¿adonde" to="¿adónde" />
<LinePart from="¿con que" to="¿con qué" />
<LinePart from="¿con cual" to="¿con cuál" />
<LinePart from="¿con quien" to="¿con quién" />
<LinePart from="¿con cuantos" to="¿con cuántos" />
<LinePart from="¿con cuantas" to="¿con cuántas" />
<LinePart from="¿con cuanta" to="¿con cuánta" />
<LinePart from="¿con cuanto" to="¿con cuánto" />
<LinePart from="¿para donde" to="¿para dónde" />
<LinePart from="¿para adonde" to="¿para adónde" />
<LinePart from="¿para cuando" to="¿para cuándo" />
<LinePart from="¿para que" to="¿para qué" />
<LinePart from="¿para quien" to="¿para quién" />
<LinePart from="¿para cuanto" to="¿para cuánto" />
<LinePart from="¿para cuanta" to="¿para cuánta" />
<LinePart from="¿para cuantos" to="¿para cuántos" />
<LinePart from="¿para cuantas" to="¿para cuántas" />
<LinePart from="¿a donde" to="¿a dónde" />
<LinePart from="¿a que" to="¿a qué" />
<LinePart from="¿a cual" to="¿a cuál" />
<LinePart from="¿a quien" to="¿a quién" />
<LinePart from="¿a como" to="¿a cómo" />
<LinePart from="¿a cuanto" to="¿a cuánto" />
<LinePart from="¿a cuanta" to="¿a cuánta" />
<LinePart from="¿a cuantos" to="¿a cuántos" />
<LinePart from="¿a cuantas" to="¿a cuántas" />
<LinePart from="¿por que" to="¿por qué" />
<LinePart from="¿por cual" to="¿por cuál" />
<LinePart from="¿por quien" to="¿por quién" />
<LinePart from="¿por cuanto" to="¿por cuánto" />
<LinePart from="¿por cuanta" to="¿por cuánta" />
<LinePart from="¿por cuantos" to="¿por cuántos" />
<LinePart from="¿por cuantas" to="¿por cuántas" />
<LinePart from="¿por donde" to="¿por dónde" />
<LinePart from="¿porque" to="¿por qué" />
<LinePart from="¿porqué" to="¿por qué" />
<LinePart from="¿y que" to="¿y qué" />
<LinePart from="¿y como" to="¿y cómo" />
<LinePart from="¿y cuando" to="¿y cuándo" />
<LinePart from="¿y cual" to="¿y cuál" />
<LinePart from="¿y quien" to="¿y quién" />
<LinePart from="¿y cuanto" to="¿y cuánto" />
<LinePart from="¿y cuanta" to="¿y cuánta" />
<LinePart from="¿y cuantos" to="¿y cuántos" />
<LinePart from="¿y cuantas" to="¿y cuántas" />
<LinePart from="¿y donde" to="¿y dónde" />
<LinePart from="¿y adonde" to="¿y adónde" />
<LinePart from="¿quien " to="¿quién " />
<LinePart from="¿esta " to="¿está " />
<LinePart from="¿estas " to="¿estás " />
<!-- Accents after an opening question mark, uppercase -->
<LinePart from="¿Aun" to="¿Aún" />
<LinePart from="¿Que " to="¿Qué " />
<LinePart from="¿Sabes que" to="¿Sabes qué" />
<LinePart from="¿Sabes adonde" to="¿Sabes adónde" />
<LinePart from="¿Sabes cual" to="¿Sabes cuál" />
<LinePart from="¿Sabes quien" to="¿Sabes quién" />
<LinePart from="¿Sabes como" to="¿Sabes cómo" />
<LinePart from="¿Sabes cuan" to="¿Sabes cuán" />
<LinePart from="¿Sabes cuanto" to="¿Sabes cuánto" />
<LinePart from="¿Sabes cuanta" to="¿Sabes cuánta" />
<LinePart from="¿Sabes cuantos" to="¿Sabes cuántos" />
<LinePart from="¿Sabes cuantas" to="¿Sabes cuántas" />
<LinePart from="¿Sabes cuando" to="¿Sabes cuándo" />
<LinePart from="¿Sabes donde" to="¿Sabes dónde" />
<LinePart from="¿Sabe que" to="¿Sabe qué" />
<LinePart from="¿Sabe adonde" to="¿Sabe adónde" />
<LinePart from="¿Sabe cual" to="¿Sabe cuál" />
<LinePart from="¿Sabe quien" to="¿Sabe quién" />
<LinePart from="¿Sabe como" to="¿Sabe cómo" />
<LinePart from="¿Sabe cuan" to="¿Sabe cuán" />
<LinePart from="¿Sabe cuanto" to="¿Sabe cuánto" />
<LinePart from="¿Sabe cuanta" to="¿Sabe cuánta" />
<LinePart from="¿Sabe cuantos" to="¿Sabe cuántos" />
<LinePart from="¿Sabe cuantas" to="¿Sabe cuántas" />
<LinePart from="¿Sabe cuando" to="¿Sabe cuándo" />
<LinePart from="¿Sabe donde" to="¿Sabe dónde" />
<LinePart from="¿Saben que" to="¿Saben qué" />
<LinePart from="¿Saben adonde" to="¿Saben adónde" />
<LinePart from="¿Saben cual" to="¿Saben cuál" />
<LinePart from="¿Saben quien" to="¿Saben quién" />
<LinePart from="¿Saben como" to="¿Saben cómo" />
<LinePart from="¿Saben cuan" to="¿Saben cuán" />
<LinePart from="¿Saben cuanto" to="¿Saben cuánto" />
<LinePart from="¿Saben cuanta" to="¿Saben cuánta" />
<LinePart from="¿Saben cuantos" to="¿Saben cuántos" />
<LinePart from="¿Saben cuantas" to="¿Saben cuántas" />
<LinePart from="¿Saben cuando" to="¿Saben cuándo" />
<LinePart from="¿Saben donde" to="¿Saben dónde" />
<LinePart from="¿De que" to="¿De qué" />
<LinePart from="¿De donde" to="¿De dónde" />
<LinePart from="¿De cual" to="¿De cuál" />
<LinePart from="¿De quien" to="¿De quién" />
<LinePart from="¿De cuanto" to="¿De cuánto" />
<LinePart from="¿De cuanta" to="¿De cuánta" />
<LinePart from="¿De cuantos" to="¿De cuántos" />
<LinePart from="¿De cuantas" to="¿De cuántas" />
<LinePart from="¿De cuando" to="¿De cuándo" />
<LinePart from="¿Desde que" to="¿Desde qué" />
<LinePart from="¿Desde cuando" to="¿Desde cuándo" />
<LinePart from="¿Desde quien" to="¿Desde quién" />
<LinePart from="¿Desde donde" to="¿Desde dónde" />
<LinePart from="¿Sobre que" to="¿Sobre qué" />
<LinePart from="¿Como " to="¿Cómo " />
<LinePart from="¿Cual " to="¿Cuál " />
<LinePart from="¿En cual" to="¿En cuál" />
<LinePart from="¿Cuando" to="¿Cuándo" />
<LinePart from="¿Hasta cual" to="¿Hasta cuál" />
<LinePart from="¿Hasta quien" to="¿Hasta quién" />
<LinePart from="¿Hasta cuanto" to="¿Hasta cuánto" />
<LinePart from="¿Hasta cuantas" to="¿Hasta cuántas" />
<LinePart from="¿Hasta cuantos" to="¿Hasta cuántos" />
<LinePart from="¿Hasta cuando" to="¿Hasta cuándo" />
<LinePart from="¿Hasta donde" to="¿Hasta dónde" />
<LinePart from="¿Hasta que" to="¿Hasta qué" />
<LinePart from="¿Hasta adonde" to="¿Hasta adónde" />
<LinePart from="¿Cuanto" to="¿Cuánto" />
<LinePart from="¿Cuantos" to="¿Cuántos" />
<LinePart from="¿Donde" to="¿Dónde" />
<LinePart from="¿Adonde" to="¿Adónde" />
<LinePart from="¿Con que" to="¿Con qué" />
<LinePart from="¿Con cual" to="¿Con cuál" />
<LinePart from="¿Con quien" to="¿Con quién" />
<LinePart from="¿Con cuantos" to="¿Con cuántos" />
<LinePart from="¿Con cuantas" to="¿Con cuántas" />
<LinePart from="¿Con cuanta" to="¿Con cuánta" />
<LinePart from="¿Con cuanto" to="¿Con cuánto" />
<LinePart from="¿Para donde" to="¿Para dónde" />
<LinePart from="¿Para adonde" to="¿Para adónde" />
<LinePart from="¿Para cuando" to="¿Para cuándo" />
<LinePart from="¿Para que" to="¿Para qué" />
<LinePart from="¿Para quien" to="¿Para quién" />
<LinePart from="¿Para cuanto" to="¿Para cuánto" />
<LinePart from="¿Para cuanta" to="¿Para cuánta" />
<LinePart from="¿Para cuantos" to="¿Para cuántos" />
<LinePart from="¿Para cuantas" to="¿Para cuántas" />
<LinePart from="¿A donde" to="¿A dónde" />
<LinePart from="¿A que" to="¿A qué" />
<LinePart from="¿A cual" to="¿A cuál" />
<LinePart from="¿A quien" to="¿A quién" />
<LinePart from="¿A como" to="¿A cómo" />
<LinePart from="¿A cuanto" to="¿A cuánto" />
<LinePart from="¿A cuanta" to="¿A cuánta" />
<LinePart from="¿A cuantos" to="¿A cuántos" />
<LinePart from="¿A cuantas" to="¿A cuántas" />
<LinePart from="¿Por que" to="¿Por qué" />
<LinePart from="¿Por cual" to="¿Por cuál" />
<LinePart from="¿Por quien" to="¿Por quién" />
<LinePart from="¿Por cuanto" to="¿Por cuánto" />
<LinePart from="¿Por cuanta" to="¿Por cuánta" />
<LinePart from="¿Por cuantos" to="¿Por cuántos" />
<LinePart from="¿Por cuantas" to="¿Por cuántas" />
<LinePart from="¿Por donde" to="¿Por dónde" />
<LinePart from="¿Porque" to="¿Por qué" />
<LinePart from="¿Porqué" to="¿Por qué" />
<LinePart from="¿Y que" to="¿Y qué" />
<LinePart from="¿Y como" to="¿Y cómo" />
<LinePart from="¿Y cuando" to="¿Y cuándo" />
<LinePart from="¿Y cual" to="¿Y cuál" />
<LinePart from="¿Y quien" to="¿Y quién" />
<LinePart from="¿Y cuanto" to="¿Y cuánto" />
<LinePart from="¿Y cuanta" to="¿Y cuánta" />
<LinePart from="¿Y cuantos" to="¿Y cuántos" />
<LinePart from="¿Y cuantas" to="¿Y cuántas" />
<LinePart from="¿Y donde" to="¿Y dónde" />
<LinePart from="¿Y adonde" to="¿Y adónde" />
<LinePart from="¿Quien " to="¿Quién " />
<LinePart from="¿Esta " to="¿Está " />
<!-- Diacritical accent in indirect interrogative or exclamatory clauses -->
<LinePart from="el porque" to="el porqué" />
<LinePart from="su porque" to="su porqué" />
<LinePart from="los porques" to="los porqués" />
<!-- aún -->
<LinePart from="aun," to="aún," />
<LinePart from="aun no" to="aún no" />
<!---->
<LinePart from=" de y " to=" dé y " />
<LinePart from=" nos de " to=" nos dé " />
<!---->
<LinePart from=" tu ya " to=" tú ya " />
<LinePart from="Tu ya " to="Tú ya " />
<!-- Specific cases before a comma -->
<LinePart from=" de, " to=" dé," />
<LinePart from=" mi, " to=" mí," />
<LinePart from=" tu, " to=" tú," />
<LinePart from=" el, " to=" él," />
<LinePart from=" te, " to=" té," />
<LinePart from=" mas, " to=" más," />
<LinePart from=" quien, " to=" quién," />
<LinePart from=" cual," to=" cuál," />
<LinePart from="porque, " to="porqué," />
<LinePart from="cuanto, " to="cuánto," />
<LinePart from="cuando, " to="cuándo," />
<!---->
<LinePart from=" se," to=" sé," />
<LinePart from="se donde" to="sé dónde" />
<LinePart from="se cuando" to="sé cuándo" />
<LinePart from="se adonde" to="sé adónde" />
<LinePart from="se como" to="sé cómo" />
<LinePart from="se cual" to="sé cuál" />
<LinePart from="se quien" to="sé quién" />
<LinePart from="se cuanto" to="sé cuánto" />
<LinePart from="se cuanta" to="sé cuánta" />
<LinePart from="se cuantos" to="sé cuántos" />
<LinePart from="se cuantas" to="sé cuántas" />
<LinePart from="se cuan" to="sé cuán" />
<!-- si/sí -->
<LinePart from=" el si " to=" el sí " />
<LinePart from="si mismo" to="sí mismo" />
<LinePart from="si misma" to="sí misma" />
<!-- "l" misread for "i" in specific cases -->
<LinePart from=" llegal" to=" ilegal" />
<LinePart from=" lluminar" to=" iluminar" />
<LinePart from="sllbato" to="silbato" />
<LinePart from="sllenclo" to="silencio" />
<LinePart from="clemencla" to="clemencia" />
<LinePart from="socledad" to="sociedad" />
<LinePart from="tlene" to="tiene" />
<LinePart from="tlempo" to="tiempo" />
<LinePart from="equlvocaba" to="equivocaba" />
<LinePart from="qulnce" to="quince" />
<LinePart from="comlen" to="comien" />
<LinePart from="historl" to="histori" />
<LinePart from="misterl" to="misteri" />
<LinePart from="vivencl" to="vivenci" />
</PartialLines>
<PartialLinesAlways />
<BeginLines />
<EndLines>
<Ending from=".»." to="»." />
</EndLines>
<WholeLines>
<!-- All lines -->
<Line from="No" to="No." />
</WholeLines>
<RegularExpressions>
<!-- Compound abbreviations -->
<RegEx find="\b[Ss](r|ra|rta)\b\.?" replaceWith="S$1." />
<RegEx find="\b[Dd](r|ra)\b\.?" replaceWith="D$1." />
<RegEx find="\b[Uu](d|ds)\b\.?" replaceWith="U$1." />
<RegEx find="(\d)(\s){0,1}([Aa])(\.){0,1}([Mm])(\.){0,1}(\W){0,1}" replaceWith="$1 a. m.$7" />
<RegEx find="(\d)(\s){0,1}([Pp])(\.){0,1}([Mm])(\.){0,1}(\W){0,1}" replaceWith="$1 p. m.$7" />
<RegEx find="(\d)(\s){0,1}(h)(s\b|r\b|rs\b){0,1}(\.){0,1}(\W){0,1}" replaceWith="$1 $3$6" />
<RegEx find="(\d)(\s){0,1}([Kk])(m\b|ms\b)(\.){0,1}(\W){0,1}" replaceWith="$1 km$6" />
<RegEx find="(\d)(\s){0,1}(s)(g\b|eg\b){0,1}(\.){0,1}(\W){0,1}" replaceWith="$1 s$6" />
<RegEx find="(\d)(\s){0,1}([Kk])(g\b|gs\b)(\.){0,1}(\W){0,1}" replaceWith="$1 kg$6" />
<RegEx find="(\d)(\s){0,1}(m)(t\b|ts\b){0,1}(\.){0,1}(\W){0,1}" replaceWith="$1 m$6" />
<RegEx find="(\d)KBs(\W){0,1}" replaceWith="$1 kB$2" />
<RegEx find="([Nn])°(\s){0,1}(\d)" replaceWith="$1.° $3" />
<RegEx find="([Nn])ro(\.){0,1}(\s){0,1}(\d)" replaceWith="$1.° $4" />
<!-- Inverted punctuation marks -->
<RegEx find="\?¿(\W|\w)" replaceWith="? ¿$1" />
<RegEx find="\!¡(\W|\w)" replaceWith="! ¡$1" />
<RegEx find="\?¿¿(\W|\w)" replaceWith="? ¿$1" />
<RegEx find="\!¡¡(\W|\w)" replaceWith="! ¡$1" />
<!-- Start of line -->
<RegEx find="^_(\s)" replaceWith="-$1" />
<RegEx find="^_(\w)" replaceWith="- $1" />
<!-- Quotation mark usage as recommended by the RAE and Wikipedia -->
<RegEx find="(«[^“«»]+)«" replaceWith="$1“" />
<RegEx find="(“[^«»”]+)»" replaceWith="$1”" />
<RegEx find="`" replaceWith="" />
<RegEx find="´" replaceWith="" />
<RegEx find="([\wá-ú])(\.)(«|»)" replaceWith="$1»." />
<RegEx find="«(\?)" replaceWith="»?" />
<RegEx find="«(\!)" replaceWith="»!" />
<RegEx find="«\s" replaceWith="» " />
<RegEx find="«(\))" replaceWith="»)" />
<RegEx find="(\?)«" replaceWith="?»" />
<RegEx find="(\!)«" replaceWith="!»" />
<RegEx find="«(,)" replaceWith="»," />
<RegEx find="«(;)" replaceWith="»;" />
<RegEx find="«(:)" replaceWith="»:" />
<RegEx find="(¿)»" replaceWith="¿«" />
<RegEx find="(¡)»" replaceWith="¡«" />
<!-- Quotation mark usage (ANSI) as recommended by the RAE («\x22» is the «"» character) -->
<RegEx find="([\wá-ú])([\.,]) ?[\x22»]" replaceWith="$1»$2" />
<RegEx find="([\wá-ú])\?[\x22»](\s|$)" replaceWith="$1?».$2" />
<RegEx find="^(\.\.\.)(\s){0,1}\x22" replaceWith="$1«" />
<RegEx find="«\x22" replaceWith="«" />
<RegEx find="\x22»" replaceWith="»" />
<RegEx find="^\x22{2,}" replaceWith="«" />
<RegEx find="\x22{2,}$" replaceWith="»" />
<RegEx find="\x22\r" replaceWith="»" />
<RegEx find="^\x22" replaceWith="«" />
<RegEx find="\x22$" replaceWith="»." />
<RegEx find="([\wá-ú])\.[\x22»]" replaceWith="$1»." />
<RegEx find="\s\x22" replaceWith=" «" />
<RegEx find="\x22\s" replaceWith="» " />
<RegEx find="\x22(,)" replaceWith="»," />
<RegEx find="\x22(\.)" replaceWith="»." />
<RegEx find="\x22(;)" replaceWith="»;" />
<RegEx find="\x22(:)" replaceWith="»:" />
<RegEx find="(\!)\x22" replaceWith="!»" />
<RegEx find="\x22(\!)" replaceWith="»!" />
<RegEx find="(\?)\x22" replaceWith="?»" />
<RegEx find="\x22(\?)" replaceWith="»?" />
<RegEx find="\x22(¿)" replaceWith="«¿" />
<RegEx find="(¿)\x22" replaceWith="¿«" />
<RegEx find="\x22(¡)" replaceWith="«¡" />
<RegEx find="(¡)\x22" replaceWith="¡«" />
<RegEx find="\x22(\))" replaceWith="»)" />
<RegEx find="(\))\x22" replaceWith=")»" />
<RegEx find="(\()\x22" replaceWith="(«" />
<!-- Quotation mark usage (Unicode) as recommended by the RAE («\u0022» is the «"» character) -->
<RegEx find="^(\.\.\.)(\s){0,1}\u0022" replaceWith="$1«" />
<RegEx find="^\u0022{2,}" replaceWith="«" />
<RegEx find="\u0022{2,}$" replaceWith="»" />
<RegEx find="\u0022\r" replaceWith="»" />
<RegEx find="^\u0022" replaceWith="«" />
<RegEx find="\u0022$" replaceWith="»" />
<RegEx find="(\w)(\.)\u0022" replaceWith="$1»." />
<RegEx find="\s\u0022" replaceWith=" «" />
<RegEx find="\u0022\s" replaceWith="» " />
<RegEx find="\u0022(,)" replaceWith="»," />
<RegEx find="\u0022(\.)" replaceWith="»." />
<RegEx find="\u0022(;)" replaceWith="»;" />
<RegEx find="\u0022(:)" replaceWith="»:" />
<RegEx find="(\!)\u0022" replaceWith="!»" />
<RegEx find="\u0022(\!)" replaceWith="»!" />
<RegEx find="(\?)\u0022" replaceWith="?»" />
<RegEx find="\u0022(\?)" replaceWith="»?" />
<RegEx find="\u0022(¿)" replaceWith="«¿" />
<RegEx find="(¿)\u0022" replaceWith="¿«" />
<RegEx find="\u0022(¡)" replaceWith="«¡" />
<RegEx find="(¡)\u0022" replaceWith="¡«" />
<RegEx find="\u0022(\))" replaceWith="»)" />
<RegEx find="(\))\u0022" replaceWith=")»" />
<RegEx find="(\()\u0022" replaceWith="(«" />
<!-- Number formatting -->
<RegEx find="([0-9])\.([0-9])\b" replaceWith="$1,$2" />
<RegEx find="(^|\s|[¡¿«])([0-9])(,|\.)?([0-9]{3})\b" replaceWith="$1$2$4" />
<RegEx find="(\d)\s(?=\d{2}\b)" replaceWith="$1-" />
<!-- "1 :", "2 :"... "n :" to "n:" -->
<RegEx find="(\d) ([:;])" replaceWith="$1$2" />
<!-- Fix commas and periods, e.g. «, ,» to «,» and «,,,» or similar to «...» -->
<RegEx find="(\.\.\.+)$" replaceWith="..." />
<RegEx find="(, ,+)$" replaceWith="," />
<RegEx find="(,\s),+\s" replaceWith="$1" />
<RegEx find="(\.\.\.),$" replaceWith="$1" />
<RegEx find="([\wá-ú])(\.\.)$" replaceWith="$1." />
<!-- Unnecessary periods (supplement) -->
<RegEx find="([\w\W]\.{3})([¡¿])" replaceWith="$1 $2" />
<RegEx find="(\w)\.\.(\s)" replaceWith="$1.$2" />
<RegEx find="([\wá-ú\x22»])\.([\?\!])" replaceWith="$1$2" />
<RegEx find="([\:\;])\." replaceWith="$1" />
<RegEx find="\.([\:\;])" replaceWith="$1" />
<RegEx find="\:+" replaceWith=":" />
<!-- ción/sión endings -->
<RegEx find="([sc]i)o(n)\b" replaceWith="$1ó$2" />
<RegEx find="([SC]I)O(N)\b" replaceWith="$1Ó$2" />
<!-- "i" instead of "l" in «clón» endings -->
<RegEx find="clón\b" replaceWith="ción" />
<!-- "si" instead of "sl" -->
<RegEx find="\b([Ss])(l)\b" replaceWith="$1i" />
<!-- Fixes e.g. raclones, perforaclones, opclones, etc. -->
<RegEx find="([Rr]ac)l(o)" replaceWith="$1i$2" />
<RegEx find="([Oo]pc)l(o)" replaceWith="$1i$2" />
<!-- Fixes e.g. tenldo, víctlmas, olvldarlo, legítlmo, etc. -->
<RegEx find="([BbCcDdFfHhMmNnRrSsTtVv])l([bcdhmnrstv])" replaceWith="$1i$2" />
<!-- Fixes ripping errors where capital «O» is read as zero «0» and vice versa -->
<RegEx find="(\d)O" replaceWith="$1 0" />
<RegEx find="(\d)[,\.]O" replaceWith="$1.0" />
<RegEx find="([A-Z])0" replaceWith="$1O" />
<RegEx find="\b0([A-Za-z])" replaceWith="O$1" />
<!-- Musical symbols -->
<RegEx find="[♪♫☺☹♥©☮☯Σ∞≡⇒π#](\r\n)[♪♫☺☹♥©☮☯Σ∞≡⇒π#]" replaceWith="$1" />
<!-- Diacritical accent before a period -->
<RegEx find="(\s)([dst])e\.(\s|\$)" replaceWith="$1$2é.$3" />
<RegEx find="(\s)mi\.(\s|\$)" replaceWith="$1mí.$2" />
<RegEx find="(\s)el\.(\s|\$)" replaceWith="$1él.$2" />
<RegEx find="(\s)tu\.(\s|\$)" replaceWith="$1tú.$2" />
<RegEx find="(\s)si\.(\s|\$)" replaceWith="$1sí.$2" />
<RegEx find="(\s)aun\.(\s|\$)" replaceWith="$1aún.$2" />
<RegEx find="(\s)mas\.(\s|\$)" replaceWith="$1más.$2" />
<RegEx find="(\s)quien\.(\s|\$)" replaceWith="$1quién.$2" />
<RegEx find="(\s)cual\.(\s|\$)" replaceWith="$1cuál.$2" />
<RegEx find="(\s)que\.(\s|\$)" replaceWith="$1qué.$2" />
<RegEx find="(\s)porque\.(\s|\$)" replaceWith="$1porqué.$2" />
<RegEx find="(\s)cuanto\.(\s|\$)" replaceWith="$1cuánto.$2" />
<RegEx find="(\s)cuando\.(\s|\$)" replaceWith="$1cuándo.$2" />
<!-- Prefixes; compound words (simple) -->
<RegEx find="(\b[Ee]x|\b[Ss]uper|\b[Aa]nti|\b[Pp]os|\b[Pp]re|\b[Pp]ro|\b[Vv]ice)[\s\x2D]([a-zá-ú]{3,20})(\b)" replaceWith="$1$2" />
<!-- Prefixes; compound words (numbers) -->
<RegEx find="(\b[Ss]ub|\b[Ss]uper)[\s\x2D](\d{2})(\b)" replaceWith="$1-$2$3" />
<!-- Prefixes; compound words (uppercase) -->
<RegEx find="(\b[Aa]nti|\b[Mm]ini|\b[Pp]os|\b[Pp]ro)\s([A-Z]{1,10})([A-Z][a-zá-ú]){0,10}(\b)" replaceWith="$1-$2$3" />
<!-- Capitalization after a colon -->
<RegEx find="([\wá-ú]:\s[«\x22]?)(a)" replaceWith="$1A" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(b)" replaceWith="$1B" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(c)" replaceWith="$1C" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(d)" replaceWith="$1D" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(e)" replaceWith="$1E" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(f)" replaceWith="$1F" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(g)" replaceWith="$1G" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(h)" replaceWith="$1H" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(i)" replaceWith="$1I" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(j)" replaceWith="$1J" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(k)" replaceWith="$1K" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(l)" replaceWith="$1L" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(m)" replaceWith="$1M" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(n)" replaceWith="$1N" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(ñ)" replaceWith="$1Ñ" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(o)" replaceWith="$1O" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(p)" replaceWith="$1P" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(q)" replaceWith="$1Q" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(r)" replaceWith="$1R" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(s)" replaceWith="$1S" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(t)" replaceWith="$1T" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(u)" replaceWith="$1U" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(v)" replaceWith="$1V" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(w)" replaceWith="$1W" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(x)" replaceWith="$1X" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(y)" replaceWith="$1Y" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(z)" replaceWith="$1Z" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(á)" replaceWith="$1Á" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(é)" replaceWith="$1É" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(í)" replaceWith="$1Í" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(ó)" replaceWith="$1Ó" />
<RegEx find="([\wá-ú]:\s[«\x22]?)(ú)" replaceWith="$1Ú" />
<!-- Correct comma usage -->
<RegEx find="(\b[Pp]ero),(\s)([¡¿])" replaceWith="$1$2$3" />
<RegEx find="(\b[Aa]unque),(\s|$)" replaceWith="$1$2" />
<!-- Vocatives -->
<RegEx find="(\bHola|\bBueno|\bBien|\bVen|\bVen acá|\besto|\bBuenos días|\bFeliz cumpleaños|\bsiento)\s([A-Z][a-zá-ú]{3,12}\b|seño(r|ra|rita)\b|hij(o|a) mío\b|amig(o|a)\b)" replaceWith="$1, $2" />
<!-- «aún» to «aun» when it is a synonym of «incluso» or «hasta» -->
<RegEx find="(\W|^)(\b[Aa])ú(n)(\s)(así\b|cuando\b|los\b|las\b|negar(te|se)\b)" replaceWith="$1$2u$3$4$5" />
<RegEx find="(\b[Nn]i)(\s)(a)ú(n)(\W|$)" replaceWith="$1$2$3u$4$5" />
<!-- «sí» -->
<RegEx find="\b([Ss])i(:|;|\.)" replaceWith="$1í$2" />
<!-- «sé» -->
<RegEx find="(\b[Ll]o|\b[Ll]a|\b[Ll]e)(\s)se(\W|$)" replaceWith="$1$2sé$3" />
<RegEx find="[Ss]e\s(dónde\b|cuándo\b|adónde\b|cómo\b|cuál\b|quién\b|cuánto\b|cuánta\b|cuántos\b|cuántas\b|cuán\b)" replaceWith="sé $1" />
<!-- «té» -->
<RegEx find="\b([Tt])e\s(verde\b|negro\b|perla\b|de manzanilla\b|de lim[óo]n\b|de jazm[íi]n\b)" replaceWith="$1é $2" />
<!-- Apostrophe -->
<RegEx find="(\b[A-Z][a-zá-ú]{3,12})\s(|')(\d\d(\s|$))" replaceWith="$1 $3" />
<RegEx find="(\b[A-Z]{2,5})(|')(s)" replaceWith="(Ej. Devedés)$1$3" />
<RegEx find="(\b\d{1,2})(|')(\d{2})\s(s|m)(\W|$)" replaceWith="$1,$3 $4$5" />
<RegEx find="(\b\d{1,2})(|')(\d{2})\s(h)(\W|$)" replaceWith="$1:$3 $4$5" />
<!-- Percentage (must be preceded by a space) -->
<RegEx find="(\b\d{1,3})%(\W)" replaceWith="$1 %$2" />
<!-- Haz/has -->
<RegEx find="(\b)([Hh])as\s(la\b|lo\b|clic\b)(\W)" replaceWith="$1$2az $3$4" />
<RegEx find="(\b)([Hh])az\s(de\b)(\W)" replaceWith="$1$2as $3$4" />
<RegEx find="(\b)([Hh])as(le\b|nos\b|me\b)(\W)" replaceWith="$1$2az$3$4" />
<!-- Remove italics around 3 or fewer characters -->
<RegEx find="\x3ci\x3e(.{1,3})\x3c\/i\x3e" replaceWith="$1" />
<!-- Miscellaneous -->
<RegEx find="(\b[Cc]erca|\b[Ee]ncima|\b[Dd]ebajo|\b[Dd]etrás|\b[Dd]elante)(\s)mío" replaceWith="$1 de mí" />
<RegEx find="(\b[Cc]erca|\b[Ee]ncima|\b[Dd]ebajo|\b[Dd]etrás|\b[Dd]elante)(\s)tuyo" replaceWith="$1 de ti" />
<!-- Period before «¿» and «¡» -->
<RegEx find="([\wá-ú»])\s(?=(¿|¡)[A-ZÁ-Ú])" replaceWith="$1. " />
<!-- Space after the dash -->
<RegEx find="(^|\n)(-)([^\s])" replaceWith="$1$2 $3" />
<!-- Period before the dash -->
<RegEx find="([^\.\?\!]) - " replaceWith="$1. - " />
<!-- Endings in «ólogo», «ílogo» and «álogo» -->
<RegEx find="\Bo(log[ao]s?\b)" replaceWith="ó$1" />
<RegEx find="\Ba(log[ao]s?\b)" replaceWith="á$1" />
<RegEx find="\Bi(log[ao]s?\b)" replaceWith="í$1" />
</RegularExpressions>
</OCRFixReplaceList>
@@ -0,0 +1,234 @@
<!-- Credit goes to: MilanRS [http://www.prijevodi-online.org] -->
<OCRFixReplaceList>
<WholeWords>
<Word from="ču" to="ću" />
<Word from="češ" to="ćeš" />
<Word from="če" to="će" />
<Word from="ćš" to="ćeš" />
<Word from="ćmo" to="ćemo" />
<Word from="ćte" to="ćete" />
<Word from="čemo" to="ćemo" />
<Word from="čete" to="ćete" />
<Word from="djete" to="dijete" />
<Word from="Hey" to="Hej" />
<Word from="hey" to="hej" />
<Word from="htjeo" to="htio" />
<Word from="Hočeš" to="Hoćeš" />
<Word from="hočeš" to="hoćeš" />
<Word from="iči" to="ići" />
<Word from="jel" to="je l'" />
<Word from="Jel" to="Je l'" />
<Word from="nedaj" to="ne daj" />
<Word from="Rješit" to="Riješit" />
<Word from="smjeo" to="smio" />
<Word from="uopče" to="uopće" />
<Word from="valda" to="valjda" />
<Word from="želila" to="željela" />
</WholeWords>
<PartialWordsAlways />
<PartialWords>
<WordPart from="¤" to="o" />
<WordPart from="vv" to="w" />
<WordPart from="IVI" to="M" />
<WordPart from="lVI" to="M" />
<WordPart from="IVl" to="M" />
<WordPart from="lVl" to="M" />
</PartialWords>
<PartialLines>
<LinePart from="bi smo" to="bismo" />
<LinePart from="dali je" to="da li je" />
<LinePart from="dali si" to="da li si" />
<LinePart from="Dali si" to="Da li si" />
<LinePart from="Jel sam ti" to="Jesam li ti" />
<LinePart from="Jel si" to="Jesi li" />
<LinePart from="Jel' si" to="Jesi li" />
<LinePart from="Je I'" to="Jesi li" />
<LinePart from="Jel si to" to="Jesi li to" />
<LinePart from="Jel' si to" to="Da li si to" />
<LinePart from="jel si to" to="da li si to" />
<LinePart from="jel' si to" to="jesi li to" />
<LinePart from="Jel si ti" to="Da li si ti" />
<LinePart from="Jel' si ti" to="Da li si ti" />
<LinePart from="jel si ti" to="da li si ti" />
<LinePart from="jel' si ti" to="da li si ti" />
<LinePart from="jel ste " to="jeste li " />
<LinePart from="Jel ste" to="Jeste li" />
<LinePart from="jel' ste " to="jeste li " />
<LinePart from="Jel' ste " to="Jeste li " />
<LinePart from="Jel su " to="Jesu li " />
<LinePart from="Jel da " to="Zar ne" />
<LinePart from="jel da " to="zar ne" />
<LinePart from="jel'da " to="zar ne" />
<LinePart from="Jeli sve " to="Je li sve" />
<LinePart from="Jeli on " to="Je li on" />
<LinePart from="Jeli ti " to="Je li ti" />
<LinePart from="jeli ti " to="je li ti" />
<LinePart from="Jeli to " to="Je li to" />
<LinePart from="Nebrini" to="Ne brini" />
<LinePart from="nedaj" to="ne daj" />
<LinePart from="ne ću" to="neću" />
<LinePart from="Nemogu" to="Ne mogu" />
<LinePart from="nemogu" to="ne mogu" />
<LinePart from="Nemoraš" to="Ne moraš" />
<LinePart from="od kako" to="otkako" />
<LinePart from="Si dobro" to="Jesi li dobro" />
<LinePart from="Svo vreme" to="Sve vrijeme" />
<LinePart from="Svo vrijeme" to="Sve vrijeme" />
<LinePart from="Cijelo vrijeme" to="Sve vrijeme" />
</PartialLines>
<PartialLinesAlways />
<BeginLines />
<EndLines />
<WholeLines />
<RegularExpressions>
<RegEx find="đž" replaceWith="dž" />
<RegEx find="ajsmiješnij" replaceWith="ajsmješnij" />
<RegEx find="boži[čć]([aeiu]|em|ima)?\b" replaceWith="Božić$1" />
<RegEx find=" g-dine\.$" replaceWith=" gospodine." />
<RegEx find=" g-dine +(?=[A-ZČĐŠŽ])" replaceWith=" g. " />
<RegEx find="([gG])dine? +(?=[A-ZČĐŠŽ])" replaceWith="$1. " />
<RegEx find="([gG])-đo +(?=[A-ZČĐŠŽ])" replaceWith="$1gđo " />
<RegEx find="gdina +(?=[A-ZČĐŠŽ])" replaceWith="g. " />
<RegEx find=" gosp +" replaceWith=" g. " />
<RegEx find="Jel si sigur" replaceWith="Jesi li sigur" />
<RegEx find="Jel' si sigur" replaceWith="Jesi li sigur" />
<RegEx find="\b([jJ])el\?" replaceWith="$1e l'?" />
<RegEx find="\bJel'" replaceWith="Je l'" />
<RegEx find="([kK]alib(?:ar|r[aeui]))\. *([0-9])" replaceWith="$1 .$2" />
<RegEx find="([mM])jenjati" replaceWith="$1ijenjati" />
<RegEx find="([mM])oguč" replaceWith="$1oguć" />
<RegEx find="\b([nN])ebih?" replaceWith="$1e bi" />
<RegEx find="\b([nN])eč([ue]š?|emo|ete)\b" replaceWith="$1eć$2" />
<RegEx find="\b([nN])emože(mo|š|te)?\b" replaceWith="$1e može$2" />
<RegEx find="\b([nN])ezna([šm]o?|t[ei]|ju|jući|vši)?\b" replaceWith="$1e zna$2" />
<RegEx find="najcijenjen" replaceWith="najcjenjen" />
<RegEx find="N[jJ]u Jork" replaceWith="Njujork" />
<RegEx find="([oO])d([kp])" replaceWith="$1t$2" />
<RegEx find="([oO])ružij([aeu])" replaceWith="$1ružj$2" />
<RegEx find="([oO])sječa" replaceWith="$1sjeća" />
<RegEx find="([pPdD])onje([lt])" replaceWith="$1onije$2" />
<RegEx find="([pP])objedi([mšto])" replaceWith="$1obijedi$2" />
<RegEx find="redamnom" replaceWith="reda mnom" />
<RegEx find="redpostav" replaceWith="retpostav" />
<RegEx find="([pP])rimjeti" replaceWith="$1rimijeti" />
<RegEx find="([pP])romjeni([mštol])" replaceWith="$1romijeni$2" />
<RegEx find="([rR])azumijeć" replaceWith="$1azumjeć" />
<RegEx find="rascjepljen" replaceWith="rascijepljen" />
<RegEx find="redhodn" replaceWith="rethodn" />
<RegEx find="rimjenjen" replaceWith="rimijenjen" />
<RegEx find="([^d])rješit" replaceWith="$1riješit" />
<RegEx find="([sSzZ])amnom" replaceWith="$1a mnom" />
<RegEx find="([sS])lijede[čć]([aeiu]|e[mg])" replaceWith="$1ljedeć$2" />
<RegEx find="([sS])mješno" replaceWith="$1miješno" />
<RegEx find="([uU])mijesto" replaceWith="$1mjesto" />
<RegEx find="([uU])spijeh" replaceWith="$1spjeh" />
<RegEx find="([uU])spiješ(an|n[aeiou]|no[mgj])" replaceWith="$1spješ$2" />
<RegEx find="([uU])vjek" replaceWith="$1vijek" />
<RegEx find="\b([vV])eč([aeiou])" replaceWith="$1eć$2" />
<RegEx find="([zZ])ahtijeva" replaceWith="$1ahtjeva" />
<RegEx find="([zZ])ahtjeva([ojlmšt])" replaceWith="$1ahtijeva$2" />
<RegEx find="([ks]ao)\.:" replaceWith="$1:" />
<RegEx find="(?&lt;=[a-zčđšž])Ij(?=[a-zčđšž])" replaceWith="lj" />
<RegEx find="(?&lt;=[^A-ZČĐŠŽa-zčđšž])Iju(?=bav|d|t)" replaceWith="lju" />
<!-- When there is a space between the tags </i> <i> -->
<!-- <RegEx find="(&gt;) +(&lt;)" replaceWith="$1$2" /> -->
<!-- ',"' to '",' -->
<RegEx find="(?&lt;=\w),&quot;(?=\s|$)" replaceWith="&quot;," />
<RegEx find=",\.{3}|\.{3},|\.{2} \." replaceWith="..." />
<!-- "1 :", "2 :"... "n :" to "n:" -->
<RegEx find="([0-9]) +: +(\D)" replaceWith="$1: $2" />
<!-- Two or more consecutive "," to "..." -->
<RegEx find=",{2,}" replaceWith="..." />
<!-- Two or more consecutive "-" to "..." -->
<RegEx find="-{2,}" replaceWith="..." />
<RegEx find="([^().])\.{2}([^().:])" replaceWith="$1...$2" />
<!-- Thousands and decimal separators: 1,499,000.00 -> 1.499.000,00 -->
<RegEx find="([0-9]{3})\.([0-9]{2}[^0-9])" replaceWith="$1,$2" />
<RegEx find="([0-9]),([0-9]{3}\D)" replaceWith="$1.$2" />
<!-- Apostrophes -->
<RegEx find="´´" replaceWith="&quot;" />
<!-- <RegEx find="[´`]" replaceWith="'" /> -->
<!-- <RegEx find="[“”]" replaceWith="&quot;" /> -->
<RegEx find="''" replaceWith="&quot;" />
<!-- Two or more consecutive '"' to one '"' -->
<RegEx find="&quot;{2,}" replaceWith="&quot;" />
<!-- Fix zero and capital 'o' ripping mistakes -->
<RegEx find="(?&lt;=[0-9]\.?)O" replaceWith="0" />
<RegEx find="\b0(?=[A-ZČĐŠŽa-zčđšž])" replaceWith="O" />
<!-- Remove the dash at the start of the 1st line (also when there are two lines) -->
<RegEx find="\A- ?([A-ZČĐŠŽa-zčđšž0-9„'&quot;]|\.{3})" replaceWith="$1" />
<RegEx find="\A(&lt;[ibu]&gt;)- ?" replaceWith="$1" />
<RegEx find=" - " replaceWith=" -" />
<!-- Remove the space after the dash at the start of the 2nd line -->
<RegEx find="(?&lt;=\n(&lt;[ibu]&gt;)?)- (?=[A-ZČĐŠŽčš0-9„'&quot;&lt;])" replaceWith="-" />
<!-- Fix the dash when it is in the middle of the first line -->
<RegEx find="([.!?&quot;&gt;]) - ([A-ZČĐŠŽčš'&quot;&lt;])" replaceWith="$1 -$2" />
<!-- Closing tag, then a space after the dash -->
<RegEx find="(&gt;) - ([A-ZČĐŠŽčš„'&quot;])" replaceWith="$1 -$2" />
<!-- Closing tag, then dash and space -->
<RegEx find="(&gt;)- ([A-ZČĐŠŽčš„'&quot;])" replaceWith="$1-$2" />
<!-- Parenthesis, then dash and space -->
<RegEx find="\(- ([A-ZČĐŠŽčš„'&quot;])" replaceWith="(-$1" />
<!-- Smart space after dot -->
<!-- except when the last letter is t (the word "kolt") -->
<RegEx find="(?&lt;=[a-su-zá-úñä-ü])\.(?=[^\s\n().:?!*^“”'&quot;&lt;])" replaceWith=". " />
<!-- Caliber notation, e.g. "Colt .45" -->
<!-- For this to work (so that this space is allowed), uncheck "Razmaci ispred tačke" ("Spaces before period") -->
<RegEx find="t\.(?=[0-9]{2})" replaceWith="t ." />
<!-- Joey(j)a -->
<RegEx find="(?&lt;=\b[A-Z][a-z])eyj(?=[a-z])" replaceWith="ey" />
<!-- Fixes spacing around commas -->
<RegEx find="(?&lt;=[A-ZČĐŠŽa-zčđšžá-úñä-ü&quot;]),(?=[^\s(),?!“&lt;])" replaceWith=", " />
<RegEx find=" +,(?=[A-ZČĐŠŽa-zčđšž])" replaceWith=", " />
<RegEx find=" +, +" replaceWith=", " />
<RegEx find=" +,$" replaceWith="," />
<RegEx find="([?!])-" replaceWith="$1 -" />
<!-- Space after last of some consecutive dots (eg. "...") -->
<RegEx find="(?&lt;=[a-zčđšž])(\.{3}|!)(?=[a-zčđšž])" replaceWith="$1 " />
<!-- Delete space after "..." that is at the beginning of the line. You may delete this line if you don't like it -->
<!-- <RegEx find="^\.{3} +" replaceWith="..." /> -->
<!-- Changes "tekst ... tekst" to "tekst... tekst" -->
<RegEx find="(?&lt;=[A-ZČĐŠŽa-zčđšž]) +\.{3} +" replaceWith="... " />
<RegEx find="(?&lt;=\S)\. +&quot;" replaceWith=".&quot;" />
<RegEx find="&quot; +\." replaceWith="&quot;." />
<RegEx find="(?&lt;=\S\.{3}) +&quot;(?=\s|$)" replaceWith="&quot;" />
<RegEx find=" +\.{3}$" replaceWith="..." />
<RegEx find="(?&lt;=[a-zčđšž])(?: +\.{3}|\.{2}$)" replaceWith="..." />
<!-- Space before a parenthesis -->
<RegEx find="(?&lt;=[A-ZČĐŠŽa-zčđšž])\(" replaceWith=" (" />
<!-- Space after a question mark -->
<RegEx find="\?(?=[A-ZČĐŠŽčš])" replaceWith="? " />
<RegEx find="(?&lt;=^|&gt;)\.{3} +(?=[A-ZČĐŠŽčš])" replaceWith="..." />
<!-- Removes ... when the line starts with "... -->
<RegEx find="^&quot;\.{3} +" replaceWith="&quot;" />
<RegEx find="(?&lt;=[0-9])\$" replaceWith=" $$" />
<!-- ti š -> t š by Strider -->
<!-- Replace every "**ti šu*" with "**t šu*" and every "**ti še*" with "**t še*" -->
<!-- <RegEx find="([a-z])ti (š+[eu])" replaceWith="$1t $2" /> -->
<!-- <RegEx find="([A-Za-z])ti( |\r?\n)(š[eu])" replaceWith="$1t$2$3" /> -->
<!-- <RegEx find="(?i)\b(ni)t (š[eu])" replaceWith="$1ti $2" /> -->
<!-- <RegEx find="\. +Mr. " replaceWith=". G. " /> -->
<!-- <RegEx find="\. +Mrs. " replaceWith=". Gđa " /> -->
<!-- <RegEx find="\. +Miss " replaceWith=". Gđica " /> -->
<!-- <RegEx find=", +Mrs. " replaceWith=", gđo " /> -->
<!-- <RegEx find=", +Miss " replaceWith=", gđice " /> -->
<!-- Space after <i> and after .. -->
<RegEx find="^(&lt;[ibu]&gt;) +" replaceWith="$1" />
<RegEx find="^\.{2} +" replaceWith="..." />
<!-- Space between ? and "</i> -->
<RegEx find="([.?!]) +(&quot;&lt;)" replaceWith="$1$2" />
<!-- No space in "Npr.:" -->
<RegEx find="(?&lt;=[Nn]pr\.) *: *" replaceWith=": " />
<RegEx find="\. ," replaceWith=".," />
<RegEx find="([?!])\." replaceWith="$1" />
<!-- So that signatures containing ..:: are not broken -->
<RegEx find="\.{3}::" replaceWith="..::" />
<RegEx find="::\.{3}" replaceWith="::.." />
<RegEx find="\.{2} +::" replaceWith="..::" />
<!-- Abbreviations without spaces -->
<RegEx find="d\. o\.o\." replaceWith="d.o.o." />
<!-- When a line starts with ... followed by a lowercase letter -->
<!-- <RegEx find="^\.{3}([a-zčđšž&quot;&lt;])" replaceWith="$1" /> -->
<!-- <RegEx find=" +([.?!])" replaceWith="$1" /> -->
</RegularExpressions>
</OCRFixReplaceList>
@@ -0,0 +1,405 @@
<OCRFixReplaceList>
<WholeWords>
<Word from="lârt" to="lärt" />
<Word from="hedervårda" to="hedervärda" />
<Word from="stormâstare" to="stormästare" />
<Word from="Avfârd" to="Avfärd" />
<Word from="tâlten" to="tälten" />
<Word from="ârjag" to="är jag" />
<Word from="ärjag" to="är jag" />
<Word from="jâmlikar" to="jämlikar" />
<Word from="Riskakofl" to="Riskakor" />
<Word from="Karamellen/" to="Karamellen" />
<Word from="Lngenüng" to="Ingenting" />
<Word from="ärju" to="är ju" />
<Word from="Sá" to="Så" />
<Word from="närjag" to="när jag" />
<Word from="alltjag" to="allt jag" />
<Word from="görjag" to="gör jag" />
<Word from="trorjag" to="tror jag" />
<Word from="varju" to="var ju" />
<Word from="görju" to="gör ju" />
<Word from="kanju" to="kan ju" />
<Word from="blirjag" to="blir jag" />
<Word from="sägerjag" to="säger jag" />
<Word from="behållerjag" to="behåller jag" />
<Word from="prøblem" to="problem" />
<Word from="räddadeju" to="räddade ju" />
<Word from="honøm" to="honom" />
<Word from="Ln" to="In" />
<Word from="svårﬂörtad" to="svårflörtad" />
<Word from="øch" to="och" />
<Word from="ﬂörtar" to="flörtar" />
<Word from="kännerjag" to="känner jag" />
<Word from="ﬂickan" to="flickan" />
<Word from="snø" to="snö" />
<Word from="gerju" to="ger ju" />
<Word from="køntakter" to="kontakter" />
<Word from="ølycka" to="olycka" />
<Word from="nølla" to="nolla" />
<Word from="sinnenajublar" to="sinnena jublar" />
<Word from="ijobbet" to="i jobbet" />
<Word from="Fårjag" to="Får jag" />
<Word from="Ar" to="Är" />
<Word from="liggerju" to="ligger ju" />
<Word from="um" to="om" />
<Word from="lbland" to="Ibland" />
<Word from="skjuterjag" to="skjuter jag" />
<Word from="Vaddå" to="Vad då" />
<Word from="pratarjämt" to="pratar jämt" />
<Word from="harju" to="har ju" />
<Word from="sitterjag" to="sitter jag" />
<Word from="häfla" to="härja" />
<Word from="sfiäl" to="stjäl" />
<Word from="FÖU" to="Följ" />
<Word from="varförjag" to="varför jag" />
<Word from="sfiärna" to="stjärna" />
<Word from="böflar" to="börjar" />
<Word from="böflan" to="början" />
<Word from="stäri" to="står" />
<Word from="pä" to="på" />
<Word from="harjag" to="har jag" />
<Word from="attjag" to="att jag" />
<Word from="Verkarjag" to="Verkar jag" />
<Word from="Kännerjag" to="Känner jag" />
<Word from="därjag" to="där jag" />
<Word from="tufi" to="tuff" />
<Word from="lurarjag" to="lurar jag" />
<Word from="varjättebra" to="var jättebra" />
<Word from="allvan" to="allvar" />
<Word from="dethär" to="det här" />
<Word from="vafle" to="varje" />
<Word from="FöUer" to="Följer" />
<Word from="personalmötetl" to="personalmötet!" />
<Word from="harjust" to="har just" />
<Word from="ärjätteduktig" to="är jätteduktig" />
<Word from="därja" to="där ja" />
<Word from="lngenüng" to="Ingenting" />
<Word from="iluften" to="i luften" />
<Word from="ösen" to="öser" />
<Word from="tvâ" to="två" />
<Word from="Uejerna" to="Tjejerna" />
<Word from="hån*" to="hårt" />
<Word from="Ärjag" to="Är jag" />
<Word from="keL" to="Okej" />
<Word from="Förjag" to="För jag" />
<Word from="varjättekul" to="var jättekul" />
<Word from="kämpan" to="kämpar" />
<Word from="mycketjobb" to="mycket jobb" />
<Word from="Uus" to="ljus" />
<Word from="serjag" to="ser jag" />
<Word from="vetjag" to="vet jag" />
<Word from="fårjag" to="får jag" />
<Word from="hurjag" to="hur jag" />
<Word from="försökerjag" to="försöker jag" />
<Word from="tánagel" to="tånagel" />
<Word from="vaüe" to="varje" />
<Word from="Uudet" to="ljudet" />
<Word from="amhopa" to="allihopa" />
<Word from="Väü" to="Välj" />
<Word from="gäri" to="går" />
<Word from="rödüus" to="rödljus" />
<Word from="Uuset" to="ljuset" />
<Word from="Ridàn" to="Ridån" />
<Word from="viüa" to="vilja" />
<Word from="gåri" to="går i" />
<Word from="Hurdå" to="Hur då" />
<Word from="inter\/juar" to="intervjuar" />
<Word from="menarjag" to="menar jag" />
<Word from="spyrjag" to="spyr jag" />
<Word from="briüera" to="briljera" />
<Word from="Närjag" to="När jag" />
<Word from="ner\/ös" to="nervös" />
<Word from="ilivets" to="i livets" />
<Word from="nägot" to="något" />
<Word from="pà" to="på" />
<Word from="Lnnan" to="Innan" />
<Word from="Uf" to="Ut" />
<Word from="lnnan" to="Innan" />
<Word from="Dàren" to="Dåren" />
<Word from="Fàrjag" to="Får jag" />
<Word from="VadärdetdäL" to="Vad är det där" />
<Word from="smàtjuv" to="småtjuv" />
<Word from="tàgrånare" to="tågrånare" />
<Word from="ditàt" to="ditåt" />
<Word from="sä" to="så" />
<Word from="vàrdslösa" to="vårdslösa" />
<Word from="nàn" to="nån" />
<Word from="kommerjag" to="kommer jag" />
<Word from="ärjättebra" to="är jättebra" />
<Word from="ärjävligt" to="är jävligt" />
<Word from="àkerjag" to="åker jag" />
<Word from="ellerjapaner" to="eller japaner" />
<Word from="attjaga" to="att jaga" />
<Word from="eften" to="efter" />
<Word from="hästan" to="hästar" />
<Word from="Lntensivare" to="Intensivare" />
<Word from="fràgarjag" to="frågar jag" />
<Word from="pen/ers" to="pervers" />
<Word from="ràbarkade" to="råbarkade" />
<Word from="styrkon" to="styrkor" />
<Word from="Difåf" to="Ditåt" />
<Word from="händen" to="händer" />
<Word from="föfia" to="följa" />
<Word from="Idioten/" to="Idioter!" />
<Word from="Varförjagade" to="Varför jagade" />
<Word from="därförjag" to="därför jag" />
<Word from="forjag" to="for jag" />
<Word from="Iivsgladje" to="livsglädje" />
<Word from="narjag" to="när jag" />
<Word from="sajag" to="sa jag" />
<Word from="genastja" to="genast ja" />
<Word from="rockumentàren" to="rockumentären" />
<Word from="turne" to="turné" />
<Word from="fickjag" to="fick jag" />
<Word from="sager" to="säger" />
<Word from="Ijushårig" to="ljushårig" />
<Word from="tradgårdsolycka" to="trädgårdsolycka" />
<Word from="kvavdes" to="kvävdes" />
<Word from="dàrja" to="där ja" />
<Word from="hedersgaster" to="hedersgäster" />
<Word from="Nar" to="När" />
<Word from="smakiösa" to="smaklösa" />
<Word from="lan" to="Ian" />
<Word from="Lan" to="Ian" />
<Word from="eri" to="er i" />
<Word from="universitetsamne" to="universitetsämne" />
<Word from="garna" to="gärna" />
<Word from="ar" to="är" />
<Word from="baltdjur" to="bältdjur" />
<Word from="varjag" to="var jag" />
<Word from="àr" to="är" />
<Word from="förförstàrkare" to="förförstärkare" />
<Word from="arjattespeciell" to="är jättespeciell" />
<Word from="hàrgår" to="här går" />
<Word from="Ia" to="la" />
<Word from="Iimousinen" to="limousinen" />
<Word from="krickettra" to="kricketträ" />
<Word from="hårdrockvàrlden" to="hårdrockvärlden" />
<Word from="tràbit" to="träbit" />
<Word from="Mellanvastern" to="Mellanvästern" />
<Word from="arju" to="är ju" />
<Word from="turnen" to="turnén" />
<Word from="kanns" to="känns" />
<Word from="battre" to="bättre" />
<Word from="vàrldsturne" to="världsturné" />
<Word from="dar" to="där" />
<Word from="sjàlvantànder" to="självantänder" />
<Word from="jattelange" to="jättelänge" />
<Word from="berattade" to="berättade" />
<Word from="Sä" to="Så" />
<Word from="vandpunkten" to="vändpunkten" />
<Word from="Nàrjag" to="När jag" />
<Word from="lasa" to="läsa" />
<Word from="skitlàskigt" to="skitläskigt" />
<Word from="sambandsvàg" to="sambandsväg" />
<Word from="valdigt" to="väldigt" />
<Word from="Stamgafiel" to="Stämgaffel" />
<Word from="àrjag" to="är jag" />
<Word from="tajming" to="tajmning" />
<Word from="utgäng" to="utgång" />
<Word from="Hàråt" to="Häråt" />
<Word from="hàråt" to="häråt" />
<Word from="anvander" to="använder" />
<Word from="harjobbat" to="har jobbat" />
<Word from="imageide" to="imageidé" />
<Word from="klafien" to="klaffen" />
<Word from="sjalv" to="själv" />
<Word from="dvarg" to="dvärg" />
<Word from="detjag" to="det jag" />
<Word from="dvargarna" to="dvärgarna" />
<Word from="fantasivàrld" to="fantasivärld" />
<Word from="fiolliga" to="Fjolliga" />
<Word from="mandoiinstràngar" to="mandolinsträngar" />
<Word from="mittjobb" to="mitt jobb" />
<Word from="Skajag" to="Ska jag" />
<Word from="landari" to="landar i" />
<Word from="gang" to="gäng" />
<Word from="Detjag" to="Det jag" />
<Word from="Narmre" to="Närmre" />
<Word from="Iåtjavelni" to="låtjäveln" />
<Word from="Hållerjag" to="Håller jag" />
<Word from="visionarer" to="visionärer" />
<Word from="Tülvad" to="Till vad" />
<Word from="militàrbas" to="militärbas" />
<Word from="jattegiada" to="jätteglada" />
<Word from="Fastjag" to="Fast jag" />
<Word from="såjag" to="så jag" />
<Word from="rockvarlden" to="rockvärlden" />
<Word from="saknarjag" to="saknar jag" />
<Word from="allafall" to="alla fall" />
<Word from="fianta" to="fjanta" />
<Word from="Kràma" to="Kräma" />
<Word from="stammer" to="stämmer" />
<Word from="budbàrare" to="budbärare" />
<Word from="Iivsfiiosofi" to="livsfilosofi" />
<Word from="förjämnan" to="för jämnan" />
<Word from="gillarjag" to="gillar jag" />
<Word from="Iarvat" to="larvat" />
<Word from="klararjag" to="klarar jag" />
<Word from="hattafi'àr" to="hattaffär" />
<Word from="Dà" to="Då" />
<Word from="uppﬁnna" to="uppfinna" />
<Word from="Ràttfåglar" to="Råttfåglar" />
<Word from="Sväüboda" to="Sväljboda" />
<Word from="Påböflar" to="Påbörjar" />
<Word from="slutarju" to="slutar ju" />
<Word from="nifiskebuüken" to="i fiskebutiken" />
<Word from="härjäkeln" to="här jäkeln" />
<Word from="Hßppa" to="Hoppa" />
<Word from="förstörds" to="förstördes" />
<Word from="varjättegoda" to="var jättegoda" />
<Word from="Kor\/" to="Korv" />
<Word from="brüléel" to="brülée!" />
<Word from="Hei" to="Hej" />
<Word from="älskarjordgubbsglass" to="älskar jordgubbsglass" />
<Word from="Snöbom" to="Snöboll" />
<Word from="SnöboH" to="Snöboll" />
<Word from="Snöbol" to="Snöboll" />
<Word from="snöboH" to="snöboll" />
<Word from="Läggerpå" to="Lägger på" />
<Word from="lngefl" to="Inget!" />
<Word from="Sägerjättesmarta" to="Säger jättesmarta" />
<Word from="dopplen/äderradar" to="dopplerväderradar" />
<Word from="säkertjättefin" to="säkert jättefin" />
<Word from="ärjättefin" to="är jättefin" />
<Word from="verkarju" to="verkar ju" />
<Word from="blirju" to="blir ju" />
<Word from="kor\/" to="korv" />
<Word from="naturkatastrofi" to="naturkatastrof!" />
<Word from="stickerjag" to="sticker jag" />
<Word from="jättebufié" to="jättebuffé" />
<Word from="beﬁnner" to="befinner" />
<Word from="Spflng" to="Spring" />
<Word from="trecfie" to="tredje" />
<Word from="ryckerjag" to="rycker jag" />
<Word from="skullejag" to="skulle jag" />
<Word from="vetju" to="vet ju" />
<Word from="afljag" to="att jag" />
<Word from="flnns" to="finns" />
<Word from="ärlång" to="är lång" />
<Word from="kåra" to="kära" />
<Word from="ärfina" to="är fina" />
<Word from="äri" to="är i" />
<Word from="hörden" to="hör den" />
<Word from="ättjäg" to="att jag" />
<Word from="gär" to="går" />
<Word from="föri" to="för i" />
<Word from="Hurvisste" to="Hur visste" />
<Word from="ﬁck" to="fick" />
<Word from="ﬁnns" to="finns" />
<Word from="ﬁn" to="fin" />
<Word from="Fa" to="Bra." />
<Word from="bori" to="bor i" />
<Word from="fiendeplanl" to="fiendeplan!" />
<Word from="iförnamn" to="i förnamn" />
<Word from="detju" to="det ju" />
<Word from="Nüd" to="Niki" />
<Word from="hatarjag" to="hatar jag" />
<Word from="Klararjag" to="Klarar jag" />
<Word from="detafier" to="detaljer" />
<Word from="vä/" to="väl" />
<Word from="smakarju" to="smakar ju" />
<Word from="Teachefl" to="Teacher!" />
<Word from="imorse" to="i morse" />
<Word from="drickerjag" to="dricker jag" />
<Word from="ståri" to="står i" />
<Word from="Harjag" to="Har jag" />
<Word from="Talarjag" to="Talar jag" />
<Word from="undrarjag" to="undrar jag" />
<Word from="ålderjag" to="ålder jag" />
<Word from="vafie" to="varje" />
<Word from="förfalskningl" to="förfalskning!" />
<Word from="Vifiiiiam" to="William" />
<Word from="V\filliams" to="Williams" />
<Word from="attjobba" to="att jobba" />
<Word from="intei" to="inte i" />
<Word from="närV\filliam" to="när William" />
<Word from="V\filliam" to="William" />
<Word from="Efiersom" to="Eftersom" />
<Word from="Vlfilliam" to="William" />
<Word from="Iängejag" to="länge jag" />
<Word from="'fidigare" to="Tidigare" />
<Word from="börjadei" to="började i" />
<Word from="merjust" to="mer just" />
<Word from="efieråt" to="efteråt" />
<Word from="gjordejag" to="gjorde jag" />
<Word from="hadeju" to="hade ju" />
<Word from="gårvi" to="går vi" />
<Word from="köperjag" to="köper jag" />
<Word from="Måstejag" to="Måste jag" />
<Word from="kännerju" to="känner ju" />
<Word from="fln" to="fin" />
<Word from="treviig" to="trevlig" />
<Word from="Grattisl" to="Grattis!" />
<Word from="kande" to="kände" />
<Word from="'llden" to="Tiden" />
<Word from="sakjag" to="sak jag" />
<Word from="klartjag" to="klart jag" />
<Word from="häfiigt" to="häftigt" />
<Word from="Iämnarjag" to="lämnar jag" />
<Word from="gickju" to="gick ju" />
<Word from="skajag" to="ska jag" />
<Word from="Görjag" to="Gör jag" />
<Word from="måstejag" to="måste jag" />
<Word from="gra\/iditet" to="graviditet" />
<Word from="hittadqdin" to="hittade din" />
<Word from="ärjobbigt" to="är jobbigt" />
<Word from="Overdrivet" to="Överdrivet" />
<Word from="hOgtidlig" to="högtidlig" />
<Word from="Overtyga" to="Övertyga" />
<Word from="SKILSMASSA" to="SKILSMÄSSA" />
<Word from="brukarju" to="brukar ju" />
<Word from="lsabel" to="Isabel" />
<Word from="kundejag" to="kunde jag" />
<Word from="ärläget" to="är läget" />
<Word from="blirinte" to="blir inte" />
<Word from="l'm" to="I'm" />
<Word from="lt's" to="It's" />
<Word from="ijakt" to="i jakt" />
<Word from="avjordens" to="av jordens" />
</WholeWords>
<PartialWordsAlways />
<PartialWords>
<!-- Will be used to check words not in dictionary -->
<!-- If the resulting word(s) exist in the spelling dictionary, they are accepted -->
<WordPart from="¤" to="o" />
<WordPart from="ﬁ" to="fi" />
<WordPart from="â" to="ä" />
<WordPart from="/" to="l" />
<WordPart from="vv" to="w" />
<WordPart from="IVI" to="M" />
<WordPart from="lVI" to="M" />
<WordPart from="IVl" to="M" />
<WordPart from="lVl" to="M" />
<WordPart from="m" to="rn" />
<WordPart from="l" to="i" />
<WordPart from="€" to="e" />
<WordPart from="I" to="l" />
<WordPart from="c" to="o" />
<WordPart from="i" to="t" />
<WordPart from="cc" to="oo" />
<WordPart from="ii" to="tt" />
<WordPart from="n/" to="ry" />
<WordPart from="ae" to="æ" />
<!-- "f " will be two words -->
<WordPart from="f" to="f " />
<WordPart from="c" to="e" />
<WordPart from="o" to="e" />
<WordPart from="I" to="t" />
<WordPart from="n" to="o" />
<WordPart from="s" to="e" />
<WordPart from="å" to="ä" />
<WordPart from="à" to="å" />
<WordPart from="n/" to="rv" />
</PartialWords>
<PartialLines />
<PartialLinesAlways />
<BeginLines>
<Beginning from="Ln " to="In " />
<Beginning from="U ppfattat" to="Uppfattat" />
</BeginLines>
<EndLines />
<WholeLines />
<RegularExpressions />
</OCRFixReplaceList>
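The replace list above is plain XML, so its sections can be read into search/replace dictionaries with the standard library. The following is a minimal Python 3 sketch, not the loader this commit ships: `SAMPLE` and `load_sections` are hypothetical names, and the sample only mirrors the shape of the file (sections containing elements with `from` and `to` attributes).

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt with the same shape as the replace list above.
SAMPLE = u"""<OCRFixReplaceList>
  <WholeWords>
    <Word from="närjag" to="när jag" />
    <Word from="pä" to="på" />
  </WholeWords>
  <PartialWordsAlways />
  <BeginLines>
    <Beginning from="Ln " to="In " />
  </BeginLines>
</OCRFixReplaceList>"""


def load_sections(xml_text):
    """Map each section tag to a {from: to} dict; empty sections give {}."""
    root = ET.fromstring(xml_text)
    sections = {}
    for section in root:
        sections[section.tag] = {
            item.get("from"): item.get("to")
            for item in section
            if item.get("from") is not None
        }
    return sections


sections = load_sections(SAMPLE)
```

Keeping one dictionary per section preserves the distinction between whole-word, line-start and partial replacements, which the processors further down treat differently.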
@@ -0,0 +1,238 @@
# coding=utf-8
import traceback
import pysubs2
import logging
import time
from mods import EMPTY_TAG_PROCESSOR
from registry import registry
logger = logging.getLogger(__name__)
class SubtitleModifications(object):
debug = False
language = None
initialized_mods = {}
font_style_tag_start = u"{\\"
def __init__(self, debug=False):
self.debug = debug
self.initialized_mods = {}
def load(self, fn=None, content=None, language=None, encoding="utf-8"):
"""
:param encoding: used for decoding the content when fn is given, not used in case content is given
:param language: babelfish.Language language of the subtitle
:param fn: filename
:param content: unicode
:return:
"""
self.language = language
self.initialized_mods = {}
try:
if fn:
self.f = pysubs2.load(fn, encoding=encoding)
elif content:
self.f = pysubs2.SSAFile.from_string(content)
except (IOError,
UnicodeDecodeError,
pysubs2.exceptions.UnknownFPSError,
pysubs2.exceptions.UnknownFormatIdentifierError,
pysubs2.exceptions.FormatAutodetectionError):
if fn:
logger.exception("Couldn't load subtitle: %s: %s", fn, traceback.format_exc())
elif content:
logger.exception("Couldn't load subtitle: %s", traceback.format_exc())
@classmethod
def parse_identifier(cls, identifier):
# simple identifier
if identifier in registry.mods:
return identifier, {}
# identifier with params; identifier(param=value)
split_args = identifier[identifier.find("(")+1:-1].split(",")
args = dict((key, value) for key, value in [sub.split("=") for sub in split_args])
return identifier[:identifier.find("(")], args
@classmethod
def get_mod_class(cls, identifier):
identifier, args = cls.parse_identifier(identifier)
return registry.mods[identifier]
@classmethod
def get_mod_signature(cls, identifier, **kwargs):
return cls.get_mod_class(identifier).get_signature(**kwargs)
def prepare_mods(self, *mods):
parsed_mods = [SubtitleModifications.parse_identifier(mod) for mod in mods]
final_mods = {}
line_mods = []
non_line_mods = []
for identifier, args in parsed_mods:
if identifier not in registry.mods:
logger.error("Mod %s not loaded", identifier)
continue
mod_cls = registry.mods[identifier]
# exclusive mod, kill old, use newest
if identifier in final_mods and mod_cls.exclusive:
final_mods.pop(identifier)
# merge args of duplicate mods if possible
elif identifier in final_mods and mod_cls.args_mergeable:
final_mods[identifier] = mod_cls.merge_args(final_mods[identifier], args)
continue
final_mods[identifier] = args
# separate all mods into line and non-line mods
for identifier, args in final_mods.iteritems():
mod_cls = registry.mods[identifier]
if mod_cls.modifies_whole_file:
non_line_mods.append((identifier, args))
else:
line_mods.append((mod_cls.order, identifier, args))
# initialize the mods
if identifier not in self.initialized_mods:
self.initialized_mods[identifier] = mod_cls(self)
return line_mods, non_line_mods
def modify(self, *mods):
new_entries = []
start = time.time()
line_mods, non_line_mods = self.prepare_mods(*mods)
# apply file mods
if non_line_mods:
non_line_mods_start = time.time()
self.apply_non_line_mods(non_line_mods)
if self.debug:
logger.debug("Non-Line mods took %ss", time.time() - non_line_mods_start)
# sort line mods by their order; mods without an order (None) go last
line_mods.sort(key=lambda x: (x[0] is None, x))
# apply line mods
if line_mods:
line_mods_start = time.time()
self.apply_line_mods(new_entries, line_mods)
if self.debug:
logger.debug("Line mods took %ss", time.time() - line_mods_start)
self.f.events = new_entries
if self.debug:
logger.debug("Subtitle Modification took %ss", time.time() - start)
def apply_non_line_mods(self, mods):
for identifier, args in mods:
mod = self.initialized_mods[identifier]
mod.modify(None, debug=self.debug, parent=self, **args)
def apply_line_mods(self, new_entries, mods):
for entry in self.f:
applied_mods = []
lines = []
line_count = 0
start_tags = []
end_tags = []
for line in entry.text.split(ur"\N"):
# don't bother the mods with surrounding tags
old_line = line
line = line.strip()
skip_line = False
line_count += 1
# clean {\X0} tags before processing
# fixme: handle nested tags?
start_tag = u""
end_tag = u""
if line.startswith(self.font_style_tag_start):
start_tag = line[:5]
line = line[5:]
if line[-5:-3] == self.font_style_tag_start:
end_tag = line[-5:]
line = line[:-5]
for order, identifier, args in mods:
mod = self.initialized_mods[identifier]
line = mod.modify(line.strip(), debug=self.debug, parent=self, **args)
if not line:
if self.debug:
logger.debug(u"%s: %r -> ''", identifier, old_line)
skip_line = True
break
applied_mods.append(identifier)
if skip_line:
continue
if start_tag:
start_tags.append(start_tag)
if end_tag:
end_tags.append(end_tag)
# append new line and clean possibly newly added empty tags
cleaned_line = EMPTY_TAG_PROCESSOR.process(start_tag + line + end_tag, debug=self.debug).strip()
if cleaned_line:
# we may have a single closing tag, if so, try appending it to the previous line
if len(cleaned_line) == 5 and cleaned_line.startswith("{\\") and cleaned_line.endswith("0}"):
if lines:
prev_line = lines.pop()
lines.append(prev_line + cleaned_line)
continue
lines.append(cleaned_line)
else:
if self.debug:
logger.debug(u"Ditching now empty line (%r -> %r)", old_line, cleaned_line)
if not lines:
# don't bother logging when the entry only had one line
if self.debug and line_count > 1:
logger.debug(u"%r -> ''", entry.text)
continue
new_text = ur"\N".join(lines)
# cheap man's approach to avoid open tags
add_start_tags = []
add_end_tags = []
if len(start_tags) != len(end_tags):
for tag in start_tags:
end_tag = tag.replace("1", "0")
if end_tag not in end_tags and new_text.count(tag) > new_text.count(end_tag):
add_end_tags.append(end_tag)
for tag in end_tags:
start_tag = tag.replace("0", "1")
if start_tag not in start_tags and new_text.count(tag) > new_text.count(start_tag):
add_start_tags.append(start_tag)
if add_end_tags or add_start_tags:
entry.text = u"".join(add_start_tags) + new_text + u"".join(add_end_tags)
if self.debug:
logger.debug(u"Fixing tags: %s (%r -> %r)", str(add_start_tags+add_end_tags), new_text,
entry.text)
else:
entry.text = new_text
new_entries.append(entry)
SubMod = SubtitleModifications
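`parse_identifier` accepts either a bare mod name or the form `identifier(param=value,...)`. The sketch below restates that parsing in standalone Python 3 with explicit handling of malformed input. It is an illustration under assumptions, not the project's code: `known_mods` stands in for `registry.mods`, and raising `ValueError` is a choice made here.

```python
def parse_identifier(identifier, known_mods):
    """Split 'name(key=value,...)' into (name, args); bare names get {}."""
    # Simple identifier without parameters.
    if identifier in known_mods:
        return identifier, {}

    open_paren = identifier.find("(")
    if open_paren == -1 or not identifier.endswith(")"):
        raise ValueError("Unknown or malformed mod identifier: %r" % identifier)

    name = identifier[:open_paren]
    args = {}
    for part in identifier[open_paren + 1:-1].split(","):
        if not part:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, value = part.split("=", 1)
        args[key.strip()] = value.strip()
    return name, args


known = {"common", "shift_offset"}  # hypothetical stand-in for registry.mods
simple = parse_identifier("common", known)
with_args = parse_identifier("shift_offset(s=2,ms=500)", known)
```

Values stay strings here, as in the original; converting them (for example with `int`) is left to the mod that consumes them.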
@@ -0,0 +1,94 @@
# coding=utf-8
import re
import logging
from subzero.modification.processors.re_processor import ReProcessor, NReProcessor
logger = logging.getLogger(__name__)
class SubtitleModification(object):
identifier = None
description = None
long_description = None
exclusive = False
advanced = False # has parameters
args_mergeable = False
order = None
modifies_whole_file = False # operates on the whole file, not individual entries
pre_processors = []
processors = []
post_processors = []
def __init__(self, parent):
return
def _process(self, content, processors, debug=False, parent=None, **kwargs):
if not content:
return
# processors may be a list or a callable
#if callable(processors):
# _processors = processors()
#else:
# _processors = processors
_processors = processors
new_content = content
for processor in _processors:
old_content = new_content
new_content = processor.process(new_content, debug=debug)
if not new_content:
if debug:
logger.debug("Processor returned empty line: %s", processor)
break
if debug:
if old_content == new_content:
continue
logger.debug("%s: %s -> %s", processor, repr(old_content), repr(new_content))
return new_content
def pre_process(self, content, debug=False, parent=None, **kwargs):
return self._process(content, self.pre_processors, debug=debug, parent=parent, **kwargs)
def process(self, content, debug=False, parent=None, **kwargs):
return self._process(content, self.processors, debug=debug, parent=parent, **kwargs)
def post_process(self, content, debug=False, parent=None, **kwargs):
return self._process(content, self.post_processors, debug=debug, parent=parent, **kwargs)
def modify(self, content, debug=False, parent=None, **kwargs):
if not content:
return
new_content = content
for method in ("pre_process", "process", "post_process"):
if not new_content:
return
new_content = getattr(self, method)(new_content, debug=debug, parent=parent, **kwargs)
return new_content
@classmethod
def get_signature(cls, **kwargs):
string_args = ",".join(["%s=%s" % (key, value) for key, value in kwargs.iteritems()])
return "%s(%s)" % (cls.identifier, string_args)
@classmethod
def merge_args(cls, args1, args2):
raise NotImplementedError
class SubtitleTextModification(SubtitleModification):
pass
EMPTY_TAG_PROCESSOR = ReProcessor(re.compile(r'({\\\w1})[\s.,\-_!?]*({\\\w0})'), "", name="empty_tag")
empty_line_post_processors = [
# empty tag
EMPTY_TAG_PROCESSOR,
# empty line (needed?)
NReProcessor(re.compile(r'^[\s-]+$'), "", name="empty_line"),
]
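Inside a regex character class, a hyphen placed between two characters is read as a range, which matters for patterns such as the empty-tag one. The Python 3 sketch below compares a class containing an unescaped `,-_` with one where the hyphen is escaped. The names `ranged` and `literal` and the sample strings are made up for illustration.

```python
import re

# ",-_" is a range from "," (0x2C) to "_" (0x5F): it covers digits and A-Z.
ranged = re.compile(r'({\\\w1})[\s.,-_!?]*({\\\w0})')
# Escaping the hyphen keeps it a literal character.
literal = re.compile(r'({\\\w1})[\s.,\-_!?]*({\\\w0})')

empty = u"{\\i1} - {\\i0}"        # italic tags around punctuation only
shouting = u"{\\i1}HELP{\\i0}"    # italic tags around real, uppercase text

ranged_empty = ranged.sub("", empty)
literal_empty = literal.sub("", empty)
ranged_shouting = ranged.sub("", shouting)
literal_shouting = literal.sub("", shouting)
```

Both variants drop the tag pair that only wraps punctuation. Only the ranged class also swallows the uppercase text between the tags.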
@@ -0,0 +1,51 @@
# coding=utf-8
import logging
from collections import OrderedDict
from subzero.modification.mods import SubtitleModification
from subzero.modification import registry
logger = logging.getLogger(__name__)
COLOR_MAP = OrderedDict([
("white", "#FFFFFF"),
("light-grey", "#C0C0C0"),
("red", "#FF0000"),
("green", "#00FF00"),
("yellow", "#FFFF00"),
("blue", "#0000FF"),
("magenta", "#FF00FF"),
("cyan", "#00FFFF"),
("black", "#000000"),
("dark-red", "#800000"),
("dark-green", "#008000"),
("dark-yellow", "#808000"),
("dark-blue", "#000080"),
("dark-magenta", "#800080"),
("dark-cyan", "#008080"),
("dark-grey", "#808080"),
])
class Color(SubtitleModification):
identifier = "color"
description = "Change the color of the subtitle"
exclusive = True
advanced = True
colors = COLOR_MAP
long_description = """\
Adds the requested color to every line of the subtitle. Support depends on player.
"""
def modify(self, content, debug=False, parent=None, **kwargs):
color = self.colors.get(kwargs.get("name"))
if color:
return u'<font color="%s">%s</font>' % (color, content)
return content
registry.register(Color)
@@ -0,0 +1,72 @@
# coding=utf-8
import re
from subzero.modification.mods import SubtitleTextModification, empty_line_post_processors
from subzero.modification.processors.string_processor import StringProcessor
from subzero.modification.processors.re_processor import NReProcessor
from subzero.modification import registry
class CommonFixes(SubtitleTextModification):
identifier = "common"
description = "Basic common fixes"
exclusive = True
order = 40
long_description = """\
Fix common whitespace/punctuation issues in subtitles
"""
processors = [
# -- = ...
StringProcessor("-- ", '... ', name="CM_doubledash"),
# '' = "
StringProcessor("''", '"', name="CM_double_apostrophe"),
# remove leading ...
NReProcessor(re.compile(r'(?u)^\.\.\.[\s]*'), "", name="CM_leading_ellipsis"),
# no space after ellipsis
NReProcessor(re.compile(r'(?u)\.\.\.(?![\s.,!?\'"])(?!$)'), "... ", name="CM_ellipsis_no_space"),
# multiple spaces
NReProcessor(re.compile(r'(?u)[\s]{2,}'), " ", name="CM_multiple_spaces"),
# no space after starting dash
NReProcessor(re.compile(r'(?u)^-(?![\s-])'), "- ", name="CM_dash_space"),
# remove starting spaced dots (not matching ellipses)
NReProcessor(re.compile(r'(?u)^(?!\s?(\.\s\.\s\.)|(\s?\.{3}))[\s.]*'), "", name="CM_starting_spacedots"),
# space missing before doublequote
# ReProcessor(re.compile(r'(?u)(?<!^)(?<![\s(\["])("[^"]+")'), r' \1', name="CM_space_before_dblquote"),
# space missing after doublequote
# ReProcessor(re.compile(r'(?u)("[^"\s][^"]+")([^\s.,!?)\]]+)'), r"\1 \2", name="CM_space_after_dblquote"),
# space before ending doublequote?
# remove >>
NReProcessor(re.compile(r'(?u)^\s?>>\s*'), "", name="CM_leading_crocodiles"),
# replace uppercase I with lowercase L in words
NReProcessor(re.compile(ur'(?u)([A-Za-zÀ-ž][a-zà-ž]+)(I+)'),
lambda match: ur'%s%s' % (match.group(1), "l"*len(match.group(2))), name="CM_uppercase_i_in_word"),
# fix spaces in numbers; allows for punctuation: ,.:' (comma only fixed if after space, those may be
# countdowns otherwise); don't break up ellipses
# fixme: maybe check whether it's a countdown (second part smaller than the first), otherwise handle default?
NReProcessor(re.compile(r'(?u)([0-9]+[0-9.:\']*(?<!\.\.))\s+((?!\.\.)[0-9,.:\']*[0-9]+)'), r"\1\2",
name="CM_spaces_in_numbers"),
# uppercase after dot
NReProcessor(re.compile(ur'(?u)((?:[^.\s])+\.\s+)([a-zà-ž])'),
lambda match: ur'%s%s' % (match.group(1), match.group(2).upper()), name="CM_uppercase_after_dot"),
]
post_processors = empty_line_post_processors
registry.register(CommonFixes)
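The processors in `CommonFixes` run in list order, each one receiving the previous one's output. A minimal Python 3 sketch of that chaining, using three of the patterns from the list above unchanged; `STEPS` and `apply_steps` are hypothetical names, and the real mod goes through the processor classes instead.

```python
import re

# (name, compiled pattern, replacement), copied from the processors list above.
STEPS = [
    ("CM_ellipsis_no_space", re.compile(r'(?u)\.\.\.(?![\s.,!?\'"])(?!$)'), "... "),
    ("CM_multiple_spaces", re.compile(r'(?u)[\s]{2,}'), " "),
    ("CM_dash_space", re.compile(r'(?u)^-(?![\s-])'), "- "),
]


def apply_steps(line):
    """Run every step in order; each sees the output of the one before."""
    for _name, pattern, replacement in STEPS:
        line = pattern.sub(replacement, line)
    return line


fixed = apply_steps(u"-Wait...what  now")
unchanged = apply_steps(u"Stop...")
```

For the first input, the ellipsis gets a trailing space, the double space collapses, and the leading dash gets its space. The second input is left alone because its ellipsis ends the line.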
@@ -0,0 +1,27 @@
# coding=utf-8
import logging
from subzero.modification.mods import SubtitleModification
from subzero.modification import registry
logger = logging.getLogger(__name__)
class ChangeFPS(SubtitleModification):
identifier = "change_FPS"
description = "Change the FPS of the subtitle"
exclusive = True
advanced = True
modifies_whole_file = True
long_description = """\
Re-syncs the subtitle to the framerate of the current media file.
"""
def modify(self, content, debug=False, parent=None, **kwargs):
fps_from = kwargs.get("from")
fps_to = kwargs.get("to")
parent.f.transform_framerate(float(fps_from), float(fps_to))
registry.register(ChangeFPS)
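`ChangeFPS` delegates to pysubs2's `transform_framerate`, which, as far as its documentation describes it, rescales every timestamp by the ratio of input to output framerate. The arithmetic can be sketched without pysubs2. `rescale_ms` is a hypothetical helper, and rounding to whole milliseconds is an assumption of this sketch.

```python
def rescale_ms(timestamp_ms, fps_from, fps_to):
    """Rescale a millisecond timestamp by fps_from / fps_to."""
    if fps_from <= 0 or fps_to <= 0:
        raise ValueError("Framerates must be positive")
    ratio = float(fps_from) / float(fps_to)
    return int(round(timestamp_ms * ratio))


halved = rescale_ms(1000, 25, 50)
doubled = rescale_ms(1000, 50, 25)
```

A subtitle timed for 25 fps and played against 50 fps material shows each cue at half its original timestamp, and the reverse case doubles it.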
@@ -0,0 +1,48 @@
# coding=utf-8
import re
from subzero.modification.mods import SubtitleTextModification, empty_line_post_processors
from subzero.modification.processors.re_processor import NReProcessor
from subzero.modification import registry
class HearingImpaired(SubtitleTextModification):
identifier = "remove_HI"
description = "Remove Hearing Impaired tags"
exclusive = True
order = 10
long_description = """\
Removes tags, text and characters from subtitles that are meant for hearing impaired people
"""
processors = [
# brackets (only remove if at least 3 consecutive uppercase chars in brackets)
NReProcessor(re.compile(ur'(?sux)[([].+(?=[A-ZÀ-Ž]{3,}).+[)\]]'), "", name="HI_brackets"),
# text before colon (and possible dash in front), max 11 chars after the first whitespace (if any)
# NReProcessor(re.compile(r'(?u)(^[A-z\-\'"_]+[\w\s]{0,11}:[^0-9{2}][\s]*)'), "", name="HI_before_colon"),
# text before colon (at least 4 consecutive uppercase chars)
NReProcessor(re.compile(ur'(?u)(^(?=.*[A-ZÀ-Ž]{4,})[A-ZÀ-Ž-_\s]+:\s*)'), "", name="HI_before_colon"),
# text in brackets at start, after optional dash, before colon or at end of line
# fixme: may be too aggressive
NReProcessor(re.compile(ur'(?um)(^-?\s?[([][A-Za-zÀ-ž\-_\s]{3,}[)\]](?:(?=$)|:\s*))'), "",
name="HI_brackets_special"),
# all caps line (at least 4 consecutive uppercase chars)
NReProcessor(re.compile(ur'(?u)(^(?=.*[A-ZÀ-Ž]{4,})[A-ZÀ-Ž-_\s]+$)'), "", name="HI_all_caps"),
# dash in front
# NReProcessor(re.compile(r'(?u)^\s*-\s*'), "", name="HI_starting_dash"),
# all caps at start before new sentence
NReProcessor(re.compile(ur'(?u)^(?=[A-ZÀ-Ž]{4,})[A-ZÀ-Ž-_\s]+\s([A-ZÀ-Ž][a-zà-ž].+)'), r"\1",
name="HI_starting_upper_then_sentence"),
]
post_processors = empty_line_post_processors
registry.register(HearingImpaired)
@@ -0,0 +1,48 @@
# coding=utf-8
import logging
from subzero.modification.mods import SubtitleTextModification
from subzero.modification.processors.string_processor import MultipleLineProcessor, WholeLineProcessor
from subzero.modification.processors.re_processor import MultipleWordReProcessor
from subzero.modification import registry
from subzero.modification.dictionaries.data import data as OCR_fix_data
logger = logging.getLogger(__name__)
class FixOCR(SubtitleTextModification):
identifier = "OCR_fixes"
description = "Fix common OCR issues"
exclusive = True
order = 20
data_dict = None
long_description = """\
Fix issues that happen when a subtitle gets converted from bitmap to text through OCR
"""
def __init__(self, parent):
super(FixOCR, self).__init__(parent)
data_dict = OCR_fix_data.get(parent.language.alpha3t)
if not data_dict:
logger.debug("No SnR-data available for language %s", parent.language)
return
self.data_dict = data_dict
self.processors = self.get_processors()
def get_processors(self):
if not self.data_dict:
return []
return [
WholeLineProcessor(self.data_dict["WholeLines"], name="SE_replace_line"),
MultipleWordReProcessor(self.data_dict["WholeWords"], name="SE_replace_word"),
MultipleWordReProcessor(self.data_dict["BeginLines"], name="SE_replace_beginline"),
MultipleWordReProcessor(self.data_dict["EndLines"], name="SE_replace_endline"),
MultipleWordReProcessor(self.data_dict["PartialLines"], name="SE_replace_partialline"),
MultipleLineProcessor(self.data_dict["PartialWordsAlways"], name="SE_replace_partialwordsalways")
]
registry.register(FixOCR)
@@ -0,0 +1,40 @@
# coding=utf-8
import logging
from subzero.modification.mods import SubtitleModification
from subzero.modification import registry
logger = logging.getLogger(__name__)
class ShiftOffset(SubtitleModification):
identifier = "shift_offset"
description = "Change the timing of the subtitle"
exclusive = False
advanced = True
args_mergeable = True
modifies_whole_file = True
long_description = """\
Adds or subtracts a certain amount of time from the whole subtitle to match your media
"""
@classmethod
def merge_args(cls, args1, args2):
new_args = dict((key, int(value)) for key, value in args1.iteritems())
for key, value in args2.iteritems():
if key in new_args:
new_args[key] += int(value)
else:
new_args[key] = int(value)
return new_args
def modify(self, content, debug=False, parent=None, **kwargs):
parent.f.shift(h=int(kwargs.get("h", 0)), m=int(kwargs.get("m", 0)), s=int(kwargs.get("s", 0)),
ms=int(kwargs.get("ms", 0)))
registry.register(ShiftOffset)
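Because `shift_offset` is not exclusive and has `args_mergeable` set, two occurrences of the mod are folded into one by adding their arguments per key. A Python 3 restatement of `merge_args` as a free function, for illustration only; `merge_offsets` is a hypothetical name.

```python
def merge_offsets(args1, args2):
    """Add two offset dicts key by key, converting string values to int."""
    merged = {key: int(value) for key, value in args1.items()}
    for key, value in args2.items():
        merged[key] = merged.get(key, 0) + int(value)
    return merged


combined = merge_offsets({"s": "2"}, {"s": "-5", "ms": "250"})
```

Shifting by +2 s and then by -5 s and +250 ms therefore results in a single shift of -3 s and +250 ms.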
@@ -0,0 +1,29 @@
# coding=utf-8
class Processor(object):
"""
Processor base class
"""
name = None
parent = None
def __init__(self, name=None, parent=None):
self.name = name
self.parent = parent
@property
def info(self):
return self.name
def process(self, content, debug=False):
return content
def __repr__(self):
return "Processor <%s %s>" % (self.__class__.__name__, self.info)
def __str__(self):
return repr(self)
def __unicode__(self):
return unicode(repr(self))
@@ -0,0 +1,48 @@
# coding=utf-8
import re
import logging
from subzero.modification.processors import Processor
logger = logging.getLogger(__name__)
class ReProcessor(Processor):
"""
Regex processor
"""
pattern = None
replace_with = None
def __init__(self, pattern, replace_with, name=None):
super(ReProcessor, self).__init__(name=name)
self.pattern = pattern
self.replace_with = replace_with
def process(self, content, debug=False):
return self.pattern.sub(self.replace_with, content)
class NReProcessor(ReProcessor):
pass
class MultipleWordReProcessor(ReProcessor):
"""
Expects a dictionary in the form of:
dict = {
"data": {"old_value": "new_value"},
"pattern": compiled re object that matches data.keys()
}
replaces found key in pattern with the corresponding value in data
"""
def __init__(self, snr_dict, name=None, parent=None):
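# note: super(ReProcessor, ...) bypasses ReProcessor.__init__, which requires pattern and replace_with
super(ReProcessor, self).__init__(name=name)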
self.snr_dict = snr_dict
def process(self, content, debug=False):
if not self.snr_dict["data"]:
return content
return self.snr_dict["pattern"].sub(lambda x: self.snr_dict["data"][x.group(0)], content)
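`MultipleWordReProcessor` expects a dictionary holding the replacements under `"data"` and one compiled pattern matching any of their keys under `"pattern"`. How that dictionary is built is not part of this hunk, so the Python 3 sketch below is one possible construction under stated assumptions: the keys are treated as literal strings, and "whole word" means not adjacent to another word character. `build_snr_dict` and `process` are hypothetical names.

```python
import re


def build_snr_dict(replacements):
    """Return {"data": ..., "pattern": ...} for a {wrong: right} mapping."""
    # Longest keys first, so a short key cannot shadow a longer one.
    keys = sorted(replacements, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    pattern = re.compile(r"(?<!\w)(?:%s)(?!\w)" % alternation)
    return {"data": dict(replacements), "pattern": pattern}


def process(snr_dict, content):
    """Replace every matched key with its value, as the processor above does."""
    if not snr_dict["data"]:
        return content
    return snr_dict["pattern"].sub(
        lambda match: snr_dict["data"][match.group(0)], content)


snr = build_snr_dict({u"närjag": u"när jag", u"pä": u"på"})
fixed = process(snr, u"Det var närjag satt pä tåget")
untouched = process(snr, u"späd")
```

A single alternation needs one pass over the line regardless of how many words the list contains, which is cheaper than calling `str.replace` once per entry.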
@@ -0,0 +1,84 @@
# coding=utf-8
import logging
from subzero.modification.processors import Processor
logger = logging.getLogger(__name__)
class StringProcessor(Processor):
"""
String replacement processor base
"""
def __init__(self, search, replace, name=None, parent=None):
super(StringProcessor, self).__init__(name=name)
self.search = search
self.replace = replace
def process(self, content, debug=False):
return content.replace(self.search, self.replace)
class MultipleLineProcessor(Processor):
"""
replaces stuff in whole lines
takes a search/replace dict as first argument
Expects a dictionary in the form of:
dict = {
"data": {"old_value": "new_value"}
}
"""
def __init__(self, snr_dict, name=None, parent=None):
super(MultipleLineProcessor, self).__init__(name=name)
self.snr_dict = snr_dict
def process(self, content, debug=False):
if not self.snr_dict["data"]:
return content
for key, value in self.snr_dict["data"].iteritems():
if debug and key in content:
logger.debug(u"Replacing '%s' with '%s' in '%s'", key, value, content)
content = content.replace(key, value)
return content
class WholeLineProcessor(MultipleLineProcessor):
def process(self, content, debug=False):
if not self.snr_dict["data"]:
return content
content = content.strip()
for key, value in self.snr_dict["data"].iteritems():
if content == key:
if debug:
logger.debug(u"Replacing '%s' with '%s'", key, value)
content = value
break
return content
class MultipleWordProcessor(MultipleLineProcessor):
"""
replaces words
takes a search/replace dict as first argument
Expects a dictionary in the form of:
dict = {
"data": {"old_value": "new_value"}
}
"""
def process(self, content, debug=False):
words = content.split(u" ")
new_words = []
for word in words:
new_words.append(self.snr_dict["data"].get(word, word))
return u" ".join(new_words)
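Review note: the string processors above differ mainly in what unit they compare against the table: the whole stripped line, or each space-separated word. A minimal Python 3 sketch of the two behaviours follows. The function names are hypothetical, and `data` stands for the `"data"` entry of the search/replace dictionary described in the docstrings.

```python
def replace_whole_line(content, data):
    """Swap the line only when the stripped line equals a key exactly."""
    if not data:
        return content
    stripped = content.strip()
    return data.get(stripped, stripped)


def replace_words(content, data):
    """Swap each space-separated word that appears as a key."""
    return " ".join(data.get(word, word) for word in content.split(" "))


table = {"ls it?": "Is it?", "ls": "Is"}
whole = replace_whole_line("  ls it?  ", table)
partial = replace_whole_line("take ls it?", table)
words = replace_words("ls it", table)
```

The whole-line variant leaves a line alone unless it matches a key in full, which keeps short OCR fixes from firing inside longer sentences.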
@@ -0,0 +1,17 @@
# coding=utf-8
from collections import OrderedDict
class SubtitleModRegistry(object):
mods = None
mods_available = None
def __init__(self):
self.mods = OrderedDict()
self.mods_available = []
def register(self, mod):
self.mods[mod.identifier] = mod
self.mods_available.append(mod.identifier)
registry = SubtitleModRegistry()
@@ -4,13 +4,23 @@ import hashlib
import os
import logging
import traceback
import gzip
from babelfish import Language
from json_tricks.nonp import loads, dumps
from constants import mode_map
from subliminal_patch.subtitle import ModifiedSubtitle
logger = logging.getLogger(__name__)
class StoredSubtitle(object):
"""
legacy class used for PMS LoadObject/SaveObject
"""
score = None
storage_type = None
hash = None
@@ -46,8 +56,59 @@ class StoredSubtitle(object):
return mode_map.get(self.mode, "Unknown")
class JSONStoredSubtitle(object):
score = None
storage_type = None
hash = None
provider_name = None
id = None
date_added = None
mode = "a" # auto/manual/auto-better (a/m/b)
content = None
mods = None
encoding = None
def initialize(self, score, storage_type, hash, provider_name, id, date_added=None, mode="a", content=None,
mods=None, encoding=None):
self.score = int(score)
self.storage_type = storage_type
self.hash = hash
self.provider_name = provider_name
self.id = id
self.date_added = date_added or datetime.datetime.now()
self.mode = mode
self.content = content
self.mods = mods or []
self.encoding = encoding
def add_mod(self, identifier):
self.mods = self.mods or []
if identifier is None:
self.mods = []
return
self.mods.append(identifier)
@property
def mode_verbose(self):
return mode_map.get(self.mode, "Unknown")
def serialize(self):
if self.content:
# content is always stored in unicode (gets converted to string with escaped unicode chars by json)
self.content = self.content.decode(self.encoding)
return self.__dict__
def deserialize(self, data):
if data["content"]:
# content is always present in encoded form
data["content"] = data["content"].encode(data["encoding"])
self.initialize(**data)
class StoredVideoSubtitles(object):
"""
legacy class
manages stored subtitles for video_id per media_part/language combination
"""
video_id = None # rating_key
@@ -112,12 +173,136 @@ class StoredVideoSubtitles(object):
return str(self.video_id)
class JSONStoredVideoSubtitles(object):
"""
manages stored subtitles for video_id per media_part/language combination
"""
video_id = None # rating_key
title = None
parts = None
version = None
item_type = None # movie / episode
added_at = None
def initialize(self, plex_item, version=None):
self.video_id = str(plex_item.rating_key)
self.title = plex_item.title
self.parts = {}
self.version = version
self.item_type = plex_item.type
self.added_at = datetime.datetime.fromtimestamp(plex_item.added_at)
def deserialize(self, data):
parts = data.pop("parts")
self.parts = {}
self.__dict__.update(data)
if parts:
for part_id, part in parts.iteritems():
self.parts[part_id] = {}
for language, sub_data in part.iteritems():
self.parts[part_id][language] = {}
for sub_key, subtitle_data in sub_data.iteritems():
if sub_key == "current":
if not isinstance(subtitle_data, tuple):
subtitle_data = tuple(subtitle_data.split("__"))
self.parts[part_id][language]["current"] = subtitle_data
else:
sub = JSONStoredSubtitle()
# legacy subtitle storage instance
if isinstance(subtitle_data, StoredSubtitle):
subtitle_data = subtitle_data.__dict__
sub.initialize(**subtitle_data)
if not isinstance(sub_key, tuple):
sub_key = tuple(sub_key.split("__"))
self.parts[part_id][language][sub_key] = sub
def serialize(self):
data = {"parts": {}}
for key, value in self.__dict__.iteritems():
if key != "parts":
data[key] = value
for part_id, part in self.parts.iteritems():
data["parts"][part_id] = {}
for language, sub_data in part.iteritems():
data["parts"][part_id][language] = {}
for sub_key, stored_subtitle in sub_data.iteritems():
if sub_key == "current":
data["parts"][part_id][language]["current"] = "__".join(stored_subtitle)
else:
# migrate missing encoding data
if stored_subtitle.content and not stored_subtitle.encoding:
# correctly serialize the content
lang = Language.fromietf(language)
subtitle = ModifiedSubtitle(lang)
subtitle.content = stored_subtitle.content
stored_subtitle.encoding = subtitle.guess_encoding()
data["parts"][part_id][language]["__".join(sub_key)] = stored_subtitle.serialize()
return data
def add(self, part_id, lang, subtitle, storage_type, date_added=None, mode="a"):
part_id = str(part_id)
part = self.parts.get(part_id)
if not part:
self.parts[part_id] = {}
part = self.parts[part_id]
subs = part.get(lang)
if not subs:
part[lang] = {}
subs = part[lang]
sub_key = self.get_sub_key(subtitle.provider_name, subtitle.id)
subs[sub_key] = JSONStoredSubtitle()
subs[sub_key].initialize(subtitle.score, storage_type, hashlib.md5(subtitle.content).hexdigest(),
subtitle.provider_name, subtitle.id, date_added=date_added, mode=mode,
content=subtitle.content, mods=subtitle.mods, encoding=subtitle.guess_encoding())
subs["current"] = sub_key
return True
def get_any(self, part_id, lang):
part_id = str(part_id)
part = self.parts.get(part_id)
if not part:
return
subs = part.get(lang)
if not subs:
return
if "current" in subs and subs["current"]:
return subs.get(subs["current"])
def get_sub_key(self, provider_name, id):
return provider_name, str(id)
def __repr__(self):
return unicode(self)
def __unicode__(self):
return u"%s (%s)" % (self.title, self.video_id)
def __str__(self):
return str(self.video_id)
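Review note: JSON object keys must be strings, which is why `serialize` joins each `(provider_name, id)` tuple with `"__"` and `deserialize` splits it again. The sketch below shows that round trip together with gzip-compressed storage, in Python 3. It uses the standard `json` module instead of `json_tricks`, and the sample provider, ID, and file name are assumptions for illustration.

```python
import gzip
import json
import os
import tempfile


def key_to_str(sub_key):
    """("provider", "id") -> "provider__id" so it can be a JSON key."""
    return "__".join(sub_key)


def str_to_key(raw):
    """Inverse of key_to_str."""
    return tuple(raw.split("__"))


def save(path, data):
    # gzip works on bytes, so encode the JSON text explicitly.
    with gzip.open(path, "wb", compresslevel=6) as f:
        f.write(json.dumps(data).encode("utf-8"))


def load(path):
    with gzip.open(path, "rb") as f:
        return json.loads(f.read().decode("utf-8"))


subs = {("opensubtitles", "12345"): {"score": 90}}
path = os.path.join(tempfile.mkdtemp(), "subs_1.json.gz")
save(path, {key_to_str(k): v for k, v in subs.items()})
restored = {str_to_key(k): v for k, v in load(path).items()}
```

One caveat of this scheme: a provider name or ID that itself contains `"__"` would split into more than two parts on load.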
class StoredSubtitlesManager(object):
"""
manages the storage and retrieval of StoredVideoSubtitles instances for a given video_id
"""
storage = None
version = 2
extension = ".json.gz"
def __init__(self, storage, plexapi_item_getter):
self.storage = storage
@@ -130,6 +315,11 @@ class StoredSubtitlesManager(object):
def dataitems_path(self):
return os.path.join(getattr(self.storage, "_core").storage.data_path, "DataItems")
def get_json_data_path(self, bare_fn):
if not bare_fn.endswith(self.extension):
return os.path.join(self.dataitems_path, "%s%s" % (bare_fn, self.extension))
return os.path.join(self.dataitems_path, bare_fn)
def get_all_files(self):
return [fn for fn in os.listdir(self.dataitems_path) if fn.startswith("subs_")]
@@ -156,10 +346,13 @@ class StoredSubtitlesManager(object):
def delete_missing_files(self):
deleted = []
for fn in self.get_all_files():
video_id = os.path.basename(fn).split("subs_")[1]
video_id = os.path.basename(fn).split(".")[0].split("subs_")[1]
item = self.get_item(video_id)
if not item:
self.delete(fn)
if fn.endswith(".json.gz"):
self.delete(self.get_json_data_path(fn))
else:
self.legacy_delete(fn)
deleted.append(video_id)
return deleted
@@ -172,13 +365,47 @@ class StoredSubtitlesManager(object):
subs_for_video.version = 2
return True
def migrate_legacy_data(self, from_fn, to_fn):
try:
subs_for_video = self.storage.LoadObject(from_fn)
except:
logger.error("Failed to load item \"%s\": %s" % (from_fn, traceback.format_exc()))
# delete
return
if not subs_for_video or not hasattr(subs_for_video, "version"):
self.legacy_delete(from_fn)
return
# migrate to our new json format
new_subs_for_video = JSONStoredVideoSubtitles()
new_subs_for_video.deserialize(subs_for_video.__dict__)
self.save(new_subs_for_video)
self.legacy_delete(from_fn)
return new_subs_for_video
def load(self, video_id=None, filename=None):
subs_for_video = None
fn = self.get_storage_filename(video_id) if video_id else filename
try:
subs_for_video = self.storage.LoadObject(fn)
except:
logger.error("Failed to load item %s: %s" % (fn, traceback.format_exc()))
bare_fn = self.get_storage_filename(video_id) if video_id else filename
json_path = self.get_json_data_path(bare_fn)
if os.path.exists(json_path):
# new style data
subs_for_video = JSONStoredVideoSubtitles()
try:
with gzip.open(json_path, 'rb') as f:
s = f.read()
data = loads(s)
except:
logger.error("Couldn't load JSON data for %s", bare_fn)
return
subs_for_video.deserialize(data)
elif not bare_fn.endswith(".json.gz") and os.path.exists(os.path.join(self.dataitems_path, bare_fn)):
subs_for_video = self.migrate_legacy_data(bare_fn, json_path)
if not subs_for_video:
return
@@ -196,7 +423,7 @@ class StoredSubtitlesManager(object):
success = getattr(self, mig_func)(subs_for_video)
if success is False:
logger.error("Couldn't migrate %s, removing data", subs_for_video.video_id)
self.delete(fn)
self.delete(json_path)
break
if cur_ver > old_ver and success:
@@ -210,18 +437,29 @@ class StoredSubtitlesManager(object):
def load_or_new(self, plex_item):
subs_for_video = self.load(plex_item.rating_key)
if not subs_for_video:
subs_for_video = StoredVideoSubtitles(plex_item, version=self.version)
subs_for_video = JSONStoredVideoSubtitles()
subs_for_video.initialize(plex_item, version=self.version)
self.save(subs_for_video)
return subs_for_video
def save(self, subs_for_video):
data = subs_for_video.serialize()
fn = self.get_json_data_path(self.get_storage_filename(subs_for_video.video_id))
json_data = dumps(data)
with gzip.open(fn, "wb", compresslevel=6) as f:
f.write(json_data)
def delete(self, filename):
os.remove(filename)
def legacy_save(self, subs_for_video):
fn = self.get_storage_filename(subs_for_video.video_id)
try:
self.storage.SaveObject(fn, subs_for_video)
except:
logger.error("Failed to save item %s: %s" % (fn, traceback.format_exc()))
def delete(self, filename):
def legacy_delete(self, filename):
try:
self.storage.Remove(filename)
except:
+52 -15
@@ -10,8 +10,8 @@ from subliminal_patch import scan_video, refine, search_external_subtitles
logger = logging.getLogger(__name__)
def parse_video(fn, hints, external_subtitles=False, embedded_subtitles=False, known_embedded=None, forced_only=False,
video_fps=None, dry_run=False):
def parse_video(fn, video_info, hints, external_subtitles=False, embedded_subtitles=False, known_embedded=None,
forced_only=False, video_fps=None, dry_run=False):
logger.debug("Parsing video: %s, hints: %s", os.path.basename(fn), hints)
video = scan_video(fn, hints=hints, dont_use_actual_file=dry_run)
@@ -19,29 +19,58 @@ def parse_video(fn, hints, external_subtitles=False, embedded_subtitles=False, k
# refiners
refine_kwargs = {
"episode_refiners": ('sz_metadata', 'tvdb', 'omdb'),
"movie_refiners": ('sz_metadata', 'omdb',),
"episode_refiners": ('tvdb', 'sz_omdb'),
"movie_refiners": ('sz_omdb',),
"embedded_subtitles": False,
}
# our own metadata refiner :)
if "stream" in video_info:
for key, value in video_info["stream"].iteritems():
if hasattr(video, key) and not getattr(video, key):
logger.info(u"Adding stream %s info: %s", key, value)
setattr(video, key, value)
plex_title = video_info.get("original_title", video_info.get("title"))
if hints["type"] == "episode":
plex_title = video_info.get("original_title", video_info.get("series"))
if not video.year:
video.year = video_info.get("year")
refine(video, **refine_kwargs)
if not video.imdb_id:
video.imdb_id = video_info.get("imdb_id")
if video.imdb_id:
logger.info(u"Adding PMS imdb_id info: %s", video.imdb_id)
if hints["type"] == "episode":
if not video.series_tvdb_id:
logger.info(u"Adding PMS series_tvdb_id info: %s", video_info.get("series_tvdb_id"))
video.series_tvdb_id = video_info.get("series_tvdb_id")
if not video.tvdb_id:
logger.info(u"Adding PMS tvdb_id info: %s", video_info.get("tvdb_id"))
video.tvdb_id = video_info.get("tvdb_id")
# re-refine with plex's known data?
refine_with_plex = False
# episode but wasn't able to match title
if hints["type"] == "episode" and not video.series_tvdb_id and not video.tvdb_id and not video.series_imdb_id \
and video.series != hints["title"]:
logger.info(u"Re-refining with series title: '%s' instead of '%s'", hints["title"], video.series)
video.series = hints["title"]
refine_with_plex = True
if plex_title:
if hints["type"] == "episode" and not video.series_tvdb_id and not video.tvdb_id and not video.series_imdb_id \
and video.series != plex_title:
logger.info(u"Re-refining with series title: '%s' instead of '%s'", plex_title, video.series)
video.series = plex_title
refine_with_plex = True
# movie
elif hints["type"] == "movie" and not video.imdb_id and video.title != hints["title"]:
# movie
logger.info(u"Re-refining with series title: '%s' instead of '%s'", hints["title"], video.title)
video.title = hints["title"]
refine_with_plex = True
elif hints["type"] == "movie" and not video.imdb_id and video.title != plex_title:
# movie
logger.info(u"Re-refining with movie title: '%s' instead of '%s'", plex_title, video.title)
video.title = plex_title
refine_with_plex = True
# title not matched? try plex title hint
if refine_with_plex:
@@ -60,7 +89,6 @@ def parse_video(fn, hints, external_subtitles=False, embedded_subtitles=False, k
)
# add video fps info
# fixme: still needed?
video.fps = video_fps
# add known embedded subtitles
@@ -77,4 +105,13 @@ def parse_video(fn, hints, external_subtitles=False, embedded_subtitles=False, k
logger.debug('Found embedded subtitle %r', embedded_subtitle_languages)
video.subtitle_languages.update(embedded_subtitle_languages)
# guess special
if hints["type"] == "episode":
if video.season == 0 or video.episode == 0:
video.is_special = True
else:
# check parent folder name
if os.path.dirname(fn).split(os.path.sep)[-1].lower() in ("specials", "season 00"):
video.is_special = True
return video
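Review note: the "guess special" step above treats an episode as a special when its season or episode number is 0, or when its parent folder is named like a specials folder. A standalone Python 3 sketch of that check follows. The function name `guess_special` and the sample paths are hypothetical.

```python
import os


def guess_special(fn, season, episode):
    """Return True when the episode looks like a special."""
    if season == 0 or episode == 0:
        return True
    # Fall back to the name of the folder that directly contains the file.
    parent = os.path.dirname(fn).split(os.path.sep)[-1].lower()
    return parent in ("specials", "season 00")


in_specials = guess_special(os.path.join("tv", "Show", "Specials", "e.mkv"), 1, 2)
regular = guess_special(os.path.join("tv", "Show", "Season 01", "e.mkv"), 1, 2)
season_zero = guess_special(os.path.join("tv", "Show", "Season 01", "e.mkv"), 0, 5)
```

The folder comparison is case-insensitive, so `Specials`, `SPECIALS`, and `Season 00` all qualify.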
+29 -20
@@ -4,37 +4,43 @@
2
00:00:10,759 --> 00:00:12,678
ROSE: So what is it?
What's wrong?
ROSE: (Help us. Please. . .help us.)
What's "wrong"? over 9, 000!
3
00:00:12,679 --> 00:00:16,097
I don't know. Some kind
I don't know. Some kind of wrong "1 00" number
of signal, drawing the Tardis off course.
4
00:00:16,099 --> 00:00:17,224
Where are we?
this is a"subtitle" test "with a"text before colons and "peter"following: Where are we?."
5
00:00:17,225 --> 00:00:19,684
Earth. Utah, North America.
"less text before colons: Earth. Utah, North America."
MUSIC PLAYS What is that sound?!
ls it?
take them balls it
6
00:00:19,686 --> 00:00:21,103
About half a mile underground.
Ithinkyou're About half a miIe underground. ls it
Don't fix this countdown: 81, 80, 79, 78
But fix this: 81 ,00
7
00:00:21,103 --> 00:00:23,603
And when are we?
<i>(laughing): lrn gonna And when are we? (chuckles)
lrn gonna And when are we?</i>
8
00:00:24,274 --> 00:00:26,649
2012.
...2012. weII it's 1 2:00 o'clock
9
00:00:26,650 --> 00:00:29,370
God, that's so close. I should be 26!
(BIG BROTHER THEME MUSIC)
10
00:00:30,612 --> 00:00:33,112
@@ -43,32 +49,34 @@ God, that's so close. I should be 26!
11
00:00:33,658 --> 00:00:34,783
(WHOO
SHING) geil
SHING) >>geil
12
00:00:34,783 --> 00:00:36,826
Blimey.
-- Blimey.
13
00:00:36,828 --> 00:00:39,328
ROSE: Like a great big museum.
ROSE: Like a "great...big museum".
14
00:00:40,414 --> 00:00:42,914
DOCTOR: An alien museum.
DOCTOR's MOM: ''An alien museum".
15
00:00:43,542 --> 00:00:46,042
Someone's got a hobby.
Someone's got a hobby.
16
00:00:46,378 --> 00:00:49,048
They must've spent a fortune on this.
FULL UPPERCASE LINE HERE
and some text
- (chuckles)
17
00:00:49,631 --> 00:00:51,924
AGUGU
pepipi
<i>AGUGU
pepipi</i>
18
00:00:51,926 --> 00:00:55,304
@@ -263,12 +271,13 @@ Is it talking?
60
00:03:45,641 --> 00:03:48,141
(DRILLING)
<u>This will end up with an open end tag
<i>(DRILLING)</i></u>
61
00:03:53,233 --> 00:03:56,151
- Not exactly talking, no.
- Then what's it doing?
- (REMOVE ME <s> PLEASE)
- Then <i>what's</i> it doing?</s>
62
00:03:56,151 --> 00:03:57,235

Some files were not shown because too many files have changed in this diff