Compare commits

586 Commits

Author SHA1 Message Date
panni 824e2c5106 only handle sections where SZ is enabled for the primary agent; fixes #167 2016-06-12 16:03:14 +02:00
panni 5ec1f31434 add media_type constants; check on startup for which libraries sub-zero is enabled 2016-06-12 15:29:17 +02:00
panni 4f4a9a8048 wip #167 2016-06-12 07:14:52 +02:00
panni a456ae4fa7 debounce functions so plexweb navigation/refresh doesn't retrigger crucial stuff; fixes #168 2016-06-12 05:32:18 +02:00
panni b3b0ab225b check for empty config.missing_permissions 2016-06-12 03:25:51 +02:00
panni f4aa5d2bf1 add plex api metadata to scanned videos; set storage_path on PatchedSubtitle; add notify_executable handling; fixes #65 2016-06-12 02:32:40 +02:00
panni 8cc7ab5775 don't error out on empty ignore_paths 2016-06-12 01:49:16 +02:00
panni 6d4a07db2e add notify_executable setting 2016-06-12 01:48:33 +02:00
panni a0d924c3b0 cleanup ignore handling; add debug info 2016-06-11 17:07:41 +02:00
panni c201bf3ef3 add optional metadata storage fallback on filesystem failure; fixes #100 2016-06-11 16:43:45 +02:00
panni 8d45a46ee2 implement ignore by path setting; fixes #134 2016-06-11 16:34:36 +02:00
panni 6a5a9b33c2 Merge branch '#159_encoding_problems' into develop 2016-06-11 15:08:13 +02:00
panni 6d237b1781 implement real ignore list check (soft/physical); fixes #164 2016-06-05 05:09:32 +02:00
panni 46b40bf2f0 fix scheduler, self.items_searching was badly unpacked 2016-06-05 02:27:33 +02:00
panni 546c258c82 add generic get_item_thumb supporting sections, episodes and everything else; display show thumbs on episode items; display section art on sections 2016-06-04 05:15:57 +02:00
panni f6031e9b9c show section art if available 2016-06-04 04:48:25 +02:00
panni b6480f9e32 move get_item to support.items; 2016-06-04 04:39:39 +02:00
panni b830aba31c add thumb for recently added 2016-06-04 04:29:34 +02:00
panni c6b0c95aa4 use default_thumb instead of thumb 2016-06-04 03:54:27 +02:00
pannal 129f58c059 Merge pull request #165 from ukdtom/master
Still some work to be done, but great, thanks :)
2016-06-04 02:13:25 +02:00
Tommy Mikkelsen c10242b388 Take two on a better channel menu 2016-06-02 00:48:34 +02:00
panni 5c0a430d84 set bases of subtitles, not provider classes 2016-05-29 17:53:37 +02:00
panni 382afa52e9 add encoding detection for Czech, Slovak, Hungarian, Slovene, Bosnian, Croatian, Serbian (Latin script), Romanian (before 1993 spelling reform), Albanian, Serbian and Macedonian; fixes #162 2016-05-29 04:27:28 +02:00
panni 8fd5191685 use get_viable_encoding() on permission check and subtitle finding; may fix #159 2016-05-29 04:01:41 +02:00
panni ce67d74980 intent now handles multiple keys; fixes #160
(cherry picked from commit a238f1875e36417a19ae27e499c0943645047d90)
2016-05-29 03:21:20 +02:00
pannal 0e95e67d7e Update README.md 2016-05-18 23:48:34 +02:00
pannal 26e7a572d4 Update README.md 2016-05-16 03:02:21 +02:00
pannal 0d3d27c343 Update README.md 2016-05-16 03:02:04 +02:00
pannal 97764cbac8 Update README.md 2016-05-16 02:22:49 +02:00
pannal 883d9b60ee Merge pull request #158 from ukdtom/master
Updated Readme.md and added Sub-Zero to TV/The movie db
2016-05-16 02:15:52 +02:00
Tommy Mikkelsen 24f6a8e1f2 Merge branch 'master' of https://github.com/ukdtom/Sub-Zero.bundle
Conflicts:
	README.md

	modified:   README.md
	new file:   Wiki/Images/Advanced_1.png
	new file:   Wiki/Images/Channel_1.png
	new file:   Wiki/Images/Channel_2.png
	new file:   Wiki/Images/Channel_3.png
2016-05-15 19:59:34 +02:00
Tommy Mikkelsen fa366f2789 Update README.md 2016-05-15 19:52:11 +02:00
Tommy Mikkelsen 2bbe7d15eb Update README.md 2016-05-15 19:48:40 +02:00
Tommy Mikkelsen c5e3dda387 Update README.md 2016-05-15 19:34:40 +02:00
Tommy Mikkelsen 0184c41c8e Update README.md 2016-05-15 19:09:56 +02:00
Tommy Mikkelsen 0c8b0c1dd9 Update README.md 2016-05-15 19:07:07 +02:00
Tommy Mikkelsen 71e5c74b77 Update README.md 2016-05-15 19:05:43 +02:00
Tommy Mikkelsen 21ab566cff Update README.md 2016-05-15 19:04:12 +02:00
Tommy Mikkelsen 20e475cfb7 Update README.md 2016-05-15 18:59:13 +02:00
Tommy Mikkelsen febf592db6 Update README.md 2016-05-15 18:58:28 +02:00
Tommy Mikkelsen fe94358f0c Merge pull request #157 from ukdtom/master
new file:   Wiki/Images/Advanced_1.png
2016-05-15 18:45:22 +02:00
Tommy Mikkelsen 0cb560b856 new file: Wiki/Images/Advanced_1.png 2016-05-15 18:44:00 +02:00
Tommy Mikkelsen faa0bb7550 Merge pull request #156 from ukdtom/master
Channel pics
2016-05-15 17:39:50 +02:00
Tommy Mikkelsen 1d7df79465 new file: Wiki/Images/Channel_1.png
new file: Wiki/Images/Channel_2.png
new file: Wiki/Images/Channel_3.png
2016-05-15 17:37:26 +02:00
Tommy Mikkelsen 72f2a4fc86 Merge pull request #155 from ukdtom/master
new images for the wiki
2016-05-15 16:51:13 +02:00
Tommy Mikkelsen 8434eb4ff4 new file: Wiki/Images/Agent_Conf1.png
new file: Wiki/Images/Agent_Conf2.png
new file: Wiki/Images/Agent_Conf3.png
new file: Wiki/Images/Agent_Conf4.png
2016-05-15 16:45:50 +02:00
Tommy Mikkelsen ba4280ee4e Merge branch 'master' of https://github.com/ukdtom/Sub-Zero.bundle 2016-05-15 16:42:20 +02:00
Tommy Mikkelsen 34f34cef4d Merge branch 'master' of https://github.com/ukdtom/Sub-Zero.bundle
Updating local repo
2016-05-15 16:31:33 +02:00
Tommy Mikkelsen 30f21d71c8 modified: Contents/Code/__init__.py
Added Sub-Zero as a provider for TV-Shows/The Movie DB
2016-05-15 16:30:07 +02:00
Tommy Mikkelsen 592d264b19 Update README.md 2016-05-15 01:09:29 +02:00
Tommy Mikkelsen 9d55dca0e1 Update README.md 2016-05-15 00:35:41 +02:00
Tommy Mikkelsen da4111904c Update README.md 2016-05-15 00:22:01 +02:00
Tommy Mikkelsen a4b9358f14 Update README.md 2016-05-15 00:21:15 +02:00
Tommy Mikkelsen 122c6527d4 Update README.md 2016-05-14 23:42:16 +02:00
Tommy Mikkelsen 844b76e116 Update README.md 2016-05-14 23:28:43 +02:00
Tommy Mikkelsen f262009349 Update README.md 2016-05-14 23:26:24 +02:00
Tommy Mikkelsen bc1a4ceb42 Update README.md 2016-05-14 22:29:10 +02:00
Tommy Mikkelsen a8ba984064 Update README.md 2016-05-14 22:25:32 +02:00
Tommy Mikkelsen fda6dab572 Update README.md 2016-05-14 22:02:43 +02:00
Tommy Mikkelsen 4cdb777840 Merge pull request #154 from ukdtom/master
modified:   Wiki/Images/Conf-2.png
2016-05-14 16:43:48 +02:00
Tommy Mikkelsen f94d9595a8 modified: Wiki/Images/Conf-2.png
new file: Wiki/Images/Conf-3.png
new file: Wiki/Images/Conf-4.png
new file: Wiki/Images/Conf-5.png
new file: Wiki/Images/Conf-6.png
2016-05-14 16:19:36 +02:00
panni 5d38bd26a2 reset plugin dev mode to 0 2016-05-14 05:06:47 +02:00
panni 9239261c5a Merge remote-tracking branch 'origin/master' 2016-05-14 05:04:57 +02:00
panni e3aed706fb move generic functions to support/plex_media
(cherry picked from commit 6dd87e7)

merge fixes; add test.py; cleanup
2016-05-14 05:04:17 +02:00
pannal 89d87c6356 Merge pull request #153 from ukdtom/master
Some pics for the yet to be Wiki
2016-05-14 04:22:36 +02:00
panni a0cfe0b6fd move generic functions to support/plex_media
(cherry picked from commit 6dd87e7)
2016-05-14 04:17:12 +02:00
panni 476c311e01 leftover fixes CamelCase to snake; add TriggerListAvailableSubsForItem
(cherry picked from commit 38239f5)
2016-05-14 04:06:25 +02:00
panni bb10b8fffa CamelCase to snake_case for Sub-Zero base
(cherry picked from commit 1313abc)
2016-05-14 03:59:49 +02:00
panni 4a8fa4a838 docstrings scanVideo rename part parameter to plex_video
(cherry picked from commit 7fc2148)
2016-05-14 03:48:35 +02:00
panni 624b844454 move item hinting to support.helpers.get_item_hints
(cherry picked from commit 6f38f06)
2016-05-14 03:48:10 +02:00
panni 027f1f4045 add series/season (force)-refresh;
(cherry picked from commit 495848e)
2016-05-14 03:45:05 +02:00
panni 28d66dc162 fix SSA (other-than-SRT) handling; fixes #138 2016-05-14 03:33:37 +02:00
Tommy Mikkelsen 3995e732f6 modified: Images/Conf-2.png 2016-05-09 00:58:38 +02:00
Tommy Mikkelsen 60f553707a new file: Images/Conf-2.png 2016-05-09 00:47:07 +02:00
Tommy Mikkelsen c37e2ceaab new file: Images/Conf-1.png 2016-05-09 00:23:02 +02:00
Tommy Mikkelsen abd7922700 Select Gear-Icon 2016-05-09 00:07:44 +02:00
Tommy Mikkelsen c47389426e Select Channels img 2016-05-09 00:03:12 +02:00
Tommy Mikkelsen 6b5c7bd14b First image for the Wiki 2016-05-08 22:51:02 +02:00
panni cb072c2aa6 add fixme 2016-05-08 05:12:01 +02:00
panni 533649c791 reset plexplugindevmode=0 2016-05-08 05:10:22 +02:00
panni 3105f2e8ae fix #148; use inplace patched request/response objects for plex.py with HTTP.Request to skip plex.tv token requirement 2016-05-08 05:07:48 +02:00
pannal 8160bc98fd update licenses 2016-05-05 05:02:40 +02:00
panni 8a1b615fe9 release 1.3.33.522 2016-05-05 04:46:49 +02:00
panni 3f3bb2d830 Merge branch '#151_permission_check_windows' into 1.3-bugfixes 2016-05-05 04:18:40 +02:00
panni ae4871f6dd Merge branch '#149_treat_one_as_found' into 1.3-bugfixes 2016-05-05 04:18:34 +02:00
panni f46da7b12f Merge branch '#138_support_other_formats' into 1.3-bugfixes 2016-05-05 04:18:27 +02:00
panni b3f5bdd58d use locking on intents; fixes #118 2016-05-05 04:09:48 +02:00
panni ca8ecd297b try to handle other subtitle formats and return .srt; fixes #138 2016-05-05 03:41:27 +02:00
panni d954d25a73 fix #149; if we've got a subtitle for a file and we only want one (without language suffix), treat any subtitle as a found one 2016-05-05 03:01:22 +02:00
panni bda261b495 check for correct library permissions on windows; fixes #151 2016-05-05 02:37:46 +02:00
panni af3142546e update readme 2016-04-23 00:09:10 +02:00
panni 05b9a400fd update readme/changelog/info to 1.3.31.513 2016-04-22 23:36:21 +02:00
panni 5a0f6969d9 finally call dict.save() at the end, always 2016-04-22 23:13:20 +02:00
panni 7ea0f3f73b update six to 1.10.0 2016-04-22 22:48:17 +02:00
panni a383682147 Merge remote-tracking branch 'origin/check_permissions' into intermediate_release 2016-04-22 22:41:09 +02:00
panni 7dd414bc8f resolve #143 check permissions on plugin start 2016-04-21 14:02:26 +02:00
pannal 32fca9dadb Update README.md 2016-04-19 22:51:01 +02:00
pannal dd75eacebf update installation instructions 2016-03-02 10:04:31 +01:00
pannal ca42e7e7f1 Merge pull request #135 from plexinc-agents/master
Remove default 'Username' value from 'Addic7ed Username'; update logo
2016-03-01 10:21:19 +01:00
sander1 3430702d51 Update logo (make it 512x512px and jpeg). 2016-03-01 00:43:16 +01:00
sander1 653a9087c4 Remove default 'Userame' value from 'Addic7ed Username'. 2016-03-01 00:42:39 +01:00
panni 05889e7554 move ignore list to the bottom 2016-02-27 03:51:42 +01:00
panni 7fca0cd201 add top menu item for refreshing the current state 2016-02-27 03:47:49 +01:00
panni 7321c9095e fix #101 patch earlier 2016-02-23 18:36:38 +01:00
panni bbb83d9cad fix #101 better encoding detection with bs4 fallback 2016-02-23 18:32:29 +01:00
panni 275023c844 fix #128 actually use subliminal's subtitle.text if applicable
(cherry picked from commit 101da21)
2016-02-23 18:01:01 +01:00
panni 35c6aee5dd fix #128 add utf-8 enforcing 2016-02-21 05:59:29 +01:00
panni 25686e981f updated beautifulsoup to 4.4.1 2016-02-21 04:53:07 +01:00
panni ad8022666f re-add chardet license 2016-02-21 04:40:38 +01:00
panni 68f246cda5 update chardet to 2.3.0 2016-02-21 04:39:34 +01:00
panni cc977fce35 fix #126; re-add single language setting 2016-02-21 04:23:21 +01:00
panni d4f7e2712e update configuration docs again 2016-01-31 04:35:19 +01:00
panni 9a89b01741 update configuration docs 2016-01-31 04:33:22 +01:00
panni 009938bc06 bump to 1.3.27.491 2016-01-31 04:28:21 +01:00
panni 487e933c25 catch guessit/transfo AttributeError in download_best_subtitles; fixes #120 2016-01-31 04:01:38 +01:00
panni 539f621c0b again menu unicode fixes 2016-01-31 03:57:20 +01:00
panni bfe9860c92 TVSubtitles: remove greediness off link_re series match - correctly detect "Series Name (country)"; fixes #121 2016-01-31 03:52:13 +01:00
panni 5b5645e042 more menu unicode fixes 2016-01-31 03:49:04 +01:00
panni b9053d1dfd import intent early 2016-01-31 03:33:38 +01:00
panni a7084ecd88 fix item refresh menu unicode errors 2016-01-31 03:27:44 +01:00
panni b3b301332c make tag/exact filename search optional; fixes #123 2016-01-31 03:15:07 +01:00
panni eb43778718 Merge branch 'title_match' into develop 2016-01-31 02:56:04 +01:00
panni d6ed4e6b0b OpenSubtitles: handle "0.000" subtitle fps 2016-01-31 02:39:39 +01:00
panni e38d696ac9 score episode title as zero 2016-01-30 05:57:05 +01:00
panni 07c3a48657 treat unspecified fps as no given fps #119 2016-01-24 07:34:03 +01:00
panni ebede7a297 add markerlib; fix #115 2016-01-24 07:27:41 +01:00
panni 59ab5e16cc revert: treat 23.976 fps like 23.98 #119 2016-01-24 07:11:04 +01:00
panni 889399fc04 treat 23.976 fps like 23.98 #119 2016-01-24 07:04:54 +01:00
panni 69eda1420b broken debug log 2016-01-24 06:51:31 +01:00
panni 063920a2a5 detect and match FPS; maybe fix #119 2016-01-24 06:41:18 +01:00
panni 1623ee858f move enable_channel function to menu_helpers; fix #111 2016-01-10 04:11:16 +01:00
panni 1edd13b229 rename channel enable setting #111 2016-01-10 04:10:24 +01:00
panni d0b6fbb7b4 wrap @route and @handler and add global channel disabling #111 2016-01-10 03:39:26 +01:00
panni 4fc21a29e3 rename SubZeroAgent.agent_type_short to agent_type_verbose 2016-01-04 03:03:41 +01:00
panni 3e6d03eea1 core: simplify tv/movie agent detection 2016-01-04 02:57:24 +01:00
panni b950485f6c bump 2016 2016-01-03 03:32:25 +01:00
panni 3007c0d57f messed up the versioning. 1.3.23.459 release 2016-01-03 03:16:30 +01:00
panni 5a2b30432c Merge branch 'master' into develop 2016-01-03 03:13:07 +01:00
panni cfb66db035 1.3.20.459 release 2016-01-03 03:11:23 +01:00
panni 1eec18b76d Merge branch 'master' into develop 2016-01-03 02:59:06 +01:00
panni 1d2bfe2195 1.3.20.422 release 2016-01-03 02:58:12 +01:00
panni f4a13b2e7a Merge branch 'master' into 1.3-stable 2016-01-03 02:55:43 +01:00
panni b29667b9f6 1.3.20.422 release 2016-01-03 02:55:01 +01:00
pannal dcd21aab1c Merge pull request #108 from pannal/opensubtitles-smarty
Opensubtitles: Implement tag matching
2016-01-02 22:23:47 +01:00
panni bfbfcd2d8b OpenSubtitles: QueryParameters seems optional 2016-01-01 19:44:34 +01:00
panni bb72181359 OpenSubtitles: move tag above imdb_id 2016-01-01 05:46:21 +01:00
panni 2d0b9ab9f1 OpenSubtitles: fix QueryParameters usage 2016-01-01 05:27:28 +01:00
panni 291f462955 OpenSubtitles: os.path.basename on video.name 2015-12-31 06:06:51 +01:00
panni 74d6de9c78 prefs: rename label for physical ignore 2015-12-31 04:57:45 +01:00
panni 9f99390145 readme: add documentation for physical ignore 2015-12-31 04:57:25 +01:00
panni 8cdf12bafd readme: clarify scan: include embedded subtitles 2015-12-31 04:34:36 +01:00
panni 2b5442a2a8 readme: add plex signup link; minor corrections 2015-12-31 04:31:02 +01:00
panni ebb9f42771 readme: more detailed recommended-steps docs 2015-12-31 04:26:08 +01:00
panni c75e2b778f readme: add registration links to OS and addic7ed #105 2015-12-31 04:23:16 +01:00
panni 1f6d198bf5 add opensubtitles configuration details; add recommended section to usage 2015-12-31 04:19:50 +01:00
panni 30bbfc37fc don't treat embedded forced subtitles as found embedded subtitles; fixes #106 2015-12-31 04:07:44 +01:00
panni 5a693ae673 pep8 2015-12-31 03:50:07 +01:00
panni 38325f84ac OpenSubitles: list_subtitles: provide tag parameter to query 2015-12-31 03:48:14 +01:00
panni 0eebd164ec OpenSubitles: store QueryParameters for debug logging 2015-12-31 03:33:33 +01:00
panni f60b730411 OpenSubitles: treat a tag match like a hash match 2015-12-31 03:16:28 +01:00
panni 16db1db748 remove "format" from the hash validation for now 2015-12-31 03:16:01 +01:00
panni 818cf4bc33 don't fail on empty hint (most likely command line debugging) 2015-12-31 03:15:37 +01:00
panni 789b7ba9aa Merge remote-tracking branch 'origin/master' into opensubtitles-smarty 2015-12-31 02:40:58 +01:00
pannal fdf389f62c Merge pull request #107 from infernix/master
Define sub_dir_* only if use_filesystem is true
2015-12-29 17:00:28 +01:00
Gerben Meijer 8ae8433463 Define sub_dir_* only if use_filesystem is true 2015-12-29 14:09:23 +01:00
pannal 71464cd5bf Merge pull request #104 from pannal/deeper_guessit_hinting
fix video referenced before assignment; hint guessit two parent folde…
2015-12-28 16:44:05 +01:00
panni 44ca3b9e34 fix video referenced before assignment; hint guessit two parent folders of an episode and one of a movie 2015-12-14 19:53:08 +01:00
pannal 2dd24f02c6 MediaTree has no len 2015-12-06 15:04:18 +01:00
panni a6e6bc810a error if media not given 2015-12-06 06:28:13 +01:00
panni 9d00a82343 move IGNORE_FN 2015-12-06 06:24:59 +01:00
panni 1246c53c77 Merge remote-tracking branch 'origin/master' 2015-12-06 06:22:45 +01:00
panni 8fc10c873e move flattenToParts 2015-12-06 06:22:12 +01:00
panni b48aac638f Merge remote-tracking branch 'origin/master' into 1.3-stable 2015-12-06 06:20:28 +01:00
panni e427565fcf add filesystem ignore mode; fixes #87 2015-12-06 06:18:35 +01:00
panni 0e028b3ffe flatten the agents even more 2015-12-06 05:33:28 +01:00
panni c81e3a7def generify scanTvMedia and scanMovieMedia to scanParts 2015-12-06 05:23:55 +01:00
pannal 669c9b4fb7 Update README.md 2015-12-05 15:17:36 +01:00
panni 5f015c3d69 1.3.20.422 2015-12-05 04:50:57 +01:00
panni faa46a7e4d update settings descriptions 2015-12-05 04:32:50 +01:00
panni 70d2a225f3 do not retry on generic providererror 2015-12-05 04:22:02 +01:00
panni 1521a77281 catch ProviderError; fixes #60 2015-12-05 04:20:50 +01:00
panni 516551714b tvsubtitles: re-re-fix dashes in series name matching; stupid; fixes #93 2015-12-05 04:13:18 +01:00
panni e794122b7f addic7ed: match show ids with language modifier to non-modifier (US/UK...); fixes #90 2015-12-05 03:50:32 +01:00
panni 67282d1ebd reuse use_filesystem instead of accessing prefs again 2015-12-05 03:18:00 +01:00
panni 2c5975cf26 Merge remote-tracking branch 'origin/master' 2015-12-05 03:17:18 +01:00
panni dc142281f5 really skip filesystem if only metadata is wanted; fixes #94 2015-12-05 03:16:42 +01:00
pannal 5a445fc5bd Merge pull request #96 from Erliz/master
Add hama in to supported agents
2015-12-04 01:32:24 +01:00
pannal 7ff2f97ac3 Merge pull request #92 from pannal/unicode_test
fix unicode problems
2015-12-01 01:21:32 +01:00
pannal d47492188e unicodize title parameter in SectionMenu 2015-11-30 19:46:25 +01:00
panni 263d3e7546 use http by default, not https, for local API queries 2015-11-29 04:05:18 +01:00
panni ce31bf63e9 newline 2015-11-29 03:57:25 +01:00
panni 3c030dd6c3 use UnicodeDammit for path 2015-11-29 03:07:09 +01:00
panni 147c3dfe9d Merge remote-tracking branch 'origin/master' 2015-11-28 01:36:29 +01:00
panni c2e820f851 encoding test 2015-11-28 01:35:17 +01:00
pannal f53f5f1870 CFBundleShortVersionString 1.3.20 2015-11-27 15:00:44 +01:00
panni c20ecaa616 Merge remote-tracking branch 'origin/1.3-stable' into 1.3-stable 2015-11-27 02:03:24 +01:00
panni 2cc270708a 1.3.20.403 2015-11-27 02:01:49 +01:00
panni ddf7d4fc96 add debug logging for found metadata subtitles 2015-11-27 01:58:25 +01:00
panni 1e73b530ed leftover import 2015-11-27 01:43:29 +01:00
panni 5c4a1275fb Merge branch 'master' into opensubtitles-smarty
Conflicts:
	Contents/Libraries/Shared/subliminal_patch/patch_providers/opensubtitles.py
2015-11-27 00:08:13 +01:00
panni d55a809493 don't use unneeded subtitle metadata proxy info 2015-11-26 22:30:02 +01:00
Stanislav Vetlovskiy 50ecf71879 Add hama in to supported agents 2015-11-26 22:29:00 +03:00
panni af7434e35d opensubtitles, movies: use query even if hash, size or imdb_id are known 2015-11-26 19:50:47 +01:00
panni ad7239c5d8 set default score to 85 again 2015-11-26 10:34:03 +01:00
panni f90efceac3 catch logging error on unexpected metadata storage 2015-11-26 10:31:46 +01:00
panni c6f70dccca punctuation fixes for & and dash 2015-11-23 15:36:22 +01:00
pannal eca358e73a Merge pull request #85 from pannal/master
new stable
2015-11-22 03:48:43 +01:00
panni 4e6ce7e8bb 1.3.20.396 2015-11-22 03:22:13 +01:00
panni a2049200b1 lower minimum tv series score to 67; series=44 + season=11 + episode=11 + hearing_impaired=1 2015-11-22 02:38:16 +01:00
panni b10306aca0 rename dry to dont_use_actual_file in scan_video 2015-11-22 02:12:46 +01:00
panni aaf430cae8 let guessit see only the parent directory and the filename; add dry parameter to scan_video for testing 2015-11-22 01:51:27 +01:00
panni e7ee9e3304 more debug testing on scanVideo 2015-11-22 00:54:38 +01:00
panni a4f65adda9 register subzero's libraries' logging handlers exclusively 2015-11-22 00:48:29 +01:00
panni d38b90d1f3 move debug log call 2015-11-22 00:26:54 +01:00
panni a07a4a167c try to catch enzyme scanning errors generally 2015-11-22 00:13:10 +01:00
panni a77c29af48 blank 2015-11-21 17:05:30 +01:00
panni 4044f3e787 don't fail on unscanned video file; fixes #84 2015-11-21 17:03:28 +01:00
panni 70de96a9e8 don't fail on wrong proxy info 2015-11-21 17:00:38 +01:00
panni 014f34d813 add tag to possible opensubtitles query 2015-11-20 00:50:36 +01:00
panni 8fdc50b2aa regression: actually refresh the menu again 2015-11-19 22:48:50 +01:00
panni 88874fb9b6 bad merge 2015-11-19 22:22:27 +01:00
panni 11ad4cdeac Merge remote-tracking branch 'origin/master'
Conflicts:
	Contents/Code/support/missing_subtitles.py
	Contents/DefaultPrefs.json
2015-11-19 22:19:53 +01:00
panni c5f1b39fba 1.3.19.379 2015-11-19 22:13:41 +01:00
panni 6eb8af8fd5 make max_recent_items_per_library configurable 2015-11-19 21:45:18 +01:00
panni 2ec3b393fc make logging to console configurable, default off 2015-11-19 19:42:23 +01:00
panni 7a2977d4c8 remove thesubdb support 2015-11-19 19:19:50 +01:00
panni b987142b3f add fixme; set correct default value for flat 2015-11-15 17:48:32 +01:00
panni 22656d62d4 remove debug print 2015-11-15 17:44:49 +01:00
panni 7d6693e206 add logging configuration handlers; implement dynamic ignore list 2015-11-15 17:33:29 +01:00
panni c3f2bb4d21 add log_level setting; remove blacklist prefs 2015-11-15 16:44:16 +01:00
panni e154019d07 move builtins restoring to shared library to make it more readable :) 2015-11-14 06:41:51 +01:00
panni 1b891eba73 add ignore list management to menu; add key_order ordering to ignore list; slightly break out of the sandbox 2015-11-14 06:37:02 +01:00
panni 38e5f8e4e9 add IgnoreListMenu dummy; make IgnoreMenu smarter so it can be used programatically (don't toggle) 2015-11-14 04:31:39 +01:00
panni 428ab4c6d7 added proof of concept to restore globals (sandbox) 2015-11-14 04:03:22 +01:00
panni 27ce34bce6 change some obsolete no_history replace_parent attributes which do nothing 2015-11-14 02:15:59 +01:00
panni 6fb5760a6a store and display last state in addition to current state in menu 2015-11-13 17:23:26 +01:00
panni 2e2fd1580d only match hash if format also is right 2015-11-13 14:56:40 +01:00
panni 8ab826d27d move DictProxy to subzero.lib to avoid sandbox 2015-11-13 14:55:45 +01:00
panni d1f33baa30 explicitly save ignore list 2015-11-13 07:02:31 +01:00
panni 7239941168 save ignore list on setitem 2015-11-13 06:57:19 +01:00
panni ca00e8680d rename interface.helpers; add ignorelist log and reset functions; add title storage to ignore list for later use 2015-11-13 06:53:02 +01:00
panni 57d9e0c600 correctly move ignore stuff 2015-11-13 06:21:39 +01:00
panni f2811422f0 move menu and ignore stuff 2015-11-13 06:10:10 +01:00
panni 0f71d2e0e2 add support/ignore; add ignore option to sections, series, items 2015-11-13 05:53:31 +01:00
panni 388c4baa15 add iter to Libraries/Shared/subzero, because somehow we can't have it in the sandbox 2015-11-13 05:51:54 +01:00
panni 13a8c2facd fix typo; simplify hash validity detection 2015-11-13 00:46:12 +01:00
panni def5a26d98 reduce info logger to debug 2015-11-13 00:40:35 +01:00
panni d1ad72b0f2 correct title doesn't automatically mean episode and season are correct 2015-11-13 00:39:08 +01:00
panni da62656f7e correct hash matching, but only if important other stuff matches 2015-11-13 00:30:04 +01:00
pannal da3e2399f7 subtitles.scan.embedded now default false 2015-11-12 13:31:59 +01:00
panni c70af212d1 remove redundant menu description 2015-11-11 23:42:59 +01:00
panni 8becc8bd72 use pprint.pformat for storage logging 2015-11-11 23:40:34 +01:00
panni 5bc0307242 show the refresh trigger action in the menu state, also; add doc 2015-11-11 23:38:35 +01:00
panni 034b2975d6 clean up menu items; show current plugin state (restart, force/refreshing) on the refresh button in the menu; add intent resolving 2015-11-11 23:33:19 +01:00
panni 3ffde8c52b add comment 2015-11-11 22:47:22 +01:00
panni b125a747c8 handle all possible media types in section/first_character interface 2015-11-11 22:46:20 +01:00
panni 00e656dbce better # support; add Track parsing to section/first_character interface 2015-11-11 22:43:44 +01:00
panni a7f6224237 use the item title in firstlettermetadatamenu instead of key, to support # 2015-11-11 22:31:11 +01:00
panni 81f469531b add "All" to firstCharacter view 2015-11-11 22:25:38 +01:00
panni a4794d1619 remove section from item name; show current breadcrumbs in title2 2015-11-11 19:04:15 +01:00
panni d6b7bd1194 add version to title; better section/letter title 2015-11-11 18:19:25 +01:00
panni c0169afbc2 implement dynamic section menu; use a section/X/firstCharacter based menu if too many items are in one section to display in one go 2015-11-11 18:03:58 +01:00
panni 19fcc6a175 add function to get size of a section; special Directory handling to support sections/X/firstCharacter 2015-11-11 18:03:07 +01:00
panni cada8483fe add simple plex api query for retrieving basic information, without using any big parsing library 2015-11-11 18:02:07 +01:00
panni 2464894fd5 Plex.py: add size property to Directory object; Plex.py implement firstCharacter section filtering interface 2015-11-11 18:01:22 +01:00
panni d700df9a60 don't use hash for an episode if season and episode index don't match; fixes #80 2015-11-11 16:03:52 +01:00
panni 273a376a4a add size and total_size to plex.py's MediaContainer parser for later usage for pagination 2015-11-09 23:30:21 +01:00
panni 41b78d80e4 fix #81 2015-11-09 22:56:24 +01:00
panni d904462417 better fix than the previous quick one 2015-11-09 22:52:12 +01:00
panni 6bf9836f57 quick fix for empty season or episode index 2015-11-09 22:41:32 +01:00
pannal 92c4a2af59 do the ignore list bailout a bit earlier 2015-11-09 22:38:42 +01:00
panni bbeced7e7e re-add the upper limit of 200 per section 2015-11-08 21:49:01 +01:00
panni c94295b472 remove item count limitation on recently added 2015-11-08 18:49:22 +01:00
panni 4905429bb0 don't use /recentlyAdded per section anymore, but do a real item search 2015-11-08 16:03:25 +01:00
panni c0d60222aa finalize library-digger interface 2015-11-08 15:50:00 +01:00
panni 312c6c9729 menu update 2015-11-08 06:48:59 +01:00
panni 137cb6bb45 Merge branch 'recently-added' into menu-more
Conflicts:
	Contents/Code/interface/menu.py
	Contents/Code/support/items.py
2015-11-08 05:27:07 +01:00
panni bc3408c25d correct description 2015-11-08 04:19:13 +01:00
panni 5cb8e5e49c cleanup 2015-11-08 04:07:00 +01:00
panni 36b924443d use new recent items in recentlyAddedItems task 2015-11-08 03:58:10 +01:00
panni 5122935e10 finalize real recently added items with missing subtitles 2015-11-08 03:54:57 +01:00
panni b5176600f4 temporarily support both recently_added implementations 2015-11-07 06:16:52 +01:00
panni e073a3c289 blank current recently_added implementation 2015-11-07 06:06:22 +01:00
panni 18c2f782c2 test new recently_added implementation 2015-11-07 05:58:36 +01:00
panni 6449513cb8 remove mutable parameters 2015-11-07 04:30:57 +01:00
panni f56e39e3c2 use native String.UUID instead of uuid.uuid1 2015-11-07 02:54:17 +01:00
panni 90e423b62c 1.3.6.316 2015-11-06 22:21:47 +01:00
panni 8e455b48c3 add doc 2015-11-06 22:14:08 +01:00
panni c0d54dc6dd add doc 2015-11-06 22:07:46 +01:00
panni 3d7f4ba844 Merge branch '1.3-fixes' 2015-11-06 22:06:32 +01:00
panni ae4a0f8caa remove speedup, readd delay to 1 second 2015-11-06 19:54:45 +01:00
panni 61e02f0666 task speedup 2015-11-06 19:22:13 +01:00
panni ee9460d43e Merge branch '1.3-fixes' 2015-11-06 18:32:44 +01:00
panni 264c640036 Merge branch 'hint-guessit' 2015-11-06 18:32:15 +01:00
panni 8ae0c9bee1 report failed items to the logs after finishing the task 2015-11-06 17:14:59 +01:00
panni 670b2d18b4 try a stalled item for 4 times, then skip it 2015-11-06 17:10:31 +01:00
panni 4a37f1e6f0 add stalled items handling 2015-11-06 17:06:13 +01:00
panni 897bdff957 1.3.6.304 2015-11-06 15:35:52 +01:00
panni f1893517e0 handle rare cases of getfilesystemencoding==ANSI_X3.4-1968 2015-11-06 15:21:45 +01:00
panni 4b510f1ff6 handle filesystemencoding==ascii 2015-11-06 15:08:15 +01:00
panni 961944b0b2 patch subliminal.api.save_subtitles to work with the correct filesystem encoding 2015-11-06 14:24:03 +01:00
panni 93d0959766 fix simplejson warning 2015-11-06 14:13:44 +01:00
panni 00a5678784 correct is_recent; when searching for missing subtitles, don't refresh all at once 2015-11-06 13:59:31 +01:00
panni c34373cc00 test deep menu; make getMergedItems be more like getItems 2015-11-06 04:09:44 +01:00
panni d2992adddb correctly hint type 2015-11-06 00:20:39 +01:00
panni 0d826be66e hint guessit to the correct title and series if applicable 2015-11-06 00:10:40 +01:00
panni 67d4250c71 regression, ids needed after all 2015-11-05 23:26:25 +01:00
panni 9c2b7aead1 1.3.6.297 2015-11-05 22:36:47 +01:00
panni 67ad6cd551 reformat 2015-11-05 22:11:21 +01:00
panni a4d1ee4be0 reformatted subliminal_patch 2015-11-05 21:52:27 +01:00
panni 72b725c933 remove leftover scannedVideo.id storage 2015-11-05 20:15:26 +01:00
panni 7a308e5aed reformat menu.py; add scheduler.init_storage and call it on storage reset aswell 2015-11-05 20:09:03 +01:00
panni 7dd4bdbf74 reset self.items_searching_ids and move self.running = False 2015-11-05 20:00:16 +01:00
panni 5560afcd8f reformat DefaultPrefs; move plex credentials to the top 2015-11-05 19:52:43 +01:00
panni e2c90548ed split task run logic into prepare(), run() and post_run(); remove running as a stored parameter; get correct item ids while task is running 2015-11-05 19:51:54 +01:00
panni dd050ba770 re-add path encoding 2015-11-05 18:31:49 +01:00
panni d2e67af495 Merge remote-tracking branch 'origin/master'
Conflicts:
	Contents/Code/support/localmedia.py
2015-11-05 18:29:11 +01:00
panni b870175031 pep8; add .idea to gitignore; reformat project 2015-11-05 18:23:50 +01:00
pannal f8fc50b37b actually use the file system encoding and utf-8 as a fallback 2015-11-04 23:36:27 +01:00
pannal 730a46e32f utf-8ify file path in localmedia 2015-11-04 23:29:28 +01:00
pannal a06343b1f1 clarify on initial refresh 2015-11-04 23:11:48 +01:00
panni 675fcf8dbc remove ascii-enforcing on menu items, let plex decide 2015-11-04 22:58:11 +01:00
panni 7ef23c8434 menu: add log option for internal storages; let tasks handle their running state 2015-11-04 22:40:10 +01:00
panni 8ae7d5b755 1.3.5.281 2015-11-02 22:13:14 +01:00
pannal 46ce038238 fix no previous task storage existing raises error on signal 2015-11-02 21:59:21 +01:00
pannal d4b3e7680a Merge pull request #67 from pannal/1.3.0
1.3.5.273
2015-11-02 20:00:58 +01:00
pannal c64cdc6525 Update README.md 2015-11-02 20:00:09 +01:00
pannal 5c4bd03c94 Update README.md 2015-11-02 19:58:40 +01:00
pannal 06fe8f3144 Update README.md 2015-11-02 19:56:49 +01:00
pannal 9044090afd Update README.md 2015-11-02 19:56:27 +01:00
panni c282ff2dfb 1.3.5.273 2015-11-02 19:55:23 +01:00
panni 1e45429795 1.3.0.273 2015-11-01 16:52:40 +01:00
panni ba73109b5c Merge remote-tracking branch 'origin/1.3.0' into 1.3.0 2015-11-01 05:00:49 +01:00
panni aee03abc63 time.sleep instead of Thread.Sleep 2015-11-01 04:52:42 +01:00
panni d56bc38aeb enforce ascii on item titles 2015-11-01 04:45:05 +01:00
panni 995b917ae6 handle single refreshes while missing subtitles task is running 2015-11-01 04:32:03 +01:00
panni 821e35ebab better menu; actually skip task if already running 2015-11-01 04:19:13 +01:00
panni ecf942d267 add refresh menu item to channel 2015-11-01 03:20:33 +01:00
panni 8061dd2ed4 remove debug print 2015-11-01 02:16:13 +01:00
panni 4962fb8b66 force wide items in plex api error mode menu, in plex web 2015-11-01 02:12:13 +01:00
pannal 6e949b9cbe reduce to try:finally: 2015-11-01 00:07:06 +01:00
panni 9e1d32a8e6 make the update function more robust and make sure to always send a state info to the scheduler 2015-10-31 20:13:14 +01:00
panni 44edd4a92a correct route in PMS API ERROR menu mode 2015-10-31 18:02:38 +01:00
panni 7b6cea3b1f 1.3.0.261 2015-10-31 17:27:14 +01:00
panni dab490e21c remove localization again 2015-10-31 17:25:57 +01:00
panni bcd32924dc 1.3.0.259 2015-10-31 15:33:59 +01:00
panni df463ae2e7 add locale-data to repo 2015-10-31 15:32:21 +01:00
pannal 77cb9e328a add restart note 2015-10-31 15:22:05 +01:00
panni c1df4a06a6 1.3.0.256 2015-10-31 15:05:28 +01:00
panni 1b5a61f69d re-add babel 2015-10-31 15:03:32 +01:00
panni c546035f32 force refresh now actually force refreshes 2015-10-31 15:00:31 +01:00
panni e4eddcb9a6 1.3.0.253 2015-10-31 14:42:48 +01:00
panni bc83076daf test PMS API and fail miserably if failed; fixes #58 2015-10-31 14:38:39 +01:00
panni 7f0d1436a2 add internal provider test script; fix addic7ed show id parsing for shows with years 2015-10-31 14:19:03 +01:00
panni 056d73801b hide plex token from logs; fixes #64 2015-10-31 13:44:00 +01:00
panni 536371a580 Merge remote-tracking branch 'origin/1.3.0' into 1.3.0 2015-10-31 13:38:56 +01:00
panni cede650552 add localization stuff; localize date/time in channel menu 2015-10-31 13:38:32 +01:00
panni 96360498f8 rewrite task scheduling; keep track of missing subtitles search task 2015-10-31 04:07:33 +01:00
pannal 1c489e361d Update README.md 2015-10-30 05:00:05 +01:00
panni abc26bbba2 1.3.0.245 2015-10-30 04:56:22 +01:00
panni 3e0adb422a add date_added to subtitle storage, fixes #59 2015-10-30 04:41:46 +01:00
panni 7d2fa36d2c add donate button to info 2015-10-30 03:59:20 +01:00
panni ea6cab53ad more robust scheduler; update menu; better last_run and next_run handling 2015-10-30 03:23:12 +01:00
panni 92610fd46a move config.Plex to lib.Plex 2015-10-30 02:53:51 +01:00
pannal bcc8a1fd81 a task never ran actually is none, not now() 2015-10-29 02:33:30 +01:00
pannal edd137c7f4 fix syntax error 2015-10-29 01:48:23 +01:00
pannal 6ed0889ce9 clarify menu items 2015-10-29 01:46:50 +01:00
pannal 25fdfa5ba3 use correct way of setting Plex.configuration defaults 2015-10-29 01:38:51 +01:00
pannal 28c811163f force-save the task state even if it has never run before 2015-10-29 01:26:01 +01:00
pannal b6cf3d588a more robust task running; ensure task state even if errors occurred 2015-10-29 01:15:23 +01:00
pannal 2cce587a72 add donation button 2015-10-28 11:10:27 +01:00
pannal 5d54c24c7b Update README.md 2015-10-28 02:01:38 +01:00
panni cd152eec7f 1.3.0.232 2015-10-28 01:57:19 +01:00
panni ef8e0a4b13 add client specific uuid to plex auth 2015-10-28 01:56:26 +01:00
panni b15347ea8e 1.3.0.230 2015-10-28 01:44:29 +01:00
panni be1ad61f8b add more info to the menu 2015-10-28 01:42:31 +01:00
panni a0b44dd833 some menu cleanup 2015-10-28 01:02:35 +01:00
panni c15b316aba hopefully support plex.tv authentication now 2015-10-28 00:30:06 +01:00
panni 6349d8acfd add plexpy/Plex.tv 2015-10-27 22:16:02 +01:00
pannal 9625b63577 update intent handling; should fix issues with multiple intent sets at a time 2015-10-27 19:57:19 +01:00
pannal 3a574c7b1f fix version display in the agent names 2015-10-27 19:48:48 +01:00
pannal f2be845b10 1.3.0.222 2015-10-25 20:15:30 +01:00
pannal 8fd0d3f79b 1.3.0.222 2015-10-25 20:15:03 +01:00
pannal bfe0cd04f2 actually honor the "never" setting 2015-10-25 20:04:08 +01:00
pannal 60a01e8e85 forgot brackets 2015-10-25 20:00:11 +01:00
pannal 01e2e49f20 Update README.md 2015-10-25 16:14:02 +01:00
pannal 6c5876364b Update README.md 2015-10-25 16:13:26 +01:00
pannal 8f3c62e2a8 Update CHANGELOG.md 2015-10-25 16:10:48 +01:00
panni 04882952e1 update version 2015-10-25 16:09:55 +01:00
panni 36ac372b15 add recently added missing subtitles search task; finalize scheduler 2015-10-25 16:08:36 +01:00
panni 757f9628b6 add scheduler prefs; add refresh missing to menu; bulk commit 2015-10-25 15:38:49 +01:00
panni 3d861bf5d3 correct routing 2015-10-25 07:23:58 +01:00
panni 74a3dce903 simplify video title 2015-10-25 07:12:48 +01:00
panni 123550fa9a add locmem key-value intent object; add refresh item menu stuff 2015-10-25 07:10:17 +01:00
panni 4be85c8515 make KV-store less caring 2015-10-25 05:19:38 +01:00
panni f6059a98a2 add temporary key-value-store 2015-10-25 05:16:34 +01:00
panni 016e067596 Merge remote-tracking branch 'origin/1.3.0' into 1.3.0 2015-10-25 04:28:47 +01:00
panni a7e2141528 add advanced menu; move advanced stuff there; add plex.py handler for onDeck; add on_deck to menu 2015-10-25 04:24:20 +01:00
panni 2be59901c9 add on_deck to plex.py 2015-10-25 02:39:18 +01:00
pannal 861c2c3d80 reflect license change in readme 2015-10-24 22:13:42 +02:00
panni 9f092c539b mute prints in recent_items 2015-10-24 17:38:10 +02:00
panni e38279719b add confirmation step to storage reset 2015-10-24 17:36:21 +02:00
panni f87845f839 remove reset settings; add basic GUI; add artwork, defaults; 2015-10-24 16:07:35 +02:00
panni 734c32a63f change LICENSE from MIT to The Unlicense; update licenses in README 2015-10-24 14:59:08 +02:00
panni f367f24dc9 move subzero lib to support; add basic agent handler; add restart endpoint 2015-10-24 04:20:22 +02:00
panni 90bb518922 move ./subzero to ./support; add basic routes 2015-10-24 04:00:32 +02:00
panni 31cd106b7d updated gitignore; added subzero/lib and plex/lib 2015-10-23 15:24:59 +02:00
panni b7c15471b0 keep score of subtitle in subtitle instance for later storage 2015-10-23 15:14:54 +02:00
panni 30881d68a5 store subtitle information; update plex_test 2015-10-23 15:14:14 +02:00
panni 10cc126e99 generalize agents; add version information to logs and agents 2015-10-23 13:47:17 +02:00
panni fff9b72dd0 Merge remote-tracking branch 'origin/1.2.11-fixes' into 1.3.0 2015-10-23 12:17:35 +02:00
panni 727d0db354 improved show id search on addic7ed 2015-10-23 12:15:43 +02:00
panni 21285c2f54 declutter __init__.py; move custom configuration stuff into subzero/config.py#Config() 2015-10-22 18:33:00 +02:00
panni 9e8f60cde1 Merge remote-tracking branch 'origin/master' into 1.3.0 2015-10-22 16:06:41 +02:00
pannal 496b477ce3 Update README.md 2015-10-22 15:28:30 +02:00
pannal e6da09285b Merge pull request #50 from pannal/1.2.11-fixes
1.2.11.180
2015-10-22 15:28:03 +02:00
panni 68f71ef203 1.2.11.180 2015-10-22 15:27:12 +02:00
panni 416afad49a better fix for localmedia; scan existing metadata subtitles and skip them if found; improve localmedia 2015-10-22 15:20:10 +02:00
panni c4450ff6d6 only update localmedia if we're using local as storage, not metadata; fixes #49 2015-10-22 14:40:59 +02:00
panni 6595ff525a Merge remote-tracking branch 'origin/1.3.0' into 1.3.0 2015-10-21 17:18:47 +02:00
panni ed4752bdc9 incorporate previous test functions for missing subtitles; add scheduler 2015-10-21 17:17:30 +02:00
panni 86a59ed08d contribute to themoviedb 2015-10-21 15:13:06 +02:00
pannal 807a38d117 move all languages downloaded condition up 2015-10-21 14:43:37 +02:00
panni 7b0b7c623c add basic tester for automatic refresh of items with missing subtitles 2015-10-20 17:47:44 +02:00
panni e2f7845b94 plex.py: add refresh endpoint to library/metadata 2015-10-20 17:18:47 +02:00
panni cc7c9d4597 add missing Stream properties to plex.py 2015-10-20 16:05:56 +02:00
panni 3b8e72c0de add plex.py 0.7.0 2015-10-20 14:22:42 +02:00
panni 95181c2ce2 update release naming scheme 2015-10-20 10:51:47 +02:00
pannal d7e500585e and again. 2015-10-19 22:17:58 +02:00
pannal c6f1620dbf and forgot the version number again. 2015-10-19 22:17:44 +02:00
pannal 8990ca32b6 Merge pull request #48 from pannal/1.1.0.5
1.1.0.5
2015-10-19 22:09:10 +02:00
pannal 15accb0d71 1.1.0.5 2015-10-19 22:08:45 +02:00
pannal 5e75470dc5 Addic7ed: Remove obsolete error-prone series name/year matching 2015-10-19 11:34:17 +02:00
pannal 1fd9d73cba Merge pull request #46 from pannal/1.1.0.5
1.1.0.5
2015-10-19 03:22:56 +02:00
pannal 71c9ec33eb add support for com.plexapp.agents.xbmcnfo[tv]
https://github.com/gboudreau/XBMCnfoTVImporter.bundle and https://github.com/gboudreau/XBMCnfoMoviesImporter.bundle
2015-10-19 03:16:09 +02:00
panni c4f6a5f93c adjust default scores: TV: 85; movie: 23 2015-10-18 15:53:53 +02:00
panni 4f9691c3bd addic7ed: fix typo 2015-10-17 03:53:30 +02:00
pannal dbd2f7d69e fix el picturo 2015-10-16 05:31:28 +02:00
panni 95ac877c08 Merge branch 'master' of github.com:pannal/Sub-Zero 2015-10-16 05:17:52 +02:00
panni 5831f19ae0 forgot constant 2015-10-16 05:17:43 +02:00
pannal 530bdc5510 Update README.md 2015-10-16 05:09:35 +02:00
panni 0c01d6989a search correctly for tv subtitles; 1.1.0.3 2015-10-16 05:08:54 +02:00
pannal 02861d01d6 Update README.md 2015-10-16 04:28:55 +02:00
pannal 668d1693fe Update Info.plist 2015-10-16 04:28:28 +02:00
panni 7a3911c837 adjust default scores 2015-10-16 04:25:55 +02:00
panni 5291cbc136 only old changes in CHANGELOG.md; update logo 2015-10-16 04:09:23 +02:00
panni c1fc68204c Merge branch 'master' of github.com:pannal/Sub-Zero 2015-10-16 04:07:23 +02:00
pannal cd8fed5c7c Update README.md 2015-10-16 04:06:02 +02:00
pannal f2506fa762 Merge pull request #43 from pannal/1.1.0.1
1.1.0.1
2015-10-16 04:04:11 +02:00
pannal 382763c89e Update README.md 2015-10-16 04:02:24 +02:00
panni b4cd1ccaa5 clarify new defaults; cleanup 2015-10-16 04:01:18 +02:00
panni b5032f457f default external folder setting: current folder 2015-10-16 03:59:18 +02:00
panni f0bb3cae90 more readme 2015-10-16 03:52:56 +02:00
panni e416e82179 readme 1.1.0.1 2015-10-16 03:45:34 +02:00
panni 552aed19a0 separate changelog from readme 2015-10-16 03:36:42 +02:00
panni 6c4cefcf25 remove only_one leftover 2015-10-16 03:33:14 +02:00
panni ac41ba699c remove obsolete only_one setting; add IETF to ISO 639-1 option; rename agents 2015-10-16 03:31:05 +02:00
panni cd64118868 update version 2015-10-16 03:19:31 +02:00
panni 735df8078f Log proxy not needed anymore 2015-10-16 03:17:41 +02:00
panni 8304f49273 incorporate localmediaextended functionality into core 2015-10-16 03:16:00 +02:00
panni 3130de3a02 move back because localmediaextended won't be needed anymore 2015-10-16 01:07:27 +02:00
panni a284ac7677 use more common agent names 2015-10-16 00:12:10 +02:00
pannal 7964fd9042 prepare for 1.1.0.1 2015-10-16 00:05:31 +02:00
panni ded012a1bc tvsubtitles: be more smart about punctuation 2015-10-15 15:00:13 +02:00
panni df3e3465f9 addic7ed: be smarter about show ids 2015-10-15 14:50:59 +02:00
pannal bed93bf928 RC5.2 info 2015-10-14 22:13:59 +02:00
pannal 7697ceffef RC 5.2 readme 2015-10-14 22:13:32 +02:00
panni 81dd24a9bd Merge branch 'detached' 2015-10-14 22:05:23 +02:00
panni 729d7d97c4 revert back from plex/localmedia/master to plex/localmedia/dist 2015-10-14 22:04:15 +02:00
pannal c7a4b3c0a4 README.md not so outdated anymore 2015-10-14 19:17:44 +02:00
pannal 3da044ada9 forgot Info.plist update 2015-10-14 19:01:32 +02:00
pannal 44bbc93dae Update README.md 2015-10-14 17:41:13 +02:00
pannal 54341a0afc RC5.1 2015-10-14 17:41:05 +02:00
pannal 599eab3e5b Merge pull request #40 from pannal/RC5
RC5.1
2015-10-14 17:33:44 +02:00
panni 9f9c875234 Merge remote-tracking branch 'origin' into RC5 2015-10-14 17:32:25 +02:00
panni 74c0ed80c5 make hearing impaired more configurable and clear 2015-10-14 17:32:06 +02:00
pannal 5ecb7aea5e update download links 2015-10-14 16:42:10 +02:00
pannal 829eacc4d6 RC5 2015-10-14 16:41:46 +02:00
pannal f7b3f924b4 Merge pull request #39 from pannal/RC5
RC5
2015-10-14 16:32:45 +02:00
panni e247bc0e59 add optional boost for addic7ed subtitles; partly fixes #8 2015-10-14 16:31:56 +02:00
panni 4158416183 hard bail-out if hearing_impaired didn't match 2015-10-14 16:30:33 +02:00
panni cf1181f2af add custom language field; fixes #27 2015-10-14 15:39:42 +02:00
panni a2d1335403 pass known video type info to guessit; fixes #38 2015-10-14 14:53:20 +02:00
panni 520cbb5189 patch subtitle repr to include download/page link; fixes #34 2015-10-14 14:37:44 +02:00
panni e8eeadb094 add colon and single quote to punctuation fix mixin; resolves #36 2015-10-14 13:57:27 +02:00
panni 92a2336dba Merge remote-tracking branch 'origin' into RC5 2015-10-14 13:56:06 +02:00
panni cbc75c8b85 update to newest LocalMediaExtended 2015-10-14 13:40:06 +02:00
panni 563973163e only pass the file name and three parent directories to guessit; should fix #38 2015-10-14 13:24:10 +02:00
panni e147a7a0ca use persistent Daemon mode; use correct bundle versioning; short: 1.0.9, build: 1.0.9.5 2015-10-14 13:16:18 +02:00
panni b494dc7bec cosmetic guessit update; add LICENSE and README 2015-10-14 12:49:10 +02:00
pannal 9ce4b02610 most likely fix punctuation issues with quotes in series names 2015-10-13 10:15:37 +02:00
pannal d0ff69d224 Update README.md 2015-10-11 04:17:56 +02:00
pannal cde09e0f56 add plex forum thread link 2015-10-11 04:17:39 +02:00
pannal 84409395d1 Update README.md 2015-10-11 03:36:40 +02:00
pannal e4e6bcfad2 Update README.md 2015-10-11 03:25:39 +02:00
panni 2103215e41 add dynamic animated logo from github 2015-10-11 03:24:17 +02:00
panni d086569f09 add correct plugin info; test animated subzero :) 2015-10-11 03:13:59 +02:00
panni 28064767ea update Info.plist 2015-10-11 02:42:53 +02:00
panni e996e4d4b6 replace default icon 2015-10-11 02:16:38 +02:00
pannal 422100f9fc Update README.md 2015-10-11 02:12:31 +02:00
pannal c9a7ffd778 Update README.md 2015-10-11 02:11:41 +02:00
pannal db009abf79 Merge pull request #30 from pannal/RC4
decouple from Subliminal.bundle
2015-10-11 02:07:24 +02:00
pannal c1cc7c98ef Update README.md 2015-10-11 02:06:31 +02:00
pannal a08b00d5c4 Update README.md 2015-10-11 02:06:17 +02:00
panni 16a22ab7b2 move more 2015-10-11 02:02:27 +02:00
panni da32ee2504 move moving 2015-10-11 02:01:36 +02:00
panni 54eaa9e695 move stuff 2015-10-11 02:00:11 +02:00
peter penis 28c1481a48 move to Sub-Zero; RC4; add LocalMediaExtended.bundle into SS 2015-10-11 01:57:48 +02:00
pannal cac340ad43 Update Info.plist 2015-10-11 01:53:05 +02:00
pannal d6994d9a60 Update README.md 2015-10-11 01:52:35 +02:00
pannal 90372ad30d Update DefaultPrefs.json 2015-10-10 14:43:12 +02:00
pannal 24fc22dbe6 Update DefaultPrefs.json 2015-10-10 14:42:39 +02:00
pannal 7b7adac774 Update README.md 2015-10-10 00:51:08 +02:00
pannal 7f0ff6ae2f Update README.md 2015-10-10 00:50:27 +02:00
pannal 1b3e58b326 Update README.md 2015-10-10 00:45:55 +02:00
pannal dc47fc60b8 Update README.md 2015-10-09 19:22:16 +02:00
pannal 6c588964a7 Update README.md 2015-10-09 02:42:20 +02:00
pannal f65b24094a Merge pull request #25 from pannal/rc3
pull RC3 into master
2015-10-09 02:36:57 +02:00
panni 6b807be0e6 opensubtitles: add optional credentials for VIPs; fixes #17 2015-10-09 02:35:33 +02:00
panni a794eb8310 providers: move punctuation fix into seperate mixins.py and use it 2015-10-09 02:08:43 +02:00
panni 8290c8a371 tvsubtitles: fix series with punctuation 2015-10-09 02:04:30 +02:00
panni 475152a7eb podnapisi: fix logging 2015-10-09 01:40:24 +02:00
panni 4e75e20ede add download retry option; fixes #24; move questionable only_one setting to the bottom 2015-10-09 01:28:56 +02:00
panni d36823c7ca better score logging; move patched providers to separate folder; better addic7ed punctuation handling in get_show_ids 2015-10-09 00:48:11 +02:00
panni 2a6b387112 addic7ed: fix series detection with punctuation; add missing self 2015-10-08 10:38:29 +02:00
panni a83822bff9 more verbose logging on subtitle download fail 2015-10-08 10:37:51 +02:00
panni 8e7538f6e6 fix broken import 2015-10-07 19:05:48 +02:00
panni 9cdb26f7cc forgot second clean_punctuation 2015-10-07 19:03:45 +02:00
panni 9659c913c4 Merge branch 'master' of github.com:pannal/Subliminal.bundle 2015-10-07 19:02:46 +02:00
panni c9506cb95e fix getting addic7ed show IDs for series with punctuation in their names 2015-10-07 19:02:33 +02:00
pannal 43e6ce3997 Update README.md 2015-10-07 05:13:36 +02:00
pannal dfd12edcb3 Update DefaultPrefs.json 2015-10-07 05:11:10 +02:00
pannal 154a8072f6 Update README.md 2015-10-07 04:07:59 +02:00
pannal 904abaf26b Update README.md 2015-10-07 02:58:32 +02:00
panni bea18a27ba set default TV score to 15; movie score to 30 2015-10-07 02:55:56 +02:00
pannal 2d998eab50 Update README.md 2015-10-07 02:47:40 +02:00
pannal a25a67572b Update README.md 2015-10-07 02:45:23 +02:00
pannal 1bdf6f9969 Merge pull request #22 from pannal/rc1-fix
RC1 fixes
2015-10-07 02:44:10 +02:00
panni 0b32892fa8 better existing subtitles debug logging 2015-10-07 02:42:14 +02:00
panni fea5b8a716 switch to tonswieb/enzyme 2015-10-07 02:06:47 +02:00
panni 90b3707409 update enzyme 2015-10-07 01:07:01 +02:00
panni 1c0224fbe7 skip empty folder creation if not subtitles found; should fix #20 2015-10-07 00:59:07 +02:00
pannal 626fcd1140 Update README.md 2015-09-24 02:57:23 +02:00
pannal b01c84b14c Update README.md 2015-09-24 02:55:53 +02:00
pannal 412492b4d1 Update README.md 2015-09-24 02:55:37 +02:00
panni 9a6f7a4316 forgot import, again 2015-09-24 02:44:30 +02:00
panni 660f887923 correct number casting; fixes #16 2015-09-24 02:34:34 +02:00
panni fe9c67ed91 forgot import 2015-09-24 02:13:20 +02:00
panni d3bbd05e4f subliminal: fix wrong usage of logger; fixes #15 2015-09-24 01:58:18 +02:00
panni 34585129aa Merge branch 'master' of github.com:pannal/Subliminal.bundle 2015-09-24 01:27:26 +02:00
panni 955cd4c173 allow only one subtitle optionally; fixes #3 2015-09-24 01:27:15 +02:00
pannal 4da63a8fd7 Update README.md 2015-09-23 14:40:42 +02:00
panni fa27789608 fixed typo 2015-09-23 14:31:55 +02:00
panni f9e9f35157 Merge branch 'deep_scan_subs'
Conflicts:
	Contents/Code/__init__.py
2015-09-23 14:29:21 +02:00
panni 4a6604f0ab custom folder now takes precedence; also scan subfolders for existing subtitles if configured; update custom folder settings description; remove direct subliminal.video patch and move it to subliminal_patch.patch_video 2015-09-23 14:26:21 +02:00
panni 971d1221da don't die on missing header; maybe fixes #13 2015-09-23 13:36:18 +02:00
panni ba69885477 fix saving subs to video folder without custom_path given; should fix #14 2015-09-23 12:46:07 +02:00
panni 8e23098037 add basic functionality to scan custom (sub-) folders for subtitles 2015-09-19 04:35:48 +02:00
pannal 8da7bf029c Update README.md 2015-09-18 03:48:34 +02:00
pannal e16e58cbfa Update README.md 2015-09-18 03:29:34 +02:00
pannal abb7cd3bfa Update README.md 2015-09-18 03:19:04 +02:00
pannal bfa06f3989 Update README.md 2015-09-18 03:16:37 +02:00
pannal c63529939d Merge pull request #11 from pannal/guessit-0.11.0
update guessit to 0.11.0
2015-09-18 03:16:20 +02:00
panni 2814f57e89 update guessit to 0.11.0 2015-09-18 03:14:21 +02:00
panni 70476883c6 Merge branch 'master' of github.com:pannal/Subliminal.bundle 2015-09-18 03:11:20 +02:00
panni b5ed209453 Revert "update guessit to 0.11.0"
This reverts commit be7687f15d.
2015-09-18 03:10:58 +02:00
panni be7687f15d update guessit to 0.11.0 2015-09-18 03:08:55 +02:00
pannal b7fb8e1e76 Update README.md 2015-09-18 02:56:40 +02:00
pannal 1a03720a7d Update README.md 2015-09-18 02:49:34 +02:00
pannal cb4099109a Update README.md 2015-09-18 02:49:19 +02:00
pannal 131504e7ee Merge pull request #10 from pannal/provider_fixes
Provider fixes/addons
2015-09-18 02:42:31 +02:00
pannal b0c7b480d6 Update README.md 2015-09-18 02:40:03 +02:00
panni e543c927cf add third optional language; update option description 2015-09-18 02:32:16 +02:00
panni 897b602d71 correct typo 2015-09-18 02:27:13 +02:00
panni d94421dcf3 add support for 'fa', Persian (Farsi) 2015-09-18 02:17:30 +02:00
panni e371b99dca add support for pt-br, Portuguese Brasil 2015-09-18 02:16:03 +02:00
panni 49d10e5ff7 remove leftover addic7ed score boost; add use_random_agents option to addic7ed 2015-09-18 02:08:01 +02:00
pannal d959f5b826 Update README.md 2015-09-18 01:07:47 +02:00
pannal 709f5cb605 Merge pull request #7 from pannal/provider_fixes
Provider fixes for newest subliminal
2015-09-18 01:06:48 +02:00
panni b11a051c23 patch language converted for addic7ed to support French (Canadian) 2015-09-18 00:57:54 +02:00
panni 1a77902079 move injection of language converters to subliminal_patch; don't discard provider simply because of LanguageReverseError 2015-09-18 00:43:33 +02:00
pannal 481dc2f3b4 Update README.md 2015-09-13 04:40:55 +02:00
panni 732aa91889 re-add language converters for addic7ed and tvsubtitles 2015-09-12 16:20:34 +02:00
panni 0df4c55548 update babelfish to 0.5.5-dev; remove leftover patch.py 2015-09-12 16:20:10 +02:00
panni 7c72ed41fb moved contents of patch.py into separate files; patch addic7ed provider 2015-09-12 16:04:39 +02:00
panni 83ace14faf patch addic7ed provider to use random user agents (again); honor selected providers again; more info on why a provider was discarded 2015-09-12 15:57:19 +02:00
243 changed files with 23357 additions and 9386 deletions
+2 -1
@@ -13,7 +13,6 @@ build/
develop-eggs/
dist/
eggs/
lib/
lib64/
parts/
sdist/
@@ -53,3 +52,5 @@ coverage.xml
# Sphinx documentation
docs/_build/
# pycharm
.idea
Executable
+248
@@ -0,0 +1,248 @@
1.3.31.513
- core: add option to only download one language again (and skip the addition of .lang to the subtitle filename) (default: off); fixes #126
- core: add option to always encode saved subtitles to UTF-8 (default: on); fixes #128
- core: add fallback encoding detection using bs4.UnicodeDammit; hopefully fixes #101
- core: update libraries: chardet, beautifulsoup, six
- menu/core: check Plex libraries for permission problems on plugin start and report them in the channel menu (option, default: on); fixes #143
- menu: while a manual refresh takes place, add a refresh button to the top of the SZ menu for convenience
- menu: move the "add/remove X to ignore list" menu item to the bottom of the list on item detail
1.3.27.491
- menu/core: make Sub-Zero channel menu optional (setting: "Enable Sub-Zero channel (disabling doesn't affect the subtitle features)?")
- OpenSubtitles: detect and match video/subtitle FPS (framerate) to reduce out of sync subtitle matches
- core: internal fixes; add _markerlib library (rare)
- core: don't score tvshow episode title matches, should improve episode subtitle matches quite a bit (and reduce out of sync subtitles)
- OpenSubtitles: make tag/exact filename matches optional (setting: "I keep the exact (release-) filename of my media files")
- menu: unicode video title errors fixed
- TVSubtitles: correctly match certain show IDs (such as "Series Name (US)")
- core: don't break subtitle evaluation on crashed guessing
1.3.23.459
- core: slight code cleanup and fixes
- core: add physical (filesystem) ignore mode (create files named `subzero.ignore`, `.subzero.ignore`, `.nosz` to ignore specific files/seasons/series/libraries)
- core: fix guessit hinting of tv series with rare folder layout (e.g. series_name/a/S01E01.mkv)
- core: remove "format" necessity from (opensubtitles) hash-validation
- OpenSubtitles: dramatically improve matching: add tag (exact filename) matching and treat it just like hash matches
- core: ignore embedded forced subtitles (fixes #106)
- docs: update
- settings: clarify
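
A minimal sketch of how the physical ignore mode from 1.3.23.459 could be checked. Only the three marker file names come from the entry above. The function name, the `library_root` parameter, and the assumption that a marker in a folder also ignores everything beneath it are illustrative, not Sub-Zero's actual implementation.

```python
import os

# Marker file names listed in the 1.3.23.459 entry.
IGNORE_MARKERS = ("subzero.ignore", ".subzero.ignore", ".nosz")


def is_physically_ignored(video_path, library_root):
    """Return True if a marker file exists in the video's folder or any
    parent folder up to and including library_root (hypothetical helper)."""
    current = os.path.dirname(os.path.abspath(video_path))
    root = os.path.abspath(library_root)
    while True:
        # A marker in this folder ignores everything below it (assumption).
        if any(os.path.isfile(os.path.join(current, m)) for m in IGNORE_MARKERS):
            return True
        if current == root:
            return False
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without passing library_root.
            return False
        current = parent
```

Under this reading, placing `.nosz` in a series folder would skip every season and episode below it, while placing it in the library root would skip the whole library.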
1.3.20.422
- tvsubtitles: show matching was partially broken
- addic7ed: better show matching
- core: correctly skip subtitles stored in filesystem if metadata storage was selected (Local Media Assets agent may still pick them up)
- core: fix local API access (switch from HTTPS to HTTP)
- core: fix handling of library names and media paths with non-ascii chars in them
- core: fix bundle version to correctly display current bundle version
- core: skip downloading multi-CD subtitle
- settings: clarify
1.3.20.403
- core: handle & and - ("and" and dash) in names
- core: fixed handling of internal metadata subtitles
- re-upped the minimum tv score to 85 (may be even higher in the future)
- opensubtitles: possibly significantly better movie matching (now also query for movie title, instead of only querying for video hash)
1.3.20.396
- core: fix logging handlers (when saving log_level settings loggers got duplicated)
- core: better movie matching by only hinting the filename and the last subdirectory to guessit (instead of the full path)
- core: don't fail on wrong detection/scanning of media file
- lower minimum tv series score from 85 to 67 (removed title; composed of: series=44 + season=11 + episode=11 + hearing_impaired=1)
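
The composition of the new minimum TV score can be checked with a short sketch. The four weights are the ones listed in the entry above; the dictionary and function names are hypothetical and only illustrate the arithmetic.

```python
# Component weights as listed in the 1.3.20.396 entry (title was removed).
MIN_TV_SCORE_PARTS = {
    "series": 44,
    "season": 11,
    "episode": 11,
    "hearing_impaired": 1,
}


def minimum_tv_score(parts=MIN_TV_SCORE_PARTS):
    """Sum the component weights that make up the minimum score."""
    return sum(parts.values())


print(minimum_tv_score())  # 44 + 11 + 11 + 1 = 67
```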
1.3.19.379
- core: new recent items implementation (used in "Items with missing subtitles"), now really picking up everything instead of using Plex's recently_added API endpoint
- core: be more strict about title matching - a matched title doesn't automatically mean season and episode are correct, too
- core: rewrote the hash matching algorithm to not blindly trust hash matches anymore; instead, episodes have to match the series name, season number, episode number and format (BluRay, HDTV...); movies have to match at least the title, format and codec for the hash to be considered
- core: remove TheSubDB support for now, as it only supports hash-based matching
- scheduler: more robust item-fail-handling (fixes #81)
- config: "Scan: include embedded subtitles" now by default is off, as embedded subs have proven to be pretty unreliable
- config: add configuration option for how many items per library are to be considered recent (default: 200)
- config: make logging verbosity configurable, default: WARNING - log files should be considerably smaller now
- config: make console logging optional, default: off - good for development/debugging
- config: removed the ignore lists
- menu: added "Browse all items", where you can browse all your libraries and manage your ignore list (add/remove sections/series/items)
- menu: added "Display ignore list", where you can manage your ignored sections, series and items
- menu: the submenu titles are now dynamically composed of a breadcrumb-style tree so you see where you are
- menu: show the current and past state of the important menu actions such as (force)-refresh an item or refreshing the menu, on the Refresh-button's description
- plugin now isn't in the dev mode by default and has logging to the console off (in certain configurations this resulted in huge syslogs)
1.3.6.316
- scheduler: missing subtitles task now able to handle huge libraries (thanks @chopeta, @comrade)
- scheduler: detect item-stalling, add wait and retry logic to make missing subtitles task more robust
- scheduler: report failed items to logs after task run completion
- hint series name and episode title, or movie title to guessit to make detection way better (e.g. for Mr. Robot)
1.3.6.304
- scheduler: correct the recent-determination of the search for missing subtitles in recently_added task
- scheduler: rewrote search for missing subtitles task; it now requests refreshes one by one and not in bulk anymore (hopefully fixes stalling)
- handle rare cases of weird file system encodings (ANSI_X3.4-1968 for example)
- fix simplejson warning on startup
1.3.6.297
- rename Sub-Zero to Sub-Zero.bundle (requirement for adding Sub-Zero to the Plex channel directory)
- channel: add logging actions for the internal storage to the advanced menu
- channel: handle item titles with foreign characters in them correctly
- (hopefully) fix handling file names with foreign characters in them when scanning for local media
- reformat the whole project, mostly honoring pep8
- scheduler: fixed some serious bugs; broken tasks (stalled) and some errors many of you have seen should be gone now
- scheduler: partly rewritten to be more robust, again
- settings: move Plex.tv credentials to the top
1.3.5.281
- fix tasks broken for 1.2 -> 1.3.5 upgraders
1.3.5.273 (same build as Beta Release 1.3.0.273) - changes from previous stable 1.2.11.180
- add a channel menu, making this plugin a hybrid (Agent+Channel)
- add a generic background task scheduler
- add a task to search for subtitles for items with missing subtitles (manually triggered and automatic)
- add artwork
- add Plex.tv credentials/token-generation support (needed for Plex Home users for the API to work)
- addic7ed: improve show name matching again
- channel: able to browse current on-deck and recently-added items, and refresh or force-refresh (search for new subtitles) single items
- add library/series/video blacklist for items which should be skipped in "Search for missing subtitles"-task
- add donation links
- change the license to The Unlicense (while keeping the original MIT license from subliminal.bundle intact)
- store subtitle information in internal plugin storage (for later usage)
- many internal code improvements
- update documentation
1.3.0.273
- more robust update functionality
- menu: add refresh button to menu (to see the task state updating)
- scheduler: actually skip a task if it's already running
- scheduler: better behaviour when a task is running and a single item is refreshed at the same time
- menu: enforce ascii on item titles
1.3.0.261
- removed localization again
1.3.0.259
- forgot locale-data
1.3.0.256
- fix force-refresh single items to actually force-refresh
- re-add babel library
1.3.0.253
- rewrote background tasks subsystem
- keep track of the status of a task and its runtime
- add task state in channel menu to "Search for missing subtitles"
- add date/time localization to channel menu
- hide plex token from logs, when requesting
- fix addic7ed show id parsing for shows with year set
- test PMS API connectivity and fail miserably if needed (channel disabled, scheduler disabled)
- feature-freeze for 1.3.0 final
1.3.0.245
- add the option to buy me a beer
- clarify menu items
- more robust scheduler handling (should fix the issues of scheduler runs in the past)
- internal cleanups
- add date_added to stored subtitle info (all of the 1.3.0 testers: please delete your internal subtitle storage using the channel->advanced menu)
1.3.0.232
- integrate plex.tv authentication for plex home users (test phase)
- menu cleanup
- more info in the menu (scheduler last and next run for example)
- hopefully fixed intent handling (should throw fewer errors now)
- fix version display in agent names
1.3.0.222
- bugfix for search missing subtitles
- scheduler: honor "never"
1.3.0.216
- add channel menu
- add generic task scheduler
- add functionality to search for missing subtitles (via recently added items)
- add artwork
- change license to The Unlicense
- ...
1.2.11.180
- fix #49 (metadata storage didn't work)
- add better detection for existing subtitles stored in metadata
1.2.11.177
- updated naming scheme to reflect rewrite.major.minor.build (this release is the same as 1.1.0.5)
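
As an illustration of the rewrite.major.minor.build scheme, the following sketch splits a version string into its four parts so that releases can be compared. `Version` and `parse_version` are hypothetical names and not part of the plugin.

```python
from collections import namedtuple

# Field names follow the scheme named in the 1.2.11.177 entry.
Version = namedtuple("Version", "rewrite major minor build")


def parse_version(text):
    """Parse 'rewrite.major.minor.build' into a comparable tuple."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers: %r" % text)
    return Version(*(int(p) for p in parts))


# Tuples compare field by field, so a later build sorts higher.
print(parse_version("1.2.11.180") > parse_version("1.2.11.177"))  # True
```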
1.1.0.5
- addic7ed: fixed error in show id search
- addic7ed: even better show matching
- adjusted default scores: TV: 85, movies: 23
- add support for com.plexapp.agents.xbmcnfo/xbmcnfotv (proposed to the author [here](https://github.com/gboudreau/XBMCnfoMoviesImporter.bundle/pull/63) and [here](https://github.com/gboudreau/XBMCnfoTVImporter.bundle/pull/70))
1.1.0.3
- addic7ed/tvsubtitles: be way smarter about punctuation in series names (*A.G.E.N.T.S. ...*)
- ditch LocalMediaExtended and incorporate the functionality in Sub-Zero (**RC-users: delete LocalMediaExtended.bundle and re-enable LocalMedia!**)
- remove (unused) setting "Restrict to one language"
- add "Treat IETF language tags as ISO 639-1 (e.g. pt-BR = pt)" setting (default: true)
- change default external storage to "current folder" instead of "/subs"
- adjust default scores
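
What the "Treat IETF language tags as ISO 639-1" setting amounts to can be sketched as follows. This is a simplified assumption that keeps only the primary subtag; the function name is hypothetical and the plugin's real conversion may differ.

```python
def ietf_to_iso639_1(tag):
    """Reduce an IETF language tag to its primary subtag, e.g. pt-BR -> pt.

    Assumes the primary subtag is already a two-letter ISO 639-1 code.
    """
    # Accept both "pt-BR" and "pt_BR" spellings.
    return tag.replace("_", "-").split("-")[0].lower()


print(ietf_to_iso639_1("pt-BR"))  # pt
```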
RC-5.2
- revert back to /plexinc-agents/LocalMedia.bundle/tree/dist instead of /plexinc-agents/LocalMedia.bundle/tree/master, as the current public PMS version is too old for that
RC-5.1
- make hearing_impaired option more configurable and clear (see #configuration-)
RC-5
- fix wrong video type matching by hinting video type to guessit
- update to newest LocalMediaExtended.bundle (incorporated plex-inc's changes)
- show page links for subtitles in log file instead of subtitle ID
- add custom language setting in addition to the three hardcoded ones
- if a subtitle doesn't match our hearing_impaired setting, ignore it
- add an optional boost for addic7ed subtitles, if their series, season, episode, year, and format (e.g. WEB-DL) matches
RC-4
- rename project to Sub-Zero
- incorporate LocalMediaExtended.bundle
- making this a multi-bundle plugin
- update default scores
- add icon
RC-3
- addic7ed/tvsubtitles: punctuation fixes (correctly get show ids for series like "Mr. Poopster" now)
- podnapisi: fix logging
- opensubtitles: add login credentials (for VIPs)
- add retry functionality for failed subtitle downloads, with a configurable number of retries before a provider is discarded
- move possibly not needed setting "Restrict to one language" to the bottom
- more detailed logging
- some cleanup
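
The retry behaviour listed for RC-3 can be pictured as a bounded loop per provider: a download is attempted up to a configured number of times, and the provider is dropped once its tries are used up. A rough sketch under that assumption; `download_with_retries`, `fetch` and `max_tries` are illustrative names, not the plugin's or subliminal's API:

```python
def download_with_retries(providers, fetch, max_tries=3):
    """Try each provider up to max_tries times.

    providers: iterable of provider names
    fetch: callable(provider) returning subtitle content, raising IOError on failure
    Returns (content, discarded); content is None if every provider failed.
    """
    discarded = []
    for provider in providers:
        for _ in range(max_tries):
            try:
                return fetch(provider), discarded
            except IOError:
                # failed attempt; retry until the budget for this provider is used up
                continue
        # all tries failed: discard this provider and move on to the next one
        discarded.append(provider)
    return None, discarded


def always_failing(provider):
    # stand-in for a provider that is down
    raise IOError("provider %s unavailable" % provider)


content, discarded = download_with_retries(["opensubtitles", "addic7ed"], always_failing, max_tries=2)
print(content)    # None
print(discarded)  # ['opensubtitles', 'addic7ed']
```
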
RC-2
- fix empty custom subtitle folder creation
- fix detection of existing embedded subtitles (switch to https://github.com/tonswieb/enzyme)
- better logging
- set default TV score to 15; movie score to 30
RC-1
- fix subliminal's logging error on min_score not met (fixes #15)
- separated tv and movies subtitle scores settings (fixes #16)
- add option to save only one subtitle per video (skipping the ".lang." naming scheme plex supports) (fixes #3)
beta5
- fix storing subtitles besides the actual video file, not subfolder (fixes #14)
- "custom folder" setting now always used if given (properly overrides "subtitle folder" setting)
- also scan (custom) given subtitle folders for existing subtitles instead of redownloading them on every refresh (fixes #9, #2)
beta4
- ~~increased score of addic7ed subtitles a bit~~ (not existing currently)
- **support for newest Subliminal ([1.0.1](27a6e51cd36ffb2910cd9a7add6d797a2c6469b7)) and guessit ([0.11.0](2814f57e8999dcc31575619f076c0c1a63ce78f2))**
- **plugin now also [works with com.plexapp.agents.thetvdbdvdorder](924470d2c0db3a71529278bce4b7247eaf2f85b8)**
- providers fixed for subliminal 1.0.1 ([at least addic7ed](131504e7eed8b3400c457fbe49beea3b115bc916))
- providers [don't simply fail and get excluded on non-detected language](1a779020792e0201ad689eefbf5a126155e89c97)
- support for addic7ed languages: [French (Canadian)](b11a051c233fd72033f0c3b5a8c1965260e7e19f)
- support for additional languages: [pt-br (Portuguese (Brasil)), fa (Persian (Farsi))](131504e7eed8b3400c457fbe49beea3b115bc916)
- support for [three (two optional) subtitle languages](e543c927cf49c264eaece36640c99d67a99c7da2)
- optionally use [random user agent for addic7ed provider](83ace14faf75fbd75313f0ceda9b78161895fbcf) (should not be needed)
Regular → Executable
+225 -105
@@ -1,103 +1,151 @@
# hdbits.org
# coding=utf-8
import os
import sys
# just some slight modifications to support sum and iter again
from subzero.sandbox import restore_builtins
module = sys.modules['__main__']
restore_builtins(module, {})
globals = getattr(module, "__builtins__")["globals"]
for key, value in getattr(module, "__builtins__").iteritems():
if key != "globals":
globals()[key] = value
import logger
import logging
# temporarily add the console handler and set it to DEBUG to catch errors upon imports
Core.log.addHandler(logger.console_handler)
Core.log.setLevel(logging.DEBUG)
sys.modules["logger"] = logger
import string, os, urllib, zipfile, re, copy
from babelfish import Language
from datetime import timedelta
import subliminal
import subliminal_patch
import logger
import support
OS_PLEX_USERAGENT = 'plexapp.com v9.0'
import interface
sys.modules["interface"] = interface
from subzero.constants import OS_PLEX_USERAGENT, PERSONAL_MEDIA_IDENTIFIER
from subzero import intent
from interface.menu import *
from support.plex_media import convert_media_to_parts, get_media_item_ids, scan_parts
from support.subtitlehelpers import get_subtitles_from_metadata, force_utf8
from support.helpers import notify_executable
from support.storage import store_subtitle_info, whack_missing_parts
from support.items import is_ignored
from support.config import config
DEPENDENCY_MODULE_NAMES = ['subliminal', 'subliminal_patch', 'enzyme', 'guessit', 'requests']
def Start():
HTTP.CacheTime = 0
HTTP.Headers['User-agent'] = OS_PLEX_USERAGENT
Log.Debug("START CALLED")
logger.registerLoggingHander(DEPENDENCY_MODULE_NAMES)
# configured cache to be in memory as per https://github.com/Diaoul/subliminal/issues/303
subliminal.region.configure('dogpile.cache.memory')
def ValidatePrefs():
Log.Debug("Validate Prefs called.")
return
# init defaults; perhaps not the best idea to use ValidatePrefs here, but we'll see
ValidatePrefs()
Log.Debug(config.full_version)
# Prepare a list of languages we want subs for
def getLangList():
langList = {Language.fromietf(Prefs["langPref1"])}
if(Prefs["langPref2"] != "None"):
langList.update({Language.fromietf(Prefs["langPref2"])})
return langList
if not config.permissions_ok:
Log.Error("Insufficient permissions on library folders:")
for title, path in config.missing_permissions:
Log.Error("Insufficient permissions on library %s, folder: %s" % (title, path))
return
def getProviders():
providers = {'opensubtitles' : Prefs['provider.opensubtitles.enabled'],
'thesubdb' : Prefs['provider.thesubdb.enabled'],
'podnapisi' : Prefs['provider.podnapisi.enabled'],
'addic7ed' : Prefs['provider.addic7ed.enabled'],
'tvsubtitles' : Prefs['provider.tvsubtitles.enabled']
}
return filter(lambda prov: providers[prov], providers)
scheduler.run()
def getProviderSettings():
provider_settings = {'addic7ed': {'username': Prefs['provider.addic7ed.username'],
'password': Prefs['provider.addic7ed.password']
},
}
return provider_settings
def scanTvMedia(media):
videos = {}
for season in media.seasons:
for episode in media.seasons[season].episodes:
for item in media.seasons[season].episodes[episode].items:
for part in item.parts:
scannedVideo = scanVideo(part)
videos[scannedVideo] = part
return videos
def init_subliminal_patches():
# configure custom subtitle destination folders for scanning pre-existing subs
dest_folder = config.subtitle_destination_folder
subliminal_patch.patch_video.CUSTOM_PATHS = [dest_folder] if dest_folder else []
subliminal_patch.patch_provider_pool.DOWNLOAD_TRIES = int(Prefs['subtitles.try_downloads'])
subliminal_patch.patch_providers.addic7ed.USE_BOOST = bool(Prefs['provider.addic7ed.boost'])
def scanMovieMedia(media):
videos = {}
for item in media.items:
for part in item.parts:
scannedVideo = scanVideo(part)
videos[scannedVideo] = part
return videos
def scanVideo(part):
embedded_subtitles = Prefs['subtitles.scan.embedded']
external_subtitles = Prefs['subtitles.scan.external']
Log.Debug("Scanning video: %s, subtitles=%s, embedded_subtitles=%s" % (part.file, external_subtitles, embedded_subtitles))
try:
return subliminal.video.scan_video(part.file, subtitles=external_subtitles, embedded_subtitles=embedded_subtitles)
except ValueError:
Log.Warn("File could not be guessed by subliminal")
def downloadBestSubtitles(videos):
min_score = int(Prefs['subtitles.search.minimumScore'])
def download_best_subtitles(video_part_map, min_score=0):
hearing_impaired = Prefs['subtitles.search.hearingImpaired']
Log.Debug("Download best subtitles using settings: min_score: %s, hearing_impaired: %s" %(min_score, hearing_impaired))
# patch subliminal's ProviderPool
subliminal.api.ProviderPool = subliminal_patch.PatchedProviderPool
languages = config.lang_list
if not languages:
return
return subliminal.api.download_best_subtitles(videos, getLangList(), min_score, hearing_impaired, provider_configs=getProviderSettings())
missing_languages = False
for video, part in video_part_map.iteritems():
if not Prefs['subtitles.save.filesystem']:
# scan for existing metadata subtitles
meta_subs = get_subtitles_from_metadata(part)
for language, subList in meta_subs.iteritems():
if subList:
video.subtitle_languages.add(language)
Log.Debug("Found metadata subtitle %s for %s", language, video)
def saveSubtitles(videos, subtitles):
missing_subs = (languages - video.subtitle_languages)
# all languages are found if we either really have subs for all languages or we only want to have exactly one language
# and we've only found one (the case for a selected language, Prefs['subtitles.only_one'] (one found sub matches any language))
found_one_which_is_enough = len(video.subtitle_languages) >= 1 and Prefs['subtitles.only_one']
if not missing_subs or found_one_which_is_enough:
if found_one_which_is_enough:
Log.Debug('Only one language was requested, and we\'ve got a subtitle for %s', video)
else:
Log.Debug('All languages %r exist for %s', languages, video)
continue
missing_languages = True
break
if missing_languages:
Log.Debug("Download best subtitles using settings: min_score: %s, hearing_impaired: %s" % (min_score, hearing_impaired))
return subliminal.api.download_best_subtitles(video_part_map.keys(), languages, min_score, hearing_impaired, providers=config.providers,
provider_configs=config.provider_settings)
Log.Debug("All languages for all requested videos exist. Doing nothing.")
def save_subtitles(videos, subtitles):
meta_fallback = False
save_successful = False
storage = "metadata"
if Prefs['subtitles.save.filesystem']:
Log.Debug("Saving subtitles to filesystem")
saveSubtitlesToFile(subtitles)
else:
Log.Debug("Saving subtitles as metadata")
saveSubtitlesToMetadata(videos, subtitles)
storage = "filesystem"
try:
Log.Debug("Using filesystem as subtitle storage")
save_subtitles_to_file(subtitles)
except OSError:
if Prefs["subtitles.save.metadata_fallback"]:
meta_fallback = True
else:
raise
else:
save_successful = True
def saveSubtitlesToFile(subtitles):
if not Prefs['subtitles.save.filesystem'] or meta_fallback:
if meta_fallback:
Log.Debug("Using metadata as subtitle storage, because filesystem storage failed")
else:
Log.Debug("Using metadata as subtitle storage")
save_successful = save_subtitles_to_metadata(videos, subtitles)
if save_successful and config.notify_executable:
notify_executable(config.notify_executable, videos, subtitles, storage)
store_subtitle_info(videos, subtitles, storage)
def save_subtitles_to_file(subtitles):
fld_custom = Prefs["subtitles.save.subFolder.Custom"].strip() if bool(Prefs["subtitles.save.subFolder.Custom"]) else None
if Prefs["subtitles.save.subFolder"] != "current folder" or fld_custom:
# specific subFolder requested, create it if it doesn't exist
for video, video_subtitles in subtitles.items():
for video, video_subtitles in subtitles.items():
if not video_subtitles:
continue
fld = None
if fld_custom or Prefs["subtitles.save.subFolder"] != "current folder":
# specific subFolder requested, create it if it doesn't exist
fld_base = os.path.split(video.name)[0]
if fld_custom:
if fld_custom.startswith("/"):
@@ -109,46 +157,118 @@ def saveSubtitlesToFile(subtitles):
fld = os.path.join(fld_base, Prefs["subtitles.save.subFolder"])
if not os.path.exists(fld):
os.makedirs(fld)
subliminal.api.save_subtitles(video, video_subtitles, directory=fld)
else:
subliminal.api.save_subtitles(subtitles)
subliminal.api.save_subtitles(video, video_subtitles, directory=fld, single=Prefs['subtitles.only_one'],
encode_with=force_utf8 if Prefs['subtitles.enforce_encoding'] else None)
return True
def saveSubtitlesToMetadata(videos, subtitles):
def save_subtitles_to_metadata(videos, subtitles):
for video, video_subtitles in subtitles.items():
mediaPart = videos[video]
for subtitle in video_subtitles:
mediaPart.subtitles[Locale.Language.Match(subtitle.language.alpha2)][subtitle.page_link] = Proxy.Media(subtitle.content, ext="srt")
for subtitle in video_subtitles:
content = force_utf8(subtitle.text) if Prefs['subtitles.enforce_encoding'] else subtitle.content
mediaPart.subtitles[Locale.Language.Match(subtitle.language.alpha2)][subtitle.page_link] = Proxy.Media(content, ext="srt")
return True
class SubliminalSubtitlesAgentMovies(Agent.Movies):
name = 'Subliminal Movie Subtitles'
def update_local_media(metadata, media, media_type="movies"):
# Look for subtitles
if media_type == "movies":
for item in media.items:
for part in item.parts:
support.localmedia.find_subtitles(part)
return
# Look for subtitles for each episode.
for s in media.seasons:
# If we've got a date based season, ignore it for now, otherwise it'll collide with S/E folders/XML and PMS
# prefers date-based (why?)
if int(s) < 1900 or metadata.guid.startswith(PERSONAL_MEDIA_IDENTIFIER):
for e in media.seasons[s].episodes:
for i in media.seasons[s].episodes[e].items:
# Look for subtitles.
for part in i.parts:
support.localmedia.find_subtitles(part)
else:
pass
class SubZeroAgent(object):
agent_type = None
agent_type_verbose = None
languages = [Locale.Language.English]
primary_provider = False
contributes_to = ['com.plexapp.agents.imdb']
score_prefs_key = None
def __init__(self, *args, **kwargs):
super(SubZeroAgent, self).__init__(*args, **kwargs)
self.agent_type = "movies" if isinstance(self, Agent.Movies) else "series"
self.name = "Sub-Zero Subtitles (%s, %s)" % (self.agent_type_verbose, config.get_version())
def search(self, results, media, lang):
Log.Debug("MOVIE SEARCH CALLED")
Log.Debug("Sub-Zero %s, %s search" % (config.version, self.agent_type))
results.Append(MetadataSearchResult(id='null', score=100))
def update(self, metadata, media, lang):
Log.Debug("MOVIE UPDATE CALLED")
videos = scanMovieMedia(media)
subtitles = downloadBestSubtitles(videos.keys())
saveSubtitles(videos, subtitles)
Log.Debug("Sub-Zero %s, %s update called" % (config.version, self.agent_type))
class SubliminalSubtitlesAgentTvShows(Agent.TV_Shows):
name = 'Subliminal TV Subtitles'
languages = [Locale.Language.English]
primary_provider = False
contributes_to = ['com.plexapp.agents.thetvdb', 'com.plexapp.agents.thetvdbdvdorder']
if not media:
Log.Error("Called with empty media, something is really wrong with your setup!")
return
def search(self, results, media, lang):
Log.Debug("TV SEARCH CALLED")
results.Append(MetadataSearchResult(id='null', score=100))
set_refresh_menu_state(media, media_type=self.agent_type)
def update(self, metadata, media, lang):
Log.Debug("TvUpdate. Lang %s" % lang)
videos = scanTvMedia(media)
subtitles = downloadBestSubtitles(videos.keys())
saveSubtitles(videos, subtitles)
item_ids = []
try:
init_subliminal_patches()
parts = convert_media_to_parts(media, kind=self.agent_type)
# media ignored?
use_any_parts = False
for part in parts:
if is_ignored(part["id"]):
Log.Debug(u"Ignoring %s" % part)
continue
use_any_parts = True
if not use_any_parts:
Log.Debug(u"Nothing to do.")
return
use_score = Prefs[self.score_prefs_key]
scanned_parts = scan_parts(parts, kind=self.agent_type)
subtitles = download_best_subtitles(scanned_parts, min_score=int(use_score))
item_ids = get_media_item_ids(media, kind=self.agent_type)
whack_missing_parts(scanned_parts)
if subtitles:
save_subtitles(scanned_parts, subtitles)
update_local_media(metadata, media, media_type=self.agent_type)
finally:
# update the menu state
set_refresh_menu_state(None)
# notify any running tasks about our finished update
for item_id in item_ids:
scheduler.signal("updated_metadata", item_id)
# resolve existing intent for that id
intent.resolve("force", item_id)
Dict.Save()
class SubZeroSubtitlesAgentMovies(SubZeroAgent, Agent.Movies):
contributes_to = ['com.plexapp.agents.imdb', 'com.plexapp.agents.xbmcnfo', 'com.plexapp.agents.themoviedb', 'com.plexapp.agents.hama']
score_prefs_key = "subtitles.search.minimumMovieScore"
agent_type_verbose = "Movies"
class SubZeroSubtitlesAgentTvShows(SubZeroAgent, Agent.TV_Shows):
contributes_to = ['com.plexapp.agents.thetvdb', 'com.plexapp.agents.themoviedb',
'com.plexapp.agents.thetvdbdvdorder', 'com.plexapp.agents.xbmcnfotv', 'com.plexapp.agents.hama']
score_prefs_key = "subtitles.search.minimumTVScore"
agent_type_verbose = "TV"
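
The reworked `save_subtitles` in the diff above tries the filesystem first and, if that raises `OSError` while `subtitles.save.metadata_fallback` is set, stores the subtitles as metadata instead. A simplified, standalone sketch of that control flow follows. The `save_to_file` and `save_to_metadata` callables and the plain `prefs` dict stand in for the plugin's functions and the Plex `Prefs` object; they are assumptions made for illustration only.

```python
def save_with_fallback(prefs, save_to_file, save_to_metadata):
    """Return the storage that was used: "filesystem" or "metadata"."""
    if prefs.get("subtitles.save.filesystem"):
        try:
            save_to_file()
            return "filesystem"
        except OSError:
            if not prefs.get("subtitles.save.metadata_fallback"):
                # no fallback configured: surface the original error
                raise
    # either metadata storage was requested or the filesystem save failed
    save_to_metadata()
    return "metadata"


def failing_save():
    # stand-in for a filesystem write that fails
    raise OSError("read-only filesystem")


stored = []
used = save_with_fallback(
    {"subtitles.save.filesystem": True, "subtitles.save.metadata_fallback": True},
    failing_save,
    lambda: stored.append("subtitle"),
)
print(used)  # metadata
```

Returning the storage that actually received the subtitles is a choice of this sketch rather than something taken from the diff.
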
+7
@@ -0,0 +1,7 @@
import sys
import menu
sys.modules["interface.menu"] = menu
import menu_helpers
sys.modules["interface.menu_helpers"] = menu_helpers
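
The seven-line file above registers already-imported modules under dotted names in `sys.modules`, so that later imports of `interface.menu` resolve to the same module object. A small standalone sketch of that mechanism, using a throwaway module built with `types.ModuleType`; the names `demo_menu` and `demo_interface.menu` are made up for this example and are not part of the plugin:

```python
import importlib
import sys
import types

# build a stand-in module instead of importing a real file
demo_menu = types.ModuleType("demo_menu")
demo_menu.TITLE = "Sub-Zero demo"

# register it under a dotted alias, mirroring the pattern in the diff
sys.modules["demo_interface.menu"] = demo_menu

# the import machinery consults sys.modules first, so the alias resolves
# to the very same module object without touching the filesystem
resolved = importlib.import_module("demo_interface.menu")
print(resolved is demo_menu)  # True
print(resolved.TITLE)         # Sub-Zero demo
```
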
+570
@@ -0,0 +1,570 @@
# coding=utf-8
import logging
import logger
from menu_helpers import add_ignore_options, dig_tree, set_refresh_menu_state, \
should_display_ignore, enable_channel_wrapper, default_thumb, debounce
from subzero.constants import TITLE, ART, ICON, PREFIX, PLUGIN_IDENTIFIER, DEPENDENCY_MODULE_NAMES
from support.background import scheduler
from support.config import config
from support.helpers import pad_title, timestamp
from support.ignore import ignore_list
from support.items import get_item, get_on_deck_items, refresh_item, get_all_items, get_recent_items, get_items_info, get_item_thumb
from support.lib import Plex
from support.missing_subtitles import items_get_all_missing_subs
from support.storage import reset_storage, log_storage, get_subtitle_info
from support.plex_media import scan_parts
# init GUI
ObjectContainer.art = R(ART)
ObjectContainer.no_cache = True
# default thumb for DirectoryObjects
DirectoryObject.thumb = default_thumb
# noinspection PyUnboundLocalVariable
route = enable_channel_wrapper(route)
# noinspection PyUnboundLocalVariable
handler = enable_channel_wrapper(handler)
@handler(PREFIX, TITLE, art=ART, thumb=ICON)
@route(PREFIX)
def fatality(randomize=None, force_title=None, header=None, message=None, only_refresh=False, no_history=False, replace_parent=False):
"""
subzero main menu
"""
title = force_title if force_title is not None else config.full_version
oc = ObjectContainer(title1=title, title2=None, header=unicode(header) if header else header, message=message, no_history=no_history,
replace_parent=replace_parent, no_cache=True)
if not config.permissions_ok and config.missing_permissions:
for title, path in config.missing_permissions:
oc.add(DirectoryObject(
key=Callback(fatality, randomize=timestamp()),
title=pad_title("Insufficient permissions"),
summary="Insufficient permissions on library %s, folder: %s" % (title, path),
))
return oc
if not only_refresh:
if Dict["current_refresh_state"]:
oc.add(DirectoryObject(
key=Callback(fatality, force_title=" ", randomize=timestamp()),
title=pad_title("Working ... refresh here"),
summary="Current state: %s; Last state: %s" % (
(Dict["current_refresh_state"] or "Idle") if "current_refresh_state" in Dict else "Idle",
(Dict["last_refresh_state"] or "None") if "last_refresh_state" in Dict else "None"
)
))
oc.add(DirectoryObject(
key=Callback(OnDeckMenu),
title="On Deck items",
summary="Shows the current on deck items and allows you to individually (force-) refresh their metadata/subtitles."
))
oc.add(DirectoryObject(
key=Callback(RecentlyAddedMenu),
title="Items with missing subtitles",
summary="Shows the items honoring the configured 'Item age to be considered recent'-setting (%s)"
" and allowing you to individually (force-) refresh their metadata/subtitles. " % Prefs["scheduler.item_is_recent_age"]
))
oc.add(DirectoryObject(
key=Callback(SectionsMenu),
title="Browse all items",
summary="Go through your whole library and manage your ignore list. You can also "
"(force-) refresh the metadata/subtitles of individual items."
))
task_name = "searchAllRecentlyAddedMissing"
task = scheduler.task(task_name)
if task.ready_for_display:
task_state = "Running: %s/%s (%s%%)" % (len(task.items_done), len(task.items_searching), task.percentage)
else:
task_state = "Last scheduler run: %s; Next scheduled run: %s; Last runtime: %s" % (scheduler.last_run(task_name) or "never",
scheduler.next_run(task_name) or "never",
str(task.last_run_time).split(".")[0])
oc.add(DirectoryObject(
key=Callback(RefreshMissing, randomize=timestamp()),
title="Search for missing subtitles (in recently-added items, max-age: %s)" % Prefs["scheduler.item_is_recent_age"],
summary="Automatically run periodically by the scheduler, if configured. %s" % task_state
))
oc.add(DirectoryObject(
key=Callback(IgnoreListMenu),
title="Display ignore list (%d)" % len(ignore_list),
summary="Show the current ignore list (mainly used for the automatic tasks)"
))
oc.add(DirectoryObject(
key=Callback(fatality, force_title=" ", randomize=timestamp()),
title=pad_title("Refresh"),
summary="Current state: %s; Last state: %s" % (
(Dict["current_refresh_state"] or "Idle") if "current_refresh_state" in Dict else "Idle",
(Dict["last_refresh_state"] or "None") if "last_refresh_state" in Dict else "None"
)
))
if not only_refresh:
oc.add(DirectoryObject(
key=Callback(AdvancedMenu),
title=pad_title("Advanced functions"),
summary="Use at your own risk"
))
return oc
@route(PREFIX + '/on_deck')
def OnDeckMenu(message=None):
"""
displays the items on deck
:param message:
:return:
"""
return mergedItemsMenu(title="Items On Deck", base_title="Items On Deck", itemGetter=get_on_deck_items)
@route(PREFIX + '/recent')
def RecentlyAddedMenu(message=None):
"""
displays the recently added items with missing subtitles
:param message:
:return:
"""
return recentItemsMenu(title="Missing Subtitles", base_title="Missing Subtitles")
def recentItemsMenu(title, base_title=None):
oc = ObjectContainer(title2=title, no_cache=True, no_history=True)
recent_items = get_recent_items()
if recent_items:
missing_items = items_get_all_missing_subs(recent_items)
if missing_items:
for added_at, item_id, title, item in missing_items:
oc.add(DirectoryObject(
key=Callback(ItemDetailsMenu, title=base_title + " > " + title, item_title=title, rating_key=item_id),
title=title,
thumb=get_item_thumb(item) or default_thumb
))
return oc
def mergedItemsMenu(title, itemGetter, itemGetterKwArgs=None, base_title=None, *args, **kwargs):
"""
displays an item list of dynamic kinds of items
:param title:
:param itemGetter:
:param itemGetterKwArgs:
:param base_title:
:param args:
:param kwargs:
:return:
"""
oc = ObjectContainer(title2=title, no_cache=True, no_history=True)
items = itemGetter(*args, **kwargs)
for kind, title, item_id, deeper, item in items:
oc.add(DirectoryObject(
title=title,
key=Callback(ItemDetailsMenu, title=base_title + " > " + title, item_title=title, rating_key=item_id),
thumb=get_item_thumb(item) or default_thumb
))
return oc
def determine_section_display(kind, item):
"""
returns the menu function for a section based on the size of it (amount of items)
:param kind:
:param item:
:return:
"""
if item.size > 200:
return SectionFirstLetterMenu
return SectionMenu
@route(PREFIX + '/ignore/set/{kind}/{rating_key}/{todo}/sure={sure}', kind=str, rating_key=str, todo=str, sure=bool)
def IgnoreMenu(kind, rating_key, title=None, sure=False, todo="not_set"):
"""
displays the ignore options for a menu
:param kind:
:param rating_key:
:param title:
:param sure:
:param todo:
:return:
"""
is_ignored = rating_key in ignore_list[kind]
if not sure:
oc = ObjectContainer(no_history=True, replace_parent=True, title1="%s %s %s %s the ignore list" % (
"Add" if not is_ignored else "Remove", ignore_list.verbose(kind), title, "to" if not is_ignored else "from"), title2="Are you sure?")
oc.add(DirectoryObject(
key=Callback(IgnoreMenu, kind=kind, rating_key=rating_key, title=title, sure=True, todo="add" if not is_ignored else "remove"),
title=pad_title("Are you sure?"),
))
return oc
rel = ignore_list[kind]
dont_change = False
if todo == "remove":
if not is_ignored:
dont_change = True
else:
rel.remove(rating_key)
Log.Info("Removed %s (%s) from the ignore list", title, rating_key)
ignore_list.remove_title(kind, rating_key)
ignore_list.save()
state = "removed from"
elif todo == "add":
if is_ignored:
dont_change = True
else:
rel.append(rating_key)
Log.Info("Added %s (%s) to the ignore list", title, rating_key)
ignore_list.add_title(kind, rating_key, title)
ignore_list.save()
state = "added to"
else:
dont_change = True
if dont_change:
return fatality(force_title=" ", header="Didn't change the ignore list", no_history=True)
return fatality(force_title=" ", header="%s %s the ignore list" % (title, state), no_history=True)
@route(PREFIX + '/sections')
def SectionsMenu():
"""
displays the menu for all sections
:return:
"""
items = get_all_items("sections")
return dig_tree(ObjectContainer(title2="Sections", no_cache=True, no_history=True), items, None,
menu_determination_callback=determine_section_display, pass_kwargs={"base_title": "Sections"},
fill_args={"title": "section_title"})
@route(PREFIX + '/section', ignore_options=bool)
def SectionMenu(rating_key, title=None, base_title=None, section_title=None, ignore_options=True):
"""
displays the contents of a section
:param rating_key:
:param title:
:param base_title:
:param section_title:
:param ignore_options:
:return:
"""
items = get_all_items(key="all", value=rating_key, base="library/sections")
kind, deeper = get_items_info(items)
title = unicode(title)
section_title = title
title = base_title + " > " + title
oc = ObjectContainer(title2=title, no_cache=True, no_history=True)
if ignore_options:
add_ignore_options(oc, "sections", title=section_title, rating_key=rating_key, callback_menu=IgnoreMenu)
return dig_tree(oc, items, MetadataMenu,
pass_kwargs={"base_title": title, "display_items": deeper, "previous_item_type": "section",
"previous_rating_key": rating_key})
@route(PREFIX + '/section/firstLetter', deeper=bool)
def SectionFirstLetterMenu(rating_key, title=None, base_title=None, section_title=None):
"""
displays the contents of a section indexed by its first char (A-Z, 0-9...)
:param rating_key:
:param title:
:param base_title:
:param section_title:
:return:
"""
items = get_all_items(key="first_character", value=rating_key, base="library/sections")
kind, deeper = get_items_info(items)
title = unicode(title)
oc = ObjectContainer(title2=section_title, no_cache=True, no_history=True)
title = base_title + " > " + title
add_ignore_options(oc, "sections", title=section_title, rating_key=rating_key, callback_menu=IgnoreMenu)
oc.add(DirectoryObject(
key=Callback(SectionMenu, title="All", base_title=title, rating_key=rating_key, ignore_options=False),
title="All"
)
)
return dig_tree(oc, items, FirstLetterMetadataMenu, force_rating_key=rating_key, fill_args={"key": "key"},
pass_kwargs={"base_title": title, "display_items": deeper, "previous_rating_key": rating_key})
@route(PREFIX + '/section/firstLetter/key', deeper=bool)
def FirstLetterMetadataMenu(rating_key, key, title=None, base_title=None, display_items=False, previous_item_type=None,
previous_rating_key=None):
"""
displays the contents of a section filtered by the first letter
:param rating_key: actually is the section's key
:param key: the firstLetter wanted
:param title: the first letter, or #
:param deeper:
:return:
"""
title = base_title + " > " + unicode(title)
oc = ObjectContainer(title2=title, no_cache=True, no_history=True)
items = get_all_items(key="first_character", value=[rating_key, key], base="library/sections", flat=False)
kind, deeper = get_items_info(items)
dig_tree(oc, items, MetadataMenu,
pass_kwargs={"base_title": title, "display_items": deeper, "previous_item_type": kind, "previous_rating_key": rating_key})
return oc
@route(PREFIX + '/section/contents', display_items=bool)
def MetadataMenu(rating_key, title=None, base_title=None, display_items=False, previous_item_type=None, previous_rating_key=None):
"""
displays the contents of a section based on whether it has a deeper tree or not (movies->movie (item) list; series->series list)
:param rating_key:
:param title:
:param base_title:
:param display_items:
:param previous_item_type:
:param previous_rating_key:
:return:
"""
title = unicode(title)
item_title = title
title = base_title + " > " + title
oc = ObjectContainer(title2=title, no_cache=True, no_history=True)
if display_items:
items = get_all_items(key="children", value=rating_key, base="library/metadata")
kind, deeper = get_items_info(items)
dig_tree(oc, items, MetadataMenu,
pass_kwargs={"base_title": title, "display_items": deeper, "previous_item_type": kind, "previous_rating_key": rating_key})
# we don't know exactly where we are here, only add ignore option to series
if should_display_ignore(items, previous=previous_item_type):
add_ignore_options(oc, "series", title=item_title, rating_key=rating_key, callback_menu=IgnoreMenu)
# add refresh
oc.add(DirectoryObject(
key=Callback(RefreshItem, rating_key=rating_key, item_title=item_title, refresh_kind=kind, previous_rating_key=previous_rating_key,
timeout=16000, randomize=timestamp()),
title=u"Refresh: %s" % item_title,
summary="Refreshes the item, possibly picking up new subtitles on disk"
))
oc.add(DirectoryObject(
key=Callback(RefreshItem, rating_key=rating_key, item_title=item_title, force=True, refresh_kind=kind,
previous_rating_key=previous_rating_key, timeout=16000),
title=u"Force-Refresh: %s" % item_title,
summary="Issues a forced refresh, ignoring known subtitles and searching for new ones"
))
else:
return ItemDetailsMenu(rating_key=rating_key, title=title, item_title=item_title)
return oc
@route(PREFIX + '/ignore_list')
def IgnoreListMenu():
oc = ObjectContainer(title2="Ignore list", replace_parent=True)
for key in ignore_list.key_order:
values = ignore_list[key]
for value in values:
add_ignore_options(oc, key, title=ignore_list.get_title(key, value), rating_key=value, callback_menu=IgnoreMenu)
return oc
@route(PREFIX + '/item/{rating_key}/actions')
def ItemDetailsMenu(rating_key, title=None, base_title=None, item_title=None, randomize=None):
"""
displays the item details menu of an item that doesn't contain any deeper tree, such as a movie or an episode
:param rating_key:
:param title:
:param base_title:
:param item_title:
:param randomize:
:return:
"""
title = unicode(base_title) + " > " + unicode(title) if base_title else unicode(title)
item = get_item(rating_key)
oc = ObjectContainer(title2=title, replace_parent=True)
oc.add(DirectoryObject(
key=Callback(RefreshItem, rating_key=rating_key, item_title=item_title, randomize=timestamp()),
title=u"Refresh: %s" % item_title,
summary="Refreshes the item, possibly picking up new subtitles on disk",
thumb=item.thumb or default_thumb
))
oc.add(DirectoryObject(
key=Callback(RefreshItem, rating_key=rating_key, item_title=item_title, force=True, randomize=timestamp()),
title=u"Force-Refresh: %s" % item_title,
summary="Issues a forced refresh, ignoring known subtitles and searching for new ones",
thumb=item.thumb or default_thumb
))
add_ignore_options(oc, "videos", title=item_title, rating_key=rating_key, callback_menu=IgnoreMenu)
return oc
@route(PREFIX + '/item/{rating_key}')
@debounce
def RefreshItem(rating_key=None, came_from="/recent", item_title=None, force=False, refresh_kind=None, previous_rating_key=None, timeout=8000, randomize=None, trigger=True):
assert rating_key
header = " "
if trigger:
set_refresh_menu_state(u"Triggering %sRefresh for %s" % ("Force-" if force else "", item_title))
Thread.Create(refresh_item, rating_key=rating_key, force=force, refresh_kind=refresh_kind, parent_rating_key=previous_rating_key,
timeout=int(timeout))
header = u"%s of item %s triggered" % ("Refresh" if not force else "Forced-refresh", rating_key)
return fatality(randomize=timestamp(), header=header, replace_parent=True)
@route(PREFIX + '/missing/refresh')
@debounce
def RefreshMissing(randomize=None, trigger=True):
header = " "
if trigger:
Thread.CreateTimer(1.0, lambda: scheduler.run_task("searchAllRecentlyAddedMissing"))
header = "Refresh of recently added items with missing subtitles triggered"
return fatality(header=header, replace_parent=True)
@route(PREFIX + '/advanced')
def AdvancedMenu(randomize=None, header=None, message=None):
oc = ObjectContainer(header=header or "Internal stuff, pay attention!", message=message, no_cache=True, no_history=True,
replace_parent=True, title2="Advanced")
oc.add(DirectoryObject(
key=Callback(TriggerRestart, randomize=timestamp()),
title=pad_title("Restart the plugin"),
))
oc.add(DirectoryObject(
key=Callback(LogStorage, key="tasks", randomize=timestamp()),
title=pad_title("Log the plugin's scheduled tasks state storage"),
))
oc.add(DirectoryObject(
key=Callback(LogStorage, key="subs", randomize=timestamp()),
title=pad_title("Log the plugin's internal subtitle information storage"),
))
oc.add(DirectoryObject(
key=Callback(LogStorage, key="ignore", randomize=timestamp()),
title=pad_title("Log the plugin's internal ignorelist storage"),
))
oc.add(DirectoryObject(
key=Callback(ResetStorage, key="tasks", randomize=timestamp()),
title=pad_title("Reset the plugin's scheduled tasks state storage"),
))
oc.add(DirectoryObject(
key=Callback(ResetStorage, key="subs", randomize=timestamp()),
title=pad_title("Reset the plugin's internal subtitle information storage"),
))
oc.add(DirectoryObject(
key=Callback(ResetStorage, key="ignore", randomize=timestamp()),
title=pad_title("Reset the plugin's internal ignorelist storage"),
))
return oc
@route(PREFIX + '/ValidatePrefs', enforce_route=True)
def ValidatePrefs():
Core.log.setLevel(logging.DEBUG)
Log.Debug("Validate Prefs called.")
# cache the channel state
update_dict = False
restart = False
if "channel_enabled" not in Dict:
update_dict = True
elif Dict["channel_enabled"] != Prefs["enable_channel"]:
Log.Debug("Channel features %s, restarting plugin", "enabled" if Prefs["enable_channel"] else "disabled")
update_dict = True
restart = True
if update_dict:
Dict["channel_enabled"] = Prefs["enable_channel"]
Dict.Save()
if restart:
DispatchRestart()
config.initialize()
scheduler.setup_tasks()
set_refresh_menu_state(None)
if Prefs["log_console"]:
Core.log.addHandler(logger.console_handler)
Log.Debug("Logging to console from now on")
else:
Core.log.removeHandler(logger.console_handler)
Log.Debug("Stop logging to console")
Log.Debug("Setting log-level to %s", Prefs["log_level"])
logger.register_logging_handler(DEPENDENCY_MODULE_NAMES, level=Prefs["log_level"])
Core.log.setLevel(logging.getLevelName(Prefs["log_level"]))
return
def DispatchRestart():
Thread.CreateTimer(1.0, Restart)
@route(PREFIX + '/advanced/restart/trigger')
@debounce
def TriggerRestart(randomize=None, trigger=True):
if trigger:
set_refresh_menu_state("Restarting the plugin")
DispatchRestart()
return fatality(header="Restart triggered, please wait about 5 seconds", force_title=" ", only_refresh=True, replace_parent=True,
no_history=True, randomize=timestamp())
@route(PREFIX + '/advanced/restart/execute')
def Restart():
Plex[":/plugins"].restart(PLUGIN_IDENTIFIER)
@route(PREFIX + '/storage/reset', sure=bool)
def ResetStorage(key, randomize=None, sure=False):
if not sure:
oc = ObjectContainer(no_history=True, title1="Reset subtitle storage", title2="Are you sure?")
oc.add(DirectoryObject(
key=Callback(ResetStorage, key=key, sure=True, randomize=timestamp()),
title=pad_title("Are you really sure?"),
))
return oc
reset_storage(key)
if key == "tasks":
# reinitialize the scheduler
scheduler.init_storage()
scheduler.setup_tasks()
return AdvancedMenu(
randomize=timestamp(),
header='Success',
message='Information Storage (%s) reset' % key
)
@route(PREFIX + '/storage/log')
def LogStorage(key, randomize=None):
log_storage(key)
return AdvancedMenu(
randomize=timestamp(),
header='Success',
message='Information Storage (%s) logged' % key
)
+140
@@ -0,0 +1,140 @@
# coding=utf-8
import types
from support.items import get_kind, get_item_thumb
from subzero import intent
from support.helpers import format_video
from support.ignore import ignore_list
from subzero.constants import ICON
from subzero.func import debouncer
default_thumb = R(ICON)
def should_display_ignore(items, previous=None):
# check for emptiness first; get_kind() indexes items[0]
if not items:
return False
kind = get_kind(items)
return (
(kind in ("show", "season")) or
(kind == "episode" and previous != "season")
)
def add_ignore_options(oc, kind, callback_menu=None, title=None, rating_key=None, add_kind=True):
"""
:param oc: oc to add our options to
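:param kind: movie, show, episode ... - gets translated to the ignore key (sections, series, videos)
:param callback_menu: menu to inject
:param title: title of the item, shown in the menu entry
:param rating_key: rating key of the item to add to or remove from the ignore list
:param add_kind: whether to include the verbose kind name in the menu entry title
:return: None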
"""
# try to translate kind to the ignore key
use_kind = kind
if kind not in ignore_list:
use_kind = ignore_list.translate_key(kind)
if not use_kind or use_kind not in ignore_list:
return
in_list = rating_key in ignore_list[use_kind]
oc.add(DirectoryObject(
key=Callback(callback_menu, kind=use_kind, rating_key=rating_key, title=title),
title=u"%s %s \"%s\" %s the ignore list" % (
"Remove" if in_list else "Add", ignore_list.verbose(kind) if add_kind else "", unicode(title), "from" if in_list else "to")
)
)
def dig_tree(oc, items, menu_callback, menu_determination_callback=None, force_rating_key=None, fill_args=None, pass_kwargs=None,
thumb=default_thumb):
for kind, title, key, dig_deeper, item in items:
thumb = get_item_thumb(item) or thumb
add_kwargs = {}
if fill_args:
add_kwargs = dict((name, getattr(item, k)) for k, name in fill_args.iteritems() if item and hasattr(item, k))
if pass_kwargs:
add_kwargs.update(pass_kwargs)
oc.add(DirectoryObject(
key=Callback(menu_callback or menu_determination_callback(kind, item), title=title, rating_key=force_rating_key or key,
**add_kwargs),
title=title, thumb=thumb
))
return oc
def set_refresh_menu_state(state_or_media, media_type="movies"):
"""
:param state_or_media: string, None, or Media argument from Agent.update()
:param media_type: movies or series
:return:
"""
if not state_or_media:
# store it in last state and remove the current
Dict["last_refresh_state"] = Dict["current_refresh_state"]
Dict["current_refresh_state"] = None
return
if isinstance(state_or_media, types.StringTypes):
Dict["current_refresh_state"] = state_or_media
return
media = state_or_media
media_id = media.id
title = None
if media_type == "series":
for season in media.seasons:
for episode in media.seasons[season].episodes:
ep = media.seasons[season].episodes[episode]
media_id = ep.id
title = format_video("show", ep.title, parent_title=media.title, season=int(season), episode=int(episode))
else:
title = format_video("movie", media.title)
force_refresh = intent.get("force", media_id)
Dict["current_refresh_state"] = u"%sRefreshing %s" % ("Force-" if force_refresh else "", unicode(title))
def enable_channel_wrapper(func):
"""
returns the original wrapper :func: (route or handler) if applicable, else the plain to-be-wrapped function
:param func: original wrapper
:return: original wrapper or wrapped function
"""
def noop(*args, **kwargs):
def inner(*a, **k):
"""
:param a: args
:param k: kwargs
:return: originally to-be-wrapped function
"""
return a[0]
return inner
def wrap(*args, **kwargs):
enforce_route = kwargs.pop("enforce_route", None)
return (func if Prefs["enable_channel"] or enforce_route else noop)(*args, **kwargs)
return wrap
def debounce(func):
"""
prevent func from being called twice with the same arguments
:param func:
:return:
"""
def wrap(*args, **kwargs):
if "randomize" in kwargs:
if ([func] + list(args), kwargs) in debouncer:
kwargs["trigger"] = False
Log.Debug("not triggering %s twice with %s, %s" % (func, args, kwargs))
else:
debouncer.add([func] + list(args), kwargs)
return func(*args, **kwargs)
return wrap
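The debounce decorator above can be tried outside the Plex sandbox with a small stand-alone sketch. It assumes a plain set (`_seen_calls`) as a hypothetical stand-in for `subzero.func.debouncer`, whose real implementation is not part of this diff, and it uses an illustrative `refresh_item` handler that is not the plugin's own function.

```python
import functools

# Hypothetical stand-in for subzero.func.debouncer, whose real
# implementation is not shown in this diff.
_seen_calls = set()


def debounce(func):
    """Run func normally, but pass trigger=False for a repeated call."""
    @functools.wraps(func)
    def wrap(*args, **kwargs):
        # Only calls that carry a "randomize" token are tracked.
        if "randomize" in kwargs:
            call_key = (func.__name__, args, tuple(sorted(kwargs.items())))
            if call_key in _seen_calls:
                kwargs["trigger"] = False
            else:
                _seen_calls.add(call_key)
        return func(*args, **kwargs)
    return wrap


@debounce
def refresh_item(rating_key, randomize=None, trigger=True):
    # Illustrative route handler; reports whether work would be started.
    return "triggered" if trigger else "skipped"
```

A second call with the same `randomize` value still returns a response, but with `trigger=False`, which matches the intent of not restarting work when the client re-requests the same URL.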
+20 -8
@@ -1,15 +1,22 @@
import logging
def registerLoggingHander(dependencies):
plexHandler = PlexLoggerHandler()
for dependency in dependencies:
Log.Debug("Registering LoggerHandler for dependency: %s" % dependency)
def register_logging_handler(dependencies, level="ERROR"):
plex_handler = PlexLoggerHandler()
for dependency in dependencies:
Log.Debug("Registering LoggerHandler for dependency: %s" % dependency)
log = logging.getLogger(dependency)
log.setLevel('DEBUG')
log.addHandler(plexHandler)
# remove previous plex logging handlers
# fixme: this is not the most elegant solution...
# iterate over a copy, removeHandler() mutates log.handlers
for handler in list(log.handlers):
if isinstance(handler, PlexLoggerHandler):
log.removeHandler(handler)
log.setLevel(level)
log.addHandler(plex_handler)
class PlexLoggerHandler(logging.StreamHandler):
def __init__(self, level=0):
super(PlexLoggerHandler, self).__init__(level)
@@ -30,4 +37,9 @@ class PlexLoggerHandler(logging.StreamHandler):
elif record.levelno == logging.FATAL:
Log.Exception(self.getFormattedString(record))
else:
Log.Error("UNKNOWN LEVEL: %s", record.getMessage())
Log.Error("UNKNOWN LEVEL: %s", record.getMessage())
console_handler = logging.StreamHandler()
console_formatter = Framework.core.LogFormatter('%(asctime)-15s - %(name)-32s (%(thread)x) : %(levelname)s (%(module)s:%(lineno)d) - %(message)s')
console_handler.setFormatter(console_formatter)
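Registering the handler repeatedly (for example on every ValidatePrefs call) should leave exactly one plugin handler per dependency logger. A minimal sketch of that behaviour using only the standard `logging` module follows; `MarkerHandler` is a hypothetical stand-in for `PlexLoggerHandler`, and the loop walks a copy of `log.handlers` so that removing entries cannot skip any.

```python
import logging


class MarkerHandler(logging.NullHandler):
    """Hypothetical stand-in for PlexLoggerHandler."""


def register_logging_handler(dependencies, level="ERROR"):
    handler = MarkerHandler()
    for dependency in dependencies:
        log = logging.getLogger(dependency)
        # Iterate over a copy: removeHandler() mutates log.handlers.
        for existing in list(log.handlers):
            if isinstance(existing, MarkerHandler):
                log.removeHandler(existing)
        log.setLevel(level)
        log.addHandler(handler)
    return handler
```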
+6
@@ -0,0 +1,6 @@
License for parts taken out of plexinc-agents/LocalMedia.bundle
License
-------
If the software submitted to this repository accesses or calls any software provided by Plex (“Interfacing Software”), then as a condition for receiving services from Plex in response to such accesses or calls, you agree to grant and do hereby grant to Plex and its affiliates worldwide a worldwide, nonexclusive, and royalty-free right and license to use (including testing, hosting and linking to), copy, publicly perform, publicly display, reproduce in copies for distribution, and distribute the copies of any Interfacing Software made by you or with your assistance; provided, however, that you may notify Plex at legal@plex.tv if you do not wish for Plex to use, distribute, copy, publicly perform, publicly display, reproduce in copies for distribution, or distribute copies of an Interfacing Software that was created by you, and Plex will use reasonable efforts to comply with such a request within a reasonable time.
+49
@@ -0,0 +1,49 @@
import sys
# thanks, https://github.com/trakt/Plex-Trakt-Scrobbler/blob/master/Trakttv.bundle/Contents/Code/core/__init__.py
import config
sys.modules["support.config"] = config
import helpers
sys.modules["support.helpers"] = helpers
import lib
sys.modules["support.lib"] = lib
import plex_media
sys.modules["support.plex_media"] = plex_media
import localmedia
sys.modules["subzero.localmedia"] = localmedia
import subtitlehelpers
sys.modules["support.subtitlehelpers"] = subtitlehelpers
import items
sys.modules["support.items"] = items
import missing_subtitles
sys.modules["support.missing_subtitles"] = missing_subtitles
import background
sys.modules["support.background"] = background
import tasks
sys.modules["support.tasks"] = tasks
import storage
sys.modules["support.storage"] = storage
import ignore
sys.modules["support.ignore"] = ignore
+42
@@ -0,0 +1,42 @@
# coding=utf-8
def refresh_plex_token():
username = Prefs["plex_username"]
password = Prefs["plex_password"]
if not username or not password:
if "token" in Dict:
del Dict["token"]
Dict.Save()
return
if "uuid" not in Dict:
Dict["uuid"] = String.UUID()
Dict.Save()
current_uuid = Dict["uuid"]
headers = {
'X-Plex-Device-Name': 'Sub-Zero',
'X-Plex-Product': 'Sub-Zero',
'X-Plex-Version': '1.3.0',
'X-Plex-Client-Identifier': "%s" % current_uuid,
}
request = HTTP.Request("https://plex.tv/users/sign_in.json", headers=headers,
values={'user[login]': Prefs["plex_username"], 'user[password]': Prefs["plex_password"]}, immediate=True)
token = None
if request:
try:
data = JSON.ObjectFromString(request.content)
token = data["user"]["authentication_token"]
log_data = data.copy()
log_data["user"]["authentication_token"] = "xxxxxxxxxxxxxxxxxx"
Log.Debug("Data returned from plex.tv: %s", log_data)
except:
pass
if token:
Dict["token"] = token
Dict.Save()
return True
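One detail of the token masking is worth illustrating: `dict.copy()` is shallow, so the nested `"user"` dictionary is shared between the copy and the original response. The sketch below is an assumption about how the masking could be isolated, not the plugin's code; it uses `copy.deepcopy` so the original data keeps the real token, and the response shape is taken from the keys accessed above.

```python
import copy


def redact_token(data, placeholder="xxxxxxxxxxxxxxxxxx"):
    """Return a deep copy of the response with the token masked."""
    log_data = copy.deepcopy(data)
    log_data["user"]["authentication_token"] = placeholder
    return log_data


# Illustrative response; real plex.tv responses carry more fields.
response = {"user": {"authentication_token": "secret", "username": "demo"}}
masked = redact_token(response)
```

With a deep copy, logging `masked` cannot affect any later use of `response`.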
+127
@@ -0,0 +1,127 @@
# coding=utf-8
import datetime
import logging
import traceback
def parse_frequency(s):
if s == "never":
return None, None
kind, num, unit = s.split()
return int(num), unit
class DefaultScheduler(object):
thread = None
running = False
registry = None
def __init__(self):
self.thread = None
self.running = False
self.registry = []
self.tasks = {}
self.init_storage()
def init_storage(self):
if "tasks" not in Dict:
Dict["tasks"] = {}
Dict.Save()
def register(self, task):
self.registry.append(task)
def setup_tasks(self):
# discover tasks;
self.tasks = {}
for cls in self.registry:
task = cls(self)
self.tasks[task.name] = {"task": task, "frequency": parse_frequency(Prefs["scheduler.tasks.%s" % task.name])}
def run(self):
self.running = True
self.thread = Thread.Create(self.worker)
def stop(self):
self.running = False
def task(self, name):
if name not in self.tasks:
return None
return self.tasks[name]["task"]
def last_run(self, task):
if task not in self.tasks:
return None
return self.tasks[task]["task"].last_run
def next_run(self, task):
if task not in self.tasks:
return None
frequency_num, frequency_key = self.tasks[task]["frequency"]
if not frequency_num:
return None
last = self.tasks[task]["task"].last_run
use_date = last
now = datetime.datetime.now()
if not use_date:
use_date = now
return max(use_date + datetime.timedelta(**{frequency_key: frequency_num}), now)
def run_task(self, name):
task = self.tasks[name]["task"]
if task.running:
Log.Debug("Scheduler: Not running %s, as it's currently running.", name)
return
Log.Debug("Scheduler: Running task %s", name)
try:
task.prepare()
task.run()
except Exception, e:
Log.Error("Scheduler: Something went wrong when running %s: %s", name, traceback.format_exc())
finally:
task.post_run()
def signal(self, name, *args, **kwargs):
for task_name, info in self.tasks.iteritems():
task = info["task"]
if task.running:
Log.Debug("Scheduler: Sending signal %s to task %s (%s, %s)", name, task_name, args, kwargs)
status = task.signal(name, *args, **kwargs)
if status:
Log.Debug("Scheduler: Signal accepted by %s", task_name)
else:
Log.Debug("Scheduler: Signal not accepted by %s", task_name)
continue
Log.Debug("Scheduler: Not sending signal %s to task %s, because: not running", name, task_name)
def worker(self):
Thread.Sleep(10.0)
while 1:
if not self.running:
break
for name, info in self.tasks.iteritems():
now = datetime.datetime.now()
task = info["task"]
if name not in Dict["tasks"]:
continue
if task.running:
continue
frequency_num, frequency_key = info["frequency"]
if not frequency_num:
continue
if not task.last_run or task.last_run + datetime.timedelta(**{frequency_key: frequency_num}) <= now:
self.run_task(name)
Thread.Sleep(10.0)
scheduler = DefaultScheduler()
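The scheduling rule used by the worker loop can be sketched in isolation. The example assumes frequency preferences of the form `"every 6 hours"` (three whitespace-separated tokens, as implied by the unpacking in `parse_frequency`); the exact preference strings are not shown in this diff, and `is_due` is a hypothetical helper that mirrors the condition in `worker`.

```python
import datetime


def parse_frequency(s):
    """Split e.g. "every 6 hours" into (6, "hours"); "never" disables."""
    if s == "never":
        return None, None
    _kind, num, unit = s.split()
    return int(num), unit


def is_due(last_run, frequency, now):
    """Return True when a task should run, given its last run time."""
    num, unit = frequency
    if not num:
        # Task is disabled.
        return False
    if last_run is None:
        # Never ran before: run on the next worker pass.
        return True
    # unit doubles as the timedelta keyword, e.g. timedelta(hours=6).
    return last_run + datetime.timedelta(**{unit: num}) <= now
```

Because the unit is passed straight to `datetime.timedelta`, only keyword names that `timedelta` accepts (such as `minutes`, `hours`, `days`, `weeks`) would work here.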
+222
@@ -0,0 +1,222 @@
# coding=utf-8
import os
import re
import inspect
from babelfish import Language
from subzero.lib.io import FileIO, get_viable_encoding
from subzero.constants import PLUGIN_NAME, PLUGIN_IDENTIFIER, MOVIE, SHOW
from lib import Plex
from helpers import check_write_permissions
SUBTITLE_EXTS = ['utf', 'utf8', 'utf-8', 'srt', 'smi', 'rt', 'ssa', 'aqt', 'jss', 'ass', 'idx', 'sub', 'txt', 'psb']
VIDEO_EXTS = ['3g2', '3gp', 'asf', 'asx', 'avc', 'avi', 'avs', 'bivx', 'bup', 'divx', 'dv', 'dvr-ms', 'evo', 'fli', 'flv',
'm2t', 'm2ts', 'm2v', 'm4v', 'mkv', 'mov', 'mp4', 'mpeg', 'mpg', 'mts', 'nsv', 'nuv', 'ogm', 'ogv', 'tp',
'pva', 'qt', 'rm', 'rmvb', 'sdp', 'svq3', 'strm', 'ts', 'ty', 'vdr', 'viv', 'vob', 'vp3', 'wmv', 'wpl', 'wtv', 'xsp', 'xvid',
'webm']
IGNORE_FN = ("subzero.ignore", ".subzero.ignore", ".nosz")
VERSION_RE = re.compile(ur'CFBundleVersion.+?<string>([0-9\.]+)</string>', re.DOTALL)
def int_or_default(s, default):
try:
return int(s)
except (ValueError, TypeError):
return default
class Config(object):
version = None
full_version = None
lang_list = None
subtitle_destination_folder = None
providers = None
provider_settings = None
max_recent_items_per_library = 200
permissions_ok = False
missing_permissions = None
ignore_paths = None
fs_encoding = None
notify_executable = None
sections = None
enabled_sections = None
initialized = False
def initialize(self):
self.fs_encoding = get_viable_encoding()
self.version = self.get_version()
self.full_version = u"%s %s" % (PLUGIN_NAME, self.version)
self.lang_list = self.get_lang_list()
self.subtitle_destination_folder = self.get_subtitle_destination_folder()
self.providers = self.get_providers()
self.provider_settings = self.get_provider_settings()
self.max_recent_items_per_library = int_or_default(Prefs["scheduler.max_recent_items_per_library"], 200)
self.sections = list(Plex["library"].sections())
self.missing_permissions = []
self.ignore_paths = self.parse_ignore_paths()
self.permissions_ok = self.check_permissions()
self.notify_executable = self.check_notify_executable()
self.enabled_sections = self.check_enabled_sections()
self.initialized = True
def check_permissions(self):
if not Prefs["subtitles.save.filesystem"] or not Prefs["check_permissions"]:
return True
use_ignore_fs = Prefs["subtitles.ignore_fs"]
all_permissions_ok = True
for section in self.sections:
title = section.title
for location in section:
path_str = location.path
if isinstance(path_str, unicode):
path_str = path_str.encode(self.fs_encoding)
if use_ignore_fs:
# check whether we've got an ignore file inside the section path
if self.is_physically_ignored(path_str):
continue
if self.is_path_ignored(path_str):
# is the path in our ignored paths setting?
continue
# section not ignored, check for write permissions
if not check_write_permissions(path_str):
# not enough permissions
self.missing_permissions.append((title, location.path))
all_permissions_ok = False
return all_permissions_ok
def get_version(self):
curDir = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
info_file_path = os.path.abspath(os.path.join(curDir, "..", "..", "Info.plist"))
data = FileIO.read(info_file_path)
result = VERSION_RE.search(data)
if result:
return result.group(1)
def parse_ignore_paths(self):
paths = Prefs["subtitles.ignore_paths"]
if paths:
try:
# skip empty entries; an empty prefix would match every path in is_path_ignored
return [path.strip() for path in paths.split(",") if path.strip()]
except:
Log.Error("Couldn't parse your ignore paths settings: %s" % paths)
return []
def is_physically_ignored(self, folder):
# check whether we've got an ignore file inside the path
for ifn in IGNORE_FN:
if os.path.isfile(os.path.join(folder, ifn)):
Log.Info(u'Ignoring "%s" because "%s" exists', folder, ifn)
return True
return False
def is_path_ignored(self, fn):
for path in self.ignore_paths:
if fn.startswith(path):
return True
return False
def check_notify_executable(self):
fn = Prefs["notify_executable"]
if not fn:
return
splitted_fn = fn.split()
exe_fn = splitted_fn[0]
arguments = [arg.strip() for arg in splitted_fn[1:]]
if os.path.isfile(exe_fn) and os.access(exe_fn, os.X_OK):
return exe_fn, arguments
Log.Error("Notify executable doesn't exist or isn't executable: %s" % exe_fn)
def check_enabled_sections(self):
enabled_for_primary_agents = []
enabled_sections = {}
# find which agents we're enabled for
for agent in Plex.agents():
if not agent.primary:
continue
for t in list(agent.media_types):
if t.media_type in (MOVIE, SHOW):
related_agents = Plex.primary_agent(agent.identifier, t.media_type)
for a in related_agents:
if a.identifier == PLUGIN_IDENTIFIER and a.enabled:
enabled_for_primary_agents.append(agent.identifier)
# find the libraries that use them
for library in self.sections:
if library.agent in enabled_for_primary_agents:
enabled_sections[library.key] = library
Log.Debug(u"I'm enabled for: %s" % [lib.title for key, lib in enabled_sections.iteritems()])
return enabled_sections
# Prepare a list of languages we want subs for
def get_lang_list(self):
l = {Language.fromietf(Prefs["langPref1"])}
lang_custom = Prefs["langPrefCustom"].strip()
if Prefs['subtitles.only_one']:
return l
if Prefs["langPref2"] != "None":
l.update({Language.fromietf(Prefs["langPref2"])})
if Prefs["langPref3"] != "None":
l.update({Language.fromietf(Prefs["langPref3"])})
if len(lang_custom) and lang_custom != "None":
for lang in lang_custom.split(u","):
lang = lang.strip()
try:
real_lang = Language.fromietf(lang)
except:
try:
real_lang = Language.fromname(lang)
except:
continue
l.update({real_lang})
return l
def get_subtitle_destination_folder(self):
if not Prefs["subtitles.save.filesystem"]:
return
fld_custom = Prefs["subtitles.save.subFolder.Custom"].strip() if bool(Prefs["subtitles.save.subFolder.Custom"]) else None
return fld_custom or (Prefs["subtitles.save.subFolder"] if Prefs["subtitles.save.subFolder"] != "current folder" else None)
def get_providers(self):
providers = {'opensubtitles': Prefs['provider.opensubtitles.enabled'],
#'thesubdb': Prefs['provider.thesubdb.enabled'],
'podnapisi': Prefs['provider.podnapisi.enabled'],
'addic7ed': Prefs['provider.addic7ed.enabled'],
'tvsubtitles': Prefs['provider.tvsubtitles.enabled']
}
return filter(lambda prov: providers[prov], providers)
def get_provider_settings(self):
provider_settings = {'addic7ed': {'username': Prefs['provider.addic7ed.username'],
'password': Prefs['provider.addic7ed.password'],
'use_random_agents': Prefs['provider.addic7ed.use_random_agents'],
},
'opensubtitles': {'username': Prefs['provider.opensubtitles.username'],
'password': Prefs['provider.opensubtitles.password'],
'use_tag_search': Prefs['provider.opensubtitles.use_tags']
},
}
return provider_settings
config = Config()
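The ignore-by-path setting boils down to a comma-separated list of path prefixes. A stand-alone sketch of parsing and matching follows; the function names mirror the methods above but take plain arguments instead of reading `Prefs`, and the example paths are made up. Unlike a plain split, the sketch drops empty entries, since an empty prefix would match every path.

```python
def parse_ignore_paths(raw):
    """Turn "a, b, c" into a list of stripped, non-empty prefixes."""
    if not raw:
        return []
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_path_ignored(filename, ignore_paths):
    """Prefix match: True if filename lies under any ignored path."""
    return any(filename.startswith(p) for p in ignore_paths)


# Illustrative setting value with stray whitespace and a trailing comma.
paths = parse_ignore_paths(" /media/kids , /media/tmp ,")
```

Note that this is a pure string-prefix test, so `/media/kids` would also match `/media/kids2/...`; ending entries with a path separator avoids that.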
+204
@@ -0,0 +1,204 @@
# coding=utf-8
import os
import traceback
import unicodedata
import datetime
import urllib
import time
import re
import platform
import string
import subprocess
# Unicode control characters can appear in ID3v2 tags but are not legal in XML.
RE_UNICODE_CONTROL = u'([\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff])' + \
u'|' + \
u'([%s-%s][^%s-%s])|([^%s-%s][%s-%s])|([%s-%s]$)|(^[%s-%s])' % \
(
unichr(0xd800), unichr(0xdbff), unichr(0xdc00), unichr(0xdfff),
unichr(0xd800), unichr(0xdbff), unichr(0xdc00), unichr(0xdfff),
unichr(0xd800), unichr(0xdbff), unichr(0xdc00), unichr(0xdfff)
)
# A platform independent way to split paths which might come in with different separators.
def split_path(str):
if str.find('\\') != -1:
return str.split('\\')
else:
return str.split('/')
def unicodize(s):
filename = s
try:
filename = unicodedata.normalize('NFC', unicode(s.decode('utf-8')))
except:
Log('Failed to unicodize: ' + filename)
try:
filename = re.sub(RE_UNICODE_CONTROL, '', filename)
except:
Log('Couldn\'t strip control characters: ' + filename)
return filename
def clean_filename(filename):
# this will remove any whitespace and punctuation chars and replace them with spaces, strip and return as lowercase
return string.translate(filename.encode('utf-8'), string.maketrans(string.punctuation + string.whitespace,
' ' * len(
string.punctuation + string.whitespace))).strip().lower()
def is_recent(t):
now = datetime.datetime.now()
when = datetime.datetime.fromtimestamp(t)
value, key = Prefs["scheduler.item_is_recent_age"].split()
if now - datetime.timedelta(**{key: int(value)}) < when:
return True
return False
# thanks, Plex-Trakt-Scrobbler
def str_pad(s, length, align='left', pad_char=' ', trim=False):
if not s:
return s
if not isinstance(s, (str, unicode)):
s = str(s)
if len(s) == length:
return s
elif len(s) > length and not trim:
return s
if align == 'left':
if len(s) > length:
return s[:length]
else:
return s + (pad_char * (length - len(s)))
elif align == 'right':
if len(s) > length:
return s[len(s) - length:]
else:
return (pad_char * (length - len(s))) + s
else:
raise ValueError("Unknown align type, expected either 'left' or 'right'")
def pad_title(value):
"""Pad a title to 30 characters to force the 'details' view."""
return str_pad(value, 30, pad_char=' ')
def format_item(item, kind, parent=None, parent_title=None, section_title=None, add_section_title=False):
"""
:param item: plex item
:param kind: show or movie
:param parent: season or None
:param parent_title: parentTitle or None
:return:
"""
return format_video(kind, item.title,
section_title=(
section_title or (parent.section.title if parent and getattr(parent, "section", None) else None)),
parent_title=(parent_title or (parent.show.title if parent else None)),
season=parent.index if parent else None,
episode=item.index if kind == "show" else None,
add_section_title=add_section_title)
def format_video(kind, title, section_title=None, parent_title=None, season=None, episode=None,
add_section_title=False):
section_add = ""
if add_section_title:
section_add = ("%s: " % section_title) if section_title else ""
if kind == "show" and parent_title:
if season and episode:
return '%s%s S%02dE%02d, %s' % (section_add, parent_title, season or 0, episode or 0, title)
return '%s%s, %s' % (section_add, parent_title, title)
return "%s%s" % (section_add, title)
def encode_message(base, s):
return "%s?message=%s" % (base, urllib.quote_plus(s))
def decode_message(s):
return urllib.unquote_plus(s)
def timestamp():
return int(time.time())
def query_plex(url, args):
"""
simple http query to the plex API without parsing anything too complicated
:param url:
:param args:
:return:
"""
use_args = args.copy()
computed_args = "&".join(["%s=%s" % (key, String.Quote(value)) for key, value in use_args.iteritems()])
return HTTP.Request(url + (("?%s" % computed_args) if computed_args else ""), immediate=True)
def check_write_permissions(path):
if platform.system() == "Windows":
# physical access check
check_path = os.path.join(os.path.realpath(path), ".sz_perm_chk")
try:
if os.path.exists(check_path):
os.rmdir(check_path)
os.mkdir(check_path)
os.rmdir(check_path)
return True
except OSError:
pass
else:
# os.access check
return os.access(path, os.W_OK | os.X_OK)
return False
def get_item_hints(title, kind, series=None):
hints = {"expected_title": [title]}
hints.update({"type": "episode", "expected_series": [series]} if kind == "series" else {"type": "movie"})
return hints
def notify_executable(exe_info, videos, subtitles, storage):
variables = (
"subtitle_language", "subtitle_path", "subtitle_filename", "provider", "score", "storage", "series_id",
"series", "title", "section", "filename", "path", "folder", "season_id", "type", "id", "season"
)
exe, arguments = exe_info
for video, video_subtitles in subtitles.items():
for subtitle in video_subtitles:
lang = Locale.Language.Match(subtitle.language.alpha2)
data = video.plexapi_metadata.copy()
data.update({
"subtitle_language": lang,
"provider": subtitle.provider_name,
"score": subtitle.score,
"storage": storage,
"subtitle_path": subtitle.storage_path,
"subtitle_filename": os.path.basename(subtitle.storage_path)
})
# fill missing data with None
prepared_data = dict((v, data.get(v)) for v in variables)
prepared_arguments = [arg % prepared_data for arg in arguments]
Log.Debug(u"Calling %s with arguments: %s" % (exe, prepared_arguments))
try:
output = subprocess.check_output([exe] + prepared_arguments, stderr=subprocess.STDOUT)
except subprocess.CalledProcessError:
Log.Error(u"Calling %s failed: %s" % (exe, traceback.format_exc()))
else:
Log.Debug(u"Process output: %s" % output)
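The argument handling in `notify_executable` relies on printf-style formatting with a mapping: each configured argument may reference a variable as `%(name)s`, and variables without a value are filled with `None`. A minimal sketch, with `prepare_arguments` as a hypothetical helper and made-up argument strings:

```python
def prepare_arguments(arguments, data, variables):
    """Substitute %(name)s placeholders in each argument."""
    # Fill every known variable, defaulting to None when data lacks it.
    prepared = dict((name, data.get(name)) for name in variables)
    return [arg % prepared for arg in arguments]


variables = ("subtitle_language", "title", "score")
args = ["--lang=%(subtitle_language)s", "%(title)s", "score=%(score)s", "plain"]
out = prepare_arguments(
    args, {"subtitle_language": "en", "title": "Some Movie"}, variables)
```

Since the setting is split on whitespace before substitution, a substituted value that contains spaces still arrives as a single argument, but a placeholder name outside `variables` would raise a `KeyError`.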
+62
@@ -0,0 +1,62 @@
# coding=utf-8
from subzero.lib.dict import DictProxy
class IgnoreDict(DictProxy):
store = "ignore"
# single item kinds returned by support.items.get_items mapped to their ignore list keys
translate_keys = {
"section": "sections",
"show": "series",
"movie": "videos",
"episode": "videos"
}
# ignore list keys mapped to their verbose names
keys_verbose = {
"sections": "Section",
"series": "Series",
"videos": "Item",
}
key_order = ("sections", "series", "videos")
def __len__(self):
try:
return sum(len(self.Dict[self.store][key]) for key in self.key_order)
except KeyError:
# old version
self.Dict[self.store] = self.setup_defaults()
return 0
def translate_key(self, name):
return self.translate_keys.get(name)
def verbose(self, name):
return self.keys_verbose.get(name)
def get_title_key(self, kind, key):
return "%s_%s" % (kind, key)
def add_title(self, kind, key, title):
self["titles"][self.get_title_key(kind, key)] = title
def remove_title(self, kind, key):
title_key = self.get_title_key(kind, key)
if title_key in self.titles:
del self.titles[title_key]
def get_title(self, kind, key):
title_key = self.get_title_key(kind, key)
if title_key in self.titles:
return self.titles[title_key]
def save(self):
Dict.Save()
def setup_defaults(self):
return {"sections": [], "series": [], "videos": [], "titles": {}}
ignore_list = IgnoreDict(Dict)
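How an item kind is resolved to an ignore-list key can be shown without the Plex `Dict` backend. The sketch below assumes a plain dictionary shaped like the `setup_defaults` result as the store; `resolve_ignore_key` is a hypothetical helper that follows the lookup order used by `add_ignore_options`.

```python
TRANSLATE_KEYS = {
    "section": "sections",
    "show": "series",
    "movie": "videos",
    "episode": "videos",
}


def resolve_ignore_key(kind, store):
    """Return the ignore-list key for kind, or None if there is none."""
    if kind in store:
        # Already an ignore-list key such as "series" or "videos".
        return kind
    translated = TRANSLATE_KEYS.get(kind)
    if translated in store:
        return translated
    return None


# Same shape as the defaults created by setup_defaults().
store = {"sections": [], "series": [], "videos": [], "titles": {}}
```

Movies and episodes share the `videos` list, so a single rating key lookup covers both kinds.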
+259
@@ -0,0 +1,259 @@
# coding=utf-8
import logging
import re
import types
import os
from ignore import ignore_list
from helpers import is_recent, format_item, query_plex
from subzero import intent
from lib import Plex
from config import config, IGNORE_FN
logger = logging.getLogger(__name__)
MI_KIND, MI_TITLE, MI_KEY, MI_DEEPER, MI_ITEM = 0, 1, 2, 3, 4
container_size_re = re.compile(ur'totalSize="(\d+)"')
def get_item(key):
item_id = int(key)
item_container = Plex["library"].metadata(item_id)
item = list(item_container)[0]
return item
def get_item_kind(item):
return type(item).__name__
def get_item_thumb(item):
kind = get_item_kind(item)
if kind == "Episode":
return item.show.thumb
elif kind == "Section":
return item.art
return item.thumb
def get_items_info(items):
return items[0][MI_KIND], items[0][MI_DEEPER]
def get_kind(items):
return items[0][MI_KIND]
def get_section_size(key):
"""
quick query to determine the section size
:param key:
:return:
"""
size = None
url = "http://127.0.0.1:32400/library/sections/%s/all" % int(key)
use_args = {
"X-Plex-Container-Size": "0",
"X-Plex-Container-Start": "0"
}
response = query_plex(url, use_args)
matches = container_size_re.findall(response.content)
if matches:
size = int(matches[0])
return size
def get_items(key="recently_added", base="library", value=None, flat=False, add_section_title=False):
"""
try to handle all return types plex throws at us and return a generalized item tuple
"""
items = []
apply_value = None
if value:
if isinstance(value, types.ListType):
apply_value = value
else:
apply_value = [value]
result = getattr(Plex[base], key)(*(apply_value or []))
for item in result:
cls = getattr(getattr(item, "__class__"), "__name__")
if hasattr(item, "scanner"):
kind = "section"
elif cls == "Directory":
kind = "directory"
else:
kind = item.type
# only return items for our enabled sections
section_key = None
if kind == "section":
section_key = item.key
else:
if hasattr(item, "section_key"):
section_key = getattr(item, "section_key")
if section_key and section_key not in config.enabled_sections:
continue
if kind == "season":
# fixme: i think this case is unused now
if flat:
# return episodes
for child in item.children():
items.append(("episode", format_item(child, "show", parent=item, add_section_title=add_section_title), int(item.rating_key),
False, child))
else:
# return seasons
items.append(("season", item.title, int(item.rating_key), True, item))
elif kind == "directory":
items.append(("directory", item.title, item.key, True, item))
elif kind == "section":
if item.type in ['movie', 'show']:
item.size = get_section_size(item.key)
items.append(("section", item.title, int(item.key), True, item))
elif kind == "episode":
items.append(
(kind, format_item(item, "show", parent=item.season, parent_title=item.show.title, section_title=item.section.title,
add_section_title=add_section_title), int(item.rating_key), False, item))
elif kind in ("movie", "artist", "photo"):
items.append((kind, format_item(item, kind, section_title=item.section.title, add_section_title=add_section_title),
int(item.rating_key), False, item))
elif kind == "show":
items.append((
kind, format_item(item, kind, section_title=item.section.title, add_section_title=add_section_title), int(item.rating_key), True,
item))
return items
def get_recently_added_items():
items = get_items(key="recently_added")
return filter(lambda x: is_recent(x[MI_ITEM].added_at), items)
def get_recent_items():
"""
actually get the recent items, not limited like /library/recentlyAdded
:return:
"""
args = {
"sort": "addedAt:desc",
"X-Plex-Container-Start": "0",
"X-Plex-Container-Size": "%s" % config.max_recent_items_per_library
}
episode_re = re.compile(ur'ratingKey="(?P<key>\d+)"'
ur'.+?grandparentRatingKey="(?P<parent_key>\d+)"'
ur'.+?title="(?P<title>.*?)"'
ur'.+?grandparentTitle="(?P<parent_title>.*?)"'
ur'.+?index="(?P<episode>\d+?)"'
ur'.+?parentIndex="(?P<season>\d+?)".+?addedAt="(?P<added>\d+)"')
movie_re = re.compile(ur'ratingKey="(?P<key>\d+)".+?title="(?P<title>.*?)".+?addedAt="(?P<added>\d+)"')
available_keys = ("key", "title", "parent_key", "parent_title", "season", "episode", "added")
recent = []
for section in Plex["library"].sections():
if section.type not in ("movie", "show") \
or section.key not in config.enabled_sections \
or section.key in ignore_list.sections:
Log.Debug(u"Skipping section: %s" % section.title)
continue
use_args = args.copy()
if section.type == "show":
use_args["type"] = "4"
url = "http://127.0.0.1:32400/library/sections/%s/all" % int(section.key)
response = query_plex(url, use_args)
matcher = episode_re if section.type == "show" else movie_re
matches = [m.groupdict() for m in matcher.finditer(response.content)]
for match in matches:
data = dict((key, match[key] if key in match else None) for key in available_keys)
if section.type == "show" and data["parent_key"] in ignore_list.series:
Log.Debug(u"Skipping series: %s" % data["parent_title"])
continue
if data["key"] in ignore_list.videos:
Log.Debug(u"Skipping item: %s" % data["title"])
continue
if is_recent(int(data["added"])):
recent.append((int(data["added"]), section.type, section.title, data["key"]))
return recent
def get_on_deck_items():
return get_items(key="on_deck", add_section_title=True)
def get_all_items(key, base="library", value=None, flat=False):
return get_items(key, base=base, value=value, flat=flat)
def is_ignored(rating_key, item=None):
"""
check whether an item, its show/season/section is in the soft or the hard ignore list
:param rating_key:
:param item:
:return:
"""
# item in soft ignore list
if rating_key in ignore_list["videos"]:
Log.Debug("Item %s is in the soft ignore list" % rating_key)
return True
item = item or get_item(rating_key)
kind = get_item_kind(item)
# show in soft ignore list
if kind == "Episode" and item.show.rating_key in ignore_list["series"]:
Log.Debug("Item %s's show is in the soft ignore list" % rating_key)
return True
# section in soft ignore list
if item.section.key in ignore_list["sections"]:
Log.Debug("Item %s's section is in the soft ignore list" % rating_key)
return True
# physical/path ignore
if Prefs["subtitles.ignore_fs"] or config.ignore_paths:
# normally check current item folder and the library
check_ignore_paths = [".", "../"]
if kind == "Episode":
# series/episode, we've got a season folder here, also
check_ignore_paths.append("../../")
for part in item.media.parts:
if config.ignore_paths and config.is_path_ignored(part.file):
Log.Debug("Item %s's path is manually ignored" % rating_key)
return True
if Prefs["subtitles.ignore_fs"]:
for sub_path in check_ignore_paths:
if config.is_physically_ignored(os.path.abspath(os.path.join(os.path.dirname(part.file), sub_path))):
Log.Debug("An ignore file exists in either the item's folder or one of its parent folders")
return True
return False
def refresh_item(rating_key, force=False, timeout=8000, refresh_kind=None, parent_rating_key=None):
# timeout actually is the time for which the intent will be valid
if force:
intent.set("force", rating_key, timeout=timeout)
if refresh_kind == "episode":
# season refresh
rating_key = parent_rating_key
Log.Info("%s item %s", "Refreshing" if not force else "Forced-refreshing", rating_key)
Plex["library/metadata"].refresh(rating_key)
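The regex-based scan in get_recent_items can be illustrated in isolation. This is a minimal sketch, not the plugin's code: it is written for Python 3, reuses only the shape of movie_re, and the names parse_recent, SAMPLE and the max_age_seconds parameter are hypothetical. The real is_recent helper is not shown in this diff, so the version here is an assumption.

```python
import re
import time

# Same pattern shape as movie_re in get_recent_items (Python 3 raw string).
MOVIE_RE = re.compile(
    r'ratingKey="(?P<key>\d+)".+?title="(?P<title>.*?)".+?addedAt="(?P<added>\d+)"'
)


def is_recent(added_at, max_age_seconds, now=None):
    # Assumed behaviour: an item is recent if it was added within max_age_seconds.
    now = time.time() if now is None else now
    return now - added_at <= max_age_seconds


def parse_recent(xml_text, max_age_seconds, now=None):
    # Collect (added_at, rating_key, title) for every recent item found in the raw XML.
    recent = []
    for match in MOVIE_RE.finditer(xml_text):
        data = match.groupdict()
        if is_recent(int(data["added"]), max_age_seconds, now=now):
            recent.append((int(data["added"]), data["key"], data["title"]))
    return recent


# Hypothetical, heavily shortened response body.
SAMPLE = (
    '<Video ratingKey="101" title="Old Movie" addedAt="1000"/>\n'
    '<Video ratingKey="102" title="New Movie" addedAt="9000"/>\n'
)
```

Matching on the raw response avoids building full item objects for every library entry, at the cost of depending on the attribute order in the XML.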
+37
@@ -0,0 +1,37 @@
# coding=utf-8
import plex
from subzero.lib.httpfake import PlexPyNativeResponseProxy
class PlexPyNativeRequestProxy(object):
"""
A really dumb object that tries to mimic requests.Request in an incomplete way, so that plex.Plex
uses native plex HTTPRequests instead of the better requests.Request class.
This allows us to operate freely on 127.0.0.1's PMS.
To be used in conjunction with subzero.lib.httpfake.PlexPyNativeResponseProxy
"""
url = None
data = None
headers = None
method = None
def prepare(self):
return self
def send(self):
# fixme: add self.data to HTTP.Request
data = None
status_code = 200
try:
data = HTTP.Request(self.url, headers=self.headers, immediate=True, method=self.method)
except Ex.HTTPError as e:
status_code = e.code
return PlexPyNativeResponseProxy(data, status_code, self)
plex.request.Request = PlexPyNativeRequestProxy
Plex = plex.Plex
+119
@@ -0,0 +1,119 @@
# coding=utf-8
import os
import config
import helpers
import subtitlehelpers
from config import config as sz_config
def find_subtitles(part):
lang_sub_map = {}
part_filename = helpers.unicodize(part.file)
part_basename = os.path.splitext(os.path.basename(part_filename))[0]
use_filesystem = bool(Prefs["subtitles.save.filesystem"])
paths = [os.path.dirname(part_filename)] if use_filesystem else []
global_subtitle_folder = None
if use_filesystem:
# Check for local subtitles subdirectory
sub_dir_base = paths[0]
sub_dir_list = []
if Prefs["subtitles.save.subFolder"] != "current folder":
# got selected subfolder
sub_dir_list.append(os.path.join(sub_dir_base, Prefs["subtitles.save.subFolder"]))
sub_dir_custom = Prefs["subtitles.save.subFolder.Custom"].strip() if bool(Prefs["subtitles.save.subFolder.Custom"]) else None
if sub_dir_custom:
# got custom subfolder
if sub_dir_custom.startswith("/"):
# absolute folder
sub_dir_list.append(sub_dir_custom)
else:
# relative folder
sub_dir_list.append(os.path.join(sub_dir_base, sub_dir_custom))
for sub_dir in sub_dir_list:
if os.path.isdir(sub_dir):
paths.append(sub_dir)
# Check for a global subtitle location
global_subtitle_folder = os.path.join(Core.app_support_path, 'Subtitles')
if os.path.exists(global_subtitle_folder):
paths.append(global_subtitle_folder)
# We start by building a dictionary of files to their absolute paths. We also need to know
# the number of media files that are actually present, in case the found local media asset
# is limited to a single instance per media file.
#
file_paths = {}
total_media_files = 0
for path in paths:
path = helpers.unicodize(path)
for file_path_listing in os.listdir(path.encode(sz_config.fs_encoding)):
# When using os.listdir with a unicode path, it will always return a string using the
# NFD form. However, we internally are using the form NFC and therefore need to convert
# it to allow correct regex / comparisons to be performed.
#
file_path_listing = helpers.unicodize(file_path_listing)
if os.path.isfile(os.path.join(path, file_path_listing).encode(sz_config.fs_encoding)):
file_paths[file_path_listing.lower()] = os.path.join(path, file_path_listing)
# If we've found an actual media file, we should record it.
(root, ext) = os.path.splitext(file_path_listing)
if ext.lower()[1:] in config.VIDEO_EXTS:
total_media_files += 1
Log('Looking for subtitle media in %d paths with %d media files.', len(paths), total_media_files)
Log('Paths: %s', ", ".join([helpers.unicodize(p) for p in paths]))
for file_path in file_paths.values():
local_basename = helpers.unicodize(os.path.splitext(os.path.basename(file_path))[0])
local_basename2 = local_basename.rsplit('.', 1)[0]
filename_matches_part = local_basename == part_basename or local_basename2 == part_basename
# If the file is located within the global subtitle folder and its name doesn't match exactly
# then we should simply ignore it.
#
if global_subtitle_folder and file_path.count(global_subtitle_folder) and not filename_matches_part:
continue
# If we have more than one media file within the folder and located filename doesn't match
# exactly then we should simply ignore it.
#
if total_media_files > 1 and not filename_matches_part:
continue
subtitle_helper = subtitlehelpers.subtitle_helpers(file_path)
if subtitle_helper is not None:
local_lang_map = subtitle_helper.process_subtitles(part)
for new_language, subtitles in local_lang_map.items():
# Add the possible new language along with the located subtitles so that we can validate them
# at the end...
#
if new_language not in lang_sub_map:
lang_sub_map[new_language] = []
lang_sub_map[new_language] = lang_sub_map[new_language] + subtitles
# add known metadata subs to our sub list
if not use_filesystem:
for language, sub_list in subtitlehelpers.get_subtitles_from_metadata(part).iteritems():
if sub_list:
if language not in lang_sub_map:
lang_sub_map[language] = []
lang_sub_map[language] = lang_sub_map[language] + sub_list
# Now whack subtitles that don't exist anymore.
for language in lang_sub_map.keys():
part.subtitles[language].validate_keys(lang_sub_map[language])
# Now whack the languages that don't exist anymore.
for language in list(set(part.subtitles.keys()) - set(lang_sub_map.keys())):
part.subtitles[language].validate_keys({})
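The filename comparison used in find_subtitles can be tried on its own. The following is a small sketch for Python 3 under the assumption that paths are plain POSIX strings; the function name filename_matches_part is hypothetical (the diff computes the same expression inline as a variable).

```python
import os


def filename_matches_part(subtitle_path, media_path):
    # Basename of the media file without its extension, e.g. "Avatar (2009)".
    part_basename = os.path.splitext(os.path.basename(media_path))[0]
    # Basename of the subtitle without its extension, e.g. "Avatar (2009).eng".
    local_basename = os.path.splitext(os.path.basename(subtitle_path))[0]
    # Same name with one more dotted suffix (typically the language) removed.
    local_basename2 = local_basename.rsplit('.', 1)[0]
    return local_basename == part_basename or local_basename2 == part_basename
```

A subtitle therefore matches either with or without a trailing language suffix, which is what lets "Avatar (2009).eng.srt" belong to "Avatar (2009).mkv".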
+77
@@ -0,0 +1,77 @@
# coding=utf-8
import traceback
from support.config import config
from support.helpers import format_item
from support.items import get_item
from support.lib import Plex
def item_discover_missing_subs(rating_key, kind="show", added_at=None, section_title=None, internal=False, external=True, languages=()):
existing_subs = {"internal": [], "external": [], "count": 0}
item_id = int(rating_key)
item = get_item(rating_key)
if kind == "show":
item_title = format_item(item, kind, parent=item.season, section_title=section_title, parent_title=item.show.title)
else:
item_title = format_item(item, kind, section_title=section_title)
video = item.media
for part in video.parts:
for stream in part.streams:
if stream.stream_type == 3:
if stream.index:
key = "internal"
else:
key = "external"
existing_subs[key].append(Locale.Language.Match(stream.language_code or ""))
existing_subs["count"] = existing_subs["count"] + 1
missing = languages
if existing_subs["count"]:
existing_flat = (existing_subs["internal"] if internal else []) + (existing_subs["external"] if external else [])
languages_set = set(languages)
if languages_set.issubset(existing_flat) or (len(existing_flat) >= 1 and Prefs['subtitles.only_one']):
# all subs found
Log.Info(u"All subtitles exist for '%s'", item_title)
return
missing = languages_set - set(existing_flat)
Log.Info(u"Subs still missing for '%s': %s", item_title, missing)
if missing:
return added_at, item_id, item_title, item
def items_get_all_missing_subs(items):
missing = []
for added_at, kind, section_title, key in items:
try:
state = item_discover_missing_subs(
key,
kind=kind,
added_at=added_at,
section_title=section_title,
languages=config.lang_list,
internal=bool(Prefs["subtitles.scan.embedded"]),
external=bool(Prefs["subtitles.scan.external"])
)
if state:
# (added_at, item_id, title)
missing.append(state)
except Exception:
Log.Error("Something went wrong when getting the state of item %s: %s", key, traceback.format_exc())
return missing
def refresh_item(item, title):
Plex["library/metadata"].refresh(item)
def refresh_items(items):
for item, title in items:
refresh_item(item, title)
+139
@@ -0,0 +1,139 @@
# coding=utf-8
import os
import subliminal
import helpers
from items import get_item
from subzero import intent
def flatten_media(media, kind="series"):
"""
iterates through media and returns the associated parts (videos)
:param media:
:param kind:
:return:
"""
parts = []
def get_metadata_dict(item, part, add):
data = {
"section": item.section.title,
"path": part.file,
"folder": os.path.dirname(part.file),
"filename": os.path.basename(part.file)
}
data.update(add)
return data
if kind == "series":
for season in media.seasons:
season_object = media.seasons[season]
for episode in media.seasons[season].episodes:
ep = media.seasons[season].episodes[episode]
# get plex item via API for additional metadata
plex_episode = get_item(ep.id)
for item in media.seasons[season].episodes[episode].items:
for part in item.parts:
parts.append(
get_metadata_dict(plex_episode, part,
{"video": part, "type": "episode", "title": ep.title,
"series": media.title, "id": ep.id,
"series_id": media.id, "season_id": season_object.id,
"season": plex_episode.season.index,
})
)
else:
plex_item = get_item(media.id)
for item in media.items:
for part in item.parts:
parts.append(
get_metadata_dict(plex_item, part, {"video": part, "type": "movie",
"title": media.title, "id": media.id,
"series_id": None,
"season_id": None,
"section": plex_item.section.title})
)
return parts
IGNORE_FN = ("subzero.ignore", ".subzero.ignore", ".nosz")
def convert_media_to_parts(media, kind="series"):
"""
returns a list of parts to be used later on; ignores folders with an existing "subzero.ignore" file
:param media:
:param kind:
:return:
"""
return flatten_media(media, kind=kind)
def get_stream_fps(streams):
"""
accepts a list of plex streams or a list of the plex api streams
"""
for stream in streams:
# video
stream_type = getattr(stream, "type", getattr(stream, "stream_type", None))
if stream_type == 1:
return getattr(stream, "frameRate", getattr(stream, "frame_rate", "25.000"))
return "25.000"
def get_media_item_ids(media, kind="series"):
ids = []
if kind == "movies":
ids.append(media.id)
else:
for season in media.seasons:
for episode in media.seasons[season].episodes:
ids.append(media.seasons[season].episodes[episode].id)
return ids
def scan_video(plex_video, ignore_all=False, hints=None):
embedded_subtitles = not ignore_all and Prefs['subtitles.scan.embedded']
external_subtitles = not ignore_all and Prefs['subtitles.scan.external']
if ignore_all:
Log.Debug("Force refresh intended.")
Log.Debug("Scanning video: %s, subtitles=%s, embedded_subtitles=%s" % (plex_video.file, external_subtitles, embedded_subtitles))
try:
return subliminal.video.scan_video(plex_video.file, subtitles=external_subtitles, embedded_subtitles=embedded_subtitles,
hints=hints or {}, video_fps=plex_video.fps)
except ValueError:
Log.Warn("File could not be guessed by subliminal")
def scan_parts(parts, kind="series"):
"""
receives a list of parts containing dictionaries returned by flatten_media
:param parts:
:param kind: series or movies
:return: dictionary of subliminal.video.scan_video, key=subliminal scanned video, value=plex file part
"""
ret = {}
for part in parts:
force_refresh = intent.get("force", part["id"], part["series_id"], part["season_id"])
hints = helpers.get_item_hints(part["title"], kind, series=part["series"] if kind == "series" else None)
part["video"].fps = get_stream_fps(part["video"].streams)
scanned_video = scan_video(part["video"], ignore_all=force_refresh, hints=hints)
if not scanned_video:
continue
scanned_video.id = part["id"]
part_metadata = part.copy()
del part_metadata["video"]
scanned_video.plexapi_metadata = part_metadata
ret[scanned_video] = part["video"]
return ret
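get_stream_fps accepts two attribute naming styles (type/frameRate and stream_type/frame_rate). A minimal Python 3 sketch of that fallback follows; the SimpleNamespace objects are stand-ins for the real stream objects, and the default parameter is an addition made here for illustration.

```python
from types import SimpleNamespace


def get_stream_fps(streams, default="25.000"):
    for stream in streams:
        # Agent-style streams expose "type", API-style streams expose "stream_type".
        stream_type = getattr(stream, "type", getattr(stream, "stream_type", None))
        if stream_type == 1:  # 1 marks a video stream
            return getattr(stream, "frameRate", getattr(stream, "frame_rate", default))
    # No video stream found: fall back to the default frame rate.
    return default


# Hypothetical stand-ins for the two kinds of stream objects.
agent_streams = [SimpleNamespace(type=2), SimpleNamespace(type=1, frameRate="23.976")]
api_streams = [SimpleNamespace(stream_type=1, frame_rate="24.000")]
```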
+92
@@ -0,0 +1,92 @@
# coding=utf-8
import datetime
import pprint
def get_subtitle_info(rating_key):
return Dict["subs"].get(rating_key)
def whack_missing_parts(videos, existing_parts=None):
"""
cleans out our internal storage's video parts (parts may get updated/deleted/whatever)
:param existing_parts: optional list of part ids known
:param videos: videos to check for
:return:
"""
# shortcut
if not existing_parts:
existing_parts = []
for part in videos.viewvalues():
existing_parts.append(part.id)
whacked_parts = False
for video in videos.keys():
if video.id not in Dict["subs"]:
continue
for part_id in Dict["subs"][video.id].keys():
if part_id not in existing_parts:
del Dict["subs"][video.id][part_id]
Log.Info("Whacking part %s in internal storage of video %s", part_id, video.id)
whacked_parts = True
if whacked_parts:
Dict.Save()
def store_subtitle_info(videos, subtitles, storage_type):
"""
stores information about downloaded subtitles in plex's Dict()
"""
if "subs" not in Dict:
Dict["subs"] = {}
storage = Dict["subs"]
existing_parts = []
for video, video_subtitles in subtitles.items():
part = videos[video]
if video.id not in storage:
storage[video.id] = {}
video_dict = storage[video.id]
if part.id not in video_dict:
video_dict[part.id] = {}
existing_parts.append(part.id)
part_dict = video_dict[part.id]
for subtitle in video_subtitles:
lang = Locale.Language.Match(subtitle.language.alpha2)
if lang not in part_dict:
part_dict[lang] = {}
lang_dict = part_dict[lang]
sub_key = (subtitle.provider_name, subtitle.id)
lang_dict[sub_key] = dict(score=subtitle.score, link=subtitle.page_link, storage=storage_type, hash=Hash.MD5(subtitle.content),
date_added=datetime.datetime.now())
lang_dict["current"] = sub_key
if existing_parts:
whack_missing_parts(videos, existing_parts=existing_parts)
Dict.Save()
def reset_storage(key):
"""
resets the Dict[key] storage, thanks to https://docs.google.com/document/d/1hhLjV1pI-TA5y91TiJq64BdgKwdLnFt4hWgeOqpz1NA/edit#
We can't use the nice Plex interface for this, as it calls get multiple times before set
#Plex[":/plugins/*/prefs"].set("com.plexapp.agents.subzero", "reset_storage", False)
"""
Log.Debug("resetting storage")
Dict[key] = {}
Dict.Save()
def log_storage(key):
if key in Dict:
Log.Debug(pprint.pformat(Dict[key]))
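The nested layout that store_subtitle_info builds (video id, then part id, then language, then subtitle key) can be sketched with a plain dict. This is an illustration for Python 3, not the plugin's code: a regular dict replaces Plex's Dict, hashlib.md5 stands in for Hash.MD5 (assumed to yield a hex digest), and the function name store and its flat argument list are hypothetical.

```python
import datetime
import hashlib


def store(storage, video_id, part_id, lang, provider_name, subtitle_id, content, score, storage_type):
    # Walk/create the nesting: storage[video_id][part_id][lang]
    lang_dict = storage.setdefault(video_id, {}).setdefault(part_id, {}).setdefault(lang, {})
    sub_key = (provider_name, subtitle_id)
    lang_dict[sub_key] = dict(
        score=score,
        storage=storage_type,
        hash=hashlib.md5(content).hexdigest(),
        date_added=datetime.datetime.now(),
    )
    # Remember which subtitle is the active one for this language.
    lang_dict["current"] = sub_key
    return storage
```

Using setdefault replaces the three explicit "if key not in dict" checks while producing the same structure.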
+167
@@ -0,0 +1,167 @@
# coding=utf-8
import re, os
import config
import helpers
from bs4 import UnicodeDammit
class SubtitleHelper(object):
def __init__(self, filename):
self.filename = filename
def subtitle_helpers(filename):
filename = helpers.unicodize(filename)
for cls in [VobSubSubtitleHelper, DefaultSubtitleHelper]:
if cls.is_helper_for(filename):
return cls(filename)
return None
#####################################################################################################################
class VobSubSubtitleHelper(SubtitleHelper):
@classmethod
def is_helper_for(cls, filename):
(file, file_extension) = os.path.splitext(filename)
# We only support idx (and maybe sub)
if file_extension.lower() not in ['.idx', '.sub']:
return False
# If we've been given a sub, we only support it if there exists a matching idx file
return os.path.exists(file + '.idx')
def process_subtitles(self, part):
lang_sub_map = {}
# We don't directly process the sub file, only the idx. Therefore, if we are passed one of these files, we simply
# ignore it.
(file, ext) = os.path.splitext(self.filename)
if ext.lower() == '.sub':
return lang_sub_map
# If we have an idx file, we need to confirm there is an identically named sub file before we can proceed.
sub_filename = file + ".sub"
if not os.path.exists(sub_filename):
return lang_sub_map
Log('Attempting to parse VobSub file: ' + self.filename)
idx = Core.storage.load(os.path.join(self.filename))
if idx.count('VobSub index file') == 0:
Log('The idx file does not appear to be a VobSub, skipping...')
return lang_sub_map
languages = {}
language_index = 0
basename = os.path.basename(self.filename)
for language in re.findall('\nid: ([A-Za-z]{2})', idx):
if language not in languages:
languages[language] = []
Log('Found .idx subtitle file: ' + self.filename + ' language: ' + language + ' stream index: ' + str(language_index))
languages[language].append(Proxy.LocalFile(self.filename, index=str(language_index), format="vobsub"))
language_index += 1
if language not in lang_sub_map:
lang_sub_map[language] = []
lang_sub_map[language].append(basename)
for language, subs in languages.items():
part.subtitles[language][basename] = subs
return lang_sub_map
#####################################################################################################################
class DefaultSubtitleHelper(SubtitleHelper):
@classmethod
def is_helper_for(cls, filename):
(file, file_extension) = os.path.splitext(filename)
return file_extension.lower()[1:] in config.SUBTITLE_EXTS
def process_subtitles(self, part):
lang_sub_map = {}
basename = os.path.basename(self.filename)
(file, ext) = os.path.splitext(self.filename)
# Remove the initial '.' from the extension
ext = ext[1:]
# Attempt to extract the language from the filename (e.g. Avatar (2009).eng)
language = ""
# IETF support thanks to https://github.com/hpsbranco/LocalMedia.bundle/commit/4fad9aefedece78a1fa96401304351347f644369
language_match = re.match(".+\.([^\.]+)$" if not Prefs["subtitles.language.ietf"] else ".+\.([^-.]+)(?:-[A-Za-z]+)?$", file)
if language_match and len(language_match.groups()) == 1:
language = language_match.groups()[0]
language = Locale.Language.Match(language)
codec = None
format = None
if ext in ['txt', 'sub']:
try:
file_contents = Core.storage.load(self.filename)
lines = [line.strip() for line in file_contents.splitlines(True)]
if re.match('^\{[0-9]+\}\{[0-9]*\}', lines[1]):
format = 'microdvd'
elif re.match('^[0-9]{1,2}:[0-9]{2}:[0-9]{2}[:=,]', lines[1]):
format = 'txt'
elif '[SUBTITLE]' in lines[1]:
format = 'subviewer'
else:
Log("The subtitle file does not have a known format, skipping... : " + self.filename)
return lang_sub_map
except:
Log("An error occurred while attempting to parse the subtitle file, skipping... : " + self.filename)
return lang_sub_map
if codec is None and ext in ['ass', 'ssa', 'smi', 'srt', 'psb']:
codec = ext.replace('ass', 'ssa')
if format is None:
format = codec
Log('Found subtitle file: ' + self.filename + ' language: ' + language + ' codec: ' + str(codec) + ' format: ' + str(format))
part.subtitles[language][basename] = Proxy.LocalFile(self.filename, codec=codec, format=format)
lang_sub_map[language] = [basename]
return lang_sub_map
def get_subtitles_from_metadata(part):
subs = {}
for language in part.subtitles:
subs[language] = []
for key, proxy in getattr(part.subtitles[language], "_proxies").iteritems():
if not proxy or len(proxy) < 5:
Log.Debug("Can't parse metadata: %s" % repr(proxy))
continue
p_type = proxy[0]
if p_type == "Media":
# metadata subtitle
Log.Debug(u"Found metadata subtitle: %s, %s" % (language, repr(proxy)))
subs[language].append(key)
return subs
def force_utf8(content):
a = UnicodeDammit(content)
Log.Debug("detected encoding: %s (None: most likely already successfully decoded)" % a.original_encoding)
# easy way out - already utf-8
if a.original_encoding == "utf-8":
return content
return (a.unicode_markup if a.unicode_markup else content.decode('ascii', 'replace')).encode("utf-8")
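The two language-suffix patterns used in DefaultSubtitleHelper.process_subtitles behave differently for IETF tags such as pt-BR. A short Python 3 sketch of that difference, assuming the input is the subtitle path with its extension already removed; language_from_filename is a hypothetical helper name and the Locale.Language.Match step is left out.

```python
import re


def language_from_filename(path_without_ext, ietf=True):
    # With ietf=True the region part ("-BR") is matched but not captured.
    pattern = r".+\.([^-.]+)(?:-[A-Za-z]+)?$" if ietf else r".+\.([^\.]+)$"
    match = re.match(pattern, path_without_ext)
    return match.group(1) if match else ""
```

With the IETF option on, "Avatar (2009).pt-BR" yields the bare language "pt"; with it off, the whole suffix "pt-BR" is returned and passed on unchanged.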
+144
@@ -0,0 +1,144 @@
# coding=utf-8
import datetime
import time
from missing_subtitles import items_get_all_missing_subs, refresh_item
from background import scheduler
from support.items import get_recent_items, is_ignored
class Task(object):
name = None
scheduler = None
running = False
time_start = None
stored_attributes = ("last_run", "last_run_time")
# task ready for being status-displayed?
ready_for_display = False
def __init__(self, scheduler):
self.ready_for_display = False
self.running = False
self.time_start = None
self.scheduler = scheduler
if self.name not in Dict["tasks"]:
Dict["tasks"][self.name] = {"last_run": None, "last_run_time": None}
def __getattribute__(self, name):
if name in object.__getattribute__(self, "stored_attributes"):
return Dict["tasks"].get(self.name, {}).get(name, None)
return object.__getattribute__(self, name)
def __setattr__(self, name, value):
if name in object.__getattribute__(self, "stored_attributes"):
Dict["tasks"][self.name][name] = value
Dict.Save()
return
object.__setattr__(self, name, value)
def signal(self, *args, **kwargs):
raise NotImplementedError
def prepare(self):
raise NotImplementedError
def run(self):
raise NotImplementedError
class SearchAllRecentlyAddedMissing(Task):
name = "searchAllRecentlyAddedMissing"
items_done = None
items_searching = None
items_searching_ids = None
items_failed = None
percentage = 0
stall_time = 30
def __init__(self, scheduler):
super(SearchAllRecentlyAddedMissing, self).__init__(scheduler)
self.items_done = None
self.items_searching = None
self.items_searching_ids = None
self.items_failed = None
self.percentage = 0
def signal(self, signal_name, *args, **kwargs):
handler = getattr(self, "signal_%s" % signal_name, None)
return handler(*args, **kwargs) if handler else None
def signal_updated_metadata(self, *args, **kwargs):
item_id = int(args[0])
if item_id in self.items_searching_ids:
self.items_done.append(item_id)
return True
def prepare(self):
self.items_done = []
recent_items = get_recent_items()
missing = items_get_all_missing_subs(recent_items)
ids = set([id for added_at, id, title, item in missing if not is_ignored(id, item=item)])
self.items_searching = missing
self.items_searching_ids = ids
self.items_failed = []
self.percentage = 0
self.time_start = datetime.datetime.now()
self.ready_for_display = True
def run(self):
self.running = True
missing_count = len(self.items_searching)
items_done_count = 0
for added_at, item_id, title, item in self.items_searching:
Log.Debug(u"Task: %s, triggering refresh for %s (%s)", self.name, title, item_id)
refresh_item(item_id, title)
search_started = datetime.datetime.now()
tries = 1
while 1:
if item_id in self.items_done:
items_done_count += 1
Log.Debug(u"Task: %s, item %s done", self.name, item_id)
self.percentage = int(items_done_count * 100 / missing_count)
break
# item considered stalled after self.stall_time seconds passed after last refresh
if (datetime.datetime.now() - search_started).total_seconds() > self.stall_time:
if tries > 3:
self.items_failed.append(item_id)
Log.Debug(u"Task: %s, item stalled for %s times: %s, skipping", self.name, tries, item_id)
break
Log.Debug(u"Task: %s, item stalled for %s seconds: %s, retrying", self.name, self.stall_time, item_id)
tries += 1
refresh_item(item_id, title)
search_started = datetime.datetime.now()
time.sleep(1)
time.sleep(0.1)
# we can't hammer the PMS, otherwise requests will be stalled
time.sleep(1)
Log.Debug("Task: %s, done. Failed items: %s", self.name, self.items_failed)
self.running = False
def post_run(self):
self.ready_for_display = False
self.last_run = datetime.datetime.now()
if self.time_start:
self.last_run_time = self.last_run - self.time_start
self.time_start = None
self.percentage = 0
self.items_done = None
self.items_failed = None
self.items_searching = None
self.items_searching_ids = None
scheduler.register(SearchAllRecentlyAddedMissing)
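The name-based signal dispatch in SearchAllRecentlyAddedMissing.signal can be shown without the Plex framework. This is a simplified Python 3 sketch: DemoTask and its constructor are hypothetical, and getattr is given a None default so that an unknown signal returns None instead of raising AttributeError. Because indentation is lost in this view, placing return True inside the membership check is an assumption.

```python
class Task(object):
    def signal(self, signal_name, *args, **kwargs):
        # Look up "signal_<name>" on the task; unknown signals resolve to None.
        handler = getattr(self, "signal_%s" % signal_name, None)
        return handler(*args, **kwargs) if handler else None


class DemoTask(Task):
    def __init__(self, searching_ids):
        self.items_searching_ids = set(searching_ids)
        self.items_done = []

    def signal_updated_metadata(self, rating_key):
        item_id = int(rating_key)
        if item_id in self.items_searching_ids:
            # Mark the item as finished so a polling loop can move on.
            self.items_done.append(item_id)
            return True
```

A caller only needs the signal name, for example task.signal("updated_metadata", "1"), and does not have to know which handlers a given task implements.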
+475 -102
@@ -1,104 +1,477 @@
[
{
"id": "provider.addic7ed.username",
"label": "Addic7ed Username",
"type": "text",
"default": "Username"
},
{
"id": "provider.addic7ed.password",
"label": "Addic7ed Password",
"type": "text",
"option": "hidden",
"default": "",
"secure": "true"
},
{
"id": "langPref1",
"label": "Subtitle Language (1)",
"type": "enum",
"values": ["sq","ar","be","bs","bg","ca","zh","cs","da","nl","en","et","fi","fr","de","el","he","hi","hu","is","id","it","ja","ko","lv","lt","mk","ms","no","pl","pt","ro","ru","sr","sk","sl","es","sv","th","tr","uk","vi","hr"],
"default": "en"
},
{
"id": "langPref2",
"label": "Subtitle Language (2)",
"type": "enum",
"values": ["None", "sq","ar","be","bs","bg","ca","zh","cs","da","nl","en","et","fi","fr","de","el","he","hi","hu","is","id","it","ja","ko","lv","lt","mk","ms","no","pl","pt","ro","ru","sr","sk","sl","es","sv","th","tr","uk","vi","hr"],
"default": "None"
},
{
"id": "provider.opensubtitles.enabled",
"label": "Provider: Enable OpenSubtitles",
"type": "bool",
"default": "true"
},
{
"id": "provider.thesubdb.enabled",
"label": "Provider: Enable TheSubDB",
"type": "bool",
"default": "true"
},
{
"id": "provider.podnapisi.enabled",
"label": "Provider: Enable Podnapisi.NET",
"type": "bool",
"default": "true"
},
{
"id": "provider.addic7ed.enabled",
"label": "Provider: Enable Addic7ed",
"type": "bool",
"default": "true"
},
{
"id": "provider.tvsubtitles.enabled",
"label": "Provider: Enable TVsubtitles.net",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.scan.embedded",
"label": "Scan: include embedded subtitles",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.scan.external",
"label": "Scan: include external subtitles",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.search.minimumScore",
"label": "Minimum score for subtitles to download",
"type": "enum",
"values": ["100","95","90","85","80","75","70","65","60","55","50","45","40","35","30","25","20","15","10","5","0"],
"default": "0"
},
{
"id": "subtitles.search.hearingImpaired",
"label": "Download hearing impaired subtitles.",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.save.filesystem",
"label": "Store subtitles next to media files (instead of metadata)",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.save.subFolder",
"label": "Subtitle Folder (\"current folder\" is the folder the current media file lives in)",
"type": "enum",
"values": ["current folder", "sub", "subs", "subtitle", "subtitles"],
"default": "current folder"
},
{
"id": "subtitles.save.subFolder.Custom",
"label": "Custom Subtitle folder (computes to real paths; use for example \"bla\" as a subfolder of the current media file folder - can use real paths aswell)",
"type": "text",
"default": ""
},
{
"id": "enable_channel",
"label": "Enable Sub-Zero channel (disabling doesn't affect the subtitle features)?",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.try_downloads",
"label": "How many download tries per subtitle (on timeout or error)",
"type": "enum",
"values": [
"1",
"2",
"3",
"4"
],
"default": "2"
},
{
"id": "provider.addic7ed.username",
"label": "Addic7ed Username",
"type": "text",
"default": ""
},
{
"id": "provider.addic7ed.password",
"label": "Addic7ed Password",
"type": "text",
"option": "hidden",
"default": "",
"secure": "true"
},
{
"id": "provider.opensubtitles.username",
"label": "Opensubtitles Username (VIP)",
"type": "text",
"default": ""
},
{
"id": "provider.opensubtitles.password",
"label": "Opensubtitles Password",
"type": "text",
"option": "hidden",
"default": "",
"secure": "true"
},
{
"id": "provider.addic7ed.use_random_agents",
"label": "Addic7ed: Use random user agents (should not be necessary)",
"type": "bool",
"default": "false"
},
{
"id": "langPref1",
"label": "Subtitle Language (1)",
"type": "enum",
"values": [
"sq",
"ar",
"be",
"bs",
"bg",
"ca",
"zh",
"cs",
"da",
"nl",
"en",
"et",
"fi",
"fr",
"de",
"el",
"he",
"hi",
"hu",
"is",
"id",
"it",
"ja",
"ko",
"lv",
"lt",
"mk",
"ms",
"no",
"fa",
"pl",
"pt",
"pt-br",
"ro",
"ru",
"sr",
"sk",
"sl",
"es",
"sv",
"th",
"tr",
"uk",
"vi",
"hr"
],
"default": "en"
},
{
"id": "langPref2",
"label": "Subtitle Language (2)",
"type": "enum",
"values": [
"None",
"sq",
"ar",
"be",
"bs",
"bg",
"ca",
"zh",
"cs",
"da",
"nl",
"en",
"et",
"fi",
"fr",
"de",
"el",
"he",
"hi",
"hu",
"is",
"id",
"it",
"ja",
"ko",
"lv",
"lt",
"mk",
"ms",
"no",
"fa",
"pl",
"pt",
"pt-br",
"ro",
"ru",
"sr",
"sk",
"sl",
"es",
"sv",
"th",
"tr",
"uk",
"vi",
"hr"
],
"default": "None"
},
{
"id": "langPref3",
"label": "Subtitle Language (3)",
"type": "enum",
"values": [
"None",
"sq",
"ar",
"be",
"bs",
"bg",
"ca",
"zh",
"cs",
"da",
"nl",
"en",
"et",
"fi",
"fr",
"de",
"el",
"he",
"hi",
"hu",
"is",
"id",
"it",
"ja",
"ko",
"lv",
"lt",
"mk",
"ms",
"no",
"fa",
"pl",
"pt",
"pt-br",
"ro",
"ru",
"sr",
"sk",
"sl",
"es",
"sv",
"th",
"tr",
"uk",
"vi",
"hr"
],
"default": "None"
},
{
"id": "langPrefCustom",
"label": "Additional Subtitle Languages (use ISO-639-1 codes; comma-separated)",
"type": "text",
"default": "None"
},
{
"id": "subtitles.only_one",
"label": "Restrict to one language (skips adding \".lang.\" to the subtitle filename; only uses \"Subtitle Language (1)\")",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.enforce_encoding",
"label": "Normalize subtitle encoding to UTF-8",
"type": "bool",
"default": "true"
},
{
"id": "provider.opensubtitles.enabled",
"label": "Provider: Enable OpenSubtitles",
"type": "bool",
"default": "true"
},
{
"id": "provider.thesubdb.enabled",
"label": "Provider: Enable TheSubDB",
"type": "bool",
"default": "true"
},
{
"id": "provider.podnapisi.enabled",
"label": "Provider: Enable Podnapisi.NET",
"type": "bool",
"default": "true"
},
{
"id": "provider.addic7ed.enabled",
"label": "Provider: Enable Addic7ed",
"type": "bool",
"default": "true"
},
{
"id": "provider.addic7ed.boost",
"label": "Addic7ed: prefer over other providers (if requirements met)",
"type": "bool",
"default": "false"
},
{
"id": "provider.tvsubtitles.enabled",
"label": "Provider: Enable TVsubtitles.net",
"type": "bool",
"default": "true"
},
{
"id": "provider.opensubtitles.use_tags",
"label": "I keep the exact (release-) filename of my media files",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.scan.embedded",
"label": "Scan: include embedded subtitles (in the media file (MKV/MP4), don't download if existing)",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.scan.external",
"label": "Scan: include external subtitles (metadata/filesystem, don't download if existing)",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.search.minimumTVScore",
"label": "Minimum score for TV subtitles to download",
"type": "enum",
"values": [
"100",
"95",
"90",
"85",
"80",
"75",
"70",
"67",
"65",
"60",
"55",
"50",
"45",
"40",
"35",
"30",
"25",
"20",
"15",
"10",
"5",
"0"
],
"default": "85"
},
{
"id": "subtitles.search.minimumMovieScore",
"label": "Minimum score for movie subtitles to download",
"type": "enum",
"values": [
"100",
"95",
"90",
"85",
"80",
"75",
"70",
"65",
"60",
"55",
"50",
"45",
"40",
"35",
"30",
"25",
"23",
"20",
"15",
"10",
"5",
"0"
],
"default": "23"
},
{
"id": "subtitles.search.hearingImpaired",
"label": "Download hearing impaired subtitles.",
"type": "enum",
"values": [
"prefer",
"don't prefer",
"force HI",
"force non-HI"
],
"default": "don't prefer"
},
{
"id": "subtitles.save.filesystem",
"label": "Store subtitles next to media files (instead of metadata)",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.save.subFolder",
"label": "Subtitle Folder (\"current folder\" is the folder the current media file lives in)",
"type": "enum",
"values": [
"current folder",
"sub",
"subs",
"subtitle",
"subtitles"
],
"default": "current folder"
},
{
"id": "subtitles.save.subFolder.Custom",
"label": "Custom Subtitle folder (overrides \"Subtitle Folder\"; computes to real paths)",
"type": "text",
"default": ""
},
{
"id": "subtitles.save.metadata_fallback",
"label": "Fall back to metadata storage if filesystem storage failed",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.language.ietf",
"label": "Treat IETF language tags as ISO 639-1 (e.g. pt-BR = pt)",
"type": "bool",
"default": "true"
},
{
"id": "subtitles.ignore_fs",
"label": "Ignore folders (with \"subzero.ignore/.subzero.ignore/.nosz\" files in them)",
"type": "bool",
"default": "false"
},
{
"id": "subtitles.ignore_paths",
"label": "Ignore anything in the following paths (comma-separated)",
"type": "text",
"default": ""
},
{
"id": "notify_executable",
"label": "Call this executable upon successful subtitle download",
"type": "text",
"default": ""
},
{
"id": "scheduler.tasks.searchAllRecentlyAddedMissing",
"label": "Scheduler: Periodically search for recent items with missing subtitles",
"type": "enum",
"values": [
"never",
"every 1 hours",
"every 3 hours",
"every 6 hours",
"every 12 hours",
"every 24 hours"
],
"default": "every 6 hours"
},
{
"id": "scheduler.item_is_recent_age",
"label": "Scheduler: Item age to be considered recent",
"type": "enum",
"values": [
"1 days",
"2 days",
"3 days",
"4 days",
"1 weeks",
"2 weeks",
"3 weeks",
"4 weeks",
"5 weeks",
"6 weeks"
],
"default": "2 weeks"
},
{
"id": "scheduler.max_recent_items_per_library",
"label": "Scheduler: Recent items to consider per library",
"type": "text",
"default": "200"
},
{
"id": "check_permissions",
"label": "Check for correct folder permissions of every library on plugin start",
"type": "bool",
"default": "true"
},
{
"id": "log_level",
"label": "How verbose should the logging be?",
"type": "enum",
"values": [
"CRITICAL",
"ERROR",
"WARNING",
"INFO",
"DEBUG"
],
"default": "WARNING"
},
{
"id": "log_console",
"label": "Log to console (for development/debugging)",
"type": "bool",
"default": "false"
}
]
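The `subtitles.ignore_paths` preference above is a single comma-separated text field. As a minimal sketch of how such a value could be split and matched against a media path (the helpers `parse_ignore_paths` and `is_ignored` are hypothetical names for illustration, not Sub-Zero's actual implementation):

```python
import os


def parse_ignore_paths(raw):
    """Split a comma-separated preference string into normalized paths."""
    if not raw:
        # Covers both an empty string and an unset (None) preference.
        return []
    return [os.path.normpath(p.strip()) for p in raw.split(",") if p.strip()]


def is_ignored(media_path, ignore_paths):
    """Return True if media_path is, or lies inside, any ignored path."""
    media_path = os.path.normpath(media_path)
    for ignored in ignore_paths:
        # Compare on a path-separator boundary so "/tv/anime" does not
        # accidentally match "/tv/animex".
        if media_path == ignored or media_path.startswith(ignored + os.sep):
            return True
    return False
```

Matching on a separator boundary rather than a bare string prefix is a design choice of this sketch; the plugin itself may compare paths differently.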
Regular → Executable
+24 -8
@@ -4,30 +4,46 @@
<dict>
<key>CFBundleDevelopmentRegion</key>
<string>English</string>
<key>CFBundleExecutable</key>
<string>Test Plug-in</string>
<key>CFBundleIdentifier</key>
<string>com.plexapp.agents.subliminal</string>
<string>com.plexapp.agents.subzero</string>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>CFBundleShortVersionString</key>
<string>1.0</string>
<string>1.3.31</string>
<key>CFBundleSignature</key>
<string>????</string>
<key>CFBundleVersion</key>
<string>1.0</string>
<string>1.3.33.522</string>
<key>PlexFrameworkVersion</key>
<string>2</string>
<key>PlexPluginClass</key>
<string>Agent</string>
<key>PlexPluginMode</key>
<string>AlwaysOn</string>
<string>Daemon</string>
<key>PlexPluginConsoleLogging</key>
<string>1</string>
<string>0</string>
<key>PlexPluginDevMode</key>
<string>1</string>
<string>0</string>
<key>PlexPluginCodePolicy</key>
<!-- this allows channels to access some python methods which are otherwise blocked, as well as import external code libraries, and interact with the PMS HTTP API -->
<string>Elevated</string>
<key>PlexAgentAttributionText</key>
<string>&lt;div style=&quot;white-space: pre;&quot;&gt;&lt;img src=&quot;https://raw.githubusercontent.com/pannal/Sub-Zero.bundle/master/Contents/Resources/subzero.gif&quot; /&gt;
&lt;h1&gt;Sub-Zero for Plex&lt;/h1&gt;&lt;i&gt;Subtitles done right&lt;/i&gt;
Version 1.3.33.522
Originally based on @bramwalet's awesome &lt;a href=&quot;https://github.com/bramwalet/Subliminal.bundle&quot;&gt;Subliminal.bundle&lt;/a&gt;
If you like this, buy me a beer: &lt;a href=&quot;https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&amp;hosted_button_id=G9VKR2B8PMNKG&quot; target=&quot;_blank&quot; title=&quot;donate&quot;&gt;&lt;img src=&quot;https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif&quot; alt=&quot;donate&quot; title=&quot;donate&quot; /&gt;&lt;/a&gt;
&lt;strong&gt;Need help?&lt;/strong&gt;
Plex thread: &lt;a href=&quot;https://forums.plex.tv/discussion/186575&quot;&gt;https://forums.plex.tv/discussion/186575&lt;/a&gt;
Github: &lt;a href=&quot;https://github.com/pannal/Sub-Zero.bundle&quot;&gt;https://github.com/pannal/Sub-Zero&lt;/a&gt;
panni, 2016
&lt;/div&gt;
</string>
</dict>
</plist>
@@ -0,0 +1,16 @@
try:
    import ast
    from _markerlib.markers import default_environment, compile, interpret
except ImportError:
    if 'ast' in globals():
        raise
    def default_environment():
        return {}
    def compile(marker):
        def marker_fn(environment=None, override=None):
            # 'empty markers are True' heuristic won't install extra deps.
            return not marker.strip()
        marker_fn.__doc__ = marker
        return marker_fn
    def interpret(marker, environment=None, override=None):
        return compile(marker)()
@@ -0,0 +1,119 @@
# -*- coding: utf-8 -*-
"""Interpret PEP 345 environment markers.

EXPR [in|==|!=|not in] EXPR [or|and] ...

where EXPR belongs to any of those:

    python_version = '%s.%s' % (sys.version_info[0], sys.version_info[1])
    python_full_version = sys.version.split()[0]
    os.name = os.name
    sys.platform = sys.platform
    platform.version = platform.version()
    platform.machine = platform.machine()
    platform.python_implementation = platform.python_implementation()
    a free string, like '2.6', or 'win32'
"""

__all__ = ['default_environment', 'compile', 'interpret']

import ast
import os
import platform
import sys
import weakref

_builtin_compile = compile

try:
    from platform import python_implementation
except ImportError:
    if os.name == "java":
        # Jython 2.5 has ast module, but not platform.python_implementation() function.
        def python_implementation():
            return "Jython"
    else:
        raise


# restricted set of variables
_VARS = {'sys.platform': sys.platform,
         'python_version': '%s.%s' % sys.version_info[:2],
         # FIXME parsing sys.version is not reliable, but there is no other
         # way to get e.g. 2.7.2+, and the PEP is defined with sys.version
         'python_full_version': sys.version.split(' ', 1)[0],
         'os.name': os.name,
         'platform.version': platform.version(),
         'platform.machine': platform.machine(),
         'platform.python_implementation': python_implementation(),
         'extra': None # wheel extension
        }

for var in list(_VARS.keys()):
    if '.' in var:
        _VARS[var.replace('.', '_')] = _VARS[var]

def default_environment():
    """Return copy of default PEP 345 globals dictionary."""
    return dict(_VARS)

class ASTWhitelist(ast.NodeTransformer):
    def __init__(self, statement):
        self.statement = statement # for error messages

    ALLOWED = (ast.Compare, ast.BoolOp, ast.Attribute, ast.Name, ast.Load, ast.Str)
    # Bool operations
    ALLOWED += (ast.And, ast.Or)
    # Comparison operations
    ALLOWED += (ast.Eq, ast.Gt, ast.GtE, ast.In, ast.Is, ast.IsNot, ast.Lt, ast.LtE, ast.NotEq, ast.NotIn)

    def visit(self, node):
        """Ensure statement only contains allowed nodes."""
        if not isinstance(node, self.ALLOWED):
            raise SyntaxError('Not allowed in environment markers.\n%s\n%s' %
                               (self.statement,
                               (' ' * node.col_offset) + '^'))
        return ast.NodeTransformer.visit(self, node)

    def visit_Attribute(self, node):
        """Flatten one level of attribute access."""
        new_node = ast.Name("%s.%s" % (node.value.id, node.attr), node.ctx)
        return ast.copy_location(new_node, node)

def parse_marker(marker):
    tree = ast.parse(marker, mode='eval')
    new_tree = ASTWhitelist(marker).generic_visit(tree)
    return new_tree

def compile_marker(parsed_marker):
    return _builtin_compile(parsed_marker, '<environment marker>', 'eval',
                            dont_inherit=True)

_cache = weakref.WeakValueDictionary()

def compile(marker):
    """Return compiled marker as a function accepting an environment dict."""
    try:
        return _cache[marker]
    except KeyError:
        pass
    if not marker.strip():
        def marker_fn(environment=None, override=None):
            """"""
            return True
    else:
        compiled_marker = compile_marker(parse_marker(marker))
        def marker_fn(environment=None, override=None):
            """override updates environment"""
            if override is None:
                override = {}
            if environment is None:
                environment = default_environment()
            environment.update(override)
            return eval(compiled_marker, environment)
    marker_fn.__doc__ = marker
    _cache[marker] = marker_fn
    return _cache[marker]

def interpret(marker, environment=None):
    return compile(marker)(environment)
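The whitelist idea behind `ASTWhitelist` can be illustrated in isolation. The following is a minimal sketch for Python 3.8 or later, not the `_markerlib` implementation: `evaluate_marker` and `_ALLOWED` are hypothetical names, string literals are matched via `ast.Constant` instead of `ast.Str`, and dotted variables such as `os.name` are assumed to be supplied under their underscore aliases (`os_name`).

```python
import ast

# Node types permitted in a marker expression (an assumption of this sketch).
_ALLOWED = (
    ast.Expression, ast.Compare, ast.BoolOp, ast.Name, ast.Load, ast.Constant,
    ast.And, ast.Or, ast.Eq, ast.NotEq, ast.In, ast.NotIn,
)


def evaluate_marker(marker, environment):
    """Evaluate a marker string against a dict of variable values."""
    if not marker.strip():
        return True  # empty markers are treated as true
    tree = ast.parse(marker, mode="eval")
    # Reject anything outside the whitelist (calls, attribute access, ...).
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED):
            raise SyntaxError("Not allowed in environment markers: %r" % marker)
    code = compile(tree, "<environment marker>", "eval")
    # Evaluate with no builtins; names resolve only from the environment.
    return eval(code, {"__builtins__": {}}, dict(environment))
```

Because the tree is checked before it is compiled, an expression such as `__import__('os')` is rejected with a `SyntaxError` and never reaches `eval`.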
@@ -1,113 +0,0 @@
Changelog
=========
0.5.3
-----
**release date:** 2014-06-22
* Better equality semantics for Language, Country, Script
0.5.2
-----
**release date:** 2014-05-25
* Babelfish objects (Language, Country, Script) are now picklable
* Added support for Python 3.4
0.5.1
-----
**release date:** 2014-01-26
* Add a register method to ConverterManager to register without loading
0.5.0
-----
**release date:** 2014-01-25
**WARNING:** Backward incompatible changes
* Simplify converter management with ConverterManager class
* Make babelfish usable in place
* Add Python 2.6 / 3.2 compatibility
0.4.0
-----
**release date:** 2013-11-21
**WARNING:** Backward incompatible changes
* Add converter support for Country
* Language/country reverse name detection is now case-insensitive
* Add alpha3t, scope and type converters
* Use lazy loading of converters
0.3.0
-----
**release date:** 2013-11-09
* Add support for scripts
* Improve built-in converters
* Add support for ietf
0.2.1
-----
**release date:** 2013-11-03
* Fix reading of data files
0.2.0
-----
**release date:** 2013-10-31
* Add str method
* More explicit exceptions
* Change repr format to use ascii only
0.1.5
-----
**release date:** 2013-10-21
* Add a fromcode method on Language class
* Add a codes attribute on converters
0.1.4
-----
**release date:** 2013-10-20
* Fix converters not raising NoConversionError
0.1.3
-----
**release date:** 2013-09-29
* Fix source distribution
0.1.2
-----
**release date:** 2013-09-29
* Add missing files to source distribution
0.1.1
-----
**release date:** 2013-09-28
* Fix python3 support
0.1
---
**release date:** 2013-09-28
* Initial version
@@ -1,25 +0,0 @@
Copyright (c) 2013, by the respective authors (see AUTHORS file).
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the BabelFish authors nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -1,16 +0,0 @@
BabelFish
=========
BabelFish is a Python library to work with countries and languages.
.. image:: https://travis-ci.org/Diaoul/babelfish.png?branch=master
:target: https://travis-ci.org/Diaoul/babelfish
.. image:: https://coveralls.io/repos/Diaoul/babelfish/badge.png
:target: https://coveralls.io/r/Diaoul/babelfish
License
-------
BabelFish is licensed under the `3-clause BSD license <http://opensource.org/licenses/BSD-3-Clause>`_.
Copyright (c) 2013, the BabelFish authors and contributors.
@@ -5,10 +5,10 @@
# that can be found in the LICENSE file.
#
__title__ = 'babelfish'
__version__ = '0.5.3'
__version__ = '0.5.5-dev'
__author__ = 'Antoine Bertin'
__license__ = 'BSD'
__copyright__ = 'Copyright 2013 the BabelFish authors'
__copyright__ = 'Copyright 2015 the BabelFish authors'
import sys
@@ -241,7 +241,14 @@ class ConverterManager(object):
                return self.converters[ep.name]
        for ep in (EntryPoint.parse(c) for c in self.registered_converters + self.internal_converters):
            if ep.name == name:
                self.converters[ep.name] = ep.load(require=False)()
                # `require` argument of ep.load() is deprecated in newer versions of setuptools
                if hasattr(ep, 'resolve'):
                    plugin = ep.resolve()
                elif hasattr(ep, '_load'):
                    plugin = ep._load()
                else:
                    plugin = ep.load(require=False)
                self.converters[ep.name] = plugin()
                return self.converters[ep.name]
        raise KeyError(name)
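The fallback chain in this hunk picks a loading method by feature detection rather than by checking the setuptools version. A minimal, self-contained sketch of that pattern follows; `load_entry_point`, `NewStyleEntryPoint` and `OldStyleEntryPoint` are hypothetical stand-ins used only for illustration, not part of babelfish or setuptools.

```python
def load_entry_point(ep):
    """Load the object behind an entry point without dependency checks."""
    if hasattr(ep, "resolve"):
        # Newer entry point objects expose resolve().
        return ep.resolve()
    if hasattr(ep, "_load"):
        # Some intermediate versions only had the private _load().
        return ep._load()
    # Oldest variant: load() with requirement checking switched off.
    return ep.load(require=False)


class NewStyleEntryPoint(object):
    """Hypothetical entry point that offers resolve()."""

    def resolve(self):
        return dict


class OldStyleEntryPoint(object):
    """Hypothetical entry point that only offers load(require=...)."""

    def load(self, require=True):
        return list
```

Probing with `hasattr` keeps the code working across library versions without parsing version strings.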
@@ -0,0 +1,45 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2013 the BabelFish authors. All rights reserved.
# Use of this source code is governed by the 3-clause BSD license
# that can be found in the LICENSE file.
#
from __future__ import unicode_literals
import os.path
import tempfile
import zipfile
import requests


DATA_DIR = os.path.dirname(__file__)

# iso-3166-1.txt
print('Downloading ISO-3166-1 standard (ISO country codes)...')
# r.content is bytes, so the file is opened in binary mode
with open(os.path.join(DATA_DIR, 'iso-3166-1.txt'), 'wb') as f:
    r = requests.get('http://www.iso.org/iso/home/standards/country_codes/country_names_and_code_elements_txt.htm')
    f.write(r.content.strip())

# iso-639-3.tab
print('Downloading ISO-639-3 standard (ISO language codes)...')
with tempfile.TemporaryFile() as f:
    r = requests.get('http://www-01.sil.org/iso639-3/iso-639-3_Code_Tables_20130531.zip')
    f.write(r.content)
    with zipfile.ZipFile(f) as z:
        z.extract('iso-639-3.tab', DATA_DIR)

# iso-15924
print('Downloading ISO-15924 standard (ISO script codes)...')
with tempfile.TemporaryFile() as f:
    r = requests.get('http://www.unicode.org/iso15924/iso15924.txt.zip')
    f.write(r.content)
    with zipfile.ZipFile(f) as z:
        z.extract('iso15924-utf8-20131012.txt', DATA_DIR)

# opensubtitles supported languages
print('Downloading OpenSubtitles supported languages...')
# r.content is bytes, so the file is opened in binary mode
with open(os.path.join(DATA_DIR, 'opensubtitles_languages.txt'), 'wb') as f:
    r = requests.get('http://www.opensubtitles.org/addons/export_languages.php')
    f.write(r.content)

print('Done!')
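The script above writes each downloaded archive to a temporary file before opening it with `zipfile`. As a hedged alternative sketch, the same extraction step can be done from memory with `io.BytesIO`; `extract_member` is a hypothetical helper, not part of BabelFish, and it assumes the archive is small enough to hold in memory.

```python
import io
import zipfile


def extract_member(zip_bytes, member, dest_dir):
    """Extract one named member of an in-memory zip archive into dest_dir.

    Returns the path of the extracted file.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.extract(member, dest_dir)
```

With a `requests` response `r`, a call could look like `extract_member(r.content, 'iso-639-3.tab', DATA_DIR)`.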
@@ -1,250 +1,250 @@
Country Name;ISO 3166-1-alpha-2 code
AFGHANISTAN;AF
ÅLAND ISLANDS;AX
ALBANIA;AL
ALGERIA;DZ
AMERICAN SAMOA;AS
ANDORRA;AD
ANGOLA;AO
ANGUILLA;AI
ANTARCTICA;AQ
ANTIGUA AND BARBUDA;AG
ARGENTINA;AR
ARMENIA;AM
ARUBA;AW
AUSTRALIA;AU
AUSTRIA;AT
AZERBAIJAN;AZ
BAHAMAS;BS
BAHRAIN;BH
BANGLADESH;BD
BARBADOS;BB
BELARUS;BY
BELGIUM;BE
BELIZE;BZ
BENIN;BJ
BERMUDA;BM
BHUTAN;BT
BOLIVIA, PLURINATIONAL STATE OF;BO
BONAIRE, SINT EUSTATIUS AND SABA;BQ
BOSNIA AND HERZEGOVINA;BA
BOTSWANA;BW
BOUVET ISLAND;BV
BRAZIL;BR
BRITISH INDIAN OCEAN TERRITORY;IO
BRUNEI DARUSSALAM;BN
BULGARIA;BG
BURKINA FASO;BF
BURUNDI;BI
CAMBODIA;KH
CAMEROON;CM
CANADA;CA
CAPE VERDE;CV
CAYMAN ISLANDS;KY
CENTRAL AFRICAN REPUBLIC;CF
CHAD;TD
CHILE;CL
CHINA;CN
CHRISTMAS ISLAND;CX
COCOS (KEELING) ISLANDS;CC
COLOMBIA;CO
COMOROS;KM
CONGO;CG
CONGO, THE DEMOCRATIC REPUBLIC OF THE;CD
COOK ISLANDS;CK
COSTA RICA;CR
CÔTE D'IVOIRE;CI
CROATIA;HR
CUBA;CU
CURAÇAO;CW
CYPRUS;CY
CZECH REPUBLIC;CZ
DENMARK;DK
DJIBOUTI;DJ
DOMINICA;DM
DOMINICAN REPUBLIC;DO
ECUADOR;EC
EGYPT;EG
EL SALVADOR;SV
EQUATORIAL GUINEA;GQ
ERITREA;ER
ESTONIA;EE
ETHIOPIA;ET
FALKLAND ISLANDS (MALVINAS);FK
FAROE ISLANDS;FO
FIJI;FJ
FINLAND;FI
FRANCE;FR
FRENCH GUIANA;GF
FRENCH POLYNESIA;PF
FRENCH SOUTHERN TERRITORIES;TF
GABON;GA
GAMBIA;GM
GEORGIA;GE
GERMANY;DE
GHANA;GH
GIBRALTAR;GI
GREECE;GR
GREENLAND;GL
GRENADA;GD
GUADELOUPE;GP
GUAM;GU
GUATEMALA;GT
GUERNSEY;GG
GUINEA;GN
GUINEA-BISSAU;GW
GUYANA;GY
HAITI;HT
HEARD ISLAND AND MCDONALD ISLANDS;HM
HOLY SEE (VATICAN CITY STATE);VA
HONDURAS;HN
HONG KONG;HK
HUNGARY;HU
ICELAND;IS
INDIA;IN
INDONESIA;ID
IRAN, ISLAMIC REPUBLIC OF;IR
IRAQ;IQ
IRELAND;IE
ISLE OF MAN;IM
ISRAEL;IL
ITALY;IT
JAMAICA;JM
JAPAN;JP
JERSEY;JE
JORDAN;JO
KAZAKHSTAN;KZ
KENYA;KE
KIRIBATI;KI
KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF;KP
KOREA, REPUBLIC OF;KR
KUWAIT;KW
KYRGYZSTAN;KG
LAO PEOPLE'S DEMOCRATIC REPUBLIC;LA
LATVIA;LV
LEBANON;LB
LESOTHO;LS
LIBERIA;LR
LIBYA;LY
LIECHTENSTEIN;LI
LITHUANIA;LT
LUXEMBOURG;LU
MACAO;MO
MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF;MK
MADAGASCAR;MG
MALAWI;MW
MALAYSIA;MY
MALDIVES;MV
MALI;ML
MALTA;MT
MARSHALL ISLANDS;MH
MARTINIQUE;MQ
MAURITANIA;MR
MAURITIUS;MU
MAYOTTE;YT
MEXICO;MX
MICRONESIA, FEDERATED STATES OF;FM
MOLDOVA, REPUBLIC OF;MD
MONACO;MC
MONGOLIA;MN
MONTENEGRO;ME
MONTSERRAT;MS
MOROCCO;MA
MOZAMBIQUE;MZ
MYANMAR;MM
NAMIBIA;NA
NAURU;NR
NEPAL;NP
NETHERLANDS;NL
NEW CALEDONIA;NC
NEW ZEALAND;NZ
NICARAGUA;NI
NIGER;NE
NIGERIA;NG
NIUE;NU
NORFOLK ISLAND;NF
NORTHERN MARIANA ISLANDS;MP
NORWAY;NO
OMAN;OM
PAKISTAN;PK
PALAU;PW
PALESTINE, STATE OF;PS
PANAMA;PA
PAPUA NEW GUINEA;PG
PARAGUAY;PY
PERU;PE
PHILIPPINES;PH
PITCAIRN;PN
POLAND;PL
PORTUGAL;PT
PUERTO RICO;PR
QATAR;QA
RÉUNION;RE
ROMANIA;RO
RUSSIAN FEDERATION;RU
RWANDA;RW
SAINT BARTHÉLEMY;BL
SAINT HELENA, ASCENSION AND TRISTAN DA CUNHA;SH
SAINT KITTS AND NEVIS;KN
SAINT LUCIA;LC
SAINT MARTIN (FRENCH PART);MF
SAINT PIERRE AND MIQUELON;PM
SAINT VINCENT AND THE GRENADINES;VC
SAMOA;WS
SAN MARINO;SM
SAO TOME AND PRINCIPE;ST
SAUDI ARABIA;SA
SENEGAL;SN
SERBIA;RS
SEYCHELLES;SC
SIERRA LEONE;SL
SINGAPORE;SG
SINT MAARTEN (DUTCH PART);SX
SLOVAKIA;SK
SLOVENIA;SI
SOLOMON ISLANDS;SB
SOMALIA;SO
SOUTH AFRICA;ZA
SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS;GS
SOUTH SUDAN;SS
SPAIN;ES
SRI LANKA;LK
SUDAN;SD
SURINAME;SR
SVALBARD AND JAN MAYEN;SJ
SWAZILAND;SZ
SWEDEN;SE
SWITZERLAND;CH
SYRIAN ARAB REPUBLIC;SY
TAIWAN, PROVINCE OF CHINA;TW
TAJIKISTAN;TJ
TANZANIA, UNITED REPUBLIC OF;TZ
THAILAND;TH
TIMOR-LESTE;TL
TOGO;TG
TOKELAU;TK
TONGA;TO
TRINIDAD AND TOBAGO;TT
TUNISIA;TN
TURKEY;TR
TURKMENISTAN;TM
TURKS AND CAICOS ISLANDS;TC
TUVALU;TV
UGANDA;UG
UKRAINE;UA
UNITED ARAB EMIRATES;AE
UNITED KINGDOM;GB
UNITED STATES;US
UNITED STATES MINOR OUTLYING ISLANDS;UM
URUGUAY;UY
UZBEKISTAN;UZ
VANUATU;VU
VENEZUELA, BOLIVARIAN REPUBLIC OF;VE
VIET NAM;VN
VIRGIN ISLANDS, BRITISH;VG
VIRGIN ISLANDS, U.S.;VI
WALLIS AND FUTUNA;WF
WESTERN SAHARA;EH
YEMEN;YE
ZAMBIA;ZM
Country Name;ISO 3166-1-alpha-2 code
AFGHANISTAN;AF
ÅLAND ISLANDS;AX
ALBANIA;AL
ALGERIA;DZ
AMERICAN SAMOA;AS
ANDORRA;AD
ANGOLA;AO
ANGUILLA;AI
ANTARCTICA;AQ
ANTIGUA AND BARBUDA;AG
ARGENTINA;AR
ARMENIA;AM
ARUBA;AW
AUSTRALIA;AU
AUSTRIA;AT
AZERBAIJAN;AZ
BAHAMAS;BS
BAHRAIN;BH
BANGLADESH;BD
BARBADOS;BB
BELARUS;BY
BELGIUM;BE
BELIZE;BZ
BENIN;BJ
BERMUDA;BM
BHUTAN;BT
BOLIVIA, PLURINATIONAL STATE OF;BO
BONAIRE, SINT EUSTATIUS AND SABA;BQ
BOSNIA AND HERZEGOVINA;BA
BOTSWANA;BW
BOUVET ISLAND;BV
BRAZIL;BR
BRITISH INDIAN OCEAN TERRITORY;IO
BRUNEI DARUSSALAM;BN
BULGARIA;BG
BURKINA FASO;BF
BURUNDI;BI
CAMBODIA;KH
CAMEROON;CM
CANADA;CA
CAPE VERDE;CV
CAYMAN ISLANDS;KY
CENTRAL AFRICAN REPUBLIC;CF
CHAD;TD
CHILE;CL
CHINA;CN
CHRISTMAS ISLAND;CX
COCOS (KEELING) ISLANDS;CC
COLOMBIA;CO
COMOROS;KM
CONGO;CG
CONGO, THE DEMOCRATIC REPUBLIC OF THE;CD
COOK ISLANDS;CK
COSTA RICA;CR
CÔTE D'IVOIRE;CI
CROATIA;HR
CUBA;CU
CURAÇAO;CW
CYPRUS;CY
CZECH REPUBLIC;CZ
DENMARK;DK
DJIBOUTI;DJ
DOMINICA;DM
DOMINICAN REPUBLIC;DO
ECUADOR;EC
EGYPT;EG
EL SALVADOR;SV
EQUATORIAL GUINEA;GQ
ERITREA;ER
ESTONIA;EE
ETHIOPIA;ET
FALKLAND ISLANDS (MALVINAS);FK
FAROE ISLANDS;FO
FIJI;FJ
FINLAND;FI
FRANCE;FR
FRENCH GUIANA;GF
FRENCH POLYNESIA;PF
FRENCH SOUTHERN TERRITORIES;TF
GABON;GA
GAMBIA;GM
GEORGIA;GE
GERMANY;DE
GHANA;GH
GIBRALTAR;GI
GREECE;GR
GREENLAND;GL
GRENADA;GD
GUADELOUPE;GP
GUAM;GU
GUATEMALA;GT
GUERNSEY;GG
GUINEA;GN
GUINEA-BISSAU;GW
GUYANA;GY
HAITI;HT
HEARD ISLAND AND MCDONALD ISLANDS;HM
HOLY SEE (VATICAN CITY STATE);VA
HONDURAS;HN
HONG KONG;HK
HUNGARY;HU
ICELAND;IS
INDIA;IN
INDONESIA;ID
IRAN, ISLAMIC REPUBLIC OF;IR
IRAQ;IQ
IRELAND;IE
ISLE OF MAN;IM
ISRAEL;IL
ITALY;IT
JAMAICA;JM
JAPAN;JP
JERSEY;JE
JORDAN;JO
KAZAKHSTAN;KZ
KENYA;KE
KIRIBATI;KI
KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF;KP
KOREA, REPUBLIC OF;KR
KUWAIT;KW
KYRGYZSTAN;KG
LAO PEOPLE'S DEMOCRATIC REPUBLIC;LA
LATVIA;LV
LEBANON;LB
LESOTHO;LS
LIBERIA;LR
LIBYA;LY
LIECHTENSTEIN;LI
LITHUANIA;LT
LUXEMBOURG;LU
MACAO;MO
MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF;MK
MADAGASCAR;MG
MALAWI;MW
MALAYSIA;MY
MALDIVES;MV
MALI;ML
MALTA;MT
MARSHALL ISLANDS;MH
MARTINIQUE;MQ
MAURITANIA;MR
MAURITIUS;MU
MAYOTTE;YT
MEXICO;MX
MICRONESIA, FEDERATED STATES OF;FM
MOLDOVA, REPUBLIC OF;MD
MONACO;MC
MONGOLIA;MN
MONTENEGRO;ME
MONTSERRAT;MS
MOROCCO;MA
MOZAMBIQUE;MZ
MYANMAR;MM
NAMIBIA;NA
NAURU;NR
NEPAL;NP
NETHERLANDS;NL
NEW CALEDONIA;NC
NEW ZEALAND;NZ
NICARAGUA;NI
NIGER;NE
NIGERIA;NG
NIUE;NU
NORFOLK ISLAND;NF
NORTHERN MARIANA ISLANDS;MP
NORWAY;NO
OMAN;OM
PAKISTAN;PK
PALAU;PW
PALESTINE, STATE OF;PS
PANAMA;PA
PAPUA NEW GUINEA;PG
PARAGUAY;PY
PERU;PE
PHILIPPINES;PH
PITCAIRN;PN
POLAND;PL
PORTUGAL;PT
PUERTO RICO;PR
QATAR;QA
RÉUNION;RE
ROMANIA;RO
RUSSIAN FEDERATION;RU
RWANDA;RW
SAINT BARTHÉLEMY;BL
SAINT HELENA, ASCENSION AND TRISTAN DA CUNHA;SH
SAINT KITTS AND NEVIS;KN
SAINT LUCIA;LC
SAINT MARTIN (FRENCH PART);MF
SAINT PIERRE AND MIQUELON;PM
SAINT VINCENT AND THE GRENADINES;VC
SAMOA;WS
SAN MARINO;SM
SAO TOME AND PRINCIPE;ST
SAUDI ARABIA;SA
SENEGAL;SN
SERBIA;RS
SEYCHELLES;SC
SIERRA LEONE;SL
SINGAPORE;SG
SINT MAARTEN (DUTCH PART);SX
SLOVAKIA;SK
SLOVENIA;SI
SOLOMON ISLANDS;SB
SOMALIA;SO
SOUTH AFRICA;ZA
SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS;GS
SOUTH SUDAN;SS
SPAIN;ES
SRI LANKA;LK
SUDAN;SD
SURINAME;SR
SVALBARD AND JAN MAYEN;SJ
SWAZILAND;SZ
SWEDEN;SE
SWITZERLAND;CH
SYRIAN ARAB REPUBLIC;SY
TAIWAN, PROVINCE OF CHINA;TW
TAJIKISTAN;TJ
TANZANIA, UNITED REPUBLIC OF;TZ
THAILAND;TH
TIMOR-LESTE;TL
TOGO;TG
TOKELAU;TK
TONGA;TO
TRINIDAD AND TOBAGO;TT
TUNISIA;TN
TURKEY;TR
TURKMENISTAN;TM
TURKS AND CAICOS ISLANDS;TC
TUVALU;TV
UGANDA;UG
UKRAINE;UA
UNITED ARAB EMIRATES;AE
UNITED KINGDOM;GB
UNITED STATES;US
UNITED STATES MINOR OUTLYING ISLANDS;UM
URUGUAY;UY
UZBEKISTAN;UZ
VANUATU;VU
VENEZUELA, BOLIVARIAN REPUBLIC OF;VE
VIET NAM;VN
VIRGIN ISLANDS, BRITISH;VG
VIRGIN ISLANDS, U.S.;VI
WALLIS AND FUTUNA;WF
WESTERN SAHARA;EH
YEMEN;YE
ZAMBIA;ZM
ZIMBABWE;ZW
File diff suppressed because it is too large
+4 -3
@@ -1,6 +1,6 @@
Beautiful Soup is made available under the MIT license:
Copyright (c) 2004-2012 Leonard Richardson
Copyright (c) 2004-2015 Leonard Richardson
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
@@ -20,7 +20,8 @@ Beautiful Soup is made available under the MIT license:
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE, DAMMIT.
SOFTWARE.
Beautiful Soup incorporates code from the html5lib library, which is
also made available under the MIT license.
also made available under the MIT license. Copyright (c) 2006-2013
James Graham and other contributors
+124
@@ -1,3 +1,127 @@
= 4.4.1 (20150928) =
* Fixed a bug that deranged the tree when part of it was
removed. Thanks to Eric Weiser for the patch and John Wiseman for a
test. [bug=1481520]
* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
Kramer for the patch. [bug=1483781]
* Improved the implementation of CSS selector grouping. Thanks to
Orangain for the patch. [bug=1484543]
* Fixed the test_detect_utf8 test so that it works when chardet is
installed. [bug=1471359]
* Corrected the output of Declaration objects. [bug=1477847]
= 4.4.0 (20150703) =
Especially important changes:
* Added a warning when you instantiate a BeautifulSoup object without
explicitly naming a parser. [bug=1398866]
* __repr__ now returns an ASCII bytestring in Python 2, and a Unicode
string in Python 3, instead of a UTF8-encoded bytestring in both
versions. In Python 3, __str__ now returns a Unicode string instead
of a bytestring. [bug=1420131]
* The `text` argument to the find_* methods is now called `string`,
which is more accurate. `text` still works, but `string` is the
argument described in the documentation. `text` may eventually
change its meaning, but not for a very long time. [bug=1366856]
* Changed the way soup objects work under copy.copy(). Copying a
NavigableString or a Tag will give you a new NavigableString that's
equal to the old one but not connected to the parse tree. Patch by
Martijn Pieters. [bug=1307490]
* Started using a standard MIT license. [bug=1294662]
* Added a Chinese translation of the documentation by Delong .w.
New features:
* Introduced the select_one() method, which uses a CSS selector but
only returns the first match, instead of a list of
matches. [bug=1349367]
* You can now create a Tag object without specifying a
TreeBuilder. Patch by Martijn Pieters. [bug=1307471]
* You can now create a NavigableString or a subclass just by invoking
the constructor. [bug=1294315]
* Added an `exclude_encodings` argument to UnicodeDammit and to the
Beautiful Soup constructor, which lets you prohibit the detection of
an encoding that you know is wrong. [bug=1469408]
* The select() method now supports selector grouping. Patch by
Francisco Canas [bug=1191917]
Bug fixes:
* Fixed yet another problem that caused the html5lib tree builder to
create a disconnected parse tree. [bug=1237763]
* Force object_was_parsed() to keep the tree intact even when an element
from later in the document is moved into place. [bug=1430633]
* Fixed yet another bug that caused a disconnected tree when html5lib
copied an element from one part of the tree to another. [bug=1270611]
* Fixed a bug where Element.extract() could create an infinite loop in
the remaining tree.
* The select() method can now find tags whose names contain
dashes. Patch by Francisco Canas. [bug=1276211]
* The select() method can now find tags with attributes whose names
contain dashes. Patch by Marek Kapolka. [bug=1304007]
* Improved the lxml tree builder's handling of processing
instructions. [bug=1294645]
* Restored the helpful syntax error that happens when you try to
import the Python 2 edition of Beautiful Soup under Python
3. [bug=1213387]
* In Python 3.4 and above, set the new convert_charrefs argument to
the html.parser constructor to avoid a warning and future
failures. Patch by Stefano Revera. [bug=1375721]
* The warning when you pass in a filename or URL as markup will now be
displayed correctly even if the filename or URL is a Unicode
string. [bug=1268888]
* If the initial <html> tag contains a CDATA list attribute such as
'class', the html5lib tree builder will now turn its value into a
list, as it would with any other tag. [bug=1296481]
* Fixed an import error in Python 3.5 caused by the removal of the
HTMLParseError class. [bug=1420063]
* Improved docstring for encode_contents() and
decode_contents(). [bug=1441543]
* Fixed a crash in Unicode, Dammit's encoding detector when the name
of the encoding itself contained invalid bytes. [bug=1360913]
* Improved the exception raised when you call .unwrap() or
.replace_with() on an element that's not attached to a tree.
* Raise a NotImplementedError whenever an unsupported CSS pseudoclass
is used in select(). Previously some cases did not result in a
NotImplementedError.
* It's now possible to pickle a BeautifulSoup object no matter which
tree builder was used to create it. However, the only tree builder
that survives the pickling process is the HTMLParserTreeBuilder
('html.parser'). If you unpickle a BeautifulSoup object created with
some other tree builder, soup.builder will be None. [bug=1231545]
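The pickling behaviour described in the last entry can be sketched with plain classes. This is an illustration of the `__getstate__` pattern only, not Beautiful Soup's code: `Builder` and `Document` are hypothetical stand-ins, and the `__setstate__` shown here is an assumption about how a missing builder might be restored as `None`.

```python
class Builder(object):
    """Hypothetical stand-in for a tree builder that cannot be pickled."""
    picklable = False

    def __init__(self):
        self.handle = lambda data: data  # lambdas cannot be pickled


class Document(object):
    """Hypothetical container that drops its builder when pickled."""

    def __init__(self, text):
        self.text = text
        self.builder = Builder()

    def __getstate__(self):
        # Copy the instance dict and leave out an unpicklable builder.
        state = dict(self.__dict__)
        if 'builder' in state and not self.builder.picklable:
            del state['builder']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Assumption for this sketch: a dropped builder comes back as None.
        self.__dict__.setdefault('builder', None)
```

With these hooks in place, `pickle.loads(pickle.dumps(doc))` would be expected to return a `Document` whose `text` survives and whose `builder` is `None`.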
= 4.3.2 (20131002) =
* Fixed a bug in which short Unicode input was improperly encoded to
+31
@@ -0,0 +1,31 @@
Additions
---------
More of the jQuery API: nextUntil?
Optimizations
-------------
The html5lib tree builder doesn't use the standard tree-building API,
which worries me and has resulted in a number of bugs.
markup_attr_map can be optimized since it's always a map now.
Upon encountering UTF-16LE data or some other uncommon serialization
of Unicode, UnicodeDammit will convert the data to Unicode, then
encode it as UTF-8. This is wasteful because it will just get decoded
back to Unicode.
CDATA
-----
The elementtree XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of treating them as text. Except it doesn't. (This argument is also
present for HTMLParser, and also does nothing there.)
Currently, html5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow CDATA sections in tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.
+77 -15
@@ -17,8 +17,8 @@ http://www.crummy.com/software/BeautifulSoup/bs4/doc/
"""
__author__ = "Leonard Richardson (leonardr@segfault.org)"
__version__ = "4.3.2"
__copyright__ = "Copyright (c) 2004-2013 Leonard Richardson"
__version__ = "4.4.1"
__copyright__ = "Copyright (c) 2004-2015 Leonard Richardson"
__license__ = "MIT"
__all__ = ['BeautifulSoup']
@@ -45,7 +45,7 @@ from .element import (
# The very first thing we do is give a useful error if someone is
# running this code under Python 3 without converting it.
syntax_error = u'You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work. You need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).'
'You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work.'<>'You need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).'
class BeautifulSoup(Tag):
"""
@@ -77,8 +77,11 @@ class BeautifulSoup(Tag):
    ASCII_SPACES = '\x20\x0a\x09\x0c\x0d'

    NO_PARSER_SPECIFIED_WARNING = "No parser was explicitly specified, so I'm using the best available %(markup_type)s parser for this system (\"%(parser)s\"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.\n\nTo get rid of this warning, change this:\n\n BeautifulSoup([your markup])\n\nto this:\n\n BeautifulSoup([your markup], \"%(parser)s\")\n"

    def __init__(self, markup="", features=None, builder=None,
                 parse_only=None, from_encoding=None, **kwargs):
                 parse_only=None, from_encoding=None, exclude_encodings=None,
                 **kwargs):
        """The Soup object is initialized as the 'root tag', and the
        provided markup (which can be a string or a file-like object)
        is fed into the underlying parser."""
@@ -114,9 +117,9 @@ class BeautifulSoup(Tag):
            del kwargs['isHTML']
            warnings.warn(
                "BS4 does not respect the isHTML argument to the "
                "BeautifulSoup constructor. You can pass in features='html' "
                "or features='xml' to get a builder capable of handling "
                "one or the other.")
                "BeautifulSoup constructor. Suggest you use "
                "features='lxml' for HTML and features='lxml-xml' for "
                "XML.")

        def deprecated_argument(old_name, new_name):
            if old_name in kwargs:
@@ -140,6 +143,7 @@ class BeautifulSoup(Tag):
                "__init__() got an unexpected keyword argument '%s'" % arg)

        if builder is None:
            original_features = features
            if isinstance(features, basestring):
                features = [features]
            if features is None or len(features) == 0:
@@ -151,6 +155,16 @@ class BeautifulSoup(Tag):
                    "requested: %s. Do you need to install a parser library?"
                    % ",".join(features))
            builder = builder_class()
            if not (original_features == builder.NAME or
                    original_features in builder.ALTERNATE_NAMES):
                if builder.is_xml:
                    markup_type = "XML"
                else:
                    markup_type = "HTML"
                warnings.warn(self.NO_PARSER_SPECIFIED_WARNING % dict(
                    parser=builder.NAME,
                    markup_type=markup_type))

        self.builder = builder
        self.is_xml = builder.is_xml
        self.builder.soup = self
@@ -178,6 +192,8 @@ class BeautifulSoup(Tag):
# system. Just let it go.
pass
if is_file:
if isinstance(markup, unicode):
markup = markup.encode("utf8")
warnings.warn(
'"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)
if markup[:5] == "http:" or markup[:6] == "https:":
@@ -185,12 +201,15 @@ class BeautifulSoup(Tag):
# Python 3 otherwise.
if ((isinstance(markup, bytes) and not b' ' in markup)
or (isinstance(markup, unicode) and not u' ' in markup)):
if isinstance(markup, unicode):
markup = markup.encode("utf8")
warnings.warn(
'"%s" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client to get the document behind the URL, and feed that document to Beautiful Soup.' % markup)
for (self.markup, self.original_encoding, self.declared_html_encoding,
self.contains_replacement_characters) in (
self.builder.prepare_markup(markup, from_encoding)):
self.builder.prepare_markup(
markup, from_encoding, exclude_encodings=exclude_encodings)):
self.reset()
try:
self._feed()
@@ -203,6 +222,16 @@ class BeautifulSoup(Tag):
self.markup = None
self.builder.soup = None
def __copy__(self):
return type(self)(self.encode(), builder=self.builder)
def __getstate__(self):
# Frequently a tree builder can't be pickled.
d = dict(self.__dict__)
if 'builder' in d and not self.builder.picklable:
del d['builder']
return d
def _feed(self):
# Convert the document to Unicode.
self.builder.reset()
@@ -229,9 +258,7 @@ class BeautifulSoup(Tag):
def new_string(self, s, subclass=NavigableString):
"""Create a new NavigableString associated with this soup."""
navigable = subclass(s)
navigable.setup()
return navigable
return subclass(s)
def insert_before(self, successor):
raise NotImplementedError("BeautifulSoup objects don't support insert_before().")
@@ -290,14 +317,49 @@ class BeautifulSoup(Tag):
def object_was_parsed(self, o, parent=None, most_recent_element=None):
"""Add an object to the parse tree."""
parent = parent or self.currentTag
most_recent_element = most_recent_element or self._most_recent_element
o.setup(parent, most_recent_element)
previous_element = most_recent_element or self._most_recent_element
next_element = previous_sibling = next_sibling = None
if isinstance(o, Tag):
next_element = o.next_element
next_sibling = o.next_sibling
previous_sibling = o.previous_sibling
if not previous_element:
previous_element = o.previous_element
o.setup(parent, previous_element, next_element, previous_sibling, next_sibling)
if most_recent_element is not None:
most_recent_element.next_element = o
self._most_recent_element = o
parent.contents.append(o)
if parent.next_sibling:
# This node is being inserted into an element that has
# already been parsed. Deal with any dangling references.
index = parent.contents.index(o)
if index == 0:
previous_element = parent
previous_sibling = None
else:
previous_element = previous_sibling = parent.contents[index-1]
if index == len(parent.contents)-1:
next_element = parent.next_sibling
next_sibling = None
else:
next_element = next_sibling = parent.contents[index+1]
o.previous_element = previous_element
if previous_element:
previous_element.next_element = o
o.next_element = next_element
if next_element:
next_element.previous_element = o
o.next_sibling = next_sibling
if next_sibling:
next_sibling.previous_sibling = o
o.previous_sibling = previous_sibling
if previous_sibling:
previous_sibling.next_sibling = o
def _popToTag(self, name, nsprefix=None, inclusivePop=True):
"""Pops the tag stack up to and including the most recent
instance of the given tag. If inclusivePop is false, pops the tag
@@ -80,9 +80,12 @@ builder_registry = TreeBuilderRegistry()
class TreeBuilder(object):
"""Turn a document into a Beautiful Soup object tree."""
NAME = "[Unknown tree builder]"
ALTERNATE_NAMES = []
features = []
is_xml = False
picklable = False
preserve_whitespace_tags = set()
empty_element_tags = None # A tag will be considered an empty-element
# tag when and only when it has no contents.
@@ -2,6 +2,7 @@ __all__ = [
'HTML5TreeBuilder',
]
from pdb import set_trace
import warnings
from bs4.builder import (
PERMISSIVE,
@@ -9,7 +10,10 @@ from bs4.builder import (
HTML_5,
HTMLTreeBuilder,
)
from bs4.element import NamespacedAttribute
from bs4.element import (
NamespacedAttribute,
whitespace_re,
)
import html5lib
from html5lib.constants import namespaces
from bs4.element import (
@@ -22,11 +26,20 @@ from bs4.element import (
class HTML5TreeBuilder(HTMLTreeBuilder):
"""Use html5lib to build a tree."""
features = ['html5lib', PERMISSIVE, HTML_5, HTML]
NAME = "html5lib"
def prepare_markup(self, markup, user_specified_encoding):
features = [NAME, PERMISSIVE, HTML_5, HTML]
def prepare_markup(self, markup, user_specified_encoding,
document_declared_encoding=None, exclude_encodings=None):
# Store the user-specified encoding for use later on.
self.user_specified_encoding = user_specified_encoding
# document_declared_encoding and exclude_encodings aren't used
# ATM because the html5lib TreeBuilder doesn't use
# UnicodeDammit.
if exclude_encodings:
warnings.warn("You provided a value for exclude_encodings, but the html5lib tree builder doesn't support exclude_encodings.")
yield (markup, None, None, False)
# These methods are defined by Beautiful Soup.
@@ -101,7 +114,16 @@ class AttrList(object):
def __iter__(self):
return list(self.attrs.items()).__iter__()
def __setitem__(self, name, value):
"set attr", name, value
# If this attribute is a multi-valued attribute for this element,
# turn its value into a list.
list_attr = HTML5TreeBuilder.cdata_list_attributes
if (name in list_attr['*']
or (self.element.name in list_attr
and name in list_attr[self.element.name])):
# A node that is being cloned may have already undergone
# this procedure.
if not isinstance(value, list):
value = whitespace_re.split(value)
self.element[name] = value
def items(self):
return list(self.attrs.items())
@@ -161,6 +183,12 @@ class Element(html5lib.treebuilders._base.Node):
# immediately after the parent, if it has no children.)
if self.element.contents:
most_recent_element = self.element._last_descendant(False)
elif self.element.next_element is not None:
# Something from further ahead in the parse tree is
# being inserted into this earlier element. This is
# very annoying because it means an expensive search
# for the last element in the tree.
most_recent_element = self.soup._last_descendant()
else:
most_recent_element = self.element
@@ -172,6 +200,7 @@ class Element(html5lib.treebuilders._base.Node):
return AttrList(self.element)
def setAttributes(self, attributes):
if attributes is not None and len(attributes) > 0:
converted_attributes = []
@@ -218,6 +247,9 @@ class Element(html5lib.treebuilders._base.Node):
def reparentChildren(self, new_parent):
"""Move all of this tag's children into another tag."""
# print "MOVE", self.element.contents
# print "FROM", self.element
# print "TO", new_parent.element
element = self.element
new_parent_element = new_parent.element
# Determine what this tag's next_element will be once all the children
@@ -236,17 +268,28 @@ class Element(html5lib.treebuilders._base.Node):
new_parents_last_descendant_next_element = new_parent_element.next_element
to_append = element.contents
append_after = new_parent.element.contents
append_after = new_parent_element.contents
if len(to_append) > 0:
# Set the first child's previous_element and previous_sibling
# to elements within the new parent
first_child = to_append[0]
first_child.previous_element = new_parents_last_descendant
if new_parents_last_descendant:
first_child.previous_element = new_parents_last_descendant
else:
first_child.previous_element = new_parent_element
first_child.previous_sibling = new_parents_last_child
if new_parents_last_descendant:
new_parents_last_descendant.next_element = first_child
else:
new_parent_element.next_element = first_child
if new_parents_last_child:
new_parents_last_child.next_sibling = first_child
# Fix the last child's next_element and next_sibling
last_child = to_append[-1]
last_child.next_element = new_parents_last_descendant_next_element
if new_parents_last_descendant_next_element:
new_parents_last_descendant_next_element.previous_element = last_child
last_child.next_sibling = None
for child in to_append:
@@ -257,6 +300,10 @@ class Element(html5lib.treebuilders._base.Node):
element.contents = []
element.next_element = final_next_element
# print "DONE WITH MOVE"
# print "FROM", self.element
# print "TO", new_parent_element
def cloneNode(self):
tag = self.soup.new_tag(self.element.name, self.namespace)
node = Element(tag, self.soup, self.namespace)
@@ -4,10 +4,16 @@ __all__ = [
'HTMLParserTreeBuilder',
]
from HTMLParser import (
HTMLParser,
HTMLParseError,
)
from HTMLParser import HTMLParser
try:
from HTMLParser import HTMLParseError
except ImportError, e:
# HTMLParseError is removed in Python 3.5. Since it can never be
# thrown in 3.5, we can just define our own class as a placeholder.
class HTMLParseError(Exception):
pass
import sys
import warnings
@@ -19,10 +25,10 @@ import warnings
# At the end of this file, we monkeypatch HTMLParser so that
# strict=True works well on Python 3.2.2.
major, minor, release = sys.version_info[:3]
CONSTRUCTOR_TAKES_STRICT = (
major > 3
or (major == 3 and minor > 2)
or (major == 3 and minor == 2 and release >= 3))
CONSTRUCTOR_TAKES_STRICT = major == 3 and minor == 2 and release >= 3
CONSTRUCTOR_STRICT_IS_DEPRECATED = major == 3 and minor == 3
CONSTRUCTOR_TAKES_CONVERT_CHARREFS = major == 3 and minor >= 4
from bs4.element import (
CData,
@@ -63,7 +69,8 @@ class BeautifulSoupHTMLParser(HTMLParser):
def handle_charref(self, name):
# XXX workaround for a bug in HTMLParser. Remove this once
# it's fixed.
# it's fixed in all supported versions.
# http://bugs.python.org/issue13633
if name.startswith('x'):
real_name = int(name.lstrip('x'), 16)
elif name.startswith('X'):
@@ -113,14 +120,6 @@ class BeautifulSoupHTMLParser(HTMLParser):
def handle_pi(self, data):
self.soup.endData()
if data.endswith("?") and data.lower().startswith("xml"):
# "An XHTML processing instruction using the trailing '?'
# will cause the '?' to be included in data." - HTMLParser
# docs.
#
# Strip the question mark so we don't end up with two
# question marks.
data = data[:-1]
self.soup.handle_data(data)
self.soup.endData(ProcessingInstruction)
@@ -128,15 +127,19 @@ class BeautifulSoupHTMLParser(HTMLParser):
class HTMLParserTreeBuilder(HTMLTreeBuilder):
is_xml = False
features = [HTML, STRICT, HTMLPARSER]
picklable = True
NAME = HTMLPARSER
features = [NAME, HTML, STRICT]
def __init__(self, *args, **kwargs):
if CONSTRUCTOR_TAKES_STRICT:
if CONSTRUCTOR_TAKES_STRICT and not CONSTRUCTOR_STRICT_IS_DEPRECATED:
kwargs['strict'] = False
if CONSTRUCTOR_TAKES_CONVERT_CHARREFS:
kwargs['convert_charrefs'] = False
self.parser_args = (args, kwargs)
def prepare_markup(self, markup, user_specified_encoding=None,
document_declared_encoding=None):
document_declared_encoding=None, exclude_encodings=None):
"""
:return: A 4-tuple (markup, original encoding, encoding
declared within markup, whether any characters had to be
@@ -147,7 +150,8 @@ class HTMLParserTreeBuilder(HTMLTreeBuilder):
return
try_encodings = [user_specified_encoding, document_declared_encoding]
dammit = UnicodeDammit(markup, try_encodings, is_html=True)
dammit = UnicodeDammit(markup, try_encodings, is_html=True,
exclude_encodings=exclude_encodings)
yield (dammit.markup, dammit.original_encoding,
dammit.declared_html_encoding,
dammit.contains_replacement_characters)
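Reviewer note: the three version flags in this file decide which keyword arguments are passed to the `HTMLParser` constructor. The following is a minimal standalone sketch of that decision, assuming the flag definitions shown in the hunk above. The function name `html_parser_kwargs` is hypothetical and is not part of bs4.

```python
def html_parser_kwargs(version_info):
    """Return the constructor kwargs the builder would choose.

    Hypothetical helper mirroring the flag logic in the diff;
    `version_info` is a (major, minor, release, ...) tuple.
    """
    major, minor, release = version_info[:3]
    # Same expressions as the module-level flags in the diff.
    takes_strict = major == 3 and minor == 2 and release >= 3
    strict_is_deprecated = major == 3 and minor == 3
    takes_convert_charrefs = major == 3 and minor >= 4

    kwargs = {}
    if takes_strict and not strict_is_deprecated:
        kwargs['strict'] = False
    if takes_convert_charrefs:
        kwargs['convert_charrefs'] = False
    return kwargs
```

With these definitions, only Python 3.2.3 and later 3.2 releases receive `strict=False`, Python 3.4 and later receive `convert_charrefs=False`, and Python 2 and 3.3 receive no extra arguments.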
+20 -5
@@ -7,7 +7,12 @@ from io import BytesIO
from StringIO import StringIO
import collections
from lxml import etree
from bs4.element import Comment, Doctype, NamespacedAttribute
from bs4.element import (
Comment,
Doctype,
NamespacedAttribute,
ProcessingInstruction,
)
from bs4.builder import (
FAST,
HTML,
@@ -25,8 +30,11 @@ class LXMLTreeBuilderForXML(TreeBuilder):
is_xml = True
NAME = "lxml-xml"
ALTERNATE_NAMES = ["xml"]
# Well, it's permissive by XML parser standards.
features = [LXML, XML, FAST, PERMISSIVE]
features = [NAME, LXML, XML, FAST, PERMISSIVE]
CHUNK_SIZE = 512
@@ -70,6 +78,7 @@ class LXMLTreeBuilderForXML(TreeBuilder):
return (None, tag)
def prepare_markup(self, markup, user_specified_encoding=None,
exclude_encodings=None,
document_declared_encoding=None):
"""
:yield: A series of 4-tuples.
@@ -95,7 +104,8 @@ class LXMLTreeBuilderForXML(TreeBuilder):
# the document as each one in turn.
is_html = not self.is_xml
try_encodings = [user_specified_encoding, document_declared_encoding]
detector = EncodingDetector(markup, try_encodings, is_html)
detector = EncodingDetector(
markup, try_encodings, is_html, exclude_encodings)
for encoding in detector.encodings:
yield (detector.markup, encoding, document_declared_encoding, False)
@@ -189,7 +199,9 @@ class LXMLTreeBuilderForXML(TreeBuilder):
self.nsmaps.pop()
def pi(self, target, data):
pass
self.soup.endData()
self.soup.handle_data(target + ' ' + data)
self.soup.endData(ProcessingInstruction)
def data(self, content):
self.soup.handle_data(content)
@@ -212,7 +224,10 @@ class LXMLTreeBuilderForXML(TreeBuilder):
class LXMLTreeBuilder(HTMLTreeBuilder, LXMLTreeBuilderForXML):
features = [LXML, HTML, FAST, PERMISSIVE]
NAME = LXML
ALTERNATE_NAMES = ["lxml-html"]
features = ALTERNATE_NAMES + [NAME, HTML, FAST, PERMISSIVE]
is_xml = False
def default_parser(self, encoding):
+16 -5
@@ -3,10 +3,12 @@
This library converts a bytestream to Unicode through any means
necessary. It is heavily based on code from Mark Pilgrim's Universal
Feed Parser. It works best on XML and XML, but it does not rewrite the
Feed Parser. It works best on XML and HTML, but it does not rewrite the
XML or HTML to reflect a new encoding; that's the tree builder's job.
"""
__license__ = "MIT"
from pdb import set_trace
import codecs
from htmlentitydefs import codepoint2name
import re
@@ -212,8 +214,11 @@ class EncodingDetector:
5. Windows-1252.
"""
def __init__(self, markup, override_encodings=None, is_html=False):
def __init__(self, markup, override_encodings=None, is_html=False,
exclude_encodings=None):
self.override_encodings = override_encodings or []
exclude_encodings = exclude_encodings or []
self.exclude_encodings = set([x.lower() for x in exclude_encodings])
self.chardet_encoding = None
self.is_html = is_html
self.declared_encoding = None
@@ -224,6 +229,8 @@ class EncodingDetector:
def _usable(self, encoding, tried):
if encoding is not None:
encoding = encoding.lower()
if encoding in self.exclude_encodings:
return False
if encoding not in tried:
tried.add(encoding)
return True
@@ -266,6 +273,9 @@ class EncodingDetector:
def strip_byte_order_mark(cls, data):
"""If a byte-order mark is present, strip it and return the encoding it implies."""
encoding = None
if isinstance(data, unicode):
# Unicode data cannot have a byte-order mark.
return data, encoding
if (len(data) >= 4) and (data[:2] == b'\xfe\xff') \
and (data[2:4] != '\x00\x00'):
encoding = 'utf-16be'
@@ -306,7 +316,7 @@ class EncodingDetector:
declared_encoding_match = html_meta_re.search(markup, endpos=html_endpos)
if declared_encoding_match is not None:
declared_encoding = declared_encoding_match.groups()[0].decode(
'ascii')
'ascii', 'replace')
if declared_encoding:
return declared_encoding.lower()
return None
@@ -331,13 +341,14 @@ class UnicodeDammit:
]
def __init__(self, markup, override_encodings=[],
smart_quotes_to=None, is_html=False):
smart_quotes_to=None, is_html=False, exclude_encodings=[]):
self.smart_quotes_to = smart_quotes_to
self.tried_encodings = []
self.contains_replacement_characters = False
self.is_html = is_html
self.detector = EncodingDetector(markup, override_encodings, is_html)
self.detector = EncodingDetector(
markup, override_encodings, is_html, exclude_encodings)
# Short-circuit if the data is in Unicode to begin with.
if isinstance(markup, unicode) or markup == '':
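Reviewer note: the new `exclude_encodings` argument filters candidate encodings inside `EncodingDetector._usable`. The sketch below restates that filtering as a standalone generator, assuming the behavior shown in the hunk above (case-insensitive comparison, each encoding tried at most once). The name `usable_encodings` is hypothetical and is not part of bs4.

```python
def usable_encodings(candidates, exclude_encodings=None):
    """Yield each candidate encoding at most once, skipping excluded ones.

    Hypothetical helper; mirrors the checks in EncodingDetector._usable.
    """
    excluded = set(name.lower() for name in (exclude_encodings or []))
    tried = set()
    for encoding in candidates:
        if encoding is None:
            continue  # nothing was specified or declared for this slot
        encoding = encoding.lower()
        if encoding in excluded or encoding in tried:
            continue
        tried.add(encoding)
        yield encoding
```

For example, `list(usable_encodings(["UTF-8", None, "ISO-8859-7"], ["iso-8859-7"]))` leaves only the lower-cased UTF-8 entry.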
+16 -4
@@ -1,4 +1,7 @@
"""Diagnostic functions, mainly for use when doing tech support."""
__license__ = "MIT"
import cProfile
from StringIO import StringIO
from HTMLParser import HTMLParser
@@ -33,12 +36,21 @@ def diagnose(data):
if 'lxml' in basic_parsers:
basic_parsers.append(["lxml", "xml"])
from lxml import etree
print "Found lxml version %s" % ".".join(map(str,etree.LXML_VERSION))
try:
from lxml import etree
print "Found lxml version %s" % ".".join(map(str,etree.LXML_VERSION))
except ImportError, e:
print (
"lxml is not installed or couldn't be imported.")
if 'html5lib' in basic_parsers:
import html5lib
print "Found html5lib version %s" % html5lib.__version__
try:
import html5lib
print "Found html5lib version %s" % html5lib.__version__
except ImportError, e:
print (
"html5lib is not installed or couldn't be imported.")
if hasattr(data, 'read'):
data = data.read()
+181 -67
@@ -1,3 +1,6 @@
__license__ = "MIT"
from pdb import set_trace
import collections
import re
import sys
@@ -185,24 +188,40 @@ class PageElement(object):
return self.HTML_FORMATTERS.get(
name, HTMLAwareEntitySubstitution.substitute_xml)
def setup(self, parent=None, previous_element=None):
def setup(self, parent=None, previous_element=None, next_element=None,
previous_sibling=None, next_sibling=None):
"""Sets up the initial relations between this element and
other elements."""
self.parent = parent
self.previous_element = previous_element
if previous_element is not None:
self.previous_element.next_element = self
self.next_element = None
self.previous_sibling = None
self.next_sibling = None
if self.parent is not None and self.parent.contents:
self.previous_sibling = self.parent.contents[-1]
self.next_element = next_element
if self.next_element:
self.next_element.previous_element = self
self.next_sibling = next_sibling
if self.next_sibling:
self.next_sibling.previous_sibling = self
if (not previous_sibling
and self.parent is not None and self.parent.contents):
previous_sibling = self.parent.contents[-1]
self.previous_sibling = previous_sibling
if previous_sibling:
self.previous_sibling.next_sibling = self
nextSibling = _alias("next_sibling") # BS3
previousSibling = _alias("previous_sibling") # BS3
def replace_with(self, replace_with):
if not self.parent:
raise ValueError(
"Cannot replace one element with another when the "
"element to be replaced is not part of a tree.")
if replace_with is self:
return
if replace_with is self.parent:
@@ -216,6 +235,10 @@ class PageElement(object):
def unwrap(self):
my_parent = self.parent
if not self.parent:
raise ValueError(
"Cannot replace an element with its contents when that "
"element is not part of a tree.")
my_index = self.parent.index(self)
self.extract()
for child in reversed(self.contents[:]):
@@ -240,17 +263,20 @@ class PageElement(object):
last_child = self._last_descendant()
next_element = last_child.next_element
if self.previous_element is not None:
if (self.previous_element is not None and
self.previous_element is not next_element):
self.previous_element.next_element = next_element
if next_element is not None:
if next_element is not None and next_element is not self.previous_element:
next_element.previous_element = self.previous_element
self.previous_element = None
last_child.next_element = None
self.parent = None
if self.previous_sibling is not None:
if (self.previous_sibling is not None
and self.previous_sibling is not self.next_sibling):
self.previous_sibling.next_sibling = self.next_sibling
if self.next_sibling is not None:
if (self.next_sibling is not None
and self.next_sibling is not self.previous_sibling):
self.next_sibling.previous_sibling = self.previous_sibling
self.previous_sibling = self.next_sibling = None
return self
@@ -263,13 +289,15 @@ class PageElement(object):
last_child = self
while isinstance(last_child, Tag) and last_child.contents:
last_child = last_child.contents[-1]
if not accept_self and last_child == self:
if not accept_self and last_child is self:
last_child = None
return last_child
# BS3: Not part of the API!
_lastRecursiveChild = _last_descendant
def insert(self, position, new_child):
if new_child is None:
raise ValueError("Cannot insert None into a tag.")
if new_child is self:
raise ValueError("Cannot insert a tag into itself.")
if (isinstance(new_child, basestring)
@@ -478,6 +506,10 @@ class PageElement(object):
def _find_all(self, name, attrs, text, limit, generator, **kwargs):
"Iterates over a generator looking for things that match."
if text is None and 'string' in kwargs:
text = kwargs['string']
del kwargs['string']
if isinstance(name, SoupStrainer):
strainer = name
else:
@@ -548,17 +580,17 @@ class PageElement(object):
# Methods for supporting CSS selectors.
tag_name_re = re.compile('^[a-z0-9]+$')
tag_name_re = re.compile('^[a-zA-Z0-9][-.a-zA-Z0-9:_]*$')
# /^(\w+)\[(\w+)([=~\|\^\$\*]?)=?"?([^\]"]*)"?\]$/
# \---/ \---/\-------------/ \-------/
# | | | |
# | | | The value
# | | ~,|,^,$,* or =
# | Attribute
# /^([a-zA-Z0-9][-.a-zA-Z0-9:_]*)\[(\w+)([=~\|\^\$\*]?)=?"?([^\]"]*)"?\]$/
# \---------------------------/ \---/\-------------/ \-------/
# | | | |
# | | | The value
# | | ~,|,^,$,* or =
# | Attribute
# Tag
attribselect_re = re.compile(
r'^(?P<tag>\w+)?\[(?P<attribute>\w+)(?P<operator>[=~\|\^\$\*]?)' +
r'^(?P<tag>[a-zA-Z0-9][-.a-zA-Z0-9:_]*)?\[(?P<attribute>[\w-]+)(?P<operator>[=~\|\^\$\*]?)' +
r'=?"?(?P<value>[^\]"]*)"?\]$'
)
@@ -654,11 +686,17 @@ class NavigableString(unicode, PageElement):
how to handle non-ASCII characters.
"""
if isinstance(value, unicode):
return unicode.__new__(cls, value)
return unicode.__new__(cls, value, DEFAULT_OUTPUT_ENCODING)
u = unicode.__new__(cls, value)
else:
u = unicode.__new__(cls, value, DEFAULT_OUTPUT_ENCODING)
u.setup()
return u
def __copy__(self):
return self
"""A copy of a NavigableString has the same contents and class
as the original, but it is not connected to the parse tree.
"""
return type(self)(self)
def __getnewargs__(self):
return (unicode(self),)
@@ -707,7 +745,7 @@ class CData(PreformattedString):
class ProcessingInstruction(PreformattedString):
PREFIX = u'<?'
SUFFIX = u'?>'
SUFFIX = u'>'
class Comment(PreformattedString):
@@ -716,8 +754,8 @@ class Comment(PreformattedString):
class Declaration(PreformattedString):
PREFIX = u'<!'
SUFFIX = u'!>'
PREFIX = u'<?'
SUFFIX = u'?>'
class Doctype(PreformattedString):
@@ -759,9 +797,12 @@ class Tag(PageElement):
self.prefix = prefix
if attrs is None:
attrs = {}
elif attrs and builder.cdata_list_attributes:
attrs = builder._replace_cdata_list_attribute_values(
self.name, attrs)
elif attrs:
if builder is not None and builder.cdata_list_attributes:
attrs = builder._replace_cdata_list_attribute_values(
self.name, attrs)
else:
attrs = dict(attrs)
else:
attrs = dict(attrs)
self.attrs = attrs
@@ -778,6 +819,18 @@ class Tag(PageElement):
parserClass = _alias("parser_class") # BS3
def __copy__(self):
"""A copy of a Tag is a new Tag, unconnected to the parse tree.
Its contents are a copy of the old Tag's contents.
"""
clone = type(self)(None, self.builder, self.name, self.namespace,
self.nsprefix, self.attrs)
for attr in ('can_be_empty_element', 'hidden'):
setattr(clone, attr, getattr(self, attr))
for child in self.contents:
clone.append(child.__copy__())
return clone
@property
def is_empty_element(self):
"""Is this tag an empty-element tag? (aka a self-closing tag)
@@ -971,15 +1024,25 @@ class Tag(PageElement):
as defined in __eq__."""
return not self == other
def __repr__(self, encoding=DEFAULT_OUTPUT_ENCODING):
def __repr__(self, encoding="unicode-escape"):
"""Renders this tag as a string."""
return self.encode(encoding)
if PY3K:
# "The return value must be a string object", i.e. Unicode
return self.decode()
else:
# "The return value must be a string object", i.e. a bytestring.
# By convention, the return value of __repr__ should also be
# an ASCII string.
return self.encode(encoding)
def __unicode__(self):
return self.decode()
def __str__(self):
return self.encode()
if PY3K:
return self.decode()
else:
return self.encode()
if PY3K:
__str__ = __repr__ = __unicode__
@@ -1103,12 +1166,18 @@ class Tag(PageElement):
formatter="minimal"):
"""Renders the contents of this tag as a Unicode string.
:param indent_level: Each line of the rendering will be
indented this many spaces.
:param eventual_encoding: The tag is destined to be
encoded into this encoding. This method is _not_
responsible for performing that encoding. This information
is passed in so that it can be substituted in if the
document contains a <META> tag that mentions the document's
encoding.
:param formatter: The output formatter responsible for converting
entities to Unicode characters.
"""
# First off, turn a string formatter into a function. This
# will stop the lookup from happening over and over again.
@@ -1137,7 +1206,17 @@ class Tag(PageElement):
def encode_contents(
self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
formatter="minimal"):
"""Renders the contents of this tag as a bytestring."""
"""Renders the contents of this tag as a bytestring.
:param indent_level: Each line of the rendering will be
indented this many spaces.
:param encoding: The bytestring will be in this encoding.
:param formatter: The output formatter responsible for converting
entities to Unicode characters.
"""
contents = self.decode_contents(indent_level, encoding, formatter)
return contents.encode(encoding)
@@ -1201,26 +1280,57 @@ class Tag(PageElement):
_selector_combinators = ['>', '+', '~']
_select_debug = False
def select(self, selector, _candidate_generator=None):
def select_one(self, selector):
"""Perform a CSS selection operation on the current element."""
value = self.select(selector, limit=1)
if value:
return value[0]
return None
def select(self, selector, _candidate_generator=None, limit=None):
"""Perform a CSS selection operation on the current element."""
# Handle grouping selectors if ',' exists, ie: p,a
if ',' in selector:
context = []
for partial_selector in selector.split(','):
partial_selector = partial_selector.strip()
if partial_selector == '':
raise ValueError('Invalid group selection syntax: %s' % selector)
candidates = self.select(partial_selector, limit=limit)
for candidate in candidates:
if candidate not in context:
context.append(candidate)
if limit and len(context) >= limit:
break
return context
tokens = selector.split()
current_context = [self]
if tokens[-1] in self._selector_combinators:
raise ValueError(
'Final combinator "%s" is missing an argument.' % tokens[-1])
if self._select_debug:
print 'Running CSS selector "%s"' % selector
for index, token in enumerate(tokens):
if self._select_debug:
print ' Considering token "%s"' % token
recursive_candidate_generator = None
tag_name = None
new_context = []
new_context_ids = set([])
if tokens[index-1] in self._selector_combinators:
# This token was consumed by the previous combinator. Skip it.
if self._select_debug:
print ' Token was consumed by the previous combinator.'
continue
if self._select_debug:
print ' Considering token "%s"' % token
recursive_candidate_generator = None
tag_name = None
# Each operation corresponds to a checker function, a rule
# for determining whether a candidate matches the
# selector. Candidates are generated by the active
@@ -1256,35 +1366,38 @@ class Tag(PageElement):
"A pseudo-class must be prefixed with a tag name.")
pseudo_attributes = re.match('([a-zA-Z\d-]+)\(([a-zA-Z\d]+)\)', pseudo)
found = []
if pseudo_attributes is not None:
if pseudo_attributes is None:
pseudo_type = pseudo
pseudo_value = None
else:
pseudo_type, pseudo_value = pseudo_attributes.groups()
if pseudo_type == 'nth-of-type':
try:
pseudo_value = int(pseudo_value)
except:
raise NotImplementedError(
'Only numeric values are currently supported for the nth-of-type pseudo-class.')
if pseudo_value < 1:
raise ValueError(
'nth-of-type pseudo-class value must be at least 1.')
class Counter(object):
def __init__(self, destination):
self.count = 0
self.destination = destination
def nth_child_of_type(self, tag):
self.count += 1
if self.count == self.destination:
return True
if self.count > self.destination:
# Stop the generator that's sending us
# these things.
raise StopIteration()
return False
checker = Counter(pseudo_value).nth_child_of_type
else:
if pseudo_type == 'nth-of-type':
try:
pseudo_value = int(pseudo_value)
except:
raise NotImplementedError(
'Only the following pseudo-classes are implemented: nth-of-type.')
'Only numeric values are currently supported for the nth-of-type pseudo-class.')
if pseudo_value < 1:
raise ValueError(
'nth-of-type pseudo-class value must be at least 1.')
class Counter(object):
def __init__(self, destination):
self.count = 0
self.destination = destination
def nth_child_of_type(self, tag):
self.count += 1
if self.count == self.destination:
return True
if self.count > self.destination:
# Stop the generator that's sending us
# these things.
raise StopIteration()
return False
checker = Counter(pseudo_value).nth_child_of_type
else:
raise NotImplementedError(
'Only the following pseudo-classes are implemented: nth-of-type.')
elif token == '*':
# Star selector -- matches everything
@@ -1311,7 +1424,6 @@ class Tag(PageElement):
else:
raise ValueError(
'Unsupported or invalid CSS selector: "%s"' % token)
if recursive_candidate_generator:
# This happens when the selector looks like "> foo".
#
@@ -1361,8 +1473,7 @@ class Tag(PageElement):
else:
_use_candidate_generator = _candidate_generator
new_context = []
new_context_ids = set([])
count = 0
for tag in current_context:
if self._select_debug:
print " Running candidate generator on %s %s" % (
@@ -1387,9 +1498,12 @@ class Tag(PageElement):
# don't include it in the context more than once.
new_context.append(candidate)
new_context_ids.add(id(candidate))
if limit and len(new_context) >= limit:
break
elif self._select_debug:
print " FAILURE %s %s" % (candidate.name, repr(candidate.attrs))
current_context = new_context
if self._select_debug:
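Reviewer note: this file also widens `tag_name_re`, the pattern the CSS selector code uses to recognize a bare tag name. The sketch below compares the old and new patterns copied from the hunk above against a few sample names. The helper `accepted` and the sample list are illustrative assumptions and are not part of bs4.

```python
import re

# Patterns copied from the diff: before and after the change.
OLD_TAG_NAME_RE = re.compile(r'^[a-z0-9]+$')
NEW_TAG_NAME_RE = re.compile(r'^[a-zA-Z0-9][-.a-zA-Z0-9:_]*$')

# Illustrative sample names, chosen for this sketch only.
SAMPLE_NAMES = ["div", "h1", "DIV", "custom-tag", "svg:rect", "-bad"]


def accepted(pattern, names):
    """Return the names that the given tag-name pattern matches."""
    return [name for name in names if pattern.match(name)]


old_matches = accepted(OLD_TAG_NAME_RE, SAMPLE_NAMES)
new_matches = accepted(NEW_TAG_NAME_RE, SAMPLE_NAMES)
```

The new pattern accepts upper-case letters, hyphens, dots, colons and underscores after the first character, while still requiring the name to start with a letter or digit.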
+96 -1
@@ -1,5 +1,8 @@
"""Helper classes for tests."""
__license__ = "MIT"
import pickle
import copy
import functools
import unittest
@@ -43,6 +46,16 @@ class SoupTest(unittest.TestCase):
self.assertEqual(obj.decode(), self.document_for(compare_parsed_to))
def assertConnectedness(self, element):
"""Ensure that next_element and previous_element are properly
set for all descendants of the given element.
"""
earlier = None
for e in element.descendants:
if earlier:
self.assertEqual(e, earlier.next_element)
self.assertEqual(earlier, e.previous_element)
earlier = e
class HTMLTreeBuilderSmokeTest(object):
@@ -54,6 +67,15 @@ class HTMLTreeBuilderSmokeTest(object):
markup in these tests, there's not much room for interpretation.
"""
def test_pickle_and_unpickle_identity(self):
# Pickling a tree, then unpickling it, yields a tree identical
# to the original.
tree = self.soup("<a><b>foo</a>")
dumped = pickle.dumps(tree, 2)
loaded = pickle.loads(dumped)
self.assertEqual(loaded.__class__, BeautifulSoup)
self.assertEqual(loaded.decode(), tree.decode())
def assertDoctypeHandled(self, doctype_fragment):
"""Assert that a given doctype string is handled correctly."""
doctype_str, soup = self._document_with_doctype(doctype_fragment)
@@ -114,6 +136,11 @@ class HTMLTreeBuilderSmokeTest(object):
soup.encode("utf-8").replace(b"\n", b""),
markup.replace(b"\n", b""))
def test_processing_instruction(self):
markup = b"""<?PITarget PIContent?>"""
soup = self.soup(markup)
self.assertEqual(markup, soup.encode("utf8"))
def test_deepcopy(self):
"""Make sure you can copy the tree builder.
@@ -155,6 +182,23 @@ class HTMLTreeBuilderSmokeTest(object):
def test_nested_formatting_elements(self):
self.assertSoupEquals("<em><em></em></em>")
def test_double_head(self):
html = '''<!DOCTYPE html>
<html>
<head>
<title>Ordinary HEAD element test</title>
</head>
<script type="text/javascript">
alert("Help!");
</script>
<body>
Hello, world!
</body>
</html>
'''
soup = self.soup(html)
self.assertEqual("text/javascript", soup.find('script')['type'])
def test_comment(self):
# Comments are represented as Comment objects.
markup = "<p>foo<!--foobar-->baz</p>"
@@ -221,6 +265,14 @@ class HTMLTreeBuilderSmokeTest(object):
soup = self.soup(markup)
self.assertEqual(["css"], soup.div.div['class'])
def test_multivalued_attribute_on_html(self):
# html5lib uses a different API to set the attributes of the
# <html> tag. This has caused problems with multivalued
# attributes.
markup = '<html class="a b"></html>'
soup = self.soup(markup)
self.assertEqual(["a", "b"], soup.html['class'])
def test_angle_brackets_in_attribute_values_are_escaped(self):
self.assertSoupEquals('<a b="<a>"></a>', '<a b="&lt;a&gt;"></a>')
@@ -253,6 +305,35 @@ class HTMLTreeBuilderSmokeTest(object):
soup = self.soup("<html><h2>\nfoo</h2><p></p></html>")
self.assertEqual("p", soup.h2.string.next_element.name)
self.assertEqual("p", soup.p.name)
self.assertConnectedness(soup)
def test_head_tag_between_head_and_body(self):
"Prevent recurrence of a bug in the html5lib treebuilder."
content = """<html><head></head>
<link></link>
<body>foo</body>
</html>
"""
soup = self.soup(content)
self.assertNotEqual(None, soup.html.body)
self.assertConnectedness(soup)
def test_multiple_copies_of_a_tag(self):
"Prevent recurrence of a bug in the html5lib treebuilder."
content = """<!DOCTYPE html>
<html>
<body>
<article id="a" >
<div><a href="1"></div>
<footer>
<a href="2"></a>
</footer>
</article>
</body>
</html>
"""
soup = self.soup(content)
self.assertConnectedness(soup.article)
def test_basic_namespaces(self):
"""Parsers don't need to *understand* namespaces, but at the
@@ -463,11 +544,25 @@ class HTMLTreeBuilderSmokeTest(object):
class XMLTreeBuilderSmokeTest(object):
def test_pickle_and_unpickle_identity(self):
# Pickling a tree, then unpickling it, yields a tree identical
# to the original.
tree = self.soup("<a><b>foo</a>")
dumped = pickle.dumps(tree, 2)
loaded = pickle.loads(dumped)
self.assertEqual(loaded.__class__, BeautifulSoup)
self.assertEqual(loaded.decode(), tree.decode())
def test_docstring_generated(self):
soup = self.soup("<root/>")
self.assertEqual(
soup.encode(), b'<?xml version="1.0" encoding="utf-8"?>\n<root/>')
def test_xml_declaration(self):
markup = b"""<?xml version="1.0" encoding="utf8"?>\n<foo/>"""
soup = self.soup(markup)
self.assertEqual(markup, soup.encode("utf8"))
def test_real_xhtml_document(self):
"""A real XHTML document should come out *exactly* the same as it went in."""
markup = b"""<?xml version="1.0" encoding="utf-8"?>
@@ -485,7 +580,7 @@ class XMLTreeBuilderSmokeTest(object):
<script type="text/javascript">
</script>
"""
soup = BeautifulSoup(doc, "xml")
soup = BeautifulSoup(doc, "lxml-xml")
# lxml would have stripped this while parsing, but we can add
# it later.
soup.script.string = 'console.log("< < hey > > ");'
@@ -1,6 +1,7 @@
"""Tests of the builder registry."""
import unittest
import warnings
from bs4 import BeautifulSoup
from bs4.builder import (
@@ -67,10 +68,15 @@ class BuiltInRegistryTest(unittest.TestCase):
HTMLParserTreeBuilder)
def test_beautifulsoup_constructor_does_lookup(self):
# You can pass in a string.
BeautifulSoup("", features="html")
# Or a list of strings.
BeautifulSoup("", features=["html", "fast"])
with warnings.catch_warnings(record=True) as w:
# This will create a warning about not explicitly
# specifying a parser, but we'll ignore it.
# You can pass in a string.
BeautifulSoup("", features="html")
# Or a list of strings.
BeautifulSoup("", features=["html", "fast"])
# You'll get an exception if BS can't find an appropriate
# builder.
@@ -83,3 +83,16 @@ class HTML5LibBuilderSmokeTest(SoupTest, HTML5TreeBuilderSmokeTest):
soup = self.soup(markup)
self.assertEqual(u"<body><p><em>foo</em></p><em>\n</em><p><em>bar<a></a></em></p>\n</body>", soup.body.decode())
self.assertEqual(2, len(soup.find_all('p')))
def test_processing_instruction(self):
"""Processing instructions become comments."""
markup = b"""<?PITarget PIContent?>"""
soup = self.soup(markup)
assert str(soup).startswith("<!--?PITarget PIContent?-->")
def test_cloned_multivalue_node(self):
markup = b"""<a class="my_class"><p></a>"""
soup = self.soup(markup)
a1, a2 = soup.find_all('a')
self.assertEqual(a1, a2)
assert a1 is not a2
@@ -1,6 +1,8 @@
"""Tests to ensure that the html.parser tree builder generates good
trees."""
from pdb import set_trace
import pickle
from bs4.testing import SoupTest, HTMLTreeBuilderSmokeTest
from bs4.builder import HTMLParserTreeBuilder
@@ -17,3 +19,14 @@ class HTMLParserTreeBuilderSmokeTest(SoupTest, HTMLTreeBuilderSmokeTest):
def test_namespaced_public_doctype(self):
# html.parser can't handle namespaced doctypes, so skip this one.
pass
def test_builder_is_pickled(self):
"""Unlike most tree builders, HTMLParserTreeBuilder can be pickled
and will be restored after unpickling.
"""
tree = self.soup("<a><b>foo</a>")
dumped = pickle.dumps(tree, 2)
loaded = pickle.loads(dumped)
self.assertTrue(isinstance(loaded.builder, type(tree.builder)))
@@ -65,21 +65,6 @@ class LXMLTreeBuilderSmokeTest(SoupTest, HTMLTreeBuilderSmokeTest):
self.assertEqual(u"<b/>", unicode(soup.b))
self.assertTrue("BeautifulStoneSoup class is deprecated" in str(w[0].message))
def test_real_xhtml_document(self):
"""lxml strips the XML definition from an XHTML doc, which is fine."""
markup = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Hello.</title></head>
<body>Goodbye.</body>
</html>"""
soup = self.soup(markup)
self.assertEqual(
soup.encode("utf-8").replace(b"\n", b''),
markup.replace(b'\n', b'').replace(
b'<?xml version="1.0" encoding="utf-8"?>', b''))
@skipIf(
not LXML_PRESENT,
"lxml seems not to be present, not testing its XML tree builder.")
@@ -1,6 +1,7 @@
# -*- coding: utf-8 -*-
"""Tests of Beautiful Soup as a whole."""
from pdb import set_trace
import logging
import unittest
import sys
@@ -20,6 +21,7 @@ import bs4.dammit
from bs4.dammit import (
EntitySubstitution,
UnicodeDammit,
EncodingDetector,
)
from bs4.testing import (
SoupTest,
@@ -48,8 +50,34 @@ class TestConstructor(SoupTest):
soup = self.soup(data)
self.assertEqual(u"foo\0bar", soup.h1.string)
def test_exclude_encodings(self):
utf8_data = u"Räksmörgås".encode("utf-8")
soup = self.soup(utf8_data, exclude_encodings=["utf-8"])
self.assertEqual("windows-1252", soup.original_encoding)
class TestDeprecatedConstructorArguments(SoupTest):
class TestWarnings(SoupTest):
def _assert_no_parser_specified(self, s, is_there=True):
v = s.startswith(BeautifulSoup.NO_PARSER_SPECIFIED_WARNING[:80])
self.assertTrue(v)
def test_warning_if_no_parser_specified(self):
with warnings.catch_warnings(record=True) as w:
soup = self.soup("<a><b></b></a>")
msg = str(w[0].message)
self._assert_no_parser_specified(msg)
def test_warning_if_parser_specified_too_vague(self):
with warnings.catch_warnings(record=True) as w:
soup = self.soup("<a><b></b></a>", "html")
msg = str(w[0].message)
self._assert_no_parser_specified(msg)
def test_no_warning_if_explicit_parser_specified(self):
with warnings.catch_warnings(record=True) as w:
soup = self.soup("<a><b></b></a>", "html.parser")
self.assertEqual([], w)
def test_parseOnlyThese_renamed_to_parse_only(self):
with warnings.catch_warnings(record=True) as w:
@@ -271,10 +299,11 @@ class TestUnicodeDammit(unittest.TestCase):
dammit.unicode_markup, """<foo>''""</foo>""")
def test_detect_utf8(self):
utf8 = b"\xc3\xa9"
utf8 = b"Sacr\xc3\xa9 bleu! \xe2\x98\x83"
dammit = UnicodeDammit(utf8)
self.assertEqual(dammit.unicode_markup, u'\xe9')
self.assertEqual(dammit.original_encoding.lower(), 'utf-8')
self.assertEqual(dammit.unicode_markup, u'Sacr\xe9 bleu! \N{SNOWMAN}')
def test_convert_hebrew(self):
hebrew = b"\xed\xe5\xec\xf9"
@@ -299,6 +328,26 @@ class TestUnicodeDammit(unittest.TestCase):
dammit = UnicodeDammit(utf8_data, [bad_encoding])
self.assertEqual(dammit.original_encoding.lower(), 'utf-8')
def test_exclude_encodings(self):
# This is UTF-8.
utf8_data = u"Räksmörgås".encode("utf-8")
# But if we exclude UTF-8 from consideration, the guess is
# Windows-1252.
dammit = UnicodeDammit(utf8_data, exclude_encodings=["utf-8"])
self.assertEqual(dammit.original_encoding.lower(), 'windows-1252')
# And if we exclude that, there is no valid guess at all.
dammit = UnicodeDammit(
utf8_data, exclude_encodings=["utf-8", "windows-1252"])
self.assertEqual(dammit.original_encoding, None)
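The `exclude_encodings` test above expects the guess to fall through from UTF-8 to Windows-1252, and then to no guess at all. A rough Python 3 sketch of that fall-through, using only the standard codecs, is shown here. The function name `guess_encoding` and its fixed candidate list are assumptions for illustration; `UnicodeDammit` has a richer detection pipeline.

```python
def guess_encoding(data, candidates=("utf-8", "windows-1252"), exclude=()):
    """Return the first candidate that decodes `data`, skipping excluded names.

    Hypothetical helper: tries each candidate codec in order and returns
    None when nothing is left to try.
    """
    excluded = {name.lower() for name in exclude}
    for name in candidates:
        if name.lower() in excluded:
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

The UTF-8 bytes of "Räksmörgås" also happen to be valid Windows-1252, which is why excluding UTF-8 yields a second guess rather than a failure.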
def test_encoding_detector_replaces_junk_in_encoding_name_with_replacement_character(self):
detected = EncodingDetector(
b'<?xml version="1.0" encoding="UTF-\xdb" ?>')
encodings = list(detected.encodings)
assert u'utf-\N{REPLACEMENT CHARACTER}' in encodings
def test_detect_html5_style_meta_tag(self):
for data in (
+199 -17
@@ -9,6 +9,7 @@ same markup, but all Beautiful Soup trees can be traversed with the
methods tested here.
"""
from pdb import set_trace
import copy
import pickle
import re
@@ -19,8 +20,10 @@ from bs4.builder import (
HTMLParserTreeBuilder,
)
from bs4.element import (
PY3K,
CData,
Comment,
Declaration,
Doctype,
NavigableString,
SoupStrainer,
@@ -68,7 +71,13 @@ class TestFind(TreeTest):
def test_unicode_text_find(self):
soup = self.soup(u'<h1>Räksmörgås</h1>')
self.assertEqual(soup.find(text=u'Räksmörgås'), u'Räksmörgås')
self.assertEqual(soup.find(string=u'Räksmörgås'), u'Räksmörgås')
def test_unicode_attribute_find(self):
soup = self.soup(u'<h1 id="Räksmörgås">here it is</h1>')
str(soup)
self.assertEqual("here it is", soup.find(id=u'Räksmörgås').text)
def test_find_everything(self):
"""Test an optimization that finds all tags."""
@@ -87,6 +96,7 @@ class TestFindAll(TreeTest):
"""You can search the tree for text nodes."""
soup = self.soup("<html>Foo<b>bar</b>\xbb</html>")
# Exact match.
self.assertEqual(soup.find_all(string="bar"), [u"bar"])
self.assertEqual(soup.find_all(text="bar"), [u"bar"])
# Match any of a number of strings.
self.assertEqual(
@@ -688,7 +698,7 @@ class TestTagCreation(SoupTest):
def test_tag_inherits_self_closing_rules_from_builder(self):
if XML_BUILDER_PRESENT:
xml_soup = BeautifulSoup("", "xml")
xml_soup = BeautifulSoup("", "lxml-xml")
xml_br = xml_soup.new_tag("br")
xml_p = xml_soup.new_tag("p")
@@ -697,7 +707,7 @@ class TestTagCreation(SoupTest):
self.assertEqual(b"<br/>", xml_br.encode())
self.assertEqual(b"<p/>", xml_p.encode())
html_soup = BeautifulSoup("", "html")
html_soup = BeautifulSoup("", "html.parser")
html_br = html_soup.new_tag("br")
html_p = html_soup.new_tag("p")
@@ -773,6 +783,14 @@ class TestTreeModification(SoupTest):
new_a = a.unwrap()
self.assertEqual(a, new_a)
def test_replace_with_and_unwrap_give_useful_exception_when_tag_has_no_parent(self):
soup = self.soup("<a><b>Foo</b></a><c>Bar</c>")
a = soup.a
a.extract()
self.assertEqual(None, a.parent)
self.assertRaises(ValueError, a.unwrap)
self.assertRaises(ValueError, a.replace_with, soup.c)
def test_replace_tag_with_itself(self):
text = "<a><b></b><c>Foo<d></d></c></a><a><e></e></a>"
soup = self.soup(text)
@@ -1067,6 +1085,31 @@ class TestTreeModification(SoupTest):
self.assertEqual(foo_2, soup.a.string)
self.assertEqual(bar_2, soup.b.string)
def test_extract_multiples_of_same_tag(self):
soup = self.soup("""
<html>
<head>
<script>foo</script>
</head>
<body>
<script>bar</script>
<a></a>
</body>
<script>baz</script>
</html>""")
[soup.script.extract() for i in soup.find_all("script")]
self.assertEqual("<body>\n\n<a></a>\n</body>", unicode(soup.body))
def test_extract_works_when_element_is_surrounded_by_identical_strings(self):
soup = self.soup(
'<html>\n'
'<body>hi</body>\n'
'</html>')
soup.find('body').extract()
self.assertEqual(None, soup.find('body'))
def test_clear(self):
"""Tag.clear()"""
soup = self.soup("<p><a>String <em>Italicized</em></a> and another</p>")
@@ -1293,6 +1336,51 @@ class TestPersistence(SoupTest):
loaded = pickle.loads(dumped)
self.assertEqual(loaded.decode(), soup.decode())
def test_copy_navigablestring_is_not_attached_to_tree(self):
html = u"<b>Foo<a></a></b><b>Bar</b>"
soup = self.soup(html)
s1 = soup.find(string="Foo")
s2 = copy.copy(s1)
self.assertEqual(s1, s2)
self.assertEqual(None, s2.parent)
self.assertEqual(None, s2.next_element)
self.assertNotEqual(None, s1.next_sibling)
self.assertEqual(None, s2.next_sibling)
self.assertEqual(None, s2.previous_element)
def test_copy_navigablestring_subclass_has_same_type(self):
html = u"<b><!--Foo--></b>"
soup = self.soup(html)
s1 = soup.string
s2 = copy.copy(s1)
self.assertEqual(s1, s2)
self.assertTrue(isinstance(s2, Comment))
def test_copy_entire_soup(self):
html = u"<div><b>Foo<a></a></b><b>Bar</b></div>end"
soup = self.soup(html)
soup_copy = copy.copy(soup)
self.assertEqual(soup, soup_copy)
def test_copy_tag_copies_contents(self):
html = u"<div><b>Foo<a></a></b><b>Bar</b></div>end"
soup = self.soup(html)
div = soup.div
div_copy = copy.copy(div)
# The two tags look the same, and evaluate to equal.
self.assertEqual(unicode(div), unicode(div_copy))
self.assertEqual(div, div_copy)
# But they're not the same object.
self.assertFalse(div is div_copy)
# And they don't have the same relation to the parse tree. The
# copy is not associated with a parse tree at all.
self.assertEqual(None, div_copy.parent)
self.assertEqual(None, div_copy.previous_element)
self.assertEqual(None, div_copy.find(string='Bar').next_element)
self.assertNotEqual(None, div.find(string='Bar').next_element)
class TestSubstitutions(SoupTest):
@@ -1366,7 +1454,7 @@ class TestSubstitutions(SoupTest):
console.log("< < hey > > ");
</script>
"""
encoded = BeautifulSoup(doc).encode()
encoded = BeautifulSoup(doc, 'html.parser').encode()
self.assertTrue(b"< < hey > >" in encoded)
def test_formatter_skips_style_tag_for_html_documents(self):
@@ -1375,7 +1463,7 @@ class TestSubstitutions(SoupTest):
console.log("< < hey > > ");
</style>
"""
encoded = BeautifulSoup(doc).encode()
encoded = BeautifulSoup(doc, 'html.parser').encode()
self.assertTrue(b"< < hey > >" in encoded)
def test_prettify_leaves_preformatted_text_alone(self):
@@ -1387,7 +1475,7 @@ class TestSubstitutions(SoupTest):
soup.div.prettify())
def test_prettify_accepts_formatter(self):
soup = BeautifulSoup("<html><body>foo</body></html>")
soup = BeautifulSoup("<html><body>foo</body></html>", 'html.parser')
pretty = soup.prettify(formatter = lambda x: x.upper())
self.assertTrue("FOO" in pretty)
@@ -1484,6 +1572,14 @@ class TestEncoding(SoupTest):
self.assertEqual(
u"\N{SNOWMAN}".encode("utf8"), soup.b.renderContents())
def test_repr(self):
html = u"<b>\N{SNOWMAN}</b>"
soup = self.soup(html)
if PY3K:
self.assertEqual(html, repr(soup))
else:
self.assertEqual(b'<b>\\u2603</b>', repr(soup))
class TestNavigableStringSubclasses(SoupTest):
def test_cdata(self):
@@ -1522,6 +1618,9 @@ class TestNavigableStringSubclasses(SoupTest):
soup.insert(1, doctype)
self.assertEqual(soup.encode(), b"<!DOCTYPE foo>\n")
def test_declaration(self):
d = Declaration("foo")
self.assertEqual("<?foo?>", d.output_ready())
class TestSoupSelector(TreeTest):
@@ -1534,7 +1633,7 @@ class TestSoupSelector(TreeTest):
<link rel="stylesheet" href="blah.css" type="text/css" id="l1">
</head>
<body>
<custom-dashed-tag class="dashed" id="dash1">Hello there.</custom-dashed-tag>
<div id="main" class="fancy">
<div id="inner">
<h1 id="header1">An H1</h1>
@@ -1552,8 +1651,18 @@ class TestSoupSelector(TreeTest):
<a href="#" id="s2a1">span2a1</a>
</span>
<span class="span3"></span>
<custom-dashed-tag class="dashed" id="dash2"/>
<div data-tag="dashedvalue" id="data1"/>
</span>
</div>
<x id="xid">
<z id="zida"/>
<z id="zidab"/>
<z id="zidac"/>
</x>
<y id="yid">
<z id="zidb"/>
</y>
<p lang="en" id="lang-en">English</p>
<p lang="en-gb" id="lang-en-gb">English UK</p>
<p lang="en-us" id="lang-en-us">English US</p>
@@ -1565,7 +1674,7 @@ class TestSoupSelector(TreeTest):
"""
def setUp(self):
self.soup = BeautifulSoup(self.HTML)
self.soup = BeautifulSoup(self.HTML, 'html.parser')
def assertSelects(self, selector, expected_ids):
el_ids = [el['id'] for el in self.soup.select(selector)]
@@ -1591,17 +1700,25 @@ class TestSoupSelector(TreeTest):
def test_one_tag_many(self):
els = self.soup.select('div')
self.assertEqual(len(els), 3)
self.assertEqual(len(els), 4)
for div in els:
self.assertEqual(div.name, 'div')
el = self.soup.select_one('div')
self.assertEqual('main', el['id'])
def test_select_one_returns_none_if_no_match(self):
match = self.soup.select_one('nonexistenttag')
self.assertEqual(None, match)
def test_tag_in_tag_one(self):
els = self.soup.select('div div')
self.assertSelects('div div', ['inner'])
self.assertSelects('div div', ['inner', 'data1'])
def test_tag_in_tag_many(self):
for selector in ('html div', 'html body div', 'body div'):
self.assertSelects(selector, ['main', 'inner', 'footer'])
self.assertSelects(selector, ['data1', 'main', 'inner', 'footer'])
def test_tag_no_match(self):
self.assertEqual(len(self.soup.select('del')), 0)
@@ -1609,6 +1726,20 @@ class TestSoupSelector(TreeTest):
def test_invalid_tag(self):
self.assertRaises(ValueError, self.soup.select, 'tag%t')
def test_select_dashed_tag_ids(self):
self.assertSelects('custom-dashed-tag', ['dash1', 'dash2'])
def test_select_dashed_by_id(self):
dashed = self.soup.select('custom-dashed-tag[id=\"dash2\"]')
self.assertEqual(dashed[0].name, 'custom-dashed-tag')
self.assertEqual(dashed[0]['id'], 'dash2')
def test_dashed_tag_text(self):
self.assertEqual(self.soup.select('body > custom-dashed-tag')[0].text, u'Hello there.')
def test_select_dashed_matches_find_all(self):
self.assertEqual(self.soup.select('custom-dashed-tag'), self.soup.find_all('custom-dashed-tag'))
def test_header_tags(self):
self.assertSelectMultiple(
('h1', ['header1']),
@@ -1709,6 +1840,7 @@ class TestSoupSelector(TreeTest):
('[id^="m"]', ['me', 'main']),
('div[id^="m"]', ['main']),
('a[id^="m"]', ['me']),
('div[data-tag^="dashed"]', ['data1'])
)
def test_attribute_endswith(self):
@@ -1716,8 +1848,8 @@ class TestSoupSelector(TreeTest):
('[href$=".css"]', ['l1']),
('link[href$=".css"]', ['l1']),
('link[id$="1"]', ['l1']),
('[id$="1"]', ['l1', 'p1', 'header1', 's1a1', 's2a1', 's1a2s1']),
('div[id$="1"]', []),
('[id$="1"]', ['data1', 'l1', 'p1', 'header1', 's1a1', 's2a1', 's1a2s1', 'dash1']),
('div[id$="1"]', ['data1']),
('[id$="noending"]', []),
)
@@ -1730,7 +1862,6 @@ class TestSoupSelector(TreeTest):
('[rel*="notstyle"]', []),
('link[rel*="notstyle"]', []),
('link[href*="bla"]', ['l1']),
('a[href*="http://"]', ['bob', 'me']),
('[href*="http://"]', ['bob', 'me']),
('[id*="p"]', ['pmulti', 'p1']),
('div[id*="m"]', ['main']),
@@ -1739,8 +1870,8 @@ class TestSoupSelector(TreeTest):
('[href*=".css"]', ['l1']),
('link[href*=".css"]', ['l1']),
('link[id*="1"]', ['l1']),
('[id*="1"]', ['l1', 'p1', 'header1', 's1a1', 's1a2', 's2a1', 's1a2s1']),
('div[id*="1"]', []),
('[id*="1"]', ['data1', 'l1', 'p1', 'header1', 's1a1', 's1a2', 's2a1', 's1a2s1', 'dash1']),
('div[id*="1"]', ['data1']),
('[id*="noending"]', []),
# New for this test
('[href*="."]', ['bob', 'me', 'l1']),
@@ -1748,6 +1879,7 @@ class TestSoupSelector(TreeTest):
('link[href*="."]', ['l1']),
('div[id*="n"]', ['main', 'inner']),
('div[id*="nn"]', ['inner']),
('div[data-tag*="edval"]', ['data1'])
)
def test_attribute_exact_or_hypen(self):
@@ -1767,8 +1899,17 @@ class TestSoupSelector(TreeTest):
('p[class]', ['p1', 'pmulti']),
('[blah]', []),
('p[blah]', []),
('div[data-tag]', ['data1'])
)
def test_unsupported_pseudoclass(self):
self.assertRaises(
NotImplementedError, self.soup.select, "a:no-such-pseudoclass")
self.assertRaises(
NotImplementedError, self.soup.select, "a:nth-of-type(a)")
def test_nth_of_type(self):
# Try to select first paragraph
els = self.soup.select('div#inner p:nth-of-type(1)')
@@ -1803,7 +1944,7 @@ class TestSoupSelector(TreeTest):
selected = inner.select("div")
# The <div id="inner"> tag was selected. The <div id="footer">
# tag was not.
self.assertSelectsIDs(selected, ['inner'])
self.assertSelectsIDs(selected, ['inner', 'data1'])
def test_overspecified_child_id(self):
self.assertSelects(".fancy #inner", ['inner'])
@@ -1827,3 +1968,44 @@ class TestSoupSelector(TreeTest):
def test_sibling_combinator_wont_select_same_tag_twice(self):
self.assertSelects('p[lang] ~ p', ['lang-en-gb', 'lang-en-us', 'lang-fr'])
# Test the selector grouping operator (the comma)
def test_multiple_select(self):
self.assertSelects('x, y', ['xid', 'yid'])
def test_multiple_select_with_no_space(self):
self.assertSelects('x,y', ['xid', 'yid'])
def test_multiple_select_with_more_space(self):
self.assertSelects('x, y', ['xid', 'yid'])
def test_multiple_select_duplicated(self):
self.assertSelects('x, x', ['xid'])
def test_multiple_select_sibling(self):
self.assertSelects('x, y ~ p[lang=fr]', ['xid', 'lang-fr'])
def test_multiple_select_tag_and_direct_descendant(self):
self.assertSelects('x, y > z', ['xid', 'zidb'])
def test_multiple_select_direct_descendant_and_tags(self):
self.assertSelects('div > x, y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
def test_multiple_select_indirect_descendant(self):
self.assertSelects('div x,y, z', ['xid', 'yid', 'zida', 'zidb', 'zidab', 'zidac'])
def test_invalid_multiple_select(self):
self.assertRaises(ValueError, self.soup.select, ',x, y')
self.assertRaises(ValueError, self.soup.select, 'x,,y')
def test_multiple_select_attrs(self):
self.assertSelects('p[lang=en], p[lang=en-gb]', ['lang-en', 'lang-en-gb'])
def test_multiple_select_ids(self):
self.assertSelects('x, y > z[id=zida], z[id=zidab], z[id=zidb]', ['xid', 'zidb', 'zidab'])
def test_multiple_select_nested(self):
self.assertSelects('body > div > x, y > z', ['xid', 'zidb'])
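The grouping tests above accept `'x, y'` and `'x,y'` and expect a `ValueError` for `',x, y'` and `'x,,y'`. One simple way to get that behaviour is sketched below in Python 3. The helper name `split_selector_group` is hypothetical and this is not how bs4's `select()` is implemented. It also assumes no commas appear inside attribute values.

```python
def split_selector_group(selector):
    """Split a comma-grouped selector into its parts.

    Hypothetical helper: raises ValueError when any part is empty,
    as in ',x, y' or 'x,,y'.
    """
    parts = [part.strip() for part in selector.split(",")]
    if any(not part for part in parts):
        raise ValueError("Invalid group selection syntax: %r" % selector)
    return parts
```

Each part would then be matched on its own and the results merged without duplicates, which is what `test_multiple_select_duplicated` checks with `'x, x'`.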
+10 -2
@@ -4,7 +4,7 @@ Chardet: The Universal Character Encoding Detector
Detects
- ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
- Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
- EUC-JP, SHIFT_JIS, ISO-2022-JP (Japanese)
- EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
- EUC-KR, ISO-2022-KR (Korean)
- KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
- ISO-8859-2, windows-1250 (Hungarian)
@@ -16,6 +16,14 @@ Detects
Requires Python 2.6 or later
Installation
------------
Install from `PyPI <https://pypi.python.org/pypi/chardet>`_::
pip install chardet
Command-line Tool
-----------------
@@ -31,7 +39,7 @@ About
This is a continuation of Mark Pilgrim's excellent chardet. Previously, two
versions needed to be maintained: one that supported python 2.x and one that
supported python 3.x. We've recently merged with `Ian Corduscano <https://github.com/sigmavirus24>`_'s
supported python 3.x. We've recently merged with `Ian Cordasco <https://github.com/sigmavirus24>`_'s
`charade <https://github.com/sigmavirus24/charade>`_ fork, so now we have one
coherent version that works for Python 2.6+.
@@ -15,7 +15,7 @@
# 02110-1301 USA
######################### END LICENSE BLOCK #########################
__version__ = "2.2.1"
__version__ = "2.3.0"
from sys import version_info
+50 -16
@@ -12,34 +12,68 @@ Example::
If no paths are provided, it takes its input from stdin.
"""
from io import open
from sys import argv, stdin
from __future__ import absolute_import, print_function, unicode_literals
import argparse
import sys
from io import open
from chardet import __version__
from chardet.universaldetector import UniversalDetector
def description_of(file, name='stdin'):
"""Return a string describing the probable encoding of a file."""
def description_of(lines, name='stdin'):
"""
Return a string describing the probable encoding of a file or
list of strings.
:param lines: The lines to get the encoding of.
:type lines: Iterable of bytes
:param name: Name of file or collection of lines
:type name: str
"""
u = UniversalDetector()
for line in file:
for line in lines:
u.feed(line)
u.close()
result = u.result
if result['encoding']:
return '%s: %s with confidence %s' % (name,
result['encoding'],
result['confidence'])
return '{0}: {1} with confidence {2}'.format(name, result['encoding'],
result['confidence'])
else:
return '%s: no result' % name
return '{0}: no result'.format(name)
def main():
if len(argv) <= 1:
print(description_of(stdin))
else:
for path in argv[1:]:
with open(path, 'rb') as f:
print(description_of(f, path))
def main(argv=None):
'''
Handles command line arguments and gets things started.
:param argv: List of arguments, as if specified on the command-line.
If None, ``sys.argv[1:]`` is used instead.
:type argv: list of str
'''
# Get command line arguments
parser = argparse.ArgumentParser(
description="Takes one or more file paths and reports their detected \
encodings",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
conflict_handler='resolve')
parser.add_argument('input',
help='File whose encoding we would like to determine.',
type=argparse.FileType('rb'), nargs='*',
default=[sys.stdin])
parser.add_argument('--version', action='version',
version='%(prog)s {0}'.format(__version__))
args = parser.parse_args(argv)
for f in args.input:
if f.isatty():
print("You are running chardetect interactively. Press " +
"CTRL-D twice at the start of a blank line to signal the " +
"end of your input. If you want help, run chardetect " +
"--help\n", file=sys.stderr)
print(description_of(f, f.name))
if __name__ == '__main__':
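The new `main(argv=None)` signature lets callers pass an argument list instead of relying on `sys.argv`, which makes the command-line tool testable. A stripped-down Python 3 sketch of the pattern follows. The `parse_args` function and the `paths` argument name are illustrative assumptions, and plain strings are used here instead of `argparse.FileType`.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments.

    When argv is None, argparse falls back to sys.argv[1:].
    """
    parser = argparse.ArgumentParser(
        description="Takes one or more file paths and reports their "
                    "detected encodings")
    # Zero or more positional paths; an empty list means "no paths given".
    parser.add_argument("paths", nargs="*", help="Files to inspect.")
    return parser.parse_args(argv)
```

A test can then call `parse_args(["a.txt", "b.txt"])` directly without patching `sys.argv`.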
@@ -177,6 +177,12 @@ class JapaneseContextAnalysis:
return -1, 1
class SJISContextAnalysis(JapaneseContextAnalysis):
def __init__(self):
self.charset_name = "SHIFT_JIS"
def get_charset_name(self):
return self.charset_name
def get_order(self, aBuf):
if not aBuf:
return -1, 1
@@ -184,6 +190,8 @@ class SJISContextAnalysis(JapaneseContextAnalysis):
first_char = wrap_ord(aBuf[0])
if ((0x81 <= first_char <= 0x9F) or (0xE0 <= first_char <= 0xFC)):
charLen = 2
if (first_char == 0x87) or (0xFA <= first_char <= 0xFC):
self.charset_name = "CP932"
else:
charLen = 1
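The hunk above reports `CP932` instead of `SHIFT_JIS` once a lead byte of `0x87` or `0xFA`-`0xFC` is seen. A self-contained Python 3 sketch of that rule is given below. The function names are hypothetical, and the scan is deliberately simpler than chardet's context analyser: it only walks lead bytes and does not validate trail bytes.

```python
def is_lead_byte(byte):
    """True if `byte` starts a two-byte Shift_JIS sequence."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def guess_sjis_variant(data):
    """Return 'CP932' if a CP932-specific lead byte occurs, else 'SHIFT_JIS'."""
    name = "SHIFT_JIS"
    i = 0
    while i < len(data):
        byte = data[i]
        if is_lead_byte(byte):
            # Lead bytes taken from the diff: 0x87 and 0xFA-0xFC.
            if byte == 0x87 or 0xFA <= byte <= 0xFC:
                name = "CP932"
            i += 2  # skip the trail byte
        else:
            i += 1
    return name
```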
@@ -129,11 +129,11 @@ class Latin1Prober(CharSetProber):
if total < 0.01:
confidence = 0.0
else:
confidence = ((self._mFreqCounter[3] / total)
- (self._mFreqCounter[1] * 20.0 / total))
confidence = ((self._mFreqCounter[3] - self._mFreqCounter[1] * 20.0)
/ total)
if confidence < 0.0:
confidence = 0.0
# lower the confidence of latin1 so that other more accurate
# detector can take priority.
confidence = confidence * 0.5
confidence = confidence * 0.73
return confidence
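In the Latin-1 hunk above, the regrouped expression `(f3 - f1 * 20.0) / total` is algebraically the same as the old `f3 / total - f1 * 20.0 / total`, so the behavioural change is the scale factor moving from 0.5 to 0.73. A standalone Python 3 sketch of the calculation follows. The function name and the `scale` parameter are assumptions for illustration. The counter slots 1 and 3 are used exactly as in the diff.

```python
def latin1_confidence(freq_counter, scale=0.73):
    """Compute a Latin-1 confidence value from a four-slot frequency counter."""
    total = sum(freq_counter)
    if total < 0.01:
        return 0.0
    confidence = (freq_counter[3] - freq_counter[1] * 20.0) / total
    # Clamp negatives to zero, then damp the result so that more
    # specific probers can take priority.
    return max(confidence, 0.0) * scale
```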
+3 -6
@@ -353,7 +353,7 @@ SJIS_cls = (
2,2,2,2,2,2,2,2, # 68 - 6f
2,2,2,2,2,2,2,2, # 70 - 77
2,2,2,2,2,2,2,1, # 78 - 7f
3,3,3,3,3,3,3,3, # 80 - 87
3,3,3,3,3,2,2,3, # 80 - 87
3,3,3,3,3,3,3,3, # 88 - 8f
3,3,3,3,3,3,3,3, # 90 - 97
3,3,3,3,3,3,3,3, # 98 - 9f
@@ -369,9 +369,8 @@ SJIS_cls = (
2,2,2,2,2,2,2,2, # d8 - df
3,3,3,3,3,3,3,3, # e0 - e7
3,3,3,3,3,4,4,4, # e8 - ef
4,4,4,4,4,4,4,4, # f0 - f7
4,4,4,4,4,0,0,0 # f8 - ff
)
3,3,3,3,3,3,3,3, # f0 - f7
3,3,3,3,3,0,0,0) # f8 - ff
SJIS_st = (
@@ -571,5 +570,3 @@ UTF8SMModel = {'classTable': UTF8_cls,
'stateTable': UTF8_st,
'charLenTable': UTF8CharLenTable,
'name': 'UTF-8'}
# flake8: noqa
@@ -47,7 +47,7 @@ class SJISProber(MultiByteCharSetProber):
self._mContextAnalyzer.reset()
def get_charset_name(self):
return "SHIFT_JIS"
return self._mContextAnalyzer.get_charset_name()
def feed(self, aBuf):
aLen = len(aBuf)
@@ -71,9 +71,9 @@ class UniversalDetector:
if not self._mGotData:
# If the data starts with BOM, we know it is UTF
if aBuf[:3] == codecs.BOM:
if aBuf[:3] == codecs.BOM_UTF8:
# EF BB BF UTF-8 with BOM
self.result = {'encoding': "UTF-8", 'confidence': 1.0}
self.result = {'encoding': "UTF-8-SIG", 'confidence': 1.0}
elif aBuf[:4] == codecs.BOM_UTF32_LE:
# FF FE 00 00 UTF-32, little-endian BOM
self.result = {'encoding': "UTF-32LE", 'confidence': 1.0}
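The fix above matters because `codecs.BOM` is the two-byte native UTF-16 mark, so comparing it against a three-byte slice could never match. `codecs.BOM_UTF8` is the three-byte `EF BB BF` sequence. A small Python 3 sketch of BOM sniffing with the standard `codecs` constants is shown here. The `detect_bom` name and the table layout are illustrative, and the UTF-16 and UTF-32BE labels are assumptions beyond what the diff shows.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32LE mark (FF FE 00 00)
# begins with the UTF-16LE mark (FF FE).
_BOMS = (
    (codecs.BOM_UTF8, "UTF-8-SIG"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
)


def detect_bom(data):
    """Return an encoding name if `data` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```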
@@ -8,6 +8,7 @@ __copyright__ = 'Copyright 2013 Antoine Bertin'
import logging
from .exceptions import *
from .mkv import *
from .subtitle import *
logging.getLogger(__name__).addHandler(logging.NullHandler())
+42 -18
@@ -65,30 +65,53 @@ class MKV(object):
continue
if element_name == 'Info':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self.info = Info.fromelement(ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32']))
element = self._load_element(stream, specs, element_position)
self.info = Info.fromelement(element)
elif element_name == 'Tracks':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
tracks = ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32'])
tracks = self._load_element(stream, specs, element_position)
self.video_tracks.extend([VideoTrack.fromelement(t) for t in tracks if t['TrackType'].data == VIDEO_TRACK])
self.audio_tracks.extend([AudioTrack.fromelement(t) for t in tracks if t['TrackType'].data == AUDIO_TRACK])
self.subtitle_tracks.extend([SubtitleTrack.fromelement(t) for t in tracks if t['TrackType'].data == SUBTITLE_TRACK])
elif element_name == 'Chapters':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self.chapters.extend([Chapter.fromelement(c) for c in ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32'])[0] if c.name == 'ChapterAtom'])
element = self._load_element(stream, specs, element_position)
self.chapters.extend([Chapter.fromelement(c) for c in element[0] if c.name == 'ChapterAtom'])
elif element_name == 'Tags':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self.tags.extend([Tag.fromelement(t) for t in ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32'])])
element = self._load_element(stream, specs, element_position)
self.tags.extend([Tag.fromelement(t) for t in element])
elif element_name == 'SeekHead' and self.recurse_seek_head:
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self._parse_seekhead(ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32']), segment, stream, specs)
element = self._load_element(stream, specs, element_position)
self._parse_seekhead(element, segment, stream, specs)
else:
logger.debug('Element %s ignored', element_name)
self._parsed_positions.add(element_position)
def _load_element(self, stream, specs, position):
stream.seek(position)
element = ebml.parse_element(stream, specs)
element.load(stream, specs, ignore_element_names=['Void', 'CRC-32'])
return element
def get_srt_subtitles_track_by_language(self):
"""Get a dictionary of the SRT subtitle tracks, indexed by language"""
subtitles = dict()
for track in self.subtitle_tracks:
logger.info("Found subtitle language %s, with codec %s and lacing %s",
track.language,track.codec_id,track.lacing)
if not track.is_srt():
logger.debug("Ignoring subtitle language %s with codec %s",track.language,track.codec_id)
elif track.lacing:
logger.info("Ignoring subtitle language %s with lacing %s",track.language,track.lacing)
else:
subtitles[track.language] = track
return subtitles
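The method above keeps only subtitle tracks whose codec is `S_TEXT/UTF8` and that do not use lacing, keyed by language. The same filter can be sketched in standalone Python 3 as follows. The `Track` namedtuple and the `srt_tracks_by_language` function are hypothetical stand-ins for enzyme's `SubtitleTrack` objects.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the fields the filter reads.
Track = namedtuple("Track", "language codec_id lacing")


def srt_tracks_by_language(tracks):
    """Map language -> track for SRT tracks that do not use lacing."""
    subtitles = {}
    for track in tracks:
        if track.codec_id != "S_TEXT/UTF8":
            continue  # not an SRT track
        if track.lacing:
            continue  # laced tracks are skipped
        subtitles[track.language] = track
    return subtitles
```

If two qualifying tracks share a language, the later one replaces the earlier one, as in the method above.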
def to_dict(self):
return {'info': self.info.__dict__, 'video_tracks': [t.__dict__ for t in self.video_tracks],
@@ -103,6 +126,7 @@ class Info(object):
"""Object for the Info EBML element"""
def __init__(self, title=None, duration=None, date_utc=None, timecode_scale=None, muxing_app=None, writing_app=None):
self.title = title
self.timecode_scale = timecode_scale
self.duration = timedelta(microseconds=duration * (timecode_scale or 1000000) // 1000) if duration else None
self.date_utc = date_utc
self.muxing_app = muxing_app
@@ -119,7 +143,7 @@ class Info(object):
title = element.get('Title')
duration = element.get('Duration')
date_utc = element.get('DateUTC')
timecode_scale = element.get('TimecodeScale')
timecode_scale = element.get('TimecodeScale',1000000)
muxing_app = element.get('MuxingApp')
writing_app = element.get('WritingApp')
return cls(title, duration, date_utc, timecode_scale, muxing_app, writing_app)
@@ -133,7 +157,7 @@ class Info(object):
class Track(object):
"""Base object for the Tracks EBML element"""
def __init__(self, type=None, number=None, name=None, language=None, enabled=None, default=None, forced=None, lacing=None, # @ReservedAssignment
def __init__(self, type=None, number=None, name=None, language=None, enabled=None, default=None, forced=None, lacing=None,
codec_id=None, codec_name=None):
self.type = type
self.number = number
@@ -154,10 +178,10 @@ class Track(object):
:type element: :class:`~enzyme.parsers.ebml.Element`
"""
type = element.get('TrackType') # @ReservedAssignment
type = element.get('TrackType')
number = element.get('TrackNumber', 0)
name = element.get('Name')
language = element.get('Language')
language = element.get('Language','eng')
enabled = bool(element.get('FlagEnabled', 1))
default = bool(element.get('FlagDefault', 1))
forced = bool(element.get('FlagForced', 0))
@@ -256,8 +280,9 @@ class AudioTrack(Track):
class SubtitleTrack(Track):
"""Object for the Tracks EBML element with :data:`SUBTITLE_TRACK` TrackType"""
pass
def is_srt(self):
return self.codec_id == 'S_TEXT/UTF8'
class Tag(object):
"""Object for the Tag EBML element"""
@@ -344,8 +369,7 @@ class Chapter(object):
if chapterdisplays:
string = chapterdisplays[0].get('ChapString')
language = chapterdisplays[0].get('ChapLanguage')
return cls(start, hidden, enabled, end, string, language)
return cls(start, hidden, enabled, end)
return cls(start, hidden, enabled, end, string, language)
def __repr__(self):
return '<%s [%s, enabled=%s]>' % (self.__class__.__name__, self.start, self.enabled)
@@ -38,8 +38,15 @@ READERS = {
BINARY: read_element_binary
}
class BaseElement(object):
class Element(object):
def __init__(self, id=None, position=None, size=None, data=None):
self.id = id
self.position = position
self.size = size
self.data = data
class Element(BaseElement):
"""Base object of EBML
:param int id: id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
@@ -52,14 +59,11 @@ class Element(object):
:param data: data as read by the corresponding :data:`READERS`
"""
def __init__(self, id=None, type=None, name=None, level=None, position=None, size=None, data=None): # @ReservedAssignment
self.id = id
def __init__(self, id=None, type=None, name=None, level=None, position=None, size=None, data=None):
super(Element, self).__init__(id, position, size, data)
self.type = type
self.name = name
self.level = level
self.position = position
self.size = size
self.data = data
def __repr__(self):
return '<%s [%s, %r]>' % (self.__class__.__name__, self.name, self.data)
@@ -89,7 +93,7 @@ class MasterElement(Element):
Element(DocType, u'matroska')
"""
def __init__(self, id=None, name=None, level=None, position=None, size=None, data=None): # @ReservedAssignment
def __init__(self, id=None, name=None, level=None, position=None, size=None, data=None):
super(MasterElement, self).__init__(id, MASTER, name, level, position, size, data)
def load(self, stream, specs, ignore_element_types=None, ignore_element_names=None, max_level=None):
@@ -137,8 +141,7 @@ class MasterElement(Element):
def __iter__(self):
return iter(self.data)
def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_names=None, max_level=None):
def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_names=None, max_level=None, include_element_names=None):
"""Parse a stream for `size` bytes according to the `specs`
:param stream: file-like object from which to read
@@ -148,6 +151,7 @@ def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_na
:param list ignore_element_types: list of element types to ignore
:param list ignore_element_names: list of element names to ignore
:param int max_level: maximum level of elements
:param list include_element_names: list of element names to include exclusively, so ignoring all other element names
:return: parsed data as a tree of :class:`~enzyme.parsers.ebml.core.Element`
:rtype: list
@@ -158,26 +162,32 @@ def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_na
"""
ignore_element_types = ignore_element_types if ignore_element_types is not None else []
ignore_element_names = ignore_element_names if ignore_element_names is not None else []
include_element_names = include_element_names if include_element_names is not None else []
start = stream.tell()
elements = []
while size is None or stream.tell() - start < size:
try:
element = parse_element(stream, specs)
if element is None:
if element.type is None:
logger.error('Element with id 0x%x is not in the specs' % element_id)
stream.seek(element_size, 1)
continue
logger.debug('%s %s parsed', element.__class__.__name__, element.name)
if element.type in ignore_element_types or element.name in ignore_element_names:
logger.info('%s %s ignored', element.__class__.__name__, element.name)
if element.type == MASTER:
stream.seek(element.size, 1)
elif element.type in ignore_element_types or element.name in ignore_element_names:
logger.info('%s %s %s ignored', element.__class__.__name__, element.name, element.type)
stream.seek(element.size, 1)
continue
if element.type == MASTER:
elif len(include_element_names) > 0 and element.name not in include_element_names:
stream.seek(element.size, 1)
continue
elif element.type == MASTER:
if max_level is not None and element.level >= max_level:
logger.info('Maximum level %d reached for children of %s %s', max_level, element.__class__.__name__, element.name)
stream.seek(element.size, 1)
else:
logger.debug('Loading child elements for %s %s with size %d', element.__class__.__name__, element.name, element.size)
element.data = parse(stream, specs, element.size, ignore_element_types, ignore_element_names, max_level)
element.data = parse(stream, specs, element.size, ignore_element_types, ignore_element_names, max_level,include_element_names)
else:
element.data = READERS[element.type](stream, element.size)
elements.append(element)
except ReadError:
if size is not None:
@@ -186,21 +196,15 @@ def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_na
return elements
def parse_element(stream, specs, load_children=False, ignore_element_types=None, ignore_element_names=None, max_level=None):
def parse_element(stream, specs):
"""Extract a single :class:`Element` from the `stream` according to the `specs`
:param stream: file-like object from which to read
:param dict specs: see :ref:`specs`
:param bool load_children: load children elements if the parsed element is a :class:`MasterElement`
:param list ignore_element_types: list of element types to ignore
:param list ignore_element_names: list of element names to ignore
:param int max_level: maximum level for children elements
:return: the parsed element
:rtype: :class:`Element`
"""
ignore_element_types = ignore_element_types if ignore_element_types is not None else []
ignore_element_names = ignore_element_names if ignore_element_names is not None else []
element_id = read_element_id(stream)
if element_id is None:
raise ReadError('Cannot read element id')
@@ -208,20 +212,14 @@ def parse_element(stream, specs, load_children=False, ignore_element_types=None,
if element_size is None:
raise ReadError('Cannot read element size')
if element_id not in specs:
logger.error('Element with id 0x%x is not in the specs' % element_id)
stream.seek(element_size, 1)
return None
return BaseElement(element_id,stream.tell(),element_size)
element_type, element_name, element_level = specs[element_id]
if element_type == MASTER:
element = MasterElement(element_id, element_name, element_level, stream.tell(), element_size)
if load_children:
element.data = parse(stream, specs, element.size, ignore_element_types, ignore_element_names, max_level)
else:
element = Element(element_id, element_type, element_name, element_level, stream.tell(), element_size)
element.data = READERS[element_type](stream, element_size)
return element
def get_matroska_specs(webm_only=False):
"""Get the Matroska specs
@@ -0,0 +1,185 @@
# -*- coding: utf-8 -*-
from .exceptions import ReadError
from .parsers import ebml
from .mkv import MKV
import logging
import codecs
import os
import io
__all__ = ['Subtitle']
logger = logging.getLogger(__name__)
class Subtitle(object):
"""Subtitle extractor for Matroska Video File.
Currently only SRT subtitles stored without lacing are supported
"""
def __init__(self, stream):
"""Read the available subtitles from a MKV file-like object"""
self._stream = stream
#Use the MKV class to parse the META information
mkv = MKV(stream)
self._timecode_scale = mkv.info.timecode_scale
self._subtitles = mkv.get_srt_subtitles_track_by_language()
def has_subtitle(self, language):
return language in self._subtitles
def write_subtitle_to_stream(self, language):
"""Write a single subtitle to stream or return None if language not available"""
if language in self._subtitles:
subtitle = self._subtitles[language]
logger.info("Writing subtitle for language %s to stream",language)
return _write_track_to_srt_stream(self._stream,subtitle.number,self._timecode_scale)
else:
logger.info("Subtitle for language %s not found",language)
def write_subtitles_to_stream(self):
"""Write all available subtitles as streams to a dictionary with language as the key"""
subtitles = dict()
for language in self._subtitles:
subtitles[language] = self.write_subtitle_to_stream(language)
return subtitles
def _write_track_to_srt_stream(mkv_stream, track, timecode_scale):
srt_stream = io.StringIO()
index = 0
for cluster in _parse_segment(mkv_stream,track):
for blockgroup in cluster.blockgroups:
index = index + 1
timeRange = _print_time_range(timecode_scale,cluster.timecode,blockgroup.block.timecode,blockgroup.duration)
srt_stream.write(str(index) + '\n')
srt_stream.write(timeRange + '\n')
srt_stream.write(codecs.decode(blockgroup.block.data.read(),'utf-8') + '\n')
srt_stream.write('\n')
return srt_stream
def _parse_segment(stream,track):
stream.seek(0)
specs = ebml.get_matroska_specs()
# Find all level 1 Cluster elements and its subelements. Speed up this process by excluding all other currently known level 1 elements
try:
segments = ebml.parse(stream, specs,include_element_names=['Segment','Cluster','BlockGroup','Timecode','Block','BlockDuration',],max_level=3)
except ReadError:
pass
clusters = []
for cluster in segments[0].data:
_parse_cluster(track, clusters, cluster)
return clusters
def _parse_cluster(track, clusters, cluster):
blockgroups = []
timecode = None
for child in cluster.data:
if child.name == 'BlockGroup':
_parse_blockgroup(track, blockgroups, child)
elif child.name == 'Timecode':
timecode = child.data
if len(blockgroups) > 0 and timecode != None:
clusters.append(Cluster(timecode, blockgroups))
def _parse_blockgroup(track, blockgroups, blockgroup):
block = None
duration = None
for child in blockgroup.data:
if child.name == 'Block':
block = Block.fromelement(child)
if block.track != track:
block = None
elif child.name == 'BlockDuration':
duration = child.data
if duration != None and block != None:
blockgroups.append(BlockGroup(block, duration))
def _print_time_range(timecode_scale,clusterTimecode,blockTimecode,duration):
timecode_scale_ms = timecode_scale / 1000000 #Timecode
rawTimecode = clusterTimecode + blockTimecode
startTimeMilleSeconds = (rawTimecode) * timecode_scale_ms
endTimeMilleSeconds = (rawTimecode + duration) * timecode_scale_ms
return _print_time(startTimeMilleSeconds) + " --> " + _print_time(endTimeMilleSeconds)
def _print_time(timeInMilleSeconds):
timeInSeconds, milleSeconds = divmod(timeInMilleSeconds, 1000)
timeInMinutes, seconds = divmod(timeInSeconds, 60)
hours, minutes = divmod(timeInMinutes, 60)
return '%d:%02d:%02d,%d' % (hours,minutes,seconds,milleSeconds)
class Cluster(object):
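def __init__(self,timecode=None, blockgroups=None):
self.timecode = timecode
# Avoid a shared mutable default: create a new list per instance
self.blockgroups = blockgroups if blockgroups is not None else []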
class BlockGroup(object):
def __init__(self,block=None,duration=None):
self.block = block
self.duration = duration
class Block(object):
def __init__(self, track=None, timecode=None, invisible=False, lacing=None, flags=None, data=None):
self.track = track
self.timecode = timecode
self.invisible = invisible
self.lacing = lacing
self.flags = flags
self.data = data
@classmethod
def fromelement(cls,element):
stream = element.data
track = ebml.read_element_size(stream)
timecode = ebml.read_element_integer(stream,2)
flags = ord(stream.read(1))
invisible = bool(flags & 0x8)
if (flags & 0x6) == 0x6:
lacing = 'EBML'
elif (flags & 0x4):
lacing = 'fixed-size'
elif (flags & 0x2):
lacing = 'Xiph'
else:
lacing = None
if lacing:
raise ReadError('Laced blocks are not implemented yet')
data = ebml.read_element_binary(stream, element.size - stream.tell())
return cls(track,timecode,invisible,lacing,flags,data)
def __repr__(self):
return '<%s track=%d, timecode=%d, invisible=%d, lacing=%s>' % (self.__class__.__name__, self.track,self.timecode,self.invisible,self.lacing)
class SimpleBlock(Block):
def __init__(self, track=None, timecode=None, keyframe=False, invisible=False, lacing=None, flags=None, data=None, discardable=False):
super(SimpleBlock,self).__init__(track,timecode,invisible,lacing,flags,data)
self.keyframe = keyframe
self.discardable = discardable
def fromelement(cls,element):
simpleblock = super(SimpleBlock, cls).fromelement(element)
simpleblock.keyframe = bool(simpleblock.flags & 0x80)
simpleblock.discardable = bool(simpleblock.flags & 0x1)
return simpleblock
def __repr__(self):
return '<%s track=%d, timecode=%d, keyframe=%d, invisible=%d, lacing=%s, discardable=%d>' % (self.__class__.__name__, self.track,self.timecode,self.keyframe,self.invisible,self.lacing,self.discardable)
@@ -1,9 +1,11 @@
# -*- coding: utf-8 -*-
from . import test_mkv, test_parsers
from . import test_mkv, test_parsers, test_subtitle
import unittest
suite = unittest.TestSuite([test_mkv.suite(), test_parsers.suite()])
suite = unittest.TestSuite([test_mkv.suite(), test_parsers.suite(), test_subtitle.suite()])
if __name__ == '__main__':
@@ -193,7 +193,7 @@ class MKVTestCase(unittest.TestCase):
self.assertTrue(mkv.audio_tracks[0].type == AUDIO_TRACK)
self.assertTrue(mkv.audio_tracks[0].number == 2)
self.assertTrue(mkv.audio_tracks[0].name is None)
self.assertTrue(mkv.audio_tracks[0].language is None)
self.assertTrue(mkv.audio_tracks[0].language == 'eng')
self.assertTrue(mkv.audio_tracks[0].enabled == True)
self.assertTrue(mkv.audio_tracks[0].default == True)
self.assertTrue(mkv.audio_tracks[0].forced == False)
@@ -276,7 +276,7 @@ class MKVTestCase(unittest.TestCase):
self.assertTrue(mkv.audio_tracks[1].type == AUDIO_TRACK)
self.assertTrue(mkv.audio_tracks[1].number == 10)
self.assertTrue(mkv.audio_tracks[1].name == 'Commentary')
self.assertTrue(mkv.audio_tracks[1].language is None)
self.assertTrue(mkv.audio_tracks[1].language == 'eng')
self.assertTrue(mkv.audio_tracks[1].enabled == True)
self.assertTrue(mkv.audio_tracks[1].default == False)
self.assertTrue(mkv.audio_tracks[1].forced == False)
@@ -292,7 +292,7 @@ class MKVTestCase(unittest.TestCase):
self.assertTrue(mkv.subtitle_tracks[0].type == SUBTITLE_TRACK)
self.assertTrue(mkv.subtitle_tracks[0].number == 3)
self.assertTrue(mkv.subtitle_tracks[0].name is None)
self.assertTrue(mkv.subtitle_tracks[0].language is None)
self.assertTrue(mkv.subtitle_tracks[0].language == 'eng')
self.assertTrue(mkv.subtitle_tracks[0].enabled == True)
self.assertTrue(mkv.subtitle_tracks[0].default == True)
self.assertTrue(mkv.subtitle_tracks[0].forced == False)
@@ -33,7 +33,7 @@ class EBMLTestCase(unittest.TestCase):
self.stream.close()
def check_element(self, element_id, element_type, element_name, element_level, element_position, element_size, element_data, element,
ignore_element_types=None, ignore_element_names=None, max_level=None):
ignore_element_types=None, ignore_element_names=None, max_level=None, include_element_names=None):
"""Recursively check an element"""
# base
self.assertTrue(element.id == element_id)
@@ -53,6 +53,8 @@ class EBMLTestCase(unittest.TestCase):
element_data = [e for e in element_data if e[1] not in ignore_element_types]
if ignore_element_names is not None: # filter validation on element names
element_data = [e for e in element_data if e[2] not in ignore_element_names]
if include_element_names is not None: # filter validation on element names
element_data = [e for e in element_data if e[2] in include_element_names]
if element.level == max_level: # special check when maximum level is reached
self.assertTrue(element.data is None)
return
@@ -60,7 +62,7 @@ class EBMLTestCase(unittest.TestCase):
for i in range(len(element.data)):
self.check_element(element_data[i][0], element_data[i][1], element_data[i][2], element_data[i][3],
element_data[i][4], element_data[i][5], element_data[i][6], element.data[i], ignore_element_types,
ignore_element_names, max_level)
ignore_element_names, max_level,include_element_names)
def test_parse_full(self):
result = ebml.parse(self.stream, self.specs)
@@ -87,6 +89,15 @@ class EBMLTestCase(unittest.TestCase):
self.check_element(self.validation[i][0], self.validation[i][1], self.validation[i][2], self.validation[i][3],
self.validation[i][4], self.validation[i][5], self.validation[i][6], result[i], ignore_element_names=ignore_element_names)
def test_parse_include_element_names(self):
include_element_names = ['Segment','Cluster']
result = ebml.parse(self.stream, self.specs, include_element_names=include_element_names)
self.validation = [e for e in self.validation if e[2] in include_element_names]
self.assertTrue(len(result) == len(self.validation))
for i in range(len(self.validation)):
self.check_element(self.validation[i][0], self.validation[i][1], self.validation[i][2], self.validation[i][3],
self.validation[i][4], self.validation[i][5], self.validation[i][6], result[i], include_element_names=include_element_names)
def test_parse_max_level(self):
max_level = 3
result = ebml.parse(self.stream, self.specs, max_level=max_level)
@@ -0,0 +1,86 @@
# -*- coding: utf-8 -*-
from enzyme.subtitle import Subtitle, _print_time_range, _print_time
import unittest
import os
import io
import requests
import zipfile
import glob
# Test directory
TEST_DIR = os.path.join(os.path.dirname(__file__), os.path.splitext(__file__)[0])
def setUpModule():
if not os.path.exists(TEST_DIR):
r = requests.get('http://downloads.sourceforge.net/project/matroska/test_files/matroska_test_w1_1.zip')
with zipfile.ZipFile(io.BytesIO(r.content), 'r') as f:
f.extractall(TEST_DIR)
class SubtitleTestCase(unittest.TestCase):
@classmethod
def setUpClass(cls):
file = 'test5.mkv'
stream = io.open(os.path.join(TEST_DIR, file), 'rb')
cls.subtitle = Subtitle(stream)
def test_subtitles_found(self):
subtitles = self.subtitle._subtitles
self.assertTrue('eng' in subtitles)
self.assertTrue('hun' in subtitles)
self.assertTrue('ger' in subtitles)
self.assertTrue('fre' in subtitles)
self.assertTrue('spa' in subtitles)
self.assertTrue('ita' in subtitles)
self.assertTrue('jpn' in subtitles)
self.assertTrue('und' in subtitles)
def test_write_subtitle_to_stream(self):
subtitle_stream = self.subtitle.write_subtitle_to_stream("eng")
self.assertIsInstance(subtitle_stream,io.StringIO,"Expecting a StringIO stream")
def test_write_subtitles_to_stream(self):
subtitle_streams = self.subtitle.write_subtitles_to_stream()
self.assertIn("eng", subtitle_streams, "Expecting a subtitle stream for language eng")
self.assertIsInstance(subtitle_streams["eng"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("hun", subtitle_streams, "Expecting a subtitle stream for language hun")
self.assertIsInstance(subtitle_streams["hun"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("ger", subtitle_streams, "Expecting a subtitle stream for language ger")
self.assertIsInstance(subtitle_streams["ger"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("fre", subtitle_streams, "Expecting a subtitle stream for language fre")
self.assertIsInstance(subtitle_streams["fre"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("spa", subtitle_streams, "Expecting a subtitle stream for language spa")
self.assertIsInstance(subtitle_streams["spa"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("ita", subtitle_streams, "Expecting a subtitle stream for language ita")
self.assertIsInstance(subtitle_streams["ita"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("jpn", subtitle_streams, "Expecting a subtitle stream for language jpn")
self.assertIsInstance(subtitle_streams["jpn"],io.StringIO,"Expecting a StringIO stream")
def test_print_time(self):
self.assertEqual('0:00:00,0',_print_time(0))
self.assertEqual('0:00:00,1',_print_time(1))
self.assertEqual('0:00:00,999',_print_time(999))
self.assertEqual('0:00:01,0',_print_time(1000))
self.assertEqual('0:00:59,999',_print_time(1000*60-1))
self.assertEqual('0:01:00,0',_print_time(1000*60))
self.assertEqual('0:59:59,999',_print_time(1000*60*60-1))
self.assertEqual('1:00:00,0',_print_time(1000*60*60))
def test_print_time_range(self):
self.assertEqual('0:00:00,0 --> 0:00:00,0',_print_time_range(1000000,0,0,0))
self.assertEqual('0:01:00,0 --> 0:01:01,0',_print_time_range(1000000,0,60000,1000))
def suite():
suite = unittest.TestSuite()
suite.addTest(unittest.TestLoader().loadTestsFromTestCase(SubtitleTestCase))
return suite
if __name__ == '__main__':
unittest.TextTestRunner().run(suite())
@@ -0,0 +1,165 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.
"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:
a) under this License, provided that you make a good faith effort to
ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or
b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:
a) Give prominent notice with each copy of the object code that the
Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the object code with a copy of the GNU GPL and this license
document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.
e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:
a) Accompany the combined library with a copy of the same work based
on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.
b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.
@@ -0,0 +1,227 @@
GuessIt
=======
.. image:: http://img.shields.io/pypi/v/guessit.svg
:target: https://pypi.python.org/pypi/guessit
:alt: Latest Version
.. image:: http://img.shields.io/badge/license-LGPLv3-blue.svg
:target: https://pypi.python.org/pypi/guessit
:alt: License
.. image:: http://img.shields.io/travis/wackou/guessit.svg?branch=master
:target: http://travis-ci.org/wackou/guessit
:alt: Build Status
.. image:: http://img.shields.io/coveralls/wackou/guessit.svg?branch=master
:target: https://coveralls.io/r/wackou/guessit
:alt: Coveralls
`HuBoard <https://huboard.com/wackou/guessit>`_
GuessIt is a Python library that extracts as much information as
possible from a video file.
It has a very powerful filename matcher that lets you guess a lot of
metadata about a video from its filename alone. This matcher works with
both movies and TV show episodes.
For example, GuessIt can do the following::
$ guessit "Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi"
For: Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi
GuessIt found: {
[1.00] "mimetype": "video/x-msvideo",
[0.80] "episodeNumber": 3,
[0.80] "videoCodec": "XviD",
[1.00] "container": "avi",
[1.00] "format": "HDTV",
[0.70] "series": "Treme",
[0.50] "title": "Right Place, Wrong Time",
[0.80] "releaseGroup": "NoTV",
[0.80] "season": 1,
[1.00] "type": "episode"
}
Install
-------
Installing GuessIt is simple with `pip <http://www.pip-installer.org/>`_::
$ pip install guessit
or, with `easy_install <http://pypi.python.org/pypi/setuptools>`_::
$ easy_install guessit
But, you really `shouldn't do that <http://stackoverflow.com/questions/3220404/why-use-pip-over-easy-install>`_.
You can now launch a demo::
$ guessit -d
and guess your own filename::
$ guessit "Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv"
For: Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv
GuessIt found: {
[1.00] "mimetype": "video/x-matroska",
[1.00] "episodeNumber": 8,
[0.30] "container": "mkv",
[1.00] "format": "BluRay",
[0.70] "series": "Breaking Bad",
[1.00] "releaseGroup": "KoTuWa",
[1.00] "screenSize": "720p",
[1.00] "season": 5,
[1.00] "type": "episode"
}
Filename matcher
----------------
The filename matcher is based on pattern matching and is able to recognize many properties from the filename,
like ``title``, ``year``, ``series``, ``episodeNumber``, ``seasonNumber``,
``videoCodec``, ``screenSize``, ``language``. Guessed values are cleaned up and given in a readable format
which may not match exactly the raw filename.
The full list of available properties can be seen in the
`main documentation <http://guessit.readthedocs.org/en/latest/user/properties.html>`_.
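As a rough illustration of pattern-based matching, the sketch below pulls a season and an episode number out of a release name using only the standard ``re`` module. It is not GuessIt's implementation: the function name ``guess_season_episode`` and the two patterns are assumptions made for this example, and the real matcher recognizes many more properties and attaches a confidence to each.

```python
import re

# Hypothetical patterns for illustration only; GuessIt's real matcher
# covers far more cases and assigns a confidence to each guess.
_PATTERNS = [
    re.compile(r'[Ss](?P<season>\d{1,2})[Ee](?P<episode>\d{1,3})'),        # S05E08
    re.compile(r'(?<!\d)(?P<season>\d{1,2})x(?P<episode>\d{1,3})(?!\d)'),  # 1x03
]


def guess_season_episode(filename):
    """Return a dict with 'season' and 'episodeNumber', or an empty dict."""
    for pattern in _PATTERNS:
        match = pattern.search(filename)
        if match:
            return {'season': int(match.group('season')),
                    'episodeNumber': int(match.group('episode'))}
    return {}


print(guess_season_episode('Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi'))
print(guess_season_episode('Breaking.Bad.S05E08.720p.MP4.BDRip.[KoTuWa].mkv'))
```

The patterns are tried in order and the first hit wins, which keeps the sketch short but is much cruder than weighing several competing guesses.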
Other features
--------------
GuessIt also allows you to compute a whole lot of hashes from a file,
namely all the ones you can find in the hashlib Python module (md5,
sha1, ...), but also the Media Player Classic hash that is used (amongst
others) by OpenSubtitles and SMPlayer, as well as the ed2k hash.
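For the ``hashlib`` part, a minimal sketch of hashing a file once with several algorithms could look like the following. The helper name ``compute_hashes`` and its parameters are assumptions for this example, not part of GuessIt's API, and the Media Player Classic and ed2k hashes are not covered here.

```python
import hashlib
import io


def compute_hashes(stream, algorithms=('md5', 'sha1'), chunk_size=64 * 1024):
    """Read a binary stream once and return {algorithm: hex digest}."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # Read in fixed-size chunks so large video files are never loaded whole.
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# An in-memory stream stands in for a video file opened with open(path, 'rb').
print(compute_hashes(io.BytesIO(b'hello world')))
```

Feeding every hasher from the same chunk means the file is read only once, however many digests are requested.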
If you have the 'guess-language' Python package installed, GuessIt can also
analyze a subtitle file's contents and detect which language it is written in.
If you have the 'enzyme' Python package installed, GuessIt can also detect the
properties from the actual video file metadata.
Usage
-----
GuessIt can be used from the command line::
$ guessit
usage: guessit [-h] [-t TYPE] [-n] [-c] [-X DISABLED_TRANSFORMERS] [-v]
[-P SHOW_PROPERTY] [-u] [-a] [-y] [-f INPUT_FILE] [-d] [-p]
[-V] [-s] [--version] [-b] [-i INFO] [-S EXPECTED_SERIES]
[-T EXPECTED_TITLE] [-Y] [-D] [-L ALLOWED_LANGUAGES] [-E]
[-C ALLOWED_COUNTRIES] [-G EXPECTED_GROUP]
[filename [filename ...]]
positional arguments:
filename Filename or release name to guess
optional arguments:
-h, --help show this help message and exit
Naming:
-t TYPE, --type TYPE The suggested file type: movie, episode. If undefined,
type will be guessed.
-n, --name-only Parse files as name only. Disable folder parsing,
extension parsing, and file content analysis.
-c, --split-camel Split camel case part of filename.
-X DISABLED_TRANSFORMERS, --disabled-transformer DISABLED_TRANSFORMERS
Transformer to disable (can be used multiple time)
-S EXPECTED_SERIES, --expected-series EXPECTED_SERIES
Expected series to parse (can be used multiple times)
-T EXPECTED_TITLE, --expected-title EXPECTED_TITLE
Expected title (can be used multiple times)
-Y, --date-year-first
If short date is found, consider the first digits as
the year.
-D, --date-day-first If short date is found, consider the second digits as
the day.
-L ALLOWED_LANGUAGES, --allowed-languages ALLOWED_LANGUAGES
Allowed language (can be used multiple times)
-E, --episode-prefer-number
Guess "serie.213.avi" as the episodeNumber 213.
Without this option, it will be guessed as season 2,
episodeNumber 13
-C ALLOWED_COUNTRIES, --allowed-country ALLOWED_COUNTRIES
Allowed country (can be used multiple times)
-G EXPECTED_GROUP, --expected-group EXPECTED_GROUP
Expected release group (can be used multiple times)
Output:
-v, --verbose Display debug output
-P SHOW_PROPERTY, --show-property SHOW_PROPERTY
Display the value of a single property (title, series,
videoCodec, year, type ...)
-u, --unidentified Display the unidentified parts.
-a, --advanced Display advanced information for filename guesses, as
json output
-y, --yaml Display information for filename guesses as yaml
output (like unit-test)
-f INPUT_FILE, --input-file INPUT_FILE
Read filenames from an input file.
-d, --demo Run a few builtin tests instead of analyzing a file
Information:
-p, --properties Display properties that can be guessed.
-V, --values Display property values that can be guessed.
-s, --transformers Display transformers that can be used.
--version Display the guessit version.
guessit.io:
-b, --bug Submit a wrong detection to the guessit.io service
Other features:
-i INFO, --info INFO The desired information type: filename, video,
hash_mpc or a hash from python's hashlib module, such
as hash_md5, hash_sha1, ...; or a list of any of them,
comma-separated
It can also be used as a python module::
>>> from guessit import guess_file_info
>>> guess_file_info('Treme.1x03.Right.Place,.Wrong.Time.HDTV.XviD-NoTV.avi')
{u'mimetype': 'video/x-msvideo', u'episodeNumber': 3, u'videoCodec': u'XviD', u'container': u'avi', u'format': u'HDTV', u'series': u'Treme', u'title': u'Right Place, Wrong Time', u'releaseGroup': u'NoTV', u'season': 1, u'type': u'episode'}
Support
-------
The project website for GuessIt is hosted at `ReadTheDocs <http://guessit.readthedocs.org/>`_.
There you will also find the User guide and Developer documentation.
This project is hosted on GitHub: `<https://github.com/wackou/guessit>`_
Please report issues and/or feature requests via the `bug tracker <https://github.com/wackou/guessit/issues>`_.
You can also report issues using the command-line tool::
$ guessit --bug "filename.that.fails.avi"
Contribute
----------
GuessIt is under active development, and contributions are more than welcome!
#. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
There is a Contributor Friendly tag for issues that should be ideal for people who are not very
familiar with the codebase yet.
#. Fork `the repository`_ on GitHub to start making your changes to the **master**
branch (or branch off of it).
#. Write a test which shows that the bug was fixed or that the feature works as expected.
#. Send a pull request and bug the maintainer until it gets merged and published. :)
.. _the repository: https://github.com/wackou/guessit
License
-------
GuessIt is licensed under the `LGPLv3 license <http://www.gnu.org/licenses/lgpl.html>`_.
+15 -3
@@ -89,10 +89,14 @@ from guessit.guess import Guess, smart_merge
from guessit.language import Language
from guessit.matcher import IterativeMatcher
from guessit.textutils import clean_default, is_camel, from_camel
from copy import deepcopy
import babelfish
import os.path
import logging
from copy import deepcopy
from guessit.options import get_opts
import shlex
# Needed for guessit.plugins.transformers.reload() to be called.
from guessit.plugins import transformers
log = logging.getLogger(__name__)
@@ -117,7 +121,7 @@ def _build_filename_mtree(filename, options=None, **kwargs):
mtree = IterativeMatcher(filename, options=options, **kwargs)
second_pass_options = mtree.second_pass_options
if second_pass_options:
log.debug("Running 2nd pass")
log.debug('Running 2nd pass with options: %s' % second_pass_options)
merged_options = dict(options)
merged_options.update(second_pass_options)
mtree = IterativeMatcher(filename, options=merged_options, **kwargs)
@@ -271,8 +275,16 @@ def guess_file_info(filename, info=None, options=None, **kwargs):
"""
info = info or 'filename'
options = options or {}
if isinstance(options, base_text_type):
args = shlex.split(options)
options = vars(get_opts().parse_args(args))
if default_options:
merged_options = deepcopy(default_options)
if isinstance(default_options, base_text_type):
default_args = shlex.split(default_options)
merged_options = vars(get_opts().parse_args(default_args))
else:
merged_options = deepcopy(default_options)
merged_options.update(options)
options = merged_options
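The hunk above lets ``options`` (and ``default_options``) be given as a command-line style string, which is tokenized with ``shlex.split`` and parsed into a dict. A self-contained sketch of that idea follows; ``build_parser`` is a hypothetical stand-in for ``guessit.options.get_opts()`` with only two flags, and the check uses Python 3's ``str`` where the diff uses ``base_text_type``.

```python
import argparse
import shlex


def build_parser():
    # Hypothetical stand-in for guessit.options.get_opts(); only two flags.
    parser = argparse.ArgumentParser()
    parser.add_argument('-t', '--type', dest='type', default=None)
    parser.add_argument('-n', '--name-only', dest='name_only',
                        action='store_true')
    return parser


def normalize_options(options):
    """Accept either a dict or a command-line style string of options."""
    if isinstance(options, str):
        # shlex.split honours shell quoting, e.g. -T "Some Title".
        return vars(build_parser().parse_args(shlex.split(options)))
    return dict(options or {})
```

With this shape, callers can pass the same option string they would type on the command line, which appears to be how the YAML test cases further down use their ``options: -n`` entries.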
@@ -181,16 +181,16 @@ def submit_bug(filename, options):
opts = dict((k, v) for k, v in options.__dict__.items()
if v and k != 'submit_bug')
r = requests.post('http://localhost:5000/bugs', {'filename': filename,
r = requests.post('http://guessit.io/bugs', {'filename': filename,
'version': __version__,
'options': str(opts)})
if r.status_code == 200:
print('Successfully submitted file: %s' % r.text)
else:
print('Could not submit bug at the moment, please try again later.')
print('Could not submit bug at the moment, please try again later: %s %s' % (r.status_code, r.reason))
except RequestException as e:
print('Could not submit bug at the moment, please try again later.')
print('Could not submit bug at the moment, please try again later: %s' % e)
def main(args=None, setup_logging=True):
@@ -17,4 +17,4 @@
# You should have received a copy of the Lesser GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
__version__ = '0.10.4.dev0'
__version__ = '0.11.1.dev0'
@@ -135,8 +135,14 @@ class SameKeyValidator(object):
self.validator_function = validator_function
def validate(self, prop, string, node, match, entry_start, entry_end):
path_nodes = [path_node for path_node in node.ancestors if path_node.category == 'path']
if path_nodes:
path_node = path_nodes[0]
else:
path_node = node.root
for key in prop.keys:
for same_value_leaf in node.root.leaves_containing(key):
for same_value_leaf in path_node.leaves_containing(key):
ret = self.validator_function(same_value_leaf, key, prop, string, node, match, entry_start, entry_end)
if ret is not None:
return ret
@@ -144,6 +150,9 @@ class SameKeyValidator(object):
class OnlyOneValidator(SameKeyValidator):
"""
Check that there's only one occurrence of key for current directory
"""
def __init__(self):
super(OnlyOneValidator, self).__init__(lambda same_value_leaf, key, prop, string, node, match, entry_start, entry_end: False)
@@ -153,12 +162,16 @@ class DefaultValidator(object):
def validate(self, prop, string, node, match, entry_start, entry_end):
span = _get_span(prop, match)
span = _trim_span(span, string[span[0]:span[1]])
return DefaultValidator.validate_string(string, span, entry_start, entry_end)
@staticmethod
def validate_string(string, span, entry_start=None, entry_end=None):
start, end = span
sep_start = start <= 0 or string[start - 1] in sep
sep_end = end >= len(string) or string[end] in sep
start_by_other = start in entry_end
end_by_other = end in entry_start
start_by_other = start in entry_end if entry_end else False
end_by_other = end in entry_start if entry_start else False
if (sep_start or start_by_other) and (sep_end or end_by_other):
return True
return False
@@ -235,6 +248,13 @@ class NeighborValidator(DefaultValidator):
return False
class FullMatchValidator(DefaultValidator):
"""Make sure the node match fully"""
def validate(self, prop, string, node, match, entry_start, entry_end):
at_start, at_end = _get_positions(prop, string, node, match, entry_start, entry_end)
return at_start and at_end
class LeavesValidator(DefaultValidator):
def __init__(self, lambdas=None, previous_lambdas=None, next_lambdas=None, both_side=False, default_=True):
@@ -290,7 +310,7 @@ class LeavesValidator(DefaultValidator):
class _Property:
"""Represents a property configuration."""
def __init__(self, keys=None, pattern=None, canonical_form=None, canonical_from_pattern=True, confidence=1.0, enhance=True, global_span=False, validator=DefaultValidator(), formatter=None, disabler=None, confidence_lambda=None):
def __init__(self, keys=None, pattern=None, canonical_form=None, canonical_from_pattern=True, confidence=1.0, enhance=True, global_span=False, validator=DefaultValidator(), formatter=None, disabler=None, confidence_lambda=None, remove_duplicates=False):
"""
:param keys: Keys of the property (format, screenSize, ...)
:type keys: string
@@ -309,6 +329,8 @@ class _Property:
:type validator: :class:`DefaultValidator`
:param formatter: Formatter to use
:type formatter: function
:param remove_duplicates: Keep only the last match if multiple values are found
:type remove_duplicates: bool
"""
if isinstance(keys, list):
self.keys = keys
@@ -335,6 +357,7 @@ class _Property:
self.validator = validator
self.formatter = formatter
self.disabler = disabler
self.remove_duplicates = remove_duplicates
def disabled(self, options):
if self.disabler:
@@ -479,7 +502,8 @@ class PropertiesContainer(object):
entries.append((prop, match))
else:
matches = list(prop.compiled.finditer(string))
duplicate_matches[prop] = matches
if prop.remove_duplicates:
duplicate_matches[prop] = matches
for match in matches:
entries.append((prop, match))
@@ -490,6 +514,9 @@ class PropertiesContainer(object):
if computed_confidence is not None:
prop.confidence = computed_confidence
entries.sort(key=lambda entry: -entry[0].confidence)
# sort entries, from most confident to less confident
if validate:
# compute entries start and ends
for prop, match in entries:
@@ -531,7 +558,7 @@ class PropertiesContainer(object):
del entry_end[end]
for prop, prop_duplicate_matches in duplicate_matches.items():
# Keeping the last valid match.
# Keeping the last valid match only.
# Needed for the.100.109.hdtv-lol.mp4
for duplicate_match in prop_duplicate_matches[:-1]:
entries.remove((prop, duplicate_match))
@@ -561,8 +588,8 @@ class PropertiesContainer(object):
for prop, match in key_entries:
start, end = _get_span(prop, match)
if not best_prop or \
best_prop.confidence < best_prop.confidence or \
best_prop.confidence == best_prop.confidence and \
best_prop.confidence < prop.confidence or \
best_prop.confidence == prop.confidence and \
best_match.span()[1] - best_match.span()[0] < match.span()[1] - match.span()[0]:
best_prop, best_match = prop, match
+10 -10
@@ -287,10 +287,10 @@ def choose_int(g1, g2):
if v1 == v2:
return v1, 1 - (1 - c1) * (1 - c2)
else:
if c1 > c2:
return v1, c1 - c2
if c1 >= c2:
return v1, c1 - c2 / 2
else:
return v2, c2 - c1
return v2, c2 - c1 / 2
def choose_string(g1, g2):
@@ -308,7 +308,7 @@ def choose_string(g1, g2):
prepended to it.
>>> s(choose_string(('Hello', 0.75), ('World', 0.5)))
('Hello', 0.25)
('Hello', 0.5)
>>> s(choose_string(('Hello', 0.5), ('hello', 0.5)))
('Hello', 0.75)
@@ -354,10 +354,10 @@ def choose_string(g1, g2):
# in case of conflict, return the one with highest confidence
else:
if c1 > c2:
return v1, c1 - c2
if c1 >= c2:
return v1, c1 - c2 / 2
else:
return v2, c2 - c1
return v2, c2 - c1 / 2
def _merge_similar_guesses_nocheck(guesses, prop, choose):
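The confidence arithmetic changed by the two hunks above can be summarized in a small sketch. The function name ``choose`` is an assumption; the real ``choose_string`` also handles further cases that are omitted here. Only the agreement and conflict branches shown in the diff are reproduced.

```python
def choose(g1, g2):
    """Pick between two (value, confidence) guesses."""
    (v1, c1), (v2, c2) = g1, g2
    if v1 == v2:
        # Agreement reinforces confidence.
        return v1, 1 - (1 - c1) * (1 - c2)
    # Conflict: keep the more confident value, reduced by half of the
    # confidence of the losing guess (previously the full amount).
    if c1 >= c2:
        return v1, c1 - c2 / 2
    return v2, c2 - c1 / 2
```

This is consistent with the updated doctest: ``('Hello', 0.75)`` against ``('World', 0.5)`` now yields a confidence of 0.5 instead of 0.25.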
@@ -474,8 +474,8 @@ def merge_all(guesses, append=None):
# delete very unlikely values
for p in list(result.keys()):
if result.confidence(p) < 0.05:
del result[p]
if result.confidence(p) < 0.05:
del result[p]
# make sure our appendable properties contain unique values
for prop in append:
@@ -509,7 +509,7 @@ def smart_merge(guesses):
for string_part in ('title', 'series', 'container', 'format',
'releaseGroup', 'website', 'audioCodec',
'videoCodec', 'screenSize', 'episodeFormat',
'audioChannels', 'idNumber'):
'audioChannels', 'idNumber', 'container'):
merge_similar_guesses(guesses, string_part, choose_string)
# 2- merge the rest, potentially discarding information not properly
@@ -173,8 +173,9 @@ LNG_COMMON_WORDS = frozenset([
'is', 'it', 'am', 'mad', 'men', 'man', 'run', 'sin', 'st', 'to',
'no', 'non', 'war', 'min', 'new', 'car', 'day', 'bad', 'bat', 'fan',
'fry', 'cop', 'zen', 'gay', 'fat', 'one', 'cherokee', 'got', 'an', 'as',
'cat', 'her', 'be', 'hat', 'sun', 'may', 'my', 'mr', 'rum', 'pi', 'bb', 'bt',
'tv', 'aw', 'by', 'md', 'mp', 'cd', 'lt', 'gt', 'in', 'ad', 'ice', 'ay', 'at',
'cat', 'her', 'be', 'hat', 'sun', 'may', 'my', 'mr', 'rum', 'pi', 'bb',
'bt', 'tv', 'aw', 'by', 'md', 'mp', 'cd', 'lt', 'gt', 'in', 'ad', 'ice',
'ay', 'at', 'star', 'so',
# french words
'bas', 'de', 'le', 'son', 'ne', 'ca', 'ce', 'et', 'que',
'mal', 'est', 'vol', 'or', 'mon', 'se', 'je', 'tu', 'me',
@@ -185,7 +186,7 @@ LNG_COMMON_WORDS = frozenset([
'la', 'el', 'del', 'por', 'mar', 'al',
# other
'ind', 'arw', 'ts', 'ii', 'bin', 'chan', 'ss', 'san', 'oss', 'iii',
'vi', 'ben', 'da', 'lt', 'ch',
'vi', 'ben', 'da', 'lt', 'ch', 'sr', 'ps', 'cx',
# new from babelfish
'mkv', 'avi', 'dmd', 'the', 'dis', 'cut', 'stv', 'des', 'dia', 'and',
'cab', 'sub', 'mia', 'rim', 'las', 'une', 'par', 'srt', 'ano', 'toy',
@@ -197,7 +198,7 @@ LNG_COMMON_WORDS = frozenset([
'bs', # Bosnian
'kz',
# countries
'gt', 'lt',
'gt', 'lt', 'im',
# part/pt
'pt'
])
@@ -206,9 +207,11 @@ LNG_COMMON_WORDS_STRICT = frozenset(['brazil'])
subtitle_prefixes = ['sub', 'subs', 'st', 'vost', 'subforced', 'fansub', 'hardsub']
subtitle_suffixes = ['subforced', 'fansub', 'hardsub']
subtitle_suffixes = ['subforced', 'fansub', 'hardsub', 'sub', 'subs']
lang_prefixes = ['true']
all_lang_prefixes_suffixes = subtitle_prefixes + subtitle_suffixes + lang_prefixes
def find_possible_languages(string, allowed_languages=None):
"""Find possible languages in the string
@@ -239,7 +242,7 @@ def find_possible_languages(string, allowed_languages=None):
for prefix in lang_prefixes:
if lang_word.startswith(prefix):
lang_word = lang_word[len(prefix):]
if lang_word not in common_words:
if lang_word not in common_words and word.lower() not in common_words:
try:
lang = Language.fromguessit(lang_word)
if allowed_languages:
+73 -67
@@ -215,94 +215,100 @@ def log_found_guess(guess, logger=None):
(k, v, guess.raw(k), guess.confidence(k)))
def _get_split_spans(node, span):
partition_spans = node.get_partition_spans(span)
for to_remove_span in partition_spans:
if to_remove_span[0] == span[0] and to_remove_span[1] in [span[1], span[1] + 1]:
partition_spans.remove(to_remove_span)
break
return partition_spans
class GuessFinder(object):
def __init__(self, guess_func, confidence=None, logger=None, options=None):
self.guess_func = guess_func
self.confidence = confidence
self.logger = logger or log
self.options = options
self.options = options or {}
def process_nodes(self, nodes):
for node in nodes:
self.process_node(node)
def process_node(self, node, iterative=True, partial_span=None):
def process_node(self, node, iterative=True, partial_span=None, skip_nodes=True):
if skip_nodes and not isinstance(skip_nodes, list):
skip_nodes = self.options.get('skip_nodes')
elif not isinstance(skip_nodes, list):
skip_nodes = []
if partial_span:
value = node.value[partial_span[0]:partial_span[1]]
else:
value = node.value
string = ' %s ' % value # add sentinels
if not self.options:
matcher_result = self.guess_func(string, node)
matcher_result = self.guess_func(string, node, self.options)
if not matcher_result:
return
if not isinstance(matcher_result, Guess):
result, span = matcher_result
else:
matcher_result = self.guess_func(string, node, self.options)
result, span = matcher_result, matcher_result.metadata().span
#log.error('span2 %s' % (span,))
if matcher_result:
if not isinstance(matcher_result, Guess):
result, span = matcher_result
else:
result, span = matcher_result, matcher_result.metadata().span
if not result:
return
if result:
# readjust span to compensate for sentinels
span = (span[0] - 1, span[1] - 1)
if span[1] == len(string):
# somehow, the sentinel got included in the span. Remove it
span = (span[0], span[1] - 1)
# readjust span to compensate for partial_span
if partial_span:
span = (span[0] + partial_span[0], span[1] + partial_span[0])
# readjust span to compensate for sentinels
span = (span[0] - 1, span[1] - 1)
partition_spans = None
if self.options and 'skip_nodes' in self.options:
skip_nodes = self.options.get('skip_nodes')
for skip_node in skip_nodes:
if skip_node.parent.node_idx == node.node_idx[:len(skip_node.parent.node_idx)] and\
skip_node.span == span or\
skip_node.span == (span[0] + skip_node.offset, span[1] + skip_node.offset):
if partition_spans is None:
partition_spans = _get_split_spans(node, skip_node.span)
else:
new_partition_spans = []
for partition_span in partition_spans:
tmp_node = MatchTree(value, span=partition_span, parent=node)
tmp_partitions_spans = _get_split_spans(tmp_node, skip_node.span)
new_partition_spans.extend(tmp_partitions_spans)
partition_spans.extend(new_partition_spans)
# readjust span to compensate for partial_span
if partial_span:
span = (span[0] + partial_span[0], span[1] + partial_span[0])
if not partition_spans:
# restore sentinels compensation
if skip_nodes:
skip_nodes = [skip_node for skip_node in self.options.get('skip_nodes') if skip_node.parent.span[0] == node.span[0] or skip_node.parent.span[1] == node.span[1]]
# if we guessed a node that we need to skip, recurse down the tree and ignore that node
indices = set()
skip_nodes_spans = []
next_skip_nodes = []
for skip_node in skip_nodes:
skip_for_next = False
skip_nodes_spans.append(skip_node.span)
if node.offset <= skip_node.span[0] <= node.span[1]:
indices.add(skip_node.span[0] - node.offset)
skip_for_next = True
if node.offset <= skip_node.span[1] <= node.span[1]:
indices.add(skip_node.span[1] - node.offset)
skip_for_next = True
if not skip_for_next:
next_skip_nodes.append(skip_node)
if indices:
partition_spans = [s for s in node.get_partition_spans(indices) if s not in skip_nodes_spans]
for partition_span in partition_spans:
relative_span = (partition_span[0] - node.offset, partition_span[1] - node.offset)
self.process_node(node, partial_span=relative_span, skip_nodes=next_skip_nodes)
return
if isinstance(result, Guess):
guess = result
else:
guess = Guess(result, confidence=self.confidence, input=string, span=span)
# restore sentinels compensation
if isinstance(result, Guess):
guess = result
else:
no_sentinel_string = string[1:-1]
guess = Guess(result, confidence=self.confidence, input=no_sentinel_string, span=span)
if not iterative:
found_guess(node, guess, logger=self.logger)
else:
absolute_span = (span[0] + node.offset, span[1] + node.offset)
node.partition(span)
found_child = None
for child in node.children:
if child.span == absolute_span:
# if we have a match on one of our children, mark it as such...
found_guess(child, guess, logger=self.logger)
found_child = child
break
# ...and only then recurse on the other children
for child in node.children:
if child is not found_child:
self.process_node(child)
if not iterative:
found_guess(node, guess, logger=self.logger)
else:
absolute_span = (span[0] + node.offset, span[1] + node.offset)
node.partition(span)
if node.is_leaf():
found_guess(node, guess, logger=self.logger)
else:
found_child = None
for child in node.children:
if child.span == absolute_span:
found_guess(child, guess, logger=self.logger)
found_child = child
break
for child in node.children:
if child is not found_child:
self.process_node(child)
else:
for partition_span in partition_spans:
self.process_node(node, partial_span=partition_span)
+82 -18
@@ -27,9 +27,7 @@ import guessit # @UnusedImport needed for doctests
from guessit import UnicodeMixin, base_text_type
from guessit.textutils import clean_default, str_fill
from guessit.patterns import group_delimiters
from guessit.guess import (smart_merge,
Guess)
from guessit.guess import smart_merge, Guess
log = logging.getLogger(__name__)
@@ -75,7 +73,7 @@ class BaseMatchTree(UnicodeMixin):
(as shown by the ``f``'s on the last-but-one line).
"""
def __init__(self, string='', span=None, parent=None, clean_function=None):
def __init__(self, string='', span=None, parent=None, clean_function=None, category=None):
self.string = string
self.span = span or (0, len(string))
self.parent = parent
@@ -83,6 +81,7 @@ class BaseMatchTree(UnicodeMixin):
self.guess = Guess()
self._clean_value = None
self._clean_function = clean_function or clean_default
self.category = category
@property
def value(self):
@@ -116,6 +115,32 @@ class BaseMatchTree(UnicodeMixin):
return result
@property
def raw(self):
result = {}
for guess in self.guesses:
for k in guess.keys():
result[k] = guess.raw(k)
return result
@property
def guesses(self):
"""
List all guesses, including children ones.
:return: list of guesses objects
"""
result = []
if self.guess:
result.append(self.guess)
for c in self.children:
result.extend(c.guesses)
return result
@property
def root(self):
"""Return the root node of the tree."""
@@ -124,6 +149,23 @@ class BaseMatchTree(UnicodeMixin):
return self.parent.root
@property
def ancestors(self):
"""
Retrieve all ancestors, from this node to root node.
:return: a list of MatchTree objects
"""
ret = [self]
if not self.parent:
return ret
parent_ancestors = self.parent.ancestors
ret.extend(parent_ancestors)
return ret
@property
def depth(self):
"""Return the depth of this node."""
@@ -136,17 +178,30 @@ class BaseMatchTree(UnicodeMixin):
"""Return whether this node is a leaf or not."""
return self.children == []
def add_child(self, span):
"""Add a new child node to this node with the given span."""
child = MatchTree(self.string, span=span, parent=self, clean_function=self._clean_function)
def add_child(self, span, category=None):
"""Add a new child node to this node with the given span.
:param span: span of the new MatchTree
:param category: category of the new MatchTree
:return: A new MatchTree instance having self as a parent
"""
child = MatchTree(self.string, span=span, parent=self, clean_function=self._clean_function, category=category)
self.children.append(child)
return child
def get_partition_spans(self, indices):
"""Return the list of absolute spans for the regions of the original
string defined by splitting this node at the given indices (relative
to this node)"""
to this node)
:param indices: indices of the partition spans
:return: a list of tuple of the spans
"""
indices = sorted(indices)
if indices[-1] > len(self.value):
log.error('Filename: {}'.format(self.string))
log.error('Invalid call to get_partitions_spans, indices are too high: {}, len({}) == {:d}'
.format(indices, self.value, len(self.value)))
if indices[0] != 0:
indices.insert(0, 0)
if indices[-1] != len(self.value):
@@ -155,23 +210,33 @@ class BaseMatchTree(UnicodeMixin):
spans = []
for start, end in zip(indices[:-1], indices[1:]):
spans.append((self.offset + start,
self.offset + end))
self.offset + end))
return spans
def partition(self, indices):
def partition(self, indices, category=None):
"""Partition this node by splitting it at the given indices,
relative to this node."""
for partition_span in self.get_partition_spans(indices):
self.add_child(span=partition_span)
relative to this node.
def split_on_components(self, components):
:param indices: indices of the partition spans
:param category: category of the new MatchTree
:return: a list of created MatchTree instances
"""
created = []
for partition_span in self.get_partition_spans(indices):
created.append(self.add_child(span=partition_span, category=category))
return created
def split_on_components(self, components, category=None):
offset = 0
created = []
for c in components:
start = self.value.find(c, offset)
end = start + len(c)
self.add_child(span=(self.offset + start,
self.offset + end))
created.append(self.add_child(span=(self.offset + start,
self.offset + end), category=category))
offset = end
return created
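The span arithmetic behind ``get_partition_spans`` and ``partition`` above can be sketched as a standalone function. The name ``partition_spans`` and its signature are assumptions for illustration; unlike the method in the diff, this version deduplicates indices and does not log out-of-range values.

```python
def partition_spans(length, indices, offset=0):
    """Split [0, length) at the given relative indices.

    Returns absolute spans, shifted by ``offset``.
    """
    # Always include both ends so the spans cover the whole node.
    cuts = sorted(set(indices) | {0, length})
    return [(offset + start, offset + end)
            for start, end in zip(cuts[:-1], cuts[1:])]
```

For a 10-character node, cutting at 3 and 7 gives three child spans; passing the node's offset turns them into positions in the original string.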
def nodes_at_depth(self, depth):
"""Return all the nodes at a given depth in the tree"""
@@ -208,7 +273,7 @@ class BaseMatchTree(UnicodeMixin):
raise ValueError('Non-existent node index: %s' % (idx,))
def nodes(self):
"""Return all the nodes and subnodes in this tree."""
"""Return a generator of all nodes and subnodes in this tree."""
yield self
for child in self.children:
for node in child.nodes():
@@ -220,7 +285,6 @@ class BaseMatchTree(UnicodeMixin):
yield self
else:
for child in self.children:
# pylint: disable=W0212
for leaf in child.leaves():
yield leaf
@@ -29,4 +29,4 @@ info_exts = ['nfo']
video_exts = ['3g2', '3gp', '3gp2', 'asf', 'avi', 'divx', 'flv', 'm4v', 'mk2',
'mka', 'mkv', 'mov', 'mp4', 'mp4a', 'mpeg', 'mpg', 'ogg', 'ogm',
'ogv', 'qt', 'ra', 'ram', 'rm', 'ts', 'wav', 'webm', 'wma', 'wmv',
'iso']
'iso', 'vob']
@@ -0,0 +1,80 @@
import re
from guessit.patterns import sep, build_or_pattern
from guessit.patterns.numeral import parse_numeral
range_separators = ['-', 'to', 'a']
discrete_separators = ['&', 'and', 'et']
excluded_separators = ['.'] # Dot cannot serve as a discrete_separator
discrete_sep = sep
for range_separator in range_separators:
discrete_sep = discrete_sep.replace(range_separator, '')
for excluded_separator in excluded_separators:
discrete_sep = discrete_sep.replace(excluded_separator, '')
discrete_separators.append(discrete_sep)
all_separators = list(range_separators)
all_separators.extend(discrete_separators)
range_separators_re = re.compile(build_or_pattern(range_separators), re.IGNORECASE)
discrete_separators_re = re.compile(build_or_pattern(discrete_separators), re.IGNORECASE)
all_separators_re = re.compile(build_or_pattern(all_separators), re.IGNORECASE)
def list_parser(value, property_list_name, discrete_separators_re=discrete_separators_re, range_separators_re=range_separators_re, allow_discrete=False, fill_gaps=False):
discrete_elements = filter(lambda x: x != '', discrete_separators_re.split(value))
discrete_elements = [x.strip() for x in discrete_elements]
proper_discrete_elements = []
i = 0
while i < len(discrete_elements):
if i < len(discrete_elements) - 2 and range_separators_re.match(discrete_elements[i+1]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i+1] + discrete_elements[i+2])
i += 3
else:
match = range_separators_re.search(discrete_elements[i])
if match and match.start() == 0:
proper_discrete_elements[i - 1] += discrete_elements[i]
elif match and match.end() == len(discrete_elements[i]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i + 1])
else:
proper_discrete_elements.append(discrete_elements[i])
i += 1
discrete_elements = proper_discrete_elements
ret = []
for discrete_element in discrete_elements:
range_values = filter(lambda x: x != '', range_separators_re.split(discrete_element))
range_values = [x.strip() for x in range_values]
if len(range_values) > 1:
for x in range(0, len(range_values) - 1):
start_range_ep = parse_numeral(range_values[x])
end_range_ep = parse_numeral(range_values[x+1])
for range_ep in range(start_range_ep, end_range_ep + 1):
if range_ep not in ret:
ret.append(range_ep)
else:
discrete_value = parse_numeral(discrete_element)
if discrete_value not in ret:
ret.append(discrete_value)
if len(ret) > 1:
if not allow_discrete:
valid_ret = list()
# replace discrete elements by ranges
valid_ret.append(ret[0])
for i in range(0, len(ret) - 1):
previous = valid_ret[len(valid_ret) - 1]
if ret[i+1] < previous:
pass
else:
valid_ret.append(ret[i+1])
ret = valid_ret
if fill_gaps:
ret = list(range(min(ret), max(ret) + 1))
if len(ret) > 1:
return {None: ret[0], property_list_name: ret}
if len(ret) > 0:
return ret[0]
return None
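A much simplified sketch of what ``list_parser`` above does for plain integers: split on discrete separators, then expand each range. The function name ``expand_episode_list`` and the exact separator set are assumptions; the original additionally supports numerals via ``parse_numeral``, the ``allow_discrete`` and ``fill_gaps`` switches, and a dict return value.

```python
import re

# Assumed separator sets for this sketch (a subset of those in the diff).
DISCRETE_RE = re.compile(r'\s*(?:&|\+|\band\b|\bet\b)\s*', re.IGNORECASE)
RANGE_RE = re.compile(r'\s*(?:-|\bto\b)\s*', re.IGNORECASE)


def expand_episode_list(value):
    """Expand a string such as '1-3 & 5' into [1, 2, 3, 5]."""
    result = []
    for part in DISCRETE_RE.split(value.strip()):
        if not part:
            continue
        bounds = [int(x) for x in RANGE_RE.split(part) if x]
        # A single number is treated as a range of length one.
        for number in range(bounds[0], bounds[-1] + 1):
            if number not in result:
                result.append(number)
    return result
```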
@@ -19,11 +19,14 @@
#
from __future__ import absolute_import, division, print_function, unicode_literals
from functools import wraps
import logging
import sys
import os
log = logging.getLogger(__name__)
GREEN_FONT = "\x1B[0;32m"
YELLOW_FONT = "\x1B[0;33m"
BLUE_FONT = "\x1B[0;34m"
@@ -87,3 +90,27 @@ def setup_logging(colored=True, with_time=False, with_thread=False, filename=Non
ch.setFormatter(SimpleFormatter(with_time, with_thread))
logging.getLogger().addHandler(ch)
def trace_func_call(f):
@wraps(f)
def wrapper(*args, **kwargs):
is_method = (f.__name__ != f.__qualname__) # method is still not bound, we need to get around it
if is_method:
no_self_args = args[1:]
else:
no_self_args = args
args_str = ', '.join(repr(arg) for arg in no_self_args)
kwargs_str = ', '.join('{}={}'.format(k, v) for k, v in kwargs.items())
if not args_str:
args_str = kwargs_str
elif not kwargs_str:
args_str = args_str
else:
args_str = '{}, {}'.format(args_str, kwargs_str)
log.debug('Calling {}({})'.format(f.__name__, args_str))
return f(*args, **kwargs)
return wrapper
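Note that ``trace_func_call`` above relies on ``f.__qualname__``, which exists only on Python 3.3 and later. A version-independent sketch of the same idea is shown below; the name ``trace_calls`` is an assumption, and unlike the original it does not strip ``self`` from the logged arguments.

```python
import functools
import logging

log = logging.getLogger(__name__)


def trace_calls(func):
    """Log each call with its arguments at DEBUG level, then run the function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        parts = [repr(a) for a in args]
        parts += ['{}={!r}'.format(k, v) for k, v in kwargs.items()]
        # Lazy %-formatting: the message is only built if DEBUG is enabled.
        log.debug('Calling %s(%s)', func.__name__, ', '.join(parts))
        return func(*args, **kwargs)
    return wrapper
```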
@@ -525,3 +525,29 @@
screenSize: 720p
season: 5
series: Game of Thrones
? Parks and Recreation - [04x12] - Ad Campaign.avi
: type: episode
series: Parks and Recreation
season: 4
episodeNumber: 12
title: Ad Campaign
? Star Trek Into Darkness (2013)/star.trek.into.darkness.2013.720p.web-dl.h264-publichd.mkv
: type: movie
title: Star Trek Into Darkness
year: 2013
screenSize: 720p
format: WEB-DL
videoCodec: h264
releaseGroup: PublicHD
? /var/medias/series/The Originals/Season 02/The.Originals.S02E15.720p.HDTV.X264-DIMENSION.mkv
: type: episode
series: The Originals
season: 2
episodeNumber: 15
screenSize: 720p
format: HDTV
videoCodec: h264
releaseGroup: DIMENSION
@@ -282,12 +282,6 @@
episodeNumber: 1
title: The Impossible Astronaut
? Parks and Recreation - [04x12] - Ad Campaign.avi
: series: Parks and Recreation
season: 4
episodeNumber: 12
title: Ad Campaign
? The Sopranos - [05x07] - In Camelot.mp4
: series: The Sopranos
season: 5
@@ -635,7 +629,7 @@
format: HDTV
releaseGroup: lol
? 03-Criminal.Minds.5x03.Reckoner.ENG.-.sub.FR.HDTV.XviD-STi.[tvu.org.ru].avi
? Criminal.Minds.5x03.Reckoner.ENG.-.sub.FR.HDTV.XviD-STi.[tvu.org.ru].avi
: series: Criminal Minds
language: English
subtitleLanguage: French
@@ -1186,3 +1180,684 @@
videoCodec: h264
releaseGroup: BS
format: WEB-DL
? How to Make It in America - S02E06 - I'm Sorry, Who's Yosi?.mkv
: series: How to Make It in America
season: 2
episodeNumber: 6
title: I'm Sorry, Who's Yosi?
? 24.S05E07.FRENCH.DVDRip.XviD-FiXi0N.avi
: episodeNumber: 7
format: DVD
language: fr
season: 5
series: '24'
videoCodec: XviD
releaseGroup: FiXi0N
? 12.Monkeys.S01E12.FRENCH.BDRip.x264-VENUE.mkv
: episodeNumber: 12
format: BluRay
language: fr
releaseGroup: VENUE
season: 1
series: 12 Monkeys
videoCodec: h264
? The.Daily.Show.2015.07.01.Kirsten.Gillibrand.Extended.720p.CC.WEBRip.AAC2.0.x264-BTW.mkv
: audioChannels: '2.0'
audioCodec: AAC
date: 2015-07-01
format: WEBRip
other: CC
releaseGroup: BTW
screenSize: 720p
series: The Daily Show
title: Kirsten Gillibrand Extended
videoCodec: h264
? The.Daily.Show.2015.07.02.Sarah.Vowell.CC.WEBRip.AAC2.0.x264-BTW.mkv
: audioChannels: '2.0'
audioCodec: AAC
date: 2015-07-02
format: WEBRip
other: CC
releaseGroup: BTW
series: The Daily Show
title: Sarah Vowell
videoCodec: h264
? 90.Day.Fiance.S02E07.I.Have.To.Tell.You.Something.720p.HDTV.x264-W4F
: options: -n
episodeNumber: 7
format: HDTV
screenSize: 720p
season: 2
series: 90 Day Fiance
title: I Have To Tell You Something
? Doctor.Who.2005.S04E06.FRENCH.LD.DVDRip.XviD-TRACKS.avi
: episodeNumber: 6
format: DVD
language: fr
releaseGroup: TRACKS
season: 4
series: Doctor Who
other: LD
videoCodec: XviD
year: 2005
? Astro.Le.Petit.Robot.S01E01+02.FRENCH.DVDRiP.X264.INT-BOOLZ.mkv
: episodeNumber: 1
episodeList: [1, 2]
format: DVD
language: fr
releaseGroup: INT-BOOLZ
season: 1
series: Astro Le Petit Robot
videoCodec: h264
? Annika.Bengtzon.2012.E01.Le.Testament.De.Nobel.FRENCH.DVDRiP.XViD-STVFRV.avi
: episodeNumber: 1
format: DVD
language: fr
releaseGroup: STVFRV
series: Annika Bengtzon
title: Le Testament De Nobel
videoCodec: XviD
year: 2012
? Dead.Set.02.FRENCH.LD.DVDRip.XviD-EPZ.avi
: episodeNumber: 2
format: DVD
language: fr
other: LD
releaseGroup: EPZ
series: Dead Set
videoCodec: XviD
? Phineas and Ferb S01E00 & S01E01 & S01E02
: options: -n
episodeList:
- 0
- 1
- 2
episodeNumber: 0
season: 1
series: Phineas and Ferb
? Show.Name.S01E02.S01E03.HDTV.XViD.Etc-Group
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - S01E02 - S01E03 - S01E04 - Ep Name
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
season: 1
series: Show Name
title: Ep Name
? Show.Name.1x02.1x03.HDTV.XViD.Etc-Group
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - 1x02 - 1x03 - 1x04 - Ep Name
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
season: 1
series: Show Name
title: Ep Name
? Show.Name.S01E02.HDTV.XViD.Etc-Group
: options: -n
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - S01E02 - My Ep Name
: options: -n
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show Name - S01.E03 - My Ep Name
: options: -n
episodeNumber: 3
season: 1
series: Show Name
title: My Ep Name
? Show.Name.S01E02E03.HDTV.XViD.Etc-Group
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - S01E02-03 - My Ep Name
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show.Name.S01.E02.E03
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
season: 1
series: Show Name
? Show_Name.1x02.HDTV_XViD_Etc-Group
: options: -n
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - 1x02 - My Ep Name
: options: -n
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show_Name.1x02x03x04.HDTV_XViD_Etc-Group
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - 1x02-03-04 - My Ep Name
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show.Name.100.Event.2010.11.23.HDTV.XViD.Etc-Group
: options: -n
date: 2010-11-23
episodeNumber: 100
format: HDTV
releaseGroup: Etc-Group
series: Show Name
title: Event
videoCodec: XviD
? Show.Name.2010.11.23.HDTV.XViD.Etc-Group
: options: -n
date: 2010-11-23
format: HDTV
releaseGroup: Etc-Group
series: Show Name
? Show Name - 2010-11-23 - Ep Name
: options: -n
date: 2010-11-23
series: Show Name
title: Ep Name
? Show Name Season 1 Episode 2 Ep Name
: options: -n
episodeNumber: 2
season: 1
series: Show Name
title: Ep Name
? Show.Name.S01.HDTV.XViD.Etc-Group
: options: -n
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show.Name.E02-03
: options: -n
episodeNumber: 2
episodeList:
- 2
- 3
series: Show Name
? Show.Name.E02.2010
: options: -n
episodeNumber: 2
year: 2010
series: Show Name
? Show.Name.E23.Test
: options: -n
episodeNumber: 23
series: Show Name
title: Test
? Show.Name.Part.3.HDTV.XViD.Etc-Group
: options: -n -t episode
part: 3
series: Show Name
format: HDTV
videoCodec: XviD
releaseGroup: Etc-Group
? Show.Name.Part.1.and.Part.2.Blah-Group
: options: -n -t episode
part: 1
partList:
- 1
- 2
series: Show Name
? Show Name - 01 - Ep Name
: options: -n
episodeNumber: 1
series: Show Name
title: Ep Name
? 01 - Ep Name
: options: -n
episodeNumber: 1
series: Ep Name
? Show.Name.102.HDTV.XViD.Etc-Group
: options: -n
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? '[HorribleSubs] Maria the Virgin Witch - 01 [720p].mkv'
: episodeNumber: 1
releaseGroup: HorribleSubs
screenSize: 720p
series: Maria the Virgin Witch
? '[ISLAND]One_Piece_679_[VOSTFR]_[V1]_[8bit]_[720p]_[EB7838FC].mp4'
: options: -E
crc32: EB7838FC
episodeNumber: 679
releaseGroup: ISLAND
screenSize: 720p
series: One Piece
subtitleLanguage: fr
videoProfile: 8bit
version: 1
? '[ISLAND]One_Piece_679_[VOSTFR]_[8bit]_[720p]_[EB7838FC].mp4'
: options: -E
crc32: EB7838FC
episodeNumber: 679
releaseGroup: ISLAND
screenSize: 720p
series: One Piece
subtitleLanguage: fr
videoProfile: 8bit
? '[Kaerizaki-Fansub]_One_Piece_679_[VOSTFR][HD_1280x720].mp4'
: options: -E
episodeNumber: 679
other: HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: One Piece
subtitleLanguage: fr
? '[Kaerizaki-Fansub]_One_Piece_679_[VOSTFR][FANSUB][HD_1280x720].mp4'
: options: -E
episodeNumber: 679
other:
- Fansub
- HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: One Piece
subtitleLanguage: fr
? '[Kaerizaki-Fansub]_One_Piece_681_[VOSTFR][HD_1280x720]_V2.mp4'
: options: -E
episodeNumber: 681
other: HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: One Piece
subtitleLanguage: fr
version: 2
? '[Kaerizaki-Fansub] High School DxD New 04 VOSTFR HD (1280x720) V2.mp4'
: options: -E
episodeNumber: 4
other: HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: High School DxD New
subtitleLanguage: fr
version: 2
? '[Kaerizaki-Fansub] One Piece 603 VOSTFR PS VITA (960x544) V2.mp4'
: options: -E
episodeNumber: 603
releaseGroup: Kaerizaki-Fansub
screenSize: 960x544
series: One Piece
subtitleLanguage: fr
version: 2
? '[Group Name] Show Name.13'
: options: -n
episodeNumber: 13
releaseGroup: Group Name
series: Show Name
? '[Group Name] Show Name - 13'
: options: -n
episodeNumber: 13
releaseGroup: Group Name
series: Show Name
? '[Group Name] Show Name 13'
: options: -n
episodeNumber: 13
releaseGroup: Group Name
series: Show Name
# [Group Name] Show Name.13-14
# [Group Name] Show Name - 13-14
# Show Name 13-14
? '[Stratos-Subs]_Infinite_Stratos_-_12_(1280x720_H.264_AAC)_[379759DB]'
: options: -n
audioCodec: AAC
crc32: 379759DB
episodeNumber: 12
releaseGroup: Stratos-Subs
screenSize: 720p
series: Infinite Stratos
videoCodec: h264
# [ShinBunBu-Subs] Bleach - 02-03 (CX 1280x720 x264 AAC)
? '[SGKK] Bleach 312v1 [720p/MKV]'
: options: -n
episodeNumber: 312
releaseGroup: SGKK
screenSize: 720p
series: Bleach
version: 1
? '[Ayako]_Infinite_Stratos_-_IS_-_07_[H264][720p][EB7838FC]'
: options: -n
crc32: EB7838FC
episodeNumber: 7
releaseGroup: Ayako
screenSize: 720p
series: Infinite Stratos
videoCodec: h264
? '[Ayako] Infinite Stratos - IS - 07v2 [H264][720p][44419534]'
: options: -n
crc32: '44419534'
episodeNumber: 7
releaseGroup: Ayako
screenSize: 720p
series: Infinite Stratos
videoCodec: h264
version: 2
? '[Ayako-Shikkaku] Oniichan no Koto Nanka Zenzen Suki Janain Dakara ne - 10 [LQ][h264][720p] [8853B21C]'
: options: -n
crc32: 8853B21C
episodeNumber: 10
releaseGroup: Ayako-Shikkaku
screenSize: 720p
series: Oniichan no Koto Nanka Zenzen Suki Janain Dakara ne
videoCodec: h264
# Add support for absolute episodes
? Bleach - s16e03-04 - 313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach.s16e03-04.313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach.s16e03-04.313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach - 313-314
: options: -En
episodeList:
- 313
- 314
episodeNumber: 313
series: Bleach
? Bleach - s16e03-04 - 313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach.s16e03-04.313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach s16e03e04 313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? '[ShinBunBu-Subs] Bleach - 02-03 (CX 1280x720 x264 AAC)'
: audioCodec: AAC
episodeList:
- 2
- 3
episodeNumber: 2
releaseGroup: ShinBunBu-Subs
screenSize: 720p
series: Bleach
videoCodec: h264
? 003. Show Name - Ep Name.ext
: episodeNumber: 3
series: Show Name
title: Ep Name
? 003-004. Show Name - Ep Name.ext
: episodeList:
- 3
- 4
episodeNumber: 3
series: Show Name
title: Ep Name
? One Piece - 102
: options: -n -t episode
episodeNumber: 2
season: 1
series: One Piece
? "[ACX]_Wolf's_Spirit_001.mkv"
: episodeNumber: 1
releaseGroup: ACX
series: "Wolf's Spirit"
? Project.Runway.S14E00.and.S14E01.(Eng.Subs).SDTV.x264-[2Maverick].mp4
: episodeList:
- 0
- 1
episodeNumber: 0
format: TV
releaseGroup: 2Maverick
season: 14
series: Project Runway
subtitleLanguage: en
videoCodec: h264
? '[Hatsuyuki-Kaitou]_Fairy_Tail_2_-_16-20_[720p][10bit].torrent'
: episodeList:
- 16
- 17
- 18
- 19
- 20
episodeNumber: 16
releaseGroup: Hatsuyuki-Kaitou
screenSize: 720p
series: Fairy Tail 2
videoProfile: 10bit
? '[Hatsuyuki-Kaitou]_Fairy_Tail_2_-_16-20_(191-195)_[720p][10bit].torrent'
: options: -E
episodeList:
- 16
- 17
- 18
- 19
- 20
episodeNumber: 16
releaseGroup: Hatsuyuki-Kaitou
screenSize: 720p
series: Fairy Tail 2
? "Looney Tunes 1940x01 Porky's Last Stand.mkv"
: episodeNumber: 1
season: 1940
series: Looney Tunes
title: Porky's Last Stand
year: 1940
? The.Good.Wife.S06E01.E10.720p.WEB-DL.DD5.1.H.264-CtrlHD/The.Good.Wife.S06E09.Trust.Issues.720p.WEB-DL.DD5.1.H.264-CtrlHD.mkv
: audioChannels: '5.1'
audioCodec: DolbyDigital
episodeList:
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
episodeNumber: 9
format: WEB-DL
releaseGroup: CtrlHD
screenSize: 720p
season: 6
series: The Good Wife
title: Trust Issues
videoCodec: h264
? Fear the Walking Dead - 01x02 - So Close, Yet So Far.REPACK-KILLERS.French.C.updated.Addic7ed.com.mkv
: episodeNumber: 2
language: fr
other: Proper
properCount: 1
season: 1
series: Fear the Walking Dead
title: So Close, Yet So Far
? Fear the Walking Dead - 01x02 - En Close, Yet En Far.REPACK-KILLERS.French.C.updated.Addic7ed.com.mkv
: episodeNumber: 2
language: fr
other: Proper
properCount: 1
season: 1
series: Fear the Walking Dead
title: En Close, Yet En Far
? /av/unsorted/The.Daily.Show.2015.07.22.Jake.Gyllenhaal.720p.HDTV.x264-BATV.mkv
: date: 2015-07-22
format: HDTV
releaseGroup: BATV
screenSize: 720p
series: The Daily Show
title: Jake Gyllenhaal
videoCodec: h264
@@ -22,7 +22,6 @@ from __future__ import absolute_import, division, print_function, unicode_litera
from collections import defaultdict
from unittest import TestCase, TestLoader
import shlex
import logging
import os
import sys
@@ -86,10 +85,6 @@ class TestGuessit(TestCase):
options = required_fields.pop('options') if 'options' in required_fields else None
if options:
args = shlex.split(options)
options = get_opts().parse_args(args)
options = vars(options)
try:
found = guess_func(filename, options)
except Exception as e:
@@ -606,7 +606,9 @@
? Yves.Saint.Laurent.2013.FRENCH.DVDSCR.MD.XviD-ViVARiUM.avi
: format: DVD
language: French
other: Screener
other:
- MD
- Screener
releaseGroup: ViVARiUM
title: Yves Saint Laurent
videoCodec: XviD
@@ -759,3 +761,19 @@
screenSize: 1080p
title: transformers 2
videoCodec: h265
? 1.Angry.Man.1957.mkv
: title: 1 Angry Man
year: 1957
? 12.Angry.Men.1957.mkv
: title: 12 Angry Men
year: 1957
? 123.Angry.Men.1957.mkv
: title: 123 Angry Men
year: 1957
? "Looney Tunes 1444x866 Porky's Last Stand.mkv"
: screenSize: 1444x866
title: Looney Tunes
@@ -31,10 +31,12 @@ keywords = yaml.load("""
? Xvid PROPER
: videoCodec: Xvid
other: PROPER
properCount: 1
? PROPER-Xvid
: videoCodec: Xvid
other: PROPER
properCount: 1
""")
@@ -19,6 +19,7 @@
#
from __future__ import absolute_import, division, print_function, unicode_literals
from guessit.containers import DefaultValidator
from guessit.plugins.transformers import Transformer
from guessit.matcher import GuessFinder
@@ -41,10 +42,9 @@ class GuessDate(Transformer):
@staticmethod
def guess_date(string, node=None, options=None):
date, span = search_date(string, options.get('date_year_first') if options else False, options.get('date_day_first') if options else False)
if date:
if date and span and DefaultValidator.validate_string(string, span): # ensure we have a separator before and after date
return {'date': date}, span
else:
return None, None
return None, None
def process(self, mtree, options=None):
GuessFinder(self.guess_date, 1.0, self.log, options).process_nodes(mtree.unidentified_leaves())
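Review note: the `guess_date` change above accepts a date only when the matched span has a separator (or a string edge) on both sides. The sketch below illustrates that kind of boundary check. The helper name `has_separator_boundaries` and the `SEPARATORS` set are assumptions made for illustration; this is not guessit's `DefaultValidator`, whose implementation does not appear in this diff.

```python
# Hypothetical stand-in for a separator-boundary check; not guessit's DefaultValidator.
SEPARATORS = set(" ._-+/\\[](){}")  # assumed separator characters


def has_separator_boundaries(string, span):
    """Return True if the characters just outside span are separators or string edges."""
    start, end = span
    before_ok = start == 0 or string[start - 1] in SEPARATORS
    after_ok = end == len(string) or string[end] in SEPARATORS
    return before_ok and after_ok


name = "The.Daily.Show.2015.07.22.Jake.Gyllenhaal"
start = name.index("2015.07.22")
delimited = has_separator_boundaries(name, (start, start + 10))  # date between dots
glued = has_separator_boundaries("x20150722y", (1, 9))           # date glued to letters
print(delimited, glued)
```

A check like this would keep digit runs embedded in longer tokens from being read as dates, which appears to be the intent of the added `span` validation.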
@@ -24,6 +24,8 @@ from guessit.plugins.transformers import Transformer, get_transformer
from guessit.textutils import reorder_title
from guessit.matcher import found_property
from guessit.patterns.list import all_separators
from guessit.language import all_lang_prefixes_suffixes
class GuessEpisodeInfoFromPosition(Transformer):
@@ -33,39 +35,49 @@ class GuessEpisodeInfoFromPosition(Transformer):
def supported_properties(self):
return ['title', 'series']
def match_from_epnum_position(self, mtree, node, options):
epnum_idx = node.node_idx
@staticmethod
def excluded_word(*values):
for value in values:
if value.clean_value.lower() in (all_separators + all_lang_prefixes_suffixes):
return True
return False
def match_from_epnum_position(self, path_node, ep_node, options):
epnum_idx = ep_node.node_idx
# a few helper functions to be able to filter using high-level semantics
def before_epnum_in_same_pathgroup():
return [leaf for leaf in mtree.unidentified_leaves(lambda x: len(x.clean_value) > 1)
return [leaf for leaf in path_node.unidentified_leaves(lambda x: len(x.clean_value) > 1)
if (leaf.node_idx[0] == epnum_idx[0] and
leaf.node_idx[1:] < epnum_idx[1:])]
leaf.node_idx[1:] < epnum_idx[1:] and
not GuessEpisodeInfoFromPosition.excluded_word(leaf))]
def after_epnum_in_same_pathgroup():
return [leaf for leaf in mtree.unidentified_leaves(lambda x: len(x.clean_value) > 1)
return [leaf for leaf in path_node.unidentified_leaves(lambda x: len(x.clean_value) > 1)
if (leaf.node_idx[0] == epnum_idx[0] and
leaf.node_idx[1:] > epnum_idx[1:])]
leaf.node_idx[1:] > epnum_idx[1:] and
not GuessEpisodeInfoFromPosition.excluded_word(leaf))]
def after_epnum_in_same_explicitgroup():
return [leaf for leaf in mtree.unidentified_leaves(lambda x: len(x.clean_value) > 1)
return [leaf for leaf in path_node.unidentified_leaves(lambda x: len(x.clean_value) > 1)
if (leaf.node_idx[:2] == epnum_idx[:2] and
leaf.node_idx[2:] > epnum_idx[2:])]
leaf.node_idx[2:] > epnum_idx[2:] and
not GuessEpisodeInfoFromPosition.excluded_word(leaf))]
# epnumber is the first group and there are only 2 after it in same
# path group
# -> series title - episode title
title_candidates = self._filter_candidates(after_epnum_in_same_pathgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_pathgroup(), options)
if ('title' not in mtree.info and # no title
'series' in mtree.info and # series present
if ('title' not in path_node.info and # no title
'series' in path_node.info and # series present
before_epnum_in_same_pathgroup() == [] and # no groups before
len(title_candidates) == 1): # only 1 group after
found_property(title_candidates[0], 'title', confidence=0.4)
return
if ('title' not in mtree.info and # no title
if ('title' not in path_node.info and # no title
before_epnum_in_same_pathgroup() == [] and # no groups before
len(title_candidates) == 2): # only 2 groups after
@@ -77,17 +89,17 @@ class GuessEpisodeInfoFromPosition(Transformer):
# probably the series name
series_candidates = before_epnum_in_same_pathgroup()
if len(series_candidates) >= 1:
found_property(series_candidates[0], 'series', confidence=0.7)
found_property(series_candidates[0], 'series', confidence=0.7)
# only 1 group after (in the same path group) and it's probably the
# episode title.
title_candidates = self._filter_candidates(after_epnum_in_same_pathgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_pathgroup(), options)
if len(title_candidates) == 1:
found_property(title_candidates[0], 'title', confidence=0.5)
return
else:
# try in the same explicit group, with lower confidence
title_candidates = self._filter_candidates(after_epnum_in_same_explicitgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_explicitgroup(), options)
if len(title_candidates) == 1:
found_property(title_candidates[0], 'title', confidence=0.4)
return
@@ -96,7 +108,7 @@ class GuessEpisodeInfoFromPosition(Transformer):
return
# get the one with the longest value
title_candidates = self._filter_candidates(after_epnum_in_same_pathgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_pathgroup(), options)
if title_candidates:
maxidx = -1
maxv = -1
@@ -104,7 +116,8 @@ class GuessEpisodeInfoFromPosition(Transformer):
if len(c.clean_value) > maxv:
maxidx = i
maxv = len(c.clean_value)
found_property(title_candidates[maxidx], 'title', confidence=0.3)
if maxidx > -1:
found_property(title_candidates[maxidx], 'title', confidence=0.3)
def should_process(self, mtree, options=None):
options = options or {}
@@ -114,9 +127,9 @@ class GuessEpisodeInfoFromPosition(Transformer):
def _filter_candidates(candidates, options):
episode_details_transformer = get_transformer('guess_episode_details')
if episode_details_transformer:
return [n for n in candidates if not episode_details_transformer.container.find_properties(n.value, n, options, re_match=True)]
else:
return candidates
candidates = [n for n in candidates if not episode_details_transformer.container.find_properties(n.value, n, options, re_match=True)]
candidates = list(filter(lambda n: not GuessEpisodeInfoFromPosition.excluded_word(n), candidates))
return candidates
def process(self, mtree, options=None):
"""
@@ -128,15 +141,26 @@ class GuessEpisodeInfoFromPosition(Transformer):
if not eps:
eps = [node for node in mtree.leaves() if 'date' in node.guess]
eps = sorted(eps, key=lambda ep: -ep.guess.confidence())
if eps:
self.match_from_epnum_position(mtree, eps[0], options)
performed_path_nodes = []
for ep_node in eps:
# Process only the first episode node for each path node
path_node = [node for node in ep_node.ancestors if node.category == 'path']
if len(path_node) > 0:
path_node = path_node[0]
else:
path_node = ep_node.root
if path_node not in performed_path_nodes:
self.match_from_epnum_position(path_node, ep_node, options)
performed_path_nodes.append(path_node)
else:
# if we don't have the episode number, but at least 2 groups in the
# basename, then it's probably series - eptitle
basename = mtree.node_at((-2,))
basename = list(filter(lambda x: x.category == 'path', mtree.nodes()))[-2]
title_candidates = self._filter_candidates(basename.unidentified_leaves(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(basename.unidentified_leaves(), options)
if len(title_candidates) >= 2 and 'series' not in mtree.info:
found_property(title_candidates[0], 'series', confidence=0.4)
@@ -147,12 +171,13 @@ class GuessEpisodeInfoFromPosition(Transformer):
# if we only have 1 remaining valid group in the folder containing the
# file, then it's likely that it is the series name
path_nodes = list(filter(lambda x: x.category == 'path', mtree.nodes()))
try:
series_candidates = list(mtree.node_at((-3,)).unidentified_leaves())
except ValueError:
series_candidates = list(path_nodes[-3].unidentified_leaves())
except IndexError:
series_candidates = []
if len(series_candidates) == 1:
if len(series_candidates) == 1 and not GuessEpisodeInfoFromPosition.excluded_word(series_candidates[0]):
found_property(series_candidates[0], 'series', confidence=0.3)
# if there's a path group that only contains the season info, then the
@@ -163,7 +188,7 @@ class GuessEpisodeInfoFromPosition(Transformer):
if eps:
previous = [node for node in mtree.unidentified_leaves()
if node.node_idx[0] == eps[0].node_idx[0] - 1]
if len(previous) == 1:
if len(previous) == 1 and not GuessEpisodeInfoFromPosition.excluded_word(previous[0]):
found_property(previous[0], 'series', confidence=0.5)
# If we have found a title without any series name, use it as the series name.
@@ -21,6 +21,7 @@
from __future__ import absolute_import, division, print_function, unicode_literals
import re
from guessit.patterns.list import list_parser, all_separators_re
from guessit.plugins.transformers import Transformer
from guessit.matcher import GuessFinder
@@ -34,9 +35,8 @@ class GuessEpisodesRexps(Transformer):
def __init__(self):
Transformer.__init__(self, 20)
range_separators = ['-', 'to', 'a']
discrete_separators = ['&', 'and', 'et']
of_separators = ['of', 'sur', '/', '\\']
of_separators_re = re.compile(build_or_pattern(of_separators, escape=True), re.IGNORECASE)
season_words = ['seasons?', 'saisons?', 'series?']
episode_words = ['episodes?']
@@ -44,85 +44,14 @@ class GuessEpisodesRexps(Transformer):
season_markers = ['s']
episode_markers = ['e', 'ep']
discrete_sep = sep
for range_separator in range_separators:
discrete_sep = discrete_sep.replace(range_separator, '')
discrete_separators.append(discrete_sep)
all_separators = list(range_separators)
all_separators.extend(discrete_separators)
self.container = PropertiesContainer(enhance=False, canonical_from_pattern=False)
range_separators_re = re.compile(build_or_pattern(range_separators), re.IGNORECASE)
discrete_separators_re = re.compile(build_or_pattern(discrete_separators), re.IGNORECASE)
all_separators_re = re.compile(build_or_pattern(all_separators), re.IGNORECASE)
of_separators_re = re.compile(build_or_pattern(of_separators, escape=True), re.IGNORECASE)
season_words_re = re.compile(build_or_pattern(season_words), re.IGNORECASE)
episode_words_re = re.compile(build_or_pattern(episode_words), re.IGNORECASE)
season_markers_re = re.compile(build_or_pattern(season_markers), re.IGNORECASE)
episode_markers_re = re.compile(build_or_pattern(episode_markers), re.IGNORECASE)
def list_parser(value, property_list_name, discrete_separators_re=discrete_separators_re, range_separators_re=range_separators_re, allow_discrete=False, fill_gaps=False):
discrete_elements = filter(lambda x: x != '', discrete_separators_re.split(value))
discrete_elements = [x.strip() for x in discrete_elements]
proper_discrete_elements = []
i = 0
while i < len(discrete_elements):
if i < len(discrete_elements) - 2 and range_separators_re.match(discrete_elements[i+1]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i+1] + discrete_elements[i+2])
i += 3
else:
match = range_separators_re.search(discrete_elements[i])
if match and match.start() == 0:
proper_discrete_elements[i - 1] += discrete_elements[i]
elif match and match.end() == len(discrete_elements[i]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i + 1])
else:
proper_discrete_elements.append(discrete_elements[i])
i += 1
discrete_elements = proper_discrete_elements
ret = []
for discrete_element in discrete_elements:
range_values = filter(lambda x: x != '', range_separators_re.split(discrete_element))
range_values = [x.strip() for x in range_values]
if len(range_values) > 1:
for x in range(0, len(range_values) - 1):
start_range_ep = parse_numeral(range_values[x])
end_range_ep = parse_numeral(range_values[x+1])
for range_ep in range(start_range_ep, end_range_ep + 1):
if range_ep not in ret:
ret.append(range_ep)
else:
discrete_value = parse_numeral(discrete_element)
if discrete_value not in ret:
ret.append(discrete_value)
if len(ret) > 1:
if not allow_discrete:
valid_ret = list()
# replace discrete elements by ranges
valid_ret.append(ret[0])
for i in range(0, len(ret) - 1):
previous = valid_ret[len(valid_ret) - 1]
if ret[i+1] < previous:
pass
else:
valid_ret.append(ret[i+1])
ret = valid_ret
if fill_gaps:
ret = list(range(min(ret), max(ret) + 1))
if len(ret) > 1:
return {None: ret[0], property_list_name: ret}
if len(ret) > 0:
return ret[0]
return None
def episode_parser_x(value):
return list_parser(value, 'episodeList', discrete_separators_re=re.compile('x', re.IGNORECASE))
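Review note: the `list_parser` helper removed in this hunk (and now imported from `guessit.patterns.list`) turns strings such as `02-03` into episode lists. The sketch below is a much simplified illustration of the idea, assuming only `-` for ranges and `&` for discrete values. The name `expand_episode_list` is hypothetical, and the sketch does not reproduce the `allow_discrete` or `fill_gaps` behaviour of the original.

```python
import re


def expand_episode_list(value):
    """Expand '2-4' or '2&3' style strings into a sorted list of ints.

    Simplified assumption: '-' marks a range and '&' separates discrete values.
    """
    episodes = []
    for chunk in re.split(r"\s*&\s*", value.strip()):
        bounds = [int(b) for b in re.split(r"\s*-\s*", chunk) if b]
        if not bounds:
            continue
        # A single value yields itself; a range runs from the first to the last bound.
        for ep in range(bounds[0], bounds[-1] + 1):
            if ep not in episodes:
                episodes.append(ep)
    return sorted(episodes)


print(expand_episode_list("02-03"))
print(expand_episode_list("16-20"))
print(expand_episode_list("0 & 1 & 2"))
```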
@@ -138,34 +67,40 @@ class GuessEpisodesRexps(Transformer):
class ResolutionCollisionValidator(object):
@staticmethod
def validate(prop, string, node, match, entry_start, entry_end):
return len(match.group(2)) < 3 # limit
# Invalidate only when both season and episode are 100 or more (likely a screen resolution).
try:
season_value = season_parser(match.group(2))
episode_value = episode_parser_x(match.group(3))
return season_value < 100 or episode_value < 100
except Exception:
# This may occur for 1xAll or patterns like this.
return True
self.container.register_property(None, r'(' + season_words_re.pattern + sep + '?(?P<season>' + numeral + ')' + sep + '?' + season_words_re.pattern + '?)', confidence=1.0, formatter=parse_numeral)
self.container.register_property(None, r'(' + season_words_re.pattern + sep + '?(?P<season>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*)' + sep + '?' + season_words_re.pattern + '?)' + sep, confidence=1.0, formatter={None: parse_numeral, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), FormatterValidator('season', lambda x: len(x) > 1 if hasattr(x, '__len__') else False)))
self.container.register_property(None, r'(' + season_markers_re.pattern + '(?P<season>' + digital_numeral + ')[^0-9]?' + sep + '?(?P<episodeNumber>(?:e' + digital_numeral + '(?:' + sep + '?[e-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_e, 'season': season_parser}, validator=NoValidator())
# self.container.register_property(None, r'[^0-9]((?P<season>' + digital_numeral + ')[^0-9 .-]?-?(?P<episodeNumber>(?:x' + digital_numeral + '(?:' + sep + '?[x-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_x, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), ResolutionCollisionValidator()))
self.container.register_property(None, r'(' + season_markers_re.pattern + '(?P<season>' + digital_numeral + ')[^0-9]?' + sep + '?(?P<episodeNumber>(?:e' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser, 'season': season_parser}, validator=NoValidator())
self.container.register_property(None, sep + r'((?P<season>' + digital_numeral + ')' + sep + '' + '(?P<episodeNumber>(?:x' + sep + digital_numeral + '(?:' + sep + '[x-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_x, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), ResolutionCollisionValidator()))
self.container.register_property(None, r'((?P<season>' + digital_numeral + ')' + '(?P<episodeNumber>(?:x' + digital_numeral + '(?:[x-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_x, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), ResolutionCollisionValidator()))
self.container.register_property(None, r'(' + season_markers_re.pattern + '(?P<season>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*))', confidence=0.6, formatter={None: parse_numeral, 'season': season_parser}, validator=NoValidator())
self.container.register_property(None, r'((?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.6, formatter=parse_numeral)
self.container.register_property('version', sep + r'(V\d+)' + sep, confidence=0.6, formatter=parse_numeral, validator=NoValidator())
self.container.register_property(None, r'(ep' + sep + r'?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?)', confidence=0.7, formatter=parse_numeral)
self.container.register_property(None, r'(ep' + sep + r'?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.7, formatter=parse_numeral)
self.container.register_property(None, r'(' + episode_markers_re.pattern + '(?P<episodeNumber>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*))', confidence=0.6, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_words_re.pattern + sep + '?(?P<episodeNumber>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*)' + sep + '?' + episode_words_re.pattern + '?)', confidence=0.8, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_markers_re.pattern + '(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.6, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_words_re.pattern + sep + '?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.8, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_markers_re.pattern + '(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.6, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_words_re.pattern + sep + '?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.8, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property('episodeNumber', r'^ ?(\d{2})' + sep, confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^ ?(\d{2})' + sep, confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^ ?0(\d{1,2})' + sep, confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', sep + r'(\d{2}) ?$', confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', sep + r'0(\d{1,2}) ?$', confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^' + sep + '+(\d{2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2}' + ')*)' + sep, confidence=0.4, formatter=episode_parser)
self.container.register_property('episodeNumber', r'^' + sep + '+0(\d{1,2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '0\d{1,2}' + ')*)' + sep, confidence=0.4, formatter=episode_parser)
self.container.register_property('episodeNumber', sep + r'(\d{2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + r'\d{2}' + ')*)' + sep + '+$', confidence=0.4, formatter=episode_parser)
self.container.register_property('episodeNumber', sep + r'0(\d{1,2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + r'0\d{1,2}' + ')*)' + sep + '+$', confidence=0.4, formatter=episode_parser)
self.container.register_property(None, r'((?P<episodeNumber>' + numeral + ')' + sep + '?' + of_separators_re.pattern + sep + '?(?P<episodeCount>' + numeral + ')(?:' + sep + '?(?:episodes?|eps?))?)', confidence=0.7, formatter=parse_numeral)
self.container.register_property(None, r'((?:episodes?|eps?)' + sep + '?(?P<episodeNumber>' + numeral + ')' + sep + '?' + of_separators_re.pattern + sep + '?(?P<episodeCount>' + numeral + '))', confidence=0.7, formatter=parse_numeral)
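Review note: the reworked `ResolutionCollisionValidator` is what lets `1940x01` parse as season 1940, episode 1 while `1444x866` is left to the screen-size matcher. The sketch below shows the same rule in isolation. The `SXE` pattern and the function name `looks_like_season_episode` are assumptions for illustration; guessit's real patterns are more elaborate.

```python
import re

# Hypothetical pattern for "<season>x<episode>"; not the pattern registered in guessit.
SXE = re.compile(r"(?<!\d)(\d+)x(\d+)(?!\d)", re.IGNORECASE)


def looks_like_season_episode(text):
    """Accept NxM as season/episode unless both numbers are 100 or more."""
    match = SXE.search(text)
    if not match:
        return False
    season, episode = int(match.group(1)), int(match.group(2))
    return season < 100 or episode < 100


print(looks_like_season_episode("Looney Tunes 1940x01 Porky's Last Stand"))
print(looks_like_season_episode("Looney Tunes 1444x866 Porky's Last Stand"))
```

Under this rule only a pair where both sides reach 100 is rejected, which matches the two Looney Tunes fixtures added to the test data in this comparison.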
@@ -186,7 +121,29 @@ class GuessEpisodesRexps(Transformer):
def guess_episodes_rexps(self, string, node=None, options=None):
found = self.container.find_properties(string, node, options)
return self.container.as_guess(found, string)
guess = self.container.as_guess(found, string)
if guess and node:
if 'season' in guess and 'episodeNumber' in guess:
# If two guesses both contain season and episodeNumber in the same group, create an episodeList
for existing_guess in node.group_node().guesses:
if 'season' in existing_guess and 'episodeNumber' in existing_guess:
if 'episodeList' not in existing_guess:
existing_guess['episodeList'] = [existing_guess['episodeNumber']]
existing_guess['episodeList'].append(guess['episodeNumber'])
existing_guess['episodeList'].sort()
if existing_guess['episodeNumber'] > guess['episodeNumber']:
existing_guess.set_confidence('episodeNumber', 0)
else:
guess.set_confidence('episodeNumber', 0)
guess['episodeList'] = list(existing_guess['episodeList'])
elif 'episodeNumber' in guess:
# If two guesses contain only episodeNumber in the same group, drop from the new guess the keys already present in the existing one.
for existing_guess in node.group_node().guesses:
if 'episodeNumber' in existing_guess:
for k, v in existing_guess.items():
if k in guess:
del guess[k]
return guess
def should_process(self, mtree, options=None):
return mtree.guess.get('type', '').startswith('episode')
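The episodeList merge added to `guess_episodes_rexps` above can be sketched in isolation. This is a minimal illustration only: it uses plain dicts with a hypothetical flat `confidence` key instead of guessit's `Guess` objects and `set_confidence`, and `merge_episode_guesses` is a name made up for this note.

```python
def merge_episode_guesses(existing, new):
    """Merge two episode guesses found in the same group.

    Both arguments are plain dicts (an assumption for this sketch) holding
    'episodeNumber' and a flat 'confidence' value.
    """
    # Start the list from the existing episode number if it is not there yet.
    episodes = existing.setdefault('episodeList', [existing['episodeNumber']])
    episodes.append(new['episodeNumber'])
    episodes.sort()
    # Only the guess with the lowest episode number keeps its confidence.
    if existing['episodeNumber'] > new['episodeNumber']:
        existing['confidence'] = 0
    else:
        new['confidence'] = 0
    # The new guess receives a copy of the merged list.
    new['episodeList'] = list(episodes)
    return existing, new


first = {'episodeNumber': 3, 'confidence': 0.8}
second = {'episodeNumber': 4, 'confidence': 0.8}
print(merge_episode_guesses(first, second))
```

Both guesses end up carrying the same sorted list, so whichever one survives later filtering still reports every episode.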
@@ -156,6 +156,13 @@ class GuessFiletype(Transformer):
weak_episode_transformer = get_transformer('guess_weak_episodes_rexps')
if weak_episode_transformer:
found = weak_episode_transformer.container.find_properties(filename, mtree, options, 'episodeNumber')
guess = weak_episode_transformer.container.as_guess(found, filename)
if guess and (guess.raw('episodeNumber')[0] == '0' or guess['episodeNumber'] >= 10):
self.log.debug('Found characteristic property of episodes: %s', guess)
upgrade_episode()
return filetype_container[0], other
found = properties_transformer.container.find_properties(filename, mtree, options, 'crc32')
guess = properties_transformer.container.as_guess(found, filename)
if guess:
@@ -217,7 +224,8 @@ class GuessFiletype(Transformer):
if mime is not None:
filetype_info.update({'mimetype': mime}, confidence=1.0)
node_ext = mtree.node_at((-1,))
# Retrieve the last node of category path (extension node)
node_ext = list(filter(lambda x: x.category == 'path', mtree.nodes()))[-1]
found_guess(node_ext, filetype_info)
if mtree.guess.get('type') in [None, 'unknown']:
@@ -226,12 +234,21 @@ class GuessFiletype(Transformer):
else:
raise TransformerException(__name__, 'Unknown file type')
def post_process(self, mtree, options=None):
# now look whether there are some specific hints for episode vs movie
# If we have a date and no year, this is a TV Show.
if 'date' in mtree.info and 'year' not in mtree.info and mtree.info.get('type') != 'episode':
mtree.guess['type'] = 'episode'
for type_leaves in mtree.leaves_containing('type'):
type_leaves.guess['type'] = 'episode'
for title_leaves in mtree.leaves_containing('title'):
title_leaves.guess.rename('title', 'series')
def second_pass_options(self, mtree, options=None):
if 'type' not in options or not options['type']:
if mtree.info.get('type') != 'episode':
# now look whether there are some specific hints for episode vs movie
# If we have a date and no year, this is a TV Show.
if 'date' in mtree.info and 'year' not in mtree.info:
return {'type': 'episode'}
if mtree.info.get('type') != 'movie':
# If we have a year, no season but raw episodeNumber is a number not starting with '0', this is a movie.
if 'year' in mtree.info and 'episodeNumber' in mtree.info and not 'season' in mtree.info:
try:
int(mtree.raw['episodeNumber'])
return {'type': 'movie'}
except ValueError:
pass
@@ -43,6 +43,12 @@ class GuessLanguage(Transformer):
allowed_languages = None
if options and 'allowed_languages' in options:
allowed_languages = options.get('allowed_languages')
directory = list(filter(lambda x: x.category == 'path', node.ancestors))[0]
if len(directory.clean_value) <= 3:
# skip if we have a language code as directory
return None
guess = search_language(string, allowed_languages)
return guess
@@ -68,8 +74,10 @@ class GuessLanguage(Transformer):
title_ends = {}
for unidentified_node in mtree.unidentified_leaves():
unidentified_starts[unidentified_node.span[0]] = unidentified_node
unidentified_ends[unidentified_node.span[1]] = unidentified_node
if len(unidentified_node.clean_value) > 1:
# only consider unidentified leaves that have some meaningful content
unidentified_starts[unidentified_node.span[0]] = unidentified_node
unidentified_ends[unidentified_node.span[1]] = unidentified_node
for property_node in mtree.leaves_containing('year'):
property_starts[property_node.span[0]] = property_node
@@ -79,19 +87,20 @@ class GuessLanguage(Transformer):
title_starts[title_node.span[0]] = title_node
title_ends[title_node.span[1]] = title_node
return node.span[0] in title_ends.keys() and (node.span[1] in unidentified_starts.keys() or node.span[1] + 1 in property_starts.keys()) or\
node.span[1] in title_starts.keys() and (node.span[0] == node.group_node().span[0] or node.span[0] in unidentified_ends.keys() or node.span[0] in property_ends.keys())
return (node.span[0] in title_ends.keys() and (node.span[1] in unidentified_starts.keys() or
node.span[1] + 1 in property_starts.keys()) or
node.span[1] in title_starts.keys() and (node.span[0] == node.group_node().span[0] or
node.span[0] in unidentified_ends.keys() or
node.span[0] in property_ends.keys()))
def second_pass_options(self, mtree, options=None):
m = mtree.matched()
to_skip_language_nodes = []
to_skip_langs = set()
for lang_key in ('language', 'subtitleLanguage'):
langs = {}
lang_nodes = set(mtree.leaves_containing(lang_key))
for lang_node in lang_nodes:
lang = lang_node.guess.get(lang_key, None)
if self._skip_language_on_second_pass(mtree, lang_node):
# Language probably split the title. Add to skip for 2nd pass.
@@ -99,38 +108,19 @@ class GuessLanguage(Transformer):
# the extension, then it is likely a subtitle language
parts = mtree.clean_string(lang_node.root.value).split()
if m.get('type') in ['moviesubtitle', 'episodesubtitle']:
if lang_node.value in parts and \
(parts.index(lang_node.value) == len(parts) - 2):
if (lang_node.value in parts and parts.index(lang_node.value) == len(parts) - 2):
continue
to_skip_language_nodes.append(lang_node)
elif lang not in langs:
langs[lang] = lang_node
else:
# The same language was found. Keep the more confident one,
# and add others to skip for 2nd pass.
existing_lang_node = langs[lang]
to_skip = None
if (existing_lang_node.guess.confidence('language') >=
lang_node.guess.confidence('language')):
# lang_node is to remove
to_skip = lang_node
else:
# existing_lang_node is to remove
langs[lang] = lang_node
to_skip = existing_lang_node
to_skip_language_nodes.append(to_skip)
if to_skip_language_nodes:
to_skip_langs.add(lang_node.value)
if to_skip_langs:
# Also skip same value nodes
skipped_values = [skip_node.value for skip_node in to_skip_language_nodes]
lang_nodes = (set(mtree.leaves_containing('language')) |
set(mtree.leaves_containing('subtitleLanguage')))
for lang_key in ('language', 'subtitleLanguage'):
lang_nodes = set(mtree.leaves_containing(lang_key))
to_skip = [node for node in lang_nodes if node.value in to_skip_langs]
return {'skip_nodes': to_skip}
for lang_node in lang_nodes:
if lang_node not in to_skip_language_nodes and lang_node.value in skipped_values:
to_skip_language_nodes.append(lang_node)
return {'skip_nodes': to_skip_language_nodes}
return None
def should_process(self, mtree, options=None):
@@ -149,6 +139,8 @@ class GuessLanguage(Transformer):
def post_process(self, mtree, options=None):
# 1- try to promote language to subtitle language where it makes sense
prefixes = []
for node in mtree.nodes():
if 'language' not in node.guess:
continue
@@ -157,7 +149,8 @@ class GuessLanguage(Transformer):
# the group is the last group of the filename, it is probably the
# language of the subtitle
# (eg: 'xxx.english.srt')
if (mtree.node_at((-1,)).value.lower() in subtitle_exts and
ext_node = list(filter(lambda x: x.category == 'path', mtree.nodes()))[-1]
if (ext_node.value.lower() in subtitle_exts and
node == list(mtree.leaves())[-2]):
self.promote_subtitle(node)
@@ -171,11 +164,7 @@ class GuessLanguage(Transformer):
for sub_prefix in subtitle_prefixes:
if (sub_prefix in find_words(group_str) and
0 <= group_str.find(sub_prefix) < (node.span[0] - explicit_group.span[0])):
self.promote_subtitle(node)
for sub_suffix in subtitle_suffixes:
if (sub_suffix in find_words(group_str) and
(node.span[0] - explicit_group.span[0]) < group_str.find(sub_suffix)):
prefixes.append((explicit_group, sub_prefix))
self.promote_subtitle(node)
# - if a language is in an explicit group just preceded by "st",
@@ -187,3 +176,21 @@ class GuessLanguage(Transformer):
self.promote_subtitle(node)
except IndexError:
pass
for node in mtree.nodes():
if 'language' not in node.guess:
continue
explicit_group = mtree.node_at(node.node_idx[:2])
group_str = explicit_group.value.lower()
for sub_suffix in subtitle_suffixes:
if (sub_suffix in find_words(group_str) and
(node.span[0] - explicit_group.span[0]) < group_str.find(sub_suffix)):
is_a_prefix = False
for prefix in prefixes:
if prefix[0] == explicit_group and group_str.find(prefix[1]) == group_str.find(sub_suffix):
is_a_prefix = True
break
if not is_a_prefix:
self.promote_subtitle(node)
@@ -23,6 +23,8 @@ from __future__ import absolute_import, division, print_function, unicode_litera
from guessit.plugins.transformers import Transformer
from guessit.matcher import found_property
from guessit import u
from guessit.patterns.list import all_separators
from guessit.language import all_lang_prefixes_suffixes
class GuessMovieTitleFromPosition(Transformer):
@@ -36,6 +38,13 @@ class GuessMovieTitleFromPosition(Transformer):
options = options or {}
return not options.get('skip_title') and not mtree.guess.get('type', '').startswith('episode')
@staticmethod
def excluded_word(*values):
for value in values:
if value.clean_value.lower() in all_separators + all_lang_prefixes_suffixes:
return True
return False
def process(self, mtree, options=None):
"""
try to identify the remaining unknown groups by looking at their
@@ -44,14 +53,16 @@ class GuessMovieTitleFromPosition(Transformer):
if 'title' in mtree.info:
return
basename = mtree.node_at((-2,))
path_nodes = list(filter(lambda x: x.category == 'path', mtree.nodes()))
basename = path_nodes[-2]
all_valid = lambda leaf: len(leaf.clean_value) > 0
basename_leftover = list(basename.unidentified_leaves(valid=all_valid))
try:
folder = mtree.node_at((-3,))
folder = path_nodes[-3]
folder_leftover = list(folder.unidentified_leaves())
except ValueError:
except IndexError:
folder = None
folder_leftover = []
@@ -61,7 +72,9 @@ class GuessMovieTitleFromPosition(Transformer):
# specific cases:
# if we find the same group both in the folder name and the filename,
# it's a good candidate for title
if folder_leftover and basename_leftover and folder_leftover[0].clean_value == basename_leftover[0].clean_value:
if (folder_leftover and basename_leftover and
folder_leftover[0].clean_value == basename_leftover[0].clean_value and
not GuessMovieTitleFromPosition.excluded_word(folder_leftover[0])):
found_property(folder_leftover[0], 'title', confidence=0.8)
return
@@ -89,7 +102,8 @@ class GuessMovieTitleFromPosition(Transformer):
if (series.clean_value != title.clean_value and
series.clean_value != film_number.clean_value and
basename_leaves.index(film_number) == 0 and
basename_leaves.index(title) == 1):
basename_leaves.index(title) == 1 and
not GuessMovieTitleFromPosition.excluded_word(title, series)):
found_property(title, 'title', confidence=0.6)
found_property(series, 'filmSeries', confidence=0.6)
@@ -103,8 +117,9 @@ class GuessMovieTitleFromPosition(Transformer):
if groups_before:
try:
node = next(groups_before)
found_property(node, 'title', confidence=0.8)
return
if not GuessMovieTitleFromPosition.excluded_word(node):
found_property(node, 'title', confidence=0.8)
return
except StopIteration:
pass
@@ -125,8 +140,10 @@ class GuessMovieTitleFromPosition(Transformer):
# if they're all in the same group, take leftover info from there
leftover = mtree.node_at((group_idx,)).unidentified_leaves()
try:
found_property(next(leftover), 'title', confidence=0.7)
return
node = next(leftover)
if not GuessMovieTitleFromPosition.excluded_word(node):
found_property(node, 'title', confidence=0.7)
return
except StopIteration:
pass
@@ -138,7 +155,8 @@ class GuessMovieTitleFromPosition(Transformer):
# ex: Movies/Alice in Wonderland DVDRip.XviD-DiAMOND/dmd-aw.avi
# ex: Movies/Somewhere.2010.DVDRip.XviD-iLG/i-smwhr.avi <-- TODO: gets caught here?
if (basename_leftover[0].clean_value.count(' ') == 0 and
folder_leftover and folder_leftover[0].clean_value.count(' ') >= 2):
folder_leftover and folder_leftover[0].clean_value.count(' ') >= 2 and
not GuessMovieTitleFromPosition.excluded_word(folder_leftover[0])):
found_property(folder_leftover[0], 'title', confidence=0.7)
return
@@ -148,26 +166,28 @@ class GuessMovieTitleFromPosition(Transformer):
# ex: Movies/[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi
if basename_leftover[0].is_explicit():
for basename_leftover_elt in basename_leftover:
if not basename_leftover_elt.is_explicit():
if not basename_leftover_elt.is_explicit() and not GuessMovieTitleFromPosition.excluded_word(basename_leftover_elt):
found_property(basename_leftover_elt, 'title', confidence=0.8)
return
# if all else fails, take the first remaining unidentified group in the
# basename as title
found_property(basename_leftover[0], 'title', confidence=0.6)
return
if not GuessMovieTitleFromPosition.excluded_word(basename_leftover[0]):
found_property(basename_leftover[0], 'title', confidence=0.6)
return
# if there are no leftover groups in the basename, look in the folder name
if folder_leftover:
if folder_leftover and not GuessMovieTitleFromPosition.excluded_word(folder_leftover[0]):
found_property(folder_leftover[0], 'title', confidence=0.5)
return
# if nothing worked, look if we have a very small group at the beginning
# of the basename
basename = mtree.node_at((-2,))
basename_leftover = basename.unidentified_leaves(valid=lambda leaf: True)
try:
found_property(next(basename_leftover), 'title', confidence=0.4)
return
node = next(basename_leftover)
if not GuessMovieTitleFromPosition.excluded_word(node):
found_property(node, 'title', confidence=0.4)
return
except StopIteration:
pass
@@ -22,7 +22,7 @@ from __future__ import absolute_import, division, print_function, unicode_litera
import re
from guessit.containers import PropertiesContainer, WeakValidator, LeavesValidator, QualitiesContainer, ChainedValidator, DefaultValidator, OnlyOneValidator, LeftValidator, NeighborValidator
from guessit.containers import PropertiesContainer, WeakValidator, LeavesValidator, QualitiesContainer, ChainedValidator, DefaultValidator, OnlyOneValidator, LeftValidator, NeighborValidator, FullMatchValidator
from guessit.patterns import sep, build_or_pattern
from guessit.patterns.extension import subtitle_exts, video_exts, info_exts
from guessit.patterns.numeral import numeral, parse_numeral
@@ -61,7 +61,6 @@ class GuessProperties(Transformer):
for canonical_form, quality in quality_dict.items():
self.qualities.register_quality(propname, canonical_form, quality)
register_property('container', {'mp4': ['MP4']})
# http://en.wikipedia.org/wiki/Pirated_movie_release_types
register_property('format', {'VHS': ['VHS', 'VHS-Rip'],
@@ -74,11 +73,11 @@ class GuessProperties(Transformer):
'TV': ['SD-TV', 'SD-TV-Rip', 'Rip-SD-TV', 'TV-Rip', 'Rip-TV'],
'DVB': ['DVB-Rip', 'DVB', 'PD-TV'],
'DVD': ['DVD', 'DVD-Rip', 'VIDEO-TS', 'DVD-R', 'DVD-9', 'DVD-5'],
'HDTV': ['HD-TV', 'TV-RIP-HD', 'HD-TV-RIP'],
'HDTV': ['HD-TV', 'TV-RIP-HD', 'HD-TV-RIP', 'HD-RIP'],
'VOD': ['VOD', 'VOD-Rip'],
'WEBRip': ['WEB-Rip'],
'WEB-DL': ['WEB-DL', 'WEB-HD', 'WEB'],
'HD-DVD': ['HD-(?:DVD)?-Rip', 'HD-DVD'],
'HD-DVD': ['HD-DVD-Rip', 'HD-DVD'],
'BluRay': ['Blu-ray(?:-Rip)?', 'B[DR]', 'B[DR]-Rip', 'BD[59]', 'BD25', 'BD50']
})
@@ -112,32 +111,13 @@ class GuessProperties(Transformer):
},
validator=ChainedValidator(DefaultValidator(), OnlyOneValidator()))
class ResolutionValidator(object):
"""Make sure our match is surrounded by separators, or by another entry"""
@staticmethod
def validate(prop, string, node, match, entry_start, entry_end):
"""
span = _get_span(prop, match)
span = _trim_span(span, string[span[0]:span[1]])
start, end = span
sep_start = start <= 0 or string[start - 1] in sep
sep_end = end >= len(string) or string[end] in sep
start_by_other = start in entry_end
end_by_other = end in entry_start
if (sep_start or start_by_other) and (sep_end or end_by_other):
return True
return False
"""
return True
_digits_re = re.compile('\d+')
def resolution_formatter(value):
digits = _digits_re.findall(value)
return 'x'.join(digits)
self.container.register_property('screenSize', '\d{3,4}-?[x\*]-?\d{3,4}', canonical_from_pattern=False, formatter=resolution_formatter, validator=ChainedValidator(DefaultValidator(), ResolutionValidator()))
self.container.register_property('screenSize', '\d{3,4}-?[x\*]-?\d{3,4}', canonical_from_pattern=False, formatter=resolution_formatter)
register_quality('screenSize', {'360p': -300,
'368p': -200,
@@ -239,8 +219,8 @@ class GuessProperties(Transformer):
self.container.register_property('crc32', '(?:[a-fA-F]|[0-9]){8}', enhance=False, canonical_from_pattern=False)
weak_episode_words = ['pt', 'part']
self.container.register_property(None, '(' + build_or_pattern(weak_episode_words) + sep + '?(?P<part>' + numeral + '))[^0-9]', enhance=False, canonical_from_pattern=False, confidence=0.4, formatter=parse_numeral)
part_words = ['pt', 'part']
self.container.register_property(None, '(' + build_or_pattern(part_words) + sep + '?(?P<part>' + numeral + '))[^0-9]', enhance=False, canonical_from_pattern=False, confidence=0.4, formatter=parse_numeral)
register_property('other', {'AudioFix': ['Audio-Fix', 'Audio-Fixed'],
'SyncFix': ['Sync-Fix', 'Sync-Fixed'],
@@ -249,13 +229,15 @@ class GuessProperties(Transformer):
'Netflix': ['Netflix', 'NF']
})
self.container.register_property('other', 'Real', 'Fix', canonical_form='Proper', validator=NeighborValidator())
self.container.register_property('other', 'Real', 'Fix', canonical_form='Proper', validator=ChainedValidator(FullMatchValidator(), NeighborValidator()))
self.container.register_property('other', 'Proper', 'Repack', 'Rerip', canonical_form='Proper')
self.container.register_property('other', 'Fansub', canonical_form='Fansub')
self.container.register_property('other', 'Fastsub', canonical_form='Fastsub')
self.container.register_property('other', 'Fansub', canonical_form='Fansub', validator=ChainedValidator(FullMatchValidator(), NeighborValidator()))
self.container.register_property('other', 'Fastsub', canonical_form='Fastsub', validator=ChainedValidator(FullMatchValidator(), NeighborValidator()))
self.container.register_property('other', '(?:Seasons?' + sep + '?)?Complete', canonical_form='Complete')
self.container.register_property('other', 'R5', 'RC', canonical_form='R5')
self.container.register_property('other', 'Pre-Air', 'Preair', canonical_form='Preair')
self.container.register_property('other', 'CC') # Closed Caption
self.container.register_property('other', 'LD', 'MD') # Line/Mic Dubbed
self.container.register_canonical_properties('other', 'Screener', 'Remux', '3D', 'HD', 'mHD', 'HDLight', 'HQ',
'DDC',
@@ -271,10 +253,29 @@ class GuessProperties(Transformer):
def guess_properties(self, string, node=None, options=None):
found = self.container.find_properties(string, node, options)
return self.container.as_guess(found, string)
guess = self.container.as_guess(found, string)
if guess and node:
if 'part' in guess:
# If two guesses both contain part in the same group, create a partList
for existing_guess in node.group_node().guesses:
if 'part' in existing_guess:
if 'partList' not in existing_guess:
existing_guess['partList'] = [existing_guess['part']]
existing_guess['partList'].append(guess['part'])
existing_guess['partList'].sort()
if existing_guess['part'] > guess['part']:
existing_guess.set_confidence('part', 0)
else:
guess.set_confidence('part', 0)
guess['partList'] = list(existing_guess['partList'])
return guess
def supported_properties(self):
return self.container.get_supported_properties()
supported_properties = list(self.container.get_supported_properties())
supported_properties.append('partList')
return supported_properties
def process(self, mtree, options=None):
GuessFinder(self.guess_properties, 1.0, self.log, options).process_nodes(mtree.unidentified_leaves())
@@ -93,8 +93,12 @@ class GuessReleaseGroup(Transformer):
return False
if self.re_sep.match(val[-1]):
val = val[:len(val)-1]
if not val:
return False
if self.re_sep.match(val[0]):
val = val[1:]
if not val:
return False
guess['releaseGroup'] = val
forbidden = False
for forbidden_lambda in self._forbidden_groupname_lambda:
@@ -21,6 +21,7 @@
from __future__ import absolute_import, division, print_function, unicode_literals
import re
from guessit.patterns.list import list_parser, all_separators_re
from guessit.plugins.transformers import Transformer
@@ -38,11 +39,14 @@ class GuessWeakEpisodesRexps(Transformer):
of_separators = ['of', 'sur', '/', '\\']
of_separators_re = re.compile(build_or_pattern(of_separators, escape=True), re.IGNORECASE)
self.container = PropertiesContainer(enhance=False, canonical_from_pattern=False)
self.container = PropertiesContainer(enhance=False, canonical_from_pattern=False, remove_duplicates=True)
episode_words = ['episodes?']
def _formater(episode_number):
def episode_list_parser(value):
return list_parser(value, 'episodeList')
def season_episode_parser(episode_number):
epnum = parse_numeral(episode_number)
if not valid_year(epnum):
if epnum > 100:
@@ -55,24 +59,46 @@ class GuessWeakEpisodesRexps(Transformer):
else:
return epnum
self.container.register_property(['episodeNumber', 'season'], '[0-9]{2,4}', confidence=0.6, formatter=_formater, disabler=lambda options: options.get('episode_prefer_number') if options else False)
self.container.register_property(['episodeNumber', 'season'], '[0-9]{4}', confidence=0.6, formatter=_formater)
self.container.register_property('episodeNumber', '[^0-9](\d{1,3})', confidence=0.6, formatter=parse_numeral, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property(['episodeNumber', 'season'], '[0-9]{2,4}', confidence=0.6, formatter=season_episode_parser, disabler=lambda options: options.get('episode_prefer_number') if options else False)
self.container.register_property(['episodeNumber', 'season'], '[0-9]{4}', confidence=0.6, formatter=season_episode_parser)
self.container.register_property(None, '(' + build_or_pattern(episode_words) + sep + '?(?P<episodeNumber>' + numeral + '))[^0-9]', confidence=0.4, formatter=parse_numeral)
self.container.register_property(None, r'(?P<episodeNumber>' + numeral + ')' + sep + '?' + of_separators_re.pattern + sep + '?(?P<episodeCount>' + numeral +')', confidence=0.6, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^' + sep + '?(\d{1,3})' + sep, confidence=0.4, formatter=parse_numeral, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', sep + r'(\d{1,3})' + sep + '?$', confidence=0.4, formatter=parse_numeral, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', '[^0-9](\d{2,3}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2,3}' + ')*)', confidence=0.4, formatter=episode_list_parser, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', r'^' + sep + '?(\d{2,3}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2,3}' + ')*)' + sep, confidence=0.4, formatter=episode_list_parser, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', sep + r'(\d{2,3}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2,3}' + ')*)' + sep + '?$', confidence=0.4, formatter=episode_list_parser, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
def supported_properties(self):
return self.container.get_supported_properties()
def guess_weak_episodes_rexps(self, string, node=None, options=None):
if node and 'episodeNumber' in node.root.info:
return None
properties = self.container.find_properties(string, node, options)
guess = self.container.as_guess(properties, string)
if node and guess:
if 'episodeNumber' in guess and 'season' in guess:
existing_guesses = list(filter(lambda x: 'season' in x and 'episodeNumber' in x, node.group_node().guesses))
if existing_guesses:
return None
elif 'episodeNumber' in guess:
# If we only have episodeNumber in the guess, and another node contains both season and episodeNumber
# keep only the second.
safe_guesses = list(filter(lambda x: 'season' in x and 'episodeNumber' in x, node.group_node().guesses))
if safe_guesses:
return None
else:
# If we have other nodes containing episodeNumber, create an episodeList.
existing_guesses = list(filter(lambda x: 'season' not in x and 'episodeNumber' in x, node.group_node().guesses))
for existing_guess in existing_guesses:
if 'episodeList' not in existing_guess:
existing_guess['episodeList'] = [existing_guess['episodeNumber']]
existing_guess['episodeList'].append(guess['episodeNumber'])
existing_guess['episodeList'].sort()
if existing_guess['episodeNumber'] > guess['episodeNumber']:
existing_guess.set_confidence('episodeNumber', 0)
else:
guess.set_confidence('episodeNumber', 0)
guess['episodeList'] = list(existing_guess['episodeList'])
return guess
def should_process(self, mtree, options=None):
@@ -42,8 +42,13 @@ class GuessYear(Transformer):
def second_pass_options(self, mtree, options=None):
year_nodes = list(mtree.leaves_containing('year'))
if len(year_nodes) > 1:
return {'skip_nodes': year_nodes[:len(year_nodes) - 1]}
# if we found several years, take the one that appears last in the filename
# as the candidate and skip every node whose year differs from it
if year_nodes:
year_candidate = year_nodes[-1].guess['year']
year_nodes = [year for year in year_nodes if year.guess['year'] != year_candidate]
if year_nodes:
return {'skip_nodes': year_nodes}
return None
def process(self, mtree, options=None):
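The revised `GuessYear.second_pass_options` selection can be illustrated without the match tree. This is a rough sketch under stated assumptions: years are given as a plain list in filename order, and `years_to_skip` is a hypothetical helper that returns positions rather than tree nodes.

```python
def years_to_skip(years):
    """Return the positions of year occurrences that differ from the last one.

    `years` is a list of integers in the order they appear in the filename
    (an assumption made for this sketch; the diff works on tree nodes).
    """
    if not years:
        return []
    # The year appearing last in the filename is taken as the candidate.
    candidate = years[-1]
    # Every occurrence with a different value is skipped on the second pass.
    return [index for index, year in enumerate(years) if year != candidate]


print(years_to_skip([1999, 2010, 2010]))
```

Unlike the previous behaviour, which skipped every year node but the last, repeated occurrences of the candidate year are all kept.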
@@ -37,7 +37,7 @@ class SplitExplicitGroups(Transformer):
:return: return the string split into explicit groups, that is, those either
between parentheses, square brackets or curly braces, and those separated
by a dash."""
for c in mtree.children:
for c in mtree.unidentified_leaves():
groups = find_first_level_groups(c.value, group_delimiters[0])
for delimiters in group_delimiters:
flatten = lambda l, x: l + find_first_level_groups(x, delimiters)
@@ -47,4 +47,24 @@ class SplitExplicitGroups(Transformer):
# patterns, such as dates, etc...
# groups = functools.reduce(lambda l, x: l + x.split('-'), groups, [])
c.split_on_components(groups)
c.split_on_components(groups, category='explicit')
def post_process(self, mtree, options=None):
"""
Decrease confidence for properties found in explicit groups.
:param mtree:
:param options:
:return:
"""
if not options.get('name_only'):
explicit_nodes = [node for node in mtree.nodes() if node.category == 'explicit' and node.is_explicit()]
for explicit_node in explicit_nodes:
self.alter_confidence(explicit_node, 0.5)
def alter_confidence(self, node, factor):
for guess in node.guesses:
for k in guess.keys():
confidence = guess.confidence(k)
guess.set_confidence(k, confidence * factor)
@@ -45,4 +45,4 @@ class SplitOnDash(Transformer):
match = pattern.search(node.value, span[1])
if indices:
node.partition(indices)
node.partition(indices, category='dash')
@@ -41,6 +41,32 @@ class SplitPathComponents(Transformer):
components += list(splitext(basename))
components[-1] = components[-1][1:] # remove the '.' from the extension
mtree.split_on_components(components)
mtree.split_on_components(components, category='path')
else:
mtree.split_on_components([mtree.value, ''])
mtree.split_on_components([mtree.value, ''], category='path')
def post_process(self, mtree, options=None):
"""
Decrease confidence for properties found in directories; the filename should always have priority.
:param mtree:
:param options:
:return:
"""
if not options.get('name_only'):
path_nodes = [node for node in mtree.nodes() if node.category == 'path']
for path_node in path_nodes[:-2]:
self.alter_confidence(path_node, 0.3)
try:
last_directory_node = path_nodes[-2]
self.alter_confidence(last_directory_node, 0.6)
except IndexError:
pass
def alter_confidence(self, node, factor):
for guess in node.guesses:
for k in guess.keys():
confidence = guess.confidence(k)
guess.set_confidence(k, confidence * factor)
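The confidence scaling in `SplitPathComponents.post_process` can be sketched on plain data. The example below only mirrors the slicing and the two factors (0.3 and 0.6) from the hunk above. It assumes each path node is represented by a dict mapping property names to confidences, and `scale_path_confidences` is a hypothetical name, not part of guessit.

```python
def scale_path_confidences(path_confidences, name_only=False):
    """Scale per-property confidences of path components, in path order.

    `path_confidences` is a list of {property: confidence} dicts, one per
    path node (an assumed stand-in for guessit's node.guesses).
    """
    scaled = [dict(conf) for conf in path_confidences]
    if name_only:
        # With name_only set, the diff leaves confidences untouched.
        return scaled
    # Every path node except the last two is scaled by 0.3.
    for conf in scaled[:-2]:
        for key in conf:
            conf[key] *= 0.3
    # The second-to-last path node is scaled by 0.6, if it exists.
    if len(scaled) >= 2:
        for key in scaled[-2]:
            scaled[-2][key] *= 0.6
    return scaled


print(scale_path_confidences([{'year': 1.0}, {'title': 1.0}, {'container': 1.0}]))
```

The last path node is never altered, and a single-node path passes through unchanged, matching the `IndexError` guard in the diff.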

Some files were not shown because too many files have changed in this diff