Compare commits

156 Commits

Author SHA1 Message Date
panni c6a1df9a79 release 2.6.5.3152 2019-10-26 04:38:34 +02:00
panni 77861a4c6d release 2.6.5.3152 2019-10-26 04:38:03 +02:00
panni bf1e1c3139 bump dev 2019-10-26 04:17:25 +02:00
panni a564a1d808 providers: addic7ed: wait a short while between retries and after successfully logging in 2019-10-26 04:14:38 +02:00
panni 22ac935f9b core: scanning: add additional INFO logging for undetected languages 2019-10-23 14:12:37 +02:00
panni 02e2bcb417 bump dev 2019-10-21 17:21:01 +02:00
panni 3445259cde providers: addic7ed: fix detection of completed subtitle (#686) 2019-10-21 17:15:02 +02:00
panni c20c32c17d providers: addic7ed: refresh show IDs if stored ones were empty 2019-10-21 17:10:34 +02:00
panni 3fec766890 core: fix #688 2019-10-21 16:03:40 +02:00
panni f208a24213 providers: addic7ed: fix bungled show_ids reference; #686 2019-10-20 16:53:13 +02:00
panni 9e9dfb3f4d providers: addic7ed: fix Mayans M.C.; add logging; fix AuthenticationError 2019-10-20 05:50:33 +02:00
panni 83eecf09ed bump dev 2019-10-20 05:20:13 +02:00
panni 1aebe8d0dd providers: addic7ed: fix getting show ids (failing on foreign characters)
providers: addic7ed: don't run anything if no credentials given
providers: addic7ed: actually try three times to log in
providers: addic7ed: store last show ids fetch; when show id not found, re-try once per day
2019-10-20 05:19:05 +02:00
panni bb64e482df providers: addic7ed: fix getting show list (failing on foreign characters)
providers: addic7ed: don't run anything if no credentials given
providers: addic7ed: actually try three times to log in
2019-10-20 05:04:02 +02:00
panni 1841a72ca7 bump dev 2019-10-20 04:00:23 +02:00
panni 997d4aa1cf core: don't process any further if stream info is missing 2019-10-20 03:59:39 +02:00
panni d517e86333 core: don't fall back to default providers if none enabled 2019-10-20 03:45:16 +02:00
panni 8bcfc712fb updated dev 2019-10-19 23:21:03 +02:00
panni c0cf2fd78e providers: argenteam: bazarr-backport: use new url; fixes 2019-10-19 23:19:19 +02:00
panni 0a7de0e9b6 core: bazarr-backport: generic 10 minute throttling if uncaught exception occurs; also for downloads 2019-10-19 23:17:09 +02:00
panni 1e2a127dac core: bazarr-backport: generic 10 minute throttling if uncaught exception occurs 2019-10-19 23:13:57 +02:00
panni 5b8cd215e4 core: backport removal of existing subtitle file from bazarr, to support MergerFS 2019-10-19 23:02:57 +02:00
panni 7583edf3fe providers: addic7ed: move re to correct place; fix show match; #686 2019-10-19 15:36:46 +02:00
panni 2f219a1a81 providers: addic7ed: fix show match 2019-10-19 15:35:19 +02:00
panni 9127c38297 bump dev 2019-10-19 07:15:45 +02:00
panni 0c379f8b9f providers: addic7ed: add timeout on authentication error 2019-10-19 07:15:07 +02:00
panni d2b617bdf4 addicted; fix #686 2019-10-19 06:55:32 +02:00
panni 6d6f6d9356 forgot to fix #681 2019-10-19 06:54:38 +02:00
panni 8ffb20ebe3 try fixing #681 2019-10-07 15:13:00 +02:00
panni eed7b9da0c core: support using mediainfo for retrieving MP4 MOV_TEXT subtitle stream titles (PMS bug) 2019-10-05 16:47:45 +02:00
panni 802381b2bc back to dev 2019-10-05 04:25:48 +02:00
panni f265c861d2 back to dev 2019-10-05 04:24:20 +02:00
panni 1dc7b4b5e4 back to dev 2019-10-05 04:15:11 +02:00
panni c48aa2b255 release 2.6.5.3152 2019-10-05 04:14:28 +02:00
panni 66859802f9 update readme 2019-10-05 04:13:50 +02:00
panni 433c8e987b back from dev 2019-10-05 04:07:52 +02:00
panni aa477ca48c Merge branch 'develop-2.6' 2019-10-05 04:07:33 +02:00
pannal 65b502afa4 bump dev 2019-09-21 16:25:33 +02:00
pannal 06c0b44589 fix Dicked.get 2019-09-21 16:24:21 +02:00
pannal d651f2cbb7 bump dev 2019-09-20 18:24:16 +02:00
pannal 8b5be8ea4b #676 improve 2019-09-20 18:14:14 +02:00
pannal f4e82c560d core: fix default values of opensubtitles-skip-wrong-fps, use_https; fix #676 2019-09-20 18:10:16 +02:00
panni c23b3e93a6 bump dev 2019-08-31 14:33:04 +02:00
panni de447d2d0b core: fix for determining whether to search under certain circumstances; fixes #666 2019-08-31 14:32:42 +02:00
panni 95b1272018 explicit language=None check 2019-08-25 06:12:48 +02:00
pannal 11d111da7c Update README.md 2019-08-24 04:51:26 +02:00
pannal 638dec0f04 Update README.md 2019-08-24 04:45:45 +02:00
panni a0ab6e406a providers: titlovi: raise ConfigurationError if credentials aren't given 2019-08-22 15:46:38 +02:00
panni 23242c0f52 bump dev 2019-08-22 15:34:45 +02:00
pannal 48bf70e825 Merge pull request #660 from viking1304/develop-2.6
New implementation of Titlovi using API
2019-08-22 15:32:43 +02:00
panni ada0b96872 #664 fix missing language processing of multiple videos refreshed at once 2019-08-22 14:58:28 +02:00
panni 0e4917bba9 #661 further improvements 2019-08-13 18:06:06 +02:00
panni 8169d31e86 #661 fix bad condition 2019-08-13 13:01:05 +02:00
panni 75b83aa163 #661 fix match strictness when determining preexisting external subtitles 2019-08-13 12:56:08 +02:00
viking1304 d2022de970 Removed titlovi from AntiCaptcha label 2019-08-12 21:02:51 +02:00
viking1304 8db1cdacb4 Revert "Disable provider Titlovi if user and password are not set"
This reverts commit 527d171a6a.
2019-08-09 19:01:08 +02:00
viking1304 527d171a6a Disable provider Titlovi if user and password are not set 2019-08-09 18:54:35 +02:00
viking1304 20620cfa7e New implementation of Titlovi using API 2019-08-09 18:22:42 +02:00
panni 4d03ca078d back to dev 2019-08-09 03:38:54 +02:00
panni 775e2cca47 Merge remote-tracking branch 'origin/master' 2019-08-09 03:13:46 +02:00
panni 7cb2486d3e release 2.6.5.3124 2019-08-09 03:13:34 +02:00
panni 02a3ecc9fe prepare for release 2019-08-09 03:11:55 +02:00
panni 54435398af bump dev 2019-08-08 15:09:04 +02:00
panni ffc42883de core: extract embedded/menu: fix detection of unknown streams; don't use unknown streams if a known language was previously found 2019-08-08 14:30:29 +02:00
panni 0cf0371a43 core: language: use replacement map from bazarr 2019-08-06 18:04:34 +02:00
panni f5156bcea7 providers: titlovi: fix matching 2019-07-27 03:10:08 +02:00
panni efdf3b2c9d core: http: fallback to default DNS when normal resolving fails; fixes #657 2019-07-27 02:57:35 +02:00
panni c3d3163392 providers: subscene: fix unknown language code error when "empty" result is returned 2019-07-05 13:56:21 +02:00
panni c91d5ca483 providers: subscene: add support for pt-BR (based on https://github.com/Diaoul/subliminal/pull/740/commits/b22cf08a5d0e7082b0dc6c0de8cc764f01233625) 2019-07-05 13:55:52 +02:00
panni 5f0982970d docker/bazarr compat 2019-07-05 03:22:06 +02:00
panni ee05da70f4 providers: subscene: explicitly set account filters for languages 2019-06-23 15:26:41 +02:00
panni 04c283c48d providers: subscene: limit alternative searches to 3; set throttle to 8 2019-06-23 04:24:18 +02:00
panni 836945c95c providers: subscene: move login/cookies to initialization sequence 2019-06-22 16:45:31 +02:00
panni bd4c180c07 submod: generic: en: fix ";='s 2019-06-22 04:29:03 +02:00
pannal e1f5290365 Update README.md 2019-06-22 04:07:30 +02:00
panni eefffcfb1b back to dev 2019-06-22 04:06:10 +02:00
panni 9e088a5e9d release 2.6.5.3109 2019-06-22 04:05:37 +02:00
panni 317c02bf06 prepare next release 2019-06-22 04:04:22 +02:00
panni 22724c269c core: bazarr compat 2019-06-21 15:04:00 +02:00
panni 2a48782b6b core: bazarr compat 2019-06-21 15:00:35 +02:00
panni e7c3039fde providers: subscene: detect login availability; fallback to non year results if none found with year 2019-06-21 04:49:54 +02:00
panni 2afba02b59 bump dev 2019-06-21 04:17:15 +02:00
panni 94928c2930 providers: add Napisy24 (polish) 2019-06-21 04:16:46 +02:00
panni 2c25191291 providers: subscene: support logging in 2019-06-20 16:11:21 +02:00
panni ba2f3f2172 back to dev 2019-06-06 02:18:34 +02:00
panni aa5cba9347 release 2.6.5.3099 2019-06-06 02:15:23 +02:00
panni 5f40452f57 release 2.6.5.3099 2019-06-06 02:14:50 +02:00
panni 2dd9b1723b core: allow system DNS again by putting "system" as the DNS 2019-06-06 02:13:20 +02:00
panni ee54839f28 back to dev 2019-06-06 01:45:16 +02:00
panni c2f054a25e release 2.6.5.3096 2019-06-06 01:41:57 +02:00
panni f095d5c99c providers: subscene: remove obsolete exception handling 2019-06-06 01:38:28 +02:00
panni ab93f9809a providers: subscene: dumb down endpoint detection; adapt 2019-06-06 01:31:27 +02:00
panni bbb9a62357 back to dev 2019-05-30 04:23:49 +02:00
panni 82ffed699f release 2.6.5.3092 2019-05-30 04:23:07 +02:00
panni 4751ea8396 bump dev 2019-05-30 04:21:50 +02:00
panni c15d8fbe58 bump dev 2019-05-30 04:14:38 +02:00
panni b379468b47 properly re-raise 2019-05-30 04:13:00 +02:00
panni 0deb3eae21 providers: subscene: react to new endpoint; store and use new endpoint 2019-05-30 04:11:12 +02:00
panni 0c1042ec5c bump dev 2019-05-27 12:39:04 +02:00
panni 05d0de5120 core: providers: argenteam: backport fixes from bazarr 2019-05-27 12:34:37 +02:00
panni 2fa217d5d9 core: subtitle: encoding: re-revert 1ed4f11 2019-05-27 12:27:18 +02:00
panni a65b5a5d82 core: missed forced utf-8 instance 2019-05-24 18:10:09 +02:00
panni 7bb42e95d8 core: add env var SZ_KEEP_ENCODING to keep encoding of subtitles 2019-05-24 18:06:00 +02:00
panni db536502a1 bump dev 2019-05-19 06:06:43 +02:00
panni 47c8f1a2e6 Merge branch 'submod_opt' into develop-2.6 2019-05-19 06:04:17 +02:00
panni 30a0f11515 providers: subscene: don't calculate video fn for now 2019-05-19 04:27:20 +02:00
panni 9bf5123a00 providers: subscene: don't search for season packs (broken); fix endpoint error handling 2019-05-18 15:03:53 +02:00
panni f337b53ae3 submod: HI: remove music
submod: common: be less aggressive about music symbols
submod: HI: be less aggressive about brackets
submod: HI: be less aggressive about MAN
2019-05-18 06:23:04 +02:00
panni aea6050d71 subtitle: try decoding with utf-16 by default as well 2019-05-17 23:45:06 +02:00
panni 13d5e0761e providers: subscene: fix endpoint once again 2019-05-13 16:14:26 +02:00
panni ce28d0284c back from dev 2019-05-12 06:17:08 +02:00
panni 1a0bb9c3e4 release 2.6.5.3074 2019-05-12 06:05:16 +02:00
panni d0c71b4b67 bump dev 2019-05-12 05:12:58 +02:00
panni b3f062956d core: re-fix ass/ssa tags in srt in pysubs2 0.2.3 2019-05-12 05:12:34 +02:00
panni 1a853a780c core: update pysubs2 to 0.2.3 2019-05-12 05:01:38 +02:00
panni 5c47ddeb2d core: update chinese encodings; #646 2019-05-12 04:49:30 +02:00
panni b51deb5d01 core: subliminal: don't replace \r with \n by default; fixes utf-16 character transformation issues; fixes #646 2019-05-12 04:48:23 +02:00
panni cbf5ea69be core: cf: update cloudscraper to 1.1.9; fix keyerror 2019-05-08 15:57:33 +02:00
panni e139ffefe6 bump dev 2019-05-08 04:18:25 +02:00
panni dc0a8deb40 core: cf: testing
providers: subscene: testing
2019-05-08 04:14:04 +02:00
panni 97e93cd10a core: cf: update js2py; update cloudscraper to 1.1.5; 2019-05-08 01:31:21 +02:00
panni 03c934cf21 back to dev 2019-05-01 15:39:23 +02:00
panni 92d0d70258 Release 2.6.5.3062 2019-05-01 15:32:36 +02:00
panni d44298993c Release 2.6.5.3055 2019-05-01 15:32:19 +02:00
panni 12300d4115 Merge branch 'develop-2.6' 2019-05-01 15:29:42 +02:00
pannal b4f08f61a6 Update README.md 2019-05-01 06:00:01 +02:00
pannal 861a25be41 Update README.md 2019-05-01 05:59:21 +02:00
pannal 3e175109a6 Merge pull request #641 from fossabot/master
Add license scan report and status
2019-05-01 05:48:14 +02:00
fossabot fb2210f2fd Add license scan report and status
Signed-off-by: fossabot <badges@fossa.io>
2019-04-30 20:44:05 -07:00
panni e928918201 add cloudscaper LICENSE 2019-05-01 05:13:13 +02:00
panni df607e5772 bump dev 2019-05-01 04:49:30 +02:00
panni a7cc470645 core: log cf domain 2019-05-01 04:48:48 +02:00
panni 4e6421b928 core: dns: set env var empty if not configured 2019-05-01 04:36:03 +02:00
panni df48e8fccd providers: subscene: remove obsolete imports 2019-05-01 04:27:11 +02:00
panni 58111bf204 core: remove old cfscrape implementation 2019-05-01 04:25:04 +02:00
panni 8c02e75fed providers: titlovi: match cfsrc for src 2019-05-01 04:24:31 +02:00
panni 6f3f1cb4b5 core: cf: harden. 2019-05-01 04:24:09 +02:00
panni dd27997deb core: cf: add cloudscaper 1.1.1@496900e instead of cfscrape 2019-05-01 03:12:01 +02:00
panni a1f70d1d4d core: add ENV:dns_resolvers_timeout 2019-05-01 02:39:18 +02:00
panni 7da0bac643 skip warning 2019-05-01 02:33:46 +02:00
panni b3ab2a451c core: http: don't query DNS with IPs. thanks @fgump 2019-05-01 02:27:30 +02:00
panni 850f836ebd back to dev 2019-04-28 05:27:26 +02:00
panni d9fa9d03da back to dev 2019-04-28 05:22:24 +02:00
pannal 76c20dc3d7 Update README.md 2019-04-28 05:21:35 +02:00
panni 4568e222d1 release 2.6.5.3041 2019-04-28 05:11:45 +02:00
panni 344025226a add missing changelog entry 2019-04-28 05:11:09 +02:00
panni f546fcffce release 2.6.5.3039 2019-04-28 05:08:00 +02:00
panni 068c2d4d00 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	Contents/Info.plist
2019-04-28 05:04:51 +02:00
panni ccf5a902e5 core: cf: only store cookie if it had a value 2019-04-28 05:03:04 +02:00
panni 8c72cf9057 bump dev 2019-04-28 04:45:17 +02:00
panni 1ce14aa231 core: http: remove debug 2019-04-28 04:44:27 +02:00
panni 643485b879 core: cf: optimize
providers: titlovi: optimize cf/captcha handling
2019-04-28 04:43:03 +02:00
pannal 5b3d9f26be Update README.md 2019-04-28 03:47:55 +02:00
pannal 14f2f45f20 Update README.md 2019-04-22 05:37:47 +02:00
pannal 8ac6c9d7a7 Update README.md 2019-04-22 05:31:29 +02:00
pannal 237a47b8ed Update Info.plist 2019-04-21 03:48:37 +02:00
77 changed files with 4441 additions and 1163 deletions
+127
@@ -1,3 +1,130 @@
2.6.5.3152
subscene, addic7ed
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
- core: fix core issue possibly impacting results on OpenSubtitles in certain conditions
- core: fix default values of opensubtitles-skip-wrong-fps, use_https; fix #676
- core: fix for determining whether to search under certain circumstances; fixes #666
- core: #664 fix missing language processing of multiple videos refreshed at once
- core: #661 fix match strictness when determining preexisting external subtitles
- providers: titlovi: New implementation of Titlovi using API (thanks @viking1304)
2.6.5.3124
subscene, addic7ed and titlovi
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
- core: http: fallback to default DNS when normal resolving fails; fixes #657
- core: extract embedded/menu: fix detection of unknown streams; don't use unknown streams if a known language was previously found
- core: language: use replacement map from bazarr
- providers: titlovi: fix matching
- providers: subscene: fix unknown language code error when "empty" result is returned
- providers: subscene: add support for pt-BR (based on Diaoul/subliminal@b22cf08)
- providers: subscene: explicitly set account filters for languages
- providers: subscene: limit alternative searches to 3; set throttle to 8
- providers: subscene: move login/cookies to initialization sequence
- submod: generic: en: fix ";='s
2.6.5.3109
subscene, addic7ed and titlovi
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
- providers: add Napisy24 (polish)
- providers: subscene: reduce provider load by possibly half
- providers: subscene: support logging in (username/password are now required)
- providers: subscene: fallback to non year results if none found with year
2.6.5.3099
subscene, addic7ed and titlovi
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
- core: allow system DNS again by putting "system" as the DNS
- providers: subscene: fix again (subscene, contact us please, so we can end this)
2.6.5.3092
subscene, addic7ed and titlovi
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
- providers: subscene: fix endpoint (hopefully for longer now)
- providers: subscene: don't search for season packs (broken for now; relieves 50% of server load on provider)
- providers: subscene: don't calculate video fn for now
- providers: argenteam: backport fixes from bazarr
- subtitle: try decoding with utf-16 by default as well (zho/farsi)
- submod: HI: remove music tags by default
- core: compat (bazarr): add env var SZ_KEEP_ENCODING to keep encoding of subtitles
2.6.5.3074
Changelog
- core: cf: bypass cf 95% of the time without captchas
- core: fix breaking line endings of certain languages (chinese, UTF-16); fixes #646
- core: update pysubs2 to 0.2.3
2.6.5.3062
Changelog
- core: cf: optimize
- core: http: don't query DNS with IPs. thanks @fgump (fixes sonarr/radarr)
2.6.5.3041
Changelog
- core: only reference guessed title if there actually is one
- core: cf: optimize
- core/config: add setting for one existing language to be enough, fixes #491
- core/compat: dns: support nameservers via ENV[dns_resolvers]; don't fall back to default DNS when configured custom DNS failed
- providers: titlovi: prevent repeated captcha solving for CF
2.6.5.3017
Changelog
- core: SRT parsing: handle (bad) ASS color tag in SRT
- core: auto extract embedded: only use one unknown sub for first language
- core: better embedded streams language detection
- core: optimizations
- core: extract embedded: fix is_unknown check
- core: don't raise exception when subtitle not found inside archive
- core: search external subtitles: fix condition
- core: better plex transcoder path detection
- core: use Log.Warn instead of Log.Warning (#619, #629, #633)
- core: also check for "plex transcoder.exe" in case of windows (fixes #619)
- core: auto extract: use mbcs encoding for paths on windows
- core: Fix issue scandir not returning the name of the file inside Docker images on ARM systems. (thanks @giejay)
- core: also clean PYTHONHOME when calling external notification app
- core: update certifi to 2019.3.9
- core: scan_video: add series/title as alternative by scanning filename itself without parent folders
- core: add generic solution for solving captchas using anti captcha services
- core: increase cache time to 180d (was: 30d)
- core: guess_matches: handle multiple title matches; fixes bazarr#403
- windows: fix compatibility issues with plex transcoder
- compat: use lowercase paths on subtitle detection
- providers: addic7ed: re-enable (using paid anti captch service)
- providers: assrt: assume undefined Chinese flavor as Simplified (chs/zho-Hans)
- providers: subscene: make it work again by bypassing cf
- providers: subscene: don't fail on missing cover
- providers: titlovi: re-enable (might need paid anti captch service)
- providers: opensubtitles: fix only_foreign handling
- providers: opensubtitles: show subtitles with possibly mismatched series when manually listing subs
- menu: list subtitles: show subtitles with bad season/episode values as well
- refiners: omdb: fix imdb ids with spaces
2.6.4.2911
- core: improve file cache (windows especially); use fixed-length cache filenames; fixes #600
+6 -2
@@ -119,19 +119,23 @@ def agent_extract_embedded(video_part_map):
for plexapi_part in get_all_parts(plexapi_item):
item_count = item_count + 1
used_one_unknown_stream = False
used_one_known_stream = False
for requested_language in config.lang_list:
skip_unknown = used_one_unknown_stream or used_one_known_stream
embedded_subs = stored_subs.get_by_provider(plexapi_part.id, requested_language, "embedded")
current = stored_subs.get_any(plexapi_part.id, requested_language) or \
requested_language in scanned_video.external_subtitle_languages
if not embedded_subs:
stream_data = get_embedded_subtitle_streams(plexapi_part, requested_language=requested_language,
skip_unknown=used_one_unknown_stream)
skip_unknown=skip_unknown)
if stream_data:
if stream_data and stream_data[0]["language"]:
stream = stream_data[0]["stream"]
if stream_data[0]["is_unknown"]:
used_one_unknown_stream = True
else:
used_one_known_stream = True
to_extract.append(({scanned_video: part_info}, plexapi_part, str(stream.index),
str(requested_language), not current))
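
The hunk above changes when an untagged ("unknown") embedded stream may stand in for a requested language: once any stream, known or unknown, has been used for an earlier language, unknown streams are skipped. The following is a minimal sketch of that flag logic only. `find_streams` is a hypothetical stand-in for `get_embedded_subtitle_streams`, whose real behaviour is not shown in this diff, and the dict-based stream records are assumptions made for illustration. It is written for Python 3, while the plugin itself targets Python 2.

```python
def find_streams(streams, requested_language, skip_unknown):
    """Hypothetical stand-in for get_embedded_subtitle_streams (assumption)."""
    found = []
    for stream in streams:
        if stream["language"] == requested_language:
            found.append({"stream": stream, "language": requested_language,
                          "is_unknown": False})
        elif stream["language"] is None and not skip_unknown:
            # an untagged stream is offered as a candidate for the requested language
            found.append({"stream": stream, "language": requested_language,
                          "is_unknown": True})
    return found


def plan_extraction(streams, lang_list):
    """Mirror the flag handling from the hunk: one unknown stream at most,
    and none at all after a known stream was already used."""
    used_one_unknown_stream = False
    used_one_known_stream = False
    to_extract = []
    for requested_language in lang_list:
        skip_unknown = used_one_unknown_stream or used_one_known_stream
        stream_data = find_streams(streams, requested_language, skip_unknown)
        if stream_data and stream_data[0]["language"]:
            if stream_data[0]["is_unknown"]:
                used_one_unknown_stream = True
            else:
                used_one_known_stream = True
            to_extract.append((stream_data[0]["stream"]["index"], requested_language))
    return to_extract
```

With one English stream and one untagged stream, requesting `["eng", "deu"]` extracts only the English stream, whereas requesting `["deu", "eng"]` uses the untagged stream for the first language and the tagged one for the second.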
+27 -25
@@ -12,9 +12,10 @@ from menu_helpers import debounce, SubFolderObjectContainer, default_thumb, add_
from refresh_item import RefreshItem
from subzero.constants import PREFIX
from support.config import config, TEXT_SUBTITLE_EXTS
from support.helpers import timestamp, df, get_language, display_language, get_language_from_stream, is_stream_forced
from support.helpers import timestamp, df, get_language, display_language, get_language_from_stream
from support.items import get_item_kind_from_rating_key, get_item, get_current_sub, get_item_title, save_stored_sub
from support.plex_media import get_plex_metadata, get_part, get_embedded_subtitle_streams
from support.plex_media import get_plex_metadata, get_part, get_embedded_subtitle_streams, is_stream_forced, \
update_stream_info
from support.scanning import scan_videos
from support.scheduler import scheduler
from support.storage import get_subtitle_storage
@@ -118,6 +119,8 @@ def ItemDetailsMenu(rating_key, title=None, base_title=None, item_title=None, ra
if not os.path.exists(part.file):
continue
update_stream_info(part)
part_id = str(part.id)
part_index += 1
@@ -670,29 +673,28 @@ def ListEmbeddedSubsForItemMenu(**kwargs):
stream = stream_data["stream"]
is_forced = stream_data["is_forced"]
if language:
oc.add(DirectoryObject(
key=Callback(TriggerExtractEmbeddedSubForItemMenu, randomize=timestamp(),
stream_index=str(stream.index), language=language, with_mods=True, **kwargs),
title=_(u"Extract stream %(stream_index)s, %(language)s%(unknown_state)s%(forced_state)s"
u"%(stream_title)s with default mods",
stream_index=stream.index,
language=display_language(language),
unknown_state=_(" (unknown)") if is_unknown else "",
forced_state=_(" (forced)") if is_forced else "",
stream_title=" (\"%s\")" % stream.title if stream.title else ""),
))
oc.add(DirectoryObject(
key=Callback(TriggerExtractEmbeddedSubForItemMenu, randomize=timestamp(),
stream_index=str(stream.index), language=language, **kwargs),
title=_(u"Extract stream %(stream_index)s, %(language)s%(unknown_state)s%(forced_state)s"
u"%(stream_title)s",
stream_index=stream.index,
language=display_language(language),
unknown_state=_(" (unknown)") if is_unknown else "",
forced_state=_(" (forced)") if is_forced else "",
stream_title=" (\"%s\")" % stream.title if stream.title else ""),
))
oc.add(DirectoryObject(
key=Callback(TriggerExtractEmbeddedSubForItemMenu, randomize=timestamp(),
stream_index=str(stream.index), language=language, with_mods=True, **kwargs),
title=_(u"Extract stream %(stream_index)s, %(language)s%(unknown_state)s%(forced_state)s"
u"%(stream_title)s with default mods",
stream_index=stream.index,
language=display_language(language),
unknown_state=_(" (unknown)") if is_unknown else "",
forced_state=_(" (forced)") if is_forced else "",
stream_title=" (\"%s\")" % stream.title if stream.title else ""),
))
oc.add(DirectoryObject(
key=Callback(TriggerExtractEmbeddedSubForItemMenu, randomize=timestamp(),
stream_index=str(stream.index), language=language, **kwargs),
title=_(u"Extract stream %(stream_index)s, %(language)s%(unknown_state)s%(forced_state)s"
u"%(stream_title)s",
stream_index=stream.index,
language=display_language(language),
unknown_state=_(" (unknown)") if is_unknown else "",
forced_state=_(" (forced)") if is_forced else "",
stream_title=" (\"%s\")" % stream.title if stream.title else ""),
))
return oc
+1 -1
@@ -368,7 +368,7 @@ def ValidatePrefs():
"subtitle_destination_folder", "include", "include_exclude_paths", "include_exclude_sz_files",
"new_style_cache", "dbm_supported", "lang_list", "providers", "normal_subs", "forced_only", "forced_also",
"plex_transcoder", "refiner_settings", "unrar", "adv_cfg_path", "use_custom_dns",
"has_anticaptcha", "anticaptcha_cls"]:
"has_anticaptcha", "anticaptcha_cls", "mediainfo_bin"]:
value = getattr(config, attr)
if isinstance(value, dict):
+3 -2
@@ -11,14 +11,14 @@ from subzero.lib.io import get_viable_encoding
from subzero.language import Language
from support.i18n import is_localized_string, _
from support.items import get_kind, get_item_thumb, get_item, get_item_kind_from_item, refresh_item
from support.helpers import get_video_display_title, pad_title, display_language, quote_args, is_stream_forced, \
from support.helpers import get_video_display_title, pad_title, display_language, quote_args, \
get_title_for_video_metadata, mswindows
from support.history import get_history
from support.ignore import get_decision_list
from support.lib import get_intent
from support.config import config
from subzero.constants import ICON_SUB, ICON
from support.plex_media import get_part, get_plex_metadata
from support.plex_media import get_part, get_plex_metadata, is_stream_forced, update_stream_info
from support.scheduler import scheduler
from support.scanning import scan_videos
from support.storage import save_subtitles
@@ -178,6 +178,7 @@ def extract_embedded_sub(**kwargs):
metadata = get_plex_metadata(rating_key, part_id, item_type, plex_item=plex_item)
scanned_videos = scan_videos([metadata], ignore_all=True, skip_hashing=True)
update_stream_info(part)
for stream in part.streams:
# subtitle stream
if str(stream.index) == stream_index:
+28 -11
@@ -22,6 +22,7 @@ from subzero.language import Language
from subliminal.cli import MutexLock
from subzero.lib.io import FileIO, get_viable_encoding
from subzero.lib.dict import Dicked
from subzero.lib.which import find_executable
from subzero.util import get_root_path
from subzero.constants import PLUGIN_NAME, PLUGIN_IDENTIFIER, MOVIE, SHOW, MEDIA_TYPE_TO_STRING
from subzero.prefs import get_user_prefs, update_user_prefs
@@ -66,6 +67,7 @@ PROVIDER_THROTTLE_MAP = {
DownloadLimitExceeded: (datetime.timedelta(hours=3), "3 hours"),
ServiceUnavailable: (datetime.timedelta(minutes=20), "20 minutes"),
APIThrottled: (datetime.timedelta(minutes=10), "10 minutes"),
AuthenticationError: (datetime.timedelta(hours=2), "2 hours"),
},
"opensubtitles": {
TooManyRequests: (datetime.timedelta(hours=3), "3 hours"),
@@ -75,6 +77,7 @@ PROVIDER_THROTTLE_MAP = {
"addic7ed": {
DownloadLimitExceeded: (datetime.timedelta(hours=3), "3 hours"),
TooManyRequests: (datetime.timedelta(minutes=5), "5 minutes"),
AuthenticationError: (datetime.timedelta(hours=24), "24 hours"),
}
}
@@ -153,6 +156,7 @@ class Config(object):
anticaptcha_token = None
anticaptcha_cls = None
has_anticaptcha = False
mediainfo_bin = None
store_recently_played_amount = 40
@@ -239,6 +243,8 @@ class Config(object):
self.embedded_auto_extract = cast_bool(Prefs["subtitles.embedded.autoextract"])
self.ietf_as_alpha3 = cast_bool(Prefs["subtitles.language.ietf_normalize"])
self.use_custom_dns = self.parse_custom_dns()
if not self.advanced.dont_use_mediainfo_mp4:
self.mediainfo_bin = self.advanced.mediainfo_bin or find_executable("mediainfo")
self.initialized = True
def migrate_prefs(self):
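
The `mediainfo_bin` lines in the hunk above prefer an explicitly configured path and otherwise look the binary up on the search path, unless the feature is switched off. A rough Python 3 sketch of that resolution order follows. `shutil.which` is used here as a substitute for the plugin's own `find_executable`, and the function name and parameters are hypothetical.

```python
import shutil


def resolve_mediainfo_bin(configured_path, dont_use_mediainfo_mp4, which=shutil.which):
    """Return the mediainfo binary to use, or None.

    configured_path: value of the advanced setting (may be None or empty).
    dont_use_mediainfo_mp4: opt-out flag; when set, no binary is resolved.
    which: lookup function, injectable so the sketch can be tested
           without mediainfo being installed (assumption, not plugin code).
    """
    if dont_use_mediainfo_mp4:
        return None
    # explicit configuration wins; otherwise fall back to a PATH lookup
    return configured_path or which("mediainfo")
```

If neither a configured path nor a PATH lookup yields a result, the function returns `None`, which matches the `mediainfo_bin = None` class default added in the same file.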
@@ -761,6 +767,7 @@ class Config(object):
return {'opensubtitles': cast_bool(Prefs['provider.opensubtitles.enabled']),
# 'thesubdb': Prefs['provider.thesubdb.enabled'],
'podnapisi': cast_bool(Prefs['provider.podnapisi.enabled']),
'napisy24': cast_bool(Prefs['provider.napisy24.enabled']),
'titlovi': cast_bool(Prefs['provider.titlovi.enabled']),
'addic7ed': cast_bool(Prefs['provider.addic7ed.enabled']) and self.has_anticaptcha,
'tvsubtitles': cast_bool(Prefs['provider.tvsubtitles.enabled']),
@@ -801,6 +808,7 @@ class Config(object):
providers["argenteam"] = False
providers["assrt"] = False
providers["subscene"] = False
providers["napisy24"] = False
providers_forced_off = dict(providers)
if not self.unrar and providers["legendastv"]:
@@ -841,11 +849,8 @@ class Config(object):
providers = property(get_providers)
def get_provider_settings(self):
os_use_https = self.advanced.providers.opensubtitles.use_https \
if self.advanced.providers.opensubtitles.use_https is not None else True
os_skip_wrong_fps = self.advanced.providers.opensubtitles.skip_wrong_fps \
if self.advanced.providers.opensubtitles.skip_wrong_fps is not None else True
os_use_https = self.advanced.providers.opensubtitles.get("use_https", True)
os_skip_wrong_fps = self.advanced.providers.opensubtitles.get("skip_wrong_fps", True)
provider_settings = {'addic7ed': {'username': Prefs['provider.addic7ed.username'],
'password': Prefs['provider.addic7ed.password'],
@@ -864,8 +869,18 @@ class Config(object):
'only_foreign': self.forced_only,
'also_foreign': self.forced_also,
},
'titlovi': {
'username': Prefs['provider.titlovi.username'],
'password': Prefs['provider.titlovi.password'],
},
'napisy24': {
'username': Prefs['provider.napisy24.username'],
'password': Prefs['provider.napisy24.password'],
},
'subscene': {
'only_foreign': self.forced_only,
'username': Prefs['provider.subscene.username'],
'password': Prefs['provider.subscene.password'],
},
'legendastv': {'username': Prefs['provider.legendastv.username'],
'password': Prefs['provider.legendastv.password'],
@@ -894,10 +909,10 @@ class Config(object):
throttle_data = PROVIDER_THROTTLE_MAP.get(name, PROVIDER_THROTTLE_MAP["default"]).get(cls, None) or \
PROVIDER_THROTTLE_MAP["default"].get(cls, None)
if not throttle_data:
return
throttle_delta, throttle_description = throttle_data
if throttle_data:
throttle_delta, throttle_description = throttle_data
else:
throttle_delta, throttle_description = datetime.timedelta(minutes=10), "10 minutes"
if "provider_throttle" not in Dict:
Dict["provider_throttle"] = {}
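
The throttle hunk above replaces the early `return` with a generic 10 minute fallback, so an exception type that has no entry in `PROVIDER_THROTTLE_MAP` still throttles the provider. The lookup order can be sketched as below. The exception classes are local placeholders for the ones the plugin imports, and only a subset of the map from the diff is reproduced; the helper name `get_throttle` is hypothetical.

```python
import datetime


# Placeholder exception types (assumption: the plugin imports its own).
class DownloadLimitExceeded(Exception):
    pass


class TooManyRequests(Exception):
    pass


class AuthenticationError(Exception):
    pass


PROVIDER_THROTTLE_MAP = {
    "default": {
        DownloadLimitExceeded: (datetime.timedelta(hours=3), "3 hours"),
        AuthenticationError: (datetime.timedelta(hours=2), "2 hours"),
    },
    "addic7ed": {
        DownloadLimitExceeded: (datetime.timedelta(hours=3), "3 hours"),
        TooManyRequests: (datetime.timedelta(minutes=5), "5 minutes"),
        AuthenticationError: (datetime.timedelta(hours=24), "24 hours"),
    },
}


def get_throttle(name, cls):
    """Provider-specific entry first, then the default map, then 10 minutes."""
    throttle_data = PROVIDER_THROTTLE_MAP.get(name, PROVIDER_THROTTLE_MAP["default"]).get(cls) \
        or PROVIDER_THROTTLE_MAP["default"].get(cls)
    if throttle_data:
        return throttle_data
    # generic fallback for uncaught or unmapped exception types
    return datetime.timedelta(minutes=10), "10 minutes"
```

For example, an `AuthenticationError` from `addic7ed` maps to 24 hours, the same error from a provider without its own entry maps to the default 2 hours, and an unmapped exception such as `ValueError` yields the 10 minute fallback.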
@@ -1083,11 +1098,13 @@ class Config(object):
def parse_custom_dns(self):
custom_dns = Prefs['use_custom_dns2'].strip()
if custom_dns:
os.environ["dns_resolvers"] = ""
if custom_dns and custom_dns != "system":
ips = filter(lambda x: x, [d.strip() for d in custom_dns.split(",")])
if ips:
os.environ["dns_resolvers"] = json.dumps(ips)
return os.environ["dns_resolvers"]
return os.environ["dns_resolvers"]
def init_subliminal_patches(self):
# configure custom subtitle destination folders for scanning pre-existing subs
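
Because this page has lost the `+`/`-` markers, the resulting `parse_custom_dns` has to be inferred. One plausible reading of the new version is: always reset the `dns_resolvers` environment variable, then fill it only when the preference is non-empty and not the literal `"system"`. The sketch below follows that reading. It takes the preference value and the environment mapping as arguments (both hypothetical parameters) so it can run outside the plugin, and it uses a Python 3 list comprehension instead of the Python 2 `filter` call.

```python
import json


def parse_custom_dns(pref_value, environ):
    """Store the configured resolvers as a JSON list in environ["dns_resolvers"].

    An empty preference or the value "system" leaves the variable empty,
    which is assumed to mean "use the system resolver".
    """
    custom_dns = pref_value.strip()
    environ["dns_resolvers"] = ""
    if custom_dns and custom_dns != "system":
        # drop empty entries such as those produced by a trailing comma
        ips = [d.strip() for d in custom_dns.split(",") if d.strip()]
        if ips:
            environ["dns_resolvers"] = json.dumps(ips)
    return environ["dns_resolvers"]
```

Calling it with `"1.1.1.1, 8.8.8.8"` stores a JSON array of both addresses, while `"system"` or a blank value stores an empty string.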
+12 -11
@@ -33,14 +33,14 @@ def get_missing_languages(video, part):
alpha3_map = {}
if config.ietf_as_alpha3:
for language in languages:
if language.country:
if language and language.country:
alpha3_map[language.alpha3] = language.country
language.country = None
have_languages = video.subtitle_languages.copy()
if config.ietf_as_alpha3:
for language in have_languages:
if language.country:
if language and language.country:
alpha3_map[language.alpha3] = language.country
language.country = None
@@ -53,14 +53,14 @@ def get_missing_languages(video, part):
filter(lambda l: not l.forced, video.subtitle_languages)
if langs:
Log.Debug("We have at least one subtitle for any configured language.")
return False
return set()
elif "External subtitle" in config.any_language_is_enough:
langs = video.subtitle_languages if not not_in_forced else \
langs = video.external_subtitle_languages if not not_in_forced else \
filter(lambda l: not l.forced, video.external_subtitle_languages)
if langs:
Log.Debug("We have at least one external subtitle for any configured language.")
return False
return set()
# all languages are found if we either really have subs for all languages or we only want to have exactly one language
# and we've only found one (the case for a selected language, Prefs['subtitles.only_one'] (one found sub matches any language))
@@ -70,7 +70,7 @@ def get_missing_languages(video, part):
Log.Debug('Only one language was requested, and we\'ve got a subtitle for %s', video)
else:
Log.Debug('All languages %r exist for %s', languages, video)
return False
return set()
# re-add country codes to the missing languages, in case we've removed them above
if config.ietf_as_alpha3:
@@ -106,21 +106,22 @@ def language_hook(provider):
def download_best_subtitles(video_part_map, min_score=0, throttle_time=None, providers=None):
hearing_impaired = Prefs['subtitles.search.hearingImpaired']
languages = set([Language.rebuild(l) for l in config.lang_list])
missing_languages = []
if not languages:
return
use_videos = []
missing_languages = set()
for video, part in video_part_map.iteritems():
if not video.ignore_all:
missing_languages = get_missing_languages(video, part)
p_missing_languages = get_missing_languages(video, part)
else:
missing_languages = languages
p_missing_languages = languages
if missing_languages:
Log.Info(u"%s has missing languages: %s", os.path.basename(video.name), missing_languages)
if p_missing_languages:
Log.Info(u"%s has missing languages: %s", os.path.basename(video.name), p_missing_languages)
refine_video(video, refiner_settings=config.refiner_settings)
use_videos.append(video)
missing_languages.update(p_missing_languages)
# prepare blacklist
blacklist = get_blacklist_from_part_map(video_part_map, languages)
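Review note: the hunks above make `get_missing_languages` return an empty `set()` instead of `False` and accumulate per-video results with `update()` rather than overwriting `missing_languages` on every iteration. A minimal sketch of the accumulation pattern, using plain strings in place of `Language` objects (an assumption for illustration):

```python
def collect_missing(per_video_missing):
    """Union the missing languages of all videos that miss at least one."""
    missing_languages = set()
    use_videos = []
    for name, p_missing in per_video_missing.items():
        if p_missing:  # an empty set is falsy, just like the old False
            use_videos.append(name)
            missing_languages.update(p_missing)
    return use_videos, missing_languages
```

Returning a set in every branch keeps `update()` safe; `set().update(False)` would raise a `TypeError`.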
+3 -10
@@ -394,7 +394,7 @@ def get_language_from_stream(lang_code):
return Language.fromietf(lang)
elif lang:
try:
return language_from_stream(lang)
return language_from_stream(lang_code)
except LanguageError:
pass
@@ -437,17 +437,10 @@ def get_language(lang_short):
def display_language(l):
if not l:
return "Unknown"
return _(str(l.basename).lower()) + ((u" (%s)" % _("forced")) if l.forced else "")
def is_stream_forced(stream):
stream_title = getattr(stream, "title", "") or ""
forced = getattr(stream, "forced", False)
if not forced and stream_title and "forced" in stream_title.strip().lower():
forced = True
return forced
class PartUnknownException(Exception):
pass
+2 -1
@@ -7,6 +7,7 @@ import helpers
import subtitlehelpers
from config import config as sz_config
from subzero.language import ENDSWITH_LANGUAGECODE_RE
SECONDARY_TAGS = ['forced', 'normal', 'default', 'embedded', 'embedded-forced', 'custom', 'hi', 'cc', 'sdh']
@@ -125,7 +126,7 @@ def find_subtitles(part, ignore_parts_cleanup=None):
root = split_tag[0]
# get associated media file name without language
sub_fn = subtitlehelpers.ENDSWITH_LANGUAGECODE_RE.sub("", root)
sub_fn = ENDSWITH_LANGUAGECODE_RE.sub("", root)
# subtitle basename and basename without possible language tag not found in collected
# media files? kill.
+3 -2
@@ -7,7 +7,8 @@ import os
from babelfish import LanguageReverseError
from support.config import config, TEXT_SUBTITLE_EXTS
from support.helpers import get_plex_item_display_title, cast_bool, get_language_from_stream, is_stream_forced
from support.helpers import get_plex_item_display_title, cast_bool, get_language_from_stream
from support.plex_media import is_stream_forced, update_stream_info
from support.items import get_item
from support.lib import Plex
from support.storage import get_subtitle_storage
@@ -35,7 +36,7 @@ def item_discover_missing_subs(rating_key, kind="show", added_at=None, section_t
for media in item.media:
existing_subs = {"internal": [], "external": [], "own_external": [], "count": 0}
for part in media.parts:
update_stream_info(part)
# did we already download an external subtitle before?
if subtitle_target_dir and stored_subs:
for language in languages_set:
+86 -27
@@ -1,6 +1,7 @@
# coding=utf-8
import os
import subprocess
import helpers
from items import get_item
@@ -26,6 +27,9 @@ tvdb_guid_identifier = "com.plexapp.agents.thetvdb://"
def get_plexapi_stream_info(plex_item, part_id=None):
if not plex_item:
return
d = {"stream": {}}
data = d["stream"]
@@ -100,6 +104,9 @@ def media_to_videos(media, kind="series"):
plex_episode = get_item(ep.id)
stream_info = get_plexapi_stream_info(plex_episode)
if not stream_info:
continue
for item in media.seasons[season].episodes[episode].items:
for part in item.parts:
videos.append(
@@ -121,22 +128,24 @@ def media_to_videos(media, kind="series"):
)
else:
stream_info = get_plexapi_stream_info(plex_item)
imdb_id = None
if imdb_guid_identifier in media.guid:
imdb_id = media.guid[len(imdb_guid_identifier):].split("?")[0]
for item in media.items:
for part in item.parts:
videos.append(
get_metadata_dict(plex_item, part, dict(stream_info, **{"plex_part": part, "type": "movie",
"title": media.title, "id": media.id,
"super_thumb": plex_item.thumb,
"series_id": None, "year": year,
"season_id": None, "imdb_id": imdb_id,
"original_title": original_title,
"series_tvdb_id": None, "tvdb_id": None,
"section": plex_item.section.title})
)
)
if stream_info:
imdb_id = None
if imdb_guid_identifier in media.guid:
imdb_id = media.guid[len(imdb_guid_identifier):].split("?")[0]
for item in media.items:
for part in item.parts:
videos.append(
get_metadata_dict(plex_item, part, dict(stream_info, **{"plex_part": part, "type": "movie",
"title": media.title, "id": media.id,
"super_thumb": plex_item.thumb,
"series_id": None, "year": year,
"season_id": None, "imdb_id": imdb_id,
"original_title": original_title,
"series_tvdb_id": None, "tvdb_id": None,
"section": plex_item.section.title})
)
)
return videos
@@ -174,42 +183,89 @@ def get_all_parts(plex_item):
return parts
def update_stream_info(part):
if config.mediainfo_bin and part.container == "mp4":
cmdline = '%s --Inform="Text;-%%ID%%_%%Title%%" %s' % (config.mediainfo_bin, helpers.quote(part.file))
result = subprocess.check_output(cmdline, stderr=subprocess.PIPE, shell=True)
if result:
try:
stream_titles = {}
for pair in result[1:].split("-"):
sid, title = pair.split("_")
stream_titles[int(sid.strip())] = title.strip()
except:
pass
else:
filled = []
for stream in part.streams:
index = stream.index+1
if index in stream_titles:
stream.title = stream_titles[index]
filled.append(index-1)
if filled:
Log.Debug("Filled missing MP4 stream title info for streams: %s", filled)
def is_stream_forced(stream):
stream_title = getattr(stream, "title", "") or ""
forced = getattr(stream, "forced", False)
if not forced and stream_title and "forced" in stream_title.strip().lower():
forced = True
return forced
def get_embedded_subtitle_streams(part, requested_language=None, skip_duplicate_unknown=True, skip_unknown=False):
streams = []
streams_unknown = []
all_streams = []
has_unknown = False
found_requested_language = False
update_stream_info(part)
for stream in part.streams:
# subtitle stream
if stream.stream_type == 3 and not stream.stream_key and stream.codec in TEXT_SUBTITLE_EXTS:
is_forced = helpers.is_stream_forced(stream)
is_forced = is_stream_forced(stream)
language = helpers.get_language_from_stream(stream.language_code)
if language:
language = Language.rebuild(language, forced=is_forced)
is_unknown = False
found_requested_language = requested_language and requested_language == language
stream_data = None
if not language and config.treat_und_as_first:
if not language:
# only consider first unknown subtitle stream
if has_unknown and skip_duplicate_unknown:
continue
if config.treat_und_as_first:
if has_unknown and skip_duplicate_unknown:
Log.Debug("skipping duplicate unknown")
continue
language = Language.rebuild(list(config.lang_list)[0], forced=is_forced)
language = Language.rebuild(list(config.lang_list)[0], forced=is_forced)
else:
language = None
is_unknown = True
has_unknown = True
streams_unknown.append({"stream": stream, "is_unknown": is_unknown, "language": language,
"is_forced": is_forced})
stream_data = {"stream": stream, "is_unknown": is_unknown, "language": language,
"is_forced": is_forced}
streams_unknown.append(stream_data)
if not requested_language or found_requested_language:
streams.append({"stream": stream, "is_unknown": is_unknown, "language": language,
"is_forced": is_forced})
stream_data = {"stream": stream, "is_unknown": is_unknown, "language": language,
"is_forced": is_forced}
streams.append(stream_data)
if found_requested_language:
break
if streams_unknown and not found_requested_language and not skip_unknown:
streams = streams_unknown
if stream_data:
all_streams.append(stream_data)
if requested_language:
if streams_unknown and not found_requested_language and not skip_unknown:
streams = streams_unknown
else:
streams = all_streams
return streams
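Review note: `update_stream_info` asks mediainfo for `-%ID%_%Title%` per text stream and splits the result on `-` and `_`. A sketch of that parsing plus the title-based forced check, assuming an input string shaped like the template (the sample strings in the usage note are made up, not captured mediainfo output). Unlike the diff, the sketch splits on the first `_` only, so titles containing underscores survive; titles containing `-` would still break in both versions.

```python
def parse_stream_titles(result):
    """Map stream ID -> title from a '-<id>_<title>-<id>_<title>' string."""
    stream_titles = {}
    for pair in result[1:].split("-"):
        sid, title = pair.split("_", 1)  # maxsplit=1 is a deviation from the diff
        stream_titles[int(sid.strip())] = title.strip()
    return stream_titles


def is_forced_title(title, forced_flag=False):
    """Mirror of is_stream_forced for a bare title string."""
    return bool(forced_flag or (title and "forced" in title.strip().lower()))
```

For example, `parse_stream_titles("-3_English-4_English (Forced)")` gives IDs 3 and 4, and only the second title is treated as forced.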
@@ -245,6 +301,9 @@ def get_plex_metadata(rating_key, part_id, item_type, plex_item=None):
stream_info = get_plexapi_stream_info(plex_item, part_id)
if not stream_info:
return
# get normalized metadata
# fixme: duplicated logic of media_to_videos
if item_type == "episode":
+10 -5
@@ -4,7 +4,7 @@ import helpers
from babelfish.exceptions import LanguageError
from support.lib import Plex, get_intent
from support.plex_media import get_stream_fps
from support.plex_media import get_stream_fps, is_stream_forced, update_stream_info
from support.storage import get_subtitle_storage
from support.config import config, TEXT_SUBTITLE_EXTS
from support.subtitlehelpers import get_subtitles_from_metadata
@@ -46,23 +46,25 @@ def prepare_video(pms_video_info, ignore_all=False, hints=None, rating_key=None,
# fixme: skip the whole scanning process if known_embedded == wanted languages?
audio_languages = []
if plexpy_part:
update_stream_info(plexpy_part)
for stream in plexpy_part.streams:
if stream.stream_type == 2:
lang = None
try:
lang = language_from_stream(stream.language_code)
except LanguageError:
Log.Debug("Couldn't detect embedded audio stream language: %s", stream.language_code)
Log.Info("Couldn't detect embedded audio stream language: %s", stream.language_code)
# treat unknown language as lang1?
if not lang and config.treat_und_as_first:
lang = Language.rebuild(list(config.lang_list)[0])
Log.Info("Assuming language %s for audio stream: %s", lang, getattr(stream, "index", None))
audio_languages.append(lang)
# subtitle stream
elif stream.stream_type == 3 and embedded_subtitles:
is_forced = helpers.is_stream_forced(stream)
is_forced = is_stream_forced(stream)
if ((config.forced_only or config.forced_also) and is_forced) or not is_forced:
# embedded subtitle
@@ -73,11 +75,13 @@ def prepare_video(pms_video_info, ignore_all=False, hints=None, rating_key=None,
try:
lang = language_from_stream(stream.language_code)
except LanguageError:
Log.Debug("Couldn't detect embedded subtitle stream language: %s", stream.language_code)
Log.Info("Couldn't detect embedded subtitle stream language: %s", stream.language_code)
# treat unknown language as lang1?
if not lang and config.treat_und_as_first:
lang = Language.rebuild(list(config.lang_list)[0])
Log.Info("Assuming language %s for subtitle stream: %s", lang,
getattr(stream, "index", None))
if lang:
if is_forced:
@@ -127,7 +131,8 @@ def prepare_video(pms_video_info, ignore_all=False, hints=None, rating_key=None,
set_existing_languages(video, pms_video_info, external_subtitles=external_subtitles,
embedded_subtitles=embedded_subtitles, known_embedded=known_embedded,
stored_subs=stored_subs, languages=config.lang_list,
only_one=config.only_one, known_metadata_subs=known_metadata_subs)
only_one=config.only_one, known_metadata_subs=known_metadata_subs,
match_strictness=config.ext_match_strictness)
# add video fps info
video.fps = plex_part.fps
+2 -14
@@ -5,6 +5,7 @@ import helpers
from config import config, SUBTITLE_EXTS, TEXT_SUBTITLE_EXTS
from bs4 import UnicodeDammit
from subzero.language import match_ietf_language
class SubtitleHelper(object):
@@ -85,19 +86,6 @@ class VobSubSubtitleHelper(SubtitleHelper):
#####################################################################################################################
IETF_MATCH = ".+\.([^-.]+)(?:-[A-Za-z]+)?$"
ENDSWITH_LANGUAGECODE_RE = re.compile("\.([^-.]{2,3})(?:-[A-Za-z]{2,})?$")
def match_ietf_language(s):
language_match = re.match(".+\.([^\.]+)$" if not helpers.cast_bool(Prefs["subtitles.language.ietf_display"])
else IETF_MATCH, s)
if language_match and len(language_match.groups()) == 1:
language = language_match.groups()[0]
return language
return s
class DefaultSubtitleHelper(SubtitleHelper):
@classmethod
def is_helper_for(cls, filename):
@@ -133,7 +121,7 @@ class DefaultSubtitleHelper(SubtitleHelper):
# Attempt to extract the language from the filename (e.g. Avatar (2009).eng)
# IETF support thanks to
# https://github.com/hpsbranco/LocalMedia.bundle/commit/4fad9aefedece78a1fa96401304351347f644369
lang_part = match_ietf_language(file)
lang_part = match_ietf_language(file, ietf=helpers.cast_bool(Prefs["subtitles.language.ietf_display"]))
if lang_part != file:
language = Locale.Language.Match(lang_part)
elif config.only_one:
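Review note: `match_ietf_language` moves to `subzero.language` and now takes the IETF switch as an argument instead of reading `Prefs` itself. The new implementation is not part of this diff; the sketch below is reconstructed from the removed lines, with `PLAIN_MATCH` as a name introduced here for the inline pattern.

```python
import re

IETF_MATCH = r".+\.([^-.]+)(?:-[A-Za-z]+)?$"
PLAIN_MATCH = r".+\.([^\.]+)$"  # hypothetical name for the pattern used when ietf is off


def match_ietf_language(s, ietf=False):
    """Return the trailing language tag of s, or s itself if none is found."""
    language_match = re.match(IETF_MATCH if ietf else PLAIN_MATCH, s)
    if language_match and len(language_match.groups()) == 1:
        return language_match.groups()[0]
    return s
```

With `ietf=True` the region suffix is matched but not captured, so `"Avatar (2009).pt-BR"` yields `"pt"`; with `ietf=False` the same input yields `"pt-BR"`. Callers detect "no tag" by comparing the result with the input, as the `lang_part != file` check above does.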
+51 -3
@@ -288,7 +288,7 @@
},
{
"id": "anticaptcha.service",
"label": "AntiCaptcha-Service (needs paid account; enables Addic7ed, titlovi)",
"label": "AntiCaptcha-Service (needs paid account; enables Addic7ed)",
"type": "enum",
"values": [
"none",
@@ -335,6 +335,26 @@
"type": "bool",
"default": "true"
},
{
"id": "provider.napisy24.enabled",
"label": "Provider: Enable Napisy24 (pl)",
"type": "bool",
"default": "false"
},
{
"id": "provider.napisy24.username",
"label": "Napisy24 Username",
"type": "text",
"default": ""
},
{
"id": "provider.napisy24.password",
"label": "Napisy24 Password",
"type": "text",
"option": "hidden",
"default": "",
"secure": "true"
},
{
"id": "provider.addic7ed.enabled",
"label": "Provider: Enable Addic7ed (needs AntiCaptcha)",
@@ -389,10 +409,24 @@
},
{
"id": "provider.titlovi.enabled",
"label": "Provider: Enable Titlovi.com (might need AntiCaptcha)",
"label": "Provider: Enable Titlovi.com (User and Password required)",
"type": "bool",
"default": "true"
},
{
"id": "provider.titlovi.username",
"label": "Titlovi Username",
"type": "text",
"default": ""
},
{
"id": "provider.titlovi.password",
"label": "Titlovi Password",
"type": "text",
"option": "hidden",
"default": "",
"secure": "true"
},
{
"id": "provider.legendastv.enabled",
"label": "Provider: Enable Legendas TV (mostly pt-BR; UNRAR NEEDED)",
@@ -431,6 +465,20 @@
"type": "bool",
"default": "true"
},
{
"id": "provider.subscene.username",
"label": "SubScene Username",
"type": "text",
"default": ""
},
{
"id": "provider.subscene.password",
"label": "SubScene Password",
"type": "text",
"option": "hidden",
"default": "",
"secure": "true"
},
{
"id": "provider.supersubtitles.enabled",
"label": "Provider: Enable feliratok.info (Hungarian)",
@@ -861,7 +909,7 @@
},
{
"id": "use_custom_dns2",
"label": "Use custom DNS (IPs, comma-separated, leave empty for system DNS. Default: Google/CF)",
"label": "Use custom DNS (IPs, comma-separated, set to 'system' for system DNS. Default: Google/CF)",
"type": "text",
"default": "1.1.1.1, 8.8.8.8"
},
+3 -3
@@ -13,7 +13,7 @@
<key>CFBundleSignature</key>
<string>????</string>
<key>CFBundleVersion</key>
<string>2.6.5.3023</string>
<string>2.6.5.3183</string>
<key>PlexFrameworkVersion</key>
<string>2</string>
<key>PlexPluginClass</key>
@@ -23,7 +23,7 @@
<key>PlexPluginConsoleLogging</key>
<string>0</string>
<key>PlexPluginDevMode</key>
<string>1</string>
<string>0</string>
<key>PlexPluginCodePolicy</key>
<!-- this allows channels to access some python methods which are otherwise blocked, as well as import external code libraries, and interact with the PMS HTTP API -->
<string>Elevated</string>
@@ -32,7 +32,7 @@
&lt;h1&gt;Sub-Zero for Plex&lt;/h1&gt;&lt;i&gt;Subtitles done right&lt;/i&gt;
Version 2.6.5.3023 DEV
Version 2.6.5.3183
Originally based on @bramwalet's awesome &lt;a href=&quot;https://github.com/bramwalet/Subliminal.bundle&quot;&gt;Subliminal.bundle&lt;/a&gt;
@@ -1,392 +0,0 @@
# coding=utf-8
import logging
import random
import re
import os
import json
import base64
from copy import deepcopy
from time import sleep
from collections import OrderedDict
from .jsfuck import jsunfuck
import js2py
from requests.sessions import Session
from subliminal_patch.pitcher import pitchers
try:
from requests_toolbelt.utils import dump
except ImportError:
pass
try:
from urlparse import urlparse
from urlparse import urlunparse
except ImportError:
from urllib.parse import urlparse
from urllib.parse import urlunparse
brotli_available = True
try:
from brotli import decompress as brdec
except:
brotli_available = False
logger = logging.getLogger(__name__)
__version__ = "2.0.3"
# Originally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT
DEFAULT_USER_AGENTS = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
"Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
"Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
"Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
]
BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""
cur_path = os.path.abspath(os.path.dirname(__file__))
if brotli_available:
brwsrs = os.path.join(cur_path, "browsers_br.json")
with open(brwsrs, "r") as f:
UA_COMBO = json.load(f, object_pairs_hook=OrderedDict)["chrome"]
else:
brwsrs = os.path.join(cur_path, "browsers.json")
UA_COMBO = []
with open(brwsrs, "r") as f:
_brwsrs = json.load(f, object_pairs_hook=OrderedDict)
for entry in _brwsrs:
_entry = OrderedDict(("-".join(a.capitalize() for a in key.split("-")), value)
for key, value in entry.iteritems())
_entry["User-Agent"] = None
UA_COMBO.append({"User-Agent": [entry["user-agent"]], "headers": _entry})
class NeedsCaptchaException(Exception):
pass
class CloudflareScraper(Session):
def __init__(self, *args, **kwargs):
self.delay = kwargs.pop('delay', 8)
self.debug = False
self._ua = None
self._hdrs = None
super(CloudflareScraper, self).__init__(*args, **kwargs)
if not self._ua:
# Set a random User-Agent if no custom User-Agent has been set
ua_combo = random.choice(UA_COMBO)
self._ua = random.choice(ua_combo["User-Agent"])
self._hdrs = ua_combo["headers"].copy()
self._hdrs["User-Agent"] = self._ua
self.headers['User-Agent'] = self._ua
def set_cloudflare_challenge_delay(self, delay):
if isinstance(delay, (int, float)) and delay > 0:
self.delay = delay
def is_cloudflare_challenge(self, resp):
if resp.headers.get('Server', '').startswith('cloudflare'):
if b'why_captcha' in resp.content or b'/cdn-cgi/l/chk_captcha' in resp.content:
raise NeedsCaptchaException
return (
resp.status_code in [429, 503]
and b"jschl_vc" in resp.content
and b"jschl_answer" in resp.content
)
return False
def debugRequest(self, req):
try:
print (dump.dump_all(req).decode('utf-8'))
except:
pass
def request(self, method, url, *args, **kwargs):
# self.headers = (
# OrderedDict(
# [
# ('User-Agent', self.headers['User-Agent']),
# ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
# ('Accept-Language', 'en-US,en;q=0.5'),
# ('Accept-Encoding', 'gzip, deflate'),
# ('Connection', 'close'),
# ('Upgrade-Insecure-Requests', '1')
# ]
# )
# )
self.headers = self._hdrs.copy()
resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
if resp.headers.get('content-encoding') == 'br' and brotli_available:
resp._content = brdec(resp._content)
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
try:
if self.is_cloudflare_challenge(resp):
# Work around if the initial request is not a GET,
# Supersede with a GET then re-request the original METHOD.
if resp.request.method != 'GET':
self.request('GET', resp.url)
resp = self.request(method, url, *args, **kwargs)
else:
resp = self.solve_cf_challenge(resp, **kwargs)
except NeedsCaptchaException:
# solve the captcha
site_key = re.search(r'data-sitekey="(.+?)"', resp.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', resp.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', resp.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("cf: Captcha site-key not found!")
pitcher = pitchers.get_pitcher()("cf", resp.request.url, site_key,
user_agent=self.headers["User-Agent"],
cookies=self.cookies.get_dict(),
is_invisible=True)
logger.info("cf: Solving captcha")
result = pitcher.throw()
if not result:
raise Exception("cf: Couldn't solve captcha!")
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_captcha'.format(parsed_url.scheme, domain)
method = resp.request.method
cloudflare_kwargs = {
'allow_redirects': False,
'headers': {'Referer': resp.url},
'params': OrderedDict(
[
('s', challenge_s),
('g-recaptcha-response', result)
]
)
}
return self.request(method, submit_url, **cloudflare_kwargs)
return resp
def solve_cf_challenge(self, resp, **original_kwargs):
body = resp.text
# Cloudflare requires a delay before solving the challenge
if self.delay == 8:
try:
delay = float(re.search(r'submit\(\);\r?\n\s*},\s*([0-9]+)', body).group(1)) / float(1000)
if isinstance(delay, (int, float)):
self.delay = delay
except:
pass
sleep(self.delay)
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_jschl'.format(parsed_url.scheme, domain)
cloudflare_kwargs = deepcopy(original_kwargs)
headers = cloudflare_kwargs.setdefault('headers', {'Referer': resp.url})
try:
params = cloudflare_kwargs.setdefault(
'params', OrderedDict(
[
('s', re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')),
('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
('pass', re.search(r'name="pass" value="(.+?)"', body).group(1)),
]
)
)
except Exception as e:
# Something is wrong with the page.
# This may indicate Cloudflare has changed their anti-bot
# technique. If you see this and are running the latest version,
# please open a GitHub issue so I can update the code accordingly.
raise ValueError("Unable to parse Cloudflare anti-bots page: {} {}".format(e.message, BUG_REPORT))
# Solve the Javascript challenge
params['jschl_answer'] = self.solve_challenge(body, domain)
# Requests transforms any request into a GET after a redirect,
# so the redirect has to be handled manually here to allow for
# performing other types of requests even as the first request.
method = resp.request.method
cloudflare_kwargs['allow_redirects'] = False
redirect = self.request(method, submit_url, **cloudflare_kwargs)
redirect_location = urlparse(redirect.headers['Location'])
if not redirect_location.netloc:
redirect_url = urlunparse(
(
parsed_url.scheme,
domain,
redirect_location.path,
redirect_location.params,
redirect_location.query,
redirect_location.fragment
)
)
return self.request(method, redirect_url, **original_kwargs)
return self.request(method, redirect.headers['Location'], **original_kwargs)
def solve_challenge(self, body, domain):
try:
js = re.search(
r"setTimeout\(function\(\){\s+(var s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n",
body
).group(1)
except Exception:
raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. {}".format(BUG_REPORT))
js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
js = re.sub(r'(e\s=\sfunction\(s\)\s{.*?};)', '', js, flags=re.DOTALL|re.MULTILINE)
js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))
js = js.replace('; 121', '')
# Strip characters that could be used to exit the string context
# These characters are not currently used in Cloudflare's arithmetic snippet
js = re.sub(r"[\n\\']", "", js)
if 'toFixed' not in js:
raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. {}".format(BUG_REPORT))
try:
jsEnv = """
var t = "{domain}";
var g = String.fromCharCode;
o = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
e = function(s) {{
s += "==".slice(2 - (s.length & 3));
var bm, r = "", r1, r2, i = 0;
for (; i < s.length;) {{
bm = o.indexOf(s.charAt(i++)) << 18 | o.indexOf(s.charAt(i++)) << 12 | (r1 = o.indexOf(s.charAt(i++))) << 6 | (r2 = o.indexOf(s.charAt(i++)));
r += r1 === 64 ? g(bm >> 16 & 255) : r2 === 64 ? g(bm >> 16 & 255, bm >> 8 & 255) : g(bm >> 16 & 255, bm >> 8 & 255, bm & 255);
}}
return r;
}};
function italics (str) {{ return '<i>' + this + '</i>'; }};
var document = {{
getElementById: function () {{
return {{'innerHTML': '{innerHTML}'}};
}}
}};
{js}
"""
innerHTML = re.search(
'<div(?: [^<>]*)? id="([^<>]*?)">([^<>]*?)<\/div>',
body,
re.MULTILINE | re.DOTALL
)
innerHTML = innerHTML.group(2).replace("'", r"\'") if innerHTML else ""
js = jsunfuck(jsEnv.format(domain=domain, innerHTML=innerHTML, js=js))
def atob(s):
return base64.b64decode('{}'.format(s)).decode('utf-8')
js2py.disable_pyimport()
context = js2py.EvalJs({'atob': atob})
result = context.eval(js)
except Exception:
logging.error("Error executing Cloudflare IUAM Javascript. {}".format(BUG_REPORT))
raise
try:
float(result)
except Exception:
raise ValueError("Cloudflare IUAM challenge returned unexpected answer. {}".format(BUG_REPORT))
return result
@classmethod
def create_scraper(cls, sess=None, **kwargs):
"""
Convenience function for creating a ready-to-go CloudflareScraper object.
"""
scraper = cls(**kwargs)
if sess:
attrs = ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']
for attr in attrs:
val = getattr(sess, attr, None)
if val:
setattr(scraper, attr, val)
return scraper
# Functions for integrating cloudflare-scrape with other applications and scripts
@classmethod
def get_tokens(cls, url, user_agent=None, debug=False, **kwargs):
scraper = cls.create_scraper()
scraper.debug = debug
if user_agent:
scraper.headers['User-Agent'] = user_agent
try:
resp = scraper.get(url, **kwargs)
resp.raise_for_status()
except Exception as e:
logging.error("'{}' returned an error. Could not collect tokens.".format(url))
raise
domain = urlparse(resp.url).netloc
cookie_domain = None
for d in scraper.cookies.list_domains():
if d.startswith('.') and d in ('.{}'.format(domain)):
cookie_domain = d
break
else:
raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")
return (
{
'__cfduid': scraper.cookies.get('__cfduid', '', domain=cookie_domain),
'cf_clearance': scraper.cookies.get('cf_clearance', '', domain=cookie_domain)
},
scraper.headers['User-Agent']
)
@classmethod
def get_cookie_string(cls, url, user_agent=None, debug=False, **kwargs):
"""
Convenience function for building a Cookie HTTP header value.
"""
tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, debug=debug, **kwargs)
return "; ".join("=".join(pair) for pair in tokens.items()), user_agent
create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string
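Review note: the removed `solve_cf_challenge` follows the challenge redirect by hand and rebuilds an absolute URL when the `Location` header is relative. A small sketch of just that step, written for Python 3 only (the removed file also carried a Python 2 `urlparse` fallback); `absolute_redirect` is a name introduced here, not one from the file.

```python
from urllib.parse import urlparse, urlunparse


def absolute_redirect(original_url, location):
    """Resolve a possibly host-less Location header against the original URL."""
    parsed_url = urlparse(original_url)
    redirect_location = urlparse(location)
    if not redirect_location.netloc:
        # keep scheme and host of the original request, take the rest from Location
        return urlunparse((
            parsed_url.scheme,
            parsed_url.netloc,
            redirect_location.path,
            redirect_location.params,
            redirect_location.query,
            redirect_location.fragment,
        ))
    return location
```

The standard library's `urljoin` covers more cases (for instance relative paths without a leading slash); the sketch stays close to what the removed code did.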
@@ -1,80 +0,0 @@
[
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"user-agent": "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.102 Safari/537.36",
"accept-encoding": "gzip,deflate",
"accept-language": "en-US,en;q=0.8"
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"user-agent": "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36",
"accept-encoding": "gzip,deflate",
"accept-language": "en-US,en;q=0.8"
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36",
"accept-language": "en-US,en;q=0.8",
"accept-encoding": "gzip, deflate, "
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36",
"accept-language": "en-US,en;q=0.8",
"accept-encoding": "gzip, deflate, "
},
{
"connection": "close",
"accept": "*/*",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:30.0) Gecko/20100101 Firefox/30.0"
},
{
"connection": "close",
"accept": "image/jpeg, image/gif, image/pjpeg, application/x-ms-application, application/xaml+xml, application/x-ms-xbap, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"accept": "text/html, application/xhtml+xml, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"accept": "text/html, application/xhtml+xml, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
"accept-encoding": "gzip, deflate",
"dnt": "1"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
}
]
@@ -0,0 +1,311 @@
import logging
import re
import sys
import ssl
from copy import deepcopy
from time import sleep
from collections import OrderedDict
from requests.sessions import Session
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.ssl_ import create_urllib3_context
from .interpreters import JavaScriptInterpreter
from .user_agent import User_Agent
try:
from requests_toolbelt.utils import dump
except ImportError:
pass
try:
import brotli
except ImportError:
pass
try:
from urlparse import urlparse
from urlparse import urlunparse
except ImportError:
from urllib.parse import urlparse
from urllib.parse import urlunparse
##########################################################################################################################################################
__version__ = '1.1.9'
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
class CipherSuiteAdapter(HTTPAdapter):
def __init__(self, cipherSuite=None, **kwargs):
self.cipherSuite = cipherSuite
if hasattr(ssl, 'PROTOCOL_TLS'):
self.ssl_context = create_urllib3_context(
ssl_version=getattr(ssl, 'PROTOCOL_TLSv1_3', ssl.PROTOCOL_TLSv1_2),
ciphers=self.cipherSuite
)
else:
self.ssl_context = create_urllib3_context(ssl_version=ssl.PROTOCOL_TLSv1)
super(CipherSuiteAdapter, self).__init__(**kwargs)
##########################################################################################################################################################
def init_poolmanager(self, *args, **kwargs):
kwargs['ssl_context'] = self.ssl_context
return super(CipherSuiteAdapter, self).init_poolmanager(*args, **kwargs)
##########################################################################################################################################################
def proxy_manager_for(self, *args, **kwargs):
kwargs['ssl_context'] = self.ssl_context
return super(CipherSuiteAdapter, self).proxy_manager_for(*args, **kwargs)
##########################################################################################################################################################
class CloudScraper(Session):
def __init__(self, *args, **kwargs):
self.debug = kwargs.pop('debug', False)
self.delay = kwargs.pop('delay', None)
self.interpreter = kwargs.pop('interpreter', 'js2py')
self.allow_brotli = kwargs.pop('allow_brotli', 'brotli' in sys.modules)
self.cipherSuite = None
super(CloudScraper, self).__init__(*args, **kwargs)
if 'requests' in self.headers['User-Agent']:
# Set a random User-Agent if no custom User-Agent has been set
self.headers = User_Agent(allow_brotli=self.allow_brotli).headers
self.mount('https://', CipherSuiteAdapter(self.loadCipherSuite()))
##########################################################################################################################################################
@staticmethod
def debugRequest(req):
try:
print(dump.dump_all(req).decode('utf-8'))
except: # noqa
pass
##########################################################################################################################################################
def loadCipherSuite(self):
if self.cipherSuite:
return self.cipherSuite
self.cipherSuite = ''
if hasattr(ssl, 'PROTOCOL_TLS'):
ciphers = [
'ECDHE-ECDSA-AES128-GCM-SHA256', 'ECDHE-RSA-AES128-GCM-SHA256', 'ECDHE-ECDSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES256-GCM-SHA384', 'ECDHE-ECDSA-CHACHA20-POLY1305-SHA256', 'ECDHE-RSA-CHACHA20-POLY1305-SHA256',
'ECDHE-RSA-AES128-CBC-SHA', 'ECDHE-RSA-AES256-CBC-SHA', 'RSA-AES128-GCM-SHA256', 'RSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES128-GCM-SHA256', 'RSA-AES256-SHA', '3DES-EDE-CBC'
]
if hasattr(ssl, 'PROTOCOL_TLSv1_3'):
# Prepend the TLS 1.3 candidates one by one so every entry stays a string for set_ciphers().
ciphers[0:0] = ['GREASE_3A', 'GREASE_6A', 'AES128-GCM-SHA256', 'AES256-GCM-SHA256', 'AES256-GCM-SHA384', 'CHACHA20-POLY1305-SHA256']
ctx = ssl.SSLContext(getattr(ssl, 'PROTOCOL_TLSv1_3', ssl.PROTOCOL_TLSv1_2))
for cipher in ciphers:
try:
ctx.set_ciphers(cipher)
self.cipherSuite = '{}:{}'.format(self.cipherSuite, cipher).rstrip(':')
except ssl.SSLError:
pass
return self.cipherSuite
##########################################################################################################################################################
def request(self, method, url, *args, **kwargs):
ourSuper = super(CloudScraper, self)
resp = ourSuper.request(method, url, *args, **kwargs)
if resp.headers.get('Content-Encoding') == 'br':
if self.allow_brotli and resp._content:
resp._content = brotli.decompress(resp.content)
else:
logging.warning('Brotli content detected, but the allow_brotli option is disabled; returning the response undecoded.')
return resp
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
if self.isChallengeRequest(resp):
if resp.request.method != 'GET':
# Work around if the initial request is not a GET,
# Supersede with a GET then re-request the original METHOD.
self.request('GET', resp.url)
resp = ourSuper.request(method, url, *args, **kwargs)
else:
# Solve Challenge
resp = self.sendChallengeResponse(resp, **kwargs)
return resp
##########################################################################################################################################################
@staticmethod
def isChallengeRequest(resp):
if resp.headers.get('Server', '').startswith('cloudflare'):
if b'why_captcha' in resp.content or b'/cdn-cgi/l/chk_captcha' in resp.content:
raise ValueError('Captcha')
return (
resp.status_code in [429, 503]
and all(s in resp.content for s in [b'jschl_vc', b'jschl_answer'])
)
return False
##########################################################################################################################################################
def sendChallengeResponse(self, resp, **original_kwargs):
body = resp.text
# Cloudflare requires a delay before solving the challenge
if not self.delay:
try:
delay = float(re.search(r'submit\(\);\r?\n\s*},\s*([0-9]+)', body).group(1)) / float(1000)
if isinstance(delay, (int, float)):
self.delay = delay
except: # noqa
pass
sleep(self.delay)
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_jschl'.format(parsed_url.scheme, domain)
cloudflare_kwargs = deepcopy(original_kwargs)
try:
params = OrderedDict()
s = re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body)
if s:
params['s'] = s.group('s_value')
params.update(
[
('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
('pass', re.search(r'name="pass" value="(.+?)"', body).group(1))
]
)
params = cloudflare_kwargs.setdefault('params', params)
except Exception as e:
# Format the exception itself: `e.message` does not exist on Python 3.
raise ValueError('Unable to parse Cloudflare anti-bots page: {} {}'.format(e, BUG_REPORT))
# Solve the Javascript challenge
params['jschl_answer'] = JavaScriptInterpreter.dynamicImport(self.interpreter).solveChallenge(body, domain)
# Requests transforms any request into a GET after a redirect,
# so the redirect has to be handled manually here to allow for
# performing other types of requests even as the first request.
cloudflare_kwargs['allow_redirects'] = False
redirect = self.request(resp.request.method, submit_url, **cloudflare_kwargs)
redirect_location = urlparse(redirect.headers['Location'])
if not redirect_location.netloc:
redirect_url = urlunparse(
(
parsed_url.scheme,
domain,
redirect_location.path,
redirect_location.params,
redirect_location.query,
redirect_location.fragment
)
)
return self.request(resp.request.method, redirect_url, **original_kwargs)
return self.request(resp.request.method, redirect.headers['Location'], **original_kwargs)
##########################################################################################################################################################
@classmethod
def create_scraper(cls, sess=None, **kwargs):
"""
Convenience function for creating a ready-to-go CloudScraper object.
"""
scraper = cls(**kwargs)
if sess:
attrs = ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']
for attr in attrs:
val = getattr(sess, attr, None)
if val:
setattr(scraper, attr, val)
return scraper
##########################################################################################################################################################
# Functions for integrating cloudscraper with other applications and scripts
@classmethod
def get_tokens(cls, url, **kwargs):
scraper = cls.create_scraper(
debug=kwargs.pop('debug', False),
delay=kwargs.pop('delay', None),
interpreter=kwargs.pop('interpreter', 'js2py'),
allow_brotli=kwargs.pop('allow_brotli', True),
)
try:
resp = scraper.get(url, **kwargs)
resp.raise_for_status()
except Exception:
logging.error('"{}" returned an error. Could not collect tokens.'.format(url))
raise
domain = urlparse(resp.url).netloc
# noinspection PyUnusedLocal
cookie_domain = None
for d in scraper.cookies.list_domains():
if d.startswith('.') and d in ('.{}'.format(domain)):
cookie_domain = d
break
else:
raise ValueError('Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM ("I\'m Under Attack Mode") enabled?')
return (
{
'__cfduid': scraper.cookies.get('__cfduid', '', domain=cookie_domain),
'cf_clearance': scraper.cookies.get('cf_clearance', '', domain=cookie_domain)
},
scraper.headers['User-Agent']
)
##########################################################################################################################################################
@classmethod
def get_cookie_string(cls, url, **kwargs):
"""
Convenience function for building a Cookie HTTP header value.
"""
tokens, user_agent = cls.get_tokens(url, **kwargs)
return '; '.join('='.join(pair) for pair in tokens.items()), user_agent
##########################################################################################################################################################
create_scraper = CloudScraper.create_scraper
get_tokens = CloudScraper.get_tokens
get_cookie_string = CloudScraper.get_cookie_string
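Note on `loadCipherSuite` above: it keeps only the cipher names that the local OpenSSL build accepts, by calling `set_ciphers` on a throwaway context and skipping names that raise `ssl.SSLError`. A minimal standalone sketch of that probing step follows, using only the standard library. The function name `build_cipher_string` and the candidate list are illustrative assumptions, not part of cloudscraper.

```python
import ssl


def build_cipher_string(candidates):
    """Join the candidates accepted by the local OpenSSL into one cipher string."""
    # Throwaway context, used only to probe which names OpenSSL recognises.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    accepted = []
    for name in candidates:
        try:
            ctx.set_ciphers(name)  # raises ssl.SSLError when no cipher matches
        except ssl.SSLError:
            continue
        accepted.append(name)
    # Joining a list avoids the leading ':' that incremental formatting can produce.
    return ':'.join(accepted)


if __name__ == '__main__':
    print(build_cipher_string(['NOT-A-REAL-CIPHER', 'ECDHE-RSA-AES128-GCM-SHA256']))
```

Every candidate has to be a plain string here; a nested list would raise `TypeError` from `set_ciphers`, which the `except ssl.SSLError` clause does not catch.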
@@ -0,0 +1,89 @@
import re
import sys
import logging
import abc
if sys.version_info >= (3, 4):
ABC = abc.ABC # noqa
else:
ABC = abc.ABCMeta('ABC', (), {})
##########################################################################################################################################################
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
interpreters = {}
class JavaScriptInterpreter(ABC):
@abc.abstractmethod
def __init__(self, name):
interpreters[name] = self
@classmethod
def dynamicImport(cls, name):
if name not in interpreters:
try:
__import__('{}.{}'.format(cls.__module__, name))
if not isinstance(interpreters.get(name), JavaScriptInterpreter):
raise ImportError('The interpreter was not initialized.')
except ImportError:
logging.error('Unable to load {} interpreter'.format(name))
raise
return interpreters[name]
@abc.abstractmethod
def eval(self, jsEnv, js):
pass
def solveChallenge(self, body, domain):
try:
js = re.search(
r'setTimeout\(function\(\){\s+(var s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n',
body
).group(1)
except Exception:
raise ValueError('Unable to identify Cloudflare IUAM Javascript on website. {}'.format(BUG_REPORT))
js = re.sub(r'\s{2,}', ' ', js, flags=re.MULTILINE | re.DOTALL).replace('\'; 121\'', '')
js += '\na.value;'
jsEnv = '''
String.prototype.italics=function(str) {{return "<i>" + this + "</i>";}};
var document = {{
createElement: function () {{
return {{ firstChild: {{ href: "https://{domain}/" }} }}
}},
getElementById: function () {{
return {{"innerHTML": "{innerHTML}"}};
}}
}};
'''
try:
innerHTML = re.search(
r'<div(?: [^<>]*)? id="([^<>]*?)">([^<>]*?)</div>',
body,
re.MULTILINE | re.DOTALL
)
innerHTML = innerHTML.group(2) if innerHTML else ''
except: # noqa
logging.error('Error extracting Cloudflare IUAM Javascript. {}'.format(BUG_REPORT))
raise
try:
result = self.eval(
re.sub(r'\s{2,}', ' ', jsEnv.format(domain=domain, innerHTML=innerHTML), flags=re.MULTILINE | re.DOTALL),
js
)
float(result)
except Exception:
logging.error('Error executing Cloudflare IUAM Javascript. {}'.format(BUG_REPORT))
raise
return result
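Note on the interpreter registry above: each backend module instantiates its `ChallengeInterpreter` at import time, the base-class constructor stores that instance under a name, and `dynamicImport` returns it on lookup. A minimal sketch of the same self-registration pattern, assuming Python 3 and leaving out the module import step. `REGISTRY`, `Interpreter`, `lookup`, and `EchoInterpreter` are hypothetical names chosen for illustration.

```python
import abc

REGISTRY = {}  # hypothetical stand-in for the module-level `interpreters` dict


class Interpreter(abc.ABC):
    def __init__(self, name):
        # Registering on construction: creating an instance makes it discoverable.
        REGISTRY[name] = self

    @abc.abstractmethod
    def eval(self, js_env, js):
        """Run `js` inside `js_env` and return the result."""

    @classmethod
    def lookup(cls, name):
        if name not in REGISTRY:
            raise ImportError('Unable to load {} interpreter'.format(name))
        return REGISTRY[name]


class EchoInterpreter(Interpreter):
    """Toy backend that concatenates its inputs instead of running JavaScript."""

    def __init__(self):
        super().__init__('echo')

    def eval(self, js_env, js):
        return js_env + js


EchoInterpreter()  # the real backends do this once, at module import time

if __name__ == '__main__':
    print(Interpreter.lookup('echo').eval('var a = 1;', 'a;'))
```

The pattern keeps the caller unaware of which backend is installed; the cost is that a backend module with no instantiation line at the bottom registers nothing.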
@@ -0,0 +1,32 @@
from __future__ import absolute_import
import js2py
import logging
import base64
from . import JavaScriptInterpreter
from .jsunfuck import jsunfuck
class ChallengeInterpreter(JavaScriptInterpreter):
def __init__(self):
super(ChallengeInterpreter, self).__init__('js2py')
def eval(self, jsEnv, js):
if js2py.eval_js('(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]') == '1':
logging.warning('WARNING - Please upgrade your js2py https://github.com/PiotrDabkowski/Js2Py, applying work around for the meantime.')
js = jsunfuck(js)
def atob(s):
return base64.b64decode('{}'.format(s)).decode('utf-8')
js2py.disable_pyimport()
context = js2py.EvalJs({'atob': atob})
result = context.eval('{}{}'.format(jsEnv, js))
return result
ChallengeInterpreter()
@@ -80,18 +80,18 @@ CONSTRUCTORS = {
'RegExp': 'Function("return/"+false+"/")()'
}
def jsunfuck(jsfuckString):
for key in sorted(MAPPING, key=lambda k: len(MAPPING[k]), reverse=True):
if MAPPING.get(key) in jsfuckString:
jsfuckString = jsfuckString.replace(MAPPING.get(key), '"{}"'.format(key))
for key in sorted(SIMPLE, key=lambda k: len(SIMPLE[k]), reverse=True):
if SIMPLE.get(key) in jsfuckString:
jsfuckString = jsfuckString.replace(SIMPLE.get(key), '{}'.format(key))
#for key in sorted(CONSTRUCTORS, key=lambda k: len(CONSTRUCTORS[k]), reverse=True):
# for key in sorted(CONSTRUCTORS, key=lambda k: len(CONSTRUCTORS[k]), reverse=True):
# if CONSTRUCTORS.get(key) in jsfuckString:
# jsfuckString = jsfuckString.replace(CONSTRUCTORS.get(key), '{}'.format(key))
return jsfuckString
return jsfuckString
@@ -0,0 +1,46 @@
import base64
import logging
import subprocess
from . import JavaScriptInterpreter
##########################################################################################################################################################
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
class ChallengeInterpreter(JavaScriptInterpreter):
def __init__(self):
super(ChallengeInterpreter, self).__init__('nodejs')
def eval(self, jsEnv, js):
try:
js = 'var atob = function(str) {return Buffer.from(str, "base64").toString("binary");};' \
'var challenge = atob("%s");' \
'var context = {atob: atob};' \
'var options = {filename: "iuam-challenge.js", timeout: 4000};' \
'var answer = require("vm").runInNewContext(challenge, context, options);' \
'process.stdout.write(String(answer));' \
% base64.b64encode('{}{}'.format(jsEnv, js).encode('UTF-8')).decode('ascii')
return subprocess.check_output(['node', '-e', js])
except OSError as e:
if e.errno == 2:
raise EnvironmentError(
'Missing Node.js runtime. Node is required and must be in the PATH (check with `node -v`). Your Node binary may be called `nodejs` rather than `node`, '
'in which case you may need to run `apt-get install nodejs-legacy` on some Debian-based systems. (Please read the cloudscraper'
' README\'s Dependencies section: https://github.com/VeNoMouS/cloudscraper#dependencies.)'
)
raise
except Exception:
logging.error('Error executing Cloudflare IUAM Javascript. %s' % BUG_REPORT)
raise
pass
ChallengeInterpreter()
@@ -0,0 +1,40 @@
import os
import json
import random
import logging
from collections import OrderedDict
##########################################################################################################################################################
class User_Agent():
##########################################################################################################################################################
def __init__(self, *args, **kwargs):
self.headers = None
self.loadUserAgent(*args, **kwargs)
##########################################################################################################################################################
def loadUserAgent(self, *args, **kwargs):
browser = kwargs.pop('browser', 'chrome')
user_agents = json.load(
open(os.path.join(os.path.dirname(__file__), 'browsers.json'), 'r'),
object_pairs_hook=OrderedDict
)
if not user_agents.get(browser):
logging.error('Sorry "{}" browser User-Agent was not found.'.format(browser))
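# A bare `raise` has no active exception to re-raise here, so raise an explicit error.
raise RuntimeError('No User-Agent entries found for browser "{}".'.format(browser))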
user_agent = random.choice(user_agents.get(browser))
self.headers = user_agent.get('headers')
self.headers['User-Agent'] = random.choice(user_agent.get('User-Agent'))
if not kwargs.get('allow_brotli', False):
if 'br' in self.headers['Accept-Encoding']:
self.headers['Accept-Encoding'] = ','.join([encoding for encoding in self.headers['Accept-Encoding'].split(',') if encoding.strip() != 'br']).strip()
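Note on the `allow_brotli` handling above: when Brotli is not allowed, the `br` token is filtered out of the `Accept-Encoding` header so the server does not send a body the client cannot decode. A small sketch of that filtering step on its own; `strip_encoding` is a hypothetical helper name, and unlike the line above it normalises the separator to `', '`.

```python
def strip_encoding(accept_encoding, unwanted='br'):
    """Return the Accept-Encoding value with one content coding removed."""
    kept = [
        token.strip()
        for token in accept_encoding.split(',')
        if token.strip() != unwanted
    ]
    return ', '.join(kept)


if __name__ == '__main__':
    print(strip_encoding('gzip, deflate, br'))
```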
File diff suppressed because it is too large
+3 -10
@@ -5,6 +5,7 @@ import re
from .translators.friendly_nodes import REGEXP_CONVERTER
from .utils.injector import fix_js_args
from types import FunctionType, ModuleType, GeneratorType, BuiltinFunctionType, MethodType, BuiltinMethodType
from math import floor, log10
import traceback
try:
import numpy
@@ -603,15 +604,7 @@ class PyJs(object):
elif typ == 'Boolean':
return Js('true') if self.value else Js('false')
elif typ == 'Number': #or self.Class=='Number':
if self.is_nan():
return Js('NaN')
elif self.is_infinity():
sign = '-' if self.value < 0 else ''
return Js(sign + 'Infinity')
elif isinstance(self.value,
long) or self.value.is_integer(): # dont print .0
return Js(unicode(int(self.value)))
return Js(unicode(self.value)) # accurate enough
return Js(unicode(js_dtoa(self.value)))
elif typ == 'String':
return self
else: #object
@@ -1046,7 +1039,7 @@ def PyJsComma(a, b):
return b
from .internals.simplex import JsException as PyJsException
from .internals.simplex import JsException as PyJsException, js_dtoa
import pyjsparser
pyjsparser.parser.ENABLE_JS2PY_ERRORS = lambda msg: MakeError('SyntaxError', msg)
@@ -116,10 +116,12 @@ def eval_js(js):
def eval_js6(js):
"""Just like eval_js but with experimental support for js6 via babel."""
return eval_js(js6_to_js5(js))
def translate_js6(js):
"""Just like translate_js but with experimental support for js6 via babel."""
return translate_js(js6_to_js5(js))
@@ -3,15 +3,19 @@ import re
import datetime
from desc import *
from simplex import *
from conversions import *
import six
from pyjsparser import PyJsParser
from itertools import izip
from .desc import *
from .simplex import *
from .conversions import *
from pyjsparser import PyJsParser
import six
if six.PY2:
from itertools import izip
else:
izip = zip
from conversions import *
from simplex import *
def Type(obj):
@@ -1,8 +1,8 @@
from code import Code
from simplex import MakeError
from opcodes import *
from operations import *
from trans_utils import *
from .code import Code
from .simplex import MakeError
from .opcodes import *
from .operations import *
from .trans_utils import *
SPECIAL_IDENTIFIERS = {'true', 'false', 'this'}
@@ -465,10 +465,11 @@ class ByteCodeGenerator:
self.emit('LOAD_OBJECT', tuple(data))
def Program(self, body, **kwargs):
old_tape_len = len(self.exe.tape)
self.emit('LOAD_UNDEFINED')
self.emit(body)
# add function tape !
self.exe.tape = self.function_declaration_tape + self.exe.tape
self.exe.tape = self.exe.tape[:old_tape_len] + self.function_declaration_tape + self.exe.tape[old_tape_len:]
def Pyimport(self, imp, **kwargs):
raise NotImplementedError(
@@ -735,17 +736,17 @@ def main():
#
# }
a.emit(d)
print a.declared_vars
print a.exe.tape
print len(a.exe.tape)
print(a.declared_vars)
print(a.exe.tape)
print(len(a.exe.tape))
a.exe.compile()
def log(this, args):
print args[0]
print(args[0])
return 999
print a.exe.run(a.exe.space.GlobalObj)
print(a.exe.run(a.exe.space.GlobalObj))
if __name__ == '__main__':
@@ -1,16 +1,17 @@
from opcodes import *
from space import *
from base import *
from .opcodes import *
from .space import *
from .base import *
class Code:
'''Can generate, store and run sequence of ops representing js code'''
def __init__(self, is_strict=False):
def __init__(self, is_strict=False, debug_mode=False):
self.tape = []
self.compiled = False
self.label_locs = None
self.is_strict = is_strict
self.debug_mode = debug_mode
self.contexts = []
self.current_ctx = None
@@ -22,6 +23,10 @@ class Code:
self.GLOBAL_THIS = None
self.space = None
# dbg
self.ctx_depth = 0
def get_new_label(self):
self._label_count += 1
return self._label_count
@@ -74,21 +79,35 @@ class Code:
# 0=normal, 1=return, 2=jump_outside, 3=errors
# execute_fragment_under_context returns:
# (return_value, typ, return_value/jump_loc/py_error)
# ctx.stack must be len 1 and its always empty after the call.
# IMPORTANT: It is guaranteed that the length of the ctx.stack is unchanged.
'''
old_curr_ctx = self.current_ctx
self.ctx_depth += 1
old_stack_len = len(ctx.stack)
old_ret_len = len(self.return_locs)
old_ctx_len = len(self.contexts)
try:
self.current_ctx = ctx
return self._execute_fragment_under_context(
ctx, start_label, end_label)
except JsException as err:
# undo the things that were put on the stack (if any)
# don't worry, I know the recovery is possible through try statement and for this reason try statement
# has its own context and stack so it will not delete the contents of the outer stack
del ctx.stack[:]
if self.debug_mode:
self._on_fragment_exit("js errors")
# undo the things that were put on the stack (if any) to ensure a proper error recovery
del ctx.stack[old_stack_len:]
del self.return_locs[old_ret_len:]
del self.contexts[old_ctx_len :]
return undefined, 3, err
finally:
self.ctx_depth -= 1
self.current_ctx = old_curr_ctx
assert old_stack_len == len(ctx.stack)
def _get_dbg_indent(self):
return self.ctx_depth * ' '
def _on_fragment_exit(self, mode):
print(self._get_dbg_indent() + 'ctx exit (%s)' % mode)
def _execute_fragment_under_context(self, ctx, start_label, end_label):
start, end = self.label_locs[start_label], self.label_locs[end_label]
@@ -97,16 +116,20 @@ class Code:
entry_level = len(self.contexts)
# for e in self.tape[start:end]:
# print e
if self.debug_mode:
print(self._get_dbg_indent() + 'ctx entry (from:%d, to:%d)' % (start, end))
while loc < len(self.tape):
#print loc, self.tape[loc]
if len(self.contexts) == entry_level and loc >= end:
if self.debug_mode:
self._on_fragment_exit('normal')
assert loc == end
assert len(ctx.stack) == (
1 + initial_len), 'Stack change must be equal to +1!'
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return ctx.stack.pop(), 0, None # means normal return
# execute instruction
if self.debug_mode:
print(self._get_dbg_indent() + str(loc), self.tape[loc])
status = self.tape[loc].eval(ctx)
# check status for special actions
@@ -116,9 +139,10 @@ class Code:
if len(self.contexts) == entry_level:
# check if jumped outside of the fragment and break if so
if not start <= loc < end:
assert len(ctx.stack) == (
1 + initial_len
), 'Stack change must be equal to +1!'
if self.debug_mode:
self._on_fragment_exit('jump outside loc:%d label:%d' % (loc, status))
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return ctx.stack.pop(), 2, status # jump outside
continue
@@ -137,7 +161,10 @@ class Code:
# return: (None, None)
else:
if len(self.contexts) == entry_level:
assert len(ctx.stack) == 1 + initial_len
if self.debug_mode:
self._on_fragment_exit('return')
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return undefined, 1, ctx.stack.pop(
) # return signal
return_value = ctx.stack.pop()
@@ -149,6 +176,8 @@ class Code:
continue
# next instruction
loc += 1
if self.debug_mode:
self._on_fragment_exit('internal error - unexpected end of tape, will crash')
assert False, 'Remember to add NOP at the end!'
def run(self, ctx, starting_loc=0):
@@ -156,7 +185,8 @@ class Code:
self.current_ctx = ctx
while loc < len(self.tape):
# execute instruction
#print loc, self.tape[loc]
if self.debug_mode:
print(loc, self.tape[loc])
status = self.tape[loc].eval(ctx)
# check status for special actions
@@ -42,6 +42,7 @@ def executable_code(code_str, space, global_context=True):
space.byte_generator.emit('LABEL', skip)
space.byte_generator.emit('NOP')
space.byte_generator.restore_state()
space.byte_generator.exe.compile(
start_loc=old_tape_len
) # dont read the code from the beginning, dont be stupid!
@@ -71,5 +72,5 @@ def _eval(this, args):
def log(this, args):
print ' '.join(map(to_string, args))
print(' '.join(map(to_string, args)))
return undefined
@@ -1,6 +1,6 @@
from __future__ import unicode_literals
# Type Conversions. to_type. All must return PyJs subclass instance
from simplex import *
from .simplex import *
def to_primitive(self, hint=None):
@@ -73,14 +73,7 @@ def to_string(self):
elif typ == 'Boolean':
return 'true' if self else 'false'
elif typ == 'Number': # or self.Class=='Number':
if is_nan(self):
return 'NaN'
elif is_infinity(self):
sign = '-' if self < 0 else ''
return sign + 'Infinity'
elif int(self) == self: # integer value!
return unicode(int(self))
return unicode(self) # todo make it print exactly like node.js
return js_dtoa(self)
else: # object
return to_string(to_primitive(self, 'String'))
@@ -1,29 +1,22 @@
from __future__ import unicode_literals
from base import Scope
from func_utils import *
from conversions import *
from .base import Scope
from .func_utils import *
from .conversions import *
import six
from prototypes.jsboolean import BooleanPrototype
from prototypes.jserror import ErrorPrototype
from prototypes.jsfunction import FunctionPrototype
from prototypes.jsnumber import NumberPrototype
from prototypes.jsobject import ObjectPrototype
from prototypes.jsregexp import RegExpPrototype
from prototypes.jsstring import StringPrototype
from prototypes.jsarray import ArrayPrototype
import prototypes.jsjson as jsjson
import prototypes.jsutils as jsutils
from .prototypes.jsboolean import BooleanPrototype
from .prototypes.jserror import ErrorPrototype
from .prototypes.jsfunction import FunctionPrototype
from .prototypes.jsnumber import NumberPrototype
from .prototypes.jsobject import ObjectPrototype
from .prototypes.jsregexp import RegExpPrototype
from .prototypes.jsstring import StringPrototype
from .prototypes.jsarray import ArrayPrototype
from .prototypes import jsjson
from .prototypes import jsutils
from .constructors import jsnumber, jsstring, jsarray, jsboolean, jsregexp, jsmath, jsobject, jsfunction, jsconsole
from constructors import jsnumber
from constructors import jsstring
from constructors import jsarray
from constructors import jsboolean
from constructors import jsregexp
from constructors import jsmath
from constructors import jsobject
from constructors import jsfunction
from constructors import jsconsole
def fill_proto(proto, proto_class, space):
@@ -155,7 +148,10 @@ def fill_space(space, byte_generator):
j = easy_func(creator, space)
j.name = unicode(typ)
j.prototype = space.ERROR_TYPES[typ]
set_protected(j, 'prototype', space.ERROR_TYPES[typ])
set_non_enumerable(space.ERROR_TYPES[typ], 'constructor', j)
def new_create(args, space):
message = get_arg(args, 0)
@@ -178,6 +174,7 @@ def fill_space(space, byte_generator):
setattr(space, err_type_name + u'Prototype', extra_err)
error_constructors[err_type_name] = construct_constructor(
err_type_name)
assert space.TypeErrorPrototype is not None
# RegExp
@@ -1,5 +1,5 @@
from simplex import *
from conversions import *
from .simplex import *
from .conversions import *
import six
if six.PY3:
@@ -1,5 +1,5 @@
from operations import *
from base import get_member, get_member_dot, PyJsFunction, Scope
from .operations import *
from .base import get_member, get_member_dot, PyJsFunction, Scope
class OP_CODE(object):
@@ -1,6 +1,6 @@
from __future__ import unicode_literals
from simplex import *
from conversions import *
from .simplex import *
from .conversions import *
# ------------------------------------------------------------------------------
# Unary operations
@@ -4,7 +4,7 @@ from __future__ import unicode_literals
import re
from ..conversions import *
from ..func_utils import *
from jsregexp import RegExpExec
from .jsregexp import RegExpExec
DIGS = set(u'0123456789')
WHITE = u"\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u2028\u2029\u202F\u205F\u3000\uFEFF"
@@ -1,11 +1,9 @@
import pyjsparser
from space import Space
import fill_space
from byte_trans import ByteCodeGenerator
from code import Code
from simplex import MakeError
import sys
sys.setrecursionlimit(100000)
from .space import Space
from . import fill_space
from .byte_trans import ByteCodeGenerator
from .code import Code
from .simplex import *
pyjsparser.parser.ENABLE_JS2PY_ERRORS = lambda msg: MakeError(u'SyntaxError', unicode(msg))
@@ -16,8 +14,8 @@ def get_js_bytecode(js):
a.emit(d)
return a.exe.tape
def eval_js_vm(js):
a = ByteCodeGenerator(Code())
def eval_js_vm(js, debug=False):
a = ByteCodeGenerator(Code(debug_mode=debug))
s = Space()
a.exe.space = s
s.exe = a.exe
@@ -26,7 +24,10 @@ def eval_js_vm(js):
a.emit(d)
fill_space.fill_space(s, a)
# print a.exe.tape
if debug:
from pprint import pprint
pprint(a.exe.tape)
print()
a.exe.compile()
return a.exe.run(a.exe.space.GlobalObj)
@@ -1,6 +1,10 @@
from __future__ import unicode_literals
import six
if six.PY3:
basestring = str
long = int
xrange = range
unicode = str
#Undefined
class PyJsUndefined(object):
@@ -75,7 +79,7 @@ def is_callable(self):
def is_infinity(self):
return self == float('inf') or self == -float('inf')
return self == Infinity or self == -Infinity
def is_nan(self):
@@ -114,7 +118,7 @@ class JsException(Exception):
return self.mes.to_string().value
else:
if self.throw is not None:
from conversions import to_string
from .conversions import to_string
return to_string(self.throw)
else:
return self.typ + ': ' + self.message
@@ -131,3 +135,26 @@ def value_from_js_exception(js_exception, space):
return js_exception.throw
else:
return space.NewError(js_exception.typ, js_exception.message)
def js_dtoa(number):
if is_nan(number):
return u'NaN'
elif is_infinity(number):
if number > 0:
return u'Infinity'
return u'-Infinity'
elif number == 0.:
return u'0'
elif abs(number) < 1e-6 or abs(number) >= 1e21:
frac, exponent = unicode(repr(float(number))).split('e')
# Remove leading zeros from the exponent.
exponent = int(exponent)
return frac + ('e' if exponent < 0 else 'e+') + unicode(exponent)
elif abs(number) < 1e-4: # python starts to return exp notation while we still want the prec
frac, exponent = unicode(repr(float(number))).split('e-')
base = u'0.' + u'0' * (int(exponent) - 1) + frac.lstrip('-').replace('.', '')
return base if number > 0. else u'-' + base
elif isinstance(number, long) or number.is_integer(): # dont print .0
return unicode(int(number))
return unicode(repr(number)) # python representation should be equivalent.
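Note on `js_dtoa` above: JavaScript switches to exponent notation only below 1e-6 or from 1e21 upward, while Python's `repr` already does so below 1e-4 and from 1e16 upward, so the function rewrites the cases where the two disagree. A standalone Python 3 sketch of the same thresholds, for checking values by hand. The name `js_number_to_string` is hypothetical, and the sketch handles floats only.

```python
import math


def js_number_to_string(number):
    """Format a float the way JavaScript's String(number) would."""
    number = float(number)
    if math.isnan(number):
        return 'NaN'
    if math.isinf(number):
        return 'Infinity' if number > 0 else '-Infinity'
    if number == 0.0:
        return '0'
    magnitude = abs(number)
    if magnitude < 1e-6 or magnitude >= 1e21:
        # Both languages use exponent notation; only the exponent padding differs.
        mantissa, exponent = repr(number).split('e')
        exponent = int(exponent)  # drops Python's zero padding ('e-07' -> -7)
        return mantissa + ('e' if exponent < 0 else 'e+') + str(exponent)
    if magnitude < 1e-4:
        # Python already prints an exponent here; JavaScript still prints plain digits.
        mantissa, exponent = repr(number).split('e-')
        digits = mantissa.lstrip('-').replace('.', '')
        text = '0.' + '0' * (int(exponent) - 1) + digits
        return text if number > 0 else '-' + text
    if number.is_integer():
        return str(int(number))  # no trailing '.0'
    return repr(number)


if __name__ == '__main__':
    for value in (1e21, 1e-7, 0.00001, -1.5e-5, 3.0, 0.1):
        print(js_number_to_string(value))
```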
@@ -1,5 +1,5 @@
from base import *
from simplex import *
from .base import *
from .simplex import *
class Space(object):
@@ -1,3 +1,10 @@
import six
if six.PY3:
basestring = str
long = int
xrange = range
unicode = str
def to_key(literal_or_identifier):
''' returns string representation of this object'''
if literal_or_identifier['type'] == 'Identifier':
@@ -6,8 +6,6 @@ if six.PY3:
xrange = range
unicode = str
# todo fix apply and bind
class FunctionPrototype:
def toString():
@@ -41,6 +39,7 @@ class FunctionPrototype:
return this.call(obj, args)
def bind(thisArg):
arguments_ = arguments
target = this
if not target.is_callable():
raise this.MakeError(
@@ -48,5 +47,5 @@ class FunctionPrototype:
if len(arguments) <= 1:
args = ()
else:
args = tuple([arguments[e] for e in xrange(1, len(arguments))])
args = tuple([arguments_[e] for e in xrange(1, len(arguments_))])
return this.PyJsBoundFunction(target, thisArg, args)
@@ -345,7 +345,7 @@ def BlockStatement(type, body):
body) # never returns empty string! In the worst case returns pass\n
def ExpressionStatement(type, expression, **ommit):
def ExpressionStatement(type, expression):
return trans(expression) + '\n' # end expression space with new line
+10
@@ -163,3 +163,13 @@ class Pysubs2CLI(object):
elif args.transform_framerate is not None:
in_fps, out_fps = args.transform_framerate
subs.transform_framerate(in_fps, out_fps)
def __main__():
cli = Pysubs2CLI()
rv = cli(sys.argv[1:])
sys.exit(rv)
if __name__ == "__main__":
__main__()
+3 -1
@@ -17,12 +17,14 @@ class Color(_Color):
return _Color.__new__(cls, r, g, b, a)
#: Version of the pysubs2 library.
VERSION = "0.2.1"
VERSION = "0.2.3"
PY3 = sys.version_info.major == 3
if PY3:
text_type = str
binary_string_type = bytes
else:
text_type = unicode
binary_string_type = str
+1 -3
@@ -3,7 +3,7 @@ from .microdvd import MicroDVDFormat
from .subrip import SubripFormat
from .jsonformat import JSONFormat
from .substation import SubstationFormat
from .txt_generic import TXTGenericFormat, MPL2Format
from .mpl2 import MPL2Format
from .exceptions import *
#: Dict mapping file extensions to format identifiers.
@@ -13,7 +13,6 @@ FILE_EXTENSION_TO_FORMAT_IDENTIFIER = {
".ssa": "ssa",
".sub": "microdvd",
".json": "json",
".txt": "txt_generic",
}
#: Dict mapping format identifiers to implementations (FormatBase subclasses).
@@ -23,7 +22,6 @@ FORMAT_IDENTIFIER_TO_FORMAT_CLASS = {
"ssa": SubstationFormat,
"microdvd": MicroDVDFormat,
"json": JSONFormat,
"txt_generic": TXTGenericFormat,
"mpl2": MPL2Format,
}
@@ -2,44 +2,48 @@
from __future__ import print_function, division, unicode_literals
import re
from numbers import Number
from pysubs2.time import times_to_ms
from .time import times_to_ms
from .formatbase import FormatBase
from .ssaevent import SSAEvent
from .ssastyle import SSAStyle
# thanks to http://otsaloma.io/gaupol/doc/api/aeidon.files.mpl2_source.html
MPL2_FORMAT = re.compile(r"^(?um)\[(-?\d+)\]\[(-?\d+)\](.*?)$")
class TXTGenericFormat(FormatBase):
@classmethod
def guess_format(cls, text):
if MPL2_FORMAT.match(text):
return "mpl2"
MPL2_FORMAT = re.compile(r"^(?um)\[(-?\d+)\]\[(-?\d+)\](.*)")
class MPL2Format(FormatBase):
@classmethod
def guess_format(cls, text):
return TXTGenericFormat.guess_format(text)
if MPL2_FORMAT.search(text):
return "mpl2"
@classmethod
def from_file(cls, subs, fp, format_, **kwargs):
def prepare_text(lines):
out = []
for s in lines.split("|"):
s = s.strip()
if s.startswith("/"):
out.append(r"{\i1}%s{\i0}" % s[1:])
continue
# line beginning with '/' is in italics
s = r"{\i1}%s{\i0}" % s[1:].strip()
out.append(s)
return "\n".join(out)
return "\\N".join(out)
subs.events = [SSAEvent(start=times_to_ms(s=float(start) / 10), end=times_to_ms(s=float(end) / 10),
text=prepare_text(text)) for start, end, text in MPL2_FORMAT.findall(fp.getvalue())]
@classmethod
def to_file(cls, subs, fp, format_, **kwargs):
raise NotImplemented
# TODO handle italics
for line in subs:
if line.is_comment:
continue
print("[{start}][{end}] {text}".format(start=int(line.start // 100),
end=int(line.end // 100),
text=line.plaintext.replace("\n", "|")),
file=fp)
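Review note: the MPL2 hunk above reads lines of the form `[start][end] text`, with times in tenths of a second, `|` as the line separator and a leading `/` marking italics. The following is a minimal standalone sketch of that parsing, assuming Python 3 and no pysubs2 import. `parse_mpl2` and `MPL2_LINE` are hypothetical names used only for illustration; they are not part of the library.

```python
import re

# Same pattern shape as MPL2_FORMAT in the diff, written with re.MULTILINE
# instead of the inline (?um) flags.
MPL2_LINE = re.compile(r"^\[(-?\d+)\]\[(-?\d+)\](.*)", re.MULTILINE)


def prepare_text(raw):
    """Convert MPL2 text to SubStation-style text (standalone copy for illustration)."""
    out = []
    for part in raw.split("|"):
        part = part.strip()
        if part.startswith("/"):
            # a segment beginning with '/' is in italics
            part = r"{\i1}%s{\i0}" % part[1:].strip()
        out.append(part)
    return "\\N".join(out)


def parse_mpl2(text):
    """Return a list of (start_ms, end_ms, text) tuples (hypothetical helper)."""
    events = []
    for start, end, raw in MPL2_LINE.findall(text):
        # MPL2 times are tenths of a second, so multiply by 100 for milliseconds
        events.append((int(start) * 100, int(end) * 100, prepare_text(raw)))
    return events


sample = "[10][25] Hello|/world\n[30][40] Bye"
events = parse_mpl2(sample)
```

The tuples stand in for `SSAEvent` objects; the real format class builds events with `times_to_ms` instead of multiplying by 100.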
@@ -78,7 +78,7 @@ class SSAStyle(object):
s += "%rpx " % self.fontsize
if self.bold: s += "bold "
if self.italic: s += "italic "
s += "'%s'>" % self.fontname
s += "{!r}>".format(self.fontname)
if not PY3: s = s.encode("utf-8")
return s
+9 -1
@@ -46,8 +46,16 @@ class SubripFormat(FormatBase):
following_lines[-1].append(line)
def prepare_text(lines):
# Handle the "happy" empty subtitle case, which is timestamp line followed by blank line(s)
# followed by number line and timestamp line of the next subtitle. Fixes issue #11.
if (len(lines) >= 2
and all(re.match("\s*$", line) for line in lines[:-1])
and re.match("\s*\d+\s*$", lines[-1])):
return ""
# Handle the general case.
s = "".join(lines).strip()
s = re.sub(r"\n* *\d+ *$", "", s) # strip number of next subtitle
s = re.sub(r"\n+ *\d+ *$", "", s) # strip number of next subtitle
s = re.sub(r"< *i *>", r"{\i1}", s)
s = re.sub(r"< */ *i *>", r"{\i0}", s)
s = re.sub(r"< *s *>", r"{\s1}", s)
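Review note: the SubRip hunk above changes two things, the early return for an empty subtitle and the `\n+` in the number-stripping pattern. The sketch below isolates just those two steps so the effect can be checked; it assumes Python 3, omits the tag conversions, and is not the library's full `prepare_text`.

```python
import re


def prepare_text(lines):
    """Number-stripping part of the clean-up step only (illustrative copy)."""
    # "Happy" empty subtitle: blank line(s) followed by the next subtitle's number.
    if (len(lines) >= 2
            and all(re.match(r"\s*$", line) for line in lines[:-1])
            and re.match(r"\s*\d+\s*$", lines[-1])):
        return ""
    s = "".join(lines).strip()
    # Strip the next subtitle's number only when it sits on its own line.
    s = re.sub(r"\n+ *\d+ *$", "", s)
    return s


empty = prepare_text(["\n", "2\n"])
normal = prepare_text(["Hello\n", "\n", "2\n"])
trailing_digits = prepare_text(["Room 101\n"])
```

With `\n+`, text that merely ends in digits (such as `Room 101`) is left alone, whereas a pattern starting with `\n*` would also match the trailing ` 101`.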
+14 -13
@@ -4,7 +4,7 @@ from numbers import Number
from .formatbase import FormatBase
from .ssaevent import SSAEvent
from .ssastyle import SSAStyle
from .common import text_type, Color
from .common import text_type, Color, PY3, binary_string_type
from .time import make_time, ms_to_times, timestamp_to_ms, TIMESTAMP
SSA_ALIGNMENT = (1, 2, 3, 9, 10, 11, 5, 6, 7)
@@ -150,14 +150,7 @@ class SubstationFormat(FormatBase):
if format_ == "ass":
return ass_rgba_to_color(v)
else:
try:
return ssa_rgb_to_color(v)
except ValueError:
try:
return ass_rgba_to_color(v)
except:
return Color(255, 255, 255, 0)
return ssa_rgb_to_color(v)
elif f in {"bold", "underline", "italic", "strikeout"}:
return v == "-1"
elif f in {"borderstyle", "encoding", "marginl", "marginr", "marginv", "layer", "alphalevel"}:
@@ -229,7 +222,7 @@ class SubstationFormat(FormatBase):
for k, v in subs.aegisub_project.items():
print(k, v, sep=": ", file=fp)
def field_to_string(f, v):
def field_to_string(f, v, line):
if f in {"start", "end"}:
return ms_to_timestamp(v)
elif f == "marked":
@@ -240,23 +233,31 @@ class SubstationFormat(FormatBase):
return "-1" if v else "0"
elif isinstance(v, (text_type, Number)):
return text_type(v)
elif not PY3 and isinstance(v, binary_string_type):
# A convenience feature, see issue #12 - accept non-unicode strings
# when they are ASCII; this is useful in Python 2, especially for non-text
# fields like style names, where requiring Unicode type seems too stringent
if all(ord(c) < 128 for c in v):
return text_type(v)
else:
raise TypeError("Encountered binary string with non-ASCII codepoint in SubStation field {!r} for line {!r} - please use unicode string instead of str".format(f, line))
elif isinstance(v, Color):
if format_ == "ass":
return color_to_ass_rgba(v)
else:
return color_to_ssa_rgb(v)
else:
raise TypeError("Unexpected type when writing a SubStation field")
raise TypeError("Unexpected type when writing a SubStation field {!r} for line {!r}".format(f, line))
print("\n[V4+ Styles]" if format_ == "ass" else "\n[V4 Styles]", file=fp)
print(STYLE_FORMAT_LINE[format_], file=fp)
for name, sty in subs.styles.items():
fields = [field_to_string(f, getattr(sty, f)) for f in STYLE_FIELDS[format_]]
fields = [field_to_string(f, getattr(sty, f), sty) for f in STYLE_FIELDS[format_]]
print("Style: %s" % name, *fields, sep=",", file=fp)
print("\n[Events]", file=fp)
print(EVENT_FORMAT_LINE[format_], file=fp)
for ev in subs.events:
fields = [field_to_string(f, getattr(ev, f)) for f in EVENT_FIELDS[format_]]
fields = [field_to_string(f, getattr(ev, f), ev) for f in EVENT_FIELDS[format_]]
print(ev.type, end=": ", file=fp)
print(*fields, sep=",", file=fp)
@@ -258,4 +258,4 @@ def fix_line_ending(content):
:rtype: bytes
"""
return content.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
return content.replace(b'\r\n', b'\n')
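Review note: the two `fix_line_ending` variants in this hunk differ only in how a lone carriage return is treated. A small sketch of the difference, assuming Python 3 bytes; the function names are hypothetical and chosen only to tell the variants apart.

```python
def fix_crlf_only(content):
    # Variant that normalizes Windows line endings only.
    return content.replace(b'\r\n', b'\n')


def fix_crlf_and_cr(content):
    # Variant that also converts a bare carriage return (old Mac style).
    return content.replace(b'\r\n', b'\n').replace(b'\r', b'\n')


data = b"one\r\ntwo\rthree\n"
crlf_only = fix_crlf_only(data)
crlf_and_cr = fix_crlf_and_cr(data)
```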
@@ -12,6 +12,13 @@ from_subscene = {
'Malay': 'msa', 'Pashto': 'pus', 'Punjabi': 'pan', 'Swahili': 'swa'
}
from_subscene_with_country = {
'Brazillian Portuguese': ('por', 'BR')
}
to_subscene_with_country = {val: key for key, val in from_subscene_with_country.items()}
to_subscene = {v: k for k, v in from_subscene.items()}
exact_languages_alpha3 = [
@@ -34,12 +41,12 @@ language_ids = {
'mkd': 48, 'mal': 64, 'mni': 65, 'mon': 72, 'pus': 67, 'pol': 31,
'por': 32, 'pan': 66, 'rus': 34, 'srp': 35, 'sin': 58, 'slk': 36,
'slv': 37, 'som': 70, 'tgl': 53, 'tam': 59, 'tel': 63, 'tha': 40,
'tur': 41, 'ukr': 56, 'urd': 42, 'yor': 71
'tur': 41, 'ukr': 56, 'urd': 42, 'yor': 71, 'pt-BR': 4
}
# TODO: specify codes for unspecified_languages
unspecified_languages = [
'Big 5 code', 'Brazillian Portuguese', 'Bulgarian/ English',
'Big 5 code', 'Bulgarian/ English',
'Chinese BG code', 'Dutch/ English', 'English/ German',
'Hungarian/ English', 'Rohingya'
]
@@ -50,6 +57,8 @@ alpha3_of_code = {l.name: l.alpha3 for l in supported_languages}
supported_languages.update({Language(l) for l in to_subscene})
supported_languages.update({Language(lang, cr) for lang, cr in to_subscene_with_country})
class SubsceneConverter(LanguageReverseConverter):
codes = {l.name for l in supported_languages}
@@ -61,9 +70,15 @@ class SubsceneConverter(LanguageReverseConverter):
if alpha3 in to_subscene:
return to_subscene[alpha3]
if (alpha3, country) in to_subscene_with_country:
return to_subscene_with_country[(alpha3, country)]
raise ConfigurationError('Unsupported language for subscene: %s, %s, %s' % (alpha3, country, script))
def reverse(self, code):
if code in from_subscene_with_country:
return from_subscene_with_country[code]
if code in from_subscene:
return (from_subscene[code],)
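Review note: the Subscene hunks above add a second lookup table keyed by `(alpha3, country)`. The sketch below reproduces the lookup order with plain dicts so it can be run without babelfish. It assumes a trimmed two-entry `from_subscene` table and uses `ValueError` as a stand-in for the converter's own exception types. The spelling `Brazillian Portuguese` is copied from the diff, where it is the key the site uses.

```python
from_subscene = {'Malay': 'msa', 'Pashto': 'pus'}  # trimmed for illustration
from_subscene_with_country = {'Brazillian Portuguese': ('por', 'BR')}

to_subscene = {v: k for k, v in from_subscene.items()}
to_subscene_with_country = {v: k for k, v in from_subscene_with_country.items()}


def convert(alpha3, country=None):
    """Language code -> Subscene name; plain languages are checked first."""
    if alpha3 in to_subscene:
        return to_subscene[alpha3]
    if (alpha3, country) in to_subscene_with_country:
        return to_subscene_with_country[(alpha3, country)]
    # ValueError is a stand-in for the ConfigurationError raised in the diff
    raise ValueError('Unsupported language for subscene: %s, %s' % (alpha3, country))


def reverse(code):
    """Subscene name -> tuple of language parts; country-specific names win."""
    if code in from_subscene_with_country:
        return from_subscene_with_country[code]
    if code in from_subscene:
        return (from_subscene[code],)
    raise ValueError('Unsupported language code for subscene: %s' % code)
```

In this trimmed table `por` has no country-less entry, so `convert('por')` fails while `convert('por', 'BR')` succeeds.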
@@ -27,16 +27,6 @@ class TitloviConverter(LanguageReverseConverter):
}
self.codes = set(self.from_titlovi.keys())
# temporary fix, should be removed as soon as API is used
self.lang_from_countrycode = {'ba': ('bos',),
'en': ('eng',),
'hr': ('hrv',),
'mk': ('mkd',),
'rs': ('srp',),
'rsc': ('srp', None, 'Cyrl'),
'si': ('slv',)
}
def convert(self, alpha3, country=None, script=None):
if (alpha3, country, script) in self.to_titlovi:
return self.to_titlovi[(alpha3, country, script)]
@@ -49,9 +39,5 @@ class TitloviConverter(LanguageReverseConverter):
if titlovi in self.from_titlovi:
return self.from_titlovi[titlovi]
# temporary fix, should be removed as soon as API is used
if titlovi in self.lang_from_countrycode:
return self.lang_from_countrycode[titlovi]
raise ConfigurationError('Unsupported language number for titlovi: %s' % titlovi)
@@ -30,7 +30,7 @@ from subliminal.core import guessit, ProviderPool, io, is_windows_special_path,
ThreadPoolExecutor, check_video
from subliminal_patch.exceptions import TooManyRequests, APIThrottled
from subzero.language import Language
from subzero.language import Language, ENDSWITH_LANGUAGECODE_RE
from scandir import scandir, scandir_generic as _scandir_generic
logger = logging.getLogger(__name__)
@@ -62,7 +62,7 @@ class SZProviderPool(ProviderPool):
def __init__(self, providers=None, provider_configs=None, blacklist=None, throttle_callback=None,
pre_download_hook=None, post_download_hook=None, language_hook=None):
#: Name of providers to use
self.providers = providers or provider_registry.names()
self.providers = providers
#: Provider configuration
self.provider_configs = provider_configs or {}
@@ -186,12 +186,9 @@ class SZProviderPool(ProviderPool):
except (requests.Timeout, socket.timeout):
logger.error('Provider %r timed out', provider)
except (TooManyRequests, DownloadLimitExceeded, ServiceUnavailable, APIThrottled), e:
self.throttle_callback(provider, e)
return
except:
except Exception as e:
logger.exception('Unexpected error in provider %r: %s', provider, traceback.format_exc())
self.throttle_callback(provider, e)
def list_subtitles(self, video, languages):
"""List subtitles.
@@ -283,14 +280,10 @@ class SZProviderPool(ProviderPool):
logger.debug("RAR Traceback: %s", traceback.format_exc())
return False
except (TooManyRequests, DownloadLimitExceeded, ServiceUnavailable, APIThrottled), e:
self.throttle_callback(subtitle.provider_name, e)
self.discarded_providers.add(subtitle.provider_name)
return False
except:
except Exception as e:
logger.exception('Unexpected error in provider %r, Traceback: %s', subtitle.provider_name,
traceback.format_exc())
self.throttle_callback(subtitle.provider_name, e)
self.discarded_providers.add(subtitle.provider_name)
return False
@@ -309,7 +302,8 @@ class SZProviderPool(ProviderPool):
logger.error('Invalid subtitle')
return False
subtitle.normalize()
if not os.environ.get("SZ_KEEP_ENCODING", False):
subtitle.normalize()
return True
@@ -472,7 +466,7 @@ if is_windows_special_path:
SZAsyncProviderPool = SZProviderPool
def scan_video(path, dont_use_actual_file=False, hints=None, providers=None, skip_hashing=False):
def scan_video(path, dont_use_actual_file=False, hints=None, providers=None, skip_hashing=False, hash_from=None):
"""Scan a video from a `path`.
patch:
@@ -537,28 +531,34 @@ def scan_video(path, dont_use_actual_file=False, hints=None, providers=None, ski
video.alternative_titles.append(alt_guess["title"])
logger.debug("Adding alternative title: %s", alt_guess["title"])
if dont_use_actual_file:
if dont_use_actual_file and not hash_from:
return video
# size and hashes
if not skip_hashing:
video.size = os.path.getsize(path)
hash_path = hash_from or path
video.size = os.path.getsize(hash_path)
if video.size > 10485760:
logger.debug('Size is %d', video.size)
osub_hash = None
if "opensubtitles" in providers:
video.hashes['opensubtitles'] = hash_opensubtitles(path)
video.hashes['opensubtitles'] = osub_hash = hash_opensubtitles(hash_path)
if "shooter" in providers:
video.hashes['shooter'] = hash_shooter(path)
video.hashes['shooter'] = hash_shooter(hash_path)
if "thesubdb" in providers:
video.hashes['thesubdb'] = hash_thesubdb(path)
video.hashes['thesubdb'] = hash_thesubdb(hash_path)
if "napiprojekt" in providers:
try:
video.hashes['napiprojekt'] = hash_napiprojekt(path)
video.hashes['napiprojekt'] = hash_napiprojekt(hash_path)
except MemoryError:
logger.warning(u"Couldn't compute napiprojekt hash for %s", path)
logger.warning(u"Couldn't compute napiprojekt hash for %s", hash_path)
if "napisy24" in providers:
# Napisy24 uses the same hash as opensubtitles
video.hashes['napisy24'] = osub_hash or hash_opensubtitles(hash_path)
logger.debug('Computed hashes %r', video.hashes)
else:
@@ -567,14 +567,16 @@ def scan_video(path, dont_use_actual_file=False, hints=None, providers=None, ski
return video
def _search_external_subtitles(path, languages=None, only_one=False, scandir_generic=False):
def _search_external_subtitles(path, languages=None, only_one=False, scandir_generic=False, match_strictness="strict"):
dirpath, filename = os.path.split(path)
dirpath = dirpath or '.'
fileroot, fileext = os.path.splitext(filename)
fn_no_ext, fileext = os.path.splitext(filename)
fn_no_ext_lower = fn_no_ext.lower()
subtitles = {}
_scandir = _scandir_generic if scandir_generic else scandir
for entry in _scandir(dirpath):
if not entry.name and not scandir_generic:
if (not entry.name or entry.name in ('\x0c', '$', ',', '\x7f')) and not scandir_generic:
logger.debug('Could not determine the name of the file, retrying with scandir_generic')
return _search_external_subtitles(path, languages, only_one, True)
if not entry.is_file(follow_symlinks=False):
@@ -583,9 +585,11 @@ def _search_external_subtitles(path, languages=None, only_one=False, scandir_gen
p = entry.name
# keep only valid subtitle filenames
if not p.lower().startswith(fileroot.lower()) or not p.lower().endswith(SUBTITLE_EXTENSIONS):
if not p.lower().endswith(SUBTITLE_EXTENSIONS):
continue
# not p.lower().startswith(fileroot.lower()) or not
p_root, p_ext = os.path.splitext(p)
if not INCLUDE_EXOTIC_SUBS and p_ext not in (".srt", ".ass", ".ssa", ".vtt"):
continue
@@ -603,22 +607,33 @@ def _search_external_subtitles(path, languages=None, only_one=False, scandir_gen
if adv_tag:
forced = "forced" in adv_tag
# remove possible language code for matching
p_root_bare = ENDSWITH_LANGUAGECODE_RE.sub("", p_root)
p_root_lower = p_root_bare.lower()
filename_matches = p_root_lower == fn_no_ext_lower
filename_contains = p_root_lower in fn_no_ext_lower
if not filename_matches:
if match_strictness == "strict" or (match_strictness == "loose" and not filename_contains):
continue
language = None
# extract the potential language code
language_code = p_root[len(fileroot):].replace('_', '-')[1:]
# default language is undefined
language = Language('und')
# attempt to parse
if language_code:
try:
language_code = p_root.rsplit(".", 1)[1].replace('_', '-')
try:
language = Language.fromietf(language_code)
language.forced = forced
except ValueError:
except (ValueError, LanguageReverseError):
logger.error('Cannot parse language code %r', language_code)
language = None
language_code = None
except IndexError:
language_code = None
elif not language_code and only_one:
if not language and not language_code and only_one:
language = Language.rebuild(list(languages)[0], forced=forced)
subtitles[p] = language
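Review note: the `match_strictness` logic in this hunk can be read as a three-way decision on the subtitle filename once a trailing language code is removed. The sketch below is a hedged reconstruction of that decision only. `ENDSWITH_LANGUAGECODE_RE` is not shown in the diff, so the pattern here is an assumed stand-in, and `subtitle_matches` is a hypothetical helper name.

```python
import os
import re

# Assumed stand-in for subzero.language.ENDSWITH_LANGUAGECODE_RE (not shown in
# the diff): strips a trailing ".xx", ".xxx" or ".xx-YY" style suffix.
ENDSWITH_LANGUAGECODE_RE = re.compile(r"\.([a-zA-Z]{2,3}(?:[-_][a-zA-Z]{2,4})?)$")


def subtitle_matches(video_filename, subtitle_filename, match_strictness="strict"):
    """Return True if the subtitle file would be kept for this video (hypothetical helper)."""
    fn_no_ext_lower = os.path.splitext(video_filename)[0].lower()
    p_root = os.path.splitext(subtitle_filename)[0]
    # remove possible language code for matching
    p_root_lower = ENDSWITH_LANGUAGECODE_RE.sub("", p_root).lower()

    filename_matches = p_root_lower == fn_no_ext_lower
    filename_contains = p_root_lower in fn_no_ext_lower

    if filename_matches:
        return True
    if match_strictness == "strict":
        return False
    if match_strictness == "loose":
        return filename_contains
    # any other value skips the filename check, mirroring the condition in the diff
    return True


exact = subtitle_matches("Show.S01E01.720p.mkv", "Show.S01E01.720p.en.srt")
loose_hit = subtitle_matches("Show.S01E01.720p.WEB.mkv", "Show.S01E01.en.srt", "loose")
strict_miss = subtitle_matches("Show.S01E01.720p.WEB.mkv", "Show.S01E01.en.srt", "strict")
```

The stand-in pattern would also strip a short final name component such as `.Old`; the real regular expression may be stricter.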
@@ -628,7 +643,7 @@ def _search_external_subtitles(path, languages=None, only_one=False, scandir_gen
return subtitles
def search_external_subtitles(path, languages=None, only_one=False):
def search_external_subtitles(path, languages=None, only_one=False, match_strictness="strict"):
"""
wrap original search_external_subtitles function to search multiple paths for one given video
# todo: cleanup and merge with _search_external_subtitles
@@ -649,10 +664,11 @@ def search_external_subtitles(path, languages=None, only_one=False):
if os.path.isdir(os.path.dirname(abspath)):
try:
subtitles.update(_search_external_subtitles(abspath, languages=languages,
only_one=only_one))
only_one=only_one, match_strictness=match_strictness))
except OSError:
subtitles.update(_search_external_subtitles(abspath, languages=languages,
only_one=only_one, scandir_generic=True))
only_one=only_one, match_strictness=match_strictness,
scandir_generic=True))
logger.debug("external subs: found %s", subtitles)
return subtitles
@@ -845,6 +861,9 @@ def save_subtitles(file_path, subtitles, single=False, directory=None, chmod=Non
logger.debug(u"Saving %r to %r", subtitle, subtitle_path)
content = subtitle.get_modified_content(format=format, debug=debug_mods)
if content:
if os.path.exists(subtitle_path):
os.remove(subtitle_path)
with open(subtitle_path, 'w') as f:
f.write(content)
subtitle.storage_path = subtitle_path
@@ -9,3 +9,8 @@ class TooManyRequests(ProviderError):
class APIThrottled(ProviderError):
pass
class ParseResponseError(ProviderError):
"""Exception raised by providers when they are not able to parse the response."""
pass
@@ -10,6 +10,8 @@ import logging
import requests
import xmlrpclib
import dns.resolver
import ipaddress
import re
from requests import exceptions
from urllib3.util import connection
@@ -17,7 +19,13 @@ from retry.api import retry_call
from exceptions import APIThrottled
from dogpile.cache.api import NO_VALUE
from subliminal.cache import region
from cfscrape import CloudflareScraper
from subliminal_patch.pitcher import pitchers
from cloudscraper import CloudScraper
try:
import brotli
except:
pass
try:
from urlparse import urlparse
@@ -55,39 +63,111 @@ class CertifiSession(TimeoutSession):
self.verify = pem_file
class CFSession(CloudflareScraper):
def __init__(self):
super(CFSession, self).__init__()
class NeedsCaptchaException(Exception):
pass
class CFSession(CloudScraper):
def __init__(self, *args, **kwargs):
super(CFSession, self).__init__(*args, **kwargs)
self.debug = os.environ.get("CF_DEBUG", False)
def _request(self, method, url, *args, **kwargs):
ourSuper = super(CloudScraper, self)
resp = ourSuper.request(method, url, *args, **kwargs)
if resp.headers.get('Content-Encoding') == 'br':
if self.allow_brotli and resp._content:
resp._content = brotli.decompress(resp.content)
else:
logging.warning('Brotli content detected, But option is disabled, we will not continue.')
return resp
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
try:
if self.isChallengeRequest(resp):
if resp.request.method != 'GET':
# Work around if the initial request is not a GET,
# Supersede with a GET then re-request the original METHOD.
CloudScraper.request(self, 'GET', resp.url)
resp = ourSuper.request(method, url, *args, **kwargs)
else:
# Solve Challenge
resp = self.sendChallengeResponse(resp, **kwargs)
except ValueError, e:
if e.message == "Captcha":
parsed_url = urlparse(url)
domain = parsed_url.netloc
# solve the captcha
site_key = re.search(r'data-sitekey="(.+?)"', resp.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', resp.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', resp.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("cf: Captcha site-key not found!")
pitcher = pitchers.get_pitcher()("cf: %s" % domain, resp.request.url, site_key,
user_agent=self.headers["User-Agent"],
cookies=self.cookies.get_dict(),
is_invisible=True)
parsed_url = urlparse(resp.url)
logger.info("cf: %s: Solving captcha", domain)
result = pitcher.throw()
if not result:
raise Exception("cf: Couldn't solve captcha!")
submit_url = '{}://{}/cdn-cgi/l/chk_captcha'.format(parsed_url.scheme, domain)
method = resp.request.method
cloudflare_kwargs = {
'allow_redirects': False,
'headers': {'Referer': resp.url},
'params': OrderedDict(
[
('s', challenge_s),
('g-recaptcha-response', result)
]
)
}
return CloudScraper.request(self, method, submit_url, **cloudflare_kwargs)
return resp
def request(self, method, url, *args, **kwargs):
parsed_url = urlparse(url)
domain = parsed_url.netloc
cache_key = "cf_data2_%s" % domain
cache_key = "cf_data3_%s" % domain
if not self.cookies.get("__cfduid", "", domain=domain):
if not self.cookies.get("cf_clearance", "", domain=domain):
cf_data = region.get(cache_key)
if cf_data is not NO_VALUE:
cf_cookies, user_agent, hdrs = cf_data
cf_cookies, hdrs = cf_data
logger.debug("Trying to use old cf data for %s: %s", domain, cf_data)
for cookie, value in cf_cookies.iteritems():
self.cookies.set(cookie, value, domain=domain)
self._hdrs = hdrs
self._ua = user_agent
self.headers['User-Agent'] = self._ua
self.headers = hdrs
ret = super(CFSession, self).request(method, url, *args, **kwargs)
ret = self._request(method, url, *args, **kwargs)
try:
cf_data = self.get_cf_live_tokens(domain)
except:
pass
else:
if cf_data != region.get(cache_key) and cf_data[0]["__cfduid"] and cf_data[0]["cf_clearance"]:
logger.debug("Storing cf data for %s: %s", domain, cf_data)
region.set(cache_key, cf_data)
if cf_data and "cf_clearance" in cf_data[0] and cf_data[0]["cf_clearance"]:
if cf_data != region.get(cache_key):
logger.debug("Storing cf data for %s: %s", domain, cf_data)
region.set(cache_key, cf_data)
elif cf_data[0]["cf_clearance"]:
logger.debug("CF Live tokens not updated")
return ret
@@ -101,11 +181,11 @@ class CFSession(CloudflareScraper):
"Unable to find Cloudflare cookies. Does the site actually have "
"Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")
return (OrderedDict([
return (OrderedDict(filter(lambda x: x[1], [
("__cfduid", self.cookies.get("__cfduid", "", domain=cookie_domain)),
("cf_clearance", self.cookies.get("cf_clearance", "", domain=cookie_domain))
]),
self._ua, self._hdrs
])),
self.headers
)
@@ -236,41 +316,46 @@ def patch_create_connection():
global _custom_resolver, _custom_resolver_ips, dns_cache
host, port = address
__custom_resolver_ips = os.environ.get("dns_resolvers", None)
try:
ipaddress.ip_address(unicode(host))
except (ipaddress.AddressValueError, ValueError):
__custom_resolver_ips = os.environ.get("dns_resolvers", None)
# resolver ips changed in the meantime?
if __custom_resolver_ips != _custom_resolver_ips:
_custom_resolver = None
_custom_resolver_ips = __custom_resolver_ips
dns_cache = {}
# resolver ips changed in the meantime?
if __custom_resolver_ips != _custom_resolver_ips:
_custom_resolver = None
_custom_resolver_ips = __custom_resolver_ips
dns_cache = {}
custom_resolver = _custom_resolver
custom_resolver = _custom_resolver
if not custom_resolver:
if _custom_resolver_ips:
logger.debug("DNS: Trying to use custom DNS resolvers: %s", _custom_resolver_ips)
custom_resolver = dns.resolver.Resolver(configure=False)
custom_resolver.lifetime = 8.0
try:
custom_resolver.nameservers = json.loads(_custom_resolver_ips)
except:
logger.debug("DNS: Couldn't load custom DNS resolvers: %s", _custom_resolver_ips)
if not custom_resolver:
if _custom_resolver_ips:
logger.debug("DNS: Trying to use custom DNS resolvers: %s", _custom_resolver_ips)
custom_resolver = dns.resolver.Resolver(configure=False)
custom_resolver.lifetime = os.environ.get("dns_resolvers_timeout", 8.0)
try:
custom_resolver.nameservers = json.loads(_custom_resolver_ips)
except:
logger.debug("DNS: Couldn't load custom DNS resolvers: %s", _custom_resolver_ips)
else:
_custom_resolver = custom_resolver
if custom_resolver:
if host in dns_cache:
ip = dns_cache[host]
logger.debug("DNS: Using %s=%s from cache", host, ip)
return _orig_create_connection((ip, port), *args, **kwargs)
else:
_custom_resolver = custom_resolver
if custom_resolver:
if host in dns_cache:
ip = dns_cache[host]
logger.debug("DNS: Using %s=%s from cache", host, ip)
else:
try:
ip = custom_resolver.query(host)[0].address
logger.debug("DNS: Resolved %s to %s using %s", host, ip, custom_resolver.nameservers)
dns_cache[host] = ip
except dns.exception.DNSException:
logger.warning("DNS: Couldn't resolve %s with DNS: %s", host, custom_resolver.nameservers)
raise
try:
ip = custom_resolver.query(host)[0].address
logger.debug("DNS: Resolved %s to %s using %s", host, ip, custom_resolver.nameservers)
dns_cache[host] = ip
return _orig_create_connection((ip, port), *args, **kwargs)
except dns.exception.DNSException:
logger.warning("DNS: Couldn't resolve %s with DNS: %s", host, custom_resolver.nameservers)
logger.debug("DNS: Falling back to default DNS or IP on %s", host)
return _orig_create_connection((host, port), *args, **kwargs)
patch_create_connection._sz_patched = True
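Review note: the DNS hunk above only consults the custom resolver when `host` is not already an IP address, and it reads the resolver lifetime from the environment. The sketch below shows both pieces in isolation, assuming Python 3 (so no `unicode()` call is needed). `is_ip_literal` and `resolver_timeout` are hypothetical helper names. Environment values are strings, so the sketch converts the timeout explicitly; the diff assigns the raw value.

```python
import ipaddress
import os


def is_ip_literal(host):
    """True if host is an IPv4 or IPv6 address rather than a hostname."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # ipaddress.AddressValueError is a subclass of ValueError
        return False
    return True


def resolver_timeout(default=8.0):
    """Read dns_resolvers_timeout from the environment as a float."""
    # os.environ.get returns a str when the variable is set, so convert it
    return float(os.environ.get("dns_resolvers_timeout", default))
```

A caller would skip the custom resolver when `is_ip_literal(host)` is true and connect to the address directly.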
@@ -6,11 +6,13 @@ import subliminal
import time
from random import randint
from dogpile.cache.api import NO_VALUE
from requests import Session
from subliminal.cache import region
from subliminal.exceptions import DownloadLimitExceeded, AuthenticationError
from subliminal.exceptions import DownloadLimitExceeded, AuthenticationError, ConfigurationError
from subliminal.providers.addic7ed import Addic7edProvider as _Addic7edProvider, \
Addic7edSubtitle as _Addic7edSubtitle, ParserBeautifulSoup, show_cells_re
Addic7edSubtitle as _Addic7edSubtitle, ParserBeautifulSoup
from subliminal.subtitle import fix_line_ending
from subliminal_patch.utils import sanitize
from subliminal_patch.exceptions import TooManyRequests
@@ -19,6 +21,8 @@ from subzero.language import Language
logger = logging.getLogger(__name__)
show_cells_re = re.compile(b'<td class="(?:version|vr)">.*?</td>', re.DOTALL)
#: Series header parsing regex
series_year_re = re.compile(r'^(?P<series>[ \w\'.:(),*&!?-]+?)(?: \((?P<year>\d{4})\))?$')
@@ -66,11 +70,15 @@ class Addic7edProvider(_Addic7edProvider):
server_url = 'https://www.addic7ed.com/'
sanitize_characters = {'-', ':', '(', ')', '.', '/'}
last_show_ids_fetch_key = "addic7ed_last_id_fetch"
def __init__(self, username=None, password=None, use_random_agents=False):
super(Addic7edProvider, self).__init__(username=username, password=password)
self.USE_ADDICTED_RANDOM_AGENTS = use_random_agents
if not all((username, password)):
raise ConfigurationError('Username and password must be specified')
def initialize(self):
self.session = Session()
self.session.headers['User-Agent'] = 'Subliminal/%s' % subliminal.__short_version__
@@ -101,13 +109,18 @@ class Addic7edProvider(_Addic7edProvider):
'remember': 'true'}
tries = 0
while tries < 3:
while tries <= 3:
tries += 1
r = self.session.get(self.server_url + 'login.php', timeout=10, headers={"Referer": self.server_url})
if "grecaptcha" in r.content:
if "g-recaptcha" in r.content or "grecaptcha" in r.content:
logger.info('Addic7ed: Solving captcha. This might take a couple of minutes, but should only '
'happen once every so often')
site_key = re.search(r'grecaptcha.execute\(\'(.+?)\',', r.content).group(1)
for g, s in (("g-recaptcha-response", r'g-recaptcha.+?data-sitekey=\"(.+?)\"'),
("recaptcha_response", r'grecaptcha.execute\(\'(.+?)\',')):
site_key = re.search(s, r.content).group(1)
if site_key:
break
if not site_key:
logger.error("Addic7ed: Captcha site-key not found!")
return
@@ -119,23 +132,31 @@ class Addic7edProvider(_Addic7edProvider):
result = pitcher.throw()
if not result:
raise Exception("Addic7ed: Couldn't solve captcha!")
if tries >= 3:
raise Exception("Addic7ed: Couldn't solve captcha!")
logger.info("Addic7ed: Couldn't solve captcha! Retrying")
time.sleep(4)
continue
data["recaptcha_response"] = result
data[g] = result
time.sleep(1)
r = self.session.post(self.server_url + 'dologin.php', data, allow_redirects=False, timeout=10,
headers={"Referer": self.server_url + "login.php"})
if "relax, slow down" in r.content:
raise TooManyRequests(self.username)
if r.status_code != 302:
if "User <b></b> doesn't exist" in r.content and tries <= 2:
logger.info("Addic7ed: Error, trying again. (%s/%s)", tries+1, 3)
tries += 1
continue
if "Wrong password" in r.content or "doesn't exist" in r.content:
raise AuthenticationError(self.username)
if r.status_code != 302:
if tries >= 3:
logger.error("Addic7ed: Something went wrong when logging in")
raise AuthenticationError(self.username)
logger.info("Addic7ed: Something went wrong when logging in; retrying")
time.sleep(4)
continue
break
store_verification("addic7ed", self.session)
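Review note: the login hunk above retries up to three times and sleeps between attempts instead of failing on the first error. The same pattern, reduced to a generic helper, might look like the sketch below. `login_with_retries` and `attempt_login` are hypothetical names, `RuntimeError` stands in for `AuthenticationError`, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


def login_with_retries(attempt_login, max_tries=3, delay=4.0, sleep=time.sleep):
    """Call attempt_login() until it returns True, at most max_tries times.

    Returns the number of tries used; raises RuntimeError after the last failure.
    """
    for tries in range(1, max_tries + 1):
        if attempt_login():
            return tries
        if tries >= max_tries:
            # stand-in for the AuthenticationError raised in the diff
            raise RuntimeError("login failed after %d tries" % tries)
        # wait a short while before the next attempt
        sleep(delay)


# Example with a fake login that succeeds on the third call.
results = iter([False, False, True])
sleeps = []
used = login_with_retries(lambda: next(results), sleep=sleeps.append)
```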
@@ -143,10 +164,12 @@ class Addic7edProvider(_Addic7edProvider):
logger.debug('Addic7ed: Logged in')
self.logged_in = True
time.sleep(2)
def terminate(self):
self.session.close()
def get_show_id(self, series, year=None, country_code=None):
def get_show_id(self, series, year=None, country_code=None, ignore_cache=False):
"""Get the best matching show id for `series`, `year` and `country_code`.
First search in the result of :meth:`_get_show_ids` and fallback on a search with :meth:`_search_show_id`.
@@ -158,32 +181,45 @@ class Addic7edProvider(_Addic7edProvider):
:type country_code: str
:return: the show id, if found.
:rtype: int
"""
series_sanitized = sanitize(series).lower()
show_ids = self._get_show_ids()
show_id = None
ids_to_look_for = {sanitize(series).lower(), sanitize(series.replace(".", "")).lower()}
show_ids = self._get_show_ids()
if ignore_cache or not show_ids:
show_ids = self._get_show_ids.refresh(self)
# attempt with country
if not show_id and country_code:
logger.debug('Getting show id with country')
show_id = show_ids.get('%s %s' % (series_sanitized, country_code.lower()))
logger.debug("Trying show ids: %s", ids_to_look_for)
for series_sanitized in ids_to_look_for:
# attempt with country
if not show_id and country_code:
logger.debug('Getting show id with country')
show_id = show_ids.get('%s %s' % (series_sanitized, country_code.lower()))
# attempt with year
if not show_id and year:
logger.debug('Getting show id with year')
show_id = show_ids.get('%s %d' % (series_sanitized, year))
# attempt with year
if not show_id and year:
logger.debug('Getting show id with year')
show_id = show_ids.get('%s %d' % (series_sanitized, year))
# attempt clean
if not show_id:
logger.debug('Getting show id')
show_id = show_ids.get(series_sanitized)
# attempt clean
if not show_id:
logger.debug('Getting show id')
show_id = show_ids.get(series_sanitized)
# search as last resort
# broken right now
# if not show_id:
# logger.warning('Series %s not found in show ids', series)
# show_id = self._search_show_id(series)
if not show_id:
now = datetime.datetime.now()
last_fetch = region.get(self.last_show_ids_fetch_key)
# re-fetch show ids once per day if any show ID not found
if not ignore_cache and last_fetch != NO_VALUE and last_fetch + datetime.timedelta(days=1) < now:
logger.info("Show id not found; re-fetching show ids")
return self.get_show_id(series, year=year, country_code=country_code, ignore_cache=True)
logger.debug("Not refreshing show ids, as the last fetch has been too recent")
# search as last resort
# broken right now
# if not show_id:
# logger.warning('Series %s not found in show ids', series)
# show_id = self._search_show_id(series)
return show_id
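Review note: the `get_show_id` hunk above re-fetches the show list at most once per day when an id is missing, using the timestamp stored under `last_show_ids_fetch_key`. The decision itself is a single comparison, sketched below under the assumption that `None` replaces dogpile's `NO_VALUE` sentinel. `should_refetch` is a hypothetical helper name.

```python
import datetime


def should_refetch(last_fetch, now, ignore_cache=False,
                   max_age=datetime.timedelta(days=1)):
    """True if a missing show id justifies refreshing the cached show list."""
    # ignore_cache means this call is already the refresh, so do not recurse;
    # None stands in for the NO_VALUE sentinel (no fetch recorded yet).
    if ignore_cache or last_fetch is None:
        return False
    return last_fetch + max_age < now


now = datetime.datetime(2019, 10, 20, 12, 0)
stale = should_refetch(now - datetime.timedelta(days=2), now)
fresh = should_refetch(now - datetime.timedelta(hours=1), now)
```

The `ignore_cache` guard is what keeps the recursive `get_show_id(..., ignore_cache=True)` call from looping.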
@@ -197,6 +233,8 @@ class Addic7edProvider(_Addic7edProvider):
"""
# get the show page
logger.info('Getting show ids')
region.set(self.last_show_ids_fetch_key, datetime.datetime.now())
r = self.session.get(self.server_url + 'shows.php', timeout=10)
r.raise_for_status()
@@ -205,14 +243,15 @@ class Addic7edProvider(_Addic7edProvider):
# Assuming the site's markup is bad, and stripping it down to only contain what's needed.
show_cells = re.findall(show_cells_re, r.content)
if show_cells:
soup = ParserBeautifulSoup(b''.join(show_cells), ['lxml', 'html.parser'])
soup = ParserBeautifulSoup(b''.join(show_cells).decode('utf-8', 'ignore'), ['lxml', 'html.parser'])
else:
# If RegEx fails, fall back to original r.content and use 'html.parser'
soup = ParserBeautifulSoup(r.content, ['html.parser'])
# populate the show ids
show_ids = {}
for show in soup.select('td > h3 > a[href^="/show/"]'):
shows = soup.select('td > h3 > a[href^="/show/"]')
for show in shows:
show_clean = sanitize(show.text, default_characters=self.sanitize_characters)
try:
show_id = int(show['href'][6:])
@@ -230,6 +269,9 @@ class Addic7edProvider(_Addic7edProvider):
logger.debug('Found %d show ids', len(show_ids))
if not show_ids:
raise Exception("Addic7ed: No show IDs found!")
return show_ids
@region.cache_on_arguments(expiration_time=SHOW_EXPIRATION_TIME)
@@ -329,7 +371,7 @@ class Addic7edProvider(_Addic7edProvider):
# ignore incomplete subtitles
status = cells[5].text
if status != 'Completed':
if "%" in status:
logger.debug('Ignoring subtitle with status %s', status)
continue
@@ -23,9 +23,10 @@ class ArgenteamSubtitle(Subtitle):
hearing_impaired_verifiable = False
_release_info = None
def __init__(self, language, download_link, movie_kind, title, season, episode, year, release, version, source,
def __init__(self, language, page_link, download_link, movie_kind, title, season, episode, year, release, version, source,
video_codec, tvdb_id, imdb_id, asked_for_episode=None, asked_for_release_group=None, *args, **kwargs):
super(ArgenteamSubtitle, self).__init__(language, download_link, *args, **kwargs)
super(ArgenteamSubtitle, self).__init__(language, page_link=page_link, *args, **kwargs)
self.page_link = page_link
self.download_link = download_link
self.movie_kind = movie_kind
self.title = title
@@ -135,7 +136,8 @@ class ArgenteamProvider(Provider, ProviderSubtitleArchiveMixin):
provider_name = 'argenteam'
languages = {Language.fromalpha2(l) for l in ['es']}
video_types = (Episode, Movie)
API_URL = "http://argenteam.net/api/v1/"
BASE_URL = "https://argenteam.net/"
API_URL = BASE_URL + "api/v1/"
subtitle_class = ArgenteamSubtitle
hearing_impaired_verifiable = False
language_list = list(languages)
@@ -240,12 +242,15 @@ class ArgenteamProvider(Provider, ProviderSubtitleArchiveMixin):
for r in content['releases']:
for s in r['subtitles']:
sub = ArgenteamSubtitle(language, s['uri'], "episode" if is_episode else "movie", returned_title,
movie_kind = "episode" if is_episode else "movie"
page_link = self.BASE_URL + movie_kind + "/" + str(aid)
# use https and new domain
download_link = s['uri'].replace('http://www.argenteam.net/', self.BASE_URL)
sub = ArgenteamSubtitle(language, page_link, download_link, movie_kind, returned_title,
season, episode, year, r.get('team'), r.get('tags'),
r.get('source'), r.get('codec'), content.get("tvdb"), imdb_id,
asked_for_release_group=video.release_group,
asked_for_episode=episode
)
asked_for_episode=episode)
subtitles.append(sub)
if has_multiple_ids:
@@ -0,0 +1,124 @@
import logging
import os
from io import BytesIO
from zipfile import ZipFile
from requests import Session
from subliminal_patch.subtitle import Subtitle
from subliminal_patch.providers import Provider
from subliminal import __short_version__
from subliminal.exceptions import AuthenticationError, ConfigurationError
from subliminal.subtitle import fix_line_ending
from subzero.language import Language
logger = logging.getLogger(__name__)
class Napisy24Subtitle(Subtitle):
'''Napisy24 Subtitle.'''
provider_name = 'napisy24'
def __init__(self, language, hash, imdb_id, napis_id):
super(Napisy24Subtitle, self).__init__(language)
self.hash = hash
self.imdb_id = imdb_id
self.napis_id = napis_id
@property
def id(self):
return self.hash
def get_matches(self, video):
matches = set()
# hash
if 'napisy24' in video.hashes and video.hashes['napisy24'] == self.hash:
matches.add('hash')
# imdb_id
if video.imdb_id and self.imdb_id == video.imdb_id:
matches.add('imdb_id')
return matches
class Napisy24Provider(Provider):
'''Napisy24 Provider.'''
languages = {Language(l) for l in ['pol']}
required_hash = 'napisy24'
api_url = 'http://napisy24.pl/run/CheckSubAgent.php'
def __init__(self, username=None, password=None):
if all((username, password)):
self.username = username
self.password = password
else:
self.username = 'subliminal'
self.password = 'lanimilbus'
self.session = None
def initialize(self):
self.session = Session()
self.session.headers['User-Agent'] = 'Subliminal/%s' % __short_version__
self.session.headers['Content-Type'] = 'application/x-www-form-urlencoded'
def terminate(self):
self.session.close()
def query(self, language, size, name, hash):
params = {
'postAction': 'CheckSub',
'ua': self.username,
'ap': self.password,
'fs': size,
'fh': hash,
'fn': os.path.basename(name),
'n24pref': 1
}
response = self.session.post(self.api_url, data=params, timeout=10)
response.raise_for_status()
response_content = response.content.split(b'||', 1)
n24_data = response_content[0].decode()
if n24_data[:2] != 'OK':
if n24_data[:11] == 'login error':
raise AuthenticationError('Login failed')
logger.error('Unknown response: %s', response.content)
return None
n24_status = n24_data[:4]
if n24_status == 'OK-0':
logger.info('No subtitles found')
return None
subtitle_info = dict(p.split(':', 1) for p in n24_data.split('|')[1:])
logger.debug('Subtitle info: %s', subtitle_info)
if n24_status == 'OK-1':
logger.info('No subtitles found but got video info')
return None
elif n24_status == 'OK-2':
logger.info('Found subtitles')
elif n24_status == 'OK-3':
logger.info('Found subtitles but not from Napisy24 database')
return None
subtitle_content = response_content[1]
subtitle = Napisy24Subtitle(language, hash, 'tt%s' % subtitle_info['imdb'].zfill(7), subtitle_info['napisId'])
with ZipFile(BytesIO(subtitle_content)) as zf:
subtitle.content = fix_line_ending(zf.open(zf.namelist()[0]).read())
return subtitle
def list_subtitles(self, video, languages):
subtitles = [self.query(l, video.size, video.name, video.hashes['napisy24']) for l in languages]
return [s for s in subtitles if s is not None]
def download_subtitle(self, subtitle):
# there is no download step, content is already filled from listing subtitles
pass
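Review note: the reply handling in `query` above implies a wire format of a status, `|`-separated `key:value` fields, then `||` and the zip payload. The sketch below parses such a reply in isolation. The format is inferred from the parsing code only, and `parse_n24_response`, `imdb_tt` and the UTF-8 decoding are assumptions for illustration, not part of the Napisy24 API.

```python
def parse_n24_response(content):
    """Split a CheckSubAgent-style reply into (status, info, payload)."""
    # Everything before the first "||" is text; the rest is the archive.
    head, _, payload = content.partition(b'||')
    text = head.decode('utf-8')  # assumption: the header is UTF-8/ASCII
    status = text[:4]
    # Fields after the status are "key:value" pairs separated by "|".
    info = dict(p.split(':', 1) for p in text.split('|')[1:] if ':' in p)
    return status, info, payload


def imdb_tt(raw_id):
    """Pad a numeric IMDb id to seven digits and add the tt prefix."""
    return 'tt%s' % raw_id.zfill(7)
```

Separating parsing from the HTTP call would let the status branches (`OK-0` to `OK-3`) be tested without a session.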
@@ -4,29 +4,34 @@ import io
import logging
import os
import time
import traceback
import requests
import inflect
import cfscrape
import re
import json
import HTMLParser
import urlparse
from random import randint
from zipfile import ZipFile
from babelfish import language_converters
from guessit import guessit
from dogpile.cache.api import NO_VALUE
from subliminal import Episode, ProviderError
from subliminal.cache import region
from subliminal.exceptions import ConfigurationError, ServiceUnavailable
from subliminal.utils import sanitize_release_group
from subliminal.cache import region
from subliminal_patch.http import RetryingCFSession
from subliminal_patch.providers import Provider
from subliminal_patch.providers.mixins import ProviderSubtitleArchiveMixin
from subliminal_patch.subtitle import Subtitle, guess_matches
from subliminal_patch.converters.subscene import language_ids, supported_languages
from subscene_api.subscene import search, Subtitle as APISubtitle
from subscene_api.subscene import search, Subtitle as APISubtitle, SITE_DOMAIN
from subzero.language import Language
p = inflect.engine()
language_converters.register('subscene = subliminal_patch.converters.subscene:SubsceneConverter')
logger = logging.getLogger(__name__)
@@ -117,28 +122,106 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
skip_wrong_fps = False
hearing_impaired_verifiable = True
only_foreign = False
username = None
password = None
search_throttle = 2 # seconds
search_throttle = 8 # seconds
def __init__(self, only_foreign=False, username=None, password=None):
if not all((username, password)):
raise ConfigurationError('Username and password must be specified')
def __init__(self, only_foreign=False):
self.only_foreign = only_foreign
self.username = username
self.password = password
def initialize(self):
logger.info("Creating session")
self.session = RetryingCFSession()
prev_cookies = region.get("subscene_cookies2")
if prev_cookies != NO_VALUE:
logger.debug("Re-using old subscene cookies: %r", prev_cookies)
self.session.cookies.update(prev_cookies)
else:
logger.debug("Logging in")
self.login()
def login(self):
r = self.session.get("https://subscene.com/account/login")
if "Server Error" in r.content:
logger.error("Login unavailable; Maintenance?")
raise ServiceUnavailable("Login unavailable; Maintenance?")
match = re.search(r"<script id='modelJson' type='application/json'>\s*(.+)\s*</script>", r.content)
if match:
h = HTMLParser.HTMLParser()
data = json.loads(h.unescape(match.group(1)))
login_url = urlparse.urljoin(data["siteUrl"], data["loginUrl"])
time.sleep(1.0)
r = self.session.post(login_url,
{
"username": self.username,
"password": self.password,
data["antiForgery"]["name"]: data["antiForgery"]["value"]
})
pep_content = re.search(r"<form method=\"post\" action=\"https://subscene\.com/\">"
r".+name=\"id_token\".+?value=\"(?P<id_token>.+?)\".*?"
r"access_token\".+?value=\"(?P<access_token>.+?)\".+?"
r"token_type.+?value=\"(?P<token_type>.+?)\".+?"
r"expires_in.+?value=\"(?P<expires_in>.+?)\".+?"
r"scope.+?value=\"(?P<scope>.+?)\".+?"
r"state.+?value=\"(?P<state>.+?)\".+?"
r"session_state.+?value=\"(?P<session_state>.+?)\"",
r.content, re.MULTILINE | re.DOTALL)
if pep_content:
r = self.session.post(SITE_DOMAIN, pep_content.groupdict())
try:
r.raise_for_status()
except Exception:
raise ProviderError("Something went wrong when trying to log in: %s" % traceback.format_exc())
else:
cj = self.session.cookies.copy()
store_cks = ("scene", "idsrv", "idsrv.xsrf", "idsvr.clients", "idsvr.session", "idsvr.username")
for cn in self.session.cookies.iterkeys():
if cn not in store_cks:
del cj[cn]
logger.debug("Storing cookies: %r", cj)
region.set("subscene_cookies2", cj)
return
raise ProviderError("Something went wrong when trying to log in #1")
def terminate(self):
logger.info("Closing session")
self.session.close()
def _create_filters(self, languages):
self.filters = dict(HearingImpaired="2")
acc_filters = self.filters.copy()
if self.only_foreign:
self.filters["ForeignOnly"] = "True"
acc_filters["ForeignOnly"] = self.filters["ForeignOnly"].lower()
logger.info("Only searching for foreign/forced subtitles")
self.filters["LanguageFilter"] = ",".join((str(language_ids[l.alpha3]) for l in languages
if l.alpha3 in language_ids))
selected_ids = []
for l in languages:
lid = language_ids.get(l.basename, language_ids.get(l.alpha3, None))
if lid:
selected_ids.append(str(lid))
acc_filters["SelectedIds"] = selected_ids
self.filters["LanguageFilter"] = ",".join(acc_filters["SelectedIds"])
last_filters = region.get("subscene_filters")
if last_filters != acc_filters:
region.set("subscene_filters", acc_filters)
logger.debug("Setting account filters to %r", acc_filters)
self.session.post("https://u.subscene.com/filter", acc_filters, allow_redirects=False)
logger.debug("Filter created: '%s'" % self.filters)
@@ -181,7 +264,11 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
def parse_results(self, video, film):
subtitles = []
for s in film.subtitles:
subtitle = SubsceneSubtitle.from_api(s)
try:
subtitle = SubsceneSubtitle.from_api(s)
except NotImplementedError as e:
logger.info(e)
continue
subtitle.asked_for_release_group = video.release_group
if isinstance(video, Episode):
subtitle.asked_for_episode = video.episode
@@ -194,10 +281,16 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
return subtitles
def do_search(self, *args, **kwargs):
try:
return search(*args, **kwargs)
except requests.HTTPError:
region.delete("subscene_cookies2")
def query(self, video):
vfn = get_video_filename(video)
# vfn = get_video_filename(video)
subtitles = []
#logger.debug(u"Searching for: %s", vfn)
# logger.debug(u"Searching for: %s", vfn)
# film = search(vfn, session=self.session)
#
# if film and film.subtitles:
@@ -206,16 +299,17 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
# else:
# logger.debug('No release results found')
#time.sleep(self.search_throttle)
# time.sleep(self.search_throttle)
# re-search for episodes without explicit release name
if isinstance(video, Episode):
#term = u"%s S%02iE%02i" % (video.series, video.season, video.episode)
more_than_one = len([video.series] + video.alternative_series) > 1
for series in [video.series] + video.alternative_series:
titles = list(set([video.series] + video.alternative_series))[:2]
# term = u"%s S%02iE%02i" % (video.series, video.season, video.episode)
more_than_one = len(titles) > 1
for series in titles:
term = u"%s - %s Season" % (series, p.number_to_words("%sth" % video.season).capitalize())
logger.debug('Searching for alternative results: %s', term)
film = search(term, session=self.session, release=False)
film = self.do_search(term, session=self.session, release=False, throttle=self.search_throttle)
if film and film.subtitles:
logger.debug('Alternative results found: %s', len(film.subtitles))
subtitles += self.parse_results(video, film)
@@ -223,25 +317,27 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
logger.debug('No alternative results found')
# packs
if video.season_fully_aired:
term = u"%s S%02i" % (series, video.season)
logger.debug('Searching for packs: %s', term)
time.sleep(self.search_throttle)
film = search(term, session=self.session)
if film and film.subtitles:
logger.debug('Pack results found: %s', len(film.subtitles))
subtitles += self.parse_results(video, film)
else:
logger.debug('No pack results found')
else:
logger.debug("Not searching for packs, because the season hasn't fully aired")
# if video.season_fully_aired:
# term = u"%s S%02i" % (series, video.season)
# logger.debug('Searching for packs: %s', term)
# time.sleep(self.search_throttle)
# film = search(term, session=self.session, throttle=self.search_throttle)
# if film and film.subtitles:
# logger.debug('Pack results found: %s', len(film.subtitles))
# subtitles += self.parse_results(video, film)
# else:
# logger.debug('No pack results found')
# else:
# logger.debug("Not searching for packs, because the season hasn't fully aired")
if more_than_one:
time.sleep(self.search_throttle)
else:
more_than_one = len([video.title] + video.alternative_titles) > 1
for title in [video.title] + video.alternative_titles:
logger.debug('Searching for movie results: %s', title)
film = search(title, year=video.year, session=self.session, limit_to=None, release=False)
titles = list(set([video.title] + video.alternative_titles))[:2]
more_than_one = len(titles) > 1
for title in titles:
logger.debug('Searching for movie results: %r', title)
film = self.do_search(title, year=video.year, session=self.session, limit_to=None, release=False,
throttle=self.search_throttle)
if film and film.subtitles:
subtitles += self.parse_results(video, film)
if more_than_one:
@@ -2,42 +2,35 @@
import io
import logging
import math
import re
import time
from datetime import datetime
import dateutil.parser
import rarfile
from bs4 import BeautifulSoup
from zipfile import ZipFile, is_zipfile
from rarfile import RarFile, is_rarfile
from babelfish import language_converters, Script
from requests import RequestException
from requests import RequestException, codes as request_codes
from guessit import guessit
from subliminal_patch.http import RetryingCFSession
from subliminal_patch.providers import Provider
from subliminal_patch.providers.mixins import ProviderSubtitleArchiveMixin
from subliminal_patch.subtitle import Subtitle
from subliminal_patch.utils import sanitize, fix_inconsistent_naming as _fix_inconsistent_naming
from subliminal.exceptions import ProviderError
from subliminal.exceptions import ProviderError, AuthenticationError, ConfigurationError
from subliminal.score import get_equivalent_release_groups
from subliminal.utils import sanitize_release_group
from subliminal.subtitle import guess_matches
from subliminal.video import Episode, Movie
from subliminal.subtitle import fix_line_ending
from subliminal_patch.pitcher import pitchers, load_verification, store_verification
from subzero.language import Language
from random import randint
from .utils import FIRST_THOUSAND_OR_SO_USER_AGENTS as AGENT_LIST
from subzero.language import Language
from dogpile.cache.api import NO_VALUE
from subliminal.cache import region
# parsing regex definitions
title_re = re.compile(r'(?P<title>(?:.+(?= [Aa][Kk][Aa] ))|.+)(?:(?:.+)(?P<altitle>(?<= [Aa][Kk][Aa] ).+))?')
lang_re = re.compile(r'(?<=flags/)(?P<lang>.{2})(?:.)(?P<script>c?)(?:.+)')
season_re = re.compile(r'Sezona (?P<season>\d+)')
episode_re = re.compile(r'Epizoda (?P<episode>\d+)')
year_re = re.compile(r'(?P<year>\d+)')
fps_re = re.compile(r'fps: (?P<fps>.+)')
def fix_inconsistent_naming(title):
@@ -51,6 +44,7 @@ def fix_inconsistent_naming(title):
return _fix_inconsistent_naming(title, {"DC's Legends of Tomorrow": "Legends of Tomorrow",
"Marvel's Jessica Jones": "Jessica Jones"})
logger = logging.getLogger(__name__)
# Configure :mod:`rarfile` to use the same path separator as :mod:`zipfile`
@@ -62,9 +56,9 @@ language_converters.register('titlovi = subliminal_patch.converters.titlovi:Titl
class TitloviSubtitle(Subtitle):
provider_name = 'titlovi'
def __init__(self, language, page_link, download_link, sid, releases, title, alt_title=None, season=None,
episode=None, year=None, fps=None, asked_for_release_group=None, asked_for_episode=None):
super(TitloviSubtitle, self).__init__(language, page_link=page_link)
def __init__(self, language, download_link, sid, releases, title, alt_title=None, season=None,
episode=None, year=None, rating=None, download_count=None, asked_for_release_group=None, asked_for_episode=None):
super(TitloviSubtitle, self).__init__(language)
self.sid = sid
self.releases = self.release_info = releases
self.title = title
@@ -73,11 +67,21 @@ class TitloviSubtitle(Subtitle):
self.episode = episode
self.year = year
self.download_link = download_link
self.fps = fps
self.rating = rating
self.download_count = download_count
self.matches = None
self.asked_for_release_group = asked_for_release_group
self.asked_for_episode = asked_for_episode
def __repr__(self):
if self.season and self.episode:
return '<%s "%s (%r)" s%.2de%.2d [%s:%s] ID:%r R:%.2f D:%r>' % (
self.__class__.__name__, self.title, self.year, self.season, self.episode, self.language, self._guessed_encoding, self.sid,
self.rating, self.download_count)
else:
return '<%s "%s (%r)" [%s:%s] ID:%r R:%.2f D:%r>' % (
self.__class__.__name__, self.title, self.year, self.language, self._guessed_encoding, self.sid, self.rating, self.download_count)
@property
def id(self):
return self.sid
@@ -134,20 +138,62 @@ class TitloviSubtitle(Subtitle):
class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
subtitle_class = TitloviSubtitle
languages = {Language.fromtitlovi(l) for l in language_converters['titlovi'].codes} | {Language.fromietf('sr-Latn')}
server_url = 'https://titlovi.com'
search_url = server_url + '/titlovi/?'
download_url = server_url + '/download/?type=1&mediaid='
api_url = 'https://kodi.titlovi.com/api/subtitles'
api_gettoken_url = api_url + '/gettoken'
api_search_url = api_url + '/search'
def __init__(self, username=None, password=None):
if not all((username, password)):
raise ConfigurationError('Username and password must be specified')
self.username = username
self.password = password
self.session = None
self.user_id = None
self.login_token = None
self.token_exp = None
def initialize(self):
self.session = RetryingCFSession()
load_verification("titlovi", self.session)
#load_verification("titlovi", self.session)
token = region.get("titlovi_token")
if token is not NO_VALUE:
self.user_id, self.login_token, self.token_exp = token
if datetime.now() > self.token_exp:
logger.debug('Token expired')
self.log_in()
else:
logger.debug('Use cached token')
else:
logger.debug('Token not found in cache')
self.log_in()
def log_in(self):
login_params = dict(username=self.username, password=self.password, json=True)
try:
response = self.session.post(self.api_gettoken_url, params=login_params)
if response.status_code == request_codes.ok:
resp_json = response.json()
self.login_token = resp_json.get('Token')
self.user_id = resp_json.get('UserId')
self.token_exp = dateutil.parser.parse(resp_json.get('ExpirationDate'))
region.set("titlovi_token", [self.user_id, self.login_token, self.token_exp])
logger.debug('New token obtained')
elif response.status_code == request_codes.unauthorized:
raise AuthenticationError('Login failed')
except RequestException as e:
logger.error(e)
def terminate(self):
self.session.close()
def query(self, languages, title, season=None, episode=None, year=None, video=None):
items_per_page = 10
current_page = 1
def query(self, languages, title, season=None, episode=None, year=None, imdb_id=None, video=None):
search_params = dict()
used_languages = languages
lang_strings = [str(lang) for lang in used_languages]
@@ -162,168 +208,73 @@ class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
langs = '|'.join(map(str, [l.titlovi for l in used_languages]))
# set query params
params = {'prijevod': title, 'jezik': langs}
search_params['query'] = title
search_params['lang'] = langs
is_episode = False
if season and episode:
is_episode = True
params['s'] = season
params['e'] = episode
if year:
params['g'] = year
search_params['season'] = season
search_params['episode'] = episode
#if year:
# search_params['year'] = year
if imdb_id:
search_params['imdbID'] = imdb_id
# loop through paginated results
logger.info('Searching subtitles %r', params)
logger.info('Searching subtitles %r', search_params)
subtitles = []
query_results = []
while True:
# query the server
try:
r = self.session.get(self.search_url, params=params, timeout=10)
r.raise_for_status()
except RequestException as e:
captcha_passed = False
if e.response.status_code == 403 and "data-sitekey" in e.response.content:
logger.info('titlovi: Solving captcha. This might take a couple of minutes, but should only '
'happen once every so often')
try:
search_params['token'] = self.login_token
search_params['userid'] = self.user_id
search_params['json'] = True
site_key = re.search(r'data-sitekey="(.+?)"', e.response.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', e.response.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', e.response.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("titlovi: Captcha site-key not found!")
response = self.session.get(self.api_search_url, params=search_params)
resp_json = response.json()
if resp_json['SubtitleResults']:
query_results.extend(resp_json['SubtitleResults'])
pitcher = pitchers.get_pitcher()("titlovi", e.request.url, site_key,
user_agent=self.session.headers["User-Agent"],
cookies=self.session.cookies.get_dict(),
is_invisible=True)
result = pitcher.throw()
if not result:
raise Exception("titlovi: Couldn't solve captcha!")
except Exception as e:
logger.error(e)
s_params = {
"s": challenge_s,
"id": challenge_ray,
"g-recaptcha-response": result,
}
r = self.session.get(self.server_url + "/cdn-cgi/l/chk_captcha", params=s_params, timeout=10,
allow_redirects=False)
r.raise_for_status()
r = self.session.get(self.search_url, params=params, timeout=10)
r.raise_for_status()
store_verification("titlovi", self.session)
captcha_passed = True
for sub in query_results:
if not captcha_passed:
logger.exception('RequestException %s', e)
break
# title and alternate title
match = title_re.search(sub.get('Title'))
if match:
_title = match.group('title')
alt_title = match.group('altitle')
else:
try:
soup = BeautifulSoup(r.content, 'lxml')
continue
# number of results
result_count = int(soup.select_one('.results_count b').string)
except:
result_count = None
# handle movies and series separately
if is_episode:
subtitle = self.subtitle_class(Language.fromtitlovi(sub.get('Lang')), sub.get('Link'), sub.get('Id'), sub.get('Release'), _title,
alt_title=alt_title, season=sub.get('Season'), episode=sub.get('Episode'),
year=sub.get('Year'), rating=sub.get('Rating'),
download_count=sub.get('DownloadCount'),
asked_for_release_group=video.release_group,
asked_for_episode=episode)
else:
subtitle = self.subtitle_class(Language.fromtitlovi(sub.get('Lang')), sub.get('Link'), sub.get('Id'), sub.get('Release'), _title,
alt_title=alt_title, year=sub.get('Year'), rating=sub.get('Rating'),
download_count=sub.get('DownloadCount'),
asked_for_release_group=video.release_group)
logger.debug('Found subtitle %r', subtitle)
# exit if no results
if not result_count:
if not subtitles:
logger.debug('No subtitles found')
else:
logger.debug("No more subtitles found")
break
# prime our matches so we can use the values later
subtitle.get_matches(video)
# number of pages with results
pages = int(math.ceil(result_count / float(items_per_page)))
# get current page
if 'pg' in params:
current_page = int(params['pg'])
try:
sublist = soup.select('section.titlovi > ul.titlovi > li.subtitleContainer.canEdit')
for sub in sublist:
# subtitle id
sid = sub.find(attrs={'data-id': True}).attrs['data-id']
# get download link
download_link = self.download_url + sid
# title and alternate title
match = title_re.search(sub.a.string)
if match:
_title = match.group('title')
alt_title = match.group('altitle')
else:
continue
# page link
page_link = self.server_url + sub.a.attrs['href']
# subtitle language
match = lang_re.search(sub.select_one('.lang').attrs['src'])
if match:
try:
# decode language
lang = Language.fromtitlovi(match.group('lang')+match.group('script'))
except ValueError:
continue
# release year or series start year
match = year_re.search(sub.find(attrs={'data-id': True}).parent.i.string)
if match:
r_year = int(match.group('year'))
# fps
match = fps_re.search(sub.select_one('.fps').string)
if match:
fps = match.group('fps')
# releases
releases = str(sub.select_one('.fps').parent.contents[0].string)
# handle movies and series separately
if is_episode:
# season and episode info
sxe = sub.select_one('.s0xe0y').string
r_season = None
r_episode = None
if sxe:
match = season_re.search(sxe)
if match:
r_season = int(match.group('season'))
match = episode_re.search(sxe)
if match:
r_episode = int(match.group('episode'))
subtitle = self.subtitle_class(lang, page_link, download_link, sid, releases, _title,
alt_title=alt_title, season=r_season, episode=r_episode,
year=r_year, fps=fps,
asked_for_release_group=video.release_group,
asked_for_episode=episode)
else:
subtitle = self.subtitle_class(lang, page_link, download_link, sid, releases, _title,
alt_title=alt_title, year=r_year, fps=fps,
asked_for_release_group=video.release_group)
logger.debug('Found subtitle %r', subtitle)
# prime our matches so we can use the values later
subtitle.get_matches(video)
# add found subtitles
subtitles.append(subtitle)
finally:
soup.decompose()
# stop on last page
if current_page >= pages:
break
# increment current page
params['pg'] = current_page + 1
logger.debug('Getting page %d', params['pg'])
# add found subtitles
subtitles.append(subtitle)
return subtitles
def list_subtitles(self, video, languages):
season = episode = None
if isinstance(video, Episode):
title = video.series
season = video.season
@@ -333,6 +284,7 @@ class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
return [s for s in
self.query(languages, fix_inconsistent_naming(title), season=season, episode=episode, year=video.year,
imdb_id=video.imdb_id,
video=video)]
def download_subtitle(self, subtitle):
@@ -370,10 +322,12 @@ class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
sub_to_extract = None
for sub_name in subs_in_archive:
if not ('.cyr' in sub_name or '.cir' in sub_name):
_sub_name = sub_name.lower()
if not ('.cyr' in _sub_name or '.cir' in _sub_name or 'cyr)' in _sub_name):
sr_lat_subs.append(sub_name)
if ('.cyr' in sub_name or '.cir' in sub_name) and not '.lat' in sub_name:
if ('.cyr' in _sub_name or '.cir' in _sub_name) and '.lat' not in _sub_name:
sr_cyr_subs.append(sub_name)
if subtitle.language == 'sr':
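Review note: the archive-name classification in this hunk can be checked on its own. The sketch below mirrors the two conditions with every comparison done on the lowercased name. `split_serbian_subs` is a hypothetical helper, not a function in the provider.

```python
def split_serbian_subs(names):
    """Sort archive member names into Latin and Cyrillic candidates."""
    latin, cyrillic = [], []
    for name in names:
        lowered = name.lower()
        marked = '.cyr' in lowered or '.cir' in lowered
        # Latin: no Cyrillic marker at all, including the "cyr)" form.
        if not (marked or 'cyr)' in lowered):
            latin.append(name)
        # Cyrillic: explicit marker and no Latin marker in the same name.
        if marked and '.lat' not in lowered:
            cyrillic.append(name)
    return latin, cyrillic
```

Names carrying both markers, or only the `cyr)` form, land in neither list, which matches the conditions as written in the hunk.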
@@ -123,7 +123,8 @@ class Subtitle(Subtitle_):
# http://scratchpad.wikia.com/wiki/Character_Encoding_Recommendation_for_Languages
if self.language.alpha3 == 'zho':
encodings.extend(['cp936', 'gb2312', 'cp950', 'gb18030', 'big5', 'big5hkscs'])
encodings.extend(['cp936', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp_2', 'cp950', 'gb18030', 'big5',
'big5hkscs', 'utf-16'])
elif self.language.alpha3 == 'jpn':
encodings.extend(['shift-jis', 'cp932', 'euc_jp', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2',
'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', ])
@@ -132,7 +133,7 @@ class Subtitle(Subtitle_):
# arabic/farsi
elif self.language.alpha3 in ('ara', 'fas', 'per'):
encodings.append('windows-1256')
encodings.extend(['windows-1256', 'utf-16'])
elif self.language.alpha3 == 'heb':
encodings.extend(['windows-1255', 'iso-8859-8'])
elif self.language.alpha3 == 'tur':
@@ -250,8 +251,7 @@ class Subtitle(Subtitle_):
subs = pysubs2.SSAFile.from_string(text, fps=self.plex_media_fps)
unicontent = self.pysubs2_to_unicode(subs)
self.content = unicontent.encode("utf-8")
self._guessed_encoding = "utf-8"
self.content = unicontent.encode(self._guessed_encoding)
except:
logger.exception("Couldn't convert subtitle %s to .srt format: %s", self, traceback.format_exc())
return False
@@ -319,7 +319,8 @@ class Subtitle(Subtitle_):
:return: string
"""
if not self.mods:
return fix_text(self.content.decode("utf-8"), **ftfy_defaults).encode(encoding="utf-8")
return fix_text(self.content.decode(encoding=self._guessed_encoding), **ftfy_defaults).encode(
encoding=self._guessed_encoding)
submods = SubtitleModifications(debug=debug)
if submods.load(content=self.text, language=self.language):
@@ -328,7 +329,7 @@ class Subtitle(Subtitle_):
self.mods = submods.mods_used
content = fix_text(self.pysubs2_to_unicode(submods.f, format=format), **ftfy_defaults)\
.encode(encoding="utf-8")
.encode(encoding=self._guessed_encoding)
submods.f = None
del submods
return content
@@ -21,7 +21,7 @@ if debug:
logging.basicConfig(level=logging.DEBUG)
#sub = Subtitle(Language.fromietf("eng:forced"), mods=["common", "remove_HI", "OCR_fixes", "fix_uppercase", "shift_offset(ms=-500)", "shift_offset(ms=500)", "shift_offset(s=2,ms=800)"])
sub = Subtitle(Language.fromietf("eng:forced"), mods=["common", "remove_HI", "OCR_fixes", "fix_uppercase", "shift_offset(ms=0,s=1)"])
sub = Subtitle(Language.fromietf("eng"), mods=["common", "remove_HI", "OCR_fixes", "fix_uppercase", "shift_offset(ms=0,s=1)"])
sub.content = open(fn).read()
sub.normalize()
content = sub.get_modified_content(debug=True)
@@ -28,6 +28,9 @@ import re
import enum
import sys
import requests
import time
import logging
is_PY2 = sys.version_info[0] < 3
if is_PY2:
@@ -37,8 +40,13 @@ else:
from contextlib import suppress
from urllib.request import Request, urlopen
from dogpile.cache.api import NO_VALUE
from subliminal.cache import region
from bs4 import BeautifulSoup, NavigableString
logger = logging.getLogger(__name__)
# constants
HEADERS = {
}
@@ -48,14 +56,23 @@ DEFAULT_USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWeb"\
"Kit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36"
ENDPOINT_RE = re.compile(ur'(?uis)<form.+?action="/subtitles/(.+)">.*?<input type="text"')
class NewEndpoint(Exception):
pass
# utils
def soup_for(url, session=None, user_agent=DEFAULT_USER_AGENT):
def soup_for(url, data=None, session=None, user_agent=DEFAULT_USER_AGENT):
url = re.sub(r"\s", "+", url)
if not session:
r = Request(url, data=None, headers=dict(HEADERS, **{"User-Agent": user_agent}))
html = urlopen(r).read().decode("utf-8")
else:
html = session.get(url).text
ret = session.post(url, data=data)
ret.raise_for_status()
html = ret.text
return BeautifulSoup(html, "html.parser")
@@ -108,7 +125,7 @@ class Subtitle(object):
subtitles = []
for row in rows:
if row.td.a is not None:
if row.td.a is not None and row.td.get("class", ["lazy"])[0] != "empty":
subtitles.append(cls.from_row(row))
return subtitles
@@ -238,22 +255,52 @@ def get_first_film(soup, section, year=None, session=None):
url = SITE_DOMAIN + t.div.a.get("href")
break
if not url:
return
# fallback to non-year results
logger.info("Falling back to non-year results as year wasn't found (%s)", year)
url = SITE_DOMAIN + tag.findNext("ul").find("li").div.a.get("href")
return Film.from_url(url, session=session)
def search(term, release=True, session=None, year=None, limit_to=SearchTypes.Exact):
soup = soup_for("%s/subtitles/%s?q=%s" % (SITE_DOMAIN, "release" if release else "title", term), session=session)
def find_endpoint(session, content=None):
endpoint = region.get("subscene_endpoint2")
if endpoint is NO_VALUE:
if not content:
content = session.get(SITE_DOMAIN).text
if "Subtitle search by" in str(soup):
rows = soup.find("table").tbody.find_all("tr")
subtitles = Subtitle.from_rows(rows)
return Film(term, subtitles=subtitles)
m = ENDPOINT_RE.search(content)
if m:
endpoint = m.group(1).strip()
logger.debug("Switching main endpoint to %s", endpoint)
region.set("subscene_endpoint2", endpoint)
return endpoint
for junk, search_type in SearchTypes.__members__.items():
if section_exists(soup, search_type):
return get_first_film(soup, search_type, year=year, session=session)
if limit_to == search_type:
return
def search(term, release=True, session=None, year=None, limit_to=SearchTypes.Exact, throttle=0):
# note to subscene: if you actually start to randomize the endpoint, we'll have to query your server even more
if release:
endpoint = "release"
else:
endpoint = find_endpoint(session)
time.sleep(throttle)
if not endpoint:
logger.error("Couldn't find endpoint, exiting")
return
soup = soup_for("%s/subtitles/%s" % (SITE_DOMAIN, endpoint), data={"query": term},
session=session)
if soup:
if "Subtitle search by" in str(soup):
rows = soup.find("table").tbody.find_all("tr")
subtitles = Subtitle.from_rows(rows)
return Film(term, subtitles=subtitles)
for junk, search_type in SearchTypes.__members__.items():
if section_exists(soup, search_type):
return get_first_film(soup, search_type, year=year, session=session)
if limit_to == search_type:
return
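For reference, a minimal sketch of what the `ENDPOINT_RE` pattern used by `find_endpoint` extracts. It is written for Python 3, so the Python 2-only `ur''` prefix becomes `r''`. The `extract_endpoint` helper and the `SAMPLE_HTML` fragment are hypothetical; the site's real markup may differ.

```python
import re

# Same pattern as ENDPOINT_RE in the hunk above, as a Python 3 raw string.
ENDPOINT_RE = re.compile(r'(?uis)<form.+?action="/subtitles/(.+)">.*?<input type="text"')


def extract_endpoint(content):
    """Hypothetical helper: return the search endpoint found in the HTML, or None."""
    m = ENDPOINT_RE.search(content)
    return m.group(1).strip() if m else None


# Made-up landing-page fragment, used only to exercise the pattern.
SAMPLE_HTML = (
    '<form method="post" action="/subtitles/searchbytitle">'
    '<input type="text" name="query"></form>'
)

endpoint = extract_endpoint(SAMPLE_HTML)
print(endpoint)
```

The caching through `region.get`/`region.set` is left out here on purpose, so the sketch needs neither dogpile nor network access.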
@@ -2,7 +2,8 @@
OS_PLEX_USERAGENT = 'plexapp.com v9.0'
DEPENDENCY_MODULE_NAMES = ['subliminal', 'subliminal_patch', 'enzyme', 'guessit', 'subzero', 'libfilebot', 'cfscrape']
DEPENDENCY_MODULE_NAMES = ['subliminal', 'subliminal_patch', 'enzyme', 'guessit', 'subzero', 'libfilebot',
'cloudscraper']
PERSONAL_MEDIA_IDENTIFIER = "com.plexapp.agents.none"
PLUGIN_IDENTIFIER_SHORT = "subzero"
PLUGIN_IDENTIFIER = "com.plexapp.agents.%s" % PLUGIN_IDENTIFIER_SHORT
@@ -1,5 +1,6 @@
# coding=utf-8
import types
import re
from babelfish.exceptions import LanguageError
from babelfish import Language as Language_, basestr
@@ -8,6 +9,25 @@ repl_map = {
"dk": "da",
"nld": "nl",
"english": "en",
"alb": "sq",
"arm": "hy",
"baq": "eu",
"bur": "my",
"chi": "zh",
"cze": "cs",
"dut": "nl",
"fre": "fr",
"geo": "ka",
"ger": "de",
"gre": "el",
"ice": "is",
"mac": "mk",
"mao": "mi",
"may": "ms",
"per": "fa",
"rum": "ro",
"slo": "sk",
"tib": "bo",
}
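The entries added to `repl_map` are the ISO 639-2/B (bibliographic) codes mapped to their two-letter ISO 639-1 equivalents. A minimal sketch of such a lookup, assuming the map is consulted before the code is handed to babelfish; `normalize_language_code` is a hypothetical helper, not part of the patch.

```python
# Subset of the entries added in the hunk above (ISO 639-2/B -> ISO 639-1).
REPL_MAP = {
    "fre": "fr",
    "ger": "de",
    "dut": "nl",
    "chi": "zh",
    "gre": "el",
}


def normalize_language_code(code):
    """Hypothetical helper: return the mapped code, or the cleaned input unchanged."""
    key = code.strip().lower()
    return REPL_MAP.get(key, key)


print(normalize_language_code("GER"))
print(normalize_language_code("en"))
```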
@@ -115,3 +135,16 @@ class Language(Language_):
return Language(*Language_.fromietf(s).__getstate__())
return Language(*Language_.fromalpha3b(s).__getstate__())
IETF_MATCH = ".+\.([^-.]+)(?:-[A-Za-z]+)?$"
ENDSWITH_LANGUAGECODE_RE = re.compile("\.([^-.]{2,3})(?:-[A-Za-z]{2,})?$")
def match_ietf_language(s, ietf=False):
language_match = re.match(".+\.([^\.]+)$" if not ietf
else IETF_MATCH, s)
if language_match and len(language_match.groups()) == 1:
language = language_match.groups()[0]
return language
return s
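A short sketch of how the two patterns above behave on typical subtitle file stems, written for Python 3 with raw strings. The file names are made up for illustration.

```python
import re

IETF_MATCH = r".+\.([^-.]+)(?:-[A-Za-z]+)?$"
ENDSWITH_LANGUAGECODE_RE = re.compile(r"\.([^-.]{2,3})(?:-[A-Za-z]{2,})?$")


def match_ietf_language(s, ietf=False):
    # Without ietf, the whole last dotted segment is returned ("pt-BR");
    # with ietf, only the part before the region suffix ("pt").
    language_match = re.match(r".+\.([^\.]+)$" if not ietf else IETF_MATCH, s)
    if language_match and len(language_match.groups()) == 1:
        return language_match.groups()[0]
    return s


plain = match_ietf_language("movie.pt-BR")
ietf = match_ietf_language("movie.pt-BR", ietf=True)
no_suffix = match_ietf_language("movie")

# The compiled pattern only accepts 2-3 character codes at the end of the name.
short_code = ENDSWITH_LANGUAGECODE_RE.search("movie.pt-BR")
long_word = ENDSWITH_LANGUAGECODE_RE.search("movie.english")
```

When nothing matches, `match_ietf_language` returns its input unchanged, so callers should not assume the result is always a language code.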
@@ -107,6 +107,12 @@ class Dicked(object):
for key, value in entries.iteritems():
self.__dict__[key] = (Dicked(**value) if isinstance(value, dict) else value)
def has(self, key):
return self._entries is not None and key in self._entries
def get(self, key, default=None):
return self._entries.get(key, default) if self._entries else default
def __repr__(self):
return str(self)
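A minimal stand-in showing how the new `has` and `get` accessors might be used. The constructor here is an assumption: it keeps the raw dict in `_entries`, which the hunk implies but does not show, and it uses `items()` so it runs on Python 3.

```python
class Dicked(object):
    """Minimal stand-in; assumes __init__ keeps the raw dict in _entries."""

    def __init__(self, **entries):
        self._entries = entries or None
        for key, value in (entries or {}).items():
            self.__dict__[key] = Dicked(**value) if isinstance(value, dict) else value

    def has(self, key):
        return self._entries is not None and key in self._entries

    def get(self, key, default=None):
        return self._entries.get(key, default) if self._entries else default


settings = Dicked(provider={"name": "subscene"}, throttle=5)
empty = Dicked()

print(settings.has("throttle"), settings.get("missing", 1), empty.get("x", "fallback"))
```

Both accessors tolerate an empty container, so callers do not need to guard against `_entries` being `None`.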
File diff suppressed because one or more lines are too long
@@ -36,6 +36,7 @@ SZ_FIX_DATA = {
u" l ": u" I ",
u"'sjust": u"'s just",
u"'tjust": u"'t just",
u"\";": u"'s",
},
"WholeWords": {
u"I'11": u"I'll",
@@ -293,6 +293,9 @@ class SubtitleModifications(object):
end_tag = line[-5:]
line = line[:-5]
last_procs_mods = []
# fixme: this double loop is ugly
for order, identifier, args in mods:
mod = self.initialized_mods[identifier]
@@ -312,6 +315,33 @@ class SubtitleModifications(object):
break
applied_mods.append(identifier)
if mod.last_processors:
last_procs_mods.append([identifier, args])
if skip_entry:
lines = []
break
if skip_line:
continue
for identifier, args in last_procs_mods:
mod = self.initialized_mods[identifier]
try:
line = mod.modify(line.strip(), entry=entry.text, debug=self.debug, parent=self, index=index,
procs=["last_process"], **args)
except EmptyEntryError:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, entry.text)
skip_entry = True
break
if not line:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, old_line)
skip_line = True
break
if skip_entry:
lines = []
@@ -21,6 +21,7 @@ class SubtitleModification(object):
pre_processors = []
processors = []
post_processors = []
last_processors = []
languages = []
def __init__(self, parent):
@@ -67,15 +68,16 @@ class SubtitleModification(object):
def post_process(self, content, debug=False, parent=None, **kwargs):
return self._process(content, self.post_processors, debug=debug, parent=parent, **kwargs)
def modify(self, content, debug=False, parent=None, **kwargs):
def modify(self, content, debug=False, parent=None, procs=None, **kwargs):
if not content:
return
new_content = content
for method in ("pre_process", "process", "post_process"):
for method in procs or ("pre_process", "process", "post_process"):
if not new_content:
return
new_content = getattr(self, method)(new_content, debug=debug, parent=parent, **kwargs)
new_content = self._process(new_content, getattr(self, "%sors" % method),
debug=debug, parent=parent, **kwargs)
return new_content
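The new `procs` argument picks processor lists by name: `"%sors" % method` turns `"last_process"` into `"last_processors"`. A self-contained sketch of that dispatch, with plain callables standing in for the project's processor objects; `SketchModification` and its lambdas are hypothetical.

```python
class SketchModification(object):
    """Stand-in for SubtitleModification; processors are plain callables here."""

    pre_processors = [lambda s: s.strip()]
    processors = [lambda s: s.replace("  ", " ")]
    post_processors = []
    # Drops lines that consist only of music notes and spaces.
    last_processors = [lambda s: "" if s.strip("♪ ") == "" else s]

    def _process(self, content, processors):
        for proc in processors:
            if not content:
                return content
            content = proc(content)
        return content

    def modify(self, content, procs=None):
        if not content:
            return
        new_content = content
        for method in procs or ("pre_process", "process", "post_process"):
            if not new_content:
                return
            # "pre_process" + "ors" -> "pre_processors", and so on.
            new_content = self._process(new_content, getattr(self, "%sors" % method))
        return new_content


mod = SketchModification()
default_run = mod.modify("  a  b ")
last_only = mod.modify("♪ ♪", procs=["last_process"])
```

Passing `procs=["last_process"]` runs only the last processors, which is how the second loop in `SubtitleModifications` calls `modify` above.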
@@ -107,3 +109,7 @@ empty_line_post_processors = [
class EmptyEntryError(Exception):
pass
class EmptyLineError(Exception):
pass
@@ -28,7 +28,7 @@ class CommonFixes(SubtitleTextModification):
NReProcessor(re.compile(r'(?u)(\w|\b|\s|^)(-\s?-{1,2})'), ur"\1", name="CM_multidash"),
# line = _/-/\s
NReProcessor(re.compile(r'(?u)(^\W*[-_.:>~]+\W*$)'), "", name="CM_non_word_only"),
NReProcessor(re.compile(r'(?u)(^\W*[-_.:>~]+\W*$)'), "", name="<CM_non_word_only"),
# remove >>
NReProcessor(re.compile(r'(?u)^\s?>>\s*'), "", name="CM_leading_crocodiles"),
@@ -37,7 +37,7 @@ class CommonFixes(SubtitleTextModification):
NReProcessor(re.compile(r'(?u)(^\W*:\s*(?=\w+))'), "", name="CM_empty_colon_start"),
# fix music symbols
NReProcessor(re.compile(ur'(?u)(^[-\s>~]*[*#¶]+\s*)|(\s*[*#¶]+\s*$)'),
NReProcessor(re.compile(ur'(?u)(^[-\s>~]*[*#¶]+\s+)|(\s*[*#¶]+\s*$)'),
lambda x: u"" if x.group(1) else u"",
name="CM_music_symbols"),
@@ -49,11 +49,11 @@ class HearingImpaired(SubtitleTextModification):
NReProcessor(re.compile(ur'(?sux)-?%(t)s[([][^([)\]]+?(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' %
{"t": TAG}), "", name="HI_brackets"),
NReProcessor(re.compile(ur'(?sux)-?%(t)s[([]%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+%(t)s$' % {"t": TAG}),
"", name="HI_bracket_open_start"),
#NReProcessor(re.compile(ur'(?sux)-?%(t)s[([]%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+%(t)s$' % {"t": TAG}),
# "", name="HI_bracket_open_start"),
NReProcessor(re.compile(ur'(?sux)-?%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' % {"t": TAG}), "",
name="HI_bracket_open_end"),
#NReProcessor(re.compile(ur'(?sux)-?%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' % {"t": TAG}), "",
# name="HI_bracket_open_end"),
# text before colon (and possible dash in front), max 11 chars after the first whitespace (if any)
# NReProcessor(re.compile(r'(?u)(^[A-z\-\'"_]+[\w\s]{0,11}:[^0-9{2}][\s]*)'), "", name="HI_before_colon"),
@@ -73,7 +73,7 @@ class HearingImpaired(SubtitleTextModification):
supported=lambda p: not p.only_uppercase),
# remove MAN:
NReProcessor(re.compile(ur'(?suxi)(.*MAN:\s*)'), "", name="HI_remove_man"),
NReProcessor(re.compile(ur'(?suxi)(\b(?:WO)MAN:\s*)'), "", name="HI_remove_man"),
# dash in front
# NReProcessor(re.compile(r'(?u)^\s*-\s*'), "", name="HI_starting_dash"),
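To compare the old and new `HI_remove_man` patterns, here is a small sketch using `re.sub` directly instead of `NReProcessor`, written for Python 3. The sample lines are made up. Note that the new pattern, as written, requires the `WO` prefix and so leaves a plain `MAN:` untouched.

```python
import re

OLD_RE = re.compile(r'(?suxi)(.*MAN:\s*)')
NEW_RE = re.compile(r'(?suxi)(\b(?:WO)MAN:\s*)')

# The old pattern is greedy and case-insensitive, so it also eats names ending in "man".
old_name = OLD_RE.sub("", "Batman: I'm here")
new_name = NEW_RE.sub("", "Batman: I'm here")

# The new pattern removes only the speaker label and keeps the leading dash.
old_woman = OLD_RE.sub("", "- WOMAN: Stay here")
new_woman = NEW_RE.sub("", "- WOMAN: Stay here")

# "(?:WO)" is not optional, so "MAN:" alone does not match the new pattern.
new_man = NEW_RE.sub("", "MAN: Hello")
```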
@@ -81,13 +81,18 @@ class HearingImpaired(SubtitleTextModification):
# all caps at start before new sentence
NReProcessor(re.compile(ur'(?u)^(?=[A-ZÀ-Ž]{4,})[A-ZÀ-Ž-_\s]+\s([A-ZÀ-Ž][a-zà-ž].+)'), r"\1",
name="HI_starting_upper_then_sentence", supported=lambda p: not p.only_uppercase),
# remove music symbols
NReProcessor(re.compile(ur'(?u)(^%(t)s[*#¶♫♪\s]*%(t)s[*#¶♫♪\s]+%(t)s[*#¶♫♪\s]*%(t)s$)' % {"t": TAG}),
"", name="HI_music_symbols_only"),
]
post_processors = empty_line_post_processors
last_processors = [
# remove music symbols
NReProcessor(re.compile(ur'(?u)(^%(t)s[*#¶♫♪\s]*%(t)s[*#¶♫♪\s]+%(t)s[*#¶♫♪\s]*%(t)s$)' % {"t": TAG}),
"", name="HI_music_symbols_only"),
# remove music entries
NReProcessor(re.compile(ur'(?ums)(^[-\s>~]*[♫♪]+\s*.+|.+\s*[♫♪]+\s*$)'),
"", name="HI_music"),
]
registry.register(HearingImpaired)
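A quick sketch of what the new `HI_music` pattern removes, again using `re.sub` directly rather than `NReProcessor` and written for Python 3. `strip_music` is a hypothetical helper and the sample lines are invented.

```python
import re

HI_MUSIC_RE = re.compile(r'(?ums)(^[-\s>~]*[♫♪]+\s*.+|.+\s*[♫♪]+\s*$)')


def strip_music(text):
    """Hypothetical helper: blank out text that starts or ends with a music note."""
    return HI_MUSIC_RE.sub("", text)


lyrics = strip_music("♪ Singing in the rain ♪")
trailing = strip_music("La la la ♪")
dialogue = strip_music("Hello there")
```

Because the pattern uses the `s` flag, `.+` also spans newlines, so a note at the start of a multi-line entry removes everything that follows it.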
+6 -4
@@ -17,7 +17,8 @@ def has_external_subtitle(part_id, stored_subs, language):
def set_existing_languages(video, video_info, external_subtitles=False, embedded_subtitles=False, known_embedded=None,
stored_subs=None, languages=None, only_one=False, known_metadata_subs=None):
stored_subs=None, languages=None, only_one=False, known_metadata_subs=None,
match_strictness="strict"):
logger.debug(u"Determining existing subtitles for %s", video.name)
external_langs_found = set()
@@ -27,7 +28,8 @@ def set_existing_languages(video, video_info, external_subtitles=False, embedded
external_langs_found = known_metadata_subs
external_langs_found.update(set(search_external_subtitles(video.name, languages=languages,
only_one=only_one).values()))
only_one=only_one,
match_strictness=match_strictness).values()))
# found external subtitles should be considered?
if external_subtitles:
@@ -52,10 +54,10 @@ def set_existing_languages(video, video_info, external_subtitles=False, embedded
video.subtitle_languages.add(language)
def parse_video(fn, hints, skip_hashing=False, dry_run=False, providers=None):
def parse_video(fn, hints, skip_hashing=False, dry_run=False, providers=None, hash_from=None):
logger.debug("Parsing video: %s, hints: %s", os.path.basename(fn), hints)
return scan_video(fn, hints=hints, dont_use_actual_file=dry_run, providers=providers,
skip_hashing=skip_hashing)
skip_hashing=skip_hashing, hash_from=hash_from)
def refine_video(video, no_refining=False, refiner_settings=None):
+7
@@ -24,6 +24,13 @@ Don't expect support if you mess this up.
"find_better_as_extracted_tv_score": 352,
"find_better_as_extracted_movie_score": 82,
// SZ can use mediainfo if present to detect titles/forced state of MP4 MOV_TEXT, because the PMS currently doesn't
// set the title attribute
"dont_use_mediainfo_mp4": False,
// specific mediainfo binary path
"mediainfo_bin": null,
"debug_i18n": false,
// per-provider-config
+21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2019 VeNoMouS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+34 -37
@@ -1,7 +1,9 @@
# Sub-Zero for Plex
[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle.svg?style=flat&label=stable)](https://github.com/pannal/Sub-Zero.bundle/releases/latest)<!--[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle/all.svg?maxAge=2592000&label=testing+2.0+RC9)](https://github.com/pannal/Sub-Zero.bundle/releases)--> [![master](https://img.shields.io/badge/master-stable-green.svg?maxAge=2592000)]()
[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle.svg?style=flat&label=stable)](https://github.com/pannal/Sub-Zero.bundle/releases/latest)
[![master](https://img.shields.io/badge/master-stable-green.svg?maxAge=2592000)]()
[![Maintenance](https://img.shields.io/maintenance/yes/2019.svg)]()
[![Slack Status](https://szslack.fragstore.net/badge.svg)](https://szslack.fragstore.net)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle?ref=badge_shield)
<img src="https://raw.githubusercontent.com/pannal/Sub-Zero.bundle/master/Contents/Resources/subzero.gif" align="left" height="100"> <font size="5"><b>Subtitles done right!</b></font><br />
@@ -12,11 +14,19 @@ Check out **[the Sub-Zero Wiki](https://github.com/pannal/Sub-Zero.bundle/wiki)*
---
**[Kitana is now required to have a UI](https://github.com/pannal/Kitana)**
---
**[The future of Sub-Zero](https://www.reddit.com/r/PleX/comments/9n9qjl/subzero_the_future/)**
---
If you like this, buy me a beer: <br>[![Donate](https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=G9VKR2B8PMNKG) <br>or become a Patreon starting at **1 $ / month** <br><a href="https://www.patreon.com/subzero_plex" target="_blank"><img src="http://www.wenspencer.com/wp-content/uploads/2017/02/patreon-button.png" height="42" /></a> <br>or use the OpenSubtitles Sub-Zero affiliate link to become VIP <br>**10€/year, ad-free subs, 1000 subs/day, no-cache *VIP* server**<br><a href="http://v.ht/osvip" target="_blank"><img src="https://static.opensubtitles.org/gfx/logo.gif" height="50" /></a>
## Helping development
If you like this, buy me a beer: <br>[![Donate](https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=G9VKR2B8PMNKG) <br>or become a Patreon starting at **1 $ / month** <br><a href="https://www.patreon.com/subzero_plex" target="_blank"><img src="https://i0.wp.com/tablecakes.com/wp-content/uploads/2018/08/become-a-patron-button.png" height="54" /></a> <br>or use the OpenSubtitles Sub-Zero affiliate link to become VIP <br>**10€/year, ad-free subs, 1000 subs/day, no-cache *VIP* server**<br><a href="http://v.ht/osvip" target="_blank"><img src="https://static.opensubtitles.org/gfx/logo.gif" height="50" /></a>
If you register with an anti-captcha service and you decide to use [Anti-Captcha.com](http://getcaptchasolution.com/kkvviom7nh), you can use [this affiliate link](http://getcaptchasolution.com/kkvviom7nh) to help development.
## Introduction
#### What's Sub-Zero?
@@ -84,47 +94,34 @@ the.vbm, mmgoodnow, Vertig0ne, thliu78, tattoomees, ostman, count_confucius, ehe
## Changelog
2.6.5.3017
subscene, addic7ed and titlovi
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service (anti-captcha.com or deathbycaptcha.com), add funds, then supply your credentials/apikey in the configuration
2.6.5.3152
subscene, addic7ed
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
- core: SRT parsing: handle (bad) ASS color tag in SRT
- core: auto extract embedded: only use one unknown sub for first language
- core: better embedded streams language detection
- core: optimizations
- core: extract embedded: fix is_unknown check
- core: don't raise exception when subtitle not found inside archive
- core: search external subtitles: fix condition
- core: better plex transcoder path detection
- core: use Log.Warn instead of Log.Warning (#619, #629, #633)
- core: also check for "plex transcoder.exe" in case of windows (fixes #619)
- core: auto extract: use mbcs encoding for paths on windows
- core: Fix issue scandir not returning the name of the file inside Docker images on ARM systems. (thanks @giejay)
- core: also clean PYTHONHOME when calling external notification app
- core: update certifi to 2019.3.9
- core: scan_video: add series/title as alternative by scanning filename itself without parent folders
- core: add generic solution for solving captchas using anti captcha services
- core: increase cache time to 180d (was: 30d)
- core: guess_matches: handle multiple title matches; fixes bazarr#403
- windows: fix compatibility issues with plex transcoder
- compat: use lowercase paths on subtitle detection
- providers: addic7ed: re-enable (using paid anti-captcha service)
- providers: assrt: assume undefined Chinese flavor as Simplified (chs/zho-Hans)
- providers: subscene: make it work again by bypassing cf
- providers: subscene: don't fail on missing cover
- providers: titlovi: re-enable (might need paid anti-captcha service)
- providers: opensubtitles: fix only_foreign handling
- providers: opensubtitles: show subtitles with possibly mismatched series when manually listing subs
- menu: list subtitles: show subtitles with bad season/episode values as well
- refiners: omdb: fix imdb ids with spaces
- core: don't fall back to default providers if none enabled
- core: don't process any further if stream info is missing
- core: support using mediainfo for retrieving MP4 MOV_TEXT subtitle stream titles (PMS bug)
- core: fix embedded subtitle extraction in some cases (#681, #680)
- core: scanning: add additional INFO logging for undetected languages
- core: bazarr-backport: remove existing subtitle file, to support MergerFS
- core: bazarr-backport: generic 10 minute throttling if uncaught exception occurs
- providers: addic7ed: fix recaptcha solving; fix show ID retrieval (#681)
- providers: addic7ed: add timeout on authentication error
- providers: addic7ed: fix shows with dots in them (Mayans M.C.)
- providers: addic7ed: fix detection of completed subtitle for non-english users (#686)
- providers: addic7ed: add more timeouts in the login process
- providers: argenteam: bazarr-backport: use new url; fixes
[older changes](CHANGELOG.md)
Subtitles provided by [OpenSubtitles.org](http://www.opensubtitles.org/), [Podnapisi.NET](https://www.podnapisi.net/), [TVSubtitles.net](http://www.tvsubtitles.net/), [Addic7ed.com](http://www.addic7ed.com/), [Legendas TV](http://legendas.tv/), [Napi Projekt](http://www.napiprojekt.pl/), [Shooter](http://shooter.cn/), [Titlovi](http://titlovi.com), [aRGENTeaM](http://argenteam.net), [SubScene](https://subscene.com/), [Hosszupuska](http://hosszupuskasub.com/)
Subtitles provided by [OpenSubtitles.org](http://www.opensubtitles.org/), [Podnapisi.NET](https://www.podnapisi.net/), [TVSubtitles.net](http://www.tvsubtitles.net/), [Addic7ed.com](http://www.addic7ed.com/), [Legendas TV](http://legendas.tv/), [Napi Projekt](http://www.napiprojekt.pl/), [Shooter](http://shooter.cn/), [Titlovi](http://titlovi.com), [aRGENTeaM](http://argenteam.net), [SubScene](https://subscene.com/), [Hosszupuska](http://hosszupuskasub.com/), [Napisy24](https://napisy24.pl/)
[3rd party licenses](https://github.com/pannal/Sub-Zero.bundle/tree/master/Licenses)
## License
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle?ref=badge_large)