Compare commits

50 Commits

Author SHA1 Message Date
panni f8f99f0fb2 submod retry; WIP 2019-05-19 06:03:55 +02:00
panni f337b53ae3 submod: HI: remove music
submod: common: be less aggressive about music symbols
submod: HI: be less aggressive about brackets
submod: HI: be less aggressive about MAN
2019-05-18 06:23:04 +02:00
panni aea6050d71 subtitle: try decoding with utf-16 by default as well 2019-05-17 23:45:06 +02:00
panni 13d5e0761e providers: subscene: fix endpoint once again 2019-05-13 16:14:26 +02:00
panni ce28d0284c back from dev 2019-05-12 06:17:08 +02:00
panni 1a0bb9c3e4 release 2.6.5.3074 2019-05-12 06:05:16 +02:00
panni d0c71b4b67 bump dev 2019-05-12 05:12:58 +02:00
panni b3f062956d core: re-fix ass/ssa tags in srt in pysubs2 0.2.3 2019-05-12 05:12:34 +02:00
panni 1a853a780c core: update pysubs2 to 0.2.3 2019-05-12 05:01:38 +02:00
panni 5c47ddeb2d core: update chinese encodings; #646 2019-05-12 04:49:30 +02:00
panni b51deb5d01 core: subliminal: don't replace \r with \n by default; fixes utf-16 character transformation issues; fixes #646 2019-05-12 04:48:23 +02:00
panni cbf5ea69be core: cf: update cloudscraper to 1.1.9; fix keyerror 2019-05-08 15:57:33 +02:00
panni e139ffefe6 bump dev 2019-05-08 04:18:25 +02:00
panni dc0a8deb40 core: cf: testing
providers: subscene: testing
2019-05-08 04:14:04 +02:00
panni 97e93cd10a core: cf: update js2py; update cloudscraper to 1.1.5; 2019-05-08 01:31:21 +02:00
panni 03c934cf21 back to dev 2019-05-01 15:39:23 +02:00
panni 92d0d70258 Release 2.6.5.3062 2019-05-01 15:32:36 +02:00
panni d44298993c Release 2.6.5.3055 2019-05-01 15:32:19 +02:00
panni 12300d4115 Merge branch 'develop-2.6' 2019-05-01 15:29:42 +02:00
pannal b4f08f61a6 Update README.md 2019-05-01 06:00:01 +02:00
pannal 861a25be41 Update README.md 2019-05-01 05:59:21 +02:00
pannal 3e175109a6 Merge pull request #641 from fossabot/master
Add license scan report and status
2019-05-01 05:48:14 +02:00
fossabot fb2210f2fd Add license scan report and status
Signed-off-by: fossabot <badges@fossa.io>
2019-04-30 20:44:05 -07:00
panni e928918201 add cloudscaper LICENSE 2019-05-01 05:13:13 +02:00
panni df607e5772 bump dev 2019-05-01 04:49:30 +02:00
panni a7cc470645 core: log cf domain 2019-05-01 04:48:48 +02:00
panni 4e6421b928 core: dns: set env var empty if not configured 2019-05-01 04:36:03 +02:00
panni df48e8fccd providers: subscene: remove obsolete imports 2019-05-01 04:27:11 +02:00
panni 58111bf204 core: remove old cfscrape implementation 2019-05-01 04:25:04 +02:00
panni 8c02e75fed providers: titlovi: match cfsrc for src 2019-05-01 04:24:31 +02:00
panni 6f3f1cb4b5 core: cf: harden. 2019-05-01 04:24:09 +02:00
panni dd27997deb core: cf: add cloudscaper 1.1.1@496900e instead of cfscrape 2019-05-01 03:12:01 +02:00
panni a1f70d1d4d core: add ENV:dns_resolvers_timeout 2019-05-01 02:39:18 +02:00
panni 7da0bac643 skip warning 2019-05-01 02:33:46 +02:00
panni b3ab2a451c core: http: don't query DNS with IPs. thanks @fgump 2019-05-01 02:27:30 +02:00
panni 850f836ebd back to dev 2019-04-28 05:27:26 +02:00
panni d9fa9d03da back to dev 2019-04-28 05:22:24 +02:00
pannal 76c20dc3d7 Update README.md 2019-04-28 05:21:35 +02:00
panni 4568e222d1 release 2.6.5.3041 2019-04-28 05:11:45 +02:00
panni 344025226a add missing changelog entry 2019-04-28 05:11:09 +02:00
panni f546fcffce release 2.6.5.3039 2019-04-28 05:08:00 +02:00
panni 068c2d4d00 Merge remote-tracking branch 'origin/master'
# Conflicts:
#	Contents/Info.plist
2019-04-28 05:04:51 +02:00
panni ccf5a902e5 core: cf: only store cookie if it had a value 2019-04-28 05:03:04 +02:00
panni 8c72cf9057 bump dev 2019-04-28 04:45:17 +02:00
panni 1ce14aa231 core: http: remove debug 2019-04-28 04:44:27 +02:00
panni 643485b879 core: cf: optimize
providers: titlovi: optimize cf/captcha handling
2019-04-28 04:43:03 +02:00
pannal 5b3d9f26be Update README.md 2019-04-28 03:47:55 +02:00
pannal 14f2f45f20 Update README.md 2019-04-22 05:37:47 +02:00
pannal 8ac6c9d7a7 Update README.md 2019-04-22 05:31:29 +02:00
pannal 237a47b8ed Update Info.plist 2019-04-21 03:48:37 +02:00
53 changed files with 3595 additions and 856 deletions
+44
@@ -1,4 +1,48 @@
2.6.5.3041
Changelog
- core: only reference guessed title if there actually is one
- core: cf: optimize
- core/config: add setting for one existing language to be enough, fixes #491
- core/compat: dns: support nameservers via ENV[dns_resolvers]; don't fall back to default DNS when configured custom DNS failed
- providers: titlovi: prevent repeated captcha solving for CF
2.6.5.3017
Changelog
- core: SRT parsing: handle (bad) ASS color tag in SRT
- core: auto extract embedded: only use one unknown sub for first language
- core: better embedded streams language detection
- core: optimizations
- core: extract embedded: fix is_unknown check
- core: don't raise exception when subtitle not found inside archive
- core: search external subtitles: fix condition
- core: better plex transcoder path detection
- core: use Log.Warn instead of Log.Warning (#619, #629, #633)
- core: also check for "plex transcoder.exe" in case of windows (fixes #619)
- core: auto extract: use mbcs encoding for paths on windows
- core: Fix issue scandir not returning the name of the file inside Docker images on ARM systems. (thanks @giejay)
- core: also clean PYTHONHOME when calling external notification app
- core: update certifi to 2019.3.9
- core: scan_video: add series/title as alternative by scanning filename itself without parent folders
- core: add generic solution for solving captchas using anti captcha services
- core: increase cache time to 180d (was: 30d)
- core: guess_matches: handle multiple title matches; fixes bazarr#403
- windows: fix compatibility issues with plex transcoder
- compat: use lowercase paths on subtitle detection
- providers: addic7ed: re-enable (using paid anti captch service)
- providers: assrt: assume undefined Chinese flavor as Simplified (chs/zho-Hans)
- providers: subscene: make it work again by bypassing cf
- providers: subscene: don't fail on missing cover
- providers: titlovi: re-enable (might need paid anti captch service)
- providers: opensubtitles: fix only_foreign handling
- providers: opensubtitles: show subtitles with possibly mismatched series when manually listing subs
- menu: list subtitles: show subtitles with bad season/episode values as well
- refiners: omdb: fix imdb ids with spaces
2.6.4.2911
- core: improve file cache (windows especially); use fixed-length cache filenames; fixes #600
- core: don't log "Checking connections ..." when sonarr/radarr not activated
+1
@@ -1083,6 +1083,7 @@ class Config(object):
def parse_custom_dns(self):
custom_dns = Prefs['use_custom_dns2'].strip()
os.environ["dns_resolvers"] = ""
if custom_dns:
ips = filter(lambda x: x, [d.strip() for d in custom_dns.split(",")])
if ips:
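Note on the hunk above: the added line clears `dns_resolvers` before the preference is parsed, so an empty setting can no longer leave a stale value in the environment. A minimal standalone sketch of that behaviour follows. It assumes a plain string in place of Plex's `Prefs` object. The helper names `parse_dns_list` and `export_dns_resolvers` are hypothetical. The comma-joined storage format is also an assumption, since the body of `if ips:` is not visible in this diff.

```python
import os


def parse_dns_list(raw):
    """Split a comma-separated string into non-empty, stripped entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def export_dns_resolvers(raw, environ=os.environ):
    # Always reset first, mirroring the line added in the hunk above, so a
    # previously exported value never survives an empty setting.
    environ["dns_resolvers"] = ""
    ips = parse_dns_list(raw)
    if ips:
        # Assumption: resolvers are stored as a comma-joined string; the
        # real storage format is not shown in this diff.
        environ["dns_resolvers"] = ",".join(ips)
    return ips
```

Passing a plain `dict` as `environ` keeps the sketch testable without touching the real process environment.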
+2 -2
@@ -13,7 +13,7 @@
<key>CFBundleSignature</key>
<string>????</string>
<key>CFBundleVersion</key>
<string>2.6.5.3023</string>
<string>2.6.5.3074</string>
<key>PlexFrameworkVersion</key>
<string>2</string>
<key>PlexPluginClass</key>
@@ -32,7 +32,7 @@
&lt;h1&gt;Sub-Zero for Plex&lt;/h1&gt;&lt;i&gt;Subtitles done right&lt;/i&gt;
Version 2.6.5.3023 DEV
Version 2.6.5.3074 DEV
Originally based on @bramwalet's awesome &lt;a href=&quot;https://github.com/bramwalet/Subliminal.bundle&quot;&gt;Subliminal.bundle&lt;/a&gt;
@@ -1,392 +0,0 @@
# coding=utf-8
import logging
import random
import re
import os
import json
import base64
from copy import deepcopy
from time import sleep
from collections import OrderedDict
from .jsfuck import jsunfuck
import js2py
from requests.sessions import Session
from subliminal_patch.pitcher import pitchers
try:
from requests_toolbelt.utils import dump
except ImportError:
pass
try:
from urlparse import urlparse
from urlparse import urlunparse
except ImportError:
from urllib.parse import urlparse
from urllib.parse import urlunparse
brotli_available = True
try:
from brotli import decompress as brdec
except:
brotli_available = False
logger = logging.getLogger(__name__)
__version__ = "2.0.3"
# Orignally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT
DEFAULT_USER_AGENTS = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
"Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
"Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
"Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
]
BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""
cur_path = os.path.abspath(os.path.dirname(__file__))
if brotli_available:
brwsrs = os.path.join(cur_path, "browsers_br.json")
with open(brwsrs, "r") as f:
UA_COMBO = json.load(f, object_pairs_hook=OrderedDict)["chrome"]
else:
brwsrs = os.path.join(cur_path, "browsers.json")
UA_COMBO = []
with open(brwsrs, "r") as f:
_brwsrs = json.load(f, object_pairs_hook=OrderedDict)
for entry in _brwsrs:
_entry = OrderedDict(("-".join(a.capitalize() for a in key.split("-")), value)
for key, value in entry.iteritems())
_entry["User-Agent"] = None
UA_COMBO.append({"User-Agent": [entry["user-agent"]], "headers": _entry})
class NeedsCaptchaException(Exception):
pass
class CloudflareScraper(Session):
def __init__(self, *args, **kwargs):
self.delay = kwargs.pop('delay', 8)
self.debug = False
self._ua = None
self._hdrs = None
super(CloudflareScraper, self).__init__(*args, **kwargs)
if not self._ua:
# Set a random User-Agent if no custom User-Agent has been set
ua_combo = random.choice(UA_COMBO)
self._ua = random.choice(ua_combo["User-Agent"])
self._hdrs = ua_combo["headers"].copy()
self._hdrs["User-Agent"] = self._ua
self.headers['User-Agent'] = self._ua
def set_cloudflare_challenge_delay(self, delay):
if isinstance(delay, (int, float)) and delay > 0:
self.delay = delay
def is_cloudflare_challenge(self, resp):
if resp.headers.get('Server', '').startswith('cloudflare'):
if b'why_captcha' in resp.content or b'/cdn-cgi/l/chk_captcha' in resp.content:
raise NeedsCaptchaException
return (
resp.status_code in [429, 503]
and b"jschl_vc" in resp.content
and b"jschl_answer" in resp.content
)
return False
def debugRequest(self, req):
try:
print (dump.dump_all(req).decode('utf-8'))
except:
pass
def request(self, method, url, *args, **kwargs):
# self.headers = (
# OrderedDict(
# [
# ('User-Agent', self.headers['User-Agent']),
# ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
# ('Accept-Language', 'en-US,en;q=0.5'),
# ('Accept-Encoding', 'gzip, deflate'),
# ('Connection', 'close'),
# ('Upgrade-Insecure-Requests', '1')
# ]
# )
# )
self.headers = self._hdrs.copy()
resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
if resp.headers.get('content-encoding') == 'br' and brotli_available:
resp._content = brdec(resp._content)
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
try:
if self.is_cloudflare_challenge(resp):
# Work around if the initial request is not a GET,
# Superseed with a GET then re-request the orignal METHOD.
if resp.request.method != 'GET':
self.request('GET', resp.url)
resp = self.request(method, url, *args, **kwargs)
else:
resp = self.solve_cf_challenge(resp, **kwargs)
except NeedsCaptchaException:
# solve the captcha
site_key = re.search(r'data-sitekey="(.+?)"', resp.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', resp.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', resp.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("cf: Captcha site-key not found!")
pitcher = pitchers.get_pitcher()("cf", resp.request.url, site_key,
user_agent=self.headers["User-Agent"],
cookies=self.cookies.get_dict(),
is_invisible=True)
logger.info("cf: Solving captcha")
result = pitcher.throw()
if not result:
raise Exception("cf: Couldn't solve captcha!")
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_captcha'.format(parsed_url.scheme, domain)
method = resp.request.method
cloudflare_kwargs = {
'allow_redirects': False,
'headers': {'Referer': resp.url},
'params': OrderedDict(
[
('s', challenge_s),
('g-recaptcha-response', result)
]
)
}
return self.request(method, submit_url, **cloudflare_kwargs)
return resp
def solve_cf_challenge(self, resp, **original_kwargs):
body = resp.text
# Cloudflare requires a delay before solving the challenge
if self.delay == 8:
try:
delay = float(re.search(r'submit\(\);\r?\n\s*},\s*([0-9]+)', body).group(1)) / float(1000)
if isinstance(delay, (int, float)):
self.delay = delay
except:
pass
sleep(self.delay)
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_jschl'.format(parsed_url.scheme, domain)
cloudflare_kwargs = deepcopy(original_kwargs)
headers = cloudflare_kwargs.setdefault('headers', {'Referer': resp.url})
try:
params = cloudflare_kwargs.setdefault(
'params', OrderedDict(
[
('s', re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')),
('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
('pass', re.search(r'name="pass" value="(.+?)"', body).group(1)),
]
)
)
except Exception as e:
# Something is wrong with the page.
# This may indicate Cloudflare has changed their anti-bot
# technique. If you see this and are running the latest version,
# please open a GitHub issue so I can update the code accordingly.
raise ValueError("Unable to parse Cloudflare anti-bots page: {} {}".format(e.message, BUG_REPORT))
# Solve the Javascript challenge
params['jschl_answer'] = self.solve_challenge(body, domain)
# Requests transforms any request into a GET after a redirect,
# so the redirect has to be handled manually here to allow for
# performing other types of requests even as the first request.
method = resp.request.method
cloudflare_kwargs['allow_redirects'] = False
redirect = self.request(method, submit_url, **cloudflare_kwargs)
redirect_location = urlparse(redirect.headers['Location'])
if not redirect_location.netloc:
redirect_url = urlunparse(
(
parsed_url.scheme,
domain,
redirect_location.path,
redirect_location.params,
redirect_location.query,
redirect_location.fragment
)
)
return self.request(method, redirect_url, **original_kwargs)
return self.request(method, redirect.headers['Location'], **original_kwargs)
def solve_challenge(self, body, domain):
try:
js = re.search(
r"setTimeout\(function\(\){\s+(var s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n",
body
).group(1)
except Exception:
raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. {}".format(BUG_REPORT))
js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
js = re.sub(r'(e\s=\sfunction\(s\)\s{.*?};)', '', js, flags=re.DOTALL|re.MULTILINE)
js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))
js = js.replace('; 121', '')
# Strip characters that could be used to exit the string context
# These characters are not currently used in Cloudflare's arithmetic snippet
js = re.sub(r"[\n\\']", "", js)
if 'toFixed' not in js:
raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. {}".format(BUG_REPORT))
try:
jsEnv = """
var t = "{domain}";
var g = String.fromCharCode;
o = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
e = function(s) {{
s += "==".slice(2 - (s.length & 3));
var bm, r = "", r1, r2, i = 0;
for (; i < s.length;) {{
bm = o.indexOf(s.charAt(i++)) << 18 | o.indexOf(s.charAt(i++)) << 12 | (r1 = o.indexOf(s.charAt(i++))) << 6 | (r2 = o.indexOf(s.charAt(i++)));
r += r1 === 64 ? g(bm >> 16 & 255) : r2 === 64 ? g(bm >> 16 & 255, bm >> 8 & 255) : g(bm >> 16 & 255, bm >> 8 & 255, bm & 255);
}}
return r;
}};
function italics (str) {{ return '<i>' + this + '</i>'; }};
var document = {{
getElementById: function () {{
return {{'innerHTML': '{innerHTML}'}};
}}
}};
{js}
"""
innerHTML = re.search(
'<div(?: [^<>]*)? id="([^<>]*?)">([^<>]*?)<\/div>',
body,
re.MULTILINE | re.DOTALL
)
innerHTML = innerHTML.group(2).replace("'", r"\'") if innerHTML else ""
js = jsunfuck(jsEnv.format(domain=domain, innerHTML=innerHTML, js=js))
def atob(s):
return base64.b64decode('{}'.format(s)).decode('utf-8')
js2py.disable_pyimport()
context = js2py.EvalJs({'atob': atob})
result = context.eval(js)
except Exception:
logging.error("Error executing Cloudflare IUAM Javascript. {}".format(BUG_REPORT))
raise
try:
float(result)
except Exception:
raise ValueError("Cloudflare IUAM challenge returned unexpected answer. {}".format(BUG_REPORT))
return result
@classmethod
def create_scraper(cls, sess=None, **kwargs):
"""
Convenience function for creating a ready-to-go CloudflareScraper object.
"""
scraper = cls(**kwargs)
if sess:
attrs = ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']
for attr in attrs:
val = getattr(sess, attr, None)
if val:
setattr(scraper, attr, val)
return scraper
# Functions for integrating cloudflare-scrape with other applications and scripts
@classmethod
def get_tokens(cls, url, user_agent=None, debug=False, **kwargs):
scraper = cls.create_scraper()
scraper.debug = debug
if user_agent:
scraper.headers['User-Agent'] = user_agent
try:
resp = scraper.get(url, **kwargs)
resp.raise_for_status()
except Exception as e:
logging.error("'{}' returned an error. Could not collect tokens.".format(url))
raise
domain = urlparse(resp.url).netloc
cookie_domain = None
for d in scraper.cookies.list_domains():
if d.startswith('.') and d in ('.{}'.format(domain)):
cookie_domain = d
break
else:
raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")
return (
{
'__cfduid': scraper.cookies.get('__cfduid', '', domain=cookie_domain),
'cf_clearance': scraper.cookies.get('cf_clearance', '', domain=cookie_domain)
},
scraper.headers['User-Agent']
)
@classmethod
def get_cookie_string(cls, url, user_agent=None, debug=False, **kwargs):
"""
Convenience function for building a Cookie HTTP header value.
"""
tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, debug=debug, **kwargs)
return "; ".join("=".join(pair) for pair in tokens.items()), user_agent
create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string
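Note on `get_cookie_string` in the removed file above: it joins the token dictionary into a single `Cookie` header value. A minimal sketch of that joining step, using a hypothetical helper name `build_cookie_header` and made-up token values, with no network access involved:

```python
def build_cookie_header(tokens):
    """Join a name -> value mapping into a Cookie header value."""
    # Each pair becomes "name=value"; pairs are separated by "; ".
    return "; ".join("{}={}".format(name, value) for name, value in tokens.items())


# Placeholder values for illustration only, not real Cloudflare tokens.
example_tokens = {"__cfduid": "abc", "cf_clearance": "xyz"}
header_value = build_cookie_header(example_tokens)
```

The order of the pairs follows the dictionary's iteration order, which is insertion order on Python 3.7 and later.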
@@ -1,80 +0,0 @@
[
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"user-agent": "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.102 Safari/537.36",
"accept-encoding": "gzip,deflate",
"accept-language": "en-US,en;q=0.8"
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"user-agent": "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36",
"accept-encoding": "gzip,deflate",
"accept-language": "en-US,en;q=0.8"
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36",
"accept-language": "en-US,en;q=0.8",
"accept-encoding": "gzip, deflate, "
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36",
"accept-language": "en-US,en;q=0.8",
"accept-encoding": "gzip, deflate, "
},
{
"connection": "close",
"accept": "*/*",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:30.0) Gecko/20100101 Firefox/30.0"
},
{
"connection": "close",
"accept": "image/jpeg, image/gif, image/pjpeg, application/x-ms-application, application/xaml+xml, application/x-ms-xbap, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"accept": "text/html, application/xhtml+xml, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"accept": "text/html, application/xhtml+xml, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
"accept-encoding": "gzip, deflate",
"dnt": "1"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
}
]
@@ -0,0 +1,311 @@
import logging
import re
import sys
import ssl
from copy import deepcopy
from time import sleep
from collections import OrderedDict
from requests.sessions import Session
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.ssl_ import create_urllib3_context
from .interpreters import JavaScriptInterpreter
from .user_agent import User_Agent
try:
from requests_toolbelt.utils import dump
except ImportError:
pass
try:
import brotli
except ImportError:
pass
try:
from urlparse import urlparse
from urlparse import urlunparse
except ImportError:
from urllib.parse import urlparse
from urllib.parse import urlunparse
##########################################################################################################################################################
__version__ = '1.1.9'
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
class CipherSuiteAdapter(HTTPAdapter):
def __init__(self, cipherSuite=None, **kwargs):
self.cipherSuite = cipherSuite
if hasattr(ssl, 'PROTOCOL_TLS'):
self.ssl_context = create_urllib3_context(
ssl_version=getattr(ssl, 'PROTOCOL_TLSv1_3', ssl.PROTOCOL_TLSv1_2),
ciphers=self.cipherSuite
)
else:
self.ssl_context = create_urllib3_context(ssl_version=ssl.PROTOCOL_TLSv1)
super(CipherSuiteAdapter, self).__init__(**kwargs)
##########################################################################################################################################################
def init_poolmanager(self, *args, **kwargs):
kwargs['ssl_context'] = self.ssl_context
return super(CipherSuiteAdapter, self).init_poolmanager(*args, **kwargs)
##########################################################################################################################################################
def proxy_manager_for(self, *args, **kwargs):
kwargs['ssl_context'] = self.ssl_context
return super(CipherSuiteAdapter, self).proxy_manager_for(*args, **kwargs)
##########################################################################################################################################################
class CloudScraper(Session):
def __init__(self, *args, **kwargs):
self.debug = kwargs.pop('debug', False)
self.delay = kwargs.pop('delay', None)
self.interpreter = kwargs.pop('interpreter', 'js2py')
self.allow_brotli = kwargs.pop('allow_brotli', True if 'brotli' in sys.modules.keys() else False)
self.cipherSuite = None
super(CloudScraper, self).__init__(*args, **kwargs)
if 'requests' in self.headers['User-Agent']:
# Set a random User-Agent if no custom User-Agent has been set
self.headers = User_Agent(allow_brotli=self.allow_brotli).headers
self.mount('https://', CipherSuiteAdapter(self.loadCipherSuite()))
##########################################################################################################################################################
@staticmethod
def debugRequest(req):
try:
print(dump.dump_all(req).decode('utf-8'))
except: # noqa
pass
##########################################################################################################################################################
def loadCipherSuite(self):
if self.cipherSuite:
return self.cipherSuite
self.cipherSuite = ''
if hasattr(ssl, 'PROTOCOL_TLS'):
ciphers = [
'ECDHE-ECDSA-AES128-GCM-SHA256', 'ECDHE-RSA-AES128-GCM-SHA256', 'ECDHE-ECDSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES256-GCM-SHA384', 'ECDHE-ECDSA-CHACHA20-POLY1305-SHA256', 'ECDHE-RSA-CHACHA20-POLY1305-SHA256',
'ECDHE-RSA-AES128-CBC-SHA', 'ECDHE-RSA-AES256-CBC-SHA', 'RSA-AES128-GCM-SHA256', 'RSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES128-GCM-SHA256', 'RSA-AES256-SHA', '3DES-EDE-CBC'
]
if hasattr(ssl, 'PROTOCOL_TLSv1_3'):
ciphers.insert(0, ['GREASE_3A', 'GREASE_6A', 'AES128-GCM-SHA256', 'AES256-GCM-SHA256', 'AES256-GCM-SHA384', 'CHACHA20-POLY1305-SHA256'])
ctx = ssl.SSLContext(getattr(ssl, 'PROTOCOL_TLSv1_3', ssl.PROTOCOL_TLSv1_2))
for cipher in ciphers:
try:
ctx.set_ciphers(cipher)
self.cipherSuite = '{}:{}'.format(self.cipherSuite, cipher).rstrip(':')
except ssl.SSLError:
pass
return self.cipherSuite
##########################################################################################################################################################
def request(self, method, url, *args, **kwargs):
ourSuper = super(CloudScraper, self)
resp = ourSuper.request(method, url, *args, **kwargs)
if resp.headers.get('Content-Encoding') == 'br':
if self.allow_brotli and resp._content:
resp._content = brotli.decompress(resp.content)
else:
logging.warning('Brotli content detected, But option is disabled, we will not continue.')
return resp
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
if self.isChallengeRequest(resp):
if resp.request.method != 'GET':
# Work around if the initial request is not a GET,
# Supersede with a GET then re-request the original METHOD.
self.request('GET', resp.url)
resp = ourSuper.request(method, url, *args, **kwargs)
else:
# Solve Challenge
resp = self.sendChallengeResponse(resp, **kwargs)
return resp
##########################################################################################################################################################
@staticmethod
def isChallengeRequest(resp):
if resp.headers.get('Server', '').startswith('cloudflare'):
if b'why_captcha' in resp.content or b'/cdn-cgi/l/chk_captcha' in resp.content:
raise ValueError('Captcha')
return (
resp.status_code in [429, 503]
and all(s in resp.content for s in [b'jschl_vc', b'jschl_answer'])
)
return False
##########################################################################################################################################################
def sendChallengeResponse(self, resp, **original_kwargs):
body = resp.text
# Cloudflare requires a delay before solving the challenge
if not self.delay:
try:
delay = float(re.search(r'submit\(\);\r?\n\s*},\s*([0-9]+)', body).group(1)) / float(1000)
if isinstance(delay, (int, float)):
self.delay = delay
except: # noqa
pass
sleep(self.delay)
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_jschl'.format(parsed_url.scheme, domain)
cloudflare_kwargs = deepcopy(original_kwargs)
try:
params = OrderedDict()
s = re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body)
if s:
params['s'] = s.group('s_value')
params.update(
[
('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
('pass', re.search(r'name="pass" value="(.+?)"', body).group(1))
]
)
params = cloudflare_kwargs.setdefault('params', params)
except Exception as e:
raise ValueError('Unable to parse Cloudflare anti-bots page: {} {}'.format(e.message, BUG_REPORT))
# Solve the Javascript challenge
params['jschl_answer'] = JavaScriptInterpreter.dynamicImport(self.interpreter).solveChallenge(body, domain)
# Requests transforms any request into a GET after a redirect,
# so the redirect has to be handled manually here to allow for
# performing other types of requests even as the first request.
cloudflare_kwargs['allow_redirects'] = False
redirect = self.request(resp.request.method, submit_url, **cloudflare_kwargs)
redirect_location = urlparse(redirect.headers['Location'])
if not redirect_location.netloc:
redirect_url = urlunparse(
(
parsed_url.scheme,
domain,
redirect_location.path,
redirect_location.params,
redirect_location.query,
redirect_location.fragment
)
)
return self.request(resp.request.method, redirect_url, **original_kwargs)
return self.request(resp.request.method, redirect.headers['Location'], **original_kwargs)
##########################################################################################################################################################
@classmethod
def create_scraper(cls, sess=None, **kwargs):
"""
Convenience function for creating a ready-to-go CloudScraper object.
"""
scraper = cls(**kwargs)
if sess:
attrs = ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']
for attr in attrs:
val = getattr(sess, attr, None)
if val:
setattr(scraper, attr, val)
return scraper
##########################################################################################################################################################
# Functions for integrating cloudscraper with other applications and scripts
@classmethod
def get_tokens(cls, url, **kwargs):
scraper = cls.create_scraper(
debug=kwargs.pop('debug', False),
delay=kwargs.pop('delay', None),
interpreter=kwargs.pop('interpreter', 'js2py'),
allow_brotli=kwargs.pop('allow_brotli', True),
)
try:
resp = scraper.get(url, **kwargs)
resp.raise_for_status()
except Exception:
logging.error('"{}" returned an error. Could not collect tokens.'.format(url))
raise
domain = urlparse(resp.url).netloc
# noinspection PyUnusedLocal
cookie_domain = None
for d in scraper.cookies.list_domains():
if d.startswith('.') and d in ('.{}'.format(domain)):
cookie_domain = d
break
else:
raise ValueError('Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM ("I\'m Under Attack Mode") enabled?')
return (
{
'__cfduid': scraper.cookies.get('__cfduid', '', domain=cookie_domain),
'cf_clearance': scraper.cookies.get('cf_clearance', '', domain=cookie_domain)
},
scraper.headers['User-Agent']
)
##########################################################################################################################################################
@classmethod
def get_cookie_string(cls, url, **kwargs):
"""
Convenience function for building a Cookie HTTP header value.
"""
tokens, user_agent = cls.get_tokens(url, **kwargs)
return '; '.join('='.join(pair) for pair in tokens.items()), user_agent
##########################################################################################################################################################
create_scraper = CloudScraper.create_scraper
get_tokens = CloudScraper.get_tokens
get_cookie_string = CloudScraper.get_cookie_string
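A minimal sketch of what `get_cookie_string` does with the token dict, assuming the tokens are already available as an ordered mapping. `build_cookie_header` and the sample values are hypothetical and not part of cloudscraper.

```python
from collections import OrderedDict


def build_cookie_header(tokens):
    """Join name/value pairs into a Cookie header value: 'a=1; b=2'."""
    return '; '.join('{}={}'.format(name, value) for name, value in tokens.items())


# Hypothetical token values, for illustration only.
tokens = OrderedDict([('__cfduid', 'abc'), ('cf_clearance', 'xyz')])
header = build_cookie_header(tokens)
```

An ordered mapping is used here only so that the resulting header is deterministic.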
@@ -0,0 +1,89 @@
import re
import sys
import logging
import abc
if sys.version_info >= (3, 4):
ABC = abc.ABC # noqa
else:
ABC = abc.ABCMeta('ABC', (), {})
##########################################################################################################################################################
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
interpreters = {}
class JavaScriptInterpreter(ABC):
@abc.abstractmethod
def __init__(self, name):
interpreters[name] = self
@classmethod
def dynamicImport(cls, name):
if name not in interpreters:
try:
__import__('{}.{}'.format(cls.__module__, name))
if not isinstance(interpreters.get(name), JavaScriptInterpreter):
raise ImportError('The interpreter was not initialized.')
except ImportError:
logging.error('Unable to load {} interpreter'.format(name))
raise
return interpreters[name]
@abc.abstractmethod
def eval(self, jsEnv, js):
pass
def solveChallenge(self, body, domain):
try:
js = re.search(
r'setTimeout\(function\(\){\s+(var s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n',
body
).group(1)
except Exception:
raise ValueError('Unable to identify Cloudflare IUAM Javascript on website. {}'.format(BUG_REPORT))
js = re.sub(r'\s{2,}', ' ', js, flags=re.MULTILINE | re.DOTALL).replace('\'; 121\'', '')
js += '\na.value;'
jsEnv = '''
String.prototype.italics=function(str) {{return "<i>" + this + "</i>";}};
var document = {{
createElement: function () {{
return {{ firstChild: {{ href: "https://{domain}/" }} }}
}},
getElementById: function () {{
return {{"innerHTML": "{innerHTML}"}};
}}
}};
'''
try:
innerHTML = re.search(
r'<div(?: [^<>]*)? id="([^<>]*?)">([^<>]*?)</div>',
body,
re.MULTILINE | re.DOTALL
)
innerHTML = innerHTML.group(2) if innerHTML else ''
except: # noqa
logging.error('Error extracting Cloudflare IUAM Javascript. {}'.format(BUG_REPORT))
raise
try:
result = self.eval(
re.sub(r'\s{2,}', ' ', jsEnv.format(domain=domain, innerHTML=innerHTML), flags=re.MULTILINE | re.DOTALL),
js
)
float(result)
except Exception:
logging.error('Error executing Cloudflare IUAM Javascript. {}'.format(BUG_REPORT))
raise
return result
@@ -0,0 +1,32 @@
from __future__ import absolute_import
import js2py
import logging
import base64
from . import JavaScriptInterpreter
from .jsunfuck import jsunfuck
class ChallengeInterpreter(JavaScriptInterpreter):
def __init__(self):
super(ChallengeInterpreter, self).__init__('js2py')
def eval(self, jsEnv, js):
if js2py.eval_js('(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]') == '1':
logging.warning('WARNING - Please upgrade your js2py https://github.com/PiotrDabkowski/Js2Py, applying work around for the meantime.')
js = jsunfuck(js)
def atob(s):
return base64.b64decode('{}'.format(s)).decode('utf-8')
js2py.disable_pyimport()
context = js2py.EvalJs({'atob': atob})
result = context.eval('{}{}'.format(jsEnv, js))
return result
ChallengeInterpreter()
@@ -80,18 +80,18 @@ CONSTRUCTORS = {
'RegExp': 'Function("return/"+false+"/")()'
}
def jsunfuck(jsfuckString):
for key in sorted(MAPPING, key=lambda k: len(MAPPING[k]), reverse=True):
if MAPPING.get(key) in jsfuckString:
jsfuckString = jsfuckString.replace(MAPPING.get(key), '"{}"'.format(key))
for key in sorted(SIMPLE, key=lambda k: len(SIMPLE[k]), reverse=True):
if SIMPLE.get(key) in jsfuckString:
jsfuckString = jsfuckString.replace(SIMPLE.get(key), '{}'.format(key))
#for key in sorted(CONSTRUCTORS, key=lambda k: len(CONSTRUCTORS[k]), reverse=True):
# for key in sorted(CONSTRUCTORS, key=lambda k: len(CONSTRUCTORS[k]), reverse=True):
# if CONSTRUCTORS.get(key) in jsfuckString:
# jsfuckString = jsfuckString.replace(CONSTRUCTORS.get(key), '{}'.format(key))
return jsfuckString
return jsfuckString
@@ -0,0 +1,46 @@
import base64
import logging
import subprocess
from . import JavaScriptInterpreter
##########################################################################################################################################################
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
class ChallengeInterpreter(JavaScriptInterpreter):
def __init__(self):
super(ChallengeInterpreter, self).__init__('nodejs')
def eval(self, jsEnv, js):
try:
js = 'var atob = function(str) {return Buffer.from(str, "base64").toString("binary");};' \
'var challenge = atob("%s");' \
'var context = {atob: atob};' \
'var options = {filename: "iuam-challenge.js", timeout: 4000};' \
'var answer = require("vm").runInNewContext(challenge, context, options);' \
'process.stdout.write(String(answer));' \
% base64.b64encode('{}{}'.format(jsEnv, js).encode('UTF-8')).decode('ascii')
return subprocess.check_output(['node', '-e', js])
except OSError as e:
if e.errno == 2:
raise EnvironmentError(
'Missing Node.js runtime. Node is required and must be in the PATH (check with `node -v`). Your Node binary may be called `nodejs` rather than `node`, '
'in which case you may need to run `apt-get install nodejs-legacy` on some Debian-based systems. (Please read the cloudscraper'
' README\'s Dependencies section: https://github.com/VeNoMouS/cloudscraper#dependencies.)'
)
raise
except Exception:
logging.error('Error executing Cloudflare IUAM Javascript. %s' % BUG_REPORT)
raise
pass
ChallengeInterpreter()
@@ -0,0 +1,40 @@
import os
import json
import random
import logging
from collections import OrderedDict
##########################################################################################################################################################
class User_Agent():
##########################################################################################################################################################
def __init__(self, *args, **kwargs):
self.headers = None
self.loadUserAgent(*args, **kwargs)
##########################################################################################################################################################
def loadUserAgent(self, *args, **kwargs):
browser = kwargs.pop('browser', 'chrome')
user_agents = json.load(
open(os.path.join(os.path.dirname(__file__), 'browsers.json'), 'r'),
object_pairs_hook=OrderedDict
)
if not user_agents.get(browser):
logging.error('Sorry "{}" browser User-Agent was not found.'.format(browser))
raise RuntimeError('Sorry "{}" browser User-Agent was not found.'.format(browser))
user_agent = random.choice(user_agents.get(browser))
self.headers = user_agent.get('headers')
self.headers['User-Agent'] = random.choice(user_agent.get('User-Agent'))
if not kwargs.get('allow_brotli', False):
if 'br' in self.headers['Accept-Encoding']:
self.headers['Accept-Encoding'] = ','.join([encoding for encoding in self.headers['Accept-Encoding'].split(',') if encoding.strip() != 'br']).strip()
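The brotli handling at the end of `loadUserAgent` can be sketched in isolation as below. `strip_brotli` is a hypothetical helper name, and the header value is a made-up example, not an entry taken from `browsers.json`.

```python
def strip_brotli(accept_encoding):
    """Drop the 'br' token from an Accept-Encoding value, keeping the rest as-is."""
    kept = [enc for enc in accept_encoding.split(',') if enc.strip() != 'br']
    return ','.join(kept).strip()


# Hypothetical header value, for illustration only.
without_br = strip_brotli('gzip, deflate, br')
```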
File diff suppressed because it is too large
+3 -10
@@ -5,6 +5,7 @@ import re
from .translators.friendly_nodes import REGEXP_CONVERTER
from .utils.injector import fix_js_args
from types import FunctionType, ModuleType, GeneratorType, BuiltinFunctionType, MethodType, BuiltinMethodType
from math import floor, log10
import traceback
try:
import numpy
@@ -603,15 +604,7 @@ class PyJs(object):
elif typ == 'Boolean':
return Js('true') if self.value else Js('false')
elif typ == 'Number': #or self.Class=='Number':
if self.is_nan():
return Js('NaN')
elif self.is_infinity():
sign = '-' if self.value < 0 else ''
return Js(sign + 'Infinity')
elif isinstance(self.value,
long) or self.value.is_integer(): # dont print .0
return Js(unicode(int(self.value)))
return Js(unicode(self.value)) # accurate enough
return Js(unicode(js_dtoa(self.value)))
elif typ == 'String':
return self
else: #object
@@ -1046,7 +1039,7 @@ def PyJsComma(a, b):
return b
from .internals.simplex import JsException as PyJsException
from .internals.simplex import JsException as PyJsException, js_dtoa
import pyjsparser
pyjsparser.parser.ENABLE_JS2PY_ERRORS = lambda msg: MakeError('SyntaxError', msg)
@@ -116,10 +116,12 @@ def eval_js(js):
def eval_js6(js):
"""Just like eval_js but with experimental support for js6 via babel."""
return eval_js(js6_to_js5(js))
def translate_js6(js):
"""Just like translate_js but with experimental support for js6 via babel."""
return translate_js(js6_to_js5(js))
@@ -3,15 +3,19 @@ import re
import datetime
from desc import *
from simplex import *
from conversions import *
import six
from pyjsparser import PyJsParser
from itertools import izip
from .desc import *
from .simplex import *
from .conversions import *
from pyjsparser import PyJsParser
import six
if six.PY2:
from itertools import izip
else:
izip = zip
from conversions import *
from simplex import *
def Type(obj):
@@ -1,8 +1,8 @@
from code import Code
from simplex import MakeError
from opcodes import *
from operations import *
from trans_utils import *
from .code import Code
from .simplex import MakeError
from .opcodes import *
from .operations import *
from .trans_utils import *
SPECIAL_IDENTIFIERS = {'true', 'false', 'this'}
@@ -465,10 +465,11 @@ class ByteCodeGenerator:
self.emit('LOAD_OBJECT', tuple(data))
def Program(self, body, **kwargs):
old_tape_len = len(self.exe.tape)
self.emit('LOAD_UNDEFINED')
self.emit(body)
# add function tape !
self.exe.tape = self.function_declaration_tape + self.exe.tape
self.exe.tape = self.exe.tape[:old_tape_len] + self.function_declaration_tape + self.exe.tape[old_tape_len:]
def Pyimport(self, imp, **kwargs):
raise NotImplementedError(
@@ -735,17 +736,17 @@ def main():
#
# }
a.emit(d)
print a.declared_vars
print a.exe.tape
print len(a.exe.tape)
print(a.declared_vars)
print(a.exe.tape)
print(len(a.exe.tape))
a.exe.compile()
def log(this, args):
print args[0]
print(args[0])
return 999
print a.exe.run(a.exe.space.GlobalObj)
print(a.exe.run(a.exe.space.GlobalObj))
if __name__ == '__main__':
@@ -1,16 +1,17 @@
from opcodes import *
from space import *
from base import *
from .opcodes import *
from .space import *
from .base import *
class Code:
'''Can generate, store and run sequence of ops representing js code'''
def __init__(self, is_strict=False):
def __init__(self, is_strict=False, debug_mode=False):
self.tape = []
self.compiled = False
self.label_locs = None
self.is_strict = is_strict
self.debug_mode = debug_mode
self.contexts = []
self.current_ctx = None
@@ -22,6 +23,10 @@ class Code:
self.GLOBAL_THIS = None
self.space = None
# dbg
self.ctx_depth = 0
def get_new_label(self):
self._label_count += 1
return self._label_count
@@ -74,21 +79,35 @@ class Code:
# 0=normal, 1=return, 2=jump_outside, 3=errors
# execute_fragment_under_context returns:
# (return_value, typ, return_value/jump_loc/py_error)
# ctx.stack must be len 1 and its always empty after the call.
# IMPORTANT: It is guaranteed that the length of the ctx.stack is unchanged.
'''
old_curr_ctx = self.current_ctx
self.ctx_depth += 1
old_stack_len = len(ctx.stack)
old_ret_len = len(self.return_locs)
old_ctx_len = len(self.contexts)
try:
self.current_ctx = ctx
return self._execute_fragment_under_context(
ctx, start_label, end_label)
except JsException as err:
# undo the things that were put on the stack (if any)
# don't worry, I know the recovery is possible through try statement and for this reason try statement
# has its own context and stack so it will not delete the contents of the outer stack
del ctx.stack[:]
if self.debug_mode:
self._on_fragment_exit("js errors")
# undo the things that were put on the stack (if any) to ensure a proper error recovery
del ctx.stack[old_stack_len:]
del self.return_locs[old_ret_len:]
del self.contexts[old_ctx_len :]
return undefined, 3, err
finally:
self.ctx_depth -= 1
self.current_ctx = old_curr_ctx
assert old_stack_len == len(ctx.stack)
def _get_dbg_indent(self):
return self.ctx_depth * ' '
def _on_fragment_exit(self, mode):
print(self._get_dbg_indent() + 'ctx exit (%s)' % mode)
def _execute_fragment_under_context(self, ctx, start_label, end_label):
start, end = self.label_locs[start_label], self.label_locs[end_label]
@@ -97,16 +116,20 @@ class Code:
entry_level = len(self.contexts)
# for e in self.tape[start:end]:
# print e
if self.debug_mode:
print(self._get_dbg_indent() + 'ctx entry (from:%d, to:%d)' % (start, end))
while loc < len(self.tape):
#print loc, self.tape[loc]
if len(self.contexts) == entry_level and loc >= end:
if self.debug_mode:
self._on_fragment_exit('normal')
assert loc == end
assert len(ctx.stack) == (
1 + initial_len), 'Stack change must be equal to +1!'
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return ctx.stack.pop(), 0, None # means normal return
# execute instruction
if self.debug_mode:
print(self._get_dbg_indent() + str(loc), self.tape[loc])
status = self.tape[loc].eval(ctx)
# check status for special actions
@@ -116,9 +139,10 @@ class Code:
if len(self.contexts) == entry_level:
# check if jumped outside of the fragment and break if so
if not start <= loc < end:
assert len(ctx.stack) == (
1 + initial_len
), 'Stack change must be equal to +1!'
if self.debug_mode:
self._on_fragment_exit('jump outside loc:%d label:%d' % (loc, status))
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return ctx.stack.pop(), 2, status # jump outside
continue
@@ -137,7 +161,10 @@ class Code:
# return: (None, None)
else:
if len(self.contexts) == entry_level:
assert len(ctx.stack) == 1 + initial_len
if self.debug_mode:
self._on_fragment_exit('return')
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return undefined, 1, ctx.stack.pop(
) # return signal
return_value = ctx.stack.pop()
@@ -149,6 +176,8 @@ class Code:
continue
# next instruction
loc += 1
if self.debug_mode:
self._on_fragment_exit('internal error - unexpected end of tape, will crash')
assert False, 'Remember to add NOP at the end!'
def run(self, ctx, starting_loc=0):
@@ -156,7 +185,8 @@ class Code:
self.current_ctx = ctx
while loc < len(self.tape):
# execute instruction
#print loc, self.tape[loc]
if self.debug_mode:
print(loc, self.tape[loc])
status = self.tape[loc].eval(ctx)
# check status for special actions
@@ -42,6 +42,7 @@ def executable_code(code_str, space, global_context=True):
space.byte_generator.emit('LABEL', skip)
space.byte_generator.emit('NOP')
space.byte_generator.restore_state()
space.byte_generator.exe.compile(
start_loc=old_tape_len
) # dont read the code from the beginning, dont be stupid!
@@ -71,5 +72,5 @@ def _eval(this, args):
def log(this, args):
print ' '.join(map(to_string, args))
print(' '.join(map(to_string, args)))
return undefined
@@ -1,6 +1,6 @@
from __future__ import unicode_literals
# Type Conversions. to_type. All must return PyJs subclass instance
from simplex import *
from .simplex import *
def to_primitive(self, hint=None):
@@ -73,14 +73,7 @@ def to_string(self):
elif typ == 'Boolean':
return 'true' if self else 'false'
elif typ == 'Number': # or self.Class=='Number':
if is_nan(self):
return 'NaN'
elif is_infinity(self):
sign = '-' if self < 0 else ''
return sign + 'Infinity'
elif int(self) == self: # integer value!
return unicode(int(self))
return unicode(self) # todo make it print exactly like node.js
return js_dtoa(self)
else: # object
return to_string(to_primitive(self, 'String'))
@@ -1,29 +1,22 @@
from __future__ import unicode_literals
from base import Scope
from func_utils import *
from conversions import *
from .base import Scope
from .func_utils import *
from .conversions import *
import six
from prototypes.jsboolean import BooleanPrototype
from prototypes.jserror import ErrorPrototype
from prototypes.jsfunction import FunctionPrototype
from prototypes.jsnumber import NumberPrototype
from prototypes.jsobject import ObjectPrototype
from prototypes.jsregexp import RegExpPrototype
from prototypes.jsstring import StringPrototype
from prototypes.jsarray import ArrayPrototype
import prototypes.jsjson as jsjson
import prototypes.jsutils as jsutils
from .prototypes.jsboolean import BooleanPrototype
from .prototypes.jserror import ErrorPrototype
from .prototypes.jsfunction import FunctionPrototype
from .prototypes.jsnumber import NumberPrototype
from .prototypes.jsobject import ObjectPrototype
from .prototypes.jsregexp import RegExpPrototype
from .prototypes.jsstring import StringPrototype
from .prototypes.jsarray import ArrayPrototype
from .prototypes import jsjson
from .prototypes import jsutils
from .constructors import jsnumber, jsstring, jsarray, jsboolean, jsregexp, jsmath, jsobject, jsfunction, jsconsole
from constructors import jsnumber
from constructors import jsstring
from constructors import jsarray
from constructors import jsboolean
from constructors import jsregexp
from constructors import jsmath
from constructors import jsobject
from constructors import jsfunction
from constructors import jsconsole
def fill_proto(proto, proto_class, space):
@@ -155,7 +148,10 @@ def fill_space(space, byte_generator):
j = easy_func(creator, space)
j.name = unicode(typ)
j.prototype = space.ERROR_TYPES[typ]
set_protected(j, 'prototype', space.ERROR_TYPES[typ])
set_non_enumerable(space.ERROR_TYPES[typ], 'constructor', j)
def new_create(args, space):
message = get_arg(args, 0)
@@ -178,6 +174,7 @@ def fill_space(space, byte_generator):
setattr(space, err_type_name + u'Prototype', extra_err)
error_constructors[err_type_name] = construct_constructor(
err_type_name)
assert space.TypeErrorPrototype is not None
# RegExp
@@ -1,5 +1,5 @@
from simplex import *
from conversions import *
from .simplex import *
from .conversions import *
import six
if six.PY3:
@@ -1,5 +1,5 @@
from operations import *
from base import get_member, get_member_dot, PyJsFunction, Scope
from .operations import *
from .base import get_member, get_member_dot, PyJsFunction, Scope
class OP_CODE(object):
@@ -1,6 +1,6 @@
from __future__ import unicode_literals
from simplex import *
from conversions import *
from .simplex import *
from .conversions import *
# ------------------------------------------------------------------------------
# Unary operations
@@ -4,7 +4,7 @@ from __future__ import unicode_literals
import re
from ..conversions import *
from ..func_utils import *
from jsregexp import RegExpExec
from .jsregexp import RegExpExec
DIGS = set(u'0123456789')
WHITE = u"\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u2028\u2029\u202F\u205F\u3000\uFEFF"
@@ -1,11 +1,9 @@
import pyjsparser
from space import Space
import fill_space
from byte_trans import ByteCodeGenerator
from code import Code
from simplex import MakeError
import sys
sys.setrecursionlimit(100000)
from .space import Space
from . import fill_space
from .byte_trans import ByteCodeGenerator
from .code import Code
from .simplex import *
pyjsparser.parser.ENABLE_JS2PY_ERRORS = lambda msg: MakeError(u'SyntaxError', unicode(msg))
@@ -16,8 +14,8 @@ def get_js_bytecode(js):
a.emit(d)
return a.exe.tape
def eval_js_vm(js):
a = ByteCodeGenerator(Code())
def eval_js_vm(js, debug=False):
a = ByteCodeGenerator(Code(debug_mode=debug))
s = Space()
a.exe.space = s
s.exe = a.exe
@@ -26,7 +24,10 @@ def eval_js_vm(js):
a.emit(d)
fill_space.fill_space(s, a)
# print a.exe.tape
if debug:
from pprint import pprint
pprint(a.exe.tape)
print()
a.exe.compile()
return a.exe.run(a.exe.space.GlobalObj)
@@ -1,6 +1,10 @@
from __future__ import unicode_literals
import six
if six.PY3:
basestring = str
long = int
xrange = range
unicode = str
#Undefined
class PyJsUndefined(object):
@@ -75,7 +79,7 @@ def is_callable(self):
def is_infinity(self):
return self == float('inf') or self == -float('inf')
return self == Infinity or self == -Infinity
def is_nan(self):
@@ -114,7 +118,7 @@ class JsException(Exception):
return self.mes.to_string().value
else:
if self.throw is not None:
from conversions import to_string
from .conversions import to_string
return to_string(self.throw)
else:
return self.typ + ': ' + self.message
@@ -131,3 +135,26 @@ def value_from_js_exception(js_exception, space):
return js_exception.throw
else:
return space.NewError(js_exception.typ, js_exception.message)
def js_dtoa(number):
if is_nan(number):
return u'NaN'
elif is_infinity(number):
if number > 0:
return u'Infinity'
return u'-Infinity'
elif number == 0.:
return u'0'
elif abs(number) < 1e-6 or abs(number) >= 1e21:
frac, exponent = unicode(repr(float(number))).split('e')
# Remove leading zeros from the exponent.
exponent = int(exponent)
return frac + ('e' if exponent < 0 else 'e+') + unicode(exponent)
elif abs(number) < 1e-4: # python starts to return exp notation while we still want the prec
frac, exponent = unicode(repr(float(number))).split('e-')
base = u'0.' + u'0' * (int(exponent) - 1) + frac.lstrip('-').replace('.', '')
return base if number > 0. else u'-' + base
elif isinstance(number, long) or number.is_integer(): # dont print .0
return unicode(int(number))
return unicode(repr(number)) # python representation should be equivalent.
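A Python 3 sketch of the number-to-string rules that `js_dtoa` implements, written standalone so the branches can be tried out. `js_number_to_string` is a hypothetical name; it follows the same thresholds as the diff above but uses `math.isnan`/`math.isinf` in place of the module's own helpers, and it assumes a float-convertible input.

```python
import math


def js_number_to_string(number):
    """Format a number roughly the way JavaScript's String(number) does."""
    number = float(number)
    if math.isnan(number):
        return 'NaN'
    if math.isinf(number):
        return 'Infinity' if number > 0 else '-Infinity'
    if number == 0.0:
        return '0'
    if abs(number) < 1e-6 or abs(number) >= 1e21:
        # Exponent notation: drop the zero padding Python adds ('1e-07' -> '1e-7').
        frac, exponent = repr(number).split('e')
        exponent = int(exponent)
        return frac + ('e' if exponent < 0 else 'e+') + str(exponent)
    if abs(number) < 1e-4:
        # Python already switches to exponent notation here; expand it to decimals.
        frac, exponent = repr(number).split('e-')
        base = '0.' + '0' * (int(exponent) - 1) + frac.lstrip('-').replace('.', '')
        return base if number > 0 else '-' + base
    if number.is_integer():
        return str(int(number))  # no trailing '.0'
    return repr(number)
```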
@@ -1,5 +1,5 @@
from base import *
from simplex import *
from .base import *
from .simplex import *
class Space(object):
@@ -1,3 +1,10 @@
import six
if six.PY3:
basestring = str
long = int
xrange = range
unicode = str
def to_key(literal_or_identifier):
''' returns string representation of this object'''
if literal_or_identifier['type'] == 'Identifier':
@@ -6,8 +6,6 @@ if six.PY3:
xrange = range
unicode = str
# todo fix apply and bind
class FunctionPrototype:
def toString():
@@ -41,6 +39,7 @@ class FunctionPrototype:
return this.call(obj, args)
def bind(thisArg):
arguments_ = arguments
target = this
if not target.is_callable():
raise this.MakeError(
@@ -48,5 +47,5 @@ class FunctionPrototype:
if len(arguments) <= 1:
args = ()
else:
args = tuple([arguments[e] for e in xrange(1, len(arguments))])
args = tuple([arguments_[e] for e in xrange(1, len(arguments_))])
return this.PyJsBoundFunction(target, thisArg, args)
@@ -345,7 +345,7 @@ def BlockStatement(type, body):
body) # never returns empty string! In the worst case returns pass\n
def ExpressionStatement(type, expression, **ommit):
def ExpressionStatement(type, expression):
return trans(expression) + '\n' # end expression space with new line
+10
@@ -163,3 +163,13 @@ class Pysubs2CLI(object):
elif args.transform_framerate is not None:
in_fps, out_fps = args.transform_framerate
subs.transform_framerate(in_fps, out_fps)
def __main__():
cli = Pysubs2CLI()
rv = cli(sys.argv[1:])
sys.exit(rv)
if __name__ == "__main__":
__main__()
+3 -1
@@ -17,12 +17,14 @@ class Color(_Color):
return _Color.__new__(cls, r, g, b, a)
#: Version of the pysubs2 library.
VERSION = "0.2.1"
VERSION = "0.2.3"
PY3 = sys.version_info.major == 3
if PY3:
text_type = str
binary_string_type = bytes
else:
text_type = unicode
binary_string_type = str
+1 -3
@@ -3,7 +3,7 @@ from .microdvd import MicroDVDFormat
from .subrip import SubripFormat
from .jsonformat import JSONFormat
from .substation import SubstationFormat
from .txt_generic import TXTGenericFormat, MPL2Format
from .mpl2 import MPL2Format
from .exceptions import *
#: Dict mapping file extensions to format identifiers.
@@ -13,7 +13,6 @@ FILE_EXTENSION_TO_FORMAT_IDENTIFIER = {
".ssa": "ssa",
".sub": "microdvd",
".json": "json",
".txt": "txt_generic",
}
#: Dict mapping format identifiers to implementations (FormatBase subclasses).
@@ -23,7 +22,6 @@ FORMAT_IDENTIFIER_TO_FORMAT_CLASS = {
"ssa": SubstationFormat,
"microdvd": MicroDVDFormat,
"json": JSONFormat,
"txt_generic": TXTGenericFormat,
"mpl2": MPL2Format,
}
@@ -2,44 +2,48 @@
from __future__ import print_function, division, unicode_literals
import re
from numbers import Number
from pysubs2.time import times_to_ms
from .time import times_to_ms
from .formatbase import FormatBase
from .ssaevent import SSAEvent
from .ssastyle import SSAStyle
# thanks to http://otsaloma.io/gaupol/doc/api/aeidon.files.mpl2_source.html
MPL2_FORMAT = re.compile(r"^(?um)\[(-?\d+)\]\[(-?\d+)\](.*?)$")
class TXTGenericFormat(FormatBase):
@classmethod
def guess_format(cls, text):
if MPL2_FORMAT.match(text):
return "mpl2"
MPL2_FORMAT = re.compile(r"^(?um)\[(-?\d+)\]\[(-?\d+)\](.*)")
class MPL2Format(FormatBase):
@classmethod
def guess_format(cls, text):
return TXTGenericFormat.guess_format(text)
if MPL2_FORMAT.search(text):
return "mpl2"
@classmethod
def from_file(cls, subs, fp, format_, **kwargs):
def prepare_text(lines):
out = []
for s in lines.split("|"):
s = s.strip()
if s.startswith("/"):
out.append(r"{\i1}%s{\i0}" % s[1:])
continue
# line beginning with '/' is in italics
s = r"{\i1}%s{\i0}" % s[1:].strip()
out.append(s)
return "\n".join(out)
return "\\N".join(out)
subs.events = [SSAEvent(start=times_to_ms(s=float(start) / 10), end=times_to_ms(s=float(end) / 10),
text=prepare_text(text)) for start, end, text in MPL2_FORMAT.findall(fp.getvalue())]
@classmethod
def to_file(cls, subs, fp, format_, **kwargs):
raise NotImplemented
# TODO handle italics
for line in subs:
if line.is_comment:
continue
print("[{start}][{end}] {text}".format(start=int(line.start // 100),
end=int(line.end // 100),
text=line.plaintext.replace("\n", "|")),
file=fp)
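A standalone sketch of the MPL2 reading path shown above, assuming timestamps are in tenths of a second and `|` separates lines. `parse_mpl2` and `mpl2_text_to_ass` are hypothetical names; the pattern passes `re.MULTILINE` as an argument instead of the inline `(?um)` group, because recent Python 3 versions reject global inline flags that are not at the start of the pattern.

```python
import re

# Same shape as MPL2_FORMAT, with the flag passed explicitly.
MPL2_LINE = re.compile(r"^\[(-?\d+)\]\[(-?\d+)\](.*)", re.MULTILINE)


def mpl2_text_to_ass(text):
    """Convert '|' line breaks to '\\N' and a leading '/' to italics tags."""
    out = []
    for part in text.split("|"):
        part = part.strip()
        if part.startswith("/"):
            # A line beginning with '/' is in italics.
            part = r"{\i1}%s{\i0}" % part[1:].strip()
        out.append(part)
    return "\\N".join(out)


def parse_mpl2(data):
    """Return (start_ms, end_ms, text) tuples; MPL2 times are deciseconds."""
    return [
        (int(start) * 100, int(end) * 100, mpl2_text_to_ass(text))
        for start, end, text in MPL2_LINE.findall(data)
    ]


# Made-up sample input, for illustration only.
events = parse_mpl2("[10][25] Hello|/world\n[30][45] Second line\n")
```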
@@ -78,7 +78,7 @@ class SSAStyle(object):
s += "%rpx " % self.fontsize
if self.bold: s += "bold "
if self.italic: s += "italic "
s += "'%s'>" % self.fontname
s += "{!r}>".format(self.fontname)
if not PY3: s = s.encode("utf-8")
return s
+9 -1
@@ -46,8 +46,16 @@ class SubripFormat(FormatBase):
following_lines[-1].append(line)
def prepare_text(lines):
# Handle the "happy" empty subtitle case, which is timestamp line followed by blank line(s)
# followed by number line and timestamp line of the next subtitle. Fixes issue #11.
if (len(lines) >= 2
and all(re.match("\s*$", line) for line in lines[:-1])
and re.match("\s*\d+\s*$", lines[-1])):
return ""
# Handle the general case.
s = "".join(lines).strip()
s = re.sub(r"\n* *\d+ *$", "", s) # strip number of next subtitle
s = re.sub(r"\n+ *\d+ *$", "", s) # strip number of next subtitle
s = re.sub(r"< *i *>", r"{\i1}", s)
s = re.sub(r"< */ *i *>", r"{\i0}", s)
s = re.sub(r"< *s *>", r"{\s1}", s)
+14 -13
@@ -4,7 +4,7 @@ from numbers import Number
from .formatbase import FormatBase
from .ssaevent import SSAEvent
from .ssastyle import SSAStyle
from .common import text_type, Color
from .common import text_type, Color, PY3, binary_string_type
from .time import make_time, ms_to_times, timestamp_to_ms, TIMESTAMP
SSA_ALIGNMENT = (1, 2, 3, 9, 10, 11, 5, 6, 7)
@@ -150,14 +150,7 @@ class SubstationFormat(FormatBase):
if format_ == "ass":
return ass_rgba_to_color(v)
else:
try:
return ssa_rgb_to_color(v)
except ValueError:
try:
return ass_rgba_to_color(v)
except:
return Color(255, 255, 255, 0)
return ssa_rgb_to_color(v)
elif f in {"bold", "underline", "italic", "strikeout"}:
return v == "-1"
elif f in {"borderstyle", "encoding", "marginl", "marginr", "marginv", "layer", "alphalevel"}:
@@ -229,7 +222,7 @@ class SubstationFormat(FormatBase):
for k, v in subs.aegisub_project.items():
print(k, v, sep=": ", file=fp)
def field_to_string(f, v):
def field_to_string(f, v, line):
if f in {"start", "end"}:
return ms_to_timestamp(v)
elif f == "marked":
@@ -240,23 +233,31 @@ class SubstationFormat(FormatBase):
return "-1" if v else "0"
elif isinstance(v, (text_type, Number)):
return text_type(v)
elif not PY3 and isinstance(v, binary_string_type):
# A convenience feature, see issue #12 - accept non-unicode strings
# when they are ASCII; this is useful in Python 2, especially for non-text
# fields like style names, where requiring Unicode type seems too stringent
if all(ord(c) < 128 for c in v):
return text_type(v)
else:
raise TypeError("Encountered binary string with non-ASCII codepoint in SubStation field {!r} for line {!r} - please use unicode string instead of str".format(f, line))
elif isinstance(v, Color):
if format_ == "ass":
return color_to_ass_rgba(v)
else:
return color_to_ssa_rgb(v)
else:
raise TypeError("Unexpected type when writing a SubStation field")
raise TypeError("Unexpected type when writing a SubStation field {!r} for line {!r}".format(f, line))
print("\n[V4+ Styles]" if format_ == "ass" else "\n[V4 Styles]", file=fp)
print(STYLE_FORMAT_LINE[format_], file=fp)
for name, sty in subs.styles.items():
fields = [field_to_string(f, getattr(sty, f)) for f in STYLE_FIELDS[format_]]
fields = [field_to_string(f, getattr(sty, f), sty) for f in STYLE_FIELDS[format_]]
print("Style: %s" % name, *fields, sep=",", file=fp)
print("\n[Events]", file=fp)
print(EVENT_FORMAT_LINE[format_], file=fp)
for ev in subs.events:
fields = [field_to_string(f, getattr(ev, f)) for f in EVENT_FIELDS[format_]]
fields = [field_to_string(f, getattr(ev, f), ev) for f in EVENT_FIELDS[format_]]
print(ev.type, end=": ", file=fp)
print(*fields, sep=",", file=fp)
@@ -258,4 +258,4 @@ def fix_line_ending(content):
:rtype: bytes
"""
return content.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
return content.replace(b'\r\n', b'\n')
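The commit message for this change mentions UTF-16 character transformation issues. The sketch below illustrates one way that can happen, assuming UTF-16LE input: the byte 0x0D can be half of a two-byte character, so replacing every lone `\r` byte changes the character. The two function names are hypothetical stand-ins for the old and new bodies of `fix_line_ending`.

```python
def fix_line_ending_old(content):
    """Previous behaviour: normalise CRLF, then every remaining lone CR."""
    return content.replace(b'\r\n', b'\n').replace(b'\r', b'\n')


def fix_line_ending_new(content):
    """New behaviour: only normalise CRLF."""
    return content.replace(b'\r\n', b'\n')


# U+010D encodes to the bytes 0x0D 0x01 in UTF-16LE, so it contains a raw '\r' byte.
raw = u'\u010d'.encode('utf-16-le')

old_result = fix_line_ending_old(raw).decode('utf-16-le')  # byte 0x0D became 0x0A
new_result = fix_line_ending_new(raw).decode('utf-16-le')  # bytes left untouched
```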
@@ -10,6 +10,8 @@ import logging
import requests
import xmlrpclib
import dns.resolver
import ipaddress
import re
from requests import exceptions
from urllib3.util import connection
@@ -17,7 +19,13 @@ from retry.api import retry_call
from exceptions import APIThrottled
from dogpile.cache.api import NO_VALUE
from subliminal.cache import region
from cfscrape import CloudflareScraper
from subliminal_patch.pitcher import pitchers
from cloudscraper import CloudScraper
try:
import brotli
except ImportError:
pass
try:
from urlparse import urlparse
@@ -55,39 +63,111 @@ class CertifiSession(TimeoutSession):
self.verify = pem_file
class CFSession(CloudflareScraper):
def __init__(self):
super(CFSession, self).__init__()
class NeedsCaptchaException(Exception):
pass
class CFSession(CloudScraper):
def __init__(self, *args, **kwargs):
super(CFSession, self).__init__(*args, **kwargs)
self.debug = os.environ.get("CF_DEBUG", False)
def _request(self, method, url, *args, **kwargs):
ourSuper = super(CloudScraper, self)
resp = ourSuper.request(method, url, *args, **kwargs)
if resp.headers.get('Content-Encoding') == 'br':
if self.allow_brotli and resp._content:
resp._content = brotli.decompress(resp.content)
else:
logging.warning('Brotli content detected, but the option is disabled; we will not continue.')
return resp
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
try:
if self.isChallengeRequest(resp):
if resp.request.method != 'GET':
# Work around if the initial request is not a GET,
# Supersede with a GET then re-request the original METHOD.
CloudScraper.request(self, 'GET', resp.url)
resp = ourSuper.request(method, url, *args, **kwargs)
else:
# Solve Challenge
resp = self.sendChallengeResponse(resp, **kwargs)
except ValueError, e:
if e.message == "Captcha":
parsed_url = urlparse(url)
domain = parsed_url.netloc
# solve the captcha
site_key = re.search(r'data-sitekey="(.+?)"', resp.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', resp.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', resp.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("cf: Captcha site-key not found!")
pitcher = pitchers.get_pitcher()("cf: %s" % domain, resp.request.url, site_key,
user_agent=self.headers["User-Agent"],
cookies=self.cookies.get_dict(),
is_invisible=True)
parsed_url = urlparse(resp.url)
logger.info("cf: %s: Solving captcha", domain)
result = pitcher.throw()
if not result:
raise Exception("cf: Couldn't solve captcha!")
submit_url = '{}://{}/cdn-cgi/l/chk_captcha'.format(parsed_url.scheme, domain)
method = resp.request.method
cloudflare_kwargs = {
'allow_redirects': False,
'headers': {'Referer': resp.url},
'params': OrderedDict(
[
('s', challenge_s),
('g-recaptcha-response', result)
]
)
}
return CloudScraper.request(self, method, submit_url, **cloudflare_kwargs)
return resp
def request(self, method, url, *args, **kwargs):
parsed_url = urlparse(url)
domain = parsed_url.netloc
cache_key = "cf_data2_%s" % domain
cache_key = "cf_data3_%s" % domain
if not self.cookies.get("__cfduid", "", domain=domain):
if not self.cookies.get("cf_clearance", "", domain=domain):
cf_data = region.get(cache_key)
if cf_data is not NO_VALUE:
cf_cookies, user_agent, hdrs = cf_data
cf_cookies, hdrs = cf_data
logger.debug("Trying to use old cf data for %s: %s", domain, cf_data)
for cookie, value in cf_cookies.iteritems():
self.cookies.set(cookie, value, domain=domain)
self._hdrs = hdrs
self._ua = user_agent
self.headers['User-Agent'] = self._ua
self.headers = hdrs
ret = super(CFSession, self).request(method, url, *args, **kwargs)
ret = self._request(method, url, *args, **kwargs)
try:
cf_data = self.get_cf_live_tokens(domain)
except:
pass
else:
if cf_data != region.get(cache_key) and cf_data[0]["__cfduid"] and cf_data[0]["cf_clearance"]:
logger.debug("Storing cf data for %s: %s", domain, cf_data)
region.set(cache_key, cf_data)
if cf_data and "cf_clearance" in cf_data[0] and cf_data[0]["cf_clearance"]:
if cf_data != region.get(cache_key):
logger.debug("Storing cf data for %s: %s", domain, cf_data)
region.set(cache_key, cf_data)
elif cf_data[0]["cf_clearance"]:
logger.debug("CF Live tokens not updated")
return ret
@@ -101,11 +181,11 @@ class CFSession(CloudflareScraper):
"Unable to find Cloudflare cookies. Does the site actually have "
"Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")
return (OrderedDict([
return (OrderedDict(filter(lambda x: x[1], [
("__cfduid", self.cookies.get("__cfduid", "", domain=cookie_domain)),
("cf_clearance", self.cookies.get("cf_clearance", "", domain=cookie_domain))
]),
self._ua, self._hdrs
])),
self.headers
)
@@ -236,41 +316,47 @@ def patch_create_connection():
global _custom_resolver, _custom_resolver_ips, dns_cache
host, port = address
__custom_resolver_ips = os.environ.get("dns_resolvers", None)
try:
ipaddress.ip_address(unicode(host))
except (ipaddress.AddressValueError, ValueError):
__custom_resolver_ips = os.environ.get("dns_resolvers", None)
# resolver ips changed in the meantime?
if __custom_resolver_ips != _custom_resolver_ips:
_custom_resolver = None
_custom_resolver_ips = __custom_resolver_ips
dns_cache = {}
# resolver ips changed in the meantime?
if __custom_resolver_ips != _custom_resolver_ips:
_custom_resolver = None
_custom_resolver_ips = __custom_resolver_ips
dns_cache = {}
custom_resolver = _custom_resolver
custom_resolver = _custom_resolver
if not custom_resolver:
if _custom_resolver_ips:
logger.debug("DNS: Trying to use custom DNS resolvers: %s", _custom_resolver_ips)
custom_resolver = dns.resolver.Resolver(configure=False)
custom_resolver.lifetime = 8.0
try:
custom_resolver.nameservers = json.loads(_custom_resolver_ips)
except:
logger.debug("DNS: Couldn't load custom DNS resolvers: %s", _custom_resolver_ips)
if not custom_resolver:
if _custom_resolver_ips:
logger.debug("DNS: Trying to use custom DNS resolvers: %s", _custom_resolver_ips)
custom_resolver = dns.resolver.Resolver(configure=False)
custom_resolver.lifetime = os.environ.get("dns_resolvers_timeout", 8.0)
try:
custom_resolver.nameservers = json.loads(_custom_resolver_ips)
except:
logger.debug("DNS: Couldn't load custom DNS resolvers: %s", _custom_resolver_ips)
else:
_custom_resolver = custom_resolver
if custom_resolver:
if host in dns_cache:
ip = dns_cache[host]
logger.debug("DNS: Using %s=%s from cache", host, ip)
return _orig_create_connection((ip, port), *args, **kwargs)
else:
_custom_resolver = custom_resolver
if custom_resolver:
if host in dns_cache:
ip = dns_cache[host]
logger.debug("DNS: Using %s=%s from cache", host, ip)
else:
try:
ip = custom_resolver.query(host)[0].address
logger.debug("DNS: Resolved %s to %s using %s", host, ip, custom_resolver.nameservers)
dns_cache[host] = ip
except dns.exception.DNSException:
logger.warning("DNS: Couldn't resolve %s with DNS: %s", host, custom_resolver.nameservers)
raise
try:
ip = custom_resolver.query(host)[0].address
logger.debug("DNS: Resolved %s to %s using %s", host, ip, custom_resolver.nameservers)
dns_cache[host] = ip
return _orig_create_connection((ip, port), *args, **kwargs)
except dns.exception.DNSException:
logger.warning("DNS: Couldn't resolve %s with DNS: %s", host, custom_resolver.nameservers)
raise
logger.debug("DNS: Falling back to default DNS or IP on %s", host)
return _orig_create_connection((host, port), *args, **kwargs)
patch_create_connection._sz_patched = True
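
Review note: the hunk above skips the custom resolver when `host` is already an IP address, and otherwise resolves through a per-host cache. A minimal sketch of that control flow follows. It is written in Python 3 syntax (the patched code is Python 2), and `is_ip_literal`, `resolve_host` and the `resolver` callable are hypothetical names used only for illustration, not part of the patch.

```python
import ipaddress


def is_ip_literal(host):
    """Return True if host is already an IPv4/IPv6 address literal."""
    try:
        ipaddress.ip_address(host)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


def resolve_host(host, resolver, cache):
    """Return an IP for host, bypassing the resolver for IP literals.

    resolver is a hypothetical callable (hostname -> IP string);
    cache is a plain dict keyed by hostname.
    """
    if is_ip_literal(host):
        # Nothing to look up: hand the address straight to the socket layer.
        return host
    if host not in cache:
        cache[host] = resolver(host)
    return cache[host]
```

Keeping the literal check ahead of the cache lookup means an IP never ends up as a cache key, which appears to be the intent of the "don't query DNS with IPs" changelog entry.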
@@ -5,16 +5,11 @@ import logging
import os
import time
import inflect
import cfscrape
from random import randint
from zipfile import ZipFile
from babelfish import language_converters
from guessit import guessit
from dogpile.cache.api import NO_VALUE
from subliminal import Episode, ProviderError
from subliminal.cache import region
from subliminal.utils import sanitize_release_group
from subliminal_patch.http import RetryingCFSession
from subliminal_patch.providers import Provider
@@ -215,7 +210,7 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
for series in [video.series] + video.alternative_series:
term = u"%s - %s Season" % (series, p.number_to_words("%sth" % video.season).capitalize())
logger.debug('Searching for alternative results: %s', term)
film = search(term, session=self.session, release=False)
film = search(term, session=self.session, release=False, throttle=self.search_throttle)
if film and film.subtitles:
logger.debug('Alternative results found: %s', len(film.subtitles))
subtitles += self.parse_results(video, film)
@@ -227,7 +222,7 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
term = u"%s S%02i" % (series, video.season)
logger.debug('Searching for packs: %s', term)
time.sleep(self.search_throttle)
film = search(term, session=self.session)
film = search(term, session=self.session, throttle=self.search_throttle)
if film and film.subtitles:
logger.debug('Pack results found: %s', len(film.subtitles))
subtitles += self.parse_results(video, film)
@@ -241,7 +236,8 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
more_than_one = len([video.title] + video.alternative_titles) > 1
for title in [video.title] + video.alternative_titles:
logger.debug('Searching for movie results: %s', title)
film = search(title, year=video.year, session=self.session, limit_to=None, release=False)
film = search(title, year=video.year, session=self.session, limit_to=None, release=False,
throttle=self.search_throttle)
if film and film.subtitles:
subtitles += self.parse_results(video, film)
if more_than_one:
@@ -140,7 +140,7 @@ class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
def initialize(self):
self.session = RetryingCFSession()
load_verification("titlovi", self.session)
#load_verification("titlovi", self.session)
def terminate(self):
self.session.close()
@@ -181,42 +181,8 @@ class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
r = self.session.get(self.search_url, params=params, timeout=10)
r.raise_for_status()
except RequestException as e:
captcha_passed = False
if e.response.status_code == 403 and "data-sitekey" in e.response.content:
logger.info('titlovi: Solving captcha. This might take a couple of minutes, but should only '
'happen once every so often')
site_key = re.search(r'data-sitekey="(.+?)"', e.response.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', e.response.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', e.response.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("titlovi: Captcha site-key not found!")
pitcher = pitchers.get_pitcher()("titlovi", e.request.url, site_key,
user_agent=self.session.headers["User-Agent"],
cookies=self.session.cookies.get_dict(),
is_invisible=True)
result = pitcher.throw()
if not result:
raise Exception("titlovi: Couldn't solve captcha!")
s_params = {
"s": challenge_s,
"id": challenge_ray,
"g-recaptcha-response": result,
}
r = self.session.get(self.server_url + "/cdn-cgi/l/chk_captcha", params=s_params, timeout=10,
allow_redirects=False)
r.raise_for_status()
r = self.session.get(self.search_url, params=params, timeout=10)
r.raise_for_status()
store_verification("titlovi", self.session)
captcha_passed = True
if not captcha_passed:
logger.exception('RequestException %s', e)
break
logger.exception('RequestException %s', e)
break
else:
try:
soup = BeautifulSoup(r.content, 'lxml')
@@ -259,7 +225,8 @@ class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
# page link
page_link = self.server_url + sub.a.attrs['href']
# subtitle language
match = lang_re.search(sub.select_one('.lang').attrs['src'])
_lang = sub.select_one('.lang')
match = lang_re.search(_lang.attrs.get('src', _lang.attrs.get('cfsrc', '')))
if match:
try:
# decode language
@@ -117,13 +117,14 @@ class Subtitle(Subtitle_):
logger.info('Guessing encoding for language %s', self.language)
encodings = ['utf-8']
encodings = ['utf-8', 'utf-16']
# add language-specific encodings
# http://scratchpad.wikia.com/wiki/Character_Encoding_Recommendation_for_Languages
if self.language.alpha3 == 'zho':
encodings.extend(['cp936', 'gb2312', 'cp950', 'gb18030', 'big5', 'big5hkscs'])
encodings.extend(['cp936', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp_2', 'cp950', 'gb18030', 'big5',
'big5hkscs'])
elif self.language.alpha3 == 'jpn':
encodings.extend(['shift-jis', 'cp932', 'euc_jp', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2',
'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', ])
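
Review note: the hunk above adds `utf-16` to the default candidates tried before the language-specific encodings. As a rough illustration of why candidate order matters, here is a minimal sketch of a try-in-order decoder. `guess_decode` is a hypothetical helper written in Python 3 syntax; the real method in this file may do more than the lines visible in the hunk.

```python
def guess_decode(raw, encodings=("utf-8", "utf-16")):
    """Try each encoding in order and return (text, encoding) for the first success.

    Returns (None, None) if no candidate decodes the bytes cleanly.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    return None, None
```

Strict codecs such as UTF-8 fail fast on foreign data, so placing them first keeps more permissive codecs from producing a wrong but error-free decode.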
@@ -28,6 +28,8 @@ import re
import enum
import sys
import requests
import time
is_PY2 = sys.version_info[0] < 3
if is_PY2:
@@ -55,7 +57,9 @@ def soup_for(url, session=None, user_agent=DEFAULT_USER_AGENT):
r = Request(url, data=None, headers=dict(HEADERS, **{"User-Agent": user_agent}))
html = urlopen(r).read().decode("utf-8")
else:
html = session.get(url).text
ret = session.get(url)
ret.raise_for_status()
html = ret.text
return BeautifulSoup(html, "html.parser")
@@ -243,17 +247,34 @@ def get_first_film(soup, section, year=None, session=None):
return Film.from_url(url, session=session)
def search(term, release=True, session=None, year=None, limit_to=SearchTypes.Exact):
soup = soup_for("%s/subtitles/%s?q=%s" % (SITE_DOMAIN, "release" if release else "title", term), session=session)
def search(term, release=True, session=None, year=None, limit_to=SearchTypes.Exact, throttle=0):
# note to subscene: if you actually start to randomize the endpoint, we'll have to query your server even more
endpoints = ["searching", "search", "srch", "find"]
if release:
endpoints = ["release"]
if "Subtitle search by" in str(soup):
rows = soup.find("table").tbody.find_all("tr")
subtitles = Subtitle.from_rows(rows)
return Film(term, subtitles=subtitles)
soup = None
for endpoint in endpoints:
try:
soup = soup_for("%s/subtitles/%s?q=%s" % (SITE_DOMAIN, endpoint, term),
session=session)
except requests.HTTPError, e:
if e.response.status_code == 404:
time.sleep(throttle)
# fixme: detect endpoint from html
continue
raise
break
for junk, search_type in SearchTypes.__members__.items():
if section_exists(soup, search_type):
return get_first_film(soup, search_type, year=year, session=session)
if soup:
if "Subtitle search by" in str(soup):
rows = soup.find("table").tbody.find_all("tr")
subtitles = Subtitle.from_rows(rows)
return Film(term, subtitles=subtitles)
if limit_to == search_type:
return
for junk, search_type in SearchTypes.__members__.items():
if section_exists(soup, search_type):
return get_first_film(soup, search_type, year=year, session=session)
if limit_to == search_type:
return
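
Review note: the new `search()` walks a list of candidate endpoints and moves on to the next one only on HTTP 404, sleeping for `throttle` seconds in between. The sketch below isolates that fallback loop. `NotFound`, `first_working_endpoint` and the `fetch` callable are hypothetical stand-ins (Python 3 syntax) so the example runs without network access; they are not names from the patch.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 error."""


def first_working_endpoint(endpoints, fetch, throttle=0):
    """Return (endpoint, result) for the first endpoint that does not 404.

    Any exception other than NotFound propagates, mirroring the bare
    `raise` in the hunk. Returns (None, None) if every endpoint 404s.
    """
    for endpoint in endpoints:
        try:
            return endpoint, fetch(endpoint)
        except NotFound:
            # Be polite to the server before probing the next candidate.
            time.sleep(throttle)
    return None, None
```

A caller would then check the result for `None` before parsing, much as the hunk guards the parsing code with `if soup:`.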
@@ -2,7 +2,8 @@
OS_PLEX_USERAGENT = 'plexapp.com v9.0'
DEPENDENCY_MODULE_NAMES = ['subliminal', 'subliminal_patch', 'enzyme', 'guessit', 'subzero', 'libfilebot', 'cfscrape']
DEPENDENCY_MODULE_NAMES = ['subliminal', 'subliminal_patch', 'enzyme', 'guessit', 'subzero', 'libfilebot',
'cloudscraper']
PERSONAL_MEDIA_IDENTIFIER = "com.plexapp.agents.none"
PLUGIN_IDENTIFIER_SHORT = "subzero"
PLUGIN_IDENTIFIER = "com.plexapp.agents.%s" % PLUGIN_IDENTIFIER_SHORT
@@ -6,7 +6,7 @@ import pysubs2
import logging
import time
from mods import EMPTY_TAG_PROCESSOR, EmptyEntryError
from mods import EMPTY_TAG_PROCESSOR, EmptyEntryError, FullContentRep
from registry import registry
from subzero.language import Language
@@ -257,7 +257,16 @@ class SubtitleModifications(object):
mod.modify(None, debug=self.debug, parent=self, **args)
def apply_line_mods(self, new_entries, mods):
for index, entry in enumerate(self.f, 1):
index = 1
entries = self.f[:]
entry_count = len(entries)
while 1:
if index > entry_count - 1:
break
entry = entries[index]
applied_mods = []
lines = []
@@ -265,86 +274,110 @@ class SubtitleModifications(object):
start_tags = []
end_tags = []
t = entry.text.strip()
if not t:
text = entry.text.replace(ur"\N", "\n").strip()
if not text:
if self.debug:
logger.debug(u"Skipping empty line: %s", index)
index += 1
continue
skip_entry = False
for line in t.split(ur"\N"):
# don't bother the mods with surrounding tags
old_line = line
line = line.strip()
skip_line = False
line_count += 1
if not line:
continue
# clean {\X0} tags before processing
# fixme: handle nested tags?
start_tag = u""
end_tag = u""
if line.startswith(self.font_style_tag_start):
start_tag = line[:5]
line = line[5:]
if line[-5:-3] == self.font_style_tag_start:
end_tag = line[-5:]
line = line[:-5]
for order, identifier, args in mods:
mod = self.initialized_mods[identifier]
try:
line = mod.modify(line.strip(), entry=entry.text, debug=self.debug, parent=self, index=index,
**args)
except EmptyEntryError:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, entry.text)
skip_entry = True
break
try:
for line in text.split("\n"):
# don't bother the mods with surrounding tags
old_line = line
line = line.strip()
skip_line = False
line_count += 1
if not line:
continue
# clean {\X0} tags before processing
# fixme: handle nested tags?
start_tag = u""
end_tag = u""
if line.startswith(self.font_style_tag_start):
start_tag = line[:5]
line = line[5:]
if line[-5:-3] == self.font_style_tag_start:
end_tag = line[-5:]
line = line[:-5]
last_procs_mods = []
# fixme: this double loop is ugly
for order, identifier, args in mods:
mod = self.initialized_mods[identifier]
line = mod.modify(line.strip(), entry=text, debug=self.debug, parent=self, index=index,
**args)
if not line:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, old_line)
skip_line = True
break
applied_mods.append(identifier)
if mod.last_processors:
last_procs_mods.append([identifier, args])
if skip_line:
continue
for identifier, args in last_procs_mods:
mod = self.initialized_mods[identifier]
line = mod.modify(line.strip(), entry=text, debug=self.debug, parent=self, index=index,
procs=["last_process"], **args)
if not line:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, old_line)
skip_line = True
break
if skip_line:
continue
if start_tag:
start_tags.append(start_tag)
if end_tag:
end_tags.append(end_tag)
# append new line and clean possibly newly added empty tags
cleaned_line = EMPTY_TAG_PROCESSOR.process(start_tag + line + end_tag, debug=self.debug).strip()
if cleaned_line:
# we may have a single closing tag, if so, try appending it to the previous line
if len(cleaned_line) == 5 and cleaned_line.startswith("{\\") and cleaned_line.endswith("0}"):
if lines:
prev_line = lines.pop()
lines.append(prev_line + cleaned_line)
continue
lines.append(cleaned_line)
else:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, old_line)
skip_line = True
break
logger.debug(u"%d: Ditching now empty line (%r)", index, line)
applied_mods.append(identifier)
if skip_entry:
lines = []
break
if skip_line:
if not lines:
# don't bother logging when the entry only had one line
if self.debug and line_count > 1:
logger.debug(u"%d: %r -> ''", index, text)
index += 1
continue
except EmptyEntryError, e:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, e.mod.identifier, e.entry)
index += 1
continue
if start_tag:
start_tags.append(start_tag)
if end_tag:
end_tags.append(end_tag)
# append new line and clean possibly newly added empty tags
cleaned_line = EMPTY_TAG_PROCESSOR.process(start_tag + line + end_tag, debug=self.debug).strip()
if cleaned_line:
# we may have a single closing tag, if so, try appending it to the previous line
if len(cleaned_line) == 5 and cleaned_line.startswith("{\\") and cleaned_line.endswith("0}"):
if lines:
prev_line = lines.pop()
lines.append(prev_line + cleaned_line)
continue
lines.append(cleaned_line)
else:
if self.debug:
logger.debug(u"%d: Ditching now empty line (%r)", index, line)
if not lines:
# don't bother logging when the entry only had one line
if self.debug and line_count > 1:
logger.debug(u"%d: %r -> ''", index, entry.text)
except FullContentRep, e:
if self.debug:
logger.debug(u"%d: %s: %r -> %r", index, e.mod.identifier, text, e.new_content)
new_entries.append(e.new_content.replace("\n", ur"\N"))
index += 1
continue
new_text = ur"\N".join(lines)
@@ -373,6 +406,8 @@ class SubtitleModifications(object):
entry.text = new_text
new_entries.append(entry)
index += 1
SubMod = SubtitleModifications
@@ -21,6 +21,7 @@ class SubtitleModification(object):
pre_processors = []
processors = []
post_processors = []
last_processors = []
languages = []
def __init__(self, parent):
@@ -46,7 +47,7 @@ class SubtitleModification(object):
continue
old_content = new_content
new_content = processor.process(new_content, debug=debug, **kwargs)
new_content = processor.process(new_content, debug=debug, mod=self, **kwargs)
if not new_content:
if debug:
logger.debug("Processor returned empty line: %s", processor.name)
@@ -67,15 +68,16 @@ class SubtitleModification(object):
def post_process(self, content, debug=False, parent=None, **kwargs):
return self._process(content, self.post_processors, debug=debug, parent=parent, **kwargs)
def modify(self, content, debug=False, parent=None, **kwargs):
def modify(self, content, debug=False, parent=None, procs=None, **kwargs):
if not content:
return
new_content = content
for method in ("pre_process", "process", "post_process"):
for method in procs or ("pre_process", "process", "post_process"):
if not new_content:
return
new_content = getattr(self, method)(new_content, debug=debug, parent=parent, **kwargs)
new_content = self._process(new_content, getattr(self, "%sors" % method),
debug=debug, parent=parent, **kwargs)
return new_content
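
Review note: with the new `procs` argument, `modify()` derives the processor list from the stage name via `"%sors" % method`, so `"last_process"` maps to `last_processors`. A minimal sketch of that lookup follows, in Python 3 syntax. The class `Mod` and its toy processors are assumptions made for illustration and are much simpler than the real `Processor` objects.

```python
class Mod:
    # Each stage name plus "ors" yields the attribute holding its processors.
    pre_processors = [str.strip]
    processors = [lambda s: s.replace("  ", " ")]
    post_processors = []
    # Only run when explicitly requested via procs=["last_process"].
    last_processors = [lambda s: "" if set(s) <= set("♪ ") else s]

    def modify(self, content, procs=None):
        for stage in procs or ("pre_process", "process", "post_process"):
            for processor in getattr(self, "%sors" % stage):
                if not content:
                    return None
                content = processor(content)
        return content or None
```

Calling `Mod().modify(line)` runs the three default stages, while `Mod().modify(line, procs=["last_process"])` runs only the late stage, which is how the second loop in `apply_line_mods` uses it.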
@@ -105,5 +107,22 @@ empty_line_post_processors = [
]
class EmptyEntryError(Exception):
class ModEvent(Exception):
def __init__(self, *args, **kwargs):
self.mod = kwargs.pop("mod", None)
self.entry = kwargs.pop("entry", None)
super(ModEvent, self).__init__(*args, **kwargs)
class EmptyEntryError(ModEvent):
pass
class EmptyLineError(ModEvent):
pass
class FullContentRep(ModEvent):
def __init__(self, *args, **kwargs):
self.new_content = kwargs.pop("new_content", None)
super(FullContentRep, self).__init__(*args, **kwargs)
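
Review note: the `ModEvent` hierarchy lets a processor signal an entry-level decision (drop the entry, or replace its whole content) from inside a per-line loop. The sketch below restates the exception classes in Python 3 syntax and pairs them with a hypothetical bracket-stripping function. `strip_full_brackets` and `apply_entry` are illustrative assumptions and do not reproduce the regexes used by the HI mod.

```python
import re


class ModEvent(Exception):
    def __init__(self, *args, **kwargs):
        self.mod = kwargs.pop("mod", None)
        self.entry = kwargs.pop("entry", None)
        super().__init__(*args, **kwargs)


class EmptyEntryError(ModEvent):
    pass


class FullContentRep(ModEvent):
    def __init__(self, *args, **kwargs):
        self.new_content = kwargs.pop("new_content", None)
        super().__init__(*args, **kwargs)


def strip_full_brackets(line, entry):
    """Hypothetical processor that decides based on the whole entry."""
    stripped = re.sub(r"\[[^\]]*\]", "", entry).strip()
    if not stripped:
        raise EmptyEntryError(entry=entry)
    if stripped != line:
        raise FullContentRep(new_content=stripped, entry=entry)
    return line


def apply_entry(entry):
    """Return the new entry text, or None if the entry should be dropped."""
    try:
        return strip_full_brackets(entry, entry)
    except EmptyEntryError:
        return None
    except FullContentRep as event:
        return event.new_content
```

Popping the custom keyword arguments before calling the base constructor matters because `Exception.__init__` does not accept keyword arguments.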
@@ -28,7 +28,7 @@ class CommonFixes(SubtitleTextModification):
NReProcessor(re.compile(r'(?u)(\w|\b|\s|^)(-\s?-{1,2})'), ur"\1", name="CM_multidash"),
# line = _/-/\s
NReProcessor(re.compile(r'(?u)(^\W*[-_.:>~]+\W*$)'), "", name="CM_non_word_only"),
NReProcessor(re.compile(r'(?u)(^\W*[-_.:>~]+\W*$)'), "", name="<CM_non_word_only"),
# remove >>
NReProcessor(re.compile(r'(?u)^\s?>>\s*'), "", name="CM_leading_crocodiles"),
@@ -37,7 +37,7 @@ class CommonFixes(SubtitleTextModification):
NReProcessor(re.compile(r'(?u)(^\W*:\s*(?=\w+))'), "", name="CM_empty_colon_start"),
# fix music symbols
NReProcessor(re.compile(ur'(?u)(^[-\s>~]*[*#¶]+\s*)|(\s*[*#¶]+\s*$)'),
NReProcessor(re.compile(ur'(?u)(^[-\s>~]*[*#¶]+\s+)|(\s*[*#¶]+\s*$)'),
lambda x: u"" if x.group(1) else u"",
name="CM_music_symbols"),
@@ -1,7 +1,8 @@
# coding=utf-8
import re
from subzero.modification.mods import SubtitleTextModification, empty_line_post_processors, EmptyEntryError, TAG
from subzero.modification.mods import SubtitleTextModification, empty_line_post_processors, EmptyEntryError, TAG, \
FullContentRep
from subzero.modification.processors.re_processor import NReProcessor
from subzero.modification import registry
@@ -10,9 +11,11 @@ class FullBracketEntryProcessor(NReProcessor):
def process(self, content, debug=False, **kwargs):
entry = kwargs.get("entry")
if entry:
rep_content = super(FullBracketEntryProcessor, self).process(entry, debug=debug, **kwargs)
if not rep_content.strip():
raise EmptyEntryError()
rep_content = super(FullBracketEntryProcessor, self).process(entry, debug=debug, **kwargs).strip()
if not rep_content:
raise EmptyEntryError(mod=self.mod, entry=entry)
if content != rep_content:
raise FullContentRep(new_content=rep_content, mod=self.mod, entry=entry)
return content
@@ -49,11 +52,11 @@ class HearingImpaired(SubtitleTextModification):
NReProcessor(re.compile(ur'(?sux)-?%(t)s[([][^([)\]]+?(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' %
{"t": TAG}), "", name="HI_brackets"),
NReProcessor(re.compile(ur'(?sux)-?%(t)s[([]%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+%(t)s$' % {"t": TAG}),
FullBracketEntryProcessor(re.compile(ur'(?sux)-?%(t)s[([]%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+%(t)s$' % {"t": TAG}),
"", name="HI_bracket_open_start"),
NReProcessor(re.compile(ur'(?sux)-?%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' % {"t": TAG}), "",
name="HI_bracket_open_end"),
#NReProcessor(re.compile(ur'(?sux)-?%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' % {"t": TAG}), "",
# name="HI_bracket_open_end"),
# text before colon (and possible dash in front), max 11 chars after the first whitespace (if any)
# NReProcessor(re.compile(r'(?u)(^[A-z\-\'"_]+[\w\s]{0,11}:[^0-9{2}][\s]*)'), "", name="HI_before_colon"),
@@ -73,7 +76,7 @@ class HearingImpaired(SubtitleTextModification):
supported=lambda p: not p.only_uppercase),
# remove MAN:
NReProcessor(re.compile(ur'(?suxi)(.*MAN:\s*)'), "", name="HI_remove_man"),
NReProcessor(re.compile(ur'(?suxi)(\b(?:WO)MAN:\s*)'), "", name="HI_remove_man"),
# dash in front
# NReProcessor(re.compile(r'(?u)^\s*-\s*'), "", name="HI_starting_dash"),
@@ -81,13 +84,18 @@ class HearingImpaired(SubtitleTextModification):
# all caps at start before new sentence
NReProcessor(re.compile(ur'(?u)^(?=[A-ZÀ-Ž]{4,})[A-ZÀ-Ž-_\s]+\s([A-ZÀ-Ž][a-zà-ž].+)'), r"\1",
name="HI_starting_upper_then_sentence", supported=lambda p: not p.only_uppercase),
# remove music symbols
NReProcessor(re.compile(ur'(?u)(^%(t)s[*#¶♫♪\s]*%(t)s[*#¶♫♪\s]+%(t)s[*#¶♫♪\s]*%(t)s$)' % {"t": TAG}),
"", name="HI_music_symbols_only"),
]
post_processors = empty_line_post_processors
last_processors = [
# remove music symbols
NReProcessor(re.compile(ur'(?u)(^%(t)s[*#¶♫♪\s]*%(t)s[*#¶♫♪\s]+%(t)s[*#¶♫♪\s]*%(t)s$)' % {"t": TAG}),
"", name="HI_music_symbols_only"),
# remove music entries
NReProcessor(re.compile(ur'(?ums)(^[-\s>~]*[♫♪]+\s*.+|.+\s*[♫♪]+\s*$)'),
"", name="HI_music"),
]
registry.register(HearingImpaired)
@@ -7,12 +7,14 @@ class Processor(object):
"""
name = None
parent = None
mod = None
supported = None
enabled = True
def __init__(self, name=None, parent=None, supported=None):
def __init__(self, name=None, parent=None, mod=None, supported=None):
self.name = name
self.parent = parent
self.mod = mod
self.supported = supported if supported else lambda parent: True
@property
@@ -20,6 +22,8 @@ class Processor(object):
return self.name
def process(self, content, debug=False, **kwargs):
if not self.mod:
self.mod = kwargs.get("mod", None)
return content
def __repr__(self):
@@ -14,12 +14,13 @@ class ReProcessor(Processor):
pattern = None
replace_with = None
def __init__(self, pattern, replace_with, name=None, supported=None):
super(ReProcessor, self).__init__(name=name, supported=supported)
def __init__(self, pattern, replace_with, name=None, supported=None, **kwargs):
super(ReProcessor, self).__init__(name=name, supported=supported, **kwargs)
self.pattern = pattern
self.replace_with = replace_with
def process(self, content, debug=False, **kwargs):
super(ReProcessor, self).process(content, debug=debug, **kwargs)
return self.pattern.sub(self.replace_with, content)
+21
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2019 VeNoMouS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+23 -32
@@ -1,7 +1,9 @@
# Sub-Zero for Plex
[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle.svg?style=flat&label=stable)](https://github.com/pannal/Sub-Zero.bundle/releases/latest)<!--[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle/all.svg?maxAge=2592000&label=testing+2.0+RC9)](https://github.com/pannal/Sub-Zero.bundle/releases)--> [![master](https://img.shields.io/badge/master-stable-green.svg?maxAge=2592000)]()
[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle.svg?style=flat&label=stable)](https://github.com/pannal/Sub-Zero.bundle/releases/latest)
[![master](https://img.shields.io/badge/master-stable-green.svg?maxAge=2592000)]()
[![Maintenance](https://img.shields.io/maintenance/yes/2019.svg)]()
[![Slack Status](https://szslack.fragstore.net/badge.svg)](https://szslack.fragstore.net)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle?ref=badge_shield)
<img src="https://raw.githubusercontent.com/pannal/Sub-Zero.bundle/master/Contents/Resources/subzero.gif" align="left" height="100"> <font size="5"><b>Subtitles done right!</b></font><br />
@@ -16,8 +18,12 @@ Check out **[the Sub-Zero Wiki](https://github.com/pannal/Sub-Zero.bundle/wiki)*
---
## Helping development
If you like this, buy me a beer: <br>[![Donate](https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif)](https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=G9VKR2B8PMNKG) <br>or become a Patreon starting at **1 $ / month** <br><a href="https://www.patreon.com/subzero_plex" target="_blank"><img src="http://www.wenspencer.com/wp-content/uploads/2017/02/patreon-button.png" height="42" /></a> <br>or use the OpenSubtitles Sub-Zero affiliate link to become VIP <br>**10€/year, ad-free subs, 1000 subs/day, no-cache *VIP* server**<br><a href="http://v.ht/osvip" target="_blank"><img src="https://static.opensubtitles.org/gfx/logo.gif" height="50" /></a>
If you register with an anti-captcha service and you decide to use [Anti-Captcha.com](http://getcaptchasolution.com/kkvviom7nh), you can use [this affiliate link](http://getcaptchasolution.com/kkvviom7nh) to help development.
## Introduction
#### What's Sub-Zero?
Sub-Zero is a metadata agent and interface-plugin at the same time, for the popular Plex Media Server environment.
@@ -84,43 +90,24 @@ the.vbm, mmgoodnow, Vertig0ne, thliu78, tattoomees, ostman, count_confucius, ehe
## Changelog
2.6.5.3017
2.6.5.3074
subscene, addic7ed and titlovi
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service (anti-captcha.com or deathbycaptcha.com), add funds, then supply your credentials/apikey in the configuration
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
- core: SRT parsing: handle (bad) ASS color tag in SRT
- core: auto extract embedded: only use one unknown sub for first language
- core: better embedded streams language detection
- core: optimizations
- core: extract embedded: fix is_unknown check
- core: don't raise exception when subtitle not found inside archive
- core: search external subtitles: fix condition
- core: better plex transcoder path detection
- core: use Log.Warn instead of Log.Warning (#619, #629, #633)
- core: also check for "plex transcoder.exe" in case of windows (fixes #619)
- core: auto extract: use mbcs encoding for paths on windows
- core: Fix issue scandir not returning the name of the file inside Docker images on ARM systems. (thanks @giejay)
- core: also clean PYTHONHOME when calling external notification app
- core: update certifi to 2019.3.9
- core: scan_video: add series/title as alternative by scanning filename itself without parent folders
- core: add generic solution for solving captchas using anti captcha services
- core: increase cache time to 180d (was: 30d)
- core: guess_matches: handle multiple title matches; fixes bazarr#403
- windows: fix compatibility issues with plex transcoder
- compat: use lowercase paths on subtitle detection
- providers: addic7ed: re-enable (using paid anti captch service)
- providers: assrt: assume undefined Chinese flavor as Simplified (chs/zho-Hans)
- providers: subscene: make it work again by bypassing cf
- providers: subscene: don't fail on missing cover
- providers: titlovi: re-enable (might need paid anti captch service)
- providers: opensubtitles: fix only_foreign handling
- providers: opensubtitles: show subtitles with possibly mismatched series when manually listing subs
- menu: list subtitles: show subtitles with bad season/episode values as well
- refiners: omdb: fix imdb ids with spaces
- core: cf: bypass cf 95% of the time without captchas
- core: fix breaking line endings of certain languages (chinese, UTF-16); fixes #646
- core: update pysubs2 to 0.2.3
2.6.5.3062
Changelog
- core: cf: optimize
- core: http: don't query DNS with IPs. thanks @fgump (fixes sonarr/radarr)
[older changes](CHANGELOG.md)
@@ -128,3 +115,7 @@ Changelog
Subtitles provided by [OpenSubtitles.org](http://www.opensubtitles.org/), [Podnapisi.NET](https://www.podnapisi.net/), [TVSubtitles.net](http://www.tvsubtitles.net/), [Addic7ed.com](http://www.addic7ed.com/), [Legendas TV](http://legendas.tv/), [Napi Projekt](http://www.napiprojekt.pl/), [Shooter](http://shooter.cn/), [Titlovi](http://titlovi.com), [aRGENTeaM](http://argenteam.net), [SubScene](https://subscene.com/), [Hosszupuska](http://hosszupuskasub.com/)
[3rd party licenses](https://github.com/pannal/Sub-Zero.bundle/tree/master/Licenses)
## License
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle?ref=badge_large)