Compare commits

...

37 Commits

Author SHA1 Message Date
panni f337b53ae3 submod: HI: remove music
submod: common: be less aggressive about music symbols
submod: HI: be less aggressive about brackets
submod: HI: be less aggressive about MAN
2019-05-18 06:23:04 +02:00
panni aea6050d71 subtitle: try decoding with utf-16 by default as well 2019-05-17 23:45:06 +02:00
panni 13d5e0761e providers: subscene: fix endpoint once again 2019-05-13 16:14:26 +02:00
panni ce28d0284c back from dev 2019-05-12 06:17:08 +02:00
panni 1a0bb9c3e4 release 2.6.5.3074 2019-05-12 06:05:16 +02:00
panni d0c71b4b67 bump dev 2019-05-12 05:12:58 +02:00
panni b3f062956d core: re-fix ass/ssa tags in srt in pysubs2 0.2.3 2019-05-12 05:12:34 +02:00
panni 1a853a780c core: update pysubs2 to 0.2.3 2019-05-12 05:01:38 +02:00
panni 5c47ddeb2d core: update chinese encodings; #646 2019-05-12 04:49:30 +02:00
panni b51deb5d01 core: subliminal: don't replace \r with \n by default; fixes utf-16 character transformation issues; fixes #646 2019-05-12 04:48:23 +02:00
panni cbf5ea69be core: cf: update cloudscraper to 1.1.9; fix keyerror 2019-05-08 15:57:33 +02:00
panni e139ffefe6 bump dev 2019-05-08 04:18:25 +02:00
panni dc0a8deb40 core: cf: testing
providers: subscene: testing
2019-05-08 04:14:04 +02:00
panni 97e93cd10a core: cf: update js2py; update cloudscraper to 1.1.5; 2019-05-08 01:31:21 +02:00
panni 03c934cf21 back to dev 2019-05-01 15:39:23 +02:00
panni 92d0d70258 Release 2.6.5.3062 2019-05-01 15:32:36 +02:00
panni d44298993c Release 2.6.5.3055 2019-05-01 15:32:19 +02:00
panni 12300d4115 Merge branch 'develop-2.6' 2019-05-01 15:29:42 +02:00
pannal b4f08f61a6 Update README.md 2019-05-01 06:00:01 +02:00
pannal 861a25be41 Update README.md 2019-05-01 05:59:21 +02:00
pannal 3e175109a6 Merge pull request #641 from fossabot/master
Add license scan report and status
2019-05-01 05:48:14 +02:00
fossabot fb2210f2fd Add license scan report and status
Signed-off-by: fossabot <badges@fossa.io>
2019-04-30 20:44:05 -07:00
panni e928918201 add cloudscaper LICENSE 2019-05-01 05:13:13 +02:00
panni df607e5772 bump dev 2019-05-01 04:49:30 +02:00
panni a7cc470645 core: log cf domain 2019-05-01 04:48:48 +02:00
panni 4e6421b928 core: dns: set env var empty if not configured 2019-05-01 04:36:03 +02:00
panni df48e8fccd providers: subscene: remove obsolete imports 2019-05-01 04:27:11 +02:00
panni 58111bf204 core: remove old cfscrape implementation 2019-05-01 04:25:04 +02:00
panni 8c02e75fed providers: titlovi: match cfsrc for src 2019-05-01 04:24:31 +02:00
panni 6f3f1cb4b5 core: cf: harden. 2019-05-01 04:24:09 +02:00
panni dd27997deb core: cf: add cloudscaper 1.1.1@496900e instead of cfscrape 2019-05-01 03:12:01 +02:00
panni a1f70d1d4d core: add ENV:dns_resolvers_timeout 2019-05-01 02:39:18 +02:00
panni 7da0bac643 skip warning 2019-05-01 02:33:46 +02:00
panni b3ab2a451c core: http: don't query DNS with IPs. thanks @fgump 2019-05-01 02:27:30 +02:00
panni 850f836ebd back to dev 2019-04-28 05:27:26 +02:00
panni d9fa9d03da back to dev 2019-04-28 05:22:24 +02:00
pannal 76c20dc3d7 Update README.md 2019-04-28 05:21:35 +02:00
51 changed files with 3479 additions and 754 deletions
+44
View File
@@ -1,4 +1,48 @@
2.6.5.3041
Changelog
- core: only reference guessed title if there actually is one
- core: cf: optimize
- core/config: add setting for one existing language to be enough, fixes #491
- core/compat: dns: support nameservers via ENV[dns_resolvers]; don't fall back to default DNS when configured custom DNS failed
- providers: titlovi: prevent repeated captcha solving for CF
2.6.5.3017
Changelog
- core: SRT parsing: handle (bad) ASS color tag in SRT
- core: auto extract embedded: only use one unknown sub for first language
- core: better embedded streams language detection
- core: optimizations
- core: extract embedded: fix is_unknown check
- core: don't raise exception when subtitle not found inside archive
- core: search external subtitles: fix condition
- core: better plex transcoder path detection
- core: use Log.Warn instead of Log.Warning (#619, #629, #633)
- core: also check for "plex transcoder.exe" in case of windows (fixes #619)
- core: auto extract: use mbcs encoding for paths on windows
- core: Fix issue scandir not returning the name of the file inside Docker images on ARM systems. (thanks @giejay)
- core: also clean PYTHONHOME when calling external notification app
- core: update certifi to 2019.3.9
- core: scan_video: add series/title as alternative by scanning filename itself without parent folders
- core: add generic solution for solving captchas using anti captcha services
- core: increase cache time to 180d (was: 30d)
- core: guess_matches: handle multiple title matches; fixes bazarr#403
- windows: fix compatibility issues with plex transcoder
- compat: use lowercase paths on subtitle detection
- providers: addic7ed: re-enable (using paid anti captch service)
- providers: assrt: assume undefined Chinese flavor as Simplified (chs/zho-Hans)
- providers: subscene: make it work again by bypassing cf
- providers: subscene: don't fail on missing cover
- providers: titlovi: re-enable (might need paid anti captch service)
- providers: opensubtitles: fix only_foreign handling
- providers: opensubtitles: show subtitles with possibly mismatched series when manually listing subs
- menu: list subtitles: show subtitles with bad season/episode values as well
- refiners: omdb: fix imdb ids with spaces
2.6.4.2911
- core: improve file cache (windows especially); use fixed-length cache filenames; fixes #600
- core: don't log "Checking connections ..." when sonarr/radarr not activated
+1
View File
@@ -1083,6 +1083,7 @@ class Config(object):
def parse_custom_dns(self):
custom_dns = Prefs['use_custom_dns2'].strip()
os.environ["dns_resolvers"] = ""
if custom_dns:
ips = filter(lambda x: x, [d.strip() for d in custom_dns.split(",")])
if ips:
+3 -3
View File
@@ -13,7 +13,7 @@
<key>CFBundleSignature</key>
<string>????</string>
<key>CFBundleVersion</key>
<string>2.6.5.3041</string>
<string>2.6.5.3074</string>
<key>PlexFrameworkVersion</key>
<string>2</string>
<key>PlexPluginClass</key>
@@ -23,7 +23,7 @@
<key>PlexPluginConsoleLogging</key>
<string>0</string>
<key>PlexPluginDevMode</key>
<string>0</string>
<string>1</string>
<key>PlexPluginCodePolicy</key>
<!-- this allows channels to access some python methods which are otherwise blocked, as well as import external code libraries, and interact with the PMS HTTP API -->
<string>Elevated</string>
@@ -32,7 +32,7 @@
&lt;h1&gt;Sub-Zero for Plex&lt;/h1&gt;&lt;i&gt;Subtitles done right&lt;/i&gt;
Version 2.6.5.3041
Version 2.6.5.3074 DEV
Originally based on @bramwalet's awesome &lt;a href=&quot;https://github.com/bramwalet/Subliminal.bundle&quot;&gt;Subliminal.bundle&lt;/a&gt;
@@ -1,395 +0,0 @@
# coding=utf-8
import logging
import random
import re
import os
import json
import base64
from copy import deepcopy
from time import sleep
from collections import OrderedDict
from .jsfuck import jsunfuck
import js2py
from requests.sessions import Session
from subliminal_patch.pitcher import pitchers
try:
from requests_toolbelt.utils import dump
except ImportError:
pass
try:
from urlparse import urlparse
from urlparse import urlunparse
except ImportError:
from urllib.parse import urlparse
from urllib.parse import urlunparse
brotli_available = True
try:
from brotli import decompress as brdec
except:
brotli_available = False
logger = logging.getLogger(__name__)
__version__ = "2.0.3"
# Orignally written by https://github.com/Anorov/cloudflare-scrape
# Rewritten by VeNoMouS - <venom@gen-x.co.nz> for https://github.com/VeNoMouS/Sick-Beard - 24/3/2018 NZDT
DEFAULT_USER_AGENTS = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/65.0.3325.181 Chrome/65.0.3325.181 Safari/537.36",
"Mozilla/5.0 (Linux; Android 7.0; Moto G (5) Build/NPPS25.137-93-8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36",
"Mozilla/5.0 (iPhone; CPU iPhone OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53",
"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:59.0) Gecko/20100101 Firefox/59.0",
"Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0",
]
BUG_REPORT = """\
Cloudflare may have changed their technique, or there may be a bug in the script.
"""
cur_path = os.path.abspath(os.path.dirname(__file__))
if brotli_available:
brwsrs = os.path.join(cur_path, "browsers_br.json")
with open(brwsrs, "r") as f:
UA_COMBO = json.load(f, object_pairs_hook=OrderedDict)["chrome"]
else:
brwsrs = os.path.join(cur_path, "browsers.json")
UA_COMBO = []
with open(brwsrs, "r") as f:
_brwsrs = json.load(f, object_pairs_hook=OrderedDict)
for entry in _brwsrs:
_entry = OrderedDict(("-".join(a.capitalize() for a in key.split("-")), value)
for key, value in entry.iteritems())
_entry["User-Agent"] = None
UA_COMBO.append({"User-Agent": [entry["user-agent"]], "headers": _entry})
class NeedsCaptchaException(Exception):
pass
class CloudflareScraper(Session):
def __init__(self, *args, **kwargs):
self.delay = kwargs.pop('delay', 8)
self.debug = False
self._was_cf = False
self._ua = None
self._hdrs = None
super(CloudflareScraper, self).__init__(*args, **kwargs)
if not self._ua:
# Set a random User-Agent if no custom User-Agent has been set
ua_combo = random.choice(UA_COMBO)
self._ua = random.choice(ua_combo["User-Agent"])
self._hdrs = ua_combo["headers"].copy()
self._hdrs["User-Agent"] = self._ua
self.headers['User-Agent'] = self._ua
def set_cloudflare_challenge_delay(self, delay):
if isinstance(delay, (int, float)) and delay > 0:
self.delay = delay
def is_cloudflare_challenge(self, resp):
if resp.headers.get('Server', '').startswith('cloudflare'):
if b'why_captcha' in resp.content or b'/cdn-cgi/l/chk_captcha' in resp.content:
raise NeedsCaptchaException
return (
resp.status_code in [429, 503]
and b"jschl_vc" in resp.content
and b"jschl_answer" in resp.content
)
return False
def debugRequest(self, req):
try:
print (dump.dump_all(req).decode('utf-8'))
except:
pass
def request(self, method, url, *args, **kwargs):
# self.headers = (
# OrderedDict(
# [
# ('User-Agent', self.headers['User-Agent']),
# ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
# ('Accept-Language', 'en-US,en;q=0.5'),
# ('Accept-Encoding', 'gzip, deflate'),
# ('Connection', 'close'),
# ('Upgrade-Insecure-Requests', '1')
# ]
# )
# )
self.headers = self._hdrs.copy()
resp = super(CloudflareScraper, self).request(method, url, *args, **kwargs)
if resp.headers.get('content-encoding') == 'br' and brotli_available:
resp._content = brdec(resp._content)
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
try:
if self.is_cloudflare_challenge(resp):
self._was_cf = True
# Work around if the initial request is not a GET,
# Superseed with a GET then re-request the orignal METHOD.
if resp.request.method != 'GET':
self.request('GET', resp.url)
resp = self.request(method, url, *args, **kwargs)
else:
resp = self.solve_cf_challenge(resp, **kwargs)
except NeedsCaptchaException:
# solve the captcha
self._was_cf = True
site_key = re.search(r'data-sitekey="(.+?)"', resp.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', resp.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', resp.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("cf: Captcha site-key not found!")
pitcher = pitchers.get_pitcher()("cf", resp.request.url, site_key,
user_agent=self.headers["User-Agent"],
cookies=self.cookies.get_dict(),
is_invisible=True)
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
logger.info("cf: %s: Solving captcha", domain)
result = pitcher.throw()
if not result:
raise Exception("cf: Couldn't solve captcha!")
submit_url = '{}://{}/cdn-cgi/l/chk_captcha'.format(parsed_url.scheme, domain)
method = resp.request.method
cloudflare_kwargs = {
'allow_redirects': False,
'headers': {'Referer': resp.url},
'params': OrderedDict(
[
('s', challenge_s),
('g-recaptcha-response', result)
]
)
}
return self.request(method, submit_url, **cloudflare_kwargs)
return resp
def solve_cf_challenge(self, resp, **original_kwargs):
body = resp.text
# Cloudflare requires a delay before solving the challenge
if self.delay == 8:
try:
delay = float(re.search(r'submit\(\);\r?\n\s*},\s*([0-9]+)', body).group(1)) / float(1000)
if isinstance(delay, (int, float)):
self.delay = delay
except:
pass
sleep(self.delay)
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_jschl'.format(parsed_url.scheme, domain)
cloudflare_kwargs = deepcopy(original_kwargs)
headers = cloudflare_kwargs.setdefault('headers', {'Referer': resp.url})
try:
params = cloudflare_kwargs.setdefault(
'params', OrderedDict(
[
('s', re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body).group('s_value')),
('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
('pass', re.search(r'name="pass" value="(.+?)"', body).group(1)),
]
)
)
except Exception as e:
# Something is wrong with the page.
# This may indicate Cloudflare has changed their anti-bot
# technique. If you see this and are running the latest version,
# please open a GitHub issue so I can update the code accordingly.
raise ValueError("Unable to parse Cloudflare anti-bots page: {} {}".format(e.message, BUG_REPORT))
# Solve the Javascript challenge
params['jschl_answer'] = self.solve_challenge(body, domain)
# Requests transforms any request into a GET after a redirect,
# so the redirect has to be handled manually here to allow for
# performing other types of requests even as the first request.
method = resp.request.method
cloudflare_kwargs['allow_redirects'] = False
redirect = self.request(method, submit_url, **cloudflare_kwargs)
redirect_location = urlparse(redirect.headers['Location'])
if not redirect_location.netloc:
redirect_url = urlunparse(
(
parsed_url.scheme,
domain,
redirect_location.path,
redirect_location.params,
redirect_location.query,
redirect_location.fragment
)
)
return self.request(method, redirect_url, **original_kwargs)
return self.request(method, redirect.headers['Location'], **original_kwargs)
def solve_challenge(self, body, domain):
try:
js = re.search(
r"setTimeout\(function\(\){\s+(var s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n",
body
).group(1)
except Exception:
raise ValueError("Unable to identify Cloudflare IUAM Javascript on website. {}".format(BUG_REPORT))
js = re.sub(r"a\.value = ((.+).toFixed\(10\))?", r"\1", js)
js = re.sub(r'(e\s=\sfunction\(s\)\s{.*?};)', '', js, flags=re.DOTALL|re.MULTILINE)
js = re.sub(r"\s{3,}[a-z](?: = |\.).+", "", js).replace("t.length", str(len(domain)))
js = js.replace('; 121', '')
# Strip characters that could be used to exit the string context
# These characters are not currently used in Cloudflare's arithmetic snippet
js = re.sub(r"[\n\\']", "", js)
if 'toFixed' not in js:
raise ValueError("Error parsing Cloudflare IUAM Javascript challenge. {}".format(BUG_REPORT))
try:
jsEnv = """
var t = "{domain}";
var g = String.fromCharCode;
o = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
e = function(s) {{
s += "==".slice(2 - (s.length & 3));
var bm, r = "", r1, r2, i = 0;
for (; i < s.length;) {{
bm = o.indexOf(s.charAt(i++)) << 18 | o.indexOf(s.charAt(i++)) << 12 | (r1 = o.indexOf(s.charAt(i++))) << 6 | (r2 = o.indexOf(s.charAt(i++)));
r += r1 === 64 ? g(bm >> 16 & 255) : r2 === 64 ? g(bm >> 16 & 255, bm >> 8 & 255) : g(bm >> 16 & 255, bm >> 8 & 255, bm & 255);
}}
return r;
}};
function italics (str) {{ return '<i>' + this + '</i>'; }};
var document = {{
getElementById: function () {{
return {{'innerHTML': '{innerHTML}'}};
}}
}};
{js}
"""
innerHTML = re.search(
'<div(?: [^<>]*)? id="([^<>]*?)">([^<>]*?)<\/div>',
body,
re.MULTILINE | re.DOTALL
)
innerHTML = innerHTML.group(2).replace("'", r"\'") if innerHTML else ""
js = jsunfuck(jsEnv.format(domain=domain, innerHTML=innerHTML, js=js))
def atob(s):
return base64.b64decode('{}'.format(s)).decode('utf-8')
js2py.disable_pyimport()
context = js2py.EvalJs({'atob': atob})
result = context.eval(js)
except Exception:
logging.error("Error executing Cloudflare IUAM Javascript. {}".format(BUG_REPORT))
raise
try:
float(result)
except Exception:
raise ValueError("Cloudflare IUAM challenge returned unexpected answer. {}".format(BUG_REPORT))
return result
@classmethod
def create_scraper(cls, sess=None, **kwargs):
"""
Convenience function for creating a ready-to-go CloudflareScraper object.
"""
scraper = cls(**kwargs)
if sess:
attrs = ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']
for attr in attrs:
val = getattr(sess, attr, None)
if val:
setattr(scraper, attr, val)
return scraper
# Functions for integrating cloudflare-scrape with other applications and scripts
@classmethod
def get_tokens(cls, url, user_agent=None, debug=False, **kwargs):
scraper = cls.create_scraper()
scraper.debug = debug
if user_agent:
scraper.headers['User-Agent'] = user_agent
try:
resp = scraper.get(url, **kwargs)
resp.raise_for_status()
except Exception as e:
logging.error("'{}' returned an error. Could not collect tokens.".format(url))
raise
domain = urlparse(resp.url).netloc
cookie_domain = None
for d in scraper.cookies.list_domains():
if d.startswith('.') and d in ('.{}'.format(domain)):
cookie_domain = d
break
else:
raise ValueError("Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM (\"I'm Under Attack Mode\") enabled?")
return (
{
'__cfduid': scraper.cookies.get('__cfduid', '', domain=cookie_domain),
'cf_clearance': scraper.cookies.get('cf_clearance', '', domain=cookie_domain)
},
scraper.headers['User-Agent']
)
@classmethod
def get_cookie_string(cls, url, user_agent=None, debug=False, **kwargs):
"""
Convenience function for building a Cookie HTTP header value.
"""
tokens, user_agent = cls.get_tokens(url, user_agent=user_agent, debug=debug, **kwargs)
return "; ".join("=".join(pair) for pair in tokens.items()), user_agent
create_scraper = CloudflareScraper.create_scraper
get_tokens = CloudflareScraper.get_tokens
get_cookie_string = CloudflareScraper.get_cookie_string
@@ -1,80 +0,0 @@
[
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"user-agent": "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.102 Safari/537.36",
"accept-encoding": "gzip,deflate",
"accept-language": "en-US,en;q=0.8"
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"user-agent": "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.101 Safari/537.36",
"accept-encoding": "gzip,deflate",
"accept-language": "en-US,en;q=0.8"
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36",
"accept-language": "en-US,en;q=0.8",
"accept-encoding": "gzip, deflate, "
},
{
"connection": "close",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36",
"accept-language": "en-US,en;q=0.8",
"accept-encoding": "gzip, deflate, "
},
{
"connection": "close",
"accept": "*/*",
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:30.0) Gecko/20100101 Firefox/30.0"
},
{
"connection": "close",
"accept": "image/jpeg, image/gif, image/pjpeg, application/x-ms-application, application/xaml+xml, application/x-ms-xbap, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"accept": "text/html, application/xhtml+xml, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"accept": "text/html, application/xhtml+xml, */*",
"accept-language": "en-US",
"user-agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
"accept-encoding": "gzip, deflate",
"dnt": "1"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
},
{
"connection": "close",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"accept-language": "en-US,en;q=0.5",
"accept-encoding": "gzip, deflate"
}
]
@@ -0,0 +1,311 @@
import logging
import re
import sys
import ssl
from copy import deepcopy
from time import sleep
from collections import OrderedDict
from requests.sessions import Session
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.ssl_ import create_urllib3_context
from .interpreters import JavaScriptInterpreter
from .user_agent import User_Agent
try:
from requests_toolbelt.utils import dump
except ImportError:
pass
try:
import brotli
except ImportError:
pass
try:
from urlparse import urlparse
from urlparse import urlunparse
except ImportError:
from urllib.parse import urlparse
from urllib.parse import urlunparse
##########################################################################################################################################################
__version__ = '1.1.9'
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
class CipherSuiteAdapter(HTTPAdapter):
def __init__(self, cipherSuite=None, **kwargs):
self.cipherSuite = cipherSuite
if hasattr(ssl, 'PROTOCOL_TLS'):
self.ssl_context = create_urllib3_context(
ssl_version=getattr(ssl, 'PROTOCOL_TLSv1_3', ssl.PROTOCOL_TLSv1_2),
ciphers=self.cipherSuite
)
else:
self.ssl_context = create_urllib3_context(ssl_version=ssl.PROTOCOL_TLSv1)
super(CipherSuiteAdapter, self).__init__(**kwargs)
##########################################################################################################################################################
def init_poolmanager(self, *args, **kwargs):
kwargs['ssl_context'] = self.ssl_context
return super(CipherSuiteAdapter, self).init_poolmanager(*args, **kwargs)
##########################################################################################################################################################
def proxy_manager_for(self, *args, **kwargs):
kwargs['ssl_context'] = self.ssl_context
return super(CipherSuiteAdapter, self).proxy_manager_for(*args, **kwargs)
##########################################################################################################################################################
class CloudScraper(Session):
def __init__(self, *args, **kwargs):
self.debug = kwargs.pop('debug', False)
self.delay = kwargs.pop('delay', None)
self.interpreter = kwargs.pop('interpreter', 'js2py')
self.allow_brotli = kwargs.pop('allow_brotli', True if 'brotli' in sys.modules.keys() else False)
self.cipherSuite = None
super(CloudScraper, self).__init__(*args, **kwargs)
if 'requests' in self.headers['User-Agent']:
# Set a random User-Agent if no custom User-Agent has been set
self.headers = User_Agent(allow_brotli=self.allow_brotli).headers
self.mount('https://', CipherSuiteAdapter(self.loadCipherSuite()))
##########################################################################################################################################################
@staticmethod
def debugRequest(req):
try:
print(dump.dump_all(req).decode('utf-8'))
except: # noqa
pass
##########################################################################################################################################################
def loadCipherSuite(self):
if self.cipherSuite:
return self.cipherSuite
self.cipherSuite = ''
if hasattr(ssl, 'PROTOCOL_TLS'):
ciphers = [
'ECDHE-ECDSA-AES128-GCM-SHA256', 'ECDHE-RSA-AES128-GCM-SHA256', 'ECDHE-ECDSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES256-GCM-SHA384', 'ECDHE-ECDSA-CHACHA20-POLY1305-SHA256', 'ECDHE-RSA-CHACHA20-POLY1305-SHA256',
'ECDHE-RSA-AES128-CBC-SHA', 'ECDHE-RSA-AES256-CBC-SHA', 'RSA-AES128-GCM-SHA256', 'RSA-AES256-GCM-SHA384',
'ECDHE-RSA-AES128-GCM-SHA256', 'RSA-AES256-SHA', '3DES-EDE-CBC'
]
if hasattr(ssl, 'PROTOCOL_TLSv1_3'):
ciphers.insert(0, ['GREASE_3A', 'GREASE_6A', 'AES128-GCM-SHA256', 'AES256-GCM-SHA256', 'AES256-GCM-SHA384', 'CHACHA20-POLY1305-SHA256'])
ctx = ssl.SSLContext(getattr(ssl, 'PROTOCOL_TLSv1_3', ssl.PROTOCOL_TLSv1_2))
for cipher in ciphers:
try:
ctx.set_ciphers(cipher)
self.cipherSuite = '{}:{}'.format(self.cipherSuite, cipher).rstrip(':')
except ssl.SSLError:
pass
return self.cipherSuite
##########################################################################################################################################################
def request(self, method, url, *args, **kwargs):
ourSuper = super(CloudScraper, self)
resp = ourSuper.request(method, url, *args, **kwargs)
if resp.headers.get('Content-Encoding') == 'br':
if self.allow_brotli and resp._content:
resp._content = brotli.decompress(resp.content)
else:
logging.warning('Brotli content detected, But option is disabled, we will not continue.')
return resp
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
if self.isChallengeRequest(resp):
if resp.request.method != 'GET':
# Work around if the initial request is not a GET,
# Supersede with a GET then re-request the original METHOD.
self.request('GET', resp.url)
resp = ourSuper.request(method, url, *args, **kwargs)
else:
# Solve Challenge
resp = self.sendChallengeResponse(resp, **kwargs)
return resp
##########################################################################################################################################################
@staticmethod
def isChallengeRequest(resp):
if resp.headers.get('Server', '').startswith('cloudflare'):
if b'why_captcha' in resp.content or b'/cdn-cgi/l/chk_captcha' in resp.content:
raise ValueError('Captcha')
return (
resp.status_code in [429, 503]
and all(s in resp.content for s in [b'jschl_vc', b'jschl_answer'])
)
return False
##########################################################################################################################################################
def sendChallengeResponse(self, resp, **original_kwargs):
body = resp.text
# Cloudflare requires a delay before solving the challenge
if not self.delay:
try:
delay = float(re.search(r'submit\(\);\r?\n\s*},\s*([0-9]+)', body).group(1)) / float(1000)
if isinstance(delay, (int, float)):
self.delay = delay
except: # noqa
pass
sleep(self.delay)
parsed_url = urlparse(resp.url)
domain = parsed_url.netloc
submit_url = '{}://{}/cdn-cgi/l/chk_jschl'.format(parsed_url.scheme, domain)
cloudflare_kwargs = deepcopy(original_kwargs)
try:
params = OrderedDict()
s = re.search(r'name="s"\svalue="(?P<s_value>[^"]+)', body)
if s:
params['s'] = s.group('s_value')
params.update(
[
('jschl_vc', re.search(r'name="jschl_vc" value="(\w+)"', body).group(1)),
('pass', re.search(r'name="pass" value="(.+?)"', body).group(1))
]
)
params = cloudflare_kwargs.setdefault('params', params)
except Exception as e:
raise ValueError('Unable to parse Cloudflare anti-bots page: {} {}'.format(e.message, BUG_REPORT))
# Solve the Javascript challenge
params['jschl_answer'] = JavaScriptInterpreter.dynamicImport(self.interpreter).solveChallenge(body, domain)
# Requests transforms any request into a GET after a redirect,
# so the redirect has to be handled manually here to allow for
# performing other types of requests even as the first request.
cloudflare_kwargs['allow_redirects'] = False
redirect = self.request(resp.request.method, submit_url, **cloudflare_kwargs)
redirect_location = urlparse(redirect.headers['Location'])
if not redirect_location.netloc:
redirect_url = urlunparse(
(
parsed_url.scheme,
domain,
redirect_location.path,
redirect_location.params,
redirect_location.query,
redirect_location.fragment
)
)
return self.request(resp.request.method, redirect_url, **original_kwargs)
return self.request(resp.request.method, redirect.headers['Location'], **original_kwargs)
##########################################################################################################################################################
@classmethod
def create_scraper(cls, sess=None, **kwargs):
"""
Convenience function for creating a ready-to-go CloudScraper object.
"""
scraper = cls(**kwargs)
if sess:
attrs = ['auth', 'cert', 'cookies', 'headers', 'hooks', 'params', 'proxies', 'data']
for attr in attrs:
val = getattr(sess, attr, None)
if val:
setattr(scraper, attr, val)
return scraper
##########################################################################################################################################################
# Functions for integrating cloudscraper with other applications and scripts
@classmethod
def get_tokens(cls, url, **kwargs):
scraper = cls.create_scraper(
debug=kwargs.pop('debug', False),
delay=kwargs.pop('delay', None),
interpreter=kwargs.pop('interpreter', 'js2py'),
allow_brotli=kwargs.pop('allow_brotli', True),
)
try:
resp = scraper.get(url, **kwargs)
resp.raise_for_status()
except Exception:
logging.error('"{}" returned an error. Could not collect tokens.'.format(url))
raise
domain = urlparse(resp.url).netloc
# noinspection PyUnusedLocal
cookie_domain = None
for d in scraper.cookies.list_domains():
if d.startswith('.') and d in ('.{}'.format(domain)):
cookie_domain = d
break
else:
raise ValueError('Unable to find Cloudflare cookies. Does the site actually have Cloudflare IUAM ("I\'m Under Attack Mode") enabled?')
return (
{
'__cfduid': scraper.cookies.get('__cfduid', '', domain=cookie_domain),
'cf_clearance': scraper.cookies.get('cf_clearance', '', domain=cookie_domain)
},
scraper.headers['User-Agent']
)
##########################################################################################################################################################
@classmethod
def get_cookie_string(cls, url, **kwargs):
"""
Convenience function for building a Cookie HTTP header value.
"""
tokens, user_agent = cls.get_tokens(url, **kwargs)
return '; '.join('='.join(pair) for pair in tokens.items()), user_agent
##########################################################################################################################################################
create_scraper = CloudScraper.create_scraper
get_tokens = CloudScraper.get_tokens
get_cookie_string = CloudScraper.get_cookie_string
@@ -0,0 +1,89 @@
import re
import sys
import logging
import abc
if sys.version_info >= (3, 4):
ABC = abc.ABC # noqa
else:
ABC = abc.ABCMeta('ABC', (), {})
##########################################################################################################################################################
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
interpreters = {}
class JavaScriptInterpreter(ABC):
@abc.abstractmethod
def __init__(self, name):
interpreters[name] = self
@classmethod
def dynamicImport(cls, name):
if name not in interpreters:
try:
__import__('{}.{}'.format(cls.__module__, name))
if not isinstance(interpreters.get(name), JavaScriptInterpreter):
raise ImportError('The interpreter was not initialized.')
except ImportError:
logging.error('Unable to load {} interpreter'.format(name))
raise
return interpreters[name]
@abc.abstractmethod
def eval(self, jsEnv, js):
pass
def solveChallenge(self, body, domain):
try:
js = re.search(
r'setTimeout\(function\(\){\s+(var s,t,o,p,b,r,e,a,k,i,n,g,f.+?\r?\n[\s\S]+?a\.value =.+?)\r?\n',
body
).group(1)
except Exception:
raise ValueError('Unable to identify Cloudflare IUAM Javascript on website. {}'.format(BUG_REPORT))
js = re.sub(r'\s{2,}', ' ', js, flags=re.MULTILINE | re.DOTALL).replace('\'; 121\'', '')
js += '\na.value;'
jsEnv = '''
String.prototype.italics=function(str) {{return "<i>" + this + "</i>";}};
var document = {{
createElement: function () {{
return {{ firstChild: {{ href: "https://{domain}/" }} }}
}},
getElementById: function () {{
return {{"innerHTML": "{innerHTML}"}};
}}
}};
'''
try:
innerHTML = re.search(
r'<div(?: [^<>]*)? id="([^<>]*?)">([^<>]*?)</div>',
body,
re.MULTILINE | re.DOTALL
)
innerHTML = innerHTML.group(2) if innerHTML else ''
except: # noqa
logging.error('Error extracting Cloudflare IUAM Javascript. {}'.format(BUG_REPORT))
raise
try:
result = self.eval(
re.sub(r'\s{2,}', ' ', jsEnv.format(domain=domain, innerHTML=innerHTML), flags=re.MULTILINE | re.DOTALL),
js
)
float(result)
except Exception:
logging.error('Error executing Cloudflare IUAM Javascript. {}'.format(BUG_REPORT))
raise
return result
@@ -0,0 +1,32 @@
from __future__ import absolute_import
import js2py
import logging
import base64
from . import JavaScriptInterpreter
from .jsunfuck import jsunfuck
class ChallengeInterpreter(JavaScriptInterpreter):
def __init__(self):
super(ChallengeInterpreter, self).__init__('js2py')
def eval(self, jsEnv, js):
if js2py.eval_js('(+(+!+[]+[+!+[]]+(!![]+[])[!+[]+!+[]+!+[]]+[!+[]+!+[]]+[+[]])+[])[+!+[]]') == '1':
logging.warning('WARNING - Please upgrade your js2py https://github.com/PiotrDabkowski/Js2Py, applying work around for the meantime.')
js = jsunfuck(js)
def atob(s):
return base64.b64decode('{}'.format(s)).decode('utf-8')
js2py.disable_pyimport()
context = js2py.EvalJs({'atob': atob})
result = context.eval('{}{}'.format(jsEnv, js))
return result
ChallengeInterpreter()
@@ -80,18 +80,18 @@ CONSTRUCTORS = {
'RegExp': 'Function("return/"+false+"/")()'
}
def jsunfuck(jsfuckString):
for key in sorted(MAPPING, key=lambda k: len(MAPPING[k]), reverse=True):
if MAPPING.get(key) in jsfuckString:
jsfuckString = jsfuckString.replace(MAPPING.get(key), '"{}"'.format(key))
for key in sorted(SIMPLE, key=lambda k: len(SIMPLE[k]), reverse=True):
if SIMPLE.get(key) in jsfuckString:
jsfuckString = jsfuckString.replace(SIMPLE.get(key), '{}'.format(key))
#for key in sorted(CONSTRUCTORS, key=lambda k: len(CONSTRUCTORS[k]), reverse=True):
# for key in sorted(CONSTRUCTORS, key=lambda k: len(CONSTRUCTORS[k]), reverse=True):
# if CONSTRUCTORS.get(key) in jsfuckString:
# jsfuckString = jsfuckString.replace(CONSTRUCTORS.get(key), '{}'.format(key))
return jsfuckString
return jsfuckString
@@ -0,0 +1,46 @@
import base64
import logging
import subprocess
from . import JavaScriptInterpreter
##########################################################################################################################################################
BUG_REPORT = 'Cloudflare may have changed their technique, or there may be a bug in the script.'
##########################################################################################################################################################
class ChallengeInterpreter(JavaScriptInterpreter):
def __init__(self):
super(ChallengeInterpreter, self).__init__('nodejs')
def eval(self, jsEnv, js):
try:
js = 'var atob = function(str) {return Buffer.from(str, "base64").toString("binary");};' \
'var challenge = atob("%s");' \
'var context = {atob: atob};' \
'var options = {filename: "iuam-challenge.js", timeout: 4000};' \
'var answer = require("vm").runInNewContext(challenge, context, options);' \
'process.stdout.write(String(answer));' \
% base64.b64encode('{}{}'.format(jsEnv, js).encode('UTF-8')).decode('ascii')
return subprocess.check_output(['node', '-e', js])
except OSError as e:
if e.errno == 2:
raise EnvironmentError(
'Missing Node.js runtime. Node is required and must be in the PATH (check with `node -v`). Your Node binary may be called `nodejs` rather than `node`, '
'in which case you may need to run `apt-get install nodejs-legacy` on some Debian-based systems. (Please read the cloudscraper'
' README\'s Dependencies section: https://github.com/VeNoMouS/cloudscraper#dependencies.'
)
raise
except Exception:
logging.error('Error executing Cloudflare IUAM Javascript. %s' % BUG_REPORT)
raise
pass
ChallengeInterpreter()
@@ -0,0 +1,40 @@
import os
import json
import random
import logging
from collections import OrderedDict
##########################################################################################################################################################
class User_Agent():
##########################################################################################################################################################
def __init__(self, *args, **kwargs):
self.headers = None
self.loadUserAgent(*args, **kwargs)
##########################################################################################################################################################
def loadUserAgent(self, *args, **kwargs):
browser = kwargs.pop('browser', 'chrome')
user_agents = json.load(
open(os.path.join(os.path.dirname(__file__), 'browsers.json'), 'r'),
object_pairs_hook=OrderedDict
)
if not user_agents.get(browser):
logging.error('Sorry "{}" browser User-Agent was not found.'.format(browser))
raise
user_agent = random.choice(user_agents.get(browser))
self.headers = user_agent.get('headers')
self.headers['User-Agent'] = random.choice(user_agent.get('User-Agent'))
if not kwargs.get('allow_brotli', False):
if 'br' in self.headers['Accept-Encoding']:
self.headers['Accept-Encoding'] = ','.join([encoding for encoding in self.headers['Accept-Encoding'].split(',') if encoding.strip() != 'br']).strip()
File diff suppressed because it is too large Load Diff
+3 -10
View File
@@ -5,6 +5,7 @@ import re
from .translators.friendly_nodes import REGEXP_CONVERTER
from .utils.injector import fix_js_args
from types import FunctionType, ModuleType, GeneratorType, BuiltinFunctionType, MethodType, BuiltinMethodType
from math import floor, log10
import traceback
try:
import numpy
@@ -603,15 +604,7 @@ class PyJs(object):
elif typ == 'Boolean':
return Js('true') if self.value else Js('false')
elif typ == 'Number': #or self.Class=='Number':
if self.is_nan():
return Js('NaN')
elif self.is_infinity():
sign = '-' if self.value < 0 else ''
return Js(sign + 'Infinity')
elif isinstance(self.value,
long) or self.value.is_integer(): # dont print .0
return Js(unicode(int(self.value)))
return Js(unicode(self.value)) # accurate enough
return Js(unicode(js_dtoa(self.value)))
elif typ == 'String':
return self
else: #object
@@ -1046,7 +1039,7 @@ def PyJsComma(a, b):
return b
from .internals.simplex import JsException as PyJsException
from .internals.simplex import JsException as PyJsException, js_dtoa
import pyjsparser
pyjsparser.parser.ENABLE_JS2PY_ERRORS = lambda msg: MakeError('SyntaxError', msg)
@@ -116,10 +116,12 @@ def eval_js(js):
def eval_js6(js):
"""Just like eval_js but with experimental support for js6 via babel."""
return eval_js(js6_to_js5(js))
def translate_js6(js):
"""Just like translate_js but with experimental support for js6 via babel."""
return translate_js(js6_to_js5(js))
@@ -3,15 +3,19 @@ import re
import datetime
from desc import *
from simplex import *
from conversions import *
import six
from pyjsparser import PyJsParser
from itertools import izip
from .desc import *
from .simplex import *
from .conversions import *
from pyjsparser import PyJsParser
import six
if six.PY2:
from itertools import izip
else:
izip = zip
from conversions import *
from simplex import *
def Type(obj):
@@ -1,8 +1,8 @@
from code import Code
from simplex import MakeError
from opcodes import *
from operations import *
from trans_utils import *
from .code import Code
from .simplex import MakeError
from .opcodes import *
from .operations import *
from .trans_utils import *
SPECIAL_IDENTIFIERS = {'true', 'false', 'this'}
@@ -465,10 +465,11 @@ class ByteCodeGenerator:
self.emit('LOAD_OBJECT', tuple(data))
def Program(self, body, **kwargs):
old_tape_len = len(self.exe.tape)
self.emit('LOAD_UNDEFINED')
self.emit(body)
# add function tape !
self.exe.tape = self.function_declaration_tape + self.exe.tape
self.exe.tape = self.exe.tape[:old_tape_len] + self.function_declaration_tape + self.exe.tape[old_tape_len:]
def Pyimport(self, imp, **kwargs):
raise NotImplementedError(
@@ -735,17 +736,17 @@ def main():
#
# }
a.emit(d)
print a.declared_vars
print a.exe.tape
print len(a.exe.tape)
print(a.declared_vars)
print(a.exe.tape)
print(len(a.exe.tape))
a.exe.compile()
def log(this, args):
print args[0]
print(args[0])
return 999
print a.exe.run(a.exe.space.GlobalObj)
print(a.exe.run(a.exe.space.GlobalObj))
if __name__ == '__main__':
@@ -1,16 +1,17 @@
from opcodes import *
from space import *
from base import *
from .opcodes import *
from .space import *
from .base import *
class Code:
'''Can generate, store and run sequence of ops representing js code'''
def __init__(self, is_strict=False):
def __init__(self, is_strict=False, debug_mode=False):
self.tape = []
self.compiled = False
self.label_locs = None
self.is_strict = is_strict
self.debug_mode = debug_mode
self.contexts = []
self.current_ctx = None
@@ -22,6 +23,10 @@ class Code:
self.GLOBAL_THIS = None
self.space = None
# dbg
self.ctx_depth = 0
def get_new_label(self):
self._label_count += 1
return self._label_count
@@ -74,21 +79,35 @@ class Code:
# 0=normal, 1=return, 2=jump_outside, 3=errors
# execute_fragment_under_context returns:
# (return_value, typ, return_value/jump_loc/py_error)
# ctx.stack must be len 1 and its always empty after the call.
# IMPARTANT: It is guaranteed that the length of the ctx.stack is unchanged.
'''
old_curr_ctx = self.current_ctx
self.ctx_depth += 1
old_stack_len = len(ctx.stack)
old_ret_len = len(self.return_locs)
old_ctx_len = len(self.contexts)
try:
self.current_ctx = ctx
return self._execute_fragment_under_context(
ctx, start_label, end_label)
except JsException as err:
# undo the things that were put on the stack (if any)
# don't worry, I know the recovery is possible through try statement and for this reason try statement
# has its own context and stack so it will not delete the contents of the outer stack
del ctx.stack[:]
if self.debug_mode:
self._on_fragment_exit("js errors")
# undo the things that were put on the stack (if any) to ensure a proper error recovery
del ctx.stack[old_stack_len:]
del self.return_locs[old_ret_len:]
del self.contexts[old_ctx_len :]
return undefined, 3, err
finally:
self.ctx_depth -= 1
self.current_ctx = old_curr_ctx
assert old_stack_len == len(ctx.stack)
def _get_dbg_indent(self):
return self.ctx_depth * ' '
def _on_fragment_exit(self, mode):
print(self._get_dbg_indent() + 'ctx exit (%s)' % mode)
def _execute_fragment_under_context(self, ctx, start_label, end_label):
start, end = self.label_locs[start_label], self.label_locs[end_label]
@@ -97,16 +116,20 @@ class Code:
entry_level = len(self.contexts)
# for e in self.tape[start:end]:
# print e
if self.debug_mode:
print(self._get_dbg_indent() + 'ctx entry (from:%d, to:%d)' % (start, end))
while loc < len(self.tape):
#print loc, self.tape[loc]
if len(self.contexts) == entry_level and loc >= end:
if self.debug_mode:
self._on_fragment_exit('normal')
assert loc == end
assert len(ctx.stack) == (
1 + initial_len), 'Stack change must be equal to +1!'
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return ctx.stack.pop(), 0, None # means normal return
# execute instruction
if self.debug_mode:
print(self._get_dbg_indent() + str(loc), self.tape[loc])
status = self.tape[loc].eval(ctx)
# check status for special actions
@@ -116,9 +139,10 @@ class Code:
if len(self.contexts) == entry_level:
# check if jumped outside of the fragment and break if so
if not start <= loc < end:
assert len(ctx.stack) == (
1 + initial_len
), 'Stack change must be equal to +1!'
if self.debug_mode:
self._on_fragment_exit('jump outside loc:%d label:%d' % (loc, status))
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return ctx.stack.pop(), 2, status # jump outside
continue
@@ -137,7 +161,10 @@ class Code:
# return: (None, None)
else:
if len(self.contexts) == entry_level:
assert len(ctx.stack) == 1 + initial_len
if self.debug_mode:
self._on_fragment_exit('return')
delta_stack = len(ctx.stack) - initial_len
assert delta_stack == +1, 'Stack change must be equal to +1! got %d' % delta_stack
return undefined, 1, ctx.stack.pop(
) # return signal
return_value = ctx.stack.pop()
@@ -149,6 +176,8 @@ class Code:
continue
# next instruction
loc += 1
if self.debug_mode:
self._on_fragment_exit('internal error - unexpected end of tape, will crash')
assert False, 'Remember to add NOP at the end!'
def run(self, ctx, starting_loc=0):
@@ -156,7 +185,8 @@ class Code:
self.current_ctx = ctx
while loc < len(self.tape):
# execute instruction
#print loc, self.tape[loc]
if self.debug_mode:
print(loc, self.tape[loc])
status = self.tape[loc].eval(ctx)
# check status for special actions
@@ -42,6 +42,7 @@ def executable_code(code_str, space, global_context=True):
space.byte_generator.emit('LABEL', skip)
space.byte_generator.emit('NOP')
space.byte_generator.restore_state()
space.byte_generator.exe.compile(
start_loc=old_tape_len
) # dont read the code from the beginning, dont be stupid!
@@ -71,5 +72,5 @@ def _eval(this, args):
def log(this, args):
print ' '.join(map(to_string, args))
print(' '.join(map(to_string, args)))
return undefined
@@ -1,6 +1,6 @@
from __future__ import unicode_literals
# Type Conversions. to_type. All must return PyJs subclass instance
from simplex import *
from .simplex import *
def to_primitive(self, hint=None):
@@ -73,14 +73,7 @@ def to_string(self):
elif typ == 'Boolean':
return 'true' if self else 'false'
elif typ == 'Number': # or self.Class=='Number':
if is_nan(self):
return 'NaN'
elif is_infinity(self):
sign = '-' if self < 0 else ''
return sign + 'Infinity'
elif int(self) == self: # integer value!
return unicode(int(self))
return unicode(self) # todo make it print exactly like node.js
return js_dtoa(self)
else: # object
return to_string(to_primitive(self, 'String'))
@@ -1,29 +1,22 @@
from __future__ import unicode_literals
from base import Scope
from func_utils import *
from conversions import *
from .base import Scope
from .func_utils import *
from .conversions import *
import six
from prototypes.jsboolean import BooleanPrototype
from prototypes.jserror import ErrorPrototype
from prototypes.jsfunction import FunctionPrototype
from prototypes.jsnumber import NumberPrototype
from prototypes.jsobject import ObjectPrototype
from prototypes.jsregexp import RegExpPrototype
from prototypes.jsstring import StringPrototype
from prototypes.jsarray import ArrayPrototype
import prototypes.jsjson as jsjson
import prototypes.jsutils as jsutils
from .prototypes.jsboolean import BooleanPrototype
from .prototypes.jserror import ErrorPrototype
from .prototypes.jsfunction import FunctionPrototype
from .prototypes.jsnumber import NumberPrototype
from .prototypes.jsobject import ObjectPrototype
from .prototypes.jsregexp import RegExpPrototype
from .prototypes.jsstring import StringPrototype
from .prototypes.jsarray import ArrayPrototype
from .prototypes import jsjson
from .prototypes import jsutils
from .constructors import jsnumber, jsstring, jsarray, jsboolean, jsregexp, jsmath, jsobject, jsfunction, jsconsole
from constructors import jsnumber
from constructors import jsstring
from constructors import jsarray
from constructors import jsboolean
from constructors import jsregexp
from constructors import jsmath
from constructors import jsobject
from constructors import jsfunction
from constructors import jsconsole
def fill_proto(proto, proto_class, space):
@@ -155,7 +148,10 @@ def fill_space(space, byte_generator):
j = easy_func(creator, space)
j.name = unicode(typ)
j.prototype = space.ERROR_TYPES[typ]
set_protected(j, 'prototype', space.ERROR_TYPES[typ])
set_non_enumerable(space.ERROR_TYPES[typ], 'constructor', j)
def new_create(args, space):
message = get_arg(args, 0)
@@ -178,6 +174,7 @@ def fill_space(space, byte_generator):
setattr(space, err_type_name + u'Prototype', extra_err)
error_constructors[err_type_name] = construct_constructor(
err_type_name)
assert space.TypeErrorPrototype is not None
# RegExp
@@ -1,5 +1,5 @@
from simplex import *
from conversions import *
from .simplex import *
from .conversions import *
import six
if six.PY3:
@@ -1,5 +1,5 @@
from operations import *
from base import get_member, get_member_dot, PyJsFunction, Scope
from .operations import *
from .base import get_member, get_member_dot, PyJsFunction, Scope
class OP_CODE(object):
@@ -1,6 +1,6 @@
from __future__ import unicode_literals
from simplex import *
from conversions import *
from .simplex import *
from .conversions import *
# ------------------------------------------------------------------------------
# Unary operations
@@ -4,7 +4,7 @@ from __future__ import unicode_literals
import re
from ..conversions import *
from ..func_utils import *
from jsregexp import RegExpExec
from .jsregexp import RegExpExec
DIGS = set(u'0123456789')
WHITE = u"\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u2028\u2029\u202F\u205F\u3000\uFEFF"
@@ -1,11 +1,9 @@
import pyjsparser
from space import Space
import fill_space
from byte_trans import ByteCodeGenerator
from code import Code
from simplex import MakeError
import sys
sys.setrecursionlimit(100000)
from .space import Space
from . import fill_space
from .byte_trans import ByteCodeGenerator
from .code import Code
from .simplex import *
pyjsparser.parser.ENABLE_JS2PY_ERRORS = lambda msg: MakeError(u'SyntaxError', unicode(msg))
@@ -16,8 +14,8 @@ def get_js_bytecode(js):
a.emit(d)
return a.exe.tape
def eval_js_vm(js):
a = ByteCodeGenerator(Code())
def eval_js_vm(js, debug=False):
a = ByteCodeGenerator(Code(debug_mode=debug))
s = Space()
a.exe.space = s
s.exe = a.exe
@@ -26,7 +24,10 @@ def eval_js_vm(js):
a.emit(d)
fill_space.fill_space(s, a)
# print a.exe.tape
if debug:
from pprint import pprint
pprint(a.exe.tape)
print()
a.exe.compile()
return a.exe.run(a.exe.space.GlobalObj)
@@ -1,6 +1,10 @@
from __future__ import unicode_literals
import six
if six.PY3:
basestring = str
long = int
xrange = range
unicode = str
#Undefined
class PyJsUndefined(object):
@@ -75,7 +79,7 @@ def is_callable(self):
def is_infinity(self):
return self == float('inf') or self == -float('inf')
return self == Infinity or self == -Infinity
def is_nan(self):
@@ -114,7 +118,7 @@ class JsException(Exception):
return self.mes.to_string().value
else:
if self.throw is not None:
from conversions import to_string
from .conversions import to_string
return to_string(self.throw)
else:
return self.typ + ': ' + self.message
@@ -131,3 +135,26 @@ def value_from_js_exception(js_exception, space):
return js_exception.throw
else:
return space.NewError(js_exception.typ, js_exception.message)
def js_dtoa(number):
if is_nan(number):
return u'NaN'
elif is_infinity(number):
if number > 0:
return u'Infinity'
return u'-Infinity'
elif number == 0.:
return u'0'
elif abs(number) < 1e-6 or abs(number) >= 1e21:
frac, exponent = unicode(repr(float(number))).split('e')
# Remove leading zeros from the exponent.
exponent = int(exponent)
return frac + ('e' if exponent < 0 else 'e+') + unicode(exponent)
elif abs(number) < 1e-4: # python starts to return exp notation while we still want the prec
frac, exponent = unicode(repr(float(number))).split('e-')
base = u'0.' + u'0' * (int(exponent) - 1) + frac.lstrip('-').replace('.', '')
return base if number > 0. else u'-' + base
elif isinstance(number, long) or number.is_integer(): # dont print .0
return unicode(int(number))
return unicode(repr(number)) # python representation should be equivalent.
@@ -1,5 +1,5 @@
from base import *
from simplex import *
from .base import *
from .simplex import *
class Space(object):
@@ -1,3 +1,10 @@
import six
if six.PY3:
basestring = str
long = int
xrange = range
unicode = str
def to_key(literal_or_identifier):
''' returns string representation of this object'''
if literal_or_identifier['type'] == 'Identifier':
@@ -6,8 +6,6 @@ if six.PY3:
xrange = range
unicode = str
# todo fix apply and bind
class FunctionPrototype:
def toString():
@@ -41,6 +39,7 @@ class FunctionPrototype:
return this.call(obj, args)
def bind(thisArg):
arguments_ = arguments
target = this
if not target.is_callable():
raise this.MakeError(
@@ -48,5 +47,5 @@ class FunctionPrototype:
if len(arguments) <= 1:
args = ()
else:
args = tuple([arguments[e] for e in xrange(1, len(arguments))])
args = tuple([arguments_[e] for e in xrange(1, len(arguments_))])
return this.PyJsBoundFunction(target, thisArg, args)
@@ -345,7 +345,7 @@ def BlockStatement(type, body):
body) # never returns empty string! In the worst case returns pass\n
def ExpressionStatement(type, expression, **ommit):
def ExpressionStatement(type, expression):
return trans(expression) + '\n' # end expression space with new line
+10
View File
@@ -163,3 +163,13 @@ class Pysubs2CLI(object):
elif args.transform_framerate is not None:
in_fps, out_fps = args.transform_framerate
subs.transform_framerate(in_fps, out_fps)
def __main__():
cli = Pysubs2CLI()
rv = cli(sys.argv[1:])
sys.exit(rv)
if __name__ == "__main__":
__main__()
+3 -1
View File
@@ -17,12 +17,14 @@ class Color(_Color):
return _Color.__new__(cls, r, g, b, a)
#: Version of the pysubs2 library.
VERSION = "0.2.1"
VERSION = "0.2.3"
PY3 = sys.version_info.major == 3
if PY3:
text_type = str
binary_string_type = bytes
else:
text_type = unicode
binary_string_type = str
+1 -3
View File
@@ -3,7 +3,7 @@ from .microdvd import MicroDVDFormat
from .subrip import SubripFormat
from .jsonformat import JSONFormat
from .substation import SubstationFormat
from .txt_generic import TXTGenericFormat, MPL2Format
from .mpl2 import MPL2Format
from .exceptions import *
#: Dict mapping file extensions to format identifiers.
@@ -13,7 +13,6 @@ FILE_EXTENSION_TO_FORMAT_IDENTIFIER = {
".ssa": "ssa",
".sub": "microdvd",
".json": "json",
".txt": "txt_generic",
}
#: Dict mapping format identifiers to implementations (FormatBase subclasses).
@@ -23,7 +22,6 @@ FORMAT_IDENTIFIER_TO_FORMAT_CLASS = {
"ssa": SubstationFormat,
"microdvd": MicroDVDFormat,
"json": JSONFormat,
"txt_generic": TXTGenericFormat,
"mpl2": MPL2Format,
}
@@ -2,44 +2,48 @@
from __future__ import print_function, division, unicode_literals
import re
from numbers import Number
from pysubs2.time import times_to_ms
from .time import times_to_ms
from .formatbase import FormatBase
from .ssaevent import SSAEvent
from .ssastyle import SSAStyle
# thanks to http://otsaloma.io/gaupol/doc/api/aeidon.files.mpl2_source.html
MPL2_FORMAT = re.compile(r"^(?um)\[(-?\d+)\]\[(-?\d+)\](.*?)$")
class TXTGenericFormat(FormatBase):
@classmethod
def guess_format(cls, text):
if MPL2_FORMAT.match(text):
return "mpl2"
MPL2_FORMAT = re.compile(r"^(?um)\[(-?\d+)\]\[(-?\d+)\](.*)")
class MPL2Format(FormatBase):
@classmethod
def guess_format(cls, text):
return TXTGenericFormat.guess_format(text)
if MPL2_FORMAT.search(text):
return "mpl2"
@classmethod
def from_file(cls, subs, fp, format_, **kwargs):
def prepare_text(lines):
out = []
for s in lines.split("|"):
s = s.strip()
if s.startswith("/"):
out.append(r"{\i1}%s{\i0}" % s[1:])
continue
# line beginning with '/' is in italics
s = r"{\i1}%s{\i0}" % s[1:].strip()
out.append(s)
return "\n".join(out)
return "\\N".join(out)
subs.events = [SSAEvent(start=times_to_ms(s=float(start) / 10), end=times_to_ms(s=float(end) / 10),
text=prepare_text(text)) for start, end, text in MPL2_FORMAT.findall(fp.getvalue())]
@classmethod
def to_file(cls, subs, fp, format_, **kwargs):
raise NotImplemented
# TODO handle italics
for line in subs:
if line.is_comment:
continue
print("[{start}][{end}] {text}".format(start=int(line.start // 100),
end=int(line.end // 100),
text=line.plaintext.replace("\n", "|")),
file=fp)
@@ -78,7 +78,7 @@ class SSAStyle(object):
s += "%rpx " % self.fontsize
if self.bold: s += "bold "
if self.italic: s += "italic "
s += "'%s'>" % self.fontname
s += "{!r}>".format(self.fontname)
if not PY3: s = s.encode("utf-8")
return s
+9 -1
View File
@@ -46,8 +46,16 @@ class SubripFormat(FormatBase):
following_lines[-1].append(line)
def prepare_text(lines):
# Handle the "happy" empty subtitle case, which is timestamp line followed by blank line(s)
# followed by number line and timestamp line of the next subtitle. Fixes issue #11.
if (len(lines) >= 2
and all(re.match("\s*$", line) for line in lines[:-1])
and re.match("\s*\d+\s*$", lines[-1])):
return ""
# Handle the general case.
s = "".join(lines).strip()
s = re.sub(r"\n* *\d+ *$", "", s) # strip number of next subtitle
s = re.sub(r"\n+ *\d+ *$", "", s) # strip number of next subtitle
s = re.sub(r"< *i *>", r"{\i1}", s)
s = re.sub(r"< */ *i *>", r"{\i0}", s)
s = re.sub(r"< *s *>", r"{\s1}", s)
+14 -13
View File
@@ -4,7 +4,7 @@ from numbers import Number
from .formatbase import FormatBase
from .ssaevent import SSAEvent
from .ssastyle import SSAStyle
from .common import text_type, Color
from .common import text_type, Color, PY3, binary_string_type
from .time import make_time, ms_to_times, timestamp_to_ms, TIMESTAMP
SSA_ALIGNMENT = (1, 2, 3, 9, 10, 11, 5, 6, 7)
@@ -150,14 +150,7 @@ class SubstationFormat(FormatBase):
if format_ == "ass":
return ass_rgba_to_color(v)
else:
try:
return ssa_rgb_to_color(v)
except ValueError:
try:
return ass_rgba_to_color(v)
except:
return Color(255, 255, 255, 0)
return ssa_rgb_to_color(v)
elif f in {"bold", "underline", "italic", "strikeout"}:
return v == "-1"
elif f in {"borderstyle", "encoding", "marginl", "marginr", "marginv", "layer", "alphalevel"}:
@@ -229,7 +222,7 @@ class SubstationFormat(FormatBase):
for k, v in subs.aegisub_project.items():
print(k, v, sep=": ", file=fp)
def field_to_string(f, v):
def field_to_string(f, v, line):
if f in {"start", "end"}:
return ms_to_timestamp(v)
elif f == "marked":
@@ -240,23 +233,31 @@ class SubstationFormat(FormatBase):
return "-1" if v else "0"
elif isinstance(v, (text_type, Number)):
return text_type(v)
elif not PY3 and isinstance(v, binary_string_type):
# A convenience feature, see issue #12 - accept non-unicode strings
# when they are ASCII; this is useful in Python 2, especially for non-text
# fields like style names, where requiring Unicode type seems too stringent
if all(ord(c) < 128 for c in v):
return text_type(v)
else:
raise TypeError("Encountered binary string with non-ASCII codepoint in SubStation field {!r} for line {!r} - please use unicode string instead of str".format(f, line))
elif isinstance(v, Color):
if format_ == "ass":
return color_to_ass_rgba(v)
else:
return color_to_ssa_rgb(v)
else:
raise TypeError("Unexpected type when writing a SubStation field")
raise TypeError("Unexpected type when writing a SubStation field {!r} for line {!r}".format(f, line))
print("\n[V4+ Styles]" if format_ == "ass" else "\n[V4 Styles]", file=fp)
print(STYLE_FORMAT_LINE[format_], file=fp)
for name, sty in subs.styles.items():
fields = [field_to_string(f, getattr(sty, f)) for f in STYLE_FIELDS[format_]]
fields = [field_to_string(f, getattr(sty, f), sty) for f in STYLE_FIELDS[format_]]
print("Style: %s" % name, *fields, sep=",", file=fp)
print("\n[Events]", file=fp)
print(EVENT_FORMAT_LINE[format_], file=fp)
for ev in subs.events:
fields = [field_to_string(f, getattr(ev, f)) for f in EVENT_FIELDS[format_]]
fields = [field_to_string(f, getattr(ev, f), ev) for f in EVENT_FIELDS[format_]]
print(ev.type, end=": ", file=fp)
print(*fields, sep=",", file=fp)
@@ -258,4 +258,4 @@ def fix_line_ending(content):
:rtype: bytes
"""
return content.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
return content.replace(b'\r\n', b'\n')
@@ -10,6 +10,8 @@ import logging
import requests
import xmlrpclib
import dns.resolver
import ipaddress
import re
from requests import exceptions
from urllib3.util import connection
@@ -17,7 +19,13 @@ from retry.api import retry_call
from exceptions import APIThrottled
from dogpile.cache.api import NO_VALUE
from subliminal.cache import region
from cfscrape import CloudflareScraper
from subliminal_patch.pitcher import pitchers
from cloudscraper import CloudScraper
try:
import brotli
except:
pass
try:
from urlparse import urlparse
@@ -55,43 +63,111 @@ class CertifiSession(TimeoutSession):
self.verify = pem_file
class CFSession(CloudflareScraper):
def __init__(self):
super(CFSession, self).__init__()
class NeedsCaptchaException(Exception):
pass
class CFSession(CloudScraper):
def __init__(self, *args, **kwargs):
super(CFSession, self).__init__(*args, **kwargs)
self.debug = os.environ.get("CF_DEBUG", False)
def _request(self, method, url, *args, **kwargs):
ourSuper = super(CloudScraper, self)
resp = ourSuper.request(method, url, *args, **kwargs)
if resp.headers.get('Content-Encoding') == 'br':
if self.allow_brotli and resp._content:
resp._content = brotli.decompress(resp.content)
else:
logging.warning('Brotli content detected, But option is disabled, we will not continue.')
return resp
# Debug request
if self.debug:
self.debugRequest(resp)
# Check if Cloudflare anti-bot is on
try:
if self.isChallengeRequest(resp):
if resp.request.method != 'GET':
# Work around if the initial request is not a GET,
# Supersede with a GET then re-request the original METHOD.
CloudScraper.request(self, 'GET', resp.url)
resp = ourSuper.request(method, url, *args, **kwargs)
else:
# Solve Challenge
resp = self.sendChallengeResponse(resp, **kwargs)
except ValueError, e:
if e.message == "Captcha":
parsed_url = urlparse(url)
domain = parsed_url.netloc
# solve the captcha
site_key = re.search(r'data-sitekey="(.+?)"', resp.content).group(1)
challenge_s = re.search(r'type="hidden" name="s" value="(.+?)"', resp.content).group(1)
challenge_ray = re.search(r'data-ray="(.+?)"', resp.content).group(1)
if not all([site_key, challenge_s, challenge_ray]):
raise Exception("cf: Captcha site-key not found!")
pitcher = pitchers.get_pitcher()("cf: %s" % domain, resp.request.url, site_key,
user_agent=self.headers["User-Agent"],
cookies=self.cookies.get_dict(),
is_invisible=True)
parsed_url = urlparse(resp.url)
logger.info("cf: %s: Solving captcha", domain)
result = pitcher.throw()
if not result:
raise Exception("cf: Couldn't solve captcha!")
submit_url = '{}://{}/cdn-cgi/l/chk_captcha'.format(parsed_url.scheme, domain)
method = resp.request.method
cloudflare_kwargs = {
'allow_redirects': False,
'headers': {'Referer': resp.url},
'params': OrderedDict(
[
('s', challenge_s),
('g-recaptcha-response', result)
]
)
}
return CloudScraper.request(self, method, submit_url, **cloudflare_kwargs)
return resp
def request(self, method, url, *args, **kwargs):
parsed_url = urlparse(url)
domain = parsed_url.netloc
cache_key = "cf_data2_%s" % domain
cache_key = "cf_data3_%s" % domain
if not self.cookies.get("cf_clearance", "", domain=domain):
cf_data = region.get(cache_key)
if cf_data is not NO_VALUE:
cf_cookies, user_agent, hdrs = cf_data
cf_cookies, hdrs = cf_data
logger.debug("Trying to use old cf data for %s: %s", domain, cf_data)
for cookie, value in cf_cookies.iteritems():
self.cookies.set(cookie, value, domain=domain)
self._hdrs = hdrs
self._ua = user_agent
self.headers['User-Agent'] = self._ua
self.headers = hdrs
ret = super(CFSession, self).request(method, url, *args, **kwargs)
ret = self._request(method, url, *args, **kwargs)
if self._was_cf:
self._was_cf = False
logger.debug("We've hit CF, trying to store previous data")
try:
cf_data = self.get_cf_live_tokens(domain)
except:
logger.debug("Couldn't get CF live tokens for re-use. Cookies: %r", self.cookies)
pass
else:
if cf_data != region.get(cache_key) and cf_data[0]["cf_clearance"]:
try:
cf_data = self.get_cf_live_tokens(domain)
except:
pass
else:
if cf_data and "cf_clearance" in cf_data[0] and cf_data[0]["cf_clearance"]:
if cf_data != region.get(cache_key):
logger.debug("Storing cf data for %s: %s", domain, cf_data)
region.set(cache_key, cf_data)
elif cf_data[0]["cf_clearance"]:
logger.debug("CF Live tokens not updated")
return ret
@@ -109,7 +185,7 @@ class CFSession(CloudflareScraper):
("__cfduid", self.cookies.get("__cfduid", "", domain=cookie_domain)),
("cf_clearance", self.cookies.get("cf_clearance", "", domain=cookie_domain))
])),
self._ua, self._hdrs
self.headers
)
@@ -240,42 +316,47 @@ def patch_create_connection():
global _custom_resolver, _custom_resolver_ips, dns_cache
host, port = address
__custom_resolver_ips = os.environ.get("dns_resolvers", None)
try:
ipaddress.ip_address(unicode(host))
except (ipaddress.AddressValueError, ValueError):
__custom_resolver_ips = os.environ.get("dns_resolvers", None)
# resolver ips changed in the meantime?
if __custom_resolver_ips != _custom_resolver_ips:
_custom_resolver = None
_custom_resolver_ips = __custom_resolver_ips
dns_cache = {}
# resolver ips changed in the meantime?
if __custom_resolver_ips != _custom_resolver_ips:
_custom_resolver = None
_custom_resolver_ips = __custom_resolver_ips
dns_cache = {}
custom_resolver = _custom_resolver
custom_resolver = _custom_resolver
if not custom_resolver:
if _custom_resolver_ips:
logger.debug("DNS: Trying to use custom DNS resolvers: %s", _custom_resolver_ips)
custom_resolver = dns.resolver.Resolver(configure=False)
custom_resolver.lifetime = 8.0
try:
custom_resolver.nameservers = json.loads(_custom_resolver_ips)
except:
logger.debug("DNS: Couldn't load custom DNS resolvers: %s", _custom_resolver_ips)
if not custom_resolver:
if _custom_resolver_ips:
logger.debug("DNS: Trying to use custom DNS resolvers: %s", _custom_resolver_ips)
custom_resolver = dns.resolver.Resolver(configure=False)
custom_resolver.lifetime = os.environ.get("dns_resolvers_timeout", 8.0)
try:
custom_resolver.nameservers = json.loads(_custom_resolver_ips)
except:
logger.debug("DNS: Couldn't load custom DNS resolvers: %s", _custom_resolver_ips)
else:
_custom_resolver = custom_resolver
if custom_resolver:
if host in dns_cache:
ip = dns_cache[host]
logger.debug("DNS: Using %s=%s from cache", host, ip)
return _orig_create_connection((ip, port), *args, **kwargs)
else:
_custom_resolver = custom_resolver
if custom_resolver:
if host in dns_cache:
ip = dns_cache[host]
logger.debug("DNS: Using %s=%s from cache", host, ip)
return _orig_create_connection((ip, port), *args, **kwargs)
else:
try:
ip = custom_resolver.query(host)[0].address
logger.debug("DNS: Resolved %s to %s using %s", host, ip, custom_resolver.nameservers)
dns_cache[host] = ip
except dns.exception.DNSException:
logger.warning("DNS: Couldn't resolve %s with DNS: %s", host, custom_resolver.nameservers)
raise
try:
ip = custom_resolver.query(host)[0].address
logger.debug("DNS: Resolved %s to %s using %s", host, ip, custom_resolver.nameservers)
dns_cache[host] = ip
return _orig_create_connection((ip, port), *args, **kwargs)
except dns.exception.DNSException:
logger.warning("DNS: Couldn't resolve %s with DNS: %s", host, custom_resolver.nameservers)
raise
logger.debug("DNS: Falling back to default DNS or IP on %s", host)
return _orig_create_connection((host, port), *args, **kwargs)
patch_create_connection._sz_patched = True
@@ -5,16 +5,11 @@ import logging
import os
import time
import inflect
import cfscrape
from random import randint
from zipfile import ZipFile
from babelfish import language_converters
from guessit import guessit
from dogpile.cache.api import NO_VALUE
from subliminal import Episode, ProviderError
from subliminal.cache import region
from subliminal.utils import sanitize_release_group
from subliminal_patch.http import RetryingCFSession
from subliminal_patch.providers import Provider
@@ -215,7 +210,7 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
for series in [video.series] + video.alternative_series:
term = u"%s - %s Season" % (series, p.number_to_words("%sth" % video.season).capitalize())
logger.debug('Searching for alternative results: %s', term)
film = search(term, session=self.session, release=False)
film = search(term, session=self.session, release=False, throttle=self.search_throttle)
if film and film.subtitles:
logger.debug('Alternative results found: %s', len(film.subtitles))
subtitles += self.parse_results(video, film)
@@ -227,7 +222,7 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
term = u"%s S%02i" % (series, video.season)
logger.debug('Searching for packs: %s', term)
time.sleep(self.search_throttle)
film = search(term, session=self.session)
film = search(term, session=self.session, throttle=self.search_throttle)
if film and film.subtitles:
logger.debug('Pack results found: %s', len(film.subtitles))
subtitles += self.parse_results(video, film)
@@ -241,7 +236,8 @@ class SubsceneProvider(Provider, ProviderSubtitleArchiveMixin):
more_than_one = len([video.title] + video.alternative_titles) > 1
for title in [video.title] + video.alternative_titles:
logger.debug('Searching for movie results: %s', title)
film = search(title, year=video.year, session=self.session, limit_to=None, release=False)
film = search(title, year=video.year, session=self.session, limit_to=None, release=False,
throttle=self.search_throttle)
if film and film.subtitles:
subtitles += self.parse_results(video, film)
if more_than_one:
@@ -225,7 +225,8 @@ class TitloviProvider(Provider, ProviderSubtitleArchiveMixin):
# page link
page_link = self.server_url + sub.a.attrs['href']
# subtitle language
match = lang_re.search(sub.select_one('.lang').attrs['src'])
_lang = sub.select_one('.lang')
match = lang_re.search(_lang.attrs.get('src', _lang.attrs.get('cfsrc', '')))
if match:
try:
# decode language
@@ -117,13 +117,14 @@ class Subtitle(Subtitle_):
logger.info('Guessing encoding for language %s', self.language)
encodings = ['utf-8']
encodings = ['utf-8', 'utf-16']
# add language-specific encodings
# http://scratchpad.wikia.com/wiki/Character_Encoding_Recommendation_for_Languages
if self.language.alpha3 == 'zho':
encodings.extend(['cp936', 'gb2312', 'cp950', 'gb18030', 'big5', 'big5hkscs'])
encodings.extend(['cp936', 'gb2312', 'gbk', 'gb18030', 'hz', 'iso2022_jp_2', 'cp950', 'gb18030', 'big5',
'big5hkscs'])
elif self.language.alpha3 == 'jpn':
encodings.extend(['shift-jis', 'cp932', 'euc_jp', 'iso2022_jp', 'iso2022_jp_1', 'iso2022_jp_2',
'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext', ])
@@ -28,6 +28,8 @@ import re
import enum
import sys
import requests
import time
is_PY2 = sys.version_info[0] < 3
if is_PY2:
@@ -55,7 +57,9 @@ def soup_for(url, session=None, user_agent=DEFAULT_USER_AGENT):
r = Request(url, data=None, headers=dict(HEADERS, **{"User-Agent": user_agent}))
html = urlopen(r).read().decode("utf-8")
else:
html = session.get(url).text
ret = session.get(url)
ret.raise_for_status()
html = ret.text
return BeautifulSoup(html, "html.parser")
@@ -243,17 +247,34 @@ def get_first_film(soup, section, year=None, session=None):
return Film.from_url(url, session=session)
def search(term, release=True, session=None, year=None, limit_to=SearchTypes.Exact):
soup = soup_for("%s/subtitles/%s?q=%s" % (SITE_DOMAIN, "release" if release else "title", term), session=session)
def search(term, release=True, session=None, year=None, limit_to=SearchTypes.Exact, throttle=0):
# note to subscene: if you actually start to randomize the endpoint, we'll have to query your server even more
endpoints = ["searching", "search", "srch", "find"]
if release:
endpoints = ["release"]
if "Subtitle search by" in str(soup):
rows = soup.find("table").tbody.find_all("tr")
subtitles = Subtitle.from_rows(rows)
return Film(term, subtitles=subtitles)
soup = None
for endpoint in endpoints:
try:
soup = soup_for("%s/subtitles/%s?q=%s" % (SITE_DOMAIN, endpoint, term),
session=session)
except requests.HTTPError, e:
if e.response.status_code == 404:
time.sleep(throttle)
# fixme: detect endpoint from html
continue
raise
break
for junk, search_type in SearchTypes.__members__.items():
if section_exists(soup, search_type):
return get_first_film(soup, search_type, year=year, session=session)
if soup:
if "Subtitle search by" in str(soup):
rows = soup.find("table").tbody.find_all("tr")
subtitles = Subtitle.from_rows(rows)
return Film(term, subtitles=subtitles)
if limit_to == search_type:
return
for junk, search_type in SearchTypes.__members__.items():
if section_exists(soup, search_type):
return get_first_film(soup, search_type, year=year, session=session)
if limit_to == search_type:
return
@@ -2,7 +2,8 @@
OS_PLEX_USERAGENT = 'plexapp.com v9.0'
DEPENDENCY_MODULE_NAMES = ['subliminal', 'subliminal_patch', 'enzyme', 'guessit', 'subzero', 'libfilebot', 'cfscrape']
DEPENDENCY_MODULE_NAMES = ['subliminal', 'subliminal_patch', 'enzyme', 'guessit', 'subzero', 'libfilebot',
'cloudscraper']
PERSONAL_MEDIA_IDENTIFIER = "com.plexapp.agents.none"
PLUGIN_IDENTIFIER_SHORT = "subzero"
PLUGIN_IDENTIFIER = "com.plexapp.agents.%s" % PLUGIN_IDENTIFIER_SHORT
@@ -293,6 +293,9 @@ class SubtitleModifications(object):
end_tag = line[-5:]
line = line[:-5]
last_procs_mods = []
# fixme: this double loop is ugly
for order, identifier, args in mods:
mod = self.initialized_mods[identifier]
@@ -312,6 +315,33 @@ class SubtitleModifications(object):
break
applied_mods.append(identifier)
if mod.last_processors:
last_procs_mods.append([identifier, args])
if skip_entry:
lines = []
break
if skip_line:
continue
for identifier, args in last_procs_mods:
mod = self.initialized_mods[identifier]
try:
line = mod.modify(line.strip(), entry=entry.text, debug=self.debug, parent=self, index=index,
procs=["last_process"], **args)
except EmptyEntryError:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, entry.text)
skip_entry = True
break
if not line:
if self.debug:
logger.debug(u"%d: %s: %r -> ''", index, identifier, old_line)
skip_line = True
break
if skip_entry:
lines = []
@@ -21,6 +21,7 @@ class SubtitleModification(object):
pre_processors = []
processors = []
post_processors = []
last_processors = []
languages = []
def __init__(self, parent):
@@ -67,15 +68,16 @@ class SubtitleModification(object):
def post_process(self, content, debug=False, parent=None, **kwargs):
return self._process(content, self.post_processors, debug=debug, parent=parent, **kwargs)
def modify(self, content, debug=False, parent=None, **kwargs):
def modify(self, content, debug=False, parent=None, procs=None, **kwargs):
if not content:
return
new_content = content
for method in ("pre_process", "process", "post_process"):
for method in procs or ("pre_process", "process", "post_process"):
if not new_content:
return
new_content = getattr(self, method)(new_content, debug=debug, parent=parent, **kwargs)
new_content = self._process(new_content, getattr(self, "%sors" % method),
debug=debug, parent=parent, **kwargs)
return new_content
@@ -107,3 +109,7 @@ empty_line_post_processors = [
class EmptyEntryError(Exception):
pass
class EmptyLineError(Exception):
pass
@@ -28,7 +28,7 @@ class CommonFixes(SubtitleTextModification):
NReProcessor(re.compile(r'(?u)(\w|\b|\s|^)(-\s?-{1,2})'), ur"\1", name="CM_multidash"),
# line = _/-/\s
NReProcessor(re.compile(r'(?u)(^\W*[-_.:>~]+\W*$)'), "", name="CM_non_word_only"),
NReProcessor(re.compile(r'(?u)(^\W*[-_.:>~]+\W*$)'), "", name="<CM_non_word_only"),
# remove >>
NReProcessor(re.compile(r'(?u)^\s?>>\s*'), "", name="CM_leading_crocodiles"),
@@ -37,7 +37,7 @@ class CommonFixes(SubtitleTextModification):
NReProcessor(re.compile(r'(?u)(^\W*:\s*(?=\w+))'), "", name="CM_empty_colon_start"),
# fix music symbols
NReProcessor(re.compile(ur'(?u)(^[-\s>~]*[*#¶]+\s*)|(\s*[*#¶]+\s*$)'),
NReProcessor(re.compile(ur'(?u)(^[-\s>~]*[*#¶]+\s+)|(\s*[*#¶]+\s*$)'),
lambda x: u"" if x.group(1) else u"",
name="CM_music_symbols"),
@@ -49,11 +49,11 @@ class HearingImpaired(SubtitleTextModification):
NReProcessor(re.compile(ur'(?sux)-?%(t)s[([][^([)\]]+?(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' %
{"t": TAG}), "", name="HI_brackets"),
NReProcessor(re.compile(ur'(?sux)-?%(t)s[([]%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+%(t)s$' % {"t": TAG}),
"", name="HI_bracket_open_start"),
#NReProcessor(re.compile(ur'(?sux)-?%(t)s[([]%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+%(t)s$' % {"t": TAG}),
# "", name="HI_bracket_open_start"),
NReProcessor(re.compile(ur'(?sux)-?%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' % {"t": TAG}), "",
name="HI_bracket_open_end"),
#NReProcessor(re.compile(ur'(?sux)-?%(t)s(?=[A-zÀ-ž"\'.]{3,})[^([)\]]+[)\]][\s:]*%(t)s' % {"t": TAG}), "",
# name="HI_bracket_open_end"),
# text before colon (and possible dash in front), max 11 chars after the first whitespace (if any)
# NReProcessor(re.compile(r'(?u)(^[A-z\-\'"_]+[\w\s]{0,11}:[^0-9{2}][\s]*)'), "", name="HI_before_colon"),
@@ -73,7 +73,7 @@ class HearingImpaired(SubtitleTextModification):
supported=lambda p: not p.only_uppercase),
# remove MAN:
NReProcessor(re.compile(ur'(?suxi)(.*MAN:\s*)'), "", name="HI_remove_man"),
NReProcessor(re.compile(ur'(?suxi)(\b(?:WO)MAN:\s*)'), "", name="HI_remove_man"),
# dash in front
# NReProcessor(re.compile(r'(?u)^\s*-\s*'), "", name="HI_starting_dash"),
@@ -81,13 +81,18 @@ class HearingImpaired(SubtitleTextModification):
# all caps at start before new sentence
NReProcessor(re.compile(ur'(?u)^(?=[A-ZÀ-Ž]{4,})[A-ZÀ-Ž-_\s]+\s([A-ZÀ-Ž][a-zà-ž].+)'), r"\1",
name="HI_starting_upper_then_sentence", supported=lambda p: not p.only_uppercase),
# remove music symbols
NReProcessor(re.compile(ur'(?u)(^%(t)s[*#¶♫♪\s]*%(t)s[*#¶♫♪\s]+%(t)s[*#¶♫♪\s]*%(t)s$)' % {"t": TAG}),
"", name="HI_music_symbols_only"),
]
post_processors = empty_line_post_processors
last_processors = [
# remove music symbols
NReProcessor(re.compile(ur'(?u)(^%(t)s[*#¶♫♪\s]*%(t)s[*#¶♫♪\s]+%(t)s[*#¶♫♪\s]*%(t)s$)' % {"t": TAG}),
"", name="HI_music_symbols_only"),
# remove music entries
NReProcessor(re.compile(ur'(?ums)(^[-\s>~]*[♫♪]+\s*.+|.+\s*[♫♪]+\s*$)'),
"", name="HI_music"),
]
registry.register(HearingImpaired)
+21
View File
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2019 VeNoMouS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
+15 -38
View File
@@ -1,7 +1,9 @@
# Sub-Zero for Plex
[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle.svg?style=flat&label=stable)](https://github.com/pannal/Sub-Zero.bundle/releases/latest)<!--[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle/all.svg?maxAge=2592000&label=testing+2.0+RC9)](https://github.com/pannal/Sub-Zero.bundle/releases)--> [![master](https://img.shields.io/badge/master-stable-green.svg?maxAge=2592000)]()
[![](https://img.shields.io/github/release/pannal/Sub-Zero.bundle.svg?style=flat&label=stable)](https://github.com/pannal/Sub-Zero.bundle/releases/latest)
[![master](https://img.shields.io/badge/master-stable-green.svg?maxAge=2592000)]()
[![Maintenance](https://img.shields.io/maintenance/yes/2019.svg)]()
[![Slack Status](https://szslack.fragstore.net/badge.svg)](https://szslack.fragstore.net)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle?ref=badge_shield)
<img src="https://raw.githubusercontent.com/pannal/Sub-Zero.bundle/master/Contents/Resources/subzero.gif" align="left" height="100"> <font size="5"><b>Subtitles done right!</b></font><br />
@@ -88,52 +90,23 @@ the.vbm, mmgoodnow, Vertig0ne, thliu78, tattoomees, ostman, count_confucius, ehe
## Changelog
2.6.5.3041
2.6.5.3074
subscene, addic7ed and titlovi
- either of those providers might impose a reCAPTCHA verification. In order to use those providers, please create an account at an AntiCaptcha service ([anti-captcha.com](http://getcaptchasolution.com/kkvviom7nh) or [deathbycaptcha.com](http://deathbycaptcha.com)), add funds, then supply your credentials/apikey in the configuration
Changelog
core: only reference guessed title if there actually is one
core: cf: optimize
core/config: add setting for one existing language to be enough, fixes #491
core/compat: dns: support nameservers via ENV[dns_resolvers]; don't fall back to default DNS when configured custom DNS failed
providers: titlovi: prevent repeated captcha solving for CF
- core: cf: bypass cf 95% of the time without captchas
- core: fix breaking line endings of certain languages (chinese, UTF-16); fixes #646
- core: update pysubs2 to 0.2.3
2.6.5.3017
2.6.5.3062
Changelog
- core: SRT parsing: handle (bad) ASS color tag in SRT
- core: auto extract embedded: only use one unknown sub for first language
- core: better embedded streams language detection
- core: optimizations
- core: extract embedded: fix is_unknown check
- core: don't raise exception when subtitle not found inside archive
- core: search external subtitles: fix condition
- core: better plex transcoder path detection
- core: use Log.Warn instead of Log.Warning (#619, #629, #633)
- core: also check for "plex transcoder.exe" in case of windows (fixes #619)
- core: auto extract: use mbcs encoding for paths on windows
- core: Fix issue scandir not returning the name of the file inside Docker images on ARM systems. (thanks @giejay)
- core: also clean PYTHONHOME when calling external notification app
- core: update certifi to 2019.3.9
- core: scan_video: add series/title as alternative by scanning filename itself without parent folders
- core: add generic solution for solving captchas using anti captcha services
- core: increase cache time to 180d (was: 30d)
- core: guess_matches: handle multiple title matches; fixes bazarr#403
- windows: fix compatibility issues with plex transcoder
- compat: use lowercase paths on subtitle detection
- providers: addic7ed: re-enable (using paid anti captch service)
- providers: assrt: assume undefined Chinese flavor as Simplified (chs/zho-Hans)
- providers: subscene: make it work again by bypassing cf
- providers: subscene: don't fail on missing cover
- providers: titlovi: re-enable (might need paid anti captch service)
- providers: opensubtitles: fix only_foreign handling
- providers: opensubtitles: show subtitles with possibly mismatched series when manually listing subs
- menu: list subtitles: show subtitles with bad season/episode values as well
- refiners: omdb: fix imdb ids with spaces
- core: cf: optimize
- core: http: don't query DNS with IPs. thanks @fgump (fixes sonarr/radarr)
[older changes](CHANGELOG.md)
@@ -142,3 +115,7 @@ Changelog
Subtitles provided by [OpenSubtitles.org](http://www.opensubtitles.org/), [Podnapisi.NET](https://www.podnapisi.net/), [TVSubtitles.net](http://www.tvsubtitles.net/), [Addic7ed.com](http://www.addic7ed.com/), [Legendas TV](http://legendas.tv/), [Napi Projekt](http://www.napiprojekt.pl/), [Shooter](http://shooter.cn/), [Titlovi](http://titlovi.com), [aRGENTeaM](http://argenteam.net), [SubScene](https://subscene.com/), [Hosszupuska](http://hosszupuskasub.com/)
[3rd party licenses](https://github.com/pannal/Sub-Zero.bundle/tree/master/Licenses)
## License
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Fpannal%2FSub-Zero.bundle?ref=badge_large)