Compare commits

52 Commits

Author SHA1 Message Date
pannal 6c588964a7 Update README.md 2015-10-09 02:42:20 +02:00
pannal f65b24094a Merge pull request #25 from pannal/rc3
pull RC3 into master
2015-10-09 02:36:57 +02:00
panni 6b807be0e6 opensubtitles: add optional credentials for VIPs; fixes #17 2015-10-09 02:35:33 +02:00
panni a794eb8310 providers: move punctuation fix into seperate mixins.py and use it 2015-10-09 02:08:43 +02:00
panni 8290c8a371 tvsubtitles: fix series with punctuation 2015-10-09 02:04:30 +02:00
panni 475152a7eb podnapisi: fix logging 2015-10-09 01:40:24 +02:00
panni 4e75e20ede add download retry option; fixes #24; move questionable only_one setting to the bottom 2015-10-09 01:28:56 +02:00
panni d36823c7ca better score logging; move patched providers to separate folder; better addic7ed punctuation handling in get_show_ids 2015-10-09 00:48:11 +02:00
panni 2a6b387112 addic7ed: fix series detection with punctuation; add missing self 2015-10-08 10:38:29 +02:00
panni a83822bff9 more verbose logging on subtitle download fail 2015-10-08 10:37:51 +02:00
panni 8e7538f6e6 fix broken import 2015-10-07 19:05:48 +02:00
panni 9cdb26f7cc forgot second clean_punctuation 2015-10-07 19:03:45 +02:00
panni 9659c913c4 Merge branch 'master' of github.com:pannal/Subliminal.bundle 2015-10-07 19:02:46 +02:00
panni c9506cb95e fix getting addic7ed show IDs for series with punctuation in their names 2015-10-07 19:02:33 +02:00
pannal 43e6ce3997 Update README.md 2015-10-07 05:13:36 +02:00
pannal dfd12edcb3 Update DefaultPrefs.json 2015-10-07 05:11:10 +02:00
pannal 154a8072f6 Update README.md 2015-10-07 04:07:59 +02:00
pannal 904abaf26b Update README.md 2015-10-07 02:58:32 +02:00
panni bea18a27ba set default TV score to 15; movie score to 30 2015-10-07 02:55:56 +02:00
pannal 2d998eab50 Update README.md 2015-10-07 02:47:40 +02:00
pannal a25a67572b Update README.md 2015-10-07 02:45:23 +02:00
pannal 1bdf6f9969 Merge pull request #22 from pannal/rc1-fix
RC1 fixes
2015-10-07 02:44:10 +02:00
panni 0b32892fa8 better existing subtitles debug logging 2015-10-07 02:42:14 +02:00
panni fea5b8a716 switch to tonswieb/enzyme 2015-10-07 02:06:47 +02:00
panni 90b3707409 update enzyme 2015-10-07 01:07:01 +02:00
panni 1c0224fbe7 skip empty folder creation if not subtitles found; should fix #20 2015-10-07 00:59:07 +02:00
pannal 626fcd1140 Update README.md 2015-09-24 02:57:23 +02:00
pannal b01c84b14c Update README.md 2015-09-24 02:55:53 +02:00
pannal 412492b4d1 Update README.md 2015-09-24 02:55:37 +02:00
panni 9a6f7a4316 forgot import, again 2015-09-24 02:44:30 +02:00
panni 660f887923 correct number casting; fixes #16 2015-09-24 02:34:34 +02:00
panni fe9c67ed91 forgot import 2015-09-24 02:13:20 +02:00
panni d3bbd05e4f subliminal: fix wrong usage of logger; fixes #15 2015-09-24 01:58:18 +02:00
panni 34585129aa Merge branch 'master' of github.com:pannal/Subliminal.bundle 2015-09-24 01:27:26 +02:00
panni 955cd4c173 allow only one subtitle optionally; fixes #3 2015-09-24 01:27:15 +02:00
pannal 4da63a8fd7 Update README.md 2015-09-23 14:40:42 +02:00
panni fa27789608 fixed typo 2015-09-23 14:31:55 +02:00
panni f9e9f35157 Merge branch 'deep_scan_subs'
Conflicts:
	Contents/Code/__init__.py
2015-09-23 14:29:21 +02:00
panni 4a6604f0ab custom folder now takes precedence; also scan subfolders for existing subtitles if configured; update custom folder settings description; remove direct subliminal.video patch and move it to subliminal_patch.patch_video 2015-09-23 14:26:21 +02:00
panni 971d1221da don't die on missing header; maybe fixes #13 2015-09-23 13:36:18 +02:00
panni ba69885477 fix saving subs to video folder without custom_path given; should fix #14 2015-09-23 12:46:07 +02:00
panni 8e23098037 add basic functionality to scan custom (sub-) folders for subtitles 2015-09-19 04:35:48 +02:00
pannal 8da7bf029c Update README.md 2015-09-18 03:48:34 +02:00
pannal e16e58cbfa Update README.md 2015-09-18 03:29:34 +02:00
pannal abb7cd3bfa Update README.md 2015-09-18 03:19:04 +02:00
pannal bfa06f3989 Update README.md 2015-09-18 03:16:37 +02:00
pannal c63529939d Merge pull request #11 from pannal/guessit-0.11.0
update guessit to 0.11.0
2015-09-18 03:16:20 +02:00
panni 2814f57e89 update guessit to 0.11.0 2015-09-18 03:14:21 +02:00
panni 70476883c6 Merge branch 'master' of github.com:pannal/Subliminal.bundle 2015-09-18 03:11:20 +02:00
panni b5ed209453 Revert "update guessit to 0.11.0"
This reverts commit be7687f15d.
2015-09-18 03:10:58 +02:00
panni be7687f15d update guessit to 0.11.0 2015-09-18 03:08:55 +02:00
pannal b7fb8e1e76 Update README.md 2015-09-18 02:56:40 +02:00
52 changed files with 2328 additions and 487 deletions
+58 -18
@@ -19,6 +19,8 @@ def Start():
# configured cache to be in memory as per https://github.com/Diaoul/subliminal/issues/303
subliminal.region.configure('dogpile.cache.memory')
def ValidatePrefs():
Log.Debug("Validate Prefs called.")
return
@@ -26,6 +28,8 @@ def ValidatePrefs():
# Prepare a list of languages we want subs for
def getLangList():
langList = {Language.fromietf(Prefs["langPref1"])}
if(Prefs['subtitles.only_one']):
return langList
if(Prefs["langPref2"] != "None"):
langList.update({Language.fromietf(Prefs["langPref2"])})
if(Prefs["langPref3"] != "None"):
@@ -33,6 +37,19 @@ def getLangList():
return langList
def getSubtitleDestinationFolder():
if not Prefs["subtitles.save.filesystem"]:
return
fld_custom = Prefs["subtitles.save.subFolder.Custom"].strip() if bool(Prefs["subtitles.save.subFolder.Custom"]) else None
return fld_custom or (Prefs["subtitles.save.subFolder"] if Prefs["subtitles.save.subFolder"] != "current folder" else None)
def initSubliminalPatches():
# configure custom subtitle destination folders for scanning pre-existing subs
dest_folder = getSubtitleDestinationFolder()
subliminal_patch.patch_video.CUSTOM_PATHS = [dest_folder] if dest_folder else []
subliminal_patch.patch_provider_pool.DOWNLOAD_TRIES = int(Prefs['subtitles.try_downloads'])
def getProviders():
providers = {'opensubtitles' : Prefs['provider.opensubtitles.enabled'],
'thesubdb' : Prefs['provider.thesubdb.enabled'],
@@ -47,7 +64,11 @@ def getProviderSettings():
'password': Prefs['provider.addic7ed.password'],
'use_random_agents': Prefs['provider.addic7ed.use_random_agents'],
},
}
'opensubtitles': {'username': Prefs['provider.opensubtitles.username'],
'password': Prefs['provider.opensubtitles.password'],
},
}
return provider_settings
def scanTvMedia(media):
@@ -78,26 +99,44 @@ def scanVideo(part):
except ValueError:
Log.Warn("File could not be guessed by subliminal")
def downloadBestSubtitles(videos):
min_score = int(Prefs['subtitles.search.minimumScore'])
def downloadBestSubtitles(videos, min_score=0):
hearing_impaired = Prefs['subtitles.search.hearingImpaired']
Log.Debug("Download best subtitles using settings: min_score: %s, hearing_impaired: %s" %(min_score, hearing_impaired))
languages = getLangList()
if not languages:
return
missing_languages = False
for video in videos:
if not (languages - video.subtitle_languages):
Log.Debug('All languages %r exist for %s', languages, video)
continue
missing_languages = True
break
if missing_languages:
Log.Debug("Download best subtitles using settings: min_score: %s, hearing_impaired: %s" %(min_score, hearing_impaired))
return subliminal.api.download_best_subtitles(videos, getLangList(), min_score, hearing_impaired, providers=getProviders(), provider_configs=getProviderSettings())
return subliminal.api.download_best_subtitles(videos, languages, min_score, hearing_impaired, providers=getProviders(), provider_configs=getProviderSettings(), only_one=Prefs['subtitles.only_one'])
Log.Debug("All languages for all requested videos exist. Doing nothing.")
def saveSubtitles(videos, subtitles):
if Prefs['subtitles.save.filesystem']:
Log.Debug("Saving subtitles to filesystem")
Log.Debug("Using filesystem as subtitle storage")
saveSubtitlesToFile(subtitles)
else:
Log.Debug("Saving subtitles as metadata")
Log.Debug("Using metadata as subtitle storage")
saveSubtitlesToMetadata(videos, subtitles)
def saveSubtitlesToFile(subtitles):
fld_custom = Prefs["subtitles.save.subFolder.Custom"].strip() if bool(Prefs["subtitles.save.subFolder.Custom"]) else None
if Prefs["subtitles.save.subFolder"] != "current folder" or fld_custom:
# specific subFolder requested, create it if it doesn't exist
for video, video_subtitles in subtitles.items():
for video, video_subtitles in subtitles.items():
if not video_subtitles:
continue
fld = None
if fld_custom or Prefs["subtitles.save.subFolder"] != "current folder":
# specific subFolder requested, create it if it doesn't exist
fld_base = os.path.split(video.name)[0]
if fld_custom:
if fld_custom.startswith("/"):
@@ -109,10 +148,7 @@ def saveSubtitlesToFile(subtitles):
fld = os.path.join(fld_base, Prefs["subtitles.save.subFolder"])
if not os.path.exists(fld):
os.makedirs(fld)
subliminal.api.save_subtitles(video, video_subtitles, directory=fld)
else:
subliminal.api.save_subtitles(subtitles)
subliminal.api.save_subtitles(video, video_subtitles, directory=fld, single=Prefs['subtitles.only_one'])
def saveSubtitlesToMetadata(videos, subtitles):
for video, video_subtitles in subtitles.items():
@@ -132,9 +168,11 @@ class SubliminalSubtitlesAgentMovies(Agent.Movies):
def update(self, metadata, media, lang):
Log.Debug("MOVIE UPDATE CALLED")
initSubliminalPatches()
videos = scanMovieMedia(media)
subtitles = downloadBestSubtitles(videos.keys())
saveSubtitles(videos, subtitles)
subtitles = downloadBestSubtitles(videos.keys(), min_score=int(Prefs["subtitles.search.minimumMovieScore"]))
if subtitles:
saveSubtitles(videos, subtitles)
class SubliminalSubtitlesAgentTvShows(Agent.TV_Shows):
@@ -149,6 +187,8 @@ class SubliminalSubtitlesAgentTvShows(Agent.TV_Shows):
def update(self, metadata, media, lang):
Log.Debug("TvUpdate. Lang %s" % lang)
initSubliminalPatches()
videos = scanTvMedia(media)
subtitles = downloadBestSubtitles(videos.keys())
saveSubtitles(videos, subtitles)
subtitles = downloadBestSubtitles(videos.keys(), min_score=int(Prefs["subtitles.search.minimumTVScore"]))
if subtitles:
saveSubtitles(videos, subtitles)
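The folder precedence introduced by getSubtitleDestinationFolder() can be illustrated outside of Plex. The following is a minimal sketch, assuming a plain dict in place of the Plex Prefs object; the function name resolve_destination_folder is hypothetical and not part of the bundle.

```python
def resolve_destination_folder(prefs):
    """Return the subtitle folder to use, or None for "next to the video".

    `prefs` is a plain dict standing in for the Plex Prefs object (assumption).
    """
    if not prefs.get("subtitles.save.filesystem"):
        # Subtitles are stored as metadata, so no folder applies.
        return None
    custom = (prefs.get("subtitles.save.subFolder.Custom") or "").strip()
    if custom:
        # A non-empty custom folder overrides the enum choice.
        return custom
    chosen = prefs.get("subtitles.save.subFolder", "current folder")
    return None if chosen == "current folder" else chosen


if __name__ == "__main__":
    print(resolve_destination_folder({
        "subtitles.save.filesystem": True,
        "subtitles.save.subFolder.Custom": " bla ",
        "subtitles.save.subFolder": "subs",
    }))
```

A whitespace-only custom value falls through to the enum setting, which matches the strip-then-or logic in the diff.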
+44 -10
@@ -1,5 +1,11 @@
[
{
{ "id": "subtitles.try_downloads",
"label": "How many download tries per subtitle (on timeout or error)",
"type": "enum",
"values": ["1", "2", "3", "4"],
"default": "2"
},
{
"id": "provider.addic7ed.username",
"label": "Addic7ed Username",
"type": "text",
@@ -13,6 +19,20 @@
"default": "",
"secure": "true"
},
{
"id": "provider.opensubtitles.username",
"label": "Opensubtitles Username (VIP)",
"type": "text",
"default": ""
},
{
"id": "provider.opensubtitles.password",
"label": "Opensubtitles Password",
"type": "text",
"option": "hidden",
"default": "",
"secure": "true"
},
{
"id": "provider.addic7ed.use_random_agents",
"label": "Addic7ed: Use random user agents (should not be necessary)",
@@ -40,6 +60,7 @@
"values": ["None", "sq","ar","be","bs","bg","ca","zh","cs","da","nl","en","et","fi","fr","de","el","he","hi","hu","is","id","it","ja","ko","lv","lt","mk","ms","no","fa","pl","pt","pt-br","ro","ru","sr","sk","sl","es","sv","th","tr","uk","vi","hr"],
"default": "None"
},
{
"id": "provider.opensubtitles.enabled",
"label": "Provider: Enable OpenSubtitles",
@@ -74,20 +95,27 @@
"id": "subtitles.scan.embedded",
"label": "Scan: include embedded subtitles (skip if existing)",
"type": "bool",
"default": "false"
"default": "true"
},
{
"id": "subtitles.scan.external",
"label": "Scan: include external subtitles (skip if existing)",
"type": "bool",
"default": "false"
"default": "true"
},
{
"id": "subtitles.search.minimumScore",
"label": "Minimum score for subtitles to download",
"id": "subtitles.search.minimumTVScore",
"label": "Minimum score for TV subtitles to download",
"type": "enum",
"values": ["100","95","90","85","80","75","70","65","60","55","50","45","40","35","30","25","20","15","10","5","0"],
"default": "0"
"default": "15"
},
{
"id": "subtitles.search.minimumMovieScore",
"label": "Minimum score for movie subtitles to download",
"type": "enum",
"values": ["100","95","90","85","80","75","70","65","60","55","50","45","40","35","30","25","20","15","10","5","0"],
"default": "30"
},
{
"id": "subtitles.search.hearingImpaired",
@@ -101,17 +129,23 @@
"type": "bool",
"default": "false"
},
{
{
"id": "subtitles.save.subFolder",
"label": "Subtitle Folder (\"current folder\" is the folder the current media file lives in)",
"label": "Subtitle Folder (\"current folder\" is the folder the current media file lives in) - needs LocalMediaExtended agent",
"type": "enum",
"values": ["current folder", "sub", "subs", "subtitle", "subtitles"],
"default": "current folder"
},
{
{
"id": "subtitles.save.subFolder.Custom",
"label": "Custom Subtitle folder (computes to real paths; use for example \"bla\" as a subfolder of the current media file folder - can use real paths aswell)",
"label": "Custom Subtitle folder (overrides \"Subtitle Folder\"; computes to real paths; use for example \"bla\" as a subfolder of the current media file folder or an absolute path) - needs LocalMediaExtended agent",
"type": "text",
"default": ""
},
{
"id": "subtitles.only_one",
"label": "Restrict to one language (skips adding \".lang.\" to the subtitle filename; only uses \"Subtitle Language (1)\")",
"type": "bool",
"default": "false"
}
]
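The "subtitles.try_downloads" preference is an enum of strings that the agent casts with int() before handing it to the patched provider pool. The patched pool itself is not part of this diff, so the following is only a sketch of how such a retry count might be applied; download_with_retries and its arguments are hypothetical names, and retrying on OSError is an assumption about which errors count as "timeout or error".

```python
def download_with_retries(download, tries):
    """Call `download()` up to `tries` times and return its first result.

    Returns None if every attempt raises OSError (which also covers
    socket timeouts on Python 3).
    """
    for attempt in range(1, tries + 1):
        try:
            return download()
        except OSError as exc:
            # A real implementation would log the failure here.
            print("attempt %d of %d failed: %s" % (attempt, tries, exc))
    return None


if __name__ == "__main__":
    prefs = {"subtitles.try_downloads": "2"}  # enum values are strings
    tries = int(prefs["subtitles.try_downloads"])
    print(download_with_retries(lambda: "subtitle content", tries))
```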
@@ -8,6 +8,7 @@ __copyright__ = 'Copyright 2013 Antoine Bertin'
import logging
from .exceptions import *
from .mkv import *
from .subtitle import *
logging.getLogger(__name__).addHandler(logging.NullHandler())
+42 -18
@@ -65,30 +65,53 @@ class MKV(object):
continue
if element_name == 'Info':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self.info = Info.fromelement(ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32']))
element = self._load_element(stream, specs, element_position)
self.info = Info.fromelement(element)
elif element_name == 'Tracks':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
tracks = ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32'])
tracks = self._load_element(stream, specs, element_position)
self.video_tracks.extend([VideoTrack.fromelement(t) for t in tracks if t['TrackType'].data == VIDEO_TRACK])
self.audio_tracks.extend([AudioTrack.fromelement(t) for t in tracks if t['TrackType'].data == AUDIO_TRACK])
self.subtitle_tracks.extend([SubtitleTrack.fromelement(t) for t in tracks if t['TrackType'].data == SUBTITLE_TRACK])
elif element_name == 'Chapters':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self.chapters.extend([Chapter.fromelement(c) for c in ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32'])[0] if c.name == 'ChapterAtom'])
element = self._load_element(stream, specs, element_position)
self.chapters.extend([Chapter.fromelement(c) for c in element[0] if c.name == 'ChapterAtom'])
elif element_name == 'Tags':
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self.tags.extend([Tag.fromelement(t) for t in ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32'])])
element = self._load_element(stream, specs, element_position)
self.tags.extend([Tag.fromelement(t) for t in element])
elif element_name == 'SeekHead' and self.recurse_seek_head:
logger.info('Processing element %s from SeekHead at position %d', element_name, element_position)
stream.seek(element_position)
self._parse_seekhead(ebml.parse_element(stream, specs, True, ignore_element_names=['Void', 'CRC-32']), segment, stream, specs)
element = self._load_element(stream, specs, element_position)
self._parse_seekhead(element, segment, stream, specs)
else:
logger.debug('Element %s ignored', element_name)
self._parsed_positions.add(element_position)
def _load_element(self,stream, specs, position):
stream.seek(position)
element = ebml.parse_element(stream,specs)
element.load(stream, specs, ignore_element_names=['Void', 'CRC-32'])
return element
def get_srt_subtitles_track_by_language(self):
"""get a dictionary of the SRT subtitles track id's indexed by language"""
subtitles = dict()
for track in self.subtitle_tracks:
logger.info("Found subtitle language %s, with codec %s and lacing %s",
track.language,track.codec_id,track.lacing)
if not track.is_srt():
logger.debug("Ignoring subtitle language %s with codec %s",track.language,track.codec_id)
elif track.lacing:
logger.info("Ignoring subtitle language %s with lacing %s",track.language,track.lacing)
else:
subtitles[track.language] = track
return subtitles
def to_dict(self):
return {'info': self.info.__dict__, 'video_tracks': [t.__dict__ for t in self.video_tracks],
@@ -103,6 +126,7 @@ class Info(object):
"""Object for the Info EBML element"""
def __init__(self, title=None, duration=None, date_utc=None, timecode_scale=None, muxing_app=None, writing_app=None):
self.title = title
self.timecode_scale = timecode_scale
self.duration = timedelta(microseconds=duration * (timecode_scale or 1000000) // 1000) if duration else None
self.date_utc = date_utc
self.muxing_app = muxing_app
@@ -119,7 +143,7 @@ class Info(object):
title = element.get('Title')
duration = element.get('Duration')
date_utc = element.get('DateUTC')
timecode_scale = element.get('TimecodeScale')
timecode_scale = element.get('TimecodeScale',1000000)
muxing_app = element.get('MuxingApp')
writing_app = element.get('WritingApp')
return cls(title, duration, date_utc, timecode_scale, muxing_app, writing_app)
@@ -133,7 +157,7 @@ class Info(object):
class Track(object):
"""Base object for the Tracks EBML element"""
def __init__(self, type=None, number=None, name=None, language=None, enabled=None, default=None, forced=None, lacing=None, # @ReservedAssignment
def __init__(self, type=None, number=None, name=None, language=None, enabled=None, default=None, forced=None, lacing=None,
codec_id=None, codec_name=None):
self.type = type
self.number = number
@@ -154,10 +178,10 @@ class Track(object):
:type element: :class:`~enzyme.parsers.ebml.Element`
"""
type = element.get('TrackType') # @ReservedAssignment
type = element.get('TrackType')
number = element.get('TrackNumber', 0)
name = element.get('Name')
language = element.get('Language')
language = element.get('Language','eng')
enabled = bool(element.get('FlagEnabled', 1))
default = bool(element.get('FlagDefault', 1))
forced = bool(element.get('FlagForced', 0))
@@ -256,8 +280,9 @@ class AudioTrack(Track):
class SubtitleTrack(Track):
"""Object for the Tracks EBML element with :data:`SUBTITLE_TRACK` TrackType"""
pass
def is_srt(self):
return self.codec_id == 'S_TEXT/UTF8'
class Tag(object):
"""Object for the Tag EBML element"""
@@ -344,8 +369,7 @@ class Chapter(object):
if chapterdisplays:
string = chapterdisplays[0].get('ChapString')
language = chapterdisplays[0].get('ChapLanguage')
return cls(start, hidden, enabled, end, string, language)
return cls(start, hidden, enabled, end)
return cls(start, hidden, enabled, end, string, language)
def __repr__(self):
return '<%s [%s, enabled=%s]>' % (self.__class__.__name__, self.start, self.enabled)
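The Info change above stores timecode_scale and falls back to 1000000 when the element is absent. A minimal sketch of the duration arithmetic follows, assuming the Matroska convention that Duration is expressed in ticks of TimecodeScale nanoseconds; info_duration is a hypothetical helper name.

```python
from datetime import timedelta

DEFAULT_TIMECODE_SCALE = 1000000  # nanoseconds per tick (Matroska default)


def info_duration(duration_ticks, timecode_scale=None):
    """Convert a Matroska Duration (in ticks) to a timedelta, or None."""
    if not duration_ticks:
        return None
    scale = timecode_scale or DEFAULT_TIMECODE_SCALE
    # ticks * ns-per-tick gives nanoseconds; // 1000 gives microseconds.
    return timedelta(microseconds=duration_ticks * scale // 1000)


if __name__ == "__main__":
    print(info_duration(5000))          # default scale
    print(info_duration(5000, 500000))  # half-millisecond ticks
```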
@@ -38,8 +38,15 @@ READERS = {
BINARY: read_element_binary
}
class BaseElement(object):
class Element(object):
def __init__(self, id=None, position=None, size=None, data=None):
self.id = id
self.position = position
self.size = size
self.data = data
class Element(BaseElement):
"""Base object of EBML
:param int id: id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
@@ -52,14 +59,11 @@ class Element(object):
:param data: data as read by the corresponding :data:`READERS`
"""
def __init__(self, id=None, type=None, name=None, level=None, position=None, size=None, data=None): # @ReservedAssignment
self.id = id
def __init__(self, id=None, type=None, name=None, level=None, position=None, size=None, data=None):
super(Element, self).__init__(id, position, size, data)
self.type = type
self.name = name
self.level = level
self.position = position
self.size = size
self.data = data
def __repr__(self):
return '<%s [%s, %r]>' % (self.__class__.__name__, self.name, self.data)
@@ -89,7 +93,7 @@ class MasterElement(Element):
Element(DocType, u'matroska')
"""
def __init__(self, id=None, name=None, level=None, position=None, size=None, data=None): # @ReservedAssignment
def __init__(self, id=None, name=None, level=None, position=None, size=None, data=None):
super(MasterElement, self).__init__(id, MASTER, name, level, position, size, data)
def load(self, stream, specs, ignore_element_types=None, ignore_element_names=None, max_level=None):
@@ -137,8 +141,7 @@ class MasterElement(Element):
def __iter__(self):
return iter(self.data)
def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_names=None, max_level=None):
def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_names=None, max_level=None, include_element_names=None):
"""Parse a stream for `size` bytes according to the `specs`
:param stream: file-like object from which to read
@@ -148,6 +151,7 @@ def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_na
:param list ignore_element_types: list of element types to ignore
:param list ignore_element_names: list of element names to ignore
:param int max_level: maximum level of elements
:param list include_element_names: list of element names to include exclusively, so ignoring all other element names
:return: parsed data as a tree of :class:`~enzyme.parsers.ebml.core.Element`
:rtype: list
@@ -158,26 +162,32 @@ def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_na
"""
ignore_element_types = ignore_element_types if ignore_element_types is not None else []
ignore_element_names = ignore_element_names if ignore_element_names is not None else []
include_element_names = include_element_names if include_element_names is not None else []
start = stream.tell()
elements = []
while size is None or stream.tell() - start < size:
try:
element = parse_element(stream, specs)
if element is None:
if element.type is None:
logger.error('Element with id 0x%x is not in the specs' % element_id)
stream.seek(element_size, 1)
continue
logger.debug('%s %s parsed', element.__class__.__name__, element.name)
if element.type in ignore_element_types or element.name in ignore_element_names:
logger.info('%s %s ignored', element.__class__.__name__, element.name)
if element.type == MASTER:
stream.seek(element.size, 1)
elif element.type in ignore_element_types or element.name in ignore_element_names:
logger.info('%s %s %s ignored', element.__class__.__name__, element.name, element.type)
stream.seek(element.size, 1)
continue
if element.type == MASTER:
elif len(include_element_names) > 0 and element.name not in include_element_names:
stream.seek(element.size, 1)
continue
elif element.type == MASTER:
if max_level is not None and element.level >= max_level:
logger.info('Maximum level %d reached for children of %s %s', max_level, element.__class__.__name__, element.name)
stream.seek(element.size, 1)
else:
logger.debug('Loading child elements for %s %s with size %d', element.__class__.__name__, element.name, element.size)
element.data = parse(stream, specs, element.size, ignore_element_types, ignore_element_names, max_level)
element.data = parse(stream, specs, element.size, ignore_element_types, ignore_element_names, max_level,include_element_names)
else:
element.data = READERS[element.type](stream, element.size)
elements.append(element)
except ReadError:
if size is not None:
@@ -186,21 +196,15 @@ def parse(stream, specs, size=None, ignore_element_types=None, ignore_element_na
return elements
def parse_element(stream, specs, load_children=False, ignore_element_types=None, ignore_element_names=None, max_level=None):
def parse_element(stream, specs):
"""Extract a single :class:`Element` from the `stream` according to the `specs`
:param stream: file-like object from which to read
:param dict specs: see :ref:`specs`
:param bool load_children: load children elements if the parsed element is a :class:`MasterElement`
:param list ignore_element_types: list of element types to ignore
:param list ignore_element_names: list of element names to ignore
:param int max_level: maximum level for children elements
:return: the parsed element
:rtype: :class:`Element`
"""
ignore_element_types = ignore_element_types if ignore_element_types is not None else []
ignore_element_names = ignore_element_names if ignore_element_names is not None else []
element_id = read_element_id(stream)
if element_id is None:
raise ReadError('Cannot read element id')
@@ -208,20 +212,14 @@ def parse_element(stream, specs, load_children=False, ignore_element_types=None,
if element_size is None:
raise ReadError('Cannot read element size')
if element_id not in specs:
logger.error('Element with id 0x%x is not in the specs' % element_id)
stream.seek(element_size, 1)
return None
return BaseElement(element_id,stream.tell(),element_size)
element_type, element_name, element_level = specs[element_id]
if element_type == MASTER:
element = MasterElement(element_id, element_name, element_level, stream.tell(), element_size)
if load_children:
element.data = parse(stream, specs, element.size, ignore_element_types, ignore_element_names, max_level)
else:
element = Element(element_id, element_type, element_name, element_level, stream.tell(), element_size)
element.data = READERS[element_type](stream, element_size)
return element
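The new include_element_names parameter of parse() acts as a whitelist that is applied at every level of the tree. The following is a toy sketch of that filtering on (name, children) tuples rather than real EBML elements; filter_tree and the sample tree are hypothetical and only mirror the order of the checks in the diff.

```python
def filter_tree(nodes, include_names=None, ignore_names=None):
    """Keep only whitelisted nodes; `children` is None for leaf elements."""
    include_names = include_names or []
    ignore_names = ignore_names or []
    kept = []
    for name, children in nodes:
        if name in ignore_names:
            continue
        # An empty include list means "include everything".
        if include_names and name not in include_names:
            continue
        if children is not None:
            children = filter_tree(children, include_names, ignore_names)
        kept.append((name, children))
    return kept


if __name__ == "__main__":
    tree = [("Segment", [
        ("SeekHead", []),
        ("Cluster", [("Timecode", None), ("SimpleBlock", None),
                     ("BlockGroup", [("Block", None)])]),
    ])]
    wanted = ["Segment", "Cluster", "BlockGroup", "Timecode", "Block"]
    print(filter_tree(tree, include_names=wanted))
```

Because the whitelist is checked before descending, every ancestor of a wanted element has to be listed as well. This would explain why a caller that is only interested in blocks still has to name 'Segment' and 'Cluster'.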
def get_matroska_specs(webm_only=False):
"""Get the Matroska specs
@@ -0,0 +1,185 @@
# -*- coding: utf-8 -*-
from .exceptions import ReadError
from .mkv import MKV
from .parsers import ebml
import logging
import codecs
import os
import io
__all__ = ['Subtitle']
logger = logging.getLogger(__name__)
class Subtitle(object):
"""Subtitle extractor for Matroska Video File.
Currently only SRT subtitles stored without lacing are supported
"""
def __init__(self, stream):
"""Read the available subtitles from a MKV file-like object"""
self._stream = stream
#Use the MKV class to parse the META information
mkv = MKV(stream)
self._timecode_scale = mkv.info.timecode_scale
self._subtitles = mkv.get_srt_subtitles_track_by_language()
def has_subtitle(self, language):
return language in self._subtitles
def write_subtitle_to_stream(self, language):
"""Write a single subtitle to stream or return None if language not available"""
if language in self._subtitles:
subtitle = self._subtitles[language]
logger.info("Writing subtitle for language %s to stream",language)
return _write_track_to_srt_stream(self._stream,subtitle.number,self._timecode_scale)
else:
logger.info("Subtitle for language %s not found",language)
def write_subtitles_to_stream(self):
"""Write all available subtitles as streams to a dictionary with language as the key"""
subtitles = dict()
for language in self._subtitles:
subtitles[language] = self.write_subtitle_to_stream(language)
return subtitles
def _write_track_to_srt_stream(mkv_stream, track, timecode_scale):
srt_stream = io.StringIO()
index = 0
for cluster in _parse_segment(mkv_stream,track):
for blockgroup in cluster.blockgroups:
index = index + 1
timeRange = _print_time_range(timecode_scale,cluster.timecode,blockgroup.block.timecode,blockgroup.duration)
srt_stream.write(str(index) + '\n')
srt_stream.write(timeRange + '\n')
srt_stream.write(codecs.decode(blockgroup.block.data.read(),'utf-8') + '\n')
srt_stream.write('\n')
return srt_stream
def _parse_segment(stream,track):
stream.seek(0)
specs = ebml.get_matroska_specs()
# Find all level 1 Cluster elements and its subelements. Speed up this process by excluding all other currently known level 1 elements
try:
segments = ebml.parse(stream, specs,include_element_names=['Segment','Cluster','BlockGroup','Timecode','Block','BlockDuration',],max_level=3)
except ReadError:
pass
clusters = []
for cluster in segments[0].data:
_parse_cluster(track, clusters, cluster)
return clusters
def _parse_cluster(track, clusters, cluster):
blockgroups = []
timecode = None
for child in cluster.data:
if child.name == 'BlockGroup':
_parse_blockgroup(track, blockgroups, child)
elif child.name == 'Timecode':
timecode = child.data
if len(blockgroups) > 0 and timecode != None:
clusters.append(Cluster(timecode, blockgroups))
def _parse_blockgroup(track, blockgroups, blockgroup):
block = None
duration = None
for child in blockgroup.data:
if child.name == 'Block':
block = Block.fromelement(child)
if block.track != track:
block = None
elif child.name == 'BlockDuration':
duration = child.data
if duration != None and block != None:
blockgroups.append(BlockGroup(block, duration))
def _print_time_range(timecode_scale,clusterTimecode,blockTimecode,duration):
timecode_scale_ms = timecode_scale / 1000000 #Timecode
rawTimecode = clusterTimecode + blockTimecode
startTimeMilleSeconds = (rawTimecode) * timecode_scale_ms
endTimeMilleSeconds = (rawTimecode + duration) * timecode_scale_ms
return _print_time(startTimeMilleSeconds) + " --> " + _print_time(endTimeMilleSeconds)
def _print_time(timeInMilleSeconds):
timeInSeconds, milleSeconds = divmod(timeInMilleSeconds, 1000)
timeInMinutes, seconds = divmod(timeInSeconds, 60)
hours, minutes = divmod(timeInMinutes, 60)
return '%02d:%02d:%02d,%03d' % (hours,minutes,seconds,milleSeconds)
class Cluster(object):
def __init__(self,timecode=None, blockgroups=None):
self.timecode = timecode
self.blockgroups = blockgroups if blockgroups is not None else []
class BlockGroup(object):
def __init__(self,block=None,duration=None):
self.block = block
self.duration = duration
class Block(object):
def __init__(self, track=None, timecode=None, invisible=False, lacing=None, flags=None, data=None):
self.track = track
self.timecode = timecode
self.invisible = invisible
self.lacing = lacing
self.flags = flags
self.data = data
@classmethod
def fromelement(cls,element):
stream = element.data
track = ebml.read_element_size(stream)
timecode = ebml.read_element_integer(stream,2)
flags = ord(stream.read(1))
invisible = bool(flags & 0x8)
if (flags & 0x6) == 0x6:
lacing = 'EBML'
elif (flags & 0x4):
lacing = 'fixed-size'
elif (flags & 0x2):
lacing = 'Xiph'
else:
lacing = None
if lacing:
raise ReadError('Laced blocks are not implemented yet')
data = ebml.read_element_binary(stream, element.size - stream.tell())
return cls(track,timecode,invisible,lacing,flags,data)
def __repr__(self):
return '<%s track=%d, timecode=%d, invisible=%d, lacing=%s>' % (self.__class__.__name__, self.track,self.timecode,self.invisible,self.lacing)
class SimpleBlock(Block):
def __init__(self, track=None, timecode=None, keyframe=False, invisible=False, lacing=None, flags=None, data=None, discardable=False):
super(SimpleBlock,self).__init__(track,timecode,invisible,lacing,flags,data)
self.keyframe = keyframe
self.discardable = discardable
@classmethod
def fromelement(cls,element):
simpleblock = super(SimpleBlock, cls).fromelement(element)
simpleblock.keyframe = bool(simpleblock.flags & 0x80)
simpleblock.discardable = bool(simpleblock.flags & 0x1)
return simpleblock
def __repr__(self):
return '<%s track=%d, timecode=%d, keyframe=%d, invisible=%d, lacing=%s, discardable=%d>' % (self.__class__.__name__, self.track,self.timecode,self.keyframe,self.invisible,self.lacing,self.discardable)
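The timestamp arithmetic used when writing SRT cues can be checked in isolation. This is a minimal sketch, assuming cluster and block timecodes and the block duration are all expressed in ticks of timecode_scale nanoseconds, and that SRT timestamps use zero-padded HH:MM:SS,mmm fields; the function names are hypothetical.

```python
def srt_timestamp(milliseconds):
    """Format a millisecond count as an SRT timestamp (HH:MM:SS,mmm)."""
    seconds, millis = divmod(int(milliseconds), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, seconds, millis)


def srt_time_range(timecode_scale, cluster_timecode, block_timecode, duration):
    """Build the "start --> end" line for one subtitle cue."""
    ms_per_tick = timecode_scale / 1000000.0  # scale is in nanoseconds
    start = (cluster_timecode + block_timecode) * ms_per_tick
    end = start + duration * ms_per_tick
    return srt_timestamp(start) + " --> " + srt_timestamp(end)


if __name__ == "__main__":
    print(srt_timestamp(3723005))
    print(srt_time_range(1000000, 60000, 1500, 2000))
```

Dividing by 1000000.0 rather than 1000000 keeps sub-millisecond scales from truncating to zero under Python 2 integer division.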
@@ -1,9 +1,11 @@
# -*- coding: utf-8 -*-
from . import test_mkv, test_parsers
from . import test_mkv, test_parsers, test_subtitle
import unittest
suite = unittest.TestSuite([test_mkv.suite(), test_parsers.suite()])
suite = unittest.TestSuite([test_mkv.suite(), test_parsers.suite(), test_subtitle.suite()])
if __name__ == '__main__':
@@ -193,7 +193,7 @@ class MKVTestCase(unittest.TestCase):
self.assertTrue(mkv.audio_tracks[0].type == AUDIO_TRACK)
self.assertTrue(mkv.audio_tracks[0].number == 2)
self.assertTrue(mkv.audio_tracks[0].name is None)
self.assertTrue(mkv.audio_tracks[0].language is None)
self.assertTrue(mkv.audio_tracks[0].language == 'eng')
self.assertTrue(mkv.audio_tracks[0].enabled == True)
self.assertTrue(mkv.audio_tracks[0].default == True)
self.assertTrue(mkv.audio_tracks[0].forced == False)
@@ -276,7 +276,7 @@ class MKVTestCase(unittest.TestCase):
self.assertTrue(mkv.audio_tracks[1].type == AUDIO_TRACK)
self.assertTrue(mkv.audio_tracks[1].number == 10)
self.assertTrue(mkv.audio_tracks[1].name == 'Commentary')
self.assertTrue(mkv.audio_tracks[1].language is None)
self.assertTrue(mkv.audio_tracks[1].language == 'eng')
self.assertTrue(mkv.audio_tracks[1].enabled == True)
self.assertTrue(mkv.audio_tracks[1].default == False)
self.assertTrue(mkv.audio_tracks[1].forced == False)
@@ -292,7 +292,7 @@ class MKVTestCase(unittest.TestCase):
self.assertTrue(mkv.subtitle_tracks[0].type == SUBTITLE_TRACK)
self.assertTrue(mkv.subtitle_tracks[0].number == 3)
self.assertTrue(mkv.subtitle_tracks[0].name is None)
self.assertTrue(mkv.subtitle_tracks[0].language is None)
self.assertTrue(mkv.subtitle_tracks[0].language == 'eng')
self.assertTrue(mkv.subtitle_tracks[0].enabled == True)
self.assertTrue(mkv.subtitle_tracks[0].default == True)
self.assertTrue(mkv.subtitle_tracks[0].forced == False)
@@ -33,7 +33,7 @@ class EBMLTestCase(unittest.TestCase):
self.stream.close()
def check_element(self, element_id, element_type, element_name, element_level, element_position, element_size, element_data, element,
ignore_element_types=None, ignore_element_names=None, max_level=None):
ignore_element_types=None, ignore_element_names=None, max_level=None, include_element_names=None):
"""Recursively check an element"""
# base
self.assertTrue(element.id == element_id)
@@ -53,6 +53,8 @@ class EBMLTestCase(unittest.TestCase):
element_data = [e for e in element_data if e[1] not in ignore_element_types]
if ignore_element_names is not None: # filter validation on element names
element_data = [e for e in element_data if e[2] not in ignore_element_names]
if include_element_names is not None: # filter validation on element names
element_data = [e for e in element_data if e[2] in include_element_names]
if element.level == max_level: # special check when maximum level is reached
self.assertTrue(element.data is None)
return
@@ -60,7 +62,7 @@ class EBMLTestCase(unittest.TestCase):
for i in range(len(element.data)):
self.check_element(element_data[i][0], element_data[i][1], element_data[i][2], element_data[i][3],
element_data[i][4], element_data[i][5], element_data[i][6], element.data[i], ignore_element_types,
ignore_element_names, max_level)
ignore_element_names, max_level,include_element_names)
def test_parse_full(self):
result = ebml.parse(self.stream, self.specs)
@@ -87,6 +89,15 @@ class EBMLTestCase(unittest.TestCase):
self.check_element(self.validation[i][0], self.validation[i][1], self.validation[i][2], self.validation[i][3],
self.validation[i][4], self.validation[i][5], self.validation[i][6], result[i], ignore_element_names=ignore_element_names)
def test_parse_include_element_names(self):
include_element_names = ['Segment','Cluster']
result = ebml.parse(self.stream, self.specs, include_element_names=include_element_names)
self.validation = [e for e in self.validation if e[2] in include_element_names]
self.assertTrue(len(result) == len(self.validation))
for i in range(len(self.validation)):
self.check_element(self.validation[i][0], self.validation[i][1], self.validation[i][2], self.validation[i][3],
self.validation[i][4], self.validation[i][5], self.validation[i][6], result[i], include_element_names=include_element_names)
def test_parse_max_level(self):
max_level = 3
result = ebml.parse(self.stream, self.specs, max_level=max_level)
@@ -0,0 +1,86 @@
# -*- coding: utf-8 -*-
from enzyme.subtitle import Subtitle, _print_time_range, _print_time
import unittest
import os
import io
import requests
import zipfile
import glob
# Test directory
TEST_DIR = os.path.join(os.path.dirname(__file__), os.path.splitext(__file__)[0])
def setUpModule():
if not os.path.exists(TEST_DIR):
r = requests.get('http://downloads.sourceforge.net/project/matroska/test_files/matroska_test_w1_1.zip')
with zipfile.ZipFile(io.BytesIO(r.content), 'r') as f:
f.extractall(TEST_DIR)
class SubtitleTestCase(unittest.TestCase):
@classmethod
def setUpClass(cls):
file = 'test5.mkv'
stream = io.open(os.path.join(TEST_DIR, file), 'rb')
cls.subtitle = Subtitle(stream)
def test_subtitles_found(self):
subtitles = self.subtitle._subtitles
self.assertTrue('eng' in subtitles)
self.assertTrue('hun' in subtitles)
self.assertTrue('ger' in subtitles)
self.assertTrue('fre' in subtitles)
self.assertTrue('spa' in subtitles)
self.assertTrue('ita' in subtitles)
self.assertTrue('jpn' in subtitles)
self.assertTrue('und' in subtitles)
def test_write_subtitle_to_stream(self):
subtitle_stream = self.subtitle.write_subtitle_to_stream("eng")
self.assertIsInstance(subtitle_stream,io.StringIO,"Expecting a StringIO stream")
def test_write_subtitles_to_stream(self):
subtitle_streams = self.subtitle.write_subtitles_to_stream()
self.assertIn("eng", subtitle_streams, "Expecting a subtitle stream for language eng")
self.assertIsInstance(subtitle_streams["eng"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("hun", subtitle_streams, "Expecting a subtitle stream for language hun")
self.assertIsInstance(subtitle_streams["hun"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("ger", subtitle_streams, "Expecting a subtitle stream for language ger")
self.assertIsInstance(subtitle_streams["ger"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("fre", subtitle_streams, "Expecting a subtitle stream for language fre")
self.assertIsInstance(subtitle_streams["fre"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("spa", subtitle_streams, "Expecting a subtitle stream for language spa")
self.assertIsInstance(subtitle_streams["spa"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("ita", subtitle_streams, "Expecting a subtitle stream for language ita")
self.assertIsInstance(subtitle_streams["ita"],io.StringIO,"Expecting a StringIO stream")
self.assertIn("jpn", subtitle_streams, "Expecting a subtitle stream for language jpn")
self.assertIsInstance(subtitle_streams["jpn"],io.StringIO,"Expecting a StringIO stream")
def test_print_time(self):
self.assertEqual('0:00:00,0',_print_time(0))
self.assertEqual('0:00:00,1',_print_time(1))
self.assertEqual('0:00:00,999',_print_time(999))
self.assertEqual('0:00:01,0',_print_time(1000))
self.assertEqual('0:00:59,999',_print_time(1000*60-1))
self.assertEqual('0:01:00,0',_print_time(1000*60))
self.assertEqual('0:59:59,999',_print_time(1000*60*60-1))
self.assertEqual('1:00:00,0',_print_time(1000*60*60))
def test_print_time_range(self):
self.assertEqual('0:00:00,0 --> 0:00:00,0',_print_time_range(1000000,0,0,0))
self.assertEqual('0:01:00,0 --> 0:01:01,0',_print_time_range(1000000,0,60000,1000))
def suite():
suite = unittest.TestSuite()
suite.addTest(unittest.TestLoader().loadTestsFromTestCase(SubtitleTestCase))
return suite
if __name__ == '__main__':
unittest.TextTestRunner().run(suite())
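The assertions in `test_print_time` pin down the expected output format: hours without padding, two-digit minutes and seconds, and milliseconds without zero padding. A minimal sketch that satisfies those assertions follows. It is not the actual `enzyme.subtitle._print_time` implementation, which this diff does not show, and the name `print_time_sketch` is hypothetical.

```python
def print_time_sketch(ms):
    """Format a millisecond count as H:MM:SS,ms (hypothetical stand-in for _print_time)."""
    seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    # Milliseconds are deliberately left unpadded, as the tests above expect.
    return '%d:%02d:%02d,%d' % (hours, minutes, seconds, millis)
```

Note that SubRip files conventionally pad milliseconds to three digits, so this mirrors the tests rather than the SubRip convention.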
+15 -3
@@ -89,10 +89,14 @@ from guessit.guess import Guess, smart_merge
from guessit.language import Language
from guessit.matcher import IterativeMatcher
from guessit.textutils import clean_default, is_camel, from_camel
from copy import deepcopy
import babelfish
import os.path
import logging
from copy import deepcopy
from guessit.options import get_opts
import shlex
# Needed for guessit.plugins.transformers.reload() to be called.
from guessit.plugins import transformers
log = logging.getLogger(__name__)
@@ -117,7 +121,7 @@ def _build_filename_mtree(filename, options=None, **kwargs):
mtree = IterativeMatcher(filename, options=options, **kwargs)
second_pass_options = mtree.second_pass_options
if second_pass_options:
log.debug("Running 2nd pass")
log.debug('Running 2nd pass with options: %s' % second_pass_options)
merged_options = dict(options)
merged_options.update(second_pass_options)
mtree = IterativeMatcher(filename, options=merged_options, **kwargs)
@@ -271,8 +275,16 @@ def guess_file_info(filename, info=None, options=None, **kwargs):
"""
info = info or 'filename'
options = options or {}
if isinstance(options, base_text_type):
args = shlex.split(options)
options = vars(get_opts().parse_args(args))
if default_options:
merged_options = deepcopy(default_options)
if isinstance(default_options, base_text_type):
default_args = shlex.split(default_options)
merged_options = vars(get_opts().parse_args(default_args))
else:
merged_options = deepcopy(default_options)
merged_options.update(options)
options = merged_options
@@ -181,16 +181,16 @@ def submit_bug(filename, options):
opts = dict((k, v) for k, v in options.__dict__.items()
if v and k != 'submit_bug')
r = requests.post('http://localhost:5000/bugs', {'filename': filename,
r = requests.post('http://guessit.io/bugs', {'filename': filename,
'version': __version__,
'options': str(opts)})
if r.status_code == 200:
print('Successfully submitted file: %s' % r.text)
else:
print('Could not submit bug at the moment, please try again later.')
print('Could not submit bug at the moment, please try again later: %s %s' % (r.status_code, r.reason))
except RequestException as e:
print('Could not submit bug at the moment, please try again later.')
print('Could not submit bug at the moment, please try again later: %s' % e)
def main(args=None, setup_logging=True):
@@ -17,4 +17,4 @@
# You should have received a copy of the Lesser GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
__version__ = '0.10.4.dev0'
__version__ = '0.11.0'
@@ -135,8 +135,14 @@ class SameKeyValidator(object):
self.validator_function = validator_function
def validate(self, prop, string, node, match, entry_start, entry_end):
path_nodes = [path_node for path_node in node.ancestors if path_node.category == 'path']
if path_nodes:
path_node = path_nodes[0]
else:
path_node = node.root
for key in prop.keys:
for same_value_leaf in node.root.leaves_containing(key):
for same_value_leaf in path_node.leaves_containing(key):
ret = self.validator_function(same_value_leaf, key, prop, string, node, match, entry_start, entry_end)
if ret is not None:
return ret
@@ -144,6 +150,9 @@ class SameKeyValidator(object):
class OnlyOneValidator(SameKeyValidator):
"""
Check that there's only one occurrence of key for current directory
"""
def __init__(self):
super(OnlyOneValidator, self).__init__(lambda same_value_leaf, key, prop, string, node, match, entry_start, entry_end: False)
@@ -153,12 +162,16 @@ class DefaultValidator(object):
def validate(self, prop, string, node, match, entry_start, entry_end):
span = _get_span(prop, match)
span = _trim_span(span, string[span[0]:span[1]])
return DefaultValidator.validate_string(string, span, entry_start, entry_end)
@staticmethod
def validate_string(string, span, entry_start=None, entry_end=None):
start, end = span
sep_start = start <= 0 or string[start - 1] in sep
sep_end = end >= len(string) or string[end] in sep
start_by_other = start in entry_end
end_by_other = end in entry_start
start_by_other = start in entry_end if entry_end else False
end_by_other = end in entry_start if entry_start else False
if (sep_start or start_by_other) and (sep_end or end_by_other):
return True
return False
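The new `validate_string` static method accepts a match only when both ends of its span sit on a separator, on a string boundary, or directly against another entry. A self-contained sketch of that rule follows. The `SEPARATORS` constant is an assumption standing in for guessit's `sep`, whose exact contents are not shown in this diff, and `is_delimited` is a hypothetical name.

```python
SEPARATORS = ' ._-'  # assumption: a stand-in for guessit.patterns.sep

def is_delimited(string, span, entry_start=None, entry_end=None):
    """Return True if span is bounded by separators or by neighbouring entries."""
    start, end = span
    sep_start = start <= 0 or string[start - 1] in SEPARATORS
    sep_end = end >= len(string) or string[end] in SEPARATORS
    # entry_end / entry_start hold the positions where other entries end / start;
    # both are optional, which is what the change above introduces.
    start_by_other = start in entry_end if entry_end else False
    end_by_other = end in entry_start if entry_start else False
    return (sep_start or start_by_other) and (sep_end or end_by_other)
```

For example, `is_delimited('show.720p.mkv', (5, 9))` accepts `720p` because it has a dot on each side, while the same token glued to the preceding word is rejected unless another entry ends exactly where it starts.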
@@ -235,6 +248,13 @@ class NeighborValidator(DefaultValidator):
return False
class FullMatchValidator(DefaultValidator):
"""Make sure the node matches fully"""
def validate(self, prop, string, node, match, entry_start, entry_end):
at_start, at_end = _get_positions(prop, string, node, match, entry_start, entry_end)
return at_start and at_end
class LeavesValidator(DefaultValidator):
def __init__(self, lambdas=None, previous_lambdas=None, next_lambdas=None, both_side=False, default_=True):
@@ -290,7 +310,7 @@ class LeavesValidator(DefaultValidator):
class _Property:
"""Represents a property configuration."""
def __init__(self, keys=None, pattern=None, canonical_form=None, canonical_from_pattern=True, confidence=1.0, enhance=True, global_span=False, validator=DefaultValidator(), formatter=None, disabler=None, confidence_lambda=None):
def __init__(self, keys=None, pattern=None, canonical_form=None, canonical_from_pattern=True, confidence=1.0, enhance=True, global_span=False, validator=DefaultValidator(), formatter=None, disabler=None, confidence_lambda=None, remove_duplicates=False):
"""
:param keys: Keys of the property (format, screenSize, ...)
:type keys: string
@@ -309,6 +329,8 @@ class _Property:
:type validator: :class:`DefaultValidator`
:param formatter: Formatter to use
:type formatter: function
:param remove_duplicates: Keep only the last match if multiple values are found
:type remove_duplicates: bool
"""
if isinstance(keys, list):
self.keys = keys
@@ -335,6 +357,7 @@ class _Property:
self.validator = validator
self.formatter = formatter
self.disabler = disabler
self.remove_duplicates = remove_duplicates
def disabled(self, options):
if self.disabler:
@@ -479,7 +502,8 @@ class PropertiesContainer(object):
entries.append((prop, match))
else:
matches = list(prop.compiled.finditer(string))
duplicate_matches[prop] = matches
if prop.remove_duplicates:
duplicate_matches[prop] = matches
for match in matches:
entries.append((prop, match))
@@ -490,6 +514,9 @@ class PropertiesContainer(object):
if computed_confidence is not None:
prop.confidence = computed_confidence
entries.sort(key=lambda entry: -entry[0].confidence)
# sort entries, from most confident to less confident
if validate:
# compute entries start and ends
for prop, match in entries:
@@ -531,7 +558,7 @@ class PropertiesContainer(object):
del entry_end[end]
for prop, prop_duplicate_matches in duplicate_matches.items():
# Keeping the last valid match.
# Keeping the last valid match only.
# Needed for the.100.109.hdtv-lol.mp4
for duplicate_match in prop_duplicate_matches[:-1]:
entries.remove((prop, duplicate_match))
@@ -561,8 +588,8 @@ class PropertiesContainer(object):
for prop, match in key_entries:
start, end = _get_span(prop, match)
if not best_prop or \
best_prop.confidence < best_prop.confidence or \
best_prop.confidence == best_prop.confidence and \
best_prop.confidence < prop.confidence or \
best_prop.confidence == prop.confidence and \
best_match.span()[1] - best_match.span()[0] < match.span()[1] - match.span()[0]:
best_prop, best_match = prop, match
+10 -10
@@ -287,10 +287,10 @@ def choose_int(g1, g2):
if v1 == v2:
return v1, 1 - (1 - c1) * (1 - c2)
else:
if c1 > c2:
return v1, c1 - c2
if c1 >= c2:
return v1, c1 - c2 / 2
else:
return v2, c2 - c1
return v2, c2 - c1 / 2
def choose_string(g1, g2):
@@ -308,7 +308,7 @@ def choose_string(g1, g2):
prepended to it.
>>> s(choose_string(('Hello', 0.75), ('World', 0.5)))
('Hello', 0.25)
('Hello', 0.5)
>>> s(choose_string(('Hello', 0.5), ('hello', 0.5)))
('Hello', 0.75)
@@ -354,10 +354,10 @@ def choose_string(g1, g2):
# in case of conflict, return the one with highest confidence
else:
if c1 > c2:
return v1, c1 - c2
if c1 >= c2:
return v1, c1 - c2 / 2
else:
return v2, c2 - c1
return v2, c2 - c1 / 2
def _merge_similar_guesses_nocheck(guesses, prop, choose):
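The change to `choose_int` and `choose_string` softens the penalty applied when two guesses disagree: the winner now loses only half of the loser's confidence, and ties go to the first guess. The sketch below isolates that arithmetic. `combine_confidence` is a hypothetical name, and it omits the string normalisation that `choose_string` performs before comparing values.

```python
def combine_confidence(g1, g2):
    """Combine two (value, confidence) guesses using the updated rule (illustrative)."""
    (v1, c1), (v2, c2) = g1, g2
    if v1 == v2:
        # Agreement: treat the confidences as independent evidence.
        return v1, 1 - (1 - c1) * (1 - c2)
    # Conflict: keep the more confident value, minus half of the other's confidence.
    if c1 >= c2:
        return v1, c1 - c2 / 2
    return v2, c2 - c1 / 2
```

With the old rule, `('Hello', 0.75)` against `('World', 0.5)` yielded a confidence of 0.25; with the new rule it yields 0.5, which matches the updated doctest above.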
@@ -474,8 +474,8 @@ def merge_all(guesses, append=None):
# delete very unlikely values
for p in list(result.keys()):
if result.confidence(p) < 0.05:
del result[p]
if result.confidence(p) < 0.05:
del result[p]
# make sure our appendable properties contain unique values
for prop in append:
@@ -509,7 +509,7 @@ def smart_merge(guesses):
for string_part in ('title', 'series', 'container', 'format',
'releaseGroup', 'website', 'audioCodec',
'videoCodec', 'screenSize', 'episodeFormat',
'audioChannels', 'idNumber'):
'audioChannels', 'idNumber', 'container'):
merge_similar_guesses(guesses, string_part, choose_string)
# 2- merge the rest, potentially discarding information not properly
@@ -173,8 +173,9 @@ LNG_COMMON_WORDS = frozenset([
'is', 'it', 'am', 'mad', 'men', 'man', 'run', 'sin', 'st', 'to',
'no', 'non', 'war', 'min', 'new', 'car', 'day', 'bad', 'bat', 'fan',
'fry', 'cop', 'zen', 'gay', 'fat', 'one', 'cherokee', 'got', 'an', 'as',
'cat', 'her', 'be', 'hat', 'sun', 'may', 'my', 'mr', 'rum', 'pi', 'bb', 'bt',
'tv', 'aw', 'by', 'md', 'mp', 'cd', 'lt', 'gt', 'in', 'ad', 'ice', 'ay', 'at',
'cat', 'her', 'be', 'hat', 'sun', 'may', 'my', 'mr', 'rum', 'pi', 'bb',
'bt', 'tv', 'aw', 'by', 'md', 'mp', 'cd', 'lt', 'gt', 'in', 'ad', 'ice',
'ay', 'at', 'star', 'so',
# french words
'bas', 'de', 'le', 'son', 'ne', 'ca', 'ce', 'et', 'que',
'mal', 'est', 'vol', 'or', 'mon', 'se', 'je', 'tu', 'me',
@@ -185,7 +186,7 @@ LNG_COMMON_WORDS = frozenset([
'la', 'el', 'del', 'por', 'mar', 'al',
# other
'ind', 'arw', 'ts', 'ii', 'bin', 'chan', 'ss', 'san', 'oss', 'iii',
'vi', 'ben', 'da', 'lt', 'ch',
'vi', 'ben', 'da', 'lt', 'ch', 'sr', 'ps', 'cx',
# new from babelfish
'mkv', 'avi', 'dmd', 'the', 'dis', 'cut', 'stv', 'des', 'dia', 'and',
'cab', 'sub', 'mia', 'rim', 'las', 'une', 'par', 'srt', 'ano', 'toy',
@@ -197,7 +198,7 @@ LNG_COMMON_WORDS = frozenset([
'bs', # Bosnian
'kz',
# countries
'gt', 'lt',
'gt', 'lt', 'im',
# part/pt
'pt'
])
@@ -206,9 +207,11 @@ LNG_COMMON_WORDS_STRICT = frozenset(['brazil'])
subtitle_prefixes = ['sub', 'subs', 'st', 'vost', 'subforced', 'fansub', 'hardsub']
subtitle_suffixes = ['subforced', 'fansub', 'hardsub']
subtitle_suffixes = ['subforced', 'fansub', 'hardsub', 'sub', 'subs']
lang_prefixes = ['true']
all_lang_prefixes_suffixes = subtitle_prefixes + subtitle_suffixes + lang_prefixes
def find_possible_languages(string, allowed_languages=None):
"""Find possible languages in the string
@@ -239,7 +242,7 @@ def find_possible_languages(string, allowed_languages=None):
for prefix in lang_prefixes:
if lang_word.startswith(prefix):
lang_word = lang_word[len(prefix):]
if lang_word not in common_words:
if lang_word not in common_words and word.lower() not in common_words:
try:
lang = Language.fromguessit(lang_word)
if allowed_languages:
+73 -67
@@ -215,94 +215,100 @@ def log_found_guess(guess, logger=None):
(k, v, guess.raw(k), guess.confidence(k)))
def _get_split_spans(node, span):
partition_spans = node.get_partition_spans(span)
for to_remove_span in partition_spans:
if to_remove_span[0] == span[0] and to_remove_span[1] in [span[1], span[1] + 1]:
partition_spans.remove(to_remove_span)
break
return partition_spans
class GuessFinder(object):
def __init__(self, guess_func, confidence=None, logger=None, options=None):
self.guess_func = guess_func
self.confidence = confidence
self.logger = logger or log
self.options = options
self.options = options or {}
def process_nodes(self, nodes):
for node in nodes:
self.process_node(node)
def process_node(self, node, iterative=True, partial_span=None):
def process_node(self, node, iterative=True, partial_span=None, skip_nodes=True):
if skip_nodes and not isinstance(skip_nodes, list):
skip_nodes = self.options.get('skip_nodes')
elif not isinstance(skip_nodes, list):
skip_nodes = []
if partial_span:
value = node.value[partial_span[0]:partial_span[1]]
else:
value = node.value
string = ' %s ' % value # add sentinels
if not self.options:
matcher_result = self.guess_func(string, node)
matcher_result = self.guess_func(string, node, self.options)
if not matcher_result:
return
if not isinstance(matcher_result, Guess):
result, span = matcher_result
else:
matcher_result = self.guess_func(string, node, self.options)
result, span = matcher_result, matcher_result.metadata().span
#log.error('span2 %s' % (span,))
if matcher_result:
if not isinstance(matcher_result, Guess):
result, span = matcher_result
else:
result, span = matcher_result, matcher_result.metadata().span
if not result:
return
if result:
# readjust span to compensate for sentinels
span = (span[0] - 1, span[1] - 1)
if span[1] == len(string):
# somehow, the sentinel got included in the span. Remove it
span = (span[0], span[1] - 1)
# readjust span to compensate for partial_span
if partial_span:
span = (span[0] + partial_span[0], span[1] + partial_span[0])
# readjust span to compensate for sentinels
span = (span[0] - 1, span[1] - 1)
partition_spans = None
if self.options and 'skip_nodes' in self.options:
skip_nodes = self.options.get('skip_nodes')
for skip_node in skip_nodes:
if skip_node.parent.node_idx == node.node_idx[:len(skip_node.parent.node_idx)] and\
skip_node.span == span or\
skip_node.span == (span[0] + skip_node.offset, span[1] + skip_node.offset):
if partition_spans is None:
partition_spans = _get_split_spans(node, skip_node.span)
else:
new_partition_spans = []
for partition_span in partition_spans:
tmp_node = MatchTree(value, span=partition_span, parent=node)
tmp_partitions_spans = _get_split_spans(tmp_node, skip_node.span)
new_partition_spans.extend(tmp_partitions_spans)
partition_spans.extend(new_partition_spans)
# readjust span to compensate for partial_span
if partial_span:
span = (span[0] + partial_span[0], span[1] + partial_span[0])
if not partition_spans:
# restore sentinels compensation
if skip_nodes:
skip_nodes = [skip_node for skip_node in self.options.get('skip_nodes') if skip_node.parent.span[0] == node.span[0] or skip_node.parent.span[1] == node.span[1]]
# if we guessed a node that we need to skip, recurse down the tree and ignore that node
indices = set()
skip_nodes_spans = []
next_skip_nodes = []
for skip_node in skip_nodes:
skip_for_next = False
skip_nodes_spans.append(skip_node.span)
if node.offset <= skip_node.span[0] <= node.span[1]:
indices.add(skip_node.span[0] - node.offset)
skip_for_next = True
if node.offset <= skip_node.span[1] <= node.span[1]:
indices.add(skip_node.span[1] - node.offset)
skip_for_next = True
if not skip_for_next:
next_skip_nodes.append(skip_node)
if indices:
partition_spans = [s for s in node.get_partition_spans(indices) if s not in skip_nodes_spans]
for partition_span in partition_spans:
relative_span = (partition_span[0] - node.offset, partition_span[1] - node.offset)
self.process_node(node, partial_span=relative_span, skip_nodes=next_skip_nodes)
return
if isinstance(result, Guess):
guess = result
else:
guess = Guess(result, confidence=self.confidence, input=string, span=span)
# restore sentinels compensation
if isinstance(result, Guess):
guess = result
else:
no_sentinel_string = string[1:-1]
guess = Guess(result, confidence=self.confidence, input=no_sentinel_string, span=span)
if not iterative:
found_guess(node, guess, logger=self.logger)
else:
absolute_span = (span[0] + node.offset, span[1] + node.offset)
node.partition(span)
found_child = None
for child in node.children:
if child.span == absolute_span:
# if we have a match on one of our children, mark it as such...
found_guess(child, guess, logger=self.logger)
found_child = child
break
# ...and only then recurse on the other children
for child in node.children:
if child is not found_child:
self.process_node(child)
if not iterative:
found_guess(node, guess, logger=self.logger)
else:
absolute_span = (span[0] + node.offset, span[1] + node.offset)
node.partition(span)
if node.is_leaf():
found_guess(node, guess, logger=self.logger)
else:
found_child = None
for child in node.children:
if child.span == absolute_span:
found_guess(child, guess, logger=self.logger)
found_child = child
break
for child in node.children:
if child is not found_child:
self.process_node(child)
else:
for partition_span in partition_spans:
self.process_node(node, partial_span=partition_span)
+82 -18
@@ -27,9 +27,7 @@ import guessit # @UnusedImport needed for doctests
from guessit import UnicodeMixin, base_text_type
from guessit.textutils import clean_default, str_fill
from guessit.patterns import group_delimiters
from guessit.guess import (smart_merge,
Guess)
from guessit.guess import smart_merge, Guess
log = logging.getLogger(__name__)
@@ -75,7 +73,7 @@ class BaseMatchTree(UnicodeMixin):
(as shown by the ``f``'s on the last-but-one line).
"""
def __init__(self, string='', span=None, parent=None, clean_function=None):
def __init__(self, string='', span=None, parent=None, clean_function=None, category=None):
self.string = string
self.span = span or (0, len(string))
self.parent = parent
@@ -83,6 +81,7 @@ class BaseMatchTree(UnicodeMixin):
self.guess = Guess()
self._clean_value = None
self._clean_function = clean_function or clean_default
self.category = category
@property
def value(self):
@@ -116,6 +115,32 @@ class BaseMatchTree(UnicodeMixin):
return result
@property
def raw(self):
result = {}
for guess in self.guesses:
for k in guess.keys():
result[k] = guess.raw(k)
return result
@property
def guesses(self):
"""
List all guesses, including children ones.
:return: list of guesses objects
"""
result = []
if self.guess:
result.append(self.guess)
for c in self.children:
result.extend(c.guesses)
return result
@property
def root(self):
"""Return the root node of the tree."""
@@ -124,6 +149,23 @@ class BaseMatchTree(UnicodeMixin):
return self.parent.root
@property
def ancestors(self):
"""
Retrieve all ancestors, from this node to root node.
:return: a list of MatchTree objects
"""
ret = [self]
if not self.parent:
return ret
parent_ancestors = self.parent.ancestors
ret.extend(parent_ancestors)
return ret
@property
def depth(self):
"""Return the depth of this node."""
@@ -136,17 +178,30 @@ class BaseMatchTree(UnicodeMixin):
"""Return whether this node is a leaf or not."""
return self.children == []
def add_child(self, span):
"""Add a new child node to this node with the given span."""
child = MatchTree(self.string, span=span, parent=self, clean_function=self._clean_function)
def add_child(self, span, category=None):
"""Add a new child node to this node with the given span.
:param span: span of the new MatchTree
:param category: category of the new MatchTree
:return: A new MatchTree instance having self as a parent
"""
child = MatchTree(self.string, span=span, parent=self, clean_function=self._clean_function, category=category)
self.children.append(child)
return child
def get_partition_spans(self, indices):
"""Return the list of absolute spans for the regions of the original
string defined by splitting this node at the given indices (relative
to this node)"""
to this node)
:param indices: indices of the partition spans
:return: a list of tuple of the spans
"""
indices = sorted(indices)
if indices[-1] > len(self.value):
log.error('Filename: {}'.format(self.string))
log.error('Invalid call to get_partitions_spans, indices are too high: {}, len({}) == {:d}'
.format(indices, self.value, len(self.value)))
if indices[0] != 0:
indices.insert(0, 0)
if indices[-1] != len(self.value):
@@ -155,23 +210,33 @@ class BaseMatchTree(UnicodeMixin):
spans = []
for start, end in zip(indices[:-1], indices[1:]):
spans.append((self.offset + start,
self.offset + end))
self.offset + end))
return spans
def partition(self, indices):
def partition(self, indices, category=None):
"""Partition this node by splitting it at the given indices,
relative to this node."""
for partition_span in self.get_partition_spans(indices):
self.add_child(span=partition_span)
relative to this node.
def split_on_components(self, components):
:param indices: indices of the partition spans
:param category: category of the new MatchTree
:return: a list of created MatchTree instances
"""
created = []
for partition_span in self.get_partition_spans(indices):
created.append(self.add_child(span=partition_span, category=category))
return created
def split_on_components(self, components, category=None):
offset = 0
created = []
for c in components:
start = self.value.find(c, offset)
end = start + len(c)
self.add_child(span=(self.offset + start,
self.offset + end))
created.append(self.add_child(span=(self.offset + start,
self.offset + end), category=category))
offset = end
return created
def nodes_at_depth(self, depth):
"""Return all the nodes at a given depth in the tree"""
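`get_partition_spans` turns a set of split points, relative to a node, into absolute spans covering the whole node. The standalone sketch below reproduces that logic outside the tree class. The function name and the explicit `offset` and `length` parameters are hypothetical, and the step that appends the node length is an assumption, since the hunk above cuts off just before it.

```python
def partition_spans(offset, length, indices):
    """Split a node of the given length at relative indices; return absolute spans."""
    indices = sorted(indices)
    if indices[0] != 0:
        indices.insert(0, 0)
    if indices[-1] != length:
        # Assumption: mirror the start-of-node handling at the end of the node.
        indices.append(length)
    return [(offset + start, offset + end)
            for start, end in zip(indices[:-1], indices[1:])]
```

For a node starting at offset 10 with a value of length 8, splitting at relative indices 3 and 5 gives three consecutive spans that cover the node without gaps.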
@@ -208,7 +273,7 @@ class BaseMatchTree(UnicodeMixin):
raise ValueError('Non-existent node index: %s' % (idx,))
def nodes(self):
"""Return all the nodes and subnodes in this tree."""
"""Return a generator of all nodes and subnodes in this tree."""
yield self
for child in self.children:
for node in child.nodes():
@@ -220,7 +285,6 @@ class BaseMatchTree(UnicodeMixin):
yield self
else:
for child in self.children:
# pylint: disable=W0212
for leaf in child.leaves():
yield leaf
@@ -29,4 +29,4 @@ info_exts = ['nfo']
video_exts = ['3g2', '3gp', '3gp2', 'asf', 'avi', 'divx', 'flv', 'm4v', 'mk2',
'mka', 'mkv', 'mov', 'mp4', 'mp4a', 'mpeg', 'mpg', 'ogg', 'ogm',
'ogv', 'qt', 'ra', 'ram', 'rm', 'ts', 'wav', 'webm', 'wma', 'wmv',
'iso']
'iso', 'vob']
@@ -0,0 +1,80 @@
import re
from guessit.patterns import sep, build_or_pattern
from guessit.patterns.numeral import parse_numeral
range_separators = ['-', 'to', 'a']
discrete_separators = ['&', 'and', 'et']
excluded_separators = ['.'] # Dot cannot serve as a discrete_separator
discrete_sep = sep
for range_separator in range_separators:
discrete_sep = discrete_sep.replace(range_separator, '')
for excluded_separator in excluded_separators:
discrete_sep = discrete_sep.replace(excluded_separator, '')
discrete_separators.append(discrete_sep)
all_separators = list(range_separators)
all_separators.extend(discrete_separators)
range_separators_re = re.compile(build_or_pattern(range_separators), re.IGNORECASE)
discrete_separators_re = re.compile(build_or_pattern(discrete_separators), re.IGNORECASE)
all_separators_re = re.compile(build_or_pattern(all_separators), re.IGNORECASE)
def list_parser(value, property_list_name, discrete_separators_re=discrete_separators_re, range_separators_re=range_separators_re, allow_discrete=False, fill_gaps=False):
discrete_elements = filter(lambda x: x != '', discrete_separators_re.split(value))
discrete_elements = [x.strip() for x in discrete_elements]
proper_discrete_elements = []
i = 0
while i < len(discrete_elements):
if i < len(discrete_elements) - 2 and range_separators_re.match(discrete_elements[i+1]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i+1] + discrete_elements[i+2])
i += 3
else:
match = range_separators_re.search(discrete_elements[i])
if match and match.start() == 0:
proper_discrete_elements[i - 1] += discrete_elements[i]
elif match and match.end() == len(discrete_elements[i]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i + 1])
else:
proper_discrete_elements.append(discrete_elements[i])
i += 1
discrete_elements = proper_discrete_elements
ret = []
for discrete_element in discrete_elements:
range_values = filter(lambda x: x != '', range_separators_re.split(discrete_element))
range_values = [x.strip() for x in range_values]
if len(range_values) > 1:
for x in range(0, len(range_values) - 1):
start_range_ep = parse_numeral(range_values[x])
end_range_ep = parse_numeral(range_values[x+1])
for range_ep in range(start_range_ep, end_range_ep + 1):
if range_ep not in ret:
ret.append(range_ep)
else:
discrete_value = parse_numeral(discrete_element)
if discrete_value not in ret:
ret.append(discrete_value)
if len(ret) > 1:
if not allow_discrete:
valid_ret = list()
# replace discrete elements by ranges
valid_ret.append(ret[0])
for i in range(0, len(ret) - 1):
previous = valid_ret[len(valid_ret) - 1]
if ret[i+1] < previous:
pass
else:
valid_ret.append(ret[i+1])
ret = valid_ret
if fill_gaps:
ret = list(range(min(ret), max(ret) + 1))
if len(ret) > 1:
return {None: ret[0], property_list_name: ret}
if len(ret) > 0:
return ret[0]
return None
@@ -19,11 +19,14 @@
#
from __future__ import absolute_import, division, print_function, unicode_literals
from functools import wraps
import logging
import sys
import os
log = logging.getLogger(__name__)
GREEN_FONT = "\x1B[0;32m"
YELLOW_FONT = "\x1B[0;33m"
BLUE_FONT = "\x1B[0;34m"
@@ -87,3 +90,27 @@ def setup_logging(colored=True, with_time=False, with_thread=False, filename=Non
ch.setFormatter(SimpleFormatter(with_time, with_thread))
logging.getLogger().addHandler(ch)
def trace_func_call(f):
@wraps(f)
def wrapper(*args, **kwargs):
is_method = (f.__name__ != f.__qualname__) # method is still not bound, we need to get around it
if is_method:
no_self_args = args[1:]
else:
no_self_args = args
args_str = ', '.join(repr(arg) for arg in no_self_args)
kwargs_str = ', '.join('{}={}'.format(k, v) for k, v in kwargs.items())
if not args_str:
args_str = kwargs_str
elif not kwargs_str:
args_str = args_str
else:
args_str = '{}, {}'.format(args_str, kwargs_str)
log.debug('Calling {}({})'.format(f.__name__, args_str))
return f(*args, **kwargs)
return wrapper
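A usage sketch for a decorator of this kind, written as a standalone variant rather than a copy of the hunk above. The logger name `trace_example` and the function `add` are assumptions for illustration. Two caveats apply: `__qualname__` only exists on Python 3.3 and later, so this version falls back to `__name__`, and the `__name__ != __qualname__` heuristic also treats nested functions as methods.

```python
import logging
from functools import wraps

log = logging.getLogger('trace_example')  # hypothetical logger name


def trace_func_call(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        # Fall back to __name__ where __qualname__ is unavailable (Python 2).
        is_method = f.__name__ != getattr(f, '__qualname__', f.__name__)
        shown_args = args[1:] if is_method else args  # hide self for methods
        parts = [repr(arg) for arg in shown_args]
        parts.extend('{}={!r}'.format(k, v) for k, v in kwargs.items())
        log.debug('Calling %s(%s)', f.__name__, ', '.join(parts))
        return f(*args, **kwargs)
    return wrapper


@trace_func_call
def add(a, b=0):
    return a + b


logging.basicConfig(level=logging.DEBUG)
result = add(1, b=2)  # emits a debug record describing the call
```

Because of `@wraps`, the decorated function keeps its original `__name__`, which is what the log message relies on.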
@@ -525,3 +525,29 @@
screenSize: 720p
season: 5
series: Game of Thrones
? Parks and Recreation - [04x12] - Ad Campaign.avi
: type: episode
series: Parks and Recreation
season: 4
episodeNumber: 12
title: Ad Campaign
? Star Trek Into Darkness (2013)/star.trek.into.darkness.2013.720p.web-dl.h264-publichd.mkv
: type: movie
title: Star Trek Into Darkness
year: 2013
screenSize: 720p
format: WEB-DL
videoCodec: h264
releaseGroup: PublicHD
? /var/medias/series/The Originals/Season 02/The.Originals.S02E15.720p.HDTV.X264-DIMENSION.mkv
: type: episode
series: The Originals
season: 2
episodeNumber: 15
screenSize: 720p
format: HDTV
videoCodec: h264
releaseGroup: DIMENSION
@@ -282,12 +282,6 @@
episodeNumber: 1
title: The Impossible Astronaut
? Parks and Recreation - [04x12] - Ad Campaign.avi
: series: Parks and Recreation
season: 4
episodeNumber: 12
title: Ad Campaign
? The Sopranos - [05x07] - In Camelot.mp4
: series: The Sopranos
season: 5
@@ -635,7 +629,7 @@
format: HDTV
releaseGroup: lol
? 03-Criminal.Minds.5x03.Reckoner.ENG.-.sub.FR.HDTV.XviD-STi.[tvu.org.ru].avi
? Criminal.Minds.5x03.Reckoner.ENG.-.sub.FR.HDTV.XviD-STi.[tvu.org.ru].avi
: series: Criminal Minds
language: English
subtitleLanguage: French
@@ -1186,3 +1180,684 @@
videoCodec: h264
releaseGroup: BS
format: WEB-DL
? How to Make It in America - S02E06 - I'm Sorry, Who's Yosi?.mkv
: series: How to Make It in America
season: 2
episodeNumber: 6
title: I'm Sorry, Who's Yosi?
? 24.S05E07.FRENCH.DVDRip.XviD-FiXi0N.avi
: episodeNumber: 7
format: DVD
language: fr
season: 5
series: '24'
videoCodec: XviD
releaseGroup: FiXi0N
? 12.Monkeys.S01E12.FRENCH.BDRip.x264-VENUE.mkv
: episodeNumber: 12
format: BluRay
language: fr
releaseGroup: VENUE
season: 1
series: 12 Monkeys
videoCodec: h264
? The.Daily.Show.2015.07.01.Kirsten.Gillibrand.Extended.720p.CC.WEBRip.AAC2.0.x264-BTW.mkv
: audioChannels: '2.0'
audioCodec: AAC
date: 2015-07-01
format: WEBRip
other: CC
releaseGroup: BTW
screenSize: 720p
series: The Daily Show
title: Kirsten Gillibrand Extended
videoCodec: h264
? The.Daily.Show.2015.07.02.Sarah.Vowell.CC.WEBRip.AAC2.0.x264-BTW.mkv
: audioChannels: '2.0'
audioCodec: AAC
date: 2015-07-02
format: WEBRip
other: CC
releaseGroup: BTW
series: The Daily Show
title: Sarah Vowell
videoCodec: h264
? 90.Day.Fiance.S02E07.I.Have.To.Tell.You.Something.720p.HDTV.x264-W4F
: options: -n
episodeNumber: 7
format: HDTV
screenSize: 720p
season: 2
series: 90 Day Fiance
title: I Have To Tell You Something
? Doctor.Who.2005.S04E06.FRENCH.LD.DVDRip.XviD-TRACKS.avi
: episodeNumber: 6
format: DVD
language: fr
releaseGroup: TRACKS
season: 4
series: Doctor Who
other: LD
videoCodec: XviD
year: 2005
? Astro.Le.Petit.Robot.S01E01+02.FRENCH.DVDRiP.X264.INT-BOOLZ.mkv
: episodeNumber: 1
episodeList: [1, 2]
format: DVD
language: fr
releaseGroup: INT-BOOLZ
season: 1
series: Astro Le Petit Robot
videoCodec: h264
? Annika.Bengtzon.2012.E01.Le.Testament.De.Nobel.FRENCH.DVDRiP.XViD-STVFRV.avi
: episodeNumber: 1
format: DVD
language: fr
releaseGroup: STVFRV
series: Annika Bengtzon
title: Le Testament De Nobel
videoCodec: XviD
year: 2012
? Dead.Set.02.FRENCH.LD.DVDRip.XviD-EPZ.avi
: episodeNumber: 2
format: DVD
language: fr
other: LD
releaseGroup: EPZ
series: Dead Set
videoCodec: XviD
? Phineas and Ferb S01E00 & S01E01 & S01E02
: options: -n
episodeList:
- 0
- 1
- 2
episodeNumber: 0
season: 1
series: Phineas and Ferb
? Show.Name.S01E02.S01E03.HDTV.XViD.Etc-Group
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - S01E02 - S01E03 - S01E04 - Ep Name
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
season: 1
series: Show Name
title: Ep Name
? Show.Name.1x02.1x03.HDTV.XViD.Etc-Group
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - 1x02 - 1x03 - 1x04 - Ep Name
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
season: 1
series: Show Name
title: Ep Name
? Show.Name.S01E02.HDTV.XViD.Etc-Group
: options: -n
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - S01E02 - My Ep Name
: options: -n
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show Name - S01.E03 - My Ep Name
: options: -n
episodeNumber: 3
season: 1
series: Show Name
title: My Ep Name
? Show.Name.S01E02E03.HDTV.XViD.Etc-Group
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - S01E02-03 - My Ep Name
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show.Name.S01.E02.E03
: options: -n
episodeList:
- 2
- 3
episodeNumber: 2
season: 1
series: Show Name
? Show_Name.1x02.HDTV_XViD_Etc-Group
: options: -n
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - 1x02 - My Ep Name
: options: -n
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show_Name.1x02x03x04.HDTV_XViD_Etc-Group
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show Name - 1x02-03-04 - My Ep Name
: options: -n
episodeList:
- 2
- 3
- 4
episodeNumber: 2
season: 1
series: Show Name
title: My Ep Name
? Show.Name.100.Event.2010.11.23.HDTV.XViD.Etc-Group
: options: -n
date: 2010-11-23
episodeNumber: 100
format: HDTV
releaseGroup: Etc-Group
series: Show Name
title: Event
videoCodec: XviD
? Show.Name.2010.11.23.HDTV.XViD.Etc-Group
: options: -n
date: 2010-11-23
format: HDTV
releaseGroup: Etc-Group
series: Show Name
? Show Name - 2010-11-23 - Ep Name
: options: -n
date: 2010-11-23
series: Show Name
title: Ep Name
? Show Name Season 1 Episode 2 Ep Name
: options: -n
episodeNumber: 2
season: 1
series: Show Name
title: Ep Name
? Show.Name.S01.HDTV.XViD.Etc-Group
: options: -n
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? Show.Name.E02-03
: options: -n
episodeNumber: 2
episodeList:
- 2
- 3
series: Show Name
? Show.Name.E02.2010
: options: -n
episodeNumber: 2
year: 2010
series: Show Name
? Show.Name.E23.Test
: options: -n
episodeNumber: 23
series: Show Name
title: Test
? Show.Name.Part.3.HDTV.XViD.Etc-Group
: options: -n -t episode
part: 3
series: Show Name
format: HDTV
videoCodec: XviD
releaseGroup: Etc-Group
? Show.Name.Part.1.and.Part.2.Blah-Group
: options: -n -t episode
part: 1
partList:
- 1
- 2
series: Show Name
? Show Name - 01 - Ep Name
: options: -n
episodeNumber: 1
series: Show Name
title: Ep Name
? 01 - Ep Name
: options: -n
episodeNumber: 1
series: Ep Name
? Show.Name.102.HDTV.XViD.Etc-Group
: options: -n
episodeNumber: 2
format: HDTV
releaseGroup: Etc-Group
season: 1
series: Show Name
videoCodec: XviD
? '[HorribleSubs] Maria the Virgin Witch - 01 [720p].mkv'
: episodeNumber: 1
releaseGroup: HorribleSubs
screenSize: 720p
series: Maria the Virgin Witch
? '[ISLAND]One_Piece_679_[VOSTFR]_[V1]_[8bit]_[720p]_[EB7838FC].mp4'
: options: -E
crc32: EB7838FC
episodeNumber: 679
releaseGroup: ISLAND
screenSize: 720p
series: One Piece
subtitleLanguage: fr
videoProfile: 8bit
version: 1
? '[ISLAND]One_Piece_679_[VOSTFR]_[8bit]_[720p]_[EB7838FC].mp4'
: options: -E
crc32: EB7838FC
episodeNumber: 679
releaseGroup: ISLAND
screenSize: 720p
series: One Piece
subtitleLanguage: fr
videoProfile: 8bit
? '[Kaerizaki-Fansub]_One_Piece_679_[VOSTFR][HD_1280x720].mp4'
: options: -E
episodeNumber: 679
other: HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: One Piece
subtitleLanguage: fr
? '[Kaerizaki-Fansub]_One_Piece_679_[VOSTFR][FANSUB][HD_1280x720].mp4'
: options: -E
episodeNumber: 679
other:
- Fansub
- HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: One Piece
subtitleLanguage: fr
? '[Kaerizaki-Fansub]_One_Piece_681_[VOSTFR][HD_1280x720]_V2.mp4'
: options: -E
episodeNumber: 681
other: HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: One Piece
subtitleLanguage: fr
version: 2
? '[Kaerizaki-Fansub] High School DxD New 04 VOSTFR HD (1280x720) V2.mp4'
: options: -E
episodeNumber: 4
other: HD
releaseGroup: Kaerizaki-Fansub
screenSize: 720p
series: High School DxD New
subtitleLanguage: fr
version: 2
? '[Kaerizaki-Fansub] One Piece 603 VOSTFR PS VITA (960x544) V2.mp4'
: options: -E
episodeNumber: 603
releaseGroup: Kaerizaki-Fansub
screenSize: 960x544
series: One Piece
subtitleLanguage: fr
version: 2
? '[Group Name] Show Name.13'
: options: -n
episodeNumber: 13
releaseGroup: Group Name
series: Show Name
? '[Group Name] Show Name - 13'
: options: -n
episodeNumber: 13
releaseGroup: Group Name
series: Show Name
? '[Group Name] Show Name 13'
: options: -n
episodeNumber: 13
releaseGroup: Group Name
series: Show Name
# [Group Name] Show Name.13-14
# [Group Name] Show Name - 13-14
# Show Name 13-14
? '[Stratos-Subs]_Infinite_Stratos_-_12_(1280x720_H.264_AAC)_[379759DB]'
: options: -n
audioCodec: AAC
crc32: 379759DB
episodeNumber: 12
releaseGroup: Stratos-Subs
screenSize: 720p
series: Infinite Stratos
videoCodec: h264
# [ShinBunBu-Subs] Bleach - 02-03 (CX 1280x720 x264 AAC)
? '[SGKK] Bleach 312v1 [720p/MKV]'
: options: -n
episodeNumber: 312
releaseGroup: SGKK
screenSize: 720p
series: Bleach
version: 1
? '[Ayako]_Infinite_Stratos_-_IS_-_07_[H264][720p][EB7838FC]'
: options: -n
crc32: EB7838FC
episodeNumber: 7
releaseGroup: Ayako
screenSize: 720p
series: Infinite Stratos
videoCodec: h264
? '[Ayako] Infinite Stratos - IS - 07v2 [H264][720p][44419534]'
: options: -n
crc32: '44419534'
episodeNumber: 7
releaseGroup: Ayako
screenSize: 720p
series: Infinite Stratos
videoCodec: h264
version: 2
? '[Ayako-Shikkaku] Oniichan no Koto Nanka Zenzen Suki Janain Dakara ne - 10 [LQ][h264][720p] [8853B21C]'
: options: -n
crc32: 8853B21C
episodeNumber: 10
releaseGroup: Ayako-Shikkaku
screenSize: 720p
series: Oniichan no Koto Nanka Zenzen Suki Janain Dakara ne
videoCodec: h264
# Add support for absolute episodes
? Bleach - s16e03-04 - 313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach.s16e03-04.313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach.s16e03-04.313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach - 313-314
: options: -En
episodeList:
- 313
- 314
episodeNumber: 313
series: Bleach
? Bleach - s16e03-04 - 313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach.s16e03-04.313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? Bleach s16e03e04 313-314
: options: -n
episodeList:
- 3
- 4
episodeNumber: 3
season: 16
series: Bleach
? '[ShinBunBu-Subs] Bleach - 02-03 (CX 1280x720 x264 AAC)'
: audioCodec: AAC
episodeList:
- 2
- 3
episodeNumber: 2
releaseGroup: ShinBunBu-Subs
screenSize: 720p
series: Bleach
videoCodec: h264
? 003. Show Name - Ep Name.ext
: episodeNumber: 3
series: Show Name
title: Ep Name
? 003-004. Show Name - Ep Name.ext
: episodeList:
- 3
- 4
episodeNumber: 3
series: Show Name
title: Ep Name
? One Piece - 102
: options: -n -t episode
episodeNumber: 2
season: 1
series: One Piece
? "[ACX]_Wolf's_Spirit_001.mkv"
: episodeNumber: 1
releaseGroup: ACX
series: "Wolf's Spirit"
? Project.Runway.S14E00.and.S14E01.(Eng.Subs).SDTV.x264-[2Maverick].mp4
: episodeList:
- 0
- 1
episodeNumber: 0
format: TV
releaseGroup: 2Maverick
season: 14
series: Project Runway
subtitleLanguage: en
videoCodec: h264
? '[Hatsuyuki-Kaitou]_Fairy_Tail_2_-_16-20_[720p][10bit].torrent'
: episodeList:
- 16
- 17
- 18
- 19
- 20
episodeNumber: 16
releaseGroup: Hatsuyuki-Kaitou
screenSize: 720p
series: Fairy Tail 2
videoProfile: 10bit
? '[Hatsuyuki-Kaitou]_Fairy_Tail_2_-_16-20_(191-195)_[720p][10bit].torrent'
: options: -E
episodeList:
- 16
- 17
- 18
- 19
- 20
episodeNumber: 16
releaseGroup: Hatsuyuki-Kaitou
screenSize: 720p
series: Fairy Tail 2
? "Looney Tunes 1940x01 Porky's Last Stand.mkv"
: episodeNumber: 1
season: 1940
series: Looney Tunes
title: Porky's Last Stand
year: 1940
? The.Good.Wife.S06E01.E10.720p.WEB-DL.DD5.1.H.264-CtrlHD/The.Good.Wife.S06E09.Trust.Issues.720p.WEB-DL.DD5.1.H.264-CtrlHD.mkv
: audioChannels: '5.1'
audioCodec: DolbyDigital
episodeList:
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
episodeNumber: 9
format: WEB-DL
releaseGroup: CtrlHD
screenSize: 720p
season: 6
series: The Good Wife
title: Trust Issues
videoCodec: h264
? Fear the Walking Dead - 01x02 - So Close, Yet So Far.REPACK-KILLERS.French.C.updated.Addic7ed.com.mkv
: episodeNumber: 2
language: fr
other: Proper
properCount: 1
season: 1
series: Fear the Walking Dead
title: So Close, Yet So Far
? Fear the Walking Dead - 01x02 - En Close, Yet En Far.REPACK-KILLERS.French.C.updated.Addic7ed.com.mkv
: episodeNumber: 2
language: fr
other: Proper
properCount: 1
season: 1
series: Fear the Walking Dead
title: En Close, Yet En Far
? /av/unsorted/The.Daily.Show.2015.07.22.Jake.Gyllenhaal.720p.HDTV.x264-BATV.mkv
: date: 2015-07-22
format: HDTV
releaseGroup: BATV
screenSize: 720p
series: The Daily Show
title: Jake Gyllenhaal
videoCodec: h264
@@ -22,7 +22,6 @@ from __future__ import absolute_import, division, print_function, unicode_litera
from collections import defaultdict
from unittest import TestCase, TestLoader
import shlex
import logging
import os
import sys
@@ -86,10 +85,6 @@ class TestGuessit(TestCase):
options = required_fields.pop('options') if 'options' in required_fields else None
if options:
args = shlex.split(options)
options = get_opts().parse_args(args)
options = vars(options)
try:
found = guess_func(filename, options)
except Exception as e:
@@ -606,7 +606,9 @@
? Yves.Saint.Laurent.2013.FRENCH.DVDSCR.MD.XviD-ViVARiUM.avi
: format: DVD
language: French
other: Screener
other:
- MD
- Screener
releaseGroup: ViVARiUM
title: Yves Saint Laurent
videoCodec: XviD
@@ -759,3 +761,19 @@
screenSize: 1080p
title: transformers 2
videoCodec: h265
? 1.Angry.Man.1957.mkv
: title: 1 Angry Man
year: 1957
? 12.Angry.Men.1957.mkv
: title: 12 Angry Men
year: 1957
? 123.Angry.Men.1957.mkv
: title: 123 Angry Men
year: 1957
? "Looney Tunes 1444x866 Porky's Last Stand.mkv"
: screenSize: 1444x866
title: Looney Tunes
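The `1444x866` expectation above and the `1940x01` expectation in the episode tests hinge on the same rule in `ResolutionCollisionValidator`: an `NxM` match is rejected as season/episode only when both numbers reach 100. A minimal sketch of that rule, where `looks_like_season_episode` is a hypothetical helper and not guessit code:

```python
def looks_like_season_episode(season, episode):
    """Accept 'NxM' as season/episode unless both numbers are 100 or more."""
    return season < 100 or episode < 100


# 'Looney Tunes 1940x01' keeps season 1940, episode 1.
print(looks_like_season_episode(1940, 1))
# 'Looney Tunes 1444x866' is left to be matched as a screen size instead.
print(looks_like_season_episode(1444, 866))
```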
@@ -31,10 +31,12 @@ keywords = yaml.load("""
? Xvid PROPER
: videoCodec: Xvid
other: PROPER
properCount: 1
? PROPER-Xvid
: videoCodec: Xvid
other: PROPER
properCount: 1
""")
@@ -19,6 +19,7 @@
#
from __future__ import absolute_import, division, print_function, unicode_literals
from guessit.containers import DefaultValidator
from guessit.plugins.transformers import Transformer
from guessit.matcher import GuessFinder
@@ -41,10 +42,9 @@ class GuessDate(Transformer):
@staticmethod
def guess_date(string, node=None, options=None):
date, span = search_date(string, options.get('date_year_first') if options else False, options.get('date_day_first') if options else False)
if date:
if date and span and DefaultValidator.validate_string(string, span): # ensure we have a separator before and after date
return {'date': date}, span
else:
return None, None
return None, None
def process(self, mtree, options=None):
GuessFinder(self.guess_date, 1.0, self.log, options).process_nodes(mtree.unidentified_leaves())
@@ -24,6 +24,8 @@ from guessit.plugins.transformers import Transformer, get_transformer
from guessit.textutils import reorder_title
from guessit.matcher import found_property
from guessit.patterns.list import all_separators
from guessit.language import all_lang_prefixes_suffixes
class GuessEpisodeInfoFromPosition(Transformer):
@@ -33,39 +35,49 @@ class GuessEpisodeInfoFromPosition(Transformer):
def supported_properties(self):
return ['title', 'series']
def match_from_epnum_position(self, mtree, node, options):
epnum_idx = node.node_idx
@staticmethod
def excluded_word(*values):
for value in values:
if value.clean_value.lower() in (all_separators + all_lang_prefixes_suffixes):
return True
return False
def match_from_epnum_position(self, path_node, ep_node, options):
epnum_idx = ep_node.node_idx
# a few helper functions to be able to filter using high-level semantics
def before_epnum_in_same_pathgroup():
return [leaf for leaf in mtree.unidentified_leaves(lambda x: len(x.clean_value) > 1)
return [leaf for leaf in path_node.unidentified_leaves(lambda x: len(x.clean_value) > 1)
if (leaf.node_idx[0] == epnum_idx[0] and
leaf.node_idx[1:] < epnum_idx[1:])]
leaf.node_idx[1:] < epnum_idx[1:] and
not GuessEpisodeInfoFromPosition.excluded_word(leaf))]
def after_epnum_in_same_pathgroup():
return [leaf for leaf in mtree.unidentified_leaves(lambda x: len(x.clean_value) > 1)
return [leaf for leaf in path_node.unidentified_leaves(lambda x: len(x.clean_value) > 1)
if (leaf.node_idx[0] == epnum_idx[0] and
leaf.node_idx[1:] > epnum_idx[1:])]
leaf.node_idx[1:] > epnum_idx[1:] and
not GuessEpisodeInfoFromPosition.excluded_word(leaf))]
def after_epnum_in_same_explicitgroup():
return [leaf for leaf in mtree.unidentified_leaves(lambda x: len(x.clean_value) > 1)
return [leaf for leaf in path_node.unidentified_leaves(lambda x: len(x.clean_value) > 1)
if (leaf.node_idx[:2] == epnum_idx[:2] and
leaf.node_idx[2:] > epnum_idx[2:])]
leaf.node_idx[2:] > epnum_idx[2:] and
not GuessEpisodeInfoFromPosition.excluded_word(leaf))]
# epnumber is the first group and there are only 2 after it in same
# path group
# -> series title - episode title
title_candidates = self._filter_candidates(after_epnum_in_same_pathgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_pathgroup(), options)
if ('title' not in mtree.info and # no title
'series' in mtree.info and # series present
if ('title' not in path_node.info and # no title
'series' in path_node.info and # series present
before_epnum_in_same_pathgroup() == [] and # no groups before
len(title_candidates) == 1): # only 1 group after
found_property(title_candidates[0], 'title', confidence=0.4)
return
if ('title' not in mtree.info and # no title
if ('title' not in path_node.info and # no title
before_epnum_in_same_pathgroup() == [] and # no groups before
len(title_candidates) == 2): # only 2 groups after
@@ -77,17 +89,17 @@ class GuessEpisodeInfoFromPosition(Transformer):
# probably the series name
series_candidates = before_epnum_in_same_pathgroup()
if len(series_candidates) >= 1:
found_property(series_candidates[0], 'series', confidence=0.7)
found_property(series_candidates[0], 'series', confidence=0.7)
# only 1 group after (in the same path group) and it's probably the
# episode title.
title_candidates = self._filter_candidates(after_epnum_in_same_pathgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_pathgroup(), options)
if len(title_candidates) == 1:
found_property(title_candidates[0], 'title', confidence=0.5)
return
else:
# try in the same explicit group, with lower confidence
title_candidates = self._filter_candidates(after_epnum_in_same_explicitgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_explicitgroup(), options)
if len(title_candidates) == 1:
found_property(title_candidates[0], 'title', confidence=0.4)
return
@@ -96,7 +108,7 @@ class GuessEpisodeInfoFromPosition(Transformer):
return
# get the one with the longest value
title_candidates = self._filter_candidates(after_epnum_in_same_pathgroup(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(after_epnum_in_same_pathgroup(), options)
if title_candidates:
maxidx = -1
maxv = -1
@@ -104,7 +116,8 @@ class GuessEpisodeInfoFromPosition(Transformer):
if len(c.clean_value) > maxv:
maxidx = i
maxv = len(c.clean_value)
found_property(title_candidates[maxidx], 'title', confidence=0.3)
if maxidx > -1:
found_property(title_candidates[maxidx], 'title', confidence=0.3)
def should_process(self, mtree, options=None):
options = options or {}
@@ -114,9 +127,9 @@ class GuessEpisodeInfoFromPosition(Transformer):
def _filter_candidates(candidates, options):
episode_details_transformer = get_transformer('guess_episode_details')
if episode_details_transformer:
return [n for n in candidates if not episode_details_transformer.container.find_properties(n.value, n, options, re_match=True)]
else:
return candidates
candidates = [n for n in candidates if not episode_details_transformer.container.find_properties(n.value, n, options, re_match=True)]
candidates = list(filter(lambda n: not GuessEpisodeInfoFromPosition.excluded_word(n), candidates))
return candidates
def process(self, mtree, options=None):
"""
@@ -128,15 +141,26 @@ class GuessEpisodeInfoFromPosition(Transformer):
if not eps:
eps = [node for node in mtree.leaves() if 'date' in node.guess]
eps = sorted(eps, key=lambda ep: -ep.guess.confidence())
if eps:
self.match_from_epnum_position(mtree, eps[0], options)
performed_path_nodes = []
for ep_node in eps:
# Process only the first episode node of each path node
path_node = [node for node in ep_node.ancestors if node.category == 'path']
if len(path_node) > 0:
path_node = path_node[0]
else:
path_node = ep_node.root
if path_node not in performed_path_nodes:
self.match_from_epnum_position(path_node, ep_node, options)
performed_path_nodes.append(path_node)
else:
# if we don't have the episode number, but at least 2 groups in the
# basename, then it's probably series - eptitle
basename = mtree.node_at((-2,))
basename = list(filter(lambda x: x.category == 'path', mtree.nodes()))[-2]
title_candidates = self._filter_candidates(basename.unidentified_leaves(), options)
title_candidates = GuessEpisodeInfoFromPosition._filter_candidates(basename.unidentified_leaves(), options)
if len(title_candidates) >= 2 and 'series' not in mtree.info:
found_property(title_candidates[0], 'series', confidence=0.4)
@@ -147,12 +171,13 @@ class GuessEpisodeInfoFromPosition(Transformer):
# if we only have 1 remaining valid group in the folder containing the
# file, then it's likely that it is the series name
path_nodes = list(filter(lambda x: x.category == 'path', mtree.nodes()))
try:
series_candidates = list(mtree.node_at((-3,)).unidentified_leaves())
except ValueError:
series_candidates = list(path_nodes[-3].unidentified_leaves())
except IndexError:
series_candidates = []
if len(series_candidates) == 1:
if len(series_candidates) == 1 and not GuessEpisodeInfoFromPosition.excluded_word(series_candidates[0]):
found_property(series_candidates[0], 'series', confidence=0.3)
# if there's a path group that only contains the season info, then the
@@ -163,7 +188,7 @@ class GuessEpisodeInfoFromPosition(Transformer):
if eps:
previous = [node for node in mtree.unidentified_leaves()
if node.node_idx[0] == eps[0].node_idx[0] - 1]
if len(previous) == 1:
if len(previous) == 1 and not GuessEpisodeInfoFromPosition.excluded_word(previous[0]):
found_property(previous[0], 'series', confidence=0.5)
# If we have found a title without any series name, replace it by the series name.
@@ -21,6 +21,7 @@
from __future__ import absolute_import, division, print_function, unicode_literals
import re
from guessit.patterns.list import list_parser, all_separators_re
from guessit.plugins.transformers import Transformer
from guessit.matcher import GuessFinder
@@ -34,9 +35,8 @@ class GuessEpisodesRexps(Transformer):
def __init__(self):
Transformer.__init__(self, 20)
range_separators = ['-', 'to', 'a']
discrete_separators = ['&', 'and', 'et']
of_separators = ['of', 'sur', '/', '\\']
of_separators_re = re.compile(build_or_pattern(of_separators, escape=True), re.IGNORECASE)
season_words = ['seasons?', 'saisons?', 'series?']
episode_words = ['episodes?']
@@ -44,85 +44,14 @@ class GuessEpisodesRexps(Transformer):
season_markers = ['s']
episode_markers = ['e', 'ep']
discrete_sep = sep
for range_separator in range_separators:
discrete_sep = discrete_sep.replace(range_separator, '')
discrete_separators.append(discrete_sep)
all_separators = list(range_separators)
all_separators.extend(discrete_separators)
self.container = PropertiesContainer(enhance=False, canonical_from_pattern=False)
range_separators_re = re.compile(build_or_pattern(range_separators), re.IGNORECASE)
discrete_separators_re = re.compile(build_or_pattern(discrete_separators), re.IGNORECASE)
all_separators_re = re.compile(build_or_pattern(all_separators), re.IGNORECASE)
of_separators_re = re.compile(build_or_pattern(of_separators, escape=True), re.IGNORECASE)
season_words_re = re.compile(build_or_pattern(season_words), re.IGNORECASE)
episode_words_re = re.compile(build_or_pattern(episode_words), re.IGNORECASE)
season_markers_re = re.compile(build_or_pattern(season_markers), re.IGNORECASE)
episode_markers_re = re.compile(build_or_pattern(episode_markers), re.IGNORECASE)
def list_parser(value, property_list_name, discrete_separators_re=discrete_separators_re, range_separators_re=range_separators_re, allow_discrete=False, fill_gaps=False):
discrete_elements = filter(lambda x: x != '', discrete_separators_re.split(value))
discrete_elements = [x.strip() for x in discrete_elements]
proper_discrete_elements = []
i = 0
while i < len(discrete_elements):
if i < len(discrete_elements) - 2 and range_separators_re.match(discrete_elements[i+1]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i+1] + discrete_elements[i+2])
i += 3
else:
match = range_separators_re.search(discrete_elements[i])
if match and match.start() == 0:
proper_discrete_elements[i - 1] += discrete_elements[i]
elif match and match.end() == len(discrete_elements[i]):
proper_discrete_elements.append(discrete_elements[i] + discrete_elements[i + 1])
else:
proper_discrete_elements.append(discrete_elements[i])
i += 1
discrete_elements = proper_discrete_elements
ret = []
for discrete_element in discrete_elements:
range_values = filter(lambda x: x != '', range_separators_re.split(discrete_element))
range_values = [x.strip() for x in range_values]
if len(range_values) > 1:
for x in range(0, len(range_values) - 1):
start_range_ep = parse_numeral(range_values[x])
end_range_ep = parse_numeral(range_values[x+1])
for range_ep in range(start_range_ep, end_range_ep + 1):
if range_ep not in ret:
ret.append(range_ep)
else:
discrete_value = parse_numeral(discrete_element)
if discrete_value not in ret:
ret.append(discrete_value)
if len(ret) > 1:
if not allow_discrete:
valid_ret = list()
# replace discrete elements by ranges
valid_ret.append(ret[0])
for i in range(0, len(ret) - 1):
previous = valid_ret[len(valid_ret) - 1]
if ret[i+1] < previous:
pass
else:
valid_ret.append(ret[i+1])
ret = valid_ret
if fill_gaps:
ret = list(range(min(ret), max(ret) + 1))
if len(ret) > 1:
return {None: ret[0], property_list_name: ret}
if len(ret) > 0:
return ret[0]
return None
def episode_parser_x(value):
return list_parser(value, 'episodeList', discrete_separators_re=re.compile('x', re.IGNORECASE))
@@ -138,34 +67,40 @@ class GuessEpisodesRexps(Transformer):
class ResolutionCollisionValidator(object):
@staticmethod
def validate(prop, string, node, match, entry_start, entry_end):
return len(match.group(2)) < 3 # limit
# Invalidate only when both season and episode are 100 or more (e.g. a 1444x866 resolution).
try:
season_value = season_parser(match.group(2))
episode_value = episode_parser_x(match.group(3))
return season_value < 100 or episode_value < 100
except:
# This may occur for 1xAll or patterns like this.
return True
self.container.register_property(None, r'(' + season_words_re.pattern + sep + '?(?P<season>' + numeral + ')' + sep + '?' + season_words_re.pattern + '?)', confidence=1.0, formatter=parse_numeral)
self.container.register_property(None, r'(' + season_words_re.pattern + sep + '?(?P<season>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*)' + sep + '?' + season_words_re.pattern + '?)' + sep, confidence=1.0, formatter={None: parse_numeral, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), FormatterValidator('season', lambda x: len(x) > 1 if hasattr(x, '__len__') else False)))
self.container.register_property(None, r'(' + season_markers_re.pattern + '(?P<season>' + digital_numeral + ')[^0-9]?' + sep + '?(?P<episodeNumber>(?:e' + digital_numeral + '(?:' + sep + '?[e-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_e, 'season': season_parser}, validator=NoValidator())
# self.container.register_property(None, r'[^0-9]((?P<season>' + digital_numeral + ')[^0-9 .-]?-?(?P<episodeNumber>(?:x' + digital_numeral + '(?:' + sep + '?[x-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_x, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), ResolutionCollisionValidator()))
self.container.register_property(None, r'(' + season_markers_re.pattern + '(?P<season>' + digital_numeral + ')[^0-9]?' + sep + '?(?P<episodeNumber>(?:e' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser, 'season': season_parser}, validator=NoValidator())
self.container.register_property(None, sep + r'((?P<season>' + digital_numeral + ')' + sep + '' + '(?P<episodeNumber>(?:x' + sep + digital_numeral + '(?:' + sep + '[x-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_x, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), ResolutionCollisionValidator()))
self.container.register_property(None, r'((?P<season>' + digital_numeral + ')' + '(?P<episodeNumber>(?:x' + digital_numeral + '(?:[x-]' + digital_numeral + ')*)))', confidence=1.0, formatter={None: parse_numeral, 'episodeNumber': episode_parser_x, 'season': season_parser}, validator=ChainedValidator(DefaultValidator(), ResolutionCollisionValidator()))
self.container.register_property(None, r'(' + season_markers_re.pattern + '(?P<season>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*))', confidence=0.6, formatter={None: parse_numeral, 'season': season_parser}, validator=NoValidator())
self.container.register_property(None, r'((?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.6, formatter=parse_numeral)
self.container.register_property('version', sep + r'(V\d+)' + sep, confidence=0.6, formatter=parse_numeral, validator=NoValidator())
self.container.register_property(None, r'(ep' + sep + r'?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?)', confidence=0.7, formatter=parse_numeral)
self.container.register_property(None, r'(ep' + sep + r'?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.7, formatter=parse_numeral)
self.container.register_property(None, r'(' + episode_markers_re.pattern + '(?P<episodeNumber>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*))', confidence=0.6, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_words_re.pattern + sep + '?(?P<episodeNumber>' + digital_numeral + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + digital_numeral + ')*)' + sep + '?' + episode_words_re.pattern + '?)', confidence=0.8, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_markers_re.pattern + '(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.6, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_words_re.pattern + sep + '?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.8, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_markers_re.pattern + '(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.6, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property(None, r'(' + episode_words_re.pattern + sep + '?(?P<episodeNumber>' + digital_numeral + ')' + sep + '?v(?P<version>\d+))', confidence=0.8, formatter={None: parse_numeral, 'episodeNumber': episode_parser})
self.container.register_property('episodeNumber', r'^ ?(\d{2})' + sep, confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^ ?(\d{2})' + sep, confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^ ?0(\d{1,2})' + sep, confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', sep + r'(\d{2}) ?$', confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', sep + r'0(\d{1,2}) ?$', confidence=0.4, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^' + sep + '+(\d{2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2}' + ')*)' + sep, confidence=0.4, formatter=episode_parser)
self.container.register_property('episodeNumber', r'^' + sep + '+0(\d{1,2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '0\d{1,2}' + ')*)' + sep, confidence=0.4, formatter=episode_parser)
self.container.register_property('episodeNumber', sep + r'(\d{2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + r'\d{2}' + ')*)' + sep + '+$', confidence=0.4, formatter=episode_parser)
self.container.register_property('episodeNumber', sep + r'0(\d{1,2}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + r'0\d{1,2}' + ')*)' + sep + '+$', confidence=0.4, formatter=episode_parser)
self.container.register_property(None, r'((?P<episodeNumber>' + numeral + ')' + sep + '?' + of_separators_re.pattern + sep + '?(?P<episodeCount>' + numeral + ')(?:' + sep + '?(?:episodes?|eps?))?)', confidence=0.7, formatter=parse_numeral)
self.container.register_property(None, r'((?:episodes?|eps?)' + sep + '?(?P<episodeNumber>' + numeral + ')' + sep + '?' + of_separators_re.pattern + sep + '?(?P<episodeCount>' + numeral + '))', confidence=0.7, formatter=parse_numeral)
@@ -186,7 +121,29 @@ class GuessEpisodesRexps(Transformer):
def guess_episodes_rexps(self, string, node=None, options=None):
found = self.container.find_properties(string, node, options)
return self.container.as_guess(found, string)
guess = self.container.as_guess(found, string)
if guess and node:
if 'season' in guess and 'episodeNumber' in guess:
# If two guesses in the same group both contain season and episodeNumber, create an episodeList
for existing_guess in node.group_node().guesses:
if 'season' in existing_guess and 'episodeNumber' in existing_guess:
if 'episodeList' not in existing_guess:
existing_guess['episodeList'] = [existing_guess['episodeNumber']]
existing_guess['episodeList'].append(guess['episodeNumber'])
existing_guess['episodeList'].sort()
if existing_guess['episodeNumber'] > guess['episodeNumber']:
existing_guess.set_confidence('episodeNumber', 0)
else:
guess.set_confidence('episodeNumber', 0)
guess['episodeList'] = list(existing_guess['episodeList'])
elif 'episodeNumber' in guess:
# If two guesses in the same group contain only episodeNumber, drop from the new guess every key the existing one already has.
for existing_guess in node.group_node().guesses:
if 'episodeNumber' in existing_guess:
for k, v in existing_guess.items():
if k in guess:
del guess[k]
return guess
def should_process(self, mtree, options=None):
return mtree.guess.get('type', '').startswith('episode')
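The episodeList merging in the hunk above can be sketched in isolation. This is a minimal illustration, not guessit's implementation: plain dicts stand in for guessit `Guess` objects, and the `merge_episode` helper and the `confidence` key are hypothetical names introduced only for this example.

```python
def merge_episode(existing, new):
    """Merge `new` into `existing`, mirroring the season+episodeNumber branch.

    Both arguments are plain dicts (an assumption; guessit uses Guess objects
    and per-property confidences via set_confidence).
    """
    if 'episodeList' not in existing:
        existing['episodeList'] = [existing['episodeNumber']]
    existing['episodeList'].append(new['episodeNumber'])
    existing['episodeList'].sort()
    # The guess holding the higher episode number loses its confidence;
    # on a tie, the new guess is the one that is zeroed.
    if existing['episodeNumber'] > new['episodeNumber']:
        existing['confidence'] = 0
    else:
        new['confidence'] = 0
    # The new guess receives a copy of the merged list.
    new['episodeList'] = list(existing['episodeList'])
    return existing, new


first = {'season': 1, 'episodeNumber': 2, 'confidence': 0.6}
second = {'season': 1, 'episodeNumber': 3, 'confidence': 0.6}
merge_episode(first, second)
```

After the call, both dicts carry the same sorted list and only the guess with the lower episode number keeps a non-zero confidence.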
@@ -156,6 +156,13 @@ class GuessFiletype(Transformer):
weak_episode_transformer = get_transformer('guess_weak_episodes_rexps')
if weak_episode_transformer:
found = weak_episode_transformer.container.find_properties(filename, mtree, options, 'episodeNumber')
guess = weak_episode_transformer.container.as_guess(found, filename)
if guess and (guess.raw('episodeNumber')[0] == '0' or guess['episodeNumber'] >= 10):
self.log.debug('Found characteristic property of episodes: %s', guess)
upgrade_episode()
return filetype_container[0], other
found = properties_transformer.container.find_properties(filename, mtree, options, 'crc32')
guess = properties_transformer.container.as_guess(found, filename)
if guess:
@@ -217,7 +224,8 @@ class GuessFiletype(Transformer):
if mime is not None:
filetype_info.update({'mimetype': mime}, confidence=1.0)
node_ext = mtree.node_at((-1,))
# Retrieve the last node of category path (extension node)
node_ext = list(filter(lambda x: x.category == 'path', mtree.nodes()))[-1]
found_guess(node_ext, filetype_info)
if mtree.guess.get('type') in [None, 'unknown']:
@@ -226,12 +234,21 @@ class GuessFiletype(Transformer):
else:
raise TransformerException(__name__, 'Unknown file type')
def post_process(self, mtree, options=None):
# now look whether there are some specific hints for episode vs movie
# If we have a date and no year, this is a TV Show.
if 'date' in mtree.info and 'year' not in mtree.info and mtree.info.get('type') != 'episode':
mtree.guess['type'] = 'episode'
for type_leaves in mtree.leaves_containing('type'):
type_leaves.guess['type'] = 'episode'
for title_leaves in mtree.leaves_containing('title'):
title_leaves.guess.rename('title', 'series')
def second_pass_options(self, mtree, options=None):
if 'type' not in options or not options['type']:
if mtree.info.get('type') != 'episode':
# now look whether there are some specific hints for episode vs movie
# If we have a date and no year, this is a TV Show.
if 'date' in mtree.info and 'year' not in mtree.info:
return {'type': 'episode'}
if mtree.info.get('type') != 'movie':
# If we have a year, no season but raw episodeNumber is a number not starting with '0', this is a movie.
if 'year' in mtree.info and 'episodeNumber' in mtree.info and 'season' not in mtree.info:
try:
int(mtree.raw['episodeNumber'])
return {'type': 'movie'}
except ValueError:
pass
@@ -43,6 +43,12 @@ class GuessLanguage(Transformer):
allowed_languages = None
if options and 'allowed_languages' in options:
allowed_languages = options.get('allowed_languages')
directory = list(filter(lambda x: x.category == 'path', node.ancestors))[0]
if len(directory.clean_value) <= 3:
# skip if we have a language code as directory
return None
guess = search_language(string, allowed_languages)
return guess
@@ -68,8 +74,10 @@ class GuessLanguage(Transformer):
title_ends = {}
for unidentified_node in mtree.unidentified_leaves():
unidentified_starts[unidentified_node.span[0]] = unidentified_node
unidentified_ends[unidentified_node.span[1]] = unidentified_node
if len(unidentified_node.clean_value) > 1:
# only consider unidentified leaves that have some meaningful content
unidentified_starts[unidentified_node.span[0]] = unidentified_node
unidentified_ends[unidentified_node.span[1]] = unidentified_node
for property_node in mtree.leaves_containing('year'):
property_starts[property_node.span[0]] = property_node
@@ -79,19 +87,20 @@ class GuessLanguage(Transformer):
title_starts[title_node.span[0]] = title_node
title_ends[title_node.span[1]] = title_node
return node.span[0] in title_ends.keys() and (node.span[1] in unidentified_starts.keys() or node.span[1] + 1 in property_starts.keys()) or\
node.span[1] in title_starts.keys() and (node.span[0] == node.group_node().span[0] or node.span[0] in unidentified_ends.keys() or node.span[0] in property_ends.keys())
return (node.span[0] in title_ends.keys() and (node.span[1] in unidentified_starts.keys() or
node.span[1] + 1 in property_starts.keys()) or
node.span[1] in title_starts.keys() and (node.span[0] == node.group_node().span[0] or
node.span[0] in unidentified_ends.keys() or
node.span[0] in property_ends.keys()))
def second_pass_options(self, mtree, options=None):
m = mtree.matched()
to_skip_language_nodes = []
to_skip_langs = set()
for lang_key in ('language', 'subtitleLanguage'):
langs = {}
lang_nodes = set(mtree.leaves_containing(lang_key))
for lang_node in lang_nodes:
lang = lang_node.guess.get(lang_key, None)
if self._skip_language_on_second_pass(mtree, lang_node):
# Language probably split the title. Add to skip for 2nd pass.
@@ -99,38 +108,19 @@ class GuessLanguage(Transformer):
# the extension, then it is likely a subtitle language
parts = mtree.clean_string(lang_node.root.value).split()
if m.get('type') in ['moviesubtitle', 'episodesubtitle']:
if lang_node.value in parts and \
(parts.index(lang_node.value) == len(parts) - 2):
if (lang_node.value in parts and parts.index(lang_node.value) == len(parts) - 2):
continue
to_skip_language_nodes.append(lang_node)
elif lang not in langs:
langs[lang] = lang_node
else:
# The same language was found. Keep the more confident one,
# and add others to skip for 2nd pass.
existing_lang_node = langs[lang]
to_skip = None
if (existing_lang_node.guess.confidence('language') >=
lang_node.guess.confidence('language')):
# lang_node is to remove
to_skip = lang_node
else:
# existing_lang_node is to remove
langs[lang] = lang_node
to_skip = existing_lang_node
to_skip_language_nodes.append(to_skip)
if to_skip_language_nodes:
to_skip_langs.add(lang_node.value)
if to_skip_langs:
# Also skip same value nodes
skipped_values = [skip_node.value for skip_node in to_skip_language_nodes]
lang_nodes = (set(mtree.leaves_containing('language')) |
set(mtree.leaves_containing('subtitleLanguage')))
for lang_key in ('language', 'subtitleLanguage'):
lang_nodes = set(mtree.leaves_containing(lang_key))
to_skip = [node for node in lang_nodes if node.value in to_skip_langs]
return {'skip_nodes': to_skip}
for lang_node in lang_nodes:
if lang_node not in to_skip_language_nodes and lang_node.value in skipped_values:
to_skip_language_nodes.append(lang_node)
return {'skip_nodes': to_skip_language_nodes}
return None
def should_process(self, mtree, options=None):
@@ -149,6 +139,8 @@ class GuessLanguage(Transformer):
def post_process(self, mtree, options=None):
# 1- try to promote language to subtitle language where it makes sense
prefixes = []
for node in mtree.nodes():
if 'language' not in node.guess:
continue
@@ -157,7 +149,8 @@ class GuessLanguage(Transformer):
# the group is the last group of the filename, it is probably the
# language of the subtitle
# (eg: 'xxx.english.srt')
if (mtree.node_at((-1,)).value.lower() in subtitle_exts and
ext_node = list(filter(lambda x: x.category == 'path', mtree.nodes()))[-1]
if (ext_node.value.lower() in subtitle_exts and
node == list(mtree.leaves())[-2]):
self.promote_subtitle(node)
@@ -171,11 +164,7 @@ class GuessLanguage(Transformer):
for sub_prefix in subtitle_prefixes:
if (sub_prefix in find_words(group_str) and
0 <= group_str.find(sub_prefix) < (node.span[0] - explicit_group.span[0])):
self.promote_subtitle(node)
for sub_suffix in subtitle_suffixes:
if (sub_suffix in find_words(group_str) and
(node.span[0] - explicit_group.span[0]) < group_str.find(sub_suffix)):
prefixes.append((explicit_group, sub_prefix))
self.promote_subtitle(node)
# - if a language is in an explicit group just preceded by "st",
@@ -187,3 +176,21 @@ class GuessLanguage(Transformer):
self.promote_subtitle(node)
except IndexError:
pass
for node in mtree.nodes():
if 'language' not in node.guess:
continue
explicit_group = mtree.node_at(node.node_idx[:2])
group_str = explicit_group.value.lower()
for sub_suffix in subtitle_suffixes:
if (sub_suffix in find_words(group_str) and
(node.span[0] - explicit_group.span[0]) < group_str.find(sub_suffix)):
is_a_prefix = False
for prefix in prefixes:
if prefix[0] == explicit_group and group_str.find(prefix[1]) == group_str.find(sub_suffix):
is_a_prefix = True
break
if not is_a_prefix:
self.promote_subtitle(node)
@@ -23,6 +23,8 @@ from __future__ import absolute_import, division, print_function, unicode_litera
from guessit.plugins.transformers import Transformer
from guessit.matcher import found_property
from guessit import u
from guessit.patterns.list import all_separators
from guessit.language import all_lang_prefixes_suffixes
class GuessMovieTitleFromPosition(Transformer):
@@ -36,6 +38,13 @@ class GuessMovieTitleFromPosition(Transformer):
options = options or {}
return not options.get('skip_title') and not mtree.guess.get('type', '').startswith('episode')
@staticmethod
def excluded_word(*values):
for value in values:
if value.clean_value.lower() in all_separators + all_lang_prefixes_suffixes:
return True
return False
def process(self, mtree, options=None):
"""
try to identify the remaining unknown groups by looking at their
@@ -44,14 +53,16 @@ class GuessMovieTitleFromPosition(Transformer):
if 'title' in mtree.info:
return
basename = mtree.node_at((-2,))
path_nodes = list(filter(lambda x: x.category == 'path', mtree.nodes()))
basename = path_nodes[-2]
all_valid = lambda leaf: len(leaf.clean_value) > 0
basename_leftover = list(basename.unidentified_leaves(valid=all_valid))
try:
folder = mtree.node_at((-3,))
folder = path_nodes[-3]
folder_leftover = list(folder.unidentified_leaves())
except ValueError:
except IndexError:
folder = None
folder_leftover = []
@@ -61,7 +72,9 @@ class GuessMovieTitleFromPosition(Transformer):
# specific cases:
# if we find the same group both in the folder name and the filename,
# it's a good candidate for title
if folder_leftover and basename_leftover and folder_leftover[0].clean_value == basename_leftover[0].clean_value:
if (folder_leftover and basename_leftover and
folder_leftover[0].clean_value == basename_leftover[0].clean_value and
not GuessMovieTitleFromPosition.excluded_word(folder_leftover[0])):
found_property(folder_leftover[0], 'title', confidence=0.8)
return
@@ -89,7 +102,8 @@ class GuessMovieTitleFromPosition(Transformer):
if (series.clean_value != title.clean_value and
series.clean_value != film_number.clean_value and
basename_leaves.index(film_number) == 0 and
basename_leaves.index(title) == 1):
basename_leaves.index(title) == 1 and
not GuessMovieTitleFromPosition.excluded_word(title, series)):
found_property(title, 'title', confidence=0.6)
found_property(series, 'filmSeries', confidence=0.6)
@@ -103,8 +117,9 @@ class GuessMovieTitleFromPosition(Transformer):
if groups_before:
try:
node = next(groups_before)
found_property(node, 'title', confidence=0.8)
return
if not GuessMovieTitleFromPosition.excluded_word(node):
found_property(node, 'title', confidence=0.8)
return
except StopIteration:
pass
@@ -125,8 +140,10 @@ class GuessMovieTitleFromPosition(Transformer):
# if they're all in the same group, take leftover info from there
leftover = mtree.node_at((group_idx,)).unidentified_leaves()
try:
found_property(next(leftover), 'title', confidence=0.7)
return
node = next(leftover)
if not GuessMovieTitleFromPosition.excluded_word(node):
found_property(node, 'title', confidence=0.7)
return
except StopIteration:
pass
@@ -138,7 +155,8 @@ class GuessMovieTitleFromPosition(Transformer):
# ex: Movies/Alice in Wonderland DVDRip.XviD-DiAMOND/dmd-aw.avi
# ex: Movies/Somewhere.2010.DVDRip.XviD-iLG/i-smwhr.avi <-- TODO: gets caught here?
if (basename_leftover[0].clean_value.count(' ') == 0 and
folder_leftover and folder_leftover[0].clean_value.count(' ') >= 2):
folder_leftover and folder_leftover[0].clean_value.count(' ') >= 2 and
not GuessMovieTitleFromPosition.excluded_word(folder_leftover[0])):
found_property(folder_leftover[0], 'title', confidence=0.7)
return
@@ -148,26 +166,28 @@ class GuessMovieTitleFromPosition(Transformer):
# ex: Movies/[阿维达].Avida.2006.FRENCH.DVDRiP.XViD-PROD.avi
if basename_leftover[0].is_explicit():
for basename_leftover_elt in basename_leftover:
if not basename_leftover_elt.is_explicit():
if not basename_leftover_elt.is_explicit() and not GuessMovieTitleFromPosition.excluded_word(basename_leftover_elt):
found_property(basename_leftover_elt, 'title', confidence=0.8)
return
# if all else fails, take the first remaining unidentified group in the
# basename as title
found_property(basename_leftover[0], 'title', confidence=0.6)
return
if not GuessMovieTitleFromPosition.excluded_word(basename_leftover[0]):
found_property(basename_leftover[0], 'title', confidence=0.6)
return
# if there are no leftover groups in the basename, look in the folder name
if folder_leftover:
if folder_leftover and not GuessMovieTitleFromPosition.excluded_word(folder_leftover[0]):
found_property(folder_leftover[0], 'title', confidence=0.5)
return
# if nothing worked, look if we have a very small group at the beginning
# of the basename
basename = mtree.node_at((-2,))
basename_leftover = basename.unidentified_leaves(valid=lambda leaf: True)
try:
found_property(next(basename_leftover), 'title', confidence=0.4)
return
node = next(basename_leftover)
if not GuessMovieTitleFromPosition.excluded_word(node):
found_property(node, 'title', confidence=0.4)
return
except StopIteration:
pass
@@ -22,7 +22,7 @@ from __future__ import absolute_import, division, print_function, unicode_litera
import re
from guessit.containers import PropertiesContainer, WeakValidator, LeavesValidator, QualitiesContainer, ChainedValidator, DefaultValidator, OnlyOneValidator, LeftValidator, NeighborValidator
from guessit.containers import PropertiesContainer, WeakValidator, LeavesValidator, QualitiesContainer, ChainedValidator, DefaultValidator, OnlyOneValidator, LeftValidator, NeighborValidator, FullMatchValidator
from guessit.patterns import sep, build_or_pattern
from guessit.patterns.extension import subtitle_exts, video_exts, info_exts
from guessit.patterns.numeral import numeral, parse_numeral
@@ -61,7 +61,6 @@ class GuessProperties(Transformer):
for canonical_form, quality in quality_dict.items():
self.qualities.register_quality(propname, canonical_form, quality)
register_property('container', {'mp4': ['MP4']})
# http://en.wikipedia.org/wiki/Pirated_movie_release_types
register_property('format', {'VHS': ['VHS', 'VHS-Rip'],
@@ -74,11 +73,11 @@ class GuessProperties(Transformer):
'TV': ['SD-TV', 'SD-TV-Rip', 'Rip-SD-TV', 'TV-Rip', 'Rip-TV'],
'DVB': ['DVB-Rip', 'DVB', 'PD-TV'],
'DVD': ['DVD', 'DVD-Rip', 'VIDEO-TS', 'DVD-R', 'DVD-9', 'DVD-5'],
'HDTV': ['HD-TV', 'TV-RIP-HD', 'HD-TV-RIP'],
'HDTV': ['HD-TV', 'TV-RIP-HD', 'HD-TV-RIP', 'HD-RIP'],
'VOD': ['VOD', 'VOD-Rip'],
'WEBRip': ['WEB-Rip'],
'WEB-DL': ['WEB-DL', 'WEB-HD', 'WEB'],
'HD-DVD': ['HD-(?:DVD)?-Rip', 'HD-DVD'],
'HD-DVD': ['HD-DVD-Rip', 'HD-DVD'],
'BluRay': ['Blu-ray(?:-Rip)?', 'B[DR]', 'B[DR]-Rip', 'BD[59]', 'BD25', 'BD50']
})
@@ -112,32 +111,13 @@ class GuessProperties(Transformer):
},
validator=ChainedValidator(DefaultValidator(), OnlyOneValidator()))
class ResolutionValidator(object):
"""Make sure our match is surrounded by separators, or by another entry"""
@staticmethod
def validate(prop, string, node, match, entry_start, entry_end):
"""
span = _get_span(prop, match)
span = _trim_span(span, string[span[0]:span[1]])
start, end = span
sep_start = start <= 0 or string[start - 1] in sep
sep_end = end >= len(string) or string[end] in sep
start_by_other = start in entry_end
end_by_other = end in entry_start
if (sep_start or start_by_other) and (sep_end or end_by_other):
return True
return False
"""
return True
_digits_re = re.compile('\d+')
def resolution_formatter(value):
digits = _digits_re.findall(value)
return 'x'.join(digits)
self.container.register_property('screenSize', '\d{3,4}-?[x\*]-?\d{3,4}', canonical_from_pattern=False, formatter=resolution_formatter, validator=ChainedValidator(DefaultValidator(), ResolutionValidator()))
self.container.register_property('screenSize', '\d{3,4}-?[x\*]-?\d{3,4}', canonical_from_pattern=False, formatter=resolution_formatter)
register_quality('screenSize', {'360p': -300,
'368p': -200,
@@ -239,8 +219,8 @@ class GuessProperties(Transformer):
self.container.register_property('crc32', '(?:[a-fA-F]|[0-9]){8}', enhance=False, canonical_from_pattern=False)
weak_episode_words = ['pt', 'part']
self.container.register_property(None, '(' + build_or_pattern(weak_episode_words) + sep + '?(?P<part>' + numeral + '))[^0-9]', enhance=False, canonical_from_pattern=False, confidence=0.4, formatter=parse_numeral)
part_words = ['pt', 'part']
self.container.register_property(None, '(' + build_or_pattern(part_words) + sep + '?(?P<part>' + numeral + '))[^0-9]', enhance=False, canonical_from_pattern=False, confidence=0.4, formatter=parse_numeral)
register_property('other', {'AudioFix': ['Audio-Fix', 'Audio-Fixed'],
'SyncFix': ['Sync-Fix', 'Sync-Fixed'],
@@ -249,13 +229,15 @@ class GuessProperties(Transformer):
'Netflix': ['Netflix', 'NF']
})
self.container.register_property('other', 'Real', 'Fix', canonical_form='Proper', validator=NeighborValidator())
self.container.register_property('other', 'Real', 'Fix', canonical_form='Proper', validator=ChainedValidator(FullMatchValidator(), NeighborValidator()))
self.container.register_property('other', 'Proper', 'Repack', 'Rerip', canonical_form='Proper')
self.container.register_property('other', 'Fansub', canonical_form='Fansub')
self.container.register_property('other', 'Fastsub', canonical_form='Fastsub')
self.container.register_property('other', 'Fansub', canonical_form='Fansub', validator=ChainedValidator(FullMatchValidator(), NeighborValidator()))
self.container.register_property('other', 'Fastsub', canonical_form='Fastsub', validator=ChainedValidator(FullMatchValidator(), NeighborValidator()))
self.container.register_property('other', '(?:Seasons?' + sep + '?)?Complete', canonical_form='Complete')
self.container.register_property('other', 'R5', 'RC', canonical_form='R5')
self.container.register_property('other', 'Pre-Air', 'Preair', canonical_form='Preair')
self.container.register_property('other', 'CC') # Closed Caption
self.container.register_property('other', 'LD', 'MD') # Line/Mic Dubbed
self.container.register_canonical_properties('other', 'Screener', 'Remux', '3D', 'HD', 'mHD', 'HDLight', 'HQ',
'DDC',
@@ -271,10 +253,29 @@ class GuessProperties(Transformer):
def guess_properties(self, string, node=None, options=None):
found = self.container.find_properties(string, node, options)
return self.container.as_guess(found, string)
guess = self.container.as_guess(found, string)
if guess and node:
if 'part' in guess:
# If two guesses in the same group both contain part, create a partList
for existing_guess in node.group_node().guesses:
if 'part' in existing_guess:
if 'partList' not in existing_guess:
existing_guess['partList'] = [existing_guess['part']]
existing_guess['partList'].append(guess['part'])
existing_guess['partList'].sort()
if existing_guess['part'] > guess['part']:
existing_guess.set_confidence('part', 0)
else:
guess.set_confidence('part', 0)
guess['partList'] = list(existing_guess['partList'])
return guess
def supported_properties(self):
return self.container.get_supported_properties()
supported_properties = list(self.container.get_supported_properties())
supported_properties.append('partList')
return supported_properties
def process(self, mtree, options=None):
GuessFinder(self.guess_properties, 1.0, self.log, options).process_nodes(mtree.unidentified_leaves())
@@ -93,8 +93,12 @@ class GuessReleaseGroup(Transformer):
return False
if self.re_sep.match(val[-1]):
val = val[:len(val)-1]
if not val:
return False
if self.re_sep.match(val[0]):
val = val[1:]
if not val:
return False
guess['releaseGroup'] = val
forbidden = False
for forbidden_lambda in self._forbidden_groupname_lambda:
@@ -21,6 +21,7 @@
from __future__ import absolute_import, division, print_function, unicode_literals
import re
from guessit.patterns.list import list_parser, all_separators_re
from guessit.plugins.transformers import Transformer
@@ -38,11 +39,14 @@ class GuessWeakEpisodesRexps(Transformer):
of_separators = ['of', 'sur', '/', '\\']
of_separators_re = re.compile(build_or_pattern(of_separators, escape=True), re.IGNORECASE)
self.container = PropertiesContainer(enhance=False, canonical_from_pattern=False)
self.container = PropertiesContainer(enhance=False, canonical_from_pattern=False, remove_duplicates=True)
episode_words = ['episodes?']
def _formater(episode_number):
def episode_list_parser(value):
return list_parser(value, 'episodeList')
def season_episode_parser(episode_number):
epnum = parse_numeral(episode_number)
if not valid_year(epnum):
if epnum > 100:
@@ -55,24 +59,46 @@ class GuessWeakEpisodesRexps(Transformer):
else:
return epnum
self.container.register_property(['episodeNumber', 'season'], '[0-9]{2,4}', confidence=0.6, formatter=_formater, disabler=lambda options: options.get('episode_prefer_number') if options else False)
self.container.register_property(['episodeNumber', 'season'], '[0-9]{4}', confidence=0.6, formatter=_formater)
self.container.register_property('episodeNumber', '[^0-9](\d{1,3})', confidence=0.6, formatter=parse_numeral, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property(['episodeNumber', 'season'], '[0-9]{2,4}', confidence=0.6, formatter=season_episode_parser, disabler=lambda options: options.get('episode_prefer_number') if options else False)
self.container.register_property(['episodeNumber', 'season'], '[0-9]{4}', confidence=0.6, formatter=season_episode_parser)
self.container.register_property(None, '(' + build_or_pattern(episode_words) + sep + '?(?P<episodeNumber>' + numeral + '))[^0-9]', confidence=0.4, formatter=parse_numeral)
self.container.register_property(None, r'(?P<episodeNumber>' + numeral + ')' + sep + '?' + of_separators_re.pattern + sep + '?(?P<episodeCount>' + numeral +')', confidence=0.6, formatter=parse_numeral)
self.container.register_property('episodeNumber', r'^' + sep + '?(\d{1,3})' + sep, confidence=0.4, formatter=parse_numeral, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', sep + r'(\d{1,3})' + sep + '?$', confidence=0.4, formatter=parse_numeral, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', '[^0-9](\d{2,3}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2,3}' + ')*)', confidence=0.4, formatter=episode_list_parser, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', r'^' + sep + '?(\d{2,3}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2,3}' + ')*)' + sep, confidence=0.4, formatter=episode_list_parser, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
self.container.register_property('episodeNumber', sep + r'(\d{2,3}' + '(?:' + sep + '?' + all_separators_re.pattern + sep + '?' + '\d{2,3}' + ')*)' + sep + '?$', confidence=0.4, formatter=episode_list_parser, disabler=lambda options: not options.get('episode_prefer_number') if options else True)
def supported_properties(self):
return self.container.get_supported_properties()
def guess_weak_episodes_rexps(self, string, node=None, options=None):
if node and 'episodeNumber' in node.root.info:
return None
properties = self.container.find_properties(string, node, options)
guess = self.container.as_guess(properties, string)
if node and guess:
if 'episodeNumber' in guess and 'season' in guess:
existing_guesses = list(filter(lambda x: 'season' in x and 'episodeNumber' in x, node.group_node().guesses))
if existing_guesses:
return None
elif 'episodeNumber' in guess:
# If we only have episodeNumber in the guess, and another node contains both season and episodeNumber
# keep only the second.
safe_guesses = list(filter(lambda x: 'season' in x and 'episodeNumber' in x, node.group_node().guesses))
if safe_guesses:
return None
else:
# If we have other nodes containing episodeNumber, create an episodeList.
existing_guesses = list(filter(lambda x: 'season' not in x and 'episodeNumber' in x, node.group_node().guesses))
for existing_guess in existing_guesses:
if 'episodeList' not in existing_guess:
existing_guess['episodeList'] = [existing_guess['episodeNumber']]
existing_guess['episodeList'].append(guess['episodeNumber'])
existing_guess['episodeList'].sort()
if existing_guess['episodeNumber'] > guess['episodeNumber']:
existing_guess.set_confidence('episodeNumber', 0)
else:
guess.set_confidence('episodeNumber', 0)
guess['episodeList'] = list(existing_guess['episodeList'])
return guess
def should_process(self, mtree, options=None):
@@ -42,8 +42,13 @@ class GuessYear(Transformer):
def second_pass_options(self, mtree, options=None):
year_nodes = list(mtree.leaves_containing('year'))
if len(year_nodes) > 1:
return {'skip_nodes': year_nodes[:len(year_nodes) - 1]}
# if we found a year, let's try by ignoring all instances of that year
# as a candidate, let's take the one that appears last in the filename
if year_nodes:
year_candidate = year_nodes[-1].guess['year']
year_nodes = [year for year in year_nodes if year.guess['year'] != year_candidate]
if year_nodes:
return {'skip_nodes': year_nodes}
return None
def process(self, mtree, options=None):
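The new second-pass rule for years can be illustrated on plain integers. This is a sketch under assumptions: `years_to_skip` is a hypothetical helper, and the list of ints stands in for the `year` values of the nodes returned by `mtree.leaves_containing('year')`, in filename order.

```python
def years_to_skip(years):
    """Return the years that would be skipped on the second pass.

    The last year found is taken as the candidate; every occurrence with a
    different value is skipped, while repeats of the candidate are kept.
    """
    if not years:
        return []
    candidate = years[-1]
    return [year for year in years if year != candidate]


skipped = years_to_skip([1999, 2010, 1999])
```

Compared with the previous behaviour, which skipped every year node except the last one, a repeated occurrence of the candidate year is no longer skipped.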
@@ -37,7 +37,7 @@ class SplitExplicitGroups(Transformer):
:return: return the string split into explicit groups, that is, those either
between parentheses, square brackets or curly braces, and those separated
by a dash."""
for c in mtree.children:
for c in mtree.unidentified_leaves():
groups = find_first_level_groups(c.value, group_delimiters[0])
for delimiters in group_delimiters:
flatten = lambda l, x: l + find_first_level_groups(x, delimiters)
@@ -47,4 +47,24 @@ class SplitExplicitGroups(Transformer):
# patterns, such as dates, etc...
# groups = functools.reduce(lambda l, x: l + x.split('-'), groups, [])
c.split_on_components(groups)
c.split_on_components(groups, category='explicit')
def post_process(self, mtree, options=None):
"""
Decrease confidence for properties found in explicit groups.
:param mtree:
:param options:
:return:
"""
if not (options or {}).get('name_only'):
explicit_nodes = [node for node in mtree.nodes() if node.category == 'explicit' and node.is_explicit()]
for explicit_node in explicit_nodes:
self.alter_confidence(explicit_node, 0.5)
def alter_confidence(self, node, factor):
for guess in node.guesses:
for k in guess.keys():
confidence = guess.confidence(k)
guess.set_confidence(k, confidence * factor)
@@ -45,4 +45,4 @@ class SplitOnDash(Transformer):
match = pattern.search(node.value, span[1])
if indices:
node.partition(indices)
node.partition(indices, category='dash')
@@ -41,6 +41,32 @@ class SplitPathComponents(Transformer):
components += list(splitext(basename))
components[-1] = components[-1][1:] # remove the '.' from the extension
mtree.split_on_components(components)
mtree.split_on_components(components, category='path')
else:
mtree.split_on_components([mtree.value, ''])
mtree.split_on_components([mtree.value, ''], category='path')
def post_process(self, mtree, options=None):
"""
Decrease confidence for properties found in directories, filename should always have priority.
:param mtree:
:param options:
:return:
"""
if not (options or {}).get('name_only'):
path_nodes = [node for node in mtree.nodes() if node.category == 'path']
for path_node in path_nodes[:-2]:
self.alter_confidence(path_node, 0.3)
try:
last_directory_node = path_nodes[-2]
self.alter_confidence(last_directory_node, 0.6)
except IndexError:
pass
def alter_confidence(self, node, factor):
for guess in node.guesses:
for k in guess.keys():
confidence = guess.confidence(k)
guess.set_confidence(k, confidence * factor)
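The slicing used by `post_process` above can be checked on a plain list. This is a simplified sketch: `scale_path_confidences` is a hypothetical helper, and each float stands in for the confidence of one `path` node, ordered as in `path_nodes`.

```python
def scale_path_confidences(confidences, name_only=False):
    """Apply the 0.3 / 0.6 factors by position, as in the hunk above.

    All entries except the last two are multiplied by 0.3, the
    second-to-last by 0.6, and the last one is left untouched.
    """
    scaled = list(confidences)
    if name_only:
        return scaled
    for i in range(max(len(scaled) - 2, 0)):  # path_nodes[:-2]
        scaled[i] *= 0.3
    if len(scaled) >= 2:                      # path_nodes[-2], if present
        scaled[-2] *= 0.6
    return scaled


result = scale_path_confidences([1.0, 1.0, 1.0, 1.0])
```

The length guard plays the role of the `except IndexError` in the original: a tree with a single path node is left unchanged.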
@@ -249,9 +249,9 @@ def search_external_subtitles(path):
subtitles = {}
for p in os.listdir(dirpath):
# skip badly encoded filenames
#if isinstance(p, bytes): # pragma: no cover
# logger.error('Skipping badly encoded filename %r in %r', p.decode('utf-8', errors='replace'), dirpath)
# continue
if isinstance(p, bytes): # pragma: no cover
logger.error('Skipping badly encoded filename %r in %r', p.decode('utf-8', errors='replace'), dirpath)
continue
# keep only valid subtitle filenames
if not p.startswith(fileroot) or not p.endswith(SUBTITLE_EXTENSIONS):
@@ -1,16 +1,26 @@
# coding=utf-8
from .patch_provider_pool import PatchedProviderPool
from .patch_providers import PatchedAddic7edProvider
import subliminal
import babelfish
from .patch_provider_pool import PatchedProviderPool
from .patch_video import patched_search_external_subtitles
from .patch_providers import addic7ed, podnapisi, tvsubtitles, opensubtitles
# patch subliminal's ProviderPool
subliminal.api.ProviderPool = PatchedProviderPool
# patch subliminal's Addic7edProvider
subliminal.providers.addic7ed.Addic7edProvider = PatchedAddic7edProvider
# patch subliminal's providers
subliminal.providers.addic7ed.Addic7edProvider = addic7ed.PatchedAddic7edProvider
subliminal.providers.podnapisi.PodnapisiProvider = podnapisi.PatchedPodnapisiProvider
subliminal.providers.tvsubtitles.TVsubtitlesProvider = tvsubtitles.PatchedTVsubtitlesProvider
subliminal.providers.opensubtitles.OpenSubtitlesProvider = opensubtitles.PatchedOpenSubtitlesProvider
# add language converters
babelfish.language_converters.register('addic7ed = subliminal_patch.patch_language:PatchedAddic7edConverter')
babelfish.language_converters.register('tvsubtitles = subliminal.converters.tvsubtitles:TVsubtitlesConverter')
# patch subliminal's external subtitles search algorithm
subliminal.video.search_external_subtitles = patched_search_external_subtitles
@@ -4,14 +4,20 @@ import logging
import traceback
import requests
import socket
import operator
import time
from babelfish.exceptions import LanguageReverseError
from pkg_resources import EntryPoint, iter_entry_points
from subliminal.api import ProviderPool
from subliminal.api import ProviderPool, compute_score
logger = logging.getLogger(__name__)
DOWNLOAD_TRIES = 0
DOWNLOAD_RETRY_SLEEP = 2
class OldToNewProvider(object):
"""
Simple proxy class to support the .plugin property which would normally exist
@@ -182,3 +188,101 @@ class PatchedProviderPool(ProviderPool):
subtitles.extend(provider_subtitles)
return subtitles
    def download_subtitle(self, subtitle):
        """Download `subtitle`'s :attr:`~subliminal.subtitle.Subtitle.content`.

        :param subtitle: subtitle to download.
        :type subtitle: :class:`~subliminal.subtitle.Subtitle`
        :return: `True` if the subtitle has been successfully downloaded, `False` otherwise.
        :rtype: bool
        """
        # check discarded providers
        if subtitle.provider_name in self.discarded_providers:
            logger.warning('Provider %r is discarded', subtitle.provider_name)
            return False

        logger.info('Downloading subtitle %r', subtitle)
        tries = 0

        # retry downloading on failure until settings' download retry limit hit
        while True:
            tries += 1
            try:
                self[subtitle.provider_name].download_subtitle(subtitle)
            except (requests.Timeout, socket.timeout):
                logger.error('Provider %r timed out', subtitle.provider_name)
            except Exception:
                logger.exception('Unexpected error in provider %r, Traceback: %s', subtitle.provider_name, traceback.format_exc())
            else:
                break

            if tries == DOWNLOAD_TRIES:
                self.discarded_providers.add(subtitle.provider_name)
                logger.error('Maximum retries reached for provider %r, discarding it', subtitle.provider_name)
                return False

            # don't hammer the provider
            logger.debug('Errors while downloading subtitle, retrying provider %r in %s seconds', subtitle.provider_name, DOWNLOAD_RETRY_SLEEP)
            time.sleep(DOWNLOAD_RETRY_SLEEP)

        # check subtitle validity
        if not subtitle.is_valid():
            logger.error('Invalid subtitle')
            return False

        return True

    def download_best_subtitles(self, subtitles, video, languages, min_score=0, hearing_impaired=False, only_one=False,
                                scores=None):
        """Download the best matching subtitles.

        :param subtitles: the subtitles to use.
        :type subtitles: list of :class:`~subliminal.subtitle.Subtitle`
        :param video: video to download subtitles for.
        :type video: :class:`~subliminal.video.Video`
        :param languages: languages to download.
        :type languages: set of :class:`~babelfish.language.Language`
        :param int min_score: minimum score for a subtitle to be downloaded.
        :param bool hearing_impaired: hearing impaired preference.
        :param bool only_one: download only one subtitle, not one per language.
        :param dict scores: scores to use, if `None`, the :attr:`~subliminal.video.Video.scores` from the video are
            used.
        :return: downloaded subtitles.
        :rtype: list of :class:`~subliminal.subtitle.Subtitle`
        """
        # sort subtitles by score
        unsorted_subtitles = []
        for s in subtitles:
            logger.debug("Starting score computation for %s", s)
            unsorted_subtitles.append((s, compute_score(s.get_matches(video, hearing_impaired=hearing_impaired), video,
                                                        scores=scores)))
        scored_subtitles = sorted(unsorted_subtitles, key=operator.itemgetter(1), reverse=True)

        # download best subtitles, falling back on the next on error
        downloaded_subtitles = []
        for subtitle, score in scored_subtitles:
            # check score
            if score < min_score:
                logger.info('Score %d is below min_score (%d)', score, min_score)
                break

            # check downloaded languages
            if subtitle.language in set(s.language for s in downloaded_subtitles):
                logger.debug('Skipping subtitle: %r already downloaded', subtitle.language)
                continue

            # download
            logger.info('Downloading subtitle %r with score %d', subtitle, score)
            if self.download_subtitle(subtitle):
                downloaded_subtitles.append(subtitle)

            # stop when all languages are downloaded
            if set(s.language for s in downloaded_subtitles) == languages:
                logger.debug('All languages downloaded')
                break

            # stop if only one subtitle is requested
            if only_one:
                logger.debug('Only one subtitle downloaded')
                break

        return downloaded_subtitles
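Note on the retry loop in `download_subtitle` above: the same loop shape, reduced to a standalone sketch. `download_with_retries` and its parameters are hypothetical names, not part of subliminal or this plugin. The sketch compares with `>=` where the diff uses `==`, which is an assumption made here so that a limit of 0 or 1 still terminates after one failed attempt.

```python
import time


def download_with_retries(download, max_tries, retry_sleep=2.0, sleep=time.sleep):
    """Call `download()` until it succeeds or `max_tries` attempts have failed.

    Returns True on success and False once the attempt limit is reached.
    """
    tries = 0
    while True:
        tries += 1
        try:
            download()
        except Exception:
            # a failed attempt: give up once the limit is reached
            if tries >= max_tries:
                return False
            # don't hammer the provider between attempts
            sleep(retry_sleep)
        else:
            return True
```

Passing `sleep` as a parameter keeps the function testable without real delays; in the plugin the pause is a module-level constant (`DOWNLOAD_RETRY_SLEEP`).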
@@ -1,24 +0,0 @@
# coding=utf-8

import logging
from random import randint
from subliminal.providers.addic7ed import Addic7edProvider

logger = logging.getLogger(__name__)


class PatchedAddic7edProvider(Addic7edProvider):
    USE_ADDICTED_RANDOM_AGENTS = False

    def __init__(self, username=None, password=None, use_random_agents=False):
        super(PatchedAddic7edProvider, self).__init__(username=username, password=password)
        self.USE_ADDICTED_RANDOM_AGENTS = use_random_agents

    def initialize(self):
        super(PatchedAddic7edProvider, self).initialize()
        if self.USE_ADDICTED_RANDOM_AGENTS:
            from .utils import FIRST_THOUSAND_OR_SO_USER_AGENTS as AGENT_LIST
            logger.debug("addic7ed: using random user agents")
            self.session.headers = {
                'User-Agent': AGENT_LIST[randint(0, len(AGENT_LIST)-1)],
                'Referer': self.server_url,
            }
@@ -0,0 +1,135 @@
# coding=utf-8

import logging
import re
from random import randint
from subliminal.providers.addic7ed import Addic7edProvider, Addic7edSubtitle, ParserBeautifulSoup, Language
from subliminal.cache import SHOW_EXPIRATION_TIME, region
from .mixins import PunctuationMixin

logger = logging.getLogger(__name__)

series_year_re = re.compile(r'^(?P<series>[ \w.:]+)(?: \((?P<year>\d{4})\))?$')


class PatchedAddic7edProvider(PunctuationMixin, Addic7edProvider):
    USE_ADDICTED_RANDOM_AGENTS = False

    def __init__(self, username=None, password=None, use_random_agents=False):
        super(PatchedAddic7edProvider, self).__init__(username=username, password=password)
        self.USE_ADDICTED_RANDOM_AGENTS = use_random_agents

    def initialize(self):
        # patch: add optional user agent randomization
        super(PatchedAddic7edProvider, self).initialize()
        if self.USE_ADDICTED_RANDOM_AGENTS:
            from .utils import FIRST_THOUSAND_OR_SO_USER_AGENTS as AGENT_LIST
            logger.debug("addic7ed: using random user agents")
            self.session.headers = {
                'User-Agent': AGENT_LIST[randint(0, len(AGENT_LIST)-1)],
                'Referer': self.server_url,
            }

    @region.cache_on_arguments(expiration_time=SHOW_EXPIRATION_TIME)
    def _get_show_ids(self):
        """Get the ``dict`` of show ids per series by querying the `shows.php` page.

        :return: show id per series, lower case and without quotes.
        :rtype: dict

        # patch: add punctuation cleaning
        """
        # get the show page
        logger.info('Getting show ids')
        r = self.session.get(self.server_url + 'shows.php', timeout=10)
        r.raise_for_status()
        soup = ParserBeautifulSoup(r.content, ['lxml', 'html.parser'])

        # populate the show ids
        show_ids = {}
        for show in soup.select('td.version > h3 > a[href^="/show/"]'):
            show_ids[self.clean_punctuation(show.text.lower().replace('\'', ''))] = int(show['href'][6:])
        logger.debug('Found %d show ids', len(show_ids))

        return show_ids

    @region.cache_on_arguments(expiration_time=SHOW_EXPIRATION_TIME)
    def _search_show_id(self, series, year=None):
        """Search the show id from the `series` and `year`.

        :param string series: series of the episode.
        :param year: year of the series, if any.
        :type year: int or None
        :return: the show id, if found.
        :rtype: int or None

        # patch: add punctuation cleaning
        """
        # build the params
        series_year = '%s (%d)' % (series, year) if year is not None else series
        params = {'search': series_year, 'Submit': 'Search'}

        # make the search
        logger.info('Searching show ids with %r', params)
        r = self.session.get(self.server_url + 'search.php', params=params, timeout=10)
        r.raise_for_status()
        soup = ParserBeautifulSoup(r.content, ['lxml', 'html.parser'])

        # get the suggestion
        suggestion = soup.select('span.titulo > a[href^="/show/"]')
        if not suggestion:
            logger.warning('Show id not found: no suggestion')
            return None
        if not self.clean_punctuation(suggestion[0].i.text.lower()) == self.clean_punctuation(series_year.lower()):
            logger.warning('Show id not found: suggestion does not match')
            return None
        show_id = int(suggestion[0]['href'][6:])
        logger.debug('Found show id %d', show_id)

        return show_id

    def query(self, series, season, year=None, country=None):
        # patch: fix logging

        # get the show id
        show_id = self.get_show_id(series, year, country)
        if show_id is None:
            logger.error('No show id found for %r (%r)', series, {'year': year, 'country': country})
            return []

        # get the page of the season of the show
        logger.info('Getting the page of show id %d, season %d', show_id, season)
        r = self.session.get(self.server_url + 'show/%d' % show_id, params={'season': season}, timeout=10)
        r.raise_for_status()
        soup = ParserBeautifulSoup(r.content, ['lxml', 'html.parser'])

        # loop over subtitle rows
        header = soup.select('#header font')
        if header:
            match = series_year_re.match(header[0].text.strip()[:-10])
            series = match.group('series')
            year = int(match.group('year')) if match.group('year') else None

        subtitles = []
        for row in soup.select('tr.epeven'):
            cells = row('td')

            # ignore incomplete subtitles
            status = cells[5].text
            if status != 'Completed':
                logger.debug('Ignoring subtitle with status %s', status)
                continue

            # read the item
            language = Language.fromaddic7ed(cells[3].text)
            hearing_impaired = bool(cells[6].text)
            page_link = self.server_url + cells[2].a['href'][1:]
            season = int(cells[0].text)
            episode = int(cells[1].text)
            title = cells[2].text
            version = cells[4].text
            download_link = cells[9].a['href'][1:]

            subtitle = Addic7edSubtitle(language, hearing_impaired, page_link, series, season, episode, title, year,
                                        version, download_link)
            logger.debug('Found subtitle %r', subtitle)
            subtitles.append(subtitle)

        return subtitles
@@ -0,0 +1,7 @@
# coding=utf-8


class PunctuationMixin(object):
    def clean_punctuation(self, s):
        # fixes show ids for stuff like "Mr. Petterson", as our matcher already sees it as "Mr Petterson" but addic7ed doesn't
        return s.replace(".", "")
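Note on the mixin above: a small sketch of why stripping periods on both sides makes the show lookup succeed. The show names and ids are made up for illustration, and normalizing the query inside `lookup` is an assumption of this sketch; in the diff the provider's listing is cleaned in `_get_show_ids`.

```python
def clean_punctuation(s):
    """Strip periods, as PunctuationMixin.clean_punctuation does in the diff."""
    return s.replace(".", "")


def normalize(name):
    """Lower-case, drop apostrophes, then strip periods (same order as _get_show_ids)."""
    return clean_punctuation(name.lower().replace("'", ""))


# Hypothetical provider listing: display name -> show id (ids are invented).
provider_shows = {"Mr. Robot": 5151, "Marvel's Agents of S.H.I.E.L.D.": 4010}

# Index the listing by normalized name.
show_ids = {normalize(name): show_id for name, show_id in provider_shows.items()}


def lookup(series):
    """Return the show id for `series`, or None if it is not listed."""
    return show_ids.get(normalize(series))
```

A query such as `lookup("Mr Robot")` then resolves even though the provider spells the title with a period.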
@@ -0,0 +1,23 @@
# coding=utf-8

import logging
from subliminal.exceptions import ConfigurationError
from subliminal.providers.opensubtitles import OpenSubtitlesProvider, checked, get_version, __version__

logger = logging.getLogger(__name__)


class PatchedOpenSubtitlesProvider(OpenSubtitlesProvider):
    def __init__(self, username=None, password=None):
        # credentials are optional, but must be given together
        if username is not None and password is None or username is None and password is not None:
            raise ConfigurationError('Username and password must be specified')

        self.username = username or ''
        self.password = password or ''

        super(PatchedOpenSubtitlesProvider, self).__init__()

    def initialize(self):
        logger.info('Logging in')
        response = checked(self.server.LogIn(self.username, self.password, 'eng', 'subliminal v%s' % get_version(__version__)))
        self.token = response['token']
        logger.debug('Logged in with token %r', self.token)
@@ -0,0 +1,22 @@
# coding=utf-8

import logging
import io
from zipfile import ZipFile
from subliminal.providers.podnapisi import PodnapisiProvider, fix_line_ending, ProviderError

logger = logging.getLogger(__name__)


class PatchedPodnapisiProvider(PodnapisiProvider):
    def download_subtitle(self, subtitle):
        # download as a zip
        logger.info('Downloading subtitle %r', subtitle)
        r = self.session.get(self.server_url + subtitle.pid + '/download', params={'container': 'zip'}, timeout=10)
        r.raise_for_status()

        # open the zip
        with ZipFile(io.BytesIO(r.content)) as zf:
            if len(zf.namelist()) > 1:
                raise ProviderError('More than one file to unzip')

            subtitle.content = fix_line_ending(zf.read(zf.namelist()[0]))
@@ -0,0 +1,45 @@
# coding=utf-8

import logging
from subliminal.providers import ParserBeautifulSoup
from subliminal.cache import SHOW_EXPIRATION_TIME, region
from subliminal.providers.tvsubtitles import TVsubtitlesProvider, link_re
from .mixins import PunctuationMixin

logger = logging.getLogger(__name__)


class PatchedTVsubtitlesProvider(PunctuationMixin, TVsubtitlesProvider):
    @region.cache_on_arguments(expiration_time=SHOW_EXPIRATION_TIME)
    def search_show_id(self, series, year=None):
        """Search the show id from the `series` and `year`.

        :param string series: series of the episode.
        :param year: year of the series, if any.
        :type year: int or None
        :return: the show id, if any.
        :rtype: int or None
        """
        # make the search
        logger.info('Searching show id for %r', series)
        r = self.session.post(self.server_url + 'search.php', data={'q': series}, timeout=10)
        r.raise_for_status()

        # get the series out of the suggestions
        soup = ParserBeautifulSoup(r.content, ['lxml', 'html.parser'])
        show_id = None
        for suggestion in soup.select('div.left li div a[href^="/tvshow-"]'):
            match = link_re.match(self.clean_punctuation(suggestion.text))
            if not match:
                logger.error('Failed to match %s', suggestion.text)
                continue

            if match.group('series').lower() == series.lower():
                if year is not None and int(match.group('first_year')) != year:
                    logger.debug('Year does not match')
                    continue
                show_id = int(suggestion['href'][8:-5])
                logger.debug('Found show id %d', show_id)
                break

        return show_id
@@ -0,0 +1,61 @@
# coding=utf-8

import os
import logging
from subliminal.video import SUBTITLE_EXTENSIONS, Language

logger = logging.getLogger(__name__)

# may be absolute or relative paths; set to selected options
CUSTOM_PATHS = []


def _search_external_subtitles(path):
    dirpath, filename = os.path.split(path)
    dirpath = dirpath or '.'
    fileroot, fileext = os.path.splitext(filename)
    subtitles = {}
    for p in os.listdir(dirpath):
        # keep only valid subtitle filenames
        if not p.startswith(fileroot) or not p.endswith(SUBTITLE_EXTENSIONS):
            continue

        # extract the potential language code
        language_code = p[len(fileroot):-len(os.path.splitext(p)[1])].replace(fileext, '').replace('_', '-')[1:]

        # default language is undefined
        language = Language('und')

        # attempt to parse
        if language_code:
            try:
                language = Language.fromietf(language_code)
            except ValueError:
                logger.error('Cannot parse language code %r', language_code)

        subtitles[p] = language

    logger.debug('Found subtitles %r', subtitles)

    return subtitles


def patched_search_external_subtitles(path):
    """
    wrap original search_external_subtitles function to search multiple paths for one given video
    # todo: cleanup and merge with _search_external_subtitles
    """
    video_path, video_filename = os.path.split(path)
    subtitles = {}
    for folder_or_subfolder in [video_path] + CUSTOM_PATHS:
        # folder_or_subfolder may be a relative path or an absolute one
        try:
            abspath = unicode(os.path.abspath(os.path.join(*[video_path if not os.path.isabs(folder_or_subfolder) else "", folder_or_subfolder, video_filename])))
        except Exception as e:
            logger.error("skipping path %s because of %s", repr(folder_or_subfolder), e)
            continue
        logger.debug("external subs: scanning path %s", abspath)

        if os.path.isdir(os.path.dirname(abspath)):
            subtitles.update(_search_external_subtitles(abspath))
    logger.debug("external subs: found %s", subtitles)
    return subtitles
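Note on `_search_external_subtitles` above: the language-code expression is dense, so here it is unpacked step by step as a standalone sketch. `extract_language_code` is a hypothetical helper name; the slicing and replacements follow the expression in the diff, and the function assumes the subtitle filename has an extension (the caller already filters on `SUBTITLE_EXTENSIONS`).

```python
import os


def extract_language_code(video_filename, subtitle_filename):
    """Return the text between the video's root name and the subtitle extension.

    An empty string means the subtitle filename carries no language code.
    """
    fileroot, fileext = os.path.splitext(video_filename)
    subtitle_ext = os.path.splitext(subtitle_filename)[1]
    # drop the shared root and the subtitle extension: "movie.en.srt" -> ".en"
    middle = subtitle_filename[len(fileroot):-len(subtitle_ext)]
    # tolerate names such as "movie.mkv.en.srt" and underscores such as "pt_BR"
    middle = middle.replace(fileext, '').replace('_', '-')
    # strip the leading separator
    return middle[1:]
```

For example, `extract_language_code("movie.mkv", "movie.pt_BR.srt")` yields `"pt-BR"`, which is the form the diff then hands to `Language.fromietf`.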
+45 -11
@@ -1,16 +1,50 @@
pannal's fork:
# pannal's fork of Subliminal.bundle
Please install [LocalMediaExtended.bundle](https://github.com/pannal/LocalMediaExtended.bundle) and use it **INSTEAD** of LocalMedia.
Use the following agent order:
1. Subliminal TV/Movie Subtitles
2. Local Media Assets Extended
3. anything else
4. again, **DISABLE Local Media Assets**!
## Changelog
#### RC-3
- addic7ed/tvsubtitles: punctuation fixes (correctly get show ids for series like "Mr. Poopster" now)
- podnapisi: fix logging
- opensubtitles: add login credentials (for VIPs)
- retry failed subtitle downloads, with a configurable number of retries before the provider is discarded
- move possibly not needed setting "Restrict to one language" to the bottom
- more detailed logging
- some cleanup
RC-2
- fix empty custom subtitle folder creation
- fix detection of existing embedded subtitles (switch to https://github.com/tonswieb/enzyme)
- better logging
- set default TV score to 15; movie score to 30
RC-1
- fix subliminal's logging error on min_score not met (fixes #15)
- separated tv and movies subtitle scores settings (fixes #16)
- add option to save only one subtitle per video (skipping the ".lang." naming scheme plex supports) (fixes #3)
beta5
- fix storing subtitles beside the actual video file, not in a subfolder (fixes #14)
- "custom folder" setting now always used if given (properly overrides "subtitle folder" setting)
- also scan (custom) given subtitle folders for existing subtitles instead of redownloading them on every refresh (fixes #9, #2)
beta4
- ~~increased score of addic7ed subtitles a bit~~ (not existing currently)
- **support for newest Subliminal (1.0.1) and guessit (0.10.1)**
- **plugin now also works with com.plexapp.agents.thetvdbdvdorder**
- guessit's release-group detection bug fixed (*not the correct way, though. has already been fixed in guessit itself, need to merge*)
- providers fixed for subliminal 1.0.1 (at least addic7ed)
- support for addic7ed languages: French (Canadian)
- support for additional languages: pt-br (Portuguese (Brasil)), fa (Persian (Farsi))
- support for three (two optional) subtitle languages
bugs:
- skip existing subtitles (not in video's path - e.g. subFolder given) currently broken
- **support for newest Subliminal ([1.0.1](27a6e51cd36ffb2910cd9a7add6d797a2c6469b7)) and guessit ([0.11.0](2814f57e8999dcc31575619f076c0c1a63ce78f2))**
- **plugin now also [works with com.plexapp.agents.thetvdbdvdorder](924470d2c0db3a71529278bce4b7247eaf2f85b8)**
- providers fixed for subliminal 1.0.1 ([at least addic7ed](131504e7eed8b3400c457fbe49beea3b115bc916))
- providers [don't simply fail and get excluded on non-detected language](1a779020792e0201ad689eefbf5a126155e89c97)
- support for addic7ed languages: [French (Canadian)](b11a051c233fd72033f0c3b5a8c1965260e7e19f)
- support for additional languages: [pt-br (Portuguese (Brasil)), fa (Persian (Farsi))](131504e7eed8b3400c457fbe49beea3b115bc916)
- support for [three (two optional) subtitle languages](e543c927cf49c264eaece36640c99d67a99c7da2)
- optionally use [random user agent for addic7ed provider](83ace14faf75fbd75313f0ceda9b78161895fbcf) (should not be needed)
Subliminal.bundle
=================