16 Commits

Author SHA1 Message Date
kelciour 447973e773 Update README 2020-10-05 19:30:05 +03:00
kelciour 983d302eaa Add support for ripgrep 2020-10-05 19:22:16 +03:00
kelciour c1c2a424f4 Pass a list of arguments to Popen 2020-10-05 18:42:29 +03:00
kelciour 64fba5c94c Improve --demo option 2020-10-05 17:59:41 +03:00
kelciour a2ea5cbd28 Split on \n only' 2020-10-05 17:10:04 +03:00
kelciour b46e891978 Update README 2020-10-05 15:09:02 +03:00
kelciour 2f930673e4 Reword usage 2020-10-05 14:59:04 +03:00
kelciour 92dc4cf137 Make sure <media_dir> is a folder 2020-10-05 14:31:11 +03:00
kelciour 17673583f0 Fix for grep 3.1: -P supports only unibyte and UTF-8 locales 2020-10-05 12:46:48 +03:00
kelciour d56180e89b Update to Python 3 2020-10-05 12:41:25 +03:00
kelciour accfa1b039 Update README 2017-11-16 00:44:32 +03:00
kelciour cec525a11f Update README 2017-11-16 00:35:26 +03:00
kelciour 4ece4ce006 Add progress bar 2017-11-16 00:22:48 +03:00
kelciour 1a8ec33ea6 Don't join sentences in songs 2017-11-15 20:15:28 +03:00
kelciour e5d972c18f Add subtitles export 2017-11-15 20:01:16 +03:00
kelciour 4f9d9867b8 Remove surrounding quotes from search_phrase in utf-8 representation 2017-11-07 16:18:01 +03:00
2 changed files with 188 additions and 95 deletions
+27 -23
@@ -10,46 +10,49 @@ Inspired by [videogrep](http://lav.io/2014/06/videogrep-automatic-supercuts-with
# Usage
Run ```python playphrase.py -i <media_dir> _init_``` to generate txt files from srt files that will be used for search (only the first time or when you add new movies in your folder).
At first, run ```python playphrase.py -i <media_dir> _init_``` to generate txt files from srt files that will be used for searching (only the first time or when you add a new movie in the media folder).
After that use
```python playphrase.py -i <media_dir> <phrase>```
After that, use ```python playphrase.py -i <media_dir> <phrase>```
Regular expressions can be used in search, for example, \b for word boundary.
Regular expressions can be used too, for example, \b for word boundary.
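As a rough illustration of what `\b` changes, here is a minimal sketch in Python. It assumes the indexed txt files hold one `(start, end)<TAB>text` entry per line, which is what `write_subtitles` in this commit produces. The sample lines and the helper name `find_matches` are made up for the example, and Python's `re` module only stands in for the PCRE engine behind `grep -P` or `rg -P`.

```python
import re

# Hypothetical sample of an indexed .txt file: "(start, end)<TAB>text" per line.
SAMPLE = (
    "(00:00:01,000, 00:00:02,500)\tYes, we can.\n"
    "(00:00:03,000, 00:00:04,000)\tThe candle is lit.\n"
    "(00:00:05,000, 00:00:06,200)\tCan you hear me?\n"
)

def find_matches(phrase, text):
    """Return (start, end, content) for every line whose text matches phrase."""
    pattern = re.compile(
        r"\((\d\d:\d\d:\d\d,\d\d\d), (\d\d:\d\d:\d\d,\d\d\d)\)\t"
        r"([^\n]*" + phrase + r"[^\n]*)",
        re.IGNORECASE,  # mirrors the -i flag passed to grep/rg
    )
    return [m.groups() for m in pattern.finditer(text)]

# Without a word boundary, "can" also hits "candle".
plain = find_matches("can", SAMPLE)

# With \b on both sides, only the standalone word is matched.
bounded = find_matches(r"\bcan\b", SAMPLE)

for start, end, content in bounded:
    print(start, end, content)
```

Here `plain` contains all three lines, while `bounded` skips the "candle" line.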
### Keyboard Shortcuts
Use ```Enter``` to move to the next clip or ```Shift + <``` and ```Shift + >``` to switch between clips, ```Ctrl + Left``` and ```Ctrl + Right``` to move to the prev/next subtitle, ```q``` to close video player.
Use ```Enter``` to move to the next clip or ```Shift + <``` and ```Shift + >``` to switch between clips, ```Ctrl + Left``` and ```Ctrl + Right``` to move to the prev/next subtitle, ```q``` to close the video player.
More info: [https://mpv.io/manual/stable/#keyboard-control](https://mpv.io/manual/master/#keyboard-control)
### Batch Scripts
There's ```videogrep.bat``` (Windows) and ```videogrep.sh``` (Linux) files to simplify user input. First time before running edit them and update ```media_dir``` path. Use ```quit```, ```exit``` or ```q```, ```x``` to exit from the batch script.
The repository contains ```videogrep.bat``` (Windows) and ```videogrep.sh``` (Linux) files to simplify the user input. Before running it for the first time, edit the file in a text editor and update ```media_dir``` path. Use ```quit```, ```exit``` or ```q```, ```x``` to exit from the batch script.
Here's a quick demo how to set up and run ```videogrep.bat``` on Windows ([YouTube](https://youtu.be/kEkXZY4LFCY)).
### Additional Options:
* ```-ph, --phrases GAP_BETWEEN_PHRASES```
move start time of the clip to the beginning of the current phrase. Value is optional (default=1.75 seconds)
* ```-l, --limit```
maximum duration of the phrase (default=30 seconds)
* ```-p, --padding```
padding in seconds to add to the start and end of each clip (default=0.0 seconds)
* ```-e, --ending```
* ```-ph GAP_BETWEEN_PHRASES, --phrases```
move the start time of the clip to the beginning of the current phrase (default=1.25 seconds)
* ```-l SECONDS, --limit```
maximum phrase's duration (default=60 seconds)
* ```-p SECONDS, --padding```
padding in seconds to add to the start and the end of each clip (default=0.0 seconds)
* ```-e SECONDS, --ending```
play only matching lines (or phrases)
* ```-r, --randomize```
randomize clips
* ```-o, --output```
name of the file in which output of \'grep\' command will be written
* ```-d, --demo```
* ```-r, --randomize```
randomize the clips
* ```-o FILENAME, --output```
write the \'grep\' output to the file
* ```-d, --demo```
only show grep results
* ```-a, --audio```
create audio fragments
* ```-v, --video```
create video fragments
* ```-s, --video-sub```
create video fragments with subtitles
* ```-m, --mpv-options OPTIONS```
* ```-vs, --video-sub```
create video fragments with hardcoded subtitles
* ```-s, --subtitles```
create subtitles for fragments
* ```-m OPTIONS, --mpv-options```
mpv player options
### Optional Configuration Changes
@@ -85,9 +88,10 @@ Here's example video how it looks like (YouTube):
# Requirements
* python 2.7
* grep
* python 3
* grep or [ripgrep](https://github.com/BurntSushi/ripgrep) (for non-ASCII languages)
* mpv
* ffmpeg
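The choice between grep and ripgrep can be sketched as follows. The flags are copied from the `main` function in this commit. The helper name `build_search_cmd` and its `which` parameter are assumptions added only so the selection logic can be tried without either tool installed.

```python
import shutil

def build_search_cmd(pattern, media_dir, which=shutil.which):
    """Build the argument list, preferring ripgrep when it is found on PATH."""
    if which("rg"):
        return ["rg", "--no-heading", "--null-data", "-N", "-o", "-i",
                "-g", "*.txt", "-P", pattern, media_dir]
    # Fall back to GNU grep with PCRE support.
    return ["grep", "-r", "-z", "-o", "-i", "--include", "*.txt",
            "-P", pattern, media_dir]

# Show which tool would be used on this machine.
cmd = build_search_cmd(r"\bhello\b", "movies")
print(cmd[0])
```

Passing the command as a list, rather than a single shell string, avoids quoting problems with phrases that contain spaces or special characters.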
# Note
+161 -72
@@ -4,6 +4,7 @@
import os
import random
import re
import shutil
import sys
import subprocess
import time
@@ -32,9 +33,19 @@ def get_time_parts(time):
def seconds_to_srt_time(time):
return '%02d:%02d:%02d,%03d' % get_time_parts(time)
def read_subtitles(content):
def read_subtitles(file_path):
content = open(file_path, 'rb').read()
if content[:3] == b'\xef\xbb\xbf': # with bom
content = content[3:]
ret_code, content = convert_to_unicode(content)
if ret_code == False:
sys.exit(1)
subs = []
content = content.replace('\r\n', '\n')
content = content.replace('\r', '\n')
content = re.sub('\n\n+', '\n\n', content)
for sub in content.strip().split('\n\n'):
sub_chunks = sub.split('\n')
@@ -43,7 +54,7 @@ def read_subtitles(content):
sub_start = srt_time_to_seconds(sub_timecode[0].strip())
sub_end = srt_time_to_seconds(sub_timecode[1].strip())
sub_content = " ".join(sub_chunks[2:]).replace("\t", " ")
sub_content = "\n".join(sub_chunks[2:]).replace("\t", " ")
sub_content = re.sub(r"<[^>]+>", "", sub_content)
sub_content = re.sub(r" +", " ", sub_content)
sub_content = sub_content.strip()
@@ -59,7 +70,7 @@ def convert_into_sentences(en_subs, limit):
for sub in en_subs:
sub_start = sub[0]
sub_end = sub[1]
sub_content = sub[2]
sub_content = sub[2].replace('\n', ' ')
if len(subs) > 0:
prev_sub_start = subs[-1][0]
@@ -69,14 +80,15 @@ def convert_into_sentences(en_subs, limit):
if ((sub_start - prev_sub_end) <= 2 and (sub_end - prev_sub_start) < limit and
sub_content[0] != '-' and
sub_content[0] != '"' and
sub_content[0] != u'' and
sub_content[0] != '' and
sub_content[0].isupper() != True and
(prev_sub_content[-1] != '.' or (sub_content[0:3] == '...' or (prev_sub_content[-3:] == '...' and sub_content[0].islower()))) and
prev_sub_content[-1] != '?' and
prev_sub_content[-1] != '!' and
prev_sub_content[-1] != ']' and
prev_sub_content[-1] != ')' and
prev_sub_content[-1] != u'' and
prev_sub_content[-1] != u'' and
prev_sub_content[-1] != '' and
prev_sub_content[-1] != '' and
prev_sub_content[-1] != '"'):
subs[-1] = (prev_sub_start, sub_end, prev_sub_content + " " + sub_content)
@@ -87,15 +99,38 @@ def convert_into_sentences(en_subs, limit):
return subs
def write_subtitles(filename, subs):
f = open(filename, 'w')
def filter_subtitles(subs, clip_start, clip_end):
subs_filtered = []
for idx in range(len(subs)):
f.write("(%s, %s)" % (seconds_to_srt_time(subs[idx][0]), seconds_to_srt_time(subs[idx][1])))
f.write("\t")
f.write(subs[idx][2].encode('utf-8'))
f.write("\n")
sub_start = subs[idx][0]
sub_end = subs[idx][1]
sub_content = subs[idx][2]
if sub_end > clip_start and sub_start < clip_end:
subs_filtered.append((sub_start - clip_start, sub_end - clip_start, sub_content))
if sub_start > clip_end:
break
return subs_filtered
def write_subtitles(filename, subs):
f = open(filename, 'w', encoding='utf-8')
if filename.endswith('.srt'):
for idx in range(len(subs)):
f.write(str(idx+1) + "\n")
f.write(seconds_to_srt_time(subs[idx][0]) + " --> " + seconds_to_srt_time(subs[idx][1]) + "\n")
f.write(subs[idx][2] + "\n")
f.write("\n")
else:
for idx in range(len(subs)):
f.write("(%s, %s)" % (seconds_to_srt_time(subs[idx][0]), seconds_to_srt_time(subs[idx][1])))
f.write("\t")
f.write(subs[idx][2])
f.write("\n")
f.close()
def update_mpv_player_cmd(cmd_options, mpv_options):
@@ -115,6 +150,12 @@ def update_mpv_player_cmd(cmd_options, mpv_options):
return cmd
def update_progress(progress, num, max_num):
width = 25
n = int(progress / 100.0 * width)
sys.stdout.write("\r %3d%% [%s%s%s] %d/%d" % (progress, "=" * n, ">", " " * (width - n), num, max_num))
sys.stdout.flush()
def get_fragment_filename(phrase):
s = phrase.strip().replace(' ', '_')
s = s.replace('.*', '...')
@@ -125,6 +166,8 @@ def get_fragment_filename(phrase):
def create_fragments(search_phrase, clips, export_mode):
idx = 1
update_progress(0, 0, len(clips))
for video_file, clip_start, clip_end in clips:
fragment_filename = get_fragment_filename(search_phrase)
@@ -166,6 +209,14 @@ def create_fragments(search_phrase, clips, export_mode):
p = subprocess.Popen(cmd)
p.wait()
if export_mode["subtitles"]:
subtitles_filename = video_file.rsplit('.', 1)[0] + ".srt"
subs = read_subtitles(subtitles_filename)
subs = filter_subtitles(subs, clip_start, clip_end)
write_subtitles(fragment_filename + ".srt", subs)
update_progress(float(idx) / len(clips) * 100, idx, len(clips))
idx += 1
def play_clips(clips, ending_mode, mpv_options):
@@ -195,40 +246,64 @@ def play_clips(clips, ending_mode, mpv_options):
try:
if p.poll() == None:
f_pipe.write(" ".join(cmd) + "\n")
msg = " ".join(cmd) + "\n"
f_pipe.write(msg.encode('utf-8'))
else:
break
except IOError as ex:
if ex.errno != 32:
print ex
print(ex)
if p != None:
p.kill()
return
def main(media_dir, search_phrase, phrase_mode, phrases_gap, padding, limit, output_file, ending_mode, randomize_mode, demo_mode, mpv_options, audio_mode, video_mode, video_with_sub_mode):
search_phrase = search_phrase.decode(locale.getpreferredencoding())
search_phrase_in_utf8_representation = repr(search_phrase.encode("UTF-8"))
search_phrase_in_grep = "\"(?s)\(\d\d:\d\d:\d\d,\d\d\d\, \d\d:\d\d:\d\d,\d\d\d\)\\t[^\\n]*" + search_phrase_in_utf8_representation.strip("\'") + "[^\\n]*\""
def print_match(media_dir, filename, line, attrs={"prev_filename": None}):
if filename.startswith(media_dir):
filename = filename.replace(media_dir + os.sep, '', 1)
cmd = " ".join(["grep", "-r", "-z", "-o", "-i", "--include", "\*\.txt", "-P", search_phrase_in_grep, '"' + media_dir + '"'])
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True, bufsize=-1)
if attrs["prev_filename"] != filename:
print()
print('-', filename)
print()
attrs["prev_filename"] = filename
line = line.replace('\t', ' ')
print(line)
def main(media_dir, search_phrase, phrase_mode, phrases_gap, padding, limit, output_file, ending_mode, randomize_mode, demo_mode, mpv_options, audio_mode, video_mode, video_with_sub_mode, subtitles_mode):
search_phrase_in_grep = r"(?s)\(\d\d:\d\d:\d\d,\d\d\d\, \d\d:\d\d:\d\d,\d\d\d\)\t[^\n]*" + search_phrase + r"[^\n]*"
rg = shutil.which('rg')
if rg:
cmd = ["rg", "--no-heading", "--null-data", "-N", "-o", "-i", "-g", "*.txt", "-P", search_phrase_in_grep, media_dir]
else:
cmd = ["grep", "-r", "-z", "-o", "-i", "--include", "*.txt", "-P", search_phrase_in_grep, media_dir]
p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, bufsize=-1)
output, error = p.communicate()
media_dir = os.path.abspath(media_dir).replace('\\', '/').replace('/', os.sep)
if p.returncode == 0:
matches = output.rstrip("\x00").split("\x00")
if output_file != None:
with open(output_file, 'w') as f_results:
f_results.write("\n".join(matches))
clips = []
for match in matches:
filename, line = match.split(".txt:", 1)
lines = line.splitlines()
filename, line = match.strip().split(".txt:", 1)
filename = os.path.abspath(filename).replace('\\', '/').replace('/', os.sep)
if demo_mode:
print_match(media_dir, filename, line)
lines = line.split('\n')
def get_line_timings(line):
sub_timing, sub_content = line.split("\t", 1)
sub_timing, sub_content = line.split("\t", 1)
sub_start, sub_end = sub_timing.strip("()").split(", ")
return (sub_start, sub_end)
@@ -310,27 +385,30 @@ def main(media_dir, search_phrase, phrase_mode, phrases_gap, padding, limit, out
for ext in movie_extensions:
movie_filename = filename + "." + ext
if os.path.isfile(movie_filename):
clips.append((os.path.abspath(movie_filename), phrase_start - padding, phrase_end + padding))
clips.append((movie_filename, phrase_start - padding, phrase_end + padding))
break
if demo_mode:
print()
if phrase_mode:
print "Number of matches:", len(clips)
print("Number of matches:", len(clips))
clips = list(OrderedDict.fromkeys(clips)) # delete duplicates
print "Number of clips:", len(clips)
print("Number of clips:", len(clips))
if randomize_mode:
random.shuffle(clips)
if audio_mode or video_mode or video_with_sub_mode:
create_fragments(search_phrase, clips, {"audio": audio_mode, "video": video_mode, "video-sub": video_with_sub_mode})
if audio_mode or video_mode or video_with_sub_mode or subtitles_mode:
create_fragments(search_phrase, clips, {"audio": audio_mode, "video": video_mode, "video-sub": video_with_sub_mode, "subtitles": subtitles_mode})
elif not demo_mode:
play_clips(clips, ending_mode, mpv_options)
elif p.returncode == 1:
print "'%s' is not found in '%s'" % (search_phrase, media_dir)
print("'%s' is not found in '%s'" % (search_phrase, media_dir))
else:
print '%s' % error
print('%s' % error)
def need_update(media_dir):
srt_counter = 0
@@ -355,7 +433,7 @@ def convert_to_unicode(file_content):
except UnicodeDecodeError:
pass
print "ERROR: Unknown encoding. Use srt file with 'utf-8' encoding."
print("ERROR: Unknown encoding. Use srt file with 'utf-8' encoding.")
return (False, file_content)
def init(media_dir, limit):
@@ -365,17 +443,9 @@ def init(media_dir, limit):
if file_ext == "srt":
file_path = os.path.join(root, file)
print file_path
print(file_path)
with open(file_path, 'rU') as f_srt:
content = f_srt.read()
if content[:3]=='\xef\xbb\xbf': # with bom
content = content[3:]
ret_code, content = convert_to_unicode(content)
if ret_code == False:
sys.exit(1)
subs = read_subtitles(content)
subs = read_subtitles(file_path)
subs = convert_into_sentences(subs, limit)
write_subtitles(file_path[:-4] + ".txt", subs)
@@ -386,10 +456,10 @@ def parse_args(argv):
search_phrase = argv[-1]
if len(search_phrase) == 0:
print "Search phrase can't be empty"
sys.exit()
print("Search phrase can't be empty")
sys.exit(1)
args = {"padding": 0, "limit": 60, "output_file": None, "phrase_mode": False, "phrases_gap":1.25, "search_phrase":search_phrase, "ending_mode":False, "randomize_mode":False, "demo_mode":False, "mpv_options":"", "audio_mode":False, "video_mode":False, "video_with_sub_mode":False }
args = {"padding": 0, "limit": 60, "output_file": None, "phrase_mode": False, "phrases_gap":1.25, "search_phrase":search_phrase, "ending_mode":False, "randomize_mode":False, "demo_mode":False, "mpv_options":"", "audio_mode":False, "video_mode":False, "video_with_sub_mode":False, "subtitles_mode":False }
argv = argv[:-1]
idx = 0
@@ -424,8 +494,10 @@ def parse_args(argv):
args["audio_mode"] = True
elif argv[idx] == "--video" or argv[idx] == "-v":
args["video_mode"] = True
elif argv[idx] == "--video-sub" or argv[idx] == "-s":
elif argv[idx] == "--video-sub" or argv[idx] == "-vs":
args["video_with_sub_mode"] = True
elif argv[idx] == "--subtitles" or argv[idx] == "-s":
args["subtitles_mode"] = True
elif argv[idx] == "--phrases" or argv[idx] == "-ph":
args["phrase_mode"] = True
if idx + 1 < len(argv):
@@ -450,37 +522,54 @@ def parse_args(argv):
return args
def usage():
print "Usage: playphrase -i <media_dir> <phrase>"
print ""
print "Init: playphrase -i <media_dir> _init_"
print ""
print "Additional options:"
print "-ph, --phrases GAP_BETWEEN_PHRASES", " ", "move start time of the clip to the beginning of the current phrase. Value is optional (default=1.25 seconds)"
print "-l, --limit", " ", "maximum duration of the phrase (default=60 seconds)"
print "-p, --padding", " ", "padding in seconds to add to the start and end of each clip (default=0.0 seconds)"
print "-e, --ending", " ", "play only matching lines (or phrases)"
print "-r, --randomize", " ", "randomize the clips"
print "-o, --output", " ", "name of the file in which output of \'grep\' command will be written"
print "-d, --demo", " ", "only show grep results"
print "-a, --audio", " ", "create audio fragments"
print "-v, --video", " ", "create video fragments"
print "-s, --video-sub", " ", "create video fragments with subtitles"
print "-m, --mpv-options OPTIONS", " ", "mpv player options"
def validate_args(args):
if not os.path.isdir(args["media_dir"]):
print("ERROR: '{}' is not a folder".format(args["media_dir"]))
return False
if args["output_file"]:
if os.path.isdir(args["output_file"]):
print("ERROR: '{}' can't be a folder".format(args["output_file"]))
return False
return True
def print_usage():
print("Usage: playphrase -i <media_dir> <phrase>")
print()
print("Init: playphrase -i <media_dir> _init_")
print()
print("Additional options:")
print("-ph GAP_BETWEEN_PHRASES, --phrases", " ", "move the start time of the clip to the beginning of the current phrase (default=1.25 seconds)")
print("-l SECONDS, --limit", " ", "maximum phrase's duration (default=60 seconds)")
print("-p SECONDS, --padding", " ", "padding in seconds to add to the start and the end of each clip (default=0.0 seconds)")
print("-e SECONDS, --ending", " ", "play only matching lines (or phrases)")
print("-r, --randomize", " ", "randomize the clips")
print("-o FILENAME, --output", " ", "write the 'grep' output to the file")
print("-d, --demo", " ", "only show grep results")
print("-a, --audio", " ", "create audio fragments")
print("-v, --video", " ", "create video fragments")
print("-vs, --video-sub", " ", "create video fragments with hardcoded subtitles")
print("-s, --subtitles", " ", "create subtitles for fragments")
print("-m OPTIONS, --mpv-options", " ", "mpv player options")
if __name__ == '__main__':
os.environ["PATH"] += os.pathsep + "." + os.sep + "utils" + os.sep + "grep"
os.environ["PATH"] += os.pathsep + "." + os.sep + "utils" + os.sep + "mpv"
os.environ["PATH"] += os.pathsep + "." + os.sep + "utils" + os.sep + "ffmpeg"
if "LC_ALL" not in os.environ:
os.environ["LC_ALL"] = "en_US.utf8"
args = parse_args(sys.argv[1:])
if args != False:
if args == False:
print_usage()
sys.exit(1)
if validate_args(args):
if args["search_phrase"] == "_init_":
init(args["media_dir"], args["limit"])
else:
if need_update(args["media_dir"]):
print "WARNING: number of '.srt' and '.txt' files doesn't match. Maybe you need to use 'playphrase <media_dir> _init_'."
print("WARNING: number of '.srt' and '.txt' files doesn't match. Maybe use 'playphrase -i <media_dir> _init_'.")
main(args["media_dir"], args["search_phrase"], args["phrase_mode"], args["phrases_gap"], args["padding"], args["limit"], args["output_file"], args["ending_mode"], args["randomize_mode"], args["demo_mode"], args["mpv_options"], args["audio_mode"], args["video_mode"], args["video_with_sub_mode"])
else:
usage()
main(args["media_dir"], args["search_phrase"], args["phrase_mode"], args["phrases_gap"], args["padding"], args["limit"], args["output_file"], args["ending_mode"], args["randomize_mode"], args["demo_mode"], args["mpv_options"], args["audio_mode"], args["video_mode"], args["video_with_sub_mode"], args["subtitles_mode"])
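
The new `--subtitles` export can be illustrated with a small standalone sketch. `filter_subtitles` follows the logic added in this commit. `to_srt_time` and `to_srt` are hypothetical helpers written for the example (the real script uses `seconds_to_srt_time` and `write_subtitles`), and the sample subtitles are invented.

```python
def to_srt_time(seconds):
    """Format seconds as an SRT timestamp (hypothetical helper)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3600000)
    minutes, ms = divmod(ms, 60000)
    secs, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d,%03d" % (hours, minutes, secs, ms)

def filter_subtitles(subs, clip_start, clip_end):
    """Keep subtitles that overlap the clip and shift them to clip-relative time."""
    filtered = []
    for start, end, content in subs:
        if end > clip_start and start < clip_end:
            filtered.append((start - clip_start, end - clip_start, content))
        if start > clip_end:
            break  # subs are assumed sorted by start time
    return filtered

def to_srt(subs):
    """Render (start, end, content) tuples as SRT text (hypothetical helper)."""
    blocks = []
    for idx, (start, end, content) in enumerate(subs, 1):
        blocks.append("%d\n%s --> %s\n%s\n" % (
            idx, to_srt_time(start), to_srt_time(end), content))
    return "\n".join(blocks)

# Invented sample: (start_seconds, end_seconds, text)
subs = [
    (1.0, 2.5, "Hello."),
    (10.0, 12.0, "Where are you?"),
    (12.5, 14.0, "Over here."),
    (30.0, 31.0, "Bye."),
]

clip = filter_subtitles(subs, 9.5, 13.0)
srt_text = to_srt(clip)
print(srt_text)
```

Note that a subtitle overlapping the clip boundary is kept whole, so its shifted end time can run past the clip, and a subtitle that starts before the clip would get a negative start time.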