Update README

Add support for ripgrep
Pass a list of arguments to Popen
2020-10-05 19:30:05 +03:00 · 2020-10-05 19:22:16 +03:00 · 2020-10-05 18:42:29 +03:00 · 2020-10-05 17:59:41 +03:00 · 2020-10-05 17:10:04 +03:00 · 2020-10-05 15:09:02 +03:00
2 changed files with 188 additions and 95 deletions
@@ -10,46 +10,49 @@ Inspired by [videogrep](http://lav.io/2014/06/videogrep-automatic-supercuts-with

 # Usage

-Run ```python playphrase.py -i <media_dir> _init_``` to generate txt files from srt files that will be used for search (only the first time or when you add new movies in your folder).
+First, run ```python playphrase.py -i <media_dir> _init_``` to generate the txt files from the srt files that will be used for searching (only needed the first time, or whenever you add a new movie to the media folder).

-After that use 
-```python playphrase.py -i <media_dir> <phrase>```
+After that, use ```python playphrase.py -i <media_dir> <phrase>```

-Regular expressions can be used in search, for example, \b for word boundary.
+Regular expressions can be used too, for example, \b for word boundary.

 ### Keyboard Shortcuts 

-Use ```Enter``` to move to the next clip or ```Shift + <``` and ```Shift + >``` to switch between clips, ```Ctrl + Left``` and ```Ctrl + Right``` to move to the prev/next subtitle, ```q``` to close video player.
+Use ```Enter``` to move to the next clip or ```Shift + <``` and ```Shift + >``` to switch between clips, ```Ctrl + Left``` and ```Ctrl + Right``` to move to the prev/next subtitle, ```q``` to close the video player.

 More info: [https://mpv.io/manual/stable/#keyboard-control](https://mpv.io/manual/master/#keyboard-control)

 ### Batch Scripts

-There's ```videogrep.bat``` (Windows) and ```videogrep.sh``` (Linux) files to simplify user input. First time before running edit them and update ```media_dir``` path. Use ```quit```, ```exit``` or ```q```, ```x``` to exit from the batch script.
+The repository contains ```videogrep.bat``` (Windows) and ```videogrep.sh``` (Linux) files to simplify user input. Before running a script for the first time, open it in a text editor and update the ```media_dir``` path. Use ```quit```, ```exit```, ```q``` or ```x``` to exit from the batch script.
+
+Here's a quick demo of how to set up and run ```videogrep.bat``` on Windows ([YouTube](https://youtu.be/kEkXZY4LFCY)).

 ### Additional Options:

-* ```-ph, --phrases GAP_BETWEEN_PHRASES``` 
-move start time of the clip to the beginning of the current phrase. Value is optional (default=1.75 seconds)
-* ```-l, --limit``` 
-maximum duration of the phrase (default=30 seconds)
-* ```-p, --padding``` 
-padding in seconds to add to the start and end of each clip (default=0.0 seconds)
-* ```-e, --ending``` 
+* ```-ph GAP_BETWEEN_PHRASES, --phrases```
+move the start time of the clip to the beginning of the current phrase (default=1.25 seconds)
+* ```-l SECONDS, --limit```
+maximum phrase's duration (default=60 seconds)
+* ```-p SECONDS, --padding```
+padding in seconds to add to the start and the end of each clip (default=0.0 seconds)
+* ```-e SECONDS, --ending```
 play only matching lines (or phrases)
-* ```-r, --randomize``` 
-randomize clips
-* ```-o, --output``` 
-name of the file in which output of \'grep\' command will be written
-* ```-d, --demo``` 
+* ```-r, --randomize```
+randomize the clips
+* ```-o FILENAME, --output```
+write the \'grep\' output to the file
+* ```-d, --demo```
 only show grep results
 * ```-a, --audio```
 create audio fragments
 * ```-v, --video```
 create video fragments
-* ```-s, --video-sub```
-create video fragments with subtitles
-* ```-m, --mpv-options OPTIONS```
+* ```-vs, --video-sub```
+create video fragments with hardcoded subtitles
+* ```-s, --subtitles```
+create subtitles for fragments
+* ```-m OPTIONS, --mpv-options```
 mpv player options

 ### Optional Configuration Changes
@@ -85,9 +88,10 @@ Here's example video how it looks like (YouTube):

 # Requirements

-* python 2.7
-* grep
+* python 3
+* grep or [ripgrep](https://github.com/BurntSushi/ripgrep) (for non-ASCII languages)
 * mpv
+* ffmpeg

 # Note

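To make the README's description of the search concrete: judging by `write_subtitles` and `search_phrase_in_grep` in the code diff below, each line of a generated txt file has the form `(start, end)<TAB>text`, and the phrase is embedded in a regular expression over such lines. The sketch below mimics that with Python's `re` module rather than grep or ripgrep, so it only approximates the real search. `find_matches` and the sample index text are hypothetical names and data used for illustration.

```python
import re

# One SRT-style timestamp, e.g. 00:00:01,000
TIME = r"\d\d:\d\d:\d\d,\d\d\d"

def find_matches(index_text, phrase):
    """Return (start, end, text) for every index line whose text matches phrase.

    phrase is treated as a regular expression, so \\b etc. are allowed.
    """
    pattern = re.compile(
        r"^\((" + TIME + r"), (" + TIME + r")\)\t([^\n]*" + phrase + r"[^\n]*)$",
        re.IGNORECASE | re.MULTILINE,
    )
    return [m.group(1, 2, 3) for m in pattern.finditer(index_text)]

# Hypothetical content of a generated .txt file
index_text = (
    "(00:00:01,000, 00:00:03,500)\tI'll be back.\n"
    "(00:00:04,000, 00:00:06,000)\tBackground noise is fine.\n"
)

# \b keeps "back" from matching inside "Background"
print(find_matches(index_text, r"\bback\b"))
# [('00:00:01,000', '00:00:03,500', "I'll be back.")]
```

Without the word boundaries, the plain phrase `back` would match both lines, which is why the README points out `\b`.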
@@ -4,6 +4,7 @@
 import os
 import random
 import re
+import shutil
 import sys
 import subprocess
 import time
@@ -32,9 +33,19 @@ def get_time_parts(time):
 def seconds_to_srt_time(time):
    return '%02d:%02d:%02d,%03d' % get_time_parts(time)

-def read_subtitles(content):
+def read_subtitles(file_path):
+    content = open(file_path, 'rb').read()
+
+    if content[:3] == b'\xef\xbb\xbf': # with bom
+        content = content[3:]
+
+    ret_code, content = convert_to_unicode(content)
+    if ret_code == False:
+        sys.exit(1)
+
    subs = []
-    
+    content = content.replace('\r\n', '\n')
+    content = content.replace('\r', '\n')
    content = re.sub('\n\n+', '\n\n', content)
    for sub in content.strip().split('\n\n'):
        sub_chunks = sub.split('\n')
@@ -43,7 +54,7 @@ def read_subtitles(content):
            
            sub_start = srt_time_to_seconds(sub_timecode[0].strip())
            sub_end = srt_time_to_seconds(sub_timecode[1].strip())
-            sub_content = " ".join(sub_chunks[2:]).replace("\t", " ")
+            sub_content = "\n".join(sub_chunks[2:]).replace("\t", " ")
            sub_content = re.sub(r"<[^>]+>", "", sub_content)
            sub_content = re.sub(r"  +", " ", sub_content)
            sub_content = sub_content.strip()
@@ -59,7 +70,7 @@ def convert_into_sentences(en_subs, limit):
    for sub in en_subs:
        sub_start = sub[0]
        sub_end = sub[1]
-        sub_content = sub[2]
+        sub_content = sub[2].replace('\n', ' ')

        if len(subs) > 0: 
            prev_sub_start = subs[-1][0]
@@ -69,14 +80,15 @@ def convert_into_sentences(en_subs, limit):
            if ((sub_start - prev_sub_end) <= 2 and (sub_end - prev_sub_start) < limit and 
                sub_content[0] != '-' and
                sub_content[0] != '"' and
-                sub_content[0] != u'♪' and
+                sub_content[0] != '♪' and
+                sub_content[0].isupper() != True and
                (prev_sub_content[-1] != '.' or (sub_content[0:3] == '...' or (prev_sub_content[-3:] == '...' and sub_content[0].islower()))) and 
                prev_sub_content[-1] != '?' and
                prev_sub_content[-1] != '!' and
                prev_sub_content[-1] != ']' and
                prev_sub_content[-1] != ')' and
-                prev_sub_content[-1] != u'♪' and
-                prev_sub_content[-1] != u'”' and
+                prev_sub_content[-1] != '♪' and
+                prev_sub_content[-1] != '”' and
                prev_sub_content[-1] != '"'):

                subs[-1] = (prev_sub_start, sub_end, prev_sub_content + " " + sub_content)
@@ -87,15 +99,38 @@ def convert_into_sentences(en_subs, limit):

    return subs

-def write_subtitles(filename, subs):
-    f = open(filename, 'w')
+def filter_subtitles(subs, clip_start, clip_end):
+    subs_filtered = []

    for idx in range(len(subs)):
-        f.write("(%s, %s)" % (seconds_to_srt_time(subs[idx][0]), seconds_to_srt_time(subs[idx][1])))
-        f.write("\t")
-        f.write(subs[idx][2].encode('utf-8'))
-        f.write("\n")
-    
+        sub_start = subs[idx][0]
+        sub_end = subs[idx][1]
+        sub_content = subs[idx][2]
+        
+        if sub_end > clip_start and sub_start < clip_end:
+            subs_filtered.append((sub_start - clip_start, sub_end - clip_start, sub_content))
+
+        if sub_start > clip_end:
+            break
+
+    return subs_filtered
+
+def write_subtitles(filename, subs):
+    f = open(filename, 'w', encoding='utf-8')
+
+    if filename.endswith('.srt'):
+        for idx in range(len(subs)):
+            f.write(str(idx+1) + "\n")
+            f.write(seconds_to_srt_time(subs[idx][0]) + " --> " + seconds_to_srt_time(subs[idx][1]) + "\n")
+            f.write(subs[idx][2] + "\n")
+            f.write("\n")
+    else:
+        for idx in range(len(subs)):
+            f.write("(%s, %s)" % (seconds_to_srt_time(subs[idx][0]), seconds_to_srt_time(subs[idx][1])))
+            f.write("\t")
+            f.write(subs[idx][2])
+            f.write("\n")
+        
    f.close()

 def update_mpv_player_cmd(cmd_options, mpv_options):
@@ -115,6 +150,12 @@ def update_mpv_player_cmd(cmd_options, mpv_options):

    return cmd

+def update_progress(progress, num, max_num):
+    width = 25
+    n = int(progress / 100.0 * width)
+    sys.stdout.write("\r %3d%% [%s%s%s] %d/%d" % (progress, "=" * n, ">", " " * (width - n), num, max_num))
+    sys.stdout.flush()
+
 def get_fragment_filename(phrase):
    s = phrase.strip().replace(' ', '_')
    s = s.replace('.*', '...')
@@ -125,6 +166,8 @@ def get_fragment_filename(phrase):

 def create_fragments(search_phrase, clips, export_mode):
    idx = 1
+    
+    update_progress(0, 0, len(clips))
    for video_file, clip_start, clip_end in clips:
        fragment_filename = get_fragment_filename(search_phrase)

@@ -166,6 +209,14 @@ def create_fragments(search_phrase, clips, export_mode):
            p = subprocess.Popen(cmd)
            p.wait()

+        if export_mode["subtitles"]:
+            subtitles_filename = video_file.rsplit('.', 1)[0] + ".srt"
+            subs = read_subtitles(subtitles_filename)
+            subs = filter_subtitles(subs, clip_start, clip_end)
+            write_subtitles(fragment_filename + ".srt", subs)
+
+        update_progress(float(idx) / len(clips) * 100, idx, len(clips))
+        
        idx += 1

 def play_clips(clips, ending_mode, mpv_options):
@@ -195,40 +246,64 @@ def play_clips(clips, ending_mode, mpv_options):
                
                try:
                    if p.poll() == None:
-                        f_pipe.write(" ".join(cmd) + "\n")
+                        msg = " ".join(cmd) + "\n"
+                        f_pipe.write(msg.encode('utf-8'))
                    else:
                        break
                except IOError as ex:
                    if ex.errno != 32:
-                        print ex
+                        print(ex)
                    if p != None:
                        p.kill()
                    return

-def main(media_dir, search_phrase, phrase_mode, phrases_gap, padding, limit, output_file, ending_mode, randomize_mode, demo_mode, mpv_options, audio_mode, video_mode, video_with_sub_mode):
-    search_phrase = search_phrase.decode(locale.getpreferredencoding())
-    search_phrase_in_utf8_representation = repr(search_phrase.encode("UTF-8"))
-    search_phrase_in_grep = "\"(?s)\(\d\d:\d\d:\d\d,\d\d\d\, \d\d:\d\d:\d\d,\d\d\d\)\\t[^\\n]*" + search_phrase_in_utf8_representation.strip("\'") + "[^\\n]*\""
+def print_match(media_dir, filename, line, attrs={"prev_filename": None}):
+    if filename.startswith(media_dir):
+        filename = filename.replace(media_dir + os.sep, '', 1)

-    cmd = " ".join(["grep", "-r", "-z", "-o", "-i", "--include", "\*\.txt", "-P", search_phrase_in_grep, '"' + media_dir + '"'])
-    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True, bufsize=-1)
+    if attrs["prev_filename"] != filename:
+        print()
+        print('-', filename)
+        print()
+
+    attrs["prev_filename"] = filename
+
+    line = line.replace('\t', ' ')
+    print(line)
+
+def main(media_dir, search_phrase, phrase_mode, phrases_gap, padding, limit, output_file, ending_mode, randomize_mode, demo_mode, mpv_options, audio_mode, video_mode, video_with_sub_mode, subtitles_mode):
+    search_phrase_in_grep = "(?s)\(\d\d:\d\d:\d\d,\d\d\d\, \d\d:\d\d:\d\d,\d\d\d\)\\t[^\\n]*" + search_phrase + "[^\\n]*"
+
+    rg = shutil.which('rg')
+    if rg:
+        cmd = ["rg", "--no-heading", "--null-data", "-N", "-o", "-i", "-g", "*.txt", "-P", search_phrase_in_grep, media_dir]
+    else:
+        cmd = ["grep", "-r", "-z", "-o", "-i", "--include", "*.txt", "-P", search_phrase_in_grep, media_dir]
+    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, bufsize=-1)
    output, error = p.communicate()

+    media_dir = os.path.abspath(media_dir).replace('\\', '/').replace('/', os.sep)
+
    if p.returncode == 0:
        matches = output.rstrip("\x00").split("\x00")
-        
+
        if output_file != None:
            with open(output_file, 'w') as f_results:
                f_results.write("\n".join(matches))

        clips = []
        for match in matches:
-            filename, line = match.split(".txt:", 1)
-            
-            lines = line.splitlines()
+            filename, line = match.strip().split(".txt:", 1)
+
+            filename = os.path.abspath(filename).replace('\\', '/').replace('/', os.sep)
+
+            if demo_mode:
+                print_match(media_dir, filename, line)
+
+            lines = line.split('\n')

            def get_line_timings(line):
-                sub_timing, sub_content = line.split("\t", 1)            
+                sub_timing, sub_content = line.split("\t", 1)
                sub_start, sub_end = sub_timing.strip("()").split(", ")
                return (sub_start, sub_end)

@@ -310,27 +385,30 @@ def main(media_dir, search_phrase, phrase_mode, phrases_gap, padding, limit, out
            for ext in movie_extensions:
                movie_filename = filename + "." + ext
                if os.path.isfile(movie_filename):
-                    clips.append((os.path.abspath(movie_filename), phrase_start - padding, phrase_end + padding))
+                    clips.append((movie_filename, phrase_start - padding, phrase_end + padding))
                    break

+        if demo_mode:
+            print()
+
        if phrase_mode:
-            print "Number of matches:", len(clips)
+            print("Number of matches:", len(clips))
            clips = list(OrderedDict.fromkeys(clips)) # delete duplicates
-        
-        print "Number of clips:", len(clips)
+
+        print("Number of clips:", len(clips))
        
        if randomize_mode:
            random.shuffle(clips)

-        if audio_mode or video_mode or video_with_sub_mode:
-            create_fragments(search_phrase, clips, {"audio": audio_mode, "video": video_mode, "video-sub": video_with_sub_mode})
+        if audio_mode or video_mode or video_with_sub_mode or subtitles_mode:
+            create_fragments(search_phrase, clips, {"audio": audio_mode, "video": video_mode, "video-sub": video_with_sub_mode, "subtitles": subtitles_mode})
        elif not demo_mode:
            play_clips(clips, ending_mode, mpv_options)

    elif p.returncode == 1:
-        print "'%s' is not found in '%s'" % (search_phrase, media_dir)
+        print("'%s' is not found in '%s'" % (search_phrase, media_dir))
    else:
-        print '%s' % error
+        print('%s' % error)

 def need_update(media_dir):
    srt_counter = 0
@@ -355,7 +433,7 @@ def convert_to_unicode(file_content):
        except UnicodeDecodeError:
            pass

-    print "ERROR: Unknown encoding. Use srt file with 'utf-8' encoding."
+    print("ERROR: Unknown encoding. Use srt file with 'utf-8' encoding.")
    return (False, file_content)

 def init(media_dir, limit):
@@ -365,17 +443,9 @@ def init(media_dir, limit):
            if file_ext == "srt":
                file_path = os.path.join(root, file)

-                print file_path
+                print(file_path)

-                with open(file_path, 'rU') as f_srt:
-                    content = f_srt.read()
-                    if content[:3]=='\xef\xbb\xbf': # with bom
-                        content = content[3:]
-                    ret_code, content = convert_to_unicode(content)
-                    if ret_code == False:
-                        sys.exit(1)
-
-                subs = read_subtitles(content)
+                subs = read_subtitles(file_path)
                subs = convert_into_sentences(subs, limit)

                write_subtitles(file_path[:-4] + ".txt", subs)
@@ -386,10 +456,10 @@ def parse_args(argv):

    search_phrase = argv[-1]
    if len(search_phrase) == 0:
-        print "Search phrase can't be empty"
-        sys.exit()
+        print("Search phrase can't be empty")
+        sys.exit(1)

-    args = {"padding": 0, "limit": 60, "output_file": None, "phrase_mode": False, "phrases_gap":1.25, "search_phrase":search_phrase, "ending_mode":False, "randomize_mode":False, "demo_mode":False, "mpv_options":"", "audio_mode":False, "video_mode":False, "video_with_sub_mode":False }
+    args = {"padding": 0, "limit": 60, "output_file": None, "phrase_mode": False, "phrases_gap":1.25, "search_phrase":search_phrase, "ending_mode":False, "randomize_mode":False, "demo_mode":False, "mpv_options":"", "audio_mode":False, "video_mode":False, "video_with_sub_mode":False, "subtitles_mode":False }
    
    argv = argv[:-1]
    idx = 0
@@ -424,8 +494,10 @@ def parse_args(argv):
            args["audio_mode"] = True
        elif argv[idx] == "--video" or argv[idx] == "-v":
            args["video_mode"] = True
-        elif argv[idx] == "--video-sub" or argv[idx] == "-s":
+        elif argv[idx] == "--video-sub" or argv[idx] == "-vs":
            args["video_with_sub_mode"] = True
+        elif argv[idx] == "--subtitles" or argv[idx] == "-s":
+            args["subtitles_mode"] = True
        elif argv[idx] == "--phrases" or argv[idx] == "-ph":
            args["phrase_mode"] = True
            if idx + 1 < len(argv):
@@ -450,37 +522,54 @@ def parse_args(argv):
    
    return args

-def usage():
-    print "Usage: playphrase -i <media_dir> <phrase>"
-    print ""
-    print "Init: playphrase -i <media_dir> _init_"
-    print ""
-    print "Additional options:"
-    print "-ph, --phrases GAP_BETWEEN_PHRASES", " ", "move start time of the clip to the beginning of the current phrase. Value is optional (default=1.25 seconds)"
-    print "-l, --limit", "     ", "maximum duration of the phrase (default=60 seconds)"
-    print "-p, --padding", "   ", "padding in seconds to add to the start and end of each clip (default=0.0 seconds)"
-    print "-e, --ending", "    ", "play only matching lines (or phrases)"
-    print "-r, --randomize", " ", "randomize the clips"
-    print "-o, --output", "    ", "name of the file in which output of \'grep\' command will be written"
-    print "-d, --demo", "      ", "only show grep results"
-    print "-a, --audio", "     ", "create audio fragments"
-    print "-v, --video", "     ", "create video fragments"
-    print "-s, --video-sub", " ", "create video fragments with subtitles"
-    print "-m, --mpv-options OPTIONS", " ", "mpv player options"
+def validate_args(args):
+    if not os.path.isdir(args["media_dir"]):
+        print("ERROR: '{}' is not a folder".format(args["media_dir"]))
+        return False
+    if args["output_file"]:
+        if os.path.isdir(args["output_file"]):
+            print("ERROR: '{}' can't be a folder".format(args["output_file"]))
+            return False
+    return True
+
+def print_usage():
+    print("Usage: playphrase -i <media_dir> <phrase>")
+    print()
+    print("Init: playphrase -i <media_dir> _init_")
+    print()
+    print("Additional options:")
+    print("-ph GAP_BETWEEN_PHRASES, --phrases", "    ", "move the start time of the clip to the beginning of the current phrase (default=1.25 seconds)")
+    print("-l SECONDS, --limit", "       ", "maximum phrase's duration (default=60 seconds)")
+    print("-p SECONDS, --padding", "     ", "padding in seconds to add to the start and the end of each clip (default=0.0 seconds)")
+    print("-e SECONDS, --ending", "      ", "play only matching lines (or phrases)")
+    print("-r, --randomize", "           ", "randomize the clips")
+    print("-o FILENAME, --output", "     ", "write the 'grep' output to the file")
+    print("-d, --demo", "                ", "only show grep results")
+    print("-a, --audio", "               ", "create audio fragments")
+    print("-v, --video", "               ", "create video fragments")
+    print("-vs, --video-sub", "          ", "create video fragments with hardcoded subtitles")
+    print("-s, --subtitles", "           ", "create subtitles for fragments")
+    print("-m OPTIONS, --mpv-options", " ", "mpv player options")

 if __name__ == '__main__':
    os.environ["PATH"] += os.pathsep + "." + os.sep + "utils" + os.sep + "grep"
    os.environ["PATH"] += os.pathsep + "." + os.sep + "utils" + os.sep + "mpv"
    os.environ["PATH"] += os.pathsep + "." + os.sep + "utils" + os.sep + "ffmpeg"

+    if "LC_ALL" not in os.environ:
+        os.environ["LC_ALL"] = "en_US.utf8"
+
    args = parse_args(sys.argv[1:])
-    if args != False:
+
+    if args == False:
+        print_usage()
+        sys.exit(1)
+
+    if validate_args(args):
        if args["search_phrase"] == "_init_":
            init(args["media_dir"], args["limit"])
        else:
            if need_update(args["media_dir"]):
-                print "WARNING: number of '.srt' and '.txt' files doesn't match. Maybe you need to use 'playphrase <media_dir> _init_'."
+                print("WARNING: number of '.srt' and '.txt' files doesn't match. Maybe use 'playphrase -i <media_dir> _init_'.")
            
-            main(args["media_dir"], args["search_phrase"], args["phrase_mode"], args["phrases_gap"], args["padding"], args["limit"], args["output_file"], args["ending_mode"], args["randomize_mode"], args["demo_mode"], args["mpv_options"], args["audio_mode"], args["video_mode"], args["video_with_sub_mode"])
-    else:
-        usage()
+            main(args["media_dir"], args["search_phrase"], args["phrase_mode"], args["phrases_gap"], args["padding"], args["limit"], args["output_file"], args["ending_mode"], args["randomize_mode"], args["demo_mode"], args["mpv_options"], args["audio_mode"], args["video_mode"], args["video_with_sub_mode"], args["subtitles_mode"])
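The `filter_subtitles` function added in this diff keeps the subtitles that overlap a clip and shifts their timings so they are relative to the clip start. Below is a small standalone sketch of that idea with a worked example. It is not the code from the diff: the clamping of timings to the clip boundaries is an addition made here for illustration (the version in the diff does not clamp, so a subtitle that begins before the clip would get a negative start), and the sample data is hypothetical.

```python
def filter_subtitles(subs, clip_start, clip_end):
    """Keep subtitles overlapping the clip and rebase them to clip time.

    subs is a list of (start, end, text) tuples in seconds, assumed to be
    sorted by start time. Clamping to [0, clip length] is an assumption of
    this sketch, not something the diff above does.
    """
    clip_length = clip_end - clip_start
    filtered = []
    for sub_start, sub_end, text in subs:
        if sub_start > clip_end:
            break  # later subtitles cannot overlap the clip
        if sub_end > clip_start and sub_start < clip_end:
            new_start = max(0.0, sub_start - clip_start)
            new_end = min(clip_length, sub_end - clip_start)
            filtered.append((new_start, new_end, text))
    return filtered

# Hypothetical subtitles from a full-length movie
subs = [
    (1.0, 3.0, "Hello."),
    (9.5, 11.0, "I'll be back."),
    (12.0, 14.0, "Later."),
    (20.0, 22.0, "Bye."),
]

# Clip covering seconds 10 to 13 of the movie
print(filter_subtitles(subs, 10.0, 13.0))
# [(0.0, 1.0, "I'll be back."), (2.0, 3.0, 'Later.')]
```

The result can then be written out in SRT form, which is what the `.srt` branch of `write_subtitles` does in the diff.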
Author	SHA1	Message	Date
kelciour	447973e773	Update README	2020-10-05 19:30:05 +03:00
kelciour	983d302eaa	Add support for ripgrep	2020-10-05 19:22:16 +03:00
kelciour	c1c2a424f4	Pass a list of arguments to Popen	2020-10-05 18:42:29 +03:00
kelciour	64fba5c94c	Improve --demo option	2020-10-05 17:59:41 +03:00
kelciour	a2ea5cbd28	Split on \n only'	2020-10-05 17:10:04 +03:00
kelciour	b46e891978	Update README	2020-10-05 15:09:02 +03:00
kelciour	2f930673e4	Reword usage	2020-10-05 14:59:04 +03:00
kelciour	92dc4cf137	Make sure <media_dir> is a folder	2020-10-05 14:31:11 +03:00
kelciour	17673583f0	Fix for grep 3.1: -P supports only unibyte and UTF-8 locales	2020-10-05 12:46:48 +03:00
kelciour	d56180e89b	Update to Python 3	2020-10-05 12:41:25 +03:00
kelciour	accfa1b039	Update README	2017-11-16 00:44:32 +03:00
kelciour	cec525a11f	Update README	2017-11-16 00:35:26 +03:00
kelciour	4ece4ce006	Add progress bar	2017-11-16 00:22:48 +03:00
kelciour	1a8ec33ea6	Don't join sentences in songs	2017-11-15 20:15:28 +03:00
kelciour	e5d972c18f	Add subtitles export	2017-11-15 20:01:16 +03:00
kelciour	4f9d9867b8	Remove surrounding quotes from search_phrase in utf-8 representation	2017-11-07 16:18:01 +03:00
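
The commits "Add support for ripgrep" and "Pass a list of arguments to Popen" together come down to choosing a search tool at run time and handing its arguments over as a list instead of a shell string. The sketch below isolates that step. The flags are copied from the diff above; the helper name `build_search_cmd` and its `rg_path` parameter are hypothetical and exist only to make the choice easy to try out.

```python
import shutil

def build_search_cmd(pattern, media_dir, rg_path=None):
    """Return the argument list for ripgrep if rg_path is set, else for grep.

    rg_path is expected to be the result of shutil.which("rg"), i.e. a path
    or None. The flags are the ones used in the diff above.
    """
    if rg_path:
        return ["rg", "--no-heading", "--null-data", "-N", "-o", "-i",
                "-g", "*.txt", "-P", pattern, media_dir]
    return ["grep", "-r", "-z", "-o", "-i", "--include", "*.txt",
            "-P", pattern, media_dir]

# Pattern and folder both contain spaces; as list items they need no quoting.
cmd = build_search_cmd(r"\bhello world\b", "my movies", shutil.which("rg"))
print(cmd)
```

Because each argument is a separate list item and `shell=True` is no longer used, the pattern and the folder name reach the tool unchanged, which is why the manual quoting of `media_dir` and the escaping of `*.txt` could be dropped in the diff.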