Mercurial > codedump
annotate channeldownloader.py @ 130:7b9795a60e59 default tip
add FLAC tracknum utility
dumb piece of shit but it works
| author | Paper <paper@tflc.us> | 
|---|---|
| date | Thu, 30 Oct 2025 09:21:00 -0400 | 
| parents | 8ec0e91a5dcf | 
| children | 
| rev | line source | 
|---|---|
| 67 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 1 #!/usr/bin/env python3 | 
| 114 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 2 """ | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 3 Usage: | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 4 channeldownloader.py <url>... (--database <file>) | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 5 [--output <folder>] | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 6 channeldownloader.py -h | --help | 
| 67 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 7 | 
| 114 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 8 Arguments: | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 9 <url> YouTube channel URL to download from | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 10 | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 11 Options: | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 12 -h --help Show this screen | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 13 -o --output <folder> Output folder, relative to the current directory | 
| 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 14 [default: .] | 
| 120 
3ecb2e815854
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
119diff
changeset | 15 -d --database <file> YTPMV_Database compatible JSON file | 
| 114 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 16 """ | 
| 67 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 17 from __future__ import print_function | 
| 114 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 18 import docopt | 
| 67 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 19 import internetarchive | 
| 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 20 try: | 
| 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 21 import orjson as json | 
| 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 22 except ImportError: | 
| 
9636d5dee08c
[channeldownloader.py] Python 2.7 compatibility
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
61diff
changeset | 23 import json | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 24 import os | 
| 59 
a3927b2ec6e6
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
58diff
changeset | 25 import re | 
| 68 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 26 import time | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 27 import urllib.request | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 28 import requests # need this for ONE (1) exception | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 29 import yt_dlp as youtube_dl | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 30 from urllib.error import HTTPError | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 31 from yt_dlp.utils import sanitize_filename, DownloadError | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 32 from pathlib import Path | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 33 from requests.exceptions import ConnectTimeout | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 34 | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 35 | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 36 class MyLogger(object): | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 37 def debug(self, msg): | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 38 pass | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 39 | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 40 def warning(self, msg): | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 41 pass | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 42 | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 43 def error(self, msg): | 
| 114 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 44 print(" " + msg) | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 45 pass | 
| 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 46 | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 47 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 48 def ytdl_hook(d) -> None: | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 49 if d["status"] == "finished": | 
| 114 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 50 print(" downloaded %s: 100%% " % (os.path.basename(d["filename"]))) | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 51 if d["status"] == "downloading": | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 52 print(" downloading %s: %s\r" % (os.path.basename(d["filename"]), | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 53 d["_percent_str"]), end="") | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 54 if d["status"] == "error": | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 55 print("\n an error occurred downloading %s!" | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 56 % (os.path.basename(d["filename"]))) | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 57 | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 58 | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 59 def load_split_files(path: str): | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 60 if not os.path.isdir(path): | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 61 yield json.load(open(path, "r", encoding="utf-8")) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 62 for fi in os.listdir(path): | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 63 if re.search(r"vids[0-9\-]+?\.json", fi): | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 64 with open(path + "/" + fi, "r", encoding="utf-8") as infile: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 65 print(fi) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 66 yield json.load(infile) | 
| 61 
c615532e6572
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
60diff
changeset | 67 | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 68 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 69 def reporthook(count: int, block_size: int, total_size: int) -> None: | 
| 68 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 70 global start_time | 
| 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 71 if count == 0: | 
| 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 72 start_time = time.time() | 
| 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 73 return | 
| 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 74 percent = int(count * block_size * 100 / total_size) | 
| 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 75 print(" downloading %d%% \r" % (percent), end="") | 
| 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 76 | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 77 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 78 def write_metadata(i: dict, basename: str) -> None: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 79 if not os.path.exists(basename + ".info.json"): | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 80 with open(basename + ".info.json", "w", encoding="utf-8") as jsonfile: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 81 try: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 82 jsonfile.write(json.dumps(i).decode("utf-8")) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 83 except AttributeError: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 84 jsonfile.write(json.dumps(i)) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 85 print(" saved %s" % os.path.basename(jsonfile.name)) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 86 if not os.path.exists(basename + ".description"): | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 87 with open(basename + ".description", "w", | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 88 encoding="utf-8") as descfile: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 89 descfile.write(i["description"]) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 90 print(" saved %s" % os.path.basename(descfile.name)) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 91 | 
| 47 
00403c09455c
Add channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff
changeset | 92 | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 93 def wayback_machine_dl(video: dict, basename: str) -> int: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 94 try: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 95 url = ''.join(["https://web.archive.org/web/2oe_/http://wayback-fakeu", | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 96 "rl.archive.org/yt/%s"]) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 97 headers = urllib.request.urlopen(url % video["id"]) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 98 contenttype = headers.getheader("Content-Type") | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 99 if contenttype == "video/webm": | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 100 ext = "webm" | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 101 elif contenttype == "video/mp4": | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 102 ext = "mp4" | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 103 else: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 104 raise HTTPError(url=None, code=None, msg=None, | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 105 hdrs=None, fp=None) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 106 urllib.request.urlretrieve(url % video["id"], "%s.%s" % (basename, ext), | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 107 reporthook) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 108 print(" downloaded %s.%s" % (basename, ext)) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 109 return 0 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 110 except TimeoutError: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 111 return 1 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 112 except HTTPError: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 113 print(" video not available on the Wayback Machine!") | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 114 return 0 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 115 except Exception as e: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 116 print(" unknown error downloading video!\n") | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 117 print(e) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 118 return 0 | 
| 114 
80bd4a99ea00
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
70diff
changeset | 119 | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 120 | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 121 def ia_file_legit(path: str, vidid: str) -> bool: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 122 return True if re.search(''.join([r"((?:.+?-)?", vidid, r"\.(?:mp4|jpg|web" | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 123 r"p|mkv|webm|info\\.json|description|annotations.xml" | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 124 "))"]), | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 125 path) else False | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 126 | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 127 | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 128 def internet_archive_dl(video: dict, basename: str, output: str) -> int: | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 129 if internetarchive.get_item("youtube-%s" % video["id"]).exists: | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 130 flist = [f.name for f in internetarchive.get_files("youtube-%s" % video["id"]) if ia_file_legit(f.name, video["id"])] | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 131 while True: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 132 try: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 133 internetarchive.download("youtube-%s" % video["id"], | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 134 files=flist, verbose=True, | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 135 destdir=output, | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 136 no_directory=True, | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 137 ignore_existing=True, | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 138 retries=9999) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 139 break | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 140 except ConnectTimeout: | 
| 61 
c615532e6572
Update channeldownloader.py
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
60diff
changeset | 141 continue | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 142 except Exception as e: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 143 print(e) | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 144 return 0 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 145 if flist[0][:len(video["id"])] == video["id"]: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 146 for fname in flist: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 147 if os.path.exists("%s/%s" % (output, fname)): | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 148 os.replace("%s/%s" % (output, fname), | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 149 "%s-%s" % (basename.rsplit("-", 1)[0], | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 150 fname)) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 151 return 1 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 152 return 0 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 153 | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 154 | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 155 ytdl_opts = { | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 156 "retries": 100, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 157 "nooverwrites": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 158 "call_home": False, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 159 "quiet": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 160 "writeinfojson": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 161 "writedescription": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 162 "writethumbnail": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 163 "writeannotations": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 164 "writesubtitles": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 165 "allsubtitles": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 166 "addmetadata": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 167 "continuedl": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 168 "embedthumbnail": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 169 "format": "bestvideo+bestaudio/best", | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 170 "restrictfilenames": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 171 "no_warnings": True, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 172 "progress_hooks": [ytdl_hook], | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 173 "logger": MyLogger(), | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 174 "ignoreerrors": False, | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 175 } | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 176 | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 177 | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 178 def main(): | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 179 args = docopt.docopt(__doc__) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 180 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 181 if not os.path.exists(args["--output"]): | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 182 os.mkdir(args["--output"]) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 183 | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 184 for f in load_split_files(args["--database"]): | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 185 for i in f: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 186 uploader = i["uploader_id"] if "uploader_id" in i else None | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 187 for url in args["<url>"]: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 188 channel = url.split("/")[-1] | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 189 | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 190 output = "%s/%s" % (args["--output"], channel) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 191 if not os.path.exists(output): | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 192 os.mkdir(output) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 193 ytdl_opts["outtmpl"] = output + "/%(title)s-%(id)s.%(ext)s" | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 194 | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 195 if uploader == channel: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 196 print(uploader, channel) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 197 print("%s:" % i["id"]) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 198 basename = "%s/%s-%s" % (output, sanitize_filename(i["title"], | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 199 restricted=True), i["id"]) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 200 files = [y for p in ["mkv", "mp4", "webm"] for y in list(Path(output).glob(("*-%s." + p) % i["id"]))] | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 201 if files: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 202 print(" video already downloaded!") | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 203 write_metadata(i, basename) | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 204 continue | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 205 # this code is *really* ugly... todo a rewrite? | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 206 with youtube_dl.YoutubeDL(ytdl_opts) as ytdl: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 207 try: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 208 ytdl.extract_info("https://youtube.com/watch?v=%s" | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 209 % i["id"]) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 210 continue | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 211 except DownloadError: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 212 print(" video is not available! attempting to find In" | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 213 "ternet Archive pages of it...") | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 214 except Exception as e: | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 215 print(" unknown error downloading video!\n") | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 216 print(e) | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 217 if internet_archive_dl(i, basename, output): # if we can't download from IA | 
| 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 218 continue | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 219 print(" video does not have a Internet Archive page! attem" | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 220 "pting to download from the Wayback Machine...") | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 221 while True: | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 222 if wayback_machine_dl(i, basename) == 0: # success | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 223 break | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 224 time.sleep(5) | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 225 continue | 
| 119 
196cf2e3d96e
channeldownloader: insane memory optimizations
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
118diff
changeset | 226 write_metadata(i, basename) | 
| 68 
a43ed076b28f
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
67diff
changeset | 227 | 
| 118 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 228 | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 229 if __name__ == "__main__": | 
| 
eac6dae753ca
*: major cleanup
 Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 
114diff
changeset | 230 main() | 
