annotate channeldownloader.py @ 12:77c93f46dd06

Update channeldownloader.py
committer: GitHub <noreply@github.com>
author Paper <37962225+mrpapersonic@users.noreply.github.com>
date Fri, 14 Apr 2023 23:52:47 -0400
parents 1ac85f6f40c4
children 2e7a3725ad21
rev   line source
5
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 4
5     1    #!/usr/bin/env python3
9
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
9     2    """
9     3    Usage:
9     4      channeldownloader.py <url>... (--database <file>)
9     5                                    [--output <folder>]
9     6                                    [--proxy <proxy>]
9     7      channeldownloader.py -h | --help
5     8
9     9    Arguments:
9     10     <url>                        YouTube channel URL to download from
9     11
9     12   Options:
9     13     -h --help                    Show this screen
9     14     -o --output <folder>         Output folder, relative to the current directory
9     15                                  [default: .]
12
77c93f46dd06 Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 11
12    16     -d --database <file>         YTPMV_Database compatible JSON file
9     17   """
5     18   from __future__ import print_function
9     19   import docopt
5     20   import internetarchive
5     21   try:
5     22       import orjson as json
5     23   except ImportError:
5     24       import json
0
d098a293a02d Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
0     25   import os
2
c65d14f01453 Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 1
2     26   import re
6
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 5
6     27   import time
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
10    28   import urllib.request
10    29   import requests  # need this for ONE (1) exception
10    30   import yt_dlp as youtube_dl
10    31   from urllib.error import HTTPError
10    32   from yt_dlp.utils import sanitize_filename, DownloadError
10    33   from pathlib import Path
10    34   from requests.exceptions import ConnectTimeout
10    35
0     36
0     37   class MyLogger(object):
0     38       def debug(self, msg):
0     39           pass
0     40
0     41       def warning(self, msg):
0     42           pass
0     43
0     44       def error(self, msg):
9     45           print(" " + msg)
0     46           pass
0     47
10    48
10    49   def ytdl_hook(d) -> None:
0     50       if d["status"] == "finished":
9     51           print(" downloaded %s: 100%% " % (os.path.basename(d["filename"])))
0     52       if d["status"] == "downloading":
10    53           print(" downloading %s: %s\r" % (os.path.basename(d["filename"]),
10    54                                            d["_percent_str"]), end="")
0     55       if d["status"] == "error":
10    56           print("\n an error occurred downloading %s!"
10    57                 % (os.path.basename(d["filename"])))
0     58
10    59
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
11    60   def load_split_files(path: str):
11    61       if not os.path.isdir(path):
11    62           yield json.load(open(path, "r", encoding="utf-8"))
11    63       for fi in os.listdir(path):
11    64           if re.search(r"vids[0-9\-]+?\.json", fi):
11    65               with open(path + "/" + fi, "r", encoding="utf-8") as infile:
11    66                   print(fi)
11    67                   yield json.load(infile)
4
aa652a6f97af Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 3
4     68
10    69
10    70   def reporthook(count: int, block_size: int, total_size: int) -> None:
6     71       global start_time
6     72       if count == 0:
6     73           start_time = time.time()
6     74           return
6     75       percent = int(count * block_size * 100 / total_size)
6     76       print(" downloading %d%% \r" % (percent), end="")
6     77
10    78
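`urllib.request.urlretrieve` may pass `total_size` as -1 when the server reports no length, and `count * block_size` can overshoot the real size on the final block, so the percentage computed on line 75 can come out negative or above 100. A minimal sketch of a guarded calculation follows; `percent_done` is a hypothetical helper and not part of the file.

```python
def percent_done(count, block_size, total_size):
    """Return progress as an int in 0..100, or None when the size is unknown."""
    if total_size <= 0:
        return None  # no Content-Length: a percentage cannot be computed
    # Clamp, because the last block usually extends past the end of the file.
    return min(100, int(count * block_size * 100 / total_size))


halfway = percent_done(1, 8192, 16384)
overshoot = percent_done(3, 8192, 16384)
unknown = percent_done(1, 8192, -1)
```

A caller could print the byte count instead of a percentage whenever the helper returns `None`.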
10    79   def write_metadata(i: dict, basename: str) -> None:
10    80       if not os.path.exists(basename + ".info.json"):
10    81           with open(basename + ".info.json", "w", encoding="utf-8") as jsonfile:
10    82               try:
10    83                   jsonfile.write(json.dumps(i).decode("utf-8"))
10    84               except AttributeError:
10    85                   jsonfile.write(json.dumps(i))
10    86               print(" saved %s" % os.path.basename(jsonfile.name))
10    87       if not os.path.exists(basename + ".description"):
10    88           with open(basename + ".description", "w",
10    89                     encoding="utf-8") as descfile:
10    90               descfile.write(i["description"])
10    91               print(" saved %s" % os.path.basename(descfile.name))
10    92
0     93
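`write_metadata` copes with both JSON backends by calling `.decode()` and catching `AttributeError`, since `orjson.dumps` returns `bytes` while the standard `json.dumps` returns `str`. An explicit type check is one alternative. The sketch below assumes a hypothetical helper named `dumps_text` and uses only the standard library; the lambda merely imitates a bytes-returning backend.

```python
import json


def dumps_text(obj, dumps=json.dumps):
    """Serialize obj and always return str, whichever backend is passed in."""
    data = dumps(obj)
    # bytes-returning backends are decoded; str results pass through unchanged
    return data.decode("utf-8") if isinstance(data, bytes) else data


from_str_backend = dumps_text({"a": 1})
from_bytes_backend = dumps_text(
    {"a": 1}, dumps=lambda o: json.dumps(o).encode("utf-8")
)
```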
10    94   def wayback_machine_dl(video: dict, basename: str) -> int:
10    95       try:
10    96           url = ''.join(["https://web.archive.org/web/2oe_/http://wayback-fakeu",
10    97                          "rl.archive.org/yt/%s"])
10    98           headers = urllib.request.urlopen(url % video["id"])
10    99           contenttype = headers.getheader("Content-Type")
10    100          if contenttype == "video/webm":
10    101              ext = "webm"
10    102          elif contenttype == "video/mp4":
10    103              ext = "mp4"
10    104          else:
10    105              raise HTTPError(url=None, code=None, msg=None,
10    106                              hdrs=None, fp=None)
10    107          urllib.request.urlretrieve(url % video["id"], "%s.%s" % (basename, ext),
10    108                                     reporthook)
10    109          print(" downloaded %s.%s" % (basename, ext))
10    110          return 0
10    111      except TimeoutError:
10    112          return 1
10    113      except HTTPError:
10    114          print(" video not available on the Wayback Machine!")
10    115          return 0
10    116      except Exception as e:
10    117          print(" unknown error downloading video!\n")
10    118          print(e)
10    119          return 0
9     120
11    121
11    122  def ia_file_legit(path: str, vidid: str) -> bool:
11    123      return True if re.search(''.join([r"((?:.+?-)?", vidid, r"\.(?:mp4|jpg|web"
11    124                                        r"p|mkv|webm|info\\.json|description|annotations.xml"
11    125                                        "))"]),
11    126                               path) else False
11    127
11    128
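The pattern assembled on lines 123-125 splices `vidid` into the regex unescaped, and, if the doubled backslash in the listing is what the file really contains, `info\\.json` in a raw string asks for a literal backslash rather than a literal dot. The following is a sketch of an equivalent filter under those assumptions, not the author's implementation; `is_wanted_file` is a hypothetical name and the sample ID is only an illustration.

```python
import re

# Extensions taken from the listing, with the dots escaped.
_EXTENSIONS = (
    r"mp4|jpg|webp|mkv|webm|info\.json|description|annotations\.xml"
)


def is_wanted_file(name, video_id):
    """True if name looks like '<optional title->ID.<known extension>'."""
    pattern = r"(?:.+?-)?" + re.escape(video_id) + r"\.(?:" + _EXTENSIONS + r")"
    return re.search(pattern, name) is not None


vid = "dQw4w9WgXcQ"
with_title = is_wanted_file("Some_Title-dQw4w9WgXcQ.info.json", vid)
bare = is_wanted_file("dQw4w9WgXcQ.mp4", vid)
meta = is_wanted_file("dQw4w9WgXcQ_meta.xml", vid)
thumb = is_wanted_file("__ia_thumb.jpg", vid)
```

`re.escape` matters here because YouTube IDs are drawn from letters, digits, `_` and `-`, and callers may pass other strings that contain regex metacharacters.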
11    129  def internet_archive_dl(video: dict, basename: str, output: str) -> int:
10    130      if internetarchive.get_item("youtube-%s" % video["id"]).exists:
11    131          flist = [f.name for f in internetarchive.get_files("youtube-%s" % video["id"]) if ia_file_legit(f.name, video["id"])]
10    132          while True:
10    133              try:
10    134                  internetarchive.download("youtube-%s" % video["id"],
10    135                                           files=flist, verbose=True,
10    136                                           destdir=output,
10    137                                           no_directory=True,
10    138                                           ignore_existing=True,
10    139                                           retries=9999)
10    140                  break
10    141              except ConnectTimeout:
4     142                  continue
11    143              except Exception as e:
11    144                  print(e)
10    145                  return 0
10    146          if flist[0][:len(video["id"])] == video["id"]:
10    147              for fname in flist:
10    148                  if os.path.exists("%s/%s" % (output, fname)):
10    149                      os.replace("%s/%s" % (output, fname),
10    150                                 "%s-%s" % (basename.rsplit("-", 1)[0],
10    151                                            fname))
10    152          return 1
10    153      return 0
10    154
11    155
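Line 150 recovers the title prefix with `basename.rsplit("-", 1)[0]`. YouTube IDs may themselves contain `-`, in which case the split lands inside the ID and the renamed file ends up with a fragment of the ID in its title part. One possible alternative is to remove the known `-<id>` suffix explicitly, as sketched below; `strip_id_suffix` is a hypothetical helper, and the path and ID are made up for the demonstration.

```python
def strip_id_suffix(basename, video_id):
    """Return basename without a trailing '-<video_id>', if it has one."""
    suffix = "-" + video_id
    if basename.endswith(suffix):
        return basename[:-len(suffix)]
    return basename  # leave unexpected names untouched


example_id = "a-b_cDEF1234"           # made-up ID that contains a dash
example_base = "out/Title-" + example_id

by_rsplit = example_base.rsplit("-", 1)[0]
by_suffix = strip_id_suffix(example_base, example_id)
```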
11    156  ytdl_opts = {
11    157      "retries": 100,
11    158      "nooverwrites": True,
11    159      "call_home": False,
11    160      "quiet": True,
11    161      "writeinfojson": True,
11    162      "writedescription": True,
11    163      "writethumbnail": True,
11    164      "writeannotations": True,
11    165      "writesubtitles": True,
11    166      "allsubtitles": True,
11    167      "addmetadata": True,
11    168      "continuedl": True,
11    169      "embedthumbnail": True,
11    170      "format": "bestvideo+bestaudio/best",
11    171      "restrictfilenames": True,
11    172      "no_warnings": True,
11    173      "progress_hooks": [ytdl_hook],
11    174      "logger": MyLogger(),
11    175      "ignoreerrors": False,
11    176  }
11    177
11    178
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
179 def main():
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
180 args = docopt.docopt(__doc__)
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
181
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
182 if not os.path.exists(args["--output"]):
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
183 os.mkdir(args["--output"])
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
184
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
185 for f in load_split_files(args["--database"]):
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
186 for i in f:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
187 uploader = i["uploader_id"] if "uploader_id" in i else None
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
188 for url in args["<url>"]:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
189 channel = url.split("/")[-1]
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
190
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
191 output = "%s/%s" % (args["--output"], channel)
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
192 if not os.path.exists(output):
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
193 os.mkdir(output)
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
194 ytdl_opts["outtmpl"] = output + "/%(title)s-%(id)s.%(ext)s"
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
195
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
196 if uploader == channel:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
197 print(uploader, channel)
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
198 print("%s:" % i["id"])
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
199 basename = "%s/%s-%s" % (output, sanitize_filename(i["title"],
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
200 restricted=True), i["id"])
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
201 files = [y for p in ["mkv", "mp4", "webm"] for y in list(Path(output).glob(("*-%s." + p) % i["id"]))]
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
202 if files:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
203 print(" video already downloaded!")
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
204 write_metadata(i, basename)
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
205 continue
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
206 # this code is *really* ugly... todo a rewrite?
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
207 with youtube_dl.YoutubeDL(ytdl_opts) as ytdl:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
208 try:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
209 ytdl.extract_info("https://youtube.com/watch?v=%s"
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
210 % i["id"])
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
211 continue
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
212 except DownloadError:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
213 print(" video is not available! attempting to find "
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
214 "Internet Archive pages of it...")
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
215 except Exception as e:
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
216 print(" unknown error downloading video!\n")
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
217 print(e)
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
218 if internet_archive_dl(i, basename, output): # downloaded from IA, skip to the next video
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
219 continue
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
220 print(" video does not have an Internet Archive page! "
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
221 "attempting to download from the Wayback Machine...")
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
222 while True:
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
223 if wayback_machine_dl(i, basename) == 0: # success
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
224 break
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
225 time.sleep(5)
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
226 continue
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
227 write_metadata(i, basename)
6
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 5
diff changeset
228
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
229
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
230 if __name__ == "__main__":
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
231 main()
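
The loop above tries YouTube first, then the Internet Archive, then the Wayback Machine, and it retries the Wayback Machine every 5 seconds with no upper bound. As a rough sketch only, the same fallback order could be written as a data-driven chain with a bounded number of attempts per source. The function `download_with_fallbacks`, its `(name, fetch, attempts)` tuples, and the `sleep` parameter are hypothetical names introduced for this illustration; they are not part of channeldownloader.py, and each `fetch` callable is assumed to return a truthy value on success.

```python
import time


def download_with_fallbacks(video, sources, delay=5.0, sleep=time.sleep):
    """Try each source in order and return the name of the first that succeeds.

    `sources` is a list of (name, fetch, attempts) tuples (hypothetical shape).
    `fetch(video)` is assumed to return a truthy value on success; a falsy
    return value or a raised exception counts as a failed attempt.
    Returns None when every source has been exhausted.
    """
    for name, fetch, attempts in sources:
        for attempt in range(attempts):
            try:
                if fetch(video):
                    return name
            except Exception as exc:  # keep going, as the original loop does
                print(" %s failed: %s" % (name, exc))
            # Wait between retries of the same source, but not after the last one.
            if attempt + 1 < attempts:
                sleep(delay)
    return None
```

Capping the attempts means an item that no source can serve makes the function return `None` instead of looping forever, so the caller can log it and move on to the next video.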