channeldownloader: channeldownloader.py annotate

annotate channeldownloader.py @ 10:8969930a9fa4

*: major cleanup
committer: GitHub <noreply@github.com>

author	Paper <37962225+mrpapersonic@users.noreply.github.com>
date	Fri, 03 Mar 2023 22:51:28 +0000
parents	2e9ed463c0be
children	1ac85f6f40c4

rev	line source
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	1 #!/usr/bin/env python3
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	2 """
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	3 Usage:
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	4 channeldownloader.py <url>... (--database <file>)
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	5 [--output <folder>]
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	6 [--proxy <proxy>]
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	7 channeldownloader.py -h | --help
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	8
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	9 Arguments:
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	10 <url> YouTube channel URL to download from
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	11
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	12 Options:
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	13 -h --help Show this screen
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	14 -o --output <folder> Output folder, relative to the current directory
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	15 [default: .]
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	16 -d --database <file> JSON database file, or a folder of split vids*.json files
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	17 """
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	18 from __future__ import print_function
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	19 import docopt
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	20 import internetarchive
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	21 try:
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	22 import orjson as json
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	23 except ImportError:
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	24 import json
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	25 import os
2 c65d14f01453 Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 1 diff changeset	26 import re
6 5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	27 import time
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	28 import urllib.request
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	29 import requests # need this for ONE (1) exception
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	30 import yt_dlp as youtube_dl
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	31 from urllib.error import HTTPError
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	32 from yt_dlp.utils import sanitize_filename, DownloadError
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	33 from pathlib import Path
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	34 from requests.exceptions import ConnectTimeout
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	35
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	36
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	37 class MyLogger(object):
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	38 def debug(self, msg):
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	39 pass
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	40
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	41 def warning(self, msg):
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	42 pass
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	43
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	44 def error(self, msg):
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	45 print(" " + msg)
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	46 pass
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	47
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	48
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	49 def ytdl_hook(d) -> None:
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	50 if d["status"] == "finished":
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	51 print(" downloaded %s: 100%% " % (os.path.basename(d["filename"])))
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	52 if d["status"] == "downloading":
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	53 print(" downloading %s: %s\r" % (os.path.basename(d["filename"]),
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	54 d["_percent_str"]), end="")
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	55 if d["status"] == "error":
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	56 print("\n an error occurred downloading %s!"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	57 % (os.path.basename(d["filename"])))
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	58
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	59
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	60 def load_split_files(path: str) -> dict:
4 aa652a6f97af Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 3 diff changeset	61 if os.path.isdir(path):
aa652a6f97af Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 3 diff changeset	62 result = {"videos": []}
7 571c5525fccb Use regex instead of weirdness to filter archive.org names Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 6 diff changeset	63 for fi in os.listdir(path):
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	64 for f in re.findall(r"vids[0-9\-]+?\.json", fi):
7 571c5525fccb Use regex instead of weirdness to filter archive.org names Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 6 diff changeset	65 with open(path + "/" + f, "r", encoding="utf-8") as infile:
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	66 jsonnn = json.loads(infile.read())
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	67 result["videos"].extend(jsonnn)
4 aa652a6f97af Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 3 diff changeset	68 return result
aa652a6f97af Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 3 diff changeset	69 else:
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	70 return json.loads(open(path, "r", encoding="utf-8").read())
4 aa652a6f97af Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 3 diff changeset	71
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	72
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	73 def reporthook(count: int, block_size: int, total_size: int) -> None:
6 5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	74 global start_time
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	75 if count == 0:
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	76 start_time = time.time()
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	77 return
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	78 percent = int(count * block_size * 100 / total_size)
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	79 print(" downloading %d%% \r" % (percent), end="")
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	80
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	81
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	82 def write_metadata(i: dict, basename: str) -> None:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	83 if not os.path.exists(basename + ".info.json"):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	84 with open(basename + ".info.json", "w", encoding="utf-8") as jsonfile:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	85 try:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	86 jsonfile.write(json.dumps(i).decode("utf-8"))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	87 except AttributeError:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	88 jsonfile.write(json.dumps(i))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	89 print(" saved %s" % os.path.basename(jsonfile.name))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	90 if not os.path.exists(basename + ".description"):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	91 with open(basename + ".description", "w",
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	92 encoding="utf-8") as descfile:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	93 descfile.write(i["description"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	94 print(" saved %s" % os.path.basename(descfile.name))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	95
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	96
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	97 ytdl_opts = {
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	98 "retries": 100,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	99 "nooverwrites": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	100 "call_home": False,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	101 "quiet": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	102 "writeinfojson": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	103 "writedescription": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	104 "writethumbnail": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	105 "writeannotations": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	106 "writesubtitles": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	107 "allsubtitles": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	108 "addmetadata": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	109 "continuedl": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	110 "embedthumbnail": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	111 "format": "bestvideo+bestaudio/best",
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	112 "restrictfilenames": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	113 "no_warnings": True,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	114 "progress_hooks": [ytdl_hook],
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	115 "logger": MyLogger(),
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	116 "ignoreerrors": False,
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	117 }
d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	118
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	119
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	120 def wayback_machine_dl(video: dict, basename: str) -> int:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	121 try:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	122 url = ''.join(["https://web.archive.org/web/2oe_/http://wayback-fakeu",
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	123 "rl.archive.org/yt/%s"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	124 headers = urllib.request.urlopen(url % video["id"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	125 contenttype = headers.getheader("Content-Type")
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	126 if contenttype == "video/webm":
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	127 ext = "webm"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	128 elif contenttype == "video/mp4":
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	129 ext = "mp4"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	130 else:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	131 raise HTTPError(url=None, code=None, msg=None,
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	132 hdrs=None, fp=None)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	133 urllib.request.urlretrieve(url % video["id"], "%s.%s" % (basename, ext),
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	134 reporthook)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	135 print(" downloaded %s.%s" % (basename, ext))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	136 return 0
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	137 except TimeoutError:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	138 return 1
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	139 except HTTPError:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	140 print(" video not available on the Wayback Machine!")
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	141 return 0
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	142 except Exception as e:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	143 print(" unknown error downloading video!\n")
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	144 print(e)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	145 return 0
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	146
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	147 def internet_archive_dl(video: dict, basename: str) -> int:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	148 if internetarchive.get_item("youtube-%s" % video["id"]).exists:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	149 fnames = [f.name for f in internetarchive.get_files(
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	150 "youtube-%s" % video["id"])]
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	151 flist = []
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	152 for fname in range(len(fnames)):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	153 if re.search(''.join([r"((?:.+?-)?", video["id"],
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	154 r"\.(?:mp4|jpg|webp|mkv|webm|info\.json|des"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	155 r"cription|annotations\.xml))"]),
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	156 fnames[fname]):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	157 flist.append(fnames[fname])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	158 while True:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	159 try:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	160 internetarchive.download("youtube-%s" % video["id"],
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	161 files=flist, verbose=True,
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	162 destdir=os.path.dirname(basename),
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	163 no_directory=True,
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	164 ignore_existing=True,
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	165 retries=9999)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	166 break
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	167 except ConnectTimeout:
4 aa652a6f97af Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 3 diff changeset	168 continue
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	169 except Exception:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	170 return 0
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	171 if flist and flist[0].startswith(video["id"]):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	172 for fname in flist:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	173 if os.path.exists("%s/%s" % (os.path.dirname(basename), fname)):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	174 os.replace("%s/%s" % (os.path.dirname(basename), fname),
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	175 "%s-%s" % (basename.rsplit("-", 1)[0],
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	176 fname))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	177 return 1
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	178 return 0
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	179
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	180 def main():
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	181 args = docopt.docopt(__doc__)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	182
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	183 if not os.path.exists(args["--output"]):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	184 os.mkdir(args["--output"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	185
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	186 for i in load_split_files(args["--database"])["videos"]:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	187 uploader = i["uploader_id"] if "uploader_id" in i else None
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	188 for url in args["<url>"]:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	189 channel = url.split("/")[-1]
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	190
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	191 output = "%s/%s" % (args["--output"], channel)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	192 if not os.path.exists(output):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	193 os.mkdir(output)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	194 ytdl_opts["outtmpl"] = output + "/%(title)s-%(id)s.%(ext)s"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	195
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	196 if uploader == channel:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	197 print("%s:" % i["id"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	198 basename = "%s/%s-%s" % (output, sanitize_filename(i["title"],
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	199 restricted=True), i["id"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	200 path = Path(output)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	201 files = list(path.glob("*-%s.mkv" % i["id"]))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	202 files.extend(list(path.glob("*-%s.mp4" % i["id"])))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	203 files.extend(list(path.glob("*-%s.webm" % i["id"])))
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	204 if files:
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	205 print(" video already downloaded!")
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	206 write_metadata(i, basename)
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	207 continue
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	208 # this code is really ugly... todo a rewrite?
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	209 with youtube_dl.YoutubeDL(ytdl_opts) as ytdl:
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	210 try:
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	211 ytdl.extract_info("https://youtube.com/watch?v=%s"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	212 % i["id"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	213 continue
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	214 except DownloadError:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	215 print(" video is not available! attempting to find In"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	216 "ternet Archive pages of it...")
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	217 except Exception as e:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	218 print(" unknown error downloading video!\n")
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	219 print(e)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	220 if internet_archive_dl(i, basename) == 0: # if we can't download from IA
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	221 print(" video does not have an Internet Archive page! attem"
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	222 "pting to download from the Wayback Machine...")
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	223 while True:
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	224 if wayback_machine_dl(i, basename) == 0: # finished (1 = timed out, retry)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	225 break
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	226 time.sleep(5)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	227 continue
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	228 write_metadata(i, basename)
6 5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	229
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	230
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	231 if __name__ == "__main__":
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	232 main()
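
In the listing, main() tries YouTube first, then an Internet Archive item named youtube-<id>, then the Wayback Machine. The two helpers use opposite conventions for their integer results: internet_archive_dl returns 1 on success, while wayback_machine_dl returns 0 when it is finished (whether or not a file was saved) and 1 only after a timeout. The following is a minimal sketch of the same fallback order with named outcomes. The Outcome enum, fetch_with_fallback, and the bounded max_retries are hypothetical names introduced for illustration and are not part of channeldownloader.py; the listing itself retries without a limit.

```python
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class Outcome(Enum):
    """Hypothetical named results standing in for the listing's bare 0/1 codes."""
    OK = "ok"                    # a file was downloaded
    UNAVAILABLE = "unavailable"  # permanent failure: move on to the next source
    RETRY = "retry"              # transient failure (e.g. timeout): try again


# A source is a (name, fetch function) pair; fetch takes a video ID.
Source = Tuple[str, Callable[[str], Outcome]]


def fetch_with_fallback(video_id: str, sources: Iterable[Source],
                        max_retries: int = 3) -> Optional[str]:
    """Return the name of the first source that yields the video, else None."""
    for name, fetch in sources:
        # One initial attempt plus up to max_retries further attempts.
        for _ in range(max_retries + 1):
            outcome = fetch(video_id)
            if outcome is Outcome.OK:
                return name
            if outcome is Outcome.UNAVAILABLE:
                break  # give up on this source, fall through to the next one
            # Outcome.RETRY: loop again (the listing sleeps 5 seconds here)
    return None
```

With stand-in functions wrapping the YouTube, Internet Archive, and Wayback Machine steps, the call would look like fetch_with_fallback(video["id"], [("youtube", yt), ("ia", ia), ("wayback", wb)]), and the caller can write metadata whenever the result is not None.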

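The reporthook in the listing computes int(count * block_size * 100 / total_size). Because the last block is usually only partly filled, that value can exceed 100, and urllib.request.urlretrieve may pass a total_size of -1 when the server does not report a size. A small sketch of a guarded calculation follows; percent_done is a hypothetical helper name, not a function from the script.

```python
from typing import Optional


def percent_done(count: int, block_size: int, total_size: int) -> Optional[int]:
    """Return download progress as a whole percentage, or None if unknown."""
    if total_size <= 0:
        # Size not reported by the server: no meaningful percentage exists.
        return None
    # Clamp to 100 because count * block_size can overshoot on the final block.
    return min(100, count * block_size * 100 // total_size)
```

A reporthook could call this helper and print the byte count instead whenever it returns None.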