annotate channeldownloader.py @ 15:615e1ca0212a

*: add support for loading the split db from a zip file. This saves time having to decompress it. Also fixed a couple of bugs here and there (notably with the IA downloading).
author Paper <paper@tflc.us>
date Fri, 27 Feb 2026 17:01:18 -0500
parents 03c8fd4069fb
children 088d9a3a2524
rev   line source
5
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 4
diff changeset
1 #!/usr/bin/env python3
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
2 # -*- coding: utf-8 -*-
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
3 # channeldownloader.py - scrapes youtube videos from a channel from
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
4 # a variety of sources
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
5
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
6 # Copyright (c) 2021-2025 Paper <paper@tflc.us>
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
7 # This program is free software: you can redistribute it and/or modify
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
8 # it under the terms of the GNU General Public License as published by
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
9 # the Free Software Foundation, either version 2 of the License, or
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
10 # (at your option) any later version.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
11 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
12 # This program is distributed in the hope that it will be useful,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
13 # but WITHOUT ANY WARRANTY; without even the implied warranty of
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
15 # GNU General Public License for more details.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
16 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
17 # You should have received a copy of the GNU General Public License
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
18 # along with this program. If not, see <http://www.gnu.org/licenses/>.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
19
9
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
20 """
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
21 Usage:
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
22 channeldownloader.py <url>... (--database <file>)
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
23 [--output <folder>]
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
24 channeldownloader.py -h | --help
5
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 4
diff changeset
25
9
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
26 Arguments:
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
27 <url> YouTube channel URL to download from
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
28
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
29 Options:
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
30 -h --help Show this screen
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
31 -o --output <folder> Output folder, relative to the current directory
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
32 [default: .]
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
33 -d --database <file> yt-dlp style database of videos. Should contain
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
34 an array of yt-dlp .info.json data. For example,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
35 FinnOtaku's YTPMV metadata archive.
9
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
36 """
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
37
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
38 # Built-in python stuff (no possible missing dependencies)
5
d4740dc7470c [channeldownloader.py] Python 2.7 compatibility
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 4
diff changeset
39 from __future__ import print_function
9
2e9ed463c0be Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 8
diff changeset
40 import docopt
0
d098a293a02d Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff changeset
41 import os
2
c65d14f01453 Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 1
diff changeset
42 import re
6
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 5
diff changeset
43 import time
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
44 import urllib.request
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
45 import os
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
46 import ssl
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
47 import io
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
48 from urllib.error import HTTPError
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
49 from pathlib import Path
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
50
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
51 # We can utilize special simdjson features if it is available
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
52 simdjson = False
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
53
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
54 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
55 import simdjson as json
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
56 simdjson = True
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
57 print("INFO: using simdjson")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
58 except ImportError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
59 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
60 import ujson as json
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
61 print("INFO: using ujson")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
62 except ImportError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
63 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
64 import orjson as json
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
65 print("INFO: using orjson")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
66 except ImportError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
67 import json
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
68 print("INFO: using built-in json (slow!)")
0
d098a293a02d Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff changeset
69
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
70 ytdlp_works = False
0
d098a293a02d Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff changeset
71
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
72 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
73 import yt_dlp as youtube_dl
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
74 from yt_dlp.utils import sanitize_filename, DownloadError
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
75 ytdlp_works = True
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
76 except ImportError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
77 print("failed to import yt-dlp!")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
78 print("downloading from YouTube directly will not work.")
0
d098a293a02d Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff changeset
79
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
80 ia_works = False
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
81
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
82 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
83 import internetarchive
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
84 from requests.exceptions import ConnectTimeout
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
85 ia_works = True
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
86 except ImportError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
87 print("failed to import the Internet Archive's python library!")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
88 print("downloading from IA will not work.")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
89
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
90 zipfile_works = False
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
91
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
92 try:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
93 import zipfile
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
94 zipfile_works = True
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
95 except ImportError:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
96 print("failed to import zipfile!")
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
97 print("loading the database from a .zip file will not work.")
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
98
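The loader that consumes `zipfile` and `io` is not visible in this part of the file, so the following is only a minimal sketch of reading a split database straight from a zip archive without unpacking it to disk. The member names (`part-000.json`, ...), the helper names, and the assumption that each member holds a JSON array of yt-dlp `.info.json` objects are all hypothetical.

```python
import io
import json
import zipfile


def iter_videos_from_zip(zip_source):
    """Yield video dicts from every .json member of a zip archive.

    Assumption (not confirmed by the source): each member is a JSON
    array of yt-dlp .info.json objects.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".json"):
                continue
            # zf.open() streams the member; nothing is extracted to disk.
            with zf.open(name) as member:
                for video in json.load(member):
                    yield video


def build_demo_zip():
    """Build a tiny in-memory archive so the sketch can run on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("part-000.json", json.dumps([{"id": "aaaaaaaaaaa"}]))
        zf.writestr("part-001.json", json.dumps([{"id": "bbbbbbbbbbb"}]))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for video in iter_videos_from_zip(build_demo_zip()):
        print(video["id"])
```

Using a generator keeps only one member's parsed contents in memory at a time, which seems in keeping with the memory optimizations mentioned in the history of this file.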
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
99 ##############################################################################
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
100 ## DOWNLOADERS
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
101
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
102 # All downloaders should be a function under this signature:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
103 # dl(video: dict, basename: str, output: str) -> int
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
104 # where:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
105 # 'video': the .info.json scraped from the YTPMV metadata archive.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
106 # 'basename': the basename output to write as.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
107 # 'output': the output directory.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
108 # yes, it's weird, but I don't care ;)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
109 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
110 # Magic return values:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
111 # 0 -- all good, video is downloaded
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
112 # 1 -- error downloading video; it may still be available if we try again
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
113 # 2 -- video is proved totally unavailable here. give up
0
d098a293a02d Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff changeset
114
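The return codes described above suggest a simple driver loop. The sketch below is one possible reading of that contract, not the script's actual main loop: `try_downloaders`, `make_fake`, and the `retries` parameter are hypothetical names introduced only for illustration.

```python
def try_downloaders(video, basename, output, downloaders, retries=3):
    """Return True as soon as any downloader reports success (0)."""
    for dl in downloaders:
        for _ in range(retries):
            r = dl(video, basename, output)
            if r == 0:
                return True   # all good, video is downloaded
            if r == 2:
                break         # unavailable at this source; try the next one
            # r == 1: possibly transient, so retry the same source
    return False


def make_fake(results):
    """Hypothetical stand-in downloader that replays a list of codes."""
    calls = []

    def dl(video, basename, output):
        calls.append(video["id"])
        return results[len(calls) - 1]

    dl.calls = calls
    return dl


if __name__ == "__main__":
    gone = make_fake([2])        # source that does not have the video
    flaky = make_fake([1, 0])    # source that fails once, then succeeds
    print(try_downloaders({"id": "x"}, "x", ".", [gone, flaky]))
```

Treating 2 as "move on" and 1 as "retry" keeps a slow or flaky source from being mistaken for a missing video.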
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
115
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
116 # Basic downloader template.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
117 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
118 # This does a brute-force of all extensions within vexts and iexts
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
119 # in an attempt to find a working video link.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
120 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
121 # linktemplate is a template to be created using the video ID and
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
122 # extension. For example:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
123 # https://cdn.ytarchiver.com/%s.%s
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
124 def basic_dl_template(video: dict, basename: str, output: str,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
125 linktemplate: str, vexts: list, iexts: list) -> int:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
126 # actual downloader
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
127 def basic_dl_impl(vid: str, ext: str) -> int:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
128 url = (linktemplate % (vid, ext))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
129 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
130 with urllib.request.urlopen(url) as headers:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
131 with open("%s.%s" % (basename, ext), "wb") as f:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
132 f.write(headers.read())
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
133 print(" downloaded %s.%s" % (basename, ext))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
134 return 0
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
135 except TimeoutError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
136 return 1
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
137 except HTTPError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
138 return 2
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
139 except Exception as e:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
140 print(" unknown error downloading video!")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
141 print(e)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
142 return 1
4
aa652a6f97af Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 3
diff changeset
143
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
144 for exts in filter(None, [vexts, iexts]): # skip empty lists so the for/else below cannot misfire
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
145 for ext in exts:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
146 r = basic_dl_impl(video["id"], ext)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
147 if r == 0:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
148 break # done!
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
149 elif r == 1:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
150 # timeout; try again later?
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
151 return 1
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
152 elif r == 2:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
153 continue
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
154 else:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
155 # we did not break out of the loop
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
156 # which means all extensions were unavailable
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
157 return 2
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
158
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
159 # video was downloaded successfully
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
160 return 0
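A loop of this shape relies on Python's `for`/`else`. Indentation is not preserved in this view, so the exact nesting is an assumption, but one property is worth keeping in mind: the `else` branch runs whenever the loop finishes without `break`, and that includes iterating over an empty list such as the `[]` that `ghostarchive_dl` passes for `iexts`. The helper names below are hypothetical and only demonstrate that behavior.

```python
def first_hit(candidates, ok):
    """Return 0 if some candidate passes ok(), else 2 (same codes as above)."""
    for c in candidates:
        if ok(c):
            break
    else:
        # Reached when the loop ends without break, which also
        # covers the case where candidates is empty.
        return 2
    return 0


def run_groups(groups, ok, skip_empty):
    """Check each group in turn; optionally ignore empty groups."""
    for group in groups:
        if skip_empty and not group:
            continue
        if first_hit(group, ok) == 2:
            return 2
    return 0


if __name__ == "__main__":
    is_mp4 = lambda ext: ext == "mp4"
    print(run_groups([["mp4"], []], is_mp4, skip_empty=False))
    print(run_groups([["mp4"], []], is_mp4, skip_empty=True))
```

Skipping empty groups is one way to keep a successful video download from being reported as unavailable when there are no image extensions to try.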
6
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 5
diff changeset
161
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
162
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
163 # GhostArchive, basic...
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
164 def ghostarchive_dl(video: dict, basename: str, output: str) -> int:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
165 return basic_dl_template(video, basename, output,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
166 "https://ghostvideo.b-cdn.net/chimurai/%s.%s",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
167 ["mp4", "webm", "mkv"],
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
168 [] # none
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
169 )
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
170
0
d098a293a02d Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff changeset
171
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
172 # media.desirintoplaisir.net
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
173 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
174 # holds PRIMARILY popular videos (i.e. no niche internet microcelebrities)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
175 # or weeb shit, however it seems to be growing to other stuff.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
176 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
177 # there isn't really a proper API; I've based the scraping off of the HTML
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
178 # and the public source code.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
179 def desirintoplaisir_dl(video: dict, basename: str, output: str) -> int:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
180 return basic_dl_template(video, basename, output,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
181 "https://media.desirintoplaisir.net/content/%s.%s",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
182 ["mp4", "webm", "mkv"],
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
183 ["webp"]
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
184 )
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
185
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
186
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
187 # Internet Archive's Wayback Machine
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
188 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
189 # Internally, IA's javascript routines forward to the magic
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
190 # URL used here.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
191 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
192 # TODO: Download thumbnails through the CDX API:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
193 # https://github.com/TheTechRobo/youtubevideofinder/blob/master/lostmediafinder/finder.py
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
194 # the CDX API is pretty slow though, so it should be used as a last resort.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
195 def wayback_dl(video: dict, basename: str, output: str) -> int:
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
196 try:
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
197 url = ("https://web.archive.org/web/2oe_/http://wayback-fakeurl.archiv"
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
198 "e.org/yt/%s" % video["id"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
199 with urllib.request.urlopen(url) as headers:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
200 contenttype = headers.getheader("Content-Type")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
201 if contenttype == "video/webm" or contenttype == "video/mp4":
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
202 ext = contenttype.split("/")[-1]
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
203 else:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
204 raise HTTPError(url=None, code=None, msg=None,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
205 hdrs=None, fp=None)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
206 with open("%s.%s" % (basename, ext), "wb") as f:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
207 f.write(headers.read())
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
208 print(" downloaded %s.%s" % (basename, ext))
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
209 return 0
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
210 except TimeoutError:
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
211 return 1
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
212 except HTTPError:
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
213 # don't keep trying
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
214 return 2
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
215 except Exception as e:
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
216 print(" unknown error downloading video!")
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
217 print(e)
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
218 return 1
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
219
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
220
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
221 # Internet Archive (tubeup)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
222 def ia_dl(video: dict, basename: str, output: str) -> int:
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
223 def ia_file_legit(f: str, vidid: str) -> bool:
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
224 # FIXME:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
225 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
226 # There are some items on IA that combine the old tubeup behavior
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
227 # (i.e., including the sanitized video name before the ID)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
228 # and the new tubeup behavior (filename only contains the video ID)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
229 # hence we will download the entire video twice.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
230 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
231 # This isn't much of a problem anymore (and hasn't been for like 3
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
232 # years), since I contributed code to not upload something if there
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
233 # is already something there. However we should handle this case
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
234 # anyway.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
235 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
236 # Additionally, there are some items that have duplicate video files
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
237 # (from when the owners changed the title). We should ideally only
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
238 # download unique files. IA seems to provide SHA1 hashes...
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
239 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
240 # We should also check whether the copy on IA is higher quality
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
241 # than a local copy... :)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
242 if not re.search(r"((?:.+?-)?" + vidid + r"\.(?:mp4|jpg|webp|mkv|w"
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
243 r"ebm|info\.json|description|annotations\.xml))",
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
244 f):
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
245 return False
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
246
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
247 return True
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
248
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
249
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
250 if not internetarchive.get_item("youtube-%s" % video["id"]).exists:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
251 return 2
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
252
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
253 flist = [
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
254 f.name
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
255 for f in internetarchive.get_files("youtube-%s" % video["id"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
256 if ia_file_legit(f.name, video["id"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
257 ]
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
258
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
259 while True:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
260 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
261 internetarchive.download("youtube-%s" % video["id"], files=flist,
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
262 verbose=True,
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
263 no_directory=True, ignore_existing=True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
264 retries=9999)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
265 break
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
266 except ConnectTimeout:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
267 time.sleep(1)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
268 continue
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
269 except Exception as e:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
270 print(e)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
271 return 1
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
272
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
273 # Newer versions of tubeup save only the video ID.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
274 # Account for this by replacing it.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
275 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
276 # paper/2025-08-30: fixed a bug where video IDs with hyphens
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
277 # would incorrectly truncate
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
278 #
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
279 # paper/2026-02-27: an update in the IA python library changed
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
280 # the way destdir works, so it just gets entirely ignored.
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
281 for fname in flist:
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
282 def whitelist(s: str, vidid: str) -> bool:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
283 # special case: .info.json files
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
284 if s == ("%s.info.json" % vidid):
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
285 return ".info.json"
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
286
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
287 spli = os.path.splitext(s)
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
288 if spli is None or len(spli) != 2 or spli[0] != vidid:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
289 return None
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
290
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
291 return spli[1]
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
292
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
293 if not os.path.exists(fname):
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
294 continue
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
295
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
296 ext = whitelist(fname, video["id"])
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
297 if ext is None:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
298 continue
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
299
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
300 os.replace(fname, "%s%s" % (basename, ext))
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
301 return 0
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
302
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
303
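The comments in ia_dl describe two tubeup naming schemes, the `.info.json` special case, and the idea of using IA's SHA1 hashes to skip duplicate files. The following is a minimal standalone sketch of those three ideas, not the code used above. It assumes a stricter `re.fullmatch` instead of the `re.search` in the source, and `unique_by_sha1` takes plain dicts with hypothetical `name` and `sha1` keys rather than objects from the internetarchive library.

```python
import os
import re

# Extensions accepted by the pattern in ia_file_legit, copied from the source.
_EXTS = r"(?:mp4|jpg|webp|mkv|webm|info\.json|description|annotations\.xml)"


def is_legit(name, vidid):
    """True for '<id>.<ext>' (new tubeup) or '<title>-<id>.<ext>' (old tubeup)."""
    # Assumption: fullmatch is used here, so trailing junk is rejected too.
    pattern = r"(?:.+?-)?" + re.escape(vidid) + r"\." + _EXTS
    return re.fullmatch(pattern, name) is not None


def target_extension(name, vidid):
    """Return the suffix to keep when renaming '<id>.<ext>', else None."""
    # os.path.splitext only splits at the last dot, so '.info.json' must be
    # special-cased or the result would be just '.json'.
    if name == vidid + ".info.json":
        return ".info.json"
    stem, ext = os.path.splitext(name)
    return ext if stem == vidid else None


def unique_by_sha1(files):
    """Keep the first file name seen for each SHA1 digest.

    `files` is a list of dicts with 'name' and 'sha1' keys (hypothetical shape).
    """
    seen = set()
    names = []
    for f in files:
        if f["sha1"] not in seen:
            seen.add(f["sha1"])
            names.append(f["name"])
    return names
```

With helpers like these, a re-uploaded item that holds the same video under two titles would be downloaded once, provided the two copies really are byte-identical.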
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
304 def ytdlp_dl(video: dict, basename: str, output: str) -> int:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
305 # intentionally ignores all messages besides errors
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
306 class MyLogger(object):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
307 def debug(self, msg):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
308 pass
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
309
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
310 def warning(self, msg):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
311 pass
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
312
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
313 def error(self, msg):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
314 print(" " + msg)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
315 pass
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
316
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
317
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
318 def ytdl_hook(d) -> None:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
319 if d["status"] == "finished":
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
320 print(" downloaded %s: 100%% " % (os.path.basename(d["filename"])))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
321 if d["status"] == "downloading":
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
322 print(" downloading %s: %s\r" % (os.path.basename(d["filename"]),
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
323 d["_percent_str"]), end="")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
324 if d["status"] == "error":
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
325 print("\n an error occurred downloading %s!"
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
326 % (os.path.basename(d["filename"])))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
327
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
328 ytdl_opts = {
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
329 "retries": 100,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
330 "nooverwrites": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
331 "call_home": False,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
332 "quiet": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
333 "writeinfojson": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
334 "writedescription": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
335 "writethumbnail": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
336 "writeannotations": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
337 "writesubtitles": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
338 "allsubtitles": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
339 "addmetadata": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
340 "continuedl": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
341 "embedthumbnail": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
342 "format": "bestvideo+bestaudio/best",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
343 "restrictfilenames": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
344 "no_warnings": True,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
345 "progress_hooks": [ytdl_hook],
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
346 "logger": MyLogger(),
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
347 "ignoreerrors": False,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
348
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
349 # mm, output template
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
350 "outtmpl": output + "/%(title)s-%(id)s.%(ext)s",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
351 }
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
352
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
353 with youtube_dl.YoutubeDL(ytdl_opts) as ytdl:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
354 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
355 ytdl.extract_info("https://youtube.com/watch?v=%s" % video["id"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
356 return 0
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
357 except DownloadError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
358 return 2
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
359 except Exception as e:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
360 print(" unknown error downloading video!\n")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
361 print(e)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
362
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
363 return 1
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
364
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
365
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
366 # TODO: There are multiple other youtube archival websites available.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
367 # Most notable is https://findyoutubevideo.thetechrobo.ca .
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
368 # This combines a lot of sparse youtube archival services, and has
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
369 # a convenient API we can use. Nice!
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
370 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
371 # There is also the "Distributed YouTube Archive" which is totally
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
372 # useless because there's no way to automate it...
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
373
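The downloader functions above appear to share a return-code convention: 0 for success, 1 for an error that may be worth retrying, and 2 for "don't keep trying" with this source. The sketch below shows one way a caller could act on those codes. The names `OK`, `RETRY`, `GIVE_UP`, `try_sources`, and `max_attempts` are illustrative assumptions and do not come from the source.

```python
# Illustrative names; the source returns bare integers.
OK, RETRY, GIVE_UP = 0, 1, 2


def try_sources(video, basename, output, sources, max_attempts=3):
    """Try each downloader in order; return True once one reports success.

    `sources` is a list of callables with the (video, basename, output) -> int
    signature used by ia_dl and ytdlp_dl.
    """
    for download in sources:
        for _ in range(max_attempts):
            status = download(video, basename, output)
            if status == OK:
                return True
            if status == GIVE_UP:
                break  # this source cannot help; move on to the next one
            # RETRY: loop around and call the same source again
    return False
```

Capping retries with `max_attempts` is a design choice of this sketch; a caller could just as well retry indefinitely on code 1.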
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
374 ##############################################################################
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
375
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
376
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
377 def main():
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
378 def load_split_files(path: str):
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
379 def cruft(isdir: bool, listdir, openf):
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
380 # build the path list
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
381 if not isdir:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
382 list_files = [path]
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
383 else:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
384 list_files = filter(lambda x: re.search(r"vids[0-9\-]+?\.json", x), listdir())
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
385
15
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
386 # now open each as a json
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
387 for fi in list_files:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
388 print(fi)
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
389 with openf(fi, "r") as infile:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
390 if simdjson:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
391 # Using this is a lot faster in SIMDJSON, since instead
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
392 # of converting all of the JSON key/value pairs into
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
393 # native Python objects, they stay in an internal state.
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
394 #
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
395 # This means we only get the stuff we absolutely need,
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
396 # which is the uploader ID, and copy everything else
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
397 # if the ID is one we are looking for.
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
398 parser = json.Parser()
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
399 yield parser.parse(infile.read())
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
400 del parser
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
401 else:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
402 yield json.load(infile)
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
403
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
404
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
405 try:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
406 if not zipfile_works or os.path.isdir(path):
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
407 raise Exception
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
408
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
409 with zipfile.ZipFile(path, "r") as myzip:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
410 yield from cruft(True, lambda: myzip.namelist(), lambda f, m: io.TextIOWrapper(myzip.open(f, mode=m), encoding="utf-8"))
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
411 except Exception as e:
615e1ca0212a *: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents: 14
diff changeset
412 yield from cruft(os.path.isdir(path), lambda: os.listdir(path), lambda f, m: open(os.path.join(path, f) if os.path.isdir(path) else f, m, encoding="utf-8"))
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
413
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
414
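load_split_files reads the split database from a zip archive, a directory, or a single JSON file. As a rough standalone sketch of the same idea, the function below picks the branch up front with `zipfile.is_zipfile` instead of using try/except as control flow. The name `iter_split_json` is hypothetical, and the sketch uses only the standard `json` module, so the simdjson fast path is left out.

```python
import io
import json
import os
import re
import zipfile


def iter_split_json(path):
    """Yield parsed JSON from each vids*.json in a zip, a directory, or a file."""
    wanted = re.compile(r"vids[0-9\-]+?\.json")
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path, "r") as zf:
            for name in filter(wanted.search, zf.namelist()):
                # ZipFile.open returns a binary stream; wrap it to decode UTF-8.
                with io.TextIOWrapper(zf.open(name), encoding="utf-8") as fh:
                    yield json.load(fh)
    elif os.path.isdir(path):
        for name in filter(wanted.search, sorted(os.listdir(path))):
            with open(os.path.join(path, name), "r", encoding="utf-8") as fh:
                yield json.load(fh)
    else:
        # A single JSON file: open the path itself, not path + "/" + name.
        with open(path, "r", encoding="utf-8") as fh:
            yield json.load(fh)
```

Because this is a generator, each file is parsed only when the caller asks for the next item, so only one file's contents need to be held at a time.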
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
415 def write_metadata(i: dict, basename: str) -> None:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
416 # ehhh
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
417 if not os.path.exists(basename + ".info.json"):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
418 with open(basename + ".info.json", "w", encoding="utf-8") as jsonfile:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
419 try:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
420 # orjson outputs bytes
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
421 jsonfile.write(json.dumps(i).decode("utf-8"))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
422 except AttributeError:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
423 # everything else outputs a string
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
424 jsonfile.write(json.dumps(i))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
425 print(" saved %s" % os.path.basename(jsonfile.name))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
426 if not os.path.exists(basename + ".description"):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
427 with open(basename + ".description", "w",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
428 encoding="utf-8") as descfile:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
429 descfile.write(i["description"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
430 print(" saved %s" % os.path.basename(descfile.name))
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
431
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
432 args = docopt.docopt(__doc__)
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
433
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
434 if not os.path.exists(args["--output"]):
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
435 os.mkdir(args["--output"])
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
436
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
437 channels = dict()
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
438
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
439 for url in args["<url>"]:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
440 chn = url.split("/")[-1]
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
441 channels[chn] = {"output": "%s/%s" % (args["--output"], chn)}
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
442
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
443 for channel in channels.values():
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
444 if not os.path.exists(channel["output"]):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
445 os.mkdir(channel["output"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
446
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
447 # find videos in the database.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
448 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
449 # despite how it may seem, this is actually really fast, and fairly
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
450 # memory efficient too (but really only if we're using simdjson...)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
451 videos = [
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
452 i if not simdjson else i.as_dict()
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
453 for f in load_split_files(args["--database"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
454 for i in (f["videos"] if "videos" in f else f) # use the "videos" list when the file has one, else iterate the file itself
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
455 if "uploader_id" in i and i["uploader_id"] in channels
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
456 ]
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
457
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
458 while True:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
459 if len(videos) == 0:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
460 break
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
461
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
462 videos_copy = list(videos) # real copy: videos is mutated by videos.remove() inside the loop
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
463
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
464 for i in videos_copy:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
465 channel = channels[i["uploader_id"]]
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
466
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
467 # precalculated for speed
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
468 output = channel["output"]
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
469
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
470 print("%s:" % i["id"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
471 basename = "%s/%s-%s" % (output, sanitize_filename(i["title"],
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
472 restricted=True), i["id"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
473 files = [y
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
474 for p in ["mkv", "mp4", "webm"]
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
475 for y in Path(output).glob(("*-%s." + p) % i["id"])]
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
476 if files:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
477 print(" video already downloaded!")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
478 videos.remove(i)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
479 write_metadata(i, basename)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
480 continue
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
481
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
482 # high level "download" function.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
483 def dl(video: dict, basename: str, output: str):
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
484 dls = []
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
485
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
486 if ytdlp_works:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
487 dls.append({
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
488 "func": ytdlp_dl,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
489 "name": "using yt-dlp",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
490 })
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
491
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
492 if ia_works:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
493 dls.append({
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
494 "func": ia_dl,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
495 "name": "from the Internet Archive",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
496 })
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
497
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
498 dls.append({
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
499 "func": desirintoplaisir_dl,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
500 "name": "from LMIJLM/DJ Plaisir's archive",
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
501 })
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
502 dls.append({
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
503 "func": ghostarchive_dl,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
504 "name": "from GhostArchive"
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
505 })
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
506 dls.append({
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
507 "func": wayback_dl,
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
508 "name": "from the Wayback Machine"
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
509 })
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
510
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
511 for dl in dls:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
512 print(" attempting to download %s" % dl["name"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
513 r = dl["func"](video, basename, output)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
514 if r == 0:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
515 # all good, video's downloaded
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
516 return 0
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
517 elif r == 2:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
518 # video is unavailable here
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
519 print(" oops, video is not available there...")
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
520 continue
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
521 elif r == 1:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
522 # error while downloading; likely temporary.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
523 # TODO we should save which downloader the video
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
524 # was on, so we can continue back at it later.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
525 return 1
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
526 # video is unavailable everywhere
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
527 return 2
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
528
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
529 r = dl(i, basename, output)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
530 if r == 1:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
531 continue
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
532
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
533 # video is downloaded, or it's totally unavailable, so
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
534 # remove it from being checked again.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
535 videos.remove(i)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
536 # ... and then dump the metadata, if there isn't any on disk.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
537 write_metadata(i, basename)
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
538
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
539 if r == 0:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
540 # video is downloaded
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
541 continue
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
542
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
543 # video is unavailable; write out the metadata.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
544 print(" video is unavailable everywhere; dumping out metadata only")
6
5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 5
diff changeset
545
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
546
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
547 if __name__ == "__main__":
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
548 main()
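
The fallback logic in main() above (try each source in order, stop on success or on a temporary error, move on when a source reports the video as unavailable) can be sketched on its own. This is a minimal illustration, not the script's actual code: the names OK, TEMPORARY_ERROR, UNAVAILABLE, try_downloaders, drain, always_missing and make_flaky are hypothetical, the downloaders take a single argument instead of (video, basename, output), and max_passes is an assumption added so the sketch always terminates (the original loops until the list is empty).

```python
# Return codes, mirroring the 0 / 1 / 2 convention used in the listing above.
OK, TEMPORARY_ERROR, UNAVAILABLE = 0, 1, 2


def try_downloaders(video, downloaders):
    """Try each (name, func) pair in order; return the first decisive code."""
    for name, func in downloaders:
        result = func(video)
        if result == OK:
            return OK
        if result == TEMPORARY_ERROR:
            # Likely transient: stop here so the caller can retry later.
            return TEMPORARY_ERROR
        # UNAVAILABLE: fall through to the next source.
    return UNAVAILABLE


def drain(videos, downloaders, max_passes=3):
    """Make repeated passes over the pending list; return what is still pending."""
    pending = list(videos)
    for _ in range(max_passes):
        if not pending:
            break
        # Iterate over a snapshot, because pending is mutated inside the loop.
        for video in list(pending):
            if try_downloaders(video, downloaders) != TEMPORARY_ERROR:
                pending.remove(video)
    return pending


def always_missing(video):
    """Hypothetical source that never has the video."""
    return UNAVAILABLE


def make_flaky():
    """Hypothetical source that fails once per video, then succeeds."""
    seen = set()

    def flaky(video):
        if video["id"] in seen:
            return OK
        seen.add(video["id"])
        return TEMPORARY_ERROR

    return flaky


if __name__ == "__main__":
    sources = [("missing", always_missing), ("flaky", make_flaky())]
    leftover = drain([{"id": "a"}, {"id": "b"}], sources)
    print("still pending:", leftover)
```

Iterating over list(pending) rather than pending itself matters here: removing items from a list while looping over that same list makes the loop skip the element that follows each removal.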