annotate channeldownloader.py @ 16:088d9a3a2524

improvements to IA downloader

now we explicitly ignore any file not "original". this seems to filter out derivative files (such as ogv and other shit we don't want) but keeps some of the toplevel metadata
author Paper <paper@tflc.us>
date Sat, 28 Feb 2026 14:38:04 -0500
parents 615e1ca0212a
children 0d10b2ce0140
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# channeldownloader.py - scrapes youtube videos from a channel from
# a variety of sources

# Copyright (c) 2021-2025 Paper <paper@tflc.us>
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

"""
Usage:
  channeldownloader.py <url>... (--database <file>)
                               [--output <folder>]
  channeldownloader.py -h | --help

Arguments:
  <url>                        YouTube channel URL to download from

Options:
  -h --help                    Show this screen
  -o --output <folder>         Output folder, relative to the current directory
                               [default: .]
  -d --database <file>         yt-dlp style database of videos. Should contain
                               an array of yt-dlp .info.json data. For example,
                               FinnOtaku's YTPMV metadata archive.
"""

# Built-in python stuff (no possible missing dependencies)
from __future__ import print_function
import os
import re
import time
import urllib.request
import ssl
import io
import shutil
import xml.etree.ElementTree as XmlET
from urllib.error import HTTPError
from pathlib import Path

# Third-party dependency, required to parse the usage docstring above
import docopt

# We can utilize special simdjson features if it is available
simdjson = False

try:
    import simdjson as json
    simdjson = True
    print("INFO: using simdjson")
except ImportError:
    try:
        import ujson as json
        print("INFO: using ujson")
    except ImportError:
        try:
            import orjson as json
            print("INFO: using orjson")
        except ImportError:
            import json
            print("INFO: using built-in json (slow!)")

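The nested try/except chain above picks the first JSON library that imports. A table-driven equivalent can be sketched with the standard `importlib.import_module`; `import_first` is a hypothetical helper, not part of this script, and it drops the script's separate `simdjson` feature flag (which could be recovered as `name == "simdjson"`).

```python
import importlib
from types import ModuleType
from typing import Sequence, Tuple


def import_first(names: Sequence[str]) -> Tuple[str, ModuleType]:
    """Return (name, module) for the first importable module in `names`.

    Earlier names are preferred; the last entry should be a module that is
    always present (here the built-in json) so the chain cannot fail.
    """
    for name in names:
        try:
            return name, importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    raise ImportError("none of %r could be imported" % (list(names),))


# Same preference order as the script's nested try blocks.
name, json = import_first(["simdjson", "ujson", "orjson", "json"])
print("INFO: using %s" % name)
```

One caveat with either form: these libraries are not perfectly interchangeable, e.g. `orjson.dumps` returns `bytes` rather than `str`, so code that serializes JSON has to allow for that.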
ytdlp_works = False

try:
    import yt_dlp as youtube_dl
    from yt_dlp.utils import sanitize_filename, DownloadError
    ytdlp_works = True
except ImportError:
    print("failed to import yt-dlp!")
    print("downloading from YouTube directly will not work.")

ia_works = False

try:
    import internetarchive
    from requests.exceptions import ConnectTimeout
    ia_works = True
except ImportError:
    print("failed to import the Internet Archive's python library!")
    print("downloading from IA will not work.")

zipfile_works = False

try:
    import zipfile
    zipfile_works = True
except ImportError:
    print("failed to import zipfile!")
    print("loading the database from a .zip file will not work.")

##############################################################################
## DOWNLOADERS

# All downloaders should be a function under this signature:
#    dl(video: dict, basename: str, output: str) -> int
# where:
#    'video': the .info.json data scraped from the YTPMV metadata archive.
#    'basename': the output file name to write, without its extension.
#    'output': the output directory.
# yes, it's weird, but I don't care ;)
#
# Magic return values:
#    0 -- all good, video is downloaded
#    1 -- error downloading video; it may still be available if we try again
#    2 -- video is proven to be totally unavailable here; give up

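The downloader contract above is easiest to see from the caller's side. This is a minimal sketch of a dispatcher that tries sources in order, assuming a caller wants to know whether a video is gone everywhere or merely worth retrying later; `try_downloaders` and the `DL_*` constants are hypothetical names, and the script's real main loop is not part of this section.

```python
from typing import Callable, Iterable

# Return codes from the downloader contract described above.
DL_OK = 0           # video downloaded
DL_RETRY = 1        # transient failure; may work on a later attempt
DL_UNAVAILABLE = 2  # proven unavailable at this source

Downloader = Callable[[dict, str, str], int]


def try_downloaders(video: dict, basename: str, output: str,
                    downloaders: Iterable[Downloader]) -> int:
    """Try each source in order and stop at the first success.

    A retryable failure is remembered so the caller can tell
    "try again later" apart from "unavailable at every source".
    """
    saw_retry = False
    for dl in downloaders:
        r = dl(video, basename, output)
        if r == DL_OK:
            return DL_OK
        if r == DL_RETRY:
            saw_retry = True
        # DL_UNAVAILABLE: move on to the next source
    return DL_RETRY if saw_retry else DL_UNAVAILABLE
```

The design choice here is to keep going after a retryable error so a later source can still succeed; a stricter caller could instead stop at the first `1`.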

# Basic downloader template.
#
# This brute-forces the extensions in vexts (video) and then iexts
# (image), in order, keeping the first one from each list that
# downloads. Empty lists are skipped.
#
# linktemplate is a %-style template that is filled in with the video
# ID and the extension, in that order. For example:
#    https://cdn.ytarchiver.com/%s.%s
def basic_dl_template(video: dict, basename: str, output: str,
                      linktemplate: str, vexts: list, iexts: list) -> int:
    # actual downloader
    def basic_dl_impl(vid: str, ext: str) -> int:
        url = (linktemplate % (vid, ext))
        try:
            with urllib.request.urlopen(url) as headers:
                with open("%s.%s" % (basename, ext), "wb") as f:
                    f.write(headers.read())
            print(" downloaded %s.%s" % (basename, ext))
            return 0
        except TimeoutError:
            return 1
        except HTTPError:
            return 2
        except Exception as e:
            print(" unknown error downloading video!")
            print(e)
            return 1

    for exts in [vexts, iexts]:
        if not exts:
            # nothing to try (e.g. no image extensions); without this the
            # for-else below would report the video as unavailable
            continue
        for ext in exts:
            r = basic_dl_impl(video["id"], ext)
            if r == 0:
                break  # done!
            elif r == 1:
                # timeout; try again later?
                return 1
            elif r == 2:
                continue
        else:
            # we did not break out of the loop
            # which means all extensions were unavailable
            return 2

    # video was downloaded successfully
    return 0

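The search loop in `basic_dl_template` relies on Python's `for`/`else`: the `else` branch runs only when the inner loop finishes without hitting `break`. Here is a minimal sketch of the same control flow with an injectable fetch function so it can be exercised without network access; `probe_extensions` and `fetch` are hypothetical names.

```python
from typing import Callable, List


def probe_extensions(fetch: Callable[[str], int],
                     ext_lists: List[List[str]]) -> int:
    """Same search as basic_dl_template, with the network call injected.

    fetch(ext) must return 0 (got it), 1 (retry later) or 2 (not here).
    """
    for exts in ext_lists:
        if not exts:
            continue          # an empty list means "nothing to look for"
        for ext in exts:
            r = fetch(ext)
            if r == 0:
                break         # first working extension in this list wins
            if r == 1:
                return 1      # transient error: abort, caller retries later
            # r == 2: try the next extension
        else:
            return 2          # inner loop ran out without a break
    return 0
```

Because `break` only leaves the inner loop, each list is searched independently: one video extension and one image extension can both be fetched.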

# GhostArchive, basic...
def ghostarchive_dl(video: dict, basename: str, output: str) -> int:
    return basic_dl_template(video, basename, output,
        "https://ghostvideo.b-cdn.net/chimurai/%s.%s",
        ["mp4", "webm", "mkv"],
        []  # none
    )


# media.desirintoplaisir.net
#
# holds PRIMARILY popular videos (i.e. no niche internet microcelebrities)
# or weeb shit, however it seems to be expanding to other stuff.
#
# there isn't really a proper API; I've based the scraping off of the HTML
# and the public source code.
def desirintoplaisir_dl(video: dict, basename: str, output: str) -> int:
    return basic_dl_template(video, basename, output,
        "https://media.desirintoplaisir.net/content/%s.%s",
        ["mp4", "webm", "mkv"],
        ["webp"]
    )

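For reference, the URLs these two downloaders request for a video are just their templates expanded with the ID and each extension, in probe order. `candidate_urls` is a hypothetical helper, and the sketch does not check whether either host still serves those files.

```python
from typing import List


def candidate_urls(linktemplate: str, vid: str, exts: List[str]) -> List[str]:
    """Expand a basic_dl_template-style template, one URL per extension."""
    # linktemplate takes (video ID, extension), matching basic_dl_impl
    return [linktemplate % (vid, ext) for ext in exts]


for url in candidate_urls("https://ghostvideo.b-cdn.net/chimurai/%s.%s",
                          "dQw4w9WgXcQ", ["mp4", "webm", "mkv"]):
    print(url)
```

Since the templates use `%` formatting, a host path containing a literal `%` would need to be written as `%%`.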
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
188
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
189 # Internet Archive's Wayback Machine
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
190 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
191 # Internally, IA's javascript routines forward to the magic
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
192 # URL used here.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
193 #
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
194 # TODO: Download thumbnails through the CDX API:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
195 # https://github.com/TheTechRobo/youtubevideofinder/blob/master/lostmediafinder/finder.py
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
196 # the CDX API is pretty slow though, so it should be used as a last resort.
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
197 def wayback_dl(video: dict, basename: str, output: str) -> int:
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
198 try:
14
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
199 url = ("https://web.archive.org/web/2oe_/http://wayback-fakeurl.archiv"
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
200 "e.org/yt/%s" % video["id"])
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
201 with urllib.request.urlopen(url) as headers:
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
202 contenttype = headers.getheader("Content-Type")
03c8fd4069fb *: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents: 13
diff changeset
203 if contenttype == "video/webm" or contenttype == "video/mp4":
            ext = contenttype.split("/")[-1]
        else:
            # not a video: jump to the HTTPError handler below (no retry)
            raise HTTPError(url=None, code=None, msg=None,
                            hdrs=None, fp=None)
        with open("%s.%s" % (basename, ext), "wb") as f:
            f.write(headers.read())
        print(" downloaded %s.%s" % (basename, ext))
        return 0
    except TimeoutError:
        return 1
    except HTTPError:
        # don't keep trying
        return 2
    except Exception as e:
        print(" unknown error downloading video!")
        print(e)
        return 1
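The branch above picks the output extension by splitting the response's Content-Type on "/". The sketch below does the same thing in a standalone function. It assumes a hypothetical allow-list of video types, because the real condition sits earlier in the function, outside this excerpt. It also strips MIME parameters such as `; codecs=...`, which a bare split would otherwise carry into the filename.

```python
from typing import Optional

# Hypothetical allow-list; the real condition is in the part of the
# function that precedes this excerpt.
VIDEO_TYPES = {"video/mp4", "video/webm"}


def ext_from_content_type(contenttype: Optional[str]) -> Optional[str]:
    """Return a file extension for a video Content-Type, or None otherwise."""
    if not contenttype:
        return None
    # Drop MIME parameters ("; codecs=...") and normalize case first.
    mimetype = contenttype.split(";", 1)[0].strip().lower()
    if mimetype not in VIDEO_TYPES:
        return None
    # Same idea as contenttype.split("/")[-1] above.
    return mimetype.split("/")[-1]


for ct in ["video/webm", "video/mp4; codecs=avc1", "text/html; charset=utf-8"]:
    print(ct, "->", ext_from_content_type(ct))
```

Returning None instead of raising keeps the "not a video" decision separate from the HTTP error handling. The caller can still map it to the "don't retry" return code 2.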


# Matches the archived filename layouts we know about; also captures the
# video ID (or the title, when there is no ID) for comparison
IA_REGEX = re.compile(r"(?:(?P<date>\d{8}) - )?(?P<title>.+?)?(?:-| \[)?(?:(?P<id>[A-Za-z0-9_\-]{11})]?|(?: \((?P<format>(?:(?:(?P<resolution>\d+)p_(?P<fps>\d+)fps_(?P<vcodec>H264)-)?(?P<abitrate>\d+)kbit_(?P<acodec>AAC|Vorbis))|BQ|Description)\)))\.(?P<extension>mp4|info\.json|description|annotations\.xml|webp|mkv|webm|jpg|jpeg|ogg|txt|m4a)$")


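The check below runs the pattern against made-up filenames for each known layout. The ID `dQw4w9WgXcQ` and the titles are illustrative only. The pattern is copied with the video-ID class spelled `[A-Za-z0-9_\-]`. The range `A-z` would also admit `[`, `\`, `]`, `^` and the backtick, which never appear in YouTube IDs.

```python
import re

# Copy of IA_REGEX, with the video-ID class spelled [A-Za-z0-9_\-].
IA_REGEX = re.compile(r"(?:(?P<date>\d{8}) - )?(?P<title>.+?)?(?:-| \[)?(?:(?P<id>[A-Za-z0-9_\-]{11})]?|(?: \((?P<format>(?:(?:(?P<resolution>\d+)p_(?P<fps>\d+)fps_(?P<vcodec>H264)-)?(?P<abitrate>\d+)kbit_(?P<acodec>AAC|Vorbis))|BQ|Description)\)))\.(?P<extension>mp4|info\.json|description|annotations\.xml|webp|mkv|webm|jpg|jpeg|ogg|txt|m4a)$")

samples = [
    "dQw4w9WgXcQ.mp4",                                # tubeup
    "Some Title-dQw4w9WgXcQ.mp4",                     # old tubeup
    "20230101 - Some Title [dQw4w9WgXcQ].mp4",        # date-prefixed
    "Some Title (1080p_30fps_H264-128kbit_AAC).mp4",  # JDownloader
    "youtube-dQw4w9WgXcQ_meta.xml",                   # IA metadata file
]

for name in samples:
    m = IA_REGEX.match(name)
    if m is None:
        print("%-48s no match" % name)
    else:
        print("%-48s id=%s title=%s format=%s"
              % (name, m.group("id"), m.group("title"), m.group("format")))
```

The lazy `title` group is tried first, and it only gives up when nothing else fits. That is why a bare `ID.EXTENSION` name ends up with `title=None` and the full ID, rather than a one-character title.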
# Internet Archive (tubeup)
def ia_dl(video: dict, basename: str, output: str) -> int:
    def ia_file_legit(f: str, vidid: str, vidtitle: str) -> bool:
        # FIXME:
        #
        # There are some items on IA that combine the old tubeup behavior
        # (i.e., including the sanitized video name before the ID)
        # and the new tubeup behavior (filename only contains the video ID),
        # hence we will download the entire video twice.
        #
        # This isn't much of a problem anymore (and hasn't been for about 3
        # years), since I contributed code to not upload something if there
        # is already something there. However, we should handle this case
        # anyway.
        #
        # Additionally, there are some items that have duplicate video files
        # (from when the owners changed the title). We should ideally only
        # download unique files. IA seems to provide SHA1 hashes...
        #
        # We should also check whether the copy on IA is of higher quality
        # than a local copy... :)

        IA_ID = "youtube-%s" % vidid

        # Ignore IA-generated thumbnails
        if f.startswith("%s.thumbs/" % IA_ID) or f == "__ia_thumb.jpg":
            return False

        # Ignore IA's own per-item metadata files
        for i in ["_archive.torrent", "_files.xml", "_meta.sqlite", "_meta.xml"]:
            if f == (IA_ID + i):
                return False

        # Try to match with our known filename regex.
        # This properly matches:
        #   ??????????? - YYYYMMDD - TITLE [ID].EXTENSION
        #   old tubeup - TITLE-ID.EXTENSION
        #   tubeup - ID.EXTENSION
        #   JDownloader - TITLE (FORMAT).EXTENSION
        # (Possibly we should match other filenames too??)
        m = IA_REGEX.match(f)
        if m is None:
            return False

        if m.group("id"):
            return (m.group("id") == vidid)
        elif m.group("title") is not None:
            def asciify(s: str) -> str:
                # Replace control characters, non-ASCII characters and path
                # separators with underscores, then strip surrounding whitespace
                return ''.join([i if ord(i) >= 0x20 and ord(i) < 0x80 and i not in "/\\" else '_' for i in s]).strip()

            if asciify(m.group("title")) == asciify(vidtitle):
                return True  # Close enough

        # Uh oh
        return False

    def ia_get_original_files(identifier: str) -> typing.Optional[list]:
        def ia_xml(identifier: str) -> typing.Optional[str]:
            for _ in range(1, 9999):
                try:
                    with urllib.request.urlopen("https://archive.org/download/%s/%s_files.xml" % (identifier, identifier)) as req:
                        return req.read().decode("utf-8")
                except HTTPError as e:
                    if e.code == 404 or e.code == 503:
                        return None
                    time.sleep(5)
            # gave up after repeated HTTP errors
            return None

        d = ia_xml(identifier)
        if d is None:
            return None

        try:
            # Now parse the XML and make a list of each original file;
            # derivatives (e.g. .ogv transcodes) are skipped
            return [x.attrib["name"] for x in XmlET.fromstring(d) if x.attrib.get("source") == "original"]
        except Exception as e:
            print(e)
            return None

    originalfiles = ia_get_original_files("youtube-%s" % video["id"])
    if not originalfiles:
        return 2

    vidtitle = video["fulltitle"] if "fulltitle" in video else video["title"]
    flist = [f for f in originalfiles if ia_file_legit(f, video["id"], vidtitle)]

    if not flist:
        return 2  # nothing in the item matches this video

    while True:
        try:
            internetarchive.download("youtube-%s" % video["id"], files=flist,
                                     verbose=True, ignore_existing=True,
                                     retries=9999)
            break
        except ConnectTimeout:
            time.sleep(1)
            continue
        except Exception as e:
            print(e)
            return 1

    def getext(s: str) -> typing.Optional[str]:
        # Multi-dot suffixes that os.path.splitext() would cut short
        for i in [".info.json", ".annotations.xml"]:
            if s.endswith(i):
                return i

        # Handle JDownloader "TITLE (Description).txt"
        if s.endswith(" (Description).txt"):
            return ".description"

        # Catch-all for remaining extensions; splitext() always returns a
        # 2-tuple, so the only failure case is an empty extension
        ext = os.path.splitext(s)[1]
        return ext if ext else None

    # Newer versions of tubeup save only the video ID.
    # Account for this by replacing it.
    #
    # paper/2025-08-30: fixed a bug where video IDs with hyphens
    # would incorrectly truncate
    #
    # paper/2026-02-27: an update in the IA python library changed
    # the way destdir works, so it just gets entirely ignored.
    for fname in flist:
        ondisk = "youtube-%s/%s" % (video["id"], fname)

        if not os.path.exists(ondisk):
            continue

        ext = getext(fname)
        if ext is None:
            continue

        os.replace(ondisk, "%s%s" % (basename, ext))

    shutil.rmtree("youtube-%s" % video["id"])

    return 0
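The title fallback in `ia_file_legit()` compares titles only after passing both through `asciify()`. The standalone version below shows what that normalization does. It also adds one possible refinement, which is an assumption and not something the script does: NFC-normalizing first, so that a decomposed accent (`e` followed by U+0301) collapses to a single underscore, the same as its precomposed form.

```python
import unicodedata


def asciify(s: str) -> str:
    # Equivalent to the nested helper in ia_file_legit(): keep characters in
    # 0x20-0x7F except "/" and "\", replace everything else with "_",
    # then strip leading/trailing whitespace.
    return ''.join(c if 0x20 <= ord(c) < 0x80 and c not in "/\\" else '_' for c in s).strip()


def titles_match(file_title: str, meta_title: str) -> bool:
    # Hypothetical refinement: NFC-normalize before asciify() so that a
    # decomposed accent yields one "_" like the precomposed character does.
    def norm(s: str) -> str:
        return asciify(unicodedata.normalize("NFC", s))
    return norm(file_title) == norm(meta_title)


print(asciify("Caf\u00e9 / Bar"))          # Caf_ _ Bar
print(asciify("Cafe\u0301"))               # Cafe_ (no normalization)
print(titles_match("Caf_", "Cafe\u0301"))  # True
```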
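`ia_get_original_files()` keeps the entries of an item's `_files.xml` listing whose `source` attribute is `original`. The FIXME in `ia_file_legit()` also points out that IA records SHA-1 hashes. The sketch below applies both ideas to a made-up listing, assuming each `<file>` may carry a `<sha1>` child. First it keeps only originals. Then, optionally, it skips byte-identical duplicates, such as the same video stored under an old and a new title.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up listing for illustration; real items carry more children
# (format, size, checksums, ...).
SAMPLE_FILES_XML = """<files>
  <file name="Old Title-dQw4w9WgXcQ.mp4" source="original"><sha1>1111</sha1></file>
  <file name="dQw4w9WgXcQ.mp4" source="original"><sha1>1111</sha1></file>
  <file name="dQw4w9WgXcQ.ogv" source="derivative"><sha1>2222</sha1></file>
  <file name="dQw4w9WgXcQ.info.json" source="original"><sha1>3333</sha1></file>
</files>"""


def original_file_names(files_xml: str) -> List[str]:
    """Names of all <file> entries whose source attribute is "original"."""
    root = ET.fromstring(files_xml)
    # .get() avoids a KeyError on entries that lack the attribute
    return [f.get("name") for f in root.iter("file") if f.get("source") == "original"]


def unique_original_names(files_xml: str) -> List[str]:
    """Like original_file_names(), but skip files whose SHA-1 was already seen."""
    seen = set()
    names = []
    for f in ET.fromstring(files_xml).iter("file"):
        if f.get("source") != "original":
            continue
        sha1 = f.findtext("sha1")
        if sha1:
            if sha1 in seen:
                continue  # byte-identical duplicate of an earlier entry
            seen.add(sha1)
        names.append(f.get("name"))
    return names


print(original_file_names(SAMPLE_FILES_XML))
print(unique_original_names(SAMPLE_FILES_XML))
```

This keeps whichever duplicate appears first in the listing. Picking the "newest" title instead would need another field, such as a modification time, and this sketch does not assume one.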
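The renaming loop in `ia_dl()` maps each downloaded IA filename onto `basename` plus an extension. The special cases exist because `os.path.splitext()` only splits at the last dot, so `.info.json` would otherwise come back as `.json`. The sketch below is a self-contained version of the same mapping; the `basename` value is made up.

```python
import os
from typing import Optional


def getext(s: str) -> Optional[str]:
    """Map an IA filename to the extension appended to the local basename."""
    # Multi-dot suffixes that os.path.splitext() would shorten to ".json"/".xml"
    for special in (".info.json", ".annotations.xml"):
        if s.endswith(special):
            return special
    # JDownloader stores the description as "TITLE (Description).txt"
    if s.endswith(" (Description).txt"):
        return ".description"
    # splitext() always returns a 2-tuple; an empty second item means
    # there is no extension at all.
    ext = os.path.splitext(s)[1]
    return ext or None


basename = "out/Some Title"  # made-up local basename
for name in ["dQw4w9WgXcQ.info.json", "Some Title (Description).txt", "dQw4w9WgXcQ.mp4"]:
    print(name, "->", basename + getext(name))
```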
def ytdlp_dl(video: dict, basename: str, output: str) -> int:
    # intentionally ignores all messages besides errors
    class MyLogger(object):
        def debug(self, msg):
            pass

        def warning(self, msg):
            pass

        def error(self, msg):
            print(" " + msg)


    def ytdl_hook(d) -> None:
        if d["status"] == "finished":
            print(" downloaded %s: 100%% " % (os.path.basename(d["filename"])))
        if d["status"] == "downloading":
            print(" downloading %s: %s\r" % (os.path.basename(d["filename"]),
                                              d["_percent_str"]), end="")
        if d["status"] == "error":
            print("\n an error occurred downloading %s!"
                  % (os.path.basename(d["filename"])))

    ytdl_opts = {
        "retries": 100,
        "nooverwrites": True,
        "call_home": False,
        "quiet": True,
        "writeinfojson": True,
        "writedescription": True,
        "writethumbnail": True,
        "writeannotations": True,
        "writesubtitles": True,
        "allsubtitles": True,
        "addmetadata": True,
        "continuedl": True,
        "embedthumbnail": True,
        "format": "bestvideo+bestaudio/best",
        "restrictfilenames": True,
        "no_warnings": True,
        "progress_hooks": [ytdl_hook],
        "logger": MyLogger(),
        "ignoreerrors": False,

        # output template
        "outtmpl": output + "/%(title)s-%(id)s.%(ext)s",
    }

    with youtube_dl.YoutubeDL(ytdl_opts) as ytdl:
        try:
            ytdl.extract_info("https://youtube.com/watch?v=%s" % video["id"])
            return 0
        except DownloadError:
            return 2
        except Exception as e:
            print(" unknown error downloading video!\n")
            print(e)

    return 1
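`ytdl_hook()` reads `_percent_str`, which is an internal, underscore-prefixed field of the progress dictionary. The sketch below computes the percentage from the documented `downloaded_bytes`, `total_bytes` and `total_bytes_estimate` keys instead. It also separates formatting from printing, so the logic can be checked with hand-made dictionaries. The function names are illustrative, not part of the script.

```python
import os
from typing import Any, Dict, Optional


def format_progress(d: Dict[str, Any]) -> Optional[str]:
    """Turn one progress-hook dictionary into a status line (or None)."""
    name = os.path.basename(d.get("filename", "?"))
    status = d.get("status")
    if status == "finished":
        return " downloaded %s: 100%%" % name
    if status == "downloading":
        done = d.get("downloaded_bytes") or 0
        # total_bytes may be missing; fall back to the estimate
        total = d.get("total_bytes") or d.get("total_bytes_estimate")
        if total:
            return " downloading %s: %5.1f%%" % (name, 100.0 * done / total)
        return " downloading %s: %d bytes" % (name, done)
    if status == "error":
        return " an error occurred downloading %s!" % name
    return None


def ytdl_hook(d: Dict[str, Any]) -> None:
    line = format_progress(d)
    if line is not None:
        # Overwrite the same terminal line while downloading
        print(line, end="\r" if d.get("status") == "downloading" else "\n")


ytdl_hook({"status": "downloading", "filename": "/tmp/x.webm",
           "downloaded_bytes": 512, "total_bytes": 1024})
ytdl_hook({"status": "finished", "filename": "/tmp/x.webm"})
```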


# TODO: There are multiple other YouTube archival websites available.
# Most notable is https://findyoutubevideo.thetechrobo.ca .
# This combines a lot of sparse YouTube archival services, and has
# a convenient API we can use. Nice!
#
# There is also the "Distributed YouTube Archive" which is totally
# useless because there's no way to automate it...

##############################################################################
11
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
444
1ac85f6f40c4 channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 10
diff changeset
445
10
8969930a9fa4 *: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents: 9
diff changeset
446 def main():
    # Yields every parsed JSON file of the split database at `path`, which
    # may be a zip archive or a directory of vids*.json files, or a single
    # JSON file.
    def load_split_files(path: str):
        def cruft(isdir: bool, listdir, openf):
            # build the path list
            if not isdir:
                list_files = [path]
            else:
                list_files = filter(lambda x: re.search(r"vids[0-9\-]+?\.json", x), listdir())

            # now open each as a json
            for fi in list_files:
                print(fi)
                with openf(fi, "r") as infile:
                    if simdjson:
                        # Using simdjson here is a lot faster since, instead
                        # of converting all of the JSON key/value pairs into
                        # native Python objects, they stay in an internal state.
                        #
                        # This means we only extract what we absolutely need,
                        # which is the uploader ID, and copy everything else
                        # only if the ID is one we are looking for.
                        parser = json.Parser()
                        yield parser.parse(infile.read())
                        del parser
                    else:
                        yield json.load(infile)


        try:
            if not zipfile_works or os.path.isdir(path):
                raise Exception

            with zipfile.ZipFile(path, "r") as myzip:
                yield from cruft(True, lambda: myzip.namelist(), lambda f, m: io.TextIOWrapper(myzip.open(f, mode=m), encoding="utf-8"))
        except Exception:
            # Not a zip (or zipfile is unavailable), so read from the
            # filesystem. `f` is a bare file name when `path` is a
            # directory, and `path` itself when it is a single file.
            yield from cruft(os.path.isdir(path), lambda: os.listdir(path),
                             lambda f, m: open(os.path.join(path, f) if os.path.isdir(path) else f, m, encoding="utf-8"))


    def write_metadata(i: dict, basename: str) -> None:
        # write the .info.json and .description sidecar files, skipping
        # any that already exist on disk
        if not os.path.exists(basename + ".info.json"):
            with open(basename + ".info.json", "w", encoding="utf-8") as jsonfile:
                data = json.dumps(i)
                # orjson outputs bytes; everything else outputs a string
                if isinstance(data, bytes):
                    data = data.decode("utf-8")
                jsonfile.write(data)
                print(" saved %s" % os.path.basename(jsonfile.name))
        if not os.path.exists(basename + ".description"):
            with open(basename + ".description", "w",
                      encoding="utf-8") as descfile:
                descfile.write(i["description"])
                print(" saved %s" % os.path.basename(descfile.name))

    args = docopt.docopt(__doc__)

    if not os.path.exists(args["--output"]):
        os.mkdir(args["--output"])

    channels = dict()

    for url in args["<url>"]:
        # the last path component of the URL is matched against each
        # video's uploader_id; strip a trailing slash so it isn't empty
        chn = url.rstrip("/").split("/")[-1]
        channels[chn] = {"output": "%s/%s" % (args["--output"], chn)}

    for channel in channels.values():
        if not os.path.exists(channel["output"]):
            os.mkdir(channel["output"])

    # find videos in the database.
    #
    # despite how it may seem, this is actually really fast, and fairly
    # memory efficient too (but really only if we're using simdjson...)
    videos = [
        i if not simdjson else i.as_dict()
        for f in load_split_files(args["--database"])
        # each split file is either a bare list of videos, or an object
        # holding that list under a "videos" key
        for i in (f["videos"] if "videos" in f else f)
        if "uploader_id" in i and i["uploader_id"] in channels
    ]

    while videos:
        # Iterate over a snapshot, since finished videos are removed from
        # `videos` inside the loop; iterating over `videos` itself would
        # skip the element following each removal.
        videos_copy = list(videos)

        for i in videos_copy:
            channel = channels[i["uploader_id"]]

            # precalculated for speed
            output = channel["output"]

            print("%s:" % i["id"])
            basename = "%s/%s-%s" % (output, sanitize_filename(i["title"],
                                     restricted=True), i["id"])
            # True only for files that exist and are non-empty, so that
            # zero-byte files are not treated as finished downloads
            def filenotworthit(f) -> bool:
                try:
                    return os.path.getsize(f) > 0
                except OSError:
                    return False

            pathoutput = Path(output)

            # look for an existing "<title>-<id>.<ext>" video file
            files = [y
                     for p in ["mkv", "mp4", "webm"]
                     for y in pathoutput.glob("*-%s.%s" % (i["id"], p))
                     if filenotworthit(y)]
            if files:
                print(" video already downloaded!")
                videos.remove(i)
                write_metadata(i, basename)
                continue

            # high level "download" function: tries each source in turn.
            # Returns 0 if the video was downloaded, 1 on a (likely
            # temporary) error, and 2 if no source has the video.
            def dl(video: dict, basename: str, output: str):
                dls = []

                if ytdlp_works:
                    dls.append({
                        "func": ytdlp_dl,
                        "name": "using yt-dlp",
                    })

                if ia_works:
                    dls.append({
                        "func": ia_dl,
                        "name": "from the Internet Archive",
                    })

                dls.append({
                    "func": desirintoplaisir_dl,
                    "name": "from LMIJLM/DJ Plaisir's archive",
                })
                dls.append({
                    "func": ghostarchive_dl,
                    "name": "from GhostArchive"
                })
                dls.append({
                    "func": wayback_dl,
                    "name": "from the Wayback Machine"
                })

                for downloader in dls:
                    print(" attempting to download %s" % downloader["name"])
                    r = downloader["func"](video, basename, output)
                    if r == 0:
                        # all good, video's downloaded
                        return 0
                    elif r == 2:
                        # video is unavailable here
                        print(" oops, video is not available there...")
                        continue
                    elif r == 1:
                        # error while downloading; likely temporary.
                        # TODO we should save which downloader the video
                        # was on, so we can continue back at it later.
                        return 1
                # video is unavailable everywhere
                return 2

            r = dl(i, basename, output)
            if r == 1:
                # temporary error; retry on the next pass
                continue

            # video is downloaded, or it's totally unavailable, so
            # remove it from being checked again.
            videos.remove(i)
            # ... and then dump the metadata, if there isn't any on disk.
            write_metadata(i, basename)

            if r == 0:
                # video is downloaded
                continue

            # video is unavailable everywhere; only its metadata was saved.
            print(" video is unavailable everywhere; dumping out metadata only")


if __name__ == "__main__":
    main()