annotate channeldownloader.py @ 16:088d9a3a2524
improvements to IA downloader
now we explicitly ignore any file not "original". this seems to
filter out derivative files (such as ogv and other shit we don't
want) but keeps some of the toplevel metadata
| author | Paper <paper@tflc.us> |
|---|---|
| date | Sat, 28 Feb 2026 14:38:04 -0500 |
| parents | 615e1ca0212a |
| children | 0d10b2ce0140 |
| rev | line source |
|---|---|
|
5
d4740dc7470c
[channeldownloader.py] Python 2.7 compatibility
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
4
diff
changeset
|
1 #!/usr/bin/env python3 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
2 # -*- coding: utf-8 -*- |
| | 3 # channeldownloader.py - scrapes youtube videos from a channel from |
| | 4 # a variety of sources |
| | 5 |
| | 6 # Copyright (c) 2021-2025 Paper <paper@tflc.us> |
| | 7 # This program is free software: you can redistribute it and/or modify |
| | 8 # it under the terms of the GNU General Public License as published by |
| | 9 # the Free Software Foundation, either version 2 of the License, or |
| | 10 # (at your option) any later version. |
| | 11 # |
| | 12 # This program is distributed in the hope that it will be useful, |
| | 13 # but WITHOUT ANY WARRANTY; without even the implied warranty of |
| | 14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| | 15 # GNU General Public License for more details. |
| | 16 # |
| | 17 # You should have received a copy of the GNU General Public License |
| | 18 # along with this program. If not, see <http://www.gnu.org/licenses/>. |
| | 19 |
|
9
2e9ed463c0be
Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
8
diff
changeset
|
20 """ |
| | 21 Usage: |
| | 22 channeldownloader.py <url>... (--database <file>) |
| | 23 [--output <folder>] |
| | 24 channeldownloader.py -h | --help |
|
5
d4740dc7470c
[channeldownloader.py] Python 2.7 compatibility
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
4
diff
changeset
|
25 |
|
9
2e9ed463c0be
Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
8
diff
changeset
|
26 Arguments: |
| | 27 <url> YouTube channel URL to download from |
| | 28 |
| | 29 Options: |
| | 30 -h --help Show this screen |
| | 31 -o --output <folder> Output folder, relative to the current directory |
| | 32 [default: .] |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
33 -d --database <file> yt-dlp style database of videos. Should contain |
| | 34 an array of yt-dlp .info.json data. For example, |
| | 35 FinnOtaku's YTPMV metadata archive. |
|
9
2e9ed463c0be
Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
8
diff
changeset
|
36 """ |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
37 |
| | 38 # Mostly built-in Python modules; docopt (imported below) is a third-party package |
|
5
d4740dc7470c
[channeldownloader.py] Python 2.7 compatibility
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
4
diff
changeset
|
39 from __future__ import print_function |
|
9
2e9ed463c0be
Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
8
diff
changeset
|
40 import docopt |
|
0
d098a293a02d
Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff
changeset
|
41 import os |
|
2
c65d14f01453
Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
1
diff
changeset
|
42 import re |
|
6
5d93490e60e2
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
5
diff
changeset
|
43 import time |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
44 import urllib.request |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
45 import os |
| | 46 import ssl |
|
15
615e1ca0212a
*: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents:
14
diff
changeset
|
47 import io |
| 16 | 48 import shutil |
| | 49 import xml.etree.ElementTree as XmlET |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
50 from urllib.error import HTTPError |
| | 51 from pathlib import Path |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
52 |
| | 53 # We can utilize special simdjson features if it is available |
| | 54 simdjson = False |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
55 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
56 try: |
| | 57 import simdjson as json |
| | 58 simdjson = True |
| | 59 print("INFO: using simdjson") |
| | 60 except ImportError: |
| | 61 try: |
| | 62 import ujson as json |
| | 63 print("INFO: using ujson") |
| | 64 except ImportError: |
| | 65 try: |
| | 66 import orjson as json |
| | 67 print("INFO: using orjson") |
| | 68 except ImportError: |
| | 69 import json |
| | 70 print("INFO: using built-in json (slow!)") |
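The nested try/except chain above can also be expressed as a loop. This is only a sketch, assuming each candidate module exposes a `loads` function; `pick_json_loads` is a hypothetical helper name and is not part of channeldownloader.py. One caveat worth keeping in mind: orjson provides `loads` and `dumps` but no file-based `load` or `dump`, so it is not a complete drop-in for the standard `json` module.

```python
import importlib


def pick_json_loads(candidates=("simdjson", "ujson", "orjson", "json")):
    """Return (module_name, loads) for the first candidate that imports."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed; try the next candidate
        return name, module.loads
    raise ImportError("no JSON module available")


name, loads = pick_json_loads()
print("INFO: using %s" % name)
print(loads('{"id": "dQw4w9WgXcQ"}')["id"])
```

Because the standard `json` module is last in the tuple, the loop always finds something on a normal Python install.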
|
0
d098a293a02d
Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff
changeset
|
71 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
72 ytdlp_works = False |
|
0
d098a293a02d
Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff
changeset
|
73 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
74 try: |
| | 75 import yt_dlp as youtube_dl |
| | 76 from yt_dlp.utils import sanitize_filename, DownloadError |
| | 77 ytdlp_works = True |
| | 78 except ImportError: |
| | 79 print("failed to import yt-dlp!") |
| | 80 print("downloading from YouTube directly will not work.") |
|
0
d098a293a02d
Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff
changeset
|
81 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
82 ia_works = False |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
83 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
84 try: |
| | 85 import internetarchive |
| | 86 from requests.exceptions import ConnectTimeout |
| | 87 ia_works = True |
| | 88 except ImportError: |
| | 89 print("failed to import the Internet Archive's python library!") |
| | 90 print("downloading from IA will not work.") |
| | 91 |
|
15
615e1ca0212a
*: add support for loading the split db from a zip file
Paper <paper@tflc.us>
parents:
14
diff
changeset
|
92 zipfile_works = False |
| | 93 |
| | 94 try: |
| | 95 import zipfile |
| | 96 zipfile_works = True |
| | 97 except ImportError: |
| | 98 print("failed to import zipfile!") |
| | 99 print("loading the database from a .zip file will not work.") |
| | 100 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
101 ############################################################################## |
| | 102 ## DOWNLOADERS |
| | 103 |
| | 104 # All downloaders should be a function under this signature: |
| | 105 # dl(video: dict, basename: str, output: str) -> int |
| | 106 # where: |
| | 107 # 'video': the .info.json scraped from the YTPMV metadata archive. |
| | 108 # 'basename': the basename output to write as. |
| | 109 # 'output': the output directory. |
| | 110 # yes, it's weird, but I don't care ;) |
| | 111 # |
| | 112 # Magic return values: |
| | 113 # 0 -- all good, video is downloaded |
| | 114 # 1 -- error downloading video; it may still be available if we try again |
| | 115 # 2 -- video is proved totally unavailable here. give up |
|
0
d098a293a02d
Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff
changeset
|
116 |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
117 |
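The return-code contract described above suggests how a caller might chain several downloaders. The following is a rough sketch rather than the script's actual driver loop: `try_downloaders`, `fake_missing` and `fake_ok` are hypothetical names, and the retry count is an assumption.

```python
def try_downloaders(video, basename, output, downloaders, retries=2):
    """Try each downloader in order; return the name of the one that worked, or None."""
    for dl in downloaders:
        for _attempt in range(retries):
            r = dl(video, basename, output)
            if r == 0:
                return dl.__name__  # video downloaded
            if r == 2:
                break               # proven unavailable here; move to the next source
            # r == 1: possibly transient error, so retry the same source
    return None


def fake_missing(video, basename, output):
    """Stand-in downloader that always reports 'unavailable'."""
    return 2


def fake_ok(video, basename, output):
    """Stand-in downloader that always reports success."""
    return 0


print(try_downloaders({"id": "dQw4w9WgXcQ"}, "out", ".", [fake_missing, fake_ok]))
```

Keeping 1 and 2 distinct lets the caller spend retries only on sources that might still succeed.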
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
118 # Basic downloader template. |
| | 119 # |
| | 120 # This does a brute-force of all extensions within vexts and iexts |
| | 121 # in an attempt to find a working video link. |
| | 122 # |
| | 123 # linktemplate is a template to be created using the video ID and |
| | 124 # extension. For example: |
| | 125 # https://cdn.ytarchiver.com/%s.%s |
| | 126 def basic_dl_template(video: dict, basename: str, output: str, |
| | 127 linktemplate: str, vexts: list, iexts: list) -> int: |
| | 128 # actual downloader |
| | 129 def basic_dl_impl(vid: str, ext: str) -> int: |
| | 130 url = (linktemplate % (vid, ext)) |
| | 131 try: |
| | 132 with urllib.request.urlopen(url) as headers: |
| | 133 with open("%s.%s" % (basename, ext), "wb") as f: |
| | 134 f.write(headers.read()) |
| | 135 print(" downloaded %s.%s" % (basename, ext)) |
| | 136 return 0 |
| | 137 except TimeoutError: |
| | 138 return 1 |
| | 139 except HTTPError: |
| | 140 return 2 |
| | 141 except Exception as e: |
| | 142 print(" unknown error downloading video!") |
| | 143 print(e) |
| | 144 return 1 |
|
4
aa652a6f97af
Update channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
3
diff
changeset
|
145 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
146 for exts in [vexts, iexts]: |
| | 147 for ext in exts: |
| | 148 r = basic_dl_impl(video["id"], ext) |
| | 149 if r == 0: |
| | 150 break # done! |
| | 151 elif r == 1: |
| | 152 # timeout; try again later? |
| | 153 return 1 |
| | 154 elif r == 2: |
| | 155 continue |
| | 156 else: |
| | 157 # we did not break out of the loop |
| | 158 # which means all extensions were unavailable |
| | 159 return 2 |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
160 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
161 # video was downloaded successfully |
| | 162 return 0 |
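One thing worth checking in the loop above, assuming the `else` on line 156 belongs to the inner `for ext in exts` loop (the indentation is not visible in this view): a `for`/`else` clause also runs when the iterable is empty, so an empty `iexts` list would reach `return 2` even after a video had been saved. Below is a small self-contained sketch of that Python behavior. `first_available` is a hypothetical helper, and a fake availability set stands in for network access.

```python
def first_available(exts, available):
    """Mimic the for/else shape: 0 if some extension is available, else 2."""
    for ext in exts:
        if ext in available:
            break      # found one; a break skips the else clause
    else:
        return 2       # loop ended without break, or never ran at all
    return 0


# How the %-style link template expands for one (id, extension) pair.
template = "https://cdn.ytarchiver.com/%s.%s"
print(template % ("dQw4w9WgXcQ", "mp4"))

print(first_available(["mp4", "webm"], {"webm"}))
print(first_available([], {"webm"}))
```

If that reading of the indentation is right, skipping empty extension lists before looping would avoid the spurious "unavailable" result.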
|
6
5d93490e60e2
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
5
diff
changeset
|
163 |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
164 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
165 # GhostArchive, basic... |
| | 166 def ghostarchive_dl(video: dict, basename: str, output: str) -> int: |
| | 167 return basic_dl_template(video, basename, output, |
| | 168 "https://ghostvideo.b-cdn.net/chimurai/%s.%s", |
| | 169 ["mp4", "webm", "mkv"], |
| | 170 [] # none |
| | 171 ) |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
172 |
|
0
d098a293a02d
Add channeldownloader.py
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
diff
changeset
|
173 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
174 # media.desirintoplaisir.net |
| | 175 # |
| | 176 # holds PRIMARILY popular videos (i.e. no niche internet microcelebrities) |
| | 177 # or weeb shit, however it seems to be growing to other stuff. |
| | 178 # |
| | 179 # there isn't really a proper API; I've based the scraping off of the HTML |
| | 180 # and the public source code. |
| | 181 def desirintoplaisir_dl(video: dict, basename: str, output: str) -> int: |
| | 182 return basic_dl_template(video, basename, output, |
| | 183 "https://media.desirintoplaisir.net/content/%s.%s", |
| | 184 ["mp4", "webm", "mkv"], |
| | 185 ["webp"] |
| | 186 ) |
| | 187 |
| | 188 |
| | 189 # Internet Archive's Wayback Machine |
| | 190 # |
| | 191 # Internally, IA's javascript routines forward to the magic |
| | 192 # URL used here. |
| | 193 # |
| | 194 # TODO: Download thumbnails through the CDX API: |
| | 195 # https://github.com/TheTechRobo/youtubevideofinder/blob/master/lostmediafinder/finder.py |
| | 196 # the CDX API is pretty slow though, so it should be used as a last resort. |
| | 197 def wayback_dl(video: dict, basename: str, output: str) -> int: |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
198 try: |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
199 url = ("https://web.archive.org/web/2oe_/http://wayback-fakeurl.archiv" |
| | 200 "e.org/yt/%s" % video["id"]) |
| | 201 with urllib.request.urlopen(url) as headers: |
| | 202 contenttype = headers.getheader("Content-Type") |
| | 203 if contenttype == "video/webm" or contenttype == "video/mp4": |
| | 204 ext = contenttype.split("/")[-1] |
| | 205 else: |
| | 206 raise HTTPError(url=None, code=None, msg=None, |
| | 207 hdrs=None, fp=None) |
| | 208 with open("%s.%s" % (basename, ext), "wb") as f: |
| | 209 f.write(headers.read()) |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
210 print(" downloaded %s.%s" % (basename, ext)) |
|
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
211 return 0 |
|
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
212 except TimeoutError: |
|
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
213 return 1 |
|
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
214 except HTTPError: |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
215 # dont keep trying |
|
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
216 return 2 |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
217 except Exception as e: |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
218 print(" unknown error downloading video!") |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
219 print(e) |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
220 return 1 |
|
11
1ac85f6f40c4
channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
10
diff
changeset
|
221 |
|
1ac85f6f40c4
channeldownloader: insane memory optimizations
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
10
diff
changeset
|
222 |
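# Editor's sketch, not part of the original file: wayback_dl() compares the
# Content-Type header against two exact strings. A server may append
# parameters (for example "; codecs=...") or vary the case, so a slightly
# more tolerant lookup could look like the block below. The helper name
# ext_from_content_type and the _VIDEO_TYPES table are hypothetical.

```python
from typing import Optional

# Hypothetical table: only the two video types wayback_dl() already accepts.
_VIDEO_TYPES = {"video/webm": "webm", "video/mp4": "mp4"}


def ext_from_content_type(contenttype: Optional[str]) -> Optional[str]:
    """Return a file extension for a Content-Type value, or None."""
    if not contenttype:
        return None
    # Drop any parameters after ';' and normalise the case before the lookup
    mediatype = contenttype.split(";", 1)[0].strip().lower()
    return _VIDEO_TYPES.get(mediatype)
```

# A caller could treat a None result the same way the function above treats
# an unexpected content type (give up on this source and return 2).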
# Also captures the ID for comparison
IA_REGEX = re.compile(r"(?:(?P<date>\d{8}) - )?(?P<title>.+?)?(?:-| \[)?(?:(?P<id>[A-Za-z0-9_\-]{11})]?|(?: \((?P<format>(?:(?:(?P<resolution>\d+)p_(?P<fps>\d+)fps_(?P<vcodec>H264)-)?(?P<abitrate>\d+)kbit_(?P<acodec>AAC|Vorbis))|BQ|Description)\)))\.(?P<extension>mp4|info\.json|description|annotations\.xml|webp|mkv|webm|jpg|jpeg|ogg|txt|m4a)$")

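# Editor's sketch, not part of the original file: the filename regex is
# dense, so the block below runs it against one made-up example of each
# layout it is meant to recognise. It assumes the ID character class is
# spelled A-Za-z; the helper name parse_ia_filename and all sample
# filenames are hypothetical.

```python
import re
from typing import Optional

IA_REGEX = re.compile(r"(?:(?P<date>\d{8}) - )?(?P<title>.+?)?(?:-| \[)?(?:(?P<id>[A-Za-z0-9_\-]{11})]?|(?: \((?P<format>(?:(?:(?P<resolution>\d+)p_(?P<fps>\d+)fps_(?P<vcodec>H264)-)?(?P<abitrate>\d+)kbit_(?P<acodec>AAC|Vorbis))|BQ|Description)\)))\.(?P<extension>mp4|info\.json|description|annotations\.xml|webp|mkv|webm|jpg|jpeg|ogg|txt|m4a)$")


def parse_ia_filename(name: str) -> Optional[dict]:
    """Return the named groups for a recognised filename, else None."""
    m = IA_REGEX.match(name)
    return m.groupdict() if m else None


if __name__ == "__main__":
    samples = [
        "dQw4w9WgXcQ.mp4",                                # ID only
        "Some Title-dQw4w9WgXcQ.mkv",                     # TITLE-ID
        "20240131 - Some Title [dQw4w9WgXcQ].webm",       # DATE - TITLE [ID]
        "Some Title (1080p_30fps_H264-128kbit_AAC).mp4",  # TITLE (FORMAT)
        "youtube-dQw4w9WgXcQ_meta.xml",                   # not a media file
    ]
    for s in samples:
        print(s, "->", parse_ia_filename(s))
```

# Layouts that carry an ID can be compared against the video ID directly;
# the "TITLE (FORMAT)" layout has no ID, so only the title can be compared.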
# Internet Archive (tubeup)
def ia_dl(video: dict, basename: str, output: str) -> int:
    def ia_file_legit(f: str, vidid: str, vidtitle: str) -> bool:
        # FIXME:
        #
        # There are some items on IA that combine the old tubeup behavior
        # (i.e., including the sanitized video name before the ID)
        # and the new tubeup behavior (filename only contains the video ID)
        # hence we will download the entire video twice.
        #
        # This isn't much of a problem anymore (and hasn't been for like 3
        # years), since I contributed code to not upload something if there
        # is already something there. However we should handle this case
        # anyway.
        #
        # Additionally, there are some items that have duplicate video files
        # (from when the owners changed the title). We should ideally only
        # download unique files. IA seems to provide SHA1 hashes...
        #
        # We should also check whether the copy on IA is higher quality
        # than a local copy... :)

        IA_ID = "youtube-%s" % vidid

        # Ignore IA generated thumbnails
        if f.startswith("%s.thumbs/" % IA_ID) or f == "__ia_thumb.jpg":
            return False

        for i in ["_archive.torrent", "_files.xml", "_meta.sqlite", "_meta.xml"]:
            if f == (IA_ID + i):
                return False

        # Try to match with our known filename regex
        # This properly matches:
        #   ??????????? - YYYYMMDD - TITLE [ID].EXTENSION
        #   old tubeup  - TITLE-ID.EXTENSION
        #   tubeup      - ID.EXTENSION
        #   JDownloader - TITLE (FORMAT).EXTENSION
        # (Possibly we should match other filenames too??)
        m = re.match(IA_REGEX, f)
        if m is None:
            return False

        if m.group("id"):
            return (m.group("id") == vidid)
        elif m.group("title") is not None:
            def asciify(s: str) -> str:
                # Replace control chars, non-ASCII chars and path separators
                # with underscores, then strip surrounding whitespace
                return ''.join([i if ord(i) >= 0x20 and ord(i) < 0x80 and i not in "/\\" else '_' for i in s]).strip()

            if asciify(m.group("title")) == asciify(vidtitle):
                return True  # Close enough

        # Uh oh
        return False

    def ia_get_original_files(identifier: str) -> typing.Optional[list]:
        def ia_xml(identifier: str) -> typing.Optional[str]:
            for _ in range(1, 9999):
                try:
                    with urllib.request.urlopen("https://archive.org/download/%s/%s_files.xml" % (identifier, identifier)) as req:
                        return req.read().decode("utf-8")
                except HTTPError as e:
                    if e.code == 404 or e.code == 503:
                        return None
                    time.sleep(5)

        d = ia_xml(identifier)
        if d is None:
            return None

        try:
            # Now parse the XML and make a list of each original file
            return [x.attrib["name"] for x in filter(lambda x: x.attrib["source"] == "original", XmlET.fromstring(d))]
        except Exception as e:
            print(e)
            return None

    originalfiles = ia_get_original_files("youtube-%s" % video["id"])
    if not originalfiles:
        return 2

    flist = [
        f
        for f in originalfiles
        if ia_file_legit(f, video["id"], video["fulltitle"] if "fulltitle" in video else video["title"])
    ]

    if not flist:
        return 2  # ??????

    while True:
        try:
            internetarchive.download("youtube-%s" % video["id"], files=flist,
                                     verbose=True, ignore_existing=True,
                                     retries=9999)
            break
        except ConnectTimeout:
            time.sleep(1)
            continue
        except Exception as e:
            print(e)
            return 1

    # Newer versions of tubeup save only the video ID.
    # Account for this by replacing it.
    #
    # paper/2025-08-30: fixed a bug where video IDs with hyphens
    # would incorrectly truncate
    #
    # paper/2026-02-27: an update in the IA python library changed
    # the way destdir works, so it just gets entirely ignored.
    for fname in flist:
        def getext(s: str, vidid: str) -> typing.Optional[str]:
            # special cases
            for i in [".info.json", ".annotations.xml"]:
                if s.endswith(i):
                    return i

            # Handle JDownloader "TITLE (Description).txt"
            if s.endswith(" (Description).txt"):
                return ".description"

            # Catch-all for remaining extensions
            spli = os.path.splitext(s)
            if spli is None or len(spli) != 2:
                return None

            return spli[1]

        ondisk = "youtube-%s/%s" % (video["id"], fname)

        if not os.path.exists(ondisk):
            continue

        ext = getext(fname, video["id"])
        if ext is None:
            continue

        os.replace(ondisk, "%s%s" % (basename, ext))

    shutil.rmtree("youtube-%s" % video["id"])

    return 0
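# Editor's sketch, not part of the original file: the FIXME in
# ia_file_legit() notes that some items hold duplicate video files and that
# IA seems to provide SHA1 hashes. Assuming each <file> entry in
# <identifier>_files.xml carries a <sha1> child element (an assumption about
# the XML layout), duplicates among the "original" files could be skipped as
# below. The function name unique_original_files is hypothetical.

```python
import xml.etree.ElementTree as XmlET
from typing import List


def unique_original_files(files_xml: str) -> List[str]:
    """List the names of original files, keeping one name per SHA1 hash."""
    seen = set()
    names = []
    for node in XmlET.fromstring(files_xml):
        # Skip derivative and metadata entries
        if node.attrib.get("source") != "original":
            continue
        sha1 = node.findtext("sha1")
        # Entries without a hash cannot be compared, so they are kept
        if sha1 is not None:
            if sha1 in seen:
                continue
            seen.add(sha1)
        names.append(node.attrib["name"])
    return names
```

# The first file seen for each hash wins, so which of two identical files
# is kept depends on the order of the entries in the XML.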
def ytdlp_dl(video: dict, basename: str, output: str) -> int:
    # intentionally ignores all messages besides errors
    class MyLogger(object):
        def debug(self, msg):
            pass

        def warning(self, msg):
            pass

        def error(self, msg):
            print(" " + msg)

    def ytdl_hook(d) -> None:
        if d["status"] == "finished":
            print(" downloaded %s: 100%% " % (os.path.basename(d["filename"])))
        if d["status"] == "downloading":
            print(" downloading %s: %s\r" % (os.path.basename(d["filename"]),
                                              d["_percent_str"]), end="")
        if d["status"] == "error":
            print("\n an error occurred downloading %s!"
                  % (os.path.basename(d["filename"])))

    ytdl_opts = {
        "retries": 100,
        "nooverwrites": True,
        "call_home": False,
        "quiet": True,
        "writeinfojson": True,
        "writedescription": True,
        "writethumbnail": True,
        "writeannotations": True,
        "writesubtitles": True,
        "allsubtitles": True,
        "addmetadata": True,
        "continuedl": True,
        "embedthumbnail": True,
        "format": "bestvideo+bestaudio/best",
        "restrictfilenames": True,
        "no_warnings": True,
        "progress_hooks": [ytdl_hook],
        "logger": MyLogger(),
        "ignoreerrors": False,

        #mm, output template
        "outtmpl": output + "/%(title)s-%(id)s.%(ext)s",
    }

    with youtube_dl.YoutubeDL(ytdl_opts) as ytdl:
        try:
            ytdl.extract_info("https://youtube.com/watch?v=%s" % video["id"])
            return 0
        except DownloadError:
            return 2
        except Exception as e:
            print(" unknown error downloading video!\n")
            print(e)

    return 1
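# Editor's sketch, not part of the original file: ytdl_hook() indexes the
# progress dict directly, so a missing "filename" or "_percent_str" key
# would raise KeyError inside the hook. A more tolerant variant could build
# the message separately, as below. The function name progress_message and
# the "?" placeholders are hypothetical; the dict keys are the ones the hook
# above already reads.

```python
import os
from typing import Optional


def progress_message(d: dict) -> Optional[str]:
    """Build the status line for one progress event, or None if unknown."""
    name = os.path.basename(d.get("filename", "?"))
    status = d.get("status")
    if status == "finished":
        return " downloaded %s: 100%%" % name
    if status == "downloading":
        # Fall back to a placeholder when no percentage is reported
        return " downloading %s: %s" % (name, d.get("_percent_str", "?"))
    if status == "error":
        return " an error occurred downloading %s!" % name
    return None
```

# Returning a string instead of printing also makes the formatting easy to
# check without running a download.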
# TODO: There are multiple other youtube archival websites available.
# Most notable is https://findyoutubevideo.thetechrobo.ca .
# This combines a lot of sparse youtube archival services, and has
# a convenient API we can use. Nice!
#
# There is also the "Distributed YouTube Archive" which is totally
# useless because there's no way to automate it...

##############################################################################

def main():
    def load_split_files(path: str):
        def cruft(isdir: bool, listdir, openf):
            # build the path list
            if not isdir:
                list_files = [path]
            else:
                list_files = filter(lambda x: re.search(r"vids[0-9\-]+?\.json", x), listdir())

            # now open each as a json
            for fi in list_files:
                print(fi)
                with openf(fi, "r") as infile:
                    if simdjson:
                        # Using this is a lot faster in SIMDJSON, since instead
                        # of converting all of the JSON key/value pairs into
                        # native Python objects, they stay in an internal state.
                        #
                        # This means we only get the stuff we absolutely need,
                        # which is the uploader ID, and copy everything else
                        # if the ID is one we are looking for.
                        parser = json.Parser()
                        yield parser.parse(infile.read())
                        del parser
                    else:
                        yield json.load(infile)

        try:
            if not zipfile_works or os.path.isdir(path):
                raise Exception

            with zipfile.ZipFile(path, "r") as myzip:
                yield from cruft(True, lambda: myzip.namelist(), lambda f, m: io.TextIOWrapper(myzip.open(f, mode=m), encoding="utf-8"))
        except Exception as e:
            # When path is a single file, cruft() passes the path itself as
            # the name, so only directory entries are joined with the path
            yield from cruft(os.path.isdir(path), lambda: os.listdir(path), lambda f, m: open(os.path.join(path, f) if os.path.isdir(path) else f, m, encoding="utf-8"))

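# Editor's sketch, not part of the original file: load_split_files() picks
# between a zip archive, a directory and a single file by raising and
# catching a generic Exception. The same three cases can be written as
# explicit branches using only the standard library, as below. The name
# iter_split_json is hypothetical, and the simdjson fast path is left out.

```python
import io
import json
import os
import re
import zipfile
from typing import Iterator

# Same filename pattern the loader above uses for split database files
SPLIT_RE = re.compile(r"vids[0-9\-]+?\.json")


def iter_split_json(path: str) -> Iterator[object]:
    """Yield each parsed JSON document found at path."""
    if os.path.isdir(path):
        # Directory: read every matching file inside it
        for name in sorted(os.listdir(path)):
            if SPLIT_RE.search(name):
                with open(os.path.join(path, name), "r", encoding="utf-8") as f:
                    yield json.load(f)
    elif zipfile.is_zipfile(path):
        # Zip archive: read every matching member
        with zipfile.ZipFile(path, "r") as z:
            for name in z.namelist():
                if SPLIT_RE.search(name):
                    with z.open(name, "r") as raw:
                        yield json.load(io.TextIOWrapper(raw, encoding="utf-8"))
    else:
        # Anything else is treated as a single JSON file
        with open(path, "r", encoding="utf-8") as f:
            yield json.load(f)
```

# Unlike the loader above, this variant sorts directory entries so the
# documents come back in a stable order.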
484 def write_metadata(i: dict, basename: str) -> None: |
|
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
485 # ehhh |
|
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
486 if not os.path.exists(basename + ".info.json"): |
|
|
487 with open(basename + ".info.json", "w", encoding="utf-8") as jsonfile: |
|
|
488 try: |
|
|
489 # orjson outputs bytes |
|
|
490 jsonfile.write(json.dumps(i).decode("utf-8")) |
|
|
491 except AttributeError: |
|
|
492 # everything else outputs a string |
|
|
493 jsonfile.write(json.dumps(i)) |
|
|
494 print(" saved %s" % os.path.basename(jsonfile.name)) |
|
|
495 if not os.path.exists(basename + ".description"): |
|
|
496 with open(basename + ".description", "w", |
|
|
497 encoding="utf-8") as descfile: |
|
|
498 descfile.write(i["description"]) |
|
|
499 print(" saved %s" % os.path.basename(descfile.name)) |
|
|
500 |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
501 args = docopt.docopt(__doc__) |
|
|
502 |
|
|
503 if not os.path.exists(args["--output"]): |
|
|
504 os.mkdir(args["--output"]) |
|
|
505 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
506 channels = dict() |
|
|
507 |
|
|
508 for url in args["<url>"]: |
|
|
509 chn = url.split("/")[-1] |
|
|
510 channels[chn] = {"output": "%s/%s" % (args["--output"], chn)} |
|
|
511 |
|
|
512 for channel in channels.values(): |
|
|
513 if not os.path.exists(channel["output"]): |
|
|
514 os.mkdir(channel["output"]) |
|
|
515 |
|
|
516 # find videos in the database. |
|
|
517 # |
|
|
518 # despite how it may seem, this is actually really fast, and fairly |
|
|
519 # memory efficient too (but really only if we're using simdjson...) |
|
|
520 videos = [ |
|
|
521 i if not simdjson else i.as_dict() |
|
|
522 for f in load_split_files(args["--database"]) |
|
|
523 for i in (f["videos"] if "videos" in f else f) # unwrap the "videos" list when the file has one |
|
|
524 if "uploader_id" in i and i["uploader_id"] in channels |
|
|
525 ] |
|
|
526 |
|
|
527 while True: |
|
|
528 if len(videos) == 0: |
|
|
529 break |
|
|
530 |
|
|
531 videos_copy = list(videos) # iterate over a snapshot; videos is mutated inside the loop |
|
|
532 |
|
|
533 for i in videos_copy: |
|
|
534 channel = channels[i["uploader_id"]] |
|
|
535 |
|
|
536 # precalculated for speed |
|
|
537 output = channel["output"] |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
538 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
539 print("%s:" % i["id"]) |
|
|
540 basename = "%s/%s-%s" % (output, sanitize_filename(i["title"], |
|
|
541 restricted=True), i["id"]) |
| 16 | 542 def filenotworthit(f) -> bool: |
| 543 try: | |
| 544 return bool(os.path.getsize(f)) | |
| 545 except OSError: | |
| 546 return False | |
| 547 | |
| 548 pathoutput = Path(output) | |
| 549 | |
| 550 # This is terrible | |
| 551 files = list(filter(filenotworthit, [y | |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
552 for p in ["mkv", "mp4", "webm"] |
| 16 | 553 for y in pathoutput.glob(("*-%s." + p) % i["id"])])) |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
554 if files: |
|
|
555 print(" video already downloaded!") |
|
|
556 videos.remove(i) |
|
|
557 write_metadata(i, basename) |
|
|
558 continue |
|
|
559 |
|
|
560 # high level "download" function. |
|
|
561 def dl(video: dict, basename: str, output: str): |
|
|
562 dls = [] |
|
|
563 |
|
|
564 if ytdlp_works: |
|
|
565 dls.append({ |
|
|
566 "func": ytdlp_dl, |
|
|
567 "name": "using yt-dlp", |
|
|
568 }) |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
569 |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
570 if ia_works: |
|
|
571 dls.append({ |
|
|
572 "func": ia_dl, |
|
|
573 "name": "from the Internet Archive", |
|
|
574 }) |
|
|
575 |
|
|
576 dls.append({ |
|
|
577 "func": desirintoplaisir_dl, |
|
|
578 "name": "from LMIJLM/DJ Plaisir's archive", |
|
|
579 }) |
|
|
580 dls.append({ |
|
|
581 "func": ghostarchive_dl, |
|
|
582 "name": "from GhostArchive" |
|
|
583 }) |
|
|
584 dls.append({ |
|
|
585 "func": wayback_dl, |
|
|
586 "name": "from the Wayback Machine" |
|
|
587 }) |
|
|
588 |
|
|
589 for dl in dls: |
|
|
590 print(" attempting to download %s" % dl["name"]) |
|
|
591 r = dl["func"](video, basename, output) |
|
|
592 if r == 0: |
|
|
593 # all good, video's downloaded |
|
|
594 return 0 |
|
|
595 elif r == 2: |
|
|
596 # video is unavailable here |
|
|
597 print(" oops, video is not available there...") |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
598 continue |
|
14
03c8fd4069fb
*: big refactor, switch to GPLv2, and add README
Paper <paper@tflc.us>
parents:
13
diff
changeset
|
599 elif r == 1: |
|
|
600 # error while downloading; likely temporary. |
|
|
601 # TODO we should save which downloader the video |
|
|
602 # was on, so we can continue back at it later. |
|
|
603 return 1 |
|
|
604 # video is unavailable everywhere |
|
|
605 return 2 |
|
|
606 |
|
|
607 r = dl(i, basename, output) |
|
|
608 if r == 1: |
|
|
609 continue |
|
|
610 |
|
|
611 # video is downloaded, or it's totally unavailable, so |
|
|
612 # remove it from being checked again. |
|
|
613 videos.remove(i) |
|
|
614 # ... and then dump the metadata, if there isn't any on disk. |
|
|
615 write_metadata(i, basename) |
|
|
616 |
|
|
617 if r == 0: |
|
|
618 # video is downloaded |
|
|
619 continue |
|
|
620 |
|
|
621 # video is unavailable; write out the metadata. |
|
|
622 print(" video is unavailable everywhere; dumping out metadata only") |
|
6
5d93490e60e2
[channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
5
diff
changeset
|
623 |
|
10
8969930a9fa4
*: major cleanup
Paper <37962225+mrpapersonic@users.noreply.github.com>
parents:
9
diff
changeset
|
624 |
|
|
625 if __name__ == "__main__": |
|
|
626 main() |
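
The `dl` helper and the retry loop in the listing above rely on a small return-code convention: a downloader returns 0 when the video was saved, 1 on a (likely temporary) error, and 2 when the video is not available from that source. The following is a minimal, self-contained sketch of that pattern, not the script's actual code. The names `try_sources` and `drain` and the `max_passes` cap are hypothetical (the original loops until the list is empty), and the real downloaders also take `basename` and `output` arguments, which are left out here.

```python
from typing import Callable, Dict, List

# Return codes, following the convention described in the listing's comments.
OK, TEMPORARY_ERROR, UNAVAILABLE = 0, 1, 2

# Hypothetical simplified downloader signature: takes the video dict only.
Downloader = Callable[[dict], int]


def try_sources(video: dict, sources: List[Downloader]) -> int:
    """Try each source in order and stop at the first decisive result."""
    for source in sources:
        result = source(video)
        if result == OK:
            return OK
        if result == TEMPORARY_ERROR:
            # Likely transient: stop here so the caller can retry later.
            return TEMPORARY_ERROR
        # UNAVAILABLE at this source: fall through to the next one.
    return UNAVAILABLE


def drain(videos: List[dict], sources: List[Downloader],
          max_passes: int = 5) -> Dict[str, int]:
    """Retry until every video is resolved or max_passes is reached.

    max_passes is an assumption added for this sketch so that a source
    which always fails temporarily cannot loop forever.
    """
    results: Dict[str, int] = {}
    for _ in range(max_passes):
        if not videos:
            break
        # Iterate over a snapshot so removing from `videos` is safe.
        for video in list(videos):
            result = try_sources(video, sources)
            if result == TEMPORARY_ERROR:
                continue  # keep it queued for the next pass
            videos.remove(video)
            results[video["id"]] = result
    return results
```

Iterating over `list(videos)` rather than `videos` itself matters because removing items from a list while looping over that same list makes the loop skip elements.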
