channeldownloader: channeldownloader.py annotate

annotate channeldownloader.py @ 18:05e71dd6b6ca default tip

no more ia python library

author	Paper <paper@tflc.us>
date	Sat, 28 Feb 2026 22:31:59 -0500
parents	0d10b2ce0140
children

rev	line source
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	1 #!/usr/bin/env python3
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	2 # -*- coding: utf-8 -*-
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	3 # channeldownloader.py - scrapes youtube videos from a channel from
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	4 # a variety of sources
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	5
17 0d10b2ce0140 2026 Paper <paper@tflc.us> parents: 16 diff changeset	6 # Copyright (c) 2021-2026 Paper <paper@tflc.us>
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	7 # This program is free software: you can redistribute it and/or modify
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	8 # it under the terms of the GNU General Public License as published by
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	9 # the Free Software Foundation, either version 2 of the License, or
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	10 # (at your option) any later version.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	11 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	12 # This program is distributed in the hope that it will be useful,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	13 # but WITHOUT ANY WARRANTY; without even the implied warranty of
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	14 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	15 # GNU General Public License for more details.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	16 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	17 # You should have received a copy of the GNU General Public License
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	18 # along with this program. If not, see <http://www.gnu.org/licenses/>.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	19
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	20 # Okay, this is a bit of a clusterfuck.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	21 #
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	22 # This originated as a script that simply helped me scrape a bunch
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	23 # of videos off some deleted channels (in fact, that's still its main
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	24 # purpose) and was very lackluster (hardcoded shite everywhere).
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	25 # Fortunately in recent times I've cleaned up the code and added some
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	26 # other mirrors, as well as improved the archive.org scraper to not
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	27 # shoot itself when it encounters an upload that's not from tubeup.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	28 #
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	29 # Nevertheless, I still consider much of this file to be dirty hacks,
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	30 # especially some of the HTTP stuff.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	31
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	32 """
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	33 Usage:
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	34 channeldownloader.py <url>... (--database <file>)
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	35 [--output <folder>]
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	36 channeldownloader.py -h | --help
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	37
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	38 Arguments:
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	39 <url> YouTube channel URL to download from
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	40
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	41 Options:
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	42 -h --help Show this screen
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	43 -o --output <folder> Output folder, relative to the current directory
2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	44 [default: .]
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	45 -d --database <file> yt-dlp style database of videos. Should contain
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	46 an array of yt-dlp .info.json data. For example,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	47 FinnOtaku's YTPMV metadata archive.
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	48 """
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	49
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	50 # Built-in python stuff (no possible missing dependencies)
5 d4740dc7470c [channeldownloader.py] Python 2.7 compatibility Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 4 diff changeset	51 from __future__ import print_function
9 2e9ed463c0be Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 8 diff changeset	52 import docopt
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	53 import os
2 c65d14f01453 Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 1 diff changeset	54 import re
6 5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	55 import time
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	56 import urllib.request
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	57 import urllib.parse
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	58 import os
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	59 import ssl
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	60 import io
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	61 import shutil
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	62 import xml.etree.ElementTree as XmlET
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	63 import enum
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	64 from urllib.error import HTTPError
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	65 from pathlib import Path
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	66
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	67 # We can utilize special simdjson features if it is available
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	68 simdjson = False
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	69
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	70 try:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	71 import simdjson as json
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	72 simdjson = True
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	73 print("INFO: using simdjson")
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	74 except ImportError:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	75 try:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	76 import ujson as json
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	77 print("INFO: using ujson")
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	78 except ImportError:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	79 try:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	80 import orjson as json
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	81 print("INFO: using orjson")
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	82 except ImportError:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	83 import json
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	84 print("INFO: using built-in json (slow!)")
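The nested try/except chain above picks the first JSON parser that imports. A minimal sketch of the same preference order written as a loop, assuming only that each candidate exposes `loads`; `CANDIDATES` and `pick_json_module` are hypothetical names, not part of channeldownloader.py, and the sketch returns the chosen name instead of setting the `simdjson` flag:

```python
import importlib

# Candidate parsers in order of preference. The standard-library json
# module comes last and is always importable.
CANDIDATES = ("simdjson", "ujson", "orjson", "json")


def pick_json_module(candidates=CANDIDATES):
    """Return (name, module) for the first importable candidate."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("no JSON module available")


name, json_mod = pick_json_module()
print("INFO: using %s" % name)
```

Note that the candidates are not fully interchangeable beyond `loads` (for example, orjson's `dumps` returns bytes), so code relying on more of the json API would need per-module handling.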
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	85
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	86 ytdlp_works = False
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	87
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	88 try:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	89 import yt_dlp as youtube_dl
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	90 from yt_dlp.utils import sanitize_filename, DownloadError
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	91 ytdlp_works = True
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	92 except ImportError:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	93 print("failed to import yt-dlp!")
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	94 print("downloading from YouTube directly will not work.")
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	95
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	96 zipfile_works = False
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	97
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	98 try:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	99 import zipfile
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	100 zipfile_works = True
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	101 except ImportError:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	102 print("failed to import zipfile!")
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	103 print("loading the database from a .zip file will not work.")
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	104
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	105
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	106 ##############################################################################
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	107 ## DOWNLOADERS
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	108
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	109 # All downloaders should be a function under this signature:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	110 # dl(video: dict, basename: str, output: str) -> DownloaderStatus
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	111 # where:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	112 # 'video': the .info.json scraped from the YTPMV metadata archive.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	113 # 'basename': the basename output to write as.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	114 # 'output': the output directory.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	115 # yes, it's weird, but I don't care ;)
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	116
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	117 class DownloaderStatus(enum.Enum):
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	118 # Download finished successfully.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	119 SUCCESS = 0
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	120 # Download failed.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	121 # Note that this should NOT be used for when the video is unavailable
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	122 # (i.e. error 404); it should only be used when the video cannot be
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	123 # downloaded at this time, indicating a server problem. This is very
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	124 # common for the Internet Archive, not sure about others.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	125 ERROR = 1
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	126 # Video is unavailable from this provider.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	127 UNAVAILABLE = 2
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	128
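The three statuses suggest how a caller could combine several downloaders: stop on SUCCESS, remember an ERROR as "retry later", and move on when a provider reports UNAVAILABLE. The following is only a sketch of that idea; `try_downloaders` and the three stub downloaders are hypothetical and do not appear in this file, and the enum is repeated so the block runs on its own:

```python
import enum


class DownloaderStatus(enum.Enum):
    # Repeated from the listing so this sketch is self-contained.
    SUCCESS = 0
    ERROR = 1
    UNAVAILABLE = 2


def try_downloaders(video, basename, output, downloaders):
    """Run downloaders in order and summarise the outcome.

    Returns SUCCESS as soon as one provider works, ERROR if at least one
    provider had a transient failure (worth retrying later), and
    UNAVAILABLE if every provider lacked the video.
    """
    saw_error = False
    for dl in downloaders:
        status = dl(video, basename, output)
        if status == DownloaderStatus.SUCCESS:
            return status
        if status == DownloaderStatus.ERROR:
            saw_error = True
    return DownloaderStatus.ERROR if saw_error else DownloaderStatus.UNAVAILABLE


# Stub downloaders following the dl(video, basename, output) signature.
def unavailable(video, basename, output):
    return DownloaderStatus.UNAVAILABLE


def flaky(video, basename, output):
    return DownloaderStatus.ERROR


def works(video, basename, output):
    return DownloaderStatus.SUCCESS


result = try_downloaders({"id": "abc"}, "abc", ".", [unavailable, flaky, works])
print(result)
```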
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	129 """
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	130 Downloads a file from `url` to `path`, and prints the progress to the
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	131 screen.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	132 """
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	133 def download_file(url: str, path: str, guessext: bool = False, length: int = None) -> DownloaderStatus:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	134 # Download in 32KiB chunks
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	135 CHUNK_SIZE = 32768
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	136
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	137 # Don't exceed 79 chars.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	138 try:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	139 with urllib.request.urlopen(url) as http:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	140 if length is None:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	141 # Check whether the URL gives us Content-Length.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	142 # If so, call f.truncate to tell the filesystem how much
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	143 # we will be downloading before we start writing.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	144 #
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	145 # This is also useful for displaying how much we've
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	146 # downloaded overall as a percent.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	147 length = http.getheader("Content-Length", default=None)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	148 try:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	149 if length is not None:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	150 length = int(length)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	151 f.truncate(length)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	152 except:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	153 # fuck it
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	154 length = None
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	155
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	156 if guessext:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	157 # Guess file extension from MIME type
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	158 mime = http.getheader("Content-Type", default=None)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	159 if not mime:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	160 return DownloaderStatus.ERROR
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	161
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	162 if mime == "video/mp4":
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	163 path += ".mp4"
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	164 elif mime == "video/webm":
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	165 path += ".webm"
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	166 else:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	167 return DownloaderStatus.ERROR
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	168
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	169 par = os.path.dirname(path)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	170 if not os.path.isdir(par):
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	171 os.makedirs(par)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	172
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	173 with open(path, "wb") as f:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	174 # Download the entire file
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	175 while True:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	176 data = http.read(CHUNK_SIZE)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	177 if not data:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	178 break
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	179
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	180 f.write(data)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	181 print("\r downloading to %s, " % path, end="")
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	182 if length:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	183 print("%.2f%%" % (f.tell() / length * 100.0), end="")
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	184 else:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	185 print("%.2f MiB" % (f.tell() / (1 << 20)), end="")
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	186
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	187 print("\r downloaded to %s " % path)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	188
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	189 if length is not None and length != f.tell():
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	190 # Server lied about what the length was?
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	191 print(" INFO: HTTP server's Content-Length header lied??")
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	192 except TimeoutError:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	193 return DownloaderStatus.ERROR
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	194 except HTTPError:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	195 return DownloaderStatus.UNAVAILABLE
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	196 except Exception as e:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	197 print(" unknown error downloading video;", e)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	198 return DownloaderStatus.ERROR
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	199
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	200 return DownloaderStatus.SUCCESS
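In the listing above, `f.truncate(length)` is reached before `f` is opened, so it appears the bare `except` would reset `length` to None and the percentage display would never be used for header-derived lengths. A hedged sketch of the intended technique (read Content-Length, preallocate, then stream in chunks) with the file opened first; `copy_with_prealloc` and `FakeResponse` are hypothetical names used only for illustration, and the fake response avoids any network access:

```python
import io
import os
import tempfile

CHUNK_SIZE = 32768  # 32 KiB, the same chunk size as download_file


def copy_with_prealloc(resp, path, length=None):
    """Stream `resp` into `path`, preallocating when the size is known.

    `resp` is any object with read(n) and getheader(name, default), such
    as the response returned by urllib.request.urlopen.
    Returns (bytes_written, expected_length_or_None).
    """
    if length is None:
        raw = resp.getheader("Content-Length", None)
        try:
            length = int(raw) if raw is not None else None
        except ValueError:
            length = None  # malformed header; treat the size as unknown

    with open(path, "wb") as f:
        if length is not None:
            # Extend the file up front; the write position stays at 0.
            f.truncate(length)
        written = 0
        while True:
            data = resp.read(CHUNK_SIZE)
            if not data:
                break
            f.write(data)
            written += len(data)
        # Trim the preallocated tail if the server sent less than promised.
        f.truncate(written)
    return written, length


class FakeResponse(io.BytesIO):
    """Stand-in for an HTTP response, used only for this demonstration."""

    def __init__(self, body, headers):
        super().__init__(body)
        self._headers = headers

    def getheader(self, name, default=None):
        return self._headers.get(name, default)


body = b"x" * 100000
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
written, expected = copy_with_prealloc(
    FakeResponse(body, {"Content-Length": str(len(body))}), path)
print("wrote %d of %d bytes" % (written, expected))
```

Comparing `written` with `expected` afterwards gives the same "Content-Length lied" check that download_file performs.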
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	201
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	202
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	203 # Basic downloader template.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	204 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	205 # This does a brute-force of all extensions within vexts and iexts
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	206 # in an attempt to find a working video link.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	207 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	208 # linktemplate is a template to be created using the video ID and
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	209 # extension. For example:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	210 # https://cdn.ytarchiver.com/%s.%s
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	211 def basic_dl_template(video: dict, basename: str, output: str,
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	212 linktemplate: str, vexts: list, iexts: list) -> DownloaderStatus:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	213 # actual downloader
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	214 def basic_dl_impl(vid: str, ext: str) -> DownloaderStatus:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	215 url = (linktemplate % (vid, ext))
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	216 return download_file(url, "%s.%s" % (basename, ext))
4 aa652a6f97af Update channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 3 diff changeset	217
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	218 for exts in [vexts, iexts]:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	219 for ext in exts:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	220 r = basic_dl_impl(video["id"], ext)
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	221 if r == DownloaderStatus.SUCCESS:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	222 break # done!
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	223 elif r == DownloaderStatus.ERROR:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	224 # timeout; try again later?
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	225 return DownloaderStatus.ERROR
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	226 elif r == DownloaderStatus.UNAVAILABLE:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	227 continue
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	228 else:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	229 # we did not break out of the loop
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	230 # which means all extensions were unavailable
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	231 return DownloaderStatus.UNAVAILABLE
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	232
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	233 # video was downloaded successfully
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	234 return DownloaderStatus.SUCCESS
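The brute-force search relies on Python's for/else: the `else` branch runs only when the loop finishes without `break`. A simplified single-list sketch of that idiom follows; it is not the template itself. `first_working_ext`, `fake_fetch`, and the example.invalid URL are hypothetical, and `fetch` stands in for download_file so nothing touches the network:

```python
import enum


class DownloaderStatus(enum.Enum):
    # Repeated from the listing so this sketch is self-contained.
    SUCCESS = 0
    ERROR = 1
    UNAVAILABLE = 2


def first_working_ext(video_id, linktemplate, exts, fetch):
    """Try each extension in order; return (status, extension_or_None).

    `fetch(url)` is a hypothetical callable returning a DownloaderStatus.
    """
    for ext in exts:
        status = fetch(linktemplate % (video_id, ext))
        if status == DownloaderStatus.SUCCESS:
            break  # found a working link
        if status == DownloaderStatus.ERROR:
            # Transient failure: give up now so the caller can retry later.
            return DownloaderStatus.ERROR, None
    else:
        # The loop ended without break: every extension was unavailable
        # (this also covers an empty extension list).
        return DownloaderStatus.UNAVAILABLE, None
    return DownloaderStatus.SUCCESS, ext


AVAILABLE = {"https://example.invalid/abc.webm"}


def fake_fetch(url):
    if url in AVAILABLE:
        return DownloaderStatus.SUCCESS
    return DownloaderStatus.UNAVAILABLE


result = first_working_ext(
    "abc", "https://example.invalid/%s.%s", ["mp4", "webm", "mkv"], fake_fetch)
print(result)
```

One property worth keeping in mind when using this idiom: an empty list of extensions falls straight through to the `else` branch.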
6 5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	235
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	236
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	237 # GhostArchive, basic...
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	238 def ghostarchive_dl(video: dict, basename: str, output: str) -> DownloaderStatus:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	239 return basic_dl_template(video, basename, output,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	240 "https://ghostvideo.b-cdn.net/chimurai/%s.%s",
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	241 ["mp4", "webm", "mkv"],
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	242 [] # none
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	243 )
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	244
0 d098a293a02d Add channeldownloader.py Paper <37962225+mrpapersonic@users.noreply.github.com> parents: diff changeset	245
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	246 # media.desirintoplaisir.net
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	247 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	248 # holds PRIMARILY popular videos (i.e. no niche internet microcelebrities)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	249 # or weeb shit, however it seems to be growing to other stuff.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	250 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	251 # there isn't really a proper API; I've based the scraping off of the HTML
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	252 # and the public source code.
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	253 def desirintoplaisir_dl(video: dict, basename: str, output: str) -> DownloaderStatus:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	254 return basic_dl_template(video, basename, output,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	255 "https://media.desirintoplaisir.net/content/%s.%s",
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	256 ["mp4", "webm", "mkv"],
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	257 ["webp"]
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	258 )
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	259
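The two mirror downloaders above each pass a "%s.%s" URL template and an ordered list of extensions to basic_dl_template. A minimal sketch of how such a template could expand into candidate URLs follows. It assumes the first placeholder is the video ID and the second is the extension; `candidate_urls` and the sample ID are hypothetical names used only for illustration, not part of the script.

```python
from typing import List


def candidate_urls(template: str, video_id: str, extensions: List[str]) -> List[str]:
    """Expand a "%s.%s" URL template once per extension, keeping list order."""
    return [template % (video_id, ext) for ext in extensions]


urls = candidate_urls(
    "https://ghostvideo.b-cdn.net/chimurai/%s.%s",
    "dQw4w9WgXcQ",  # placeholder video ID, for illustration only
    ["mp4", "webm", "mkv"],
)

for url in urls:
    print(url)
```

Keeping the extensions in a list means the order of attempts stays explicit, so the preferred container is simply the one listed first.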
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	260
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	261 # Internet Archive's Wayback Machine
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	262 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	263 # Internally, IA's javascript routines forward to the magic
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	264 # URL used here.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	265 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	266 # TODO: Download thumbnails through the CDX API:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	267 # https://github.com/TheTechRobo/youtubevideofinder/blob/master/lostmediafinder/finder.py
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	268 # the CDX API is pretty slow though, so it should be used as a last resort.
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	269 def wayback_dl(video: dict, basename: str, output: str) -> DownloaderStatus:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	270 PREFIX = "https://web.archive.org/web/2oe_/http://wayback-fakeurl.archive.org/yt/"
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	271 return download_file(PREFIX + video["id"], basename, True)
11 1ac85f6f40c4 channeldownloader: insane memory optimizations Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 10 diff changeset	272
1ac85f6f40c4 channeldownloader: insane memory optimizations Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 10 diff changeset	273
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	274 # Also captures the ID for comparison
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	275 IA_REGEX = re.compile(r"(?:(?P<date>\d{8}) - )?(?P<title>.+?)?(?:-| \[)?(?:(?P<id>[A-Za-z0-9_\-]{11})]?|(?: \((?P<format>(?:(?:(?P<resolution>\d+)p_(?P<fps>\d+)fps_(?P<vcodec>H264)-)?(?P<abitrate>\d+)kbit_(?P<acodec>AAC|Vorbis))|BQ|Description)\)))\.(?P<extension>mp4|info\.json|description|annotations\.xml|webp|mkv|webm|jpg|jpeg|ogg|txt|m4a)$")
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	276
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	277
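To see what the filename pattern captures, here is a small sketch that runs a copy of it against one made-up filename per naming style (tubeup, old tubeup, dated, JDownloader). The copy writes the ID character class as A-Za-z, and all filenames and the video ID are invented for illustration.

```python
import re

# Copy of the filename pattern, split across lines for readability.
# The ID class is written as A-Za-z here; sample filenames below are made up.
IA_REGEX = re.compile(
    r"(?:(?P<date>\d{8}) - )?(?P<title>.+?)?(?:-| \[)?"
    r"(?:(?P<id>[A-Za-z0-9_\-]{11})]?"
    r"|(?: \((?P<format>(?:(?:(?P<resolution>\d+)p_(?P<fps>\d+)fps_(?P<vcodec>H264)-)?"
    r"(?P<abitrate>\d+)kbit_(?P<acodec>AAC|Vorbis))|BQ|Description)\)))"
    r"\.(?P<extension>mp4|info\.json|description|annotations\.xml|webp|mkv|webm|jpg|jpeg|ogg|txt|m4a)$"
)

SAMPLES = [
    "dQw4w9WgXcQ.mp4",                                # tubeup: ID only
    "Some Title-dQw4w9WgXcQ.mp4",                     # old tubeup: TITLE-ID
    "20200101 - Some Title [dQw4w9WgXcQ].mp4",        # YYYYMMDD - TITLE [ID]
    "Some Title (1080p_30fps_H264-128kbit_AAC).mp4",  # JDownloader: TITLE (FORMAT)
    "notes.pdf",                                      # unknown extension
]

for name in SAMPLES:
    m = IA_REGEX.match(name)
    if m is None:
        print("%-50s no match" % name)
    else:
        print("%-50s id=%s title=%s" % (name, m.group("id"), m.group("title")))
```

Only the styles that embed the video ID populate the `id` group; JDownloader-style names leave it empty, which is why the matching code falls back to comparing titles.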
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	278 # Internet Archive (tubeup)
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	279 #
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	280 # NOTE: We don't actually need the python library anymore; we already
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	281 # explicitly download the file listing using our own logic, so there's
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	282 # really nothing stopping us from going ahead and downloading everything
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	283 # else using the download_file function.
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	284 def ia_dl(video: dict, basename: str, output: str) -> DownloaderStatus:
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	285 def ia_file_legit(f: str, vidid: str, vidtitle: str) -> bool:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	286 # FIXME:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	287 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	288 # There are some items on IA that combine the old tubeup behavior
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	289 # (i.e., including the sanitized video name before the ID)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	290 # and the new tubeup behavior (filename only contains the video ID)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	291 # hence we will download the entire video twice.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	292 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	293 # This isn't much of a problem anymore (and hasn't been for like 3
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	294 # years), since I contributed code to not upload something if there
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	295 # is already something there. However we should handle this case
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	296 # anyway.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	297 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	298 # Additionally, there are some items that have duplicate video files
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	299 # (from when the owners changed the title). We should ideally only
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	300 # download unique files. IA seems to provide SHA1 hashes...
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	301 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	302 # We should also check whether the copy on IA is higher quality
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	303 # than a local copy... :)
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	304
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	305 IA_ID = "youtube-%s" % vidid
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	306
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	307 # Ignore IA generated thumbnails
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	308 if f.startswith("%s.thumbs/" % IA_ID) or f == "__ia_thumb.jpg":
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	309 return False
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	310
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	311 for i in ["_archive.torrent", "_files.xml", "_meta.sqlite", "_meta.xml"]:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	312 if f == (IA_ID + i):
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	313 return False
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	314
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	315 # Try to match with our known filename regex
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	316 # This properly matches:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	317 # ??????????? - YYYYMMDD - TITLE [ID].EXTENSION
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	318 # old tubeup - TITLE-ID.EXTENSION
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	319 # tubeup - ID.EXTENSION
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	320 # JDownloader - TITLE (FORMAT).EXTENSION
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	321 # (Possibly we should match other filenames too??)
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	322 m = re.match(IA_REGEX, f)
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	323 if m is None:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	324 return False
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	325
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	326 if m.group("id"):
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	327 return (m.group("id") == vidid)
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	328 elif m.group("title") is not None:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	329 def asciify(s: str) -> str:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	330 # Replace control, non-ASCII, and slash characters with underscores, then strip leading/trailing whitespace
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	331 return ''.join([i if ord(i) >= 0x20 and ord(i) < 0x80 and i not in "/\\" else '_' for i in s]).strip()
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	332
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	333 if asciify(m.group("title")) == asciify(vidtitle):
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	334 return True # Close enough
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	335
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	336 # Uh oh
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	337 return False
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	338
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	339 def ia_get_original_files(identifier: str) -> typing.Optional[list]:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	340 def ia_xml(identifier: str) -> typing.Optional[str]:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	341 for _ in range(1, 9999):
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	342 try:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	343 with urllib.request.urlopen("https://archive.org/download/%s/%s_files.xml" % (identifier, identifier)) as req:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	344 return req.read().decode("utf-8")
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	345 except HTTPError as e:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	346 if e.code == 404 or e.code == 503:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	347 return None
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	348 time.sleep(5)
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	349
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	350 d = ia_xml(identifier)
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	351 if d is None:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	352 return None
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	353
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	354 try:
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	355 r = []
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	356
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	357 # Now parse the XML and make a list of each original file
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	358 for x in filter(lambda x: x.attrib["source"] == "original", XmlET.fromstring(d)):
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	359 l = {"name": x.attrib["name"]}
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	360
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	361 sz = x.find("size")
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	362 if sz is not None:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	363 l["size"] = int(sz.text)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	364
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	365 r.append(l)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	366
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	367 return r
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	368
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	369 except Exception as e:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	370 print(e)
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	371 return None
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	372
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	373 IA_IDENTIFIER = "youtube-%s" % video["id"]
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	374
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	375 originalfiles = ia_get_original_files(IA_IDENTIFIER)
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	376 if not originalfiles:
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	377 return DownloaderStatus.UNAVAILABLE
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	378
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	379 flist = [
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	380 f
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	381 for f in originalfiles
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	382 if ia_file_legit(f["name"], video["id"], video["fulltitle"] if "fulltitle" in video else video["title"])
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	383 ]
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	384
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	385 if not flist:
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	386 return DownloaderStatus.UNAVAILABLE # ??????
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	387
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	388 for i in flist:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	389 for _ in range(1, 10):
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	390 path = "%s/%s" % (IA_IDENTIFIER, i["name"])
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	391 r = download_file("https://archive.org/download/" + urllib.parse.quote(path, encoding="utf-8"), path, False, i.get("size"))
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	392 if r == DownloaderStatus.SUCCESS:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	393 break
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	394 elif r == DownloaderStatus.ERROR:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	395 # sleep for a bit and retry
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	396 time.sleep(1.0)
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	397 continue
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	398 elif r == DownloaderStatus.UNAVAILABLE:
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	399 return DownloaderStatus.UNAVAILABLE
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	400
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	401 # Newer versions of tubeup save only the video ID.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	402 # Account for this by replacing it.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	403 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	404 # paper/2025-08-30: fixed a bug where video IDs with hyphens
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	405 # would incorrectly truncate
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	406 #
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	407 # paper/2026-02-27: an update in the IA python library changed
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	408 # the way destdir works, so it just gets entirely ignored.
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	409 for f in flist:
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	410 def getext(s: str, vidid: str) -> typing.Optional[str]:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	411 # special cases
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	412 for i in [".info.json", ".annotations.xml"]:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	413 if s.endswith(i):
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	414 return i
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	415
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	416 # Handle JDownloader "TITLE (Description).txt"
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	417 if s.endswith(" (Description).txt"):
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	418 return ".description"
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	419
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	420 # Catch-all for remaining extensions
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	421 spli = os.path.splitext(s)
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	422 if spli is None or len(spli) != 2:
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	423 return None
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	424
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	425 return spli[1]
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	426
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	427 ondisk = "youtube-%s/%s" % (video["id"], f["name"])
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	428
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	429 if not os.path.exists(ondisk):
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	430 continue
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	431
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	432 ext = getext(f["name"], video["id"])
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	433 if ext is None:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	434 continue
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	435
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	436 os.replace(ondisk, "%s%s" % (basename, ext))
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	437
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	438 shutil.rmtree("youtube-%s" % video["id"])
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	439
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	440 return DownloaderStatus.SUCCESS
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	441
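The file-listing step in ia_dl keeps only entries whose source is "original" and records an optional size. The following is a rough, self-contained sketch of that filtering on an inline sample. The XML in `SAMPLE` is a hypothetical, simplified stand-in for an item's `_files.xml`, and `original_files` is an illustrative name rather than the script's own helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, simplified listing; a real _files.xml carries more fields.
SAMPLE = """<files>
  <file name="dQw4w9WgXcQ.mp4" source="original"><size>1234</size></file>
  <file name="dQw4w9WgXcQ.ogv" source="derivative"><size>99</size></file>
  <file name="youtube-dQw4w9WgXcQ_meta.xml" source="original"/>
</files>"""


def original_files(xml_text: str) -> List[Dict[str, object]]:
    """Return name (and size, when present) for each source="original" entry."""
    out = []
    for node in ET.fromstring(xml_text):
        # .get() avoids a KeyError if an entry lacks the attribute
        if node.get("source") != "original":
            continue
        entry = {"name": node.get("name")}
        size = node.find("size")
        if size is not None and size.text:
            entry["size"] = int(size.text)
        out.append(entry)
    return out


files = original_files(SAMPLE)
for entry in files:
    print(entry)
```

Entries without a size element simply omit the key, so a caller can pass `entry.get("size")` along and skip size verification when it is absent.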
11 1ac85f6f40c4 channeldownloader: insane memory optimizations Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 10 diff changeset	442
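The renaming step in ia_dl needs the suffix of each downloaded file, with compound suffixes and JDownloader description files treated specially. A minimal sketch of that lookup, under the assumption that only the cases named in the code above matter; `guess_suffix` is a hypothetical name and the filenames are invented.

```python
import os


def guess_suffix(filename: str) -> str:
    """Pick the suffix to append to the local basename for a given IA filename."""
    # Compound suffixes that os.path.splitext would cut in the wrong place
    for multi in (".info.json", ".annotations.xml"):
        if filename.endswith(multi):
            return multi

    # JDownloader stores descriptions as "TITLE (Description).txt"
    if filename.endswith(" (Description).txt"):
        return ".description"

    # Everything else: plain last-dot extension
    return os.path.splitext(filename)[1]


for name in ["dQw4w9WgXcQ.info.json", "Some Title (Description).txt", "Some Title-dQw4w9WgXcQ.mp4"]:
    print(name, "->", guess_suffix(name))
```

Checking the compound suffixes first matters because `os.path.splitext` only splits at the last dot and would otherwise return ".json" or ".xml".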
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	443 def ytdlp_dl(video: dict, basename: str, output: str) -> DownloaderStatus:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	444 # intentionally ignores all messages besides errors
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	445 class MyLogger(object):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	446 def debug(self, msg):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	447 pass
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	448
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	449 def warning(self, msg):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	450 pass
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	451
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	452 def error(self, msg):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	453 print(" " + msg)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	454 pass
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	455
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	456
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	457 def ytdl_hook(d) -> None:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	458 if d["status"] == "finished":
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	459 print(" downloaded %s: 100%% " % (os.path.basename(d["filename"])))
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	460 if d["status"] == "downloading":
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	461 print(" downloading %s: %s\r" % (os.path.basename(d["filename"]),
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	462 d["_percent_str"]), end="")
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	463 if d["status"] == "error":
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	464 print("\n an error occurred downloading %s!"
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	465 % (os.path.basename(d["filename"])))
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	466
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	467 ytdl_opts = {
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	468 "retries": 100,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	469 "nooverwrites": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	470 "call_home": False,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	471 "quiet": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	472 "writeinfojson": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	473 "writedescription": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	474 "writethumbnail": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	475 "writeannotations": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	476 "writesubtitles": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	477 "allsubtitles": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	478 "addmetadata": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	479 "continuedl": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	480 "embedthumbnail": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	481 "format": "bestvideo+bestaudio/best",
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	482 "restrictfilenames": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	483 "no_warnings": True,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	484 "progress_hooks": [ytdl_hook],
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	485 "logger": MyLogger(),
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	486 "ignoreerrors": False,
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	487 # yummy
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	488 "outtmpl": output + "/%(title)s-%(id)s.%(ext)s",
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	489 }
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	490
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	491 with youtube_dl.YoutubeDL(ytdl_opts) as ytdl:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	492 try:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	493 ytdl.extract_info("https://youtube.com/watch?v=%s" % video["id"])
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	494 return DownloaderStatus.SUCCESS
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	495 except DownloadError:
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	496 return DownloaderStatus.UNAVAILABLE
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	497 except Exception as e:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	498 print(" unknown error downloading video!\n")
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	499 print(e)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	500
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	501 return DownloaderStatus.ERROR
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	502
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	503
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	504 # TODO: There are multiple other youtube archival websites available.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	505 # Most notable is https://findyoutubevideo.thetechrobo.ca .
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	506 # This combines a lot of sparse youtube archival services, and has
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	507 # a convenient API we can use. Nice!
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	508 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	509 # There is also the "Distributed YouTube Archive" which is totally
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	510 # useless because there's no way to automate it...
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	511
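The progress hook above reads three keys from the dictionary it is handed: "status", "filename" and "_percent_str". A minimal sketch of the same logic, split so the formatting can be checked without running a download, might look like the following. The names format_progress and ytdl_hook_sketch are hypothetical and not part of channeldownloader.py, and the .get() defaults are an assumption for dictionaries that lack a field.

```python
import os


def format_progress(d):
    """Return the text to print for one progress-hook dictionary, or None."""
    # Hypothetical helper: keeping formatting separate from printing
    # makes the hook testable with a hand-built dictionary.
    name = os.path.basename(d.get("filename", "<unknown>"))
    status = d.get("status")
    if status == "downloading":
        # Trailing "\r" returns the cursor to the start of the line so the
        # next update overwrites this one.
        return " downloading %s: %s\r" % (name, d.get("_percent_str", "?"))
    if status == "error":
        return "\n an error occurred downloading %s!" % name
    # Any other status (for example "finished") prints nothing here.
    return None


def ytdl_hook_sketch(d):
    """Print the formatted progress text, if there is any."""
    text = format_progress(d)
    if text is not None:
        # Progress updates stay on one line; errors end with a newline.
        print(text, end="" if d.get("status") == "downloading" else "\n")
```

A function shaped like ytdl_hook_sketch could be passed in the "progress_hooks" list in place of the original hook.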
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	512 ##############################################################################
11 1ac85f6f40c4 channeldownloader: insane memory optimizations Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 10 diff changeset	513
1ac85f6f40c4 channeldownloader: insane memory optimizations Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 10 diff changeset	514
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	515 def main():
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	516 def load_split_files(path: str):
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	517 def cruft(isdir: bool, listdir, openf):
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	518 # build the path list
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	519 if not isdir:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	520 list_files = [path]
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	521 else:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	522 list_files = filter(lambda x: re.search(r"vids[0-9\-]+?\.json", x), listdir())
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	523
15 615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	524 # now open each as a json
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	525 for fi in list_files:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	526 print(fi)
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	527 with openf(fi, "r") as infile:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	528 if simdjson:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	529 # Using this is a lot faster in SIMDJSON, since instead
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	530 # of converting all of the JSON key/value pairs into
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	531 # native Python objects, they stay in an internal state.
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	532 #
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	533 # This means we only get the stuff we absolutely need,
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	534 # which is the uploader ID, and copy everything else
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	535 # if the ID is one we are looking for.
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	536 parser = json.Parser()
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	537 yield parser.parse(infile.read())
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	538 del parser
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	539 else:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	540 yield json.load(infile)
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	541
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	542
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	543 try:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	544 if not zipfile_works or os.path.isdir(path):
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	545 raise Exception
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	546
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	547 with zipfile.ZipFile(path, "r") as myzip:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	548 yield from cruft(True, lambda: myzip.namelist(), lambda f, m: io.TextIOWrapper(myzip.open(f, mode=m), encoding="utf-8"))
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	549 except Exception as e:
615e1ca0212a : add support for loading the split db from a zip file Paper <paper@tflc.us>* parents: 14 diff changeset	550 yield from cruft(os.path.isdir(path), lambda: os.listdir(path), lambda f, m: open(path + "/" + f, m, encoding="utf-8"))
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	551
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	552
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	553 def write_metadata(i: dict, basename: str) -> None:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	554 # ehhh
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	555 if not os.path.exists(basename + ".info.json"):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	556 with open(basename + ".info.json", "w", encoding="utf-8") as jsonfile:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	557 try:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	558 # orjson outputs bytes
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	559 jsonfile.write(json.dumps(i).decode("utf-8"))
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	560 except AttributeError:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	561 # everything else outputs a string
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	562 jsonfile.write(json.dumps(i))
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	563 print(" saved %s" % os.path.basename(jsonfile.name))
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	564 if not os.path.exists(basename + ".description"):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	565 with open(basename + ".description", "w",
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	566 encoding="utf-8") as descfile:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	567 descfile.write(i["description"])
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	568 print(" saved %s" % os.path.basename(descfile.name))
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	569
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	570 args = docopt.docopt(__doc__)
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	571
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	572 if not os.path.exists(args["--output"]):
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	573 os.mkdir(args["--output"])
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	574
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	575 channels = dict()
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	576
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	577 for url in args["<url>"]:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	578 chn = url.split("/")[-1]
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	579 channels[chn] = {"output": "%s/%s" % (args["--output"], chn)}
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	580
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	581 for channel in channels.values():
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	582 if not os.path.exists(channel["output"]):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	583 os.mkdir(channel["output"])
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	584
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	585 # find videos in the database.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	586 #
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	587 # despite how it may seem, this is actually really fast, and fairly
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	588 # memory efficient too (but really only if we're using simdjson...)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	589 videos = [
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	590 i if not simdjson else i.as_dict()
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	591 for f in load_split_files(args["--database"])
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	592 for i in (f if not "videos" in f else f["videos"]) # logic is reversed kinda, python is weird
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	593 if "uploader_id" in i and i["uploader_id"] in channels
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	594 ]
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	595
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	596 while True:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	597 if len(videos) == 0:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	598 break
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	599
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	600 videos_copy = list(videos)  # real copy: videos is mutated inside the loop
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	601
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	602 for i in videos_copy:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	603 channel = channels[i["uploader_id"]]
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	604
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	605 # precalculated for speed
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	606 output = channel["output"]
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	607
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	608 print("%s:" % i["id"])
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	609 basename = "%s/%s-%s" % (output, sanitize_filename(i["title"],
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	610 restricted=True), i["id"])
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	611 def filenotworthit(f) -> bool:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	612 try:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	613 return bool(os.path.getsize(f))
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	614 except OSError:
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	615 return False
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	616
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	617 pathoutput = Path(output)
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	618
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	619 # This is terrible
088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	620 files = list(filter(filenotworthit, [y
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	621 for p in ["mkv", "mp4", "webm"]
16 088d9a3a2524 improvements to IA downloader Paper <paper@tflc.us> parents: 15 diff changeset	622 for y in pathoutput.glob(("*-%s." + p) % i["id"])]))
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	623 if files:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	624 print(" video already downloaded!")
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	625 videos.remove(i)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	626 write_metadata(i, basename)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	627 continue
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	628
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	629 # high level "download" function.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	630 def dl(video: dict, basename: str, output: str):
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	631 dls = []
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	632
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	633 if ytdlp_works:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	634 dls.append({
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	635 "func": ytdlp_dl,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	636 "name": "using yt-dlp",
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	637 })
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	638
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	639 dls.append({
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	640 "func": ia_dl,
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	641 "name": "from the Internet Archive",
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	642 })
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	643
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	644 dls.append({
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	645 "func": desirintoplaisir_dl,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	646 "name": "from LMIJLM/DJ Plaisir's archive",
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	647 })
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	648 dls.append({
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	649 "func": ghostarchive_dl,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	650 "name": "from GhostArchive"
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	651 })
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	652 dls.append({
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	653 "func": wayback_dl,
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	654 "name": "from the Wayback Machine"
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	655 })
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	656
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	657 for dl in dls:
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	658 print(" attempting to download %s" % dl["name"])
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	659 r = dl["func"](video, basename, output)
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	660 if r == DownloaderStatus.SUCCESS:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	661 # all good, video's downloaded
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	662 return DownloaderStatus.SUCCESS
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	663 elif r == DownloaderStatus.UNAVAILABLE:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	664 # video is unavailable here
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	665 print(" oops, video is not available there...")
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	666 continue
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	667 elif r == DownloaderStatus.ERROR:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	668 # error while downloading; likely temporary.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	669 # TODO we should save which downloader the video
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	670 # was on, so we can continue back at it later.
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	671 return DownloaderStatus.ERROR
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	672
05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	673 return DownloaderStatus.UNAVAILABLE
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	674
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	675 r = dl(i, basename, output)
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	676 if r == DownloaderStatus.ERROR:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	677 continue
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	678
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	679 # video is downloaded, or it's totally unavailable, so
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	680 # remove it from being checked again.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	681 videos.remove(i)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	682 # ... and then dump the metadata, if there isn't any on disk.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	683 write_metadata(i, basename)
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	684
18 05e71dd6b6ca no more ia python library Paper <paper@tflc.us> parents: 17 diff changeset	685 if r == DownloaderStatus.SUCCESS:
14 03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	686 # video is downloaded
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	687 continue
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	688
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	689 # video is unavailable; write out the metadata.
03c8fd4069fb : big refactor, switch to GPLv2, and add README Paper <paper@tflc.us>* parents: 13 diff changeset	690 print(" video is unavailable everywhere; dumping out metadata only")
6 5d93490e60e2 [channeldownloader.py] Implement HTTPError to circumvent Python 2 weirdness Paper <37962225+mrpapersonic@users.noreply.github.com> parents: 5 diff changeset	691
10 8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	692
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	693 if __name__ == "__main__":
8969930a9fa4 : major cleanup Paper <37962225+mrpapersonic@users.noreply.github.com>* parents: 9 diff changeset	694 main()
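The TODO inside dl() notes that the script should remember which downloader a video was on when a temporary error occurs, so a later pass can continue from there. A minimal sketch of that idea follows, assuming a three-value DownloaderStatus enum like the one the script refers to (its real definition is outside this part of the file) and downloaders reduced to one-argument callables. The names try_sources and resume_from are hypothetical.

```python
from enum import Enum


class DownloaderStatus(Enum):
    # Stand-in for the script's own enum; the values here are assumptions.
    SUCCESS = 0
    UNAVAILABLE = 1
    ERROR = 2


def try_sources(video, sources):
    """Try each (name, func) pair in order.

    Returns (status, index). index is the position of the source that
    succeeded or errored, or len(sources) if every source reported the
    video as unavailable.
    """
    for index, (name, func) in enumerate(sources):
        status = func(video)
        if status == DownloaderStatus.SUCCESS:
            return status, index
        if status == DownloaderStatus.ERROR:
            # Likely temporary: report where we stopped so the caller
            # can retry from this source instead of starting over.
            return status, index
        # UNAVAILABLE: fall through to the next source.
    return DownloaderStatus.UNAVAILABLE, len(sources)


def resume_from(video, sources, start):
    """Continue the chain at position start, keeping indices absolute."""
    status, index = try_sources(video, sources[start:])
    return status, start + index
```

A caller could store the returned index next to the video (for example in a dictionary keyed by video ID) and pass it to resume_from on the next pass of the retry loop.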
