
[Telegraaf] Unable to download JSON metadata: HTTP Error 403 #31710

Open
jgrosmann opened this issue Feb 27, 2023 · 16 comments
Labels: broken-IE (problem with existing site extraction), fixed

Comments

@jgrosmann

youtube-dl https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] 644858720: Downloading JSON metadata
ERROR: Unable to download JSON metadata: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

@dirkf
Contributor

dirkf commented Feb 27, 2023

Fixed in git master:

commit d35557a75d943865e40410d51bfcc18276e98532
Author: coletdjnz <[email protected]>
Date:   Fri Sep 23 12:10:35 2022 +1200

    [Telegraaf] Use mobile GraphQL API endpoint
    
    Workaround for Cloudflare 403
    Fixes https://github.com/yt-dlp/yt-dlp/issues/5000
    Authored by: coletdjnz

Also: #30839

@nicolaasjan

Be sure to call youtube-dl with the --verbose flag and include its complete output.

Worked here, although there were a lot of lines like this one:

[mp4 @ 0x55a127d008c0] Invalid DTS: 8305200 PTS: 8301600 in output stream 0:0, replacing by guess

Output:
https://pastebin.com/eXQtdMZm

(youtube-dl latest version from source code)

dirkf changed the title from "Unable to download JSON metadata: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>)" to "[Telegraaf] Unable to download JSON metadata: HTTP Error 403" on Feb 27, 2023
@nicolaasjan

The video is out of sync with the audio.
The same happens with yt-dlp:

yt-dlp -v --ignore-config https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[debug] Command-line config: ['-v', '--ignore-config', 'https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.02.26 [8e9fe43cd] (zip)
[debug] Python 3.8.10 (CPython x86_64 64bit) - Linux-5.4.0-139-generic-x86_64-with-glibc2.29 (OpenSSL 1.1.1f  31 Mar 2020, glibc 2.31)
[debug] exe versions: ffmpeg N-109874-gaeceefa622-Nico-20230218 (fdk,setts), ffprobe N-109874-gaeceefa622-Nico-20230218, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.17, brotli-1.0.9, certifi-2022.12.07, mutagen-1.46.0, secretstorage-3.3.3, sqlite3-2.6.0, websockets-10.4, xattr-0.9.6
[debug] Proxy map: {}
[debug] Loaded 1782 extractors
[Telegraaf] Extracting URL: https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[debug] Default format spec: bestvideo*+bestaudio/best
[info] RylE3djQ5q02: Downloading 1 format(s): hls-3943+dash-audio=127999
[debug] Invoking hlsnative downloader on "https://media.tmgvideo.nl/hls/account=Kx1PKc/item=RylE3djQ5q02/version=202302261308_5/v2.0-RylE3djQ5q02-hls-202302261308_5-video=3597000.m3u8?v=20230226130802_4"
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fhls-3943.mp4
[download]   7.3% of ~  71.84MiB at    4.36MiB/s ETA 00:08 (frag 1/19)[download] Got error: Downloaded 1048576 bytes, expected 3504696 bytes. Retrying (1/10)...
[download]   8.6% of ~  71.84MiB at    4.61MiB/s ETA 00:09 (frag 1/19)[download] Got error: Downloaded 2064384 bytes, expected 3504696 bytes. Retrying (2/10)...
[download]  38.8% of ~  68.39MiB at    4.86MiB/s ETA 00:04 (frag 7/19)[download] Got error: Downloaded 1146880 bytes, expected 3507704 bytes. Retrying (1/10)...
[download]  57.2% of ~  67.52MiB at    6.42MiB/s ETA 00:03 (frag 10/19)[download] Got error: Downloaded 3145728 bytes, expected 3652088 bytes. Retrying (1/10)...
[download]  63.5% of ~  68.26MiB at    5.63MiB/s ETA 00:03 (frag 12/19)[download] Got error: Downloaded 983040 bytes, expected 4470264 bytes. Retrying (1/10)...
[download]  65.1% of ~  68.26MiB at    5.65MiB/s ETA 00:02 (frag 12/19)[download] Got error: Downloaded 2064384 bytes, expected 4470264 bytes. Retrying (2/10)...
[download]  75.4% of ~  67.73MiB at    5.09MiB/s ETA 00:02 (frag 14/19)[download] Got error: Downloaded 1015808 bytes, expected 3531768 bytes. Retrying (1/10)...
[download]  76.9% of ~  67.73MiB at    3.18MiB/s ETA 00:02 (frag 14/19)[download] Got error: Downloaded 2064384 bytes, expected 3531768 bytes. Retrying (2/10)...
[download]  91.0% of ~  67.06MiB at    5.49MiB/s ETA 00:00 (frag 17/19)[download] Got error: Downloaded 1081344 bytes, expected 3718264 bytes. Retrying (1/10)...
[download] 100% of   63.85MiB in 00:00:09 at 6.44MiB/s
[debug] Invoking dashsegments downloader on "https://media.tmgvideo.nl/dash/account=Kx1PKc/item=RylE3djQ5q02/version=202302261308_5/RylE3djQ5q02.mpd?v=20230226130802_4"
[dashsegments] Total fragments: 74
[download] Destination: Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fdash-audio=127999.m4a
[download] 100% of    2.25MiB in 00:00:04 at 490.34KiB/s
[Merger] Merging formats into "Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fhls-3943.mp4' -i 'file:Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fdash-audio=127999.m4a' -c copy -map 0:v:0 -map 1:a:0 -movflags +faststart 'file:Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].temp.mp4'
Deleting original file Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fhls-3943.mp4 (pass -k to keep)
Deleting original file Nanninga (JA21) over coalitie ruzie: ‘Zit met popcorn te kijken, ga zo door!’ [RylE3djQ5q02].fdash-audio=127999.m4a (pass -k to keep)

@dirkf, is there a remedy for such an issue?

@Vangelis66

Vangelis66 commented Feb 27, 2023

When I run -F on the link in the OP, I get:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--ffmpeg-location', '.\\FFmpeg', '--external-downloader-args', '-v 8 -stats', '-vF', 'https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door']
[debug] Encodings: locale cp1253, fs mbcs, out cp737, pref cp1253
[debug] youtube-dl version 2023.02.27.114514
[debug] Python version 3.4.4 (CPython) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: ffmpeg n5.2-dev-2245-N-109649-gab8cde6, ffprobe n5.2-dev-2245-N-109649-gab8cde6, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[info] Available formats for RylE3djQ5q02:
format code               extension  resolution note
hls-audio-aacl-127-audio  mp4        audio only
hls-audio-aacl-64-audio   mp4        audio only
dash-audio=64045          m4a        audio only DASH audio   64k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-audio=127999         m4a        audio only DASH audio  127k , m4a_dash container, mp4a.40.2 (48000Hz)
dash-video=318000         mp4        480x270    DASH video  318k , mp4_dash container, avc1.640015, video only
hls-402                   mp4        480x270     402k , avc1.640015, video only
dash-video=1182000        mp4        854x480    DASH video 1182k , mp4_dash container, avc1.64001E, video only
hls-1383                  mp4        854x480    1383k , avc1.64001E, video only
dash-video=2181000        mp4        1280x720   DASH video 2181k , mp4_dash container, avc1.64001F, video only
hls-2442                  mp4        1280x720   2442k , avc1.64001F, video only
dash-video=3597000        mp4        1920x1080  DASH video 3597k , mp4_dash container, avc1.640028, video only
hls-3943                  mp4        1920x1080  3943k , avc1.640028, video only
http-270p                 mp4        480x270
http-480p                 mp4        854x480
http-720p                 mp4        1280x720
http-1080p                mp4        1920x1080  (best)

So, I would've expected that on an actual download attempt, format best = http-1080p would be fetched ... But,

[debug] Default format spec: bestvideo+bestaudio/best

is set as the default. Was it always like that, with "fragmented" formats preferred over "standalone container" ones?

Regarding @nicolaasjan's many lines like this one:

[mp4 @ 0x55a127d008c0] Invalid DTS: 8305200 PTS: 8301600 in output stream 0:0, replacing by guess

You can suppress such FFmpeg output about DTS/PTS inconsistencies via
--external-downloader-args "-v 8 -stats", or use the --hls-prefer-native flag...

FWIW, DTS = decoding timestamp and PTS = presentation timestamp;
more info here 😄 ...
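To make the DTS/PTS warning concrete: a packet has to be decoded no later than it is presented, so an output packet whose DTS is larger than its PTS is invalid, and the ffmpeg command-line tool then substitutes a guessed value. The Python sketch below checks the pair from the warning quoted above and converts the gap into seconds. Two assumptions, flagged in the code as well: a 90 kHz time base (usual for MPEG-TS/HLS, but the log does not show the real one), and the median-of-three replacement rule, which reflects a reading of ffmpeg's source rather than documented behaviour.

```python
from fractions import Fraction

# Assumption: a 90 kHz time base (common for MPEG-TS/HLS streams); the actual
# time base of the output stream in the warning is not shown in the log.
TIME_BASE = Fraction(1, 90000)


def dts_is_valid(dts: int, pts: int) -> bool:
    """A packet cannot be shown before it is decoded, so DTS must not exceed PTS."""
    return dts <= pts


def guess_timestamp(dts: int, pts: int, last_mux_dts: int) -> int:
    """Median of (pts, dts, last_mux_dts + 1).

    Assumption: this mirrors the replacement the ffmpeg CLI appears to apply
    when DTS > PTS; it is a reading of the source, not a documented rule.
    """
    return sorted((pts, dts, last_mux_dts + 1))[1]


def ticks_to_seconds(ticks: int) -> float:
    """Convert a timestamp difference in time-base ticks to seconds."""
    return float(ticks * TIME_BASE)


# Values from the ffmpeg warning quoted above.
dts, pts = 8305200, 8301600
print(dts_is_valid(dts, pts))       # False: DTS comes after PTS
print(ticks_to_seconds(dts - pts))  # 0.04, one frame at 25 fps (if 90 kHz)
```

Under these assumptions the offending packet is only one frame off, which is why ffmpeg can patch it with a guess and carry on.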

@dirkf
Contributor

dirkf commented Feb 27, 2023

Try -f '[format_id^=http]' ?
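In format filters, ^= is the "starts with" string operator, so -f '[format_id^=http]' narrows the candidates to the progressive http-* formats and takes the best of what remains. A toy Python model of that, using format IDs and heights from the -F listing above; ranking by height alone is a simplification of youtube-dl's real ordering, and the helper names are made up for illustration.

```python
# Toy model of -f '[format_id^=http]'. Heights come from the -F table above;
# "best" is simplified to "largest height", whereas youtube-dl's real
# ordering also weighs bitrate, codecs, protocol and more.
FORMATS = [
    {"format_id": "http-270p", "height": 270},
    {"format_id": "hls-3943", "height": 1080},
    {"format_id": "dash-video=3597000", "height": 1080},
    {"format_id": "http-720p", "height": 720},
    {"format_id": "http-1080p", "height": 1080},
]


def filter_prefix(formats, field, prefix):
    """Emulate the string filter [field^=prefix]: keep values starting with prefix."""
    return [f for f in formats if str(f.get(field, "")).startswith(prefix)]


def best(formats):
    """Highest-height format under the simplified ranking, or None if empty."""
    return max(formats, key=lambda f: f.get("height") or 0) if formats else None


http_only = filter_prefix(FORMATS, "format_id", "http")
print([f["format_id"] for f in http_only])  # ['http-270p', 'http-720p', 'http-1080p']
print(best(http_only)["format_id"])         # http-1080p
```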

@dirkf
Contributor

dirkf commented Feb 27, 2023

... Was it always like that ...

The Fine Manual says:

Since the end of April 2015 and version 2015.04.26, youtube-dl uses -f bestvideo+bestaudio/best as the default format selection (see #5447, #5456).
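This is why http-1080p is not picked by default: in bestvideo+bestaudio/best, the / means "try the left alternative first, fall back to the right", and bestvideo+bestaudio succeeds whenever there is at least one video-only and one audio-only format, which is the case here. A toy Python model of that fallback. Treating vcodec/acodec == "none" as the audio-only/video-only marker follows youtube-dl's format dict convention; ranking by a single tbr number is a simplification, and the small format subset is taken from the listings above.

```python
# Toy model of the default spec "bestvideo+bestaudio/best".
# vcodec/acodec == "none" marks a stream as audio-only/video-only; a missing
# codec is treated as "possibly present". Ranking by "tbr" is a simplification.
FORMATS = [
    {"format_id": "dash-audio=127999", "vcodec": "none", "acodec": "mp4a.40.2", "tbr": 128},
    {"format_id": "hls-3943", "vcodec": "avc1.640028", "acodec": "none", "tbr": 3943},
    {"format_id": "http-1080p"},  # codecs and bitrate are not given in the listing
]


def pick(formats, wanted):
    """Pick one format for a single selector name."""
    if wanted == "bestvideo":
        cands = [f for f in formats if f.get("acodec") == "none"]
    elif wanted == "bestaudio":
        cands = [f for f in formats if f.get("vcodec") == "none"]
    else:  # "best": a single file carrying both video and audio
        cands = [f for f in formats if f.get("vcodec") != "none" and f.get("acodec") != "none"]
    return max(cands, key=lambda f: f.get("tbr") or 0) if cands else None


def select(formats, spec="bestvideo+bestaudio/best"):
    """Try each '/'-separated alternative in order; '+' means merge two picks."""
    for alternative in spec.split("/"):
        picks = [pick(formats, part) for part in alternative.split("+")]
        if all(picks):
            return "+".join(p["format_id"] for p in picks)
    return None


print(select(FORMATS))          # hls-3943+dash-audio=127999
print(select(FORMATS, "best"))  # http-1080p
```

Only when no audio-only (or no video-only) format exists does the model fall through to the combined http format.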

@Vangelis66

@nicolaasjan

[info] RylE3djQ5q02: Downloading 1 format(s): hls-3943+dash-audio=127999

That line is from yt-dlp, which fetches (and later merges) video over HLS and audio over DASH...

In youtube-dl, my command below:

yt-dl --console-title --hls-prefer-native --hls-use-mpegts -c --no-part -f hls-2442+hls-audio-aacl-127-audio "https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door" -o test.mp4 => 

[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: test.fhls-2442.mp4
[download] 100% of 38.89MiB in 00:59
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 19
[download] Destination: test.fhls-audio-aacl-127-audio.mp4
[download] 100% of 2.45MiB in 00:13
[ffmpeg] Merging formats into "test.mp4"
Deleting original file test.fhls-2442.mp4 (pass -k to keep)
Deleting original file test.fhls-audio-aacl-127-audio.mp4 (pass -k to keep)

produced a media file with perfect A/V sync...
In my own fetches, I never mix transfer protocols for separate video and audio streams (i.e. I specifically request hls-V+hls-A or dash-V+dash-A). As advised, the http(s) formats, whenever available, are a good (and faster) way to avoid downloading two raw streams one after the other and then merging them...

@dirkf: Thanks for bringing me "up to date" 😉; memory lapse/brain fog on my part?
My ancient ancestors described it better:

ού γάρ έρχεται μόνον... ("for it does not come alone...", from the proverb about old age)

@nicolaasjan

-f dash-video=3597000+dash-audio=127999 also gives a good result.

If this is an issue with all videos from this site, should the extractor for "De Telegraaf" be rewritten?

@dirkf
Contributor

dirkf commented Feb 27, 2023

So, for Telegraaf, -f '(bestvideo+bestaudio)[format_id^=hls]/(bestvideo+bestaudio)[format_id^=dash]/best' ?

Or -f best ?

Or the format selection should automatically attempt to find a matching audio format?
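The first suggestion relies on a filter placed after a parenthesised group: the filter narrows the formats the group chooses from (compare the README's (mp4,webm)[height<480] example), so each alternative can only pair video and audio of a single protocol. A toy Python model of that spec; the format subset comes from the -F listing, the bitrates of the two HLS audio formats are assumed from their IDs, ranking by one bitrate number is a simplification, and the helper names are made up for illustration.

```python
# Toy model of:
#   (bestvideo+bestaudio)[format_id^=hls]/(bestvideo+bestaudio)[format_id^=dash]/best
# The filter narrows the candidates *before* bestvideo/bestaudio are picked,
# so both halves of a merge come from the same protocol.
FORMATS = [
    # Assumption: HLS audio bitrates taken from the "127"/"64" in their IDs.
    {"format_id": "hls-audio-aacl-127-audio", "vcodec": "none", "tbr": 127},
    {"format_id": "hls-audio-aacl-64-audio", "vcodec": "none", "tbr": 64},
    {"format_id": "dash-audio=127999", "vcodec": "none", "tbr": 128},
    {"format_id": "hls-3943", "acodec": "none", "tbr": 3943},
    {"format_id": "dash-video=3597000", "acodec": "none", "tbr": 3597},
    {"format_id": "http-1080p"},  # codecs and bitrate not listed
]


def best_of(cands):
    return max(cands, key=lambda f: f.get("tbr") or 0) if cands else None


def merge_within(formats, prefix):
    """(bestvideo+bestaudio)[format_id^=prefix]: filter first, then pick."""
    pool = [f for f in formats if f["format_id"].startswith(prefix)]
    video = best_of([f for f in pool if f.get("acodec") == "none"])
    audio = best_of([f for f in pool if f.get("vcodec") == "none"])
    return f'{video["format_id"]}+{audio["format_id"]}' if video and audio else None


def select(formats):
    for prefix in ("hls", "dash"):
        chosen = merge_within(formats, prefix)
        if chosen:
            return chosen
    # Final fallback: "best", a single file with both video and audio.
    combined = [f for f in formats if f.get("vcodec") != "none" and f.get("acodec") != "none"]
    best = best_of(combined)
    return best["format_id"] if best else None


print(select(FORMATS))  # hls-3943+hls-audio-aacl-127-audio
```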

Vangelis66 referenced this issue in dirkf/youtube-dl Mar 2, 2023
@dirkf
Contributor

dirkf commented Mar 7, 2023

@pukkandan, is this a known problem?

yt-dl currently needs manual intervention to select the same transport when bestvideo+bestaudio is picked. In this case it would seem better to select the combined format that appears to have the same resolution as the separate formats.

With yt-dlp's bv*+ba, does any audio in the bv* stream get selected ahead of a separate ba, or if it's not "worse"? Apparently not in the logged case.

@pukkandan
Contributor

With yt-dlp's bv*+ba, does any audio in the bv* stream get selected ahead of a separate ba, or if it's not "worse"?

Yes. However,

So, I would've expected that on an actual download attempt, format best = http-1080p would be fetched ... But,

Since the hls/dash formats have more metadata available, yt-dlp is treating them as being better.

❯ yt-dlp -F https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] Extracting URL: https://www.telegraaf.nl/video/644858720/nanninga-ja21-over-coalitieruzie-zit-met-popcorn-te-kijken-ga-zo-door
[Telegraaf] 644858720: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading JSON metadata
[Telegraaf] RylE3djQ5q02: Downloading MPD manifest
[Telegraaf] RylE3djQ5q02: Downloading m3u8 information
[info] Available formats for RylE3djQ5q02:
ID                       EXT RESOLUTION │  FILESIZE   TBR PROTO │ VCODEC        VBR ACODEC      ABR ASR MORE INFO
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
hls-audio-aacl-127-audio mp4 audio only │                 m3u8  │ audio only        unknown             audio
hls-audio-aacl-64-audio  mp4 audio only │                 m3u8  │ audio only        unknown             audio
dash-audio=64045         m4a audio only │ ~ 1.13MiB   64k dash  │ audio only        mp4a.40.2   64k 48k DASH audio, m4a_dash
dash-audio=127999        m4a audio only │ ~ 2.27MiB  128k dash  │ audio only        mp4a.40.2  128k 48k DASH audio, m4a_dash
http-270p                mp4 480x270    │                 https │ unknown           unknown
dash-video=318000        mp4 480x270    │ ~ 5.63MiB  318k dash  │ avc1.640015  318k video only          DASH video, mp4_dash
hls-402                  mp4 480x270    │ ~ 7.12MiB  402k m3u8  │ avc1.640015  402k video only
http-480p                mp4 854x480    │                 https │ unknown           unknown
dash-video=1182000       mp4 854x480    │ ~20.92MiB 1182k dash  │ avc1.64001E 1182k video only          DASH video, mp4_dash
hls-1383                 mp4 854x480    │ ~24.48MiB 1383k m3u8  │ avc1.64001E 1383k video only
http-720p                mp4 1280x720   │                 https │ unknown           unknown
dash-video=2181000       mp4 1280x720   │ ~38.60MiB 2181k dash  │ avc1.64001F 2181k video only          DASH video, mp4_dash
hls-2442                 mp4 1280x720   │ ~43.22MiB 2442k m3u8  │ avc1.64001F 2442k video only
http-1080p               mp4 1920x1080  │                 https │ unknown           unknown
dash-video=3597000       mp4 1920x1080  │ ~63.67MiB 3597k dash  │ avc1.640028 3597k video only          DASH video, mp4_dash
hls-3943                 mp4 1920x1080  │ ~69.79MiB 3943k m3u8  │ avc1.640028 3943k video only

This is a trivial fix in the extractor, but before I commit anything, I need confirmation that the https formats are always expected to be of the same or better quality than the others at the same resolution (for this site). We can also make it treat hls > dash for audio, which would also prevent the desync. Unlike prioritizing https, this is something youtube-dl can easily do as well.
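The two proposals can be pictured as a sort key: at equal height, https outranks the fragmented protocols, and among audio-only formats HLS outranks DASH, so an HLS video is more likely to end up paired with HLS audio. This is a hypothetical sketch, not the extractor change itself: VIDEO_PROTO_RANK, AUDIO_PROTO_RANK and sort_key are invented for illustration, and the protocol labels are the short ones shown in the PROTO column of yt-dlp's -F output above (the internal protocol names differ).

```python
# Hypothetical ranking tables; higher key tuples sort as "better".
VIDEO_PROTO_RANK = {"https": 1}            # https beats m3u8/dash at the same height
AUDIO_PROTO_RANK = {"m3u8": 1, "dash": 0}  # prefer HLS audio to avoid HLS+DASH mixes


def sort_key(f):
    if f.get("vcodec") == "none":  # audio-only
        return (AUDIO_PROTO_RANK.get(f["protocol"], 0), f.get("abr") or 0)
    return (f.get("height") or 0, VIDEO_PROTO_RANK.get(f["protocol"], 0), f.get("tbr") or 0)


FORMATS = [
    {"format_id": "hls-3943", "protocol": "m3u8", "height": 1080, "tbr": 3943, "acodec": "none"},
    {"format_id": "dash-video=3597000", "protocol": "dash", "height": 1080, "tbr": 3597, "acodec": "none"},
    {"format_id": "http-1080p", "protocol": "https", "height": 1080},
    {"format_id": "dash-audio=127999", "protocol": "dash", "vcodec": "none", "abr": 128},
    {"format_id": "hls-audio-aacl-127-audio", "protocol": "m3u8", "vcodec": "none"},
]

videos = sorted((f for f in FORMATS if f.get("vcodec") != "none"), key=sort_key)
audios = sorted((f for f in FORMATS if f.get("vcodec") == "none"), key=sort_key)
print(videos[-1]["format_id"])  # http-1080p
print(audios[-1]["format_id"])  # hls-audio-aacl-127-audio
```

Putting the protocol rank ahead of bitrate in the audio key is what lets the HLS audio win even though its bitrate is unknown.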

Or the format selection should automatically attempt to find a matching audio format?

This would be quite difficult. I can't even think of how we could approach an implementation within the current format selection framework.

dirkf added the fixed and broken-IE (problem with existing site extraction) labels on Mar 7, 2023
5 participants