Proposal: Identify core/essential metadata and add upload safeties for missing MD #279
Comments
@vxbinaca What YouTube metadata are we currently not uploading to IA that yt-dlp is able to extract?
If it can be extracted, it's in the JSON. I'm talking about minimum metadata for items. Harkening back to the recent What I'm saying is, we'd need creator, video URL, and the time to be present to make a valid item where creator isn't Not all metadata needs to trigger an item creation halt, but it would be helpful to identify what is core metadata and insert safeties to prevent uploading if it's missing.
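A minimal sketch of what such a safety could look like, assuming the three core elements named above (creator, video URL, time) map onto yt-dlp info-dict keys such as `uploader`, `webpage_url`, and `upload_date`. The grouping and the function name `missing_core_metadata` are hypothetical and not part of tubeup.

```python
from typing import Any, Mapping

# Hypothetical grouping of "core" metadata. Each group counts as present
# if any one of its yt-dlp info-dict keys holds a non-empty value.
CORE_FIELD_GROUPS = {
    "creator": ("uploader", "channel", "creator"),
    "video URL": ("webpage_url", "original_url"),
    "time": ("timestamp", "upload_date", "release_timestamp"),
}


def missing_core_metadata(info: Mapping[str, Any]) -> list[str]:
    """Return the names of core groups that have no usable value in `info`."""
    missing = []
    for group, keys in CORE_FIELD_GROUPS.items():
        if not any(info.get(key) not in (None, "") for key in keys):
            missing.append(group)
    return missing
```

An uploader could call this right before item creation and halt (or only warn) when the returned list is non-empty; which groups should actually block an upload is the open question in this thread.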
After thinking about this for a bit, I'm leaning toward being conservative from a cultural preservation perspective. As long as enough metadata exists to upload an item (i.e., a unique identifier from the service the content is being retrieved from), tubeup should continue so the artifacts are preserved. If metadata can be derived in the future from other data sources (or programmatic analysis of the uploaded artifacts), so be it.
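The conservative policy described here could be sketched roughly as follows: block only when the service's unique identifier is unknown, and merely warn about everything else. This assumes the yt-dlp info-dict keys `id` and `extractor` together identify the source item; the function name `should_upload` and the logger name are made up for illustration.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("tubeup.sketch")  # hypothetical logger name

# Fields that are nice to have but, under this policy, never block an upload.
SOFT_FIELDS = ("uploader", "upload_date", "webpage_url")


def should_upload(info: Mapping[str, Any]) -> bool:
    """Allow the upload whenever the source service's unique ID is known."""
    if not info.get("id") or not info.get("extractor"):
        logger.error("No unique identifier from the source service; skipping")
        return False
    for key in SOFT_FIELDS:
        if not info.get(key):
            logger.warning("Missing %r; uploading anyway", key)
    return True
```

The warnings give the user something to fix by hand or report upstream, while the artifacts still get preserved.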
Given the increasing issues with live chat extraction (broken on YouTube live videos IIRC, and for sure on Twitch), should live chat be considered core metadata like manual subtitles (auto subs suck)? Are channel URLs core metadata? Twitch doesn't give them; it gives the video URL instead.
If we're considering live chat to be metadata that is in scope for tubeup to handle, then yes. Though I would argue that live chat should be in the same realm as comments, as in it's not really our problem to deal with, since realistically we should only be handling the video itself and its surrounding metadata.
Only on platforms where that URL cannot be changed at will by the user (see: YouTube and channel IDs). In other cases, where the user is able to change the URL/ID at will, it's not very useful at all. In fact, tubeup now uses the stupid channel handles YouTube added, which is fairly annoying in itself. p.s. sorry for being like, a year late x)
No, it's fine. What about extractors like Bilibili that didn't provide channel URLs (but I think do now)? Edit: With YouTube live chat I believe that's extracted into JSON, but Twitch's is broken with yt-dlp. Do all sites have live chat? No. Do all of them have creator metadata? Yes, mostly. A current example of this is the OnlyFans TV extractor, which is all kinds of messed up right now and not routing metadata properly. Take any OFTV video and try to rip it.
I'm not sure really. It's likely best though to not consider channel URLs as particularly important. We should warn if a URL could not be found though so the user can manually fix it and send a report to yt-dlp to fix/expand the extractors.
no offense but I wouldn't touch that website with a ten foot pole lmao
OFTV is all non-nude content. It's public facing and it's exclusive: lots of cooking classes or game stream clips.
@mrpapersonic so the issue is that we put a fail-safe in place years ago to prevent blank-creator items from being rejected by IA. The creator tag is blank because some extractors (LBRY/Odysee, Instagram posts containing multiple videos, many others) misdirect or don't have the creator tag. The fail-safe was putting "tubeup.py" into the creator tag. The alternative was item creation failure, with the video and metadata stuck on the user's disk. What I want to do is find the permutations of the creator tag and account for them, using a few other extractors as test cases to work out what's breaking, because I've been (rightly) telling people for years that the problem isn't us; it's the extractor not naming or accounting for metadata properly. Anyway, last bump on this for a while: I want to define the minimum metadata elements we need to upload an item, so I can go to the yt-dlp people and ask them to modify their standard for accepting an extractor, and then I or a few of us can go and find which extractors are broken and flag them to meet those minimum standards. Either that or we use
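One way to account for those creator permutations is an ordered fallback chain that only reaches the "tubeup.py" placeholder when every candidate key is empty. This is a sketch, not tubeup's current code: the key order is an assumption, and `resolve_creator` is a hypothetical helper. The keys themselves (`creator`, `uploader`, `channel`, `uploader_id`, `channel_id`) are standard yt-dlp info-dict fields.

```python
from typing import Any, Mapping, Optional, Tuple

FALLBACK_CREATOR = "tubeup.py"  # placeholder value described in this thread

# Assumed priority order; extractors populate different subsets of these.
CREATOR_KEYS = ("creator", "uploader", "channel", "uploader_id", "channel_id")


def resolve_creator(info: Mapping[str, Any]) -> Tuple[str, Optional[str]]:
    """Return (creator, source_key); source_key is None when falling back."""
    for key in CREATOR_KEYS:
        value = info.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip(), key
    return FALLBACK_CREATOR, None
```

Returning the key that supplied the value makes it easy to log which extractors never fill `uploader`, which is the kind of evidence that could be taken to the yt-dlp maintainers.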
See title.
The Twitch extractor currently does not add channel metadata; TikTok, though broken, did the same. This proposal is aimed at YouTube.