Handle age verification links on youtube #318

Closed
baryluk opened this issue Mar 23, 2012 · 6 comments
Comments

@baryluk
Contributor

baryluk commented Mar 23, 2012

How about supporting links of this form:

http://www.youtube.com/verify_age?next_url=/watch%3Ffeature%3Dplayer_embedded%26v%3DVXLz-uMLtmA

It is quite easy to extract the original link. However, it may be inconvenient to copy the real link from the browser, for example because we clicked through from Flash and have no way to copy anything to the clipboard, or because it was linked this way originally.

It is similar to handling youtu.be links.
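
As a rough illustration of how the original link can be recovered from such a URL, here is a minimal Python 3 sketch that uses only the standard library. The function name `resolve_age_gate` is hypothetical and is not part of youtube-dl; the sketch assumes that `next_url` is a host-relative path, as in the example above.

```python
from urllib.parse import urlsplit, parse_qs

def resolve_age_gate(url):
    """Return the URL wrapped by a verify_age link, or the input unchanged."""
    parts = urlsplit(url)
    if parts.path != "/verify_age":
        return url
    # parse_qs percent-decodes the value, e.g. "/watch?feature=...&v=..."
    next_url = parse_qs(parts.query).get("next_url", [None])[0]
    if not next_url:
        return url
    # Keep the original scheme and host; strip leading slashes from next_url
    # so the join never produces "//".
    return f"{parts.scheme}://{parts.netloc}/{next_url.lstrip('/')}"

wrapped = ("http://www.youtube.com/verify_age"
           "?next_url=/watch%3Ffeature%3Dplayer_embedded%26v%3DVXLz-uMLtmA")
print(resolve_age_gate(wrapped))
```

Links that are not age verification links pass through unchanged, so the helper could be applied to every input URL before the usual matching.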

@baryluk
Contributor Author

baryluk commented Mar 23, 2012

If you do not have time, just say so; I can probably work on it next week. Do you think it would be useful for others?

@baryluk
Contributor Author

baryluk commented Mar 23, 2012

OK, first try. It seems to work nicely.

diff --git a/youtube-dl b/youtube-dl
index 5224611..14adb81 100755
--- a/youtube-dl
+++ b/youtube-dl
@@ -1172,6 +1172,7 @@ class YoutubeIE(InfoExtractor):
        """Information extractor for youtube.com."""

        _VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z
+       _VALID_URL_WITH_AGE = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com))/verify_age\?next_url=([^&]+)(.+)?$'
        _LANG_URL = r'http://www.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
        _LOGIN_URL = 'https://www.youtube.com/signup?next=/&gl=US&hl=en'
        _AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
@@ -1335,6 +1336,14 @@ class YoutubeIE(InfoExtractor):
                        return

        def _real_extract(self, url):
+               # Extract original video URL from URL with age verification, using next_url parameter
+               mobj = re.match(self._VALID_URL_WITH_AGE, url)
+               if mobj:
+                       urldecode = lambda x: re.sub('%([0-9a-fA-F]{2})', lambda m: chr(int(m.group(1), 16)), x)
+                       # Keep original domain. We can probably change to www.youtube.com, but it should not hurt so keep it.
+                       # Remember that mobj.group(2), a next_url, contains a leading slash (/) in it.
+                       url = mobj.group(1) + urldecode(mobj.group(2))
+
                # Extract video id from URL
                mobj = re.match(self._VALID_URL, url)
                if mobj is None:

@baryluk
Contributor Author

baryluk commented Mar 23, 2012

OK, this may be better. I moved the common part of _VALID_URL and _VALID_URL_WITH_AGE into a _PREFIX variable. This means I need to take care of a double slash in the request (one from the domain name part, that is, from _PREFIX, and one from the next_url value itself). I just run a small regex that takes care of it.

diff --git a/youtube-dl b/youtube-dl
index 5224611..d8b33e5 100755
--- a/youtube-dl
+++ b/youtube-dl
@@ -1171,7 +1171,9 @@ class InfoExtractor(object):
 class YoutubeIE(InfoExtractor):
        """Information extractor for youtube.com."""

-       _VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z_-]+)(?(1).+)?$'
+       _PREFIX = r'(?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)'
+       _VALID_URL = r'^('+_PREFIX+r'(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z_-]+)(?(1).+)?$'
+       _VALID_URL_WITH_AGE = r'^('+_PREFIX+')verify_age\?next_url=([^&]+)(?:.+)?$'
        _LANG_URL = r'http://www.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
        _LOGIN_URL = 'https://www.youtube.com/signup?next=/&gl=US&hl=en'
        _AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
@@ -1335,6 +1337,14 @@ class YoutubeIE(InfoExtractor):
                        return

        def _real_extract(self, url):
+               # Extract original video URL from URL with age verification, using next_url parameter
+               mobj = re.match(self._VALID_URL_WITH_AGE, url)
+               if mobj:
+                       urldecode = lambda x: re.sub(r'%([0-9a-fA-F][0-9a-fA-F])', lambda m: chr(int(m.group(1), 16)), x)
+                       # Keep original domain. We can probably change to www.youtube.com, but it should not hurt so keep it.
+                       # We just make sure we do not have double //, in URL, so we strip starting slash in next_url.
+                       url = mobj.group(1) + re.sub(r'^/', '', urldecode(mobj.group(2)))
+
                # Extract video id from URL
                mobj = re.match(self._VALID_URL, url)
                if mobj is None:

Edit: a long line was truncated. Sorry, git diff is to blame! Fixed.
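
For readers who want to try the idea outside the class, the following standalone Python sketch mirrors the second patch. The prefix and age verification patterns are copied from the diff; the module-level names and the `unwrap` function are hypothetical. The sketch assumes percent escapes use only the hexadecimal digits 0-9 and a-f, and that each escape decodes to a single ASCII character (multi-byte UTF-8 sequences are not handled).

```python
import re

# Shared scheme/host prefix, as in the patch; note that it ends with a slash.
PREFIX = r'(?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)'
VALID_URL_WITH_AGE = r'^(' + PREFIX + r')verify_age\?next_url=([^&]+)(?:.+)?$'

def urldecode(text):
    # Replace each %XX escape with the character it encodes.
    return re.sub(r'%([0-9a-fA-F]{2})',
                  lambda m: chr(int(m.group(1), 16)), text)

def unwrap(url):
    mobj = re.match(VALID_URL_WITH_AGE, url)
    if mobj is None:
        return url
    # group(1) already ends with "/", so drop the leading "/" of next_url
    # to avoid a double slash in the rebuilt URL.
    return mobj.group(1) + re.sub(r'^/', '', urldecode(mobj.group(2)))

print(unwrap('http://www.youtube.com/verify_age'
             '?next_url=/watch%3Ffeature%3Dplayer_embedded%26v%3DVXLz-uMLtmA'))
```

Because the domain is taken from the matched prefix, the rebuilt URL keeps whatever host the age verification link used.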

@baryluk
Contributor Author

baryluk commented Mar 23, 2012

Here is this change in my repo:

baryluk/youtube-dl@69d3b2d

It appears to be working.

@FiloSottile
Collaborator

I modified your code a bit to make it work whenever next_url is present.
That also makes it clearer, IMHO: it took me some time to understand why _VALID_URL_WITH_AGE was working, as the system only matches against _VALID_URL. In fact, the next_url handling is only used during extraction, so I gave the regex a different name.

Also swapped your pretty urldecode lambda (I like your functional programming style!) with urllib.unquote.
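
A minimal sketch of that approach, written for Python 3 where `urllib.unquote` lives at `urllib.parse.unquote`. The names `NEXT_URL_RE` and `follow_next_url`, the exact pattern, and the fixed `http://www.youtube.com/` host are assumptions made for illustration and may differ from the merged code.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: match next_url wherever it appears in the query string.
NEXT_URL_RE = r'[?&]next_url=([^&]+)'

def follow_next_url(url, host='http://www.youtube.com/'):
    mobj = re.search(NEXT_URL_RE, url)
    if mobj is None:
        return url
    # unquote handles percent-decoding; lstrip avoids a double slash
    # because the host already ends with "/".
    return host + unquote(mobj.group(1)).lstrip('/')

print(follow_next_url('http://www.youtube.com/verify_age?gl=US'
                      '&next_url=/watch%3Fv%3DVXLz-uMLtmA&hl=en'))
```

Searching for the parameter instead of anchoring on `verify_age` means the same code covers any page that carries a `next_url` parameter, at the cost of always rewriting to one fixed host.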

Please note that development happens in youtube_dl/, which is then compiled into youtube-dl with make compile.

@FiloSottile
Collaborator

It got merged and is working.

joedborg referenced this issue in joedborg/youtube-dl Nov 17, 2020
[pull] master from ytdl-org:master