Handle age verification links on youtube #318
Comments
If you do not have time, just say so; I can probably work on it next week. Do you think it would be useful for others? |
OK, first try. It seems to work nicely.

diff --git a/youtube-dl b/youtube-dl
index 5224611..14adb81 100755
--- a/youtube-dl
+++ b/youtube-dl
@@ -1172,6 +1172,7 @@ class YoutubeIE(InfoExtractor):
"""Information extractor for youtube.com."""
_VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z
+ _VALID_URL_WITH_AGE = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com))/verify_age\?next_url=([^&]+)(.+)?$'
_LANG_URL = r'http://www.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
_LOGIN_URL = 'https://www.youtube.com/signup?next=/&gl=US&hl=en'
_AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
@@ -1335,6 +1336,14 @@ class YoutubeIE(InfoExtractor):
return
def _real_extract(self, url):
+ # Extract original video URL from URL with age verification, using next_url parameter
+ mobj = re.match(self._VALID_URL_WITH_AGE, url)
+ if mobj:
+ urldecode = lambda x: re.sub('%([0-9a-fA-F]{2})', lambda m: chr(int(m.group(1), 16)), x)
+ # Keep original domain. We can probably change to www.youtube.com, but it should not hurt so keep it.
+ # Remember that mobj.group(2), a next_url, contains a leading slash (/) in it.
+ url = mobj.group(1) + urldecode(mobj.group(2))
+
# Extract video id from URL
mobj = re.match(self._VALID_URL, url)
if mobj is None: |
OK, this may be better. I moved the part common to _VALID_URL and _VALID_URL_WITH_AGE into a _PREFIX variable. This means I need to take care of a double slash in the request (one from the domain-name part, that is, from _PREFIX, and one from the next_url value itself). I just run a small regex that takes care of it.

diff --git a/youtube-dl b/youtube-dl
index 5224611..d8b33e5 100755
--- a/youtube-dl
+++ b/youtube-dl
@@ -1171,7 +1171,9 @@ class InfoExtractor(object):
class YoutubeIE(InfoExtractor):
"""Information extractor for youtube.com."""
- _VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z_-]+)(?(1).+)?$'
+ _PREFIX = r'(?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)'
+ _VALID_URL = r'^('+_PREFIX+r'(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z_-]+)(?(1).+)?$'
+ _VALID_URL_WITH_AGE = r'^('+_PREFIX+r')verify_age\?next_url=([^&]+)(?:.+)?$'
_LANG_URL = r'http://www.youtube.com/?hl=en&persist_hl=1&gl=US&persist_gl=1&opt_out_ackd=1'
_LOGIN_URL = 'https://www.youtube.com/signup?next=/&gl=US&hl=en'
_AGE_URL = 'http://www.youtube.com/verify_age?next_url=/&gl=US&hl=en'
@@ -1335,6 +1337,14 @@ class YoutubeIE(InfoExtractor):
return
def _real_extract(self, url):
+ # Extract original video URL from URL with age verification, using next_url parameter
+ mobj = re.match(self._VALID_URL_WITH_AGE, url)
+ if mobj:
+ urldecode = lambda x: re.sub(r'%([0-9a-fA-F][0-9a-fA-F])', lambda m: chr(int(m.group(1), 16)), x)
+ # Keep original domain. We can probably change to www.youtube.com, but it should not hurt so keep it.
+ # Make sure the URL has no double slash (//): strip the leading slash from next_url.
+ url = mobj.group(1) + re.sub(r'^/', '', urldecode(mobj.group(2)))
+
# Extract video id from URL
mobj = re.match(self._VALID_URL, url)
if mobj is None:

Edit: a long line was truncated. Sorry, git diff is to blame! Fixed. |
This change is in my repo. It looks to be working. |
I modified your code a bit to make it work whenever next_url is present. Also swapped your pretty Please note that development is in |
Got merged and is working |
How about supporting links of this form:
http://www.youtube.com/verify_age?next_url=/watch%3Ffeature%3Dplayer_embedded%26v%3DVXLz-uMLtmA
It is quite easy to extract the original link, but copying the real link from the browser can be inconvenient, for example because we clicked through from Flash and have no way to copy things to the clipboard, or because it was linked this way originally.
It is similar to handling youtu.be links.
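
To illustrate the extraction described above, here is a minimal sketch in modern Python. It is not the code that was merged into youtube-dl; the function name `unwrap_age_verification` is hypothetical, and it uses the standard library's `urllib.parse` instead of a hand-written regex. It also does not check that `next_url` stays on the same host.

```python
from urllib.parse import parse_qs, urljoin, urlsplit


def unwrap_age_verification(url):
    """Return the video URL wrapped in a verify_age link, or url unchanged.

    Hypothetical helper for illustration only, not part of youtube-dl.
    """
    parts = urlsplit(url)
    if not parts.path.endswith('/verify_age'):
        return url
    # parse_qs percent-decodes values, so %3F and %26 become ? and &.
    next_url = parse_qs(parts.query).get('next_url', [None])[0]
    if not next_url:
        return url
    # urljoin keeps the original scheme and host and resolves the
    # leading slash of next_url, so no double slash is produced.
    return urljoin(url, next_url)


if __name__ == '__main__':
    wrapped = ('http://www.youtube.com/verify_age'
               '?next_url=/watch%3Ffeature%3Dplayer_embedded%26v%3DVXLz-uMLtmA')
    print(unwrap_age_verification(wrapped))
```

Letting `urljoin` combine the original URL with `next_url` sidesteps the double-slash problem discussed in the comments, since the leading slash in `next_url` is resolved against the host rather than concatenated onto it.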