Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use full YouTube URLs instead of just video IDs, and various other miscellaneous YouTube-related changes #912

Closed
wants to merge 1 commit into from

Conversation

marcospri
Copy link
Member

@marcospri marcospri commented May 3, 2023

Use full YouTube URLs instead of just video IDs as the path params for the /video/ route.

Add a new YouTubeService that handles parsing YouTube URLs in order to extract the video IDs from them. In the future this same YouTubeService will also be used to get video transcripts from the YouTube API.

This commit also adds a new YOUTUBE_CAPTIONS setting and disables the /video route (HTTP Unauthorized) is this setting isn't enabled.

get_url_details.py is also turned into a service (URLDetailsService) because it now depends on two other services (and because previously it was a random top-level file).

Also, HTTP caching headers are added to the /video route.

@classmethod
def parse_url(cls, public_url):
"""Get the youtube video ID from an URL."""
parsed = urlparse(public_url)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit wordy compared to just a regexp but easier to get right and to read IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -9,7 +9,7 @@ def add_routes(config): # pragma: no cover
config.add_route("index", "/", factory=QueryURLResource)
config.add_route("get_status", "/_status")
config.add_route("view_pdf", "/pdf", factory=QueryURLResource)
config.add_route("view_video", "/video/{id}")
config.add_route("view_video", "/video", factory=QueryURLResource)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This context support URLs in the URL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so we've changed from /video/{id} to /video/{youtube_url} and using QueryURLResource instead of normal route params so that {youtube_url} can be a full URL


if via_client_svc.is_pdf(mime_type):
if content_type in ["pdf", "video"]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same headers as for pdfs now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so we're going to use the same caching headers for videos as we've been using for PDFs 👍


@view_config(
renderer="via:templates/video_player.html.jinja2",
route_name="view_video",
)
def view_video(request):
def view_video(context, request):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We take full youtube URLs now instead of the video id and parse it here.

This is the same approach as for PDFs but there we split this further to support different PDF providers. As we only support youtube we don't need that specialization here.

If we were to add other video providers we'd keep the /video route taking the full URL as the main entry point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there was an earlier PR (#905) that added this /video/* route, but I missed that.

So this is a new video player app that's part of Via now. But my understanding is that we're not actually proxying the bytes of YouTube videos through Via. Rather, Via just renders a page that includes the YouTube video player embedded, the video transcript, and the annotation client.

@classmethod
def parse_url(cls, public_url):
"""Get the youtube video ID from an URL."""
parsed = urlparse(public_url)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we integrate this into LMS we might have to extend this to support our own video://youtube/ID type of URL if we go that route.

For now this allows for easier testing directly in via.

@robertknight
Copy link
Member

The general approach of using existing Via URLs (https://{via_url}/{http_url}) but serving the response in the video player if the URL matches a recognized pattern, or the content type is a video, makes sense. On some sites we may run into an issue where users might want to annotate the content either as regular HTML or a video. We could show some kind of mode-switcher in the UI in that case, and ensure we record enough information with the annotations to know what mode was being used when a particular annotation was made.

@@ -17,6 +17,7 @@ def pyramid_settings():
"checkmate_ignore_reasons": None,
"checkmate_allow_all": False,
"enable_front_page": True,
"youtube_captions": True,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New setting to enable/disable this



@pytest.fixture
def call_view(pyramid_request):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to the common fixtures

@@ -36,7 +37,7 @@ def test_it_calls_get_for_normal_urls(

url = "http://example.com"

result = get_url_details(http_service, url, headers=sentinel.headers)
result = svc.get_url_details(url, headers=sentinel.headers)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now a service, not a function

("HTML", 404, "public, max-age=60, stale-while-revalidate=86400"),
("HTML", 500, "no-cache"),
("HTML", 501, "no-cache"),
("pdf", 200, "public, max-age=300, stale-while-revalidate=86400"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were only compared in the test against "PDF" directly. Lowercase values the ones expected for content type.

from via.services.youtube import YoutubeService


class URLDetailsService:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this into a service.

This was a function which already had a dependency on HTTPService and I was about to add one for YoutubeService.

Made into a service in one commit and extended with the YoutubeService dependency next.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Yeah this was a weird top-level file so I'm glad to see it gone. Turning it into a service seems right especially given the dependency on another service. Definitely the right decision when adding a second dependency on a second other service

}

def content_type(self, mime_type):
return self._mime_types_content_type.get(mime_type, "html")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One function instead of is_pdf, is_video...


@property
def enabled(self):
return self._enabled
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit silly but it will potentially grow with other attributes.

@marcospri marcospri requested a review from seanh May 3, 2023 14:24
@marcospri marcospri marked this pull request as ready for review May 3, 2023 14:24
("https://youtube.com?param=nope", None),
("https://youtube.com?v=", None),
("https://www.youtube.com/watch?v=VIDEO_ID", "VIDEO_ID"),
("https://www.youtube.com/watch?v=VIDEO_ID&t=14s", "VIDEO_ID"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanh seanh changed the title Scaffolding for proxing youtube videos Scaffolding for proxying youtube videos May 30, 2023
@seanh seanh changed the title Scaffolding for proxying youtube videos Scaffolding for proxying YouTube videos May 30, 2023
@seanh seanh changed the title Scaffolding for proxying YouTube videos Add scaffolding for proxying YouTube videos May 30, 2023
Copy link
Contributor

@seanh seanh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YouTube needs to be spelled with a capital T, e.g. YouTubeService.

Otherwise this all looks good to me 👍

@@ -44,6 +44,7 @@ setenv =
dev: VIA_SECRET = not_a_secret
dev: CHECKMATE_API_KEY = dev_api_key
dev: ENABLE_FRONT_PAGE = {env:ENABLE_FRONT_PAGE:true}
dev: YOUTUBE_CAPTIONS = {env:YOUTUBE_CAPTIONS:true}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enable the feature flag by default in dev 👍

from via.services.youtube import YoutubeService


class URLDetailsService:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Yeah this was a weird top-level file so I'm glad to see it gone. Turning it into a service seems right especially given the dependency on another service. Definitely the right decision when adding a second dependency on a second other service

Comment on lines +33 to +34
if self._youtube.enabled and self._youtube.parse_url(url):
return "video/x-youtube", 200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as being moved into a service this now supports potentially returning "video/x-youtube" as a special case

from urllib.parse import parse_qs, urlparse


class YoutubeService:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class YoutubeService:
class YouTubeService:

@classmethod
def parse_url(cls, public_url):
"""Get the youtube video ID from an URL."""
parsed = urlparse(public_url)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +11 to +13
mime_type, _status_code = request.find_service(URLDetailsService).get_url_details(
url
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updating the call now that URLDetailsService is a service 👍

Comment on lines +16 to +17
mime_type, status_code = request.find_service(URLDetailsService).get_url_details(
url, request.headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updating the call now that URLDetailsService is a service 👍


if via_client_svc.is_pdf(mime_type):
if content_type in ["pdf", "video"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so we're going to use the same caching headers for videos as we've been using for PDFs 👍

@@ -9,7 +9,7 @@ def add_routes(config): # pragma: no cover
config.add_route("index", "/", factory=QueryURLResource)
config.add_route("get_status", "/_status")
config.add_route("view_pdf", "/pdf", factory=QueryURLResource)
config.add_route("view_video", "/video/{id}")
config.add_route("view_video", "/video", factory=QueryURLResource)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so we've changed from /video/{id} to /video/{youtube_url} and using QueryURLResource instead of normal route params so that {youtube_url} can be a full URL


@view_config(
renderer="via:templates/video_player.html.jinja2",
route_name="view_video",
)
def view_video(request):
def view_video(context, request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there was an earlier PR (#905) that added this /video/* route, but I missed that.

So this is a new video player app that's part of Via now. But my understanding is that we're not actually proxying the bytes of YouTube videos through Via. Rather, Via just renders a page that includes the YouTube video player embedded, the video transcript, and the annotation client.

Use full YouTube URLs instead of just video IDs as the path params for
the `/video/` route.

Add a new `YouTubeService` that handles parsing YouTube URLs in order to
extract the video IDs from them. In the future this same
`YoutubeService` will also be used to get video transcripts from the
YouTube API.

This commit also adds a new `YOUTUBE_CAPTIONS` setting and disables the
`/video` route (HTTP Unauthorized) is this setting isn't enabled.

`get_url_details.py` is also turned into a service (`URLDetailsService`)
because it now depends on two other services (and because previously it
was a random top-level file).

Also, HTTP caching headers are added to the `/video` route.
@seanh seanh force-pushed the youtube-service branch from cdc5889 to ea2cade Compare June 6, 2023 16:07
@seanh
Copy link
Contributor

seanh commented Jun 6, 2023

Rebased and squashed...

@seanh seanh changed the title Add scaffolding for proxying YouTube videos Use full YouTube URLs instead of just video IDs, and various other miscellaneous YouTube-related changes Jun 6, 2023
@seanh seanh marked this pull request as draft June 6, 2023 16:08
@marcospri marcospri mentioned this pull request Jun 8, 2023
@seanh
Copy link
Contributor

seanh commented Jun 12, 2023

I think I've parcelled everything in this PR out into separate smaller PRs now, so closing this one

@seanh seanh closed this Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants