Use full YouTube URLs instead of just video IDs, and various other miscellaneous YouTube-related changes #912

marcospri · 2023-05-03T09:25:55Z

Use full YouTube URLs instead of just video IDs as the path params for the /video/ route.

Add a new YouTubeService that handles parsing YouTube URLs in order to extract the video IDs from them. In the future this same YouTubeService will also be used to get video transcripts from the YouTube API.

This commit also adds a new YOUTUBE_CAPTIONS setting and disables the /video route (HTTP Unauthorized) if this setting isn't enabled.

get_url_details.py is also turned into a service (URLDetailsService) because it now depends on two other services (and because previously it was a random top-level file).

Also, HTTP caching headers are added to the /video route.

marcospri · 2023-05-03T09:27:26Z

via/services/youtube.py

+    @classmethod
+    def parse_url(cls, public_url):
+        """Get the youtube video ID from an URL."""
+        parsed = urlparse(public_url)


This is a bit wordy compared to just a regexp but easier to get right and to read IMO.
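For readers following along, here is a minimal sketch of what a `urlparse`-based parser can look like. It is not the code from this PR: the function name `parse_video_id`, the `YOUTUBE_HOSTS` allow-list and the `/watch` path check are all assumptions, chosen only to be consistent with the test cases quoted later in this thread.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical allow-list of hosts; the real service may accept a different set.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com"}


def parse_video_id(public_url):
    """Return the video ID from a YouTube watch URL, or None if there isn't one."""
    parsed = urlparse(public_url)

    # Reject anything that isn't served from a known YouTube host.
    if parsed.hostname not in YOUTUBE_HOSTS:
        return None

    # Assumption: only the classic /watch?v=ID form is handled here.
    if parsed.path != "/watch":
        return None

    # parse_qs drops blank values by default, so "?v=" yields no "v" key.
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None
```

Compared with a regexp, each rejection rule is its own line, which is the readability trade-off mentioned above.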

marcospri · 2023-05-03T09:27:44Z

via/views/__init__.py

@@ -9,7 +9,7 @@ def add_routes(config):  # pragma: no cover
    config.add_route("index", "/", factory=QueryURLResource)
    config.add_route("get_status", "/_status")
    config.add_route("view_pdf", "/pdf", factory=QueryURLResource)
-    config.add_route("view_video", "/video/{id}")
+    config.add_route("view_video", "/video", factory=QueryURLResource)


This context supports URLs in the URL.

Ok, so we've changed from /video/{id} to /video/{youtube_url} and using QueryURLResource instead of normal route params so that {youtube_url} can be a full URL

marcospri · 2023-05-03T09:28:14Z

via/views/route_by_content.py


-    if via_client_svc.is_pdf(mime_type):
+    if content_type in ["pdf", "video"]:


Same headers as for pdfs now.

Ok, so we're going to use the same caching headers for videos as we've been using for PDFs 👍
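As a rough illustration of that branch, the sketch below picks a `Cache-Control` value from the content type and status code. The header strings are the ones that appear in the test parameters quoted in this PR; the function name `cache_control_for`, the order of the checks, and the choice to return `None` for cases the quoted tests don't cover are assumptions.

```python
# Header values as they appear in the test parameters quoted in this PR.
LONG_LIVED = "public, max-age=300, stale-while-revalidate=86400"
SHORT_LIVED = "public, max-age=60, stale-while-revalidate=86400"
NO_CACHE = "no-cache"


def cache_control_for(content_type, status_code):
    """Pick a Cache-Control value (hypothetical helper, not the PR's code)."""
    # Assumption: server errors are never cached.
    if status_code >= 500:
        return NO_CACHE

    # Assumption: missing pages are cached only briefly.
    if status_code == 404:
        return SHORT_LIVED

    # PDFs and videos share the same headers, mirroring the
    # `content_type in ["pdf", "video"]` check in the diff above.
    if status_code == 200 and content_type in ["pdf", "video"]:
        return LONG_LIVED

    # Everything else is outside the scope of this sketch.
    return None
```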

marcospri · 2023-05-03T09:31:32Z

via/views/view_video.py


 @view_config(
    renderer="via:templates/video_player.html.jinja2",
    route_name="view_video",
 )
-def view_video(request):
+def view_video(context, request):


We take full YouTube URLs now instead of the video ID and parse them here.

This is the same approach as for PDFs but there we split this further to support different PDF providers. As we only support youtube we don't need that specialization here.

If we were to add other video providers we'd keep the /video route taking the full URL as the main entry point.

Ok, there was an earlier PR (#905) that added this /video/* route, but I missed that.

So this is a new video player app that's part of Via now. But my understanding is that we're not actually proxying the bytes of YouTube videos through Via. Rather, Via just renders a page that includes the YouTube video player embedded, the video transcript, and the annotation client.

marcospri · 2023-05-03T09:32:52Z

via/services/youtube.py

+    @classmethod
+    def parse_url(cls, public_url):
+        """Get the youtube video ID from an URL."""
+        parsed = urlparse(public_url)


Once we integrate this into LMS we might have to extend this to support our own video://youtube/ID type of URL if we go that route.

For now this allows for easier testing directly in via.

robertknight · 2023-05-03T11:20:37Z

The general approach of using existing Via URLs (https://{via_url}/{http_url}) but serving the response in the video player if the URL matches a recognized pattern, or the content type is a video, makes sense. On some sites we may run into an issue where users might want to annotate the content either as regular HTML or a video. We could show some kind of mode-switcher in the UI in that case, and ensure we record enough information with the annotations to know what mode was being used when a particular annotation was made.

marcospri · 2023-05-03T14:19:31Z

tests/conftest.py

@@ -17,6 +17,7 @@ def pyramid_settings():
        "checkmate_ignore_reasons": None,
        "checkmate_allow_all": False,
        "enable_front_page": True,
+        "youtube_captions": True,


New setting to enable/disable this

marcospri · 2023-05-03T14:19:45Z

tests/unit/conftest.py

+
+
+@pytest.fixture
+def call_view(pyramid_request):


Moved to the common fixtures

marcospri · 2023-05-03T14:20:08Z

tests/unit/via/services/url_details_test.py

@@ -36,7 +37,7 @@ def test_it_calls_get_for_normal_urls(

        url = "http://example.com"

-        result = get_url_details(http_service, url, headers=sentinel.headers)
+        result = svc.get_url_details(url, headers=sentinel.headers)


Now a service, not a function

marcospri · 2023-05-03T14:20:53Z

tests/unit/via/views/route_by_content_test.py

-            ("HTML", 404, "public, max-age=60, stale-while-revalidate=86400"),
-            ("HTML", 500, "no-cache"),
-            ("HTML", 501, "no-cache"),
+            ("pdf", 200, "public, max-age=300, stale-while-revalidate=86400"),


These were only compared in the test against "PDF" directly. Lowercase values are the ones expected for the content type.

marcospri · 2023-05-03T14:22:07Z

via/services/url_details.py

+from via.services.youtube import YoutubeService
+
+
+class URLDetailsService:


Made this into a service.

This was a function which already had a dependency on HTTPService and I was about to add one for YoutubeService.

Made into a service in one commit and extended with the YoutubeService dependency next.

Cool. Yeah this was a weird top-level file so I'm glad to see it gone. Turning it into a service seems right especially given the dependency on another service. Definitely the right decision when adding a second dependency on a second other service

marcospri · 2023-05-03T14:22:37Z

via/services/via_client.py

+    }
+
+    def content_type(self, mime_type):
+        return self._mime_types_content_type.get(mime_type, "html")


One function instead of is_pdf, is_video...
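A small sketch of the lookup-table approach, for illustration only. The `content_type` method body matches the diff above, but the class name `ViaClientSketch` and the entries in the mapping are assumptions; the real `_mime_types_content_type` dict is not shown in this thread.

```python
class ViaClientSketch:
    """Stand-in for the Via client service (hypothetical name)."""

    # Assumed mapping; the real table may contain more MIME types.
    _mime_types_content_type = {
        "application/pdf": "pdf",
        "video/x-youtube": "video",
    }

    def content_type(self, mime_type):
        # Unknown MIME types fall back to "html", as in the diff above.
        return self._mime_types_content_type.get(mime_type, "html")
```

Callers can then branch on a single value, for example `content_type in ["pdf", "video"]`, instead of chaining `is_pdf(...)` and `is_video(...)` calls.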

marcospri · 2023-05-03T14:23:02Z

via/services/youtube.py

+
+    @property
+    def enabled(self):
+        return self._enabled


This is a bit silly but it will potentially grow with other attributes.

marcospri · 2023-05-04T11:05:57Z

tests/unit/via/services/youtube_test.py

+            ("https://youtube.com?param=nope", None),
+            ("https://youtube.com?v=", None),
+            ("https://www.youtube.com/watch?v=VIDEO_ID", "VIDEO_ID"),
+            ("https://www.youtube.com/watch?v=VIDEO_ID&t=14s", "VIDEO_ID"),


Support more formats, see:

https://stackoverflow.com/a/8260383/15031259

And also these, which aren't listed there:

https://www.youtube.com/live/4Lb2JoDN70o?feature=share
https://www.youtube.com/shorts/ktl1ncCTBGg
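One possible way to cover those path-based formats is sketched below. This is not the implementation that was eventually merged: the function name `extract_video_id`, the host list and the `PATH_PREFIXES` tuple are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Assumed set of path prefixes where the ID is the next path segment.
PATH_PREFIXES = ("shorts", "live", "embed", "v")


def extract_video_id(url):
    """Return a YouTube video ID from several URL shapes, or None."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]

    # Non-empty path segments, e.g. "/live/abc" -> ["live", "abc"].
    segments = [segment for segment in parsed.path.split("/") if segment]

    # Short links: https://youtu.be/ID
    if host == "youtu.be":
        return segments[0] if segments else None

    if host not in ("youtube.com", "m.youtube.com"):
        return None

    # Classic form: /watch?v=ID (blank values are dropped by parse_qs).
    if segments == ["watch"]:
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    # Path-based forms: /shorts/ID, /live/ID, /embed/ID, /v/ID
    if len(segments) >= 2 and segments[0] in PATH_PREFIXES:
        return segments[1]

    return None
```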

seanh

YouTube needs to be spelled with a capital T, e.g. YouTubeService.

Otherwise this all looks good to me 👍

seanh · 2023-05-30T11:04:10Z

tox.ini

@@ -44,6 +44,7 @@ setenv =
    dev: VIA_SECRET = not_a_secret
    dev: CHECKMATE_API_KEY = dev_api_key
    dev: ENABLE_FRONT_PAGE = {env:ENABLE_FRONT_PAGE:true}
+    dev: YOUTUBE_CAPTIONS = {env:YOUTUBE_CAPTIONS:true}


Enable the feature flag by default in dev 👍

seanh · 2023-05-30T11:09:10Z

via/services/url_details.py

+from via.services.youtube import YoutubeService
+
+
+class URLDetailsService:


Cool. Yeah this was a weird top-level file so I'm glad to see it gone. Turning it into a service seems right especially given the dependency on another service. Definitely the right decision when adding a second dependency on a second other service

seanh · 2023-05-30T11:29:58Z

via/services/url_details.py

+        if self._youtube.enabled and self._youtube.parse_url(url):
+            return "video/x-youtube", 200


As well as being moved into a service this now supports potentially returning "video/x-youtube" as a special case
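To make the special case concrete, here is a self-contained sketch of a details service with two injected dependencies. Only the `enabled` / `parse_url` check and the `"video/x-youtube", 200` return value come from the diff above; the stub classes, the `get_details` method on the HTTP stub and the constructor signature are hypothetical.

```python
class StubYouTubeService:
    """Hypothetical stand-in for the YouTube service."""

    def __init__(self, enabled):
        self.enabled = enabled

    def parse_url(self, url):
        # Simplified for the sketch: treat watch URLs as videos.
        return "VIDEO_ID" if "youtube.com/watch" in url else None


class StubHTTPService:
    """Hypothetical stand-in for the HTTP service."""

    def get_details(self, url, headers=None):
        # A real service would make a request and inspect the response.
        return "text/html", 200


class URLDetailsSketch:
    def __init__(self, http_service, youtube_service):
        self._http = http_service
        self._youtube = youtube_service

    def get_url_details(self, url, headers=None):
        # Special case from the diff: recognised YouTube URLs skip the request.
        if self._youtube.enabled and self._youtube.parse_url(url):
            return "video/x-youtube", 200

        # Assumption: everything else is delegated to the HTTP service.
        return self._http.get_details(url, headers)
```

With the feature flag off, the same YouTube URL falls through to the normal HTTP path, so the video player is never selected.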

seanh · 2023-05-30T11:30:58Z

via/services/youtube.py

+from urllib.parse import parse_qs, urlparse
+
+
+class YoutubeService:


Suggested change

-class YoutubeService:
+class YouTubeService:

seanh · 2023-05-30T11:31:36Z

via/services/youtube.py

+    @classmethod
+    def parse_url(cls, public_url):
+        """Get the youtube video ID from an URL."""
+        parsed = urlparse(public_url)


seanh · 2023-05-30T11:37:40Z

via/views/proxy.py

+    mime_type, _status_code = request.find_service(URLDetailsService).get_url_details(
+        url
+    )


Just updating the call now that URLDetailsService is a service 👍

seanh · 2023-05-30T11:38:16Z

via/views/route_by_content.py

+    mime_type, status_code = request.find_service(URLDetailsService).get_url_details(
+        url, request.headers


Just updating the call now that URLDetailsService is a service 👍

seanh · 2023-05-30T11:39:23Z

via/views/route_by_content.py


-    if via_client_svc.is_pdf(mime_type):
+    if content_type in ["pdf", "video"]:


Ok, so we're going to use the same caching headers for videos as we've been using for PDFs 👍

seanh · 2023-05-30T11:42:29Z

via/views/__init__.py

@@ -9,7 +9,7 @@ def add_routes(config):  # pragma: no cover
    config.add_route("index", "/", factory=QueryURLResource)
    config.add_route("get_status", "/_status")
    config.add_route("view_pdf", "/pdf", factory=QueryURLResource)
-    config.add_route("view_video", "/video/{id}")
+    config.add_route("view_video", "/video", factory=QueryURLResource)


Ok, so we've changed from /video/{id} to /video/{youtube_url} and using QueryURLResource instead of normal route params so that {youtube_url} can be a full URL

seanh · 2023-05-30T15:27:02Z

via/views/view_video.py


 @view_config(
    renderer="via:templates/video_player.html.jinja2",
    route_name="view_video",
 )
-def view_video(request):
+def view_video(context, request):


Ok, there was an earlier PR (#905) that added this /video/* route, but I missed that.

So this is a new video player app that's part of Via now. But my understanding is that we're not actually proxying the bytes of YouTube videos through Via. Rather, Via just renders a page that includes the YouTube video player embedded, the video transcript, and the annotation client.

seanh · 2023-06-06T16:07:34Z

Rebased and squashed...

seanh · 2023-06-12T16:54:03Z

I think I've parcelled everything in this PR out into separate smaller PRs now, so closing this one

marcospri force-pushed the youtube-service branch from 222618e to 130e2b8 Compare May 3, 2023 09:46

marcospri requested a review from seanh May 3, 2023 14:24

marcospri marked this pull request as ready for review May 3, 2023 14:24

marcospri mentioned this pull request May 4, 2023

Backend support for youtube caption annotations hypothesis/lms#5361

Closed

seanh changed the title ~~Scaffolding for proxing youtube videos~~ Scaffolding for proxying youtube videos May 30, 2023

seanh changed the title ~~Scaffolding for proxying youtube videos~~ Scaffolding for proxying YouTube videos May 30, 2023

seanh changed the title ~~Scaffolding for proxying YouTube videos~~ Add scaffolding for proxying YouTube videos May 30, 2023

seanh approved these changes May 30, 2023

marcospri added the ⚠️ do not merge ⚠️ label May 31, 2023

seanh self-assigned this Jun 6, 2023

seanh force-pushed the youtube-service branch from cdc5889 to ea2cade Compare June 6, 2023 16:07

seanh changed the title ~~Add scaffolding for proxying YouTube videos~~ Use full YouTube URLs instead of just video IDs, and various other miscellaneous YouTube-related changes Jun 6, 2023

seanh marked this pull request as draft June 6, 2023 16:08

seanh removed the ⚠️ do not merge ⚠️ label Jun 6, 2023

marcospri mentioned this pull request Jun 8, 2023

Add YouTube URL parsing #971

Merged

seanh closed this Jun 12, 2023
