-
-
Notifications
You must be signed in to change notification settings - Fork 15.6k
Add option to restrict media domains #25783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
62c63e1
dc56069
863129e
7326436
a08be4f
01dab53
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -50,6 +50,7 @@ def __init__( | |
| connection: HTTPConnection = global_http_connection, | ||
| *, | ||
| allowed_local_media_path: str = "", | ||
| allowed_media_domains: Optional[list[str]] = None, | ||
| ) -> None: | ||
| """ | ||
| Args: | ||
|
|
@@ -82,6 +83,9 @@ def __init__( | |
| allowed_local_media_path_ = None | ||
|
|
||
| self.allowed_local_media_path = allowed_local_media_path_ | ||
| if allowed_media_domains is None: | ||
| allowed_media_domains = [] | ||
| self.allowed_media_domains = allowed_media_domains | ||
|
|
||
| def _load_data_url( | ||
| self, | ||
|
|
@@ -115,6 +119,14 @@ def _load_file_url( | |
|
|
||
| return media_io.load_file(filepath) | ||
|
|
||
| def _assert_url_in_allowed_media_domains(self, url_spec) -> None: | ||
| if self.allowed_media_domains and url_spec.hostname not in \ | ||
| self.allowed_media_domains: | ||
| raise ValueError( | ||
| f"The URL must be from one of the allowed domains: " | ||
| f"{self.allowed_media_domains}. Input URL domain: " | ||
| f"{url_spec.hostname}") | ||
|
|
||
| def load_from_url( | ||
| self, | ||
| url: str, | ||
|
|
@@ -125,6 +137,8 @@ def load_from_url( | |
| url_spec = urlparse(url) | ||
|
|
||
| if url_spec.scheme.startswith("http"): | ||
| self._assert_url_in_allowed_media_domains(url_spec) | ||
|
Comment on lines
127
to
+140
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current implementation checks the domain of the initial URL, but it does not prevent SSRF vulnerabilities that arise from HTTP redirects. If an allowed URL redirects to a URL on a disallowed domain (including internal network addresses), To properly mitigate this, you should either disable redirects or verify the domain of the final URL after all redirects have been followed. Disabling redirects might be the safest option if they are not a required feature. If they are required, the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @huachenheli @DarkLight1337 this seems like a good point. I wouldn't block merging the current change over this, but it seems worth a follow-up change.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah. This PR should be fine for immediate needs. We already have |
||
|
|
||
| connection = self.connection | ||
| data = connection.get_bytes(url, timeout=fetch_timeout) | ||
|
|
||
|
|
@@ -150,6 +164,8 @@ async def load_from_url_async( | |
| loop = asyncio.get_running_loop() | ||
|
|
||
| if url_spec.scheme.startswith("http"): | ||
| self._assert_url_in_allowed_media_domains(url_spec) | ||
|
|
||
| connection = self.connection | ||
| data = await connection.async_get_bytes(url, timeout=fetch_timeout) | ||
| future = loop.run_in_executor(global_thread_pool, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For performance, it's better to convert
allowed_media_domainsto asetduring initialization. Checking for an item's existence is O(1) on average for a set, while it's O(n) for a list. This can be significant if the list of allowed domains is large.