build: Introduce GURL with shimmed ICU #13808
dio wants to merge 52 commits into envoyproxy:main from dio:introduce-googleurl
This patch re-introduces GURL as a dependency. It is linked against a shimmed ICU, hence the Unicode checking is bypassed. Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Does this PR bring back the code that caused CVE-2020-25018? https://groups.google.com/g/envoy-announce/c/fk0Qvgrln_s/m/w7kbfOHgCAAJ. What does this approach give us over the current implementation?
@moderation not at all. This allows us to use GURL facility beyond Also, as mentioned in: https://github.com/envoyproxy/envoy/tree/master/source/common/chromium_url's README, "Long term we need this to be moved to absl or QUICHE for upgrades and long-term support." This provides that.
third_party/icu/googleurl/test.cc
EXPECT_EQ(url.ref(), "section");

// Ensure ICU shim is functioning correctly, i.e. not crashing and resulting invalid parsed URL.
GURL idn_url("https://\xe5\x85\x89.example/");
Can you also add tests that have %-encoded sequences in the host name as well as tests with hostnames that are not DNS RFC compliant, please?
I added some, but I think my creativity is lacking. If you have some examples, please let me know. Thank you!
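A few candidate inputs for the request above, sketched as a self-contained C++ snippet. `IsRfc1123Hostname`, `NonCompliantHostCandidates`, and `CheckCandidates` are hypothetical helpers written only for this illustration; they are not part of googleurl or Envoy, and the snippet makes no claim about how GURL itself treats these hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a googleurl or Envoy API): returns true when `host`
// follows the RFC 1123 hostname syntax. Labels are 1-63 characters of ASCII
// letters, digits, or '-', must not start or end with '-', and the whole name
// is at most 253 characters. A trailing dot is rejected by this sketch.
bool IsRfc1123Hostname(const std::string& host) {
  if (host.empty() || host.size() > 253) {
    return false;
  }
  std::size_t label_start = 0;
  while (label_start <= host.size()) {
    std::size_t dot = host.find('.', label_start);
    if (dot == std::string::npos) {
      dot = host.size();
    }
    const std::size_t label_len = dot - label_start;
    if (label_len == 0 || label_len > 63) {
      return false;
    }
    if (host[label_start] == '-' || host[dot - 1] == '-') {
      return false;
    }
    for (std::size_t i = label_start; i < dot; ++i) {
      const char c = host[i];
      const bool alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                         (c >= '0' && c <= '9');
      if (!alnum && c != '-') {
        return false;
      }
    }
    label_start = dot + 1;
  }
  return true;
}

// Candidate host names for extra test cases (illustrative only).
std::vector<std::string> NonCompliantHostCandidates() {
  return {
      "exa%6Dple.com",                    // %-encoded 'm' inside a label
      "under_score.example",              // '_' is not allowed in a label
      "-leading.example",                 // label starts with '-'
      "trailing-.example",                // label ends with '-'
      "double..dot.example",              // empty label
      std::string(64, 'a') + ".example",  // label longer than 63 characters
      "\xe5\x85\x89.example",             // raw UTF-8 (the IDN case in test.cc)
  };
}

// Confirms that every candidate really violates the syntax checked above.
void CheckCandidates() {
  assert(IsRfc1123Hostname("example.com"));
  for (const std::string& host : NonCompliantHostCandidates()) {
    assert(!IsRfc1123Hostname(host));
  }
}
```

Each candidate could be embedded as `https://<host>/` and passed to the `GURL` constructor in test.cc, checking only that parsing does not crash.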
Btw, this seems like a great (and not so difficult) candidate for a fuzzer that would catch GURL crashes
@dio, what do you think? I think it would very roughly look like https://github.com/envoyproxy/envoy/blob/master/test/common/common/hash_fuzz_test.cc
@asraa thanks! I added the setup. Please take a look when you have time.
Tagging @htuch to seek opinions on this. 🙏🏽
build --action_env=PATH

# Skip system ICU linking.
build --@com_googlesource_googleurl//build_config:system_icu=0
Is there any way in CI to verify that we definitely don't have ICU in the binary somehow? E.g. with readelf or the like.
I think we may not need to deal with ICU at all here. We just need the ParseStandardURL function, which does not use or include anything from ICU.
I think what would work is this:
- Define a new cc_library in googleurl/url/BUILD which only includes the sources necessary to build the ParseStandardURL function. I think this should not involve any files that reference ICU. Note that we may also need the RemoveURLWhitespace function. I'm not yet sure whether this function is needed. I'll look into it later tonight.
- Add a flag-protected change to the Http::Utility::Url class to use ParseStandardURL from Google URL.
- Make sure all existing tests pass.
@yanavlasov sorry, a question. I can definitely pull only the ParseStandardURL-related files (that will be url/third_party/mozilla/url_parse.{cc, h}) and let the Http::Utility::Url implementation use that. However, how about:
envoy/source/common/http/path_utility.cc
Lines 3 to 4 in deded53
chromium_url around? cc. @htuch.
If yes then we don't need to pull googleurl from googlesource.
We will remove common/chromium_url in favor of this eventually, but will need to keep it around while we do the runtime guarded transition.
We should still verify ICU is not creeping into the build, since this is a regression check for the previous CVE.
Ultimately we want something to replace the uses in https://github.com/envoyproxy/envoy/tree/master/source/common/chromium_url, which are around url::CanonicalizePath(). Can we do this without the ICU shim? What about pulling other URL parsing functionality out? Do we need build gymnastics to skip ICU in these cases, e.g. patch files?
OK. So from my understanding, the caller site is
envoy/source/common/http/path_utility.cc
Lines 15 to 28 in dd0befa
ParseStandardURL. I will try to look at this.
After a few attempts, I think we can't do a clean "import" without, as @htuch said, a custom genrule_cmd. For example, referencing third_party/mozilla/url_parse.cc from a BUILD file in Envoy can cause a visibility problem (which can be solved by using exports_files, but that requires a patch to upstream).
Re: fuzzing, the motivation was to make sure that a googleurl lib linked with the shimmed ICU lib does not cause crashes (but yeah, this should be predictable by passing URLs with IDNs to the function). If we want to skip testing it via the GURL class, I can remove the test and test by calling "url::CanonicalizePath" instead.
I think a simple exports_files patch is OK. What we're really wary of is complex patches that are likely to break on version upgrades.
I see. Submitted a draft here: #14583, patching the googleurl's url/BUILD file.
htuch left a comment
Looks good, thanks for working on this!
DEFINE_FUZZER(const uint8_t* buf, size_t len) {
  const std::string input(reinterpret_cast<const char*>(buf), len);
  GURL url(input);
Would it make sense to add some accessors on the GURL to make sure that lazily computed attributes don't crash?
@@ -0,0 +1,45 @@
#pragma once
@danzh2010 @alyssawilk do you know how reasonable it will be to add proper Chromium URL support for eliding unicode translation? I think this shim is fine for now, but ideally we can move away from needing workarounds like this.
I'm not sure, but cc @DavidSchinazi @ianswett just to have on our list.
It's definitely doable to pull in the full Chromium IDN logic, but that has significant impact on binary size - not sure what the full set of design goals and requirements is here though
@DavidSchinazi I think what we probably need is a way to build GURL without Unicode support (probably by defining a build flag, etc.). Is that something viable?
@dio I think that should be doable by using IDN from the OS:
https://source.chromium.org/chromium/chromium/src/+/master:url/BUILD.gn;l=72;drc=a82b58b53bbc8b7e843c0624a64a4428ad2fa7bd
Requesting @envoyproxy/dependency-shepherds for deps approval. :)
/lgtm deps
@yanavlasov do you need me to have another set of changes here?
This reverts commit 77a8099. Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
/retest
Retrying Azure Pipelines:
@lizan sorry, do you think the failure is related to azp? https://dev.azure.com/cncf/4684fb3d-0389-4e0b-8251-221942316e06/_apis/build/builds/61726/logs/193 Please let me know if I need to do something on my end.
I think you can close this one in favor of PR #14583.
Yes, thank you for the reminder and review @yanavlasov!
Commit Message: This patch re-introduces GURL. It is linked against shimmed ICU, hence the Unicode checking is bypassed.
Additional Description: Currently, the GURL + shimmed ICU introduced in this PR is not used. The main intention of this patch is to return false on every IDNToASCII call, which calls ICU's uidna_nameToASCII. Normally, it converts an IDN to ASCII by doing a "lookup" in the ICU data. However, in this patch, since we use a shimmed ICU, uidna_nameToASCII will always fail (return false).
Risk Level: Low
Testing: Added testing for the ICU shim and its integration with GURL.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
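The shim behavior described under Additional Description can be illustrated with a minimal, self-contained sketch. All names here (`ShimStatus`, `ShimNameToAscii`, `IdnToAsciiSketch`, `DemoShim`) are hypothetical stand-ins invented for this illustration; they are neither the ICU API nor the code in this PR, and the shape of the real shim is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical status code standing in for an ICU-style error code.
enum class ShimStatus { kOk, kUnsupported };

// Hypothetical stand-in for the shimmed conversion routine. It never consults
// any Unicode data: it clears the output, reports failure, and writes nothing.
int32_t ShimNameToAscii(const std::u16string& name, std::u16string* dest,
                        ShimStatus* status) {
  (void)name;  // The input is intentionally ignored by the shim.
  dest->clear();
  *status = ShimStatus::kUnsupported;
  return 0;  // Number of code units written.
}

// Hypothetical stand-in for the caller (IDNToASCII in the description above).
// With the shim linked in, this returns false for every input.
bool IdnToAsciiSketch(const std::u16string& host, std::u16string* output) {
  ShimStatus status = ShimStatus::kOk;
  const int32_t written = ShimNameToAscii(host, output, &status);
  return status == ShimStatus::kOk && written > 0;
}

// Small demonstration: an IDN host is never converted.
void DemoShim() {
  std::u16string out;
  assert(!IdnToAsciiSketch(u"\u5149.example", &out));
  assert(out.empty());
}
```

Under this assumption, a host that would need IDN conversion yields a failed conversion instead of a lookup in ICU data.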