Skip to content

build: Introduce GURL with shimmed ICU#13808

Closed
dio wants to merge 52 commits intoenvoyproxy:mainfrom
dio:introduce-googleurl
Closed

build: Introduce GURL with shimmed ICU#13808
dio wants to merge 52 commits intoenvoyproxy:mainfrom
dio:introduce-googleurl

Conversation

@dio
Copy link
Member

@dio dio commented Oct 29, 2020

Commit Message: This patch re-introduces GURL. It is linked against shimmed ICU, hence the Unicode checking is bypassed.
Additional Description: Currently, in this PR the introduced GURL + shimmed ICU is not used. The main intention of this patch is to return false on every attempt to IDNToASCII call, which calls uidna_nameToASCII of ICU. Normally, it converts IDN to ASCII by doing "lookup" to the ICU data. However, in this patch, since we use shimmed ICU uidna_nameToASCII will always be failed (return false).

Risk Level: Low
Testing: Added testing for the ICU shim and its integration with GURL.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

Signed-off-by: Dhi Aurrahman dio@tetrate.io

dio added 4 commits October 29, 2020 02:38
This patch re-introduces GURL as deps. It is linked against shimmed ICU, hence
the unicode checking is bypassed.

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Oct 29, 2020
@repokitteh-read-only
Copy link

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).

🐱

Caused by: #13808 was opened by dio.

see: more, trace.

dio added 3 commits October 29, 2020 03:03
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@dio dio marked this pull request as ready for review October 29, 2020 08:17
@yanavlasov yanavlasov self-assigned this Oct 29, 2020
@moderation
Copy link
Contributor

Does this PR bring back the code that caused CVE-2020-25018? https://groups.google.com/g/envoy-announce/c/fk0Qvgrln_s/m/w7kbfOHgCAAJ.What does this approach give us over the current implementation?

@dio
Copy link
Member Author

dio commented Oct 29, 2020

Does this PR bring back the code that caused CVE-2020-25018? https://groups.google.com/g/envoy-announce/c/fk0Qvgrln_s/m/w7kbfOHgCAAJ. What does this approach give us over the current implementation?

@moderation not at all. This allows us to use GURL facility beyond url::CanonicalizePath() that is currently provided by https://github.com/envoyproxy/envoy/tree/master/source/common/chromium_url without needing the actual ICU. That CVE was caused by missing data when trying to convert IDN to Punycode, however, here since we don't support IDN (Envoy should only see the hostname in Punycode) we should always return "false" when there is an attempt to do so.

Also, as mentioned in: https://github.com/envoyproxy/envoy/tree/master/source/common/chromium_url's README, "Long term we need this to be moved to absl or QUICHE for upgrades and long-term support." This provides that.

dio added 5 commits October 31, 2020 02:36
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Copy link
Contributor

@yanavlasov yanavlasov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

EXPECT_EQ(url.ref(), "section");

// Ensure ICU shim is functioning correctly, i.e. not crashing and resulting invalid parsed URL.
GURL idn_url("https://\xe5\x85\x89.example/");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add tests that have %-encoded sequences in the host name as well as tests with hostnames that are not DNS RFC compliant, please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some. but I think my creativity is lacking. If you have some examples, please let me know. Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this seems like a great (and not so difficult) candidate for a fuzzer that would catch GURL crashes
@dio, what do you think? I think it would very roughly look like https://github.com/envoyproxy/envoy/blob/master/test/common/common/hash_fuzz_test.cc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asraa thanks! I added the setup. Please take a look when you have time.

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
dio added 4 commits November 3, 2020 20:18
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@dio
Copy link
Member Author

dio commented Nov 9, 2020

Tagging @htuch to seek opinions on this. 🙏🏽

build --action_env=PATH

# Skip system ICU linking.
build --@com_googlesource_googleurl//build_config:system_icu=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way in CI to verify that we definitely don't have ICU in the binary somewhow? E.g. with readelf or the like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may not need to deal with ICU at all here. We just need the ParseStandardURL function which does use or include anything from ICU.
I think what would work is this:

  1. define new cc_library in the googleurl/url/BUILD which only includes sources necessary to build the ParseStandardURL function. I think this should not involve any files that reference ICU.

Note that we may also need the RemoveURLWhitespace function. I'm yet unsure if this function is needed. I'll look into it later tonight.

  1. Add flag protected change to the Http::Utility::Url class to use the ParseStandardURL from google URL.
  2. Make sure all existing tests pass.

Copy link
Member Author

@dio dio Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanavlasov sorry, a question. I can definitely pull only the ParseStandardURL related files (that will be url/third_party/mozilla/url_parse.{cc, h}) and let the Http::Utility::Url implementation to use that, however, how about:

#include "common/chromium_url/url_canon.h"
#include "common/chromium_url/url_canon_stdstring.h"
? Are we going to keep chromium_url around? cc. @htuch.

If yes then we don't need to pull googleurl from googlesource.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will removal common/chromium_url in favor of this eventually, but will need to keep it around while we do the runtime guarded transition.

We should still verify ICU is not creeping into the build, since this is a regression check for the previous CVE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately we want something to replace the uses in https://github.com/envoyproxy/envoy/tree/master/source/common/chromium_url, which are around url::CanonicalizePath(). Can we do this without the ICU shim? What about pulling other URL parsing functionality out? Do we need build gymnastics to skip ICU in these cases, e.g. patch files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. So from my understanding, the caller site is

absl::optional<std::string> canonicalizePath(absl::string_view original_path) {
std::string canonical_path;
chromium_url::Component in_component(0, original_path.size());
chromium_url::Component out_component;
chromium_url::StdStringCanonOutput output(&canonical_path);
if (!chromium_url::CanonicalizePath(original_path.data(), in_component, &output,
&out_component)) {
return absl::nullopt;
} else {
output.Complete();
return absl::make_optional(std::move(canonical_path));
}
}
} // namespace
plus we need ParseStandardURL. I will try to look at this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a few attempts, I think we can't do a clean "import" e.g. referencing third_party/mozilla/url_parse.cc from a BUILD file in envoy can cause a visibility problem (can be solved by using exports_files but that requires a patch to upstream), without as @htuch said, a custom genrule_cmd.

re: fuzzing, the motivation was to make sure if we have a googleurl lib linked with the shimmed ICU lib should not causing crashes (but yeah this should be predictable by inputting URLs with IDN to the function). If we want to skip testing it via GURL class, I can remove the test and do testing by calling "url::CanonicalizePath" instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simple exports_files patch is OK. What we're really weary of is complex patches that are likely to break on version upgrades.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Submitted a draft here: #14583, patching the googleurl's url/BUILD file.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for working on this!


DEFINE_FUZZER(const uint8_t* buf, size_t len) {
const std::string input(reinterpret_cast<const char*>(buf), len);
GURL url(input);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add some accessors on the GURL to make sure that lazily computed attributes don't crash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it.

@@ -0,0 +1,45 @@
#pragma once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danzh2010 @alyssawilk do you know how reasonable it will be to add proper Chromium URL support for eliding unicode translation? I think this shim is fine for now, but ideally we can move away from needing to workarounds like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but cc @DavidSchinazi @ianswett just to have on our list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely doable to pull in the full Chromium IDN logic, but that has significant impact on binary size - not sure what the full set of design goals and requirements is here though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidSchinazi I think probably what we need is a way to build GURL without Unicode support (probably via defining build flag etc). Is that something viable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

htuch
htuch previously approved these changes Nov 20, 2020
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@dio
Copy link
Member Author

dio commented Nov 20, 2020

Requesting @envoyproxy/dependency-shepherds for deps approval. :)

@htuch
Copy link
Member

htuch commented Nov 20, 2020

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Nov 20, 2020
@dio
Copy link
Member Author

dio commented Nov 21, 2020

@yanavlasov do you need me to have another set of changes here?

@repokitteh-read-only repokitteh-read-only bot added deps Approval required for changes to Envoy's external dependencies and removed waiting labels Dec 8, 2020
@dio dio marked this pull request as draft December 11, 2020 06:42
dio added 6 commits December 11, 2020 14:54
This reverts commit 77a8099.

Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
Signed-off-by: Dhi Aurrahman <dio@tetrate.io>
@dio
Copy link
Member Author

dio commented Dec 18, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13808 (comment) was created by @dio.

see: more, trace.

@dio
Copy link
Member Author

dio commented Dec 18, 2020

@lizan sorry, do you think the failure is related with azp? https://dev.azure.com/cncf/4684fb3d-0389-4e0b-8251-221942316e06/_apis/build/builds/61726/logs/193

2020-12-18T02:23:56.7589199Z E: Failed to fetch https://packages.microsoft.com/ubuntu/18.04/prod/dists/bionic/main/binary-amd64/Packages.bz2  File has unexpected size (150752 != 150918). Mirror sync in progress? [IP: 40.117.131.251 443]

Please let me know if I need to do something on my end.

@dio dio marked this pull request as ready for review December 18, 2020 10:09
@dio
Copy link
Member Author

dio commented Dec 18, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13808 (comment) was created by @dio.

see: more, trace.

Base automatically changed from master to main January 15, 2021 23:01
@yanavlasov
Copy link
Contributor

I think you can close this one in favor of PR #14583

@dio
Copy link
Member Author

dio commented Jan 27, 2021

Yes, thank you for the reminder and review @yanavlasov!

@dio dio closed this Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deps Approval required for changes to Envoy's external dependencies

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants