Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
52 commits
Select commit Hold shift + click to select a range
c3d45f6
build: Introduce GURL with shimmed ICU
dio Oct 29, 2020
da2dd95
Revert changes
dio Oct 29, 2020
34ee5c4
Remove unused param
dio Oct 29, 2020
f9add04
Fix comments
dio Oct 29, 2020
413a53e
Array in bash
dio Oct 29, 2020
0e622c4
Merge remote-tracking branch 'upstream/master'
dio Oct 29, 2020
40a65c8
Add uidna_openUTS46 test
dio Oct 29, 2020
4c639f1
Merge remote-tracking branch 'upstream/master'
dio Oct 31, 2020
3409bf8
Delete uidna
dio Nov 1, 2020
542c300
Random 2
dio Nov 1, 2020
40742b0
Remove cstdlib
dio Nov 1, 2020
74ec92c
Use expect
dio Nov 1, 2020
b320b8e
More tests
dio Nov 3, 2020
a69238f
Fix
dio Nov 3, 2020
ce9abef
Add fuzz
dio Nov 3, 2020
127968a
Use URL with IDN as corpus example
dio Nov 7, 2020
e36e797
Merge remote-tracking branch 'upstream/master'
dio Nov 7, 2020
86898d2
WIP: review comments
dio Nov 10, 2020
429afb3
Move check after testing
dio Nov 10, 2020
cbc6c36
Add test target
dio Nov 10, 2020
6263208
Debug
dio Nov 10, 2020
c005164
Debug again
dio Nov 10, 2020
9adc264
Run check
dio Nov 10, 2020
5550e24
Fix format
dio Nov 10, 2020
f285916
Debug
dio Nov 10, 2020
0945e28
Debug
dio Nov 10, 2020
feb9b80
Add another line
dio Nov 10, 2020
99e5f2b
Activate check
dio Nov 10, 2020
d4fc6f2
Fix
dio Nov 10, 2020
5068f77
Remove iostreams
dio Nov 10, 2020
b2a02c4
Move around stuff
dio Nov 10, 2020
71e6003
Update tags
dio Nov 10, 2020
241c6f4
Remove
dio Nov 10, 2020
52055e8
Remove
dio Nov 10, 2020
3fe1379
Fix format
dio Nov 11, 2020
86c2693
Fix format again
dio Nov 11, 2020
5aafdde
Exclude files which are part of ICU shims
dio Nov 11, 2020
96ec874
Superfluous s
dio Nov 11, 2020
b58e957
clang-tidy
dio Nov 11, 2020
64645dc
Format
dio Nov 11, 2020
91758d9
Skip all ICU for now
dio Nov 11, 2020
526b5ed
Merge remote-tracking branch 'upstream/master'
dio Nov 13, 2020
24e3c72
Merge remote-tracking branch 'upstream/master'
dio Dec 8, 2020
c5ef876
Merge remote-tracking branch 'upstream/master'
dio Dec 11, 2020
77a8099
Move shimmed icu to external repo
dio Dec 11, 2020
f4342d1
Revert "Move shimmed icu to external repo"
dio Dec 11, 2020
1c3a923
Symlink
dio Dec 11, 2020
87ba219
Remove WORKSPACE
dio Dec 11, 2020
60e5c13
Update WORKSPACE.filter.example
dio Dec 17, 2020
cbc1c09
Merge remote-tracking branch 'upstream/master'
dio Dec 18, 2020
e8d43dd
@envoy
dio Dec 18, 2020
5e06616
Merge remote-tracking branch 'upstream/master'
dio Jan 5, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .bazelrc
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@ build --action_env=CXX
build --action_env=LLVM_CONFIG
build --action_env=PATH

# Skip system ICU linking.
build --@com_googlesource_googleurl//build_config:system_icu=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way in CI to verify that we definitely don't have ICU in the binary somewhow? E.g. with readelf or the like.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may not need to deal with ICU at all here. We just need the ParseStandardURL function which does use or include anything from ICU.
I think what would work is this:

  1. define new cc_library in the googleurl/url/BUILD which only includes sources necessary to build the ParseStandardURL function. I think this should not involve any files that reference ICU.

Note that we may also need the RemoveURLWhitespace function. I'm yet unsure if this function is needed. I'll look into it later tonight.

  1. Add flag protected change to the Http::Utility::Url class to use the ParseStandardURL from google URL.
  2. Make sure all existing tests pass.

Copy link
Member Author

@dio dio Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yanavlasov sorry, a question. I can definitely pull only the ParseStandardURL related files (that will be url/third_party/mozilla/url_parse.{cc, h}) and let the Http::Utility::Url implementation to use that, however, how about:

#include "common/chromium_url/url_canon.h"
#include "common/chromium_url/url_canon_stdstring.h"
? Are we going to keep chromium_url around? cc. @htuch.

If yes then we don't need to pull googleurl from googlesource.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will removal common/chromium_url in favor of this eventually, but will need to keep it around while we do the runtime guarded transition.

We should still verify ICU is not creeping into the build, since this is a regression check for the previous CVE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately we want something to replace the uses in https://github.com/envoyproxy/envoy/tree/master/source/common/chromium_url, which are around url::CanonicalizePath(). Can we do this without the ICU shim? What about pulling other URL parsing functionality out? Do we need build gymnastics to skip ICU in these cases, e.g. patch files?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. So from my understanding, the caller site is

absl::optional<std::string> canonicalizePath(absl::string_view original_path) {
std::string canonical_path;
chromium_url::Component in_component(0, original_path.size());
chromium_url::Component out_component;
chromium_url::StdStringCanonOutput output(&canonical_path);
if (!chromium_url::CanonicalizePath(original_path.data(), in_component, &output,
&out_component)) {
return absl::nullopt;
} else {
output.Complete();
return absl::make_optional(std::move(canonical_path));
}
}
} // namespace
plus we need ParseStandardURL. I will try to look at this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a few attempts, I think we can't do a clean "import" e.g. referencing third_party/mozilla/url_parse.cc from a BUILD file in envoy can cause a visibility problem (can be solved by using exports_files but that requires a patch to upstream), without as @htuch said, a custom genrule_cmd.

re: fuzzing, the motivation was to make sure if we have a googleurl lib linked with the shimmed ICU lib should not causing crashes (but yeah this should be predictable by inputting URLs with IDN to the function). If we want to skip testing it via GURL class, I can remove the test and do testing by calling "url::CanonicalizePath" instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simple exports_files patch is OK. What we're really weary of is complex patches that are likely to break on version upgrades.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Submitted a draft here: #14583, patching the googleurl's url/BUILD file.


# Common flags for sanitizers
build:sanitizer --define tcmalloc=disabled
build:sanitizer --linkopt -ldl
Expand Down
4 changes: 4 additions & 0 deletions WORKSPACE
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ load("//bazel:api_repositories.bzl", "envoy_api_dependencies")

envoy_api_dependencies()

load("//bazel:icu_binding.bzl", "envoy_icu_binding")

envoy_icu_binding()

load("//bazel:repositories.bzl", "envoy_dependencies")

envoy_dependencies()
Expand Down
52 changes: 52 additions & 0 deletions bazel/external/googleurl.patch
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# TODO(dio): Consider to remove this patch when we solely compile the project using clang-cl.
# Tracked in https://github.com/envoyproxy/envoy/issues/11974.

diff --git a/base/compiler_specific.h b/base/compiler_specific.h
index 0cd36dc..8c4cbd4 100644
--- a/base/compiler_specific.h
+++ b/base/compiler_specific.h
@@ -7,10 +7,6 @@

#include "build/build_config.h"

-#if defined(COMPILER_MSVC) && !defined(__clang__)
-#error "Only clang-cl is supported on Windows, see https://crbug.com/988071"
-#endif
-
// Annotate a variable indicating it's ok if the variable is not used.
// (Typically used to silence a compiler warning when the assignment
// is important for some other reason.)
@@ -55,8 +51,12 @@
// prevent code folding, see gurl_base::debug::Alias.
// Use like:
// void NOT_TAIL_CALLED FooBar();
-#if defined(__clang__) && __has_attribute(not_tail_called)
+#if defined(__clang__)
+#if defined(__has_attribute)
+#if __has_attribute(not_tail_called)
#define NOT_TAIL_CALLED __attribute__((not_tail_called))
+#endif
+#endif
#else
#define NOT_TAIL_CALLED
#endif
@@ -226,7 +226,9 @@
#endif
#endif

-#if defined(__clang__) && __has_attribute(uninitialized)
+#if defined(__clang__)
+#if defined(__has_attribute)
+#if __has_attribute(uninitialized)
// Attribute "uninitialized" disables -ftrivial-auto-var-init=pattern for
// the specified variable.
// Library-wide alternative is
@@ -257,6 +259,8 @@
// E.g. platform, bot, benchmark or test name in patch description or next to
// the attribute.
#define STACK_UNINITIALIZED __attribute__((uninitialized))
+#endif
+#endif
#else
#define STACK_UNINITIALIZED
#endif
9 changes: 9 additions & 0 deletions bazel/external/icu/BUILD
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
licenses(["notice"]) # Apache 2

test_suite(
name = "ci_tests",
tests = [
"//bazel/external/icu/googleurl:ci_tests",
"@org_unicode_icuuc//:ci_tests",
],
)
47 changes: 47 additions & 0 deletions bazel/external/icu/googleurl/BUILD
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
load(
"//bazel:envoy_build_system.bzl",
"envoy_cc_fuzz_test",
"envoy_cc_test",
"envoy_package",
"envoy_sh_test",
)

licenses(["notice"]) # Apache 2

envoy_package()

envoy_cc_fuzz_test(
name = "googleurl_fuzz_test",
srcs = ["googleurl_fuzz_test.cc"],
corpus = "googleurl_corpus",
external_deps = [
"googleurl",
],
)

envoy_cc_test(
name = "googleurl_test",
srcs = ["googleurl_test.cc"],
external_deps = [
"googleurl",
],
)

envoy_sh_test(
name = "system_icu_test",
srcs = ["system_icu_test.sh"],
coverage = False,
data = ["//bazel/external/icu/googleurl:googleurl_test"],
tags = [
"skip_on_windows",
],
)

test_suite(
name = "ci_tests",
tests = [
"googleurl_fuzz_test",
"googleurl_test",
"system_icu_test",
],
)
1 change: 1 addition & 0 deletions bazel/external/icu/googleurl/googleurl_corpus/example
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
https://\xe5\x85\x89.example/
24 changes: 24 additions & 0 deletions bazel/external/icu/googleurl/googleurl_fuzz_test.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
#include "test/fuzz/fuzz_runner.h"

#include "url/gurl.h"

namespace Envoy {
namespace Fuzz {
namespace {

DEFINE_FUZZER(const uint8_t* buf, size_t len) {
const std::string input(reinterpret_cast<const char*>(buf), len);

GURL url(input);
url.is_valid();
url.scheme();
url.host();
url.EffectiveIntPort();
url.path();
url.query();
url.ref();
}

} // namespace
} // namespace Fuzz
} // namespace Envoy
47 changes: 47 additions & 0 deletions bazel/external/icu/googleurl/googleurl_test.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
#include "gtest/gtest.h"
#include "url/gurl.h"

namespace Envoy {
namespace {

void expectInvalidUrl(const std::string& input) {
GURL url(input);
EXPECT_FALSE(url.is_valid());
}

} // namespace

// Verifies that valid URLs are parsed correctly.
TEST(GoogleUrl, ValidUrl) {
GURL url("https://example.org/test?foo=bar#section");
EXPECT_TRUE(url.is_valid());
EXPECT_EQ(url.scheme(), "https");
EXPECT_EQ(url.host(), "example.org");
EXPECT_EQ(url.EffectiveIntPort(), 443);
EXPECT_EQ(url.path(), "/test");
EXPECT_EQ(url.query(), "foo=bar");
EXPECT_EQ(url.ref(), "section");

GURL punycode("https://xn--c1yn36f.example");
EXPECT_TRUE(punycode.is_valid());

GURL percent_encoded_valid("https://%20.example");
EXPECT_TRUE(percent_encoded_valid.is_valid());

GURL extra_slashes("https:///host");
EXPECT_TRUE(extra_slashes.is_valid());
EXPECT_EQ(extra_slashes.host(), "host");
}

// Verifies that invalid URLs can be handled and ensures that GURL (with shimmed ICU) works properly
// when parsing URLs with IDN host name.
TEST(GoogleUrl, InvalidUrl) {
// Ensure ICU shim is functioning correctly, i.e. not crashing and resulting invalid parsed URL.
expectInvalidUrl("https://\xe5\x85\x89.example/");

expectInvalidUrl("http://\xef\xb9\xaa.com");
expectInvalidUrl("https://%Da%aa.example");
expectInvalidUrl("http://[wwww].example");
expectInvalidUrl("http://?k=v");
}
} // namespace Envoy
18 changes: 18 additions & 0 deletions bazel/external/icu/googleurl/system_icu_test.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
#!/bin/bash

set -e

TEST_BIN="${TEST_SRCDIR}/envoy/bazel/external/icu/googleurl/googleurl_test"

if [[ $(uname) == "Darwin" ]]; then
echo "Skipping on macOS."
exit 0
fi

if readelf -d "${TEST_BIN}" | grep "NEEDED" | grep "icu"; then
echo "${TEST_BIN} is linked to system ICU."
exit 1
fi

echo "${TEST_BIN} is not linked to system ICU."
exit 0
29 changes: 29 additions & 0 deletions bazel/external/icu/shim/BUILD
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
load("@rules_cc//cc:defs.bzl", "cc_library", "cc_test")

licenses(["notice"]) # Apache 2

cc_library(
name = "common",
hdrs = [
"common/unicode/uidna.h",
"common/unicode/utypes.h",
],
includes = ["common"],
visibility = ["//visibility:public"],
)

cc_test(
name = "common_test",
srcs = ["common_test.cc"],
visibility = ["//visibility:public"],
deps = [
":common",
],
)

test_suite(
name = "ci_tests",
tests = [
"common_test",
],
)
45 changes: 45 additions & 0 deletions bazel/external/icu/shim/common/unicode/uidna.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
// NOLINT(namespace-envoy)
#pragma once

#include "unicode/utypes.h"

// Initialize UIDNAInfo object with UIDNA_ERROR_DISALLOWED as its errors (any non-zero value should
// work).
#define UIDNA_INFO_INITIALIZER \
{ 0x80 }

// This enum is declared to satisfy uidna_openUTS46 when initializing UIDNAWrapper
// https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#50.
enum { UIDNA_CHECK_BIDI = 4 };

// The fake UIDNA.
struct UIDNA {};

// The fake UIDNAInfo.
struct UIDNAInfo {
uint32_t errors;
};

// This is called by IDNToASCII
// (https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#85)
// which is responsible for converting the Unicode input representing a hostname to ASCII using IDN
// rules. In this shimmed implementation, we always set the error code to U_ILLEGAL_ARGUMENT_ERROR
// and info errors to UIDNA_ERROR_DISALLOWED, hence the coversion always fails.
int32_t uidna_nameToASCII(const UIDNA*, const UChar*, int32_t, UChar*, int32_t, UIDNAInfo* info,
UErrorCode* error_code) {
*error_code = U_ILLEGAL_ARGUMENT_ERROR;
return 0;
}

// This is called inside the UIDNAWrapper
// (https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#50)
// and should be accessed only through GetUIDNA
// (https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#66).
// This pattern is used inside GURL to initialize a static value at first use. It returns an
// instance of UIDNA to satisfy the check (GURL_DCHECK(uidna != nullptr);) here:
// https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#89.
UIDNA* uidna_openUTS46(uint32_t options, UErrorCode* error_code) { return new UIDNA; }

// While this is never be called, it needs to be defined since:
// https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#53.
const char* u_errorName(UErrorCode) { return "U_ILLEGAL_ARGUMENT_ERROR"; }
33 changes: 33 additions & 0 deletions bazel/external/icu/shim/common/unicode/utypes.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
// NOLINT(namespace-envoy)
#pragma once

#include <cstdint>

using UBool = bool;
using UChar = uint16_t;

enum UErrorCode {
// This value is used for initializing error code in this line:
// https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#91.
U_ZERO_ERROR = 0,

// This makes info.errors in this line:
// https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#102
// to be non-zero.
U_ILLEGAL_ARGUMENT_ERROR = 1,

// Required in this line:
// https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#102.
U_BUFFER_OVERFLOW_ERROR = 15
};

// This is called inside IDNToASCII
// (https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#95),
// this should always return false.
static inline UBool U_SUCCESS(UErrorCode) { return false; }

// This is called by UIDNAWrapper constructor
// (https://quiche.googlesource.com/googleurl/+/ef0d23689e240e6c8de4c3a5296b209128c87373/url/url_idna_icu.cc#51).
// This should always return false, since the initialization of UIDNAWrapper should always be
// successful.
static inline UBool U_FAILURE(UErrorCode) { return false; }
48 changes: 48 additions & 0 deletions bazel/external/icu/shim/common_test.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
// NOLINT(namespace-envoy)
// Basic smoke test to ensure that ICU shim works properly.
#include <iostream>

#include "unicode/uidna.h"

#define ASSERT_EQ(v1, v2) \
if ((v1) != (v2)) { \
std::cerr << "Expected equality of" << std::endl \
<< " " << #v1 << " (equal to " << (v1) << ")" << std::endl \
<< "and" << std::endl \
<< " " << #v2 << " (equal to " << (v2) << ")" << std::endl; \
return 1; \
}

int main(int argc, char** argv) {
{
UErrorCode err = U_ZERO_ERROR;
auto uidna = uidna_openUTS46(0, &err);

ASSERT_EQ(uidna != nullptr, true);
ASSERT_EQ(err, U_ZERO_ERROR);

delete uidna;
}

{
UErrorCode err = U_ZERO_ERROR;
UIDNAInfo info = UIDNA_INFO_INITIALIZER;
uidna_nameToASCII(nullptr, nullptr, 0, nullptr, 0, &info, &err);

ASSERT_EQ(info.errors, 0x80);
ASSERT_EQ(err, U_ILLEGAL_ARGUMENT_ERROR);
}

{
UErrorCode err = U_ZERO_ERROR;
UIDNAInfo info = UIDNA_INFO_INITIALIZER;
UChar data[] = {'1', '2', '3'};
UChar* src = &data[0];
uidna_nameToASCII(nullptr, src, 0, nullptr, *src, &info, &err);

ASSERT_EQ(info.errors, 0x80);
ASSERT_EQ(err, U_ILLEGAL_ARGUMENT_ERROR);
}

return 0;
}
Loading