Skip to content

Conversation

@vhvb1989
Copy link
Member

@vhvb1989 vhvb1989 commented Sep 7, 2021

An azure-core lib version without OpenSSL is required for the speech service which will consume only UUID (and HTTP in some future).

This PR makes some changes to the current azure core components consuming OpenSSL :

  • Uuid on Linux : It is used to generate a random buffer. Switching to use the win-implementation which uses the std lib lib.
  • base64 : Removing the current implementation calling openSSL for base64 for the azure-core for C99 implementation created in the past by @antkmsft
  • Crypto / Hashing : MD5 and Sha hashing is currently exposed on the public surface and implemented by calling OpenSSL lib. This PR is making all the Cryptographic namespace an optional feature that can be opt-out when building the azure-core lib

fixes: #2790

@vhvb1989 vhvb1989 self-assigned this Sep 7, 2021
@vhvb1989 vhvb1989 added Azure.Core Client This issue points to a problem in the data-plane of the library. Cognitive - Speech labels Sep 7, 2021
@vhvb1989 vhvb1989 added this to the [2021] October milestone Sep 7, 2021
Copy link
Member

@RickWinter RickWinter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will result in non-unique UUIDs on linux platforms. std::random is not cryptographically secure on linux

@vhvb1989
Copy link
Member Author

vhvb1989 commented Sep 9, 2021

Linux perf results:

64-bit Mersenne Twister by Matsumoto
=== Results ===
Completed 536,943 operations in a weighted-average of 10s (53,693.482313 ops/s, 1.86242e-05 s/op)

OpenSSL
=== Results ===
Completed 44,534 operations in a weighted-average of 10s (4,453.341869 ops/s, 0.00022455 s/op)

@vhvb1989 vhvb1989 requested a review from RickWinter September 9, 2021 18:34
RickWinter
RickWinter previously approved these changes Sep 13, 2021
@RickWinter RickWinter self-requested a review September 13, 2021 18:49
Copy link
Contributor

@ahsonkhan ahsonkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should expose the build crypto option, if we can remove the openssl dependency all together.

Will review base64 impl in a bit, do we have sufficient test coverage for it?

Copy link
Contributor

@ahsonkhan ahsonkhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need more testing of the base64 implementation, especially around edge cases (for integer overflow) and invalid input.


while (sourceIndex < static_cast<int64_t>(inputSize) - 4)
{
int64_t result = base64Decode(inputPtr + sourceIndex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If result is negative, it means invalid input and we should throw.

We need test cases like the following, which are invalid input:
"A" - too small
"AB" - too small
"ABC" - too small
"ABCD==" - not a multiple of 4
"ABCD====" - invalid padding count
"A%%%" - non-base64 chars

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input has to be of characters from "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

i0 |= i3;
i0 |= i2;

base64WriteThreeLowOrderBytes(destinationPtr, i0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 463 to 464
destination.resize(resultSize);
destination.shrink_to_fit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resizing strings and vectors after the fact seems to be a relatively expensive operation.
@CaseyCarter how do we avoid this, and maybe just create the vector of the right size at the end without having to do linear data copy or re-allocation? In .NET/C#, this was relatively easy with span/slicing for buffers.

Comment on lines 396 to 397
int64_t sourceIndex = 0;
int64_t destinationIndex = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto "indices into memory buffers, these should be size_t"

int64_t result = base64Decode(inputPtr + sourceIndex);
base64WriteThreeLowOrderBytes(destinationPtr, result);
destinationPtr += 3;
destinationIndex += 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

destinationIndex is unused.

vhvb1989 and others added 2 commits September 27, 2021 10:16
Co-authored-by: Ahson Khan <[email protected]>
Co-authored-by: Casey Carter <[email protected]>
Copy link
Member

@antkmsft antkmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casey has some good feedback, can we at least open work items? Re: random seems pretty important. Prefix increment, not so important, but I'd take it as well, but I am biased.

@vhvb1989 vhvb1989 enabled auto-merge (squash) October 27, 2021 20:59
@vhvb1989 vhvb1989 merged commit 3be793d into Azure:main Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Azure.Core Client This issue points to a problem in the data-plane of the library. Cognitive - Speech

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove OpenSSL dependency for Base64 encoding

5 participants