Skip to content

Conversation

@wesm
Copy link
Member

@wesm wesm commented Sep 24, 2019

I have added a simple base64 implementation (Zlib license) to arrow/vendored from

https://github.com/ReneNyffenegger/cpp-base64

@kou
Copy link
Member

kou commented Sep 24, 2019

@wesm
Copy link
Member Author

wesm commented Sep 25, 2019

I think the strategy here is a stopgap so that we can release. I think we are trying to depend less in general on Boost so either way vendoring a base64 implementation may be a good idea

@kou
Copy link
Member

kou commented Sep 25, 2019

I see.

@wesm
Copy link
Member Author

wesm commented Sep 25, 2019

This needs exports for MSVC. Adding them

@wesm
Copy link
Member Author

wesm commented Sep 25, 2019

Here's running builds

Appveyor: https://ci.appveyor.com/project/wesm/arrow/builds/27646405
Travis CI: https://travis-ci.org/wesm/arrow/builds/589252353

Can be merged after a bit more of the builds run if deemed acceptable

in a product, an acknowledgment in the product documentation would be
appreciated but is not required.

2. Altered source versions must be plainly marked as such, and must not be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the source code hasn't been modified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unmodified

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be pedantic, the only difference is adding the arrow::util namespace.

@emkornfield
Copy link
Contributor

All C++ related builds have passed. I think this can merged.

As follow-ups it would be nice to have:

  1. Tests for forward compatibility if the encoding changes in some way
  2. Possibly adding integration smoke tests (including the one used that found this issue to some place in our CI). Maybe just running the parquet-mr tool?


while (in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_])) {
char_array_4[i++] = encoded_string[in_]; in_++;
if (i ==4) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit space is off here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we don't clang-format the files in vendored/*)

@wesm
Copy link
Member Author

wesm commented Sep 25, 2019

Whether UTF-8 is validated may depend on the Thrift library. It was only caught incidentally in a Python-based Thrift decoder because Python did decode('utf-8') on the bytes and failed

@emkornfield
Copy link
Contributor

+1, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants