ARROW-6825: [C++] Rework CSV reader IO around readahead iterator #5727
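For context on the title: a readahead iterator pulls items from a source on a background thread and buffers a bounded number of them, so that reading the next block overlaps with parsing the current one. The sketch below is a generic illustration under stated assumptions, not the iterator this PR adds to Arrow: the class name `ReadaheadIterator` and its interface are hypothetical, and the source is assumed to be a `std::function` that returns `std::nullopt` at end of stream.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical, generic readahead wrapper (not Arrow's implementation).
// A background thread pulls items from `source` and buffers up to
// `max_readahead` of them. `source` returns std::nullopt at end of stream.
template <typename T>
class ReadaheadIterator {
 public:
  ReadaheadIterator(std::function<std::optional<T>()> source,
                    std::size_t max_readahead)
      : source_(std::move(source)),
        max_readahead_(std::max<std::size_t>(1, max_readahead)),
        worker_([this] { Run(); }) {}

  ReadaheadIterator(const ReadaheadIterator&) = delete;
  ReadaheadIterator& operator=(const ReadaheadIterator&) = delete;

  ~ReadaheadIterator() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Blocks until an item is buffered or the source is exhausted.
  std::optional<T> Next() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return !queue_.empty() || done_; });
    if (queue_.empty()) return std::nullopt;
    T item = std::move(queue_.front());
    queue_.pop_front();
    cv_.notify_all();  // wake the worker: the buffer has room again
    return item;
  }

 private:
  void Run() {
    while (true) {
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return queue_.size() < max_readahead_ || stop_; });
        if (stop_) return;
      }
      // The (possibly slow) read happens without holding the lock.
      std::optional<T> item = source_();
      const bool exhausted = !item.has_value();
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (exhausted) {
          done_ = true;
        } else {
          queue_.push_back(std::move(*item));
        }
      }
      cv_.notify_all();
      if (exhausted) return;
    }
  }

  std::function<std::optional<T>()> source_;
  std::size_t max_readahead_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<T> queue_;
  bool done_ = false;
  bool stop_ = false;
  std::thread worker_;  // declared last so all state above exists before it starts
};
```

Design note: the source is called outside the lock, so a slow read never blocks the consumer from taking items that are already buffered, while the bounded queue caps memory use.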
@kou There is a weird GitHub Actions error here: https://github.com/apache/arrow/pull/5727/checks?check_run_id=273513110
It looks like GitHub uses PowerShell by default on Windows runners?
Edit: indeed: https://github.blog/changelog/2019-10-17-github-actions-default-shell-on-windows-runners-is-changing-to-powershell/
Force-pushed from f487461 to c2804d0.
bkietz left a comment:
Very elegant. Just a few comments.
cpp/src/arrow/json/chunker.h (outdated)
This file has so little in it that it could be folded into json/reader.cc.
chunker.h is small, but chunker.cc is non-trivial IMHO.
Make the delimiting chunker a common facility used by CSV and JSON.
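The idea behind a delimiting chunker, sketched for readers of the commit above: cut each block of input at its last delimiter so the parser only sees complete records, and carry the trailing partial record over into the next block. This is a minimal sketch assuming newline-delimited records with no newlines inside quoted fields (a real CSV chunker has to track quoting); `SplitAtLastDelimiter` and `ChunkBlocks` are hypothetical names, not Arrow's chunker API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: split `block` into (whole, partial), where `whole`
// ends at the last `delim` and `partial` is the trailing incomplete record.
// Assumes `delim` never occurs inside a quoted field.
std::pair<std::string_view, std::string_view> SplitAtLastDelimiter(
    std::string_view block, char delim = '\n') {
  const std::size_t pos = block.rfind(delim);
  if (pos == std::string_view::npos) {
    return {std::string_view{}, block};  // no complete record yet
  }
  return {block.substr(0, pos + 1), block.substr(pos + 1)};
}

// Hypothetical driver: feed blocks through the splitter, prepending the
// previous partial record to each block; the final partial is flushed at EOF.
std::vector<std::string> ChunkBlocks(const std::vector<std::string>& blocks,
                                     char delim = '\n') {
  std::vector<std::string> chunks;
  std::string leftover;
  for (const auto& block : blocks) {
    const std::string data = leftover + block;
    const auto [whole, partial] = SplitAtLastDelimiter(data, delim);
    if (!whole.empty()) chunks.emplace_back(whole);
    leftover.assign(partial.data(), partial.size());  // copy before `data` dies
  }
  if (!leftover.empty()) chunks.push_back(leftover);
  return chunks;
}
```

For example, the blocks `"ab,cd\nef"`, `",gh\n"` and `"ij"` come out as the chunks `"ab,cd\n"`, `"ef,gh\n"` and `"ij"`: the record split across the first two blocks is reassembled before parsing.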
Force-pushed from c2804d0 to 68a5a02.
@bkietz I think I've addressed all your comments.
bkietz left a comment:
LGTM
```cpp
  // Two blocks
  auto csv1 = MakeCSVData({"ab,cd\n"});
  auto csv2 = MakeCSVData({"ef,"});
  AssertParseFinal(parser, {csv1, csv2});
```
@pitrou I'm getting "ambiguous function call" compilation errors with this and with AssertParseOk (line 241). I'm not sure why this was not caught by the CI builds. I am building with tests, Gandiva and JNI ON. Could you please take a look and let me know whether I should set any compiler flags? Thanks.
Can you post the full compiler output?
```
/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:241:5: error: call to 'AssertParseOk' is ambiguous
    AssertParseOk(parser, {csv1, csv2});
    ^~~~~~~~~~~~~
/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:138:6: note: candidate function
void AssertParseOk(BlockParser& parser, const std::string& str) {
     ^
/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:144:6: note: candidate function
void AssertParseOk(BlockParser& parser, const std::vector<util::string_view>& data) {
     ^
/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:393:3: error: call to 'AssertParseFinal' is ambiguous
  AssertParseFinal(parser, {csv1, csv2});
  ^~~~~~~~~~~~~~~~
/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:150:6: note: candidate function
void AssertParseFinal(BlockParser& parser, const std::string& str) {
     ^
/Users/travis/build/dremio/arrow-build/arrow/cpp/src/arrow/csv/parser_test.cc:156:6: note: candidate function
void AssertParseFinal(BlockParser& parser, const std::vector<util::string_view>& data) {
     ^
```
Can you try the following patch?
```diff
diff --git a/cpp/src/arrow/csv/parser_test.cc b/cpp/src/arrow/csv/parser_test.cc
index f988f3ce2..418340f54 100644
--- a/cpp/src/arrow/csv/parser_test.cc
+++ b/cpp/src/arrow/csv/parser_test.cc
@@ -135,16 +135,25 @@ Status ParseFinal(BlockParser& parser, const std::string& str, uint32_t* out_siz
   return parser.ParseFinal(util::string_view(str), out_size);
 }
 
+std::vector<util::string_view> ViewsFromStrings(const std::vector<std::string>& data) {
+  std::vector<util::string_view> views(data.size());
+  for (size_t i = 0; i < data.size(); ++i) {
+    views[i] = data[i];
+  }
+  return views;
+}
+
 void AssertParseOk(BlockParser& parser, const std::string& str) {
   uint32_t parsed_size = static_cast<uint32_t>(-1);
   ASSERT_OK(Parse(parser, str, &parsed_size));
   ASSERT_EQ(parsed_size, str.size());
 }
 
-void AssertParseOk(BlockParser& parser, const std::vector<util::string_view>& data) {
+void AssertParseOk(BlockParser& parser, const std::vector<std::string>& data) {
   uint32_t parsed_size = static_cast<uint32_t>(-1);
-  ASSERT_OK(parser.Parse(data, &parsed_size));
-  ASSERT_EQ(parsed_size, TotalViewLength(data));
+  auto views = ViewsFromStrings(data);
+  ASSERT_OK(parser.Parse(views, &parsed_size));
+  ASSERT_EQ(parsed_size, TotalViewLength(views));
 }
 
 void AssertParseFinal(BlockParser& parser, const std::string& str) {
@@ -153,10 +162,11 @@ void AssertParseFinal(BlockParser& parser, const std::string& str) {
   ASSERT_EQ(parsed_size, str.size());
 }
 
-void AssertParseFinal(BlockParser& parser, const std::vector<util::string_view>& data) {
+void AssertParseFinal(BlockParser& parser, const std::vector<std::string>& data) {
   uint32_t parsed_size = static_cast<uint32_t>(-1);
-  ASSERT_OK(parser.ParseFinal(data, &parsed_size));
-  ASSERT_EQ(parsed_size, TotalViewLength(data));
+  auto views = ViewsFromStrings(data);
+  ASSERT_OK(parser.ParseFinal(views, &parsed_size));
+  ASSERT_EQ(parsed_size, TotalViewLength(views));
 }
 
 void AssertParsePartial(BlockParser& parser, const std::string& str,
```
Same error with the patch.
Enclosing the elements of the initializer list in braces seems to work. I pushed a patch: https://github.com/apache/arrow/pull/5791/files. Please merge it if it looks all right. Thanks.
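Background on this kind of error: when a braced list such as `{csv1, csv2}` is passed to an overloaded function, every overload whose parameter type can be list-initialized from it is viable, and in general two conversions that go through different constructors cannot be ranked against each other, so the call is reported as ambiguous. Why the `std::string` overload was considered viable on that particular macOS toolchain is not established in this thread. The sketch below uses hypothetical `Describe` overloads (not the Arrow test helpers) to illustrate the general rule, and resolves the call by naming the intended parameter type explicitly; this is an alternative technique, not a copy of the patch in #5791.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical overload set: both parameter types can be list-initialized
// from a braced list of two std::strings. The return value identifies
// which overload was selected.
int Describe(const std::pair<std::string, std::string>&) { return 1; }
int Describe(const std::vector<std::string>&) { return 2; }

int CallWithTwoStrings(const std::string& a, const std::string& b) {
  // Describe({a, b});  // expected to be rejected as ambiguous: each overload
  //                    // is reached through a different user-defined
  //                    // conversion (pair's constructor vs. vector's
  //                    // initializer_list constructor).
  // Naming the intended parameter type leaves only one viable overload:
  return Describe(std::vector<std::string>{a, b});
}
```

Spelling out the type at the call site costs a few characters but makes the chosen overload independent of how a given standard library constrains its constructors.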