ARROW-17735: [C++][Parquet] Optimize parquet reading for String/Binary type #14353
Changes from all commits
df7b91a
6c746d0
1a3cf43
5c8a253
3f34ecd
d534e0b
390323c
3294ace
f703c5a
8f83984
c4312a7
9104f0e
3a80c9d
50ec964
7b2e109
5512db8
85c99ce
8d611c6
19e2ba3
f10d019
6abd879
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -852,12 +852,23 @@ class ColumnReaderImplBase { | |||||||||||||
| current_encoding_ = encoding; | ||||||||||||||
| current_decoder_->SetData(static_cast<int>(num_buffered_values_), buffer, | ||||||||||||||
| static_cast<int>(data_size)); | ||||||||||||||
| if (!hasSet_uses_opt_) { | ||||||||||||||
| if (current_encoding_ == Encoding::PLAIN_DICTIONARY || | ||||||||||||||
| current_encoding_ == Encoding::PLAIN || | ||||||||||||||
| current_encoding_ == Encoding::RLE_DICTIONARY) { | ||||||||||||||
|
Comment on lines
+856
to
+858
Contributor
Are all these cases covered by unit tests?
Contributor
Author
Yes. Other encodings skip this optimization, and the existing unit tests cover all encodings.
Member
Just curious, why are other encodings not supported? |
||||||||||||||
| uses_opt_ = true; | ||||||||||||||
| } | ||||||||||||||
| hasSet_uses_opt_ = true; | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| int64_t available_values_current_page() const { | ||||||||||||||
| return num_buffered_values_ - num_decoded_values_; | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| bool hasSet_uses_opt_ = false; | ||||||||||||||
| bool uses_opt_ = false; | ||||||||||||||
|
Comment on lines
+869
to
+870
Contributor
Are these two flags really necessary?
Contributor
Author
These two flags are just there to avoid repeating the comparison every time.
Contributor
Try to keep it simple. I don't think there's any performance consideration here.
Member
Could you create a separate function for that, as Yibo suggested? If you do measure a meaningful performance difference, could you share your results? In addition, could you add a comment explaining why the optimization is only applicable to those three encodings?
Member
Hmm, also please find a more descriptive name than "uses optimization" (which optimization?). |
||||||||||||||
|
|
||||||||||||||
| const ColumnDescriptor* descr_; | ||||||||||||||
| const int16_t max_def_level_; | ||||||||||||||
| const int16_t max_rep_level_; | ||||||||||||||
|
|
@@ -1594,6 +1605,8 @@ class TypedRecordReader : public TypedColumnReaderImpl<DType>, | |||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| std::shared_ptr<ResizableBuffer> ReleaseOffsets() override { return nullptr; } | ||||||||||||||
|
Member
Suggested change
|
||||||||||||||
|
|
||||||||||||||
| std::shared_ptr<ResizableBuffer> ReleaseIsValid() override { | ||||||||||||||
| if (leaf_info_.HasNullableValues()) { | ||||||||||||||
| auto result = valid_bits_; | ||||||||||||||
|
|
@@ -1697,7 +1710,7 @@ class TypedRecordReader : public TypedColumnReaderImpl<DType>, | |||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| void ReserveValues(int64_t extra_values) { | ||||||||||||||
| void ReserveValues(int64_t extra_values) override { | ||||||||||||||
|
Member
Suggested change
|
||||||||||||||
| const int64_t new_values_capacity = | ||||||||||||||
| UpdateCapacity(values_capacity_, values_written_, extra_values); | ||||||||||||||
| if (new_values_capacity > values_capacity_) { | ||||||||||||||
|
|
@@ -1959,6 +1972,138 @@ class ByteArrayChunkedRecordReader : public TypedRecordReader<ByteArrayType>, | |||||||||||||
| typename EncodingTraits<ByteArrayType>::Accumulator accumulator_; | ||||||||||||||
| }; | ||||||||||||||
|
|
||||||||||||||
| class ByteArrayChunkedOptRecordReader : public TypedRecordReader<ByteArrayType>, | ||||||||||||||
| virtual public BinaryRecordReader { | ||||||||||||||
| public: | ||||||||||||||
| ByteArrayChunkedOptRecordReader(const ColumnDescriptor* descr, LevelInfo leaf_info, | ||||||||||||||
| ::arrow::MemoryPool* pool) | ||||||||||||||
| : TypedRecordReader<ByteArrayType>(descr, leaf_info, pool) { | ||||||||||||||
| DCHECK_EQ(descr_->physical_type(), Type::BYTE_ARRAY); | ||||||||||||||
| accumulator_.builder.reset(new ::arrow::BinaryBuilder(pool)); | ||||||||||||||
| values_ = AllocateBuffer(pool); | ||||||||||||||
| offset_ = AllocateBuffer(pool); | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| ::arrow::ArrayVector GetBuilderChunks() override { | ||||||||||||||
| if (uses_opt_) { | ||||||||||||||
| std::vector<std::shared_ptr<Buffer>> buffers = {ReleaseIsValid(), ReleaseOffsets(), | ||||||||||||||
| ReleaseValues()}; | ||||||||||||||
| auto data = std::make_shared<::arrow::ArrayData>( | ||||||||||||||
| ::arrow::binary(), values_written(), buffers, null_count()); | ||||||||||||||
|
Member
Is |
||||||||||||||
|
|
||||||||||||||
| auto chunks = ::arrow::ArrayVector({::arrow::MakeArray(data)}); | ||||||||||||||
| return chunks; | ||||||||||||||
| } else { | ||||||||||||||
| ::arrow::ArrayVector result = accumulator_.chunks; | ||||||||||||||
| if (result.size() == 0 || accumulator_.builder->length() > 0) { | ||||||||||||||
| std::shared_ptr<::arrow::Array> last_chunk; | ||||||||||||||
| PARQUET_THROW_NOT_OK(accumulator_.builder->Finish(&last_chunk)); | ||||||||||||||
| result.push_back(std::move(last_chunk)); | ||||||||||||||
| } | ||||||||||||||
| accumulator_.chunks = {}; | ||||||||||||||
| return result; | ||||||||||||||
|
Comment on lines
+1996
to
+2004
Contributor
Duplicates
Contributor
Author
The optimized RecordReader implementation is ByteArrayChunkedOptRecordReader, and it is only for the Binary/String/LargeBinary/LargeString types.
Contributor
This probably means it's not a good idea to create a new class for this optimized reader. That said, I don't have deep knowledge of the Parquet code and lack the bandwidth to investigate right now; someone else might comment.
Member
I agree, we shouldn't create another class. I don't see any reason this can't be used for the Decimal case.
Member
I've found locally that if I merge the implementations, the unit tests pass. Could you please merge them in the PR?
Contributor
Author
@wjones127
Member
Yes, those are the two classes I merged.
Member
Here is the changeset that allows the Could you incorporate those changes?
Member
+1 for not adding a separate class. It would be difficult to maintain if more optimizations are added. It would also be better to add an option so that users can manually turn the optimization off if something goes wrong with the new feature. |
||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| void ReadValuesDense(int64_t values_to_read) override { | ||||||||||||||
| if (uses_opt_) { | ||||||||||||||
| int64_t num_decoded = this->current_decoder_->DecodeArrowZeroCopy( | ||||||||||||||
| static_cast<int>(values_to_read), 0, NULLPTR, | ||||||||||||||
| (reinterpret_cast<int32_t*>(offset_->mutable_data()) + values_written_), | ||||||||||||||
| values_, 0, &binary_length_); | ||||||||||||||
| DCHECK_EQ(num_decoded, values_to_read); | ||||||||||||||
| } else { | ||||||||||||||
| int64_t num_decoded = this->current_decoder_->DecodeArrowNonNull( | ||||||||||||||
| static_cast<int>(values_to_read), &accumulator_); | ||||||||||||||
| CheckNumberDecoded(num_decoded, values_to_read); | ||||||||||||||
| ResetValues(); | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| void ReadValuesSpaced(int64_t values_to_read, int64_t null_count) override { | ||||||||||||||
| if (uses_opt_) { | ||||||||||||||
| int64_t num_decoded = this->current_decoder_->DecodeArrowZeroCopy( | ||||||||||||||
| static_cast<int>(values_to_read), static_cast<int>(null_count), | ||||||||||||||
| valid_bits_->mutable_data(), | ||||||||||||||
| (reinterpret_cast<int32_t*>(offset_->mutable_data()) + values_written_), | ||||||||||||||
| values_, values_written_, &binary_length_); | ||||||||||||||
| DCHECK_EQ(num_decoded, values_to_read - null_count); | ||||||||||||||
| } else { | ||||||||||||||
| int64_t num_decoded = this->current_decoder_->DecodeArrow( | ||||||||||||||
| static_cast<int>(values_to_read), static_cast<int>(null_count), | ||||||||||||||
| valid_bits_->mutable_data(), values_written_, &accumulator_); | ||||||||||||||
| CheckNumberDecoded(num_decoded, values_to_read - null_count); | ||||||||||||||
| ResetValues(); | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| void ReserveValues(int64_t extra_values) override { | ||||||||||||||
| const int64_t new_values_capacity = | ||||||||||||||
| UpdateCapacity(values_capacity_, values_written_, extra_values); | ||||||||||||||
| if (new_values_capacity > values_capacity_) { | ||||||||||||||
| PARQUET_THROW_NOT_OK( | ||||||||||||||
| values_->Resize(new_values_capacity * binary_per_row_length_, false)); | ||||||||||||||
|
Comment on lines
+2044
to
+2045
Member
If we make this an option:
Suggested change
|
||||||||||||||
| PARQUET_THROW_NOT_OK(offset_->Resize((new_values_capacity + 1) * 4, false)); | ||||||||||||||
|
|
||||||||||||||
| auto offset = reinterpret_cast<int32_t*>(offset_->mutable_data()); | ||||||||||||||
| offset[0] = 0; | ||||||||||||||
|
|
||||||||||||||
| values_capacity_ = new_values_capacity; | ||||||||||||||
| } | ||||||||||||||
| if (leaf_info_.HasNullableValues()) { | ||||||||||||||
| int64_t valid_bytes_new = bit_util::BytesForBits(values_capacity_); | ||||||||||||||
| if (valid_bits_->size() < valid_bytes_new) { | ||||||||||||||
| int64_t valid_bytes_old = bit_util::BytesForBits(values_written_); | ||||||||||||||
| PARQUET_THROW_NOT_OK(valid_bits_->Resize(valid_bytes_new, false)); | ||||||||||||||
| // Avoid valgrind warnings | ||||||||||||||
| memset(valid_bits_->mutable_data() + valid_bytes_old, 0, | ||||||||||||||
| valid_bytes_new - valid_bytes_old); | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
| std::shared_ptr<ResizableBuffer> ReleaseValues() override { | ||||||||||||||
| auto result = values_; | ||||||||||||||
| values_ = AllocateBuffer(this->pool_); | ||||||||||||||
| values_capacity_ = 0; | ||||||||||||||
| return result; | ||||||||||||||
| } | ||||||||||||||
| std::shared_ptr<ResizableBuffer> ReleaseOffsets() override { | ||||||||||||||
| auto result = offset_; | ||||||||||||||
| if (ARROW_PREDICT_FALSE(!hasCal_average_len_)) { | ||||||||||||||
|
Member
If we make it optional:
Suggested change
|
||||||||||||||
| auto offsetArr = reinterpret_cast<int32_t*>(offset_->mutable_data()); | ||||||||||||||
| const auto first_offset = offsetArr[0]; | ||||||||||||||
| const auto last_offset = offsetArr[values_written_]; | ||||||||||||||
| int64_t binary_length = last_offset - first_offset; | ||||||||||||||
| binary_per_row_length_ = binary_length / values_written_ + 1; | ||||||||||||||
| hasCal_average_len_ = true; | ||||||||||||||
| } | ||||||||||||||
| offset_ = AllocateBuffer(this->pool_); | ||||||||||||||
| binary_length_ = 0; | ||||||||||||||
| return result; | ||||||||||||||
| } | ||||||||||||||
| void ResetValues() { | ||||||||||||||
| if (values_written_ > 0) { | ||||||||||||||
| // Resize to 0, but do not shrink to fit | ||||||||||||||
| PARQUET_THROW_NOT_OK(valid_bits_->Resize(0, false)); | ||||||||||||||
|
Member
I ran the unit tests (parquet-arrow-test) with a debugger and found that this branch was never hit. Does that seem right? Could you add a test that validates this branch?
Contributor
Author
This just follows the existing code here: arrow/cpp/src/parquet/column_reader.cc, lines 1846 to 1851 in fc53ff8
|
||||||||||||||
| PARQUET_THROW_NOT_OK(offset_->Resize(0, false)); | ||||||||||||||
| PARQUET_THROW_NOT_OK(values_->Resize(0, false)); | ||||||||||||||
|
|
||||||||||||||
| values_written_ = 0; | ||||||||||||||
| values_capacity_ = 0; | ||||||||||||||
| null_count_ = 0; | ||||||||||||||
| binary_length_ = 0; | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
| private: | ||||||||||||||
| // Helper data structure for accumulating builder chunks | ||||||||||||||
| typename EncodingTraits<ByteArrayType>::Accumulator accumulator_; | ||||||||||||||
|
|
||||||||||||||
| int32_t binary_length_ = 0; | ||||||||||||||
|
|
||||||||||||||
| std::shared_ptr<::arrow::ResizableBuffer> offset_; | ||||||||||||||
| }; | ||||||||||||||
|
|
||||||||||||||
| class ByteArrayDictionaryRecordReader : public TypedRecordReader<ByteArrayType>, | ||||||||||||||
| virtual public DictionaryRecordReader { | ||||||||||||||
| public: | ||||||||||||||
|
|
@@ -2056,8 +2201,10 @@ std::shared_ptr<RecordReader> MakeByteArrayRecordReader(const ColumnDescriptor* | |||||||||||||
| bool read_dictionary) { | ||||||||||||||
| if (read_dictionary) { | ||||||||||||||
| return std::make_shared<ByteArrayDictionaryRecordReader>(descr, leaf_info, pool); | ||||||||||||||
| } else { | ||||||||||||||
| } else if (descr->logical_type()->is_decimal()) { | ||||||||||||||
| return std::make_shared<ByteArrayChunkedRecordReader>(descr, leaf_info, pool); | ||||||||||||||
| } else { | ||||||||||||||
| return std::make_shared<ByteArrayChunkedOptRecordReader>(descr, leaf_info, pool); | ||||||||||||||
| } | ||||||||||||||
| } | ||||||||||||||
|
|
||||||||||||||
|
|
||||||||||||||
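The review thread on `hasSet_uses_opt_` and `uses_opt_` asks for a separate, descriptively named function instead of two cached flags. A minimal sketch of that idea follows. It is not code from the PR: the `Encoding` enum is a stand-in that only mirrors enumerator names from `parquet::Encoding`, and `SupportsZeroCopyBinaryDecode` is a hypothetical name chosen for illustration.

```cpp
// Stand-in for parquet::Encoding, reduced to a few enumerators for this sketch.
enum class Encoding {
  PLAIN,
  PLAIN_DICTIONARY,
  RLE,
  DELTA_BYTE_ARRAY,
  RLE_DICTIONARY
};

// Hypothetical helper: returns true for the three encodings that the diff
// routes to the optimized binary read path, false for everything else.
constexpr bool SupportsZeroCopyBinaryDecode(Encoding encoding) {
  switch (encoding) {
    case Encoding::PLAIN:
    case Encoding::PLAIN_DICTIONARY:
    case Encoding::RLE_DICTIONARY:
      return true;
    default:
      return false;
  }
}
```

With a helper like this, the check is a handful of integer comparisons per data page, so caching the result in two extra member flags is unlikely to matter. That is consistent with the reviewers' point, although only a benchmark could confirm it.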
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -55,6 +55,8 @@ static constexpr uint32_t kDefaultMaxPageHeaderSize = 16 * 1024 * 1024; | |||||||||||||||||||||||||||||||||||||||||||||||||
| // 16 KB is the default expected page header size | ||||||||||||||||||||||||||||||||||||||||||||||||||
| static constexpr uint32_t kDefaultPageHeaderSize = 16 * 1024; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| static constexpr int32_t kDefaultBinaryPerRowSize = 20; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Member
Since this corresponds to
Suggested change
(Also change
Member
It would be better to add a comment here. |
||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| class PARQUET_EXPORT LevelDecoder { | ||||||||||||||||||||||||||||||||||||||||||||||||||
| public: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| LevelDecoder(); | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -291,6 +293,8 @@ class PARQUET_EXPORT RecordReader { | |||||||||||||||||||||||||||||||||||||||||||||||||
| /// \brief Pre-allocate space for data. Results in better flat read performance | ||||||||||||||||||||||||||||||||||||||||||||||||||
| virtual void Reserve(int64_t num_values) = 0; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| virtual void ReserveValues(int64_t capacity) {} | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Contributor
A new interface? Are these added data members and functions necessary for this base class? I suppose they are only for the new reader implementation.
Contributor
Author
Previously, it was an internal interface of TypedRecordReader: arrow/cpp/src/parquet/column_reader.cc, lines 1698 to 1720 in f82501e
ByteArrayChunkedOptRecordReader extends TypedRecordReader, so I extracted it as a public interface.
Contributor
Isn't it just a helper function specific to the implementation?
Member
Since these are coming from
Suggested change
Member
Would it be better to make it pure virtual? In addition, it would help to add a comment for a public function. |
||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| /// \brief Clear consumed values and repetition/definition levels as the | ||||||||||||||||||||||||||||||||||||||||||||||||||
| /// result of calling ReadRecords | ||||||||||||||||||||||||||||||||||||||||||||||||||
| virtual void Reset() = 0; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -299,6 +303,8 @@ class PARQUET_EXPORT RecordReader { | |||||||||||||||||||||||||||||||||||||||||||||||||
| /// allocated in subsequent ReadRecords calls | ||||||||||||||||||||||||||||||||||||||||||||||||||
| virtual std::shared_ptr<ResizableBuffer> ReleaseValues() = 0; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| virtual std::shared_ptr<ResizableBuffer> ReleaseOffsets() = 0; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+306
to
+307
Member
Suggested change
Member
Same here for a comment. |
||||||||||||||||||||||||||||||||||||||||||||||||||
| /// \brief Transfer filled validity bitmap buffer to caller. A new one will | ||||||||||||||||||||||||||||||||||||||||||||||||||
| /// be allocated in subsequent ReadRecords calls | ||||||||||||||||||||||||||||||||||||||||||||||||||
| virtual std::shared_ptr<ResizableBuffer> ReleaseIsValid() = 0; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -370,6 +376,9 @@ class PARQUET_EXPORT RecordReader { | |||||||||||||||||||||||||||||||||||||||||||||||||
| int64_t values_capacity_; | ||||||||||||||||||||||||||||||||||||||||||||||||||
| int64_t null_count_; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| bool hasCal_average_len_ = false; | ||||||||||||||||||||||||||||||||||||||||||||||||||
| int64_t binary_per_row_length_ = kDefaultBinaryPerRowSize; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+379
to
+380
Member
First, I think this would be clearer as a
Suggested change
|
||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| /// \brief Each bit corresponds to one element in 'values_' and specifies if it | ||||||||||||||||||||||||||||||||||||||||||||||||||
| /// is null or not null. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| std::shared_ptr<::arrow::ResizableBuffer> valid_bits_; | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
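The sizing arithmetic behind `kDefaultBinaryPerRowSize` and `binary_per_row_length_` can be sketched in isolation. The free functions below are hypothetical and only mirror the expressions in `ReleaseOffsets()` and `ReserveValues()` from the diff. The fallback for an empty batch is an assumption of this sketch; the diff divides by `values_written_` without such a guard.

```cpp
#include <cstdint>

// Default per-value size guess, taken from the diff.
constexpr int64_t kDefaultBinaryPerRowSize = 20;

// Average payload bytes per value for the first batch, plus one byte of slack,
// as computed in ReleaseOffsets(). Falls back to the default when nothing was
// written (this guard is an addition of the sketch, not part of the diff).
constexpr int64_t EstimateBytesPerValue(int32_t first_offset, int32_t last_offset,
                                        int64_t values_written) {
  return values_written <= 0
             ? kDefaultBinaryPerRowSize
             : (static_cast<int64_t>(last_offset) - first_offset) / values_written + 1;
}

// Bytes reserved in the values buffer for a given capacity, as in ReserveValues().
constexpr int64_t ValuesBufferBytes(int64_t capacity, int64_t bytes_per_value) {
  return capacity * bytes_per_value;
}

// Bytes reserved in the offsets buffer: one int32 per value plus the end offset.
constexpr int64_t OffsetsBufferBytes(int64_t capacity) {
  return (capacity + 1) * static_cast<int64_t>(sizeof(int32_t));
}
```

For example, a first batch of 100 values spanning 1000 bytes yields an estimate of 11 bytes per value, so a capacity of 1024 values reserves 11264 bytes for values and 4100 bytes for offsets. Because the estimate comes from one batch only, later batches with longer strings could still exceed the reserved space.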
Stick to snake_case for variables.

I was just following https://github.com/apache/arrow/blob/f82501e763ff48af610077f9525ae83cc3ab2e95/cpp/src/parquet/column_reader.cc#L1271. Why should these variables be named differently?
Thanks!

Don't mix camel case and snake case:
has_set_uses_opt