- 
                Notifications
    
You must be signed in to change notification settings  - Fork 3.9k
 
Description
Describe the bug, including details regarding any error messages, version, and platform.
nums_rows() method is inherited from ParquetFileWriter. It seems that the method is designed to count the number of rows of RowGroups which are closed.
/// Number of rows in the yet started RowGroups.
///
/// Changes on the addition of a new RowGroup.
int64_t num_rows() const;But in the implementation of class FileSerializer : public ParquetFileWriter::Contents, the num_rows_ variable is only modified in Close method. It means that num_rows() method always returns 0 before Close is called.
And after Close of ParquetFileWriter is called we can no longer call nums_rows() method, because contents_ is reset in Close and we will get a null pointer exception if we call nums_rows().
So, in the current implementation of FileSerializer, num_rows() will only return 0 or throw exception.
int ParquetFileWriter::num_columns() const { return contents_->num_columns(); }
void ParquetFileWriter::Close() {
  if (contents_) {
    contents_->Close();
    file_metadata_ = contents_->metadata();
    contents_.reset();
  }
}
class FileSerializer : public ParquetFileWriter::Contents {
  int64_t num_rows() const override { return num_rows_; }
  void Close() override {
    if (is_open_) {
      if (row_group_writer_) {
        num_rows_ += row_group_writer_->num_rows();    // <- num_rows is only modified here and can never be read.
        ...
      }
    ...
    }
  }
  RowGroupWriter* AppendRowGroup(bool buffered_row_group) {
    if (row_group_writer_) {
      // <- A possible missing `num_rows_ += row_group_writer_->num_rows()` here.
      row_group_writer_->Close();
    }
    ...
  }
};Should we mark the virtual method in ParquetFileWriter deprecated and remove the method in next release? I guess the method which is used to count the row number during writing row groups is useless and our default implementation always returns 0 now. Otherwise we can fix the issue by following the comments above.
Component(s)
C++