Conversation

@hililiwei
Contributor

@hililiwei hililiwei commented Dec 21, 2021

close #3169

The length() method of the OrcFileAppender class is modified. If the file is closed, the value of file.toInputFile().getLength() is returned; if it is not closed, the length is estimated as the memory usage estimated by the treeWriter plus the position of the last stripe. Reflection is used to get the treeWriter (it is private).
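Roughly, the reflection part looks like the following sketch (illustrative only, not the exact code in the patch; it assumes java.lang.reflect.Field, org.apache.orc.Writer, and org.apache.orc.impl.writer.TreeWriter, and the actual names may differ):

// Sketch only: pull the private TreeWriter out of the ORC writer so that its
// estimateMemory() can be used while the file is still open.
private static TreeWriter treeWriter(Writer writer) {
  try {
    Field field = writer.getClass().getDeclaredField("treeWriter");
    field.setAccessible(true);
    return (TreeWriter) field.get(writer);
  } catch (NoSuchFieldException | IllegalAccessException e) {
    throw new IllegalStateException("Failed to access the ORC treeWriter via reflection", e);
  }
}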

@github-actions github-actions bot added the core label Dec 21, 2021
@hililiwei hililiwei marked this pull request as draft December 22, 2021 03:26
@hililiwei hililiwei force-pushed the 3169 branch 2 times, most recently from 7af31f0 to 8c93b98 on December 27, 2021 06:27
@hililiwei hililiwei marked this pull request as ready for review December 27, 2021 07:30
@hililiwei hililiwei requested a review from rdblue January 20, 2022 13:17
@rdblue
Contributor

rdblue commented Jan 24, 2022

@hililiwei, can you describe how you're estimating the size of data that is buffered in memory for ORC? I think a description to explain to reviewers would help.

@hililiwei
Contributor Author

hililiwei commented Jan 27, 2022

@hililiwei, can you describe how you're estimating the size of data that is buffered in memory for ORC? I think a description to explain to reviewers would help.

If a file is still being written, its size is estimated in three steps:

  1. Data that has already been written to stripes. This value is obtained by summing the offset and length of the writer's last stripe.
  2. Data that has been submitted to the writer but not yet written to a stripe. When the OrcFileAppender is created, the treeWriter is obtained through reflection, and its estimateMemory method is used to estimate how much memory it is using.
  3. Data that has not yet been submitted to the writer, i.e. the size of the buffer. The buffer's default maximum size is used here.

Adding these three values gives the estimated data size (see the sketch below).
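Putting the parts together, the estimate is roughly the following (an illustrative sketch assuming org.apache.orc.StripeInformation; DEFAULT_BATCH_BUFFER_BYTES is a placeholder for the buffer's default maximum size, not a real constant in the code):

List<StripeInformation> stripes = writer.getStripes();
long flushed = 0;  // step 1: data already written to stripes
if (!stripes.isEmpty()) {
  StripeInformation last = stripes.get(stripes.size() - 1);
  flushed = last.getOffset() + last.getLength();
}
long buffered = treeWriter.estimateMemory();  // step 2: submitted to the writer, not yet in a stripe
long pending = DEFAULT_BATCH_BUFFER_BYTES;    // step 3: not yet submitted to the writer (upper bound)
long estimatedLength = flushed + buffered + pending;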

@rdblue
Contributor

rdblue commented Feb 6, 2022

@hililiwei, I don't understand what #3 is. Why is this tracking the data that hasn't been submitted to the writer? It seems like all you're doing is adding a constant to the estimated size. For Parquet, we use the current file offset plus the size that is buffered in memory.

@hililiwei
Contributor Author

@hililiwei, I don't understand what #3 is. Why is this tracking the data that hasn't been submitted to the writer? It seems like all you're doing is adding a constant to the estimated size. For Parquet, we use the current file offset plus the size that is buffered in memory.

#3 mainly refers to the data in the VectorizedRowBatch:

public void add(D datum) {
  try {
    valueWriter.write(datum, batch);  // rows are buffered into the VectorizedRowBatch first
    if (batch.size == this.batchSize) {
      writer.addRowBatch(batch);      // only a full batch is handed to the ORC writer
      batch.reset();
    }
  } catch (IOException ioe) {
    throw new RuntimeIOException(ioe, "Problem writing to ORC file %s", file.location());
  }
}

The data is written to the batch first.

@coolderli
Contributor

Any update on this? We found that ORC can save more storage space than Parquet, so I'd like to try the ORC file format.

@hililiwei
Contributor Author

Any update on this? We found that ORC can save more storage space than Parquet, so I'd like to try the ORC file format.

ping @rdblue @liubo1022126

@liubo1022126

@coolderli yes, Parquet query performance is worse than ORC when querying with Trino.

@hililiwei does this PR have any remaining unfinished work? I want to merge this PR into my branch.

@hililiwei
Contributor Author

@coolderli yes, Parquet query performance is worse than ORC when querying with Trino.

@hililiwei does this PR have any remaining unfinished work? I want to merge this PR into my branch.

For now, there are no major changes planned. However, I'm still waiting for comments from @rdblue or anyone else, so I may revise it again. 😄

@hililiwei
Contributor Author

cc @rdblue @openinx, could you please review when you have some time?

@openinx
Member

openinx commented Mar 23, 2022

There are 3 failure cases in the Travis CI report:

org.apache.iceberg.flink.actions.TestRewriteDataFilesAction > testRewriteAvoidRepeateCompress[catalogName=testhive, baseNamespace=, format=ORC] FAILED
    java.lang.AssertionError: Action should add 1 data file expected:<1> but was:<2>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.apache.iceberg.flink.actions.TestRewriteDataFilesAction.testRewriteAvoidRepeateCompress(TestRewriteDataFilesAction.java:367)

org.apache.iceberg.flink.actions.TestRewriteDataFilesAction > testRewriteAvoidRepeateCompress[catalogName=testhadoop, baseNamespace=, format=ORC] FAILED
    java.lang.AssertionError: Action should add 1 data file expected:<1> but was:<2>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.apache.iceberg.flink.actions.TestRewriteDataFilesAction.testRewriteAvoidRepeateCompress(TestRewriteDataFilesAction.java:367)

org.apache.iceberg.flink.actions.TestRewriteDataFilesAction > testRewriteAvoidRepeateCompress[catalogName=testhadoop_basenamespace, baseNamespace=l0.l1, format=ORC] FAILED
    java.lang.AssertionError: Action should add 1 data file expected:<1> but was:<2>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.apache.iceberg.flink.actions.TestRewriteDataFilesAction.testRewriteAvoidRepeateCompress(TestRewriteDataFilesAction.java:367)

return 0;
}

switch (primitive.getCategory()) {
Member

I think we need to align on the approach to estimating the average width for each data type. The basic rule is: we need to read GenericOrcWriters to see how those data types are encoded into the ORC column vectors. That is the occupied in-memory byte size without any columnar compression.

Contributor Author

The corresponding relationship is as follows:

BOOLEAN    ->  LongColumnVector
BYTE       ->  LongColumnVector
SHORT      ->  LongColumnVector
INT        ->  LongColumnVector
LONG       ->  LongColumnVector
FLOAT      ->  DoubleColumnVector
DOUBLE     ->  DoubleColumnVector
DATE       ->  LongColumnVector
TIMESTAMP  ->  TimestampColumnVector
BINARY     ->  BytesColumnVector
STRING     ->  BytesColumnVector
DECIMAL    ->  Decimal18Writer or Decimal38Writer

The corresponding byte estimates are:

LongColumnVector                 ->  8 bytes
DoubleColumnVector               ->  8 bytes
TimestampColumnVector            ->  12 bytes
Decimal18Writer/Decimal38Writer  ->  (precision + 4) / 2 bytes
BytesColumnVector                ->  128 bytes

How about this?
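For illustration, the per-row average could be assembled like this (a sketch against Iceberg's Schema, Type, and Types classes; estimateAvgRowWidth and estimateFieldWidth are hypothetical helpers, not the visitor in the patch, and the widths are the assumed values from the table above):

// Sketch only: sum assumed per-type widths over the schema's top-level fields.
static int estimateAvgRowWidth(Schema schema) {
  int width = 0;
  for (Types.NestedField field : schema.columns()) {
    width += estimateFieldWidth(field.type());
  }
  return width;
}

static int estimateFieldWidth(Type type) {
  switch (type.typeId()) {
    case BOOLEAN:
    case INTEGER:
    case LONG:
    case DATE:
      return 8;   // backed by LongColumnVector
    case FLOAT:
    case DOUBLE:
      return 8;   // backed by DoubleColumnVector
    case TIMESTAMP:
      return 12;  // backed by TimestampColumnVector
    case DECIMAL:
      return (((Types.DecimalType) type).precision() + 4) / 2;
    case STRING:
    case BINARY:
      return 128; // backed by BytesColumnVector, rough average
    default:
      return 16;  // assumed fallback for other/nested types
  }
}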

Member

The estimated byte size of a decimal is just precision + 2. As I said in another comment, each digit occupies just one byte; in fact, a BigDecimal's unscaled value is a BigInteger, and the BigInteger just encodes each digit into a byte.
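For example, under that rule a decimal(18, 2) value would be estimated at 18 + 2 = 20 bytes, versus (18 + 4) / 2 = 11 bytes with the formula proposed above.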

Contributor

@kbendick kbendick left a comment

Thanks @hililiwei. Left some further comments.

Additionally, is it possible for these changes to be backported to earlier Spark versions in subsequent PRs to make reviewing easier? It's possible I missed some discussion on this, so let me know if so.

Comment on lines +75 to +77
this.avgRowByteSize =
    OrcSchemaVisitor.visitSchema(orcSchema, new EstimateOrcAvgWidthVisitor()).stream().reduce(Integer::sum)
        .orElse(0);
Contributor

The use of orElse(0) concerns me somewhat.

Looking at its usage, it seems as though using an avgRowByteSize of 0 would mean that the entirety of batch.size would be unaccounted for in the estimate in the length function.

return (long) (dataLength + (estimateMemory + (long) batch.size * avgRowByteSize) * 0.2);

Under what situations would we expect this to reasonably return 0? Is that possible / expected in some edge case, or more indicative of a bug?

Would it make sense to default to some non-zero value (even 1) so that the ongoing batch.size isn't entirely dropped?

At the very least, it seems like we should potentially log a debug message stating that 0 is being used. If users are investigating ORC files being written at sizes they find strange, having a log would be beneficial.

Contributor Author

I initially set it to 1, but as long as the schema has a field, it won't be 0. Setting it to 1 might mask some problems. When the value is 0, we can raise a WARN in the log.
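Something along these lines, for instance (a sketch using an SLF4J logger, which Iceberg modules already use; not the exact code in the patch):

if (avgRowByteSize == 0) {
  LOG.warn("Computed average row byte size is 0; rows buffered in the current batch "
      + "will not be counted in the length estimate for {}", file.location());
}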

@hililiwei
Contributor Author

hililiwei commented Mar 24, 2022

Thanks @hililiwei. Left some further comments.

Additionally, is it possible for these changes to be backported to earlier Spark versions in subsequent PRs to make reviewing easier? It's possible I missed some discussion on this, so let me know if so.

Reverted the changes for the older Flink and Spark versions.

@Override
protected FileWriter<T, DataWriteResult> newWriter(PartitionSpec spec, StructLike partition) {
  // TODO: support ORC rolling writers
  if (fileFormat == FileFormat.ORC) {
Contributor

When ORC supports rolling writers, fileFormat will no longer be used anywhere.
Should we deprecate/remove it?

Contributor Author

@hililiwei hililiwei Mar 25, 2022

Since these methods involve multiple Spark/Flink versions, I suggest cleaning them up in a separate PR after this one is done.

@hililiwei hililiwei requested a review from openinx March 28, 2022 01:40
Member

@openinx openinx left a comment

Looks good to me now.

@openinx openinx merged commit 6d39f3c into apache:master Mar 28, 2022
@openinx
Member

openinx commented Mar 28, 2022

Got this merged now, thanks all for reviewing, and thanks @hililiwei for the contribution!

@hililiwei
Contributor Author

Thanks openinx and all for reviewing. 😃
I'm going to start porting this to the other supported Flink/Spark versions and do some cleanup.

@chenwyi2

chenwyi2 commented Sep 6, 2023

In OrcFileAppender, I found that writer.estimateMemory() is 0 and writer.getStripes() is empty. Why is that?
