
Conversation

@dungdm93 (Contributor) commented Feb 15, 2022

This PR aims to:

  1. replace the BaseTaskWriter inner classes (BaseRollingWriter, RollingFileWriter, RollingEqDeleteWriter)
    with the respective implementations of the RollingFileWriter interface.
  2. provide a single implementation of TaskWriter that can handle both partitioned & unpartitioned data (by delegating to the PartitioningWriter)

Here is my approach, from the top down:

TaskWriter
    |
    V
PartitioningWriter
    |
    V
RollingFileWriter
    |
    V
FileWriter
    |
    V
FileAppender
  1. TaskWriter handles different kinds of records.
    • With insert-only data, TaskWriter basically calls PartitioningWriter.write. See DirectTaskWriter for more details.
    • With delta data, TaskWriter can hold 3 PartitioningWriters: insertWriter, equalityDeleteWriter and positionDeleteWriter. For each incoming record, based on its type (insert, update or delete), TaskWriter calls the corresponding writer. See FlinkTaskWriter for more details, and the sketch after this list.
  2. PartitioningWriter writes to multiple specs and partitions.
    Note that for unpartitioned tables, partition = null is passed to PartitioningWriter.write.
  3. Internally, PartitioningWriter already uses RollingFileWriter to roll over to a new file when the current one grows too large. RollingFileWriter is just a wrapper around another FileWriter.
  4. FileWriter writes to a single file.
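
To make the routing in step 1 concrete, here is a minimal sketch. Only TaskWriter, PartitioningWriter and the PartitioningWriter.write(row, spec, partition) signature come from this design; the class name, fields and routing body are illustrative assumptions, and position deletes are omitted for brevity:

import org.apache.flink.table.data.RowData;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.DataWriteResult;
import org.apache.iceberg.io.DeleteWriteResult;
import org.apache.iceberg.io.PartitioningWriter;

class DeltaRoutingSketch {
  private final PartitioningWriter<RowData, DataWriteResult> insertWriter;
  private final PartitioningWriter<RowData, DeleteWriteResult> equalityDeleteWriter;
  private final PartitionSpec spec;

  DeltaRoutingSketch(PartitioningWriter<RowData, DataWriteResult> insertWriter,
                     PartitioningWriter<RowData, DeleteWriteResult> equalityDeleteWriter,
                     PartitionSpec spec) {
    this.insertWriter = insertWriter;
    this.equalityDeleteWriter = equalityDeleteWriter;
    this.spec = spec;
  }

  // partition is null for unpartitioned tables (see step 2 above)
  void write(RowData row, StructLike partition) {
    switch (row.getRowKind()) {
      case INSERT:
      case UPDATE_AFTER:
        insertWriter.write(row, spec, partition);         // insert path
        break;
      case UPDATE_BEFORE:
      case DELETE:
        equalityDeleteWriter.write(row, spec, partition); // delete path
        break;
    }
  }
}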

Why we need that

  • The inner classes are bound to the outer object (the BaseTaskWriter instance), which makes the code more complex and harder to read and understand.
  • Currently, the approach of separate classes for partitioned & unpartitioned data is still OK, but it's better to unify them into a single class, which means:
    • Less effort is needed for a new engine.
    • No need to create a new class when an existing engine wants to support a new use-case. For example, Flink currently only supports fan-out partitioning, which is required for the streaming execution mode; but when the execution mode is batch, people may prefer ClusteredPartitionWriter because it takes fewer resources.

Result

The 2 new TaskWriters, DirectTaskWriter and FlinkTaskWriter, cover all Flink and Spark cases.
Below is the equivalent code for Flink's current writers:

  • write INSERT-only data, unpartitioned table:
// current
taskWriter = new UnpartitionedWriter<>(...);

// new
partitioner = DirectTaskWriter.unpartition();
taskWriter = new DirectTaskWriter<>(partitioner, ...);
  • write INSERT-only data, partitioned table:
// current
class RowDataPartitionedFanoutWriter extends PartitionedFanoutWriter<RowData> {...}
taskWriter = new RowDataPartitionedFanoutWriter(...);

// new
partitioner = FlinkTaskWriter.partitionerFor(spec, schema, flinkSchema);
taskWriter = new DirectTaskWriter<>(partitioner, ...);
  • write both INSERT and DELETE data, unpartitioned table:
// current
taskWriter = new UnpartitionedDeltaWriter(...);

// new
partitioner = DirectTaskWriter.unpartition();
taskWriter = new FlinkTaskWriter(partitioner, ...);
  • write both INSERT and DELETE data, partitioned table:
// current
taskWriter = new PartitionedDeltaWriter(...);

// new
partitioner = FlinkTaskWriter.partitionerFor(spec, schema, flinkSchema);
taskWriter = new FlinkTaskWriter(partitioner, ...);

How I tested it

Passed all unit tests and ran with some sample datasets on my local machine.

@dungdm93 (Contributor Author):

cc @aokolnychyi, @rdblue, @jackye1995, @openinx, @stevenzwu, @szehon-ho, @RussellSpitzer

@dungdm93 (Contributor Author) left a comment:

It's a breaking change, but it only affects you if you have a custom implementation.
Nevertheless, those 2 interfaces were only introduced in 0.13, so the number of affected users should be negligible.

  * @return PathOffset of written row
  */
- void write(T row);
+ PathOffset write(T row);
@dungdm93 (Contributor Author):

For a delete, there can be 2 rows in the delete files: one EqualityDelete row to delete the record in a previous snapshot, and one PositionDelete row to delete the record in the current snapshot. So it's required to track the PathOffset of all records inserted in the current snapshot.
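
A hedged sketch of the bookkeeping this implies, following one common arrangement (position delete when the row was written in the current snapshot, equality delete otherwise). Only PathOffset and the write return value come from this PR; the class, map and abstract helpers are illustrative assumptions:

import java.util.HashMap;
import java.util.Map;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.StructLike;

abstract class PathOffsetBookkeepingSketch {
  // where did each row inserted in the current snapshot land?
  private final Map<StructLike, PathOffset> insertedRowMap = new HashMap<>();

  // stands in for the PR's FileWriter.write(T), which now returns a PathOffset
  abstract PathOffset writeData(RowData row);
  // hypothetical delete helpers
  abstract void writePositionDelete(PathOffset previous, RowData row);
  abstract void writeEqualityDelete(RowData row);

  void insert(RowData row, StructLike key) {
    insertedRowMap.put(key, writeData(row));
  }

  void delete(RowData row, StructLike key) {
    PathOffset previous = insertedRowMap.remove(key);
    if (previous != null) {
      writePositionDelete(previous, row);  // row was inserted in the current snapshot
    } else {
      writeEqualityDelete(row);            // row comes from a previous snapshot
    }
  }
}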

Member:

You mean each abstracted FileWriter will get a PathOffset back when appending a new row? That doesn't make sense to me, because not every writer needs this PathOffset for what it does next.

@dungdm93 (Contributor Author) commented Feb 17, 2022:

I agree that not every writer needs the PathOffset, but it's required for writing delta data (like Flink's DeltaTaskWriter).
Writers that don't need it can simply ignore the return value.

@rdblue (Contributor) commented Feb 15, 2022:

@dungdm93, I don't think I understand quite what you're trying to do in this PR from the description. Can you add some more detail about what your motivation is and what you're changing? It would probably also help to do this in several smaller PRs.

@dungdm93 dungdm93 force-pushed the refactor-task-writer branch from 1b8b040 to f40f485 Compare February 15, 2022 23:55
@dungdm93 (Contributor Author):

@rdblue sorry for my bad wording. Let me try to add more details.

@dungdm93 dungdm93 force-pushed the refactor-task-writer branch from f40f485 to 0359202 Compare February 16, 2022 04:57
@github-actions github-actions bot added the data label Feb 16, 2022
@dungdm93 (Contributor Author):

Hello @rdblue, I just updated the PR description; I hope it's clear enough to understand now.

- public void write(T row) {
+ public PathOffset write(T row) {
    appender.add(row);
+   return PathOffset.of(location, recordCount++);
Contributor:

In Iceberg, we don't use the return value of ++ operators because it is hard to read code that uses them. Can you move the increment to a separate line?

@dungdm93 (Contributor Author):

Changed to:

    long offset = recordCount++;
    return PathOffset.of(location, offset);

@RussellSpitzer (Member):

The description makes sense to me here. It will take me some time, though, to get through this whole PR; I'll try to set aside time later this week.

@dungdm93 dungdm93 force-pushed the refactor-task-writer branch from da95b4d to 18bc270 Compare February 17, 2022 08:47
@dungdm93 dungdm93 requested a review from rdblue February 17, 2022 09:06
  */
  @SuppressWarnings("unchecked")
- public void partition(StructLike row) {
+ public PartitionKey partition(StructLike row) {
Member:
I think this will produce an API compatibility issue. Why do we need to change this basic API?

@dungdm93 (Contributor Author) commented Feb 17, 2022:

This is just a side change; I made it to align with the other StructLike wrappers: StructProjection.wrap, IndexedStructLike.wrap, InternalRowWrapper.wrap, ... to name a few.
@openinx Could you please explain how this can cause an API compatibility issue?
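
(For illustration, the fluent style those wrappers allow; this call site is hypothetical, not code from the PR:)

// void return: preparing the key takes a separate statement
partitionKey.partition(row);
writer.write(row, spec, partitionKey);

// fluent return: the key can be prepared inline, as StructProjection.wrap allows
writer.write(row, spec, partitionKey.partition(row));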

@openinx (Member):

Downstream users may add the iceberg-api module to their application project. Since PartitionKey is a public API, their application artifact will include calls to void partition(StructLike row). When they upgrade iceberg-api to the next release version, the JVM will then fail to load the expected void partition(StructLike row). That breaks a user's normal upgrade process, and that's why we say it's an API compatibility issue.
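
To illustrate the point with a hypothetical downstream call site (not code from this PR):

// Compiled against iceberg-api 0.13, this call is recorded in the class file with
// the JVM method descriptor partition(Lorg/apache/iceberg/StructLike;)V --
// the return type is part of the descriptor.
PartitionKey key = new PartitionKey(spec, schema);
key.partition(row);

// If a newer iceberg-api jar only ships
// partition(Lorg/apache/iceberg/StructLike;)Lorg/apache/iceberg/PartitionKey;
// the JVM cannot link the old descriptor and throws NoSuchMethodError at runtime,
// even though recompiling from source would succeed against either version.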

@dungdm93 (Contributor Author):

Yeah, let me roll it back.
@openinx could you help me review the other changes?

@openinx (Member):

Yes, I'm currently checking the whole write path. I think I will need one or two days to understand all the writers newly introduced since 0.13.x. Replacing the old writer API with the new one is a great thing. I think we can collaborate to move this forward. Thanks.


@dungdm93 dungdm93 force-pushed the refactor-task-writer branch 2 times, most recently from 06f2708 to 3ef9ea5 Compare February 19, 2022 09:05
@dungdm93 dungdm93 force-pushed the refactor-task-writer branch from 3ef9ea5 to 86128e7 Compare February 19, 2022 15:14
@dungdm93 dungdm93 requested a review from openinx February 28, 2022 08:39
@dungdm93 dungdm93 force-pushed the refactor-task-writer branch from 86128e7 to 197b233 Compare March 1, 2022 02:49
import org.apache.iceberg.io.DefaultPartitioningWriterFactory.Type;
import org.apache.iceberg.relocated.com.google.common.base.Preconditions;

public interface PartitioningWriterFactory<T> {
Member:

Why name it PartitioningWriterFactory? I don't see any partition info in the defined interface methods.

@dungdm93 (Contributor Author):

It's a factory class used to create PartitioningWriters.
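
(For illustration, one shape such a factory could take; these method names are guesses based on the discussion, not the PR's actual interface:)

import org.apache.iceberg.deletes.PositionDelete;
import org.apache.iceberg.io.DataWriteResult;
import org.apache.iceberg.io.DeleteWriteResult;
import org.apache.iceberg.io.PartitioningWriter;

public interface PartitioningWriterFactory<T> {
  PartitioningWriter<T, DataWriteResult> newDataWriter();
  PartitioningWriter<T, DeleteWriteResult> newEqualityDeleteWriter();
  PartitioningWriter<PositionDelete<T>, DeleteWriteResult> newPositionDeleteWriter();
}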

@dungdm93 (Contributor Author):

I just added a little documentation for better understanding.

dungdm93 added 8 commits March 1, 2022 11:18
@dungdm93 dungdm93 force-pushed the refactor-task-writer branch from 197b233 to 40354bb Compare March 1, 2022 04:22
import org.apache.iceberg.StructLike;
import org.apache.iceberg.util.Tasks;

public class DirectTaskWriter<T> implements TaskWriter<T> {
@dungdm93 (Contributor Author):

@openinx do you have any naming suggestions for this class: DirectTaskWriter, AppendTaskWriter, ...?

@github-actions bot commented Aug 7, 2024:

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 7, 2024
@github-actions bot:
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 15, 2024
@dungdm93 dungdm93 deleted the refactor-task-writer branch August 15, 2024 02:45