Core: Support _deleted metadata column in vectorized read #4888

Commits: f8889b7, 15d8b72, f1dfb30, aa919f6, 685f94e, e3cbd97, a40d6b1, d4e90bd, a81ebc6

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -70,8 +70,7 @@ public class TestSparkParquetReadMetadataColumns {` |
| 70 | 70 | `private static final Schema PROJECTION_SCHEMA = new Schema(` |
| 71 | 71 | `required(100, "id", Types.LongType.get()),` |
| 72 | 72 | `required(101, "data", Types.StringType.get()),` |
| 73 | | `- MetadataColumns.ROW_POSITION,` |
| 74 | | `- MetadataColumns.IS_DELETED` |
| | 73 | `+ MetadataColumns.ROW_POSITION` |
| 75 | 74 | `);` |
| 76 | 75 | |
| 77 | 76 | `private static final int NUM_ROWS = 1000;` |

flyrain marked this conversation as resolved.

Contributor (author), on the `MetadataColumns.ROW_POSITION` line: We need this change because the class `VectorizedReaderBuilder` is shared by all Spark versions. The change in line 94 of `VectorizedReaderBuilder` changes the type of the reader, as the following code shows. Then, the read throws an exception in the method
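
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -104,7 +103,6 @@ public class TestSparkParquetReadMetadataColumns {` |
| 104 | 103 | `}` |
| 105 | 104 | `row.update(1, UTF8String.fromString("str" + i));` |
| 106 | 105 | `row.update(2, i);` |
| 107 | | `- row.update(3, false);` |
| 108 | 106 | `EXPECTED_ROWS.add(row);` |
| 109 | 107 | `}` |
| 110 | 108 | `}` |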

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,53 @@` |
| | 1 | `+ /*` |
| | 2 | `+ * Licensed to the Apache Software Foundation (ASF) under one` |
| | 3 | `+ * or more contributor license agreements. See the NOTICE file` |
| | 4 | `+ * distributed with this work for additional information` |
| | 5 | `+ * regarding copyright ownership. The ASF licenses this file` |
| | 6 | `+ * to you under the Apache License, Version 2.0 (the` |
| | 7 | `+ * "License"); you may not use this file except in compliance` |
| | 8 | `+ * with the License. You may obtain a copy of the License at` |
| | 9 | `+ *` |
| | 10 | `+ * http://www.apache.org/licenses/LICENSE-2.0` |
| | 11 | `+ *` |
| | 12 | `+ * Unless required by applicable law or agreed to in writing,` |
| | 13 | `+ * software distributed under the License is distributed on an` |
| | 14 | `+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY` |
| | 15 | `+ * KIND, either express or implied. See the License for the` |
| | 16 | `+ * specific language governing permissions and limitations` |
| | 17 | `+ * under the License.` |
| | 18 | `+ */` |
| | 19 | `+` |
| | 20 | `+ package org.apache.iceberg.spark.data.vectorized;` |
| | 21 | `+` |
| | 22 | `+ import org.apache.iceberg.arrow.vectorized.VectorHolder;` |
| | 23 | `+ import org.apache.iceberg.arrow.vectorized.VectorHolder.ConstantVectorHolder;` |
| | 24 | `+ import org.apache.iceberg.types.Types;` |
| | 25 | `+ import org.apache.spark.sql.vectorized.ColumnVector;` |
| | 26 | `+` |
| | 27 | `+ class ColumnVectorBuilder {` |
| | 28 | `+ private boolean[] isDeleted;` |
| | 29 | `+ private int[] rowIdMapping;` |
| | 30 | `+` |
| | 31 | `+ public ColumnVectorBuilder withDeletedRows(int[] rowIdMappingArray, boolean[] isDeletedArray) {` |

Review thread on line 31 (`withDeletedRows`):

Contributor: nit: I feel we'd better make this a constructor and pass these arrays only once, during construction.

Contributor (author): I am trying to make the builder more generic so that it can also be used to create vectors without deletes.

Contributor: Okay, I see now. Then it is fine.
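
The author's reply favors an optional configuration step over constructor arguments. The following is a minimal sketch of that design choice using hypothetical stand-ins: `SketchVectorBuilder`, `withRowIdMapping`, and the `int[]` "column" are illustrative only, not Iceberg or Spark types. It assumes the row-id mapping lists, for each visible output row, the physical row it reads from; the same builder then serves batches with and without deletes.

```java
import java.util.Arrays;

// Hypothetical stand-in, not an Iceberg class: a builder whose delete
// configuration is an optional step, similar in shape to withDeletedRows(...).
final class SketchVectorBuilder {
  private int[] rowIdMapping; // stays null when the batch has no deletes

  // Optional step: callers reading batches without deletes skip it entirely.
  SketchVectorBuilder withRowIdMapping(int[] mapping) {
    this.rowIdMapping = mapping;
    return this;
  }

  // Returns the values visible to the reader for one column of a batch.
  int[] build(int[] columnValues, int numRows) {
    if (rowIdMapping == null) {
      // No deletes: the first numRows physical rows are visible as-is.
      return Arrays.copyOf(columnValues, numRows);
    }
    // Deletes present (assumed semantics): output row i reads physical row rowIdMapping[i].
    int[] visible = new int[numRows];
    for (int i = 0; i < numRows; i++) {
      visible[i] = columnValues[rowIdMapping[i]];
    }
    return visible;
  }
}

class SketchVectorBuilderDemo {
  public static void main(String[] args) {
    int[] values = {10, 20, 30, 40};
    // Same builder type, no delete arguments needed.
    System.out.println(Arrays.toString(new SketchVectorBuilder().build(values, 4)));
    // prints [10, 20, 30, 40]
    // Physical row 1 is deleted: the mapping skips it, leaving 3 visible rows.
    System.out.println(Arrays.toString(
        new SketchVectorBuilder().withRowIdMapping(new int[] {0, 2, 3}).build(values, 3)));
    // prints [10, 30, 40]
  }
}
```

With a constructor-only design, callers without deletes would have to pass placeholder arrays (or `null`); the optional step keeps that path free of delete-specific arguments.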

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | 32 | `+ this.rowIdMapping = rowIdMappingArray;` |
| | 33 | `+ this.isDeleted = isDeletedArray;` |
| | 34 | `+ return this;` |
| | 35 | `+ }` |
| | 36 | `+` |
| | 37 | `+ public ColumnVector build(VectorHolder holder, int numRows) {` |
| | 38 | `+ if (holder.isDummy()) {` |
| | 39 | `+ if (holder instanceof VectorHolder.DeletedVectorHolder) {` |
| | 40 | `+ return new DeletedColumnVector(Types.BooleanType.get(), isDeleted);` |
| | 41 | `+ } else if (holder instanceof ConstantVectorHolder) {` |
| | 42 | `+ return new ConstantColumnVector(Types.IntegerType.get(), numRows,` |
| | 43 | `+ ((ConstantVectorHolder<?>) holder).getConstant());` |
| | 44 | `+ } else {` |
| | 45 | `+ throw new IllegalStateException("Unknown dummy vector holder: " + holder);` |
| | 46 | `+ }` |
| | 47 | `+ } else if (rowIdMapping != null) {` |
| | 48 | `+ return new ColumnVectorWithFilter(holder, rowIdMapping);` |
| | 49 | `+ } else {` |
| | 50 | `+ return new IcebergArrowColumnVector(holder);` |
| | 51 | `+ }` |
| | 52 | `+ }` |
| | 53 | `+ }` |
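
The branch order in `build` determines which vector each projected column gets. The following is a dependency-free model of that order. `FakeHolder` and `BuildDispatchSketch` are hypothetical stand-ins, and strings name the PR's vector classes instead of constructing them. It assumes the `_deleted` column is read through a `VectorHolder.DeletedVectorHolder`. Dummy holders are checked first, so that column gets a `DeletedColumnVector` backed by `isDeleted` even when a row-id mapping is set. Real data columns are wrapped in `ColumnVectorWithFilter` only when a mapping was supplied.

```java
// Hypothetical stand-in for VectorHolder: only what the dispatch inspects.
final class FakeHolder {
  final boolean dummy; // models holder.isDummy()
  final String kind;   // "deleted", "constant", or anything else

  FakeHolder(boolean dummy, String kind) {
    this.dummy = dummy;
    this.kind = kind;
  }
}

// Models the if/else chain of ColumnVectorBuilder.build; returns the name of
// the vector class the PR's code would construct for the given inputs.
final class BuildDispatchSketch {
  static String build(FakeHolder holder, int[] rowIdMapping) {
    if (holder.dummy) {
      if ("deleted".equals(holder.kind)) {
        return "DeletedColumnVector";    // backed by the isDeleted array
      } else if ("constant".equals(holder.kind)) {
        return "ConstantColumnVector";   // one constant value for all numRows rows
      } else {
        throw new IllegalStateException("Unknown dummy vector holder: " + holder.kind);
      }
    } else if (rowIdMapping != null) {
      return "ColumnVectorWithFilter";   // real data read through the mapping
    } else {
      return "IcebergArrowColumnVector"; // real data, no mapping
    }
  }
}

class BuildDispatchSketchDemo {
  public static void main(String[] args) {
    int[] mapping = {0, 2, 3};
    // Dummy holders are checked before rowIdMapping, so a mapping does not affect them.
    System.out.println(BuildDispatchSketch.build(new FakeHolder(true, "deleted"), mapping));
    // prints DeletedColumnVector
    System.out.println(BuildDispatchSketch.build(new FakeHolder(false, "data"), mapping));
    // prints ColumnVectorWithFilter
    System.out.println(BuildDispatchSketch.build(new FakeHolder(false, "data"), null));
    // prints IcebergArrowColumnVector
  }
}
```

Returning names instead of vectors keeps the model runnable without Arrow or Spark on the classpath; it captures only the branch order, not how the vectors behave.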