Skip to content

Conversation

@xxlaykxx
Copy link

@xxlaykxx xxlaykxx commented Jun 20, 2023

Added new method rootAsMap() to ComplexWriter and implement it in ComplexWriterImpl for supporting @output map type. Need this to unlock ARRAY_FREQUENCY function.

Previously in dremio side:

When i trying to return map like @output ComplexWrite
with this code:

org.apache.arrow.vector.complex.writer.BaseWriter.MapWriter mapWriter = out.rootAsList().map(false);

mapWriter.startMap();
for (java.util.Map.Entry<java.lang.Integer, java.lang.Integer> element : map.entrySet()) {
    mapWriter.startEntry();
    mapWriter.key().integer().writeInt((Integer) element.getKey());
    mapWriter.value().integer().writeInt((Integer) element.getValue());
    mapWriter.endEntry();
}
mapWriter.endMap();

It use UnionMapWriter and generate schema like:
EXPR$0: Map(false)<$data$: Union(Sparse, [1, 39])<struct: Struct<key: Int(32, true) not null, value: Int(32, true) not null> not null, map: Map(false)<entries: Struct<key: Int(32, true) not null, value: Int(32, true)> not null>>>
But in OutputDerivation impl class where i should create output Complete type

List<Field> children = Arrays.asList( CompleteType.INT.toField("key", false), CompleteType.INT.toField("value", false));
return new CompleteType(CompleteType.MAP.getType(), CompleteType.struct(children).toField(MapVector.DATA_VECTOR_NAME, false));

(This is only one valid case, because MapVector.initializeChildrenFromFields())
return
EXPR$0::map<key::int32, value::int32> I found a place where it start using union - PromotableWriter.promoteToUnion.
And in the end i have

SCHEMA_CHANGE ERROR: Schema changed during projection. Schema was 
schema(EXPR$0::map<key::int32, value::int32>)
 but then changed to 
schema(EXPR$0::map<struct::struct<key::int32, value::int32>, map::map<key::int32, value::int32>>)

@xxlaykxx xxlaykxx requested review from DenisTarasyuk and lriggs June 20, 2023 16:45
@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@xxlaykxx xxlaykxx closed this by deleting the head repository Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

1 participant