[feature][connector] ElasticSearch Sink: option to output canonical key fields (JSON and Avro) #15426

Merged · 3 commits merged on May 23, 2022

nicoloboschi
Contributor

Motivation

If the inbound message is structured (Avro or JSON), the order of its fields may change over time, and the Pulsar Functions framework gives no guarantee about field order.
The generated document _id should be the same regardless of the order of the key fields in the input message.

Modifications

  • New option canonicalKeyFields (boolean, default false) that sorts the key fields. For both JSON and Avro, the entire payload has to be parsed and rewritten. This may increase CPU overhead, even though the sort is performed only on the key fields, which must be no larger than 512 bytes to fit in the Elasticsearch _id field.
  • Documentation updated.
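As an illustration of why sorting the key fields makes the _id stable, here is a minimal sketch in plain Java with no JSON library. It assumes the key fields arrive as a flat `Map` whose iteration order mirrors the incoming message; `CanonicalKeySketch`, `keyToJson`, and `documentId` are hypothetical names, not the connector's actual code, which parses the real Avro or JSON payload instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CanonicalKeySketch {

    // Elasticsearch rejects document _id values larger than 512 bytes.
    static final int MAX_ID_BYTES = 512;

    // Hypothetical helper: serializes a flat map of key fields to JSON.
    // When canonicalKeyFields is true, field names are sorted (TreeMap),
    // so the input field order no longer affects the output.
    public static String keyToJson(Map<String, Object> keyFields, boolean canonicalKeyFields) {
        Map<String, Object> fields = canonicalKeyFields ? new TreeMap<>(keyFields) : keyFields;
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':');
            Object v = e.getValue();
            if (v == null || v instanceof Number || v instanceof Boolean) {
                sb.append(v); // null, numbers and booleans are written as JSON literals
            } else {
                sb.append(quote(v.toString())); // everything else becomes a JSON string
            }
        }
        return sb.append('}').toString();
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"') {
                sb.append("\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds the document id and enforces the 512-byte limit on its UTF-8 encoding.
    public static String documentId(Map<String, Object> keyFields, boolean canonicalKeyFields) {
        String id = keyToJson(keyFields, canonicalKeyFields);
        if (id.getBytes(StandardCharsets.UTF_8).length > MAX_ID_BYTES) {
            throw new IllegalArgumentException("Document _id exceeds " + MAX_ID_BYTES + " bytes");
        }
        return id;
    }

    public static void main(String[] args) {
        // Same logical key, two different field orders.
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("region", "eu");
        a.put("userId", 42);
        Map<String, Object> b = new LinkedHashMap<>();
        b.put("userId", 42);
        b.put("region", "eu");

        System.out.println(documentId(a, false)); // {"region":"eu","userId":42}
        System.out.println(documentId(b, false)); // {"userId":42,"region":"eu"}
        System.out.println(documentId(b, true));  // {"region":"eu","userId":42}
    }
}
```

With sorting disabled, the two messages map to two different _id values and would create two documents; with it enabled, both resolve to the same _id. Sorting a few small key fields is cheap in itself; as the description notes, the extra CPU cost comes from having to parse and rewrite the payload.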

* [feat][elasticsearch-sink] Option to output canonical JSON

* style

* fix test

* style
github-actions bot added the doc label May 4, 2022
nicoloboschi added a commit to datastax/pulsar that referenced this pull request May 10, 2022

(cherry picked from commit e3a16b7)
Anonymitaet
Member

Can you rename your PR? e.g. [feature][doc] xx

guideline: https://docs.google.com/document/d/1d8Pw6ZbWk-_pCKdOmdvx9rnhPiyuxwq60_TrD68d7BA/edit#heading=h.wu6ygjne8e35

nicoloboschi changed the title [feat][elasticsearch-sink] Option to output canonical key fields (JSON and Avro) [feature][connector] ElasticSearch Sink: option to output canonical key fields (JSON and Avro) May 10, 2022
nicoloboschi
Contributor Author

@lhotari @dlg99 @eolivelli PTAL

nicoloboschi added this to the 2.11.0 milestone May 23, 2022
nicoloboschi merged commit a4488fd into apache:master May 23, 2022
nicoloboschi deleted the es-sink-canonical-json branch May 23, 2022 08:51
Labels: area/connector, doc
4 participants