[feature][connector] ElasticSearch Sink: option to output canonical key fields (JSON and Avro) #15426

nicoloboschi · 2022-05-04T12:25:41Z

Motivation

When the inbound message is structured (Avro or JSON), the order of its fields may change over time, and the Pulsar function framework does not guarantee field order.
The generated _id field of the document should be the same regardless of the order of the key fields in the input message.

Modifications

New option canonicalKeyFields (boolean, default false) to sort the key fields. For both JSON and Avro, we have to parse and rewrite the entire payload. This may increase CPU overhead, although the sort is performed only on the keys, which MUST be smaller than 512 bytes to fit in the ElasticSearch _id field.
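As an illustration of what sorting the key fields achieves, here is a minimal Python sketch. It is not the connector's code (the connector is written in Java), and the names `canonical_id` and `MAX_ID_BYTES` are hypothetical. It assumes the key is a JSON object and that the document _id is derived from its serialized form.

```python
import json

# Size limit for the _id field, as stated in the description above.
MAX_ID_BYTES = 512


def canonical_id(key_fields: dict) -> str:
    """Serialize key fields with sorted names so field order does not matter."""
    # sort_keys=True orders field names (recursively for nested objects);
    # compact separators keep the resulting id as short as possible.
    doc_id = json.dumps(key_fields, sort_keys=True, separators=(",", ":"))
    if len(doc_id.encode("utf-8")) > MAX_ID_BYTES:
        raise ValueError("key is too long for the _id field")
    return doc_id


# The same key, with its fields arriving in a different order.
a = {"region": "eu", "customer_id": 42}
b = {"customer_id": 42, "region": "eu"}

# Without canonicalization the serialized keys differ.
plain_a = json.dumps(a)
plain_b = json.dumps(b)

# With canonicalization both orderings map to the same id.
id_a = canonical_id(a)
id_b = canonical_id(b)
```

Here `plain_a` and `plain_b` differ, so they would address two different documents, while `id_a` and `id_b` are identical. The cost is one extra parse and rewrite per key, which is the CPU overhead mentioned above.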

doc

* [feat][elasticsearch-sink] Option to output canonical JSON
* style
* fix test
* style

site2/docs/io-elasticsearch-sink.md

Co-authored-by: momo-jun <[email protected]>

* [feat][elasticsearch-sink] Option to output canonical JSON
* style
* fix test
* style

(cherry picked from commit e3a16b7)

Anonymitaet · 2022-05-10T09:53:02Z

Can you rename your PR? E.g., [feature][doc] xx

guideline: https://docs.google.com/document/d/1d8Pw6ZbWk-_pCKdOmdvx9rnhPiyuxwq60_TrD68d7BA/edit#heading=h.wu6ygjne8e35

nicoloboschi · 2022-05-17T09:39:17Z

@lhotari @dlg99 @eolivelli PTAL

[feat][elasticsearch-sink] Option to output canonical JSON

e3a16b7


github-actions bot assigned nicoloboschi May 4, 2022

github-actions bot added the doc label May 4, 2022

correct doc

1abe61a

nicoloboschi mentioned this pull request May 4, 2022

[feat][elasticsearch] Add hashed id support #15428

Merged

1 task

nicoloboschi requested review from eolivelli and dlg99 May 5, 2022 08:12

Anonymitaet reviewed May 5, 2022

site2/docs/io-elasticsearch-sink.md (outdated)

momo-jun reviewed May 10, 2022

site2/docs/io-elasticsearch-sink.md (outdated)

Update site2/docs/io-elasticsearch-sink.md

e6c49b2

Co-authored-by: momo-jun <[email protected]>

nicoloboschi changed the title ~~[feat][elasticsearch-sink] Option to output canonical key fields (JSON and Avro)~~ [feature][connector] ElasticSearch Sink: option to output canonical key fields (JSON and Avro) May 10, 2022

nicoloboschi added the area/connector label May 23, 2022

nicoloboschi added this to the 2.11.0 milestone May 23, 2022

eolivelli approved these changes May 23, 2022


nicoloboschi merged commit a4488fd into apache:master May 23, 2022

nicoloboschi deleted the es-sink-canonical-json branch May 23, 2022 08:51
