Skip to content

[Improve] improve avro format convert#6082

Merged
Hisoka-X merged 4 commits into
apache:devfrom
liunaijie:avro-array
Jan 5, 2024
Merged

[Improve] improve avro format convert#6082
Hisoka-X merged 4 commits into
apache:devfrom
liunaijie:avro-array

Conversation

@liunaijie
Copy link
Copy Markdown
Member

@liunaijie liunaijie commented Dec 26, 2023

Purpose of this pull request

this pr #5084 bring in avro format to kafka.
But there has some issue when type convert, and the previous CI has't verify the convert issue.
So this pr has some improvemant about the avro type convert. and update the CI to check converted data value and type consistent.

Does this PR introduce any user-facing change?

How was this patch tested?

use the exist e2e

Check list

@liunaijie liunaijie marked this pull request as ready for review December 26, 2023 05:27
@zhilinli123
Copy link
Copy Markdown
Contributor

Add end to end data End to end test , refer to checkDebeziumFormat()
https://github.com/liunaijie/seatunnel/blob/avro-array/seatunnel-e2e/seatunnel-connector-v2-e2e/connector-kafka-e2e/src/test/java/org/apache/seatunnel/e2e/connector/kafka/KafkaFormatIT.java

@liunaijie liunaijie force-pushed the avro-array branch 4 times, most recently from 04ed0db to 4474b70 Compare December 27, 2023 13:44
@liunaijie liunaijie force-pushed the avro-array branch 2 times, most recently from 886a416 to f158cf1 Compare December 28, 2023 06:54
Comment on lines +85 to +99
},
{
field_name = c_string
field_type = string
field_value = [
{
rule_type = MIN_LENGTH
rule_value = 6
},
{
rule_type = MAX_LENGTH
rule_value = 6
}
]
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still hope to do an end-to-end test on all fields to ensure that the data is not only consistent in the amount of data, but also to ensure that each data and field are consistent, and it is best to write the data to the database for a Check
cc @hailin0 @Hisoka-X

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert sink can't do some equal check, so i add the data check in code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert sink can't do some equal check, so i add the data check in code

But you can write it to a database such as pg and then read it to make sure the input and output are consistent,The type is only one aspect of the test and we must make sure that the data values are exactly the same

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, so i add the data check after job finish, read data from kafka topic and verify the data consistent

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, so i add the data check after job finish, read data from kafka topic and verify the data consistent

ok for me.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zhilinli123 I update the code.
In this update, it can verify all field value and type. also can test the serialize (ST row -> Avro Generic Record -> bytes ) and deserialize (bytes -> Avro Generic Record -> ST Row ). I think this test can cover this feature.

@liunaijie liunaijie force-pushed the avro-array branch 2 times, most recently from c4389c8 to 45f1d74 Compare December 28, 2023 13:04
@zhilinli123
Copy link
Copy Markdown
Contributor

Add a reference to the description of the document information:
https://github.com/apache/seatunnel/blob/dev/docs/en/connector-v2/formats/ogg-json.md

@zhilinli123
Copy link
Copy Markdown
Contributor

Wait CI, looks good, can you add documentation about avro for this
@liunaijie

Copy link
Copy Markdown
Contributor

@zhilinli123 zhilinli123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@zhilinli123
Copy link
Copy Markdown
Contributor

PTAL: @hailin0 @Hisoka-X

Comment thread docs/en/connector-v2/formats/avro.md Outdated
Comment thread docs/en/connector-v2/formats/avro.md Outdated
@Hisoka-X
Copy link
Copy Markdown
Member

Hisoka-X commented Jan 5, 2024

Could you give us more detail information?
image

Hisoka-X
Hisoka-X previously approved these changes Jan 5, 2024
@github-actions github-actions Bot removed the approved label Jan 5, 2024
@liunaijie
Copy link
Copy Markdown
Member Author

@Hisoka-X @zhilinli123 added another commit to fix bytes type convert. and add type check in fake source result 7235aed

@hailin0
Copy link
Copy Markdown
Member

hailin0 commented Jan 5, 2024

waiting for ci ok

@Hisoka-X Hisoka-X merged commit fd971b9 into apache:dev Jan 5, 2024
@liunaijie liunaijie deleted the avro-array branch March 4, 2024 05:54
chaorongzhi pushed a commit to chaorongzhi/seatunnel that referenced this pull request Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants