@jklamer
Contributor

@jklamer jklamer commented Apr 26, 2022

Single object writer conforming to the spec: https://avro.apache.org/docs/current/spec.html#single_object_encoding.
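For readers unfamiliar with the linked spec: a single-object message is a two-byte marker (`C3 01`), followed by the 8-byte little-endian CRC-64-AVRO fingerprint of the schema, followed by the Avro binary encoding of the datum. The following is a minimal sketch of that framing only. The function names are hypothetical and this is not the code from this PR; it assumes the caller already has the schema's Parsing Canonical Form and the encoded datum.

```rust
/// Seed ("empty" fingerprint) for CRC-64-AVRO, as given in the Avro spec.
const EMPTY: u64 = 0xc15d213aa4d7a795;

/// Builds the 256-entry lookup table used by CRC-64-AVRO.
fn fingerprint_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        let mut fp = i as u64;
        for _ in 0..8 {
            // All ones when the low bit is set, all zeros otherwise.
            let mask = (fp & 1).wrapping_neg();
            fp = (fp >> 1) ^ (EMPTY & mask);
        }
        *slot = fp;
    }
    table
}

/// CRC-64-AVRO fingerprint of `bytes` (normally the schema's Parsing
/// Canonical Form encoded as UTF-8).
fn fingerprint64(bytes: &[u8]) -> u64 {
    let table = fingerprint_table();
    bytes.iter().fold(EMPTY, |fp, &b| {
        (fp >> 8) ^ table[((fp ^ u64::from(b)) & 0xff) as usize]
    })
}

/// Frames an already Avro-binary-encoded datum as a single-object message.
/// (Hypothetical helper for illustration.)
fn frame_single_object(canonical_schema: &str, encoded_datum: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(10 + encoded_datum.len());
    // Two-byte marker identifying single-object encoding, version 1.
    out.extend_from_slice(&[0xC3, 0x01]);
    // Schema fingerprint, little-endian.
    out.extend_from_slice(&fingerprint64(canonical_schema.as_bytes()).to_le_bytes());
    // The datum itself, in Avro binary encoding.
    out.extend_from_slice(encoded_datum);
    out
}
```

For example, `frame_single_object("\"int\"", &[0x02])` frames the Avro int `1` (zig-zag varint `0x02`) into an 11-byte message: 2 marker bytes, 8 fingerprint bytes and 1 payload byte.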

Two different implementations:

  1. One that more closely matches the current library file writer
  2. One that takes advantage of the new AvroSchema trait to allow for less boilerplate and more type safety
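To illustrate the second approach, here is a minimal sketch of how a schema-providing trait can tie a writer to one Rust type. `HasSchema`, `GenericWriter`, `SpecificWriter` and `Flag` are hypothetical stand-ins invented for this example; they are not the crate's actual API, and the real trait and writers may look different.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a schema-providing trait (an assumption for
/// this sketch, not the crate's real trait).
trait HasSchema {
    /// Schema as a JSON string; a simplification for the example.
    fn schema_json() -> String;
    /// Appends the Avro binary encoding of `self` to `out`.
    fn encode(&self, out: &mut Vec<u8>);
}

/// Generic flavour: the schema is supplied by the caller at run time.
struct GenericWriter {
    schema_json: String,
}

impl GenericWriter {
    fn new(schema_json: String) -> Self {
        GenericWriter { schema_json }
    }
}

/// Specific flavour: the schema comes from `T`, so callers pass no schema
/// and can only write values of type `T`.
struct SpecificWriter<T: HasSchema> {
    inner: GenericWriter,
    _marker: PhantomData<T>,
}

impl<T: HasSchema> SpecificWriter<T> {
    fn new() -> Self {
        SpecificWriter {
            inner: GenericWriter::new(T::schema_json()),
            _marker: PhantomData,
        }
    }

    fn schema_json(&self) -> &str {
        &self.inner.schema_json
    }

    /// Writes one value and returns the number of bytes appended.
    fn write(&self, value: &T, out: &mut Vec<u8>) -> usize {
        let before = out.len();
        value.encode(out);
        out.len() - before
    }
}

/// Example type whose schema is the Avro primitive `boolean`.
struct Flag(bool);

impl HasSchema for Flag {
    fn schema_json() -> String {
        "\"boolean\"".to_string()
    }

    fn encode(&self, out: &mut Vec<u8>) {
        // Avro encodes a boolean as a single byte: 0 for false, 1 for true.
        out.push(u8::from(self.0));
    }
}
```

With this shape, `SpecificWriter::<Flag>::new()` needs no schema argument, and passing anything other than a `&Flag` to `write` is a compile-time error, which is the "less boilerplate and more type safety" trade-off in miniature.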

Jira

Tests

  • Contains a single interop sanity test, run against the Java implementation
  • Contains unit tests

Documentation

  • WIP

@jklamer jklamer changed the title from "[AVRO-3506] Single object writer" to "[AVRO-3506] [rust] Single object writer" Apr 26, 2022
@github-actions github-actions bot added the Java and Rust labels Apr 26, 2022
@jklamer jklamer force-pushed the jklamer/SingleObjectWriter branch from 4eca588 to 88c06d6 Compare April 26, 2022 02:49
@jklamer jklamer force-pushed the jklamer/SingleObjectWriter branch from ecf69e4 to b0e3e22 Compare April 26, 2022 02:52
public class TestInteropMessageData {
  private static String inDir = System.getProperty("share.dir", "../../../share") + "/test/data/messageV1";
  private static File SCHEMA_FILE = new File(inDir + "/test_schema.json");
  private static File MESSAGE_FILE = new File(inDir + "/test_message.bin");
Member

final

Member

Usually the extension is .avro

Contributor Author

^ This was something I wasn't sure about, because the binary format is not the same as that of object container files. I wanted to make it clear that this file shouldn't be read by any Avro file reader.
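The difference between the two formats is visible in the first bytes: per the Avro spec, an object container file starts with the four bytes `Obj` followed by `0x01`, while a single-object message starts with the two-byte marker `C3 01`. A minimal sketch of telling them apart follows; the enum and function names are hypothetical and not part of any Avro library.

```rust
/// Which Avro framing a byte buffer appears to use (hypothetical type).
#[derive(Debug, PartialEq)]
enum AvroFraming {
    /// Object container file: magic bytes 'O', 'b', 'j', 1.
    ObjectContainerFile,
    /// Single-object encoding: marker bytes 0xC3, 0x01.
    SingleObject,
    /// Neither prefix matched.
    Unknown,
}

/// Inspects only the leading magic bytes; it does not validate the rest.
fn detect_framing(bytes: &[u8]) -> AvroFraming {
    if bytes.starts_with(b"Obj\x01") {
        AvroFraming::ObjectContainerFile
    } else if bytes.starts_with(&[0xC3, 0x01]) {
        AvroFraming::SingleObject
    } else {
        AvroFraming::Unknown
    }
}
```

A file reader that expects the container magic would reject a single-object message at this first check, which is one argument for not giving the test file the `.avro` extension.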

@jklamer
Contributor Author

jklamer commented May 1, 2022

@martin-g Are you okay with the general design/direction of the rust avro writer design? I'd like to implement the reader in a similar pattern

@jklamer jklamer requested a review from martin-g May 1, 2022 17:54
}
}

pub(crate) struct ResolvedOwnedSchema {
Member

Can we avoid the code duplication somehow ?
For example by using ResolvedSchema.clone() where needed ?

Contributor Author

I'm not sure. ResolvedSchema has a lifetime that I'm not sure how to associate with another field in the same struct.
The design for GenericSingleObjectWriter would have to be something like:

pub struct GenericSingleObjectWriter {
    buffer: Vec<u8>,
    schema: Schema, // with lifetime 'a?
    resolved: ResolvedSchema<'a>,
}
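The difficulty described here is a general Rust one: in safe Rust a struct field cannot borrow from a sibling field of the same struct, so a borrowed view and the value it borrows from cannot live side by side. The sketch below shows the usual workaround of keeping an owned counterpart and rebuilding the borrowed view on demand. `Schema`, `Resolved` and `ResolvedOwned` here are simplified, hypothetical stand-ins, not the crate's real types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an owned schema (not the real `Schema` type).
struct Schema {
    named_types: Vec<String>,
}

/// Borrowed lookup view; its keys point into the `Schema` it was built from.
struct Resolved<'a> {
    by_name: HashMap<&'a str, usize>,
}

impl<'a> Resolved<'a> {
    fn new(schema: &'a Schema) -> Self {
        let by_name = schema
            .named_types
            .iter()
            .enumerate()
            .map(|(idx, name)| (name.as_str(), idx))
            .collect();
        Resolved { by_name }
    }
}

/// Owned counterpart: holds the schema plus a lookup with owned keys,
/// so the struct needs no lifetime parameter.
struct ResolvedOwned {
    schema: Schema,
    by_name: HashMap<String, usize>,
}

impl ResolvedOwned {
    fn new(schema: Schema) -> Self {
        // The keys are cloned, so nothing borrows from `schema` once
        // `collect` returns and `schema` can be moved into the struct.
        let by_name = schema
            .named_types
            .iter()
            .cloned()
            .enumerate()
            .map(|(idx, name)| (name, idx))
            .collect();
        ResolvedOwned { schema, by_name }
    }

    /// Rebuilds a borrowed view on demand; the borrow is tied to `&self`.
    fn as_borrowed(&self) -> Resolved<'_> {
        Resolved::new(&self.schema)
    }
}
```

The cost is the duplicated lookup logic the review comment points at; the benefit is that a writer can own everything it needs without a lifetime parameter.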

@martin-g
Member

martin-g commented May 3, 2022

> Are you okay with the general design/direction of the rust avro writer design? I'd like to implement the reader in a similar pattern

I think it looks OK!

A question: since one of the new structs is named Generic***, should the other be named Specific***, to be consistent with Java/CSharp?

@github-actions github-actions bot added the build label May 4, 2022
export RUST_LOG=apache_avro=debug
export RUST_BACKTRACE=1
cargo run --all-features --example generate_interop_data
cargo run --all-features --example generate_interop_data
Member

This should be test_interop_message_data. I will take care of it!

oandrew and others added 8 commits May 4, 2022 08:17
Related to: apache@72e1135

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
Give a better name to TestGenerateInteropSingleObjectEncoding
Remove useless lifetime in schema.rs
Remove .json files for the single object encoded test file

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@github-actions github-actions bot added the C, C#, C++ and JS labels May 4, 2022
@martin-g
Member

martin-g commented May 4, 2022

Duh! The rebase went bad ...

@martin-g martin-g closed this May 4, 2022
@martin-g martin-g reopened this May 4, 2022
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@martin-g martin-g closed this May 4, 2022
@martin-g martin-g reopened this May 4, 2022
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@martin-g martin-g closed this May 4, 2022
@martin-g martin-g reopened this May 4, 2022
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@martin-g martin-g closed this May 4, 2022
@martin-g martin-g reopened this May 4, 2022
@martin-g martin-g merged commit 7ba9447 into apache:master May 4, 2022
martin-g added a commit that referenced this pull request May 4, 2022
* Encoder v1 with interop data

* unit tested

* fmt

* Interop tested

* unneeded file

* remove bugs

* clippy

* fix README

* rat fix

* Update lang/rust/avro/src/writer.rs

Co-authored-by: Martin Grigorov <[email protected]>

* Update lang/rust/avro/src/writer.rs

Co-authored-by: Martin Grigorov <[email protected]>

* Update lang/rust/avro/src/writer.rs

Co-authored-by: Martin Grigorov <[email protected]>

* Update lang/rust/avro/src/writer.rs

Co-authored-by: Martin Grigorov <[email protected]>

* Update lang/rust/avro/src/writer.rs

Co-authored-by: Martin Grigorov <[email protected]>

* PR changes

* static setup

* Specific rename and interop test in script

* typo

* AVRO-3492: Add support for deriving Schema::Record aliases (#1647)

* AVRO-3492: Add support for deriving Schema::Record aliases

Uses Darling's 'multiple' attribute feature.

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3492: Add a test case with multiple attributes with different values for 'alias' key

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3494: Rust: uncomment some tests which actually pass

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3494: Uncomment a test for recursive types (#1648)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3492: Add logic to derive the aliases for Schema::Enum (#1649)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3415: Add code coverage report support for csharp (#1565)

* AVRO-3360 Updated XML documentation

* Revert "AVRO-3360 Updated XML documentation"

This reverts commit b8601c0.

* AVRO-3415 Add code coverage report support for csharp

* Ignore Updates and package references

* Updated names

* Sorted packages alphabetically

* Mode ReportGenerator instructions for global.

* Update versions.props

* Remove path

* Updated tabbing

* Cleanup version.props

* Add missing settings from version.props

* Updated from tabs to 2 space indents

* Added command in code block

* Fix carriage return

* force carriage return

* Another carriage return

* Added longer path to report

Co-authored-by: Kyle T. Schoonover <[email protected]>

* AVRO-3384: Define C# Coding Style Guidelines (#1534)

* AVRO-3360 Updated XML documentation

* Revert "AVRO-3360 Updated XML documentation"

This reverts commit b8601c0.

* AVRO-3384 Initial check in

* Formatting fix

* Additional formatting

* More formatting

* Added additional rule

* Completed new line rules

* Indentation preferences complete

* Updated header

* Additional formatting

* More formatting changes

* Added spacing options

* Updated wrap options

* Additional documentation for styling

* Updated notes

* Updated more

* Added var preferences and Expression-bodied member preferences

* Initial styling rules documented

* Updated naming rules to reflect Roslyn naming rules

* Added other styling rule callouts.

* Updated Readme

* Updated rule

* Add header template

* Microsoft has a bug for semicolon which makes this not work.

* Added license

* Added note about IDE0055

Co-authored-by: Kyle T. Schoonover <[email protected]>

* AVRO-3424: Added support to parse string into Schema.Type (#1571)

* AVRO-3360 Updated XML documentation

* Revert "AVRO-3360 Updated XML documentation"

This reverts commit b8601c0.

* AVRO-3424 Created extension method for converting string into a Schema.Type enumeration

* Updated functionality

* Removed breaking code

* Updated remove quotes

* Removed if from tests

Co-authored-by: Kyle T. Schoonover <[email protected]>

* AVRO-3003: Fully qualify enum default value in C# code gen (#1596)

* AVRO-3458: Added tests for GenericRecord (#1606)

* AVRO-3360 Updated XML documentation

* Revert "AVRO-3360 Updated XML documentation"

This reverts commit b8601c0.

* AVRO-3458 Added tests for GenericRecord

* Moved Schema to const

* using discard

* Empty

* Add license

Co-authored-by: Kyle T. Schoonover <[email protected]>

* AVRO-2883: Fix namespace mapping (#1610)

* Remove unused package references

* Replace namespace in text schema

* Remove namespace mapping

* Add unit tests

* Match namespace mapping used in ticket

* Make ReplaceMappedNamespacesInSchema private

* Mark NamespaceMapping obsolete

Co-authored-by: Zoltan Csizmadia <[email protected]>

* AVRO-2211: SchemaBuilder equivalent or other means of schema creation (#1597)

* AVRO-2211: Support schema creation

* Add license info to new files

* Fix documentation for FixedSchema ctor

* Remove and sort using

* Add missing brackets and replace var with explicit type

* Fix exception type in case of parsing

* Rename field to follow conventions

* AVRO 2211: Inlining temporary variable in linq

* AVRO-2211: Change exception type and add missing documentations

* AVRO-2211: Fix RecordSchema to set the positions of its fields, instead of verifying them

* AVRO-2211: Fix RecordSchema fields assignment when creating a new RecordSchema

* AVRO-2211: Change constructors of schema classes to factory method

* AVRO-2211: Add unit tests for RecordSchema and EnumSchema

* AVRO-2211: Remove whitespace

* AVRO-2211: Add symbol names verification for EnumSchema

* AVRO-2211: Fix enum name validation

* AVRO-2211: Throw AvroException consistently

* AVRO-2211: Throw AvroException in RecordSchema consistently

* AVRO-2211: Remove duplicate factory methods on MapSchema

* AVRO-2211: Remove redundant parameter doc

* AVRO-2211: Add Schema creation tests

* AVRO-2211: Change ValidateSymbol to throw exception

* AVRO-2211: Fix typo

* AVRO-2211: Fix code QL issues

* AVRO-2211: Fix typo

Co-authored-by: Martin Grigorov <[email protected]>

* AVRO-3841: Try exact schema match first in union type (#1635)

* Try exact schema match

* Fix formatting

* Add tests for exception

Co-authored-by: Zoltan Csizmadia <[email protected]>

* AVRO-3495: Rust: Fields order should not matter (#1650)

* AVRO-3495: The order of the struct's fields and schema's fields should not matter

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3495: Use the lookup table when comparing values against fields by name

Until now it was expected that both the schema fields and the input
values are sorted the same way.

Use BTreeMap instead of HashMap for the lookup table because otherwise
the assertion on the validation error messages is impossible due to
random printing of the map's entries

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3495: Update the test case

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* Bump slf4j.version from 1.7.33 to 1.7.36 in /lang/java (#1646)

Bumps `slf4j.version` from 1.7.33 to 1.7.36.

Updates `slf4j-api` from 1.7.33 to 1.7.36
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](qos-ch/slf4j@v_1.7.33...v_1.7.36)

Updates `slf4j-simple` from 1.7.33 to 1.7.36
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](qos-ch/slf4j@v_1.7.33...v_1.7.36)

Updates `slf4j-log4j12` from 1.7.33 to 1.7.36
- [Release notes](https://github.com/qos-ch/slf4j/releases)
- [Commits](qos-ch/slf4j@v_1.7.33...v_1.7.36)

---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-api
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.slf4j:slf4j-simple
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: org.slf4j:slf4j-log4j12
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* AVRO-3491 Avoid a cast after is check (#1645)

* AVRO-3360 Updated XML documentation

* Revert "AVRO-3360 Updated XML documentation"

This reverts commit b8601c0.

* AVRO-3491 Avoid a cast after is check

Co-authored-by: Kyle T. Schoonover <[email protected]>

* AVRO-3496: Rust: Use visitor.visit_borrowed_str() when possible (#1652)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3477: Add unit tests for logical types with fixed base type (#1629)

* Support fixed base type for logical types

* Tweak

* Revert

* Fix fixed type definition

* Add AvroGen tests

Co-authored-by: Zoltan Csizmadia <[email protected]>

* AVRO-3465: Add avrogen protocol tests (#1616)

* Add avrogen protocol tests

* Add protocol test case

* Fix merge conflicts

Co-authored-by: Zoltan Csizmadia <[email protected]>

* AVRO-3484: Add support for deriving a default value for a record field (#1651)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3497 Simplify conditional expression (#1658)

* AVRO-3497 Simplify conditional expression

* Added null check back

* Updated tests

* AVRO-3500: Use property-based testing for the IT tests in avro_derive module (#1659)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* Configure Dependabot to check for Rust updates daily

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3501: Rust: Cache ~/.cargo and target folder for faster builds (#1661)

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* Avro 3502 logical type wrong order (#1664)

* AVRO-3501: Rust: Cache ~/.cargo and target folder for faster builds

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3502: Rust: Wrong [ORDER] for Parsing Canonical Form

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* Update uuid requirement from 0.8.2 to 1.0.0 in /lang/rust (#1660)

* Update uuid requirement from 0.8.2 to 1.0.0 in /lang/rust

Updates the requirements on [uuid](https://github.com/uuid-rs/uuid) to permit the latest version.
- [Release notes](https://github.com/uuid-rs/uuid/releases)
- [Commits](uuid-rs/uuid@0.8.2...1.0.0)

---
updated-dependencies:
- dependency-name: uuid
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* Issue #1660 - Fix compilation errors after updating uuid crate from 0.8 to 1.0

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Martin Tzvetanov Grigorov <[email protected]>

* Bump jmh.version from 1.34 to 1.35 in /lang/java (#1662)

Bumps `jmh.version` from 1.34 to 1.35.

Updates `jmh-core` from 1.34 to 1.35

Updates `jmh-generator-annprocess` from 1.34 to 1.35

---
updated-dependencies:
- dependency-name: org.openjdk.jmh:jmh-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.openjdk.jmh:jmh-generator-annprocess
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump zstd-jni from 1.5.1-1 to 1.5.2-2 in /lang/java (#1663)

Bumps [zstd-jni](https://github.com/luben/zstd-jni) from 1.5.1-1 to 1.5.2-2.
- [Release notes](https://github.com/luben/zstd-jni/releases)
- [Commits](luben/zstd-jni@v1.5.1-1...v1.5.2-2)

---
updated-dependencies:
- dependency-name: com.github.luben:zstd-jni
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump libthrift from 0.15.0 to 0.16.0 in /lang/java (#1665)

Bumps [libthrift](https://github.com/apache/thrift) from 0.15.0 to 0.16.0.
- [Release notes](https://github.com/apache/thrift/releases)
- [Changelog](https://github.com/apache/thrift/blob/master/CHANGES.md)
- [Commits](apache/thrift@v0.15.0...v0.16.0)

---
updated-dependencies:
- dependency-name: org.apache.thrift:libthrift
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* AVRO-3498 Deprecate NameCtorKey (#1657)

* AVRO-3490 Updated to use throw expressions (#1644)

* AVRO-3360 Updated XML documentation

* Revert "AVRO-3360 Updated XML documentation"

This reverts commit b8601c0.

* AVRO-3490 Updated to use throw expressions

* Additional expressions

Co-authored-by: Kyle T. Schoonover <[email protected]>

* Bump grpc.version from 1.45.0 to 1.45.1 in /lang/java (#1671)

Bumps `grpc.version` from 1.45.0 to 1.45.1.

Updates `grpc-core` from 1.45.0 to 1.45.1
- [Release notes](https://github.com/grpc/grpc-java/releases)
- [Commits](grpc/grpc-java@v1.45.0...v1.45.1)

Updates `grpc-stub` from 1.45.0 to 1.45.1
- [Release notes](https://github.com/grpc/grpc-java/releases)
- [Commits](grpc/grpc-java@v1.45.0...v1.45.1)

Updates `grpc-netty` from 1.45.0 to 1.45.1
- [Release notes](https://github.com/grpc/grpc-java/releases)
- [Commits](grpc/grpc-java@v1.45.0...v1.45.1)

---
updated-dependencies:
- dependency-name: io.grpc:grpc-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: io.grpc:grpc-stub
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: io.grpc:grpc-netty
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump underscore from 1.13.2 to 1.13.3 in /lang/js (#1669)

Bumps [underscore](https://github.com/jashkenas/underscore) from 1.13.2 to 1.13.3.
- [Release notes](https://github.com/jashkenas/underscore/releases)
- [Commits](jashkenas/underscore@1.13.2...1.13.3)

---
updated-dependencies:
- dependency-name: underscore
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* AVRO-3484: Followup Check default json parsing at compile time for derive macro  (#1668)

* check json parsing at compile time

* fmt

* AVRO-3427: skip creation of namespace directories for csharp schema (#1578)

* Add new argument parameter --skip-directories. It will skip creation of directories for namespace. Just generate classes in output directory

* Add missing doc param description

* Fix Unit tests after merge with master

* Fix Unit tests after merge with master

* C# Add unit tests for --skip-directories option

Co-authored-by: Pawel Kordowski <[email protected]>

* AVRO-3482: Reuse MAGIC in DataFileReader (#1639)

DataFileReader reads magic information twice. seek(0) is invoked
twice due to this. In cloud object stores, seeking back to 0 will
cause it to fall back to "random IO policy". Example of this is
S3A connector for s3. This causes suboptimal reads in object stores.
Refactoring in the patch addresses this case by reusing MAGIC.

* AVRO-2870: Avoid throwing from destructor in DataFileWriterBase (#921)

Co-authored-by: Thiruvalluvan M G <[email protected]>

* Updated the checksum for PHP composer download (#1677)

* Remove trailing ^M to make Git happy

Related to: 72e1135

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* Encoder v1 with interop data

* unit tested

* fmt

* AVRO-3506: Cleanup and minor improvements

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3506: Cleanup

Give a better name to TestGenerateInteropSingleObjectEncoding
Remove useless lifetime in schema.rs
Remove .json files for the single object encoded test file

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3506: Add licence header to TestGenerateInteropSingleObjectEncoding

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3506: Fix spotless issues in the new Java test classes

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3506: Fix the path to the schema file

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3506: Fix the id to match the expected value

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

* AVRO-3506: Fix spotless again

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

Co-authored-by: Martin Grigorov <[email protected]>
Co-authored-by: Martin Tzvetanov Grigorov <[email protected]>
Co-authored-by: Kyle Schoonover <[email protected]>
Co-authored-by: Kyle T. Schoonover <[email protected]>
Co-authored-by: Jose Massada <[email protected]>
Co-authored-by: Zoltan Csizmadia <[email protected]>
Co-authored-by: Zoltan Csizmadia <[email protected]>
Co-authored-by: yanivru <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kordos <[email protected]>
Co-authored-by: Pawel Kordowski <[email protected]>
Co-authored-by: rbalamohan <[email protected]>
Co-authored-by: Andrew Onyshchuk <[email protected]>
Co-authored-by: Thiruvalluvan M G <[email protected]>
(cherry picked from commit 7ba9447)
@martin-g
Member

martin-g commented May 4, 2022

Thank you, @jklamer !

Labels

build, C, C#, C++, Java, JS, Rust

10 participants