Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3984 [C++] Improve code generated for unions #3047

Merged

Conversation

hwse
Copy link
Contributor

@hwse hwse commented Jul 25, 2024

What is the purpose of the change

AVRO-3984

This pull request should improve the code generated by avrogencpp for union types. The improvements are:

Getter return a reference

Previously the getters for a union branch returned a value:

std::map<std::string, float > get_map() const;

This lead to a complete copy of the map, which may be really expensive depending on the use case.

In this pull request the getter return a reference instead, which allows the user to choose if a copy should be done.

const std::map<std::string, float >& get_map() const;

Additionally a getter is created, that returns a mutable reference. This can be useful if a value needs to be added to a map or array.

std::map<std::string, int32_t >& get_map();

Setters can take a r-value reference

Currently the setter of a union takes a reference to the value that should be set. This forces the user to copy the value here.

void set_map(const std::map<std::string, int32_t >& v);

With this pull request a alternative overload is provided that takes a r-value reference. Now copies can be avoided, since the cheaper move assignment is called.

void set_map(std::map<std::string, int32_t >&& v);

Additional Branch enum for each union type

Currently the idx() method is available to check which branch of a union is set.

size_t idx() const { return idx_; }

The issue is that the user needs to know which size_t value matches to which branch. This might lead to issues if values are inserted in the middle of a union and the indices change.

To avoid such issues a enum is generated that maps the branch types to the index. For a union of null, map and float this will look like this:

enum class Branch: size_t {
    null = 0,
    map = 1,
    float_ = 2,
};

The user can then cast the value return by idx() to this enum and write a switch case statement

const auto myunion_branch = static_cast<testgen::RootRecord::myunion_t::Branch>(decoded_record.myunion.idx());
switch (myunion_branch) {
    case testgen::RootRecord::myunion_t::Branch::null:
        std::cout << "null" << std::endl;
        break;
    case testgen::RootRecord::myunion_t::Branch::map:
        std::cout << "map" << std::endl;
        break;
    case testgen::RootRecord::myunion_t::Branch::float_:
        std::cout << "float: " << decoded_record.myunion.get_float() << std::endl;
        break;
}

Verifying this change

I added unit tests for the new generated functions and enum class. Also existing test still pass.

Documentation

  • Does this pull request introduce a new feature? no
    • the setters and getters are changed in a way that should be backwards compatible, so no new feature i think
    • i would not consider the new enum a real feature
  • If yes, how is the feature documented? -

…ead of a value to avoid calling copy constructor of large classes
@github-actions github-actions bot added the C++ Pull Requests for C++ binding label Jul 25, 2024
hwse added 5 commits July 25, 2024 15:47
…e reference. This allows the user to modify values in union branches after creation (apache#3047)
…anch names to the corresponding index. This allows the user to avoid checks against "magic numbers" (apache#3047)
@martin-g
Copy link
Member

@mkmkme @Gerrit0

@mkmkme
Copy link
Contributor

mkmkme commented Jul 26, 2024

@mkmkme @Gerrit0

Sorry I'm on a holiday right now and will be available from Monday

}
os_ << " };\n";

os_ << " size_t idx() const { return idx_; }\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is worth including a branch method which returns static_cast<Branch>(idx_)? In code I have that uses Avro I always end up doing this anyways to switch over the enum for branches.

Copy link
Contributor Author

@hwse hwse Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, i added the method. Thanks for the quick feedback!

@Gerrit0
Copy link
Contributor

Gerrit0 commented Jul 26, 2024

i would not consider the new enum a real feature

I would! It means I can get rid of the enums that are manually maintained alongside the avro file today. I think the release note is sufficient documentation though, anyone new to using avro will already end up looking at the generated header and see it.

…h enum directly, this avoids a manual static_cast (apache#3047)
@hwse
Copy link
Contributor Author

hwse commented Aug 20, 2024

@mkmkme @Gerrit0 There was no activity on this pull request lately, is there anything that i still need to do?

Copy link
Contributor

@mkmkme mkmkme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything alarming from the code. But I never used avro in projects, so my vote can be considered as a half of the vote :)

@martin-g martin-g merged commit f350a8f into apache:main Aug 21, 2024
4 checks passed
@martin-g
Copy link
Member

Thank you, @hwse !
And @mkmkme & @Gerrit0 for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ Pull Requests for C++ binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants