-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code generation: how to avoid naming collisions #1118
Comments
Option 1:
|
Option 1.5:
|
Option 2:
|
Option 3: do nothing
Cons:
|
Option 4:
|
When 2 different symbols in semantic conventions generate the same symbol on the generated code, To minimize the overall impact, it is preferable that only symbols that are the least frequent gets to be broken. Any solution that changes the output for So, solution 1.5 is definitively out in my opinion, it will break everything. Assuming semantic conventions containing the
[Edit] Looks like I misunderstood option 2. Sounds viable if these collisions are forbidden, and it avoids causing renames with the current generated code. +1 then Option 4 does not scale in my opinion, after several renames, and it assumes there is some "state" stored somewhere to remember if a collision ever existed in the past or not. Does not sound viable. Option 3 does not resolve the issue, this is bound to be repeated. For new semantic conventions, a desirable goal is to be able to decide the most natural name for a given semconv, without having to look at possible collisions, that would prevent to use a good name ... especially when the colliding semconv is deprecated and will eventually be removed over time. The only viable solution looks like option 1 or option 2. [EDIT 2024-06-11] See newer proposal as option 5 in this thread, which is better to 1 and 2 in my opinion. |
Option 2 seems the preferable approach, but just to confirm this means that any semconv generated code after this is implemented will cause a breaking change with previous versions as the variables will have been renamed, correct? I'm specifically trying to understand what this looks like in the context of the message client id change, does option 2 mean that the client ID change will be rolled back? |
just to check, does that mean that |
According to the Option2, there will be no collision for
So I don't think we need to roll it back.
That's correct. Any option we pick (except opt3) will result in breaking changes in generated code. In most languages this is tolerable since semconv artifact is experimental. |
Another point that in this specific case: The new attribute To me, option 2 is the one that makes the most sense. |
It will not. Existing code using |
Ah yes, that for sure. I think I didn't do a good job explaining, but what I was after is: whatever we do, the old const name can't carry the new attribute name. Because as you said, that wouldn't break, and be even worse because it would start sending the new attribute name. |
See my previous comment:
I don't think there is any way around this. Failure to update the name will use the new semantic convention, so using |
You're right, seems I also misunderstood option 2. I somehow thought the new one would be changed and not be used automatically, I see it now. I think this is rather bad :(. It will be hard to not overlook something and end up with this wrong situation of old instrumentation producing/using the new attributes while not fully yet converted to the stable conventions. The only way to not have this, would be to break both old and new generated consts. But breaking the mapping of |
In JS we're just marking everything currently exported by the package as deprecated and keeping it until we go to 2.0. We're exporting all the new names with a new style anyway to be friendlier to tree shakers and code minifiers (important in JS) so any breaking name changes due to a change in code generation are fine. If it happens again we'll just rev the major version as this package is entirely separate from the rest of the JS client. In short: option 2 is fine for JS and is my preference |
Here's what the option 2 codegen looks like in JS open-telemetry/opentelemetry-js@dbb8328 note: we aren't merging any changes until this issue is resolved, this is just to prototype and make sure we're ready to move quickly when a final decision is made. |
There is an overwhelming support for option 2:
We're working on the tooling updates and have a couple of draft PRs which show how things would look like:
Before we ship tooling update, we'd like to get any last-minute feedback from everyone interested, specifically @lzchen, @jack-berg, @trask on the above PRs. |
Discussed at today's Java SIG and option 2 was the preference. Thanks! |
Option 2 does not look good in the real life:
open-telemetry/semantic-conventions-java#75
I'd like to go back and explore Option 1 ( Option 3+ may look like:
We can also prohibit |
Here's more details on the Option 3.5:
Pros:
Cons:
Prototypes:
|
I think option 3.5 with prohibiting The biggest question IMO would be then what to do about the collisions that have already happened. We can probably just accept the renames that have already happened as an accident of history. |
i'm feeling prohibiting consider |
I don't understand how this would work, what is the code generation supposed to do then:
To expand on this, if the plan is to allow deprecated and non-deprecated semconv to coexist when there is a collision, "ignoring the deprecated", how about removing (instead of deprecating) the old semconv ... removal will surely resolve the collision problem. |
I'll take this (back) to the Java SIG on Thursday. I suspect that we may prefer this trade off:
note that Java already recommends that libraries make copies of experimental attributes in order to avoid the diamond dependency problem:
see open-telemetry/semantic-conventions-java#50 (comment) for a bit more background on this recommendation:
|
Discussed at Semconv and Maintainers calls:
One of the possible solutions includes a grace period during which |
List of existing (non-deprecated) attributes with _
Some examples where
... I believe we don't always know whether more attributes about something (e.g. request) are expected and default to adding an attribute with Some vague semconv guidance could be:
|
Changing code generation rules in general for OPTION 5 How about:
For example:
Taking C++ as an example, code generation will look like:
[EDIT 2024-06-11] Alternate example, if we want to change the new name instead:
Taking C++ as an example, code generation will look like:
The major benefit I see is that existing semconv like:
are unchanged. Only symbols that actually collide need special treatment, and only user code that depends on these symbols needs to change (to use What |
@lmolkova Please see proposal for option 5 above. |
@marcalff thanks! It does not make send to generate the deprecated attribute differently - nobody will update their code to point to a deprecated attribute - we don't want them to. I.e. having But I really like the idea of manual resolution:
More context on this here: open-telemetry/semantic-conventions-java#75 (comment) and here's the prototype: open-telemetry/semantic-conventions-java#76 - effectively there is a way to remap constant name and/or remove attribute. TL;DR: SIGs decide to drop the deprecated attribute or assign a different constant name - they pick a new constant name in case of a collision. |
I don't understand. I was hopping for a solution where the alternate name comes from metadata added in the semantic convention for collisions, so that the code generation scripts do not have to be changed all the time to account for special cases as they arise. Now it looks like each SIG, in their code generation scripts, it to add code for special cases. Is the conclusion really that SIGs are on their own and must fix collisions created "upstream" in semantic conventions ? In this case, opentelemetry-cpp already complies: it drops the deprecated |
Assuming there is an alternative constant name in the schema, would you drop the attribute or would you generate both attributes (with different names) in C++? The prescriptive solution coming from semconv would be:
Jinja scripts would need to be changed to either drop based one
The templates need to be changed now to support dropping/renaming. If a collision happens again, the change would be to add another attribute name to the list of excluded attributes (or to a manually maintained map). E.g. this change is java open-telemetry/semantic-conventions-java#76 adds all you need to either drop or rename constants by making a small config change.
I think it's a path forward and not a conclusion. Manual resolution seems like an easy and cheap thing to do to solve this and similar collisions in the same way. We can automate it, but it's still not clear to me what are all the options we need to provide. Note: TL;DR:
|
Discussed at Semconv WG 6/17
So:
|
Has there been any resolution on this? |
Yes, we're adding collision checks - #1209 - necessary tooling changes just landed. |
As we discovered in #1031, certain semantic conventions changes (together with default code generator behavior) result in ambiguous constant names generated in the code.
For example, when
foo.bar_baz
is renamed tofoo.bar.baz
, the code generator produces the same constant name for both -FOO_BAR_BAR
,FooBarBaz
, etc depending on the preferred casing.We don't remove attributes/enum members/metrics from the semantic conventions anymore (old property is deprecated), so both constants would exist in the same version.
So, the semantic conventions together with the codegen should prevent such collisions from happening:
Please comment/vote on specific options listed in the comments.
The text was updated successfully, but these errors were encountered: