Skip to content

release: flipping fatal-by-defaults#8847

Closed
alyssawilk wants to merge 6 commits intoenvoyproxy:masterfrom
alyssawilk:flips
Closed

release: flipping fatal-by-defaults#8847
alyssawilk wants to merge 6 commits intoenvoyproxy:masterfrom
alyssawilk:flips

Conversation

@alyssawilk
Copy link
Contributor

@alyssawilk alyssawilk commented Nov 1, 2019

Making the following configs fatal by default:
"envoy.deprecated_features.route.proto:regex_match",
"envoy.deprecated_features.route.proto:regex",
"envoy.deprecated_features.cds.proto:tls_context",
"envoy.deprecated_features.route.proto:allow_origin",
"envoy.deprecated_features.string.proto:regex",
"envoy.deprecated_features.route.proto:value",
"envoy.deprecated_features.http_connection_manager.proto:idle_timeout",
"envoy.deprecated_features.lds.proto:use_original_dst",
"envoy.deprecated_features.config_source.proto:DEPRECATED_AND_UNAVAILABLE_DO_NOT_USE",
"envoy.deprecated_features.route.proto:pattern",
"envoy.deprecated_features.listener.proto:tls_context",
"envoy.deprecated_features.http_connection_manager.proto:operation_name",
"envoy.deprecated_features.health_check.proto:use_http2",
"envoy.deprecated_features.trace.proto:DEPRECATED_AND_UNAVAILABLE_DO_NOT_USE",
"envoy.deprecated_features.trace.proto:HTTP_JSON_V1",
"envoy.deprecated_features.cds.proto:ORIGINAL_DST_LB",
"envoy.deprecated_features.route.proto:regex",
"envoy.deprecated_features.cds.proto:extension_protocol_options",
"envoy.deprecated_features.route.proto:method",

Risk Level: High

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

I'm going to put this out here before the tests pass so @envoyproxy/maintainers can bike shed about what things we don't want to have fatal.

Also meta, while I love landing code and then turning features on, I think quarterly cadence is too slow, because it makes for one massive high risk PR. Should we true-by-default and just leave things in the code for 6 months? true by default unless it's super high risk (like the buffer change)? Thoughts?

@alyssawilk
Copy link
Contributor Author

Also as long as I'm inviting bike shedding, I'm inclined to split out fatal-by-defaults from feature flips, in case one causes more problems than the other. Thoughts?

May check in tomorrow but I don't think it's urgent we land this all before monday so we'll see :-P

@alyssawilk
Copy link
Contributor Author

alyssawilk commented Nov 1, 2019

Yeah, tests are unhappy locally. I'll look into it Monday unless someone else wants to take a look?
I still encourage discussion in the meantime.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk changed the title release: flipping flags and fatal-by-defaults release: flipping fatal-by-defaults Nov 1, 2019
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely some stuff to discuss in here. Let's all chat as a group on Monday.

/wait

// 1.12.0
"envoy.deprecated_features.route.proto:regex_match",
"envoy.deprecated_features.route.proto:regex",
"envoy.deprecated_features.accesslog.proto:config",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just did this 1-2 weeks ago so this seems pretty mean for those running on master. Defer to next release? Although aren't we moving to a 6 month cycle for this anyway? I can't remember. (Applies to all config in this list). Also I think this deprecation isn't even fully completed yet as @lizan identified some stragglers. cc @yanavlasov

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stragglers should be addressed PR #8861

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to leave everything with config until the next release then? I don't think we can do the stragglers in this release

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 let's just leave until next release.

"envoy.deprecated_features.route.proto:regex",
"envoy.deprecated_features.accesslog.proto:config",
"envoy.deprecated_features.thrift_proxy.proto:config",
"envoy.deprecated_features.cds.proto:tls_context",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, just done, seems not good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this has been stale for a month, WDYT? :-P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is still that this is a config option that almost everyone sets. With that said, after a month, it's probably fine. @lizan WDYT?

"envoy.deprecated_features.route.proto:allow_origin",
"envoy.deprecated_features.string.proto:regex",
"envoy.deprecated_features.grpc_service.proto:config",
"envoy.deprecated_features.tcp_proxy.proto:deprecated",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? I think this should be "deprecated_v1"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the script doesn't deal well with

// [#next-free-field: 11]
message TcpProxy {
// [#not-implemented-hide:] Deprecated.
// TCP Proxy filter configuration using V1 format.
message DeprecatedV1 {
option deprecated = true;

I think this one is beyond python so should manually tweak.

"envoy.deprecated_features.grpc_service.proto:config",
"envoy.deprecated_features.tcp_proxy.proto:deprecated",
"envoy.deprecated_features.overload.proto:config",
"envoy.deprecated_features.route.proto:value",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is in a nested message. I'm a little worried this might accidentally conflict? Should the name actually include the full ptah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, I agree, but I think changing naming at this point is going to be ugly.
I'll see what I can do - I think we'll have to transition over to the new naming pretty carefully.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I remember why I didn't do this - it's really non-trivial to do given the python script is fairly stateless and doesn't know the message it's in. I think we'd have to fundamentally rewrite from scratch into something that did proto introspection.

Another alternate is we could have a not-yet-fatal section in runtime_features, and ASSERT in test that everything tagged as deprecated is either in disallowed_features or not_yet_disallowed_features. This would CI-regression-test that folks to add their own fully qualified fields as they added code, and then we could just cut from not_yet_disallowed to disallowed as part of the release.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyssawilk do you mind briefly syncing up with @htuch about this? Some of the proto parsing stuff he is doing now is pretty incredibly and I'm guessing it wouldn't be that difficult to change your script to actually run against the real proto tree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the approach you could take here is to leverage the API type database to pull out this information. There may be other approaches we could take, e.g. we could include deprecation date annotations which could allow us to infer the disallowed fields via reflection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Do you have someone who could take a look some time this release (at which point I can revert this) and file a tracking bug at someone else? Alternately I could snag some time on your calendar for code pointers, given my python fu is pretty weak.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think anyone who can do this is probably going to be busy landing v3 for this release, so I think we can TODO/file issue here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so I'll just land manual cleanups, and TODO script cleanups or script rewrites.

"envoy.deprecated_features.listener.proto:tls_context",
"envoy.deprecated_features.listener.proto:config",
"envoy.deprecated_features.health_check.proto:config",
"envoy.deprecated_features.http_connection_manager.proto:[(validate.rules).enum",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, that's
OperationName operation_name = 1
[(validate.rules).enum = {defined_only: true}, deprecated = true];

Again I think there's some things that are beyond simple checking - we'd have to either do this programatically which would be awesome but I'm not sure how to do, or accept some clean up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment, let's sync up with @htuch

@mattklein123 mattklein123 self-assigned this Nov 1, 2019
@mattklein123
Copy link
Member

Should we true-by-default and just leave things in the code for 6 months? true by default unless it's super high risk (like the buffer change)? Thoughts?

+1

Also as long as I'm inviting bike shedding, I'm inclined to split out fatal-by-defaults from feature flips, in case one causes more problems than the other. Thoughts?

+1

@stale
Copy link

stale bot commented Nov 12, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Nov 12, 2019
@alyssawilk alyssawilk added the no stalebot Disables stalebot from closing an issue label Nov 13, 2019
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Nov 13, 2019
Copy link
Contributor Author

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo, I have time to work on Envoy this month!
Still not ready for review - I'm going to do some script fixes, but sending out comments for conversation

// 1.12.0
"envoy.deprecated_features.route.proto:regex_match",
"envoy.deprecated_features.route.proto:regex",
"envoy.deprecated_features.accesslog.proto:config",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to leave everything with config until the next release then? I don't think we can do the stragglers in this release

"envoy.deprecated_features.route.proto:regex",
"envoy.deprecated_features.accesslog.proto:config",
"envoy.deprecated_features.thrift_proxy.proto:config",
"envoy.deprecated_features.cds.proto:tls_context",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this has been stale for a month, WDYT? :-P

"envoy.deprecated_features.grpc_service.proto:config",
"envoy.deprecated_features.tcp_proxy.proto:deprecated",
"envoy.deprecated_features.overload.proto:config",
"envoy.deprecated_features.route.proto:value",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, I agree, but I think changing naming at this point is going to be ugly.
I'll see what I can do - I think we'll have to transition over to the new naming pretty carefully.

"envoy.deprecated_features.route.proto:allow_origin",
"envoy.deprecated_features.string.proto:regex",
"envoy.deprecated_features.grpc_service.proto:config",
"envoy.deprecated_features.tcp_proxy.proto:deprecated",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the script doesn't deal well with

// [#next-free-field: 11]
message TcpProxy {
// [#not-implemented-hide:] Deprecated.
// TCP Proxy filter configuration using V1 format.
message DeprecatedV1 {
option deprecated = true;

I think this one is beyond python so should manually tweak.

"envoy.deprecated_features.listener.proto:tls_context",
"envoy.deprecated_features.listener.proto:config",
"envoy.deprecated_features.health_check.proto:config",
"envoy.deprecated_features.http_connection_manager.proto:[(validate.rules).enum",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, that's
OperationName operation_name = 1
[(validate.rules).enum = {defined_only: true}, deprecated = true];

Again I think there's some things that are beyond simple checking - we'd have to either do this programatically which would be awesome but I'm not sure how to do, or accept some clean up

@mattklein123 mattklein123 added waiting and removed no stalebot Disables stalebot from closing an issue waiting labels Dec 3, 2019
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

Ok, got rid of stragglers. Running tests locally but figured I'd ping about protocol buffer names - do we want to do name.field for 1.12 or do the changeover consistently for all fields when we cut to the next API version? I can go either way

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for coming back to this. A few comments.

/wait

tracing:
operation_name: INGRESS
idle_timeout: 840s
http2_protocol_options: {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to delete this?

Comment on lines +35 to +36
common_http_protocol_options:
idle_timeout: 840s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did this pass tests in the first place? Shouldn't this have failed somehow on the compile time options build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had the example configs test tagged as excluded. I'll go and fix that as I think it's more trouble than it's worth.

Comment on lines +27 to +29
- safe_regex:
google_re2: {}
regex: .*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do an exact match on "*" here. Same question though on how this passed tests before?

@mattklein123
Copy link
Member

Running tests locally but figured I'd ping about protocol buffer names - do we want to do name.field for 1.12 or do the changeover consistently for all fields when we cut to the next API version? I can go either way

Sorry can you clarify? I'm not sure what we are choosing between?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

At a high level we were discussing if we wanted to have
proto.file:field_name
or
proto.file:MessageName:field_name
for the new ones only.
While I'm not positive I can update the scripts to do the latter I can do a one off manually and Harvey and team can fix it for the next release.
or we can just switch naming when we cut v3.

@mattklein123
Copy link
Member

While I'm not positive I can update the scripts to do the latter I can do a one off manually and Harvey and team can fix it for the next release.

IMO I would just manually fix it now? The downside would be potentially breaking people who are setting these flags but IMO an envoy-announce@ email would be good enough for that. WDYT?

/wait-any

@alyssawilk
Copy link
Contributor Author

Oh sorry, did you want to fix including all the ones that are overridedden today, or just the new ones? I thought we'd have to support both for a bit to not be crummy.

Also two questions. If we're going to change this
Do we want to just go from filename:fieldname to FieldDescriptor fullname, i.e. from
envoy.deprecated_features.deprecated.proto:is_deprecated to
envoy.test.deprecation_test.Base.is_deprecated
Second and @htuch is this all getting replaced by something fancier like auto-conversion in v2 and I should just revert this PR and let you sort things out going forward?

@alyssawilk
Copy link
Contributor Author

Fullname has the disadvantage f being a bit longer, but AFIK can't have conflicts

@mattklein123
Copy link
Member

Oh sorry, did you want to fix including all the ones that are overridedden today, or just the new ones? I thought we'd have to support both for a bit to not be crummy.

IMO it's probably OK to just change it without supporting both as long as we email envoy-announce, since users should be able to set both variables in runtime if needed before the deploy. I think this will effect almost no one so it would be better to reduce our pain if we think that is ok.

Fullname has the disadvantage f being a bit longer, but AFIK can't have conflicts

Big +1 on full name. @htuch any thoughts here? Also see the comment above about auto conversion.

/wait

@htuch
Copy link
Member

htuch commented Dec 17, 2019

Yeah, full name is the way to go for sure, way less danger of conflict. Regarding whether this gets replaced by something fancier; I think in terms of the technology we use, we have options after v3 lands, we now have some pretty deep proto and C++ analysis/rewrite capabilities (Envoy begins learning at a geometric rate and becomes self aware at 2:14am Eastern time, January 1 2020).

OTOH, I do like the idea of fatal-by-default flips each quarter, so orthogonal to the technology choice, I think we want to maintain this capability. This encourages earlier migration to new API use best practices and provides an audit trail of where folks are lagging, within a single major API version.

@stale
Copy link

stale bot commented Dec 24, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 24, 2019
@stale
Copy link

stale bot commented Dec 31, 2019

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot closed this Dec 31, 2019
@alyssawilk
Copy link
Contributor Author

So I had this theory wherein I'd get this out today, and unexpectedly spent [longer than I will admit] why I couldn't get things to fail

Turns out with the v3 migration, we had depcated and made fatal by default
envoy.deprecated_features.http_connection_manager.proto:idle_timeout
but now there's only the field
envoy.deprecated_features.http_connection_manager.proto:hidden_envoy_deprecated_idle_timeout
so my checks for "do I have parity between old deprected fields and my new fully qualified ones" are kinda moot because we just renamed all the things which were deprecated.

v2 is still supported AFIK via proto conversion so I think for this release we mark the "fatal" ones illegal conversions, and then deal with new deprecations after the release?
I'm going to block off some time tomorrow morning to sort out a plan with @htuch

@htuch
Copy link
Member

htuch commented Jan 7, 2020

Yeah, let's chat tomorrow. The deprecated field renaming is very mechanical, so I feel there must be a way to recover the information you are after here.

alyssawilk added a commit that referenced this pull request Jan 15, 2020
This PR makes the following fatal by default:
from cluster.proto: ORIGINAL_DST_LB, tls_context, extension_protocol_options
from health_check.proto: use_http2
from route_components.proto: allow_origin regex, pattern, method, regex_match, value
from http_connection_manager.proto: operation_name
from trace.proto: HTTP_JSON_V1
from string.proto: regex

Risk Level: Medium (who knows who is using them)
Testing: test framework updates
Docs Changes: n/a
Release Notes: n/a

Originally #8847

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk deleted the flips branch May 18, 2020 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale stalebot believes this issue/PR has not been touched recently waiting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants