Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Jaeger's GRPC handler to flow the tenant from an HTTP header #3688

Merged
merged 2 commits into from
May 31, 2022

Conversation

esnible
Copy link
Contributor

@esnible esnible commented May 17, 2022

Signed-off-by: Ed Snible [email protected]

If Jaeger is started with for example --collector.tenant-header=x-tenant, flow the value of the x-tenant header to the tenant parameter for span processing.

If Jaeger is not started with --collector.tenant-header, this is essentially a no-op.

cc @pavolloffay

@esnible esnible requested a review from a team as a code owner May 17, 2022 14:38
@esnible esnible requested a review from vprithvi May 17, 2022 14:38
@codecov
Copy link

codecov bot commented May 17, 2022

Codecov Report

Merging #3688 (c2551c7) into main (557ff71) will decrease coverage by 0.04%.
The diff coverage is 96.47%.

@@            Coverage Diff             @@
##             main    #3688      +/-   ##
==========================================
- Coverage   97.49%   97.45%   -0.05%     
==========================================
  Files         269      271       +2     
  Lines       15934    16014      +80     
==========================================
+ Hits        15535    15606      +71     
- Misses        315      321       +6     
- Partials       84       87       +3     
Impacted Files Coverage Δ
cmd/collector/app/flags/flags.go 96.73% <57.14%> (-3.27%) ⬇️
cmd/collector/app/handler/grpc_handler.go 100.00% <100.00%> (ø)
cmd/collector/app/handler/otlp_receiver.go 100.00% <100.00%> (ø)
cmd/collector/app/span_handler_builder.go 100.00% <100.00%> (ø)
pkg/config/tenancy/flags.go 100.00% <100.00%> (ø)
pkg/config/tenancy/tenancy.go 100.00% <100.00%> (ø)
cmd/query/app/static_handler.go 94.01% <0.00%> (-3.60%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 557ff71...c2551c7. Read the comment docs.

}
md, ok := metadata.FromIncomingContext(ctx)
if !ok {
return "", status.Errorf(codes.Internal, "missing metadata")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it an error to be missing metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Google's GRPC implementation there is always metadata, and I was thinking this is a 500/unexpected situation.

This code path is hit when Jaeger has been deployed with a requirement for a tenant header, and GRPC hasn't supplied metadata.
There is no security-related reason to treat missing metadata as different from valid metadata lacking the tenant header. Shall I rework this to combine missing-metadata and missing-header handling?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, but then I would not treat it as codes.Internal, but similar to other ones and I would alter the error message, e.g. "missing metadata, tenant header is required"

@@ -40,6 +40,7 @@ const (
collectorGRPCMaxReceiveMessageLength = "collector.grpc-server.max-message-size"
collectorMaxConnectionAge = "collector.grpc-server.max-connection-age"
collectorMaxConnectionAgeGrace = "collector.grpc-server.max-connection-age-grace"
collectorTenantHeader = "collector.tenant-header"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gRPC is not the only way to submit data to collector. Is the intention to require the same header name across transports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yurishkuro I'll need your advice on this. I have only used the GRPC transport. I had assumed that most users only use a single transport. If some users enable multiple transports, and wish to use different tenancy headers, I am happy to change it.

I'm expecting to implement flowing the header from HTTP to the internal tenant for GRPC only, as that is the only transport I use and know how to test. I hope to leave the other transports as a follow-on for others who desire tenancy on the other transports.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you aware of any existing header conventions for the tenant? Is x-tenant suitable to both HTTP and gRPC?

My suggestion:

  • keep a single header for now, we can extend this in the future if needed
  • I would like to have a choke point in the pipeline that would know that a tenancy is required and would verify that the Context contains a valid tenant. This way, even if we don't implement tenancy in HTTP headers immediately, the system would still be otherwise "safe", i.e. it would not accept data that is missing the tenant value.

The last point requires a configuration, which I would suggest to consolidate, e.g. something like

  multi_tenancy:
    enabled: true
    header: "x-tenant"  # this could be the default value instead of always requiring it
    tenants: ["a", "b", "c"]  # required

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to specify tenants at the deployment time? I wanted to avoid restarting Jaeger (and changing deployments) when a new tenant is added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration can be extended to accept tenants in other forms, like watching a file or by integrating with another source. This would be a good opportunity to start introducing a control plane where tenants can be provided externally via api call, such that the integration with the source can be decoupled from Jaeger code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose tackling the tenant restrictions in a separate PR.

More sophisticated deployments should do tenant validation in the auth layer (proxy). Which makes me think do we even need Jaeger to do this validation? A valid requirement for predefined tenants might be if the storage implementation needs to do some initialization per tenant. In the case of ES it should not be required, but maybe other storages like C* need to create a new DB that cannot be done in-flight of the first store call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation I'm most concerned about is not whether tenant is valid, but whether it is sent at all. Without that we open a hole that could allow other tenants to read your data.

I agree that there needs to be a stronger authentication somewhere other than a header, which is trivial to spoof.

@jkowall any experience to share here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will relax the handling of valid tenants so that if multi_tenancy.tenants is undefined any tenant will pass. This will allow the spanwriter to reject spans. In the future we can provide a way for the writer to reject earlier.

Storage might have rules about what a valid tenant can be that we can't express on a list. For example, some storage might reject tenants that include non-ASCII characters. Or some storage might want to hot-enable/disable a tenant.

validTenants = "multi_tenancy.tenants"
)

// TenancyFlagsConfig describes which CLI flags for TLS client should be generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type does not seem necessary, we could just have global functions. Or, at least, it should be the same type as Options, that's the pattern we used, i.e. the config/options type exposes functions to initialize itself via CLI.

tenants map[string]bool
}

func (tl tenantList) Valid(candidate string) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure you want by-value receiver here? There's going to be a heap allocation anyway when the struct is converted into Guard interface.

Comment on lines 49 to 61
var tenant string
if g.tenancyConfig.Enabled {
var err error
tenant, err = g.validateTenant(ctx)
if err != nil {
g.logger.Error("rejecting spans (tenancy)", zap.Error(err))
return nil, err
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest moving this to validateTenant

Suggested change
var tenant string
if g.tenancyConfig.Enabled {
var err error
tenant, err = g.validateTenant(ctx)
if err != nil {
g.logger.Error("rejecting spans (tenancy)", zap.Error(err))
return nil, err
}
}
tenant, err := g.validateTenant(ctx)
if err != nil {
g.logger.Error("failed to validate tenant, rejecting spans", zap.Error(err))
return nil, err
}

type TenancyConfig struct {
Enabled bool
Header string
Valid Guard
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks weird in the usage: config.Valid.Valid(). How about:

type TenancyConfig struct {
    ...
    guard Guard
}

func (c *TenancyConfig) Valid(v string) bool {
    return c.guard.Valid()
}

especially since the current Valid variable is force-assigned in the constructor & cannot be user-supplied.

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@esnible esnible force-pushed the flow-tenant-from-grpc branch 2 times, most recently from 9532548 to ecba97d Compare May 23, 2022 13:33
func AddFlags(flags *flag.FlagSet) {
flags.Bool(tenancyEnabled, false, "Enable tenancy header when receiving or querying")
flags.String(tenancyHeader, "x-tenant", "HTTP header carrying tenant")
flags.String(validTenants, "", "Acceptable tenants")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should mention it is a comma-separated list. What if the flag is not defined?

},
}
for _, test := range tests {
_, err := client.PostSpans(test.ctx, &api_v2.PostSpansRequest{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the individual test cases should be executed separately via t.Run

Copy link
Contributor Author

@esnible esnible May 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the format of TestPostSpans(). No tests in cmd/collector/app/handler use t.Run().

Anyway, I made the changes.

Tenants: validTenants,
}))
for _, test := range tests {
tenant, err := handler.validateTenant(test.ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here, test cases should be run via t.Run

// AddFlags adds flags for tenancy to the FlagSet.
func AddFlags(flags *flag.FlagSet) {
flags.Bool(tenancyEnabled, false, "Enable tenancy header when receiving or querying")
flags.String(tenancyHeader, "x-tenant", "HTTP header carrying tenant")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a first version, we could have a single flag tenant-header which would enable tenancy if it is defined.

Internally we will most likely not use the predefined list of tenants (e.g. in the observatorium project) and if the storage layer does not need to have access to the tenant names I find this configuration redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to define flags more forward looking than just what works with the first version. Using .enabled is already an established pattern in Jaeger flags.

@esnible esnible mentioned this pull request May 25, 2022
@esnible
Copy link
Contributor Author

esnible commented May 31, 2022

I struggled with re-basing after #3707 . Instead I copied the code changes and tests and manually applied them to main.

batch := &r.Batch
err := g.batchConsumer.consume(batch)
err = g.batchConsumer.consume(batch, tenant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest changing the signature to consume(context, batch) and handling tenancy inside. This way it can be reused in OTLP receiver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ok to do in another PR if you prefer)

@yurishkuro yurishkuro merged commit a184e94 into jaegertracing:main May 31, 2022
albertteoh pushed a commit to albertteoh/jaeger that referenced this pull request Jul 13, 2022
…aegertracing#3688)

* Flow tenant from GRPC PostSpans header through processors

Signed-off-by: Ed Snible <[email protected]>

* Restore accidentally deleted test case

Signed-off-by: Ed Snible <[email protected]>
Signed-off-by: Albert Teoh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants