-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support tail-based sampling from OTEL Collector #5867
Comments
@yurishkuro I'm interested in working on this but have never contributed to this repository before. Would you be able to guide me on how to tackle this issue? |
@mahadzaryab1 I would start with
|
sounds good! i'll give this a shot |
@yurishkuro I got the first item done in #5878 and read the blog post. Are there any instructions/examples on how I can can create a sample configuration and docker-compose file. Thank you so much for your time and help! |
Sample configurations are provided in the blog post. Example of docker compose using Jaeger is in docker-compose/monitor/docker-compose-v2.yml (but you will need to build the Docker image locally so that the code recognizes tail sampling processor, because officially published image will not) |
@yurishkuro thank you! as a follow-up, i'm trying to build the image locally by running
|
@yurishkuro this is how i've modified the docker setup to reproduce tail based sampling - let me know if you have any feedback |
you need to do initialize & update git submodules in order to access UI dir |
@yurishkuro awesome! thank you so much. I was able to get the setup going based on the configuration I linked above. Here's some sample output I'm seeing from the different policies that are being evaluated:
|
I suggest now to think about what e2e integration test could look like for this |
@yurishkuro will do - thanks for all your help and guidance so far! should the sample configuration/docker compose/readme go in https://github.com/jaegertracing/jaeger/tree/main/examples? |
docker-compose/tail-sampling |
@yurishkuro I completed the second task of creating a sample configuration. I'm looking for a bit of guidance on the e2e integration tests. I see that we have some integration tests but those mostly seem to be for the storage collectors. In this case, do we want to maybe test that the traces are being sampled according to the policies in our config? For example, we could test the 'filter-by-attribute' or the 'all-errors' policies by ensuring that those traces are getting captured and the rest are getting filtered out. |
You can use the tests we have as inspiration, but don't try to fit your test to them. How would you e2e test tail sampling if starting from scratch? Try explaining the whole setup and test process. |
@yurishkuro I am envisioning something like this:
Let me know what you think and if you have any feedback! |
SGTM. A couple thoughts:
BTW, in order to perform this test you do not need load generator running continuously, it's better if you just generate a fixed number of traces. You also need to make sure the storage is purged between A and B. |
@yurishkuro Thanks for the follow-up. I'm going to try playing around with tracegen to begin with to see what kind of traces I can generate. I'm currently trying to replace mirosim with tracegen to the docker-compose.yml with the following configuration.
This doesn't seem to work and gives me the following error:
Do you know what I'm missing? Here is the full docker-compose set up if that helps. |
You're missing network setting on tracegen compose config |
@yurishkuro thank you so much! As a follow up, I was wondering if you had any guidance (or an example if we're already doing this somewhere) on how I can query the spans that are stored? If we set up a policy on the service name, then we can do an A/B test with and without sampling as you suggested by querying the spans from store. |
@yurishkuro I've been reading through the storage integration tests. If we add a filter on the service name in our tail sampling processor config, then can we do something similar to https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/integration/integration.go#L136 to get all the services from the traces we have collected (in memory) and ensure that we only have the ones we listed in our policy? |
As well, do you have any thoughts on where the integration tests should live? It seems like the storage ones are in |
Conceptually yes, but sampling works at the trace level (all spans or nothing), so still need a bit more thought how you'd test them |
The test needs a new workflow yaml file and a new shell script to orchestrate. Then depending on how you want to do the analysis you might need more - at this point I don't know what you have in mind. If you need go code then cmd/jaeger/internal/integration would be a good place. |
@yurishkuro Would it not be enough to have a policy in the tail sampling processor that filters out a single service using a string attribute policy, say |
That's fine as a first version, but it's not quite a robust test because it doesn't prove that service02 was actually generated in the first place. An A/B test wouldn't have that problem. |
@yurishkuro I see. How about something like?
|
@yurishkuro would you be able to help me with the setup of the test? i'm stuck on the following:
|
It is easiest to query data store if you write Go code in Load balancer is not necessary since you will only be running a single instance of the collector. The objective of the load balancer is to ensure that all spans for the same trace ID end up in the same instance of the collector that runs tail sampling logic. |
@yurishkuro Got it! Thank you so much. A couple of follow-ups I had:
|
Yes if you have to write code to read you might as well save whichever traces you need via Span Writer. The test won't automatically run because those integration tests all require an environment variable to activate, otherwise they will all run at once and most of them fail because storage backend won't be available. |
@yurishkuro I've completed the integration test and pushed it to #5878. Working on the final README task - should this go in |
yes please |
@yurishkuro Thanks for all your help and guidance in helping me complete this issue. I learnt a lot about OpenTelemtry, Jaeger, and Distributed Tracing. I'm very excited about this project and would like to keep contributing to it. Do you have any recommendations for what I can pick up next? |
@mahadzaryab1 appreciate the help. The top priority is completing the work on Jaeger-v2, which you can see in the project board https://github.com/orgs/jaegertracing/projects/3/views/2 |
Jaeger v2 can now support tail based sampling that exists in OTEL Collector as an extension.
loadbalancing
exporter andtail_sampling
processor incomponents.go
As a single task it's too large for
good-first-issue
, but can be done incrementallyThe text was updated successfully, but these errors were encountered: