TIKA-4513 Instrument tika-server #2367
Open
This covers task #1 (Research and Setup) from TIKA-4513.
I have lots of commentary to add... which I will do in due course. For now I was thinking of creating a video demo to better communicate the PR and what it offers.
One important thing: instrumentation (per OTEL) is disabled by default, so the impact on existing Tika users is very small.
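To illustrate what "disabled by default" means in practice, here is a minimal sketch of the general pattern, not the code in this PR. The class `TelemetryToggle` and the setting name `enabled` are hypothetical and chosen only for illustration; the point is that an absent setting means the untouched code path runs with no telemetry work at all.

```java
import java.util.Map;

/** Hypothetical sketch of an opt-in toggle; not the PR's actual classes. */
class TelemetryToggle {

    /** Hypothetical setting name, used only for this illustration. */
    static final String ENABLED_KEY = "enabled";

    /** True only when the setting is explicitly "true"; absent means disabled. */
    static boolean isEnabled(Map<String, String> settings) {
        return Boolean.parseBoolean(settings.getOrDefault(ENABLED_KEY, "false"));
    }

    /** Stand-in for real work: timing is recorded only when telemetry is on. */
    static String process(Map<String, String> settings, String input) {
        if (!isEnabled(settings)) {
            return input.trim(); // default path: no extra work for existing users
        }
        long start = System.nanoTime();
        String result = input.trim();
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("process took " + elapsedNanos + " ns");
        return result;
    }
}
```

With an empty settings map the result is identical to the instrumented path; only the side effect (the timing output) differs.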
Before I get around to asking people to review this PR, I want to agree on how to structure the constituent tasks in TIKA-4513. I will continue that conversation on the Jira ticket.
In the meantime, if anyone wishes to take this for a spin, the markdown documentation (most notably OPENTELEMETRY.md) will get you up and running.
NOTE: I used Claude-4.5-sonnet to generate TikaOpenTelemetryTest.java... some basic unit test coverage, which was convenient. TikaOpenTelemetryConfig had to implement Initializable... this saved me loads of study time, as it had been ages since I looked at tika-server internals and lots has changed. I'm aware that the PR title will need to be augmented to accommodate this.
This instrumentation mega-project is likely similar in scale to tika-pipes. There is still loads of work to do.
You will also have noticed that I used Jaeger as a basic example. I will be providing another example using Grafana Alloy as the OTEL collector, as it is much more closely aligned with $dayjob. That being said, I did want to demonstrate the power of OTEL as a vendor-agnostic instrumentation framework. Very powerful indeed.
In the meantime, here are a few screenshots which demonstrate what a trace containing two spans looks like in Jaeger. Pretty basic but exciting stuff.