@lewismc lewismc commented Oct 17, 2025

This covers task #1 (Research and Setup) from TIKA-4513, i.e.

  1. Research and Setup

Review OpenTelemetry Java getting-started guide and instrumentation registry for Tika-relevant libraries (e.g., auto-instrumentation for Jetty HTTP server, Apache HttpClient).
Set up a local dev environment with Tika Server, OpenTelemetry Java agent (latest stable release), and a test collector (e.g., Grafana Alloy in Docker).
Prototype basic trace export for a sample /tika request.
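
For anyone who wants to drive the "sample /tika request" from Java rather than curl, here is a minimal sketch using only the JDK's built-in HTTP client. It assumes Tika Server is running locally on its default port (9998); the class name `TikaRequestSketch` and the helper `buildRequest` are hypothetical names used only for this illustration and are not part of the PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class TikaRequestSketch {

    // Assumption: Tika Server is running locally on its default port (9998).
    static final String TIKA_ENDPOINT = "http://localhost:9998/tika";

    /** Builds a PUT request asking Tika Server to extract plain text from the given body. */
    static HttpRequest buildRequest(String endpoint, String body) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "text/plain")
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = buildRequest(TIKA_ENDPOINT, "Hello, Tika!");
        // Requires a running server. With instrumentation enabled, this single
        // request should surface as a trace in whichever backend the collector feeds.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Running `main` against a live server prints the HTTP status and the extracted text; the request-building step is kept separate so it can be checked without a server.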

I have lots of commentary to add... which I will do in due course. For now I was thinking of creating a video demo to better communicate the PR and what it offers.

One important thing: instrumentation (per OTEL) is disabled by default, so the impact on existing Tika users is very small.

Before I get around to asking people to review this PR, I want to agree on how to structure the constituent tasks in TIKA-4513. I will continue that conversation on the Jira ticket.

In the meantime, if anyone wishes to take this for a spin, the Markdown documentation (most notably OPENTELEMETRY.md) will get you up and running.

NOTE: I used Claude-4.5-sonnet to generate

  • the Markdown documents. I will note that Claude generated lots of mistakes, which I fixed by hand during my review. That being said, I've literally stepped through this documentation line by line now and I genuinely don't think I could have done it better myself if you gave me another week. I'm impressed and satisfied with the in-progress result.
  • some Javadoc, notably the Javadocs with loads of commentary. Again, I'm satisfied with the outcome and I think it will assist in a better understanding of the additions.
  • TikaOpenTelemetryTest.java... some basic unit test coverage which was convenient.
  • to figure out that TikaOpenTelemetryConfig had to implement Initializable... this saved me loads of study time as it had been ages since I looked at tika-server internals and lots has changed.
    I'm aware that the PR title will need to be augmented to accommodate this.

This instrumentation mega-project is likely similar in scale to tika-pipes. There is still loads of work to do.

You will also have noticed that I used Jaeger as a basic example. I will be providing another example using Grafana Alloy as the OTEL collector, as it is much more closely aligned with $dayjob. That being said, I did want to demonstrate the power of OTEL as a vendor-agnostic instrumentation framework. Very powerful indeed.

In the meantime, here are a few screenshots which demonstrate what a trace containing two spans looks like in Jaeger. Pretty basic but exciting stuff.

[Screenshot 2025-10-16 at 22 27 23]
[Screenshot 2025-10-16 at 22 27 47]
