Skip to content

fix: fix test of root span to match what is being set#2494

Merged
bbrowning merged 1 commit intollamastack:mainfrom
grs:issue-2493
Jun 26, 2025
Merged

fix: fix test of root span to match what is being set#2494
bbrowning merged 1 commit intollamastack:mainfrom
grs:issue-2493

Conversation

@grs
Copy link
Contributor

@grs grs commented Jun 23, 2025

What does this PR do?

I get errors when trying to query spans. It appears to be a result of traces being inserted where there is no root_span_id which causes a pydantic validation error on trying to load the data for a query response (and in any case having no span referenced undermines the purpose of the trace). The root cause as far as I can see is an invalid test in the code that inserts the trace, where it is testing for the string "true" against an object set to the python value True.

Closes #2493

Test Plan

With this change I can query spans.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 23, 2025
@grs grs force-pushed the issue-2493 branch 2 times, most recently from 1268a91 to 8a24da5 Compare June 23, 2025 16:19
@grs grs changed the title fix test of root span to match what is being set fix: fix test of root span to match what is being set Jun 23, 2025
@bbrowning
Copy link
Collaborator

Even with this change, we'll still sometimes insert None into this field in the database. Won't that lead to the same error querying spans later on? This just shifts the conditions that may trigger that error?

I believe this change looks reasonable for the purposes of properly testing if we're recording a root span or not, but we may also need to adjust the Trace class in llama_stack/apis/telemetry/telemetry.py to make root_span_id optional, because we can still end up placing None in that field.

Or, if it's actually required, we'll have to do the work to figure out what our root_span_id actually is for every trace request and never insert None there, right?

@grs
Copy link
Contributor Author

grs commented Jun 23, 2025

Even with this change, we'll still sometimes insert None into this field in the database. Won't that lead to the same error querying spans later on? This just shifts the conditions that may trigger that error?

I believe this change looks reasonable for the purposes of properly testing if we're recording a root span or not, but we may also need to adjust the Trace class in llama_stack/apis/telemetry/telemetry.py to make root_span_id optional, because we can still end up placing None in that field.

Or, if it's actually required, we'll have to do the work to figure out what our root_span_id actually is for every trace request and never insert None there, right?

My assumption (which may well be flawed), is that there is little value in a trace entry that does not reference a span. In what circumstances would a trace be inserted with no root_span_id, and how would that entry then be used?

@bbrowning
Copy link
Collaborator

If we're part of a larger distributed trace started outside of Llama Stack by the calling client that root span id won't be our span_id but may be our parent_span_id (or our parent's parent, or parent's parent's parent, etc). I don't know if Llama Stack itself has visibility into the outermost root_span_id in those cases, although you could argue that for the purposes of this work it would probably be enough to traverse our parent lineage as far back as we can and consider that the root_span_id in most or all cases?

@bbrowning
Copy link
Collaborator

And, maybe I'm reading too much into this column in sqlite being named root_span_id. If it's only meant to represent the span_id, and not the actual root_span_id, then that may change what we send. I'll admit to being mostly oblivious as to what queries this sqlite vs just shipping traces off to purpose-built distributed tracing backends.

@grs
Copy link
Contributor Author

grs commented Jun 23, 2025

Even with propagated trace context though, there doesn't seem to be any value in inserting a record into the traces table where the 'root_span_id' is null as it then has no link to any span information.

@grs
Copy link
Contributor Author

grs commented Jun 24, 2025

@bbrowning I've made a small further change here to correctly handle the case where a traceparent context is passed in by the client. I have introduced a separate marker for the 'local' root, so that inserting a span id for each trace is not dependent on the __root_span__ marker which is removed when the logical root is remote. Wdyt?

Copy link
Collaborator

@bbrowning bbrowning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable approach, ensuring we log the span_id we know about even if for some reason it's not the ultimate root span because we're part of a larger context.

I left a non-blocking comment around maybe considering a code comment or constant just to help us remember the purpose of this.

Thanks!

Using the telemetry API to query the sqlite store of traces and spans
requires that each trace record references the first span recorded
locally in the 'spans' table.

This fix ensures such a reference is always in place whether or
not a traceparent is passed in.

Signed-off-by: Gordon Sim <gsim@redhat.com>
@bbrowning
Copy link
Collaborator

I commented out the pytest skip for the only test in tests/integration/telemetry/test_telemetry.py and it fails on current main. I pulled your PR locally, and now that test passes locally. This looks good to me!

Here's how I ran that test locally, after commenting out the pytest skip marker on it:

ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
uv run llama stack run llama_stack/templates/ollama/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
uv run pytest -sv tests/integration/telemetry/test_telemetry.py \
  --text-model meta-llama/Llama-3.2-3B-Instruct

I don't know if this was the only reason that test was marked as skipped. If you're willing, it might be worth digging into if this makes that test stable now or if it was also failing for other reasons in CI. But, either way, great job on this one - merging!

@bbrowning bbrowning merged commit 68d8f21 into llamastack:main Jun 26, 2025
45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

span query error

4 participants