Skip to content

chore(test): fix flaky telemetry tests#2815

Merged
ehhuang merged 1 commit intollamastack:mainfrom
Elbehery:20250718_fix_telemetry_tests
Jul 22, 2025
Merged

chore(test): fix flaky telemetry tests#2815
ehhuang merged 1 commit intollamastack:mainfrom
Elbehery:20250718_fix_telemetry_tests

Conversation

@Elbehery
Copy link
Contributor

What does this PR do?

This PR fixes flaky telemetry tests

See #2814

Test Plan

@Elbehery
Copy link
Contributor Author

cc @cdoern

Copy link
Collaborator

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! Is there explanation as to why this is the proper fix?

@Elbehery
Copy link
Contributor Author

thanks! Is there explanation as to why this is the proper fix?

so the CI failed on with this error

pydantic_core._pydantic_core.ValidationError: 1 validation error for Trace
root_span_id
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]

the failure is due to database can contain traces where root_span_id is NULL (when no span was marked with the LOCAL_ROOT_SPAN_MARKER), but the Trace Pydantic model expects root_span_id to be a required string field.

@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch from e065b94 to 8caf78b Compare July 18, 2025 19:17
@Elbehery
Copy link
Contributor Author

ci is green 🙏🏽

Comment on lines +79 to +83
COALESCE(t.root_span_id,
(SELECT s.span_id FROM spans s
WHERE s.trace_id = t.trace_id
AND s.parent_span_id IS NULL
ORDER BY s.start_time LIMIT 1)) as root_span_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think this is needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the filter below should be enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done 👍🏽

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this actually addressed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch from 8caf78b to 110746b Compare July 18, 2025 23:23
@Elbehery
Copy link
Contributor Author

Elbehery commented Jul 19, 2025

@ehhuang the failure did not happen last run

lets try keeping COALESCE

@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch 2 times, most recently from 4b6878b to b030143 Compare July 19, 2025 00:11
@ehhuang
Copy link
Contributor

ehhuang commented Jul 19, 2025

@ehhuang the failure did not happen last run

lets try keeping COALESCE

I'd prefer that we don't introduce unnecessary changes unless there's a reason to.

@Elbehery
Copy link
Contributor Author

@ehhuang the failure did not happen last run
lets try keeping COALESCE

I'd prefer that we don't introduce unnecessary changes unless there's a reason to.

the CI failed after removing COALESCE https://github.com/meta-llama/llama-stack/actions/runs/16382156494/job/46295644087?pr=2815

@Elbehery
Copy link
Contributor Author

@ehhuang the failure did not happen last run
lets try keeping COALESCE

I'd prefer that we don't introduce unnecessary changes unless there's a reason to.

the CI failed after removing COALESCE https://github.com/meta-llama/llama-stack/actions/runs/16382156494/job/46295644087?pr=2815

no failure with COALESCE :D

@Elbehery
Copy link
Contributor Author

Another question, this repo always request to update branch, even if the files changed in PR are not changed !!

It is big effort to keep rebasing after each PR merge :D

@Elbehery
Copy link
Contributor Author

I think requiring

Require linear history

it too strict, as long as no conflicts in PRs

cc @leseb

@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch from 4b2d6d1 to 417e149 Compare July 20, 2025 18:57
@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch from 417e149 to be88fee Compare July 21, 2025 07:48
@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch from be88fee to b5bef8c Compare July 21, 2025 11:03
@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch 2 times, most recently from 5bb7cc4 to 046ecca Compare July 21, 2025 14:02
@Elbehery
Copy link
Contributor Author

the failures are from #2575

This PR used @pytest.mark.asyncio

cc @leseb

@ashwinb
Copy link
Contributor

ashwinb commented Jul 21, 2025

Hey @Elbehery @ehhuang -- can we please DISABLE telemetry tests ASAP first and then land this PR? I realize this PR makes some changes which need careful consideration so while we iterate we need CI to be green.

@ehhuang
Copy link
Contributor

ehhuang commented Jul 21, 2025

#2814 @ashwinb

@Elbehery
Copy link
Contributor Author

Elbehery commented Jul 21, 2025

@ehhuang the ci failed again, it passes only with COALESCE

@ashwinb
Copy link
Contributor

ashwinb commented Jul 21, 2025

#2814 @ashwinb

thank you, merged that one.

@Elbehery
Copy link
Contributor Author

@ehhuang the ci failed again, it passes only with COALESCE

@ehhuang shall we keep COALESCE ?

@ehhuang
Copy link
Contributor

ehhuang commented Jul 21, 2025

@ehhuang the failure did not happen last run
lets try keeping COALESCE

I'd prefer that we don't introduce unnecessary changes unless there's a reason to.

the CI failed after removing COALESCE https://github.com/meta-llama/llama-stack/actions/runs/16382156494/job/46295644087?pr=2815

image looks like the revert wasn't clean (root_span_id was removed)

I think you removed t.root_span_id and that's why the tests fail without COALESCE

@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch 3 times, most recently from 05c963e to 48d0552 Compare July 22, 2025 10:56
@Elbehery
Copy link
Contributor Author

@ehhuang any other changes required ?

Copy link
Contributor

@ehhuang ehhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

Can you remove the @pytest.mark.skip(reason="Skipping telemetry tests for now") too?

@Elbehery
Copy link
Contributor Author

Thank you for working on this!

Can you remove the @pytest.mark.skip(reason="Skipping telemetry tests for now") too?

i was thinking to add them already, but thought maybe I should wait till you request :D

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
@Elbehery Elbehery force-pushed the 20250718_fix_telemetry_tests branch from 48d0552 to 81361ad Compare July 22, 2025 18:32
@ehhuang ehhuang merged commit 9736f09 into llamastack:main Jul 22, 2025
76 checks passed
@Elbehery Elbehery deleted the 20250718_fix_telemetry_tests branch July 22, 2025 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants