Jaeger: Single lookback_period_hours causes performance vs functionality trade-off #5887

@nico34638

Describe the bug
The current Jaeger configuration in Quickwit uses a single lookback_period_hours parameter for all time-unbounded queries (get_services, get_operations, and get_trace). This is a problem when service/operation discovery and trace retrieval need different time windows.
I cannot increase this parameter, because the search page would become too slow (service and operation discovery would take too long). However, I need trace retrieval (get_trace) to work with a longer lookback period, because sharing traces between colleagues via trace links is very convenient.

Steps to reproduce (if applicable)
Steps to reproduce the behavior:

  • Configure Quickwit with Jaeger endpoint enabled and a short lookback_period_hours (e.g., 24 hours) for acceptable UI performance
  • Use Jaeger UI to browse services and operations (this works well with short lookback period)
  • Try to share a trace link with a colleague for a trace that is older than the configured lookback period
  • The trace retrieval will fail or return incomplete results because it's using the same short lookback period
  • If you increase lookback_period_hours to fix trace sharing, the search page becomes unacceptably slow

Expected behavior
There should be separate configuration parameters:

  • One for service/operation discovery operations (get_services, get_operations) that can be set to a shorter period (e.g., 24-72 hours) for good UI performance
  • Another for trace retrieval operations (get_trace, find_traces) that can be set to a longer period (e.g., 30 days) to ensure trace links work properly for sharing traces between colleagues and accessing historical traces

Configuration:
jaeger:
  enable_endpoint: true
  lookback_period_hours: 24  # Can't increase this without making search page too slow
  max_trace_duration_secs: 3600
  max_fetch_spans: 10000

Use case impact

  • Performance constraint: Cannot increase lookback_period_hours because service/operation discovery becomes too slow
  • Collaboration need: A longer lookback is needed for trace retrieval (get_trace) so that trace links can be shared between colleagues
  • Historical analysis: Need to access older traces for debugging and investigation purposes

Proposed solution
Split the lookback_period_hours into two separate parameters:

  • service_discovery_lookback_period_hours: For get_services and get_operations (default: 24-72 hours for performance)
  • trace_lookback_period_hours: For get_trace and trace retrieval operations (default: longer period, e.g., 30 days for trace sharing)

Maintain backward compatibility by keeping the existing lookback_period_hours as a fallback when the new parameters are not specified.
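
As a rough illustration of the proposal, the configuration could look something like the sketch below. The two new keys are the names suggested in this issue and do not exist in Quickwit today; the values (48 hours for discovery, 720 hours, i.e. 30 days, for trace retrieval) are only examples.

```yaml
# Hypothetical configuration: service_discovery_lookback_period_hours and
# trace_lookback_period_hours are proposed in this issue, not existing options.
jaeger:
  enable_endpoint: true
  # Existing key, kept as the fallback when a new key below is not set.
  lookback_period_hours: 24
  # Proposed: used by get_services and get_operations.
  service_discovery_lookback_period_hours: 48
  # Proposed: used by get_trace (720 hours = 30 days).
  trace_lookback_period_hours: 720
  max_trace_duration_secs: 3600
  max_fetch_spans: 10000
```

With this shape, an existing configuration that only sets lookback_period_hours would keep its current behavior, since both new keys would fall back to that value.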
