
Conversation

@ayushmangpta ayushmangpta commented Sep 9, 2025

Pre-handle context length before LLM call to avoid empty responses (#3454)

Summary

  • Proactively handle context window overflow immediately before invoking the LLM, so providers that fail silently (returning an empty response) no longer cause a ValueError downstream.

Why

  • Some providers (e.g., DeepInfra with Qwen/Qwen3-235B-A22B-Instruct-2507) return an empty message when input exceeds the model’s context window.
  • Previously, context overflow was only handled after an exception was thrown. If the provider fails silently, no exception is raised, and get_llm_response raises "Invalid response from LLM call - None or empty." (sketched below).
  • This change adds a pre-call check so we summarize or abort before calling the LLM in an overflow scenario.
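
For reference, here is a rough sketch of where the current failure surfaces. The function body is illustrative only, not the exact crewAI source:

# Illustrative sketch (not the real implementation): a provider that overflows
# its context window returns an empty string instead of raising, so the only
# error appears after the call, inside the response validation.
def get_llm_response_sketch(llm, messages):
    answer = llm.call(messages)  # silent failure: "" instead of an exception
    if not answer:
        # This is the error the pre-call check is meant to avoid.
        raise ValueError("Invalid response from LLM call - None or empty.")
    return answer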

What changed

  • File: crewai/agents/crew_agent_executor.py
    • In CrewAgentExecutor._invoke_loop, before get_llm_response:
      • Compute the total content length of messages and compare with llm.get_context_window_size().
      • If overflow is likely, call handle_context_length (summarize when respect_context_window=True; otherwise raise).
      • Keep the existing exception-based overflow handling as fallback.

Code snippet added

# ...existing code...

enforce_rpm_limit(self.request_within_rpm_limit)

# Pre-handle context length just before calling the LLM
try:
    window_size = self.llm.get_context_window_size()
except Exception:
    window_size = None
if window_size is not None:
    total_len = sum(len(m.get("content") or "") for m in self.messages)  # treat missing or None content as empty
    if total_len >= window_size:
        handle_context_length(
            respect_context_window=self.respect_context_window,
            printer=self._printer,
            messages=self.messages,
            llm=self.llm,
            callbacks=self.callbacks,
            i18n=self._i18n,
        )

answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    callbacks=self.callbacks,
    printer=self._printer,
    from_task=self.task,
)
formatted_answer = process_llm_response(answer, self.use_stop_words)

# ...existing code...
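
Design note: if get_context_window_size raises or is unavailable for a given provider wrapper, window_size stays None, the pre-check is skipped, and that provider falls through to the existing exception-based overflow handling.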

Behavior

  • If respect_context_window is True: messages are summarized before the LLM call when the total content length meets/exceeds the model window size.
  • If respect_context_window is False: a clear error is raised earlier, before the LLM call, maintaining previous behavior while avoiding a silent empty response (both branches are sketched below).
  • Existing exception-based handling remains for providers that properly raise overflow errors.
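
The sketch below summarizes the two branches; it is simplified and hedged, not the actual handle_context_length implementation, and the summarize_messages call signature and exact exception type are assumptions:

# Simplified sketch of the branching behavior described above; not the real
# crewAI implementation, and the summarize_messages signature is illustrative.
def handle_context_length_sketch(respect_context_window, messages, llm):
    if respect_context_window:
        # Summarize the conversation so it fits back inside the context window.
        return summarize_messages(messages, llm)
    # Fail fast before the LLM call instead of letting the provider return an
    # empty response; the exact exception type used by crewAI may differ.
    raise ValueError("Context length exceeded and respect_context_window is False.")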

Risks / Trade-offs

  • Uses character length as a heuristic rather than precise token counting, so summarization may trigger earlier or later than an exact token count would. This mirrors the existing summarize_messages behavior and remains a practical safeguard.
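
If a tighter bound is ever wanted, a hypothetical refinement (not part of this PR) could approximate tokens from characters before comparing against the window, for example:

# Hypothetical refinement, not part of this PR: compare an approximate token
# count against the window instead of raw character counts.
CHARS_PER_TOKEN = 4  # rough average; the real ratio varies by model and language

def estimated_tokens(messages):
    total_chars = sum(len(m.get("content") or "") for m in messages)
    return total_chars // CHARS_PER_TOKEN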

Note: This is my first PR—feedback appreciated.

Hi,
I don't think get_context_window_size will return the true context size for LLMs backed by Ollama.

Khalid,
