
Conversation

@ayushmangpta ayushmangpta commented Sep 9, 2025

Pre-handle context length before LLM call to avoid empty responses (#3454)

Summary

  • Proactively handle context window overflow immediately before invoking the LLM, so providers that fail silently (returning an empty response) no longer cause a ValueError downstream.

Why

  • Some providers (e.g., DeepInfra with Qwen/Qwen3-235B-A22B-Instruct-2507) return an empty message when input exceeds the model’s context window.
  • Previously, context overflow was only handled after an exception was thrown. If the provider fails silently, no exception is raised, and get_llm_response raises "Invalid response from LLM call - None or empty." (sketched below).
  • This change adds a pre-call check so we summarize or abort before calling the LLM in an overflow scenario.
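
For reference, here is a rough sketch of where the current failure surfaces. The function body is illustrative only, not the exact crewAI source:

# Illustrative sketch (not the real implementation): a provider that overflows
# its context window returns an empty string instead of raising, so the only
# error appears after the call, inside the response validation.
def get_llm_response_sketch(llm, messages):
    answer = llm.call(messages)  # silent failure: "" instead of an exception
    if not answer:
        # This is the error the pre-call check is meant to avoid.
        raise ValueError("Invalid response from LLM call - None or empty.")
    return answer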

What changed

  • File: crewai/agents/crew_agent_executor.py
    • In CrewAgentExecutor._invoke_loop, before get_llm_response:
      • Compute the total content length of messages and compare with llm.get_context_window_size().
      • If overflow is likely, call handle_context_length (summarize when respect_context_window=True; otherwise raise).
      • Keep the existing exception-based overflow handling as fallback.

Code snippet added

# ...existing code...

enforce_rpm_limit(self.request_within_rpm_limit)

# Pre-handle context length just before calling the LLM
try:
    window_size = self.llm.get_context_window_size()
except Exception:
    window_size = None
if window_size is not None:
    total_len = sum(len(m.get("content") or "") for m in self.messages)  # treat missing or None content as empty
    if total_len >= window_size:
        handle_context_length(
            respect_context_window=self.respect_context_window,
            printer=self._printer,
            messages=self.messages,
            llm=self.llm,
            callbacks=self.callbacks,
            i18n=self._i18n,
        )

answer = get_llm_response(
    llm=self.llm,
    messages=self.messages,
    callbacks=self.callbacks,
    printer=self._printer,
    from_task=self.task,
)
formatted_answer = process_llm_response(answer, self.use_stop_words)

# ...existing code...
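
Design note: if get_context_window_size raises or is unavailable for a given provider wrapper, window_size stays None, the pre-check is skipped, and that provider falls through to the existing exception-based overflow handling.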

Behavior

  • If respect_context_window is True: messages are summarized before the LLM call when the total content length meets/exceeds the model window size.
  • If respect_context_window is False: a clear error is raised earlier, before the LLM call, maintaining previous behavior while avoiding a silent empty response (both branches are sketched below).
  • Existing exception-based handling remains for providers that properly raise overflow errors.
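
The sketch below summarizes the two branches; it is simplified and hedged, not the actual handle_context_length implementation, and the summarize_messages call signature and exact exception type are assumptions:

# Simplified sketch of the branching behavior described above; not the real
# crewAI implementation, and the summarize_messages signature is illustrative.
def handle_context_length_sketch(respect_context_window, messages, llm):
    if respect_context_window:
        # Summarize the conversation so it fits back inside the context window.
        return summarize_messages(messages, llm)
    # Fail fast before the LLM call instead of letting the provider return an
    # empty response; the exact exception type used by crewAI may differ.
    raise ValueError("Context length exceeded and respect_context_window is False.")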

Risks / Trade-offs

  • Uses character length as a heuristic rather than precise token counting, so summarization may trigger earlier or later than an exact token count would. This mirrors the existing summarize_messages behavior and remains a practical safeguard.
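
If a tighter bound is ever wanted, a hypothetical refinement (not part of this PR) could approximate tokens from characters before comparing against the window, for example:

# Hypothetical refinement, not part of this PR: compare an approximate token
# count against the window instead of raw character counts.
CHARS_PER_TOKEN = 4  # rough average; the real ratio varies by model and language

def estimated_tokens(messages):
    total_chars = sum(len(m.get("content") or "") for m in messages)
    return total_chars // CHARS_PER_TOKEN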

Note: This is my first PR—feedback appreciated.

Hi,
I don't think get_context_window_size will return the true context size for LLMs backed by Ollama.

Khalid,
