Litellm update blog posts rss by ryan-crabbe · Pull Request #23791 · BerriAI/litellm

ryan-crabbe · 2026-03-16T23:10:44Z

Type

🧹 Refactoring

Changes

Fetches blog posts from the docs site RSS feed (https://docs.litellm.ai/blog/rss.xml) instead of a manually updated JSON file on GitHub
Parses the RSS XML to extract title, description, date, and URL; adds no new dependencies (uses stdlib xml.etree.ElementTree and email.utils)
Falls back to the bundled local blog_posts.json on any failure (network error, invalid XML, etc.)
Blog posts now stay in sync with the docs site automatically, so manual JSON updates are no longer needed
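Parsing an RSS 2.0 feed this way needs only the standard library. The sketch below is illustrative, not the PR's parse_rss_to_posts: the name parse_rss_items, the max_posts default of 10, and the assumption that the feed uses the plain, namespace-free <rss><channel><item> layout are all assumptions.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from typing import Dict, List


def parse_rss_items(xml_text: str, max_posts: int = 10) -> List[Dict[str, str]]:
    """Hypothetical sketch: turn RSS 2.0 <item> elements into post dicts."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    channel = root.find("channel")
    if channel is None:
        raise ValueError("RSS feed has no <channel> element")

    posts: List[Dict[str, str]] = []
    for item in channel.findall("item")[:max_posts]:
        pub_date = item.findtext("pubDate", default="")
        # RSS pubDate uses the RFC 822 format, e.g. "Mon, 16 Mar 2026 00:00:00 GMT",
        # which email.utils can turn into a datetime; keep only YYYY-MM-DD.
        date = parsedate_to_datetime(pub_date).strftime("%Y-%m-%d") if pub_date else ""
        posts.append(
            {
                # findtext returns the default when the child element is missing
                "title": item.findtext("title", default=""),
                "description": item.findtext("description", default=""),
                "date": date,
                "url": item.findtext("link", default=""),
            }
        )
    return posts
```

Raising on a missing <channel> (rather than returning an empty list) lets a caller treat it like any other parse failure and fall back to the bundled JSON.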


vercel · 2026-03-16T23:10:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Mar 16, 2026 11:24pm

codspeed-hq · 2026-03-16T23:12:37Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

Comparing litellm_update-blog-posts-rss (67482db) with main (278c9ba)¹

¹ No successful run was found on litellm_ryan_march_16 (4f2fe33) during the generation of this report, so main (278c9ba) was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

greptile-apps · 2026-03-16T23:13:15Z

Greptile Summary

This PR replaces the previous manual GitHub-hosted JSON blog post fetch with a live RSS feed parse from https://docs.litellm.ai/blog/rss.xml, using only stdlib modules (xml.etree.ElementTree, email.utils) alongside the existing httpx dependency. The fallback to the bundled blog_posts.json and the in-process TTL cache are preserved.

Key changes:

litellm/__init__.py: Default blog_posts_url updated to point at the Docusaurus RSS endpoint.
get_blog_posts.py: fetch_remote_blog_posts → fetch_rss_feed (returns raw XML text); new parse_rss_to_posts method extracts post dicts from <item> elements; validate_blog_posts simplified to check for a non-empty list.
test_get_blog_posts.py: All tests updated to mock httpx.get returning RSS XML; new unit tests cover the XML parser, including invalid XML and missing <channel> edge cases. All tests are properly mocked with no real network calls.

Issues found:

xml.etree.ElementTree.fromstring is documented as insecure against maliciously constructed XML (Billion Laughs / entity-expansion DoS). Since blog_posts_url is operator-configurable via environment variable, this is an exploitable surface if it is ever pointed at an untrusted endpoint. Using defusedxml would resolve this with a one-line change.
BlogPost and BlogPostsResponse Pydantic models are defined but the parsing pipeline returns raw dicts, leaving the models as dead code that provides no runtime validation.

Confidence Score: 3/5

Functional logic is sound and well-tested, but the use of the unsafe xml.etree.ElementTree parser should be addressed before merging.
The core RSS-parsing logic is correct, the fallback chain works, and all tests are properly mocked. The score is lowered primarily because xml.etree.ElementTree is explicitly documented as vulnerable to entity-expansion DoS attacks, and the parsing URL is user-configurable via an environment variable. Additionally, the defined Pydantic models are dead code in the production path.
litellm/litellm_core_utils/get_blog_posts.py — review the XML parsing security concern and unused Pydantic models.

Important Files Changed

Filename	Overview
litellm/litellm_core_utils/get_blog_posts.py	Replaces GitHub JSON fetch with RSS XML parsing. Contains a security concern: `xml.etree.ElementTree` is vulnerable to entity-expansion (Billion Laughs) DoS attacks. Also has dead code: `BlogPost`/`BlogPostsResponse` Pydantic models are defined but never used in the parsing pipeline.
tests/test_litellm/test_get_blog_posts.py	Tests updated to mock `httpx.get` returning RSS XML text. All tests use proper mocks — no real network calls. New tests for `parse_rss_to_posts`, including invalid XML and missing channel edge cases.
litellm/__init__.py	Changes the default `blog_posts_url` from a GitHub raw JSON URL to the Docusaurus RSS feed endpoint. Simple one-line change, no issues.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_blog_posts called] --> B{LITELLM_LOCAL_BLOG_POSTS=true?}
    B -- Yes --> C[load_local_blog_posts\nblog_posts.json]
    B -- No --> D{Cache valid?\nwithin TTL?}
    D -- Yes --> E[Return cached posts]
    D -- No --> F[fetch_rss_feed\nhttpx.get RSS URL]
    F -- Network/HTTP error --> G[load_local_blog_posts\nfallback]
    F -- Success: raw XML --> H[parse_rss_to_posts\nET.fromstring\nmax_posts=1]
    H -- Parse error --> G
    H -- Parsed posts --> I{validate_blog_posts\nnon-empty list?}
    I -- False --> G
    I -- True --> J[Cache posts\nReturn posts]
```
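The flow in the chart can be sketched in a few lines. This is a hypothetical illustration, not LiteLLM's implementation: the class name BlogPostsSketch, the one-hour TTL, and the injected fetch/parse/load_local callables (used so the flow can be exercised without a network) are all assumptions; only the LITELLM_LOCAL_BLOG_POSTS flag and the non-empty-list validation come from the chart.

```python
import os
import time
from typing import Callable, Dict, List, Optional

Post = Dict[str, str]


class BlogPostsSketch:
    """Hypothetical sketch of the fetch -> parse -> validate -> cache flow."""

    CACHE_TTL_SECONDS = 3600  # assumed TTL, not taken from the PR
    _cache: Optional[List[Post]] = None
    _cache_time: float = 0.0

    @classmethod
    def get_blog_posts(
        cls,
        fetch: Callable[[], str],
        parse: Callable[[str], List[Post]],
        load_local: Callable[[], List[Post]],
    ) -> List[Post]:
        # Operators can force the bundled JSON and skip the network entirely.
        if os.getenv("LITELLM_LOCAL_BLOG_POSTS", "").lower() == "true":
            return load_local()

        # Serve from the in-process cache while it is still fresh.
        now = time.time()
        if cls._cache is not None and now - cls._cache_time < cls.CACHE_TTL_SECONDS:
            return cls._cache

        try:
            posts = parse(fetch())
        except Exception:
            # Network errors, HTTP errors and XML parse errors all fall back.
            return load_local()

        # Validation step from the chart: only a non-empty list is accepted.
        if not (isinstance(posts, list) and posts):
            return load_local()

        cls._cache, cls._cache_time = posts, now
        return posts
```

Failures are never cached in this sketch, so the next call after an outage retries the feed instead of serving the fallback for a full TTL.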

Comments Outside Diff (1)

litellm/litellm_core_utils/get_blog_posts.py, lines 27-35

Pydantic models BlogPost / BlogPostsResponse are defined but never used

BlogPost and BlogPostsResponse were defined to validate the blog post structure, but parse_rss_to_posts returns raw List[Dict[str, str]] instead of List[BlogPost]. The models therefore provide no actual runtime validation in the production path — they only appear in two isolated test assertions.

Either use them to validate/coerce the parsed dicts (which would catch malformed RSS responses early), or remove them to avoid dead code:
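```python
# Option A – use the model for validation inside parse_rss_to_posts
# (title_el, desc_el, link_el and date_str come from the surrounding <item> loop)
posts.append(
    BlogPost(
        title=title_el.text or "",
        # <description> is optional in RSS items, so guard against a missing element
        description=(desc_el.text or "") if desc_el is not None else "",
        date=date_str,
        url=link_el.text or "",
    ).model_dump()
)
```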
```
# Option B – remove the unused classes and imports
```

Last reviewed commit: 12facb2

greptile-apps · 2026-03-16T23:13:22Z

litellm/litellm_core_utils/get_blog_posts.py

```diff
+import xml.etree.ElementTree as ET
+from email.utils import parsedate_to_datetime
 from importlib.resources import files
 from typing import Any, Dict, List, Optional
```


Unused Any import

Any was used by the old validate_blog_posts(data: Any) signature but is no longer referenced after the refactor. It should be removed.

Suggested change

```diff
-from typing import Any, Dict, List, Optional
+from typing import Dict, List, Optional
```

greptile-apps · 2026-03-16T23:13:23Z

litellm/litellm_core_utils/get_blog_posts.py

```diff
        try:
-            data = cls.fetch_remote_blog_posts(url)
+            xml_text = cls.fetch_rss_feed(url)
+            posts = cls.parse_rss_to_posts(xml_text)
```


max_posts=1 silently limits output to a single post

get_blog_posts calls parse_rss_to_posts(xml_text) without overriding max_posts, so the RSS feed is always truncated to one post. If the previous blog_posts.json contained multiple posts and the UI surfaces more than one, callers will now always receive a single-item list — a silent regression in the number of blog posts shown.

The default of 1 in parse_rss_to_posts is reasonable as a conservative safeguard, but the call site should explicitly set the intended limit (or use a named constant) so the behaviour is obvious and easy to change:

Suggested change

```diff
-posts = cls.parse_rss_to_posts(xml_text)
+posts = cls.parse_rss_to_posts(xml_text, max_posts=10)
```

(Adjust the value to whatever the UI is designed to display.)
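The truncation is easy to see in isolation. In this sketch, item_titles is a hypothetical helper whose max_posts default of 1 mirrors the default described in the comment, and MAX_BLOG_POSTS is a hypothetical named constant passed at the call site; neither exists in the PR.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical constant; the right value depends on how many posts the UI shows.
MAX_BLOG_POSTS = 10


def item_titles(xml_text: str, max_posts: int = 1) -> List[str]:
    """Return up to max_posts <item> titles; the default of 1 mirrors the reviewed code."""
    channel = ET.fromstring(xml_text).find("channel")
    items = channel.findall("item") if channel is not None else []
    return [item.findtext("title", default="") for item in items[:max_posts]]
```

With a three-item feed, item_titles(feed) returns only the first title, while item_titles(feed, max_posts=MAX_BLOG_POSTS) returns all three, which is why the limit reads best when stated at the call site.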

greptile-apps · 2026-03-16T23:25:27Z

litellm/litellm_core_utils/get_blog_posts.py

+
+        Extracts title, description, date (YYYY-MM-DD), and url from each <item>.
+        """
+        root = ET.fromstring(xml_text)


xml.etree.ElementTree is vulnerable to XML entity expansion attacks

Python's own documentation explicitly states that xml.etree.ElementTree is not secure against maliciously constructed data and is vulnerable to "Billion Laughs" and "Quadratic Blowup" entity-expansion DoS attacks.

While the default URL (https://docs.litellm.ai/blog/rss.xml) is trusted, the URL is configurable via the LITELLM_BLOG_POSTS_URL environment variable. If an operator points it at an attacker-controlled endpoint, that endpoint can return a deeply nested entity-expansion payload that exhausts memory and CPU while it is being parsed.

The defusedxml library is the recommended drop-in replacement:

```python
import defusedxml.ElementTree as ET
```

This single swap neutralises billion-laughs, quadratic-blowup, and external-entity attacks without any other code changes. Alternatively, you can validate the response size before parsing (e.g., reject payloads over ~1 MB).
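How exposed a given deployment is also depends on the Expat library the interpreter links against. The "XML vulnerabilities" notes in the Python documentation say Expat 2.4.1 and newer is not vulnerable to billion laughs or quadratic blowup, and suggest checking pyexpat.EXPAT_VERSION because some builds rely on a system-provided Expat. The helper name below is hypothetical; defusedxml additionally refuses entity declarations by default, so it remains the more conservative option.

```python
import pyexpat


def expat_mitigates_entity_expansion() -> bool:
    """Return True if the linked Expat is 2.4.1 or newer.

    Per the Python docs, Expat 2.4.1+ is not vulnerable to billion laughs or
    quadratic blowup; an older, system-provided Expat may still be.
    """
    # version_info is a (major, minor, micro) tuple of ints
    return pyexpat.version_info >= (2, 4, 1)
```

For example, print(pyexpat.EXPAT_VERSION, expat_mitigates_entity_expansion()) shows the linked version string alongside the result.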

ryan-crabbe · 2026-03-16T23:39:10Z

@greptileai the URL is only configurable via a server-side env var (LITELLM_BLOG_POSTS_URL), so only operators who already have full system access can change it. It's not user-facing input. On "Unused Pydantic models": BlogPost and BlogPostsResponse are used (inside public endpoints), so they are not dead code.

greptile-apps · 2026-03-16T23:40:19Z

You're right that the threat model is limited here - if an operator has access to set LITELLM_BLOG_POSTS_URL, they already have significant system access. That said, defusedxml is still worth considering for defense-in-depth:

Zero additional complexity - it's a drop-in replacement (import defusedxml.ElementTree as ET)
Guards against indirect attacks - if this env var is ever set via a config management system with broader access, or if the code is reused elsewhere
Standard Python security guidance - the stdlib docs explicitly recommend against using xml.etree.ElementTree for untrusted data

But if you want to skip the dependency and keep stdlib-only, validating the response size before parsing (e.g., reject if len(response.text) > 1_000_000) would also mitigate the DoS risk without adding a new import.

Either way, your point about the access model is valid - this isn't a high-severity issue given the current architecture.
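The stdlib-only route can be sketched as a thin wrapper around ET.fromstring. This is a hypothetical helper, not LiteLLM code: the name guarded_fromstring and the MAX_RSS_CHARS constant are assumptions (the 1,000,000 figure comes from the suggestion above), and the DTD check is an extra assumption beyond what the review proposes. Because httpx has already read the body by the time the check runs, it bounds parser work rather than download size.

```python
import xml.etree.ElementTree as ET

# Hypothetical cap matching the len(response.text) > 1_000_000 suggestion.
MAX_RSS_CHARS = 1_000_000


def guarded_fromstring(xml_text: str, max_chars: int = MAX_RSS_CHARS) -> ET.Element:
    """Parse feed text with two cheap, stdlib-only guards (a sketch)."""
    # Guard 1: refuse oversized payloads before the parser sees them.
    if len(xml_text) > max_chars:
        raise ValueError(f"RSS payload is longer than {max_chars} characters")
    # Guard 2 (extra assumption): entity-expansion payloads need a DTD, and
    # typical RSS 2.0 feeds do not declare one. This is a plain substring
    # check, so it may also reject a feed that merely mentions "<!DOCTYPE"
    # inside CDATA; that false positive just triggers the JSON fallback.
    if "<!DOCTYPE" in xml_text or "<!ENTITY" in xml_text:
        raise ValueError("DTD or entity declarations are not allowed in the feed")
    return ET.fromstring(xml_text)
```

Raising ValueError keeps both rejections inside the same "parse error, fall back to blog_posts.json" path as malformed XML.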

feat: fetch blog posts from docs RSS feed instead of static JSON on GitHub (67482db)

ryan-crabbe changed the base branch from main to litellm_ryan_march_16 March 16, 2026 23:11

greptile-apps bot reviewed Mar 16, 2026


fix: remove unused Any import from get_blog_posts (12facb2)

vercel bot deployed to Preview March 16, 2026 23:24

greptile-apps bot reviewed Mar 16, 2026


ryan-crabbe merged commit 0d45b1d into litellm_ryan_march_16 Mar 16, 2026
59 of 65 checks passed

ryan-crabbe deleted the litellm_update-blog-posts-rss branch March 17, 2026 00:12

ryan-crabbe mentioned this pull request Mar 17, 2026

Litellm ryan march 16 #23822

Merged

ryan-crabbe merged 2 commits into litellm_ryan_march_16 from litellm_update-blog-posts-rss
