@sachinML
Contributor

Refs #30200

  • Call split_documents(docs) on the output of header-based splitting; this preserves each section's metadata and applies overlap within each document.
  • Overlap appears only when a single section is longer than chunk_size.
  • Overlap never crosses section/document boundaries.
  • Consider strip_headers=True to avoid a tiny header-only chunk; keep "" as a fallback separator in case the text contains no newlines or spaces.
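
The overlap behaviour described above can be illustrated with a small, library-free model. This is only a sketch under stated assumptions: `Section`, `chunk_section`, and `split_sections` are hypothetical names invented for this illustration, and the fixed-size character window is a simplified stand-in, not LangChain's recursive separator logic. In LangChain itself the corresponding steps would be a header-based splitter (for example `MarkdownHeaderTextSplitter`) followed by `RecursiveCharacterTextSplitter.split_documents(docs)`.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical stand-in for one header-split document: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_section(section, chunk_size, chunk_overlap):
    """Slide a fixed-size character window over ONE section only."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    text = section.text
    if len(text) <= chunk_size:
        # The section fits in one chunk, so there is nothing to overlap with.
        return [Section(text, dict(section.metadata))]
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        # Every chunk inherits the metadata of the section it came from.
        chunks.append(Section(text[start:start + chunk_size], dict(section.metadata)))
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def split_sections(sections, chunk_size, chunk_overlap):
    """Chunk each section independently; windows never span two sections."""
    out = []
    for section in sections:
        out.extend(chunk_section(section, chunk_size, chunk_overlap))
    return out


sections = [
    Section("Intro.", {"Header 1": "Intro"}),
    Section("abcdefghijklmnopqrstuvwxyz", {"Header 1": "Details"}),
]
chunks = split_sections(sections, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk.metadata["Header 1"], repr(chunk.text))
```

In this model the short "Intro" section yields a single chunk with no overlap, while the longer "Details" section yields several chunks that each repeat the last three characters of the previous one. The first "Details" chunk shares nothing with the "Intro" chunk, which mirrors the point that overlap stays inside a section.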

@github-actions bot added the labels langchain (For docs changes to LangChain), python (For content related to the Python version of LangChain projects), and oss on Oct 21, 2025
@mdrxy mdrxy merged commit 8790e49 into langchain-ai:main Oct 21, 2025
6 checks passed
mdrxy added a commit to anjaliratnam-msft/docs that referenced this pull request Oct 22, 2025
…angchain-ai#1061)


Co-authored-by: Mason Daugherty <[email protected]>
