Skip to content

Conversation

sachinML
Copy link

Refs #30200

  • Use split_documents(docs) after header-based splitting (preserves per-section metadata; overlap applied per document).
  • Overlap appears only when a single section exceeds chunk_size.
  • Overlap does not cross section/document boundaries.
  • Consider strip_headers=True to avoid a tiny header-only chunk; keep "" as a fallback separator if text lacks newlines/spaces.

@github-actions github-actions bot added documentation Improvements or additions to documentation text-splitters Related to the package `text-splitters` and removed documentation Improvements or additions to documentation labels Oct 13, 2025
@sachinML
Copy link
Author

CI is green; CodeQL is pending. This is a docs-only change. Could a maintainer please trigger the CodeQL scan so this can be merged?

@sachinML sachinML force-pushed the docs/chunk-overlap-troubleshooting branch 6 times, most recently from 88e993b to 5e569a9 Compare October 18, 2025 09:44
@sachinML sachinML force-pushed the docs/chunk-overlap-troubleshooting branch from 5e569a9 to ae92263 Compare October 19, 2025 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation text-splitters Related to the package `text-splitters`

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant