Commit bd4bb0e

Docs update (8213ab1)
1 parent 93d0ad9 commit bd4bb0e

2 files changed, +11 -1 lines changed

docs/docs/concepts.mdx

Lines changed: 10 additions & 1 deletion
@@ -422,6 +422,15 @@ That means there are two different axes along which you can customize your text
 
 For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters).
 
+#### Markdown
+
+LangChain provides specialized text splitters for Markdown documents. These splitters are designed to handle Markdown-specific syntax and preserve the structure of the document.
+
+- **MarkdownHeaderTextSplitter**: Splits text based on Markdown headers, adding relevant information about where each chunk came from.
+- **ExperimentalMarkdownSyntaxTextSplitter**: Retains the original whitespace and formatting, addressing issues with code blocks and nested lists.
+
+For guidance on using these splitters, refer to the [how-to guides](/docs/how_to/#text-splitters).
+
 ### Embedding models
 <span data-heading-keywords="embedding,embeddings"></span>
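
The header-based splitting described in the added section can be illustrated with a small standard-library sketch. This is not LangChain's implementation or API: the function name `split_by_headers`, the `max_level` parameter, and the `h1`/`h2` metadata keys are assumptions made for illustration. It only shows the general idea of cutting a document at headers and recording which headers each chunk sits under.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helper, not part of LangChain: matches ATX headers such as "## Title".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headers(markdown: str, max_level: int = 3) -> List[Tuple[Dict[str, str], str]]:
    """Split Markdown at headers up to max_level.

    Returns (metadata, text) pairs, where metadata maps "h1", "h2", ...
    to the titles of the headers that enclose the chunk.
    """
    chunks: List[Tuple[Dict[str, str], str]] = []
    path: Dict[int, str] = {}  # header level -> current title at that level
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the accumulated body (if any) with the current header path.
        text = "\n".join(body).strip()
        if text:
            meta = {f"h{level}": title for level, title in sorted(path.items())}
            chunks.append((meta, text))
        body.clear()

    for line in markdown.splitlines():
        # Lines inside fenced code blocks are never treated as headers.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            for stale in [lvl for lvl in path if lvl >= level]:
                del path[stale]
            path[level] = match.group(2)
        else:
            body.append(line)
    flush()
    return chunks
```

Each returned pair carries the enclosing header titles as metadata, which is the "where each chunk came from" information the section refers to. The real splitter's configuration and output types may differ; see the linked how-to guide for actual usage.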

@@ -1038,7 +1047,7 @@ Table columns:
 |----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | Recursive | [RecursiveCharacterTextSplitter](/docs/how_to/recursive_text_splitter/), [RecursiveJsonSplitter](/docs/how_to/recursive_json_splitter/) | A list of user defined characters | | Recursively splits text. This splitting is trying to keep related pieces of text next to each other. This is the `recommended way` to start splitting text. |
 | HTML | [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter/), [HTMLSectionSplitter](/docs/how_to/HTML_section_aware_splitter/) | HTML specific characters || Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML) |
-| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), | Markdown specific characters || Splits text based on Markdown-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the Markdown) |
+| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), [ExperimentalMarkdownSyntaxTextSplitter](/docs/how_to/experimental_markdown_syntax_text_splitter/) | Markdown specific characters || Splits text based on Markdown-specific characters. The `ExperimentalMarkdownSyntaxTextSplitter` retains the original whitespace and formatting, addressing issues with code blocks and nested lists. |
 | Code | [many languages](/docs/how_to/code_splitter/) | Code (Python, JS) specific characters | | Splits text based on characters specific to coding languages. 15 different languages are available to choose from. |
 | Token | [many classes](/docs/how_to/split_by_token/) | Tokens | | Splits text on tokens. There exist a few different ways to measure tokens. |
 | Character | [CharacterTextSplitter](/docs/how_to/character_text_splitter/) | A user defined character | | Splits text based on a user defined character. One of the simpler methods. |
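
The updated Markdown row says the experimental splitter retains original whitespace and avoids problems with code blocks and nested lists. The sketch below illustrates that property only, using the standard library. It is not the `ExperimentalMarkdownSyntaxTextSplitter` algorithm: the function name `split_preserving_fences` and the choice to split at blank lines are assumptions made for illustration.

```python
from typing import List


def split_preserving_fences(markdown: str) -> List[str]:
    """Split Markdown at blank lines, but never inside a fenced code block.

    Every character of the input, including indentation and newlines,
    is kept, so joining the chunks reproduces the original text.
    """
    chunks: List[str] = []
    current: List[str] = []
    in_fence = False

    for line in markdown.splitlines(keepends=True):
        stripped = line.strip()
        # Toggle fence state on opening and closing ``` markers.
        if stripped.startswith("```"):
            in_fence = not in_fence
        current.append(line)
        # A blank line outside a fence closes the current chunk.
        if not in_fence and stripped == "":
            chunks.append("".join(current))
            current = []

    if current:
        chunks.append("".join(current))
    return chunks
```

Because lines are collected with their original line endings and indentation, nested list items keep their leading spaces and blank lines inside a fenced block do not cut the block in half. Whether the real class works this way is not stated in this commit.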

docs/docs/how_to/index.mdx

Lines changed: 1 addition & 0 deletions
@@ -134,6 +134,7 @@ What LangChain calls [LLMs](/docs/concepts/#llms) are older forms of language mo
 - [How to: split by character](/docs/how_to/character_text_splitter)
 - [How to: split code](/docs/how_to/code_splitter)
 - [How to: split Markdown by headers](/docs/how_to/markdown_header_metadata_splitter)
+- [How to: split Markdown with experimental syntax retention](/docs/how_to/experimental_markdown_syntax_text_splitter)
 - [How to: recursively split JSON](/docs/how_to/recursive_json_splitter)
 - [How to: split text into semantic chunks](/docs/how_to/semantic-chunker)
 - [How to: split by tokens](/docs/how_to/split_by_token)
