docs/docs/concepts.mdx
Lines changed: 10 additions & 1 deletion
@@ -422,6 +422,15 @@ That means there are two different axes along which you can customize your text
For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters).
+#### Markdown
+
+LangChain provides specialized text splitters for Markdown documents. These splitters are designed to handle Markdown-specific syntax and preserve the structure of the document.
+
+- **MarkdownHeaderTextSplitter**: Splits text based on Markdown headers, adding relevant information about where each chunk came from.
+- **ExperimentalMarkdownSyntaxTextSplitter**: Retains the original whitespace and formatting, addressing issues with code blocks and nested lists.
+
+For guidance on using these splitters, refer to the [how-to guides](/docs/how_to/#text-splitters).
| Recursive | [RecursiveCharacterTextSplitter](/docs/how_to/recursive_text_splitter/), [RecursiveJsonSplitter](/docs/how_to/recursive_json_splitter/) | A list of user defined characters | | Recursively splits text. This splitting is trying to keep related pieces of text next to each other. This is the `recommended way` to start splitting text. |
| HTML | [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter/), [HTMLSectionSplitter](/docs/how_to/HTML_section_aware_splitter/) | HTML specific characters | ✅ | Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML) |
-| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), | Markdown specific characters | ✅ | Splits text based on Markdown-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the Markdown) |
+| Markdown | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), [ExperimentalMarkdownSyntaxTextSplitter](/docs/how_to/experimental_markdown_syntax_text_splitter/) | Markdown specific characters | ✅ | Splits text based on Markdown-specific characters. The `ExperimentalMarkdownSyntaxTextSplitter` retains the original whitespace and formatting, addressing issues with code blocks and nested lists. |
| Code | [many languages](/docs/how_to/code_splitter/) | Code (Python, JS) specific characters | | Splits text based on characters specific to coding languages. 15 different languages are available to choose from. |
| Token | [many classes](/docs/how_to/split_by_token/) | Tokens | | Splits text on tokens. There exist a few different ways to measure tokens. |
| Character | [CharacterTextSplitter](/docs/how_to/character_text_splitter/) | A user defined character | | Splits text based on a user defined character. One of the simpler methods. |
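
To make the phrase "adding relevant information about where each chunk came from" concrete, here is a minimal plain-Python sketch of header-based splitting. It is not LangChain's implementation or API: `split_by_headers` and the `h1`/`h2` metadata keys are hypothetical names chosen for illustration, and the fence check is a simplification that only recognizes backtick fences.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helper for illustration only; not part of LangChain.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def split_by_headers(text: str) -> List[Tuple[Dict[str, str], str]]:
    """Split Markdown on ATX headers, returning (metadata, chunk) pairs."""
    chunks: List[Tuple[Dict[str, str], str]] = []
    path: Dict[int, str] = {}  # header level -> title currently in effect
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the accumulated body together with the header path above it.
        content = "\n".join(body).strip()
        if content:
            meta = {f"h{lvl}": title for lvl, title in sorted(path.items())}
            chunks.append((meta, content))
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a code fence are never treated as headers.
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            for lvl in [l for l in path if l >= level]:
                del path[lvl]
            path[level] = match.group(2)
        else:
            body.append(line)
    flush()
    return chunks
```

Each returned pair carries the chain of headers the chunk sits under, which is the kind of provenance metadata the ✅ in the "Markdown" row refers to. Skipping header detection inside fences is one way to avoid the code-block issue that the added text attributes to `ExperimentalMarkdownSyntaxTextSplitter`; how that class actually handles it is not shown in this diff.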