Docs update (8213ab1)

promptless[bot] · web-flow · commit bd4bb0ed4257 · 2024-12-17T04:37:01.000Z
diff --git a/docs/docs/concepts.mdx b/docs/docs/concepts.mdx
@@ -422,6 +422,15 @@ That means there are two different axes along which you can customize your text
 
 For specifics on how to use text splitters, see the [relevant how-to guides here](/docs/how_to/#text-splitters).
 
+#### Markdown
+
+LangChain provides specialized text splitters for Markdown documents. These splitters are designed to handle Markdown-specific syntax and preserve the structure of the document.
+
+- **MarkdownHeaderTextSplitter**: Splits text at Markdown headers and attaches the enclosing headers to each chunk as metadata, so every chunk records where it came from.
+- **ExperimentalMarkdownSyntaxTextSplitter**: An experimental splitter that retains the original whitespace and formatting of each chunk, addressing issues with code blocks and nested lists, where indentation is significant.
+
+For guidance on using these splitters, refer to the [how-to guides](/docs/how_to/#text-splitters).
+
 ### Embedding models
 <span data-heading-keywords="embedding,embeddings"></span>
 
@@ -1038,7 +1047,7 @@ Table columns:
 |----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | Recursive | [RecursiveCharacterTextSplitter](/docs/how_to/recursive_text_splitter/), [RecursiveJsonSplitter](/docs/how_to/recursive_json_splitter/) | A list of user defined characters     |               | Recursively splits text. This splitting is trying to keep related pieces of text next to each other. This is the `recommended way` to start splitting text.                                                                                                                    |
 | HTML      | [HTMLHeaderTextSplitter](/docs/how_to/HTML_header_metadata_splitter/), [HTMLSectionSplitter](/docs/how_to/HTML_section_aware_splitter/)          | HTML specific characters                                                                                                 | ✅             | Splits text based on HTML-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the HTML)                                                                                                                               |
-| Markdown  | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/),                                                                                                           | Markdown specific characters                                                                                    | ✅             | Splits text based on Markdown-specific characters. Notably, this adds in relevant information about where that chunk came from (based on the Markdown)                                                                                                                       |
+| Markdown  | [MarkdownHeaderTextSplitter](/docs/how_to/markdown_header_metadata_splitter/), [ExperimentalMarkdownSyntaxTextSplitter](/docs/how_to/experimental_markdown_syntax_text_splitter/) | Markdown specific characters                                                                                    | ✅             | Splits text based on Markdown-specific characters. `MarkdownHeaderTextSplitter` adds relevant information about where each chunk came from (based on the Markdown headers); `ExperimentalMarkdownSyntaxTextSplitter` retains the original whitespace and formatting, addressing issues with code blocks and nested lists. |
 | Code      | [many languages](/docs/how_to/code_splitter/)                                                                                                                                 | Code (Python, JS) specific characters                                                                           |               | Splits text based on characters specific to coding languages. 15 different languages are available to choose from.                                                                                                                                                           |
 | Token    | [many classes](/docs/how_to/split_by_token/)                                                                                                                                  | Tokens                                                                                                          |               | Splits text on tokens. There exist a few different ways to measure tokens.                                                                                                                                                                                                   |
 | Character  | [CharacterTextSplitter](/docs/how_to/character_text_splitter/)                                                                                                                | A user defined character                                                                                        |               | Splits text based on a user defined character. One of the simpler methods.                                                                                                                                                                                                   |
diff --git a/docs/docs/how_to/index.mdx b/docs/docs/how_to/index.mdx
@@ -134,6 +134,7 @@ What LangChain calls [LLMs](/docs/concepts/#llms) are older forms of language mo
 - [How to: split by character](/docs/how_to/character_text_splitter)
 - [How to: split code](/docs/how_to/code_splitter)
 - [How to: split Markdown by headers](/docs/how_to/markdown_header_metadata_splitter)
+- [How to: split Markdown while retaining syntax (experimental)](/docs/how_to/experimental_markdown_syntax_text_splitter)
 - [How to: recursively split JSON](/docs/how_to/recursive_json_splitter)
 - [How to: split text into semantic chunks](/docs/how_to/semantic-chunker)
 - [How to: split by tokens](/docs/how_to/split_by_token)
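
Reviewer note: the header-based splitting that the new Markdown section describes can be illustrated with a small standard-library sketch. This is not LangChain's implementation and does not call its API. The function name `split_markdown_by_headers`, the `h1`/`h2` metadata keys, and the fence handling are all assumptions made for illustration. The sketch shows only the general idea: cut at headers, keep chunk whitespace untouched, and record the enclosing headers as metadata.

```python
import re
from typing import Dict, List, Tuple

# Illustrative only: a hypothetical helper, not part of LangChain.
FENCE = "`" * 3  # fenced-code delimiter, built this way to avoid a literal fence
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")  # ATX headers such as "## Title"


def split_markdown_by_headers(text: str) -> List[Tuple[Dict[str, str], str]]:
    """Return (metadata, content) pairs, one per header-delimited chunk."""
    chunks: List[Tuple[Dict[str, str], str]] = []
    path: Dict[int, str] = {}  # header level -> title currently in effect
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the accumulated body, tagged with the headers that enclose it.
        content = "\n".join(body).strip("\n")
        if content.strip():
            meta = {f"h{level}": title for level, title in sorted(path.items())}
            chunks.append((meta, content))
        body.clear()

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside a fenced code block are never treated as headers.
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces any header at the same or a deeper level.
            for stale in [k for k in path if k >= level]:
                del path[stale]
            path[level] = match.group(2)
        else:
            body.append(line)  # kept verbatim, indentation included
    flush()
    return chunks


sample = (
    "# Guide\nIntro text.\n\n## Setup\n"
    + FENCE + "bash\n# not a header\npip install demo\n" + FENCE
    + "\n\n## Usage\n- step\n    - nested step\n"
)
for meta, content in split_markdown_by_headers(sample):
    print(meta, repr(content))
```

The `#` line inside the fenced block stays in the `Setup` chunk rather than starting a new one, and the nested list keeps its indentation. Those are the two cases the commit text singles out as problematic.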