[README] Update
Menghuan1918 committed Nov 26, 2024
1 parent 6223669 commit ae8755e
Showing 2 changed files with 23 additions and 0 deletions.
12 changes: 12 additions & 0 deletions README.md
@@ -39,6 +39,18 @@ Use various OCR or PDF recognition tools to identify images and add them to the

After conversion and pre-processing of PDF using Doc2X, you can achieve better recognition rates when used with knowledge base applications such as [graphrag](https://github.com/microsoft/graphrag), [Dify](https://github.com/langgenius/dify), and [FastGPT](https://github.com/labring/FastGPT).

### Markdown Document Processing Features

`pdfdeal` also provides a series of powerful tools to handle Markdown documents:

- **Convert HTML tables to Markdown format**: Allows conversion of HTML formatted tables to Markdown format for easy use in Markdown documents.
- **Upload images to remote storage services**: Supports uploading local or online images in Markdown documents to remote storage services to ensure image persistence and accessibility.
- **Convert online images to local images**: Allows downloading and converting online images in Markdown documents to local images for offline use.
- **Document splitting and separator addition**: Supports splitting Markdown documents by headings or adding separators within documents for better organization and management.
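
To make the first item concrete, here is a minimal sketch of converting a simple HTML table to a Markdown table using only the Python standard library. It is not `pdfdeal`'s implementation or API: the names `html_table_to_markdown` and `_TableParser` are hypothetical, the first row is assumed to be the header, and merged cells (`rowspan`/`colspan`) and nested tables are not handled.

```python
from html.parser import HTMLParser


class _TableParser(HTMLParser):
    """Collect cell text, row by row, from one simple HTML table (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def _close_cell(self):
        # Flush the open cell (if any) into the current row.
        if self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell).split())  # collapse whitespace
            self._row.append(text.replace("|", "\\|"))    # escape pipes for Markdown
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> before the next cell
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def html_table_to_markdown(html: str) -> str:
    """Return a Markdown table for the rows found in `html`, or "" if none."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    rows = parser.rows
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]  # pad short rows
    lines = [
        "| " + " | ".join(rows[0]) + " |",        # header row
        "| " + " | ".join(["---"] * width) + " |",  # header/body divider
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)
```

Calling `html_table_to_markdown` on a table with a `<th>` header row and one `<td>` row yields three lines: the header, the `---` divider, and the data row.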

For detailed feature introduction and usage, please refer to the [documentation link](https://menghuan1918.github.io/pdfdeal-docs/guide/Tools/).
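
As an illustration of the splitting and separator feature, the sketch below splits a Markdown string at ATX headings (`#` to `######`) and optionally joins the pieces with a separator line. This is an assumption-laden stand-in, not the `pdfdeal` API: the function names and the `<!--split-->` separator are hypothetical, and Setext-style headings (underlined with `===` or `---`) are ignored.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")        # ATX heading such as "## Title"
FENCE = re.compile(r"^\s*(`{3,}|~{3,})")  # opening or closing code fence


def split_by_headings(markdown: str) -> list[str]:
    """Split `markdown` into sections, each starting at a heading line."""
    sections, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence  # "#" lines inside code fences are not headings
        if HEADING.match(line) and not in_fence and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [section for section in sections if section]


def add_separators(markdown: str, separator: str = "<!--split-->") -> str:
    """Rejoin the sections with `separator` (a hypothetical marker) between them."""
    return f"\n\n{separator}\n\n".join(split_by_headings(markdown))
```

Splitting on a marker afterwards is one way a downstream knowledge base tool could chunk the document at section boundaries rather than at arbitrary character counts.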


## Cases

### graphrag
11 changes: 11 additions & 0 deletions README_CN.md
@@ -40,6 +40,17 @@

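After a PDF is converted and pre-processed with Doc2X, using it with knowledge base applications (such as [graphrag](https://github.com/microsoft/graphrag), [Dify](https://github.com/langgenius/dify), and [FastGPT](https://github.com/labring/FastGPT)) can significantly improve recall.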

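### Markdown Document Processing Features

`pdfdeal` also provides a series of powerful tools for processing Markdown documents:

- **Convert HTML tables to Markdown format**: Converts HTML-formatted tables to Markdown format for easy use in Markdown documents.
- **Upload images to remote storage services**: Supports uploading local or online images in Markdown documents to remote storage services, ensuring image persistence and accessibility.
- **Convert online images to local images**: Downloads online images in Markdown documents and converts them to local images for offline use.
- **Document splitting and separator addition**: Supports splitting Markdown documents by headings or adding separators within documents for easier organization and management.

For a detailed feature introduction and usage, see the [documentation link](https://menghuan1918.github.io/pdfdeal-docs/zh/guide/Tools/).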

## Cases

### graphrag