
MinerU: A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, webpages, and e-books in multiple formats.

  • MinerU is a tool that converts PDFs into machine-readable formats (e.g., Markdown, JSON), from which the content can easily be converted into other formats. MinerU was created during the pre-training of InternLM. The project focuses on solving symbol-conversion problems in scientific literature and aims to contribute to technological progress in the era of large models. Compared to well-known commercial products, MinerU is still young.
  • https://github.com/opendatalab/MinerU