spider_transformations

The Spider Cloud transformation library for Rust, built for performance, AI, and multiple locales. Spider Cloud uses the library for data cleaning.

Usage

Add the crate to Cargo.toml:

[dependencies]
spider_transformations = "2"

Then transform a page:

use spider_transformations::transformation::content;

fn main() {
    // `page` is not defined here; it comes from the spider object when streaming.
    let mut conf = content::TransformConfig::default();
    conf.return_format = content::ReturnFormat::Markdown;
    let content = content::transform_content(&page, &conf, &None, &None);
}

Transform types

  1. Markdown
  2. Commonmark
  3. Text
  4. Markdown (Text Map) or HTML2Text
  5. WIP: HTML2XML

Enhancements

  1. Readability
  2. Encoding

Chunking

The transformation module includes several chunking utilities.
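To illustrate what chunking means in this context, here is a minimal sketch of word-based chunking in plain Rust. It is not the crate's API: the function name `chunk_by_words` and its `max_words` parameter are hypothetical and exist only for illustration. Check the transformation module for the utilities the library actually provides.

```rust
/// Hypothetical helper, NOT part of spider_transformations.
/// Splits `text` into chunks of at most `max_words` whitespace-separated words.
pub fn chunk_by_words(text: &str, max_words: usize) -> Vec<String> {
    // A chunk size of zero has no sensible meaning (and `chunks(0)` panics),
    // so return no chunks at all.
    if max_words == 0 {
        return Vec::new();
    }

    // Collapse all runs of whitespace and collect the individual words.
    let words: Vec<&str> = text.split_whitespace().collect();

    // Group the words into fixed-size windows and rejoin each with single spaces.
    words
        .chunks(max_words)
        .map(|chunk| chunk.join(" "))
        .collect()
}
```

Under these assumptions, calling `chunk_by_words` on transformed text with a `max_words` of 2 yields pieces of at most two words each, with any remainder in the final piece.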

This project includes rewrites and forks of html2md and html2text, made for performance and bug fixes.

License

MIT