[Document Understanding] Can we support a new task on document understanding? #218

jlia0 · 2023-07-26T15:42:36Z

Document Understanding

Some example models:

DiT: https://huggingface.co/microsoft/dit-large
LayoutLMv3: https://huggingface.co/microsoft/layoutlmv3-large
Donut: https://huggingface.co/docs/transformers/model_doc/donut

Reason for request

Document understanding is a very popular task, but I couldn't find any support for it in the web environment.

Some tasks include:

Key Information Extraction (KIE)
Document Layout Analysis (DLA)
Document Question Answering (DQA)
Optical Character Recognition (OCR)

xenova · 2023-07-26T17:01:51Z

Those do sound like quite interesting use-cases! Do you mind sharing example code for how you would use the models, as well as the inputs and expected outputs?

jlia0 · 2023-07-26T17:21:28Z

Here's some example code using detectron2 and DiT for document layout analysis.

DiT Doc: https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/dit
HF Space: https://huggingface.co/spaces/imjliao/dit-document-layout-analysis/blob/main/app.py

xenova · 2023-07-26T17:38:19Z

The repo you shared is private, but I assume I can use this one: https://huggingface.co/spaces/nielsr/dit-document-layout-analysis

jlia0 · 2023-07-26T19:11:41Z

> The repo you shared is private, but I assume I can use this one: https://huggingface.co/spaces/nielsr/dit-document-layout-analysis

Oh yes sorry! I forgot it's my private repo. But you're correct, I am using that one as well.

How do you think we can include this in Transformers.js? It seems like there is a dependency issue with detectron2...

xenova · 2023-07-26T21:26:24Z

Hmm, that might complicate things somewhat... Perhaps there is a JS library out there which is a suitable substitute?

jlia0 · 2023-07-26T21:36:53Z

> Hmm, that might complicate things somewhat... Perhaps there is a JS library out there which is a suitable substitute?

I don't see a JS library out there that could do something similar. But I found something that's worth checking out:

https://github.com/Unstructured-IO/unstructured-inference/blob/main/unstructured_inference/models/detectron2onnx.py

^^^ This is a working example of detectron2 using ONNXRuntime...

xenova · 2023-10-30T23:20:18Z

Just an update on this:

#320 "Add support for DonutSwin models (Closes #318)" added support for Donut models (document parsing and document question answering)
#375 "Add support for TrOCR models (for Optical Character Recognition)" will add support for TrOCR models (optical character recognition)

The other tasks (Key Information Extraction and Document Layout Analysis) might be slightly more difficult to add (due to their additional dependencies)... but we'll get there eventually :)

martinsomm · 2024-10-23T12:06:47Z

Dear @xenova :) I'm trying to implement https://huggingface.co/docs/transformers.js/api/pipelines#module_pipelines.DocumentQuestionAnsweringPipeline while importing transformers.js from cdn as described in https://huggingface.co/docs/transformers.js/main/en/tutorials/vanilla-js#step-2-javascript-setup.

I get the error below:

transformers:187 Uncaught Error: This pipeline is not yet supported in Transformers.js v3.
    at Function._call (transformers:187:16286)
    at e (transformers:214:134)

Wondering how to circumvent this issue..

Thanks and best regards
Martin

xenova · 2024-10-23T18:54:05Z

Thanks @martinsomm for the report - this will be fixed by #987.

martinsomm · 2024-10-25T12:55:39Z

Hi @xenova, and wow, thanks for the swift turnaround. I very much appreciate your effort :) Now I'm wondering when the change will be available on jsDelivr, as I still get the same exception, even after an "empty cache and hard refresh" in the browser.

Best regards, Martin

xenova · 2024-10-25T17:34:27Z

@martinsomm We've now published https://www.npmjs.com/package/@huggingface/transformers/v/3.0.1, so you can import it from jsdelivr using:

import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.0.1';
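
For reference, a minimal sketch of how the version-pinned import could be combined with the document question answering pipeline. The helper names (`transformersCdnUrl`, `askDocument`), the injectable `loadModule` option, and the `Xenova/donut-base-finetuned-docvqa` checkpoint are assumptions made for illustration and are not taken from this thread. Pinning the version in the URL should also sidestep a stale cached copy of an unversioned URL.

```javascript
// Build a version-pinned jsDelivr URL (hypothetical helper, not part of the library).
function transformersCdnUrl(version) {
  return `https://cdn.jsdelivr.net/npm/@huggingface/transformers@${version}`;
}

// Answer one question about one document image.
// `loadModule` is injectable so the function can be exercised without network access;
// by default it performs a dynamic import of the pinned CDN URL (browser usage).
async function askDocument(image, question, {
  version = '3.0.1',
  model = 'Xenova/donut-base-finetuned-docvqa', // assumption: example checkpoint
  loadModule = (url) => import(url),
} = {}) {
  const { pipeline } = await loadModule(transformersCdnUrl(version));
  const qa = await pipeline('document-question-answering', model);
  const output = await qa(image, question); // expected shape: [{ answer: '...' }]
  return output[0].answer;
}

// Browser usage (downloads the model on first call):
//   const answer = await askDocument(imageUrl, 'What is the invoice number?');
```

The sketch creates a new pipeline on every call to stay short; in a real page the pipeline would normally be created once and reused.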

martinsomm · 2024-10-29T12:33:56Z

@xenova thanks again :) Now I get the updated version from jsDelivr. This is probably the wrong channel for another question, but I was wondering what it would take to make batch processing possible, so we can ask multiple questions per image. I can see a comment saying that currently only a batch size of 1 is supported for the DocumentQuestionAnsweringPipeline.
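
Until batching is supported, one possible workaround is to call the pipeline once per question. The sketch below assumes `qa` is an already-created document question answering pipeline that resolves to an array like `[{ answer: '...' }]`; `askQuestions` is a hypothetical helper name, not a library function. This is not true batching: the image is processed again for every question.

```javascript
// Ask several questions about the same document image, one pipeline call each.
// `qa` is assumed to be a function (image, question) => Promise<[{ answer }]>.
async function askQuestions(qa, image, questions) {
  const results = [];
  for (const question of questions) {
    // Sequential on purpose: the pipeline handles one (image, question) pair at a time.
    const output = await qa(image, question);
    // Record null when the pipeline returns no answer for a question.
    results.push({ question, answer: output[0]?.answer ?? null });
  }
  return results;
}

// Usage, given an existing pipeline `qa`:
//   const results = await askQuestions(qa, imageUrl, ['What is the invoice number?', 'What is the total?']);
```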

jlia0 added the enhancement New feature or request label Jul 26, 2023

xenova mentioned this issue Oct 23, 2024

Fix Document QA pipeline #987

Merged

xenova closed this as completed in #987 Oct 24, 2024
