feature: upload from pdf2md incrementally rather than just at the end #2930

densumesh · 2024-12-11T00:32:47Z

Description

Target(s)

Community channels

Matrix is preferred. Reach out on Discord or Matrix for further assistance.

vtempest · 2024-12-13T22:07:13Z

https://airesearch.js.org/functions/convertPDFToHTML.html
I've checked out pdf2md, but think about how it will scale to 10k PDFs.
That's why you need a hybrid approach: use my $0 version, which also does reference extraction.
Then OCR images and tables only if the prompt about that chunk needs them, and not for summarize or topicify types of prompts. Otherwise, spending GPT tokens on many PDFs is very expensive overkill.
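A minimal sketch of the gating idea described above, assuming each chunk already carries text from a cheap, non-LLM extraction pass. The `Chunk` type, the prompt category names, and both function names are hypothetical illustrations. They are not part of pdf2md or of the linked library.

```python
from dataclasses import dataclass

# Hypothetical prompt categories that only need the plain extracted text.
TEXT_ONLY_PROMPTS = {"summarize", "topicify"}


@dataclass
class Chunk:
    text: str                 # text from the cheap extraction pass
    has_images: bool = False  # chunk contains embedded images
    has_tables: bool = False  # chunk contains tables


def needs_ocr(chunk: Chunk, prompt_kind: str) -> bool:
    """Return True only when the prompt may depend on image or table content."""
    if prompt_kind in TEXT_ONLY_PROMPTS:
        return False
    return chunk.has_images or chunk.has_tables


def select_for_ocr(chunks: list[Chunk], prompt_kind: str) -> list[int]:
    """Return the indices of chunks that would be sent to the costly OCR/LLM step."""
    return [i for i, chunk in enumerate(chunks) if needs_ocr(chunk, prompt_kind)]
```

Under these assumptions, a summarization prompt sends no chunks to the expensive step, while a question about a specific chunk sends only the chunks that contain images or tables.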

Nick Khami (@skeptrune) is probably telling you all to ignore me, which violates the open source code of conduct's requirement of a welcoming environment. He passive-aggressively looks for a bs excuse to block me, ignoring my years of research ideas, many of which he later adopted, like switching to HF vector and pdf2md. Hoping I go away is not the way to handle it. Instead, talk it out, apologize in the spirit of the holidays, and include qualified developers.

cdxker linked a pull request Dec 14, 2024 that will close this issue

feature: incrementally add pages #2932

Merged

cdxker closed this as completed in #2932 Dec 14, 2024
