[Bug] Paperless-AI Fails to Rename and Tag New Documents in Paperless-ngx Automatically After Initial Deployment Success #87

Closed
wigwurm opened this issue Jan 10, 2025 · 14 comments
Labels: investigating (Investigating an issue), should-be-fixed-recheck-please (Implemented a fix. Try again. If still bugged, reopen issue.)

Comments

@wigwurm

wigwurm commented Jan 10, 2025

Describe the bug
After deploying Paperless-AI (Docker Compose), everything works exactly as expected: existing docs are fed into ChatGPT and then renamed and tagged correctly. However, when new documents are added to Paperless, Paperless-AI recognizes them, shows them on its dashboard and reports all of them as "AI Processed", but the automatic renaming and tagging in Paperless no longer happens, even though it worked right after deployment.

The Linux instance, Docker, and both containers/stacks have been updated to the latest versions (Paperless-AI: 2.0.0, Paperless-ngx: 2.13.5). I tried it on two different Docker hosts on completely different machines: once with Paperless-AI deployed in the same Docker Compose file as Paperless, and once "standalone" in its own Docker Compose file. Both times, exactly the same thing happens: after initially working perfectly, no updates are written back to Paperless automatically.

Expected behavior
Ideally, Paperless-AI should scan the Paperless library every X minutes (set in cron format in the setup window), send new docs to ChatGPT, and then rename and tag them in Paperless. The periodic scan and the ChatGPT analysis actually work; only the update in Paperless does not happen.
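The expected flow can be sketched as a scan loop that only records a document as processed once its update is confirmed, so a silently ignored update shows up instead of being reported as "AI Processed". This is a hypothetical sketch, not Paperless-AI's actual (Node.js) code: `fetch_documents`, `analyze`, `update_document` and `get_document` are stand-ins for whatever client calls the app really makes.

```python
from typing import Callable, Dict, List, Set

Document = Dict[str, object]


def scan_once(
    fetch_documents: Callable[[], List[Document]],
    analyze: Callable[[Document], Dict[str, object]],
    update_document: Callable[[int, Dict[str, object]], None],
    get_document: Callable[[int], Document],
    processed: Set[int],
) -> List[int]:
    """Run one scheduled scan; return IDs whose update could not be confirmed."""
    unconfirmed: List[int] = []
    for doc in fetch_documents():
        doc_id = int(doc["id"])
        if doc_id in processed:
            continue  # already handled in an earlier run
        changes = analyze(doc)  # e.g. {"title": ..., "tags": [...]}
        update_document(doc_id, changes)
        # Re-read the document: only mark it processed if every requested
        # field actually has the new value in Paperless.
        after = get_document(doc_id)
        if all(after.get(key) == value for key, value in changes.items()):
            processed.add(doc_id)
        else:
            unconfirmed.append(doc_id)  # will be retried on the next run
    return unconfirmed
```

Unconfirmed documents stay out of `processed`, so the next cron run retries them rather than skipping them forever.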

Screenshots

Desktop:

  • OS: Docker on Ubuntu 24.04
  • Browser: Chrome 131.0.6778.205 (Official Build) (arm64)
[Screenshots: "Screenshot 2025-01-10 at 12 59 20" and "Screenshot 2025-01-10 at 13 02 19"]
@wigwurm wigwurm changed the title from "Refresh" to "Paperless-AI Fails to Rename and Tag New Documents in Paperless-ngx After Initial Deployment Success" Jan 10, 2025
@wigwurm wigwurm changed the title from "Paperless-AI Fails to Rename and Tag New Documents in Paperless-ngx After Initial Deployment Success" to "[Bug] Paperless-AI Fails to Rename and Tag New Documents in Paperless-ngx After Initial Deployment Success" Jan 10, 2025
@wigwurm wigwurm changed the title from "[Bug] Paperless-AI Fails to Rename and Tag New Documents in Paperless-ngx After Initial Deployment Success" to "[Bug] Paperless-AI Fails to Rename and Tag New Documents in Paperless-ngx Automatically After Initial Deployment Success" Jan 10, 2025
@clusterzx clusterzx self-assigned this Jan 10, 2025
@clusterzx clusterzx added the investigating Investigating an issue label Jan 10, 2025
@clusterzx
Owner

That's weird. Do you add the documents with the same user that the token belongs to?

@wigwurm
Author

wigwurm commented Jan 10, 2025

Thanks so much for your quick reply, much, much appreciated... ❤️ Your app is just so amazing!!! I am very hyped, which makes it even sadder that I can't get it working properly now.

Absolutely, I did. I even renewed the token a couple of times to make sure it wasn't outdated and that I hadn't made a copy-paste error... But nope. Besides, it works perfectly fine right after deployment, and the new docs are shown in Paperless-AI, which means it has full access to Paperless as such; it sends the docs to OpenAI and everything works as expected, except for the "update" (renaming and tagging) of the docs in Paperless. Very weird indeed... The container logs all look perfectly fine, too.

@clusterzx
Owner

You are absolutely welcome 👍 We will get this fixed together, I am sure about that.
I will get back to you in a bit.

Since I see you are German, do you maybe have Discord, where we could investigate this further? Via screen sharing or something like that.

Best regards! 😆

@clusterzx
Owner

Another question, just curious: how much time passed between the initial run and adding the new documents?

@wigwurm
Author

wigwurm commented Jan 10, 2025

As I said, I tested it with two different deployments and with various documents, specifically added for testing; the time span ranged from a couple of minutes to two hours... I tried several times, redeployed the containers, etc.

@clusterzx
Owner

Can you include a Docker log of a one-hour run of the container?

@cepheiden

Your app is just so damn amazing!!!

But it looks like I have the same issue: no auto-tagging works. Fresh installation.

@kolle86
Copy link

kolle86 commented Jan 10, 2025

I have similar behaviour. Tagging and renaming work fine in manual mode or when restarting the container,
but the cron event throws the following error:

Starting scheduled scan at 2025-01-10T13:30:00.697Z
Fetched page 1, got 100 tags. Total so far: 100
Fetched page 2, got 25 tags. Total so far: 125
Fetched page 1, got 43 documents. Total so far: 43
Finished fetching. Found 43 documents.
Processing new document: Gescannt_20250110-1427
Thumbnail not cached, fetching from Paperless
Error status: 500
Error fetching thumbnail for document undefined: Request failed with status code 500
Error headers: Object [AxiosHeaders] {
  server: 'openresty',
  date: 'Fri, 10 Jan 2025 13:30:01 GMT',
  'content-type': 'text/html; charset=utf-8',
  'content-length': '145',
  connection: 'keep-alive',
  'x-frame-options': 'SAMEORIGIN',
  'x-api-version': '5',
Thumbnail nicht gefunden
  'x-version': '2.13.5',
  vary: 'Accept-Language, origin, Cookie',
  'content-language': 'en-us',
  'x-content-type-options': 'nosniff',
  'referrer-policy': 'same-origin',
  'cross-origin-opener-policy': 'same-origin'
}
Failed to get thumbnail TypeError [ERR_INVALID_ARG_TYPE]: The "data" argument must be of type string or an instance of Buffer, TypedArray, or DataView. Received null
    at Object.writeFile (node:internal/fs/promises:1203:5)
    at OpenAIService.analyzeDocument (/app/services/openaiService.js:76:20)
    at async scanDocuments (/app/server.js:144:26)
    at async Task._execution (/app/server.js:263:7) {
  code: 'ERR_INVALID_ARG_TYPE'
}
[DEBUG] [10.01.25, 13:30] OpenAI request sent
[DEBUG] [10.01.25, 13:30] Used tokens: 237, Total tokens: 698
Refreshing tag cache...
Tag cache refreshed. Found 25 tags.
No tags provided to processTags
Current tags for document 113: []
Adding new tags: []
Current correspondent: null
New correspondent: undefined
Combined tags: []
Updated document 113 with: {
  tags: [],
  title: 'Gescannt_20250110-1427',
  created: '1986-05-23T22:00:00.000Z'
}
Document Gescannt_20250110-1427 added to processed_documents
[INFO] Task completed

Right after that, I analysed the document in manual mode and everything went OK!
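The log shows two things worth guarding against, whatever the root cause turns out to be: the thumbnail request failed with a 500, so `null` reached `writeFile` and raised `ERR_INVALID_ARG_TYPE`; and the analysis came back with no tags and no correspondent, yet document 113 was still updated and added to `processed_documents`. A minimal defensive sketch in Python (the app itself is Node.js); `cache_thumbnail` and `is_usable_analysis` are hypothetical helpers, not Paperless-AI functions.

```python
from pathlib import Path
from typing import Optional, Sequence


def cache_thumbnail(path: Path, data: Optional[bytes]) -> bool:
    """Write the thumbnail only if the fetch actually returned bytes.

    Returning False lets the caller carry on without a cached thumbnail
    instead of passing a null/empty payload to the file write.
    """
    if not data:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True


def is_usable_analysis(
    original_title: str, title: Optional[str], tags: Sequence[str]
) -> bool:
    """Treat 'no tags and an unchanged title' as a failed analysis.

    A failed analysis should not be written back or recorded as processed,
    so the document can be retried on a later scan.
    """
    return bool(tags) or (title is not None and title != original_title)
```

With the values from the log (`tags: []`, title unchanged), `is_usable_analysis` returns `False`, so the document would be left for a retry rather than marked as done.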

@clusterzx
Owner

clusterzx commented Jan 10, 2025

@kolle86 I would say this document belongs to another user. I had this issue with a friend of mine who has a big Paperless instance with thousands of files and different users. That message popped up every time the API token did not have the rights for that document.
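If ownership is the cause, the affected documents can be spotted from their permission fields. A minimal sketch, assuming the Paperless-ngx document JSON includes `owner` (a user ID or null) and `user_can_change` (whether the requesting user may edit the document), as recent Paperless-ngx versions return; verify both against your instance's `/api/documents/` output. The helper name is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def documents_token_cannot_change(
    documents: Iterable[Dict[str, Any]],
) -> List[Tuple[int, Optional[int]]]:
    """Return (id, owner) for documents the token's user can see but not edit.

    Assumes each document dict carries ``user_can_change`` (bool) and
    ``owner`` (user ID or None). Documents without ``user_can_change`` are
    treated as unknown and skipped rather than reported.
    """
    flagged: List[Tuple[int, Optional[int]]] = []
    for doc in documents:
        if doc.get("user_can_change") is False:
            flagged.append((int(doc["id"]), doc.get("owner")))
    return flagged
```

Being able to list a document only proves read access, which is why such documents can appear on the Paperless-AI dashboard while the write-back fails.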

@clusterzx
Owner

@wigwurm @cepheiden @kolle86 I will push a major update in the next minutes. It comes with User Authentication, many bugfixes and a reworked document scanning function.

Maybe this solves some issues here.

@wigwurm
Author

wigwurm commented Jan 10, 2025

Oh man, yes... @clusterzx I had missed your message before. Very best regards back! 🤩

Thank you so, so much... 😇 I am also sure about it, as this would be a real game-changer for my Paperless endeavors - and I really want it to work...

Although I am IMHO not that badly equipped in homelab terms, I don't have Discord; maybe I really should have a look into it... 🤔

Of course, attached you'll find a "sanitized" log... 👍🏻

Paperless-AI_Container_Log.json

@cepheiden

Yes, indeed an access rights issue! Many thanks @clusterzx

@bketelsen

I have a similar issue, but haven't had time to investigate for clues/causes. It works initially for several minutes; after that it still appears to be working, but the documents in Paperless aren't modified.

@kolle86

kolle86 commented Jan 10, 2025

Okay, I set the owner for every document and this solved the issue! Thanks for the update.
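Setting the owner on many documents can also be scripted. A minimal sketch, assuming the Paperless-ngx REST API accepts a `PATCH` to `/api/documents/<id>/` with an `owner` field and token authentication via an `Authorization: Token <token>` header; check this against your Paperless-ngx version before running it. `build_owner_updates`, the base URL, the token and the IDs are placeholders. The function only builds the requests, so they can be inspected before being sent.

```python
import json
import urllib.request
from typing import Iterable, List


def build_owner_updates(
    base_url: str, token: str, doc_ids: Iterable[int], owner_id: int
) -> List[urllib.request.Request]:
    """Build one PATCH request per document that sets its owner (not sent)."""
    base = base_url.rstrip("/")
    built: List[urllib.request.Request] = []
    for doc_id in doc_ids:
        body = json.dumps({"owner": owner_id}).encode("utf-8")
        built.append(
            urllib.request.Request(
                f"{base}/api/documents/{doc_id}/",
                data=body,
                method="PATCH",
                headers={
                    "Authorization": f"Token {token}",
                    "Content-Type": "application/json",
                },
            )
        )
    return built
```

Usage, once the requests look right: `for req in build_owner_updates("http://paperless:8000", "<token>", [113], 3): urllib.request.urlopen(req)`, where `3` stands for the user ID that owns the API token.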

5 participants