Appears that multiple passes are stepping on each other #45

Closed
paradizelost opened this issue Jan 7, 2025 · 6 comments
@paradizelost

Describe the bug
The cron job is set to run every 30 minutes, but the logs suggest that multiple runs are executing concurrently. I have over 5k documents, so scanning through them for the ones tagged for processing seems to take a very long time.

To Reproduce
Steps to reproduce the behavior:
Set up the default container with a 30-minute cron job and many documents.

paperless-ai | Fetched page 23, got 0 matching documents. Total so far: 8
paperless-ai | Fetched page 58, got 0 matching documents. Total so far: 4

Expected behavior
The prior run should finish before another pass is started.
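
For illustration, the expected behavior could be sketched as an in-process guard that skips a scheduled tick while the previous pass is still running. This is only a sketch, not paperless-ai's actual code: `createScanGuard`, `tick`, and the `scan` callback are hypothetical names, and the scan itself is assumed to be an async function.

```typescript
// Hypothetical guard: a scheduled tick is skipped while a previous pass is still running.
type Scan = () => Promise<void>;

interface ScanGuard {
  tick(): Promise<boolean>; // resolves to true if the scan ran, false if it was skipped
  isRunning(): boolean;
  skippedCount(): number;
}

function createScanGuard(scan: Scan): ScanGuard {
  let running = false;
  let skipped = 0;
  return {
    async tick(): Promise<boolean> {
      if (running) {
        // A previous pass has not finished yet, so do not start another one.
        skipped += 1;
        return false;
      }
      running = true;
      try {
        await scan();
        return true;
      } finally {
        // Always release the guard, even if the scan throws.
        running = false;
      }
    },
    isRunning: () => running,
    skippedCount: () => skipped,
  };
}
```

The scheduler callback would call `guard.tick()` instead of calling the scan directly. Note that a flag like this only protects a single process; if several instances of the app run side by side, a shared lock would be needed instead.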

Desktop (please complete the following information):

  • OS: Docker on Debian
@clusterzx clusterzx self-assigned this Jan 7, 2025
@clusterzx clusterzx added the investigating Investigating an issue label Jan 7, 2025
@clusterzx
Owner

Thanks for reporting. I will look into it.

@paradizelost
Author

paradizelost commented Jan 7, 2025

Figured a bit more log output may help show the scope just in case.

paperless-ai | Fetched page 46, got 1 matching documents. Total so far: 479
paperless-ai | Fetched page 7, got 32 matching documents. Total so far: 417
paperless-ai | Fetched page 42, got 2 matching documents. Total so far: 482
paperless-ai | Fetched page 45, got 0 matching documents. Total so far: 481
paperless-ai | Fetched page 46, got 1 matching documents. Total so far: 482
paperless-ai | Fetched page 43, got 0 matching documents. Total so far: 482
paperless-ai | Fetched page 47, got 5 matching documents. Total so far: 484
paperless-ai | Fetched page 8, got 0 matching documents. Total so far: 417
paperless-ai | Fetched page 47, got 5 matching documents. Total so far: 487
paperless-ai | Fetched page 9, got 0 matching documents. Total so far: 417
paperless-ai | Fetched page 44, got 0 matching documents. Total so far: 482
paperless-ai | Fetched page 48, got 28 matching documents. Total so far: 512
paperless-ai | Fetched page 48, got 28 matching documents. Total so far: 515
paperless-ai | Fetched page 10, got 0 matching documents. Total so far: 417
paperless-ai | Fetched page 45, got 0 matching documents. Total so far: 482
paperless-ai | Fetched page 49, got 26 matching documents. Total so far: 538
paperless-ai | Fetched page 49, got 26 matching documents. Total so far: 541
paperless-ai | Fetched page 46, got 1 matching documents. Total so far: 483
paperless-ai | Fetched page 50, got 17 matching documents. Total so far: 555
paperless-ai | Fetched page 11, got 0 matching documents. Total so far: 417
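
To make the overlap easier to see, the interleaved lines can be split back into individual passes. The sketch below is a hypothetical helper, not part of paperless-ai. It assumes that within one pass the page number increases by 1 and the running total grows by exactly the number of matches on that page.

```typescript
// Group interleaved "Fetched page ..." log lines into separate scan passes.
// Assumption: within one pass, pages increase by 1 and the running total
// grows by exactly the "got" value of each new page.
interface PageLine {
  page: number;
  got: number;
  total: number;
}

const LINE_RE = /Fetched page (\d+), got (\d+) matching documents\. Total so far: (\d+)/;

function parseLine(line: string): PageLine | null {
  const m = LINE_RE.exec(line);
  if (m === null) return null;
  return { page: Number(m[1]), got: Number(m[2]), total: Number(m[3]) };
}

function groupIntoPasses(lines: string[]): PageLine[][] {
  const passes: PageLine[][] = [];
  for (const raw of lines) {
    const entry = parseLine(raw);
    if (entry === null) continue; // ignore unrelated log lines
    // Find a pass that this line continues consistently.
    const owner = passes.find((pass) => {
      const last = pass[pass.length - 1];
      if (last === undefined) return false;
      return entry.page === last.page + 1 && entry.total === last.total + entry.got;
    });
    if (owner !== undefined) {
      owner.push(entry);
    } else {
      passes.push([entry]); // no pass fits, so this line starts a new one
    }
  }
  return passes;
}
```

Applied to the 20 lines above, `groupIntoPasses` yields four separate passes: one on pages 7 to 11, one on pages 42 to 46, one on pages 45 to 49, and one on pages 46 to 50.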

@STL2020

STL2020 commented Jan 7, 2025

Processing tag: AI-new

Refreshing tag cache...

Tag cache refreshed. Found 25 tags.

Found tag "AI-new" in cache with ID 80

Found tag "AI" in cache with ID 5

Created new correspondent "Palo Alto Networks" with ID 33

Removing unused tags from document 97, keeping tags: [ 80, 5 ]

No tags to remove

Current tags for document 97: [ 80 ]

Adding new tags: [ 80, 5 ]

Combined tags: [ 80, 5 ]

Updated document 97 with: {

tags: [ 80, 5 ],

correspondent: 33,

title: 'XYZ 01.05.2024'

}

Document 97 updated in processed_documents

Starting scheduled scan at 2025-01-07T13:00:00.471Z

Filtering documents for tags: [ 'ai-new' ]

Fetched page 1, got 63 matching documents. Total so far: 63

Fetched page 2, got 6 matching documents. Total so far: 69

Finished filtering. Found 69 documents matching the predefined tags.

80

30

Current config TAGS: [ 'AI-new' ]

Current config PROMPT_TAGS: []

80

30

2025-01-07T13:09:52: PM2 log: Stopping app:paperless-ai id:0

2025-01-07T13:09:52: PM2 log: 0 application online, retry = 3

2025-01-07T13:09:52: PM2 log: App name:paperless-ai id:0 disconnected

2025-01-07T13:09:52: PM2 log: App [paperless-ai:0] exited with code [0] via signal [SIGINT]

2025-01-07T13:09:52: PM2 log: pid=33 msg=process killed

2025-01-07T13:09:53: PM2 log: PM2 successfully stopped

2025-01-07T13:10:01: PM2 log: Launching in no daemon mode

2025-01-07T13:10:02: PM2 log: App [paperless-ai:0] starting in -cluster mode-

2025-01-07T13:10:02: PM2 log: App [paperless-ai:0] online

Loading .env from: /app/data/.env

Loaded environment variables: {

PAPERLESS_API_URL: 'http://paperless-ngx:8000/api',

PAPERLESS_API_TOKEN: '1fd33f9XXXXXXX9c05fb8XXXXXXd9138'

}

(node:19) [DEP0040] DeprecationWarning: The punycode module is deprecated. Please use a userland alternative instead.

(Use node --trace-deprecation ... to show where the warning was created)

Server running on port 3000

[DEBUG] [07.01.25, 13:10] OpenAI request sent

Configured scan interval: */15 * * * *

Starting initial scan at 2025-01-07T13:10:04.583Z

Refreshing tag cache...

Tag cache refreshed. Found 25 tags.

Filtering documents for tags: [ 'ai-new' ]

Error fetching tag text for ID 24: Request failed with status code 500

Fetched page 1, got 63 matching documents. Total so far: 63

Fetched page 2, got 6 matching documents. Total so far: 69

Finished filtering. Found 69 documents matching the predefined tags.

Starting scheduled scan at 2025-01-07T13:15:00.562Z

Refreshing tag cache...

Tag cache refreshed. Found 25 tags.

Filtering documents for tags: [ 'ai-new' ]

Error fetching tag text for ID 67: Request failed with status code 500

Error fetching tag text for ID 54: Request failed with status code 500

Error fetching tag text for ID 24: Request failed with status code 500

Error fetching tag text for ID 64: Request failed with status code 500

Error fetching tag text for ID 68: Request failed with status code 500

Error fetching tag text for ID 61: Request failed with status code 500

Error fetching tag text for ID 66: Request failed with status code 500

Error fetching tag text for ID 60: Request failed with status code 500

Error fetching tag text for ID 62: Request failed with status code 500

Error fetching tag text for ID 65: Request failed with status code 500

Error fetching tag text for ID 69: Request failed with status code 500

Error fetching tag text for ID 9: Request failed with status code 500

Error fetching tag text for ID 61: Request failed with status code 500

Fetched page 1, got 63 matching documents. Total so far: 63

Fetched page 2, got 6 matching documents. Total so far: 69

Finished filtering. Found 69 documents matching the predefined tags.

@STL2020

STL2020 commented Jan 7, 2025

Version latest.
Portainer / Docker on NAS

@paradizelost
Author

I am also seeing the status 500 errors that @STL2020 shows.
Also, I'm not sure whether it is related to the number of documents I have, but I ended up having to increase the MariaDB max_connections value significantly to stop it from spewing errors because the database was locked out.

clusterzx added a commit that referenced this issue Jan 8, 2025
Addressing Fixes and new Features:

Fixes:
#66
#61
#58
#55
#53
#45
#59
#52
#49
#31
#37
#52

Added:
- Big New Feature: Playground
	- Try your prompts on your documents and see how they perform. In Playground no data will be updated in Paperless.
- Added Code and Markdown interpretation in Chat Mode.
- Chat Mode now works with Ollama
@paradizelost
Author

The processing definitely appears much faster. It appears to be pulling in all of the documents and tags, and it goes fast enough that it shouldn't step on itself. Very much improved, and the 500 errors appear to be gone as well.
