Please give big warnings #59

Closed
saschabrockel opened this issue Jan 7, 2025 · 14 comments
Labels
done Issue closed with success

@saschabrockel

Is your feature request related to a problem? Please describe.
I actually killed my Paperless-NGX instance because the tool edited all my documents with default settings. Now they all have different correspondents, tags, and names, and I can't revert it. I tried to restore the database, but of course the file names and folders are also wrong now. I used the default settings and did not really think about what could happen.

Please give a big warning about this! I hope I'm the only one this horrible thing happened to. I will have so much work now fixing all of these documents manually...

@clusterzx
Owner

Very sorry to hear that you ran the default config on all your files and have to reorganize your correspondents.

I will take your advice and add a notice directly to the setup process that you have to confirm.

But - don't get me wrong - if you had restricted it to specific tags, that would not have happened.

I repeat myself: I am sorry, bud.
The tool was intended to help, not to bring pain to the user.
Hope you get your pre-run state back.

A little side note at the end regarding your assumption that it changed the file names: it did not. It only sets correspondents, tags and title. As far as I know, changing the title does not change the physical filename.

@paradizelost

Something like this makes me think the default should be to require a specific tag on a document before it gets processed, and to leave it to the person to remove that requirement. That said, before implementing something new, ALWAYS take backups. Paperless-ngx is fairly easy to back up and restore (I've had to do it several times), even separate from VM-level backups.
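The "require a tag up front" default suggested above can be sketched roughly as follows. This is a minimal illustration, not paperless-ai's actual code: the `Document` class, the `select_for_processing` helper, and the `ai-process` tag name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document record."""
    id: int
    title: str
    tags: set[str] = field(default_factory=set)


def select_for_processing(documents, required_tag):
    """Return only the documents that carry the opt-in tag.

    If no required tag is configured, nothing is selected, so the
    safe default is to touch no documents at all.
    """
    if not required_tag:
        return []
    return [doc for doc in documents if required_tag in doc.tags]


docs = [
    Document(1, "Invoice 2024-12", {"ai-process"}),  # explicitly opted in
    Document(2, "Tax return 2023", {"finance"}),     # tagged, but not opted in
    Document(3, "Lease contract"),                   # no tags at all
]

selected = select_for_processing(docs, "ai-process")
print([doc.id for doc in selected])
```

The design choice here is that an empty or missing tag setting selects nothing, so processing the whole archive would have to be a deliberate decision rather than the out-of-the-box behavior.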

@paradizelost

> A little side note at the end regarding your assumption that it changed the file names: it did not. It only sets correspondents, tags and title. As far as I know, changing the title does not change the physical filename.

The way Paperless-ngx works with document paths is that it updates them automatically based on correspondent and document type, so if those change, Paperless changes the paths as well.

https://docs.paperless-ngx.com/advanced_usage/#storage-paths
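A simplified model may help show why metadata edits can move files on one instance and not on another. This is only a sketch, not Paperless-ngx's real implementation: the `render_path` helper, the `{correspondent}/{title}` format string, and the zero-padded ID fallback are assumptions for illustration and should be checked against the linked docs.

```python
def render_path(filename_format, doc):
    """Render a storage path from document metadata (simplified model).

    With no format configured, the name depends only on the document ID,
    so metadata edits leave it unchanged.
    """
    if not filename_format:
        return f"{doc['id']:07d}.pdf"
    return filename_format.format(**doc) + ".pdf"


before = {"id": 42, "correspondent": "ACME", "title": "Invoice"}
# The same document after an automated tool rewrote its metadata.
after = {**before, "correspondent": "Acme Corp", "title": "Invoice 2024-12"}

fmt = "{correspondent}/{title}"  # assumed example format

print(render_path(fmt, before), "->", render_path(fmt, after))
print(render_path(None, before), "->", render_path(None, after))
```

Under this model, an instance with a metadata-based format sees its files move when correspondents or titles change, while an instance without one keeps the same filenames.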

@paradizelost

It might even be worth adding a recommendation to the documentation to take a backup before starting it up, plus a checkbox or something to say you understand that it could mess up your stuff:
https://docs.paperless-ngx.com/administration/#backup

@clusterzx
Owner

That's weird. I checked my files and they kept their original filenames after all.

Noted! Thanks for clarification.

@clusterzx
Owner

OK, I double-checked it on a different machine with a different instance.

The filenames still remain the same. 🤯

But I will take my side note back now, as I cannot confirm which settings this depends on.

@thorschtn

The recommendation should be: not just a backup, but a completely separate DEV instance.

Paperless AI does not work out of the box for every configuration: for really good results, you have to spend some time developing your own prompt and carry out lots of test runs. Therefore, it is essential to have two instances of Paperless-ngx and two instances of paperless-ai.

But the repo is receiving more and more attention and possibly also attracting less experienced users.

@clusterzx - I would recommend adding a disclaimer along the lines of:

Caution!
paperless-ai makes changes to the documents in your production Paperless-ngx instance that cannot be easily undone.
Please test the results in a separate development environment first, and be sure to back up your documents and metadata beforehand!

@clusterzx
Owner

I feel bad right now, as I do want to contribute something to society.

But I also have to admit that I did not think broadly enough about how users can fck up things in their Paperless.

That's my bad as I assume what is logical to me, is also logical to others.

I will add a disclaimer right away.

@paradizelost

Don't feel bad. Anyone who is smart enough to run a Paperless stack and figure out how to integrate it with an LLM should have some kind of forethought about this. The thing is, the worst-case scenario with Paperless is having it reprocess your documents from scratch, so while it can be a pain in the rear, there is no actual data loss, and we should all have decent backups.

@thorschtn

@clusterzx Don't feel bad!

You've done incredible things in the last few days. Very big praise! <3

Unfortunately, I can't help as a programmer, only as an experienced user and integrator. Let's hope that a few developers will soon come along to support this incredible project.

@clusterzx
Owner

Thank you both @thorschtn @paradizelost for the uplift 🤩
And also for all your community work here, helping to make things better and improve.

@saschabrockel
Author

It's not your fault. I saw your post on Reddit in r/selfhosted and really wanted to try it quickly in very limited time and thought like hell yeah I want to see if I can chat with my documents. Then I just saw "ah I have to process them first, okay there we go" and it was dumb.

And I wasn't as dumb as I thought I was. I had simply split the locations for the DB backup and the files. Look what I found!
image

Let's hope I'm the only one who made this mistake. But if there is one idiot like me, there will probably be more.

@clusterzx
Owner

Wow I am mooore than happy to hear that. Woah that's what a good backup strategy feels like, good for you.

Thanks for the update and best regards 👋🏼

clusterzx added a commit that referenced this issue Jan 8, 2025
Addressing Fixes and new Features:

Fixes:
#66
#61
#58
#55
#53
#45
#59
#52
#49
#31
#37
#52

Added:
- Big New Feature: Playground
	- Try your prompts on your documents and see how they perform. In Playground no data will be updated in Paperless.
- Added Code and Markdown interpretation in Chat Mode.
- Chat Mode now works with Ollama

@mamema

mamema commented Jan 8, 2025

This is a valuable topic. I've also restored my environment, and I also have daily pgsql backups.
I was also thinking about the dev environment mentioned in the comment above, but with 25000 fine-tuned documents, I was just too lazy…. Restore is faster

But again, think about this discussion: more and more (noob) users are getting interested in this solution. Neither “be warned” messages nor building a dev environment is suitable for them.
But perhaps a “dry run” implementation?
Or, as I know from a sophisticated renaming app, record every change and offer to play the changes back in reverse if needed.
THIS would help not only inexperienced users but all of us to learn.
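The two ideas above, a dry run and a recorded change log that can be played back in reverse, could look roughly like this. It is a minimal in-memory sketch under stated assumptions: `Change`, `JournaledStore`, and the document layout are hypothetical and do not reflect how paperless-ai or the Paperless-ngx API work.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One recorded metadata edit (hypothetical structure)."""
    doc_id: int
    attr: str
    old: object
    new: object


class JournaledStore:
    """In-memory document store that records every applied change."""

    def __init__(self, documents):
        self.documents = documents  # {doc_id: {attr: value}}
        self.journal = []

    def update(self, doc_id, attr, new, dry_run=False):
        """Apply a change, or only report it when dry_run is True."""
        old = self.documents[doc_id].get(attr)
        change = Change(doc_id, attr, old, new)
        if not dry_run:
            self.documents[doc_id][attr] = new
            self.journal.append(change)
        return change

    def revert_all(self):
        """Undo recorded changes in reverse order, newest first."""
        while self.journal:
            change = self.journal.pop()
            self.documents[change.doc_id][change.attr] = change.old


store = JournaledStore({1: {"title": "scan_001", "correspondent": "ACME"}})

# Dry run: see what would change without touching the document.
preview = store.update(1, "title", "Invoice 2024-12", dry_run=True)
print(preview.old, "->", preview.new)

# Real run, followed by a full revert.
store.update(1, "title", "Invoice 2024-12")
store.update(1, "correspondent", "Acme Corp")
store.revert_all()
print(store.documents[1])
```

A real implementation would have to persist the journal and also handle attributes that did not exist before the change; this sketch simply restores the previous value.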

Perhaps I'm mixing things up because there is a playground option, but I'm hesitant to use it, because at the same time paperless-ai is running in the background and could mix things up if my settings are not where they should be or if bugs are still around.

@clusterzx clusterzx added the done Issue closed with success label Jan 10, 2025

5 participants