
[Feature Request] Only use Existing Correspondents #37

Closed
CreekDuzz opened this issue Jan 6, 2025 · 5 comments

CreekDuzz commented Jan 6, 2025

Is your feature request related to a problem? Please describe.
If the AI is free to pick the correspondent, it sometimes produces small variations of the correspondent's name, e.g. "Apple" vs. "Apple Inc". This makes grouping the documents later much more difficult.

Describe the solution you'd like
As with the existing tags function, limit the AI to using and considering only the existing correspondents. If no matching correspondent can be found, offer an option to add the name to the title instead.
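A minimal sketch of the requested behavior, assuming the AI's suggested name is post-processed rather than constrained inside the model. The helper names (`normalize`, `match_correspondent`, `apply_suggestion`) and the `LEGAL_SUFFIXES` set are hypothetical illustrations, not part of Paperless-AI. The sketch only catches suffix-style variations such as "Apple" vs. "Apple Inc"; it will not match longer variants like "Amazon EU SARL, German branch".

```python
from __future__ import annotations

import re

# Hypothetical set of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "ag", "sarl", "corp", "co"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and strip trailing legal-form suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def match_correspondent(suggested: str, existing: list[str]) -> str | None:
    """Return the existing correspondent equal to `suggested` after normalization, else None."""
    target = normalize(suggested)
    for name in existing:
        if normalize(name) == target:
            return name
    return None


def apply_suggestion(title: str, suggested: str, existing: list[str]) -> tuple[str, str | None]:
    """Use an existing correspondent if one matches; otherwise append the name to the title."""
    match = match_correspondent(suggested, existing)
    if match is not None:
        return title, match
    return f"{title} ({suggested})", None
```

For example, `apply_suggestion("Invoice", "Apple Inc.", ["Apple", "Amazon"])` keeps the title and assigns the existing "Apple", while an unknown sender like "Contoso Ltd" leaves the correspondent empty and yields the title "Invoice (Contoso Ltd)".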

Describe alternatives you've considered
I have tested sending the full list of correspondents with the prompt. It seems to help, but the AI does not always return an exact match.

@clusterzx
Owner

I will implement this by the weekend, including existing tags.

But bear in mind:
If you have thousands of correspondents and thousands of tags, the cost will explode, because the full lists have to be sent along with every prompt.

@CreekDuzz
Author

Yes, you are right, but I expect most setups will not reach those numbers. I have about 2,700 documents collected since 2004 and 90 correspondents. Admittedly, I did not include some of the one-off correspondents.

Nevertheless, I would choose higher-quality output over a slightly higher cost.

Thank you!


thorschtn commented Jan 6, 2025

@CreekDuzz

You can already significantly improve the generation of correspondents by expanding the prompt:

When generating the correspondent, always create the shortest possible form of the company name (e.g. "Amazon" instead of "Amazon EU SARL, German branch", [add further examples]).

This already improves the accuracy immensely.

Alternatively, you can specify the complete list of all allowed correspondents in the prompt.
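Both tips can be combined into one prompt section. The sketch below is a hypothetical helper (`build_prompt` and the `NONE` reply convention are assumptions, not Paperless-AI features) that adds the shortest-name instruction with examples and then the full list of allowed correspondents.

```python
def build_prompt(allowed: list[str], examples: dict[str, str]) -> str:
    """Build a prompt section restricting the AI to a fixed list of correspondents."""
    lines = [
        "When generating the correspondent, always create the shortest possible "
        "form of the company name.",
    ]
    # Each example maps a long legal name to its preferred short form.
    for long_name, short_name in examples.items():
        lines.append(f'Example: use "{short_name}" instead of "{long_name}".')
    # Hypothetical convention: the model answers NONE when nothing fits.
    lines.append("Choose the correspondent ONLY from this list; if none fits, answer NONE:")
    lines.extend(f"- {name}" for name in sorted(allowed))
    return "\n".join(lines)


prompt = build_prompt(["Apple", "Amazon"], {"Amazon EU SARL, German branch": "Amazon"})
print(prompt)
```

Because every allowed name is sent with every document, the prompt grows linearly with the number of correspondents, which is the cost concern raised by the owner above.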

@clusterzx
Owner

That's a great tip, @thorschtn!

clusterzx changed the title from "Only use Existing Correspondents" to "[Feature Request] Only use Existing Correspondents" on Jan 6, 2025

Lumpybd commented Jan 7, 2025

I'll second this. My list of correspondents is very carefully curated. Letting OpenAI loose on creating new correspondents has created quite a mess, with duplicates and similar problems. I'd prefer to be able to restrict Paperless AI to selecting only from existing correspondents when populating this field. Perhaps allow adding new or suggested correspondents when processing files manually?

clusterzx added a commit that referenced this issue Jan 8, 2025
Addressing Fixes and new Features:

Fixes:
#66
#61
#58
#55
#53
#45
#59
#52
#49
#31
#37
#52

Added:
- Big New Feature: Playground
	- Try your prompts on your documents and see how they perform. In Playground no data will be updated in Paperless.
- Added Code and Markdown interpretation in Chat Mode.
- Chat Mode now works with Ollama
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants