Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize srv broker and executor logic #630

Merged
merged 6 commits into from
May 7, 2024
Merged

Conversation

fkzhao
Copy link
Contributor

@fkzhao fkzhao commented Apr 30, 2024

What problem does this PR solve?

Optimize task broker and executor for reduce memory usage and deployment complexity.

Type of change

  • Performance Improvement
  • Refactoring

Change Log

  • Enhance redis utils for message queue(use stream)
  • Modify task broker logic via message queue (1.get parse event from message queue 2.use ThreadPoolExecutor async executor )
  • Modify the table column name of document and task (process_duation -> process_duration maybe just a spelling mistake)
  • Reformat some code style(just what i see)
  • Add requirement_dev.txt for developer
  • Add redis container on docker compose

@KevinHuSh
Copy link
Collaborator

KevinHuSh commented May 1, 2024

Supper fantastic! Appreciate for your contribution!
A concern: if the name 'process_duation' is refined, how to update it to the existing users easily and smoothly, who have already deployed locally?
Do you have any opionion?

@KevinHuSh KevinHuSh requested a review from cike8899 May 1, 2024 00:31
@fkzhao
Copy link
Contributor Author

fkzhao commented May 1, 2024

Supper fantastic! Appreciate for your contribution! A concern: if the name 'process_duation' is refined, how to update it to the existing users easily and smoothly, who have already deployed locally? Do you have any opionion?

I think the best solution is manually change column of existing table.
If the change merge to main, I will open an issue to record how to fix it.
SQL:
ALTER TABLE rag_flow.task CHANGE COLUMN process_duation process_duration FLOAT;;
ALTER TABLE rag_flow.document CHANGE COLUMN process_duation process_duration FLOAT

@KevinHuSh
Copy link
Collaborator

Supper fantastic! Appreciate for your contribution! A concern: if the name 'process_duation' is refined, how to update it to the existing users easily and smoothly, who have already deployed locally? Do you have any opionion?

I think the best solution is manually change column of existing table. If the change merge to main, I will open an issue to record how to fix it. SQL: ALTER TABLE rag_flow.task CHANGE COLUMN process_duation process_duration FLOAT;; ALTER TABLE rag_flow.document CHANGE COLUMN process_duation process_duration FLOAT

What about letting it be. There're bunch of users or developers who do not even know how to execute SQL.
How about letting it be.

tm = findMaxTm(tm_fnm)
rows = collect(tm)
if len(rows) == 0:
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As designed, broker is used for dispatching task and updating progress, and executor is for processing task including embedding since we can deploy one executor or multiple executors in different machines to enlarge the document processing through output.
I suggest that:

  1. remove broker, and put the progress updating function to ragflow_server using multi-thread.
  2. the executor reads tasks from redis queue and execute them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, I'm doing this suggest.
The goal is to make the design simpler.

@yingfeng
Copy link
Member

yingfeng commented May 6, 2024

Hi, adopting redis as a MQ is a good alternative compared with current MySQL. However, redis does not guarantee the durability of data which might lead to task loss. A better solution is to introduce some dedicated MQ system such as redpanda, however it will introduce extra maintainence overhead. We will consider the tradeoff among maintainability, availability, as well as performance. Thanks for your contribution.

What problem does this PR solve?

Optimize task broker and executor for reduce memory usage and deployment complexity.

Type of change

* [ ]  Performance Improvement

* [ ]  Some code optimize

Change Log

* Enhance redis utils for message queue(use stream)

* Modify task broker logic via message queue (1.get parse event from message queue 2.use ThreadPoolExecutor async executor )

* Modify the table column name of document and task  (process_duation -> process_duration maybe just a spelling mistake)

* Reformat some code style(just what i see)

* Add requirement_dev.txt for developer

* Add redis container on docker compose

@fkzhao
Copy link
Contributor Author

fkzhao commented May 6, 2024

Supper fantastic! Appreciate for your contribution! A concern: if the name 'process_duation' is refined, how to update it to the existing users easily and smoothly, who have already deployed locally? Do you have any opionion?

I think the best solution is manually change column of existing table. If the change merge to main, I will open an issue to record how to fix it. SQL: ALTER TABLE rag_flow.task CHANGE COLUMN process_duation process_duration FLOAT;; ALTER TABLE rag_flow.document CHANGE COLUMN process_duation process_duration FLOAT

What about letting it be. There're bunch of users or developers who do not even know how to execute SQL. How about letting it be.

Got it, I'm going to roll back this change.
After thinking about it, I don’t think this is a very important change.

@KevinHuSh KevinHuSh merged commit de839fc into infiniflow:main May 7, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants