clp-package: Add handling for IR extraction jobs to the query scheduler and workers. #460

Merged
merged 30 commits into from
Jun 28, 2024


haiqi96
Contributor

@haiqi96 haiqi96 commented Jun 24, 2024

Description

This PR adds a new ExtractIR job and supports it in the CLP package. The change prepares for LogViewer support.
The IR extraction job takes three arguments: original_file_id, msg_ix, and ir_target_size. The query scheduler uses the first two arguments to find the file split to decompress, then creates a single task that extracts the IR into a local path specified in the clp-config.
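
The lookup described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `FileSplit` record, its `begin_msg_ix` and `num_messages` fields, and the function names are hypothetical, and only `original_file_id`, `msg_ix`, and `ir_target_size` come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class ExtractIrJobConfig:
    # The three job arguments named in the PR description.
    original_file_id: str
    msg_ix: int
    ir_target_size: int


@dataclass
class FileSplit:
    # Hypothetical metadata record; the real schema is not shown in this PR.
    file_split_id: str
    original_file_id: str
    begin_msg_ix: int
    num_messages: int


def find_file_split(
    job: ExtractIrJobConfig, splits: List[FileSplit]
) -> Optional[FileSplit]:
    """Return the split of the original file that contains job.msg_ix, if any."""
    for split in splits:
        if split.original_file_id != job.original_file_id:
            continue
        end_msg_ix = split.begin_msg_ix + split.num_messages
        if split.begin_msg_ix <= job.msg_ix < end_msg_ix:
            return split
    return None


def create_extract_ir_task(
    job: ExtractIrJobConfig, splits: List[FileSplit]
) -> Optional[Dict[str, Union[str, int]]]:
    """Build the single task for the job, or None if no split matches."""
    split = find_file_split(job, splits)
    if split is None:
        return None
    return {
        "file_split_id": split.file_split_id,
        "ir_target_size": job.ir_target_size,
    }
```

In this sketch a job whose msg_ix falls outside every known split produces no task, so the scheduler could mark such a job as failed without dispatching anything to a worker. Whether the actual scheduler behaves that way is an assumption.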

The PR makes the following changes:

  1. Adds a new job type, ExtractIR, and supports it in the query scheduler.
  2. Refactors the query scheduler so that IR extraction jobs and search jobs share common code.
  3. Adds a new worker task, extract_ir_task, and refactors the search task so that it shares common code with the new task.
  4. Updates start-clp.py so that different types of workers have their own mounts and environment.
  5. Adds IR extraction-related configs to the clp-config.

Validation performed

  • Submitted an IR extraction job with a helper script and confirmed that the job finishes successfully and an IR is extracted.
  • Submitted search and aggregation jobs via both the command line and the web UI and confirmed that there were no errors.

@haiqi96 haiqi96 changed the title from "Finalize extraction job" to "clp-package: support IR extraction job" Jun 25, 2024
@haiqi96 haiqi96 marked this pull request as ready for review June 25, 2024 21:32
@haiqi96 haiqi96 force-pushed the finalize_extraction_job branch from 3b8f72c to fe36599 Compare June 26, 2024 20:09
Member

@kirkrodrigues kirkrodrigues left a comment

For the PR title, how about:

clp-package: Add handling for IR extraction jobs to the query scheduler and workers.

@haiqi96 haiqi96 changed the title from "clp-package: support IR extraction job" to "clp-package: Add handling for IR extraction jobs to the query scheduler and workers." Jun 28, 2024
@haiqi96 haiqi96 merged commit 9ba0451 into y-scope:main Jun 28, 2024
4 checks passed
haiqi96 added a commit that referenced this pull request Jul 15, 2024
- Write search results to collection named job_id rather than task_id.
- Convert int to str in IR extraction command generation.
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
- Write search results to collection named job_id rather than task_id.
- Convert int to str in IR extraction command generation.
@haiqi96 haiqi96 deleted the finalize_extraction_job branch December 6, 2024 20:33
2 participants