Fix task creation with `gt_pool` validation and cloud storage data #8539

Marishka17 · 2024-10-14T10:45:03Z

Motivation and context

How has this been tested?

Added corresponding REST API test

Checklist

- [x] I submit my changes into the `develop` branch
- [x] I have created a changelog fragment
- ~~[ ] I have updated the documentation accordingly~~
- [x] I have added tests to cover my changes
- [x] I have linked related issues (see GitHub docs)
- [ ] I have increased versions of npm packages if it is necessary (cvat-canvas, cvat-core, cvat-data and cvat-ui)

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

New Features
- Enhanced task creation process with improved validation for segment sizes and parameters.
- New method for reordering images in the manifest based on user-defined lists.
Bug Fixes
- Improved handling of manifest files and error handling for invalid configurations during task creation.
Tests
- Expanded test suite for task management, focusing on validation frames and cloud data integration.
Chores
- Updates to function signatures and type annotations for better clarity and maintainability.

coderabbitai · 2024-10-14T10:45:11Z

Walkthrough

The changes primarily focus on enhancing task creation and validation processes within the CVAT application. Modifications include new validation steps for task parameters, improvements in manifest file handling, and updates to segment creation logic. The testing suite has also been enhanced with new test cases and improved error handling, ensuring comprehensive coverage of task management features. Additionally, function signatures and type annotations have been refined across several files for better clarity and maintainability.

Changes

| File | Change Summary |
|------|----------------|
| `cvat/apps/engine/task.py` | - Added validation for `validation_params`.<br>- Updated `_create_thread` with checks for `job_file_mapping`.<br>- Refined manifest handling functions.<br>- Improved logic for filtering server files and downloading from cloud.<br>- Adjusted segment creation process for validation frames.<br>- Updated validation layout for ground truth tasks. |
| `tests/python/rest_api/test_tasks.py` | - Added new test cases for validation frames and cloud data.<br>- Enhanced annotation handling checks.<br>- Improved error handling tests.<br>- Refactored tests for clarity.<br>- Added utility functions for task management. |
| `tests/python/shared/utils/helpers.py` | - Updated `generate_image_files` function signature and logic. |
| `utils/dataset_manifest/core.py` | - Introduced `reorder` method in `ImageManifestManager` for image reordering. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TaskManager
    participant CloudStorage
    participant ManifestHandler

    User->>TaskManager: Create Task
    TaskManager->>TaskManager: Validate Parameters
    TaskManager->>CloudStorage: Check Job File Mapping
    CloudStorage-->>TaskManager: Return Mapping
    TaskManager->>ManifestHandler: Handle Manifest Files
    ManifestHandler-->>TaskManager: Return Processed Manifest
    TaskManager->>User: Task Created Successfully

🐰 In fields so wide and bright,
New tasks take flight, oh what a sight!
With validation strong and true,
Cloud data dances, fresh and new.
So hop along, let's celebrate,
For every change, we elevate! 🌼


coderabbitai

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)

utils/dataset_manifest/core.py (2)
730-744: New method reorder looks good, but could benefit from some improvements.

The new reorder method in the ImageManifestManager class is a good addition for reordering images based on a provided list. However, there are a few points to consider:

- The method handles potential duplicates in the reordered list, which is good.
- Error handling is implemented for missing images, which is appropriate.

Suggestions for improvement:

- Consider adding type hints for better code readability and maintainability.
- The method could benefit from some inline comments explaining the logic, especially for handling duplicates.
- Consider adding a check for an empty `reordered_images` list to avoid unnecessary processing.

Here's a suggested improvement with type hints and comments:
     def reorder(self, reordered_images: List[str]) -> None:
         """
         The method takes a list of image names and reorders its content based on this new list.
         Due to the implementation of Honeypots, the reordered list of image names may contain duplicates.
         """
+        if not reordered_images:
+            raise ValueError("The reordered_images list cannot be empty")
+
+        # Create a dictionary of unique images from the current manifest
         unique_images: Dict[str, Any] = {}
         for _, image_details in self:
             if image_details.full_name not in unique_images:
                 unique_images[image_details.full_name] = image_details

         try:
+            # Create a new manifest with the reordered images, handling potential duplicates
             self.create(content=(unique_images[x] for x in reordered_images))
         except KeyError as ex:
             raise InvalidManifestError(f"Previous manifest does not contain {ex} image")
These changes improve the method's robustness and readability.
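
The reordering behaviour discussed in this comment can be tried outside CVAT with a minimal standalone sketch. It is not the project's implementation: `reorder_entries` and `ManifestError` are hypothetical names, and manifest entries are assumed to be plain dicts with a `name` key instead of real manifest records.

```python
from typing import Any, Dict, Iterable, List


class ManifestError(Exception):
    """Hypothetical stand-in for the project's InvalidManifestError."""


def reorder_entries(
    entries: Iterable[Dict[str, Any]], new_order: List[str]
) -> List[Dict[str, Any]]:
    """Return entries arranged as in new_order; names may repeat."""
    # Keep only the first occurrence of every image name.
    unique: Dict[str, Dict[str, Any]] = {}
    for entry in entries:
        unique.setdefault(entry["name"], entry)

    try:
        # A name listed several times (e.g. a honeypot frame) is emitted
        # several times, always pointing at the same entry.
        return [unique[name] for name in new_order]
    except KeyError as ex:
        raise ManifestError(f"Previous manifest does not contain {ex} image") from ex


# Illustrative data only.
entries = [{"name": "a.jpg", "width": 10}, {"name": "b.jpg", "width": 20}]
result = reorder_entries(entries, ["b.jpg", "a.jpg", "b.jpg"])
```

In this sketch a lookup table keyed by image name is what lets duplicates in the new order work: each repeated name resolves to the same stored entry, and an unknown name surfaces as a single domain-specific error.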

Line range hint 1-744: Consider refactoring for improved modularity and testability.

While the new reorder method is the only change in this file, there are some general observations and suggestions for the overall codebase:

- The file is quite long and contains multiple classes and functions. Consider splitting it into smaller, more focused modules for better maintainability.
- Some classes, like `_ManifestManager`, `VideoManifestManager`, and `ImageManifestManager`, could benefit from more extensive use of type hints throughout their methods.
- The error handling could be more consistent across the file. Consider creating custom exception classes for different types of errors that can occur during manifest operations.
- Some methods, like `create` in `ImageManifestManager`, are quite long and could potentially be split into smaller, more focused methods.
- Consider adding more unit tests, especially for edge cases and error conditions, to ensure the robustness of the manifest management system.

To improve the overall structure, consider the following refactoring steps:

- Split the file into separate modules, e.g., `base_manifest.py`, `video_manifest.py`, `image_manifest.py`, and `validators.py`.
- Create a custom exceptions module, e.g., `manifest_exceptions.py`, to centralize all custom exceptions.
- Implement more comprehensive logging throughout the classes to aid in debugging and monitoring.
- Review and update the docstrings for all classes and methods to ensure they are up-to-date and follow a consistent format (e.g., Google style or NumPy style).

These changes would significantly improve the maintainability and readability of the codebase.
tests/python/shared/utils/helpers.py (1)
Line range hint 32-38: Add assertion to ensure sizes length matches count

Currently, there is no assertion to check that when sizes is provided, its length matches count. This could lead to an IndexError if sizes is shorter than count.

Consider adding the following assertion:
 assert not (prefixes and filenames), "prefixes cannot be used together with filenames"
 assert not prefixes or len(prefixes) == count
 assert not filenames or len(filenames) == count
+assert not sizes or len(sizes) == count
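
As a rough illustration of why the extra assertion helps, here is a self-contained sketch. `plan_image_files` is a hypothetical helper, not the real `generate_image_files`; its signature, the `DEFAULT_SIZE` value, and the file naming scheme are assumptions made only for this example.

```python
from typing import List, Optional, Sequence, Tuple

Size = Tuple[int, int]
DEFAULT_SIZE: Size = (50, 50)  # assumed default, for illustration only


def plan_image_files(
    count: int,
    *,
    prefixes: Optional[Sequence[str]] = None,
    filenames: Optional[Sequence[str]] = None,
    sizes: Optional[Sequence[Size]] = None,
) -> List[Tuple[str, Size]]:
    """Return (name, size) pairs for the images that would be generated."""
    assert not (prefixes and filenames), "prefixes cannot be used together with filenames"
    assert not prefixes or len(prefixes) == count
    assert not filenames or len(filenames) == count
    # Without this check, a short `sizes` list fails later with IndexError.
    assert not sizes or len(sizes) == count, "sizes must contain exactly `count` items"

    planned = []
    for i in range(count):
        if filenames:
            name = filenames[i]
        else:
            prefix = prefixes[i] if prefixes else ""
            name = f"{prefix}{i}.jpeg"
        planned.append((name, sizes[i] if sizes else DEFAULT_SIZE))
    return planned


plan = plan_image_files(2, sizes=[(10, 20), (30, 40)])
```

With the assertion in place, a mismatched `sizes` argument is reported at the top of the function with a clear message rather than partway through the loop.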

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f49c1c6 and a3caa9e.

📒 Files selected for processing (4)

cvat/apps/engine/task.py (1 hunks)
tests/python/rest_api/test_tasks.py (7 hunks)
tests/python/shared/utils/helpers.py (3 hunks)
utils/dataset_manifest/core.py (1 hunks)

🧰 Additional context used

🔇 Additional comments (5)

cvat/apps/engine/task.py (1)

1327-1328: Manifest Reordering Implemented Correctly

The manifest is properly updated to reflect the new image order based on the frame index mapping in new_db_images.

tests/python/rest_api/test_tasks.py (4)

35-35: Importing 'Iterable' is appropriate

The addition of Iterable to the imports enhances type hinting and improves code clarity.

Line range hint 1536-1616: Enhancement of _create_task_with_cloud_data with additional parameters

The method _create_task_with_cloud_data now includes org, filenames, task_spec_kwargs, and data_spec_kwargs as parameters. This extension increases the flexibility and reusability of the method by allowing customization of task and data specifications.

1991-1993: Passing data_spec_kwargs and data_type correctly

The inclusion of data_spec_kwargs and data_type when calling _create_task_with_cloud_data ensures that additional data specifications and data types are accurately passed, enhancing the method's versatility.

2558-2634: Well-implemented test for task creation with validation and cloud data

The test_can_create_task_with_validation_and_cloud_data method effectively tests task creation with different validation modes and cloud storage data. The assertions verify that the GT job is created and that job metadata corresponds to the chunk data.

codecov-commenter · 2024-10-14T11:47:26Z

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.27%. Comparing base (c557f70) to head (6f1a980).

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8539   +/-   ##
========================================
  Coverage    74.26%   74.27%           
========================================
  Files          400      400           
  Lines        43218    43222    +4     
  Branches      3909     3909           
========================================
+ Hits         32096    32103    +7     
+ Misses       11122    11119    -3

| Components | Coverage Δ | |
|------------|------------|---|
| cvat-ui | `78.73% <ø> (+0.01%)` | ⬆️ |
| cvat-server | `70.47% <80.00%> (+<0.01%)` | ⬆️ |

bsekachev

LGTM

sonarcloud · 2024-10-16T13:11:46Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

Marishka17 · 2024-10-16T14:25:39Z

The failed checks (License Compliance, Security Analysis) are not related to this PR, so I'll merge it.

Marishka17 added 2 commits October 14, 2024 12:28

Fix task creation with gt_pool validation and CS data

ce15376

Add REST API test

a3caa9e

Marishka17 requested a review from azhavoro as a code owner October 14, 2024 10:45

coderabbitai bot reviewed Oct 14, 2024

Do not recreate the same manifest when creating a task from a backup

55d53e0

Merge branch 'develop' into mk/fix_task_creation_with_gt_pool_and_cs_…

c3ff053

…data

Marishka17 requested review from Bobronium and bsekachev and removed request for azhavoro October 14, 2024 11:52

Bobronium approved these changes Oct 15, 2024

bsekachev approved these changes Oct 16, 2024

Add changelog fragment

c46fa21

Marishka17 requested a review from nmanovic as a code owner October 16, 2024 12:29

Merge branch 'develop' into mk/fix_task_creation_with_gt_pool_and_cs_…

6f1a980

…data

Marishka17 merged commit 49ec1d1 into develop Oct 16, 2024
34 of 36 checks passed

Marishka17 deleted the mk/fix_task_creation_with_gt_pool_and_cs_data branch October 16, 2024 14:26

cvat-bot bot mentioned this pull request Oct 18, 2024

Release v2.21.1 #8559

Merged
