-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow lists wherever possible in builtin operators #5241
base: develop
Are you sure you want to change the base?
Conversation
WalkthroughThis pull request introduces enhancements to multiple components of the FiftyOne library. The changes focus on improving input handling, field management, and aggregation processes across different modules. Key modifications include updating the list field handling in dataset summary creation, adding a utility function to manage nested paths, and refactoring operator input validation to support more flexible field operations with enhanced error handling. Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
fiftyone/operators/builtin.py
Outdated
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
fiftyone/operators/builtin.py
Outdated
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
fiftyone/operators/builtin.py
Outdated
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
fiftyone/operators/builtin.py
Outdated
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
fiftyone/operators/builtin.py
Outdated
ctx.dataset.drop_index(index_name) | ||
index_names = _to_string_list(ctx.params.get("index_names", [])) | ||
|
||
index_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
fiftyone/operators/builtin.py
Outdated
|
||
field_names = _to_string_list(ctx.params.get("field_names", [])) | ||
|
||
field_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
fiftyone/operators/builtin.py
Outdated
|
||
names = _to_string_list(ctx.params.get("names", [])) | ||
|
||
saved_view_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
fiftyone/operators/builtin.py
Outdated
|
||
names = _to_string_list(ctx.params.get("names", [])) | ||
|
||
workspace_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
290cc96
to
6082a67
Compare
fiftyone/operators/builtin.py
Outdated
|
||
names = _to_string_list(ctx.params.get("names", [])) | ||
|
||
slice_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
fiftyone/operators/builtin.py (5)
250-264
: Refactor repeated field validation logic into a helper functionThe validation code for checking if a field exists in the schema and handling error messages is duplicated across multiple functions (
_clone_sample_field_inputs
,_clone_frame_field_inputs
,_rename_sample_field_inputs
,_rename_frame_field_inputs
). To adhere to the DRY (Don't Repeat Yourself) principle and improve maintainability, consider refactoring this logic into a reusable helper function.Also applies to: 368-382, 445-459, 537-551
650-650
: Simplify loop by iterating directly over the dictionaryYou can iterate directly over the dictionary
schema
instead of usingschema.keys()
. This is more Pythonic and efficient.Apply this change:
- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.0)
650-650: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
758-758
: Simplify loop by iterating directly over the dictionarySame improvement as in line 650; iterate directly over
schema
for simplicity.Apply this change:
- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.0)
758-758: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
910-910
: Simplify loop by iterating directly over the dictionaryConsistently iterate directly over
schema
without callingkeys()
.Apply this change:
- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.0)
910-910: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
985-985
: Simplify loop by iterating directly over the dictionaryAdjust the loop to iterate over
schema
directly for cleaner code.Apply this change:
- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.0)
985-985: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
fiftyone/core/odm/mixins.py (1)
1872-1878
: Add a docstring to_remove_nested_paths
functionAdding a docstring to the
_remove_nested_paths(paths)
function will enhance code readability by explaining its purpose and usage. This is helpful for future maintainability and for other developers who may work with the code.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (3)
fiftyone/core/dataset.py
(1 hunks)fiftyone/core/odm/mixins.py
(4 hunks)fiftyone/operators/builtin.py
(21 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
fiftyone/operators/builtin.py
650-650: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
758-758: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
910-910: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
985-985: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (2)
fiftyone/operators/builtin.py (1)
656-656
: Use types.String()
instead of types.OneOf([...])
Consider using types.String()
directly instead of types.OneOf([types.Object(), types.String()])
for the input type. This simplifies the code and removes the need for _to_string_list()
usage.
Also applies to: 764-764, 916-916, 991-991
fiftyone/core/dataset.py (1)
1971-1972
: LGTM! This fixes list field handling in the aggregation pipeline.
The change correctly iterates over all list fields to unwind them in the pipeline, rather than only handling the first list field. This ensures proper population of summary fields when multiple list fields are present.
if list_fields: | ||
pipeline.append({"$unwind": "$" + list_fields[0]}) | ||
for list_field in list_fields: | ||
pipeline.append({"$unwind": "$" + list_field}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would quickly explode the number of docs in the aggregation and doesn't scale. Would it be better to just have separate parallel aggregations for each list item or is there a situation in which you need to calculate every combination of a list field against every combination of another list field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation was just wrong. Generating a summary field for something like ground_truth.detections.tags
that contains doubly-nested lists needs to generate a ["list", "of", "tags"]
for each sample. Currently it is returning [["list"], ["of"], ["tags"]]
.
6082a67
to
b59bcee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
fiftyone/core/odm/mixins.py (1)
1872-1878
: Add docstring and type hints to improve code clarity.The function implementation is correct, but would benefit from documentation:
-def _remove_nested_paths(paths): +def _remove_nested_paths(paths: list[str]) -> list[str]: + """Remove any paths that are nested within other paths in the list. + + Args: + paths: List of field paths + + Returns: + Filtered list containing only top-level paths + + Example: + >>> _remove_nested_paths(['a', 'a.b', 'c']) + ['a', 'c'] + """ return [ path for path in paths if not any(path.startswith(p + ".") for p in paths) ]fiftyone/operators/builtin.py (3)
2442-2447
: Add docstring and type hints to improve code clarity.The function implementation is correct, but would benefit from documentation:
-def _to_string_list(values): +def _to_string_list(values: list[Union[dict, str]]) -> list[str]: + """Convert a list of values to a list of strings. + + Args: + values: List of values that can be strings or dicts with 'value' key + + Returns: + List of strings extracted from the input values + + Example: + >>> _to_string_list([{'value': 'a'}, 'b']) + ['a', 'b'] + """ if not values: return [] return [d["value"] if isinstance(d, dict) else d for d in values]
650-650
: Optimize dictionary key lookup operations.Use
key in dict
instead ofkey in dict.keys()
for better performance:- for key in schema.keys(): + for key in schema:Also applies to: 758-758, 910-910, 985-985
🧰 Tools
🪛 Ruff (0.8.2)
650-650: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
1822-1823
: Track and fix the App bug affecting view setting.The TODO comment indicates a known issue that prevents proper view setting functionality.
Would you like me to help create a GitHub issue to track this bug? I can include:
- Description of the bug
- Impact on functionality
- Suggested investigation steps
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
fiftyone/core/dataset.py
(1 hunks)fiftyone/core/odm/mixins.py
(4 hunks)fiftyone/operators/builtin.py
(21 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fiftyone/core/dataset.py
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/operators/builtin.py
650-650: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
758-758: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
910-910: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
985-985: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (2)
fiftyone/core/odm/mixins.py (2)
673-673
: LGTM! Correctly filters nested paths before field processing.
The integration of _remove_nested_paths
ensures efficient field clearing by preventing redundant operations on nested fields.
721-721
: LGTM! Consistent implementation across field operations.
The _remove_nested_paths
function is correctly integrated in both methods, maintaining consistent behavior for nested field handling.
Also applies to: 826-826
b59bcee
to
c8355bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
plugins/operators/__init__.py (9)
268-268
: Clarify the purpose of the new field
The description for"new_field_name"
is clear. Just ensure that the user-facing label or any documentation references the reason for creating this new field for better clarity.
365-379
: Reuse existing validation approach
This section mirrors the clone logic for frame fields. Reusing or unifying the validation approach for sample vs. frame fields might reduce duplication and simplify long-term maintenance.
611-626
: Parameterizing the clear operation target
The radio group design is consistent with other operators, letting users quickly choose which subset of samples to modify. However, consider extracting the repeated “target” logic (dataset/current view/selected samples) into a shared helper to avoid duplicative code across operators.
644-650
: Avoid.keys()
usage
Usefor key in schema:
instead offor key in schema.keys():
to adhere to Python best practices and pass static analysis.- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.2)
647-647: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
692-697
: Frame fields clearing
Mirrors the sample fields clearing logic, ensuring frame-level data can also be cleared in bulk. Consider factoring out common patterns for maintainability.
719-734
: Radio button grouping for frame clearing
Similar design to sample clearing. The code is consistent but duplicative across sample and frame operators; recommended to unify shared logic.
752-758
: Inline keys iteration
Similarly, usefor key in schema:
to remove.keys()
.- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.2)
755-755: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
904-910
: Avoid.keys()
usage
Minor performance/readability nitpick: remove.keys()
.- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.2)
907-907: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
979-985
: Iterating schema keys
As with prior spots, remove.keys()
to meet static analysis guidance.- for key in schema.keys(): + for key in schema:🧰 Tools
🪛 Ruff (0.8.2)
982-982: Use
key in dict
instead ofkey in dict.keys()
Remove
.keys()
(SIM118)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
fiftyone/core/dataset.py
(1 hunks)fiftyone/core/odm/mixins.py
(4 hunks)plugins/operators/__init__.py
(21 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fiftyone/core/dataset.py
🧰 Additional context used
🪛 Ruff (0.8.2)
plugins/operators/__init__.py
647-647: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
755-755: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
907-907: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
982-982: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (26)
plugins/operators/__init__.py (22)
247-261
: Validate field existence before cloning
The code correctly checks whether the specified field name exists in the schema and marks the input as invalid if not found. This ensures that users receive immediate feedback on typos or non-existent fields.
442-456
: Read-only field rename checks
Great job restricting renaming of read-only fields. This prevents unintended modifications of critical data. Ensure that any attempts to rename system fields (like _id
) are also blocked if they appear.
593-598
: Multiple sample fields clearing
Allowing multiple fields to be cleared at once provides significant flexibility. The code properly retrieves them via _to_string_list
and applies the operation.
651-657
: List input usage
Replacing traditional single-string fields with inputs.list(...)
for multiple field deletion is well-structured. It aligns with the PR objective to “Allow lists wherever possible.”
660-664
: Early return for invalid fields
Returning immediately when a field is invalid or does not exist is a good UX practice. This short-circuits further processing and provides a clear error message.
666-670
: Respect read-only fields
Rejecting attempts to clear read-only fields maintains data integrity. This is consistent with earlier constraints preventing edits to read-only fields.
759-765
: Accept multiple frame fields
Switching to inputs.list(...)
to accept multiple frame fields for clearing is consistent with the multi-field approach.
768-782
: Validate frame fields
These checks parallel the sample field logic, ensuring non-existent and read-only frame fields are handled properly.
886-888
: Delete multiple sample fields
Good move making “DeleteSampleField” accept multiple fields in a single request. This helps streamline user workflow.
911-917
: List of fields to delete
Leveraging inputs.list(...)
again. This consistent approach aligns well with your multi-field design.
920-930
: Immediate returns for invalid or read-only fields
Clearly marks invalid fields or read-only fields, preventing any partial or half-complete operations.
952-954
: Delete multiple frame fields
Matching sample-level deletion logic. The uniform approach across both sample and frame ensures consistent user experience.
986-992
: Frame field(s) to delete
Using “multiple=True” for the autocomplete is user-friendly, matching the multi-field paradigm.
995-1009
: User feedback on invalid or read-only frame fields
Comprehensive checks are in place to prevent accidental deletions of important fields.
1124-1176
: DropIndex operator
The use of _drop_index_inputs
and the acceptance of multiple index names parallels the multi-field approach. Good consistency.
1452-1500
: DeleteSummaryField: handling multiple fields
Clear approach to removing multiple summary fields at once. The thorough validation ensures no partial or unexpected deletions.
1639-1653
: DeleteGroupSlice approach
Allowing multiple group slices to be deleted helps unify the user experience for multi-item operations. The reversion to the default slice if the current slice is removed is a useful fallback.
1659-1693
: _delete_group_slice_inputs
Mirrors other multi-delete patterns, verifying that group slices exist and returning early if they do not. Good user feedback.
1929-1941
: DeleteSavedView operator
Adopting the multi-delete pattern here as well. This is consistent with the overall PR objective of allowing lists for operated items.
1942-1974
: _delete_saved_view_inputs logic
The usage of auto-complete with multiple selection is consistent across the various multi-delete features. Well done.
2224-2276
: DeleteWorkspace operator
Again, the multi-workspace delete approach is consistent. Notably, reverting to a default workspace if the current workspace is removed is very user-friendly.
2439-2445
: _to_string_list utility
Converts dict-encoded input or raw strings to lists, simplifying the repeated pattern of multi-field inputs. This is a neat approach to unify input types.
fiftyone/core/odm/mixins.py (4)
673-674
: Remove nested paths for cleared fields
Excluding nested paths avoids partial or conflicting operations on subpaths. This step ensures integrity when clearing high-level fields.
721-722
: Delete fields safely
Ensuring nested paths are stripped out before deletion helps avoid confusion or partial field states.
826-827
: Remove dynamic fields correctly
Filtering nested subfields first keeps the schema consistent and avoids partial embedded field definitions.
1872-1878
: _remove_nested_paths
This is a concise and effective helper. One consideration: if the input list is large, repeatedly calling path.startswith(p + ".")
in a double loop may be costly, but likely not an immediate concern for typical usage.
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
|
||
field_prop = inputs.list( | ||
"field_names", | ||
types.OneOf([types.Object(), types.String()]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this just types.String()
and remove _to_string_list()
usage
ctx.dataset.drop_index(index_name) | ||
index_names = _to_string_list(ctx.params.get("index_names", [])) | ||
|
||
index_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
|
||
field_names = _to_string_list(ctx.params.get("field_names", [])) | ||
|
||
field_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
|
||
names = _to_string_list(ctx.params.get("names", [])) | ||
|
||
saved_view_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
|
||
names = _to_string_list(ctx.params.get("names", [])) | ||
|
||
workspace_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
|
||
names = _to_string_list(ctx.params.get("names", [])) | ||
|
||
slice_selector = types.AutocompleteView(multiple=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally can make this DropdownView(multiple=True)
to avoid manually enforcing valid values below
Updates the builtin operators to use
list()
to allow users to act on multiple fields/entities wherever it makes sense.Summary by CodeRabbit
New Features
Bug Fixes
Improvements