-
Notifications
You must be signed in to change notification settings - Fork 587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix copied doc updates not insert #4729
Conversation
WalkthroughThe changes involve significant modifications to the document handling functionalities in the FiftyOne library. A new method, Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Document
participant Dataset
User->>Dataset: Clone Document
Dataset->>Document: Call copy(new_id=True)
Document->>Document: Create new document instance
Document->>Document: Generate new ObjectId
Document->>Document: Set _created attribute to True
Document-->>Dataset: Return new document
Dataset-->>User: Provide cloned document
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (3)
- fiftyone/core/dataset.py (3 hunks)
- fiftyone/core/odm/document.py (2 hunks)
- tests/unittests/odm_tests.py (2 hunks)
Additional comments not posted (6)
tests/unittests/odm_tests.py (1)
35-56
: LGTM!The test function is well-structured and covers the necessary test cases for the
copy_with_new_id
method.The code changes are approved.
fiftyone/core/odm/document.py (1)
587-599
: LGTM!The method is well-implemented and follows the necessary steps to create a new document with a unique ID.
The code changes are approved.
fiftyone/core/dataset.py (4)
7776-7776
: LGTM!The code correctly uses
copy_with_new_id()
to create a new ID for the cloned dataset document.The code changes are approved.
7780-7781
: LGTM!The code correctly assigns the newly cloned dataset document to the variable
dataset_doc
.The code changes are approved.
8252-8253
: LGTM!The code correctly uses
copy_with_new_id()
to create a new ID for the cloned reference document.The code changes are approved.
Line range hint
8257-8264
: LGTM!The code correctly uses
copy_with_new_id()
to create a new ID for the cloned run document and handles copying the GridFS files.The code changes are approved.
Nice find. Odd and opaque behavior. It makes sense to me. Trying to grok things a bit...we still like regular Thinking out loud, what about a |
@swheaton +1 to both of @benjaminpkane's thoughts here:
But, I'm now wondering if we ever use Okay, here's an interesting bit of code when dealing with embedded docs: fiftyone/fiftyone/utils/eval/coco.py Lines 770 to 781 in c9cfc05
The use case here is that we actually want a copy of the embedded docs with the same IDs because we intend to do in-memory computations on them but don't want the stuff we do to be tracked and persisted on the label objects in the database. Annnnd, this reminds me that apparently d = fo.Detection()
assert d.id == d.copy().id # False! And then there's fiftyone/fiftyone/core/document.py Lines 376 to 390 in c9cfc05
|
Somehow I missed the emails of these reviews coming in ... Good ideas, I didn't love
But yes embedded document is different, not sure why exactly. But I don't see any further bugs due to this. |
changed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (3)
- fiftyone/core/dataset.py (3 hunks)
- fiftyone/core/odm/document.py (2 hunks)
- tests/unittests/odm_tests.py (2 hunks)
Files skipped from review as they are similar to previous changes (3)
- fiftyone/core/dataset.py
- fiftyone/core/odm/document.py
- tests/unittests/odm_tests.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (1)
- fiftyone/core/odm/document.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- fiftyone/core/odm/document.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
What changes are proposed in this pull request?
Another weird bug that doesn't cause issues really until to-come changes are introduced.
If you do this code (done 3 times with dataset clone)
Then
mongoengine
sets_created
toFalse
which means it thinks it's an update object not a new one.When you call
save()
it emits an upsert call instead of an insert.OK, not so bad, maybe a little weird but whatever ...
But in #4597 @brimoor proposes an optimization where only changed fields are serialized to get the document. In combo, this causes all kinds of strange behavior. It just so happens to work in that PR because
doc._changed_fields
is uninitialized (a code smell in mongoengine, they have a TODO to clean it up...) and sodoc._delta()
returns the whole doc.But say we cleared changed fields because we didn't know about this strange requirement.
Ok but we don't clear changed_field so we're fine? Nope, balancing on a thread due to another mongoengine weirdness where
_get_changed_fields()
can return something due to embedded documents being editedIn mongoengine code, this appears to be band-aided with this:
which is what @brimoor 's optimization is trying to avoid
How is this patch tested? If it is not, please explain why.
Added test for
copy_with_new_id
.Ensured that cloning dataset uses INSERT methods not UPSERT. Added print in
Document._save()
Previously,
Summary by CodeRabbit
New Features
Bug Fixes
Tests