Fast delete #227
base: master
Conversation
Approved, with a substantial reservation. It isn't really to do with this PR, but it explains why use of this function has been deprecated.
deleted = deleted + 1
batch.commit()
print(f"Deleted {deleted} objects ({first_id} to {last_id})")  # Useful because Firebase get()s sort by id
Is sort by ID guaranteed?
Yes. "By default, a query retrieves all documents that satisfy the query in ascending order by document ID" (https://firebase.google.com/docs/firestore/query-data/order-limit-data)
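One caveat worth noting alongside that quote: document-ID ordering is lexicographic, so numeric-looking IDs don't sort numerically. A tiny illustration in plain Python (no Firestore client needed; the sort behaviour matches Firestore's default string-ID ordering):

```python
# String IDs compare character by character, so "doc10" sorts before "doc2".
# If the dataset's IDs are zero-padded or non-numeric this never matters,
# but it's worth knowing before trusting "first_id to last_id" in the log line.
ids = ["doc10", "doc2", "doc1"]
print(sorted(ids))  # ['doc1', 'doc10', 'doc2']
```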
deleted = deleted + 1
batch.commit()
I think this creates a requirement that batch_size be less than Firebase's per-commit batch limit. We could probably extract that limit into a global constant so callers don't have to worry about it.
What exactly is the suggestion here - removing the batch_size parameter from this function's signature and always using the constant, or keeping the parameter but using the constant at each call site?
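For either variant, the limit itself can live in one place. A minimal sketch of the constant-plus-clamp approach; the names `FIRESTORE_MAX_BATCH_WRITES` and `clamp_batch_size` are hypothetical, not from the PR:

```python
# Firestore allows at most 500 writes per batched commit (documented limit).
# A module-level constant plus a clamp lets _delete_collection keep its
# batch_size parameter while guaranteeing each commit stays under the limit.
FIRESTORE_MAX_BATCH_WRITES = 500

def clamp_batch_size(batch_size):
    """Return a batch size that is safe for a single Firestore commit."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return min(batch_size, FIRESTORE_MAX_BATCH_WRITES)

print(clamp_batch_size(2000))  # 500
print(clamp_batch_size(100))   # 100
```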
@@ -303,10 +303,18 @@ def delete_dataset(dataset_id):
def _delete_collection(coll_ref, batch_size):
I'm not convinced that this method of deleting collections is either safe or cost-effective, because of Firebase's orphaned collection problem. This is the reason we haven't actually been using it, and have used the console instead.
Basically: if the docs in the collection contain subcollections, I think this method will orphan them.
We could either try to understand what the recommended approach is, or we should delete this function from the fcw.
Sadly there is no easy way to recursively delete a document. Firebase's recommendation appears to be to do it in a Cloud Function (which has access to a recursive delete API). Another option is to reimplement the CLI's delete function in Python.
This does work correctly at the moment when operating on valid datasets. It would only break if we uploaded an invalid dataset or if we updated the rest of data tools rather than just here. However, backup and restore suffer the same problems...
Manually deleting 7 segments in Firestore is a pain, as is remembering to delete the segment_count doc.
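Reimplementing the CLI's recursive delete would amount to deleting each document's subcollections before the document itself. A hedged sketch of that traversal; the Firestore client is duck-typed, and `FakeDoc`/`FakeColl` are hypothetical in-memory stand-ins used only to demonstrate the order of operations (a real client would supply `DocumentReference.collections()` and `CollectionReference.list_documents()`):

```python
def delete_document_recursively(doc_ref):
    """Delete a document's subcollections first, then the document itself."""
    for coll in doc_ref.collections():
        delete_collection_recursively(coll)
    doc_ref.delete()

def delete_collection_recursively(coll_ref):
    """Recursively delete every document in a collection."""
    for doc in coll_ref.list_documents():
        delete_document_recursively(doc)

# Hypothetical stand-ins, only to show the traversal order in the sketch.
class FakeDoc:
    def __init__(self, name, colls=()):
        self.name, self._colls, self.deleted = name, list(colls), False
    def collections(self):
        return self._colls
    def delete(self):
        self.deleted = True

class FakeColl:
    def __init__(self, docs):
        self._docs = docs
    def list_documents(self):
        return self._docs

child = FakeDoc("segments/0")
parent = FakeDoc("datasets/d1", colls=[FakeColl([child])])
delete_document_recursively(parent)
print(child.deleted, parent.deleted)  # True True -- children deleted before parent
```

Note that this sketch makes one read per document; for large collections it would still need the batching from this PR, and each batch commit would be subject to the 500-write limit discussed above.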
Review #223 first.