[BUG]: Delete collection resource leak (single-node Chroma) #3297

tazarov · 2024-12-13T16:30:55Z

Description of changes

The delete collection logic changes slightly to accommodate the fix without breaking the transactional integrity of self._sysdb.delete_collection. chromadb.segment.SegmentManager.delete_segments had to change to accept the list of segments to delete instead of a collection_id.

Summarize the changes made by this PR.

Improvements & Bug fixes
- Fixes the resource leak when deleting a collection
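
A minimal sketch of the interface change described above, not the actual Chroma implementation. `Segment`, `ToySegmentManager`, `open` and `open_count` are hypothetical stand-ins; only the idea that `delete_segments` receives the segments themselves rather than a collection id comes from this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a sysdb segment record."""

    id: UUID
    collection: UUID


class ToySegmentManager:
    """Hypothetical manager that caches one open resource per segment."""

    def __init__(self) -> None:
        self._instances: Dict[UUID, str] = {}

    def open(self, segment: Segment) -> None:
        # Pretend this holds a file handle or an in-memory index.
        self._instances[segment.id] = f"resource-for-{segment.id}"

    def delete_segments(self, segments: Sequence[Segment]) -> List[UUID]:
        # The caller supplies the segments, so no sysdb lookup happens here.
        released: List[UUID] = []
        for segment in segments:
            self._instances.pop(segment.id, None)
            released.append(segment.id)
        return released

    def open_count(self) -> int:
        return len(self._instances)


collection_id = uuid4()
segments = [Segment(uuid4(), collection_id), Segment(uuid4(), collection_id)]
manager = ToySegmentManager()
for s in segments:
    manager.open(s)

released = manager.delete_segments(segments)
```

Because the manager no longer depends on a lookup, it can release its cached resources even after the sysdb records for the collection are gone.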

Test plan

How are these changes tested?

Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

N/A

github-actions · 2024-12-13T16:31:09Z

tazarov · 2024-12-13T16:31:15Z

[BUG]: Local segment manager memory leak fix #3340
[BUG]: Delete collection resource leak (single-node Chroma) #3297 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

rohitcpbot

Thanks for identifying the leak and raising the fix. I did not see this earlier, so I did not review it sooner. My miss. Reviewed it now.

rohitcpbot · 2025-01-04T05:18:00Z

chromadb/api/segment.py

@@ -384,10 +384,11 @@ def delete_collection(
        )

        if existing:
+            segments = self._sysdb.get_segments(collection=existing[0].id)


I feel we should try not to call sysdb to get the segments. It adds an extra call to the backend for distributed Chroma.

Looking at the current code, I see we are already calling sysdb.get_segments() from the manager, so you are simply moving that line here, not adding extra calls. But I feel we can do better.

Do you think we should just call delete_segment() from delete_collection()?
So we can add this snippet back:

for s in self._manager.delete_segments(existing[0]["id"]): self._sysdb.delete_segment(s)

and do a no-op inside delete_segments() in db/impl/grpc/client.py.
Will that fix the leak?

@rohitcpbot,

using this snippet:

for s in self._manager.delete_segments(existing[0]["id"]): self._sysdb.delete_segment(s)

That makes sense; however, we would revert to a non-atomic deletion of sysdb resources. With the above snippet we'd delete the segments separately from deleting the collection, which I wanted to avoid on purpose. That is why I pulled the fetch of the segments here, before they are atomically deleted as part of self._sysdb.delete_collection.

Why do you think that this would cause extra calls in the distributed backend?
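
A rough sketch of the ordering argued for in this reply, under stated assumptions. `ToySysDB`, `delete_lookup_after`, `delete_lookup_before` and `run` are hypothetical names, and the premise that the leak comes from looking up segments after the sysdb records are already gone is implied by the thread rather than spelled out in it.

```python
from typing import Callable, Dict, List, Set
from uuid import UUID, uuid4


class ToySysDB:
    """Hypothetical in-memory sysdb; delete_collection drops segments too."""

    def __init__(self) -> None:
        self._segments: Dict[UUID, List[UUID]] = {}

    def create_collection(self, n_segments: int = 3) -> UUID:
        collection_id = uuid4()
        self._segments[collection_id] = [uuid4() for _ in range(n_segments)]
        return collection_id

    def get_segments(self, collection: UUID) -> List[UUID]:
        return list(self._segments.get(collection, []))

    def delete_collection(self, collection: UUID) -> None:
        # Collection and segment records disappear in one step.
        self._segments.pop(collection, None)


def delete_lookup_after(sysdb: ToySysDB, open_resources: Set[UUID], collection: UUID) -> None:
    sysdb.delete_collection(collection)
    # The lookup now returns nothing, so no resource is released.
    for seg in sysdb.get_segments(collection):
        open_resources.discard(seg)


def delete_lookup_before(sysdb: ToySysDB, open_resources: Set[UUID], collection: UUID) -> None:
    # Capture the segments first, then delete atomically, then clean up.
    segments = sysdb.get_segments(collection)
    sysdb.delete_collection(collection)
    for seg in segments:
        open_resources.discard(seg)


def run(delete_fn: Callable[[ToySysDB, Set[UUID], UUID], None]) -> int:
    sysdb = ToySysDB()
    collection = sysdb.create_collection()
    open_resources = set(sysdb.get_segments(collection))
    delete_fn(sysdb, open_resources, collection)
    return len(open_resources)  # resources still held after the delete


leaked_after = run(delete_lookup_after)
leaked_before = run(delete_lookup_before)
```

In this toy model the sysdb deletion stays a single step in both variants; only the position of the `get_segments` call decides whether the manager-side cleanup has anything to work with.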

tazarov · 2025-01-06T18:56:09Z

chromadb/segment/impl/manager/distributed.py

@@ -76,8 +76,7 @@ def prepare_segments_for_new_collection(
        return [vector_segment, record_segment, metadata_segment]

    @override
-    def delete_segments(self, collection_id: UUID) -> Sequence[UUID]:
-        segments = self._sysdb.get_segments(collection=collection_id)


@rohitcpbot, is this your concern about the call to distributed sysdb?

tazarov requested review from rohitcpbot and HammadB December 13, 2024 16:35

tazarov added the labels "bug" (Something isn't working) and "Local Chroma" (An improvement to Local (single node) Chroma) Dec 16, 2024

This was referenced Dec 19, 2024

[BUG]: Local segment manager memory leak fix #3337

Closed

[BUG]: Local segment manager memory leak fix #3340

Open

tazarov force-pushed the trayan-12-13-fix_delete_collection_resource_leak branch 2 times, most recently from 2e113a0 to b53dadb Compare January 3, 2025 07:48

rohitcpbot reviewed Jan 4, 2025


tazarov commented Jan 6, 2025


tazarov added 2 commits January 8, 2025 10:20

fix: delete_collection resource leak

d18caaa

fix: Adapt to new list_collection semantics

ba07228

tazarov force-pushed the trayan-12-13-fix_delete_collection_resource_leak branch from b53dadb to ba07228 Compare January 8, 2025 08:21
