Ensure embeddings file is garbage collected
It turns out the embeddings file cache was not being cleared, presumably
because we were streaming the object before it was closed. As a result,
the file never got closed and was therefore never cleared from
cloudpathlib's cache. Testing on fixtures confirms this change results in
a clear cache.

Also removing a log line that effectively duplicates another, to make the
logs easier to read through.
olaughter committed Feb 2, 2024
1 parent 50c9040 commit 31baac9
Showing 2 changed files with 1 addition and 3 deletions.
src/index/vespa_.py: 1 change (0 additions, 1 deletion)
```diff
@@ -336,7 +336,6 @@ def populate_vespa(
 
     if len(to_process[DOCUMENT_PASSAGE_SCHEMA]) >= config.VESPA_DOCUMENT_BATCH_SIZE:
         _batch_ingest(vespa, to_process)
-        _LOGGER.info(f"Clearing batch with length: {len(to_process[DOCUMENT_PASSAGE_SCHEMA])}")
         to_process.clear()
 
 _LOGGER.info("Final ingest batch")
```
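The batching loop in `populate_vespa` can be sketched in isolation as below. This is a hypothetical, self-contained illustration, not the repository's code: `batch_ingest`, `populate`, `BATCH_SIZE` and `PASSAGE_SCHEMA` are invented stand-ins for `_batch_ingest`, `populate_vespa`, `config.VESPA_DOCUMENT_BATCH_SIZE` and `DOCUMENT_PASSAGE_SCHEMA`. It also assumes, as the commit message implies, that the ingest step already logs the size of each batch, so the caller does not need a second log line.

```python
import logging
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
_LOGGER = logging.getLogger(__name__)

BATCH_SIZE = 3  # hypothetical stand-in for config.VESPA_DOCUMENT_BATCH_SIZE
PASSAGE_SCHEMA = "document_passage"  # hypothetical stand-in for DOCUMENT_PASSAGE_SCHEMA

# Records the size of every batch handed to batch_ingest, for inspection.
ingested_batches: list[int] = []


def batch_ingest(to_process: dict[str, list[str]]) -> None:
    """Hypothetical ingest step: the single place where a batch size is logged."""
    size = len(to_process[PASSAGE_SCHEMA])
    _LOGGER.info("Ingesting batch with length: %d", size)
    ingested_batches.append(size)


def populate(passages: list[str]) -> None:
    """Accumulate passages and flush them in fixed-size batches."""
    to_process: dict[str, list[str]] = defaultdict(list)
    for passage in passages:
        to_process[PASSAGE_SCHEMA].append(passage)
        if len(to_process[PASSAGE_SCHEMA]) >= BATCH_SIZE:
            batch_ingest(to_process)
            # No extra log line here: batch_ingest already reports the size.
            to_process.clear()

    # Flush whatever is left over after the loop.
    _LOGGER.info("Final ingest batch")
    batch_ingest(to_process)


populate([f"passage-{i}" for i in range(7)])
print(ingested_batches)  # [3, 3, 1]
```

With seven passages and a batch size of three, two full batches are flushed inside the loop and the remaining passage goes out in the final batch. Because `to_process` is a `defaultdict`, the schema key is recreated as an empty list after each `clear()`.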
src/utils.py: 3 changes (1 addition, 2 deletions)
```diff
@@ -73,5 +73,4 @@ def filter_on_block_type(
 
 def read_npy_file(file_path: Path) -> Any:
     """Read an npy file."""
-    with open(file_path, "rb") as task_array_file_like:
-        return np.load(BytesIO(task_array_file_like.read()))
+    return np.load(BytesIO(file_path.read_bytes()))
```
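A minimal local sketch of the new `read_npy_file` behavior follows, assuming a plain `pathlib.Path` stands in for the cloudpathlib path used in the repository (cloudpathlib's cloud paths also expose `read_bytes()`). `read_bytes()` opens, reads and closes the file in a single call, so by the time `np.load` parses the in-memory buffer, no file handle remains open. The temporary directory and the `embeddings.npy` name are illustrative only.

```python
import tempfile
from io import BytesIO
from pathlib import Path
from typing import Any

import numpy as np


def read_npy_file(file_path: Path) -> Any:
    """Read an npy file by loading its full contents into memory first."""
    # read_bytes() opens, reads and closes the file in one call, so no
    # file handle outlives this function.
    return np.load(BytesIO(file_path.read_bytes()))


with tempfile.TemporaryDirectory() as tmp_dir:
    embeddings_path = Path(tmp_dir) / "embeddings.npy"
    embeddings = np.arange(6, dtype=np.float32).reshape(2, 3)
    np.save(embeddings_path, embeddings)

    loaded = read_npy_file(embeddings_path)
    print(loaded.shape, loaded.dtype)  # (2, 3) float32

# The array is fully in memory, so it stays usable after the directory is removed.
```

Because the whole file is read into a `BytesIO` buffer, the returned array does not depend on the source file, which can then be deleted or evicted from a cache.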
