Commit

Try batching on text blocks not families (#85)
The task is failing. We know part of the issue is that batching on families
loads an extremely large number of text blocks into memory (500 families
with up to 40,000 text blocks each!). This may also solve the other issue
we are seeing, of empty requests being sent to Vespa, but at the very
least it will be educational.
olaughter authored Jan 26, 2024
1 parent a0bf482 commit 07ed516
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion src/config.py
@@ -61,7 +61,7 @@ def _convert_to_bool(x: str) -> bool:


# Vespa config
-VESPA_DOCUMENT_BATCH_SIZE: int = int(os.getenv("VESPA_BATCH_SIZE", "500"))
+VESPA_DOCUMENT_BATCH_SIZE: int = int(os.getenv("VESPA_BATCH_SIZE", "2000"))
VESPA_INSTANCE_URL: str = os.getenv("VESPA_INSTANCE_URL", "")
VESPA_CERT_LOCATION: str = os.getenv("VESPA_CERT_LOCATION", "")
VESPA_KEY_LOCATION: str = os.getenv("VESPA_KEY_LOCATION", "")
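
To put the numbers from the commit message side by side, here is a rough worst-case calculation of how many text blocks can sit in the buffer before a flush. It is only a sketch: the constant names are made up for illustration, and the new bound assumes the threshold check runs once per family after that family's text blocks have been appended, which the diff does not show.

```python
# Figures quoted in the commit message (names here are illustrative only).
OLD_BATCH_SIZE_FAMILIES = 500
MAX_TEXT_BLOCKS_PER_FAMILY = 40_000
NEW_BATCH_SIZE_TEXT_BLOCKS = 2_000

# Old behaviour: flush only once 500 families are buffered, so every one of
# them could carry the maximum number of text blocks.
old_worst_case = OLD_BATCH_SIZE_FAMILIES * MAX_TEXT_BLOCKS_PER_FAMILY

# New behaviour (assumption: the check runs once per family): the buffer can
# hold one text block less than the threshold, then take one more full family.
new_worst_case = (NEW_BATCH_SIZE_TEXT_BLOCKS - 1) + MAX_TEXT_BLOCKS_PER_FAMILY

print(f"old worst case: {old_worst_case:,} text blocks")
print(f"new worst case: {new_worst_case:,} text blocks")
```

Under those assumptions the buffer can still overshoot the configured batch size by up to one family's worth of text blocks, so `VESPA_BATCH_SIZE` acts as a flush trigger rather than a hard cap.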
2 changes: 1 addition & 1 deletion src/index/vespa_.py
@@ -317,7 +317,7 @@ def populate_vespa(
}
)

-if len(to_process[FAMILY_DOCUMENT_SCHEMA]) >= config.VESPA_DOCUMENT_BATCH_SIZE:
+if len(to_process[DOCUMENT_PASSAGE_SCHEMA]) >= config.VESPA_DOCUMENT_BATCH_SIZE:
     asyncio.run(_batch_ingest(vespa, to_process))
     to_process.clear()
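
For anyone reading this change without the surrounding function, the pattern in this hunk can be sketched in isolation as below. This is not the code in `src/index/vespa_.py`: the schema string values, the `ingest_in_batches` helper, and the `flush` callback are all hypothetical stand-ins, and the final "skip the flush when nothing is buffered" step is one possible guard against empty requests, not something this commit adds.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical values; the real constants live in the repository.
FAMILY_DOCUMENT_SCHEMA = "family_document"
DOCUMENT_PASSAGE_SCHEMA = "document_passage"

Batch = dict[str, list[dict]]


def ingest_in_batches(
    families: Iterable[tuple[dict, list[dict]]],
    flush: Callable[[Batch], None],
    batch_size: int = 2000,
) -> int:
    """Buffer families with their passages and flush on passage count.

    Returns the number of times `flush` was called.
    """
    to_process: Batch = defaultdict(list)
    flushes = 0

    for family, passages in families:
        to_process[FAMILY_DOCUMENT_SCHEMA].append(family)
        to_process[DOCUMENT_PASSAGE_SCHEMA].extend(passages)

        # Trigger on buffered text blocks rather than buffered families.
        if len(to_process[DOCUMENT_PASSAGE_SCHEMA]) >= batch_size:
            # Pass a copy so clearing the buffer cannot affect the flushed batch.
            flush({schema: list(docs) for schema, docs in to_process.items()})
            to_process.clear()
            flushes += 1

    # Send whatever is left, but only if something is actually buffered.
    if any(to_process.values()):
        flush({schema: list(docs) for schema, docs in to_process.items()})
        flushes += 1

    return flushes


# Usage: five families of 900 text blocks each, flushed at 2,000 text blocks.
sent: list[Batch] = []
example = [({"id": i}, [{"text": str(j)} for j in range(900)]) for i in range(5)]
flush_count = ingest_in_batches(example, sent.append, batch_size=2000)
```

In the usage example the first flush happens after the third family, when the buffer reaches 2,700 text blocks, and the remaining two families go out in a final flush.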
