
Conversation

@clefourrier (Member) commented on Aug 8, 2025:

Todos
Predictions

  • feat: adding the cache system
  • test: testing prediction with accelerate
  • feat: making the system lighter by loading cached inputs after processing the other ones (probably with an index system; see the sketch after this list)
  • feat: adding the system for all models
  • fix: change cache path to ~/.cache after debugging
  • test: adding a test suite for all models

We'll need to tokenize inputs later.
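
As a rough illustration of the index-first idea above, here is a minimal sketch (all names and paths are hypothetical, not the PR's actual implementation): keep only an index of sample hashes in memory, run the model on cache misses, and load cached payloads from disk only after processing.

```python
import hashlib
import json
from pathlib import Path


class SampleCache:
    """Hypothetical on-disk sample cache with an in-memory index."""

    def __init__(self, cache_dir: str = "~/.cache/lighteval-sketch"):
        self.cache_dir = Path(cache_dir).expanduser()
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Only the index (file stems = sample hashes) is kept in memory,
        # never the cached payloads themselves.
        self.index = {p.stem for p in self.cache_dir.glob("*.json")}

    def _key(self, doc: str) -> str:
        return hashlib.sha256(doc.encode("utf-8")).hexdigest()

    def split(self, docs: list[str]) -> tuple[list[str], list[str]]:
        """Separate cache hits from misses using only the index."""
        hits = [d for d in docs if self._key(d) in self.index]
        misses = [d for d in docs if self._key(d) not in self.index]
        return hits, misses

    def store(self, doc: str, response: dict) -> None:
        key = self._key(doc)
        (self.cache_dir / f"{key}.json").write_text(json.dumps(response))
        self.index.add(key)

    def load(self, doc: str) -> dict:
        return json.loads((self.cache_dir / f"{self._key(doc)}.json").read_text())
```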

@HuggingFaceDocBuilderDev (Collaborator) commented:
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@clefourrier changed the title from "Caching samples PR (ongoing)" to "Caching samples PR" on Aug 11, 2025
GenerativeResponse,
LoglikelihoodResponse,
LoglikelihoodSingleTokenResponse,
ModelResponse,
@clefourrier (Member, Author) commented:
Clean up imports, unrelated to the PR


    config = yaml.safe_load(f)["model_parameters"]
else:
    # We extract the model args
    config: dict = ModelConfig._parse_args(model_args)
@clefourrier (Member, Author) commented:
Clean up unused params to simplify tests

def loglikelihood_rolling(self, docs: list[Doc], override_bs=None) -> list[ModelResponse]:
    return self._loglikelihood(docs, rolling=True)

def _loglikelihood(self, docs: list[Doc], rolling: bool = False) -> list[ModelResponse]:
@clefourrier (Member, Author) commented:
Grouped the logic of both functions, and separated the public API function from the implementation so that changes to the core logic don't break people's pipelines.
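
A self-contained sketch of that pattern, with stand-in types for lighteval's `Doc` and `ModelResponse` (the real classes have more fields):

```python
from dataclasses import dataclass


@dataclass
class Doc:  # stand-in for lighteval's Doc
    query: str


@dataclass
class ModelResponse:  # stand-in for lighteval's ModelResponse
    logprob: float


class Model:
    # Thin public API methods delegate to one private implementation, so the
    # public signatures stay stable even if the core logic changes.
    def loglikelihood(self, docs: list[Doc]) -> list[ModelResponse]:
        return self._loglikelihood(docs, rolling=False)

    def loglikelihood_rolling(self, docs: list[Doc]) -> list[ModelResponse]:
        return self._loglikelihood(docs, rolling=True)

    def _loglikelihood(self, docs: list[Doc], rolling: bool = False) -> list[ModelResponse]:
        # Shared implementation; `rolling` would toggle whole-sequence scoring.
        return [ModelResponse(logprob=0.0) for _ in docs]
```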

)

@cached("predictions")
def loglikelihood(self, requests: List[LoglikelihoodRequest]) -> List[LoglikelihoodResponse]:
@clefourrier (Member, Author) commented:
Removed the single-token loglikelihood since it should have been removed with the refactor, plus some cleanup of imports.
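
For context, a hypothetical sketch of what a `@cached(...)` decorator like the one added here could look like; the key derivation, cache path, and pickle format are all assumptions, not the PR's actual code.

```python
import functools
import hashlib
import pickle
from pathlib import Path


def cached(cache_name: str):
    """Hypothetical decorator: serve model responses from an on-disk cache
    keyed by the request list, filling the cache on misses."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, requests):
            cache_dir = Path("~/.cache/lighteval-sketch").expanduser() / cache_name
            cache_dir.mkdir(parents=True, exist_ok=True)
            # Assumed: requests have a stable string form usable as a key.
            key = hashlib.sha256(repr(requests).encode("utf-8")).hexdigest()
            cache_file = cache_dir / f"{key}.pkl"
            if cache_file.exists():
                return pickle.loads(cache_file.read_bytes())
            responses = fn(self, requests)
            cache_file.write_bytes(pickle.dumps(responses))
            return responses

        return wrapper

    return decorator
```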

Attributes:
    base_model (str):
        HuggingFace Hub model ID or path to the base model. This is the original
@clefourrier (Member, Author) commented:
removed unused params

during loading to reconstruct the full fine-tuned model.
Attributes:
    base_model (str):
@clefourrier (Member, Author) commented:
Removed an unused param

@clefourrier merged commit df3a82d into main on Aug 11, 2025 (4 of 6 checks passed).
NathanHB pushed a commit that referenced this pull request on Sep 19, 2025:

Adds a new caching system for generative evals, plus a test suite and a doc page. The system loads indices first, then runs samples as needed, and lastly loads the cached items as needed (we don't keep the cache in memory when running models).
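
Read as pseudocode, the flow in that commit message maps onto the SampleCache sketch earlier in the thread; `cache`, `docs`, and `run_model` are illustrative names, not the PR's API.

```python
# 1. Load only the index, then split the docs into cache hits and misses.
hits, misses = cache.split(docs)

# 2. Run the model on the misses only, storing responses as they come;
#    cached payloads are never held in memory while the model runs.
for doc, response in zip(misses, run_model(misses)):
    cache.store(doc, response)

# 3. Lastly, load the cached items needed to assemble the final results.
results = {doc: cache.load(doc) for doc in hits + misses}
```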