Feat/fingerprinting #1039

Closed
wants to merge 4 commits into from

Conversation

zilto
Collaborator

@zilto zilto commented Jul 18, 2024

Following up on our design discussion about caching and fingerprinting. Here are the basic working mechanisms for caching as we discussed.

Composable parts:

  • caching algorithm: decide when to compute or use cache
  • fingerprint store (repository): how to store {execution context: fingerprint}
  • result store (cache): how to store {fingerprint: result / data}
  • serialization and deserialization (related to result store)
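
A minimal sketch of how these four parts could fit together, not the code in this PR. Everything here is an assumption made for illustration: the function names (`fingerprint`, `context_key`, `run_node`), the use of plain dicts for both stores, and pickle plus SHA-256 for serialization and hashing.

```python
import hashlib
import pickle


def fingerprint(value) -> str:
    """Hash a value's pickled bytes (illustrative; a real scheme may differ)."""
    return hashlib.sha256(pickle.dumps(value)).hexdigest()


def context_key(node_name: str, code_version: str, input_fps: dict) -> str:
    """Key for the execution context: node, code version, input fingerprints."""
    payload = repr((node_name, code_version, sorted(input_fps.items())))
    return hashlib.sha256(payload.encode()).hexdigest()


def run_node(node_name, code_version, fn, inputs, input_fps,
             fingerprint_store: dict, result_store: dict):
    """Caching algorithm (hypothetical). Returns (result, fingerprint, cache_hit)."""
    key = context_key(node_name, code_version, input_fps)
    # Fingerprint store: {execution context: fingerprint}
    fp = fingerprint_store.get(key)
    if fp is not None and fp in result_store:
        # Result store: {fingerprint: serialized result}
        return pickle.loads(result_store[fp]), fp, True
    # Cache miss: compute, fingerprint the result, and record both mappings.
    result = fn(**inputs)
    fp = fingerprint(result)
    fingerprint_store[key] = fp
    result_store[fp] = pickle.dumps(result)
    return result, fp, False
```

In this sketch, the fingerprint returned for one node would be passed in `input_fps` of its downstream nodes, which is how a changed upstream result invalidates downstream keys.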

TODO

  • figure out ideal module structure
  • documentation
  • standardize and agree on naming
  • introspection mechanisms
  • good logger for debugging
  • special caching algorithm behavior
    • always recompute: always recomputes the node and produces a new fingerprint; this fingerprint affects the keys of downstream nodes
    • don't fingerprint: the node can be recomputed or read from cache; from the perspective of downstream nodes, its value / fingerprint is constant (related: Add ability to mark function outputs as unserializable #743)
    • read historical value: if a value exists, use it (i.e., the existing CachingGraphAdapter). It would require storing a node_name metadata in the fingerprint store. This is very easy with diskcache
  • surface post-run caching explanations to the user (visualization, rule-based text explanations)
  • surface pre-run caching explanations to the user; these will use node versions (must recompute), top-level inputs and overrides, and the state of the fingerprint store (if it is empty, we know we must recompute everything; the same holds if node_name metadata is absent)
  • checkpointing is a subset of this cache feature. All you need to do is explicitly pass a mapping of fingerprints to SmartCache(fingerprints={...}) (which could be hidden behind a single run_id). Then the adapter will use these fingerprints to read from the result store instead of trying to compute these fingerprints or load them from the fingerprint store.
  • improve the protocol definition for Store objects. get and set are required, but it is unclear whether open() and close() are necessary for all stores (they could be null operations)
  • create a MaterializerStore where get() and set() are materializers
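
One possible shape for the Store protocol item above, sketched under assumptions: `get` and `set` are the only required methods, while `open()` and `close()` come from an optional base class as null operations. The class names `Store`, `BaseStore`, and `InMemoryStore` and all signatures are hypothetical and are not taken from this PR.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class Store(Protocol):
    """Required interface (assumed): only get and set."""

    def get(self, key: str) -> Optional[Any]: ...

    def set(self, key: str, value: Any) -> None: ...


class BaseStore:
    """Optional base class: open/close default to null operations."""

    def open(self) -> None:
        pass

    def close(self) -> None:
        pass

    def __enter__(self):
        self.open()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


class InMemoryStore(BaseStore):
    """Dict-backed store that needs no real open/close."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
```

With this layout, a store backed by a file or database could override `open()` and `close()`, while simple stores inherit the no-ops and still satisfy the protocol.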

@zilto zilto marked this pull request as draft July 18, 2024 21:46
@skrawcz
Collaborator

skrawcz commented Aug 5, 2024

this is related to #940

@zilto zilto closed this Aug 22, 2024
@zilto zilto mentioned this pull request Aug 23, 2024
7 tasks
@zilto zilto deleted the feat/fingerprinting branch November 20, 2024 21:45
2 participants