Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APIs design #172

Open
laurentsimon opened this issue May 13, 2024 · 11 comments
Open

APIs design #172

laurentsimon opened this issue May 13, 2024 · 11 comments
Labels
enhancement New feature or request
Milestone

Comments

@laurentsimon
Copy link
Collaborator

laurentsimon commented May 13, 2024

This issues proposes what the (long-term) APIs will look like. Looking for comments, nothing set in stone.

1. Hash engine

Tracked in #140
Why:

  • for callers to customize the hash parameters
  • for callers to customize hash implementation, eg using hw-accelerattion or distributed across machines.
class hash_engine:
  def name() -> str:
  # Shard size
  def shard() -> int
  # Chunk size
  def chunk() -> int
  # Standard hash APIs
  def update(...):..
  def final():..

We will provide default hash engine that we use in this library, with the possibility to customize its parameters:

class shah256pv1:
   def __init__(self, shard=1000000, chunk=memory_available())
   # All class function from hash_engine

A hash name is parameterized. I suggest something simple like <name>$param1$param2.... For the existing sha256p, it could be sha256pv1$1000000 for a shard of 1 GB.

2. Serializer

This will serialize a model (folder or file).
Why: some callers may want to serialize models using our library but not sign it with our library.

class serializer:
   def __init__(self, hash: hash_engine)
      ...
   def run(
          # Path is a file-system path.
          # PathIter is a path iterator, for callers who want to keep things in memory.
          input: Path | PathIter,
          # A list of paths to recompute Tracked in https://github.com/sigstore/model-transparency/issues/160
          recompute_paths: []Path = None,
          # HashedPathIter is an iterator for path+hash computed
          ) -> HashedPathIter:
       
     ...
)

3. Raw Signer / Verifier

This provides low-level signer: It allows to abstract away how private key is managed (TUF, HSM, static key, etc). The raw verifier lets callers set a custom verifier if needed, eg if the signature uses a different encoding, etc.

class RawSigner:
  @abstract
  def sign(raw: bytes) -> bytes
     # Perform signature via HSM API, use PCKS11 API, use raw bytes from memory, etc

We can have helper function for common raw signers like (one will be the default?):

class ECDSARawSigner(RawSigner):
  def from_path(path: filepath.Path):
     ...
  def from_bytes(key: bytes):
     # Verifies that 1) the key is of the right type, eg right length and on the right curve, 2) private and public key correspond to the same key pair. 
    ...
  def sign(raw):
   # Use the raw key
   ...

4. Signer / Verifier

This will create a generic signer class that can be instantiated for Sigstore, PKI, etc.

class Signer:
  def sign(
          # Path is a file-system path.
          # PathIter is a path iterator, for callers who want to keep things in memory.
          # HashedPathIter is an iterator for pre-computed path+hash.
          inputs: Path | PathIter | HashedPathIter,
	) -> bytes
     ...

class Verifier:
  def verify(
          input: Path | PathIter | HashedPathIter,
          sig: Path | bytes,
          ) -> bool

Example for Sigstore keyless:

# sigstore.py
class SigstoreKeylessSigner(Signer)
  def __init__(self, rekor_url, fulcio_cert, ...)
  def sign(input) -> bytes

class SigstoreKeylessVerifier(Verifier)
  def __init__(self, rekor_url, fulcio_tuf, ...)
  def verify(input, sig) -> bool

Example for Sigstore key flow:

# sigstore.py
class SigstoreKeyedSigner(Signer)
  def __init__(self, rekor_url, raw_signer, ...)
  def sign(input) -> bytes

class SigstoreKeyedVerifier(Verifier)
  def __init__(self, rekor_url, raw_verifier, ...)
  def verify(input, sig) -> bool

Example for PKI:

# pki.py
class PKISigner(Signer)
  def __init__(self, certs_chain, raw_signer, ...)
  def sign(input) -> bytes

class PKIVerifier(Verifier)
  def __init__(self, certs_chain, raw_verifier, ...)
  def verify(input, sig) -> bool

Example for ECSigner (no cert):

# ec.py
class ECSigner(Signer)
  def __init__(self, raw_signer, ...)
  def sign(input) -> bytes

class ECVerifier(Verifier)
  def __init__(self, raw_verifier, ...)
  def verify(input, sig) -> bool

4. model sign / verify

This is the main entry point for callers to use.

# model.py
def sign(
      input: Path | PathIter | HashedPathIter,
      sig: Path  = defaultSigPath(input),
      signer: Signer = SigstoreSigner(),
      he: hash_engine = shah256pv1(),
      recompute_paths: []Path = None,
      ignored_paths: []Path = [".git"],
    ) -> bytes
   # High level logic.
   s = serializer(he)
   iter = s.run(input, recompute_paths)
   m = manifest(iter)
   c = signer.sign(m)
   sig.write(c)
   return c

def verify(
      input: Path | PathIter | HashedPathIter,
      sig: Path | bytes = defaultSigPath(input),
      verifier: Verifier = SigstoreVerifier(),
      he: hash_engine = shah256pv1(),
      ignored_paths: []Path = [".git"],
    ) -> bool
   # High level logic.
   s = serializer(he)
   iter = s.run(input, recompute_paths)
   m = manifest(iter)
   return verifier.verify(m, sig)
@laurentsimon laurentsimon added the enhancement New feature or request label May 13, 2024
@laurentsimon laurentsimon changed the title API designs APIs design May 13, 2024
@laurentsimon laurentsimon mentioned this issue May 13, 2024
10 tasks
@laurentsimon
Copy link
Collaborator Author

@susperius thoughts?

@mihaimaruseac
Copy link
Collaborator

I think we also need some lower level API for incremental hashing.

@laurentsimon
Copy link
Collaborator Author

I think the serializer API takes care of it, it takes as input recompute_paths, more generally it takes as input whatever we decide in #160.

@mihaimaruseac
Copy link
Collaborator

For the hash engine: does it take as input a file to hash or a memory string? Having shard and chunk signals a file-based API, but update and final signal a memory string, reading from other place.

@mihaimaruseac
Copy link
Collaborator

For the hash engine: does it take as input a file to hash or a memory string? Having shard and chunk signals a file-based API, but update and final signal a memory string, reading from other place.

In #188, I moved shard to be only informative here. Working on the next level of API where shard is actually relevant.

If we kept shard at the hash-an-object level where we would have implemented the multi-process queue, then we could have gone to a state where too many threads are started.

@udaysavagaonkar
Copy link

How is the hash_engine expected to be used? Will the serializer call update() and final() methods on the hash_engine? If that is the case, why does the hash_engine need to know what the shard size and chunk size is? I am sure I am missing something.

@laurentsimon
Copy link
Collaborator Author

laurentsimon commented May 29, 2024

Good observation, I think @mihaimaruseac had a related comment in #172 (comment)

I think it depends what level of abstraction we're talking about: the underlying (traditional) hash engine (sha256, etc), the shard-aware hash, something else. The shard-aware hash engine would use a lower-level hash engine under the hood (eg sha256). It may also depends if we're talking about streaming engine or not.

#188 has moved out the chunk size and shard out of the engine for now so we'll see if the original API def is needed or not. PTAL we'd love your feedback too

@mihaimaruseac
Copy link
Collaborator

Will the serializer call update() and final() methods on the hash_engine?

That is correct. I'll implement this in #190 (WIP at the moment, but should push a new change tomorrow-ish).

If that is the case, why does the hash_engine need to know what the shard size and chunk size is? I am sure I am missing something.

Chunk size doesn't matter, see #188. Shard size matters, but more just for verifications.

@font
Copy link
Member

font commented May 30, 2024

Based on the PKISigner and PKIVerifier examples, it looks like we're looking to add support for both keyed and keyless signing flows, and I think we should. If so, the PKISigner and PKIVerifier classes need to account for supplying the necessary keys. Not sure how official this API spec is, but just wanted to call it out JIC.

@laurentsimon
Copy link
Collaborator Author

laurentsimon commented May 30, 2024

The keyless flow will be handled by SigstoreSigner / SigstoreVerifier (with optional fulcio / reko parameters for private developments). The PKISigner / PKIVerifier is exclusively for "private" PKI deployment, ie existing PKI using existing "off the shelf" CAs or custom CAs (possibly using TUF-managed keys). See a PoC for the PKI #177. We'll need to support TUF-managed and HSM keys: the API in the issue description has a simple private key bytes input but I think we'll need something like a KeySigner / KeyVerifier interface instead (providing raw sign / verify) EDIT: Updated description with RawSigner / RawVerifier. /cc @udaysavagaonkar @susperius

There may be a case for supporting sigstore keyed flow, but I don't think we're prioritizing it atm.

Not sure how official this API spec is, but just wanted to call it out JIC.

Nothing official, just the overall direction :) Feel free to comment.

@mihaimaruseac
Copy link
Collaborator

+1, the proposal in the first message here is mostly informative, as you see in #188/#190 things are changing. For #190 I still need to find way to incorporate the current serialization format, but I have some ideas on that, which I'll test by next week.

Any comments and suggestions for improvement, as well as PR reviews, are very very welcome

mihaimaruseac added a commit to mihaimaruseac/model-transparency that referenced this issue Jun 4, 2024
This is the middle layer of the API design work (sigstore#172). We add a manifest abstract class to represent various manifests (sigstore#111 sigstore#112) and also ways to serialize a model directory into manifests and ways to verify the manifests.

For now, this only does what was formerly known as `serialize_v0`. The v1 and the manifest versions will come soon.

Note: This has a lot of inspiration from sigstore#112, but makes the API work with all the usecases we need to consider right now.

Signed-off-by: Mihai Maruseac <[email protected]>
mihaimaruseac added a commit to mihaimaruseac/model-transparency that referenced this issue Jun 4, 2024
This is the middle layer of the API design work (sigstore#172). We add a manifest abstract class to represent various manifests (sigstore#111 sigstore#112) and also ways to serialize a model directory into manifests and ways to verify the manifests.

For now, this only does what was formerly known as `serialize_v0`. The v1 and the manifest versions will come soon.

Note: This has a lot of inspiration from sigstore#112, but makes the API work with all the usecases we need to consider right now.

Signed-off-by: Mihai Maruseac <[email protected]>
mihaimaruseac added a commit to mihaimaruseac/model-transparency that referenced this issue Jun 4, 2024
This is the middle layer of the API design work (sigstore#172). We add a manifest abstract class to represent various manifests (sigstore#111 sigstore#112) and also ways to serialize a model directory into manifests and ways to verify the manifests.

For now, this only does what was formerly known as `serialize_v0`. The v1 and the manifest versions will come soon.

Note: This has a lot of inspiration from sigstore#112, but makes the API work with all the usecases we need to consider right now.

Signed-off-by: Mihai Maruseac <[email protected]>
mihaimaruseac added a commit that referenced this issue Jun 5, 2024
)

* Migrate `serialize_v0` to new API.

This is the middle layer of the API design work (#172). We add a manifest abstract class to represent various manifests (#111 #112) and also ways to serialize a model directory into manifests and ways to verify the manifests.

For now, this only does what was formerly known as `serialize_v0`. The v1 and the manifest versions will come soon.

Note: This has a lot of inspiration from #112, but makes the API work with all the usecases we need to consider right now.

Signed-off-by: Mihai Maruseac <[email protected]>

* Clarify some comments

Signed-off-by: Mihai Maruseac <[email protected]>

* Encode name with base64

Signed-off-by: Mihai Maruseac <[email protected]>

* Add another test case

Signed-off-by: Mihai Maruseac <[email protected]>

* Empty commit to retrigger DCO check.

See dcoapp/app#211 (comment)

Signed-off-by: Mihai Maruseac <[email protected]>

---------

Signed-off-by: Mihai Maruseac <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants