🌳 Ent

Ent (named after the Ent species from The Lord of the Rings by J. R. R. Tolkien) is an experimental universal, scalable, general purpose, Content-Addressable Store (CAS) to explore verifiable data structures, policies and graphs.

This is not an officially supported Google product.

Content-Addressability

Ent encourages a model in which files are referred to by their digest (as a proxy for their content), rather than by the server on which they happen to be located (which is what a URL normally identifies).

For example, instead of referring to the image below by its URL https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/The_Calling_of_Saint_Matthew-Caravaggo_%281599-1600%29.jpg/405px-The_Calling_of_Saint_Matthew-Caravaggo_%281599-1600%29.jpg, in Ent it would be referred to by its digest sha256:f3e737f4d50fbf6bb6053e3b8c72d6bf7f1a7229aacf2e9b4c97e9dd27cb1dcf.

The digest of a file is a stable cryptographic identifier for the actual data that is contained in the file, and does not depend on which server the file happens to be hosted on, or at what path. If at some point in the future the file were to disappear from the original location and be made available at a different location, the original URL would stop working, but the digest of the file would remain the same, and can be used to refer to that file forever.

Additionally, using a digest to refer to a file is useful for security and trustworthiness: if someone sends you the digest of a file to download (e.g. a program to install on your computer), you can be sure that, by resolving that digest to an actual file via Ent, the resulting file is exactly the one that the sender intended, without having to trust the Ent Server, the Ent Index or the server where the file is ultimately hosted.
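
As a minimal sketch of what such a digest is, the snippet below computes a `sha256:`-prefixed identifier from raw bytes using Python's standard `hashlib`. The function name `digest_of` is illustrative and is not part of Ent; only the `sha256:<hex>` format is taken from the example above.

```python
import hashlib


def digest_of(data: bytes) -> str:
    """Return a digest string in the `sha256:<hex>` form used in this README.

    `digest_of` is a hypothetical helper name, not an Ent API.
    """
    return "sha256:" + hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # The same bytes always yield the same digest, regardless of where
    # (or under which name) they are hosted.
    print(digest_of(b"abc"))
```

Because the digest depends only on the bytes, two copies of a file fetched from different servers produce the same identifier.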

Installation

The Ent CLI can be built and installed with the following command, after having cloned this repository locally:

go install ./cmd/ent

The CLI is then available as the binary called ent:

ent help

Examples

In order to fetch a file with a given digest, the ent get subcommand can be used.

You can try the following command in your terminal, which fetches the text of the Treasure Island book:

$ ent get sha256:4c350163715b7b1d0fc3bcbf11bfffc0cf2d107f69253f237111a7480809e192 | head
The Project Gutenberg EBook of Treasure Island, by Robert Louis Stevenson

This eBook is for the use of anyone anywhere in the United States and most
other parts of the world at no cost and with almost no restrictions
whatsoever.  You may copy it, give it away or re-use it under the terms of
the Project Gutenberg License included with this eBook or online at
www.gutenberg.org.  If you are not located in the United States, you'll have
to check the laws of the country where you are located before using this ebook.

Title: Treasure Island

The Ent CLI queries the default Ent index (https://github.com/tiziano88/ent-index) to resolve the digest to a URL, fetches the file at that URL, and verifies that it corresponds to the expected digest. It buffers the entire file internally in order to verify its digest, and prints it to stdout only if it matches the expected digest.

You can also manually double check that the returned file does in fact correspond to the expected digest:

$ ent get sha256:4c350163715b7b1d0fc3bcbf11bfffc0cf2d107f69253f237111a7480809e192 | sha256sum
4c350163715b7b1d0fc3bcbf11bfffc0cf2d107f69253f237111a7480809e192  -
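
The buffer-then-verify behaviour described above can be sketched roughly as follows. This is not the Ent CLI's actual code: `get_verified` and its `fetch` callback are hypothetical names, and the sketch handles only `sha256` digests.

```python
import hashlib
import sys
from typing import Callable


def get_verified(digest: str, fetch: Callable[[], bytes]) -> bytes:
    """Fetch an object fully into memory and return it only if it matches `digest`.

    `fetch` is a hypothetical callback standing in for the download step.
    """
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    data = fetch()  # buffer the entire object before emitting anything
    actual_hex = hashlib.sha256(data).hexdigest()
    if actual_hex != expected_hex:
        raise ValueError(f"digest mismatch: expected {expected_hex}, got {actual_hex}")
    return data


if __name__ == "__main__":
    payload = b"hello\n"
    expected = "sha256:" + hashlib.sha256(payload).hexdigest()
    # Nothing is written to stdout unless verification succeeded.
    sys.stdout.buffer.write(get_verified(expected, lambda: payload))
```

Buffering before printing means a consumer further down a pipeline never sees bytes from a file that later turns out not to match.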

Ent Server

An Ent Server provides access to an underlying Ent store via an HTTP-based REST API.

An Ent Server may be running locally (on port 27333 by default), or remotely.

Some Ent Servers require the user to authenticate with an API key in order to read and/or write.

JSON HTTP API

The JSON API allows retrieving and creating multiple objects at once. All the methods below are invoked via HTTP POST requests.

  • /api/v1/blobs/get

    Get one or more objects by their digest.

  • /api/v1/blobs/put

    Put one or more objects.

Raw HTTP API

The raw HTTP API is meant to be used by existing basic tools without requiring any special serialization.

The API supports the following HTTP operations:

  • GET /raw/:digest
  • PUT /raw

For instance, this API may be used directly from the terminal via curl:

  • Get an object by digest:

    $ curl --header 'x-api-key: xxx' localhost:27333/raw/sha256:4c350163715b7b1d0fc3bcbf11bfffc0cf2d107f69253f237111a7480809e192 | sha256sum
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    100  390k    0  390k    0     0  5818k      0 --:--:-- --:--:-- --:--:-- 5832k
    4c350163715b7b1d0fc3bcbf11bfffc0cf2d107f69253f237111a7480809e192  -
  • Put an object:

    $ curl --header 'x-api-key: yyy' --upload-file README.md --head localhost:27333/raw
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/1.1 100 Continue
    
    HTTP/1.1 201 Created
    Location: /raw/sha256:c1a4c83dfeca632af8dcac3591f4b01a303342cf0bae0a63d9a5d7688b0e77cc
    Date: Thu, 10 Mar 2022 18:00:45 GMT
    Content-Length: 0
    
    100  8013    0     0  100  8013      0   132k --:--:-- --:--:-- --:--:--  134k
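
The same two operations can also be issued from a script. The sketch below uses Python's standard `urllib.request` and mirrors the curl calls above (the `/raw` paths, the `x-api-key` header, and the default local port). The helper names are hypothetical, and the code has not been checked against a live Ent Server, so details such as required request headers are assumptions.

```python
import urllib.request
from typing import Optional, Tuple

DEFAULT_SERVER = "http://localhost:27333"  # default local port mentioned above


def raw_get_request(digest: str, api_key: str,
                    server: str = DEFAULT_SERVER) -> urllib.request.Request:
    """Build the request for `GET /raw/:digest` (hypothetical helper)."""
    return urllib.request.Request(
        f"{server}/raw/{digest}",
        headers={"x-api-key": api_key},
        method="GET",
    )


def raw_put_request(data: bytes, api_key: str,
                    server: str = DEFAULT_SERVER) -> urllib.request.Request:
    """Build the request for `PUT /raw`, with the object as the request body."""
    return urllib.request.Request(
        f"{server}/raw",
        data=data,
        headers={"x-api-key": api_key},
        method="PUT",
    )


def send(request: urllib.request.Request) -> Tuple[int, Optional[str], bytes]:
    """Send a request; return the status, the Location header (if any) and the body."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.headers.get("Location"), response.read()
```

With a server running, `send(raw_put_request(data, api_key))` would be expected to report the `201 Created` status and the `Location` of the new object, as in the curl transcript above. Bodies returned by `send(raw_get_request(...))` should still be checked against the requested digest on the client side.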

Ent Index

An Ent index is a "cheap" way to provide access to existing (location-addressed) content on the internet, but in a content-addressable way.

It consists of a static website, which serves an entry for each digest, listing one or more "traditional" URLs which may provide the file in question.

For instance, it may be serialized as a Git repository with a directory structure corresponding to the digest of each entry, and a JSON file for each entry that lists one or more URLs at which the object may be found.

The directory path is obtained by splitting the digest into groups of two hexadecimal digits and creating a nested folder for each group. This limits the number of files or directories inside each directory, which would otherwise not scale when there are millions of entries in the index.

For instance, the file with digest sha256:4c350163715b7b1d0fc3bcbf11bfffc0cf2d107f69253f237111a7480809e192 is stored in the Ent index under the file /sha256/4c/35/01/63/71/5b/7b/1d/0f/c3/bc/bf/11/bf/ff/c0/cf/2d/10/7f/69/25/3f/23/71/11/a7/48/08/09/e1/92/entry.json, which contains the following entry:

https://github.com/tiziano88/ent-index/blob/fddaa4b78ec4f4ba1e2c1e3e1c0b5ae9b06565e2/sha256/4c/35/01/63/71/5b/7b/1d/0f/c3/bc/bf/11/bf/ff/c0/cf/2d/10/7f/69/25/3f/23/71/11/a7/48/08/09/e1/92/entry.json#L1
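
The path derivation described above is mechanical and can be sketched in a few lines. The function name `index_entry_path` is hypothetical; the layout (algorithm name, then one folder per pair of hexadecimal digits, then `entry.json`) follows the example path given above.

```python
def index_entry_path(digest: str) -> str:
    """Map a digest such as `sha256:4c35...` to its entry path in an Ent index.

    `index_entry_path` is an illustrative name, not part of Ent.
    """
    algorithm, _, hex_digest = digest.partition(":")
    if not hex_digest or len(hex_digest) % 2 != 0:
        raise ValueError(f"malformed digest: {digest!r}")
    # Split the hex string into two-character groups: "4c35..." -> ["4c", "35", ...]
    pairs = [hex_digest[i:i + 2] for i in range(0, len(hex_digest), 2)]
    return "/" + "/".join([algorithm, *pairs, "entry.json"])


if __name__ == "__main__":
    print(index_entry_path(
        "sha256:4c350163715b7b1d0fc3bcbf11bfffc0cf2d107f69253f237111a7480809e192"
    ))
```

Each directory level then has at most 256 children, since a pair of hexadecimal digits can take only 256 distinct values.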

Note that the Ent index only stores URLs, not actual data, under the assumption that each URL will keep pointing to the same file forever.

The client querying the index is responsible for verifying that the target file still corresponds to the expected digest; if this validation fails, it means that the URL now points to a different file from the one that was added to the Ent index.
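
Since an entry may list more than one URL, one possible client-side strategy is to try each candidate in turn and accept the first whose content still matches. The sketch below illustrates that idea only: whether the Ent CLI falls back across URLs in this way is an assumption, and `resolve` and its `fetch` callback are hypothetical names.

```python
import hashlib
from typing import Callable, Iterable


def resolve(digest: str, urls: Iterable[str],
            fetch: Callable[[str], bytes]) -> bytes:
    """Return the content of the first URL that still matches `digest`.

    `fetch` is a hypothetical callback that downloads a URL and returns its bytes.
    """
    _, _, expected_hex = digest.partition(":")
    for url in urls:
        try:
            data = fetch(url)
        except OSError:
            continue  # URL unreachable: try the next candidate
        if hashlib.sha256(data).hexdigest() == expected_hex:
            return data
        # Otherwise the URL now serves different content: treat it as stale.
    raise LookupError(f"no listed URL currently serves {digest}")
```

A stale or unreachable URL costs only an extra download attempt here; it can never cause the wrong content to be returned.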

Updating the index

Currently, entries may be added to the default index by creating a comment in tiziano88/ent-index#1 containing the URL of the file to index. A GitHub Action is then triggered that fetches the file, creates an appropriate entry in the index, and commits it back into the Git repository.

You can try this by picking the URL of an existing file and posting it as a comment in tiziano88/ent-index#1; after a few minutes, the GitHub Action should post another comment in reply, confirming that the entry was correctly incorporated in the index and printing out its digest, which may then be used with the Ent CLI as above.

If the URL stops pointing to the file that was originally indexed, the Ent CLI will detect that and produce an error.

There is no process for cleaning up / fixing inconsistent entries in the index (yet).

Comparison with other systems

IPFS

https://ipfs.io/

IPFS aims to be a fully decentralized and censorship-resistant protocol and ecosystem, which heavily relies on content-addressability.