Pinning and Garbage Collection #28

Closed
achingbrain opened this issue Jan 27, 2023 · 2 comments · Fixed by #36

Context

Garbage collection in js-ipfs originally followed the go-ipfs model whereby pins were stored in a big DAG that was traversed to work out which blocks could be deleted and which couldn't while running garbage collection.

ipfs/js-ipfs#2771 changed that to store the pins in the datastore instead of a big DAG, which yielded a massive speed-up when adding new pins, but garbage collection was still slow because the algorithm has to walk every pinned DAG to build up a list of the blocks in those DAGs.

Helia gives us an amazing opportunity to solve that slow garbage collection problem. This would be incredibly valuable to pinning services, for example, which typically don't garbage collect anything because their blockstores are so large that the time it takes to run GC makes it impractical.

Gotchas

  • js-ipfs GC is a stop-the-world model, i.e. the blockstore cannot be used while GC runs, otherwise an add operation could write blocks that GC then immediately deletes. Coming up with a clever way to avoid this would be greatly appreciated
  • Two CIDs with different versions and/or codecs can have the same multihash - if both are pinned the removal of one pin should not delete the blocks for the other
  • If the application crashes while creating a pin, it should mark the pin as failed
  • Manually deleting blocks from the blockstore should be prevented when the block being deleted is part of a pinned DAG
  • Datastore keys are used for storing pin metadata - datastores can be backed by case-insensitive filesystems, so keys must not rely on case to be unique
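
The multihash and key-casing gotchas can be explored together with a small sketch. It assumes keys are derived from the multihash alone and encoded as unpadded upper-case base32 (RFC 4648), so CIDs that share a multihash map to the same key and the key does not depend on case. The `pinKey` helper is hypothetical, and a multihash is a plain byte array here rather than a real `multiformats` object:

```typescript
// Upper-case base32 alphabet from RFC 4648; it contains no lower-case
// characters, so keys stay unique on a case-insensitive filesystem.
const ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'

// Encode bytes as unpadded upper-case base32.
function base32Upper (bytes: Uint8Array): string {
  let value = 0 // bit buffer, only the low bits are ever read
  let bits = 0 // number of unread bits currently in the buffer
  let out = ''

  bytes.forEach((byte) => {
    value = ((value << 8) | byte) & 0xffff
    bits += 8

    // emit one character for every complete group of 5 bits
    while (bits >= 5) {
      out += ALPHABET.charAt((value >>> (bits - 5)) & 31)
      bits -= 5
    }
  })

  // left-align any remaining bits in a final 5-bit group
  if (bits > 0) {
    out += ALPHABET.charAt((value << (5 - bits)) & 31)
  }

  return out
}

// Hypothetical helper (not part of any existing API): build a datastore key
// from the multihash only, ignoring the CID version and codec.
function pinKey (multihash: Uint8Array): string {
  return `/pin/${base32Upper(multihash)}`
}

// Stand-in bytes for a multihash (the ASCII bytes of "foobar")
const exampleMultihash = new Uint8Array([0x66, 0x6f, 0x6f, 0x62, 0x61, 0x72])
const exampleKey = pinKey(exampleMultihash)
```

Whether base32 is the right encoding is a design choice for the implementation; the point is only that the key alphabet contains a single case.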

Interface

An interface to the pinning system might look like this (somewhat similar to js-ipfs):

import { CID } from 'multiformats/cid'
import type { AbortOptions } from '@libp2p/interfaces'

enum PinStatus {
  /**
   * All blocks in the pin have been stored in the blockstore
   */
  pinned = 'pinned',

  /**
   * The pin is being created, blocks in the DAG are still being fetched from
   * the network
   */
  pending = 'pending',

  /**
   * Not all blocks could be fetched from the network - this is usually because
   * the abort signal passed into the `pin.add` operation emitted its `abort` event.
   */
  failed = 'failed'
}

interface AddOptions extends AbortOptions {
  /**
   * When pinning a DAG, Helia will ensure that all blocks in the DAG are present in
   * the blockstore which may involve network operations. By default Helia will traverse
   * the entire DAG but pass a depth here to limit that behaviour.
   */
  depth?: number

  /**
   * A user-chosen name for the pin
   */
  name?: string

  /**
   * User-specific metadata for the pin
   */
  metadata?: Record<string, string | number | boolean>

  /**
   * Receives progress events
   */
  progress?: (evt: Event) => void
}

interface RmOptions extends AbortOptions {
  /**
   * Receives progress events
   */
  progress?: (evt: Event) => void
}

interface LsOptions extends AbortOptions {
  type?: PinType
}

interface Pin {
  /**
   * The current status of the pin
   */
  status: PinStatus

  /**
   * The pinned CID
   */
  cid: CID

  /**
   * The pin name
   */
  name?: string

  /**
   * `Infinity` for a recursive pin, 1 for a direct pin or an arbitrary number
   */
  depth: number

  /**
   * User-specific metadata for the pin
   */
  metadata: Record<string, string | number | boolean>
}

interface Pinning {
  /**
   * Pin the block that corresponds to the passed CID. If the DAG in the pinned block
   * contains CIDs, the blocks corresponding to those CIDs will also be pinned.  Pass
   * `{ depth: 1 }` to only pin the top level block.
   */
  add: (cid: CID, opts?: AddOptions) => Promise<void>

  /**
   * Unpin the block that corresponds to the passed CID. If the DAG in the pinned block
   * contains CIDs, the blocks corresponding to those CIDs will also be unpinned.  Pass
   * `{ direct: true }` to only unpin the top level block.
   */
  rm: (cid: CID, opts?: RmOptions) => Promise<void>

  /**
   * List all pins stored by this node
   */
  ls: (opts?: LsOptions) => AsyncGenerator<Pin>
}

interface GCOptions {
  /**
   * Receives progress events
   */
  progress?: (evt: Event) => void
}

interface Helia {
  // ...other methods here...

  /**
   * Run garbage collection on this node - any blocks that are not pinned will be deleted
   */
  gc: (opts?: GCOptions) => Promise<void>

  /**
   * The pinning API
   */
  pin: Pinning
}
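
To illustrate how the `depth` option could bound which blocks a pin covers, here is a minimal sketch. It assumes depth counts levels with the root at level 1, matching the "`Infinity` for a recursive pin, 1 for a direct pin" comment above. The DAG is an in-memory map of string CIDs and `blocksForPin` is a hypothetical helper, not part of the proposed interface:

```typescript
// Stand-in for a DAG: maps a block's CID (a plain string in this sketch)
// to the CIDs that block links to.
type Links = Map<string, string[]>

// Collect the blocks covered by a pin of the given depth, breadth-first.
// A depth of 1 covers only the root block, Infinity covers the whole DAG.
function blocksForPin (root: string, depth: number, links: Links): Set<string> {
  const covered = new Set<string>()
  let level = [root]

  for (let d = 0; d < depth && level.length > 0; d++) {
    const next: string[] = []

    level.forEach((cid) => {
      if (covered.has(cid)) {
        return // already reached via another path
      }

      covered.add(cid)
      next.push(...(links.get(cid) ?? []))
    })

    level = next
  }

  return covered
}

// root links to a and b, which both link to c
const dag: Links = new Map<string, string[]>([
  ['root', ['a', 'b']],
  ['a', ['c']],
  ['b', ['c']],
  ['c', []]
])

const direct = blocksForPin('root', 1, dag) // root only
const twoLevels = blocksForPin('root', 2, dag) // root, a and b
const recursive = blocksForPin('root', Infinity, dag) // every block
```

A real implementation would fetch missing blocks from the network while walking, which is where the `pending` and `failed` statuses come in.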

Strategies

Some benchmarking will be required to choose the appropriate pinning strategy. The benchmarks should store several hundred thousand pins of varying depths before running GC.

Classic

  • Store a /pin/${cid.multihash} object for each pin:
interface Pin {
  /**
   * A user friendly name for the pin
   */
  name?: string

  /**
   * `Infinity` for a recursive pin, 1 for a direct pin or an arbitrary number
   */
  depth: number

  /**
   * User-specific metadata for the pin
   */
  metadata: Record<string, string | number | boolean>

  /**
   * The codec from the CID that was pinned
   */
  codec: number

  /**
   * The version from the CID that was pinned
   */
  version: number
}
  • When running GC the CID is recreated from the version & codec from the pin and the multihash from the pin datastore key
  • All recreated CIDs are traversed, a set of all pinned CIDs is created, then all blocks in the blockstore that do not have CIDs with multihashes corresponding to the pinned CIDs are deleted
  • Really slow!
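
The classic strategy is essentially mark-and-sweep. A rough sketch under simplifying assumptions: both stores are in-memory maps keyed by multihash strings, so recreating the CID from the stored codec and version is skipped, and `classicGc`, `StoredPin` and `Block` are illustrative names only:

```typescript
// In-memory stand-ins; multihashes are plain strings in this sketch.
interface StoredPin { depth: number, codec: number, version: number }
interface Block { links: string[] } // multihashes of child blocks

function classicGc (pins: Map<string, StoredPin>, blockstore: Map<string, Block>): string[] {
  const pinned = new Set<string>()

  // Mark: walk every pinned DAG. This is the expensive part, as the cost
  // grows with the total size of all pinned DAGs.
  pins.forEach((pin, rootMultihash) => {
    const seen = new Set<string>() // per-pin, as other pins may be shallower
    let level = [rootMultihash]

    for (let d = 0; d < pin.depth && level.length > 0; d++) {
      const next: string[] = []

      level.forEach((mh) => {
        if (seen.has(mh)) {
          return
        }

        seen.add(mh)
        pinned.add(mh)
        next.push(...(blockstore.get(mh)?.links ?? []))
      })

      level = next
    }
  })

  // Sweep: delete every block whose multihash was not marked
  const deleted: string[] = []

  blockstore.forEach((_block, mh) => {
    if (!pinned.has(mh)) {
      deleted.push(mh)
    }
  })
  deleted.forEach((mh) => { blockstore.delete(mh) })

  return deleted
}

const classicBlocks = new Map<string, Block>([
  ['root', { links: ['child'] }],
  ['child', { links: [] }],
  ['orphan', { links: [] }]
])
const classicPins = new Map<string, StoredPin>([
  ['root', { depth: Infinity, codec: 0x70, version: 1 }]
])

const classicDeleted = classicGc(classicPins, classicBlocks)
```

The mark phase has to finish before any block can be deleted, which is why this approach needs the stop-the-world behaviour described in the gotchas.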

Reference counting

  • Store a /pin/${cid} object for each pin:
interface Pin {
  /**
   * A user friendly name for the pin
   */
  name?: string

  /**
   * `Infinity` for a recursive pin, 1 for a direct pin or an arbitrary number
   */
  depth: number

  /**
   * User-specific metadata for the pin
   */
  metadata: Record<string, string | number | boolean>
}
  • Also store an object in the datastore for every pinned block referenced by the multihash of the block, e.g. '/pinned-block/${cid.multihash}'
interface PinnedBlock {
  pinCount: number
  pinnedBy: CID[]
}
  • When a block is pinned, increment pinCount by 1, creating the PinnedBlock entry if necessary
  • Also store the root CID that is being pinned so the user can be informed of which pin to remove in order to delete the block
  • When a block is unpinned, decrement pinCount - if it's then zero, remove the /pinned-block/${mh} key from the datastore
  • When running GC, delete any block without a corresponding PinnedBlock entry in the datastore - this should be nicely parallelizable
  • When using the helia.blockstore.delete method, checking only for the presence of a PinnedBlock entry should be sufficient to prevent a pinned block from being deleted accidentally
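
The reference counting steps above can be sketched as follows. This is a simplified model rather than the eventual implementation: the stores are in-memory maps keyed by multihash strings, depth limits are ignored, and `pin`, `unpin`, `gc` and `deleteBlock` are hypothetical free functions:

```typescript
interface PinnedBlock {
  pinCount: number
  pinnedBy: string[] // root CIDs, plain strings in this sketch
}

interface State {
  blockstore: Map<string, string[]> // multihash -> child multihashes
  pinnedBlocks: Map<string, PinnedBlock> // stands in for /pinned-block/${mh}
}

// Every block reachable from the root (whole DAG, no depth limit)
function reachable (state: State, root: string): string[] {
  const seen = new Set<string>()
  const queue = [root]

  while (queue.length > 0) {
    const mh = queue.pop()

    if (mh === undefined || seen.has(mh)) {
      continue
    }

    seen.add(mh)
    queue.push(...(state.blockstore.get(mh) ?? []))
  }

  return Array.from(seen)
}

function pin (state: State, root: string): void {
  reachable(state, root).forEach((mh) => {
    // create the entry if necessary, then count this pin
    const entry: PinnedBlock = state.pinnedBlocks.get(mh) ?? { pinCount: 0, pinnedBy: [] }
    entry.pinCount++
    entry.pinnedBy.push(root)
    state.pinnedBlocks.set(mh, entry)
  })
}

function unpin (state: State, root: string): void {
  reachable(state, root).forEach((mh) => {
    const entry = state.pinnedBlocks.get(mh)
    const index = entry?.pinnedBy.indexOf(root) ?? -1

    if (entry === undefined || index === -1) {
      return // this block was not pinned by the given root
    }

    entry.pinCount--
    entry.pinnedBy.splice(index, 1)

    if (entry.pinCount === 0) {
      state.pinnedBlocks.delete(mh)
    }
  })
}

// GC needs no traversal: each block is checked independently
function gc (state: State): string[] {
  const deleted: string[] = []

  state.blockstore.forEach((_links, mh) => {
    if (!state.pinnedBlocks.has(mh)) {
      deleted.push(mh)
    }
  })
  deleted.forEach((mh) => { state.blockstore.delete(mh) })

  return deleted
}

// Guarded manual delete: refuse while any pin references the block
function deleteBlock (state: State, mh: string): void {
  const entry = state.pinnedBlocks.get(mh)

  if (entry !== undefined) {
    throw new Error(`Block ${mh} is pinned by ${entry.pinnedBy.join(', ')}`)
  }

  state.blockstore.delete(mh)
}

const state: State = {
  blockstore: new Map<string, string[]>([
    ['rootA', ['shared']],
    ['rootB', ['shared']],
    ['shared', []],
    ['orphan', []]
  ]),
  pinnedBlocks: new Map<string, PinnedBlock>()
}

pin(state, 'rootA')
pin(state, 'rootB')
unpin(state, 'rootA') // 'shared' is still referenced by rootB
const refCountDeleted = gc(state)
```

The trade-off is that the traversal cost moves from GC time to pin and unpin time, and the counts have to be kept consistent if the process stops part-way through an operation.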

Something else?

We are open to suggestions, but all implementations should be benchmarked.

tinytb added this to the v1 milestone Feb 3, 2023
achingbrain added a commit that referenced this issue Feb 20, 2023
Adds the pinning API as specified in #28 - see that issue for discussion

Benchmarks incoming!

Closes: #28
achingbrain added a commit that referenced this issue Feb 24, 2023
Adds the pinning API with the reference counting implementation as
specified in #28 - see that issue for discussion

Benchmarks prove reference counting is faster than DAG traversal during
gc, see [this comment for results &
discussion](#36 (comment)).

Closes: #28
github-actions bot pushed a commit that referenced this issue Mar 23, 2023
## @helia/interface-v1.0.0 (2023-03-23)

### Features

* add bitswap progress events ([#50](#50)) ([7460719](7460719)), closes [#27](#27)
* add pinning API ([#36](#36)) ([270bb98](270bb98)), closes [#28](#28) [#36 (comment)](https://github.com/ipfs/helia/pull/36#issuecomment-1441403221)
* initial implementation ([#17](#17)) ([343d360](343d360))

### Bug Fixes

* extend blockstore interface ([#55](#55)) ([42308c0](42308c0))
* make all helia args optional ([#37](#37)) ([d15d76c](d15d76c))
* survive a cid causing an error during gc ([#38](#38)) ([5330188](5330188))
* update block events ([#58](#58)) ([d33be53](d33be53))
* update blocks interface to align with interface-blockstore ([#54](#54)) ([202b966](202b966))

### Dependencies

* update interface-store to 5.x.x ([#63](#63)) ([5bf11d6](5bf11d6))

### Trivial Changes

* add release config ([a1c7ed0](a1c7ed0))
* fix ci badge ([50929c0](50929c0))
* release main ([#62](#62)) ([2bce77c](2bce77c))
* update logo ([654a70c](654a70c))
* update publish config ([913ab6a](913ab6a))
* update release please config ([b52d5e3](b52d5e3))
@github-actions

🎉 This issue has been resolved in version @helia/interface-v1.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

github-actions bot pushed a commit that referenced this issue Mar 23, 2023
## helia-v1.0.0 (2023-03-23)

### Features

* add bitswap progress events ([#50](#50)) ([7460719](7460719)), closes [#27](#27)
* add pinning API ([#36](#36)) ([270bb98](270bb98)), closes [#28](#28) [#36 (comment)](https://github.com/ipfs/helia/pull/36#issuecomment-1441403221)
* initial implementation ([#17](#17)) ([343d360](343d360))

### Bug Fixes

* make all helia args optional ([#37](#37)) ([d15d76c](d15d76c))
* survive a cid causing an error during gc ([#38](#38)) ([5330188](5330188))
* update blocks interface to align with interface-blockstore ([#54](#54)) ([202b966](202b966))
* use release version of libp2p ([#59](#59)) ([a3a7c9c](a3a7c9c))

### Trivial Changes

* add release config ([a1c7ed0](a1c7ed0))
* fix ci badge ([50929c0](50929c0))
* release main ([#62](#62)) ([2bce77c](2bce77c))
* update logo ([654a70c](654a70c))
* update publish config ([913ab6a](913ab6a))
* update release please config ([b52d5e3](b52d5e3))
* use wildcards for interop test deps ([29b4fb0](29b4fb0))

### Dependencies

* update interface-store to 5.x.x ([#63](#63)) ([5bf11d6](5bf11d6))
* update sibling dependencies ([ac28d38](ac28d38))
@github-actions

🎉 This issue has been resolved in version helia-v1.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
