Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Refactor to a plugin/middleware architecture #167

Open
7 tasks
SgtPooki opened this issue Jan 14, 2025 · 7 comments · May be fixed by #169
Open
7 tasks

Proposal: Refactor to a plugin/middleware architecture #167

SgtPooki opened this issue Jan 14, 2025 · 7 comments · May be fixed by #169
Assignees

Comments

@SgtPooki
Copy link
Member

SgtPooki commented Jan 14, 2025

I've already mentioned this a few times in Helia working group and to @2color and @achingbrain directly, but I wanted to formalize a plan.

Summary

The current VerifiedFetch class (in verified-fetch.ts) handles various content types (dag-cbor, dag-pb, etc.) but has grown large and somewhat monolithic. It uses a procedural structure (if/switch) that makes adding new codecs and behaviors cumbersome. This proposal introduces a plugin/pipeline approach to make VerifiedFetch more extensible and maintainable.

Current issues

Monolithic VerifiedFetch class

All logic for handling IPNS records, CAR files, DAG-CBOR, raw blocks, etc., is in a single class. This makes it difficult to add new handlers (e.g., new codecs, new output formats) without modifying the core class.

Hardcoded “if/else” logic

We have many if (accept === '...') branches, or switch statements keyed by CID codec. The logic is scattered throughout large private methods, reducing clarity and making changes or additions risky.

Limited Extensibility

Users and future contributors cannot easily plug in their own custom handlers. Hard-coded references to existing handlers make it impossible to replace them without forking the repo.

This is very relevant for #86. If we implement the below changes, it would be trivial to add dir-index-listing support.

Proposed changes

Define a plugin interface

example:

interface FetchHandlerPlugin {
  canHandle (options: FetchHandlerFunctionArg): boolean
  handle (options: FetchHandlerFunctionArg): Promise<Response | undefined>
}
  1. A plugin can check if it wants to handle the request (canHandle) based on options (CID codec, Accept header, path, etc.).
  2. If it can handle the request, it returns a Response. Otherwise, undefined (so other plugins can try).

Break out each handler into its own plugin

Convert handleIPNSRecord, handleCar, handleTar, handleJson, handleDagCbor, handleDagPb, and handleRaw into separate plugin/middleware classes implementing the above interface.

Example:

export class IpnsRecordPlugin implements FetchHandlerPlugin {
  canHandle({ accept }: FetchHandlerFunctionArg, context: VerifiedFetchContext): boolean {
    return accept === 'application/vnd.ipfs.ipns-record'
  }
  async handle(options: FetchHandlerFunctionArg, context: VerifiedFetchContext): Promise<Response> {
    // implementation
  }
}

Create a pipeline in VerifiedFetch's fetch method

  1. In the VerifiedFetch constructor, store an array of plugins in the desired priority order.
  2. In fetch(), iterate through that array:
for (const plugin of this.plugins) {
  if (plugin.canHandle(handlerArgs, this.context)) {
    const response = await plugin.handle(handlerArgs, this.context)
    if (response != null) {
      // apply final headers, e.g. ETag, Cache-Control, etc.
      return this.handleFinalResponse(response)
    }
  }
}
  1. If no plugin handles the request, return a default “not supported” response.

Share utilities with a “context”

  • Common utilities (e.g., this.getBlockstore, handleServerTiming) could be passed to each plugin via a shared context object.
  • This prevents duplication and avoids scattering logic throughout the code.

Allow external/custom plugins

  1. Make the plugin array configurable (e.g., via the constructor or an addPlugin() method).
  2. Users can add or override plugins without touching core code.

Benefits

Better maintainability and clarity

  • Each handler’s logic resides in a dedicated, self-contained plugin.
  • The VerifiedFetch class focuses on orchestration (resolving resources, final headers, plugin pipeline) rather than encoding/decoding details.

Easily extensible

  • New data formats or custom request handling can be added by implementing a new plugin.
  • Existing plugins can be replaced or removed without modifying the rest of the code.

Improved Testability

  • Individual plugins can be tested in isolation.
  • The core fetch() method (the pipeline) can be tested separately from the specific codec logic.

Open Questions

Plugin registration

  • Should we fully configure the plugin array in the constructor only, or allow an addPlugin() method that can be called post-construction?
  • Should we think about making the initial parse resource more extensible, or leave that for now since it's required processing before moving to the handlers?
  • How can we ensure that individual plugins/middleware can be easily extended without needing to override everything? e.g. handleDagPb has a try/catch where dirIndexHtml was added in the catch method.. How do we implement canHandle in a future handleDagPb plugin so that if finding index.html fails, it can move on to a handleDirIndexHtml? I feel like for this to work we would need a context that can evolve and be modified as certain processing is done. (i.e. handleDagPb.canHandle=true, but fails due to being a directory and missing an index.html)

Prioritization

  • How do we handle potential overlap among plugins? For instance, if two plugins both return true from canHandle(), which gets priority? Can we ensure there is no overlap?
  • What is the LCD of data needed to pass to plugins for them to discern whether they should take over a request or not? Based on the current code, I think FetchHandlerFunctionArg combined with the determined accept header should be sufficient.
  • Do we want to see if there are multiple plugins that canHandle prior to handle-ing a request?

Performance considerations

  • @achingbrain previously mentioned we should do some performance measurements prior to this change, but are there any performance-critical pathways we should measure? The current code doesn't lend itself to easy benchmarking

Next Steps

Gather feedback

Please share your thoughts on the plugin interface design, naming conventions, and plugin priorities.

Implementation

When there is consensus, we’ll proceed to:

  1. Extract each existing handler method into a plugin class.
  2. Replace the internal “big switch” in fetch() with a plugin pipeline.
  3. Ensure existing tests pass and add new tests for each plugin.

Feel free to comment or suggest changes. Once we finalize the approach, we can open a PR with incremental commits for easier review.

@SgtPooki SgtPooki changed the title Proposal: Refactor to a plugin/middleware Architecture Proposal: Refactor to a plugin/middleware architecture Jan 14, 2025
@2color
Copy link
Member

2color commented Jan 15, 2025

This seems reasonable, in particular, in seems that this would improve testability and maintainability.

Extensibility remains an open question for me though. In most contexts extensibility, especially runtime extensibility allows for adaptability by users. However, in this case, Verified Fetch aims to adhere as closely as possible to the gateway specs, so I wonder if this is really a bundling question (gorilla banana problem) rather then extensibility per se. In other words, it seems that it's modularity we seek rather than extensibility.

Since this came up in the context of directory listing, do you see html rendering of directories as part of Verified Fetch's scope?

I ask because the specs are pretty vague:

While implementations decide on the way HTML directory listing is generated and presented to the user, following below suggestions is advised.
Linking to alternative response types such as CAR and dag-json allows clients to consume directory listings programmatically without the need for parsing HTML.
https://specs.ipfs.tech/http-gateways/path-gateway/#generated-html-with-directory-index

The spec isn't strict about what you do in the case of no explicit Accept header and a directory terminal element without an index.html.

Might this in part be a problem with the specs that aren't specific enough? On one hand, it may make sense to return a structured response with the directory listing (when there's no index.html), allowing consumers to process and render directory listings html. However, this deviates from how UnixFS is handled in the gateway spec (where mostly bytes are streamed to the user). On the other hand, if you don't return something useful, you waste useful work you've done and require another roundtrip by consumers.

@SgtPooki
Copy link
Member Author

Since this came up in the context of directory listing, do you see html rendering of directories as part of Verified Fetch's scope?

I don't.

The whole goal of this change isn't just to satisfy the dir-index-html listing.. but implementing that is difficult, which speaks to the lack of flexibility of verified-fetch for any changes we want to make. Improving testability, maintainability, and ensuring the library can be extended -- to add dir-index-html listing, or to extend for any unforeseen future use-cases -- are all important and enabled by the proposed changes here.

In other words, it seems that it's modularity we seek rather than extensibility.

Shmemantics.. I think the library also needs modularity, but being able to extend the library, even with core functionality, is important. Verified-fetch is in a unique position that's a little different than a typical library.. we want devs to be able to benefit from it, but we also depend on it very directly with two projects that may conflict with typical verified-fetch programmatic use-cases.

With that said, it sounds like you feel that adding an addPlugin method is not desirable. I can get on board with that, but I do think there are cases where helia-http-gateway and sw-gateway will need to alter the functionality of verified-fetch beyond the base programmatic use-case.

@2color
Copy link
Member

2color commented Jan 15, 2025

With that said, it sounds like you feel that adding an addPlugin method is not desirable. I can get on board with that, but I do think there are cases where helia-http-gateway and sw-gateway will need to alter the functionality of verified-fetch beyond the base programmatic use-case.

I was mostly curious to better understand how plugins would be concretely used. Based on the proposed interface, plugins are essentially handlers for CID codec/Accept tuples (simplifying a bit here). That clearly sounds useful. What are some of the use-cases this would be used for in the helia-http-gateway?

@SgtPooki
Copy link
Member Author

SgtPooki commented Jan 16, 2025

Based on the proposed interface, plugins are essentially handlers for CID codec/Accept tuples (simplifying a bit here)

yep, pretty much a "state machine" except instead of hardcoding a simple map where all things are known, we can flexibly call a method for each handler and let them decide if it's appropriate for them to handle the request.

we could go more of the express-middleware route where we just call "handle" on all plugins and maybe next() to give other handlers an opportunity to do things.. but I think that doesn't quite fit the needs here.

What are some of the use-cases this would be used for in the helia-http-gateway?

I don't have a full vision for all the potential benefits, but I know the flexibility discussed in this issue would allow a variety of future pivots. A few things off the top of my head for helia-http-gateway specifically:

  • have a more robust dir-index-html listing (sw-gateway needs to focus on a smaller library size).
  • It could expand on how it handles some requests in ways that may not be possible, or optimal, in a service-worker.
  • change priority of certain handling logic to optimize as needs arise.

@SgtPooki
Copy link
Member Author

One thing I would like to resolve before starting this work is whether benchmarking is needed or if these changes are desirable enough, regardless of the cost.

What types of performance or benchmarks can we use to establish a basic understanding of how long it takes verified fetch to do its work (outside of networking delays) so we can ensure there is no considerable degradation with these changes?

If we do want to do some benchmarking, I can request some of the fixtures (in the interop package) over 1000+ runs and see where the average processing time is. What fixtures should we check? I think the most time-consuming requests are:

  1. DNSlink (network is the bottleneck, and parseResource won't be changing with this proposal, really.. can we benchmark this..?)
  2. peerId (network is the bottleneck, and parseResource won't be changing with this proposal, really.. can we benchmark this..?)
  3. handleDagPb (checking for index.html, building dir-index-html..)

The rest of the handlers are fairly lightweight.. but some of the tar/car stuff could probably use benchmarking.

I don't imagine many of the handlers' logic will be changing with these proposed changes, though. It would encapsulate the same logic elsewhere, but anything could go wrong. We will be switching from a simple if expression to multiple function executions, which will be slightly, but likely not noticeably, slower.


Honestly, I think these changes will make verified-fetch much better, and spending time on benchmarking would be unnecessary given all the potential benefits.

@2color
Copy link
Member

2color commented Jan 17, 2025

What types of performance or benchmarks can we use to establish a basic understanding of how long it takes verified fetch to do its work (outside of networking delays) so we can ensure there is no considerable degradation with these changes?

My hunch is that that networking delays account for most of the delay. So I'm tempted to say —in the spirit of shipping— let's go through with this and if needed introduce benchmarks later.

I don't think we're at the performance optimisation stage yet, for which benchmarking would be helpful. The tracing work that's mostly done serves us better right now in terms of understanding the time distribution in the request lifecycle and allows us to later be more targeted in our benchmarking, if needed.

@SgtPooki SgtPooki self-assigned this Jan 22, 2025
@SgtPooki
Copy link
Member Author

i've got car, tar, raw, and ipns record handle* methods moved over to plugins now and almost have all tests passing:

  358 passing (308ms)
  2 pending
  1 failing

  1) @helia/verifed-fetch
       ?format
         raw?format=dag-json should be able to override curl/browser default accept header when query parameter is provided:

      AssertionError: expected 501 to equal 200
      + expected - actual

      -501
      +200
      
      at Context.<anonymous> (file:///Users/sgtpooki/code/work/ipshipyard/ipfs/helia-verified-fetch/packages/verified-fetch/test/verified-fetch.spec.ts:805:30)

@SgtPooki SgtPooki linked a pull request Jan 22, 2025 that will close this issue
3 tasks
@SgtPooki SgtPooki linked a pull request Jan 22, 2025 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants