Proposal: manifest file for the IPFS gateway #6214

Closed
ItalyPaleAle opened this issue Apr 13, 2019 · 21 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature status/accepted This issue has been accepted topic/gateway Topic gateway

Comments

@ItalyPaleAle

ItalyPaleAle commented Apr 13, 2019

Problem

Some people (myself included) are running websites through IPFS, and relying on gateways (like Cloudflare's) to serve them to users via HTTP(S).

The current IPFS protocol has some limitations when doing this. The biggest one is the inability to set custom headers that a HTTP web server might need, starting from Content-Type.

Proposed solution

I propose we create a manifest file that can be stored inside each folder added to IPFS. The manifest file could be a YAML (or JSON) document, called for example .ipfs-gateway.yaml, containing additional metadata that is relevant to IPFS gateways only. For example:

# Version of this manifest format
version: 1

# Add rules to specific files/patterns
files:
  - name: 'logo.svg'
    contentType: 'image/svg+xml' # Set the Content-Type header
  - name: 'images/*.dng' # Use glob-style patterns
    contentType: 'image/x-adobe-dng'
    contentDisposition: 'attachment' # Set the Content-Disposition header
  - name: 'index.html'
    contentSecurityPolicy: '...' # Set the Content-Security-Policy header
    etag: 'abcdef123' # Set the ETag header
    contentLanguage: 'en-us' # Set the Content-Language header
  - name: 'redirect.html'
    redirect: 'other-page.html' # Set the Location header (requires a 3xx status code)

# Configure additional options for the gateway
options:
  # Redirects HTTP -> HTTPS traffic
  alwaysUseHTTPS: true

When the IPFS gateway serves a folder, it needs to check if there's a manifest file, and apply the rules configured in it.

The manifest allows adding certain HTTP headers to files served by the gateway. We should explicitly whitelist the allowed headers, as in shared gateways there could be issues with other apps (e.g. imagine someone deployed an app that enabled HSTS, and that would impact the entire gateway).

The manifest file should be placed in the root of the folder added to IPFS. Since it's just another document published through the IPFS network, a change in the manifest file would result in the entire folder having a completely different hash, and this is by design.
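As a rough illustration of the rule matching and header allowlisting described above, here is a minimal Python sketch. It assumes the JSON form of the manifest; the `ALLOWED_KEYS` mapping, the `headers_for` helper, and the `strictTransportSecurity` key are hypothetical names made up for this example and are not part of any gateway implementation.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical allowlist: manifest keys (taken from the example above)
# mapped to the HTTP headers they set. Which keys are safe is an assumption.
ALLOWED_KEYS = {
    "contentType": "Content-Type",
    "contentDisposition": "Content-Disposition",
    "contentSecurityPolicy": "Content-Security-Policy",
    "etag": "ETag",
    "contentLanguage": "Content-Language",
}


def headers_for(manifest: dict, path: str) -> dict:
    """Return the headers the manifest assigns to `path` (relative to the folder root)."""
    headers = {}
    for rule in manifest.get("files", []):
        # Note: fnmatch's `*` also matches `/`, which may be broader than intended.
        if not fnmatchcase(path, rule.get("name", "")):
            continue
        for key, value in rule.items():
            header = ALLOWED_KEYS.get(key)
            if header is not None:  # keys outside the allowlist are silently ignored
                headers[header] = value
    return headers


manifest = json.loads("""
{
  "version": 1,
  "files": [
    {"name": "logo.svg", "contentType": "image/svg+xml"},
    {"name": "images/*.dng", "contentType": "image/x-adobe-dng",
     "contentDisposition": "attachment"},
    {"name": "index.html", "strictTransportSecurity": "max-age=31536000"}
  ]
}
""")

print(headers_for(manifest, "images/photo.dng"))
```

The last rule shows the allowlist at work: a key such as the hypothetical `strictTransportSecurity` is dropped, so one app cannot enable HSTS for a shared gateway.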

Alternative proposals

There have been many users asking to implement custom metadata/headers for files inside IPFS, including on ipfs-inactive/faq#224. I believe that, while the ask was for the ability to add metadata to files published on IPFS, what users actually want or need could be better satisfied by a proposal like this.

Compared to adding support for metadata in IPFS, this proposal has many pros:

  • It's easier to implement as it doesn't require changes to the IPFS protocol, and users on old versions of IPFS would simply ignore the manifest.
  • The manifest file would be picked up by the IPFS gateway only, and that's good. Users who just request documents via the CLI don't need metadata anyways.
  • Adding metadata to each file is cumbersome when you try to add/pin files using the IPFS APIs and CLIs. For example, you couldn't just do ipfs add -r folder/ anymore, and the CLI would become complex fast.
  • The manifest file is extensible in the future, should we want to use it for other gateway configuration.
  • The manifest file becomes part of the folder published on IPFS, so a change in the document would lead to the folder having a completely different hash. I believe this should be considered a feature, as it maintains the immutability principle of IPFS.
  • Lastly, manifest files can be checked into source control together with the web app published on IPFS.

The cons:

  • It requires adding another file to the folder published
  • It doesn't work for files published on IPFS that aren't part of a folder
@magik6k
Member

magik6k commented Apr 16, 2019

I like this approach but we have to be careful:

  • We should only allow headers that are safe to set
  • UnixFSv2 might start moving again soon; we might want to use it for some of this

@Stebalien Stebalien added the kind/enhancement A net-new feature or improvement to an existing feature label Apr 18, 2019
@dokterbob
Contributor

dokterbob commented May 5, 2019

We've been waiting for any sort of progress on UnixFS for a very long time. It's starting to get to the point where I'd rather have something that works now than something that is always at some point in the future. Specifically: mimetype support.

Concretely, there haven't been any changes to UnixFSv2 for over a year!
https://github.com/ipfs/unixfs-v2

@ItalyPaleAle
Author

I am not very familiar with UnixFSv2, but echoing @dokterbob I am wondering too why this would have a dependency on that?

@lanzafame lanzafame added the topic/gateway Topic gateway label Jul 16, 2019
@Stebalien Stebalien added the status/accepted This issue has been accepted label Nov 16, 2019
@Stebalien
Member

We've done some soul searching recently on mime-types and realized that, really, they probably don't belong in the filesystem itself anyways. Even if we did store them in the filesystem, you're right about tooling.

Given that, I like this proposal and would be happy to accept a patch (although, to be realistic, I may take a while to review it).

Changes from the original proposal:

  • I wouldn't support alwaysUseHTTPS.
    • Using HTTPS should be a gateway-level option.
    • Really, all public gateways should use HTTPS.
  • I would use JSON. While it sucks to write, this is a part of a protocol. We can always add toml/yaml support later if this becomes too much trouble.

Thank you @ItalyPaleAle for taking the time to think this through.

@olizilla
Member

Very related:

@rhyeal

rhyeal commented Nov 26, 2019

I am in favor of something like this specification for gateways. One of our issues in adopting IPFS for our frontend is that we serve HSTS preload and CSP headers on our domains, making it necessary for these headers to be present on any gateway solution. While this can be solved by making our own gateway, that defeats the purpose of the decentralized nature of IPFS.

I do wonder about integrating HTTPS and custom certificates into gateways as well. Does anyone have a solution to that? Perhaps a DNS-level solution for the gateway to provide a valid certificate?

EDIT:

  • I support the alwaysUseHTTPS toggle as having valid HTTP -> HTTPS upgrades on the connection is required for HSTS preload websites
  • I would use YML, not JSON. While I understand the "purity" of JSON, YML is a commonly accepted configuration language (see: Serverless.com, Swagger, AWS CloudFormation), especially relevant to the group of programmers that would use IPFS gateways for content

@ItalyPaleAle
Author

@rhyeal I think that @Stebalien's point, which I do subscribe to, is that all decisions related to the transport layer, such as enabling or enforcing TLS, or adding HSTS, should be made outside of the gateway. Indeed, you likely don't want the ipfs-gateway directly exposed on the Internet; you should proxy it with Nginx or something similar. You can then enable HTTPS and HSTS on the proxy.

@kevincox

Note that gateways are also used for serving sites on custom domains. For these domains the user may want to enforce things such as HSTS. I think that there should be a suggested set of allowed headers for shared-domain gateways, while custom-domain gateways can use whatever headers they would like.
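The split suggested here could be sketched as follows in Python. The contents of `SHARED_DOMAIN_ALLOWED` and the `filter_headers` helper are assumptions for illustration only; the thread does not define the actual set of allowed headers.

```python
# Hypothetical set of headers considered safe on a shared-domain gateway.
# The exact contents are an assumption, not something agreed in this thread.
SHARED_DOMAIN_ALLOWED = frozenset(
    {"content-type", "content-disposition", "content-language", "etag"}
)


def filter_headers(requested: dict, custom_domain: bool) -> dict:
    """Keep every header on a custom domain, only allowlisted ones otherwise."""
    if custom_domain:
        return dict(requested)
    return {
        name: value
        for name, value in requested.items()
        if name.lower() in SHARED_DOMAIN_ALLOWED  # header names are case-insensitive
    }


requested = {
    "Content-Type": "text/html",
    "Strict-Transport-Security": "max-age=31536000",
}
print(filter_headers(requested, custom_domain=False))
print(filter_headers(requested, custom_domain=True))
```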

@kevincox

I would also highly recommend JSON for the manifest as it is ubiquitous. YAML parsers have notable variation in what they accept and could cause compatibility problems. We can consider automatically compiling YAML to JSON on upload; however, the actual protocol should be JSON.

@ItalyPaleAle
Author

@kevincox I disagree on HSTS. It should be something set at the infrastructure level and not at the app level. Say you deploy an app, serve it on that custom domain, and enable HSTS. If the next version of the app does not contain that key in the manifest, you can't easily roll back HSTS.

As for format... My preference would be to support both YAML and JSON if possible. We could use the same file and parse it with a YAML parser: since YAML is a superset of JSON, there shouldn't be issues.

@kevincox

I guess I can be convinced about HSTS specifically; however, it would be nice if there was a portable way to specify this in case I ever need to move between gateways. Still, I can see the argument that it can be infrastructure-specific.

@kevincox

kevincox commented Feb 28, 2020

I think the advantages of supporting YAML in the protocol are small compared to letting the user write YAML and convert it to JSON before storing.

Note that YAML also has security concerns including insecure features (often disabled by default but this is a footgun that will keep happening over time) as well as Denial of Service options (a small YAML file can expand to consume arbitrary amounts of RAM). In the end you need to restrict to a subset of YAML which will cause confusion, interoperability concerns and security vulnerabilities (when people fail to make this restriction).

I strongly recommend that we stick to something simple (like JSON) for the protocol end of things.

I agree that YAML can be simpler to write, however we can benefit from that while still keeping the protocol efficient, simple and safe.
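To make the "simple and safe" argument concrete, here is a minimal Python sketch of a defensive JSON manifest loader. The size cap, the `load_manifest` name, and the version check are assumptions for illustration; the thread does not specify any of them. Because JSON has no anchors or aliases, the parsed structure cannot grow far beyond the input size, so a byte cap bounds memory use.

```python
import json

MAX_MANIFEST_BYTES = 64 * 1024  # hypothetical cap; no limit is specified in this thread
SUPPORTED_VERSIONS = {1}        # matches `version: 1` in the original example


def load_manifest(raw: bytes) -> dict:
    """Parse a JSON manifest defensively; raise ValueError on anything unexpected."""
    if len(raw) > MAX_MANIFEST_BYTES:
        raise ValueError("manifest too large")
    # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
    manifest = json.loads(raw.decode("utf-8"))
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    if manifest.get("version") not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported manifest version")
    return manifest


print(load_manifest(b'{"version": 1, "files": []}'))
```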

@zebateira

This is looking really interesting.
Regarding redirects: web apps can perform redirects client-side, but for SEO purposes a server-side redirect would be preferred.
So it would be nice to be able to set a status code in the config:

... 

# Add rules to specific files/patterns
files:
  - name: 'redirect.html'
    redirect: 'other-page.html' # Set the Location header (requires a 3xx status code)
    statusCode: 302 # default could be 301

...
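A gateway could interpret such a rule roughly as in the Python sketch below. The `redirect_for` helper and the set of accepted status codes are assumptions; only the `redirect` and `statusCode` keys and the suggested default of 301 come from the snippet above.

```python
from typing import Optional, Tuple

# Standard HTTP redirect status codes that carry a Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_for(rule: dict) -> Optional[Tuple[int, str]]:
    """Return (status, location) for a manifest rule, or None if it is not a redirect."""
    target = rule.get("redirect")
    if target is None:
        return None
    status = rule.get("statusCode", 301)  # default suggested in the comment above
    if status not in REDIRECT_CODES:
        raise ValueError(f"statusCode {status} is not a redirect status")
    return status, target


print(redirect_for({"name": "redirect.html", "redirect": "other-page.html", "statusCode": 302}))
print(redirect_for({"name": "redirect.html", "redirect": "other-page.html"}))
```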

@lidel
Member

lidel commented Jan 9, 2021

Related feature request from DNSLink website backed by go-ipfs gateway:

  • enable website owners to define the X-Frame-Options header to protect their page from clickjacking / cross-site request forgery (CSRF)

@lidel
Member

lidel commented Mar 2, 2021

Another specific pattern that manifest could help with is ensuring that specific paths are loaded from original HTTP server (selectively disabling DNSLink redirect to local node).

The idea here is to ease the transition from centralized backend to decentralized websites.
For a while we will live in a hybrid reality, where things like this happen:

[Screenshot: 2021-03-02--01-18-41]

@bmann
Contributor

bmann commented Mar 11, 2021

Thanks @lidel for mentioning this thread.

Our interest is in giving developers the ability to customize the apps they host on IPFS.

For Single Page Apps, a default root file is the big thing, and for us that would be the best starting point.

Rather than coming up with a spec from scratch, looking specifically at what hosting providers, buildpacks https://buildpacks.io, and others already use as common command patterns would be a good starting point, if you're trying to model it on a hosting syntax (which I think we should).

For example, Heroku Static Buildpack https://github.com/heroku/heroku-buildpack-static

So, for SPA routing, this looks like:

{
  "routes": {
    "/*.html": "index.html",
    "/route/**": "bar/baz.html"
  }
}

This is JSON. For the record, we have a YAML file for our CLI, and lots of cloud config stuff is in YAML :P We shouldn't support both -- just adopt one.
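The `routes` mapping shown above could be resolved along the lines of the Python sketch below. The glob semantics used here (`*` stays within one path segment, `**` crosses segments, first match wins) are an assumption for illustration and may differ from what the Heroku static buildpack actually does; `glob_to_regex` and `resolve` are hypothetical helpers.

```python
import re
from typing import Optional


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a route glob into a regex: `**` crosses `/`, `*` does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def resolve(routes: dict, path: str) -> Optional[str]:
    """Return the target of the first route whose pattern matches the whole path."""
    for pattern, target in routes.items():
        if glob_to_regex(pattern).fullmatch(path):
            return target
    return None


routes = {"/*.html": "index.html", "/route/**": "bar/baz.html"}
print(resolve(routes, "/about.html"))
print(resolve(routes, "/route/a/b"))
print(resolve(routes, "/docs/about.html"))
```

Under these assumed semantics, `/docs/about.html` matches neither route, so the gateway would fall through to its normal lookup.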

I posted back last August about this on the Discourse, but didn't end up going back to it yet https://discuss.ipfs.io/t/support-for-spa-mode-in-http-gateway/8953

We also intend to encourage users to include a PWA Manifest in all apps, which would be another location that some commands could go.

I'd like to see some of these commands go into the default gateway, because if everyone just uses nginx to proxy, then apps on IPFS won't be portable.

@saurik

saurik commented Mar 11, 2021

@bmann (I might, and frankly probably am, misunderstanding what you are suggesting; I'm casually following some of these related issues with interest, as I am waiting on some of the functionality in order to be able to move all of my websites to IPFS, but I'm not too in-depth on IPFS in general.) FWIW, I'd argue the opposite: using an off-the-shelf format like that is going to make people expect that whatever the current documentation for that format--including potentially any minor breaking changes they make for their community--should be supported by IPFS (including any complex syntax involving variable references and the such, which seems important for being able to support redirects, and which I'd argue is the biggest blocker for porting existing small sites, a la ipfs/notes#339 and #7392; to do this well seems like it would require at least regex backreferences). In my "perfect world", IPFS wouldn't #include some other large, complex product behavior as part of what should eventually be a standardizable specification with numerous implementations.

@bmann
Contributor

bmann commented Mar 12, 2021

@saurik it's pretty minimal. I'm not saying "include it", I'm saying "the customer" of developers wanting to host apps on IPFS over https are going to be used to something that looks like this.

IPFS is closest to a "PaaS" experience like what buildpacks deliver. The platform is the protocol plus the HTTPS gateway. It supports 404s in a unique way and ... that's about it?

The flip side of this is -- do nothing, and everyone implements this as custom Nginx proxy rules (which partially are included in these static buildpacks, too).

So: let's work together to be inspired by standards like this, and look at other static hosting providers and their templates for inspiration. Do we want to book a call to discuss live? Should we open a HackMD to play with some example formats? Let's move this forward!

@lidel
Member

lidel commented Apr 12, 2021

Pulse check: I feel the recent Cloudflare announcement makes it easier for us to standardize on _redirects files for websites on IPFS (see rationale and links to docs and examples in #7392 (comment)).

How do we feel about removing redirects from the scope of the manifest file?
Anyone feeling strongly about this either way?

Just to be clear:

  • The manifest is still something we want to add as a means for customizing HTTP headers.
    • The manifest should work only on DNSLink websites and subdomain gateways (when a distinct origin is present).
      • Rationale: I don't believe the ability to set arbitrary HTTP headers can be exposed securely on path gateways, and its actual real-world utility is around more complex websites, which need a separate origin by definition. By explicitly scoping this to environments with their own Origin, we can allow for way more than we ever could otherwise.
  • Due to this, I don't think manifest needs to include redirects. Those are separate use cases.
    • This also makes it easier to support redirects everywhere (manifest would be limited to origins, but _redirects could work on path gw, subdomain gw, and dnslink origin)
    • We could ship support for _redirects independently of the manifest work.
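The scoping described in the list above can be sketched in Python as follows. The `gateway_mode` helper and the `example-gateway.test` host are hypothetical; the sketch assumes `host` is the request hostname without a port, and that subdomain gateways use the `<id>.ipfs.<gateway-host>` / `<id>.ipns.<gateway-host>` form.

```python
def gateway_mode(host: str, gateway_host: str) -> str:
    """Classify a request as 'path', 'subdomain', or 'dnslink' from its hostname."""
    host = host.lower()
    gateway_host = gateway_host.lower()
    if host == gateway_host:
        return "path"  # e.g. gateway-host/ipfs/<cid>/..., one origin shared by all content
    if host.endswith((".ipfs." + gateway_host, ".ipns." + gateway_host)):
        return "subdomain"  # each root gets its own origin
    return "dnslink"  # assumption: any other hostname is a DNSLink website


# Where each mechanism would apply under the scoping proposed above.
MANIFEST_MODES = {"subdomain", "dnslink"}
REDIRECTS_MODES = {"path", "subdomain", "dnslink"}

for host in ("example-gateway.test", "bafyexample.ipfs.example-gateway.test", "docs.example.org"):
    mode = gateway_mode(host, "example-gateway.test")
    print(host, mode, mode in MANIFEST_MODES, mode in REDIRECTS_MODES)
```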

@lidel
Member

lidel commented Nov 15, 2021

Another data point: Netlify has a _headers file (docs) which complements the _redirects file discussed in #7392 (comment)
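For readers unfamiliar with that format, a `_headers` file lists a path on an unindented line followed by indented `Name: value` lines. The Python sketch below parses that basic shape only; `parse_headers_file` is a hypothetical helper, and details such as wildcards or repeated header names are deliberately left out.

```python
def parse_headers_file(text: str) -> dict:
    """Parse a Netlify-style _headers file into {path: {header: value}}."""
    rules = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not line[0].isspace():
            # An unindented line starts a new path section.
            current = rules.setdefault(stripped, {})
        elif current is not None and ":" in stripped:
            # Indented lines are headers; split on the first colon only.
            name, value = stripped.split(":", 1)
            current[name.strip()] = value.strip()
    return rules


sample = """\
# protect the app from being framed
/index.html
  X-Frame-Options: DENY
  Content-Security-Policy: default-src 'self'
/images/*
  Cache-Control: public, max-age=3600
"""
print(parse_headers_file(sample))
```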

@lidel
Member

lidel commented Nov 18, 2021

Either way, we need specs first. Let's consolidate and continue in ipfs/specs#257

@lidel lidel closed this as completed Nov 18, 2021