Should mime just use the MIME sniffing algorithm? #106

Open
seanmonstar opened this issue Jan 15, 2019 · 8 comments

Comments

@seanmonstar
Member

The target domain of the mime crate is webdev. Instead of following the original RFCs (as is done now), perhaps it's best to just use the sniffing algorithm that is now used by web browsers.

@seanmonstar
Member Author

cc @nox @SimonSapin @rustonaut

@SimonSapin
Contributor

https://mimesniff.spec.whatwg.org/ is called "MIME Sniffing" and contains a "parse a MIME type" algorithm that is relevant.

But "sniffing" refers to looking at the contents of a file or the body of an HTTP response (in addition to other signals) to make a guess at the actual file format, in case the Content-Type header is missing or unspecific or inaccurate. For example, if the first 6 bytes of a file are GIF89a in ASCII it’s very probably a GIF, especially if it’s used in <img>. That spec also has algorithms for this.

This kind of sniffing can be useful, but I don’t know if it should be in scope for this crate.
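As a rough illustration of the content sniffing described above (as opposed to parsing a header value), a minimal Rust sketch might check the leading bytes of a body for a GIF signature. The function name `looks_like_gif` is hypothetical and not part of the mime crate; the spec's image matching covers many more formats than this.

```rust
/// Hypothetical helper (not part of the `mime` crate): guesses whether a
/// byte slice is a GIF by looking only at its first six bytes.
fn looks_like_gif(body: &[u8]) -> bool {
    // Both GIF signatures are plain ASCII at the very start of the file.
    body.starts_with(b"GIF89a") || body.starts_with(b"GIF87a")
}
```

A real sniffer would also weigh the Content-Type header and the context in which the resource is used, which this sketch ignores.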

@seanmonstar
Member Author

Sorry, I don't mean sniffing the body bytes, just using the parse algorithm mentioned in that document.
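To make the distinction concrete, here is a simplified Rust sketch of the type/subtype portion of the mimesniff "parse a MIME type" algorithm. It is an assumption-laden illustration, not the crate's implementation: `parse_type_and_subtype` and the helper names are hypothetical, and parameter parsing (everything after the first `;`) is left out entirely.

```rust
/// HTTP whitespace as defined by the WHATWG specs: LF, CR, TAB, or SPACE.
fn is_http_whitespace(c: char) -> bool {
    matches!(c, '\n' | '\r' | '\t' | ' ')
}

/// HTTP token code point: ASCII alphanumeric or one of !#$%&'*+-.^_`|~
fn is_http_token_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~".contains(c)
}

/// True when `s` is non-empty and made up only of HTTP token code points.
fn is_token(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_http_token_char)
}

/// Hypothetical, simplified parser: returns the lowercased (type, subtype)
/// pair, or `None` on failure. Parameters after ';' are ignored here.
fn parse_type_and_subtype(input: &str) -> Option<(String, String)> {
    // Strip leading and trailing HTTP whitespace.
    let input = input.trim_matches(is_http_whitespace);
    // Everything before the first '/' is the type; a missing '/' is a failure.
    let (type_, rest) = input.split_once('/')?;
    if !is_token(type_) {
        return None;
    }
    // Everything up to the first ';' is the subtype, minus trailing whitespace.
    let subtype = rest
        .split(';')
        .next()
        .unwrap_or("")
        .trim_end_matches(is_http_whitespace);
    if !is_token(subtype) {
        return None;
    }
    Some((type_.to_ascii_lowercase(), subtype.to_ascii_lowercase()))
}
```

Under these rules `text/*` parses successfully, since `*` is an HTTP token code point, while `text/` and a bare `text` are failures.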

@seanmonstar
Member Author

seanmonstar commented Jan 15, 2019

So, looking through the test cases, I noticed this as a valid MIME type:

!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Something I appreciate in the API in mime/master is the difference between MediaType and MediaRange. They allow things like text/* to be a MediaRange, but not MediaType. That combined with headers::ContentType would help prevent setting a frankly bogus content-type header (even though mimesniff says to parse it).
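The idea behind that split can be sketched roughly as follows. The types below are hypothetical stand-ins written for this illustration, not the actual definitions in mime/master, and they skip token validation for brevity.

```rust
/// Hypothetical stand-in for a concrete media type such as `text/html`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MediaType {
    type_: String,
    subtype: String,
}

/// Hypothetical stand-in for a range such as `text/*` or `*/*`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MediaRange {
    type_: String,
    subtype: String,
}

impl MediaType {
    /// Rejects wildcards, so `text/*` can never become a `MediaType`.
    fn new(type_: &str, subtype: &str) -> Option<Self> {
        if type_.is_empty() || subtype.is_empty() || type_ == "*" || subtype == "*" {
            return None;
        }
        Some(MediaType {
            type_: type_.to_ascii_lowercase(),
            subtype: subtype.to_ascii_lowercase(),
        })
    }
}

impl MediaRange {
    /// Accepts wildcards in either position.
    fn new(type_: &str, subtype: &str) -> Option<Self> {
        if type_.is_empty() || subtype.is_empty() {
            return None;
        }
        Some(MediaRange {
            type_: type_.to_ascii_lowercase(),
            subtype: subtype.to_ascii_lowercase(),
        })
    }

    /// A range matches a type when each part is equal or is `*`.
    fn matches(&self, media_type: &MediaType) -> bool {
        (self.type_ == "*" || self.type_ == media_type.type_)
            && (self.subtype == "*" || self.subtype == media_type.subtype)
    }
}
```

With a split like this, an API that sets a Content-Type header could accept only `MediaType`, so a wildcard value would be ruled out at compile time rather than at parse time.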

So I'm torn.

@seanmonstar
Member Author

After some more thought, the advantages of just following what the Fetch spec wants outweigh having the MediaType and MediaRange split.

So, the new plan is to remove the split, only having Mime again, and only supporting the mimesniff parsing algorithm.

@nox
Contributor

nox commented Jan 30, 2019

The closer it is to the mimesniff algorithm, the more we can make use of it.

@nox
Contributor

nox commented Jan 30, 2019

What would be useful too is a way to represent just the essence of a mime type, because many specs have prose about that.
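In the mimesniff spec, the essence of a MIME type is its type, followed by `/`, followed by its subtype, with parameters left out. One possible shape for such a value is sketched below; the `Essence` type and its methods are hypothetical and not something the mime crate provides.

```rust
/// Hypothetical type (not part of the `mime` crate) holding only the
/// lowercased "type/subtype" of a MIME type, with parameters dropped.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Essence(String);

impl Essence {
    /// Builds an essence from already-parsed parts; lowercasing makes
    /// comparisons ASCII case-insensitive.
    fn new(type_: &str, subtype: &str) -> Self {
        Essence(format!(
            "{}/{}",
            type_.to_ascii_lowercase(),
            subtype.to_ascii_lowercase()
        ))
    }

    /// Returns the essence as a string slice, e.g. "text/html".
    fn as_str(&self) -> &str {
        &self.0
    }

    /// Mirrors spec prose of the form "if the essence is text/html ...".
    fn is_html(&self) -> bool {
        self.0 == "text/html"
    }
}
```

Because two MIME types that differ only in parameters or letter case produce equal `Essence` values, checks written against spec prose reduce to a plain equality comparison.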

@ghostd

ghostd commented Oct 28, 2020

Hi,

Is there a way to expose both parsers (RFC and mimesniff)? I'd like to make some Servo tests pass, so I need to follow the mimesniff algorithm. @SimonSapin has already implemented it in rust-url (though it is not officially exposed by the crate). Should I duplicate the code in Servo, or can I help here?

Regards
