-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove hex encoding from flatfs #39
Conversation
@whyrusleeping base64url encoding will unfortunately not work on case insensitive filesystems. You could just hex encode the raw key but that will lead to an overly long file name. This is the one place in the datastore that I believe should be special cased. It should decode the base64url key and reencode it to hex. The end result should be that the filenames will remain unchanged unless the block was effected by the path.Clean bug. |
Then we can use base32 encoding in go-ipfs to make sure that paths are safe for case insensitive file systems or do hex in go-ipfs. base32 saves 20% over hex (hex gives increase of 200% and base32 gives 160%). |
I would prefer we special case it here, rather than increasing the key size everywhere with base32. (base64 only increases the size by 133%) This could have an impact if we use lots of keys in the datastore such as with ipfs/kubo#2860. The datastore key size could also impact the size of the filestore database. The flatfs filesystem is already special cased in a way to expect block keys, so I don't see it as a major problem if we special case it to expect base64url encoded block store keys. This will also have a side effect of enforcing that the keys are are in the correct format. |
Yeah, but we could encode them in base32 in go-ipfs and save them straight to disk in that form. No re-encoding. If we do base64 to hex on disk we have 160% * 200% = 320% increase, if we do base64 decode re-encode to base32 or hex we are encoding two times. I can't see how little bit of RAM usage, while saving the key (+20%, which gives 20 bytes more on stack on average) outweighs fact that we would have to do encode, decode, and encode again. |
@whyrusleeping what do you think? Your pull ipfs/kubo#2860 request will be most likely effected by the key size increase. I have stated my preference and so has @Kubuxu. The wiki article (https://en.wikipedia.org/wiki/Base32) nicely sums up the advantages and disadvantages over Base32 vs Base64 or hex. |
This PR is blocking pretty much everything else at this point ( blocking the migrations code which is blocking the encoding fixing which is blocking a bunch of other PRs) So lets make a decision ASAP |
I think it is good idea. We might want to have it check that path contains only valid characters if Debug mode is active. |
I don't particular agree that base32 is the best way to go. But I don't strongly object either. I can understand the simplicity of avoiding the re-encoding in the flatfs datastore; however, you still have to manipulate the key to get ride of the leading '/'. My preference would of been to avoid a massive renaming of flatfs files, but it is not that big of a deal. I also don't particular like the inclusion of the pad character, but it seams the standard go library doesn't not provide an option to encode it without. Also I looked it up and '=' is a valid character in windows filename (but someone should double test this). The '=' might cause problems if unquoted in Unix. For reference here are the key sizes of a sha256 multihash in various encoding: base16 = 68 bytes |
@kevina I also don't like fact that key are bigger but we have to encode them to something. It can be base32 or Hex, in memory most keys are stored in binary, only on the datastore side they will be stored in base32. In FAT* filesystems |
subscribing to this issue, so that I can migrate js-ipfs-repo to do the same. |
@Kubuxu like I said, I don't strongly object to just using base32 (as oppose to base64url). To get rid of the padding we will either need to write our own encode/decoder for base32 or remove and add back the '=' manually. If we don't add the '=' back we will get a "error: illegal base32 data at input byte 32". |
@whyrusleeping if we decide we want to remove the '=' in the base32 encoding, I can handle that later today. I already spent some time thinking about what needs to be done. |
The failing tests are because some of the datastore impls dont have everything vendored. |
I didn't do vendoring for packages for testing so we would have to install packages from |
the flatfs code is vendored properly |
Tests pass locally. |
RFM? |
RFM, I will look into fixing tests in separate PR. |
|
||
func (fs *Datastore) encode(key datastore.Key) (dir, file string) { | ||
safe := hex.EncodeToString(key.Bytes()[1:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whyrusleeping, note that the point of key.Bytes()[1:]
this was to remove the '/' prefix'...
with the changes being made to go-ipfs to base64 encode all keys (previously they were binary). It is safe to assume that inputs to flatfs are path safe.
This PR isnt strictly necessary, but it avoids an extra layer of encoding
cc @Kubuxu @kevina