URL store #4896
This is beautiful, thank you. ❤️ |
Reading through this and thinking a bit, it seems like the filestore/urlstore logic should be self-contained entirely in a special chunker. The fact that it's leaking into the dagbuilderhelper importer stuff appears to be unnecessary. Cleaning that up might make it easier to integrate both the filestore and the urlstore in the same codebase. Thinking more: for this to work, the importer would need to be changed so that it accepts a chunker with an interface that gives it just a cid and a length (and the chunker would become responsible for putting the blocks to the datastore). Switching things up might actually make it easier for us to optimize the importing process by adding batched writes of the file being chunked in the chunker itself.
It would be great if this allowed passing cid/multihash params, as that would make it easier to build bridges to IPLD things.
filestore/pb/dataobj.proto
Outdated
@@ -4,4 +4,5 @@ message DataObj { | |||
optional string FilePath = 1; | |||
optional uint64 Offset = 2; | |||
optional uint64 Size = 3; | |||
optional bool URL = 4; |
I'd make this an enum
Agreed. Making this a type enum makes more sense and allows for better future extensibility.
Alternatively, we could:
1. Rename FilePath to Location.
2. Use URLs for everything and use the file:/// scheme for files.
3. Unify the commands.
This should make this trivial to extend via plugins.
(don't bother if this leads to a bunch of additional work)
(2) Use URLs for everything and use the file:/// scheme for files.
This will require a migration. Alternatively, if I am correct that paths are stored as absolute paths, we can use that to distinguish between URLs and files.
(3) Unify the commands.
This makes sense since they are essentially doing the same thing, and I don't think it will be too hard.
This will require a migration. Alternatively, if I am correct that paths are stored as absolute paths, we can use that to distinguish between URLs and files.
I assume they're absolute. And yes, that'd be a reasonable way forward.
Actually, they're not. However, the string http:// or https:// will not appear in any normalized file path, as // will always be simplified to '/'. So I still think checking for an http:// or https:// prefix is a valid way forward and can make this patch fairly small and non-invasive.
Attempting to bring this up to date with master to potentially resolve some issues that IA is seeing (https://github.com/protocol/collab-internet-archive/issues/13). Most of the conflicts seem to be caused (or made worse) by unstable import sort order. Is this not something gofmt takes care of for us?
I've rebased it. |
Tests are failing, I didn't have time to check why. |
not finding |
side note: this is probably a (long) separate discussion, but the goimports behavior with gx seems to introduce a nontrivial overhead with constantly reordering imports based on the hash, possible to patch it to sort based on alias instead? for example, this rebase could be done automatically ( |
Yeah... #4831 |
still don't have push access, unbroken here: #5096 |
Sorry that I've missed it. |
Hi guys, in order to ensure readiness for the demo at DWeb summit (7/30-ish), we need to get this merged. The scenario I am particularly concerned about is breaking changes introduced at the last minute in master that would require panicked rebasing, a sure recipe for demofail (precedent: the rebase in #4896 (comment) was required because accessing unixfs objects broke, thus breaking serving the static frontend of the IA viewer). By @bigs's estimate there's about an engineer-week of work to get this in good shape (probably less for minimum viable shape). Can we get some eng time allocated to this? I 100% understand that the go-ipfs team is stretched thin already, but this is a relatively small linchpin that holds together a lot of other work, and not getting it done would jeopardize an important opportunity to showcase IPFS to the world.
Could I get some background on what led to this PR and why it is needed? If I understand correctly, it allows storing a URL in place of an actual object. When it comes time to retrieve the object, the URL is fetched instead. I can see how this can cause all sorts of problems, not the least of which is that URLs can be very unstable and there is no guarantee that the content will stay around. Retrieving the content could also be very slow.
This is designed for cases where the user controls both the server and the IPFS node. Basically, it's treating a remote HTTP server as a filesystem. Specifically, the Internet Archive needs this so they can serve their files over IPFS without moving their entire infrastructure over to IPFS.
Looks good for now. We can make it extensible later.
importer/helpers/dagbuilder.go
Outdated
@@ -48,6 +48,8 @@ type DagBuilderParams struct { | |||
// NoCopy signals to the chunker that it should track fileinfo for | |||
// filestore adds | |||
NoCopy bool | |||
|
|||
URL string |
Comment?
filestore/filestore_test.go
Outdated
} | ||
if IsURL("adir/afile") { | ||
t.Fatal("IsURL recognized non-url") | ||
} |
Probably should add some tests that look more like URLs (e.g., http:/ /a/file).
This LGTM. Already thinking of some changes we'll want for the future (like a cache in front of the http fetch), but this works for now!
This will be the first thing merged in after 0.4.16 lands. The rush to get it merged was simply so nothing breaks it, right @parkan ?
@whyrusleeping should I rebase this now or wait until 0.4.16 lands (to avoid having to rebase it a second time) |
@kevina I'd just wait until 0.4.16 lands. I don't think anything else is going to land between then and now, but just in case. |
merging first thing after 0.4.16 sgtm -- ideally this would be before the dev days (planning to demo there and would prefer to do it from master) thanks again @kevina! |
@parkan you're welcome. I'll have this rebased within a day after 0.4.16 lands. Whether that is before the dev conf depends on @whyrusleeping.
License: MIT Signed-off-by: Jakub Sztandera <[email protected]>
License: MIT Signed-off-by: Kevin Atkinson <[email protected]>
@whyrusleeping @parkan just rebased now that it looks like we finally got a release out. |
0.4.16 has been released \o/ |
choo choo! |
My personal thanks to everyone that worked on this.
…On Fri, Jul 13, 2018, 5:09 PM Whyrusleeping ***@***.***> wrote:
Merged #4896 <#4896>.
I think the next steps here will be to add some sort of cache in front of the urlstore, maybe even disk backed. But before we do that, we should probably add metrics of some sort to get an idea of how well it performs. |
This is an experimental URL store feature that needs some discussion/design work before we even consider merging it. I'm opening this PR at @diasdavid's request so we don't lose track of it.