URL store #4896

Stebalien · 2018-03-29T18:55:27Z

This is an experimental URL store feature that needs some discussion/design work before we even consider merging it. I'm opening this PR at @diasdavid's request so we don't lose track of it.

kyledrake · 2018-03-29T23:08:00Z

This is beautiful, thank you. ❤️

whyrusleeping · 2018-03-30T09:19:53Z

Reading through this and thinking a bit, it seems like the filestore/urlstore logic should be self-contained entirely in a special chunker. The fact that it's leaking into the dagbuilderhelper importer stuff appears to be unnecessary.

Cleaning that up might make it easier to integrate both the filestore and the urlstore in the same codebase.

-- more thinking --

For this to work, the import would need to be changed so that it accepts a chunker with an interface that gives it just a cid and a length (and the chunker would become responsible for putting the blocks to the datastore). Switching things up might actually make it easier for us to optimize the importing process by adding batched writes of the file being chunked in the chunker itself.

magik6k

It would be great if this allowed passing cid/multihash params, as that would make it easier to build bridges to ipld things.

magik6k · 2018-03-30T09:14:09Z

filestore/pb/dataobj.proto

@@ -4,4 +4,5 @@ message DataObj {
        optional string FilePath = 1;
        optional uint64 Offset = 2;
        optional uint64 Size = 3;
+        optional bool URL = 4;


I'd make this an enum

Agreed. Making this a type enum makes more sense and allows for better future extensibility.

Alternatively, we could:

1. Rename FilePath to Location.
2. Use URLs for everything and use the file:/// scheme for files.
3. Unify the commands.

This should make this trivial to extend via plugins.

(don't bother if this leads to a bunch of additional work)

(2) Use URLs for everything and use the file:/// scheme for files.

This will require a migration. Alternatively, if I am correct that paths are stored as absolute paths, we can use that to distinguish between URLs and files.

(3) Unify the commands.

This makes sense since they are essentially doing the same thing, and I don't think it will be too hard.

This will require a migration. Alternatively, if I am correct that paths are stored as absolute paths, we can use that to distinguish between URLs and files.

I assume they're absolute. And yes, that'd be a reasonable way forward.

Actually, they're not. However, the string http:// or https:// will not appear in any normalized file path, as // will always be simplified to /. So I still think checking for an http:// or https:// prefix is a valid way forward and can make this patch fairly small and non-invasive.

parkan · 2018-06-05T16:23:44Z

attempting to bring this up to date with master to potentially resolve some issues that IA is seeing (https://github.com/protocol/collab-internet-archive/issues/13)

most of the conflicts seem to be caused (or made worse) by unstable import sort order, is this not something gofmt takes care of for us?

Kubuxu · 2018-06-05T17:28:07Z

I've rebased it.

Kubuxu · 2018-06-06T03:13:29Z

Tests are failing, I didn't have time to check why.

parkan · 2018-06-06T04:04:12Z

not finding gx/ipfs/QmWo8jYc19ppG7YoTsrr2kEtLRbARTJho5oNXFTR6B7Peq/go-ipfs-chunker, can I get push access to the repo so I can diagnose without PRing from my fork?

parkan · 2018-06-06T04:05:56Z

side note: this is probably a (long) separate discussion, but the goimports behavior with gx seems to introduce a nontrivial overhead with constantly reordering imports based on the hash, possible to patch it to sort based on alias instead?

for example, this rebase could be done automatically (--theirs semantics) if not for non-deterministic sorting

Stebalien · 2018-06-06T04:37:14Z

side note: this is probably a (long) separate discussion, but the goimports behavior with gx seems to introduce a nontrivial overhead with constantly reordering imports based on the hash, possible to patch it to sort based on alias instead?

Yeah... #4831

parkan · 2018-06-07T21:14:59Z

still don't have push access, unbroken here: #5096

Kubuxu · 2018-06-08T15:20:18Z

Sorry that I've missed it.

parkan · 2018-06-21T20:56:35Z

hi guys, in order to ensure readiness for the demo at DWeb summit (7/30-ish), we need to get this merged

the scenario I am particularly concerned about is breaking changes introduced at the last minute in master that would require panicked rebasing, a sure recipe for demofail (precedent: the rebase in #4896 (comment) was required because accessing unixfs objects broke, thus breaking serving the static frontend of the IA viewer)

by @bigs's estimate there's about an engineer-week of work to get this in good shape (probably less for minimum viable shape)

can we get some eng time allocated to this? 100% understand that go-ipfs team is stretched thin already, but this is a relatively small linchpin that holds together a lot of other work and not getting it done would jeopardize an important opportunity to showcase IPFS to the world

@Kubuxu @whyrusleeping

kevina · 2018-06-21T21:27:44Z

Could I get some background on what led to this PR and why it is needed?

If I understand what is going on, this allows storing a URL in place of an actual object. When it comes time to retrieve the object, the URL is fetched instead. I can see how this can cause all sorts of problems, not the least of which is that URLs can be very unstable and there is no guarantee that the content will stay around. Retrieving the content could also be very slow.

Stebalien · 2018-06-21T22:24:30Z

If I understand what is going on, this allows storing a URL in place of an actual object. When it comes time to retrieve the object, the URL is fetched instead. I can see how this can cause all sorts of problems, not the least of which is that URLs can be very unstable and there is no guarantee that the content will stay around. Retrieving the content could also be very slow.

This is designed for cases where the user controls both the server and the IPFS node. Basically, it's treating a remote HTTP server as a filesystem.

Specifically, the internet archive needs this so they can serve their files over IPFS without moving their entire infrastructure over to IPFS.

Stebalien

Looks good for now. We can make it extensible later.

Stebalien · 2018-06-29T22:22:04Z

importer/helpers/dagbuilder.go

@@ -48,6 +48,8 @@ type DagBuilderParams struct {
 	// NoCopy signals to the chunker that it should track fileinfo for
 	// filestore adds
 	NoCopy bool
+
+	URL string


Stebalien · 2018-06-29T22:23:23Z

filestore/filestore_test.go

+	}
+	if IsURL("adir/afile") {
+		t.Fatal("IsURL recognized non-url")
+	}


Probably should add some tests that look more like URLs (e.g., http:/ /a/file).

whyrusleeping

This LGTM. Already thinking of some changes we'll want for the future (like a cache in front of the http fetch), but this works for now!

This will be the first thing merged in after 0.4.16 lands. The rush to get it merged was simply so nothing breaks it, right @parkan ?

kevina · 2018-06-30T04:02:35Z

@whyrusleeping should I rebase this now or wait until 0.4.16 lands (to avoid having to rebase it a second time)?

whyrusleeping · 2018-06-30T06:34:42Z

@kevina I'd just wait until 0.4.16 lands. I don't think anything else is going to land between then and now, but just in case.

parkan · 2018-07-01T01:26:26Z

merging first thing after 0.4.16 sgtm -- ideally this would be before the dev days (planning to demo there and would prefer to do it from master)

thanks again @kevina!

kevina · 2018-07-01T01:35:32Z

@parkan you're welcome. I will have this rebased within a day after 0.4.16 lands. Whether that is before the dev conf depends on @whyrusleeping.

kevina · 2018-07-13T13:17:20Z

@whyrusleeping @parkan just rebased now that it looks like we finally got a release out.

daviddias · 2018-07-13T13:46:48Z

0.4.16 has been released \o/

whyrusleeping · 2018-07-13T15:09:05Z

choo choo!

kyledrake · 2018-07-13T15:35:42Z

My personal thanks to everyone that worked on this.

parkan · 2018-07-16T12:31:52Z

whyrusleeping · 2018-07-18T02:17:50Z

I think the next steps here will be to add some sort of cache in front of the urlstore, maybe even disk backed. But before we do that, we should probably add metrics of some sort to get an idea of how well it performs.
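One possible shape for that follow-up, sketched in Go under stated assumptions: an in-memory cache wrapped around a fetch function, with hit and miss counters standing in for the metrics. The names cachedFetcher, fetchFunc, and the key format are hypothetical and do not exist in go-ipfs. The cache is unbounded and not disk backed, so this only illustrates the layering.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc retrieves the bytes for a key, for example a URL plus byte range.
type fetchFunc func(key string) ([]byte, error)

// cachedFetcher is a hypothetical in-memory cache in front of a fetcher.
// It counts hits and misses so cache effectiveness can be measured.
type cachedFetcher struct {
	mu     sync.Mutex
	fetch  fetchFunc
	cache  map[string][]byte
	hits   int
	misses int
}

func newCachedFetcher(fetch fetchFunc) *cachedFetcher {
	return &cachedFetcher{fetch: fetch, cache: make(map[string][]byte)}
}

// Get returns the cached bytes for key, fetching them on a miss.
// The lock is held during the fetch, which serializes requests; that keeps
// the sketch short but would be a bottleneck in practice.
func (c *cachedFetcher) Get(key string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if data, ok := c.cache[key]; ok {
		c.hits++
		return data, nil
	}
	c.misses++
	data, err := c.fetch(key)
	if err != nil {
		return nil, err
	}
	c.cache[key] = data
	return data, nil
}

func main() {
	calls := 0
	c := newCachedFetcher(func(key string) ([]byte, error) {
		calls++
		return []byte("block for " + key), nil
	})
	for i := 0; i < 2; i++ {
		if _, err := c.Get("http://example.com/afile@0+10"); err != nil {
			panic(err)
		}
	}
	fmt.Println(calls, c.hits, c.misses) // 1 1 1
}
```

Collecting the counters first, as suggested above, would show whether repeated reads are common enough for a cache to pay off.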

Stebalien added the status/deferred and status/blocked labels Mar 29, 2018

Stebalien requested a review from Kubuxu as a code owner March 29, 2018 18:55

ghost assigned Stebalien Mar 29, 2018

ghost added the status/in-progress label and removed the status/deferred label Mar 29, 2018

Stebalien added the status/deferred label and removed the status/in-progress label Mar 29, 2018

Stebalien changed the title ~~[DO NOT MERGE} URL store~~ [DO NOT MERGE] URL store Mar 29, 2018

magik6k reviewed Mar 30, 2018

Stebalien removed their assignment Mar 31, 2018

Kubuxu force-pushed the feat/ai-mirror branch from 22c871e to 3f7ad66 Compare June 5, 2018 17:27

ghost assigned Kubuxu Jun 5, 2018

ghost added the status/in-progress label and removed the status/deferred label Jun 5, 2018

Stebalien mentioned this pull request Jun 5, 2018

Bring urlstore fork in sync with master #5082

Closed

ghost assigned whyrusleeping Jun 8, 2018

Stebalien commented Jun 29, 2018

whyrusleeping approved these changes Jun 30, 2018

Kubuxu and others added 13 commits July 13, 2018 09:04

filestore: add URLStore

1a83520

License: MIT Signed-off-by: Jakub Sztandera <[email protected]>

Fix "ipfs urlstore add" output.

d59a6e9

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Simplify code: use prefix instead of flag to determine if a url

696a0f0

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Add config option to enable urlstore.

b53a1b3

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Add test cases for urlstore.

9097209

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Return better error code when an http request failed.

e5189f4

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Enhance tests.

b3457f2

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Add some documentation to ipfs urlstore add command.

0e24444

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Code cleanups to make code climate happy.

ed2bb81

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

More test fixes.

0c2efb9

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Make sure you can't add URL's unless the url store is enabled.

6a4b126

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

filestore: Return consistent err msg. when file/urlstore is not enabled.

8dd970b

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

Address c.r. and additional tweaks.

1f29699

License: MIT Signed-off-by: Kevin Atkinson <[email protected]>

kevina force-pushed the feat/ai-mirror branch from 0a011dc to 1f29699 Compare July 13, 2018 13:14

whyrusleeping merged commit 95f721c into master Jul 13, 2018

ghost removed the status/in-progress label Jul 13, 2018

whyrusleeping deleted the feat/ai-mirror branch July 13, 2018 15:09

leerspace mentioned this pull request Oct 3, 2018

add version, usage, and planning info for urlstore #5552

Merged
