Specify container listing mechanism #116

Closed
csarven opened this issue Nov 7, 2019 · 19 comments
Labels: doc: Protocol; status: Nominated; topic: resource access


@csarven
Member

csarven commented Nov 7, 2019

@ericprud

ericprud commented Nov 8, 2019

Concrete example:

http://alice.pod/share/photoAnnotations/

<http://alice.pod/share/photoAnnotations/> a ldp:BasicContainer ;
  ldp:contains <annot123>,  <annot456> .

# data cached from contained resources
<annot123> a photo:Annotation ;
  dc:author "Alice" ;
  photo:photo <http://amy.pod/photos/photo123.jpg> ;
  photo:caption "Alice and Amy at work" .

<annot456> a photo:Annotation ;
  dc:author "Alice" ;
  photo:photo <http://bob.pod/photos/photo456.jpg> ;
  photo:caption "Bob sleeping at work" .

Timeline:

  1. Alice creates the container.
  2. Alice allows Bob to "contribute" (append and own his contributions).
  3. Alice creates annot123 and annot456.
    -- unprivileged GET on /share/photoAnnotations/ gets everything in the container
  4. Bob asks Alice to restrict access to the "Bob sleeping at work" photo.
  5. Alice sets ACLs on annot456 to only allow Alice and Bob to see it.
    -- unprivileged GET on /share/photoAnnotations/ ideally gets neither the link to annot456 nor its triples:
<annot456> a photo:Annotation ;
  dc:author "Alice" ;
  photo:photo <http://bob.pod/photos/photo456.jpg> ;
  photo:caption "Bob sleeping at work" .

@csarven
Member Author

csarven commented Dec 1, 2019

In the context of access controls, there are generally two cases to consider pertaining to the binding relationship between an LDP resource and the container it is in:

  1. Unless otherwise specified, an agent with Read access to a container can observe its containment triples ie. read the names of the resources it contains. Access to each resource in the container is subject to the permissions of the resource itself - this is orthogonal, and may be obvious but important to emphasise. This is a common behaviour in *nix systems and compatible with chmod and setfacl in that if the permission block applicable to the user has +r, then they can see the directory's contents (filenames) regardless of the setting on each of the files. Similarly, well-known Web servers eg. Apache, would include all files as long as the directory index is enabled/allowed.

  2. To provide strong privacy protection for containment, the visibility of resources in containment triples needs to be restricted. That is, an agent without Read access to a resource in a container should not be made aware of its existence through containment triples, so as not to leak or expose information. This makes it possible to have a publicly readable container while still having fine-grained control over the containment listing.

Aside:
I want to note here that one way to organise containers and resources would be to simply restrict read access on the container so that its resource names are not exposed through containment triples. Another way around this would be to not add resources to a container so that they don't show up in containment triples. However, this is not possible given consensus in #98 (comment) about containment being strictly hierarchical. There may be exceptions but I'll leave that investigation for another discussion/issue.


AFAIK, when we discussed WAC/ACL, we've generally used chmod as analogous to ACL modes. It is close enough for some cases but not accurate enough. setfacl is in fact much closer to how we've been using WAC/ACL. However, I would argue that that itself is insufficient with respect to what we've been trying to get out of WebACL (at least in a declarative way). We need to clarify the extent of using expressive policy rules in conjunction with ACLs so that the "out of band" expectations are actually part of the rules we want to set. Using *nix-like systems as an analogy, this may be along the lines of SELinux, AppArmor, and the like - so, possibly moving in the direction of or borrowing from Mandatory Access Control (MAC).

To that end, and FWIW, I've created solid/authorization-panel#55 and solid/data-interoperability-panel#31 to properly explore the ODRL Information Model (but it could also be something else) and to see how it can work with WAC/ACL. What this may entail is having default (pod) policies to express a layer of rules and conditions when interacting with resources.

For container listing, I propose that we adopt a privacy-first approach as a default. That is, an agent with Read access to a container can view all containment triples, with the exception of those for contained resources that they do not have Read access to.

Going forward, as soon as we can express that declaratively eg. how ACL+ODRL works in close proximity to LDP, we can specify the behaviour (as MUST?) but still allow other possible pod configurations (as MAY?) eg. an agent's Read access or lack thereof is not part of the policy in constructing containment triples.

@simonstey

something along the lines of ->

<http://alice.pod/share/policy>
    a odrl:Policy ;
    odrl:conflict odrl:perm ;
    odrl:prohibition [ 
        odrl:target <http://alice.pod/share/photoAnnotations/> ;
        odrl:action odrl:use 
    ] ;
    odrl:permission [ 
        odrl:target <http://bob.pod/photos/photo456.jpg> ;
        odrl:action acl:Read;
        odrl:assignee :Alice, :Bob 
    ] .

?

@kjetilk
Member

kjetilk commented Jan 25, 2020

@timbl explained to me in a F2F discussion that he considers putting details like those in the above examples in the container to be a bad practice; you shouldn't do that. Instead, he had suggested using a different resource for that purpose. The databrowser uses index.ttl throughout for this resource.

The way that I now understand it is that it should not be interpreted as a default resource for the container (#69); it is not a representation of the container, merely an easily accessible aggregate of certain data from the contained resources. He suggested that even though he had some regrets about calling it index.ttl, this practice should be continued. I think a case could be made that such a resource should be referenced with a special predicate; I think saying <./> rdfs:seeAlso <./index.ttl> . in the container would be a good idea when it exists.

Merely moving these data to a different resource doesn't resolve all concerns in this resource, as access controls are more granular for resources in a container than for the container itself, which I understand motivates the desire to list only resources that a client is authorized to read. This is a general departure from the current Solid, though, as the ACL system currently has a resource as the smallest unit. It is also a departure from the UNIX filesystem analogy. This suggests to me that a best practice recommendation is to not put anything in the identifier that someone with read access to the container shouldn't see.

Other than that, I think we should stay close to the UNIX filesystem analogy, i.e., the container has containment triples, required metadata, optional timestamps, etc, but not actual user data.

@ericprud

Proposal: Computed Containers

I haven't implemented this yet but I think it would work to not store containment triples in index.ttl and instead synthesize LDPC responses. From an implementation perspective, this is how pretty much every web server works (apache, lighttpd, nginx, iis).

There could be a system-managed __index.ttl with triples that are subject only to the perms of the LDPC. This implies that clients typically don't PUT Containers (which honestly always freaked me out anyways). As an analogy, Apache systems running mod_dir render the contents of README.html above the directory listing (which means it's never valid HTML). A client PUTting __index.ttl would be like a client PUTting README.html on an Apache system.

Semantics

GET -- __index.ttl + readdir().filter(d => d.isReadable()).map(d => makeURLfor(d))
PUT P reject if it has containment triples -- writeFile('__index.ttl', P)
POST P current POST behavior
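
The GET and PUT semantics above could be sketched roughly as follows. This is only an in-memory illustration under stated assumptions, not code from any Solid server: the `ComputedContainer` class, the per-child reader sets, the `"public"` pseudo-agent, and the substring check for `ldp:contains` are all hypothetical stand-ins (a real server would consult its ACL engine and parse the RDF body).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ComputedContainer:
    """Hypothetical container whose listing is synthesized on every GET."""
    url: str
    index_ttl: str = ""  # client-supplied body, stands in for __index.ttl
    # child name -> agents allowed to read that child (assumed ACL outcome)
    children: Dict[str, Set[str]] = field(default_factory=dict)

    def get(self, agent: str) -> str:
        # GET: __index.ttl plus containment triples computed per request,
        # keeping only the children this agent may read.
        visible = [name for name, readers in sorted(self.children.items())
                   if agent in readers or "public" in readers]
        triples = [f"<{self.url}> ldp:contains <{self.url}{name}> ."
                   for name in visible]
        return "\n".join([self.index_ttl, *triples]).strip()

    def put(self, body: str) -> None:
        # PUT: reject bodies asserting containment, else store as __index.ttl.
        # Substring matching is for illustration only.
        if "ldp:contains" in body:
            raise ValueError("containment triples are server-managed")
        self.index_ttl = body


container = ComputedContainer("http://alice.pod/share/photoAnnotations/")
container.children = {"annot123": {"public"}, "annot456": {"alice", "bob"}}
print(container.get("carol"))  # lists annot123 only
print(container.get("bob"))    # lists annot123 and annot456
```

Because containment is never stored in the client-managed body, the permission filter lives in one place (the `get` method) rather than in a walk over a stored graph.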

Motivation

  1. Makes it easy to enforce permissions.
  2. Simplifies server's GET (compared to whatever we would have dreamed up to walk through the index.ttl graph and decide what to include in a GET response).
  3. Simplifies server's PUT
  4. Scopes when clients can PUT.

@kjetilk
Member

kjetilk commented Jan 28, 2020

Yeah, I think that the hierarchy assumption makes it necessary to compute container membership on request. However, as per the discussion I had with Tim as referenced above, index.ttl is a completely different thing as implemented in databrowser, so we have several discussions here:

  1. Whether the container representation should have augmented descriptions of contained resources.
  2. Whether that representation should be included from a certain resource, e.g. index.ttl.
  3. Whether the container representation should have containment triples for resources the client does not have read access to.
  4. Whether the container representation should contain a link to data that augments the container representation.
  5. Whether containment representation augmentation data should be in a specially named file, e.g. index.ttl,
  6. or whether it could be in several referenced files.

By "augmentation", I mean examples such as

<annot456> a photo:Annotation ;
  dc:author "Alice" ;
  photo:photo <http://bob.pod/photos/photo456.jpg> ;
  photo:caption "Bob sleeping at work" .

From the conversation we had, it seems Tim's answer to the first two questions is "no", but that the answer to 5 is "yes".

As for whether the containment triples should be listed, that's a trickier question. I think it comes down to whether the URI is sensitive in itself. On one hand, it can be sensitive in some sense, clearly, a WebID is something that identifies a person. OTOH, we should be careful to imply that URIs can always be protected, because if we communicate that URIs are protected, what does that have to say for a lot of other security questions we are treating? My hunch is that we could end up in a situation where we rely on security by obscurity if that's the model we go for, but I'm not too well versed in this. Instead, it might be better to communicate clearly that you shouldn't put sensitive information in URIs, ever.

A pragmatic issue is also that if acl:Read is required to get the containment triples, clients who have permissions other than read on a resource have no way to discover the resources. Also, UNIX lists directory contents that I can't access.

Like Tim, I believe that augmented descriptions should not be a part of the container representation; instead they should go in different resources. I'm thinking that we should use rdfs:seeAlso links to the augmentation data, and that it shouldn't be a magic index.ttl but rather should be linked. Then, the different augmentation resources can have different permissions.

@csarven
Member Author

csarven commented Jan 28, 2020

I agree that index.ttl should be discoverable. I also think that the use cases for index.ttl can be covered by describedby or https://www.w3.org/ns/iana/link-relations/relation#describedby. Isn't the naming an implementation detail?

@kjetilk
Member

kjetilk commented Jan 28, 2020

> I agree that index.ttl should be discoverable. I also think that the use cases for index.ttl can be covered by describedby or https://www.w3.org/ns/iana/link-relations/relation#describedby. Isn't the naming an implementation detail?

Noooo, I don't think so... I think that the client needs to recognize the semantics, and so it should be specced, at least as a best practice. Also, I think the semantics is more like "you should see this resource too, it has more info", which is quite different from "this resource has metadata about it here". My opinions aren't as strong as they often are about stuff, though :-)

@csarven
Member Author

csarven commented Jan 29, 2020

Perhaps start by stating the actual use cases before proposing a solution.

@kjetilk
Member

kjetilk commented Jan 29, 2020

Yeah, the mechanisms around data augmentation are a case that would better be use case driven. But that also means that we should be careful not to put too much into the container representation.

We do have to think carefully about the status of URIs, if they are considered to be sensitive in themselves, and I think that's a fairly urgent issue. Perhaps we should open a separate issue on that topic, though?

@kjetilk
Member

kjetilk commented Jan 29, 2020

I opened solid/solid#142 to discuss whether we should make a different assumption than RFC7231 on the sensitivity of URIs; let's discuss that topic over there.

@csarven
Member Author

csarven commented Jan 30, 2020

Just to retain focus: using index.ttl to address "data augmentation" needs its own issue. It doesn't mean that the name "index.ttl" (as fixed/reserved/well-known naming) will be used and/or a particular property to discover it. That's part of understanding what the actual requirement is anyway. Would you mind creating it? 142 can broadly help with this issue (116) but I don't see the specific relevance of index.ttl ("data augmentation") here.

@kjetilk
Member

kjetilk commented Jan 30, 2020

Yes, good idea, I opened solid/solid#144 to address that.

So, what's open to discuss here is the implications of consensus in solid/solid#142. PUT semantics was discussed over in solid/solid#40.

Then, there's the discussion of linking ACL and other metadata resources. I suggest that is best dealt with in the general metadata resource discussion.

The use of Prefer header to list all available container metadata seems to be something that can be use case driven, perhaps we should have a separate issue on that too?

Anything else?

@csarven
Member Author

csarven commented Aug 15, 2020

Given: resource-based authorization (like WAC/ACL) holds that agents with Read access privilege can read a container's description in full; including containment statements.

Noting the performance consideration as brought up in #142 (comment) , the above behaviour does not require additional machinery and so can generate container representations with minimal effort.

Noting also the alignment with common *nix directory listing behaviour as mentioned above.

And, noting that resource naming - whether a URI path discloses any sensitive information - to be orthogonal to this issue.

I propose that the spec remains consistent when resource-centric access control is used.

When supplemental access control policies eg. attribute-based, as mentioned in #116 (comment) are put in place - possibly even extending or combined with WAC/ACL - they can allow agents to set fine-grained policies. The same mechanism can potentially allow users to hide resources from container listing by setting required policy parameters.

Currently the spec does not have a requirement for container representations to include anything beyond containment triples. I agree to revisit Prefer-based listing separately - possibly as optional behaviour but indeed would be use case driven. Is there a significant use case that would need to have a way for a container representation to include information from its or possibly its containments' auxiliary resources?

@dmitrizagidulin
Member

@csarven

> And, noting that resource naming - whether a URI path discloses any sensitive information - to be orthogonal to this issue.

I don't think it's quite an orthogonal issue, it's the main issue here. To put it simply, showing links to resources to users who don't have read access to those resources has really disturbing privacy and usability implications. And while I don't have results of formal usability studies (though hopefully they are out there), I would argue that this behavior would go directly against existing user mental models and intuitions. In the vast majority of current web-based files & folders storage systems (such as Google Drive, Dropbox, etc), if you don't have read access to a document or file, you don't see it if you view the contents of the folder.
Similarly, in every single social media app (twitter, facebook, livejournal, tumblr, etc etc), if you're viewing a user's feed and you don't have access to some posts, you do not see those posts.

I think it would be incredibly dangerous to do otherwise, as you're proposing.

> The same mechanism can potentially allow users to hide resources from container listing by setting required policy parameters.

Aha! Now this is thinking along the right lines! Except we should invert that default. By default, if somebody doesn't have Read access to a resource, they don't see it in container listings. But that can be overridden, on a policy level, and you can set some sort of ACL directive that says "show this resource to users who don't have access" (presumably, so they can request access to it, or just to taunt them, etc.)
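
As a rough illustration of that inverted default: a child is hidden from the listing unless the agent can read it or its policy opts in to being listed anyway. The `list_without_read` flag below is a made-up name for such an override directive, and the dictionary shape is an assumption for the sketch only.

```python
from typing import Dict, List


def list_container(children: Dict[str, dict], agent: str) -> List[str]:
    """Return the child names visible to `agent` under a hide-by-default rule.

    `children` maps a child name to a hypothetical policy dict:
      {"readers": set of agents, "list_without_read": bool (optional)}
    """
    visible = []
    for name, policy in children.items():
        can_read = agent in policy["readers"]
        # Hypothetical override: show the entry even without Read access.
        listed_anyway = policy.get("list_without_read", False)
        if can_read or listed_anyway:
            visible.append(name)
    return visible


children = {
    "annot123": {"readers": {"alice", "bob", "carol"}},
    "annot456": {"readers": {"alice", "bob"}},
    "draft789": {"readers": {"alice"}, "list_without_read": True},
}
print(list_container(children, "carol"))
```

Here carol would see `annot123` (readable) and `draft789` (listed by override, for example so that she can request access), but not `annot456`.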

@csarven
Member Author

csarven commented Aug 16, 2020

Here "sensitivity" is in context of information in URI as opposed to access control on contained resources, and so it is not applicable to the listing mechanism practically speaking: 1) there is no requirement in Solid that exposes sensitive information in URIs ( see summary in #142 (comment) ) and 2) we wouldn't be able to test anything above and beyond the considerations mentioned in RFCs.

Let me throw in an example coming from a different direction on sensitivity: an agent knowing the existence of /foo/bar/baz can by definition know the existence and hierarchical containment of everything leading up to baz. It is entirely orthogonal to their access privilege on any of the resources. Neither does this violate the sensitivity consideration.
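
To make the point concrete, the ancestor containers can be derived from the URI alone, with no request to the server and no access check. A small sketch, assuming strictly hierarchical containment where every path prefix ending in a slash is a container (the function name and the example host are illustrative):

```python
from typing import List
from urllib.parse import urlsplit


def ancestor_containers(url: str) -> List[str]:
    """Return the container URLs implied by a resource URL, root first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    ancestors = [f"{parts.scheme}://{parts.netloc}/"]
    # Every segment except the last names an intermediate container.
    for segment in segments[:-1]:
        ancestors.append(ancestors[-1] + segment + "/")
    return ancestors


print(ancestor_containers("http://alice.pod/foo/bar/baz"))
```

For `/foo/bar/baz` this yields the root, `/foo/` and `/foo/bar/`, regardless of what the agent is allowed to read.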

What's proposed - more like clarified - earlier is intended to remain consistent with the foundations and good practices that we have some rough consensus on. I view this as the default ie. containment listing is not affected by resource-centric access control, at least within the scope of current WAC/ACL. I think we should first acknowledge that in order to pave the way to extensions and fine-grained policies.

Needless to say, ACL Read's definition is confined to operations related to accessing a resource and reading its contents. Visibility of resource names is currently not influenced through WAC/ACL. So, we should take care not to ascribe behaviour that's not originally there. We have to determine which access control mechanism or paradigm can take this on. We could of course introduce this through WAC/ACL with a simple statement or extend/combine it with other models (ODRL was mentioned above as an example), and the default can be overridden there. It can also be inherited from the root container if, say, the storage owner/controller sets it.

In addition to the default that I've mentioned, we can also recommend that servers SHOULD or MAY want to control containment listing based on authenticated agent's access control on contained resources, and take note of performance considerations (among other things).

As you know, Apache's directory index by default behaves like the *nix system in that files under a directory are visible even while agents don't have read access on each file. Granted, actual applications (more like centralised services coupling UI and data) work on some layers above that, so what you highlight about some mental models definitely holds true and is still possible.

@acoburn
Member

acoburn commented Aug 16, 2020

I have implementation experience with both modes described here:

(A) in which a container resource will list all child resources, regardless of the client's access to those child resources and
(B) in which a container resource will list only those child resources to which a client has read access

(To be clear, I'm assuming that the client has at least acl:Read access to the container resource itself)

The TL;DR version is that, at scale, (B) degrades very quickly and very badly. The WebACL enforcement algorithm is already not especially fast, and if a server needs to perform N+1 ACL checks for every container read request (where N is the number of child resources), then container listings become very slow. On a piece of software that I previously worked on that implements approach (B), I heard stories about container GET requests timing out after ten minutes (!) -- this is for containers with ~10,000 child resources. This also means that it is not possible to cache container resource responses, since every response will depend on exactly who is making the request.
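
The difference in work can be illustrated by counting ACL evaluations in each mode. This is a toy sketch, not the software mentioned above: `CountingAcl` is a hypothetical stand-in whose `can_read` is a set lookup, whereas a real WebACL check involves ACL resource lookup and inheritance and is far more expensive per call.

```python
class CountingAcl:
    """Hypothetical ACL engine that counts how often it is consulted."""

    def __init__(self, readers_by_resource):
        self.readers_by_resource = readers_by_resource
        self.checks = 0

    def can_read(self, agent, resource):
        self.checks += 1
        return agent in self.readers_by_resource.get(resource, set())


def list_all(acl, agent, container, children):
    # Mode (A): one check on the container, then list every child.
    if not acl.can_read(agent, container):
        raise PermissionError(container)
    return list(children)


def list_readable(acl, agent, container, children):
    # Mode (B): one check on the container plus one per child (N+1).
    if not acl.can_read(agent, container):
        raise PermissionError(container)
    return [child for child in children if acl.can_read(agent, child)]


children = [f"/c/r{i}" for i in range(10_000)]
readers = {"/c/": {"carol"}}
readers.update({child: {"carol"} for child in children[::2]})

acl_a = CountingAcl(readers)
list_all(acl_a, "carol", "/c/", children)
acl_b = CountingAcl(readers)
list_readable(acl_b, "carol", "/c/", children)
print(acl_a.checks, acl_b.checks)
```

Mode (A) consults the ACL engine once regardless of container size, while mode (B) consults it once per child in addition, and its result differs per agent, which is what defeats caching.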

Option (A) tends to have very good performance characteristics and a much simpler implementation. It is also cacheable. The major downside, as @dmitrizagidulin points out, is that one cannot hide URLs.

If that is your concern, however, there is an easy way to deal with that: add a layer of indirection. If you have a publicly readable container at /foo/ and you want ./child1, ./child2 and ./child3 to be discoverable but you would like ./child4 and ./child5 hidden, create a structure such as:

/foo/a/child1
/foo/a/child2
/foo/a/child3
/foo/b/child4
/foo/b/child5

The ./foo/ container will list ./a/ and ./b/, but (so long as those are opaque identifiers) a client will have no way to know what is contained inside of those unless the client has read access to the intermediate container itself.
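
A small sketch of this layout under mode (A), where a container lists all of its children but reading a container still requires Read access to it. The `CONTAINERS` table, the `"public"` pseudo-agent, and the `listing` helper are illustrative assumptions.

```python
# Hypothetical storage: container path -> who may read it and what it contains.
CONTAINERS = {
    "/foo/":   {"readers": {"public"}, "children": ["a/", "b/"]},
    "/foo/a/": {"readers": {"public"}, "children": ["child1", "child2", "child3"]},
    "/foo/b/": {"readers": {"alice"},  "children": ["child4", "child5"]},
}


def listing(path, agent):
    """Mode (A) listing: all children, provided the agent can read the container."""
    container = CONTAINERS[path]
    readers = container["readers"]
    if agent not in readers and "public" not in readers:
        raise PermissionError(path)
    return [path + child for child in container["children"]]


print(listing("/foo/", "carol"))    # reveals only the opaque /foo/a/ and /foo/b/
print(listing("/foo/a/", "carol"))  # the discoverable children
try:
    listing("/foo/b/", "carol")
except PermissionError:
    print("carol cannot list /foo/b/, so child4 and child5 stay hidden")
```

No per-child filtering is needed: the hidden names are protected by a single check on the intermediate container.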

A very different way of thinking about @dmitrizagidulin's use case is in the context of a query interface, especially one with an ability to page through responses. So long as the mechanism for interacting with containers is via a RESTful interface (i.e. resource-oriented), I have a hard time supporting option (B). But if we view this through the lens of a query interface, one would typically constrain the result set through a paged collection of responses. There, the constrained set of results makes it possible to do all sorts of response filtering, based on access controls. And the expectations about RESTful, resource-based orientation are entirely different.

In other words, I find it problematic to require filtering containment triples of containers, based on access controls, but I think it is perfectly reasonable to do that very thing in a query context (provided that one is able to page through the result set)
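
As a sketch of what access-filtered paging might look like: the thread does not define a query interface, so the function name, the offset-based cursor, and the `can_read` callback are all assumptions. The server scans only as far as needed to fill one page and returns where the next page should resume.

```python
from typing import Callable, List, Optional, Tuple


def query_page(children: List[str], agent: str,
               can_read: Callable[[str, str], bool],
               offset: int = 0, limit: int = 2) -> Tuple[List[str], Optional[int]]:
    """Return up to `limit` readable children scanning from `offset`,
    plus the offset to resume from (None when the scan is finished)."""
    page = []
    i = offset
    while i < len(children) and len(page) < limit:
        if can_read(agent, children[i]):
            page.append(children[i])
        i += 1
    next_offset = i if i < len(children) else None
    return page, next_offset


readable = {"c1", "c3", "c5"}
children = ["c1", "c2", "c3", "c4", "c5"]
can_read = lambda agent, child: child in readable

page, cursor = query_page(children, "carol", can_read)
while True:
    print(page)
    if cursor is None:
        break
    page, cursor = query_page(children, "carol", can_read, offset=cursor)
```

The number of ACL checks per request is tied to the page being filled rather than to the total number of children, although a long run of unreadable children can still make a single page expensive.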

@csarven
Member Author

csarven commented Aug 23, 2020

I suggest to include the following text to tighten up the spec in the spirit of what's discussed. Preventing information leakage in context of HTTP responses to successful resource creation:

"
Normative:

When using Web Access Control, an acl:agent MUST have acl:Read privilege per the ACL inheritance algorithm on the created resource in order for servers to include the HTTP Location and Content-Location headers in response to POST requests.
"

The above is not applicable to PUT and PATCH because the URI of a created resource is assigned by the client, whereas the server assigns the URI when POST is used. It holds true regardless of the requesting agent having Read access on the created resource or its container, or not.
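
The proposed rule could be sketched as a small helper that builds the headers of a successful POST response. This illustrates the suggested text only, not adopted spec language; the function name and the boolean input (the outcome of the ACL inheritance check for the requesting agent) are assumptions.

```python
from typing import Dict


def post_response_headers(agent_can_read_created: bool, created_url: str) -> Dict[str, str]:
    """Headers for a successful POST, per the proposed rule.

    The server-assigned URL is disclosed only if the requesting agent has
    acl:Read on the created resource (assumed to be computed elsewhere).
    """
    headers: Dict[str, str] = {}
    if agent_can_read_created:
        headers["Location"] = created_url
        headers["Content-Location"] = created_url
    return headers


created = "http://alice.pod/share/photoAnnotations/annot789"
print(post_response_headers(True, created))
print(post_response_headers(False, created))
```

An agent with Append but not Read access would thus get a success response without learning the URL the server chose.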

Edit: Updated guideline as normative.

@csarven csarven self-assigned this May 17, 2021
@kjetilk kjetilk added the status: Nominated An issue that has been nominated for the next monthly milestone label Oct 6, 2021
@csarven
Member Author

csarven commented Dec 15, 2021

Nothing to do here (for now). Move along.

Projects
Status: Done
7 participants