Support multiple resources #12

pheyvaer · 2023-08-07T13:53:13Z

It's possible to read from and write to a single resource, but certain use cases need multiple resources.
Reading from multiple resources is supported by Comunica, but for writing we need to take a closer look at how we should do this in YARRRML/RML.

sevrijss · 2023-08-17T09:56:30Z

https://developers.google.com/sheets/api/guides/metadata could be used to keep track of where each piece of information comes from.

pheyvaer · 2023-08-18T11:29:24Z

@bjdmeest For this, do we want to try/do something specific with Targets?

bjdmeest · 2023-08-18T11:34:49Z

Currently, the single source also isn't done via Targets, so I'd first try to figure out how to do it (without thinking too much about RML), and then see whether RML targets make sense for this or not. (I think we'll first need to add a Solid target or equivalent to RMLMapper; Els is working on that.)

pheyvaer · 2023-08-18T11:45:22Z

So what do you suggest as the next step? Can you maybe put that in a separate issue?

bjdmeest · 2023-08-18T11:54:16Z

I think just doing this issue will be tricky enough on its own. For next steps, I'd first need to see the outcome of this one.

pheyvaer · 2023-08-18T11:57:00Z

But I don't understand from your initial comment what we should use. Something with Targets? Or something else in the RML space? Or just something engineered that works?

bjdmeest · 2023-08-18T12:06:58Z

I suggest that @sevrijss analyses the problem and suggests a course of action. We can then review and decide whether that course of action makes sense as is or should be enhanced (e.g. by using RML targets to generate multiple local files). I don't know what the potential pitfalls may be, so I can't give a clearer direction yet.

sevrijss · 2023-08-22T07:41:16Z

Possible courses of action:

1. 1 rules file / resource
   If there is 1 rules file per resource, every resource can be dealt with separately. The rules file should contain the rules to go from the sheet data to the triples that need to be stored in that specific resource. All the paths could be specified in the config file. A disadvantage is that there will be a lot of files when you have a lot of resources.
2. 1 rules file using targets from the RML spec
   I've done some research into the target functionality in the RML spec. It seems promising, but I think there are some problems:
   - multiple subjects
     There is no guarantee that all the resources use the same subject identifiers. You would have to define a subject-target link in the rules file.
   - complex to set up
     If everything is contained in a single rules file with multiple subject entries and targets, it becomes very difficult to set up and maintain.

   I've taken a look at the spec and tried a couple of things in Matey. The targets seem to work fine, but I'm having trouble with the multiple subjects functionality.

Both approaches have advantages and disadvantages. The first is easier to set up but will require a lot of files. The second is more RML-based, but will be more complex to set up and maintain.
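
As a rough illustration of the first option, the config file could pair each resource with its own rules file. This is only a sketch: the `ResourceConfig` shape, the `rulesFor` helper, and all URLs and paths are hypothetical and not taken from the project.

```typescript
// Hypothetical config shape: one rules file per target resource.
interface ResourceConfig {
  resource: string; // URL of the resource to write to (made-up example URLs below)
  rules: string;    // path to the YARRRML rules file for that resource
}

// Example entries; with many resources this list (and the number of files) grows quickly.
const resources: ResourceConfig[] = [
  { resource: "https://example.org/pod/shows", rules: "rules/shows.yml" },
  { resource: "https://example.org/pod/actors", rules: "rules/actors.yml" },
];

// Look up which rules file should be used when writing to a given resource.
function rulesFor(resourceUrl: string, config: ResourceConfig[]): string | undefined {
  return config.find((entry) => entry.resource === resourceUrl)?.rules;
}
```

The lookup returns `undefined` for a resource that has no entry, which a caller would have to treat as a configuration error.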

pheyvaer · 2023-08-22T07:44:34Z

Interesting! What would happen if you do a query over multiple resources, but you don't know which data is coming from which data source? How do you know which resources to update for which specific data?

sevrijss · 2023-08-22T07:53:31Z

In a specific case, that might pose a problem.
Say you have 2 resources, 1 containing data about a TV show and another about actors. Those 2 can easily be linked using a SPARQL query and Comunica.
But as soon as the same kind of data is spread out over multiple pods, e.g. 2 pods containing book information, I don't think it's possible to know where data came from when using the RML spec. Also, where would data go?
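
To make the ambiguity concrete, here is a minimal sketch that assumes each resource is fetched separately, so the origin of every triple can be recorded at load time. The `Triple` type and the `indexOrigins` helper are hypothetical; this is not a Comunica or RML API.

```typescript
// Hypothetical, simplified triple representation (plain strings only).
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Build a stable string key so identical triples from different resources collide.
function tripleKey(t: Triple): string {
  return JSON.stringify([t.subject, t.predicate, t.object]);
}

// Record, for every triple, the set of resources it was loaded from.
function indexOrigins(perResource: Record<string, Triple[]>): Map<string, Set<string>> {
  const origins = new Map<string, Set<string>>();
  for (const [resource, triples] of Object.entries(perResource)) {
    for (const t of triples) {
      const key = tripleKey(t);
      const seen = origins.get(key) ?? new Set<string>();
      seen.add(resource);
      origins.set(key, seen);
    }
  }
  return origins;
}
```

A triple whose origin set contains more than one resource is exactly the problematic case: an update to it has no single obvious destination.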

pheyvaer · 2023-08-22T07:56:54Z

I can imagine that you would want to write the changes to the resource where the data originally came from. But indeed, I don't think you can specify that with RML. Comunica might be able to tell us where every triple came from. But then we still need to see how that information can be used by RML.

@bjdmeest What are your thoughts?

bjdmeest · 2023-08-23T12:40:57Z

For me, the most relevant case is when a single query result/row contains data from multiple sources. So I'd start with specifying which column updates should be written to which source.

E.g. a sheet listing my favorite TV shows and my personal ratings: the TV show metadata comes from DBpedia, the ratings come from my pod. My rating updates should be persisted in my pod; TV show metadata updates should (ideally) be fed back to DBpedia, but currently they will in practice just fail, and that's OK :)

You can probably figure that out by combining the RML and SPARQL query (?tv_rating comes from term map <tm_001> which comes from logical source <myPersonalPod>), but adding that as separate metadata as an initial test is fine for me.
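
The "separate metadata" variant could look roughly like the sketch below: a plain column-to-source table that is used to group sheet updates per source before writing. The `CellUpdate` type, the `groupUpdatesBySource` helper, the column names, and the URLs are all assumptions made for illustration.

```typescript
// Hypothetical shape of a single changed cell in the sheet.
interface CellUpdate {
  column: string;
  row: number;
  value: string;
}

// Hand-written metadata: which source each column should be written back to.
const columnSources: Record<string, string> = {
  tv_show_title: "https://dbpedia.org/sparql",
  tv_rating: "https://example.org/my-pod/ratings",
};

// Group updates by the source they belong to, so each source gets one write.
function groupUpdatesBySource(
  updates: CellUpdate[],
  mapping: Record<string, string>,
): Map<string, CellUpdate[]> {
  const grouped = new Map<string, CellUpdate[]>();
  for (const update of updates) {
    const source = mapping[update.column];
    if (source === undefined) {
      // Columns without metadata cannot be routed anywhere.
      throw new Error(`No source configured for column "${update.column}"`);
    }
    const bucket = grouped.get(source) ?? [];
    bucket.push(update);
    grouped.set(source, bucket);
  }
  return grouped;
}
```

Writes to a read-only source such as DBpedia would still fail at the next step, which matches the "that's OK" expectation above; the grouping only decides where each update is attempted.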

pheyvaer · 2023-08-23T13:38:26Z

@sevrijss Based on Ben's answer, do you think that the RML Target-based solution will work? Ignoring the potential complexity of the files.

Also, about this from your earlier comment:

> there is no guarantee that all the resources use the same subject identifiers

This is also an issue when querying, so I would not worry about that now.

sevrijss · 2023-08-24T06:42:16Z

I think such a solution will definitely work. A lot of the heavy lifting will be done by the RML API endpoint and not by the code.
But I might need some help writing such a query, since I never managed to combine targets and multiple subjects in Matey.

pheyvaer added the enhancement label Aug 7, 2023

pheyvaer mentioned this issue Aug 11, 2023

Agent that syncs pods and Google Sheets v2 SolidLabResearch/Challenges#123

Open

pheyvaer assigned sevrijss Aug 18, 2023

sevrijss linked a pull request Aug 21, 2023 that will close this issue

Support to query and write to multiple resources #21

Merged

sevrijss removed a link to a pull request Aug 24, 2023

Support to query and write to multiple resources #21

Merged

sevrijss linked a pull request Aug 31, 2023 that will close this issue

Support to query and write to multiple resources #21

Merged

bjdmeest added the priority label Aug 31, 2023

sevrijss closed this as completed in #21 Aug 31, 2023
