Aggregators to improve data access across many pods, a social media perspective #24

Closed
pbonte opened this issue Apr 8, 2022 · 10 comments
Labels
challenge technical problem applied to a use case completion: approved ✅ proposal: approved ✅ report: done ✅ The report of the completed challenge/scenario is done.

Comments

@pbonte

pbonte commented Apr 8, 2022

Pitch

Applications that aggregate data across many pods can suffer from slow response times, because data has to be retrieved from, and processed for, a large number of pods. A typical example is social media, where each user's timeline is curated from the activities of their contacts. Computing a timeline at the moment the user opens the application is usually not feasible under these latency constraints, so timelines should be precomputed as a form of aggregation. The SolidBench.js benchmark will be used to simulate data pods with social media data.

Desired solution

A first proof-of-concept aggregator server that acts as an intermediate component in the Solid network. It accepts queries from client-side applications and directly exposes the results of those queries, i.e. the computed bindings. Client applications can then retrieve the query results from the aggregator instead of evaluating expensive queries themselves.
The aggregator server should compute the bindings and keep them up to date when the underlying resources change: a change in a resource must be reflected in the bindings of every query affected by it. As a proof of concept, this can be done by re-evaluating the queries every time a resource changes. (Later optimisations could use incremental query execution techniques. Because this is a proof of concept, authentication does not need to be considered.)
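The re-evaluation strategy described above can be sketched roughly as follows. This is a minimal illustration, not the actual aggregator: `QueryRegistry`, `register`, `onResourceChanged` and `getResults` are hypothetical names, and `evaluate` is a synchronous stand-in for a real (asynchronous) SPARQL engine call.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the project's API.
type Bindings = Record<string, string>;

interface RegisteredQuery {
  resources: Set<string>;      // URLs of the pod resources the query reads
  evaluate: () => Bindings[];  // stand-in for running the SPARQL SELECT query
  results: Bindings[];         // most recently computed bindings
}

class QueryRegistry {
  private queries = new Map<string, RegisteredQuery>();

  // Register a query and compute its initial bindings.
  register(id: string, resources: string[], evaluate: () => Bindings[]): void {
    this.queries.set(id, {
      resources: new Set(resources),
      evaluate,
      results: evaluate(),
    });
  }

  // Re-evaluate every query that uses the changed resource.
  // Returns the ids of the queries that were re-evaluated.
  onResourceChanged(url: string): string[] {
    const reevaluated: string[] = [];
    this.queries.forEach((query, id) => {
      if (query.resources.has(url)) {
        query.results = query.evaluate();
        reevaluated.push(id);
      }
    });
    return reevaluated;
  }

  // Return the latest bindings of a query, or an empty list if it is unknown.
  getResults(id: string): Bindings[] {
    const query = this.queries.get(id);
    return query ? query.results : [];
  }
}
```

Queries that do not read the changed resource are left untouched, which is the only optimisation in this sketch; an incremental engine would avoid recomputing the affected queries from scratch.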

The solution is required to:

  • Accept and execute the queries and reasoning rules from a client application.
  • Track changes to the used resources (which are stored on pods) by polling or over a WebSocket.
  • When a resource changes, the queries that use this resource should be re-evaluated. (This is subject to change in future optimisations.)
  • Have a way for the client to get the bindings of these queries in bulk (e.g. with a GET request).
  • Have a way of streaming the change/difference in bindings when a resource has changed.
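The last two requirements (bulk retrieval and streaming the difference) could look roughly like the sketch below. It is an in-memory illustration under several assumptions: `ResultChannel`, `ResultMessage` and the message shapes are hypothetical, bindings are treated as a set (duplicates are not tracked), and the transport (HTTP or WebSocket) is left out entirely.

```typescript
// Hypothetical sketch: message shapes and class names are assumptions.
type Row = Record<string, string>;

type ResultMessage =
  | { type: "snapshot"; bindings: Row[] }
  | { type: "update"; added: Row[]; removed: Row[] };

type Listener = (message: ResultMessage) => void;

// Canonical key, so rows with the same variable/value pairs compare equal
// regardless of property order.
function rowKey(row: Row): string {
  return JSON.stringify(Object.keys(row).sort().map((name) => [name, row[name]]));
}

class ResultChannel {
  private current: Row[] = [];
  private listeners: Listener[] = [];

  // Bulk access: the bindings as they are right now.
  snapshot(): Row[] {
    return this.current.slice();
  }

  // Streaming access: a new subscriber first receives the bulk result,
  // then only the differences. Returns a function that unsubscribes.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    listener({ type: "snapshot", bindings: this.snapshot() });
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Called after a re-evaluation with the complete new result.
  publish(next: Row[]): void {
    const oldKeys = new Set(this.current.map(rowKey));
    const newKeys = new Set(next.map(rowKey));
    const added = next.filter((row) => !oldKeys.has(rowKey(row)));
    const removed = this.current.filter((row) => !newKeys.has(rowKey(row)));
    this.current = next.slice();
    if (added.length === 0 && removed.length === 0) return; // nothing to stream
    this.listeners.forEach((l) => l({ type: "update", added, removed }));
  }
}
```

Computing the difference from two full results fits the re-evaluation approach of the proof of concept; an incremental engine could emit additions and removals directly instead.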

Acceptance criteria

Show the speed increase for the query evaluation between client side query evaluation and using the aggregator server by using the SolidBench (https://github.com/SolidBench/SolidBench.js) benchmark.

A demo that showcases this solution would need to be able to:

  • Show the increase in speed for the different types of queries from the SolidBench (queries that use link traversal, reasoning,…).
  • Show that the bindings get updated when the resources change.
  • Show that the results are complete.

Assumptions

As aggregation is still a novel research topic, a number of assumptions were made:

  • Long-term server-side authenticated sessions have been solved, and therefore the authentication part of this challenge is not taken into account.
  • The registered queries are SPARQL SELECT queries.
  • This is a first prototype that does not need to be fully optimised.
@pbonte pbonte added challenge technical problem applied to a use case proposal: pending ❓ labels Apr 8, 2022
@pheyvaer
Contributor

In the Solid Calendar Store there is a store that can pre-generate the representation of a store. This reduces the response time when an agent requests a calendar, which can otherwise take up to 30 seconds if multiple calendars are combined. This might be a pointer or serve as inspiration for a demo.

@pheyvaer
Contributor

pheyvaer commented Aug 2, 2022

@pbonte @fongenae Did you have the chance to look into making the necessary changes?

@github-actions

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@maartyman

The proof-of-concept aggregator is done. It is a server that accepts queries as POST requests and executes them using Comunica. The resources used by a query are then observed for changes, either over WebSockets (version 0.1 of the Solid WebSocket spec) or by polling, and the query is re-evaluated when a resource changes. The aggregator makes the query results available through a GET request (snapshot result) and through WebSockets (bulk result plus a continuous update stream).
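For the polling path, change detection can be as simple as comparing a fingerprint of the resource between polls. The sketch below is an assumption about how this could work, not a description of the aggregator's code: `PollingChangeDetector` and `observe` are hypothetical names, and the fingerprint could for example be an `ETag` header value or a hash of the response body, fetched by the caller.

```typescript
// Hypothetical sketch: the caller polls the resource and passes in a
// fingerprint (e.g. an ETag or a content hash); this class only compares.
class PollingChangeDetector {
  private lastSeen = new Map<string, string>();

  // Record the fingerprint from the latest poll.
  // Returns true if the resource changed since the previous poll.
  observe(url: string, fingerprint: string): boolean {
    const previous = this.lastSeen.get(url);
    this.lastSeen.set(url, fingerprint);
    // The first observation only establishes a baseline; it is not a change.
    return previous !== undefined && previous !== fingerprint;
  }
}
```

A `true` result would then trigger re-evaluation of the queries that use that resource, in the same way a WebSocket notification would.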

I'm now working out some bugs and making some demos using the SolidBench data and queries.

@maartyman

I'm working on closing this challenge.

@maartyman

@github-actions

github-actions bot commented Jan 5, 2023

Please provide a status update about this challenge. Every ongoing challenge needs at least one status update every 2 weeks. Thanks!

@RubenVerborgh RubenVerborgh removed their assignment Feb 21, 2023
@pheyvaer pheyvaer self-assigned this Feb 22, 2023
@pheyvaer pheyvaer added report: ongoing 👷‍♂️ The report of the completed challenge/scenario is being written. and removed ongoing The challenge is actively being tackled. update-required labels Feb 22, 2023
pheyvaer added a commit that referenced this issue Feb 22, 2023
@pheyvaer
Contributor

You can find the report for this challenge here.

@pheyvaer pheyvaer added report: done ✅ The report of the completed challenge/scenario is done. and removed report: ongoing 👷‍♂️ The report of the completed challenge/scenario is being written. labels Feb 23, 2023
@pheyvaer pheyvaer removed their assignment Feb 23, 2023