Using HDT (and other 'hybrid' data) on a hybrid Pod #88

Open
j-steinbach opened this issue Nov 29, 2022 · 4 comments
Labels
challenge technical problem applied to a use case proposal: changes needed 👷

Comments

@j-steinbach

j-steinbach commented Nov 29, 2022

Pitch

  • The "What's in a pod?" vision interprets a Solid pod as a hybrid knowledge graph (KG).
  • A pod can store both raw data/documents and RDF triples.
  • But what if a document is both? For example, an HDT file is compressed, serialized RDF data.
  • How is this data to be used? Does it belong to the KG by default? How do we read and traverse it?
    • If it is part of the pod KG, then how do we extend/add more triples to it?
    • If it is not, then how do we store and use big amounts of data/triples on a pod?
      • Querying an HDT file with Comunica is faster and more memory-efficient than querying the equivalent Turtle file. Depending on the machine and its memory, Comunica often cannot query large collections at all and fails with an out-of-memory (OOM) error (around 10 million triples, for example DBnary).
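
To make the size argument a bit more concrete, here is a minimal Python sketch of dictionary encoding, the general idea behind HDT's compactness: every distinct term is stored once and triples become tuples of integer IDs. It is a simplification for illustration only, not the actual HDT layout (which, among other things, splits the dictionary into sections and stores triples in a bitmap structure). The `example.org` IRIs and the function names are made up.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


def encode(triples: List[Triple]) -> Tuple[List[str], List[EncodedTriple]]:
    """Store each distinct term once and replace it by its integer ID."""
    terms = sorted({term for triple in triples for term in triple})
    ids: Dict[str, int] = {term: i for i, term in enumerate(terms)}
    encoded = sorted((ids[s], ids[p], ids[o]) for s, p, o in triples)
    return terms, encoded


def decode(terms: List[str], encoded: List[EncodedTriple]) -> List[Triple]:
    """Turn ID triples back into term triples."""
    return [(terms[s], terms[p], terms[o]) for s, p, o in encoded]


# Hypothetical dictionary-like data, loosely inspired by the use case.
triples = [
    ("http://example.org/entry/cat", "http://example.org/vocab/partOfSpeech", "noun"),
    ("http://example.org/entry/cat", "http://example.org/vocab/language", "en"),
    ("http://example.org/entry/dog", "http://example.org/vocab/partOfSpeech", "noun"),
    ("http://example.org/entry/dog", "http://example.org/vocab/language", "en"),
]

terms, encoded = encode(triples)
raw_chars = sum(len(term) for triple in triples for term in triple)
dict_chars = sum(len(term) for term in terms)
print(f"{len(terms)} distinct terms, {dict_chars} chars in dictionary vs {raw_chars} chars raw")
```

Because the IDs in this sketch are positions in a sorted term list, adding a new term can shift existing IDs. That hints at why formats of this kind are usually rebuilt rather than updated in place.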

Desired solution

  • Be able to put an HDT file (or similar) on a pod, query it, and extend it as if it were part of the KG.
  • Have it interoperate with Turtle/Quads/... (look them up at the same time).
  • Also be able to use it as a 'regular' file. (E: This should also work the other way around. Can we use/edit/display e.g. Turtle files as regular text files?)

Acceptance criteria

  • Have an HDT file on a pod together with some non-HDT triples (a Turtle file)
  • Have both files interoperate (get traversed/queried)
  • Extend the HDT file (how?)
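
One conceivable reading of these criteria, sketched in Python purely as an illustration and not as a proposed design: keep the HDT data as a read-only base and record extensions in a separate writable set (standing in for a Turtle document), then answer lookups over both. The `HybridGraph` class, its methods, and the `ex:` terms are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A pattern position set to None acts as a wildcard."""
    return all(want is None or want == have for have, want in zip(triple, pattern))


class HybridGraph:
    """Read-only base triples plus a writable overlay, queried as one graph."""

    def __init__(self, base: Iterable[Triple]) -> None:
        self._base = frozenset(base)      # stands in for the immutable HDT file
        self._added: Set[Triple] = set()  # stands in for a writable Turtle document

    def add(self, triple: Triple) -> None:
        """Record a new triple in the overlay; the base is never modified."""
        if triple not in self._base:
            self._added.add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield matching triples from both the base and the overlay."""
        pattern = (s, p, o)
        for source in (self._base, self._added):
            for triple in source:
                if matches(triple, pattern):
                    yield triple


graph = HybridGraph([("ex:cat", "ex:partOfSpeech", "noun")])
graph.add(("ex:cat", "ex:exampleSentence", "The cat sleeps."))
about_cat = sorted(graph.match(s="ex:cat"))
print(about_cat)
```

In this sketch "extending the HDT file" never touches the file itself. Whether the overlay should eventually be merged into a regenerated HDT file is left open.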

Pointers

  • This might be a CSS issue
  • This could also be relevant in the context of 'plug & play' RDF data: people can 'extend' their pod with a hashed, signed HDT file in cases where the remote LOD server or aggregator is not trustworthy. This also gives the pod owner more control over the data (in case the remote LOD server gets shut down or acquired).
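
As a rough illustration of the "hashed" part of the last pointer, the following Python sketch computes the SHA-256 digest of a downloaded file and compares it with a digest published by the data provider. The choice of SHA-256, the function names, and the idea that the provider publishes a hex digest are all assumptions. Verifying a signature would additionally need a cryptography library and is not shown.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so that large HDT files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A pod (or its owner) could run such a check once when the file is added and refuse to treat the file as part of the KG if the digests differ.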

Scenarios

Use-Case / Origin

I want to put the Wiktionary data on a pod and then be able to re-create dictionary entries from the RDF data. I also want to be able to extend/annotate the dictionary entries (add new triples: my own example sentences, related words, ...) and export the data.

[The data is available as .ttl and .hdt. Comunica fails to read/query the Turtle data because it runs out of memory (locally on the CLI, 16 GB RAM). The HDT file, however, works.]

@j-steinbach j-steinbach added challenge technical problem applied to a use case proposal: pending ❓ labels Nov 29, 2022
@rubensworks

rubensworks commented Nov 29, 2022

HDT would definitely be a good match as back-end for certain Solid use cases (mostly for non-write-intensive cases, since HDT doesn't support updates).

Related to this is the need to expose a query interface at pod level (or container level) that could be backed by triple stores such as HDT (#43). This would remove the requirement for the client to understand HDT (which can be quite tricky); the client would only have to interact with the query API.
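
To illustrate what "only having to interact with the query API" could look like from the client side, here is a small Python sketch that builds a SPARQL 1.1 Protocol GET request URL. It assumes, hypothetically, that the pod exposes a SPARQL endpoint at `https://pod.example/sparql`; no such endpoint is defined in this issue, and other interfaces (for example Triple Pattern Fragments) would be equally possible.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(endpoint: str, sparql: str) -> str:
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return f"{endpoint}?{urlencode({'query': sparql})}"


# Hypothetical endpoint; the client never needs to know that HDT is behind it.
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
url = build_query_url("https://pod.example/sparql", query)
print(url)
```

The client would send this request with an `Accept` header such as `application/sparql-results+json` and parse the response; the storage format on the server stays an implementation detail.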

Related work:

  • Verborgh, R., Vander Sande, M., Hartig, O., Van Herwegen, J., De Vocht, L., De Meester, B., ... & Colpaert, P. (2016). Triple pattern fragments: a low-cost knowledge graph interface for the web. Journal of Web Semantics, 37, 184-206.
  • Azzam, A., et al. (2020). SMART-KG: Hybrid shipping for SPARQL querying on the web. In Proceedings of The Web Conference 2020.
  • Azzam, A., et al. (2021). WiseKG: Balanced access to web knowledge graphs. In Proceedings of the Web Conference 2021.

@j-steinbach
Author

j-steinbach commented Nov 29, 2022

(Unrelated, but maybe also interesting: Is it possible to export parts of the KG? Maybe as HDT :))

E: Similar to how we select tables in SQL and then export them. Create a view > export.

@rubensworks

> (Unrelated, but maybe also interesting: Is it possible to export parts of the KG? Maybe as HDT :))
> E: Similar to how we select tables in SQL and then export them. Create a view > export.

Certainly, such materialized views are really interesting for query optimization.

@pheyvaer
Contributor

  • The acceptance criteria have to be more concrete. It has to be a list of steps that the user should be able to complete once the solution is provided.
  • Scenarios need to be in a separate issue. There is a template for scenarios that you need to use.

4 participants