Poll solid pod for changes to update the google sheet
New things in config file:

"host" variable: used for websocket setup
sheet > name: instead of using a fixed range "A:ZZZ" for the Google API, we now read the entire sheet. Because the sheet name can be changed, it must be specified in the config file.
websockets: toggle to use websockets or not ("true" / "false")
sevrijss authored Aug 18, 2023
2 parents fac52cc + 5d83c40 commit d293f66
Showing 12 changed files with 234 additions and 24 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
src/testing.js
.env
config.yml
credentials.json
52 changes: 43 additions & 9 deletions README.md
@@ -49,16 +49,22 @@ The synchronisation app can now read and use these tokes to access the Google Sh

The synchronisation application is configured through the `config.yml` file.

#### resource (string)
### resource (string)
This parameter allows a user to specify a resource.
This resource should be represented as a URI to a Solid pod from which the data will be fetched.


### host (string)
This parameter allows a user to specify the host of a resource.
This is required to use the websocket protocol to listen for changes on the resource.

example:
```yaml
resource: "https://data.knows.idlab.ugent.be/person/office/software"
resource: "http://localhost:3000/example/software"
host: "http://localhost:3000"
```
#### query (string)
### query (string)
This parameter allows a user to define a SPARQL query that will be used to retrieve data from the specified data sources.
example:
@@ -77,12 +83,6 @@ The `sheet` section of the configuration file contains settings related to a spe
#### id (string)
This parameter allows you to specify an id for the Google sheet that should be read and/or altered.

example:
```yaml
sheet:
id: "ABCD1234"
```

To find the id of your Google sheet, look at the URL of the Google Sheet in the address bar of your web browser.
The URL should look something like this:
```
@@ -92,6 +92,26 @@ https://docs.google.com/spreadsheets/d/DOCUMENT_ID/edit#gid=0
Here, "DOCUMENT_ID" will be a long string of characters, letters, and numbers.
This is the unique identifier for the Google Sheet.

#### name (string)
This parameter allows you to specify a name for the Google sheet that should be read and/or altered. \
This is the name of the tab on the bottom left that you want to sync.


#### interval (int)
This parameter allows you to specify the number of milliseconds between polls (default 5000).
The code polls the sheet for changes at this interval.
When websockets aren't used, the code also polls the pod at this interval.

example:

```yaml
sheet:
id: "ABCD1234"
name: "Sheet1"
interval: 1000
```



### Using Fields for Data Retrieving
Instead of using a single, user defined SPARQL query as in the previous method, the user can use the `fields` option
@@ -120,6 +140,17 @@ fields:
- logo: "<http://schema.org/logo>"
```

### Debug configurations

#### websockets
This parameter allows you to turn off websockets and poll the pod explicitly instead.
Polling happens every 5 seconds (5000 ms) by default; the `interval` option from the Google Sheet configuration changes this value.

example:
```yaml
debug:
websockets: "false"
```
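
The sketch below shows one way the toggle and the polling interval could be resolved from the parsed configuration. It is an illustration under stated assumptions, not the application's actual parser: `shouldUseWebsockets` and `pollingInterval` are hypothetical names, and the helper accepts the key both under `debug` (as documented here) and at the top level (where the parser in `src/main.js` appears to look for it in this commit).

```javascript
// Hypothetical helpers, not part of the repository. They operate on the plain
// object obtained by parsing config.yml.

// Decide whether websockets should be used.
function shouldUseWebsockets(configJson) {
    // Assumption: accept the toggle under `debug` or at the top level.
    const raw = configJson?.debug?.websockets ?? configJson?.websockets;
    if (raw === undefined || raw === null) {
        return true; // websockets stay enabled unless explicitly disabled
    }
    // YAML yields the string "false" when quoted and the boolean false when unquoted.
    return String(raw).trim().toLowerCase() !== "false";
}

// Polling interval in milliseconds, falling back to the documented default of 5000.
function pollingInterval(configJson) {
    const interval = Number(configJson?.sheet?.interval);
    return Number.isFinite(interval) && interval > 0 ? interval : 5000;
}
```

Normalising the value through `String(...)` means that `websockets: false` and `websockets: "false"` behave the same, which avoids a silent mismatch when the quotes are left out.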

### Full examples
Full configuration examples that incorporate either the query or fields method are present in
@@ -154,3 +185,6 @@ write back changes from the Google Sheet back to a single destination.
### Public read/write authorization
It is required for the resource specified in the configuration file to have public read and write access,
as the agent has no support for authentication.

### No two applications may write at the same time
Currently, the case where the sheet and another application try to update the resource in the pod at exactly the same time is not handled.
18 changes: 17 additions & 1 deletion TESTS.md
@@ -27,4 +27,20 @@ Set the `id` section to the id of an existing Google Sheet.
2. Wait at least the number of milliseconds configured under `interval` in the configuration file (default 5000).

### Postconditions
- The changes are correctly converted and written back to the resource `http://localhost:3000/example/software`.
- The resource `http://localhost:3000/example/software` contains the new data.

## Test if changes on the Pod are synced back to the Google Sheet

### Preconditions
- Follow and execute all steps in the "cold start" test above.

### Steps
The pod can be updated with the following request:
```shell
curl --location --request PATCH 'http://localhost:3000/example/software' --header 'Content-Type: text/n3' --data-raw '@prefix solid: <http://www.w3.org/ns/solid/terms#>. @prefix software: <https://data.knows.idlab.ugent.be/person/office/software#>. @prefix schema: <http://schema.org/>. _:rename a solid:InsertDeletePatch; solid:inserts { software:test schema:name "test"; schema:description "abracadabra". }.'
```
When using websockets, the change should show up almost immediately;
otherwise, wait at least the configured number of milliseconds (default 5000 ms).

### Postconditions
- The Google sheet contains the new data.
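
For scripted runs of this test, the same request can be issued from Node.js. The following is a minimal sketch that assumes Node.js 18 or later (for the global `fetch`); `buildPatchRequest` and `patchPod` are hypothetical helper names that do not exist in the repository. Literals are quoted with `JSON.stringify`, which is a simplification that is adequate for plain strings such as the ones used here.

```javascript
// Build the same N3 patch request as the curl command in the steps above.
// Hypothetical helper, used only for this sketch.
function buildPatchRequest(name, description) {
    const body = [
        '@prefix solid: <http://www.w3.org/ns/solid/terms#>.',
        '@prefix software: <https://data.knows.idlab.ugent.be/person/office/software#>.',
        '@prefix schema: <http://schema.org/>.',
        '_:rename a solid:InsertDeletePatch;',
        // JSON.stringify adds the surrounding double quotes and escapes the value.
        `  solid:inserts { software:test schema:name ${JSON.stringify(name)}; ` +
            `schema:description ${JSON.stringify(description)}. }.`
    ].join('\n');
    return {method: 'PATCH', headers: {'Content-Type': 'text/n3'}, body};
}

// Send the patch to the pod and fail loudly when the server rejects it.
async function patchPod(resourceUrl, name, description) {
    const response = await fetch(resourceUrl, buildPatchRequest(name, description));
    if (!response.ok) {
        throw new Error(`PATCH failed with status ${response.status}`);
    }
    return response.status;
}
```

Calling `await patchPod("http://localhost:3000/example/software", "test", "abracadabra")` against a running pod should then have the same effect as the curl command.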
5 changes: 4 additions & 1 deletion config.fields.example.yml
@@ -1,4 +1,5 @@
resource: "http://localhost:3000/example/software"
host: "http://localhost:3000"

fields:
required:
@@ -8,4 +9,6 @@ fields:
- logo: "<http://schema.org/logo>"

sheet:
id: "ABCD1234"
id: "ABCD1234"
name: "Sheet1"
interval: 10000
5 changes: 4 additions & 1 deletion config.query.example.yml
@@ -1,4 +1,5 @@
resource: "http://localhost:3000/example/software"
host: "http://localhost:3000"

query: >
SELECT DISTINCT * WHERE {
@@ -8,4 +9,6 @@ query: >
}
sheet:
id: "ABCD1234"
id: "ABCD1234"
name: "Sheet1"
interval: 10000
4 changes: 2 additions & 2 deletions package-lock.json


3 changes: 2 additions & 1 deletion package.json
@@ -21,7 +21,8 @@
"googleapis": "^122.0.0",
"http-server": "^14.1.1",
"js-yaml": "^4.1.0",
"n3": "^1.17.0"
"n3": "^1.17.0",
"ws": "^8.13.0"
},
"type": "module"
}
15 changes: 10 additions & 5 deletions src/google.js
@@ -57,11 +57,12 @@ export async function writeToSheet(array, sheetId) {
/**
* Pull data from the sheet and check if there are any changes with the previously pulled data.
* @param {String} sheetId - ID of the Google sheet from which the data should be pulled and checked.
* @param {String} sheetName - Name of the Sheet page to check
* @return {Promise<{Boolean, Array}>} - 2D-array containing the latest data from the sheet
* and a boolean indicating a possible change.
*/
export async function checkSheetForChanges(sheetId) {
const rows = await getFromSheet(sheetId);
export async function checkSheetForChanges(sheetId, sheetName) {
const rows = await getFromSheet(sheetId, sheetName);
const hasChanged = previousRows !== undefined && !areArraysEqual(rows, previousRows);
previousRows = rows;
return {
@@ -73,12 +74,13 @@ export async function checkSheetForChanges(sheetId) {
/**
* Get the data from the sheet in the initial range.
* @param {String} sheetId - ID from the sheet from which the data should be pulled.
* @return {Array} 2D-array containing the data from the sheet.
* @param {String} sheetName - Name of the Sheet page to check
* @return {Promise<Array>} 2D-array containing the data from the sheet.
*/
async function getFromSheet(sheetId){
async function getFromSheet(sheetId, sheetName) {
const response = await sheets.spreadsheets.values.get({
spreadsheetId: sheetId,
range: 'A:ZZZ'
range: sheetName
});

return response.data.values;
@@ -115,6 +117,9 @@ function areArraysEqual(arr1, arr2) {
}

for (let i = 0; i < arr1.length; i++) {
if (arr1[i].length !== arr2[i].length) {
return false;
}
for (let j = 0; j < arr1[i].length; j++) {
if (arr1[i][j] !== arr2[i][j]) {
return false;
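
The row-length check added to `areArraysEqual` matters because reading the whole sheet yields rows of varying length: without it, a row that only gained trailing cells would still compare as equal, since the inner loop stops at the length of the first array's row. A self-contained sketch of the comparison follows. `areRowsEqual` is a hypothetical name, and treating a missing value as an empty sheet is an assumption made here because the Sheets API may omit `values` when the range is empty.

```javascript
// Compare two 2D arrays of cell values, cell by cell.
// Hypothetical stand-alone variant of areArraysEqual, for illustration only.
function areRowsEqual(a, b) {
    // Assumption: a missing value is treated as an empty sheet.
    const rowsA = a ?? [];
    const rowsB = b ?? [];
    if (rowsA.length !== rowsB.length) {
        return false;
    }
    for (let i = 0; i < rowsA.length; i++) {
        // Rows of different length can never be equal, even if they share a prefix.
        if (rowsA[i].length !== rowsB[i].length) {
            return false;
        }
        for (let j = 0; j < rowsA[i].length; j++) {
            if (rowsA[i][j] !== rowsB[i][j]) {
                return false;
            }
        }
    }
    return true;
}
```

With the length check in place, `areRowsEqual([["a"]], [["a", "b"]])` returns `false`; a version without it would report the two as equal.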
64 changes: 62 additions & 2 deletions src/main.js
@@ -1,8 +1,10 @@
import {checkSheetForChanges, makeClient, writeToSheet} from "./google.js";
import {load} from "js-yaml";
import {objectsToRdf, yarrrmlToRml} from "./rdf-generation.js";
import {queryResource, updateResource} from "./solid.js";
import {getNotificationChannelTypes, queryResource, updateResource} from "./solid.js";
import {readFile} from 'fs/promises'
import {WebSocket} from 'ws';
import {compareArrays, getWebsocketRequestOptions} from "./util.js";

// Object containing information relating to the configuration of the synchronisation app.
let config = {};
@@ -53,6 +55,22 @@ function ymlContentToConfig(ymlContent) {
throw new Error("Error parsing YAML: Google sheet id should be specified");
}

if (configJson.sheet.name) {
config.sheetName = configJson.sheet.name
} else {
throw new Error("Error parsing YAML: Google sheet name should be specified")
}

if (configJson.host) {
config.host = configJson.host
} else {
throw new Error("Error parsing YAML: host value should be specified")
}

if (configJson.websockets) {
config.noWebsockets = configJson.websockets === "false"
}

config.interval = configJson.sheet.interval ? configJson.sheet.interval : 5000;
}

@@ -167,9 +185,51 @@ async function startFromFile(configPath, rulesPath) {

console.log("Synchronisation cold start completed");

// Pod -> Sheet sync
let websocketEndpoints = await getNotificationChannelTypes(config.host + "/.well-known/solid");

if (websocketEndpoints.length > 0 && websocketEndpoints[0].length > 0 && (!config.noWebsockets)) {
// listen using websockets
let url = websocketEndpoints[0]
let requestOptions = getWebsocketRequestOptions(config.source)

let response = await (await fetch(url, requestOptions)).json()
let endpoint = response["receiveFrom"];
const ws = new WebSocket(endpoint);
ws.on("message", async (notification) => {
let content = JSON.parse(notification);
if (content.type === "Update") {
const {results} = await queryResource(config, true);
const arrays = mapsTo2DArray(results);
const maps = rowsToObjects(arrays);
const quads = await objectsToRdf({data: maps}, rml);
if (!compareArrays(quads, previousQuads, compareQuads)) {
const rows = await writeToSheet(arrays, config.sheetid);
const maps2 = rowsToObjects(rows);
previousQuads = await objectsToRdf({data: maps2}, rml);
} else {
console.log("got notified but the latest changes are already present");
}
}
})
} else {
// polling using timers
setInterval(async () => {
const {results} = await queryResource(config, true);
const arrays = mapsTo2DArray(results);
const maps = rowsToObjects(arrays);
const quads = await objectsToRdf({data: maps}, rml);
if (!compareArrays(quads, previousQuads, compareQuads)) {
const rows = await writeToSheet(arrays, config.sheetid);
const maps2 = rowsToObjects(rows);
previousQuads = await objectsToRdf({data: maps2}, rml);
}
}, config.interval);
}

// Sheet -> Pod sync
setInterval(async () => {
const {rows, hasChanged} = await checkSheetForChanges(config.sheetid);
const {rows, hasChanged} = await checkSheetForChanges(config.sheetid, config.sheetName);
if (hasChanged) {
console.log("Changes detected. Synchronizing...");
const maps = rowsToObjects(rows);
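
The websocket branch and the polling branch in `src/main.js` run the same Pod -> Sheet steps: query the pod, compare with the previous state, and write to the sheet only when something changed. One possible way to share that logic is sketched below. This is not the code in the commit: `makePodToSheetSync` and its injected functions (`readPod`, `isSame`, `writeSheet`) are hypothetical, and the guard against overlapping runs is an addition that the original does not have.

```javascript
// Hypothetical factory (not in the repository). Every collaborator is injected,
// so the sketch runs without a pod, a sheet, or the RML mapping.
function makePodToSheetSync({readPod, isSame, writeSheet, initialState, log = console.log}) {
    let previousState = initialState; // plays the role of `previousQuads`
    let running = false;              // guards against overlapping runs

    return async function syncPodToSheet() {
        if (running) {
            return false; // a previous run is still in flight; skip this trigger
        }
        running = true;
        try {
            // readPod resolves to {rows, state}: the rows to write and a comparable state.
            const current = await readPod();
            if (isSame(current.state, previousState)) {
                log("got notified but the latest changes are already present");
                return false;
            }
            // writeSheet resolves to the state derived from what was actually written.
            previousState = await writeSheet(current.rows);
            return true;
        } finally {
            running = false;
        }
    };
}
```

Both branches could then call the same function, for example from the `ws.on("message", ...)` handler after checking for `"Update"`, or from `setInterval(syncPodToSheet, config.interval)`. The `running` flag keeps a slow query from overlapping with the next timer tick or notification.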
2 changes: 1 addition & 1 deletion src/rdf-generation.js
@@ -31,7 +31,7 @@ export async function objectsToRdf(data, rml) {
/**
* Convert a String containing RDF data into quad objects.
* @param {String} text - A string containing the RDF data.
* @return {[quad]} Parsed quad objects.
* @return {Promise<[quad]>} Parsed quad objects.
*/
async function convertRdfToQuads(text) {
const parser = new Parser();
26 changes: 25 additions & 1 deletion src/solid.js
@@ -4,11 +4,15 @@ import {Writer} from "n3";
/**
* Query the data from the Solid pod/resource(s) using the configuration
* @param {Object} config - Configuration object containing the necessary information to query and process the retrieved data.
* @param {boolean} noCache - clear http cache to get most recent document.
* @return {Promise<{array, array}>} Map objects containing the retrieved data
* and all possible keys representing the properties contained in the maps.
*/
export async function queryResource(config) {
export async function queryResource(config, noCache = false) {
const myEngine = new QueryEngine();
if (noCache){
await myEngine.invalidateHttpCache();
}
const results = [];
const keys = new Set();
const query = config.query !== undefined ? config.query : configToSPARQLQuery(config);
@@ -37,6 +41,26 @@ export async function queryResource(config) {
});
}

/**
* Query the available websocket channels that may be listed in a given endpoint
* @param {string} url - host to query (e.g. http://localhost:3000/.well-known/solid/)
* @returns {Promise<string[]>} list of available endpoints to request a websocket connection
*/
export async function getNotificationChannelTypes(url){
const myEngine = new QueryEngine();
const result = await (await myEngine.queryBindings(`
SELECT DISTINCT ?channel WHERE {
?s a <http://www.w3.org/ns/pim/space#Storage> .
?s <http://www.w3.org/ns/solid/notifications#subscription> ?channel .
?channel <http://www.w3.org/ns/solid/notifications#channelType> <http://www.w3.org/ns/solid/notifications#WebSocketChannel2023>
}`,
{
sources: [url],
}
)).toArray();
return result.map(binding => binding.get("channel").value)
}

/**
* Convert the "fields" configuration data into a SPARQL query
* @param {Object} config - Configuration object containing the necessary information to build the SPARQL query (required and optional fields).
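
Once `getNotificationChannelTypes` has returned a subscription endpoint, the application posts a subscription request to it and reads `receiveFrom` from the response. The helper that builds this request, `getWebsocketRequestOptions` in `src/util.js`, is not shown in this diff. The sketch below is therefore only a guess at its shape, based on the subscription format described in the Solid Notifications Protocol; `buildSubscriptionRequest` is a hypothetical name and the real helper may differ.

```javascript
// Hypothetical reconstruction of the subscription request for a
// WebSocketChannel2023 channel; the actual helper in src/util.js is not shown.
function buildSubscriptionRequest(topic) {
    return {
        method: "POST",
        headers: {"Content-Type": "application/ld+json"},
        body: JSON.stringify({
            "@context": ["https://www.w3.org/ns/solid/notification/v1"],
            // Same channel type that getNotificationChannelTypes filters on.
            type: "http://www.w3.org/ns/solid/notifications#WebSocketChannel2023",
            // The resource to watch, e.g. the `resource` value from config.yml.
            topic
        })
    };
}
```

The options would be passed to `fetch(url, requestOptions)` as in `src/main.js`, with the configured resource URL as the topic.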