Google Sheet Sync is an agent that allows a user to convert and synchronise data between a Google Sheet and a Solid pod. It is based off of this challenge. You find the latest screencast of the agent here. You find a screencast of the first version here.
To set up the CSS instances with pod data, run
npm run prepare:pods
To start the server, run
npm run start:pods
To read and alter Google Sheets, we use the Google Sheet API. To access and use the Google Sheets API, one has to create/use a Google Cloud project.
These steps should be ignored if Google Sheets API OAuth2 client credentials (ID/secret pair) is supplied by an administrator.
- Navigate to the Google Cloud console and create a new project.
- Go to "APIs & services" and press "Enable APIs and services".
- Search for the Google Sheets API and press "Enable".
- Before we can create any OAuth2 credentials, one has to set up the OAuth consent screen by pressing "OAuth consent screen" on the left side of the projects API's & services interface.
- Select external user type.
- Fill out the required fields. App logo and domain can be left empty. No scopes have to be specified either.
- If the project is not yet published and still has the "Testing" status, the user should add themselves or be added as a test user, even if the user is the administrator/creator of the project.
- Navigate to the Google Sheet API's Credentials interface by first navigating to the Google Sheets API that is listed under "Enabled APIs & services" and navigating to the "Credentials" tab.
- Press "Create credentials" and select "OAuth client ID".
- Select application type "Desktop app" and create the credentials.
After creating or receiving OAuth2 credentials, this ID/secret pair should be pasted to the .env
file.
An example on how this should be done is present in .example.env
.
When a valid OAuth2 ID and secret is supplied, one still has to create an access token and refresh token to use the API. To create these, follow these steps using the authentication app:
- Make sure all dependencies have been installed by running
npm i
. - Run
npm run auth
to start the authentication web app. - Navigate to
http://localhost:8081/
(or another port if changed) in a browser. - Press "Authenticate" under the "Google" section.
- Log in/select a Google account that has access to the Google Cloud project and/or is added as a test user if the project is not published yet.
- When successful, the correct tokens have now been written to
credentials.json
. You find an example incredentials.example.json
.
The synchronisation app can now read and use these tokes to access the Google Sheet with the Google Sheets API.
To set up authentication for Solid pods, you use the same authentication server as the OAuth2 setup.
-
Make sure all dependencies have been installed by running
npm i
. -
Run
npm run auth
to start the authentication web app. -
Navigate to
http://localhost:8081/
(or another port if changed) in a browser. -
Fill in all the necessary information:
- host server (Url of your Community Solid Server where your pod is located)
- email for your pod
- password for your pod
The Solid instance that comes with this program is seeded with 1 pod by default. The default pod name is
example
. The email is[email protected]
with passwordabc123
. You can use these values to authenticate. -
Press "Authenticate" under the "Solid Pod" section.
-
When successful, the correct tokens have now been written to
solid-credentials.json
.
The synchronisation application is configured through the config.yml
file.
This parameter allows a user to specify a resource. This resource should be represented as a URI to a Solid pod from which the data will be fetched.
This parameter allows a user to specify the host of a resource. This is required to use the websocket protocol to listen for changes on the resource.
example:
resource: "http://localhost:3000/testing/software"
host: "http://localhost:3000"
When querying multiple resources at the same time, use the following structure.
resources:
- resource: "http://localhost:3000/testing/ratings"
host: "http://localhost:3000"
- resource: "http://localhost:3000/testing/tv-shows"
host: "http://localhost:3000"
Make sure that for every resource, you provide a host value. Because each resource could be on a different host.
You can find an example for multiple resources in the files config.query-multiple.example.yml
and rules-multiple.example.yml
This parameter allows a user to define a SPARQL query that will be used to retrieve data from the specified data sources.
example:
query: >
SELECT DISTINCT * WHERE {
?s <http://schema.org/name> ?name .
OPTIONAL {?s <http://schema.org/description> ?description} .
OPTIONAL {?s <http://schema.org/logo> ?logo} .
}
The sheet
section of the configuration file contains settings related to a specific sheet.
This parameter allows you to specify an id for the Google sheet that should be read and/or altered.
To find the id of your Google sheet, look at the URL of the Google Sheet in the address bar of your web browser. The URL should look something like this:
https://docs.google.com/spreadsheets/d/DOCUMENT_ID/edit#gid=0
Here, "DOCUMENT_ID" will be a long string of characters, letters, and numbers. This is the unique identifier for the Google Sheet.
This parameter allows you to specify a name for the Google sheet that should be read and/or altered.
This is the name of the tab on the bottom left that you want to sync.
This parameter allows you to specify the number of milliseconds between polls. The code will poll the sheet for changes after the specified number of milliseconds. The code will also poll the pod after this amount of milliseconds when websockets aren't used.
example:
sheet:
id: "ABCD1234"
name: "Sheet1"
interval: 1000
Instead of using a single, user defined SPARQL query as in the previous method, the user can use the fields
option
to specify the specific fields you want to retrieve from the data source.
This method provides a more structured way of fetching data.
This parameter allows you to specify a list of fields that must/should be present in the retrieved RDF data on the resource. Each field is represented as a key-value pair, where the key is the field name and the value is the corresponding SPARQL predicate or URI.
example:
fields:
required:
- name: "<http://schema.org/name>"
This parameter allows you to specify a list of fields that are optional in the retrieved RDF data on the resource(s). Similar to required, each field is represented as a key-value pair.
example:
fields:
optional:
- description: "<http://schema.org/description>"
- logo: "<http://schema.org/logo>"
This parameter allows you to turn off websockets when you want explicit polling every 5 seconds.
The interval
option from the Google Sheet configuration changes this value.
example:
websockets: "false"
Full configuration examples that incorporate either the query or fields method are present in
config.query.example.yml
and config.fields.example.yml
respectively.
To convert and write back changes from the Google Sheet back to the resource, the synchronisation agent
uses the RMLMapper. This mapper relies on declarative rules
that define how the RDF data should be generated from the data on the Google Sheet.
Write these rules in the form of YARRRML in the rules.yml
file.
You are responsible that the Sheet data that is fed to the RMLMapper contains enough information to be converted back to triples. The program itself keeps no track of the origin of the separate pieces of data nor the entity to which they belong.
You find an example in rules.example.yml
.
To set up and use the synchronisation agent, first make sure all the necessary dependencies have been installed by running
npm i
Afterward, start the agent by running
npm start
ESLint linter set up for this project. To run the linter, execute the following command:
npm run lint:js
There is also a markdown linter set up for this project. To run, execute the following command:
npm run lint:markdown
or
npm run lint:markdown:fix
to apply automatic fixes.
Currently, it is not handled when the sheet and another application try to update the resource in the pod at exactly the same time.