\chapter{Introduction}
\label{chapter:introduction}
\section{Background \& Motivation}\label{sec:intro:background}
% privacy laws across the world
% disconnect with technological progress
To date, 132 of the 206 states listed by the United Nations (UN) have a privacy law which regulates the usage of personal data \cite{greenleaf_global_2019-1}.
However, their intended application suffers from a disconnect with the rapid progress in technology. In particular, the use of the internet as a medium for data exchange, together with its pervasiveness and its connectivity to individuals via devices such as the smartphone, has led to industrial data harvesting at large scales \cite{christl_networks_2016}.
To counter this problem, lawmakers in the European Union (EU) passed the General Data Protection Regulation (GDPR) \cite{Regulation_GDPR} in 2016 with the aim of providing individuals with the right to information and control over use of their personal data, and to simplify requirements for organisations through a unified regulation across the EU.
The GDPR has received a large amount of attention due to its fines, which can be up to 4\% of an organisation's worldwide annual turnover or €20 million, whichever is greater.
As of February 2020, there have been over 215 publicly known instances of fines associated with the GDPR \cite{GDPR_fines_tracker}, the largest of which was the €50 million fine to internet giant Google \cite{CNIL_GOOGLE_2019}.
Being a regulation and replacing the Data Protection Directive (DPD) \cite{directive_DPD}, GDPR provides a uniform set of compliance requirements across the EU, and is the basis of national privacy laws implemented in its member states \cite{mccullagh_national_2019}.
Furthermore, GDPR has influenced other privacy laws, such as the California Consumer Privacy Act (CCPA) \cite{CCPA}, thereby further expanding similarities in compliance requirements across the globe.
The most visible change of the GDPR for most individuals is the ubiquitous `consent dialogue' on websites that requests `consent' - one of the legal bases for processing of personal data in GDPR.
Despite being a legal requirement, consent dialogues have been accused of being non-transparent and subverting the spirit of the GDPR \cite{machuletz_multiple_2019,utz_informed_2019}.
Consent has also attracted significant interest in the development and use of technological solutions for compliance. This is due to the right to withdraw consent provided by the GDPR, which enables an individual to revoke previously given consent and requires processing of personal data based on it to be halted.
Opinions published by legal experts and bodies, in absence of case law on this issue, have expressed the need for greater transparency regarding activities associated with use of consent \cite{opinion_AG_2019}.
% information associated with gdpr compliance
Compared to other privacy laws, including its predecessor DPD, GDPR provides significantly stricter and more detailed requirements for processing of personal data and requires organisations to explicitly document information in relation to its obligations in order to be compliant.
This information consists of identification of GDPR clauses applicable to the practices of an organisation and the steps taken to fulfil requirements and obligations for compliance.
From a technical or information management point of view, GDPR specifies interactions between entities in a clear manner. An example of this is an organisation using consent as the legal basis being required to provide information about processing activities to the data subject. Furthermore, this information is required to be maintained, evaluated, and documented to demonstrate compliance upon request by authorities. At the same time, this information is also associated with other stakeholders - such as through privacy policies, user agreements, terms and conditions, or even data processing agreements. This makes it clear that information associated with GDPR compliance is also used in other applications and involves multiple stakeholders.
As GDPR is a data protection law, its compliance is concerned primarily with processing of personal data, its legality, and associated operations within an organisation.
This includes processing in both tenses - past as well as future - where an organisation is obligated to first determine and ensure its requirements and activities involving processing of personal data are valid as per the GDPR, and to then maintain a record of such activities as the processing takes place.
These are defined\footnote{Defined in EU terminology database (IATE) \url{https://iate.europa.eu/entry/result/787324/en}} within the legal domain by the terms `ex-ante' to specify compliance assessment before activity takes place (preventative) and `ex-post' to specify compliance assessment after the activity has taken place (corroborative).
While the GDPR does not explicitly mention a `phase' of compliance, its use enables associating the information to the planning and processing operations carried out within an organisation. The planning of processing operations also involves investigation of whether the intended operations will be compliant with the legal obligations, and the required corrections to ensure they continue to be so. The processing operations carried out also need to be inspected to ensure they met the requirements set forth in the planning stage and that the processing itself was legally compliant.
The combination of new requirements and significant fines has provided an incentive to utilise technology in meeting the obligations and requirements stipulated by GDPR towards its compliance.
Existing efforts, such as those of the International Organization for Standardization (ISO), have addressed this change by updating standards to meet the increased requirements of global privacy laws.
In the context of GDPR, ISO/IEC 27001\footnote{\url{https://www.iso.org/standard/54534.html}} defines requirements for an information security management system, and its extension ISO/IEC 27701\footnote{\url{https://www.iso.org/standard/71670.html}} defines a privacy information management system, which together provide a framework for managing privacy risks associated with personal data processing.
Adherence to such standards provides a commonality in the information management practices of an organisation, and assists in the compliance process by providing a structured interpretation and demonstration of practices based on the standardised specifications.
% challenges in developing technological solutions
Technological development of solutions for legal compliance faces two problems in general - the first being algorithmic interpretation of requirements associated with legal compliance. This is difficult as the text used in a legal document such as GDPR does not readily lead to algorithmic compliance due to ambiguity and uncertainty in its legal interpretation - especially in domain specific use-cases.
In addition, because GDPR has been enforced for a comparatively short period - the interpretation of its clauses as requirements for compliance relies on clarification through legal opinions and decisions by supervisory authorities and courts.
The second problem is that regardless of how technology is used in the compliance process, formal investigations of legal compliance require information to be documented and associated with the specifics of the law they intend to comply with - in this case the articles and clauses of GDPR.
Traditionally, this is carried out through creation of documentation by legal experts, lawyers, and legal departments.
Therefore, technological solutions addressing GDPR compliance must also provide information documentation in addition to assessment of compliance.
% problem with existing text based approaches
Incorporating legal compliance into organisational requirements has led to several approaches such as: use of symbolic (mathematical) logic, knowledge representation of legal text as logical rules, deontic logic specifying rights and obligations, defeasible logic based on exceptions, first order temporal logic, access control, markup based representations, and goal modelling of obligations \cite{otto_addressing_2007}.
While there has been significant work in the use of technology to adopt these approaches towards addressing and evaluating compliance in the last decade \cite{sadiq_modeling_2007,otto_addressing_2007,gordon_rules_2009,fellmann_state---art_2014,benyoucef_information_2015,elgammal_formalizing_2016,kirrane_access_2016}, the issue of associating information with legal documents has received relatively less attention.
While contemporary methods are sufficient to meet legal requirements, their use of text-based document formats prevents effective technological solutions that can be scaled, automated, or utilised in an information management system. To enable such approaches, information associated with compliance must be represented using machine-readable formats that enable the use of querying to retrieve information as well as validation methods to ensure its correctness. Furthermore, the need to share information between stakeholders defined within the GDPR provides motivation towards developing interoperability in information and solutions - which also provides transparency in the compliance process. By using open and interoperable standards, the commonality in representation and interpretation of information benefits stakeholders and reduces costs associated with innovation regarding information management and regulatory compliance.
% linked data principles and ELI, Akoma Ntoso
Governmental agencies across the globe have addressed the issue of information interoperability by adopting the principles of Linked Data \cite{bizer_linked_2011} and have produced interoperable standards \cite{palmirani_akoma_2018,european_union_eli_2015,van_opijnen_european_2011} to facilitate use of information in technological solutions.
These standards implement principles of the Semantic Web \cite{semantic-web} by utilising the Resource Description Framework (RDF) \cite{RDF} to specify information in an interoperable, extensible, and machine-readable manner.
This has paved the way for development of technologies that address challenges associated with legal compliance through greater use of automation and operations at large scales.
Consequently, the use of Linked Data and Semantic Web within the legal domain has resulted in the development of ontologies for organising and structuring information, reasoning and problem solving, semantic indexing and search, semantic integration and interoperability, and understanding the domain \cite{rodrigues_legal_2019}.
Semantic Web is also being used to address the challenges associated with GDPR compliance through commercial\footnote{Example: Top Quadrant's Semantic Data Governance for GDPR Compliance \url{https://www.topquadrant.com/gdpr-compliance/}} solutions as well as large-scale European research projects such as SPECIAL \cite{SPECIAL}, MIREL \cite{MIREL}, DAPRECO \cite{DAPRECO}, BPR4GDPR \cite{BPR4GDPR}, and RestAssured \cite{RestAssured}.
The technological solutions developed within these utilise ontologies to represent the information required for compliance and a corresponding approach that expresses and evaluates obligations to assess compliance.
In general, semantic web technologies provide numerous advantages, such as: their basis in standards maintained through community and stakeholder engagement by W3C; an interoperable information representation specification that serves as the base for other specifications that build on top of it to provide knowledge modelling, querying, and validation; and the ability to easily build upon an existing knowledge-system by extending the underlying data models while retaining compatibility. These advantages make semantic web technologies an ideal and useful toolset in the legal compliance domain, and especially for the GDPR given the emphasis of its requirements on information and documentation for compliance.
%%% link between GDPR compliance and research question???
The work presented in this thesis also utilises the Semantic Web to address GDPR compliance. It focuses on the representation of activities associated with processing of personal data and consent as a subset of information relevant to the investigation of GDPR compliance. These activities correspond to how organisations plan their processing of personal data and execute or implement them, and are therefore relevant to the planning and management of operations within organisations.
This includes activities associated with acquiring consent owing to the role of consent as a legal basis and the assertion that consent itself is also personal data.
The novelty of this work lies in the application of linked data principles to associate information with GDPR and the advantages this provides in utilising semantic web technologies to represent, query, and validate information relevant for compliance.
The role of the semantic web in this is to represent information relevant for GDPR compliance in a form that can be associated with the text of GDPR following Linked Data principles.
This involves use of existing standards of RDF \cite{RDF} and Web Ontology Language (OWL2) \cite{OWL} to represent information as ontologies, SPARQL \cite{SPARQL} for querying information, and Shapes Constraint Language (SHACL) \cite{SHACL} to validate information.
The use of semantic web standards and technologies enables the information to be persisted in a machine-readable, interoperable, and queryable form - and thus readily lends itself to automation using technological solutions in the areas of legal compliance and its documentation.
% The work presented in this thesis and representing its contributions is presented with a comparison to approaches identifed and analysed within the state of the art to demonstrate its novelty.
% Research Scope
In terms of scope, the work presented in this thesis addresses only the representation and management of information associated with GDPR compliance, and is not intended to provide an authoritative assessment of compliance as only supervisory authorities and courts have legal authority in this matter.
In the same vein, the research presented in this thesis is also not intended to replace professional opinions such as those offered by lawyers and legal experts.
Instead, the intention of the work is to demonstrate the applicability and feasibility of using technology as a tool to assist with the compliance process.
\section{Research Question}\label{sec:intro:RQ}
% The aim of the thesis is to enable representation of information associated with ex-ante and ex-post activities involving processing of personal data and consent by using semantic web technologies for GDPR compliance.
The research question investigated in this thesis is:
\begin{framed}
\small{Research Question}
\begin{quote}
\textbf{To what extent can information regarding activities associated with processing of personal data and consent be represented, queried, and validated using Semantic Web technologies for GDPR compliance?}
\end{quote}
\end{framed}
\subsection{Definitions}\label{sec:intro:definitions}
The following definitions are used in the context of the research question outlined above and this thesis:
\begin{itemize}
\item \textit{information regarding activities}: information about how processes, services, tasks, or other similar concepts are planned, executed or carried out, along with the resulting outcomes and the artefacts used or required;
\item \textit{activities associated with processing of personal data}: information about how personal data will be or has been obtained (its source), its usage - including storage, sharing, analysis, or other forms of processing;
\item \textit{activities associated with consent}: information about how consent will be or has been obtained, its usage as a legal basis, the information represented by consent, and its planned or recorded withdrawal;
\item \textit{querying}: retrieving information using a structured representation based on the underlying representation of information;
\item \textit{validation}: assessment of information to meet a constraint or requirement;
\item \textit{associate or link information with GDPR}: to establish an association or link between information and clauses or concepts of the text of GDPR;
\item \textit{subset of GDPR}: a subset of the clauses defined in the text of the GDPR;
\item \textit{ex-ante compliance}: compliance regarding processing before it has taken place, i.e. \textit{A-priori};
\item \textit{ex-post compliance}: compliance regarding processing after it has taken place, i.e. \textit{A-posteriori};
\item \textit{compliance questions}: questions that retrieve information relevant for determination of compliance;
\item \textit{transparency of information}: specifying or providing information in a way that enables others (external entities) to understand and analyse it.
\end{itemize}
\subsection{Research Objectives}\label{sec:intro:RO}
The research question represents a broad investigation which is difficult to address as a whole. Therefore, it is reconstructed as multiple `sub-research questions' which are smaller in scope and provide specific aims in the form of research objectives. These objectives are influenced by the analysis of the state of the art and subsequent identification of gaps in \autoref{sec:sota:analysis} as potential opportunities to answer the research question.
The first two objectives are structured on the identification of information required for GDPR compliance. The third objective focuses on the use of semantic web technologies for information representation, while the fourth and fifth objectives are associated with querying and validation of information respectively.
% RO1
The GDPR is a legal document structured into 173 Recitals, 99 Articles, and 21 Citations. Of these, not all clauses are relevant to activities associated with personal data and consent. Therefore, the first research sub-question concerns investigation and identification of the subset of GDPR regarding activities associated with personal data and consent, along with information on the ex-ante and ex-post aspects of such activities towards compliance. This provides the first objective as:
\begin{framed}
$RO1$: Identify the subset of GDPR relevant for activities associated with processing of personal data and consent regarding compliance.
\end{framed}
% RO2
Following identification of the relevant subset of GDPR, information required to represent activities needs to be identified through `compliance questions' representing an investigation process to identify the actors, entities, and relationships relevant for GDPR compliance. This provides the second objective as:
\begin{framed}
$RO2$: Identify information required to represent activities associated with processing of personal data and consent in investigation of GDPR compliance.
\end{framed}
% RO3
The identified information is then represented as semantic web ontologies consisting of concepts and relationships. This representation acts as the information model upon which questions or queries can be executed to retrieve information for determining compliance. The formalisation of information as an ontology provides a controlled vocabulary for validation of information to determine its sufficiency and correctness before determining compliance.
% Rather than assimilating all information within a singular ontology, good practice dictates creation of modular ontologies specific to a particular task of domain. The information requirements can thus be divided into three distinct areas, each of which correspond to a specific topic or context, and lead towards the creation of an ontology within it. The first sub-objective concerns creating an ontology to associate information with the concepts and clauses within the text of the GDPR. The second objective utilises this ontology to represent activities associated with processing of personal data and consent. The third objective provides additional information regarding consent as required to determine its compliance. The distinction between the second and third objectives is based on the requirement of information other than that associated with activities in the determination of compliance for consent.
Instead of representing all required information in a single large ontology, modular ontologies provide better reuse and are easier to engineer \cite{suarez-figueroa_neon_2012}.
A modular ontology is limited in scope towards representing a specific information category, and therefore is more consistent in its representation of concepts, and is easier to evaluate as compared to a larger ontology in which different concepts may have differing semantic connotations.
Modular ontologies also provide better motivation for reuse through selective choice of concepts in a module without dependency on concepts in other modules.
With this as motivation, the larger objective of $RO3$ for creating an ontology to address the research question is divided into three modular ontologies of: $RO3(a)$ - associating information with clauses and concepts of GDPR; $RO3(b)$ - representing information about activities associated with processing of personal data and consent; and $RO3(c)$ - representing information about consent.
This provides the third objective as:
\begin{framed}
$RO3$: Create OWL2 ontologies for representing information about:
\newline\indent\indent\textbf{(a)}: concepts and text of GDPR
\newline\indent\indent\textbf{(b)}: activities associated with processing of personal data and consent
\newline\indent\indent\textbf{(c)}: consent required to determine compliance
\end{framed}
% RO4
`Compliance questions' retrieve information required to determine compliance, and are important in the documentation process. The information retrieval can be automated by utilising SPARQL queries to represent compliance questions using corresponding concepts and relationships from the developed ontologies. This provides the fourth objective as:
\begin{framed}
$RO4$: Represent compliance questions using SPARQL to query information about activities associated with processing of personal data and consent.
\end{framed}
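As an illustrative sketch only, a compliance question such as `Which processing activities rely on consent as their legal basis, and for what purpose?' could be expressed in SPARQL as follows. The \texttt{ex:} namespace and the terms \texttt{ex:ProcessingActivity}, \texttt{ex:hasLegalBasis}, \texttt{ex:Consent}, and \texttt{ex:hasPurpose} are hypothetical placeholders assumed for this example, and are not the vocabulary of the ontologies developed in this thesis.
\begin{verbatim}
# Hypothetical vocabulary, for illustration only
PREFIX ex: <http://example.org/compliance#>

SELECT ?activity ?purpose
WHERE {
  # processing activities whose legal basis is an instance of consent
  ?activity a ex:ProcessingActivity ;
            ex:hasLegalBasis ?basis .
  ?basis a ex:Consent .
  # the purpose is optional so that activities lacking one are still listed
  OPTIONAL { ?activity ex:hasPurpose ?purpose . }
}
\end{verbatim}
Keeping the purpose optional in this sketch means that an activity without a recorded purpose still appears in the results, which makes gaps in the documented information visible.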
% RO5
The determination of compliance includes assessing whether the given information satisfies all obligations and requirements, and also involves validation of the information itself in terms of correctness and completeness.
In software engineering processes, such assessments are automated as `tests' that validate data and produce a report to record documentation.
The same principle is utilised here to assess information for correctness and completeness based on requirements of GDPR.
This is done using SHACL which enables expressing validation requirements over developed ontologies and produces a report which can be persisted and linked back to the GDPR for documentation of compliance.
This provides the fifth objective as:
\begin{framed}
$RO5$: Utilise SHACL to:
\newline\indent\indent\textbf{(a)}: validate information for GDPR compliance regarding activities associated with processing of personal data and consent
\newline\indent\indent\textbf{(b)}: link validation results with GDPR
\end{framed}
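As a minimal sketch of such a validation requirement, the SHACL shape below (in Turtle syntax) states that every instance of consent must record at least one purpose. The \texttt{sh:} terms are part of the SHACL specification, while the \texttt{ex:} namespace and the terms \texttt{ex:ConsentShape}, \texttt{ex:Consent}, and \texttt{ex:hasPurpose} are hypothetical placeholders assumed for this example rather than terms from the developed ontologies.
\begin{verbatim}
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/compliance#> .

# Hypothetical shape, for illustration only
ex:ConsentShape a sh:NodeShape ;
    sh:targetClass ex:Consent ;      # applies to every ex:Consent instance
    sh:property [
        sh:path ex:hasPurpose ;      # property being checked
        sh:minCount 1 ;              # at least one purpose is required
        sh:message "Consent must specify at least one purpose." ;
    ] .
\end{verbatim}
A SHACL processor evaluating this shape would report a \texttt{sh:ValidationResult} for each instance of \texttt{ex:Consent} that lacks a purpose, and the resulting report is itself RDF that can be stored and annotated further.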
\section{Research Methodology}\label{sec:intro:research-methodology}
% state of the art
\subsection{Reviewing the State of the Art}
A review of the state of the art (SotA) regarding approaches towards GDPR compliance was conducted at several stages of the research from March 2016 to September 2019.
Publications associated with research objectives were driving factors in providing requirements to conduct a SotA review to capture the approaches and progress at that particular time.
In addition, a general review of legal models for compliance was also conducted to identify relevant approaches which could be reused towards addressing requirements of the GDPR.
The inclusion of approaches in SotA largely focused on the use of semantic web technologies and the extent of their applicability towards addressing the requirements of the GDPR.
An understanding of GDPR was obtained from sources including the official text of GDPR \cite{Regulation_GDPR}, its interpretation and clarification provided by authoritative bodies such as Data Protection Commissions in various jurisdictions, Article 29 Working Party (A29WP), and the European Data Protection Board (EDPB).
In addition, guides and expert opinions provided by legal experts and organisations were utilised as non-authoritative sources to better understand requirements of GDPR compliance.
Information requirements associated with compliance presented within the thesis are based on these sources and through studying case law related to interpretation of the GDPR where accessible.
Approaches and resources within SotA were reviewed where information was open and accessible - such as through academic publications and project deliverables.
Where such information was not accessible - such as in commercial products and some resources in academic projects - only the available information was included in the review of SotA.
Publications and resources were discovered through Google Scholar, Scopus, IEEE Xplore, ACM Digital Library, through events such as conferences, and through dissemination networks such as Twitter.
Zotero was used as a bibliography tool for managing references and notes.
\subsection{Information Gathering}
The gathering of information regarding requirements of GDPR and its compliance was done through a literature review of official and authoritative documentation published by legal bodies and organisations.
In order to understand the requirements of GDPR and the stakeholders involved, a model was developed to capture the information interoperability requirements of each stakeholder.
The information about GDPR and its requirements was used to create `compliance questions' to guide the ontology development process by acting as `competency questions' (see \autoref{sec:intro:ontology-engineering}) and to act as queries for retrieving information relevant to the compliance process. The questions also provided the basis for creating information validation constraints.
This process fulfilled research objectives $RO1$ and $RO2$ and is described in \autoref{chapter:information}.
% construction of ontologies
\subsection{Ontology Engineering}\label{sec:intro:ontology-engineering}
The ontologies developed to fulfil research objective $RO3$ used methodologies commonly adopted and recommended within the semantic web community. A general introductory guide for creating ontologies \cite{noy_ontology_2001} was used to understand and start the process of ontology engineering.
The actual construction of ontologies followed a combination of the NeOn methodology \cite{suarez-figueroa_neon_2012} and UPON Lite \cite{de_nicola_lightweight_2016} - where NeOn was used to identify existing scenarios and gather requirements and UPON Lite was used to derive actionable steps or tasks to build and test the ontology using an agile development process.
The combination provided a methodology for identifying relevant information from the GDPR (using NeOn) and iteratively building and updating an ontology to represent it (using UPON Lite).
The methodology is described in more detail in \autoref{sec:voc:methodology}, with a summary as:
\begin{enumerate}
    \item Identify aims, objectives, and scope
\item Identify and analyse relevant information
\item Create use-cases and competency questions
\item Identify concepts and relationships
\item Create Ontology
\item Evaluate
\item Progressive iterations following steps 2 to 6
\item Dissemination
\end{enumerate}
Each ontology was documented with metadata based on best practices advocated by the semantic web community\footnote{\url{https://dgarijo.github.io/Widoco/doc/bestPractices/index-en.html}} for automatic generation of documentation using the WIDOCO tool \cite{garijo_widoco_2017}.
The namespace IRI was defined with persistent identifiers through the use of W3ID\footnote{\url{http://w3id.org/}}.
The ontology itself was archived in the public open repository Zenodo\footnote{\url{https://zenodo.org/}} which provided it with DOIs.
All code and resources associated with the ontologies are published on GitHub - an open and public code repository.
The ontology and related resources were hosted on Trinity College Dublin servers to enable resolution of their IRIs on the internet.
% querying and validation framework
\subsection{Querying Information for GDPR Compliance}
The querying of information utilised SPARQL and fulfilled research objective $RO4$.
The methodology to represent compliance questions as SPARQL queries utilised questions from a real-world document published by the Irish Data Protection Commission for assisting organisations in evaluating their readiness for GDPR.
The querying was demonstrated by representing each question within the document as a SPARQL query using the developed ontologies and executing it over a synthetic use-case.
\subsection{Information Validation Framework for GDPR Compliance}
In order to demonstrate the validation of information, a modular framework was proposed in \autoref{sec:testing:shacl} consisting of creating a `compliance graph' separate from the data graph for storing information relevant to compliance.
This facilitated the querying and validation of information associated with compliance in a modular approach using SPARQL and SHACL respectively.
The constraints and assumptions created from compliance questions in \autoref{chapter:information} were represented using SHACL and used to validate information based on obligations and requirements of GDPR compliance.
Its application was demonstrated through a use-case evaluating validity of consent on a real-world website.
% evaluation strategy
\subsection{Evaluation Methodology}\label{sec:intro:evaluation}
A summary of evaluation methods used in the thesis is presented in \autoref{table:intro:evaluation-methods}.
\begin{table}[htbp]
\footnotesize
\centering
\caption{Summary of Evaluation Methods}\label{table:intro:evaluation-methods}
\rowcolors{1}{}{gray!10}
\begin{tabularx}{\textwidth}{|l|X|X|X|X|X|}
\hline
Method & GDPRtEXT Ontology & GDPRov Ontology & GConsent Ontology & Querying using SPARQL & Validation using SHACL \\ \hline
Fulfilment of Competency Questions & \cmark & \cmark & \cmark & N/A & N/A \\ \hline
Semantic reasoner logical consistency & \cmark & \cmark & \cmark & \cmark & \cmark \\ \hline
OOPS! common pitfalls detection & \cmark & \cmark & \cmark & N/A & N/A \\ \hline
Documentation metadata and quality & \cmark & \cmark & \cmark & N/A & N/A \\ \hline
Demonstrate application to use-case & \cmark & \cmark & \cmark & \cmark & \cmark \\ \hline
External use-case & \xmark & \cmark & \cmark & \cmark & \cmark \\ \hline
Comparison with SotA & \cmark & \cmark & \cmark & \cmark & \cmark \\ \hline
Analysis of citations & \cmark & \cmark & N/A & \cmark & N/A \\ \hline
\multicolumn{6}{|l|}{Dissemination of work (for providing transparency)} \\ \hline
Peer-reviewed publication & \cmark & \cmark & \cmark & \cmark & \cmark \\ \hline
Reproducibility (open access resources) & \cmark & \cmark & \cmark & \cmark & \cmark \\ \hline
\end{tabularx}
\end{table}
\subsubsection{Evaluating Ontologies}
%% ontologies
The developed ontologies presented in \autoref{chapter:vocabularies} were assessed for their sufficiency and completeness in answering the competency questions they were designed for.
In addition, use-cases related to situations differing in compliance requirements were used to assess the ontology in terms of sufficient representation of related information. These use-cases were compiled from GDPR-related case law, SotA, and synthetic situations, and validated regarding information requirements with a legal expert.
Ontologies were also evaluated using best practices advocated by the community throughout their development: a semantic reasoner was used to ensure logical consistency in expressed facts and axioms, and the OOPS! \cite{poveda-villalon_oops!_2014} online service was used to detect common pitfalls in ontology design.
% While most detected pitfalls were corrected in the ontology engineering process, some pitfalls were found to have originated in parts of reused ontologies not relevant to this work and were ignored .
Finally, the sections in this thesis describing each ontology present a comparison against similar ontologies identified in SotA to analyse novelty, strengths, and weaknesses.
The sections also present relevant peer-reviewed publications where ontologies were presented and discussed. Citations to these publications were used to identify relevant approaches and to investigate criticisms and comparisons with other ontological representations.
An ad-hoc evaluation of ontologies is also presented through their use in querying and validation of information for research objectives $RO4$ and $RO5$. This demonstrated that each ontology provides sufficient concepts for representing information to facilitate querying and validation processes.
\subsubsection{Evaluating Querying of Information}
The use of SPARQL to query information based on compliance questions as presented in \autoref{sec:testing:sparql} was evaluated by applying it to questions in a document published by the Irish Data Protection Commission.
The SPARQL queries utilised developed ontologies to represent the given question as a compliance question and provided an opportunity to evaluate the extent to which the ontology could represent these concepts.
The approach itself was evaluated based on the extent to which the questions in the document could be expressed as SPARQL queries.
Where a question could not be or was not expressed using SPARQL, an analysis was carried out to determine the reason - such as the question not being in scope of the research question.
\subsubsection{Evaluating Information Validation Framework}
%% querying and validation framework
The framework developed for validating information using constraints derived from compliance questions is presented in \autoref{sec:testing:shacl}.
Its evaluation consisted of generating a synthetic use-case using the consent mechanism of a real-world website where the constraints related to consent and personal data activities were validated on information from the website.
The use-case enabled representation of activities in both ex-ante and ex-post phases where ex-ante represented validity of the consent dialogue being presented, and ex-post represented determining validity of given consent.
The information regarding activities related to personal data and consent within the use-case was represented using developed ontologies.
SHACL was then used to define constraints derived from competency questions with links to GDPR added to the constraints using developed ontologies and custom properties.
The evaluation consisted of demonstrating use of SHACL and developed ontologies to express the constraints and the ability to link constraints and their validation results with relevant clauses of GDPR.
The approach also demonstrated the use of validation results as actionable tasks for compliance associated with clauses of GDPR.
The framework and the application were compared with approaches within the SotA to demonstrate novelty in use of SPARQL and SHACL for GDPR compliance.
\section{Contributions of this Thesis}\label{sec:intro:contributions}
The two major contributions of this thesis are (based on ontologies in $RO3$): first - enabling association of information with the text of GDPR following linked data principles, and second - ontologies for representing information about activities associated with processing of personal data and consent. Minor contributions include formulating an information model of entities and their relationships in GDPR (based on information in $RO1$ and $RO2$), and using semantic web technologies for querying (based on $RO4$) and validating (based on $RO5$) information required for compliance. Resources associated with the contributions\footnote{\url{http://openscience.adaptcentre.ie/res/}}, including published papers\footnote{\url{https://openscience.adaptcentre.ie/publications/}}, have been made accessible under open licenses (MIT, CC-by-4.0) for reproducibility and to foster adoption and re-use by the community.
\subsection{GDPR as a Linked Data Resource}
The first major contribution of this thesis is the GDPRtEXT resource - which provides a linked data version of the text of GDPR and a glossary of its concepts. It fulfils research objective $RO3(a)$ and enables fulfilment of $RO5(b)$ by exposing each individual article or point within the text of GDPR as a unique resource using semantic web to enable links to be established between information and clauses of the GDPR. As these links are machine-readable, they can be used in approaches to automate the generation and querying of information associated with GDPR - such as for compliance, management of business processes, or generation of privacy policies. Furthermore, GDPRtEXT extends and is therefore compatible with the ELI ontology \cite{thomas_european_2019} used by the European Publications Office to publish legislation - including GDPR. ELI currently provides representations only at the document level, which GDPRtEXT extends for representing clauses at a granular level. GDPRtEXT thus provides its features in a manner compatible and interoperable with ELI.
It is currently common practice to refer to concepts within legal documents such as GDPR by associating them with their defining or relevant clauses within the document.
GDPRtEXT provides a glossary (and vocabulary) of concepts defined or referred to within GDPR to assist with use of concepts associated with its compliance. Each concept or term is associated with its definition or articles of relevance within GDPR by using the linked data version of text provided by GDPRtEXT. This provides another way to link information to GDPR through use of concepts and has been used to indicate the source in definitions of terms and relationships within the other developed ontologies (see \autoref{sec:contributions:ontologies}).
GDPRtEXT fills an important gap in the state of the art (as investigated in \autoref{chapter:sota}) by providing a mechanism to link information with the text of GDPR in a machine-readable manner. It is the only provider of a semantic web glossary of terms associated with GDPR and its compliance with a reference to their definition and usage within the text of GDPR.
While there are other comparable and relevant methods to address such information \cite{agarwal_legislative_2018,palmirani_pronto_compliance_2018}, GDPRtEXT is currently the only one that uses and extends ELI \cite{ELI_2012} - the official metadata standard for European legislation documents, and is also the only open and accessible ontology regarding GDPR and its concepts \cite{leone_taking_2019}.
GDPRtEXT has been released\footnote{\url{https://w3id.org/GDPRtEXT}} under an open license (CC-by-4.0) and has been incorporated into Ireland's open data portal\footnote{\url{https://data.gov.ie/dataset/gdprtext}}.
The provision of machine-readable concepts and reference to clauses of the GDPR makes GDPRtEXT an important resource for use in legal knowledge graphs.
\subsection{Ontologies for representing activities about Personal Data and Consent}\label{sec:contributions:ontologies}
The second major contribution of this thesis is the pair of semantic web ontologies - GDPRov for representing information about activities associated with processing of personal data and consent, and GConsent for representing information associated with determining compliance of consent. Both ontologies define concepts and relationships using GDPRtEXT to indicate source within GDPR.
Together with GDPRtEXT, GDPRov and GConsent enable representation of activities required to evaluate and validate compliance with the GDPR. Apart from advancing the state of the art, the ontologies also provide a vocabulary of terms and concepts relevant for GDPR compliance, and demonstrate the use of legal documents as a source for ontologies using linked data principles.
Their usefulness has been demonstrated in approaches for representing information in privacy policies \cite{pandit_ontology_2018}, generating privacy policies from metadata \cite{pandit_personalised_2018}, and automating change-detection and its effects on activities \cite{pandit_gdpr-driven_2018}.
GDPRov\footnote{\url{https://w3id.org/GDPRov}} and GConsent\footnote{\url{https://w3id.org/GConsent}} are published under an open license (CC-by-4.0).
\subsubsection{GDPRov}
GDPRov enables representation of the processes and activities associated with the life-cycle of personal data and consent, and fulfils the research objectives $RO3(b)$ and $RO3(c)$.
GDPRov extends PROV-O \cite{lebo_prov-o_2013} - which is the W3C standard for defining provenance information - to define ex-post (activity logs indicating things that have happened) information, and P-Plan \cite{garijo_p-plan_2014} to define ex-ante (as an abstract model, template, or plan) representations of PROV activities based on scientific workflows.
This enables it to represent planned activities as a model or template which is required to assess ex-ante compliance, and to associate it with its corresponding executions which are required to assess ex-post compliance.
The linking of information between ex-ante and ex-post phase in GDPRov comes from its basis in scientific workflows. It also provides the opportunity to exploit this association for a more efficient approach in evaluation of compliance, as proposed and demonstrated in \autoref{sec:testing:shacl}, and summarised as a contribution in the sections below.
% In this approach, ex-post activities are assumed to be compliant for information already found compliant in their ex-ante representation. Therefore, ex-post activities only need to be evaluated for information and validations specific to the ex-post phase, thus saving the repetition in evaluations of information. An application of this approach is demonstrated using SHACL and features prominent use of GDPRov and GConsent in \autoref{sec:testing:shacl}.
The state of the art contains ontologies for representing activities and their provenance related to the GDPR \cite{pasquier_data_2018,palmirani_pronto_compliance_2018}, including those utilising PROV \cite{belhajjame_provenance_2018,bonatti_special_2018-1}, and holistic approaches combining ex-ante and ex-post compliance \cite{dullaert_d3.4_2019}.
In comparison, GDPRov provides the most exhaustive vocabulary of concepts based on the GDPR (based on comparisons demonstrated in \autoref{chapter:vocabularies}), and is the only ontology to provide ex-ante and ex-post concepts within the same ontology.
GDPRov thus advances the state of the art by providing the most comprehensive vocabulary for modelling and representing activities based on GDPR concepts.
\subsubsection{GConsent}
The determination of consent validity under the GDPR requires additional information \cite{politou_forgetting_2018,article_29_data_protection_working_party_guidelines_2018} which is not captured using GDPRov as it does not relate to representation of activities and artefacts.
Therefore, a separate ontology called GConsent was created (and is the basis for formulating research objective $RO3(c)$) to provide necessary concepts and relationships for representing information relevant for management of consent.
GConsent focuses on representation of \textit{only} consent information as required to evaluate its compliance. It acts as a distinct modular ontology which can be used by itself to represent consent, or in conjunction with GDPRov to represent consent and its related activities.
While GDPRov and GConsent both represent consent, the focus of GDPRov is on representing activities and artefacts associated with consent, while GConsent represents information associated with management of consent based on GDPR compliance requirements.
Another perspective on this is that GDPRov represents a specific semantic view based on the notion of capturing provenance of activities in ex-ante and ex-post phases, while GConsent represents a state-based representation of consent.
As the application of these ontologies within use-cases in \autoref{chapter:testing} shows, both ontologies share some concepts and overlap, but are complementary in their use and have different aims in their representation of information.
GConsent provides the necessary concepts and relationships to express information about consent in terms of entities such as individuals or agents, purposes and processing, involvement of third parties, medium and context of provision, relationship between instances (e.g. withdraws, updates), and the novel concept of `consent states' which enables management of consent as an entity.
In comparison with state of the art, GConsent provides greater representation of information related to consent and is the most comprehensive ontology for representing consent (based on comparisons demonstrated in \autoref{chapter:vocabularies}).
\subsection{Querying Information Related to Compliance using SPARQL}\label{sec:contributions:querying}
A minor contribution of this thesis is the utilisation of SPARQL to query information relevant for GDPR compliance, which fulfils research objective $RO4$.
The developed ontologies - namely GDPRtEXT, GDPRov, and GConsent - provide representation of concepts associated with GDPR for use in SPARQL queries to represent compliance questions derived from state of the art (see \autoref{chapter:information}).
While approaches in state of the art also use SPARQL to represent questions for compliance \cite{agarwal_legislative_2018,palmirani_pronto_2018}, the work presented in \autoref{sec:testing:sparql} is the only one within the state of the art to demonstrate derivation of queries from questions associated with an investigation of compliance, i.e. compliance questions as presented in \autoref{sec:info:compliance-questions}.
A practical application of this demonstrates SPARQL queries derived from questions provided by the Irish Data Protection Commission for assisting organisations with their GDPR compliance readiness \cite{GDPR_readiness_checklist} and shows use of SPARQL in assisting the investigation process associated with compliance.
This application of SPARQL was published in a peer-reviewed publication \cite{pandit_queryable_2018} and was presented to members of the Irish Data Protection Commission as part of research developed in this thesis.
\subsection{Framework for Validating Information for Compliance using SHACL}\label{sec:contributions:validation}
Another contribution of this thesis is the approach for using SHACL to validate information and linking the results with relevant clauses of GDPR for compliance, which fulfils research objective $RO5$.
While SPARQL is sufficient to query information and in some cases to determine compliance based on presence or absence of information, the use of SHACL provides a standardised approach for validation of information based on representing constraints and persisting the results of validation.
The validation using SHACL is part of a proposed framework presented in \autoref{sec:testing:shacl} which consists of creating a `compliance graph' for storing information relevant in the investigation and demonstration of compliance.
The validation requirements are derived from constraints and assumptions based on compliance questions in \autoref{sec:info:constraints}, and are represented using SHACL with a link to relevant clauses of the GDPR defined using GDPRtEXT to indicate their role in the compliance process.
The constraints expressed using SHACL utilise concepts and relationships from GDPRov and GConsent to represent validation requirements, and re-use SPARQL queries created for $RO4$ to retrieve information.
The validation results are persisted and annotated with GDPRtEXT to link them with the GDPR, thereby providing a form of documentation for information validation associated with compliance.
The framework suggests a more efficient form of validation that reuses ex-ante validation results in ex-post evaluations: constraints common to ex-ante information are abstracted and validated in the ex-ante stage itself, so that only constraints specific to instances in the ex-post stage - such as provenance information - need to be validated.
The demonstration of the framework and approach consists of evaluating consent on a real-world website to generate a `compliance report' listing status of validations linked to GDPR.
The framework and approach have been published in peer-reviewed publications \cite{pandit_towards_2018,pandit_exploring_2018,pandit_test-driven_2019}.
Related work in state of the art uses a variety of approaches for validation and assessment of compliance. The SPECIAL project demonstrates use of OWL2 reasoners to validate consent at ex-ante and ex-post stages \cite{bonatti_fast_2018,dullaert_d3.4_2019} and the application of ODRL policies as a compliance checking mechanism \cite{agarwal_legislative_2018,vos_odrl_2019}. The MIREL project proposes the use of deontic logic for legal reasoning using LegalRuleML \cite{palmirani_pronto_2018,monica_modelling_2018}, while the BPR4GDPR project proposes checking provenance logs for conformance to predetermined processes (ex-post analysis) \cite{mehr_compliance_2019}.
The use of SHACL utilising P-Plan workflows to validate policies expressed in ODRL for GDPR compliance has been proposed \cite{lieber_policy-compliant_2019} as a doctoral consortium paper - which provides future directions for application of this research.
Compared to state of the art, the approach presented in this thesis is novel in its utilisation of SHACL to validate information and link its results with the GDPR for compliance. It is also novel in its combination and reuse of ex-ante and ex-post validations for compliance.
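
The efficiency gained by reusing ex-ante validation results in the framework described in this section can be illustrated with a simple count. This is only a sketch under simplifying assumptions that are not part of the framework itself: every constraint is assumed to cost one validation, the constraints of a planned activity are assumed to split into two disjoint sets, $C_{a}$ (depending only on ex-ante information) and $C_{p}$ (depending on instance-specific ex-post information), and the plan is assumed to be executed $n$ times. The symbols $C_{a}$, $C_{p}$, and $n$ are introduced here purely for illustration. Validating every constraint for every execution requires
\[
    V_{\mathrm{full}} = n \cdot \left( |C_{a}| + |C_{p}| \right)
\]
validations, whereas validating $C_{a}$ once in the ex-ante stage and reusing its results requires
\[
    V_{\mathrm{reuse}} = |C_{a}| + n \cdot |C_{p}| ,
\]
which is a saving of $V_{\mathrm{full}} - V_{\mathrm{reuse}} = (n - 1) \cdot |C_{a}|$ validations. For hypothetical values of $|C_{a}| = 8$, $|C_{p}| = 2$, and $n = 100$, this gives $V_{\mathrm{full}} = 1000$ and $V_{\mathrm{reuse}} = 208$. The saving therefore grows with the number of executions and with the proportion of constraints that can be settled from ex-ante information alone.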
\subsection{Information Interoperability Model of the GDPR}
A minor contribution of this thesis consists of an information interoperability model based on representing categories of entities (stakeholders) as defined by GDPR and their interactions with respect to interoperability of information shaped by GDPR compliance requirements.
The model, described in \autoref{sec:info:model}, conceptualises interactions between stakeholders based on information identified as part of $RO1$ and $RO2$, and provides an overview of requirements regarding information and interoperability shaped by GDPR.
The model provides categorisation of information requirements based on provenance, agreements, consent, certification, and compliance; and assists in exploration of existing standards - including semantic web - by outlining requirements and applications of information based on interoperability between entities.
It advances state of the art by providing the first systemic analysis of information flows and interoperability between stakeholders, and serves to provide a framework for developing and evaluating potential consensus on interoperability of information for compliance between stakeholders.
The model, its analysis, and its application in the context of the right to data portability were published in peer-reviewed publications \cite{pandit_modelling_2017,pandit_exploration_2018} and as a book chapter \cite{pandit_standardisation_2020}.
\subsection{Participation in DPVCG}\label{sec:intro:dpvcg}
The Data Privacy Vocabularies and Controls Community Group\footnote{\url{https://www.w3.org/community/dpvcg/}} (DPVCG) is a W3C community group working towards developing a vocabulary associated with personal data processing based on relevant laws such as GDPR.
The group was created by members of the SPECIAL project in May 2018 and currently consists of members from diverse backgrounds, including academics, legal experts, lawyers, and industry stakeholders.
The work done within DPVCG in its 18 months of operation has produced the Data Privacy Vocabulary\footnote{\url{http://w3.org/ns/dpv}} (DPV) - an ontological resource for representing information associated with processing of data.
The DPV represents a community agreement of vocabulary and semantics of terms and concepts associated with GDPR, and provides a degree of interoperability in representing information for legal compliance.
The work regarding creation of DPV has been published in a peer-reviewed conference \cite{pandit_dpv_2019}, and has also been listed as a deliverable within the SPECIAL project \cite{pandit_d6.5_2019}.
The author of this thesis is listed as an editor and contributor in both publications, and has been the co-chair of DPVCG since January 2020.
The research presented in this thesis had an impact in the creation of DPV through use of developed ontologies as an input as well as through direct participation of the author as an active contributing member.
An overview of DPV is therefore presented in \autoref{sec:voc:DPV} along with comparisons to developed ontologies (GDPRtEXT, GDPRov, GConsent) and SotA.
To summarise the comparison, DPV provides a high-level abstraction of terms and concepts, whereas the ontologies in this thesis provide representations of information with more granularity and detail - which makes their usage with DPV complementary rather than contradictory.
\subsection{Publications}\label{sec:intro:publications}
The following peer-reviewed publications present the research in this thesis (grouped by relevance, in reverse chronological order):
\subsubsection{Ontologies representing information for GDPR compliance}
The following publications are associated with $RO3$ - developing ontologies for representing the concepts and relationships within the GDPR.
\begin{enumerate}[start]
\item ``\textbf{GConsent - A Consent Ontology Based on the GDPR}'' \cite{pandit_gconsent_2019} \\
\textit{\textbf{H. J. Pandit}, C. Debruyne, D. O’Sullivan, and D. Lewis.} \\
\textit{16\textsuperscript{th} European Semantic Web Conference (ESWC), 2019.}
\vspace{0.1cm} \newline
This publication presents the GConsent ontology for representing information about consent as required by GDPR. GConsent fulfils research objective $RO3(c)$, and provides a detailed representation of consent for information management and documentation. GConsent is described in \autoref{sec:voc:GConsent}.
\item ``\textbf{GDPRtEXT - GDPR as a Linked Data Resource}'' \cite{pandit_gdprtext_2018} \\
\textit{\textbf{H. J. Pandit}, K. Fatema, D. O’Sullivan, and D. Lewis.} \\
\textit{15\textsuperscript{th} European Semantic Web Conference (ESWC), 2018.}
\vspace{0.1cm} \newline
This publication presents the GDPRtEXT resource consisting of a linked data representation of the text of GDPR, and a glossary of its concepts. It also provides a mapping from clauses of the DPD to GDPR, motivated by the reuse for GDPR of compliance methods developed for the DPD. GDPRtEXT fulfils research objective $RO3(a)$, and is instrumental in providing semantic association between information and GDPR for approaches presented in this thesis. GDPRtEXT is described in \autoref{sec:voc:GDPRtEXT}.
\item ``\textbf{Modelling Provenance for GDPR Compliance using Linked Open Data Vocabularies}'' \cite{pandit_modelling_2017} \\
\textit{\textbf{H. J. Pandit}, and D. Lewis.} \\
\textit{5\textsuperscript{th} Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017), co-located with the 16\textsuperscript{th} International Semantic Web Conference (ISWC), 2017. }
\vspace{0.1cm} \newline
This publication presents the GDPRov ontology for representing the provenance of personal data and consent for GDPR, and discusses use of its concepts in SPARQL queries for retrieving information associated with compliance. GDPRov fulfils research objective $RO3(b)$, and provides ex-ante and ex-post representations for activities associated with personal data and consent for GDPR. GDPRov is described in \autoref{sec:voc:GDPRov}.
    \item ``\textbf{Compliance through Informed Consent: Semantic Based Consent Permission and Data Management Model}'' \cite{fatema_compliance_2017} \\
\textit{K. Fatema, E. Hadziselimovic, \textbf{H. J. Pandit}, C. Debruyne, D. Lewis, and D. O’Sullivan.} \\
\textit{5\textsuperscript{th} Workshop on Society, Privacy and the Semantic Web - Policy and Technology (PrivOn2017), co-located with the 16\textsuperscript{th} International Semantic Web Conference (ISWC), 2017. }
\vspace{0.1cm} \newline
This publication presents an early (pre-GDPR enforcement) collaboration in developing a preliminary ontology for representing consent and a data management model for GDPR. The early work was crucial towards understanding complexities of consent, and provided valuable feedback towards development of GConsent.
\item ``\textbf{Linked Data Contracts to Support Data Protection and Data Ethics in the Sharing of Scientific Data}'' \cite{hadziselimovic_linked_2017} \\
\textit{E. Hadziselimovic, K. Fatema, \textbf{H. J. Pandit}, and D. Lewis.} \\
\textit{Workshop on Enabling Open Semantic Science (SemSci), co-located with the 16\textsuperscript{th} International Semantic Web Conference (ISWC), 2017.}
\vspace{0.1cm} \newline
This publication presents an early collaboration (pre-GDPR) towards developing an ontology for representing data sharing agreements for GDPR by extending the ODRL ontology. The ontology enables representation of obligations associated with propagation of rights between parties that share or exchange data.
\end{enumerate}
\subsubsection{Querying and validating information for GDPR compliance}
The following publications are associated with $RO4$ - querying for information, and $RO5$ - validating information for compliance.
\begin{enumerate}[resume]
\item ``\textbf{Test-driven Approach Towards GDPR Compliance}'' \cite{pandit_test-driven_2019} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
    \textit{15\textsuperscript{th} International Conference on Semantic Systems (SEMANTiCS), 2019.}
\vspace{0.1cm} \newline
    This publication presents an implementation of the approach for validation of information by utilising the use-case of consent in a real-world website. It utilises SHACL to validate information represented by GDPRov and GConsent, and uses GDPRtEXT to associate constraints and results with GDPR. It also demonstrates use of SPARQL to identify tasks and reports related to compliance by querying validation results. The approach demonstrates usefulness of combining ex-ante and ex-post approaches in terms of efficiency and compliance. This research fulfils research objective $RO5$ and is presented in \autoref{sec:testing:shacl}.
\item ``\textbf{Queryable Provenance Metadata For GDPR Compliance}'' \cite{pandit_queryable_2018} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{14\textsuperscript{th} International Conference on Semantic Systems (SEMANTiCS), 2018.}
\vspace{0.1cm} \newline
This publication presents use of SPARQL queries to represent questions associated with compliance by using GDPRtEXT and GDPRov ontologies.
It demonstrates effectiveness of SPARQL in retrieving information for GDPR compliance, and fulfils research objective $RO4$. This work is presented in \autoref{sec:testing:sparql}.
    \item ``\textbf{Exploring GDPR Compliance Over Provenance Graphs Using SHACL}'' \cite{pandit_exploring_2018} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{14\textsuperscript{th} International Conference on Semantic Systems (SEMANTiCS) - Posters track, 2018.}
\vspace{0.1cm} \newline
This publication presents an overview of the approach for validating information using SHACL and associating results with specific articles of GDPR. The approach proposes persistence of validation results to create a `compliance graph' that can be queried and validated for documenting information for compliance. This work is presented in \autoref{sec:testing:shacl:approach}.
    \item ``\textbf{Towards Knowledge-based Systems for GDPR Compliance}'' \cite{pandit_towards_2018} \\
\textit{\textbf{H. J. Pandit}, C. Debruyne, D. O’Sullivan, and D. Lewis.} \\
\textit{International Workshops on Contextualized Knowledge Graphs (CKG), co-located with 17\textsuperscript{th} International Semantic Web Conference (ISWC), 2018.}
\vspace{0.1cm} \newline
This publication explores creation of a knowledge-based framework based on utilisation of information associated with compliance using semantic web technologies for applications such as creation of reports, documentation, and assessment of compliance for different stakeholders. The approach was used in conjunction with the above publication in addressing research objective $RO5$.
\end{enumerate}
\subsubsection{Model for information interoperability based on requirements of GDPR compliance}
These publications present a model of interaction between entities as defined by the GDPR, and explore information categories and their interoperability requirements based on existing standards, including those provided by the semantic web.
The model provides an overview of information flows between stakeholders, and the role of interoperability in facilitating information for compliance between them. This research is presented in \autoref{sec:info:model}.
\begin{enumerate}[resume]
\item ``\textbf{Standardisation, Data Interoperability, and GDPR}'' \cite{pandit_exploration_2018} \\
\textit{\textbf{H. J. Pandit}, C. Debruyne, D. O’Sullivan, and D. Lewis.} \\
\textit{Book Chapter in Shaping the Future Through Standardization, 2019}
\item ``\textbf{An Exploration of Data Interoperability for GDPR}'' \cite{pandit_standardisation_2020} \\
\textit{\textbf{H. J. Pandit}, C. Debruyne, D. O’Sullivan, and D. Lewis.} \\
\textit{International Journal of Standardization Research (IJSR), Vol. 16, Issue 1, 2018}
\item ``\textbf{GDPR Data Interoperability Model}'' \cite{pandit_gdpr_2018} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{23\textsuperscript{rd} European Academy for Standardisation Annual Standardisation Conference (EURAS), 2018}
\end{enumerate}
\subsubsection{Investigated Applications of Research - Information Management}
The following publications do not directly address the research question, but consist of applying the research presented in this thesis towards processes that assist with the compliance process.
\begin{enumerate}[resume]
\item ``\textbf{Towards Generating Policy-Compliant Datasets}'' \cite{debruyne_towards_2019} \\
\textit{C. Debruyne, \textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{13\textsuperscript{th} IEEE International Conference on Semantic Computing (ICSC), 2019.}
\vspace{0.1cm} \newline This publication presents an approach for generating just-in-time datasets consisting of personal data based on given consent to ensure processes are compliant in their usage of consent.
\item ``\textbf{GDPR-driven Change Detection in Consent and Activity Metadata}'' \cite{pandit_gdpr-driven_2018} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{4\textsuperscript{th} Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW), co-located with 15\textsuperscript{th} European Semantic Web Conference (ESWC), 2018.}
\vspace{0.1cm} \newline This publication proposes an approach for detecting changes related to use of personal data and consent in activities by utilising the ex-ante component of P-Plan to represent activities and comparing them using a graph-based algorithm.
\end{enumerate}
\subsubsection{Investigated Applications of Research - Privacy Policies}
The following publications do not directly address the research question, but consist of applying the research presented in this thesis towards privacy policies.
\begin{enumerate}[resume]
\item ``\textbf{Extracting Provenance Metadata from Privacy Policies}'' \cite{pandit_extracting_2018} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{7\textsuperscript{th} International Provenance \& Annotation Workshop (IPAW), part of Provenance Week, 2018.}
\vspace{0.1cm} \newline This publication discusses use of GDPRov to represent extracted information about activities associated with personal data within a privacy policy.
\item ``\textbf{An Ontology Design Pattern for Describing Personal Data in Privacy Policies}'' \cite{pandit_ontology_2018} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{9\textsuperscript{th} Workshop on Ontology Design and Patterns (WOP), co-located with 17\textsuperscript{th} International Semantic Web Conference (ISWC), 2018.}
\vspace{0.1cm} \newline This publication presents an ontology design pattern that uses GDPRov and GDPRtEXT to represent information about personal data and its processing in a privacy policy.
\item ``\textbf{Personalised Privacy Policies}'' \cite{pandit_personalised_2018} \\
\textit{\textbf{H. J. Pandit}, D. O’Sullivan, and D. Lewis.} \\
\textit{4\textsuperscript{th} International Workshop on TEchnical and LEgal aspects of data pRIvacy and SEcurity (TELERISE), co-located with 22\textsuperscript{nd} European Conference on Advances in Databases and Information Systems, 2018.}
\vspace{0.1cm} \newline This publication discusses personalisation of privacy policies by using information about an individual's personal data processing and using GDPRtEXT and GDPRov to annotate it for a machine-readable representation.
\end{enumerate}
\subsubsection{Data Privacy Vocabulary}
The following publication presents work on the creation of the Data Privacy Vocabulary (DPV) by the DPVCG, and describes the methodology used in relation to existing vocabularies, including those presented in this thesis, namely GDPRtEXT, GDPRov, and GConsent. The DPV is described in \autoref{sec:voc:DPV}.
\begin{enumerate}[resume]
\item ``\textbf{Creating A Vocabulary for Data Privacy}'' \cite{pandit_creating_2019} \\
\textit{\textbf{H. J. Pandit}, A. Polleres, B. Bos, R. Brennan, B. Bruegger, F. J. Ekaputra, J. D. Fernández, R. G. Hamed, E. Kiesling, M. Lizar, E. Schlehahn, S. Steyskal, R. Wenning} \\
\textit{18\textsuperscript{th} International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE), 2019.}
\end{enumerate}
\section{Thesis Overview}
The rest of this thesis is structured as follows:
\subsubsection*{\autoref{chapter:background}: Background on GDPR and Semantic Web}
This chapter presents a summary of information required to understand the work presented in this thesis. The chapter consists of two sections: the first describes concepts and requirements of GDPR, while the second describes semantic web technologies with an overview of their standards and vocabularies.
\subsubsection*{\autoref{chapter:sota}: State of the Art}
This chapter reviews existing work and approaches regarding regulatory compliance, with a specific focus on those addressing GDPR compliance. The chapter starts by providing an overview of approaches used for legal compliance. It then presents an in-depth review of approaches utilising semantic web technologies to address GDPR compliance requirements, followed by other approaches for GDPR compliance. Approaches that do not directly address the GDPR but are relevant to legislative compliance and the semantic web are also presented. The chapter then presents an analysis of the state of the art and concludes with a discussion of identified gaps and limitations.
\subsubsection*{\autoref{chapter:information}: Information Required for GDPR Compliance}
This chapter presents information required for GDPR compliance of activities associated with processing of personal data and consent in ex-ante and ex-post phases.
The chapter starts by presenting an information model for interoperability of information between stakeholders defined by the GDPR.
The model provides an analysis of information interoperability requirements based on requirements of GDPR compliance and the role of existing standards in addressing them.
This is followed by expressing information requirements as analytical questions - termed `compliance questions' - whose answers provide the information necessary to evaluate compliance.
The chapter then concludes with identification of constraints and assumptions which can be used to validate information for GDPR compliance.
\subsubsection*{\autoref{chapter:vocabularies}: Representing Information for GDPR Compliance using Ontologies}
This chapter presents the OWL2 ontologies developed to represent information associated with processing of personal data and consent for GDPR compliance.
The ontologies present concepts for answering compliance questions presented in \autoref{chapter:information}. The first ontology presented is GDPRtEXT - which provides a method to link information with concepts and clauses of GDPR through a linked data version of its text and a vocabulary of concepts. The second ontology presented is GDPRov - which enables representation of provenance information regarding personal data and consent as models or templates and their executions or activity logs. The third ontology presented is GConsent - which enables representation of information associated with consent. The chapter presents an overview of concepts and relationships for each vocabulary, its relation with GDPR, and the competency questions used to guide its development. The chapter also presents a brief overview of the Data Privacy Vocabulary and its comparison with the other presented ontologies and the SotA.
\subsubsection*{\autoref{chapter:testing}: Querying and Validating Information for GDPR Compliance}
This chapter presents use of SPARQL to express compliance queries using ontologies presented in \autoref{chapter:vocabularies}. The chapter also presents a framework to validate information using SHACL based on constraints identified in \autoref{chapter:information}. The framework demonstrates use of semantic web technologies in validating information for GDPR compliance by utilising a combination of ex-ante and ex-post validations and linking of results with GDPR for documentation of information for compliance.
\subsubsection*{\autoref{chapter:conclusion}: Conclusion}
This chapter concludes the thesis with a summary of key findings and outcomes of the presented work. It discusses the extent to which the thesis serves to address the research question(s) and objective(s), and outlines directions for future work in terms of potential applications and extension through related work.