GrETEL stands for Greedy Extraction of Trees for Empirical Linguistics. It is a user-friendly search engine for exploiting syntactically annotated corpora, or treebanks.
This is the currently maintained version of GrETEL. Active development, however, takes place at https://github.com/UUDigitalHumanitieslab/gretel-django as GrETEL 5, which has a completely rebuilt back-end and will eventually be merged into this repository.
GrETEL is publicly available at https://gretel.hum.uu.nl (version 4) and https://gretel5.hum.uu.nl (version 5).
The stable predecessor can be found at http://gretel.ccl.kuleuven.be/gretel3 (and the source at https://github.com/CCL-KULeuven/gretel/).
- v4.2.0 August 2019: federated search, improved configuration and state management, downloadable results with node properties, and, again, many more fixes.
- v4.1.0 February 2019: fixed support for GrInded corpora, many more fixes, and a feature-complete replacement of version 3.
- v4.0.2 October 2018: GrETEL 4 release with many bugfixes and improvements.
- v4.0.0 June 2018: First GrETEL 4 release with new interface.
- v3.9.99 November 2017: GrETEL 4 currently under development!
- v3.0.2 July 2017: Show an error message if the BaseX server is down.
- v3.0 November 2016: GrETEL 3 initial release. Available at http://gretel.ccl.kuleuven.be/gretel3
master: official version of GrETEL 4, available at http://gretel.hum.uu.nl/gretel3/
dev: development version
gretel2.0: official version of GrETEL 2.0, available at http://gretel.ccl.kuleuven.be/gretel-2.0
In addition to a standard LAMP server (with PHP version > 5.4), GrETEL requires the following packages to be installed on your machine:
- Download (or clone) GrETEL from GitHub.
- Download the Alpino dependency parser. Current binary used in the live version: Alpino-x86_64-linux-glibc2.5-20548-sicstus (available here).
  It is recommended to use the same version that was used to create the treebanks, so that an example-based search results in the same search structure as the one stored in the database.
- Create BaseX databases containing the treebanks you want to make available (not necessary when using GrETEL-upload).
- Adapt the config.example.php file, rename it to config.php, and then:
  - Set the path to the Alpino dependency parser in the variable $alpinoDirectory (by default: the parsers directory).
  - Set the BaseX variables (machine names, port numbers, password and username).
  - Set the path to the Python virtual environment, or to another location where the required commands are installed.
- Install Composer to be able to install the PHP dependencies.
- Enable the rewrite module (e.g. sudo a2enmod rewrite && sudo systemctl restart apache2).
- Set AllowOverride to All so that .htaccess can configure the rewrite module.
- Run pip install -r requirements.txt.
- Run npm run build to compile all the remaining dependencies.
- Make sure the tmp and log folders exist in the root directory and can be accessed by Apache.
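The last installation step can be checked with a small script. The following is a minimal Python sketch that is not part of GrETEL; the function name check_runtime_dirs is hypothetical, and it assumes that "accessed by Apache" means the web server user needs read, write and traverse permissions on both folders.

```python
import os
from pathlib import Path
from typing import List

# Folders the installation steps require in the GrETEL root directory.
REQUIRED_DIRS = ("tmp", "log")


def check_runtime_dirs(root: str) -> List[str]:
    """Return a description of every problem found with the required folders.

    Hypothetical helper: assumes the process running it has the same
    permissions as the Apache user, and that Apache needs read, write and
    execute (traverse) access to each folder.
    """
    problems = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        if not path.is_dir():
            problems.append(f"{path} is missing")
        elif not os.access(path, os.R_OK | os.W_OK | os.X_OK):
            problems.append(f"{path} is not readable/writable by the current user")
    return problems
```

An empty list means both folders are in place. Because os.access checks the permissions of the calling process, the check is only meaningful when run as the web server user (for example via sudo -u with www-data, the Apache user on Debian and Ubuntu).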
Only the properties of the first node matched by an XPath variable are returned for analysis. For example:
A user searches for //node[node]. Two variables are found in this query: $node1 = //node and $node2 = $node1[node].
The following sentence would match this query:
node[np] (node[det] node[noun])
The node found for $node1 will then be node[np]. The node found for $node2 will then be node[det]. The properties of node[noun] will not be available for analysis using this query.
This is unlikely to occur when searching for a more specific structure.
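The first-match behaviour can be mimicked in a few lines of Python using the standard library's xml.etree.ElementTree. This is only an illustration of the effect, not how GrETEL evaluates queries (the back-end runs them in BaseX). The single label attribute is a hypothetical stand-in for real Alpino attributes such as cat and pos, and $node2 is treated as the first node child of $node1, matching the example.

```python
import xml.etree.ElementTree as ET

# Toy tree mirroring the example: node[np] (node[det] node[noun]).
# "label" is a hypothetical attribute standing in for Alpino's cat/pos/rel.
TREE = ET.fromstring(
    '<node label="np">'
    '<node label="det"/>'
    '<node label="noun"/>'
    '</node>'
)


def bind_first_matches(root):
    """Bind $node1 and $node2 for //node[node], keeping only the first match."""
    # iter() walks the tree in document order (starting with root itself),
    # so this picks the first node that has at least one node child.
    node1 = next((n for n in root.iter("node") if n.find("node") is not None), None)
    if node1 is None:
        return None
    # find() returns only the first matching child, so the second
    # child (node[noun]) is never bound to a variable.
    node2 = node1.find("node")
    return {"$node1": node1.get("label"), "$node2": node2.get("label")}


print(bind_first_matches(TREE))  # {'$node1': 'np', '$node2': 'det'}
```

Calling findall("node") on the np node would return both det and noun; binding a variable to a single node is what leaves node[noun] out of the analysis.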
The Angular front-end can be found in web-ui and can be run from there with npm run start. You can also use npm run start:live to run it against the production back-end.
- The number of results flushed to the user at a time, as well as the maximum number of results that will be fetched, are stored in variables in config.php. Change $flushLimit and $resultsLimit to the values that you want.
- Scripts are organised according to their function:
  - api/: entry point for server calls from the front-end (through api/src/router.php).
  - basex-search-scripts/: scripts that are required to do the actual searching for results. However, basex-client.php is sometimes needed elsewhere as well, to open a BaseX session.
  - preparatory-scripts/: scripts that process the input up to, but not including, the actual fetching of results. These scripts do things such as creating XPath, generating breadth-first patterns, parsing the input, and modifying input examples.
  - functions.php: contains general functions that are often required but that are not specific to any part of the process.
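The interplay between $flushLimit and $resultsLimit can be pictured as a batching generator. The following is a hypothetical Python sketch of the idea, not GrETEL's PHP code: flush_in_batches and its parameter names are assumptions, and it assumes results are sent in batches of $flushLimit until $resultsLimit results have been fetched in total.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def flush_in_batches(results: Iterable[T], flush_limit: int,
                     results_limit: int) -> Iterator[List[T]]:
    """Yield batches of at most flush_limit results, stopping once
    results_limit results have been fetched in total.

    flush_limit and results_limit play the roles of $flushLimit and
    $resultsLimit (hypothetical illustration only).
    """
    if flush_limit < 1 or results_limit < 0:
        raise ValueError("flush_limit must be >= 1 and results_limit >= 0")
    if results_limit == 0:
        return
    batch: List[T] = []
    fetched = 0
    for item in results:
        batch.append(item)
        fetched += 1
        if len(batch) == flush_limit:
            yield batch  # a web back-end would flush this batch to the user
            batch = []
        if fetched >= results_limit:
            break  # stop pulling from the source once the cap is reached
    if batch:
        yield batch  # send the final, partially filled batch
```

For example, list(flush_in_batches(range(10), 3, 7)) gives [[0, 1, 2], [3, 4, 5], [6]]. Checking the cap right after each fetch means no result beyond $resultsLimit is ever pulled from the source.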
- Liesbeth Augustinus and Vincent Vandeghinste: concept and initial implementation;
- Bram Vanroy: GrETEL 3 improvements and design;
- Martijn van der Klis: initial GrETEL 4 functionality and improvements;
- Sheean Spoel, Gerson Foks and Jelte van Boheemen: additional GrETEL 4 functionality and improvements;
- Koen Mertens: federated search at Instituut voor de Nederlandse Taal;
- Colleagues at the Centre for Computational Linguistics at KU Leuven, and Utrecht University Digital Humanities Lab for their feedback.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-sa-4.0). See the LICENSE file for license rights and limitations.