Commit 9122e00

How to insert files not using the web api
1 parent 3db39a9 commit 9122e00

5 files changed: +55 -0 lines changed

readme.md

+15
@@ -48,6 +48,21 @@ To run the docker instance with a custom java maximum memory allocation of 6GB a
 docker run -it -p 127.0.0.1:3000:1189 --init --entrypoint "java" --memory=8g --memory-swap=8g --rm ghcr.io/ucrel/lexidb:latest -Xmx6g -jar lexidb-2.0.jar ./app.properties
 ```

+#### Formatting / Importing data
+
+If you would like to import data into LexiDB without using the web API, you can do so with the Java insert script, which converts the data files you want to import into a format that LexiDB can read. The insert script takes 4 arguments:
+
+1. File path to an `app.properties` file.
+2. Name of the corpus / database. This is equivalent to the name of a database in MySQL.
+3. File path to the corpus configuration file.
+4. File path to the files to insert.
+
+``` bash
+docker run -v $(pwd)/test_data:/lexidb/lexi-data --entrypoint "java" --rm ghcr.io/ucrel/lexidb:latest -cp lexidb-2.0.jar util/Insert /lexidb/lexi-data/app.properties example /lexidb/lexi-data/.conf.json /lexidb/lexi-data
+```
+
+The command above creates a new database called `example`. The [`/lexidb/lexi-data/app.properties`](./test_data/app.properties) file sets `data.path=/lexidb/data`, so the `example` corpus is stored inside the Docker container under `/lexidb/data`, in the folder `/lexidb/data/example`.
+
 ### Build Docker

 If you would like to build the docker image locally:
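
The positional arguments of the insert command in the hunk above can be made explicit with a small dry-run helper. This is only a sketch: it prints the command instead of running Docker, and `build_insert_cmd`, `host_dir`, `corpus`, and `mount_point` are illustrative names that do not come from LexiDB.

```shell
#!/bin/sh
# Dry-run sketch: print the README's insert command rather than executing it.
# All function and variable names here are illustrative assumptions.

build_insert_cmd() {
    host_dir=$1                     # host directory with app.properties, .conf.json, and the data files
    corpus=$2                       # name of the corpus / database
    mount_point=/lexidb/lexi-data   # container path used in the README command

    # Argument order after util/Insert: properties file, corpus name,
    # corpus configuration file, directory of files to insert.
    printf 'docker run -v %s:%s --entrypoint "java" --rm ghcr.io/ucrel/lexidb:latest -cp lexidb-2.0.jar util/Insert %s/app.properties %s %s/.conf.json %s\n' \
        "$host_dir" "$mount_point" "$mount_point" "$corpus" "$mount_point" "$mount_point"
}

# Reproduces the README command for the test_data directory and the "example" corpus.
build_insert_cmd "$(pwd)/test_data" example
```

Printing the command first makes it easy to check the mount and the argument order before any data is imported.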

test_data/.conf.json

+22
@@ -0,0 +1,22 @@
+{
+  "name": "tokens",
+  "sets": [
+    {
+      "name": "tokens",
+      "columns": [
+        {
+          "name": "token"
+        }
+      ]
+    },
+    {
+      "name": "file",
+      "rle": true,
+      "columns": [
+        {
+          "name": "$file"
+        }
+      ]
+    }
+  ]
+}

test_data/app.properties

+4
@@ -0,0 +1,4 @@
+data.path=/lexidb/data
+kwic.context=5
+result.page.size=100
+block.size=10000000
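
The README change in this commit says the `example` corpus ends up in `/lexidb/data/example`, which is the `data.path` value joined with the corpus name. A minimal sketch of that lookup follows, assuming plain `key=value` lines with no escapes or continuation lines. `get_property` is a hypothetical helper, and the join rule is inferred from the README text rather than from LexiDB's source.

```shell
#!/bin/sh
# Read one key from a simple key=value properties file.
# get_property is an illustrative helper, not part of LexiDB.
get_property() {
    # $1 = properties file, $2 = key; prints the value of the first exact key match
    awk -F= -v k="$2" '$1 == k { print substr($0, length(k) + 2); exit }' "$1"
}

# Recreate the committed app.properties in a temporary file.
props=$(mktemp)
cat > "$props" <<'EOF'
data.path=/lexidb/data
kwic.context=5
result.page.size=100
block.size=10000000
EOF

corpus=example
corpus_dir="$(get_property "$props" data.path)/$corpus"
echo "$corpus_dir"   # prints /lexidb/data/example

rm -f "$props"
```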

test_data/output_example1.tsv

+8
@@ -0,0 +1,8 @@
+token
+This
+is
+a
+simple
+test
+file
+.
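
A file in the same layout as the example above (a `token` header followed by one token per line) can be produced from text that is already tokenised. The sketch below assumes tokens are separated by spaces with no leading or trailing space. `tokens_to_tsv` is a hypothetical helper, and whether LexiDB requires the header row is not stated in this commit.

```shell
#!/bin/sh
# Write a single-column TSV: the header "token", then one token per line.
# Assumes space-separated, pre-tokenised input without leading or trailing spaces.
tokens_to_tsv() {
    printf 'token\n'
    # tr -s squeezes repeated spaces so each run becomes a single newline
    printf '%s\n' "$1" | tr -s ' ' '\n'
}

# Reproduces the eight lines of test_data/output_example1.tsv on standard output.
tokens_to_tsv "This is a simple test file ."
```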

test_data/output_example2.tsv

+6
@@ -0,0 +1,6 @@
+token
+Another
+,
+test
+file
+.
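
Both example files start with a `token` header, which matches the `token` column declared in `test_data/.conf.json`. Assuming the header is meant to match the configured column name (the commit does not confirm this), a quick check before running the insert script could look like the sketch below. `check_headers` is a hypothetical helper.

```shell
#!/bin/sh
# Report .tsv files whose first line differs from the expected column name.
# Assumption: the header row should equal the column name from .conf.json.
check_headers() {
    # $1 = directory containing .tsv files, $2 = expected header
    rc=0
    for f in "$1"/*.tsv; do
        [ -e "$f" ] || continue   # the glob matched nothing
        if [ "$(head -n 1 "$f")" != "$2" ]; then
            echo "header mismatch: $f"
            rc=1
        fi
    done
    return "$rc"
}

# Demo on a temporary copy of output_example2.tsv.
dir=$(mktemp -d)
printf 'token\nAnother\n,\ntest\nfile\n.\n' > "$dir/output_example2.tsv"
check_headers "$dir" token && echo "all headers match"
rm -rf "$dir"
```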
