The new location for further development and issue tracking is https://opencode.it4i.eu/lexis-platform/clients/py4lexis.
Py4Lexis is a Python package which provides a manager to handle datasets within the LEXIS DDI (upload, download, list existing datasets) and a manager to handle workflows in the LEXIS Airflow instance (list existing workflows, execute workflows, check state). Each manager comes in two variants: a core class whose functions return the content of requests for further processing within a script, and an interactive class that prints the content of requests and other messages directly to the console/terminal.
Logging in to a LEXIS session is performed via the LEXIS login page, so B2Access and MyAccessID can be used to log in (interactively only). Support for cron-like, non-interactive use of Py4Lexis is planned for the near future. Logging in from the console/terminal is also possible, but in that case only LEXIS credentials (username/password) are supported. Files (datasets) are uploaded to the LEXIS DDI with the tuspy Python package, i.e., a TUS client.
- Download the repository.
- Create a virtual environment within the Py4Lexis repository:
python -m venv ./venv
- Activate the virtual environment:
source path/to/Py4Lexis/venv/bin/activate
- Install the package from the root of the Py4Lexis repository:
python -m pip install .
This installs the Py4Lexis package with all dependencies defined in requirements.txt.
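A quick way to confirm that the installation succeeded is to check that the package can be located from the active virtual environment. The sketch below is not part of Py4Lexis: `is_installed` is a hypothetical helper, and the import name `py4lexis` is taken from the import statements used in the examples of this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "py4lexis" is the import name used by the examples in this README.
    print("py4lexis installed:", is_installed("py4lexis"))
```

If this prints `False`, the most likely cause is that the virtual environment was not activated before running `pip install`.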
To log in to a LEXIS session via the LEXIS login page, use:
session = LexisSession(login_method="browser")
You will be redirected to the LEXIS login page (port 8083 is used). LEXIS username/password, B2Access or MyAccessID can be used as credentials. This method works only interactively. Support for cron-like, non-interactive use of Py4Lexis is planned for the near future.
NOTE: login_method="password" can also be used. In that case, logging in is performed via the console/terminal, and only LEXIS credentials (username/password) are supported.
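Because the browser login uses port 8083, it can help to check that nothing else is already listening on that port before starting a session. The following is a minimal sketch under the assumption that the login flow needs port 8083 to be free on the local machine, which this README suggests but does not spell out. `port_is_free` and `LOGIN_PORT` are hypothetical names, not part of Py4Lexis.

```python
import socket

LOGIN_PORT = 8083  # port mentioned in this README for the browser login


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if no process is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    if port_is_free(LOGIN_PORT):
        print(f"Port {LOGIN_PORT} looks free.")
    else:
        print(f"Port {LOGIN_PORT} is in use; the browser login may fail.")
```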
Examples of how to use Py4Lexis to manage datasets within the LEXIS DDI follow. Further information about classes/functions can be found in the comments within each class/function.
NOTE: The examples below assume that the interactive DatasetsCLI class is used.
The interactive class for managing datasets, whose functions return None because they print their content to the console/terminal, is imported from:
from py4lexis.cli import DatasetsCLI
ds = DatasetsCLI(session)
NOTE: Filter parameters (as shown below) are available only in the interactive class DatasetsCLI.
A core class for managing datasets, whose functions return the content of requests (suitable for further processing), can also be used. It is imported from:
from py4lexis.ddi import Datasets
core_ds = Datasets(session)
To create a new dataset, use:
ds.create_dataset(access="DATASET_ACCESS",
project="PROJECT_SHORTNAME")
To create a new dataset and upload files to it, use:
ds.tus_uploader_new(access="DATASET_ACCESS",
project="PROJECT_SHORTNAME",
filename="FILENAME",
file_path="FILEPATH",
title=["TITLE"])
To rewrite existing files in a dataset or to upload new files to it, use:
ds.tus_uploader_rewrite(dataset_id="DATASET_INTERNAL_ID",
dataset_title="DATASET_TITLE",
project="PROJECT_SHORTNAME",
access="ACCESS",
filename="FILENAME",
file_path="FILEPATH")
NOTE: DATASET_TITLE has to match the title of the dataset identified by DATASET_INTERNAL_ID.
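Both upload functions take `filename` and `file_path` as separate arguments. The sketch below shows one way to derive the two values from a single path and to fail early when the file does not exist. It assumes that `file_path` is the directory containing the file and `filename` is the base name. This README does not confirm that meaning, so check the comments of the upload functions before relying on it. `split_upload_target` is a hypothetical helper.

```python
from pathlib import Path
from typing import Tuple


def split_upload_target(target: str) -> Tuple[str, str]:
    """Split a path to an existing file into (directory, file name).

    Assumption: the uploader expects the directory as `file_path` and the
    base name as `filename`. This is not confirmed by the README.
    """
    path = Path(target).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return str(path.parent), path.name
```

The returned pair could then be passed on as `file_path=directory, filename=name` to the upload call.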
To get the upload status of datasets, use:
ds.get_dataset_status(filter_project="PROJECT_SHORTNAME")
You can also filter the content by filter_filename or filter_task_state (one of ["PENDING", "SUCCESS"]).
NOTE: Filters are available only in the interactive class.
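In a script, an upload is typically followed by waiting until its state changes from "PENDING" to "SUCCESS". The following is a generic polling sketch, not a Py4Lexis function. `wait_for_state` is hypothetical, and `fetch_state` stands for any callable you write yourself that returns the current task state as a string, for example by extracting it from the content returned by the core class.

```python
import time
from typing import Callable


def wait_for_state(fetch_state: Callable[[], str],
                   target: str = "SUCCESS",
                   interval_s: float = 5.0,
                   max_attempts: int = 60) -> bool:
    """Call fetch_state() until it returns `target` or the attempts run out.

    Returns True if the target state was reached, False otherwise.
    """
    for attempt in range(max_attempts):
        if fetch_state() == target:
            return True
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    return False
```

Bounding the number of attempts keeps a script from hanging indefinitely if an upload never finishes.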
To list all existing datasets, use:
ds.get_all_datasets(filter_access="DATASET_ACCESS",
filter_project="PROJECT_SHORT_NAME")
The output can also be filtered by filter_title and filter_zone.
NOTE: Use this function to obtain the InternalID of a dataset. Filters are available only in the interactive class.
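Since filters exist only in the interactive class, a script that uses the core class has to filter the returned content itself. The sketch below assumes the content has been converted to a list of dictionaries. The actual return type and the field names (`InternalID`, `access`, `project`) are assumptions for illustration, and the sample records are made up.

```python
from typing import Any, Dict, Iterable, List


def filter_records(records: Iterable[Dict[str, Any]],
                   **criteria: Any) -> List[Dict[str, Any]]:
    """Keep only the records whose fields equal every given criterion."""
    return [record for record in records
            if all(record.get(key) == value for key, value in criteria.items())]


# Made-up sample data; real field names may differ.
sample = [
    {"InternalID": "id-1", "access": "project", "project": "demo"},
    {"InternalID": "id-2", "access": "user", "project": "demo"},
    {"InternalID": "id-3", "access": "project", "project": "other"},
]

matching = filter_records(sample, access="project", project="demo")
print([record["InternalID"] for record in matching])  # prints ['id-1']
```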
To delete a dataset, use:
ds.delete_dataset_by_id(dataset_id="DATASET_INTERNAL_ID",
access="DATASET_ACCESS",
project="PROJECT_SHORT_NAME")
To download a whole dataset, use:
ds.download_dataset(dataset_id="DATASET_INTERNAL_ID",
access="DATASET_ACCESS",
project="PROJECT_SHORT_NAME")
To list all files in a dataset as an ASCII directory tree, use:
ds.get_list_of_files_in_dataset(dataset_id="DATASET_INTERNAL_ID",
access="DATASET_ACCESS",
project="PROJECT_SHORT_NAME",
print_dir_tree=True)
NOTE: print_dir_tree is available only in the interactive class.
To list all files in a dataset as a DataFrame table, use:
ds.get_list_of_files_in_dataset(dataset_id="DATASET_INTERNAL_ID",
access="DATASET_ACCESS",
project="PROJECT_SHORT_NAME",
print_dir_tree=False)
NOTE: print_dir_tree is available only in the interactive class.
To obtain the dataset path needed for workflow transfers, use:
ds.get_dataset_path(access="DATASET_ACCESS",
project="PROJECT_SHORT_NAME",
internalID="DATASET_INTERNAL_ID")
Examples of how to use Py4Lexis to manage workflows within the LEXIS Airflow instance follow. Further information about classes/functions can be found in the comments within each class/function.
NOTE: The examples below assume that the interactive AirflowCLI class is used.
The interactive class for managing workflows can be initialised with the code below. Its functions mostly return None because they print their content to the console/terminal. Only get_workflow_params returns the content.
from py4lexis.cli.airflow import AirflowCLI
airflow = AirflowCLI(session)
A core class for managing workflows can also be used. Its functions return the content of requests, which can be processed further within a script.
from py4lexis.workflows.airflow import Airflow
core_airflow = Airflow(session)
To get a table of all existing workflows, use:
airflow.get_workflows_list()
To get information about a workflow, use:
airflow.get_workflow_info(workflow_id="WORKFLOW_(DAG)_ID")
To get more detailed information about a workflow, use:
airflow.get_workflow_details(workflow_id="WORKFLOW_(DAG)_ID")
To obtain the parameters of a workflow, use:
wrf_params = airflow.get_workflow_params(workflow_id="WORKFLOW_(DAG)_ID")
NOTE: This function prints the content to the console/terminal and also returns it. The returned value can be used to execute the workflow, as shown below.
To execute a workflow, use:
airflow.execute_workflow(workflow_id="WORKFLOW_(DAG)_ID",
workflow_parameters=wrf_params)
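Before executing a workflow, the default parameters usually need a few values changed. The sketch below assumes that the value returned by get_workflow_params behaves like a dictionary. This README does not state its type, so treat that as an assumption. `override_params` is a hypothetical helper, and the parameter names in the usage note are invented.

```python
from typing import Any, Dict


def override_params(defaults: Dict[str, Any],
                    overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `defaults` updated with `overrides`.

    Keys that do not exist in `defaults` raise KeyError, so a misspelled
    parameter name is caught before the workflow is executed.
    """
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"Unknown workflow parameters: {unknown}")
    merged = dict(defaults)  # shallow copy; the original stays untouched
    merged.update(overrides)
    return merged
```

For example, `override_params(wrf_params, {"n_steps": 20})` would return a new dictionary that could be passed as `workflow_parameters`, provided `n_steps` is one of the workflow's parameters.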
To get the states of all runs of a workflow, use:
airflow.get_workflow_states(workflow_id="WORKFLOW_(DAG)_ID",
filter_by_workflow_state="running")
NOTE: The output can also be filtered by Workflow Run ID. In that case, filter_by_workflow_state is ignored. Filters can be used only in the interactive class.