Commit 143d837

Configure poetry. (#8)
1 parent f39ff60 commit 143d837

15 files changed, +3629 -135 lines changed

.github/workflows/build.yml

-35
This file was deleted.

.github/workflows/pr-validation.yml

+52
@@ -0,0 +1,52 @@
+name: Pull Request Validation Pipeline
+on: [pull_request]
+
+jobs:
+  pr-validation:
+    name: Pull Request Validation
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Generate GitHub App Token
+        id: generate_github_app_token
+        uses: tibdex/[email protected]
+        with:
+          app_id: ${{ secrets.GH_APP_ID }}
+          private_key: ${{ secrets.GH_APP_KEY }}
+
+      - name: Checkout repository
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+          token: ${{ steps.generate_github_app_token.outputs.token }}
+          persist-credentials: true
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+
+      - name: Install & configure Poetry
+        uses: snok/[email protected]
+        with:
+          version: 1.5.1
+
+      - name: Install & validate Poetry dependencies
+        run: |
+          poetry install
+          poetry check
+
+      - name: Run pylint
+        run: |
+          set -e # fail on error
+          poetry run pylint -j 0 --fail-under=9.0 src
+
+      - name: Run isort
+        run: |
+          set -e # fail on error
+          poetry run isort --check-only .
+
+      - name: Run black
+        run: |
+          set -e # fail on error
+          poetry run black --check .
+
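
The checks in this workflow can also be run locally before opening a pull request. The following is a minimal sketch, not part of the commit: `CHECKS`, `run_checks`, and the `runner` parameter are illustrative names, and it assumes Poetry is on the `PATH` and `poetry install` has already been run. Unlike the workflow, which stops at the first failing step, it runs every check and reports all failures.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands copied from the workflow steps above, in the same order.
# `poetry install` is assumed to have been run already.
CHECKS: List[List[str]] = [
    ["poetry", "check"],
    ["poetry", "run", "pylint", "-j", "0", "--fail-under=9.0", "src"],
    ["poetry", "run", "isort", "--check-only", "."],
    ["poetry", "run", "black", "--check", "."],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[str]:
    """Run every check and return the command lines that exited non-zero."""
    failed = []
    for command in checks:
        # subprocess.call returns the exit code of the finished process.
        if runner(command) != 0:
            failed.append(" ".join(command))
    return failed
```

Calling `run_checks()` from the repository root returns an empty list when all four checks pass. The `runner` parameter exists only so the function can be exercised without invoking Poetry.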

.github/workflows/tag-and-release.yml

+22 -4
@@ -23,6 +23,11 @@ jobs:
           fetch-depth: 0
           token: ${{ steps.generate_github_app_token.outputs.token }}
           persist-credentials: true
+
+      - name: Configure Git Credentials
+        run: |
+          git config user.name "GitHub Actions [bot]"
+          git config user.email "[email protected]"
 
       - name: Install .NET Core
         uses: actions/setup-dotnet@v3
@@ -43,18 +48,31 @@ jobs:
           useConfigFile: true
           configFilePath: GitVersion.yml
 
-      - name: Tag and Release
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+
+      - name: Install & configure Poetry
+        uses: snok/[email protected]
+        with:
+          version: 1.5.1
+
+      - name: Update poetry package version
+        run: poetry version ${{ steps.get_gitversion.outputs.majorMinorPatch }}
+
+      - name: Commit, tag and release new version
         env:
           GITHUB_TOKEN: ${{ steps.generate_github_app_token.outputs.token }}
         run: |
-          git config user.name "GitHub Actions [bot]"
-          git config user.email "[email protected]"
           git status # This is just to check if the git config worked
           git pull origin main # In case any changes were made since checkout
+          git add pyproject.toml # Add dependency version changes
+          git commit -m "Update project version to ${{ steps.get_gitversion.outputs.majorMinorPatch }} [skip ci]"
           git tag ${{ steps.get_gitversion.outputs.majorMinorPatch }} -m "Release ${{ steps.get_gitversion.outputs.majorMinorPatch }}"
           git push --tags
 
-          # Create a release
+          # Create a GitHub release from latest tag
           gh release create ${{ steps.get_gitversion.outputs.majorMinorPatch }} \
             --title "${{ steps.get_gitversion.outputs.majorMinorPatch }}" \
             --generate-notes
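
To make the order of the release steps easier to follow, the sketch below lists the commands the workflow issues for a single version string. It is an illustration only, not code from the commit: `release_commands` is a hypothetical helper, its `version` argument stands in for `steps.get_gitversion.outputs.majorMinorPatch`, and the diagnostic `git status` call is left out. The function only builds the command lists and does not execute anything.

```python
from typing import List


def release_commands(version: str) -> List[List[str]]:
    """Return the release commands in workflow order for one version string."""
    return [
        # Separate workflow step: write the new version into pyproject.toml.
        ["poetry", "version", version],
        # Pick up any changes made on main since checkout.
        ["git", "pull", "origin", "main"],
        # Stage and commit the version bump; [skip ci] is part of the message.
        ["git", "add", "pyproject.toml"],
        ["git", "commit", "-m", f"Update project version to {version} [skip ci]"],
        # Tag the commit and push the tags.
        ["git", "tag", version, "-m", f"Release {version}"],
        ["git", "push", "--tags"],
        # Create the GitHub release from the tag that was just pushed.
        ["gh", "release", "create", version, "--title", version, "--generate-notes"],
    ]
```

For example, `release_commands("1.2.3")` yields seven commands, starting with `poetry version 1.2.3` and ending with the `gh release create` call.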

.gitignore

-4
@@ -31,10 +31,6 @@ wheels/
 *.manifest
 *.spec
 
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
 # Unit test / coverage reports
 htmlcov/
 .tox/

README.md

+51 -14
@@ -1,14 +1,43 @@
 # model-training
 Contains the ML training pipeline used for the main project of course CS4295: Release Engineering for Machine Learning Applications. This pipeline is of an ML model that evaluates restaurant reviews. The repository structure is based off the Cookiecutter template.
 
-## Dependencies
-All required packages can be found in `requirements.txt`. To install the required packages, run the following command:
+## **Dependencies**
+
+This project is using Poetry instead of Pip to manage dependencies. Poetry is a Python dependency management tool that simplifies the process of managing dependencies and packaging. Additionally, Poetry is also used to manage the virtual environment from which the project is run, thus not requiring the user to manually create a virtual environment.
+
+### **Installation (Poetry)**
+
+To install Poetry, please follow the instructions on the [Poetry website](https://python-poetry.org/docs/#installation) and follow the corresponding steps for your operating system.
+
+### **Installing dependencies**
+
+To install the project dependencies, please run the following command:
 
 ```bash
-pip install -r dep/requirements.txt
+poetry install
 ```
 
-## Usage
+This will install all dependencies listed in `pyproject.toml` and create a virtual environment for the project. As such, instead of using `pip` to install a specific dependency and then run that dependency in a virtual environment, Poetry will handle this for you.
+
+### **Adding a new dependency**
+
+To add a new dependency, please run the following command:
+
+```bash
+poetry add <dependency-name>
+```
+
+This will add the dependency to `pyproject.toml` and install it in the virtual environment.
+However, if you would like to install a dependency for development purposes, please run the following command:
+
+```bash
+poetry add --dev <dependency-name>
+```
+
+In any case, dependency changes will also show up in the `poetry.lock` file. This file is used to ensure that all developers are using the same versions of the dependencies. Consequently, it is good practice and actually recommended that this file is committed to version control.
+
+## **Usage**
+
 In order to run the pipeline, ensure that you have `dvc` installed and run the following command:
 
 ```bash
@@ -21,39 +50,47 @@ To view a graphical representation of the pipeline, run the following command:
 ``` bash
 dvc dag
 ```
-### Remote
+### **Remote**
+
 A Google Drive folder has been configured to be used as remote storage.
 
-### Testing
+### **Testing**
+
 In order to test the ML pipeline, several tests are performed which can be found in `tests/`. These are run automatically as part of the pipeline. They can also be run manually using the following command:
 
 ```bash
-pytest
+poetry run pytest
 ```
-### Metrics
+
+### **Metrics**
+
 The accuracy metric is stored in `reports/model_evaluation.json`. In order to see the experiment history, run the following command:
 
 ```bash
 dvc exp show
 ```
-Two experiments are listed, comparing the use of a 20% and 10% test split size.
-### Dataset
+Two experiments are listed, comparing the use of a 20% and 10% test split size.
+
+### **Dataset**
+
 Project was created using the dataset provided by course instructors on [SURFdrive](https://surfdrive.surf.nl/files/index.php/s/207BTysNQFuVZPE?path=%2Fmaterial).
 
-### Preprocessing
+### **Preprocessing**
+
 Any preprocessing steps can be found in `preprocessing.py`. These are executed automatically with the execution of the pipeline. Processed data (corpus) is stored in `data/processed/`.
 
-### Storing the trained model
+### **Storing the trained model**
+
 The trained model is stored in `data/models/`.
 
+## **Pylint & DSLinter**
 
-## Pylint & DSLinter
 Pylint and DSLinter have been used and configured to ensure the code quality. All configuration options can be found in `.pylintrc`. This configuration file is based on [this example from the DSLinter documentation](https://github.com/SERG-Delft/dslinter/blob/main/docs/pylint-configuration-examples/pylintrc-for-ml-projects/.pylintrc). Besides this, there are a few custom changes, such as adding the variable names `X_train`, `X_test` etc. to the list of accepted variable names by Pylint, as these variable names are commonly used in ML applications. The `init_hook` variable in `.pylintrc` is also set to the path of this directory, in order to ensure that all imports within the code do not result in a warning from Pylint.
 
 If you would like to manually verify the code quality, please run the following command:
 
 ```bash
-pylint src
+poetry run pylint src
 ```
 
 DSLinter is configured and will automatically run. This should return a perfect score of 10.00. A report summarising the findings can be found in `data/reports/`.
