Contains the ML training pipeline used for the main project of the course CS4295: Release Engineering for Machine Learning Applications. The pipeline trains an ML model that evaluates restaurant reviews. The repository structure is based on the Cookiecutter template.
## **Dependencies**
This project uses Poetry instead of pip to manage dependencies. Poetry is a Python dependency management tool that simplifies dependency management and packaging. Poetry also manages the virtual environment from which the project is run, so the user does not need to create one manually.
### **Installation (Poetry)**
To install Poetry, follow the instructions for your operating system on the [Poetry website](https://python-poetry.org/docs/#installation).
### **Installing dependencies**
To install the project dependencies, please run the following command:
```bash
poetry install
```
This will install all dependencies listed in `pyproject.toml` and create a virtual environment for the project. Instead of using `pip` to install each dependency and then running it from a manually created virtual environment, Poetry handles this for you.
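For example, to run a one-off command inside the Poetry-managed environment, or to open a shell with that environment activated (a built-in command in Poetry 1.x):

```bash
# Run a single command inside the project's virtual environment
poetry run python --version

# Spawn a shell with the virtual environment activated (Poetry 1.x)
poetry shell
```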
### **Adding a new dependency**
To add a new dependency, please run the following command:
```bash
poetry add <dependency-name>
```
This will add the dependency to `pyproject.toml` and install it in the virtual environment.
However, if you would like to install a dependency for development purposes, please run the following command:
```bash
poetry add --dev <dependency-name>
```
In any case, dependency changes will also show up in the `poetry.lock` file. This file ensures that all developers use exactly the same versions of the dependencies, so it is good practice (and recommended) to commit it to version control.
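If you edit `pyproject.toml` by hand rather than via `poetry add`, the lock file can be refreshed before reinstalling:

```bash
# Re-resolve and rewrite poetry.lock after manual pyproject.toml edits
poetry lock

# Install exactly the versions pinned in poetry.lock
poetry install
```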
## **Usage**
In order to run the pipeline, ensure that you have `dvc` installed and run the following command:
```bash
dvc repro
```

To view a graphical representation of the pipeline, run the following command:

```bash
dvc dag
```
### **Remote**
A Google Drive folder has been configured as remote storage.
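For reference, a Google Drive remote is configured in DVC roughly as follows; the remote name and folder ID below are placeholders, as the actual remote is already defined in this repository's DVC config:

```bash
# Register a Google Drive folder as the default remote (illustrative values)
dvc remote add -d storage gdrive://<folder-id>

# Push tracked data and models to the remote, or fetch them
dvc push
dvc pull
```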
### **Testing**
In order to test the ML pipeline, several tests are performed, which can be found in `tests/`. These are run automatically as part of the pipeline. They can be run manually using the following command:
```bash
poetry run pytest
```
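Standard pytest options work through Poetry as well; for instance, to run a subset of tests by keyword (the keyword below is only an example):

```bash
# Run only tests whose names match the given keyword, with verbose output
poetry run pytest tests/ -k "preprocess" -v
```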
### **Metrics**
The accuracy metric is stored in `reports/model_evaluation.json`. In order to see the experiment history, run the following command:
```bash
dvc exp show
```
Two experiments are listed, comparing the use of a 20% and 10% test split size.
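A new experiment with a different split size can be queued with DVC's parameter override flag; the parameter name below is illustrative and should match the one defined in this repository's `params.yaml`:

```bash
# Run an experiment overriding the test split parameter (name is illustrative)
dvc exp run --set-param test_split=0.1
```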
### **Dataset**
The project was created using the dataset provided by the course instructors on [SURFdrive](https://surfdrive.surf.nl/files/index.php/s/207BTysNQFuVZPE?path=%2Fmaterial).
### **Preprocessing**
Any preprocessing steps can be found in `preprocessing.py`. These are executed automatically when the pipeline runs. Processed data (the corpus) is stored in `data/processed/`.
### **Storing the trained model**
The trained model is stored in `data/models/`.
## **Pylint & DSLinter**
Pylint and DSLinter have been used and configured to ensure code quality. All configuration options can be found in `.pylintrc`. This configuration file is based on [this example from the DSLinter documentation](https://github.com/SERG-Delft/dslinter/blob/main/docs/pylint-configuration-examples/pylintrc-for-ml-projects/.pylintrc). On top of that, there are a few custom changes, such as adding variable names like `X_train` and `X_test` to Pylint's list of accepted names, as these are commonly used in ML applications. The `init-hook` option in `.pylintrc` is also set to the path of this directory, so that imports within the code do not trigger Pylint warnings.
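The customizations described above correspond roughly to the following `.pylintrc` fragment (the values shown are illustrative, not copied from the actual file):

```ini
[MASTER]
# Point Pylint at the project root so local imports resolve
init-hook='import sys; sys.path.append(".")'

[BASIC]
# Accept common ML variable names such as X_train and X_test
good-names=X_train,X_test,y_train,y_test
```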
If you would like to manually verify the code quality, please run the following command:
```bash
poetry run pylint src
```
DSLinter is configured and will run automatically; it should report a perfect score of 10.00. A report summarising the findings can be found in `data/reports/`.