Hello potential Laine colleague!
If you are reading this, you are probably applying for a Machine Learning engineering job at Laine. This coding challenge will evaluate whether you have the right skills for the job!
Completing the challenge should take about half a day if you have the relevant experience. In this challenge, you will build a job description classifier and deploy your model on Google Cloud Platform (GCP) Vertex AI.
A job description is a paragraph of text that describes a certain job position. For example:
You will mostly work with TensorFlow and Python to solve hard Machine Learning tasks and help to put these in production.
For the challenge, we ask you to create an ML model that classifies such texts into 5 categories: IT Jobs, Sales Jobs, HR & Recruitment Jobs, Accounting & Finance Jobs, and Customer Services Jobs.
This repository contains the template code you can start with. You can clone this repository using the command:
git clone [email protected]:ml6team/laine-engineer-coding-challenge.git
In the end you'll need to deploy your model on GCP, so you need to register a Google Cloud account. You'll need a credit card for the registration, but you'll receive some free credits from Google so you can start your development for free.
You need to install the `gcloud` command on your system; it is part of the Google Cloud SDK. You'll use this command in several steps.
This challenge requires Python 3.7 for compatibility with TensorFlow 2.1.0. If you don't have Python 3.7 installed, you can install it using pyenv:
# Install pyenv
curl https://pyenv.run | bash
export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
# Install Python 3.7
pyenv install 3.7.16
pyenv global 3.7.16
# Create virtual environment
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
Before you begin implementing your classification model, you need to download the data used for training and local evaluation from Google Cloud Storage and place the `data` folder in the base folder. To download the data, execute the following command:
gsutil -m cp -r gs://ml6_junior_ml_engineer_challenge_cv_job_description_data/data .
For your purposes, the data has already been split into a training set and a validation set, stored in the `train.csv` and `eval.csv` files respectively. There are five job classes: Sales Jobs, Customer Services Jobs, IT Jobs, HR & Recruitment Jobs and Accounting & Finance Jobs, labeled 0 to 4 as defined in `trainer/config.py`. If you want, you can inspect the data. The code that loads the CSV files into texts and labels is already provided to you.
The test set will be used for the final evaluation when you submit your solution. Hence, the `test.csv` file is not provided.
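If you want a quick look at the data before writing any code, a minimal inspection sketch is shown below; the label-column guess is an assumption, and the provided loader in `trainer/task.py` remains the source of truth for how the CSVs are parsed.

```python
# Quick, optional inspection of the training data.
import pandas as pd

train_df = pd.read_csv("data/train.csv")

print(train_df.shape)             # number of rows and columns
print(train_df.columns.tolist())  # discover the actual column names
print(train_df.head())

# Assuming the label column is the last one (an assumption; verify against
# trainer/config.py, which defines the 0-4 label mapping).
label_col = train_df.columns[-1]
print(train_df[label_col].value_counts())
```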
You can start your development on your local PC / remote server / Cloud VM. The goal is to train a job description classifier that achieves as high an accuracy as possible. To do so, you need to implement the data preprocessor, the model definition, the model exporter and the model loader. The main program logic is written in `task.py`, where the dataset loader and model training code are also provided.
The general local workflow is as follows:
The template Python files are provided in the `trainer` folder:
task.py: containing the main training logic, such as loading the CSV dataset, preprocessing the data, model training, model evaluation and model exporting. Normally you don't need to modify this file.
config.py: containing the global configuration. You can change existing definitions such as EPOCHS and BATCH_SIZE to control the training, or define additional global variables as needed.
preprocess.py: containing the preprocessor class. You should implement this class to preprocess the input data. Notice that the preprocessing may need to differ between the training, validation and test sets (one possible approach is sketched further below).
model.py: containing the model definition. You should implement your model in this file.
export.py: containing the code for exporting and loading the model. You should implement these methods so that your model can be exported to file and loaded from file.
predictor.py: containing the entry code for the online prediction (used by the containerized server).
In principle, feel free to change any files in the trainer folder. Just remember that the objective is to deploy your classifier on Google Cloud Vertex AI: your model will then accept API requests containing the job description text and return the job category as the classification result.
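To make the split across `preprocess.py`, `model.py` and `export.py` more concrete, here is a rough, self-contained sketch of one possible approach (a small Keras text classifier under TensorFlow 2.1). The function names, hyperparameters and tokenizer-based preprocessing are illustrative assumptions; the templates define the actual classes and signatures you should implement.

```python
# Illustrative sketch only; adapt it to the class/method signatures of the
# provided templates. Assumes TensorFlow 2.1 and the 5 classes from config.py.
import tensorflow as tf

VOCAB_SIZE = 20000   # assumption: tune as needed
MAX_LEN = 200        # assumption: tokens kept per job description
NUM_CLASSES = 5

# preprocess.py idea: fit a tokenizer on the training texts, reuse it everywhere
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=VOCAB_SIZE, oov_token="<unk>")

def fit_preprocessor(train_texts):
    tokenizer.fit_on_texts(train_texts)

def preprocess(texts):
    sequences = tokenizer.texts_to_sequences(texts)
    return tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=MAX_LEN)

# model.py idea: a small embedding + pooling classifier
def build_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 64, input_length=MAX_LEN),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# export.py idea: save/load the model as a TensorFlow SavedModel
def export_model(model, export_dir="output/saved_model"):
    model.save(export_dir, save_format="tf")

def load_model(export_dir="output/saved_model"):
    return tf.keras.models.load_model(export_dir)
```

Note that whatever fitted preprocessing state your model needs at prediction time (here, the tokenizer) also has to be exported alongside the model, because the deployed container only sees what you save.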
You can train your model by running the below command after implementing the files above:
python3 trainer/task.py
In the main folder you will see two main files for deploying to Vertex AI:
**server.py**: Specifies how your app should serve and process requests once deployed on Vertex AI. It has been specifically tailored to format your model inputs/outputs into those expected by the Vertex AI APIs. You should NOT need to change this file.
**Dockerfile**: Specifies how the container image should be built. You should NOT need to change this file.
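For reference, Vertex AI sends prediction requests to a custom container as a JSON body with an `instances` list and expects a JSON response with a `predictions` list; `server.py` takes care of this translation for you. A minimal illustration of the shapes involved (the exact content of each instance depends on what your predictor expects):

```python
# Illustrative request/response shapes for a Vertex AI custom container.
request_body = {
    "instances": ["You will mostly work with TensorFlow and Python ..."]
}

# What the container sends back: one prediction per instance.
response_body = {
    "predictions": [0]   # a class id between 0 and 4
}
```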
After training the model locally, you can create a GCP project and deploy your model on GCP Vertex AI. Notice that the GCP deployment is also part of the coding challenge, to see whether you can quickly get familiar with a cloud environment; it's as important as the model training part. Please read the guidelines carefully and follow the deployment steps in order.
Since we want to provide flexibility on the approaches that you can choose, we don't restrict your solution to be a TensorFlow model with fixed input / output format. In order to deploy a customized model on GCP Vertex AI, you will use GCP's custom container feature, which allows you to control the logic of model loading, data preprocessing and results postprocessing.
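As a rough illustration of what that prediction-time logic can look like, here is a sketch that builds on the hypothetical `preprocess` and `load_model` helpers from the earlier sketch; the class name is made up, and the actual entry points are defined by the `predictor.py` template.

```python
# Sketch of the flow a custom container lets you control: load the exported
# model once, preprocess incoming texts, run inference, postprocess the scores.
import numpy as np

class JobClassifierPredictor:  # hypothetical name, not necessarily the template's
    def __init__(self, export_dir="output/saved_model"):
        self.model = load_model(export_dir)           # model loading (export.py)

    def predict(self, instances):
        inputs = preprocess(instances)                # same preprocessing as training
        scores = self.model.predict(inputs)           # model inference
        return np.argmax(scores, axis=1).tolist()     # class ids 0-4, one per instance
```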
The general online workflow is as follows:
Now we explain the model deployment step-by-step:
1. Before deploying, make sure that:
- You have completed the missing methods that were required for model training, and the training was successful.
- You have exported your trained model to a file, e.g. a TensorFlow model saved in `output/saved_model`.
- You have installed the required dependencies and can run the training and the local prediction server:
source venv/bin/activate
pip install -r requirements.txt
python3 trainer/task.py   # runs the training end-to-end
python3 server.py         # starts the prediction server locally
2. Ensure the Google Cloud CLI is set up and your environment has the required APIs enabled:
gcloud auth login
gcloud services enable aiplatform.googleapis.com
gcloud services enable artifactregistry.googleapis.com
3. Build your container image with Cloud Build and push it to the registry:
gcloud builds submit --tag gcr.io/<PROJECT_ID>/job-classifier:v1 .
4. Upload your model to Vertex AI and create an endpoint:
gcloud ai models upload \
--region=europe-west1 \
--display-name=<MODEL_NAME> \
--container-image-uri=gcr.io/<PROJECT_ID>/job-classifier:v1
gcloud ai endpoints create \
--region=europe-west1 \
--display-name=<ENDPOINT_NAME>
You can check the IDs of the created resources with:
gcloud ai models list --region=europe-west1
gcloud ai endpoints list --region=europe-west1
5. Finally, deploy your model to the endpoint. It's important to use the model and endpoint IDs, not the display names you gave them.
gcloud ai endpoints deploy-model <ENDPOINT_ID> \
--region=europe-west1 \
--model=<MODEL_ID> \
--display-name=job-classifier-deployment \
--traffic-split=0=100
THIS MAY TAKE A WHILE.
Before you submit your solution, you can check if your deployed model works by listing the created endpoint and running a test on it.
gcloud ai endpoints predict <ENDPOINT_ID> \
--region=europe-west1 \
--json-request=check_deployed_model/test.json
Check whether you are able to get a prediction out of the `gcloud` command. If you get errors, try to resolve them before submitting the solution. The output of the command should look something like this (the numbers will probably be different):
{
"predictions": [0]
}
The value to use for `<ENDPOINT_ID>` can be found by running the list command above. You will need this value and your Google Cloud Project ID to submit your coding test.
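If you prefer to test the deployed endpoint from Python rather than via `gcloud`, the `google-cloud-aiplatform` client library can do the same thing. The snippet below is a sketch; the instance format (a list of raw job description strings) is an assumption, so mirror whatever your `check_deployed_model/test.json` contains.

```python
# Optional sanity check from Python (pip install google-cloud-aiplatform).
from google.cloud import aiplatform

aiplatform.init(project="<PROJECT_ID>", location="europe-west1")

endpoint = aiplatform.Endpoint("<ENDPOINT_ID>")   # numeric endpoint id
response = endpoint.predict(
    instances=["You will mostly work with TensorFlow and Python ..."]
)
print(response.predictions)   # e.g. [0]
```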
To pass the coding test, you should reach an accuracy of 75% on our secret dataset of job descriptions (which you don't have access to). If your accuracy turns out to be below 75% after we evaluate it, you can keep submitting new solutions until you reach 75%.
Once you are able to execute the command above without errors, you can add us to your project:
- Go to the menu of your project
- Click IAM & admin
- Click Add
- Add laine-coding-challenge-eval@zippy-carving-465819-p5.iam.gserviceaccount.com as a member with the role Project Owner
After you have added us to your project, you should fill in this form so we are able to automatically evaluate your solution to the coding test. Once you've filled in the form, someone from Laine will run the eval pipeline and get back to you. We hope with you that your results are good enough to land an interview at Laine. If they aren't, you can resubmit a new solution as many times as you want, so don't give up!
If you are invited for an interview at Laine afterwards, make sure to bring your laptop with a copy of the code you wrote, so you can explain your `model.py` file to us.
Once you have finished the coding challenge and (hopefully!) received a positive outcome, you can run the commands below to tear down the resources you have created.
# get deployed model id
gcloud ai endpoints describe <ENDPOINT_ID> --region=europe-west1
# then run the command below with the deployed model id from the output above
gcloud ai endpoints undeploy-model <ENDPOINT_ID> \
--deployed-model-id=<DEPLOYED_MODEL_ID> \
--region=europe-west1 \
--quiet
gcloud ai endpoints delete <ENDPOINT_ID> --region=europe-west1 --quiet
gcloud ai models delete <MODEL_ID> --region=europe-west1 --quiet
gcloud container images delete gcr.io/<PROJECT_ID>/job-classifier:v1 --quiet --force-delete-tags
Now your project should be clean once more! Note: you will need to do this for each submission you created if you changed the model and endpoint names. You can verify that the cleanup was successful by running:
gcloud ai models list --region=europe-west1
gcloud ai endpoints list --region=europe-west1