- Clone the repository and change into its root directory.
- Set the environment variable `OPENAI_SECRET_KEY` in your OS (Linux or macOS).
- Build the image from the Dockerfile, tagging it `flask3.12`: `docker build -t flask3.12 .`
- Run the app by starting the container with `docker run -it -p 8000:8000 -e OPENAI_SECRET_KEY=$OPENAI_SECRET_KEY flask3.12 app.py`. You should see `* Serving Flask app 'app'` and `* Debug mode: off`.
- Run `python evaluate.py` in another terminal.
- Run the tests with `docker run -it -p 8000:8000 -e OPENAI_SECRET_KEY=$OPENAI_SECRET_KEY flask3.12 -m pytest .`
- Rank: A
- Maintainability: Very high

Using `prompt3.txt`, with JSON examples and Jinja2 templating.
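The prompt-building step (JSON-serialized class examples rendered through a template) can be sketched roughly as below. The template text and the `build_prompt` helper are illustrative assumptions, not the repo's actual `prompt3.txt`, and the stdlib `string.Template` stands in for Jinja2 to keep the sketch dependency-free:

```python
import json
from string import Template

# Hypothetical stand-in for prompt3.txt (the repo renders it with Jinja2;
# string.Template shows the same fill-in-the-blanks idea without the dependency).
PROMPT_TEMPLATE = Template(
    "Classify the query into one or more of the classes below.\n"
    "Classes (one JSON object per line):\n"
    "$classes\n"
    "Respond with a JSON list of class_ids.\n"
    "Query: $query"
)

def build_prompt(query, classes):
    # Serialize each class as JSON so the model sees id, name and description.
    class_lines = "\n".join(json.dumps(c) for c in classes)
    return PROMPT_TEMPLATE.substitute(classes=class_lines, query=query)

prompt = build_prompt(
    "I want a refund",
    [{"class_id": "C1", "class_name": "Complaint",
      "class_description": "Customer complains about a product"}],
)
```

Embedding the class list as one JSON object per line keeps the prompt compact and easy for the model to echo back as ids.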
python evaluate.py
[CORRECT] Expected ['V1', 'C1'], got ['V1', 'C1']
[CORRECT] Expected ['CO1'], got ['CO1']
[CORRECT] Expected ['RR1', 'CC1'], got ['RR1', 'CC1']
[CORRECT] Expected ['B1', 'M1'], got ['B1', 'M1']
[CORRECT] Expected ['MG1', 'G1'], got ['MG1', 'G1']
[CORRECT] Expected ['P1', 'C2'], got ['P1', 'C2']
[CORRECT] Expected ['H1', 'O1'], got ['H1', 'O1']
[CORRECT] Expected ['F1', 'S2'], got ['F1', 'S2']
[CORRECT] Expected ['A1', 'CH1'], got ['A1', 'CH1']
[CORRECT] Expected ['T1', 'PB1'], got ['T1', 'PB1']
[CORRECT] Expected ['SC1', 'L1'], got ['SC1', 'L1']
[CORRECT] Expected ['C3', 'BL1'], got ['C3', 'BL1']
[CORRECT] Expected [], got []
[CORRECT] Expected [], got []
[CORRECT] Expected [], got []
[CORRECT] Expected [], got []
[INCORRECT] Expected [], got ['C1']
[CORRECT] Expected [], got []
[CORRECT] Expected [], got []
[CORRECT] Expected [], got []
[CORRECT] Expected [], got []
[INCORRECT] Expected [], got ['C2']
data/case_1.json accuracy: 90.91%
[INCORRECT] Expected ['I1', 'R1'], got ['Q1']
[CORRECT] Expected ['Q1'], got ['Q1']
[CORRECT] Expected ['C1'], got ['C1']
[CORRECT] Expected ['A1'], got ['A1']
[CORRECT] Expected ['D1'], got ['D1']
[CORRECT] Expected ['T1'], got ['T1']
[CORRECT] Expected ['Ap1'], got ['Ap1']
[CORRECT] Expected ['S1'], got ['S1']
[INCORRECT] Expected ['F1'], got ['I1']
[INCORRECT] Expected ['Q1', 'S1'], got ['Q1']
[CORRECT] Expected ['Ap1', 'T1'], got ['Ap1', 'T1']
[CORRECT] Expected ['D1', 'C1'], got ['D1', 'C1']
[INCORRECT] Expected ['I1', 'S1', 'F1'], got ['S1']
[CORRECT] Expected ['R1', 'Q1'], got ['R1', 'Q1']
[INCORRECT] Expected ['I1', 'Ap1'], got ['Ap1']
[INCORRECT] Expected ['T1', 'Ap1', 'F1'], got ['T1', 'Ap1']
data/case_2.json accuracy: 62.50%
[CORRECT] Expected ['Y1'], got ['Y1']
[CORRECT] Expected ['N1'], got ['N1']
[CORRECT] Expected ['Y1'], got ['Y1']
[CORRECT] Expected ['N1'], got ['N1']
[CORRECT] Expected ['Y1'], got ['Y1']
[CORRECT] Expected ['N1'], got ['N1']
[CORRECT] Expected ['Y1'], got ['Y1']
[CORRECT] Expected ['N1'], got ['N1']
[CORRECT] Expected ['Y1'], got ['Y1']
[CORRECT] Expected ['N1'], got ['N1']
[CORRECT] Expected ['Y1'], got ['Y1']
[CORRECT] Expected ['N1'], got ['N1']
data/case_3.json accuracy: 100.00%
Average accuracy: 84.47%
Your task is to build a simple web application to do text classification using a large language model. `gpt-3.5-turbo` will be sufficient for the task and should incur minimal cost. The main endpoint for your classification will be a POST request to `/classify`. The JSON payload will have the following structure:
    {
        "query": "the text to be classified",
        "options": {
            "multilabel": true,
            "show_reasoning": true
        },
        "classes": [
            {
                "class_id": "C1",
                "class_name": "Class 1",
                "class_description": "Description of class 1"
            },
            {
                "class_id": "C2",
                "class_name": "Class 2",
                "class_description": "Description of class 2"
            }
        ]
    }
Your web app should return a JSON dictionary with the following structure:
    {
        "result": ["{THE CLASSIFICATION RESULT CLASS ID}"],
        "reasoning": "The reasoning behind the classification"
    }
If `multilabel` is `true`, then the `"result"` field can have more than one option. If `"show_reasoning"` is `true`, then the `"reasoning"` field should be included in the response. If `"show_reasoning"` is `false` or not included, then the `"reasoning"` field should be `null`.
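These option rules can be enforced in one small helper on the server side; this is a hedged sketch (`shape_response` is a hypothetical name, not necessarily how the app implements it):

```python
def shape_response(class_ids, reasoning, options):
    """Apply the spec's `multilabel` / `show_reasoning` options to a raw result."""
    multilabel = options.get("multilabel", False)
    show_reasoning = options.get("show_reasoning", False)
    return {
        # Without multilabel, keep at most one class id.
        "result": class_ids if multilabel else class_ids[:1],
        # `reasoning` must be null unless explicitly requested.
        "reasoning": reasoning if show_reasoning else None,
    }
```

For example, `shape_response(["C1", "C2"], "fits both", {"multilabel": True, "show_reasoning": True})` keeps both ids and the reasoning, while an empty `options` dict yields a single-element `result` and a `null` reasoning.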
Included in the repo alongside this README is the script we will run to evaluate your web app (`evaluate.py`). Some example test data is also included; we will run this and a held-out test set through your app and compare it to our own implementation. Our implementation uses `gpt-3.5-turbo`, and you should too. Using a more powerful model for better results will count against you: we are much more interested in the approach you take with your code than in the actual results.
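Judging from the log above, `evaluate.py` appears to mark a case correct when the returned ids match the expected ids exactly, report per-file accuracy, and then average those accuracies. A rough reimplementation of that scoring (the order-insensitive comparison is an assumption):

```python
def case_accuracy(pairs):
    """pairs: (expected_ids, got_ids) tuples; an exact match counts as correct."""
    correct = sum(1 for expected, got in pairs if sorted(expected) == sorted(got))
    return 100.0 * correct / len(pairs)

def average_accuracy(cases):
    # Mirrors the "Average accuracy" line: mean of the per-file accuracies.
    return sum(case_accuracy(pairs) for pairs in cases) / len(cases)
```

Note that averaging per-file accuracies (rather than pooling all cases) weights each data file equally regardless of how many cases it contains, which matches the 84.47% figure above.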
Feel free to include anything else in the deliverable that would help a newcomer approach it, much as you would in a usual work setting. Some ideas:
- Deploy it somewhere
- Add security measures of some kind
- Extract text from PDFs and process it
- Open a given URL, scrape the text from it, and summarize it
- Other endpoints to accomplish different tasks with LLMs
- Add caching
- Make a simple frontend