Open
52 commits
fd6d36a
Rename README to Assignment
loukaspe May 24, 2025
99e3ee5
Initial skeleton commit
loukaspe May 24, 2025
7b5fe73
Small fix
loukaspe May 25, 2025
8bf4dad
Adds initial data embedding
loukaspe May 25, 2025
b6d5747
configuration changes
loukaspe May 26, 2025
e668e16
Adds create chat session
loukaspe May 26, 2025
b07c79e
Adds get chat sessions
loukaspe May 26, 2025
dadda6f
Small fixes
loukaspe May 27, 2025
a6461f9
Adds send message functionality
loukaspe May 27, 2025
c6d139e
Fixes swagger definitions
loukaspe May 27, 2025
6e3bd4c
Sets answer from LLM
loukaspe May 27, 2025
3e38a6f
Inserts LLM into answer generation
loukaspe May 27, 2025
1483db9
Adds curl examples
loukaspe May 27, 2025
cb1a530
Adds an e2e script for testing all functionalities
loukaspe May 27, 2025
a273568
Small fixes
loukaspe May 27, 2025
ef85d96
Adds update title for chat session
loukaspe May 27, 2025
a4fab36
Change embeddings to be stored separately
loukaspe May 27, 2025
354cff9
Small fix to embedding separately
loukaspe May 27, 2025
cd0411b
Minor fix
loukaspe May 27, 2025
1974d8f
Adds similarity search threshold
loukaspe May 27, 2025
6d6135b
adds context and changes openai model
loukaspe May 27, 2025
436c12b
Adds history to the chat completion
loukaspe May 27, 2025
fdddddf
Make the e2e for clear and specific
loukaspe May 27, 2025
60c9f43
Use only provided context, not chatgpts
loukaspe May 27, 2025
a3d0f1c
add context to knowledge base
loukaspe May 27, 2025
e82a9bd
Adds submission of feedback for message
loukaspe May 27, 2025
a4f6a87
Small fixes
loukaspe May 27, 2025
62bce25
Fix e2e test code
loukaspe May 27, 2025
0ef324f
Swagger definition
loukaspe May 27, 2025
388aaf1
README finish vol1
loukaspe May 27, 2025
b575962
Adds some README info
loukaspe May 27, 2025
7d94fa8
Small fixes
loukaspe May 27, 2025
10b4c41
Make script messages more clear
loukaspe May 28, 2025
8db959b
Fix test
loukaspe May 28, 2025
e4e159c
Minor readme fix
loukaspe May 28, 2025
0844fd6
Remove TODOs
loukaspe May 28, 2025
43e8606
Improve readme
loukaspe May 28, 2025
9f88592
Small readme addition
loukaspe May 28, 2025
f9d84e0
Swagger fix
loukaspe May 28, 2025
c2d76c5
Swagger fix vol2
loukaspe May 28, 2025
06ee169
small readme fix
loukaspe May 28, 2025
835f4a5
Minor fixes
loukaspe May 28, 2025
14571e3
Add example response in README
loukaspe May 29, 2025
75fe317
Add option to input another source
loukaspe Jun 18, 2025
5bedbb5
Move http stuff to dedicated folders
loukaspe Jun 18, 2025
3ded284
Adds dummy mcp server (not working)
loukaspe Jul 13, 2025
eb925c4
Small change
loukaspe Jul 18, 2025
2cf30f8
Adds dummy ws routes
loukaspe Jul 18, 2025
801516e
Create send message in websocket
loukaspe Jul 18, 2025
3d30cd2
Adds sse streaming send message
loukaspe Jul 18, 2025
37700cc
Adds dummy cors to BE
loukaspe Aug 13, 2025
9565a08
Adds simple FE with SSE connection
loukaspe Aug 13, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.env
.idea
27 changes: 27 additions & 0 deletions Assignment.md
@@ -0,0 +1,27 @@
# GWI - Jedi Team - Engineering Challenge

Welcome to the engineering challenge for the Jedi Team at GWI!

This task is designed to help us understand how you approach software engineering problems and apply your skills in a real-world-inspired scenario. It focuses on backend engineering using **Go**, with optional extensions into **AI/LLMs**, **product thinking**, and **system design**. The Jedi team mainly works on and evolves the AI infrastructure of the company, so this exercise has a strong focus on that.

While the base functionality is straightforward, we encourage you to go beyond the minimum requirements — creativity, thoughtful design, and clean code are all appreciated.

## 🧪 Core Requirements

You are going to create a **chatbot** that helps GWI's clients answer questions based on market research data. Another tool has converted GWI's data into a **natural language** format and stored it in a database. You can find the data in `data.md`. You should use this data to answer users' questions.

Build a web server in **Go** that exposes this chat functionality (you decide the communication method and the necessary endpoints). The discussion within the chat should be persisted, and the user should be able to continue the conversation from where it was left off. A single user can open multiple chats.

## 🌟 Optional Enhancements

- If the answer to the user's question is not found in the data, the chatbot should decline to answer.
- The user can give negative feedback on a message.
- The chat should have an auto-generated title.
- Include a **Dockerfile** and a **Makefile** or **Taskfile** to simplify local development.
- Explain in the README how to run the application and the assumptions you made.

## 🧩 Submission

Just fork the current repository and send it to us!

Good luck, potential colleague!
156 changes: 140 additions & 16 deletions README.md
@@ -1,27 +1,151 @@
# Jedi Team Challenge - Louk Chatwalker

---

## Description

This service provides a REST API for creating chat sessions, sending messages, reading chat sessions,
and submitting feedback on messages. Responses to chat messages are generated using a Retrieval-Augmented
Generation (RAG) approach over a provided dataset of GWI data.

---

## Run

`cd script && make start-app`

* This command starts the app at `localhost:8080` (the port is specified in `build/Dev.Dockerfile` and `.env`)

You can then create chat sessions, fetch a user's chat sessions, send messages, and get responses from the
Knowledge Base (`data.md`), as in the examples in the `/examples` directory. To generate the required Bearer token,
call the `/token` endpoint with username = "user" and password = "password", as in the example.

The app runs under Delve (`dlv`), so a debugger can be attached while it is running.

You can also run `sh /scripts/e2e.sh` to exercise all the cases of the assignment:

1. It creates a JWT token
2. It creates three chat sessions for that user
3. In the first chat session it sends three related messages, so that the use of chat history is shown:
   1. "what do you know about latino mobile gamers?"
   2. "do they use social media?"
   3. "what social media do they use the most?"
4. It fetches the whole chat session, which shows the full conversation
5. It submits negative feedback for the last message
6. It sends a final message that the chat is not supposed to answer ("what are butterflies")

---

## Makefile Commands

| Command | Usage |
|-------------------------------|--------------------------------------------------|
| start-app | `Start app` |
| kill-app | `Stop app` |
| rebuild-app | `Rebuild app in case of code changes` |
| tests | `Run both unit and integration tests` |
| generate-mock FILE={filePath} | `Generate mock for a specific file` |
| swag | `Generates swagger.json definitions in Docs dir` |

* All these are executed through docker containers
* In order to execute makefile commands type **make** plus a command from the table above

make {command}

---

## Notes

1. `/config/.env` and `deployment/.env` are not pushed to Git, so to run the app you need to create them with your own secret keys (e.g. Pinecone, OpenAI)
2. There are three Dockerfiles:
   1. `Dockerfile` is the normal, production one
   2. `Dev.Dockerfile` sets up the Delve remote debugger
   3. `Utilities.Dockerfile` builds an image for "utilities" such as running tests and linting
3. LLM choices (made with limited knowledge):
   1. Pinecone as the vector database
   2. `text-embedding-3-small` as the embedding model
   3. Tiktoken as the tokenizer, with CHUNK_ENCODING_MODEL `cl100k_base` and MAX_TOKENS_PER_CHUNKS `3000`
   4. The top 7 results are retrieved from the similarity search in the vector DB, and a threshold of 0.35
      rejects matches with a lower score. If no matches remain, the answer is "The force
      is not strong enough for me to answer that question based on my context."
   5. For the OpenAI model I chose `gpt-4.1-nano`, which offers a good balance of speed, accuracy and price.
   6. I added a rate limiter around the OpenAI calls because I was occasionally getting `429 Too Many Requests` responses.
4. There are Swagger definitions in `/docs` and examples in `/examples` that show the usage of the API, plus the `e2e.sh`
   script that exercises everything.
5. The code structure follows the Hexagonal Architecture; more on that at https://medium.com/@matiasvarela/hexagonal-architecture-in-go-cfd4e436faa3
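
The retrieval rule described in the LLM choices (top 7 results, 0.35 score threshold, fixed fallback answer) can be sketched roughly as below. This is only an illustration, not the code in this repository: the `Match` type and the `selectContext` and `answerOrDecline` helpers are hypothetical names, and it assumes a higher score means a closer match.

```go
package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical stand-in for one hit returned by the vector search.
type Match struct {
	ID    string
	Score float32
}

const (
	topK           = 7    // number of results kept, as stated in the notes
	scoreThreshold = 0.35 // matches scoring below this are rejected
	fallbackAnswer = "The force is not strong enough for me to answer that question based on my context."
)

// selectContext drops matches below the threshold and returns at most k of
// the remaining ones, ordered from highest to lowest score.
func selectContext(matches []Match, k int, threshold float32) []Match {
	kept := make([]Match, 0, len(matches))
	for _, m := range matches {
		if m.Score >= threshold {
			kept = append(kept, m)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if len(kept) > k {
		kept = kept[:k]
	}
	return kept
}

// answerOrDecline returns the fallback answer when no match survives the
// threshold; otherwise it returns a placeholder where the LLM call would go.
func answerOrDecline(matches []Match) string {
	ctx := selectContext(matches, topK, scoreThreshold)
	if len(ctx) == 0 {
		return fallbackAnswer
	}
	return fmt.Sprintf("would call the LLM with %d context chunks", len(ctx))
}

func main() {
	fmt.Println(answerOrDecline([]Match{{"chunk-1", 0.82}, {"chunk-2", 0.12}, {"chunk-3", 0.41}}))
	fmt.Println(answerOrDecline([]Match{{"chunk-4", 0.10}}))
}
```

Declining before the LLM is called at all keeps off-topic questions (such as the butterflies one in the e2e script) from being answered out of the model's own knowledge.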

## Known Issues

1. Only happy-path tests were written, due to time constraints.
2. The JWT mechanism only requires a fake username and password to generate a token and does NOT perform an
   actual login, due to lack of time. There is also no test for it. In addition, the user_id that is passed in the
   endpoint should come directly from the JWT.
3. In my implementation, when loading the chat history from the Messages DB, I send all messages to OpenAI so that
   the discussion continues. In a production environment I would instead limit the number of messages read
   from history, as there might be a lot of them.
4. To improve performance, we can add indices on the foreign keys in the DB so that the fetches in the GET
   endpoints are faster.
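
The history limit mentioned in issue 3 could look something like the sketch below. It is not part of the current implementation: the `Message` type, the `lastN` helper and the limit of 2 used in `main` are assumptions made for illustration only.

```go
package main

import "fmt"

// Message is a hypothetical, simplified chat message.
type Message struct {
	Sender  string
	Content string
}

// lastN returns the n most recent messages in their original order.
// It assumes history is sorted from oldest to newest.
func lastN(history []Message, n int) []Message {
	if n <= 0 {
		return nil
	}
	if len(history) <= n {
		return history
	}
	return history[len(history)-n:]
}

func main() {
	history := []Message{
		{"USER", "what do you know about latino mobile gamers"},
		{"SYSTEM", "first answer"},
		{"USER", "do they use social media"},
		{"SYSTEM", "second answer"},
	}
	// Only the trimmed history would be sent along with the next question.
	for _, m := range lastN(history, 2) {
		fmt.Printf("%s: %s\n", m.Sender, m.Content)
	}
}
```

A count-based cut-off is the simplest option; a limit based on token count would track the model's context window more closely.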

## Security

1. JWT mechanism added for Authentication and Authorization (incomplete - see Known Issues)
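
As a rough illustration of the first step of Bearer authentication, the sketch below pulls the token out of an `Authorization` header value. The `bearerToken` helper is hypothetical and is not taken from this repository; it does not validate the JWT, so signature and claim checks would still have to follow.

```go
package main

import (
	"fmt"
	"strings"
)

// bearerToken extracts the token from an Authorization header value of the
// form "Bearer <token>". The scheme name is matched case-insensitively.
// It reports false when the header is missing, uses another scheme, or
// carries an empty token. It performs no JWT validation.
func bearerToken(header string) (string, bool) {
	const prefix = "bearer "
	if len(header) < len(prefix) || !strings.EqualFold(header[:len(prefix)], prefix) {
		return "", false
	}
	token := strings.TrimSpace(header[len(prefix):])
	return token, token != ""
}

func main() {
	for _, h := range []string{"Bearer abc.def.ghi", "Basic dXNlcjpwYXNz", ""} {
		token, ok := bearerToken(h)
		fmt.Printf("header=%q token=%q ok=%v\n", h, token, ok)
	}
}
```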

## Libraries and Tools

1. github.com/gorilla/mux for routing
2. gorm.io/gorm as ORM for my PostgreSQL DB
3. github.com/golang-jwt/jwt/v4 for the JWT token handling
4. github.com/openai/openai-go for communicating with OpenAI
5. github.com/pinecone-io/go-pinecone/v3 for Pinecone Vector DB
6. github.com/pkoukk/tiktoken-go for tokenizer
7. github.com/swaggo/swag for Swagger definition
8. github.com/stretchr/testify & go.uber.org/mock & github.com/DATA-DOG/go-sqlmock for testing

## Example Chat Session Response

```
{
"id": "5488a398-1801-4a7c-ba6d-69d833453313",
"title": "Latino Mobile Gamers Overview",
"createdAt": "2025-05-29 10:13:58.368579 +0000 UTC",
"updatedAt": "2025-05-29 10:13:59.831385 +0000 UTC",
"messages": [
{
"id": "7a93a664-17fa-40a4-bd59-7d2c81d7accf",
"sender": "USER",
"content": "what do you know about latino mobile gamers",
"created_at": "2025-05-29 10:13:58.417981 +0000 UTC"
},
{
"id": "3bf684be-39ad-4fd0-8cae-27d42ce56111",
"sender": "SYSTEM",
"content": "Based on the provided context, Latino mobile gamers are characterized by the following behaviors and interests:\n- They are 49% more likely to visit Reddit daily compared to the average person.\n- They are 51% more likely to use TikTok weekly compared to the average person.\n- They are 42% more likely to use TikTok daily compared to the average person.\n- They are 36% more likely to use Instagram more than once a day than the average person.\n- They are 74% more likely to be interested in Esports compared to the average person.\n- They are 49% more likely to buy products or services to access the community built around them.\n- They are 62% more likely to find out about new brands and products through vlogs.\n- They are 43% more likely to discover new brands through ads seen in video or mobile games.\n- They are 62% more likely to discover new brands and products through posts or reviews from expert bloggers.\n- They are 25% between the ages of 16 and 24, and 22% between 25 and 34.\n- They are 30% more likely to be between 25 and 34 years old than the average person.\n- They are 16% of U.S. Hispanic/Latino mobile gamers.\n- They are 42% more likely to be interested in computers and coding.\n- They are 74% more likely to discover new brands and products through vlogs.\n- They are 103% more likely to discover new brands through ads in video or mobile games compared to the average person.\n- They are 45% more likely to spend 2-3 hours on streaming services daily.\n- They are 41% more likely to spend more than 4 hours on streaming services daily.\nThis indicates that Latino mobile gamers are highly active on social media platforms like TikTok and Instagram, have a strong interest in gaming, esports, technology, and discovering new brands through video content and ads in mobile games.",
"created_at": "2025-05-29 10:14:11.136485 +0000 UTC"
},
{
"id": "86718efc-d4b7-46b6-ac3f-b8c1a044cd8d",
"sender": "USER",
"content": "do they use social media",
"created_at": "2025-05-29 10:14:11.172935 +0000 UTC"
},
{
"id": "4dbba2a9-8df5-4c29-97f9-dd7d8999a34e",
"sender": "SYSTEM",
"content": "Yes, based on the provided context, Latino mobile gamers actively use social media. They are more likely than the average person to use platforms such as TikTok (42% more likely to use weekly, 36% more likely to use daily), Instagram (62% more likely to use more than once a day), and Reddit (49% more likely to visit daily). They also frequently discover new brands and products through social media content like vlogs and reviews from bloggers, indicating high engagement with social media channels.",
"created_at": "2025-05-29 10:14:17.505033 +0000 UTC"
},
{
"id": "e63bbd20-0519-45bd-a88f-48e0d4179c9e",
"sender": "USER",
"content": "what social media do they use the most",
"created_at": "2025-05-29 10:14:17.541732 +0000 UTC"
},
{
"id": "28e0d999-19b5-40d7-a23c-b31012fa8c87",
"sender": "SYSTEM",
"content": "Based on the provided context, Latino mobile gamers use TikTok and Instagram the most. They are 42% more likely to use TikTok weekly, 36% more likely to use it more than once a day, and 62% more likely to use Instagram more than once a day compared to the average person.",
"created_at": "2025-05-29 10:14:23.200825 +0000 UTC"
}
]
}
```
26 changes: 26 additions & 0 deletions build/Dev.Dockerfile
@@ -0,0 +1,26 @@
FROM golang:1.24.3-alpine3.21 as builder

WORKDIR /app

COPY ../go.mod go.sum ./
RUN go mod download
RUN go install github.com/joho/godotenv/cmd/godotenv@v1.4.0
RUN go install github.com/go-delve/delve/cmd/dlv@latest

FROM golang:1.24.3-alpine3.21

RUN apk update
RUN apk add build-base bash

COPY --from=builder /go /go

WORKDIR /app

COPY .. .

RUN GOOS=linux go build -gcflags='all=-N -l' -tags musl -a -installsuffix cgo -o main ./cmd/main.go

EXPOSE 8080
EXPOSE 40000

CMD ["dlv", "--listen=:40000", "--headless=true", "--api-version=2", "--accept-multiclient", "--continue=true", "exec", "main"]
38 changes: 38 additions & 0 deletions build/Dockerfile
@@ -0,0 +1,38 @@
# Start from golang base image
FROM golang:1.24.3-alpine3.21 as builder

# Install git.
# Git is required for fetching the dependencies.
RUN apk update && apk add --no-cache git build-base bash

# Set the current working directory inside the container
WORKDIR /app

# Copy go mod and sum files
COPY ../go.mod go.sum ./

# Download all dependencies. Dependencies will be cached if the go.mod and the go.sum files are not changed
RUN go mod download

# Copy the source from the current directory to the working Directory inside the container
COPY .. .

# Build the Go app
RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main ./cmd/main.go

# Start a new stage from scratch
FROM alpine:latest
RUN apk --no-cache add ca-certificates

WORKDIR /app/

# Copy the Pre-built binary file from the previous stage. Observe we also copied the .env file
COPY --from=builder /app/main .
COPY --from=builder /app/.env .

# Expose port 8080 to the outside world
EXPOSE 8080

#Command to run the executable
RUN chmod +x ./main
CMD ["./main"]
19 changes: 19 additions & 0 deletions build/Utilities.Dockerfile
@@ -0,0 +1,19 @@
FROM golang:1.24.3-alpine3.21

# Install git

RUN apk update && apk add --no-cache git build-base bash

WORKDIR /app

COPY ../go.mod go.sum ./

RUN go mod download

# Copy the source from the current directory to the working Directory inside the container
COPY .. .

RUN go install go.uber.org/mock/mockgen@latest
RUN go install github.com/joho/godotenv/cmd/godotenv@v1.4.0
RUN wget -O- -nv https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.50.1
RUN go install github.com/swaggo/swag/cmd/swag@latest