Skip to content

Latest commit

 

History

History

fixmycar

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

FixMyCar

FixMyCar is a retrieval-augmented generation (RAG) sample application, powered by Google Cloud, including: Gemini on Vertex AI, Vertex AI Agent Builder, Google Kubernetes Engine (GKE), Java (Spring), and Python (Streamlit). This doc guides you through deploying this app to your Google Cloud project.

Prerequisites

To deploy this app, you will need:

  • A Google Cloud project with billing enabled.
  • The Google Cloud SDK (gcloud) installed and configured in your local environment (or use Google Cloud Shell). (Note: The gcloud SDK is already installed on Google Cloud Shell)
  • Docker, OR an open-source tool like Colima that can run docker build and docker push. (Note: Docker is already installed on Google Cloud Shell.)
  • Java 18+, Maven 3.9.6+
  • Python 3.9+

Create an Artifact Registry repository.

This is where you'll store your container images for the Streamlit frontend and Java backend. Containers are packaged-up source code and dependencies that can be deployed to different environments.

  1. Open the Google Cloud console. Open the search bar and type "Artifact Registry." Open the Artifact Registry console page.

  2. Click Create Repository.

  3. Name your repository fixmycar. Keep the default Docker option. Choose any region, eg. us-central1. Then click Create.

Build and push container images.

  1. From the Artifact Registry console, click your repository, fixmycar. Then click Setup Instructions.

  2. Copy the authentication command, eg.

gcloud auth configure-docker \
    us-central1-docker.pkg.dev
  1. Open your terminal or Cloud Shell. Paste the command and run it. This will authenticate your Docker client to your Artifact Registry repository.

Expected output:

{
  "credHelpers": {
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
  1. Open the dockerpush.sh script in the root of this directory. Replace PROJECT_ID with your Google Cloud project ID.
  2. Run the script to build and push the Frontend and Backend container images to Artifact Registry.
./dockerpush.sh

Expected output:

latest: digest: sha256:864589160d7c3f982472427ed008cc03cf244f5db61a0c7312caaaa670ee0e47 size: 1786
✅ Container build and push complete.

Create a GKE Autopilot cluster

You will deploy these two container images ("frontend" and "backend") to Google Kubernetes Engine (GKE). GKE takes care of running these two servers on underlying compute resources.

  1. Open the Google Cloud console. Open the search bar and type "Kubernetes Engine." Open the Kubernetes Engine console page. (If it prompts you to enable the API, click Enable.)

  2. From the Kubernetes Engine console, click Create to open the cluster-creation wizard.

  3. Keep all defaults (GKE Autopilot). Give your cluster any name you want, eg. fixmycar. Click Create.

This will take a few minutes to complete.

  1. When your cluster is ready, click on the name of the cluster, and click Connect. Copy the "Command line Access" command, eg.
gcloud container clusters get-credentials fixmycar --region us-central1 --project my-project
  1. Return to your terminal and paste that command, then run it.

Expected output:

Fetching cluster endpoint and auth data.
kubeconfig entry generated for fixmycar.
  1. Test that you can reach your GKE cluster by running:
kubectl cluster-info

You should see something like:

Kubernetes control plane is running at https://34.69.121.152
...

Upload auto manuals to Cloud Storage

  1. Open the Cloud Console and search for "Cloud Storage." Click the console page.

  2. Click Create Bucket.

  3. Name your bucket something globally unique, eg. <your-project-id>-fixmycar. Keep all other default settings, then click Create. (You may see a notification about enforcing public access prevention - this is expected. Click "Confirm.")

  4. Copy the full name of your bucket by clicking the "copy" icon.

  1. Open your terminal. Set your bucket's name as an environment variable.
export BUCKET_NAME=<your-bucket-name>
  1. Download the Cymbal Starlight 2024 manual from the public bucket.
gsutil cp gs://github-repo/generative-ai/sample-apps/fixmycar/cymbal-starlight-2024.pdf .
  1. Upload the Cymbal Starlight 2024 owner's manual to your private Cloud Storage bucket.
gsutil -m cp -r cymbal-starlight-2024.pdf gs://$BUCKET_NAME

Expected output:

- [10/10 files][156.6 MiB/156.6 MiB] 100% Done
Operation completed over 10 objects/156.6 MiB.

Set up Vertex AI Agent Builder

You will store the Cymbal Starlight 2024 owner's manual in a Vertex AI Agent Builder data store. This will allow you to search the manual's contents using natural language queries.

  1. Open the Cloud Console and search for "Agent Builder." Open the console page. If prompted, click Activate API.

  2. Click Create a new app, then click type Search.

  3. Keep "Enterprise edition features" and "Advanced LLM features"

  4. Name your app [PROJECT_ID]-fixmycar, and set the ID to match. Set "Company Name" to fixmycar. Keep region global.

  5. Click Continue. Then click Create a new data store.

  6. In "Select Data Source," click Cloud Storage. Then browse to your Cloud Storage bucket's manuals/ directory. Click Continue.

  1. Set region global, and the Default Document Parser type to OCR Parser. Name your datastore fixmycar.

  1. Click Create to create the data store.

  2. Back in the application creation wizard, select the datastore you just created, then click Create to create your app.

Test Vertex AI Agent Builder datastore

  1. Back in the Data Stores page, click your new data store. Then click the Activity tab. You can view the status of your PDF processing here. Under the hood, Vertex AI Agent Builder is scanning your documents and converting them to vector embeddings. Then, it's storing those embeddings in the data store. This may take around 10 minutes to complete.

  1. When your data store is ready, you should see a green check icon and the status: "Import completed."

  1. You can test a search query directly from the console by clicking Apps on the left sidebar, then Preview. Type a query, for instance: Cymbal Starlight 2024: Max cargo capacity. A generated result should appear.

Set up GKE auth to Vertex AI

Your GKE-based backend server needs to access Vertex AI Agent Builder and Vertex AI's Gemini API. To do this, we'll map the Kubernetes service account used by the backend pod, to a Google Cloud IAM service account with the right permissions. Here, the Kubernetes-to-GCP service account mapping provides the authentication ("who are you?"), and the Google Cloud service account's IAM roles provide authorization ("what are you allowed to do?"). This setup is called GKE Workload Identity.

  1. Run the setup script to configure auth.
./workload_identity.sh

Expected output

✅ Workload Identity setup complete.

Deploy the app to GKE

  1. Get ready to deploy the frontend and backend to Kubernetes Engine (GKE) by opening the kubernetes/frontend-deployment.yaml file.

  2. Update the image field to use your project ID, for example:

image: us-central1-docker.pkg.dev/project123/fixmycar/frontend:latest
  1. Repeat step 2 for kubernetes/backend-deployment-vertex-search.yaml.

  2. Update the GCP_PROJECT_ID env var in backend-deployment-vertex-search.yaml to use your project ID.

            - name: GCP_PROJECT_ID
              value: "your-project-id"
  1. Lastly, update the VERTEX_AI_DATASTORE_ID env var in backend-deployment-vertex-search.yaml to use your Vertex AI Agent Builder Data Store ID. You can find this in the Console by clicking 'Data,' then copying the value of Data Store ID.
            - name: VERTEX_AI_DATASTORE_ID
              value: "your-datastore-id"
  1. Deploy the app to your GKE cluster. This will create Deployments ("Pods", or running servers) for both the Streamlit frontend and Java backend. It will also create Services to expose both servers to the public Internet.
kubectl apply -f kubernetes/backend-deployment-vertex-search.yaml
kubectl apply -f kubernetes/backend-service.yaml
kubectl apply -f kubernetes/frontend-deployment.yaml
kubectl apply -f kubernetes/frontend-service.yaml

You may see a warning on Autopilot resource adjustments. This is normal; GKE Autopilot is scaling up its compute resources to run your workloads.

Warning: autopilot-default-resources-mutator:Autopilot updated Deployment default/fixmycar-frontend: adjusted resources to meet requirements for containers [fixmycar-frontend] (see http://g.co/gke/autopilot-resources)

Test functionality.

  1. Get the status of your running GKE pods to ensure that the Frontend and Backend started up successfully.
kubectl get pods

Note - it may take ~3 minutes for your pods to move from Pending to Running, if you're deploying for the first time. (GKE is scaling up your cluster)

Expected output

NAME                                 READY   STATUS    RESTARTS      AGE
fixmycar-backend-74fbb8c8b5-zbcmv    1/1     Running   0             3m
fixmycar-frontend-75ff59c776-h57r6   1/1     Running   0             2m
  1. Copy the external IP value of your frontend service.
kubectl get service fixmycar-frontend

Expected output

➜ kubectl get service fixmycar-frontend
NAME                TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)        AGE
fixmycar-frontend   LoadBalancer   <ip-value>       <ip-value>      80:30519/TCP   4m13s
  1. Open that IP address in a web browser. You should see a Streamlit frontend with a chat window.

  2. Test a chat prompt based on an existing item in the Cymbal Starlight owner's manual.

For instance, you can ask again about the max cargo capacity of the vehicle. This text is in our Vertex AI data store, so our app should be able to find this info for us.

Prompt the Streamlit app:

Cymbal Starlight 2024: What is the max cargo capacity?

Expected chatbot response:

The response should match the text from the manual.

You can see what happened "under the hood" by viewing the backend server logs:

kubectl logs -l app=fixmycar-backend

Expected output:

fixmycar-backend-77cb969894-cfvbq fixmycar-backend 2024-03-23T23:35:07.059Z  INFO 1 --- [nio-8080-exec-4] c.c.f.FixMyCarBackendController          : 🔍 Vertex AI Agent Builder results: Chapter 6: Towing, Cargo, and Luggage Towing Your Cymbal Starlight 2024 is not equipped to tow a trailer. Cargo The Cymbal Starlight 2024 has a cargo capacity of <b>13.5 cubic feet</b>. The cargo area is located in the trunk of the vehicle.
fixmycar-backend-77cb969894-cfvbq fixmycar-backend 2024-03-23T23:35:07.060Z  INFO 1 --- [nio-8080-exec-4] c.c.f.FixMyCarBackendController          : 🔮 Gemini Prompt: You are a helpful car manual chatbot. Answer the car owner's question about their car. Human prompt: Cymbal Starlight 2024: what is the max cargo capacity?,
fixmycar-backend-77cb969894-cfvbq fixmycar-backend  Use the following grounding data as context. This came from the relevant vehicle owner's manual: Chapter 6: Towing, Cargo, and Luggage Towing Your Cymbal Starlight 2024 is not equipped to tow a trailer. Cargo The Cymbal Starlight 2024 has a cargo capacity of <b>13.5 cubic feet</b>. The cargo area is located in the trunk of the vehicle.
fixmycar-backend-77cb969894-cfvbq fixmycar-backend 2024-03-23T23:35:07.762Z  INFO 1 --- [nio-8080-exec-4] c.c.f.FixMyCarBackendController          : 🔮 Gemini Response: The Cymbal Starlight 2024 has a cargo capacity of 13.5 cubic feet. The cargo area is located in the trunk of the vehicle.

Here, you can see how the backend is doing a search query to Vertex AI Agent Builder, then augmenting the Gemini prompt using the results. The Gemini response is then sent back to the frontend, and that's what you're seeing as the chatbot response in the browser.