
Conversation


@jhcipar jhcipar commented Sep 4, 2025

This PR adds an example of using Tetra with vLLM for inference using the instructor package.

A CPU endpoint is created to handle remote dependencies and acts as a client submitting requests to a GPU endpoint running Qwen3-0.6B.

This example requires runpod/tetra-rp#89 to work properly.
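The client/server split described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it assumes instructor's `from_openai` wrapper over an OpenAI-compatible vLLM endpoint, and the endpoint URL, schema, and prompt are invented for the example.

```python
# Sketch: structured extraction with instructor against a vLLM endpoint.
# The Person schema and all endpoint details below are illustrative assumptions.
from pydantic import BaseModel


class Person(BaseModel):
    """Target schema that instructor coerces the model's output into."""
    name: str
    age: int


def extract_person(base_url: str, api_key: str) -> Person:
    """Submit a request to an OpenAI-compatible vLLM server (e.g. the GPU
    endpoint running Qwen3-0.6B) and parse the reply into a Person."""
    # Imported lazily so the schema above is usable without these packages.
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI(base_url=base_url, api_key=api_key))
    return client.chat.completions.create(
        model="Qwen/Qwen3-0.6B",
        response_model=Person,  # instructor validates the output against this
        messages=[{"role": "user", "content": "John is 25 years old."}],
    )
```

In the PR's setup, a call like this would run on the CPU endpoint, which hosts the instructor/openai dependencies and forwards requests to the GPU endpoint.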

@jhcipar jhcipar requested review from deanq and pandyamarut September 4, 2025 23:56
import os

# This key is required to authenticate with RunPod's serverless API
RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]

# Qwen3-0.6B is a compact model that's efficient for structured data extraction tasks

Please add all of these comments regarding choices at the top.
Keep it clean. Since we don't have a separate README for examples, an informative comment at the top works best.


@pandyamarut pandyamarut left a comment


/LGTM

@pandyamarut pandyamarut merged commit 3f411bb into main Sep 10, 2025
@pandyamarut pandyamarut deleted the jhcipar/instructor-inference-example branch September 10, 2025 19:37