ray_quick
The Ray appliance includes a built-in chat application that can be deployed with a pre-trained model. This guide shows how to deploy and serve this application:
1. Download the Appliance: Retrieve the Ray appliance from the OpenNebula Marketplace with the following command:

$ onemarketapp export 'Service Ray' Ray --datastore default
-
2. (Optional) Configure the Ray VM Template: Depending on your application's requirements, you may need to modify the VM template to adjust resources such as VCPU or MEMORY, or to add GPU cards to accelerate model serving.
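As a minimal sketch of the capacity change, assuming you append new attributes to an existing template from a file, a small Python helper can render the VCPU and MEMORY attributes. The function name `render_capacity` is hypothetical, and MEMORY is expressed in MB, matching the `MEMORY="8192"` value in the instantiation example. GPU cards are not covered here.

```python
def render_capacity(vcpu: int, memory_mb: int) -> str:
    """Render OpenNebula capacity attributes as template text.

    Hypothetical helper: the output is meant to be saved to a file and
    appended to an existing VM template. MEMORY is given in MB.
    """
    if vcpu < 1:
        raise ValueError("vcpu must be at least 1")
    if memory_mb < 1:
        raise ValueError("memory_mb must be positive")
    return f'VCPU="{vcpu}"\nMEMORY="{memory_mb}"\n'


# Example: twice the resources of the Qwen2.5-1.5B-Instruct deployment.
print(render_capacity(vcpu=8, memory_mb=16384), end="")
```

Saved to a file, the output could then be appended with `onetemplate update <template_id> <file> --append`; confirm the option with `onetemplate update --help` on your installation.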
3. Instantiate the Template: On instantiation, you will be prompted to configure model-specific parameters, such as the model ID and temperature, and to provide your Hugging Face token if required. For example, deploying the Qwen2.5-1.5B-Instruct model results in the following CONTEXT and capacity attributes:

MEMORY="8192"
VCPU="4"
...
CONTEXT=[
  DISK_ID="1",
  ETH0_DNS="172.20.0.1",
  ...
  ONEAPP_RAY_API_PORT="8000",
  ONEAPP_RAY_CHATBOT_CPUS="4",
  ONEAPP_RAY_MODEL_ID="Qwen/Qwen2.5-1.5B-Instruct",
  ONEAPP_RAY_MODEL_TEMPERATURE="0.1",
  ONEAPP_RAY_MODEL_TOKEN="hf_rbmflxx*************",
  ... ]
Note: The number of CPUs allocated to the application (ONEAPP_RAY_CHATBOT_CPUS) is derived automatically from the VM's virtual CPUs (VCPU).
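To illustrate how the capacity and CONTEXT attributes relate, here is a minimal Python sketch that extracts `KEY="VALUE"` pairs from template text and checks that ONEAPP_RAY_CHATBOT_CPUS matches VCPU. The regex is a simplification that only handles flat, double-quoted values like those in the example above; it is not a full OpenNebula template parser, and the function names are hypothetical.

```python
import re

# Matches KEY="VALUE" pairs; nesting such as CONTEXT=[ ... ] is ignored,
# so attributes inside and outside CONTEXT end up in one flat dict.
ATTR_RE = re.compile(r'([A-Z0-9_]+)\s*=\s*"([^"]*)"')


def parse_attrs(template_text: str) -> dict:
    """Return a flat dict of every KEY="VALUE" pair in the template text."""
    return dict(ATTR_RE.findall(template_text))


def ray_settings(attrs: dict) -> dict:
    """Keep only the Ray appliance attributes (ONEAPP_RAY_*)."""
    return {k: v for k, v in attrs.items() if k.startswith("ONEAPP_RAY_")}


example = '''
MEMORY="8192"
VCPU="4"
CONTEXT=[
  ONEAPP_RAY_API_PORT="8000",
  ONEAPP_RAY_CHATBOT_CPUS="4",
  ONEAPP_RAY_MODEL_ID="Qwen/Qwen2.5-1.5B-Instruct",
  ONEAPP_RAY_MODEL_TEMPERATURE="0.1" ]
'''

attrs = parse_attrs(example)
settings = ray_settings(attrs)
# In this example, the CPUs given to the chatbot equal the VM's virtual CPUs.
print(settings["ONEAPP_RAY_CHATBOT_CPUS"] == attrs["VCPU"])
```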
4. Deploy the Application: Deployment may take several minutes while the appliance downloads the model and the required dependencies (e.g., PyTorch and FastAPI). You can monitor its progress by logging into the VM:
- Access the VM via SSH:
$ onevm ssh 71
Warning: Permanently added '172.20.0.3' (ED25519) to the list of known hosts.
Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-127-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information as of Thu Jan 2 12:01:28 UTC 2025

  System load:  0.16               Processes:             130
  Usage of /:   10.5% of 96.73GB   Users logged in:       0
  Memory usage: 89%                IPv4 address for eth0: 172.20.0.3
  Swap usage:   0%

Expanded Security Maintenance for Applications is not enabled.

8 updates can be applied immediately.
To see these additional updates run: apt list --upgradable

Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status

    ___   _ __    ___
   / _ \ | '_ \  / _ \   OpenNebula Service Appliance
  | (_) || | | ||  __/
   \___/ |_| |_| \___|

 All set and ready to serve 8)
- Verify the Ray Cluster Status:
root@chatbot-71:~# ray status
======== Autoscaler status: 2025-01-02 12:01:36.792794 ========
Node status
---------------------------------------------------------------
Active:
 1 node_4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 4.0/4.0 CPU
 0B/4.42GiB memory
 0B/2.21GiB object_store_memory

Demands:
 (no resource demands)
- Confirm the Application Deployment:
root@chatbot-71:~# serve status
proxies:
  4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186: HEALTHY
applications:
  app1:
    status: RUNNING
    message: ''
    last_deployed_time_s: 1735817946.9661372
    deployments:
      ChatBot:
        status: HEALTHY
        status_trigger: CONFIG_UPDATE_COMPLETED
        replica_states:
          RUNNING: 1
        message: ''
target_capacity: null
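Because the model download can take several minutes, you may prefer to poll until the application is up instead of re-running the commands by hand. The following Python sketch, meant to run inside the VM, calls `serve status` and checks that `app1` is RUNNING and its `ChatBot` deployment is HEALTHY, using the names from the `serve status` output above. It matches the text with regular expressions rather than a YAML parser to stay dependency-free, so it assumes that output layout; the function names and timeout values are illustrative assumptions.

```python
import re
import subprocess
import time

# Names taken from the `serve status` output shown above; \s* also spans
# newlines, so the patterns work on multi-line YAML output.
APP_RUNNING = re.compile(r"app1:\s*status:\s*RUNNING")
CHATBOT_HEALTHY = re.compile(r"ChatBot:\s*status:\s*HEALTHY")


def is_ready(status_text: str) -> bool:
    """True when app1 is RUNNING and its ChatBot deployment is HEALTHY."""
    return bool(APP_RUNNING.search(status_text) and CHATBOT_HEALTHY.search(status_text))


def wait_until_ready(timeout_s: float = 1800, interval_s: float = 15) -> bool:
    """Poll `serve status` until the chatbot is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(["serve", "status"], capture_output=True, text=True)
        if result.returncode == 0 and is_ready(result.stdout):
            return True
        time.sleep(interval_s)
    return False
```

Calling `wait_until_ready()` from a script inside the VM returns `True` once both conditions hold, or `False` after the timeout.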
5. Test the Inference Endpoint: If one-gate is enabled in your OpenNebula installation, the inference endpoint URL should be added to the VM information. Alternatively, you can use the VM's IP address and port 8000 (ONEAPP_RAY_API_PORT):
$ onevm show 71 | grep 'RAY.*URL'
ONEAPP_RAY_CHATBOT_URL="http://172.20.0.3:8000/chat"
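If you are scripting the test, a small helper can read the endpoint from the `onevm show` output and fall back to building it from the VM's IP address and the API port. This is a sketch: the port (8000) and the `/chat` path come from the example output above, while the function name `chatbot_url` and the fallback logic are assumptions.

```python
import re
from typing import Optional

URL_RE = re.compile(r'ONEAPP_RAY_CHATBOT_URL="([^"]+)"')


def chatbot_url(onevm_show_output: str, ip: Optional[str] = None, port: int = 8000) -> str:
    """Return the URL published in the VM information, or build one from IP and port."""
    match = URL_RE.search(onevm_show_output)
    if match:
        return match.group(1)
    if ip is None:
        raise ValueError("no ONEAPP_RAY_CHATBOT_URL found and no IP address given")
    # Fallback assumption: same /chat path as the published URL in the example.
    return f"http://{ip}:{port}/chat"
```

The input can be captured with `subprocess.run(["onevm", "show", "71"], capture_output=True, text=True).stdout`.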
A simple client.py script is available for testing the default application included in the appliance:
$ python3 ./client.py http://172.20.0.3:8000/chat
Chat interface started. Type 'exit' to quit.
You: Hello
Server: Hello! How can I assist you today?
You: What is Cloud Computing?
Server: Cloud computing refers to the delivery of computing services over the internet, such as storage, servers, databases
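For reference, here is a minimal standard-library sketch of a client in the style of the session above. It is not the appliance's client.py: the request format is an assumption (a JSON object with the message under a hypothetical `text` field, with the reply returned as the response body), so check the bundled client.py for the actual format before relying on it. The names `ask` and `chat_loop` are hypothetical.

```python
import json
import urllib.request


def ask(url: str, message: str, field: str = "text", timeout: float = 60.0) -> str:
    """Send one chat message and return the raw response body as text.

    ASSUMPTION: the endpoint accepts a JSON object with the message under
    `field` ("text" is a hypothetical name); adapt it to the format used
    by the appliance's client.py.
    """
    body = json.dumps({field: message}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def chat_loop(url: str) -> None:
    """Minimal interactive loop in the style of the session above."""
    print("Chat interface started. Type 'exit' to quit.")
    while True:
        message = input("You: ")
        if message.strip().lower() == "exit":
            break
        print("Server:", ask(url, message))
```

Run `chat_loop("http://172.20.0.3:8000/chat")` against the URL reported for your VM.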