The Ray appliance comes pre-installed with the Ray framework and its Ray Serve library, enabling the deployment of inference APIs. This appliance simplifies the deployment of model-serving applications and integrates seamlessly with models available on Hugging Face (a Hugging Face account and token may be required for certain models).
The appliance's behavior and configuration are controlled by contextualization parameters specified in the VM template's Context Section. Below are the primary configurable aspects:
A simple model-serving application is included with the Ray appliance for testing purposes. See the config.rb file for details. The application deployment can be controlled using the following parameters:
Parameter | Default | Description |
---|---|---|
ONEAPP_RAY_APPLICATION_URL | - | URL to download the Python application. |
ONEAPP_RAY_APPLICATION_FILE64 | - | Python application to be deployed in the Ray framework (base64 encoded). |
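For example, a small application file can be passed inline via ONEAPP_RAY_APPLICATION_FILE64 by base64-encoding it. A minimal sketch (app.py and its contents are placeholders for a real Serve application):

```shell
# Write a placeholder application file (stand-in for a real Serve app)
echo 'print("hello from the Ray appliance")' > app.py

# Base64-encode it for the ONEAPP_RAY_APPLICATION_FILE64 context parameter
# (-w0 disables line wrapping; this flag is GNU coreutils specific)
base64 -w0 app.py
```

The resulting single-line string is what gets set as the parameter value in the VM template's Context Section.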
Parameter | Default | Description |
---|---|---|
ONEAPP_RAY_API_PORT | 8000 | Port number for the API endpoint. |
ONEAPP_RAY_API_ROUTE | "/chat" | Route path for the REST API exposed by the Ray application. |
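With the defaults above, the API endpoint is reachable at a URL of the following shape (the VM address below is a placeholder, and the request payload depends entirely on the deployed application):

```shell
# Assumed values: VM_IP is a placeholder; port and route are the defaults above
VM_IP=203.0.113.10
API_PORT=8000
API_ROUTE=/chat

# Endpoint URL built from the context parameters
echo "http://${VM_IP}:${API_PORT}${API_ROUTE}"
# A request could then be sent with, e.g.:
#   curl -X POST "http://${VM_IP}:${API_PORT}${API_ROUTE}" -d '...'
```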
Parameter | Default | Description |
---|---|---|
ONEAPP_RAY_MODEL_ID | meta-llama/Llama-3.2-1B-Instruct | Specifies the AI model(s) used for inference. |
ONEAPP_RAY_MODEL_TEMPERATURE | 0.1 | Controls the randomness of the generated text via the sampling temperature. |
ONEAPP_RAY_MODEL_TOKEN | - | Authentication token required to access the specified AI model. |
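In the VM template, these parameters are set as attributes of the Context Section. A sketch (the token value is a placeholder):

```
CONTEXT = [
  ONEAPP_RAY_MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct",
  ONEAPP_RAY_MODEL_TEMPERATURE = "0.1",
  ONEAPP_RAY_MODEL_TOKEN = "<your-hugging-face-token>" ]
```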
To achieve full control over the application setup, you can provide a configuration file for the Ray Serve application; refer to the Ray Serve documentation for a detailed description. Use the following parameter to configure this:
Parameter | Default | Description |
---|---|---|
ONEAPP_RAY_CONFIG_FILE64 | - | Base64-encoded configuration file for the Serve application. |
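As an illustration, a minimal Serve configuration file might look like the following before being base64-encoded (the application name and import path are placeholders; see the Ray Serve documentation for the full schema):

```yaml
http_options:
  host: 0.0.0.0
  port: 8000

applications:
  - name: app1
    route_prefix: /chat
    import_path: app:app   # placeholder "module:deployment" path
```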
The appliance is designed to utilize all available CPU and GPU resources in the VM by default. However, GPU drivers are not pre-installed. To use GPUs, the appropriate drivers must be installed. GPUs can be added to the VM using:
- PCI Passthrough
- SR-IOV vGPUs
Some configurations may require downloading proprietary drivers and configuring associated licenses. Note: When using NVIDIA cards, select a profile that supports OpenCL and CUDA applications (e.g., Q-series vGPU types).
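As a sketch, PCI passthrough of a GPU can be requested by adding a PCI attribute to the VM template; the values below match NVIDIA (vendor 10de) 3D-controller devices and must be adjusted for your hardware:

```
PCI = [
  VENDOR = "10de",
  CLASS  = "0302" ]
```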
After deployment, the application should utilize the GPU resources, as can be verified using nvidia-smi:
```
root@ray-app-28245:~# nvidia-smi
Tue Dec 31 15:28:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-24Q                 On  | 00000000:01:01.0 Off |                    0 |
| N/A  N/A    P8             N/A /  N/A   |   6259MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0  N/A  N/A       2286      C   ray::ServeReplica:app1:ChatBot             6257MiB |
+---------------------------------------------------------------------------------------+
```