ray_feature

Features and Usage

The Ray appliance comes pre-installed with the Ray framework and its Ray Serve library, enabling the deployment of inference APIs. This appliance simplifies the deployment of model-serving applications and integrates with models available on Hugging Face (a Hugging Face account and token may be required for certain models).

Contextualization

The appliance's behavior and configuration are controlled by contextualization parameters specified in the VM template's Context Section. Below are the primary configurable aspects:

Ray Application

A simple model-serving application is included with the Ray appliance for testing purposes. See the config.rb file for details. The application deployment can be controlled using the following parameters:

Parameter	Default	Description
`ONEAPP_RAY_APPLICATION_URL`	-	URL to download the Python application.
`ONEAPP_RAY_APPLICATION_FILE64`	-	Python application to be deployed in the Ray framework (base64 encoded).
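
As a minimal sketch of how a value for `ONEAPP_RAY_APPLICATION_FILE64` could be prepared, the snippet below base64-encodes a Python file into a single line of text. The file name `app.py` and its one-line content are placeholders for illustration, not the application bundled with the appliance.

```python
import base64
import tempfile
from pathlib import Path


def encode_file_for_context(path):
    """Return the file content as single-line base64 text."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Placeholder application written to a temporary directory for the demo.
# In practice, point encode_file_for_context() at your own Ray Serve script.
with tempfile.TemporaryDirectory() as workdir:
    app_path = Path(workdir) / "app.py"
    app_path.write_text("print('placeholder Ray application')\n", encoding="utf-8")
    encoded = encode_file_for_context(app_path)

# Line that could be pasted into the Context Section of the VM template.
print(f'ONEAPP_RAY_APPLICATION_FILE64="{encoded}"')
```

Decoding the value with `base64.b64decode` should return the original file byte for byte, which is a quick way to check the encoding before instantiating the VM.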

API Endpoint

Parameter	Default	Description
`ONEAPP_RAY_API_PORT`	8000	Port number for the API endpoint.
`ONEAPP_RAY_API_ROUTE`	"/chat"	Route path for the REST API exposed by the Ray application.
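
The two parameters above determine the URL a client has to call. The following sketch builds that URL from the default port and route and prepares a POST request with the standard library. The host address is a documentation placeholder, and the JSON payload shape (`{"text": ...}`) is an assumption: the request format depends on the deployed application, so check the application source before relying on it.

```python
import json
import urllib.request


def build_endpoint(host, port=8000, route="/chat"):
    """Combine host, ONEAPP_RAY_API_PORT and ONEAPP_RAY_API_ROUTE into a URL."""
    route = "/" + route.strip("/")
    return f"http://{host}:{port}{route}"


def build_request(url, payload):
    """Prepare (but do not send) a JSON POST request."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# 203.0.113.10 is a placeholder address; use the IP of your VM instead.
url = build_endpoint("203.0.113.10")
# Assumed payload shape, not confirmed by this page.
request = build_request(url, {"text": "Hello"})
print(request.get_method(), request.full_url)

# To actually send it once the appliance is running:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(response.read().decode("utf-8"))
```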

Application Model

Parameter	Default	Description
`ONEAPP_RAY_MODEL_ID`	meta-llama/Llama-3.2-1B-Instruct	Specifies the AI model(s) used for inference.
`ONEAPP_RAY_MODEL_TEMPERATURE`	0.1	Controls the randomness of generated text by adjusting the temperature setting.
`ONEAPP_RAY_MODEL_TOKEN`	-	Provides the authentication token required to access the specified AI model.
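
To show how the parameters from the tables above fit together, this sketch renders them as a `CONTEXT` section in OpenNebula template syntax. The helper `render_context` is hypothetical, the values are the documented defaults, and the token is a placeholder that you would replace with your own Hugging Face token.

```python
def render_context(params):
    """Render a dict of context parameters as an OpenNebula CONTEXT section."""
    lines = []
    for key, value in params.items():
        text = str(value)
        # Keep the sketch simple: refuse values that would need escaping.
        if '"' in text:
            raise ValueError(f"{key} contains a double quote")
        lines.append(f'  {key} = "{text}"')
    return "CONTEXT = [\n" + ",\n".join(lines) + "\n]"


params = {
    "ONEAPP_RAY_API_PORT": 8000,
    "ONEAPP_RAY_API_ROUTE": "/chat",
    "ONEAPP_RAY_MODEL_ID": "meta-llama/Llama-3.2-1B-Instruct",
    "ONEAPP_RAY_MODEL_TEMPERATURE": 0.1,
    "ONEAPP_RAY_MODEL_TOKEN": "<your-hugging-face-token>",  # placeholder
}

context_text = render_context(params)
print(context_text)
```

The printed block can be merged into the VM template. Any other context attributes your template already defines would need to be kept alongside these.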

Configuration Files

For full control over the application setup, you can provide a configuration file for the Ray Serve application. Refer to the Ray Serve documentation for a detailed description of the file format. Use the following parameter to supply it:

Parameter	Default	Description
`ONEAPP_RAY_CONFIG_FILE64`	-	Base64-encoded configuration file for the Serve application.
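
As an illustration, the sketch below holds a small Serve configuration as YAML text and encodes it for `ONEAPP_RAY_CONFIG_FILE64`. The field names follow the Ray Serve config file schema. The application name `app1` and deployment name `ChatBot` are taken from the process name in the nvidia-smi output in the GPU section, while the `import_path` value and the resource settings are assumptions made for the example.

```python
import base64

# Serve config sketch. import_path is hypothetical: it must point to the
# module and variable that hold your own Serve application.
SERVE_CONFIG = """\
http_options:
  host: 0.0.0.0
  port: 8000
applications:
  - name: app1
    route_prefix: /chat
    import_path: my_app:app
    deployments:
      - name: ChatBot
        num_replicas: 1
        ray_actor_options:
          num_gpus: 1
"""

config_b64 = base64.b64encode(SERVE_CONFIG.encode("utf-8")).decode("ascii")

# Round-trip check: decoding must give back the exact YAML text.
decoded = base64.b64decode(config_b64).decode("utf-8")

print(f'ONEAPP_RAY_CONFIG_FILE64="{config_b64}"')
```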

Using GPUs

The appliance is designed to utilize all available CPU and GPU resources in the VM by default. However, GPU drivers are not pre-installed. To use GPUs, the appropriate drivers must be installed. GPUs can be added to the VM using:

PCI Passthrough
SR-IOV vGPUs

Some configurations may require downloading proprietary drivers and configuring associated licenses. Note: When using NVIDIA cards, select a profile that supports OpenCL and CUDA applications (e.g., Q-series vGPU types).

After deployment, the application should use the GPU resources. You can verify this with nvidia-smi, which should list the Ray Serve replica among the GPU processes:

root@ray-app-28245:~# nvidia-smi
Tue Dec 31 15:28:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-24Q                 On  | 00000000:01:01.0 Off |                    0 |
| N/A   N/A    P8              N/A /  N/A |   6259MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2286      C   ray::ServeReplica:app1:ChatBot             6257MiB |
+---------------------------------------------------------------------------------------+
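
The same check can be scripted. This sketch parses the CSV output of `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits` and keeps the processes whose name starts with `ray::`. The sample line mirrors the process shown above; the helper names are illustrative, and on some vGPU setups the memory column may not be numeric, which the parser maps to `None`.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' lines into dictionaries."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split from both ends so commas inside the process name survive.
        pid, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        try:
            memory_mib = int(memory.strip())
        except ValueError:
            memory_mib = None  # e.g. a non-numeric placeholder value
        rows.append(
            {"pid": int(pid.strip()), "name": name.strip(), "memory_mib": memory_mib}
        )
    return rows


def ray_processes_on_gpu():
    """Run nvidia-smi inside the VM and return the Ray processes using the GPU."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return [row for row in parse_compute_apps(output) if row["name"].startswith("ray::")]


# Offline demo with a line equivalent to the output shown above.
sample = "2286, ray::ServeReplica:app1:ChatBot, 6257\n"
parsed = parse_compute_apps(sample)
print(parsed)
```

Inside the VM, calling `ray_processes_on_gpu()` and getting an empty list would suggest that the replica is not running on the GPU, for instance because the drivers are missing.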
