
ray_feature

Ruben S. Montero edited this page Jan 2, 2025 · 4 revisions

Features and Usage

The Ray appliance comes pre-installed with the Ray framework and its Ray Serve library, enabling the deployment of inference APIs. It simplifies the deployment of model-serving applications and works with models available on Hugging Face (a Hugging Face account and token may be required for certain models).

Contextualization

The appliance's behavior and configuration are controlled by contextualization parameters specified in the VM template's Context Section. Below are the primary configurable aspects:

Ray Application

A simple model-serving application is included with the Ray appliance for testing purposes. See the config.rb file for details. The application deployment can be controlled using the following parameters:

| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_APPLICATION_URL | - | URL to download the Python application. |
| ONEAPP_RAY_APPLICATION_FILE64 | - | Python application to be deployed in the Ray framework (base64 encoded). |
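
Since ONEAPP_RAY_APPLICATION_FILE64 expects base64 text, the following is a minimal sketch of how a Python source file could be encoded before pasting it into the VM template. The `APP_SOURCE` string and the `encode_for_context` helper are hypothetical placeholders introduced for illustration; the structure the appliance actually expects from the application is not specified here.

```python
import base64

# Hypothetical stand-in for the Python application you want to deploy.
APP_SOURCE = '''\
from ray import serve


@serve.deployment
class Echo:
    async def __call__(self, request):
        return "ok"


app = Echo.bind()
'''


def encode_for_context(text: str) -> str:
    """Return the base64 form of `text` as a single ASCII line."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


encoded_app = encode_for_context(APP_SOURCE)

# Round-trip check: decoding must give back the original source.
assert base64.b64decode(encoded_app).decode("utf-8") == APP_SOURCE
```

The resulting `encoded_app` string is what would be assigned to ONEAPP_RAY_APPLICATION_FILE64.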

API Endpoint

| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_API_PORT | 8000 | Port number for the API endpoint. |
| ONEAPP_RAY_API_ROUTE | "/chat" | Route path for the REST API exposed by the Ray application. |
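
To illustrate how the port and route combine into an endpoint, here is a hedged client sketch using only the Python standard library. Several details are assumptions not confirmed by this page: that the endpoint is served over plain HTTP, that it accepts a POST with a JSON body, and that the body carries a `prompt` field. The address `192.0.2.10` is a documentation placeholder, and both helper functions are hypothetical.

```python
import json
import urllib.request


def endpoint_url(host: str, port: int = 8000, route: str = "/chat") -> str:
    """Build the endpoint URL from the API port and route (defaults from the table)."""
    return f"http://{host}:{port}/{route.lstrip('/')}"


def build_request(url: str, prompt: str) -> urllib.request.Request:
    """Prepare a POST request without sending it."""
    # ASSUMPTION: a JSON body with a "prompt" field. Check what your
    # application actually expects before relying on this shape.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(endpoint_url("192.0.2.10"), "Hello")

# To send it against a running appliance:
#     with urllib.request.urlopen(req, timeout=60) as resp:
#         print(resp.read().decode("utf-8"))
```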

Application Model

| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_MODEL_ID | meta-llama/Llama-3.2-1B-Instruct | Specifies the AI model(s) used for inference. |
| ONEAPP_RAY_MODEL_TEMPERATURE | 0.1 | Controls the randomness of generated text by adjusting the temperature setting. |
| ONEAPP_RAY_MODEL_TOKEN | - | Provides the authentication token required to access the specified AI model. |
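
As an illustration of how these parameters might end up in a VM template, the sketch below renders a CONTEXT section from a Python dictionary. The `render_context` helper is hypothetical, the token value is a placeholder, and the values shown are simply the defaults listed above; a real template typically carries additional context attributes that are omitted here.

```python
def render_context(params: dict) -> str:
    """Render a CONTEXT section in OpenNebula template syntax."""
    lines = []
    for key, value in params.items():
        # Double quotes inside a value are escaped with a backslash.
        escaped = str(value).replace('"', '\\"')
        lines.append(f'  {key} = "{escaped}"')
    return "CONTEXT = [\n" + ",\n".join(lines) + "\n]"


context = render_context({
    "ONEAPP_RAY_API_PORT": 8000,
    "ONEAPP_RAY_API_ROUTE": "/chat",
    "ONEAPP_RAY_MODEL_ID": "meta-llama/Llama-3.2-1B-Instruct",
    "ONEAPP_RAY_MODEL_TEMPERATURE": 0.1,
    "ONEAPP_RAY_MODEL_TOKEN": "<your-hugging-face-token>",  # placeholder
})
print(context)
```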

Configuration Files

For full control over the application setup, you can provide a configuration file for the Ray Serve application. Refer to the Ray Serve documentation for a detailed description of the file format. Use the following parameter to supply it:

| Parameter | Default | Description |
|---|---|---|
| ONEAPP_RAY_CONFIG_FILE64 | - | Base64-encoded configuration file for the Serve application. |
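
For orientation, the sketch below builds a small Serve configuration in the YAML layout described in the Ray Serve documentation and encodes it for ONEAPP_RAY_CONFIG_FILE64. The application and deployment names mirror the process name that appears in the nvidia-smi output in the Using GPUs section (`app1`, `ChatBot`); the `import_path` value, the replica count, and the GPU request are assumptions chosen for illustration, not the appliance's actual settings.

```python
import base64

# Illustrative Serve config. Adjust import_path to the module and
# variable that hold your bound application (hypothetical here).
SERVE_CONFIG = """\
applications:
  - name: app1
    route_prefix: /chat
    import_path: chatbot:app
    deployments:
      - name: ChatBot
        num_replicas: 1
        ray_actor_options:
          num_gpus: 1
"""

# Encode the YAML text as a single base64 line for the context parameter.
encoded_config = base64.b64encode(SERVE_CONFIG.encode("utf-8")).decode("ascii")

# Sanity check: the encoded value decodes back to the same YAML text.
decoded = base64.b64decode(encoded_config).decode("utf-8")
assert decoded == SERVE_CONFIG
```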

Using GPUs

The appliance is designed to utilize all available CPU and GPU resources in the VM by default. However, GPU drivers are not pre-installed. To use GPUs, the appropriate drivers must be installed. GPUs can be added to the VM using:

  • PCI Passthrough
  • SR-IOV vGPUs

Some configurations may require downloading proprietary drivers and configuring associated licenses. Note: When using NVIDIA cards, select a profile that supports OpenCL and CUDA applications (e.g., Q-series vGPU types).

After deployment, the application should use the GPU resources. You can verify this with nvidia-smi, which lists the Ray Serve replica among the GPU processes:

root@ray-app-28245:~# nvidia-smi
Tue Dec 31 15:28:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-24Q                 On  | 00000000:01:01.0 Off |                    0 |
| N/A   N/A    P8              N/A /  N/A |   6259MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2286      C   ray::ServeReplica:app1:ChatBot             6257MiB |
+---------------------------------------------------------------------------------------+
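
The same check can be scripted. The sketch below is one possible approach, assuming nvidia-smi is on the PATH inside the VM: it asks nvidia-smi for CSV output and parses the memory figures. The helper names are hypothetical, and the sample line is hand-written to match the values shown above rather than captured from a real run.

```python
import subprocess

# nvidia-smi query mode: one CSV line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse 'name, used, total' CSV lines into dictionaries (MiB values)."""
    gpus = []
    for line in text.strip().splitlines():
        # Split from the right so the two numeric fields are always last.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi and return parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


# Hand-written sample matching the figures in the output above.
sample = "NVIDIA A10-24Q, 6259, 24576\n"
gpus = parse_gpu_csv(sample)
```

A non-zero `used_mib` after the application starts suggests the model has been loaded onto the GPU.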