ray_quick

Ruben S. Montero edited this page Jan 2, 2025 · 7 revisions

Quick Start

The Ray appliance includes a built-in chat application that can be easily deployed using a pre-trained model. This guide shows how to deploy and serve this application:

  1. Download the Appliance. Retrieve the Ray appliance from the OpenNebula Marketplace using the following command:

    $ onemarketapp export 'Service Ray' Ray --datastore default
  2. (Optional) Configure the Ray VM Template. Depending on your application requirements, you may need to modify the VM template to adjust resources such as vCPU or MEMORY, or to add GPU cards for enhanced model serving capabilities.
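For instance, a template tweak for a larger model might add capacity and a GPU via PCI passthrough. The fragment below is a sketch: the VENDOR and CLASS values are placeholders, and you should use the identifiers reported by `onehost show` for the actual GPU on your host:

    VCPU   = "8"
    MEMORY = "16384"
    PCI = [
      VENDOR = "10de",   # example value: NVIDIA
      CLASS  = "0302" ]  # example value: 3D controller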

  3. Instantiate the Template. Upon instantiation, you will be prompted for model-specific parameters, such as the model ID and temperature, and for your Hugging Face token if the model requires one. For example, deploying the Qwen/Qwen2.5-1.5B-Instruct model results in the following CONTEXT and capacity attributes:

    MEMORY="8192"
    VCPU="4"
    ...
    CONTEXT=[
      DISK_ID="1",
      ETH0_DNS="172.20.0.1",
      ...
      ONEAPP_RAY_API_PORT="8000",
      ONEAPP_RAY_CHATBOT_CPUS="4",
      ONEAPP_RAY_MODEL_ID="Qwen/Qwen2.5-1.5B-Instruct",
      ONEAPP_RAY_MODEL_TEMPERATURE="0.1",
      ONEAPP_RAY_MODEL_TOKEN="hf_rbmflxx*************",
      ...
    ]

    Note: The number of CPUs allocated to the application is automatically derived from the available virtual CPUs.

  4. Deploy the Application. Deployment may take several minutes while the model and its dependencies (e.g., PyTorch and FastAPI) are downloaded. You can monitor progress by logging into the VM:

    • Access the VM via SSH:
    $ onevm ssh 71
    Warning: Permanently added '172.20.0.3' (ED25519) to the list of known hosts.
    Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-127-generic x86_64)
    
    * Documentation:  https://help.ubuntu.com
    * Management:     https://landscape.canonical.com
    * Support:        https://ubuntu.com/pro
    
    System information as of Thu Jan  2 12:01:28 UTC 2025
    
    System load:  0.16               Processes:             130
    Usage of /:   10.5% of 96.73GB   Users logged in:       0
    Memory usage: 89%                IPv4 address for eth0: 172.20.0.3
    Swap usage:   0%
    
    Expanded Security Maintenance for Applications is not enabled.
    
    8 updates can be applied immediately.
    To see these additional updates run: apt list --upgradable
    
    Enable ESM Apps to receive additional future security updates.
    See https://ubuntu.com/esm or run: sudo pro status
          ___   _ __    ___
         / _ \ | '_ \  / _ \   OpenNebula Service Appliance
        | (_) || | | ||  __/
         \___/ |_| |_| \___|
    
     All set and ready to serve 8)
    • Verify the Ray Cluster Status:
    root@chatbot-71:~# ray status
    ======== Autoscaler status: 2025-01-02 12:01:36.792794 ========
    Node status
    ---------------------------------------------------------------
    Active:
     1 node_4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186
    Pending:
     (no pending nodes)
    Recent failures:
     (no failures)
    
    Resources
    ---------------------------------------------------------------
    Usage:
      4.0/4.0 CPU
      0B/4.42GiB memory
      0B/2.21GiB object_store_memory
    Demands:
      (no resource demands)
    • Confirm the Application Deployment:
    root@chatbot-71:~# serve status
    proxies:
      4980ccc4dd76317acd4a9bab9f72a2507387f3bb10902bebc91de186: HEALTHY
    applications:
      app1:
        status: RUNNING
        message: ''
        last_deployed_time_s: 1735817946.9661372
        deployments:
          ChatBot:
            status: HEALTHY
            status_trigger: CONFIG_UPDATE_COMPLETED
             replica_states:
               RUNNING: 1
             message: ''
     target_capacity: null
  5. Test the Inference Endpoint. If OneGate is enabled in your OpenNebula installation, the inference endpoint URL is added to the VM information. Alternatively, you can use the VM's IP address and port 8000:

$ onevm show 71 | grep RAY.*URL
ONEAPP_RAY_CHATBOT_URL="http://172.20.0.3:8000/chat"

A simple client.py script is available for testing the default application included in the appliance:

$ python3 ./client.py http://172.20.0.3:8000/chat
Chat interface started. Type 'exit' to quit.
You: Hello
Server: Hello! How can I assist you today?
You: What is Cloud Computing?
Server: Cloud computing refers to the delivery of computing services over the internet, such as storage, servers, databases
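The bundled client.py is not reproduced here, but a minimal equivalent can be sketched with the Python standard library alone. The JSON field name ("text") and the plain-text response body are assumptions about the default application's request format, not a documented API:

```python
# Minimal chat client sketch (assumes the endpoint accepts
# {"text": "..."} as JSON and replies with plain text).
import json
import sys
import urllib.request


def build_request(url: str, message: str) -> urllib.request.Request:
    """Package a chat message as a JSON POST request."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )


def chat(url: str) -> None:
    """Simple read-eval loop against the inference endpoint."""
    print("Chat interface started. Type 'exit' to quit.")
    while True:
        message = input("You: ")
        if message == "exit":
            break
        with urllib.request.urlopen(build_request(url, message)) as resp:
            print("Server:", resp.read().decode("utf-8"))


if __name__ == "__main__" and len(sys.argv) > 1:
    chat(sys.argv[1])
```

Run it with the endpoint URL as the only argument, e.g. `python3 client_sketch.py http://172.20.0.3:8000/chat`.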