Documentation, code, application and tutorial material for cloud-HPC integration with initial LLM use in OSS #51

Open
wants to merge 6 commits into
base: main
from

Conversation

K123AsJ0k1
Collaborator

@K123AsJ0k1 K123AsJ0k1 commented Jan 27, 2025

As requested, this PR merges into main the cloud-HPC integration changes developed in the https://github.com/K123AsJ0k1/cloud-hpc-oss-mlops-platform fork. The changes are mostly additive; the exceptions are updated deployment images and a suggested new setup script. The main changes are:

  • NVIDIA GPU support with GPU operator
  • Sharing 1 GPU between 10 pods via nvshare
  • Ray GPU configuration
  • LLM-related inference and storage deployments
  • RAG Ray code with parallelism using multiple CPU workers and a GPU actor
  • Cloud-HPC integration documentation, applications and tutorial
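For reviewers unfamiliar with the "multiple CPU workers and a GPU actor" layout, the following is a minimal sketch of the pattern, not the code in this PR. It assumes a thread pool from Python's standard library as a stand-in for Ray so that it runs without a cluster or a GPU; in Ray, the worker function would typically be a `@ray.remote` task and the embedder a class declared with `@ray.remote(num_gpus=1)`. The names `chunk_document`, `EmbedderActor` and `run_pipeline`, and the placeholder "embedding", are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def chunk_document(doc: str, size: int = 5) -> list[str]:
    """CPU-side work: split a document into chunks of `size` words."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class EmbedderActor:
    """Stand-in for a single GPU actor: one instance, one call at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.batches_seen = 0

    def embed(self, chunks: list[str]) -> list[list[float]]:
        # The lock mimics an actor serializing access to the one shared device.
        with self._lock:
            self.batches_seen += 1
            # Placeholder "embedding": [character count, word count] per chunk.
            return [[float(len(c)), float(len(c.split()))] for c in chunks]


def run_pipeline(docs: list[str], num_cpu_workers: int = 4):
    """Fan chunking out to CPU workers, then funnel batches to one actor."""
    actor = EmbedderActor()
    with ThreadPoolExecutor(max_workers=num_cpu_workers) as pool:
        # pool.map preserves input order, so chunked[i] belongs to docs[i].
        chunked = list(pool.map(chunk_document, docs))
    embeddings = [actor.embed(chunks) for chunks in chunked]
    return chunked, embeddings, actor.batches_seen
```

The point of the split is that the cheap, embarrassingly parallel preprocessing scales with the number of CPU workers, while the model stays loaded in a single long-lived object that owns the GPU.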

These changes were developed and tested on an Ubuntu 22.04 VM running in CSC CPouta.

@K123AsJ0k1 K123AsJ0k1 added the enhancement New feature or request label Jan 27, 2025
@K123AsJ0k1 K123AsJ0k1 self-assigned this Jan 27, 2025