
C, C++, and Python API for Adapters (Multi-LoRA and others) #911

Merged: @baijumeswani merged 8 commits into main from baijumeswani/multi-lora on Oct 16, 2024

@baijumeswani (Contributor) commented on Sep 23, 2024

This pull request introduces a way to apply adapters to a model with onnxruntime-genai. It builds on the onnxruntime API changes from microsoft/onnxruntime#22046 and lets users load, apply, and unload adapters as needed.
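To make the load, apply, and unload lifecycle concrete, here is a minimal Python sketch of the bookkeeping it implies. It assumes that each generator using an adapter holds one reference to it and that unloading is refused while any reference remains, which matches the unload note in the C++ example. `AdapterRegistry` and its methods are hypothetical names for illustration only; they are not part of the onnxruntime-genai API.

```python
from collections import Counter


class AdapterRegistry:
    """Hypothetical model of the adapter lifecycle (not onnxruntime-genai code).

    Adapters are keyed by a user-supplied identifier; the registry counts how
    many generators currently have each adapter active.
    """

    def __init__(self):
        self._paths = {}          # identifier -> adapter file path
        self._active = Counter()  # identifier -> number of generators using it

    def load(self, path, name):
        # Identifiers must be unique, since they are how adapters are selected.
        if name in self._paths:
            raise ValueError(f"adapter {name!r} is already loaded")
        self._paths[name] = path

    def activate(self, name):
        # A generator selecting an adapter takes a reference to it.
        if name not in self._paths:
            raise KeyError(f"adapter {name!r} is not loaded")
        self._active[name] += 1

    def release(self, name):
        # Called when a generator that used the adapter goes away.
        if self._active[name] == 0:
            raise RuntimeError(f"adapter {name!r} is not active")
        self._active[name] -= 1

    def unload(self, name):
        # Unloading is refused while any generator still references the adapter.
        if self._active[name] > 0:
            raise RuntimeError(f"adapter {name!r} is still active")
        del self._paths[name]

    def loaded(self):
        return sorted(self._paths)


# Usage: the path below is a hypothetical placeholder.
registry = AdapterRegistry()
registry.load("/abs/path/adapter_a", "a")
registry.activate("a")
try:
    registry.unload("a")
except RuntimeError as err:
    print(err)  # adapter 'a' is still active
registry.release("a")
registry.unload("a")
print(registry.loaded())  # []
```

Counting references per identifier, rather than keeping a single "active" flag, lets several generators share one loaded adapter, which is the situation a multi-LoRA setup is meant to support.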

This pull request adds the C, C++, and Python APIs. Other language bindings will follow soon.

Example usage:

C++

```cpp
#include "ort_genai.h"

int main() {
  auto model = OgaModel::Create("<model/path>");
  auto adapters = OgaAdapters::Create(*model);
  adapters->LoadAdapter("</absolute/adapter/file/path>", "<unique_user_supplied_adapter_identifier>");

  auto tokenizer = OgaTokenizer::Create(*model);

  const char* input_strings[] = {
      "This is a test.",
      "Rats are awesome pets!",
      "The quick brown fox jumps over the lazy dog.",
  };

  auto input_sequences = OgaSequences::Create();
  for (const char* input_string : input_strings)
    tokenizer->Encode(input_string, *input_sequences);

  {
    auto params = OgaGeneratorParams::Create(*model);
    params->SetSearchOption("max_length", 20);
    params->SetInputSequences(*input_sequences);

    auto generator = OgaGenerator::Create(*model, *params);
    generator->SetActiveAdapter(*adapters, "<unique_user_supplied_adapter_identifier>");

    while (!generator->IsDone()) {
      generator->ComputeLogits();
      generator->GenerateNextToken();
    }
  }  // The generator goes out of scope here.

  // Optionally, unload the adapter. Unloading fails while the adapter is still
  // active, so the generator must go out of scope before this call.
  adapters->UnloadAdapter("<unique_user_supplied_adapter_identifier>");
  return 0;
}
```
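The comment at the end of the C++ example implies that a generator keeps its adapter active until the generator is destroyed at the closing brace of its block. The Python sketch below models that scoping rule with a context manager, assuming a generator's hold on an adapter lasts exactly as long as its scope. `AdapterSet` and `generator_using` are hypothetical names used only for illustration; they are not onnxruntime-genai APIs.

```python
from contextlib import contextmanager


class AdapterSet:
    """Hypothetical stand-in: loaded adapters plus a count of live users."""

    def __init__(self):
        self.active_users = {}  # identifier -> number of live generators using it

    def load(self, name):
        self.active_users[name] = 0

    def unload(self, name):
        # Mirrors the documented rule: unloading an active adapter is an error.
        if self.active_users[name] > 0:
            raise RuntimeError(f"adapter {name!r} is still active")
        del self.active_users[name]


@contextmanager
def generator_using(adapters, name):
    """Hypothetical model of a generator's scope holding an active adapter."""
    adapters.active_users[name] += 1
    try:
        yield name  # token generation would run inside this block
    finally:
        # Plays the role of the C++ generator being destroyed at the closing brace.
        adapters.active_users[name] -= 1


adapters = AdapterSet()
adapters.load("my_adapter")
with generator_using(adapters, "my_adapter"):
    try:
        adapters.unload("my_adapter")
    except RuntimeError as err:
        print(err)  # adapter 'my_adapter' is still active
adapters.unload("my_adapter")  # succeeds: the generator's scope has ended
print(sorted(adapters.active_users))  # []
```

Releasing the hold in a `finally` clause means the adapter becomes unloadable even if generation raises, which is the same guarantee a destructor gives the C++ code.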

Python

```python
import onnxruntime_genai as og

model = og.Model("<model/path>")
adapters = og.Adapters(model)
adapters.load("</absolute/path/to/adapters>", "<unique_user_supplied_adapter_identifier>")

tokenizer = og.Tokenizer(model)
prompts = [
    "This is a test.",
    "Rats are awesome pets!",
    "The quick brown fox jumps over the lazy dog.",
]

params = og.GeneratorParams(model)
params.set_search_options(max_length=20)
params.input_ids = tokenizer.encode_batch(prompts)

generator = og.Generator(model, params)
# The identifier must match the one passed to adapters.load().
generator.set_active_adapter(adapters, "<unique_user_supplied_adapter_identifier>")

while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
```

@baijumeswani baijumeswani marked this pull request as ready for review October 8, 2024 17:46
@baijumeswani baijumeswani changed the title C API for Adapters (Multi-LoRA and others) C, C++, and Python API for Adapters (Multi-LoRA and others) Oct 8, 2024
Review threads on src/models/adapters.cpp (two threads) and src/ort_genai_c.cpp were resolved.
@pranavsharma (Contributor) left a comment:

In the interest of time, let's resolve the ref counting issues later as they don't affect the public APIs.

@baijumeswani baijumeswani merged commit 47132b6 into main Oct 16, 2024
13 checks passed
@baijumeswani baijumeswani deleted the baijumeswani/multi-lora branch October 16, 2024 17:25

4 participants