C, C++, and Python API for Adapters (Multi-LoRA and others) #911

baijumeswani · 2024-09-23T19:58:52Z

This pull request introduces a way to apply adapters to a model with onnxruntime-genai. It builds on the onnxruntime API changes from microsoft/onnxruntime#22046 and lets users load, apply, and unload adapters as needed.

This pull request adds the C, C++, and Python APIs. Other language bindings will follow soon.

Example usage:

C++

#include "ort_genai.h"

auto model = OgaModel::Create("<adapter/model/path>");
auto adapters = OgaAdapters::Create(*model);
adapters->LoadAdapter("</absolute/adapter/file/path>", "<unique_user_supplied_adapter_identifier>");

auto tokenizer = OgaTokenizer::Create(*model);

const char* input_strings[] = {
    "This is a test.",
    "Rats are awesome pets!",
    "The quick brown fox jumps over the lazy dog.",
};

auto input_sequences = OgaSequences::Create();
for (auto& string : input_strings)
  tokenizer->Encode(string, *input_sequences);

{
  auto params = OgaGeneratorParams::Create(*model);
  params->SetSearchOption("max_length", 20);
  params->SetInputSequences(*input_sequences);

  auto generator = OgaGenerator::Create(*model, *params);
  generator->SetActiveAdapter(*adapters, "<unique_user_supplied_adapter_identifier>");

  while (!generator->IsDone()) {
    generator->ComputeLogits();
    generator->GenerateNextToken();
  }
}

// Optionally, unload the adapter. Unloading fails while the adapter is still active,
// so the generator must go out of scope before the adapter can be unloaded.
adapters->UnloadAdapter("<unique_user_supplied_adapter_identifier>");

Python

import onnxruntime_genai as og

model = og.Model("<model/path>")
adapters = og.Adapters(model)
adapters.load("</absolute/path/to/adapters>", "<unique_user_supplied_adapter_identifier>")

tokenizer = og.Tokenizer(model)
prompts = [
    "This is a test.",
    "Rats are awesome pets!",
    "The quick brown fox jumps over the lazy dog.",
]

params = og.GeneratorParams(model)
params.set_search_options(max_length=20)
params.input_ids = tokenizer.encode_batch(prompts)

generator = og.Generator(model, params)
generator.set_active_adapter(adapters, "<unique_user_supplied_adapter_identifier>")

while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
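The C++ example notes that unloading an adapter fails while the adapter is still active, so the generator has to be released first. The toy sketch below illustrates that rule with a simple usage counter. It is not the onnxruntime-genai implementation: `ToyAdapters`, `ToyGenerator`, `acquire`, `release`, and `close` are hypothetical names, and the counting mechanism itself is an assumption made for illustration.

```python
class ToyAdapters:
    """Toy adapter registry (hypothetical, not the onnxruntime-genai class)."""

    def __init__(self):
        # Adapter name -> number of generators currently using it.
        self._ref_counts = {}

    def load(self, name):
        if name in self._ref_counts:
            raise ValueError(f"adapter {name!r} is already loaded")
        self._ref_counts[name] = 0

    def acquire(self, name):
        if name not in self._ref_counts:
            raise KeyError(f"adapter {name!r} is not loaded")
        self._ref_counts[name] += 1

    def release(self, name):
        self._ref_counts[name] -= 1

    def unload(self, name):
        # Refuse to unload while any generator still uses the adapter.
        if self._ref_counts[name] > 0:
            raise RuntimeError(f"adapter {name!r} is still active")
        del self._ref_counts[name]


class ToyGenerator:
    """Toy generator that holds at most one active adapter (hypothetical)."""

    def __init__(self):
        self._active = None

    def set_active_adapter(self, adapters, name):
        self.close()  # drop any previously active adapter first
        adapters.acquire(name)
        self._active = (adapters, name)

    def close(self):
        # Stands in for the generator going out of scope.
        if self._active is not None:
            adapters, name = self._active
            adapters.release(name)
            self._active = None


adapters = ToyAdapters()
adapters.load("demo")

generator = ToyGenerator()
generator.set_active_adapter(adapters, "demo")

try:
    adapters.unload("demo")  # fails: the generator still uses the adapter
    unload_failed = False
except RuntimeError:
    unload_failed = True

generator.close()        # release the adapter
adapters.unload("demo")  # now succeeds
```

In this sketch `close()` plays the role of the generator going out of scope in the C++ example; the order "release the generator, then unload" is the point being illustrated.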

pranavsharma

In the interest of time, let's resolve the ref counting issues later as they don't affect the public APIs.

baijumeswani force-pushed the baijumeswani/multi-lora branch from 1ac15e1 to d02e011 on October 2, 2024 06:56

baijumeswani added 3 commits October 7, 2024 18:05

C, C++, and Python API for Multi-Lora (5874763)

Python script to create an adapter model based on phi2 (6030355)

Update nightly ort version for build (5d21f0b)

baijumeswani force-pushed the baijumeswani/multi-lora branch from d02e011 to 5d21f0b on October 7, 2024 18:30

baijumeswani added 5 commits October 7, 2024 14:04

Get windows pipelines to succeed (ebf5690)

Enable python test (6460d6b)

Update requirements (711bba3)

Enable cuda python test and C++ test (34db129)

Hide C++ test behind ifdef to avoid pipeline failures (fd1ed71)

baijumeswani marked this pull request as ready for review October 8, 2024 17:46

baijumeswani changed the title ~~C API for Adapters (Multi-LoRA and others)~~ C, C++, and Python API for Adapters (Multi-LoRA and others) Oct 8, 2024

baijumeswani requested review from yuslepukhin and RyanUnderhill October 9, 2024 17:59

RyanUnderhill reviewed Oct 9, 2024

src/models/adapters.cpp (resolved)

src/ort_genai_c.cpp (resolved)

src/models/adapters.cpp (resolved)

natke reviewed Oct 9, 2024

test/c_api_tests.cpp (resolved)

pranavsharma approved these changes Oct 16, 2024

baijumeswani merged commit 47132b6 into main Oct 16, 2024
13 checks passed

baijumeswani deleted the baijumeswani/multi-lora branch October 16, 2024 17:25

baijumeswani mentioned this pull request Oct 16, 2024

Address open comments from Multi-Lora PR #988

Merged
