Pluggable Model Integration Interface #738

Draft
wants to merge 24 commits into
base: main

@calpt calpt commented Aug 25, 2024

This PR drafts a new model integration interface that makes it easier to support new and custom model architectures for selected adapter methods, without requiring a full model implementation.

This is done via the new AdapterModelInterface class, which maps generic model access points (embeddings, layers, attention projections, feed-forward projections) to model-specific attribute names.

Example usage:

Basic interface for a Qwen2 model:

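import adapters
from adapters import AdapterModelInterface, LoRAConfig
from transformers import AutoModelForCausalLM

# Map generic access points to the attribute names used by the Qwen2 implementation
model_interface = AdapterModelInterface(
    adapter_types=["lora", "reft"],
    model_embeddings="embed_tokens",
    model_layers="layers",
    layer_self_attn="self_attn",
    layer_cross_attn=None,
    attn_k_proj="k_proj",
    attn_q_proj="q_proj",
    attn_v_proj="v_proj",
    attn_o_proj="o_proj",
    layer_intermediate_proj="mlp.up_proj",
    layer_output_proj="mlp.down_proj",
)

model_name = "Qwen/Qwen2-0.5B"

# Load the plain Transformers model, then enable adapter support through the interface
model = AutoModelForCausalLM.from_pretrained(model_name)
adapters.init(model, interface=model_interface)

config = LoRAConfig()
# config = LoReftConfig()  # alternative; also import LoReftConfig from adapters
model.add_adapter("my_adapter", config=config)

print(model.adapter_summary())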
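
To illustrate how dotted names such as "mlp.up_proj" can be turned into actual submodules, here is a minimal sketch of attribute-path resolution. It is not the code from this PR: `resolve_attr`, the toy `layer` object and the `layer_names` dict are hypothetical, and plain strings stand in for real modules.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_attr(root, dotted_name):
    """Follow a dotted attribute path such as "mlp.up_proj" starting at root."""
    return reduce(getattr, dotted_name.split("."), root)


# Toy stand-in for one decoder layer (assumption: real layers are torch modules).
layer = SimpleNamespace(
    self_attn=SimpleNamespace(q_proj="Q", k_proj="K", v_proj="V", o_proj="O"),
    mlp=SimpleNamespace(up_proj="UP", down_proj="DOWN"),
)

# Generic access point -> model-specific attribute name, as in the interface above.
layer_names = {
    "layer_self_attn": "self_attn",
    "layer_intermediate_proj": "mlp.up_proj",
    "layer_output_proj": "mlp.down_proj",
}

resolved = {key: resolve_attr(layer, name) for key, name in layer_names.items()}
print(resolved["layer_intermediate_proj"])  # prints "UP"
```

With this kind of lookup, generic adapter code only needs the interface keys and never the architecture-specific names.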

Extended interface

Additionally, the interface provides optional attributes that enable (almost) full bottleneck adapter support. Without the extended interface, bottleneck adapter support is very limited.

Example for Gemma2:

adapter_interface = AdapterModelInterface(
    adapter_types=["bottleneck", "lora", "reft"],
    model_embeddings="embed_tokens",
    model_layers="layers",
    layer_self_attn="self_attn",
    layer_cross_attn=None,
    attn_k_proj="k_proj",
    attn_q_proj="q_proj",
    attn_v_proj="v_proj",
    attn_o_proj="o_proj",
    layer_intermediate_proj="mlp.up_proj",
    layer_output_proj="mlp.down_proj",
    layer_pre_self_attn="input_layernorm",
    layer_pre_cross_attn=None,
    layer_pre_ffn="pre_feedforward_layernorm",
    layer_ln_1="post_attention_layernorm",
    layer_ln_2="post_feedforward_layernorm",
)
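
Because every field is just an attribute name, a typo only surfaces once the name is looked up on the model. A small pre-check along the following lines can catch that early. This is a sketch under assumptions: `missing_names`, `toy_layer` and `layer_fields` are hypothetical helpers for illustration and are not part of the adapters library; fields set to None are treated as "not present in this architecture".

```python
from types import SimpleNamespace


def missing_names(layer, names):
    """Return the interface keys whose attribute path does not exist on layer.

    Keys mapped to None (e.g. layer_cross_attn) are skipped.
    """
    missing = []
    for key, path in names.items():
        if path is None:
            continue
        obj = layer
        for part in path.split("."):
            if not hasattr(obj, part):
                missing.append(key)
                break
            obj = getattr(obj, part)
    return missing


# Toy layer that lacks the post-feedforward layer norm.
toy_layer = SimpleNamespace(
    self_attn=SimpleNamespace(),
    mlp=SimpleNamespace(up_proj=object(), down_proj=object()),
    input_layernorm=object(),
    pre_feedforward_layernorm=object(),
    post_attention_layernorm=object(),
)

# Layer-level fields copied from the Gemma2 example above.
layer_fields = {
    "layer_self_attn": "self_attn",
    "layer_cross_attn": None,
    "layer_intermediate_proj": "mlp.up_proj",
    "layer_output_proj": "mlp.down_proj",
    "layer_pre_self_attn": "input_layernorm",
    "layer_pre_cross_attn": None,
    "layer_pre_ffn": "pre_feedforward_layernorm",
    "layer_ln_1": "post_attention_layernorm",
    "layer_ln_2": "post_feedforward_layernorm",
}

print(missing_names(toy_layer, layer_fields))  # prints ['layer_ln_2']
```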

Additional novelties

  • Adds AdapterMethod as an enum of all supported adapter method types (e.g. AdapterMethod.bottleneck, AdapterMethod.lora, ...)
  • Adds a supports_adapter() method to check whether a model instance supports a given adapter method. The method accepts either an AdapterMethod value or a config object:
    model.supports_adapter(AdapterMethod.prompt_tuning)
    # or
    model.supports_adapter(PromptTuningConfig())
    (This method is available both on models implemented via "classic" mixins and on models using the pluggable interface.)
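
The "method value or config object" dispatch described above could look roughly like the sketch below. Everything here is a hypothetical stand-in written for illustration: `Method`, `ToyLoRAConfig`, `ToyPromptTuningConfig` and `supports` are not the library's AdapterMethod, config classes or supports_adapter(), whose actual implementation may differ.

```python
from enum import Enum


class Method(str, Enum):
    """Hypothetical stand-in for AdapterMethod."""

    bottleneck = "bottleneck"
    lora = "lora"
    reft = "reft"
    prompt_tuning = "prompt_tuning"


class ToyLoRAConfig:
    """Hypothetical config object that knows which method it belongs to."""

    method = Method.lora


class ToyPromptTuningConfig:
    """Hypothetical config object for prompt tuning."""

    method = Method.prompt_tuning


def supports(supported_types, method_or_config):
    """Accept either a method name/enum member or a config object."""
    if isinstance(method_or_config, str):
        # Covers plain strings and Method members (Method subclasses str).
        method = Method(method_or_config)
    else:
        method = method_or_config.method
    return method in {Method(t) for t in supported_types}


# adapter_types as declared in the basic interface example.
supported = ["lora", "reft"]
print(supports(supported, Method.lora))              # prints True
print(supports(supported, ToyPromptTuningConfig()))  # prints False
```

A caller can use such a check to skip or report unsupported methods before attempting to add an adapter.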

State of implementation

Supported adapter types:

  • LoRA
  • ReFT
  • Bottleneck/Compacter: partial, currently does not support:
    • is_parallel, via extended interface
    • original_ln_before=True, via extended interface
    • original_ln_after=False (e.g. used for AdapterPlusConfig)
  • Invertible adapters
  • Prefix Tuning
  • Prompt Tuning: partial; attention mask modification only works with very specific model implementations

Supported features:

  • Embedding training
  • Fusion composition

Not to be supported:

  • Parallel composition
