dpo

Here are 51 public repositories matching this topic...

wangclnlp / Vision-LLM-Alignment

This repo contains the code for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) designed for vision LLMs.

vision alignment multi-model reward ppo sft dpo llm rlhf mllm llava

Updated Jul 7, 2024
Python

TUDB-Labs / mLoRA

An Efficient "Factory" to Build Multiple LoRA Adapters

gpu llama lora finetune peft dpo baichuan llm rlhf chatglm llama2 mlora

Updated Jul 7, 2024
Python

modelscope / swift

ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4, Internlm2.5, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Updated Jul 7, 2024
Python

shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.

medical llama gpt dpo llm chatgpt medicalgpt

Updated Jul 7, 2024
Python

dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

math reasoning dpo llm

Updated Jul 7, 2024
Python

jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports Llama, Llama2, Llama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

llama ppo dpo chatgpt rlhf qlora qwen mixtral llama3

Updated Jul 7, 2024
Python

armbues / SiLLM-examples

Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon

lora mlx dpo apple-silicon large-language-models llm llm-training llm-inference

Updated Jul 5, 2024
Python

armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

lora mlx dpo apple-silicon large-language-models llm llm-training llm-inference

Updated Jul 3, 2024
Python

sugarandgugu / Simple-Trl-Training

Fine-tune large language models with the DPO algorithm; simple and easy to get started.

simple dpo trl llm rlhf

Updated Jul 3, 2024
Python

martin-wey / CodeUltraFeedback

CodeUltraFeedback: aligning large language models to coding preferences

alignment code-generation dpo large-language-models llm-as-a-judge codeultrafeedback codal-bench

Updated Jun 25, 2024
Python

RockeyCoss / SPO

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

text-to-image dpo diffusion-models text-to-image-generation sdxl

Updated Jun 21, 2024
Python

OctopusMind / DPO

An implementation of the DPO algorithm.

lora dpo rlhf qwen

Updated Jun 12, 2024
Python

TideDra / VL-RLHF

An RLHF Infrastructure for Vision-Language Models

vlm lmm dpo llm rlhf mllm

Updated Jun 12, 2024
Python

kyryl-opens-ml / rlfh-dagster-modal

Re-usable & scalable RLHF training pipeline with Dagster and Modal.

modal dpo dagster llm rlhf

Updated Jun 11, 2024
Python

ducnh279 / Align-LLMs-with-DPO

Align a Large Language Model (LLM) with DPO loss

python transformers pytorch alignment dpo llms

Updated Jun 6, 2024
Jupyter Notebook

ContextualAI / HALOs

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Updated May 30, 2024
Python

DPO-Group / DPO_WooCommerce

This is the DPO Pay plugin for WooCommerce.

woocommerce woocommerce-payment dpo

Updated May 28, 2024
PHP

adithya-s-k / Indic-llm

An open-source framework designed to adapt pre-trained Language Models (LLMs), such as Llama, Mistral, and Mixtral, to a wide array of domains and languages.

lora finetuning dpo llm finetuning-llms continual-pre-training

Updated May 27, 2024
Python

golang-malawi / go-dpo

Unofficial Go library for DPO Group

golang library payments dpo

Updated May 3, 2024
Go

DPO-Group / DPO_Gravity_Forms

This is the DPO Group plugin for Gravity Forms.

gravityforms gravity-forms gravityforms-payment dpo

Updated Apr 29, 2024
PHP
