Popular repositories
- flash-attention-minimal (Cuda, forked from tspeterkim/flash-attention-minimal): Flash Attention in ~100 lines of CUDA (forward pass only)
- LLM4Decompile (Python, forked from albertan017/LLM4Decompile): Reverse Engineering: Decompiling Binary Code with Large Language Models
- tiny-gpu (SystemVerilog, forked from adam-maj/tiny-gpu): A minimal GPU design in Verilog to learn how GPUs work from the ground up
- NyuziProcessor (C, forked from jbush001/NyuziProcessor): GPGPU microprocessor architecture
- distributed-llama (C++, forked from b4rtaz/distributed-llama): Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.