awesome-ai-engineering

A list of resources to help you become a better AI engineer.

What's an AI Engineer?

  • Microsoft's definition "Artificial intelligence (AI) engineers are responsible for developing, programming and training the complex networks of algorithms that make up AI so that they can function like a human brain. This role requires combined expertise in software development, programming, data science and data engineering"
  • Coursera's definition "Artificial intelligence engineers are individuals who use AI and machine learning techniques to develop applications and systems that can help organizations increase efficiency, cut costs, increase profits, and make better business decisions."
  • Tech Target "AI engineers develop, program and train the complex networks of algorithms that encompass AI so those algorithms can work like a human brain. AI engineers must be experts in software development, data science, data engineering and programming."
  • Swyx podcast (17 April 2024)
  • Scaler Blogs "AI engineers design, develop, and deploy intelligent systems using machine learning, deep learning, and NLP to solve complex problems and enable autonomous decision-making."

What's the difference between an AI Engineer and a Machine Learning Engineer?

  • UpWork "AI engineers work on a broader set of tasks that encompass various forms of machine intelligence, like neural networks, to develop AI models for specific applications. In contrast, ML engineers focus more on ML algorithms and models that can self-tune to better learn and make predictions from large data sets."

What's the difference between an AI Engineer and a Software Engineer?

  • IEEE (ChatGPT's summary of that page): "AI engineers blend traditional software engineering skills with a deep understanding of machine learning and artificial intelligence to develop systems that enhance decision-making and automation within organizations. They are proficient in AI technologies and statistical analysis, focusing on building and integrating AI models into applications. On the other hand, software engineers focus broadly on designing, implementing, and maintaining software systems, with a comprehensive grasp of the software development lifecycle, from requirement analysis to deployment and maintenance. The distinction is further marked by the AI engineer's need to navigate emerging AI technologies, whereas software engineers adhere to established engineering principles and practices across various platforms and technologies."

Practical Tools & Techniques

This section covers practical tools and techniques you can use to become a better AI engineer.

LLM Platforms and APIs

LLM Platforms

  • ChatGPT
  • Claude.ai
  • Phind (dev focus, GPT-4 + own models)
  • Microsoft Copilot (GPT-4 + own models)
  • Perplexity.ai
  • You.com
  • groq.com

LLM APIs and Inference Services

GPU Marketplaces

Free Open-Weight Playgrounds

Try out open source models instantly.

Self-Hosted Open-Weight Inference

  • Ollama (Go, open source)
  • LocalAI (Go, open source)
  • msty.app
  • Nitro.jan.ai
  • Paddler: scaling / load balancing of llama.cpp inference

SaaS

  • fal.ai
  • lepton.ai
  • modal.com: on-demand serverless container + GPU execution runtime
  • Predibase: LLM fine-tuning and hosting
  • brev.dev
  • Replicate.com: models-as-a-service
  • Together.ai: serverless LLM / multimodal inference
  • Lambda Labs: manual rental of GPUs / clusters
  • Beam.cloud: fast standup of serverless generative AI
  • Runpod
  • Cloudflare Workers AI
  • CoreWeave: autoscaling GPUs + serverless (Knative)
  • MosaicML (acquired by Databricks)
  • mixedbread.ai: retrieval as a service (search, reranking, embedding)
  • lamini.ai: LLM inference
  • Anyscale + rai.ai scaling
  • Hugging Face Inference API
  • massedcompute.com
  • Salad.com
  • Openpipe.ai
  • Unsloth.ai
  • Crusoe.ai GPU rental
  • Akash
  • Groq: ultra-fast LLM inference for selected models
  • BoltAI
  • Saturn Cloud
  • Fireworks.ai
  • Inferless.com
  • Banana.dev (defunct)
  • pipeline.ai
  • hyperstack.cloud
  • Alibaba Elastic GPU service
  • Cloudalize GPU Kubernetes Service
  • Tensordock.com
  • Fly GPUs: GPUs on demand
  • Jarvis Labs: GPUs on demand
  • BentoML: open-source open-weight inference with a cloud option
  • bitbop: GPU dev in the cloud

Structured output

  • SGLang
  • outlines
  • Instructor
  • Marginalia
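
As a quick illustration of what the libraries above do, here is a minimal sketch using Instructor with a Pydantic schema to coerce an LLM response into typed fields. The model name and prompt are placeholders, and the exact client setup may differ between Instructor versions.

```python
# Sketch: structured output with Instructor + Pydantic (details vary by version).
import instructor
from openai import OpenAI
from pydantic import BaseModel


class CityInfo(BaseModel):
    name: str
    country: str
    population: int


# Wrap the OpenAI client so responses are parsed and validated into the schema.
client = instructor.from_openai(OpenAI())

city = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_model=CityInfo,
    messages=[{"role": "user", "content": "Tell me about Paris."}],
)
print(city.name, city.country, city.population)
```

The other libraries (SGLang, outlines) achieve the same goal by constraining generation itself (e.g. with grammars or regex) rather than validating after the fact.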

Prompt engineering

LLM Development and Optimization

LLM Testing and Evaluation

  • promptfoo
  • Ollama grid search
  • Uptrain
  • Google Cloud (GCP) AutoSxS
  • Paloma
  • LightEval
  • Bayesian Evaluation
  • Mozilla's experience
  • Ruler (long context evaluation)
  • OpenAI Simple Evals
  • Moonshot
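
The tools above automate variations of the same core loop: run prompts against one or more models, score the outputs, and aggregate. A hand-rolled sketch of that loop, with a placeholder `call_model` function standing in for whatever LLM API you use:

```python
# Sketch of the basic eval loop that tools like promptfoo automate.
def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")  # placeholder


test_cases = [
    {"prompt": "What is the capital of France?", "expected": "Paris"},
    {"prompt": "What is 2 + 2?", "expected": "4"},
]


def run_evals(cases):
    results = []
    for case in cases:
        output = call_model(case["prompt"])
        # Simple containment check; real eval tools also support exact match,
        # regex, semantic similarity, and model-graded assertions.
        passed = case["expected"].lower() in output.lower()
        results.append({"prompt": case["prompt"], "passed": passed})
    score = sum(r["passed"] for r in results) / len(results)
    return results, score
```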

Leaderboards / Evaluations

Observability

Pretraining

  • llm.c: Andrej Karpathy's GPT-2 from the ground up in raw C

Human Input Methods

  • RLHF
  • DPO
  • KTO
  • LIPO
  • DoRA
  • SPO
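
As a concrete example of one of these methods, the DPO objective compares the policy's log-probability ratio on chosen vs. rejected answers against a frozen reference model. A minimal PyTorch sketch of the loss, assuming per-sequence log-probabilities have already been computed:

```python
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of per-sequence log-probabilities (sum of token
    log-probs) for the chosen / rejected completions under the policy being
    trained and the frozen reference model.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer chosen completions more strongly than the reference does.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```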

Architecture Innovations

Tokenizers

Fine-Tuning and Optimization

Task-Optimized LLMs and Context Extension

  • Predibase LoRA Land
  • RoPE
  • ALiBi
  • LongRoPE
  • Unsloth+RoPE
  • InfiniAttention: a pathway to ultra long context windows with manageable memory consumption
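
For context: most of the extension methods above build on RoPE, which encodes position by rotating pairs of query/key dimensions at frequencies that decay across channels; methods like LongRoPE and position interpolation rescale those positions or frequencies. A rough numpy sketch (one common pairing convention; real implementations differ in layout):

```python
import numpy as np


def rope_rotate(x, positions, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    dim must be even; channel pairs are rotated by an angle that grows with
    position and shrinks with channel index. Context-extension tricks
    (e.g. position interpolation) rescale `positions` or `base`.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)   # per-channel-pair frequencies
    angles = np.outer(positions, freqs)         # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```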

Infrastructure and Tools

Vector Stores / Information Retrieval

  • pinecone
  • weaviate
  • chroma (open source)
  • lancedb (open source)
  • postgresql + pgvector (open source)
  • sqlite + vss (open source)
  • FAISS by Meta (open source)
  • Vespa.ai + binary embeddings
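
To make the retrieval step concrete, here is a small FAISS example; random vectors stand in for real embeddings, and the dimensionality is just an illustrative choice.

```python
import faiss
import numpy as np

dim = 384                                   # depends on your embedding model
rng = np.random.default_rng(0)

# Stand-in for document embeddings; in practice these come from an embedding model.
doc_vectors = rng.random((1000, dim), dtype=np.float32)

index = faiss.IndexFlatL2(dim)              # exact L2 search; swap for IVF/HNSW at scale
index.add(doc_vectors)

query = rng.random((1, dim), dtype=np.float32)
distances, ids = index.search(query, 5)     # top-5 nearest documents
print(ids[0], distances[0])
```

The hosted stores in this list (Pinecone, Weaviate, etc.) expose the same add/search operations behind an API, plus persistence, filtering, and scaling.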

Telemetry

Cloud Hosting

  • Blueocean / Paperspace for GPUs
  • AWS
  • GCP
  • Azure
  • Hetzner GPU
  • Cloudflare

Notebooks and Code Interpreters

  • Lightning Studio
  • Google Colab
  • ChatGPT
  • Julius.ai

Attention Mechanisms

  • FlashAttention v2
  • HippoAttention
  • RingAttention
  • PagedAttention
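
PyTorch exposes fused attention kernels behind `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a FlashAttention-style implementation when the hardware and dtypes allow it. A minimal sketch:

```python
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Uses a fused (FlashAttention-style) kernel when available, e.g. on CUDA with
# fp16/bf16 tensors; otherwise falls back to the plain math implementation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # (batch, heads, seq_len, head_dim)
```

PagedAttention (used by vLLM) is a serving-side variant that manages the KV cache in pages rather than changing the attention math itself.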

Model Merging

  • Efficient Linear Model Merging for LLMs
  • Automerge
  • Sakana Evolutionary Model Merge
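
Linear merging in its simplest form is a weighted average of parameter tensors from checkpoints that share an architecture; the tools above add smarter weighting, sparsification, and evolutionary search on top. A minimal sketch with placeholder checkpoint paths:

```python
import torch


def linear_merge(state_dicts, weights):
    """Weighted average of parameter tensors from same-architecture checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        # Cast to float for the average; integer buffers would need special handling.
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged


# Usage sketch: average two fine-tunes of the same base model 70/30.
# sd_a = torch.load("finetune_a.pt")   # placeholder paths
# sd_b = torch.load("finetune_b.pt")
# merged = linear_merge([sd_a, sd_b], [0.7, 0.3])
```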

Optimizers and Autodifferentiation

Optimizers

  • Adam
  • AdamW
  • Prodigy
  • Schedule-free optimizers (April 2024)
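
In PyTorch, switching between these optimizers is usually a one-line change; for example, AdamW with decoupled weight decay:

```python
import torch

model = torch.nn.Linear(128, 10)  # placeholder model

# AdamW decouples weight decay from the gradient update (unlike classic Adam + L2).
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```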

Autodifferentiation Libraries

  • SymPy
  • torch.autograd
  • Autograd
  • tf.GradientTape
  • gomlx
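
These libraries all expose the same basic idea: record operations on values and differentiate through them. With `torch.autograd`:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()      # y = x0^2 + x1^2

y.backward()            # dy/dx = 2x
print(x.grad)           # tensor([4., 6.])
```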

Prompt Debugging

  • mitmproxy (via Show Me The Prompt)

Agents and Swarms

Analytics

Chat with Your Data/RAG

  • Weaviate Verba: RAG solution using Weaviate
  • Microsoft GitHub
  • AWS Bedrock: embeddings, Streamlit, LangChain, Pinecone, Claude, etc.
  • AWS Serverless
  • GCP
  • Gemini for document processing
  • AWS knowledge bases for bedrock
  • FLARE: dynamically replaces low-probability tokens with RAG lookups
  • Embedchain

Guardrails and Safety

Protection

  • Llamaguard
  • Llamaguard with streaming
  • Guardrails for AWS Bedrock

Jailbreaks

Embeddings and Document Processing

Embeddings Services

  • Amazon Titan Embeddings
  • Huggingface
  • Nomic + ollama
  • Cohere multi-aspect embeddings
  • LLM2Vec
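
For a self-hosted baseline via Hugging Face, `sentence-transformers` produces embeddings in a few lines; the model name below is just one common choice:

```python
from sentence_transformers import SentenceTransformer

# "all-MiniLM-L6-v2" is a small, commonly used embedding model; swap as needed.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["How do I reset my password?", "Steps to change your account password"]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized vectors, cosine similarity is just a dot product.
similarity = embeddings[0] @ embeddings[1]
print(embeddings.shape, float(similarity))
```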

Document Extraction Services

  • Amazon Kendra

Embeddings Algorithms

  • ColBERT
  • Binary quantization (BitNet)
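
Binary quantization compresses float embeddings to one bit per dimension (the sign of each value) and compares vectors by Hamming distance. A small numpy sketch with random stand-in embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 1024)).astype(np.float32)  # stand-in embeddings
query = rng.standard_normal(1024).astype(np.float32)

# Keep only the sign of each dimension: 32x smaller than float32.
doc_bits = embeddings > 0
query_bits = query > 0

# Hamming distance = number of differing bits; smaller is more similar.
hamming = (doc_bits != query_bits).sum(axis=1)
top5 = np.argsort(hamming)[:5]
print(top5, hamming[top5])
```

In practice the binary index is used for a fast first pass, with the full-precision vectors kept around for reranking the shortlist.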

Multi-Adapter Models

For hosting multiple fine-tunes at once

  • Punica
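
Punica-style serving keeps one copy of the base weights and swaps lightweight LoRA adapters per request. At small scale, the same idea with the Hugging Face `peft` library looks roughly like this (model and adapter paths are placeholders):

```python
# Rough sketch with Hugging Face peft; model/adapter names are placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-name")

# Attach two fine-tuned LoRA adapters to the same base weights.
model = PeftModel.from_pretrained(base, "path/to/adapter-a", adapter_name="task_a")
model.load_adapter("path/to/adapter-b", adapter_name="task_b")

# Route a request to a specific fine-tune by activating its adapter.
model.set_adapter("task_a")
# ... generate for task A ...
model.set_adapter("task_b")
# ... generate for task B ...
```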

GPU Usage Optimization

  • Run.ai: bare-metal GPU cluster management service, now owned by Nvidia

Important Datasets

  • SST-2: movie review sentiment dataset (HF)
  • 650,000 English books
  • OpenWebText
  • FineWeb

Synthetic Data Generation

  • generator9000

GPUs and Accelerators

  • Groq
  • Truffle-1

Data Curation

  • NeMo-Curator

ML Local Mini Clusters

  • Tinybox / tinygrad
  • WOPR (7 x 4090)

Data Labeling

Model Configuration Management

  • DVCorg
  • WandB Weave