- I am a software engineer on the Model Performance Team at Baseten. Previously, I worked at Meituan, using TensorFlow and TensorRT for CTR GPU inference and PyTorch for LLM GPU inference.
- Open Source: Team Member at LMSYS Org, working on SGLang, and a committer for both FlashInfer and LMDeploy.
- If you're interested in learning more about my experience and SGLang, I recommend checking out my talk on SGLang at GPU MODE.
- How to reach me: [email protected] or Telegram
- Learn more about my work experience: LinkedIn
Pinned:
- sgl-project/sglang — SGLang is a fast serving framework for large language models and vision language models.
- flashinfer-ai/flashinfer — FlashInfer: a kernel library for LLM serving.
- InternLM/lmdeploy — LMDeploy is a toolkit for compressing, deploying, and serving LLMs.