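A reading list that illustrates the patterns of scalable, reliable, and performant large-scale systems. The case studies are drawn from production systems that serve millions to hundreds of millions of users.
Pinpoint your problem: is it a scalability problem (fast for a single user but slow under heavy load) or a performance problem (slow even for a single user)? Review the design principles and see how tech companies solve scalability and performance problems.
The intelligence section is for people who work with data processing, machine learning, and deep learning.
"Even if you lose all one day, you can build all over again if you retain your calm!" - Thuan Pham, CTO of Uber. So keep calm, and remember how much availability and stability matter.
Before designing a system on a whiteboard, look at the interview notes and the real-world architectures with complete diagrams to get a comprehensive view. You can also watch talks by engineers at tech giants to learn how they build, scale, and optimize their systems. Some recommended books (most of them free) are included for you! Good luck :four_leaf_clover:
The goal of scaling a team is not to grow its headcount but to increase its output and value. In the organization section you can see how tech companies pursue that goal across hiring, management, organization, culture, and communication.
Contributions are welcome! Take a look at the contribution guidelines. If you find a broken or incorrect link, please submit a PR.
This project took a lot of time to compile. If you find it helpful, please share it on Facebook, Twitter, and Weibo, or in your chat groups! Knowledge is power, and sharing knowledge doubles that power. Thank you.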
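- Lessons from Giant-Scale Services - Eric Brewer, UC Berkeley & Google
- Designs, Lessons, and Advice from Building Large Distributed Systems - Jeff Dean, Google
- How to Design a Good API & Why it Matters - Joshua Bloch, CMU & Google
- On Efficiency, Reliability, Scaling - James Hamilton, VP at AWS
- Things to Keep in Mind When Building a Platform for the Enterprise - Heidi Williams, VP Platform at Box
- Principles of Chaos Engineering
- Finding the Order in Chaos
- The Twelve-Factor App | Original | Translation
- Clean Architecture
- High Cohesion and Low Coupling
- Monoliths and Microservices
- CAP Theorem and Trade-offs
- CP Databases and AP Databases
- Stateless vs Stateful Scalability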
- Scale Up vs Scale Out
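- Scale Up vs Scale Out: Hidden Costs
- Best Practices for Scaling Out
- Best Practices for Continuous Delivery
- ACID and BASE
- Blocking/Non-Blocking and Synchronous/Asynchronous
- Performance and Scalability of Databases
- Database Isolation Levels and Their Effects on Performance and Scalability
- The Probability of Data Loss in Large Clusters
- Data Access for Highly Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence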
- SQL vs NoSQL
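- SQL vs NoSQL - Lessons Learned at Salesforce
- NoSQL Databases: Survey and Decision Guidance
- How Sharding Works | Original | Translation
- Consistent Hashing
- Consistent Hashing: Algorithmic Tradeoffs
- Don't Be Tricked by the Hashing Trick
- Uniform Consistent Hashing at Netflix
- Eventually Consistent - Werner Vogels, CTO at Amazon
- Cache is King
- Anti-Caching
- Understand Latency
- Latency Numbers Every Programmer Should Know
- The Calculus of Service Availability
- Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO
- Common Bottlenecks
- Life Beyond Distributed Transactions
- Relying on Software to Redirect Traffic Reliably at Various Layers
- Breaking Things on Purpose
- Avoid Overengineering
- Scalability Worst Practices
- Use Solid Technologies - Don't Re-invent the Wheel - Keep It Simple!
- Simplicity by Distributing Complexity
- Why Over-Reusing is Bad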
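- Performance is a Feature
- Make Performance Part of Your Workflow
- The Benefits of Server Side Rendering over Client Side Rendering
- Writing Code that Scales
- Automate and Abstract: Lessons at Facebook
- AWS Do's and Don'ts
- (UI) Design Doesn't Scale - Stanley Wood, Design Director at Spotify
- Linux Performance
- Building Fast & Resilient Web Applications - Ilya Grigorik
- Accept Partial Failures, Minimize Service Loss
- Design for Loose-coupling
- Design for Resiliency
- Design for Self-healing
- Design for Scaling Out
- Design for Evolution
- Learn from Mistakes
- Microservices and Orchestration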
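- Containers (8 parts) at Riot Games
- Containerization at Pinterest
- Evolution of Container Usage at Netflix
- Dockerizing MySQL at Uber
- Testing of Microservices at Spotify
- Docker at Treehouse
- Microservices at SoundCloud
- Operating Kubernetes Reliably at Stripe
- Kubernetes Traffic Routing (2 parts) at Rakuten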
- Agrarian-Scale Kubernetes (3 parts) at New York Times
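- Nanoservices at BBC
- PowerfulSeal: Testing Tool for Kubernetes Clusters at Bloomberg
- Conductor: Microservices Orchestrator at Netflix
- Docker Containers that Power Over 100,000 Online Shops at Shopify
- Microservice Architecture at Medium
- From Bare-Metal to Kubernetes at Betabrand
- Kubernetes at Tinder
- Kubernetes Platform at Pinterest
- Microservices at Nubank
- Distributed Caching
- EVCache: Distributed In-memory Caching at Netflix
- EVCache Cache Warmer Infrastructure at Netflix
- Memsniff: Robust Memcache Traffic Analyzer at Box
- Caching with Consistent Hashing and Cache Smearing at Etsy
- Analysis of Photo Caching at Facebook
- Cache Efficiency Exercise at Facebook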
- tCache: Scalable Data-aware Java Caching at Trivago
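- Reducing Memcached Memory Usage by 50% at Trivago
- Caching Internal Service Calls at Yelp
- Estimating Cache Efficiency using Big Data at Allegro
- Distributed Cache at Zalando
- Application Data Caching from RAM to SSD at Netflix
- Tradeoffs of Replicated Cache at Skyscanner
- Avoiding Cache Stampede at DoorDash
- Location Caching with Quadtrees at Yext
- Pycache: In-process Caching at Quora
- Scaling Redis at Twitter
- Scaling Job Queue with Redis at Slack
- Moving Persistent Data Out of Redis at Github
- Storing Hundreds of Millions of Simple Key-Value Pairs in Redis at Instagram | Original | Translation
- Redis at Trivago
- Optimizing Redis Storage at Deliveroo
- Memory Optimization in Redis at Wattpad
- Redis Fleet at Heroku
- HTTP Caching and CDN
- Distributed Locking
- Distributed Tracking and Tracing
- Zipkin: Distributed Systems Tracing at Twitter
- Improving Zipkin Traces using Kubernetes Pod Metadata at SoundCloud
- Canopy: Scalable Distributed Tracing & Analysis at Facebook
- Pintrace: Distributed Tracing at Pinterest
- Real-time Distributed Tracing at LinkedIn
- Tracking Service Infrastructure at Scale at Shopify
- Distributed Tracing at HelloFresh
- Analyzing Distributed Trace Data at Pinterest
- Distributed Tracing at Uber
- JVM Profiler: Tracing Distributed JVM Applications at Uber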
- Data Checking at Dropbox
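- Distributed Tracing at Showmax
- osquery Across the Enterprise at Palantir
- StatsD at Etsy
- StatsD at DoorDash
- Distributed Scheduling
- Distributed Monitoring and Alerting
- Monitoring System at Alibaba
- Real User Monitoring at Dailymotion
- Alerting Ecosystem at Uber
- Alerting on Service-Level Objectives (SLOs) at SoundCloud
- Job-based Forecasting Workflow for Observability Anomaly Detection at Uber
- Monitoring and Alert System using Graphite and Cabot at HackerEarth
- Securitybot: Distributed Alerting Bot at Dropbox
- Observability (2 parts) at Twitter
- Distributed Security Alerting at Slack
- Real-Time News Alerting at Bloomberg
- Unicorn: Remediation System at eBay
- M3: Metrics and Monitoring Platform at Uber
- Athena: Automated Build Health Management System at Dropbox
- Nuage: Cloud Management Service at LinkedIn
- ThirdEye: Monitoring Platform at LinkedIn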
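- Distributed Security
- Approach to Security at Scale at Dropbox
- Aardvark and Repokid: AWS Least Privilege for Distributed, High-Velocity Development at Netflix
- LISA: Distributed Firewall at LinkedIn
- Secure Infrastructure to Store Bitcoin in the Cloud at Coinbase
- BinaryAlert: Real-time Serverless Malware Detection at Airbnb
- Scalable IAM Architecture to Secure Access to 100 AWS Accounts at Segment
- OAuth Audit Toolbox at Indeed
- Active Directory Password Blacklisting at Yelp
- Syscall Auditing at Scale at Slack
- Athenz: Fine-Grained, Role-Based Access Control at Yahoo
- WebAuthn Support for Secure Sign In at Dropbox
- Security Development Lifecycle (SDL) at Slack
- Unprivileged Container Builds at Kinvolk
- Diffy: Differencing Engine for Digital Forensics in the Cloud at Netflix
- Detecting Credential Compromise in AWS at Netflix
- Scalable User Privacy at Spotify
- AVA: Auditing Web Applications at Indeed
- TTL as a Service: Automatic Revocation of Stale Privileges at Yelp
- Enterprise Key Management at Slack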
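- Distributed Messaging
- Cape: Event Stream Processing Framework at Dropbox
- Brooklin: Distributed Service for Near Real-Time Data Streaming at LinkedIn
- Samza: Stream Processing System for Latency Insights at LinkedIn
- Bullet: Forward-Looking Query Engine for Streaming Data at Yahoo
- EventHorizon: Tool for Watching Events Streaming at Etsy
- Qmessage: Distributed, Asynchronous Task Queue at Quora
- Cherami: Message Queue System for Transporting Async Tasks at Uber
- Messaging Service at Riot Games
- Debugging Production with Event Logging at Zillow
- Cross-platform In-app Messaging Orchestration Service at Netflix
- Video Gatekeeper at Netflix
- Scaling Push Messaging for Millions of Devices at Netflix
- Delaying Asynchronous Message Processing with RabbitMQ at Indeed
- Benchmarking Streaming Computation Engines at Yahoo
- Improving Stream Data Quality with Protobuf Schema Validation at Deliveroo
- Event-Driven Messaging
- Pub-Sub Messaging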
- Kafka the Message Broker
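- Stream Data Deduplication
- Distributed Logging
- Distributed Searching
- Search Architecture at Instagram
- Search Architecture at eBay
- Search Architecture at Box
- Universal Search System at Pinterest
- Improving Search Engine Efficiency by over 25% at eBay
- Indexing and Querying Telemetry Logs with Lucene at Palantir
- Search Federation Architecture at LinkedIn (2018)
- Search at Slack
- Search and Recommendations at DoorDash
- Search Service at Twitter (2014)
- Autocomplete Search (2 parts) at Traveloka
- Data-Driven Autocorrection System at Canva
- Nautilus: Search Engine at Dropbox
- Galene: Search Architecture at LinkedIn
- Manas: High-Performance Customized Search System at Pinterest
- Sherlock: Near Real-Time Search Indexing at Flipkart
- Nebula: Storage Platform to Build Search Backends at Airbnb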
- ELK (Elasticsearch, Logstash, Kibana) Stack
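- Real-time Predictions with ELK at Uber
- Building a Scalable ELK Stack at Envato
- ELK at Robinhood
- Scaling Elasticsearch Clusters at Uber
- Elasticsearch Performance Tuning Practice at eBay
- Elasticsearch at Kickstarter
- Elasticsearch at Target
- Log Parsing with Logstash and Google Protocol Buffers at Trivago
- Fast Order Search using Data Pipeline and Elasticsearch at Yelp
- Moving Core Business Search to Elasticsearch at Yelp
- Sharding Elasticsearch at Vinted
- Self-Ranking Search with Elasticsearch at Wattpad
- Upgrading Elasticsearch (3 parts) at Redmart
- Vulcanizer: a Library for Operating Elasticsearch at Github
- Distributed Storage
- In-memory Storage
- Object Storage
- Scaling HDFS at Uber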
- Reasons for Choosing S3 over HDFS at Databricks
- S3-based File System at Quantcast
- Image Recovery at Scale Using S3 Versioning at Trivago
- Cloud Object Store at Yahoo
- Ambry: Distributed Immutable Object Store at LinkedIn
- Dynamometer: Scale Testing HDFS on Minimal Hardware with Maximum Fidelity at LinkedIn
- Hammerspace: Persistent, Concurrent, Off-heap Storage at Airbnb
- MezzFS: Mounting Object Storage in Media Processing Platform at Netflix
- Magic Pocket: In-house Multi-exabyte Storage System at Dropbox
- Relational Databases
- MySQL for Schema-less Data at FriendFeed
- MySQL at Pinterest
- PostgreSQL at Twitch
- Scaling MySQL-based Financial Reporting System at Airbnb
- Scaling MySQL at Wix
- MaxScale (MySQL) Database Proxy at Airbnb
- Switching from Postgres to MySQL at Uber
- Handling Growth with Postgres at Instagram
- Scaling the Analytics Database (Postgres) at TransferWise
- Updating a 50 Terabyte PostgreSQL Database at Adyen
- Scaling Database Access for 100s of Billions of Queries per Day at PayPal
- Replication
- MySQL Parallel Replication (4 parts) at Booking.com
- Mitigating MySQL Replication Lag and Reducing Read Load at Github
- Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift at Yelp
- Monitoring MySQL Delayed Replication at IMVU
- Partitioning Main MySQL Database at Airbnb
- Herb: Multi-DC Replication Engine for Schemaless Datastore at Uber
- Sharding
- Presto: Distributed SQL Query Engine
- NoSQL Databases
- Key-Value Databases
- DynamoDB at Nike
- DynamoDB at Segment
- DynamoDB at Mapbox
- Manhattan: Distributed Key-Value Database at Twitter
- Sherpa: Distributed NoSQL Key-Value Store at Yahoo
- HaloDB: Embedded Key-Value Storage Engine at Yahoo
- MPH: Fast and Compact Immutable Key-Value Stores at Indeed
- zBase: High Performance, Elastic, Distributed Key-Value Store at Zynga
- Venice: Distributed Key-Value Database at Linkedin
- Columnar Databases
- Cassandra
- Cassandra at Instagram
- Storing Images in Cassandra at Walmart
- Storing Messages with Cassandra at Discord
- Scaling Cassandra Cluster at Walmart
- Scaling Ad Analytics with Cassandra at Yelp
- Scaling to 100+ Million Reads/Writes using Spark and Cassandra at Dream11
- Moving Food Feed from Redis to Cassandra at Zomato
- Benchmarking Cassandra Scalability on AWS at Netflix
- Service Decomposition at Scale with Cassandra at Intuit QuickBooks
- Cassandra for Keeping Counts In Sync at SoundCloud
- cstar: Cassandra Orchestration Tool at Spotify
- HBase
- Redshift
- Cassandra
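- Document Databases
- Graph Databases
- Key-Value Databases
- Time Series Databases
- Distributed Repositories, Dependencies, and Configuration Management
- DGit: Distributed Git at Github
- Stemma: Distributed Git Server at Palantir
- Configuration Management for Distributed Systems at Flickr
- Git Repository at Microsoft
- Solving Git Problems with Large Repositories at Microsoft
- Single Repository at Google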
- Scaling Infrastructure and (Git) Workflow at Adyen
- Dotfiles Distribution at Booking.com
- Secret Detector: Preventing Secrets in Source Code at Yelp
- Managing Software Dependency at Scale at LinkedIn
- Dynamic Configuration at Twitter
- Scaling Continuous Integration and Continuous Delivery
- Continuous Integration Stack at Facebook
- Continuous Integration with Distributed Repositories and Dependencies at Netflix
- Screwdriver: Continuous Delivery Build System for Dynamic Infrastructure at Yahoo
- CI/CD at Betterment
- CI/CD at Brainly
- Scaling iOS CI with Anka at Shopify
- Scaling Jira Server at Yelp
- Auto-scaling CI/CD cluster at Flexport
- Resilience Engineering: Learning to Embrace Failure
- Resilience Engineering with Project Waterbear at LinkedIn
- Resiliency against Traffic Oversaturation at iHeartRadio
- Resiliency in Distributed Systems at GO-JEK
- Practical NoSQL Resilience Design Pattern for the Enterprise at eBay
- Ensuring Resilience to Disaster at Quora
- Resilience at Shopify
- Site Resiliency at Expedia
- Failover
- The Evolution of Global Traffic Routing and Failover
- Testing for Disaster Recovery Failover Testing
- Designing a Microservices Architecture for Failure
- ELB for Automatic Failover at GoSquared
- Eliminate the Database for Higher Availability at American Express
- Failover with Redis Sentinel at Vinted
- High-availability SaaS Infrastructure at FreeAgent
- MySQL High Availability at GitHub
- Load Balancing
- Introduction to Modern Network Load Balancing and Proxying
- Top Five (Load Balancing) Scalability Patterns
- Load Balancing infrastructure to support more than 1.3 billion users at Facebook
- DHCPLB: DHCP Load Balancer at Facebook
- Katran: Scalable Network Load Balancer at Facebook
- Load Balancing with Eureka at Netflix
- Edge Load Balancing at Netflix
- Zuul 2: Cloud Gateway at Netflix
- Load Balancing at Yelp
- Load Balancing at Github
- Consistent Hashing to Improve Load Balancing at Vimeo
- UDP Load Balancing at 500 pixel
- QALM: QoS Load Management Framework at Uber
- Traffic Steering using Rum DNS at LinkedIn
- Traffic Infrastructure (Edge Network) at Dropbox
- Monitor DNS systems at Stripe
- Rate Limiting
- Autoscaling
- Autoscaling at Pinterest
- Autoscaling Based on Request Queuing at Square
- Autoscaling Jenkins at Trivago
- Autoscaling Pub-Sub Consumers at Spotify
- Autoscaling Bigtable Clusters based on CPU Load at Spotify
- Autoscaling AWS Step Functions Activities at Yelp
- Scryer: Predictive Auto Scaling Engine at Netflix
- Bouncer: Simple AWS Auto Scaling Rollovers at Palantir
- Clusterman: Autoscaling Mesos Clusters at Yelp
- Highly Available Distributed Storage System at Google
- Highly Available NodeJS at Yahoo
- Operations (11 parts) at LinkedIn
- Monitoring Powers High Availability for LinkedIn Feed
- Supporting Global Events at Facebook
- High Availability at BlaBlaCar
- High Availability at Netflix
- High Availability Cloud Infrastructure at Twilio
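- Automating Datacenter Operations at Dropbox
- Globalizing Player Accounts at Riot Games
- Circuit Breaker
- Circuit Breaking in Distributed Systems
- Circuit Breaker for Distributed Services at LINE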
- Applying Circuit Breaker to Channel Gateway at LINE
- Lessons in Resilience at SoundCloud
- Circuit Breaker for Scaling Containers
- Protector: Circuit Breaker for Time Series Databases at Trivago
- Improved Production Stability with Circuit Breakers at Heroku
- Circuit Breakers at Zendesk
- Circuit Breakers at Traveloka
- Timeouts
- Crash-safe Replication for MySQL at Booking.com
- Bulkheads: Partition and Tolerate Failure in One Part
- Steady State: Always Put Logs on Separate Disk
- Throttling: Maintain a Steady Pace
- Multi-Clustering: Improving Resiliency and Stability of a Large-scale Monolithic API Service at LinkedIn
- Determinism (4 parts) in League of Legends Server
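- Performance Optimization on OS, Storage, Database, Network
- Improving Performance with Background Data Prefetching at Instagram
- Compression Techniques to Solve Network I/O Bottlenecks at eBay
- Optimizing Web Servers for High Throughput and Low Latency at Dropbox
- Linux Performance Analysis in 60,000 Milliseconds at Netflix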
- Live Downsizing Google Cloud Persistent Disks (PD-SSD) at Mixpanel
- Decreasing RAM Usage by 40% Using jemalloc with Python & Celery at Zapier
- Reducing Memory Footprint at Slack
- Performance Improvements at Pinterest
- Server Side Rendering at Wix
- 30x Performance Improvements on MySQLStreamer at Yelp
- Optimizing APIs through Dynamic Polyglot Runtime, Fully Asynchronous, and Reactive Programming at Netflix
- Performance Monitoring with Riemann and Clojure at Walmart
- Performance Tracking Dashboard for Live Games at Zynga
- Optimizing CAL Report Hadoop MapReduce Jobs at eBay
- Performance Tuning on Quartz Scheduler at eBay
- Profiling C++ (Part 1: Optimization, Part 2: Measurement and Analysis) at Riot Games
- Profiling React Server-Side Rendering at HomeAway
- Diagnosing Networking Issues in the Linux Kernel at Mixpanel
- Hardware-Assisted Video Transcoding at Dailymotion
- Cross Shard Transactions at 10 Million RPS at Dropbox
- API Profiling at Pinterest
- Pagelets Parallelize Server-side Processing at Yelp
- Improving key expiration in Redis at Twitter
- Ad Delivery Network Performance Optimization with Flame Graphs at MindGeek
- Predictive CPU isolation of containers at Netflix
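- GC Performance Optimization
- Performance Optimization on Image, Video, Page Load
- Big Data
- Data Platform at Uber
- Data Platform at BMW
- Data Platform at Netflix
- Data Platform at Flipkart
- Data Platform at Khan Academy
- Data Platform at Airbnb
- Data Infrastructure at LinkedIn
- Data Infrastructure at GO-JEK
- Data Infrastructure at Pinterest
- Data Analytics Architecture at Pinterest
- Big Data Processing at Spotify
- Big Data Processing at Uber
- Analytics Pipeline
- Analytics Pipeline
- Analytics Pipeline at Teads
- ML Data Pipelines for Real-Time Fraud Prevention at PayPal
- Big Data Analytics and ML Techniques at LinkedIn
- Self-Serve Reporting Platform on Hadoop at LinkedIn
- Privacy-Preserving Analytics and Reporting at LinkedIn
- Analytics Platform for Tracking Item Availability at Walmart
- HALO: Hardware Analytics and Lifecycle Optimization at Facebook
- RBEA: Real-time Analytics Platform at King
- AresDB: GPU-Powered Real-time Analytics Engine at Uber
- AthenaX: Streaming Analytics Platform at Uber
- Keystone: Real-time Stream Processing Platform at Netflix
- Databook: Turning Big Data into Knowledge with Metadata at Uber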
- Amundsen: Data Discovery & Metadata Engine at Lyft
- Maze: Funnel Visualization Platform at Uber
- Metacat: Making Big Data Discoverable and Meaningful at Netflix
- SpinalTap: Change Data Capture System at Airbnb
- Accelerator: Fast Data Processing Framework at eBay
- Omid: Transaction Processing Platform at Yahoo
- TensorFlowOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo
- CaffeOnSpark: Distributed Deep Learning on Big Data Clusters at Yahoo
- Spark on Scala: Analytics Reference Architecture at Adobe
- Experimentation Platform at Airbnb
- Smart Product Platform at Zalando
- Log Analysis Platform at LINE
- Data Visualization Platform at Myntra
- Building and Scaling Data Lineage at Netflix
- Building a scalable data management system for computer vision tasks at Pinterest
- Structured Data at Etsy
- Distributed Machine Learning
- Aroma: Using ML for Code Recommendation at Facebook
- Michelangelo: Machine Learning Platform at Uber
- Scaling Michelangelo
- Horovod: Open Source Distributed Deep Learning Framework for TensorFlow at Uber
- COTA: Improving Customer Care with NLP & Machine Learning at Uber
- Manifold: Model-Agnostic Visual Debugging Tool for Machine Learning at Uber
- Repo-Topix: Topic Extraction Framework at Github
- Concourse: Generating Personalized Content Notifications in Near-Real-Time at LinkedIn
- Altus Care: Applying a Chatbot to Platform Engineering at eBay
- Box Graph: Spontaneous Social Network at Box
- PricingNet: Pricing Modelling with Neural Networks at Skyscanner
- PinText: Multitask Text Embedding System at Pinterest
- Scaling Gradient Boosted Trees for Click-Through-Rate Prediction at Yelp
- Learning with Privacy at Scale at Apple
- Deep Learning for Image Classification Experiment at Mercari
- Deep Learning for Frame Detection in Product Images at Allegro
- Content-based Video Relevance Prediction at Hulu
- Improving Photo Selection With Deep Learning at TripAdvisor
- Personalized Recommendations for Experiences Using Deep Learning at TripAdvisor
- Personalised Recommender Systems at BBC
- Machine Learning at Condé Nast
- Natural Language Processing and Content Analysis at Condé Nast
- Machine Learning Applications In The E-commerce Domain (4 parts) at Rakuten
- Mapping the World of Music Using Machine Learning (2 parts) at iHeartRadio
- Machine Learning to Improve Streaming Quality at Netflix
- Machine Learning to Match Drivers & Riders at GO-JEK
- Improving Video Thumbnails with Deep Neural Nets at YouTube
- Quantile Regression for Delivering On Time at Instacart
- Cross-Lingual End-to-End Product Search with Deep Learning at Zalando
- Machine Learning at Jane Street
- Machine Learning for Ranking Answers End-to-End at Quora
- Clustering Similar Stories Using LDA at Flipboard
- Similarity Search at Flickr
- Large-Scale Machine Learning Pipeline for Job Recommendations at Indeed
- Deep Learning from Prototype to Production at Taboola
- Atom Smashing using Machine Learning at CERN
- Mapping Tags at Medium
- Clustering with the Dirichlet Process Mixture Model in Scala at Monsanto
- Map Pins with DBSCAN & Random Forests at Foursquare
- Detecting and Preventing Fraud at Uber
- Forecasting at Uber
- Financial Forecasting at Uber
- Productionizing ML with Workflows at Twitter
- GUI Testing Powered by Deep Learning at eBay
- Scaling Machine Learning to Recommend Driving Routes at Pivotal
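- Real-time Predictions at DoorDash
- Machine Intelligence at Dropbox
- Machine Learning for Indexing Text from Billions of Images at Dropbox
- Modeling User Journeys via Semantic Embeddings at Etsy
- Automated Fake Account Detection at LinkedIn
- Building Knowledge Graph at Airbnb
- Core Modeling at Instagram
- Neural Architecture Search (NAS) for Prohibited Item Detection at Mercari
- Computer Vision at Airbnb
- 3D Home Backend Algorithms at Zillow
- Long-term Forecasts at Lyft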
- Systems We Make
- Tech Stack (2 parts) at Uber
- Tech Stack at Medium
- Tech Stack at Shopify
- Services (2 parts) at Airbnb
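- Architecture at Evernote
- Chat Service Architecture (3 parts) at Riot Games
- Architecture of League of Legends Client Update
- Infrastructure at Slack
- Back-end at LinkedIn
- Back-end at Flickr
- Infrastructure (3 parts) at Zendesk
- Cloud Infrastructure at Grubhub
- Real-time Presence Platform at LinkedIn
- Settings Platform at LinkedIn
- Real-time User Action Counting System for Ads at Pinterest
- API Platform at Riot Games
- Games Platform at The New York Times
- Kabootar: Communication Platform at Swiggy
- Simone: Distributed Simulation Service at Netflix
- Seagull: Distributed System that Helps Run More Than 20 Million Tests Per Day at Yelp
- Play API Service Architecture at Netflix
- Sticker Services Architecture at LINE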
- Stack Overflow Enterprise at Palantir
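- Architecture of Following Feed, Interest Feed, and Picked For You at Pinterest
- API Specification Workflow at WeWork
- Media Database at Netflix
- Member Transaction History Architecture at Walmart
- Architecture of Financial and Banking Systems
- Designing Large-Scale Systems
- My Scaling Hero - Jeff Atwood (a dose of Endorphins before your interview, JK)
- Software Engineering Advice from Building Large-Scale Distributed Systems - Jeff Dean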
- Introduction to Architecting Systems for Scale
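- Anatomy of a System Design Interview | Original
- 8 Things You Need to Know Before a System Design Interview
- Top 10 System Design Interview Questions
- Top 10 Common Large-Scale Software Architectural Patterns in a Nutshell
- Cloud Big Data Design Patterns - Lynn Langit
- How NOT to Design Netflix in Your 45-minute System Design Interview?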
- API Best Practices: Webhooks, Deprecation, and Design
- Explaining Low-Level Systems (OS, Network/Protocol, Database, Storage)
- "What Happens When... and How" Questions
- Engineering Levels at SoundCloud
- Engineering Roles at Palantir
- Scaling Engineering Teams at Twitter
- Scaling Decision-Making Across Teams at LinkedIn
- Scaling Data Science Team at GOJEK
- Scaling Agile at Zalando
- Scaling Agile at bol.com
- Lessons Learned from Scaling a Product Team at Intercom
- Hiring, Managing, and Scaling Engineering Teams at Typeform
- Scaling the Datagram Team at Instagram
- Scaling the Design Team at Flexport
- Team Model for Scaling a Design System at Salesforce
- Building Analytics Team (4 parts) at Wish
- From 2 Founders to 1000 Employees at Transferwise
- Lessons Learned Growing a UX Team from 10 to 170 at Adobe
- Five Lessons from Scaling at Pinterest
- Approach Engineering at Vinted
- Using Metrics to Improve the Development Process (and Coach People) at Indeed
- Mistakes to Avoid while Creating an Internal Product at Skyscanner
- RACI (Responsible, Accountable, Consulted, Informed) at Etsy
- Four Pillars of Leading People (Empathy, Inspiration, Trust, Honesty) at Zalando
- Pair Programming at Shopify
- Distributed Responsibility at Asana
- Rotating Engineers at Zalando
- Code Review
- Distributed Systems in One Lesson - Tim Berglund, Senior Director of Developer Experience at Confluent
- Building Real-Time Infrastructure at Facebook - Jeff Barber and Shie Erlich, Software Engineer at Facebook
- Building Reliable Social Infrastructure for Google - Marc Alvidrez, Senior Manager at Google
- Building a Distributed Build System at Google Scale - Aysylu Greenberg, SDE at Google
- Site Reliability Engineering at Dropbox - Tammy Butow, Site Reliability Engineering Manager at Dropbox
- How Google Does Planet-Scale for Planet-Scale Infra - Melissa Binde, SRE Director for Google Cloud Platform
- Netflix Guide to Microservices - Josh Evans, Director of Operations Engineering at Netflix
- Achieving Rapid Response Times in Large Online Services - Jeff Dean, Google Senior Fellow
- Architecture to Handle 80K RPS Celebrity Sales at Shopify - Simon Eskildsen, Engineering Lead at Shopify
- Lessons of Scale at Facebook - Bobby Johnson, Director of Engineering at Facebook
- Performance Optimization for the Greater China Region at Salesforce - Jeff Cheng, Enterprise Architect at Salesforce
- How GIPHY Delivers GIFs to 300 Million Users - Alex Hoang and Nima Khoshini, Services Engineers at GIPHY
- High Performance Packet Processing Platform at Alibaba - Haiyong Wang, Senior Director at Alibaba
- Solving Large-scale Data Center and Cloud Interconnection Problems - Ihab Tarazi, CTO at Equinix
- Scaling Dropbox - Kevin Modzelewski, Back-end Engineer at Dropbox
- Scaling Reliability at Dropbox - Sat Kriya Khalsa, SRE at Dropbox
- Scaling with Performance at Facebook - Bill Jia, VP of Infrastructure at Facebook
- Scaling Live Videos to a Billion Users at Facebook - Sachin Kulkarni, Director of Engineering at Facebook
- Scaling Infrastructure at Instagram - Lisa Guo, Instagram Engineering
- Scaling Infrastructure at Twitter - Yao Yue, Staff Software Engineer at Twitter
- Scaling Infrastructure at Etsy - Bethany Macri, Engineering Manager at Etsy
- Scaling Real-time Infrastructure at Alibaba for Global Shopping Holiday - Xiaowei Jiang, Senior Director at Alibaba
- Scaling Data Infrastructure at Spotify - Matti (Lepistö) Pehrs, Spotify
- Scaling Pinterest - Marty Weiner, Pinterest’s founding engineer
- Scaling Slack - Bing Wei, Software Engineer (Infrastructure) at Slack
- Scaling Backend at Youtube - Sugu Sougoumarane, SDE at Youtube
- Scaling Backend at Uber - Matt Ranney, Chief Systems Architect at Uber
- Scaling Global CDN at Netflix - Dave Temkin, Director of Global Networks at Netflix
- Scaling Load Balancing Infrastructure to Support 1.3 Billion Users at Facebook - Patrick Shuff, Production Engineer at Facebook
- Scaling (a NSFW site) to 200 Million Views a Day and Beyond - Eric Pickup, Lead Platform Developer at MindGeek
- Scaling Counting Infrastructure at Quora - Chun-Ho Hung and Nikhil Gar, SEs at Quora
- Scaling Git at Microsoft - Saeed Noursalehi, Principal Program Manager at Microsoft
- Scaling Multitenant Architecture Across Multiple Data Centres at Shopify - Weingarten, Engineering Lead at Shopify
- Big Data, Web Ops & DevOps Ebooks - O'Reilly (Online - Free)
- Google Site Reliability Engineering (Online - Free)
- Distributed Systems for Fun and Profit (Online - Free)
- What Every Developer Should Know About SQL Performance (Online - Free)
- Beyond the Twelve-Factor App - Exploring the DNA of Highly Scalable, Resilient Cloud Applications (Free)
- Chaos Engineering - Building Confidence in System Behavior through Experiments (Free)
- The Art of Scalability
- Designing Data-Intensive Applications
- Web Scalability for Startup Engineers
- Scalability Rules: 50 Principles for Scaling Web Sites
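This project was created by Nguyen Quoc Binh on Christmas Eve 2017 and is dedicated to the late-night programmers who sacrifice their personal lives for work.
Buy me a coffee? Thank you! It means a lot to me :heart: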