Interesting papers and blog posts, roughly organized by topic
- Choosing a Cloud DBMS: Architectures and Tradeoffs
- Presto: SQL on Everything
- "Further reading" papers in Globus Labs' "Flows" (UChicago)
- How Apache Airflow Distributes Jobs on Celery Workers
- Uber Engineering highlights
  - Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark Jobs
  - Cost Efficiency @ Scale in Big Data File Format
  - Capacity Recommendation Engine: Throughput and Utilization Based Predictive Scaling
  - CRISP: Critical Path Analysis for Microservice Architectures
  - Enabling Seamless Kafka Async Queuing with Consumer Proxy
  - Building Scalable Streaming Pipelines for Near Real-Time Features