---
layout: article
title: Unified Compilation and Execution for Data Pipelines
article_header:
  type: overlay
  theme: dark
  background_color: '#20639B'
  cover_size: md
  background_image:
    gradient: 'linear-gradient(135deg, rgba(34, 90, 87, .7), rgba(139, 34, 139, .7))'
    src: /assets/images/unified.png
---

## Our Mission

Modern end-to-end data pipelines are highly complex and unoptimized. They combine code from different frontends (e.g., SQL, Beam, Keras), written in different programming languages (e.g., Python, Scala), and execute across many backend runtimes (e.g., Spark, Flink, TensorFlow). Data and intermediate results take a long, slow path through excessive materialization and conversions on their way down to hardware accelerators that are only partially supported. End-to-end guarantees are typically hard to reason about due to the mismatch in processing semantics across runtimes.

The Continuous Deep Analytics (CDA) project aims to shape the next generation of software for scalable, data-driven applications and pipelines. Our work combines state-of-the-art mechanisms from compiler and database technology with hardware-accelerated machine learning and distributed stream processing.

## Architecture

The CDA project is currently composed of three parts:

1. **Arc**: an Intermediate Representation (IR) for batch and stream data computation,
2. **Arcon**: a distributed execution runtime for Arc programs, and
3. **Kompact**: a component- and actor-based middleware for programming distributed systems, written entirely in Rust (sketched below).

*Architecture of CDA*
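
To make the component-and-actor style concrete, here is a minimal sketch of the execution discipline such middleware provides: each component owns its state exclusively and reacts to one message at a time. The sketch uses only the Rust standard library; the `Event` message type, the word-count state, and the channel wiring are invented for illustration and are not the Kompact API.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical message type, invented for this illustration.
enum Event {
    Word(String),
    Flush,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Event>();

    // The "component": it owns its state exclusively and handles one
    // message at a time, mirroring the execution discipline that
    // component/actor middleware like Kompact enforces.
    let counter = thread::spawn(move || {
        let mut counts: HashMap<String, usize> = HashMap::new();
        for event in rx {
            match event {
                Event::Word(w) => *counts.entry(w).or_insert(0) += 1,
                Event::Flush => {
                    for (word, n) in &counts {
                        println!("{word}: {n}");
                    }
                    counts.clear();
                }
            }
        }
    });

    // An "upstream" sender feeds the component asynchronously, the way
    // one stream operator might feed a downstream one.
    for w in ["stream", "batch", "stream"] {
        tx.send(Event::Word(w.to_string())).unwrap();
    }
    tx.send(Event::Flush).unwrap();
    drop(tx); // closing the channel lets the component thread finish

    counter.join().unwrap();
}
```

Kompact itself goes further than this sketch: it combines actor-style message passing with Kompics-style typed ports, and its runtime schedules components so that each one's state is only ever accessed by a single thread at a time.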