From 943e671a8d522cef5ddc8c6e04848134b03c2de4 Mon Sep 17 00:00:00 2001
From: romitgirdhar
Date: Fri, 2 Mar 2018 20:48:37 +0100
Subject: [PATCH] Adding comparison Information on Azure Databricks (#434)

Adding comparison Information on Azure Databricks
---
 .../technology-choices/stream-processing.md   | 32 ++++++++++---------
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/docs/data-guide/technology-choices/stream-processing.md b/docs/data-guide/technology-choices/stream-processing.md
index f274ab9fca8..42818a9524f 100644
--- a/docs/data-guide/technology-choices/stream-processing.md
+++ b/docs/data-guide/technology-choices/stream-processing.md
@@ -1,3 +1,4 @@
+
 ---
 title: Choosing a stream processing technology
 description:
@@ -15,6 +16,7 @@ Real-time stream processing consumes messages from either queue or file-based st
 In Azure, all of the following data stores will meet the core requirements supporting real-time processing:
 - [Azure Stream Analytics](/azure/stream-analytics/)
 - [HDInsight with Spark Streaming](/azure/hdinsight/spark/apache-spark-streaming-overview)
+- [Apache Spark in Azure Databricks](/azure/azure-databricks/)
 - [HDInsight with Storm](/azure/hdinsight/storm/apache-storm-overview)
 - [Azure Functions](/azure/azure-functions/functions-overview)
 - [Azure App Service WebJobs](/azure/app-service/web-sites-create-web-jobs)
@@ -36,25 +38,25 @@ For real-time processing scenarios, begin choosing the appropriate service for y
 The following tables summarize the key differences in capabilities.
 
 ### General capabilities
-| | Azure Stream Analytics | HDInsight with Spark Streaming | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
-| --- | --- | --- | --- | --- | --- |
-| Programmability | Stream analytics query language, JavaScript | Scala, Python, Java | Java, C# | C#, F#, Node.js | C#, Node.js, PHP, Java, Python |
-| Programming paradigm | Declarative | Mixture of declarative and imperative | Imperative | Imperative | Imperative |
-| Pricing model | By streaming units | By cluster hour | By cluster hour | Per function execution and resource consumption | Per app service plan hour |
+| | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
+| --- | --- | --- | --- | --- | --- | --- |
+| Programmability | Stream analytics query language, JavaScript | Scala, Python, Java | Scala, Python, Java, R | Java, C# | C#, F#, Node.js | C#, Node.js, PHP, Java, Python |
+| Programming paradigm | Declarative | Mixture of declarative and imperative | Mixture of declarative and imperative | Imperative | Imperative | Imperative |
+| Pricing model | [Streaming units](https://azure.microsoft.com/pricing/details/stream-analytics/) | Per cluster hour | [Databricks units](https://azure.microsoft.com/pricing/details/databricks/) | Per cluster hour | Per function execution and resource consumption | Per app service plan hour |
 
 ### Integration capabilities
-| | Azure Stream Analytics | HDInsight with Spark Streaming | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
-| --- | --- | --- | --- | --- | --- |
-| Inputs | [Stream Analytics inputs](/azure/stream-analytics/stream-analytics-define-inputs) | Event Hubs, IoT Hub, Kafka, HDFS | Event Hubs, IoT Hub, Storage Blobs, Azure Data Lake Store | [Supported bindings](/azure/azure-functions/functions-triggers-bindings#supported-bindings) | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB, Files |
-| Sinks | [Stream Analytics outputs](/azure/stream-analytics/stream-analytics-define-outputs) | HDFS | Event Hubs, Service Bus, Kafka | [Supported bindings](/azure/azure-functions/functions-triggers-bindings#supported-bindings) | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB, Files |
+| | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
+| --- | --- | --- | --- | --- | --- | --- |
+| Inputs | [Stream Analytics inputs](/azure/stream-analytics/stream-analytics-define-inputs) | Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store | Event Hubs, IoT Hub, Kafka, HDFS, Storage Blobs, Azure Data Lake Store | Event Hubs, IoT Hub, Storage Blobs, Azure Data Lake Store | [Supported bindings](/azure/azure-functions/functions-triggers-bindings#supported-bindings) | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB, Files |
+| Sinks | [Stream Analytics outputs](/azure/stream-analytics/stream-analytics-define-outputs) | HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Cosmos DB | HDFS, Kafka, Storage Blobs, Azure Data Lake Store, Cosmos DB | Event Hubs, Service Bus, Kafka | [Supported bindings](/azure/azure-functions/functions-triggers-bindings#supported-bindings) | Service Bus, Storage Queues, Storage Blobs, Event Hubs, WebHooks, Cosmos DB, Files |
 
 ### Processing capabilities
-| | Azure Stream Analytics | HDInsight with Spark Streaming | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
-| --- | --- | --- | --- | --- | --- |
-| Built-in temporal/windowing support | Yes | Yes | Yes | No | No |
-| Input data formats | Avro, JSON or CSV, UTF-8 encoded | Any format using custom code | Any format using custom code | Any format using custom code | Any format using custom code |
-| Scalability | [Query partitions](/azure/stream-analytics/stream-analytics-parallelization) | Bounded by cluster size | Bounded by cluster size | Up to 200 function app instances processing in parallel | Bounded by app service plan capacity |
-| Late arrival and out of order event handling support | Yes | Yes | Yes | No | No |
+| | Azure Stream Analytics | HDInsight with Spark Streaming | Apache Spark in Azure Databricks | HDInsight with Storm | Azure Functions | Azure App Service WebJobs |
+| --- | --- | --- | --- | --- | --- | --- |
+| Built-in temporal/windowing support | Yes | Yes | Yes | Yes | No | No |
+| Input data formats | Avro, JSON or CSV, UTF-8 encoded | Any format using custom code | Any format using custom code | Any format using custom code | Any format using custom code | Any format using custom code |
+| Scalability | [Query partitions](/azure/stream-analytics/stream-analytics-parallelization) | Bounded by cluster size | Bounded by Databricks cluster scale configuration | Bounded by cluster size | Up to 200 function app instances processing in parallel | Bounded by app service plan capacity |
+| Late arrival and out of order event handling support | Yes | Yes | Yes | Yes | No | No |
 
 See also: