From 47f1c4e1751336346b53e4065026dc44b67c9a8f Mon Sep 17 00:00:00 2001 From: Ed Price - MSFT Date: Wed, 28 Feb 2018 15:34:21 -0800 Subject: [PATCH] Update batch-processing.md Fixed broken link. --- docs/data-guide/scenarios/batch-processing.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/data-guide/scenarios/batch-processing.md b/docs/data-guide/scenarios/batch-processing.md index 470a7d0eaaa..39d0aad3420 100644 --- a/docs/data-guide/scenarios/batch-processing.md +++ b/docs/data-guide/scenarios/batch-processing.md @@ -15,7 +15,7 @@ For example, the logs from a web server might be copied to a folder and then pro ## When to use this solution -Batch processing is used in a variety of scenarios, from simple data transformations to a more complete ETL (extract-transform-load) pipeline. In a big data context, batch processing may operate over very large data sets, where the computation takes significant time. (For example, see [Lambda architecture](../concepts/big-data.md##lambda-architecture).) Batch processing typically leads to further interactive exploration, provides the modeling-ready data for machine learning, or writes the data to a data store that is optimized for analytics and visualization. +Batch processing is used in a variety of scenarios, from simple data transformations to a more complete ETL (extract-transform-load) pipeline. In a big data context, batch processing may operate over very large data sets, where the computation takes significant time. (For example, see [Lambda architecture](../concepts/big-data.md#lambda-architecture).) Batch processing typically leads to further interactive exploration, provides the modeling-ready data for machine learning, or writes the data to a data store that is optimized for analytics and visualization. One example of batch processing is transforming a large set of flat, semi-structured CSV or JSON files into a schematized and structured format that is ready for further querying. Typically the data is converted from the raw formats used for ingestion (such as CSV) into binary formats that are more performant for querying because they store data in a columnar format, and often provide indexes and inline statistics about the data. @@ -81,4 +81,4 @@ For more information, see [Analytics and reporting](../technology-choices/analys - **Azure Data Factory**. Azure Data Factory pipelines can be used to define a sequence of activities, scheduled for recurring temporal windows. These activities can initiate data copy operations as well as Hive, Pig, MapReduce, or Spark jobs in on-demand HDInsight clusters; U-SQL jobs in Azure Date Lake Analytics; and stored procedures in Azure SQL Data Warehouse or Azure SQL Database. - **Oozie** and **Sqoop**. Oozie is a job automation engine for the Apache Hadoop ecosystem and can be used to initiate data copy operations as well as Hive, Pig, and MapReduce jobs to process data and Sqoop jobs to copy data between HDFS and SQL databases. -For more information, see [Pipeline orchestration](../technology-choices/pipeline-orchestration-data-movement.md) \ No newline at end of file +For more information, see [Pipeline orchestration](../technology-choices/pipeline-orchestration-data-movement.md)