
Commit

Restructuring data guide (#475)
* New structure for data guide
* Add redirects
Mike Wasson authored Mar 31, 2018
1 parent c441fd1 commit 3d8eda1
Showing 53 changed files with 1,332 additions and 819 deletions.
74 changes: 73 additions & 1 deletion .openpublishing.redirection.json
@@ -50,6 +50,78 @@
{
"source_path": "docs/elasticsearch/resilience-and-recover.md",
"redirect_url": "/azure/architecture"
}
},
{
"source_path": "docs/data-guide/scenarios/batch-processing.md",
"redirect_url": "/azure/architecture/data-guide/big-data/batch-processing"
},
{
"source_path": "docs/data-guide/concepts/big-data.md",
"redirect_url": "/azure/architecture/data-guide/big-data"
},
{
"source_path": "docs/data-guide/concepts/machine-learning-at-scale.md",
"redirect_url": "/azure/architecture/data-guide/big-data/machine-learning-at-scale"
},
{
"source_path": "docs/data-guide/concepts/non-relational-data.md",
"redirect_url": "/azure/architecture/data-guide/big-data/non-relational-data"
},
{
"source_path": "docs/data-guide/scenarios/real-time-processing.md",
"redirect_url": "/azure/architecture/data-guide/big-data/real-time-processing"
},
{
"source_path": "docs/data-guide/technology-choices/data-warehouses.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/data-warehousing"
},
{
"source_path": "docs/data-guide/scenarios/etl.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/etl"
},
{
"source_path": "docs/data-guide/concepts/relational-data.md",
"redirect_url": "/azure/architecture/data-guide/relational-data"
},
{
"source_path": "docs/data-guide/concepts/advanced-analytics.md",
"redirect_url": "/azure/architecture/data-guide/scenarios/advanced-analytics"
},
{
"source_path": "docs/data-guide/concepts/csv-and-json.md",
"redirect_url": "/azure/architecture/data-guide/scenarios/csv-and-json"
},
{
"source_path": "docs/data-guide/concepts/data-lake.md",
"redirect_url": "/azure/architecture/data-guide/scenarios/data-lake"
},
{
"source_path": "docs/data-guide/concepts/semantic-modeling.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/online-analytical-processing"
},
{
"source_path": "docs/data-guide/concepts/transactional-data.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/online-transaction-processing"
},
{
"source_path": "docs/data-guide/scenarios/data-warehousing.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/data-warehousing"
},
{
"source_path": "docs/data-guide/scenarios/online-analytical-processing.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/online-analytical-processing"
},
{
"source_path": "docs/data-guide/scenarios/online-transaction-processing.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/online-transaction-processing"
},
{
"source_path": "docs/data-guide/technology-choices/olap-data-stores.md",
"redirect_url": "/azure/architecture/data-guide//relational-data/online-analytical-processing"
},
{
"source_path": "docs/data-guide/technology-choices/oltp-data-stores.md",
"redirect_url": "/azure/architecture/data-guide/relational-data/online-transaction-processing"
}
]
}
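The redirect entries above map each moved article's old `source_path` to its new `redirect_url`. The following is a minimal sketch of a sanity check over such entries, assuming the file's top-level key is `redirections` (the hunk does not show it). The function name `find_redirect_problems` and the sample data are hypothetical. Run against the `olap-data-stores.md` entry from this commit, the check flags the empty path segment (`//`) in its target URL.

```python
import json

# Hypothetical excerpt of .openpublishing.redirection.json. The top-level
# "redirections" key is an assumption; only the entries appear in the diff.
SAMPLE = """
{
  "redirections": [
    {
      "source_path": "docs/data-guide/technology-choices/olap-data-stores.md",
      "redirect_url": "/azure/architecture/data-guide//relational-data/online-analytical-processing"
    },
    {
      "source_path": "docs/data-guide/technology-choices/oltp-data-stores.md",
      "redirect_url": "/azure/architecture/data-guide/relational-data/online-transaction-processing"
    }
  ]
}
"""


def find_redirect_problems(entries):
    """Return (source_path, message) pairs for suspicious redirect entries."""
    problems = []
    seen = set()
    for entry in entries:
        src = entry["source_path"]
        url = entry["redirect_url"]
        # The same source file should redirect to exactly one target.
        if src in seen:
            problems.append((src, "duplicate source_path"))
        seen.add(src)
        # Targets in this file are site-relative, so they should start with "/".
        if not url.startswith("/"):
            problems.append((src, "redirect_url is not site-relative"))
        # In a site-relative path, "//" means an empty segment (likely a typo).
        elif "//" in url:
            problems.append((src, "empty path segment ('//') in redirect_url"))
    return problems


if __name__ == "__main__":
    entries = json.loads(SAMPLE)["redirections"]
    for src, msg in find_redirect_problems(entries):
        print(f"{src}: {msg}")
```

A check like this could run in CI before publishing, so malformed targets are caught before readers follow a stale link.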
@@ -15,7 +15,7 @@ For example, the logs from a web server might be copied to a folder and then pro

## When to use this solution

Batch processing is used in a variety of scenarios, from simple data transformations to a more complete ETL (extract-transform-load) pipeline. In a big data context, batch processing may operate over very large data sets, where the computation takes significant time. (For example, see [Lambda architecture](../concepts/big-data.md#lambda-architecture).) Batch processing typically leads to further interactive exploration, provides the modeling-ready data for machine learning, or writes the data to a data store that is optimized for analytics and visualization.
Batch processing is used in a variety of scenarios, from simple data transformations to a more complete ETL (extract-transform-load) pipeline. In a big data context, batch processing may operate over very large data sets, where the computation takes significant time. (For example, see [Lambda architecture](../big-data/index.md#lambda-architecture).) Batch processing typically leads to further interactive exploration, provides the modeling-ready data for machine learning, or writes the data to a data store that is optimized for analytics and visualization.

One example of batch processing is transforming a large set of flat, semi-structured CSV or JSON files into a schematized and structured format that is ready for further querying. Typically the data is converted from the raw formats used for ingestion (such as CSV) into binary formats that are more performant for querying because they store data in a columnar format, and often provide indexes and inline statistics about the data.

File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
@@ -9,7 +9,7 @@ ms:date: 02/12/2018

A *non-relational database* is a database that does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.

What all of these data stores have in common is that they don't use a [relational model](./relational-data.md). Also, they tend to be more specific in the type of data they support and how data can be queried. For example, time series data stores are optimized for queries over time-based sequences of data, while graph data stores are optimized for exploring weighted relationships between entities. Neither format would generalize well to the task of managing transactional data.
What all of these data stores have in common is that they don't use a [relational model](../relational-data/index.md). Also, they tend to be more specific in the type of data they support and how data can be queried. For example, time series data stores are optimized for queries over time-based sequences of data, while graph data stores are optimized for exploring weighted relationships between entities. Neither format would generalize well to the task of managing transactional data.

The term *NoSQL* refers to data stores that do not use SQL for queries, and instead use other programming languages and constructs to query the data. In practice, "NoSQL" means "non-relational database," even though many of these databases do support SQL-compatible queries. However, the underlying query execution strategy is usually very different from the way a traditional RDBMS would execute the same SQL query.

@@ -45,7 +45,7 @@ For more information, see [Real-time message ingestion](../technology-choices/re

### Data storage

- **Azure Storage Blob Containers** or **Azure Data Lake Store**. Incoming real-time data is usually captured in a message broker (see above), but in some scenarios, it can make sense to monitor a folder for new files and process them as they are created or updated. Additionally, many real-time processing solutions combine streaming data with static reference data, which can be stored in a file store. Finally, file storage may be used as an output destination for captured real-time data for archiving, or for further batch processing in a [lambda architecture](../concepts/big-data.md#lambda-architecture).
- **Azure Storage Blob Containers** or **Azure Data Lake Store**. Incoming real-time data is usually captured in a message broker (see above), but in some scenarios, it can make sense to monitor a folder for new files and process them as they are created or updated. Additionally, many real-time processing solutions combine streaming data with static reference data, which can be stored in a file store. Finally, file storage may be used as an output destination for captured real-time data for archiving, or for further batch processing in a [lambda architecture](../big-data/index.md#lambda-architecture).

For more information, see [Data storage](../technology-choices/data-storage.md).

Binary file removed docs/data-guide/concepts/images/data-pipeline-ml.png
61 changes: 0 additions & 61 deletions docs/data-guide/concepts/semantic-modeling.md

This file was deleted.

44 changes: 0 additions & 44 deletions docs/data-guide/concepts/transactional-data.md

This file was deleted.

File renamed without changes
File renamed without changes