Restructuring data guide #475

Merged 2 commits on Mar 31, 2018
74 changes: 73 additions & 1 deletion .openpublishing.redirection.json
@@ -50,6 +50,78 @@
{
"source_path": "docs/elasticsearch/resilience-and-recover.md",
"redirect_url": "/azure/architecture"
- }
+ },
+ {
+ "source_path": "docs/data-guide/scenarios/batch-processing.md",
+ "redirect_url": "/azure/architecture/data-guide/big-data/batch-processing"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/big-data.md",
+ "redirect_url": "/azure/architecture/data-guide/big-data"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/machine-learning-at-scale.md",
+ "redirect_url": "/azure/architecture/data-guide/big-data/machine-learning-at-scale"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/non-relational-data.md",
+ "redirect_url": "/azure/architecture/data-guide/big-data/non-relational-data"
+ },
+ {
+ "source_path": "docs/data-guide/scenarios/real-time-processing.md",
+ "redirect_url": "/azure/architecture/data-guide/big-data/real-time-processing"
+ },
+ {
+ "source_path": "docs/data-guide/technology-choices/data-warehouses.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/data-warehousing"
+ },
+ {
+ "source_path": "docs/data-guide/scenarios/etl.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/etl"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/relational-data.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/advanced-analytics.md",
+ "redirect_url": "/azure/architecture/data-guide/scenarios/advanced-analytics"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/csv-and-json.md",
+ "redirect_url": "/azure/architecture/data-guide/scenarios/csv-and-json"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/data-lake.md",
+ "redirect_url": "/azure/architecture/data-guide/scenarios/data-lake"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/semantic-modeling.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/online-analytical-processing"
+ },
+ {
+ "source_path": "docs/data-guide/concepts/transactional-data.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/online-transaction-processing"
+ },
+ {
+ "source_path": "docs/data-guide/scenarios/data-warehousing.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/data-warehousing"
+ },
+ {
+ "source_path": "docs/data-guide/scenarios/online-analytical-processing.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/online-analytical-processing"
+ },
+ {
+ "source_path": "docs/data-guide/scenarios/online-transaction-processing.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/online-transaction-processing"
+ },
+ {
+ "source_path": "docs/data-guide/technology-choices/olap-data-stores.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/online-analytical-processing"
+ },
+ {
+ "source_path": "docs/data-guide/technology-choices/oltp-data-stores.md",
+ "redirect_url": "/azure/architecture/data-guide/relational-data/online-transaction-processing"
+ }
]
}
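
A redirect list of this size is easy to get subtly wrong. The following is a minimal sketch, not part of this PR, of a check that flags repeated `source_path` values and redirect URLs containing an empty path segment (`//`). Only the `source_path` and `redirect_url` keys come from the file above; the function name `check_redirections`, the idea of passing it the already-parsed list of entries, and the sample paths are assumptions made for illustration.

```python
from collections import Counter


def check_redirections(entries):
    """Return a list of human-readable problems found in redirect entries.

    `entries` is a list of dicts with "source_path" and "redirect_url" keys,
    the same shape as the objects in the redirection file.
    """
    problems = []

    # A source path listed twice makes the redirect target ambiguous.
    counts = Counter(entry["source_path"] for entry in entries)
    for path, count in counts.items():
        if count > 1:
            problems.append(f"duplicate source_path: {path}")

    # An empty path segment ("//") in a site-relative URL is usually a typo.
    for entry in entries:
        if "//" in entry["redirect_url"]:
            problems.append(
                f"empty path segment in redirect_url: {entry['redirect_url']}"
            )

    return problems


# Hypothetical entries, chosen to trigger both checks.
sample = [
    {"source_path": "docs/example/old-a.md", "redirect_url": "/example/new-a"},
    {"source_path": "docs/example/old-a.md", "redirect_url": "/example/new-b"},
    {"source_path": "docs/example/old-c.md", "redirect_url": "/example//new-c"},
]
for problem in check_redirections(sample):
    print(problem)
```

Run against the real file, a check like this could sit in a pre-merge step. Whether a repeated source path is an error or merely redundant depends on how the publishing system resolves redirects, which this sketch does not model.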
@@ -15,7 +15,7 @@ For example, the logs from a web server might be copied to a folder and then pro

## When to use this solution

- Batch processing is used in a variety of scenarios, from simple data transformations to a more complete ETL (extract-transform-load) pipeline. In a big data context, batch processing may operate over very large data sets, where the computation takes significant time. (For example, see [Lambda architecture](../concepts/big-data.md#lambda-architecture).) Batch processing typically leads to further interactive exploration, provides the modeling-ready data for machine learning, or writes the data to a data store that is optimized for analytics and visualization.
+ Batch processing is used in a variety of scenarios, from simple data transformations to a more complete ETL (extract-transform-load) pipeline. In a big data context, batch processing may operate over very large data sets, where the computation takes significant time. (For example, see [Lambda architecture](../big-data/index.md#lambda-architecture).) Batch processing typically leads to further interactive exploration, provides the modeling-ready data for machine learning, or writes the data to a data store that is optimized for analytics and visualization.

One example of batch processing is transforming a large set of flat, semi-structured CSV or JSON files into a schematized and structured format that is ready for further querying. Typically the data is converted from the raw formats used for ingestion (such as CSV) into binary formats that are more performant for querying because they store data in a columnar format, and often provide indexes and inline statistics about the data.
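
The conversion described in this paragraph can be sketched in a few lines. This is an illustration only, assuming a tiny in-memory CSV and a caller-supplied schema. The function name `to_columnar`, the sample log fields, and the min/max statistics are hypothetical choices, and a real pipeline would write a binary columnar file format rather than Python lists.

```python
import csv
import io


def to_columnar(csv_text, schema):
    """Pivot row-oriented CSV text into typed columns plus per-column stats.

    `schema` maps each column name to a callable that parses its raw string.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in schema}
    for row in reader:
        for name, cast in schema.items():
            # Applying the schema turns untyped text into typed values.
            columns[name].append(cast(row[name]))

    # Inline statistics let a query engine skip data without scanning it.
    stats = {
        name: {"min": min(values), "max": max(values)}
        for name, values in columns.items()
        if values
    }
    return columns, stats


raw = (
    "path,status,bytes\n"
    "/index.html,200,5120\n"
    "/missing,404,312\n"
    "/about.html,200,2048\n"
)
columns, stats = to_columnar(raw, {"path": str, "status": int, "bytes": int})
print(columns["status"])  # [200, 404, 200]
print(stats["bytes"])     # {'min': 312, 'max': 5120}
```

Storing each column contiguously means a query that filters on `status` never has to read `path` or `bytes`, which is the main reason columnar layouts tend to query faster than the raw ingestion format.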

File renamed without changes.
@@ -9,7 +9,7 @@ ms:date: 02/12/2018

A *non-relational database* is a database that does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.

- What all of these data stores have in common is that they don't use a [relational model](./relational-data.md). Also, they tend to be more specific in the type of data they support and how data can be queried. For example, time series data stores are optimized for queries over time-based sequences of data, while graph data stores are optimized for exploring weighted relationships between entities. Neither format would generalize well to the task of managing transactional data.
+ What all of these data stores have in common is that they don't use a [relational model](../relational-data/index.md). Also, they tend to be more specific in the type of data they support and how data can be queried. For example, time series data stores are optimized for queries over time-based sequences of data, while graph data stores are optimized for exploring weighted relationships between entities. Neither format would generalize well to the task of managing transactional data.

The term *NoSQL* refers to data stores that do not use SQL for queries, and instead use other programming languages and constructs to query the data. In practice, "NoSQL" means "non-relational database," even though many of these databases do support SQL-compatible queries. However, the underlying query execution strategy is usually very different from the way a traditional RDBMS would execute the same SQL query.

@@ -45,7 +45,7 @@ For more information, see [Real-time message ingestion](../technology-choices/re

### Data storage

- - **Azure Storage Blob Containers** or **Azure Data Lake Store**. Incoming real-time data is usually captured in a message broker (see above), but in some scenarios, it can make sense to monitor a folder for new files and process them as they are created or updated. Additionally, many real-time processing solutions combine streaming data with static reference data, which can be stored in a file store. Finally, file storage may be used as an output destination for captured real-time data for archiving, or for further batch processing in a [lambda architecture](../concepts/big-data.md#lambda-architecture).
+ - **Azure Storage Blob Containers** or **Azure Data Lake Store**. Incoming real-time data is usually captured in a message broker (see above), but in some scenarios, it can make sense to monitor a folder for new files and process them as they are created or updated. Additionally, many real-time processing solutions combine streaming data with static reference data, which can be stored in a file store. Finally, file storage may be used as an output destination for captured real-time data for archiving, or for further batch processing in a [lambda architecture](../big-data/index.md#lambda-architecture).

For more information, see [Data storage](../technology-choices/data-storage.md).
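
Combining streaming data with static reference data, as mentioned in the storage bullet above, amounts to a lookup join. The sketch below assumes the reference data has already been loaded from the file store into a dictionary. The names `enrich`, `device_id`, `site`, and `temp` are hypothetical and do not come from any Azure API.

```python
def enrich(events, reference):
    """Yield each streaming event merged with its static reference record."""
    for event in events:
        # Unknown keys get an empty record so that one unmatched event
        # does not stop the rest of the stream.
        extra = reference.get(event["device_id"], {})
        yield {**event, **extra}


# Static reference data, for example loaded once from a file in blob storage.
reference = {"d1": {"site": "north"}, "d2": {"site": "south"}}

# Stand-in for events arriving from a message broker.
events = [
    {"device_id": "d1", "temp": 21.5},
    {"device_id": "d3", "temp": 19.0},
]

enriched = list(enrich(events, reference))
print(enriched)
```

Because `enrich` is a generator, it processes one event at a time and works equally well on an unbounded source. Refreshing the reference dictionary when the underlying file changes is left out of this sketch.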

Binary file removed docs/data-guide/concepts/images/data-pipeline-ml.png
Binary file not shown.
Binary file not shown.
Binary file not shown.
61 changes: 0 additions & 61 deletions docs/data-guide/concepts/semantic-modeling.md

This file was deleted.

44 changes: 0 additions & 44 deletions docs/data-guide/concepts/transactional-data.md

This file was deleted.
