diff --git a/.openpublishing.redirection.json b/.openpublishing.redirection.json index 0d8b4300df3..b45593ce639 100644 --- a/.openpublishing.redirection.json +++ b/.openpublishing.redirection.json @@ -366,6 +366,10 @@ "source_path": "docs/topics/high-performance-computing/index.md", "redirect_url": "/azure/architecture/topics/high-performance-computing", "redirect_document_id": true + }, + { + "source_path": "docs/cloud-adoption/operations/monitor/cloud-app-howto.md", + "redirect_url": "/azure/architecture/cloud-adoption/operations/monitor/cloud-models-monitor-overview" } ] } diff --git a/docs/cloud-adoption/migrate/azure-best-practices/migrate-best-practices-costs.md b/docs/cloud-adoption/migrate/azure-best-practices/migrate-best-practices-costs.md index b6fb77d4904..5c8cc638822 100644 --- a/docs/cloud-adoption/migrate/azure-best-practices/migrate-best-practices-costs.md +++ b/docs/cloud-adoption/migrate/azure-best-practices/migrate-best-practices-costs.md @@ -137,10 +137,10 @@ Storage accounts can use different types of redundancy for resilience and high a **Type** | **Details** | **Usage** --- | --- | --- -**Locally Redundant Storage (LRS)** | Protects against a local outage by replicating within a single storage unit to a separate fault domain and update domain. Keeps multiple copies of your data in one datacenter. Provides at least 99.999999999 % (11 9\'s) durability of objects over a given year. | Consider if your app stores data that can be easily reconstructed. -**Zone Redundant Storage (ZRS)** | Protects again a datacenter outage by replicating across three storage clusters in a single region. Each storage cluster is physically separated and located in its own availability zone. Provides at least 99.9999999999 % (12 9\'s) durability of objects over a given year by keeping multiple copies of your data across multiple datacenters or regions. | Consider if you need consistency, durability, and high availability. 
Might not protect against a regional disaster when multiple zones are permanently affected. -**Geographically Redundant Storage (GRS)** | Protects against an entire region outage by replicating data to a secondary region hundreds of miles away from the primary. Provides at least 99.99999999999999 % (16 9\'s) durability of objects over a given year. | Replica data isn't available unless Microsoft initiates a failover to the secondary region. If failover occurs, read and write access is available. -**Read-Access Geographically Redundant Storage (RA-GRS)** | Similar to GRS. Provides at least 99.99999999999999 % (16 9\'s) durability of objects over a given year | Provides and 99.99 % read availability by allowing read access from the second region used for GRS. +**Locally redundant storage (LRS)** | Protects against a local outage by replicating within a single storage unit to a separate fault domain and update domain. Keeps multiple copies of your data in one datacenter. Provides at least 99.999999999 % (11 9\'s) durability of objects over a given year. | Consider if your app stores data that can be easily reconstructed. +**Zone-redundant storage (ZRS)** | Protects against a datacenter outage by replicating across three storage clusters in a single region. Each storage cluster is physically separated and located in its own availability zone. Provides at least 99.9999999999 % (12 9\'s) durability of objects over a given year by keeping multiple copies of your data across multiple datacenters or regions. | Consider if you need consistency, durability, and high availability. Might not protect against a regional disaster when multiple zones are permanently affected. +**Geographically redundant storage (GRS)** | Protects against an entire region outage by replicating data to a secondary region hundreds of miles away from the primary. Provides at least 99.99999999999999 % (16 9\'s) durability of objects over a given year. 
| Replica data isn't available unless Microsoft initiates a failover to the secondary region. If failover occurs, read and write access is available. +**Read-access geographically redundant storage (RA-GRS)** | Similar to GRS. Provides at least 99.99999999999999 % (16 9\'s) durability of objects over a given year. | Provides 99.99 % read availability by allowing read access from the second region used for GRS. **Learn more:** diff --git a/docs/cloud-adoption/operations/monitor/alert.md b/docs/cloud-adoption/operations/monitor/alert.md index 0a9abb382f7..dc294a6829c 100644 --- a/docs/cloud-adoption/operations/monitor/alert.md +++ b/docs/cloud-adoption/operations/monitor/alert.md @@ -2,7 +2,7 @@ title: Cloud monitoring guide – Alerting titleSuffix: Microsoft Cloud Adoption Framework for Azure description: Choose when to use Azure Monitor or System Center Operations Manager in Microsoft Azure -author: mgoedtel +author: MGoedtel ms.author: magoedte ms.date: 06/26/2019 ms.topic: guide diff --git a/docs/cloud-adoption/operations/monitor/cloud-app-howto.md b/docs/cloud-adoption/operations/monitor/cloud-app-howto.md deleted file mode 100644 index 38a947ae4cb..00000000000 --- a/docs/cloud-adoption/operations/monitor/cloud-app-howto.md +++ /dev/null @@ -1,133 +0,0 @@ ---- -title: Cloud monitoring guide – Monitoring Azure cloud apps -titleSuffix: Microsoft Cloud Adoption Framework for Azure -description: Choose when to use Azure Monitor or System Center Operations Manager in Microsoft Azure -author: mgoedtel -ms.author: magoedte -ms.date: 06/26/2019 -ms.topic: guide -ms.service: cloud-adoption-framework -ms.subservice: operate -services: azure-monitor ---- - -# Cloud monitoring guide: Monitoring Azure cloud applications - -This article includes our recommended monitoring strategy for each of the cloud deployment models, based on the following criteria: - -- You require continued commitment to Operations Manager or other enterprise monitoring platform. 
This is because of integration with your IT operations processes, knowledge and expertise, or because certain functionality isn't available yet in Azure Monitor. -- You have to monitor workloads both on-premises and in the public cloud, or just in the cloud. -- Your cloud migration strategy includes modernizing IT operations and moving to our cloud monitoring services and solutions. -- You might have critical systems that are air-gapped or physically isolated, hosted in a private cloud or on physical hardware, and need to be monitored. - -## Azure cloud monitoring - -Azure Monitor is the platform service that provides a single source for monitoring Azure resources. It's designed for cloud solutions that are built on Azure, and that support a business capability that is based on VM workloads or complex architectures that use microservices and other platform resources. It monitors all layers of the stack, starting with tenant services such as Azure Active Directory Domain Services, and subscription-level events and Azure service health. It also monitors infrastructure resources like VMs, storage, and network resources, and, at the top layer, your application. Monitoring each of these dependencies, and collecting the right signals that each can emit, gives you the observability of applications and the key infrastructure you need. - -The following table summarizes the recommended approach to monitoring each layer of the stack. - - - -Layer | Resource | Scope | Method ----|---|---|---- -Application | Web-based application running on .NET, .NET Core, Java, JavaScript, and Node.js platform on an Azure VM, Azure App Services, Azure Service Fabric, Azure Functions, and Azure Cloud Services | Monitor a live web application to automatically detect performance anomalies, identify code exceptions and issues, and collect usability telemetry. 
| Application Insights -Containers | Azure Kubernetes Service/Azure Container Instances | Monitor capacity, availability, and performance of workloads running on containers and container instances. | Azure Monitor for containers -Guest operating system | Linux and Windows VM operating system | Monitor capacity, availability, and performance. Map dependencies hosted on each VM, including the visibility of active network connections between servers, inbound and outbound connection latency, and ports across any TCP-connected architecture. | Azure Monitor for VMs -Azure resources - PaaS | Azure Database services (for example, SQL or mySQL) | Azure Database for SQL performance metrics. | Enable diagnostic logging to stream SQL data to Azure Monitor Logs. -Azure resources - IaaS | 1. Azure Storage
2. Azure Application Gateway
3. Azure Key Vault
4. Network security groups
5. Azure Traffic Manager | 1. Capacity, availability, and performance.
2. Performance and diagnostic logs (activity, access, performance, and firewall).
3. Monitor how and when your key vaults are accessed, and by whom.
4. Monitor events when rules are applied, and the rule counter for how many times a rule is applied to deny or allow.
5. Monitor endpoint status availability. | 1. Storage metrics for Blob storage.
2. Enable diagnostic logging and configure streaming to Azure Monitor Logs.
3.
4. Enable diagnostic logging of network security groups, and configure streaming to Azure Monitor Logs.
5. Enable diagnostic logging of Traffic Manager endpoints, and configure streaming to Azure Monitor Logs. -Network| Communication between your virtual machine and one or more endpoints (another VM, a fully qualified domain name, a uniform resource identifier, or an IPv4 address). | Monitor reachability, latency, and network topology changes that occur between the VM and the endpoint. | Azure Network Watcher -Azure subscription | Azure service health and basic resource health |
  • Administrative actions performed on a service or resource.
  • Service health with an Azure service is in a degraded or unavailable state.
  • Health issues detected with an Azure resource from the Azure service perspective.
  • Operations performed with Azure Autoscale indicating a failure or exception.
  • Operations performed with Azure Policy indicating that an allowed or denied action occurred.
  • Record of alerts generated by Azure Security Center. |Delivered in the Activity Log for monitoring and alerting by using Azure Resource Manager. -Azure tenant|Azure Active Directory || Enable diagnostic logging, and configure streaming to Azure Monitor Logs. - - - -## Hybrid cloud monitoring - -Some organizations aren't ready to embrace the latest DevOps practices and cloud innovations to manage their heterogenous environments with Azure Monitor. For this situation, Microsoft has several strategies intended to support your business and IT operational goals, realizing the need for integration and phased migration from your current tools to Monitor. - -The following are the likely candidates for this scenario: - -- You need to collect data from Azure resources supporting the workload, and forward them to your existing on-premises or managed service provider tools. -- You need to maintain investment in System Center Operations Manager, and configure it to monitor IaaS and PaaS resources running in Azure. Optionally, because you are monitoring two environments with different characteristics, based on your requirements, you determine integrating with Monitor supports your strategy. -- As part of your modernization strategy, you commit to Monitor for monitoring the resources in Azure and on your corporate network. This decision represents an effort to standardize on a single tool, and reduce costs and complexity. - -### Collect and stream monitoring data to third-party or on-premises tools - -To collect metrics and logs from Azure infrastructure and platform resources, you enable Azure diagnostic logs for those resources. For Azure VMs, you can collect metrics and logs from the guest operating system, and other diagnostic data using the Azure Diagnostics extension. 
By using [Event Hubs](https://docs.microsoft.com/azure/azure-monitor/platform/diagnostic-logs-stream-event-hubs), you can stream diagnostic data emitted from Azure resources to your on-premises tools or managed service provider. - -### Monitor with System Center Operations Manager - -This is the best choice if you require a monitoring platform that provides full visibility and holistic health monitoring of the application. This includes monitoring the workload components that have been migrated to Azure and that are still on-premises. The knowledge defined in management packs describes how to monitor the individual dependencies and components. These include the guest operating system (Windows and Linux), the workloads running on the VM (for example, SQL Server and Apache Tomcat), and resources hosted in Azure that use the Azure Management Pack. To describe and measure the end-to-end health of the application, you customize Operations Manager to build a model representing the relationship among the components of the application. This model allows you to view the overall health of the application at any point in time, as well as measure the availability of your application against defined SLAs. - -With Azure or other cloud providers existing as an extension of your own on-premises network, with Operations Manager you can monitor the Linux and Windows VMs as if they were on your corporate network and intranet. At a minimum, monitoring VMs requires deploying the Operations Manager monitoring agent on the VMs. You must also deploy the applicable operating system management pack that supports the version of the operating system installed on the VMs. - -At the application tier, Operations Manager offers basic application performance monitoring capabilities for some versions of .NET and Java. 
If certain applications within your hybrid cloud environment operate in an offline or network-isolated mode, such that they can't communicate with a public cloud service, Operations Manager might be your best option. For applications, hosted both on-premises and in any public cloud, that allow communication through a firewall to Azure, use Azure Monitor Application Insights. This offers deep, code-level monitoring, with first-class support for .NET, .NET Core, Java, JavaScript, and Node.js. - -For any web application that can be reached externally, enable availability monitoring. It's extremely important to know if your application, or a critical HTTP/HTTPS endpoint that your app relies on, is available and responsive. Application Insights availability monitoring allows you to run tests from multiple Azure datacenters, and provide insight into the health of your application from a global perspective. - -After availability monitoring is in place, Application Insights offers two core forms of code level monitoring: live application monitoring or SDK-based monitoring. - -**Live app monitoring** is available for .NET and .NET Core. It allows instrumenting an application without modifying its internal code. You can use live app monitoring to quickly add monitoring to an already deployed live application that you wrote. You can also perform deep application monitoring against third-party .NET applications, where you don’t necessarily have access to the source code. Live app monitoring can collect: - -- Requests and exceptions. -- Dependency diagnostic information, including SQL Command text. -- System performance counters. - -**SDK-based monitoring** integrates the Application Insights SDK directly into your app’s codebase. It's therefore much more flexible, and allows a level of granularity in monitoring that wouldn't otherwise be possible as you can add custom tracking to any part of your code. 
A subset of SDK-based monitoring is *client-side monitoring*, where JavaScript is used to collect information on the customer’s in-browser experience of your web app. This collection allows for deep user behavior analytics of both the server-side and client-side experiences. - -Integrating Azure Monitor with Operations Manager provides several advantages: - -- Monitor Logs delivers a scalable, powerful, integrated analytics platform. It complements the Operations Manager data warehouse database when you want to collect specific and valued performance and log data. Monitor delivers better analytics, performance when querying large data volume, and retention than the Operations Manager data warehouse. The Kusto query language allows you to create much more complex and sophisticated queries, with an ability to run queries across terabytes of data in seconds. You can quickly transform your data into pie charts, time charts, and many other visualizations. No longer are you constrained by working with reports in Operations Manager based on SQL Server Reporting Services, custom SQL queries, or other workarounds to analyze this data. - -- Analyze alerts using the Azure Monitor Alerts Management solution. Alerts generated in the Operations Manager management group are forwarded to the Azure Monitor Logs Analytics workspace. You can configure the subscription responsible for forwarding alerts from Operations Manager to Monitor Logs to only forward certain alerts. For example, you can forward only alerts that meet your criteria for querying in support of problem management for trends, and investigation of the root cause of failures or problems, through a single pane of glass. Additionally, you can correlate other log data from Application Insights or other sources, to gain insight to help improve user experience, increase uptime, and reduce time to resolve incidents. 
- -- Use the System Center Operations Manager Health Check solution to assess the risk and health of your System Center Operations Manager management group on a regular interval. - -- With the Map feature of Azure Monitor for VMs, you can monitor standard connectivity metrics from network connections between your Azure VMs and on-premises VMs. These metrics include response time, requests per minute, traffic throughput, and links. You can identify failed connections, troubleshoot, perform migration validation, perform security analysis, and verify the overall architecture of the service. Map can automatically discover application components on Windows and Linux systems, and map the communication between services. This helps you identify connections and dependencies you were unaware of, plan and validate migration to Azure, and minimize speculation during incident resolution. - -- By using Network Performance Monitor, monitor the network connectivity between: - - - Your corporate network and Azure. - - Mission critical multitier applications and micro-services. - - User locations and web-based applications (HTTP/HTTPs). - - This strategy can deliver visibility of the network layer, without the need for SNMP. It can also present in an interactive topology map the hop-by-hop topology of routes between the source and destination endpoint. It can be a better choice than attempting to accomplish the same result with network monitoring in Operations Manager, or other network monitoring tools currently used in your organization. - -### Monitor with Azure Monitor - -Use Azure Monitor to migrate from one or more on-premises enterprise monitoring tools, as part of your cloud migration strategy. But understand that it was not designed with the intention of replacing a mature product like Operations Manager. There are distinct features available in Operations Manager and other on-premises enterprise monitoring platforms that Monitor doesn’t provide. 
- -- Azure Monitor for VMs introduces health monitoring in the cloud, but doesn't support monitoring VMs outside of Azure, nor other infrastructure or platform resources supporting the application. Additionally, it doesn't include support for creating the custom health monitoring criteria you might need to meet your monitoring requirements. -- Azure Monitor doesn't include the notion of a service model that represents the components and relationships among them. Instead, you enable data collection from each resource, and configure your monitoring logic after the data is written to the metric or logs store. -- You can't convert your monitoring configuration from System Center Operations Manager into a Resource Manager template. For example, if you want to transition the monitoring configuration from management packs targeting the guest operating system and workloads running on the VM, there is no conversion tool available. -- You can't suppress alerts during planned or emergency maintenance windows. -- Visualizing data in Azure Monitor is delivered by using several different features in Azure. These features include Azure dashboards, Monitor views delivered with monitoring solutions for log data, and workbooks for incident investigation. Each of these visualization methods is applicable across several scenarios. However, there are limitations pertaining to Grafana for rich dashboards, and to exporting the data to Power BI to deliver business and IT-centric dashboards. We recommend using reports for different personas in the organization. -- Centralized and effective management of a predefined monitoring configuration (solutions, data collection, alerting, and visualizations) isn't available. Neither is verification of applying a configuration change, and how to best define targeting and grouping of affected resources. -- Microsoft consolidated alerting in Monitor to deliver a centralized alerting service. 
It takes advantage of cloud services, such as machine learning. Monitor alerting doesn't have some features, such as searching alerts based on a query, customized notification messages, and suppression of alerts. -- Monitor identifies if the agent stops sending data to the service, by using a heartbeat event it generates and sends every 60 seconds. It doesn't monitor and alert other aspects of agent health, such as resource utilization, nor does it monitor and alert if there is latency between the agent to the service. Monitoring the agent requires custom-developed monitoring logic with Azure Monitor to proactively identify symptoms affecting agent performance and reliability. - -## Private cloud monitoring - -You can achieve holistic monitoring of Azure Stack with System Center Operations Manager. Specifically, you can monitor the workloads running in the tenant, the resource level, on the virtual machines, and the infrastructure hosting Azure Stack (physical servers and network switches). You can also achieve holistic monitoring with a combination of [infrastructure monitoring capabilities](/azure/azure-stack/azure-stack-monitor-health) included in Azure Stack. These capabilities help you view health and alerts for an Azure Stack region and the [Azure Monitor service](/azure/azure-stack/user/azure-stack-metrics-azure-data) in Azure Stack, which provides base-level infrastructure metrics and logs for most services. - -If you've already invested in Operations Manager, use the Azure Stack management pack to monitor the availability and health state of Azure Stack deployments. This includes regions, resource providers, updates, update runs, scale units, unit nodes, infrastructure roles, and their instances (logical entities comprised of the hardware resources). It uses the Health and Update resource provider REST APIs to communicate with Azure Stack. 
To monitor physical servers and storage devices, use the OEM vendors' management pack (for example, provided by Lenovo, Hewlett Packard, or Dell). Operations Manager can natively monitor the network switches to collect basic statistics by using the SNMP protocol. Monitoring the tenant workloads is possible with the Azure management pack by following two basic steps. Configure the subscription that you want to monitor, and then add the monitors for that subscription. - -## Summary - -To summarize, the following table highlights monitoring scenarios and how our monitoring platforms support each scenario. - -Scenario | Azure Monitor | Operations Manager -:--|:---|:--- -Infrastructure monitoring | Currently delivers health monitoring experience for Azure VMs, somewhat similar to Operations Manager. | Supports monitoring most of the infrastructure from the corporate network. Tracks availability state, metrics, and alerts for Azure VMs, SQL, and storage via the Azure management pack (polling Azure Resource Manager APIs). -Monitor server workloads | Can collect IIS and SQL Server error logs, Windows events, and performance counters. Requires creating custom queries, alerts, and visualizations. | Supports monitoring most of the server workloads with available management packs. Requires either the Log Analytics Windows agent or Operations Manager agent on the VM, reporting back to the management group on the corporate network. -Web application monitoring | Application Insights includes support for the latest versions of .NET, Java, and other platforms. Comprehensive web application monitoring to detect and help diagnose issues with code, capacity, and responsiveness. | Supports monitoring older versions of .NET and Java web servers. Requires creating a custom management pack by using REST API to query data from Application Insights and stream to Operations Manager. 
-Azure service monitoring | [Azure Service Health](https://docs.microsoft.com/azure/service-health/overview) provides the ability to monitor your service, and how the health of the underlying Azure infrastructure affects your service. | While there is no native monitoring of Azure service health provided today through a management pack, you can create custom workflows to query Azure service health alerts. Use the Azure REST API, and get alerts through your existing notifications. -Network performance monitoring | Azure Monitor Network Insights monitors the Azure networking stack, network performance, and NSGs. Azure Monitor for VM's Map feature includes connectivity metrics between Azure and other environment VMs. | Supports availability checks, and collects basic statistics from network devices by using the SNMP protocol from the corporate network. -Data aggregation | Azure Monitor Logs and alert management support processing data from Operations Manager and other platforms. | Relies on SQL Server Reporting Services pre-canned or custom reports, third-party visualization solutions, or a custom Power BI implementation. There are scale and performance limitations with the Operations Manager data warehouse. Integrate with Azure Monitor Logs as an alternative for data aggregation requirements. Integration is achieved by configuring the Log Analytics connector. -End-to-end diagnostics, root-cause analysis, and timely troubleshooting | Azure Monitor delivers end-to-end diagnostics and root-cause analysis for developer and IT operations, from your cloud and on-premises environments. It does this through several features and tools that provide valuable insights into your applications and other resources that they depend on.| Supports end-to-end diagnostics and troubleshooting only for on-premises infrastructure and applications. It uses other System Center components or partner solutions. 
-Experiences – Dashboards, reports, integrations with IT/DevOps tools | Supports integration with Azure dashboards, Power BI, Grafana, and integration with ITSM tools to forward collected data and alerts. | Supports dashboards natively, or by using partner solutions from Squared Up and Savision. Integrates with ITSM tools by using custom code, System Center Orchestrator, or partner solutions based on the Operations Manager SDK. - -## Next steps - -> [!div class="nextstepaction"] -> [Collecting the right data](./data-collection.md) diff --git a/docs/cloud-adoption/operations/monitor/cloud-models-monitor-overview.md b/docs/cloud-adoption/operations/monitor/cloud-models-monitor-overview.md new file mode 100644 index 00000000000..139d58712b9 --- /dev/null +++ b/docs/cloud-adoption/operations/monitor/cloud-models-monitor-overview.md @@ -0,0 +1,59 @@ +--- +title: Cloud monitoring guide – Monitoring strategy for cloud deployment models +titleSuffix: Microsoft Cloud Adoption Framework for Azure +description: Choose when to use Azure Monitor or System Center Operations Manager in Microsoft Azure +author: MGoedtel +ms.author: magoedte +ms.date: 07/31/2019 +ms.topic: guide +ms.service: cloud-adoption-framework +ms.subservice: operate +services: azure-monitor +--- + +# Cloud monitoring guide: Monitoring strategy for cloud deployment models + +This article includes our recommended monitoring strategy for each of the cloud deployment models, based on the following criteria: + +- You require continued commitment to Operations Manager or other enterprise monitoring platform. This is because of integration with your IT operations processes, knowledge and expertise, or because certain functionality isn't available yet in Azure Monitor. +- You have to monitor workloads both on-premises and in the public cloud, or just in the cloud. +- Your cloud migration strategy includes modernizing IT operations and moving to our cloud monitoring services and solutions. 
+- You might have critical systems that are air-gapped or physically isolated, hosted in a private cloud or on physical hardware, and need to be monitored. + +Our strategy includes support for monitoring infrastructure (compute, storage, and server workloads), application (end-user, exceptions, and client), and network resources to deliver a complete, service-oriented monitoring perspective. + +## Azure cloud monitoring + +Azure Monitor is the platform service that provides a single source for monitoring Azure resources. It's designed for cloud solutions that are built on Azure, and that support a business capability that is based on VM workloads or complex architectures that use microservices and other platform resources. It monitors all layers of the stack, starting with tenant services such as Azure Active Directory Domain Services, and subscription-level events and Azure service health. It also monitors infrastructure resources like VMs, storage, and network resources, and, at the top layer, your application. Monitoring each of these dependencies, and collecting the right signals that each can emit, gives you the observability of applications and the key infrastructure you need. + +The following table summarizes the recommended approach to monitoring each layer of the stack. + + + +Layer | Resource | Scope | Method +---|---|---|---- +Application | Web-based application running on .NET, .NET Core, Java, JavaScript, and Node.js platform on an Azure VM, Azure App Services, Azure Service Fabric, Azure Functions, and Azure Cloud Services | Monitor a live web application to automatically detect performance anomalies, identify code exceptions and issues, and collect usability telemetry. | Application Insights +Containers | Azure Kubernetes Service/Azure Container Instances | Monitor capacity, availability, and performance of workloads running on containers and container instances. 
| Azure Monitor for containers +Guest operating system | Linux and Windows VM operating system | Monitor capacity, availability, and performance. Map dependencies hosted on each VM, including the visibility of active network connections between servers, inbound and outbound connection latency, and ports across any TCP-connected architecture. | Azure Monitor for VMs +Azure resources - PaaS | Azure Database services (for example, SQL or MySQL) | Azure Database for SQL performance metrics. | Enable diagnostic logging to stream SQL data to Azure Monitor Logs. +Azure resources - IaaS | 1. Azure Storage
    2. Azure Application Gateway
    3. Azure Key Vault
    4. Network security groups
    5. Azure Traffic Manager | 1. Capacity, availability, and performance.
    2. Performance and diagnostic logs (activity, access, performance, and firewall).
    3. Monitor how and when your key vaults are accessed, and by whom.
    4. Monitor events when rules are applied, and the rule counter for how many times a rule is applied to deny or allow.
    5. Monitor endpoint status availability. | 1. Storage metrics for Blob storage.
    2. Enable diagnostic logging and configure streaming to Azure Monitor Logs.
    3. Enable diagnostic logging and configure streaming to Azure Monitor Logs, and enable the [Azure Key Vault Analytics Solution](https://docs.microsoft.com/azure/azure-monitor/insights/azure-key-vault).
    4. Enable diagnostic logging of network security groups, and configure streaming to Azure Monitor Logs.
    5. Enable diagnostic logging of Traffic Manager endpoints, and configure streaming to Azure Monitor Logs. +Network| Communication between your virtual machine and one or more endpoints (another VM, a fully qualified domain name, a uniform resource identifier, or an IPv4 address). | Monitor reachability, latency, and network topology changes that occur between the VM and the endpoint. | Azure Network Watcher +Azure subscription | Azure service health and basic resource health |
  • Administrative actions performed on a service or resource.
  • Service health events when an Azure service is in a degraded or unavailable state.
  • Health issues detected with an Azure resource from the Azure service perspective.
  • Operations performed with Azure Autoscale indicating a failure or exception.
  • Operations performed with Azure Policy indicating that an allowed or denied action occurred.
  • Record of alerts generated by Azure Security Center. |Delivered in the Activity Log for monitoring and alerting by using Azure Resource Manager. +Azure tenant|Azure Active Directory || Enable diagnostic logging, and configure streaming to Azure Monitor Logs. + + + +## Hybrid cloud monitoring + +This section is under development. It will provide a comprehensive set of monitoring recommendations for this cloud model, and will be made available shortly. + +## Private cloud monitoring + +You can achieve holistic monitoring of Azure Stack with System Center Operations Manager. Specifically, you can monitor workloads running in the tenant, at the resource level, and on the virtual machines, as well as the infrastructure hosting Azure Stack (physical servers and network switches). You can also achieve holistic monitoring with a combination of the [infrastructure monitoring capabilities](/azure/azure-stack/azure-stack-monitor-health) included in Azure Stack, which help you view health and alerts for an Azure Stack region, and the [Azure Monitor service](/azure/azure-stack/user/azure-stack-metrics-azure-data) in Azure Stack, which provides base-level infrastructure metrics and logs for most services. + +If you've already invested in Operations Manager, use the Azure Stack management pack to monitor the availability and health state of Azure Stack deployments. This includes regions, resource providers, updates, update runs, scale units, unit nodes, infrastructure roles, and their instances (logical entities composed of the hardware resources). It uses the Health and Update resource provider REST APIs to communicate with Azure Stack. To monitor physical servers and storage devices, use the OEM vendor's management pack (for example, one provided by Lenovo, Hewlett Packard, or Dell). Operations Manager can natively monitor the network switches to collect basic statistics by using SNMP. 
Monitoring the tenant workloads is possible with the Azure management pack by following two basic steps. Configure the subscription that you want to monitor, and then add the monitors for that subscription. + +## Next steps + +> [!div class="nextstepaction"] +> [Collecting the right data](./data-collection.md) diff --git a/docs/cloud-adoption/operations/monitor/data-collection.md b/docs/cloud-adoption/operations/monitor/data-collection.md index 53b07272074..3616781a7cb 100644 --- a/docs/cloud-adoption/operations/monitor/data-collection.md +++ b/docs/cloud-adoption/operations/monitor/data-collection.md @@ -2,7 +2,7 @@ title: Cloud monitoring guide – Collecting the right data titleSuffix: Microsoft Cloud Adoption Framework for Azure description: Choose when to use Azure Monitor or System Center Operations Manager in Microsoft Azure -author: mgoedtel +author: MGoedtel ms.author: magoedte ms.date: 06/26/2019 ms.topic: guide @@ -42,7 +42,7 @@ To drive quicker resolution of the incident, consider the following recommendati Embracing this guiding set of principles gives you near real-time insights, as well as better management of your service. 
-## Next step +## Next steps > [!div class="nextstepaction"] > [Alerting strategy](./alert.md) diff --git a/docs/cloud-adoption/operations/monitor/index.md b/docs/cloud-adoption/operations/monitor/index.md index e50d7d953d8..c612b0acf71 100644 --- a/docs/cloud-adoption/operations/monitor/index.md +++ b/docs/cloud-adoption/operations/monitor/index.md @@ -2,9 +2,9 @@ title: Cloud monitoring guide titleSuffix: Microsoft Cloud Adoption Framework for Azure description: Overview of Azure Monitor and System Center Operations Manager -author: mgoedtel +author: MGoedtel ms.author: magoedte -ms.date: 06/26/2019 +ms.date: 07/31/2019 ms.topic: guide ms.service: cloud-adoption-framework ms.subservice: operate @@ -23,7 +23,7 @@ This digital transformation is also enabling an opportunity to modernize your in Stakeholders want to use cloud-based, software as a service (SaaS) monitoring and management tools. They need to understand what services and solutions deliver in order to achieve end-to-end visibility, reduce costs, and focus less on infrastructure and maintenance of traditional software-based IT operations tools. -However, IT often prefers to use the tools they have already made a significant investment in. This supports their service operations processes to monitor both, with the eventual goal of transitioning to a SaaS-based offering. This choice is not only because it takes time planning, resources, and funding to switch. It's also due to confusion about which products or Azure services are appropriate or applicable to achieve the transition. +However, IT often prefers to use the tools they have already made a significant investment in. This supports their service operations processes to monitor both cloud models, with the eventual goal of transitioning to a SaaS-based offering. This choice is not only because it takes time planning, resources, and funding to switch. 
It's also due to confusion about which products or Azure services are appropriate or applicable to achieve the transition. The goal of this guide is to provide a detailed reference to help enterprise IT managers, business decision makers, application architects, and application developers understand: @@ -33,6 +33,8 @@ The goal of this guide is to provide a detailed reference to help enterprise IT This guide isn't a how-to guide for using or configuring individual Azure services and solutions, but does reference those sources when applicable or available. After reading this guide, you'll understand how to successfully operate a workload following recommended practices and patterns. +If you're unfamiliar with Azure Monitor and System Center Operations Manager, and you'd like to understand what makes each unique and how they compare before going any further, review the [Overview of our monitoring platforms](./platform-overview.md). + ## Audience This guide is primarily useful for enterprise administrators, IT operations, IT security and compliance, application architects, workload development owners, and workload operations owners. @@ -42,8 +44,7 @@ This guide is primarily useful for enterprise administrators, IT operations, IT This article is part of a series. 
The following articles are meant to be read together, in order: * Introduction (this article) -* [Overview of the Azure monitoring platform](./platform-overview.md) -* [Monitoring Azure cloud applications](./cloud-app-howto.md) +* [Monitoring strategy for cloud deployment models](./cloud-models-monitor-overview.md) * [Collecting the right data](./data-collection.md) * [Alerting](./alert.md) @@ -53,14 +54,14 @@ A selection of software and services are available to monitor and manage a varie * System Center Operations Manager * Azure Monitor, which now includes Log Analytics and Application Insights -* Azure Blueprints and Azure Policy +* Azure Policy and Azure Blueprints * Azure Automation * Azure Logic Apps * Azure Event Hubs -A large part of this guide discusses and contrasts Azure Monitor to System Center Operations Manager. +This first version of the guide covers our current monitoring platforms, Azure Monitor and System Center Operations Manager, and outlines our recommended strategy for monitoring each of the cloud deployment models. Also included is the first set of monitoring recommendations, starting with data collection and alerting. 
-## Next step +## Next steps > [!div class="nextstepaction"] -> [Overview of the Azure monitoring platform](./platform-overview.md) +> [Monitoring strategy for cloud deployment models](./cloud-models-monitor-overview.md) diff --git a/docs/cloud-adoption/operations/monitor/platform-overview.md b/docs/cloud-adoption/operations/monitor/platform-overview.md index b18abeb8051..0c53384d459 100644 --- a/docs/cloud-adoption/operations/monitor/platform-overview.md +++ b/docs/cloud-adoption/operations/monitor/platform-overview.md @@ -1,78 +1,69 @@ --- -title: Cloud monitoring guide – Azure monitoring platform overview +title: Cloud monitoring guide – monitoring platforms overview titleSuffix: Microsoft Cloud Adoption Framework for Azure description: Choose when to use Azure Monitor or System Center Operations Manager in Microsoft Azure -author: mgoedtel +author: MGoedtel ms.author: magoedte -ms.date: 06/26/2019 +ms.date: 07/31/2019 ms.topic: guide ms.service: cloud-adoption-framework ms.subservice: operate services: azure-monitor --- -# Cloud monitoring guide: Overview of the Azure monitoring platform +# Cloud monitoring guide: Overview of our monitoring platforms -Microsoft provides a range of monitoring capabilities from two products: System Center Operations Manager for on-premises environments, and Azure Monitor for the cloud. These offerings deliver core monitoring services, such as alerting, service uptime tracking, application and infrastructure health monitoring, diagnostics, and analytics. +Microsoft provides a range of monitoring capabilities from two products: System Center Operations Manager, which was designed for on-premises environments and then extended to the cloud, and Azure Monitor, which was designed for the cloud but can also monitor on-premises systems. These two offerings deliver core monitoring services, such as alerting, service uptime tracking, application and infrastructure health monitoring, diagnostics, and analytics. 
Many organizations are embracing the latest practices for DevOps agility and cloud innovations to manage their heterogenous environments. Yet they are also concerned about their ability to make appropriate and responsible decisions regarding how to monitor those workloads. -This article compares the current offerings available, and outlines our recommended strategy based on the following factors: - -- Your current investment in Operations Manager or other monitoring platforms on your corporate network, and whether it’s tightly integrated with your IT operations processes. -- Your migration approach to Azure. This approach can be a staged one, extending your corporate network to Azure, or redesigning applications and services to run natively in Azure by using a combination of IaaS and PaaS resources. - -Our strategy includes support for monitoring infrastructure (compute, storage, and server workloads), application (end-user, exceptions, and client), and network resources to deliver a complete, service-oriented monitoring perspective. - -Every journey has a story. Before we dive into the detailed overview, let’s start with a brief look at where it all began when we entered the monitoring field, and how our strategy has evolved over time. +This article provides a high-level overview of our monitoring platforms to help you understand how both deliver core monitoring functionality. ## Story of System Center Operations Manager -In 2000, we entered the operations management field with Microsoft Operations Manager (MOM) 2000. In 2007, we introduced a reengineered version of the product named System Center Operations Manager. It moved beyond simple monitoring of a Windows server and concentrated on robust, end-to-end service and application monitoring, including heterogenous platforms, network devices, and other application or service dependencies. 
It's an established, enterprise-grade monitoring platform for on-premises environments, in the same class as IBM Tivoli or HP Operations Manager in the industry. It has evolved to support monitoring compute and platform resources running in Azure, Amazon Web Services (AWS), and other cloud providers. +In 2000, we entered the operations management field with Microsoft Operations Manager (MOM) 2000. In 2007, we introduced a re-engineered version of the product named System Center Operations Manager. It moved beyond simple monitoring of a Windows server and concentrated on robust, end-to-end service and application monitoring, including heterogenous platforms, network devices, and other application or service dependencies. It's an established, enterprise-grade monitoring platform for on-premises environments, in the same class as IBM Tivoli or HP Operations Manager in the industry. It has evolved to support monitoring compute and platform resources running in Azure, Amazon Web Services (AWS), and other cloud providers. -Operations Manager monitors these different platforms, devices, applications, and infrastructure services with a management pack. Defined in a management pack are all the elements required for monitoring an IT service or application, including the service model, health model, discovery rules, views, and reports. After the management pack is imported into the management group, it automatically detects if the agent is running the components supporting the application or IT service. If so, it runs the workflows that proactively monitor them. +## Story of Azure Monitor -As an extensible platform, Operations Manager is supported by a broad ecosystem of partners and communities that provide a variety of solutions. These solutions include: +When Azure was released in 2010, monitoring of cloud services was provided with the Azure Diagnostics agent, which delivered a way to collect diagnostic data from Azure resources. 
This capability was considered a general monitoring tool rather than an enterprise-class monitoring platform. -- Management packs to monitor non-Microsoft applications, vendor hardware, and other technologies, to deliver full-stack monitoring or to automate and extend features of Operations Manager. -- Visualization products that include advanced dashboarding capabilities and engaging data visualizations. These help to easily analyze the data on any browser or device in the enterprise. -- Integration with other ITSM products or orchestration tools, to support incident recording, configuration management, and incident autoremediation. -- Custom code to automate and extend Operations Manager, by using published APIs. +Application Insights was introduced in response to changes in the industry: the growing proliferation of cloud, mobile, and IoT devices, and the introduction of DevOps practices. It evolved from Application Performance Monitoring in Operations Manager to a service in Azure, where it delivers rich monitoring of web applications written in a variety of languages. In 2015, the preview of Application Insights for Visual Studio was announced; it later became known simply as Application Insights. It collects details on application performance, requests and exceptions, and traces. -## Story of Azure Monitor +In 2015, Azure Operational Insights was made generally available. It delivered the Log Analytics analysis service that collected and searched data from machines in Azure, on-premises, or other cloud environments, and connected to System Center Operations Manager. Intelligence packs delivered different pre-packaged management and monitoring configurations that contained a collection of query and analytic logic, visualizations, and data collection rules for such scenarios as security auditing, health assessments, and alert management. Later, Azure Operational Insights became known as Log Analytics. 
-![Timeline](./media/monitoring-management-guidance-cloud-and-on-premises/timeline-v2-opt.svg) +In 2016, the preview of Azure Monitor was announced at the Microsoft Ignite conference. It provided a common framework to collect platform metrics, resource diagnostics logs, and subscription-level activity log events from any Azure service that started using the framework. Previously, each Azure service had its own monitoring method. -When Azure cloud was released in 2010, monitoring of cloud services was provided with the Azure Diagnostics agent, which delivered a way to collect diagnostic data from Azure compute and platform resources. In addition to viewing this data in the Azure portal, you can use Azure Storage to view the data with one of several available tools, such as Server Explorer in Visual Studio and Azure Storage Explorer. -This capability was considered a general monitoring tool. It lacked many features that were available in enterprise-class monitoring platforms, and it wasn’t consistent between each service that supported this method. In addition, as new Azure services were released, each provided its own monitoring method. Azure services lacked an overall, common monitoring methodology and framework. For these reasons, we began working on Azure Insights. It provided a common framework to collect platform metrics and logs from any Azure service that started using the framework. +At the Microsoft Ignite conference in 2018, we announced that the Azure Monitor brand had expanded to include several different services originally developed with independent functionality: +* The original **Azure Monitor** functionality of collecting platform metrics, resource diagnostics logs, and activity logs for only Azure platform resources. +* **Application Insights** for application monitoring. +* **Log Analytics** as the primary location for collection and analysis of log data. 
+* A new **unified alerting service** that brought together alert mechanisms from each of the other services mentioned earlier. 
+* **Azure Network Watcher** to monitor, diagnose, and view metrics for resources in an Azure virtual network. -In 2015, Azure Operational Insights was made generally available. It delivered the Log Analytics analysis service that collected and searched data from machines in Azure, on-premises, or other cloud environments, and it connected to System Center Operations Manager. Intelligence packs delivered different pre-packaged management and monitoring configurations. These contained a collection of query and analytic logic, visualizations, and data collection rules for such scenarios as capacity planning, security auditing, health assessments, and alert management. As part of this release, Azure Operational Insights was added to the new Operations Management Suite (OMS). OMS intended to provide a unified IT management experience for organizations by bringing together a collection of IT management solutions, including automation, backup, recovery, and security. +## Story of Operations Management Suite (OMS) -Early on, Microsoft realized the need for a common monitoring framework and began working on Azure Insights. This service collected and routed platform metrics and logs from any Azure service into a central pipeline. In 2016, we rebranded Azure Insights as Azure Monitor, which was released in March 2017. Even before release, we began consolidation of the various alerting capabilities into the newly branded Azure Monitor. +From 2015 until April 2018, Operations Management Suite (OMS) was a bundle of the following Azure management services for licensing purposes: -In 2018, Azure Monitor expanded to include several different services that were originally developed for independent functionality: +* Application Insights +* Azure Automation +* Azure Backup +* Operational Insights (later rebranded as Log Analytics) +* Site Recovery -- **Log Analytics** provided rich analytical insight of log data from services and applications. 
It shared the same agent as Operations Manager to collect monitoring data from VMs both in the cloud and on-premises. Monitoring solutions in Log Analytics were like management packs in Operations Manager, providing packaged logic to monitor a particular product or service. -- **Application Insights** grew from Application Performance Monitoring in Operations Manager. It provided rich monitoring of web applications written in a variety of languages. Application Insights collects details on application performance, requests and exceptions, and traces. -- **Azure Insights** (briefly branded Azure Monitor) provided core metrics and resource diagnostics logs for only Azure platform resources. It also included alerting and notification based on those metrics. The notification system included the ability to integrate with partner applications through webhooks and ITSM software. Later, it also integrated with Azure Logic Apps and Azure Automation, in addition to common notification methods such as voice, SMS, and email. -- **Azure Network Watcher** to monitor, diagnose, and view metrics for resources in an Azure virtual network. - -Now that you understand the history, let’s review the way Operations Manager monitors applications and infrastructure services, and understand the advantages Azure Monitor provides as a SaaS monitoring platform for organizations. +The functionality of the services that were part of OMS did not change when OMS was discontinued; the services were realigned under Azure Monitor. ## Infrastructure requirements ### Operations Manager -Operations Manager requires significant infrastructure to support a management group, which is a basic unit of functionality. At a minimum, a management group consists of one or more management servers, a SQL Server, hosting the operational and reporting data warehouse database, and agents. 
The complexity of a management group design depends on a number of factors, such as the scope of workloads to monitor, how many devices or computers support the workloads, and if you require high availability and site resiliency. +Operations Manager requires significant infrastructure and maintenance to support a management group, which is a basic unit of functionality. At a minimum, a management group consists of one or more management servers, a SQL Server, hosting the operational and reporting data warehouse database, and agents. The complexity of a management group design depends on a number of factors, such as the scope of workloads to monitor, and how many devices or computers support the workloads. If you require high availability and site resiliency, as is commonly the case with enterprise monitoring platforms, the infrastructure requirements and associated maintenance can increase dramatically. ![Diagram of Operations Manager management group](./media/monitoring-management-guidance-cloud-and-on-premises/operations-manager-management-group-optimized.svg) -Each of these components requires infrastructure and software within your corporate network that must be maintained. - ### Azure Monitor -Azure Monitor is a SaaS service, where all the infrastructure supporting it is running in the Azure cloud and is managed by Microsoft. It's designed to do monitoring, analytics, and diagnostics at scale, and is available in all national clouds. Core parts of the infrastructure (collectors, metrics and logs store, and analytics) are necessary to support Azure Monitor. +Azure Monitor is a SaaS service, where all the infrastructure supporting it is running in Azure and is managed by Microsoft. It's designed to do monitoring, analytics, and diagnostics at scale, and is available in all national clouds. Core parts of the infrastructure (collectors, metrics and logs store, and analytics) necessary to support Azure Monitor are maintained by Microsoft. 
![Diagram of Azure Monitor](./media/monitoring-management-guidance-cloud-and-on-premises/azure-monitor-greyed-optimized.svg) @@ -82,7 +73,7 @@ Azure Monitor is a SaaS service, where all the infrastructure supporting it is r #### Agents -Operations Manager only collects data directly from agents installed on [Windows computers](https://docs.microsoft.com//system-center/scom/plan-planning-agent-deployment?view=sc-om-1807#windows-agent). It can accept data from the Operations Manager SDK, but this is typically used for partners extending the product with custom applications, not for collecting monitoring data. It collects data from other sources, such as [Linux computers](https://docs.microsoft.com/system-center/scom/plan-planning-agent-deployment?view=sc-om-1807#linuxunix-agent) and network devices, by using special modules that run on the Windows agent remotely accessing these other devices. +Operations Manager only collects data directly from agents installed on [Windows computers](https://docs.microsoft.com//system-center/scom/plan-planning-agent-deployment?view=sc-om-1807#windows-agent). It can accept data from the Operations Manager SDK, but this is typically used for partners extending the product with custom applications, not for collecting monitoring data. It can collect data from other sources, such as [Linux computers](https://docs.microsoft.com/system-center/scom/plan-planning-agent-deployment?view=sc-om-1807#linuxunix-agent) and network devices, by using special modules that run on the Windows agent remotely accessing these other devices. ![Diagram of Operations Manager agent](./media/monitoring-management-guidance-cloud-and-on-premises/data-collection-opsman-agents-optimized.svg) @@ -90,15 +81,15 @@ The Operations Manager agent can collect from multiple data sources on the local #### Management packs -Operations Manager performs all monitoring with workflows (rules, monitors, and discoveries). 
These are packaged together in a [management pack](https://docs.microsoft.com/system-center/scom/manage-overview-management-pack?view=sc-om-2019), and deployed to agents. Management packs are available for a variety of products and services that include predefined rules and monitors. You can also author your own management pack for your own applications and custom scenarios. +Operations Manager performs all monitoring with workflows (rules, monitors, and object discoveries). These are packaged together in a [management pack](https://docs.microsoft.com/system-center/scom/manage-overview-management-pack?view=sc-om-2019), and deployed to agents. Management packs are available for a variety of products and services that include predefined rules and monitors. You can also author your own management pack for your own applications and custom scenarios. -#### Workflows +#### Monitoring configuration -Management packs can contain hundreds of workflows, and a single agent simultaneously runs all workflows from all the management packs it has loaded. Each instance of each workflow runs independently, and acts immediately on the data that it collects. This is how Operations Manager can achieve near real-time alerting and the current health state of monitored resources. +Management packs can contain hundreds of rules, monitors, and object discovery rules. An agent runs all these monitoring settings from all the management packs that apply, which are determined by discovery rules. Each instance of each monitoring setting runs independently, and acts immediately on the data that it collects. This is how Operations Manager can achieve near real-time alerting and the current health state of monitored resources. -For example, a monitor might sample a performance counter every few minutes. If that counter exceeds a threshold, it immediately sets the health state of its target object, which immediately triggers an alert. 
A scheduled rule might watch for a particular event to be created, and immediately fire an alert when that event is created in the local event log. +For example, a monitor might sample a performance counter every few minutes. If that counter exceeds a threshold, it immediately sets the health state of its target object, which immediately triggers an alert in the management group. A scheduled rule might watch for a particular event to be created, and immediately fire an alert when that event is created in the local event log. -Because workflows are isolated from each other and work from the individual sources of data, Operations Manager has challenges correlating data between multiple sources. It’s also difficult to react to data after it’s been collected. You can run workflows that access the Operations Manager database, but this scenario isn’t common and it's typically used for a limited number of special purpose workflows. +Because these monitoring settings are isolated from each other and work from the individual sources of data, Operations Manager has challenges correlating data between multiple sources. It’s also difficult to react to data after it’s been collected. You can run workflows that access the Operations Manager database, but this scenario isn’t common and it's typically used for a limited number of special purpose workflows. ![Diagram of Operations Manager Management Group](./media/monitoring-management-guidance-cloud-and-on-premises/operations-manager-management-group-optimized.svg) @@ -112,15 +103,15 @@ Azure Monitor collects data from a variety of sources, including Azure infrastru Monitoring solutions use the logs platform in Azure Monitor to provide monitoring for a particular application or service. They typically define data collection from agents or from Azure services, and provide log queries and views to analyze that data. 
They typically don’t provide alert rules, meaning that you must define your own alert criteria based on collected data. -Insights use the logs and metrics platform of Azure Monitor to provide a customized monitoring experience for an application or service in the Azure portal. They might provide health monitoring and alerting conditions, in addition to customized analysis of collected data. +Insights, such as Azure Monitor for containers and Azure Monitor for VMs, use the logs and metrics platform of Azure Monitor to provide a customized monitoring experience for an application or service in the Azure portal. They might provide health monitoring and alerting conditions, in addition to customized analysis of collected data. -#### Workflows +#### Monitoring configuration -Azure Monitor separates data collection from actions taken against that data, which supports distributed microservices in a cloud environment. It consolidates data from multiple sources into a common data platform, and provides features or performing such monitoring tasks as analysis, visualization, and alerting base on that collected data. +Azure Monitor separates data collection from actions taken against that data, which supports distributed microservices in a cloud environment. It consolidates data from multiple sources into a common data platform, and provides analysis, visualization, and alerting capabilities based on the collected data. -Monitor stores all data collected either as logs or as metrics, and different features of Monitor rely on either. Metrics contain numerical values in time series that are well suited for near real-time alerting and fast detection of issues. Logs contain text or numerical data, and are supported by a powerful query language that make them especially useful for performing complex analysis. +All data collected by Azure Monitor is stored as either logs or metrics, and different features of Monitor rely on either. 
Metrics contain numerical values in time series that are well suited for near real-time alerting and fast detection of issues. Logs contain text or numerical data, and are supported by a powerful query language that make them especially useful for performing complex analysis.
-Because Monitor separates data collection from actions against that data, it might not be able to provide near real-time alerting in many cases. You must retrieve all log data by using a log query, which is scheduled in alerts. This behavior allows Monitor to easily correlate data from all monitored sources, and you can interactively analyze data in a variety of ways.
+Because Monitor separates data collection from actions against that data, it might not be able to provide near real-time alerting in many cases. To alert on log data, queries are run on a recurring schedule defined in the alert. This behavior allows Azure Monitor to easily correlate data from all monitored sources, and you can interactively analyze data in a variety of ways. This is especially helpful for doing root cause analysis and identifying where else an issue may occur.
 ## Health monitoring
@@ -130,10 +121,12 @@ Management Packs in Operations Manager include a service model that describes th
 ### Azure Monitor
-Azure Monitor doesn’t provide a standard means of implementing a service model or monitors that indicate the current health state of any service components. Because monitoring solutions are based on standard features of Azure Monitor, they don’t provide state level monitoring. The following features of Azure Monitor can be helpful:
+Azure Monitor doesn’t provide a user definable method of implementing a service model or monitors that indicate the current health state of any service components. Because monitoring solutions are based on standard features of Azure Monitor, they don’t provide state level monitoring.
The following features of Azure Monitor can be helpful:
+ - **Application Insights** builds a composite map of your web application, and provides a health state for each application component or dependency. This includes alerts status and drill-down to more detailed diagnostics of your application.
-- **Application Insights** builds a composite map of your web application, and provides a health state for each application component or dependency. This includes alerts status and drill-down to more detailed diagnostics, if your app uses Azure services.
 - **Azure Monitor for VMs** delivers a health monitoring experience for the guest Azure VMs, similar to Operations Manager, when monitoring Windows and Linux virtual machines. It evaluates the health of key operating system components from the perspective of availability and performance to determine the current health state. When it determines the guest VM is experiencing sustained resource utilization, disk space capacity, or an issue related to core operating system functionality, it generates an alert to bring this state to your attention.
+ - **Azure Monitor for containers** monitors the performance and health of Azure Kubernetes Services or Azure Container Instances. It collects memory and processor metrics from controllers, nodes, and containers that are available in Kubernetes through the Metrics API. It also collects container logs, and inventory data about containers and their images. Pre-defined health criteria based on the performance data collected help you identify if there is a resource bottleneck or capacity issue. You can also understand the overall performance, or the performance from a specific Kubernetes object type (pod, node, controller, or container).
 ## Analyzing data
@@ -142,29 +135,29 @@ Azure Monitor doesn’t provide a standard means of implementing a service model
 Operations Manager provides four basic ways to analyze data after it’s collected.
-- With **Health Explorer**, you can find out which monitor is reflecting a health state issue and review knowledge about the monitor and possible causes for actions related to it.
+- With **Health Explorer**, you can find out which monitors are identifying a health state issue and review knowledge about the monitor and possible causes for actions related to it.
 - **Views** are predefined visualizations of collected data, such as a graph of performance data or a list of monitored components and their current health state. Diagram views visually present the service model of an application.
-- **Reports** allow you to summarize historical data stored in the Operations Manager data warehouse. You can customize the data that views and reports are based on, but there is no feature to allow for complex or interactive analysis of collected data.
+- **Reports** allow you to summarize historical data stored in the Operations Manager data warehouse. You can customize the data that views and reports are based on. However, there is no feature to allow for complex or interactive analysis of collected data.
 - **Operations Manager Command Shell**, which extends Windows PowerShell with an additional set of cmdlets, can query and visualize collected data. This includes graphs and other visualizations, natively with PowerShell, or with the Operations Manager HTML-based web console.
 ### Azure Monitor
-Azure Monitor has a powerful analytics engine that allows you to interactively work with detailed or aggregated log data. Views and dashboards allow you to visualize query data in different ways from the Azure portal, and import into Power BI. Monitoring solutions include queries and views to present the data they collect. Insights such as Application Insights, Azure Monitor for VMs, and Azure Monitor for containers include customized visualizations to support interactive monitoring scenarios.
+Azure Monitor has a powerful analytics engine that allows you to interactively work with log data and combine them with other monitoring data for trending and other data analysis. Views and dashboards allow you to visualize query data in different ways from the Azure portal, and import into Power BI. Monitoring solutions include queries and views to present the data they collect. Insights such as Application Insights, Azure Monitor for VMs, and Azure Monitor for containers include customized visualizations to support interactive monitoring scenarios.
 ## Alerting
 ### Operations Manager
-Operations Manager creates alerts in response to important events on an agent, when a performance threshold is crossed, and when the health state of a monitored component changes. It includes complete management of alerts, allowing you to set their resolution and assign them to different operators. You can set notification rules that specify which alerts will send proactive notifications.
+Operations Manager creates alerts in response to pre-defined events, when a performance threshold is met, and when the health state of a monitored component changes. It includes complete management of alerts, allowing you to set their resolution and assign them to different operators or system engineers. You can set notification rules that specify which alerts will send proactive notifications.
-Management packs include various predefined alerting rules for different critical conditions in the application being monitored. You can tune these rules to the particular requirements of your environment.
+Management packs include various predefined alerting rules for different critical conditions in the application being monitored. You can tune these rules or create custom rules to the particular requirements of your environment.
 ### Azure Monitor
-Azure Monitor allows you to create alerts based on a metric crossing a threshold, or based on a scheduled query returning results.
Alerts based on metrics can achieve near real-time results, while scheduled queries have a longer response time, depending on the speed of data ingestion and indexing. Instead of being limited to a specific agent, log query alerts in Azure Monitor allow you to analyze data across all data stored in multiple workspaces. These alerts also include data from a specific Application Insights app by using a cross-workspace query.
+Azure Monitor allows you to create alerts based on a metric crossing a threshold, or based on a scheduled query result. Alerts based on metrics can achieve near real-time results, while scheduled queries have a longer response time, depending on the speed of data ingestion and indexing. Instead of being limited to a specific agent, log query alerts in Azure Monitor allow you to analyze data across all data stored in multiple workspaces. These alerts also include data from a specific Application Insights app by using a cross-workspace query.
 While monitoring solutions can include alert rules, typically you create them based on your own requirements.
@@ -186,7 +179,7 @@ Azure Monitor separates data collection from actions and analysis taken from tha
 Operations Manager implements all monitoring logic in a management pack, which you either create yourself or obtain from us or a partner. When you install a management pack, it automatically discovers components of the application or service on different agents, and deploys appropriate rules and monitors. The management pack contains health definitions, alert rules, performance and event collection rules, and views, to provide complete monitoring supporting the infrastructure service or application.
-The Operations Manager SDK enables Operations Manager to integrate with third-party monitoring platforms or ITSM software.
The SDK is also used by some partner management packs to support monitoring network devices, and deliver custom presentation experiences like the Squared Up HTML5 dashboard or our Visio integration.
+The Operations Manager SDK enables Operations Manager to integrate with third-party monitoring platforms or ITSM software. The SDK is also used by some partner management packs to support monitoring network devices, and deliver custom presentation experiences like the Squared Up HTML5 dashboard or integration with Office Visio.
 ### Azure Monitor
@@ -194,7 +187,7 @@ Azure Monitor collects metrics and logs from Azure resources, with little to no
 Monitor supports several methods to collect monitoring or management data from Azure or external resources. You can then extract and forward data from the metric or log stores to your ITSM or monitoring tools, or perform administrative tasks by using the Azure Monitor REST API.
-## Next step
+## Next steps
 > [!div class="nextstepaction"]
-> [Monitoring Azure cloud applications](./cloud-app-howto.md)
+> [Monitoring the cloud deployment models](./cloud-models-monitor-overview.md)
diff --git a/docs/example-scenario/apps/sap-production.md b/docs/example-scenario/apps/sap-production.md
index c7e630bb944..289d406c401 100644
--- a/docs/example-scenario/apps/sap-production.md
+++ b/docs/example-scenario/apps/sap-production.md
@@ -81,13 +81,13 @@ Extra Large|250000|M64s|6xP30, 1xP30|DS11_v2|1x P10|10x DS14_v2|1x P10|[Extra La
 > [!NOTE]
 > This pricing is a guide and only indicates the VMs and storage costs. It excludes networking, backup storage, and data ingress/egress charges.
-- [Small](https://azure.com/e/45880ba0bfdf47d497851a7cf2650c7c): A small system consists of VM type DS13_v2 for the database server with 8x vCPUs, 56-GB RAM, and 112-GB temp storage, additionally five 512-GB premium storage disks. An SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs 14-GB RAM and 28-GB temp storage.
A single VM type DS13_v2 for the SAP application server with 8x vCPUs, 56-GB RAM, and 400-GB temp storage, additionally one 128-GB premium storage disk.
+- [Small](https://azure.com/e/45880ba0bfdf47d497851a7cf2650c7c): A small system consists of VM type DS13_v2 for the database server with 8x vCPUs, 56-GB RAM, and 112 GB of temporary storage, along with five 512-GB premium storage disks; an SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs, 14-GB RAM, and 28 GB of temporary storage; and a single VM type DS13_v2 for the SAP application server with 8x vCPUs, 56-GB RAM, and 400 GB of temporary storage, along with one 128-GB premium storage disk.
-- [Medium](https://azure.com/e/9a523f79591347ca9a48c3aaa1406f8a): A medium system consists of VM type DS14_v2 for the database server with 16x vCPUs, 112 GB RAM, and 800-GB temp storage, additionally seven 512-GB premium storage disks. An SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs 14-GB RAM and 28-GB temp storage. Four VM type DS13_v2 for the SAP application server with 8x vCPUs, 56-GB RAM, and 400-GB temp storage, additionally one 128-GB premium storage disk.
+- [Medium](https://azure.com/e/9a523f79591347ca9a48c3aaa1406f8a): A medium system consists of VM type DS14_v2 for the database server with 16x vCPUs, 112 GB RAM, and 800 GB of temporary storage, along with seven 512-GB premium storage disks; an SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs 14-GB RAM and 28 GB of temporary storage; four VM type DS13_v2 for the SAP application server with 8x vCPUs, 56-GB RAM, and 400 GB of temporary storage, along with one 128-GB premium storage disk.
-- [Large](https://azure.com/e/f70fccf571e948c4b37d4fecc07cbf42): A large system consists of VM type E32s_v3 for the database server with 32x vCPUs, 256-GB RAM and 800-GB temp storage, additionally three 512 GB and one 128-GB premium storage disks.
An SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs 14-GB RAM and 28-GB temp storage. Six VM type DS14_v2 for the SAP application servers with 16x vCPUs, 112 GB RAM, and 224 GB temp storage, additionally six 128-GB premium storage disk.
+- [Large](https://azure.com/e/f70fccf571e948c4b37d4fecc07cbf42): A large system consists of VM type E32s_v3 for the database server with 32x vCPUs, 256-GB RAM, and 800 GB of temporary storage, along with three 512 GB and one 128-GB premium storage disk; an SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs, 14-GB RAM, and 28 GB of temporary storage; six VM type DS14_v2 for the SAP application servers with 16x vCPUs, 112 GB RAM, and 224 GB temporary storage, along with six 128-GB premium storage disks.
-- [Extra Large](https://azure.com/e/58c636922cf94faf9650f583ff35e97b): An extra-large system consists of the M64s VM type for the database server with 64x vCPUs, 1024 GB RAM, and 2000 GB temp storage, additionally seven 1024-GB premium storage disks. An SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs 14-GB RAM and 28-GB temp storage. 10 VM type DS14_v2 for the SAP application servers with 16x vCPUs, 112 GB RAM, and 224 GB temp storage, additionally ten 128-GB premium storage disk.
+- [Extra Large](https://azure.com/e/58c636922cf94faf9650f583ff35e97b): An extra-large system consists of the M64s VM type for the database server with 64x vCPUs, 1024 GB RAM, and 2000 GB of temporary storage, along with seven 1024-GB premium storage disks; an SAP Central Instance server using a DS11_v2 VM types with 2x vCPUs 14-GB RAM and 28 GB of temporary storage; ten VM type DS14_v2 for the SAP application servers with 16x vCPUs, 112 GB RAM, and 224 GB of temporary storage, along with ten 128-GB premium storage disks.
 ## Deployment
diff --git a/docs/example-scenario/data/big-data-with-iot.md b/docs/example-scenario/data/big-data-with-iot.md
index 64ae42eeb5c..9ca0d572e02 100644
--- a/docs/example-scenario/data/big-data-with-iot.md
+++ b/docs/example-scenario/data/big-data-with-iot.md
@@ -15,7 +15,7 @@ social_image_url: /azure/architecture/example-scenario/data/media/architecture-b
 This example scenario is relevant to organizations building solutions that integrate data from many IoT devices into a comprehensive data analysis architecture to improve and automate decision making. Potential applications include construction, mining, manufacturing, or other industry solutions involving large volumes of data from many IoT-based data inputs.
-In this scenario, a construction equipment manufacturer builds vehicles, meters, and drones that use IoT and GPS technologies to emit telemetry data. The company wants to modernize their data architecture to better monitor operating conditions and equipment health. Replacing the company's legacy solution using on-premises infrastructure would be both time and labor intensive, and would not be able to scale sufficiently to handle the anticipated data volume.
+In this scenario, a construction equipment manufacturer builds vehicles, meters, and drones that use IoT and GPS technologies to emit telemetry data. The company wants to modernize their data architecture to better monitor operating conditions and equipment health. Replacing the company's legacy solution using on-premises infrastructure would be both time intensive and labor intensive, and would not be able to scale sufficiently to handle the anticipated data volume.
 The company wants to build a cloud-based "smart construction" solution. It should gather a comprehensive set of data for a construction site and automate the operation and maintenance of the various elements of the site.
The company's goals include:
diff --git a/docs/example-scenario/infrastructure/linux-vdi-citrix.md b/docs/example-scenario/infrastructure/linux-vdi-citrix.md
index 57329f1ba82..35ee900078b 100644
--- a/docs/example-scenario/infrastructure/linux-vdi-citrix.md
+++ b/docs/example-scenario/infrastructure/linux-vdi-citrix.md
@@ -60,8 +60,8 @@ For this scenario, the following SKUs are used:
 ### Components
 - [Azure Virtual Network](/azure/virtual-network/virtual-networks-overview) allows resources such as VMs to securely communicate with each other, the internet, and on-premises networks. Virtual networks provide isolation and segmentation, filter and route traffic, and allow connection between locations. One virtual network will be used for all resources in this scenario.
-- [Azure network security groups](/azure/virtual-network/security-overview) contain a list of security rules that allow or deny inbound or outbound network traffic based on source or destination IP address, port, and protocol. The virtual networks in this scenario are secured with network security group rules that restrict the flow of traffic between the application components.
-- [Azure load balancer](/azure/application-gateway/overview) distributes inbound traffic according to rules and health probes. A load balancer provides low latency and high throughput, and scales up to millions of flows for all TCP and UDP applications. An internal load balancer is used in this scenario to distribute traffic on the Citrix NetScaler.
+- [Network security groups](/azure/virtual-network/security-overview) contain a list of security rules that allow or deny inbound or outbound network traffic based on source or destination IP address, port, and protocol. The virtual networks in this scenario are secured with network security group rules that restrict the flow of traffic between the application components.
+- [Azure Load Balancer](/azure/application-gateway/overview) distributes inbound traffic according to rules and health probes. A load balancer provides low latency and high throughput, and scales up to millions of flows for all TCP and UDP applications. An internal load balancer is used in this scenario to distribute traffic on the Citrix NetScaler.
 - [Azure Hybrid File Sync](https://github.com/MicrosoftDocs/azure-docs/edit/master/articles/storage/files/storage-sync-files-planning.md) will be used for all shared storage. The storage will replicate to two file servers using Hybrid File Sync.
 - [Azure SQL Database](/azure/sql-database/sql-database-technical-overview) is a managed relational database service based on the latest stable version of the Microsoft SQL Server Database Engine. In this example, it is used to host Citrix databases.
 - [ExpressRoute](/azure/expressroute/expressroute-introduction) lets you extend your on-premises networks into the Microsoft cloud over a private connection facilitated by a connectivity provider.
@@ -89,7 +89,7 @@ For this scenario, the following SKUs are used:
 - This example is designed for high availability for all roles other than the licensing server. Because the environment continues to function during a 30-day grace period if the license server is offline, no additional redundancy is required on that server.
 - All servers providing similar roles should be deployed in [Availability Sets](/azure/virtual-machines/windows/manage-availability#configure-multiple-virtual-machines-in-an-availability-set-for-redundancy).
 - This example scenario does not include Disaster Recovery capabilities. [Azure Site Recovery](/azure/site-recovery/site-recovery-overview) could be a good add-on to this design.
-- Consider deploying the VM instances in this scenario across [Availability Zones](/azure/availability-zones/az-overview).
Each availability zone is made up of one or more datacenters equipped with independent power, cooling, and networking. Each enabled region has a minimum of three availability zones. This distribution of VM instances across zones provides high availability to the application tiers. For more information, see [what are Availability Zones in Azure?](/azure/availability-zones/az-overview). You can also [deploy VPN and ExpressRoute gateways in Azure Availability Zones](/azure/vpn-gateway/about-zone-redundant-vnet-gateways).
+- Consider deploying the VM instances in this scenario across [Availability Zones](/azure/availability-zones/az-overview). Each availability zone is made up of one or more datacenters equipped with independent power, cooling, and networking. Each enabled region has a minimum of three availability zones. This distribution of VM instances across zones provides high availability to the application tiers. For more information, see [What are Availability Zones in Azure?](/azure/availability-zones/az-overview). You can also [deploy VPN and ExpressRoute gateways in Azure Availability Zones](/azure/vpn-gateway/about-zone-redundant-vnet-gateways).
 - For a production deployment management solution should be implemented such as [backup](/azure/backup/backup-introduction-to-azure-backup), [monitoring](/azure/monitoring-and-diagnostics/monitoring-overview) and [update management](/azure/automation/automation-update-management).
 - This example should work for about 250 concurrent (about 50-60 per VDA server) users with a mixed usage. But that will greatly depended on the type of applications being used. For production use, rigorous load testing should be performed.
diff --git a/docs/resiliency/recovery-loss-azure-region.md b/docs/resiliency/recovery-loss-azure-region.md
index 46c0972bcfd..633a99f6969 100644
--- a/docs/resiliency/recovery-loss-azure-region.md
+++ b/docs/resiliency/recovery-loss-azure-region.md
@@ -19,7 +19,7 @@ Under rare circumstances, it is possible that facilities in an entire region can
 ### Resource management
-You can distribute compute instances across regions by creating a separate cloud service in each target region, and then publishing the deployment package to each cloud service. However, note that distributing traffic across cloud services in different regions must be implemented by the application developer or with a traffic management service.
+You can distribute compute instances across regions by creating a separate cloud service in each target region, and then publishing the deployment package to each cloud service. However, distributing traffic across cloud services in different regions must be implemented by the application developer or with a traffic management service.
 Determining the number of spare role instances to deploy in advance for disaster recovery is an important aspect of capacity planning. Having a full-scale secondary deployment ensures that capacity is already available when needed; however, this effectively doubles the cost. A common pattern is to have a small, secondary deployment, just large enough to run critical services. This small secondary deployment is a good idea, both to reserve capacity, and for testing the configuration of the secondary environment.
@@ -28,54 +28,54 @@ Determining the number of spare role instances to deploy in advance for disaster
 ### Load Balancing
-To load balance traffic across regions requires a traffic management solution. Azure provides [Azure Traffic Manager](https://azure.microsoft.com/services/traffic-manager/). You can also take advantage of third-party services that provide similar traffic management capabilities.
+To load balance traffic across regions requires a traffic management solution. Azure provides [Azure Traffic Manager](https://azure.microsoft.com/services/traffic-manager). You can also take advantage of third-party services that provide similar traffic management capabilities.
 ### Strategies
 Many alternative strategies are available for implementing distributed compute across regions. These must be tailored to the specific business requirements and circumstances of the application. At a high level, the approaches can be divided into the following categories:
-- **Redeploy on disaster**: In this approach the application is redeployed from scratch at the time of disaster. This is appropriate for non-critical applications that don’t require a guaranteed recovery time.
+- **Redeploy on disaster**: In this approach, the application is redeployed from scratch at the time of disaster. This is appropriate for non-critical applications that don’t require a guaranteed recovery time.
 - **Warm Spare (Active/Passive)**: A secondary hosted service is created in an alternate region, and roles are deployed to guarantee minimal capacity; however, the roles don’t receive production traffic. This approach is useful for applications that have not been designed to distribute traffic across regions.
 - **Hot Spare (Active/Active)**: The application is designed to receive production load in multiple regions. The cloud services in each region might be configured for higher capacity than required for disaster recovery purposes. Alternatively, the cloud services might scale out as necessary at the time of a disaster and fail over. This approach requires substantial investment in application design, but it has significant benefits. These include low and guaranteed recovery time, continuous testing of all recovery locations, and efficient usage of capacity.
-A complete discussion of distributed design is outside the scope of this document.
For further information, see [Disaster Recovery and High Availability for Azure Applications](https://aka.ms/drtechguide).
+A complete discussion of distributed design is outside the scope of this document. For more information, see [Disaster Recovery and High Availability for Azure Applications](https://aka.ms/drtechguide).
 ## Virtual machines
 Recovery of infrastructure as a service (IaaS) virtual machines (VMs) is similar to platform as a service (PaaS) compute recovery in many respects. There are important differences, however, due to the fact that an IaaS VM consists of both the VM and the VM disk.
 - **Use Azure Backup to create cross region backups that are application consistent**.
- [Azure Backup](https://azure.microsoft.com/services/backup/) enables customers to create application consistent backups across multiple VM disks, and support replication of backups across regions. You can do this by choosing to geo-replicate the backup vault at the time of creation. Note that replication of the backup vault must be configured at the time of creation. It can't be set later. If a region is lost, Microsoft will make the backups available to customers. Customers will be able to restore to any of their configured restore points.
+ [Azure Backup](https://azure.microsoft.com/services/backup) enables customers to create application consistent backups across multiple VM disks, and support replication of backups across regions. You can do this by choosing to geo-replicate the backup vault at the time of creation. Replication of the backup vault must be configured at the time of creation. It can't be set later. If a region is lost, Microsoft will make the backups available to customers. Customers will be able to restore to any of their configured restore points.
 - **Separate the data disk from the operating system disk**. An important consideration for IaaS VMs is that you cannot change the operating system disk without re-creating the VM.
This is not a problem if your recovery strategy is to redeploy after disaster. However, it might be a problem if you are using the Warm Spare approach to reserve capacity. To implement this properly, you must have the correct operating system disk deployed to both the primary and secondary locations, and the application data must be stored on a separate drive. If possible, use a standard operating system configuration that can be provided on both locations. After a failover, you must then attach the data drive to your existing IaaS VMs in the secondary DC. Use AzCopy to copy snapshots of the data disk(s) to a remote site.
-- **Be aware of potential consistency issues after a geo-failover of multiple VM Disks**. VM Disks are implemented as Azure Storage blobs, and have the same geo-replication characteristic. Unless [Azure Backup](https://azure.microsoft.com/services/backup/) is used, there are no guarantees of consistency across disks, because geo-replication is asynchronous and replicates independently. Individual VM disks are guaranteed to be in a crash consistent state after a geo-failover, but not consistent across disks. This could cause problems in some cases (for example, in the case of disk striping).
+- **Be aware of potential consistency issues after a geo-failover of multiple VM Disks**. VM Disks are implemented as Azure Storage blobs, and have the same geo-replication characteristic. Unless [Azure Backup](https://azure.microsoft.com/services/backup) is used, there are no guarantees of consistency across disks, because geo-replication is asynchronous and replicates independently. Individual VM disks are guaranteed to be in a crash consistent state after a geo-failover, but not consistent across disks. This could cause problems in some cases (for example, in the case of disk striping).
 ## Storage
-### Recovery by using Geo-Redundant Storage of blob, table, queue and VM disk storage
+### Recovery by using geo-redundant storage of blob, table, queue, and VM disk storage
-In Azure, blobs, tables, queues, and VM disks are all geo-replicated by default. This is referred to as Geo-Redundant Storage (GRS). GRS replicates storage data to a paired datacenter hundreds of miles apart within a specific geographic region. GRS is designed to provide additional durability in case there is a major datacenter disaster. Microsoft controls when failover occurs, and failover is limited to major disasters in which the original primary location is deemed unrecoverable in a reasonable amount of time. Under some scenarios, this can be several days. Data is typically replicated within a few minutes, although synchronization interval is not yet covered by a service level agreement.
+In Azure, blobs, tables, queues, and VM disks are all geo-replicated by default. This is referred to as geo-redundant storage (GRS). GRS replicates storage data to a paired datacenter located hundreds of miles apart within a specific geographic region. GRS is designed to provide additional durability in case there is a major datacenter disaster. Microsoft controls when failover occurs, and failover is limited to major disasters in which the original primary location is deemed unrecoverable in a reasonable amount of time. Under some scenarios, this can be several days. Data is typically replicated within a few minutes, although synchronization interval is not yet covered by a service level agreement.
-In the event of a geo-failover, there will be no change to how the account is accessed (the URL and account key will not change). The storage account will, however, be in a different region after failover. This could impact applications that require regional affinity with their storage account.
Even for services and applications that do not require a storage account in the same datacenter, the cross-datacenter latency and bandwidth charges might be compelling reasons to move traffic to the failover region temporarily. This could factor into an overall disaster recovery strategy. +If a geo-failover occurs, there will be no change to how the account is accessed (the URL and account key will not change). The storage account will, however, be in a different region after failover. This could impact applications that require regional affinity with their storage account. Even for services and applications that do not require a storage account in the same datacenter, the cross-datacenter latency and bandwidth charges might be compelling reasons to move traffic to the failover region temporarily. This could factor into an overall disaster recovery strategy. -In addition to automatic failover provided by GRS, Azure has introduced a service that gives you read access to the copy of your data in the secondary storage location. This is called Read-Access Geo-Redundant Storage (RA-GRS). +In addition to automatic failover provided by GRS, Azure has introduced a service that gives you read access to the copy of your data in the secondary storage location. This is called read-access geo-redundant storage (RA-GRS). -For more information about both GRS and RA-GRS storage, see [Azure Storage replication](/azure/storage/storage-redundancy/). +For more information about both GRS and RA-GRS storage, see [Azure Storage replication](/azure/storage/storage-redundancy). -### Geo-Replication region mappings +### Geo-replication region mappings It is important to know where your data is geo-replicated, in order to know where to deploy the other instances of your data that require regional affinity with your storage. For more information, see [Azure Paired Regions](/azure/best-practices-availability-paired-regions). 
-### Geo-Replication pricing +### Geo-replication pricing -Geo-replication is included in current pricing for Azure Storage. This is called Geo-Redundant Storage (GRS). If you do not want your data geo-replicated you can disable geo-replication for your account. This is called Locally Redundant Storage, and it is charged at a discounted price compared to GRS. +Geo-replication is included in current pricing for Azure Storage. This is called geo-redundant storage (GRS). If you do not want your data geo-replicated, you can disable geo-replication for your account. This is called locally redundant storage (LRS), and it is charged at a discounted price compared to GRS. ### Determining if a geo-failover has occurred -If a geo-failover occurs, this will be posted to the [Azure Service Health Dashboard](https://azure.microsoft.com/status/). Applications can implement an automated means of detecting this, however, by monitoring the geo-region for their storage account. This can be used to trigger other recovery operations, such as activation of compute resources in the geo-region where their storage moved to. You can perform a query for this from the service management API, by using [Get Storage Account Properties](https://msdn.microsoft.com/library/ee460802.aspx). The relevant properties are: +If a geo-failover occurs, this will be posted to the [Azure Service Health Dashboard](https://azure.microsoft.com/status). Applications can implement an automated means of detecting this, however, by monitoring the geo-region for their storage account. This can be used to trigger other recovery operations, such as activation of compute resources in the geo-region where their storage moved to. You can perform a query for this from the service management API, by using [Get Storage Account Properties](https://msdn.microsoft.com/library/ee460802.aspx). 
The relevant properties are: primary-region [Available|Unavailable] @@ -91,19 +91,19 @@ As discussed in the section on VM disks, there are no guarantees for data consis ### SQL Database -Azure SQL Database provides two types of recovery: Geo-Restore and Active Geo-Replication. +Azure SQL Database provides two types of recovery: geo-restore and active geo-replication. -#### Geo-Restore +#### Geo-restore -[Geo-Restore](/azure/sql-database/sql-database-recovery-using-backups/#geo-restore) is also available with Basic, Standard, and Premium databases. It provides the default recovery option when the database is unavailable because of an incident in the region where your database is hosted. Similar to Point-In-Time Restore, Geo-Restore relies on database backups in geo-redundant Azure storage. It restores from the geo-replicated backup copy, and therefore is resilient to the storage outages in the primary region. For more details, see [Restore an Azure SQL Database or failover to a secondary](/azure/sql-database/sql-database-disaster-recovery/). +[Geo-restore](/azure/sql-database/sql-database-recovery-using-backups/#geo-restore) is also available with Basic, Standard, and Premium databases. It provides the default recovery option when the database is unavailable because of an incident in the region where your database is hosted. Similar to point-in-time restore, geo-restore relies on database backups in geo-redundant Azure storage. It restores from the geo-replicated backup copy, and therefore is resilient to the storage outages in the primary region. For more information, see [Restore an Azure SQL Database or failover to a secondary](/azure/sql-database/sql-database-disaster-recovery). -#### Active Geo-Replication +#### Active geo-replication -[Active Geo-Replication](/azure/sql-database/sql-database-geo-replication-overview/) is available for all database tiers. It’s designed for applications that have more aggressive recovery requirements than Geo-Restore can offer. 
Using Active Geo-Replication, you can create up to four readable secondaries on servers in different regions. You can initiate failover to any of the secondaries. In addition, Active Geo-Replication can be used to support the application upgrade or relocation scenarios, as well as load balancing for read-only workloads. For details, see [configure Geo-Replication](/azure/sql-database/sql-database-geo-replication-portal/) and to [fail over to the secondary database](/azure/sql-database/sql-database-geo-replication-failover-portal/). Refer to [Design an application for cloud disaster recovery using Active Geo-Replication in SQL Database](/azure/sql-database/sql-database-designing-cloud-solutions-for-disaster-recovery/) and [Managing rolling upgrades of cloud applications using SQL Database Active Geo-Replication](/azure/sql-database/sql-database-manage-application-rolling-upgrade/) for details on how to design and implement applications and applications upgrade without downtime. +[Active geo-replication](/azure/sql-database/sql-database-geo-replication-overview) is available for all database tiers. It’s designed for applications that have more aggressive recovery requirements than geo-restore can offer. Using active geo-replication, you can create up to four readable secondaries on servers in different regions. You can initiate failover to any of the secondaries. In addition, active geo-replication can be used to support the application upgrade or relocation scenarios, as well as load balancing for read-only workloads. For details, see [Configure active geo-replication for Azure SQL Database and initiate failover](/azure/sql-database/sql-database-geo-replication-portal). 
Refer to [Designing globally available services using Azure SQL Database](/azure/sql-database/sql-database-designing-cloud-solutions-for-disaster-recovery) and [Managing rolling upgrades of cloud applications by using SQL Database active geo-replication](/azure/sql-database/sql-database-manage-application-rolling-upgrade) for details on how to design and implement applications and applications upgrade without downtime. -### SQL Server on Virtual Machines +### SQL Server on Azure Virtual Machines -A variety of options are available for recovery and high availability for SQL Server 2012 (and later) running in Azure Virtual Machines. For more information, see [High availability and disaster recovery for SQL Server in Azure Virtual Machines](/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-high-availability-dr/). +A variety of options are available for recovery and high availability for SQL Server 2012 (and later) running in Azure Virtual Machines. For more information, see [High availability and disaster recovery for SQL Server in Azure Virtual Machines](/azure/virtual-machines/windows/sql/virtual-machines-windows-sql-high-availability-dr). ## Other Azure platform services @@ -114,7 +114,7 @@ When attempting to run your cloud service in multiple Azure regions, you must co ### Service Bus -Azure Service Bus uses a unique namespace that does not span Azure regions. So the first requirement is to set up the necessary service bus namespaces in the alternate region. However, there are also considerations for the durability of the queued messages. There are several strategies for replicating messages across Azure regions. For the details on these replication strategies and other disaster recovery strategies, see [Best practices for insulating applications against Service Bus outages and disasters](/azure/service-bus-messaging/service-bus-outages-disasters/). +Azure Service Bus uses a unique namespace that does not span Azure regions. 
So the first requirement is to set up the necessary service bus namespaces in the alternate region. However, there are also considerations for the durability of the queued messages. There are several strategies for replicating messages across Azure regions. For the details on these replication strategies and other disaster recovery strategies, see [Best practices for insulating applications against Service Bus outages and disasters](/azure/service-bus-messaging/service-bus-outages-disasters). ### App Service @@ -122,7 +122,7 @@ To migrate an Azure App Service application, such as Web Apps or Mobile Apps, to ### HDInsight -The data associated with HDInsight is stored by default in Azure Blob Storage. HDInsight requires that a Hadoop cluster processing MapReduce jobs must be co-located in the same region as the storage account that contains the data being analyzed. Provided you use the geo-replication feature available to Azure Storage, you can access your data in the secondary region where the data was replicated if for some reason the primary region is no longer available. You can create a new Hadoop cluster in the region where the data has been replicated and continue processing it. +The data associated with HDInsight is stored by default in Azure Blob Storage. HDInsight requires that a Hadoop cluster processing MapReduce jobs must be colocated in the same region as the storage account that contains the data being analyzed. Provided you use the geo-replication feature available to Azure Storage, you can access your data in the secondary region where the data was replicated if for some reason the primary region is no longer available. You can create a new Hadoop cluster in the region where the data has been replicated and continue processing it. 
### SQL Reporting @@ -134,7 +134,7 @@ Azure Media Services has a different recovery approach for encoding and streamin ### Virtual network -Configuration files provide the quickest way to set up a virtual network in an alternate Azure region. After configuring the virtual network in the primary Azure region, [export the virtual network settings](/azure/virtual-network/virtual-networks-create-vnet-classic-portal/) for the current network to a network configuration file. In the event of an outage in the primary region, [restore the virtual network](/azure/virtual-network/virtual-networks-create-vnet-classic-portal/) from the stored configuration file. Then configure other cloud services, virtual machines, or cross-premises settings to work with the new virtual network. +Configuration files provide the quickest way to set up a virtual network in an alternate Azure region. After configuring the virtual network in the primary Azure region, [export the virtual network settings](/azure/virtual-network/virtual-networks-create-vnet-classic-portal) for the current network to a network configuration file. If an outage occurs in the primary region, [restore the virtual network](/azure/virtual-network/virtual-networks-create-vnet-classic-portal) from the stored configuration file. Then configure other cloud services, virtual machines, or cross-premises settings to work with the new virtual network. ## Checklists for disaster recovery @@ -148,19 +148,19 @@ Configuration files provide the quickest way to set up a virtual network in an a ### Virtual Machines checklist 1. Review the Virtual Machines section of this document. -2. Use [Azure Backup](https://azure.microsoft.com/services/backup/) to create application consistent backups across regions. +2. Use [Azure Backup](https://azure.microsoft.com/services/backup) to create application consistent backups across regions. ### Storage checklist 1. Review the Storage section of this document. 2. 
Do not disable geo-replication of storage resources. -3. Understand alternate region for geo-replication in the event of failover. +3. Understand alternate region for geo-replication if a failover occurs. 4. Create custom backup strategies for user-controlled failover strategies. ### SQL Database checklist 1. Review the SQL Database section of this document. -2. Use [Geo-Restore](/azure/sql-database/sql-database-recovery-using-backups/#geo-restore) or [Geo-Replication](/azure/sql-database/sql-database-geo-replication-overview/) as appropriate. +2. Use [geo-restore](/azure/sql-database/sql-database-recovery-using-backups/#geo-restore) or [geo-replication](/azure/sql-database/sql-database-geo-replication-overview) as appropriate. ### SQL Server on Virtual Machines checklist @@ -198,7 +198,7 @@ Configuration files provide the quickest way to set up a virtual network in an a 1. Review the Media Services section of this document. 2. Create a Media Services account in an alternate region. 3. Encode the same content in both regions to support streaming failover. -4. Submit encoding jobs to an alternate region in the event of a service disruption. +4. Submit encoding jobs to an alternate region if a service disruption occurs. ### Virtual Network checklist diff --git a/docs/service-fabric/modernize-app-azure-service-fabric.md b/docs/service-fabric/modernize-app-azure-service-fabric.md index 9a9bece7605..1c6be2aa16b 100644 --- a/docs/service-fabric/modernize-app-azure-service-fabric.md +++ b/docs/service-fabric/modernize-app-azure-service-fabric.md @@ -49,7 +49,7 @@ Before containerizing existing applications, evaluate requirements. Select appli First, determine the type of applications that are best suited for a containerized platform, full virtual machines, and pure PaaS environment. The application could be a shared application that is built with Service Fabric to share Windows Server hosts across various containerized applications. 
Each Service Fabric host can run multiple different applications running in isolated Windows containers. -Consider creating a set of criteria to determine such applications. Here are some example criteria of containerized Windows applications in Service Fabric. +Consider creating a set of criteria to determine such applications. Here are some example criteria of containerized Windows applications in Service Fabric. - HTTP/HTTPS web and application tiers without database dependency. - Stateless web applications. @@ -61,33 +61,38 @@ Consider creating a set of criteria to determine such applications. Here are som > Dependencies that cannot be containerized include MSMQ (Currently supported in preview releases of Windows Server Core post 1709). - Applications can compile and build in Visual Studio. -For the web applications, databases, and other required servers (such as Active Directory) exist outside the Service Fabric cluster in IaaS VMs, PaaS, or on-premise. +For the web applications, databases, and other required servers (such as Active Directory) exist outside the Service Fabric cluster in IaaS VMs, PaaS, or on-premises. ### Developer workstation requirements -From an application development perspective, determine the workstation requirements. -- [Docker for Windows](https://www.docker.com/docker-windows) is required for developers to containerize and test their applications prior to deployment. + +From an application development perspective, determine the workstation requirements. + +- [Docker for Windows](https://www.docker.com/docker-windows) is required for developers to containerize and test their applications prior to deployment. - Visual Studio Docker support is required. Standardize on the latest version of [Visual Studio](https://visualstudio.microsoft.com/) for the best Docker compatibility. - If workstations don't have enough hardware resources to oversee those requirements, use Azure compute resources for speed and productivity gains. 
An option is the Azure DevTest Labs Service. Docker for Windows, and Visual Studio 2017 require a minimum of 8 GB of memory. ### Networking requirements + Service Fabric orchestration provides a platform for hosting, deploying, scaling, and operating applications at enterprise scale. Most large enterprises that use Azure: -- Extend their corporate network with a private address space to an Azure subscription. use either [Express Route](https://azure.microsoft.com/services/expressroute/) or a [Site-to-Site VPN](/azure/vpn-gateway/vpn-gateway-howto-site-to-site-resource-manager-portal) to provide secure on-premise connectivity. -- Want to control inbound and outbound network traffic through third-party firewall appliances and/or [Azure Network Security Group rules](/azure/virtual-network/security-overview). -- Want tight control over the address space requirements and subnets. -Service Fabric is suitable as a containerization platform. It plugs into an existing cloud infrastructure and doesn't require open public ingress endpoints. You just need to carve out the necessary address space for Service Fabric’s IP address requirements. For details, see the [Service Fabric Networking](#service-fabric-networking) section in this article. +- Extend their corporate network with a private address space to an Azure subscription. use either [Express Route](https://azure.microsoft.com/services/expressroute/) or a [Site-to-Site VPN](/azure/vpn-gateway/vpn-gateway-howto-site-to-site-resource-manager-portal) to provide secure on-premises connectivity. +- Want to control inbound and outbound network traffic through third-party firewall appliances and/or [Azure Network Security Group rules](/azure/virtual-network/security-overview). +- Want tight control over the address space requirements and subnets. + +Service Fabric is suitable as a containerization platform. It plugs into an existing cloud infrastructure and doesn't require open public ingress endpoints. 
You just need to carve out the necessary address space for Service Fabric’s IP address requirements. For details, see the [Service Fabric Networking](#service-fabric-networking) section in this article. ## Containerize existing Windows applications + After you’ve determined the applications that meet the selection criteria, containerize them into Docker images. The result is containerized .NET web application running in IIS where all tiers run in one container. -> [!NOTE] +> [!NOTE] > You can use multiple containers; one per tier. Here are the basic steps for containerizing an application. -1. Open the project in Visual Studio. -2. Make sure the project compiles and runs locally on the developer workstation. -3. Add a Dockerfile to the project. This Dockerfile example shows a basic .NET MVC application. +1. Open the project in Visual Studio. +2. Make sure the project compiles and runs locally on the developer workstation. +3. Add a Dockerfile to the project. This Dockerfile example shows a basic .NET MVC application. ``` FROM microsoft/aspnet:4.7 ADD PublishOutput/ /inetpub/wwwroot @@ -103,30 +108,32 @@ Here are the basic steps for containerizing an application. # plugin into SF healthcheck ensuring the container website is running HEALTHCHECK --interval=30s --timeout=30s --start-period=60s --retries=3 CMD curl -f http://localhost/ || exit 1 ``` -4. Test locally by using Docker For Windows. The application must successfully run in a Docker container by using the Visual Studio debug experience. For more information, see [Deploy a .NET app using Docker Compose](/azure/service-fabric/service-fabric-host-app-in-a-container). +4. Test locally by using Docker For Windows. The application must successfully run in a Docker container by using the Visual Studio debug experience. For more information, see [Deploy a .NET app using Docker Compose](/azure/service-fabric/service-fabric-host-app-in-a-container). -5. 
Build (if needed), tag, and push the tested image to a Docker registry, like the [Azure Container Registry](/azure/container-registry/) service. This example uses an existing Azure Container Registry named MyAcr and Docker build/tag/push to build/deploy appA to the registry. +5. Build (if needed), tag, and push the tested image to a Docker registry, like the [Azure Container Registry](/azure/container-registry/) service. This example uses an existing Azure Container Registry named MyAcr and Docker build/tag/push to build/deploy appA to the registry. ``` docker login myacr.azurecr.io -u myacr -p docker build -t appa . docker tag appa myacr.azurecr.io/appa:1.0 - docker push myacr.azurecr.io/appa:1.0 - + docker push myacr.azurecr.io/appa:1.0 ``` + The image is tagged with a version number that Service Fabric references when it deploys and versions the container. Azure DevOps encapsulates and executes the manual Docker build/tag/push process. DevOps details are described in the [DevOps and CI/CD](#devops-and-cicd) section. > [!NOTE] > In the preceding example, the base image is "microsoft/aspnet4.7" from DockerHub. Here are some considerations about the base images: + - The base image could be a locked-down custom enterprise image that enforces enterprise requirements. For a shared application, isolation boundaries can be created through credentials or by using separate registry. It's recommended that enterprise-supported docker images be kept separately and stored in an isolated container registry. -- Avoid storing the registry login credentials in configuration files. Instead, use (role-based access control) RBAC and [Azure Active Directory service principals](/azure/active-directory/develop/app-objects-and-service-principals) with Azure Container Registry. Provide read-only access to registries depending on your enterprise requirements. +- Avoid storing the registry login credentials in configuration files. 
Instead, use (role-based access control) RBAC and [Azure Active Directory service principals](/azure/active-directory/develop/app-objects-and-service-principals) with Azure Container Registry. Provide read-only access to registries depending on your enterprise requirements. For information about running an IIS ASP.net MVC application in a Windows container, see [Migrating ASP.NET MVC Applications to Windows Containers](/aspnet/mvc/overview/deployment/docker-aspnetmvc). ## Service Fabric cluster configuration for enterprise deployments -To deploy a Service Fabric cluster, start with the sample Azure Resource Manager template in this [GitHub Repo](https://github.com/Azure-Samples/Service-fabric-dotnet-modernization) and customize it to fit your requirements. You also deploy a cluster through the Azure portal, but that option should be used for development/test provisioning. + +To deploy a Service Fabric cluster, start with the sample Azure Resource Manager template in this [GitHub Repo](https://github.com/Azure-Samples/Service-fabric-dotnet-modernization) and customize it to fit your requirements. You also deploy a cluster through the Azure portal, but that option should be used for development/test provisioning. 
### Service Fabric node types diff --git a/docs/toc.yml b/docs/toc.yml index 807026fc46b..5c40a15b02c 100644 --- a/docs/toc.yml +++ b/docs/toc.yml @@ -1338,14 +1338,14 @@ items: items: - name: Introduction href: cloud-adoption/operations/monitor/index.md - - name: Platform overview - href: cloud-adoption/operations/monitor/platform-overview.md - - name: Monitoring cloud apps - href: cloud-adoption/operations/monitor/cloud-app-howto.md + - name: Monitoring cloud models + href: cloud-adoption/operations/monitor/cloud-models-monitor-overview.md - name: Data collection href: cloud-adoption/operations/monitor/data-collection.md - name: Alerting href: cloud-adoption/operations/monitor/alert.md + - name: Monitoring platforms overview + href: cloud-adoption/operations/monitor/platform-overview.md - name: Establish an operational fitness review href: cloud-adoption/operations/operational-fitness-review.md - name: References diff --git a/docs/topics/high-performance-computing.md b/docs/topics/high-performance-computing.md index 2246073d50b..10b12b5f3b2 100644 --- a/docs/topics/high-performance-computing.md +++ b/docs/topics/high-performance-computing.md @@ -30,7 +30,7 @@ Many industries use HPC to solve some of their most difficult problems. These i ### How is HPC different on the cloud? -One of the primary differences between an on-premise HPC system and one in the cloud is the ability for resources to dynamically be added and removed as they're needed. Dynamic scaling removes compute capacity as a bottleneck and instead allow customers to right size their infrastructure for the requirements of their jobs. +One of the primary differences between an on-premises HPC system and one in the cloud is the ability for resources to dynamically be added and removed as they're needed. Dynamic scaling removes compute capacity as a bottleneck and instead allow customers to right size their infrastructure for the requirements of their jobs. 
The following articles provide more detail about this dynamic scaling capability. @@ -170,7 +170,7 @@ Building an HPC system from scratch on Azure offers a significant amount of flex ### Hybrid and cloud Bursting -If you have an existing on-premise HPC system that you'd like to connect to Azure, there are a number of resources to help get you started. +If you have an existing on-premises HPC system that you'd like to connect to Azure, there are a number of resources to help get you started. First, review the [Options for connecting an on-premises network to Azure](/azure/architecture/reference-architectures/hybrid-networking/) article in the documentation. From there, you may want information on these connectivity options: @@ -430,4 +430,4 @@ These tutorials will provide you with details on running applications on Microso - [Run containerized HPC workloads with Batch Shipyard](https://github.com/Azure/batch-shipyard) - [Run parallel R workloads on Batch](https://github.com/Azure/doAzureParallel) - [Run on-demand Spark jobs on Batch](https://github.com/Azure/aztk) -- [Use compute-intensive VMs in Batch pools](/azure/batch/batch-pool-compute-intensive-sizes) \ No newline at end of file +- [Use compute-intensive VMs in Batch pools](/azure/batch/batch-pool-compute-intensive-sizes)