diff --git a/docs/sources/introduction/_index.md b/docs/sources/introduction/_index.md index e3597097630..28ee3bd3e6e 100644 --- a/docs/sources/introduction/_index.md +++ b/docs/sources/introduction/_index.md @@ -1,6 +1,6 @@ --- canonical: https://grafana.com/docs/alloy/latest/introduction/ -description: Grafana Alloy is a flexible, high performance, vendor-neutral distribution of the OTel Collector +description: Grafana Alloy simplifies telemetry collection by combining metrics, logs, traces, and profiles into one powerful, vendor-neutral collector menuTitle: Introduction title: Introduction to Grafana Alloy weight: 10 @@ -8,65 +8,45 @@ weight: 10 # Introduction to {{% param "FULL_PRODUCT_NAME" %}} -{{< param "PRODUCT_NAME" >}} is a flexible, high performance, vendor-neutral distribution of the [OpenTelemetry][] Collector. -It's fully compatible with the most popular open source observability standards such as OpenTelemetry and Prometheus. +{{< param "FULL_PRODUCT_NAME" >}} is an open source telemetry collector that simplifies how you gather and send observability data. +It's an [OpenTelemetry Collector distribution][OpenTelemetry] with built-in Prometheus pipelines and native support for Loki, Pyroscope, and other observability backends. -{{< param "PRODUCT_NAME" >}} focuses on ease-of-use and the ability to adapt to the needs of power users. +{{< param "PRODUCT_NAME" >}} collects metrics, logs, traces, and profiles in one unified solution. +Instead of running separate collectors for each signal type, you configure a single tool that handles all your telemetry needs. +This approach reduces operational complexity while giving you the flexibility to send data to any compatible backend, whether that's Grafana Cloud, a self-managed Grafana stack, or other observability platforms. 
-{{< docs/learning-journeys title="Send logs to Grafana Cloud using Alloy" url="/docs/learning-journeys/send-logs-alloy-loki/" >}} - -## Key features - -Some of the key features of {{< param "PRODUCT_NAME" >}} include: - -* **Custom components:** You can use {{< param "PRODUCT_NAME" >}} to create and share custom components. - Custom components combine a pipeline of existing components into a single, easy-to-understand component that's just a few lines long. - You can use pre-built custom components from the community, ones packaged by Grafana, or create your own. -* **Reusable components:** You can use the output of a component as the input for multiple other components. -* **Chained components:** You can chain components together to form a pipeline. -* **Single task per component:** The scope of each component is limited to one specific task. -* **GitOps compatibility:** {{< param "PRODUCT_NAME" >}} uses frameworks to pull configurations from Git, S3, HTTP endpoints, and just about any other source. -* **Clustering support:** {{< param "PRODUCT_NAME" >}} has native clustering support. - Clustering helps distribute the workload and ensures you have high availability. - You can quickly create horizontally scalable deployments with minimal resource and operational overhead. -* **Security:** {{< param "PRODUCT_NAME" >}} helps you manage authentication credentials and connect to HashiCorp Vaults or Kubernetes clusters to retrieve secrets. -* **Debugging utilities:** {{< param "PRODUCT_NAME" >}} provides troubleshooting support and an embedded [user interface][UI] to help you identify and resolve configuration problems. - -## How does {{% param "PRODUCT_NAME" %}} work as an OpenTelemetry collector? 
- -{{< figure src="/media/docs/alloy/flow-diagram-small-alloy.png" alt="Alloy flow diagram" >}} - -### Collect +{{< youtube bFyGd_Sr5W4 >}} -{{< param "PRODUCT_NAME" >}} uses more than 120 components to collect telemetry data from applications, databases, and OpenTelemetry collectors. -{{< param "PRODUCT_NAME" >}} supports collection using multiple ecosystems, including OpenTelemetry and Prometheus. - -Telemetry data can be either pushed to {{< param "PRODUCT_NAME" >}}, or {{< param "PRODUCT_NAME" >}} can pull it from your data sources. - -### Transform - -{{< param "PRODUCT_NAME" >}} processes data and transforms it for sending. - -You can use transformations to inject extra metadata into telemetry or filter out unwanted data. - -### Write +{{< docs/learning-journeys title="Send logs to Grafana Cloud using Alloy" url="/docs/learning-journeys/send-logs-alloy-loki/" >}} -{{< param "PRODUCT_NAME" >}} sends data to OpenTelemetry-compatible databases or collectors, the Grafana stack, or Grafana Cloud. +## Get started -{{< param "PRODUCT_NAME" >}} can also write alerting rules in compatible databases. +- [Install][Install] {{< param "PRODUCT_NAME" >}} on your platform +- Learn core [concepts][Concepts] including components, expressions, and pipelines +- Follow [tutorials][tutorials] for hands-on experience +- Explore [alloy-scenarios][scenarios] for real-world configuration examples +- Try the [Alloy for Beginners][beginners] workshop for interactive, scenario-based learning +- Explore the [component reference][reference] to see available components -## Next steps +## Learn more -* [Install][] {{< param "PRODUCT_NAME" >}}. -* Learn about the core [Concepts][] of {{< param "PRODUCT_NAME" >}}. -* Follow the [tutorials][] for hands-on learning about {{< param "PRODUCT_NAME" >}}. -* Learn how to [collect and forward data][Collect] with {{< param "PRODUCT_NAME" >}}. 
-* Check out the [reference][] documentation to find information about the {{< param "PRODUCT_NAME" >}} components, configuration blocks, and command line tools. +- [Why Alloy][Why Alloy]: Understand when {{< param "PRODUCT_NAME" >}} is the right choice +- [How Alloy works][How Alloy works]: Learn about the architecture and key capabilities +- [Requirements and expectations][Requirements]: Review deployment considerations and constraints +- [Supported platforms][Supported platforms]: Check platform compatibility +- [Estimate resource usage][Estimate resource usage]: Plan your deployment +- [Migrate from other collectors][migrate]: Move from OpenTelemetry Collector, Prometheus Agent, or Grafana Agent -[OpenTelemetry]: https://opentelemetry.io/ecosystem/distributions/ +[OpenTelemetry]: https://opentelemetry.io/docs/collector/distributions/ [Install]: ../set-up/install/ [Concepts]: ../get-started/ -[Collect]: ../collect/ [tutorials]: ../tutorials/ [reference]: ../reference/ -[UI]: ../troubleshoot/debug/ +[Why Alloy]: ./why-alloy/ +[How Alloy works]: ./how-alloy-works/ +[Requirements]: ./requirements/ +[Supported platforms]: ../set-up/supported-platforms/ +[Estimate resource usage]: ../set-up/estimate-resource-usage/ +[migrate]: ../set-up/migrate/ +[beginners]: https://github.com/grafana/Grafana-Alloy-for-Beginners +[scenarios]: https://github.com/grafana/alloy-scenarios \ No newline at end of file diff --git a/docs/sources/introduction/how-alloy-works.md b/docs/sources/introduction/how-alloy-works.md new file mode 100644 index 00000000000..2f702346975 --- /dev/null +++ b/docs/sources/introduction/how-alloy-works.md @@ -0,0 +1,117 @@ +--- +canonical: https://grafana.com/docs/alloy/latest/introduction/how-alloy-works/ +description: Learn how Grafana Alloy works and where it fits in your observability architecture +menuTitle: How Alloy works +title: How Grafana Alloy works +weight: 220 +--- + +# How {{% param "FULL_PRODUCT_NAME" %}} works + +Understanding the 
architecture and design of {{< param "PRODUCT_NAME" >}} helps you use it effectively. + +## Where {{% param "PRODUCT_NAME" %}} fits + +A typical observability setup has three layers: data sources that generate telemetry, collection tools that gather and process it, and storage backends with visualization frontends for querying and exploring data. + +{{< param "PRODUCT_NAME" >}} operates in the collection layer, sitting between your data sources and your storage backends. +It acts as the bridge between them, performing three main functions in your telemetry pipeline. + +### Collect telemetry data + +{{< param "PRODUCT_NAME" >}} gathers telemetry from any source in your infrastructure. +You can configure it to scrape Prometheus endpoints for metrics or set up receivers to accept data pushed via the OpenTelemetry protocol. +It tails log files and reads from system outputs to capture application and infrastructure logs. +Service discovery automatically finds resources in Kubernetes, Docker, or cloud environments without requiring static configuration. +You can also integrate with databases, message queues, and other systems to capture telemetry from specialized sources. + +### Transform and process data + +Processing telemetry before sending it to backends optimizes costs and improves data quality. +Create filters to drop unwanted data or redact sensitive information like tokens and credentials from logs before they reach storage. +Add labels, metadata, or contextual information to enrich your data—for example, extract a cloud provider name from instance IDs to create useful aggregation labels. +Standardize attribute names across services when different teams use inconsistent naming conventions. +Implement sampling strategies to reduce high-volume data while preserving the signal you need for troubleshooting. +Convert between formats, such as transforming Prometheus metrics to OpenTelemetry format, to ensure compatibility with your backends. 
+Define routing rules to send different types of data to different destinations based on your operational requirements. + +### Send to backends + +{{< param "PRODUCT_NAME" >}} delivers processed telemetry to any storage system you choose. +Send data to Grafana Cloud for managed observability, or export to your self-managed Grafana stack components. +Connect to any Prometheus-compatible database for metrics and any OpenTelemetry-compatible backend for all signal types. +Write to multiple destinations simultaneously, sending the same data to different systems or routing different data types to specialized backends. + +## Component-based architecture + +{{< param "PRODUCT_NAME" >}} uses modular [components][] that work like building blocks. +Each component performs a specific task, such as collecting metrics from Prometheus endpoints, receiving OpenTelemetry data, transforming and filtering telemetry, or sending data to backends. + +You connect these components together to [build pipelines][] that match your exact requirements. +This modular approach makes configurations easier to understand, test, and maintain. + +## Programmable pipelines + +{{< param "PRODUCT_NAME" >}} uses a rich, [expression-based configuration language][syntax] that lets you reference data from one component in another, create dynamic configurations that respond to changing conditions, build reusable pipelines you can share across teams, and use built-in [functions][expressions] to transform and filter data. + +## Custom and shareable pipelines + +You can create [custom components][] that combine multiple components into a single, reusable unit. +Share these custom components with your team or the community through the [module system][modules]. +Use pre-built modules from the community or create your own. + +## Enterprise-ready features + +As your systems grow more complex, {{< param "PRODUCT_NAME" >}} scales with you. 
+[Clustering][] lets you configure instances to form a cluster for automatic workload distribution and high availability. +Centralized configuration retrieves settings from remote servers for fleet management. +Kubernetes-native capabilities let you interact with Kubernetes resources directly without learning separate operators. + +## Built-in debugging tools + +{{< param "PRODUCT_NAME" >}} includes a [built-in user interface][debug] that helps you visualize your component pipelines, inspect component states and outputs, troubleshoot configuration issues, and monitor performance. + +## Deployment patterns + +Choose the [deployment pattern][deploy] that fits your architecture. + +**Edge deployment:** Deploy {{< param "PRODUCT_NAME" >}} close to your data sources for minimal latency. +Run it as a DaemonSet in Kubernetes to collect from every node, install it on each host for infrastructure monitoring, or deploy it alongside applications for local processing. + +**Gateway deployment:** Deploy {{< param "PRODUCT_NAME" >}} as a centralized gateway. +Configure your applications to send telemetry to {{< param "PRODUCT_NAME" >}} gateways, which process and forward data to backends. +Applications only need to know about the gateway endpoints. + +**Hybrid deployment:** Combine edge and gateway approaches. +Deploy edge instances to handle initial collection and filtering close to sources, then forward to gateway instances for aggregation and final processing. +This pattern reduces bandwidth usage and enables centralized policy enforcement while maintaining local processing capabilities. + +## Integrations + +{{< param "PRODUCT_NAME" >}} integrates with Grafana Cloud and self-managed Grafana stacks, routing metrics to Mimir, logs to Loki, traces to Tempo, and profiles to Pyroscope. 
+It also works with the broader Prometheus ecosystem through full compatibility with the Prometheus exposition format and service discovery mechanisms, and with any OpenTelemetry-compatible backend through OTLP support. + +You can also connect to other ecosystems, including InfluxDB, Elasticsearch, and cloud platforms like AWS, Google Cloud Platform, and Azure. + +## Next steps + +- Review [requirements and expectations][requirements] to understand deployment considerations +- [Install][Install] {{< param "PRODUCT_NAME" >}} to get started +- Learn core [concepts][Concepts] including components, expressions, and pipelines +- Follow [tutorials][tutorials] for hands-on experience +- Explore the [component reference][reference] to see available components + +[requirements]: ../requirements/ +[Install]: ../../set-up/install/ +[Concepts]: ../../get-started/ +[tutorials]: ../../tutorials/ +[reference]: ../../reference/ +[components]: ../../get-started/components/ +[build pipelines]: ../../get-started/components/build-pipelines/ +[syntax]: ../../get-started/syntax/ +[expressions]: ../../get-started/expressions/ +[custom components]: ../../get-started/components/custom-components/ +[modules]: ../../get-started/modules/ +[Clustering]: ../../get-started/clustering/ +[debug]: ../../troubleshoot/debug/ +[deploy]: ../../set-up/deploy/ \ No newline at end of file diff --git a/docs/sources/introduction/requirements.md b/docs/sources/introduction/requirements.md new file mode 100644 index 00000000000..c603acf2dee --- /dev/null +++ b/docs/sources/introduction/requirements.md @@ -0,0 +1,201 @@ +--- +canonical: https://grafana.com/docs/alloy/latest/introduction/requirements/ +description: Understand supported environments, deployment expectations, and common constraints when running Grafana Alloy in production +menuTitle: Requirements +title: Requirements and expectations +weight: 250 +--- + +# Requirements and expectations + +Before you put {{< param "FULL_PRODUCT_NAME" >}} into 
production, it helps to have a clear picture of where it runs well, how it's usually deployed, and where people most often get surprised. + +Before a first deployment, people usually want answers to a few basic questions: + +- Will {{< param "PRODUCT_NAME" >}} run in my environment? +- How should I deploy it the first time? +- What kinds of constraints or trade-offs should I expect? + +The guidance here focuses on the common, supported paths that work well for most users, without diving into every possible edge case. + +## Design expectations + +{{< param "FULL_PRODUCT_NAME" >}} makes telemetry collection explicit and predictable, even when that means exposing trade-offs that other tools try to hide. + +A few design choices are worth keeping in mind: + +- {{< param "PRODUCT_NAME" >}} favors explicit configuration over implicit behavior. + You define pipelines, routing, and scaling decisions in configuration rather than relying on automatic inference. +- {{< param "PRODUCT_NAME" >}} exposes deployment and scaling choices instead of masking them. + Changes in topology—such as switching from a DaemonSet to a centralized deployment—can affect behavior, and those effects are intentional and visible. +- {{< param "PRODUCT_NAME" >}} consolidates multiple collectors, but it doesn't replicate every default or assumption from these other collectors. + Similar concepts may behave differently when the underlying goals differ. +- {{< param "PRODUCT_NAME" >}} prioritizes predictability over "magic" defaults. + Understanding how components connect and how work distributes is part of operating {{< param "PRODUCT_NAME" >}} successfully. + +Keeping these expectations in mind makes it easier to reason about configuration changes, scaling decisions, and observed behavior in production. 
+ +## Supported platforms + +{{< param "PRODUCT_NAME" >}} runs on the following platforms: + +- Linux +- Windows +- macOS +- FreeBSD + +For supported architectures and version requirements, refer to [Supported platforms][supported platforms]. + +For setup instructions, refer to [Set up {{< param "PRODUCT_NAME" >}}][set up]. + +## Network requirements + +{{< param "PRODUCT_NAME" >}} requires network access for its HTTP server and for sending data to backends. + +### HTTP server + +{{< param "PRODUCT_NAME" >}} runs an HTTP server for its UI, API, and metrics endpoints. +By default, it listens on `127.0.0.1:12345`. + +For more information, refer to [HTTP endpoints][http]. + +### Outbound connectivity + +{{< param "PRODUCT_NAME" >}} needs outbound network access to send telemetry to your backends. +Ensure firewall rules and egress rules allow connections to: + +- Remote write or OTLP endpoints for metrics, such as Mimir, Prometheus, or Thanos +- Log ingestion endpoints, such as Loki, Elasticsearch, or OTLP-compatible backends +- Trace ingestion endpoints, such as Tempo, Jaeger, or OTLP-compatible backends +- Profile ingestion endpoints, such as Pyroscope + +### Cluster communication + +When you enable [clustering][], {{< param "PRODUCT_NAME" >}} nodes communicate over HTTP/2 using the same HTTP server port. +Each node must be reachable by other cluster members on the configured listen address. + +## Permissions and access + +Some {{< param "PRODUCT_NAME" >}} components interact closely with the host, container runtime, or Kubernetes APIs. +When that happens, {{< param "PRODUCT_NAME" >}} needs enough access to complete the work. + +This requirement most often comes up when collecting: + +- Host-level metrics, logs, traces, or profiles +- Container or runtime information +- Data that lives outside the application sandbox + +Not every component can run in a fully locked-down environment. 
+When {{< param "PRODUCT_NAME" >}} runs with restricted permissions, certain components might fail or behave unexpectedly. + +For information about running as a non-root user, refer to [Run as a non-root user][nonroot]. + +When you enable a component, check its documented requirements first. +Refer to the [component reference][reference] for component-specific constraints and limitations. + +## Security + +{{< param "PRODUCT_NAME" >}} supports TLS for secure communication. +Configure TLS in component `tls` blocks for backend connections, or use the [`--cluster.enable-tls` flag][run] for [clustered mode][clustering]. +Authentication methods such as basic auth, OAuth2, and bearer tokens are configured per component. + +### Secrets management + +Store sensitive values like API keys and passwords outside your configuration files. +{{< param "PRODUCT_NAME" >}} supports environment variable references and integrations such as HashiCorp Vault, Kubernetes Secrets, AWS S3, and local files. + +Refer to the [component documentation][reference] for specific options. + +## Deployment patterns + +{{< param "PRODUCT_NAME" >}} supports edge, gateway, and hybrid deployment patterns. +Refer to [How {{< param "PRODUCT_NAME" >}} works][how alloy works] for guidance on choosing the right pattern for your architecture. + +For detailed setup instructions, refer to [Deploy {{< param "PRODUCT_NAME" >}}][deploy]. + +## Clustering and scaling behavior + +Some {{< param "PRODUCT_NAME" >}} behavior depends on how you deploy it, not just on configuration. + +{{< param "PRODUCT_NAME" >}} supports [clustering][] to distribute work across multiple instances. +Clustering uses a gossip protocol and consistent hashing to distribute scrape targets automatically. + +{{< admonition type="note" >}} +Target auto-distribution requires enabling clustering at both the instance level and the component level. +Refer to [Clustering][clustering] for configuration details. 
+ +[clustering]: ../../get-started/clustering/ +{{< /admonition >}} + +A few things that often surprise users: + +- More {{< param "PRODUCT_NAME" >}} instances means more meta-monitoring metrics. +- A switch between DaemonSet and centralized deployments can change observed series counts. +- Scaling clustered collectors changes how targets distribute, even when the target list stays the same. + +For resource planning guidance, refer to [Estimate resource usage][estimate resource usage]. + +## Data durability + +{{< param "PRODUCT_NAME" >}} uses a Write-Ahead Log (WAL) for metrics to handle temporary backend outages. +The WAL buffers data locally and retries sending when the backend becomes available. + +For the WAL to persist across restarts, configure persistent storage using the [`--storage.path` flag][run]. + +{{< admonition type="note" >}} +Without persistent storage, {{< param "PRODUCT_NAME" >}} loses buffered data on restart. +By default, {{< param "PRODUCT_NAME" >}} stores data in a temporary directory. +{{< /admonition >}} + +Push-based pipelines for logs, traces, and profiles have different durability characteristics. +Refer to [component documentation][reference] for more information. + +## Monitor Alloy + +{{< param "PRODUCT_NAME" >}} exposes metrics about its own health and performance at the `/metrics` endpoint. + +Key monitoring capabilities: + +- **Internal metrics:** Controller and component metrics in Prometheus format +- **Health endpoints:** `/-/ready` and `/-/healthy` for load balancer checks +- **Debugging UI:** Visual component graph and live debugging at `/` + +Refer to [Set up meta-monitoring][metamonitoring] for configuration examples. + +## Component capabilities + +Each {{< param "PRODUCT_NAME" >}} component has its own capabilities and limits. 
+Before you rely on a component in production, check: + +- Which signal types it accepts and emits: metrics, logs, traces, and profiles +- Whether the component is stable or still evolving +- Whether it's a native {{< param "PRODUCT_NAME" >}} component or wraps upstream OpenTelemetry Collector functionality + +Refer to the [component reference][reference] for this information. + +## Troubleshoot issues + +If something doesn't behave as expected after deployment: + +1. Review [Troubleshooting and debugging][debug]. +1. Check the [component documentation][reference]. +1. Revisit deployment patterns and clustering assumptions. + +## Next steps + +- [Set up {{< param "PRODUCT_NAME" >}}][set up] +- [Learn about clustering][clustering] +- [Explore components][reference] + +[supported platforms]: ../../set-up/supported-platforms/ +[http]: ../../reference/http/ +[reference]: ../../reference/ +[run]: ../../reference/cli/run/ +[nonroot]: ../../configure/nonroot/ +[deploy]: ../../set-up/deploy/ +[clustering]: ../../get-started/clustering/ +[estimate resource usage]: ../../set-up/estimate-resource-usage/ +[metamonitoring]: ../../collect/metamonitoring/ +[debug]: ../../troubleshoot/debug/ +[set up]: ../../set-up/ +[how alloy works]: ../how-alloy-works/ diff --git a/docs/sources/introduction/supported-platforms.md b/docs/sources/introduction/supported-platforms.md deleted file mode 100644 index 3ef18b601b2..00000000000 --- a/docs/sources/introduction/supported-platforms.md +++ /dev/null @@ -1,31 +0,0 @@ ---- -canonical: https://grafana.com/docs/alloy/latest/introduction/supported-platforms/ -description: Supported platforms for Grafana Alloy -menuTitle: Supported platforms -title: Supported platforms -weight: 200 ---- - -# Supported platforms - -The following operating systems and hardware architecture are supported. 
- -## Linux - -* Architectures: AMD64, ARM64 -* Within the Linux distribution lifecycle - -## Windows - -* Minimum version: Windows Server 2016 or later, or Windows 10 or later. -* Architectures: AMD64 - -## macOS - -* Minimum version: macOS 10.13 or later -* Architectures: AMD64 on Intel, ARM64 on Apple Silicon - -## FreeBSD - -* Within the FreeBSD lifecycle -* Architectures: AMD64 diff --git a/docs/sources/introduction/why-alloy.md b/docs/sources/introduction/why-alloy.md new file mode 100644 index 00000000000..a68e85eb2bd --- /dev/null +++ b/docs/sources/introduction/why-alloy.md @@ -0,0 +1,128 @@ +--- +canonical: https://grafana.com/docs/alloy/latest/introduction/why-alloy/ +description: Understand when Grafana Alloy is the right choice for your telemetry collection needs +menuTitle: Why Alloy +title: Why Grafana Alloy +weight: 200 +--- + +# Why {{% param "FULL_PRODUCT_NAME" %}} + +{{< param "FULL_PRODUCT_NAME" >}} simplifies telemetry collection by consolidating multiple collectors into one solution. + +## The telemetry collection challenge + +Telemetry collection in production environments can quickly become complex as different teams develop different needs over time. + +Consider a common scenario. +You start with infrastructure observability, using Prometheus to scrape metrics from Node Exporter. +Your metrics flow to a Prometheus database, and you visualize them in Grafana dashboards. +This works well for monitoring infrastructure. + +Later, you want to add application observability and start analyzing distributed traces. +Prometheus doesn't support traces, so you add the OpenTelemetry Collector. +Now you're running two different collectors, each with its own configuration syntax, deployment requirements, and operational overhead. + +As your observability needs grow, you might add a separate collector for logs, another tool for continuous profiling, and different agents for different environments. 
+Before long, you're managing multiple collectors, learning different configuration languages, troubleshooting various failure modes, and dealing with increased memory and CPU overhead. + +{{< param "PRODUCT_NAME" >}} addresses these challenges by handling all signal types in a single deployment with one configuration language. +You learn one tool, deploy one collector, and maintain one system. + +## When to use {{% param "PRODUCT_NAME" %}} + +{{< param "PRODUCT_NAME" >}} excels in several scenarios. +The following sections help you identify whether it fits your needs. + +### You need multiple signal types + +{{< param "PRODUCT_NAME" >}} natively supports metrics from both Prometheus and OpenTelemetry sources, application and system logs, distributed traces using OpenTelemetry, and continuous profiling data. + +If you're running separate collectors for metrics and traces, or planning to add log collection to your metrics pipeline, {{< param "PRODUCT_NAME" >}} lets you consolidate into a single solution. + +For example, if you monitor infrastructure with Prometheus but want to add distributed tracing for your microservices, you can use {{< param "PRODUCT_NAME" >}} to handle both with one collector instead of deploying multiple collectors. + +### You want to reduce collector complexity + +Running multiple collectors creates operational overhead. +You have to learn different configuration languages, manage separate deployments and upgrades, troubleshoot different failure modes, and monitor multiple systems—all while consuming more resources. + +{{< param "PRODUCT_NAME" >}} consolidates all of this. +You learn one configuration language, manage one deployment, and use a single built-in UI for debugging. +This unified approach reduces both complexity and resource consumption. 
+ +For example, if your team runs Prometheus for metrics, Fluentd for logs, and Jaeger agents for traces, {{< param "PRODUCT_NAME" >}} can replace all three and simplify your telemetry architecture. + +### You need both Prometheus and OpenTelemetry + +{{< param "PRODUCT_NAME" >}} works with both ecosystems simultaneously. +It includes native Prometheus remote write support, full OpenTelemetry protocol support, Prometheus service discovery mechanisms, OpenTelemetry instrumentation compatibility, and the ability to convert between formats. + +You don't have to choose between Prometheus and OpenTelemetry. +If you have Prometheus deployments and instrumentation but your applications use OpenTelemetry, {{< param "PRODUCT_NAME" >}} collects from both while you standardize on one collector. + +### You value vendor neutrality + +{{< param "PRODUCT_NAME" >}} supports sending data to Grafana Cloud, a self-managed Grafana stack with Loki, Mimir, Tempo, and Pyroscope, any Prometheus-compatible database, any OpenTelemetry-compatible backend, or multiple destinations simultaneously. + +This flexibility means you're not locked into a single vendor or backend. +You can send data to Grafana Cloud for some telemetry and self-managed systems for others, or change backends without changing your collector. + +For example, if you want to send metrics to Grafana Cloud but keep logs on-premises for compliance reasons, {{< param "PRODUCT_NAME" >}} can send metrics to the cloud and logs to your local Loki instance from the same configuration. + +### Your observability needs are growing + +{{< param "PRODUCT_NAME" >}} provides features for scaling, including clustering to distribute workload across multiple instances for high availability and horizontal scaling, remote configuration to manage fleet-wide configurations from a central location, and automatic workload distribution across cluster members. 
+ +Start with a single {{< param "PRODUCT_NAME" >}} instance and scale to clusters as your needs grow, without changing your approach. + +### You're running on Kubernetes + +{{< param "PRODUCT_NAME" >}} offers Kubernetes-native features including first-class support for discovering Kubernetes resources, components that interact with Kubernetes APIs directly, native understanding of pods, services, and custom resources, and support for DaemonSet and Deployment patterns. + +No separate Kubernetes operator is required. +The Kubernetes discovery components automatically find pods as they start and stop, so scrape targets stay current without manual configuration changes. + +### You want programmable pipelines + +The {{< param "PRODUCT_NAME" >}} configuration language lets you create conditional logic in your pipelines, reference data from one component in another, build reusable pipeline modules, transform and filter data with built-in functions, and respond dynamically to changing conditions. + +If you need more than basic "collect and forward" functionality, the programmable approach provides the flexibility you need. +Common scenarios include routing high-priority metrics to one backend while sampling lower-priority data, extracting useful labels from high-cardinality fields to manage storage costs, standardizing attribute names when different teams use inconsistent conventions, and redacting sensitive tokens or credentials from logs before they reach storage. + +### You want to share pipelines across teams + +The module system allows you to create custom components that combine multiple steps, package and share pipelines with your team, use community-contributed modules, and maintain consistent collection patterns across services. + +Your platform team can create a standard monitoring module that application teams import and configure with their specific settings, without needing to understand the underlying complexity. 
+ +## What {{% param "PRODUCT_NAME" %}} replaces + +{{< param "PRODUCT_NAME" >}} can consolidate multiple collectors. +Replace Prometheus Agent to gain the same functionality plus support for logs, traces, and profiles. +Replace the OpenTelemetry Collector to add native Prometheus support alongside OTLP. +Replace specialized log collectors like Promtail, Fluentd, or Filebeat with a unified collection approach. + +You can also run {{< param "PRODUCT_NAME" >}} alongside your existing collectors during migration to transition gradually without disrupting your observability. +Refer to the [migration guides][migrate] for step-by-step instructions. + +## When {{% param "PRODUCT_NAME" %}} might not be the right choice + +{{< param "PRODUCT_NAME" >}} is powerful and flexible, but it's not always the best fit. + +Consider alternatives if you only need basic Prometheus metrics scraping with no additional features, because Prometheus Agent might be simpler. +If you're deeply integrated with a specific collector's ecosystem and don't need multi-signal support, staying with your current solution may make more sense. +If you have very specific requirements that available components don't address, evaluate whether the benefits outweigh the effort of migration. 
+ +## Next steps + +- Learn [how {{< param "PRODUCT_NAME" >}} works][how alloy works] to understand the architecture +- Review [requirements and expectations][requirements] to understand deployment considerations +- [Install][Install] {{< param "PRODUCT_NAME" >}} to get started +- Follow a [tutorial][tutorial] for hands-on experience + +[how alloy works]: ../how-alloy-works/ +[requirements]: ../requirements/ +[Install]: ../../set-up/install/ +[tutorial]: ../../tutorials/ +[migrate]: ../../set-up/migrate/ \ No newline at end of file diff --git a/docs/sources/reference/release-information/_index.md b/docs/sources/reference/release-information/_index.md new file mode 100644 index 00000000000..8b05c71cd13 --- /dev/null +++ b/docs/sources/reference/release-information/_index.md @@ -0,0 +1,15 @@ +--- +canonical: https://grafana.com/docs/alloy/latest/reference/release-information/ +description: Versioning policies and release schedules for Grafana Alloy +menuTitle: Release information +title: Release information +weight: 750 +--- + +# Release information + +This section provides information about {{< param "PRODUCT_NAME" >}} versioning policies and release schedules. + +For product lifecycle information, refer to [Grafana Cloud product lifecycle](https://grafana.com/docs/release-life-cycle/). 
+ +{{< section >}} diff --git a/docs/sources/introduction/backward-compatibility.md b/docs/sources/reference/release-information/backward-compatibility.md similarity index 72% rename from docs/sources/introduction/backward-compatibility.md rename to docs/sources/reference/release-information/backward-compatibility.md index 32ca3c90026..d8f879b2983 100644 --- a/docs/sources/introduction/backward-compatibility.md +++ b/docs/sources/reference/release-information/backward-compatibility.md @@ -1,9 +1,11 @@ --- -canonical: https://grafana.com/docs/alloy/latest/introduction/backward-compatibility/ +aliases: + - ../../introduction/backward-compatibility/ +canonical: https://grafana.com/docs/alloy/latest/reference/release-information/backward-compatibility/ description: Grafana Alloy backward compatibility menuTitle: Backward compatibility title: Grafana Alloy backward compatibility -weight: 999 +weight: 100 --- # {{% param "FULL_PRODUCT_NAME" %}} backward compatibility @@ -13,27 +15,27 @@ weight: 999 Documented functionality that's released as _Generally available_ is covered by backward compatibility, including: -* **User configuration**, including the {{< param "PRODUCT_NAME" >}} configuration syntax, the semantics of the configuration file, and the command-line interface. +- **User configuration**, including the {{< param "PRODUCT_NAME" >}} configuration syntax, the semantics of the configuration file, and the command-line interface. -* **APIs**, for any network or code API released as v1.0.0 or later. +- **APIs**, for any network or code API released as v1.0.0 or later. -* **Observability data used in official dashboards**, where the official set of dashboards are found in the [`alloy-mixin/`][alloy-mixin] directory. +- **Observability data used in official dashboards**, where the official set of dashboards are found in the [`alloy-mixin/`][alloy-mixin] directory. 
## Exceptions We strive to maintain backward compatibility, but there are situations that may arise that require a breaking change without a new major version, deviating from [item 8 of the semver specification][]: -* **Security**: A security issue may arise that requires breaking compatibility. +- **Security**: A security issue may arise that requires breaking compatibility. -* **Legal requirements**: If an exposed behavior violates a licensing or legal requirement, a breaking change may be required. +- **Legal requirements**: If an exposed behavior violates a licensing or legal requirement, a breaking change may be required. -* **Specification errors**: If a specification for a feature is found to be incomplete or inconsistent, fixing the specification may require a breaking change. +- **Specification errors**: If a specification for a feature is found to be incomplete or inconsistent, fixing the specification may require a breaking change. -* **Bugs**: If a bug is found that goes against the documented specification of that functionality, fixing the bug may require breaking compatibility for users who are relying on the incorrect behavior. +- **Bugs**: If a bug is found that goes against the documented specification of that functionality, fixing the bug may require breaking compatibility for users who are relying on the incorrect behavior. -* **Upstream changes**: Much of the functionality of {{< param "PRODUCT_NAME" >}} is built on top of other software, such as OpenTelemetry Collector and Prometheus. If upstream software breaks compatibility, we may need to reflect this in {{< param "PRODUCT_NAME" >}}. +- **Upstream changes**: Much of the functionality of {{< param "PRODUCT_NAME" >}} is built on top of other software, such as OpenTelemetry Collector and Prometheus. If upstream software breaks compatibility, we may need to reflect this in {{< param "PRODUCT_NAME" >}}. 
-* **Community components**: Community components are components implemented and maintained by the community. They aren't covered by the backward compatibility strategy. +- **Community components**: Community components are components implemented and maintained by the community. They aren't covered by the backward compatibility strategy. We try, whenever possible, to resolve these issues without breaking compatibility. diff --git a/docs/sources/introduction/release-cadence.md b/docs/sources/reference/release-information/release-cadence.md similarity index 81% rename from docs/sources/introduction/release-cadence.md rename to docs/sources/reference/release-information/release-cadence.md index 175ac41f94b..6c038116212 100644 --- a/docs/sources/introduction/release-cadence.md +++ b/docs/sources/reference/release-information/release-cadence.md @@ -1,9 +1,11 @@ --- -canonical: https://grafana.com/docs/alloy/latest/introduction/release-cadence/ +aliases: + - ../../introduction/release-cadence/ +canonical: https://grafana.com/docs/alloy/latest/reference/release-information/release-cadence/ description: The release cadence for Grafana Alloy menuTitle: Release cadence title: Release cadence -weight: 400 +weight: 200 --- # Release cadence diff --git a/docs/sources/set-up/deploy.md b/docs/sources/set-up/deploy.md index 49987f10c42..a2c1b713c33 100644 --- a/docs/sources/set-up/deploy.md +++ b/docs/sources/set-up/deploy.md @@ -5,7 +5,7 @@ aliases: description: Learn about possible deployment topologies for Grafana Alloy menuTitle: Deploy title: Deploy Grafana Alloy -weight: 900 +weight: 400 --- # Deploy {{% param "FULL_PRODUCT_NAME" %}} @@ -35,21 +35,21 @@ You can also use a Kubernetes Deployment in cases where persistent storage isn't ### Pros -* Straightforward scaling using [clustering][] -* Minimizes the "noisy neighbor" effect -* Easy to meta-monitor +- Straightforward scaling using [clustering][] +- Minimizes the "noisy neighbor" effect +- Easy to meta-monitor ### Cons -* 
Requires running on separate infrastructure +- Requires running on separate infrastructure ### Use for -* Scalable telemetry collection +- Scalable telemetry collection ### Don't use for -* Host-level metrics and logs +- Host-level metrics and logs ## As a host daemon @@ -68,24 +68,24 @@ The simplest use case of the host daemon topology is a Kubernetes DaemonSet, and ### Pros -* Doesn't require running on separate infrastructure -* Typically leads to smaller-sized collectors -* Lower network latency to instrumented applications +- Doesn't require running on separate infrastructure +- Typically leads to smaller-sized collectors +- Lower network latency to instrumented applications ### Cons -* Requires planning a process for provisioning {{< param "PRODUCT_NAME" >}} on new machines, as well as keeping configuration up to date to avoid configuration drift -* Not possible to scale independently when using Kubernetes DaemonSets -* Scaling the topology can strain external APIs (like service discovery) and network infrastructure (like firewalls, proxy servers, and egress points) +- Requires planning a process for provisioning {{< param "PRODUCT_NAME" >}} on new machines, as well as keeping configuration up to date to avoid configuration drift +- Not possible to scale independently when using Kubernetes DaemonSets +- Scaling the topology can strain external APIs (like service discovery) and network infrastructure (like firewalls, proxy servers, and egress points) ### Use for -* Collecting machine-level Prometheus metrics and logs (for example, node_exporter hardware metrics, Kubernetes Pod logs) +- Collecting machine-level Prometheus metrics and logs (for example, node_exporter hardware metrics, Kubernetes Pod logs) ### Don't use for -* Scenarios where {{< param "PRODUCT_NAME" >}} grows so large it can become a noisy neighbor -* Collecting an unpredictable amount of telemetry +- Scenarios where {{< param "PRODUCT_NAME" >}} grows so large it can become a noisy neighbor +- 
Collecting an unpredictable amount of telemetry ## As a container sidecar @@ -100,25 +100,25 @@ The Pod's controller, network configuration, enabled capabilities, and available ### Pros -* Doesn't require running on separate infrastructure -* Straightforward networking with partner applications +- Doesn't require running on separate infrastructure +- Straightforward networking with partner applications ### Cons -* Doesn't scale separately -* Makes resource consumption harder to monitor and predict -* Each {{< param "PRODUCT_NAME" >}} instance doesn't have a life cycle of its own, making it harder to do things like recovering from network outages +- Doesn't scale separately +- Makes resource consumption harder to monitor and predict +- Each {{< param "PRODUCT_NAME" >}} instance doesn't have a life cycle of its own, making it harder to do things like recovering from network outages ### Use for -* Serverless services -* Job/batch applications that work with a push model -* Air-gapped applications that can't be otherwise reached over the network +- Serverless services +- Job/batch applications that work with a push model +- Air-gapped applications that can't be otherwise reached over the network ### Don't use for -* Long-lived applications -* Scenarios where the {{< param "PRODUCT_NAME" >}} deployment size grows so large it can become a noisy neighbor +- Long-lived applications +- Scenarios where the {{< param "PRODUCT_NAME" >}} deployment size grows so large it can become a noisy neighbor [clustering]: ../../configure/clustering/ @@ -133,8 +133,8 @@ This provides better stability due to the isolation between processes. For example, an overloaded {{< param "PRODUCT_NAME" >}} instance processing traces won't impact an {{< param "PRODUCT_NAME" >}} instance processing metrics. Different types of signal collection require different methods for scaling: -* "Pull" components such as `prometheus.scrape` and `pyroscope.scrape` are scaled using hashmod sharing or clustering. 
-* "Push" components such as `otelcol.receiver.otlp` are scaled by placing a load balancer in front of the components. +- "Pull" components such as `prometheus.scrape` and `pyroscope.scrape` are scaled using hashmod sharding or clustering. +- "Push" components such as `otelcol.receiver.otlp` are scaled by placing a load balancer in front of the components. ### Traces @@ -146,9 +146,10 @@ This similarity is because most {{< param "PRODUCT_NAME" >}} components used for #### When to scale To decide whether scaling is necessary, check metrics such as: -* `otelcol_receiver_refused_spans_total` from receivers such as `otelcol.receiver.otlp`. -* `otelcol_receiver_refused_spans_total` from processors such as `otelcol.processor.batch`. -* `otelcol_exporter_send_failed_spans_total` from exporters such as `otelcol.exporter.otlp` and `otelcol.exporter.loadbalancing`. + +- `otelcol_receiver_refused_spans_total` from receivers such as `otelcol.receiver.otlp`. +- `otelcol_processor_refused_spans_total` from processors such as `otelcol.processor.batch`. +- `otelcol_exporter_send_failed_spans_total` from exporters such as `otelcol.exporter.otlp` and `otelcol.exporter.loadbalancing`. #### Stateful and stateless components @@ -160,9 +161,9 @@ You can forward spans with `otelcol.exporter.loadbalancing`. Examples of stateful components: -* `otelcol.processor.tail_sampling` -* `otelcol.connector.spanmetrics` -* `otelcol.connector.servicegraph` +- `otelcol.processor.tail_sampling` +- `otelcol.connector.spanmetrics` +- `otelcol.connector.servicegraph` @@ -173,7 +174,8 @@ A stateless {{< param "PRODUCT_NAME" >}} instance can be scaled without using `o For example, you could use an off-the-shelf load balancer to do a round-robin load balancing. 
Examples of stateless components: -* `otelcol.processor.probabilistic_sampler` -* `otelcol.processor.transform` -* `otelcol.processor.attributes` -* `otelcol.processor.span` + +- `otelcol.processor.probabilistic_sampler` +- `otelcol.processor.transform` +- `otelcol.processor.attributes` +- `otelcol.processor.span` diff --git a/docs/sources/introduction/estimate-resource-usage.md b/docs/sources/set-up/estimate-resource-usage.md similarity index 86% rename from docs/sources/introduction/estimate-resource-usage.md rename to docs/sources/set-up/estimate-resource-usage.md index ad69b0eca30..2f395e2d167 100644 --- a/docs/sources/introduction/estimate-resource-usage.md +++ b/docs/sources/set-up/estimate-resource-usage.md @@ -1,11 +1,12 @@ --- -canonical: https://grafana.com/docs/alloy/latest/introduction/estimate-resource-usage/ +canonical: https://grafana.com/docs/alloy/latest/set-up/estimate-resource-usage/ aliases: - ../tasks/estimate-resource-usage/ # /docs/alloy/latest/tasks/estimate-resource-usage/ + - ../introduction/estimate-resource-usage/ # /docs/alloy/latest/introduction/estimate-resource-usage/ description: Estimate expected Grafana Alloy resource usage title: Estimate Grafana Alloy resource usage menuTitle: Estimate resource usage -weight: 300 +weight: 500 --- # Estimate {{% param "FULL_PRODUCT_NAME" %}} resource usage @@ -23,9 +24,9 @@ The Prometheus metrics resource usage depends mainly on the number of active ser As a rule of thumb, **per each 1 million active series** and with the default scrape interval, you can expect to use approximately: -* 0.4 CPU cores -* 11 GiB of memory -* 1.5 MiB/s of total network bandwidth, send and receive +- 0.4 CPU cores +- 11 GiB of memory +- 1.5 MiB/s of total network bandwidth, send and receive These recommendations are based on deployments that use [clustering][], but they broadly apply to other deployment modes. 
Refer to [Deploy {{< param "FULL_PRODUCT_NAME" >}}][deploy] for more information on how to deploy {{< param "PRODUCT_NAME" >}}. @@ -36,8 +37,8 @@ Loki logs resource usage depends mainly on the volume of logs ingested. As a rule of thumb, **per each 1 MiB/second of logs ingested**, you can expect to use approximately: -* 1 CPU core -* 120 MiB of memory +- 1 CPU core +- 120 MiB of memory These recommendations are based on Kubernetes DaemonSet deployments on clusters with relatively small number of nodes and high logs volume on each. The resource usage can be higher per each 1 MiB/second of logs if you have a large number of small nodes due to the constant overhead of running the {{< param "PRODUCT_NAME" >}} on each node. @@ -50,8 +51,8 @@ Pyroscope profiles resource usage depends mainly on the volume of profiles. As a rule of thumb, **per each 100 profiles/second**, you can expect to use approximately: -* 1 CPU core -* 10 GiB of memory +- 1 CPU core +- 10 GiB of memory Factors such as size of each profile and frequency of fetching them also play a role in the overall resource usage. diff --git a/docs/sources/set-up/supported-platforms.md b/docs/sources/set-up/supported-platforms.md new file mode 100644 index 00000000000..1f2a5211095 --- /dev/null +++ b/docs/sources/set-up/supported-platforms.md @@ -0,0 +1,33 @@ +--- +canonical: https://grafana.com/docs/alloy/latest/set-up/supported-platforms/ +aliases: + - ../introduction/supported-platforms/ # /docs/alloy/latest/introduction/supported-platforms/ +description: Supported platforms for Grafana Alloy +menuTitle: Supported platforms +title: Supported platforms +weight: 50 +--- + +# Supported platforms + +The following operating systems and hardware architectures are supported. + +## Linux + +- Architectures: AMD64, ARM64 +- Within the Linux distribution lifecycle + +## Windows + +- Minimum version: Windows Server 2016 or later, or Windows 10 or later. 
+- Architectures: AMD64 + +## macOS + +- Minimum version: macOS 10.13 or later +- Architectures: AMD64 on Intel, ARM64 on Apple Silicon + +## FreeBSD + +- Within the FreeBSD lifecycle +- Architectures: AMD64