近日发现 opencensus-service 项目有了一些新的动作,主要涉及:
- 将实验性项目 opencensus-service 重命名为 opencensus-agent ;
- 实现一个 Go exporter 用于和 opencensus-agent 交互:
- Proto (agent): opencensus/proto/agent
- Go exporter (agent): opencensus-go-exporter-ocagent
- agent 需要满足:
- [trace/interceptor] The agent needs to be able to pull or receive traces from various sources (Zipkin/Jaeger), and then transform them into the OpenCensus Proto-defined Spans for later consumption - issues#19
- [metrics/interceptor] The agent will be able to pull metrics from sources (Prometheus), and transform them into OpenCensus Metrics which can then be consumed/exported to the various backends - issues#20
- agent architecture diagram: here
- OpenCensus Agent Proto - an overview on how OpenCensus Agent works with Library, Collector/Service and Monitoring backend. However, in the overview section Agent is treated as a blackbox and many implementation details are omitted.
This repository contains the Go implementation of the OpenCensus Agent (OC-Agent) Exporter. OC-Agent is a deamon process running in a VM that can retrieve spans/stats/metrics from OpenCensus Library, export them to other backends and possibly push configurations back to Library.
Note: This is an experimental repository and is likely to get backwards-incompatible changes. Ultimately we may want to move the OC-Agent Go Exporter to OpenCensus Go core library.
- OC-Agent 就是之前的 opencensus-service ,后续会变为 opencensus-agent ;
- OC-Agent Go Exporter 最后会合并回 opencensus-go/exporter 下;
This package describes the OpenCensus Agent protocol.
common
package contains the common messages shared between different services, such asNode
,Service
andLibrary
identifiers.trace
package contains the Trace Service protos.metrics
package contains the Metrics Service protos.- (Coming soon)
stats
package contains the Stats Service protos.
On a typical VM/container, there are user applications running in some processes/pods with OpenCensus Library (
Library
). Previously,Library
did all the recording, collecting, sampling and aggregation on spans/stats/metrics, and exported them to other persistent storage backends via theLibrary
exporters, or displayed them on local zpages. This pattern has several drawbacks, for example:
- For each OpenCensus
Library
, exporters/zpages need to be re-implemented in native languages.- In some programming languages (e.g
Ruby
,PHP
), it is difficult to do the stats aggregation in process.- To enable exporting OpenCensus spans/stats/metrics, application users need to manually add library exporters and redeploy their binaries. This is especially difficult when there’s already an incident and users want to use OpenCensus to investigate what’s going on right away.
- Application users need to take the responsibility in configuring and initializing exporters. This is error-prone (e.g they may not set up the correct credentials\monitored resources), and users may be reluctant to “pollute” their code with OpenCensus.
基于 library 模式进行 instrument 时面临的问题;
To resolve the issues above, we are introducing OpenCensus Agent (
Agent
).Agent
runs as a daemon in the VM/container and can be deployed independent ofLibrary
. OnceAgent
is deployed and running, it should be able to retrieve spans/stats/metrics fromLibrary
, export them to other backends. We MAY also giveAgent
the ability to push configurations (e.g sampling probability) toLibrary
. For those languages that cannot do stats aggregation in process, they should also be able to send raw measurements and haveAgent
do the aggregation.
基于 OpenCensus Agent 的实现(可以做到的东西):
Agent
在部署后,即可立即从Library
获取 spans/stats/metrics 数据,并 export 到相应的各种 backends ;- 可以通过
Agent
反向推送配置给Library
; - 针对不方便实现 stats aggregation 的语言,可以在
Agent
中完成;
For developers/maintainers of other libraries: Agent can also be extended to accept spans/stats/metrics from other tracing/monitoring libraries, such as Zipkin, Prometheus, etc. This is done by adding specific interceptors.
对于其他类型的(非官方支持) Library
,可以通过对 Agent
进行扩展,进而处理相应的数据;这部分功能可以基于特定的 interceptors 实现来完成;
old:
new:
To support
Agent
,Library
should have “agent exporters”, similar to the existing exporters to other backends. There should be 3 separate agent exporters for tracing/stats/metrics respectively. Agent exporters will be responsible for sending spans/stats/metrics and (possibly) receiving configuration updates fromAgent
.
这里解释了 Agent 和 Agent exporter 的关系,以及 Agent exporter 需要支持的功能;
Communication between
Library
andAgent
should user a bi-directional gRPC stream.Library
should initiate the connection, since there’s only one dedicated port forAgent
, while there could be multiple processes withLibrary
running. By default,Agent
is available on port 55678.
- 双向 gRPC stream ;
Library
侧发起连接建立;Agent
监听在 55678 端口;
Library
will try to directly establish connections for Config and Export streams.- As the first message in each stream,
Library
must sent its identifier. Each identifier should uniquely identifyLibrary
within the VM/container. If there is no identifier in the first message, Agent should drop the whole message and return an error to the client. In addition, the first message MAY contain additional data (such asSpan
s). As long as it has a valid identifier assoicated, Agent should handle the data properly, as if they were sent in a subsequent message. Identifier is no longer needed once the streams are established.- If streams were disconnected and retries failed, the
Library
identifier would be considered expired onAgent
side.Library
needs to start a new connection with a unique identifier (MAY be different than the previous one).
This section describes the in-process implementation details of OC-Agent.
Note: Red arrows represent RPCs or HTTP requests. Black arrows represent local method invocations.
The Agent consists of three main parts:
- The interceptors of different instrumentation libraries, such as OpenCensus, Zipkin, Istio Mixer, Prometheus client, etc. Interceptors act as the “frontend” or “gateway” of Agent. In addition, there MAY be one special receiver for receiving configuration updates from outside.
- The core Agent module. It acts as the “brain” or “dispatcher” of Agent.
- The exporters to different monitoring backends or collector services, such as Omnition Collector, Stackdriver Trace, Jaeger, Zipkin, etc.
Agent 由三部分构成:
- interceptors
- core module
- exporters
Each interceptor can be connected with multiple instrumentation libraries. The communication protocol between interceptors and libraries is the one we described in the proto files (for example trace_service.proto). When a library opens the connection with the corresponding interceptor, the first message it sends must have the Node identifier. The interceptor will then cache the Node for each library, and Node is not required for the subsequent messages from libraries.
- 每个 interceptor 可以和多个 instrumentation libraries 建立连接;
- Interceptors 和 libraries 之间的通信协议由 proto 文件定义(例如 trace_service.proto);
- 当一个 library 打开了一个和某个 interceptor 对应的连接,第一条消息必须带有 Node identifier ;之后 interceptor 会为缓存 Node 信息以便所有 library 可用,因此来自 libraries 的后续消息将不在需要 Node 信息了;
Most functionalities of Agent are in Agent Core. Agent Core's responsibilies include:
- Accept
SpanProto
from each interceptor. Note that theSpanProto
s that are sent to Agent Core must have Node associated, so that Agent Core can differentiate and group SpanProtos by each Node.- Store and batch
SpanProto
s.- Augment the
SpanProto
orNode
sent from the interceptor. For example, in a Kubernetes container, Agent Core can detect the namespace, pod id and container name and then add them to its record of Node from interceptor.- For some configured period of time, Agent Core will push
SpanProto
s (grouped by Nodes) to Exporters.- Display the currently stored
SpanProto
s on local zPages.- MAY accept the updated configuration from Config Receiver, and apply it to all the config service clients.
- MAY track the status of all the connections of Config streams. Depending on the language and implementation of the Config service protocol, Agent Core MAY either store a list of active Config streams (e.g gRPC-Java), or a list of last active time for streams that cannot be kept alive all the time (e.g gRPC-Python).
Once in a while, Agent Core will push
SpanProto
withNode
to each exporter. After receiving them, each exporter will translateSpanProto
to the format supported by the backend (e.g Jaeger Thrift Span), and then push them to corresponding backend or service.
- Install
ocagent
if you haven't.
$ go get github.com/census-instrumentation/opencensus-service/cmd/ocagent
Create a config.yaml
file in the current directory and modify it with the exporter and interceptor configuration.
For example, following configuration exports both to Stackdriver and Zipkin.
# config.yaml
stackdriver:
project: "your-project-id"
enableTraces: true
zipkin:
endpoint: "http://localhost:9411/api/v2/spans"
To modify the address that the OpenCensus interceptor runs on, please use the YAML field name opencensus_interceptor
and it takes fields like address. For example:
opencensus_interceptor:
address: "localhost:55678"
Run the example application that collects traces and exports to the daemon if it is running.
- 首先运行
ocagent
$ ocagent
- 之后运行 demo 应用
$ go run "$(go env GOPATH)/src/github.com/census-instrumentation/opencensus-service/example/main.go"
You should be able to see the traces in Stackdriver
and Zipkin
. If you stop the ocagent
, example application will stop exporting. If you run it again, exporting will resume.
The OpenCensus Collector is a component that runs “nearby” (e.g. in the same VPC, AZ, etc.) a user’s application components and receives trace spans and metrics emitted by the OpenCensus Agent or tasks instrumented with OpenCensus instrumentation (or other supported protocols/libraries). The received spans and metrics could be emitted directly by clients in instrumented tasks, or potentially routed via intermediate proxy sidecar/daemon agents (such as the OpenCensus Agent). The collector provides a central egress point for exporting traces and metrics to one or more tracing and metrics backends, with buffering and retries as well as advanced aggregation, filtering and annotation capabilities.
collector 提供了中央 egress 出口,用于导出 traces 和 metrics 到一个或多个 backends ;collector 提供了缓冲功能,提供了失败重试功能,提供了聚合、过滤,以及 annotation 等高级能力;
The collector is extensible enabling it to support a range of out-of-the-box (and custom) capabilities such as:
- Retroactive (tail-based) sampling of traces 抽样率反制调整方法
- Cluster-wide z-pages
- Filtering of traces and metrics 过滤
- Aggregation of traces and metrics 聚合
- Decoration with meta-data from infrastructure provider (e.g. k8s master) 额外信息添加
- much more ...
The collector also serves as a control plane for agents/clients by supplying them updated configuration (e.g. trace sampling policies), and reporting agent/client health information/inventory metadata to downstream exporters.
collector 还可以作为控制面,为 agents/clients 提供配置更新功能,提供健康状态报告元数据给下游的 exporters ;
The OpenCensus Collector runs as a standalone instance and receives spans and metrics exporterd by one or more OpenCensus Agents or Libraries, or by tasks/agents that emit in one of the supported protocols. The Collector is configured to send data to the configured exporter(s). The following figure summarizes the deployment architecture:
The OpenCensus Collector can also be deployed in other configurations, such as receiving data from other agents or clients in one of the formats supported by its interceptors.
- install the collector if you haven't
$ go get github.com/census-instrumentation/opencensus-service/cmd/occollector
Create a config.yaml
file in the current directory and modify it with the collector exporter configuration.
omnition:
tenant: "your-api-key"
- install ocagent if you haven't
$ go get github.com/census-instrumentation/opencensus-service/cmd/ocagent
Create a config.yaml
file in the current directory and modify it with the collector exporter configuration.
collector:
endpoint: "https://collector.local"
- Run the example application that collects traces and exports to the daemon if it is running
$ go run "$(go env GOPATH)/src/github.com/census-instrumentation/opencensus-service/example/main.go"
- Run ocagent
$ ocagent
You should be able to see the traces in the configured tracing backend. If you stop the ocagent, example application will stop exporting. If you run it again, it will start exporting again.