New component: starlarktransform processor #27087
Comments
@TylerHelmuth, created a follow-up proposal implementing changes based on the feedback from the Collector Sig. It is still possible that this processor will not be accepted but I wanted to create a history showing the various objections for any future devs with similar ideas. Let me know if you have any additional questions or issues. Thanks. |
@daidokoro can you share some scenarios/problems where the starlarktransform processor is the right solution? |
Hey @TylerHelmuth,
Going through my list and comparing it to the current version of the transform processor, I would say that the transform processor has been updated sufficiently to handle most of my outliers.
Scenario:
Given:
{
"resourceSpans": [
{
"resource": {
"attributes": [
{
"key": "telemetry.sdk.language",
"value": {
"stringValue": "go"
}
},
{
"key": "zones",
"value": {
"stringValue": "[\"eu-west-1a\", \"eu-west-1b\"]"
}
},
{
"key": "metadata",
"value": {
"stringValue": "{\"os\": \"linux\", \"arch\": \"x86\", \"instances\": 2}"
}
}
]
},
"scopeSpans": [
{
"scope": {
"name": "opentelemetry.instrumentation.flask",
"version": "0.40b0"
},
"spans": [
{
"traceId": "9cb5bf738137b2248dc7b20445ec2e1c",
"spanId": "88079ad5c94b5b13",
"parentSpanId": "",
"name": "/roll",
"kind": 2,
"startTimeUnixNano": "1694388218052842000",
"endTimeUnixNano": "1694388218053415000",
"attributes": [],
"status": {}
}
]
}
]
}
]
}
Desired outcome:
Solutions
Transform Processor
The transform processor could handle the first criterion by offloading the JSON into cache, setting the required keys from it and then deleting the metadata field. For the second criterion, I'm not sure how this would be handled. I could extract the required values from the array using regex, but I'm not sure there is a way to enumerate key names, and I can't hardcode the key names because the array can be of arbitrary length.
Starlark Processor
Both criteria can be handled by the following:
processors:
  starlarktransform/traces:
    code: |
      def unpack_zones(event):
        e = json.decode(event)
        for rs in e["resourceSpans"]:
          zones = [
            json.decode(attr["value"]["stringValue"])
            for attr in rs["resource"]["attributes"]
            if attr["key"] == "zones"
          ]
          if not zones:
            return e
          zones = zones[0]
          count = 0
          for zone in zones:
            rs["resource"]["attributes"].append({
              "key": "zone_{}".format(count),
              "value": {
                "stringValue": zone
              }
            })
            count += 1
          rs["resource"]["attributes"] = [
            r for r in rs["resource"]["attributes"]
            if r["key"] != "zones"
          ]
        return e
      def unpack_metadata(e):
        for rs in e["resourceSpans"]:
          metadata = [
            json.decode(attr["value"]["stringValue"])
            for attr in rs["resource"]["attributes"]
            if attr["key"] == "metadata"
          ]
          if not metadata:
            return e
          metadata = metadata[0]
          for k, v in metadata.items():
            rs["resource"]["attributes"].append({
              "key": k,
              "value": {
                "stringValue" if type(v) == "string" else "intValue": v
              }
            })
          # remove item from the list
          rs["resource"]["attributes"] = [
            r for r in rs["resource"]["attributes"]
            if r["key"] != "metadata"
          ]
        return e
      def transform(event):
        event = unpack_zones(event)
        return unpack_metadata(event)
This results in:
Note that the scenario above isn't the reason the starlark processor was created. Whether or not this is possible using the transform processor isn't the issue. The issue is the lack of familiarity with the transform processor itself in complex cases. As I mentioned, I was unsure how to approach the problem using the transform processor, even though I've been through the docs a few times. It may well be possible to do this in the transform processor, but it was not obvious. However, I knew instinctively how to solve it with the starlark processor, as it is simply managing a Dict.
This brings me to Scenario No. 2: any time-critical situation in which it is unclear how to solve a particular transform using the transform processor. Users can complete the transform using the starlark processor, then do the necessary research to figure out how to accomplish it using the main transform processor. Sometimes this may involve opening an issue on GitHub, etc. |
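The Scenario 1 logic can also be exercised outside the Collector. Below is a rough Python port, offered as a sketch rather than the processor's actual behaviour: it uses `json.loads` where the Starlark code uses `json.decode`, the helper names (`unpack_attr`, `zones_to_attrs`, `metadata_to_attrs`) are hypothetical, and, unlike the Starlark version, it skips a resource that lacks the attribute instead of returning early.

```python
import json

def unpack_attr(event, key, to_attrs):
    """Replace the JSON-encoded resource attribute `key` with the
    attributes built by `to_attrs(decoded_value)` (hypothetical helper)."""
    for rs in event["resourceSpans"]:
        attrs = rs["resource"]["attributes"]
        decoded = [json.loads(a["value"]["stringValue"])
                   for a in attrs if a["key"] == key]
        if not decoded:
            continue  # nothing to unpack on this resource
        kept = [a for a in attrs if a["key"] != key]
        rs["resource"]["attributes"] = kept + to_attrs(decoded[0])
    return event

def zones_to_attrs(zones):
    # One attribute per array element: zone_0, zone_1, ...
    return [{"key": "zone_{}".format(i), "value": {"stringValue": z}}
            for i, z in enumerate(zones)]

def metadata_to_attrs(metadata):
    # Strings become stringValue; everything else is treated as intValue,
    # mirroring the Starlark snippet above.
    out = []
    for k, v in metadata.items():
        value_key = "stringValue" if isinstance(v, str) else "intValue"
        out.append({"key": k, "value": {value_key: v}})
    return out

def transform(event):
    e = json.loads(event)
    e = unpack_attr(e, "zones", zones_to_attrs)
    return unpack_attr(e, "metadata", metadata_to_attrs)

# Trimmed version of the resource from the scenario above.
sample = json.dumps({"resourceSpans": [{"resource": {"attributes": [
    {"key": "telemetry.sdk.language", "value": {"stringValue": "go"}},
    {"key": "zones",
     "value": {"stringValue": "[\"eu-west-1a\", \"eu-west-1b\"]"}},
    {"key": "metadata",
     "value": {"stringValue":
               "{\"os\": \"linux\", \"arch\": \"x86\", \"instances\": 2}"}},
]}, "scopeSpans": []}]})

result = transform(sample)
keys = [a["key"] for a in result["resourceSpans"][0]["resource"]["attributes"]]
print(keys)
```

On the trimmed sample, the resource ends up with `telemetry.sdk.language`, `zone_0`, `zone_1`, `os`, `arch` and `instances`, and the original `zones` and `metadata` attributes are gone.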
I believe this has been discussed in a Sig meeting already, but just in case I'm mistaken I'll drop the new component blurb below. If you have not found a volunteer sponsor yet then I encourage you to come to our weekly collector sig meetings. You can add an item to the agenda to discuss this new component proposal. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
This processor has been released via open source and is available here |
Sad shit that this was rejected, the internal otel transformer is abysmal |
The purpose and use-cases of the new component
starlarktransform
The starlarktransform processor modifies telemetry based on configuration using Starlark code.
Starlark is a scripting language used for configuration that is designed to be similar to Python. It is designed to be fast, deterministic, and easily embedded into other software projects.
The processor leverages Starlark to modify telemetry data while using familiar, pythonic syntax. Modifying telemetry data is as simple as modifying a Dict.
Why?
While there are a number of transform processors, most notably, the main OTTL transform processor, this processor aims to grant users more flexibility by allowing them to manipulate telemetry data using a familiar syntax.
Python is a popular, well known language, even among non-developers. By allowing Starlark code to be used as an option to transform telemetry data, users can leverage their existing knowledge of Python.
How it works
The processor uses the Starlark-Go interpreter, which allows you to run this processor without having to install a Starlark language interpreter on the host.
Features
The starlarktransform processor gives you access to the full telemetry event payload. You are able to modify this payload using Starlark code in any way you want. This allows you to do various things such as:
Libs, Functions and Functionality
While similar in syntax to Python, Starlark does not have all the functionality associated with Python. This processor does not have access to the Python standard libraries, and the implementation found in this processor is limited further to only the following libraries and functions:
You can read more on the JSON library here
The print statement above would result in the following output in the OpenTelemetry runtime log. Again, this output is only visible when running OpenTelemetry in Debug mode.
Note that you can define your own functions within your Starlark code. However, there must be at least one function named transform that accepts a single argument, event, and returns a JSON decoded Dict. This function can call all your other functions as needed.
Warnings
The starlarktransform processor allows you to modify all aspects of your telemetry data. This can result in invalid or bad data being propagated if you are not careful. It is your responsibility to inspect the data and ensure it is valid.
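One way to act on this warning is to run a small sanity check over the event a `transform` function returns, for example in a unit test before the code goes into the Collector config. The sketch below is plain Python, not something the processor provides: `validate_trace_event` is a hypothetical helper, and the fields it requires (`traceId`, `spanId`, `name`) are an assumption taken from the trace example earlier in this thread.

```python
def validate_trace_event(event):
    """Return a list of problems found in a decoded OTLP/JSON trace event.
    An empty list means no problem was detected by these (minimal) checks."""
    resource_spans = event.get("resourceSpans")
    if not isinstance(resource_spans, list):
        return ["resourceSpans must be a list"]

    problems = []
    for i, rs in enumerate(resource_spans):
        # Every resource attribute needs both a key and a value.
        for attr in rs.get("resource", {}).get("attributes", []):
            if "key" not in attr or "value" not in attr:
                problems.append(
                    "resourceSpans[{}]: attribute missing key or value".format(i))
        # Every span needs the identifying fields to be present and non-empty.
        for ss in rs.get("scopeSpans", []):
            for span in ss.get("spans", []):
                for field in ("traceId", "spanId", "name"):
                    if not span.get(field):
                        problems.append(
                            "resourceSpans[{}]: span missing {}".format(i, field))
    return problems

good = {"resourceSpans": [{
    "resource": {"attributes": [
        {"key": "telemetry.sdk.language", "value": {"stringValue": "go"}}]},
    "scopeSpans": [{"spans": [{
        "traceId": "9cb5bf738137b2248dc7b20445ec2e1c",
        "spanId": "88079ad5c94b5b13",
        "name": "/roll"}]}],
}]}

# Same event, but a transform accidentally dropped the span name
# and wrote an attribute without a value.
bad = {"resourceSpans": [{
    "resource": {"attributes": [{"key": "zone_0"}]},
    "scopeSpans": [{"spans": [{
        "traceId": "9cb5bf738137b2248dc7b20445ec2e1c",
        "spanId": "88079ad5c94b5b13"}]}],
}]}

print(validate_trace_event(good))
print(validate_trace_event(bad))
```

The checks are deliberately shallow; they catch structural slips such as a dropped key, not semantically wrong values.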
Example configuration for the component
You must define a function called transform that accepts a single argument, event. This function is called by the processor and is passed the telemetry event. The function must return the modified, JSON decoded event.
Full Configuration Example
The following configuration example demonstrates the starlarktransform processor handling telemetry events for logs, metrics and traces.
Telemetry data types supported
Supports:
Is this a vendor-specific component?
Code Owner(s)
@daidokoro
Sponsor (optional)
No response
Additional context
This proposal follows the previous pytransform processor proposal and aims to address issues raised with the previous implementation.
Previous Concerns:
This processor does not aim to compete or replace the OTTL Transform processor. The goal is to provide an approachable or familiar method for accomplishing telemetry data transformations. Benchmarks show the main transform processor is 2x faster than the starlark processor. So if performance is a key requirement, the transform processor is recommended.
The main transform processor also offers significant abstraction, allowing users to accomplish more with less code for certain tasks.
Take the following transformation for example.
To accomplish the same using the starlarktransform processor:
As you can see, there is no abstraction for dealing with the underlying data types when using the starlark processor. Again, this is not meant to compete with the transform processor, only to provide a familiar alternative.
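To make the point about missing abstraction concrete, here is a sketch of what setting a single resource attribute involves when working directly on the decoded OTLP JSON structure. It is written in Python for runnability (`json.loads` stands in for Starlark's `json.decode`); the helper `set_resource_attr` and the attribute name `deployment.environment` are illustrative assumptions, not part of the processor.

```python
import json

def set_resource_attr(event, key, value):
    """Set or overwrite a string resource attribute on every resource
    (hypothetical helper; attributes are a list of key/value dicts)."""
    for rs in event["resourceSpans"]:
        attrs = rs["resource"]["attributes"]
        found = False
        for attr in attrs:
            if attr["key"] == key:
                # Overwrite in place, wrapping the value in its typed envelope.
                attr["value"] = {"stringValue": value}
                found = True
        if not found:
            attrs.append({"key": key, "value": {"stringValue": value}})
    return event

def transform(event):
    e = json.loads(event)  # json.decode(event) in Starlark
    return set_resource_attr(e, "deployment.environment", "staging")

payload = json.dumps({"resourceSpans": [{
    "resource": {"attributes": [
        {"key": "telemetry.sdk.language", "value": {"stringValue": "go"}}]},
    "scopeSpans": [],
}]})

updated = transform(payload)
print(updated["resourceSpans"][0]["resource"]["attributes"])
```

The list walk, the key comparison and the typed `stringValue` wrapper all have to be written by hand here, which is the kind of detail the main transform processor handles for the user.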
As stated above, this processor is approximately 2x slower than the transform processor.
Unlike the previous implementation that used embedded Python, the Starlark implementation does not spawn subprocesses to execute Starlark code. The interpreter itself is embedded using the Starlark Go package.
Starlark code does not have access to the internet, filesystem or any system processes. There are also no additional libraries, except those explicitly defined and allowed by the interpreter. It is completely sandboxed.
Current code implementation can be found here
Active PR