Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (80.90%) is below the target coverage (86.00%). You can increase the head coverage or adjust the target coverage.

```
@@ Coverage Diff @@
##             main    #1701      +/-   ##
==========================================
- Coverage   80.91%   80.90%   -0.01%
==========================================
  Files         146      147       +1
  Lines       13374    13378       +4
==========================================
+ Hits        10821    10823       +2
- Misses       1890     1891       +1
- Partials      663      664       +1
```
```
// The full text of the Apache license is available in the LICENSE file at
// the root of the repo.

package json
```
centralize the json functions here so that we can switch the impl later (if we ever want to) easily
nacx left a comment:
This is nice and LGTM!
Is there a linter rule we can configure to prevent using encoding/json and make sure we always use our internal package? It would be very easy to start using encoding/json again by mistake in the future.
Already added a linter rule ;)
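Such a rule can be expressed, for example, with golangci-lint's depguard linter. This is a hypothetical sketch of what that configuration could look like; the rule actually added in the repo may use a different linter or package path.

```yaml
# .golangci.yml (sketch)
linters-settings:
  depguard:
    rules:
      main:
        deny:
          - pkg: encoding/json
            desc: use the internal json package instead
```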
# Conflicts:
#	tests/internal/testupstreamlib/server_test.go
#	tests/internal/testupstreamlib/testupstream/main.go
Oh, true, I missed it :)
**Description**
This replaces encoding/json with bytedance/sonic for faster JSON operations, which drastically improves the data plane benchmark results, especially on the translation code path. One thing to note is that we could have used goccy/go-json instead, but it comes with some incompatibilities (no omitzero tag support, etc.). bytedance/sonic, on the other hand, is 99% compatible with the current behavior; the only difference is field order, which semantically doesn't matter in practice.
```
goos: linux
goarch: amd64
pkg: github.com/envoyproxy/ai-gateway/tests/data-plane
cpu: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
ChatCompletions/OpenAI_-_small-16 135.0µ ± 1% 107.7µ ± 2% -20.24% (p=0.000 n=10)
ChatCompletions/OpenAI_-_medium-16 2.546m ± 1% 1.456m ± 1% -42.83% (p=0.000 n=10)
ChatCompletions/OpenAI_-_large-16 28.50m ± 7% 17.41m ± 4% -38.91% (p=0.000 n=10)
ChatCompletions/OpenAI_-_xlarge-16 141.24m ± 3% 71.07m ± 7% -49.68% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_small-16 155.5µ ± 1% 120.4µ ± 1% -22.57% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_medium-16 3.475m ± 1% 2.004m ± 1% -42.33% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_large-16 37.01m ± 2% 24.08m ± 3% -34.93% (p=0.000 n=10)
ChatCompletions/AWS_Bedrock_-_xlarge-16 346.1m ± 3% 100.2m ± 5% -71.04% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_small-16 172.3µ ± 1% 141.9µ ± 2% -17.63% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_medium-16 4.371m ± 1% 2.827m ± 1% -35.31% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_large-16 51.21m ± 3% 32.67m ± 3% -36.19% (p=0.000 n=10)
ChatCompletions/GCP_VertexAI_-_xlarge-16 344.9m ± 2% 102.3m ± 2% -70.33% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_small-16 232.4µ ± 1% 198.4µ ± 1% -14.64% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_medium-16 9.098m ± 1% 7.513m ± 2% -17.43% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_large-16 102.14m ± 4% 83.27m ± 3% -18.48% (p=0.000 n=10)
ChatCompletions/GCP_AnthropicAI_-_xlarge-16 2.014 ± 3% 1.450 ± 2% -27.99% (p=0.000 n=10)
ChatCompletionsStreaming/OpenAI_Streaming-16 13.11m ± 0% 13.12m ± 1% ~ (p=0.190 n=10)
ChatCompletionsStreaming/AWS_Streaming-16 13.11m ± 0% 13.10m ± 0% ~ (p=0.353 n=10)
geomean 11.33m 7.424m -34.50%
```
---------
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>