Skywalking benchmark on Finance Core System #9002
Replies: 5 comments 23 replies
-
Thanks for sharing! It looks like the tables are messed up because GitHub is trying to limit the width. Let's find a better way to show the work.
-
Service discovery on Kubernetes (https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-telemetry/#service-discovery-on-kubernetes) should work: I have used it to set up self-observability multiple times, and all the OAP instances were correctly shown with their Pod IPs. Can you share more details on how you set it up and what the result looks like?
I personally think using the SkyWalking gRPC exporter is a quick and practical way to save those key metrics for the longer term.
-
@wu-sheng the author says they will keep updating the perf results, but you moved them to your own gist account. I think the author won't be able to update them there.
-
@lewiselau in terms of the exception
-
A question, what does CCU represent?
I have no idea how Ali MySQL PaaS works. The agent basically works with the client-side driver (JDBC in this case); if that is somehow changed, there will be no related data.
The exporter is designed for this. Choose exporting mode only.
Decreasing this definitely works, if it is acceptable. Storage is definitely the most challenging part, which is why we began the new design of BanyanDB last year: https://github.com/apache/skywalking-banyandb. From now on, we are going to work more on BanyanDB than on ElasticSearch. ElasticSearch's resource cost per service (IOPS or SaaS bill) is clearly very high, and time-series databases (InfluxDB and IoTDB) don't show a distinct performance improvement, especially when facing logs/traces, where they are hugely impacted by the trace ID (too many index candidates).
The number of threads and thread pools is already large enough. Note the issue Zhenxu mentioned; everything else should be fine for 9.0.0.
-
Hi Community,
We ran a performance test on SkyWalking, based on our Finance Core system.
The platform architecture:
Phased Summary:
Some open questions:
We used SkyWalking self-observability to trace OAP-side performance. However, in a scenario with multiple OAP pods we can only find one OAP instance, 'localhost:1234', even though we tried both static hostnames and pod discovery with K8S, following the setup manual: https://skywalking.apache.org/docs/main/latest/en/setup/backend/backend-telemetry/.
<20220509> This one was resolved. Per review from the community experts, the key points are:
- The OAP end uses OpenCensus, so the OpenTelemetry collector ConfigMap needs the opencensus exporter; for reference, see https://github.com/apache/skywalking-showcase/blob/main/deploy/platform/kubernetes/feature-so11y/open-telemetry.yaml#L51
- Disable SW_PROMETHEUS_FETCHER, which is only for the static way.
- Keep the OAP container name and service name the same as the ones defined in the scrape_configs job of OpenTelemetry, e.g. oap as the container name and oap-server as the service name.
Please refer to my practice in the Appendix.
No data is shown on the Virtual Database UI within our environments (Ali K8S PaaS, Ali MySQL PaaS); in other scenarios it works well. No idea so far.
<20220512> This one was resolved. The root cause is that SkyWalking's enhancement was impacted by the enhancement from another APM tracer, Aliyun ARMS, which led to the enhancement failure below:
After disabling ARMS, the Virtual Database UI works well.
We hope to archive some key metrics for long-term analysis, e.g. apdex and p9x response time data. However, there is no default solution. We are trying to validate 3 options: a) GraphQL with a script; b) coding a gRPC server to activate the SkyWalking Exporter feature; c) at the storage level, ETL to another storage. We are still working on it.
<20220512> Working on the gRPC server option; will update later.
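For option a), a minimal sketch of what a GraphQL archiving script could look like. Everything specific here is an assumption rather than something we have validated: the endpoint URL, the `readMetricsValues` query shape (taken from my reading of the SkyWalking query protocol and to be checked against the deployed OAP version), the metric name `service_apdex`, the service name, and the date format for the `DAY` step. Percentile metrics are labeled values and would likely need a different query.

```python
import csv
import io
import json
import urllib.request

# Assumption: placeholder address of the OAP GraphQL endpoint.
OAP_GRAPHQL_URL = "http://localhost:12800/graphql"

# Assumption: query shape as I understand the SkyWalking query protocol;
# verify it against the schema of the OAP version actually deployed.
QUERY = """
query ($condition: MetricsCondition!, $duration: Duration!) {
  readMetricsValues(condition: $condition, duration: $duration) {
    label
    values { values { id value } }
  }
}
"""


def build_payload(metric, service, start, end, step="DAY"):
    """Build the JSON body for one metric of one service over a time range."""
    return {
        "query": QUERY,
        "variables": {
            "condition": {
                "name": metric,
                "entity": {"scope": "Service", "serviceName": service, "normal": True},
            },
            "duration": {"start": start, "end": end, "step": step},
        },
    }


def fetch(payload, url=OAP_GRAPHQL_URL):
    """POST the payload to the OAP and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def to_csv(metric, response):
    """Flatten a readMetricsValues response into CSV text for archiving."""
    points = response["data"]["readMetricsValues"]["values"]["values"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["metric", "id", "value"])
    for point in points:
        writer.writerow([metric, point["id"], point["value"]])
    return buffer.getvalue()
```

A scheduled job could call `to_csv("service_apdex", fetch(build_payload("service_apdex", "my-service", "2022-05-01", "2022-05-07")))` and append the result to a file or table outside the OAP storage; `my-service` is a hypothetical service name.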
After a minor version upgrade within OAP 9.0 (from v9.1.0-SNAPSHOT-85CE164 (20220217220813) to 9.1.0 - 20220508), newly collected data cannot be presented on the UI, which shows the notification below:
Exception while fetching data (/data) : null
The data can be presented after a storage refresh, e.g. of ES.
<20220522> Per testing, new data, including metrics/traces/logs, can be written to Elasticsearch, and data ingested within the last week can be queried from the UI normally. However, when querying data for the past month, where data older than 1 week has already been removed by the housekeeping job, the UI shows an exception warning, "Exception while fetching data (/data) : null", as below:
Also, some warning logs are generated, as below:
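Separately from the log output, a possible client-side workaround for the <20220522> case is to clamp any scripted query window to the retention period before sending it. The sketch below assumes a 7-day retention (taken from the "over 1 week" housekeeping described above); `clamp_window` is a hypothetical helper, not part of SkyWalking, and it does not address the underlying exception.

```python
from datetime import datetime, timedelta


def clamp_window(start, end, now, retention_days=7):
    """Limit (start, end) to data that housekeeping has not yet removed.

    Returns the clamped (start, end) pair, or None when the whole
    requested window is older than the retention period.
    """
    oldest_kept = now - timedelta(days=retention_days)
    clamped_start = max(start, oldest_kept)
    clamped_end = min(end, now)
    if clamped_start >= clamped_end:
        return None
    return clamped_start, clamped_end
```

For example, a request for the past month made on 2022-05-22 would be narrowed to the last 7 days, while a request that lies entirely before the retention boundary would return `None` so the caller can skip the query.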
Detailed performance test data is below, and we will continue to update it later.
https://gist.github.com/lewiselau/9aa8bec87e3af682d7229660f965fbab
Update: 20220508 00
Direct image here, since the gist seems not to be visible at times
Appendix:
1. Self-observability setting in the K8S discovery manner
Per review and guidance from the community experts, the k8s-discovery option of the self-observability feature is activated. The configuration files are attached here as a successful practice.
Environment parameters and ports on the OAP deployment
ports:
env:
ConfigMap/Deployment of the OpenTelemetry collector
Permission settings, since the K8S API server will be accessed for pod metadata information.
2. The UI snapshot
So far we have achieved self-observability in both the static and K8S-discovery ways.