Exports block query errors to prometheus metrics (counter) #1239

Merged: 5 commits into main from dan/prom-errors, Jul 25, 2023


boojamya
Contributor

@boojamya boojamya commented Jul 12, 2023

This PR...

Exports block query errors to Prometheus metrics. The metric is added as a counter so that we can monitor the rate at which it increases.

They are divided into two categories: RPC client query errors and IBC header query errors.

Screenshot 2023-07-12 at 11 06 54 AM

(This also updates the docs to use the default metrics port, 5183.)

Closes: #1211

@boojamya boojamya added T: Enhancement TYPE: Enhancement C: Monitoring COMPONENT: Monitoring labels Jul 12, 2023
@boojamya boojamya requested a review from nourspace July 20, 2023 21:32
@boojamya boojamya enabled auto-merge (squash) July 25, 2023 19:25
@boojamya boojamya merged commit 107d3f5 into main Jul 25, 2023
@boojamya boojamya deleted the dan/prom-errors branch July 25, 2023 19:32
ashishchandr70 added a commit to sagaxyz/relayer that referenced this pull request Oct 4, 2023
* Better Error Messaging when failing to query the Block Height (cosmos#1189)

* better block data errors

* remove redundant field

* penumbra: update protos (cosmos#1181)

Matches the latest protos shipped with the Penumbra Testnet 52.

Co-authored-by: Conor Schaefer <[email protected]>
Co-authored-by: Justin Tieri <[email protected]>

* Neutron launch fixes and optimizations (cosmos#1185)

* pipe max msgs through path processor

* only apply max msgs to packet msgs

* multiple msgs simultaneously on ordered chans

* flush should be more frequent if it fails or does not complete

* fix legacy

* handle feedback

* Problem: fixes in ibc-go v7.0.1 are not included (cosmos#1205)

* Problem: fixes in ibc-go v7.0.1 are not included

* add change doc

* Harry/rly address (cosmos#1204)

* added addressCmd to root and keys.go

* nicks

* nick

* made a common method "showAddressByChainAndKey" to be used by both addressCmd and keysShowCmd

---------

Co-authored-by: Harry <[email protected]>
Co-authored-by: Andrew Gouin <[email protected]>

* deps: update to ibc-go v7.1.0-rc0 (cosmos#1207)

* Export wallet address for Prometheus metrics (cosmos#1206)

* export relayer address for pro

* address in updateFeesSpent

* make error messages consistent

* log error rather than return

* handle 0 balance

* chore: replace gogo/protobuf to cosmos/gogoproto (cosmos#1208)

* rm dup pb

* add missing cp for messages.proto

* make proto-gen

* migrate gogo/protobuf to cosmos/gogoproto

* add change log

* mod tidy

* feat: add extension_options parameter in chain configs (cosmos#1179)

* support extension options for build tx

* add test

* add change doc

* rm dup pb

* update mod

* point to v0.47.3

* feat: support localhost ibc (cosmos#1191)

* WIP: testing localhost ibc client

* WIP: localhost ibc support

* WIP: debugging channel handshake correlation bug

* WIP: debugging localhost IBC

* debugging failed ibc transfers

* chore: remove debug output

* fix: get acks working + cleanup localhost data handling

* test: add additional assertions, debug failing timeouts

* fix: remove redundant chankey assignment

* fix: update to latest interchaintest commit + fix test

* fix: hack to get ordered channels working on localhost

* test: implement ica test case for localhost

* fix: update reverted Go version

* test: fix flaky scenario e2e test

* fix: address suggestions from code review

* dep: bump cometbft and ibc-go (cosmos#1221)

* bump cometbft to v0.37.2

* bump ibc-go to v7.2.0

* add change doc

* add missing stop relayer to avoid log after test complete (cosmos#1229)

Co-authored-by: Justin Tieri <[email protected]>

* fix: avoid invalid Bech32 prefix in scenario test (cosmos#1226)

* avoid invalid Bech32 prefix

due to singleton GetConfig

* add change doc

* separate process in ci

* separate fee middleware test for juno

* wait more blks for ack (cosmos#1222)

* penumbra provider: update proof spec (cosmos#1232)

* fix: flag accessed but not defined: flush-interval (cosmos#1238)

* penumbra provider: update protos (cosmos#1245)

* fix: Suppressing scary SDK error on redundant packets (cosmos#1214)

Co-authored-by: Andrew Gouin <[email protected]>

* catch error if type is missing (cosmos#1234)

Co-authored-by: Andrew Gouin <[email protected]>

* Export client expiration metric to prometheus (cosmos#1235)

* export client expiration metric

* finalize

* add path name

* snake case

* change label to `chain`

* trusting period as string

* Export configured gas prices to prometheus wallet balance metric (cosmos#1236)

* export gas price to prom

* update label

* update fees spent metric

* snake case

* Exports block query errors to prometheus metrics (counter) (cosmos#1239)

* separate by type

* add help info

* remove new line in help and fix readme

* feedback

* Export TX failures to prometheus metrics (counter) (cosmos#1240)

* export tx failures to prometheus

* change label to `cause`

* use the name given by the user to generate the fetch URL (cosmos#1233)

* use the name given by the user to generate the fetch URL

* add example

---------

Co-authored-by: Andrew Gouin <[email protected]>
Co-authored-by: Dan Kanefsky <[email protected]>

* feegrant PR (cosmos#1140)

* Feegrant support

* Test case for address caching bugfix

* Bugfix for SDK account prefix. Feegrant test passing.

* Mutex for signer expanded to include feegrantees

* Cleaned up feegrant test case

* Cleaned up feegrant test case

* Cleaned up feegrant test case

* check round robin feegrant behavior by counting number of TXs each grantee signer

* module updates from merge

* v0.47.0 with bech32 address cache fix

* Move SetAddrCacheEnabled to NewRelayer func for full coverage

* Do not hardcode chain id in feegrant test case

* Wait more blocks for ibc transfers

* disable cosmos SDK bech32 address cache for rly start command

* Fix sloppy comments/remove unnecessary code

* Faster acc caching unit test

* Penumbra provider feegrant support

* Merge upstream

* Fixed merge issue where feegrant config wasn't being written to file

* feegrant patch for cosmos-sdk v0.47.1

* merge from main

* Update to cosmos-sdk v0.47.2

* Increase test case blocks to wait

* Fixed data race by moving test parallelization after relayer wallet build

* Increased TestScenarioICAChannelClose timeout height

* Cleanup feegrant test case

* Fixed race condition in sequence guard w/ mutex

* Automatic retry for TX lookup in feegrant test case

---------

Co-authored-by: Andrew Gouin <[email protected]>

* Export Client Trusting Period to Prometheus metrics (cosmos#1246)

* export client trusting period

* update docs

* separate feegrant test to avoid no space left on device (cosmos#1250)

from scenarios test in ci

* Add extra client info when querying client expiration (cosmos#1247)

* extra client info

* cleanup print

* remove extra comments

* add alias

* allow only one client

* spelling

* remainingTime

---------

Co-authored-by: Justin Tieri <[email protected]>

* next seq ack handling (cosmos#1244)

* next seq ack handling and chan order

* use max msgs for ack flush

* improve logs

* fix check

* don't override unless not chantypes.NONE

* fix: Suppressing scary SDK error on redundant packets (cosmos#1214)

Co-authored-by: Andrew Gouin <[email protected]>

* tidy logic

* improve logic and order detection

* shorten flushFailureRetry

* check empty string

* tidy logs. better account sequence regex. don't split up ordered channel batches

---------

Co-authored-by: Joe Abbey <[email protected]>
Co-authored-by: jtieri <[email protected]>

* chore: update penumbra protos to v0.57.0 (cosmos#1249)

Version 0.57.0 of Penumbra was released on 2023-07-26 [0].
This commit pulls in the latest proto defs from BSR.

[0] https://github.com/penumbra-zone/penumbra/releases/tag/v0.57.0

Co-authored-by: Conor Schaefer <[email protected]>
Co-authored-by: Justin Tieri <[email protected]>

* fix: reduce get bech32 prefix when get signer (cosmos#1231)

* add missing getFeePayer for clienttypes

* add missing getFeePayer for feetypes

* apply in penumbra

* add change doc

---------

Co-authored-by: Justin Tieri <[email protected]>

* update setup-go action (cosmos#1251)

* fix for feegrants (cosmos#1256)

* Feegrant support

* Test case for address caching bugfix

* Bugfix for SDK account prefix. Feegrant test passing.

* Mutex for signer expanded to include feegrantees

* Cleaned up feegrant test case

* Cleaned up feegrant test case

* Cleaned up feegrant test case

* check round robin feegrant behavior by counting number of TXs each grantee signer

* module updates from merge

* v0.47.0 with bech32 address cache fix

* Move SetAddrCacheEnabled to NewRelayer func for full coverage

* Do not hardcode chain id in feegrant test case

* Wait more blocks for ibc transfers

* disable cosmos SDK bech32 address cache for rly start command

* Fix sloppy comments/remove unnecessary code

* Faster acc caching unit test

* Penumbra provider feegrant support

* Merge upstream

* Fixed merge issue where feegrant config wasn't being written to file

* feegrant patch for cosmos-sdk v0.47.1

* merge from main

* Update to cosmos-sdk v0.47.2

* Increase test case blocks to wait

* Fixed data race by moving test parallelization after relayer wallet build

* Increased TestScenarioICAChannelClose timeout height

* Cleanup feegrant test case

* Fixed race condition in sequence guard w/ mutex

* Automatic retry for TX lookup in feegrant test case

* Disable cosmos SDK address cache on app initialization via main package init()

* Added docs for feegrant in the advanced usage guide

* Removed commented out code

* Removed commented out code

* Added detail to feegrant docs, fixed minor issue with test case

* Added detail to feegrant docs, fixed minor issue with test case

---------

Co-authored-by: Andrew Gouin <[email protected]>
Co-authored-by: Kyle <[email protected]>

* chore: update penumbra protos (cosmos#1260)

Updating to the latest Penumbra upstream protos.
We'll likely submit another round of changes ahead of the next public
testnet release, as a heads up.

Co-authored-by: Conor Schaefer <[email protected]>

* Change 2.3.0 to 2.4.0 (cosmos#1253)

Co-authored-by: Justin Tieri <[email protected]>

* rename path to path_name for consistency (cosmos#1262)

* Use unique names for relayer images & cleanup when purpose served (cosmos#1269)

* Use unique names for relayer images & cleanup when purpose served

* move random tag generation and teardown image to within BuildRelayerImage

* fix return line

* use ibc-go capability module (cosmos#1277)

* use ibc-go capability module

* tidy interchaintest

* fix: use resp.Events to parse events instead of logs (cosmos#1271)

* fix: use resp.Events to parse events instead of logs

* revert: use legacy behaviour as fallback mechanism

* refactor: use legacy approach first, fallback onto new parsing approach

---------

Co-authored-by: Justin Tieri <[email protected]>

* working correctly with 7.3 and 0.47.5 (cosmos#1280)

Co-authored-by: Justin Tieri <[email protected]>

* Add use command to "rly keys" (cosmos#1282)

* Update README.md with an update to leverage 'rly key use' in step 5 (cosmos#1289)

* Update README.md with an update to leverage 'rly key use' in step 5

* change alias from 'a' to 'u' for use subcommand

* Ability to fetch specific chain paths only (cosmos#1291)

* feat: allow a relayer to fetch a specific chain only

* minor: check specific path pair logic earlier

* faddat/upgrade go (cosmos#1279)

* upgrade ci to go 1.21

* upgrade to go1.21

---------

Co-authored-by: Justin Tieri <[email protected]>

* Split scenarios test (cosmos#1294)

* Split scenarios test

* use matrix

* updates

* need to cd into dir first

* handle deprecation

* Remove rogue entry into the matrix

* Ensure other parallel tests run to completion even if one of them fails

* Add explanation

* Remove rogue whitespace

* Add verbosity and timeout for scenarios tests (cosmos#1295)

* Add output flag for query sub commands results printed to console. (cosmos#1281)

* output json for query cmd's balance & clients-expirations
* print proper json instead of bytes for headers command
* Add output flag. Use legacy, json options
* Update according to reviews

* Add ability to fetch testnet chains and paths + force-add ability (cosmos#1285)

* add testnet and force-add

* update cmd examples

* improve usage description

* update interchaintest workflow (cosmos#1298)

* Query param prop directly (cosmos#1264)

* Query param prop directly

* Flip order of queries for QueryUnbondingPeriod

* Add fallback for chains using cosmos-sdk 47+

* Trusting period logic remains same

* Add Fallback

* Consolidate functions into a single queryParamsSubspaceTime

* Saga IBC transfers between SPC and chainlet

* Set keystore password

* Poll chainlet until it's up and running

* Remove unused mnemonic file

* Rename example .json files to .json.example

* Env vars renaming

* Rename SPC_EXTERNAL_ADDRESS -> SPC_RPC_EXTERNAL_ADDRESS

* Feature/ccv (#2)

* Updates to support CCV

* Re-added out/err redirect to log files

* Add multiple tries for linking. Upd chain waiting

* Minor fix to Github action. Update rly start

---------

Co-authored-by: Ashish Chandra <[email protected]>

* Committing go.work.sum

---------

Co-authored-by: Keefer Taylor | Tessellated <[email protected]>
Co-authored-by: Conor Schaefer <[email protected]>
Co-authored-by: Conor Schaefer <[email protected]>
Co-authored-by: Justin Tieri <[email protected]>
Co-authored-by: Andrew Gouin <[email protected]>
Co-authored-by: mmsqe <[email protected]>
Co-authored-by: Cosmos-Harry <[email protected]>
Co-authored-by: Harry <[email protected]>
Co-authored-by: Dan Kanefsky <[email protected]>
Co-authored-by: Ava Howell <[email protected]>
Co-authored-by: mindcarver <[email protected]>
Co-authored-by: Joe Abbey <[email protected]>
Co-authored-by: murataniloener <[email protected]>
Co-authored-by: KyleMoser <[email protected]>
Co-authored-by: jtieri <[email protected]>
Co-authored-by: Kyle <[email protected]>
Co-authored-by: Sr20de <[email protected]>
Co-authored-by: danb <[email protected]>
Co-authored-by: vimystic <[email protected]>
Co-authored-by: Jacob Gadikian <[email protected]>
Co-authored-by: colin axnér <[email protected]>
Co-authored-by: Reece Williams <[email protected]>
Co-authored-by: Konstantin Munichev <[email protected]>
Co-authored-by: Ashish Chandra <[email protected]>
Labels
C: Monitoring COMPONENT: Monitoring T: Enhancement TYPE: Enhancement
Development

Successfully merging this pull request may close these issues.

prometheus metric for block query errors
3 participants