fix: remove invalid chars in prometheus metrics#6065
Conversation
WalkthroughRemoved the type name utility and its tests, dropped the convert_case dependency, and changed tipset state caches to require explicit per-cache identifier strings — updating call sites and removing previous default/derived cache naming and public re-exports. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant StateManager
participant TipsetCacheFactory as TipsetStateCache
participant LRU as LruCacheImpl
StateManager->>TipsetCacheFactory: new("state_output")
Note right of TipsetCacheFactory: construct name "tipset_state_state_output"
TipsetCacheFactory->>LRU: initialize with name & size
LRU-->>TipsetCacheFactory: cache instance
TipsetCacheFactory-->>StateManager: TipsetStateCache<StateOutputValue>
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks (1 passed, 2 warnings)❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing Touches
🧪 Generate unit tests
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/utils/misc/type.rs (3)
7-10: Regex cleanup: avoid zero-length global matches; fix typo; keep intent.
((... )|\s)*can match empty strings;replace_allwill still terminate, but this does extra work.- Rename
multi_understore_pattern→multi_underscore_pattern.Lean regexes with the same behavior:
- let prefix_pattern = lazy_regex::regex!(r#"(([^,:<>]+::)|\s)*"#); + // Remove every `segment::` path prefix anywhere (keeps the final type identifiers) + let prefix_pattern = lazy_regex::regex!(r#"(?:[A-Za-z_][A-Za-z0-9_]*)::"#); @@ - let multi_understore_pattern = lazy_regex::regex!(r#"_{2,}"#); + let multi_underscore_pattern = lazy_regex::regex!(r#"_{2,}"#); @@ - let n = multi_understore_pattern.replace_all(n.as_ref(), "_"); + let n = multi_underscore_pattern.replace_all(n.as_ref(), "_");Also applies to: 14-14
12-18: Harden invariants for Prometheus: non-empty and non-digit-leading names.If this feeds metric/label names (not just values), enforce Prometheus constraints: only [A-Za-z0-9_], not empty, and not starting with a digit. Current code can yield an empty string for types like
()(becomes_→ trimmed to"").Apply tail-end normalization:
- n.as_ref() - .strip_suffix("_") - .unwrap_or(n.as_ref()) - .to_string() + { + // Trim both ends (underscores can accumulate at boundaries) + let mut s = n.as_ref().trim_matches('_').to_string(); + if s.is_empty() { + s.push_str("Unknown"); + } + if s.starts_with(|c: char| c.is_ascii_digit()) { + s.insert(0, '_'); + } + s + }Confirm intended target (metric name vs label value). If it’s a metric or label name, consider an assertion test ensuring:
- starts with [A-Za-z_] (metric name can also start with ‘_’),
- only [A-Za-z0-9_],
- non-empty.
30-31: Tests updated appropriately; consider adding invariants coverage.LGTM on changed expectations.
Add a couple of edge-case tests to lock in Prometheus-safety invariants:
#[test] fn test_short_type_name_edge_cases() { // Unit type should not produce empty let u = short_type_name::<()>(); assert!(!u.is_empty()); // If used as (label/metric) name, ensure allowed charset and leading char let name = short_type_name::<Option<&'static str>>(); assert!(name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')); // Metric/label name rule (skip if this is only a label value): assert!(name.starts_with(|c: char| c.is_ascii_alphabetic() || c == '_')); }Also applies to: 34-34, 36-36, 39-39
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/utils/misc/type.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: tests
- GitHub Check: tests-release
- GitHub Check: All lint checks
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
🔇 Additional comments (1)
src/utils/misc/type.rs (1)
4-6: Breaking API change (Cow<'static, str> → String): confirm external impact or keep compatibility.Repo scan: only internal callers found — tests at src/utils/misc/type.rs:29–40 and cache usage at src/state_manager/cache.rs:49; none use Cow-specific methods. Because short_type_name is public, this still breaks external consumers. Either keep compatibility (return Cow<'static, str>) or explicitly document the breaking change.
Suggested minimal compatibility diff:
@@ -use std::any::type_name; +use std::any::type_name; +use std::borrow::Cow; -pub fn short_type_name<T>() -> String { +pub fn short_type_name<T>() -> Cow<'static, str> { @@ - n.as_ref() - .strip_suffix("_") - .unwrap_or(n.as_ref()) - .to_string() + Cow::Owned( + n.as_ref() + .strip_suffix("_") + .unwrap_or(n.as_ref()) + .to_string() + ) }
4c432c8 to
7327a34
Compare
7327a34 to
2daa377
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/state_manager/cache.rs (1)
68-125: Pending-entry leak on compute error; remove frompendingwhen computation fails.If
compute().awaitreturns Err, the(key, mtx)remains inpendingforever unless another successful compute happens later. Clean it up on error to avoid unbounded growth.Apply this diff:
- CacheLookupStatus::Empty(mtx) => { - let _guard = mtx.lock().await; - match self.get(key) { + CacheLookupStatus::Empty(mtx) => { + let _guard = mtx.lock().await; + let key_owned = key.clone(); + match self.get(key) { Some(v) => { // While locking someone else computed the pending task crate::metrics::LRU_CACHE_HIT .get_or_create(&crate::metrics::values::STATE_MANAGER_TIPSET) .inc(); Ok(v) } None => { // Entry does not have state computed yet, compute value and fill the cache crate::metrics::LRU_CACHE_MISS .get_or_create(&crate::metrics::values::STATE_MANAGER_TIPSET) .inc(); - let value = compute().await?; - - // Write back to cache, release lock and return value - self.insert(key.clone(), value.clone()); - Ok(value) + let res = compute().await; + match res { + Ok(value) => { + // Write back to cache, release lock and return value + self.insert(key_owned.clone(), value.clone()); + Ok(value) + } + Err(e) => { + // Remove stale pending entry on failure + self.with_inner(|inner| { + inner.pending.retain(|(k, _)| k != &key_owned); + }); + Err(e) + } + } } } }
🧹 Nitpick comments (2)
src/state_manager/cache.rs (2)
31-33: Sanitize identifiers before constructing metric names.Call sites currently pass safe literals, but this API allows arbitrary &str. Guard against regressions by normalizing to [a-z0-9_], lowering case, and replacing any other char with '' before composing "tipset_state{...}".
Apply this diff to enforce sanitization at the point of name construction:
- fn cache_name(cache_identifier: &str) -> String { - format!("tipset_state_{cache_identifier}") - } + fn cache_name(cache_identifier: &str) -> String { + let mut out = String::with_capacity("tipset_state_".len() + cache_identifier.len()); + out.push_str("tipset_state_"); + for ch in cache_identifier.chars() { + match ch { + 'a'..='z' | '0'..='9' | '_' => out.push(ch), + 'A'..='Z' => out.push(ch.to_ascii_lowercase()), + _ => out.push('_'), + } + } + out + }
47-58: API change looks good; consider documenting allowed identifier charset.Add a brief rustdoc noting the identifier should be [a-z0-9_]+ to remain Prometheus-safe.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
Cargo.toml(0 hunks)src/state_manager/cache.rs(6 hunks)src/state_manager/mod.rs(1 hunks)src/utils/misc/mod.rs(0 hunks)src/utils/misc/type.rs(0 hunks)
💤 Files with no reviewable changes (3)
- Cargo.toml
- src/utils/misc/mod.rs
- src/utils/misc/type.rs
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hanabi1224
PR: ChainSafe/forest#5930
File: build.rs:64-77
Timestamp: 2025-08-13T09:43:20.301Z
Learning: hanabi1224 prefers hard compile-time errors in build scripts rather than runtime safeguards or collision detection, believing it's better to fail fast and fix root causes of issues like malformed snapshot names.
🧬 Code graph analysis (2)
src/state_manager/mod.rs (2)
src/chain/store/index.rs (1)
new(42-48)src/chain/store/chain_store.rs (2)
new(119-145)new(566-573)
src/state_manager/cache.rs (2)
src/utils/cache/lru.rs (2)
new_with_default_metrics_registry(96-101)cache(120-122)src/chain/store/index.rs (1)
new(42-48)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: tests-release
- GitHub Check: tests
- GitHub Check: Build forest binaries on Linux AMD64
- GitHub Check: cargo-publish-dry-run
- GitHub Check: Build Ubuntu
- GitHub Check: Build MacOS
- GitHub Check: All lint checks
🔇 Additional comments (5)
src/state_manager/cache.rs (4)
21-29: Per-cache identifiers: good step toward Prometheus-safe names.Explicit identifiers remove type-derived punctuation from metric names/labels and improve observability hygiene.
175-186: Good: stable, lowercase identifiers for per-cache metrics."events" and "receipts" are Prometheus-safe and consistent.
325-341: Tests updated to use explicit identifiers: LGTM.Covers basic, same-key concurrent, and different-keys concurrent paths with the new constructor.
Also applies to: 345-388, 392-430
21-29: Verify TipsetStateCache::new/with_size usage and metric namesrg returned "No files were searched" — verification incomplete. Run locally (or grant repo access) and paste outputs:
rg -nP '\bTipsetStateCache::new\s*\(\s*\)' --hidden -S || true rg -nP '\bTipsetStateCache::with_size\s*\(' -n -C2 --hidden -S || true rg -nP 'new_with_default_metrics_registry\([^)]*\)' -n -C2 --hidden -S || true rg -nP 'new_with_default_metrics_registry\(\s*"[^"]*[-./ <>%{}()#$@!~`+=,;][^"]*"' -n --hidden -S || trueCheck: src/state_manager/cache.rs lines 21-29, 31-33, 47-58, 175-186.
src/state_manager/mod.rs (1)
203-206: Explicit state cache identifier: good."state_output" keeps the metric label/name clean and consistent.
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist
Summary by CodeRabbit