refactor: parameterize SQL queries in analytics repositories for improved security#2387
Conversation
…oved security and maintainability
…ction-instances-in-clickhouse
WalkthroughThis PR systematically converts SQL query construction across 10 repository files from direct string interpolation to parameterized query bindings. Values for dates, IDs, multipliers, and granularity are now passed via typed parameter objects to queryPromise instead of being embedded directly in SQL strings. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
controlplane/src/core/repositories/analytics/UsageRepository.ts (1)
116-144: Align placeholder types toUInt32ingetMetafor consistencyThe
getMetamethod uses{startDate:String}and{endDate:String}placeholders, butparseTimeFiltersreturns numeric Unix timestamp values (viagetUnixTimeInSeconds), which are passed asdateRange.startanddateRange.end. This misaligns withgetUsageRequestSeriesandgetClientsWithOperations, which both correctly use{startDate:UInt32}and{endDate:UInt32}for the same numeric values from the sameTimeFiltersobject.Change lines 127-128:
- toDateTime({startDate:String}) AS startDate, - toDateTime({endDate:String}) AS endDate + toDateTime({startDate:UInt32}) AS startDate, + toDateTime({endDate:UInt32}) AS endDate
🧹 Nitpick comments (3)
controlplane/src/core/repositories/analytics/AnalyticsDashboardViewRepository.ts (2)
289-306: Same IN clause pattern as above.The subgraph ID handling here follows the same escaped interpolation pattern. If the array parameter approach is adopted for
getSubgraphRates, apply it here as well for consistency.
227-252: SubgraphID IN clause uses string interpolation instead of parameterized approach.While the subgraph IDs are escaped, they're still interpolated into the SQL string. Consider parameterizing this as an array once the ClickHouseClient is enhanced to support Array(String) serialization. Currently, the client only handles primitive types (string, number, boolean) passed as URL query parameters. The proper pattern would be:
- // Properly escape subgraph IDs for SQL - const escapedSubgraphIds = subgraphs.map((s) => `'${s.id.replace(/'/g, "''")}'`).join(','); ... - AND SubgraphID IN (${escapedSubgraphIds}) + AND SubgraphID IN ({subgraphIds:Array(String)}) ... const params = { start: dateRange.start, end: dateRange.end, multiplier, federatedGraphId, organizationId, + subgraphIds: subgraphs.map((s) => s.id), };This applies similarly to
getSubgraphLatencyat lines 270–306. Requires enhancing ClickHouseClient._getRequestOptions to serialize array parameters.controlplane/src/core/repositories/analytics/UsageRepository.ts (1)
195-281: Escaping ingetUnusedFieldshardens tuple construction against injectionThe new
arrayJoinFieldslogic escapes single quotes inField.nameandField.typeNamebefore embedding them in the tuple literal, which is the main character needed to safely stay inside SQL string literals. Combined with parameterizedOrganizationID/FederatedGraphID/ date bounds, this significantly reduces injection risk for these dynamically generated tuples.If you later want to go further, you could consider generating these tuples via repeated placeholders instead of literal construction, but that’s not strictly necessary given the current escaping.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
controlplane/src/core/repositories/CacheWarmerRepository.ts(3 hunks)controlplane/src/core/repositories/analytics/AnalyticsDashboardViewRepository.ts(8 hunks)controlplane/src/core/repositories/analytics/AnalyticsRequestViewRepository.ts(2 hunks)controlplane/src/core/repositories/analytics/MetricsRepository.ts(14 hunks)controlplane/src/core/repositories/analytics/MonthlyRequestViewRepository.ts(1 hunks)controlplane/src/core/repositories/analytics/RouterMetricsRepository.ts(2 hunks)controlplane/src/core/repositories/analytics/SubgraphMetricsRepository.ts(12 hunks)controlplane/src/core/repositories/analytics/TraceRepository.ts(3 hunks)controlplane/src/core/repositories/analytics/UsageRepository.ts(15 hunks)controlplane/src/core/services/SchemaUsageTrafficInspector.ts(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-29T10:28:04.846Z
Learnt from: JivusAyrus
Repo: wundergraph/cosmo PR: 2156
File: controlplane/src/core/repositories/SubgraphRepository.ts:1749-1751
Timestamp: 2025-08-29T10:28:04.846Z
Learning: In the controlplane codebase, authentication and authorization checks (including organization scoping) are handled at the service layer in files like unlinkSubgraph.ts before calling repository methods. Repository methods like unlinkSubgraph() in SubgraphRepository.ts can focus purely on data operations without redundant security checks.
Applied to files:
controlplane/src/core/repositories/analytics/AnalyticsRequestViewRepository.tscontrolplane/src/core/repositories/analytics/TraceRepository.tscontrolplane/src/core/repositories/analytics/SubgraphMetricsRepository.tscontrolplane/src/core/services/SchemaUsageTrafficInspector.ts
📚 Learning: 2025-09-10T09:59:17.257Z
Learnt from: JivusAyrus
Repo: wundergraph/cosmo PR: 2156
File: controlplane/src/core/bufservices/subgraph/checkSubgraphSchema.ts:417-419
Timestamp: 2025-09-10T09:59:17.257Z
Learning: When skipTrafficCheck is true in schema checks, the traffic inspection loop is completely bypassed (as shown in line 2040 of SubgraphRepository.ts with the continue statement), which means hasClientTraffic will be false. The traffic analysis doesn't run at all when skipTrafficCheck is true, unlike the previous understanding that it always ran to collect usage data.
Applied to files:
controlplane/src/core/services/SchemaUsageTrafficInspector.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build_test
- GitHub Check: build_push_image
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (32)
controlplane/src/core/repositories/CacheWarmerRepository.ts (1)
49-96: LGTM! Proper parameterization implemented.The SQL query has been correctly converted to use typed parameter bindings for all dynamic values (dates, IDs, quantile, limits). This eliminates SQL injection risks and improves maintainability.
controlplane/src/core/repositories/analytics/MonthlyRequestViewRepository.ts (1)
14-22: LGTM!Clean parameterization of the
OrganizationIDfilter. The placeholder syntax{organizationId:String}and params object pattern are correctly implemented for ClickHouse.controlplane/src/core/repositories/analytics/AnalyticsRequestViewRepository.ts (2)
504-506: LGTM!Correct parameterization for client name filtering. The comment clarifies the intent.
720-722: LGTM!The security-critical scoping of data to organization and graph is now properly parameterized, preventing SQL injection in these authorization filters.
controlplane/src/core/repositories/analytics/TraceRepository.ts (2)
64-66: LGTM!Good parameterization of all security-sensitive fields in both the base and recursive parts of the CTE. The
{traceID:String} AS trace_idbinding in the WITH clause is valid ClickHouse syntax.Also applies to: 76-79
97-104: LGTM!Clean params object construction with all required parameters passed to
queryPromise.controlplane/src/core/services/SchemaUsageTrafficInspector.ts (2)
67-77: LGTM!Correct parameterization for
namedType,typeName, andfieldNamefilters.
95-98: LGTM!Date interval and ID filters are properly parameterized with correct ClickHouse type annotations.
controlplane/src/core/repositories/analytics/AnalyticsDashboardViewRepository.ts (4)
34-49: LGTM!Clean parameterization for the weekly request series query.
75-106: LGTM!Correct parameterization including the granule interval. The
{granule:UInt32}type annotation is appropriate for time intervals.
130-144: LGTM!Properly parameterized date range and ID filters.
168-187: LGTM!The
{multiplier:Float64}type annotation is correct for the rate calculation divisor.controlplane/src/core/repositories/analytics/SubgraphMetricsRepository.ts (5)
74-210: Request‑rate metrics parameterization is consistent and complete
getSubgraphRequestRateMetricsnow correctly builds params objects (queryRate,top5Params,querySeries) that cover all placeholders used in the SQL (startDate,endDate,organizationId,subgraphId,multiplier,granule) and merges inqueryParamsfor filter placeholders. This looks functionally equivalent to the interpolated version and avoids direct value injection.
215-405: Latency metrics queries use fully bound parameters without regressionsIn
getSubgraphLatencyMetrics, the newparamsobjects forqueryLatency,queryTop5, andquerySeriescorrectly providestartDate,endDate,organizationId,subgraphId,granule, and numericquantilefor the{…}placeholders, alongsidequeryParams. This keeps the histogram logic intact while removing inline interpolation.
411-557: Error percentage + series parameterization looks correctFor
getSubgraphErrorMetrics, bothqueryPercentage,top5Params, andgetSeriesare now parameterized with the expected keys and still reusewhereSqlplusqueryParamsfor dynamic filters. I don’t see any missing or unused parameters here.
563-605: Error‑rate series now safely parameterized
getSubgraphErrorRateMetrics’s singleparamsobject aligns with all placeholders in the series query (startDate,endDate,organizationId,subgraphId,granule). The change is straightforward and removes the remaining interpolation of IDs and time bounds.
749-782: Filter query parameterization for subgraph metrics is sound
getSubgraphMetricFiltersnow bindsstartDate,endDate,organizationId, andsubgraphIdas parameters and passes them viaqueryPromise(query, params), while leaving the projection and grouping unchanged. This is consistent with the rest of the analytics repositories in this PR.controlplane/src/core/repositories/analytics/MetricsRepository.ts (8)
62-198: Request‑rate metrics parameterization matches placeholdersIn
getRequestRateMetrics, the newparams,top5Params, andquerySeriesparams objects correctly supplystartDate,endDate,organizationId,federatedGraphId,multiplier, andgranuleplusqueryParamsfor filter placeholders. The SQL only references those names, so this is a clean, safe replacement for string interpolation.
203-394: Latency metrics parameter binding is coherent across all helpers
getLatencyMetrics’squeryLatency,queryTop5, andquerySerieseach construct params objects that fully cover the used placeholders (startDate,endDate,organizationId,federatedGraphId,granule,quantile) and preserve the histogram logic. UsingNumber.parseFloat(quantile)keeps the quantile numeric while avoiding inline literals.
399-546: Error metrics queries are correctly parameterizedFor
getErrorMetrics, bothqueryPercentageand the top‑5 / series queries now take params objects that supplystartDate,endDate,organizationId,federatedGraphId, andgranulealong withqueryParamsfor dynamic filters. This eliminates direct interpolation of IDs and times without changing the aggregation semantics.
551-606: Error‑rate series parameterization mirrors other series helpers
getErrorRateMetrics’sparamsobject lines up with all{…}placeholders in the SQL (startDate,endDate,organizationId,federatedGraphId,granule). The mapping over the result set is unchanged, so this should be a safe refactor.
719-783: Metric filters query now safely binds IDs and date range
getMetricFiltersadds aparamsobject forstartDate,endDate,organizationId, andfederatedGraphIdand passes it toqueryPromise. That matches all placeholders in the filter query and keeps the filter options construction logic intact.
820-1051: Operations listing: parameter coverage for pagination, search, and deprecated CTEs looks completeIn
getOperations,queryParamsWithPaginationcorrectly merges:
- filter
queryParams- pagination (
limit,offset)- IDs and time bounds (
organizationId,federatedGraphId,startDate,endDate)- optional
searchQueryPattern- optional
deprecatedStart/deprecatedEndwhendeprecatedFields.length > 0All branches (latency/requests/errors) reference only these placeholders (plus any from
whereSqlbuilt fromqueryParams), so the refactor removes inline values without introducing missing/bogus params.
1068-1208: Operations count: parameterized variants for both deprecated and non‑deprecated paths are consistent
getOperationsCountnow:
- Uses the same
{startDate:UInt32},{endDate:UInt32},{organizationId:String},{federatedGraphId:String}, and optional{searchQueryPattern:String}placeholders as other queries.- Supplies them via the inline
queryPromisecalls’ params objects in both the deprecated‑only and general cases.This keeps behavior aligned with
getOperationswhile avoiding raw interpolation of IDs, dates, and search text.
1211-1238: Clients list query: IDs and date range now bound instead of interpolated
getClientsnow builds aparamsobject forstartDate,endDate,organizationId, andfederatedGraphId, matching the placeholders in the query. The mapping of results is unchanged; this is a straightforward hardening.controlplane/src/core/repositories/analytics/UsageRepository.ts (5)
15-58: Usage request series: shared params and placeholders are aligned
getUsageRequestSeriesnow accepts aparamsobject, extends it withstartDate,endDate, andgranule, and uses{startDate:UInt32},{endDate:UInt32},{granule:UInt32}placeholders. Combined withwhereSql‑driven placeholders fromparams, this removes inline IDs/timestamps cleanly.
60-114: Clients‑with‑operations query uses params consistently
getClientsWithOperationsfollows the same pattern—mergingparamswithstartDate/endDateand binding them via{startDate:UInt32}/{endDate:UInt32}. That matches the SQL and keeps the grouping logic untouched.
150-193: Shared params ingetFieldUsageare wired correctly into all three helpers
getFieldUsagenow builds a singleparamsobject (IDs + optional typename/field/namedType) and awhereSqlthat references{…:String}placeholders. Passing that samewhereSql/timeFilters/paramstriple intogetUsageRequestSeries,getClientsWithOperations, andgetMetakeeps everything in sync and centralizes the binding logic.
283-366:getUsedFieldsapplies the same safe escaping and parameterization pattern
getUsedFieldsmirrorsgetUnusedFields: escaping single quotes in field names/typeNames, using{startDate:UInt32}/{endDate:UInt32}for time bounds, and bindingorganizationId/federatedGraphIdas parameters. This keeps behavior consistent between “used” and “unused” queries.
368-460: Deprecated‑fields‑per‑operation query is correctly parameterized
getDeprecatedFieldsUsedInOperation:
- Builds
deprecatedFieldsArraywith names/typeNames that escape single quotes.- Uses
{startDate:UInt32},{endDate:UInt32},{organizationId:String},{federatedGraphId:String},{operationHash:String}, and optional{operationName:String}placeholders.- Supplies all of these keys in the
paramsobject, only addingoperationNamewhen needed.That’s a solid hardening over any previous string interpolation.
controlplane/src/core/repositories/analytics/RouterMetricsRepository.ts (2)
29-64: Router runtime query now binds IDs via params as intended
getRouterRuntimecorrectly replaces direct embedding ofFederatedGraphID,OrganizationID, andServiceInstanceIDwith{…:String}placeholders and passes a matchingparamsobject. The aggregation logic overMetricName/MetricValueis untouched.
144-181: Active routers query parameterization is correct and minimal
getActiveRoutersnow uses{federatedGraphId:String}and{organizationId:String}placeholders and binds them via theparamsobject. This achieves the PR’s security goal here without altering the shape of the returnedRouterDTO[].
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.
Checklist