[Metricbeat][SQL] Add cursor-based incremental data fetching to SQL module#48722
[Metricbeat][SQL] Add cursor-based incremental data fetching to SQL module#48722shmsr wants to merge 61 commits intoelastic:mainfrom
Conversation
🤖 GitHub commentsJust comment with:
|
|
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
Vale Linting ResultsSummary: 1 suggestion found 💡 Suggestions (1)
The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale. |
3f8e80d to
635398a
Compare
…odule
Add a cursor feature to the SQL query metricset that enables incremental
data fetching by tracking the last fetched row value. This is useful for
continuously appended data like audit logs, event tables, or time-series
data where users want to avoid re-fetching already-seen rows.
Key changes:
- New `cursor` sub-package with modular components:
- `config.go`: Cursor configuration and validation
- `cursor.go`: Manager for cursor lifecycle (init, update, persist)
- `store.go`: State persistence via libbeat statestore (memlog backend)
- `types.go`: Type-safe cursor value handling (integer, timestamp, date,
float, decimal) with database value parsing
- `placeholder.go`: Driver-specific query placeholder translation
(e.g., :cursor -> $1 for Postgres, ? for MySQL, @p1 for MSSQL)
- `doc.go`: Package documentation
- Modified `query.go` MetricSet:
- Cursor initialization gated behind `cursor.enabled: true`
- `Close()` method implementing `mb.Closer` for resource cleanup
- `fetchWithCursor()` for parameterized query execution
- Concurrent fetch prevention via `sync.Mutex.TryLock()`
- At-least-once delivery: events emitted before cursor state is saved
- Modified `metricbeat/helper/sql/sql.go`:
- Added `FetchTableModeWithParams()` for parameterized queries
- Added `mssql -> sqlserver` driver name mapping in `SwitchDriverName()`
(consistent with standalone MSSQL module which already uses sqlserver)
- Updated `queryDBNames()` and `dbSelector()` to accept both `mssql` and
`sqlserver` driver names
- Fixed Oracle timezone handling in integration tests: set session
`TIME_ZONE=UTC` to prevent godror driver timezone conversion issues
- Comprehensive test coverage:
- Unit tests for all cursor sub-package components
- Integration tests covering PostgreSQL, MySQL, MSSQL, and Oracle
- Tests for edge cases: NULL values, missing columns, type mismatches,
state persistence, descending cursors, concurrent fetch prevention
- Documentation updates for both module and metricset level docs
Supported cursor types: integer, timestamp, date, float, decimal
Supported directions: asc (track max, default), desc (track min)
66d98bc to
d83f11f
Compare
Replace 'e.g.' with 'for example' in doc.go and query.go comments to comply with Elastic Vale style guidelines (Elastic.Latinisms rule).
- Document cursor incompatibility with sql_queries and fetch_from_all_databases (users would get confusing errors otherwise) - Add MSSQL example using TOP instead of LIMIT - Add MSSQL driver-specific note (TOP, @p1 placeholder, mssql->sqlserver) - Expand timestamp type description with accepted formats - Document cursor state reset behavior on config changes - Clarify that raw_data.enabled is optional (not required for cursor)
- errcheck: wrap deferred db.Exec calls to handle return values in cursor_integration_test.go (lines 135, 692) - goimports: fix whitespace alignment in cursor_integration_test.go (lines 185, 823/986 trailing space) - gosec G115: suppress false-positive uint->int64 overflow warning in types.go:206 (bounds check exists on line 203) - Vale Elastic.WordChoice: replace 'may' with 'can'/'might' in docs (lines R221, R233, R234, R253) - Vale Elastic.Semicolons: replace semicolon with period in MSSQL driver note (line R268)
- cursor_test.go: fix comment alignment in table-driven test data - store.go: use Go 1.13+ octal literal 0o600 instead of 0600
🔍 Preview links for changed docs |
giorgi-imerlishvili-elastic
left a comment
There was a problem hiding this comment.
LGTM
Addressed this with a new opt-in feature. Please see: https://github.com/elastic/obs-integration-team/issues/830#issuecomment-3860018902 (See: "Mitigation for password rotation"); "cursor.state_id" is the new config. |
|
@claude pls review |
|
Reminder for review! |
|
Analyzed workflow run The only notable log line was a post-job cache save warning ( Tests/checks run in this workflow: docs/check pipeline ( What is this? | From workflow: PR Actions Detective Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not. |
Proposed commit message
Add a cursor feature to the SQL query metricset that enables incremental data fetching by tracking the last fetched row value. The implementation introduces a new
cursorsub-package underx-pack/metricbeat/module/sql/query/cursor/with the following architecture:config.go): Cursor configuration struct with validation — supportsenabled,column,type(integer/timestamp/date/float/decimal),default, anddirection(asc/desc) fields.cursor.go): Manages cursor lifecycle — initializes from persisted state or default, updates by scanning query results for the max (ascending) or min (descending) value of the cursor column, and persists state after each fetch cycle.store.go): State persistence vialibbeat/statestorewith the memlog backend (same battle-tested backend used by Filebeat's registry). State is stored at{data.path}/sql-cursor/. State keys are xxhash-based — derived from(moduleID, DSN, query, column)— so no secrets are stored in the key itself.types.go): Type-safe cursor value handling with parsing from both configuration strings and database result values. Supports int, int32, int64, uint, uint32, uint64, float32, float64,[]byte, string, andtime.Timeinputs. Thedecimaltype usesshopspring/decimalfor arbitrary precision.placeholder.go): Translates the:cursorplaceholder in user queries to driver-specific parameterized syntax ($1for Postgres,?for MySQL,@p1for MSSQL/sqlserver,:1for Oracle). Queries usedb.QueryContext(ctx, query, args...)— fully SQL-injection safe with no string interpolation.The
query.goMetricSet is extended with:cursor.enabled: truein configClose()method implementingmb.Closerfor proper statestore resource cleanupfetchWithCursor()path using the newFetchTableModeWithParams()helper for parameterized queriessync.Mutex.TryLock()(skips cycle if previous still running)The
metricbeat/helper/sql/sql.goshared helper gains:FetchTableModeWithParams(): Same asFetchTableModebut acceptsargs ...interface{}for parameterized queriesmssql → sqlservermapping inSwitchDriverName(): Aligns the SQL module with the standalone MSSQL module which already uses the modernsqlserverdriver. ThequeryDBNames()anddbSelector()functions are updated to match both driver names.The SQL module currently re-executes the full query on every collection cycle, fetching all rows each time. For append-only data sources (audit logs, event tables, time-series data), this causes:
The cursor feature solves this by maintaining a persistent bookmark (the cursor) that tracks the last fetched row's value in a specified column. Subsequent queries use this value as a filter parameter (e.g.,
WHERE id > :cursor), fetching only new rows. This is the same pattern used by Filebeat'sinput.cursorand is a highly requested feature for the SQL module.Please note that these changes were implemented using Cursor (AI), with systematic guidance and specific constraints.
Checklist
I have made corresponding change to the default configuration filesstresstest.shscript to run them under stress conditions and race detector to verify their stability../changelog/fragmentsusing the changelog tool.Disruptive User Impact
None. The cursor feature is entirely opt-in via
cursor.enabled: truein the metricset configuration. When cursor is not configured (the default), the existing code path is completely unchanged — no new fields are initialized, no new methods are called.The
mssql → sqlserverdriver name mapping inSwitchDriverName()aligns with the standalone MSSQL module's existing behavior. Thesqlserverdriver is the modern, recommended driver and accepts the same URL-based connection strings. Existing MSSQL users of the SQL module should see no behavioral change.Author's Checklist
go test ./x-pack/metricbeat/module/sql/query/cursor/...)go test ./x-pack/metricbeat/module/sql/query/...)go test ./metricbeat/helper/sql/...)go test ./x-pack/metricbeat/module/mssql/...)go build ./x-pack/metricbeat/...)go vetclean on all modified packagesdsn.goandsql_test.goare unrelated)How to test this PR locally
Unit tests (no database required)
Integration tests (require database containers)
Manual end-to-end test
data/sql-cursor/)Use cases
Scenario: Incremental audit log ingestion
Scenario: Time-based incremental fetch
Scenario: Descending scan for latest-first workloads