Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(spans): Support lowercase redis commands #4235

Merged
merged 10 commits into from
Nov 12, 2024
Merged

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Nov 8, 2024

Not all redis integrations send uppercase redis commands, which results in redis spans being scrubbed as *. See

// NOTE: this should return `DEL *`, but we cannot detect lowercase command names yet.
assert_eq!(scrubbed.0.as_deref(), Some("*"));

This PR removes the redis regex and introduces a list of redis command that can be prefix-searched.

Fixes #3643

@jjbayer jjbayer changed the title Fix/spans redis lowercase fix(spans): Support lowercase redis commands Nov 8, 2024
@jjbayer jjbayer self-assigned this Nov 8, 2024
@jjbayer jjbayer marked this pull request as ready for review November 8, 2024 14:50
@jjbayer jjbayer requested a review from a team as a code owner November 8, 2024 14:50
@Dav1dde
Copy link
Member

Dav1dde commented Nov 8, 2024

Also might be fun to look into aho-corasick, it has MatchKind::LeftmostLongest and ascii_case_insensitive.

@jjbayer
Copy link
Member Author

jjbayer commented Nov 8, 2024

Also might be fun to look into aho-corasick, it has MatchKind::LeftmostLongest and ascii_case_insensitive.

I also considered using a trie for this, e.g. https://docs.rs/prefix-trie/latest/prefix_trie/.

@loewenheim
Copy link
Contributor

Also might be fun to look into aho-corasick, it has MatchKind::LeftmostLongest and ascii_case_insensitive.

… and everything already depends on it anyway, right?

@Dav1dde
Copy link
Member

Dav1dde commented Nov 8, 2024

I also considered using a trie ...

Yes! Trie's are very cool data structures.

… and everything already depends on it anyway, right?

yep^^

@Dav1dde
Copy link
Member

Dav1dde commented Nov 8, 2024

But tbf, nothing wrong with a good old binary search!


/// Returns a redis command if it is a case-insensitive prefix of `seek`.
pub fn matching_redis_command(seek: &str) -> Option<&str> {
let m = REDIS_COMMAND_MATCHER.find(&format!("{seek} "))?;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only suboptimal part of this solution is that it adds a string allocation for every search string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be possible with two matchers, one with spaces at the end the other one without, if the matcher without spaces matches (after the one without failed) it must be the full command with no trailing data.

Alternative option, you can find_iter{overlapping} to get all matches then just select the longest one which has a space or end of input following. That seems pretty straight forward.

@jjbayer
Copy link
Member Author

jjbayer commented Nov 11, 2024

@Dav1dde restored my Aho-Corasick approach now. Only thing I don't like is the extra string allocation.

@jjbayer jjbayer requested a review from Dav1dde November 11, 2024 10:25
/// Returns a redis command if it is a case-insensitive prefix of `seek`.
pub fn matching_redis_command(needle: &str) -> Option<&str> {
let mut longest_match = None;
for command in REDIS_COMMANDS {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems quite expensive, could also build a simple Regex in a LazyLock (or aho-corasick, which the regex probably compiles to).

None | Some("") => command.to_string(),
Some(_other) => format!("{command} *"),
},
None => "*".to_string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
None => "*".to_string(),
None => "*".to_owned(),

@@ -0,0 +1,598 @@
use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
use once_cell::sync::Lazy;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std lib has a LazyLock now.


/// Returns a redis command if it is a case-insensitive prefix of `seek`.
pub fn matching_redis_command(seek: &str) -> Option<&str> {
let m = REDIS_COMMAND_MATCHER.find(&format!("{seek} "))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be possible with two matchers, one with spaces at the end the other one without, if the matcher without spaces matches (after the one without failed) it must be the full command with no trailing data.

Alternative option, you can find_iter{overlapping} to get all matches then just select the longest one which has a space or end of input following. That seems pretty straight forward.

///
/// See <https://redis.io/docs/latest/commands/>.
static COMMAND_REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r"(?i)(?-u:\s*)(?P<command>ZUNIONSTORE|ZUNION|ZSCORE|ZSCAN|ZREVRANK|ZREVRANGEBYSCORE|ZREVRANGEBYLEX|ZREVRANGE|ZREMRANGEBYSCORE|ZREMRANGEBYRANK|ZREMRANGEBYLEX|ZREM|ZRANK|ZRANGESTORE|ZRANGEBYSCORE|ZRANGEBYLEX|ZRANGE|ZRANDMEMBER|ZPOPMIN|ZPOPMAX|ZMSCORE|ZMPOP|ZLEXCOUNT|ZINTERSTORE|ZINTERCARD|ZINTER|ZINCRBY|ZDIFFSTORE|ZDIFF|ZCOUNT|ZCARD|ZADD|XTRIM|XSETID|XREVRANGE|XREADGROUP|XREAD|XRANGE|XPENDING|XLEN|XINFO STREAM|XINFO HELP|XINFO GROUPS|XINFO CONSUMERS|XINFO|XGROUP SETID|XGROUP HELP|XGROUP DESTROY|XGROUP DELCONSUMER|XGROUP CREATECONSUMER|XGROUP CREATE|XGROUP|XDEL|XCLAIM|XAUTOCLAIM|XADD|XACK|WATCH|WAITAOF|WAIT|UNWATCH|UNSUBSCRIBE|UNLINK|TYPE|TTL|TS.REVRANGE|TS.RANGE|TS.QUERYINDEX|TS.MREVRANGE|TS.MRANGE|TS.MGET|TS.MADD|TS.INFO|TS.INCRBY|TS.GET|TS.DELETERULE|TS.DEL|TS.DECRBY|TS.CREATERULE|TS.CREATE|TS.ALTER|TS.ADD|TOUCH|TOPK.RESERVE|TOPK.QUERY|TOPK.LIST|TOPK.INFO|TOPK.INCRBY|TOPK.COUNT|TOPK.ADD|TIME|TDIGEST.TRIMMED_MEAN|TDIGEST.REVRANK|TDIGEST.RESET|TDIGEST.RANK|TDIGEST.QUANTILE|TDIGEST.MIN|TDIGEST.MERGE|TDIGEST.MAX|TDIGEST.INFO|TDIGEST.CREATE|TDIGEST.CDF|TDIGEST.BYREVRANK|TDIGEST.BYRANK|TDIGEST.ADD|SYNC|SWAPDB|SUNSUBSCRIBE|SUNIONSTORE|SUNION|SUBSTR|SUBSCRIBE|STRLEN|SSUBSCRIBE|SSCAN|SREM|SRANDMEMBER|SPUBLISH|SPOP|SORT_RO|SORT|SMOVE|SMISMEMBER|SMEMBERS|SLOWLOG RESET|SLOWLOG LEN|SLOWLOG HELP|SLOWLOG GET|SLOWLOG|SLAVEOF|SISMEMBER|SINTERSTORE|SINTERCARD|SINTER|SHUTDOWN|SETRANGE|SETNX|SETEX|SETBIT|SET|SELECT|SDIFFSTORE|SDIFF|SCRIPT LOAD|SCRIPT KILL|SCRIPT HELP|SCRIPT FLUSH|SCRIPT EXISTS|SCRIPT DEBUG|SCRIPT|SCARD|SCAN|SAVE|SADD|RPUSHX|RPUSH|RPOPLPUSH|RPOP|ROLE|RESTORE-ASKING|RESTORE|RESET|REPLICAOF|REPLCONF|RENAMENX|RENAME|READWRITE|READONLY|RANDOMKEY|QUIT|PUNSUBSCRIBE|PUBSUB SHARDNUMSUB|PUBSUB SHARDCHANNELS|PUBSUB NUMSUB|PUBSUB NUMPAT|PUBSUB HELP|PUBSUB CHANNELS|PUBSUB|PUBLISH|PTTL|PSYNC|PSUBSCRIBE|PSETEX|PING|PFSELFTEST|PFMERGE|PFDEBUG|PFCOUNT|PFADD|PEXPIRETIME|PEXPIREAT|PEXPIRE|PERSIST|OBJECT REFCOUNT|OBJECT IDLETIME|OBJECT HELP|OBJECT FREQ|OBJECT ENCODING|OBJECT|MULTI|MSETNX|MSET|MOVE|MONITOR|MODULE UNLOAD|MODULE LOADEX|MODULE LOAD|MODULE LIST|MODULE HELP|MODULE|MIGRATE|MGET|MEMORY USAGE|MEMORY STATS|MEMORY PURGE|MEMORY MALLOC-STATS|MEMORY HELP|MEMORY DOCTOR|MEMORY|LTRIM|LSET|LREM|LRANGE|LPUSHX|LPUSH|LPOS|LPOP|LOLWUT|LMPOP|LMOVE|LLEN|LINSERT|LINDEX|LCS|LATENCY RESET|LATENCY LATEST|LATENCY HISTORY|LATENCY HISTOGRAM|LATENCY HELP|LATENCY GRAPH|LATENCY DOCTOR|LATENCY|LASTSAVE|KEYS|JSON.TYPE|JSON.TOGGLE|JSON.STRLEN|JSON.STRAPPEND|JSON.SET|JSON.RESP|JSON.OBJLEN|JSON.OBJKEYS|JSON.NUMMULTBY|JSON.NUMINCRBY|JSON.MSET|JSON.MGET|JSON.MERGE|JSON.GET|JSON.FORGET|JSON.DEL|JSON.DEBUG MEMORY|JSON.DEBUG HELP|JSON.DEBUG|JSON.CLEAR|JSON.ARRTRIM|JSON.ARRPOP|JSON.ARRLEN|JSON.ARRINSERT|JSON.ARRINDEX|JSON.ARRAPPEND|INFO|INCRBYFLOAT|INCRBY|INCR|HVALS|HTTL|HSTRLEN|HSETNX|HSET|HSCAN|HRANDFIELD|HPTTL|HPEXPIRETIME|HPEXPIREAT|HPEXPIRE|HPERSIST|HMSET|HMGET|HLEN|HKEYS|HINCRBYFLOAT|HINCRBY|HGETALL|HGET|HEXPIRETIME|HEXPIREAT|HEXPIRE|HEXISTS|HELLO|HDEL|GETSET|GETRANGE|GETEX|GETDEL|GETBIT|GET|GEOSEARCHSTORE|GEOSEARCH|GEORADIUS_RO|GEORADIUSBYMEMBER_RO|GEORADIUSBYMEMBER|GEORADIUS|GEOPOS|GEOHASH|GEODIST|GEOADD|FUNCTION STATS|FUNCTION RESTORE|FUNCTION LOAD|FUNCTION LIST|FUNCTION KILL|FUNCTION HELP|FUNCTION FLUSH|FUNCTION DUMP|FUNCTION DELETE|FUNCTION|FT._LIST|FT.TAGVALS|FT.SYNUPDATE|FT.SYNDUMP|FT.SUGLEN|FT.SUGGET|FT.SUGDEL|FT.SUGADD|FT.SPELLCHECK|FT.SEARCH|FT.PROFILE|FT.INFO|FT.EXPLAINCLI|FT.EXPLAIN|FT.DROPINDEX|FT.DICTDUMP|FT.DICTDEL|FT.DICTADD|FT.CURSOR READ|FT.CURSOR DEL|FT.CREATE|FT.CONFIG SET|FT.CONFIG HELP|FT.CONFIG GET|FT.ALTER|FT.ALIASUPDATE|FT.ALIASDEL|FT.ALIASADD|FT.AGGREGATE|FLUSHDB|FLUSHALL|FCALL_RO|FCALL|FAILOVER|EXPIRETIME|EXPIREAT|EXPIRE|EXISTS|EXEC|EVAL_RO|EVALSHA_RO|EVALSHA|EVAL|ECHO|DUMP|DISCARD|DEL|DECRBY|DECR|DEBUG|DBSIZE|COPY|CONFIG SET|CONFIG REWRITE|CONFIG RESETSTAT|CONFIG HELP|CONFIG GET|CONFIG|COMMAND LIST|COMMAND INFO|COMMAND HELP|COMMAND GETKEYSANDFLAGS|COMMAND GETKEYS|COMMAND DOCS|COMMAND COUNT|COMMAND|CMS.QUERY|CMS.MERGE|CMS.INITBYPROB|CMS.INITBYDIM|CMS.INFO|CMS.INCRBY|CLUSTER SLOTS|CLUSTER SLAVES|CLUSTER SHARDS|CLUSTER SETSLOT|CLUSTER SET-CONFIG-EPOCH|CLUSTER SAVECONFIG|CLUSTER RESET|CLUSTER REPLICATE|CLUSTER REPLICAS|CLUSTER NODES|CLUSTER MYSHARDID|CLUSTER MYID|CLUSTER MEET|CLUSTER LINKS|CLUSTER KEYSLOT|CLUSTER INFO|CLUSTER HELP|CLUSTER GETKEYSINSLOT|CLUSTER FORGET|CLUSTER FLUSHSLOTS|CLUSTER FAILOVER|CLUSTER DELSLOTSRANGE|CLUSTER DELSLOTS|CLUSTER COUNTKEYSINSLOT|CLUSTER COUNT-FAILURE-REPORTS|CLUSTER BUMPEPOCH|CLUSTER ADDSLOTSRANGE|CLUSTER ADDSLOTS|CLUSTER|CLIENT UNPAUSE|CLIENT UNBLOCK|CLIENT TRACKINGINFO|CLIENT TRACKING|CLIENT SETNAME|CLIENT SETINFO|CLIENT REPLY|CLIENT PAUSE|CLIENT NO-TOUCH|CLIENT NO-EVICT|CLIENT LIST|CLIENT KILL|CLIENT INFO|CLIENT ID|CLIENT HELP|CLIENT GETREDIR|CLIENT GETNAME|CLIENT CACHING|CLIENT|CF.SCANDUMP|CF.RESERVE|CF.MEXISTS|CF.LOADCHUNK|CF.INSERTNX|CF.INSERT|CF.INFO|CF.EXISTS|CF.DEL|CF.COUNT|CF.ADDNX|CF.ADD|BZPOPMIN|BZPOPMAX|BZMPOP|BRPOPLPUSH|BRPOP|BLPOP|BLMPOP|BLMOVE|BITPOS|BITOP|BITFIELD_RO|BITFIELD|BITCOUNT|BGSAVE|BGREWRITEAOF|BF.SCANDUMP|BF.RESERVE|BF.MEXISTS|BF.MADD|BF.LOADCHUNK|BF.INSERT|BF.INFO|BF.EXISTS|BF.CARD|BF.ADD|AUTH|ASKING|APPEND|ACL WHOAMI|ACL USERS|ACL SETUSER|ACL SAVE|ACL LOG|ACL LOAD|ACL LIST|ACL HELP|ACL GETUSER|ACL GENPASS|ACL DRYRUN|ACL DELUSER|ACL CAT|ACL)(?-u:\s|$)").unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can disable unicode for the regex with (?-u) in the beginning, that's what globset does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It complained because I did not escape dots in redis commands. Fixed now.

Cargo.toml Outdated Show resolved Hide resolved
relay-event-normalization/Cargo.toml Outdated Show resolved Hide resolved
@jjbayer jjbayer merged commit 1c1e4de into master Nov 12, 2024
23 checks passed
@jjbayer jjbayer deleted the fix/spans-redis-lowercase branch November 12, 2024 10:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parse & scrub lowercase redis commands
3 participants