In particular, PCRE2's JIT seems to optimize this pattern quite well, even when its UCP mode is enabled. See BurntSushi/ripgrep#3167 for an example.
Note that if you disable Unicode mode for Rust regex and have the `perf-dfa-full` crate feature enabled, then performance for `(?m)^\w+$` improves quite a bit. This is because the pattern becomes so small that a full DFA is built. This in turn permits state acceleration, which lets it skip around using `memchr` on `\n`.
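
To make the "skip around on `\n`" idea concrete, here is a rough std-only sketch of what searching for `(?m)^\w+$` amounts to once `\w` is ASCII-only. This is not how the regex crate implements it (there is no DFA here). The function name `ascii_word_lines` is made up for illustration, and `Iterator::position` is only a stand-in for a real `memchr` call.

```rust
/// Returns the byte spans of every line that consists solely of one or
/// more ASCII word bytes (`[0-9A-Za-z_]`). Hypothetical helper, for
/// illustration only.
fn ascii_word_lines(haystack: &[u8]) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start = 0;
    while start <= haystack.len() {
        // Stand-in for memchr: jump straight to the next line terminator
        // (or to the end of the haystack if there is none).
        let end = haystack[start..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(haystack.len(), |i| start + i);
        let line = &haystack[start..end];
        // With Unicode mode off, `\w` is a plain byte-class check.
        if !line.is_empty()
            && line.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_')
        {
            spans.push((start, end));
        }
        start = end + 1;
    }
    spans
}

fn main() {
    // The last line starts with a non-ASCII letter (U+03B4), so it does
    // not count as a match when `\w` is restricted to ASCII.
    let haystack = b"foo\nbar baz\nqu_x42\n\xCE\xB4elta\n";
    for (s, e) in ascii_word_lines(haystack) {
        println!("{}..{}: {}", s, e, String::from_utf8_lossy(&haystack[s..e]));
    }
}
```

The point is only that, once a line is known not to match, nothing between the current position and the next `\n` needs to be examined byte by byte by the matcher, which is where a fast `memchr` pays off.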