Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pattern with wildcard and globstar can't match correctly when using glob_match #6668

Merged
merged 3 commits into from
Jul 16, 2024

Conversation

shulaoda
Copy link
Contributor

Summary

Fixed #6245 and #6613

To solve this problem, I have invested a lot of effort in understanding the glob_match. Due to the original author not maintaining it for a long time, I have republished the fast_glob.

Using fast_glob, you can get:

  • Faster problem response
  • Nearly 60% performance improvement

The original intention of fast_glob is only to promptly fix the first issue mentioned above.

Checklist

  • Tests updated (or not required).
  • Documentation updated (or not required).

@github-actions github-actions bot added the release: bug fix release: bug related release(mr only) label May 30, 2024
Copy link

netlify bot commented May 30, 2024

Deploy Preview for rspack canceled.

Built without sensitive environment variables

Name Link
🔨 Latest commit c19b0df
🔍 Latest deploy log https://app.netlify.com/sites/rspack/deploys/6658161eb08fec000838f174

Copy link

netlify bot commented May 30, 2024

Deploy Preview for rspack canceled.

Built without sensitive environment variables

Name Link
🔨 Latest commit f2bbbff
🔍 Latest deploy log https://app.netlify.com/sites/rspack/deploys/668fe49197bd5c0008792f79

@shulaoda
Copy link
Contributor Author

There are some changes in test cases compared to glob_match, please check PR#15.

@SyMind
Copy link
Member

SyMind commented May 30, 2024

@JSerFeng I believe we can make use fast_glob. I have ported test cases from glob-to-regex and have successfully executed all the tests.

Link to the test cases: https://github.com/SyMind/glob-test

@LingyuCoder
Copy link
Collaborator

@JSerFeng I believe we can make use fast_glob. I have ported test cases from glob-to-regex and have successfully executed all the tests.

Link to the test cases: https://github.com/SyMind/glob-test

need a benchmark

@shulaoda
Copy link
Contributor Author

Is the Brace Expansion feature necessary? @LingyuCoder

@shulaoda
Copy link
Contributor Author

shulaoda commented Jul 10, 2024

I haven't replied to this PR for a long time because I have encountered a serious issue related to Brace Explanation. Below are the fixes and optimizations I have made to address this issue during this period:

Examples

Simple Match

Note that simple matching does not support brace expansion, but all other syntaxes do.

use fast_glob::glob_match;

let glob = "some/**/n*d[k-m]e?txt";
let path = "some/a/bigger/path/to/the/crazy/needle.txt";

assert!(glob_match(glob, path));

Brace Expansion

Brace expansion is supported by using glob_match_with_brace. While the performance is lower than simple match, some performance loss is inevitable due to the complexity of brace expansion.

use fast_glob::glob_match_with_brace;

let glob = "some/**/{the,crazy}/?*.{png,txt}";
let path = "some/a/bigger/path/to/the/crazy/needle.txt";

assert!(glob_match_with_brace(glob, path));

Multi-Pattern Matching

You can build a matcher like globset and add multiple patterns to match.

use fast_glob::Glob;

// let mut glob = Glob::new(glob);
let mut glob = Glob::default();

assert!(glob.add("*.txt"));
assert!(glob.is_match("name.txt"));

Benchmark

Test Case 1

const GLOB: &'static str = "some/**/n*d[k-m]e?txt";
const PATH: &'static str = "some/a/bigger/path/to/the/crazy/needle.txt";
mine                       time:   [63.347 ns 63.389 ns 63.450 ns]
glob                       time:   [380.53 ns 382.89 ns 386.44 ns]
globset                    time:   [27.686 µs 27.696 µs 27.707 µs]
glob_match                 time:   [197.79 ns 198.67 ns 199.79 ns]

Test Case 2

const GLOB: &'static str = "some/**/{tob,crazy}/?*.{png,txt}";
const PATH: &'static str = "some/a/bigger/path/to/the/crazy/needle.txt";
mine                       time:   [403.06 ns 405.17 ns 407.76 ns]
globset                    time:   [36.929 µs 37.284 µs 37.731 µs]
glob_match                 time:   [367.10 ns 369.04 ns 371.61 ns]

Q&A

Why not use the more efficient glob_match for brace expansion?

glob_match is unable to handle complex brace expansions. Below are some failed examples:

  • glob_match("{a/b,a/b/c}/c", "a/b/c")
  • glob_match("**/foo{bar,b*z}", "foobuzz")
  • glob_match("**/{a,b}/c.png", "some/a/b/c.png")

Due to these limitations, brace expansion requires a different implementation that can handle the complexity of such patterns, resulting in some performance trade-offs.

@LingyuCoder
Copy link
Collaborator

Is the Brace Expansion feature necessary? @LingyuCoder

sideEffects is widely used in various component libraries, and if there is one library that uses Brace Expansion, I think it needs to be supported.

@shulaoda
Copy link
Contributor Author

Is there anything else I need to do? @LingyuCoder

@LingyuCoder
Copy link
Collaborator

LingyuCoder commented Jul 10, 2024

Is there anything else I need to do? @LingyuCoder

LGTM, cc @SyMind @JSerFeng

@JSerFeng
Copy link
Collaborator

LGTM

@JSerFeng JSerFeng requested a review from ahabhgk July 10, 2024 09:59
@SyMind
Copy link
Member

SyMind commented Jul 11, 2024

@shulaoda the performance of glob_match_with_brace is more than twice as slow as glob_match. I don't think can accept it.

@shulaoda
Copy link
Contributor Author

@shulaoda the performance of glob_match_with_brace is more than twice as slow as glob_match. I don't think can accept it.

This is because it needs to support complex brace_expansion, and glob_match has many flaws in this regard. If you use glob_matche_with_brace for simple match, you will find that the performance is still better than glob_match.

I believe that priority should be given to ensuring the soundness of the functionality, as a certain degree of performance loss is inevitable.

@shulaoda
Copy link
Contributor Author

@shulaoda the performance of glob_match_with_brace is more than twice as slow as glob_match. I don't think can accept it.

Perhaps we should give up on brace_expansion, as achieving sound functionality and high performance is a very difficult task.

JSerFeng
JSerFeng previously approved these changes Jul 11, 2024
@JSerFeng
Copy link
Collaborator

also fix #7100

@JSerFeng
Copy link
Collaborator

!bench

@rspack-bot
Copy link

rspack-bot commented Jul 11, 2024

📝 Benchmark detail: Open

Name Base (2024-07-11 78c1194) Current Change
10000_development-mode + exec 2.22 s ± 22 ms 2.19 s ± 12 ms -1.67 %
10000_development-mode_hmr + exec 692 ms ± 6.5 ms 696 ms ± 5.7 ms +0.56 %
10000_production-mode + exec 2.86 s ± 45 ms 2.77 s ± 34 ms -3.03 %
arco-pro_development-mode + exec 1.93 s ± 72 ms 1.91 s ± 61 ms -0.81 %
arco-pro_development-mode_hmr + exec 434 ms ± 1.6 ms 434 ms ± 1.4 ms -0.09 %
arco-pro_production-mode + exec 3.44 s ± 77 ms 3.44 s ± 101 ms +0.05 %
threejs_development-mode_10x + exec 1.63 s ± 14 ms 1.6 s ± 11 ms -1.77 %
threejs_development-mode_10x_hmr + exec 825 ms ± 11 ms 804 ms ± 5.4 ms -2.57 %
threejs_production-mode_10x + exec 5.61 s ± 19 ms 5.6 s ± 27 ms -0.18 %

@LingyuCoder
Copy link
Collaborator

@shulaoda the performance of glob_match_with_brace is more than twice as slow as glob_match. I don't think can accept it.

Perhaps we should give up on brace_expansion, as achieving sound functionality and high performance is a very difficult task.

Can you provide an option to control whether to enable complex brace expansion or not.

If it is not enabled, only the same processing logic as glob_match will be used, so as to have better performance.

@shulaoda
Copy link
Contributor Author

Can you provide an option to control whether to enable complex brace expansion or not.

If it is not enabled, only the same processing logic as glob_match will be used, so as to have better performance.

I can only control it to use glob_match when it comes to simple match.

This is because there are many issues with the data structure of glob_match, and I have adopted a completely new data structure.

@LingyuCoder
Copy link
Collaborator

Can you provide an option to control whether to enable complex brace expansion or not.
If it is not enabled, only the same processing logic as glob_match will be used, so as to have better performance.

I can only control it to use glob_match when it comes to simple match.

This is because there are many issues with the data structure of glob_match, and I have adopted a completely new data structure.

Is this switch automatic or configuration-driven?

@shulaoda
Copy link
Contributor Author

Is this switch automatic or configuration-driven?

glob_match_with_brace will first scan the glob. If it is a simple match, it will go directly to glob_match. Otherwise, it will perform the relevant operations of brace-expansion.

Please wait, I will show you more benchmark information.

@shulaoda
Copy link
Contributor Author

I made a mistake, the matching of brace-expansion with glob_match in the benchmark is a simple match. I have updated the benchmark data.

image

@shulaoda
Copy link
Contributor Author

shulaoda commented Jul 11, 2024

Test Case 1

const GLOB: &'static str = "some/**/n*d[k-m]e?txt";
const PATH: &'static str = "some/a/bigger/path/to/the/crazy/needle.txt";
crate_glob_match        time:   [166.43 ns 167.02 ns 168.04 ns]
mine_glob_match         time:   [69.918 ns 70.031 ns 70.122 ns]
glob_match_with_brace   time:   [103.33 ns 103.85 ns 104.60 ns]

Test Case 2

const GLOB: &'static str = "some/**/{tob,crazy}/?*.{png,txt}";
const PATH: &'static str = "some/a/bigger/path/to/the/crazy/needle.txt";
crate_glob_match        time:   [363.06 ns 364.40 ns 366.12 ns]
glob_match_with_brace   time:   [400.63 ns 404.17 ns 410.00 ns]

Test Case 3

const GLOB: &'static str = "some/**/{a,b,c}{d,e,f,g,h,i}{{gg,gger},r}/needle.{png,txt}{a,b}{c,d}{e,f}";
const PATH: &'static str = "some/a/b/c/bigger/needle.txtbdf";
crate_glob_match        time:   [338.61 ns 339.32 ns 340.32 ns]
glob_match_with_brace   time:   [2.6493 µs 2.6581 µs 2.6663 µs]

Test Case 4

const GLOB: &'static str = "some/**/{b,c}/{**/*?txt,needle.txt}";
const PATH: &'static str = "some/a/b/c/bigger/needle.txt";
crate_glob_match        time:   [324.88 ns 326.92 ns 330.22 ns]
glob_match_with_brace   time:   [249.18 ns 250.83 ns 252.96 ns]

Test Case 5

const GLOB: &'static str = "some/**/{the,crazy}/?*.{png,txt}";
const PATH: &'static str = "some/a/bigger/path/to/the/crazy/needle.txt";
crate_glob_match        error:  assertion failed
glob_match_with_brace   time:   [467.83 ns 469.16 ns 470.74 ns]

@LingyuCoder

@LingyuCoder
Copy link
Collaborator

Thanks for your great work.

@LingyuCoder LingyuCoder merged commit 271f63b into web-infra-dev:main Jul 16, 2024
29 checks passed
@shulaoda shulaoda deleted the fix/glob-match branch July 16, 2024 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release: bug fix release: bug related release(mr only)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: transformImport did not import styles.
5 participants