Multiple PATH arguments, truncation fix, fmt, clippy #92

Open: wants to merge 5 commits into base: main
167 changes: 82 additions & 85 deletions README.md
@@ -11,82 +11,80 @@ resembles C and C++ code, making it easy to turn interesting code patterns into

weggli is inspired by great tools like [Semgrep](https://semgrep.dev/), [Coccinelle](https://coccinelle.gitlabpages.inria.fr/website/), [joern](https://joern.readthedocs.io/en/latest/) and [CodeQL](https://securitylab.github.com/tools/codeql), but makes some different design decisions:

- **C++ support**: weggli has first class support for modern C++ constructs, such as lambda expressions, range-based for loops and constexprs.
- **C++ support**: weggli has first class support for modern C++ constructs, such as lambda expressions, range-based for loops and constexprs.

- **Minimal setup**: weggli should work *out-of-the box* against most software you will encounter. weggli does not require the ability to build the software and can work with incomplete sources or missing dependencies.

- **Minimal setup**: weggli should work *out-of-the box* against most software you will encounter. weggli does not require the ability to build the software and can work with incomplete sources or missing dependencies.

- **Interactive**: weggli is designed for interactive usage and fast query performance. Most of the time, a weggli query will be faster than a grep search. The goal is to enable an interactive workflow where quick switching between code review and query creation/improvement is possible.
- **Greedy**: weggli's pattern matching is designed to find as many (useful) matches as possible for a specific query. While this increases the risk of false positives it simplifies query creation. For example, the query `$x = 10;` will match both assignment expressions (`foo = 10;`) and declarations (`int bar = 10;`).

- **Greedy**: weggli's pattern matching is designed to find as many (useful) matches as possible for a specific query. While this increases the risk of false positives it simplifies query creation. For example, the query `$x = 10;` will match both assignment expressions (`foo = 10;`) and declarations (`int bar = 10;`).




## Usage
```
Use -h for short descriptions and --help for more details.
Use -h for short descriptions and --help for more details.

Homepage: https://github.com/weggli-rs/weggli

USAGE: weggli [OPTIONS] <PATTERN> <PATH>
USAGE: weggli [OPTIONS] <PATTERN> <PATH>...

ARGS:
<PATTERN>
<PATTERN>
A weggli search pattern. weggli's query language closely resembles
C and C++ with a small number of extra features.

For example, the pattern '{_ $buf[_]; memcpy($buf,_,_);}' will
find all calls to memcpy that directly write into a stack buffer.

Besides normal C and C++ constructs, weggli's query language
supports the following features:

_ Wildcard. Will match on any AST node.

$var Variables. Can be used to write queries that are independent
of identifiers. Variables match on identifiers, types,
field names or namespaces. The --unique option
optionally enforces that $x != $y != $z. The --regex option can
enforce that the variable has to match (or not match) a
regular expression.

_(..) Subexpressions. The _(..) wildcard matches on arbitrary
sub expressions. This can be helpful if you are looking for some
operation involving a variable, but don't know more about it.
For example, _(test) will match on expressions like test+10,
buf[test->size] or f(g(&test));

not: Negative sub queries. Only show results that do not match the
following sub query. For example, '{not: $fv==NULL; not: $fv!=NULL *$v;}'
would find pointer dereferences that are not preceded by a NULL check.

strict: Enable stricter matching. This turns off statement unwrapping
and greedy function name matching. For example 'strict: func();'
will not match on 'if (func() == 1)..' or 'a->func()' anymore.

weggli automatically unwraps expression statements in the query source
to search for the inner expression instead. This means that the query `{func($x);}`
will match on `func(a);`, but also on `if (func(a)) {..}` or `return func(a)`.
Matching on `func(a)` will also match on `func(a,b,c)` or `func(z,a)`.
Similarly, `void func($t $param)` will also match function definitions
with multiple parameters.

Additional patterns can be specified using the --pattern (-p) option. This makes
it possible to search across functions or type definitions.

<PATH>
C and C++ with a small number of extra features.

For example, the pattern '{_ $buf[_]; memcpy($buf,_,_);}' will
find all calls to memcpy that directly write into a stack buffer.

Besides normal C and C++ constructs, weggli's query language
supports the following features:

_ Wildcard. Will match on any AST node.

$var Variables. Can be used to write queries that are independent
of identifiers. Variables match on identifiers, types,
field names or namespaces. The --unique option
optionally enforces that $x != $y != $z. The --regex option can
enforce that the variable has to match (or not match) a
regular expression.

_(..) Subexpressions. The _(..) wildcard matches on arbitrary
sub expressions. This can be helpful if you are looking for some
operation involving a variable, but don't know more about it.
For example, _(test) will match on expressions like test+10,
buf[test->size] or f(g(&test));

not: Negative sub queries. Only show results that do not match the
following sub query. For example, '{not: $fv==NULL; not: $fv!=NULL *$v;}'
would find pointer dereferences that are not preceded by a NULL check.

strict: Enable stricter matching. This turns off statement unwrapping and greedy
function name matching. For example 'strict: func();' will not match
on 'if (func() == 1)..' or 'a->func()' anymore.

weggli automatically unwraps expression statements in the query source
to search for the inner expression instead. This means that the query `{func($x);}`
will match on `func(a);`, but also on `if (func(a)) {..}` or `return func(a)`.
Matching on `func(a)` will also match on `func(a,b,c)` or `func(z,a)`.
Similarly, `void func($t $param)` will also match function definitions
with multiple parameters.

Additional patterns can be specified using the --pattern (-p) option. This makes
it possible to search across functions or type definitions.
<PATH>...
Input directory or file to search. By default, weggli will search inside
.c and .h files for the default C mode or .cc, .cpp, .cxx, .h and .hpp files when
executing in C++ mode (using the --cpp option).
Alternative file endings can be specified using the --extensions (-e) option.

When combining weggli with other tools or preprocessing steps,
files can also be specified via STDIN by setting the directory to '-'
and piping a list of filenames.
.c and .h files for the default C mode or .cc, .cpp, .cxx, .h and .hpp files when
executing in C++ mode (using the --cpp option).
Alternative file endings can be specified using the --extensions=h,c (-e) option.

When combining weggli with other tools or preprocessing steps,
files can also be specified via STDIN by setting the directory to '-'
and piping a list of filenames.

OPTIONS:
-A, --after <after>
-A, --after <after>
Lines to print after a match. Default = 5.

-B, --before <before>
@@ -116,39 +114,40 @@ Use -h for short descriptions and --help for more details.
-l, --limit
Only show the first match in each function.

-n, --line-numbers
Enable line numbers

-p, --pattern <p>...
Specify additional search patterns.

-R, --regex <regex>...
Filter variable matches based on a regular expression.
This feature uses the Rust regex crate, so most Perl-style
regular expression features are supported.
(see https://docs.rs/regex/1.5.4/regex/#syntax)
This feature uses the Rust regex crate, so most Perl-style
regular expression features are supported.
(see https://docs.rs/regex/1.5.4/regex/#syntax)

Examples:
Examples:

Find calls to functions starting with the string 'mem':
weggli -R 'func=^mem' '$func(_);'

Find memcpy calls where the last argument is NOT named 'size':
weggli -R 's!=^size$' 'memcpy(_,_,$s);'
Find calls to functions starting with the string 'mem':
weggli -R 'func=^mem' '$func(_);'

Find memcpy calls where the last argument is NOT named 'size':
weggli -R 's!=^size$' 'memcpy(_,_,$s);'
-u, --unique
Enforce uniqueness of variable matches.
By default, two variables such as $a and $b can match on identical values.
For example, the query '$x=malloc($a); memcpy($x, _, $b);' would
match on both

void *buf = malloc(size);
memcpy(buf, src, size);
By default, two variables such as $a and $b can match on identical values.
For example, the query '$x=malloc($a); memcpy($x, _, $b);' would
match on both

and
void *buf = malloc(size);
memcpy(buf, src, size);

void *buf = malloc(some_constant);
memcpy(buf, src, size);
and

Using the unique flag would filter out the first match as $a==$b.
void *buf = malloc(some_constant);
memcpy(buf, src, size);

Using the unique flag would filter out the first match as $a==$b.
-v, --verbose
Sets the level of verbosity.

@@ -192,8 +191,8 @@ $func(&$p);
Potentially insecure WeakPtr usage:
```cpp
weggli --cpp '{
$x = _.GetWeakPtr();
DCHECK($x);
$x = _.GetWeakPtr();
DCHECK($x);
$x->_;}' ./target/src
```

@@ -203,7 +202,7 @@ weggli -X 'DCHECK(_!=_.end());' ./target/src
```

Functions that perform writes into a stack-buffer based on
a function argument.
a function argument.
```c
weggli '_ $fn(_ $limit) {
_ $buf[_];
@@ -237,7 +236,7 @@ $ cargo install weggli

```sh
# optional: install rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

git clone https://github.com/googleprojectzero/weggli.git
cd weggli; cargo build --release
@@ -249,8 +248,8 @@ cd weggli; cargo build --release
Weggli is built on top of the [`tree-sitter`](https://tree-sitter.github.io/tree-sitter/) parsing library and its [`C`](https://github.com/tree-sitter/tree-sitter-c) and [`C++`](https://github.com/tree-sitter/tree-sitter-cpp) grammars.
Search queries are first parsed using an extended version of the corresponding grammar, and the resulting `AST` is
transformed into a set of tree-sitter queries
in `builder.rs`.
The actual query matching is implemented in `query.rs`, which is a relatively small wrapper around tree-sitter's query engine to add weggli specific features.
in `builder.rs`.
The actual query matching is implemented in `query.rs`, which is a relatively small wrapper around tree-sitter's query engine to add weggli specific features.


## Contributing
@@ -266,5 +265,3 @@ Apache 2.0; see [`LICENSE`](LICENSE) for details.
This project is not an official Google project. It is not supported by
Google and Google specifically disclaims all warranties as to its quality,
merchantability, or fitness for a particular purpose.


8 changes: 4 additions & 4 deletions src/builder.rs
@@ -524,11 +524,11 @@ impl QueryBuilder {
let mut result = if kind == "type_identifier" {
"[ (type_identifier) (sized_type_specifier) (primitive_type)]".to_string()
} else if kind == "identifier" && pattern.starts_with('$') {
if is_num_var(pattern) && parent!="declarator" {
if is_num_var(pattern) && parent != "declarator" {
"(number_literal)".to_string()
}
else if self.cpp {
"[(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)]".to_string()
} else if self.cpp {
"[(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)]"
.to_string()
} else {
"[(identifier) (field_expression) (field_identifier)]".to_string()
}
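
The branch reformatted in this hunk may be easier to review when read as a standalone decision function. The following is a minimal sketch, not the PR's code: `capture_pattern` and its `num_var` parameter are hypothetical names introduced here. `num_var` stands in for the result of `is_num_var(pattern)`, whose definition is outside the hunk, and `None` stands in for the cases the hunk does not show. The pattern strings are copied from the diff.

```rust
/// Standalone sketch of the branch shown in the hunk above.
/// `num_var` is an assumption: it replaces the call to `is_num_var(pattern)`,
/// which is defined elsewhere in builder.rs and not visible in this diff.
fn capture_pattern(
    kind: &str,
    pattern: &str,
    parent: &str,
    cpp: bool,
    num_var: bool,
) -> Option<String> {
    if kind == "type_identifier" {
        Some("[ (type_identifier) (sized_type_specifier) (primitive_type)]".to_string())
    } else if kind == "identifier" && pattern.starts_with('$') {
        if num_var && parent != "declarator" {
            // Number variables match literals, except in a declarator position.
            Some("(number_literal)".to_string())
        } else if cpp {
            // C++ mode additionally accepts qualified identifiers and `this`.
            Some(
                "[(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)]"
                    .to_string(),
            )
        } else {
            Some("[(identifier) (field_expression) (field_identifier)]".to_string())
        }
    } else {
        // Remaining cases are outside the hunk; this sketch does not model them.
        None
    }
}
```

Under these assumptions, a number variable whose parent is `declarator` falls through to the identifier alternatives, which is the only behavior the `parent != "declarator"` condition affects. The PR changes only the formatting of this branch, so the sketch behaves the same for the old and new versions.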