
docs: Visitor pattern example#26

Merged
Tobiky merged 45 commits into main from
docs/visitor-pattern-example
May 26, 2025

Conversation

@Tobiky
Collaborator

@Tobiky Tobiky commented Mar 10, 2025

This adds an application of the visitor pattern to the MAL tree-sitter parser, to show how it can be used. For fresh local builds, the workflow is more convoluted than usual for an example of this size, but this is necessary to ensure that users can use a newly built MAL parser. The steps are recorded in a README in the same folder.

The example can be found under examples/visitor.
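
For readers unfamiliar with the pattern, here is a minimal sketch of visitor dispatch on node types. It is not the code under examples/visitor: `ToyNode`, `Visitor`, and `AssetNameCollector` are hypothetical stand-ins, and the node type names `source_file` and `identifier` are assumptions. Only the `visit_<node type>` naming mirrors the method names used in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a parser node: a type name, raw text, children."""
    type: str
    text: bytes = b""
    children: list = field(default_factory=list)


class Visitor:
    """Dispatches each node to a `visit_<node type>` method when one exists."""

    def visit(self, node):
        method = getattr(self, f"visit_{node.type}", self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        # Fallback for node types without a dedicated method: visit the children.
        return [self.visit(child) for child in node.children]


class AssetNameCollector(Visitor):
    """Example visitor that records the name of every asset declaration."""

    def __init__(self):
        self.names = []

    def visit_asset_declaration(self, node):
        for child in node.children:
            if child.type == "identifier":  # assumed node type name
                self.names.append(child.text.decode())
        self.generic_visit(node)


tree = ToyNode("source_file", children=[
    ToyNode("asset_declaration", children=[ToyNode("identifier", b"Host")]),
    ToyNode("asset_declaration", children=[ToyNode("identifier", b"Network")]),
])
collector = AssetNameCollector()
collector.visit(tree)
print(collector.names)
```

Adding support for a new node type then only requires adding one method, which is the main appeal of the pattern for a compiler front end.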

@Tobiky
Collaborator Author

Tobiky commented Mar 10, 2025

I'm planning on adding another version of the visitor that uses the cursor, since it's much more performant, while hopefully not adding too much complexity.

Currently I'm unable to see whether it actually works, because the Python bindings for the tree-sitter library are older than what is used in this project, and reducing the version here causes some tests to fail. From what I can understand of the recent activity, they are in the process of upgrading the version. So the question is whether we should reduce the version and deal with the false negatives in the tests, or wait a couple of days to see if the package updates soon.

@Tobiky
Collaborator Author

Tobiky commented Mar 12, 2025

I'll look into reducing the TS version; hopefully I can find one that doesn't break the tests but has the correct parser version for the library's Python bindings.

@Tobiky
Collaborator Author

Tobiky commented Mar 13, 2025

I was unable to find a working version yesterday. The example available shouldn't be far from the truth, but it still won't be usable in MAL due to the parser version being wrong (this repo being v15 and the library python bindings being 13-14).

I'll shift over to standardizing the syntax highlighting for Neovim; I got started on that yesterday as well.

@Tobiky
Collaborator Author

Tobiky commented Mar 31, 2025

@nkakouros me and @tagyieh were thinking of setting up a WIP branch for the toolbox where we, temporarily, use the PR branch that is updating the library bindings until that gets resolved and completely updated. Once the bindings are on the same latest version as the parser we can then move to that and possibly merge that request (depending on completion too, obviously).

Would this be OK, or would you prefer to only allow this kind of work on local trees/forks?

Related; once we get started on using tree-sitter in the toolbox I'll get the publishing of the parser bindings package working (so it only needs to be imported as a dependency tree-sitter-mal).

@nkakouros
Collaborator

nkakouros commented Mar 31, 2025 via email

@Tobiky
Collaborator Author

Tobiky commented Apr 1, 2025

@nkakouros Do you wish to set up the packages on an account as MAL toolbox, or on my personal accounts like with this repository? Three will ideally be needed (PyPI, NPM, and Crates.io) for the template publishing that tree-sitter provides.

Tobiky and others added 24 commits May 13, 2025 17:08
… example

This makes the behavior more predictable and readable.
Since the last commit, more visitor patterns have been added. Currently, the TTC is fully implemented.

The grammar was updated to support TTC functions that have no arguments, in accordance with the ANTLR grammar.
The grammar was also updated because the group operations had mismatched priorities. According to the ANTLR version, union, intersection, and difference all have the lowest priority, followed by collection; transitive and subtype have the highest priority.
The grammar was also updated to allow for multiple detectors.
Currently, the TreeSitter version compiles coreLang and produces exactly the same output as the ANTLR version.

The biggest change is that, instead of calling cursor.goto_next_sibling() directly, an auxiliary method was created to skip comment nodes, which still appear in the AST.
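
One way such an auxiliary method could look is sketched below. This is an assumption about its shape, not the code from the commit: the helper name `goto_next_non_comment_sibling` and the node type name `"comment"` are hypothetical, and `ToyCursor` is a stand-in that only mimics the `goto_next_sibling()` method and `node` attribute of a tree-sitter cursor so the sketch runs without the bindings.

```python
from dataclasses import dataclass


@dataclass
class ToyNode:
    """Hypothetical stand-in for a parser node; only `type` is needed here."""
    type: str


class ToyCursor:
    """Stand-in cursor over a flat list of sibling nodes."""

    def __init__(self, siblings):
        self._siblings = siblings
        self._index = 0

    @property
    def node(self):
        return self._siblings[self._index]

    def goto_next_sibling(self) -> bool:
        # Move to the next sibling if there is one, reporting success.
        if self._index + 1 < len(self._siblings):
            self._index += 1
            return True
        return False


def goto_next_non_comment_sibling(cursor) -> bool:
    """Advance to the next sibling that is not a comment.

    Returns False when no such sibling exists; the cursor may then be
    resting on a trailing comment node.
    """
    while cursor.goto_next_sibling():
        if cursor.node.type != "comment":  # assumed node type name
            return True
    return False


cursor = ToyCursor([
    ToyNode("asset_declaration"),
    ToyNode("comment"),
    ToyNode("attack_step"),
])
if goto_next_non_comment_sibling(cursor):
    print(cursor.node.type)
```

Keeping the skip logic in one helper means the visitor methods never have to special-case comments themselves.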

The grammar was updated to allow for multiple tags.
tagyieh and others added 20 commits May 13, 2025 17:10
The analyzer was added to the visitor patterns, and the visitor was adjusted to fix some minor mistakes. Two analyzer tests currently fail because of how TreeSitter processes the grammar: it does not throw the errors that ANTLR does.
The 'E' operator/attack-step type constantly breaks and stops highlighting. By turning the types into a named node, we can highlight the named node instead of the string target.
Dictionaries are more maintainable and also faster than a series of if-else
statements, especially since they are easier for the interpreter to
optimize.

This also allows easier usage of byte-strings (bstr) since they can be
mapped. Using bstr allows us to avoid decoding them as much as possible.
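
The change described above can be illustrated with a small sketch. The table contents are assumptions for illustration (the symbols and names are not copied from `fast.MalCompiler`), and both function names are hypothetical.

```python
# Illustrative lookup table keyed by byte strings; keys and values are assumed.
STEP_TYPES = {
    b"|": "or",
    b"&": "and",
    b"#": "defense",
    b"E": "exist",
    b"!E": "notExist",
}


def step_type_if_else(raw: bytes) -> str:
    """Before: decode first, then walk a chain of string comparisons."""
    text = raw.decode()
    if text == "|":
        return "or"
    elif text == "&":
        return "and"
    elif text == "#":
        return "defense"
    elif text == "E":
        return "exist"
    elif text == "!E":
        return "notExist"
    raise ValueError(f"unknown step type: {raw!r}")


def step_type_lookup(raw: bytes) -> str:
    """After: a single dictionary lookup on the undecoded bytes."""
    try:
        return STEP_TYPES[raw]
    except KeyError:
        raise ValueError(f"unknown step type: {raw!r}") from None


print(step_type_lookup(b"!E"))
```

Because the keys are `bytes`, the node text can be used as-is, and adding a new case is a one-line change to the table.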
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_attack_step
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_detector_context
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_asset_declaration
Greedily replace `strip('"')` with slicing operation `[1:-1]` since '"'
are guaranteed to be ASCII and so 1-byte UTF-8, which the `decode()`
call assumes anyway.
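
A minimal sketch of that replacement, with hypothetical function names. It assumes the raw text always starts and ends with exactly one quote character, which is the case the commit message relies on.

```python
def unquote_with_strip(raw: bytes) -> str:
    """Before: decode, then strip every leading and trailing double quote."""
    return raw.decode().strip('"')


def unquote_with_slice(raw: bytes) -> str:
    """After: drop exactly one byte from each end, then decode.

    Safe because '"' is a single byte in UTF-8, so slicing the bytes
    cannot split a multi-byte character.
    """
    return raw[1:-1].decode()


quoted = '"héllo wörld"'.encode()
print(unquote_with_slice(quoted) == unquote_with_strip(quoted))
```

The two are not equivalent in general: `strip` removes every quote at either end, while the slice removes exactly one character per side, so input such as `b'"a""'` gives different results.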
added *args and **kwargs to make it accept anything
    Greedily replace `strip("'")` with slicing operation `[1:-1]` since '
    are guaranteed to be ASCII and so 1-byte UTF-8, which the `decode()`
    call assumes anyway.
Dictionaries are more maintainable and also faster than a series of if-else
statements, especially since they are easier for the interpreter to
optimize.

This also allows easier usage of byte-strings (bstr) since they can be
mapped. Using bstr allows us to avoid decoding them as much as possible.
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler._visit_intermediary_ttc_expr
The `float()` function (or casting function) natively supports byte
strings. `decode()` adds unnecessary overhead.
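
A small sketch of this point, using hypothetical function names: Python's built-in `float()` accepts `bytes` containing an ASCII number, so the intermediate `decode()` can be dropped.

```python
def parse_with_decode(raw: bytes) -> float:
    """Before: build an intermediate str just to parse it."""
    return float(raw.decode())


def parse_direct(raw: bytes) -> float:
    """After: float() parses the ASCII bytes directly."""
    return float(raw)


print(parse_direct(b"0.25") == parse_with_decode(b"0.25"))
```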
… bindings

Dictionaries are more maintainable and also faster than a series of if-else
statements, especially since they are easier for the interpreter to
optimize.

This also allows easier usage of byte-strings (bstr) since they can be
mapped. Using bstr allows us to avoid decoding them as much as possible.
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_ttc_distribution
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_reaching
…indings

Dictionaries are more maintainable and also faster than a series of if-else
statements, especially since they are easier for the interpreter to
optimize.

This also allows easier usage of byte-strings (bstr) since they can be
mapped. Using bstr allows us to avoid decoding them as much as possible.
…tion

Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_associations_declaration
@Tobiky Tobiky force-pushed the docs/visitor-pattern-example branch from 9e1ab36 to fad8f26 Compare May 16, 2025 10:09
@Tobiky Tobiky changed the title [draft] docs: Visitor pattern example docs: Visitor pattern example May 26, 2025
@Tobiky Tobiky merged commit dc7f608 into main May 26, 2025
2 checks passed

3 participants