Conversation
|
I'm planning on adding another version of the visitor that uses the cursor, since it's much more performant, while hopefully not adding too much complexity. Currently I'm unable to see whether it actually works, because the Python bindings for the tree-sitter library are older than what is used in this project, and reducing the version causes some tests to fail. From what I can understand of the recent activity, they are upgrading the version. So the question is whether we should reduce the version and deal with the false negatives in the tests, or wait a couple of days to see if the package updates soon. |
|
I'll look into reducing the TS version; hopefully I can find one that doesn't break the tests but still has the correct parser version for the library's Python bindings. |
|
I was unable to find a working version yesterday. The example available shouldn't be far from the truth, but it still won't be usable in MAL because the parser version is wrong (this repo being v15 and the library's Python bindings being 13-14). I'll shift over to standardizing the syntax highlighting to neovim; I got started on that yesterday as well. |
|
@nkakouros me and @tagyieh were thinking of setting up a WIP branch for the toolbox where we temporarily use the PR branch that is updating the library bindings, until that gets resolved and completely updated. Once the bindings are on the same latest version as the parser, we can then move to that and possibly merge that request (depending on completion too, obviously). Would this be OK, or do you have concerns and would rather only allow this kind of work on local trees/forks? Related: once we get started on using tree-sitter in the toolbox, I'll get the publishing of the parser bindings package working (so it only needs to be imported as a dependency tree-sitter-mal). |
|
Sure, go ahead, whatever works.
From: Tobiky ***@***.***>
Sent: Monday, March 31, 2025 3:06:00 PM
To: Tobiky/tree-sitter-mal ***@***.***>
Cc: Nikolaos Kakouros ***@***.***>; Mention ***@***.***>
Subject: Re: [Tobiky/tree-sitter-mal] [draft] docs: Visitor pattern example (PR #26)
|
|
@nkakouros Do you wish to set up the packages on an account as MAL toolbox or my personal accounts like with this repository? Three will ideally be needed, PyPI, NPM, and Crates.io, for the template publishing that tree-sitter provides. |
… example This makes the behavior more predictable and readable.
Since the last commit, more visitor patterns have been added. Currently, the TTC is fully implemented. The grammar was updated to support TTC functions which have no arguments, in accordance with the ANTLR grammar.
Also updated the grammar, since the group operations had mismatched priorities. According to the ANTLR version, union, intersection and difference should all have the lowest priority, followed by collection, and the highest priority is shared by transitive and subtype.
Also updated grammar to allow for multiple detectors.
Currently, the TreeSitter version compiles coreLang and produces exactly the same output as the ANTLR version. The biggest change is that instead of using cursor.goto_next_sibling() directly, an auxiliary method was created to skip comment nodes, which still appear in the AST. The grammar was updated to allow for multiple tags.
The analyzer was added to the visitor patterns, and the visitor was slightly adjusted to fix some minor mistakes. There are currently two analyzer tests which fail because of how TreeSitter processes the grammar: it no longer throws the errors that ANTLR does.
The 'E' operator/attack-step-type constantly breaks and stops highlighting. By turning the types into a named node, we can highlight the named node instead of the string target.
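A minimal sketch of what such an auxiliary method could look like. It only assumes a cursor object exposing `goto_next_sibling()` and `node.type`, as the py-tree-sitter `TreeCursor` does. The helper name, the `"comment"` node type, and the `FakeCursor` stand-in (used so the snippet runs without a built parser) are hypothetical and not taken from this repository.

```python
class FakeNode:
    """Hypothetical stand-in for a tree-sitter node; only `type` is modelled."""

    def __init__(self, type_):
        self.type = type_


class FakeCursor:
    """Hypothetical stand-in for a TreeCursor walking one flat list of siblings."""

    def __init__(self, types):
        self._nodes = [FakeNode(t) for t in types]
        self._index = 0

    @property
    def node(self):
        return self._nodes[self._index]

    def goto_next_sibling(self):
        # Mirrors the real cursor: move and return True, or stay put and return False.
        if self._index + 1 < len(self._nodes):
            self._index += 1
            return True
        return False


def goto_next_non_comment_sibling(cursor, comment_type="comment"):
    """Advance to the next sibling that is not a comment.

    Returns False when no such sibling exists; in that case the cursor is
    left on the last sibling it reached, which may be a comment.
    """
    while cursor.goto_next_sibling():
        if cursor.node.type != comment_type:
            return True
    return False


def sibling_types(cursor):
    """Collect the types of the current node and its non-comment siblings.

    Assumes the cursor does not start on a comment node.
    """
    types = [cursor.node.type]
    while goto_next_non_comment_sibling(cursor):
        types.append(cursor.node.type)
    return types


cursor = FakeCursor(["asset", "comment", "attack_step", "comment"])
visible = sibling_types(cursor)
```

Keeping the skip logic in one helper means each visitor method can keep its simple "advance, then inspect" shape without checking for comments itself.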
Dictionaries are more maintainable and also faster than a series of if-else statements, especially since they are easier for the interpreter to optimize for. This also allows easier usage of byte-strings (bstr), since they can be used as mapping keys. Using bstr allows us to avoid decoding them as much as possible.
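As a rough illustration of the change described above (not the actual code in `fast.MalCompiler`): the mapping below is keyed directly on the raw bytes of a token, so no `decode()` call is needed before the lookup. The table contents and the function names are hypothetical.

```python
# Hypothetical table from the raw bytes of an attack-step-type token to a name.
STEP_TYPES = {
    b"|": "or",
    b"&": "and",
    b"#": "defense",
    b"E": "exist",
    b"!E": "notExist",
}


def step_type_if_else(text: bytes) -> str:
    """Old style: decode first, then walk a chain of string comparisons."""
    decoded = text.decode()
    if decoded == "|":
        return "or"
    elif decoded == "&":
        return "and"
    elif decoded == "#":
        return "defense"
    elif decoded == "E":
        return "exist"
    elif decoded == "!E":
        return "notExist"
    raise KeyError(text)


def step_type_lookup(text: bytes) -> str:
    """New style: one hash lookup on the undecoded bytes."""
    return STEP_TYPES[text]
```

Both functions raise `KeyError` for an unknown token, so the lookup version can be swapped in without changing the error behavior of this sketch.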
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_attack_step
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_detector_context
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_asset_declaration
Greedily replace `strip('"')` with the slicing operation `[1:-1]`, since `"`
is guaranteed to be ASCII and therefore 1-byte UTF-8, which the `decode()`
call assumes anyway.
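A small sketch of the two variants, assuming the token text arrives as bytes and is always wrapped in exactly one pair of quotes. The function names are made up for illustration. Note that the two are not equivalent in general: `strip` removes every leading and trailing quote, while the slice removes exactly one byte from each end.

```python
def unquote_with_strip(text: bytes) -> str:
    """Old style: decode, then strip all surrounding double quotes."""
    return text.decode().strip('"')


def unquote_with_slice(text: bytes) -> str:
    """New style: drop the first and last byte, then decode what is left."""
    return text[1:-1].decode()


quoted = b'"Network access"'
same = unquote_with_strip(quoted) == unquote_with_slice(quoted)

# Edge case where the variants differ: doubled quotes at the ends.
doubled = b'""quoted""'
stripped = unquote_with_strip(doubled)
sliced = unquote_with_slice(doubled)
```

Slicing the bytes before decoding is safe here because the quote character occupies a single byte in UTF-8, so the slice cannot cut through a multi-byte character.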
added *args and **kwargs to make it accept anything
Greedily replace `strip("'")` with the slicing operation `[1:-1]`, since `'`
is guaranteed to be ASCII and therefore 1-byte UTF-8, which the `decode()`
call assumes anyway.
Dictionaries are more maintainable and also faster than a series of if-else statements, especially since they are easier for the interpreter to optimize for. This also allows easier usage of byte-strings (bstr), since they can be used as mapping keys. Using bstr allows us to avoid decoding them as much as possible.
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler._visit_intermediary_ttc_expr
The `float()` function (or casting function) natively supports byte strings. `decode()` adds unnecessary overhead.
… bindings Dictionaries are more maintainable and also faster than a series of if-else statements, especially since they are easier for the interpreter to optimize for. This also allows easier usage of byte-strings (bstr), since they can be used as mapping keys. Using bstr allows us to avoid decoding them as much as possible.
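For illustration, assuming the numeric token text is available as bytes: Python's built-in `float()` accepts `bytes` directly, so the intermediate `decode()` can be dropped. The function names below are hypothetical.

```python
def parse_number_decoded(text: bytes) -> float:
    """Old style: build an intermediate str before converting."""
    return float(text.decode())


def parse_number_direct(text: bytes) -> float:
    """New style: float() parses ASCII digits straight from the bytes object."""
    return float(text)


value = parse_number_direct(b"0.25")
```

Both versions raise `ValueError` on malformed input, so error handling around the call does not need to change.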
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_ttc_distribution
Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_reaching
…indings Dictionaries are more maintainable and also faster than a series of if-else statements, especially since they are easier for the interpreter to optimize for. This also allows easier usage of byte-strings (bstr), since they can be used as mapping keys. Using bstr allows us to avoid decoding them as much as possible.
…tion Replace some `<bytes_val>.decode() == '<text>'` with `<bytes_val> == b'<text>'` in fast.MalCompiler.visit_associations_declaration
9e1ab36 to fad8f26
This adds an application of the visitor pattern to the MAL tree-sitter parser, to show how it can be used. For fresh local builds, the workflow is somewhat more convoluted than is usual for an example of this size; however, this is necessary to ensure that users can use a newly built MAL parser. The steps are recorded in a README in the same folder.
The example can be found under examples/visitor.