Support integration with cstree #96

Open
zesterer opened this issue Mar 3, 2022 · 6 comments · Fixed by #681
Labels
enhancement New feature or request help wanted Extra attention is needed interop An issue that affects interoperability with other crates

Comments

@zesterer
Owner

zesterer commented Mar 3, 2022

With the eventual merge of #82, it might be possible to have chumsky integrate with cstree, a library for lossless parsing using untyped syntax trees. This might be achieved by allowing implementers of the Input trait to specify functions that should be run when sequences of the input are consumed by the parser. For a dedicated Input implementation (one that wraps another internally) we could specify these functions, allowing the emission of parse events.

@zesterer zesterer added enhancement New feature or request help wanted Extra attention is needed labels Mar 3, 2022
@zesterer
Owner Author

This is probably possible with zero-copy, but it's not on the roadmap for release.

@zesterer zesterer added the interop An issue that affects interoperability with other crates label Feb 21, 2023
@robo9k

robo9k commented May 2, 2023

What does zero-copy refer to (a Git branch?), and hasn't the mentioned pull request been merged already?
Are we talking about InputRef (not Input) here, and if so, how would that work given that it's not a trait?
What would chumsky's role be: parse input (possibly from a separate lexer) and output events to construct a cstree CST?

@CraftSpider
Collaborator

zero-copy used to be a git branch, but has since been merged and is now the main branch of the repository; anything after 1.0.0-alpha.1 is zero-copy.

Most likely this would involve both Input and InputRef: InputRef would gain methods that the parsers call, which it delegates to implementors of the Input trait (or possibly a new sub-trait of it, similar to SliceInput or ValueInput), if I understand the original idea correctly.

I can't answer this one as zesterer, but personally, that sounds about right, given my limited knowledge of cstree.

@zesterer
Owner Author

zesterer commented May 3, 2023

Yep, that's about right: we'd expand the input traits to allow collecting 'parsing events'. For most inputs, this would just be a stub that does nothing, allowing the optimiser to swallow the overhead. For a cstree input, these parsing events could be consumed by cstree's node builder and used to track token inputs.

Unfortunately I don't have the time to put together a proof of concept for this at the moment, but it probably wouldn't be the most difficult thing in the world to do (for those that understand how cstree works well).
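As a rough illustration of the "stub that does nothing" idea, here is a minimal, self-contained sketch. It does not use chumsky or cstree: `ParseEvent`, `EventSink`, `NoEvents`, `Recorder` and `consume_words` are all hypothetical names invented for this example, and the toy word splitter only stands in for a real parser consuming input.

```rust
/// Hypothetical event a parser could report as it consumes input.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseEvent {
    /// A token covering the byte range `start..end` was consumed.
    Token { start: usize, end: usize },
}

/// Hypothetical hook trait (not part of chumsky).
pub trait EventSink {
    /// Called whenever a span of input is consumed.
    /// The default body is empty, so the call can be optimised away.
    #[inline(always)]
    fn on_event(&mut self, _event: ParseEvent) {}
}

/// Sink for ordinary inputs: inherits the no-op default.
pub struct NoEvents;
impl EventSink for NoEvents {}

/// Sink that records events, standing in for a tree-building input.
#[derive(Default)]
pub struct Recorder {
    pub events: Vec<ParseEvent>,
}

impl EventSink for Recorder {
    fn on_event(&mut self, event: ParseEvent) {
        self.events.push(event);
    }
}

/// Toy "parser": consumes whitespace-separated words, reports each one
/// to the sink, and returns how many words it saw.
pub fn consume_words<S: EventSink>(src: &str, sink: &mut S) -> usize {
    let mut count = 0;
    let mut start = None;
    for (i, c) in src.char_indices() {
        match (c.is_whitespace(), start) {
            // First character of a new word.
            (false, None) => start = Some(i),
            // Whitespace right after a word: the word is complete.
            (true, Some(s)) => {
                sink.on_event(ParseEvent::Token { start: s, end: i });
                count += 1;
                start = None;
            }
            _ => {}
        }
    }
    // A word that runs to the end of the input.
    if let Some(s) = start {
        sink.on_event(ParseEvent::Token { start: s, end: src.len() });
        count += 1;
    }
    count
}
```

Because `consume_words` is generic over the sink, the `NoEvents` version is monomorphised with an empty `on_event`, which is the property that would let the optimiser remove the overhead for inputs that don't care about events.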

@jyn514
Contributor

jyn514 commented Oct 17, 2024

@zesterer helped me get something working that functions pretty well:

struct RowanNode_<'a, O, P: CSTParser<'a, O>> {
    parser: P,
    kind: SyntaxKind,
    _marker: PhantomData<(&'a str, fn() -> O)>,
}

type RowanNode<'a, O, P> = Ext<RowanNode_<'a, O, P>>;

/// This needs to be an extension, not a combinator using `map_with`, because map_with can be evaluated multiple times in the case of backtracking.
impl<'a, O, P: CSTParser<'a, O>> ExtParser<'a, &'a str, O, CSTExtra<'a>> for RowanNode_<'a, O, P> {
    fn parse(&self, inp: &mut InputRef<'a, '_, &'a str, CSTExtra<'a>>) -> Result<O, CSTError<'a>> {
        let checkpoint = inp.state().checkpoint();

        let output = inp.parse(&self.parser)?;
        let builder = inp.state();
        builder.start_node_at(checkpoint, self.kind.into());
        builder.finish_node();
        Ok(output)
    }

    fn check(&self, inp: &mut InputRef<'a, '_, &'a str, CSTExtra<'a>>) -> Result<(), CSTError<'a>> {
        let checkpoint = inp.state().checkpoint();

        inp.check(&self.parser)?;
        let builder = inp.state();
        builder.start_node_at(checkpoint, self.kind.into());
        builder.finish_node();
        Ok(())
    }
}

fn node<'a, O, P: CSTParser<'a, O>>(kind: SyntaxKind, parser: P) -> RowanNode<'a, O, P> {
    Ext(RowanNode_ {
        parser,
        kind,
        _marker: PhantomData,
    })
}

there also needs to be a combinator for tokens, but the general principle is working pretty well :)

the stuff with CSTParser is just because i didn't feel like making it more generic, it would be pretty easy to change. CSTExtra is kinda locked in because it needs the builder though, you'd need either that or yet another trait that gives you back &mut GreenNodeBuilder.
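To make the "yet another trait that gives you back the builder" option concrete, here is a minimal sketch under stated assumptions. It is self-contained and does not depend on chumsky, rowan or cstree: `ToyBuilder` is a stand-in for a real green-node builder, and `HasBuilder`, `StateWithDepth`, `wrap_from` and this free-standing `node` function are hypothetical names made up for the example.

```rust
/// Stand-in for a real tree builder. Children are kept as rendered
/// strings so the sketch stays small.
#[derive(Default, Debug)]
pub struct ToyBuilder {
    children: Vec<String>,
}

impl ToyBuilder {
    /// Remember the current position so a node can be opened there later.
    pub fn checkpoint(&self) -> usize {
        self.children.len()
    }

    /// Append a leaf token.
    pub fn token(&mut self, text: &str) {
        self.children.push(text.to_string());
    }

    /// Wrap everything emitted since `checkpoint` in a node named `kind`.
    pub fn wrap_from(&mut self, checkpoint: usize, kind: &str) {
        let inner = self.children.split_off(checkpoint).join(" ");
        self.children.push(format!("({kind} {inner})"));
    }

    /// Render the tree built so far as an S-expression-like string.
    pub fn rendered(&self) -> String {
        self.children.join(" ")
    }
}

/// Hypothetical trait: any parser state that can hand out the builder.
pub trait HasBuilder {
    fn builder(&mut self) -> &mut ToyBuilder;
}

/// The simplest state is the builder itself.
impl HasBuilder for ToyBuilder {
    fn builder(&mut self) -> &mut ToyBuilder {
        self
    }
}

/// A richer state that carries other data next to the builder.
#[derive(Default)]
pub struct StateWithDepth {
    pub depth: usize,
    pub tree: ToyBuilder,
}

impl HasBuilder for StateWithDepth {
    fn builder(&mut self) -> &mut ToyBuilder {
        &mut self.tree
    }
}

/// Generic over the state: take a checkpoint, run `inner`, then wrap
/// whatever `inner` emitted in a node of the given kind.
pub fn node<S: HasBuilder>(state: &mut S, kind: &str, inner: impl FnOnce(&mut S)) {
    let checkpoint = state.builder().checkpoint();
    inner(state);
    state.builder().wrap_from(checkpoint, kind);
}
```

The point of the trait is that `node` no longer cares what the state type is, only that it can produce the builder, so the state could carry other user data as well.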

@zesterer
Owner Author

Reopening since there's still need for a combinator for this, I think.
