Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify lexing/preprocessing/parsing into hand-rolled scanner #261

Open
2 tasks done
Cirras opened this issue Jun 18, 2024 · 0 comments
Open
2 tasks done

Unify lexing/preprocessing/parsing into hand-rolled scanner #261

Cirras opened this issue Jun 18, 2024 · 0 comments
Labels
engine Improvements to the core engine enhancement Improvements to an existing feature

Comments

@Cirras
Copy link
Collaborator

Cirras commented Jun 18, 2024

Prerequisites

  • This improvement has not already been suggested.
  • This improvement would be generally useful, not specific to my code or setup.

Engine area

Delphi language support

Improvement description

Background

  • The current parsing is implemented in 3 steps:
    • Lexing (DelphiLexer generated by ANTLR)
    • Preprocessing (DelphiPreprocessor)
    • Parsing (DelphiParser generated by ANTLR)
  • In addition, we have a hand-rolled CompilerDirectiveParser with corresponding ExpressionLexer and ExpressionParser for handling constant expressions within {$if} and {$elseif} directives.

Problem

While this approach keeps the concerns cleanly separated and sounds nice on paper, it doesn't reflect reality at all.

Delphi's parsing is a fairly streamlined single-pass implementation that parses high-level AST structures and constructs the symbol table while reading tokens on-demand from the input data and processing compiler directives as they appear. In other words, it's all mashed together into a single process.

There are many subtle deficiencies in the current parsing that we can trace back to this fundamental difference in how the process is modeled.

Solution

Merge the 3 parsing steps into a hand-rolled DelphiScanner:

  • Lex tokens on-demand instead of doing it upfront.
  • Evaluate compiler directives as they're lexed from the input data instead of afterward.
  • Additionally, perform symbol table construction simultaneously instead of afterward.

Rationale

Enable full support for:

  • caret notation (Parsing error on caret-escaped special characters #111)
  • compiler directives nested in directive expressions (Expression lexer failing to compile file in recursive directives #260)
  • symbol information in directive expressions
    • references to constants
    • Declared function
  • "vexing" parses that require symbol information to resolve
    // [[OPTION 1]]
    // In this scenario, `Write` is called with 2 arguments
    // parsed as binary expressions: `Foo < T` and `T > (B)` 
    {
      type
        A = class
          procedure Bar;
        end;
       
      const
        Foo = 0;
        T = 0;
        B = 0;
    }
    
    // [[OPTION 2]]
    // In this scenario, `Write` is called with 1 argument
    // parsed as an invocation of generic function `Foo`.
    {  
      type
        A = class
          function Foo<U, V>(X: Integer): Integer;
          procedure Bar;
        end;
       
        T = class end;
       
      const
        B = 0;
    }
    
    procedure A.Bar;
    begin
      Write(Foo<T, T>(B));
    end;

Make it easier to implement:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
engine Improvements to the core engine enhancement Improvements to an existing feature
Projects
None yet
Development

No branches or pull requests

1 participant