Added TypeScript PhpLexerBase for php grammar #3994

Open
wants to merge 2 commits into
base: master
4 changes: 2 additions & 2 deletions php/PhpLexer.g4
@@ -48,8 +48,8 @@ HtmlText : ~[<#]+;
XmlStart : '<?xml' -> pushMode(XML);
PHPStartEcho : PhpStartEchoFragment -> type(Echo), pushMode(PHP);
PHPStart : PhpStartFragment -> channel(SkipChannel), pushMode(PHP);
-HtmlScriptOpen : '<script' { _scriptTag = true; } -> pushMode(INSIDE);
-HtmlStyleOpen : '<style' { _styleTag = true; } -> pushMode(INSIDE);
+HtmlScriptOpen : '<script' { this._scriptTag = true; } -> pushMode(INSIDE);
+HtmlStyleOpen : '<style' { this._styleTag = true; } -> pushMode(INSIDE);
HtmlComment : '<!' '--' .*? '-->' -> channel(HIDDEN);
HtmlDtd : '<!' .*? '>';
HtmlOpen : '<' -> pushMode(INSIDE);
161 changes: 161 additions & 0 deletions php/TypeScript/PhpLexerBase.ts
@@ -0,0 +1,161 @@
import {CommonToken, Lexer, Token, CharStream} from "antlr4";
// change the imports below accordingly
import PhpParser from "./PhpParser";
import PhpLexer from "./PhpLexer";

export default abstract class PhpLexerBase extends Lexer {
private AspTags: boolean;
protected _scriptTag: boolean;
protected _styleTag: boolean;
private _heredocIdentifier: string | undefined;
private _prevTokenType: number;
private _htmlNameText: string | undefined;
private _phpScript: boolean;
private _insideString: boolean;
protected _mode: number
protected _channel: number

static DEFAULT_MODE = 0;
protected static MORE = -2;
protected static SKIP = -3;

protected static DEFAULT_TOKEN_CHANNEL: number = (Token as any).DEFAULT_CHANNEL;
protected static HIDDEN: number = (Token as any).HIDDEN_CHANNEL;
protected static MIN_CHAR_VALUE = 0x0000;
protected static MAX_CHAR_VALUE = 0x10FFFF;

constructor(input: CharStream) {
super(input);
this.AspTags = true;
this._scriptTag = false;
this._styleTag = false;
this._heredocIdentifier = undefined;
this._prevTokenType = 0;
this._htmlNameText = undefined;
this._phpScript = false;
this._insideString = false;
}

nextToken() {
let token = super.nextToken()

if (token.type === PhpParser.PHPEnd || token.type === PhpLexer.PHPEndSingleLineComment) {
if (this._mode === PhpLexer.SingleLineCommentMode) {
// SingleLineCommentMode for such allowed syntax:
// // <?php echo "Hello world"; // comment ?>
this.popMode();
}
this.popMode();

if (token.text === "</script>") {
this._phpScript = false;
token.type = PhpLexer.HtmlScriptClose;
} else {
// Add semicolon to the end of statement if it is absent.
// For example: <?php echo "Hello world" ?>
if (this._prevTokenType === PhpLexer.SemiColon || this._prevTokenType === PhpLexer.Colon || this._prevTokenType === PhpLexer.OpenCurlyBracket || this._prevTokenType === PhpLexer.CloseCurlyBracket) {
token = super.nextToken()
Contributor:

This code is wrong; compare it with the CSharp port. Rather than fetching the next token, this branch should set token.channel = 1; (1 is the hidden channel).

} else {
token = new CommonToken(undefined, PhpLexer.SemiColon, undefined, undefined, undefined);
token.text = ';';
kaby76 (Contributor), Mar 9, 2024:

Change these two lines to match the CSharp port: use token.type = PhpLexer.SemiColon; and delete the "new CommonToken()" call.
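
Read together, the two review comments above suggest the control flow sketched below. This is only an illustration, not the CSharp port itself: `PhpEndToken`, `handlePhpEnd`, and every numeric constant are hypothetical stand-ins so the snippet runs without the ANTLR runtime. Only the two assignments (channel set to 1, type set to SemiColon) come from the comments.

```typescript
// Hypothetical stand-in for an ANTLR token; only the fields the sketch touches.
interface PhpEndToken {
  type: number;
  channel: number;
  text: string;
}

// Illustrative numbers only; the real values come from the generated PhpLexer.
const SEMI_COLON = 10;
const COLON = 11;
const OPEN_CURLY = 12;
const CLOSE_CURLY = 13;
const PHP_END = 20;
const DEFAULT_CHANNEL = 0;
const HIDDEN_CHANNEL = 1; // "1" is the hidden channel, per the review comment

// Decide what to do with a closing "?>" token, given the previous token type.
function handlePhpEnd(token: PhpEndToken, prevTokenType: number): PhpEndToken {
  const statementAlreadyClosed =
    prevTokenType === SEMI_COLON ||
    prevTokenType === COLON ||
    prevTokenType === OPEN_CURLY ||
    prevTokenType === CLOSE_CURLY;

  if (statementAlreadyClosed) {
    // No semicolon is needed: hide this token instead of pulling the next one.
    token.channel = HIDDEN_CHANNEL;
  } else {
    // Reuse the same token as the missing semicolon; no new CommonToken.
    token.type = SEMI_COLON;
  }
  return token;
}

// Previous token was ";" so the "?>" token is hidden.
const hiddenEnd = handlePhpEnd(
  { type: PHP_END, channel: DEFAULT_CHANNEL, text: "?>" },
  SEMI_COLON
);

// Previous token was something else (99 is arbitrary), so "?>" becomes ";".
const semicolonEnd = handlePhpEnd(
  { type: PHP_END, channel: DEFAULT_CHANNEL, text: "?>" },
  99
);
```

In this sketch the same token object is mutated in both branches, so whatever other fields it carries are left untouched.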

}
}
}

else if (token.type === PhpLexer.HtmlName) {
this._htmlNameText = token.text
}

else if (token.type === PhpLexer.HtmlDoubleQuoteString) {
if (token.text === "php" && this._htmlNameText === "language") {
this._phpScript = true;
}
}

else if (this._mode === PhpLexer.HereDoc) {
// Heredoc and Nowdoc syntax support: http://php.net/manual/en/language.types.string.php#language.types.string.syntax.heredoc
if (token.type === PhpLexer.StartHereDoc || token.type === PhpLexer.StartNowDoc) {
this._heredocIdentifier = token.text.slice(3).trim().replace(/\'$/, '');
Contributor:

This code does not correspond to the CSharp port. In C#, Trim(char) removes the character from both the start and the end of the string, whereas in TypeScript replace(pattern, string) with a non-global pattern replaces only one occurrence. As a result, 'NOWDOC' is left with one of its quotes instead of becoming NOWDOC. I handled this with two replace() calls, one for the start of the string and one for the end. Alternatively, you could just call replace(/\'/, '') twice.
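
One way to get the two-sided trimming described above is a small helper that strips a given character from both ends. The sketch below is an assumption about how that could look, not the code from the CSharp port or the reviewer's own fix. `trimChar` and `heredocIdentifierOf` are hypothetical names.

```typescript
// Remove every leading and trailing occurrence of `ch` (a single character).
function trimChar(text: string, ch: string): string {
  let start = 0;
  let end = text.length;
  while (start < end && text[start] === ch) {
    start++;
  }
  while (end > start && text[end - 1] === ch) {
    end--;
  }
  return text.slice(start, end);
}

// Extract the identifier from a heredoc/nowdoc start token: drop the "<<<"
// prefix, trim whitespace, then strip single quotes from both ends.
function heredocIdentifierOf(startTokenText: string): string {
  return trimChar(startTokenText.slice(3).trim(), "'");
}

// The expression from the diff only strips a quote at the end of the string.
const fromDiff = "<<<'NOWDOC'\n".slice(3).trim().replace(/\'$/, "");

const nowdoc = heredocIdentifierOf("<<<'NOWDOC'\n");
const heredoc = heredocIdentifierOf("<<<EOT\n");
```

Here `fromDiff` still starts with a quote, while `nowdoc` and `heredoc` are the bare identifiers.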

}

if (token.type === PhpLexer.HereDocText) {
kaby76 (Contributor), Mar 9, 2024:

This really should be else if (token.type === PhpLexer.HereDocText), because the current code does not exactly follow the switch statement in the CSharp port. Notice the "break" following the two cases. The TypeScript port should have a one-to-one correspondence with the C# target.
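
The structural point above can be shown with a small sketch. Two independent `if` statements can both run for one token, while an `else if` chain runs at most one branch, which is what a `switch` with `break` does. The token shape, the type numbers, and the type change inside the first branch are hypothetical and exist only to make the difference visible. The first branch in the diff does not change the token type.

```typescript
// Hypothetical token shape and type numbers, for illustration only.
interface BranchToken {
  type: number;
}
const START_HEREDOC = 1;
const HEREDOC_TEXT = 2;

// Two independent ifs: the second test runs even after the first branch fired.
function separateIfs(token: BranchToken): string[] {
  const visited: string[] = [];
  if (token.type === START_HEREDOC) {
    visited.push("start");
    token.type = HEREDOC_TEXT; // simulated change made inside the first branch
  }
  if (token.type === HEREDOC_TEXT) {
    visited.push("text");
  }
  return visited;
}

// else-if chain: at most one branch runs.
function elseIfChain(token: BranchToken): string[] {
  const visited: string[] = [];
  if (token.type === START_HEREDOC) {
    visited.push("start");
    token.type = HEREDOC_TEXT;
  } else if (token.type === HEREDOC_TEXT) {
    visited.push("text");
  }
  return visited;
}

// switch with break: same behaviour as the else-if chain.
function switchWithBreak(token: BranchToken): string[] {
  const visited: string[] = [];
  switch (token.type) {
    case START_HEREDOC:
      visited.push("start");
      token.type = HEREDOC_TEXT;
      break;
    case HEREDOC_TEXT:
      visited.push("text");
      break;
  }
  return visited;
}

const bothBranches = separateIfs({ type: START_HEREDOC }).join(",");
const oneBranchElseIf = elseIfChain({ type: START_HEREDOC }).join(",");
const oneBranchSwitch = switchWithBreak({ type: START_HEREDOC }).join(",");
```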

if (this.CheckHeredocEnd(token.text)) {
this.popMode()
const heredocIdentifier = this.GetHeredocEnd(token.text)
if (token.text.trim().endsWith(';')) {
token = new CommonToken((CommonToken as any).EMPTY_SOURCE, PhpParser.SemiColon,
undefined, undefined, undefined)
token.text = `${heredocIdentifier};\n`;
kaby76 (Contributor), Mar 9, 2024:

Delete this line (token.text = '$........;\n';). Rewriting the text like this is very confusing. Also delete the line with token = new CommonToken(...); because it is not in the CSharp port.

} else {
token = super.nextToken()
token.text = `${heredocIdentifier}\n;`;
Contributor:

Also delete this line (token.text = ...;), for the same reason as above: it is very confusing. If anything, it should just set the text to ';'.
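
For context, the two branches discussed above are selected by whether the closing heredoc line ends with a semicolon. The sketch below reproduces only that decision, using the same string expressions as GetHeredocEnd and the endsWith check in the diff. `classifyHeredocLine` and the `HeredocLine` type are hypothetical names, and the sketch deliberately says nothing about what the token should become in each branch.

```typescript
type HeredocLine = "not-end" | "end-with-semicolon" | "end-without-semicolon";

// Same expression as GetHeredocEnd in the diff: trim, then drop one trailing ";".
function getHeredocEnd(text: string): string {
  return text.trim().replace(/;$/, "");
}

// Report which branch a HereDocText token would take for a given identifier.
function classifyHeredocLine(text: string, heredocIdentifier: string): HeredocLine {
  if (getHeredocEnd(text) !== heredocIdentifier) {
    return "not-end";
  }
  return text.trim().endsWith(";") ? "end-with-semicolon" : "end-without-semicolon";
}

const withSemicolon = classifyHeredocLine("EOT;\n", "EOT");
const withoutSemicolon = classifyHeredocLine("EOT\n", "EOT");
const bodyLine = classifyHeredocLine("  some text\n", "EOT");
```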

}
}
}
}

else if (this._mode === PhpLexer.PHP) {
if (this._channel === PhpLexer.HIDDEN) {
this._prevTokenType = token.type;
}
}

return token;
}

GetHeredocEnd(text: string): string {
return text.trim().replace(/\;$/, "");
}

CheckHeredocEnd(text: string): boolean {
return this.GetHeredocEnd(text) === this._heredocIdentifier;
}

IsNewLineOrStart(pos: number): boolean {
return this._input.LA(pos) <= 0 || this._input.LA(pos) == '\r'.charCodeAt(0) ||
this._input.LA(pos) == '\n'.charCodeAt(0)
}

PushModeOnHtmlClose() {
this.popMode();
if (this._scriptTag) {
if (!this._phpScript) {
this.pushMode(PhpLexer.SCRIPT);
} else {
this.pushMode(PhpLexer.PHP);
}
this._scriptTag = false;
} else if (this._styleTag) {
this.pushMode(PhpLexer.STYLE);
this._styleTag = false;
}
}

HasAspTags(): boolean {
return this.AspTags;
}

HasPhpScriptTag(): boolean {
return this._phpScript;
}

PopModeOnCurlyBracketClose() {
if (this._insideString) {
this._insideString = false;
this.skip;
this.popMode();
}
}

ShouldPushHereDocMode(pos: number): boolean {
return this._input.LA(pos) === '\r'.charCodeAt(0) || this._input.LA(pos) === '\n'.charCodeAt(0);
}

IsCurlyDollar(pos: number): boolean {
return this._input.LA(pos) === '$'.charCodeAt(0);
}

SetInsideString() {
this._insideString = true
}
}
2 changes: 1 addition & 1 deletion php/desc.xml
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="UTF-8" ?>
<desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
<antlr-version>^4.10</antlr-version>
-<targets>CSharp;Java;Python3</targets>
+<targets>CSharp;Java;Python3;TypeScript</targets>
</desc>