Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ANTLR4 runtime cannot be transpiled to ES5 because of incorrect inheritance #3032

Closed
octogonz opened this issue Jan 8, 2021 · 30 comments
Closed

Comments

@octogonz
Copy link

octogonz commented Jan 8, 2021

Background

Web applications must be transpiled to ECMAScript 5 (ES5) if they need to support older JavaScript runtimes, for example Internet Explorer or hardware devices that shipped with older JavaScript engines. In the past, the ANTLR runtime was written using ES5 compatible code. But version 4.9.x recently introduced ES6 class definitions that require transpilation. This is fine.

However, the ANTLR library has some classes that inherit from the system Error class, which is incompatible with transpilation. The reason is that ES5 system APIs are modeled as ES6 classes, which use a different inheritance mechanism from transpiled classes. (See
this article for some details.)

Repro

  1. Transpile ANTLR4 to ES5 and bundle it into a web application.

This code fails to catch LexerNoViableAltException:

                    catch (e) {
                        if (e instanceof RecognitionException) {
                            this.notifyListeners(e); // report error
                            this.recover(e);
                        }
                        else {
                            console.log(e.stack);
                            throw e;
                        }
                    }

The e instanceof RecognitionException test returns false, even though LexerNoViableAltException inherits from RecognitionException. This happens because of the incorrect prototype introduced when RecognitionException inherits from the Error base class.

Possible fixes

The two standard solutions are:

  1. Don't inherit from the Error class; instead simply implement its contract using a custom base class. This will break instanceof Error, but generally nobody ever tests that, so this approach works fine in practice. The callstack property can be calculated using (new Error()).stack.
    - OR -
  2. Call Object.setPrototypeOf() in the constructor for every subclass of the Error class. This also works pretty well. Its main downside is that people sometimes forget to apply the workaround when adding new subclasses.

I would be willing to make a PR to fix this. Would you accept a fix?

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Jan 8, 2021

Hi,

thanks for this.

I struggle a bit with your findings because my transpiled parsers do not suffer at all from the problem you are describing.
Every generated parser rule relies on the instanceof construct, and it works perfectly here...
Any chance the issue is introduced by your toolset?
(I notice the provided link relates to Typescript)

Eric

@octogonz
Copy link
Author

octogonz commented Jan 8, 2021

Hmmm... What does your transpiled RecognitionException look like? Perhaps Babel transpiles it differently.

I'm using TypeScript 3.9.7 which produces this output:

var RecognitionException = /** @class */ (function (_super) {
    tslib_1.__extends(RecognitionException, _super);
    function RecognitionException(params) {
        var _this = _super.call(this, params.message) || this;
        if (!!Error.captureStackTrace) {
            Error.captureStackTrace(_this, RecognitionException);
        }
        else {
            var stack = new Error().stack;
        }
        _this.message = params.message;
        _this.recognizer = params.recognizer;
        _this.input = params.input;
        _this.ctx = params.ctx;
        /**
         * The current {@link Token} when an error occurred. Since not all streams
         * support accessing symbols by index, we have to track the {@link Token}
         * instance itself
         */
        _this.offendingToken = null;
        /**
         * Get the ATN state number the parser was in at the time the error
         * occurred. For {@link NoViableAltException} and
         * {@link LexerNoViableAltException} exceptions, this is the
         * {@link DecisionState} number. For others, it is the state whose outgoing
         * edge we couldn't match.
         */
        _this.offendingState = -1;
        if (_this.recognizer !== null) {
            _this.offendingState = _this.recognizer.state;
        }
        return _this;
    }
    /**
     * Gets the set of input symbols which could potentially follow the
     * previously matched symbol at the time this exception was thrown.
     *
     * <p>If the set of expected tokens is not known and could not be computed,
     * this method returns {@code null}.</p>
     *
     * @return The set of token types that could potentially follow the current
     * state in the ATN, or {@code null} if the information is not available.
     */
    RecognitionException.prototype.getExpectedTokens = function () {
        if (this.recognizer !== null) {
            return this.recognizer.atn.getExpectedTokens(this.offendingState, this.ctx);
        }
        else {
            return null;
        }
    };
    // <p>If the state number is not known, this method returns -1.</p>
    RecognitionException.prototype.toString = function () {
        return this.message;
    };
    return RecognitionException;
}(Error));

@octogonz
Copy link
Author

octogonz commented Jan 8, 2021

Note also that LexerNoViableAltException is thrown only for unusual inputs. In my case, it was a Jest test (running in the Node.js runtime) that was testing error reporting for invalid inputs. The web browser app worked okay, perhaps because the inputs it was parsing didn't have any errors.

I just now confirmed that the problem is fixed if we add this line (as suggested in the TypeScript docs):

class RecognitionException extends Error {
  constructor(params) {
    super(params.message);

    Object.setPrototypeOf(this, RecognitionException.prototype); // <======  option #2 from above

    if (!!Error.captureStackTrace) {
      Error.captureStackTrace(this, RecognitionException);
    } else {
      var stack = new Error().stack;
    }

However my preference would be to do option #1 instead of #2, because it is more portable and less error-prone.

@ericvergnaud
Copy link
Contributor

Can you share your babelrc?

@octogonz
Copy link
Author

octogonz commented Jan 8, 2021

We don't use Babel. We transpile to ES5 using TypeScript and a polyfill library. This is leaner and faster. (It's also better suited for corporate monorepos with lots of projects, since transpilation happens incrementally rather than centrally.)

@octogonz
Copy link
Author

octogonz commented Jan 8, 2021

But look at this Stack Overflow thread. It seems to say that Babel has the same limitation as TypeScript.

I'm really curious to see your transpiler output for RecognitionException.

@ericvergnaud
Copy link
Contributor

ok it seems the npm package is not webpacked as expected, it contains the src code but not the generated minimized bundle
I'll look into that and check if I can reproduce your issue with a bundle generated using babel and webpack
(might require a couple of days)

@ericvergnaud
Copy link
Contributor

Hi,

here is what babelized code looks like:

 var RecognitionException =/*#__PURE__*/function (_Error) {
      _inherits(RecognitionException, _Error);
      var _super = _createSuper(RecognitionException);

      function RecognitionException(params) {
          var _this;
          _classCallCheck(this, RecognitionException);
          _this = _super.call(this, params.message);
  ...  

with utility code being:

  function _inherits(subClass, superClass) {
      if (typeof superClass !== "function" && superClass !== null) {
          throw new TypeError("Super expression must either be null or a function");
      }
      subClass.prototype = Object.create(superClass && superClass.prototype, {
          constructor: {
              value: subClass,
              writable: true,
              configurable: true
          }
      });
      if (superClass) _setPrototypeOf(subClass, superClass);
  }

  function _setPrototypeOf(o, p) {
      _setPrototypeOf = Object.setPrototypeOf || function _setPrototypeOf(o, p) {
          o.__proto__ = p;
          return o;
      };
      return _setPrototypeOf(o, p);
  }

  function _createSuper(Derived) {
      var hasNativeReflectConstruct = _isNativeReflectConstruct();
      return function _createSuperInternal() {
          var Super = _getPrototypeOf(Derived), result;
          if (hasNativeReflectConstruct) {
              var NewTarget = _getPrototypeOf(this).constructor;
              result = Reflect.construct(Super, arguments, NewTarget);
          } else {
              result = Super.apply(this, arguments);
          }
          return _possibleConstructorReturn(this, result);
      };
  }

So it looks like Babel (with the help of @babel/plugin-transform-classes) is already calling Object.setPrototypeOf.

Practically, I see 3 scenarios:

  • no class transform = no issue
  • class transform using Babel = no issue
  • class transform using TypeScript = issue

I am not super inclined to add redundant code when the issue seems to lie with the TypeScript transpiler.

Also, if you are using TypeScript, should you not be using the TypeScript runtime target ?
I would expect the authors would have addressed that problem.

@GordonSmith
Copy link

GordonSmith commented Jan 15, 2021

FWIW I hit something similar, in my case when I looked inside my webpacked file, I found two instances of the generated JS code, basically webpack was seeing two versions:

  1. require("...")
  2. import {} from "..."

And was bundling them separately, this is what broke the instanceof logic.

My fix was simple - move the generated JS grammar out of my "src" folder so that the TypeScript didn't transpile it (thus creating the ambiguity).

Alternatively upgrading to 4.9.1 and setting the tsconfig module to "es6" also worked (again since webpack only "bundles" one variant).

@octogonz
Copy link
Author

I am not super inclined to add redundant code when the issue seems to lie with the TypeScript transpiler.

I'm not recommending to add redundant code. My recommendation is to avoid extending the system Error class. This is the best practice for a library that intends to be used in many different environments.

I can also open a TypeScript issue proposing for the transpiler to call Object.setPrototypeOf(), but (from previous experience) they might have some complicated justification for why they don't do that.

Also, if you are using TypeScript, should you not be using the TypeScript runtime target ?
I would expect the authors would have addressed that problem.

ANTLR's TypeScript target is still experimental and not officially supported. It might be nice to have, but frankly since humans don't edit the generated parsers, it doesn't matter much whether the generated code is typed or not. @types/antlr4 provides typings for the ANTLR runtime, and for my case at least, I only had to write about 10 lines of code to add typings for the generated classes. So I'm fine using the JavaScript target. Except for this one problem. :-)

@octogonz
Copy link
Author

My fix was simple - move the generated JS grammar out of my "src" folder so that the TypeScript didn't transpile it (thus creating the ambiguity). Alternatively upgrading to 4.9.1 and setting the tsconfig module to "es6" also worked (again since webpack only "bundles" one variant).

This will fix the problem for ES6+ runtimes such as Chrome, but it will not work for older runtimes unless you apply a separate transpiler such as Babel.

@ericvergnaud
Copy link
Contributor

Well what you are proposing are 2 workarounds for a TypeScript transpiler bug.
(this is clearly a bug from them since Babel is perfectly able to produce the correct code)

So may I suggest that you either:

  • get them to fix it
  • use babel instead of the built-in Typescript transpiler

@GordonSmith
Copy link

@ericvergnaud Having re-read the original issue and read up on it some more, I have to agree with your synopses.

BUT (isn't there always)

  1. I am in the same scenario as @octogonz, that I have never used Babel and rely on TypeScript for my transpiling. (and this number will be increasing as similar folks update to the current es6 version).
  2. This type of subtle JS runtime incompatibility can be a real PITA to track down when it does show up - @octogonz Can you be more specific about which JS Runtimes the above code fails in (are we talking old versions of Node or IE or some Phone Browsers)?

@ericvergnaud would you accept a PR which introduces a "safe" Error class AntlrError which extends Error? This could then be safely used as needed?

I would probably go with old school es5 code like (from MDN):

function AntlrError(foo, message, fileName, lineNumber) {
  var instance = new Error(message, fileName, lineNumber);
  instance.name = 'AntlrError';
  instance.foo = foo;
  Object.setPrototypeOf(instance, Object.getPrototypeOf(this));
  if (Error.captureStackTrace) {
    Error.captureStackTrace(instance, AntlrError);
  }
  return instance;
}

AntlrError.prototype = Object.create(Error.prototype, {
  constructor: {
    value: Error,
    enumerable: false,
    writable: true,
    configurable: true
  }
});

if (Object.setPrototypeOf){
  Object.setPrototypeOf(AntlrError, Error);
} else {
  AntlrError.__proto__ = Error;
}

Yes it is a compromise, but if your goal is to "just work" with a not insignificant audience...

@ericvergnaud
Copy link
Contributor

ericvergnaud commented Jan 18, 2021

well I guess you could also workaround the problem as follows:

const oldError = window.Error;
window.Error = function(message) {
  this.message = message;
  return this;
}
try {
  callLexerAndParser();
} finally {
  window.Error = oldError;
}

@GordonSmith
Copy link

Doesn't look like it would be NodeJS Friendly?

Extending Error has been an edge case for as long as I can remember (looking over my code base I can see where I do it "correctly" as per MDN and where I don't as per TS 2.x) - I wonder what runtimes this is actually an issue for (If this is only an IE8 issue then nothing to see here, lets move along...)

@octogonz
Copy link
Author

2. @octogonz Can you be more specific about which JS Runtimes the above code fails in (are we talking old versions of Node or IE or some Phone Browsers)?

The code fails if you specify "target": "es5" in your tsconfig.json. The actual JavaScript runtime does not matter -- for example my example will fail even with the latest Chrome.

The ECMAScript 5 target is needed mainly for Internet Explorer 11, and for devices that shipped with an older web browser (e.g. Playstation, Tizen, etc). These devices can last for many years beyond their initial release, and the OEM generally will never update the browser beyond security fixes. Recently at work I encountered a Comcast TV box that mostly supports ECMAScript 6, but with arbitrary gaps that are most easily solved by transpiling to ES5.

ES6 is now 6 years old. I'll be overjoyed on the day when we can finally forget about ES5, but we're not there yet...

@ericvergnaud
Copy link
Contributor

Doesn't look like it would be NodeJS Friendly?

just do it conditionally? i.e. if NodeJS replace the global Error directly.

I'm about to close this since this is not an ANTLR4 runtime issue.

@octogonz
Copy link
Author

Here's a PR #3041

It's really pretty simple to fix this problem. I don't believe this fix has any practical downsides.

@octogonz
Copy link
Author

I'm about to close this since this is not an ANTLR4 runtime issue.

BTW the TypeScript compiler maintainers replied and said they will not implement Babel's solution because it requires too much extra code. Their behavior is "by design".

@ericvergnaud
Copy link
Contributor

It's much simpler to fix it outside the runtime, see above examples, so closing this.

@ericvergnaud
Copy link
Contributor

I'm about to close this since this is not an ANTLR4 runtime issue.

BTW the TypeScript compiler maintainers replied and said they will not implement Babel's solution because it requires too much extra code. Their behavior is "by design".

My advice is: don't use tools written by lazy people (it not incompetent)

@octogonz
Copy link
Author

octogonz commented Jan 22, 2021

ericvergnaud wrote, regarding the TypeScript project:

My advice is: don't use tools written by lazy people (it not incompetent)

@parrt Is there a Code of Conduct for the people who publicly represent the ANTLR project? Maybe it's time to adopt one.

@parrt
Copy link
Member

parrt commented Jan 22, 2021

Hi @octogonz, yeah, sometimes people have strong opinions and express them publicly. I don't think Eric intended to insult you. Codes of conduct are hard to enforce... better if we all just agree to soften our language if we can.

@octogonz
Copy link
Author

In a broad community people will have different ideas about what constitutes professional behavior. So it can be helpful to write something down, even if there is no "enforcement."

(BTW Eric didn't insult me, but rather the TypeScript maintainers. Given that TypeScript's NPM stats are 3x the popularity of Babel and 144x the popularity of ANTLR, it seemed a bit cavalier to dismiss their design decision and claim that it is TypeScript's responsibility to accommodate ANTLR's assumption that everyone uses Babel.)

I thought of another possibility by the way: If the ANTLR package provided its own ES5 bundle that is built using Babel, we can use that directly, and then we won't need need to use TypeScript to transpile your library. However PR #3041 is still the best solution IMO.

@parrt
Copy link
Member

parrt commented Jan 22, 2021

Yeah, probably should avoid too many digs on other projects...hahah. I can't comment on JS/babel as it's outside my area.

@ericvergnaud
Copy link
Contributor

@octogonz

as I mentioned 2 weeks ago:

ok it seems the npm package is not webpacked as expected, it contains the src code but not the generated minimized bundle
that's the actual issue

so maybe you could submit a PR that does just that: "provided its own ES5 bundle that is built using Babel"

@ericvergnaud ericvergnaud reopened this Jan 23, 2021
@ericvergnaud
Copy link
Contributor

@octogonz you could try using this branch https://github.com/ericvergnaud/antlr4/tree/purify-javascript-runtime-code
it supposedly generates an ES5 script file

@octogonz
Copy link
Author

octogonz commented Feb 1, 2021

Here's an initial sketch: PR #3065

It builds two ES5 bundles, one for development, and one for production. By default the "module" entry point links to the development bundle.

It seems that the "main" entry point must still refer to src/index.js for now, because the ANTLR runtime relies on filesystem APIs that are not available in a web browser. (Your Webpack configuration works around that problem by mapping those modules to "empty", but this is not very clean. A better solution might be to separate the ANTLR APIs with Node.js dependencies to have their own supplementary entry point.)

I can test this and refine it some more. But first I'm wondering how you will publish it. It seems that the version 4.9.1 did not include the dist folder, so perhaps some work is also needed for whatever CI job publishes these NPM relases. (?)

@ericvergnaud
Copy link
Contributor

@octogonz can you move your last comment to the PR itself - best to discuss there ?

@ericvergnaud
Copy link
Contributor

We will soon be shipping a typescript target, so closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants