Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools: add ASCII only lint rule in lib/ #11371

Closed
wants to merge 1 commit into from

Conversation

hkal
Copy link

@hkal hkal commented Feb 14, 2017

Detects if files in lib/ contain non-ASCII characters and
raises a linting error. Also removes non-ASCII characters from
lib/timers.js

Fixes: #11209

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

tools, lib

@nodejs-github-bot nodejs-github-bot added timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. tools Issues and PRs related to the tools directory. labels Feb 14, 2017
@Trott
Copy link
Member

Trott commented Feb 14, 2017

lib/timers.js Outdated
// ║ ╚════ > Actual JavaScript timeouts
// ║
// ╚════ > Linked List
// |---- > Object Map
Copy link
Contributor

@mscdex mscdex Feb 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: maybe slashes would look a tad better for the corners instead of a pipe?

@thefourtheye
Copy link
Contributor

This contradicts #11129

@aqrln
Copy link
Contributor

aqrln commented Feb 14, 2017

@thefourtheye actually it doesn't, to the contrary, I'd say it complements it.

lib/timers.js Outdated
// | |
// | |---- > Actual JavaScript timeouts
// |
// |---- > Linked List
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be okay to make an exception for this file and just add an eslint-disable line for the rule.

(It would be good to have this file as a test that we can use UTF-8 in sources, even if we prefer not to.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax I don't really think it is a way to test this. It wasn't possible to use UTF-8 in sources since #5458 and before #11129 yet all worked well since it doesn't actually matter whether comments are encoded correctly while these are just comments.

@gibfahn
Copy link
Member

gibfahn commented Feb 14, 2017

@thefourtheye cross-posting @bnoordhuis's comment from #11209 (comment)

Would be good to enforce it because Unicode files take up twice as much space in the binary as plain ASCII files.

@jasnell
Copy link
Member

jasnell commented Feb 14, 2017

I'm really not entirely sure that we should do this.

@addaleax
Copy link
Member

Fwiw once I have the time (maybe later this week) I’d like to look into making the tooling strip comments during compilation, so that we can at least keep non-ASCII characters inside of comments.

@Fishrock123
Copy link
Contributor

Erm, I added those comments, what is the point of this? The source bundling tool supports UTF-8 as previously linked above.

@Fishrock123 Fishrock123 self-requested a review February 14, 2017 15:54
@aqrln
Copy link
Contributor

aqrln commented Feb 14, 2017

@addaleax that would be great. Do you mind if I take it? :)

@addaleax
Copy link
Member

@aqrln You mean, updating the tooling to do that? Sure, go for it! You can ping me if there are any questions :)

@bnoordhuis
Copy link
Member

@addaleax @aqrln It shouldn't strip line ends though, or line numbers in stack traces won't match with the on-disk files.

@aqrln
Copy link
Contributor

aqrln commented Feb 14, 2017

@bnoordhuis sure thing :) But thanks for pointing that out anyway.

@hkal
Copy link
Author

hkal commented Feb 15, 2017

Alright, so it looks like the consensus is we don't want the linter checking for ASCII characters and other solutions will be explored?

@addaleax
Copy link
Member

@hkal I think it still makes sense to try to enforce this outside of comments… I am not an eslint expert but it looks like adjusting your code should be easy?

Maybe just wait until we’ve resolve the above discussion…

@hkal
Copy link
Author

hkal commented Feb 15, 2017

@addaleax the code can easily be changed to not include comments. I'll hold off making any changes until a decision has been reached. Thanks!

@gibfahn
Copy link
Member

gibfahn commented Feb 16, 2017

I think it still makes sense to try to enforce this outside of comments… I am not an eslint expert but it looks like adjusting your code should be easy?

Maybe just wait until we’ve resolve the above discussion…

@Fishrock123 @thefourtheye are you opposed to a lint rule that enforces ASCII outside of comments?

@Fishrock123
Copy link
Contributor

Not really, I think?

aqrln added a commit to aqrln/node that referenced this pull request Feb 16, 2017
In order to allow using Unicode characters inside comments of built-in
JavaScript libraries without forcing them to be stored as UTF-16 data in
Node's binary, update the tooling to strip comments during build
process. All line breaks are preserved so that line numbers in stack
traces aren't broken.

Refs: nodejs#11129
Refs: nodejs#11371 (comment)
@hkal
Copy link
Author

hkal commented Mar 8, 2017

@gibfahn any movement on this?

@gibfahn
Copy link
Member

gibfahn commented Mar 9, 2017

@hkal I personally prefer the unicode comment in lib/timers.js, so I'd rather we added an eslint-disable exception for that block and left it as it is. In the future if that is moved out into a separate guide it could be removed.

Otherwise, unless @jasnell or @thefourtheye or any other collaborators disagree, we should be able to move forward on this.

@jasnell
Copy link
Member

jasnell commented Mar 10, 2017

As long as we add the eslint-disable for the timers comment block, I'm ok with this going forward

@TimothyGu
Copy link
Member

@hkal, can you add a eslint-disable for lib/timers.js as @aqrln and @jasnell suggested? If you do so we'll be able to merge this. Thanks a lot!

@thefourtheye
Copy link
Contributor

@gibfahn wrote

@Fishrock123 @thefourtheye are you opposed to a lint rule that enforces ASCII outside of comments?

No, I am not opposed to it.

@hkal hkal force-pushed the only-ascii-characters branch 3 times, most recently from 3a7ff16 to aad90cf Compare March 22, 2017 14:36
@hkal
Copy link
Author

hkal commented Mar 22, 2017

@gibfahn Alright, I think we're good here. Please review.

lib/timers.js Outdated
@@ -19,6 +19,7 @@
// OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
// USE OR OTHER DEALINGS IN THE SOFTWARE.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated whitespace change :-)

const { loc } = token;

// Will only report the first non-ascii character per line
const character = matches[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either the comment is misleading, or it works not the way you planned it to. This will report the first non-ASCII character per token, not per line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes the values in the tokens (couldn't think of a better name) array look like:

{ type: 'Punctuator',
  value: ';',
  start: 22327,
  end: 22328,
  loc:
   SourceLocation {
     start: Position { line: 767, column: 1 },
     end: Position { line: 767, column: 2 } },
  range: [ 22327, 22328 ] }
{ type: 'Line',
  value: ' Copyright Joyent, Inc. and other Node contributors.',
  start: 0,
  end: 54,
  range: [ 0, 54 ],
  loc:
   { start: Position { line: 1, column: 0 },
     end: Position { line: 1, column: 54 } } }

In the case of type Line we only report the the first occurrence of a non-ASCII character. In my latest iteration I took the comment out since it didn't really help clarify.

}
});

errors.forEach((error) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do this in one pass without the extra errors array? You can make this a named function and call it instead of errors.push().

const { value } = token;
const matches = value.match(nonAsciiPattern);

if (matches) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it would be better to flip the condition and return early (if (!matches) return;) reducing the indentation level for the rest of the function.

Detects if files in lib/ contain non-ASCII characters and
raises a linting error. Also removes non-ASCII characters
from lib/console.js comments

Fixes: nodejs#11209
@hkal hkal force-pushed the only-ascii-characters branch from aad90cf to e7b2a19 Compare March 24, 2017 04:39
@hkal
Copy link
Author

hkal commented Mar 24, 2017

I think I've addressed all the feedback.

@aqrln @jasnell @gibfahn

@gibfahn
Copy link
Member

gibfahn commented Mar 24, 2017

As this is an eslint rule addition, cc/ @not-an-aardvark, @silverwind, @Trott, @mscdex

const commentTokens = source.getAllComments();
const tokens = sourceTokens.concat(commentTokens);

tokens.forEach((token) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail to match non-ascii whitespace that could appear between tokens, so it's not quite disallowing all non-ascii characters in files. This could be fixed by matching the regex against source.text rather than against each token.

That said, I think the no-irregular-whitespace rule will cover non-ascii whitespace.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the latter is true, it would be good to have a comment explaining that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment alone would not be sufficient, it is important that the linter ensures there are no irregular Unicode whitespace characters since they cannot be seen during code review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aqrln by:

If the latter is true

I mean that if:

I think the no-irregular-whitespace rule will cover non-ascii whitespace.

@not-an-aardvark's theory is correct, and the non-ASCII whitespace characters are already covered in a separate rule, then we could just use that for whitespace, and add a comment in here to explain that we don't need to worry about whitespace as it's covered in another rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gibfahn ah, I see, sorry. I didn't pay enough attention to that "if the latter" part so I didn't understand you right.

// Rule Definition
//------------------------------------------------------------------------------

const nonAsciiPattern = new RegExp('([^\x00-\x7F])', 'g');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be clearer to use a regex literal here instead of the RegExp constructor. Right now, \x00 and \x7F are interpreted as part of the string, so the resulting regex pattern actually contains a null character. This still works fine, but it could be confusing for debugging (e.g. if the regex is printed, it will be difficult to tell that it contains a null character).


reportError({
line: loc.start.line,
column,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could result in an invalid report location if the offending character is in a block comment. For example:

/* foo
■ */

The rule reports an error for this comment at line 1, column 7, but that location doesn't actually exist.

* @author Kalon Hinds
*/

/* eslint no-control-regex:0 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer see eslint-disable-line or eslint-disable-next-line to target the places where the control characters are needed.

@Fishrock123
Copy link
Contributor

still not really a fan

@BridgeAR
Copy link
Member

@hkal would you be so kind and have a look at the other comments and rebase this?

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Aug 29, 2017
@BridgeAR
Copy link
Member

BridgeAR commented Sep 8, 2017

Closing this due to a long inactivity period. @hkal thanks for your contribution anyways and please feel free to reopen (or just leave a comment to reopen) if you would like to follow up on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stalled Issues and PRs that are stalled. timers Issues and PRs related to the timers subsystem / setImmediate, setInterval, setTimeout. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linter rule: use ASCII quotes not Unicode ones