tools: add ASCII only lint rule in lib/ #11371
lib/timers.js
// ║ ╚════ > Actual JavaScript timeouts
// ║
// ╚════ > Linked List
// |---- > Object Map
Minor nit: maybe slashes would look a tad better for the corners instead of a pipe?
This contradicts #11129
@thefourtheye actually it doesn't; on the contrary, I'd say it complements it.
lib/timers.js
// | |
// | |---- > Actual JavaScript timeouts
// |
// |---- > Linked List
I think it would be okay to make an exception for this file and just add an eslint-disable line for the rule.
(It would be good to have this file as a test that we can use UTF-8 in sources, even if we prefer not to.)
@thefourtheye cross-posting @bnoordhuis's comment from #11209 (comment)
I'm really not entirely sure that we should do this.
FWIW, once I have the time (maybe later this week), I’d like to look into making the tooling strip comments during compilation, so that we can at least keep non-ASCII characters inside of comments.
Erm, I added those comments; what is the point of this? The source bundling tool supports UTF-8, as previously linked above.
@addaleax that would be great. Do you mind if I take it? :)
@aqrln You mean, updating the tooling to do that? Sure, go for it! You can ping me if there are any questions :)
@bnoordhuis sure thing :) But thanks for pointing that out anyway.
Alright, so it looks like the consensus is that we don't want the linter checking for ASCII characters, and other solutions will be explored?
@hkal I think it still makes sense to try to enforce this outside of comments… I am not an eslint expert, but it looks like adjusting your code should be easy? Maybe just wait until we’ve resolved the above discussion…
@addaleax the code can easily be changed to not include comments. I'll hold off on making any changes until a decision has been reached. Thanks!
@Fishrock123 @thefourtheye are you opposed to a lint rule that enforces ASCII outside of comments?
Not really, I think?
In order to allow using Unicode characters inside comments of built-in JavaScript libraries without forcing them to be stored as UTF-16 data in Node's binary, update the tooling to strip comments during the build process. All line breaks are preserved so that line numbers in stack traces aren't broken.
Refs: nodejs#11129
Refs: nodejs#11371 (comment)
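The approach in this commit message (drop comments, keep every line break) can be sketched in JavaScript. This is a hypothetical `stripComments` helper for illustration, not the code from the actual commit. It assumes the input has no regex literals containing `//` or `/*` and no backtick characters inside `${...}` template expressions, since it does not parse either.

```javascript
'use strict';

// Hypothetical helper: removes // and /* */ comments from JavaScript source
// while keeping every '\n', so line numbers (and stack traces) do not shift.
// Limitation: regex literals and ${...} inside template literals are not
// parsed, so something like /\/\// would be misread as a comment.
function stripComments(source) {
  let out = '';
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch === '"' || ch === "'" || ch === '`') {
      // Copy a string or template literal verbatim, skipping escaped chars.
      let j = i + 1;
      while (j < source.length && source[j] !== ch) {
        j += source[j] === '\\' ? 2 : 1;
      }
      out += source.slice(i, j + 1);
      i = j + 1;
    } else if (ch === '/' && source[i + 1] === '/') {
      // Line comment: skip up to, but not including, the newline.
      while (i < source.length && source[i] !== '\n') i++;
    } else if (ch === '/' && source[i + 1] === '*') {
      // Block comment: drop its text but keep each newline it contained.
      const end = source.indexOf('*/', i + 2);
      const stop = end === -1 ? source.length : end + 2;
      out += source.slice(i, stop).replace(/[^\n]/g, '');
      i = stop;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}
```

Because block comments collapse to their newlines, a Unicode diagram like the one in lib/timers.js would disappear from the bundled source while the code below it keeps its original line numbers.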
@gibfahn any movement on this?
@hkal I personally prefer the Unicode comment in Otherwise, unless @jasnell or @thefourtheye or any other collaborators disagree, we should be able to move forward on this.
As long as we add the eslint-disable for the timers comment block, I'm ok with this going forward |
@gibfahn wrote
No, I am not opposed to it.
3a7ff16 to aad90cf
@gibfahn Alright, I think we're good here. Please review.
lib/timers.js
@@ -19,6 +19,7 @@
// OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
// USE OR OTHER DEALINGS IN THE SOFTWARE.
Unrelated whitespace change :-)
const { loc } = token;

// Will only report the first non-ascii character per line
const character = matches[0];
Either the comment is misleading, or it works not the way you planned it to. This will report the first non-ASCII character per token, not per line.
This line makes the values in the tokens (couldn't think of a better name) array look like:
{ type: 'Punctuator',
value: ';',
start: 22327,
end: 22328,
loc:
SourceLocation {
start: Position { line: 767, column: 1 },
end: Position { line: 767, column: 2 } },
range: [ 22327, 22328 ] }
{ type: 'Line',
value: ' Copyright Joyent, Inc. and other Node contributors.',
start: 0,
end: 54,
range: [ 0, 54 ],
loc:
{ start: Position { line: 1, column: 0 },
end: Position { line: 1, column: 54 } } }
In the case of type Line, we only report the first occurrence of a non-ASCII character. In my latest iteration I took the comment out, since it didn't really help clarify.
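If the behavior described in the removed code comment (report only the first non-ASCII character per line) were still wanted, one option is to filter the hits by line. This is a minimal sketch: `firstPerLine` is a hypothetical helper, and it assumes each hit is an object with `line` and `column` properties. It sorts first because concatenating code tokens with comment tokens, as the rule does, does not keep them in source order.

```javascript
'use strict';

// Hypothetical helper: keep only the earliest hit on each line.
function firstPerLine(hits) {
  // Sort a copy by position; sourceTokens.concat(commentTokens) is not
  // in source order, so "first" would otherwise be arbitrary.
  const sorted = [...hits].sort((a, b) => a.line - b.line || a.column - b.column);
  const seenLines = new Set();
  return sorted.filter((hit) => {
    if (seenLines.has(hit.line)) return false;
    seenLines.add(hit.line);
    return true;
  });
}
```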
}
});

errors.forEach((error) => {
Why not do this in one pass without the extra errors array? You can make this a named function and call it instead of errors.push().
const { value } = token;
const matches = value.match(nonAsciiPattern);

if (matches) {
IMO, it would be better to flip the condition and return early (if (!matches) return;), reducing the indentation level for the rest of the function.
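Taken together, the two review suggestions above (report through a named function instead of collecting into an errors array, and return early when nothing matches) might look roughly like this. It is a sketch, not the PR's code: `checkTokens` and the `report` callback are hypothetical names (`report` could wrap `context.report()`). It reports the token's start position rather than an exact column, because a comment token's `value` excludes its `//` or `/*` delimiters, so adding the match index to the token's column would be off.

```javascript
'use strict';

const nonAsciiPattern = /[^\x00-\x7F]/;

// Hypothetical helper: `tokens` are ESLint-style tokens/comments that have
// `value` and `loc`; `report` is called once per offending token.
function checkTokens(tokens, report) {
  tokens.forEach((token) => {
    const matches = token.value.match(nonAsciiPattern);
    // Guard clause: pure-ASCII tokens need no further work.
    if (!matches) return;

    // Report immediately; no intermediate errors array is needed.
    report({
      line: token.loc.start.line,
      column: token.loc.start.column,
      character: matches[0],
    });
  });
}
```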
Detects if files in lib/ contain non-ASCII characters and raises a linting error. Also removes non-ASCII characters from lib/console.js comments.
Fixes: nodejs#11209
aad90cf to e7b2a19
As this is an eslint rule addition, cc @not-an-aardvark, @silverwind, @Trott, @mscdex
const commentTokens = source.getAllComments();
const tokens = sourceTokens.concat(commentTokens);

tokens.forEach((token) => {
This will fail to match non-ascii whitespace that could appear between tokens, so it's not quite disallowing all non-ascii characters in files. This could be fixed by matching the regex against source.text rather than against each token.
That said, I think the no-irregular-whitespace rule will cover non-ascii whitespace.
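Following the suggestion to match against source.text, a scan over the whole file text could look like the sketch below. `findNonAscii` is a hypothetical helper; it assumes it is handed the full file text (in an ESLint rule, for example, `context.getSourceCode().text`). The `u` flag makes a character outside the Basic Multilingual Plane match once instead of as two surrogate halves.

```javascript
'use strict';

// Hypothetical helper: returns the offset and value of every non-ASCII
// character in the text, including whitespace that sits between tokens.
function findNonAscii(text) {
  const results = [];
  for (const match of text.matchAll(/[^\x00-\x7F]/gu)) {
    results.push({ index: match.index, character: match[0] });
  }
  return results;
}
```

Scanning the raw text this way also catches a no-break space (U+00A0) between two tokens, which a per-token check never sees.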
If the latter is true, it would be good to have a comment explaining that.
The comment alone would not be sufficient, it is important that the linter ensures there are no irregular Unicode whitespace characters since they cannot be seen during code review.
@aqrln by "If the latter is true", I mean that if @not-an-aardvark's theory ("I think the no-irregular-whitespace rule will cover non-ascii whitespace") is correct, and the non-ASCII whitespace characters are already covered by a separate rule, then we could just rely on that rule for whitespace and add a comment here explaining that we don't need to worry about whitespace because it's covered by another rule.
@gibfahn ah, I see, sorry. I didn't pay enough attention to that "if the latter" part so I didn't understand you right.
// Rule Definition
//------------------------------------------------------------------------------

const nonAsciiPattern = new RegExp('([^\x00-\x7F])', 'g');
It might be clearer to use a regex literal here instead of the RegExp constructor. Right now, \x00 and \x7F are interpreted as part of the string, so the resulting regex pattern actually contains a null character. This still works fine, but it could be confusing for debugging (e.g. if the regex is printed, it will be difficult to tell that it contains a null character).
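The difference shows up when comparing the two patterns' `source` strings. In this sketch (variable names are illustrative), both patterns match the same characters, but only the regex literal keeps `\x00` and `\x7F` as readable escape text.

```javascript
'use strict';

// Inside a *string*, '\x00' and '\x7F' are turned into the actual NUL and
// DEL characters by the JS parser before RegExp ever sees them.
const fromString = new RegExp('([^\x00-\x7F])', 'g');

// A regex literal keeps the escape sequences as plain text in its source.
const fromLiteral = /([^\x00-\x7F])/g;

// Same matching behavior...
console.log('caf\u00e9 na\u00efve'.match(fromString));
console.log('caf\u00e9 na\u00efve'.match(fromLiteral));

// ...but only the string-built pattern has a raw NUL in its source text.
console.log(fromString.source.includes('\u0000'));
console.log(fromLiteral.source.includes('\u0000'));
```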
reportError({
line: loc.start.line,
column,
This could result in an invalid report location if the offending character is in a block comment. For example:
/* foo
■ */
The rule reports an error for this comment at line 1, column 7, but that location doesn't actually exist.
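One way to produce a valid location is to start from the character's absolute offset in the file text and count the newlines before it. This is a sketch: `locFromIndex` is a hypothetical helper returning ESLint's `{ line, column }` shape (1-based line, 0-based column). ESLint's `SourceCode#getLocFromIndex()` performs the same conversion in versions that provide it.

```javascript
'use strict';

// Hypothetical helper: converts a 0-based offset into { line, column }.
// With '\r\n' endings, '\r' simply counts as a column of the previous line.
function locFromIndex(text, index) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < index; i++) {
    if (text[i] === '\n') {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: index - lineStart };
}
```

For the block comment above, the ■ sits at offset 7 of the text, which maps to line 2, column 0 instead of the nonexistent line 1, column 7.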
* @author Kalon Hinds
*/

/* eslint no-control-regex:0 */
I'd prefer to see eslint-disable-line or eslint-disable-next-line used to target the places where the control characters are needed.
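For example, a targeted directive can scope the exemption to the single regex declaration, so the rest of the rule file is still checked by no-control-regex. This is a sketch; `isAsciiOnly` is a hypothetical helper included only to make the snippet usable on its own.

```javascript
'use strict';

// Only the next line is exempt from no-control-regex; the pattern needs
// \x00 to express "any character outside 0x00-0x7F".
// eslint-disable-next-line no-control-regex
const nonAsciiPattern = /[^\x00-\x7F]/;

// Hypothetical helper. No 'g' flag, so test() keeps no lastIndex state
// between calls.
function isAsciiOnly(text) {
  return !nonAsciiPattern.test(text);
}
```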
Still not really a fan.
@hkal would you be so kind as to have a look at the other comments and rebase this?
Closing this due to a long inactivity period. @hkal thanks for your contribution anyway, and please feel free to reopen (or just leave a comment to reopen) if you would like to follow up on this!
Detects if files in lib/ contain non-ASCII characters and
raises a linting error. Also removes non-ASCII characters from
lib/timers.js.
Fixes: #11209
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
tools, lib