This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Re-implementations to fix function and #define related errors. #206

Merged
merged 7 commits into from
Mar 6, 2017

@alpyre alpyre commented Feb 12, 2017

Description of the Change

This PR has major re-implementations to provide fixes for many issues (along with some minor tweaks here and there). Since the issues are related and the fixing implementations are intermingled, it was not possible to cover them in separate PRs.

  1. Function patterns are re-implemented according to the “SOLUTION” in issue #189 (“Scope floods because of macros being matched as functions”):
    meta.function scopes are now captured in two steps:

    • to avoid unnecessary nesting of meta.function scopes when function calls (or func. like macros) are nested,
    • to provide a meta.function scope that even supports the injection of support-function names,
    • to provide balanced function parentheses.
  2. The #define pattern is re-implemented:

    • to provide fully featured highlighting of the definition part of the line,
    • to avoid scope flood if the coder uses “misnesting” (see: Misnesting).
  3. Unnecessary nesting of section scopes (like meta.parens, meta.block, meta.function) is prevented. Fixes #113 (“Fix unnecessary nesting of meta.block.c”).

  4. All parenthesis patterns are re-implemented to provide completely balanced parenthesis tokenization everywhere in code. Fixes #103 (“Punctuation improvments”).

  5. And some minor changes like:

    • Highlighting the GNU-C keyword #include_next.
    • #pragma and #pragma-mark lines are now named as meta.preprocessor.pragma.c so an injecting grammar may choose to inject only into them or exclude them (as they exclude “strings” and “comments” when necessary).
    • Hyphen characters are included in the #pragma identifier.
  6. New specs added for new features. Old specs are corrected to reflect changes.
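
The constructs that items 1 and 4 deal with can be illustrated with a short C fragment. This is only a sketch for trying the grammar on: the names MAX, CLAMP, add and clamp_sum are made up for illustration, and the comments paraphrase the behaviour described above rather than quoting real tokenizer output.

```c
/* Function-like macros: easy for a grammar to mistake for functions. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define CLAMP(x, lo, hi) MAX((lo), ((x) < (hi) ? (x) : (hi)))

/* A plain function definition. */
static int add(int a, int b)
{
    return a + b;
}

/* Nested calls and nested macro uses on one line. Per the description,
   these should not pile up meta.function scopes, and every parenthesis
   should be tokenized as balanced punctuation. */
int clamp_sum(int a, int b, int lo, int hi)
{
    return CLAMP(add(a, MAX(b, 0)), lo, hi);
}
```

Opening such a fragment in the editor and inspecting the scopes at the cursor is one way to compare behaviour before and after this change.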

Alternate Designs

  • Since we now have very smart parenthesis tokenization we can catch wayward close parentheses and highlight them as “invalid” (if we wish to).

  • (See Possible Drawbacks) By adding a specific pattern for function implementation blocks (right before the current function pattern) we can get back the previous behavior (without any drawbacks from the current one) if found necessary.

  • The implementation that prevents unnecessary nesting of section scopes does not cover ‘blocks in parentheses’ (which is a rare usage). The occurrence of such a pattern does not break any highlighting features; only an inner meta.parens scope will repeat in the scopes array. Covering that would complicate the code a lot (it would require many new rules), so it is not done in this PR.
    NOTE: As an alternative idea, preventing unnecessary nesting of section scopes could be implemented in the Atom engine (an issue may be opened there). When such a feature is available we may consider simplifying this grammar.

Benefits

  • Fully highlights the definition part of the #define lines.
  • Fixes many currently open issues. (See Applicable Issues)
    (Also see Description of the Change)

Possible Drawbacks

  • With this new function implementation, function implementation blocks are no longer scoped in meta.function.c (this was discussed in #189, “Scope floods because of macros being matched as functions”). The older behaviour had no benefits, yet it can still be restored if needed (see Alternate Designs).

  • The scope whitespace.function.leading (and similar ones) is not tokenized anymore. This seems to have no side effects.

Applicable Issues

Fixes #211, fixes #209, fixes #203, fixes #199, fixes #192, fixes #189, fixes #186, fixes #185, fixes #180, fixes #173, fixes #163, fixes #160, fixes #155, fixes #152, fixes #124, fixes #119, fixes #113, fixes #111, fixes #103, fixes #100, fixes #90, fixes #78, fixes #75, fixes #31

These issues are related (but not fixed):
#101, #23

Visual Demonstrations

Effects of the changes in function patterns:
[screenshot: pr2]

Effects of the changes in #define patterns:
[screenshot: pr1]

- re-implement function patterns
- re-implement #define patterns
- prevent unnecessary nesting of section scopes
- tokenize all parentheses
- highlight #include_next
- introduce meta.preprocessor.pragma.c scope
- include hyphens in #pragma identifiers
- create specs for the changes
@alpyre alpyre changed the title Changes for PR #206 Re-implementations to fix function and #define related errors. Feb 12, 2017
- Included Access patterns for functions in blocks.
- Moved the storage_types patterns into repository to provide modularization.

winstliu commented Feb 12, 2017

I'm looking forward to reviewing this. I recently toyed around a bit with redoing the C grammar as well and had tons of success with just changing how functions and function calls were captured (albeit losing C++ support along the way), so I'll see if I can incorporate some of what I learned into this.

(I've also edited your PR description so that all the fixed issues are automatically closed when this is merged)

Highlight the comments inside function sections.

alpyre commented Feb 25, 2017

Hi there @50Wliu ,
I've figured out that vararg ellipses ... are tokenized as punctuation.separator.dot-access.c which is not right, and this occasionally breaks other highlighting features (especially, when they are used in #define lines).

I know that "varargs" is not a built-in c feature and you may think it would be more appropriate for them to be handled in c++.cson, but on the other hand c programmers have for many years been using some magic macros for it (which are part of many standard link libraries and APIs, including GNU-C). So it is better to fix it in c.cson.

I have a solution for this situation and I want to fix it in c.cson if you approve.
...and since it breaks tokenization especially in #define lines, it is highly related to this PR, so I want to fix it in this PR if you again approve.
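
For reference, a minimal sketch of where the ellipsis shows up in C source: once in a variadic function that uses the standard <stdarg.h> macros, and once on a #define line as a C99 variadic macro. The names sum_ints and SUM_THREE are hypothetical and only serve as test input for the grammar.

```c
#include <stdarg.h>

/* Variadic function: the "..." is the vararg ellipsis, not dot access. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

/* C99 variadic macro: here the ellipsis appears on a #define line. */
#define SUM_THREE(...) sum_ints(3, __VA_ARGS__)
```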

By the way, how is it going with reviewing this PR? I have a new API highlighter package which is heavily dependent on the features in this PR waiting to be published.
Regards.

@winstliu

Regarding varargs: go for it.

Regarding the review process: I'm hoping to get started this weekend. It might take a while.


winstliu commented Feb 25, 2017

Ok. I briefly looked this over. The biggest thing that stood out to me was the excessive effort required in removing the nesting of meta.block and meta.parens. Instead of doing that, you can just straight out remove those scopes, which should reduce the complexity of this PR by a lot. I can't think of any reasonable use for them.

grammars/c.cson Outdated
@@ -380,38 +359,48 @@
'name': 'meta.initialization.c'
}
{
'include': '#block'
# Prevent unnecessary nesting of meta.block.c scope
'begin': '\\{'

Curly braces don't need to be escaped

grammars/c.cson Outdated
'begin': '\\{'
'beginCaptures':
'0':
'name': 'punctuation.section.block.begin.c'

Include .bracket.curly after the begin. Same with end below.


alpyre commented Feb 25, 2017

The use for meta.xxx section scopes is in auto-complete suggestions. The suggestion providers take the scopes at cursor as an argument, and are supposed to return their suggestions regarding those scopes.

As an example, a suggestion provider may suggest specifically function arguments when the cursor is in meta.function.

So their presence is quite important actually.
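
The decision described here can be sketched in a few lines. Atom providers are written in JavaScript, so the C below is not provider code; it only illustrates the check under the assumption that the scopes at the cursor arrive as a list of strings. The helper names has_scope and should_suggest_arguments are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the scope list at the cursor contains `wanted`. */
static bool has_scope(const char *const scopes[], size_t count,
                      const char *wanted)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(scopes[i], wanted) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical policy: suggest function arguments only when the cursor
   is inside a meta.function.c scope. */
bool should_suggest_arguments(const char *const scopes[], size_t count)
{
    return has_scope(scopes, count, "meta.function.c");
}
```

A check like this only works if the section scope is present in the list at all, which is the point being made about keeping the meta.xxx scopes.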

- Implements tokenization for vararg ellipses.
- Unnecessary escape characters in curly bracket regexes removed.
- punctuation.section.block.XXXX.c scopes are renamed as punctuation.section.block.XXXX.bracket.curly.c
- New spec added for vararg ellipses, and failing specs because of the above two changes fixed.
@winstliu

Yes, I recognize that. However, meta.block and meta.parens are so generic that there is no information gained from them. I'm not saying to get rid of the other meta scopes. Like you mentioned, most of the other ones do contain useful information, like meta.function.

Unless you can provide a counterexample, of course.


alpyre commented Feb 25, 2017

meta.block is again important for auto-complete suggestions.

For example, the provider may suggest support function snippets when the cursor is inside a block, and not when it is outside. You wouldn't want to be suggested a function call when outside any function block. It would be annoying.

However for meta.parens you may be quite right. But since we did the other two so far, why not keep meta.parens as well? The concept was there before me and maybe some community package somewhere is utilising it! So removing it may break someone's package out there. Who knows?

@winstliu

@alpyre Could you give a code snippet please? Maybe I'm just too biased to be able to think of something myself 🙂.


alpyre commented Feb 27, 2017

This is the way I used meta.function.c and meta.function-call.c scopes in my (yet unpublished) package.
https://github.com/alpyre/language-amigaos-c/blob/master/lib/autocomplete-provider.js#L19-L30

And I was planning to add a check for meta.block.c as well. (If we decide here to draw back from it, I'll have to find another way to detect it in my auto-complete provider indeed).

And finally, for the meta.parens.c scope, I have no idea whether anybody else has used it for anything. I took a quick look at the Atom package repos and couldn't find any. It seems safe to remove it, but I'm not 100 percent sure it won't break someone else's code. The decision is yours.


winstliu commented Feb 27, 2017

Ok. So my comment here is going to be based off of language-java, which is very verbose with its meta scopes.

Instead of using a generic #block for everything, language-java captures the curly braces in each pattern and assigns it its own meta scope. For example, here's the meta scopes of the following code snippet:

public class Example                     // meta.class.java
{                                        // meta.class.java meta.class.body.java
  public static void main(String[] args) // meta.class.java meta.class.body.java meta.method.java
  {                                      // meta.class.java meta.class.body.java meta.method.java meta.method.body.java
                                         // meta.class.java meta.class.body.java meta.method.java meta.method.body.java
  }                                      // meta.class.java meta.class.body.java
}                                        // none

Basically, meta.*.java starts at the definition and includes the closing bracket, and meta.*.body.java is everything between the two brackets.

I think this method gives us finer-grained control, and takes care of your use case. Is this something you think can be adapted to language-c?


alpyre commented Feb 28, 2017

I believe it can be (and should be) adapted to language-c.
Yet, bear in mind that it won't simplify the code in any way; instead it will add many new rules.

According to the C curriculum, there are only two major types of curly bracket use:
One is the declaration of an array of declarations, like:

  • Arrays: int prime[5] = { 2, 3, 5, 7, 11 };
  • Enums: enum Days { SUN, MON, TUE, WED, THR, FRI, SAT };
  • Structs: struct Employee { short id; int age; double wage; };
  • Classes: very similar to structs but can also have function declarations inside
    (this is C++ only and hopefully won't complicate the c.cson any further)

The other is local blocks.
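
Both kinds of brace use can be put side by side in one compilable fragment. This is a sketch for illustration only: the declarations are the ones listed above, while the function count_small_primes is made up to show a function body with local blocks nested inside it.

```c
/* Braces that enclose a list of values or declarations. */
int prime[5] = { 2, 3, 5, 7, 11 };
enum Days { SUN, MON, TUE, WED, THR, FRI, SAT };
struct Employee { short id; int age; double wage; };

/* Braces that open local blocks: the outermost one is the function
   body, the inner ones are blocks nested inside it. */
int count_small_primes(int limit)
{
    int count = 0;
    for (int i = 0; i < 5; i++) {   /* inner local block */
        if (prime[i] < limit) {     /* block inside a block */
            count++;
        }
    }
    return count;
}
```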

I highly support the idea. It will have great benefits for auto-complete providers.
On the other hand, it will introduce new meta scopes and complicate the code even further (since these new ones can be nested in one another: imagine a struct declaration inside another struct within a class declaration inside an enclosing class declaration. We will again have to take care for the inner ones not to infest the scopes array).

Also the implementation for it will not require the removal of anything implemented in this PR (there will still be i.e. local blocks inside local blocks, which must be handled the same way here). It will just add some additional code. So we can keep this idea to be implemented in a next PR.

Currently the only thing that will simplify this PR is the removal of meta.parens scope completely.

We should decide on that.
I don't know why it was introduced in the first place. Was it arbitrary? To me it has no use, and I cannot think of any use for it either.

I'd be happy without it and can remove it with your decision now.

@winstliu

> We will again have to take care for the inner ones not to infest the scopes array

Let's not worry about this for now. I don't think local blocks need their own meta scope, do you agree with that?

> Currently the only thing that will simplify this PR is the removal of meta.parens scope completely.

💯 Go for it.

Also, maybe those meta changes won't simplify the PR per se, but it may make the end result easier to understand :).


alpyre commented Feb 28, 2017

> I don't think local blocks need their own meta scope, do you agree with that?

Inner local blocks don't. The outermost ones do, because they indicate that the coder is inside a function definition block, so that an autocomplete provider can behave according to that (the current implementation in this PR is just so).


winstliu commented Mar 1, 2017

Ok, let's try that and see where it goes.

- Removes all implementations for the tokenization of the section scope meta.parens.c together with all the code tricks to prevent unnecessary nesting of it in the scopes array.
- Fix the failing specs because of this drawback.

alpyre commented Mar 1, 2017

Now, before you do the merge, I'd like to make a subtle change in my #define line patterns.
Since there is no meta.parens.c anymore and misnesting is allowed, it seems to me that it is unnecessary to couple opening and closing parentheses in #define lines. We only need to match parentheses and name them as punctuation.

This will save the code from yet another rule and also solve the ‘blocks in parentheses’ issue mentioned in Alternate Designs (only for #define lines of course; other exceptions will very probably be covered when we create those verbose block scopes).


winstliu commented Mar 1, 2017

@alpyre can you clarify on how misnesting is allowed?


alpyre commented Mar 1, 2017

Before including $base, this PR implements special patterns for quoted strings, blocks, parens and functions for the definition part of #define lines, which terminate on a new line without a line continuation char \. This way the coder can use misnesting without causing a flood of section scopes.

We can now simplify it by removing the special parens pattern.
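
A sketch of what such code can look like, assuming “misnesting” here means macro bodies whose braces or parentheses are unbalanced on the #define line and only pair up after expansion. All macro and function names are hypothetical.

```c
/* Each definition leaves a brace or parenthesis open (or closes one
   that was never opened) on its own line. */
#define BEGIN_BLOCK {
#define END_BLOCK   }
#define CALL_WITH(f) f(
#define END_CALL     )

static int twice(int x)
{
    return 2 * x;
}

/* After preprocessing this is an ordinary function:
   int misnesting_demo(int x) { int result = twice( x ); return result; } */
int misnesting_demo(int x)
BEGIN_BLOCK
    int result = CALL_WITH(twice) x END_CALL;
    return result;
END_BLOCK
```

If a begin/end rule started on one of these #define lines were allowed to run past the end of the line, its scope would spill over the rest of the file, which is the flood the special patterns are meant to avoid.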

Simplifications in #define line rules according to the absence of meta.parens.c scope.

alpyre commented Mar 2, 2017

Ok. I've done the very final simplification. I believe we're ok for the merge for this PR.

Would you please create an issue for verbose meta.block scopes like in language-java,
so we don't forget the next step we should take?

grammars/c.cson Outdated
'endCaptures':
'0':
'name': 'punctuation.section.block.end.bracket.curly.c'
#'name': 'meta.block.c'

🔥

grammars/c.cson Outdated
]
}
{
# Capture paranthesis'

This comment isn't needed

grammars/c.cson Outdated
'endCaptures':
'0':
'name': 'punctuation.section.block.end.bracket.curly.c'
#'name': 'meta.block.c' <-- DO NOT NAME THE SECTION THIS TIME!

You can safely get rid of this comment.

grammars/c.cson Outdated
'include': '#parens'
}
{
# NEW FUNCTION/MACRO IMPLEMENTATION

You can get rid of the NEW FUNCTION comments.

grammars/c.cson Outdated
}
{
'include': '#block'
# NOW CONSUME TOKENS

🔥

grammars/c.cson Outdated
{
'include': '#block'
# NOW CONSUME TOKENS
'include': '#meta-function-scope-innards'

This is named function_innards in language-javascript. Since it seems like we're following a dashed convention for this language, could you rename this to function-innards? I think that still gets the point across.

grammars/c.cson Outdated
'patterns': [
{
# NOW CONSUME TOKENS
'include': '#meta-function-call-scope-innards'

Rename to function-calls

grammars/c.cson Outdated
'1':
'name': 'entity.name.function.c'
'2':
'name': 'punctuation.section.parameters.begin.c'

arguments for function-calls

grammars/c.cson Outdated
'name': 'meta.function-call.c'
'patterns': [
{
# NOW CONSUME TOKENS

🔥


winstliu commented Mar 6, 2017

Additional comments:

  • It's spelled "parentheses" (which is plural)
  • Add .bracket.round scopes to the parentheses, similar to what was done for the curly brackets
  • Have you made sure that C storage types still work correctly in C++?

After that, I think it'll be ready for merge.


alpyre commented Mar 6, 2017

  • 👍
  • You mean like: punctuation.section.parens.begin.bracket.round.c ?
  • They will work, but not as precisely as C ones at the moment. Solution to that should be implemented in C++.cson (which is my next objective)...

- Typo fixes.
- Removal of unnecessary comments
- Addition of .bracket.round to scope names of all parentheses.

winstliu commented Mar 6, 2017

Yes.

Edit: Oh, I see you already have :)


@winstliu winstliu left a comment

Just waiting on CI.

@winstliu winstliu merged commit 9abcae3 into atom:master Mar 6, 2017

winstliu commented Mar 6, 2017

Thanks @alpyre!


sean-mcmanus commented Mar 29, 2017

This change appears to have regressed this scenario #218 (#define with no definition). Could someone fix it?

#ifndef _UCRT
#define _UCRT
#endif

[screenshot]


alpyre commented Mar 29, 2017

@sean-mcmanus: I cannot reproduce this in either C or C++. Are you sure you are on the latest version of the language-c grammar (we're at 0.57.0)?

As far as I know, with the way I've implemented the #define patterns such a behaviour is impossible. Anyway, it looks like this on my Atom:
[screenshot: nobugs]

I see the _UCRT words are painted yellow on your screenshot. That makes me think there is something other than the language-c grammar going on there.


sean-mcmanus commented Mar 30, 2017

VS Code is reporting 0.51.3, but I think they just forgot to update the version, because 1.10.2 of VS Code has the same number and doesn't repro the bug and the code is different. I'll ask the VS Code people to take a look at fixing this: microsoft/vscode#23630 .
