Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add case insensitive comparison, besides Levenstein for DYM #46347

Merged
merged 3 commits into from
Dec 2, 2017
Merged

Add case insensitive comparison, besides Levenstein for DYM #46347

merged 3 commits into from
Dec 2, 2017

Conversation

raventid
Copy link
Contributor

Closes #46332

Draft version. The idea is that Levenstein does not work for some cases when we have multiple equal weights for strings. I didn't understand the case with if found != name => Some(found) so it means that new code does not work correctly yet.

At least now I think that we might return all maximal weights from levenstein and think about next cases in priority order:

  1. There is exact match -> None
  2. There is exact match, but case insensitive -> Some(match)
  3. There is some match from levenstein -> Some(matches.take_any)
  4. There is no match -> None

@estebank WDYT?

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @pnkfelix (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@@ -0,0 +1,14 @@
// Original problem is Levenstein distance.

fn TyUint() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(An enum variant rather than a function is closer to the spirit of the example in the issue (which isn't to say that we shouldn't test functions, too, but those are snake_case_in_standard_style, unlike TyUint).)

@@ -0,0 +1,14 @@
// Original problem is Levenstein distance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue-46332.rs (with the hyphen) as a file name would conform more with most of the other test file names. The license also needs to be at the top of the file (the build system will check).



// Ugly code, just to see if using case insensitive comparison will help
let exact_match = names.iter().find(|x| x.as_str().to_uppercase() == name.as_str().to_uppercase());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case_insensitive_match would be a less-misleading name.

Some(found) => Some(found.clone()),
_ => None,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less nesting:

if case_insensitive_match.is_some() {
    return case_insensitive_match.cloned()
}

_ => None,
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that we should likely be making this change as a "tiebreaker" in syntax::util::lev_distance::find_best_match_for_name itself: the only time case-insensitive match is going to do better than minimizing Levenshtein distance is when there's a tie for the minimum and the method makes an unlucky choice among the minima, as in the motivating example (TyInt and TyUint are both Levenshtein-distance 1 away from TyUInt).

@petrochenkov
Copy link
Contributor

I didn't understand the case with if found != name

It looks very silly when you write foo and compiler tells you "foo is wrong, use foo instead".
The candidate search isn't precise so it can sometimes find things that have the same name (perfect match from Levenshtein's distance point of view!) but are still inaccessible, so we need to filter them away.

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 29, 2017
Copy link
Contributor

@estebank estebank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the comments above.

Some(found) => Some(found.clone()),
_ => None,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code can be written as

if let Some(found) = exact_match {
    return Some(found.clone());
}



// Ugly code, just to see if using case insensitive comparison will help
let exact_match = names.iter().find(|x| x.as_str().to_uppercase() == name.as_str().to_uppercase());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the following to account for @petrochenkov's comment

names.iter().find(|x| {
    x.as_str().to_uppercase() == name.as_str().to_uppercase() && x.as_str() != name.as_str()
})


fn main() {
TyUInt();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the .stderr file in this PR.

struct TyInt {}

fn main() {
TyUInt {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add a comment below this to test for errors that do not care for the ui output:

//~^ ERROR cannot find struct, variant or union type `TyUInt` in this scope

The lack of this is why the test failed. It was made mandatory recently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Fixed the build.

@raventid raventid changed the title WIP: Add case insensitive comparison, besides Levenstein for DYM Add case insensitive comparison, besides Levenstein for DYM Dec 1, 2017
@estebank
Copy link
Contributor

estebank commented Dec 1, 2017

@bors r+

@bors
Copy link
Contributor

bors commented Dec 1, 2017

📌 Commit f18446e has been approved by estebank

@petrochenkov
Copy link
Contributor

An alternative would be to adjust the distance formula to incorporate case differences, e.g. insertion/deletion/arbitrary substitution would still add 1 to the distance, but different case substitution would add 0.5.

@bors
Copy link
Contributor

bors commented Dec 2, 2017

⌛ Testing commit f18446e with merge 7a64987...

bors added a commit that referenced this pull request Dec 2, 2017
…ebank

Add case insensitive comparison, besides Levenstein for DYM

Closes #46332

Draft version. The idea is that Levenstein does not work for some cases when we have multiple equal weights for strings. I didn't understand the case with `if found != name => Some(found)` so it means that new code does not work correctly yet.

At least now I think that we might return all maximal weights from levenstein and think about next cases in priority order:

1) There is exact match -> None
2) There is exact match, but case insensitive -> Some(match)
3) There is some match from levenstein -> Some(matches.take_any)
4) There is no match -> None

@estebank WDYT?
@bors
Copy link
Contributor

bors commented Dec 2, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: estebank
Pushing 7a64987 to master...

@bors bors merged commit f18446e into rust-lang:master Dec 2, 2017
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Oct 18, 2021
…match-names, r=estebank

Suggest a case insensitive match name regardless of levenshtein distance

Fixes rust-lang#86170

Currently, `find_best_match_for_name` only returns a case insensitive match name depending on a Levenshtein distance. It's a bit unfortunate that that hides some suggestions for typos like `Bar` -> `BAR`. That idea is from rust-lang#46347 (comment), but I think it still makes some sense to show a candidate when we find a case insensitive match name as it's more like a typo.
Skipped the `candidate != lookup` check because the current (i.e, `levenshtein_match`) returns the exact same `Symbol` anyway but it doesn't seem to confuse anything on UI tests.

r? `@estebank`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Oct 19, 2021
…match-names, r=estebank

Suggest a case insensitive match name regardless of levenshtein distance

Fixes rust-lang#86170

Currently, `find_best_match_for_name` only returns a case insensitive match name depending on a Levenshtein distance. It's a bit unfortunate that that hides some suggestions for typos like `Bar` -> `BAR`. That idea is from rust-lang#46347 (comment), but I think it still makes some sense to show a candidate when we find a case insensitive match name as it's more like a typo.
Skipped the `candidate != lookup` check because the current (i.e, `levenshtein_match`) returns the exact same `Symbol` anyway but it doesn't seem to confuse anything on UI tests.

r? ``@estebank``
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants