Skip to content

Conversation

@burakkose
Copy link
Contributor

@burakkose burakkose commented May 6, 2016

What changes were proposed in this pull request?

  • add locale support for the case insensitive operation
  • add a new Python test
  • change filter to filterNot

How was this patch tested?

unit tests

* Default: English locale ("en")
* @group param
*/
val locale: Param[String] = new Param[String](this, "locale",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, shouldn't all this perhaps be linked to the stopwords set? if you loaded the French stopwords you'd want the French locale always?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but, How can we know that users loaded the French stopwords? User can load stopwords by
StopWordsRemover.loadDefaultStopWords("french")
and setting is
new StopWordsRemover().setStopWords(stopWords)
. Do you have any suggestion about that case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For supported languages, we can know the appropriate locale and maintain an internal mapping. So "french" is known to map to Locale.FRENCH. For loading an arbitrary list, we don't know, but you could provide an overload where you provide a Locale.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented May 7, 2016

(@burakkose I think the cc:@mengxr can be left in comments not in the PR description because I guess cc for someone to review may not be the part of the PR itself. @ will be removed by merging script anyway but cc: and mengxr in the PR summary. I am a bit careful about this because it seems (I was told) some committers agree with this and others do not. but strictly I think it is right.)

@burakkose
Copy link
Contributor Author

@HyukjinKwon, thank you for informing. Yes, you're right.

setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
/**
* Locale for doing a case sensitive comparison
* Default: English locale ("en")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we list what're the available options, or provide some reference here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's done

@hhbyyh
Copy link
Contributor

hhbyyh commented May 9, 2016

Made a pass. That's all from me.

@hhbyyh
Copy link
Contributor

hhbyyh commented May 13, 2016

This is blocking user guide /examples update for 2.0.

burakkose added 2 commits May 13, 2016 09:28
# Conflicts:
#	mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala
@burakkose
Copy link
Contributor Author

Can you specify the blocking?

} else {
// TODO: support user locale (SPARK-15064)
val toLower = (s: String) => if (s != null) s.toLowerCase else s
val loadedLocale = StopWordsRemover.loadLocale($(locale))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just new Locale($(locale))

@hhbyyh
Copy link
Contributor

hhbyyh commented May 23, 2016

I'm not sure if this will be shipped with Spark 2.0. If yes, we should update user guide accordingly.

/**
* Locale for doing a case sensitive comparison
* Default: English locale ("en")
* @see [[http://www.localeplanet.com/java/]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please link to the official Java doc: https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html or the Locale class.

@holdenk
Copy link
Contributor

holdenk commented Oct 7, 2016

@burakkose is this something you are still working on? If so can you update it to master and look at @mengxr's comments - if not interested in working on it anymore no worries.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

srowen added a commit to srowen/spark that referenced this pull request Jan 1, 2017
@srowen srowen mentioned this pull request Jan 1, 2017
@asfgit asfgit closed this in ba48812 Jan 2, 2017
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#12968
Closes apache#16215
Closes apache#16212
Closes apache#16086
Closes apache#15713
Closes apache#16413
Closes apache#16396

Author: Sean Owen <[email protected]>

Closes apache#16447 from srowen/CloseStalePRs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants