Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

add doc for string indexer ordering

How was this patch tested?

existing tests

zhengruifeng3 and others added 3 commits July 17, 2018 18:43
@SparkQA
Copy link

SparkQA commented Jul 17, 2018

Test build #93169 has finished for PR 21792 at commit febd66f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 17, 2018

Test build #93170 has finished for PR 21792 at commit 2003324.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jul 17, 2018

My only question here is whether it's worth duplicating this documentation from the Python and R docs. It's more detailed info for callers of the API, and might just belong in the API doc.

@zhengruifeng
Copy link
Contributor Author

@srowen I think we need to update the docs
1, Current doc in StringIndexer is somewhat misleading: "The indices are in [0, numLabels), ordered by label frequencies, so the most frequent label gets index 0." this is true only with default ordering type.
2, In RFormula, stringOrderType only affect feature columns, not label column. This need to be emphasised, which is somewhat out of expectation.

@MLnick your thoughts?

@srowen
Copy link
Member

srowen commented Jul 21, 2018

Merged to master

@asfgit asfgit closed this in 81af886 Jul 21, 2018
@zhengruifeng zhengruifeng deleted the doc_string_indexer_ordering branch August 21, 2019 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants