@huaxingao
Contributor

What changes were proposed in this pull request?

Add single-column input/output support to OneHotEncoder.

Why are the changes needed?

Currently, OneHotEncoder supports only multiple input/output columns. It makes sense to support a single column as well.

Does this PR introduce any user-facing change?

Yes. It adds two setters:
OneHotEncoder.setInputCol
OneHotEncoder.setOutputCol
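
As a rough illustration of how a single-column pair can sit alongside the existing multi-column params, the sketch below normalizes both styles to a list of (input, output) pairs. This is not Spark's implementation: `resolve_in_out_cols`, its parameter names, and the assumption that the two styles are mutually exclusive are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_in_out_cols(
    input_col: Optional[str] = None,
    output_col: Optional[str] = None,
    input_cols: Optional[Sequence[str]] = None,
    output_cols: Optional[Sequence[str]] = None,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: normalize single- or multi-column params
    to a list of (input, output) column-name pairs."""
    single_set = input_col is not None or output_col is not None
    multi_set = input_cols is not None or output_cols is not None

    if not single_set and not multi_set:
        raise ValueError("No input/output columns set.")

    # Assumption for this sketch: the two styles cannot be mixed.
    if single_set and multi_set:
        raise ValueError(
            "Set either inputCol/outputCol or inputCols/outputCols, not both."
        )

    if single_set:
        if input_col is None or output_col is None:
            raise ValueError("inputCol and outputCol must be set together.")
        # A single column is treated as a one-element multi-column request.
        return [(input_col, output_col)]

    if input_cols is None or output_cols is None:
        raise ValueError("inputCols and outputCols must be set together.")
    if len(input_cols) != len(output_cols):
        raise ValueError("inputCols and outputCols must have the same length.")
    return list(zip(input_cols, output_cols))
```

Under these assumptions, `resolve_in_out_cols(input_col="a", output_col="b")` and `resolve_in_out_cols(input_cols=["a"], output_cols=["b"])` yield the same pair list, so the single-column form is a convenience over a one-element list rather than a different behavior.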

How was this patch tested?

Unit test

@huaxingao huaxingao changed the title from "[SPARK-29565][ML][PYTHON]" to "[SPARK-29565][ML][PYTHON] OneHotEncoder should support single-column input/output" Oct 26, 2019
@SparkQA

SparkQA commented Oct 26, 2019

Test build #112695 has finished for PR 26265 at commit 685b7e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _OneHotEncoderParams(HasInputCol, HasInputCols, HasOutputCol, HasOutputCols,

Member

@srowen srowen left a comment

It seems OK for consistency. However, I guess you can already specify just one input column? And is it probably more usual to one-hot encode all categorical columns at once? I don't mind either way: making it consistent, or arguing this is a bit different and leaving it.

@huaxingao
Contributor Author

@srowen Thanks for your comment. I prefer to add single-column support for the consistency and completeness of the APIs. Also, when one-hot encoding a single column, it's simpler and more convenient to use an API that takes one input column.

@zhengruifeng
Contributor

@huaxingao You need to rebase this PR.

@SparkQA

SparkQA commented Oct 29, 2019

Test build #112820 has finished for PR 26265 at commit f3fa47d.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class _OneHotEncoderParams(HasInputCol, HasInputCols, HasOutputCol, HasOutputCols,

@SparkQA

SparkQA commented Oct 29, 2019

Test build #112821 has finished for PR 26265 at commit 9d5d6ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Oct 29, 2019

Thanks! Merging to master.

@viirya viirya closed this in 37690de Oct 29, 2019
@huaxingao
Contributor Author

Thanks! @viirya @srowen @zhengruifeng
