Skip to content

Conversation

@jiayue-zhang
Copy link
Contributor

What changes were proposed in this pull request?

SPARK-14712
spark.mllib LogisticRegressionModel overrides toString to print a little model info. We should do the same in spark.ml and override repr in pyspark.

How was this patch tested?

LogisticRegressionSuite.scala
Python doctest in pyspark.ml.classification.py

@jiayue-zhang
Copy link
Contributor Author

jiayue-zhang commented Aug 3, 2017

Hi @holdenk , I'm opening this PR to continue the effort in #12491
When adding doctest, I noticed that the uid part of the string is always a random UUID, for example LogisticRegression_42a29165218fb4d4f81d22, generated in Scala even after I call _resetUid. So I re-construct the __repr__ string in Python code.
Really appreciate it if @yanboliang @jkbradley can also take a look.

@holdenk
Copy link
Contributor

holdenk commented Sep 6, 2017

On the Python side using repr looks reasonable (although I would like to see a doctest for this). But we really should get @jkbradley or maybe @dbtsai to take a look on the ML side.

Jenkins ok to test.

Sorry for the delay on getting to this.

@SparkQA
Copy link

SparkQA commented May 18, 2018

Test build #4183 has finished for PR 18826 at commit a718694.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 22, 2018

Test build #4187 has finished for PR 18826 at commit a718694.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@jiayue-zhang
Copy link
Contributor Author

This PR recently got tested so it draws my attention. Is this something we want to proceed? @holdenk @yanboliang @jkbradley @dbtsai I don't see how the test failures relate to LogisticRegression. Might it be some other flaky tests? Also tagging @HyukjinKwon who is active in maintaining stale PRs.

@HyukjinKwon
Copy link
Member

@bravo-zhang, mind if I ask to rebase it and see if the tests pass? BTW, let's fix the PR title to link the JIRA.

@jiayue-zhang jiayue-zhang changed the title LogisticRegressionModel.toString should summarize model [SPARK-14712][ML] LogisticRegressionModel.toString should summarize model Jun 1, 2018
@jiayue-zhang
Copy link
Contributor Author

@HyukjinKwon It's ready to test.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jun 7, 2018

Test build #91531 has finished for PR 18826 at commit 96d9430.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiayue-zhang
Copy link
Contributor Author

@HyukjinKwon Can you recommend someone to take a look at this PR or maybe you can take a look?

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

I think you cc'ed right ones ..

@SparkQA
Copy link

SparkQA commented Jun 21, 2018

Test build #92157 has finished for PR 18826 at commit 96d9430.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not noticing this earlier, looks really good - quick question on the Python side though. Would love to get this in :)

java_blr_summary = self._call_java("evaluate", dataset)
return BinaryLogisticRegressionSummary(java_blr_summary)

def __repr__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my question here is why we aren't calling the Java/Scala toString method directly as we do in the mllib one and in many of the other models in regression.py for the ml one?

@SparkQA
Copy link

SparkQA commented Jun 23, 2018

Test build #92240 has finished for PR 18826 at commit 189acfb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiayue-zhang
Copy link
Contributor Author

@holdenk @HyukjinKwon I updated __repr__ by calling toString. I also added class name LogisticRegressionModel: to toString. Otherwise uid alone is a bit confusing.

@holdenk
Copy link
Contributor

holdenk commented Jun 28, 2018

LGTM merging to master.

@holdenk
Copy link
Contributor

holdenk commented Jun 28, 2018

Thanks for the improvement :)

@asfgit asfgit closed this in 524827f Jun 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants