Skip to content

Conversation

@viirya
Copy link
Member

@viirya viirya commented May 3, 2016

What changes were proposed in this pull request?

Move the VectorUDT and MatrixUDT in PySpark to new ML package.

How was this patch tested?

Existing tests.

@SparkQA
Copy link

SparkQA commented May 3, 2016

Test build #57642 has finished for PR 12870 at commit 24ec53e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

@viirya Thanks for this PR. I'll make a few high-level comments.

We need to keep the old pyspark.mllib.linalg APIs. We also want to freeze the pyspark.mllib.linalg APIs, which means we need to keep that old code (not just alias it).

Can you please modify the PR to do the following:

  • Keep the pyspark.mllib.linalg code
  • Copy the linalg code to pyspark.ml.linalg, including unit tests
  • Update the pyspark.ml.linalg code as needed to handle UDTs and make tests run

Thanks!

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57723 has finished for PR 12870 at commit b504a60.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class VectorUDT(UserDefinedType):
    • class MatrixUDT(UserDefinedType):
    • class Vector(object):
    • class DenseVector(Vector):
    • class SparseVector(Vector):
    • class Vectors(object):
    • class Matrix(object):
    • class DenseMatrix(Matrix):
    • class SparseMatrix(Matrix):
    • class Matrices(object):
    • class QRDecomposition(object):

@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57725 has finished for PR 12870 at commit 7314444.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 4, 2016

@jkbradley Thanks for your comments. I want to make sure that you said we will freeze the pyspark.mllib.linalg APIs, so do I need to change the current PySpark ml/mllib codes which use pyspark.mllib.linalg APIs to use new pyspark.ml.linalg?

I just did that. So if you mean we don't touch that, I need to revert it back. Thanks.

@viirya viirya changed the title [SPARK-14906][ML][WIP] Move VectorUDT and MatrixUDT in PySpark to new ML package [SPARK-14906][ML] Move VectorUDT and MatrixUDT in PySpark to new ML package May 4, 2016
@SparkQA
Copy link

SparkQA commented May 4, 2016

Test build #57728 has finished for PR 12870 at commit 7df46dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 6, 2016

ping @jkbradley @dbtsai @mengxr

…or-matrix-udt-3-

Conflicts:
	python/pyspark/ml/tests.py
	python/pyspark/mllib/tests.py
@SparkQA
Copy link

SparkQA commented May 12, 2016

Test build #58489 has started for PR 12870 at commit 33afa2e.

@shaneknapp
Copy link
Contributor

i will retrigger this build once maintenance is over.

@shaneknapp
Copy link
Contributor

jenkins, test this please

1 similar comment
@shaneknapp
Copy link
Contributor

jenkins, test this please

@mengxr
Copy link
Contributor

mengxr commented May 12, 2016

@viirya Sorry for late response! The changes should be similar to mllib-local. The scope is not moving UDTs but copying the entire linalg package to spark.ml. We should leave spark.mllib unchanged and turn it to maintenance mode in 2.0. Do you have time to make this change? Or I can send a PR based on yours. Thanks!

@SparkQA
Copy link

SparkQA commented May 12, 2016

Test build #58500 has finished for PR 12870 at commit 33afa2e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented May 13, 2016

@mengxr Currently I copyed only Vector/Matrix and their UDTs to pyspark.ml.linalg and made pyspark.ml and pyspark.mllib codes to use moved pyspark.ml.linalg Vector/Matrix. Besides, pyspark.mllib.linalg is untouched.

Do you mean that we want to keep pyspark.mllib codes using pyspark.mllib.linalg and let pyspark.ml codes using new pyspark.ml.linalg?

@viirya
Copy link
Member Author

viirya commented May 13, 2016

@mengxr @dbtsai Separating the pickle paths for pyspark.ml and pyspark.mllib is not possible without the changes in #12627, as it requires the Scala ML code changes as well.

I will submit another PR for this jira that only copies pyspark.mllib.linalg to pyspark.ml.linalg. When that PR is merged, I will apply the changes of python codes in this PR against #12627 by submitting a PR for it. Then I think it should solve the problem.

@viirya
Copy link
Member Author

viirya commented May 13, 2016

@mengxr @dbtsai That PR is at #13099.

@dbtsai
Copy link
Member

dbtsai commented May 16, 2016

Ping @mengxr

@viirya viirya closed this May 17, 2016
@viirya viirya deleted the move-pyspark-vector-matrix-udt-3- branch December 27, 2023 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants