20 changes: 13 additions & 7 deletions python/pyspark/sql/dataframe.py
@@ -588,6 +588,8 @@ def coalesce(self, numPartitions):
"""
Returns a new :class:`DataFrame` that has exactly `numPartitions` partitions.

:param numPartitions: int, to specify the target number of partitions

Similar to coalesce defined on an :class:`RDD`, this operation results in a
narrow dependency, e.g. if you go from 1000 partitions to 100 partitions,
there will not be a shuffle, instead each of the 100 new partitions will
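A minimal usage sketch of the narrow-dependency behavior described above, assuming a live SparkSession bound to `spark` (the DataFrame and partition counts are illustrative, not from the PR):

    df = spark.range(1000).repartition(10)  # start from 10 partitions
    df.rdd.getNumPartitions()               # 10
    # coalesce merges existing partitions without triggering a shuffle
    df.coalesce(2).rdd.getNumPartitions()   # 2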
@@ -612,9 +614,10 @@ def repartition(self, numPartitions, *cols):
 Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The
 resulting DataFrame is hash partitioned.

-``numPartitions`` can be an int to specify the target number of partitions or a Column.
-If it is a Column, it will be used as the first partitioning column. If not specified,
-the default number of partitions is used.
+:param numPartitions:
+    can be an int to specify the target number of partitions or a Column.
Member commented:
I think it's fine to:

        :param numPartitions: can be an int to specify the target number of partitions or a Column.
            If it is a Column, it will be used as the first partitioning column. If not specified,

Contributor Author replied:
Cool, was maintaining consistency with the other :param entries on the page.

+    If it is a Column, it will be used as the first partitioning column. If not specified,
+    the default number of partitions is used.

 .. versionchanged:: 1.6
 Added optional arguments to specify the partitioning columns. Also made numPartitions
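A short sketch of the call forms this :param now documents, assuming a SparkSession `spark` (the example DataFrame is hypothetical, mirroring the doctest data used elsewhere in this file):

    df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
    df.repartition(10)        # numPartitions as an int
    df.repartition("age")     # a column; the default number of partitions is used
    df.repartition(4, "age")  # both: 4 partitions, hash-partitioned on age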
@@ -673,9 +676,10 @@ def repartitionByRange(self, numPartitions, *cols):
 Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The
 resulting DataFrame is range partitioned.

-``numPartitions`` can be an int to specify the target number of partitions or a Column.
-If it is a Column, it will be used as the first partitioning column. If not specified,
-the default number of partitions is used.
+:param numPartitions:
+    can be an int to specify the target number of partitions or a Column.
+    If it is a Column, it will be used as the first partitioning column. If not specified,
+    the default number of partitions is used.

 At least one partition-by expression must be specified.
 When no explicit sort order is specified, "ascending nulls first" is assumed.
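The range-partitioned variant under the same assumptions; unlike repartition, at least one partition-by expression is required, so a bare numPartitions int is not enough:

    df.repartitionByRange(2, "age")   # 2 partitions, range-partitioned on age
    df.repartitionByRange("age")      # default number of partitions
    df.repartitionByRange(2, "age").rdd.getNumPartitions()  # 2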
@@ -892,6 +896,8 @@ def colRegex(self, colName):
 def alias(self, alias):
     """Returns a new :class:`DataFrame` with an alias set.

+    :param alias: string, an alias name to be set for the DataFrame.
+
     >>> from pyspark.sql.functions import *
     >>> df_as1 = df.alias("df_as1")
     >>> df_as2 = df.alias("df_as2")
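The doctest continues past this hunk boundary; a hedged sketch of the self-join the two aliases set up, assuming the standard doctest DataFrame `df` with `age` and `name` columns:

    from pyspark.sql.functions import col
    df_as1 = df.alias("df_as1")
    df_as2 = df.alias("df_as2")
    # the aliases disambiguate the otherwise identical column names
    joined = df_as1.join(df_as2, col("df_as1.name") == col("df_as2.name"), "inner")
    joined.select("df_as1.name", "df_as2.age").collect()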
@@ -1900,7 +1906,7 @@ def withColumnRenamed(self, existing, new):
 This is a no-op if schema doesn't contain the given column name.

 :param existing: string, name of the existing column to rename.
-:param col: string, new name of the column.
+:param new: string, new name of the column.

 >>> df.withColumnRenamed('age', 'age2').collect()
 [Row(age2=2, name=u'Alice'), Row(age2=5, name=u'Bob')]