Releases: mars-project/mars

v0.7.0a3

03 Jan 04:33
abc46a5
v0.7.0a3 Pre-release

These are the release notes for v0.7.0a3. See here for the complete list of solved issues and merged PRs.

New Features

  • DataFrame
    • Implements head() on groupby objects (#1849)
  • Learn
    • Implements mars.learn.preprocessing.{MinMaxScaler, minmax_scale} (#1844)
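
As an illustration of what min-max scaling computes, here is a minimal pure-Python sketch. It is not the Mars implementation: the helper name minmax_scale_column and its feature_range parameter are assumptions made for this example, modelled on the scikit-learn convention that mars.learn.preprocessing presumably mirrors.

```python
from typing import List, Sequence, Tuple


def minmax_scale_column(
    values: Sequence[float],
    feature_range: Tuple[float, float] = (0.0, 1.0),
) -> List[float]:
    """Linearly rescale one column so its minimum and maximum hit feature_range."""
    lo, hi = feature_range
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # Constant column: avoid dividing by zero and map everything to lo.
        return [lo for _ in values]
    return [lo + (v - vmin) / span * (hi - lo) for v in values]


print(minmax_scale_column([2.0, 4.0, 6.0, 10.0]))  # [0.0, 0.25, 0.5, 1.0]
```

Mapping a constant column to the lower bound is a choice made for this sketch so that the division is always defined.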

Enhancements

  • Improve Proxima recall_by_id computation method (#1805, thanks @rg070836rg!)
  • Revise to/from vineyard for Tensor and DataFrame (#1790)
  • Add memory estimation for read_parquet as well as read_csv (#1811)
  • Support using compound agg function in lambda (#1810)
  • Add incremental_index argument to reset_index, which defaults to False (#1823)
  • Refine kubedl cluster-api (#1827, thanks @SimonCqk!)
  • Enhancements for Mars on kubedl (#1848)
  • Support batched to_pandas for DataFrame and Series (#1853)
  • Support specifying memory limit scale in kubernetes (#1856)
  • Set marsjob worker cache by memoryTuningPolicy (#1857, thanks @SimonCqk!)

Bug fixes

  • Fix compatibility for sklearn 0.24.0 (#1817)
  • Remove unnecessary iterative tiling when predicting via XGBoost and data from/to parquet (#1818)
  • Resolve KeyError when calling delete_keys for the Ray backend (#1846)
  • Fix compatibility for pandas 1.2.0 (#1847)

v0.7.0a2

20 Dec 11:35
10521e6
v0.7.0a2 Pre-release

These are the release notes for v0.7.0a2. See here for the complete list of solved issues and merged PRs.

New Features

  • Tensor
    • Support missing argument for tensor.tosparse() and fill_value argument for sparse_tensor.todense() (#1797)
  • DataFrame
    • Implements {DataFrame,Series}.replace (#1762)
    • Add {DataFrame, Series}.cartesian_chunk support (#1774)
    • Integrate str.cat into reduction and groupby-aggregation (#1776)
    • Implements reduction with level argument (#1779)

Bug fixes

  • Spawn serialization of executable graphs (#1769)
  • Fix getitem on DataFrames with unknown index (#1772)
  • Fix reading partitioned parquet files in HDFS (#1782)
  • Fix creating Mars Series from empty pandas Series (#1787)
  • Support md.concat on DataFrame and Series (#1798)
  • Fix bug where an explicit execute could be required for to_parquet and XGB predict (#1794)
  • Fix TypeError raised when the timeout argument is absent while starting a Mars cluster in YARN (#1803, thanks @smartguo!)

Documentation

  • Fill docs for apply and transform (#1764)

Tests

  • Create different test workflows & fix accessor docs (#1799)

v0.6.1

20 Dec 16:40
1d83d55

These are the release notes for v0.6.1. See here for the complete list of solved issues and merged PRs.

New Features

  • Tensor
    • Support missing argument for tensor.tosparse() and fill_value argument for sparse_tensor.todense() (#1802)
  • DataFrame
    • Implements {DataFrame,Series}.replace (#1765)
    • Add {DataFrame, Series}.cartesian_chunk support (#1777)
    • Integrate str.cat into reduction and groupby-aggregation (#1781)
    • Implements reduction with level argument (#1784)

Bug fixes

  • Spawn serialization of executable graphs (#1770)
  • Fix getitem on DataFrames with unknown index (#1778)
  • Fix reading partitioned parquet files in HDFS (#1783)
  • Fix creating Mars Series from empty pandas Series (#1788)
  • Fix bug where an explicit execute could be required for to_parquet and XGB predict (#1800)
  • Support md.concat on DataFrame and Series (#1801)
  • Fix TypeError raised when the timeout argument is absent while starting a Mars cluster in YARN (#1804, thanks @smartguo!)

Documentation

  • Fill docs for apply and transform (#1767)

Tests

  • Create different test workflows & fix accessor docs (#1804)

v0.7.0a1

05 Dec 09:14
6ec8fbd
v0.7.0a1 Pre-release

These are the release notes for v0.7.0a1. See here for the complete list of solved issues and merged PRs.

Changes that break compatibility

  • Aggregations and groupby aggregations were rewritten in v0.6.0. Older clients may raise errors when connecting to a cluster that has the new version installed.

Highlights

  • Statsmodels as well as joblib are preliminarily supported.

New Features

  • DataFrame
    • Support num_partitions argument for DataFrame initializers (#1729)
    • Add support for named aggregations (#1747)
  • Tensor
    • Add rebalance method for tensors (#1731)
  • Learn
    • Add preliminary statsmodels support (#1735)
    • Add preliminary joblib support (#1757)
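
The named aggregations listed under DataFrame above presumably follow the pandas convention, where each output column is declared as name=(source column, function). The pure-Python sketch below only illustrates that convention; named_agg and the sample rows are hypothetical and are not part of the Mars API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Mapping, Tuple

Row = Mapping[str, Any]
Spec = Tuple[str, Callable[[List[Any]], Any]]


def named_agg(rows: List[Row], by: str, **named: Spec) -> Dict[Hashable, Dict[str, Any]]:
    """Group rows by one key and compute one named output per (column, func) pair."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    result: Dict[Hashable, Dict[str, Any]] = {}
    for key, members in groups.items():
        result[key] = {
            out_name: func([m[col] for m in members])
            for out_name, (col, func) in named.items()
        }
    return result


rows = [
    {"city": "A", "sales": 3},
    {"city": "B", "sales": 5},
    {"city": "A", "sales": 7},
]
summary = named_agg(rows, "city", total=("sales", sum), largest=("sales", max))
print(summary)
# {'A': {'total': 10, 'largest': 7}, 'B': {'total': 5, 'largest': 5}}
```

The keyword names become the output column names, which is the main convenience of named aggregation over passing a bare list of functions.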

Bug fixes

  • Fix md.read_csv when names and usecols specified (#1737)
  • Make PSRS chunks more balanced (#1742)
  • Support string dtype for tensor reductions (#1745)
  • Fix XGBoost and LightGBM on DataFrames (#1750)
  • Fix repeated execution of same code in distributed mode (#1749)
  • Support setting scalar which is a tensor for DataFrame (#1755)
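
PSRS in the list above presumably refers to parallel sorting by regular sampling. The sketch below is a single-process illustration of that textbook algorithm under this assumption; it is not Mars code, and psrs_sort and its list-of-lists chunk layout are hypothetical.

```python
from bisect import bisect_right
from heapq import merge
from typing import List


def psrs_sort(chunks: List[List[int]]) -> List[List[int]]:
    """Sort data spread over p chunks into p ordered output chunks."""
    p = len(chunks)
    # Step 1: sort every chunk locally.
    local = [sorted(c) for c in chunks]
    # Step 2: take up to p regularly spaced samples from each sorted chunk.
    samples: List[int] = []
    for c in local:
        if c:
            step = max(len(c) // p, 1)
            samples.extend(c[::step][:p])
    samples.sort()
    # Step 3: choose p - 1 pivots at regular positions among the samples.
    pivots = [samples[(i * len(samples)) // p] for i in range(1, p)] if samples else []
    # Step 4: split each sorted chunk at the pivots.
    parts: List[List[List[int]]] = [[] for _ in range(p)]
    for c in local:
        start = 0
        for i, pivot in enumerate(pivots):
            end = bisect_right(c, pivot, start)
            parts[i].append(c[start:end])
            start = end
        parts[len(pivots)].append(c[start:])
    # Step 5: merge the i-th piece of every chunk into output chunk i.
    return [list(merge(*pieces)) for pieces in parts]


print(psrs_sort([[9, 3, 7], [1, 8, 2], [6, 5, 4]]))
# [[1, 2, 3, 4], [5, 6, 7], [8, 9]]
```

Output chunk sizes depend on how well the samples represent the data, so the choice of samples and pivots directly affects how balanced the result is.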

v0.6.0

05 Dec 08:43
07e6009

These are the release notes for v0.6.0. See here for the complete list of solved issues and merged PRs.

This release note only covers the difference from v0.6.0rc1; for all highlights and changes, please refer to the release notes of the pre-releases:

alpha1
alpha2
alpha3
beta1
beta2
rc1

Changes that break compatibility

  • Aggregations and groupby aggregations were rewritten in v0.6.0. Older clients may raise errors when connecting to a cluster that has the new version installed.

New Features

  • DataFrame
    • Support num_partitions argument for DataFrame initializers (#1733)
    • Add support for named aggregations (#1748)

Enhancements

  • Unify groupby.agg() using ReductionCompiler (#1739)

Bug fixes

  • Fix md.read_csv when names and usecols specified (#1738)
  • Support string dtype for tensor reductions & balance PSRS chunks (#1746)
  • Fix XGBoost and LightGBM on DataFrames (#1751)
  • Fix repeated execution of same code in distributed mode (#1753)
  • Support setting scalar which is a tensor for DataFrame (#1758)

v0.6.0rc1

24 Nov 18:19
c3a10d5
v0.6.0rc1 Pre-release

These are the release notes for v0.6.0rc1. See here for the complete list of solved issues and merged PRs.

New Features

  • DataFrame
    • Implements {DataFrame,Series}.explode (#1714)
  • Learn
    • Support predicting on local LGBM models (#1716)
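
For readers unfamiliar with explode, the sketch below shows the row-expansion semantics in pure Python, assuming the Mars method mirrors pandas. It is an illustration only: explode_rows and the (label, row) layout are hypothetical, and None stands in for the missing value that pandas represents as NaN.

```python
from typing import Any, Dict, Hashable, List, Tuple

IndexedRow = Tuple[Hashable, Dict[str, Any]]


def explode_rows(rows: List[IndexedRow], column: str) -> List[IndexedRow]:
    """Give every element of a list-like cell its own row, repeating the index label."""
    exploded: List[IndexedRow] = []
    for label, row in rows:
        value = row[column]
        if isinstance(value, (list, tuple)):
            # An empty list still yields one row, holding a missing value.
            items = list(value) or [None]
        else:
            # Scalars pass through unchanged.
            items = [value]
        for item in items:
            exploded.append((label, {**row, column: item}))
    return exploded


rows = [
    (0, {"id": "x", "tags": ["a", "b"]}),
    (1, {"id": "y", "tags": []}),
    (2, {"id": "z", "tags": "c"}),
]
for label, row in explode_rows(rows, "tags"):
    print(label, row)
# 0 {'id': 'x', 'tags': 'a'}
# 0 {'id': 'x', 'tags': 'b'}
# 1 {'id': 'y', 'tags': None}
# 2 {'id': 'z', 'tags': 'c'}
```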

Enhancements

  • Add configuration page on Mars Web (#1697)
  • Add shared limit option for Mars worker (#1702)
  • Remount the shm directory in entrypoint.sh (#1700)
  • Add pure-dependent option for operands (#1706)
  • Remove prepare_inputs property on operands (#1709)
  • Use ReductionCompiler to support function aggregation in mars.dataframe.reduction (#1705)
  • Write into and read from merged files when data sizes are small (#1708)
  • Refactor builder and searcher of Proxima (#1710, thanks @rg070836rg!)

Bug fixes

  • Fix Mars not working on a Ray cluster (#1712, thanks @fyrestone!)
  • Fix inferring dtype for series.map (#1722)
  • Fix sort functions of DataFrames on CUDA (#1723)

v0.5.5

24 Nov 14:51
8823fbd

These are the release notes for v0.5.5. See here for the complete list of solved issues and merged PRs.

New Features

  • DataFrame
    • Implements {DataFrame,Series}.explode (#1715)

Enhancements

  • Add configuration page on Mars Web (#1701)
  • Remount /dev/shm directory in entrypoint.sh in Kubernetes and limit plasma size to avoid SIGBUS (#1703)
  • Add pure-dependent option for operands (#1707)

Bug fixes

  • Fix the KeyError in estimate_fuse_size (#1699)
  • Fix inferring dtype for series.map (#1724)
  • Fix sort functions in CUDA (#1725)

v0.6.0b2

07 Nov 12:15
7fcf3f5
v0.6.0b2 Pre-release

These are the release notes for v0.6.0b2. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Support inplace parameter in reset_index method (#1662)
  • Add a threshold for DataFrame.head optimization (#1673)

Bug fixes

  • Check unknown shape chunks in tile of md.concat (#1655)
  • Create Fetch operands given output types (#1666)
  • Fix hang when rerunning DataFrame.groupby in distributed mode (#1667)
  • Modify df.copy() so that it generates the identical key (#1671)
  • Fix IndexError in binary ops on Series with datetime dtype (#1675)
  • Mount /dev/shm on host to pods when starting Mars workers in Kubernetes (#1677)
  • Fix Series reduction so that the output type is consistent between the map and combine phases (#1685)
  • Fix wrong dtypes of DataFrame setitem chunks (#1690)
  • Add timeout for SharedHolderActor creation (#1684)
  • Fix assigning operands with expected workers (#1689)

v0.5.4

07 Nov 12:27
65669bb

These are the release notes for v0.5.4. See here for the complete list of solved issues and merged PRs.

Enhancements

  • Support inplace parameter in reset_index method (#1663)
  • Add a threshold for DataFrame.head optimization (#1679)

Bug fixes

  • Check unknown shape chunks in tile of md.concat (#1656)
  • Fix hang when rerunning DataFrame.groupby in distributed mode (#1669)
  • Create Fetch operands given output types (#1668)
  • Modify df.copy() so that it generates the identical key (#1678)
  • Fix IndexError in binary ops on Series with datetime dtype (#1680)
  • Mount /dev/shm on host to pods when starting Mars workers in Kubernetes (#1681)
  • Fix DataFrame reduction so that the output type is consistent between the map and combine phases (#1686)
  • Fix wrong dtypes of DataFrame setitem chunks (#1691)
  • Fix assigning operands with expected workers (#1693)
  • Add timeout for SharedHolderActor creation (#1692)

v0.6.0b1

24 Oct 04:49
b5b63f3
v0.6.0b1 Pre-release

These are the release notes for v0.6.0b1. See here for the complete list of solved issues and merged PRs.

New Features

  • DataFrame
    • Add DataFrame.to_parquet support (#1652)

Enhancements

  • Optimize memory usage for brute-force algorithm in NearestNeighbors (#1640)
  • Structural adjustment for proxima (#1624, thanks @rg070836rg!)

Bug fixes

  • Fix the wrong dtypes of DataFrameSetitem's inputs (#1623)
  • Fix issue where output_type did not take effect for df.apply (#1626)
  • Fix registration for DataFrameSetLabel operand (#1631)
  • Fix issue where serialization of transpose failed when the input has an unknown shape (#1632)
  • Eliminate TimeoutError when there are running nodes (#1637)
  • Fix PSRS error when chunks have fewer rows than the number of partitions (#1642)
  • Add flush method to _LogWrapper (#1646)
  • Fix md.concat, which could occupy a huge amount of memory on the client when all DataFrames have a large RangeIndex (#1649)