
Releases: microsoft/SynapseML

v0.11.2-spark3.4

01 Sep 18:59
f54275f
v0.11.2-spark3.4 Pre-release

What's Changed

New Contributors

Full Changelog: v0.11.2...v0.11.2-spark3.4

SynapseML v0.11.2

10 Jul 22:45
1e80aa1

v0.11.2

Bug Fixes ๐Ÿž

  • make geospatial services robust to 404 thrown by the service (#2007)
  • don't retry 4XX codes other than 429 (#2005)
  • improve tolerance for error handling (#1991)
  • Some tidying and build fixes (#1984)
  • fix import error on using the cognitive services on AML spark clusters (#1951)
  • Update retry arguments in GeospatialServices - Flooding Risk.ipynb (#1954)
  • add retries for torchvision download (#1949)
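The retry rule from #2005 (retry throttling, give up on other client errors) can be sketched as below. This is a hypothetical helper, not SynapseML's internal HTTP code; treating 5xx as retryable, the doubling backoff schedule, and the `max_attempts` default are all assumptions for illustration.

```python
import time
from typing import Callable


def should_retry(status: int) -> bool:
    """Retry throttling (429) and server errors (5xx); other 4xx codes are final."""
    return status == 429 or 500 <= status < 600


def call_with_retries(send: Callable[[], int], max_attempts: int = 4,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` (which returns an HTTP status) until it succeeds,
    hits a non-retryable code, or runs out of attempts."""
    status = send()
    attempts = 1
    while should_retry(status) and attempts < max_attempts:
        sleep(base_delay * 2 ** (attempts - 1))  # exponential backoff: 0.5s, 1s, 2s, ...
        status = send()
        attempts += 1
    return status


# A 429 followed by a 503 is retried until the 200 arrives; a 404 returns immediately.
responses = iter([429, 503, 200])
print(call_with_retries(lambda: next(responses), sleep=lambda s: None))
```

Injecting `sleep` keeps the helper testable without real delays.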

Build ๐Ÿญ

  • upload notebooks to storage on every build (#2001)
  • Publishing to ADO Feed (#1995)
  • bump ossf/scorecard-action from 2.1.3 to 2.2.0 (#1993)
  • Update documentprojection to support manifest files and custom formatters/endpoints (#1976)
  • bump microsoft/gpt-review from 0.9.3 to 0.9.4 (#1962)
  • bump microsoft/gpt-review from 0.9.2 to 0.9.3 (#1960)
  • Added module for generating docs from notebooks (#1911)

Documentation 📘

  • Improved LightGBM docs (#2003)
  • Add Langchain notebook demo (#2002)
  • update website synapse info and fabric installation (#2000)
  • update OpenAI notebook for acrolinx (#1999)
  • Add new LightGBM streaming docs (#1992)
  • Update Vowpal Wabbit - Multi-class classification.ipynb (#1971)
  • Update Vowpal Wabbit - Overview.ipynb (#1972)
  • Update Vowpal Wabbit - Contextual Bandits.ipynb (#1970)
  • Update Vowpal Wabbit - Classification using VW-native Format.ipynb (#1969)
  • Update Vowpal Wabbit - Classification using SparkML Vector.ipynb (#1968)
  • update spark33 installation instruction readme (#1961)
  • remove cell output - sentiment analysis quickstart (#1932)
  • improve and organize openai docs (#1937)

Features 🌈

  • Reference dataset (#1977)
  • Add Langchain Transformation (#1925)
  • OrthogonalForestDML (#1873)

Maintenance 🔧

  • bump to v0.11.2 (#2011)
  • make it so custom versions are possible (#1998)
  • fix spark_install for R tests (#1994)
  • retry conda install and add a timeout (#1983)
  • exclude failing explanation dashboard notebook (#1982)
  • fix broken tests (#1981)
  • add timeouts for R tests (#1963)
  • add internal to find_secret (#1948)
  • fix flakiness in spark installation (#1944)
  • update R-setup.md docs (#1946)
  • Add Azure Open AI based PR Summarization (#1957)
  • fix gpu notebook protobuf version (#1959)
  • A base scrubber and a "Shared Access Signature" Scrubber (#1939)
  • remove old notebooks from website (#1934)
  • no-op commit to ensure no double-releasing of library
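#1939 adds a "Shared Access Signature" scrubber. As a minimal sketch of the idea, assuming that masking the `sig=` query parameter of a SAS URL is sufficient (the real scrubber's patterns and replacement token may differ), a regex-based redactor could look like this; `scrub_sas` is a hypothetical name.

```python
import re

# Matches the signature parameter of a SAS query string up to the next '&' or whitespace.
SAS_SIG = re.compile(r"(sig=)[^&\s]+", re.IGNORECASE)


def scrub_sas(text: str) -> str:
    """Mask the signature value of any Shared Access Signature in `text`."""
    return SAS_SIG.sub(r"\1####", text)


url = "https://acct.blob.core.windows.net/c/b?sv=2021-08-06&sig=abc%2Fdef&se=2023"
print(scrub_sas(url))
```

Only the signature is masked, so the rest of the URL stays readable in logs.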

Performance Improvements 🚀

  • Update OpenAI Embedding with latest embedding model (#1938)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

v0.11.2-spark3.3

10 Jul 23:15
v0.11.2-spark3.3 Pre-release
chore: bump to spark 3.3.1

v0.11.1-spark3.3

04 May 20:02
v0.11.1-spark3.3 Pre-release
chore: make it so custom versions are possible

SynapseML v0.11.1

24 Apr 23:13
866261c

SynapseML v0.11.1

Bug Fixes ๐Ÿž

  • set default values for aadToken & url for internal Synapse (#1918)
  • ONNX model shape inference cannot handle batch with shape [-1] (#1906)
  • forgot to add getPValue to python side (#1909)
  • generate random dir for each test (#1908)
  • add back diagnosticsInfo for MVAD (#1892)
  • DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
  • fix date parsing in FaceSuite test (#1896)
  • fix Build pipeline (#1904)
  • Retry OnnxHub call to improve test reliability (#1889)
  • Normalize line-endings (#1883)
  • Remove case matching for erased generic type (#1880)
  • fix bug #1869, DML .setFitIntercept should be set to true (#1876)
  • Remove extraneous "Foo" type from Py codegen (#1867)
  • Allow variable size in ONNX inputs (#1851)
  • Abstain from CodeQL for markdown-only changes (#1865)
  • fix style
  • update OpenAIEmbedding internalServiceType
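#1906 and #1851 concern ONNX inputs whose declared shape contains a dynamic dimension such as `[-1]`. The sketch below shows, in the spirit of `numpy.reshape`, how a single dynamic dimension can be resolved from the number of elements actually supplied. `infer_shape` is a hypothetical helper for illustration, not SynapseML's ONNXModel code.

```python
from typing import List, Sequence


def infer_shape(declared: Sequence[int], num_elements: int) -> List[int]:
    """Resolve at most one dynamic (-1) dimension from the total element count."""
    dynamic = [i for i, d in enumerate(declared) if d == -1]
    if len(dynamic) > 1:
        raise ValueError("at most one dynamic dimension can be inferred")
    known = 1
    for d in declared:
        if d != -1:
            known *= d
    if not dynamic:
        if known != num_elements:
            raise ValueError("element count does not match the declared shape")
        return list(declared)
    if known == 0 or num_elements % known:
        raise ValueError("element count does not fit the declared shape")
    shape = list(declared)
    shape[dynamic[0]] = num_elements // known
    return shape


print(infer_shape([-1], 8))      # a batch of 8 scalars
print(infer_shape([-1, 3], 12))  # 4 rows of 3 features
```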

Build ๐Ÿญ

  • bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
  • bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
  • bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
  • bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
  • bump webpack from 5.75.0 to 5.76.1 in /website (#1870)

Documentation 📘

  • Fix installation instruction in the webpage for the build.sbt file (#1921)
  • note discrete treatment data type (#1905)
  • add custom chatbot creation to form demo (#1888)
  • add overview page for simple DNN and fix some typos (#1879)
  • Fix a typo in installation docs
  • fix link issue in CONTRIBUTING.md (#1864)
  • fix a few issues in cognitive service demo (#1861)

Features 🌈

  • add streaming API for MVAD (#1893)
  • [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
  • Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
  • support new api version of form recognizer (#1882)
  • Add a new function to DMLModel, getPValue (#1863)
  • update default internal endpoint for cog services (#1859)
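#1885 lets DistributionBalanceMeasure compare the observed distribution of a sensitive column against a custom reference distribution. A minimal sketch of one such measure, KL divergence, assuming a uniform reference by default; the function name and the handling of unseen categories are hypothetical, and the transformer computes several measures beyond this one.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


def kl_divergence(observed: Iterable[str],
                  reference: Optional[Dict[str, float]] = None) -> float:
    """KL(P_observed || P_reference). The reference defaults to uniform
    over the observed categories and must cover every observed category."""
    counts = Counter(observed)
    total = sum(counts.values())
    if reference is None:
        reference = {c: 1 / len(counts) for c in counts}
    return sum((n / total) * math.log((n / total) / reference[c])
               for c, n in counts.items())


genders = ["f", "f", "f", "m"]
print(kl_divergence(genders))                          # > 0: imbalanced vs. uniform
print(kl_divergence(genders, {"f": 0.75, "m": 0.25}))  # 0.0: matches the custom reference
```

A value of 0 means the data matches the reference exactly; larger values mean greater imbalance.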

Maintenance 🔧

  • bump to v0.11.1 (#1933)
  • Adding telemetry for the dataset metadata. This one is specially for … (#1917)
  • fix r tests (#1927)
  • fix build issues (#1916)
  • disable test until Synapse is fixed (#1915)
  • add .bloop to .gitignore (#1897)
  • clean up old/missed search indexes in SearchWriterSuite (#1901)
  • Add utility to clean azure search indexes
  • update website docs to point to correct developer API docs (#1877)
  • Update pipeline.yaml for Azure Pipelines (#1866)
  • make sure nightly build has new commit

Changes:

  • 866261c chore: bump to v0.11.1 (#1933)
  • 3c09702 chore: Adding telemetry for the dataset metadata. This one is specially for … (#1917)
  • 0d0d10c feat: add streaming API for MVAD (#1893)
  • 1b71c1d chore: fix r tests (#1927)
  • 0df97ad chore: fix build issues (#1916)
  • 78695fb Update Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb (#1922)
  • 87d5bc5 docs: Fix installation instruction in the webpage for the build.sbt file (#1921)
  • 8320b2b fix: set default values for aadToken & url for internal Synapse (#1918)
  • 4912ae4 chore: disable test until Synapse is fixed (#1915)
  • 469445b fix: ONNX model shape inference cannot handle batch with shape [-1] (#1906)
  • 3fa001e build: bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
  • f51327e Update LightGBM version to 3.3.5 (#1910)
  • b1e584e fix: forgot to add getPValue to python side (#1909)
  • a09a6f7 docs: note discrete treatment data type (#1905)
  • 0fa3f2a fix: generate random dir for each test (#1908)
  • 736c317 fix: add back diagnosticsInfo for MVAD (#1892)
  • 13afff6 fix: DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
  • 7546e7f build: bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
  • f227f02 fix: fix date parsing in FaceSuite test (#1896)
  • 0f02626 fix: fix Build pipeline (#1904)
  • ce9fe41 chore: add .bloop to .gitignore (#1897)
  • 7ffa970 chore: clean up old/missed search indexes in SearchWriterSuite (#1901)
  • 9a6cf03 chore: Add utility to clean azure search indexes
  • 52919ce fix: Retry OnnxHub call to improve test reliability (#1889)
  • 979c629 feat: [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
  • 412620a docs: add custom chatbot creation to form demo (#1888)
  • 9f634a6 feat: Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
  • 7657089 fix: Normalize line-endings (#1883)
  • c156792 feat: support new api version of form recognizer (#1882)
  • ed842a5 docs: add overview page for simple DNN and fix some typos (#1879)
  • 87e1c78 fix: Remove case matching for erased generic type (#1880)
  • cd72bc9 build: bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
  • 564d047 fix: fix bug #1869, DML .setFitIntercept should be set to true (#1876)
  • 392dbbf chore: update website docs to point to correct developer API docs (#1877)
  • 129abde build: bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
  • 4d1c560 build: bump webpack from 5.75.0 to 5.76.1 in /website (#1870)
  • 62c79d8 docs: Fix a typo in installation docs
  • 1f63dab feat: Add a new function to DMLModel, getPValue (#1863)
  • 83f8260 fix: Remove extraneous "Foo" type from Py codegen (#1867)
  • a5bec45 fix: Allow variable size in ONNX inputs (#1851)
  • 23c9b0a chore: Update pipeline.yaml for Azure Pipelines (#1866)
  • dedcbda docs: fix link issue in CONTRIBUTING.md (#1864)
  • a7f31d5 fix: Abstain from CodeQL for markdown-only changes (#1865)
  • a5f38b1 Update DoubleMLEstimator test CI verification (#1862)
  • a44f917 fix: fix style
  • cc931af fix: update OpenAIEmbedding internalServiceType
  • 424d586 feat: update default internal endpoint for cog services (#1859)
  • e4a0e2c docs: fix ...

SynapseML v0.11.0

05 Mar 13:37
7b23764

SynapseML: Simple and distributed machine learning

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.11.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that's usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights

  • ChatGPT and GPT-4 at Scale: Intelligent chat and embeddings. Simplified Prompting APIs.
  • Simple Deep Learning: Train custom image and text classifiers with ease.
  • LightGBM v2: Higher performance, >10x lower memory footprint, same API.
  • ONNX Model Hub: Embed >150 state of the art deep networks into your pipelines.
  • Causal Learning: Discover and measure causal treatment effects.
  • Vowpal Wabbit v2: New second generation integration.

New Features

General ✨

  • R Support is no longer Beta! (#1586)
  • Support for Spark 3.2.3

OpenAI 🤖

  • Add OpenAI Prompt Template support (#1843)
  • Add Azure OpenAI embedding support (#1832)
  • Add Azure Active Directory authentication for OpenAI (#1829)
  • Add Null-value handling for OpenAI models (#1854)
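Prompt templates (#1843) and null handling (#1854) combine naturally: fill placeholders from each row, and skip the request when a referenced value is missing. The sketch below assumes Python-style `{column}` placeholders; whether SynapseML's prompt transformer uses exactly this syntax, and whether it returns null for incomplete rows, are assumptions here. `render_prompt` is a hypothetical helper.

```python
import string
from typing import Mapping, Optional


def render_prompt(template: str, row: Mapping[str, object]) -> Optional[str]:
    """Fill `{column}` placeholders from a row; return None if any referenced value is null."""
    fields = [name for _, name, _, _ in string.Formatter().parse(template) if name]
    if any(row.get(name) is None for name in fields):
        return None  # no prompt, so no API call, for incomplete rows
    return template.format(**{name: row[name] for name in fields})


template = "Classify the sentiment of: {text}"
print(render_prompt(template, {"text": "I love it"}))
print(render_prompt(template, {"text": None}))
```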

Deep Learning 🕸

  • Remove CNTK functionality and replace with ONNX (#1593)
  • Add the DeepTextClassifier, a simple API for fine-tuning a wide array of Hugging Face 🤗 text transformers using PyTorch Lightning (#1591)
  • Add the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Azure Cognitive Services for Big Data 🧠

  • Add SpeakerEmotionInference transformer to generate emotion annotation tags for emotive reading in SpeechToText (#1691)
  • Add new AnalyzeText API (#1760)
  • Support Azure Active Directory (AAD) authentication for the cognitive services (#1778, #1797)
  • Move different cognitive services into sub packages (#1746)
  • Add audiobook generation example (#1852)
  • Add a notebook for advanced cognitive service usage (#1825)
  • Upgrade MVAD to v1.1 (#1788)
  • Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
  • Add word-level timing to SpeechToTextSDK and ConversationTranscription (#1801)
  • Add the descriptionExcludes parameter to AnalyzeImage (#1590)

Causal Learning 📈

  • Add the causal DoubleMLEstimator for learning causal treatment effects from data (#1715)
  • Add a DoubleMLEstimator document and sample notebook (#1730)
  • Fix DML regression bug: both the treatment and outcome columns should be removed from the feature columns (#1820)
  • Add TreatmentCol type checking (#1816)
  • Update test to validate ATE value should be positive (#1821)
  • Fix issue with missing causal test coverage (#1799)
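The DoubleMLEstimator (#1715) is based on double machine learning, whose core step is "partialling out": predict both the outcome and the treatment from the confounders, then regress the outcome residuals on the treatment residuals. The sketch below substitutes simple linear regression for the ML nuisance models and omits the sample splitting / cross-fitting that full DML uses; it illustrates the idea and is not SynapseML's implementation. The synthetic data (true effect 2.0) is invented for the example.

```python
import random
from typing import List, Sequence


def residuals(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals of an ordinary least-squares fit of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - (my + slope * (xi - mx)) for xi, yi in zip(x, y)]


def partialling_out_effect(y: Sequence[float], t: Sequence[float],
                           x: Sequence[float]) -> float:
    """Regress outcome and treatment on confounder x, then residual on residual."""
    ry, rt = residuals(y, x), residuals(t, x)
    return sum(a * b for a, b in zip(rt, ry)) / sum(b * b for b in rt)


# Synthetic data: x confounds both treatment t and outcome y; true effect of t is 2.0.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
t = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [2.0 * ti + 3.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]
print(round(partialling_out_effect(y, t, x), 2))  # close to 2.0
```

A naive regression of `y` on `t` alone would absorb part of the confounder's effect (about 3.2 in expectation for this data), which is the bias partialling out removes.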

LightGBM 🌳

  • Add LightGBM streaming execution mode for more reliable performance with orders of magnitude less memory. (#1580)
  • Add maxNumClasses param to LightGBMClassifier for multi-class (#1841)
  • Added the passThroughArgs feature which allows users to set low level LGBM parameters before they are wrapped in SparkML (#1749)
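passThroughArgs (#1749) takes native LightGBM parameters as a single string. A minimal sketch of building that string from a Python dict, assuming LightGBM's command-line style of space-separated `key=value` pairs; confirm the exact format and parameter names against the LightGBM and SynapseML docs. `to_pass_through_args` is a hypothetical helper.

```python
from typing import Mapping


def to_pass_through_args(params: Mapping[str, object]) -> str:
    """Serialize native LightGBM parameters as space-separated key=value pairs."""
    def fmt(value: object) -> str:
        # LightGBM spells booleans in lower case.
        return str(value).lower() if isinstance(value, bool) else str(value)

    return " ".join(f"{key}={fmt(value)}" for key, value in params.items())


print(to_pass_through_args({"force_row_wise": True, "max_bin": 127}))
```

The resulting string would then be passed as the `passThroughArgs` parameter of a LightGBM estimator.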

Vowpal Wabbit ๐Ÿ‡

Additional Updates

Bug Fixes ๐Ÿž

  • Support grayscale images in toNDArray (#1592)
  • Adjust learning rate in VW example notebook (#1853)
  • Correct copy/paste error in acr cleanup (#1838)
  • Fix synapse test config, and isolation forest notebook (#1833)
  • Add spark config to fix ArrayStoreException (#1757)
  • Fix breeze NoSuchMethodError (#1807)
  • Fix modelVersion param in TextAnalytics (#1756)
  • Make logging infrastructure consistent and add logging checks (#1755)
  • Fix website sidebars and vulnerabilities in packages (#1753)
  • Remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
  • Update isolation forest notebook (#1696)
  • Remove error on invalid columns in DropColumns (#1695)
  • Fix PyArrow failure in deeplearning test (#1689)
  • Fix linked service setters on cog service base class (#1685)
  • KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
  • Fix flaky translate tests (#1643)
  • Fix speechToTextSuite serialization Fuzzing failure (#1626)
  • Fix translator endpoint and update all endpoints for gov regions (#1623)
  • Binder runtime issues (#1598)
  • Clean up cluster if Databricks tests pass (#1599)…

SynapseML v0.10.2

22 Nov 14:30
cd1d2ea

v0.10.2

Bug Fixes ๐Ÿž

  • remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
  • remove synapse E2E testing exclusion - cyber ml (#1699)
  • update isolation forest notebook (#1696)
  • don't throw on invalid columns in DropColumns (#1695)
  • fix pyarrow failure in deeplearning test (#1689)
  • fix linked service on cog service base (#1685)
  • fix Uplift Modelling style
  • KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
  • fix flaky translate tests (#1643)
  • update ubuntu to 20.04 in pipeline (#1624)

Build ๐Ÿญ

  • bump actions/checkout from 2 to 3 (#1737)
  • bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
  • bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
  • bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)

Documentation 📘

  • update developer readme instruction on python env creation (#1693)
  • fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
  • improve error msg to make it clearer for users and fix typos (#1662)
  • simplify data downloading and add mlflow to uplift modelling (#1659)
  • move magic command forward since it restarts interpreter
  • remove unused docs and fix links
  • improve example notebooks
  • add aisample uplift modelling (#1640)
  • fix command to launch jupyter notebook (#1649)
  • add mlflow in ai samples time series forecasting (#1645)
  • add mlflow logging and loading (#1641)
  • update spark version in Readme
  • improve readme overview
  • add aisample on text classification (#1617)

Features 🌈

  • add simple deep learning text classifier (#1591)
  • Add SpeakerEmotionInference transformer for generating SSML t… (#1691)
  • Deprecate CNTK objects (#1712)
  • Remove CNTK functionality and replace with ONNX (#1593)
  • R test generation (#1586)

Maintenance 🔧

  • bump version to 0.10.2 (#1738)
  • fix style (#1736)
  • automate clean-acr with github action workflow (#1735)
  • autodelete old models (#1729)
  • Making secrets optional and cached (#1726)
  • add secret scanning infrastructure (#1724)
  • Move new ImageFeaturizer to onnx namespace (#1711)
  • ScalaStyle fixes (#1716)
  • update scalatest and scalactic (#1706)
  • remove synapse test exclusions (#1698)
  • pin az and python versions (#1705)
  • fix ado integration (#1704)
  • remove notebooks (#1703)
  • fix reopen comment action
  • fix reopen on comment workflow
  • fix typo in issue reopen yaml
  • re open github issues after a comment (#1676)
  • clean up github workflows and add issue label remover (#1674)
  • turn off failing synapse tests temporarily (#1658)
  • added synapse-internal to platform detector function (#1651)
  • publish test jars
  • improve test coverage (#1631)
  • Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
  • clean up TextAnalytics cog service APIs (#1622)

Testing 💚

  • Additional E2E testing infrastructure (#1727)
  • Improve ONNXtests reliability (#1713)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

  • 0b96cc5 chore: add secret scanning infrastructure (#1724)
  • 2a7a67b feat: Deprecate CNTK objects (#1712)
  • e38e3ad chore: Move new ImageFeaturizer to onnx namespace (#1711)
  • 0ff6802 test: Improve ONNXtests reliability (#1713)
  • fe4c5d2 chore: ScalaStyle fixes (#1716)
  • 050b541 build: bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
  • f2e88fd feat: Remove CNTK functionality and replace with ONNX (#1593)
  • abdfe19 fix: remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
  • 6a1f994 chore: update scalatest and scalactic (#1706)
  • 144674f chore: remove synapse test exclusions (#1698)
  • 32c654b chore: pin az and python versions (#1705)
  • c8fba28 chore: fix ado integration (#1704)
  • 92d4095 chore: remove notebooks (#1703)
  • a953780 fix: remove synapse E2E testing exclusion - cyber ml (#1699)
  • b257c70 fix: update isolation forest notebook (#1696)
  • 9120b05 using predictionCol for isolation forest (#1686) [ #1060 ]
  • 448f6b7 Remove trident.mlflow APIs. (#1687)
  • f4af33f fix: don't throw on invalid columns in DropColumns (#1695)
  • c531bbb docs: update developer readme instruction on python env creation (#1693)
  • 467e651 build: bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
  • 302831f fix: fix pyarrow failure in deeplearning test (#1689)
  • e857511 fix: fix linked service on cog service base (#1685)
  • f29318a build: bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)
  • 50ac0c8 Update reopen-issue-on-comment.yml
  • c9278b5 chore: fix reopen comment action
  • b3a9ba9 chore: fix reopen on comment workflow
  • 9fe273b chore: fix typo in issue reopen yaml
  • a7c50de chore: re open github issues after a comment (#1676)
  • 8914750 chore: clean up github workflows and add issue label remover (#1674)
  • 965231a docs: fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
  • 4fa7249 docs: improve error msg to make it clearer for users and fix typos (#1...

v0.10.1

23 Aug 03:41
0f54bc6

SynapseML v0.10.1

Bug Fixes ๐Ÿž

  • fix speechToTextSuite serializationFuzzing failure (#1626)
  • fix translator endpoint and update all endpoints for gov regions (#1623)
  • binder runtime issues (#1598)
  • clean up cluster if databricks tests pass (#1599)
  • fix deep-learning test flakiness (#1600)
  • update dotnetTestBase assembly version (#1601)
  • fix flaky forms test (#1584)

Build ๐Ÿญ

  • bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
  • bump actions/setup-node from 2 to 3 (#1610)
  • bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
  • bump actions/setup-java from 2 to 3 (#1612)
  • simplify e2e test pipeline with test matrix

Documentation 📘

  • add aisample notebooks into community folder (#1606)
  • add aisample time series forecasting (#1614)
  • fix .NET logo on website (#1604)
  • improve OpenAI notebook (#1596)
  • pin mybinder to v0.10.0 to avoid thrashing
  • add demo into videos on website (#1581)
  • update installation guidance of v0.10.0 (#1578)
  • add more .net samples (#1570)
  • add dotnet installation & example doc (#1567)
  • Update issue template

Features 🌈

  • add stale bot for issues (#1602)
  • Support grayscale images in toNDArray (#1592)
  • Add the descriptionExcludes parameter to AnalyzeImage (#1590)
  • Added the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Maintenance 🔧

  • bump to v0.10.1 (#1628)
  • deprecate old Text analytics APIs to prepare for refactoring (#1627)
  • remove deprecated lime APIs (#1620)
  • update openai service to the official deployment, and disable test due to outage (#1619)
  • Auto update GitHub actions with dependabot (#1608)
  • hotfix binder badge
  • pin binder version for users (#1607)
  • Bump spark to 3.2.2
  • bump spark version
  • Format welcome message with emojis (#1583)
  • Add welcome message to new PRs/Issues (#1573)
  • Add GH workflow to label new/reopened issues (#1571)
  • update website (#1566)

Testing 💚

  • stabilize unit tests (#1576)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

  • 0f54bc6 chore: bump to v0.10.1 (#1628)
  • 3d0f3f4 chore: deprecate old Text analytics APIs to prepare for refactor (#1627)
  • 2052e13 chore: remove deprecated lime APIs (#1620)
  • 09213b0 fix: fix speechToTextSuite serializationFuzzing failure (#1626)
  • 9f78bf0 fix: fix translator endpoint and update all endpoints for gov regions (#1623)
  • 7e90d19 docs: add aisample notebooks into community folder (#1606)
  • ac40e5a chore: update openai service to official, and disable test due to outage (#1619)
  • f54f7f6 docs: add aisample time series forecasting (#1614)
  • 7b4b0e1 build: bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
  • 43b0d17 build: bump actions/setup-node from 2 to 3 (#1610)
  • c48a07a build: bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
  • b1a331c build: bump actions/setup-java from 2 to 3 (#1612)
  • 78e40cb chore: Auto update github actions with dependabot (#1608)
  • 69d2d20 chore: hotfix binder badge
  • 93d7ccf chore: pin binder version for users (#1607)
  • c7a61ec fix: binder runtime issues (#1598)
  • c960c06 docs: fix .NET logo on website (#1604)
  • 28a35b4 fix: clean up cluster if databricks tests pass (#1599)
  • 5a28740 fix: fix deep-learning test flakiness (#1600)
  • adf1a61 fix: update dotnetTestBase assembly version (#1601)
  • c659b33 feat: add stale bot for issues (#1602)
  • 05a4202 docs: improve OpenAI notebook (#1596)
  • e019756 feat: Support gray scale images in toNDArray (#1592)
  • 51beaa0 feat: Add the descriptionExcludes parameter to AnalyzeImage (#1590)
  • b9ac22a docs: pin mybinder to v0.10.0 to avoid thrashing
  • 1808a0f chore: Bump spark to 3.2.2
  • 8e7d453 build: simplify e2e test pipeline with test matrix
  • 8e34c7b chore: bump spark version
  • 44c8ed5 feat: Added the DeepVisionClassifier a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)
  • e4f0883 fix: fix flaky forms test (#1584)
  • 7da5f49 chore: Format welcome message with emojis (#1583)
  • 0e6bb35 Serena/update issue template (#1582)
  • a6a2718 docs: add demo into videos on website (#1581)
  • 7c34fc4 test: stabilize unit tests (#1576)
  • 49f3a58 chore: Add welcome message to new PRs/Issues (#1573)
  • 4868e8b Add back LightGBM library initialization in booster (#1575)
  • d427b88 docs: update installation guidance of v0.10.0 (#1578)
  • 55a60c9 docs: add more .net samples (#1570)
  • 39fe2d8 chore: Add GH workflow to label new/reopened issues (#1571)
  • 0febe3c docs: add dotnet installation & example doc (#1567)
  • db95a10 chore: update website (#1566)

This list of changes was auto generated.

v0.10.0

18 Jul 02:50
e9986fe

SynapseML

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that's usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights

  • OpenAI Language Models: Embed 175-billion parameter models into your databases with ease.
  • .NET, C#, and F# Support: Use or train any SynapseML model from .NET.
  • Full MLFlow Support: Quick and easy MLOps, model management, and autologging.
  • Live Demos in Browser: Explore the SynapseML library with zero setup.

New Features

General ✨

Azure Cognitive Services for Big Data 🧠

Responsible AI at Scale 😇

  • Added partial dependence plots (PDP) to allow for understanding how independent variables affect a model's prediction (#1426)
  • Updated ICE/PDP documentation with PDP-based feature importance and additional examples (#1441, #1352)
  • Added a notebook for ICE and PDP feature explainers (#1318)
  • Updated data balance documentation to better describe how it can be used to ensure model fairness (#1540)
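A partial dependence plot (#1426) shows how a model's average prediction changes as one feature is swept over a grid while every other feature keeps its observed value. A minimal sketch with a hypothetical model and data; it illustrates the definition and is not the ICETransformer's distributed implementation.

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]


def partial_dependence(model: Callable[[Row], float], data: Sequence[Row],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """For each grid value, set `feature` to it in every row and average the predictions."""
    curve = []
    for value in grid:
        # One prediction per row: these are the points of each row's ICE curve at `value`.
        preds = [model([value if j == feature else v for j, v in enumerate(row)])
                 for row in data]
        curve.append(mean(preds))  # PDP = average of the ICE curves
    return curve


model = lambda r: 2 * r[0] + r[1]   # hypothetical fitted model
data = [[0, 1], [1, 3], [2, 5]]
print(partial_dependence(model, data, feature=0, grid=[0, 1, 2]))
```

For this linear model the curve is `2 * v + mean(x1)`, i.e. a line with slope 2; nonlinear models produce the curved shapes that make PDPs useful.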

MLFlow 🔃

LightGBM on Spark 🌳

  • Added the ability to pass in generic argument strings to LightGBM enabling many complex parameterizations (#1444)
  • Added seed parameters to LightGBM (#1387)
  • Added a method to get LightGBM native model string directly (#1515)
  • Fixed issue with validation data creation during useSingleDataset mode (#1527)
  • Fixed multiclass training with initial scores (#1526)
  • Fixed saving LightGBM model iterations with early stopping (#1497)
  • Fixed issue where chunk size parameter was incorrectly specified during data copy (#1490)
  • Fixed issue where an empty partition could be chosen as the main worker in singleDatasetMode (#1458)
  • Fixed bug with data repartitioning in LightGBMRanker (#1368)
  • Fixed outdated docs for useSingleDatasetMode (#1562)
  • Refactored LightGBM class structure to improve logging and debugging (#1557)

Vowpal Wabbit ๐Ÿ‡

  • Fixed issues with the saveNativeModel for the VWRegressionModel #1364 (#1366)
  • Fixed issues with building quadratic interaction terms (#1460)
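Quadratic interaction terms (Vowpal Wabbit's `-q` option) cross every feature of one namespace with every feature of another. The sketch below builds the crossed features explicitly with readable names; real VW hashes each pair into a weight index instead of materializing names, so treat this as a conceptual illustration with hypothetical namespaces.

```python
from itertools import product
from typing import Dict


def quadratic(ns_a: Dict[str, float], ns_b: Dict[str, float]) -> Dict[str, float]:
    """Cross every feature in namespace a with every feature in namespace b."""
    return {f"{fa}*{fb}": va * vb
            for (fa, va), (fb, vb) in product(ns_a.items(), ns_b.items())}


user = {"age": 0.5, "income": 2.0}
ad = {"ad1": 1.0, "ad2": 3.0}
print(quadratic(user, ad))
```

With `n` and `m` features in the two namespaces, this produces `n * m` interaction features, which is why VW relies on hashing for large namespaces.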

Isolation Forests 🌲

Additional Updates

Maintenance 🔧

  • Removed unused debugging code (#1546)
  • Remove Synapse test exclusion for Explanation Dashboard notebook (#1531)
  • Made python style checks verbose (#1532)
  • Fixed library checking while installing library on Databricks cluster (#1488)
  • Upgraded and fixed Dockerfiles (#1472)
  • Added Developer Docker Image build to pipeline (#1480)
  • Fixed ADO area path in Issue Linker (#1464)
  • Fix master version badge display
  • Improved Databricks error reporting
  • Updated azure cli to stop build errors
  • Fixed SSL handshake flakiness
  • Added itsdangerous as a dependency to ADB tests (#1412)
  • Turned on debug for pr to work item workflow
  • Pointed pr linker to official implementation
  • Changed GitHub action trigger from pull_request_target to pull_request (#1413)
  • Fixed issue where Unit Tests were not executing (#1409)…

SynapseML v0.9.5

12 Jan 22:42
79d92d3

SynapseML

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.9.5 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that's usable across Python, R, Scala, and Java.

Highlights

  • Geospatial Intelligence: Large-scale map and geocoding operations.
  • Multivariate Anomaly Detection: Build custom time series anomaly detection systems.
  • Responsible AI at Scale: Distributed Conditional Expectation and Partial Dependence Analysis.
  • Text To Speech: Easy-to-use Neural Text to Speech for large datasets.
  • Healthcare Analytics: Quickly understand entities and relationships in corpora of medical text.

New Features

Geospatial Intelligence 🗺️

  • Added support for distributed geospatial queries backed by the Azure Maps API
  • Added the geospatial usage overview (#1339)
  • Explore how to use the geospatial intelligence services to analyze flood risks. (#1339)
  • Added the AddressGeocoder transformer to map informal addresses to standardized addresses with latitude and longitude (#1294)
  • Added the ReverseGeocoder transformer to map latitude and longitude measurements to standardized addresses. (#1339)
  • Added the CheckPointInPolygon, to detect if latitude and longitude queries lie inside regions of interest (#1339)
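CheckPointInPolygon sends its queries to the Azure Maps service; the underlying geometric test can be illustrated locally with the classic even-odd ray-casting algorithm. This sketch treats coordinates as planar (it ignores Earth curvature and antimeridian wrap-around) and uses a hypothetical polygon, so it is a conceptual illustration rather than what the service computes.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: cast a ray eastward and count how many edges it crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


region = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_polygon((5.0, 5.0), region), point_in_polygon((15.0, 5.0), region))
```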

Azure Cognitive Services for Big Data 🧠

  • Added the Healthcare Analytics Transformer for extracting medical information, entities, and relationships from text (#1329)
  • Added the FitMultivariateAnomaly estimator for training custom anomaly detection models on DataFrames of multivariate time series data (#1272)
  • Added example notebook for Multivariate Anomaly Detector
  • See how to train a custom Multivariate Anomaly detector in the Estimators reference docs (#1323)
  • Added simplified Text Analytics transformers that support auto-batching (#1329)
  • Added the TextToSpeech Transformer for transforming DataFrames of text to audio files with neural voice synthesis (#1320)
  • Added the TextAnalyze transformer to support executing multiple text analytics workloads within a single API call (#1267, #1312)

Responsible AI at Scale 😇

  • Added Individual Conditional Expectation explanations and Partial Dependence Plots with the ICETransformer. This tool gives detailed explanations of how features in opaque-box models affect the model prediction. (#1284)
  • Learn about how to use the ICETransformer through an example with the Adult Census dataset

MLFlow 🔃

  • Add MLFlow support for saving and loading SynapseML models (#1277)

LightGBM on Spark 🌳

  • Improved LightGBM training performance 4x-10x by setting num_threads to be cores-1 (#1282)
  • Added the predict_disable_shape_check in LightGBM (#1273)
  • Reduced temporary file bloat by creating the LightGBM native temp directory lazily (#1326)
  • Added logging for number of columns and rows when creating datasets, set useSingleDatasetMode=True by default (#1222)

Infrastructure ๐Ÿญ

  • SynapseML now installable from Maven Central!
  • SynapseML now supports spark v3.2.x

Additional Updates

Bug Fixes ๐Ÿž

  • Allowed FlattenBatch to propagate non-array values (#1286)
  • Fixed flaky tests (#1342)
  • Fixed website bugs and migrated docSearch (#1331)
  • Fixed issue where IsolationForestModel does not properly exchange params with the inner model (#1330)
  • Corrected the objective param when using fobj (#1292)
  • Fixed issue where broadcasted sum in breeze 1.0 breaks in Spark 3.2.0 (#1299)
  • Hotfixes for R test runners (#1283)
  • fix installation instruction (#1268)
  • Removing broadcast hint (#1255)
  • fix install instructions (#1259)

Build ๐Ÿญ

  • bump algoliasearch-helper from 3.6.1 to 3.6.2 in /website (#1270)
  • remove some deps that cause sec issues (#1264)

Documentation 📘

  • Fixed broken link to CyberML notebook (#1322)
  • Added website announcement bar (#1263)
  • Updated and improved readme (#1262)
  • Removed references to runme in contributing.md
  • Supported Math expressions in website markdown (#1278)
  • Corrected Synapse typo in website (#1335)

Maintenance 🔧

  • Stopped lightGBM tests from timing out (#1315)
  • Fixed r test flakiness (#1314)
  • Updated VerifyLightGBMClassifier.scala (#1313)
  • Update speech SDK test results
  • Add in missing tests in build (#1300)
  • Fix flaky build steps (#1298)
  • Fix website telemetry (#1261)
  • Add website telemetry (#1260)
  • Added missing test classes to pipeline

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:

  • Serena Ruan: Serena is an engineer on the Azure Synapse team in Beijing. In this release, Serena has continued her unbelievable speed of contributions with support for Multivariate Anomaly Detection, MLFlow, and installation from Maven Central. These contributions are just a few of the many projects Serena has contributed since she joined just a few months ago!
  • Ilya Matiach: Ilya is a prolific engineer on the Azure Machine Learning Boston team working on responsible AI. Ilya contributed LightGBM on Spark and worked tirelessly to improve and support this feature. Ilya has been an active contributor to the SynapseML project for 5 years and has built many of the tools in the library.
  • Sudhindra Kovalam: Sudhindra is an engineer on the Microsoft Maps team and has contributed intelligent geospatial APIs to SynapseML v0.9.5. Sudhindra developed new ways to automate generation of Spa...