Releases: microsoft/SynapseML
v0.11.2-spark3.4
What's Changed
- fix: modified the search engine in the demo notebook to bing by @sherylZhaoCode in #2013
- build: bump semver from 5.7.1 to 5.7.2 in /website by @dependabot in #2012
- docs: add prerequisites - openai and cognitive services resources by @JessicaXYWang in #2008
- docs: update notebooks - bring back fabric reviewers changes. by @JessicaXYWang in #1979
- fix: docker link by @niehaus59 in #2019
- docs: Refactor docs and docgen framework by @mhamilton723 in #2021
- chore: bump databricks e2e timeout by @mhamilton723 in #2024
- docs: add dead link checker by @mhamilton723 in #2022
- docs: fix broken links by @mhamilton723 in #2025
- docs: continue fixing broken links by @mhamilton723 in #2026
- docs: fix broken links by @mhamilton723 in #2027
- docs: fix broken link by @mhamilton723 in #2032
- docs: add QandA notebook. by @aydan-at-microsoft in #2029
- build: bump actions/checkout from 2 to 3 by @dependabot in #2030
- docs: variable formatting for QandA nb by @aydan-at-microsoft in #2033
- fix: Fix ONNX link by @iemejia in #2035
- fix: Improve LGBM exception and logging by @svotaw in #2037
- docs: fix broken links by @JessicaXYWang in #2042
- docs: initial POC of Jessica's fabric doc generator by @mhamilton723 in #2023
- fix: improve docgen by @eisber in #2043
- feat: Support langchain transformer on fabric by @lhrotk in #2036
- chore: remove secret scanner by @mhamilton723 in #2048
- fix: Fix problem with empty partition assigned to validation data by @svotaw in #2059
- chore: Adding Spark34 support by @KeerthiYandaOS in #2052
New Contributors
- @aydan-at-microsoft made their first contribution in #2029
Full Changelog: v0.11.2...v0.11.2-spark3.4
SynapseML v0.11.2
v0.11.2
Bug Fixes
- make geospatial services robust to 404 thrown by the service (#2007)
- don't retry 4XX codes other than 429 (#2005)
- improve tolerance for error handling (#1991)
- Some tidying and build fixes (#1984)
- fix import error on using the cognitive services on AML spark clusters (#1951)
- Update retry arguments in GeospatialServices - Flooding Risk.ipynb (#1954)
- add retries for torchvision download (#1949)
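The retry rule from #2005 (retry 429, but no other 4XX code) can be illustrated with a small predicate. This is a minimal sketch, not SynapseML's implementation: the name `should_retry` is hypothetical, and treating 5XX responses as retryable is an assumption that the release notes do not state.

```python
def should_retry(status_code: int) -> bool:
    """Hypothetical helper: decide whether an HTTP response is worth retrying."""
    if status_code == 429:
        # Too Many Requests: the one 4XX code that is retried.
        return True
    if 400 <= status_code < 500:
        # Other client errors (for example 404) will not succeed on a retry.
        return False
    # Assumption: server-side errors (5XX) are treated as transient.
    return 500 <= status_code < 600


if __name__ == "__main__":
    for code in (200, 404, 429, 503):
        print(code, should_retry(code))
```

Keeping the decision in one predicate makes it easy to reuse the same rule across services that share a retry loop.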
Build
- upload notebooks to storage on every build (#2001)
- Publishing to ADO Feed (#1995)
- bump ossf/scorecard-action from 2.1.3 to 2.2.0 (#1993)
- Update documentprojection to support manifest files and custom formatters/endpoints (#1976)
- bump microsoft/gpt-review from 0.9.3 to 0.9.4 (#1962)
- bump microsoft/gpt-review from 0.9.2 to 0.9.3 (#1960)
- Added module for generating docs from notebooks (#1911)
Documentation
- Improved LightGBM docs (#2003)
- Add Langchain notebook demo (#2002)
- update website synapse info and fabric installation (#2000)
- update OpenAI notebook for acrolinx (#1999)
- Add new LightGBM streaming docs (#1992)
- Update Vowpal Wabbit - Multi-class classification.ipynb (#1971)
- Update Vowpal Wabbit - Overview.ipynb (#1972)
- Update Vowpal Wabbit - Contextual Bandits.ipynb (#1970)
- Update Vowpal Wabbit - Classification using VW-native Format.ipynb (#1969)
- Update Vowpal Wabbit - Classification using SparkML Vector.ipynb (#1968)
- update spark33 installation instruction readme (#1961)
- remove cell output - sentiment analysis quickstart (#1932)
- improve and organize openai docs (#1937)
Features
Maintenance
- bump to v0.11.2 (#2011)
- make it so custom versions are possible (#1998)
- fix spark_install for R tests (#1994)
- retry conda install and add a timeout (#1983)
- exclude failing explanation dashboard notebook (#1982)
- fix broken tests (#1981)
- add timeouts for R tests (#1963)
- add internal to find_secret (#1948)
- fix flakiness in spark installation (#1944)
- update R-setup.md docs (#1946)
- Add Azure Open AI based PR Summarization (#1957)
- fix gpu notebook protobuf version (#1959)
- A base scrubber and a "Shared Access Signature" Scrubber (#1939)
- remove old notebooks from website (#1934)
- no-op commit to ensure no double-releasing of library
Performance Improvements
- Update OpenAI Embedding with latest embedding model (#1938)
Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.
v0.11.2-spark3.3
chore: bump to spark 3.3.1
v0.11.1-spark3.3
chore: make it so custom versions are possible
SynapseML v0.11.1
Bug Fixes
- set default values for aadToken & url for internal Synapse (#1918)
- ONNX model shape inference cannot handle batch with shape [-1] (#1906)
- forgot to add getPValue to python side (#1909)
- generate random dir for each test (#1908)
- add back diagnosticsInfo for MVAD (#1892)
- DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
- fix date parsing in FaceSuite test (#1896)
- fix Build pipeline (#1904)
- Retry OnnxHub call to improve test reliability (#1889)
- Normalize line-endings (#1883)
- Remove case matching for erased generic type (#1880)
- fix bug #1869, DML .setFitIntercept should be set to true (#1876)
- Remove extraneous "Foo" type from Py codegen (#1867)
- Allow variable size in ONNX inputs (#1851)
- Abstain from CodeQL for markdown-only changes (#1865)
- fix style
- update OpenAIEmbedding internalServiceType
Build
- bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
- bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
- bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
- bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
- bump webpack from 5.75.0 to 5.76.1 in /website (#1870)
Documentation
- Fix installation instruction in the webpage for the build.sbt file (#1921)
- note discrete treatment data type (#1905)
- add custom chatbot creation to form demo (#1888)
- add overview page for simple DNN and fix some typos (#1879)
- Fix a typo in installation docs
- fix link issue in CONTRIBUTING.md (#1864)
- fix a few issues in cognitive service demo (#1861)
Features
- add streaming API for MVAD (#1893)
- [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
- Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
- support new api version of form recognizer (#1882)
- Add a new function to DMLModel, getPValue (#1863)
- update default internal endpoint for cog services (#1859)
Maintenance
- bump to v0.11.1 (#1933)
- Adding telemetry for the dataset metadata. This one is specially for … (#1917)
- fix r tests (#1927)
- fix build issues (#1916)
- disable test until Synapse is fixed (#1915)
- add .bloop to .gitignore (#1897)
- clean up old/missed search indexes in SearchWriterSuite (#1901)
- Add utility to clean azure search indexes
- update website docs to point to correct developer API docs (#1877)
- Update pipeline.yaml for Azure Pipelines (#1866)
- make sure nightly build has new commit
Changes:
- 866261c chore: bump to v0.11.1 (#1933)
- 3c09702 chore: Adding telemetry for the dataset metadata. This one is specially for … (#1917)
- 0d0d10c feat: add streaming API for MVAD (#1893)
- 1b71c1d chore: fix r tests (#1927)
- 0df97ad chore: fix build issues (#1916)
- 78695fb Update Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb (#1922)
- 87d5bc5 docs: Fix installation instruction in the webpage for the build.sbt file (#1921)
- 8320b2b fix: set default values for aadToken & url for internal Synapse (#1918)
- 4912ae4 chore: disable test until Synapse is fixed (#1915)
- 469445b fix: ONNX model shape inference cannot handle batch with shape [-1] (#1906)
- 3fa001e build: bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
- f51327e Update LightGBM version to 3.3.5 (#1910)
- b1e584e fix: forgot to add getPValue to python side (#1909)
- a09a6f7 docs: note discrete treatment data type (#1905)
- 0fa3f2a fix: generate random dir for each test (#1908)
- 736c317 fix: add back diagnosticsInfo for MVAD (#1892)
- 13afff6 fix: DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
- 7546e7f build: bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
- f227f02 fix: fix date parsing in FaceSuite test (#1896)
- 0f02626 fix: fix Build pipeline (#1904)
- ce9fe41 chore: add .bloop to .gitignore (#1897)
- 7ffa970 chore: clean up old/missed search indexes in SearchWriterSuite (#1901)
- 9a6cf03 chore: Add utility to clean azure search indexes
- 52919ce fix: Retry OnnxHub call to improve test reliability (#1889)
- 979c629 feat: [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
- 412620a docs: add custom chatbot creation to form demo (#1888)
- 9f634a6 feat: Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
- 7657089 fix: Normalize line-endings (#1883)
- c156792 feat: support new api version of form recognizer (#1882)
- ed842a5 docs: add overview page for simple DNN and fix some typos (#1879)
- 87e1c78 fix: Remove case matching for erased generic type (#1880)
- cd72bc9 build: bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
- 564d047 fix: fix bug #1869, DML .setFitIntercept should be set to true (#1876)
- 392dbbf chore: update website docs to point to correct developer API docs (#1877)
- 129abde build: bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
- 4d1c560 build: bump webpack from 5.75.0 to 5.76.1 in /website (#1870)
- 62c79d8 docs: Fix a typo in installation docs
- 1f63dab feat: Add a new function to DMLModel, getPValue (#1863)
- 83f8260 fix: Remove extraneous "Foo" type from Py codegen (#1867)
- a5bec45 fix: Allow variable size in ONNX inputs (#1851)
- 23c9b0a chore: Update pipeline.yaml for Azure Pipelines (#1866)
- dedcbda docs: fix link issue in CONTRIBUTING.md (#1864)
- a7f31d5 fix: Abstain from CodeQL for markdown-only changes (#1865)
- a5f38b1 Update DoubleMLEstimator test CI verification (#1862)
- a44f917 fix: fix style
- cc931af fix: update OpenAIEmbedding internalServiceType
- 424d586 feat: update default internal endpoint for cog services (#1859)
- e4a0e2c docs: fix ...
SynapseML v0.11.0
Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.11.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that's usable across Python, R, Scala, Java, .NET, C#, and F#.
Highlights
ChatGPT and GPT-4 at Scale | Simple Deep Learning | LightGBM v2 |
Intelligent chat and embeddings. Simplified Prompting APIs. | Train custom image and text classifiers with ease | Higher performance, >10x lower memory footprint, same API |
View Notebook | Learn More | Try an example |
ONNX Model Hub | Causal Learning | Vowpal Wabbit v2 |
Embed >150 state of the art deep networks into your pipelines | Discover and measure causal treatment effects | New second generation integration |
Learn More | View Docs | Explore Samples |
New Features
General
- R Support is no longer Beta! (#1586)
- Support for Spark 3.2.3
Open AI
- Add OpenAI Prompt Template support (#1843)
- Add Azure OpenAI embedding support (#1832)
- Add Azure Active Directory authentication for OpenAI (#1829)
- Add Null-value handling for OpenAI models (#1854)
Deep Learning
- Remove CNTK functionality and replace with ONNX (#1593)
- Add the DeepTextClassifier, a simple API for fine-tuning a wide array of Hugging Face text transformers using PyTorch Lightning (#1591)
- Add the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)
Azure Cognitive Services for Big Data
- Add SpeakerEmotionInference transformer to generate emotion annotation tags for emotive reading in SpeechToText (#1691)
- Add new AnalyzeText API (#1760)
- Support Azure Active Directory (AAD) authentication for the cognitive services (#1778, #1797)
- Move different cognitive services into sub packages (#1746)
- Add audiobook generation example (#1852)
- Add a notebook for advanced cognitive service usage (#1825)
- Upgrade MVAD to v1.1 (#1788)
- Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
- Add word-level timing to SpeechToTextSDK and ConversationTranscription (#1801)
- Add the descriptionExcludes parameter to AnalyzeImage (#1590)
Causal Learning
- Add the causal DoubleMLEstimator for learning causal treatment effects from data (#1715)
- Add a DoubleMLEstimator document and sample notebook (#1730)
- Fix DML regression bug, should remove both treatment and outcome columns as feature columns (#1820)
- Add TreatmentCol type checking (#1816)
- Update test to validate ATE value should be positive (#1821)
- Fix issue with missing causal test coverage (#1799)
LightGBM
- Add LightGBM streaming execution mode for more reliable performance with orders of magnitude less memory. (#1580)
- Add maxNumClasses param to LightGBMClassifier for multi-class (#1841)
- Added the passThroughArgs feature which allows users to set low level LGBM parameters before they are wrapped in SparkML (#1749)
Vowpal Wabbit
- Vowpal Wabbit v2 (#1579):
- Support Vowpal Wabbit input format using VowpalWabbitGeneric model
- Support additional algorithms & label types (multi-class, cost sensitive one against all): sample notebook
- Progressive validation (aka 1-step ahead) using VowpalWabbitGenericProgressive
- New Contextual Bandit Offline Policy Evaluation Notebook
- Data parallel training independent of cluster size
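Progressive validation (1-step-ahead evaluation) scores each example with the model as it stood before that example was learned, so every loss is out-of-sample. The sketch below shows the idea with a toy one-weight online learner; it does not use Vowpal Wabbit or VowpalWabbitGenericProgressive, and the function name and learning rate are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def progressive_validation(stream: Iterable[Tuple[float, float]],
                           learning_rate: float = 0.1) -> List[float]:
    """Return the squared loss of each example, measured before learning from it.

    The model is a toy one-weight linear predictor (y ~ w * x) trained by SGD;
    it stands in for a real online learner.
    """
    w = 0.0
    losses = []
    for x, y in stream:
        prediction = w * x                 # 1. predict with the current model
        error = prediction - y
        losses.append(error ** 2)          # 2. record the out-of-sample loss
        w -= learning_rate * error * x     # 3. only then update on the example
    return losses


if __name__ == "__main__":
    data = [(1.0, 2.0)] * 20               # true relationship: y = 2 * x
    losses = progressive_validation(data)
    print(losses[0], losses[-1])
```

Because the prediction is always made before the update, the average of these losses estimates generalization error without a separate held-out set.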
Additional Updates
Bug Fixes
- Support grayscale images in toNDArray (#1592)
- Adjust learning rate in VW example notebook (#1853)
- Correct copy/paste error in acr cleanup (#1838)
- Fix synapse test config, and isolation forest notebook (#1833)
- Add spark config to fix ArrayStoreException (#1757)
- Fix breeze NoSuchMethodError (#1807)
- Fix modelVersion param in TextAnalytics (#1756)
- Make logging infrastructure consistent and add logging checks (#1755)
- Fix website sidebars and vulnerabilities in packages (#1753)
- Remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
- Update isolation forest notebook (#1696)
- Remove error on invalid columns in DropColumns (#1695)
- Fix PyArrow failure in deeplearning test (#1689)
- Fix linked service setters on cog service base class (#1685)
- KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
- Fix flaky translate tests (#1643)
- Fix speechToTextSuite serialization Fuzzing failure (#1626)
- Fix translator endpoint and update all endpoints for gov regions (#1623)
- Fix Binder runtime issues (#1598)
- Clean up cluster if Databricks tests pass ([#1599](https://github....
SynapseML v0.10.2
v0.10.2
Bug Fixes
- remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
- remove synapse E2E testing exclusion - cyber ml (#1699)
- update isolation forest notebook (#1696)
- don't throw on invalid columns in DropColumns (#1695)
- fix pyarrow failure in deeplearning test (#1689)
- fix linked service on cog service base (#1685)
- fix Uplift Modelling style
- KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
- fix flaky translate tests (#1643)
- update ubuntu to 20.04 in pipeline (#1624)
Build
- bump actions/checkout from 2 to 3 (#1737)
- bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
- bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
- bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)
Documentation
- update developer readme instruction on python env creation (#1693)
- fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
- improve error msg to make it clearer for users and fix typos (#1662)
- simplify data downloading and add mlflow to uplift modelling (#1659)
- move magic command forward since it restarts interpreter
- remove unused docs and fix links
- improve example notebooks
- add aisample uplift modelling (#1640)
- fix command to launch jupyter notebook (#1649)
- add mlflow in ai samples time series forecasting (#1645)
- add mlflow logging and loading (#1641)
- update spark version in Readme
- improve readme overview
- add aisample on text classification (#1617)
Features
- add simple deep learning text classifier (#1591)
- Add SpeakerEmotionInference transformer for generating SSML t… (#1691)
- Deprecate CNTK objects (#1712)
- Remove CNTK functionality and replace with ONNX (#1593)
- R test generation (#1586)
Maintenance
- bump version to 0.10.2 (#1738)
- fix style (#1736)
- automate clean-acr with github action workflow (#1735)
- autodelete old models (#1729)
- Making secrets optional and cached (#1726)
- add secret scanning infrastructure (#1724)
- Move new ImageFeaturizer to onnx namespace (#1711)
- ScalaStyle fixes (#1716)
- update scalatest and scalactic (#1706)
- remove synapse test exclusions (#1698)
- pin az and python versions (#1705)
- fix ado integration (#1704)
- remove notebooks (#1703)
- fix reopen comment action
- fix reopen on comment workflow
- fix typo in issue reopen yaml
- re open github issues after a comment (#1676)
- clean up github workflows and add issue label remover (#1674)
- turn off failing synapse tests temporarily (#1658)
- added synapse-internal to platform detector function (#1651)
- publish test jars
- improve test coverage (#1631)
- Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
- clean up TextAnalytics cog service APIs (#1622)
Testing
Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.
Changes:
- cd1d2ea chore: bump version to 0.10.2 (#1738)
- fd78889 build: bump actions/checkout from 2 to 3 (#1737)
- c806ba7 chore: fix style (#1736)
- e6b5a90 feat: add simple deep learning text classifier (#1591)
- 1de2d55 chore: automate clean-acr with github action workflow (#1735)
- 952d1bd clarify date comparisons when deleting old models/groups (#1733)
- 6ea02bd chore: autodelete old models (#1729)
- 8b02e1d chore: Making secrets optional and cached (#1726)
- c62c6ad test: Additional E2E testing infrastructure (#1727)
- aeb2ff7 feat: Add SpeakerEmotionInference transformer for generating SSML t… (#1691)
- 0b96cc5 chore: add secret scanning infrastructure (#1724)
- 2a7a67b feat: Deprecate CNTK objects (#1712)
- e38e3ad chore: Move new ImageFeaturizer to onnx namespace (#1711)
- 0ff6802 test: Improve ONNXtests reliability (#1713)
- fe4c5d2 chore: ScalaStyle fixes (#1716)
- 050b541 build: bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
- f2e88fd feat: Remove CNTK functionality and replace with ONNX (#1593)
- abdfe19 fix: remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
- 6a1f994 chore: update scalatest and scalactic (#1706)
- 144674f chore: remove synapse test exclusions (#1698)
- 32c654b chore: pin az and python versions (#1705)
- c8fba28 chore: fix ado integration (#1704)
- 92d4095 chore: remove notebooks (#1703)
- a953780 fix: remove synapse E2E testing exclusion - cyber ml (#1699)
- b257c70 fix: update isolation forest notebook (#1696)
- 9120b05 using predictionCol for isolation forest (#1686) [ #1060 ]
- 448f6b7 Remove trident.mlflow APIs. (#1687)
- f4af33f fix: don't throw on invalid columns in DropColumns (#1695)
- c531bbb docs: update developer readme instruction on python env creation (#1693)
- 467e651 build: bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
- 302831f fix: fix pyarrow failure in deeplearning test (#1689)
- e857511 fix: fix linked service on cog service base (#1685)
- f29318a build: bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)
- 50ac0c8 Update reopen-issue-on-comment.yml
- c9278b5 chore: fix reopen comment action
- b3a9ba9 chore: fix reopen on comment workflow
- 9fe273b chore: fix typo in issue reopen yaml
- a7c50de chore: re open github issues after a comment (#1676)
- 8914750 chore: clean up github workflows and add issue label remover (#1674)
- 965231a docs: fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
- 4fa7249 docs: improve error msg to make it clearer for users and fix typos (#1...
v0.10.1
SynapseML v0.10.1
Bug Fixes
- fix speechToTextSuite serializationFuzzing failure (#1626)
- fix translator endpoint and update all endpoints for gov regions (#1623)
- binder runtime issues (#1598)
- clean up cluster if databricks tests pass (#1599)
- fix deep-learning test flakiness (#1600)
- update dotnetTestBase assembly version (#1601)
- fix flaky forms test (#1584)
Build
- bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
- bump actions/setup-node from 2 to 3 (#1610)
- bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
- bump actions/setup-java from 2 to 3 (#1612)
- simplify e2e test pipeline with test matrix
Documentation
- add aisample notebooks into community folder (#1606)
- add aisample time series forecasting (#1614)
- fix .NET logo on website (#1604)
- improve OpenAI notebook (#1596)
- pin mybinder to v0.10.0 to avoid thrashing
- add demo into videos on website (#1581)
- update installation guidance of v0.10.0 (#1578)
- add more .net samples (#1570)
- add dotnet installation & example doc (#1567)
- Update issue template
Features
- add stale bot for issues (#1602)
- Support grayscale images in toNDArray (#1592)
- Add the descriptionExcludes parameter to AnalyzeImage (#1590)
- Added the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)
Maintenance
- bump to v0.10.1 (#1628)
- deprecate old Text analytics APIs to prepare for refactoring (#1627)
- remove deprecated lime APIs (#1620)
- update openai service to the official deployment, and disable test due to outage (#1619)
- Auto update GitHub actions with dependabot (#1608)
- hotfix binder badge
- pin binder version for users (#1607)
- Bump spark to 3.2.2
- bump spark version
- Format welcome message with emojis (#1583)
- Add welcome message to new PRs/Issues (#1573)
- Add GH workflow to label new/reopened issues (#1571)
- update website (#1566)
Testing
- stabilize unit tests (#1576)
Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.
Changes:
- 0f54bc6 chore: bump to v0.10.1 (#1628)
- 3d0f3f4 chore: deprecate old Text analytics APIs to prepare for refactor (#1627)
- 2052e13 chore: remove deprecated lime APIs (#1620)
- 09213b0 fix: fix speechToTextSuite serializationFuzzing failure (#1626)
- 9f78bf0 fix: fix translator endpoint and update all endpoints for gov regions (#1623)
- 7e90d19 docs: add aisample notebooks into community folder (#1606)
- ac40e5a chore: update openai service to official, and disable test due to outage (#1619)
- f54f7f6 docs: add aisample time series forecasting (#1614)
- 7b4b0e1 build: bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
- 43b0d17 build: bump actions/setup-node from 2 to 3 (#1610)
- c48a07a build: bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
- b1a331c build: bump actions/setup-java from 2 to 3 (#1612)
- 78e40cb chore: Auto update github actions with dependabot (#1608)
- 69d2d20 chore: hotfix binder badge
- 93d7ccf chore: pin binder version for users (#1607)
- c7a61ec fix: binder runtime issues (#1598)
- c960c06 docs: fix .NET logo on website (#1604)
- 28a35b4 fix: clean up cluster if databricks tests pass (#1599)
- 5a28740 fix: fix deep-learning test flakiness (#1600)
- adf1a61 fix: update dotnetTestBase assembly version (#1601)
- c659b33 feat: add stale bot for issues (#1602)
- 05a4202 docs: improve OpenAI notebook (#1596)
- e019756 feat: Support gray scale images in toNDArray (#1592)
- 51beaa0 feat: Add the descriptionExcludes parameter to AnalyzeImage (#1590)
- b9ac22a docs: pin mybinder to v0.10.0 to avoid thrashing
- 1808a0f chore: Bump spark to 3.2.2
- 8e7d453 build: simplify e2e test pipeline with test matrix
- 8e34c7b chore: bump spark version
- 44c8ed5 feat: Added the DeepVisionClassifier a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)
- e4f0883 fix: fix flaky forms test (#1584)
- 7da5f49 chore: Format welcome message with emojis (#1583)
- 0e6bb35 Serena/update issue template (#1582)
- a6a2718 docs: add demo into videos on website (#1581)
- 7c34fc4 test: stabilize unit tests (#1576)
- 49f3a58 chore: Add welcome message to new PRs/Issues (#1573)
- 4868e8b Add back LightGBM library initialization in booster (#1575)
- d427b88 docs: update installation guidance of v0.10.0 (#1578)
- 55a60c9 docs: add more .net samples (#1570)
- 39fe2d8 chore: Add GH workflow to label new/reopened issues (#1571)
- 0febe3c docs: add dotnet installation & example doc (#1567)
- db95a10 chore: update website (#1566)
This list of changes was auto generated.
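The "Changes:" entries above carry conventional-commit prefixes (fix, feat, docs, chore, build, test) that line up with the section headings of each release. The following is a rough sketch of how such lines could be grouped into sections; it is not the project's release tooling, and the prefix-to-section mapping and the names `SECTIONS` and `group_changes` are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed mapping from commit prefix to release-note section.
SECTIONS = {
    "fix": "Bug Fixes",
    "build": "Build",
    "docs": "Documentation",
    "feat": "Features",
    "chore": "Maintenance",
    "test": "Testing",
}

# "<short sha> <prefix>: <title>", e.g. "0f54bc6 chore: bump to v0.10.1 (#1628)"
LINE = re.compile(r"^(?P<sha>[0-9a-f]{7}) (?P<prefix>[a-z]+): (?P<title>.+)$")


def group_changes(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group change lines by section; lines without a known prefix go to 'Other'."""
    grouped = defaultdict(list)
    for line in lines:
        text = line.strip()
        match = LINE.match(text)
        if match and match.group("prefix") in SECTIONS:
            grouped[SECTIONS[match.group("prefix")]].append(match.group("title"))
        else:
            grouped["Other"].append(text)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "0f54bc6 chore: bump to v0.10.1 (#1628)",
        "09213b0 fix: fix speechToTextSuite serializationFuzzing failure (#1626)",
        "4868e8b Add back LightGBM library initialization in booster (#1575)",
    ]
    for section, titles in group_changes(sample).items():
        print(section, titles)
```

Commits without a recognized prefix fall into an "Other" bucket rather than being dropped, so nothing disappears from the generated notes.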
v0.10.0
Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that's usable across Python, R, Scala, Java, .NET, C#, and F#.
Highlights
OpenAI Language Models | .NET, C#, and F# Support | Full MLFlow Support | Live Demos in Browser |
Embed 175-billion parameter models into your databases with ease | Use or train any SynapseML model from .NET | Quick and easy MLOps, model management, and autologging | Explore the SynapseML library with zero setup |
Learn More | Getting Started Guide | Explore the Docs | Run in Browser |
New Features
General
- SynapseML now supports .NET, C#, F#, and other .NET ecosystem languages in addition to Scala, Python, and R. Please see our Setup Guide and LightGBM from .NET example for more details. (#1539, #1156, #1443)
- SynapseML is now usable from your browser with zero setup using Binder. Quickly explore our demos in Binder. (#1487, #1493)
Azure Cognitive Services for Big Data
- Added OpenAI GPT-3 Sentence Completion Transformer. Use this feature to embed 175-billion parameter language models into distributed pipelines and databases to solve a variety of general purpose NLP tasks across natural language and code. (#1495, #1541)
- Added an example of Sentence Completion with GPT-3 (#1564)
- Added support for Form Recognizer V3.0 (#1269)
- Improved MVAD usability with async training and better data validation (#1477)
- Upgraded the univariate anomaly detection version to v1.1-preview (#1440)
- Added a multivariate anomaly detection sample notebook (#1365)
- Added a Text to Speech example to cognitive service overview (#1350)
- Added opinion mining to TextSentiment Models (#1449)
- Fixed Azure Maps schemas (#1553)
- Removed modelID param validators in FormRecognizerV3 (#1551)
- Fixed form recognizer and form ontology learner issues (#1506)
- Fixed setServiceName python method in OpenAI (#1498)
- Fixed error in Text Analytics Analyze schema
- Improved error handling for MVAD (#1448, #1391)
- Removed unused concurrency parameter for MVAD (#1383)
- Improved robustness of flood risk notebook by adding polling (#1427)
Responsible AI at Scale
- Added partial dependence plots (PDP) to allow for understanding how independent variables affect a model's prediction (#1426)
- Updated ICE/PDP documentation with PDP-based feature importance and additional examples (#1441, #1352)
- Added a notebook for ICE and PDP feature explainers (#1318)
- Updated data balance documentation to better describe how it can be used to ensure model fairness (#1540)
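To make the ICE and PDP items above concrete: an ICE curve sweeps one feature over a grid for a single row while the other features stay fixed, and the partial dependence is the point-wise average of those curves over the dataset. The sketch below is a plain-Python illustration of that definition; it is not the ICETransformer API, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

Row = List[float]


def ice_curves(predict: Callable[[Row], float], rows: Sequence[Row],
               feature: int, grid: Sequence[float]) -> List[List[float]]:
    """One curve per row: predictions as `feature` is swept over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)          # copy so the input row is untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict: Callable[[Row], float], rows: Sequence[Row],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """Average the ICE curves point-wise to get the partial dependence."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]


if __name__ == "__main__":
    def model(r: Row) -> float:
        return 3.0 * r[0] + r[1]          # toy model standing in for a real one

    data = [[0.0, 1.0], [5.0, 3.0]]
    print(partial_dependence(model, data, feature=0, grid=[0.0, 1.0, 2.0]))
    # prints [2.0, 5.0, 8.0]
```

For this toy linear model the partial dependence rises by 3 per unit of the swept feature, which matches its coefficient.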
MLFlow
- Added documentation for MLFlow autologging (#1508)
- Added documentation on the SynapseML-MLFlow integration (#1428)
LightGBM on Spark
- Added the ability to pass in generic argument strings to LightGBM enabling many complex parameterizations (#1444)
- Added seed parameters to LightGBM (#1387)
- Added a method to get LightGBM native model string directly (#1515)
- Fixed issue with validation data creation during useSingleDataset mode (#1527)
- Fixed multiclass training with initial scores (#1526)
- Fixed saving LightGBM model iterations with early stopping (#1497)
- Fixed issue where chunk size parameter was incorrectly specified during data copy (#1490)
- Fixed issue where an empty partition is chosen as the main worker in singleDatasetMode (#1458)
- Fixed bug with data repartitioning in LightGBMRanker (#1368)
- Fixed outdated docs for useSingleDatasetMode (#1562)
- Refactored LightGBM class structure to improve logging and debugging (#1557)
Vowpal Wabbit
- Fixed issues with the saveNativeModel for the VWRegressionModel #1364 (#1366)
- Fixed issues with building quadratic interaction terms (#1460)
Isolation Forests
Additional Updates
Maintenance
- Removed unused debugging code (#1546)
- Remove Synapse test exclusion for Explanation Dashboard notebook (#1531)
- Made python style checks verbose (#1532)
- Fixed library checking while installing library on Databricks cluster (#1488)
- Upgraded and fixed Dockerfiles (#1472)
- Added Developer Docker Image build to pipeline (#1480)
- Fixed ADO area path in Issue Linker (#1464)
- Fix master version badge display
- Improved Databricks error reporting
- Updated azure cli to stop build errors
- Fixed SSL handshake flakiness
- Added itsdangerous as a dependency to ADB tests (#1412)
- Turned on debug for pr to work item workflow
- Pointed pr linker to official implementation
- Changed GitHub action trigger from pull_request_target to pull_request (#1413)
- Fixed issue where Unit Tests were not executing ([#1409](https://github.com/Microsoft/SynapseML/issu...
SynapseML v0.9.5
Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.9.5 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that's usable across Python, R, Scala, and Java.
Highlights
New Features
Geospatial Intelligence
- Added support for distributed geospatial queries backed by the Azure Maps API
- Added the geospatial usage overview (#1339)
- Explore how to use the geospatial intelligence services to analyze flood risks. (#1339)
- Added the AddressGeocoder transformer to map informal addresses to standardized addresses with latitude and longitude (#1294)
- Added the ReverseGeocoder transformer to map latitude and longitude measurements to standardized addresses. (#1339)
- Added CheckPointInPolygon to detect if latitude and longitude queries lie inside regions of interest (#1339)
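The question CheckPointInPolygon answers, whether a coordinate lies inside a region of interest, can be illustrated locally with the classic ray-casting test. This sketch is only a conceptual illustration: the SynapseML transformer is backed by the Azure Maps API rather than local code like this, the function name is hypothetical, and coordinates are treated as planar (an assumption that ignores Earth curvature).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # each crossing flips inside/outside
    return inside


if __name__ == "__main__":
    square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    print(point_in_polygon((2.0, 2.0), square))
    print(point_in_polygon((5.0, 2.0), square))
```

An odd number of crossings means the point is inside; an even number means it is outside.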
Azure Cognitive Services for Big Data
- Added the Healthcare Analytics Transformer for extracting medical information, entities, and relationships for text. [Example Usage] (#1329)
- Added the FitMultivariateAnomaly estimator for training custom anomaly detection models on DataFrames of multivariate time series data (#1272)
- Added example notebook for Multivariate Anomaly Detector
- See how to train a custom Multivariate Anomaly detector in the Estimators reference docs (#1323)
- Added simplified Text Analytics transformers that support auto-batching (#1329)
- Added the TextToSpeech Transformer for transforming Dataframes of text to audio files with neural voice synthesis (#1320)
- Added the TextAnalyze transformer to support executing multiple text analytics workloads within a single API call (#1267, #1312)
Responsible AI at Scale
- Added Individual Conditional Expectation explanations and Partial Dependence Plots with the ICETransformer. This tool gives detailed explanations of how features in opaque-box models affect the model prediction. (#1284)
- Learn about how to use the ICETransformer through an example with the Adult Census dataset
MLFlow
LightGBM on Spark
- Improved LightGBM training performance 4x-10x by setting num_threads to be cores-1 (#1282)
- Added the predict_disable_shape_check in LightGBM (#1273)
- Reduced temporary file bloat by creating the LightGBM native temp directory lazily (#1326)
- Added logging for number of columns and rows when creating datasets, set useSingleDatasetMode=True by default (#1222)
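The thread-count rule from #1282 (num_threads set to cores-1) amounts to leaving one core free for everything else on the worker. A minimal sketch of that rule follows; the helper name and the floor of one thread are assumptions, and how SynapseML actually discovers the number of cores available to a task is not shown in these notes.

```python
import os


def default_num_threads(cores: int) -> int:
    """Leave one core free, but never go below one thread (assumed floor)."""
    return max(1, cores - 1)


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    print(default_num_threads(cores))
```

On an 8-core executor this yields 7 threads, while a single-core machine still gets one thread.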
Infrastructure
- SynapseML now installable from Maven Central!
- SynapseML now supports spark v3.2.x
Additional Updates
Bug Fixes
- Allowed FlattenBatch to propagate non-array values (#1286)
- Fixed flaky tests (#1342)
- Fixed website bugs and migrated docSearch (#1331)
- Fixed issue where IsolationForestModel does not properly exchange params with the inner model (#1330)
- Corrected the objective param when using fobj (#1292)
- Fixed issue where broadcasted sum in breeze 1.0 breaks in Spark 3.2.0 (#1299)
- Hotfixes for R test runners (#1283)
- fix installation instruction (#1268)
- Removing broadcast hint (#1255)
- fix install instructions (#1259)
Build
- bump algoliasearch-helper from 3.6.1 to 3.6.2 in /website (#1270)
- remove some deps that cause sec issues (#1264)
Documentation
- Fixed broken link to CyberML notebook (#1322)
- Added website announcement bar (#1263)
- Updated and improved readme (#1262)
- Removed references to runme in contributing.md
- Supported Math expressions in website markdown (#1278)
- Corrected Synapse typo in website (#1335)
Maintenance
- Stopped lightGBM tests from timing out (#1315)
- Fixed r test flakiness (#1314)
- Updated VerifyLightGBMClassifier.scala (#1313)
- Update speech SDK test results
- Add in missing tests in build (#1300)
- Fix flaky build steps (#1298)
- Fix website telemetry (#1261)
- Add website telemetry (#1260)
- Added missing test classes to pipeline
Contributor Spotlight
We are excited to highlight the contributions of the following SynapseML contributors: