01 Sep 18:59

KeerthiYandaOS

f54275f

v0.11.2-spark3.4 Pre-release

What's Changed

fix: modified the search engine in the demo notebook to bing by @sherylZhaoCode in #2013
build: bump semver from 5.7.1 to 5.7.2 in /website by @dependabot in #2012
docs: add prerequisites - openai and cognitive services resources by @JessicaXYWang in #2008
docs: update notebooks - bring back fabric reviewers changes. by @JessicaXYWang in #1979
fix: docker link by @niehaus59 in #2019
docs: Refactor docs and docgen framework by @mhamilton723 in #2021
chore: bump databricks e2e timeout by @mhamilton723 in #2024
docs: add dead link checker by @mhamilton723 in #2022
docs: fix broken links by @mhamilton723 in #2025
docs: continue fixing broken links by @mhamilton723 in #2026
docs: fix broken links by @mhamilton723 in #2027
docs: fix broken link by @mhamilton723 in #2032
docs: add QandA notebook. by @aydan-at-microsoft in #2029
build: bump actions/checkout from 2 to 3 by @dependabot in #2030
docs: variable formatting for QandA nb by @aydan-at-microsoft in #2033
fix: Fix ONNX link by @iemejia in #2035
fix: Improve LGBM exception and logging by @svotaw in #2037
docs: fix broken links by @JessicaXYWang in #2042
docs: initial POC of Jessica's fabric doc generator by @mhamilton723 in #2023
fix: improve docgen by @eisber in #2043
feat: Support langchain transformer on fabric by @lhrotk in #2036
chore: remove secret scanner by @mhamilton723 in #2048
fix: Fix problem with empty partition assigned to validation data by @svotaw in #2059
chore: Adding Spark34 support by @KeerthiYandaOS in #2052

New Contributors

@aydan-at-microsoft made their first contribution in #2029

Full Changelog: v0.11.2...v0.11.2-spark3.4

Contributors

iemejia, svotaw, and 9 other contributors

10 Jul 22:45

mhamilton723

v0.11.2

1e80aa1

SynapseML v0.11.2

Bug Fixes 🐞

make geospatial services robust to 404 thrown by the service (#2007)
don't retry 4XX codes other than 429 (#2005)
improve tolerance for error handling (#1991)
Some tidying and build fixes (#1984)
fix import error on using the cognitive services on AML spark clusters (#1951)
Update retry arguments in GeospatialServices - Flooding Risk.ipynb (#1954)
add retries for torchvision download (#1949)
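
The retry fix in #2005 describes a status-code policy: client errors are not retried unless the service signals throttling with 429. A minimal Python sketch of such a policy follows. `should_retry` and `call_with_retries` are hypothetical names used for illustration, retrying 5XX responses is an assumption, and none of this is SynapseML's actual implementation.

```python
import time
from typing import Callable, Sequence, Tuple


def should_retry(status_code: int) -> bool:
    """Hypothetical policy: retry 429 (throttling) and, by assumption, 5XX errors."""
    if status_code == 429:
        return True
    if 400 <= status_code < 500:
        # Other client errors will not succeed on a second attempt.
        return False
    return status_code >= 500


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    backoffs_s: Sequence[float] = (0.1, 0.5, 1.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` once, then once more per backoff while the status is retryable."""
    status, body = send()
    for delay in backoffs_s:
        if not should_retry(status):
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.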

Build 🏭

upload notebooks to storage on every build (#2001)
Publishing to ADO Feed (#1995)
bump ossf/scorecard-action from 2.1.3 to 2.2.0 (#1993)
Update documentprojection to support manifest files and custom formatters/endpoints (#1976)
bump microsoft/gpt-review from 0.9.3 to 0.9.4 (#1962)
bump microsoft/gpt-review from 0.9.2 to 0.9.3 (#1960)
Added module for generating docs from notebooks (#1911)

Documentation 📘

Improved LightGBM docs (#2003)
Add Langchain notebook demo (#2002)
update website synapse info and fabric installation (#2000)
update OpenAI notebook for acrolinx (#1999)
Add new LightGBM streaming docs (#1992)
Update Vowpal Wabbit - Multi-class classification.ipynb (#1971)
Update Vowpal Wabbit - Overview.ipynb (#1972)
Update Vowpal Wabbit - Contextual Bandits.ipynb (#1970)
Update Vowpal Wabbit - Classification using VW-native Format.ipynb (#1969)
Update Vowpal Wabbit - Classification using SparkML Vector.ipynb (#1968)
update spark33 installation instruction readme (#1961)
remove cell output - sentiment analysis quickstart (#1932)
improve and organize openai docs (#1937)

Features 🌈

Reference dataset (#1977)
Add Langchain Transformation (#1925)
OrthogonalForestDML (#1873)

Maintenance 🔧

bump to v0.11.2 (#2011)
make it so custom versions are possible (#1998)
fix spark_install for R tests (#1994)
retry conda install and add a timeout (#1983)
exclude failing explanation dashboard notebook (#1982)
fix broken tests (#1981)
add timeouts for R tests (#1963)
add internal to find_secret (#1948)
fix flakiness in spark installation (#1944)
update R-setup.md docs (#1946)
Add Azure Open AI based PR Summarization (#1957)
fix gpu notebook protobuf version (#1959)
A base scrubber and a "Shared Access Signature" Scrubber (#1939)
remove old notebooks from website (#1934)
no-op commit to ensure no double-releasing of library

Performance Improvements 🚀

Update OpenAI Embedding with latest embedding model (#1938)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

10 Jul 23:15

mhamilton723

v0.11.2-spark3.3

bab43c8

v0.11.2-spark3.3 Pre-release

chore: bump to spark 3.3.1

04 May 20:02

mhamilton723

v0.11.1-spark3.3

5656c3c

v0.11.1-spark3.3 Pre-release

chore: make it so custom versions are possible

24 Apr 23:13

mhamilton723

v0.11.1

866261c

SynapseML v0.11.1

Bug Fixes 🐞

set default values for aadToken & url for internal Synapse (#1918)
ONNX model shape inference cannot handle batch with shape [-1] (#1906)
forgot to add getPValue to python side (#1909)
generate random dir for each test (#1908)
add back diagnosticsInfo for MVAD (#1892)
DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
fix date parsing in FaceSuite test (#1896)
fix Build pipeline (#1904)
Retry OnnxHub call to improve test reliability (#1889)
Normalize line-endings (#1883)
Remove case matching for erased generic type (#1880)
fix bug #1869, DML .setFitIntercept should be set to true (#1876)
Remove extraneous "Foo" type from Py codegen (#1867)
Allow variable size in ONNX inputs (#1851)
Abstain from CodeQL for markdown-only changes (#1865)
fix style
update OpenAIEmbedding internalServiceType

Build 🏭

bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
bump webpack from 5.75.0 to 5.76.1 in /website (#1870)

Documentation 📘

Fix installation instruction in the webpage for the build.sbt file (#1921)
note discrete treatment data type (#1905)
add custom chatbot creation to form demo (#1888)
add overview page for simple DNN and fix some typos (#1879)
Fix a typo in installation docs
fix link issue in CONTRIBUTING.md (#1864)
fix a few issues in cognitive service demo (#1861)

Features 🌈

add streaming API for MVAD (#1893)
[DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
support new api version of form recognizer (#1882)
Add a new function to DMLModel, getPValue (#1863)
update default internal endpoint for cog services (#1859)

Maintenance 🔧

bump to v0.11.1 (#1933)
Adding telemetry for the dataset metadata. This one is specially for … (#1917)
fix r tests (#1927)
fix build issues (#1916)
disable test until Synapse is fixed (#1915)
add .bloop to .gitignore (#1897)
clean up old/missed search indexes in SearchWriterSuite (#1901)
Add utility to clean azure search indexes
update website docs to point to correct developer API docs (#1877)
Update pipeline.yaml for Azure Pipelines (#1866)
make sure nightly build has new commit

Changes:

866261c chore: bump to v0.11.1 (#1933)
3c09702 chore: Adding telemetry for the dataset metadata. This one is specially for … (#1917)
0d0d10c feat: add streaming API for MVAD (#1893)
1b71c1d chore: fix r tests (#1927)
0df97ad chore: fix build issues (#1916)
78695fb Update Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb (#1922)
87d5bc5 docs: Fix installation instruction in the webpage for the build.sbt file (#1921)
8320b2b fix: set default values for aadToken & url for internal Synapse (#1918)
4912ae4 chore: disable test until Synapse is fixed (#1915)
469445b fix: ONNX model shape inference cannot handle batch with shape [-1] (#1906)

3fa001e build: bump peter-evans/create-or-update-comment from 2 to 3 (#1907)
f51327e Update LightGBM version to 3.3.5 (#1910)
b1e584e fix: forgot to add getPValue to python side (#1909)
a09a6f7 docs: note discrete treatment data type (#1905)
0fa3f2a fix: generate random dir for each test (#1908)
736c317 fix: add back diagnosticsInfo for MVAD (#1892)
13afff6 fix: DML run get timeout if big dataset has more feature columns (Workaround Synapse Spark optimizer issue) (#1903)
7546e7f build: bump ossf/scorecard-action from 2.1.2 to 2.1.3 (#1898)
f227f02 fix: fix date parsing in FaceSuite test (#1896)
0f02626 fix: fix Build pipeline (#1904)
ce9fe41 chore: add .bloop to .gitignore (#1897)
7ffa970 chore: clean up old/missed search indexes in SearchWriterSuite (#1901)
9a6cf03 chore: Add utility to clean azure search indexes
52919ce fix: Retry OnnxHub call to improve test reliability (#1889)
979c629 feat: [DistributionBalanceMeasure] Add implementation + unit tests for custom reference distribution (#1885)
412620a docs: add custom chatbot creation to form demo (#1888)
9f634a6 feat: Add ChatGPT through the OpenAIChatCompletion transformer (#1887)
7657089 fix: Normalize line-endings (#1883)
c156792 feat: support new api version of form recognizer (#1882)
ed842a5 docs: add overview page for simple DNN and fix some typos (#1879)
87e1c78 fix: Remove case matching for erased generic type (#1880)
cd72bc9 build: bump amannn/action-semantic-pull-request from 5.1.0 to 5.2.0 (#1878)
564d047 fix: fix bug #1869, DML .setFitIntercept should be set to true (#1876)
392dbbf chore: update website docs to point to correct developer API docs (#1877)
129abde build: bump @sideway/formula from 3.0.0 to 3.0.1 in /website (#1874)
4d1c560 build: bump webpack from 5.75.0 to 5.76.1 in /website (#1870)
62c79d8 docs: Fix a typo in installation docs
1f63dab feat: Add a new function to DMLModel, getPValue (#1863)
83f8260 fix: Remove extraneous "Foo" type from Py codegen (#1867)
a5bec45 fix: Allow variable size in ONNX inputs (#1851)
23c9b0a chore: Update pipeline.yaml for Azure Pipelines (#1866)
dedcbda docs: fix link issue in CONTRIBUTING.md (#1864)
a7f31d5 fix: Abstain from CodeQL for markdown-only changes (#1865)
a5f38b1 Update DoubleMLEstimator test CI verification (#1862)
a44f917 fix: fix style
cc931af fix: update OpenAIEmbedding internalServiceType
424d586 feat: update default internal endpoint for cog services (#1859)
e4a0e2c docs: fix ...

05 Mar 13:37

mhamilton723

v0.11.0

7b23764

SynapseML v0.11.0

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.11.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights


| ChatGPT and GPT-4 at Scale | Simple Deep Learning | LightGBM v2 |
| --- | --- | --- |
| Intelligent chat and embeddings. Simplified Prompting APIs. | Train custom image and text classifiers with ease | Higher performance, >10x lower memory footprint, same API |
| View Notebook | Learn More | Try an example |


| ONNX Model Hub | Causal Learning | Vowpal Wabbit v2 |
| --- | --- | --- |
| Embed >150 state of the art deep networks into your pipelines | Discover and measure causal treatment effects | New second generation integration |
| Learn More | View Docs | Explore Samples |

New Features

General ✨

R Support is no longer Beta! (#1586)
Support for Spark 3.2.3

Open AI 🤖

Add OpenAI Prompt Template support (#1843)
Add Azure OpenAI embedding support (#1832)
Add Azure Active Directory authentication for OpenAI (#1829)
Add Null-value handling for OpenAI models (#1854)

Deep Learning 🕸

Remove CNTK functionality and replace with ONNX (#1593)
Add the DeepTextClassifier, a simple API for fine-tuning a wide array of Hugging Face 🤗 text transformers using PyTorch Lightning (#1591)
Add the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Azure Cognitive Services for Big Data 🧠

Add SpeakerEmotionInference transformer to generate emotion annotation tags for emotive reading in SpeechToText (#1691)
Add new AnalyzeText API (#1760)
Support Azure Active Directory (AAD) authentication for the cognitive services (#1778, #1797)
Move different cognitive services into sub packages (#1746)
Add audiobook generation example (#1852)
Add a notebook for advanced cognitive service usage (#1825)
Upgrade MVAD to v1.1 (#1788)
Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
Add word-level timing to SpeechToTextSDK and ConversationTranscription (#1801)
Add the descriptionExcludes parameter to AnalyzeImage (#1590)

Causal Learning 📈

Add the causal DoubleMLEstimator for learning causal treatment effects from data (#1715)
Add a DoubleMLEstimator document and sample notebook (#1730)
Fix DML regression bug, should remove both treatment and outcome columns as feature columns (#1820)
Add TreatmentCol type checking (#1816)
Update test to validate ATE value should be positive (#1821)
Fix issue with missing causal test coverage (#1799)

LightGBM 🌳

Add LightGBM streaming execution mode for more reliable performance with orders of magnitude less memory. (#1580)
Add maxNumClasses param to LightGBMClassifier for multi-class (#1841)
Added the passThroughArgs feature which allows users to set low level LGBM parameters before they are wrapped in SparkML (#1749)

Vowpal Wabbit 🐇

Vowpal Wabbit v2 (#1579):
- Support Vowpal Wabbit input format using VowpalWabbitGeneric model
- Support additional algorithms & label types (multi-class, cost sensitive one against all): sample notebook
- Progressive validation (aka 1-step ahead) using VowpalWabbitGenericProgressive
- New Contextual Bandit Offline Policy Evaluation Notebook
- Data parallel training independent of cluster size
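
Progressive validation, listed above, scores each example with the model as it stood before learning from that example, so every prediction is effectively out-of-sample. A minimal sketch of the idea in plain Python, assuming a toy running-mean regressor; `RunningMean` and `progressive_validation` are hypothetical names and do not reflect the VowpalWabbitGenericProgressive API.

```python
from typing import Iterable, List, Tuple


class RunningMean:
    """Toy online model (an assumption for illustration): predicts the mean label seen so far."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def predict(self, x: float) -> float:
        # Ignores x; returns 0.0 until at least one label has been seen.
        return self.total / self.count if self.count else 0.0

    def learn(self, x: float, y: float) -> None:
        self.total += y
        self.count += 1


def progressive_validation(
    model: RunningMean, stream: Iterable[Tuple[float, float]]
) -> Tuple[List[float], float]:
    """Return the 1-step-ahead predictions and their mean squared error."""
    predictions: List[float] = []
    squared_error = 0.0
    for x, y in stream:
        y_hat = model.predict(x)  # predict BEFORE the model sees this label
        predictions.append(y_hat)
        squared_error += (y - y_hat) ** 2
        model.learn(x, y)  # only then update the model
    n = len(predictions)
    return predictions, (squared_error / n if n else 0.0)
```

Because the model is updated only after each prediction is recorded, the resulting error estimate needs no separate hold-out set.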

Additional Updates

Bug Fixes 🐞

Support grayscale images in toNDArray (#1592)
Adjust learning rate in VW example notebook (#1853)
Correct copy/paste error in acr cleanup (#1838)
Fix synapse test config, and isolation forest notebook (#1833)
Add spark config to fix ArrayStoreException (#1757)
Fix breeze NoSuchMethodError (#1807)
Fix modelVersion param in TextAnalytics (#1756)
Make logging infrastructure consistent and add logging checks (#1755)
Fix website sidebars and vulnerabilities in packages (#1753)
Remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
Update isolation forest notebook (#1696)
Remove error on invalid columns in DropColumns (#1695)
Fix PyArrow failure in deeplearning test (#1689)
Fix linked service setters on cog service base class (#1685)
KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
Fix flaky translate tests (#1643)
Fix speechToTextSuite serialization Fuzzing failure (#1626)
Fix translator endpoint and update all endpoints for gov regions (#1623)
Binder runtime issues (#1598)
Clean up cluster if Databricks tests pass ([#1599](https://github....

Contributors

nightscape, svotaw, and 20 other contributors

22 Nov 14:30

mhamilton723

v0.10.2

cd1d2ea

SynapseML v0.10.2

Bug Fixes 🐞

remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
remove synapse E2E testing exclusion - cyber ml (#1699)
update isolation forest notebook (#1696)
don't throw on invalid columns in DropColumns (#1695)
fix pyarrow failure in deeplearning test (#1689)
fix linked service on cog service base (#1685)
fix Uplift Modelling style
KernelSHAP throws error when the key type in the ZipMap output is LongType (#1656)
fix flaky translate tests (#1643)
update ubuntu to 20.04 in pipeline (#1624)

Build 🏭

bump actions/checkout from 2 to 3 (#1737)
bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)

Documentation 📘

update developer readme instruction on python env creation (#1693)
fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
improve error msg to make it clearer for users and fix typos (#1662)
simplify data downloading and add mlflow to uplift modelling (#1659)
move magic command forward since it restarts interpreter
remove unused docs and fix links
improve example notebooks
add aisample uplift modelling (#1640)
fix command to launch jupyter notebook (#1649)
add mlflow in ai samples time series forecasting (#1645)
add mlflow logging and loading (#1641)
update spark version in Readme
improve readme overview
add aisample on text classification (#1617)

Features 🌈

add simple deep learning text classifier (#1591)
Add SpeakerEmotionInference transformer for generating SSML t… (#1691)
Deprecate CNTK objects (#1712)
Remove CNTK functionality and replace with ONNX (#1593)
R test generation (#1586)

Maintenance 🔧

bump version to 0.10.2 (#1738)
fix style (#1736)
automate clean-acr with github action workflow (#1735)
autodelete old models (#1729)
Making secrets optional and cached (#1726)
add secret scanning infrastructure (#1724)
Move new ImageFeaturizer to onnx namespace (#1711)
ScalaStyle fixes (#1716)
update scalatest and scalactic (#1706)
remove synapse test exclusions (#1698)
pin az and python versions (#1705)
fix ado integration (#1704)
remove notebooks (#1703)
fix reopen comment action
fix reopen on comment workflow
fix typo in issue reopen yaml
re open github issues after a comment (#1676)
clean up github workflows and add issue label remover (#1674)
turn off failing synapse tests temporarily (#1658)
added synapse-internal to platform detector function (#1651)
publish test jars
improve test coverage (#1631)
Remove MVAD's dependence on hardwired credentials and azure SDKs (#1629)
clean up TextAnalytics cog service APIs (#1622)

Testing 💚

Additional E2E testing infrastructure (#1727)
Improve ONNXtests reliability (#1713)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

cd1d2ea chore: bump version to 0.10.2 (#1738)
fd78889 build: bump actions/checkout from 2 to 3 (#1737)
c806ba7 chore: fix style (#1736)
e6b5a90 feat: add simple deep learning text classifier (#1591)
1de2d55 chore: automate clean-acr with github action workflow (#1735)
952d1bd clarify date comparisons when deleting old models/groups (#1733)
6ea02bd chore: autodelete old models (#1729)
8b02e1d chore: Making secrets optional and cached (#1726)
c62c6ad test: Additional E2E testing infrastructure (#1727)
aeb2ff7 feat: Add SpeakerEmotionInference transformer for generating SSML t… (#1691)

0b96cc5 chore: add secret scanning infrastructure (#1724)
2a7a67b feat: Deprecate CNTK objects (#1712)
e38e3ad chore: Move new ImageFeaturizer to onnx namespace (#1711)
0ff6802 test: Improve ONNXtests reliability (#1713)
fe4c5d2 chore: ScalaStyle fixes (#1716)
050b541 build: bump loader-utils from 2.0.2 to 2.0.3 in /website (#1709)
f2e88fd feat: Remove CNTK functionality and replace with ONNX (#1593)
abdfe19 fix: remove Vowpal Wabbit exclusion, add Interpretability exclusion (#1708)
6a1f994 chore: update scalatest and scalactic (#1706)
144674f chore: remove synapse test exclusions (#1698)
32c654b chore: pin az and python versions (#1705)
c8fba28 chore: fix ado integration (#1704)
92d4095 chore: remove notebooks (#1703)
a953780 fix: remove synapse E2E testing exclusion - cyber ml (#1699)
b257c70 fix: update isolation forest notebook (#1696)
9120b05 using predictionCol for isolation forest (#1686) [ #1060 ]
448f6b7 Remove trident.mlflow APIs. (#1687)
f4af33f fix: don't throw on invalid columns in DropColumns (#1695)
c531bbb docs: update developer readme instruction on python env creation (#1693)
467e651 build: bump amannn/action-semantic-pull-request from 5.0.1 to 5.0.2 (#1688)
302831f fix: fix pyarrow failure in deeplearning test (#1689)
e857511 fix: fix linked service on cog service base (#1685)
f29318a build: bump amannn/action-semantic-pull-request from 4 to 5.0.1 (#1680)
50ac0c8 Update reopen-issue-on-comment.yml
c9278b5 chore: fix reopen comment action
b3a9ba9 chore: fix reopen on comment workflow
9fe273b chore: fix typo in issue reopen yaml
a7c50de chore: re open github issues after a comment (#1676)
8914750 chore: clean up github workflows and add issue label remover (#1674)
965231a docs: fix multiple typos and update error hintings in ai-samples-timeseries notebook (#1663)
4fa7249 docs: improve error msg to make it clearer for users and fix typos (#1...

23 Aug 03:41

mhamilton723

v0.10.1

0f54bc6

SynapseML v0.10.1

Bug Fixes 🐞

fix speechToTextSuite serializationFuzzing failure (#1626)
fix translator endpoint and update all endpoints for gov regions (#1623)
binder runtime issues (#1598)
clean up cluster if databricks tests pass (#1599)
fix deep-learning test flakiness (#1600)
update dotnetTestBase assembly version (#1601)
fix flaky forms test (#1584)

Build 🏭

bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
bump actions/setup-node from 2 to 3 (#1610)
bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
bump actions/setup-java from 2 to 3 (#1612)
simplify e2e test pipeline with test matrix

Documentation 📘

add aisample notebooks into community folder (#1606)
add aisample time series forecasting (#1614)
fix .NET logo on website (#1604)
improve OpenAI notebook (#1596)
pin mybinder to v0.10.0 to avoid thrashing
add demo into videos on website (#1581)
update installation guidance of v0.10.0 (#1578)
add more .net samples (#1570)
add dotnet installation & example doc (#1567)
Update issue template

Features 🌈

add stale bot for issues (#1602)
Support grayscale images in toNDArray (#1592)
Add the descriptionExcludes parameter to AnalyzeImage (#1590)
Added the DeepVisionClassifier, a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)

Maintenance 🔧

bump to v0.10.1 (#1628)
deprecate old Text analytics APIs to prepare for refactoring (#1627)
remove deprecated lime APIs (#1620)
update openai service to the official deployment, and disable test due to outage (#1619)
Auto update GitHub actions with dependabot (#1608)
hotfix binder badge
pin binder version for users (#1607)
Bump spark to 3.2.2
bump spark version
Format welcome message with emojis (#1583)
Add welcome message to new PRs/Issues (#1573)
Add GH workflow to label new/reopened issues (#1571)
update website (#1566)

Testing 💚

stabilize unit tests (#1576)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

0f54bc6 chore: bump to v0.10.1 (#1628)
3d0f3f4 chore: deprecate old Text analytics APIs to prepare for refactor (#1627)
2052e13 chore: remove deprecated lime APIs (#1620)
09213b0 fix: fix speechToTextSuite serializationFuzzing failure (#1626)
9f78bf0 fix: fix translator endpoint and update all endpoints for gov regions (#1623)
7e90d19 docs: add aisample notebooks into community folder (#1606)
ac40e5a chore: update openai service to official, and disable test due to outage (#1619)
f54f7f6 docs: add aisample time series forecasting (#1614)
7b4b0e1 build: bump EnricoMi/publish-unit-test-result-action from 1 to 2 (#1609)
43b0d17 build: bump actions/setup-node from 2 to 3 (#1610)

c48a07a build: bump actions/setup-python from 2.3.2 to 4.2.0 (#1611)
b1a331c build: bump actions/setup-java from 2 to 3 (#1612)
78e40cb chore: Auto update github actions with dependabot (#1608)
69d2d20 chore: hotfix binder badge
93d7ccf chore: pin binder version for users (#1607)
c7a61ec fix: binder runtime issues (#1598)
c960c06 docs: fix .NET logo on website (#1604)
28a35b4 fix: clean up cluster if databricks tests pass (#1599)
5a28740 fix: fix deep-learning test flakiness (#1600)
adf1a61 fix: update dotnetTestBase assembly version (#1601)
c659b33 feat: add stale bot for issues (#1602)
05a4202 docs: improve OpenAI notebook (#1596)
e019756 feat: Support gray scale images in toNDArray (#1592)
51beaa0 feat: Add the descriptionExcludes parameter to AnalyzeImage (#1590)
b9ac22a docs: pin mybinder to v0.10.0 to avoid thrashing
1808a0f chore: Bump spark to 3.2.2
8e7d453 build: simplify e2e test pipeline with test matrix
8e34c7b chore: bump spark version
44c8ed5 feat: Added the DeepVisionClassifier a simple API for deep transfer learning and fine-tuning of a variety of vision backbones (#1518)
e4f0883 fix: fix flaky forms test (#1584)
7da5f49 chore: Format welcome message with emojis (#1583)
0e6bb35 Serena/update issue template (#1582)
a6a2718 docs: add demo into videos on website (#1581)
7c34fc4 test: stabilize unit tests (#1576)
49f3a58 chore: Add welcome message to new PRs/Issues (#1573)
4868e8b Add back LightGBM library initialization in booster (#1575)
d427b88 docs: update installation guidance of v0.10.0 (#1578)
55a60c9 docs: add more .net samples (#1570)
39fe2d8 chore: Add GH workflow to label new/reopened issues (#1571)
0febe3c docs: add dotnet installation & example doc (#1567)
db95a10 chore: update website (#1566)

This list of changes was auto generated.

18 Jul 02:50

mhamilton723

v0.10.0

e9986fe

v0.10.0

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.10.0 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, Java, .NET, C#, and F#.

Highlights


| OpenAI Language Models | .NET, C#, and F# Support | Full MLFlow Support | Live Demos in Browser |
| --- | --- | --- | --- |
| Embed 175-billion parameter models into your databases with ease | Use or train any SynapseML model from .NET | Quick and easy MLOps, model management, and autologging | Explore the SynapseML library with zero setup |
| Learn More | Getting Started Guide | Explore the Docs | Run in Browser |

New Features

General ✨

SynapseML now supports .NET, C#, F#, and other .NET ecosystem languages in addition to Scala, Python, and R. Please see our Setup Guide and LightGBM from .NET example for more details. (#1539, #1156, #1443)
SynapseML is now usable from your browser with zero setup using Binder. Quickly explore our demos in Binder. (#1487, #1493)

Azure Cognitive Services for Big Data 🧠

Added OpenAI GPT-3 Sentence Completion Transformer. Use this feature to embed 175-billion parameter language models into distributed pipelines and databases to solve a variety of general purpose NLP tasks across natural language and code. (#1495, #1541)
Added an example of Sentence Completion with GPT-3 (#1564)
Added support for Form Recognizer V3.0 (#1269)
Improved MVAD usability with async training and better data validation (#1477)
Upgraded the univariate anomaly detection version to v1.1-preview (#1440)
Added a multivariate anomaly detection sample notebook (#1365)
Added a Text to Speech example to cognitive service overview (#1350)
Added opinion mining to TextSentiment Models (#1449)
Fixed Azure Maps schemas (#1553)
Removed modelID param validators in FormRecognizerV3 (#1551)
Fixed form recognizer and form ontology learner issues (#1506)
Fixed setServiceName python method in OpenAI (#1498)
Fixed error in Text Analytics Analyze schema
Improved error handling for MVAD (#1448, #1391)
Removed unused concurrency parameter for MVAD (#1383)
Improved robustness of flood risk notebook by adding polling (#1427)

Responsible AI at Scale 😇

Added partial dependence plots (PDP) to allow for understanding how independent variables affect a model's prediction (#1426)
Updated ICE/PDP documentation with PDP-based feature importance and additional examples (#1441, #1352)
Added a notebook for ICE and PDP feature explainers (#1318)
Updated data balance documentation to better describe how it can be used to ensure model fairness (#1540)
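
A partial dependence curve, as described above, can be computed by forcing one feature to each value on a grid for every row and averaging the model's predictions. A minimal pure-Python sketch, assuming rows are dictionaries and the model is any callable; `partial_dependence` is a hypothetical helper for illustration, not the SynapseML API.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over all rows with `feature` forced to each grid value."""
    if not rows:
        raise ValueError("rows must not be empty")
    curve: List[float] = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)  # copy so the caller's data is left untouched
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve
```

Keeping the per-row predictions instead of averaging them would give the individual conditional expectation (ICE) curves for the same feature.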

MLFlow 🔃

Added documentation for MLFlow autologging (#1508)
Added documentation on the SynapseML-MLFlow integration (#1428)

LightGBM on Spark 🌳

Added the ability to pass in generic argument strings to LightGBM enabling many complex parameterizations (#1444)
Added seed parameters to LightGBM (#1387)
Added a method to get LightGBM native model string directly (#1515)
Fixed issue with validation data creation during useSingleDataset mode (#1527)
Fixed multiclass training with initial scores (#1526)
Fixed saving LightGBM model iterations with early stopping (#1497)
Fixed issue where chunk size parameter was incorrectly specified during data copy (#1490)
Fixed issue where an empty partition is chosen as the main worker in singleDatasetMode (#1458)
Fixed bug with data repartitioning in LightGBMRanker (#1368)
Fixed outdated docs for useSingleDatasetMode (#1562)
Refactored LightGBM class structure to improve logging and debugging (#1557)

Vowpal Wabbit 🐇

Fixed issues with the saveNativeModel for the VWRegressionModel #1364 (#1366)
Fixed issues with building quadratic interaction terms (#1460)

Isolation Forests 🌲

Added an Isolation Forest Multivariate Anomaly Detection sample notebook (#1483)

Additional Updates

Maintenance 🔧

Removed unused debugging code (#1546)
Removed Synapse test exclusion for Explanation Dashboard notebook (#1531)
Made Python style checks verbose (#1532)
Fixed library checking while installing library on Databricks cluster (#1488)
Upgraded and fixed Dockerfiles (#1472)
Added Developer Docker Image build to pipeline (#1480)
Fixed ADO area path in Issue Linker (#1464)
Fixed master version badge display
Improved Databricks error reporting
Updated azure cli to stop build errors
Fixed SSL handshake flakiness
Added itsdangerous as a dependency to ADB tests (#1412)
Turned on debug for pr to work item workflow
Pointed pr linker to official implementation
Changed GitHub action trigger from pull_request_target to pull_request (#1413)
Fixed issue where Unit Tests were not executing ([#1409](https://github.com/Microsoft/SynapseML/issu...

Contributors

riserrad, svotaw, and 24 other contributors

12 Jan 22:42

mhamilton723

v0.9.5

79d92d3

SynapseML v0.9.5

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.9.5 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Highlights


| Geospatial Intelligence | Multivariate Anomaly Detection | Responsible AI at Scale | Text To Speech | Healthcare Analytics |
| --- | --- | --- | --- | --- |
| Large-scale map and geocoding operations | Build custom time series anomaly detection systems | Distributed Conditional Expectation and Partial Dependence Analysis | Easy-to-use Neural Text to Speech for large datasets | Quickly understand entities and relationships in corpora of medical text. |

New Features

Geospatial Intelligence 🗺️

Added support for distributed geospatial queries backed by the Azure Maps API
Added the geospatial usage overview (#1339)
Explore how to use the geospatial intelligence services to analyze flood risks. (#1339)
Added the AddressGeocoder transformer to map informal addresses to standardized addresses with latitude and longitude (#1294)
Added the ReverseGeocoder transformer to map latitude and longitude measurements to standardized addresses. (#1339)
Added CheckPointInPolygon to detect whether latitude and longitude queries lie inside regions of interest (#1339)

Azure Cognitive Services for Big Data 🧠

Added the Healthcare Analytics Transformer for extracting medical information, entities, and relationships from text. [Example Usage] (#1329)
Added the FitMultivariateAnomaly estimator for training custom anomaly detection models on DataFrames of multivariate time series data (#1272)
Added example notebook for Multivariate Anomaly Detector
See how to train a custom Multivariate Anomaly detector in the Estimators reference docs (#1323)
Added simplified Text Analytics transformers that support auto-batching (#1329)
Added the TextToSpeech Transformer for transforming DataFrames of text to audio files with neural voice synthesis (#1320)
Added the TextAnalyze transformer to support executing multiple text analytics workloads within a single API call (#1267, #1312)
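
The auto-batching mentioned above can be pictured as grouping input rows into fixed-size batches so that each service request carries several documents. The sketch below shows only that general idea in plain Python; the function name and batch size are hypothetical, and it does not reproduce how the SynapseML transformers batch internally.

```python
from typing import Iterable, Iterator, List


def batch_texts(texts: Iterable[str], max_batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most max_batch_size texts, preserving order."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    # Emit the final, possibly smaller, batch.
    if batch:
        yield batch


docs = ["a", "b", "c", "d", "e"]
for group in batch_texts(docs, max_batch_size=2):
    print(group)
```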

Responsible AI at Scale 😇

Added Individual Conditional Expectation explanations and Partial Dependence Plots with the ICETransformer. This tool gives detailed explanations of how features in opaque-box models affect the model prediction. (#1284)
Learn about how to use the ICETransformer through an example with the Adult Census dataset
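
As a rough illustration of what Individual Conditional Expectation and Partial Dependence compute, the sketch below varies one feature over a grid for each row (one ICE curve per row) and then averages the curves to get the partial dependence. It is plain Python, not the ICETransformer API; the toy model, feature names, and helper names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def ice_curves(model: Callable[[Row], float], rows: Sequence[Row],
               feature: str, grid: Sequence[float]) -> List[List[float]]:
    """For each row, predict while sweeping `feature` over `grid`; other features stay fixed."""
    return [[model({**row, feature: value}) for value in grid] for row in rows]


def partial_dependence(curves: Sequence[Sequence[float]]) -> List[float]:
    """Average the ICE curves point by point."""
    n = len(curves)
    return [sum(curve[j] for curve in curves) / n for j in range(len(curves[0]))]


def toy_model(row: Row) -> float:
    # Hypothetical stand-in for an opaque-box model.
    return 2.0 * row["age"] + row["hours"]


rows = [{"age": 30.0, "hours": 40.0}, {"age": 50.0, "hours": 20.0}]
grid = [20.0, 40.0]

curves = ice_curves(toy_model, rows, "age", grid)
print(curves)                      # one curve per row
print(partial_dependence(curves))  # average effect of "age" across rows
```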

MLFlow 🔃

Add MLFlow support for saving and loading SynapseML models (#1277)

LightGBM on Spark 🌳

Improved LightGBM training performance 4x-10x by setting num_threads to be cores-1 (#1282)
Added the predict_disable_shape_check parameter to LightGBM (#1273)
Reduced temporary file bloat by creating the LightGBM native temp directory lazily (#1326)
Added logging for the number of columns and rows when creating datasets, and set useSingleDatasetMode=True by default (#1222)
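
The "cores-1" rule for num_threads noted above can be sketched as follows. This is an illustration only: the helper name is hypothetical, and clamping the result to at least one thread is an assumption rather than something stated in these notes.

```python
import os
from typing import Optional


def default_num_threads(cores: Optional[int] = None) -> int:
    """Return cores - 1, leaving one core free; never return fewer than 1 (assumed clamp)."""
    if cores is None:
        # Fall back to the local machine's core count when none is given.
        cores = os.cpu_count() or 1
    return max(1, cores - 1)


print(default_num_threads(8))
print(default_num_threads(1))
```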

Infrastructure 🏭

SynapseML now installable from Maven Central!
SynapseML now supports Spark v3.2.x
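
With the package on Maven Central, it can in principle be pulled in through Spark's standard `--packages` mechanism without adding a custom repository. The sketch below only assembles the command-line arguments; the coordinate format `com.microsoft.azure:synapseml_<scala>:<version>` and the Scala 2.12 suffix are assumptions that should be checked against the project's installation instructions, and the helper names are hypothetical.

```python
from typing import List


def synapseml_package(version: str, scala_version: str = "2.12") -> str:
    """Build a Maven coordinate string (assumed format; verify against the README)."""
    return f"com.microsoft.azure:synapseml_{scala_version}:{version}"


def spark_submit_args(version: str) -> List[str]:
    """Arguments that ask spark-submit to resolve the package from Maven."""
    return ["spark-submit", "--packages", synapseml_package(version)]


print(" ".join(spark_submit_args("0.9.5")))
```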

Additional Updates

Bug Fixes 🐞

Allowed FlattenBatch to propagate non-array values (#1286)
Fixed flaky tests (#1342)
Fixed website bugs and migrated docSearch (#1331)
Fixed issue where IsolationForestModel does not properly exchange params with the inner model (#1330)
Corrected the objective param when using fobj (#1292)
Fixed issue where broadcasted sum in breeze 1.0 breaks in Spark 3.2.0 (#1299)
Added hotfixes for R test runners (#1283)
Fixed installation instructions (#1268)
Removed broadcast hint (#1255)
Fixed install instructions (#1259)

Build 🏭

Bumped algoliasearch-helper from 3.6.1 to 3.6.2 in /website (#1270)
Removed some dependencies that caused security issues (#1264)

Documentation 📘

Fixed broken link to CyberML notebook (#1322)
Added website announcement bar (#1263)
Updated and improved the README (#1262)
Removed references to runme in contributing.md
Supported Math expressions in website markdown (#1278)
Corrected Synapse typo in website (#1335)

Maintenance 🔧

Stopped LightGBM tests from timing out (#1315)
Fixed R test flakiness (#1314)
Updated VerifyLightGBMClassifier.scala (#1313)
Updated speech SDK test results
Added missing tests to the build (#1300)
Fixed flaky build steps (#1298)
Fixed website telemetry (#1261)
Added website telemetry (#1260)
Added missing test classes to pipeline

Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:


Serena Ruan	Ilya Matiach	Sudhindra Kovalam
Serena is an engineer on the Azure Synapse team in Beijing. In this release, Serena has continued her unbelievable speed of contributions with support for Multivariate Anomaly Detection, MLFlow, and installation from Maven Central. These contributions are just a few of the many projects Serena has contributed since she joined just a few months ago!	Ilya is a prolific engineer on the Azure Machine Learning Boston team working on responsible AI. Ilya contributed LightGBM on Spark and worked tirelessly to improve and support this feature. Ilya has been an active contributor to the SynapseML project for 5 years and has built many of the tools in the library.	Sudhindra is an engineer on the Microsoft Maps team and has contributed intelligent geospatial APIs to SynapseML v0.9.5. Sudhindra developed new ways to automate generation of Spa...

Contributors

nhymxu, martin0258, and 19 other contributors


Releases: microsoft/SynapseML
