diff --git a/README.md b/README.md
index 8c8dfad7..cf801d22 100644
--- a/README.md
+++ b/README.md
@@ -6,11 +6,11 @@ ML.NET was originally developed in Microsoft Research and is used across many pr
 
 This package enables training ML.NET pipelines or integrating ML.NET components directly into Scikit-Learn pipelines (it supports `numpy.ndarray`, `scipy.sparse_cst`, and `pandas.DataFrame` as inputs).
 
-Documentation can be found [here](https://docs.microsoft.com/en-us/NimbusML/overview) with additional [notebook samples](https://github.com/Microsoft/NimbusML-Samples).
+Documentation can be found [here](https://docs.microsoft.com/en-us/NimbusML/overview) and additional notebook samples can be found [here](https://github.com/Microsoft/NimbusML-Samples).
 
 ## Installation
 
-`nimbusml` runs on Windows, Linux, and macOS - any platform where 64 bit .NET Core is available. It relies on .NET Core, and this is installed automatically as part of the package.
+`nimbusml` runs on Windows, Linux, and macOS.
 
 `nimbusml` requires Python **2.7**, **3.5**, or **3.6**, 64 bit version only. Python 3.7 is not yet supported.
 
@@ -20,34 +20,57 @@ Install `nimbusml` using `pip` with:
 pip install nimbusml
 ```
 
-`nimbusml` has been tested on Windows 10, MacOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.
+`nimbusml` has been reported to work on Windows 10, MacOS 10.13, Ubuntu 14.04, Ubuntu 16.04, Ubuntu 18.04, CentOS 7, and RHEL 7.
 
 ## Examples
 
 Here is an example of how to train a model to predict sentiment from text samples (based on [this](https://github.com/dotnet/machinelearning/blob/master/README.md) ML.NET example). The full code for this example is [here](https://github.com/Microsoft/NimbusML-Samples/blob/master/samples/2.1%20%5BText%5D%20Sentiment%20Analysis%201%20-%20Data%20Loading%20with%20Pandas.ipynb).
 
 ```python
+from nimbusml import Pipeline, FileDataStream
+from nimbusml.datasets import get_dataset
+from nimbusml.ensemble import FastTreesBinaryClassifier
+from nimbusml.feature_extraction.text import NGramFeaturizer
+
+train_file = get_dataset('gen_twittertrain').as_filepath()
+test_file = get_dataset('gen_twittertest').as_filepath()
+
+train_data = FileDataStream.read_csv(train_file, sep='\t')
+test_data = FileDataStream.read_csv(test_file, sep='\t')
+
 pipeline = Pipeline([ # nimbusml pipeline
-    NGramFeaturizer(columns={'Features': ['SentimentText']}),
-    FastTreeBinaryClassifier(feature=['Features'], label='Sentiment')
+    NGramFeaturizer(columns={'Features': ['Text']}),
+    FastTreesBinaryClassifier(feature=['Features'], label='Label')
 ])
 
 # fit and predict
-pipeline.fit(data)
-results = pipeline.predict(data)
+pipeline.fit(train_data)
+results = pipeline.predict(test_data)
 ```
 
 Instead of creating an `nimbusml` pipeline, you can also integrate components into Scikit-Learn pipelines:
 
 ```python
+from sklearn.pipeline import Pipeline
+from nimbusml.datasets import get_dataset
+from nimbusml.ensemble import FastTreesBinaryClassifier
+from sklearn.feature_extraction.text import TfidfVectorizer
+import pandas as pd
+
+train_file = get_dataset('gen_twittertrain').as_filepath()
+test_file = get_dataset('gen_twittertest').as_filepath()
+
+train_data = pd.read_csv(train_file, sep='\t')
+test_data = pd.read_csv(test_file, sep='\t')
+
 pipeline = Pipeline([ # sklearn pipeline
     ('tfidf', TfidfVectorizer()), # sklearn transform
-    ('clf', FastTreeBinaryClassifier())]) # nimbusml learner
+    ('clf', FastTreesBinaryClassifier()) # nimbusml learner
 ])
 
 # fit and predict
-pipeline.fit(data)
-results = pipeline.predict(data)
+pipeline.fit(train_data["Text"], train_data["Label"])
+results = pipeline.predict(test_data["Text"])
 ```
 
 
@@ -57,11 +80,15 @@ Many additional examples and tutorials can be found in the [documentation](https
 
 ## Building
 
-To build `nimbusml` from source please visit our [developers guide](docs/developers/developer-guide.md).
+To build `nimbusml` from source please visit our [developer guide](docs/developers/developer-guide.md).
 
 ## Contributing
 
-We welcome [contributions](docs/project-docs/contributing.md)!
+The contributions guide can be found [here](docs/project-docs/contributing.md). Given the experimental nature of this project, support will be provided on a best-effort basis. We suggest opening an issue for discussion before starting a PR with big changes.
+
+## Support
+
+If you have an idea for a new feature or encounter a problem, please open an [issue](https://github.com/Microsoft/NimbusML/issues/new) in this repository or ask your question on Stack Overflow.
 
 ## License
 