Fix typos everywhere, update contributors list (#83)

vidyap-xgboost · web-flow · commit e439e8027998 · 2020-07-14T12:26:23.000+02:00
* fixed typos and suggested a branch naming convention

* fixed typos in README, CONTRIBUTING, PURPOSE files + updated contributors list
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -25,7 +25,7 @@ Texthero follows an approach known as shift-left testing. According to [Wikipedi
 
 > Shift-left testing is an approach to software testing and system testing in which testing is performed earlier in the lifecycle.
 
-Shift-left testing reduces the number of bugs by attempting to solve the problem at the origin. Often many programming defects are not uncovered and fixed until after significant effort has been wasted on their implementation. Texthero's attempt to avoid this kind of issue.
+Shift-left testing reduces the number of bugs by attempting to solve the problem at the origin. Often many programming defects are not uncovered and fixed until after significant effort has been wasted on their implementation. Texthero attempts to avoid this kind of issue.
 
 
 ## Improve documentation!
@@ -56,7 +56,7 @@ The following link gives some advice on how to submit a successful pull request.
 
 ## Ask questions!
 
-We are there for you! If everything is unclear, just ask. We will do our best to answer you quickly.
+We are there for you! If anything is unclear, just ask. We will do our best to answer you quickly.
 
 ## Propose new ideas!
 
@@ -84,15 +84,15 @@ $ cd scripts
 $ ./tests.sh
 ```
 
-Calling `./test.sh` is equivalent to execute form the _root_ `python3 -m unittest discover -s tests -t .`
+Calling `./tests.sh` is equivalent to executing `python3 -m unittest discover -s tests -t .` from the _root_.
 
 
 **Important.** If you worked on a bug, you should add a test that checks the bug is not present anymore. This is extremely useful as it avoids to re-introduce the same bug again in the future.
 
 
 ### Passing doctests
 
-When executing `./test.sh` it will also check that the Examples in the docstrings are correct (doctests).
+When executing `./tests.sh` it will also check that the Examples in the docstrings are correct (doctests).
 
 Passing doctests might be a bit annoying sometimes. Let's look at this example for instance:
 
@@ -114,7 +114,7 @@ The docstring failed? Why? The reason is that somewhere in the `Example` section
 
 When you submit your code, all code will be tested on different operating systems using Travis CI: [TRAVIS CI texthero](https://travis-ci.com/github/jbesomi/texthero).
 
-Make sure you pass all your test locally before opening a pull request!
+Make sure you pass all your tests locally before opening a pull request!
 
 ## Formatting
 
@@ -182,7 +182,8 @@ $ git checkout -b new-branch
 Try to commit regularly. In addition, whenever possible, group changes into distinct commits. It will be easier for the rest of us to understand what you worked on just by reading the description of your commit.
 
 ```
-$ ...
+$ git add README.md
+$ git commit -m "added README.md"
 ```
 
 1. Test your changes
@@ -200,7 +201,7 @@ The time to submit the PR has come. Head to your forked repository on Github. Th
 
 - `./test.sh`
    - Execute unittests as well as test all doctests
-- `./formath.sh`
+- `./format.sh`
    - format all code with [black](https://github.com/psf/black)
 - `./check.sh`
    - Format the code with black (`format.sh`)
diff --git a/PURPOSE.md b/PURPOSE.md
@@ -1,6 +1,6 @@
 # PURPOSE
 
-This document attempt at defining the purpose of Texthero and it's futures enhancements.
+This document attempts to define the purpose of Texthero and its future enhancements.
 
 ### Motivation
 
@@ -14,7 +14,7 @@ We can decompose the objective of Texthero in two parts:
 
 1. ** Offer an efficient tool to deal with text-based datasets (The texthero python package). Texthero is mainly a teaching tool and therefore easy to use and understand, but at the same time quite efficient and should be able to handle large quantities of data.
 
-2. ** Provide a sustain to newcomers in the NLP word to efficiently learn all the main core topics (tf-idf, text cleaning, regular expression, etc). As there are many other tutorials, the main approach is to redirect users to valuable resources and explain better any missing point. This part is done mainly through the *tutorials* on texthero.org.
+2. ** Provide support to newcomers in the NLP world so they can efficiently learn the main core topics (tf-idf, text cleaning, regular expressions, etc.). As there are many other tutorials, the main approach is to redirect users to valuable resources and better explain any missing points. This part is done mainly through the *tutorials* on texthero.org.
 
 
 ### Channels
@@ -33,23 +33,23 @@ We can decompose the objective of Texthero in two parts:
 
 ### Python package
 
-For future development, is important to have a clear idea in mind of the purpose of Texthero as a python package.
+For future development, it is important to have a clear idea in mind of the purpose of Texthero as a python package.
 
 
 **Package core purpose**
 
 The goal is to extract insights from the whole corpora, i.e collection of document and not from the single element.
 
-Generally, the corpora are composed of a __long__ collection of documents and therefore the require techniques need to be efficient to deal with a large amount of text.
+Generally, the corpora are composed of a __long__ collection of documents and therefore the required techniques need to be efficient to deal with a large amount of text.
 
 **Neural network**
 
 Texthero function (as of now) does not make use of a neural network solution. The main reason is that there is no need for that as there are mature libraries (PyTorch and Tensorflow to name a few).
 
-What Texthero offers is a tool to be used in addition to any other machine learning libraries. Ideally, texthero should be used before applying any "sophisticated" approach to the dataset; to first better understand the underline data before applying any complex model.
+What Texthero offers is a tool to be used in addition to any other machine learning libraries. Ideally, texthero should be used before applying any "sophisticated" approach to the dataset; to first better understand the underlying data before applying any complex model.
 
 
-Note: a text corpus or collection of documents need always to be in form of a Pandas Series. "do that on a text corpus" or "do that on a Pandas Series" refers to the same act.
+Note: a text corpus or collection of documents always needs to be in the form of a Pandas Series. "Do that on a text corpus" and "do that on a Pandas Series" refer to the same act.
 
 **Common usage**:
  - Clean a text Pandas Series
diff --git a/README.md b/README.md
@@ -46,13 +46,13 @@
 
 Texthero is a python toolkit to work with text-based dataset quickly and effortlessly. Texthero is very simple to learn and designed to be used on top of Pandas. Texthero has the same expressiveness and power of Pandas and is extensively documented. Texthero is modern and conceived for programmers of the 2020 decade with little knowledge if any in linguistic. 
 
-You can think of Texthero as a tool to help you _understand_ and work with text-based dataset. Given a tabular dataset, it's easy to _grasp the main concept_. Instead, given a text dataset, it's harder to have quick insights into the underline data. With Texthero, preprocessing text data, map it into vectors, and visualize the obtained vector space takes just a couple of lines.
+You can think of Texthero as a tool to help you _understand_ and work with text-based datasets. Given a tabular dataset, it's easy to _grasp the main concept_. Instead, given a text dataset, it's harder to have quick insights into the underlying data. With Texthero, preprocessing text data, mapping it into vectors, and visualizing the obtained vector space takes just a couple of lines.
 
 Texthero include tools for:
 * Preprocess text data: it offers both out-of-the-box solutions but it's also flexible for custom-solutions.
 * Natural Language Processing: keyphrases and keywords extraction, and named entity recognition.
 * Text representation: TF-IDF, term frequency, and custom word-embeddings (wip)
-* Vector space analysis: clustering (K-means, Meanshift, DBSAN and Hierarchical), topic modeling (wip) and interpretation.
+* Vector space analysis: clustering (K-means, Meanshift, DBSCAN and Hierarchical), topic modeling (wip) and interpretation.
 * Text visualization: vector space visualization, place localization on maps (wip).
 
 Texthero is free, open-source and [well documented](https://texthero.org/docs) (and that's what we love most by the way!). 
@@ -61,9 +61,9 @@ We hope you will find pleasure working with Texthero as we had during his develo
 
 <h2 align="center">Hablas español? क्या आप हिंदी बोलते हैं? 日本語が話せるのか？</h2>
 
-Texthero has been developed for the whole NLP community. We know how hard is to deal with different NLP tools (NLTK, SpaCy, Gensim, TextBlob, Sklearn): that's why we developed Texthero, to simplify things.
+Texthero has been developed for the whole NLP community. We know how hard it is to deal with different NLP tools (NLTK, SpaCy, Gensim, TextBlob, Sklearn): that's why we developed Texthero, to simplify things.
 
-Now, the next main milestone is to provide *multilingual support* and for this big step, we need the help of all of you. ¿Hablas español? Sie sprechen Deutsch? 你会说中文？ 日本語が話せるのか？ Fala português? Parli Italiano? Вы говорите по-русски? If yes or you speak another language not mentioned, then you might help us develop multilingual support! Even if you haven't contributed before or you just started with NLP contact us or open a Github issue, there is always a first time :) We promise you will learn a lot, and, ... who knows? It might help you find your new job as an NLP-developer!
+Now, the next main milestone is to provide *multilingual support* and for this big step, we need the help of all of you. ¿Hablas español? Sie sprechen Deutsch? 你会说中文？ 日本語が話せるのか？ Fala português? Parli Italiano? Вы говорите по-русски? If yes or you speak another language not mentioned here, then you might help us develop multilingual support! Even if you haven't contributed before or you just started with NLP, contact us or open a Github issue, there is always a first time :) We promise you will learn a lot, and, ... who knows? It might help you find your new job as an NLP-developer!
 
 For improving the python toolkit and provide an even better experience, your aid and feedback are crucial. If you have any problem or suggestion please open a Github [issue](https://github.com/jbesomi/texthero/issues), we will be glad to support you and help you.
 
@@ -72,11 +72,11 @@ For improving the python toolkit and provide an even better experience, your aid
 
 Texthero's community is growing fast. Texthero though is still in a beta version; soon, a faster and better version will be released and it will bring some major changes.
 
-For instance, to give a more granular control over the pipeline, starting from the next version on, all `preprocessing` functions will require as argument an already tokenized text. This will be a major changes.
+For instance, to give more granular control over the pipeline, starting from the next version, all `preprocessing` functions will require already tokenized text as an argument. This will be a major change.
 
 Once released the stable version (Texthero 2.0), backward compatibility will be respected. Until this point, backward compatibility will be present but it will be weaker.
 
-If you want to be part of this fast-growing movements, do not hesitate to contribute: [CONTRIBUTING](blob/master/CONTRIBUTING.md)!
+If you want to be part of this fast-growing movement, do not hesitate to contribute: [CONTRIBUTING](./CONTRIBUTING.md)!
 
 <h2 align="center">Installation</h2>
 
@@ -88,7 +88,7 @@ pip install texthero
 
 > ☝️Under the hoods, Texthero makes use of multiple NLP and machine learning toolkits such as Gensim, NLTK, SpaCy and scikit-learn. You don't need to install them all separately, pip will take care of that.
 
-> For fast performance, make sure you have installed Spacy version >= 2.2. Also, make sure you have a recent version of python, the higher, the best.
+> For faster performance, make sure you have installed Spacy version >= 2.2. Also, make sure you have a recent version of python: the higher, the better.
 
 <h2 align="center">Getting started</h2>
 
@@ -98,7 +98,7 @@ In case you are an advanced python user, then `help(texthero)` should do the wor
 
 <h2 align="center">Examples</h2>
 
-<h3>1. Text cleaning, TF-IDF representation and visualization</h3>
+<h3>1. Text cleaning, TF-IDF representation and Visualization</h3>
 
 
 ```python
@@ -122,7 +122,7 @@ hero.scatterplot(df, 'pca', color='topic', title="PCA BBC Sport news")
    <img src="https://github.com/jbesomi/texthero/raw/master/github/scatterplot_bbcsport.svg">
 </p>
 
-<h3>2. Text preprocessing, TF-IDF, K-means and visualization</h3>
+<h3>2. Text preprocessing, TF-IDF, K-means and Visualization</h3>
 
 ```python
 import texthero as hero
@@ -174,7 +174,7 @@ Remove all digits:
 dtype: object
 ```
 
-> Remove digits replace only blocks of digits. The digits in the string "hello123" will not be removed. If we want to remove all digits, you need to set only_blocks to false.
+> Remove digits replaces only blocks of digits. The digits in the string "hello123" will not be removed. To remove all digits, set only_blocks to false.
 
 Remove all types of brackets and their content.
 
@@ -272,7 +272,7 @@ Full documentation: [visualization](https://texthero.org/docs/api-visualization)
 
 <h5>Why Texthero</h5>
 
-Sometimes we just want things done, right? Texthero help with that. It helps make things easier and give the developer more time to focus on his custom requirements. We believe that start cleaning text should just take a minute. Same for finding the most important part of a text and the same for representing it.
+Sometimes we just want things done, right? Texthero helps with that. It makes things easier and gives the developer more time to focus on their custom requirements. We believe that cleaning text should just take a minute. Same for finding the most important part of a text and the same for representing it.
 
 In a very pragmatic way, texthero has just one goal: make the developer spare time. Working with text data can be a pain and in most cases, a default pipeline can be quite good to start. There is always time to come back and improve previous work.
 
@@ -283,7 +283,7 @@ In a very pragmatic way, texthero has just one goal: make the developer spare ti
 
 Texthero is for all of us NLP-developers and it can continue to exist with the precious contribution of the community.
 
-Your level of expertise of python and NLP does not matter, anyone can help and anyone is more than welcomed to contribute!
+Your level of expertise in Python and NLP does not matter: anyone can help and anyone is more than welcome to contribute!
 
 **Are you an NLP expert?**
 
@@ -313,6 +313,7 @@ If you have just other questions or inquiry drop me a line at jonathanbesomi__AT
 - [Christian Claus](https://github.com/cclauss)
 - [bobfang1992](https://github.com/bobfang1992)
 - [Ishan Arora](https://github.com/ishanarora04)
+- [Vidya P](https://github.com/vidyap-xgboost)
 
 
 <h2 align="center"><a href="./LICENSE">License</a></h2>