
Commit e439e80

Fix typos everywhere, update contributors list (#83)
* fixed typos and suggested a branch naming convention
* fixed typos in README, CONTRIBUTING, PURPOSE files + updated contributors list
1 parent b051f6c commit e439e80

3 files changed (+27 / -25 lines)


CONTRIBUTING.md

+8-7
@@ -25,7 +25,7 @@ Texthero follows an approach known as shift-left testing. According to [Wikipedi
2525

2626
> Shift-left testing is an approach to software testing and system testing in which testing is performed earlier in the lifecycle.
2727
28-
Shift-left testing reduces the number of bugs by attempting to solve the problem at the origin. Often many programming defects are not uncovered and fixed until after significant effort has been wasted on their implementation. Texthero's attempt to avoid this kind of issue.
28+
Shift-left testing reduces the number of bugs by attempting to solve the problem at the origin. Often, many programming defects are not uncovered and fixed until after significant effort has been wasted on their implementation. Texthero attempts to avoid these kinds of issues.
2929

3030

3131
## Improve documentation!
@@ -56,7 +56,7 @@ The following link gives some advice on how to submit a successful pull request.
5656

5757
## Ask questions!
5858

59-
We are there for you! If everything is unclear, just ask. We will do our best to answer you quickly.
59+
We are there for you! If anything is unclear, just ask. We will do our best to answer you quickly.
6060

6161
## Propose new ideas!
6262

@@ -84,15 +84,15 @@ $ cd scripts
8484
$ ./tests.sh
8585
```
8686

87-
Calling `./test.sh` is equivalent to execute form the _root_ `python3 -m unittest discover -s tests -t .`
87+
Calling `./tests.sh` is equivalent to executing `python3 -m unittest discover -s tests -t .` from the _root_ directory.
8888

8989

9090
**Important.** If you worked on a bug, you should add a test that checks the bug is not present anymore. This is extremely useful as it prevents the same bug from being re-introduced in the future.
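As an illustration of such a regression test, here is a minimal sketch built around a hypothetical helper, `collapse_whitespace`, and an imagined bug in which tabs and newlines were left untouched; neither the helper nor the bug comes from Texthero. The test case would live in a module under `tests/` whose name starts with `test`, which is the pattern `unittest` discovery looks for by default.

```python
import re
import unittest


def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: replace every run of whitespace with one space."""
    # Using \s+ (instead of just " +") is the "fix" for the imagined bug
    # where tabs and newlines were left untouched.
    return re.sub(r"\s+", " ", text).strip()


class TestCollapseWhitespaceRegression(unittest.TestCase):
    def test_tabs_and_newlines_are_collapsed(self):
        # Input copied from the (imagined) bug report: it fails on the buggy
        # version and passes once the fix is in place.
        self.assertEqual(collapse_whitespace("a\t\tb\nc"), "a b c")

    def test_plain_spaces_still_work(self):
        # Guard the behavior that already worked before the fix.
        self.assertEqual(collapse_whitespace("  a   b  "), "a b")
```

Keeping the exact input from the bug report in the test makes its purpose obvious to reviewers.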
9191

9292

9393
### Passing doctests
9494

95-
When executing `./test.sh` it will also check that the Examples in the docstrings are correct (doctests).
95+
Executing `./tests.sh` also checks that the Examples in the docstrings are correct (doctests).
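A doctest compares the output written after each `>>>` line of a docstring with what the code actually returns. The sketch below uses a hypothetical `count_words` function (not part of Texthero) with an `Examples` section and runs only its doctests through the standard-library `doctest` module; `./tests.sh` may collect doctests differently.

```python
import doctest


def count_words(text: str) -> int:
    """Count whitespace-separated tokens in ``text`` (hypothetical helper).

    Examples
    --------
    >>> count_words("texthero is fun")
    3
    >>> count_words("")
    0
    """
    return len(text.split())


def run_doctests(obj) -> doctest.TestResults:
    """Run the docstring examples of ``obj`` and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Put the object into the globals so the examples can call it by name.
    for test in finder.find(obj, globs={obj.__name__: obj}):
        runner.run(test)
    return runner.summarize(verbose=False)
```

If the expected output in the docstring differs from the real output by even one character, the example counts as failed.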
9696

9797
Passing doctests might be a bit annoying sometimes. Let's look at this example for instance:
9898

@@ -114,7 +114,7 @@ The docstring failed? Why? The reason is that somewhere in the `Example` section
114114

115115
When you submit your code, all code will be tested on different operating systems using Travis CI: [TRAVIS CI texthero](https://travis-ci.com/github/jbesomi/texthero).
116116

117-
Make sure you pass all your test locally before opening a pull request!
117+
Make sure you pass all your tests locally before opening a pull request!
118118

119119
## Formatting
120120

@@ -182,7 +182,8 @@ $ git checkout -b new-branch
182182
Try to commit regularly. In addition, whenever possible, group changes into distinct commits. It will be easier for the rest of us to understand what you worked on just by reading the description of your commit.
183183

184184
```
185-
$ ...
185+
$ git add README.md
186+
$ git commit -m "added README.md"
186187
```
187188

188189
1. Test your changes
@@ -200,7 +201,7 @@ The time to submit the PR has come. Head to your forked repository on Github. Th
200201

201202
- `./tests.sh`
202203
- Execute unittests as well as test all doctests
203-
- `./formath.sh`
204+
- `./format.sh`
204205
- format all code with [black](https://github.com/psf/black)
205206
- `./check.sh`
206207
- Format the code with black (`format.sh`)

PURPOSE.md

+6-6
@@ -1,6 +1,6 @@
11
# PURPOSE
22

3-
This document attempt at defining the purpose of Texthero and it's futures enhancements.
3+
This document attempts to define the purpose of Texthero and its future enhancements.
44

55
### Motivation
66

@@ -14,7 +14,7 @@ We can decompose the objective of Texthero in two parts:
1414

1515
1. ** Offer an efficient tool to deal with text-based datasets (The texthero python package). Texthero is mainly a teaching tool and therefore easy to use and understand, but at the same time quite efficient and should be able to handle large quantities of data.
1616

17-
2. ** Provide a sustain to newcomers in the NLP word to efficiently learn all the main core topics (tf-idf, text cleaning, regular expression, etc). As there are many other tutorials, the main approach is to redirect users to valuable resources and explain better any missing point. This part is done mainly through the *tutorials* on texthero.org.
17+
2. ** Provide support to newcomers in the NLP world to efficiently learn all the main core topics (tf-idf, text cleaning, regular expressions, etc.). As there are many other tutorials, the main approach is to redirect users to valuable resources and better explain any missing points. This part is done mainly through the *tutorials* on texthero.org.
1818

1919

2020
### Channels
@@ -33,23 +33,23 @@ We can decompose the objective of Texthero in two parts:
3333

3434
### Python package
3535

36-
For future development, is important to have a clear idea in mind of the purpose of Texthero as a python package.
36+
For future development, it is important to have a clear idea in mind of the purpose of Texthero as a python package.
3737

3838

3939
**Package core purpose**
4040

4141
The goal is to extract insights from the whole corpus, i.e. the collection of documents, and not from a single element.
4242

43-
Generally, the corpora are composed of a __long__ collection of documents and therefore the require techniques need to be efficient to deal with a large amount of text.
43+
Generally, a corpus is composed of a __long__ collection of documents and therefore the required techniques need to be efficient enough to deal with a large amount of text.
4444

4545
**Neural network**
4646

4747
Texthero functions (as of now) do not make use of neural network solutions. The main reason is that there is no need for that, as there are already mature libraries for it (PyTorch and Tensorflow, to name a few).
4848

49-
What Texthero offers is a tool to be used in addition to any other machine learning libraries. Ideally, texthero should be used before applying any "sophisticated" approach to the dataset; to first better understand the underline data before applying any complex model.
49+
What Texthero offers is a tool to be used in addition to any other machine learning libraries. Ideally, texthero should be used before applying any "sophisticated" approach to the dataset; to first better understand the underlying data before applying any complex model.
5050

5151

52-
Note: a text corpus or collection of documents need always to be in form of a Pandas Series. "do that on a text corpus" or "do that on a Pandas Series" refers to the same act.
52+
Note: a text corpus or collection of documents always needs to be in the form of a Pandas Series. "Do that on a text corpus" and "do that on a Pandas Series" refer to the same act.
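For example, a small corpus can be built directly as a Pandas Series, one document per element; the sentences below are made up. Any vectorized Series operation, such as the `.str` accessor used here, then acts on the whole corpus at once.

```python
import pandas as pd

# A made-up corpus: each element of the Series is one document.
corpus = pd.Series(
    [
        "Texthero works on Pandas Series.",
        "Each element is one document.",
    ]
)

# Lowercasing "the corpus" and lowercasing "the Series" are the same operation.
lowercased = corpus.str.lower()
```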
5353

5454
**Common usage**:
5555
- Clean a text Pandas Series

README.md

+13-12
@@ -46,13 +46,13 @@
4646

4747
Texthero is a python toolkit to work with text-based datasets quickly and effortlessly. Texthero is very simple to learn and designed to be used on top of Pandas. Texthero has the same expressiveness and power as Pandas and is extensively documented. Texthero is modern and conceived for programmers of the 2020 decade with little knowledge, if any, of linguistics.
4848

49-
You can think of Texthero as a tool to help you _understand_ and work with text-based dataset. Given a tabular dataset, it's easy to _grasp the main concept_. Instead, given a text dataset, it's harder to have quick insights into the underline data. With Texthero, preprocessing text data, map it into vectors, and visualize the obtained vector space takes just a couple of lines.
49+
You can think of Texthero as a tool to help you _understand_ and work with text-based datasets. Given a tabular dataset, it's easy to _grasp the main concept_. By contrast, given a text dataset, it's harder to get quick insights into the underlying data. With Texthero, preprocessing text data, mapping it into vectors, and visualizing the obtained vector space takes just a couple of lines.
5050

5151
Texthero includes tools for:
5252
* Preprocess text data: it offers out-of-the-box solutions but is also flexible enough for custom solutions.
5353
* Natural Language Processing: keyphrase and keyword extraction, and named entity recognition.
5454
* Text representation: TF-IDF, term frequency, and custom word-embeddings (wip)
55-
* Vector space analysis: clustering (K-means, Meanshift, DBSAN and Hierarchical), topic modeling (wip) and interpretation.
55+
* Vector space analysis: clustering (K-means, Meanshift, DBSCAN and Hierarchical), topic modeling (wip) and interpretation.
5656
* Text visualization: vector space visualization, place localization on maps (wip).
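To make the TF-IDF representation listed above concrete, here is a minimal pure-Python sketch of one common variant: the raw term count multiplied by idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. It is for illustration only and is not Texthero's implementation; scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf and L2 normalization by default, so its numbers differ. The input is assumed to be already tokenized.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


docs = [["text", "hero", "text"], ["hero", "pandas"]]
vectors = tfidf(docs)
# "hero" appears in every document, so its idf is log(2 / 2) = 0.
```

Terms that occur everywhere get zero weight, which is why TF-IDF highlights the words that distinguish one document from the rest of the corpus.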
5757

5858
Texthero is free, open-source and [well documented](https://texthero.org/docs) (and that's what we love most by the way!).
@@ -61,9 +61,9 @@ We hope you will find pleasure working with Texthero as we had during his develo
6161

6262
<h2 align="center">Hablas español? क्या आप हिंदी बोलते हैं? 日本語が話せるのか?</h2>
6363

64-
Texthero has been developed for the whole NLP community. We know how hard is to deal with different NLP tools (NLTK, SpaCy, Gensim, TextBlob, Sklearn): that's why we developed Texthero, to simplify things.
64+
Texthero has been developed for the whole NLP community. We know how hard it is to deal with different NLP tools (NLTK, SpaCy, Gensim, TextBlob, Sklearn): that's why we developed Texthero, to simplify things.
6565

66-
Now, the next main milestone is to provide *multilingual support* and for this big step, we need the help of all of you. ¿Hablas español? Sie sprechen Deutsch? 你会说中文? 日本語が話せるのか? Fala português? Parli Italiano? Вы говорите по-русски? If yes or you speak another language not mentioned, then you might help us develop multilingual support! Even if you haven't contributed before or you just started with NLP contact us or open a Github issue, there is always a first time :) We promise you will learn a lot, and, ... who knows? It might help you find your new job as an NLP-developer!
66+
Now, the next main milestone is to provide *multilingual support*, and for this big step we need the help of all of you. ¿Hablas español? Sie sprechen Deutsch? 你会说中文? 日本語が話せるのか? Fala português? Parli Italiano? Вы говорите по-русски? If so, or if you speak another language not mentioned here, you could help us develop multilingual support! Even if you haven't contributed before or have just started with NLP, contact us or open a GitHub issue: there is always a first time :) We promise you will learn a lot, and... who knows? It might help you find your new job as an NLP developer!
6767

6868
Your aid and feedback are crucial for improving the python toolkit and providing an even better experience. If you have any problem or suggestion, please open a GitHub [issue](https://github.com/jbesomi/texthero/issues); we will be glad to support you and help you.
6969

@@ -72,11 +72,11 @@ For improving the python toolkit and provide an even better experience, your aid
7272

7373
Texthero's community is growing fast. Texthero though is still in a beta version; soon, a faster and better version will be released and it will bring some major changes.
7474

75-
For instance, to give a more granular control over the pipeline, starting from the next version on, all `preprocessing` functions will require as argument an already tokenized text. This will be a major changes.
75+
For instance, to give more granular control over the pipeline, starting from the next version, all `preprocessing` functions will require already tokenized text as their argument. This will be a major change.
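To illustrate the difference between raw and already tokenized text, the sketch below turns a Series of strings into a Series of token lists. The whitespace split is a naive stand-in chosen for brevity; it is an assumption, not the tokenizer Texthero uses, and it keeps punctuation attached to words.

```python
import pandas as pd

# Raw text: one string per document.
raw = pd.Series(["Texthero is fun", "Tokenize, then preprocess"])

# Tokenized text: one list of tokens per document (naive whitespace split).
tokenized = raw.str.split()
```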
7676

7777
Once the stable version (Texthero 2.0) is released, backward compatibility will be respected. Until then, backward compatibility will be present but weaker.
7878

79-
If you want to be part of this fast-growing movements, do not hesitate to contribute: [CONTRIBUTING](blob/master/CONTRIBUTING.md)!
79+
If you want to be part of this fast-growing movement, do not hesitate to contribute: [CONTRIBUTING](./CONTRIBUTING.md)!
8080

8181
<h2 align="center">Installation</h2>
8282

@@ -88,7 +88,7 @@ pip install texthero
8888

8989
> ☝️Under the hood, Texthero makes use of multiple NLP and machine learning toolkits such as Gensim, NLTK, SpaCy and scikit-learn. You don't need to install them all separately, pip will take care of that.
9090
91-
> For fast performance, make sure you have installed Spacy version >= 2.2. Also, make sure you have a recent version of python, the higher, the best.
91+
> For faster performance, make sure you have installed Spacy version >= 2.2. Also, make sure you have a recent version of python: the higher, the better.
9292
9393
<h2 align="center">Getting started</h2>
9494

@@ -98,7 +98,7 @@ In case you are an advanced python user, then `help(texthero)` should do the wor
9898

9999
<h2 align="center">Examples</h2>
100100

101-
<h3>1. Text cleaning, TF-IDF representation and visualization</h3>
101+
<h3>1. Text cleaning, TF-IDF representation and Visualization</h3>
102102

103103

104104
```python
@@ -122,7 +122,7 @@ hero.scatterplot(df, 'pca', color='topic', title="PCA BBC Sport news")
122122
<img src="https://github.com/jbesomi/texthero/raw/master/github/scatterplot_bbcsport.svg">
123123
</p>
124124

125-
<h3>2. Text preprocessing, TF-IDF, K-means and visualization</h3>
125+
<h3>2. Text preprocessing, TF-IDF, K-means and Visualization</h3>
126126

127127
```python
128128
import texthero as hero
@@ -174,7 +174,7 @@ Remove all digits:
174174
dtype: object
175175
```
176176

177-
> Remove digits replace only blocks of digits. The digits in the string "hello123" will not be removed. If we want to remove all digits, you need to set only_blocks to false.
177+
> `remove_digits` replaces only blocks of digits: the digits in the string "hello123" will not be removed. To remove all digits, set `only_blocks` to `False`.
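The block-versus-all distinction can be sketched with two regular expressions: `\b\d+\b` matches only digit runs that form a whole word, while `\d+` matches any run of digits. The function name `remove_digits_sketch` and its `symbol` parameter (the replacement string) are hypothetical; this illustrates the behavior described above and is not Texthero's source code.

```python
import re


def remove_digits_sketch(text: str, only_blocks: bool = True, symbol: str = " ") -> str:
    """Illustrative re-implementation of block vs. all digit removal."""
    # \b\d+\b needs a word boundary on both sides, so the digits inside
    # "hello123" are kept; \d+ matches every run of digits.
    pattern = r"\b\d+\b" if only_blocks else r"\d+"
    return re.sub(pattern, symbol, text)
```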
178178
179179
Remove all types of brackets and their content.
180180

@@ -272,7 +272,7 @@ Full documentation: [visualization](https://texthero.org/docs/api-visualization)
272272

273273
<h5>Why Texthero</h5>
274274

275-
Sometimes we just want things done, right? Texthero help with that. It helps make things easier and give the developer more time to focus on his custom requirements. We believe that start cleaning text should just take a minute. Same for finding the most important part of a text and the same for representing it.
275+
Sometimes we just want things done, right? Texthero helps with that. It makes things easier and gives developers more time to focus on their custom requirements. We believe that cleaning text should take just a minute, and the same goes for finding the most important parts of a text and for representing it.
276276

277277
In a very pragmatic way, texthero has just one goal: save the developer time. Working with text data can be a pain and, in most cases, a default pipeline is quite good to start with. There is always time to come back and improve previous work.
278278

@@ -283,7 +283,7 @@ In a very pragmatic way, texthero has just one goal: make the developer spare ti
283283
284284
Texthero is for all of us NLP-developers and it can continue to exist with the precious contribution of the community.
285285

286-
Your level of expertise of python and NLP does not matter, anyone can help and anyone is more than welcomed to contribute!
286+
Your level of expertise in python and NLP does not matter: anyone can help, and anyone is more than welcome to contribute!
287287

288288
**Are you an NLP expert?**
289289

@@ -313,6 +313,7 @@ If you have just other questions or inquiry drop me a line at jonathanbesomi__AT
313313
- [Christian Claus](https://github.com/cclauss)
314314
- [bobfang1992](https://github.com/bobfang1992)
315315
- [Ishan Arora](https://github.com/ishanarora04)
316+
- [Vidya P](https://github.com/vidyap-xgboost)
316317

317318

318319
<h2 align="center"><a href="./LICENSE">License</a></h2>
