Commit ac0011b

Update README.md: from development branch plus Python 3 note
1 parent 522cb38 commit ac0011b

1 file changed, +41 -43 lines changed

README.md

@@ -1,37 +1,58 @@
 Pattern
 =======

+[![Build Status](http://img.shields.io/travis/clips/pattern/master.svg?style=flat)](https://travis-ci.org/clips/pattern/branches)
+[![Coverage](https://img.shields.io/coveralls/clips/pattern/master.svg?style=flat)](https://coveralls.io/github/clips/pattern?branch=master)
+[![PyPi version](http://img.shields.io/pypi/v/pattern.svg?style=flat)](https://pypi.python.org/pypi/pattern)
+[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg?style=flat)](https://github.com/clips/pattern/blob/master/LICENSE.txt)
+
 Pattern is a web mining module for Python. It has tools for:

 * Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
 * Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
 * Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
 * Network Analysis: graph centrality and visualization.

-It is well documented and bundled with 50+ examples and 350+ unit tests. The source code is licensed under BSD and available from <http://www.clips.ua.ac.be/pages/pattern>.
+It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD and available from <http://www.clips.ua.ac.be/pages/pattern>.

-![Pattern example workflow](http://www.clips.ua.ac.be/media/pattern_schema.gif)
+![Example workflow](https://raw.githubusercontent.com/clips/pattern/master/docs/g/pattern_schema.gif)

-Version
+Example
 -------

-2.6
+This example trains a classifier on adjectives mined from Twitter using Python 3. First, tweets that contain hashtag #win or #fail are collected. For example: *"$20 tip off a sweet little old lady today #win"*. The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled `WIN` or `FAIL`. The classifier uses the vectors to learn which other tweets look more like `WIN` or more like `FAIL`.

-License
--------
+```python
+from pattern.web import Twitter
+from pattern.en import tag
+from pattern.vector import KNN, count

-**BSD**, see `LICENSE.txt` for further details.
+twitter, knn = Twitter(), KNN()
+
+for i in range(1, 3):
+    for tweet in twitter.search('#win OR #fail', start=i, count=100):
+        s = tweet.text.lower()
+        p = '#win' in s and 'WIN' or 'FAIL'
+        v = tag(s)
+        v = [word for word, pos in v if pos == 'JJ'] # JJ = adjective
+        v = count(v) # {'sweet': 1}
+        if v:
+            knn.train(v, type=p)
+
+print(knn.classify('sweet potato burger'))
+print(knn.classify('stupid autocorrect'))
+```

 Installation
 ------------

-Pattern is written for Python 2.5+ (no support for Python 3 yet). The module has no external dependencies except when using LSA in the pattern.vector module, which requires NumPy (installed by default on Mac OS X). To install Pattern so that it is available in all your scripts, unzip the download and from the command line do:
+Pattern supports Python 2.7 and Python 3.6+. The Python 3 version is currently **only** available on the `development` branch. To install Pattern so that it is available in all your scripts, unzip the download and from the command line do:
 ```bash
 cd pattern-2.6
 python setup.py install
 ```

-If you have pip, you can automatically download and install from the PyPi repository:
+If you have pip, you can automatically download and install from the [PyPI repository](https://pypi.python.org/pypi/Pattern):
 ```bash
 pip install pattern
 ```
@@ -50,36 +71,20 @@ import sys; if MODULE not in sys.path: sys.path.append(MODULE)
 from pattern.en import parsetree
 ```

-Example
--------
-
-This example trains a classifier on adjectives mined from Twitter. First, tweets that contain hashtag #win or #fail are collected. For example: "$20 tip off a sweet little old lady today #win". The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled `WIN` or `FAIL`. The classifier uses the vectors to learn which other tweets look more like `WIN` or more like `FAIL`.
-
-```python
-from pattern.web import Twitter
-from pattern.en import tag
-from pattern.vector import KNN, count
+Documentation
+-------------

-twitter, knn = Twitter(), KNN()
+For documentation and examples see the [user documentation](http://www.clips.ua.ac.be/pages/pattern). If you are a developer, go check out the [developer documentation](http://www.clips.ua.ac.be/pages/pattern-dev).

-for i in range(1, 3):
-    for tweet in twitter.search('#win OR #fail', start=i, count=100):
-        s = tweet.text.lower()
-        p = '#win' in s and 'WIN' or 'FAIL'
-        v = tag(s)
-        v = [word for word, pos in v if pos == 'JJ'] # JJ = adjective
-        v = count(v) # {'sweet': 1}
-        if v:
-            knn.train(v, type=p)
+Version
+-------

-print knn.classify('sweet potato burger')
-print knn.classify('stupid autocorrect')
-```
+2.6

-Documentation
--------------
+License
+-------

-<http://www.clips.ua.ac.be/pages/pattern>
+**BSD**, see `LICENSE.txt` for further details.

 Reference
 ---------
@@ -89,14 +94,13 @@ De Smedt, T., Daelemans, W. (2012). Pattern for Python. *Journal of Machine Lear
 Contribute
 ----------

-The source code is hosted on GitHub and contributions or donations are welcomed, see the [developer documentation](http://www.clips.ua.ac.be/pages/pattern#contribute). If you use Pattern in your work, please cite our reference paper.
+The source code is hosted on GitHub and contributions or donations are welcomed. Please have a look at the [developer documentation](http://www.clips.ua.ac.be/pages/pattern-dev). If you use Pattern in your work, please cite our reference paper.

 Bundled dependencies
 --------------------

 Pattern is bundled with the following data sets, algorithms and Python packages:

-- **Beautiful Soup**, Leonard Richardson
 - **Brill tagger**, Eric Brill
 - **Brill tagger for Dutch**, Jeroen Geertzen
 - **Brill tagger for German**, Gerold Schneider & Martin Volk
@@ -110,13 +114,7 @@ Pattern is bundled with the following data sets, algorithms and Python packages:
 - **LIBSVM**, Chih-Chung Chang & Chih-Jen Lin
 - **LIBLINEAR**, Rong-En Fan et al.
 - **NetworkX centrality**, Aric Hagberg, Dan Schult & Pieter Swart
-- **PDFMiner**, Yusuke Shinyama
-- **Python docx**, Mike Maccana
-- **PyWordNet**, Oliver Steele
-- **simplejson**, Bob Ippolito
 - **spelling corrector**, Peter Norvig
-- **Universal Feed Parser**, Mark Pilgrim
-- **WordNet**, Christiane Fellbaum et al.

 Acknowledgements
 ----------------
@@ -159,4 +157,4 @@ Acknowledgements
 - Dan Fu
 - Salvatore Di Dio
 - Vincent Van Asch
-- Frederik Elwert
+- Frederik Elwert
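
The Example section in this diff describes turning each tweet into a dictionary of adjective → count items and letting a classifier decide which label a new vector resembles. A minimal sketch of that step in plain Python follows, so it runs without Pattern or Twitter access. It is not Pattern's implementation: `label`, `cosine`, `classify` and `samples` are hypothetical names, the adjective lists are hand-picked stand-ins for `tag()` output, the second sample tweet is invented, and using cosine similarity with a single nearest neighbour is an assumption made for illustration.

```python
from collections import Counter
from math import sqrt


def label(text):
    # Same rule as the README's `'#win' in s and 'WIN' or 'FAIL'`,
    # written as a conditional expression.
    return 'WIN' if '#win' in text.lower() else 'FAIL'


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(n * b.get(word, 0) for word, n in a.items())
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


def classify(vector, training):
    # 1-nearest neighbour: return the label of the most similar training vector.
    nearest = max(training, key=lambda item: cosine(vector, item[0]))
    return nearest[1]


# (tweet text, adjectives) pairs. The adjective lists are hand-picked
# stand-ins for tag() output; the second tweet is invented for illustration.
samples = [
    ('$20 tip off a sweet little old lady today #win', ['sweet', 'little', 'old']),
    ('stupid broken phone again #fail', ['stupid', 'broken']),
]
training = [(Counter(adjectives), label(text)) for text, adjectives in samples]

print(classify(Counter(['sweet']), training))   # WIN
print(classify(Counter(['stupid']), training))  # FAIL
```

The `and`/`or` expression in the README gives the same result as the conditional expression used here because the string `'WIN'` is always truthy.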
