The file wikigraph.py implements classes for finding paths between Wikipedia articles, along with related functions, using the Wikimedia API. A path is built by following the links each article contains, just like the Wikipedia game. See the blog post https://winstonjay.github.io/posts/homunculus for more info on the project's motivations.
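As an illustration of how a path can be built by following links, here is a minimal breadth-first search sketch. It is not the code in wikigraph.py: `bfs_path`, `TOY_LINKS`, and the `get_links` callback are hypothetical names, and the toy dictionary stands in for the link lists that would otherwise come from the Wikimedia API.

```python
from collections import deque

# Hypothetical toy link graph; a real run would fetch links from the Wikimedia API.
TOY_LINKS = {
    "Car": ["Engine", "Road"],
    "Engine": ["Fuel"],
    "Road": ["City"],
    "City": ["Home"],
    "Fuel": [],
    "Home": [],
}


def bfs_path(start, end, get_links):
    """Return the shortest list of titles from start to end, or None if unreachable."""
    if start == end:
        return [start]
    parent = {start: None}  # maps each discovered title to the title that linked to it
    queue = deque([start])
    while queue:
        title = queue.popleft()
        for linked in get_links(title):
            if linked in parent:
                continue  # already discovered via a path that is no longer
            parent[linked] = title
            if linked == end:
                # Walk back through the parents to rebuild the path.
                path = [linked]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(linked)
    return None


path = bfs_path("Car", "Home", lambda title: TOY_LINKS.get(title, []))
print(" -> ".join(path))  # Car -> Road -> City -> Home
```

Breadth-first search explores articles in order of distance from the start, so the first path it finds to the end article has the fewest steps.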
Install requirements, then run a search from the command line:
python findpath.py --start="Car" --end="Home"
The main method find_path is better run in a shell session or over a batch collection, because its use of memoization speeds up later searches while the session runs and reduces requests to the Wikimedia API.
>>> import wikigraph
>>> w = wikigraph.WikiGraph()
>>> path = w.find_path(start="Tom Hanks", end="Kevin Bacon")
>>> print(path)
<wikigraph.Path: Tom Hanks -> Kevin Bacon>
>>> print(path.info)
Path:
Path: Tom Hanks -> Kevin Bacon
Separation: 1 steps
Time Taken: 0.578131 seconds
Requests: 2
>>> path.data
{'start': 'Tom Hanks', 'end': 'Kevin Bacon', 'path': 'Tom Hanks->Kevin Bacon', 'degree': 1}
>>> print(path.json(indent=2))
{
  "start": "Tom Hanks",
  "end": "Kevin Bacon",
  "path": "Tom Hanks->Kevin Bacon",
  "degree": 1
}
For a given sample of start articles, find a path from each to a central end article and save the output to a given CSV file. If no start list is specified, the program defaults to collecting a random sample of size k generated by the Wikimedia API. For more info, see the command line argument details below.
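The batch collection described here could be sketched roughly as follows. The column names are taken from the path.data output shown earlier; `collect_paths` and `fake_find_path` are hypothetical helpers, and the stub replaces real searches so the example runs without network access.

```python
import csv
import io


def collect_paths(starts, center, find_path, out):
    """Write one CSV row per start title; find_path is any callable returning a dict."""
    writer = csv.DictWriter(out, fieldnames=["start", "end", "path", "degree"])
    writer.writeheader()
    for title in starts:
        writer.writerow(find_path(title, center))


def fake_find_path(start, end):
    # Hypothetical stub: pretends every start article links directly to the center.
    return {"start": start, "end": end, "path": f"{start}->{end}", "degree": 1}


buffer = io.StringIO()  # stands in for the output file
collect_paths(["Car", "Tom Hanks"], "Kevin Bacon", fake_find_path, buffer)
print(buffer.getvalue())
```

In real use the stub would be replaced by a function that returns the data of a found path, and `buffer` by a file opened with `newline=""` as the csv module recommends.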
usage:
  -h, --help            show this help message and exit
  -o OUTFILE, --outfile OUTFILE
                        Filename to save the results to.
  -x CENTER, --center CENTER
                        Title of valid wiki page to center all nodes on
  -k SAMPLE_SIZE, --sample_size SAMPLE_SIZE
                        Sample size of k pages to search from. (Only applies
                        when sample source is not given)
  -s SAMPLE_SOURCE, --sample_source SAMPLE_SOURCE
                        Filename containing newline delimited list of valid
                        wiki article titles if not specified sample defaults
                        to random selection from wikimedia api.
  -v                    add to display titles of page requests made.
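For reference, the options above could be declared with argparse along these lines. This is a sketch reconstructed from the help text only: the `int` type for the sample size, the `verbose` destination name, and the absence of defaults are assumptions, not the project's actual parser.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (types and defaults assumed)."""
    parser = argparse.ArgumentParser(
        description="Find paths from a sample of articles to a central article.")
    parser.add_argument("-o", "--outfile",
                        help="Filename to save the results to.")
    parser.add_argument("-x", "--center",
                        help="Title of valid wiki page to center all nodes on")
    parser.add_argument("-k", "--sample_size", type=int,
                        help="Sample size of k pages to search from.")
    parser.add_argument("-s", "--sample_source",
                        help="File with a newline delimited list of article titles.")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="add to display titles of page requests made.")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["-x", "Kevin Bacon", "-k", "10", "-o", "out.csv"])
print(args.center, args.sample_size, args.outfile)  # Kevin Bacon 10 out.csv
```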
Requirements: requests