Commit

Merge pull request #54 from openeventdata/dev
Fix gazetteer and other issues
ahalterman authored Jun 7, 2018
2 parents 184d654 + 3b33cdd commit fcc16a8
Showing 7 changed files with 52 additions and 34 deletions.
49 changes: 24 additions & 25 deletions README.md
@@ -72,6 +72,26 @@ docker run -d -p 127.0.0.1:9200:9200 -v $(pwd)/geonames_index/:/usr/share/elasti
See the [es-geonames](https://github.com/openeventdata/es-geonames) for the code used
to produce this index.

+To update the index, simply shut down the old container, re-download the index
+from s3, and restart the container with the new index.

+Citing
+------

+If you use this software in academic work, please cite as

+```
+@article{halterman2017mordecai,
+title={Mordecai: Full Text Geoparsing and Event Geocoding},
+author={Halterman, Andrew},
+journal={The Journal of Open Source Software},
+volume={2},
+number={9},
+year={2017},
+doi={10.21105/joss.00091}
+}
+```

How does it work?
-----------------

@@ -93,8 +113,8 @@ from it.
The training data for the two models includes copyrighted text so cannot be
shared freely, but get in touch with me if you're interested in it.

-API
---------
+API and Configuration
+---------------------

When instantiating the `Geoparser()` module, the following options can be changed:

@@ -148,7 +168,8 @@ Acknowledgements
----------------

An earlier version of this software was donated to the Open Event Data Alliance
-by Caerus Associates. See [Releases](https://github.com/openeventdata/mordecai/releases) or the [legacy-docker](https://github.com/openeventdata/mordecai/tree/legacy-docker) branch for the
+by Caerus Associates. See [Releases](https://github.com/openeventdata/mordecai/releases)
+or the [legacy-docker](https://github.com/openeventdata/mordecai/tree/legacy-docker) branch for the
2015-2016 and the 2016-2017 production versions of Mordecai.

This work was funded in part by DARPA's XDATA program, the U.S. Army Research
@@ -159,28 +180,6 @@ recommendations expressed in this material are those of the authors and do not
necessarily reflect the views of DARPA, ARO, Minerva, NSF, or the U.S.
government.

-Citing
-------

-Send a note if you use Mordecai! It's always interesting to hear what people
-are doing with it and whether it's doing what they want it to.

-If you use this software in academic work, please cite as

-Andrew Halterman, (2017). Mordecai: Full Text Geoparsing and Event Geocoding. *Journal of Open Source
-Software*, 2(9), 91, doi:10.21105/joss.00091

-```
-@article{halterman2017mordecai,
-title={Mordecai: Full Text Geoparsing and Event Geocoding},
-author={Halterman, Andrew},
-journal={The Journal of Open Source Software},
-volume={2},
-number={9},
-year={2017},
-doi={10.21105/joss.00091}
-}
-```

Contributing
------------
2 changes: 1 addition & 1 deletion mordecai/__init__.py
@@ -1,3 +1,3 @@
from .geoparse import Geoparser

-__version__ = "2.0.0a6"
+__version__ = "2.0.1"
5 changes: 4 additions & 1 deletion mordecai/geoparse.py
@@ -27,7 +27,7 @@

 class Geoparser:
     def __init__(self, es_ip="localhost", es_port="9200", verbose = False,
-                 country_threshold = 0.6, n_threads = 4):
+                 country_threshold = 0.6, n_threads = 4, mod_date = "2018-06-05"):
         DATA_PATH = pkg_resources.resource_filename('mordecai', 'data/')
         MODELS_PATH = pkg_resources.resource_filename('mordecai', 'models/')
         self._cts = utilities.country_list_maker()
@@ -59,6 +59,9 @@ def __init__(self, es_ip="localhost", es_port="9200", verbose = False,
"Mordecai needs access to the Geonames/Elasticsearch gazetteer to function.",
"See https://github.com/openeventdata/mordecai#installation-and-requirements",
"for instructions on setting up Geonames/Elasticsearch")
+        es_date = utilities.check_geonames_date(self.conn)
+        if es_date != mod_date:
+            print("You may be using an outdated Geonames index. Your index is from {0}, while the most recent is {1}. Please see https://github.com/openeventdata/mordecai/ for instructions on updating.".format(es_date, mod_date))

def _feature_country_mentions(self, doc):
"""
10 changes: 10 additions & 0 deletions mordecai/tests/test_mordecai.py
@@ -2,6 +2,8 @@
import sys
import glob
import json
+from elasticsearch_dsl import Q
+from ..utilities import structure_results
#from ..geoparse import Geoparser

import spacy
@@ -150,3 +152,11 @@ def test_issue_45(geo):
capital city of Digos."""
locs = geo.geoparse(text)
assert len(locs) > 0

+def test_ohio(geo):
+    # This was a problem in issue 41
+    r = Q("match", geonameid='5165418')
+    result = geo.conn.query(r).execute()
+    output = structure_results(result)
+    assert output['hits']['hits'][0]['asciiname'] == "Ohio"

6 changes: 6 additions & 0 deletions mordecai/utilities.py
@@ -247,3 +247,9 @@ def setup_es(es_ip, es_port):
CLIENT = Elasticsearch([{'host' : es_ip, 'port' : es_port}])
S = Search(using=CLIENT, index="geonames")
return S

+def check_geonames_date(conn):
+    r = Q("match", geonameid='4943351')
+    result = conn.query(r).execute()
+    output = structure_results(result)
+    return output['hits']['hits'][0]['modification_date']
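
Reviewer's sketch of the index-date check introduced in this commit, runnable without Elasticsearch. `FakeSearch`, `check_geonames_date_sketch`, and `index_warning` are hypothetical names made up for illustration; the stub imitates only the `query(...).execute()` chain that `check_geonames_date` relies on, and it returns an already-structured dict, so the `Q(...)` construction and the `structure_results` step from the real code are skipped. The shortened warning text is also an assumption.

```python
class FakeSearch:
    """Hypothetical stand-in for the Search object returned by setup_es()."""

    def __init__(self, modification_date):
        self._date = modification_date

    def query(self, q):
        # The real object would apply the query; the stub ignores it.
        return self

    def execute(self):
        # Assumed shape, mirroring how the diff indexes into the structured output.
        return {"hits": {"hits": [{"geonameid": "4943351",
                                   "modification_date": self._date}]}}


def check_geonames_date_sketch(conn):
    # Look up one fixed entry and read its modification date.
    result = conn.query({"match": {"geonameid": "4943351"}}).execute()
    return result["hits"]["hits"][0]["modification_date"]


def index_warning(conn, mod_date="2018-06-05"):
    # Same comparison as in Geoparser.__init__: any mismatch produces a warning.
    es_date = check_geonames_date_sketch(conn)
    if es_date != mod_date:
        return ("You may be using an outdated Geonames index. Your index is "
                "from {0}, while the most recent is {1}.".format(es_date, mod_date))
    return None


print(index_warning(FakeSearch("2017-01-01")))
```

Because the comparison is `!=` rather than an ordering test, an index whose entry carries a later date than `mod_date` would trigger the same warning as an older one.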
12 changes: 6 additions & 6 deletions requirements.txt
@@ -1,10 +1,10 @@
-editdistance==0.3.1
+editdistance>=0.3.1
elasticsearch==5.4.0
elasticsearch-dsl==5.3.0
h5py>=2.6.0
-Keras==2.0.8
-pandas==0.19.2
-spacy==2.0.3
-https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz
-tensorflow==1.3.0
+Keras>=2.0.8
+pandas>=0.19.2
+spacy>=2.0.3
+#https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.0.0/en_core_web_lg-2.0.0.tar.gz
+tensorflow>=1.3.0
numpy>=1.12
2 changes: 1 addition & 1 deletion setup.py
@@ -1,7 +1,7 @@
from setuptools import setup

setup(name='mordecai',
-version='2.0.0a6',
+version='2.0.1',
description='Full text geoparsing and event geocoding',
url='https://github.com/openeventdata/mordecai/',
author='Andy Halterman',