Implementation of Deep Dirichlet Multinomial Regression in Python 3 with Cython Gibbs sampler.
The Gibbs sampler was built from that the Cython implementation here: https://github.com/lda-project/lda. Current Gibbs sampler supports asymmetric priors.
sh build.sh
After building, run the sample code to fit LDA, DMR, and dDMR on synthetic data (data are generated from a deep DMR model):
cd deepdmr
python test_synth.py
Synthetic data can be found under test_output/data/, model runs can be found under test_output/runs/, and plots under test_output/plots/. The synthetic data is saved in compressed numpy format -- format can be grokked by inspecting this file. Relevant fields are:
- Ds_body: integer array for document indices
- Ws_body: integer array for token indices
- token_dict: dictionary mapping token type to index
- annotation_names: list of different annotation names associated with each document
- annotation_dicts: maps annotation name to an index-to-feature name dictionary
- annotations: dictionary of model parameters along with document supervision. In the synthetic data, the document-level supervision is under the "descriptor" field.
If you use this code and find it useful, please consider citing the accompanying paper:
Adrian Benton and Mark Dredze. Deep Dirichlet Multinomial Regression. 2018. HLT-NAACL.
Feel free to email any questions or concerns to the first author:
first_name dot last_name at gmail dot com