Skip to content
/ SDDM Public

Code for NeurIPS 2019 paper "Scalable inference of topic evolution via models for latent geometric structures"

License

Notifications You must be signed in to change notification settings

moonfolk/SDDM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Code for NeurIPS 2019 paper Scalable inference of topic evolution via models for latent geometric structures

The code has been tested on Ubuntu 14.04, 16.04, 18.04 LTS


Three Settings

The code repo contains three different algorithms for scalable inference of topic evolution considering three different settings:

  1. Distributed: texts distributed/grouped in some manner (e.g., location, categories)

  2. Online (Streaming): texts coming in online/streaming fashion

  3. Distributed & Online (Streaming): texts coming in both distributed and online fashion

####Algorithms Matching Three Settings (Setting:Algorithm)

  1. Distributed: DM
  2. Online: SDM
  3. Distributed & Online: SDDM

usage:

Install the required dependencies:

./conf

To run three different algorithms, go to the corresponding folder (distributed for DM, online for SDM, distributed_online for SDDM) and refer the README.md file under each for algorithm & setting specific details.

###Perplexity: To compute the perplexity on the new dataset with three algorithms mentioned above, go to the folder perplexity and refer the README.md file under the folder for details.

###Data: Two datasets are used - EJC (ejc) and Wiki (wiki) (more details regarding the two datasets are given in the paper)

Each dataset folder has 5 subfolders:

  • _group (dataset partitioned by different groups)

  • _time (dataset partitioned by different timestamp)

  • _time_group (dataset first partitioned by time and then partitioned by groups)

  • _test (test dataset)

  • _time_group_meta_data (contains vocabulary and group name & id mapping)

EJC_Data ([links] (https://drive.google.com/file/d/1fVQIwmZAdb76pftMKJdT9MsSq9Tlfsk0/view?usp=sharing)):

Wiki_Data ([links] (https://drive.google.com/file/d/12thNKl_BpF0ozmg_wyLAGHVQKz0lBu7m/view?usp=sharing)):

Parameter Interpretation

  • tau0

Controls how topic changes from time to time: small value => sharp change of topics between different time points

  • tau1

Affects the number of topics: small tau1 => larger variance between topics => less number of topics: vectors with small difference are regarded as different topics

  • gamma0

Parameter of Beta process

About

Code for NeurIPS 2019 paper "Scalable inference of topic evolution via models for latent geometric structures"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published