Clinical Reasoning over Tabular Data and Text with Bayesian Networks

Summary

This repository contains the code for Clinical Reasoning over Tabular Data and Text with Bayesian Networks. Should you reuse any ideas from this repository or the accompanying paper, please cite our work as follows:

@InProceedings{prabaey_clinical_reasoning,
  author="Rabaey, Paloma
  and Deleu, Johannes
  and Heytens, Stefan
  and Demeester, Thomas",
  title="Clinical Reasoning over Tabular Data and Text with Bayesian Networks",
  booktitle="Artificial Intelligence in Medicine",
  year="2024",
  publisher="Springer Nature Switzerland",
  pages="229--250",
}

In this paper, we propose two approaches to integrate text in Bayesian networks, to enable joint inference of tabular data and unstructured text. One approach uses a generative text model, while the other uses a discriminative one. We also compare with two types of baselines: a normal Bayesian network trained on only the tabular data and a feed-forward neural network.

We evaluate our models using simulation results for a primary care use case (diagnosis of pneumonia), for which we generate our own dataset. This dataset is based on both expert knowledge (used to build a ground truth Bayesian network) and ChatGPT for generation of a realistic textual description that fits the tabular portion of the sample. An example of what the samples in this dataset look like is shown below.

Files

notebooks:

data_generation.ipynb: illustration of the full simulation data generation process
BN_baselines.ipynb: training and evaluation of two Bayesian network baselines ($BN$ and $BN^{++}$)
FF_baseline.ipynb: training and evaluation of discriminative feed-forward neural network baseline ($\text{ff-discr-text}$)
gen_model.ipynb: training and evaluation of Bayesian network with text generator $(\text{bn-gen-text})$, and its ablated version $(\text{bn-gen-text}^{-})$
discr_model.ipynb: training and evaluation of Bayesian network with text discriminator $(\text{bn-discr-text})$, and its ablated version $(\text{bn-discr-text}^{-})$

data directory:

train_4000_final.p: simulated train dataset (4000 samples)
test_1000_final.p: simulated test dataset (1000 samples)
train_4000_final_fever_pain.p: version of train dataset with observed values for the features "fever" and "pain" (which are normally excluded from the tabular portion of the dataset, and only expressed in text)
test_1000_final_fever_pain.p: versio of test dataset with observed values for the features "fever" and "pain" (idem)
suppl: intermediate data files used in data_generation.ipynb to demonstrate the intermediate steps in the data generation process

utils directory:

baselines.py: code to fit Bayesian network baselines
data.py: data generation and processing classes
prompt.py: OpenAI prompting helper functions (used for data generation)
models.py: code to model Bayesian network with text generator $(\text{bn-gen-text}$, $\text{bn-gen-text}^{-})$ and Bayesian network with text discriminator $(\text{bn-discr-text}$, $\text{bn-discr-text}^{-})$
evaluation.py: evaluation code containing methods to calculate KL divergence and predict diagnosis probabilities using Bayesian inference over various model types

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Clinical Reasoning over Tabular Data and Text with Bayesian Networks

Summary

Files

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
data		data
figures		figures
utils		utils
BN_baselines.ipynb		BN_baselines.ipynb
FF_baseline.ipynb		FF_baseline.ipynb
README.md		README.md
data_generation.ipynb		data_generation.ipynb
discr_model.ipynb		discr_model.ipynb
gen_model.ipynb		gen_model.ipynb

prabaey/bn-text

Folders and files

Latest commit

History

Repository files navigation

Clinical Reasoning over Tabular Data and Text with Bayesian Networks

Summary

Files

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages