COVID-DST Dataset

Overview

Dialogue State Tracking (DST) is a core component of a dialogue system. Compared with vanilla encoder-decoder models, DST typically makes the system more robust and easier to extend, because the tracked state feeds dialogue management.

In this work, we construct the COVID-DST dataset from the CovidDialog dataset, which contains patient-doctor conversations about COVID-19 and related pneumonia. This work is a foundational step toward building a COVID dialogue system.

The COVID-DST dataset is stored in the file:

covid-dst-en.json

Dataset structure

Ontology

We define the dialogue state as a set of slots and their corresponding values, i.e., slot-value pairs. The slots and values together constitute the ontology of the dialogue system. Slots track patients' descriptions of their conditions, and we define them hierarchically.

Slots for patients

Values in most slots are either "yes/no", or are extracted/summarized from the utterances. All slots are listed as follows (in lowercase).

Symptoms
- Lungs
  - cough
  - phlegm
  - breath
  - chest: includes chest pain and chest discomfort
- Upper respiratory tract
  - runny nose
  - throat: includes sore throat and throat discomfort
- Systemic
  - fever
  - chills
  - pain/aches
  - fatigue/weakness
- Central
  - headache/lightheaded
  - mood
- gastric
- other symptoms: records any other symptoms
medication
Diagnosis (medical condition)
- pneumonia
- asthma
- diabetes
- other diagnosis
Exposure
- travel: travel history
- exposure: possible exposure to COVID-19
Physical condition
- age
- smoking
- other-phy-con
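
The hierarchy above can be held in a nested mapping and flattened into the list of trackable slots. The following is a minimal sketch: the names `PATIENT_ONTOLOGY`, `flatten_slots`, and `PATIENT_SLOTS`, and the exact spelling of the slot strings, are illustrative assumptions and do not describe the layout of covid-dst-en.json.

```python
# Hypothetical in-memory form of the patient ontology listed above.
# A dict is a category, a list holds leaf slots, and None marks a slot
# that has no children of its own.
PATIENT_ONTOLOGY = {
    "symptoms": {
        "lungs": ["cough", "phlegm", "breath", "chest"],
        "upper respiratory tract": ["runny nose", "throat"],
        "systemic": ["fever", "chills", "pain/aches", "fatigue/weakness"],
        "central": ["headache/lightheaded", "mood"],
        "gastric": None,
        "other symptoms": None,
    },
    "medication": None,
    "diagnosis": ["pneumonia", "asthma", "diabetes", "other diagnosis"],
    "exposure": ["travel", "exposure"],
    "physical condition": ["age", "smoking", "other-phy-con"],
}


def flatten_slots(node, name=None):
    """Return the leaf slot names of a (sub)tree in listing order."""
    if node is None:
        return [name]
    if isinstance(node, list):
        return list(node)
    slots = []
    for child_name, child in node.items():
        slots.extend(flatten_slots(child, child_name))
    return slots


PATIENT_SLOTS = flatten_slots(PATIENT_ONTOLOGY)
print(len(PATIENT_SLOTS))
```

Keeping the categories separate from the leaf slots makes it possible to report results per category (for example, all lung symptoms) without changing the slot names themselves.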

Slots for doctors (Dialogue act)

In addition to the ontology, we define a set of dialogue acts to keep track of how doctors respond to patients. There are two types of dialogue acts: one type can take multiple values, while the other takes only two values (yes/no).

Note that the REQUEST act takes only the slots defined for patients as values. For example, when the doctor asks the patient for more detail on cough and breath, this is represented as (REQUEST, [cough, breath]). Dialogue acts are listed as follows:

REQUEST: takes the patient slots as values
action: suggests what actions to take
prescription
diagnose
Checking
- chest X-ray/CT
- other-checking
reqmore (current info is insufficient)
answer
knowledge base: track external knowledge for the dialogue system.
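
Because REQUEST is restricted to patient slots, an annotation can be checked mechanically. The sketch below is one possible check, not part of the released dataset tooling: the function name `unknown_request_slots` is hypothetical, and the slot set in the usage lines is abbreviated for illustration.

```python
def unknown_request_slots(request_values, patient_slots):
    """Return REQUEST values that are not patient slots, in their original order."""
    known = set(patient_slots)
    return [value for value in request_values if value not in known]


# Abbreviated slot set for illustration; the full ontology has 24 patient slots.
example_slots = {"cough", "breath", "asthma", "smoking"}

valid = unknown_request_slots(["cough", "breath"], example_slots)
flagged = unknown_request_slots(["cough", "breath", "blood"], example_slots)
print(valid, flagged)
```

A value such as blood, which does not appear among the patient slots listed earlier, would be flagged by this check.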

Annotated sample

The following example consultation (ID=1) is taken from the English COVID-DST dataset, which is constructed from the CovidDialog dataset:

Description: I have cough with no travel history. Is this a symptom of Covid-19?

Dialogue

Patient: Hello doctor, I get a cough for the last few days, which is heavy during night times. No raise in temperature but feeling tired with no travel history. No contact with any Covid-19 persons. It has been four to five days and has drunk a lot of Benadryl and took Paracetamol too. Doctors have shut the OP so do not know what to do? Please help.
Doctor: Hello, I understand your concern. I just have a few more questions. Does your cough has phlegm? Any other symptoms like difficulty breathing? Any other medical condition such as asthma, hypertension? Are you a smoker? Alcoholic beverage drinker?
SLOT: (travel, no), (cough, yes), (fever, no), (fatigue, yes), (community, no), (medication, [Benadryl, Paracetamol]).
ACT: (REQUEST, [cough, breath, asthma, blood, smoking]), (reqmore, true)

Patient: Thank you doctor, I have phlegm but not a lot. A tiny amount comes out most of the time. I have no difficulty in breathing. No medical conditions and not a smoker nor a drinker.
Doctor: Hi, I would recommend you take n-acetylcysteine 200 mg powder dissolved in water three times a day. You may also nebulize using PNSS (saline nebulizer) three times a day. This will help the phlegm to come out. I would also recommend you take vitamin C 500 mg and zinc to boost your immune system. If symptoms persist, worsen or new onset of symptoms has been noted, further consult is advised.
SLOT: (travel, no), (cough, yes), (fever, no), (fatigue, yes), (community, no), (medication, [Benadryl, Paracetamol]), (breath, no), (asthma, no), (smoking, no).
ACT: (prescription, [n-acetylcysteine, nebulize using PNSS (saline nebulizer), vitamin C 500 mg and zinc])
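
In the sample, the SLOT line of the second turn repeats every pair from the first turn and adds the newly mentioned ones, which suggests that the state accumulates across turns. A minimal sketch of that update follows; the dict representation and the name `update_state` are assumptions made for illustration and are not taken from the JSON file.

```python
def update_state(previous_state, turn_slots):
    """Return a new state: earlier values are carried over, new ones added or overwritten."""
    state = dict(previous_state)  # copy so the earlier turn is left untouched
    state.update(turn_slots)
    return state


# Slot-value pairs from the first turn of the sample above.
turn_1 = {
    "travel": "no",
    "cough": "yes",
    "fever": "no",
    "fatigue": "yes",
    "community": "no",
    "medication": ["Benadryl", "Paracetamol"],
}
# Pairs newly mentioned in the second turn.
turn_2_new = {"breath": "no", "asthma": "no", "smoking": "no"}

state_2 = update_state(turn_1, turn_2_new)
print(sorted(state_2))
```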

Dataset statistics

COVID-DST contains 603 conversations and 1232 utterances. Most dialogues in the English dataset contain a single turn. The average, maximum, and minimum numbers of utterances per conversation are 2.0, 17, and 2, respectively.
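
Figures of this kind can be recomputed once the conversations are loaded. The sketch below assumes each conversation is already available as a list of utterances; how to obtain that list from covid-dst-en.json is not shown, because the file layout is not described here. The name `utterance_stats` is hypothetical.

```python
def utterance_stats(conversations):
    """Summarise utterance counts for a non-empty list of conversations.

    Each conversation is assumed to be a list of utterances.
    """
    lengths = [len(conversation) for conversation in conversations]
    return {
        "conversations": len(lengths),
        "utterances": sum(lengths),
        "average": round(sum(lengths) / len(lengths), 1),
        "maximum": max(lengths),
        "minimum": min(lengths),
    }


# Toy input: two single-turn dialogues and one two-turn dialogue.
toy = [["patient", "doctor"], ["patient", "doctor"], ["patient", "doctor", "patient", "doctor"]]
stats = utterance_stats(toy)
print(stats)
```

As a consistency check on the reported numbers, 1232 utterances over 603 conversations gives about 2.04 utterances per conversation, which rounds to the stated 2.0.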

Slot count

There are 33 slots in total: 24 slots for patients and 9 slots (dialogue acts) for doctors. We count the number of annotated samples for each slot and dialogue act.

Slots	#Samples	Slots	#Samples	Slots	#Samples
Action	359	Diagnose	94	Gastric	38
Answer	327	KB	86	Fatigue	37
Cough	160	Otherphycon	72	Phlegm	34
Medication	155	Headache	68	CT/Xray	33
Pneumonia	126	Exposure	67	Mood	31
Fever	124	Chest	65	Diabetes	25
Prescription	124	Breath	63	Chills	20
Throat	118	Travel	57	Smoking	20
Othersym	115	Pain	53	REQUEST	20
Otherdiagnosis	115	Runnynose	44	Asthma	16
Age	97	Otherchecking	42	Reqmore	13
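
A count like the one in the table can be produced with a counter over the annotations. This is a sketch under assumptions: each annotated sample is taken to be a dict mapping slot or act names to values, and each name is counted once per sample. Whether the table above was counted per turn or per conversation is not stated, and `count_slot_samples` is a hypothetical name.

```python
from collections import Counter


def count_slot_samples(annotated_samples):
    """Count, for each slot or dialogue act, the samples in which it is annotated."""
    counts = Counter()
    for sample in annotated_samples:
        counts.update(set(sample))  # each name counted at most once per sample
    return counts


# Toy annotations, not taken from the dataset.
toy_samples = [
    {"cough": "yes", "fever": "no"},
    {"cough": "yes", "action": ["stay home"]},
    {"answer": "yes"},
]
counts = count_slot_samples(toy_samples)
print(counts.most_common())
```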

Baseline model

Please refer to the folder .\baseline for more details.

Evaluation

We plan to adopt joint goal accuracy, which is the average accuracy of predicting all slot assignments for a turn correctly. For non-categorical slots, a fuzzy matching score is used to reward partial matches with the ground truth, following DSTC8.
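
The metric can be sketched as follows. This is one possible reading rather than the evaluation script of this repository: yes/no slots are scored by exact match, other slots by a string-similarity ratio, the per-slot scores of a turn are multiplied, and the turn scores are averaged. `difflib.SequenceMatcher` from the Python standard library stands in for the fuzzy matcher and may differ from the one used in DSTC8; all function names are hypothetical, and slot values are compared as strings.

```python
from difflib import SequenceMatcher

CATEGORICAL_VALUES = {"yes", "no"}


def slot_score(gold, pred):
    """Exact match for yes/no values, similarity ratio in [0, 1] otherwise."""
    gold, pred = str(gold).lower(), str(pred).lower()
    if gold in CATEGORICAL_VALUES or pred in CATEGORICAL_VALUES:
        return 1.0 if gold == pred else 0.0
    return SequenceMatcher(None, gold, pred).ratio()


def turn_score(gold_state, pred_state):
    """Product of slot scores; a slot present on only one side scores 0."""
    score = 1.0
    for slot in set(gold_state) | set(pred_state):
        if slot not in gold_state or slot not in pred_state:
            return 0.0
        score *= slot_score(gold_state[slot], pred_state[slot])
    return score


def joint_goal_accuracy(gold_turns, pred_turns):
    """Average turn score over aligned lists of gold and predicted states."""
    scores = [turn_score(g, p) for g, p in zip(gold_turns, pred_turns)]
    return sum(scores) / len(scores) if scores else 0.0


gold = [{"cough": "yes", "fever": "no"}, {"cough": "yes"}]
pred = [{"cough": "yes", "fever": "no"}, {"cough": "no"}]
accuracy = joint_goal_accuracy(gold, pred)
print(accuracy)
```

With only yes/no slots, as in the usage lines, the score reduces to the fraction of turns whose whole state is predicted exactly.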

Challenges for the dataset

The samples for the slots are imbalanced and insufficient.
It is hard to map all important descriptions and points in the patients' utterances to the slots.
It is hard to define a set of dialogue acts that fully describes how doctors respond.
It is hard to model conditional relations, e.g., get tested if symptomatic.

Issues

Major issues

The values in the "action" slot should be unified.
- e.g., drink fluids, stay home, self-quarantine, call doctor, etc.
There are more details to track, e.g., the details of symptoms, the dose of medication.
Multiple values need to be combined into one in a proper way.
Some slots could be redundant.

Minor issues

Typos or words run together (e.g., HiIam).
Some samples are in other languages (ID = 270, 322, 564).

References

@article{ju2020CovidDialog,
  title={CovidDialog: Medical Dialogue Datasets about COVID-19},
  author={Ju, Zeqian and Chakravorty, Subrato and He, Xuehai and Chen, Shu and Yang, Xingyi and Xie, Pengtao},
  journal={https://github.com/UCSD-AI4H/COVID-Dialogue},
  year={2020}
}

@article{10.1145/3166054.3166058,
  author = {Chen, Hongshen and Liu, Xiaorui and Yin, Dawei and Tang, Jiliang},
  title = {A Survey on Dialogue Systems: Recent Advances and New Frontiers},
  year = {2017},
  issue_date = {November 2017},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {19},
  number = {2},
  issn = {1931-0145},
  url = {https://doi.org/10.1145/3166054.3166058},
  doi = {10.1145/3166054.3166058},
  journal = {SIGKDD Explor. Newsl.},
  month = nov,
  pages = {25–35},
  numpages = {11}
}
