
University of Southampton CLPsych 2022 Shared Task Submission

Detecting Moments of Change and Suicidal Risks in Longitudinal User Texts Using Multi-task Learning

This work (https://aclanthology.org/2022.clpsych-1.19/) describes the classification system proposed for the Computational Linguistics and Clinical Psychology (CLPsych) Shared Task 2022. We propose a multi-task learning approach with a bidirectional long short-term memory (Bi-LSTM) model for predicting changes in a user's mood (Task A) and their suicidal risk level (Task B). The two classification tasks have previously been solved independently or in an augmented way, where the output of one task is leveraged for learning another task; this work instead proposes an 'all-in-one' framework that jointly learns the related mental health tasks. Our experimental results (ranked top for Task A) suggest that the proposed multi-task framework outperforms the alternative single-task frameworks submitted to the challenge, as evaluated via the timeline-based and coverage-based performance metrics shared by the organisers. We also assess the potential of various feature embedding schemes that could prove useful in initialising the Bi-LSTM model for better multi-task learning in the mental health domain.

This work is part of the UKRI TAS Hub SafeSpacesNLP project https://www.tas.ac.uk/safespacesnlp/ and supported by the Engineering and Physical Sciences Research Council (EP/V00784X/1). If you use any of the resources in this repository, please cite it as:

Tayyaba Azim, Loitongbam Singh, and Stuart Middleton. 2022. Detecting Moments of Change and Suicidal Risks in Longitudinal User Texts Using Multi-task Learning. In Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology, pages 213–218, Seattle, USA. Association for Computational Linguistics.

@inproceedings{azim-etal-2022-detecting,
    title = "Detecting Moments of Change and Suicidal Risks in Longitudinal User Texts Using Multi-task Learning",
    author = "Azim, Tayyaba  and
      Singh, Loitongbam  and
      Middleton, Stuart",
    booktitle = "Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology",
    month = jul,
    year = "2022",
    address = "Seattle, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.clpsych-1.19",
    pages = "213--218",
    abstract = "This work describes the classification system proposed for the Computational Linguistics and Clinical Psychology (CLPsych) Shared Task 2022. We propose the use of multitask learning approach with bidirectional long-short term memory (Bi-LSTM) model for predicting changes in user{'}s mood and their suicidal risk level. The two classification tasks have been solved independently or in an augmented way previously, where the output of one task is leveraged for learning another task, however this work proposes an {`}all-in-one{'} framework that jointly learns the related mental health tasks. The experimental results suggest that the proposed multi-task framework outperforms the remaining single-task frameworks submitted to the challenge and evaluated via timeline based and coverage based performance metrics shared by the organisers. We also assess the potential of using various types of feature embedding schemes that could prove useful in initialising the Bi-LSTM model for better multitask learning in the mental health domain.",
}

Proposed Framework

[Figure: proposed multi-task learning framework]
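
For orientation, the sketch below is a heavily simplified Keras rendering of such a shared-encoder multi-task setup. It is our own illustration with placeholder layer sizes, sequence length, and embedding dimensionality; it is not the training code in codes/, which also supports an attention variant.

# multitask_sketch.py -- illustrative shared Bi-LSTM with two task heads (placeholder sizes).
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN = 64    # placeholder: maximum number of posts per timeline after padding
EMB_DIM = 684   # placeholder: sentence-embedding dimensionality (fastText + SBERT)

inputs = layers.Input(shape=(SEQ_LEN, EMB_DIM), name="post_embeddings")
# Shared Bi-LSTM encoder over the sequence of post embeddings in one timeline.
shared = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)

# Task A head: a per-post moment-of-change label (0|IS|ES).
task_a = layers.TimeDistributed(layers.Dense(3, activation="softmax"), name="task_a")(shared)

# Task B head: one timeline/user-level suicidal risk label (Severe|Moderate|Low).
pooled = layers.GlobalMaxPooling1D()(shared)
task_b = layers.Dense(3, activation="softmax", name="task_b")(pooled)

model = Model(inputs=inputs, outputs=[task_a, task_b])
model.compile(optimizer="adam",
              loss={"task_a": "categorical_crossentropy", "task_b": "categorical_crossentropy"})
model.summary()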


License

Data Set:

The CLPsych dataset is proprietary and not shared here. Please contact the competition organisers at [email protected] to request a copy of the distribution.

Software:

  • © Copyright University of Southampton, 2022, Highfield, University Road, Southampton SO17 1BJ.
  • Created By : Tayyaba Azim and Gyanendro Loitongbam
  • Created Date : 2022/05/26
  • Project : SafeSpacesNLP (https://www.tas.ac.uk/safespacesnlp/)

Installation Requirements Under Ubuntu 20.04 LTS

  • The experiments were run on a Dell Precision 5820 Tower Workstation with an Nvidia Quadro RTX 6000 24 GB GPU, using Nvidia CUDA Toolkit 11.7 and Ubuntu 20.04 LTS.
  • Install the following pre-requisite libraries:
pip install -U sentence-transformers
pip install gensim
pip install transformers
pip install tensorflow
pip install keras

Package                       Version
----------------------------- --------------------
gensim                        4.0.1
keras                         2.9.0
sentence-transformers         2.2.0
tensorflow                    2.9.1
transformers                  4.20.1

Pretrained Models Required

cd <uos_clpsych_dir>
mkdir dataset
cd <uos_clpsych_dir>/dataset
wget -O wiki-news-300d-1M.vec.zip https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
unzip wiki-news-300d-1M.vec.zip
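
As a quick sanity check that the vectors downloaded and unzipped correctly, the short snippet below (our own illustrative check, not part of codes/; it assumes the gensim version listed above and the dataset/ path used in this README) loads the fastText vectors with gensim and prints their dimensionality:

# verify_fasttext.py -- illustrative check that the fastText vectors load correctly.
from gensim.models import KeyedVectors

# Loading the full ~1M-word vocabulary takes a few minutes and several GB of RAM.
word_vectors = KeyedVectors.load_word2vec_format("dataset/wiki-news-300d-1M.vec", binary=False)

print("vocabulary size :", len(word_vectors))
print("vector dimension:", word_vectors.vector_size)  # expected: 300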

Preparing Data Set

The CLPsych 2022 dataset consists of two types of files: per-timeline tab-delimited files (timeline_ID.tsv) and a user-level JSON file (train_users.json).

Each timeline_ID.tsv contains the user posts (postid, date, label, title, content) of one timeline; the filename (timeline_ID) is the timeline ID.

train_users.json maps each user to a risk level and their timelines: { user_ID : { risk_level : Severe|Moderate|Low, timelines : [timeline_ID, ...] } }.

We merge all the files into a single comma-delimited CSV file containing the information from both timeline_ID.tsv and train_users.json. The following relational table structure shows the columns of the merged CSV file.

| Timeline_ID | User_ID | User_risk | Content | Post_ID | Post_label |
  • Timeline_ID: The timeline ID of the sequence of posts by a user.
  • User_ID: The ID of the user who authored the above timeline.
  • User_risk: The user-level risk label shared across all of that user's timelines (Severe|Moderate|Low).
  • Content: The post title and content merged into a single text field.
  • Post_ID: The ID of the post.
  • Post_label: The moment-of-change label of the post (0|IS|ES), where IS = switch and ES = escalation.

Save the training and testing sets as training_dataset.csv and testing_dataset.csv respectively.
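
The merge can be reproduced with a short script along the lines of the sketch below. This is an illustrative example rather than the exact preprocessing script used for the paper: merge_clpsych_files and its paths are hypothetical, pandas is assumed to be installed, and the TSV/JSON field names follow the descriptions above.

# prepare_dataset.py -- illustrative sketch of the merge described above.
import glob
import json
import os
import pandas as pd

def merge_clpsych_files(timeline_dir, users_json_path, output_csv):
    # train_users.json: { user_ID : { risk_level : ..., timelines : [...] } }
    with open(users_json_path) as f:
        users = json.load(f)

    # Invert the mapping so every timeline ID points back to its user and risk level.
    timeline_to_user = {}
    for user_id, info in users.items():
        for timeline_id in info["timelines"]:
            timeline_to_user[timeline_id] = (user_id, info["risk_level"])

    rows = []
    for path in glob.glob(os.path.join(timeline_dir, "*.tsv")):
        timeline_id = os.path.splitext(os.path.basename(path))[0]
        user_id, user_risk = timeline_to_user[timeline_id]
        posts = pd.read_csv(path, sep="\t")  # columns: postid, date, label, title, content
        for _, post in posts.iterrows():
            title = post["title"] if pd.notna(post.get("title")) else ""
            rows.append({
                "Timeline_ID": timeline_id,
                "User_ID": user_id,
                "User_risk": user_risk,
                "Content": (str(title) + " " + str(post["content"])).strip(),  # title + content merged
                "Post_ID": post["postid"],
                "Post_label": post["label"],
            })

    pd.DataFrame(rows).to_csv(output_csv, index=False)

# Example usage (paths are placeholders):
# merge_clpsych_files("dataset/timelines/", "dataset/train_users.json", "dataset/training_dataset.csv")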

Sentence Embedding Methods

Two types of sentence embedding methods are considered in this study (please refer to the paper for a detailed explanation; a minimal sketch of how sent_emb can be assembled is shown after this list):

  • sent_emb: fastText + SBERT
  • sent_score_emb: fastText + SBERT + Task-specific scores
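
As an illustration of how sent_emb can be assembled, the sketch below (our own simplified example, not the exact code in codes/; the 'all-MiniLM-L6-v2' SBERT checkpoint is only a placeholder) averages the fastText word vectors of a post and concatenates them with its SBERT embedding:

# sent_emb sketch -- fastText + SBERT concatenation (illustrative only).
import numpy as np
from gensim.models import KeyedVectors
from sentence_transformers import SentenceTransformer

fasttext = KeyedVectors.load_word2vec_format("dataset/wiki-news-300d-1M.vec", binary=False)
sbert = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder SBERT checkpoint

def sent_emb(text):
    # Average the fastText vectors of in-vocabulary tokens (zero vector if none are found).
    tokens = [t for t in text.lower().split() if t in fasttext]
    ft_vec = np.mean([fasttext[t] for t in tokens], axis=0) if tokens else np.zeros(fasttext.vector_size)
    # SBERT produces one dense vector for the whole post.
    sb_vec = sbert.encode(text)
    # Concatenate the two views into a single fixed-length sentence embedding.
    return np.concatenate([ft_vec, sb_vec])

# sent_score_emb additionally appends the task-specific scores described in the paper.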

Training Models

cd <uos_clpsych_dir>
mkdir results
mkdir model

# train Multitask (use nohup to run in background as training can take a few hours)
nohup python codes/CLPsych-multitask_text.py --attention_layer 0 --load_classes dataset/training_classes_index.pkl --training_dataset dataset/training_dataset.csv --testing_dataset dataset/testing_dataset.csv --result_dir results/ --save_model 1 > train_stdout.txt &

# train Multitask-attn
nohup python codes/CLPsych-multitask_text.py --attention_layer 1 --load_classes dataset/training_classes_index.pkl --training_dataset dataset/training_dataset.csv --testing_dataset dataset/testing_dataset.csv --result_dir results/ --save_model 1 > train_stdout.txt &

# train Multitask-score (use nohup to run in background as training can take a few days)
nohup python codes/CLPsych-multitask_text-score.py --attention_layer 0 --load_classes dataset/training_classes_index.pkl --training_dataset dataset/training_dataset.csv --testing_dataset dataset/testing_dataset.csv --result_dir results/ --save_model 1 > train_stdout.txt &

# train Multitask-attn-score
nohup python codes/CLPsych-multitask_text-score.py --attention_layer 1 --load_classes dataset/training_classes_index.pkl --training_dataset dataset/training_dataset.csv --testing_dataset dataset/testing_dataset.csv --result_dir results/ --save_model 1 > train_stdout.txt &
  • attention_layer: Flag indicating whether the model is trained with (1) or without (0) an attention layer.
  • load_classes: Location to load or save the pickle of training class indices.
  • training_dataset: Location of the training dataset file.
  • testing_dataset: Location of the testing dataset file.
  • result_dir: Location to save the model's predicted results.
  • save_model: Flag to save the trained model: yes (1) or no (0).

Testing Models

# test Multitask
python codes/CLPsych-multitask_text_testing.py --attention_layer 0 --load_classes dataset/training_classes_index.pkl --testing_dataset dataset/testing_dataset.csv --result_dir results/

# test Multitask-attn
python codes/CLPsych-multitask_text_testing.py --attention_layer 1 --load_classes dataset/training_classes_index.pkl --testing_dataset dataset/testing_dataset.csv --result_dir results/

# test Multitask-score
python codes/CLPsych-multitask_text_testing_score.py --attention_layer 0 --load_classes dataset/training_classes_index.pkl --testing_dataset dataset/testing_dataset.csv --result_dir results/

# test Multitask-attn-score
python codes/CLPsych-multitask_text_testing_score.py --attention_layer 1 --load_classes dataset/training_classes_index.pkl --testing_dataset dataset/testing_dataset.csv --result_dir results/
  • Multitask: model using sent_emb
  • Multitask-attn: model with attention layer using sent_emb
  • Multitask-score: model using sent_score_emb
  • Multitask-attn-score: model with attention layer using sent_score_emb

Evaluating Models on Shared Task 2022 (run by our team locally)

# eval script for Task A (using the training set here, but you should provide a validation or test set)
python codes/evaluate.py results/training_dataset_multitask-taska-with_attention.csv no
python codes/evaluate.py results/training_dataset_multitask-taska-without_attention.csv no
python codes/evaluate.py results/training_dataset_multitask-score-taska-without_attention.csv no
python codes/evaluate.py results/training_dataset_multitask-score-taska-with_attention.csv no

# eval script for Task B (using the training set here, but you should provide a validation or test set)
python codes/evaluate.py results/training_dataset_multitask-taskb-with_attention.csv yes
python codes/evaluate.py results/training_dataset_multitask-taskb-without_attention.csv yes
python codes/evaluate.py results/training_dataset_multitask-score-taskb-without_attention.csv yes
python codes/evaluate.py results/training_dataset_multitask-score-taskb-with_attention.csv yes

Task A: Moments of Change

| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Multitask | 0.582 | 0.717 | 0.629 |
| Multitask-attn | 0.663 | 0.697 | 0.676 |
| Multitask-score | 0.680 | 0.760 | 0.713 |
| Multitask-attn-score | 0.674 | 0.800 | 0.724 |

Task B: Suicidal Risk Levels

| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Multitask | 0.352 | 0.327 | 0.335 |
| Multitask-attn | 0.408 | 0.378 | 0.388 |
| Multitask-score | 0.355 | 0.331 | 0.334 |
| Multitask-attn-score | 0.415 | 0.397 | 0.382 |

Evaluating Models on the Shared Task 2022 Test Set (run by the CLPsych 2022 organisers on our behalf using the hidden test labels)

Task A: Moments of Change

| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Multitask | 0.680 | 0.579 | 0.649 |
| Multitask-score | 0.677 | 0.595 | 0.625 |
| Multitask-attn-score | 0.680 | 0.579 | 0.607 |

Task B: Suicidal Risk Levels

| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Multitask-attn-score | 0.618 | 0.427 | 0.451 |
