Multi-modal Reference Resolution

Requirements

Python: >=3.9,<3.12
Python dependencies: See pyproject.toml.
nobu-g/GLIP
nobu-g/cohesion-analysis
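
As a quick sanity check before installing, the Python version constraint above can be tested with a few lines of Python. This is a minimal sketch: the helper name is_supported is illustrative and not part of this repository.

```python
import sys

# Supported range taken from the Requirements section: >=3.9,<3.12.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported(version):
    """Return True if (major, minor) lies in the supported range."""
    # Hypothetical helper for illustration; not part of this repository.
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "NOT supported"
    print(f"Python {current[0]}.{current[1]} is {status}")
```

Run it with the interpreter you intend to pass to poetry env use.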

Setup Environment

poetry env use /path/to/python
poetry install

Prepare Datasets

Download the J-CRe3 annotations.

git clone git@github.com:riken-grp/J-CRe3.git /somewhere/J-CRe3

Download the video and audio files following the instructions in J-CRe3.

Place the data files under the data directory. If you prefer copies to symbolic links, use cp -r instead of ln -s.

mkdir -p data
ln -s /somewhere/J-CRe3/textual_annotations ./data/knp
ln -s /somewhere/J-CRe3/visual_annotations ./data/image_text_annotation
ln -s /somewhere/J-CRe3/id ./data/id
ln -s /somewhere/J-CRe3/recording ./data/recording
ln -s /somewhere/J-CRe3/recording ./data/dataset
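
To confirm that the links above resolve, a short check along the following lines may help. It is a sketch that only tests whether each entry under data is a directory (following symlinks); the function name missing_entries is illustrative and not part of this repository.

```python
from pathlib import Path

# Entries created by the commands above, relative to the data directory.
EXPECTED_ENTRIES = ["knp", "image_text_annotation", "id", "recording", "dataset"]


def missing_entries(data_dir):
    """Return the expected entries that do not resolve to a directory."""
    # Hypothetical helper for illustration; not part of this repository.
    root = Path(data_dir)
    # Path.is_dir() follows symlinks, so a dangling link counts as missing.
    return [name for name in EXPECTED_ENTRIES if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_entries("data")
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means every expected entry exists; it does not verify the contents of the directories.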

Setup GLIP

Follow the instructions in nobu-g/GLIP to set up the GLIP environment. We recommend using Poetry to install the dependencies.

Download the checkpoint files under the GLIP root directory.

cd /somewhere/GLIP
wget https://lotus.kuee.kyoto-u.ac.jp/~ueda/dist/GLIP/OUTPUT.zip
unzip OUTPUT.zip

Setup cohesion analyzer

Clone nobu-g/cohesion-analysis and check out the jcre3 branch.

git clone git@github.com:nobu-g/cohesion-analysis.git
cd cohesion-analysis
git checkout jcre3

Follow the instructions in nobu-g/cohesion-analysis to set up the Python virtual environment.

Download the pre-trained checkpoint file.

wget https://lotus.kuee.kyoto-u.ac.jp/~ueda/dist/cohesion_analysis_v2/model_jcre3_large.bin

Run Prediction

Make a config file for cohesion analysis.

Copy the example config file.

cp configs/cohesion/example.yaml configs/cohesion/default.yaml

Modify the python, project_root, and checkpoint fields in configs/cohesion/default.yaml as needed.

Make a config file for phrase grounding with GLIP.

Copy the example config file.

cp configs/glip/example.yaml configs/glip/ft2_deberta_b24_u3s1_b48_1e.yaml

Modify the python, project_root, name, exp_name, and checkpoint fields in configs/glip/ft2_deberta_b24_u3s1_b48_1e.yaml as needed. The name field should match the config file name without the .yaml extension (ft2_deberta_b24_u3s1_b48_1e).
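
The requirement that name agrees with the config file name is easy to get wrong when copying configs, so a small check may be useful. The sketch below assumes that name is a top-level key written on a single line as name: value; that layout is an assumption about the config, and both helper functions are illustrative rather than part of this repository.

```python
import re
from pathlib import Path


def read_top_level_name(yaml_text):
    """Return the value of a top-level `name:` key, or None if absent."""
    # Assumption: the key is unindented and its value sits on the same line.
    match = re.search(
        r"^name:[ \t]*(.*?)[ \t]*(?:#.*)?$", yaml_text, flags=re.MULTILINE
    )
    if match is None:
        return None
    # Drop surrounding quotes, if any.
    return match.group(1).strip("'\"")


def name_matches_file(config_path):
    """Return True if the `name` field equals the file name without suffix."""
    path = Path(config_path)
    return read_top_level_name(path.read_text(encoding="utf-8")) == path.stem
```

For example, name_matches_file("configs/glip/ft2_deberta_b24_u3s1_b48_1e.yaml") should return True once the field has been set as described.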

Make a config file for the prediction writer.

Copy the example config file.

cp configs/example.yaml configs/default.yaml

Modify the cohesion, glip, and checkpoint fields in configs/default.yaml (the file created by the cp command above) as needed.

Run prediction_writer.py.

[AVAILABLE_GPUS=0,1,2,3] python src/prediction_writer.py -cn default \
    phrase_grounding_model=glip \
    glip=ft2_deberta_b24_u3s1_b48_1e \
    id_file=<(cat data/id/test.id data/id/valid.id) \
    luigi.workers=4
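
The id_file argument above uses process substitution (<(...)), which is available in bash and zsh but not in every shell. One alternative is to write the combined ID list to a regular file first and pass that path instead. The following is a minimal sketch that assumes each .id file holds one scenario ID per line; the function name merge_id_files and the output path in the usage note are illustrative.

```python
from pathlib import Path


def merge_id_files(sources, destination):
    """Concatenate ID files into one file, one ID per line; return the IDs."""
    # Hypothetical helper for illustration; not part of this repository.
    ids = []
    for source in sources:
        for line in Path(source).read_text(encoding="utf-8").splitlines():
            # Skip blank lines and trim surrounding whitespace.
            if line.strip():
                ids.append(line.strip())
    Path(destination).write_text("\n".join(ids) + "\n", encoding="utf-8")
    return ids
```

For example, calling merge_id_files(["data/id/test.id", "data/id/valid.id"], "data/id/test_valid.id") and then passing id_file=data/id/test_valid.id should be equivalent to the command above, with the IDs kept in the same order.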

Run Evaluation

python src/evaluation.py \
  --dataset-dir data/dataset \
  --gold-knp-dir data/knp \
  --gold-annotation-dir data/image_text_annotation \
  --prediction-mmref-dir result/mmref/glip_ft2_deberta_b24_u3s1_b48_1e \
  --prediction-knp-dir result/cohesion \
  --scenario-ids $(cat data/id/test.id) \
  --recall-topk -1 1 5 10 \
  --confidence-threshold 0.0 \
  --column-prefixes rel_type prec rec

Citation

@inproceedings{ueda-2024-jcre3,
  title={J-CRe3: A Japanese Conversation Dataset for Real-world Reference Resolution},
  author={Nobuhiro Ueda and Hideko Habe and Yoko Matsui and Akishige Yuguchi and Seiya Kawano and Yasutomo Kawanishi and Sadao Kurohashi and Koichiro Yoshino},
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  year={2024},
  pages={},
  address={Turin, Italy},
}

@inproceedings{植田2023a,
  author    = {植田 暢大 and 波部 英子 and 湯口 彰重 and 河野 誠也 and 川西 康友 and 黒橋 禎夫 and 吉野 幸一郎},
  title     = {実世界における総合的参照解析を目的としたマルチモーダル対話データセットの構築},
  booktitle = {言語処理学会 第29回年次大会},
  year      = {2023},
  address   = {沖縄},
}
