DLCV-Fall-2024/DLCV-Fall-2024-Final-1-to_be_frank_with_you

DLCV Final Project

How to run your code?

  1. Install requirements and fix library bugs
pip3 install -r ./requirements.txt
pip3 install natten==0.17.3+torch250cu124 -f https://shi-labs.com/natten/wheels/

Inside the site-packages directory of the current Python environment, open transformers/models/dinat/modeling_dinat.py and replace the following lines:

if is_natten_available():
    from natten.functional import natten2dav, natten2dqkrpb
else:
    ...

with

if is_natten_available():
    # Map the legacy names that modeling_dinat.py calls onto NATTEN's na2d_av / na2d_qk.
    from natten.functional import na2d_av, na2d_qk
    natten2dav = lambda attn, value, kernel_size, dilation: na2d_av(attn, value, kernel_size, dilation)
    natten2dqkrpb = lambda query, key, rpb, kernel_size, dilation: na2d_qk(query, key, kernel_size, dilation, rpb=rpb)
else:
    ...
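The manual edit above has to be redone whenever transformers is reinstalled, so here is a minimal sketch that applies it from Python. The helper names patch_source and dinat_module_path are hypothetical (not part of this repository); the script assumes the installed file contains the import line exactly as shown above, with a four-space indent, and refuses to guess if it does not.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path

# Import line as shipped in transformers' modeling_dinat.py (assumed verbatim).
OLD = "    from natten.functional import natten2dav, natten2dqkrpb\n"
# Replacement from the step above: legacy names wrapping na2d_av / na2d_qk.
NEW = (
    "    from natten.functional import na2d_av, na2d_qk\n"
    "    natten2dav = lambda attn, value, kernel_size, dilation: "
    "na2d_av(attn, value, kernel_size, dilation)\n"
    "    natten2dqkrpb = lambda query, key, rpb, kernel_size, dilation: "
    "na2d_qk(query, key, kernel_size, dilation, rpb=rpb)\n"
)


def patch_source(text: str) -> str:
    """Return the patched module source; a no-op if it is already patched."""
    if NEW in text:
        return text
    if OLD not in text:
        raise ValueError("natten import line not found; patch the file manually")
    return text.replace(OLD, NEW, 1)


def dinat_module_path() -> Path:
    """Locate modeling_dinat.py in the active environment without importing transformers."""
    spec = importlib.util.find_spec("transformers")
    if spec is None or spec.origin is None:
        raise RuntimeError("transformers is not installed in this environment")
    return Path(spec.origin).parent / "models" / "dinat" / "modeling_dinat.py"


if __name__ == "__main__":
    path = dinat_module_path()
    path.write_text(patch_source(path.read_text()))
    print(f"patched {path}")
```

Running it twice is harmless, because patch_source returns already-patched source unchanged.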
  2. Download and preprocess the dataset
python -m src.utils.dataset preprocess --split test
  3. Retrieve object info and prepare the RAG database
## <dataset_path> is the directory containing the whole dataset (train, val, test), or the split directories themselves
python -m tools.obj_info --dataset_path <dataset_path>
  4. Prepare parsed RAG data
python -m src.inference_first_stage --name <inference_first_stage | inference_first_stage_val | inference_first_stage_test>

Since this step takes a long time, you can run only inference_first_stage_test in full and run the other configurations on just a subset of samples. Next, use tools/replace.py to generate a coarse answer by randomly replacing some words in the ground truth (you need to set the paths manually inside the file). Finally, run tools/merge_obj_info.py to produce the merged final first-stage answer.
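To illustrate what "randomly replacing some words in the ground truth" means, here is a minimal sketch. It is not the code in tools/replace.py: the function name coarse_answer, the replacement ratio, the vocabulary, and the fixed seed are all hypothetical choices made for this example.

```python
from __future__ import annotations

import random


def coarse_answer(ground_truth: str, vocab: list[str], ratio: float = 0.2, seed: int = 0) -> str:
    """Replace about `ratio` of the words in `ground_truth` with random words from `vocab`."""
    rng = random.Random(seed)  # fixed seed so the coarse answers are reproducible
    words = ground_truth.split()
    n_replace = round(len(words) * ratio)
    # Pick distinct word positions, then overwrite each with a random vocabulary word.
    for i in rng.sample(range(len(words)), k=n_replace):
        words[i] = rng.choice(vocab)
    return " ".join(words)


if __name__ == "__main__":
    # Hypothetical id and answer, only to show the call.
    gt = {"example_general_0": "A white truck is parked on the right side of the road."}
    vocab = ["car", "cone", "pedestrian", "left", "barrier"]
    print({k: coarse_answer(v, vocab, ratio=0.2, seed=42) for k, v in gt.items()})
```

The word count is preserved, so the coarse answer keeps roughly the length and structure of the ground truth while some content words become noise.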

  5. Download weights and configuration files

Download all submission files by running:

bash download_submission.sh

The downloaded files should be organized as follows:

submission/
├── checkpoint/
│   └── latest.pt
└── config.yaml
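A quick way to confirm the download matches the tree above is sketched below. The function missing_files is hypothetical; it only checks that the two files shown in the tree exist under submission/.

```python
from __future__ import annotations

from pathlib import Path

# Expected files, taken from the directory tree above.
EXPECTED = ("checkpoint/latest.pt", "config.yaml")


def missing_files(root: str | Path = "submission") -> list[str]:
    """Return the expected files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("layout OK" if not missing else f"missing: {missing}")
```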
  6. Inference
python -m src.inference --name submission --training_dir submission --ckpt_path latest.pt

The results should be in `outputs/inference//<LAST_TIMESTAMP>/submission.json`.
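Because each run writes into a new timestamped directory, it can help to locate the newest result programmatically. This is a minimal sketch with a hypothetical helper, latest_submission; it picks the most recently modified submission.json anywhere under outputs/inference rather than parsing the timestamp directory names, which is an assumption about how runs are laid out.

```python
from __future__ import annotations

import json
from pathlib import Path


def latest_submission(root: str | Path = "outputs/inference") -> Path:
    """Return the most recently modified submission.json anywhere under `root`."""
    candidates = list(Path(root).rglob("submission.json"))
    if not candidates:
        raise FileNotFoundError(f"no submission.json under {root}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    path = latest_submission()
    preds = json.loads(path.read_text())
    print(f"{path}: {len(preds)} predictions")
```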

Usage

To start working on this final project, clone this repository to your local machine with the following command:

    git clone https://github.com/DLCV-Fall-2024/DLCV-Fall-2024-Final-1-<team_name>.git

Note that you should replace <team_name> with your own team name.

For more details, please refer to the slides of Final Project - Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving. The introduction video for the final project can be accessed in the slides.

After cloning the repository from GitHub, the folder structure should look like this:

CODA-LM/
├── few_shot/
│   ├── scene_few_shot/
│   │   ├── high.json
│   │   └── low.json
│   └── suggestion_few_shot/
│       ├── high.json
│       └── low.json
├── gemini_eval.py
├── llama_eval.py
├── scorer.py
├── requirement.txt
└── ...

Environment Setup

  1. Create a new environment (Python version must be >= 3.10)
conda create -n <your_env_name> python=<python_version>  # <python_version> must be >= 3.10
conda activate <your_env_name>
pip install -r requirement.txt
  2. Install the Gemini API with the following command. For more details, please refer to the Gemini API documentation.
pip install -q -U google-generativeai
  3. You can install any additional packages you need for your final project.

Dataset

Submission Example

Download Dataset

Our data is available on Hugging Face; you can load it with the following code:

from datasets import load_dataset

split = "train"  # or "val" / "test"
dataset = load_dataset("ntudlcv/dlcv_2024_final1", split=split, streaming=True)

The argument split can be one of "train", "val", or "test".

Dataset Format

Each sample has the following format:

{
    "id": "{subset_name}_{question_type}_{index}",
    "image": <PIL image>,
    "conversations": [
        {"from": "human", "value": "input text"},
        {"from": "gpt", "value": "output text"}
    ],
    ...
}

The value of the conversations key follows the same format as LLaVA's instruction-tuning dataset; see LLaVA-Instruct-150K for further details.
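The sketch below shows one way to unpack a sample in this format. The helpers parse_id and question_and_answer are hypothetical; parse_id assumes that question_type contains no underscore and that index is an integer, and question_and_answer assumes a single human/gpt exchange per sample as in the format above.

```python
from __future__ import annotations


def parse_id(sample_id: str) -> tuple[str, str, int]:
    """Split an id of the form {subset_name}_{question_type}_{index}."""
    # rsplit keeps any underscores inside subset_name intact.
    subset_name, question_type, index = sample_id.rsplit("_", 2)
    return subset_name, question_type, int(index)


def question_and_answer(sample: dict) -> tuple[str, str | None]:
    """Return the human prompt and the gpt answer (None if there is no gpt turn)."""
    turns = {turn["from"]: turn["value"] for turn in sample["conversations"]}
    return turns["human"], turns.get("gpt")


if __name__ == "__main__":
    # Hypothetical sample without the image field, only to show the calls.
    sample = {
        "id": "example_general_0",
        "conversations": [
            {"from": "human", "value": "input text"},
            {"from": "gpt", "value": "output text"},
        ],
    }
    print(parse_id(sample["id"]), question_and_answer(sample))
```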

Submission Rules

  • Submit your predicted JSON file at the following link: Codalab
  • The submission file should be a zip file named pred.zip containing the following files:
    • api_key.txt: your valid Gemini API key
    • submission.json: your predicted JSON file (key: id, value: your model's prediction)
    • e.g. Submission Example
  • You can submit up to 5 times per day.
  • For more submission details, please refer to the slides.
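The packaging step can be sketched in Python as follows. The function build_pred_zip is hypothetical; it assumes both files must sit at the root of pred.zip and that submission.json is a flat mapping from id to prediction string, as described in the rules above.

```python
from __future__ import annotations

import json
import zipfile
from pathlib import Path


def build_pred_zip(submission_json: str | Path, api_key: str, out_path: str | Path = "pred.zip") -> Path:
    """Package submission.json and api_key.txt at the root of pred.zip."""
    preds = json.loads(Path(submission_json).read_text())
    # Sanity check: a flat mapping from sample id to prediction text.
    if not isinstance(preds, dict) or not all(isinstance(v, str) for v in preds.values()):
        raise ValueError("submission.json must map each id to a prediction string")
    out_path = Path(out_path)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(submission_json, arcname="submission.json")
        zf.writestr("api_key.txt", api_key.strip())  # drop stray newlines around the key
    return out_path
```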

Evaluation

We provide two evaluation scripts to evaluate the performance of your model on the validation set.

  1. Gemini evaluation: this script is identical to the one used on Codalab.
python3 gemini_eval.py --prediction <your predicted json file> --api_key <your gemini api key>
  2. Llama evaluation: since the Gemini API has daily usage limits for free accounts, we provide a local testing option using LLaMA-3 as the base LLM. Note that llama_eval.py requires approximately 16 GB of GPU memory.
python3 llama_eval.py --prediction <your predicted json file>
  • For the --prediction argument, provide a JSON file whose format is identical to "submission.json" described in Submission Rules.
  • Both scripts report the LLM judge scores and the BLEU scores of your predicted JSON file. The total score is computed as 0.8 * LLM Score + 0.2 * BLEU-3. The output has the following form:
General score: x.xx
Reasoning score: x.xx
Suggestion score: x.xx
LLM judges: x.xx
Bleu_1 score: x.xx
Bleu_2 score: x.xx
Bleu_3 score: x.xx
Bleu_4 score: x.xx
Total score: x.xx
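The sketch below parses a report in the form above and applies the stated formula. The helpers parse_scores and total_score are hypothetical; it assumes that "LLM judges" is the LLM Score and "Bleu_3 score" is the BLEU-3 term in the formula, both on the scale the scripts print, so the recomputed value should agree with "Total score" only up to rounding.

```python
from __future__ import annotations


def parse_scores(report: str) -> dict[str, float]:
    """Parse 'Name: value' lines printed by the evaluation scripts."""
    scores: dict[str, float] = {}
    for line in report.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            scores[name.strip()] = float(value)
        except ValueError:
            continue  # skip lines whose value is not a number (e.g. "x.xx")
    return scores


def total_score(llm_score: float, bleu_3: float) -> float:
    """Total score = 0.8 * LLM Score + 0.2 * BLEU-3."""
    return 0.8 * llm_score + 0.2 * bleu_3


if __name__ == "__main__":
    # Hypothetical report text, only to show the calls.
    report = "LLM judges: 4.00\nBleu_3 score: 0.50\n"
    s = parse_scores(report)
    print(total_score(s["LLM judges"], s["Bleu_3 score"]))
```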

Notes:

  • Since the validation set exceeds the free Gemini API usage limit, we suggest testing on only a small subset of the validation set when using Gemini API evaluation.
  • The results from LLaMA-3 may differ from Gemini's evaluation. Please use LLaMA-3's results only as a reference.
  • Supplementary materials on using the Gemini API and Hugging Face tokens can be found in the slides.
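One way to build such a subset is to sample a fixed number of predictions with a fixed seed, so repeated runs score the same ids. This is a sketch with a hypothetical helper, subsample, and hypothetical file names; it assumes that gemini_eval.py only scores the ids present in the --prediction file, which you should verify against the script before relying on it.

```python
from __future__ import annotations

import json
import random


def subsample(predictions: dict[str, str], n: int, seed: int = 0) -> dict[str, str]:
    """Keep a reproducible random subset of n predictions."""
    rng = random.Random(seed)
    keys = rng.sample(sorted(predictions), k=min(n, len(predictions)))
    return {k: predictions[k] for k in sorted(keys)}


if __name__ == "__main__":
    # Hypothetical file names.
    with open("submission.json") as f:
        preds = json.load(f)
    with open("submission_subset.json", "w") as f:
        json.dump(subsample(preds, n=50), f, indent=2)
```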

Deadline

113/12/26 (Thur.) 23:59 (GMT+8)

Q&A

If you have any problems related to the final project, you may:

  • Use TA hours
  • Contact TAs by e-mail ([email protected])
  • Post your question under [Final challenge 1] Discussion section in NTU Cool Discussion

About

dlcv-fall-2024-final-project-challenge-1-DLCV-Fall-2024-Final-1-6 created by GitHub Classroom
