- Install requirements and fix library bugs
```bash
pip3 install -r ./requirements.txt
pip3 install natten==0.17.3+torch250cu124 -f https://shi-labs.com/natten/wheels/
```
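To confirm that the pinned wheel matches your PyTorch/CUDA build, a quick sanity check (the expected versions below are inferred from the wheel tag above):

```python
# Sanity check: the pinned natten wheel targets torch 2.5.0 + cu124.
import natten
import torch

print(natten.__version__)  # expected: 0.17.3
print(torch.__version__)   # expected: 2.5.x (+cu124)
```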
Inside the `site-packages` directory of the current Python environment, open `transformers/models/dinat/modeling_dinat.py` and replace the following lines
```python
if is_natten_available():
    from natten.functional import natten2dav, natten2dqkrpb
else:
    ...
```

with

```python
if is_natten_available():
    # from natten.functional import natten2dav, natten2dqkrpb
    from natten.functional import na2d_av, na2d_qk

    # Map the legacy natten2d* signatures onto the current na2d_* API.
    natten2dav = lambda attn, value, kernel_size, dilation: na2d_av(attn, value, kernel_size, dilation)
    natten2dqkrpb = lambda query, key, rpb, kernel_size, dilation: na2d_qk(query, key, kernel_size, dilation, rpb=rpb)
else:
    ...
```
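If you would rather not edit files inside `site-packages`, the same shims can be injected at runtime instead. This is a sketch of that alternative (not part of the original instructions); it must run before anything imports the DiNAT modeling code:

```python
# Inject the legacy names into natten.functional *before* transformers
# imports modeling_dinat, so its `from natten.functional import ...` succeeds.
import natten.functional as nf

if not hasattr(nf, "natten2dav"):
    nf.natten2dav = lambda attn, value, kernel_size, dilation: nf.na2d_av(
        attn, value, kernel_size, dilation
    )
    nf.natten2dqkrpb = lambda query, key, rpb, kernel_size, dilation: nf.na2d_qk(
        query, key, kernel_size, dilation, rpb=rpb
    )

import transformers.models.dinat.modeling_dinat  # now imports cleanly
```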
- Download and preprocess dataset
```bash
python -m src.utils.dataset preprocess --split test
```
- Retrieve object info and prepare RAG database
```bash
# Give the path that contains the whole dataset (train, val, test) or the individual splits themselves.
python -m tools.obj_info --dataset_path <dataset_path>
```
- Prepare Parsed RAG data
```bash
python -m src.inference_first_stage --name <inference_first_stage | inference_first_stage_val | inference_first_stage_test>
```

Since this stage takes a long time, you can run the full pass only with `inference_first_stage_test` and run the other two to generate just a few samples. Then use `tools/replace.py` to produce a coarse answer by randomly replacing some words in the ground truth (you need to modify the paths manually in that file); the idea is sketched below. Finally, run `tools/merge_obj_info.py` to obtain the merged final first-stage answer.
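For intuition only, the word-replacement idea behind `tools/replace.py` looks roughly like the sketch below; the actual script's logic, replacement ratio, and paths may differ, and `random_replace` is a hypothetical helper:

```python
import random

def random_replace(text: str, ratio: float = 0.15) -> str:
    """Perturb a ground-truth answer by swapping a fraction of its words
    with random words drawn from the same answer (illustrative only)."""
    words = text.split()
    vocab = list(set(words))
    return " ".join(
        random.choice(vocab) if random.random() < ratio else word
        for word in words
    )
```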
- Download weights and configuration files
Download all the submission files by running:

```bash
bash download_submission.sh
```
The downloaded files should look like this:

```
submission/
├── checkpoint/
│   └── latest.pt
└── config.yaml
```
- Inference
```bash
python -m src.inference --name submission --training_dir submission --ckpt_path latest.pt
```

The results should be in `outputs/inference//<LAST_TIMESTAMP>/submission.json`.
To start working on this final project, clone this repository to your local machine with the following command:

```bash
git clone https://github.com/DLCV-Fall-2024/DLCV-Fall-2024-Final-1-<team_name>.git
```

Note that you should replace `<team_name>` with your own team name.
For more details, please view the slides of Final Project: Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving. The introduction video for the final project can be accessed in the slides.
After cloning the repository from GitHub, the folder structure should look like this:

```
CODA-LM/
├── few_shot/
│   ├── scene_few_shot/
│   │   ├── high.json
│   │   └── low.json
│   └── suggestion_few_shot/
│       ├── high.json
│       └── low.json
├── gemini_eval.py
├── llama_eval.py
├── scorer.py
├── requirement.txt
...
```
- Create a new environment (the Python version must be >= 3.10)

```bash
conda create -n <your_env_name> python=<python_version>  # <python_version> must be >= 3.10
conda activate <your_env_name>
pip install -r requirement.txt
```
- Install the Gemini API package with the following command. For more details, please refer to the Gemini API documentation.

```bash
pip install -q -U google-generativeai
```
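As a quick smoke test (a sketch only; the model name `gemini-1.5-flash` is an assumption, check the Gemini API docs for currently available models):

```python
import google.generativeai as genai

genai.configure(api_key="<your gemini api key>")
model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
print(model.generate_content("Hello, Gemini!").text)
```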
- You can install any additional packages you need for your final project.
Our data is available on Hugging Face; you can load it with the following code:

```python
from datasets import load_dataset

dataset = load_dataset("ntudlcv/dlcv_2024_final1", split=split, streaming=True)
```

The argument `split` can be one of `["train", "val", "test"]`.
Each sample has the following format:

```
{
    "id": "{subset_name}_{question_type}_{index}",
    "image": PIL image,
    "conversations": [
        {"from": "human", "value": "input text"},
        {"from": "gpt", "value": "output text"}
    ], ...
}
```
The value of the key `conversations` follows the same format as LLaVA's instruction-tuning dataset; see LLaVA-Instruct-150K for further details.
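Putting this together, a minimal sketch for inspecting one streamed sample (exact `id` values follow the naming above but may differ):

```python
from datasets import load_dataset

# Stream the validation split and inspect the first sample.
dataset = load_dataset("ntudlcv/dlcv_2024_final1", split="val", streaming=True)
sample = next(iter(dataset))

print(sample["id"])                   # "{subset_name}_{question_type}_{index}"
sample["image"].save("example.jpg")   # "image" is a PIL image
for turn in sample["conversations"]:  # LLaVA-style conversation turns
    print(f"{turn['from']}: {turn['value'][:80]}")
```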
- You need to submit your predicted json file to the following link: Codalab
- The submission file should be a `zip` file named `pred.zip` containing the following files:
- You can submit up to 5 times per day.
- For more submission details, please refer to the slides.
We provide two evaluation scripts to evaluate the performance of your model on the validation set.

Gemini evaluation: this file is identical to the one we use on Codalab.

```bash
python3 gemini_eval.py --prediction <your predicted json file> --api_key <your gemini api key>
```

Llama evaluation: since the Gemini API has daily usage limits for free accounts, we provide a local testing option using LLaMA-3 as the LLM base model. Note that running `llama_eval.py` requires approximately 16 GB of GPU memory.

```bash
python3 llama_eval.py --prediction <your predicted json file>
```
- For the argument `--prediction`, you should provide a JSON file whose format is identical to the `submission.json` described in Submission Rules.
- Both scripts will return the LLM-judge scores and BLEU scores of your predicted JSON file. The Total score is calculated by the following formula: `0.8 * LLM Score + 0.2 * BLEU-3`
```
General score: x.xx
Reasoning score: x.xx
Suggestion score: x.xx
LLM judges: x.xx
Bleu_1 score: x.xx
Bleu_2 score: x.xx
Bleu_3 score: x.xx
Bleu_4 score: x.xx
Total score: x.xx
```
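As a worked example of the formula (all numbers here are hypothetical):

```python
# Hypothetical scores, purely to illustrate the 0.8 / 0.2 weighting.
llm_score = 6.50  # LLM-judge score
bleu_3 = 0.45     # BLEU-3 score
total = 0.8 * llm_score + 0.2 * bleu_3
print(f"Total score: {total:.2f}")  # -> 5.29
```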
Notes:
- Since the validation set is larger than the free Gemini API quota allows, we suggest testing with only a small subset of the validation set when using Gemini evaluation; one way to build such a subset is sketched after these notes.
- The results from LLaMA-3 may differ from Gemini's evaluation. Please use LLaMA-3's results only as a reference.
- The supplementary materials on using the Gemini API and Hugging Face tokens can be found in the slides.
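A minimal sketch for building such a subset, assuming your prediction file is a JSON object keyed by sample `id` (adjust if your `submission.json` uses a different layout):

```python
import json

N = 50  # number of validation samples to keep (chosen arbitrarily)

with open("submission.json") as f:
    predictions = json.load(f)

# Keep only the first N predictions to stay under the free Gemini quota.
subset = dict(list(predictions.items())[:N])

with open("submission_subset.json", "w") as f:
    json.dump(subset, f, indent=2)
```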
Deadline: 113/12/26 (Thu.) 23:59 (GMT+8)
If you have any problems related to the Final Project, you may
- Use TA hours
- Contact TAs by e-mail ([email protected])
- Post your question under the [Final challenge 1] Discussion section in NTU Cool Discussion