
Commit

New abstract
mattolson93 authored Mar 8, 2024
1 parent 10ca30e commit 789ec0b
Showing 1 changed file with 7 additions and 4 deletions.
lvlm_interpret/index.html: 7 additions & 4 deletions
@@ -157,10 +157,13 @@ <h4 class="subtitle has-text-centered">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
-As multi-modal large language models gain popularity, there is a growing need to delve into their internal workings. There have been multiple advancements in the field of explainability tools and mechanisms. Our application highlights the significance of studying the internal mechanisms of these models.
-Our tool aims to enhance understanding of the relevant image parts that were used to generate an answer and of how well the language model grounds its output in the image.
-Such explorations help uncover shortcomings, leading to improvements in system capabilities.
-In this work, we showcase how this application can help to better understand a failure mechanism in a popular large multi-modal model: LLaVA.
+In the rapidly evolving landscape of artificial intelligence, multi-modal large language models are emerging as a significant area of interest.
+These models, which combine various forms of data input, are becoming increasingly popular. However, understanding their internal mechanisms remains a complex task.
+Numerous advancements have been made in the field of explainability tools and mechanisms, yet there is still much to explore.
+In this work, we present a novel interactive application aimed at understanding the internal mechanisms of these models.
+Our interface is designed to enhance the interpretability of the image patches that are instrumental in generating an answer, and to assess how well the language model grounds its output in the image.
+With our application, a user can systematically investigate the model and uncover system limitations, paving the way for enhancements in system capabilities.
+This work presents a case study of how our application can aid in understanding a failure mechanism in a popular large multi-modal model: LLaVA.
</p>

</div>
