- Regenerated concreteness plots for TVQA/AVSD/PVSE: see
plots_n_stats/all/2_improved_concreteness_distribution
- Observations (a toy sketch of this kind of plot follows below):
- Feature concreteness for each dataset appears intuitive, in particular the vcpts from TVQA
- Much of AVSD is concrete. Very interesting
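As a rough illustration of what these plots measure, here is a minimal, hypothetical sketch of producing a concreteness histogram from a norm lookup. The `norm_dict` mapping, the toy scores, and the [0, 1] range are stand-ins for the real Word2Norm dictionary built in this repository, not the actual plotting pipeline.

```python
# Minimal sketch: plot a concreteness distribution for a dataset's vocabulary.
# Assumes `norm_dict` maps lower-cased words to a concreteness score in [0, 1];
# the real lookup is the Word2Norm dictionary in misc/all_norms.pickle.
import matplotlib.pyplot as plt

def concreteness_scores(tokens, norm_dict):
    """Collect concreteness scores for every token with a known norm."""
    return [norm_dict[t.lower()] for t in tokens if t.lower() in norm_dict]

# Toy values; in practice `tokens` would be a dataset's vocabulary (TVQA/AVSD/PVSE).
norm_dict = {"chair": 0.98, "justice": 0.12, "dog": 0.95, "idea": 0.08}
scores = concreteness_scores(["chair", "dog", "justice", "idea", "the"], norm_dict)

plt.hist(scores, bins=20, range=(0.0, 1.0))
plt.xlabel("Concreteness")
plt.ylabel("Token count")
plt.title("Concreteness distribution (toy example)")
plt.savefig("concreteness_distribution.png")
```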
(Recent History)
- TVQA statistics (early October)
- Kastner's thesis (mid October)
- Create the word norm dictionary (late October)
- Clean, centralise, and make the GitHub repo 'clone ready' (late October)
- Check examples of datasets with massively improved norm compilation: TVQA, AVSD, PVSE (late October)
- Polish and extend norm-dict functionality (end of October)
- Redo the concreteness distribution for TVQA, AVSD, and PVSE (beginning of November)
- Isolate norm information in existing models, using the Hopfield-network BERT analysis (end of November)
- Create associative/categorical network draft (NEXT)
- Norm pretraining experimental results (NEXT)
- TVQA
- AVSD
- PVSE
- Network refinement ...
- Vcpts and regional features from natural co-reference: a segmentation map for one part and co-reference for another!
- Object Oriented Ontology
- Get the QA pairs for a good number of abstract concepts
- TVQA statistics with BERT
- Redo TVQA's 1_data_concreteness_distribution, this time with the extended full norm dictionary
- Create larger transformer model
- Hopfield networks for associative learning
- Dean as an honorable mention for pointing out Hopfield networks to me
- (Hudson/Dean suggestion) (Pending)
- Emulate the Hopfield-network idea: take softmaxes over the concepts I have and see how close they are (see the sketch after this list)
- Follow this up by associatively overtraining these elements to create proper convergence around these fuzzy terms
- Do something similar for concrete terms
- Find out which fine-tuned components are specialising in what
- Storing patterns is potentially better suited to associating linguistic labels with concrete instantiations of objects
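For the Hopfield idea above, here is a minimal numpy sketch of the retrieval step from modern Hopfield networks (Ramsauer et al., "Hopfield Networks is All You Need"), assuming concepts are stored as embedding vectors. The embeddings, dimensions, and `beta` value are all illustrative, not taken from this codebase.

```python
# Minimal sketch of modern Hopfield retrieval: a query is compared against
# stored patterns via a softmax, and retrieval is the softmax-weighted sum
# of those patterns. Sharp softmaxes suggest a concept converges to a single
# stored pattern; fuzzy ones suggest it sits between several.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def hopfield_retrieve(query, patterns, beta=8.0):
    """One retrieval step: softmax(beta * patterns @ query) @ patterns.

    patterns: (N, d) matrix of stored concept embeddings (hypothetical).
    beta: inverse temperature; larger beta -> sharper convergence.
    """
    attention = softmax(beta * patterns @ query)  # closeness of query to each concept
    return attention @ patterns, attention

# Toy example with random "concept" embeddings; a noisy query should converge
# towards its nearest stored concept after a few retrieval steps.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(5, 16))
query = patterns[2] + 0.5 * rng.normal(size=16)
for _ in range(3):
    query, attn = hopfield_retrieve(query, patterns)
print(attn.round(3))  # mass should concentrate on pattern 2
```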
- This project was initially inspired by the dual coding theory paradigm.
- Neurologists and psycholinguists have worked hard to isolate different properties of the words, concepts, and information we store in our brains, the so-called psycholinguistic word norms, e.g. concreteness, imageability, dominance...
- Of particular note is concreteness: "specific, definite and vivid", which is considered on a spectrum against abstractness: "vague, immaterial and incorporeal".
- Put far too simply, the most advanced intelligence we know of (the human brain) apparently sees fit to store and handle concrete and abstract words and concepts in structurally different representations.
- If the brain decides to engineer itself with these priors in mind, we presumptive explorers of intelligence would perhaps do well to consider how this may guide our comparatively clumsy efforts in modern machine learning.
- `avsd`: AVSD implementation
- `misc`: Holds images, the norm dictionary, and other various single-use files
- `myutils.py`: Helper functions
- `results`: Where runs are stored
- `tvqa`: TVQA implementation
- `word_norms.py`: Handling code for word norm processing and dictionary creation
- `plots`: Directory for statistics and plots from each dataset (Noura and co. should pay attention)
- `scripts`: A central location for example scripts demonstrating the functionality of all code in this repository
Don't bullet points just make everything nicer?
- Big(gest?) Norm Dictionary: We centralise many of the existing norm databases into one flexible and extensive resource. It includes the concreteness values we focus on, plus many, many more norms that others may find useful. To the best of our knowledge, this is the largest single compilation of word-norm databases available in code.
- To be confirmed
The AVSD and PVSE implementations are directly adapted from the official repositories. The TVQA implementation is one we used in another of our projects (which is in turn adapted from the original repository). We thank and appreciate the authors of these repositories for their well-documented implementations. If, in using our implementation here, you use any of the features from these three implementations, please credit and cite the original authors and implementations as they ask.
Example scripts for running the various experiments in this repository are centralised in `scripts`.
- Clone the repo: `git clone [email protected]:Jumperkables/a_vs_c.git`
- Central `a_vs_c` virtual env: `pip install -r requirements.txt`. You will have to edit the running scripts to source your virtual environment. You may find it useful to create a symlink: `ln -s /your/virtual/envs venvs`
- The word norm dictionary: use the supplied pickle file yourself, or generate your own and browse the other leftover norms those datasets have to offer (a loading sketch follows this list):
- `misc/all_norms.pickle` (the `Word2Norm` class is defined in `word_norms.py`)
- Gather a_vs_c data into a single directory. Follow the links below, and cross-check with the path arguments in `word_norms.py` for the appropriate subdirectory names: `ln -s /single/directory/you/just/made data` (this will take some time). When you're done, run `scripts/extraction/create_norm_dict.sh`
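A minimal sketch of loading and querying the norm dictionary. Run it from the repository root so pickle can resolve the `Word2Norm` class; the attribute and key names below are illustrative guesses, so check `word_norms.py` for the real interface.

```python
# Minimal sketch: load the compiled norm dictionary and look up a word.
# Run from the repository root so pickle can resolve the Word2Norm class
# defined in word_norms.py.
import pickle

with open("misc/all_norms.pickle", "rb") as f:
    norm_dict = pickle.load(f)  # a Word2Norm instance

# Hypothetical access pattern: a per-word mapping of norm name -> value.
word_entry = getattr(norm_dict, "words", {}).get("chair", {})  # assumed attribute
print(word_entry.get("concreteness"))  # assumed norm key
```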
You will need to prepare several virtual environments for the different implementations. You may skip any of these if you don't plan on using those parts of my implementation.
- Best to make some results directories: `mkdir results results/avsd results/pvse results/tvqa`
- `avsd` virtual env: `pip install -r avsd/requirements.txt`
- Follow the data download instructions in `avsd/README.md`
- Take your pick: `scripts/avsd/runs`
- `pvse` virtual env: `pip install -r pvse/requirements.txt`
- Follow the data download instructions in `pvse/README.md`
- Additionally: `ln -s /where/you/saved/pvse/data pvse/data`
- Bon appétit: `scripts/pvse/runs`
A bit more involved, because you'll have to set up my other TVQA repository and symlink it in here.
- In a different location or directory (anywhere really), follow my full instructions for setting up my tvqa_modality_bias repo. MAKE SURE TO CLONE THE a-vs-c BRANCH, NOT MASTER
- Back in the root `a_vs_c` repo, create a symlink to the TVQA repo you just installed: `ln -s /path/to/tvqa_modality_bias tvqa/tvqa_modality_bias`
- Create a symlink in the `tvqa_modality_bias/models` directory to the overall a_vs_c top-level `models` directory, to allow imports from my custom models: `ln -s /path/to/a_vs_c/models tvqa_modality_bias/models/a-vs-c_models`
- Take your pick: `scripts/tvqa/runs`
The norm dictionary we created (`misc/all_norms.pickle`) is compiled from the following sources. Note that the links are not all the official ones:
- MT40k
- USF
- MRC
- SimLex999
- Vinson
- McRae
- SimVerb
- CP (includes and extends from the PYM dataset)
- TWP
- Battig
- Cortese
- Imageability Corpus
- Reilly's compilation from "Formal Distinctiveness of High- and Low-Imageability Nouns: Analyses and Theoretical Implications" (contact the author to request access)
- Sianpar's Indonesian Norms (property norms for Indonesian, but English translations are included)
- Chinese Word Norm Corpus (norms for Chinese words, with English translations)
- MEGAHR Facebook Cross-Lingual
- Glasgow Word Norms
- CSLB (the property norms are too specific for our use)
- imSitu (wordy and specific descriptions of images)
- EViLBERT (embeddings and images of non-concrete concepts)
- Instructions for TVQA vocab changing
- Double-check that AVSD doesn't need extra external work done
- Mention and thank Remi's multimodal package.
Published at: TBC
@inproceedings{avscmm,
title={TBC},
author={Winterbottom, T. and Xiao, S. and McLean, A. and Al Moubayed, N.},
booktitle={},
year={202X}
}
Feel free to contact me at [email protected] if you have any criticisms you'd like me to hear out, or if you would like any help.