This repo contains the implementation of the 3-dimensional Testing Protocols (HoldOut, ACD and FewShot) for CompMCTG. The main repo can be found here: https://github.com/tqzhong/CG4MCTG
2 attributes: "sent", "topic"
- "sent"$\in${"pos","neg"}
- "topic"$\in${"books", "clothing", "music", "electronics", "movies", "sports"}
4 attributes: "sentiment", "gender", "cuisine", "tense"
- "sentiment"$\in${"Pos","Neg"}
- "gender"$\in${"Male","Female"}
- "cuisine"$\in${"Asian","American","Mexican","Bar","Dessert"}
- "tense"$\in${"present","past"}
Below is an example:
{
"gender": "Male",
"sentiment": "Pos",
"cuisine": "Bar",
"tense": "Present",
"review": "love going here for happy hour or dinner ! great patio with fans to beat the stl heat ! also ... very accomodating at this location . i like the veal milanese but with mixed greens instead of pasta ! they 'll modify the menu to suit your taste !\n"
}
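A record like the one above can be checked against the attribute lists with a few lines of Python. This is a minimal sketch, not code from the repo: the helper name `invalid_attributes` is hypothetical, the review text is shortened, and values are compared case-insensitively because the example record uses "Present" while the list gives "present".

```python
import json

# Allowed values copied from the 4-attribute list above.
ALLOWED = {
    "sentiment": {"Pos", "Neg"},
    "gender": {"Male", "Female"},
    "cuisine": {"Asian", "American", "Mexican", "Bar", "Dessert"},
    "tense": {"present", "past"},
}


def invalid_attributes(record):
    """Return the names of attributes whose value is missing or not allowed.

    Hypothetical helper; the comparison ignores case (an assumption).
    """
    bad = []
    for name, values in ALLOWED.items():
        allowed_lower = {v.lower() for v in values}
        value = record.get(name)
        if not isinstance(value, str) or value.lower() not in allowed_lower:
            bad.append(name)
    return bad


# Shortened copy of the example record (review text truncated here).
record = json.loads(
    '{"gender": "Male", "sentiment": "Pos", "cuisine": "Bar", '
    '"tense": "Present", "review": "love going here for happy hour or dinner !"}'
)
print(invalid_attributes(record))  # prints []
```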
3 attributes: "sentiment", "pronoun", "tense"
- "sentiment"$\in${"pos","neg"}
- "pronoun"$\in${"plural","singular"}
- "tense"$\in${"present","past"}
2 attributes: "sentiment", "topic_cged"
- "sentiment"$\in${"pos","neg"}
- "topic_cged"$\in${"imdb", "opener", "tablets", "auto"}
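Each attribute set above defines a grid of attribute combinations, which is what the testing protocols split. The sketch below enumerates those grids to show their sizes. The dictionary keys are hypothetical labels built from the attribute names, and `attribute_combinations` is an illustrative helper, not a function from the repo.

```python
from itertools import product

# Attribute values copied from the lists above; the keys are hypothetical
# labels, not dataset names taken from the repo.
SCHEMAS = {
    "sent+topic": {
        "sent": ["pos", "neg"],
        "topic": ["books", "clothing", "music", "electronics", "movies", "sports"],
    },
    "sentiment+gender+cuisine+tense": {
        "sentiment": ["Pos", "Neg"],
        "gender": ["Male", "Female"],
        "cuisine": ["Asian", "American", "Mexican", "Bar", "Dessert"],
        "tense": ["present", "past"],
    },
    "sentiment+pronoun+tense": {
        "sentiment": ["pos", "neg"],
        "pronoun": ["plural", "singular"],
        "tense": ["present", "past"],
    },
    "sentiment+topic_cged": {
        "sentiment": ["pos", "neg"],
        "topic_cged": ["imdb", "opener", "tablets", "auto"],
    },
}


def attribute_combinations(schema):
    """Return every attribute combination as a dict (Cartesian product)."""
    names = list(schema)
    return [dict(zip(names, values)) for values in product(*schema.values())]


for label, schema in SCHEMAS.items():
    print(label, len(attribute_combinations(schema)))
# sent+topic 12
# sentiment+gender+cuisine+tense 40
# sentiment+pronoun+tense 8
# sentiment+topic_cged 8
```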
For usage, refer to the examples in test_load_dataset.py; the loading code itself is in load_dataset.py.
- Construction of the classifier data: training/dev/testing (70% : 15% : 15%)
- Construction of the generator training data: HoldOut/MCD(max-avg-min)/FewShot(max-avg-min)
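The two constructions above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repo's code: `split_classifier_data` and `holdout_split` are hypothetical names, the shuffle seed is arbitrary, and the sketch assumes that HoldOut means withholding some attribute combinations from the generator's training data. How the repo selects those combinations (or builds the MCD and FewShot splits) is not covered here.

```python
import random


def split_classifier_data(records, seed=0):
    """Shuffle and split records into train/dev/test at 70% / 15% / 15%."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(shuffled) * 70 // 100
    n_dev = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test


def holdout_split(records, attributes, held_out):
    """Separate records by whether their attribute combination is held out.

    `held_out` is a collection of value tuples ordered like `attributes`.
    Returns (seen, unseen): records for training and held-out records.
    """
    held = set(held_out)
    seen, unseen = [], []
    for record in records:
        combo = tuple(record[name] for name in attributes)
        (unseen if combo in held else seen).append(record)
    return seen, unseen


# Toy data: 100 records cycling over the 8 sentiment/topic_cged combinations.
sentiments = ["pos", "neg"]
topics = ["imdb", "opener", "tablets", "auto"]
data = [
    {"sentiment": sentiments[i % 2], "topic_cged": topics[(i // 2) % 4], "id": i}
    for i in range(100)
]

train, dev, test = split_classifier_data(data)
print(len(train), len(dev), len(test))  # prints 70 15 15

seen, unseen = holdout_split(data, ["sentiment", "topic_cged"], [("pos", "imdb")])
```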
The implementation is based on Google Research's implementation of TMCD (https://github.com/google-research/language/tree/master/language/compgen/nqg, "Compositional Generalization and Natural Language Variation: Can a Semantic Parsing Approach Handle Both?", ACL 2021).
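For orientation, MCD-style splits are usually scored with a compound divergence based on the Chernoff coefficient. The sketch below shows that measure in isolation. It is an assumption that this repo uses the same formula, the default `alpha=0.1` is the value commonly used for compounds, and treating attribute pairs as "compounds" is a hypothetical stand-in for whatever the repo actually counts.

```python
from collections import Counter


def compound_divergence(counts_a, counts_b, alpha=0.1):
    """Return 1 - sum_k p_k**alpha * q_k**(1 - alpha).

    p and q are the normalized compound frequencies of the two splits.
    The result is 0 for identical distributions and 1 for disjoint ones.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both splits need at least one compound")
    coefficient = 0.0
    for compound, count_a in counts_a.items():
        count_b = counts_b.get(compound, 0)
        if count_b:  # compounds missing from either split contribute 0
            p = count_a / total_a
            q = count_b / total_b
            coefficient += p ** alpha * q ** (1 - alpha)
    return 1.0 - coefficient


# Hypothetical compounds: (sentiment, tense) pairs seen in each split.
train_counts = Counter({("pos", "present"): 30, ("neg", "past"): 30})
test_counts = Counter({("pos", "past"): 10, ("neg", "present"): 10})

print(compound_divergence(train_counts, test_counts))  # prints 1.0
```

A split search in this style would look for a train/test partition that pushes this value toward the desired level (max, avg, or min) while keeping the single-attribute distributions similar.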