Tutorial: Creating a simple visualization input file
This tutorial creates a VULCAN visualization input file for the Little Prince AMR corpus, visualizing strings and graphs.
Download the Little Prince AMR corpus from https://amr.isi.edu/download/amr-bank-struct-v3.0.txt; we will need it below.
You also need the penman package. It is part of the default VULCAN setup, but if you don't have it installed yet, run
pip install penman
We use the penman package to load the AMR corpus (let's say its location is stored in the variable little_prince_path):
from penman import load
amrs = load(little_prince_path)
This gives us a list of all the AMRs in the corpus, over which we can now iterate.
First, we make a PickleBuilder object (you can find that class here). We want to visualize graphs and sentences, so we simply title them "Graph" and "Sentence". We need to specify what format they are in, choosing an option from this list. VULCAN supports the penman Graph object with the format name "graph". Our sentences are plain (untokenized) strings, so we can use the "string" format name, which applies simple tokenization based on whitespace. We pass all this information to the PickleBuilder constructor, using a dictionary that maps our object names/titles to their format names:
pb = PickleBuilder({"Graph": "graph", "Sentence": "string"})
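As a rough illustration of what whitespace-based tokenization means for the "string" format, the sketch below splits a sentence with Python's built-in str.split. It only approximates the idea; VULCAN's own tokenization code is not shown here and may differ in details. The example sentence is made up.

```python
# Illustrative only: splitting on whitespace with the standard library.
# This is an assumption about what "simple tokenization based on whitespace"
# looks like, not VULCAN's actual implementation.
sentence = "The little prince sat down on a stone ."
tokens = sentence.split()  # splits on any run of whitespace

print(tokens)
```

Because the corpus sentences already separate punctuation with spaces, a plain whitespace split yields one token per word or punctuation mark.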
We now iterate over the corpus, adding instances to the PickleBuilder as we go:
for amr in amrs:
    sentence = amr.metadata["snt"]
    pb.add_instance_by_name("Graph", amr)
    pb.add_instance_by_name("Sentence", sentence)
Finally, we write the data to a pickle file (assuming the file path is in the variable output_path):
pb.write(output_path)
Using the above code, we can write the following script converting the AMR corpus into a pickle file:
import sys

from penman import load
from vulcan.pickle_builder.pickle_builder import PickleBuilder


def main():
    little_prince_path = sys.argv[1]
    output_path = sys.argv[2]

    # Load all AMRs in the corpus as penman Graph objects.
    amrs = load(little_prince_path)

    # Map each object name/title to its format name.
    pb = PickleBuilder({"Graph": "graph", "Sentence": "string"})

    for amr in amrs:
        sentence = amr.metadata["snt"]
        pb.add_instance_by_name("Graph", amr)
        pb.add_instance_by_name("Sentence", sentence)

    pb.write(output_path)


if __name__ == "__main__":
    main()
Try it yourself! You can also find the above code in the VULCAN repo, here.
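The script reads its two paths directly from sys.argv, so running it with a missing argument ends in an IndexError. One possible refinement, sketched below with the standard-library argparse module, gives a usage message instead. The function name parse_args and the argument names corpus_path and output_path are chosen here for illustration and are not part of VULCAN.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional paths; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Convert an AMR corpus into a VULCAN input file."
    )
    parser.add_argument("corpus_path", help="path to the AMR corpus file")
    parser.add_argument("output_path", help="where to write the output file")
    return parser.parse_args(argv)


# Example call with an explicit argument list (file names are placeholders).
args = parse_args(["amr-bank-struct-v3.0.txt", "little_prince.pickle"])
print(args.corpus_path, args.output_path)
```

Inside main, args.corpus_path and args.output_path would then take the place of sys.argv[1] and sys.argv[2].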
You might want to store the data in a JSON file instead of a pickle file, for example because JSON files are (to some extent) human-readable. For JSON, we need to convert everything into strings first, which the following code does:
import sys

from penman import load, encode
from vulcan.pickle_builder.pickle_builder import PickleBuilder


def main():
    little_prince_path = sys.argv[1]
    output_path = sys.argv[2]

    amrs = load(little_prince_path)

    # "graph_string" tells VULCAN that graphs arrive as strings, not Graph objects.
    pb = PickleBuilder({"Graph": "graph_string", "Sentence": "string"})

    for amr in amrs:
        sentence = amr.metadata["snt"]
        # Serialize the Graph object to a PENMAN-notation string for JSON.
        amr_string = encode(amr)
        pb.add_instance_by_name("Graph", amr_string)
        pb.add_instance_by_name("Sentence", sentence)

    pb.write_as_json(output_path)


if __name__ == "__main__":
    main()
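The reason for the string conversion can be seen with the standard-library json module alone: it refuses arbitrary Python objects but accepts strings. The sketch below uses a hypothetical ToyGraph class as a stand-in for a graph object (it is not part of penman or VULCAN), and the dictionary layout is illustrative, not VULCAN's actual file format.

```python
import json


class ToyGraph:
    """Hypothetical stand-in for a graph object; not a penman or VULCAN class."""

    def __init__(self, text):
        self.text = text


graph = ToyGraph("(c / chapter :mod 1)")

# json cannot serialize arbitrary objects and raises TypeError for them.
try:
    json.dumps({"Graph": graph})
    object_is_serializable = True
except TypeError:
    object_is_serializable = False

# Once the graph is represented as a string, serialization works and round-trips.
as_json = json.dumps({"Graph": graph.text, "Sentence": "Chapter 1 ."})
restored = json.loads(as_json)

print(object_is_serializable)
print(restored["Graph"])
```

In the script above, penman's encode plays the role of this conversion, turning each Graph object into its PENMAN-notation string.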