
dataset explanation #3

Open · zhouhoo opened this issue Jan 16, 2017 · 3 comments

Comments

zhouhoo commented Jan 16, 2017

Thank you for your great work. While studying your code, I was confused by the WN18.bin dataset: why are the entities in it all numbers? What do they stand for?


vfrico commented Jan 23, 2017

The WN18.bin dataset is a WordNet subset. I'm not entirely sure, but I'd bet that @mnick uses the same datasets as A. Bordes (https://everest.hds.utc.fr/doku.php?id=en:transe). There you can get the two datasets used in the experiments. Try generating WN18.bin (or another binary) using Python's pickle.


sharifza commented Apr 8, 2018

It seems like a preprocessing step was done before pickling the file. In training, the data is unpickled like this:

    with open(self.args.fin, 'rb') as fin:
        data = pickle.load(fin)

    N = len(data['entities'])
    M = len(data['relations'])
    sz = (N, N, M)

    true_triples = data['train_subs'] + data['test_subs'] + data['valid_subs']
    if self.args.mode == 'rank':
        self.ev_test = self.evaluator(data['test_subs'], true_triples, self.neval)
        self.ev_valid = self.evaluator(data['valid_subs'], true_triples, self.neval)
    elif self.args.mode == 'lp':
        self.ev_test = self.evaluator(data['test_subs'], data['test_labels'])
        self.ev_valid = self.evaluator(data['valid_subs'], data['valid_labels'])

    xs = data['train_subs']
    ys = np.ones(len(xs))

Now, if I want to train on, for example, the FB15k-237 dataset (which only contains the three files "train.txt", "test.txt", and "valid.txt"), I first have to build an object containing test_subs, train_subs, valid_subs, entities, and relations, and then pickle it. I wish this preprocessing code were available; otherwise, I believe we need to write it ourselves before testing on any new dataset.
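Something along these lines might work as that missing preprocessing step. This is only a sketch: the dict keys ('entities', 'relations', '*_subs') are taken from the unpickling snippet above, but the tab-separated "head relation tail" file layout and the (head_id, tail_id, relation_id) tuple order inside the *_subs lists are assumptions that would need to be checked against the repository's actual convention.

```python
import pickle

def read_triples(path):
    # Assumed file layout: one "head<TAB>relation<TAB>tail" triple per line.
    with open(path) as f:
        return [tuple(line.strip().split('\t')) for line in f if line.strip()]

def build_dataset(train_path, valid_path, test_path, out_path):
    splits = {name: read_triples(p)
              for name, p in [('train', train_path),
                              ('valid', valid_path),
                              ('test', test_path)]}

    # Index every entity and relation seen in any split.
    entities, relations = set(), set()
    for triples in splits.values():
        for h, r, t in triples:
            entities.update((h, t))
            relations.add(r)
    ent_id = {e: i for i, e in enumerate(sorted(entities))}
    rel_id = {r: i for i, r in enumerate(sorted(relations))}

    def to_subs(triples):
        # Assumed tuple order: (head_id, tail_id, relation_id).
        return [(ent_id[h], ent_id[t], rel_id[r]) for h, r, t in triples]

    data = {
        'entities': sorted(entities),
        'relations': sorted(relations),
        'train_subs': to_subs(splits['train']),
        'valid_subs': to_subs(splits['valid']),
        'test_subs': to_subs(splits['test']),
    }
    with open(out_path, 'wb') as fout:
        pickle.dump(data, fout)
    return data
```

Integer entity IDs would also explain why WN18.bin contains only numbers: the original string identifiers are replaced by their indices before pickling.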

@aayushee

Hi,
Thanks for your explanation of how the WN18 dataset is unpickled in the code. Were you able to preprocess the FB15k-237 dataset and generate HolE embeddings for it? I am not able to relate the example entity numbers for WN18 in this repository to the ones in the WN18 dataset here: https://everest.hds.utc.fr/doku.php?id=en:transe
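One way to investigate the mismatch is to unpickle the file and look at the raw contents directly. A minimal sketch, assuming only the key names that appear in the training snippet quoted earlier in this thread (I have not verified this against the actual WN18.bin):

```python
import pickle

def inspect_dataset(path, n=5):
    # Load a pickled dataset in the format the training code above expects
    # and print its top-level keys plus a few entities and relations,
    # so the numeric IDs can be compared against the external WN18 files.
    with open(path, 'rb') as fin:
        data = pickle.load(fin)
    print('keys:', sorted(data.keys()))
    print('first entities:', data['entities'][:n])
    print('first relations:', data['relations'][:n])
    return data
```

If the entity numbers turn out to be positions in a sorted list rather than WordNet synset offsets, that would explain why they don't line up with the identifiers in the Bordes datasets.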
