References directories to compare apples and apples #53

jnothman · 2014-06-23T17:46:48Z

I propose that under `references/` we divide the system outputs into directories representing the different task settings:

- `references/gold-mentions`: the system attempted to link all gold mentions, including NILs (?schwa-linkable)
- `references/gold-linked-mentions`: the system attempted to link only the gold mentions that have a link (aida, houlsby)
- `references/system-mentions`: the system identified its own mentions (schwa, tagme)
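As a rough illustration, the three proposed settings can be told apart by two yes/no questions about the system's input. The sketch assumes hypothetical flags `uses_gold_mentions` and `includes_nil_mentions`; no such metadata exists in the repository, and the function name is made up for this example.

```python
from enum import Enum


class Setting(Enum):
    # Directory names taken from the proposal above.
    GOLD_MENTIONS = "gold-mentions"
    GOLD_LINKED_MENTIONS = "gold-linked-mentions"
    SYSTEM_MENTIONS = "system-mentions"


def reference_dir(uses_gold_mentions: bool, includes_nil_mentions: bool) -> str:
    """Pick the references/ subdirectory for a system output (hypothetical helper).

    uses_gold_mentions: the system was given the gold mention spans as input.
    includes_nil_mentions: that input also contained gold mentions with no link.
    The second flag is ignored when the system found its own mentions.
    """
    if not uses_gold_mentions:
        setting = Setting.SYSTEM_MENTIONS
    elif includes_nil_mentions:
        setting = Setting.GOLD_MENTIONS
    else:
        setting = Setting.GOLD_LINKED_MENTIONS
    return f"references/{setting.value}"
```

For example, an AIDA-style run over linked gold mentions only would be `reference_dir(True, False)`.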

Even so, entries within a directory may not be fully comparable with one another. For example, we could subdivide `system-mentions` into systems that produce NE mentions only (schwa) and those that also include other wikilinks (tagme); we could subdivide `gold-mentions` according to whether the system had access to CoNLL 2003 entity-type annotations (although this may be harder to infer).

There is also the question of whether the directory structure should similarly be utilised to label (a) the corpus being evaluated (e.g. CoNLL vs ?IITB; testa vs testb), and (b) the ID mapping.
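If corpus and split were encoded too, one possible layout is `references/<setting>/<corpus>-<split>/`, with an optional ID-mapping level beneath it. This layout is a hypothetical illustration, not something the thread has settled; the function below only builds such paths.

```python
from pathlib import PurePosixPath
from typing import Optional


def reference_path(setting: str, corpus: str, split: str,
                   id_mapping: Optional[str] = None) -> PurePosixPath:
    # Hypothetical layout: references/<setting>/<corpus>-<split>[/<id_mapping>]
    path = PurePosixPath("references", setting, f"{corpus}-{split}")
    if id_mapping is not None:
        # Only add a mapping level when outputs depend on a specific ID mapping.
        path = path / id_mapping
    return path
```

Joining corpus and split in one path segment keeps the tree shallow; nesting `<corpus>/<split>/` instead would work equally well.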


benhachey · 2014-06-24T01:27:52Z

Also:

- `references/gold-linked-aidacandidates`: same as `references/gold-linked-mentions`, but uses `aida_means.tsv.bz2` for candidate generation, i.e., the precise Hoffart et al. (2011) task setting.
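In this setting the mentions are the same as in `gold-linked-mentions`; only the candidate set differs. A minimal sketch of means-table lookup, assuming each line of `aida_means.tsv.bz2` is a tab-separated mention/entity pair (the real file may quote or escape the mention string differently). The helper names are hypothetical.

```python
import bz2
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_means(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build a mention-string -> candidate-entity index.

    Assumes each line is "<mention>\t<entity>" (an assumption about the format).
    """
    means: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        mention, entity = line.split("\t", 1)
        means[mention].add(entity)
    return means


def load_means_file(path: str) -> Dict[str, Set[str]]:
    # bz2.open in text mode decompresses while reading.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return load_means(f)


def candidates(mention: str, means: Dict[str, Set[str]]) -> Set[str]:
    # Exact string lookup; a mention missing from the table gets no candidates.
    return set(means.get(mention, ()))
```

Exact-match lookup is the simplest choice; any normalisation of mention strings would change which candidates a system sees, which is exactly the kind of knob this setting is meant to pin down.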

jnothman · 2014-06-24T04:34:57Z

I still don't see the difference between that and the setting where a system's input is those mentions in the gold that are linked, assuming this version of the gold, which for now is all we have.


wejradford · 2014-06-24T05:55:32Z

I agree with the structure proposed in the first comment.

I think we should keep the means dataset, since the goal is to demystify the evaluation (and its knobs and levers).

> There is also the question of whether the directory structure should similarly be utilised to label (a) the corpus being evaluated (e.g. CoNLL vs ?IITB; testa vs testb), and (b) the ID mapping.

I favour including the corpus (conll or similar) but am not sure about ID mappings. They're nice regression-test fodder, but we shouldn't really need them, since a user can run the appropriate commands to generate them.

benhachey · 2014-06-25T01:27:01Z

@jnothman - The difference is in the candidates (not the mentions).


jnothman linked a pull request on Jun 24, 2014 that will close this issue: Rename reference outputs and add readme #57 (Open)
