briatte/epsa2019

Main project repository: briatte/epsaconf

Data notes

  • The main dataset, which contains all information, is program.tsv. That file is also the only one that contains all fixes and improvements, and should therefore always be preferred to the 'intermediary' files, such as authors.tsv or sessions.tsv.

  • Conference participants do not have a unique author id in the raw data, and are identified by their full names instead. Even though there are no homonyms (all author-affiliation pairs are unique), the code still creates an alphanumeric identifier that should be unique to the author and conference, by combining the name and affiliation of each participant with the epsa2019 keyword, and by taking a 128-bit hash of that string.

  • In a few cases, participants have submitted multiple affiliations, sometimes pointing to the same entity under different names, sometimes not. The file participants-fixes.tsv, which was assembled by hand, resolves all such cases, either by providing a single affiliation per participant, or by combining two affiliations into a single one with &&. The latter applies to 8 participants: Adriana Buena, Anita Gohdes, Catherine de Vries, Christophe Crombez, Dominik Hangartner, Kathrin Ackermann, Raimondas Ibenskas and Sebastian Ziaja.
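
The identifier and the && convention described above can be illustrated with a minimal Python sketch. It is not the repository's own code: the function names, the order of the fields, the `|` separator, and the choice of MD5 (picked only because it yields a 128-bit digest) are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib

CONFERENCE = "epsa2019"  # conference keyword mentioned in the notes above


def participant_id(name: str, affiliation: str, conference: str = CONFERENCE) -> str:
    """Return a 128-bit hash of conference + name + affiliation, as hex.

    Assumption: MD5 and the '|' separator are illustrative choices; the
    notes only say that a 128-bit hash of the combined string is taken.
    """
    key = f"{conference}|{name}|{affiliation}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def split_affiliations(affiliation: str) -> list[str]:
    """Split a combined affiliation on '&&' into its individual parts."""
    return [part.strip() for part in affiliation.split("&&") if part.strip()]


# Hypothetical participant, not taken from the data.
pid = participant_id("Jane Doe", "Example University && Example Institute")
parts = split_affiliations("Example University && Example Institute")
```

Because the identifier depends on the affiliation string, it would change if an affiliation were later corrected, which is one reason to compute it only after applying the fixes in participants-fixes.tsv.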