Supplemental Materials for "Duet: Helping Data Analysis Novices Conduct Pairwise Comparisons by Minimal Specification"

Contents

Links to Videos Introducing Duet’s User Interface

Link to the System

Explanations for the “Logistic Regression” Folder

Explanations for the “User Study” Folder

Clarification of Literature Review


Links to Videos Introducing Duet’s User Interface

Tutorial Video Used in the User Study: Link

Analyzing US College Scorecard Data Using Duet: Link


Link to the System

Duet’s prototype: Link

README:
1. Use Google Chrome for the best experience.
2. Some datasets are provided in the “Datasets” folder for trying out the system.


Explanations for the “Logistic Regression” Folder

In the following, we explain the materials in the “Logistic Regression” folder. The “Logistic Regression” folder provides details for the model in Sec. 5.3.2 (Multinomial Logistic Regression for Classification) of the paper.

  1. 520 Distribution Pairs

This folder contains the 520 distribution pairs we collected from the 83 R datasets. Each row in a csv file is a data point. There are three important columns in each csv file: “newGroupName”, “newAttributeName”, and “attributeValue”. “newGroupName” is the name of the group to which a data point belongs, “newAttributeName” is the name of an attribute, and “attributeValue” is the value that the data point has for that attribute.
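The layout above can be illustrated with a short script. This is a sketch: the loader function and the way values are grouped are illustrative, and only the three column names described in the text are assumed.

```python
import csv
from collections import defaultdict

def load_distribution_pair(path):
    """Group attribute values by group name from one distribution-pair csv.

    Assumes the three columns described above: "newGroupName",
    "newAttributeName", and "attributeValue".
    """
    groups = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # Each row is one data point; collect its numeric value
            # under the group it belongs to.
            groups[row["newGroupName"]].append(float(row["attributeValue"]))
    return dict(groups)
```

Calling this on one of the csv files yields a mapping from each of the two group names to the list of attribute values forming that group's distribution.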

  2. Bh Coefficient + Labels.csv

We used SPSS to model the data; this csv file is the input to SPSS for modelling. The “fileName” column is the file name of a distribution pair inside the “520 Distribution Pairs” folder, “BhCoefficient” is the Bhattacharyya coefficient for the distribution pair, and “class” is the label of the distribution pair that we collected from people.
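For reference, the Bhattacharyya coefficient of two discrete distributions p and q is the sum over bins of sqrt(p_i * q_i). The sketch below computes it from two samples using shared equal-width bins; the binning scheme here is an illustrative assumption, not necessarily the one used in the paper.

```python
import math
from collections import Counter

def bhattacharyya_coefficient(sample_a, sample_b, bins=10):
    """Bhattacharyya coefficient of two samples via shared equal-width bins.

    Returns a value in [0, 1]; 1 means the binned distributions are
    identical, 0 means they do not overlap at all.
    """
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def histogram(sample):
        # Normalized bin frequencies; clamp the maximum into the last bin.
        counts = Counter(min(int((x - lo) / width), bins - 1) for x in sample)
        return [counts[i] / len(sample) for i in range(bins)]

    p, q = histogram(sample_a), histogram(sample_b)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```

Identical samples score 1, and samples with no overlapping bins score 0, matching the coefficient's role here as a similarity predictor for a distribution pair.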

  3. Code for Relabelling 150 Marginal Cases

This is the code of the interface we used to ask 10 subjects to relabel 150 marginal cases. You need Python 3 and Flask to run the interface. To run the code, open a terminal, navigate to this directory, and enter “python server.py”. For those who have difficulty running the tool, we also provide screenshots of the labelling tool.

  4. Data Collected From Relabelling

Each csv file in this folder contains two columns: “filename”, the file name of a distribution pair in the “520 Distribution Pairs” folder, and “class”, the label provided by a subject.

  5. SPSS Modelling Result

This is a screenshot of the output generated by SPSS, with “Bh Coefficient + Labels.csv” as the input. The following explains how the model in Sec. 5.3.2 corresponds to the SPSS output. Formally, our logistic regression model is
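The equation that followed this sentence did not survive in this copy. As a reconstruction (not verified against the paper), a multinomial logistic regression with the Bhattacharyya coefficient as the single predictor and one class serving as the reference category $K$ takes the standard form

$$\ln\frac{P(\text{class}=k)}{P(\text{class}=K)} = \beta_{k0} + \beta_{k1}\cdot \text{BhCoefficient}, \qquad k = 1,\dots,K-1,$$

where the $\beta$ coefficients correspond to the parameter estimates reported in the SPSS output.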

  6. R Code for Computing Model Accuracy.txt

This text file contains the R code for computing the accuracy of our logistic regression model using 10-fold cross-validation. The input file is “Bh Coefficient + Labels.csv”. The cross-validation accuracy is around 78.1%. We envision that this accuracy can be improved by using more advanced machine learning models and more predictor variables.
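The R file holds the actual computation. As a language-neutral illustration (a sketch, not the authors' code), 10-fold cross-validation shuffles the rows, partitions them into ten folds, trains on nine folds, tests on the held-out fold, and averages the ten accuracies:

```python
import random

def ten_fold_accuracy(rows, train, predict, seed=0):
    """Estimate accuracy by 10-fold cross-validation.

    rows: list of (features, label) pairs;
    train(rows) -> model; predict(model, features) -> label.
    """
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    # Deal the shuffled rows into 10 roughly equal folds.
    folds = [rows[i::10] for i in range(10)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        train_rows = [r for j, f in enumerate(folds) if j != i for r in f]
        model = train(train_rows)
        correct = sum(predict(model, x) == y for x, y in test_fold)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / len(accuracies)
```

The `train` and `predict` callbacks are placeholders for any classifier, such as the multinomial logistic regression described above.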


Explanations for the “User Study” Folder

The “User Study” folder contains all the materials for the qualitative user study in Sec. 6 (Evaluation) of the paper. The materials inside are described as follows:

  1. Training Session

This folder contains the car dataset (“cars.csv”) we used for the training session, a link to the tutorial video we showed to the participants, and the training tasks used to familiarize participants with Duet’s interface.

  2. Analysis Session

During each analysis session, we first showed the participants “Task Description.pdf”. We then gave them some time to review either “Description for College Dataset.pdf” or “Description for City Dataset.pdf” to familiarize them with the dataset they were about to analyze. The “Datasets” folder contains the city dataset and the college dataset we used for the analysis session.

  3. Interview and Survey

At the end of the study, we first showed the participants “Three Main Features of the Tool.pdf” to ensure that they knew the terminology (e.g., “minimal specification”) used in the interview. This folder contains the questions for the semi-structured interview (“Interview Questions.pdf”) and the survey (“Survey Questions.pdf”).

  4. Questionnaire Results.pdf

This is a summary of the survey results.


Clarification of Literature Review

We drew inspiration from the literature to develop the idea of minimal specification. As described in the paper, there are two high-level considerations in designing minimal specification:

  1. To address execution barriers, minimal specification allows users to focus on what they know (the objects of interest in answering a pairwise comparison question) rather than what they might not know (system operations).

  2. To address interpretation barriers, the recommendations offered should be explained, in order to foster better understanding of the recommendations and a stronger feeling of trust.

The following two sections describe the basis of these two considerations.

Addressing Execution Barrier

This idea of allowing users to focus on what they know, by shielding them from what they might not know, is grounded in three ideas that have been explored by the HCI community.

Addressing Interpretation Barrier

Explaining the recommendations helps users understand why they are recommended and inspires users’ trust in the system. This idea is grounded in the movement toward explainable artificial intelligence (XAI).
