
New HEPData Explore

Juan Luis Boya García edited this page Apr 21, 2016 · 4 revisions

New HEPData Explore design

Weaknesses of the current HEPData Explore:

  • The page flashes and reloads each time the primary filters are modified. The interaction is slow and uncomfortable.
  • The filtering interface needs to be compacted.
  • Flattening of data.
  • Data is counted by points instead of by tables, although counting tables may be more relevant.
    • crossfilter is not being taken advantage of at all, yet it has driven design decisions such as the flattening of data.
  • No custom charts.
  • It's impossible to plot two independent variables against each other.

Solving the flattening problem

At the server side

In the new HEPData Explore system, the remote database will index only publications and tables (instead of arbitrary table groups made of variable pairs as done before). This simplifies querying since most, if not all, filterable fields are from the table.

The publications will be indexed with these fields:

  • Comment (actually the title of the publication)
  • Inspire record
  • Tables. For each one:
    • table_num (within the publication)
    • cmenergies_min
    • cmenergies_max
    • reactions (list of strings)
    • observables (list of strings)
    • phrases (list of strings)
    • dep vars (list of strings)
    • indep vars (list of strings)
    • data points

Data points will not necessarily be indexed by the server. They will be stored in such a way that each row of the original table is one row of data, maintaining all the relationships between all the variables.
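As an illustration of the structure listed above, a publication document could be modelled roughly as follows. This is only a sketch: the class names, the Python types and every value in the example are assumptions made for illustration, and the actual storage format is not decided here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    table_num: int              # position within the publication
    cmenergies_min: float
    cmenergies_max: float
    reactions: List[str]
    observables: List[str]
    phrases: List[str]
    dep_vars: List[str]
    indep_vars: List[str]
    # One dict per row of the original table, mapping variable name to value,
    # so the relationship between all variables of a row is preserved.
    data_points: List[Dict[str, float]] = field(default_factory=list)


@dataclass
class Publication:
    comment: str                # actually the title of the publication
    inspire_record: int
    tables: List[Table] = field(default_factory=list)


# Made-up values, for illustration only.
example = Publication(
    comment="Hypothetical publication title",
    inspire_record=1234567,
    tables=[
        Table(
            table_num=1,
            cmenergies_min=7000.0,
            cmenergies_max=7000.0,
            reactions=["P P --> X"],
            observables=["SIG"],
            phrases=["Inclusive"],
            dep_vars=["SIG"],
            indep_vars=["PT"],
            data_points=[
                {"PT": 10.0, "SIG": 3.2},
                {"PT": 20.0, "SIG": 1.1},
            ],
        )
    ],
)
```

Keeping each row as a single record is what avoids the flattening: any two variables of the same row can later be paired without guessing which values belong together.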

At the client side

Since the document level at the server side continues to be the publication, server queries will still retrieve entire publications.

On the client side, for plotting purposes, data will be indexed by tables, e.g. by keeping an iterable list of all tables (currently it is indexed by data points).

When a plot is requested, a pair of variables will be specified, one for the X axis and one for the Y axis. They may be any combination of dep and indep vars. Then, tables will be filtered to keep only those that contain both specified variables. For each row of each such table, the chosen variables will be read and plotted.
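The selection step can be sketched as follows. Tables are represented here as plain dicts with `indep_vars`, `dep_vars` and `data_points` keys; that representation and the function names are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple


def table_variables(table: dict) -> Set[str]:
    """All variable names of a table, dependent and independent alike."""
    return set(table["indep_vars"]) | set(table["dep_vars"])


def points_for_plot(
    tables: List[dict], x_var: str, y_var: str
) -> List[Tuple[float, float]]:
    """Collect (x, y) points from every table that has both variables."""
    points = []
    for table in tables:
        # Keep only tables that contain both requested variables.
        if not {x_var, y_var} <= table_variables(table):
            continue
        # Each row of the table yields exactly one point.
        for row in table["data_points"]:
            points.append((row[x_var], row[y_var]))
    return points
```

Because dep and indep vars are merged before the check, the same function also covers plotting two independent variables against each other.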

Further (but complex) improvement: a third variable or expression may be added, linked to the size of the dot. This would require a language capable of expressing mathematical expressions and reading table variables.

Solving the server interaction problem

The current system destroys the interface whenever server-side filters are modified. This is uncomfortable and inconvenient for exploration.

In this regard, we will take advantage of the fact that server querying is coarse-grained, returning the data of entire publications. The application will keep an in-memory set of the publications returned by the last query. After each new query, the set will be updated with the difference from the previous query.
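A minimal sketch of that update, assuming publications are kept in a dict keyed by their Inspire record (the keying and the function names are assumptions, not part of the design):

```python
from typing import Dict, Set, Tuple


def diff_publications(
    current: Dict[int, dict], fetched: Dict[int, dict]
) -> Tuple[Set[int], Set[int]]:
    """Return (added, removed) Inspire records between two query results."""
    added = fetched.keys() - current.keys()
    removed = current.keys() - fetched.keys()
    return added, removed


def apply_query_result(
    current: Dict[int, dict], fetched: Dict[int, dict]
) -> Tuple[Set[int], Set[int]]:
    """Update the in-memory set in place, touching only what changed."""
    added, removed = diff_publications(current, fetched)
    for record in removed:
        del current[record]
    for record in added:
        current[record] = fetched[record]
    return added, removed
```

The returned `added` and `removed` sets tell the interface which tables to add to or remove from the plots, so publications present in both queries are left untouched.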

Tables that went missing will have all their data removed from plots. Empty plots will disappear. New variable pairs will get a new plot, provided there are free plot slots. For this purpose, pairs with the greatest number of tables will be given the highest priority, but they will never displace existing plots as long as those still have data points.

A regenerate plots button may be added that would drop all the existing plots and replace them with the 8 variable pairs with the greatest number of tables under the currently active filters.
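The allocation policy could look roughly like the sketch below. Several things here are assumptions: automatic candidates are taken to be (indep var, dep var) pairs, a plot counts as non-empty when at least one table still contains its pair, plots are plain dicts with `pair` and `pinned` keys, and the limit of 8 is borrowed from the regenerate button.

```python
from collections import Counter
from itertools import product
from typing import List

MAX_PLOTS = 8  # assumed limit, taken from the "8 variable pairs" above


def count_tables_per_pair(tables: List[dict]) -> Counter:
    """Count, for each (indep, dep) pair, how many tables contain it."""
    counts = Counter()
    for table in tables:
        pairs = product(table["indep_vars"], table["dep_vars"])
        # dict.fromkeys removes duplicates while keeping a stable order.
        for pair in dict.fromkeys(pairs):
            counts[pair] += 1
    return counts


def update_plots(
    plots: List[dict], tables: List[dict], max_plots: int = MAX_PLOTS
) -> List[dict]:
    """Drop empty unpinned plots, then fill free slots by table count."""
    counts = count_tables_per_pair(tables)
    # Existing plots survive while they have data (or are pinned).
    kept = [p for p in plots if p["pinned"] or counts[p["pair"]] > 0]
    shown = {p["pair"] for p in kept}
    # New pairs only take free slots; they never displace a kept plot.
    for pair, _ in counts.most_common():
        if len(kept) >= max_plots:
            break
        if pair not in shown:
            kept.append({"pair": pair, "pinned": False})
            shown.add(pair)
    return kept
```

Under these assumptions, the regenerate button amounts to calling `update_plots([], tables)`, which starts from no plots and keeps the most populated pairs.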

Custom plots

A custom plot button will show a popup dialog asking for a specific variable pair. As explained before, both dep and indep vars will be allowed.

A preview of the plot will be shown in the dialog.

To make custom plot creation easier, once either X or Y is set, suggestions for the other will be sorted so that variables that actually appear together with it in some table, and are therefore capable of generating plots, appear first. They may also be shown in bold or in another color to hint that they are good matches.
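One possible way to order the suggestions is sketched below. The function name, the dict-based table representation and the alphabetical tie-break are assumptions; the design only requires that good matches come first.

```python
from typing import List, Tuple


def suggest_partners(tables: List[dict], chosen: str) -> List[Tuple[str, bool]]:
    """Return (variable, is_good_match) pairs, good matches first."""
    all_vars = set()
    partners = set()
    for table in tables:
        names = set(table["indep_vars"]) | set(table["dep_vars"])
        all_vars |= names
        # Variables sharing a table with the chosen one can form a plot.
        if chosen in names:
            partners |= names
    all_vars.discard(chosen)
    partners.discard(chosen)
    # False sorts before True, so matches come first, then alphabetical.
    ordered = sorted(all_vars, key=lambda v: (v not in partners, v))
    return [(v, v in partners) for v in ordered]
```

The boolean flag is there so the dialog can render good matches in bold or in another color without recomputing them.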

Custom plots will be pinned by default, so they will not be deleted automatically if the filters are modified in such a way that the plots no longer contain data points.

Better plots

Each plot will live in a visible box. The boxes could be reordered by dragging and dropping.

Each plot will have the following widgets:

  • A scatterplot showing the data, the X and Y axes, their values and the names of the variables.

  • An indication of the scale for each axis, log or lin. Clicking it will switch between them. Logarithmic scale will be chosen automatically if the minimum and maximum values of that axis differ by more than one order of magnitude (base 10).

  • A close button to get rid of the plot.

  • An indication of how many tables are being plotted, from how many publications.

  • Data points of different tables within the same plot will be shown in different colors.

  • A download button that will allow downloading, in a custom format, the data from the plotted tables.

  • A view publications button that will show a popup with the same plot on the left and a list of publications and tables on the right. Each table will have a box showing the color used to represent it in the plot. Hovering over a table or a publication will highlight the data points belonging to it.

  • A pin toggle button. Pinned graphs will never be deleted even if they become empty.

  • An edit button that will show the custom plot dialog, even if the plot was automatically generated. From there, the user may change the variables. Editing an automatic plot turns it into a custom plot and pins it.
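The automatic choice between log and lin described in the list above can be sketched as follows. Falling back to a linear scale for empty data or for axes containing zero or negative values is an assumption, since a logarithmic axis cannot display them.

```python
from typing import Sequence


def choose_scale(values: Sequence[float]) -> str:
    """Pick "log" when the axis spans more than one decade, else "lin"."""
    if not values:
        return "lin"
    lowest, highest = min(values), max(values)
    # Assumption: non-positive values rule out a logarithmic axis.
    if lowest <= 0:
        return "lin"
    # More than one base-10 order of magnitude means a ratio above 10.
    return "log" if highest / lowest > 10 else "lin"
```

This only sets the initial scale; clicking the scale indicator would still let the user override it.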
