Analysis plugins #1427

lukavdplas (Maintainer) · 2023-11-29T14:42:34Z

This is an idea for a major feature + refactor I've discussed with @jgonggrijp. The core idea is to support adding backend "plugins" for analysis / visualisations, and to convert the existing visualisations into separate plugins.

There are a few things I'd hope to accomplish with this:

Modularise the application
Make it possible to integrate I-analyzer with other applications without losing its generalisability
Make I-analyzer more suitable to be hosted by other teams and interface with their software

How it works

Fundamentally, you would write an independent Python module or package that is responsible for some kind of analysis. Our current visualisations (results count, search term frequency, wordcloud, related words, etc.) would all work as such modules.

When you set up an I-analyzer instance, you include these modules in the backend settings.py, which enables that analysis for your environment.
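As a rough sketch of what enabling modules through settings could look like: the setting name `ANALYSIS_PLUGINS` and the helper `load_plugins` are assumptions made up for illustration, not existing I-analyzer names, and the standard-library modules listed are only stand-ins for real plugin packages so that the example runs anywhere.

```python
from importlib import import_module

# Hypothetical setting: dotted paths of the plugin modules to enable.
# In a real instance this list would live in settings.py; the stdlib
# modules below are placeholders for actual plugin packages.
ANALYSIS_PLUGINS = [
    "json",
    "csv",
]


def load_plugins(paths):
    """Import each configured module and return them keyed by dotted path."""
    return {path: import_module(path) for path in paths}


plugins = load_plugins(ANALYSIS_PLUGINS)
```

Leaving a module out of the list would simply mean that analysis is not offered in that environment.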

Of course, modules would need to conform to an API that I-analyzer expects to work with. If you're turning, say, the wordcloud into a plugin, the module should ultimately offer analysis on a set of documents for which the user has made a query. You could end up with the following endpoints:

Some metadata about the visualisations offered: name, description
A method that determines whether the visualisation should be available in a particular context
A method that returns a specification for a short form to set parameters for the analysis
A method that takes a query, an Elasticsearch client, and the configured parameters (per the specification above), and returns results (more on that later)
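The four items above could translate into a base class along these lines. This is a minimal sketch: `AnalysisPlugin`, its method names, the `context` dictionary and the `StubClient` are all hypothetical, and the stub only stands in for an Elasticsearch client so the example runs offline.

```python
from abc import ABC, abstractmethod


class AnalysisPlugin(ABC):
    """Hypothetical base class; none of these names come from I-analyzer."""

    # Metadata about the visualisation.
    name = ""
    description = ""

    @abstractmethod
    def is_available(self, context):
        """Return True if the analysis should be offered in this context."""

    @abstractmethod
    def parameter_form(self):
        """Return a specification of the form fields for the parameters."""

    @abstractmethod
    def run(self, query, es_client, parameters):
        """Run the analysis on the query and return the results."""


class ResultsCount(AnalysisPlugin):
    name = "Results count"
    description = "Number of documents matching the query"

    def is_available(self, context):
        # Assumed context shape: a dict naming the kind of query.
        return context.get("target") == "documents_query"

    def parameter_form(self):
        return []  # this analysis takes no parameters

    def run(self, query, es_client, parameters):
        response = es_client.count(query=query)
        return {"count": response["count"]}


class StubClient:
    """Offline stand-in for an Elasticsearch client (assumption)."""

    def count(self, query):
        return {"count": 3}


plugin = ResultsCount()
result = plugin.run({"match_all": {}}, StubClient(), {})
```

Using an abstract base class means an incomplete plugin fails as soon as it is instantiated, rather than when a user first requests the analysis.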

Right now, we have two types of analysis that I'd want to convert to this plugin structure, namely:

Analysis on a documents query
Analysis on a word model query

For generalisability, I would add a third option, namely:

Analysis on a single document
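One way to make these three kinds of analysis explicit would be an enumeration that each plugin declares, so the application can select the plugins that apply. The names `AnalysisTarget` and `plugins_for` and the example registry are illustrative assumptions.

```python
from enum import Enum


class AnalysisTarget(Enum):
    """The three kinds of input a plugin could analyse (hypothetical names)."""

    DOCUMENTS_QUERY = "documents_query"
    WORD_MODEL_QUERY = "word_model_query"
    SINGLE_DOCUMENT = "single_document"


def plugins_for(target, registry):
    """Return the names of the plugins registered for the given target."""
    return [name for name, kind in registry.items() if kind is target]


# Illustrative registry mapping plugin names to the input they work on.
registry = {
    "wordcloud": AnalysisTarget.DOCUMENTS_QUERY,
    "results count": AnalysisTarget.DOCUMENTS_QUERY,
    "related words": AnalysisTarget.WORD_MODEL_QUERY,
}

document_plugins = plugins_for(AnalysisTarget.DOCUMENTS_QUERY, registry)
```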

Results format

This is a tricky question. In our current visualisations (or the ones that are the most neatly structured), we return JSON with the data (e.g. a value per year), and let the frontend figure out how to turn that into an interactive chart.

You could use this approach and generalise the data format somewhat, but it's quite limited. You can only use visualisations that we've written frontend support for, so you can't write a plugin for a network or map visualisation until we add that to the frontend.

My proposal would be that backend modules return JSON specifications of visualisations using the Vega / Vega-Lite grammar. Vega is a JavaScript visualisation library, but importantly for us, it is entirely declarative, so you can fully define (interactive!) visualisations in a JSON object. Vega also supports a wide range of visualisation types (see their examples page).
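To illustrate, a plugin could build a Vega-Lite specification as a plain dictionary and return it as JSON. The function name `term_frequency_spec` and the per-year counts are made up for this sketch; the keys in the specification follow the Vega-Lite grammar.

```python
import json


def term_frequency_spec(counts):
    """Build a Vega-Lite bar chart spec from a {year: count} mapping."""
    values = [{"year": year, "count": n} for year, n in sorted(counts.items())]
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "description": "Term frequency per year",
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": "year", "type": "ordinal"},
            "y": {"field": "count", "type": "quantitative"},
            # Tooltips give some interactivity without any frontend code.
            "tooltip": [
                {"field": "year", "type": "ordinal"},
                {"field": "count", "type": "quantitative"},
            ],
        },
    }


# Illustrative data, not real results.
spec = term_frequency_spec({1901: 12, 1900: 7})
payload = json.dumps(spec)
```

The frontend would then only need a generic Vega renderer, rather than dedicated chart code for each visualisation.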

I imagine we'll also want the module to present results in a format suitable for table data / CSV downloads, but that will be the smaller hurdle.
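For the table / CSV side, the same rows that feed the chart could be serialised with the standard library. A minimal sketch, where the helper name `rows_to_csv` and the row layout are assumptions:

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of row dictionaries to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative data, not real results.
table = rows_to_csv(
    [{"year": 1900, "count": 7}, {"year": 1901, "count": 12}],
    fieldnames=["year", "count"],
)
```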

An even more powerful option would be that modules can essentially return a web component to embed in our frontend. That gives you a lot of power, but it adds complexity to both supporting and developing such modules.

For single-document analysis, you could also consider an option to return annotations on the text.
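Annotations on a text could, for instance, be returned as labelled character ranges. The `Annotation` class and the `annotate_term` helper below are hypothetical, and the literal string matching only stands in for whatever analysis a real plugin would perform.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str


def annotate_term(text, term, label):
    """Return an annotation for every non-overlapping occurrence of term."""
    if not term:
        return []  # an empty term would otherwise match at every position
    annotations = []
    position = text.find(term)
    while position != -1:
        annotations.append(Annotation(position, position + len(term), label))
        position = text.find(term, position + len(term))
    return annotations


text = "The bank by the river bank"
annotations = annotate_term(text, "bank", "TERM")
```

Because the annotations carry offsets rather than modified text, the frontend stays free to decide how to highlight them.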

Extra hooks

We may want to consider adding extra "hooks" for plugins to interact with I-analyzer. For example, a module might add analysed multifields to an Elasticsearch mapping, or provide extra options in the corpus configuration. You might also consider making other features plugin-based. None of that is immediately relevant, though.
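A mapping hook could be as simple as a function that receives the mapping and returns an extended copy. In this sketch the function name `add_multifield`, the `content` field and the `stemmed` subfield are assumptions; `fields` is the Elasticsearch mapping parameter for multifields and `english` is one of its built-in language analyzers.

```python
import copy


def add_multifield(mapping, field, subfield, analyzer):
    """Return a copy of the mapping with an analysed multifield on one field."""
    updated = copy.deepcopy(mapping)  # leave the caller's mapping untouched
    field_definition = updated["properties"][field]
    field_definition.setdefault("fields", {})[subfield] = {
        "type": "text",
        "analyzer": analyzer,
    }
    return updated


# Hypothetical corpus mapping with a single text field.
base_mapping = {"properties": {"content": {"type": "text"}}}
extended = add_multifield(base_mapping, "content", "stemmed", "english")
```

Returning a copy keeps hooks from different plugins from interfering with each other's view of the original mapping.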
