Analysis plugins #1427
lukavdplas
started this conversation in
Ideas
This is an idea for a major feature + refactor I've discussed with @jgonggrijp. The core idea is to support adding backend "plugins" for analysis / visualisations, and to convert the existing visualisations into separate plugins.
There are a few things I'd hope to accomplish with this:
How it works
Fundamentally, you would write an independent python module or package that is responsible for some kind of analysis. Our current visualisations (results count, search term frequency, wordcloud, related words, etc.) would all work as such modules.
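To make the idea of an independent analysis module a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `Document`, `AnalysisPlugin`, `analyse` and `ResultsCount` are hypothetical names, not existing I-analyzer code, and the real interface would still need to be designed.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class Document:
    """Hypothetical stand-in for a search result; not an I-analyzer class."""
    id: str
    content: str
    year: int


class AnalysisPlugin(Protocol):
    """Hypothetical contract that every analysis module would implement."""
    name: str

    def analyse(self, documents: list[Document]) -> dict[str, Any]:
        """Run the analysis on the documents matching the user's query."""
        ...


class ResultsCount:
    """Example plugin: counts matching documents per year."""
    name = "results_count"

    def analyse(self, documents: list[Document]) -> dict[str, Any]:
        counts: dict[int, int] = {}
        for doc in documents:
            counts[doc.year] = counts.get(doc.year, 0) + 1
        # One row per year, sorted chronologically.
        rows = [{"year": year, "count": n} for year, n in sorted(counts.items())]
        return {"data": rows}
```

The point of using a `Protocol` here is that a plugin would not have to import anything from I-analyzer to conform; it only has to expose the expected attributes.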
When you set up an I-analyzer instance, you include these modules in the backend settings.py, which will enable that analysis for your environment.
Of course, modules would need to conform to an API that I-analyzer expects to work with. If you're turning, say, the wordcloud into a plugin, the module should ultimately offer analysis on a set of documents for which the user has made a query. You could end up with the following endpoints:
Right now, we have two types of analysis that I'd want to convert to this plugin structure, namely:
For generalisability, I would add a third option, namely:
Results format
This is a tricky question. In our current visualisations (or the ones that are the most neatly structured), we return a JSON with the data (e.g. a value per year), and let the frontend figure out how to turn that into an interactive chart.
You could use this approach and generalise the data format somewhat, but it's quite limited. You can only use visualisations that we've written frontend support for, so you can't write a plugin for a network or map visualisation until we add that to the frontend.
My proposal would be that backend modules return JSON specifications of visualisations using the Vega / Vega-Lite grammar. Vega is a JavaScript visualisation library, but importantly for us, it is entirely declarative, so you can fully define (interactive!) visualisations in a JSON object. Vega also supports a wide range of visualisation types (see their examples page).
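As a rough illustration of what returning a Vega-Lite specification could look like, the sketch below builds a bar chart with one value per year. The function name `per_year_bar_chart` and the `year` / `count` field names are assumptions for the example; the keys inside the returned dict follow the Vega-Lite grammar.

```python
import json


def per_year_bar_chart(values: list[dict], title: str) -> dict:
    """Build a Vega-Lite spec for a bar chart with one bar per year.

    `values` is assumed to be a list of {"year": ..., "count": ...} rows.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "title": title,
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": "year", "type": "ordinal"},
            "y": {"field": "count", "type": "quantitative"},
            # Tooltips give basic interactivity without any frontend code.
            "tooltip": [
                {"field": "year", "type": "ordinal"},
                {"field": "count", "type": "quantitative"},
            ],
        },
    }


spec = per_year_bar_chart(
    [{"year": 1900, "count": 2}, {"year": 1901, "count": 1}],
    title="Results per year",
)
# The spec is plain data, so it serialises directly into a JSON response.
payload = json.dumps(spec)
```

With this approach the frontend would only need a single generic component that renders whatever spec it receives, rather than one component per chart type.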
I imagine we'll also want the module to present results in a format suitable for table data / CSV downloads, but that will be the smaller hurdle.
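For the table / CSV side, one possible shape (again just a sketch, with a hypothetical `rows_to_csv` helper) is to have the module expose its data as a list of flat rows, which can then be serialised with the standard library:

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise a list of flat dicts to CSV text.

    Column names are taken from the first row; all rows are assumed
    to share the same keys.
    """
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([{"year": 1900, "count": 2}, {"year": 1901, "count": 1}])
```

The same rows could double as the `data.values` of a chart specification, so a module would only have to produce its data once.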
An even more powerful option would be to let modules return what is essentially a web component to embed in our frontend. That gives you a lot of power, but it adds complexity to both supporting and developing such modules.
For single-document analysis, you could also consider an option to return annotations on the text.
Extra hooks
We may want to consider adding extra "hooks" for plugins to interact with I-analyzer. For example, a module might add analysed multifields to an Elasticsearch mapping, or provide extra options in the corpus configuration. You might also consider making other features plugin-based. None of that is immediately relevant, though.
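To give an impression of the multifield example, a hook could be a function that receives the mapping and returns an extended copy. The hook name `add_stemmed_multifield` and the `stemmed` subfield name are hypothetical; the `fields` key and the built-in `english` analyzer are regular Elasticsearch mapping features.

```python
import copy


def add_stemmed_multifield(mapping: dict, field: str) -> dict:
    """Hypothetical plugin hook: add an analysed multifield to a text field.

    Returns a new mapping and leaves the original untouched.
    """
    updated = copy.deepcopy(mapping)
    subfields = updated["properties"][field].setdefault("fields", {})
    # "english" is one of Elasticsearch's built-in language analyzers.
    subfields["stemmed"] = {"type": "text", "analyzer": "english"}
    return updated


base_mapping = {"properties": {"content": {"type": "text"}}}
extended_mapping = add_stemmed_multifield(base_mapping, "content")
```

Returning a copy rather than mutating the input would make it easier to chain hooks from several plugins and to compare the result against the corpus's original mapping.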