Events.beta.csv format is imperfect #506

Open
atruskie opened this issue Jul 15, 2021 · 4 comments
Comments

@atruskie
Member

Actual behaviour:

The new CSV output format has some problems:

  • event duration is not included
  • event frequency bands are not included
  • sometimes profiles are not included (like for the boobook)
  • FileName is the segmented file name
  • ResultStartSeconds and EventStartSeconds are duplicates
  • SegmentStartSeconds and SegmentDurationSeconds are verbose and duplicate ResultMinute
  • All floating point numbers should be truncated but aren't
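As a rough illustration of what the missing columns could look like, the sketch below derives a duration for each event, carries a low/high frequency band, and writes every float with a fixed number of decimal places. Apart from `EventStartSeconds`, the field names (`EventEndSeconds`, `LowFrequencyHertz`, `HighFrequencyHertz`, `EventDurationSeconds`) are hypothetical, as are the sample values and the choice of four decimals. Note that it rounds rather than truncates, and it is written in Python for brevity; it is not AP's actual CSV writer.

```python
import csv
import io

# Hypothetical events; the key names and values are assumptions for illustration only.
events = [
    {"EventStartSeconds": 12.3456789, "EventEndSeconds": 13.1000001,
     "LowFrequencyHertz": 400.0, "HighFrequencyHertz": 1100.123456},
    {"EventStartSeconds": 47.0, "EventEndSeconds": 47.7533333,
     "LowFrequencyHertz": 2500.5, "HighFrequencyHertz": 3200.0},
]

DECIMALS = 4  # assumed precision; rounds, does not truncate


def to_row(event):
    """Build one CSV row with a derived duration and fixed float precision."""
    row = dict(event)
    row["EventDurationSeconds"] = event["EventEndSeconds"] - event["EventStartSeconds"]
    return {key: f"{value:.{DECIMALS}f}" for key, value in row.items()}


def write_csv(events):
    """Serialise all events to CSV text with one header row."""
    rows = [to_row(e) for e in events]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = write_csv(events)
print(csv_text)
```

Every row here has the same shape, so no column is left empty.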

Expected behavior:

None of the above should happen.

How to reproduce this bug:

  1. Run a multi recogniser, investigate the results

Additional Details

AP

Version: v21.7.0.4

Some example data: all.txt

@towsey
Contributor

towsey commented Jul 16, 2021

Fantastic that you are dealing with this. I have been meaning to log it myself as an issue. Is it possible, where appropriate, for events to include additional info? For example, oscillation events could include the oscillation rate and harmonic events the interval. It would also be good to include a score where one is available.

@atruskie
Member Author

Great question.

In short: not really.

CSV is great when all the events have the same shape/type of data. The reason for most of the above issues is we output the results based on the base class, which is EventCommon I think, which lacks the end/low/high properties.

I think, given our nature, it's safe to try and output those extra columns. But if any of that data is missing, we'll get a lot of sparse columns.

For even more specific events, though, we'll definitely end up with a lot of sparse columns. For a recogniser that produces only oscillation events, most rows would have the oscillation column filled. But in a multi-recogniser case, most rows would have an empty oscillation column.
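A minimal sketch of that sparsity, assuming a handful of made-up events from a multi-recogniser run. The property names `OscillationRate` and `HarmonicInterval` are hypothetical, and Python is used only for illustration. The CSV header has to be the union of every property seen on any event, so event-specific columns stay empty on most rows.

```python
import csv
import io

# Hypothetical events from a multi-recogniser run; property names are assumptions.
events = [
    {"EventStartSeconds": 1.0, "Score": 0.9},
    {"EventStartSeconds": 4.5, "Score": 0.7, "OscillationRate": 12.0},
    {"EventStartSeconds": 8.2, "Score": 0.8, "HarmonicInterval": 220.0},
    {"EventStartSeconds": 9.9, "Score": 0.6},
]

# The header must be the union of every property seen on any event.
fieldnames = []
for event in events:
    for key in event:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
# restval="" leaves a blank cell wherever an event lacks a property.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(events)
csv_text = buffer.getvalue()
print(csv_text)

# Fraction of rows that leave each column empty.
empty_fraction = {
    name: sum(1 for e in events if name not in e) / len(events)
    for name in fieldnames
}
print(empty_fraction)
```

With these four sample events, the two event-specific columns are each empty on three of four rows, while the shared columns are always filled.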

To achieve the flexibility we want here, we need to be able to encode arbitrary data structures, which is what the JSON output is for. Each object inside a JSON result can have whatever properties we'd like it to have.
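To make that concrete, here is a small sketch of heterogeneous events serialised as JSON, where each object carries only the properties that apply to it. The property names and values are hypothetical and do not reflect AP's actual JSON schema; Python is used only for illustration.

```python
import json

# Hypothetical events; each object has only the properties relevant to its type.
events = [
    {"EventStartSeconds": 1.0, "Score": 0.9},
    {"EventStartSeconds": 4.5, "Score": 0.7, "OscillationRate": 12.0},
    {"EventStartSeconds": 8.2, "Score": 0.8, "HarmonicInterval": 220.0},
]

json_text = json.dumps(events, indent=2)
print(json_text)

# Reading it back recovers the same per-object shapes; nothing is padded out.
decoded = json.loads(json_text)
for event in decoded:
    print(sorted(event.keys()))
```

The trade-off is that a consumer has to check which properties each object has, rather than relying on a fixed set of columns.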

Each of these formats is inefficient for its own reasons, and each has strengths over the other.

I think I want to make the CSV useful and dense by default for the common case. And leave the JSON for outputting complex data.

@towsey
Contributor

towsey commented Jul 27, 2021

The additional info I would like to add is not complex, i.e. just scalars. It could be done by adding another one or two properties to EventCommon called Score1 and Score2, in addition to the existing Score property. You could then add an event property such as periodicity by assigning it to one of the score fields. The documentation would describe what information was provided in each of the score fields. The trouble is that if we wait for JSON parsers etc., it will be a long time, and more difficult for the user.

@atruskie
Member Author

Assigning data to columns with generic names is not something we will do. Descriptive names are vital to people understanding what data they're looking at.
