Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transformation sheet API #297

Closed
wants to merge 4 commits into from
Closed

Conversation

richiejp
Copy link
Contributor

@richiejp richiejp commented Nov 19, 2024

The first two commits refactor the current index filter code. This actually
introduces more lines of code than it saves, but decouples the filtering from
the indexing a little.

The last commit adds a higher level API for extensions to use on top of the first. An example of what this API looks like is below, this adds an extra column to the active buffer counting the rows.

// Similar to a regular ZSV row handler used in ext_parse_all
void my_transformer_row_handler(void *ctx) {
  zsvsheet_transformation trn = ctx;
  struct transformation_context *priv = zsv_cb.ext_sheet_transformation_user_context(trn);
  zsv_parser parser = zsv_cb.ext_sheet_transformation_parser(trn);
  zsv_csv_writer writer = zsv_cb.ext_sheet_transformation_writer(trn);

  size_t j = zsv_cb.cell_count(parser);
  for (size_t i = 0; i < j; i++) {
    struct zsv_cell c = zsv_cb.get_cell(parser, i);
    zsv_writer_cell(writer, i == 0, c.str, c.len, c.quoted);
  }

  priv->col_count += j;

  if (!priv->row_count)
    zsv_writer_cell_s(writer, 0, (const unsigned char *)"Column count", 0);
  else
    zsv_writer_cell_zu(writer, 0, priv->col_count);

  priv->row_count++;
}

zsvsheet_status my_transformer_command_handler(zsvsheet_proc_context_t ctx) {
  struct transformation_context my_ctx = {
    .col_count = 0,
    .row_count = 0,
  };

  // TODO: This probably should happen in another worker thread and while that is happening the status should display
  //       that some work is in progress. The extension author will maybe want to have control over the status message.
  return zsv_cb.ext_sheet_push_transformation(ctx, &my_ctx, my_transformer_row_handler);
}

TODO (either in this PR or another):

  • run the transformation in a background thread and keep the user updated by changing the status
  • figure out where custom_prop_handler and opts_used should come from?
  • tests

@richiejp richiejp force-pushed the transformer branch 4 times, most recently from 9953e04 to 6d64076 Compare November 20, 2024 16:31
@richiejp richiejp marked this pull request as ready for review November 20, 2024 16:40
@liquidaty
Copy link
Owner

@richiejp looks great. Just a few questions/comments so far (have a few others but will reserve for after further review)

  1. (this is somewhat of a nit): I'm don't quite follow why various names include "push"-- is referring to the style of parsing (push parsing)?
  2. Some minor harmless but extraneous code bits can be removed:
  // these two lines unnecessary following `filename = zsvsheet_buffer_data_filename(buff);
  // since that function already does this
  if (!filename)
    filename = zsvsheet_buffer_filename(buff);
// opts_used only matters the very first time after a user has specified property overrides
// (such as adding a `--header-span` command-line option) and *before* the file it applies
// to has been opened by `zsv_new_with_properties()`. Once `zsv_new_with_properties(*opts)`
// has been called, the effect of opts_used is already reflected in the passed `opts` arg and in turn
// is saved in the ui_buffer structure
struct zsvsheet_transformation_opts {
  ...
  const char *opts_used; // this can be omitted
};

// note: custom_prop_handler should also be tracked, but this only matters for code that is external to
// `zsv` so this implementation can be deferred with a to-do note in the comments (as there already is)
// It may also be useful / desirable to consolidate the tracking of this info here, with the same tracking
// performed by sqlite3_csv_vtab-mem.c, in which case the global list could be kept in the CLI scope
// a pointer to it passed to the sql extension module, and that single shared memory structure used by
// both the sql module and `sheet` (and any others that may need)

@richiejp
Copy link
Contributor Author

thanks @liquidaty!

  1. Push in {zsvsheet,ext}_sheet_push_transformation actually refers to pushing a new buffer to the top of the stack. It's quite ambiguous so I'll think of some alternatives.
  2. OK, so opts_used can be removed and custom_prop_handler can be deferred.

@richiejp
Copy link
Contributor Author

So far my only suggestion is to replace "push" with "buffer", so that it becomes ext_sheet_buffer_transformation or just to remove "push".

On a separate note I think it is possible to "stream" the transformation with relatively few alterations by reading the file that is being written to. Albeit with the possibility of parser errors if quotes are left open, but this would allow the user to see the transformation in progress. If the builtin filter were also made to work this way then the user would see filter results slowly appear when filtering a very large file with most of the entries filtered out.

Furthermore the index could also be built incrementally and used before it has completed.

First though I am working on just putting the transformation into a background thread.

Fixes a number of memory errors and other issues. Some introduced with
a rebase, but mostly from the transformation handler's implementation.
@richiejp
Copy link
Contributor Author

Rebased onto main and fixed some memory errors. I created a version which offloads the work onto another thread and redisplays the file as new results are read. I'll put that in a separate PR.

@richiejp
Copy link
Contributor Author

At this point I think this could be closed and superseded by #304

There are some issues fixed in #304 that would need to be backported here to the extent #304 is probably now in better working condition than this PR.

@richiejp richiejp closed this Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants