-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Isolate expressions per-context and ensure memory stability #1431
Conversation
WIP WIP working WIP WIP WIP
working for realsies properly construct vocab
725d256
to
478eaef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Check my review notes for a some debug cruft leftover.
There is a mostly-trivial performance regression here of about 5%, which is IMO acceptable for the massive improvement in code organization, test coverage, inline comments and general good engineering. We discussed t_data_table
construction costs at length offline, and I do not suspect these are the cause of this slowdown - but IMO we should measure the impact of this anyway and more generally make some project-wide rules about shared_ptr
vs. const ref vs. std::move
just so we're consistent (as this is completely ad-hoc right now :) ).
@@ -250,6 +250,7 @@ if (PSP_WASM_BUILD) | |||
-s DISABLE_EXCEPTION_CATCHING=0 \ | |||
-s ASSERTIONS=2 \ | |||
-s DEMANGLE_SUPPORT=1 \ | |||
-s SAFE_HEAP=1 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC this setting basically makes debug-build Perspective unusable (though it is quite useful for finding memory leaks!)
computed_function::order order_fn = computed_function::order(vocab); \ | ||
computed_function::upper upper_fn = computed_function::upper(vocab); \ | ||
computed_function::lower lower_fn = computed_function::lower(vocab); \ | ||
computed_function::length length_fn = computed_function::length(vocab); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason this needs to be a macro and not a function? The scope-capture of vocab
specifically is something I'd like to stylistically avoid if possible. (Looks also like recompute
is deprecated?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this macro additionally captures the vocab - it is basically just inlining this code into where the macro is invoked, and the macro doesn't take vocab
as a parameter. AFAIK since it's just a text replacement macro, it shouldn't capture vocab
or have any notion of what vocab
is during the preprocessor step. I think leaving it as a macro works well for readability in the compute
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what macro variable capture means. There's little risk of vocab
being mis-used here as you mention the macro won't compile, but for complexity analysis purposes I think its best if we avoid variable capture (in this case with a macro argument) and try where possible to force scope (with {}
, though this doesn't work in this case since it introduces symbols) in our macros.
Regarding readability - this is identical to a function call, no?
@@ -42,9 +42,25 @@ t_ctx_grouped_pkey::init() { | |||
m_tree = std::make_shared<t_stree>(pivots, m_config.get_aggregates(), m_schema, m_config); | |||
m_tree->init(); | |||
m_traversal = std::shared_ptr<t_traversal>(new t_traversal(m_tree)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to this PR but - we should not compile this as there is no public API (nor tests ...)
const std::string& colname, | ||
const std::vector<t_tscalar>& pkeys, | ||
std::vector<t_tscalar>& out_data) const { | ||
std::shared_ptr<t_data_table> master_table = m_gstate->get_table(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can go within the else
block
|
||
for (const std::string& column : other_schema.m_columns) { | ||
// Only append unique columns | ||
if (!schema.has_column(column)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative to this seems an error ...
// Check if this expression is already in the array, if so then | ||
// we need to replace the expression so the last expression tagged | ||
// with the alias is the one that is applied to the engine. | ||
if (expression_idx_map[expression_alias] !== undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is impossible without a malformed config
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is possible in the current UI, if one creates an expression column and then replaces it with another one - the viewer API will pass both expressions sequentially: ["// a \n 123", "// a\n upper('abcde')"]
. With all the expression UI/fix PRs merged we can fix this on the viewer side.
@@ -160,22 +159,15 @@ def _parse_expression_strings(expressions): | |||
|
|||
alias_match = re.match(ALIAS_REGEX, expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: ALIAS_REGEX
is wrong (corrected in JS but not python ..)
const expression_schema = await view.expression_schema(); | ||
expect(Object.keys(expression_schema).length).toEqual(1); | ||
await view.delete(); | ||
}, 3000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These looks correct on-the-face, but why 3,000? In theory, these test should run identically with 1 or 1,000,000 - iteration checks are just there for offensively slow operations.
auto joined_delta = delta->join(ctx_expression_tables->m_delta); | ||
auto joined_prev = prev->join(ctx_expression_tables->m_prev); | ||
auto joined_current = current->join(ctx_expression_tables->m_current); | ||
auto joined_transitions = transitions->join(ctx_expression_tables->m_transitions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do this join
once per context? The column sets do not change per gnode.process()
call.
This PR refactors the way that expressions are handled and stored in Perspective - previously, expressions columns were calculated and stored on the main table inside
t_gstate
and its intermediate tables (prev, current, delta, transitions, etc.). By having all expressions on the gnode's tables, expressions were "real" columns to the engine, and it guaranteed that all the features available to real columns (pivot, filter, etc.) were available to expression columns without having to do large refactors to the code that generated aggregates, pivots etc.However, this meant that expressions were not "isolated" to the
View
they were created in - a user could not create a new expression namednew column
offloat
type, and then replace it immediately with an expression that returns astring
, as the main table schema had already recordednew column
as afloat
column and could not overwrite it.Now, expression columns are stored per-context on data tables that only contain the expression columns belonging to the
View
. This means that eachView
can create expression columns that use the same aliases as otherViews
, and they can be of any type:And now, expression columns are guaranteed to be deleted when
view.delete()
is called - when the context is deleted, the data tables that contain their expressions are cleaned up from the heap.To implement this, I added a
join
function tot_data_table
- given two data tables, it creates a new data table that references the columns from the source tables using their shared pointers without copying any of the underlying memory or allocating new data stores on the heap. When the context receives data from the gnode duringnotify
, it is provided at_data_table
that contains the "real" columns joined with the expression columns from the context being notified. This means that the context will continue to treat expression columns as "real" columns so there is no loss of functionality.Some work remains on the UI to allow for duplicate expression aliases to be created - right now,
perspective-viewer
does not allow aliases to be reused. Because #1426 changes the expressions UI "bigly", this PR does not contain any UI changes, and those changes will be added on in a new PR before the Perspective 0.9.0 release.Changelog
t_expression_tables
structt_vocab
for string expressions are now per-context, instead of a static global vocab for all expressionsView
documentation to reflect the expressions API