Improve ${} reference conversion performance #739

lognaturel · 2024-11-14T21:15:53Z

@lindsay-stevens from getodk/central#171 (comment):

complexity generally means lots of things that have to be parsed, checked, cross-referenced / looked up: e.g. pyxform reference replacements (like ${other_item}), nested repeats, pulldata/instance calls, translations/media, etc.

As for the 496KB form with 6728 rows, it has: ~1000 answerable questions, ~2000 notes, ~500 calculate items, ~1000 groups (up to 4 levels of nesting), ~2000 constraints, ~2000 pyxform references, ~50 choice lists with a total of ~500 options. No repeats, media, or translations. Uncompressed file size is ~4MB, of which ~2MB is the survey sheet, and ~200KB is for ~400 document comments. On the survey sheet, when saved as CSV (excluding ",) ~20% of the 2M characters are non-Latin Unicode script (I think this may be why it's sluggish to open and use in Excel/LibreOffice). The converted XForm document size is 5MB.

When I run XLS2XForm conversion with pyxform master (b65e727 ~v2.2.0), the 496KB file takes ~5 minutes. It takes ~6 seconds to read the data and prepare the initial internal form structure, and the rest is spent in the survey.to_xml step. Of that, ODK Validate takes ~6s; most of the remaining time is explosive recursion over the internal survey nested dict structures, e.g. SurveyElement get_lineage (follows self.parent recursively, 8M calls), iter_descendants (walks self.children recursively, 60M calls), and getattr (self.get recursion, compounded by a lot of dict.copy(), 152M calls). Much of this is initiated by get_xpath() (8M calls).

So, there is definitely room for improvement in pyxform with regard to processing large forms. These kinds of issues don't become obvious until there's a certain quantity of e.g. cross references, groups, nesting levels, etc.
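
To make the get_lineage / get_xpath cost described above concrete, here is a minimal sketch of one possible mitigation: caching each element's path so the parent chain is walked once rather than on every lookup. The `Node` class, `naive_xpath`, `xpath`, and `build_chain` are hypothetical names for illustration only; this is not pyxform's SurveyElement and not necessarily the fix that should be adopted. It also assumes the tree is not modified after paths are first computed.

```python
from functools import cached_property


class Node:
    """Hypothetical stand-in for a survey element (NOT pyxform's SurveyElement)."""

    hops = 0  # counts parent-chain steps, to compare the two approaches

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def naive_xpath(self):
        # Walks up to the root on every call: O(depth) work per lookup.
        parts = []
        node = self
        while node is not None:
            Node.hops += 1
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))

    @cached_property
    def xpath(self):
        # Computed once per node and reuses the parent's cached value.
        # Assumption: the tree is not mutated after first access.
        Node.hops += 1
        if self.parent is None:
            return "/" + self.name
        return self.parent.xpath + "/" + self.name


def build_chain(depth):
    """Build a root node with `depth` nested groups and return the innermost one."""
    node = Node("data")
    for i in range(depth):
        node = Node(f"g{i}", parent=node)
    return node


leaf = build_chain(4)  # root plus 4 levels of nesting

Node.hops = 0
for _ in range(1000):
    leaf.naive_xpath()
naive_hops = Node.hops

Node.hops = 0
for _ in range(1000):
    leaf.xpath
cached_hops = Node.hops

print(naive_hops, cached_hops)
```

With 1000 lookups on an element nested 4 levels deep, the naive version takes a parent-chain step on every lookup, while the cached version only does so the first time. The trade-off is that any cached path becomes stale if elements are renamed or re-parented later, so a real change would need either an immutable tree after construction or explicit invalidation.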


lognaturel added this to the Next milestone Nov 14, 2024
