fix: fix and speed up CSV export #8863

Merged
merged 5 commits into from
Aug 17, 2023
Conversation

stephanegigandet
Contributor

The sort is now done after the CSV file is generated.
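As a rough illustration of sorting after generation, the sketch below sorts the data lines of an already-written file while keeping the header line first. The helper name `sort_csv_keep_header` is hypothetical and not taken from this PR, and the sketch assumes one record per line (no embedded newlines in fields) with the sort key as the leading column.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the PR): sort the data lines of an
# already-written CSV file in place, leaving the header line first.
# Assumes one record per line, i.e. no embedded newlines inside fields.
sub sort_csv_keep_header {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot read $path: $!";
    my $header = <$in>;
    my @rows   = <$in>;
    close($in);

    # Nothing to do for an empty file.
    return 0 unless defined $header;

    chomp($header);
    chomp(@rows);

    # Plain string sort on the whole line, so rows end up ordered by
    # their leading column (assumed here to be the sort key).
    @rows = sort @rows;

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} "$_\n" for ($header, @rows);
    close($out);

    return scalar @rows;
}
```

Sorting the finished file keeps the export loop free of ordering concerns, at the cost of holding all rows in memory during the sort; for very large exports an external sort may be a better fit.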

  • Profiled export_database.pl to try to speed it up. The largest amount of time was spent in get_string_id_for_lang() (mostly called by display_taxonomy_tag()):

(All numbers below were measured on a small database of 12K products.)

Calls P F ExclusiveTime InclusiveTime Subroutine
892196 19 9 7.51s 9.86s ProductOpener::Store::get_string_id_for_lang
745106 3 2 5.28s 13.7s ProductOpener::Tags::display_taxonomy_tag
250 1 1 4.99s 4.99s BSON::XS::_decode_bson (xsub)
4607286 7 1 2.77s 2.77s main::CORE:match (opcode)
24692 1 1 1.92s 1.92s ProductOpener::Tags::get_all_taxonomy_entries

The display_taxonomy_tag() calls are used to translate the values of fields like categories_tags into their English and French names. Many values are repeated, so caching the results should help.

Added a cached_display_taxonomy_tag() function in Tags.pm to keep a cache (simple hash).
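A minimal sketch of what such a hash-based cache can look like is shown below. It is not the code from Tags.pm: `slow_display_tag` is a hypothetical stand-in for display_taxonomy_tag(), the three-argument signature is an assumption, and the counters only mirror the call and miss statistics printed in the timing runs.

```perl
use strict;
use warnings;

# Counters, similar in spirit to the calls/misses statistics in this PR.
my $calls  = 0;
my $misses = 0;

# The cache itself: a plain hash keyed on all arguments.
my %cache;

# Hypothetical stand-in for the expensive function; the real
# display_taxonomy_tag() does taxonomy lookups and string normalization.
sub slow_display_tag {
    my ($lc, $tagtype, $tagid) = @_;
    return "$lc:$tagtype:$tagid";
}

# Cached wrapper: compute each distinct (lc, tagtype, tagid) only once.
sub cached_display_tag {
    my ($lc, $tagtype, $tagid) = @_;
    $calls++;

    # Join with a NUL byte so that different argument triples
    # cannot collide on the same key.
    my $key = join("\0", $lc, $tagtype, $tagid);

    if (not exists $cache{$key}) {
        $misses++;
        $cache{$key} = slow_display_tag($lc, $tagtype, $tagid);
    }
    return $cache{$key};
}

# Usage: the second identical call is served from the cache.
cached_display_tag("en", "categories", "en:beverages");
cached_display_tag("en", "categories", "en:beverages");
print "calls: $calls, misses: $misses\n";
```

The cache is never invalidated, which is reasonable for a one-shot export script but would need care in a long-running process where taxonomies can change.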

After:

Calls P F ExclusiveTime InclusiveTime Subroutine
249 1 1 5.22s 5.22s BSON::XS::_decode_bson (xsub)
4607286 7 1 2.73s 2.73s main::CORE:match (opcode)
24692 1 1 1.80s 1.80s ProductOpener::Tags::get_all_taxonomy_entries
164748 19 9 1.52s 2.13s ProductOpener::Store::get_string_id_for_lang

Typical runs:

Before:

--- not cached:

real 0m30.468s
user 0m28.947s
sys 0m1.093s

real 0m31.633s
user 0m30.102s
sys 0m1.101s

real 0m32.819s
user 0m31.207s
sys 0m1.173s

After:

--- cached:

cached_display_taxonomy_tag_calls: 742838
cached_display_taxonomy_tag_misses: 15390

real 0m26.765s
user 0m25.185s
sys 0m1.158s

real 0m28.247s
user 0m26.808s
sys 0m1.012s

real 0m25.590s
user 0m24.114s
sys 0m1.068s
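To put the figures above in perspective, the short script below derives the cache hit rate and the mean wall-clock time of the three runs on each side. The numbers are copied from the output above; the variable names are illustrative, and three runs per side is too small a sample to read the result as more than a rough indication.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Figures copied from the runs above.
my $calls  = 742838;
my $misses = 15390;
my @before = (30.468, 31.633, 32.819);    # "real" seconds, not cached
my @after  = (26.765, 28.247, 25.590);    # "real" seconds, cached

# Arithmetic mean of a list of numbers.
sub mean { return sum(@_) / scalar(@_); }

my $hit_rate    = 1 - $misses / $calls;
my $mean_before = mean(@before);
my $mean_after  = mean(@after);
my $saving      = 1 - $mean_after / $mean_before;

printf "cache hit rate: %.1f%%\n", 100 * $hit_rate;
printf "mean real time: %.2fs -> %.2fs (%.1f%% less)\n",
    $mean_before, $mean_after, 100 * $saving;
```

With these inputs, roughly 98% of lookups are served from the cache, and the mean wall-clock time drops by about 15%.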

Member

@CharlesNepote CharlesNepote left a comment

Looks good to me. Have you measured how much time the sort takes?

Member

@alexgarel alexgarel left a comment

Cool!

Comment on lines 596 to 597
print "cached_display_taxonomy_tag_calls: $cached_display_taxonomy_tag_calls\n";
print "cached_display_taxonomy_tag_misses: $cached_display_taxonomy_tag_misses\n";
Member

Should these be removed?

Contributor Author

I removed them, thanks.

@alexgarel
Member

It seems you need to run make lint_perltidy.

@codecov-commenter

Codecov Report

Merging #8863 (308b991) into main (2db5929) will decrease coverage by 0.02%.
Report is 2 commits behind head on main.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #8863      +/-   ##
==========================================
- Coverage   48.61%   48.59%   -0.02%     
==========================================
  Files         118      118              
  Lines       22069    22077       +8     
  Branches     4903     4904       +1     
==========================================
  Hits        10728    10728              
- Misses      10038    10046       +8     
  Partials     1303     1303              
Files Changed Coverage Δ
lib/ProductOpener/Tags.pm 40.60% <0.00%> (-0.19%) ⬇️

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
No Duplication information

@stephanegigandet stephanegigandet merged commit 8e69a1d into main Aug 17, 2023
@stephanegigandet stephanegigandet deleted the fix-csv-export branch August 17, 2023 12:55