Assess 3.4.0 performance with large databases #7304

Closed
asmecher opened this issue Sep 17, 2021 · 9 comments
Labels: Housekeeping:1:Todo (Any dependency management or refactor that would be nice to have some day.)

Comments

@asmecher
Member

asmecher commented Sep 17, 2021

OJS 3.4.0 introduces a lot of SQL changes that should improve performance but need to be tested.

In particular, assess the performance with a large database and the following operations/tasks:

  • Submission lists/filters
  • Reviewer selection
  • User management
  • OAI Identify ("first record date")
  • Merge Users
  • 3.3.x to 3.4.x upgrade
  • Large tables of contents (@diegoabadan may have test data)
    • pkp/ojs@75d8dfd for stable-3_3_0 should improve performance somewhat (to be included in stable-3_3_0-15)
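One way to compare an operation before and after a SQL change is to time a representative query repeatedly and take the median. The following is only a minimal sketch under stated assumptions: it uses an in-memory SQLite database, and the `submissions` table, its columns, the index name, and the `time_query` helper are all hypothetical stand-ins rather than the OJS schema or tooling.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, params=(), repeats=5):
    """Run a query several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical stand-in for a large submissions table (not the OJS schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submissions ("
    "submission_id INTEGER PRIMARY KEY, context_id INTEGER, status INTEGER)"
)
conn.executemany(
    "INSERT INTO submissions (context_id, status) VALUES (?, ?)",
    ((i % 50, i % 4) for i in range(100_000)),
)

query = "SELECT submission_id FROM submissions WHERE context_id = ? AND status = ?"

# Time the same filter with and without a supporting index.
before = time_query(conn, query, (7, 1))
conn.execute(
    "CREATE INDEX submissions_context_status ON submissions (context_id, status)"
)
after = time_query(conn, query, (7, 1))
print(f"median without index: {before:.6f}s, with index: {after:.6f}s")
```

Timings from a toy table like this say nothing about a production database; the point is only the measurement pattern (same query, several repeats, median).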
@asmecher asmecher self-assigned this Sep 17, 2021
@asmecher asmecher added this to the 3.4 milestone Sep 17, 2021
@asmecher
Member Author

(@alexxxmendonca, I think this captures the pain points identified by the SciELO/OJS working group, but please let me know if there were others.)

@diegoabadan
Contributor

@asmecher, is better performance expected when a table of contents contains many published articles?

Referring to #6511

Is it desirable to run these tests now, from the git code?

@asmecher
Member Author

@diegoabadan, it's too early to start tests, but I've added that scenario to the list. I may ping you later for some test data; we don't have anything with that large a number of articles in a single TOC.

@NateWr NateWr added the Housekeeping:1:Todo Any dependency management or refactor that would be nice to have some day. label Sep 20, 2021
@diegoabadan
Contributor

Thanks Alec.

We have more than one case to test. :)

@mpbraendle
Contributor

Hi Alec - if that helps: we have a journal with about 6000 articles, but issues usually have 10-40 articles (so not a very large TOC).

@asmecher
Member Author

asmecher commented May 2, 2022

@mpbraendle, that's a fairly typical size and not likely to turn up bottlenecks in our environment. I think the delays you're experiencing will need to be resolved separately from this issue -- as posted here: https://forum.pkp.sfu.ca/t/submissions-again-loading-slowly/72688/8

@NateWr NateWr moved this to Todo in Infrastructure May 9, 2022
@asmecher asmecher moved this from Todo to Under Research in Infrastructure May 9, 2022
@ajnyga
Collaborator

ajnyga commented Jun 24, 2022

We are upgrading to 3.3 soon and will try to test 3.4 right after that. That probably means October(ish).

@jonasraoni jonasraoni mentioned this issue Oct 15, 2022
4 tasks
asmecher added a commit to pkp/ojs that referenced this issue Feb 25, 2023
asmecher added a commit to pkp/ojs that referenced this issue Feb 25, 2023
asmecher added a commit to asmecher/pkp-lib that referenced this issue Mar 28, 2023
asmecher added a commit that referenced this issue Mar 28, 2023
@asmecher
Member Author

asmecher commented Mar 28, 2023

We'll need to audit the warnings encountered when upgrading a large database. The categories of warnings from the SciELO data set are:

  • Nulling non-existent genre_id ### in submission_files. [AS]
  • Nulling non-existent source_submission_file_id ### in submission_files. [AS]
  • Nulling non-existent uploader_user_id ### in submission_files. [AS]
  • Removing orphaned controlled_vocab_entry_settings for missing controlled_vocab_entry_id ### [AS]
  • Removing orphaned edit_decisions entries for missing submission_id ### [AS]
  • Removing orphaned edit_decisions entry for missing editor_id ### [AS]
  • Removing orphaned email_templates_settings for missing email_id ### [AS]
  • Removing orphaned filters for missing filter_group ### [AS]
  • Removing orphaned library_files for missing context_id ### [AS]
  • Removing orphaned library_files for missing submission_id ### [AS]
  • Removing orphaned note entry ID ### with nonexistent query ### [AS]
  • Removing orphaned query_participants for missing user ID ### [AS]
  • Removing orphaned review_assignments entry ID ### with submission_id ### [AS]
  • Removing orphaned review_files for missing review_id ### [AS]
  • Removing orphaned settings for missing library_file ### [AS]
  • Removing orphaned settings for missing notification ID ### [AS]
  • Removing orphaned settings for missing submission file ID ### [AS]
  • Removing orphaned submission_comments entry for missing author_id ### [AS]
  • Removing orphaned submission_comments entry for missing submission_id ### [AS]
  • Removing orphaned submission_files entry ### with non-existent submission. [AS]
  • Removing orphaned submission ID ### with nonexistent context ID ###. [AS]
  • Removing orphaned user_interests for missing user_id ### [AS]
  • Reset ### announcements with orphaned (non-null) announcement types to no announcement type. [AS]
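Most of these warnings describe rows that point at a parent record that no longer exists. A common way to find and clean up such rows is an anti-join, sketched below. This is an illustration only, not the upgrade's actual code: it uses SQLite with two hypothetical, heavily simplified tables, and just reuses the wording of one warning from the list above.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (submission_id INTEGER PRIMARY KEY);
    CREATE TABLE edit_decisions (
        edit_decision_id INTEGER PRIMARY KEY,
        submission_id INTEGER NOT NULL
    );
    INSERT INTO submissions (submission_id) VALUES (1), (2);
    INSERT INTO edit_decisions (edit_decision_id, submission_id)
        VALUES (10, 1), (11, 2), (12, 99), (13, 99), (14, 100);
""")

# Anti-join: child rows whose parent is missing, grouped by the missing parent ID.
orphans = conn.execute("""
    SELECT ed.submission_id, COUNT(*) AS row_count
    FROM edit_decisions AS ed
    LEFT JOIN submissions AS s ON s.submission_id = ed.submission_id
    WHERE s.submission_id IS NULL
    GROUP BY ed.submission_id
    ORDER BY ed.submission_id
""").fetchall()

for missing_id, row_count in orphans:
    print(
        "Removing orphaned edit_decisions entries for missing "
        f"submission_id {missing_id}"
    )

# Delete the orphaned rows and record how many were removed.
deleted = conn.execute("""
    DELETE FROM edit_decisions
    WHERE submission_id NOT IN (SELECT submission_id FROM submissions)
""").rowcount
```

When auditing, counting the orphans per category before deleting them gives a quick sense of whether a warning reflects a handful of stray rows or a systematic problem in the source data.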

@asmecher
Member Author

These seem fine at a glance with a large data set. Closing for internal testing.

@github-project-automation github-project-automation bot moved this from Under Research to Done in Infrastructure Mar 31, 2023

6 participants