Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tags performance issue Forum::tags #2177

Closed
luceos opened this issue May 19, 2020 · 18 comments
Closed

Tags performance issue Forum::tags #2177

luceos opened this issue May 19, 2020 · 18 comments
Assignees
Labels
org/keep Issues we want to keep open prio/high
Milestone

Comments

@luceos
Copy link
Member

luceos commented May 19, 2020

Bug Report

Current Behavior

    public function loadTagsRelationship(WillSerializeData $event)
    {
        // Expose the complete tag list to clients by adding it as a
        // relationship to the /api endpoint. Since the Forum model
        // doesn't actually have a tags relationship, we will manually load and
        // assign the tags data to it using an event listener.
        if ($event->isController(ShowForumController::class)) {
            $event->data['tags'] = Tag::whereVisibleTo($event->actor)
                ->withStateFor($event->actor)
                ->with(['parent', 'lastPostedDiscussion'])
                ->get();
        }
    }

https://github.com/flarum/tags/blob/fa83f496b0004e31ef2f4fc854bfcdf6a4cfa3c3/src/Listener/AddForumTagsRelationship.php#L46-L58

This piece of code is responsible for a performance penalty with forums that have many tags. There are a few things at fault here:

  • Loading all tags into the js app on load, even though only 5 (?) are initially shown on the side bar and the index can load them through the api as an include.
  • This specific piece of code seems to cause 2 queries per tag. For a forum with 100+ tags, this boils down to an additional two seconds loading time (first byte that is).

Suggested solution

What needs to be done is:

  • Load only those tags on page load that are actually shown on the sidebar
  • Load the additional tags on request
  • Include the tags in the discussion list request

or:

  • Identify and solve the issue with the Relation listener

Environment

  • Flarum version: all of them, but tested on beta 13
@askvortsov1
Copy link
Member

This PR: flarum/tags#84 should already reduce strain substantially. From there:

  • It would be nice to figure out why ShowForumController is running twice: could that be a rogue extension? It doesn't seem to be that way in core.
  • We could look into loading only the necessary tags (without the lastDiscussion relation too), and then loading the full set (with lastPostedDiscussion) from /api/tags the first time that the Tags page is visited.

@rafaucau
Copy link
Contributor

After migrating tags from the old forum to Flarum, I wondered why the forum was loading indefinitely. Well, we have the answer - I have 2000+ tags 😄

I like the first solution:

What needs to be done is:

  • Load only those tags on page load that are actually shown on the sidebar
  • Load the additional tags on request
  • Include the tags in the discussion list request

@askvortsov1
Copy link
Member

@rafaucau we're now investigating another possible cause: by any chance, do you have the fof follow tags extension installed?

@rafaucau
Copy link
Contributor

@askvortsov1 Yes

@askvortsov1
Copy link
Member

Then that's likely it: an issue with FoF follow tags means that multiple SQL queries are made per tag whenever loading a page. I'm gonna close this for now unless someone can reproduce this on an install with only core extensions.

@rafaucau
Copy link
Contributor

@askvortsov1 After disabling FoF follow-tags I don't see any difference. It still loads very slowly

@askvortsov1 askvortsov1 reopened this Jun 23, 2020
@askvortsov1
Copy link
Member

askvortsov1 commented Jun 23, 2020

Nevermind then... Could you open a new issue and share the output of your php flarum infoand link to this one

@simon1222
Copy link

The problem is generated in TagSerializer by:

'canStartDiscussion' => $this->actor->can('startDiscussion', $tag),
'canAddToDiscussion' => $this->actor->can('addToDiscussion', $tag),

and

protected function lastPostedDiscussion($tag)
    {
        return $this->hasOne($tag, DiscussionSerializer::class);
    }

@askvortsov1
Copy link
Member

@simon1222 is this something you've confirmed via testing? We were able to determine that disabling FoF Follow Tags reduces the number of SQL queries sent to the database on initial load from 411 to 54 for a forum with hundreds of tags.

@simon1222
Copy link

@askvortsov1 yes, I've confimed via testing. I have 320k discussions and 2k tags, so lastPostedDiscussion was one of the problem. I was waiting about 60s, so I've commented return $this->hasOne($tag, DiscussionSerializer::class); and reduced to 30s.
I was testing line by line and this piece of code is working fine:

$event->data['tags'] = Tag::whereVisibleTo($event->actor)
                ->withStateFor($event->actor)
                ->with(['parent', 'lastPostedDiscussion'])
                ->get();

but methods, which are executed in Serializer, are working too slow. I think these methods generate too many queries to the database.

@askvortsov1
Copy link
Member

hmm, thing is, that should be processed by the includes system, which would load it all in one query. It would be super helpful to get a bit more information as to what is going on with your forum, so we can try to fix some of these bugs and increase performance.

Would you be able to use https://github.com/FriendsOfFlarum/clockwork (if it works) or some alternate tool to see how many and which database requests are being made on forum load? Also, could you share the URL of your forum and the output of php flarum info? If this is something you'd prefer not to post publically, could you send your https://discuss.flarum.org username and I can reach out to you via private message?

@w-4
Copy link
Contributor

w-4 commented Aug 29, 2020

I think big performance gains will be achieved by not loading the lastPostedDiscussion relationship on normal pages. I might be wrong, but I think it's only used on the tags overview page. Here's the relevant code:

    /**
     * @param WillGetData $event
     */
    public function includeTagsRelationship(WillGetData $event)
    {
        if ($event->isController(ShowForumController::class)) {
            $event->addInclude(['tags', 'tags.lastPostedDiscussion', 'tags.parent']);
        }
    }

That would remove two SQL queries per Tag:

SELECT * FROM `discussions` WHERE `discussions`.`id` = '1' LIMIT  | runtime: 0.27 ms
SELECT `tags`.*, `discussion_tag`.`discussion_id` as `pivot_discussion_id`, `discussion_tag`.`tag_id` as `pivot_tag_id` FROM `tags` inner join `discussion_tag` on `tags`.`id` = `discussion_tag`.`tag_id` WHERE `discussion_tag`.`discussion_id` = '1' | runtime: 0.43 ms

And it would also remove the size of the payload JSON significantly. Right now every last posted discussion object is included in the resource section.

@dsevillamartin
Copy link
Member

@w-4 That's a good point. The last posted discussion can be loaded through an HTTP request when one clicks the tags page and/or included only in the /tags route.

@stale
Copy link

stale bot commented Nov 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We do this to keep the amount of open issues to a manageable minimum.
In any case, thanks for taking an interest in this software and contributing by opening the issue in the first place!

@stale stale bot added the stale Issues that have had over 90 days of inactivity label Nov 27, 2020
@luceos luceos added the org/keep Issues we want to keep open label Nov 27, 2020
@stale stale bot removed the stale Issues that have had over 90 days of inactivity label Nov 27, 2020
@askvortsov1 askvortsov1 added this to the 0.1 milestone Nov 28, 2020
@SychO9
Copy link
Member

SychO9 commented Feb 24, 2021

Sharing what I recently found,

So I decided to update fof/clockwork to see flarum's performance and have a better idea of it. That's when I came across the fact that there were two types of models being loaded for every single primary tag (UserState, Tag) which are relations of the tag's lastPostedDiscussion discussion model, they are called and used in the DiscussionSerializer which results in querying them.

Simply eager loading these relationships solves the issue by only using two queries instead of two queries per primary tag. Do we want to make this change or instead just do as posted above and query the lastPostedDiscussions from the frontend while also limiting the number of tags exposed ?

We could include this fix for now since it doesn't hurt. (I posted the PR incase we'd like to do that).

$data['tags'] = Tag::whereVisibleTo($actor)
            ->withStateFor($actor)
            ->with([
                'parent',
                'lastPostedDiscussion',
                'lastPostedDiscussion.tags',
                'lastPostedDiscussion.state'
            ])
            ->get();

Also observation here, we don't actually need to eager load the tags here since we are loading all of them, we could just loop through all the lastPostedDiscussion relations and set the tags relationship from the collection of tags we already have.

Screenshot from 2021-02-24 18-51-11
Screenshot from 2021-02-24 18-50-46
Screenshot from 2021-02-24 18-48-55

@askvortsov1
Copy link
Member

I'd be in favor of including this since those queries are needed anyway, but I wouldn't close this issue until we properly paginate it and only load the tags we need, when we need them.

@SychO9 SychO9 mentioned this issue Feb 25, 2021
13 tasks
@askvortsov1
Copy link
Member

Infinite nesting PR might have a start to this

@askvortsov1 askvortsov1 self-assigned this Mar 17, 2021
@askvortsov1
Copy link
Member

Plan for handling this is to do it in 3 stages.

Stage 1: only load top-level primary tags, top 3 secondary tags, and current tag's children/siblings on an arbitrary forum request. This solves the issue of thousands of tags being loaded in at once, which understandably makes things very slow. This is ready for review at flarum/tags#87

Stage 2: Optimize tag permission checking so we don't have to send huge arrays to/from the database. This significantly speeds up all pages (2x at least), and when coupled with Stage 1, makes the effect of tags negligible for every page except the "Tags" page. This is ready for review at flarum/tags#126

Stage 3: Pagination. I propose the following changes:

  • On the tags page, load in all primary tags on load, but paginate in secondary tags
  • On the admin tags page, set up a paginated table / grid for secondary tags. Load in all primary tags on load though.
  • Send in the number of primary tags usable via API, use that as criteria for showing/disabling the "select tags" button.
  • In the tag discussion modal, lazy-load in children when a top-level primary tag is selected (make the code from IndexPage.currentTag into a util). Add search (and show only 10-20 most recent results) instead of showing ALL of them.
  • Improve scalability of the admin permissions dashboard (also through pagination of some sort, maybe a scope selector too).

Stages 1 and 2 make tags usable on an enterprise-scale forum. Stage 3 enables hundreds of primary and thousands of secondary tags with elegant scaling. Stage 3 will likely be after stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
org/keep Issues we want to keep open prio/high
Projects
None yet
Development

No branches or pull requests

7 participants