Performance improvements and various fixes #104

nacin · 2014-04-24T20:52:14Z

These commits do a number of things:

Improve performance via traditional means (suspending cache invalidation, etc.) and less traditional means (applying knowledge of WP API internals to reduce taxonomy and meta queries, etc.)
Use post_name instead of post_title when searching for existing items, as proposed by @Otto42.
Skip @ignore'd items as well as duplicate hooks.
Use :: to delimit class methods.
Print the file name as context when processing files.

Track queries, time. Fire actions to allow for environment-specific adjustments like sending DB reads to masters, autocommit=0, etc.

Non-strings fail the test in update_metadata() where get_metadata() is used to see if an update is needed.

Simplify and centralize term creation to avoid repeated term_exists(). Watch for changes to terms and meta before deciding whether to update the post. (Updating the post bumps post_modified, which we do want, but don't want on every import.) Also fixes package/subpackage assignments which were overwriting each other.

Rarst · 2014-04-24T20:58:08Z

Nope on skipping anything. As previously discussed we might not use ignored stuff in reference (which I am against of personally), but Parser should provide as complete data as possible.

JDGrimes · 2014-04-24T21:07:19Z

Use post_name instead of post_title when searching for existing items, as proposed by @Otto42.

Sorry, but we can't do that at present. See #92. We need to change the way we handle duplicates (#94)

Most of the rest is probably great though.

Nope on skipping anything. As previously discussed we might not use ignored stuff in reference (which I am against of personally), but Parser should provide as complete data as possible.

Maybe this could be an option, but I agree with @Rarst, I think that in general the parser should provide as much data as possible (within reason) and let the user decide what to ignore further down the line. But the option to speed things up by skipping @ignored stuff is probably a good idea. Related: #37.

JDGrimes · 2014-04-24T21:21:31Z

Use post_name instead of post_title when searching for existing items, as proposed by @Otto42.

Sorry, but we can't do that at present. See #92. We need to change the way we handle duplicates (#94)

Ahh, I see this is handled in nacin/WP-Parser@05b013d by just skipping the duplicates. I think we want to know the locations of every place that the hook is used though. So instead of just skipping them we should save the file and line numbers.

nacin · 2014-04-24T21:21:34Z

@Rarst: We have @internal which is what you may be thinking of. Skipping @ignore is not for performance. It's because @ignore doesn't mean internal — it means ignore. This is used in core when we're defining duplicate functions in load-scripts.php, load-styles.php, setup-config.php. There may be one or two incorrect uses in core but otherwise the directive needs to be followed.

@JDGrimes: I avoided the issues switching to post_name by not importing secondary instances of a hook for now. That fixes the first half of #94. There are still some problematic situations (about 15 total) but all of them are issues of documentation or the parser (it doesn't detect when we break out of PHP and back again between the doc block and the hook), and are not to be imported. The second half of #94 would be to actually make note of these "duplicate" locations, which is low priority for actually launching this thing.

As a general note: This PR is designed to make the importer both faster and also fairly idempotent. Once the database has been populated initially, running it again should result in no INSERT or UPDATE queries. Previously it was churning absolutely everything.

nacin · 2014-04-24T21:23:30Z

Another aspect: by preventing churn, that means post_modified would actually be accurate and can be used to indicate and track recent updates. Previously all posts were getting updated even if no data (including meta or tax) were changing. This PR handles that.

JDGrimes · 2014-04-24T21:27:25Z

Another aspect: by preventing churn, that means post_modified would actually be accurate and can be used to indicate and track recent updates. Previously all posts were getting updated even if no data (including meta or tax) were changing. This PR handles that.

👍

I see now that we are on the same page (I should've finished looking at the whole PR before commenting. 😊).

Rarst · 2014-04-25T22:13:59Z

We have @internal which is what you may be thinking of. Skipping @ignore is not for performance.

Ah, right. As per #16 I am for opt-out, rather than current opt-in about internal.

This is used in core when we're defining duplicate functions in load-scripts.php, load-styles.php, setup-config.php. There may be one or two incorrect uses in core but otherwise the directive needs to be followed.

Makes sense in core context, however parser is not just for core. Can't know how other people are possibly using the tag. Also the ignored tag doesn't seem to be part of PSR-5 spec at all. As for me that's argument against special handling for it, more so against "lossy" one.

Performance improvements and various fixes

Rarst · 2014-05-04T09:11:18Z

Merging, but I'll look more at handling ignore part later.

nacin added 11 commits April 24, 2014 20:27

Performance: Suspend cache invalidation.

e55999f

Track queries, time. Fire actions to allow for environment-specific adjustments like sending DB reads to masters, autocommit=0, etc.

Add filename being processed to line output.

d185130

Skip duplicate and undocumented hooks.

05b013d

Use :: as a class::method delimiter.

310e874

Ignore items with @ignore tags.

7ed46b4

Avoid queries with update_post_meta().

57f6ce0

Non-strings fail the test in update_metadata() where get_metadata() is used to see if an update is needed.

Use the indexed slug field to find existing items.

af34f24

Add insert_term method to simplify/reduce queries.

6b0f533

Remove transition post callbacks for performance.

9c1f3d4

Avoid unnecessary post update calls (part 1).

1830332

Rarst added a commit that referenced this pull request May 4, 2014

Merge pull request #104 from nacin/speed

c70a033

Performance improvements and various fixes

Rarst merged commit c70a033 into WordPress:master May 4, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance improvements and various fixes #104

Performance improvements and various fixes #104

nacin commented Apr 24, 2014

Rarst commented Apr 24, 2014

JDGrimes commented Apr 24, 2014

JDGrimes commented Apr 24, 2014

nacin commented Apr 24, 2014

nacin commented Apr 24, 2014

JDGrimes commented Apr 24, 2014

Rarst commented Apr 25, 2014

Rarst commented May 4, 2014

Performance improvements and various fixes #104

Performance improvements and various fixes #104

Conversation

nacin commented Apr 24, 2014

Rarst commented Apr 24, 2014

JDGrimes commented Apr 24, 2014

JDGrimes commented Apr 24, 2014

nacin commented Apr 24, 2014

nacin commented Apr 24, 2014

JDGrimes commented Apr 24, 2014

Rarst commented Apr 25, 2014

Rarst commented May 4, 2014