Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MinifyHtml transformer #251

Merged
merged 18 commits into from
Jul 1, 2021
Merged

Add MinifyHtml transformer #251

merged 18 commits into from
Jul 1, 2021

Conversation

ediamin
Copy link
Collaborator

@ediamin ediamin commented Jun 22, 2021

  • collapses whitespace (except in pre, textarea, script tags)
  • trims whitespace (except in pre, textarea tags)
  • removes comments except if they match a given pattern
  • minifies JSON
  • minifies inline amp-script using terser (see Missing JavaScript minifier #260)

closes #14

@ediamin ediamin requested a review from schlessera June 22, 2021 07:16
@CLAassistant
Copy link

CLAassistant commented Jun 22, 2021

CLA assistant check
All committers have signed the CLA.

@schlessera
Copy link
Collaborator

@ediamin Can you rebase this PR onto the latest main and then ensure that all the PHPCS checks regarding docs are passing?

@schlessera
Copy link
Collaborator

@ediamin Regarding terser, please create a new issue about the missing JS modification, and then add the spec tests that fail because of the need for terser to the TESTS_TO_SKIP array (see https://github.com/ampproject/amp-toolbox-php/blob/0.5.2/tests/Optimizer/SpecTest.php#L36-L46) with a link to that issue.

@westonruter
Copy link
Member

Do I understand correctly that the amp-script minification will be skipped for now? That seems like overkill and should be something that is done by whomever is writing JS for amp-script.

@ediamin ediamin force-pushed the add/14-minify-html-transformer branch from b7faec2 to 4e09b53 Compare June 25, 2021 05:14
@schlessera
Copy link
Collaborator

@westonruter Yes, not only is it overkill, but it is also hardly feasible in PHP. While there are libraries around, I didn't find any that supports full ES6 and minifies to the degree that terser does, which is a huge library.

@ediamin
Copy link
Collaborator Author

ediamin commented Jun 25, 2021

@schlessera I've added the amp-script failing test in TESTS_TO_SKIP list. Now I have only one failing test for removes_comments spec. The problem is, if we have any html comment before doctype then the Document class transforms it as a securedDoctype. And after I remove the html comment, this transformation cannot be restored by restoreDoctypeNode method. Please suggest what would be my best approach.

@schlessera
Copy link
Collaborator

@ediamin For the secure doctype comment, I'd suggest adding a hard-coded check in the minifier to skip this particular comment. Also, please check whether there might be other comments we are using as placeholders (both in the library as well as in the plugin) that we might want to preserve. It probably makes sense to have a configuration argument that allows a consumer of the library to provide a list of patterns to preserve.

@schlessera
Copy link
Collaborator

We already have the commentIgnorePattern option. Maybe extend that one to also accept an array, and then add to that array as needed?

@ediamin
Copy link
Collaborator Author

ediamin commented Jun 25, 2021

@schlessera I tried to skip the secured doctype comment from removing with this in MinifyHtml class,

    private function minifyCommentNode(DOMComment $node, $opts)
    {
        // ....

        if (preg_match('/^amp-doctype html/i', $node->data)) {
            return;
        }

        $this->nodesToRemove[] = $node;
    }

It prevents removing the comment node, but since there is no comment node before this doctype comment node, the Document::restoreDoctypeNode won't restore the secure doctype comment and hence in final result, we will get the doctype comment like this,

<!--amp-DOCTYPE html--><html ⚡>...

In order to fix this I need to change the the HTML_RESTORE_DOCTYPE_PATTERN in Document class like this,

const HTML_RESTORE_DOCTYPE_PATTERN = '/(<!--amp-)(doctype)(\s+[^>]+?)(-->)/i';

and also need to change the replacement pattern in restoreDoctypeNode method

    private function restoreDoctypeNode($html)
    {
        if (! $this->securedDoctype) {
            return $html;
        }

        return preg_replace(self::HTML_RESTORE_DOCTYPE_PATTERN, '<!\2\3>', $html, 1);
    }

Having these changes in Document class, we can get the correct <!DOCTYPE html> back.

Let me know your thoughts on this. Should I implement this changes in Document class?

@schlessera
Copy link
Collaborator

Ah, I see, the doctype is only secured because there's an element in front of it, and that element is removed by the minify transformer.

I think it is enough to make the element in front of the secured doctype optional by turning the + modifier into a * modifier:

-    const HTML_RESTORE_DOCTYPE_PATTERN             = '/(^[^<]*(?>\s*<!--[^>]*>\s*)+<)'
+    const HTML_RESTORE_DOCTYPE_PATTERN             = '/(^[^<]*(?>\s*<!--[^>]*>\s*)*<)'
                                                     . '(!--amp-)(doctype)(\s+[^>]+?)(-->)/i';

No other changes should be needed then.

@ediamin ediamin marked this pull request as ready for review June 28, 2021 05:44
@ediamin
Copy link
Collaborator Author

ediamin commented Jun 28, 2021

@schlessera Thank you very much. I've updated the regex pattern and it's working. The PR is ready for your final review.

Copy link
Collaborator

@schlessera schlessera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far, @ediamin!

There are a lot of minor nitpicks in the review, mostly about comments. I had hoped that the WPCS docs enforcing would highlight these, but it seems that's not (yet) the case.

The most important change I'd like you to work through is to separate configuration from internal state. The $canCollapseWhitespace and $inBody values are not configuration values, they manage the state to be kept across node iteration, therefore they should be managed as method arguments.

Another bigger change is to use the return value of the methods to pass around $nodesToRemove, as there is no reason to store these as a class property.

src/Optimizer/Configuration/MinifyHtmlConfiguration.php Outdated Show resolved Hide resolved
src/Optimizer/Configuration/MinifyHtmlConfiguration.php Outdated Show resolved Hide resolved
src/Optimizer/Configuration/MinifyHtmlConfiguration.php Outdated Show resolved Hide resolved
src/Optimizer/Configuration/MinifyHtmlConfiguration.php Outdated Show resolved Hide resolved
src/Optimizer/Configuration/MinifyHtmlConfiguration.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
@ediamin
Copy link
Collaborator Author

ediamin commented Jun 29, 2021

Hi @schlessera, I've updated the code with following notable changes

  1. Fixed the comment errors.
  2. Removed canCollapseWhitespace and inBody from the configuration class.
  3. Use minifyAmpScript = false by default and throw an InvalidConfigurationValue exception for minifyAmpScript = true.
  4. Use the configuration class to get a config value whenever we need, instead of using $this->configuration->toArray().
  5. Removed $nodesToRemove class property and collect the nodes via return values.
  6. Use Tag and Attribute interfaces for tags and attributes.
  7. Added a new InvalidJson error class to handle errors during encoding and decoding JSON data.

Please review again and let me know your feedback. Thanks.

src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
src/Optimizer/Transformer/MinifyHtml.php Outdated Show resolved Hide resolved
tests/Optimizer/Transformer/MinifyHtmlTest.php Outdated Show resolved Hide resolved
tests/Optimizer/SpecTest.php Show resolved Hide resolved
@schlessera
Copy link
Collaborator

I just finished another round of review.
Apart from many more comment fixes, there are two main things:

  • the commentIgnorePattern references Next.js, but that is meaningless in the PHP context.
  • the JSON modification was very inefficient in terms of computing within its inner loop.

@ediamin
Copy link
Collaborator Author

ediamin commented Jun 29, 2021

@schlessera Updated the code according to your suggestions.

@ediamin ediamin requested a review from schlessera June 30, 2021 15:47
@schlessera schlessera added this to the 0.7.0 milestone Jul 1, 2021
@schlessera schlessera merged commit ce85ecc into main Jul 1, 2021
@schlessera schlessera deleted the add/14-minify-html-transformer branch July 1, 2021 05:28
@schlessera
Copy link
Collaborator

Great job, @ediamin !

Comment on lines +159 to +162
// In case the main $document has `securedDoctype`.
if (preg_match('/^amp-doctype html/i', $node->data)) {
return [];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another case that I think may have been missed here, and that is the Mustache template workaround for HTML tables which is documented at https://amp.dev/documentation/components/amp-mustache/#tables

image

So if a comment node appears inside of a template[type=amp-mustache] element, and the comment text includes {{, then it should be ignored.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ediamin Maybe you could open a new issue or PR to address this point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, I just filed an issue for it: #285.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter Yeah sure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you've already created the issue. Thank you very much @westonruter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transformer: Minify HTML
4 participants