Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas #12

Open
barryvdh opened this issue Feb 27, 2017 · 8 comments
Open

Ideas #12

barryvdh opened this issue Feb 27, 2017 · 8 comments

Comments

@barryvdh
Copy link
Contributor

barryvdh commented Feb 27, 2017

Nice tool :)
I was building something similar in house, but didn't get around to building a nice UI around it.

Some suggestions/ideas that I was thinking of:

  • Add the PSR-7 Response to the Rule for headers/statuscodes etc.
  • Perhaps it could be simplified to return multiple statuses, eg. besides success or fail, it could also be a warning or dynamic. In this case we could dynamically set the message and return that, but feels cumbersome (but no biggy)

Some ideas for checks (besides my PRs)

@barryvdh
Copy link
Contributor Author

barryvdh commented Feb 27, 2017

Oh and storing the current URL in the route/hash, so that we can share the url and have the url pre-filled (and started directly when filled)

Probably best with vue-router, but not such an expert in Vue (yet). Edit; see #16

@phanan
Copy link
Owner

phanan commented Feb 28, 2017

Hi @barryvdh, thanks for the ideas and PRs! I'm not very familiar with PSR-7 TBH, so will need some time to wrap my head around it, but SEO and PageSpeed actually have been planned.

P.S. I'm a huge fan of you and your works, so this is like a "Senpai noticed me" moment ;)

@barryvdh
Copy link
Contributor Author

PSR-7 is what you are already using, that's the response object from Guzzle :)

What I'm running in to is that you define the constructor and check method in the interface, so you can't easily add extra dependencies to checks.
If you would leave the constructor open, but add the crawler etc to the check method, it could be more flexibel.

Also, again with PSR-7, you also get the Response body, so you could think about just passing that. There is also a UriInterface you could use. (And even psr-7 request instead of the URI, as that also contains it, but not needed perhaps)

public function check(Crawler $crawler, ResponseInterface $response, UriInterface $uri);

The rule, with flexible constructor, can be creating using the app container; $rule = $container->make($ruleClassName), which allows for injection of whatever you need, for example the request fetcher.

Currently, the UrlFetcher is a singleton, which keeps the Response as property. But you can't use the UrlFetcher for a different response in a Rule, because then you can't access the old response.
You could return the ReponseInterface again, so rules could just use ->getBody() on the response if they need the HTML, but also can access statuscode and headers. And with injecting the Response in the check() method, you don't need to keep a reference of the old response in most cases.

One problem there currently is, is that you manually gzip decode the response. But can't you check the content-encoding header? Guzzle should add the original header as x-encoded-content-encoding, which you could check instead of manually decoding, right? Then you could drop the isGzipped also, and do it properly in your rule based on headers, and just give you this:

/**
     * Fetch an URL.
     *
     * @param UriInterface|string $url
     *
     * @throws \RuntimeException
     *
     * @return ResponseInterface
     */
    public function fetch($url) : ResponseInterface { ..}

@barryvdh
Copy link
Contributor Author

I can submit a PR for it if you want, but it will break BC (and you already tagged v1..).

Probably not much people using custom rules yet, but still..

@phanan
Copy link
Owner

phanan commented Feb 28, 2017

Breaking is perfectly fine, we can always tag a v2. Move fast, break things :)

@barryvdh
Copy link
Contributor Author

barryvdh commented Mar 1, 2017

Oh and a SSL check (yes/no https) and valid certificate, + warning if < 30 days expiry.
Could use https://github.com/spatie/ssl-certificate but that's php7+

@phanan
Copy link
Owner

phanan commented Mar 2, 2017

Oh and a SSL check (yes/no https) and valid certificate, + warning if < 30 days expiry.

Interestingly I thought about this, too, the other day. The lib looks nice. PHP7 isn't an issue I think – we can force 7 in the next version (we have the breaking changes on the way anyway).

@hbakhtiyor
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants