Certain Sites Only Work When facebookexternalhit/1.1 User-Agent Is Specified #28

@kachi1227

First of all, great library! Thanks for putting it together.

Earlier this year, Twitter stopped returning Open Graph data unless the request carries one of a few specific user-agents. The only one I've gotten to work is 'facebookexternalhit/1.1', and it should, in theory, keep working since it's the user-agent that the Facebook crawler uses. Unfortunately, I only figured this out after a considerable amount of debugging. To save other users time, it might be helpful to update the code in the examples folder to the following:
$client = new Psr18Client(new NativeHttpClient(['headers' => ['User-Agent' => 'facebookexternalhit/1.1']]));

In case you're wondering, I've tested the code above with all major websites (Google, Facebook, CNN, Twitter, YouTube, Instagram, Amazon, etc.) and they all return valid Open Graph data.

Attached are screenshots of the Open Graph data returned for the following URL: https://twitter.com/CNN/status/1308170698175774720. The first screenshot is the result without a user-agent, and the second is the result with the user-agent mentioned above.

No User-Agent:
[Screenshot: Screen Shot 2020-09-21 at 6 50 03 PM]

facebookexternalhit/1.1 User-Agent:
[Screenshot: Screen Shot 2020-09-21 at 6 50 45 PM]
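
For anyone who wants to reproduce the with/without comparison outside the library, here is a minimal sketch that uses only PHP built-ins (the `http` stream context and `DOMDocument`). The helper names `fetchHtml` and `extractOpenGraph` are hypothetical and are not part of the library's API. Whether a given site still responds to this user-agent is an assumption you would need to verify yourself.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: download a page, optionally sending a User-Agent header.
// Returns an empty string if the request fails.
function fetchHtml(string $url, ?string $userAgent = null): string
{
    $http = ['method' => 'GET', 'follow_location' => 1, 'timeout' => 10];
    if ($userAgent !== null) {
        // 'user_agent' is a standard option of PHP's http stream context.
        $http['user_agent'] = $userAgent;
    }
    $html = @file_get_contents($url, false, stream_context_create(['http' => $http]));
    return $html === false ? '' : $html;
}

// Hypothetical helper: collect <meta property="og:..."> tags from an HTML string.
function extractOpenGraph(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so keep libxml warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $property = $meta->getAttribute('property');
        if (strncmp($property, 'og:', 3) === 0) {
            $tags[$property] = $meta->getAttribute('content');
        }
    }
    return $tags;
}
```

Calling `extractOpenGraph(fetchHtml($url))` and `extractOpenGraph(fetchHtml($url, 'facebookexternalhit/1.1'))` on the same URL and comparing the two arrays shows whether the site gates its tags on the user-agent.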
