app-ads.txt crawler that follows the "IAB Technology Laboratory" specification
npm i app-ads-txt
const AppAdsTxtCrawler = require('app-ads-txt').crawler;
const appAdsTxtCrawler = new AppAdsTxtCrawler(options);
const appAdsTxtData = appAdsTxtCrawler.crawlData('example.com');
input: example.com/app-ads.txt
output:
{
  "appAdsUrl": "https://example.com/app-ads.txt",
  "data": {
    "variables": {},
    "fields": [
      {
        "domain": "example.com",
        "publisherAccountID": "104023",
        "accountType": "DIRECT",
        "certificateAuthorityID": "79929e88b2ba73bc"
      }
    ]
  }
}
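To illustrate how one line of an app-ads.txt file maps onto the "fields" and "variables" keys above, here is a minimal sketch. The function name parseAppAdsTxt is hypothetical and is not part of this package's API. Storing each variable as a single string is also an assumption, so a repeated variable name overwrites the earlier value.

```javascript
// Hypothetical helper, not this package's implementation.
// Parses app-ads.txt text into the { variables, fields } shape shown above.
function parseAppAdsTxt(text) {
  const result = { variables: {}, fields: [] };

  for (const rawLine of text.split(/\r?\n/)) {
    // Drop comments (everything after "#") and surrounding whitespace.
    const line = rawLine.split('#')[0].trim();
    if (line === '') continue;

    // Variable lines look like "name=value".
    const variable = line.match(/^([^,=\s]+)\s*=\s*(.*)$/);
    if (variable) {
      result.variables[variable[1]] = variable[2];
      continue;
    }

    // Record lines are comma separated; ignore any ";extension" suffix.
    const parts = line.split(';')[0].split(',').map((part) => part.trim());
    if (parts.length < 3) continue; // malformed record, skip it

    const record = {
      domain: parts[0],
      publisherAccountID: parts[1],
      accountType: parts[2].toUpperCase(),
    };
    // The certificate authority ID is an optional fourth field.
    if (parts[3]) record.certificateAuthorityID = parts[3];
    result.fields.push(record);
  }

  return result;
}
```

Feeding it the line "example.com, 104023, DIRECT, 79929e88b2ba73bc" yields the single "fields" entry shown in the output above.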
options:
{
"proxyUrl": "http://user:password@proxy-host:port"
}
errors:
- ERR_INVALID_URL: the input URL is invalid
Follow these steps to transform the developer URL into the path to crawl when locating an app-ads.txt file.
- Extract the host name portion of the URL.
- Remove any “www.” or “m.” prefix present in the host name.
- Remove all but the first (and, if present, second) name from the host name which precedes the standard public suffix. For example:
  a. example.com simply remains example.com
  b. subdomain.example.com remains subdomain.example.com
  c. another.subdomain.example.com becomes subdomain.example.com
  d. another.subdomain.example.co.uk becomes subdomain.example.co.uk
- Append /app-ads.txt to that path.
- Crawlers should attempt to fetch the HTTPS version of the URL first, falling back to the HTTP version if SSL is unavailable.
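The steps above can be sketched as a small function that returns the candidate URLs in the order a crawler would try them. This is an illustration under stated assumptions, not this package's implementation: the name appAdsTxtUrls is hypothetical, and PUBLIC_SUFFIXES is a tiny hard-coded stand-in for the full Public Suffix List, which a real crawler would need to consult.

```javascript
// Tiny stand-in for the Public Suffix List (assumption, for illustration only).
const PUBLIC_SUFFIXES = ['co.uk', 'com', 'org', 'net', 'uk'];

// Hypothetical helper: turns a developer URL into the app-ads.txt URLs to try.
function appAdsTxtUrls(developerUrl) {
  // Step 1: extract the host name; assume https:// when no scheme is given.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(developerUrl)
    ? developerUrl
    : `https://${developerUrl}`;
  let host = new URL(withScheme).hostname.toLowerCase();

  // Step 2: remove a leading "www." or "m." prefix.
  host = host.replace(/^(www|m)\./, '');

  // Step 3: find the longest matching public suffix, then keep at most
  // the two names that directly precede it.
  const suffix = PUBLIC_SUFFIXES
    .filter((candidate) => host.endsWith(`.${candidate}`))
    .sort((a, b) => b.length - a.length)[0];
  if (suffix === undefined) {
    throw new Error(`Unknown public suffix for host: ${host}`);
  }
  const names = host.slice(0, -(suffix.length + 1)).split('.');
  const kept = names.slice(-2).join('.');

  // Step 4: append /app-ads.txt to that path.
  const path = `${kept}.${suffix}/app-ads.txt`;

  // Step 5: HTTPS first, HTTP as the fallback.
  return [`https://${path}`, `http://${path}`];
}
```

A crawler would request the first URL and move on to the second only if the HTTPS request cannot be completed.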
MIT