A command line tool that scrapes data on .fi domains from an open API provided by the Finnish Communications Regulatory Authority and saves the results to a JSON file and a CSV file.
Clone or download the repository and run yarn in it.
The app has been tested only on macOS High Sierra and has the following dependencies:
- availability of /bin/bash
- availability of the yarn command in /bin/bash
- node version >= 9.3.0
- a good network connection
Run a full scrape. The scraper fetches all data on all .fi domains owned by organizations and unions. At the time of writing there are close to 370 000 unique domains. The result is a single JSON file and a single CSV file, each of which weighs around 250 MB. A full scrape takes about 20 minutes.
npm run start
Run a limited scrape. The --soft-limit flag sets a soft limit for the scrape. The exact number of returned domains is not guaranteed to match the limit.
npm run start -- --soft-limit=500
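The sample output further down suggests that the API is read in pages of 100 records using the OData $skip parameter, which would explain why the limit is soft: whole pages are fetched until the limit is reached or passed. The following is a minimal sketch of that idea, not the scraper's actual code. The function name pageUrls and the page size of 100 are assumptions.

```javascript
// Hypothetical helper, not taken from the scraper's source.
// Lists the page URLs needed to collect at least `softLimit` records,
// assuming the API returns `pageSize` records per page.
function pageUrls(baseUrl, softLimit, pageSize = 100) {
  const urls = [];
  for (let skip = 0; skip < softLimit; skip += pageSize) {
    // The first page needs no $skip parameter.
    urls.push(skip === 0 ? baseUrl : `${baseUrl}?$skip=${skip}`);
  }
  return urls;
}

const urls = pageUrls('https://odata.domain.fi/v4/odata/domains', 450);
console.log(urls.length); // 5 pages, so up to 500 records for a limit of 450
```

Under these assumptions a limit of 450 and a limit of 500 both produce five requests, so the number of returned domains can exceed the limit by up to one page.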
Disable JSON output.
npm run start -- --no-json
Disable CSV output.
npm run start -- --no-csv
All of the flags above can be combined freely.
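As an illustration of how independent flags like these can be combined, here is a minimal sketch of a flag parser. It is not the scraper's own argument handling: the function parseFlags, the option names and the defaults are all assumptions made for the example.

```javascript
// Hypothetical flag parser; the real index.js may handle arguments differently.
// Each flag only touches its own option, so any combination is valid.
function parseFlags(argv) {
  const options = { softLimit: Infinity, json: true, csv: true };
  const prefix = '--soft-limit=';
  for (const arg of argv) {
    if (arg === '--no-json') {
      options.json = false;
    } else if (arg === '--no-csv') {
      options.csv = false;
    } else if (arg.startsWith(prefix)) {
      const value = Number.parseInt(arg.slice(prefix.length), 10);
      if (Number.isNaN(value) || value < 1) {
        throw new Error(`Invalid soft limit: ${arg}`);
      }
      options.softLimit = value;
    }
  }
  return options;
}

// In a real script the input would be process.argv.slice(2).
console.log(parseFlags(['--soft-limit=500', '--no-csv']));
```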
$ npm run start -- --soft-limit=500
> @ start /fi-tld-scraper
> node index.js "--soft-limit=500"
Fetching page https://odata.domain.fi/v4/odata/domains
Fetching page https://odata.domain.fi/v4/odata/domains?$skip=100
Fetching page https://odata.domain.fi/v4/odata/domains?$skip=200
Fetching page https://odata.domain.fi/v4/odata/domains?$skip=300
Fetching page https://odata.domain.fi/v4/odata/domains?$skip=400
Scrape duration: 2275.034ms
The resulting JSON file with one domain would look like this (actual data redacted).
{
  "domains": [
    {
      "Name": "",
      "State": "",
      "GrantDate": "",
      "LastValidityDate": "",
      "IsDNSSecInUse": "",
      "Holder": "",
      "Registrar": "",
      "OrganizationId": "",
      "Address": "",
      "PostalCode": "",
      "PostalArea": "",
      "AssociationType": "",
      "PhoneNumber": "",
      "DepartmentOrContactPerson": "",
      "Country": "",
      "NameServer1": "",
      "NameServer2": "",
      "NameServer3": "",
      "NameServer4": "",
      "NameServer5": "",
      "NameServer6": "",
      "NameServer7": "",
      "NameServer8": "",
      "NameServer9": "",
      "NameServer10": ""
    }
  ]
}
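Because the file is a single object with a domains array, it can be summarized with a few lines of Node once parsed. The sketch below counts domains per value of one field. The helper countByField is hypothetical, and the sample values are made up for the example, not real API data.

```javascript
// Hypothetical helper: counts domains per value of one field,
// for example "Registrar" or "State".
function countByField(result, field) {
  const counts = {};
  for (const domain of result.domains) {
    const key = domain[field];
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Inline sample standing in for the parsed output file; values are made up.
// With a real file this would be something like
// JSON.parse(fs.readFileSync(pathToJsonFile, 'utf8')), where the path is yours to supply.
const sample = JSON.parse(
  '{"domains":[{"Name":"a.fi","State":"X"},{"Name":"b.fi","State":"X"},{"Name":"c.fi","State":"Y"}]}'
);
console.log(countByField(sample, 'State')); // { X: 2, Y: 1 }
```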
The resulting CSV file with one domain would look like this (actual data redacted).
Name;State;GrantDate;LastValidityDate;IsDNSSecInUse;Holder;Registrar;OrganizationId;Address;PostalCode;PostalArea;AssociationType;PhoneNumber;DepartmentOrContactPerson;Country;NameServer1;NameServer2;NameServer3;NameServer4;NameServer5;NameServer6;NameServer7;NameServer8;NameServer9;NameServer10
"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";"";""
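The CSV uses semicolons as separators and wraps every value in double quotes, so splitting on ";" alone would break on values that contain a semicolon. The following sketch reads one header line and one data line into an object. It assumes that embedded quotes are escaped by doubling them, as in RFC 4180; the scraper's actual escaping is not documented here, and both function names are hypothetical.

```javascript
// Splits one line of semicolon-separated, double-quoted values.
// Assumption: a quote inside a value is written as two quotes ("").
function parseCsvLine(line, delimiter = ';') {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Pairs the header names with the values of one data line.
function rowToObject(headerLine, dataLine) {
  const names = parseCsvLine(headerLine);
  const values = parseCsvLine(dataLine);
  const row = {};
  names.forEach((name, i) => { row[name] = values[i]; });
  return row;
}

console.log(rowToObject('Name;State', '"example.fi";"a;b"'));
// { Name: 'example.fi', State: 'a;b' }
```

This handles one line at a time, so it does not cover values that contain line breaks.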