Wikipedia terms and conditions
When using Infoboxer for large-scale data extraction from Wikipedia, keep the following in mind:
- Before using the data, consider Wikipedia's license. Here is an explanation of how to properly reuse the content.
- There are no official API request limits, and the documentation explicitly states:
  "If you make your requests in series rather than in parallel (i.e. wait for the one request to finish before sending a new request, such that you're never making more than one request at the same time), then you should definitely be fine."
- The official documentation explicitly requires you to specify a User-Agent header. Infoboxer sends a default one, but the docs say:
  "Don't use the default User-Agent provided by your client library, but make up a custom header that identifies your script or service and provides some type of means of contacting you (e.g., an e-mail address)."
With Infoboxer, you can set your own custom User-Agent like this:
```ruby
require 'infoboxer'

UA = 'MyCoolTool/1.1 (http://example.com/MyCoolTool/; [email protected])'

# All requests to all wikis will use your User-Agent:
Infoboxer.user_agent = UA

# ...or, alternatively, set it just for one target site:
client = Infoboxer.wikipedia(user_agent: UA)
client.get('Argentina')
```
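The recommended User-Agent shape (tool name and version, followed by a URL and a contact address in parentheses) can be assembled with a small helper, so that the contact part is never accidentally left out. This is a minimal sketch: `build_user_agent` and its keyword arguments are hypothetical and not part of Infoboxer, and `me@example.com` is a placeholder address.

```ruby
# Hypothetical helper (not part of Infoboxer): builds a User-Agent string in the
# "Name/Version (URL; contact)" shape used in the example above.
def build_user_agent(name:, version:, url:, email:)
  # Refuse to build a header without a way to contact you.
  raise ArgumentError, 'a contact e-mail is required' if email.to_s.strip.empty?

  "#{name}/#{version} (#{url}; #{email})"
end

ua = build_user_agent(name: 'MyCoolTool', version: '1.1',
                      url: 'http://example.com/MyCoolTool/',
                      email: 'me@example.com')
puts ua
# prints: MyCoolTool/1.1 (http://example.com/MyCoolTool/; me@example.com)
```

The resulting string can then be passed to `Infoboxer.user_agent =` or `Infoboxer.wikipedia(user_agent: ...)` as shown above.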
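To follow the advice about making requests in series rather than in parallel, a script can fetch one page at a time and wait for each call to return before starting the next. The sketch below assumes a hypothetical `fetch_serially` helper (not an Infoboxer API); the `fake_get` lambda stands in for a real call such as `client.get(title)` so the example runs offline. The optional `pause` argument is an extra courtesy delay, not something the documentation requires.

```ruby
# Hypothetical helper (not part of Infoboxer): runs one fetch at a time, so the
# next request starts only after the previous block call has returned.
def fetch_serially(titles, pause: 0.0)
  titles.each_with_object({}) do |title, results|
    results[title] = yield(title) # waits for this fetch to finish
    sleep(pause) if pause.positive? # optional delay between requests
  end
end

# Stand-in for a real request, so the sketch runs without network access.
fake_get = ->(title) { "contents of #{title}" }

pages = fetch_serially(%w[Argentina Chile]) { |title| fake_get.call(title) }
p pages.keys
# prints: ["Argentina", "Chile"]
```

With a real client, the block would be `{ |title| client.get(title) }`. Plain `each` already runs sequentially, so the main thing to avoid is wrapping such calls in threads or a parallel job runner.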