WBGAPI provides modern, pythonic access to the World Bank's data API. It is designed both for data novices and data scientist types.
WBGAPI differs from other packages for World Bank data in a few respects, and in general tries to take full advantage of the World Bank's powerful API while mitigating the impact of some of its idiosyncrasies. The most significant difference from other packages is that WBGAPI queries databases individually instead of collectively. By default, WBGAPI queries the World Development Indicators database (db=2), but the default can be changed for each request or globally. This prevents confusion when indicators such as population (SP.POP.TOTL) appear in several different databases, or when different databases have different dimensions, economies or time periods.
Other key features:
- Easily select multiple series, economies (countries) and time periods in a single request
- Select individual years, ranges, and most recent values (MRVs)
- Metadata queries
- Very "pythonic": use of generators, ranges and sets makes data access easy and elegant
- Extensive (but optional) pandas support
pip install wbgapi
Import the module; my preferred namespace is `wb`:
import wbgapi as wb
WBGAPI includes extensive docstrings with lots of examples:
help(wb)
help(wb.series)
[etc]
WBGAPI includes sub-packages for major features in the World Bank API:
Feature | Description |
---|---|
series | Indicators (e.g., 'SP.POP.TOTL') |
economy | Countries and economies (could be subnational for some databases) |
time | Time (usually annual, sometimes quarterly or monthly) |
source | Databases (e.g., WDI, Doing Business, International Debt) |
region | World Bank regions (this is global to all databases) |
income | World Bank income groups (also global) |
lending | World Bank lending types (also global) |
topic | World Bank topics (this is also a global list and discrete from the Topic metadata field for series) |
Each of the above implements a minimum of four functions for accessing and displaying elements of that feature:
Function | Description |
---|---|
list | Returns an iterable list (python generator) of elements |
get | Returns a single element, e.g. get('SP.POP.TOTL') |
info | Like list but returns a human-readable table |
Series | Like list but returns a pandas Series |
In interactive mode or a Jupyter notebook, the `info` functions are great for exploring what's in the API or a particular database. A good place to start is by listing the available databases:
import wbgapi as wb
wb.source.info()
id name lastupdated
---- -------------------------------------------------------------------- -------------
1 Doing Business 2019-10-23
2 World Development Indicators 2020-12-16
3 Worldwide Governance Indicators 2020-09-28
5 Subnational Malnutrition Database 2016-03-21
6 International Debt Statistics 2021-01-21
...
63 elements
From there, you can inspect the contents of individual databases:
wb.series.info() # WDI by default
wb.economy.info(db=6) # economies in the Debt Statistics database
wb.db = 1 # Change default database to...
wb.series.info() # ...Doing Business
`info`, `list` and `Series` also let you pass an identifier or list of identifiers to filter the output:
wb.series.info('NY.GDP.PCAP.CD') # GDP
wb.economy.info(['CAN', 'USA', 'MEX']) # Countries in North America
You can also query by keyword:
wb.series.info(q='women')
wb.economy.info(q='congo')
Note: keyword queries ignore the parenthetical part of the indicator name. For example, `q='GDP'` will not match "Gross domestic savings (% of GDP)". To search the parenthetical part too, add an exclamation point like this: `q='!GDP'`
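To make the matching rule concrete, here is a minimal sketch in plain Python. It is not WBGAPI's own code: `keyword_matches` is a hypothetical helper, and the case-insensitive comparison is an assumption based on the lowercase queries used above.

```python
import re

def keyword_matches(q, name):
    """Illustrative stand-in for the rule described above; not wbgapi's own code."""
    if q.startswith('!'):
        # A leading '!' means: search the full name, parentheticals included
        needle, haystack = q[1:], name
    else:
        # Otherwise drop every "(...)" group before searching
        needle, haystack = q, re.sub(r'\([^)]*\)', '', name)
    # Assumption: matching is case-insensitive
    return needle.lower() in haystack.lower()

name = 'Gross domestic savings (% of GDP)'
print(keyword_matches('GDP', name))    # False
print(keyword_matches('!GDP', name))   # True
print(keyword_matches('GDP', 'GDP per capita (current US$)'))  # True
```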
Additionally, the `region`, `income`, `lending`, and `topic` sub-packages have a `members` function that returns the membership of the specified group, so you can do this:
wb.economy.info(wb.income.members('HIC')) # high-income economies
wb.series.info(wb.topic.members(8)) # indicators in the health topic (wb.topic.info() for full list)
wb.series.info(topic=8) # same as above but easier to type
If that doesn't do it, the `search` function provides deeper search on all metadata in the current database:
wb.search('fossil fuels')
When you need programmatic access, just call `list` or `Series` instead of `info` in the above examples.
The `data` sub-package requests data for combinations of series, economies, and time periods in the current database. Use the `fetch` function to return rows as dictionary objects:
for row in wb.data.fetch('SP.POP.TOTL', 'USA'): # all years
    print(row)
Or `DataFrame` to return a pandas data frame:
wb.data.DataFrame(['NY.GDP.PCAP.CD', 'SP.POP.TOTL'], 'CAN', mrv=5) # most recent 5 years
Each of those parameters (series, economy, time) accepts a single identifier, a list of identifiers, or the default keyword 'all':
# population for African countries, every other year
wb.data.DataFrame('SP.POP.TOTL', wb.region.members('AFR'), range(2010, 2020, 2))
Both `fetch` and `DataFrame` provide a lot of parameters for customizing your request, so use the help function to check the documentation.
Note that `DataFrame` will use multi-indexes where necessary (use the "index" and "columns" parameters to change the default behavior):
wb.data.DataFrame(['SP.POP.TOTL', 'EN.ATM.CO2E.KT'], time=range(2000, 2020), skipBlanks=True, columns='series')
EN.ATM.CO2E.KT SP.POP.TOTL
economy time
ABW YR2000 2379.883 90853.0
YR2001 2409.219 92898.0
YR2002 2438.555 94992.0
YR2003 2563.233 97017.0
YR2004 2618.238 98737.0
... ... ...
ZWE YR2015 12317.453 13814629.0
YR2016 10982.665 14030390.0
YR2017 NaN 14236745.0
YR2018 NaN 14439018.0
YR2019 NaN 14645468.0
Use the `reset_index` function (on the data frame) to replace the index with 0-based integers:
wb.data.DataFrame('SP.POP.TOTL', time=2015, labels=True).reset_index()
economy Country SP.POP.TOTL
0 ZWE Zimbabwe 1.381463e+07
1 ZMB Zambia 1.587936e+07
2 YEM Yemen, Rep. 2.649789e+07
3 PSE West Bank and Gaza 4.270092e+06
4 VIR Virgin Islands (U.S.) 1.077100e+05
.. ... ... ...
Most World Bank databases consist of 3 dimensions: series, economy and time. But some, like WDI Archives, contain 4 dimensions, which you can access like this:
wb.source.concepts(db=57)
{'economy': {'key': 'country', 'value': 'Country'},
'series': {'key': 'series', 'value': 'Series'},
'time': {'key': 'time', 'value': 'Time'},
'version': {'key': 'version', 'value': 'Version'}}
And query like this:
# Have population estimates for Brazil been revised over time?
# Version identifiers are in the form YYYYMM. This example queries data for the April
# versions from 2010-2019
wb.data.DataFrame('SP.POP.TOTL', 'BRA', range(2000,2005), version=range(201004,202004,100), db=57)
YR2000 YR2001 YR2002 YR2003 YR2004
version
201004 174174447.0 176659138.0 179123364.0 181537359.0 183863524.0
201104 174174447.0 176659138.0 179123364.0 181537359.0 183863524.0
201204 174425387.0 176877135.0 179289227.0 181633074.0 183873377.0
201304 174425387.0 176877135.0 179289227.0 181633074.0 183873377.0
201404 174504898.0 176968205.0 179393768.0 181752951.0 184010283.0
201504 174504898.0 176968205.0 179393768.0 181752951.0 184010283.0
201604 175786441.0 178419396.0 181045592.0 183627339.0 186116363.0
201704 175786441.0 178419396.0 181045592.0 183627339.0 186116363.0
201804 175287587.0 177750670.0 180151021.0 182482149.0 184738458.0
201904 175287587.0 177750670.0 180151021.0 182482149.0 184738458.0
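Because version identifiers are plain YYYYMM integers, a step of 100 moves forward one year while keeping the month fixed. The sketch below spells that out; `version_ids` is a hypothetical helper, not part of WBGAPI.

```python
def version_ids(first_year, last_year, month):
    """Return YYYYMM integers for one release month per year (hypothetical helper)."""
    start = first_year * 100 + month
    stop = (last_year + 1) * 100 + month   # range() excludes its stop value
    return range(start, stop, 100)

april = version_ids(2010, 2019, 4)
print(list(april))
# [201004, 201104, 201204, 201304, 201404, 201504, 201604, 201704, 201804, 201904]
```

The result is the same object you would get from `range(201004, 202004, 100)`, so it can be passed as the `version` argument in the query above.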
WBGAPI tries to provide some level of normalization for dimensions in API databases. As suggested above, the 'economy' dimension is referenced as 'economy' even though the target database may define it as 'state,' 'province' or something else. Similarly, 'year' becomes 'time.' Reserved characters are mapped to underscores so you can pass them as function arguments. Again, the `concepts` function shows what is going on behind the scenes:
wb.source.concepts(db=6)
{'counterpart_area': {'key': 'counterpart-area', 'value': 'Counterpart-Area'},
'economy': {'key': 'country', 'value': 'Country'},
'series': {'key': 'series', 'value': 'Series'},
'time': {'key': 'time', 'value': 'Time'}}
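The kind of normalization described here can be sketched in a few lines. This is only an illustration under stated assumptions: the `ALIASES` table and the regular expression are mine, built from the examples in the text, and WBGAPI's actual rules may differ.

```python
import re

# Hypothetical alias table, taken from the examples in the text
ALIASES = {'country': 'economy', 'state': 'economy', 'province': 'economy', 'year': 'time'}

def normalize_dimension(key):
    """Map an API dimension key to a name usable as a Python keyword argument."""
    key = ALIASES.get(key.lower(), key.lower())
    # Anything that is not a letter, digit or underscore becomes '_'
    return re.sub(r'\W', '_', key)

print(normalize_dimension('counterpart-area'))  # counterpart_area
print(normalize_dimension('country'))           # economy
print(normalize_dimension('Year'))              # time
```

The underscore mapping matters because `counterpart-area` is not a valid Python identifier, so it could not be written as a keyword argument.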
The standard dimensions all support the `Series` function to provide elements as a pandas Series (see above), but they all share a common implementation function which you can call yourself. Here's how to get a Series for a custom dimension:
wb.Series(wb.source.features('counterpart_area', db=6))
As explained above, any feature in WBGAPI can be returned as a pandas Series. In addition, economies can be returned as a pandas DataFrame with region, income, and lending codes:
wb.economy.DataFrame()
Or, to exclude the aggregate regions:
wb.economy.DataFrame(skipAggs=True)
wbgapi returns metadata for series, economies and combinations:
wb.series.metadata.get('SP.POP.TOTL', economies=['KEN', 'TZA'])
or single footnotes:
wb.data.footnote('SP.POP.TOTL', 'ARG', 2010)
wbgapi includes a utility function that resolves common spellings of country names to the ISO3 codes used by the API. The return value is a "dict" subclass that provides a nice report, but can still be processed programmatically:
wb.economy.coder(['Argentina', 'Swaziland', 'South Korea', 'England', 'Chicago'])
ORIGINAL NAME WBG NAME ISO_CODE
--------------- -------------- ----------
Argentina Argentina ARG
Swaziland Eswatini SWZ
South Korea Korea, Rep. KOR
England United Kingdom GBR
Chicago
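To show what "a dict subclass that prints a report" can look like, here is a small self-contained sketch. `CoderReport` is a hypothetical class written for this illustration, not the class WBGAPI returns, and using `None` for an unresolved name is an assumption.

```python
class CoderReport(dict):
    """Hypothetical dict subclass: behaves like a dict, prints like a table."""

    def __repr__(self):
        width = max([len('ORIGINAL NAME')] + [len(k) for k in self])
        lines = ['{:<{w}}  {}'.format('ORIGINAL NAME', 'ISO_CODE', w=width)]
        for name, code in self.items():
            lines.append('{:<{w}}  {}'.format(name, code or '', w=width))
        return '\n'.join(lines)

# Codes copied from the report above; None marks a name that was not resolved
result = CoderReport({'Argentina': 'ARG', 'Swaziland': 'SWZ', 'South Korea': 'KOR',
                      'England': 'GBR', 'Chicago': None})

print(result)                      # table-style report
print(result['Swaziland'])         # SWZ
unresolved = [name for name, code in result.items() if code is None]
print(unresolved)                  # ['Chicago']
```

Keeping the dict interface means the same object works at the prompt and in a pipeline, for example to collect the names that still need manual cleanup.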
wbgapi provides fairly good support for IPython, Jupyter Notebook, etc. and will generally return HTML
output for things like tables in those environments. HTML output is wrapped in a <div class="wbgapi"/>
container so that you can customize the CSS if you so desire (for instance, I like to left-align the columns).
The location of your custom.css varies depending on your environment. Note that this does not apply
to DataFrame objects, which are formatted by pandas.
WBGAPI uses requests for all HTTP/HTTPS calls. As of version 1.0.10 you can use the `get_options` module variable to pass any additional parameters you like to `requests.get`, for instance, to specify a proxy server or disable SSL verification. For example:
wb.get_options['proxies'] = {
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
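Other `requests.get` keywords should work the same way. The sketch below uses a plain dict as a stand-in for `wb.get_options` and a hypothetical `fake_get` stub instead of a real HTTP call, so it runs offline. `verify` and `timeout` are standard `requests.get` parameters; the URL is a placeholder, and how WBGAPI forwards the options internally is an assumption.

```python
# Stand-in for wb.get_options: a plain dict of requests.get keyword arguments
get_options = {}
get_options['proxies'] = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
get_options['verify'] = False   # disable SSL certificate verification
get_options['timeout'] = 30     # seconds

def fake_get(url, **kwargs):
    """Hypothetical stub that records its arguments instead of calling requests.get."""
    return {'url': url, 'kwargs': kwargs}

# Assumption: keyword unpacking is how such options would reach requests.get
call = fake_get('https://example.org/api', **get_options)
print(sorted(call['kwargs']))   # ['proxies', 'timeout', 'verify']
```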
Using the `wb.proxies` variable is still supported on a deprecated basis and will issue a DeprecationWarning (which Python ignores by default).
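If you want to see such warnings, the standard `warnings` module can surface them. A minimal sketch follows; `read_legacy_setting` is a hypothetical function that only imitates a deprecated setting and is not part of WBGAPI.

```python
import warnings

def read_legacy_setting():
    """Hypothetical function that warns the way a deprecated setting might."""
    warnings.warn('this setting is deprecated', DeprecationWarning, stacklevel=2)
    return 'legacy value'

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always', DeprecationWarning)  # show every occurrence
    value = read_legacy_setting()

print(value)                        # legacy value
print(caught[0].category.__name__)  # DeprecationWarning
```

Outside of a `catch_warnings` block, calling `warnings.simplefilter('always', DeprecationWarning)` at the top of a script has a similar effect for the whole program.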
WBGAPI has no built-in caching, but you can implement it yourself using requests cache.
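One possible setup, assuming the third-party `requests-cache` package is installed and that WBGAPI's calls go through `requests` in a way the package can patch. `enable_cache` is a hypothetical helper, and the cache name and one-hour expiry are arbitrary choices.

```python
try:
    import requests_cache          # third-party: pip install requests-cache
except ImportError:
    requests_cache = None

def enable_cache(name='wbgapi_cache', expire_after=3600):
    """Cache HTTP responses made through requests; returns False if unavailable."""
    if requests_cache is None:
        return False
    # Patches requests globally; 'memory' keeps the cache in-process only
    requests_cache.install_cache(name, backend='memory', expire_after=expire_after)
    return True

print(enable_cache())
```

Call `enable_cache()` once before your first WBGAPI request. Drop the `backend='memory'` argument if you want the cache to persist between sessions.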