2.1 Data collection

In this tutorial, I’ll retrieve World Bank data and metadata on world population between 1960 and 2021 using the WDI package.

We start by loading the population data and the country metadata from the World Bank.

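# population data: total population (indicator SP.POP.TOTL), all countries and aggregates
# end = 2022 is requested, but the rows shown below top out at 2021
country_pop <- WDI::WDI(indicator = 'SP.POP.TOTL',
                        start = 1960,
                        end = 2022)
# metadata: one row per country or aggregate
country_meta <- WDI::WDI_data$country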

Now let’s take a brief look at these two large tables.

#data table
glimpse(country_pop)
## Rows: 16,492
## Columns: 5
## $ country     <chr> "Africa Eastern and Southern", "Africa Eastern and Souther…
## $ iso2c       <chr> "ZH", "ZH", "ZH", "ZH", "ZH", "ZH", "ZH", "ZH", "ZH", "ZH"…
## $ iso3c       <chr> "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "AFE", "A…
## $ year        <int> 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012…
## $ SP.POP.TOTL <dbl> 694665117, 677243299, 660046272, 643090131, 626392880, 609…
gt::gt(head(country_pop))
country                      iso2c  iso3c  year  SP.POP.TOTL
Africa Eastern and Southern  ZH     AFE    2021  694665117
Africa Eastern and Southern  ZH     AFE    2020  677243299
Africa Eastern and Southern  ZH     AFE    2019  660046272
Africa Eastern and Southern  ZH     AFE    2018  643090131
Africa Eastern and Southern  ZH     AFE    2017  626392880
Africa Eastern and Southern  ZH     AFE    2016  609978946
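
As a rough sanity check on the row count, the arithmetic below assumes a balanced panel, that is, exactly one row per entity per year. This is an assumption, not something the printed output confirms. Under it, 16,492 rows divide evenly into 62 years (1960 to 2021) but not into 63 years (1960 to 2022), which fits the 2021 end year seen in the output.

```r
# Row count reported by glimpse(country_pop) above
n_rows <- 16492

# Number of years in each candidate range
n_years_2021 <- length(1960:2021)  # 62 years
n_years_2022 <- length(1960:2022)  # 63 years

# Implied number of entities (countries plus aggregates) if the panel is balanced
entities_2021 <- n_rows / n_years_2021
entities_2022 <- n_rows / n_years_2022

entities_2021                          # 266
entities_2022 == round(entities_2022)  # FALSE: 63 years does not divide evenly
```
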
#metadata table
glimpse(country_meta)
## Rows: 299
## Columns: 9
## $ iso3c     <chr> "ABW", "AFE", "AFG", "AFR", "AFW", "AGO", "ALB", "AND", "ARB…
## $ iso2c     <chr> "AW", "ZH", "AF", "A9", "ZI", "AO", "AL", "AD", "1A", "AE", …
## $ country   <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan", "Afri…
## $ region    <chr> "Latin America & Caribbean", "Aggregates", "South Asia", "Ag…
## $ capital   <chr> "Oranjestad", "", "Kabul", "", "", "Luanda", "Tirane", "Ando…
## $ longitude <chr> "-70.0167", "", "69.1761", "", "", "13.242", "19.8172", "1.5…
## $ latitude  <chr> "12.5167", "", "34.5228", "", "", "-8.81155", "41.3317", "42…
## $ income    <chr> "High income", "Aggregates", "Low income", "Aggregates", "Ag…
## $ lending   <chr> "Not classified", "Aggregates", "IDA", "Aggregates", "Aggreg…
gt::gt(head(country_meta))
iso3c  iso2c  country                      region                     capital     longitude  latitude  income               lending
ABW    AW     Aruba                        Latin America & Caribbean  Oranjestad  -70.0167   12.5167   High income          Not classified
AFE    ZH     Africa Eastern and Southern  Aggregates                                                  Aggregates           Aggregates
AFG    AF     Afghanistan                  South Asia                 Kabul       69.1761    34.5228   Low income           IDA
AFR    A9     Africa                       Aggregates                                                  Aggregates           Aggregates
AFW    ZI     Africa Western and Central   Aggregates                                                  Aggregates           Aggregates
AGO    AO     Angola                       Sub-Saharan Africa         Luanda      13.242     -8.81155  Lower middle income  IBRD
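
The two tables share the `iso3c` column, and the metadata marks regional groupings with `region == "Aggregates"`. The sketch below shows one possible way to use that. It runs on small hand-typed copies of the rows printed above, not on the full tables. The names `meta_sample`, `pop_sample`, `countries_only` and `pop_countries` are made up for illustration, and the Afghanistan population is left as `NA` because its value is not shown in the output.

```r
# Hand-typed subset of the metadata rows printed above
meta_sample <- data.frame(
  iso3c     = c("ABW", "AFE", "AFG", "AGO"),
  country   = c("Aruba", "Africa Eastern and Southern", "Afghanistan", "Angola"),
  region    = c("Latin America & Caribbean", "Aggregates", "South Asia",
                "Sub-Saharan Africa"),
  longitude = c("-70.0167", "", "69.1761", "13.242"),
  latitude  = c("12.5167", "", "34.5228", "-8.81155"),
  stringsAsFactors = FALSE
)

# Hand-typed population rows; the AFG value is a placeholder (not shown above)
pop_sample <- data.frame(
  iso3c       = c("AFE", "AFG"),
  year        = c(2021L, 2021L),
  SP.POP.TOTL = c(694665117, NA_real_),
  stringsAsFactors = FALSE
)

# Coordinates are stored as character; empty strings become NA when converted
meta_sample$longitude <- as.numeric(meta_sample$longitude)
meta_sample$latitude  <- as.numeric(meta_sample$latitude)

# Keep actual countries only, dropping the aggregate groupings
countries_only <- meta_sample[meta_sample$region != "Aggregates", ]

# Inner join on the shared ISO3 code: population rows for aggregates fall out
pop_countries <- merge(pop_sample, countries_only, by = "iso3c")
pop_countries
```

The same steps should carry over to `country_pop` and `country_meta`, provided their columns match the `glimpse()` output above.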

Looks like a lot of data! Let’s dive a bit deeper and explore the data visually.