One of my childhood dreams has been to up close to a meteorite. Imagine. Being in close proximity to something that has travelled a unimaginably large distance present right in front of you.
While this dataset by the Tidy Tuesday project does not quite let me do that, it still makes for a nice exercise in animation via the gganimate
package.
As always, the first thing I do is load in the tidyverse
, followed by loading in the data and inspecting it briefly.
library(tidyverse)
meteorites <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-06-11/meteorites.csv")
theme_set(theme_light()) #personal theme preference
meteorites %>%
glimpse()
## Rows: 45,716
## Columns: 10
## $ name <chr> "Aachen", "Aarhus", "Abee", "Acapulco", "Achiras", "Adhi K…
## $ id <dbl> 1, 2, 6, 10, 370, 379, 390, 392, 398, 417, 423, 424, 425, …
## $ name_type <chr> "Valid", "Valid", "Valid", "Valid", "Valid", "Valid", "Val…
## $ class <chr> "L5", "H6", "EH4", "Acapulcoite", "L6", "EH4", "LL3-6", "H…
## $ mass <dbl> 21, 720, 107000, 1914, 780, 4239, 910, 30000, 1620, 1440, …
## $ fall <chr> "Fell", "Fell", "Fell", "Fell", "Fell", "Fell", "Fell", "F…
## $ year <dbl> 1880, 1951, 1952, 1976, 1902, 1919, 1949, 1814, 1930, 1920…
## $ lat <dbl> 50.77500, 56.18333, 54.21667, 16.88333, -33.16667, 32.1000…
## $ long <dbl> 6.08333, 10.23333, -113.00000, -99.90000, -64.95000, 71.80…
## $ geolocation <chr> "(50.775, 6.08333)", "(56.18333, 10.23333)", "(54.21667, -…
We get a few columns that immediately pop out to me, specifically the lat
and long
columns. This brings to mind the idea of visualizing a world map, which is what we will be proceeding to do.
The year
column also gives us the possibility of getting a few insights that related to time, which can be done by the lubridate
package.
library(lubridate)
meteorites %>%
summary()
## name id name_type class
## Length:45716 Min. : 1 Length:45716 Length:45716
## Class :character 1st Qu.:12689 Class :character Class :character
## Mode :character Median :24262 Mode :character Mode :character
## Mean :26890
## 3rd Qu.:40657
## Max. :57458
##
## mass fall year lat
## Min. : 0 Length:45716 Min. : 860 Min. :-87.37
## 1st Qu.: 7 Class :character 1st Qu.:1987 1st Qu.:-76.71
## Median : 33 Mode :character Median :1998 Median :-71.50
## Mean : 13278 Mean :1992 Mean :-39.12
## 3rd Qu.: 203 3rd Qu.:2003 3rd Qu.: 0.00
## Max. :60000000 Max. :2101 Max. : 81.17
## NA's :131 NA's :291 NA's :7315
## long geolocation
## Min. :-165.43 Length:45716
## 1st Qu.: 0.00 Class :character
## Median : 35.67 Mode :character
## Mean : 61.07
## 3rd Qu.: 157.17
## Max. : 354.47
## NA's :7315
meteorites %>%
mutate(class = as.factor(class)) %>%
glimpse()
## Rows: 45,716
## Columns: 10
## $ name <chr> "Aachen", "Aarhus", "Abee", "Acapulco", "Achiras", "Adhi K…
## $ id <dbl> 1, 2, 6, 10, 370, 379, 390, 392, 398, 417, 423, 424, 425, …
## $ name_type <chr> "Valid", "Valid", "Valid", "Valid", "Valid", "Valid", "Val…
## $ class <fct> "L5", "H6", "EH4", "Acapulcoite", "L6", "EH4", "LL3-6", "H…
## $ mass <dbl> 21, 720, 107000, 1914, 780, 4239, 910, 30000, 1620, 1440, …
## $ fall <chr> "Fell", "Fell", "Fell", "Fell", "Fell", "Fell", "Fell", "F…
## $ year <dbl> 1880, 1951, 1952, 1976, 1902, 1919, 1949, 1814, 1930, 1920…
## $ lat <dbl> 50.77500, 56.18333, 54.21667, 16.88333, -33.16667, 32.1000…
## $ long <dbl> 6.08333, 10.23333, -113.00000, -99.90000, -64.95000, 71.80…
## $ geolocation <chr> "(50.775, 6.08333)", "(56.18333, 10.23333)", "(54.21667, -…
#Inspecting for missing data
mean(is.na(meteorites$lat))
## [1] 0.1600096
mean(is.na(meteorites$long))
## [1] 0.1600096
mean(is.na(meteorites$mass))
## [1] 0.002865518
meteorites %>%
count(name, sort = TRUE)
## # A tibble: 45,716 × 2
## name n
## <chr> <int>
## 1 Aachen 1
## 2 Aarhus 1
## 3 Abajo 1
## 4 Abar al' Uj 001 1
## 5 Abbott 1
## 6 Abee 1
## 7 Abernathy 1
## 8 Abo 1
## 9 Abu Moharek 1
## 10 Acapulco 1
## # ℹ 45,706 more rows
#Years with the most recorded impacts
meteorites %>%
count(year, sort = TRUE)
## # A tibble: 266 × 2
## year n
## <dbl> <int>
## 1 2003 3323
## 2 1979 3046
## 3 1998 2697
## 4 2006 2456
## 5 1988 2296
## 6 2002 2078
## 7 2004 1940
## 8 2000 1792
## 9 1997 1696
## 10 1999 1691
## # ℹ 256 more rows
#Extracting the century and decade from the year variable.
meteoritesProcessed <- meteorites %>%
mutate(century = year%/%100,
decade = ((year%%100)%/%10)*10,
class = fct_lump(class, 10)) #Keeping the top 10 most common classes of meteorites.
#Counting the lumped classes
meteorites %>%
mutate(class = fct_lump(class, 10)) %>%
count(class, sort = TRUE)
## # A tibble: 11 × 2
## class n
## <fct> <int>
## 1 Other 9734
## 2 L6 8339
## 3 H5 7164
## 4 L5 4817
## 5 H6 4529
## 6 H4 4222
## 7 LL5 2766
## 8 LL6 2045
## 9 L4 1256
## 10 H4/5 428
## 11 CM2 416
This brings us to the exciting stuff. Before we get into it, there are a few concepts I’d like to highlight here:
- The
borders()
function is a quick and easy way to get political borders onto my ggplot object. - The
transition_states()
function in thegganimate
package allows us to define the various states of our animation. Basically, this is the function that allows us to link various visualizations depicting different values of the same variable into one animation. - The
enter_fade()
function does exactly what it sounds like: it decides the entry animation of a new frame of our animation. - The
ease_aes()
function allows tweening of the frames in our animation. It interpolates individual frames based on the functions we pass as parameters to it.
With those concepts out in the open, we can create our animation now!
library(gganimate)
library(RColorBrewer)
#Storing the number of classes
numCols <- meteoritesProcessed %>%
distinct(class) %>%
nrow()
#Creating a custom colour palette with more colours than the default 8
custColors <- colorRampPalette(brewer.pal(8,"Dark2"))(numCols)
meteoritesProcessed %>%
group_by(century) %>%
filter(n()>10) %>% #removing infrequent groups
ungroup() %>%
filter(!is.na(mass), #removing missing data
!is.na(century),
!is.na(class)) %>%
ggplot(aes(long, lat)) +
borders() + #adding world map borders to our plot
geom_point(aes(color = class, size = mass, group = century), alpha = 0.1) +
transition_states(century) + #unique frame for each century
enter_fade() + #default fade in animation
ease_aes('exponential-in') + #using an exponential function to tween our frames
scale_color_manual(values = custColors) +
scale_size_continuous(range = c(2,12)) +
coord_map() +
expand_limits(x = c(-200,200)) +
labs(x = "",
y = "",
title = "Meteorite impacts over the centuries",
subtitle = paste("{closest_state}","th", "century")) + #displaying the current state of the animation
theme(legend.position = "bottom",
legend.direction = "horizontal",
legend.box = "vertical",
plot.title = element_text(size = 14, face = "bold")) +
labs(colour = "Class",
size = "Mass") +
guides(colour = guide_legend(override.aes = list(alpha = 1))) #default behaviour makes the legend transparent, this overrides it
anim_save("meteoritesGif.gif")
And we’re done! This was a quick blog post I wrote up based off the visualization I made a while back. I’ve always enjoyed plotting maps, and this might be one of my favorite recent visualizations I’ve done!