This much-delayed blog post analyzes revenue across media franchises, using data provided by the Tidy Tuesday project.
This was a really fun analysis, mainly because the dataset has so few variables that it took some brainstorming to decide what to look at. I also used `stringr` fairly frequently to parse and manipulate the character data in the dataset.
As always, I load my preferred packages, after which I briefly `glimpse()` the data.
library(tidyverse) #for a streamlined analysis
library(patchwork) #for combined plots
library(igraph) #for network graphs
library(ggraph) #for plotting network graphs using the grammar of graphics
library(RColorBrewer) #for colour palettes
theme_set(theme_light()) #preferred theme choice
media_franchises <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-02/media_franchises.csv") #reading in the data
media_franchises %>%
glimpse()
## Rows: 321
## Columns: 7
## $ franchise <chr> "A Song of Ice and Fire / Game of Thrones", "A Song …
## $ revenue_category <chr> "Book sales", "Box Office", "Home Video/Entertainment…
## $ revenue <dbl> 0.900, 0.001, 0.280, 4.000, 0.132, 0.760, 1.000, 0.50…
## $ year_created <dbl> 1996, 1996, 1996, 1996, 1996, 1992, 1992, 1992, 1992,…
## $ original_media <chr> "Novel", "Novel", "Novel", "Novel", "Novel", "Animate…
## $ creators <chr> "George R. R. Martin", "George R. R. Martin", "George…
## $ owners <chr> "Random House WarnerMedia (AT&T)", "Random House Warn…
media_franchises %>%
head()
## # A tibble: 6 × 7
## franchise revenue_category revenue year_created original_media creators owners
## <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 A Song o… Book sales 0.9 1996 Novel George … Rando…
## 2 A Song o… Box Office 0.001 1996 Novel George … Rando…
## 3 A Song o… Home Video/Ente… 0.28 1996 Novel George … Rando…
## 4 A Song o… TV 4 1996 Novel George … Rando…
## 5 A Song o… Video Games/Gam… 0.132 1996 Novel George … Rando…
## 6 Aladdin Box Office 0.76 1992 Animated film Walt Di… The W…
media_franchises_processed <- media_franchises %>%
separate_rows(owners, sep = "\\) ") #separating multiple owners into their own rows
I run a few `count()` commands to get a better look at the categorical variables in the dataset, since I have a hunch that they are what I will mainly be looking at.
media_franchises_processed %>%
count(revenue_category, sort = TRUE)
## # A tibble: 8 × 2
## revenue_category n
## <chr> <int>
## 1 Box Office 104
## 2 Merchandise, Licensing & Retail 101
## 3 Home Video/Entertainment 91
## 4 Video Games/Games 73
## 5 Comic or Manga 46
## 6 Music 19
## 7 TV 16
## 8 Book sales 11
media_franchises_processed %>%
count(owners, sort = TRUE)
## # A tibble: 97 × 2
## owners n
## <chr> <int>
## 1 Shueisha (Hitotsubashi Group 25
## 2 The Walt Disney Company 25
## 3 Shueisha (Hitotsubashi Group) 16
## 4 (manga 13
## 5 DC Entertainment (AT&T) 13
## 6 (films) 11
## 7 (franchise 10
## 8 Square Enix 10
## 9 Pierrot 9
## 10 Bandai Namco (games) 8
## # ℹ 87 more rows
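The split fragments in this output (for example `Shueisha (Hitotsubashi Group` next to `Shueisha (Hitotsubashi Group)`) appear because the separator `"\\) "` consumes the closing bracket it matches. One possible alternative, sketched here on a made-up two-row tibble rather than the real data, is a lookbehind that splits on the space after a bracket while leaving the bracket in place. The `toy` tibble and its owner names are purely illustrative, and the counts shown above were not produced with this pattern.

```r
library(tidyr)
library(tibble)

# Made-up example data (not taken from the Tidy Tuesday dataset)
toy <- tibble(
  franchise = c("Franchise A", "Franchise B"),
  owners = c("Owner One (films) Owner Two (games)", "Owner Three")
)

# "(?<=\\)) " matches a space only when it directly follows ")",
# so the bracket itself stays attached to the left-hand piece
toy_split <- separate_rows(toy, owners, sep = "(?<=\\)) ")

toy_split$owners
```

Entries that consist only of a bracketed label would still end up as rows of their own, so some extra cleaning would likely be needed on the real data.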
media_franchises_processed %>%
count(creators, owners, sort = TRUE)
## # A tibble: 132 × 3
## creators owners n
## <chr> <chr> <int>
## 1 Stan Lee Steve Ditko (franchise 7
## 2 Stan Lee Steve Ditko Marvel Entertainment… 7
## 3 Stan Lee Steve Ditko Sony (films) 7
## 4 Akira Toriyama (manga 6
## 5 Akira Toriyama Bandai Namco (games) 6
## 6 Akira Toriyama Bird Studio Shueish… 6
## 7 Akira Toriyama Toei Animation (anime 6
## 8 George Lucas Lucasfilm (The Walt … 6
## 9 Hideaki Anno Gainax Tatsunoko Production Khara[dc][279][280] 6
## 10 Hironobu Sakaguchi Hiromichi Tanaka Nasir Gebelli Square Enix 6
## # ℹ 122 more rows
Now comes the exciting part: plotting!
I don’t do anything here that I haven’t done before, but what makes it refreshing is the `patchwork` package, which lets me create a truly beautiful combined plot with extremely readable syntax!
To use `patchwork`, I store my `revenue` plots in three separate variables.
#Boxplots of revenue across categories
p1 <- media_franchises_processed %>%
mutate(revenue_category = fct_reorder(revenue_category, revenue, median)) %>%
ggplot(aes(revenue_category, revenue)) +
geom_boxplot(aes(fill = revenue_category)) +
coord_flip() +
labs(x = "Revenue category",
y = "Revenue",
title = "Total revenue across categories",
subtitle = "In billions (USD)") +
guides(fill = "none")
#Revenue timelines, split across categories
p2 <- media_franchises_processed %>%
ggplot(aes(year_created, revenue)) +
geom_smooth(method = "lm", aes(color = revenue_category)) +
facet_wrap(~revenue_category, ncol = 2) + #faceting across categories
guides(color = "none") +
labs(title = "Revenue trends over the\nyears, per category",
subtitle = "Generally consistent,\ndrop in merchandise revenue",
x = "Year created",
y = "Revenue (in billion USD)") +
theme(axis.text.x = element_text(angle = 45)) #for readability
#Storing the names of the highest earning franchises
topFranchises <- media_franchises_processed %>%
group_by(franchise) %>%
summarise(totalRev=sum(revenue)) %>%
arrange(desc(totalRev)) %>%
head(8) %>% #for readability
pull(franchise)
#Creating a custom colour palette
custColors <- colorRampPalette(brewer.pal(8,"Set1"))(10)
#Stacked bar plot of highest earning franchises
p3 <- media_franchises_processed %>%
filter(franchise %in% topFranchises) %>%
mutate(franchise = fct_reorder(franchise, revenue, sum)) %>%
ggplot(aes(franchise, revenue)) +
geom_col(aes(fill = revenue_category)) +
coord_flip() +
scale_fill_manual(values = custColors) +
labs(title = "Highest earning franchises",
subtitle = "Sorted via total revenue",
x = "Franchise name",
y = "Revenue",
caption = "In billions of dollars",
fill = "Revenue category") +
theme(legend.position = "bottom")
(p1/p3)|p2 #patchwork!
## `geom_smooth()` using formula = 'y ~ x'
This is where the beauty of `patchwork` kicks in: `|` for adding plots horizontally, and `/` for adding them vertically. Brilliant and simple. For more functionality be sure to check out the official [repository](https://github.com/thomasp85/patchwork)!
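As a minimal, self-contained sketch of the same layout idea, the snippet below combines three throwaway plots built from the built-in `mtcars` data. The names `toy_a`, `toy_b`, `toy_c` and the title text are made up for illustration and are not part of the analysis above.

```r
library(ggplot2)
library(patchwork)

# Three throwaway plots from the built-in mtcars data
toy_a <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
toy_b <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
toy_c <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10)

# toy_a stacked on toy_b in the left column, toy_c in the right column
combined <- (toy_a / toy_b) | toy_c

# Optionally add a shared title to the whole layout
titled <- combined + plot_annotation(title = "Toy patchwork layout")
titled
```

The brackets matter here: they group the vertical stack before it is placed next to the third plot, which mirrors the `(p1/p3)|p2` call.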
Looking a little closer, we can see that book sales generate the most revenue for franchises on average, with merchandise very close behind.
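A reading like this can also be checked numerically rather than by eye. The sketch below summarises revenue per category on a small made-up tibble; `toy_revenue` and its numbers are invented for illustration, and the same `group_by()` and `summarise()` steps could be pointed at `media_franchises_processed` instead.

```r
library(dplyr)

# Made-up revenue figures, only to illustrate the summary step
toy_revenue <- tibble(
  revenue_category = c("Book sales", "Book sales", "Book sales",
                       "Music", "Music", "Music"),
  revenue = c(1, 2, 30, 3, 4, 5)
)

# Median and mean revenue per category, highest median first
category_summary <- toy_revenue %>%
  group_by(revenue_category) %>%
  summarise(median_revenue = median(revenue),
            mean_revenue = mean(revenue),
            .groups = "drop") %>%
  arrange(desc(median_revenue))

category_summary
```

In this toy data a single large value pulls the mean of one category well above its median, which is why it can help to look at both before describing a category as the highest earner "on average".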
Now comes the kicker: I’ve always wanted to know how many franchises a media powerhouse like, say, The Walt Disney Company owns, since it owns a lot of today’s big-name franchises (the MCU and Star Wars come to mind). I visualize the relationships between the original creators and their present-day owners using the `igraph` and `ggraph` packages, as follows:
set.seed(100) #for reproducibility
#Defining an arrow
a <- grid::arrow(type = "closed", length = unit(.15, "inches"), angle = 15)
media_franchises_processed %>%
count(creators, owners, sort = TRUE) %>%
filter(!str_detect(owners, "[\\(\\)]")) %>% #dropping owner entries that contain brackets (fragments left over from the split)
filter(n > 2) %>%
graph_from_data_frame() %>% #creating a network graph from the dataframe
ggraph(layout = 'fr') + #plotting it using the grammar of graphics
geom_edge_link(aes(edge_colour = n),
arrow = a,
end_cap = circle(.07, 'inches')) +
geom_node_point() +
geom_node_text(aes(label = name), repel = TRUE, size = 2) +
scale_edge_color_distiller(palette = "YlOrRd", direction = 1) +
labs(title = "Relationships between creators and owners",
subtitle = "The Walt Disney Company seems to own quite a few creations",
caption = "Arrows point from creator to owner") +
theme_void() +
theme(legend.position = "bottom",
legend.box = "vertical")
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
A lot of names stick out to me. We can see arrows from Masashi Kishimoto (creator and writer of Naruto) and Tite Kubo (creator and writer of Bleach) pointing towards Studio Pierrot, which makes sense, as this studio animated the shows Naruto and Bleach, two shows with an absolutely massive fan following. We can also see arrows from Pixar to The Walt Disney Company, for example.
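The "how many creators point at each owner" question can also be answered with a number instead of a picture. As a rough sketch, assuming an edge list shaped like the `creators`/`owners` pairs used for the graph, the in-degree of each node counts the arrows arriving at it. The `edges` data frame below is made up for illustration.

```r
library(igraph)

# Made-up creator -> owner pairs (one row per arrow in the graph)
edges <- data.frame(
  creators = c("Creator A", "Creator B", "Creator C", "Creator D"),
  owners   = c("Owner X", "Owner X", "Owner Y", "Owner X")
)

# The first two columns are read as the "from" and "to" ends of each edge
g <- graph_from_data_frame(edges)

# In-degree = number of arrows pointing at a node
in_deg <- degree(g, mode = "in")

# Keep only nodes that receive at least one arrow, largest first
owner_counts <- sort(in_deg[in_deg > 0], decreasing = TRUE)
owner_counts
```

Because the edge list is built from distinct creator and owner pairs, this counts creators per owner, not franchises; counting franchises would need the `franchise` column as well.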
And there we have it. We used `stringr`, `patchwork` and `ggraph` to look at revenue generated across a variety of categories like merchandise, video games, and more, while also learning something about the franchises themselves from the network graph we created! Fun stuff!