Media franchise powerhouses: Who's the biggest earner of them all?

This much-delayed blog post analyzes revenue across media franchises, using data provided by the Tidy Tuesday project.

This was a really fun analysis, mainly because the dataset has so few variables that it took some brainstorming to decide what to do with it. I also used stringr fairly frequently to parse and manipulate the character data in the dataset.

As always, I load in my preferred packages, after which I briefly glimpse() the data.

library(tidyverse) #for a streamlined analysis
library(patchwork) #for combined plots
library(igraph) #for network graphs
library(ggraph) #for plotting network graphs using the grammar of graphics
library(RColorBrewer) #for colour palettes

theme_set(theme_light()) #preferred theme choice

media_franchises <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-02/media_franchises.csv")  #reading in the data

media_franchises %>%
  glimpse()
## Rows: 321
## Columns: 7
## $ franchise        <chr> "A Song of Ice and Fire /  Game of Thrones", "A Song …
## $ revenue_category <chr> "Book sales", "Box Office", "Home Video/Entertainment…
## $ revenue          <dbl> 0.900, 0.001, 0.280, 4.000, 0.132, 0.760, 1.000, 0.50…
## $ year_created     <dbl> 1996, 1996, 1996, 1996, 1996, 1992, 1992, 1992, 1992,…
## $ original_media   <chr> "Novel", "Novel", "Novel", "Novel", "Novel", "Animate…
## $ creators         <chr> "George R. R. Martin", "George R. R. Martin", "George…
## $ owners           <chr> "Random House WarnerMedia (AT&T)", "Random House Warn…
media_franchises %>% 
  head()
## # A tibble: 6 × 7
##   franchise revenue_category revenue year_created original_media creators owners
##   <chr>     <chr>              <dbl>        <dbl> <chr>          <chr>    <chr> 
## 1 A Song o… Book sales         0.9           1996 Novel          George … Rando…
## 2 A Song o… Box Office         0.001         1996 Novel          George … Rando…
## 3 A Song o… Home Video/Ente…   0.28          1996 Novel          George … Rando…
## 4 A Song o… TV                 4             1996 Novel          George … Rando…
## 5 A Song o… Video Games/Gam…   0.132         1996 Novel          George … Rando…
## 6 Aladdin   Box Office         0.76          1992 Animated film  Walt Di… The W…
media_franchises_processed <- media_franchises %>% 
  separate_rows(owners, sep = "\\) ") #separating multiple owners into their own rows; the split pattern consumes the ")" it matches

I run a few count() commands to get a better look at the categorical variables in the dataset, since I have a hunch that they are what I will mainly be looking at.

media_franchises_processed %>% 
  count(revenue_category, sort = TRUE)
## # A tibble: 8 × 2
##   revenue_category                    n
##   <chr>                           <int>
## 1 Box Office                        104
## 2 Merchandise, Licensing & Retail   101
## 3 Home Video/Entertainment           91
## 4 Video Games/Games                  73
## 5 Comic or Manga                     46
## 6 Music                              19
## 7 TV                                 16
## 8 Book sales                         11
media_franchises_processed %>% 
  count(owners, sort = TRUE)
## # A tibble: 97 × 2
##    owners                            n
##    <chr>                         <int>
##  1 Shueisha (Hitotsubashi Group     25
##  2 The Walt Disney Company          25
##  3 Shueisha (Hitotsubashi Group)    16
##  4 (manga                           13
##  5 DC Entertainment (AT&T)          13
##  6 (films)                          11
##  7 (franchise                       10
##  8 Square Enix                      10
##  9 Pierrot                           9
## 10 Bandai Namco (games)              8
## # ℹ 87 more rows
media_franchises_processed %>% 
  count(creators, owners, sort = TRUE)
## # A tibble: 132 × 3
##    creators                                          owners                    n
##    <chr>                                             <chr>                 <int>
##  1 Stan Lee Steve Ditko                              (franchise                7
##  2 Stan Lee Steve Ditko                              Marvel Entertainment…     7
##  3 Stan Lee Steve Ditko                              Sony (films)              7
##  4 Akira Toriyama                                    (manga                    6
##  5 Akira Toriyama                                    Bandai Namco (games)      6
##  6 Akira Toriyama                                    Bird Studio  Shueish…     6
##  7 Akira Toriyama                                    Toei Animation (anime     6
##  8 George Lucas                                      Lucasfilm (The Walt …     6
##  9 Hideaki Anno Gainax Tatsunoko Production          Khara[dc][279][280]       6
## 10 Hironobu Sakaguchi Hiromichi Tanaka Nasir Gebelli Square Enix               6
## # ℹ 122 more rows
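
The counts above contain fragments such as "(manga" and both "Shueisha (Hitotsubashi Group" and "Shueisha (Hitotsubashi Group)", because the split pattern `"\\) "` removes the closing parenthesis along with the space. Below is a minimal sketch of one possible alternative, not what the analysis above uses: a lookbehind splits on the space only. The toy tibble and its owner names are made up for illustration, and the sketch assumes that `separate_rows()` hands the pattern to a regex engine with lookbehind support.

```r
library(tidyr)

# Made-up toy data, standing in for the real `owners` column
toy_owners <- tibble::tibble(
  franchise = "Toy franchise",
  owners    = "Studio A (Parent Co) Studio B (films)"
)

# Split used in this post: the ") " match is consumed, so the first
# piece loses its closing parenthesis
original_split <- separate_rows(toy_owners, owners, sep = "\\) ")

# Alternative: "(?<=\\)) " matches a space that is preceded by ")",
# so only the space is consumed and the parenthesis stays in place
lookbehind_split <- separate_rows(toy_owners, owners, sep = "(?<=\\)) ")

original_split$owners
lookbehind_split$owners
```

With the second pattern, spellings that differ only by a missing ")" would collapse into a single owner in count(). Standalone pieces such as "(manga)" would still need separate handling.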

Now comes the exciting part: plotting!

I don’t do anything I haven’t done before here, but what makes it refreshing is the use of the patchwork package that lets me create a truly beautiful combined plot, in an extremely readable syntax!

To use patchwork, I store my revenue plots in three separate variables.

#Boxplots of revenue across categories
p1 <- media_franchises_processed %>% 
  mutate(revenue_category = fct_reorder(revenue_category, revenue, median)) %>% 
  ggplot(aes(revenue_category, revenue)) +
  geom_boxplot(aes(fill = revenue_category)) + 
  coord_flip() +
  labs(x = "Revenue category",
       y = "Revenue",
       title = "Total revenue across categories",
       subtitle = "In billions (USD)") +
  guides(fill = "none") #hiding the redundant fill legend
#Revenue timelines, split across categories
p2 <- media_franchises_processed %>% 
  ggplot(aes(year_created, revenue)) +
  geom_smooth(method = "lm", aes(color = revenue_category)) +
  facet_wrap(~revenue_category, ncol = 2) + #facetting across categories
  guides(color = "none") +
  labs(title = "Revenue trends over the\nyears, per category",
       subtitle = "Generally consistent,\ndrop in merchandise revenue",
       x = "Year created",
       y = "Revenue (in billion USD)") +
  theme(axis.text.x = element_text(angle = 45)) #for readability
#Storing the names of the highest earning franchises
topFranchises <- media_franchises_processed %>% 
  group_by(franchise) %>% 
  summarise(totalRev=sum(revenue)) %>% 
  arrange(desc(totalRev)) %>% 
  head(8) %>% #for readability
  pull(franchise)

#Creating a custom colour palette
custColors <- colorRampPalette(brewer.pal(8,"Set1"))(10)

#Stacked bar plot of highest earning franchises
p3 <- media_franchises_processed %>% 
  filter(franchise %in% topFranchises) %>% 
  mutate(franchise = fct_reorder(franchise, revenue, sum)) %>% 
  ggplot(aes(franchise, revenue)) +
  geom_col(aes(fill = revenue_category)) +
  coord_flip() +
  scale_fill_manual(values = custColors) +
  labs(title = "Highest earning franchises",
       subtitle = "Sorted via total revenue",
       x = "Franchise name",
       y = "Revenue",
       caption = "In billions of dollars",
       fill = "Revenue category") +
  theme(legend.position = "bottom")

(p1/p3)|p2 #patchwork!
## `geom_smooth()` using formula = 'y ~ x'

This is where the beauty of `patchwork` kicks in: `|` places plots side by side, and `/` stacks them vertically. Brilliant and simple. For more functionality, be sure to check out the official [repository](https://github.com/thomasp85/patchwork)!
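
To make the two operators concrete, here is a small self-contained sketch on the built-in `mtcars` data. The three toy plots and their names (`top_left`, `bottom_left`, `right`) are placeholders of my own, not the revenue plots from this post. `plot_layout()` and `plot_annotation()` are optional patchwork helpers added here only to show how a layout can be tuned.

```r
library(ggplot2)
library(patchwork)

# Three throwaway plots on a built-in dataset
top_left    <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()
bottom_left <- ggplot(mtcars, aes(factor(gear))) + geom_bar()
right       <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# `/` stacks the first two plots, `|` puts that stack next to the third
combined <- ((top_left / bottom_left) | right) +
  plot_layout(widths = c(2, 1)) +           # left column twice as wide
  plot_annotation(title = "Toy layout")     # one title for the whole figure

combined
```

The parentheses matter: they group the stacked pair first, so the layout has the same shape as `(p1/p3)|p2`.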

Looking a little closer, we can see that, on average, books generate the most revenue for franchises, although merchandise is very close behind.
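
One caveat about the revenue figures: `separate_rows()` repeats each revenue row once per owner (the category counts earlier add up to 461 rows, against 321 in the raw data), so sums over `media_franchises_processed` count franchises with several owners more than once. The sketch below uses made-up toy rows to show the effect and one possible guard. It assumes each franchise has a single row per revenue category in the raw data. Summing over the unsplit `media_franchises` would be the simpler route.

```r
library(dplyr)

# Made-up rows shaped like media_franchises_processed:
# Franchise A has two owners, so each of its revenue rows appears twice
toy <- tibble::tribble(
  ~franchise,    ~revenue_category, ~revenue, ~owners,
  "Franchise A", "Box Office",      2.0,      "Owner X",
  "Franchise A", "Box Office",      2.0,      "Owner Y",
  "Franchise A", "Merchandise",     1.0,      "Owner X",
  "Franchise A", "Merchandise",     1.0,      "Owner Y",
  "Franchise B", "Box Office",      2.5,      "Owner Z"
)

# Naive total: Franchise A is counted once per owner
naive_totals <- toy %>%
  group_by(franchise) %>%
  summarise(total_rev = sum(revenue))

# Guarded total: drop the per-owner repeats before summing
deduped_totals <- toy %>%
  distinct(franchise, revenue_category, revenue) %>%
  group_by(franchise) %>%
  summarise(total_rev = sum(revenue))

naive_totals
deduped_totals
```

In this toy case the naive totals rank Franchise A first (6 versus 2.5), while the deduplicated totals rank Franchise B first (2.5 versus 3 is reversed to 3 versus 2.5 only for A's true value of 3), so it is worth checking whether the ranking in the bar plot changes in the same way.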

Now comes the kicker: I’ve always wanted to know how many franchises a media powerhouse like, say, The Walt Disney Company owns, since it owns a lot of today's big-name franchises (the MCU and Star Wars come to mind). I visualize the relationships between the original creators and their present-day owners using the igraph and ggraph packages, as follows:

set.seed(100) #for reproducibility

#Defining an arrow
a <- grid::arrow(type = "closed", length = unit(.15, "inches"), angle = 15)

media_franchises_processed %>%
  count(creators, owners, sort = TRUE) %>%
  filter(!str_detect(owners, "[\\(\\)]")) %>% #dropping owner entries that contain a parenthesis
  filter(n > 2) %>%
  graph_from_data_frame() %>% #creating a network graph from the dataframe
  ggraph(layout = 'fr') + #plotting it using the grammar of graphics
  geom_edge_link(aes(edge_colour = n),
                 arrow = a,
                 end_cap = circle(.07, 'inches')) +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE, size = 2) +
  scale_edge_color_distiller(palette = "YlOrRd", direction = 1) +
  labs(title = "Relationships between creators and owners",
       subtitle = "The Walt Disney Company seems to own quite a few creations",
       caption = "Arrows point from creator to owner") +
  theme_void() +
  theme(legend.position = "bottom", 
        legend.box = "vertical")
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

A lot of names stick out to me. We can see arrows from Masashi Kishimoto (creator and writer of Naruto) and Tite Kubo (creator and writer of Bleach) pointing towards Studio Pierrot, which makes sense, as this studio animated the shows Naruto and Bleach, two shows with an absolutely massive fan following. We can also see arrows from Pixar to The Walt Disney Company, for example.
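
To put a number on "quite a few", the same edge list can be queried directly instead of read off the plot. The sketch below uses a made-up edge list with placeholder names. `graph_from_data_frame()` treats the first two columns as the "from" and "to" vertices and keeps any further columns (here `n`) as edge attributes, and the in-degree of a vertex is the number of arrows pointing at it. Note that this counts distinct creator entries per owner, which is only a proxy for the number of franchises.

```r
library(igraph)

# Made-up edge list in the same shape as count(creators, owners)
edges <- data.frame(
  creators = c("Creator A", "Creator B", "Creator C"),
  owners   = c("Owner X",   "Owner X",   "Owner Y"),
  n        = c(5, 3, 4)
)

# First two columns become from/to; `n` is stored as an edge attribute
g <- igraph::graph_from_data_frame(edges, directed = TRUE)

# In-degree = number of incoming arrows, i.e. creators linked to each owner
in_degree    <- igraph::degree(g, mode = "in")
owner_counts <- sort(in_degree[in_degree > 0], decreasing = TRUE)

owner_counts
```

Here "Owner X" has an in-degree of 2 and "Owner Y" an in-degree of 1. Run on the real edge list, the same call would rank the owners by the number of creators feeding into them.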

And there we have it. We used stringr, patchwork and ggraph to look at the revenue generated across categories such as merchandise and video games, and the network graph gave us some information about who owns the franchises themselves. Fun stuff!