The Economist’s Errors and Credit Where Credit is Due

The Economist is serious about their use of data visualization and they have occasionally owned up to errors in their visualizations. They can be deceptive, uninformative, confusing, excessively busy, and present a host of other barriers to clean communication. Their blog post on their errors is great.

I have drawn the following example from a #tidyTuesday earlier this year that explores this. Here is the link to the setup page.

Your Task

Your task comes in two parts. The first one is to recreate any of the second through sixth visualizations: Brexit, Dogs, EU Balance, Pensions, or Trade. You should have a look at the original and note the pathologies, but reproduce a nice visual (either their reconceptualization or something useful). One thing to note is that three axis graphics are almost always bad.

A Template

For example, the first one; the example is bad because it cuts the axis.

My Re-creation [in Labour red]

I built the whole thing in equisse with two exceptions. First, reorder here is used to arrange the names in the order that I want to plot them. By default they are alphabetical. Second, I wanted the label geometry which is not an option in esquisse but one treats it like a point with the text as the plotting character. In the aesthetic, it is a label; geom_text does the same thing without a box but it harder to distinguish in this case. If you want to play, the cheat sheet is here.

Code

library(tidyverse) # call the tidyverse for %>% and ggplot
library(ggthemes) # Use the economist theme
library(ggiraph)
corbyn <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/corbyn.csv")
# reorder will allow me to sort the names by the values.
# Uncomment to see: # with(corbyn, reorder(political_group,avg_facebook_likes))
# geom_label will allow me to put the numbers on the graph by specifying a label.
CFLplot1 <- ggplot(corbyn, aes(x=reorder(political_group,avg_facebook_likes), y=avg_facebook_likes, label=avg_facebook_likes)) + 
  geom_bar(stat="identity", fill="#DC241f") + 
  geom_label(color="#DC241f") +
  labs(x="Associate of the Labour Party", y="Average Facebook Likes") +
  theme_economist() +
  coord_flip()
CFLplot1

Code

library(ggrepel)
library(tidyverse) # call the tidyverse for %>% and ggplot
library(ggthemes) # Use the economist theme
corbyn <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/corbyn.csv")
# reorder will allow me to sort the names by the values.
# Uncomment to see: # with(corbyn, reorder(political_group,avg_facebook_likes))
# geom_label will allow me to put the numbers on the graph by specifying a label.
CFLplot2 <- ggplot(corbyn, aes(x=reorder(political_group,avg_facebook_likes), y=avg_facebook_likes, label=avg_facebook_likes)) + 
  geom_segment(aes(yend=0, xend=political_group), size=4, color="#DC241f") + 
  geom_label_repel(color="#DC241f", size=4) +
  labs(x="Associate of the Labour Party", y="Average Facebook Likes") +
  theme_economist() +
  coord_flip()
CFLplot2

Code

p1.Int <- ggplot(corbyn, aes(x=reorder(political_group,avg_facebook_likes), data_id=political_group, y=avg_facebook_likes, label=avg_facebook_likes, tooltip = paste0(political_group,"<br>",avg_facebook_likes, sep=""))) + 
  geom_bar_interactive(stat="identity", fill="#DC241f") + 
  geom_label(fill="#DC241f", color="#FFFFFF", size=3.5) +
  labs(x="Associate of the Labour Party", y="Average Facebook Likes") +
  hrbrthemes::theme_ipsum_rc() +
  coord_flip()
p2.Int <- ggplot(corbyn, aes(x=reorder(political_group,avg_facebook_likes), y=avg_facebook_likes, label=avg_facebook_likes, data_id=political_group, tooltip = paste0(political_group,"<br>",avg_facebook_likes, sep=""))) + 
  geom_segment_interactive(aes(yend=0, xend=political_group), size=2, color="#DC241f") + 
  geom_point(size=10, color="#DC241f", alpha=0.8) +
  geom_text(color="#FFFFFF", size=3.5, fontface=2) +
  labs(x="Associate of the Labour Party", y="Average Facebook Likes") +
  hrbrthemes::theme_ipsum_rc() +
  coord_flip()
library(patchwork)
GIp1 <- p1.Int / p2.Int
girafe(ggobj=GIp1)

Your Turn

In the next code chunk, you will need to cut and paste the code for grabbing your particular choice of data.

Now I am going to grab the Brexit data.

Code

brexit <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/brexit.csv")
summary(brexit)

     date           percent_responding_right percent_responding_wrong
 Length:85          Min.   :40.00            Min.   :41.00           
 Class :character   1st Qu.:42.00            1st Qu.:43.00           
 Mode  :character   Median :44.00            Median :45.00           
                    Mean   :43.71            Mean   :44.51           
                    3rd Qu.:45.00            3rd Qu.:45.00           
                    Max.   :47.00            Max.   :48.00

Code

# The variable names are too long and terrible.  Let me change them.
names(brexit) <- c("Date","Right","Wrong")
# The dates are also not read as proper dates.  Let me fix that.  It can also be fixed by coercion in esquisse
brexit$Date <- as.Date(brexit$Date, "%d/%m/%y")
brexit

The data are structured side by side and that makes it hard for R [or anything] to draw axes because it needs to know the boundaries for the axes. I need to stack the rights and wrongs into one column with an associated label in another column for Right and Wrong. That is reshaping data; pivoting longer and wider in the language of tidyr.

Code

# My first step is to stack the percent reporting on top of each other to make color and fill work properly.  I am going to take wider data [two columns of data] and stack them on top of each other.  What am I going to name this new stacked column?  "direction" and which columns do I wish to stack?  c(Right,Wrong)
My.Brexit <- brexit %>% pivot_longer(.,names_to = "direction",cols = c(Right,Wrong))
My.Brexit

Code

brexit %>% ggplot(aes(x=Date, y=Right)) + geom_smooth(color="green", size=1) + geom_point(color = "green") + geom_point(aes(x=Date, y=Wrong), color="red") + geom_smooth(aes(x=Date, y=Wrong), color="red") + labs(y="% Saying Right [Green] or Wrong [Red] to UK Leaving EU")

Your Graphic

Code

# The only addition beyond esquisse here is the addition of geom_smooth that gives me the smmoth intervals and annotate to write the sides in the graph instead of the legend..
ggp <- My.Brexit %>% ggplot(aes(x = Date, y = value, color=direction, fill = direction)) +
  geom_smooth(size = 1L) + 
  geom_point(size=1.5)  +
  scale_color_viridis_d(name="Direction", guide = FALSE) +
  scale_fill_viridis_d(name="Direction", guide = FALSE) +
  theme_economist_white() + labs(title="Bremorse...", subtitle = "In hindsight, do you think Britain was Right or Wrong to vote to leave the EU?", x="Date", y = "Percent Responding ...") + # add two annotations to the plot instead of a scale
  annotate("text" , x=as.Date("2018-01-01"), y=47, label="Wrong", size=6, color = "#fde725ff") + 
  annotate("text" , x=as.Date("2018-01-01"), y=41, label="Right", size=6, color = "#440154ff")
ggp

That’s pretty cool.

Pensions

Code

library(ggrepel) # I need to repel the labels.
pensions <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/pensions.csv")
pensions$scaled.65 <- scale(pensions$pop_65_percent)
pensions$scaled.GSgdp <- scale(pensions$gov_spend_percent_gdp)
pensions$prod.scaled <- pensions$scaled.65*pensions$scaled.GSgdp

Now a picture. I want it simple but complete. At first, I used text but it is not easy to read.

Code

pensions <- pensions %>% 
  mutate(Mean65 = mean(pop_65_percent), MeanSpend = mean(gov_spend_percent_gdp)) 
pensions %>% 
  ggplot(., aes(x=pop_65_percent, y=gov_spend_percent_gdp, label=country, color=prod.scaled))  + 
  geom_point(alpha=0.1) + 
  geom_hline(yintercept = mean(pensions$MeanSpend)) +
  geom_vline(xintercept = mean(pensions$Mean65)) + 
  geom_text(size=3, fill="white") + 
  labs(x="Percent of Population over 65", y="Government Pension Spending/GDP") + 
  theme_economist_white() + 
  labs(title="Population over 65 and Pension Spending per GDP", caption = "Coloring by Scaled Covariance") +  
  scale_color_viridis_c(guide=FALSE)

**ggrepel* allows labels that arrow to plot locations and keeps the labels from overlapping. I called the library earlier, now I will use it’s geom called geom_label_repel.

Code

ggplot(pensions) + 
  aes(x=pop_65_percent, y=gov_spend_percent_gdp, label=country, color=prod.scaled)  +
  geom_label_repel(size=3, fill="white") + 
  labs(x="Percent of Population over 65", y="Government Pension Spending/GDP") + 
  geom_point() + 
  geom_hline(yintercept = mean(pensions$MeanSpend)) +
  geom_vline(xintercept = mean(pensions$Mean65)) + 
  theme_economist_white() + 
  labs(title="Population over 65 and Pension Spending per GDP", caption = "Coloring by Scaled Covariance") + 
  scale_color_viridis_c(guide=FALSE)

That seems to get it. What’s up with Brazil?

dogs

Code

dogs <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/dogs.csv")

esquisse

Code

p <- ggplot(dogs) + aes(x=avg_neck, y=avg_weight, size=year) + geom_point() + theme_economist_white() + labs(x="Neck Size", y="Weight", title="The Incredible Shrinking Dog?")
p

This is a perfect case for a text label. If they are decent sized, they overlap. I used geom_text which requires a label [label = year] in the aesthetic mapping.

Code

p <- ggplot(dogs) + aes(x=avg_neck, y=avg_weight, label=year) + geom_text(size=3, color="white") + theme_dark() + labs(x="Neck Size", y="Weight", title="The Incredible Shrinking Dog?") 
p

Code

library(emojifont)
p <- ggplot(dogs) + aes(x=avg_neck, y=avg_weight, size=-year, label = emoji('dog')) + geom_text() + theme_minimal() + labs(x="Neck Size", y="Weight", title="The Incredible Shrinking Dog?", caption = "Sized by Year -- decreasing...  Shrinking Dogs?") + guides(size=FALSE)
p

EU Balances

Code

eu_balance <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/eu_balance.csv")
skimr::skim(eu_balance)

Data summary
Name	eu_balance
Number of rows	266
Number of columns	4
_______________________
Column type frequency:
character	2
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
country	0	1	5	11	0	19	0
account_type	0	1	6	7	0	2	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	0	1	2012.00	2.00	2009	2010.0	2012.00	2014.00	2015	▇▃▃▃▇
value	0	1	-5239.31	43197.02	-138934	-11967.7	-1527.05	185.25	271402	▁▇▁▁▁

Code

table(eu_balance$account_type)


 budget current 
    133     133

Code

table(eu_balance$country)


    Austria     Belgium      Cyprus     Estonia     Finland      France 
         14          14          14          14          14          14 
    Germany      Greece     Ireland       Italy      Latvia   Lithuania 
         14          14          14          14          14          14 
 Luxembourg       Malta Netherlands    Portugal    Slovakia    Slovenia 
         14          14          14          14          14          14 
      Spain 
         14

Code

ggplot(eu_balance) +
 aes(x = year, y = value, fill = account_type, color=account_type) +
 geom_line(size = 1L, alpha=0.3) + geom_hline(yintercept = 0, size=1, color="black") +
 scale_fill_hue() +
 theme_minimal() +
 facet_wrap(vars(country), scales = "free_y")

trade

gridExtra allows us to combine multiple plots into one. I build two plots and grid them together in the last line.

Code

library(gridExtra)
trade <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/trade.csv")
trade

Code

p1 <- ggplot(trade, aes(x=year, y=trade_deficit/1000000000)) + geom_area(fill="red") + theme_economist() + labs(y = "US Trade Deficit in Billions of US $")
p2 <- ggplot(trade, aes(x=year, y=manufacture_employment/1000000)) + geom_line() + theme_economist() + labs(y = "US Manufacturing Employment in millions")
grid.arrange(p1,p2)

That is fairly simple and it works. The trick here was to use a grid to create two plots and stack them.

The Last One: No Solution Provided

The authors of the post give you good advice on how to solve this. I contend that there is a great plot to generate using esquisse though it may not be just one plot. Have a go. If you wish to, you can peruse the appropriate timelines for #tidyTuesday and see what people came up with though in many cases, this will be apparent so use appropriate citations; it is easy to do with the examples here in this Markdown.

Code

women_research <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-16/women_research.csv")
women_research

Code

# There's a spelling error I want to fix, select the rows and columns to fix and reassign the value
women_research[women_research$field=="Women inventores","field"] <- "Women Inventors" 
# The label for CS/Maths is too long, also
women_research[women_research$field=="Computer science, maths","field"] <- "CS/Math"

The data consist of three columns and are a long pivot table. The pivots are country and field with a quantitative indicator of the percent of women in the field. It is a long pivot table characterised by country and field combinations in the rows. To show the data cleanly, let me spread out the fields by widening the pivot table by fields.

Code

# I find it far more intuitive to use pivot_wider but this will always work and is safer on versions of tidyr
# women_research %>% pivot_wider(field, percent_women) 
women_research %>% spread(field, percent_women) %>% knitr::kable()

country	CS/Math	Engineering	Health sciences	Physical sciences	Women Inventors
Australia	0.24	0.25	0.50	0.23	0.12
Brazil	0.24	0.32	0.57	0.33	0.19
Canada	0.22	0.22	0.49	0.21	0.13
Chile	0.16	0.22	0.43	0.23	0.19
Denmark	0.18	0.23	0.47	0.22	0.13
EU28	0.22	0.25	0.48	0.25	0.12
France	0.22	0.25	0.48	0.24	0.17
Japan	0.11	0.11	0.24	0.11	0.08
Mexico	0.22	0.26	0.46	0.25	0.18
Portugal	0.27	0.36	0.57	0.37	0.26
United Kingdom	0.21	0.22	0.45	0.21	0.12
United States	0.22	0.22	0.46	0.20	0.14

Visualize it…

The first way is to use esquisse and a simple grid plot for two categorical axes and add details to it in the boxes.

Code

ggp1 <- ggplot(women_research) +
 aes(x = field, y = country, fill = percent_women) +
 geom_tile(size = 1L) +
 scale_fill_viridis_c(option = "viridis") +
 labs(x = "Field or Occupation", y = "Country", title = "Publication and Invention by Women Around the World", subtitle = "Percent of Publications or Patents by Women", caption = "source: Economist Visualization Errors", fill = "Percent Women") +
 theme_minimal(base_size = 10)
ggp1 + theme(plot.background = element_rect(colour = "whitesmoke"))

We can also use equisse to build a faceted barplot by dragging the field to facet. X is country, y is percent women, fill is country with facets determined by fields.

Code

women_research %>% ggplot(.) +
 aes(x = country, fill = country, weight = percent_women) +
 geom_bar() +
 scale_fill_hue() +
 coord_flip() +
 theme_minimal(base_size = 10) +
 theme(legend.position = "none") +
  labs(y = "Percent of Publications/Patents", x = "Country", title = "Publication and Invention by Women Around the World", subtitle = "Percent of Publications or Patents by Women", caption = "source: Economist Visualization Errors") +
 facet_wrap(vars(field))

Or reverse the above use of field and country where field is the X and the fill, y is still percent_women and facets are now countries. This works quite well if only because there are 12 entities which grid into a 3 x 4 nicely.

Code

women_research %>% ggplot(.) +
 aes(x = field, fill = field, weight = percent_women) +
 geom_bar() +
 scale_fill_hue() +
 coord_flip() +
 theme_minimal(base_size = 10) +
 theme(legend.position = "none") +
  labs(y = "Percent of Publications/Patents", x = "Field", title = "Publication and Invention by Women Around the World", subtitle = "Percent of Publications or Patents by Women", caption = "source: Economist Visualization Errors") +
 facet_wrap(~country)

To show something very cool that R can do, I have added one more bit that is turned off. In the following code chunk, I use eval=FALSE to avoid evaluating the chunk in R. If the FALSE is changed to TRUE, R will need ggiraph via install.packages("ggiraph").

Code

library(ggiraph)
women_research <- women_research %>% group_by(country) %>% mutate(Avg.Women.Percent = mean(percent_women)) %>% ungroup()
women_research$PercentWomen <- as.character(round(women_research$percent_women, 3))
p1 <- ggplot(women_research) +
 aes(x = reorder(country,Avg.Women.Percent), fill = country, weight = percent_women, tooltip = PercentWomen, data_id = country) +
 geom_bar_interactive() +
 scale_fill_viridis_d(option = "cividis") +
 guides(fill=FALSE) +
 coord_flip() +
 theme_minimal() + 
  theme(axis.text.y = element_text(angle = 45, hjust = 1, size=6)) +
  labs(y = "Percent of Publications/Patents", x = "Country/Group", fill="Country/Grouping", title = "Publication and Invention by Women Around the World", subtitle = "Percent of Publications or Patents by Women", caption = "source: Economist Visualization Errors") +
 facet_wrap(~field)
girafe(code = print(p1))

Code

library(patchwork)
ggp1.changed <- ggplot(women_research) +
 aes(x = field, y = country, fill = percent_women, , data_id = country, tooltip=percent_women) +
 geom_tile_interactive(size = 1L) +
 scale_fill_viridis_c(option="cividis") +
 labs(x = "Field or Occupation", y = "Country", fill = "% Women") +
 theme_minimal(base_size = 10)
ggp1.changed <- ggp1.changed + theme(plot.background = element_rect(colour = "whitesmoke"))
comb1 <- p1 / ggp1.changed
girafe(code = print(comb1))

A Brief Crosstalk

Getting this to work wasn’t hard. Getting it to style properly in interaction with the bootstrap frame was a pain. I finally just went with a blank column in two places to make sure that names don’t get cut off.

Code

library(htmltools)
library(crosstalk)
library(DT)
library(d3scatter)
# Load the data
Bonds <- read.csv(url("https://github.com/robertwwalker/DADMStuff/raw/master/BondFunds.csv"), row.names=1)
# Turn characters into factors
Bonds <- Bonds %>% mutate(Fees = as.factor(Fees), Risk = as.factor(Risk), Type = as.factor(Type))
# Create shared data object
shared_bonds <- SharedData$new(Bonds)
# Generate the bootsrap columns as three rows, page is 12 wide
bscols(widths=c(12,12,12),
  # A d3 visualization
  d3scatter(shared_bonds, ~Expense.Ratio, ~Return.2009, ~Risk),
  # The filter checkboxes
  list(filter_checkbox("Fees", "Fees", shared_bonds, ~Fees, inline=TRUE),
       filter_checkbox("Type", "Type", shared_bonds, ~Type, inline=TRUE),
       filter_checkbox("Fees", "Risk", shared_bonds, ~Risk, inline=TRUE)),
  # The datatable
  datatable(shared_bonds))

Without the bootstrap formatting of the webpage, it works much better.

The Fix for the cutoff parts

The trick to getting this to display properly was a blank column with width=1.

Code

shared_bonds2 <- SharedData$new(Bonds)
# This was adjusted from above
bscols(widths=c(12,1,11,1,11),
  d3scatter(shared_bonds2, ~Expense.Ratio, ~Return.2009, ~Risk),
  "",  # First blank column
  list(filter_checkbox("Fees", "Fees", shared_bonds2, ~Fees, inline=TRUE),
       filter_checkbox("Type", "Type", shared_bonds2, ~Type, inline=TRUE),
       filter_checkbox("Fees", "Risk", shared_bonds2, ~Risk, inline=TRUE)),
  "", # Second blank column
  datatable(shared_bonds2))

References

Code

knitr::write_bib(names(sessionInfo()$otherPkgs), file="bibliography.bib")

References

Arnold, Jeffrey B. 2021. Ggthemes: Extra Themes, Scales and Geoms for Ggplot2. https://github.com/jrnold/ggthemes.

Auguie, Baptiste. 2017. gridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.

Cheng, Joe. 2023. D3scatter: Demo of D3 Scatter Plot; Testbed for Crosstalk Library.

Cheng, Joe, and Carson Sievert. 2021. Crosstalk: Inter-Widget Interactivity for HTML Widgets. https://rstudio.github.io/crosstalk/.

Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2022. Htmltools: Tools for HTML. https://CRAN.R-project.org/package=htmltools.

Gohel, David, and Panagiotis Skintzos. 2023. Ggiraph: Make Ggplot2 Graphics Interactive. https://davidgohel.github.io/ggiraph/.

Müller, Kirill, and Hadley Wickham. 2022. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.

Pedersen, Thomas Lin. 2022. Patchwork: The Composer of Plots. https://CRAN.R-project.org/package=patchwork.

Slowikowski, Kamil. 2023. Ggrepel: Automatically Position Non-Overlapping Text Labels with Ggplot2. https://github.com/slowkow/ggrepel.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

———. 2022a. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.

———. 2022b. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.

———. 2023. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2023. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.

Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, and Lionel Henry. 2023. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.

Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2023. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.

Xie, Yihui, Joe Cheng, and Xianying Tan. 2023. DT: A Wrapper of the JavaScript Library DataTables. https://github.com/rstudio/DT.

Yu, Guangchuang. 2021. Emojifont: Emoji and Font Awesome in Graphics. https://github.com/GuangchuangYu/emojifont.