2.3 Colour, size, shape and other aesthetic attributes

To add additional variables to a plot, we can use other aesthetics like colour, shape, and size.

💡 Extra: ggplot2 accepts both American and British spellings, so color and colour are interchangeable.

These work in the same way as the x and y aesthetics, and are added into the call to aes():

  • aes(displ, hwy, colour = class)
  • aes(displ, hwy, shape = drv)
  • aes(displ, hwy, size = cyl)
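As a minimal sketch, each of the mappings listed above can be dropped into a complete plot. The object names (p_colour, p_shape, p_size) are illustrative only and do not come from the original text.

```r
library(ggplot2)

# Each plot adds one extra variable from mpg through a different aesthetic.
p_colour <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()
p_shape  <- ggplot(mpg, aes(displ, hwy, shape = drv)) + geom_point()
p_size   <- ggplot(mpg, aes(displ, hwy, size = cyl)) + geom_point()

# Print any of them to draw it.
p_shape
```

Storing the plots in variables makes it easy to draw them one at a time and compare how each aesthetic reads.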

ggplot2 takes care of the details of converting data (e.g., ‘f’, ‘r’, ‘4’) into aesthetics (e.g., ‘red’, ‘yellow’, ‘green’) with a scale.

📘 Note: ‘f’, ‘r’ and ‘4’ are all character values. ‘4’ looks like a number, but it is stored as text, not as a numeric.

There is one scale for each aesthetic mapping in a plot. The scale is also responsible for creating a guide, an axis or legend, that allows you to read the plot, converting aesthetic values back into data values.
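To make the idea of one scale per aesthetic concrete, the sketch below writes out the scales that ggplot2 would otherwise add silently. It assumes a default session in which no custom colour options have been set; the names p_default and p_explicit are illustrative.

```r
library(ggplot2)

p_default <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point()

# The same plot with its three scales spelled out:
# one for x, one for y and one for colour.
p_explicit <- p_default +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_colour_discrete()

# Compare the colours that each version assigns to the points.
identical(layer_data(p_default)$colour, layer_data(p_explicit)$colour)
```

The x and y scales produce the axes, and the colour scale produces the legend.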

For now, we’ll stick with the default scales provided by ggplot2.

To learn more about the outlying points in the previous scatterplot, we could map the class variable to colour:

ggplot(mpg, aes(displ, hwy, colour = class)) + 
  geom_point()

This gives each class its own colour, so every point is coloured according to its class.

The legend allows us to read data values from the colour, showing us that the group of cars with unusually high fuel economy for their engine size are two seaters: cars with big engines, but lightweight bodies.
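What the legend shows can also be checked against the data directly. The sketch below is one way to do that; the cut-offs (displ > 5, hwy > 20) are assumptions chosen by eye from the scatterplot, not values given in the text, so the selection may include a few cars outside the group described above.

```r
library(ggplot2)

# Hypothetical cut-offs, chosen by eye: big engine, high highway mileage.
unusual <- subset(mpg, displ > 5 & hwy > 20)

# Count how many of the selected cars fall into each class.
table(unusual$class)

# Inspect the selected rows.
unusual[, c("manufacturer", "model", "displ", "hwy", "class")]
```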

If you want to set an aesthetic to a fixed value, without scaling it, do so in the individual layer outside of aes().

Compare the following two plots:

ggplot(mpg, aes(displ, hwy)) + 
  geom_point(aes(colour = "blue"))


ggplot(mpg, aes(displ, hwy)) + 
  geom_point(colour = "blue")


In the first plot, the value "blue" is treated as data: it is scaled to a pinkish colour, and a legend is added.

In the second plot, the points are given the R colour blue.
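The difference between the two plots can be seen in the data that each layer actually draws. This is a small sketch using layer_data() from ggplot2; the names p_mapped and p_set are illustrative.

```r
library(ggplot2)

p_mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = "blue"))   # "blue" is treated as a data value

p_set <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(colour = "blue")        # "blue" is used as the colour itself

# Colours actually drawn in each layer.
unique(layer_data(p_mapped)$colour)  # a colour picked by the default scale
unique(layer_data(p_set)$colour)     # the literal value "blue"
```

In the mapped version the string is passed through the colour scale, which is why it never reaches the plot as blue.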

This is an important technique and you’ll learn more about it in mapping for the values needed for colour and other aesthetics.

Different types of aesthetic attributes work better with different types of variables.

💡 Tip: colour and shape work well with categorical variables, while size works well for continuous variables.

Categorical vs continuous in the mpg dataset

Aesthetic            Works best for    Example mpg variables
colour (discrete)    Categorical       class, drv, cyl
shape                Categorical       drv, fl
facets               Categorical       class, year
colour (gradient)    Continuous        displ, hwy, cty
size                 Continuous        displ, hwy
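The two kinds of colour scale in the table can be compared side by side. In this sketch, wrapping cyl in factor() is an assumption about how one might use it as a categorical variable, since cyl is stored as a number in mpg; the object names are illustrative.

```r
library(ggplot2)

# cty is continuous, so colour is drawn as a gradient with a colour bar.
p_gradient <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point()

# cyl is stored as a number; factor() makes ggplot2 treat it as categorical,
# giving one distinct colour per cylinder count.
p_discrete <- ggplot(mpg, aes(displ, hwy, colour = factor(cyl))) +
  geom_point()

p_gradient
p_discrete
```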

The amount of data also makes a difference: if there is a lot of data it can be hard to distinguish different groups.

An alternative solution is to use faceting, as described next.
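As a brief preview, and only a sketch, faceting splits the same scatterplot into one panel per class instead of relying on colour to separate the groups. The name p_facets is illustrative.

```r
library(ggplot2)

# One small scatterplot per vehicle class, sharing the same axes.
p_facets <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~class)

p_facets
```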

It’s difficult to see the simultaneous relationships among colour, shape and size, so exercise restraint when using aesthetics.

Instead of trying to make one very complex plot that shows everything at once, see if you can create a series of simple plots that tell a story, leading the reader from ignorance to knowledge.

📌 Remember: When using aesthetics in a plot, less is usually more.

💡 🚗 Categorical variables (factor / discrete), 📈 Continuous variables (numeric).

2.3.1 Using labs() together with colors

ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point() +
  labs(
    title = "Engine Size vs Highway Fuel Efficiency",
    subtitle = "Colored by vehicle class",
    x = "Engine Displacement (litres)",
    y = "Highway (mpg)",
    color = "Vehicle Class",
    caption = "mpg dataset, ggplot2"
  )

2.3.2 Exercises

  1. Experiment with the colour, shape and size aesthetics.

    - What happens when you map them to continuous values?

    - What about categorical values?

    - What happens when you use more than one aesthetic in a plot?

  2. What happens if you map a continuous variable to shape?

    - Why?

    - What happens if you map trans to shape?

    - Why?

2.3.3 Small hint

library(ggpubr)  # theme_pubr() comes from the ggpubr package, not ggplot2

ggplot(mpg, aes(x = displ,
                y = hwy, 
                colour = class,     # categorical
                size = displ)) +    # continuous
  geom_point(alpha = 0.8) +
  scale_size_continuous(range = c(2, 8)) +
  labs(
    title = "Engine Size vs Highway Efficiency",
    subtitle = "Colour = vehicle class (categorical), Size = engine displacement (continuous)",
    x = "Engine displacement (L)",
    y = "Highway MPG",
    colour = "Vehicle Class",
    size = "Displ"
  ) +
  theme_pubr(border = FALSE)