3 — Group and code responses

code
group
themes
Assign responses to groups based on the occurrence of key words.
Published: January 3, 2024

Modified: January 9, 2024

Crayon colors

A box of 12 Crayola crayons.

Let’s start simple and assign crayons a color based on their descriptive names. Below is a table of 12 crayons and their names. We want to create a group for each of the primary colors - red, yellow, and blue - and an everything-else group labeled other.

We’ll begin by searching for the primary color words in the name column of each of the crayons. If a color word is detected, the crayon will be assigned to the matching group.

Load the data

library(tidyverse)
library(tidytext)


crayons <- read_csv('https://tidy-mn.github.io/qualitative-guide/posts/data/crayons.csv') 

crayons
# A tibble: 12 × 2
      id name            
   <dbl> <chr>           
 1     1 brick red       
 2     2 vibrant orange  
 3     3 warm yellow     
 4     4 slate green     
 5     5 indigo blue     
 6     6 grey black      
 7     7 plum purple     
 8     8 beige brown     
 9     9 denim blue      
10    10 incredible pink 
11    11 sap green       
12    12 cloudy off-white

Find color words

We’ll use case_when() and str_detect() to test if a crayon name contains a given primary color word. If it does, we’ll assign the crayon to that color. If none of the primary color words are detected, the crayon will be assigned to the group other.

crayons_groups <- crayons %>%
                  mutate(group = case_when(str_detect(name, "red") ~ "RED",
                                           str_detect(name, "yellow") ~ "YELLOW",
                                           str_detect(name, "blue") ~ "BLUE",
                                           .default = "other"))

crayons_groups %>% head(5)
# A tibble: 5 × 3
     id name           group 
  <dbl> <chr>          <chr> 
1     1 brick red      RED   
2     2 vibrant orange other 
3     3 warm yellow    YELLOW
4     4 slate green    other 
5     5 indigo blue    BLUE  
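
One thing to watch for: str_detect() matches substrings, not whole words, so the pattern "red" also matches the name incredible pink (inc-red-ible). The sketch below uses a small hand-built table (crayons_demo is a made-up name for illustration) and shows one possible fix, wrapping each word in the regex word boundary \\b.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical three-row sample; the names come from the crayons table above
crayons_demo <- tibble(
  id   = c(1, 10, 9),
  name = c("brick red", "incredible pink", "denim blue")
)

crayons_demo <- crayons_demo %>%
  mutate(
    # Plain pattern: matches "red" anywhere, including inside "incredible"
    substring_group = case_when(
      str_detect(name, "red")  ~ "RED",
      str_detect(name, "blue") ~ "BLUE",
      .default = "other"
    ),
    # \\b marks a word boundary, so only the standalone word "red" matches
    whole_word_group = case_when(
      str_detect(name, "\\bred\\b")  ~ "RED",
      str_detect(name, "\\bblue\\b") ~ "BLUE",
      .default = "other"
    )
  )

crayons_demo
```

Comparing the two new columns side by side is a quick way to spot names that were grouped only because of an accidental substring match.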

Joining groups and codes

The approach above works well for a small number of groups, but it can be cumbersome when you have lots of groups to assign. If we wanted to sort the crayons into many more color groups, a better approach would be to create a table to store our group names and the associated color word.

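Then we can join the groups table to the crayons table. The grouping works the same way, but it requires less code and will be much easier to update when we want to add new groups or change the words associated with a group. Because the join compares whole words, it also avoids accidental substring matches, such as the red inside incredible.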

Here’s the same example as above using the table joining approach.

Create the group table

First, create a table of our groups with two columns:

  • color_group: the group’s name
  • word: the word associated with the group

color_groups <- tibble(color_group = c('RED','YELLOW','BLUE'), 
                       word = c("red", "yellow", "blue")) 
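
Because the groups live in a table, adding a group or another word for an existing group only means adding rows. Here is a small sketch, assuming we also wanted a GREEN group and a second word for RED. The name color_groups_extended and the word "crimson" are illustrative only; "crimson" does not appear in the crayons data.

```r
library(dplyr)
library(tibble)

color_groups <- tibble(color_group = c("RED", "YELLOW", "BLUE"),
                       word = c("red", "yellow", "blue"))

# add_row() appends new group/word pairs; one group can own several words
color_groups_extended <- color_groups %>%
  add_row(color_group = "GREEN", word = "green") %>%
  add_row(color_group = "RED",   word = "crimson")

color_groups_extended
```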

Split up (unnest) the words in the crayon names for easy joining

crayons <- crayons %>%
           unnest_tokens(word, name, drop = FALSE)
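
To see what unnest_tokens() does, here is a minimal sketch on a two-row table (names_demo and words_demo are made-up names). Each name is split into one row per word, the words are lowercased by default, and drop = FALSE keeps the original name column alongside the new word column.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical two-row sample; "Warm Yellow" is capitalized on purpose
names_demo <- tibble(id = c(1, 3),
                     name = c("brick red", "Warm Yellow"))

# One output row per word; the id and name columns repeat for each word
words_demo <- names_demo %>%
  unnest_tokens(word, name, drop = FALSE)

words_demo
```

The lowercasing matters for the join that follows: the words in the group table are lowercase, so they line up with the unnested words regardless of how the original names were capitalized.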

Join groups with left_join

Now we can join our two tables using the word column in the crayons table, and the word column in the color_groups table. The function used to join the tables together is left_join. It has a by argument used to set the columns for linking the two tables together. In this case, the tables both share a column named word that we want to join by.

crayons_groups <- crayons %>%
                  left_join(color_groups, 
                           by = join_by(word == word))

crayons_groups %>% head()
# A tibble: 6 × 4
     id name           word    color_group
  <dbl> <chr>          <chr>   <chr>      
1     1 brick red      brick   <NA>       
2     1 brick red      red     RED        
3     2 vibrant orange vibrant <NA>       
4     2 vibrant orange orange  <NA>       
5     3 warm yellow    warm    <NA>       
6     3 warm yellow    yellow  YELLOW     

Summarize

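To reduce this long list of words down to a single row for each crayon, we can use the functions fill() and slice_head(). Within each crayon, fill() copies a matched color_group onto that crayon’s other rows, and slice_head() then keeps only the first row.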

crayons_groups <- crayons_groups %>%
                  group_by(id, name) %>%
                  fill(color_group, .direction = "updown") %>%
                  slice_head(n = 1) %>%
                  select(-word)

crayons_groups %>% head()
# A tibble: 6 × 3
# Groups:   id, name [6]
     id name           color_group
  <dbl> <chr>          <chr>      
1     1 brick red      RED        
2     2 vibrant orange <NA>       
3     3 warm yellow    YELLOW     
4     4 slate green    <NA>       
5     5 indigo blue    BLUE       
6     6 grey black     <NA>       
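
The printed result above is still grouped (note the "# Groups" line). As a sketch of the same steps on a tiny hand-built table (joined_demo and one_row_each are made-up names), the version below adds ungroup() so later steps operate on the whole table rather than crayon by crayon. Whether you need ungroup() depends on what you do next, so treat it as an optional extra.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for the joined table: one row per word
joined_demo <- tibble(
  id          = c(1, 1, 2, 2),
  name        = c("brick red", "brick red", "vibrant orange", "vibrant orange"),
  word        = c("brick", "red", "vibrant", "orange"),
  color_group = c(NA, "RED", NA, NA)
)

one_row_each <- joined_demo %>%
  group_by(id, name) %>%
  # Spread the matched group to every row of the same crayon
  fill(color_group, .direction = "updown") %>%
  # Keep the first row per crayon, which now carries the group (or NA)
  slice_head(n = 1) %>%
  ungroup() %>%
  select(-word)

one_row_each
```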

Set NA groups to “other”

Finally, to tidy things up we’ll use replace_na() to assign the crayons without a color group to a group called “other”.

crayons_groups <- crayons_groups %>%
                  replace_na(list(color_group = "other"))

crayons_groups %>% head(10)
# A tibble: 10 × 3
# Groups:   id, name [10]
      id name            color_group
   <dbl> <chr>           <chr>      
 1     1 brick red       RED        
 2     2 vibrant orange  other      
 3     3 warm yellow     YELLOW     
 4     4 slate green     other      
 5     5 indigo blue     BLUE       
 6     6 grey black      other      
 7     7 plum purple     other      
 8     8 beige brown     other      
 9     9 denim blue      BLUE       
10    10 incredible pink other      

Success!

Using umbrella groups: Parent and child codes

We want to sort our crayons into primary colors and secondary colors. To do this we’ll need to check a crayon name for multiple words. For example, a crayon will be assigned to the primary color group if any of the following words occur in its name: red, yellow, or blue.

The larger umbrella group is sometimes referred to as the parent code, and the individual terms that fall under it are its child codes. Here’s the table of our color groups in terms of parent and child codes.

color_group_codes <- tribble(
  ~parent, ~child,
  "primary",   "red",
  "primary",   "yellow",
  "primary",   "blue",
  "secondary", "green",
  "secondary", "orange",
  "secondary", "purple",
)

Join primary and secondary

Now we can repeat our previous join steps to assign each of the crayons to the groups: primary, secondary, or other. This time the word column in the crayons table is matched to the child column in the codes table.

crayons_groups <- crayons %>%
                  left_join(color_group_codes, 
                            by = join_by(word == child)) 

crayons_groups %>% head(10)
# A tibble: 10 × 4
      id name           word    parent   
   <dbl> <chr>          <chr>   <chr>    
 1     1 brick red      brick   <NA>     
 2     1 brick red      red     primary  
 3     2 vibrant orange vibrant <NA>     
 4     2 vibrant orange orange  secondary
 5     3 warm yellow    warm    <NA>     
 6     3 warm yellow    yellow  primary  
 7     4 slate green    slate   <NA>     
 8     4 slate green    green   secondary
 9     5 indigo blue    indigo  <NA>     
10     5 indigo blue    blue    primary  

Tidy up

Repeat our clean-up steps with group_by, fill, slice_head, and replace_na.

crayons_groups <- crayons_groups %>%
                  group_by(id, name) %>%
                  fill(parent, .direction = "updown") %>%
                  slice_head(n = 1) %>%
                  select(-word) %>%
                  replace_na(list(parent = "other"))

crayons_groups %>% head(10)
# A tibble: 10 × 3
# Groups:   id, name [10]
      id name            parent   
   <dbl> <chr>           <chr>    
 1     1 brick red       primary  
 2     2 vibrant orange  secondary
 3     3 warm yellow     primary  
 4     4 slate green     secondary
 5     5 indigo blue     primary  
 6     6 grey black      other    
 7     7 plum purple     secondary
 8     8 beige brown     other    
 9     9 denim blue      primary  
10    10 incredible pink other    
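
Keeping only the first row works here because each crayon name contains at most one color word. If a name matched words from two parents, slice_head() would silently keep just one of them. The sketch below is one way to check for that before tidying up. It uses a hand-built table (words_demo and conflicts are made-up names), and the crayon "red orange" is hypothetical and not part of the crayons data.

```r
library(dplyr)
library(tibble)

color_group_codes <- tribble(
  ~parent,     ~child,
  "primary",   "red",
  "primary",   "yellow",
  "primary",   "blue",
  "secondary", "green",
  "secondary", "orange",
  "secondary", "purple"
)

# Hypothetical unnested names; id 13 "red orange" is invented for this check
words_demo <- tibble(
  id   = c(1, 1, 13, 13),
  name = c("brick red", "brick red", "red orange", "red orange"),
  word = c("brick", "red", "red", "orange")
)

# List crayons whose words fall under more than one parent code
conflicts <- words_demo %>%
  inner_join(color_group_codes, by = join_by(word == child)) %>%
  group_by(id, name) %>%
  summarize(n_parents = n_distinct(parent),
            parents = paste(sort(unique(parent)), collapse = ", "),
            .groups = "drop") %>%
  filter(n_parents > 1)

conflicts
```

An empty result means every crayon maps to a single parent and the slice_head() step is safe.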

Assign multiple tags or codes

When we work with longer pieces of text we may want to assign a piece of text to multiple groups or tags. For example, the description of a kids’ TV show may be about both dinosaurs and sisters.

In this example, we will label the shows about people and tag each description with the types of people it references, such as sister, brother, or grandmother.

Load kids TV data

library(tidyverse)
library(tidytext)
library(fuzzyjoin)

tv_shows <- read_csv('https://tidy-mn.github.io/qualitative-guide/posts/data/kids_netflix_shows.csv')

Split up (unnest) the words for easy joining

tv_shows <- tv_shows %>%
            unnest_tokens(word, description, drop = FALSE)

Load the parent and child code table

Here we provide an example parent/child code table to tag descriptions with various types of people. Both the singular and plural versions of each term are included, such as uncle and uncles.

people_codes <- read_csv("https://tidy-mn.github.io/qualitative-guide/posts/data/people_codes.csv")

people_codes %>% head()
# A tibble: 6 × 3
  parent child child_words
  <chr>  <chr> <chr>      
1 people aunt  aunt       
2 people aunt  aunts      
3 people boy   boy        
4 people boy   boys       
5 people child child      
6 people child children   

Join the people_codes table to tv_shows

tv_groups <- tv_shows %>%
             left_join(people_codes, 
                       by = join_by(word == child_words))

# Drop the rows/words with no word matches
tv_groups <- tv_groups %>%
             filter(!is.na(parent))

# View word matches
tv_groups %>%
  select(title, description, parent, child) %>%
  arrange(child) %>%
  head() %>%
  knitr::kable()
| title | description | parent | child |
|:------|:------------|:-------|:------|
| Heidi, bienvenida a casa | Inspired by the classic novel, this telenovela follows Heidi, who leaves her happy life in the mountains behind when her aunt takes her to the big city. | people | aunt |
| Hotel Transylvania | With her dad away, Mavis is so ready for adventure – if strict Aunt Lydia doesn’t stop her first. Set four years before the “Hotel Transylvania” film. | people | aunt |
| Judy Moody and the Not Bummer Summer | In this family film, never-dull third-grader Judy Moody embarks on a summer adventure with her brother, Stink, and always-up-for-fun Aunt Opal. | people | aunt |
| Rip Tide | Following an embarrassing viral video, a New York model decides to escape from her suffocating existence by visiting her faraway aunt in Australia. | people | aunt |
| A Babysitter’s Guide to Monster Hunting | Recruited by a secret society of babysitters, a high schooler battles the Boogeyman and his monsters when they nab the boy she’s watching on Halloween. | people | boy |
| A Cinderella Story | Teen Sam meets the boy of her dreams at a dance before returning to toil in her stepmother’s diner. Can her lost cell phone bring them together? | people | boy |
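
The left_join() followed by filter(!is.na(parent)) used above keeps only the words that found a match. As a sketch on a miniature, hand-built version of the two tables (shows_demo, codes_demo, and the show titles are made up), an inner_join() gives the same rows in one step. The example also shows how the plural word sisters is mapped back to the child code sister.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical miniature versions of tv_shows and people_codes
shows_demo <- tibble(
  title       = c("Show A", "Show B"),
  description = c("Two sisters visit their aunt.", "A robot explores space.")
)

codes_demo <- tribble(
  ~parent,  ~child,   ~child_words,
  "people", "aunt",   "aunt",
  "people", "aunt",   "aunts",
  "people", "sister", "sister",
  "people", "sister", "sisters"
)

words_demo <- shows_demo %>%
  unnest_tokens(word, description, drop = FALSE)

# Approach used above: join everything, then drop the unmatched words
via_left <- words_demo %>%
  left_join(codes_demo, by = join_by(word == child_words)) %>%
  filter(!is.na(parent))

# Alternative: inner_join() keeps only the matching words to begin with
via_inner <- words_demo %>%
  inner_join(codes_demo, by = join_by(word == child_words))

via_inner %>% select(title, word, parent, child)
```

The left_join() version is still the better choice when you also want to inspect which shows had no matches at all.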

Summarize

To simplify things, let’s take all of the parent and child tags assigned to each show and bring them together into a comma-separated list.

tv_groups <- tv_groups %>%
              filter(!is.na(parent)) %>%
              group_by(show_id, type, title, country, release_year, description) %>%
              summarize(parent_codes = paste(parent %>% unique %>% sort, collapse = ", "),
                        child_codes = paste(child %>% unique %>% sort, collapse = ", "),
                        .groups = "drop")
  
# View assigned codes in a single list
tv_groups %>%
  select(title, description, parent_codes, child_codes) %>%
  arrange(-nchar(child_codes)) %>%
  head() %>%
  knitr::kable()
| title | description | parent_codes | child_codes |
|:------|:------------|:-------------|:------------|
| Snow Day | When a snow day shuts down the whole town, the Wheeler family cuts loose. Hal makes a play for the most popular girl in his school, 10-year-old Natalie takes on the dreaded snowplow man, and Dad gets into a showdown with a rival meteorologist. | people, places | family, father, girl, man, school, town |
| The Haunted Hathaways | Single mom Michelle Hathaway and her daughters find that they share their New Orleans home with the ghosts of single dad Ray Preston and his two sons. | people, places | daughter, father, home, mother, son |
| Riding Faith | Following her father’s death, a young woman struggles to help her mother keep the family ranch afloat while preserving a special bond with her horse. | events, people | death, family, mother, woman |
| The Breadwinner | A courageous 11-year-old Afghan girl disguises herself as a boy and takes on odd jobs to provide for her family when her father is arrested. | people | boy, family, father, girl |
| Elf Pets: A Fox Cub’s Christmas Tale | An elite team of elves – and their furry fox cub friends – help bring the Christmas spirit to a boy whose mom may not make it home for the holidays. | people, places | boy, home, mother, team |
| Lego Friends | As a way to make friends, new girl in town Olivia volunteers to work at the Heartlake City World Petacular with four other girls. | people, places | city, girl, town, world |
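
Sorting by nchar(child_codes) ranks shows by the length of the tag string, which is only a rough proxy for how many tags a show has. A minimal sketch of an alternative, using a hand-built table of matches (matches_demo, tags_demo, and n_child_codes are made-up names), counts the distinct child codes directly with n_distinct() while building the same comma-separated lists.

```r
library(dplyr)
library(tibble)

# Hypothetical word matches: one row per matched word
matches_demo <- tibble(
  title  = c("Show A", "Show A", "Show A", "Show B"),
  parent = c("people", "people", "places", "people"),
  child  = c("sister", "aunt", "town", "boy")
)

tags_demo <- matches_demo %>%
  group_by(title) %>%
  summarize(parent_codes  = paste(sort(unique(parent)), collapse = ", "),
            child_codes   = paste(sort(unique(child)), collapse = ", "),
            # Number of distinct tags, independent of how long the words are
            n_child_codes = n_distinct(child),
            .groups = "drop") %>%
  arrange(desc(n_child_codes))

tags_demo
```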