Assign responses to groups based on the occurrence of key words.
Published
January 3, 2024
Modified
January 9, 2024
Crayon colors
Let’s start simple and assign crayons a color based on their descriptive names. Below is a table of 12 crayons and their names. We want to create a group for each of the primary colors - red, yellow, and blue - and an everything-else group labeled other.
We’ll begin by searching the name column of each crayon for the primary color words. If a color word is detected, the crayon will be assigned to the matching group.
# A tibble: 12 × 2
id name
<dbl> <chr>
1 1 brick red
2 2 vibrant orange
3 3 warm yellow
4 4 slate green
5 5 indigo blue
6 6 grey black
7 7 plum purple
8 8 beige brown
9 9 denim blue
10 10 incredible pink
11 11 sap green
12 12 cloudy off-white
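To follow along, the table above can be rebuilt by hand. This is a minimal sketch: the object name crayons matches the code later on this page, and the values are copied from the printed table.

```r
library(tibble)

# Recreate the 12-crayon table printed above.
# as.numeric() keeps id as <dbl>, matching the printed column type.
crayons <- tibble(
  id = as.numeric(1:12),
  name = c("brick red", "vibrant orange", "warm yellow", "slate green",
           "indigo blue", "grey black", "plum purple", "beige brown",
           "denim blue", "incredible pink", "sap green", "cloudy off-white")
)

crayons
```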
Find color words
We’ll use case_when() and str_detect() to test if a crayon name contains a given primary color word. If it does, we’ll assign the crayon to that color. If none of the primary color words are detected, the crayon will be assigned to the group other.
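One way this step might be written is sketched below. The result name crayons_grouped is hypothetical, and the word-boundary markers (\\b) are an addition of this sketch: a plain search for "red" would also match the "red" inside "incredible pink".

```r
library(dplyr)
library(stringr)

crayons <- tibble(
  id = as.numeric(1:12),
  name = c("brick red", "vibrant orange", "warm yellow", "slate green",
           "indigo blue", "grey black", "plum purple", "beige brown",
           "denim blue", "incredible pink", "sap green", "cloudy off-white")
)

# Test each name against the primary color words, in order.
# \\b restricts the match to whole words (an assumption of this sketch).
crayons_grouped <- crayons %>%
  mutate(group = case_when(
    str_detect(name, "\\bred\\b")    ~ "RED",
    str_detect(name, "\\byellow\\b") ~ "YELLOW",
    str_detect(name, "\\bblue\\b")   ~ "BLUE",
    TRUE                             ~ "other"  # no primary color word found
  ))

head(crayons_grouped, 5)
```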
# A tibble: 5 × 3
id name group
<dbl> <chr> <chr>
1 1 brick red RED
2 2 vibrant orange other
3 3 warm yellow YELLOW
4 4 slate green other
5 5 indigo blue BLUE
Joining groups and codes
The approach above works well for a small number of groups, but it can be cumbersome when you have lots of groups to assign. If we wanted to sort the crayons into many more color groups, a better approach would be to create a table to store our group names and the associated color word.
Then we can join the groups table to the crayons table. The end result is the same, but it requires less code and will be much easier to update when we want to add new groups or change the words associated with a group.
Here’s the same example as above using the table joining approach.
Create the group table
First, create a table of our groups with two columns:
color_group: the group’s name
word: the word associated with the group
library(tidyverse) # tibble(), %>%, left_join(), fill(), replace_na()
color_groups <- tibble(color_group = c("RED", "YELLOW", "BLUE"),
                       word = c("red", "yellow", "blue"))
Split-up (unnest) the words in the crayon names for easy joining
library(tidytext) # provides unnest_tokens()
crayons <- crayons %>%
  unnest_tokens(word, name, drop = FALSE)
Join groups with left_join
Now we can join our two tables using the word column in the crayons table, and the word column in the color_groups table. The function used to join the tables together is left_join. It has a by argument used to set the columns for linking the two tables together. In this case, the tables both share a column named word that we want to join by.
crayons_groups <- crayons %>%
  left_join(color_groups, by = join_by(word == word))

crayons_groups %>% head()
# A tibble: 6 × 4
id name word color_group
<dbl> <chr> <chr> <chr>
1 1 brick red brick <NA>
2 1 brick red red RED
3 2 vibrant orange vibrant <NA>
4 2 vibrant orange orange <NA>
5 3 warm yellow warm <NA>
6 3 warm yellow yellow YELLOW
Summarize
To reduce this long list of words down to a single row for each crayon, we can use the functions fill() and slice_head().
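A sketch of how this step could look. Grouping by id and name is inferred from the "# Groups: id, name" line in the output, and the name crayons_summary is hypothetical.

```r
library(dplyr)
library(tidyr)
library(tidytext)

crayons <- tibble(
  id = as.numeric(1:12),
  name = c("brick red", "vibrant orange", "warm yellow", "slate green",
           "indigo blue", "grey black", "plum purple", "beige brown",
           "denim blue", "incredible pink", "sap green", "cloudy off-white")
)
color_groups <- tibble(color_group = c("RED", "YELLOW", "BLUE"),
                       word = c("red", "yellow", "blue"))

# One row per word, with color_group filled in where a word matched
crayons_groups <- crayons %>%
  unnest_tokens(word, name, drop = FALSE) %>%
  left_join(color_groups, by = join_by(word == word))

crayons_summary <- crayons_groups %>%
  group_by(id, name) %>%
  # Copy any matched group onto every row of the same crayon
  fill(color_group, .direction = "downup") %>%
  # Keep a single row per crayon
  slice_head(n = 1) %>%
  select(-word)

head(crayons_summary)
```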
# A tibble: 6 × 3
# Groups: id, name [6]
id name color_group
<dbl> <chr> <chr>
1 1 brick red RED
2 2 vibrant orange <NA>
3 3 warm yellow YELLOW
4 4 slate green <NA>
5 5 indigo blue BLUE
6 6 grey black <NA>
Set NA’s group to “other”
Finally, to tidy things up we’ll use replace_na() to assign the crayons without a color group to a group called “other”.
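A minimal sketch of this step, using three rows copied from the summarized output so the example stands on its own. The object names are hypothetical.

```r
library(dplyr)
library(tidyr)

# A few summarized rows; NA marks a crayon with no matching color word
crayons_summary <- tibble(
  id = c(1, 2, 3),
  name = c("brick red", "vibrant orange", "warm yellow"),
  color_group = c("RED", NA, "YELLOW")
)

# Replace the missing groups with the label "other"
crayons_final <- crayons_summary %>%
  mutate(color_group = replace_na(color_group, "other"))

crayons_final
```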
# A tibble: 10 × 3
# Groups: id, name [10]
id name color_group
<dbl> <chr> <chr>
1 1 brick red RED
2 2 vibrant orange other
3 3 warm yellow YELLOW
4 4 slate green other
5 5 indigo blue BLUE
6 6 grey black other
7 7 plum purple other
8 8 beige brown other
9 9 denim blue BLUE
10 10 incredible pink other
Success!
Using umbrella groups: Parent and child codes
We want to sort our crayons into primary colors and secondary colors. To do this we’ll need to check a crayon name for multiple words. For example, a crayon will be assigned to the primary color group if any of the following words occur in its name: red, yellow, or blue.
The larger umbrella group is sometimes referred to as the parent code, and the individual terms that fall under it are its child codes. Here’s the table of our color groups in terms of parent and child codes.
# A tibble: 10 × 3
# Groups: id, name [10]
id name parent
<dbl> <chr> <chr>
1 1 brick red primary
2 2 vibrant orange secondary
3 3 warm yellow primary
4 4 slate green secondary
5 5 indigo blue primary
6 6 grey black other
7 7 plum purple secondary
8 8 beige brown other
9 9 denim blue primary
10 10 incredible pink other
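A result like the one above could be produced with a small parent/child lookup table. The sketch below assumes the secondary colors are orange, green, and purple, which agrees with the output shown. The names color_codes and crayons_parents are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tidytext)

crayons <- tibble(
  id = as.numeric(1:12),
  name = c("brick red", "vibrant orange", "warm yellow", "slate green",
           "indigo blue", "grey black", "plum purple", "beige brown",
           "denim blue", "incredible pink", "sap green", "cloudy off-white")
)

# Each child word belongs to one parent (umbrella) group
color_codes <- tibble(
  parent = c("primary", "primary", "primary",
             "secondary", "secondary", "secondary"),
  child  = c("red", "yellow", "blue", "orange", "green", "purple")
)

crayons_parents <- crayons %>%
  unnest_tokens(word, name, drop = FALSE) %>%
  left_join(color_codes, by = join_by(word == child)) %>%
  group_by(id, name) %>%
  fill(parent, .direction = "downup") %>%
  slice_head(n = 1) %>%
  select(id, name, parent) %>%
  mutate(parent = replace_na(parent, "other"))

head(crayons_parents, 10)
```

Adding a new umbrella group then only means adding rows to color_codes. The joining code stays the same.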
Assign multiple tags or codes
When we work with longer pieces of text we may want to assign a piece of text to multiple groups or tags. For example, the description of a kids’ TV show may be about both dinosaurs and sisters.
In this example, we will label the shows about people and tag each description with the types of people it references, such as sister, brother, or grandmother.
tv_shows <- tv_shows %>%
  unnest_tokens(word, description, drop = FALSE)
Load the parent and child code table
Here we provide an example parent/child code table to tag descriptions with various types of people. Both the singular and plural versions of each term are included, such as uncle and uncles.
# A tibble: 6 × 3
parent child child_words
<chr> <chr> <chr>
1 people aunt aunt
2 people aunt aunts
3 people boy boy
4 people boy boys
5 people child child
6 people child children
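If the code table is not available as a file, the rows shown above can be typed in by hand. This sketch rebuilds only the six rows printed here. The full table has more entries that are not shown, and the name people_codes is taken from the join code on this page.

```r
library(tibble)

# First six rows of the parent/child code table, copied from the output above
people_codes <- tribble(
  ~parent,  ~child,  ~child_words,
  "people", "aunt",  "aunt",
  "people", "aunt",  "aunts",
  "people", "boy",   "boy",
  "people", "boy",   "boys",
  "people", "child", "child",
  "people", "child", "children"
)

people_codes
```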
Join the people_codes table to tv_shows
tv_groups <- tv_shows %>%
  left_join(people_codes, by = join_by(word == child_words))

# Drop the rows/words with no word matches
tv_groups <- tv_groups %>%
  filter(!is.na(parent))

# View word matches
tv_groups %>%
  select(title, description, parent, child) %>%
  arrange(child) %>%
  head() %>%
  knitr::kable()
| title | description | parent | child |
|:------|:------------|:-------|:------|
| Heidi, bienvenida a casa | Inspired by the classic novel, this telenovela follows Heidi, who leaves her happy life in the mountains behind when her aunt takes her to the big city. | people | aunt |
| Hotel Transylvania | With her dad away, Mavis is so ready for adventure – if strict Aunt Lydia doesn’t stop her first. Set four years before the “Hotel Transylvania” film. | people | aunt |
| Judy Moody and the Not Bummer Summer | In this family film, never-dull third-grader Judy Moody embarks on a summer adventure with her brother, Stink, and always-up-for-fun Aunt Opal. | people | aunt |
| Rip Tide | Following an embarrassing viral video, a New York model decides to escape from her suffocating existence by visiting her faraway aunt in Australia. | people | aunt |
| A Babysitter’s Guide to Monster Hunting | Recruited by a secret society of babysitters, a high schooler battles the Boogeyman and his monsters when they nab the boy she’s watching on Halloween. | people | boy |
| A Cinderella Story | Teen Sam meets the boy of her dreams at a dance before returning to toil in her stepmother’s diner. Can her lost cell phone bring them together? | people | boy |
Summarize
To simplify things, let’s take all of the parent and child tags assigned to each movie and bring them together into a comma-separated list.
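A sketch of how the tags might be collapsed, using two descriptions from this page and a hand-built code table so the example runs on its own. The code-table rows for family, father, and girl are assumptions inferred from the tags in the output, and the name tv_tags is hypothetical.

```r
library(dplyr)
library(tidytext)

tv_shows <- tibble(
  title = c("The Breadwinner", "Rip Tide"),
  description = c(
    "A courageous 11-year-old Afghan girl disguises herself as a boy and takes on odd jobs to provide for her family when her father is arrested.",
    "Following an embarrassing viral video, a New York model decides to escape from her suffocating existence by visiting her faraway aunt in Australia."
  )
)

# Small stand-in for the full code table (rows partly assumed)
people_codes <- tibble(
  parent = "people",
  child = c("aunt", "aunt", "boy", "boy", "family", "father", "girl"),
  child_words = c("aunt", "aunts", "boy", "boys", "family", "father", "girl")
)

tv_tags <- tv_shows %>%
  unnest_tokens(word, description, drop = FALSE) %>%
  left_join(people_codes, by = join_by(word == child_words)) %>%
  filter(!is.na(parent)) %>%
  group_by(title, description) %>%
  # Collapse the distinct tags for each title into one comma-separated string
  summarize(
    parent = paste(sort(unique(parent)), collapse = ", "),
    child  = paste(sort(unique(child)), collapse = ", "),
    .groups = "drop"
  )

tv_tags
```

Using unique() keeps a tag from repeating when the same word appears more than once in a description, and sort() gives the alphabetical order seen in the lists.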
| title | description | parent | child |
|:------|:------------|:-------|:------|
|  | When a snow day shuts down the whole town, the Wheeler family cuts loose. Hal makes a play for the most popular girl in his school, 10-year-old Natalie takes on the dreaded snowplow man, and Dad gets into a showdown with a rival meteorologist. | people, places | family, father, girl, man, school, town |
| The Haunted Hathaways | Single mom Michelle Hathaway and her daughters find that they share their New Orleans home with the ghosts of single dad Ray Preston and his two sons. | people, places | daughter, father, home, mother, son |
| Riding Faith | Following her father’s death, a young woman struggles to help her mother keep the family ranch afloat while preserving a special bond with her horse. | events, people | death, family, mother, woman |
| The Breadwinner | A courageous 11-year-old Afghan girl disguises herself as a boy and takes on odd jobs to provide for her family when her father is arrested. | people | boy, family, father, girl |
| Elf Pets: A Fox Cub’s Christmas Tale | An elite team of elves – and their furry fox cub friends – help bring the Christmas spirit to a boy whose mom may not make it home for the holidays. | people, places | boy, home, mother, team |
Lego Friends
As a way to make friends, new girl in town Olivia volunteers to work at the Heartlake City World Petacular with four other girls.