1 Day 1 review

Get to know your DATA

Function	Information
`names(scrap)`	column names
`nrow(...)`	number of rows
`ncol(...)`	number of columns
`summary(...)`	summary of all column values (ex. max, mean, median)
`glimpse(...)`	column names + a glimpse of first values (requires dplyr package)

Filtering

Menu of comparisons

Symbol	Comparison
`>`	greater than
`>=`	greater than or equal to
`<`	less than
`<=`	less than or equal to
`==`	equal to
`!=`	NOT equal to
`%in%`	value is in list: `X %in% c(1,3,7)`
`is.na(...)`	is the value missing?
`str_detect(col_name, "word")`	“word” appears in text?
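Here is a minimal sketch of a few of these comparisons inside `filter()`. The `scrap_demo` table and its values are invented for illustration and are not the real scrap data.

```r
library(dplyr)
library(stringr)

# Hypothetical toy table, invented for this illustration
scrap_demo <- tibble(
  item   = c("Hull panel", "Proton drive", "Power cable", "Hull plate"),
  amount = c(5, 12, NA, 7),
  origin = c("Jakku", "Endor", "Jakku", "Hoth")
)

# Keep rows where amount is greater than 6 (a missing amount never passes)
big_items <- filter(scrap_demo, amount > 6)

# Keep rows whose origin appears in a list of values
off_jakku <- filter(scrap_demo, origin %in% c("Endor", "Hoth"))

# Keep rows where amount is missing
no_amount <- filter(scrap_demo, is.na(amount))

# Keep rows where "Hull" appears in the item text
hull_items <- filter(scrap_demo, str_detect(item, "Hull"))
```

Note that a comparison against a missing value returns `NA`, and `filter()` drops those rows, which is why `is.na()` gets its own entry in the menu.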

Your analysis toolbox

dplyr is the hero for most analysis tasks. With these six functions you can accomplish just about anything you want with your data.

Function	Job
`select()`	Select individual columns to drop or keep
`arrange()`	Sort a table top-to-bottom based on the values of a column
`filter()`	Keep only a subset of rows depending on the values of a column
`mutate()`	Add new columns or update existing columns
`summarize()`	Calculate a single summary for an entire table
`group_by()`	Sort data into groups based on the values of a column
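As a refresher, the sketch below chains all six functions together. The `porg_demo` table, its column names, and its values are made up for this example.

```r
library(dplyr)

# Hypothetical toy table, invented for this illustration
porg_demo <- tibble(
  name   = c("Ana", "Bo", "Cy", "Dee"),
  color  = c("grey", "orange", "grey", "orange"),
  height = c(20, 25, 30, 45)   # centimeters
)

by_color <- porg_demo %>%
  select(color, height) %>%                   # keep two columns
  filter(height > 20) %>%                     # keep the taller porgs
  mutate(height_m = height / 100) %>%         # add a column in meters
  group_by(color) %>%                         # one group per color
  summarize(avg_height_m = mean(height_m)) %>%  # one row per group
  arrange(desc(avg_height_m))                 # tallest group first

by_color
```

Each step hands its result to the next one, so the final table has one row per color, sorted by average height.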

`dplyr` with Porgs

The poggle of porgs are back to help us review dplyr functions.

library(tidyverse)

porgs <- read_csv("https://mn-r.netlify.com/data/porg_data.csv")

2 Day 2 review

Join tables with left_join()
summarize() functions
Group by category with group_by()
ifelse(): if THIS IS TRUE do a thing, otherwise do a different thing
Plots and charts with ggplot
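As a quick reminder of how `ifelse()` reads, here is a small sketch. The `porg_sizes` table and the 30 cm cutoff are assumptions made up for the example.

```r
library(dplyr)

# Hypothetical toy table, invented for this illustration
porg_sizes <- tibble(
  name   = c("Ana", "Bo", "Cy"),
  height = c(20, 31, 45)
)

# If height is above 30, label the porg "tall", otherwise "short"
porg_sizes <- mutate(porg_sizes,
                     size = ifelse(height > 30, "tall", "short"))

porg_sizes
```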

library(tidyverse)

porgs <- read_csv("https://mn-r.netlify.com/data/porg_data.csv")

porg_names <- read_csv("https://mn-r.netlify.com/data/porg_names.csv")

The joined table

named_porgs <- left_join(porgs, porg_names, by = "id")
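To see what `left_join()` does without downloading anything, here is a sketch with two tiny tables. Both tables and their values are invented; only the shared `id` column mirrors the real data.

```r
library(dplyr)

# Hypothetical toy tables, invented for this illustration
heights_demo <- tibble(id = c(1, 2, 3), height = c(20, 25, 30))
names_demo   <- tibble(id = c(1, 2),    name   = c("Ana", "Bo"))

# Keep every row of the left table and attach matching names by id
joined_demo <- left_join(heights_demo, names_demo, by = "id")

joined_demo
```

Every row of the left table is kept. The porg with `id` 3 has no match in `names_demo`, so its `name` is `NA`.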

Welcome to Endor!

But wait…

The Ewoks say it’s unsafe to land. Not good. It sounds like the Empire is checking all incoming ships for licenses.

Lucky for us, there is some good news. The Ewoks say that a small ship like ours might be able to land undetected. But we’ll have to put down smack-dab in the middle of the 3 northern outposts.

Are you up for a challenge, data droid?

Either way, let’s make a map to see what we’re up against. To help, our comrades sent along the coordinates of the Empire’s outposts. Guess what the first step is? That’s right. Time to add a new package.

3 Reading coordinates

Coordinates stored in a CSV file or Excel spreadsheet can be read into R with our usual friends:

readr’s read_csv() for CSV files
readxl’s read_excel() for Excel files

For shapefiles, we use the sf package:

sf’s st_read() for SHP files

Get the coordinates from CSV

library(readr)

outposts <- read_csv("https://mn-r.netlify.app/data/endor_outposts.csv")

Take a look inside

outposts

## # A tibble: 3 × 3
##   ID      lat  long
##   <chr> <dbl> <dbl>
## 1 1NO    36.7 -119.
## 2 1NL    36.6 -119.
## 3 3NB    36.6 -119.

glimpse(outposts)

## Rows: 3
## Columns: 3
## $ ID   <chr> "1NO", "1NL", "3NB"
## $ lat  <dbl> 36.65520, 36.58375, 36.59790
## $ long <dbl> -118.8469, -118.8195, -118.9022

To make a quick map, use ggplot and add + geom_point().

library(ggplot2)

ggplot(outposts, aes(x = long, y = lat)) + 
  geom_point(size = 8, color = "steelblue")

Where’s the center point?

Hint: How would you find the halfway point between two points?

Let’s use summarize() from the dplyr toolbox to collapse our location columns into a single center point.

Set the new lat column to the center of all the lat values.
Set the new long column to the center of all the long values.

These will be the coordinates for our landing pad.

`summarize` the center

Complete the code

# Update lat/long to the center point (the average of ALL the lat/long)             
land_pad <- summarize(outposts, 
                      lat  = mean( ______ ),
                      long = _____________ ,
                      ID   = "Land here!")

Show code

land_pad <- summarize(outposts, 
                      lat  = mean( lat ),
                      long = mean( long ),
                      ID   = "Land here!")

Where should we land?

# View center coordinates
land_pad

lat	long	ID
36.61228	-118.8562	Land here!
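The halfway point between two values is their average, and `mean()` extends that idea to three or more. As a check by hand, the sketch below uses the rounded coordinates printed by `glimpse(outposts)` above, so the last digits may differ slightly from the full-precision data.

```r
# Coordinates copied from the glimpse() output above (rounded values)
outpost_lat  <- c(36.65520, 36.58375, 36.59790)
outpost_long <- c(-118.8469, -118.8195, -118.9022)

# The center is the sum of the values divided by how many there are
center_lat  <- sum(outpost_lat)  / length(outpost_lat)
center_long <- sum(outpost_long) / length(outpost_long)

# mean() does the same calculation in one step
mean(outpost_lat)
mean(outpost_long)
```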

We found it!

Let’s add it to our map.

library(ggplot2)

ggplot(outposts, aes(x = long, y = lat)) + 
  geom_point(size = 8, color = "steelblue") +
  geom_point(data = land_pad, 
             aes(x = long, y = lat), 
             size = 12,
             color = "green")

That’s looking good, but…

We’re going to need to be very precise to land perfectly in between these outposts. To make our captain really happy, let’s put this all in an interactive zoomable map. For interactive maps we use leaflet.

4 Leaflet maps

The `leaflet` package.

Leaflet makes interactive maps easy, and you build them up in layers, similar to a ggplot.

Install `leaflet`

install.packages("leaflet")

Explore the Full Leaflet Guide!

Maps that zoom

Leaflet builds maps in layers, similar to ggplot, but instead of adding things with the +, leaflet adds new layers with the %>% (“the pipe”).

1. Start with the outpost coordinates

library(leaflet)

leaflet(outposts) %>% 
  addCircles(radius = 250) %>%  
  addTiles()

2. Add the landing site

Fingers crossed it’s in the middle….

leaflet(outposts) %>% 
  addCircles(radius = 300) %>% 
  addTiles() %>%
  addMarkers(data = land_pad)

3. Add labels

Try hovering over one of the outposts to see its label.

leaflet(outposts) %>% 
  addCircles(radius = 300,
             label = ~ID) %>%
  addTiles() %>%
  addMarkers(data  = land_pad,
             label = "Land HERE!",
             labelOptions = labelOptions(noHide = TRUE))

Explore!

News just came in that the outposts will soon be upgrading their radar to detect ships up to 4,200 meters away. Will that be a problem?

Steps

Update the radius argument in addCircles(...) to be equal to 4200.
Add the argument fillColor = "yellow" inside addCircles(...).

Show code

leaflet(outposts) %>%
  addCircles(radius = 4200,
             label = ~ID,
             fillColor = "yellow") %>% 
  addTiles() %>%
  addMarkers(data  = land_pad,
             label = "Land HERE!",  
             labelOptions = labelOptions(noHide = TRUE))

Will our landing spot still be safe?

Woah! That’s a close one.
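One way to put a number on how close: the sketch below estimates the distance from the landing pad to each outpost with the haversine formula. It is not part of leaflet. It assumes a spherical planet with Earth's mean radius of 6,371 km, and it uses the rounded coordinates printed earlier. The helper name `haversine_m` is made up for this example.

```r
# Great-circle distance in meters between points given in decimal degrees.
# Assumption: a sphere with Earth's mean radius (6,371 km).
haversine_m <- function(lat1, long1, lat2, long2, radius = 6371000) {
  to_rad <- pi / 180
  dlat   <- (lat2 - lat1) * to_rad
  dlong  <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  2 * radius * asin(sqrt(a))
}

# Outpost coordinates copied from the glimpse() output (rounded values)
outposts_demo <- data.frame(
  ID   = c("1NO", "1NL", "3NB"),
  lat  = c(36.65520, 36.58375, 36.59790),
  long = c(-118.8469, -118.8195, -118.9022)
)

# Landing pad = average of the outpost coordinates
pad_lat  <- mean(outposts_demo$lat)
pad_long <- mean(outposts_demo$long)

# Distance from each outpost to the pad, and whether radar would reach it
outposts_demo$dist_m <- haversine_m(outposts_demo$lat, outposts_demo$long,
                                    pad_lat, pad_long)
outposts_demo$in_radar_range <- outposts_demo$dist_m <= 4200

outposts_demo
```

With these rounded inputs every outpost comes out a little farther than 4,200 meters from the pad, which matches what the circles on the map suggest.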

Great work, data droid. Let’s take a minute to cool down…

Ok, now let’s land on that planet before we get stuck up here with permanent space legs.

BASEMAPS

There are lots of options for basemaps in leaflet. One of my favorites is CartoDB.Positron because its greyness doesn’t distract from the data.

Add it to your map by swapping out addTiles() for addProviderTiles(providers$CartoDB.Positron)

Show code

leaflet(outposts) %>%
  addCircles(radius = 4200, 
             label = ~ID,
             fillColor = "yellow") %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  addMarkers(data  = land_pad,
             label = "Land HERE!",
             labelOptions = labelOptions(noHide = TRUE))

See more available basemaps at rstudio.github.io/leaflet/basemaps.

For more Leaflet examples, see the Full Leaflet Guide!

5 Invasive Porg Survey - 2023

The Ewoks need our help. There’s porgs EVERYWHERE!

Porgs have been spreading across the galaxy, likely by hitching rides on unsuspecting ships. They’re so cute that people hate to say it, but they are starting to become a nuisance.

To get a grasp on the population explosion, the Ewoks are launching a porg survey. And they need your help.

6 Dates with `lubridate`

The `lubridate` package.

It’s about time! Lubridate makes working with dates much easier.

We can find how much time has elapsed, add or subtract days, and find seasonal and day of the week averages. The package is included in the tidyverse bundle of packages, so it’s already installed!

View the date cheatsheet HERE.

It’s a great reference when you need to manipulate dates or timezones in your data.

Menu of date functions

Convert text to a `DATE`

Function	Order of date elements
`mdy()`	Month-Day-Year :: `05-18-2019` or `05/18/2019`
`dmy()`	Day-Month-Year (Euro dates) :: `18-05-2019` or `18/05/2019`
`ymd()`	Year-Month-Day (science dates) :: `2019-05-18` or `2019/05/18`
`ymd_hm()`	Year-Month-Day Hour:Minutes :: `2019-05-18 8:35 AM`
`ymd_hms()`	Year-Month-Day Hour:Minutes:Seconds :: `2019-05-18 8:35:22 AM`
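A small sketch of the parsing functions in action. The function name spells out the order of the date parts in the text, and each call below returns the same day.

```r
library(lubridate)

# Three ways of writing the same day, each read with the matching function
from_mdy <- mdy("05/18/2019")
from_dmy <- dmy("18-05-2019")
from_ymd <- ymd("2019/05/18")

# A date with a time attached (stored as a date-time in UTC by default)
with_time <- ymd_hms("2019-05-18 08:35:22")

from_mdy
with_time
```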

Get specific date parts (eg. the year, month, day… )

Function	Date element
`year()`	Year
`month()`	Month as 1,2,3; For Jan, Feb, Mar use `label=TRUE`
`week()`	Week of the year
`day()`	Day of the month
`wday()`	Day of the week as 1,2,3; For Sun, Mon, Tue use `label=TRUE`
- Time -
`hour()`	Hour of the day (24hr)
`minute()`	Minutes
`second()`	Seconds
`tz()`	Time zone
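Here is a short sketch that pulls the parts out of a single date-time. The example value `when_demo` is made up. The text labels returned with `label = TRUE` depend on your computer's language settings.

```r
library(lubridate)

# Hypothetical example value
when_demo <- ymd_hms("2019-05-18 08:35:22", tz = "UTC")

year(when_demo)     # the year as a number
month(when_demo)    # the month as a number
day(when_demo)      # the day of the month
wday(when_demo)     # day of the week as a number, counting from Sunday = 1
hour(when_demo)     # hour of the day
minute(when_demo)   # minutes
second(when_demo)   # seconds
tz(when_demo)       # the time zone name

# Ask for text labels instead of numbers
month(when_demo, label = TRUE)
wday(when_demo, label = TRUE)
```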

Clean the dates

Real world examples

Does your date column look like one of these?

Here’s the lubridate function to convert the column to a date.

Format	Function to use
“05/18/2019”	`mdy(date_column)`
“May 18, 2019”	`mdy(date_column)`
“05/18/2019 8:00 CDT”	`mdy_hm(date_column, tz = "US/Central")`
“05/18/2019 11:05:32 PDT”	`mdy_hms(date_column, tz = "US/Pacific")`

European dates

Format	Function to use
“30.05.2019”	`dmy(date_column)`
“30-05-2019”	`dmy(date_column)`
“30/05/2019”	`dmy(date_column)`

Simple number dates

Format	Function to use
“20190518”	`ymd(sample_date)`
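In practice the conversion usually happens inside `mutate()`. The sketch below assumes a made-up table `samples_demo` in which the same two days are stored as text in three different styles.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy table: the same two days written three ways
samples_demo <- tibble(
  us_text     = c("05/18/2019", "05/19/2019"),
  euro_text   = c("18.05.2019", "19.05.2019"),
  number_text = c("20190518", "20190519")
)

# Convert each text column with the function that matches its order
samples_demo <- mutate(samples_demo,
                       us_date     = mdy(us_text),
                       euro_date   = dmy(euro_text),
                       number_date = ymd(number_text))

glimpse(samples_demo)
```

After the conversion all three new columns hold identical `<date>` values, so they can be sorted, filtered, and plotted as dates.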

Survey objective

The Ewoks have asked you to help organize their Porg survey.
In 2023, they would like to perform a porg count once a week from May to October.
They have enough volunteers to run the survey at two locations: Bright Tree and Fern Gully.

Let’s generate the full list of dates during this time span using the sequence function: seq()

Run the code below to create a list of dates for the survey.

library(tidyverse)
library(lubridate)

start_date <- ymd("2023-05-01")

end_date   <- ymd("2023-10-31")

# Sequence from start to end, counting by 1 day intervals
survey_dates <- seq(from = start_date, 
                    to = end_date, 
                    by = "day")

How many days will the survey run?

Show answer

184 days
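One way to check the answer is to count the items in the sequence with `length()`. The sketch rebuilds `survey_dates` so it runs on its own.

```r
library(lubridate)

# Same sequence as above: every day from May 1 through October 31
survey_dates <- seq(from = ymd("2023-05-01"),
                    to   = ymd("2023-10-31"),
                    by   = "day")

# Count the dates; both the first and last day are included
n_days <- length(survey_dates)
n_days
```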

Let’s put the dates in a dataframe

You can create a dataframe with the functions data.frame() or tibble().

Here’s a reminder

survey <- data.frame(count_date = survey_dates)

# or

survey <- tibble(count_date = survey_dates)

head(survey)

## # A tibble: 6 × 1
##   count_date
##   <date>    
## 1 2023-05-01
## 2 2023-05-02
## 3 2023-05-03
## 4 2023-05-04
## 5 2023-05-05
## 6 2023-05-06

Yes!

1. Scheduling weekdays

Ewoks are very busy. They only have one day per week when they can volunteer.

Here is the weekday when volunteers are available at each location:

Bright Tree: Thursdays
Fern Gully: Fridays

We can use the seq.Date() function and the option to step by = "week" to generate the survey dates for each site. But we need to know which day to start from.

When is the first Thursday in May of 2023?

For that, we can use wday(), the “weekday” function.

# wday tells you the day of week (Sun, Mon, etc..) for a specific date
wday(ymd('2023-05-01'), label = TRUE, abbr = FALSE)

## [1] Monday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

So the 1st of May will be a Monday. That means…. May the 4th will be a Thursday. Perfect! That’s my favorite day.
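Now that we know the first Thursday, here is a sketch of the `by = "week"` approach mentioned above. The name `bright_by_week` is made up for this example. The steps that follow take a different route and filter the full list of days instead.

```r
library(lubridate)

# Start on the first Thursday and step forward one week at a time
bright_by_week <- seq(from = ymd("2023-05-04"),
                      to   = ymd("2023-10-31"),
                      by   = "week")

# Every date in the sequence should land on the same weekday
unique(wday(bright_by_week, label = TRUE, abbr = FALSE))

# First few dates and the total number of survey days
head(bright_by_week, 3)
length(bright_by_week)
```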

Mutate to the rescue

We really don’t want to check every date one by one, do we?

Let’s add a new week_day column to our survey table that checks ALL the dates ALL at once. To add a new column we call our friend mutate().

Complete the code below to add a week_day column.

survey <- mutate(survey, 
                 week_day = wday( ________ , label = TRUE, abbr = FALSE))

`filter()` week days

With filter we can pick out only the days of the week that we want.

Split the schedule in two by filtering the survey to only the week day needed at each site:

Thursday for Bright Tree
Friday for Fern Gully

bright_dates <- filter(survey, week_day ==  ________ )

fern_dates   <- filter(survey, week_day ==  ________ )

Show code

bright_dates <- filter(survey, week_day ==  "Thursday" )

fern_dates   <- filter(survey, week_day ==  "Friday" )

How many survey dates are at each site?

Hint: It’s less than 50.

Show answer

26 survey days

2. Particular date formats

Oh no! Each survey site has a very-very particular Assistant to the Regional Manager. And they are demanding a very specific date format for their work schedules.

Before you send off the survey dates, you’ll need to adjust the dates to match the requested formats below.

Preferred date formats

Bright Tree: 5-11-2023
Fern Gully: May 12, 2023

Use format(count_date, ...) and the date expressions below to format the schedule for each region accordingly.

For example: format(count_date, "%b, %Y") prints the date as Aug, 2023.

%b stands for 3-letter month abbreviation

%Y stands for the full 4 digit year

Date parts

Expression	Description	Example
`%Y`	Year (4 digit)	2023
`%y`	Year (2 digit)	23
`%B`	Month (full name)	December
`%b`	Month (abbreviated)	Dec
`%m`	Month (decimal number)	12
`%d`	Day of the month (decimal number)	30

Time parts

Expression	Description	Example
`%H`	Hour (24hr)	08
`%M`	Minute	13
`%S`	Second	35
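To try the expressions out, here is a sketch using made-up example values that match the tables above. Month names from `%B` and `%b` follow your computer's language settings, so they may print differently on a non-English system.

```r
# Hypothetical example values matching the tables above
date_demo <- as.Date("2023-12-30")
time_demo <- as.POSIXct("2023-12-30 08:13:35", tz = "UTC")

# Single date parts
format(date_demo, "%Y")   # 4 digit year
format(date_demo, "%y")   # 2 digit year
format(date_demo, "%B")   # full month name (language dependent)

# Combine parts with any separators you like
dashed <- format(date_demo, "%m-%d-%Y")
dashed

# Time parts work the same way on a date-time
clock <- format(time_demo, "%H:%M:%S")
clock
```

Anything in the format text that is not a `%` expression, such as a dash, comma, or space, is printed as written.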

Use mutate() to add a pretty_date column to both site schedules.

Here’s a start

# Set date format to 5-11-2023
bright_dates <- mutate(bright_dates, pretty_date = format(count_date, _______  ))

# Set date format to May 12, 2023
fern_dates <- mutate(fern_dates, pretty_date = format(count_date, _______ ))

Show code

# Set date format to 5-11-2023
bright_dates <- mutate(bright_dates, pretty_date = format(count_date, "%m-%d-%Y"))

# Set date format to May 12, 2023
fern_dates <- mutate(fern_dates, pretty_date = format(count_date, "%b %d, %Y"))

How’d we do?

# Bright Tree schedule
head(bright_dates, 3)

count_date	week_day	pretty_date
2023-05-04	Thursday	05-04-2023
2023-05-11	Thursday	05-11-2023
2023-05-18	Thursday	05-18-2023

# Fern Gully schedule
head(fern_dates, 3)

count_date	week_day	pretty_date
2023-05-05	Friday	May 05, 2023
2023-05-12	Friday	May 12, 2023
2023-05-19	Friday	May 19, 2023

Congrats!

Your fine-tuned schedules worked perfectly.

Now let’s jump ahead and take a look at the survey results.

3. Results

Load the porg survey results.

porgs <- read_csv("https://mn-r.netlify.app/data/2023_porg_survey_results.csv")

Explore a bit.

Are there missing values?

A missing site

It looks like we have a slight missing data problem.

There’s a data point in the results that wasn’t labeled with the site location. We do know the date however.

On 2023-06-30 there were a whopping 7 porgs counted - but we just don’t know where.

Can you determine the site based on the date of the porg count?

mystery_site_date <- "2023-06-30"

Hint: What weekday is this?

Try the wday(date) function.

Good sleuthing, data droid.

We’ll learn how to update the site value later today, but right now we’re in a hurry, so let’s remove the row using filter.

Use filter() to keep only the rows in the porgs data where site is NOT NA (missing).

porgs <- filter(porgs, !is.na(site))
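To see the before and after, here is a sketch on a tiny stand-in table. The column names follow the survey results used in this lesson, but the rows are invented apart from the unlabeled count of 7 on 2023-06-30.

```r
library(dplyr)

# Hypothetical stand-in for the survey results
results_demo <- tibble(
  site       = c("Bright Tree", "Fern Gully", NA),
  count_date = as.Date(c("2023-06-29", "2023-06-23", "2023-06-30")),
  porg_count = c(3, 4, 7)
)

# How many rows are missing a site?
n_missing <- sum(is.na(results_demo$site))
n_missing

# Keep only the rows where site is NOT missing
results_clean <- filter(results_demo, !is.na(site))
results_clean
```

The `!` flips `TRUE` and `FALSE`, so `!is.na(site)` reads as "site is not missing".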

4. The best time for porgs

What is the best month to see porgs?

First, add a month column to the data with the function month() and the column count_date.

porgs <- mutate(porgs, month = month(count_date))

Next, use ggplot() and geom_col() to plot the porg sightings by month.

ggplot(porgs, aes(x = month, y = ______ , fill = site)) +
    geom_col()

Show code

ggplot(porgs, aes(x = month, y = porg_count, fill = site)) +
    geom_col()
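`geom_col()` stacks the counts, so each bar's height is the monthly total. To get the same totals as numbers, one option is `group_by()` plus `summarize()`. The sketch uses a small invented table rather than the real results.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the survey results
counts_demo <- tibble(
  site       = c("Bright Tree", "Fern Gully", "Bright Tree", "Fern Gully"),
  count_date = ymd(c("2023-05-04", "2023-05-05", "2023-06-01", "2023-06-02")),
  porg_count = c(2, 3, 4, 2)
)

# Add the month, then total the counts within each month
monthly_totals <- counts_demo %>%
  mutate(month = month(count_date)) %>%
  group_by(month) %>%
  summarize(total_porgs = sum(porg_count))

monthly_totals
```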

Why might June be the lowest month?

Hint: Fern Gully

5. Time series: All the data

Plot all the data with geom_point(). Put count_date on the x-axis, and the porg_count on the y-axis. Set the color to match the site column.

ggplot(porgs, aes(x = _______, y = ______ , color = _____ )) +
    geom_point(size = 5)

Show code

ggplot(porgs, aes(x = count_date, y = porg_count, color = site)) +
    geom_point(size = 5)

Oof! That’s a busy plot. Try adding + facet_wrap("site") to the end. What happens?

ggplot(porgs, aes(x = count_date, y = porg_count, color = site)) +
  geom_point(size = 5) +
  facet_wrap("site")

Try adding + geom_line().

ggplot(porgs, aes(x = count_date, y = porg_count, color = site)) +
  geom_point(size = 5) +
  facet_wrap(~ site) +
  geom_line()

Great work

The Ewoks are deeply thankful. They’ll be in touch for Porg Survey 2024.
