You’re ready to blast off on your own. The example outline below includes some placeholder code based on ozone measurements and is presented as a helpful starting point for your analysis. The script snippets will not run successfully as is. They will need to be updated with the name of your own dataframe and its specific column names.
Good luck!
Set up your project
- Open a new project
- Open a new R script
- Create a data folder in your project directory
- Copy your data into the folder
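If you prefer, the folder can also be created from the R console. This is a minimal sketch that assumes your working directory is the project folder.

```r
# Create the data folder (run from the project directory).
# showWarnings = FALSE keeps this quiet if the folder already exists.
dir.create("data", showWarnings = FALSE)

# After copying your file in, list the folder contents to confirm it is there
list.files("data")
```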
Begin your analysis
If you’d like, you can try doing your analysis in an Rmarkdown document instead of an R script. Rmarkdown lets you add text and images to your analysis, as well as share your work as a Word document, a website, or even a PDF. You can download an Rmarkdown version of the analysis outline below.
DOWNLOAD - Rmarkdown Analysis Outline
1. Read data into R
library(readr)
library(janitor)
# Read a CSV file
air_data <- read_csv("data/My-data.csv")
# Have an EXCEL file?
## You can use read_excel() from the readxl package
install.packages("readxl")
library(readxl)
# Read an EXCEL file
air_data <- read_excel("data/My-data.xlsx")
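After reading a file, it helps to check that the column names and types came through as expected. The sketch below uses a tiny made-up CSV written to a temporary file; the file, its columns, and its values are invented for illustration and are not part of the workshop data.

```r
library(readr)
library(dplyr)

# Made-up example data, written to a temporary file for illustration only
example_file <- tempfile(fileext = ".csv")
write_lines(c("SITE,TEMP_F,OZONE",
              "A,71,32",
              "B,85,48"),
            example_file)

example_data <- read_csv(example_file, show_col_types = FALSE)

# Check the column names, then the column types and first few values
names(example_data)
glimpse(example_data)
```

Running the same two checks on your own data frame is a quick way to find the column names you will need in the later steps.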
Clean the column names
air_data <- clean_names(air_data)
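To see what clean_names() does, here is a small sketch with made-up column names. It converts names to lowercase with underscores, so later code should refer to the cleaned names rather than the originals.

```r
library(janitor)

# Made-up column names, for illustration only
messy <- data.frame(`Site Name` = c("A", "B"),
                    `TEMP_F`    = c(71, 85),
                    check.names = FALSE)

tidy <- clean_names(messy)

# Compare the names before and after cleaning
names(messy)
names(tidy)
```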
2. Plot the data
library(ggplot2)
# Remember the ggplot sandwich!
# Column names are lowercase after clean_names()
ggplot(air_data, aes(x = temp_f, y = ozone)) +
  geom_point(aes(color = site), alpha = 0.2) +
  geom_smooth(method = "lm")
3. Clean the data
library(dplyr)
# Examples of common issues
## Drop values out of range
air_data <- air_data %>% filter(ozone > 0, temp_f < 199)
## Convert all samples to PPB
air_data <- air_data %>%
  mutate(ozone = ifelse(units == "PPM", ozone * 1000,
                        ozone))
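Before dropping or converting anything, it can be useful to count how many rows each rule would affect. The sketch below uses made-up values and the same placeholder column names in lowercase; the thresholds simply mirror the range filter above.

```r
library(dplyr)

# Made-up values, for illustration only
example_data <- tibble(ozone  = c(32, -5, 48, 0.041),
                       temp_f = c(71, 68, 250, 80),
                       units  = c("PPB", "PPB", "PPB", "PPM"))

# Count the rows that the range filter would drop
bad_counts <- example_data %>%
  summarize(n_bad_ozone = sum(ozone <= 0),
            n_bad_temp  = sum(temp_f >= 199))
bad_counts

# Check which units are present before converting
count(example_data, units)
```

If the counts are larger than you expected, look at those rows before removing them.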
4. View the data again
Look at the data from different angles (e.g. by category, site, county, or facility).
- The plotting function facet_wrap() is great for this.
#
# Are some sites different?
#
# We can facet the data by 'Site' to eliminate any noise
# caused by mixing data from different sites, and learn
# if the pattern between ozone and temperature varies.
ggplot(air_data, aes(x = temp_f, y = ozone)) +
  geom_point(alpha = 0.2, size = 3) +
  geom_smooth(method = "lm") +
  facet_wrap(~site) +
  labs(title = "Ozone increases with temperature",
       subtitle = "Observations from 2015-2017")
5. Summarize the data
# Note: this overwrites air_data with the summary table
air_data <- air_data %>%
  group_by(site, year) %>%
  summarize(avg_ozone = mean(ozone) %>% round(2),
            avg_temp  = mean(temp_f) %>% round(2))
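If your data has missing values, mean() returns NA for any group that contains one. The sketch below, on made-up values, shows one way to handle this with na.rm = TRUE. Whether to ignore missing values is a judgment call for your own analysis, so the row count is kept alongside each average.

```r
library(dplyr)

# Made-up values, for illustration only
example_data <- tibble(site  = c("A", "A", "B", "B"),
                       year  = c(2015, 2015, 2015, 2015),
                       ozone = c(30, 40, NA, 50))

example_summary <- example_data %>%
  group_by(site, year) %>%
  summarize(avg_ozone = round(mean(ozone, na.rm = TRUE), 2),
            n_obs     = n(),          # rows per group, including missing values
            .groups   = "drop")       # return an ungrouped table

example_summary
```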
6. Save the results
Save the final data table
dir.create("results", showWarnings = FALSE)  # create the results folder if needed
write_csv(air_data, "results/2015-17_ozone_summary.csv")
Save the plots
# ggsave() saves the most recently displayed plot
ggsave("results/2015-2017 - Ozone vs Temp.pdf")
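By default ggsave() saves whichever plot was displayed last. If you have made several plots, it can be safer to store the plot in a variable and pass it explicitly. This is a minimal sketch with made-up values; the file name and size are assumptions, and the file is written to a temporary folder so the example does not touch your project.

```r
library(ggplot2)

# Made-up values, for illustration only
example_data <- data.frame(temp_f = c(60, 70, 80, 90),
                           ozone  = c(25, 31, 40, 47))

# Store the plot in a variable instead of only printing it
ozone_plot <- ggplot(example_data, aes(x = temp_f, y = ozone)) +
  geom_point()

# Hypothetical output file in a temporary folder
plot_file <- file.path(tempdir(), "example-ozone-vs-temp.png")

# Save that specific plot; width and height are in inches by default
ggsave(plot_file, plot = ozone_plot, width = 6, height = 4, dpi = 300)
```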