🌾 Week 3: Data Manipulation with dplyr

Master Data Wrangling and Visualization Techniques 🔧

Welcome to Week 3! This week, we explore data manipulation using dplyr and the tidyverse ecosystem - essential tools for organizing, cleaning, and visualizing agricultural data. Learn to filter, select, arrange, and transform data efficiently!

🔧 What You'll Learn This Week

🔍 Data Subsetting - Using brackets and logical conditions
🔗 Pipes (%>%) - Chain operations for readable workflows
📋 filter() - Subset rows based on conditions
✂️ select() - Choose specific columns efficiently
🔄 mutate() - Create new variables and transformations
📊 group_by() + summarize() - Calculate group statistics
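As a rough illustration of the first two items, the sketch below performs the same subset twice: once with base R brackets and once with a dplyr pipeline. It uses the built-in iris dataset; the threshold of 7 and the object names big_base and big_dplyr are arbitrary choices for this example, not taken from the lab notebook.

```r
library(dplyr)

# Base R: rows where Sepal.Length exceeds 7, keeping two columns
big_base <- iris[iris$Sepal.Length > 7, c("Species", "Sepal.Length")]

# dplyr: the same subset written as a pipeline
big_dplyr <- iris %>%
  filter(Sepal.Length > 7) %>%
  select(Species, Sepal.Length)

# Both approaches should return the same number of rows
nrow(big_base)
nrow(big_dplyr)
```

The pipeline reads top to bottom as "take iris, then filter, then select", which is the readability benefit pipes are meant to provide.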

🚀 Getting Started: Step-by-Step Guide

Step 1: Launch Week 3 Binder Environment

Click the "Launch Week 3" button above to start your R environment. Loading takes 2-5 minutes and includes all the packages needed for data manipulation.

Step 2: Navigate to Class Activity 📚

Once Binder loads, you'll see the Jupyter Notebook interface. In the left panel, click on the class_activity folder to access this week's content.

Step 3: Open the Week 3 Lab Notebook 📖

Inside the class_activity folder, double-click on Week3_Data_Manipulation.ipynb to open the interactive lab notebook.

Step 4: Work with Multiple Datasets 📊

This week we'll use multiple datasets including the iris dataset and real-world data! The notebook will guide you through:

🎯 Interactive Learning Tools

Practice with Data Manipulation Tools

Use these interactive tools to understand data manipulation concepts before working with R code:

💡 Tip: Use these tools to visualize data manipulation concepts before applying them in your R notebook!

🧮 Key R Functions This Week

Data Manipulation

filter(data, condition) # Subset rows
select(data, columns) # Choose columns
slice(data, rows) # Select by position
arrange(data, variable) # Sort data
mutate(data, new_var = ...) # Create variables
group_by(data, variable) # Group for analysis
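The verbs listed above are usually combined in a single pipeline. The following is a minimal sketch on the built-in iris dataset; the derived column Petal.Ratio and the object names are made up for illustration and are not part of the lab notebook.

```r
library(dplyr)

# slice(): pick rows by position
first_rows <- iris %>% slice(1:3)

# filter -> mutate -> group_by -> summarize -> arrange in one chain
iris_summary <- iris %>%
  filter(Species != "setosa") %>%                        # keep two species
  mutate(Petal.Ratio = Petal.Length / Petal.Width) %>%   # create a new variable
  group_by(Species) %>%                                  # one group per species
  summarize(
    n = n(),                                             # rows per group
    mean_ratio = mean(Petal.Ratio)                       # group average
  ) %>%
  arrange(desc(mean_ratio))                              # largest ratio first

iris_summary
```

Each step receives the result of the previous one, so the order of the verbs matters: the filter runs before the grouping, and the sort runs on the summarized table rather than on the raw rows.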

Column Selection Helpers

starts_with("Sepal") # Columns starting with "Sepal"
ends_with("Length") # Columns ending with "Length"
contains("Petal") # Columns containing "Petal"
matches(".*Width") # Regular expression matching
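The helpers above only work inside a selecting function such as select(). A short sketch on iris shows each one in context; the result names are arbitrary, and the pattern "Width$" is used here as a slightly tighter alternative to ".*Width" (anchoring the match to the end of the column name).

```r
library(dplyr)

sepal_cols  <- iris %>% select(starts_with("Sepal"))   # Sepal.Length, Sepal.Width
length_cols <- iris %>% select(ends_with("Length"))    # Sepal.Length, Petal.Length
petal_cols  <- iris %>% select(contains("Petal"))      # Petal.Length, Petal.Width
width_cols  <- iris %>% select(matches("Width$"))      # regex: names ending in "Width"

names(sepal_cols)
names(width_cols)
```

By default these helpers ignore case, so starts_with("sepal") would select the same columns.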

Data Cleaning

str_replace_all(text, pattern, replacement) # Clean text
na.omit(data) # Remove missing values
as.integer(vector) # Convert data types
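These three functions are often used together when a numeric column arrives as text. The sketch below assumes a small, entirely hypothetical data frame (scores, with columns school and avg_score); it is not the assignment data, only a stand-in showing one possible cleaning order. str_replace_all() comes from the stringr package.

```r
library(stringr)

# Hypothetical messy column: numbers stored as text, "s" marks a suppressed value
scores <- data.frame(
  school = c("A", "B", "C", "D"),
  avg_score = c("1,020", "s", "985", " 1100 "),
  stringsAsFactors = FALSE
)

# Remove every character that is not a digit (commas, spaces, letters)
digits_only <- str_replace_all(scores$avg_score, "[^0-9]", "")

# Entries with no digits left are treated as missing
digits_only[digits_only == ""] <- NA

# Convert the cleaned text to integers
scores$avg_score <- as.integer(digits_only)

# Drop rows that contain missing values
clean_scores <- na.omit(scores)
clean_scores
```

Stripping non-digits is a blunt rule: it would also turn "10.5" into 105, so a real dataset with decimals would need a more careful pattern.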

📝 Assignment 3: Data Visualization and Analysis

Step 1: Access Assignment Folder 📋

From the main directory, click on the assignment folder to access Assignment 3.

Step 2: Open Assignment 3 Notebook 📄

Double-click on Assignment3.ipynb to open your assignment on data visualization and analysis.

Assignment Overview (20 points total)

📊

Part 1: LA Data Analysis (6 points)

Filter data by gender and create comparative boxplots

🧹

Part 2: SAT Dataset Processing (9 points)

Import, clean, and subset real-world data

📈

Part 3: Distribution Analysis (5 points)

Create stem-and-leaf plots and analyze patterns
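The plotting tasks named in Parts 1 and 3 can be sketched with base R graphics. The data frame below (survey, with columns gender and hours) is a made-up stand-in, since the actual assignment datasets and their column names are not shown here; treat it as an assumption.

```r
library(dplyr)

# Hypothetical stand-in for the assignment data; column names are assumptions
survey <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M", "F", "M"),
  hours  = c(6.5, 7.0, 8.0, 5.5, 7.5, 6.0, 9.0, 6.5)
)

# Filter a single group with dplyr
women <- survey %>% filter(gender == "F")

# Comparative boxplots: one box per gender, drawn side by side
boxplot(hours ~ gender, data = survey,
        main = "Hours by gender", ylab = "Hours")

# Stem-and-leaf display of the full distribution, printed as text
stem(survey$hours)
```

The formula hours ~ gender tells boxplot() to split the numeric variable by the grouping variable, so no manual filtering is needed for the comparison itself.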

Step 3: Work with Real-World Data

The assignment uses multiple datasets:

Learn to handle messy real-world data with non-numeric values and missing information!

🌾 Why This Matters in Agriculture

🌱 Crop Yield Analysis - Filter by variety, location, season
🧪 Soil Sample Processing - Clean mixed numeric/text data
🌧️ Weather Data Analysis - Group by month, calculate averages
🐄 Livestock Performance - Compare treatments, identify outliers
✅ Quality Control - Monitor product consistency over time
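To make one of these use cases concrete, here is a minimal sketch of the weather example: grouping by month and calculating averages. The weather data frame and its values are invented for illustration only.

```r
library(dplyr)

# Hypothetical rainfall readings; values are invented for this example
weather <- data.frame(
  month = c("Jan", "Jan", "Feb", "Feb", "Mar", "Mar"),
  rainfall_mm = c(80, 100, 60, 40, 20, 30)
)

# Average rainfall per month
monthly <- weather %>%
  group_by(month) %>%
  summarize(mean_rainfall = mean(rainfall_mm), .groups = "drop")

monthly
```

Because month is stored as text, the groups come back in alphabetical order; converting it to a factor with ordered levels would keep calendar order.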

Data Manipulation Skills Help You:

💾 Saving Your Work

⚠️ Important: Binder environments are temporary! Always save your work locally.

Download Your Notebook 📥

When you're done working, save your progress:

  1. Save your notebook: File → Save
  2. Download .ipynb file: File → Download
  3. Export HTML/PDF: File → Save and Export Notebook As → HTML

Continue Your Progress Later 🔄

To resume your work:

  1. Launch Binder again
  2. Click Upload button
  3. Upload your saved .ipynb file
  4. Continue where you left off!

📤 Submission Requirements

For Assignment 3, submit TWO files to UC Davis Canvas:

📄

HTML/PDF Report

Your completed assignment with all outputs and analysis

💾

.ipynb File

Your notebook code as backup

Due Date: Check Canvas for assignment deadline

🎯 Learning Objectives

By the end of this week, you will be able to:

✅ Master data subsetting with brackets and logical conditions
✅ Apply dplyr functions for efficient data manipulation
✅ Chain operations using pipes for readable workflows
✅ Select columns using helper functions and patterns
✅ Clean real-world data with mixed data types
✅ Create visualizations to understand data patterns

❓ Need Help?

📧 Contact Information

Mohammadreza Narimani
📧 mnarimani@ucdavis.edu
🏫 Department of Biological and Agricultural Engineering, UC Davis

🔧 Common Issues

📚 Additional Resources

🌟 Tips for Success

💡 Best Practices

⚡ Keyboard Shortcuts

Shift + Enter: Run current cell and move to next
Ctrl + Enter: Run current cell and stay in place
Tab: Auto-complete function names
Ctrl + Shift + M: Insert pipe operator (%>%); this is the RStudio default and may not be bound in Jupyter

🎉 Ready to Start?

Click the Binder badge below to launch Week 3!

Happy data wrangling! 🔧🌾