Individual Lab 3

This lab will review the following concepts:

Reading in a data set

Make sure the datasets library is installed and loaded up.
We will be using the “iris” data set.

Use ggplot to create graphs by :

Changing aesthetics
Adding labels
Using Facets

Pulling variables from a data set
Finding the maximum and minimum values of a variable
Finding the mean and median of the variables.
Comparing values from the species to the entire data set
Turning a Data Frame into a Tibble

Step 1 : Install / load the “tidyverse” package

# Response

Step 2 : Install / load the “datasets” package. Examine the package and determine how many datasets are in the package.

# Response

Step 3 : Copy the dataset “iris” into the variable “iris_data”. Print out iris_data to make sure it is correct.

# Response

We will now start to create a plot one step at a time :

Step 4 : Create a graph using Sepal Length as the x axis and Sepal Width as the y axis, but no points are plotted.

# Response

Step 5 : Add the data points using basic dots.

# Response

Step 6 : Differentiate the dots by coloring the by their species.

# Response

We need to start adding labels.

Step 7 : Add the title “Sepal Length vs Sepal Width”

# Response

Step 8 : Add a subtitle “Wowsers”

# Response

Step 9 : Add better looking labels on the x and y axis. Rename the x axis “Sepal Length” and the y axis “Sepal Width”

# Response

Step 10 : Add a caption at the bottom of the graph that says “Source : Iris Data Set”

# Response

Step 11 : Change the shape of the dots by species

# Response

Step 12 : Change the size of the dots by Petal Length

# Response

Step 13 : Create a facet_grid for each species using Species ~ Petal.Length

# Response

Step 14 : Create a facet wrap. Describe the patterns you see. Is it random? Do the facets follow any kind of a pattern? Linear? Quadratic?

# Response

Step 15 : Let’s describe each species values by finding the maximum and minimum values of the Sepal Length, Sepal Width, Petal Length, and Petal Width, and the means and medians of each species and comparing that to the mean of the entire data set.

Find the mean and median of the variables using two different techniques :

Pulling out the data from the data set, storing it in a different variable name, and then using the mean command.

Ex : name_1 <- dataset[,3]
name_2 <- mean(name_1)

# Response

Use the mean and median command directly on the variable in the data set.
Ex : name_3 <- mean(dataset$variable_name)

# Response

Compare the means of the species to the mean of the entire data set. Which variables had values above the mean? Which had values below the mean?

For example, did the variable Sepal Length of the species virginica have a value larger or smaller than the Sepal Length of the entire data set?

# Response

Determine the maximum and minimum values for each of the variables for the four species.

# Response

Turn the iris_data dataframe into a tibble. Verify your result.

# Response