Individual Lab 3

This lab will review the following concepts:

  1. Reading in a data set
  2. Using ggplot to create graphs one step at a time
  3. Pulling variables from a data set
  4. Finding the maximum and minimum values of a variable
  5. Finding the mean and median of a variable
  6. Comparing values for each species to the entire data set
  7. Turning a Data Frame into a Tibble

Step 1 : Install / load the “tidyverse” package

# Response


Step 2 : Install / load the “datasets” package. Examine the package and determine how many datasets are in the package.
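
One way to examine a package's data sets is sketched below. It relies on base R's data() function, whose result holds a matrix with one row per listed item. The variable names listing and dataset_names are chosen here for illustration. Some items appear as aliases of a larger object, so treat the count as the number of listed items in your R version rather than a fixed figure.

```r
# List the data sets shipped with the "datasets" package.
# data() returns an object whose $results matrix has one row per listed item.
listing <- data(package = "datasets")
dataset_names <- listing$results[, "Item"]

head(dataset_names)     # peek at the first few entries
length(dataset_names)   # number of listed items (varies with the R version)
```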

# Response


Step 3 : Copy the dataset “iris” into the variable “iris_data”. Print out iris_data to make sure it is correct.

# Response


We will now start to create a plot one step at a time :
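
The layering idea behind the next few steps can be sketched as follows. This is a minimal illustration, assuming the built-in mtcars data set as a stand-in (with its columns wt, mpg, and cyl) so that the steps below remain yours to answer. The object names p_empty, p_points, and p_colored are hypothetical.

```r
library(ggplot2)

# Assumption: mtcars stands in for the lab data; it is not the data set used below.
# Layer 1: map variables to the axes. With no geom, only an empty canvas is drawn.
p_empty <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# Layer 2: add a point geom to draw one dot per row.
p_points <- p_empty + geom_point()

# Layer 3: map a categorical variable to color; factor() makes cyl discrete.
p_colored <- p_empty + geom_point(aes(color = factor(cyl)))

print(p_colored)
```

Each `+` adds to the plot object, so a plot can be built up and inspected one piece at a time.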

Step 4 : Create a graph using Sepal Length as the x axis and Sepal Width as the y axis, but no points are plotted.

# Response


Step 5 : Add the data points using basic dots.

# Response


Step 6 : Differentiate the dots by coloring them by their species.

# Response


We need to start adding labels.
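
As a rough sketch, ggplot2's labs() function can set the title, subtitle, axis labels, and caption in a single call. The example assumes the built-in mtcars data set as a stand-in, and the label text is made up for illustration.

```r
library(ggplot2)

# Assumption: mtcars and these label strings are illustrative stand-ins.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    title = "Weight vs Fuel Economy",   # main title above the plot
    subtitle = "Illustrative subtitle", # smaller line under the title
    x = "Weight (1000 lbs)",            # x axis label
    y = "Miles per gallon",             # y axis label
    caption = "Source : mtcars"         # note at the bottom of the plot
  )

print(p)
```

The labels can also be added one at a time with separate labs() calls, which matches the step-by-step approach of this lab.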

Step 7 : Add the title “Sepal Length vs Sepal Width”

# Response


Step 8 : Add a subtitle “Wowsers”

# Response


Step 9 : Add better-looking labels to the x and y axes. Rename the x axis “Sepal Length” and the y axis “Sepal Width”

# Response


Step 10 : Add a caption at the bottom of the graph that says “Source : Iris Data Set”

# Response


Step 11 : Change the shape of the dots by species

# Response


Step 12 : Change the size of the dots by Petal Length
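
Shape and size are aesthetics just like color, so they can be mapped to variables inside aes(). The sketch below assumes the built-in mtcars data set as a stand-in: shape is mapped to a discrete variable and size to a continuous one.

```r
library(ggplot2)

# Assumption: mtcars stands in for the lab data.
# shape needs a discrete variable, so cyl is wrapped in factor();
# size works naturally with a continuous variable such as hp.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(shape = factor(cyl), size = hp))

print(p)
```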

# Response


Step 13 : Create a facet_grid for each species using Species ~ Petal.Length

# Response


Step 14 : Create a facet wrap. Describe the patterns you see. Is it random? Do the facets follow any kind of a pattern? Linear? Quadratic?
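
The difference between the two faceting functions can be sketched as follows, again assuming the built-in mtcars data set as a stand-in. facet_grid() lays panels out in a rows ~ columns grid, while facet_wrap() wraps a single sequence of panels into rows.

```r
library(ggplot2)

# Assumption: mtcars stands in for the lab data.
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# One row of panels per cyl value and one column per gear value.
p_grid <- base_plot + facet_grid(cyl ~ gear)

# One panel per cyl value, wrapped to fit the plotting area.
p_wrap <- base_plot + facet_wrap(~ cyl)

print(p_grid)
print(p_wrap)
```

Faceting on a variable with many distinct values produces one panel per value, so the result can become crowded.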

# Response


Step 15 : Let’s describe each species’ values. Find the maximum and minimum values of the Sepal Length, Sepal Width, Petal Length, and Petal Width, along with the mean and median for each species, and compare those to the mean of the entire data set.

Find the mean and median of the variables using two different techniques :

  1. Pulling out the data from the data set, storing it in a different variable name, and then using the mean command.

Ex : name_1 <- dataset[,3]
name_2 <- mean(name_1)

# Response


  2. Use the mean and median commands directly on the variable in the data set.
    Ex : name_3 <- mean(dataset$variable_name)
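
The two techniques can be compared side by side. This sketch assumes the built-in mtcars data set as a stand-in, where column 4 is hp. The variable names are hypothetical.

```r
# Assumption: mtcars stands in for the lab data; column 4 of mtcars is hp.
cars <- mtcars

# Technique 1: pull the column out by position, store it, then summarize it.
# For a plain data frame, cars[, 4] returns a numeric vector.
hp_values <- cars[, 4]
hp_mean_1 <- mean(hp_values)

# Technique 2: summarize the column directly by name.
hp_mean_2 <- mean(cars$hp)
hp_median <- median(cars$hp)

# Both techniques should give the same mean.
identical(hp_mean_1, hp_mean_2)
```

One caveat: if the data has already been converted to a tibble, `[, 4]` returns a one-column tibble rather than a vector, and mean() will not work on it. In that case `[[4]]` or `$` extracts the vector.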
# Response


  3. Compare the means of each species to the mean of the entire data set. Which variables had values above the mean? Which had values below the mean?

For example, is the mean Sepal Length of the species virginica larger or smaller than the mean Sepal Length of the entire data set?
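
One possible way to set up such a comparison is a grouped summary with dplyr. The sketch below assumes the built-in mtcars data set as a stand-in, with cyl playing the role of the grouping variable. The column names group_mean, group_median, and above_overall are made up for illustration.

```r
library(dplyr)

# Assumption: mtcars stands in for the lab data; cyl is the grouping variable.
overall_mean <- mean(mtcars$mpg)

by_group <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    group_mean = mean(mpg),
    group_median = median(mpg)
  ) %>%
  # Flag the groups whose mean is above the mean of the whole data set.
  mutate(above_overall = group_mean > overall_mean)

print(by_group)
```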

# Response


  4. Determine the maximum and minimum values of each of the four variables for each of the three species.
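
Grouped minimums and maximums for several variables can be computed in one pass with dplyr's across(). This is a sketch under the assumption that the built-in mtcars data set stands in for the lab data, with cyl as the grouping variable.

```r
library(dplyr)

# Assumption: mtcars stands in for the lab data; cyl is the grouping variable.
ranges <- mtcars %>%
  group_by(cyl) %>%
  # Apply both min and max to each selected column.
  # Result columns are named <variable>_<function>, such as mpg_min.
  summarise(across(c(mpg, hp), list(min = min, max = max)))

print(ranges)
```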
# Response


Step 16 : Turn the iris_data data frame into a tibble. Verify your result.
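
A conversion and check might look like the sketch below, which assumes the built-in mtcars data set as a stand-in. Tibbles do not keep row names, so the optional rownames argument is used here to store them in a column called model (a name picked for illustration).

```r
library(tibble)

# Assumption: mtcars stands in for the lab data.
cars_df <- mtcars

# Convert to a tibble, keeping the row names as a regular column.
cars_tbl <- as_tibble(cars_df, rownames = "model")

# Verify the result.
is_tibble(cars_df)    # FALSE: a plain data frame
is_tibble(cars_tbl)   # TRUE: now a tibble
class(cars_tbl)
```

Printing a tibble also shows its dimensions and column types, which gives another quick visual check.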

# Response