Individual Lab 1

In this Lab we are going to go through some of the basics to programming in R. We will look at how to comment your work, basic calculator computations, create a variable, create a vector, how to install a library / package, and functions.

Comments

Comments are a very important part of coding. When you are writing code, you will want to leave notes to yourself and your collaborators to describe what you are doing at each step. You can also leave notes on parts of the code that are norking or that youo feel should be changed. It is a way to remind yourself and your collaborators the work that has been done, why it was done, what is broken, other changes you want to make, and more. They are especially important when you come back to the code after not having looked at it for a while.

Basically, leave as many comments as possible while coding. You should start with the names of those that are working on the project with a synopsis on what is the purpose of the project.

A comment is any text following a hashtag (#) on the same line. This text is ignored by the R compiler and does not affect how the script is run.

Calculator

R can be used as a calculator. Here are the operators :

Addition +

# This is an example of addition (and how to make a comment!).

4 + 8
[1] 12


Subtraction -

# Here is an example of subtraction.

5 - 14
[1] -9


Multiplication *

# Here is an example of multiplication

8 * 17
[1] 136


Division /

# Division

22 / 7
[1] 3.142857


Exponentiation ^

# Exponentiation

5^3
[1] 125


Square Roots and Radicals

Recall that we can use exponents to calculate radicals, too. If you recall, the square root function is the same as raising a value to the (1 / 2) power!

# Here is the square root of 9

9^(1/2)
[1] 3


Notice that there are some levels of estimation / rounding here. For example, calculate the square root of 2

2^(1/2)
[1] 1.414214


Obviously the real answer goes further than six decimal places. Also think about the square root of 2 multiplied by itself. We should get the value 2 right back :

2^(1/2) * 2^(1/2)
[1] 2


But notice what happens if we then subtract 2 from the previous result :

2^(1/2) * 2^(1/2) - 2
[1] 4.440892e-16


This result is in scientific notation, but what does it mean? When you see the “e-16” part, that says to move the deciaml places 16 spots to the left, so this answer is closer to 0.000000000000000440892. Notice that it is not zero, as it should be. This is because of the estimation that was talked about above.

Creating a variable

What is a variable? Imagine dumping some data in a bucket and then giving that bucket a name. Now, every time you use that name, you are really referring to what is in the bucket.

NOTE 1 : Recall our discussion from class about naming your variables. You want to pick a name for your variables that make sense. It will make editing your script much easier in the long run. If I called a variable “Quiz_Scores” then we know what types of values we are working with. If I called the variable “x” instead, what does that tell us about the variable itself?

We can use an “arrow” to assign a value into a variable. An arrow is just a less than sign followed by a dash: <-

For example, what if I wanted to assign the value 3 into a variable called “x” and the value 7 into a variable called “y”

# Assign 3 to the variable "x"

x <- 3

# Assign 7 to the variable "y"

y <- 7


NOTE 2 : At this point you should look at the ENVIRONMENT window. This window shows you the variables you are using and the values they contain.

Now that we have values in these variables, we can now use them in our script. For example, what if I wanted to perform some basic calculations with these variables :

Addition of two variables : x + y

# We are going to calculate x + y. 

# Recall that x = 3 and y = 7 from above, so this should return 10

x + y
[1] 10


We could do the same for several different operations :

# Example 1 : Subtract two variables :

y - x
[1] 4
# Example 2 : Multiply a variable by a constant. We could multiply the variable x by 9 to
# get 9 * 3 = 27

9*x
[1] 27
# Example 3 : Take a linear combination of two different variables. For example, we can
# multiply your by 2 and x by 3 and add them together to get 
# 2*7 + 3*3 = 13 + 9 = 23 

2*y + 3*x
[1] 23
# Example 4 : Use one variable as an exponent for another variable. In this case, take the 
# variable x and raise it to the power y. In this case, we are computing 3 ^ 7
# which is 3 * 3 * 3 * 3 * 3 * 3 * 3 = 2187:

x^y
[1] 2187


Assigning Operations to Variables
We could also take the result from an operation and assign that to a different variable. For exampe, we could multiply x by 7 and multiply y by 2, subtract them and store the result in a new variable z.

z <- 7*x - 2*y


If you want to print out what is now in the variable z, you can now see it in the ENVIRONMENT window. You can also type it out in the script :

# Print out the variable z

z
[1] 7


Recall from class that there are different kinds of variables we might be asked to consider. We could have QUANTITATIVE data that could be in the form of continuous or discrete values. Another kind of data we talked about was CATEGORICAL data. Depending on the type of data you are analyzing, you would need to perform different operations.

Create a vector

Let’s assume the class takes a quiz and I want to keep track of them. Instead of creating a variable for each individual quiz, I can create one vector that will have all of the scores.

For example, if I have the quiz scores 10, 5, 8, 9, 4 then I could create the variables Student_1_Quiz, Student_2_Quiz, etc. It is much easier to create a single variable that holds all of these values.

We will use the following command : c( )

The “c” is shorthand for “concatenate” which means to link objects together in a chain or series.

If I wanted to create a vector for the quiz scores above, I would create a vector called Quiz1_Scores and assign the variables as follows :

Notice the order is important. If we want to assign the first student a 10, the second a 5, the third an 8, the fourth a 9, and the fifth a 4, then we would do the following :

# Create a vector and call it "Quiz1_Scores" 

Quiz1_Scores <- c(10, 5, 8, 9 ,4)


Notice how this is represented in the ENVIRONMENT window :

# We can print out the variable to check what it contains

Quiz1_Scores     
[1] 10  5  8  9  4

This tells us the scores are located in spots 1, 2, 3, 4, 5 in the vector. In this vector, the values in the spots are then shown as the values : 10 5 8 9 4

Let’s say Student 3 comes to see us and ask about their quiz grade. We can then access the individual value as follows :

Quiz1_Scores[3]
[1] 8


If we wanted the fifth value in the vector we could say :

Quiz1_Scores[5]
[1] 4


If we wanted to print out the 3rd, 4th, and 5th scores, we could say :

Quiz1_Scores[3:5]
[1] 8 9 4


Obviously we would want to make sure we enter in the data in an appropriate order because if I rearrange the order I get a completely different vector :

Quiz1_Scores_B <- c(5, 10, 4, 9, 8)


While I have the same scores, they are located in different spots of the vectors and would assign different values to Student 1, Student 2, etc.

We could also use characters in our vectors. We would need to make sure we use quotation marks so the compiler does not think we are using other variables in our vector.

Students <- c("Alice", "Bob", "Chad", "Debbie", "Eric")

Check out what is now in the ENVIRONMENT window :

# Print out the Students vector :

Students
[1] "Alice"  "Bob"    "Chad"   "Debbie" "Eric"  


The only real difference from above is the we are using characters instead of numeric values and that is identified because of the “chr” notation in the description.

We can look at the individual entries jsut as we did above. To look at the fourth entry in the vector we would type :

Students[4]
[1] "Debbie"


If you are given a variable or vector and want to know what type of values it contains, you can use the “class” command to tell you. Here are some examples to check entire vectors or individual locations in a vector :

class(Quiz1_Scores)
[1] "numeric"
class(Quiz1_Scores[2])
[1] "numeric"
class(Students)
[1] "character"
class(Students[1])
[1] "character"
class(Students)
[1] "character"


What happens if we mix our variables and have numbers and characters in the same vector?

Here is how one could be created :

blah <- c(4, "dffdg", 6, 9, "trte")


How does R interpret these values?

class(blah)
[1] "character"


Notice that it considers EVERY entry in this vector to be a CHARACTER even though we entered some as numbers. Be careful with this as it could cause issues on how we work and interact with this vector.

Libraries and Packages

When we start up a session of R, there are some commands that are already built into the program that we can use. For instance, we used the basic mathematics operations above.

You will eventually want to do a deeper analysis of the data that needs a command that is not already installed in your current session of R. This is where the idea of Libraries or Packages comes into play.

If there is a command we want to use that is not currently loaded into R, we can install the package that includes the command.

You can see what is loaded already by clicking on the PACKAGES tab.

You will see the packages that are loaded up as they will have a check mark indicated they have been installed.

Let’s say there was something in the “tcltk” library I wanted to use. I could then click on the check box for “tcltk” and a message should come up in the Console showing that the package was installed.

> library(tcltk, lib.loc = “/opt/R/4.3.1/lib/R/library”)

We could remove the package by unclicking on the check box. We get a confirmation in the Console :

> detach(“package:tcltk”, unload = TRUE)

What happens if we need a package that is not included in the list. There are hundreds of packages that people have developed to use in R.

For example, consider the tidyverse library. This is a library that contains several commands that we will be using over the semester.

If we know we are going to be using a specific library in our R script, then we should install it at the top of the script. We would enter in a command such as :
install.packages(“tidyverse”)

Once we do this, you will see several different commands that are now available for us to use to analyze our data set.

If you look at the Packages tab, you will see several new packages that we can add to use in the program.

tidyverse comes with several packages. If you click on the tidyverse package, you will see several packages installed :

> library(tidyverse)

── Attaching core tidyverse packages ────────────────────── tidyverse 2.0.0 ──

✔ dplyr 1.1.2 ✔ readr 2.1.4

✔ forcats 1.0.0 ✔ stringr 1.5.0

✔ ggplot2 3.4.2 ✔ tibble 3.2.1

✔ lubridate 1.9.2 ✔ tidyr 1.3.0

✔ purrr 1.0.1

These packages are now loaded into R. Note that is we uncheck the tidyverse package, these are still loaded into R. We can remove them by unchecking their package. For example, if I uncheck the ggplot2 package you will see the following message :

detach(“package:ggplot2”, unload = TRUE)

When you are starting to become a programmer it might be tempting just to load up EVERYTING, but that is not good practice. When these packages are loaded up, they are taking up memory. This can lead to slower computation times as well as lead to larger files being generated, wasting space. Try to be efficient and load up what you need and avoid bloat.

Functions

There are two kinds of functions we are going to deal with in class - the ones that are built into R and ones we create ourselves. This lesson will only consider the functions that are built in.

A FUNCTION is a command that takes in some kind of data, manipulates it, and returns a value. It has the following form :

FUNCTION_NAME( data )

This will simply print out the result. Remember we could also assign the result to a variable.

result <- FUNCTION_NAME ( data)

For example, go back to the quiz grades we had listed earlier. What if I wanted to calculated the average (mean) of the quiz scores? I could enter the following :

mean(Quiz1_Scores)
[1] 7.2


I could assign the result to a variable, such as :

# Calculate the average

Quiz1_Average <- mean(Quiz1_Scores)

# Print out the average

Quiz1_Average
[1] 7.2


There are several other built in functions. Here are a few :

min( ), max( ), mean( ), median( ), sum( ), range( ), abs( )

If we wanted the highest quiz grade, we could say :

# Find the maximum value :

max(Quiz1_Scores)
[1] 10


If I wanted to know how many values are in my vector, I could say :

# Use the length function :

length(Quiz1_Scores)
[1] 5


If I wanted to pick a random number from 1 - 100, I could type the following

sample(100,1)
[1] 2


If I wanted to pick three random numbers (all different) from 1 - 100, we could say this :

sample(100,3)
[1]  9 89 45


If I wanted to pick seven random numbers from 1 - 100 where we could have (but not guarantee) duplicate values, I could say this :

sample(100, 7, replace=TRUE)
[1] 41  6 17 49 50 72 11


To create a random vector for a range of values, we can use sample function. We just need to pass the range and the sample size inside the sample function.

For example, if we want to create a random sample of size 20 for a range of values between 1 to 100 then we can use the command sample(1:100,20) and if the sample size is larger than 100 then we can add replace=TRUE as shown in the below examples.

# Create random sample of 20 values from 1 - 100

x1 <- sample(1:100,20)

# Print out the result

x1
 [1] 87 76 24 12 61  4 29 70 64 16 55 30 48 10 36 80 11 38 96 54


# Create a sample of 200 values from 1 - 100. Obviously there will be repeats!

x2 <- sample(1:100,200,replace=TRUE)

# Print out the sample

x2
  [1]  18   9  62  17   5  86 100   7  39  54  45  58  59  86  45  61  62  48
 [19]  20  60  83   1  55  72  77  40  34  11  91  10  94 100  86  40  66  43
 [37]  90  20  24  55  35   1  30  14  81  28  89  18   3  15  49  23  24  18
 [55]   9  98  49  31  28  64   1  78  97  24   6  73  97  77  33  31  46   3
 [73]   2  96  69  98  73  22  16  20  81  78  70  87  29  96  93  68  64  55
 [91]   6  68  85  52   3  83  32  77  13  34  76  98  65  50   8  79  72  38
[109]  47   4   9   1  17   7  89   1  67  47  37  34  80  56  63  54   5  72
[127]   5  98  83  91  57  25  90  57  76  49  64  70  45  61  18  32  39  56
[145]  16  79  11   2  64  85  52  84  69  69  53  28  43  80  22  18  74  13
[163]  11  56  97  23  25  20  44   4  27  58  71  12  71  86  66 100  98  60
[181]  35  29  17  85  26  33  59  58  82  36  39  65  98  22   7  52  68  77
[199]  81  32


There are far too many built in functions to list. You may have to use a book or The Google to help you find an appropriate one to use.

Lastly, make sure that the data you enter into the function makes sense. For example, what if I tried to find the average of the names of the students we put into the vector “Students” :

mean(Students)
Warning in mean.default(Students): argument is not numeric or logical:
returning NA
[1] NA


You will see that this returns an error :

[1] NA

Warning message:

In mean.default(Students) :

argument is not numeric or logical: returning NA

PRACTICE PROBLEMS

  1. Calculate the following, printing the outputs to the console :
# Responses
  1. Assign the following to the vector x1 : 125 + 15 `# Print out x1 to the console to check your work.
# Response
  1. Assign the sentence “Transylvania Pioneers - 2023 national Champions in Womens Basketball” to the variable Sentence-1. Print out Sentence-1 to check your work.
# Response
  1. Create a vector called “Friends” that has the names of your friends. Add 7 names to the list. Check your vector by printing it out.
# Response
# Response
  1. Create a vector called “Family” and another vector called Ages” that has the ages of your family. Make sure the locations of the ages match up with the Family vector. Check your vector by printing it out.
# Response
  1. For each of the previous four vectors (x1, Sentence-1, Friends, Ages), determine the CLASS of each vector.
# Response
  1. Install the package “hms”
# Response
  1. Create a vector of 50 random numbers between 200 - 700 without replacement and store it in a vector called Random1. Print out the values of the vector.
# Response
  1. Create a vector of 50 random numbers between 30 - 50 with replacement and store it in a vector called Random2. Print out the values
# Response
  1. Find the average of the values in Random1 and Random 2 (Avg1 and Avg2) and print them out
# Response
  1. Write a single line of code that will create a vector named R1 that will select a random number of values (from 1 to 100) from a range of numbers from 500 - 700.
# Response
  1. Create three more vectors (R2, R3, R4) with vectors of size 10, 1000, 100000 that will select a sample from a range of numbers from 500 - 700.
# Response
  1. Think about picking numbers at random from 500 - 700. If you take the average of these values, what number do you think should you be close to? Why?
# Response
  1. Look at your repeated averages. What were the ranges of scores for the samples of size 10? size 1000? size 100000?
# Response
  1. Based on this data, what do you think will happen to the average if we take even larger samples? Why?
# Response