Why R?

  • Free and open source.
  • Runs on Windows, macOS, and Linux.
  • Great extensibility.
  • Can integrate with other languages (C, C++, Java, Python) and packages (SAS, SPSS).
  • Strong support for statistics and data visualization (graphs and images).
  • Graphical user interfaces (GUIs) are available.
  • R and Python are both open-source programming languages with a large community.
  • R is mainly used for statistical analysis while Python provides a more general approach to data science.

RStudio

  • The window is divided into 4 panes, one in each quadrant.
  • Pane sizes can be changed using the splitters or the buttons at the top.
  • RStudio is an integrated development environment (IDE): a software application that provides comprehensive facilities to computer programmers for software development.
  • Tools provided by an IDE include a text editor, a project editor, a toolbar, and an output viewer.
  • IDEs can perform a variety of functions (write code, compile code, debug code, and monitor resources).

Packages

  • An R package is a collection of R functions, compiled code, and sample data.
  • Packages are stored under a directory called “library” in the R environment.
  • By default, R installs a set of packages during installation. More packages can be added later, when they are needed for a specific purpose.
  • Packages are task specific (developed for a particular set of tasks).
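
The points above can be tried out with a short sketch. It assumes a standard R installation; the package name in the commented-out lines (ggplot2) is only an illustration and is not taken from these notes.

```r
# Directories ("libraries") where R looks for installed packages
lib.paths <- .libPaths()
print(lib.paths)

# Packages currently attached to the search path
attached.packages <- .packages()
print(attached.packages)

# An add-on package is installed once and then loaded in each session.
# install.packages("ggplot2")   # illustrative package name; needs internet access
# library(ggplot2)

# 'stats' ships with R, so loading it works on any installation
library(stats)
stats.loaded <- "stats" %in% .packages()
stats.loaded
```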

Built-in Help Commands

  • R has help mechanisms (help, demo, vignette) that work on the documentation installed on your computer.
  • R function names are case-sensitive.
  • Press Ctrl+Enter or click Run to execute the current command.

help()

  • Use help(term) or ?term if you know the exact name of the topic to look up.
  • Use help.search("term") or ??term if you don’t know the exact search term.
  • Use example() to run the examples given in a help page (rnorm here). They give an idea of the function’s usage, and because they are mostly reproducible you can also copy and paste them to run them yourself.

Codes for help command

  help(rnorm)            # search for the exact term
  ?rnorm                 # shortcut for help(rnorm)
  example(rnorm)         # run the examples from this help page
  help.search("rnorm")   # search for similar (not exact) terms
  ??rnorm                # shortcut for help.search("rnorm")

demo()

  • Demos are like examples but tend to be longer. Instead of focussing on a single function, they show how to weave together multiple functions to solve a problem.
  • Demos are feature specific and are not available for all functions or features.
  • demo(graphics) passes a topic name (graphics here) to the demo function. Optionally, the package name can also be given as a parameter. The console prompts you to press Enter to start; press Enter again to see each subsequent plot.

Codes for demo command

  demo()                                           # list demos in all attached packages (those with checked boxes in the Packages pane)
  demo(package = .packages(all.available = TRUE))  # list demos in all installed packages
  demo(package = "graphics")                       # list demos in a specific package (graphics here)
  demo("graphics")                                 # run the graphics demo; press Enter in the console to step through the plots
  demo("graphics", package = "graphics")           # same demo, with the package named explicitly

vignette()

  • Demos are feature specific. To learn about a package as a whole, we can use vignettes.
  • A vignette covers all or part of the functionality of a package.
  • Each vignette provides three things: the original source file, a readable HTML page or PDF, and a file of R code.
  • A vignette is a short to mid-size document covering a brief introduction, background, usage, results, and references.

Codes for vignette command

  vignette()                                           # list vignettes in all installed packages (use vignette(all = FALSE) for attached packages only)
  vignette(package = .packages(all.available = TRUE))  # list vignettes in all installed packages, naming the packages explicitly
  vignette(package = "parallel")                       # list vignette topics in the parallel package
  vignette("parallel")                                 # open the vignette (PDF) for the topic parallel
  vignette("parallel", package = "parallel")           # same vignette, with the package named explicitly

Web Search

  • If the information is not available on the local machine, we can search the web in either of the following two ways.
  • Using an R command from the console (limited search).
  • Using any popular search engine.

Web search using R command

  • The RSiteSearch command opens a web search from the R console.
  • It searches for keywords or phrases in help pages, vignettes, and task views.
  • It uses the search engine of the R project, available at the following link:
  • http://search.r-project.org
  • After you click Search, the key terms are highlighted in red in the results.
  • Search results can be filtered by target (function, vignette, or task view).
  • Search results can be sorted as needed. By default they are sorted by score, which reflects the total number of occurrences of the key term in the search result.
  • Search results can contain all or any one of the search terms, in any order.
  • To search for an exact phrase, with the key terms occurring together, put the search terms in curly braces.
  • install.packages("sos") installs a package (sos here).
  • library(sos) loads the package into memory; this is required before we can use any function from the sos package.
  • findFn is a search function from the sos package; once the package is loaded we can call it by name.
  • In the findFn results, the Description and Link column (last column) links to the actual help page.
  • The Package column shows the name of the package containing the help page.
  • The Function column contains the help page name.
  • The Count column gives the total number of matches in the current package.
  • The Score column holds the score values coming from the RSiteSearch function output.
  • The Maxscore column is the maximum score among the help pages in the corresponding package.
  • The Totalscore column is the sum of all scores for a particular package.
  • The maxPages parameter can be set to reduce the number of search pages, that is, to see only the top pages.
  • ??? is a shortcut for the findFn function.
  • Count is package specific (the number of links, help pages, or functions in the package that contain the key term).
  • Score is specific to a link or help page (occurrences of the key term in that page).
  • Example: for the search term “inheritance”, the packages are like three books on Python, genetics, and social science (count=3). Help pages correspond to the chapters in each book, and the number of occurrences of the key term in each chapter gives that chapter’s score.

Codes for web search

  RSiteSearch("arithmetic mean")              # search http://search.r-project.org
  RSiteSearch("{arithmetic mean}")            # exact search: key terms must occur together
  install.packages("sos")                     # install the sos package
  library(sos)                                # load the sos package
  findFn("{arithmetic mean}")                 # search with findFn; results are shown as a table
  findFn("{arithmetic mean}", maxPages = 2)   # maxPages limits the search to the top pages
  ???"{arithmetic mean}"(2)                   # shortcut for findFn with the maxPages parameter

Communities

Mailing List

  • R-help for primary help.
  • R-devel for code development in R.
  • R-sig-finance, R-sig-hpc for special interest groups.
  • http://bit.ly/mailingR

Forums

  • http://bit.ly/stackoverflowR
  • http://bit.ly/crossvalidatedR (statistics and data-mining questions in R)
  • http://bit.ly/nabbleR

Blogs

  • http://bit.ly/Rbloggers
  • http://bit.ly/revolutionanalyticsR
  • http://bit.ly/rstatistics
  • http://bit.ly/rdataminig
  • http://bit.ly/RPostingGuide (guidelines to post queries)

Object

  • An object, in object-oriented programming (OOP), is an abstract data type created by a developer. A simple example of an object may be a user account created for a website.
  • Objects have states and behaviors. Example: A dog has states – color, name, breed as well as behaviors – wagging the tail, barking, eating.
  • An object is an instance of a class.
  • A class is a template or blueprint for creating future objects (a particular data structure). It describes the initial values for state (member variables or attributes) and the implementations of behavior (member functions or methods).
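
The dog example above can be sketched in R. R has several class systems (S3, S4, Reference Classes); this minimal sketch uses S3, and the names new.dog, bark, and my.dog are hypothetical, chosen only for illustration.

```r
# Constructor: builds an object of the (hypothetical) class "dog"
new.dog <- function(name, breed, color) {
  dog <- list(name = name, breed = breed, color = color)  # state
  class(dog) <- "dog"
  dog
}

# A generic function plus a method for class "dog" (behavior)
bark <- function(x, ...) UseMethod("bark")
bark.dog <- function(x, ...) paste(x$name, "says: Woof!")

my.dog <- new.dog("Rex", "Beagle", "brown")  # an instance of the class
class(my.dog)
bark(my.dog)
```

Here the list fields hold the state, and the method bark.dog supplies the behavior for every object whose class is "dog".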

Variable

  • A variable is a placeholder or container that can hold any object.
  • We use the assignment operator (<-), with a space before and after it to improve readability, to assign a value to a variable (X):
  • X <- 10
  • R variable names are case-sensitive (X is different from x).

Function

  • A function is a group of statements that together perform a task.
  • A function can be seen as a box that takes some values and performs some actions.
  • A function is an object. The R interpreter passes control to the function, along with the arguments the function needs to accomplish its actions.
  • The function in turn performs its task and returns control to the interpreter, along with any result, which may be stored in other objects.
  • A function is a piece of code that is called independently by name.
  • A method is also a piece of code called by name, but it is associated with an object.
  • Functions exist in both object-oriented (OOP) and non-object-oriented languages (C, Fortran, COBOL, Pascal).
  • Methods exist only in object-oriented languages (C++, Java).
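
A minimal sketch of the ideas above: a function is defined, called by name, and passed around as an object. The name square is hypothetical and used only for illustration.

```r
# A function is an object and can be assigned to a name
square <- function(x) {
  x ^ 2                 # the value of the last expression is returned
}

is.function(square)     # TRUE
result <- square(4)     # control passes to the function; the result comes back
result                  # 16

# Because functions are objects, they can be passed to other functions
squares <- sapply(c(1, 2, 3), square)
squares                 # 1 4 9
```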

Parameter & Argument

  • Parameters are components of functions. They identify the values that are passed into a function.
  • A parameter is a variable in the declaration of a function, while an argument is the actual value of this variable that gets passed to the function.
  • Passing an argument to a function means setting the value of the corresponding parameter.
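
The distinction can be illustrated with a small sketch. The function body.mass.index and its parameter names are hypothetical, made up for this example.

```r
# 'weight' and 'height' are parameters (names in the function declaration)
body.mass.index <- function(weight, height = 1.70) {
  weight / height ^ 2
}

names(formals(body.mass.index))   # the parameters: "weight" "height"

# 70 and 1.75 are arguments: actual values bound to the parameters
bmi.named   <- body.mass.index(weight = 70, height = 1.75)  # matched by name
bmi.ordered <- body.mass.index(70, 1.75)                    # matched by position
bmi.default <- body.mass.index(70)                          # height uses its default value
```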

Creating a variable in a new (custom) environment

  • An environment can be thought of as a bag that stores variables.
  • It is used in lexical scoping.
  • When we create a variable x, it is created in the global environment by default.
  • We can also create a variable in a custom environment (whose parent is the global environment). The same variable name can have different values in different environments.
  • To confirm that a variable we created belongs to the global environment, we can use the get command:
  • get("x", globalenv())
  • To create a new environment (named my.environment), we can use the command:
  • my.environment <- new.env()
  • We can check the parent of this new custom environment with the following R function:
  • parent.env(my.environment)
  • We can create a variable in this new custom environment in 3 different ways:
  • assign("x", 10, my.environment)
  • my.environment[["x"]] <- 10
  • my.environment$x <- 10 (quotes around the variable name are optional)
  • We can read each of the above commands as “assign the value 10 to the variable x in the environment named my.environment”.

Code for creating a variable in a new (custom) environment

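  my.environment <- new.env()                           # create a new environment
  assign("comprehensive.marks", 120, my.environment)    # way 1: assign()
  my.environment[["comprehensive.marks"]] <- 120        # way 2: [[ ]] with a quoted name
  my.environment$x <- 10                                # way 3: $ (quotes around the name are optional)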

Getting a variable from an environment

  • We can use the get command to read a variable by passing the name of the variable and the environment as parameters:
  • get("x", my.environment)
  • my.environment[["x"]]
  • my.environment$x (quotes around the variable name are optional)

Code for getting a variable from an environment

  get("comprehensive.marks", my.environment)
  my.environment[["comprehensive.marks"]]
  my.environment$comprehensive.marks              # quotes around the variable name are optional

  comprehensive.marks <- 100                      # assign a variable in the global environment
  comprehensive.marks                             # get the variable from the global environment
  get("comprehensive.marks", globalenv())         # confirm it is in the global environment
  my.environment <- new.env()                     # create a new custom environment
  parent.env(my.environment)                      # check the parent environment
  # Assign a variable in the new custom environment in 3 different ways
  # (all 3 lines can be selected and run together).
  assign("comprehensive.marks", 120, my.environment)
  my.environment[["comprehensive.marks"]] <- 120
  my.environment$x <- 10                          # quotes around the variable name are optional
  # Get a variable in 3 different ways.
  get("comprehensive.marks", my.environment)
  my.environment[["comprehensive.marks"]]
  my.environment$comprehensive.marks              # quotes around the variable name are optional
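
The remark that environments are used in lexical scoping can be illustrated with a closure. This is a minimal sketch; make.counter and the counter names are hypothetical and do not appear elsewhere in these notes.

```r
make.counter <- function() {
  count <- 0                 # lives in the environment created by this call
  function() {
    count <<- count + 1      # modifies 'count' in the enclosing environment
    count
  }
}

counter.a <- make.counter()
counter.b <- make.counter()  # a separate environment with its own 'count'

counter.a()
counter.a()
a.value <- counter.a()       # third call on counter.a
b.value <- counter.b()       # first call on counter.b

# The same variable name holds different values in the two environments
get("count", envir = environment(counter.a))
get("count", envir = environment(counter.b))
```

Each call to make.counter creates a new environment, so the two counters do not interfere with each other.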

Naming convention

  • Rules for a syntactically valid name:
    • It consists of letters, numbers, dots, or underscore characters.
    • It starts with a letter, or with a dot that is not followed by a number. For example, goodName, good.Name, good_Name, and .goodName are valid names.
    • .4goodName is an invalid name.
    • Reserved keywords (which have a semantic definition in R) cannot be used as names, for example function, for, if, else, while, next, repeat, TRUE, FALSE, NULL, NA, NaN.
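
One way to check these rules is the base R function make.names(), which turns any string into a syntactically valid name. The sketch below treats a name as valid when make.names() leaves it unchanged; the candidate names beyond those listed above are added only for illustration.

```r
candidate.names <- c("goodName", "good.Name", "good_Name", ".goodName",
                     ".4goodName", "4goodName", "if", "TRUE")

# A name is syntactically valid if make.names() leaves it unchanged
is.valid.name <- candidate.names == make.names(candidate.names)
names(is.valid.name) <- candidate.names
is.valid.name

# Invalid names are repaired, here by prepending "X"
repaired <- make.names(".4goodName")
repaired        # "X.4goodName"
```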

Google’s R Style Guide

  • The guide aims at variable names that are not only syntactically valid but also consistent and readable.
  • File Names
    • File names should end in .R and be meaningful.
    • GOOD: predict_ad_revenue.R
    • BAD: foo.R
  • Identifiers
    • Variable names should have all lower case letters and words separated with dots (.). Don’t use underscores (_) or hyphens (-).
    • variable.name
      GOOD: avg.clicks
      BAD: avg_Clicks, avgClicks
    • Function names have initial capital letters and no dots (CapWords). Make function names verbs.
    • FunctionName
      GOOD: CalculateAvgClicks
      BAD: calculate_avg_clicks, calculateAvgClicks
      Exception: When creating a classed object, the function name (constructor) and class should match (e.g., lm).
    • Constants are named like functions but with an initial k: kConstantName
  • http://web.stanford.edu/class/cs109l/unrestricted/resources/google-style.html

How to assign a variable

  • By using the assignment operator (a less-than sign followed by a dash). Use “Run” to execute this line; the execution is shown in the console at the bottom.
  • To see the content of a variable, we can simply execute the variable name.
  • comprehensive.marks <- 100
  • By using the assign() function. The assign function can take several parameters. In the example below, the first parameter is the variable name and the second parameter is the value we want to assign to it.
  • assign("comprehensive.marks", 100)
  • Whenever we create a variable, it also appears in the Environment tab in the right panel.

Codes for assigning a variable

  comprehensive.marks <- 100     # assign a value using the assignment operator
  comprehensive.marks            # print variable content
  assign("match.score", 500)     # assign a value using the assign function
  match.score                    # print variable content

Operator

  • Operators are similar to built-in mathematical functions, but the two differ syntactically.
    • They are pillars of any programming language.
    • They work on one or more objects called “operands”.
    • They return a result.
    • Two types of operators:
      • Arithmetic operators operate on numeric values.
      • Logical operators operate on Boolean or logical values (TRUE or FALSE).
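
The similarity between operators and functions can be made concrete: in R an operator can also be called in function form by quoting it with backticks. A minimal sketch, with variable names chosen only for illustration:

```r
12 + 4             # usual infix form
`+`(12, 4)         # the same call written in function form

sum.infix  <- 12 + 4
sum.prefix <- `+`(12, 4)

# Operators can be passed around like any other function
totals <- mapply(`+`, c(1, 2, 3), c(10, 20, 30))   # 11 22 33

# Logical operators work on logical operands
both.true <- `&`(TRUE, FALSE)                      # FALSE
```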

Mathematical Operators in R

  12 + 4                                # addition
  12 - 4                                # subtraction
  12 * 4                                # multiplication
  12 / 4                                # division
  12 ^ 2                                # exponentiation (caret symbol)
  12 ** 2                               # exponentiation (double asterisk)
  10 ^ 3
  format(10 ^ 3, scientific = TRUE)     # with scientific notation
  format(10 ^ 3, scientific = FALSE)    # without scientific notation (returns a string)
  12 %% 5                               # modulus
  12 %/% 7                              # integer division
https://www.statmethods.net/management/operators.html

In-built Mathematical Functions in R

  abs(-7)                        # absolute value
  log(2)                         # natural logarithm
  log(2, base = 10)              # logarithm to base 10
  exp(5)                         # exponential (e ^ 5)
  factorial(5)                   # factorial
  sqrt(625)                      # square root
  round(3.07822, digits = 2)     # 3.08
  signif(3.07822, digits = 2)    # 3.1
  ceiling(3.07822)               # 4
  floor(3.07822)                 # 3
  # Other mathematical functions are listed in the R documentation.

Special constant

  pi                     # special constant pi
  options()              # get global options
  options(digits = 4)    # print up to 4 significant digits (pi is shown as 3.142)

Special Numbers in R

  • Special numbers allow calculations or analyses to continue (or terminate gracefully) in adverse situations, such as an overflow condition, and prevent the program from crashing abruptly.
  • Positive and negative infinity (Inf and -Inf)
    • Represent overflow conditions.
    • Stand for a number that is too big or too small to be handled by the computer.
  • NaN (Not a Number or undefined)
    • Represents a value that is not a real number.
    • The result doesn’t make any mathematical sense.
  • NA (Not Available or missing)
    • A missing value, not available in the data set.
    • In R, NaN counts as NA (is.na(NaN) is TRUE), but the converse is not true (is.nan(NA) is FALSE).
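
A short sketch of how these special values let an analysis continue instead of crashing. The marks vector is made-up data used only for illustration.

```r
marks <- c(80, NA, 95)            # one score is missing (hypothetical data)

mean(marks)                       # NA: the missing value propagates
mean.without.na <- mean(marks, na.rm = TRUE)   # 87.5, the NA is dropped first

too.big   <- 2 ^ 1024             # exceeds the largest double, so R returns Inf
undefined <- 0 / 0                # NaN: the result is mathematically undefined

is.na(marks)                      # FALSE TRUE FALSE
```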

Special Numbers in R

  1 / 0                # positive infinity
  -20 / 0              # negative infinity
  Inf + 5              # operation on Inf
  -Inf + 5             # operation on -Inf
  is.finite(1 / 0)     # check if finite (FALSE)
  is.infinite(1 / 0)   # check if infinite (TRUE)
  is.finite(0 / 1)     # check if finite (TRUE)
  Inf / Inf            # Not a Number (NaN)
  Inf - Inf            # Not a Number (NaN)
  is.nan(Inf - Inf)    # check if NaN (TRUE)
  NA + 4               # missing value (NA): operation on a missing value
  NA - 4               # missing value (NA)
  NA * 4               # missing value (NA)
  NA / 4               # missing value (NA)
  is.na(NA + 4)        # check if NA (TRUE)
  is.nan(NA)           # is NA treated as NaN? (FALSE)
  is.na(NaN)           # is NaN treated as NA? (TRUE)
  • R has six basic (‘atomic‘) vector types: logical, integer, real, complex, string (or character) and raw.
  • A real number is any positive or negative number, or zero. This includes all integers and all rational and irrational numbers. Rational numbers can be expressed as a fraction (such as 7/8), and irrational numbers have an infinite, non-repeating decimal representation (3.1415926535…). Real numbers written with a decimal point are also called floating point numbers, since the decimal point “floats” between the digits.
  • Zero is the neutral integer that sits between the positive and negative numbers on a number line. It is an integer, but not necessarily a natural number.
  • Natural numbers are the positive integers 1, 2, 3, etc., and sometimes zero as well.
  • https://techterms.com/definition/realnumber

Logical Operators

  • Comparison operators (such as < and ==) work not only on numerical values but also on character strings.
  • Each comparison gives a Boolean result.
  • The logical NOT operator simply inverts the value inside the parentheses.
  • The logical OR operator works on two logical expressions; if either of the two operands evaluates to true, the output is TRUE.
  • For logical AND, the result is TRUE only when both operands (logical expressions) evaluate to true.
https://www.tutorialkart.com/r-tutorial/r-operators/
https://methodenlehre.github.io/SGSCLM-R-course/the-r-language.html#logical-operators-and-functions
https://methodenlehre.github.io/SGSCLM-R-course/the-r-language.html
  6 < 3           # less than
  6 <= 3          # less than or equal to
  6 > 3           # greater than
  6 >= 3          # greater than or equal to
  6 == 3          # equal to
  6 != 3          # not equal to
  "b" > "a"       # comparing characters (TRUE)
  "e" < "b"       # comparing characters (FALSE)
  !(TRUE)         # logical NOT: inverts the value inside the parentheses (FALSE)
  TRUE | FALSE    # logical OR: TRUE when either operand is TRUE (TRUE)
  TRUE & FALSE    # logical AND: TRUE only when both operands are TRUE (FALSE)
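
The same operators also work element by element on whole vectors. The sketch below uses made-up marks and adds the base R helpers any() and all(), plus the single-value operator &&, which are not covered in the list above.

```r
marks <- c(35, 60, 85)                    # hypothetical marks

passed      <- marks >= 40                # FALSE TRUE TRUE
distinction <- marks >= 80                # FALSE FALSE TRUE

passed & !distinction                     # element-wise AND: FALSE TRUE FALSE
passed | distinction                      # element-wise OR:  FALSE TRUE TRUE

any.distinction <- any(distinction)       # TRUE: at least one element is TRUE
all.passed      <- all(passed)            # FALSE: not every element is TRUE

# && and || compare single values and stop as soon as the result is known
single.check <- (length(marks) > 0) && (marks[1] >= 40)   # FALSE
```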

Data Structure

  • A data structure defines the way in which data is organized and stored in memory.
  • A data structure is a collection of data elements grouped under one name.
  • It can be seen as a container holding data elements. The choice of data structure depends on the answers to these questions:
    • What type of items go in (homogeneous or heterogeneous)?
    • How are they arranged? Different arrangements produce different data structures (list, vector, factor, data frame, matrix, array).
  • We can use the str() function to see the structure of an object.
  • The is.numeric() function tests whether an object is a numeric vector.
  • Most real-world data is either of a basic (atomic) class (character, numeric, integer, logical, and complex) or an object that can be built from these basic classes.
  • In R we use a capital “L” as a suffix to explicitly mark a numeric value without a decimal part as an integer.
  • Elements without the “L” suffix are treated as double values, which makes the vector a numeric vector.
  • Integers are numeric values without decimal parts. An integer vector is also a numeric vector, but the converse may not be true.
  • Data Frames: A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
  • Factors: Factors are the data objects which are used to categorize the data and store it as levels. They can store both strings and integers. They are useful in data analysis for statistical modeling.
  • Matrices: Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout. A Matrix is created using the matrix() function. 
  • Arrays: Arrays are the R data objects which can store data in more than two dimensions. It takes vectors as input and uses the values in the dim parameter to create an array.
  • List: Lists are R objects that can contain elements of different types, such as numbers, strings, vectors, and even another list.
  • Vector: A vector is a sequence of data elements of the same basic type.
  • Source: https://www.edureka.co/blog/r-tutorial/
Figure: Atomic classes of objects (source: https://www.edureka.co/blog/r-tutorial/)

Figure: Various data structures in R (source: http://www.simonqueenborough.info/R/basic/data-types.html)
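
As a rough illustration of the data structures described in this section, the sketch below creates one of each with base R functions. All variable names and values are made up for the example.

```r
marks.vector  <- c(51, 62, 73, 84)                     # vector: same basic type
grade.factor  <- factor(c("B", "A", "A", "C"))         # factor: categories stored as levels
marks.matrix  <- matrix(1:6, nrow = 2, ncol = 3)       # matrix: 2 rows x 3 columns
marks.array   <- array(1:24, dim = c(2, 3, 4))         # array: three dimensions
student.list  <- list(name = "amar",
                      marks = marks.vector,
                      passed = TRUE)                   # list: elements of different types
student.frame <- data.frame(name  = c("amar", "bob"),
                            marks = c(51, 62))         # data frame: one variable per column

str(student.list)      # structure of the list
str(student.frame)     # structure of the data frame
dim(marks.array)       # 2 3 4
```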

Atomic Vector

  • Commonly known as “Vector”.
  • Homogeneous data structure.
  • One-dimensional arrangement of elements.
  • Most of the basic functions and operators in R are vectorized.
  • Vectorized operators work on one or more vectors to give a result.
  • A vector is a sequence of data elements of the same basic type (numeric, character, or logical).
  • Members of a vector are called components.
  • We use the combine function c() to create a vector by combining discrete values.
  • Here is a vector containing the three numeric values 2, 3 and 5: c(2, 3, 5).
  • The c() function coerces non-character values to character type if one of the elements is a character.
  • The syntax to access a vector a at index (position) i is a[i].
  • If we try to access elements outside the index range, we get the value NA.
  • We can access multiple elements of a vector using a vector of indices. The returned value is also a vector.
  • We can exclude elements from the returned vector by giving the indices of those elements with a negative sign.
  • We can also provide TRUE and FALSE values in an index vector. Where there is TRUE, the element is included in the result; where there is FALSE, it is left out.
  • For two vectors with length greater than one, arithmetic operations are performed element-wise (one to one).
  • We can perform arithmetic operations (like addition, subtraction, multiplication and division) on vectors.
  • Ideally the two vectors are of the same length and type, or the second operand is a single value of the same type.
  • If the two vectors have different lengths, recycling happens: the elements of the shorter vector are reused to match the length of the longer vector.
https://www.wikitechy.com/tutorials/r-programming/r-datatypes-vectors
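
Recycling is easiest to see when the lengths do not divide evenly. In this minimal sketch (vector names are illustrative), the second addition would normally print a warning that the longer object length is not a multiple of the shorter one; it is wrapped in suppressWarnings() only to keep the example quiet.

```r
long.vector <- c(10, 20, 30, 40)

# Length 2 divides length 4: the shorter vector is reused silently
even <- long.vector + c(1, 2)                          # 11 22 31 42

# Length 3 does not divide length 4: R still recycles (1, 2, 3, 1) but warns
uneven <- suppressWarnings(long.vector + c(1, 2, 3))   # 11 22 33 41
```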

Create a vector

  students.marks <- c(10, 20, 30, 40)               # create a vector using the combine function
  students.marks                                    # print variable
  # [1] 10 20 30 40
  students.grades <- c("pass", 50, "B", "TRUE")     # create a vector from different element types
  students.grades                                   # print (non-character values are converted to character)
  # [1] "pass" "50"   "B"    "TRUE"

Atomic Vector Script

  student.names <- c("amar", "bob", "chap", "don")    # create a character vector (using the combine function)
  student.names                                       # print vector content
  # [1] "amar" "bob"  "chap" "don"
  str(student.names)                                  # get vector structure (four elements indexed [1:4])
  #  chr [1:4] "amar" "bob" "chap" "don"
  is.character(student.names)                         # test for a character vector
  # [1] TRUE
  student.weights <- c(70.1, 70.2, 70.3, 70.4)        # create a numeric vector (using the combine function)
  student.weights                                     # print vector content
  # [1] 70.1 70.2 70.3 70.4
  str(student.weights)                                # get vector structure
  #  num [1:4] 70.1 70.2 70.3 70.4
  is.numeric(student.weights)                         # test for a numeric vector
  # [1] TRUE
  student.marks <- c(51L, 52L, 53L, 54L)              # create an integer vector (using the combine function)
  student.marks                                       # print vector content
  # [1] 51 52 53 54
  str(student.marks)                                  # get vector structure
  #  int [1:4] 51 52 53 54
  is.integer(student.marks)                           # test for an integer vector
  # [1] TRUE
  is.numeric(student.marks)                           # an integer is a numeric value without a decimal part
  # [1] TRUE
  is.integer(student.weights)                         # a numeric vector may not be an integer vector
  # [1] FALSE
  student.maths.interest <- c(T, TRUE, F, FALSE)      # create a logical vector (using the combine function)
  student.maths.interest                              # print vector content
  # [1]  TRUE  TRUE FALSE FALSE
  str(student.maths.interest)                         # get vector structure
  #  logi [1:4] TRUE TRUE FALSE FALSE
  is.logical(student.maths.interest)                  # test for a logical vector
  # [1] TRUE
  complex.vector <- c(10+2i, -10+2i, 10-2i)           # create a complex vector (using the combine function)
  complex.vector                                      # print vector content
  # [1]  10+2i -10+2i  10-2i
  str(complex.vector)                                 # get vector structure
  #  cplx [1:3] 10+2i -10+2i 10-2i
  is.complex(complex.vector)                          # test for a complex vector
  # [1] TRUE
  vector("character", length = 4)                     # create a character vector with the vector command (default: empty string)
  # [1] "" "" "" ""
  vector("numeric", length = 4)                       # create a numeric vector with the vector command (default: 0)
  # [1] 0 0 0 0
  vector("integer", length = 4)                       # create an integer vector with the vector command (default: 0)
  # [1] 0 0 0 0
  vector("logical", length = 4)                       # create a logical vector with the vector command (default: FALSE)
  # [1] FALSE FALSE FALSE FALSE
  vector("complex", length = 4)                       # create a complex vector with the vector command (default: 0+0i)
  # [1] 0+0i 0+0i 0+0i 0+0i

Subsetting or Access vector component

  • Subsetting is the process of extracting one or more elements from a data structure (a vector here).
  • R uses 1-based indexing.
  • Use [] to access an element.

Subsetting/Extraction/Accessing vector elements

  students.marks[3]                                   # access the element at the third position (index)
  # [1] 30
  students.marks[5]                                   # access an element outside the index range
  # [1] NA
  students.marks[1:3]                                 # access multiple elements in a sequence using the colon operator (1 through 3 here)
  # [1] 10 20 30
  students.marks[c(T, F, T, F)]                       # access multiple elements with a logical vector
  # [1] 10 30
  students.marks[c(1, 4)]                             # access multiple elements with an index vector
  # [1] 10 40
  student.names <- c("amar", "bob", "chap", "don")    # create a vector using the combine function
  student.names[students.marks >= 20]                 # access multiple elements with a logical condition
  # [1] "bob"  "chap" "don"
  indices <- c(1, 3)                                  # create a vector of indices
  students.marks[indices]                             # access multiple elements using the vector of indices
  # [1] 10 30
  neg.indices <- c(-1, -3)                            # create a vector of negative indices to exclude elements
  students.marks[neg.indices]                         # the first and third elements are left out
  # [1] 20 40
  bool.indices <- c(TRUE, FALSE, TRUE, FALSE)         # create a logical index vector (TRUE, FALSE)
  students.marks[bool.indices]                        # use TRUE and FALSE as indices
  # [1] 10 30
  bool.indices.short <- c(T, F, T, F)                 # create a logical index vector (T and F)
  students.marks[bool.indices.short]                  # access elements
  # [1] 10 30
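
Elements can also be selected by name when the vector is named, and which() reports the positions that satisfy a condition. This is a supplementary sketch using base R only; the named vector marks is made-up data that mirrors the students.marks example.

```r
marks <- c(amar = 10, bob = 20, chap = 30, don = 40)   # named vector (hypothetical data)

marks["bob"]                          # select one element by name
marks[c("amar", "don")]               # select several elements by name
high.marks <- marks[marks >= 20]      # a logical condition keeps the names
positions  <- which(marks >= 20)      # positions of the elements that satisfy the condition
marks[-length(marks)]                 # everything except the last element
```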

Coercions/ Typecasting/ Type conversion

  • Coercion means converting one type to another.
  • Whenever data types are mixed, coercion occurs.
  • Typecasting can be implicit or explicit (and may or may not be sensible).
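
Implicit coercion follows a fixed order: logical values become integers, integers become doubles, and everything becomes character when a string is present. A minimal sketch using only base R (the variable name mixed is illustrative):

```r
typeof(c(TRUE, 2L))        # "integer":   logical -> integer
typeof(c(2L, 3.5))         # "double":    integer -> double
typeof(c(3.5, "a"))        # "character": double  -> character

mixed <- c(TRUE, 2L, 3.5, "a")
mixed                      # "TRUE" "2" "3.5" "a"

# Explicit coercion uses the as.*() family of functions
as.integer("42")               # 42
as.logical(c("TRUE", "yes"))   # TRUE NA ("yes" is not a recognized logical string)
```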

Coercion script

  student.weights <- c(50.1, 50.2, 50.3, 50.5)        # create a numeric (double) atomic vector
  str(student.weights)                                # print data structure
  #  num [1:4] 50.1 50.2 50.3 50.5
  student.weights <- c(50.1, 50.2, 50.3, "50.5")      # implicit coercion due to mixing of data types
  str(student.weights)                                # print converted data structure (now a character vector)
  #  chr [1:4] "50.1" "50.2" "50.3" "50.5"
  as.numeric(student.weights >= 50.3)                 # convert logical values (TRUE, FALSE) to numeric values (1, 0); the comparison itself is made on strings here
  # [1] 0 0 1 1
  as.integer(student.weights)                         # convert to an integer vector (the decimal part is dropped)
  # [1] 50 50 50 50
  as.character(student.weights)                       # convert to character (the vector is already character at this point)
  # [1] "50.1" "50.2" "50.3" "50.5"
  as.logical(student.weights)                         # convert the character strings to logical values (not a sensible coercion)
  # [1] NA NA NA NA
  student.names <- c("amar", "bob", "chap", "don")    # create a character atomic vector
  as.numeric(student.names)                           # convert names to numeric values (not a sensible coercion)
  # [1] NA NA NA NA
  # Warning message:
  # NAs introduced by coercion

R Vector Arithmetic & Logical Operations

  students.test.marks <- c(1, 2, 3, 4)
  add1.student.marks <- students.marks + 2                          # arithmetic is performed element-wise (one to one)
  add1.student.marks                                                # print variable
  # [1] 12 22 32 42
  add2.student.marks <- students.marks + students.test.marks        # arithmetic is performed element-wise (one to one)
  add2.student.marks                                                # print variable
  # [1] 11 22 33 44
  sub1.student.marks <- students.marks - 2                          # vector subtraction
  sub1.student.marks                                                # print variable
  # [1]  8 18 28 38
  sub2.student.marks <- students.marks - students.test.marks        # vector subtraction
  sub2.student.marks                                                # print variable
  # [1]  9 18 27 36
  mul1.students.marks <- students.marks * 2                         # vector multiplication
  mul1.students.marks                                               # print variable
  # [1] 20 40 60 80
  mul2.students.marks <- students.marks * students.test.marks       # vector multiplication
  mul2.students.marks                                               # print variable
  # [1]  10  40  90 160
  div1.students.marks <- students.marks %/% 2                       # vector integer division
  div1.students.marks                                               # print variable
  # [1]  5 10 15 20
  div2.students.marks <- students.marks %/% students.test.marks     # vector integer division
  div2.students.marks                                               # print variable
  # [1] 10 10 10 10
  students.less.marks <- c(1, 2)                                    # create a vector with a different length
  recycle.students.less.marks <- students.marks + students.less.marks   # elements of the shorter vector are recycled
  recycle.students.less.marks                                       # print variable
  # [1] 11 22 31 42
  students.marks >= 20                                              # vector logical operation
  # [1] FALSE  TRUE  TRUE  TRUE

Vectorized Operations

  1. Vectorized operations: Flavor I: Input= single vector, Output= Scalar(vector of length 1)
  2. mean(students.marks) # print variable
  3. [1] 25
  4. Vectorized operations: Flavor II: Input= single vector, Output= single vector
  5. student.marks <-students.marks + 1 #arithmatic operator
  6. student.marks # print the content of new vector students.marks
  7. [1] 11 21 31 41
  8. student.marks >= 12 # logical operator
  9. [1] FALSE TRUE TRUE TRUE
  10. sqrt (students.marks) # print the content of variable
  11. [1] 3.162278 4.472136 5.477226 6.324555
  12. Vectorized operations: Flavor III: Input= multiple vector, Output= single vector
  13. total.students.marks <- students.marks + students.test.marks # addition
  14. total.students.marks # print the content of vector
  15. [1] 11 22 33 44
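
To see what "vectorized" saves us from writing, here is a small sketch comparing an explicit loop with the vectorized form of flavor III. The vector values are assumed (inferred from the outputs above), and loop.total and vector.total are names made up for this comparison:

```r
# Assumed values, inferred from the outputs printed above
students.marks <- c(10, 20, 30, 40)
students.test.marks <- c(1, 2, 3, 4)

# Explicit loop: add the vectors one position at a time
loop.total <- numeric(length(students.marks))
for (i in seq_along(students.marks)) {
  loop.total[i] <- students.marks[i] + students.test.marks[i]
}

# Vectorized form: one expression, same element-wise result
vector.total <- students.marks + students.test.marks

identical(loop.total, vector.total) # TRUE
```

Both forms give the same vector; the vectorized one is shorter and is the usual style in R.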

Factor

  • A special case of vector used to store nominal/categorical values.
  • “Categorical” means the field can take values from only a few categories.
  • “Gender” is a categorical variable because only a limited set of categories is available.
  • Factors are often a better fit than character vectors for categorical data: values are restricted to a fixed set of levels, so case differences (character values are case-sensitive) cannot silently create new categories. Internally each value is stored as an integer code, and integer comparisons (filtering) are generally cheaper than string comparisons.
  • Factors are self-describing, in contrast to integer vectors.
  • We can create a factor by wrapping the output of the combine function c() in the factor() function. Printing the factor also shows its levels (categories).
  • Explicit coercion of a factor to a numeric vector with as.numeric() returns the integer codes of the implicit levels. By default, levels are ordered alphabetically.
  • Explicit levels can be supplied when we want to order the levels ourselves: the first argument to factor() is the vector of values built with c(), and the second argument, levels, is another c() call listing the levels in the desired order.
  • Besides ordering the levels, we can also declare additional levels so that the factor covers every possible category in the problem, even ones not present in the data.

Script for Factor

  1. student.gender <- c("Male", "Male", "Female", "Female") # character vector to represent Female & Male categories
  2. student.gender # print categories
  3. [1] "Male" "Male" "Female" "Female"
  4. student.gender <- c(2L, 2L, 1L, 1L) # integer vector representing Male as 2 and Female as 1
  5. student.gender # print categories
  6. [1] 2 2 1 1
  7. student.gender <- factor(c("Male", "Male", "Female", "Female")) # create factor (wrap the c() output in factor())
  8. student.gender # print categories with levels (values are not wrapped in quotes)
  9. [1] Male Male Female Female
  10. Levels: Female Male
  11. as.numeric(student.gender) # explicit coercion of factor to numeric vector (returns the codes of the implicit levels)
  12. [1] 2 2 1 1
  13. student.blood.groups <- factor(c("A", "B", "O", "O")) # create factor (wrap the c() output in factor())
  14. student.blood.groups # print categories along with levels (no quotes)
  15. [1] A B O O
  16. Levels: A B O
  17. student.blood.groups <- factor(c("A", "B", "O", "O"), levels = c("A", "B", "AB", "O")) # create explicit levels, including the additional level "AB"
  18. student.blood.groups # print categories along with all declared levels
  19. [1] A B O O
  20. Levels: A B AB O
  21. str(student.blood.groups) # inspect structure of factor with additional level
  22. Factor w/ 4 levels "A","B","AB","O": 1 2 4 4
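
A short sketch of two things that follow from the script above: declared-but-unused levels still show up when counting, and as.numeric() returns level codes rather than the labels. The student.scores factor is a hypothetical example added here, not part of the original script:

```r
# Same factor as in the script above, with the extra level "AB"
student.blood.groups <- factor(c("A", "B", "O", "O"),
                               levels = c("A", "B", "AB", "O"))

levels(student.blood.groups)   # the four declared levels, in the given order
nlevels(student.blood.groups)  # 4

# table() counts every declared level, so "AB" appears with a count of 0
blood.group.counts <- table(student.blood.groups)
blood.group.counts

# Hypothetical factor whose labels look like numbers
student.scores <- factor(c("30", "10", "20"))

# as.numeric() gives the integer codes of the sorted levels, not the labels
score.codes <- as.numeric(student.scores)                 # 3 1 2
# Converting through character first recovers the numbers themselves
score.values <- as.numeric(as.character(student.scores))  # 30 10 20
```

The second half is a common pitfall: when a factor holds numeric-looking labels, convert with as.character() before as.numeric().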

List

  • Heterogeneous data structure: can contain items of different classes.
  • One-dimensional arrangement of elements.
  • With a list we can retrieve all the information about one student, stored in vectors of different types, using a single command.
  • To create a list, we first create several vectors storing different types of information. We then create an unnamed list with the list() function, passing each desired element by indexing the corresponding vector with square brackets.
  • We can also create a named list by specifying a name for each argument. In the printed output, each element's name appears after a dollar sign.
  • We can use the str() function to get the structure of the created list.
  • The members of a list can be single values of an atomic class, whole vectors, or even other lists.
  • We can extract one or more elements from a list using square brackets.
    • A single bracket returns an object of the same type as the one we are subsetting: subsetting a vector gives a vector, and subsetting a list gives a list.
    • A double bracket returns the extracted element itself, in its own type.
    • The typeof() function gives the type of the extracted element.
    • We can use names (wrapped in quotes) instead of index numbers to extract elements from a named list.
    • We can also use the dollar sign to extract an element from a named list, without square brackets or quotes around the element name.
    • The length() function gives the total number of elements in the list.

Script for List

  1. student.names <- c("amar", "bindu", "chris", "don") # character vector
  2. student.weights <- c(70.1, 70.2, 70.3, 70.5) # numeric (double) vector
  3. student.gender <- factor(c("male", "female", "female", "male")) # factor
  4. student.eng.marks <- c(51L, 52L, 53L, 55L) # integer vector
  5. student.maths.marks <- c(61L, 62L, 63L, 65L) # integer vector
  6. student1 <- list(student.names[1], student.weights[1], student.gender[1], student.eng.marks[1], student.maths.marks[1]) # create unnamed list
  7. str(student1) # get the structure of the created list
  8. List of 5
  9. $ : chr "amar"
  10. $ : num 70.1
  11. $ : Factor w/ 2 levels "female","male": 2
  12. $ : int 51
  13. $ : int 61
  14. student1 # print content of unnamed list
  15. [[1]]
  16. [1] "amar"
  17. [[2]]
  18. [1] 70.1
  19. [[3]]
  20. [1] male
  21. Levels: female male
  22. [[4]]
  23. [1] 51
  24. [[5]]
  25. [1] 61
  26. student1 <- list(name = student.names[1], weight = student.weights[1], gender = student.gender[1], english = student.eng.marks[1], maths = student.maths.marks[1]) # create named list
  27. str(student1) # get the structure of the created list
  28. List of 5
  29. $ name : chr "amar"
  30. $ weight : num 70.1
  31. $ gender : Factor w/ 2 levels "female","male": 2
  32. $ english: int 51
  33. $ maths : int 61
  34. student1 # print content of named list
  35. $name
  36. [1] "amar"
  37. $weight
  38. [1] 70.1
  39. $gender
  40. [1] male
  41. Levels: female male
  42. $english
  43. [1] 51
  44. $maths
  45. [1] 61
  46. student1 <- list(name = student.names[1], weight = student.weights[1], gender = student.gender[1], marks.eng.maths = c(student.eng.marks[1], student.maths.marks[1])) # list containing a vector as an element
  47. str(student1) # get the structure of the list containing a vector as an element
  48. List of 4
  49. $ name : chr "amar"
  50. $ weight : num 70.1
  51. $ gender : Factor w/ 2 levels "female","male": 2
  52. $ marks.eng.maths: int [1:2] 51 61
  53. student1 # print content of the list
  54. $name
  55. [1] "amar"
  56. $weight
  57. [1] 70.1
  58. $gender
  59. [1] male
  60. Levels: female male
  61. $marks.eng.maths
  62. [1] 51 61
  63. student2 <- list(student.names[2], student.gender[2], student.weights[2], student.eng.marks[2], student.maths.marks[2]) # create unnamed list
  64. student2 # print unnamed list
  65. [[1]]
  66. [1] "bindu"
  67. [[2]]
  68. [1] female
  69. Levels: female male
  70. [[3]]
  71. [1] 70.2
  72. [[4]]
  73. [1] 52
  74. [[5]]
  75. [1] 62
  76. str(student2) #get structure of created list
  77. List of 5
  78. $ : chr "bindu"
  79. $ : Factor w/ 2 levels "female","male": 1
  80. $ : num 70.2
  81. $ : int 52
  82. $ : int 62
  83. typeof(student2) # get type of created object
  84. [1] "list"
  85. student2[2] # extract element from unnamed list using single square bracket
  86. [[1]]
  87. [1] female
  88. Levels: female male
  89. typeof(student2[2]) # get type of extracted object (same as the object it was extracted from)
  90. [1] "list"
  91. student2[[2]] # extract element from unnamed list using double square bracket
  92. [1] female
  93. Levels: female male
  94. typeof(student2[[2]]) # get type of extracted object (type of the element itself; a factor is stored as integer codes)
  95. [1] "integer"
  96. student2[1:3] # extract multiple elements from unnamed list using single square bracket
  97. [[1]]
  98. [1] "bindu"
  99. [[2]]
  100. [1] female
  101. Levels: female male
  102. [[3]]
  103. [1] 70.2
  104. student3 <- list(name = student.names[3], gender = student.gender[3], weight = student.weights[3], eng = student.eng.marks[3], maths = student.maths.marks[3]) # create named list
  105. student3["gender"] # extract element from named list using element name wrapped in quotes
  106. $gender
  107. [1] female
  108. Levels: female male
  109. student3$gender # extract element from named list using dollar sign, without quotes
  110. [1] female
  111. Levels: female male
  112. student3[c("gender", "name", "weight")] # extract multiple elements from named list using the combine function to create a character vector
  113. $gender
  114. [1] female
  115. Levels: female male
  116. $name
  117. [1] "chris"
  118. $weight
  119. [1] 70.3
  120. length(student3) #get the total number of elements in the list
  121. [1] 5
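
The notes mention that a list can itself be a member of another list, which the script above does not show. A minimal sketch, using a hypothetical inner list student.marks.list and literal values in place of the indexed vectors:

```r
# Hypothetical inner list holding the marks of one student
student.marks.list <- list(eng = 53L, maths = 63L)

# Outer list with a character, a double and the inner list as elements
student3 <- list(name = "chris", weight = 70.3, marks = student.marks.list)

str(student3)                   # the structure shows the nested list under $ marks

student3$marks$maths            # chain dollar signs to reach the inner element: 63
student3[["marks"]][["eng"]]    # the same idea with double brackets: 53

typeof(student3["marks"])       # "list" (single bracket keeps the outer list type)
typeof(student3[["marks"]])     # also "list", because the element itself is a list
length(student3)                # 3 (the inner list counts as one element)
```

Note that length() counts only top-level elements, so the nested marks are not counted separately.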

Data frames

  • Data frames are among the most widely used data structures in R.
  • Heterogeneous: can contain elements of different classes (like a list).
  • 2-dimensional arrangement (unlike a list).
  • Data frames are like spreadsheets; each column represents a field and the values are stored in rows.
  • To get the details of all the students in the class, we can pass our atomic vectors to the data.frame() function.
  • However, each object passed to data.frame() should have an equal number of elements.
  • Data frames are actually a type of list (typeof() returns "list") in which each element is a vector and all vectors have equal length. The vector variable names become the field/column names and the values are stored in the data rows.
  • Head = top line with field/column/vector names
  • Cell = each value in a data row
  • Character string values are converted to factors by default in R versions before 4.0.0; since R 4.0.0 they stay as character vectors by default.
  • To control this behaviour explicitly, set the Boolean parameter stringsAsFactors (FALSE keeps strings as characters, TRUE converts them to factors).

Script for Data frame

  1. student.names <- c("amar", "bindu", "chris", "don") # character vector
  2. student.gender <- factor(c("male", "female", "female", "male")) # factor
  3. student.weights <- c(70.1, 70.2, 70.3, 70.5) #numeric (double)vector
  4. student.eng.marks <- c(51L, 52L, 53L,55L) #integer vector
  5. student.maths.marks <- c( 61L, 62L, 63L, 65L) #integer vector
  6. all.students <- data.frame(student.names, student.gender, student.weights, student.eng.marks, student.maths.marks) # create a data frame named all.students
  7. all.students # print content of created data frame
  8. typeof(all.students) # get type of created data frame
  9. [1] "list"
  10. str(all.students) # get structure of created data frame (output below is from R before 4.0.0, where strings become factors by default)
  11. 'data.frame': 4 obs. of 5 variables:
  12. $ student.names : Factor w/ 4 levels "amar","bindu",..: 1 2 3 4
  13. $ student.gender : Factor w/ 2 levels "female","male": 2 1 1 2
  14. $ student.weights : num 70.1 70.2 70.3 70.5
  15. $ student.eng.marks : int 51 52 53 55
  16. $ student.maths.marks: int 61 62 63 65
  17. all.students <- data.frame(student.names, student.gender, student.weights, student.eng.marks, student.maths.marks, stringsAsFactors = FALSE) # use stringsAsFactors = FALSE to avoid conversion of character vectors to factors
  18. all.students # print content of created data frame
  19. typeof(all.students) # get type of created data frame
  20. [1] "list"
  21. str(all.students) # get structure of created data frame
  22. 'data.frame': 4 obs. of 5 variables:
  23. $ student.names : chr "amar" "bindu" "chris" "don"
  24. $ student.gender : Factor w/ 2 levels "female","male": 2 1 1 2
  25. $ student.weights : num 70.1 70.2 70.3 70.5
  26. $ student.eng.marks : int 51 52 53 55
  27. $ student.maths.marks: int 61 62 63 65
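
Because the default for string conversion differs between R versions, passing stringsAsFactors explicitly makes a script behave the same everywhere. A small sketch with two columns only; students.chr and students.fct are names invented for this comparison:

```r
student.names <- c("amar", "bindu", "chris", "don")  # character vector
student.eng.marks <- c(51L, 52L, 53L, 55L)           # integer vector

# Keep strings as character, regardless of the R version's default
students.chr <- data.frame(student.names, student.eng.marks,
                           stringsAsFactors = FALSE)

# Convert strings to factors, regardless of the R version's default
students.fct <- data.frame(student.names, student.eng.marks,
                           stringsAsFactors = TRUE)

sapply(students.chr, class)  # class of each column: character, integer
sapply(students.fct, class)  # class of each column: factor, integer

dim(students.chr)            # 4 rows, 2 columns
```

The parameter name is case-sensitive: a misspelt name such as StringsAsFactors is not recognised as the option and is treated as an extra column instead.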

Script for operations on Data frame

  1. student.names <- c("amar", "bindu", "chris", "don") # create character vector
  2. student.gender <- factor(c("male", "female", "female", "male")) # factor
  3. student.weights <- c(70.1, 70.2, 70.3, 70.5) # numeric (double) vector
  4. student.eng.marks <- c(51L, 52L, 53L, 55L) # integer vector
  5. student.maths.marks <- c(61L, 62L, 63L, 65L) # integer vector
  6. # create data frame with stringsAsFactors = FALSE to avoid conversion of character vectors to factors
  7. all.students <- data.frame(student.names, student.gender, student.weights, student.eng.marks, student.maths.marks, stringsAsFactors = FALSE)
  8. all.students #print content of created data frame
  9. #subsetting data frame (first element means first column from data frame)
  10. all.students [1] #single square bracket to return object of same type (from which we extract)
  11. student.names
  12. 1 amar
  13. 2 bindu
  14. 3 chris
  15. 4 don
  16. typeof(all.students[1])
  17. [1] "list"
  18. all.students[[1]] # double square bracket to return object in its own type (element type)
  19. [1] "amar" "bindu" "chris" "don"
  20. typeof(all.students[[1]])
  21. [1] "character"
  22. # use column name instead of index to extract it
  23. all.students["student.weights"] # single square bracket to return object of same type
  24. student.weights
  25. 1 70.1
  26. 2 70.2
  27. 3 70.3
  28. 4 70.5
  29. typeof(all.students["student.weights"])
  30. [1] "list"
  31. all.students[["student.weights"]] # double square bracket to give object of its own type
  32. [1] 70.1 70.2 70.3 70.5
  33. typeof(all.students[["student.weights"]])
  34. [1] "double"
  35. all.students$student.weights # dollar sign (with column name) to return object in its own type
  36. [1] 70.1 70.2 70.3 70.5
  37. typeof(all.students$student.weights)
  38. [1] "double"

Script to extract multiple elements (slice of data frame)

  1. all.students
  2. all.students[1:3] # colon (sequence) operator to extract a slice of the first through third columns
  3. all.students[c("student.weights", "student.names")] # combine function to extract two columns together
  4. all.students[3, 2] # [row, column]: extract the individual cell located at the third row and second column
  5. [1] female
  6. Levels: female male
  7. all.students[1:3, 1:2] # colon (sequence) operator to extract consecutive rows and columns (first three rows and first two columns)
  8. all.students[c(1, 2, 3), c(1, 2)] # combine function to pick the rows and columns by index
  9. all.students[, 3] # unspecified row number (get all rows for the third column)
  10. [1] 70.1 70.2 70.3 70.5
  11. all.students[1, ] # unspecified column number (get all columns for the first row)
  12. all.students[c(TRUE, FALSE, FALSE, TRUE), ] # logical vector to pick the required rows of all columns
  13. all.students[, c(TRUE, FALSE, FALSE, FALSE, TRUE)] # logical vector to pick the required columns for all rows
  14. all.students[c(TRUE, FALSE, FALSE, TRUE), 1:2] # extraction using both a logical vector (rows) and the sequence operator (columns)
  15. all.students[all.students$student.gender == "female", ] # logical condition to get details of only female students
  16. all.students[all.students$student.eng.marks == 51, ] # students with exactly 51 marks in English
  17. all.students[all.students$student.eng.marks > 51, ] # students with more than 51 marks in English
  18. all.students[all.students$student.eng.marks >= 53, 1:2] # first two columns for students with at least 53 marks in English
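
Two follow-ups to the slicing script, as a sketch: combining conditions with the element-wise & operator, and keeping a single column as a data frame with drop = FALSE. The data frame is rebuilt here with three columns so the block runs on its own; female.high, eng.vector and eng.frame are illustrative names:

```r
all.students <- data.frame(
  student.names = c("amar", "bindu", "chris", "don"),
  student.gender = factor(c("male", "female", "female", "male")),
  student.eng.marks = c(51L, 52L, 53L, 55L),
  stringsAsFactors = FALSE
)

# Combine two row conditions with & (element-wise AND); columns are
# referenced through the data frame, not through separate global vectors
female.high <- all.students[all.students$student.gender == "female" &
                              all.students$student.eng.marks >= 53, ]
female.high   # only the row for "chris" satisfies both conditions

# Selecting one column with [, j] simplifies the result to a plain vector...
eng.vector <- all.students[, 3]
class(eng.vector)   # "integer"

# ...while drop = FALSE keeps it as a one-column data frame
eng.frame <- all.students[, 3, drop = FALSE]
class(eng.frame)    # "data.frame"
```

Referencing columns as all.students$column keeps the filter correct even if the original vectors are later changed or removed from the workspace.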

Matrix

  • 2-dimensional arrangement (similar to data frames and spreadsheets) but homogeneous in nature (unlike data frames): a matrix can store elements of one type only.
  • Typically, matrices are used to store and process numeric data.
  • rbind() function for row-wise binding of a matrix. The variable names become the row names; the columns have no names and are printed with index labels of the form [,1], [,2] (a comma followed by the index number).
  • cbind() function for column-wise binding of a matrix. The variable names become the column names (the head); the rows have no names and are printed with index labels of the form [1,], [2,] (the index number followed by a comma).
  • We can customize the row names by placing rownames() applied to the matrix on the left-hand side of the assignment operator and a character vector built with the combine function on the right-hand side.
  • The str() function can be used to inspect the structure of the created matrix.
  • dimnames (dimension names) is a list which contains the row names and the column names.
  • A matrix can also be constructed with the matrix() function. The first parameter is the data, here an integer vector built with the combine function; the parameters nrow and ncol designate the number of rows and columns respectively. No row or column names are set, so the matrix is printed with index labels.
  • When matrix() is used, the elements are arranged column-wise by default: the first column is filled first, then the second column, and so on. Setting byrow = TRUE fills the matrix row by row instead.
  • The techniques for extracting elements from a matrix are similar to those used for data frames.

Script for Matrix

  1. student.eng.marks <- c(61L, 62L, 63L, 65L) #integer vector
  2. student.maths.marks <- c(71L, 72L, 73L, 75L) #integer vector
  3. student.marks <- rbind (student.eng.marks, student.maths.marks) #row-wise matrix binding
  4. student.marks #print content of(row-wise)created matrix
  5. student.marks <- cbind (student.eng.marks, student.maths.marks) #column-wise matrix binding
  6. student.marks #print content of(column-wise)created matrix
  7. rownames(student.marks) <- c("amar", "bindu", "chris", "don") # customize row names for the column-bound matrix
  8. student.marks #print content of(column-wise)created matrix with customized row-names
  9. str(student.marks) #get structure of matrix with dimnames (cbind function)
  10. student.marks <- matrix(c(61L, 62L, 63L, 65L,71L, 72L, 73L, 75L), ncol=4, nrow=2)# matrix function to create matrix (default column-wise item arrangement)
  11. student.marks #print content of created matrix(default column-wise, c=4,r=2)
  12. student.marks <- matrix(c(61L, 62L, 63L, 65L,71L, 72L, 73L, 75L), nrow=4, ncol=2 ) #create matrix(default column-wise, r=4,c=2)
  13. student.marks #print content of created matrix(default column-wise, r=4,c=2)
  14. student.marks <- matrix(c(61L, 62L, 63L, 65L,71L, 72L, 73L, 75L), ncol=4, nrow=2, byrow= TRUE) #row wise item arrangement
  15. student.marks # print content of this row-wise matrix
  16. str(student.marks) #get structure of matrix without dimnames (matrix function)
  17. int [1:2, 1:4] 61 71 62 72 63 73 65 75
  18. rownames(student.marks) <- c("amar", "bindu") # customize row names of the 2-row matrix
  19. student.marks
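
Row and column names can also be set in one step through dimnames, the list mentioned in the notes above. A minimal sketch; the column labels "eng" and "maths" and the variable names.before are chosen here for illustration:

```r
# 4 x 2 matrix filled column-wise (the default): first column 61..65, second 71..75
student.marks <- matrix(c(61L, 62L, 63L, 65L, 71L, 72L, 73L, 75L),
                        nrow = 4, ncol = 2)

names.before <- dimnames(student.marks)  # NULL: matrix() sets no names by itself

# dimnames is a list: first element = row names, second = column names
dimnames(student.marks) <- list(c("amar", "bindu", "chris", "don"),
                                c("eng", "maths"))
student.marks

# Once names exist, cells can be addressed by name instead of by index
student.marks["chris", "maths"]  # 73
dim(student.marks)               # 4 2
```

Indexing by name reads more clearly than student.marks[3, 2] and does not break if rows are reordered.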

Script for Matrix Operations

  1. student.eng.marks <- c(61L, 62L, 63L, 65L) #integer vector
  2. student.maths.marks <- c(71L, 72L, 73L, 75L) #integer vector
  3. student.marks <- cbind (student.eng.marks, student.maths.marks) #column-wise matrix binding
  4. rownames(student.marks) <- c("amar", "bindu", "chris", "don") # customize row names for the column-bound matrix
  5. student.marks #print content of(column-wise)created matrix with customized row-names
  6. str(student.marks) #get structure of matrix with dimnames (cbind function)
  7. #Subsetting or extracting elements from Matrix [Row, Column]
  8. student.marks[3,2] #extract single cell
  9. [1] 73
  10. student.marks[,] #extract all rows & all columns
  11. student.marks[,2] #extract all rows with specified column
  12. student.marks[2,] #extract specified row with all columns
  13. student.marks[2:4,1] #colon or sequence operator for consecutive rows
  14. student.marks[c(1,3), ] #combine function (integer vector)to extract individual rows and all columns
  15. student.marks[c(T,F,T,F), 2] #logical vector (combine function) to extract individual rows and specified column
  16. #Matrix Summary
  17. student.marks #print content of(column-wise)created matrix with customized row-names
  18. rowSums(student.marks) #Row-wise sum
  19. colSums(student.marks) #Column-wise sum
  20. colMeans(student.marks) #Column wise mean
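
For reference, a sketch of the values the summary functions return on this matrix, plus apply() for a summary that has no dedicated helper. The short column names eng and maths and the result variables are introduced here only for illustration:

```r
# Column-bound matrix with named columns and customized row names
student.marks <- cbind(eng = c(61L, 62L, 63L, 65L),
                       maths = c(71L, 72L, 73L, 75L))
rownames(student.marks) <- c("amar", "bindu", "chris", "don")

row.totals <- rowSums(student.marks)   # per student: 132 134 136 140
col.totals <- colSums(student.marks)   # per subject: eng 251, maths 291
col.means <- colMeans(student.marks)   # per subject: eng 62.75, maths 72.75

# apply() runs any function over rows (1) or columns (2); here the column maximum
col.max <- apply(student.marks, 2, max)  # eng 65, maths 75
```

The results keep the row or column names, so row.totals["chris"] returns the total for that student.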

REFERENCES