- Free and open source.
- Runs on Windows, Mac and Linux.
- Great extensibility.
- Can integrate with other languages (C, C++, Java, Python) and packages (SAS, SPSS).
- Statistics and data visualization (graphs and images).
- Graphical User Interfaces (GUI)
- R and Python are both open-source programming languages with a large community.
- R is mainly used for statistical analysis while Python provides a more general approach to data science.
RStudio
- 4 panes in 4 quadrants.
- Can change size of quadrants using splitters or buttons on top.
- An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development.
- Tools provided by an IDE include a text editor, a project editor, a tool bar, and an output viewer.
- IDEs can perform a variety of functions (write code, compile code, debug code, and monitor resources).
Packages
- R packages are a collection of R functions, compiled code and sample data.
- They are stored under a directory called “library” in the R environment.
- By default, R installs a set of packages during installation. More packages are added later, when they are needed for some specific purpose.
- Task specific (developed for particular set of tasks).
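- The library directory and the default packages described above can be inspected from the console. This is a minimal sketch; the variable names are only illustrative, and the paths and package lists printed will differ from machine to machine.

```r
# Directories ("libraries") where R looks for installed packages
lib.paths <- .libPaths()
print(lib.paths)

# Packages that ship with R itself (installed by default)
base.packages <- rownames(installed.packages(priority = "base"))
print(base.packages)

# Packages currently attached to the session
attached.packages <- search()
print(attached.packages)
```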
Built-in Help Commands
- Help mechanisms (help, demo, vignette) that work on information available and installed on your computer.
- R functions are case-sensitive.
- Press Ctrl+Enter or click Run to execute the command.
help()
- help(term) or ?term if you know the exact term to search for in the documentation.
- help.search('term') or ??term if you don't know the exact search term.
- example() directly executes the examples given in that documentation (rnorm here). It gives an idea of the function's usage. These examples are mostly reproducible, so you can simply copy and paste them to execute them.
Codes for help command
- help(rnorm) # search exact term
- ?rnorm # shortcut to search exact term
- example(rnorm) # directly execute the examples available in this documentation
- help.search('rnorm') # search similar (not exact) term
- ??rnorm # short syntax to search similar (not exact) term
demo()
- Demos are like examples but tend to be longer. Instead of focussing on a single function, they show how to weave together multiple functions to solve a problem.
- Demos are feature specific but not available for all functions/features.
- demo(graphics): pass a topic name (graphics here) as a parameter to the demo function. Optionally, the package name can also be given as a parameter. The console will prompt you to press Enter to start. Press Enter again to see the next plots.
Codes for demo command
- demo() # list demonstrations in all attached packages (shown with checked check-boxes under package panel)
- demo(package = .packages(all.available = TRUE)) # list demonstrations in all installed packages
- demo(package = 'graphics') # list demonstrations in a specific package (graphics here)
- demo('graphics') # demonstrate graphics topic, using Enter key in console; plots appear in plots section
- demo('graphics', package = 'graphics') # demonstrate graphics topic, optionally giving package name too
vignette()
- Demos are feature specific. To learn more about a package as a whole, we can use the vignette option.
- Cover all or part of functionality of a package.
- Each vignette provides three things: the original source file, a readable HTML page or PDF, and a file of R code.
- Short to mid size pdf document; covers brief introduction, background, usage, result, references.
Codes for vignette command
- vignette() # list all vignettes in attached packages
- vignette(package = .packages(all.available = TRUE)) # list all vignettes in all installed packages
- vignette(package = 'parallel') # list all vignette topics in parallel package
- vignette('parallel') # show vignette (pdf) for topic parallel
- vignette('parallel', package = 'parallel') # show vignette (pdf) for topic parallel, optionally giving package name
Web Search
- If information is not available on our local machine, we can use the web to find it in either of the following two ways.
- Using an R command (console): limited search.
- Using various popular search engines.
Web search using R command
- The RSiteSearch command opens a search website from the R console.
- This function can search keywords or phrases in various help pages, vignettes and task views.
- It uses the search engine of the R project, available at the following link:
- http://search.r-project.org
- On clicking Search, the key term is highlighted in red.
- Search results can be filtered by choosing a target (function, vignette or task view).
- Search results can be sorted as needed (by default they are sorted by score value, which signifies the total number of occurrences of the key term in the search result).
- Search results can contain all or one of the search terms in any order.
- To search for an exact phrase with both key terms occurring together, put the search terms in curly braces.
- install.packages('sos') installs a package (sos here).
- library(sos) loads the package into memory. We have to do this before we can use any function of the sos package.
- findFn: once the package is loaded, we can call this function from the sos package to run a search.
- Description and Link (last column) have links to the actual search help page.
- Package column shows the name of the package containing the help page.
- Function column contains the help page name.
- Count column: total number of matches in the current package.
- Score column: score values coming from the RSiteSearch function output.
- Maxscore column: maximum of the score values of the various help pages in the corresponding package.
- Totalscore column: sum of all scores for a particular package.
- The maxPages parameter can be set to reduce the number of search pages or to see only the top pages.
- ??? is a shortcut for the findFn function.

- Count is package specific (number of links/help pages/functions in a package that contain the key term).
- Score is link/help page specific (occurrences of the key term in a specific help page).
- Example: for the search term “inheritance”, the packages may be compared to three books: Python, Genetics and Social Science (count = 3). Help pages correspond to the chapters in each book. Each chapter contains a different number of occurrences of the key term, which gives the score of that chapter.
Codes for web search
- RSiteSearch("arithmetic%20mean") # search from http://search.r-project.org
- RSiteSearch('{arithmetic%20mean}') # exact search: all key terms occurring together
- install.packages('sos') # install sos package
- library(sos) # load sos package
- findFn('{arithmetic mean}') # search function using findFn, results in tabular form
- findFn('{arithmetic mean}', maxPages = 2) # maxPages parameter to get top pages
- ???'{arithmetic mean}'(2) # shortcut for findFn function with maxPages parameter
Communities
Mailing List
- R-help for primary help.
- R-devel for code development in R.
- R-sig-finance, R-sig-hpc for special interest groups.
- http://bit.ly/mailingR
Forums
- http://bit.ly/stackoverflowR
- http://bit.ly/crossvalidatedR (statistics and data-mining questions in R)
- http://bit.ly/nabbleR
Blogs
- http://bit.ly/Rbloggers
- http://bit.ly/revolutionanalyticsR
- http://bit.ly/rstatistics
- http://bit.ly/rdataminig
- http://bit.ly/RPostingGuide (guidelines to post queries)

Object
- An object, in object-oriented programming (OOP), is an abstract data type created by a developer. A simple example of an object may be a user account created for a website.
- Objects have states and behaviors. Example: A dog has states – color, name, breed as well as behaviors – wagging the tail, barking, eating.
- An object is an instance of a class.
- A class can be defined as a template/blueprint for creating future objects (a particular data structure). It describes the initial values for state (member variables or attributes) and the implementations of behavior (member functions or methods).
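- The dog example above can be sketched in R with its informal S3 object system. This is only an illustration: the class name "dog", the constructor dog() and the generic Bark() are hypothetical names chosen for this sketch, and S3 is just one of several object systems available in R.

```r
# Constructor: the "blueprint" that fixes which states every dog object carries
dog <- function(name, breed, color) {
  structure(list(name = name, breed = breed, color = color), class = "dog")
}

# Behavior: a generic function and its method for objects of class "dog"
Bark <- function(x, ...) UseMethod("Bark")
Bark.dog <- function(x, ...) paste(x$name, "says woof")

my.dog <- dog("Rex", "beagle", "brown")  # an object: one instance of class "dog"
class(my.dog)   # "dog"
my.dog$breed    # state: "beagle"
Bark(my.dog)    # behavior: "Rex says woof"
```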
Variable
- A variable is a placeholder or container that can hold any object.
- We can use the assignment operator (<-), with a space before and after it (to increase program readability), to assign any value to a variable (X).
- X <- 10
- R variables are case-sensitive (X is different from x).
Function
- A function is a group of statements that together perform a task.
- A function can be seen as a box that takes some values and performs some actions.
- A function is an object. The R interpreter passes control to the function, along with the arguments necessary for the function to accomplish its actions.
- The function in turn performs its task and returns control to the interpreter, along with any result, which may be stored in other objects.
- A function is a piece of code that is independently called by name.
- A method is also a piece of code called by name, but it is associated with an object.
- Functions apply to both object-oriented (OOP) and non-object-oriented languages (C, Fortran, COBOL, Pascal).
- Methods apply only to object-oriented languages (C++, Java).
Parameter & Argument
- Parameters are components of functions. They identify values that are passed into a function.
- A parameter is a variable in the declaration of a function, while an argument is the actual value of this variable that gets passed to the function.
- Passing an argument to a program means setting the value of a parameter of that program.
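- The difference between parameters and arguments can be sketched as follows. AddBonus, marks and bonus are hypothetical names used only for this illustration.

```r
# 'marks' and 'bonus' are parameters: variables in the function declaration
AddBonus <- function(marks, bonus = 5) {
  marks + bonus
}

AddBonus(70)              # argument 70 is passed to 'marks'; 'bonus' keeps its default (75)
AddBonus(70, bonus = 10)  # arguments 70 and 10 set both parameters (80)
names(formals(AddBonus))  # lists the parameters: "marks" "bonus"
```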
Creating a variable in a new (custom) environment
- An environment can be considered as a bag to store variables.
- It is used in lexical scoping.
- When we create a variable x, it is created by default in the global environment.
- We can also create a variable in a custom environment (inside the parent global environment). The same variable can have different values in different environments.
- To make sure that the variable we created belongs to the global environment, we can use the get command:
- get ("x", globalenv())
- To create a new environment (named my.environment), we can use the command:
- my.environment <- new.env ()
- We can check the parent environment of this new custom environment with the following R function:
- parent.env (my.environment)
- We can create a variable in this new custom environment in 3 different ways:
- assign ("x", 10, my.environment)
- my.environment [["x"]] <- 10
- my.environment$x <- 10 (quotes around variable name are optional)
- We can read the above commands as 'assign the value 10 to the variable "x" in the environment named my.environment'.
Code for creating a variable in a new (custom) environment
- my.environment <- new.env ()
- assign ("comprehensive.marks", 120, my.environment)
- my.environment [["comprehensive.marks"]] <- 120
- my.environment$x <- 10
Getting a variable from an environment
- We can use the “get” command to get a variable by passing the name of the variable and the environment as parameters.
- get ("x", my.environment)
- my.environment [["x"]]
- my.environment$x (quotes around variable name are optional)
Code for getting a variable from an environment
- get ("comprehensive.marks", my.environment)
- my.environment [["comprehensive.marks"]]
- my.environment$comprehensive.marks # (quotes around variable name are optional)
- comprehensive.marks <- 100 # assign variable in global environment
- comprehensive.marks # get variable from global environment
- get ("comprehensive.marks", globalenv()) # get variable from global environment (confirm it's in global environment)
- my.environment <- new.env () # create a new custom environment
- parent.env (my.environment) # check parent environment
- # assign a variable in this new custom environment in 3 different ways (can select all 3 lines and run together)
- assign ("comprehensive.marks", 120, my.environment)
- my.environment [["comprehensive.marks"]] <- 120
- my.environment$x <- 10 # (quotes around variable name are optional)
- # get a variable in 3 different ways
- get ("comprehensive.marks", my.environment)
- my.environment [["comprehensive.marks"]]
- my.environment$comprehensive.marks # (quotes around variable name are optional)
Naming convention
- A syntactically valid name:
- consists of letters, numbers, dots or underscore characters.
- starts with either a letter or a dot not followed by a number, e.g., goodName, good.Name, good_Name, .goodName are valid names.
- .4goodName is an invalid name.
- We cannot use reserved keywords (which have a semantic definition in R) as names, e.g., function, for, if, else, while, next, repeat, TRUE, FALSE, NULL, NA, NaN etc.
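- One way to check these rules from the console is with make.names(), which rewrites an invalid name into a valid one. A name that comes back unchanged is therefore syntactically valid. This is a small sketch; candidate.names and is.valid are illustrative variable names.

```r
candidate.names <- c("goodName", "good.Name", "good_Name", ".goodName",
                     ".4goodName", "4goodName", "if")

# A name is valid when make.names() leaves it unchanged
is.valid <- make.names(candidate.names) == candidate.names
names(is.valid) <- candidate.names
print(is.valid)  # the first four are TRUE, the last three are FALSE
```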
Google’s R Style Guide
- For not only syntactically valid but also more consistent and readable variable names.
- File Names
- File names should end in .R and be meaningful.
- GOOD:
predict_ad_revenue.R
- BAD:
foo.R
- Identifiers
- Variable names should have all lower case letters and words separated with dots (.). Don't use underscores (_) or hyphens (-).
variable.name
GOOD: avg.clicks
BAD: avg_Clicks, avgClicks
- Function names have initial capital letters and no dots (CapWords). Make function names verbs.
FunctionName
GOOD: CalculateAvgClicks
BAD: calculate_avg_clicks, calculateAvgClicks
Exception: When creating a classed object, the function name (constructor) and class should match (e.g., lm).
- Constants are named like functions but with an initial k.
kConstantName
- http://web.stanford.edu/class/cs109l/unrestricted/resources/google-style.html
How to assign a variable
- By using the assignment operator (a less-than sign followed by a dash). Use “Run” to execute this line. We can see the execution in the console at the bottom.
- If we want to see the content of the variable, we can simply execute the variable name.
- comprehensive.marks <- 100
- By using the assign () function. The assign function can take several parameters. In the example below, the first parameter is the variable name and the second parameter is the value we want to assign to it.
- assign ("comprehensive.marks", 100)
- Whenever we create a variable, it also appears in the Environment tab in the right panel.
Codes for assigning a variable
- comprehensive.marks <- 100 # assign value to a variable using assignment operator
- comprehensive.marks # print variable content
- assign ("match.score", 500) # assign value to a variable using assign function
- match.score # print variable content
Operator
- Operators are similar to built-in mathematical functions, but the two are syntactically different.
- are pillars of any programming language
- can work on one or more objects called “operands”
- will return some result
- Two types of operators:
- Arithmetic operators: operate on numeric values
- Logical operators: operate on Boolean or logical values (TRUE or FALSE)
Mathematical Operators in R
- 12 + 4 # Addition
- 12 - 4 # Subtraction
- 12 * 4 # Multiplication
- 12 / 4 # Division
- 12 ^ 2 # Exponentiation (caret symbol)
- 12 ** 2 # Exponentiation (double multiplication symbol)
- 10 ^ 3
- format(10 ^ 3, scientific = TRUE) # With scientific notation
- format(10 ^ 3, scientific = FALSE) # Without scientific notation (string format)
- 12 %% 5 # Modulus
- 12 %/% 7 # Integer division

In-built Mathematical Functions in R
- abs(-7) #Absolute value
- log(2) #Natural Logarithm
- log(2, base = 10) #Logarithm
- exp(5) #exponential (e ^ 5)
- factorial(5) #factorial
- sqrt(625) #Square root
- round(3.07822, digits = 2) # 3.08
- signif(3.07822, digits = 2) # 3.1
- ceiling(3.07822) #4
- floor(3.07822) #3
- We can get other Mathematical Functions in R documentation.
Special constant
- pi #Special constant pi value
- options () # get global options
- options (digits = 4) # set digits to 4 (print 4 significant digits, so pi prints as 3.142)
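- The digits setting counts significant digits rather than digits after the decimal point. A short sketch using format() shows this without changing the global option. old.options is an illustrative name for the saved settings.

```r
format(pi, digits = 4)       # "3.142": four significant digits
format(123.456, digits = 4)  # "123.5": still four significant digits, only one after the decimal

# Changing the global option and restoring it afterwards
old.options <- options(digits = 4)  # options() returns the previous settings
print(pi)                           # 3.142
options(old.options)                # restore the earlier value
```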

Special Numbers in R
- Special numbers allow calculations or analyses to continue (or terminate gracefully) in adverse situations and prevent the program from crashing abruptly, for example when we hit an overflow condition.
- Positive and negative infinity (Inf and -Inf)
- To represent overflow conditions.
- A number whose magnitude is too large (positive or negative) to be handled by the computer.
- NaN (Not a Number or undefined)
- To represent a value that is not a real number.
- Its output doesn't make any mathematical sense.
- NA (Not Available or missing)
- Missing value, not available in the data set.
- In R, NaN is treated as NA (is.na(NaN) is TRUE), but the converse is not true (is.nan(NA) is FALSE).
Script for Special Numbers in R
- 1 / 0 #Positive infinity
- -20 / 0 #Negative infinity
- Inf + 5 #Operation on Inf
- -Inf + 5 #Operation on Inf
- is.finite(1 / 0) # check if finite number (FALSE)
- is.infinite(1 / 0) # check if infinite number (TRUE)
- is.finite(0/1) # check if finite number (TRUE)
- Inf / Inf # Not a Number (NaN)
- Inf - Inf # Not a Number (NaN)
- is.nan(Inf - Inf) # Check if NaN (TRUE)
- NA + 4 # Missing value (NA) (operation on missing number)
- NA - 4 # Missing value (NA)
- NA * 4 # Missing value (NA)
- NA / 4 # Missing value (NA)
- is.na(NA + 4) # Check if NA (TRUE)
- is.nan(NA) # Check if NA is NaN (FALSE)
- is.na(NaN) # Check if NaN is NA (TRUE)
- R has six basic (‘atomic‘) vector types: logical, integer, real, complex, string (or character) and raw.
- A real number is any positive or negative number. This includes all integers (whole numbers including zero) and all rational and irrational numbers. Rational numbers may be expressed as a fraction (such as 7/8) and irrational numbers may be expressed by an infinite decimal representation (3.1415926535…). Real numbers that include decimal points are also called floating point numbers, since the decimal “floats” between the digits.
- Zero is known as the neutral integer, or the whole number that comes in the middle of positive and negative numbers on a number line, which in turn makes it an integer, but not necessarily a natural number.
- Natural numbers are the positive integers (whole numbers) 1, 2, 3, etc., and sometimes zero as well.
- https://techterms.com/definition/realnumber
Logical Operators
- Logical operators work not only on numerical values but also on character strings.
- Each comparison gives a Boolean result.
- The logical NOT operator simply inverts the value inside the parentheses.
- The logical OR operator works on two logical expressions. If either of the two operands evaluates to TRUE, the output is TRUE.
- For logical AND, the result is TRUE only when both operands (logical expressions) evaluate to TRUE.



- 6 < 3 # less than
- 6 <= 3 # less than or equal to
- 6 > 3 # greater than
- 6 >= 3 # greater than or equal to
- 6 == 3 # equal to
- 6 != 3 # not equal to
- "b" > "a" # comparing characters (TRUE)
- "e" < "b" # comparing characters (FALSE)
- !(TRUE) # logical NOT operator (inverts value inside parentheses) (FALSE)
- TRUE | FALSE # logical OR operator (TRUE) (TRUE when either of two operands is TRUE)
- TRUE & FALSE # logical AND operator (FALSE) (TRUE only when both logical expressions are TRUE)
Data Structure
- A data structure defines the way in which data is organized and stored in memory.
- A data structure is a collection of data elements grouped under one name.
- It can be seen as a container holding data elements. The choice of data structure depends on the answers to these questions:
- what type of items to put in (homogeneous or heterogeneous)
- how to arrange them, which produces different data structures (List, Vector, Factor, Data frame, Matrix, Array)
- We can use the str() function to see the structure of an object.
- The is.numeric() function tests whether an object is a numeric vector.
- Most data in the real world consists of either basic/atomic classes (character, numeric, integer, logical and complex) or objects that can be built using these basic classes.
- In R, we use a capital “L” as a suffix to explicitly mark a numeric value without a decimal part as an integer.
- Elements without the “L” suffix are treated as double values, making it a numeric vector.
- Integers are basically numeric values without decimal parts. An integer vector can be considered a numeric vector, but the converse may not be true.
- Data Frames: A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column.
- Factors: Factors are the data objects which are used to categorize the data and store it as levels. They can store both strings and integers. They are useful in data analysis for statistical modeling.
- Matrices: Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout. A Matrix is created using the matrix() function.
- Arrays: Arrays are the R data objects which can store data in more than two dimensions. It takes vectors as input and uses the values in the dim parameter to create an array.
- List: Lists are the R objects which contain elements of different types, such as numbers, strings, vectors and even another list.
- Vector: A Vector is a sequence of data elements of the same basic type.
- https://www.edureka.co/blog/r-tutorial/

http://www.simonqueenborough.info/R/basic/data-types.html
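- The six data structures described above can be sketched side by side. This is a minimal illustration with made-up values. All variable names are hypothetical.

```r
marks.vector  <- c(10, 20, 30, 40)                          # vector: one dimension, same type
grade.factor  <- factor(c("pass", "fail", "pass", "pass"))  # factor: categories stored as levels
marks.matrix  <- matrix(1:6, nrow = 2, ncol = 3)            # matrix: two dimensions, same type
marks.array   <- array(1:24, dim = c(2, 3, 4))              # array: more than two dimensions
student.list  <- list(name = "amar", marks = marks.vector, passed = TRUE)  # list: mixed types
student.frame <- data.frame(name = c("amar", "bob"), marks = c(10, 20))    # data frame: table

# str() shows the structure of each object
str(marks.vector)
str(grade.factor)
str(marks.matrix)
str(marks.array)
str(student.list)
str(student.frame)
```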

Atomic Vector
- Commonly known as “Vector”.
- Homogenous data structure.
- One-dimensional arrangement of elements.
- Most of the basic functions or operators in R are vectorized in nature.
- Vectorized operators can work on one or more vectors to give some results.
- A vector is a sequence of data elements of the same basic type (numeric, character or Boolean type).
- Members of a vector are called Components.
- We use combine function c() to create vector by combining discrete values.
- Here is a vector containing three numeric values 2, 3 and 5 : c(2, 3, 5).
- The c() function coerces non-character values to character type if one of the elements is a character.
- The syntax to access the element of a vector a at index (position) i is a[i].
- If we try to access elements outside of the index range, we get the value NA.
- We can access multiple elements of a vector using a vector of indices. The returned value is also a vector.
- We can also leave some elements out of the returned vector by giving the indices of those elements with a negative sign.
- We can also provide TRUE and FALSE in an indices vector. Where there is TRUE, the element is included in the result, and where there is FALSE, the element is left out.
- For two vectors with length greater than one, arithmetic operations are performed element by element (one to one).
- R has six basic (‘atomic’) vector types: logical, integer, real, complex, string (or character) and raw.
- We can perform arithmetic operations (like addition, subtraction, multiplication and division) on vectors.
- Ideally, the two vectors should be of the same length and the same type, or the second operand can be a single value of the same type.
- If the second vector is of a different length, recycling happens: the elements of the shorter vector are reused to match the length of the longer vector.

Create a vector
- students.marks <- c (10, 20, 30, 40) # create a vector using combine function
- students.marks # print variable
- [1] 10 20 30 40
- students.grades <- c ("pass", 50, "B", "TRUE") # create a vector using different element types
- students.grades # print (non-character values converted to character type)
- [1] "pass" "50" "B" "TRUE"
Atomic Vector Script
- student.names <- c ("amar", "bob", "chap", "don") # create character vector (using combine function)
- student.names # print vector content
- [1] "amar" "bob" "chap" "don"
- str (student.names) # get vector structure (four elements indexed from [1:4])
- chr [1:4] "amar" "bob" "chap" "don"
- is.character (student.names ) # test character vector
- [1] TRUE
- student.weights <-c (70.1, 70.2, 70.3, 70.4) # create numeric vector (using combine function)
- student.weights # print vector content
- [1] 70.1 70.2 70.3 70.4
- str (student.weights) # get vector structure
- num [1:4] 70.1 70.2 70.3 70.4
- is.numeric (student.weights) # test numeric vector
- [1] TRUE
- student.marks <-c (51L, 52L, 53L, 54L) # create integer vector (using combine function)
- student.marks # print vector content
- [1] 51 52 53 54
- str (student.marks) # get vector structure
- int [1:4] 51 52 53 54
- is.integer (student.marks) # test integer vector
- [1] TRUE
- is.numeric (student.marks) # integer is numeric without decimal part
- [1] TRUE
- is.integer (student.weights) # numeric may not be integer
- [1] FALSE
- student.maths.interest <-c (T, TRUE, F, FALSE) # create logical vector (using combine function)
- student.maths.interest # print vector content
- [1] TRUE TRUE FALSE FALSE
- str (student.maths.interest) # get vector structure
- logi [1:4] TRUE TRUE FALSE FALSE
- is.logical (student.maths.interest) # test logical vector
- [1] TRUE
- complex.vector <-c (10+2i, -10+2i, 10-2i) # create complex vector (using combine function)
- complex.vector # print vector content
- [1] 10+2i -10+2i 10-2i
- str (complex.vector) # get vector structure
- cplx [1:3] 10+2i -10+2i 10-2i
- is.complex (complex.vector) # test complex vector
- [1] TRUE
- vector ("character", length = 4) # create character vector (using vector command) (default: empty string)
- [1] "" "" "" ""
- vector ("numeric", length = 4) # create numeric vector (using vector command) (default: 0)
- [1] 0 0 0 0
- vector ("integer", length = 4) # create integer vector (using vector command)
- [1] 0 0 0 0
- vector ("logical", length = 4) # create logical vector (using vector command) (default: FALSE)
- [1] FALSE FALSE FALSE FALSE
- vector ("complex", length = 4) # create complex vector (using vector command) (default: 0+0i)
- [1] 0+0i 0+0i 0+0i 0+0i
Subsetting or Access vector component
- Subsetting is the process of extracting one or more elements from any data structure (vector here).
- R uses 1-based indexing.
- Use [] to access an element.
Subsetting/Extraction/Accessing vector elements
- students.marks [3] # to access vector at third position (index)
- [1] 30
- students.marks [5] # to access vector element outside the index range
- [1] NA
- students.marks [1:3] # access multiple elements in a sequence using colon operator (1 through 3 here)
- [1] 10 20 30
- students.marks [c(T,F,T,F)] # access multiple elements specifying logical vector
- [1] 10 30
- students.marks [c(1,4)] #access multiple elements specifying index vector
- [1] 10 40
- student.names <- c("amar", "bob", "chap", "don") # create vector using combine function
- student.names [students.marks >= 20] # access multiple elements specifying logical vector
- [1] "bob" "chap" "don"
- indices <- c(1,3) # create vector of indices
- students.marks [indices] # access multiple elements of a vector using a vector of indices
- [1] 10 30
- neg.indices <- c(-1, -3) # create vector of negative indices to make some elements disappear
- students.marks [neg.indices] # print to make some elements disappear
- [1] 20 40
- bool.indices <-c (TRUE, FALSE, TRUE, FALSE) # create logical vector (TRUE, FALSE) of indices
- students.marks [bool.indices] # using TRUE and FALSE in an indices vector
- [1] 10 30
- bool.indices.short <-c (T,F,T,F) # create logical vector of indices (T & F)
- students.marks [bool.indices.short] # access elements
- [1] 10 30
Coercions/ Typecasting/ Type conversion
- Converting one type to another.
- Coercion occurs whenever data types are mixed.
- Typecasting can be implicit or explicit (and may or may not be sensible).
Coercion script
- student.weights <- c(50.1,50.2,50.3,50.5) #create numeric(double) atomic vector
- str (student.weights) #print data structure
- num [1:4] 50.1 50.2 50.3 50.5
- student.weights <- c(50.1, 50.2, 50.3, '50.5') # implicit coercion due to mixing of data types
- str (student.weights) # print converted data structure
- chr [1:4] "50.1" "50.2" "50.3" "50.5"
- as.numeric (student.weights >= 50.3) # converting logical values (T, F) to numeric values (1, 0)
- [1] 0 0 1 1
- as.integer (student.weights) # explicit coercion to integer vector (decimal part dropped)
- [1] 50 50 50 50
- as.character(student.weights) # explicit coercion to character (the vector is already character here)
- [1] "50.1" "50.2" "50.3" "50.5"
- as.logical(student.weights) # converting these character values to logical values (insensible coercion)
- [1] NA NA NA NA
- student.names <- c("amar", "bob", "chap", "don") # create character atomic vector
- as.numeric (student.names) # converting names to numeric values (insensible coercion)
- [1] NA NA NA NA
- Warning message:
- NAs introduced by coercion
R Vector Arithmetic & Logical Operations
- students.test.marks <- c(1,2,3,4)
- add1.student.marks <- students.marks + 2 # the single value 2 is added to every element
- add1.student.marks # print variable
- [1] 12 22 32 42
- add2.student.marks <- students.marks + students.test.marks # addition performed element by element (one to one)
- add2.student.marks # print variable
- [1] 11 22 33 44
- sub1.student.marks <- students.marks - 2 # R vector subtraction
- sub1.student.marks # print variable
- [1] 8 18 28 38
- sub2.student.marks <- students.marks - students.test.marks # R vector subtraction
- sub2.student.marks # print variable
- [1] 9 18 27 36
- mul1.students.marks <- students.marks * 2 #R Vector Multiplication
- mul1.students.marks # print variable
- [1] 20 40 60 80
- mul2.students.marks <- students.marks * students.test.marks #R Vector Multiplication
- mul2.students.marks # print variable
- [1] 10 40 90 160
- div1.students.marks <- students.marks %/% 2 # R vector integer division
- div1.students.marks # print variable
- [1] 5 10 15 20
- div2.students.marks <- students.marks %/% students.test.marks # R vector integer division
- div2.students.marks # print variable
- [1] 10 10 10 10
- students.less.marks <- c(1,2) # create vector with different length
- recycle.students.less.marks <- students.marks + students.less.marks # smallest Vector Element Recycling
- recycle.students.less.marks # print variable
- [1] 11 22 31 42
- students.marks >= 20 # R vector logical operation
- [1] FALSE TRUE TRUE TRUE
Vectorized Operations
- Vectorized operations: Flavor I: Input = single vector, Output = scalar (vector of length 1)
- mean(students.marks) # mean of all elements
- [1] 25
- Vectorized operations: Flavor II: Input = single vector, Output = single vector
- student.marks <- students.marks + 1 # arithmetic operator
- student.marks # print the content of new vector student.marks
- [1] 11 21 31 41
- student.marks >= 12 # logical operator
- [1] FALSE TRUE TRUE TRUE
- sqrt (students.marks) # print the content of variable
- [1] 3.162278 4.472136 5.477226 6.324555
- Vectorized operations: Flavor III: Input = multiple vectors, Output = single vector
- total.students.marks <- students.marks + students.test.marks # addition
- total.students.marks # print the content of vector
- [1] 11 22 33 44
Factor
- Special case of vector used to store nominal/categorical values.
- “Categorical” means the field can take values from only a few categories.
- “Gender” is a categorical variable because only limited categories are available.
- Factors are more efficient than character vectors: characters are case sensitive, and string comparisons (filtering) are inefficient in comparison to integer comparisons.
- Factors are self-describing, in contrast to integer vectors.
- We can create a factor by wrapping the combine function output in the factor () function. The result also provides additional information about the levels (categories).
- Explicit coercion of a factor to a numeric vector (as.numeric () function) reveals the implicit levels as integer codes. By default, levels are ordered alphabetically.
- Explicit levels can also be created when we want to order the levels ourselves. We do this by creating a factor whose first argument is the combine function providing the various categories and whose second argument, levels, uses the combine function again to order the levels as needed.
- We can not only order the levels but also create additional levels, to provide all possible levels in the problem.
Script for Factor
- student.gender <- c('Male', 'Male', 'Female', 'Female') # character vector to represent Female & Male categories
- student.gender # print categories
- [1] "Male" "Male" "Female" "Female"
- student.gender <- c(2L, 2L, 1L, 1L) # integer vector to represent Female & Male categories as 1 & 2 resp.
- student.gender # print categories
- [1] 2 2 1 1
- student.gender <- factor(c('Male', 'Male', 'Female', 'Female')) # create factor (wrap combine function output in the factor function)
- student.gender # print categories with levels (not wrapped in quotes)
- [1] Male Male Female Female
- Levels: Female Male
- as.numeric (student.gender) #explicit coercion of factor to numeric vector( create Implicit levels)
- [1] 2 2 1 1
- student.blood.groups <- factor(c(“A”,”B”,”O”, “O”)) #create factor(wrap combine function output in a factor method)
- student.blood.groups #print categories along with levels (no quotes)
- [1] A B O O
- Levels: A B O
- student.blood.groups <- factor(c(“A”,”B”,”O”,”O”), levels = c(“A”, “B”, “AB”, “O”)) #create Explicit levels
- student.blood.groups #creating additional levels
- [1] A B O O
- Levels: A B AB O
- str(student.blood.groups) #test structure of factor with additional level
- Factor w/ 4 levels “A”,”B”,”AB”,”O”: 1 2 4 4
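The link between a factor's levels and its integer codes can be sketched as follows. This is a minimal illustration only; the variable names `blood` and `codes` are chosen for this example and are not part of the script above.

```r
# Hypothetical example factor; levels are given explicitly, including the unused "AB"
blood <- factor(c("A", "B", "O", "O"), levels = c("A", "B", "AB", "O"))

levels(blood)                 # the allowed categories, in the order specified
nlevels(blood)                # number of levels, including unused ones
codes <- as.integer(blood)    # integer codes: position of each value within levels(blood)
codes
levels(blood)[codes]          # map the codes back to their labels
table(blood)                  # counts per level; the unused level "AB" has count 0
```

Because the codes are positions within the levels, changing the order of the levels changes the codes but not the labels that are printed.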
List
- Heterogeneous data structure: can contain items of different classes.
- One-dimensional arrangement of elements.
- With a list we can retrieve all the information about one student, stored across vectors of different types, using a single command.
- To create a list, we first create the vectors that store the different types of information. We then build an unnamed list with the list() function, passing each desired element by indexing the relevant vector with square brackets.
- We can also create a named list by giving a name to each argument. In the output, each element's name is shown after a dollar sign.
- The str() function shows the structure of the created list.
- List members can be single values of an atomic class, whole vectors, or even other lists.
- We can extract one or more elements from a list using square brackets.
- A single bracket returns an object of the same type as the one being subset: subsetting a vector gives a vector, and subsetting a list gives a list.
- A double bracket returns the extracted element itself, in its own type.
- The typeof() function gives the type of the extracted element.
- In a named list, we can use names (wrapped in quotes) instead of index numbers to extract elements.
- We can also use the dollar sign to extract an element from a named list; no square brackets or quotes are needed.
- The length() function gives the total number of elements in the list.
Script for List
- student.names <- c('amar', 'bindu', 'chris', 'don') # character vector
- student.weights <- c(70.1, 70.2, 70.3, 70.5) # numeric (double) vector
- student.gender <- factor(c("male", "female", "female", "male")) # factor
- student.eng.marks <- c(51L, 52L, 53L, 55L) # integer vector
- student.maths.marks <- c(61L, 62L, 63L, 65L) # integer vector
- student1 <- list(student.names[1], student.weights[1], student.gender[1], student.eng.marks[1], student.maths.marks[1]) # create an unnamed list
- str(student1) # get the structure of the created list
- List of 5
- $ : chr "amar"
- $ : num 70.1
- $ : Factor w/ 2 levels "female","male": 2
- $ : int 51
- $ : int 61
- student1 # print content of unnamed list
- [[1]]
- [1] "amar"
- [[2]]
- [1] 70.1
- [[3]]
- [1] male
- Levels: female male
- [[4]]
- [1] 51
- [[5]]
- [1] 61
- student1 <- list (name= student.names[1], weight= student.weights[1], gender= student.gender[1], english= student.eng.marks[1], maths= student.maths.marks[1] ) # creating named list
- str (student1) #get the structure of created list
- List of 5
- $ name : chr "amar"
- $ weight : num 70.1
- $ gender : Factor w/ 2 levels "female","male": 2
- $ english: int 51
- $ maths : int 61
- student1 # print content of named list
- $name
- [1] "amar"
- $weight
- [1] 70.1
- $gender
- [1] male
- Levels: female male
- $english
- [1] 51
- $maths
- [1] 61
- student1 <- list(name = student.names[1], weight = student.weights[1], gender = student.gender[1], marks.eng.maths = c(student.eng.marks[1], student.maths.marks[1])) # list containing a vector as an element
- str(student1) # get the structure of the list containing a vector as an element
- List of 4
- $ name : chr "amar"
- $ weight : num 70.1
- $ gender : Factor w/ 2 levels "female","male": 2
- $ marks.eng.maths: int [1:2] 51 61
- student1
- $name
- [1] "amar"
- $weight
- [1] 70.1
- $gender
- [1] male
- Levels: female male
- $marks.eng.maths
- [1] 51 61
- student2 <- list(student.names[2], student.gender[2], student.weights[2], student.eng.marks[2], student.maths.marks[2]) # create unnamed list
- student2 # print unnamed list
- [[1]]
- [1] "bindu"
- [[2]]
- [1] female
- Levels: female male
- [[3]]
- [1] 70.2
- [[4]]
- [1] 52
- [[5]]
- [1] 62
- str(student2) # get structure of created list
- List of 5
- $ : chr "bindu"
- $ : Factor w/ 2 levels "female","male": 1
- $ : num 70.2
- $ : int 52
- $ : int 62
- typeof(student2) # get type of created object
- [1] "list"
- student2[2] # extract element from unnamed list using single square bracket
- [[1]]
- [1] female
- Levels: female male
- typeof(student2[2]) # type of the result: a list, the same type as the object being subset
- [1] "list"
- student2[[2]] # extract element from unnamed list using double square bracket
- [1] female
- Levels: female male
- typeof(student2[[2]]) # type of the extracted element itself (a factor is stored as integer codes)
- [1] "integer"
- student2[1:3] # extract multiple elements from unnamed list using single square bracket
- [[1]]
- [1] "bindu"
- [[2]]
- [1] female
- Levels: female male
- [[3]]
- [1] 70.2
- student3 <- list(name = student.names[3], gender = student.gender[3], weight = student.weights[3], eng = student.eng.marks[3], maths = student.maths.marks[3]) # create named list
- student3["gender"] # extract element from named list using element name wrapped in quotes
- $gender
- [1] female
- Levels: female male
- student3$gender # extract element from named list using dollar sign without quotes
- [1] female
- Levels: female male
- student3[c("gender", "name", "weight")] # extract multiple elements from named list using combine function to create a character vector
- $gender
- [1] female
- Levels: female male
- $name
- [1] "chris"
- $weight
- [1] 70.3
- length(student3) #get the total number of elements in the list
- [1] 5
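The notes above mention that a list can itself be a member of another list, which the script does not show. A minimal sketch follows; the object name `students` and the nested layout are assumptions made for illustration only.

```r
# Hypothetical records; names and values are illustrative only
student1 <- list(name = "amar",  marks = c(eng = 51L, maths = 61L))
student2 <- list(name = "bindu", marks = c(eng = 52L, maths = 62L))

# A list whose elements are themselves lists
students <- list(first = student1, second = student2)

length(students)                  # number of top-level elements
typeof(students["first"])         # single bracket keeps the outer list container
students[["second"]][["name"]]    # chained double brackets reach the inner element
students$second$marks["maths"]    # dollar-sign chaining, then vector indexing by name
str(students)                     # compact view of the nested structure
```

Chaining `[[ ]]` or `$` walks down one level at a time, so each step returns the element itself rather than a one-element list.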
Data frames
- Data frames are among the most widely used data structures in R.
- Heterogeneous: can contain elements of different classes (like a list).
- 2-dimensional arrangement (unlike a list).
- Data frames are like spreadsheets; each column represents a field and values are stored in the rows.
- To get the details of all the students in the class, we can pass our atomic vectors to the data.frame() function.
- However, the objects passed to data.frame() should have an equal number of elements.
- Data frames are actually a type of list (typeof() returns "list") in which each element is a vector, and all the vectors have equal length. The vector variable names become field/column names and the values are stored in the data rows.
- Head = top line with field/column/vector names
- Cell = each value in a data row
- Character string values are converted to factors by default in data frames in R versions before 4.0.0; from R 4.0.0 onward they stay as character.
- To control this behaviour explicitly, set the Boolean parameter stringsAsFactors (FALSE keeps character columns as character).
Script for Data frame
- student.names <- c("amar", "bindu", "chris", "don") # character vector
- student.gender <- factor(c('male', 'female', 'female', 'male')) # factor
- student.weights <- c(70.1, 70.2, 70.3, 70.5) # numeric (double) vector
- student.eng.marks <- c(51L, 52L, 53L, 55L) # integer vector
- student.maths.marks <- c(61L, 62L, 63L, 65L) # integer vector
- all.students <- data.frame(student.names, student.gender, student.weights, student.eng.marks, student.maths.marks, stringsAsFactors = TRUE) # create a data frame named all.students; stringsAsFactors = TRUE reproduces the default of R versions before 4.0.0
- all.students # print content of created data frame
- typeof(all.students) # get type of created data frame
- [1] "list"
- str(all.students) # get structure of created data frame
- 'data.frame': 4 obs. of 5 variables:
- $ student.names : Factor w/ 4 levels "amar","bindu",..: 1 2 3 4
- $ student.gender : Factor w/ 2 levels "female","male": 2 1 1 2
- $ student.weights : num 70.1 70.2 70.3 70.5
- $ student.eng.marks : int 51 52 53 55
- $ student.maths.marks: int 61 62 63 65
- all.students <- data.frame(student.names, student.gender, student.weights, student.eng.marks, student.maths.marks, stringsAsFactors = FALSE) # use stringsAsFactors = FALSE (note the lowercase s) to avoid conversion of character vector to factor
- all.students # print content of created data frame
- typeof(all.students) # get type of created data frame
- [1] "list"
- str(all.students) # get structure of created data frame
- 'data.frame': 4 obs. of 5 variables:
- $ student.names : chr "amar" "bindu" "chris" "don"
- $ student.gender : Factor w/ 2 levels "female","male": 2 1 1 2
- $ student.weights : num 70.1 70.2 70.3 70.5
- $ student.eng.marks : int 51 52 53 55
- $ student.maths.marks: int 61 62 63 65
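The statement that a data frame is a list of equal-length vectors can be checked directly. The sketch below uses a reduced, hypothetical two-column version of the data and a deliberately mismatched vector (`short`, a name invented for this example) to show what happens when the lengths disagree.

```r
# Hypothetical vectors of equal length
student.names   <- c("amar", "bindu", "chris", "don")
student.weights <- c(70.1, 70.2, 70.3, 70.5)

all.students <- data.frame(student.names, student.weights, stringsAsFactors = FALSE)

is.list(all.students)   # a data frame is built on a list
length(all.students)    # number of columns (list elements)
names(all.students)     # column names taken from the variable names
dim(all.students)       # number of rows, then number of columns

# A vector of length 3 cannot be combined with one of length 4, so data.frame() stops
# with an error; tryCatch() captures the message instead of halting the script
result <- tryCatch(
  data.frame(student.names, short = c(1, 2, 3)),
  error = function(e) conditionMessage(e)
)
result                  # the error message as a character string
```

Note that R recycles shorter vectors when their length divides the longest one evenly, so a mismatch is only reported when recycling is not possible.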
Script for operations on Data frame
- student.names <- c("amar", "bindu", "chris", "don") # create character vector
- student.gender <- factor(c('male', 'female', 'female', 'male')) # factor
- student.weights <- c(70.1, 70.2, 70.3, 70.5) # numeric (double) vector
- student.eng.marks <- c(51L, 52L, 53L, 55L) # integer vector
- student.maths.marks <- c(61L, 62L, 63L, 65L) # integer vector
- #create dataframe while using stringsAsFactors = FALSE to avoid conversion of character vector to factor
- all.students <- data.frame(student.names, student.gender, student.weights, student.eng.marks, student.maths.marks, stringsAsFactors = FALSE)
- all.students #print content of created data frame
- #subsetting data frame (first element means first column from data frame)
- all.students [1] #single square bracket to return object of same type (from which we extract)
- student.names
- 1 amar
- 2 bindu
- 3 chris
- 4 don
- typeof(all.students[1])
- [1] "list"
- all.students[[1]] # double square bracket to return object in its own type (element type)
- [1] "amar"  "bindu" "chris" "don"
- typeof(all.students[[1]])
- [1] "character"
- # use column name instead of index to extract it
- all.students["student.weights"] # single square bracket to return object of same type
- student.weights
- 1 70.1
- 2 70.2
- 3 70.3
- 4 70.5
- typeof(all.students["student.weights"])
- [1] "list"
- all.students[["student.weights"]] # double square bracket to give object of its own type
- [1] 70.1 70.2 70.3 70.5
- typeof(all.students[["student.weights"]])
- [1] "double"
- all.students$student.weights # dollar sign (with column name) to return object in its own type
- [1] 70.1 70.2 70.3 70.5
- typeof(all.students$student.weights)
- [1] "double"
Script to extract multiple elements (slice of data frame)
- all.students
- all.students[1:3] # colon or sequence operator to extract slice of first through third column
- all.students[c("student.weights", "student.names")] # combine function to extract two columns together
- all.students[3, 2] # Row, Column (extract individual cell located at third row & second column)
- [1] female
- Levels: female male
- all.students[1:3, 1:2] # colon or sequence operator to extract consecutive rows and columns (first three rows & first two columns)
- all.students[c(1, 2, 3), c(1, 2)] # combine function to list the rows and columns individually (same result as the line above)
- all.students[, 3] # unspecified row number (get all rows for third column)
- [1] 70.1 70.2 70.3 70.5
- all.students[1, ] # unspecified column number (get all columns for first row)
- all.students[c(T, F, F, T), ] # logical vector to pick required rows of all columns
- all.students[, c(T, F, F, F, T)] # logical vector to pick required columns for all rows
- all.students[c(T, F, F, T), 1:2] # logical vector for the rows combined with sequence operator for the columns
- all.students[student.gender == "female", ] # logical operator to get details of only female students
- all.students [student.eng.marks == 51, ]
- all.students [student.eng.marks > 51, ]
- all.students [student.eng.marks >= 53, 1:2 ]
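The filters above compare against the standalone vectors (for example `student.gender`), which works only while those vectors still exist and match the data frame row for row. A slightly more robust sketch refers to the columns through the data frame itself. The reduced three-column data frame and the result names (`females`, `high.males`, `top`) are assumptions made for this illustration.

```r
# Hypothetical reduced version of the data frame used above
all.students <- data.frame(
  student.names     = c("amar", "bindu", "chris", "don"),
  student.gender    = factor(c("male", "female", "female", "male")),
  student.eng.marks = c(51L, 52L, 53L, 55L),
  stringsAsFactors  = FALSE
)

# Refer to columns through the data frame rather than through global vectors
females <- all.students[all.students$student.gender == "female", ]

# Combine conditions with & (and) or | (or)
high.males <- all.students[all.students$student.gender == "male" &
                             all.students$student.eng.marks > 51, ]

# subset() is a base R convenience function that evaluates the condition
# inside the data frame; select picks the columns to keep
top <- subset(all.students, student.eng.marks >= 53,
              select = c(student.names, student.eng.marks))

females
high.males
top
```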
Matrix
- 2-dimensional arrangement (similar to data frames and spreadsheets), but homogeneous (unlike data frames): a matrix can store elements of one type only.
- Typically, matrices are used to store and process numeric data.
- rbind() binds vectors row-wise into a matrix. The variable names become the row names; the columns are unnamed and are printed with index labels such as [,1], [,2].
- cbind() binds vectors column-wise into a matrix. The variable names become the column names (head); the rows are unnamed and are printed with index labels such as [1,], [2,].
- We can customize the row names by assigning to rownames(): the matrix goes inside rownames() on the left-hand side of the assignment operator, and a character vector built with c() goes on the right-hand side.
- The str() function can be used to inspect the structure of the created matrix.
- dimnames (dimension names) is a list that contains the row names and the column names.
- A matrix can also be constructed with the matrix() function. The first argument is the data, for example an integer vector built with c(); the ncol and nrow arguments give the number of columns and rows respectively. No row or column names are assigned, so the printout shows index labels only.
- By default, matrix() arranges the elements column-wise: the first column is filled first, then the second, and so on. Set byrow = TRUE to fill row-wise instead.
- The techniques for extracting elements from a matrix are similar to those used for data frames.
Script for Matrix
- student.eng.marks <- c(61L, 62L, 63L, 65L) #integer vector
- student.maths.marks <- c(71L, 72L, 73L, 75L) #integer vector
- student.marks <- rbind (student.eng.marks, student.maths.marks) #row-wise matrix binding
- student.marks #print content of(row-wise)created matrix
- student.marks <- cbind (student.eng.marks, student.maths.marks) #column-wise matrix binding
- student.marks #print content of(column-wise)created matrix
- rownames(student.marks) <- c("amar", "bindu", "chris", "don") # customize row names for the column-bound matrix
- student.marks #print content of(column-wise)created matrix with customized row-names
- str(student.marks) #get structure of matrix with dimnames (cbind function)
- student.marks <- matrix(c(61L, 62L, 63L, 65L,71L, 72L, 73L, 75L), ncol=4, nrow=2)# matrix function to create matrix (default column-wise item arrangement)
- student.marks #print content of created matrix(default column-wise, c=4,r=2)
- student.marks <- matrix(c(61L, 62L, 63L, 65L,71L, 72L, 73L, 75L), nrow=4, ncol=2 ) #create matrix(default column-wise, r=4,c=2)
- student.marks #print content of created matrix(default column-wise, r=4,c=2)
- student.marks <- matrix(c(61L, 62L, 63L, 65L,71L, 72L, 73L, 75L), ncol=4, nrow=2, byrow= TRUE) #row wise item arrangement
- student.marks # print content of this row-wise matrix
- str(student.marks) #get structure of matrix without dimnames (matrix function)
- int [1:2, 1:4] 61 71 62 72 63 73 65 75
- rownames(student.marks) <- c("amar", "bindu") # assign row names to the two rows of this matrix
- student.marks
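Two points from the notes above, that dimnames is a list of row and column names and that a matrix is homogeneous, can be sketched as follows. The column labels `eng` and `maths` and the object `mixed` are names invented for this example.

```r
student.marks <- matrix(c(61L, 62L, 63L, 65L, 71L, 72L, 73L, 75L), nrow = 4, ncol = 2)

dimnames(student.marks)             # NULL: matrix() assigns no names by default

# dimnames is a list: row names first, column names second
dimnames(student.marks) <- list(
  c("amar", "bindu", "chris", "don"),
  c("eng", "maths")
)

student.marks["chris", "maths"]     # index by name instead of position
dim(student.marks)                  # number of rows, then number of columns
typeof(student.marks)               # all elements share one type

# Mixing types coerces every element to a single common type
mixed <- cbind(c(1, 2), c("a", "b"))
typeof(mixed)                       # the numbers have been converted to character
```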
Script for Matrix Operations
- student.eng.marks <- c(61L, 62L, 63L, 65L) #integer vector
- student.maths.marks <- c(71L, 72L, 73L, 75L) #integer vector
- student.marks <- cbind (student.eng.marks, student.maths.marks) #column-wise matrix binding
- rownames(student.marks) <- c("amar", "bindu", "chris", "don") # customize row names for the column-bound matrix
- student.marks #print content of(column-wise)created matrix with customized row-names
- str(student.marks) #get structure of matrix with dimnames (cbind function)
- #Subsetting or extracting elements from Matrix [Row, Column]
- student.marks[3,2] #extract single cell
- [1] 73
- student.marks[,] #extract all rows & all columns
- student.marks[,2] #extract all rows with specified column
- student.marks[2,] #extract specified row with all columns
- student.marks[2:4,1] #colon or sequence operator for consecutive rows
- student.marks[c(1,3), ] #combine function (integer vector)to extract individual rows and all columns
- student.marks[c(T,F,T,F), 2] #logical vector (combine function) to extract individual rows and specified column
- #Matrix Summary
- student.marks #print content of(column-wise)created matrix with customized row-names
- rowSums(student.marks) #Row-wise sum
- colSums(student.marks) #Column-wise sum
- colMeans(student.marks) #Column wise mean
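The summary functions above have a row-wise counterpart for means, and the base R function apply() generalizes them to any summary function. The sketch below is an illustration under the assumption of short column labels (`eng`, `maths`) rather than the variable names used in the script.

```r
student.marks <- cbind(eng = c(61L, 62L, 63L, 65L), maths = c(71L, 72L, 73L, 75L))
rownames(student.marks) <- c("amar", "bindu", "chris", "don")

row.means <- rowMeans(student.marks)          # mean mark per student (row-wise)
col.max   <- apply(student.marks, 2, max)     # MARGIN = 2: apply max to each column
row.range <- apply(student.marks, 1, range)   # MARGIN = 1: min and max per row

row.means
col.max
row.range    # one column per student; first row is the minimum, second the maximum
```

apply() returns its per-row results as columns, which is why `row.range` has one column per student.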