1 Introduction and preliminaries
- Exit repl:
q()
- Help: help(solve), ?solve, or help("[[")
- Launch help: help.start()
- Search help: help.search() or ??solve
- Run the examples of a help topic: example(topic)
Basic commands
- Elementary commands: expressions & assignments
- Commands are separated by ; or a newline. Commands can be grouped by { }. Comments start with #.
- Executing a file: source("<file>")
- Output to a file: sink("<file>"), restore: sink()
Data permanency
The entities that R creates and manipulates are objects.
Use objects() or ls() to list them. The collection of objects currently stored is called the workspace.
Use rm() to remove objects.
When the workspace is saved at the end of a session, objects are stored in .RData and the command history is stored in .Rhistory.
2 Simple manipulations; numbers and vectors
Vectors and assignment
Vector: an ordered collection of numbers.
# an assignment using the function c()
In most contexts the = operator can be used as an alternative to <-.
If an expression is used as a complete command, the value is printed and lost.
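A minimal sketch of the assignment forms described above. The variable names and values are only illustrative.

```r
# Three equivalent ways to create the same numeric vector.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
assign("y", c(10.4, 5.6, 3.1, 6.4, 21.7))
c(10.4, 5.6, 3.1, 6.4, 21.7) -> z

# c() also concatenates existing vectors with new values.
w <- c(x, 0, x)

# An expression used as a complete command is printed but not stored.
1 / x
```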
Vector arithmetic
Shorter vectors are recycled when occurring in the same expression. In particular a constant is simply repeated.
- Operators: +, -, *, /, ^
- Functions:
  - log, exp, sin, cos, tan, sqrt
  - max & min select the largest and smallest values across all their arguments; use pmax & pmin to select in parallel (element by element).
  - length
  - sum & prod
  - mean, var
  - sort, order, sort.list
To work with complex numbers, supply an explicit complex part: sqrt(-17+0i)
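The recycling rule, the difference between max and pmax, and the complex square root can be checked with a small sketch. The vectors here are made up for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 20)

# y is recycled three times to match x; the constant 1 is repeated six times.
v <- x + y + 1                 # 12 23 14 25 16 27

# max() collapses all of its arguments to a single value;
# pmax() compares element by element (3 is recycled).
overall  <- max(x, 3)          # 6
parallel <- pmax(x, 3)         # 3 3 3 4 5 6

# With an explicit complex part the square root is a complex number;
# sqrt(-17) on its own gives NaN with a warning.
root <- sqrt(-17 + 0i)
```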
Generating regular sequences
1:10
Arguments can be given in named form and mixed with positional arguments.
# replicate an object
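As a sketch of sequence generation with seq() and rep(), mixing positional and named arguments (the values are arbitrary):

```r
a <- 1:10
b <- seq(from = 1, to = 10)                  # same values as 1:10

# Positional from/to mixed with a named by.
s <- seq(-5, 5, by = 0.5)

# Named arguments may be given in any order.
n <- seq(length.out = 5, from = 0, by = 2)   # 0 2 4 6 8

# rep() replicates an object, either as a whole or element by element.
r1 <- rep(c(1, 2), times = 3)                # 1 2 1 2 1 2
r2 <- rep(c(1, 2), each = 3)                 # 1 1 1 2 2 2
```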
Logical vectors
Values: TRUE, FALSE, NA (not available)
Comparison operators: <, <=, >, >=, ==, !=
Logical operators: & (and), | (or)
temp <- x > 13
In ordinary arithmetic, TRUE is coerced to 1 and FALSE to 0.
Missing values
is.na(x): tests each element for NA or NaN and returns a logical vector of the same length as x.
x == NA is different from is.na(x): NA is a marker rather than a value, so x == NA returns a vector of NA values the same length as x.
NaN (not a number) values:
0/0 # NaN
Inf - Inf # NaN
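A short sketch contrasting is.na() with a comparison against NA. is.nan() is not mentioned in the notes; it is used here only to tell NaN apart from NA.

```r
x <- c(1:3, NA)

na_flags <- is.na(x)    # FALSE FALSE FALSE TRUE
compared <- x == NA     # NA NA NA NA: comparing with NA is always NA

z <- c(0/0, Inf - Inf, NA)
is.na(z)                # TRUE TRUE TRUE: is.na() is TRUE for both NA and NaN
is.nan(z)               # TRUE TRUE FALSE: is.nan() is TRUE only for NaN
```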
Character vectors
Character strings are entered using " or '. Escaping uses \.
See ?Quotes for the full list of escape sequences.
# concatenates arguments one by one into character strings
paste(c("x", "y"), 1:6, sep="") #> c("x1", "y2", "x3", ...)
# paste(..., collapse=<str>) joins the results into a single string.
Index vectors; selecting and modifying subsets of a data set
An index can be:
- A logical vector: works like a filter
x <- c(1:3, NA)
- A vector of positive integral quantities
x[1:10]
- A vector of negative integral quantities: specifies the values to be excluded
x[-(1:5)] # gives all but the first five
- A vector of character strings: applies where an object has a names attribute:
fruit <- c(5, 10, 1)
names(fruit) <- c("orange", "banana", "apple")
fruit[c("apple", "orange")]
An indexed expression can also appear on the receiving end of an assignment, in which case the assignment is performed only on those elements of the vector.
# same
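The index forms above, including assignment to an indexed expression, might look like this in practice. The data are invented for the example.

```r
x <- c(-2, 5, NA, -7, 3)

y <- x[!is.na(x)]        # logical index: keeps the non-missing values
first_two <- x[1:2]      # positive index: -2 5
rest <- x[-(1:2)]        # negative index: everything but the first two

# Assignment through an index changes only the selected elements.
x[is.na(x)] <- 0         # x is now -2 5 0 -7 3
x[x < 0] <- -x[x < 0]    # x is now  2 5 0  7 3, same effect as x <- abs(x)
```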
Other types of objects
- matrices
- factors
- lists: a vector whose elements needn’t be of the same type
- data frames: matrix-like structures whose columns can be of different types
- functions
3 Objects, their modes and attributes
- Intrinsic attributes: mode and length
Type, or mode, is one of numeric, complex, logical, character and raw.
Use mode(object), length(object), attributes(object) to examine.
- Changing the length of an object
e <- numeric()
- Getting and setting attributes
# attr(object, name)
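A sketch of changing an object's length and setting an attribute with attr(). The names alpha and z are illustrative.

```r
# Assigning beyond the current length extends the vector, padding with NA.
e <- numeric()
e[3] <- 17                     # e is now NA NA 17

# Truncating: keep a subset, or assign to length() directly.
alpha <- 1:10
alpha <- alpha[2 * 1:5]        # 2 4 6 8 10
length(alpha) <- 3             # 2 4 6

# Setting the dim attribute makes R treat z as a 10 by 10 matrix.
z <- 1:100
attr(z, "dim") <- c(10, 10)
mode(z)                        # "numeric"
```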
The class of an object
All objects in R have a class, reported by class(). For simple vectors this is just the mode, but “matrix” and “array” are other possible values.
unclass() temporarily removes the effects of class.
4 Ordered and unordered factors
A factor is a vector object that specifies a discrete classification (grouping) of the elements of other vectors of the same length.
- Use factor(<vector>) to create a factor.
- A factor has a levels attribute containing all unique elements.
Use tapply(<values>, <factor>, <func>) to apply a function to the values in each level. tapply can also be used to handle complicated indexing of a vector by multiple categories.
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
The ordered() function creates ordered factors but is otherwise identical to factor.
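A small sketch of factor(), tapply() and ordered(). The shortened state and incomes vectors are invented for illustration and are not the data used in these notes.

```r
state   <- c("nsw", "vic", "nsw", "qld", "vic", "nsw")
incomes <- c(60, 40, 50, 70, 44, 55)

statef <- factor(state)
levels(statef)                              # "nsw" "qld" "vic"

# Mean income within each level of the factor.
incmeans <- tapply(incomes, statef, mean)   # nsw 55, qld 70, vic 42

# Ordered factors carry an ordering of their levels, so comparisons work.
sizes <- ordered(c("low", "high", "mid", "low"),
                 levels = c("low", "mid", "high"))
sizes[1] < sizes[2]                         # TRUE
```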
5 Arrays and matrices
A vector can be used as an array only if it has a dimension vector as its dim attribute:
- dimension vector: a vector of non-negative integers.
dim(z) <- c(3,5,100)
Matrices are two-dimensional arrays.
If any index position is given an empty index vector, then the full range of that subscript is taken:
a[2,,]
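To make the dimension vector and empty-index behavior concrete, here is a sketch that assumes z holds the values 1 to 1500. Values are stored in column-major order, so the first subscript moves fastest.

```r
z <- 1:1500
dim(z) <- c(3, 5, 100)

z[1, 1, 1]          # 1
z[2, 1, 1]          # 2: the first subscript varies fastest
z[1, 2, 1]          # 4

# An empty index position takes the full range of that subscript.
slice <- z[2, , ]
dim(slice)          # 5 100
```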
Index matrices
Use a matrix as an index to extract an irregular collection of elements as a vector.
x <- array(1:20, dim=c(4,5))
- Negative indices are not allowed in index matrices.
- Rows containing NA produce NA.
- Rows containing a zero are ignored.
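A sketch of an index matrix in use: each row of i names one element of x by its (row, column) position. The index values follow the usual introductory example and are otherwise arbitrary.

```r
x <- array(1:20, dim = c(4, 5))

# A 3 by 2 index matrix whose rows are (1,3), (2,2) and (3,1).
i <- array(c(1:3, 3:1), dim = c(3, 2))

picked <- x[i]      # 9 6 3, i.e. x[1,3], x[2,2], x[3,1]

# The same index matrix can be used on the receiving end of an assignment.
x[i] <- 0
```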
cbind
table
array()
array(<dataVector>, <dimVector>)
Mixed vector and array arithmetic. The recycling rule
- The expression is scanned from left to right.
- Any short vector operands are extended by recycling their values until they match the size of any other operands.
- As long as short vectors and arrays only are encountered, the arrays must all have the same dim attribute or an error results.
- Any vector operand longer than a matrix or array operand generates an error.
- If array structures are present and no error or coercion to vector has been precipitated, the result is an array structure with the common dim attribute of its array operands.
The outer product of two arrays
The dimension vector of an outer product is obtained by concatenating the dimension vectors of the two operands, and its data vector is obtained by forming all possible products of elements of a with those of b.
ab <- a %o% b
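A minimal sketch of the outer product on two short vectors (made-up values). outer() is the function form of %o% and accepts other binary functions as well.

```r
a <- 1:3
b <- c(10, 20)

ab <- a %o% b
dim(ab)                    # 3 2: the lengths of a and b, concatenated
ab[3, 2]                   # 60, i.e. a[3] * b[2]

# The function form; any vectorized two-argument function can replace "*".
same <- outer(a, b, "*")
sums <- outer(a, b, "+")   # sums[i, j] is a[i] + b[j]
```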
Generalized transpose of an array
- aperm(a, perm): permutes the dimensions of a according to perm.
- t(a) & aperm(a, c(2,1)): transpose a matrix.
Matrix facilities
- t(X) is the matrix transpose function.
- nrow(A) & ncol(A) give the number of rows and columns of A.
Multiplication
A * B #> matrix of element-by-element products
crossprod(X, y) is the same as t(X) %*% y
diag(v)
- When v is a vector, gives a diagonal matrix with the elements of v on the diagonal.
- When v is a matrix, gives the vector of diagonal entries of v.
- When v is a single numeric value k, gives the k by k identity matrix.
Linear equations and inversion
b <- A %*% x
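The matrix operations above can be tied together in a sketch: element-wise and matrix products, crossprod(), and solving a linear system with solve(). The matrix A and vector x are arbitrary illustrative values.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # columns are (2, 1) and (1, 3)
x <- c(1, 2)

A * A                # element-by-element products
A %*% A              # matrix product

b <- A %*% x         # right-hand side of the system: (4, 7)

# solve(A, b) recovers x without forming the inverse explicitly,
# which is usually preferable to solve(A) %*% b.
x_hat <- solve(A, b)
A_inv <- solve(A)    # the inverse itself, when it is really needed

cp <- crossprod(A, x)   # same as t(A) %*% x
```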
Eigenvalues and eigenvectors
# calculates the eigenvalues and eigenvectors of a symmetric matrix
Singular value decomposition and determinants
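- svd(M): calculates the singular value decomposition of M.
- prod(svd(M)$d): the absolute value of the determinant of a square M; absdet() is a user-defined wrapper around this expression, not a built-in function.
- det(): calculates the determinant.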
Least squares fitting and the QR decomposition
- lsfit(): returns a list giving results of a least squares fitting procedure.
- ls.diag()
- lm(.)
- qr()
Forming partitioned matrices
- cbind(): forms matrices by binding matrices horizontally (column-wise).
- rbind(): forms matrices by binding matrices vertically (row-wise).
They respect the dim attribute.
They are the simplest ways to explicitly allow a vector to be treated as a column or row matrix.
The concatenation function, c(), with arrays
# Coerce an array back to a simple vector
Frequency tables from factors
table() allows frequency tables to be calculated from equal-length factors.
For a single factor, table(statef) is equivalent to tapply(statef, statef, length).
incomef <- factor(cut(incomes, breaks = 35+10*(0:7)))
6 Lists and data frames
Lists
A list is an object consisting of an ordered collection of objects known as its components.
Components can be of different types and are always numbered.
Lst <- list(name="Mondo", bf="ziyang", no.children=0, child.ages=c(101))
[[...]] is used to select a single element, whereas [...] selects a sublist of the list. If the list is named, the names are transferred to the sublist.
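A sketch of the difference between [[...]], $ and [...] on a named list. The component values are illustrative.

```r
Lst <- list(name = "Fred", wife = "Mary",
            no.children = 3, child.ages = c(4, 7, 9))

Lst[[1]]                    # "Fred": a single component, selected by position
Lst$name                    # "Fred": the same component, selected by name
Lst[["child.ages"]][2]      # 7: index into the selected component

# Single brackets return a sublist and keep the names.
sub <- Lst[c("name", "no.children")]
length(Lst)                 # 4 top-level components
```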
Data frames
A data frame is a list with class data.frame. For many purposes it may be regarded as a matrix with columns possibly of differing modes and attributes.
Restrictions:
- The components must be vectors, factors, numeric matrices, lists, or other data frames.
- Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables.
- Numeric vectors, logicals and factors are included as is. Character vectors were coerced to factors by default before R 4.0.0; since then they stay character unless stringsAsFactors = TRUE is given.
- Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same row size.
# create
A data frame can be used as a workspace: attach() places it on the search path so that its variables are visible by name, and detach() removes it.
attach allows not only directories and data frames to be attached, but other classes of object as well.
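A sketch of creating a data frame and using it as a workspace with attach() and detach(). The accountants data frame and its columns are hypothetical, and stringsAsFactors = TRUE is passed explicitly so the result does not depend on the R version.

```r
accountants <- data.frame(
  home = c("nsw", "vic", "qld"),
  loot = c(60, 40, 70),
  stringsAsFactors = TRUE      # character column becomes a factor
)

class(accountants)             # "data.frame"
is.list(accountants)           # TRUE: a data frame is a list underneath
is.factor(accountants$home)    # TRUE because of stringsAsFactors = TRUE

# While attached, the columns are visible by name on the search path.
attach(accountants)
mean_loot <- mean(loot)
detach(accountants)
```

Changes made to an attached copy do not flow back into the data frame, so it is usually safer to treat attach() as read-only access.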
Managing the search path
- search() shows the current search path.
- ls(<search_level>) examines the contents of any position on the search path.
7 Reading data from files
read.table()
The external file will normally have a special form:
- The first line of the file should have a name for each variable.
Each additional line has as its first item a row label and the values for each variable.
read.table(header=TRUE): specifies that the first line is a line of headings.
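A sketch of the file layout described above, read from an inline string through the text argument of read.table() so that no external file is needed. The column names, row labels and values are made up.

```r
# The header line has one field fewer than the data lines,
# so the first field of each data line is taken as the row label.
txt <- "Price Floor Area
h1 52.00 111 830
h2 54.75 128 710
h3 57.50 101 1000"

houses <- read.table(text = txt, header = TRUE)

names(houses)       # "Price" "Floor" "Area"
rownames(houses)    # "h1" "h2" "h3"
houses$Price[2]     # 54.75
```

With a real file, the first argument would be the file path instead of text = txt.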
scan
# The second arg is a dummy list that establishes the mode of the three vectors of data to be read.
Accessing builtin datasets
- data(): see the list of builtin datasets.
- data(<data_name>[, package="datasets"]): load a dataset.
Editing data
xnew <- edit(xold)
8 Probability distributions
R as a set of statistical tables
Distribution | R name | additional arguments |
---|---|---|
beta | beta | shape1, shape2, ncp |
binomial | binom | size, prob |
Cauchy | cauchy | location, scale |
chi-squared | chisq | df, ncp |
exponential | exp | rate |
F | f | df1, df2, ncp |
gamma | gamma | shape, scale |
geometric | geom | prob |
hypergeometric | hyper | m, n, k |
log-normal | lnorm | meanlog, sdlog |
logistic | logis | location, scale |
negative binomial | nbinom | size, prob |
normal | norm | mean, sd |
Poisson | pois | lambda |
signed rank | signrank | n |
Student’s t | t | df, ncp |
uniform | unif | min, max |
Weibull | weibull | shape, scale |
Wilcoxon | wilcox | m, n |
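Prefix the R name with d for the density, p for the CDF, q for the quantile function, and r for simulation.
- ptukey & qtukey: the studentized range distribution, which has only p and q functions.
- dmultinom & rmultinom: the multinomial distribution, which has only d and r functions.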
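A sketch of the d/p/q/r prefixes applied to the normal and binomial rows of the table. The seed and parameter values are arbitrary.

```r
d <- dnorm(0)                  # density of N(0, 1) at 0, equal to 1/sqrt(2*pi)
p <- pnorm(1.96)               # CDF: roughly 0.975
q <- qnorm(0.975)              # quantile function: roughly 1.96

set.seed(1)                    # fixed seed so the simulated draws are reproducible
draws <- rnorm(5, mean = 10, sd = 2)

# P(X <= 2) for X ~ Binomial(10, 0.5): (1 + 10 + 45) / 1024
pb <- pbinom(2, size = 10, prob = 0.5)
```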
Examining the distribution of a set of data
summary & fivenum & stem
# examine the numbers
# make stem-and-leaf plot and histogram
# plot the empirical cumulative distribution function
# Quantile-quantile plots
One- and two-sample tests
All “classical” tests are in package stats which is normally loaded.
A <- scan()
9 Grouping, loops and conditional execution
Grouped expressions
Commands may be grouped together in braces, {expr_1; ...; expr_m}, in which case the value of the group is the result of the last expression.
Control statement
Conditional execution: if statements
# expr_1 must evaluate to a single logical value
&& and || are “short-circuit” operators: they apply to vectors of length one and evaluate their second argument only if necessary, whereas & and | apply element-wise to vectors.
Vectorized version of if: ifelse(condition, a, b) returns a vector of the same length as condition, with elements a[i] if condition[i] is true, otherwise b[i].
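A sketch contrasting if with a short-circuit condition, the element-wise & operator, and ifelse(). The vector x is made up.

```r
x <- c(-2, 0, 3)

# if needs a single logical value; && stops as soon as the answer is known,
# so x[1] is only inspected when x is non-empty.
if (length(x) > 0 && x[1] < 0) {
  first <- "negative"
} else {
  first <- "non-negative"
}

# & works element by element and returns a vector.
inside <- x > -1 & x < 1                              # FALSE TRUE FALSE

# ifelse() picks element by element according to the condition.
signs <- ifelse(x >= 0, "non-negative", "negative")
```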
Repetitive execution: for loops, repeat and while
for
# `expr_1` is a vector expression
for (name in expr_1) expr_2
split()
repeat expr
while (condition) expr
- break: terminates any loop. It is the only way to terminate repeat loops.
- next: discontinues the current cycle and skips to the next one.
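The three loop forms, with next and break, in a minimal sketch. The counters are illustrative only.

```r
# for: iterate over the elements of a vector, skipping one with next.
total <- 0
for (i in 1:5) {
  if (i == 3) next
  total <- total + i
}
# total is 1 + 2 + 4 + 5 = 12

# repeat: runs until break is reached.
n <- 0
repeat {
  n <- n + 1
  if (n >= 4) break
}

# while: runs as long as the condition holds.
k <- 10
while (k > 1) k <- k / 2       # 10 -> 5 -> 2.5 -> 1.25 -> 0.625
```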
10 Writing your own functions
# calculate the two-sample t test
The ... argument
fun1 <- function(data, ...) {
Assignments within functions
Any ordinary assignments done within the function are local and temporary and are lost after exit from the function.
- evaluation frame
- global assignments with the superassignment operator <<- or assign()
Scope
The symbols which occur in the body of a function can be divided into:
- formal parameters: occurring in the arg list, their values are determined by binding the actual function arguments to the formal parameters.
- local variables: whose values are determined by the evaluation of expressions in the body of the function.
- free variables: not formal or local variables. Free variables become local variables if they are assigned to.
f <- function(x) {
Free variable bindings are resolved in the environment in which the function was created; this is called lexical scope.
<<- can change the value of free variables.
open.account <- function(total) {
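A sketch of lexical scope and <<- in the spirit of the bank-account example. This is a simplified version written for illustration, not a reproduction of the truncated listing above; the account names and amounts are made up.

```r
open.account <- function(total) {
  # total is a formal parameter here and a free variable inside the
  # functions below; <<- modifies it in this enclosing environment.
  list(
    deposit = function(amount) {
      if (amount <= 0) stop("Deposits must be positive!")
      total <<- total + amount
      invisible(total)
    },
    withdraw = function(amount) {
      if (amount > total) stop("You don't have that much money!")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

ross   <- open.account(100)
robert <- open.account(200)

ross$withdraw(30)
ross$deposit(50)
ross$balance()      # 120
robert$balance()    # 200: each account keeps its own total
```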
Customizing the environment
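Environment profiles:
- Rprofile.site: the site-wide profile in R_HOME/etc, or the file named by the R_PROFILE environment variable
- .Rprofile: the user profile in the working directory or home directory, or the file named by R_PROFILE_USER
Any function named .First() in either of the profiles or in the .RData image is automatically run at the beginning of a session.
.Last(), if defined, is executed at the end of the session.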
Classes, generic functions and object orientation
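The class of an object determines its behavior with generic functions.
- class attribute
- methods(class="data.frame"): lists the generic functions that have methods for a class.
- methods(plot): lists the methods available for a generic function.
- A call to UseMethod in a function body indicates that it is a generic function.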
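A sketch of how a generic dispatches on the class attribute through UseMethod(). The generic area and the class "circle" are hypothetical names invented for this example.

```r
# A generic function: its body only dispatches on the class of its argument.
area <- function(shape, ...) UseMethod("area")

# Method for objects whose class attribute includes "circle".
area.circle <- function(shape, ...) pi * shape$r^2

# Fallback used when no class-specific method exists.
area.default <- function(shape, ...) stop("don't know how to compute the area")

circ <- structure(list(r = 2), class = "circle")
a <- area(circ)        # dispatches to area.circle

class(circ)            # "circle"
unclass(circ)          # the underlying list, without the class attribute
```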
11 Statistical models in R
The ~ operator is used to define a model formula. The form for an ordinary linear model is:
response ~ op_1 term_1 op_2 term_2 ...
- response: a vector or matrix defining the response variable(s)
- op_i: an operator, either + or -, implying the inclusion or exclusion of a term.
- term_i: is either:
  - a vector or matrix expression, or 1
  - a factor
  - a formula expression consisting of factors, vectors or matrices connected by formula operators
The model formulae specify the columns of the model matrix.
Notations
Y ~ M # Y is modeled as M
Examples
# Simple linear regression model of y on x. The first has an implicit intercept term.
Contrasts
Linear models
fitted.model <- lm(formula, data = data.frame)
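A sketch of fitting linear models with lm(), using the built-in cars dataset (columns speed and dist) purely as a convenient example. The choice of dataset is an assumption of this sketch, not something the notes prescribe.

```r
# Simple linear regression of dist on speed, with the implicit intercept.
fit <- lm(dist ~ speed, data = cars)

# The same regression through the origin: "- 1" removes the intercept term.
fit0 <- lm(dist ~ speed - 1, data = cars)

coef(fit)            # named vector: (Intercept) and speed
summary(fit)         # coefficient table, residual summary, R-squared
anova(fit0, fit)     # compares the nested models
```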
Generic functions for extracting model info
anova(obj_1, obj_2)
12 Graphical procedures
High-level plotting commands
Designed to generate a complete plot of the data. They always start a new plot, erasing the current plot.
plot()
# If x and y are vectors, produces a scatterplot
Displaying multivariate data
pairs(X)
coplot()
Display graphics
# Distribution-comparison plots.
Low-level plotting commands
points(x, y)
Mathematical annotation
help(plotmath)
Interacting with graphics
locator(n, type)
Using graphics parameters
par()
Used to access and modify the list of graphics parameters for the current graphics device.
par() # Return a list of all graphics parameters.
Graphics parameters list
Graphical elements
# Character to be used for plotting points
Axes and tick marks
Axes have three main components:
- axis line
- tick marks
- tick labels
lab=c(5,7,12) # The desired number of tick intervals on the x and y axes and length of axis labels.
Figure margins
A single plot is known as a figure and comprises a plot region surrounded by margins.
mai=c(1, 0.5, 0.5, 0) # Widths of the bottom, left, top, right margins, in inches
Multiple figure environment
# Set the size of a multiple figure array
Device drivers
X11() # For unix
Multiple graphics devices
quartz()
13 Packages
- library(): see which packages are installed.
- library(name): load a package.
- install.packages() / update.packages(): install or update packages.
- search(): see which packages are currently loaded.
- loadedNamespaces(): display packages that may be loaded but are not available on the search list.
Namespaces
Packages have namespaces, which:
- allow the package writer to hide functions and data that are meant only for internal use.
- prevent functions from breaking when a user or another package picks a name that clashes.
- provide a way to refer to an object within a particular package.
For example, t() is the same as base::t().
::: allows access to hidden objects. Users are more likely to use getAnywhere().