The t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis.
A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic (under certain conditions) follows a Student's t-distribution. The t-test can be used, for example, to determine whether the means of two sets of data are significantly different from each other.
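For two independent groups, the statistic computed later in this lab with var.equal = TRUE can be written as below. This is a standard textbook form, added here for reference. The difference between the sample means is divided by its estimated standard error, and the pooled sample standard deviation $s_p$ stands in for the unknown population scaling term. Assuming normally distributed data with equal group variances, under the null hypothesis of equal means it follows a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

If the population standard deviation were known and used in place of $s_p$, the same ratio would follow a standard normal distribution.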
Calculate the mean anxiety for the picture group using R.
By hand, plug the mean anxiety value for the picture group into the line equation.
Tip: ignore the error term, and remember that the picture group was given the group number zero.
What is the intercept value for the line? (b0)
What does the intercept represent in terms of the group data?
Create the dataframe using the following commands:
Group <- gl(2, 12, labels = c("Picture", "Real Spider"))
Anxiety <- c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50, 40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39 )
spiderLong <- data.frame(Group, Anxiety)
Calculate the mean using the group name ("Picture") to select the picture data, and the data frame column number (2nd column) to select the anxiety column:
meanPicture <- mean(spiderLong[spiderLong$Group == "Picture", 2])
meanPicture
Or using the dataframe column name ("Anxiety"):
meanPicture <- mean(spiderLong[spiderLong$Group == "Picture", "Anxiety"])
meanPicture
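Base R's tapply() offers an optional alternative that is not part of the original exercise. It computes the mean of every group in one call. The sketch below recreates the data so that it runs on its own. The name groupMeans is illustrative.

```r
# Recreate the spider data (same values as in the exercise)
Group <- gl(2, 12, labels = c("Picture", "Real Spider"))
Anxiety <- c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50,
             40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)
spiderLong <- data.frame(Group, Anxiety)

# tapply() splits Anxiety by the levels of Group and applies mean() to each part
groupMeans <- tapply(spiderLong$Anxiety, spiderLong$Group, mean)
groupMeans
```

The result is named by factor level, so groupMeans["Picture"] gives the same value as meanPicture.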
Complete the line equation for the picture group, using group number zero and the mean anxiety value for the picture group:
$$\text{Anxiety}_0 = b_0 + b_1 \times 0$$
$$\text{Anxiety}_0 = \text{mean anxiety for the picture group} = b_0 + 0$$
$$\text{mean anxiety for the picture group} = b_0$$
$$b_0 = 40$$
b0, the intercept, represents the mean anxiety of the picture-only group (the baseline group, coded 0).
Now calculate the mean anxiety for the real spider group using R.
By hand, plug the mean anxiety value for the real-spider group into the line equation.
Tip: ignore the error term, and remember that the real-spider group was given the group number one.
What is the slope of the line? (b1)
What does it represent in terms of the two conditions?
Calculate the mean of the real-spider data:
meanRealSpider <- mean(spiderLong[spiderLong$Group == "Real Spider", "Anxiety"])
meanRealSpider
$$\text{Anxiety}_1 = b_0 + b_1 \times 1$$
$$\text{Anxiety}_1 = \text{mean anxiety for the real-spider group} = 40 + b_1 \times 1$$
$$\text{mean anxiety for the real-spider group} = \text{mean anxiety for the picture group} + b_1$$
$$47 = 40 + b_1$$
$$b_1 = 47 - 40 = 7$$
$$b_1 = \text{mean}_{\text{real spider}} - \text{mean}_{\text{picture}} = 7$$
b1, the slope, represents the difference between the real-spider group mean and the baseline (the picture-only group) mean.
Write, by hand, the full equation for the anxiety model, including the b coefficients.
$$\text{Anxiety}_i = b_0 + b_1\,\text{Group}_i$$
$$\text{Anxiety}_i = 40 + 7\,\text{Group}_i$$
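You can check the hand calculation in R with the sketch below. It assumes the same 0/1 coding as above (Picture = 0, Real Spider = 1). The names groupCode, b0, b1 and predicted are illustrative.

```r
# Recreate the spider data (same values as in the exercise)
Group <- gl(2, 12, labels = c("Picture", "Real Spider"))
Anxiety <- c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50,
             40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)
spiderLong <- data.frame(Group, Anxiety)

# Dummy-code the groups: Picture = 0, Real Spider = 1
groupCode <- as.numeric(spiderLong$Group == "Real Spider")

# Coefficients derived from the group means
b0 <- mean(spiderLong$Anxiety[groupCode == 0])       # mean of the baseline (picture) group
b1 <- mean(spiderLong$Anxiety[groupCode == 1]) - b0  # difference between the group means

# Predicted anxiety for every participant, ignoring the error term
predicted <- b0 + b1 * groupCode
unique(predicted)
```

The model predicts the group mean for every member of a group, which is why b0 and b1 can be read straight from the two means.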
Run a linear regression model over the data, using anxiety as the dependent variable, and the group as the independent variable.
spider_lm <- lm(spiderLong$Anxiety ~ spiderLong$Group)
spider_lm
Examine the model:
summary(spider_lm)
Is the difference in anxiety significant between the picture-only and the real-spider groups?
- No. With alpha set at 5%, the difference between the groups is not significant: the p-value for the slope (spiderLong$GroupReal Spider) is greater than 0.05.
- Note: decide your significance threshold (alpha) before collecting your data.
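One way to read the slope's p-value programmatically is to index the coefficient table returned by summary(). You can then compare it with the alpha fixed in advance. This is a minimal sketch that is not part of the original exercise. The names alpha, coefTable and pSlope are illustrative.

```r
# Recreate the spider data and the model from the exercise
Group <- gl(2, 12, labels = c("Picture", "Real Spider"))
Anxiety <- c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50,
             40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)
spiderLong <- data.frame(Group, Anxiety)
spider_lm <- lm(spiderLong$Anxiety ~ spiderLong$Group)

# Significance threshold chosen before looking at the data
alpha <- 0.05

# Rows are model terms; columns are Estimate, Std. Error, t value, Pr(>|t|)
coefTable <- summary(spider_lm)$coefficients
pSlope <- coefTable["spiderLong$GroupReal Spider", "Pr(>|t|)"]
pSlope
pSlope < alpha  # FALSE: the group difference is not significant at the 5% level
```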
Now run an independent t-test over the data, as you learnt in RMS1 lab 14 (revise it if needed). Examine and interpret the output. You can pass the data either as two vectors (x versus y) or, as in the command below, as a formula together with a data frame:
t.test(Anxiety ~ Group, data = spiderLong, paired = FALSE, var.equal = TRUE, conf.level = 0.95)
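The sketch below shows the two-vector (x versus y) form of the same test. It also adds a check that is not in the original exercise: Student's t-test (var.equal = TRUE) gives the same result as the slope in the regression model. Here lm() is refitted with the data = argument, so the slope coefficient is named "GroupReal Spider" rather than "spiderLong$GroupReal Spider". The names picture, realSpider, tt and slopeRow are illustrative.

```r
# Recreate the spider data (same values as in the exercise)
Group <- gl(2, 12, labels = c("Picture", "Real Spider"))
Anxiety <- c(30, 35, 45, 40, 50, 35, 55, 25, 30, 45, 40, 50,
             40, 35, 50, 55, 65, 55, 50, 35, 30, 50, 60, 39)
spiderLong <- data.frame(Group, Anxiety)

# Two-vector form: x = picture scores, y = real-spider scores
picture <- spiderLong$Anxiety[spiderLong$Group == "Picture"]
realSpider <- spiderLong$Anxiety[spiderLong$Group == "Real Spider"]
tt <- t.test(x = picture, y = realSpider, paired = FALSE,
             var.equal = TRUE, conf.level = 0.95)
tt

# The same comparison as a linear model
spider_lm <- lm(Anxiety ~ Group, data = spiderLong)
slopeRow <- summary(spider_lm)$coefficients["GroupReal Spider", ]

# The t statistics have the same size but opposite signs, because t.test()
# computes picture minus real spider while the slope is real spider minus picture.
# The p-values are identical.
c(t_test = unname(tt$statistic), lm_slope = unname(slopeRow["t value"]))
c(t_test = tt$p.value, lm_slope = unname(slopeRow["Pr(>|t|)"]))
```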
Simple linear regression: A simple linear regression is a linear regression model with a single explanatory variable.
[From M. Crawley, "The R Book" (second edition), Wiley, p. 450] This example exercise uses data about the growth of caterpillars fed on experimental diets containing increasing quantities of tannin.
Create a dataframe using the command:
reg.data <- data.frame(growth=c(12,10,8,11,6,7,2,3,3),
tannin=c(0,1,2,3,4,5,6,7,8))
Print the data and plot a scatterplot:
reg.data
plot(reg.data)
Calculate the slope and intercept to two decimal places using the lm() function. Tip: decide which variable you want to use as the dependent variable: growth or tannin? Here we are measuring growth and observing how tannin influences it, so growth is the dependent variable (often called y) and tannin is the independent variable (often called x):
model <- lm(reg.data$growth ~ reg.data$tannin)
model
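The sketch below checks the lm() output by computing the least-squares slope and intercept by hand, using the standard formulas for simple linear regression. The short names x, y, b0 and b1 are illustrative.

```r
# Recreate the caterpillar data (same values as in the exercise)
reg.data <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
                       tannin = c(0, 1, 2, 3, 4, 5, 6, 7, 8))
x <- reg.data$tannin
y <- reg.data$growth

# Least-squares slope: covariance of x and y divided by the variance of x
b1 <- cov(x, y) / var(x)
# The fitted line passes through (mean(x), mean(y)), which fixes the intercept
b0 <- mean(y) - b1 * mean(x)

round(c(intercept = b0, slope = b1), 2)
```

Rounded to two decimal places, these match the coefficients printed by lm(): an intercept of 11.76 and a slope of -1.22.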
Write the line equation for the linear model obtained with lm():
$$y \approx 11.76 - 1.22\,x$$
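To use the equation, plug in a tannin value. This minimal sketch takes the unrounded coefficients from coef(). The value tannin = 4 is an arbitrary choice; it happens to be the mean tannin level, which shows that the least-squares line passes through the point of means. The names b and growthAt4 are illustrative.

```r
# Recreate the caterpillar data and the model from the exercise
reg.data <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
                       tannin = c(0, 1, 2, 3, 4, 5, 6, 7, 8))
model <- lm(reg.data$growth ~ reg.data$tannin)

b <- coef(model)  # b[1] = intercept, b[2] = slope

# Predicted growth at tannin = 4 (the mean tannin level)
growthAt4 <- unname(b[1] + b[2] * 4)
growthAt4
mean(reg.data$growth)  # same value: the line passes through (mean x, mean y)
```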
Plot the linear model over the scatterplot. Tip: use the abline() function.
plot(x = reg.data$tannin, y = reg.data$growth)
abline(model, col="red")
How can you extract more information about your linear model?
Tip: try to jot down a few possible commands now.
You can use the function summary():
summary(model)
And you could use the attributes() function to list the elements present inside the model object:
attributes(model)
Once you know the attribute name, you can get each attribute value using the dollar notation, for example:
model$residuals
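As an illustration of what model$residuals contains, each residual is the observed growth minus the growth predicted by the line (model$fitted.values). The sketch below checks this. The name manualResiduals is illustrative.

```r
# Recreate the caterpillar data and the model from the exercise
reg.data <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
                       tannin = c(0, 1, 2, 3, 4, 5, 6, 7, 8))
model <- lm(reg.data$growth ~ reg.data$tannin)

# Residual = observed value - fitted (predicted) value
manualResiduals <- reg.data$growth - model$fitted.values
all.equal(unname(manualResiduals), unname(model$residuals))
```

For a least-squares fit with an intercept, the residuals also sum to (numerically) zero.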
You can also access the elements of the model summary using the dollar notation. To list them:
attributes(summary(model))
So you can get, for example, the proportion of variance accounted for by the model using the notation:
summary(model)$r.squared
Get the proportion of variance accounted for by your linear model (R squared) from the model object in R. Is the model a good model?
summary(model)$r.squared
The model accounts for about 82% of the variance in the data (R squared is approximately 0.82), but we need more information to judge the model. Examples include the model diagnostic plots (to check whether the assumptions are upheld), the F ratio for the regression, and the confidence intervals for the slope and intercept.
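R squared can also be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. In a simple regression it also equals the squared correlation between x and y. The sketch below computes both and finishes with confint(), which gives the confidence intervals mentioned above. The names SST, SSR and rSquared are illustrative.

```r
# Recreate the caterpillar data and the model from the exercise
reg.data <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
                       tannin = c(0, 1, 2, 3, 4, 5, 6, 7, 8))
model <- lm(reg.data$growth ~ reg.data$tannin)
y <- reg.data$growth

SST <- sum((y - mean(y))^2)     # total sum of squares
SSR <- sum(residuals(model)^2)  # residual sum of squares
rSquared <- 1 - SSR / SST
rSquared

cor(reg.data$tannin, y)^2       # same value for a simple linear regression

# 95% confidence intervals for the intercept and slope
confint(model)
```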
Use anova() on the model to find the values of SSR (the residual sum of squares) and SSM (the model sum of squares).
ANOVA and linear regression are both forms of the same underlying general linear model.
You'll later see that one or the other can be more practical.
anova(model)
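The two sums of squares in the ANOVA table can be reproduced by hand. SSM measures how far the fitted values are from the mean of y. SSR measures how far the observations are from the fitted values. A minimal sketch follows; the names SSM and SSR mirror the text, and y is illustrative.

```r
# Recreate the caterpillar data and the model from the exercise
reg.data <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
                       tannin = c(0, 1, 2, 3, 4, 5, 6, 7, 8))
model <- lm(reg.data$growth ~ reg.data$tannin)
y <- reg.data$growth

SSM <- sum((fitted(model) - mean(y))^2)  # model sum of squares
SSR <- sum(residuals(model)^2)           # residual sum of squares
c(SSM = SSM, SSR = SSR)

# The "Sum Sq" column of the ANOVA table lists the same two values, model row first
anova(model)[["Sum Sq"]]
```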
The ANOVA output also includes the F ratio. As an exercise, calculate the F ratio yourself from the mean squares. First, list the elements of the ANOVA table:
attributes(anova(model))
The data can be accessed using the row and column names, or using the row and column numbers:
anova(model)["reg.data$tannin","Mean Sq"]
anova(model)[1, 3]
anova(model)["Residuals","Mean Sq"]
anova(model)[2, 3]
Now you can calculate the F-ratio:
MSM <- anova(model)["reg.data$tannin","Mean Sq"]
MSR <- anova(model)["Residuals","Mean Sq"]
F_ratio <- MSM/MSR
F_ratio
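The p-value reported in the ANOVA table can be obtained from the F ratio and its two degrees of freedom. It is the upper-tail probability of the F distribution, which pf() gives with lower.tail = FALSE. This is a minimal sketch. The names dfModel, dfResidual and pValue are illustrative.

```r
# Recreate the caterpillar data and the model from the exercise
reg.data <- data.frame(growth = c(12, 10, 8, 11, 6, 7, 2, 3, 3),
                       tannin = c(0, 1, 2, 3, 4, 5, 6, 7, 8))
model <- lm(reg.data$growth ~ reg.data$tannin)

MSM <- anova(model)["reg.data$tannin", "Mean Sq"]
MSR <- anova(model)["Residuals", "Mean Sq"]
F_ratio <- MSM / MSR

dfModel <- anova(model)["reg.data$tannin", "Df"]  # 1 for a single predictor
dfResidual <- anova(model)["Residuals", "Df"]     # n - 2 = 7

# Probability of an F value at least this large if the slope were really zero
pValue <- pf(F_ratio, dfModel, dfResidual, lower.tail = FALSE)
pValue
anova(model)["reg.data$tannin", "Pr(>F)"]  # same value, read from the table
```

The same F ratio also appears at the bottom of summary(model).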
Material adapted from:
• Discovering Statistics Using R. Authors: A. Field, J. Miles, Z. Field. Publisher: Sage, 2012.
• The R Book (second edition). Author: Michael J. Crawley. Publisher: Wiley, 2013.
• Research Methods and Statistics 2. Authors: Tom Booth and Alex Doumas, 2018.