Lab 3: Multiple regression

PS 1

In this lab you will build a linear model that uses both advertising budget and radio airplay to predict album sales. First we fit a model to a set of known values of sales, advertising and airplay, which gives us estimates of b0, b1 and b2 (the intercept and the two slopes). To predict sales for a particular album we then need its advertising budget and the number of times it was played on the radio. With these five values (b0, b1, b2, and the album's advertising and airplay) we can predict how much the album will sell.
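
As a minimal sketch of that last step, the prediction is the intercept plus each slope multiplied by its predictor. The coefficient values and the album's figures below are made-up placeholders, not estimates from the lab data.

```r
# Hypothetical coefficients (placeholders, not fitted from sales.dat)
b0 <- 40      # intercept
b1 <- 0.09    # slope for adverts
b2 <- 3.5     # slope for airplay

# Hypothetical album: advertising budget and number of radio plays
adverts <- 500
airplay <- 30

# Linear prediction: b0 + b1 * adverts + b2 * airplay
predicted_sales <- b0 + b1 * adverts + b2 * airplay
print(predicted_sales)
```

In the lab itself the three coefficients come from the fitted model rather than being typed in by hand.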

In [3]:
sales <- read.delim("path/sales.dat")
print(sales)
     adverts     sales  airplay attract
1     10.256       330       43      10
2    985.685       120       28       7
3   1445.563       360       35       7
4   1188.193       270       33       7
5    574.513       220       44       5
6    568.954       170       19       5
7    471.814        70       20       1
8    537.352       210       22       9
9    500.000       200       NA      NA
10   514.068       200       21       7
11   174.093       300       40       7
12  1720.806       290       32       7
13   611.479        70       20       2
14   251.192       150       24       8
15    97.972       190       38       6
16   406.814       240       24       7
17   265.398       100       25       5
18  1323.287       250       35       5
19   950.982       240 99999999       4
20   196.650       210       36       8
21  1326.598       280       27       8
22  1380.689       230       33       8
23   792.345       210       33       7
24   957.167       230       28       6
25  1789.659       320       30       9
26   656.137       210       34       7
27   613.697       230       49       7
28   313.362       250       40       8
29   313.000 249albums       39       7
30   336.510        60       20       4
31  1544.899       330       42       7
32    68.954       150       35       8
33   785.692       150        8       6
34   125.628       180       49       7
35   377.925        80       19       8
36   217.994       180       42       6
37   759.862       130        6       7
38  1163.444       320       36       6
39   842.957       280       32       7
40   125.179       200       28       6
41   236.598       130       25       8
42   669.811       190       34       8
43   612.234       150       21       6
44   922.019       230       34       7
45    50.000       310       63       7
46  2000.000       340       31       7
47  1054.027       240       25       7
48   385.045       180       42       7
49  1507.972       220       37       7
50   102.568        40       25       8
51   204.568       190       26       7
52  1170.918       290       39       7
53   689.547       340       46       7
54   784.220       250       36       6
55   405.913       190       12       4
56   179.778       120        2       8
57   607.258       230       29       8
58  1542.329       190       33       8
59  1112.470       210       28       7
60   856.985       170       10       6
61   836.331       310       38       7
62   236.908        90       19       4
63  1077.855       140       13       6
64   579.321       300       30       7
65  1500.000       340       38       8
66   731.364       170       22       8
67    25.689       100       23       6
68   391.749       200       22       9
69   233.999        80       20       7
70   275.700       100       18       6
71    56.895        70       37       7
72   255.117        50       16       8
73   566.501       240       32       8
74   102.568       160       26       5
75   250.568       290       53       9
76    68.594       140       28       7
77   642.786       210       32       7
78  1500.000       300       24       7
79   102.563       230       37       6
80   756.984       280       30       8
81    51.229       160       19       7
82   644.151       200       47       6
83    15.313       110       22       5
84   243.237       110       10       8
85   256.894        70        1       4
86    22.464       100        1       6
87    45.689       190       39       6
88   724.938        70        8       5
89  1126.461       360       38       7
90  1985.119       360       35       5
91  1837.516       300       40       5
92   135.986       120       22       7
93   237.703       150       27       8
94   976.641       220       31       6
95  1452.689       280       19       7
96  1600.000       300       24       9
97   268.598       140        1       7
98   900.889       290       38       8
99   982.063       180       26       6
100  201.356       140       11       6
101  746.024       210       34       6
102 1132.877       250       55       7
103 1000.000       250        5       7
104   75.896       120       34       6
105 1351.254       290       37       9
106  202.705        60       13       8
107  365.985       140       23       6
108  305.268       290       54       6
109  263.268       160       18       7
110  513.694       100        2       7
111  152.609       160       11       6
112   35.987       150       30       8
113  102.568       140       22       7
114  215.368       230       36       6
115  426.784       230       37       8
116  507.772        30        9       3
117  233.291        80        2       7
118 1035.433       190       12       8
119  102.642        90        5       9
120  526.142       120       14       7
121  624.538       150       20       5
122  912.349       230       57       6
123  215.994       150       19       8
124  561.963       210       35       7
125  474.760       180       22       5
126  231.523       140       16       7
127  678.596       360       53       7
128   70.922        10        4       6
129 1567.548       240       29       6
130  263.598       270       43       7
131 1423.568       290       26       7
132  715.678       220       28       7
133  777.237       230       37       8
134  509.430       220       32       5
135  964.110       240       34       7
136  583.627       260       30       7
137  923.373       170       15       7
138  344.392       130       23       7
139 1095.578       270       31       8
140  100.025       140       21       5
141   30.425        60       28       1
142 1080.342       210       18       7
143  799.899       210       28       7
144 1071.752       240       37       8
145  893.355       210       26       6
146  283.161       200       30       8
147  917.017       140       10       7
148  234.568        90       21       7
149  456.897       120       18       9
150  206.973       100       14       7
151 1294.099       360       38       7
152  826.859       180       36       6
153  564.158       150       32       7
154  192.607       110        9       5
155   10.652        90       39       5
156   45.689       160       24       7
157   42.568       230       45       7
158   20.456        40       13       8
159  635.192        60       17       6
160 1002.273       230       32       7
161 1177.047       230       23       6
162  507.638       120        0       6
163  215.689       150       35       5
164  526.480       120       26       6
165   26.895        60       19       6
166  883.877       280       26       7
167    9.104       120       53       8
168  103.568       230       29       8
169  169.583       230       28       7
170  429.504        40       17       6
171  223.639       140       26       8
172  145.585       360       42       8
173  985.968       210       17       6
174  500.922       260       36       8
175  226.652       250       45       7
176 1051.168       200       20       7
177   68.093       150       15       7
178 1547.159       250       28       8
179  393.774       100       27       6
180  804.282       260       17       8
181  801.577       210       32       8
182  450.562       290       46       9
183   26.598       220       47       8
184  179.061        70       19       1
185  345.687       110       22       8
186  295.840       250       55       9
187 2271.860       320       31       5
188 1134.575       300       39       8
189  601.434       180       21       6
190   45.298       180       36       6
191  759.518       200       21       7
192  832.869       320       44       7
193   56.894       140       27       7
194  709.399       100       16       6
195   56.895       120       33       6
196  767.134       230       33       8
197  503.172       150       21       7
198  700.929       250       35       9
199  910.851       190       26       7
200  888.569       240       14       6
201  800.615       250       34       6
202 1500.000       230       11       8
203  785.694       110       20       9

Examine your data. Are there data points that require special attention? How can you find out?

The function summary() is a good starting point.

In [5]:
summary(sales)
    adverts            sales              airplay            attract      
 Min.   :   9.104   Length:203         Min.   :       0   Min.   : 1.000  
 1st Qu.: 216.994   Class :character   1st Qu.:      20   1st Qu.: 6.000  
 Median : 526.480   Mode  :character   Median :      28   Median : 7.000  
 Mean   : 614.022                      Mean   :  495077   Mean   : 6.757  
 3rd Qu.: 911.600                      3rd Qu.:      36   3rd Qu.: 8.000  
 Max.   :2271.860                      Max.   :99999999   Max.   :10.000  
                                       NA's   :1          NA's   :1       

The output shows that sales is stored as character (it should be an integer, since it is the number of albums sold), and that the airplay and attract columns contain NA values. Also, the maximum value of airplay looks suspiciously large (it’s unlikely that any song has been played on the radio almost 100 million times in a week, even if it may feel like it for some pop hits).

Check if your data frame contains NA values.

To find which rows contain NA values you can use the command:

In [8]:
which(is.na(sales$airplay))
9
In [9]:
which(is.na(sales$attract))
9
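
Checking one column at a time gets tedious with wider data frames. The sketch below, on a small made-up data frame (the name toy and its values are illustrative only), counts the NA values in every column at once and then lists the rows that contain at least one NA.

```r
# Toy data frame (made up for illustration) with missing values
toy <- data.frame(
  adverts = c(10.2, 985.6, 500.0, 514.0),
  airplay = c(43L, 28L, NA, 21L),
  attract = c(10L, 7L, NA, 7L)
)

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Positions of rows that contain at least one NA
rows_with_na <- which(!complete.cases(toy))
print(rows_with_na)
```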

To remove a row by index:

In [10]:
sales <- sales[-9, ]

To find and remove all rows containing NA values (all in one go):

In [11]:
sales <- sales[complete.cases(sales), ]
print(sales)
     adverts     sales  airplay attract
1     10.256       330       43      10
2    985.685       120       28       7
3   1445.563       360       35       7
4   1188.193       270       33       7
5    574.513       220       44       5
6    568.954       170       19       5
7    471.814        70       20       1
8    537.352       210       22       9
10   514.068       200       21       7
11   174.093       300       40       7
12  1720.806       290       32       7
13   611.479        70       20       2
14   251.192       150       24       8
15    97.972       190       38       6
16   406.814       240       24       7
17   265.398       100       25       5
18  1323.287       250       35       5
19   950.982       240 99999999       4
20   196.650       210       36       8
21  1326.598       280       27       8
22  1380.689       230       33       8
23   792.345       210       33       7
24   957.167       230       28       6
25  1789.659       320       30       9
26   656.137       210       34       7
27   613.697       230       49       7
28   313.362       250       40       8
29   313.000 249albums       39       7
30   336.510        60       20       4
31  1544.899       330       42       7
32    68.954       150       35       8
33   785.692       150        8       6
34   125.628       180       49       7
35   377.925        80       19       8
36   217.994       180       42       6
37   759.862       130        6       7
38  1163.444       320       36       6
39   842.957       280       32       7
40   125.179       200       28       6
41   236.598       130       25       8
42   669.811       190       34       8
43   612.234       150       21       6
44   922.019       230       34       7
45    50.000       310       63       7
46  2000.000       340       31       7
47  1054.027       240       25       7
48   385.045       180       42       7
49  1507.972       220       37       7
50   102.568        40       25       8
51   204.568       190       26       7
52  1170.918       290       39       7
53   689.547       340       46       7
54   784.220       250       36       6
55   405.913       190       12       4
56   179.778       120        2       8
57   607.258       230       29       8
58  1542.329       190       33       8
59  1112.470       210       28       7
60   856.985       170       10       6
61   836.331       310       38       7
62   236.908        90       19       4
63  1077.855       140       13       6
64   579.321       300       30       7
65  1500.000       340       38       8
66   731.364       170       22       8
67    25.689       100       23       6
68   391.749       200       22       9
69   233.999        80       20       7
70   275.700       100       18       6
71    56.895        70       37       7
72   255.117        50       16       8
73   566.501       240       32       8
74   102.568       160       26       5
75   250.568       290       53       9
76    68.594       140       28       7
77   642.786       210       32       7
78  1500.000       300       24       7
79   102.563       230       37       6
80   756.984       280       30       8
81    51.229       160       19       7
82   644.151       200       47       6
83    15.313       110       22       5
84   243.237       110       10       8
85   256.894        70        1       4
86    22.464       100        1       6
87    45.689       190       39       6
88   724.938        70        8       5
89  1126.461       360       38       7
90  1985.119       360       35       5
91  1837.516       300       40       5
92   135.986       120       22       7
93   237.703       150       27       8
94   976.641       220       31       6
95  1452.689       280       19       7
96  1600.000       300       24       9
97   268.598       140        1       7
98   900.889       290       38       8
99   982.063       180       26       6
100  201.356       140       11       6
101  746.024       210       34       6
102 1132.877       250       55       7
103 1000.000       250        5       7
104   75.896       120       34       6
105 1351.254       290       37       9
106  202.705        60       13       8
107  365.985       140       23       6
108  305.268       290       54       6
109  263.268       160       18       7
110  513.694       100        2       7
111  152.609       160       11       6
112   35.987       150       30       8
113  102.568       140       22       7
114  215.368       230       36       6
115  426.784       230       37       8
116  507.772        30        9       3
117  233.291        80        2       7
118 1035.433       190       12       8
119  102.642        90        5       9
120  526.142       120       14       7
121  624.538       150       20       5
122  912.349       230       57       6
123  215.994       150       19       8
124  561.963       210       35       7
125  474.760       180       22       5
126  231.523       140       16       7
127  678.596       360       53       7
128   70.922        10        4       6
129 1567.548       240       29       6
130  263.598       270       43       7
131 1423.568       290       26       7
132  715.678       220       28       7
133  777.237       230       37       8
134  509.430       220       32       5
135  964.110       240       34       7
136  583.627       260       30       7
137  923.373       170       15       7
138  344.392       130       23       7
139 1095.578       270       31       8
140  100.025       140       21       5
141   30.425        60       28       1
142 1080.342       210       18       7
143  799.899       210       28       7
144 1071.752       240       37       8
145  893.355       210       26       6
146  283.161       200       30       8
147  917.017       140       10       7
148  234.568        90       21       7
149  456.897       120       18       9
150  206.973       100       14       7
151 1294.099       360       38       7
152  826.859       180       36       6
153  564.158       150       32       7
154  192.607       110        9       5
155   10.652        90       39       5
156   45.689       160       24       7
157   42.568       230       45       7
158   20.456        40       13       8
159  635.192        60       17       6
160 1002.273       230       32       7
161 1177.047       230       23       6
162  507.638       120        0       6
163  215.689       150       35       5
164  526.480       120       26       6
165   26.895        60       19       6
166  883.877       280       26       7
167    9.104       120       53       8
168  103.568       230       29       8
169  169.583       230       28       7
170  429.504        40       17       6
171  223.639       140       26       8
172  145.585       360       42       8
173  985.968       210       17       6
174  500.922       260       36       8
175  226.652       250       45       7
176 1051.168       200       20       7
177   68.093       150       15       7
178 1547.159       250       28       8
179  393.774       100       27       6
180  804.282       260       17       8
181  801.577       210       32       8
182  450.562       290       46       9
183   26.598       220       47       8
184  179.061        70       19       1
185  345.687       110       22       8
186  295.840       250       55       9
187 2271.860       320       31       5
188 1134.575       300       39       8
189  601.434       180       21       6
190   45.298       180       36       6
191  759.518       200       21       7
192  832.869       320       44       7
193   56.894       140       27       7
194  709.399       100       16       6
195   56.895       120       33       6
196  767.134       230       33       8
197  503.172       150       21       7
198  700.929       250       35       9
199  910.851       190       26       7
200  888.569       240       14       6
201  800.615       250       34       6
202 1500.000       230       11       8
203  785.694       110       20       9
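
Note that the printout jumps from row 8 to row 10: removing rows does not renumber the row labels. The sketch below uses a small made-up data frame (toy is an illustrative name) to show that positions and labels can disagree after rows are dropped, and one possible way to reset the labels.

```r
# Toy data frame (made up for illustration); row 2 has a missing value
toy <- data.frame(x = c(1.5, NA, 3.5, 4.5), y = c(10L, 20L, 30L, 40L))

# Keep only complete rows; the original row labels are preserved
toy_clean <- toy[complete.cases(toy), ]
labels_before <- rownames(toy_clean)
print(labels_before)

# Position 2 in the cleaned data now holds the row labelled "3"
print(toy_clean[2, ])

# Optional: reset the labels so that they match positions again
rownames(toy_clean) <- NULL
print(rownames(toy_clean))
```

Functions such as which() report positions, not labels, which matters when reading their results against a printed data frame.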

Check the type of each column:

In [12]:
sapply(sales, class)
adverts
'numeric'
sales
'character'
airplay
'integer'
attract
'integer'

Now correct the data and make sure the column type is appropriate for the analysis. First convert the column to character, then to numeric (forcing a column into another data type is called coercion):

In [13]:
converted_sales_column <- as.numeric(as.character(sales$sales))
Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"

Then find the NA values indices:

In [14]:
which(is.na(converted_sales_column))
28

Or, in one long command:

In [16]:
which(is.na(as.numeric(as.character(sales$sales))))
Warning message in which(is.na(as.numeric(as.character(sales$sales)))):
"NAs introduced by coercion"
28
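
Knowing the position is only half the story; it also helps to see the text that failed to convert. A minimal sketch on a made-up character vector (raw_sales is an illustrative name) follows. It uses suppressWarnings() to silence the coercion warning, which is a choice made for this sketch rather than something the lab requires.

```r
# Toy character vector (made up for illustration) with one malformed entry
raw_sales <- c("330", "120", "249albums", "60")

# Coerce quietly; entries that are not valid numbers become NA
as_number <- suppressWarnings(as.numeric(raw_sales))

# Positions and original text of the entries that failed to convert
bad_positions <- which(is.na(as_number))
print(bad_positions)
print(raw_sales[bad_positions])
```

This would also flag entries that were already NA before the conversion, so it is best run after the missing values have been handled.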

Or you can

- Coerce the column to integer (not numeric, as the number of albums sold cannot have decimal places: in this example you cannot sell a third of an album)
- Correct the now missing (NA) value at position 28 (the row labelled 29 in the printout, because row 9 was removed earlier)
- Put the column back into the data frame
In [18]:
corrected_sales <- as.integer(as.character(sales$sales))
corrected_sales
corrected_sales[28] <- 249
sales$sales <- corrected_sales
print(sales)
Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"
  1. 330
  2. 120
  3. 360
  4. 270
  5. 220
  6. 170
  7. 70
  8. 210
  9. 200
  10. 300
  11. 290
  12. 70
  13. 150
  14. 190
  15. 240
  16. 100
  17. 250
  18. 240
  19. 210
  20. 280
  21. 230
  22. 210
  23. 230
  24. 320
  25. 210
  26. 230
  27. 250
  28. <NA>
  29. 60
  30. 330
  31. 150
  32. 150
  33. 180
  34. 80
  35. 180
  36. 130
  37. 320
  38. 280
  39. 200
  40. 130
  41. 190
  42. 150
  43. 230
  44. 310
  45. 340
  46. 240
  47. 180
  48. 220
  49. 40
  50. 190
  51. 290
  52. 340
  53. 250
  54. 190
  55. 120
  56. 230
  57. 190
  58. 210
  59. 170
  60. 310
  61. 90
  62. 140
  63. 300
  64. 340
  65. 170
  66. 100
  67. 200
  68. 80
  69. 100
  70. 70
  71. 50
  72. 240
  73. 160
  74. 290
  75. 140
  76. 210
  77. 300
  78. 230
  79. 280
  80. 160
  81. 200
  82. 110
  83. 110
  84. 70
  85. 100
  86. 190
  87. 70
  88. 360
  89. 360
  90. 300
  91. 120
  92. 150
  93. 220
  94. 280
  95. 300
  96. 140
  97. 290
  98. 180
  99. 140
  100. 210
  101. 250
  102. 250
  103. 120
  104. 290
  105. 60
  106. 140
  107. 290
  108. 160
  109. 100
  110. 160
  111. 150
  112. 140
  113. 230
  114. 230
  115. 30
  116. 80
  117. 190
  118. 90
  119. 120
  120. 150
  121. 230
  122. 150
  123. 210
  124. 180
  125. 140
  126. 360
  127. 10
  128. 240
  129. 270
  130. 290
  131. 220
  132. 230
  133. 220
  134. 240
  135. 260
  136. 170
  137. 130
  138. 270
  139. 140
  140. 60
  141. 210
  142. 210
  143. 240
  144. 210
  145. 200
  146. 140
  147. 90
  148. 120
  149. 100
  150. 360
  151. 180
  152. 150
  153. 110
  154. 90
  155. 160
  156. 230
  157. 40
  158. 60
  159. 230
  160. 230
  161. 120
  162. 150
  163. 120
  164. 60
  165. 280
  166. 120
  167. 230
  168. 230
  169. 40
  170. 140
  171. 360
  172. 210
  173. 260
  174. 250
  175. 200
  176. 150
  177. 250
  178. 100
  179. 260
  180. 210
  181. 290
  182. 220
  183. 70
  184. 110
  185. 250
  186. 320
  187. 300
  188. 180
  189. 180
  190. 200
  191. 320
  192. 140
  193. 100
  194. 120
  195. 230
  196. 150
  197. 250
  198. 190
  199. 240
  200. 250
  201. 230
  202. 110
     adverts sales  airplay attract
1     10.256   330       43      10
2    985.685   120       28       7
3   1445.563   360       35       7
4   1188.193   270       33       7
5    574.513   220       44       5
6    568.954   170       19       5
7    471.814    70       20       1
8    537.352   210       22       9
10   514.068   200       21       7
11   174.093   300       40       7
12  1720.806   290       32       7
13   611.479    70       20       2
14   251.192   150       24       8
15    97.972   190       38       6
16   406.814   240       24       7
17   265.398   100       25       5
18  1323.287   250       35       5
19   950.982   240 99999999       4
20   196.650   210       36       8
21  1326.598   280       27       8
22  1380.689   230       33       8
23   792.345   210       33       7
24   957.167   230       28       6
25  1789.659   320       30       9
26   656.137   210       34       7
27   613.697   230       49       7
28   313.362   250       40       8
29   313.000   249       39       7
30   336.510    60       20       4
31  1544.899   330       42       7
32    68.954   150       35       8
33   785.692   150        8       6
34   125.628   180       49       7
35   377.925    80       19       8
36   217.994   180       42       6
37   759.862   130        6       7
38  1163.444   320       36       6
39   842.957   280       32       7
40   125.179   200       28       6
41   236.598   130       25       8
42   669.811   190       34       8
43   612.234   150       21       6
44   922.019   230       34       7
45    50.000   310       63       7
46  2000.000   340       31       7
47  1054.027   240       25       7
48   385.045   180       42       7
49  1507.972   220       37       7
50   102.568    40       25       8
51   204.568   190       26       7
52  1170.918   290       39       7
53   689.547   340       46       7
54   784.220   250       36       6
55   405.913   190       12       4
56   179.778   120        2       8
57   607.258   230       29       8
58  1542.329   190       33       8
59  1112.470   210       28       7
60   856.985   170       10       6
61   836.331   310       38       7
62   236.908    90       19       4
63  1077.855   140       13       6
64   579.321   300       30       7
65  1500.000   340       38       8
66   731.364   170       22       8
67    25.689   100       23       6
68   391.749   200       22       9
69   233.999    80       20       7
70   275.700   100       18       6
71    56.895    70       37       7
72   255.117    50       16       8
73   566.501   240       32       8
74   102.568   160       26       5
75   250.568   290       53       9
76    68.594   140       28       7
77   642.786   210       32       7
78  1500.000   300       24       7
79   102.563   230       37       6
80   756.984   280       30       8
81    51.229   160       19       7
82   644.151   200       47       6
83    15.313   110       22       5
84   243.237   110       10       8
85   256.894    70        1       4
86    22.464   100        1       6
87    45.689   190       39       6
88   724.938    70        8       5
89  1126.461   360       38       7
90  1985.119   360       35       5
91  1837.516   300       40       5
92   135.986   120       22       7
93   237.703   150       27       8
94   976.641   220       31       6
95  1452.689   280       19       7
96  1600.000   300       24       9
97   268.598   140        1       7
98   900.889   290       38       8
99   982.063   180       26       6
100  201.356   140       11       6
101  746.024   210       34       6
102 1132.877   250       55       7
103 1000.000   250        5       7
104   75.896   120       34       6
105 1351.254   290       37       9
106  202.705    60       13       8
107  365.985   140       23       6
108  305.268   290       54       6
109  263.268   160       18       7
110  513.694   100        2       7
111  152.609   160       11       6
112   35.987   150       30       8
113  102.568   140       22       7
114  215.368   230       36       6
115  426.784   230       37       8
116  507.772    30        9       3
117  233.291    80        2       7
118 1035.433   190       12       8
119  102.642    90        5       9
120  526.142   120       14       7
121  624.538   150       20       5
122  912.349   230       57       6
123  215.994   150       19       8
124  561.963   210       35       7
125  474.760   180       22       5
126  231.523   140       16       7
127  678.596   360       53       7
128   70.922    10        4       6
129 1567.548   240       29       6
130  263.598   270       43       7
131 1423.568   290       26       7
132  715.678   220       28       7
133  777.237   230       37       8
134  509.430   220       32       5
135  964.110   240       34       7
136  583.627   260       30       7
137  923.373   170       15       7
138  344.392   130       23       7
139 1095.578   270       31       8
140  100.025   140       21       5
141   30.425    60       28       1
142 1080.342   210       18       7
143  799.899   210       28       7
144 1071.752   240       37       8
145  893.355   210       26       6
146  283.161   200       30       8
147  917.017   140       10       7
148  234.568    90       21       7
149  456.897   120       18       9
150  206.973   100       14       7
151 1294.099   360       38       7
152  826.859   180       36       6
153  564.158   150       32       7
154  192.607   110        9       5
155   10.652    90       39       5
156   45.689   160       24       7
157   42.568   230       45       7
158   20.456    40       13       8
159  635.192    60       17       6
160 1002.273   230       32       7
161 1177.047   230       23       6
162  507.638   120        0       6
163  215.689   150       35       5
164  526.480   120       26       6
165   26.895    60       19       6
166  883.877   280       26       7
167    9.104   120       53       8
168  103.568   230       29       8
169  169.583   230       28       7
170  429.504    40       17       6
171  223.639   140       26       8
172  145.585   360       42       8
173  985.968   210       17       6
174  500.922   260       36       8
175  226.652   250       45       7
176 1051.168   200       20       7
177   68.093   150       15       7
178 1547.159   250       28       8
179  393.774   100       27       6
180  804.282   260       17       8
181  801.577   210       32       8
182  450.562   290       46       9
183   26.598   220       47       8
184  179.061    70       19       1
185  345.687   110       22       8
186  295.840   250       55       9
187 2271.860   320       31       5
188 1134.575   300       39       8
189  601.434   180       21       6
190   45.298   180       36       6
191  759.518   200       21       7
192  832.869   320       44       7
193   56.894   140       27       7
194  709.399   100       16       6
195   56.895   120       33       6
196  767.134   230       33       8
197  503.172   150       21       7
198  700.929   250       35       9
199  910.851   190       26       7
200  888.569   240       14       6
201  800.615   250       34       6
202 1500.000   230       11       8
203  785.694   110       20       9
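
Typing the corrected value by hand is fine for a single cell. As an alternative sketch, assuming the only problem is stray text attached to an otherwise correct number, the non-digit characters could be stripped before coercion. The vector raw_sales below is made up for illustration.

```r
# Toy character vector (made up for illustration) with one malformed entry
raw_sales <- c("330", "120", "249albums", "60")

# Remove every character that is not a digit, then coerce to integer
digits_only <- gsub("[^0-9]", "", raw_sales)
clean_sales <- as.integer(digits_only)
print(clean_sales)
```

This pattern would also remove decimal points and minus signs, so it only suits columns of non-negative whole numbers, and the cleaned values should still be checked by eye.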

Now that the data is technically correct, let’s make it consistent, that is, keep only data points that are meaningful in this domain.

Remove the rows containing dramatic outliers.

In [19]:
plot(sales$adverts, sales$airplay)

You can find the data point by selecting any row with high airplay:

In [20]:
which(sales$airplay > 50)
  1. 18
  2. 44
  3. 74
  4. 101
  5. 107
  6. 121
  7. 126
  8. 166
  9. 185
In [21]:
which(sales$airplay > 100)
18

Remove the row at position 18 (labelled 19 in the printout, because row 9 was removed earlier):

In [22]:
sales <- sales[-18, ]

We had 203 records (data frame rows) initially; now we have 201 left:

In [23]:
length(sales$airplay)
201
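
Removing a row by position works, but it depends on the current row order. A sketch of an alternative is to filter on the condition itself. The data frame toy and the threshold of 100 below are illustrative assumptions; the threshold is chosen by eye, as in the lab.

```r
# Toy data frame (made up for illustration) with one implausible airplay value
toy <- data.frame(
  adverts = c(950.9, 196.6, 1326.5, 50.0),
  airplay = c(99999999L, 36L, 27L, 63L)
)

# Threshold chosen by eye for this illustration
max_plausible_airplay <- 100

# Keep rows at or below the threshold; this does not depend on row positions
toy_filtered <- toy[toy$airplay <= max_plausible_airplay, ]
print(toy_filtered)
```

If the column still contained NA values, the comparison would produce NA and the subset would include rows of NAs, so missing values should be dealt with first.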

Let's have a look at the plots:

In [24]:
plot(sales$adverts, sales$airplay)
In [25]:
plot(sales$sales, sales$airplay)

Generate in R a multiple linear model to predict sales from airplay and adverts.

In [27]:
sales_model <- lm(sales ~ adverts + airplay, data = sales)
summary(sales)
    adverts             sales          airplay         attract      
 Min.   :   9.104   Min.   : 10.0   Min.   : 0.00   Min.   : 1.000  
 1st Qu.: 215.994   1st Qu.:140.0   1st Qu.:20.00   1st Qu.: 6.000  
 Median : 526.480   Median :200.0   Median :28.00   Median : 7.000  
 Mean   : 612.913   Mean   :193.5   Mean   :27.56   Mean   : 6.771  
 3rd Qu.: 910.851   3rd Qu.:250.0   3rd Qu.:36.00   3rd Qu.: 8.000  
 Max.   :2271.860   Max.   :360.0   Max.   :63.00   Max.   :10.000  
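
The cell above summarises the data frame rather than the fitted model. To inspect a model, one would pass the model object to functions such as summary(), coef() or predict(). The sketch below does this on simulated data, so the names toy, toy_model and new_album, and all numbers in it, are made up and say nothing about the lab's results.

```r
set.seed(1)  # reproducible made-up data
toy <- data.frame(
  adverts = runif(50, min = 0, max = 2000),
  airplay = sample(0:60, size = 50, replace = TRUE)
)
# Simulated outcome with a known structure plus noise (illustrative values only)
toy$sales <- 40 + 0.09 * toy$adverts + 3.5 * toy$airplay + rnorm(50, sd = 20)

toy_model <- lm(sales ~ adverts + airplay, data = toy)

# Estimated b0, b1 and b2
b <- coef(toy_model)
print(b)

# Prediction for a hypothetical album
new_album <- data.frame(adverts = 500, airplay = 30)
predicted <- predict(toy_model, newdata = new_album)

# The same prediction computed by hand from the coefficients
by_hand <- b[["(Intercept)"]] + b[["adverts"]] * 500 + b[["airplay"]] * 30
print(c(predict = unname(predicted), by_hand = by_hand))
```

The two printed numbers agree, which shows that predict() is applying the equation b0 + b1 * adverts + b2 * airplay.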

Generate a model with only one predictor, advertising (adverts):

In [28]:
sales_model_1var <- lm(sales ~ adverts, data = sales)

Now update the model. The .~. notation means “keep the same outcome and predictors as the input model” (it is not a frowning emoticon); the terms that follow it add airplay and attract as predictors:

In [29]:
sales_model_3var <- update(object = sales_model_1var, .~. + airplay + attract)
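
To see what update() does to the formula, here is a minimal sketch on simulated data. The variable names y, x1, x2 and x3 and the model names are made up for illustration.

```r
set.seed(2)  # reproducible made-up data
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + 0.5 * toy$x3 + rnorm(30)

# One-predictor model, then the same model with two extra terms
m1 <- lm(y ~ x1, data = toy)
m3 <- update(m1, . ~ . + x2 + x3)

# The updated formula keeps y and x1 and appends the new terms
print(formula(m3))

# Writing the formula out in full gives the same fit
m3_direct <- lm(y ~ x1 + x2 + x3, data = toy)
print(all.equal(coef(m3), coef(m3_direct)))
```

Using update() is a convenience: it avoids retyping the formula and makes it clear that the larger model extends the smaller one.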

Now compare the two models. The 1var model contains only adverts:

In [30]:
summary(sales_model_1var)
Call:
lm(formula = sales ~ adverts, data = sales)

Residuals:
     Min       1Q   Median       3Q      Max 
-153.428  -43.864   -0.628   37.063  211.191 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.349e+02  7.526e+00  17.924   <2e-16 ***
adverts     9.558e-02  9.639e-03   9.917   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 66.1 on 199 degrees of freedom
Multiple R-squared:  0.3307,	Adjusted R-squared:  0.3274 
F-statistic: 98.34 on 1 and 199 DF,  p-value: < 2.2e-16

The 3var model also contains airplay and attract:

In [31]:
summary(sales_model_3var)
Call:
lm(formula = sales ~ adverts + airplay + attract, data = sales)

Residuals:
     Min       1Q   Median       3Q      Max 
-122.125  -28.324   -0.207   29.243  143.551 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -26.719337  17.337079  -1.541    0.125    
adverts       0.084587   0.006909  12.243  < 2e-16 ***
airplay       3.383720   0.276904  12.220  < 2e-16 ***
attract      11.092158   2.436088   4.553 9.24e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.05 on 197 degrees of freedom
Multiple R-squared:  0.6642,	Adjusted R-squared:  0.6591 
F-statistic: 129.9 on 3 and 197 DF,  p-value: < 2.2e-16

The adverts-only model accounts for about 33.1% of the variation in sales (R squared), and the three-predictor model for 66.4%. Comparing R squared with adjusted R squared gives an idea of how well a model generalises. Ideally, the two values should be close. This means that the amount of variation accounted for does not shrink much when we move from a model derived from a sample (R squared) to a model derived, ideally, from the population (adjusted R squared). Here the shrinkage is small for both models (from 33.1% to 32.7% and from 66.4% to 65.9%).
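
The adjusted value can be reproduced from R squared, the sample size n and the number of predictors p using the standard formula 1 - (1 - R squared) * (n - 1) / (n - p - 1). The sketch below applies it to the rounded R squared values printed in the two summaries, with n = 201; the function name adjusted_r_squared is made up for this illustration.

```r
# Adjusted R squared from R squared, sample size n and number of predictors p
adjusted_r_squared <- function(r_squared, n, p) {
  1 - (1 - r_squared) * (n - 1) / (n - p - 1)
}

# R squared values as printed in the two model summaries (n = 201 rows)
adj_1var <- adjusted_r_squared(0.3307, n = 201, p = 1)
adj_3var <- adjusted_r_squared(0.6642, n = 201, p = 3)
print(round(c(adj_1var, adj_3var), 4))
```

Because the inputs are already rounded to four decimal places, the results may differ from the reported adjusted values in the last digit. The formula also shows why adding predictors is penalised: a larger p makes the correction factor larger.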

Adapted from:

Discovering statistics using R. Authors: A. Field, J. Miles, Z. Field. Publisher: Sage, 2012 (chapter 7 – Multiple regression, p 261-311)

Research methods and statistics 2, Tom Booths and Alex Doumas 2018.
