Linear Regression
Basic model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m + \epsilon$$
- $y$ is the dependent variable
- $x_1, x_2, \ldots, x_m$ are the $m$ independent variables
- $\epsilon$ is the random error term
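The coefficients $\beta_0, \ldots, \beta_m$ are estimated by ordinary least squares, $\hat\beta = (X^\top X)^{-1} X^\top y$. Here $X$ is the design matrix: a column of ones followed by the predictor columns. The sketch below solves these normal equations directly on the blood data from the fitting example in this section. It is only meant to show what lm() computes, not how lm() computes it.

```r
# Data copied from the lm() fitting example in this section
blood <- data.frame(
  x1 = c(76.0, 91.5, 85.5, 82.5, 79.0, 80.5, 74.5, 79.0, 85.0, 76.5, 82.0, 95.0, 92.5),
  x2 = c(50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20),
  y  = c(120, 141, 124, 126, 117, 125, 123, 125, 132, 123, 132, 155, 147)
)

# Design matrix: a column of ones for beta_0, then x1 and x2
X <- cbind(1, blood$x1, blood$x2)

# Solve (X'X) beta = X'y rather than forming the inverse explicitly
beta_hat <- drop(solve(crossprod(X), crossprod(X, blood$y)))
names(beta_hat) <- c("(Intercept)", "x1", "x2")

# Residuals: the part of y not explained by the fitted plane
resid_hat <- blood$y - drop(X %*% beta_hat)

print(beta_hat)
```

The result should agree with coef(lm(y ~ x1 + x2, data = blood)) up to rounding. lm() itself uses a QR decomposition, which is numerically more stable than the normal equations.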
The lm() function
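- Estimates the coefficients of a multiple linear regression and runs significance tests on the individual coefficients and on the regression equation as a whole
- Usage: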
lm(formula, data, subset, weights, na.action, ...)
- formula: the model formula, e.g. y ~ x1 + x2
- data: a data frame containing the variables named in the formula
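A few other things the formula and subset arguments accept, shown on the same data. The specific models are illustrations only, not part of the original example:
- `.` means "all other columns of data".
- I() makes `^` mean ordinary squaring inside a formula.
- subset restricts which rows are used in the fit.

```r
blood <- data.frame(
  x1 = c(76.0, 91.5, 85.5, 82.5, 79.0, 80.5, 74.5, 79.0, 85.0, 76.5, 82.0, 95.0, 92.5),
  x2 = c(50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20),
  y  = c(120, 141, 124, 126, 117, 125, 123, 125, 132, 123, 132, 155, 147)
)

# "." expands to every column of data except the response
fit_all <- lm(y ~ ., data = blood)

# The same model written out explicitly
fit_explicit <- lm(y ~ x1 + x2, data = blood)

# I() makes ^ mean squaring instead of formula crossing (illustrative model)
fit_quad <- lm(y ~ x1 + I(x1^2) + x2, data = blood)

# subset: fit using only the rows with x2 >= 30
fit_sub <- lm(y ~ x1 + x2, data = blood, subset = x2 >= 30)

print(coef(fit_quad))
```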
Fitting example
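# Example data: response y and two predictors x1, x2 (13 observations)
blood <- data.frame(
  x1 = c(76.0, 91.5, 85.5, 82.5, 79.0, 80.5, 74.5, 79.0, 85.0, 76.5, 82.0, 95.0, 92.5),
  x2 = c(50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20),
  y = c(120, 141, 124, 126, 117, 125, 123, 125, 132, 123, 132, 155, 147)
)
# Fit y = beta0 + beta1*x1 + beta2*x2 + error by least squares
lm_sol <- lm(y ~ x1 + x2, data = blood)
# Residual summary, coefficient tests, R^2 and the overall F test
summary(lm_sol)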
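Fitting results
summary(lm_sol) reports:
- a summary of the residuals of the fit
- the regression coefficients with their t statistics and p values
- $R^2$ and adjusted $R^2$
- the F statistic of the overall regression and its p value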
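Each quantity printed by summary() can also be pulled out as a value. The sketch below extracts them from the summary object. It also recomputes two of them from their definitions:
- the F-test p value, as the upper tail of the F distribution;
- adjusted $R^2$, as $1 - (1 - R^2)(n - 1)/(n - p - 1)$, where $n$ is the number of observations and $p$ the number of predictors.

```r
blood <- data.frame(
  x1 = c(76.0, 91.5, 85.5, 82.5, 79.0, 80.5, 74.5, 79.0, 85.0, 76.5, 82.0, 95.0, 92.5),
  x2 = c(50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20),
  y  = c(120, 141, 124, 126, 117, 125, 123, 125, 132, 123, 132, 155, 147)
)
lm_sol <- lm(y ~ x1 + x2, data = blood)
s <- summary(lm_sol)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)

# Goodness of fit
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared

# Overall F test: fstatistic holds value, numdf and dendf
f      <- s$fstatistic
f_pval <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Adjusted R^2 recomputed from its definition
n <- nobs(lm_sol)
p <- length(coef(lm_sol)) - 1
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(coef_table)
```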
Prediction
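# New observations at which to predict y
new <- data.frame(
  x1 = c(75, 85),
  x2 = c(40, 60)
)
# interval = "confidence" returns the fitted value (fit) and a 95% confidence
# interval (lwr, upr) for the mean response at each new point
predict(lm_sol, new, interval = "confidence", level = 0.95)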
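The point prediction is simply $\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$. interval = "confidence" describes uncertainty in the mean response. interval = "prediction" also adds the error variance of a single new observation, so it is wider. The sketch below checks both facts on the same data.

```r
blood <- data.frame(
  x1 = c(76.0, 91.5, 85.5, 82.5, 79.0, 80.5, 74.5, 79.0, 85.0, 76.5, 82.0, 95.0, 92.5),
  x2 = c(50, 20, 20, 30, 30, 50, 60, 50, 40, 55, 40, 40, 20),
  y  = c(120, 141, 124, 126, 117, 125, 123, 125, 132, 123, 132, 155, 147)
)
lm_sol <- lm(y ~ x1 + x2, data = blood)
new <- data.frame(x1 = c(75, 85), x2 = c(40, 60))

# Point prediction by hand: [1, x1, x2] %*% beta_hat
X_new <- cbind(1, new$x1, new$x2)
fit_manual <- drop(X_new %*% coef(lm_sol))

# Confidence interval for the mean response at each new point
ci <- predict(lm_sol, new, interval = "confidence", level = 0.95)

# Prediction interval for a single new observation at each point
pred_int <- predict(lm_sol, new, interval = "prediction", level = 0.95)

print(cbind(ci, pred_lwr = pred_int[, "lwr"], pred_upr = pred_int[, "upr"]))
```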
Binary Logistic Regression
Basic model
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$$
- $x_1, x_2, \ldots, x_m$ are the $m$ independent variables
- $y$ takes the value 0 or 1
- $p$ is the probability that $y = 1$
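Solving the model for $p$ gives $p = 1/(1 + e^{-\eta})$, where $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m$. In base R, plogis() is this inverse logit and qlogis() is the logit $\ln(p/(1-p))$. The coefficient and predictor values below are made up purely for illustration.

```r
# Hypothetical coefficients beta_0, beta_1, beta_2 (illustration only)
beta <- c(-2, 0.8, 0.01)
# Leading 1 multiplies the intercept; the x values are also made up
x <- c(1, 3.0, 100)

# Linear predictor: eta = beta_0 + beta_1 * x_1 + beta_2 * x_2
eta <- sum(beta * x)

# Inverse of ln(p / (1 - p)): p = 1 / (1 + exp(-eta))
p_manual  <- 1 / (1 + exp(-eta))
p_builtin <- plogis(eta)

# qlogis() applies ln(p / (1 - p)) and recovers eta
eta_back <- qlogis(p_builtin)

print(c(eta = eta, p = p_builtin))
```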
The glm() function
- Estimates coefficients and runs parameter tests for generalized linear models such as logistic regression and Poisson regression
- Usage:
glm(formula, family = binomial(link = 'logit'), data, weights, subset, na.action, ...)
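- family: the distribution family of the response together with its link function. family = binomial(link = 'logit') gives logistic regression. If family is omitted, glm() uses gaussian, which gives ordinary linear regression.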
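The family argument is an R family object that bundles a distribution with a link function. Inspecting binomial(link = "logit") shows that its link function is the log-odds from the model above. Since logit is the default link for binomial(), family = binomial is equivalent. poisson() is included only for comparison.

```r
fam <- binomial(link = "logit")

# Names of the distribution family and of the link function
print(c(family = fam$family, link = fam$link))

# linkfun maps p to ln(p / (1 - p)); linkinv maps eta back to p
print(fam$linkfun(0.5))
print(fam$linkinv(0))

# Poisson regression would use family = poisson, whose default link is log
print(poisson()$link)
```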
Fitting example
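library(readr)
# Read the data and keep columns 2 to 4 (the file's first column is not used)
student <- read_csv("student.csv")[, 2:4]
# qualification is the 0/1 response; gpa and ability are the predictors
lr_sol <- glm(qualification ~ gpa + ability, family = binomial(link = 'logit'), data = student)
summary(lr_sol)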
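Fitting results
- the fitted regression model: each coefficient with its standard error, z statistic and p value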
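student.csv is not reproduced in this post. The sketch below therefore simulates a hypothetical data set with the same column names (qualification, gpa, ability). The predictor ranges and the "true" coefficients are invented. It fits the same model, then converts the coefficients to odds ratios with exp(). Each odds ratio is the factor by which the odds $p/(1-p)$ change for a one-unit increase in that predictor, with the other predictor held fixed.

```r
set.seed(1)
n <- 1000

# Hypothetical predictors in invented ranges
gpa     <- runif(n, 2, 4)
ability <- runif(n, 400, 650)

# Invented true model on the log-odds scale
eta <- -20 + 3 * gpa + 0.02 * ability
qualification <- rbinom(n, size = 1, prob = plogis(eta))
sim_student <- data.frame(qualification, gpa, ability)

lr_sim <- glm(qualification ~ gpa + ability,
              family = binomial(link = "logit"), data = sim_student)

# Coefficient table: Estimate, Std. Error, z value, Pr(>|z|)
print(coef(summary(lr_sim)))

# Odds ratio for a one-unit increase in each predictor
odds_ratios <- exp(coef(lr_sim))
print(odds_ratios)
```

Because ability is measured in the hundreds, a one-unit change is tiny, so its odds ratio sits close to 1.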
Prediction
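# New cases to classify
new <- data.frame(
  gpa = c(3.0, 2.5, 3.5),
  ability = c(550, 420, 600)
)
# type = 'response' returns estimated probabilities p instead of log-odds
prob_fit <- predict(lr_sol, new, type = 'response')
# Predict 1 when the estimated probability exceeds 0.5, otherwise 0
threshold <- 0.5
new_prediction <- rep(0, nrow(new))
new_prediction[prob_fit > threshold] <- 1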
- prob_fit: the estimated probabilities $p$
- new_prediction: the predicted qualification (1 if $p > 0.5$, otherwise 0)
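For a glm, predict() returns log-odds by default (type = "link"). type = "response" applies the inverse logit to give probabilities. The sketch below checks that relationship. It also shows that the 0.5 threshold is the same as asking whether the log-odds are positive, written as one vectorized step. It again uses hypothetical simulated data, because student.csv is not available here.

```r
set.seed(2)
n <- 200

# Hypothetical training data with the same column names as student.csv
sim_student <- data.frame(gpa = runif(n, 2, 4), ability = runif(n, 400, 650))
sim_student$qualification <- rbinom(
  n, 1, plogis(-20 + 3 * sim_student$gpa + 0.02 * sim_student$ability))

lr_sim <- glm(qualification ~ gpa + ability,
              family = binomial(link = "logit"), data = sim_student)

new <- data.frame(gpa = c(3.0, 2.5, 3.5), ability = c(550, 420, 600))

# Log-odds (linear predictor) and probabilities for the new cases
log_odds <- predict(lr_sim, new, type = "link")
prob_fit <- predict(lr_sim, new, type = "response")

# Same 0/1 rule as the loop-free original: 1 when p > 0.5
new_prediction <- as.integer(prob_fit > 0.5)

print(data.frame(log_odds, prob_fit, new_prediction))
```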