Functional Logistic Regression Model

When we have a set of curves observed in a finite set of time points from different sample individuals we are talking about functional data. The FDA, or functional data analysis studies how to reconstruct the true functional form of the data, and creates a basis expansion to achieve it. For another hand, glm models can be applied in this way, to predict a binary response variable from a  functional predictor. A classic example about berkeley growth study is presented here to show the utility of functional analysis models.

We are going to use fda and fda.usc library, where is contained growth dataset, and RColorBrewer for set amazing colors.

col.boys <- '#386CB075'; col.girls <- '#F0027F75'

The dataset is a list containing the heights of 39 boys and 54 girls from age 1 to 18 and the ages at which they were collected. We're going to apply smooth basis for mitigate roughness of data.

sm.growth.girls <- with(growth, smooth.basisPar(argvals=age, y=hgtf, lambda=0.1))
sm.growth.boys <- with(growth, smooth.basisPar(argvals=age, y=hgtm, lambda=0.1))

par(mfrow=c(1,2)) # En diferentes gráficos
plot(sm.growth.girls$fd, xlab="age", ylab="height (cm)",
main="Girls' Growth",lty=1, lwd=2, ylim=c(75,200))
plot(sm.growth.boys$fd, xlab="age", ylab="height (cm)",
main="Boys' Growth",lty=1, lwd=2, ylim=c(75,200))

par(mfrow=c(1,1)) # En un mismo gráfico
plot(sm.growth.boys$fd, xlab="age", ylab="height (cm)",
main="Berkeley Growth Study data",
lty=1, lwd=2.5, col=col.boys)
lines(sm.growth.girls$fd, lty=1, lwd=2.5, col=col.girls)


It’s clear boys grow up faster than girls in the last period. Now, we can define a  binary response variable with a simple code to determinate the sex  and apply a principal component basis to avoid multicollinearity between predictor variables. Then, we can use fregre.glm() function from fda.usc package to performance a logistic regression model with functional predictors.

Y <- c(rep(1,54), rep(0, 39))
X <- cbind(growth$hgtf, growth$hgtm)
Xdata <- fdata(t(X), argvals = growth$age)
basis <- create.pc.basis(Xdata, c(1,2))
basis.x <- list('X'=basis)
ldata <- list(X=Xdata, df=data.frame(Y))

model <- fregre.glm(Y ~ X, data=ldata, family=binomial, basis.x=basis.x)

glm(formula = pf)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.34948 -0.12019 0.01201 0.05903 2.68373

Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.93437 0.88023 2.198 0.027980 *
X.PC1 0.18508 0.05295 3.495 0.000473 ***
X.PC2 -0.63214 0.17357 -3.642 0.000271 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 126.495 on 92 degrees of freedom
Residual deviance: 20.341 on 90 degrees of freedom
AIC: 26.341

Number of Fisher Scoring iterations: 8

The results show a good fit with a high significance of estimated parameters.

plot(model$linear.predictors, model$fitted.values,
main='Linear Predictors Vs. Fitted Values',
col=c(rep(col.girls, 54),rep(col.boys, 39)), pch=16)




Except three bad classified points, two girls and one boy, the fit of the model is satisfactory.