Functional Logistic Regression Model


When we have a set of curves observed at a finite set of time points from different individuals, we are dealing with functional data. Functional data analysis (FDA) studies how to reconstruct the true functional form of such data, typically through a basis expansion. In addition, generalized linear models (GLMs) can be extended to this setting to predict a binary response variable from a functional predictor. The classic Berkeley Growth Study is used here to show the utility of functional regression models.

We will use the fda and fda.usc libraries, which contain the growth dataset, and RColorBrewer to set a nice color palette.


library(fda)
library(fda.usc)
library(RColorBrewer)
palette(brewer.pal(8,'Accent')) 
col.boys <- '#386CB075'; col.girls <- '#F0027F75'

The dataset is a list containing the heights of 39 boys and 54 girls from age 1 to 18, together with the ages at which they were measured. We apply a smoothing basis to mitigate the roughness of the raw data.

sm.growth.girls <- with(growth, smooth.basisPar(argvals=age, y=hgtf, lambda=0.1))
sm.growth.boys <- with(growth, smooth.basisPar(argvals=age, y=hgtm, lambda=0.1))

par(mfrow=c(1,2)) # Separate plots for each group
plot(sm.growth.girls$fd, xlab="age", ylab="height (cm)",
main="Girls' Growth",lty=1, lwd=2, ylim=c(75,200))
plot(sm.growth.boys$fd, xlab="age", ylab="height (cm)",
main="Boys' Growth",lty=1, lwd=2, ylim=c(75,200))

par(mfrow=c(1,1)) # Both groups in a single plot
plot(sm.growth.boys$fd, xlab="age", ylab="height (cm)",
main="Berkeley Growth Study data",
lty=1, lwd=2.5, col=col.boys)
lines(sm.growth.girls$fd, lty=1, lwd=2.5, col=col.girls)


It is clear that boys grow faster than girls in the final period. Now we can define a binary response variable encoding sex and apply a principal component basis to avoid multicollinearity among the predictor variables. Then we use the fregre.glm() function from the fda.usc package to fit a logistic regression model with a functional predictor.


Y <- c(rep(1, 54), rep(0, 39))             # 1 = girl, 0 = boy
X <- cbind(growth$hgtf, growth$hgtm)       # girls' curves first, then boys'
Xdata <- fdata(t(X), argvals = growth$age) # one curve per row
basis <- create.pc.basis(Xdata, c(1, 2))   # first two principal components
basis.x <- list('X' = basis)
ldata <- list(X = Xdata, df = data.frame(Y))

model <- fregre.glm(Y ~ X, data=ldata, family=binomial, basis.x=basis.x)
summary.glm(model)

Call:
glm(formula = pf)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.34948  -0.12019   0.01201   0.05903   2.68373

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.93437    0.88023   2.198 0.027980 *
X.PC1         0.18508    0.05295   3.495 0.000473 ***
X.PC2        -0.63214    0.17357  -3.642 0.000271 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 126.495 on 92 degrees of freedom
Residual deviance: 20.341 on 90 degrees of freedom
AIC: 26.341

Number of Fisher Scoring iterations: 8

The results show a good fit, with highly significant estimated parameters.
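As a quick sanity check, we can threshold the fitted probabilities at 0.5 and tabulate predicted against observed sex. This is a simple sketch that assumes the model and Y objects defined above; 0.5 is the conventional, but arbitrary, cutoff:

```r
# Predicted class: fitted probability > 0.5 -> girl (Y = 1)
pred <- as.numeric(model$fitted.values > 0.5)
table(Observed = Y, Predicted = pred)
```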

plot(model$linear.predictors, model$fitted.values,
main='Linear Predictors Vs. Fitted Values',
col=c(rep(col.girls, 54),rep(col.boys, 39)), pch=16)


Except for three misclassified points, two girls and one boy, the fit of the model is satisfactory.
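The misclassified curves can be identified directly from the fitted probabilities (again a sketch assuming the model and Y objects from the session above; recall that positions 1–54 of Y are girls and 55–93 are boys):

```r
# Indices of individuals whose predicted class disagrees with the true label
which((model$fitted.values > 0.5) != (Y == 1))
```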

