Using splines to check linearity between predictor X and outcome Y {mgcv}

In linear regression, if the predictor X is a numerical variable such as "calories", the model assumes that an increase in X from 10 to 30 has the same impact on Y as an increase from 60 to 80. The same logic applies to logistic and Poisson regression, except that linearity is assumed on the scale of the link function: between X and log[P(Y=1)/(1 - P(Y=1))] (aka logit[P(Y=1)]) in logistic regression, and between X and log(E[Y]) in Poisson regression.
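The short R sketch below illustrates what the linearity assumption implies. It uses simulated data (the numbers and the variable name `calories` are illustrative only, not the retinol data): in a linear model, moving X from 10 to 30 or from 60 to 80 changes the prediction by the same amount, 20 times the slope. In a logistic model the same holds on the logit (link) scale, not on the probability scale.

```r
# Simulated data only: illustrates the linearity assumption, not a real analysis
set.seed(1)
calories <- runif(100, min = 10, max = 90)
y <- 5 + 0.3 * calories + rnorm(100)
fit <- lm(y ~ calories)

nd <- data.frame(calories = c(10, 30, 60, 80))
pred <- predict(fit, newdata = nd)
effect_10_to_30 <- unname(pred[2] - pred[1])
effect_60_to_80 <- unname(pred[4] - pred[3])
# Both differences equal 20 * (estimated slope)
print(c(effect_10_to_30, effect_60_to_80, 20 * unname(coef(fit)["calories"])))

# Logistic regression: the constant effect holds on the logit (link) scale
ybin <- rbinom(100, size = 1, prob = plogis(-3 + 0.06 * calories))
fit_logit <- glm(ybin ~ calories, family = binomial)
eta <- predict(fit_logit, newdata = nd, type = "link")
logit_10_to_30 <- unname(eta[2] - eta[1])
logit_60_to_80 <- unname(eta[4] - eta[3])
print(c(logit_10_to_30, logit_60_to_80))
```

Using `type = "response"` instead would give probabilities, whose differences are generally not equal; that is why the check has to be done on the link scale.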

One of the very first steps in statistical modelling with a GLM is to check the assumption of linearity between each numerical predictor and the outcome (on the link scale). This can be done with a regression spline, a flexible smoother that summarizes the relationship between two variables without imposing a particular shape. If the fitted spline is close to a straight line, the linearity assumption holds. If it is monotonic with only slight curvature, the assumption can still be accepted, but the statistical power of the test of association between Y and X will be reduced.
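Besides the plot, {mgcv} reports the effective degrees of freedom (edf) of each smooth: an edf of about 1 corresponds to a straight line, and larger values indicate more curvature. The sketch below, on simulated data (the data frame `dat` and its columns are hypothetical), fits a spline to a truly linear and to a clearly non-linear relationship and compares their edf. Using `method = "REML"` for smoothness selection is a choice made here, not something the original text specifies.

```r
library(mgcv)

set.seed(42)
n <- 300
x <- runif(n, 0, 10)
dat <- data.frame(
  x = x,
  y_linear = 1 + 0.5 * x + rnorm(n),          # truly linear in x
  y_curved = 1 + sin(x) + rnorm(n, sd = 0.5)   # clearly non-linear in x
)

m_linear <- gam(y_linear ~ s(x), data = dat, method = "REML")
m_curved <- gam(y_curved ~ s(x), data = dat, method = "REML")

# Effective degrees of freedom of each smooth: about 1 means a straight line
edf_linear <- summary(m_linear)$edf
edf_curved <- summary(m_curved)$edf
print(c(linear = edf_linear, curved = edf_curved))

# Visual check: estimated smooth with a shaded confidence band
plot(m_curved, shade = TRUE)
```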

Using the retinol data, we want to test whether there is a linear relationship between calories and retdiet (amount of retinol consumed):

Using the {mgcv} package:
> library(mgcv)
> mod <- gam(retdiet ~ s(calories), data = retinol)
> plot(mod)

The fitted spline shows that the relationship between calorie intake and the amount of retinol consumed is not strictly linear.
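A visual check can be complemented by comparing a model with a straight-line term against a model with a spline, for example with AIC. The sketch below uses simulated data with a monotonic but curved relationship, standing in for a case like the one above (the retinol data are not bundled with {mgcv}, so `dat`, `x` and `y` are hypothetical). A clearly lower AIC for the spline model suggests the curvature is worth modelling.

```r
library(mgcv)

set.seed(7)
n <- 300
x <- runif(n, 0, 4)
dat <- data.frame(x = x, y = exp(0.6 * x) + rnorm(n))  # monotonic but curved

m_lin    <- gam(y ~ x, data = dat)                      # straight-line term
m_spline <- gam(y ~ s(x), data = dat, method = "REML")  # spline term

# Lower AIC = better trade-off between goodness of fit and complexity
aics <- AIC(m_lin, m_spline)
print(aics)
```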

Now let us test the relationship between Y (plasma retinol concentration, retplasma) and X (amount of retinol consumed, retdiet):

> mod <- gam(retplasma ~ s(retdiet), data = retinol)
> plot(mod)


The relationship is clearly linear.


*For logistic or Poisson models, add the option family = binomial or family = poisson to the gam() call; the spline is then estimated on the link scale (logit or log).

# Example
> mod <- gam(binaryvariable ~ s(age), data = dataset, family = binomial)
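Here is a self-contained version of the logistic example above, assuming a simulated `dataset` in which the log-odds of `binaryvariable` rise with `age` and then level off (the data-generating curve is hypothetical). `plot()` draws the smooth on the scale of the linear predictor (log-odds), centred at zero, so linearity is judged on the logit scale, as described earlier.

```r
library(mgcv)

set.seed(123)
n <- 500
age <- runif(n, 20, 80)
# Hypothetical data: log-odds increase with age and then flatten out
logit_p <- -2 + 3 * plogis((age - 45) / 5)
dataset <- data.frame(age = age, binaryvariable = rbinom(n, 1, plogis(logit_p)))

mod <- gam(binaryvariable ~ s(age), data = dataset,
           family = binomial, method = "REML")

print(summary(mod)$edf)  # edf well above 1 suggests the logit is not linear in age
plot(mod, shade = TRUE)  # smooth shown on the logit (link) scale

# For a count outcome the call would use family = poisson instead
```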


ref: Bruno Falissard (The analysis of questionnaire data using R)




Introduction to the Analysis of Survival Data in the Presence of Competing Risks

 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4741409/