How to generate simulated data in R

Package {simstudy}

The example below uses the simstudy package to generate a data set of 10 observations with 3 variables: nr, x1 and y1. The function defData defines the name, the formula and the distribution of each variable.
Check the CRAN website for additional documentation on the simstudy package.

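library(simstudy)

# nr: constant value 7; "idnum" is the name of the ID column that genData adds
def <- defData(varname = "nr", dist = "nonrandom", formula = 7, id = "idnum")
# x1: uniform between 10 and 20
def <- defData(def, varname = "x1", dist = "uniform", formula = "10;20")
# y1: dist defaults to "normal", here with mean nr + x1 * 2 and variance 8
def <- defData(def, varname = "y1", formula = "nr + x1 * 2", variance = 8)

dt <- genData(10, def)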



How to convert all character columns of a data frame to numeric

char_columns <- sapply(merged_data, is.character) # identify character columns
data_char_num <- merged_data # replicate data
# recode character as numeric; lapply also works when there is only one character column
data_char_num[char_columns] <- lapply(data_char_num[char_columns], as.numeric)
sapply(data_char_num, class) # print classes of all columns
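
As a self-contained illustration of the conversion above, the sketch below builds a small toy data frame and converts its character columns. The name toy and its columns are made up for this example and are not taken from merged_data. Keep in mind that any string that cannot be read as a number becomes NA, with a warning.

```r
# Toy data frame; names and values are hypothetical
toy <- data.frame(id = c("1", "2", "3"),
                  score = c("4.5", "6", "7.25"),
                  weight = c(60, 72, 81),
                  stringsAsFactors = FALSE)

char_cols <- sapply(toy, is.character)   # TRUE for the character columns
toy_num <- toy                           # work on a copy
toy_num[char_cols] <- lapply(toy_num[char_cols], as.numeric)

sapply(toy_num, class)                   # every column is now "numeric"
```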

How to make predictions from a linear model

Note: enter numeric values as plain numbers, without quotes; for factor variables enter the level label in quotes.

df <- data.frame(res = "ModHigh", age = 21, sex = "female", train = "clinical", bdi = 4)
# predict() for lm objects takes the new values through newdata; it has no data argument
pred <- predict(adj.model6, newdata = df)
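
Since adj.model6 and resilience are not shown here, the sketch below repeats the same idea on the built-in iris data. The model formula and the object names fit and new_obs are assumptions chosen only for illustration.

```r
# Fit a linear model on the built-in iris data (a stand-in for adj.model6)
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)

# One new observation: numeric value unquoted, factor level quoted
new_obs <- data.frame(Sepal.Width = 3, Species = "setosa")

# Point prediction for the new observation
pred <- predict(fit, newdata = new_obs)
pred

# Optionally ask for a confidence interval around the predicted mean
predict(fit, newdata = new_obs, interval = "confidence")
```

The column names in the new data frame must match the predictor names used in the model formula, and the factor label must be one of the levels seen when the model was fitted.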

Apply Function in R | R Tutorial 1.15 | MarinStatsLectures

Different uses of tapply

Using tapply to apply a function within groups: here, the variance of the variable met is calculated within each of the 3 categories of the variable coffee consumption.

tapply(coffee.exercise$met, coffee.exercise$coffee.consumption, var)
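
A minimal runnable version of the call above, using a made-up data frame (toy_coffee and its values are hypothetical, with only two categories to keep it short):

```r
# Hypothetical data: met values in two coffee consumption categories
toy_coffee <- data.frame(met = c(1, 2, 3, 2, 4, 6),
                         coffee.consumption = c("A", "A", "A", "B", "B", "B"))

# Variance of met within each category
v <- tapply(toy_coffee$met, toy_coffee$coffee.consumption, var)
v

# Extra arguments after the function are passed on to it, e.g. na.rm for mean
m <- tapply(toy_coffee$met, toy_coffee$coffee.consumption, mean, na.rm = TRUE)
m
```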

rowMeans, rowSums, colMeans and colSums do the same job as apply with mean or sum. The row versions are similar to Stata's egen rowmean and rowtotal functions.

Below are some examples:
colSums(ham, na.rm = TRUE)
colMeans(ham, na.rm = TRUE)
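
The sketch below shows the row and column versions side by side on a made-up data frame (ham_demo and its item columns are hypothetical, not the real ham data). With na.rm = TRUE, rowSums skips missing values, which is comparable to how Stata's rowtotal treats them as 0.

```r
# Hypothetical item scores with some missing values
ham_demo <- data.frame(item1 = c(1, 2, NA),
                       item2 = c(0, 3, 2),
                       item3 = c(2, NA, 1))
items <- c("item1", "item2", "item3")

col_totals <- colSums(ham_demo[, items], na.rm = TRUE)   # one total per item
col_means  <- colMeans(ham_demo[, items], na.rm = TRUE)  # one mean per item

# Row-wise total and mean, stored as new columns
ham_demo$total      <- rowSums(ham_demo[, items], na.rm = TRUE)
ham_demo$mean_score <- rowMeans(ham_demo[, items], na.rm = TRUE)

# The same row totals written with apply (1 = by row)
apply_totals <- apply(ham_demo[, items], 1, sum, na.rm = TRUE)
```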

About subsetting data in R

Unlike in Stata and other statistical packages, running a cross tabulation on a subset of the data is not very straightforward in R.
Assume the following scenario: I want to cross tabulate Sex (M, F) by tobacco use, tabac (1 = Current, 2 = Ex, 3 = Never), excluding the Never smoking category. In Stata a simple if tabac!=3 would suffice. In R, however, we need to subset the data before tabulating it:

#subsetting the data
retinol1<-subset(retinol, tabac!=3)
table(retinol1$Sex, retinol1$tabac)

However, subsetting can be embedded directly if we are doing univariate analysis:
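
A minimal sketch of this, assuming a made-up data frame retinol_demo with a hypothetical numeric column value (neither is part of the real retinol data): the condition goes inside the square brackets of the variable being analysed.

```r
# Hypothetical data: tabac 1 = Current, 2 = Ex, 3 = Never
retinol_demo <- data.frame(Sex   = c("M", "F", "M", "F", "M", "F"),
                           tabac = c(1, 2, 3, 3, 1, 2),
                           value = c(10, 12, 14, 16, 18, 20))

# Mean of value among current and ex smokers only
mean_smokers <- mean(retinol_demo$value[retinol_demo$tabac != 3])
mean_smokers

# One-way table of Sex for the same subset
sex_tab <- table(retinol_demo$Sex[retinol_demo$tabac != 3])
sex_tab
```

If the condition variable contains missing values, wrapping the condition in which() avoids picking up NA entries.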

Update! It turns out that the function xtabs has a subset argument!

xtabs(~ Sex + tabac, retinol, subset = tabac != 3)
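
A runnable version of the xtabs call on a made-up data frame (retinol_demo and the factor column tabac_f are hypothetical). When the variable is a factor, the excluded level still appears as an empty column unless drop.unused.levels = TRUE is set.

```r
# Hypothetical data: tabac 1 = Current, 2 = Ex, 3 = Never
retinol_demo <- data.frame(Sex   = c("M", "F", "M", "F", "M", "F"),
                           tabac = c(1, 2, 3, 3, 1, 2))

# Numeric tabac: the Never category simply disappears from the table
tab <- xtabs(~ Sex + tabac, data = retinol_demo, subset = tabac != 3)
tab

# Factor version: drop the now-empty "Never" level explicitly
retinol_demo$tabac_f <- factor(retinol_demo$tabac, levels = 1:3,
                               labels = c("Current", "Ex", "Never"))
tab_f <- xtabs(~ Sex + tabac_f, data = retinol_demo,
               subset = tabac_f != "Never", drop.unused.levels = TRUE)
tab_f
```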

How to conduct LOCF (last observation carried forward) imputation in R; library(zoo)

Let us assume that weight has been measured on 624 patients at 4 distinct time points: M0, M1, M3 and M6.

``` {r}
library(zoo)

# start by creating the array that holds the variables we want to use for imputation
# (one column per time point; assumed to be a matrix, e.g. built with cbind),
# then rename the columns
colnames(WeightImpute) <- c("w0", "w1", "w3", "w6")

# create a replicate array to be filled within the for loop
WeightImputeF <- WeightImpute

# create an object equal to the number of rows in our array (624)
n <- nrow(WeightImpute)

# create a counter (1:624), labeling it index
index <- seq_len(n)

# for loop using the na.locf function from library(zoo) to carry out the LOCF.
# ATTENTION: `i` is placed in the row part of the brackets, so values are carried
# forward across the columns (time points) within each row (patient).
# na.rm = FALSE keeps a leading NA, so every row keeps its 4 values
for (i in index) {
  WeightImputeF[i, ] <- na.locf(WeightImpute[i, ], na.rm = FALSE)
}
```
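
To see what the loop does without needing the real data, here is a small sketch in base R. It does not call zoo: locf_vec is a hypothetical helper written for this example that mimics the behaviour of na.locf with na.rm = FALSE on a single vector, and the weight matrix is made up.

```r
# Hypothetical helper: carry the last non-missing value forward within a vector.
# Leading NAs stay NA because there is nothing to carry forward yet.
locf_vec <- function(x) {
  idx <- cumsum(!is.na(x))   # position of the last observed value so far
  idx[idx == 0] <- NA        # nothing observed yet
  unname(x[!is.na(x)][idx])
}

# Made-up weights for 3 patients at M0, M1, M3 and M6
w <- matrix(c(70, NA, 72, NA,
              80, 81, NA, NA,
              NA, 65, NA, 66),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("w0", "w1", "w3", "w6")))

# Apply the helper to every row; apply returns the rows as columns, hence t()
w_locf <- t(apply(w, 1, locf_vec))
dimnames(w_locf) <- dimnames(w)
w_locf
```

With zoo loaded, t(apply(w, 1, na.locf, na.rm = FALSE)) should give the same values without an explicit for loop.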


Labeling points on a scatter plot

In the example below I show how to attach a label (in this case Casenr) to each data point on a scatter plot.

plot(BMI~Age, data=prevend.sample)
text(BMI~Age, labels=Casenr, data=prevend.sample)

Recode using the ifelse command

The code below creates a new variable "hdrs1" which takes the value of HAMD17tot_M0 if visit = 0, of HAMD17tot_M1 if visit = 1, and of HAMD17tot_M3 if visit = 3.

I start by creating a vector that contains the values of HAMD17tot_M0, then I modify it with the ifelse command: if visit = 1, replace the value by that of HAMD17tot_M1, else keep it as is. The same is then done for visit = 3.

romain1$hdrs1 <- romain1$HAMD17tot_M0 # start from the M0 values
romain1$hdrs1 <- ifelse(romain1$visit == 1, romain1$HAMD17tot_M1, romain1$hdrs1)
romain1$hdrs1 <- ifelse(romain1$visit == 3, romain1$HAMD17tot_M3, romain1$hdrs1)
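
The same recode can also be written in one step with nested ifelse calls. The sketch below uses a made-up data frame romain_demo (hypothetical values). Unlike the stepwise version, any visit value other than 0, 1 or 3 ends up as NA here rather than keeping the M0 value.

```r
# Hypothetical data in the same layout as romain1
romain_demo <- data.frame(visit        = c(0, 1, 3, 1),
                          HAMD17tot_M0 = c(20, 22, 18, 25),
                          HAMD17tot_M1 = c(15, 17, 14, 19),
                          HAMD17tot_M3 = c(10, 12, 9, 13))

# Pick the score that matches the visit; anything else becomes NA
romain_demo$hdrs1 <- with(romain_demo,
  ifelse(visit == 0, HAMD17tot_M0,
  ifelse(visit == 1, HAMD17tot_M1,
  ifelse(visit == 3, HAMD17tot_M3, NA))))

romain_demo$hdrs1
```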

Running a command on a subset of data (similar to the if condition in Stata)

hist(coffee.exercise$met[coffee.exercise$coffee.consumption=="A"])

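How to remove NA as a level

By comparing against "NA" in quotes (the text label) and replacing it with an unquoted NA, we turn it into a real missing value in an R factor vector. Wrapping the result in factor() then drops the unused "NA" level.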

HIV_coded$bisexual<- factor(replace(HIV_coded$bisexual, HIV_coded$bisexual == "NA", NA))
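
A small sketch of the effect on a made-up factor (the values are hypothetical). The last lines show an alternative that sets the level itself to NA, which should give the same result.

```r
# Hypothetical factor where missing values were read in as the text "NA"
bisexual <- factor(c("yes", "no", "NA", "yes", "NA"))
levels(bisexual)              # "NA" is listed as a regular level

# Replace the text "NA" by a real NA, then rebuild the factor to drop the level
clean <- factor(replace(bisexual, bisexual == "NA", NA))
levels(clean)                 # only "no" and "yes" remain
sum(is.na(clean))             # the two "NA" entries are now truly missing

# Alternative: turn the level itself into NA
alt <- bisexual
levels(alt)[levels(alt) == "NA"] <- NA
```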

Writing loops in R to loop over functions

The below vector is used to loop over 3 columns that I want to cross tabulate against sexual identity:

l <- c("nationality", "sex_at_birth", "education")
for (i in l) {
  mytables <- table(HIV_coded$sexual_identity, HIV_coded[, i])
  print(mytables) # print inside the loop, otherwise only the last table is kept
}

The loop below is used to calculate the summary statistics for a series of variables:
l <- c("age", "age_1")
for (i in l) {
  print(summary(HIV_coded[, i])) # assumes the variables are columns of HIV_coded
}
