How to generate simulated data in R
Package {simstudy}
The example below uses the simstudy package to generate a data set of 10 observations with 3 variables: nr, x1 and y1. The function defData lets us define the name, the formula and the distribution of each variable.
Check the CRAN website for additional documentation on the simstudy package.
```
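library(simstudy)
# nr: the constant 7; id = "idnum" names the identifier column
def <- defData(varname = "nr", dist = "nonrandom", formula = 7, id = "idnum")
# x1: uniform between 10 and 20
def <- defData(def, varname = "x1", dist = "uniform", formula = "10;20")
# y1: mean nr + x1 * 2 with variance 8 (no dist given, so the package default applies)
def <- defData(def, varname = "y1", formula = "nr + x1 * 2", variance = 8)
dt <- genData(10, def)  # generate 10 observations
dt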
```
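To make the three definitions concrete, here is a rough base-R equivalent. It is only a sketch: it assumes that y1, which has no dist argument, falls back to a normal distribution and that variance = 8 is its variance. The names n and dt_base and the seed are made up for this illustration.

```r
set.seed(123)  # arbitrary seed so the draw is reproducible
n <- 10

nr <- rep(7, n)                      # "nonrandom": the constant 7
x1 <- runif(n, min = 10, max = 20)   # "uniform" with formula "10;20"
# assumed normal: mean nr + x1 * 2, variance 8 (so sd = sqrt(8))
y1 <- rnorm(n, mean = nr + x1 * 2, sd = sqrt(8))

dt_base <- data.frame(idnum = seq_len(n), nr, x1, y1)
dt_base
```

The numbers will not match the simstudy output, since the two approaches draw their random values differently, but the structure of the data set is the same.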
How to convert all character columns of a data frame to numeric
```
char_columns <- sapply(merged_data, is.character)  # identify character columns
data_char_num <- merged_data                       # work on a copy of the data
# lapply also works when only one column is character; text that is not a number becomes NA
data_char_num[char_columns] <- lapply(data_char_num[char_columns], as.numeric)
sapply(data_char_num, class)                       # print classes of all columns
```
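A small self-contained run of the same idea may help. The data frame demo and its columns are hypothetical stand-ins for merged_data, made up for this sketch.

```r
# hypothetical stand-in for merged_data
demo <- data.frame(id = 1:3,
                   age = c("21", "35", "40"),
                   score = c("1.5", "2", "n/a"),
                   stringsAsFactors = FALSE)

char_cols <- sapply(demo, is.character)

# "n/a" cannot be read as a number, so it becomes NA;
# suppressWarnings() hides the "NAs introduced by coercion" warning
demo[char_cols] <- suppressWarnings(lapply(demo[char_cols], as.numeric))

sapply(demo, class)
demo
```

Checking for new NA values after the conversion is a quick way to spot entries that were not really numbers.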
How to make predictions from a linear model
Note: enter numeric values without quotes; for factor variables, enter the level label in quotes.
```
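# new observation: numeric values unquoted, factor levels quoted
df <- data.frame(res = "ModHigh", age = 21, sex = "female", train = "clinical", bdi = 4)
# the new observation is passed to predict() through the newdata argument
pred <- predict(adj.model6, newdata = df)
print(pred)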
```
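Since adj.model6 and the resilience data are not shown here, the following sketch repeats the idea on the iris data set that ships with R. The model and the object names fit and new_obs are chosen only for illustration.

```r
# Species is a factor with levels setosa, versicolor and virginica
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# numeric predictor without quotes, factor predictor as a quoted level label
new_obs <- data.frame(Petal.Length = 4.5, Species = "versicolor")

# point prediction for the new observation
p <- predict(fit, newdata = new_obs)
p

# interval = "confidence" adds lower and upper bounds around the fitted mean
predict(fit, newdata = new_obs, interval = "confidence")
```

The column names in the new data frame must match the predictor names used in the model formula, and the factor label must be one of the levels the model was fitted with.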
Different uses of tapply
tapply applies a function within groups. Here it calculates the variance of the variable met within each of the 3 categories of the variable coffee.consumption:
```
tapply(coffee.exercise$met, coffee.exercise$coffee.consumption, var)
```
rowMeans, rowSums, colMeans and colSums do jobs similar to apply. The row versions are close to Stata's egen rowmean and rowtotal functions.
Below are some examples:
```
colSums(ham, na.rm = TRUE)   # column totals, ignoring missing values
colMeans(ham, na.rm = TRUE)  # column means, ignoring missing values
```
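The row versions work the same way. A minimal sketch with a made-up data frame (items and its columns q1 to q3 are hypothetical) shows a per-row total and mean, plus the equivalent apply call:

```r
# hypothetical item scores; NA marks a missing answer
items <- data.frame(q1 = c(1, 2, NA),
                    q2 = c(3, NA, 1),
                    q3 = c(2, 2, 2))
qs <- c("q1", "q2", "q3")

items$total <- rowSums(items[, qs], na.rm = TRUE)   # per-row total
items$mean  <- rowMeans(items[, qs], na.rm = TRUE)  # per-row mean
items

# the same row means via apply(); MARGIN = 1 means "by row"
apply(items[, qs], 1, mean, na.rm = TRUE)
```

With na.rm = TRUE a row's mean is taken over its non-missing items only, so rows with different numbers of answers are averaged over different denominators.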
About subsetting data in R
Unlike in Stata and other statistical packages, running a cross tabulation on a subset of data in R is not very straightforward.
Consider the following scenario: I want to cross tabulate Sex (M, F) by Tobacco (1 - Current, 2 - Ex, 3 - Never), excluding the Never smoking category. In Stata a simple if Tobacco!=3 would suffice. In R, however, we need to subset the data before tabulating it:
```{r}
#subsetting the data
retinol1<-subset(retinol, tabac!=3)
table(retinol1$Sex, retinol1$tabac)
```
However, subsetting can be embedded directly if we are doing univariate analysis:
``` {r}
table(retinol$tabac[retinol$tabac!=3])
```
Update! It turns out that the function xtabs has a subset argument:
```
xtabs(~ Sex + tabac, retinol, subset = tabac != 3)
```
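As a quick check that the two routes agree, here is a sketch on a made-up data frame (retinol_demo is hypothetical and only mimics the Sex and tabac columns):

```r
# hypothetical stand-in for the retinol data
retinol_demo <- data.frame(Sex   = c("M", "F", "M", "F", "M", "F"),
                           tabac = c(1, 2, 3, 1, 3, 2))

# route 1: subset first, then tabulate
tab_subset <- with(subset(retinol_demo, tabac != 3), table(Sex, tabac))

# route 2: let xtabs do the subsetting
tab_xtabs <- xtabs(~ Sex + tabac, data = retinol_demo, subset = tabac != 3)

tab_subset
tab_xtabs
all(tab_subset == tab_xtabs)  # both tables hold the same counts
```

Note that if tabac were stored as a factor, the excluded category could still appear as an empty column in either table unless unused levels are dropped.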
How to conduct LOCF imputation in R; library(zoo)
Let us assume that weight has been measured on 624 patients at 4 distinct time points: M0, M1, M3 and M6.
``` {r}
library(zoo)
# collect the four weight measurements in a matrix, one column per time point
WeightImpute <- cbind(MetSData$POIDS_M0, MetSData$POIDS_M1, MetSData$POIDS_M3, MetSData$POIDS_M6)
# rename the columns
colnames(WeightImpute) <- c("w0", "w1", "w3", "w6")
# copy of the matrix that will receive the imputed values
WeightImputeF <- WeightImpute
# number of rows in the matrix (624)
n <- dim(WeightImpute)[1]
# rows with a non-missing baseline weight; na.locf drops leading NAs by default,
# so a row that starts with NA would return fewer than four values
index <- which(!is.na(WeightImpute[, 1]))
# carry the last observation forward with na.locf from library(zoo).
# ATTENTION: i sits in the row position, so the imputation runs within each
# patient (row) across the time points
for (i in index) {
  WeightImputeF[i, ] <- na.locf(WeightImpute[i, ])
}
WeightImputeF
```
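For intuition about what happens to each row, here is a base-R sketch of last observation carried forward on a small made-up matrix. locf_row is a hypothetical helper written for this illustration, not part of zoo, and it leaves leading missing values as NA instead of dropping them.

```r
# hypothetical helper: carry the last non-missing value forward within a vector
locf_row <- function(x) {
  # position of the most recent non-missing value (0 if none seen yet)
  last_seen <- cummax(seq_along(x) * !is.na(x))
  # prepend NA so that position 0 maps to NA
  c(NA, x)[last_seen + 1]
}

# made-up weights for three patients at four time points
w <- rbind(c(70, NA, 72, NA),
           c(65, 66, NA, NA),
           c(NA, 80, NA, 81))
colnames(w) <- c("w0", "w1", "w3", "w6")

# apply() works row by row and returns the results as columns, hence t()
w_locf <- t(apply(w, 1, locf_row))
colnames(w_locf) <- colnames(w)
w_locf
```

The third patient has no baseline weight, so w0 stays missing while the later gap is filled with 80.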
Labeling points on a scatter plot
In the example below I show how to assign a label (in this case Casenr) to each data point on my scatter plot.
```
plot(BMI~Age, data=prevend.sample)
text(BMI~Age, labels=Casenr, data=prevend.sample)
```
Recode using the ifelse command
The code below creates a new variable "hdrs1" which takes the values of HAMD17tot_M0 if visit=0, the values of HAMD17tot_M1 if visit=1 and the values of HAMD17tot_M3 if visit=3.
I start by creating a vector that contains the values of HAMD17tot_M0, then I modify the vector using the ifelse command: if visit=1, replace the values by those of HAMD17tot_M1, else keep them as they are. The same step is then repeated for visit=3.
```
romain1$hdrs1<-romain1$HAMD17tot_M0
romain1$hdrs1<-ifelse(romain1$visit==1,romain1$HAMD17tot_M1, romain1$hdrs1)
romain1$hdrs1<-ifelse(romain1$visit==3,romain1$HAMD17tot_M3, romain1$hdrs1)
```
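The same recode can also be written as one nested ifelse call. The sketch below uses a made-up data frame (visits and the columns h_m0, h_m1 and h_m3 are hypothetical stand-ins for the HAMD17tot variables):

```r
# hypothetical data: one score column per visit
visits <- data.frame(visit = c(0, 1, 3, 1),
                     h_m0  = c(20, 18, 25, 22),
                     h_m1  = c(15, 14, 21, 19),
                     h_m3  = c(10, 9, 16, NA))

# pick the score that matches each row's visit; visit 0 falls through to h_m0
visits$hdrs1 <- ifelse(visits$visit == 1, visits$h_m1,
                ifelse(visits$visit == 3, visits$h_m3, visits$h_m0))
visits
```

The step-by-step version is easier to read when there are many visits; the nested version avoids overwriting the new variable several times.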
Running a command on a subset of data (similar to an if condition in Stata)
```
hist(coffee.exercise$met[coffee.exercise$coffee.consumption=="A"])
hist(coffee.exercise$met[coffee.exercise$coffee.consumption=="B"])
hist(coffee.exercise$met[coffee.exercise$coffee.consumption=="C"])
hist(coffee.exercise$met[coffee.exercise$coffee.consumption=="D"])
```
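How to remove NA as a level
Writing NA in quotes ("NA") matches the text label stored in the factor; replacing it with an unquoted NA turns it into a real missing value, and calling factor() again drops the now unused level.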
```
HIV_coded$bisexual<- factor(replace(HIV_coded$bisexual, HIV_coded$bisexual == "NA", NA))
```
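A short sketch on a made-up factor shows the effect. The object names answers and cleaned are hypothetical.

```r
# "NA" was read in as text, so it is an ordinary level here
answers <- factor(c("yes", "no", "NA", "yes"))
levels(answers)      # includes the label "NA"
sum(is.na(answers))  # no real missing values yet

# swap the text "NA" for a real missing value, then rebuild the factor
cleaned <- factor(replace(answers, answers == "NA", NA))
levels(cleaned)      # the "NA" label is gone
sum(is.na(cleaned))  # one real missing value
```

After the recode, functions such as table() and summary() treat the value as missing rather than as a category of its own.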
Writing loops in R to loop over functions
The below vector is used to loop over 3 columns that I want to cross tabulate against sexual identity:
```{r}
l <- c("nationality", "sex_at_birth", "education")
for (i in l) {
  # cross tabulate sexual identity against the current column
  mytables <- table(HIV_coded$sexual_identity, HIV_coded[, i])
  print(mytables)
}
```
The loop below calculates the summary statistics for a series of variables:
```
l <- c("age", "age_1")
for (i in l) {
  # summary statistics for the current column
  mymeans <- summary(HIV_coded[, i])
  print(mymeans)
}
```
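If the tables are needed later, the loop can store them in a named list instead of only printing them. This is a sketch on a made-up data frame; survey, its columns and the list results are hypothetical.

```r
# hypothetical stand-in for HIV_coded
survey <- data.frame(identity  = c("a", "b", "a", "b", "a"),
                     education = c("low", "high", "high", "low", "high"),
                     sex       = c("F", "M", "M", "F", "F"))

vars <- c("education", "sex")
results <- list()
for (v in vars) {
  # one cross tabulation per column, stored under the column's name
  results[[v]] <- table(survey$identity, survey[[v]])
}

names(results)
results$education
```

Each table can then be retrieved by name, for example to pass it to a test or export it.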