R/predictEthnicity.R
predictEthnicity.Rd
Uses 1860 CpGs to predict self-reported ethnicity on placental microarray data.
predictEthnicity(betas, threshold = 0.75, force = FALSE)
betas: An n x m data frame of methylation values on the beta scale (0 to 1), with variables (CpGs) in rows and samples in columns. Should contain all 1860 predictors and be normalized with NOOB and BMIQ.
threshold: A probability threshold between 0 and 1. Samples whose highest class probability falls below it are called 'Ambiguous'. Defaults to 0.75.
force: Whether to run even if some predictors are missing. Defaults to FALSE.
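A tibble with one row per sample, containing Sample_ID, the predicted ethnicity with and without the threshold applied (Predicted_ethnicity, Predicted_ethnicity_nothresh), the three class probabilities (Prob_African, Prob_Asian, Prob_Caucasian), and Highest_Prob.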
Predicts self-reported ethnicity as one of three classes (African, Asian, or Caucasian) from placental DNA methylation data measured on the Infinium 450k/EPIC methylation array. Returns membership probabilities that often reflect genetic ancestry composition.
The input data should contain all 1860 predictors (CpGs) of the final glmnet model.
We recommend applying the same normalization methods that were used on the training data: NOOB and BMIQ.
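Before running the prediction, it can help to check how many of the required predictors are actually present in the input. The following is a minimal base-R sketch of such a check, not the package's own code: `check_predictors()` and `required_cpgs` are hypothetical names, and the toy CpG IDs stand in for the model's 1860 predictors.

```r
# Toy beta matrix: CpGs in rows, samples in columns (values are made up).
betas <- matrix(
  c(0.10, 0.80, 0.45,
    0.20, 0.75, 0.50),
  nrow = 3,
  dimnames = list(c("cg0001", "cg0002", "cg0003"), c("sample_A", "sample_B"))
)

# Hypothetical predictor IDs; a real run would use the model's 1860 CpG names.
required_cpgs <- c("cg0001", "cg0002", "cg0004")

# Report how many required predictors are present and list the missing ones.
check_predictors <- function(betas, required) {
  present <- intersect(required, rownames(betas))
  missing <- setdiff(required, rownames(betas))
  message(length(present), " of ", length(required), " predictors present.")
  list(present = present, missing = missing, complete = length(missing) == 0)
}

coverage <- check_predictors(betas, required_cpgs)
coverage$missing
```

If `coverage$complete` is FALSE, the options are to fix the input or to call predictEthnicity() with force = TRUE, accepting that predictions made with missing predictors may be less reliable.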
## To predict ethnicity on 450k/850k samples
# Load the package and the example placenta DNAm data
library(planet)
data(plBetas)
predictEthnicity(plBetas)
#> 1860 of 1860 predictors present.
#> # A tibble: 24 × 7
#> Sample_ID Predicted_ethnicity_…¹ Predicted_ethnicity Prob_African Prob_Asian
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 GSM1944936 Caucasian Caucasian 0.00331 0.0164
#> 2 GSM1944939 Caucasian Caucasian 0.000772 0.000514
#> 3 GSM1944942 Caucasian Caucasian 0.000806 0.000699
#> 4 GSM1944944 Caucasian Caucasian 0.000883 0.000792
#> 5 GSM1944946 Caucasian Caucasian 0.000885 0.00130
#> 6 GSM1944948 Caucasian Caucasian 0.000852 0.000973
#> 7 GSM1944949 Caucasian Caucasian 0.000902 0.00176
#> 8 GSM1944950 Caucasian Caucasian 0.00174 0.00223
#> 9 GSM1944951 Caucasian Caucasian 0.000962 0.00231
#> 10 GSM1944952 Caucasian Caucasian 0.00287 0.00356
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹Predicted_ethnicity_nothresh
#> # ℹ 2 more variables: Prob_Caucasian <dbl>, Highest_Prob <dbl>
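The relationship between the probability columns, Highest_Prob, and the two prediction columns can be illustrated with a small base-R sketch. This is an assumption about how the threshold is applied (highest probability strictly below the threshold gives 'Ambiguous'), not the package's internal code; `call_ethnicity()` is a hypothetical helper and the probabilities are made up.

```r
# Made-up membership probabilities for three samples (each row sums to 1).
probs <- data.frame(
  Sample_ID      = c("s1", "s2", "s3"),
  Prob_African   = c(0.01, 0.40, 0.10),
  Prob_Asian     = c(0.02, 0.35, 0.85),
  Prob_Caucasian = c(0.97, 0.25, 0.05)
)

call_ethnicity <- function(probs, threshold = 0.75) {
  prob_cols <- c("Prob_African", "Prob_Asian", "Prob_Caucasian")
  p <- as.matrix(probs[, prob_cols])
  labels <- sub("^Prob_", "", prob_cols)

  # Highest class probability per sample, and the class it belongs to.
  highest <- unname(apply(p, 1, max))
  nothresh <- labels[max.col(p, ties.method = "first")]

  data.frame(
    Sample_ID = probs$Sample_ID,
    Predicted_ethnicity_nothresh = nothresh,
    # Assumed rule: below the threshold, the call becomes "Ambiguous".
    Predicted_ethnicity = ifelse(highest < threshold, "Ambiguous", nothresh),
    Highest_Prob = highest
  )
}

calls <- call_ethnicity(probs)
calls$Predicted_ethnicity
```

Under this rule, raising the threshold makes more samples 'Ambiguous' while leaving Predicted_ethnicity_nothresh unchanged.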