Uses 1860 CpGs to predict self-reported ethnicity from placental DNA methylation microarray data.

predictEthnicity(betas, threshold = 0.75, force = FALSE)

Arguments

betas

An n x m data frame of methylation values on the beta scale (0, 1), with CpGs (variables) in rows and samples in columns. It should contain all 1860 predictors and be normalized with NOOB and BMIQ.

threshold

A probability threshold in (0, 1) used to call samples 'ambiguous' when their highest class probability falls below it. Defaults to 0.75.

force

If TRUE, run even if some predictors are missing. Defaults to FALSE.
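The base-R sketch below illustrates how a threshold can turn the three class probabilities into a call. It assumes, without confirmation from this page, that a sample is labelled "Ambiguous" when its highest probability is strictly below `threshold`; the helper `call_ethnicity()` and the toy probabilities are hypothetical and are not part of the package. Only the column names are taken from the output shown under Examples.

```r
# Hypothetical helper, not part of the package: the "strictly below threshold"
# rule is an assumption used for illustration only.
call_ethnicity <- function(probs, threshold = 0.75) {
  stopifnot(threshold > 0, threshold < 1)
  classes <- c("African", "Asian", "Caucasian")
  p <- as.matrix(probs[, paste0("Prob_", classes)])
  # Index of the most probable class for each sample (first one wins on ties)
  idx <- max.col(p, ties.method = "first")
  nothresh <- classes[idx]
  # Probability of that most probable class
  highest <- p[cbind(seq_len(nrow(p)), idx)]
  data.frame(
    Predicted_ethnicity_nothresh = nothresh,
    Predicted_ethnicity = ifelse(highest < threshold, "Ambiguous", nothresh),
    Highest_Prob = highest,
    stringsAsFactors = FALSE
  )
}

# Toy probabilities (made up for illustration)
toy <- data.frame(
  Prob_African   = c(0.01, 0.40),
  Prob_Asian     = c(0.02, 0.35),
  Prob_Caucasian = c(0.97, 0.25)
)
res <- call_ethnicity(toy)
print(res)
```

Under this assumption, the second sample keeps "African" as its unthresholded call but is reported as "Ambiguous" because 0.40 is below the default 0.75.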

Value

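A tibble with one row per sample and the columns Sample_ID, Predicted_ethnicity_nothresh, Predicted_ethnicity, Prob_African, Prob_Asian, Prob_Caucasian, and Highest_Prob.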

Details

Predicts self-reported ethnicity as one of three classes (African, Asian, or Caucasian) from placental DNA methylation data measured on the Infinium 450k or EPIC methylation array. The returned membership probabilities often reflect a sample's genetic ancestry composition.

The input data should contain all 1860 predictors (CpGs) of the final GLMNET model.
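A quick way to see what "all 1860 predictors" and `force` mean in practice is to check the input before predicting. The base-R sketch below is hypothetical: `check_predictors()`, the toy matrix, and the placeholder IDs `cpg1` to `cpg4` are made up, and it assumes CpG IDs are stored as row names. It only mirrors the documented behaviour (a count of present predictors, stopping on missing predictors unless `force = TRUE`) and is not the package's internal code.

```r
# Hypothetical input check, not the package's implementation.
# Assumes CpG IDs are row names (CpGs in rows, samples in columns).
check_predictors <- function(betas, predictors, force = FALSE) {
  betas <- as.matrix(betas)
  present <- predictors %in% rownames(betas)
  message(sum(present), " of ", length(predictors), " predictors present.")
  # Stop on missing predictors unless the caller forces the run
  if (!all(present) && !force) {
    stop("Missing ", sum(!present), " predictors; set force = TRUE to run anyway.")
  }
  vals <- betas[predictors[present], , drop = FALSE]
  # Beta values are expected to lie strictly inside (0, 1)
  if (any(vals <= 0 | vals >= 1, na.rm = TRUE)) {
    warning("Some values fall outside the open beta interval (0, 1).")
  }
  invisible(vals)
}

# Toy data: 3 placeholder CpGs (rows) x 2 samples (columns)
toy_betas <- matrix(c(0.10, 0.55, 0.90, 0.20, 0.60, 0.85), nrow = 3,
                    dimnames = list(c("cpg1", "cpg2", "cpg3"), c("s1", "s2")))
kept <- check_predictors(toy_betas, c("cpg1", "cpg3"))
# "cpg4" is absent, so this only proceeds because force = TRUE
kept_forced <- check_predictors(toy_betas, c("cpg1", "cpg4"), force = TRUE)
```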

It's recommended to use the same normalization methods used on the training data: NOOB and BMIQ.

Examples

## To predict ethnicity on 450k/850k samples

# Load placenta DNAm data
data(plBetas)
predictEthnicity(plBetas)
#> 1860 of 1860 predictors present.
#> # A tibble: 24 × 7
#>    Sample_ID  Predicted_ethnicity_…¹ Predicted_ethnicity Prob_African Prob_Asian
#>    <chr>      <chr>                  <chr>                      <dbl>      <dbl>
#>  1 GSM1944936 Caucasian              Caucasian               0.00331    0.0164  
#>  2 GSM1944939 Caucasian              Caucasian               0.000772   0.000514
#>  3 GSM1944942 Caucasian              Caucasian               0.000806   0.000699
#>  4 GSM1944944 Caucasian              Caucasian               0.000883   0.000792
#>  5 GSM1944946 Caucasian              Caucasian               0.000885   0.00130 
#>  6 GSM1944948 Caucasian              Caucasian               0.000852   0.000973
#>  7 GSM1944949 Caucasian              Caucasian               0.000902   0.00176 
#>  8 GSM1944950 Caucasian              Caucasian               0.00174    0.00223 
#>  9 GSM1944951 Caucasian              Caucasian               0.000962   0.00231 
#> 10 GSM1944952 Caucasian              Caucasian               0.00287    0.00356 
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹​Predicted_ethnicity_nothresh
#> # ℹ 2 more variables: Prob_Caucasian <dbl>, Highest_Prob <dbl>