  • The SelectSim package implements a methodology to infer inter-dependencies between functional alterations in cancer.
  • It provides functions to generate the background model, along with other utility functions.


  • You can install the development version of SelectSim from GitHub with:
# install.packages("devtools")
devtools::install_github("CSOgroup/SelectSim", dependencies = TRUE, build_vignettes = TRUE)


  • We will run the SelectSim algorithm on the processed LUAD dataset from TCGA that is provided with the package.
  • Note: This is an example of running the algorithm on already processed data. See the other vignette for how to process raw data into the run_data object that the SelectSim algorithm needs as input.
library(SelectSim)
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union
## Load the data provided with the package
data(luad_run_data, package = "SelectSim")

Data Description & Format

  • The loaded data is a list object that consists of:
    • M: a list with two elements:
      • M: a list of GAMs, which are presence/absence matrices of alterations (genes in rows, samples in columns)
      • tmb: a list of tumor mutation burden data frames, whose columns should be named sample and mutation
    • sample.class: a named vector of sample annotations
    • alteration.class: a named vector of alteration annotations
# Check the data structure
str(luad_run_data)
#> List of 3
#>  $ M               :List of 2
#>   ..$ M  :List of 2
#>   .. ..$ missense  : num [1:396, 1:502] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:396] "AKT1" "ALK" "APC" "AR" ...
#>   .. .. .. ..$ : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   .. ..$ truncating: num [1:396, 1:502] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:396] "AKT1" "ALK" "APC" "AR" ...
#>   .. .. .. ..$ : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   ..$ tmb:List of 2
#>   .. ..$ missense  :'data.frame':    502 obs. of  2 variables:
#>   .. .. ..$ sample  : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   .. .. ..$ mutation: num [1:502] 163 253 270 1328 100 ...
#>   .. ..$ truncating:'data.frame':    502 obs. of  2 variables:
#>   .. .. ..$ sample  : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   .. .. ..$ mutation: num [1:502] 24 45 40 206 17 18 73 31 176 108 ...
#>  $ sample.class    : Named chr [1:502] "LUAD" "LUAD" "LUAD" "LUAD" ...
#>   ..- attr(*, "names")= chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>  $ alteration.class: Named chr [1:396] "MUT" "MUT" "MUT" "MUT" ...
#>   ..- attr(*, "names")= chr [1:396] "AKT1" "ALK" "APC" "AR" ...
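To make the layout above concrete, here is a minimal sketch that assembles a toy object with the same shape using base R only. The gene names, sample names, and all values are invented for illustration, and `toy_run_data` is a hypothetical name. Whether SelectSim applies further checks to its input is not covered here, so treat this as a picture of the structure rather than a validated input.

```r
# Toy data only: 3 genes x 4 samples, all values invented for illustration.
genes   <- c("GENE_A", "GENE_B", "GENE_C")
samples <- c("S1", "S2", "S3", "S4")

# Helper (hypothetical): build a genes-by-samples presence/absence matrix.
# matrix() fills column by column, so each group of 3 values is one sample.
make_gam <- function(values) {
  matrix(values, nrow = length(genes), ncol = length(samples),
         dimnames = list(genes, samples))
}

gams <- list(
  missense   = make_gam(c(1, 0, 0,  1, 1, 0,  0, 0, 1,  0, 1, 0)),
  truncating = make_gam(c(0, 0, 1,  0, 0, 0,  1, 0, 0,  0, 0, 0))
)

# One TMB data frame per alteration type, with columns sample and mutation.
tmb <- list(
  missense   = data.frame(sample = samples, mutation = c(120, 300, 80, 45),
                          stringsAsFactors = FALSE),
  truncating = data.frame(sample = samples, mutation = c(15, 40, 9, 6),
                          stringsAsFactors = FALSE)
)

toy_run_data <- list(
  M = list(M = gams, tmb = tmb),
  sample.class = setNames(rep("TOY", length(samples)), samples),
  alteration.class = setNames(rep("MUT", length(genes)), genes)
)

# Same top-level layout as luad_run_data: M, sample.class, alteration.class.
str(toy_run_data, max.level = 2)
```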

Running SelectX

  • We use the function selectX(), which generates the background model and the results.
  • The parameters of the function are:
    • M: the list object holding the GAMs and TMB
    • sample.class: a named vector of samples with covariates
    • alteration.class: a named vector of alterations with covariates
    • min.freq: minimum number of samples in which a gene must be mutated
    • n.permut: number of simulations to run
    • lambda: penalty factor used in computing the penalty vector
    • tao: fold-change factor used in computing the penalty vector
    • maxFDR: FDR threshold used to call significant results
  • The function returns a list object that contains the background model and the results.
result_obj <- selectX(M = luad_run_data$M,
                      sample.class = luad_run_data$sample.class,
                      alteration.class = luad_run_data$alteration.class,
                      n.cores = 1,
                      min.freq = 10,
                      n.permut = 1000,
                      lambda = 0.3,
                      tao = 1,
                      save.object = FALSE,
                      verbose = FALSE,
                      estimate_pairwise = FALSE,
                      maxFDR = 0.25)
#> Total Time: 4.34 sec elapsed
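As an illustration of what a threshold such as min.freq means, the sketch below counts, for each gene in a toy presence/absence matrix, the number of samples carrying an alteration and keeps only genes that reach the threshold. This is not SelectSim's internal code: the matrix, the variable names, and the way a single matrix is filtered are assumptions made for the example.

```r
# Toy presence/absence matrix: genes in rows, samples in columns (invented values).
gam <- rbind(
  GENE_A = c(1, 1, 1, 0, 1, 0),
  GENE_B = c(0, 1, 0, 0, 0, 0),
  GENE_C = c(1, 0, 1, 1, 0, 0)
)
colnames(gam) <- paste0("S", 1:6)

min_freq <- 3  # hypothetical threshold, playing the role of min.freq

# Number of samples in which each gene is altered.
n_mutated <- rowSums(gam > 0)

# Keep genes altered in at least min_freq samples; drop = FALSE keeps a matrix
# even when a single gene survives.
kept <- gam[n_mutated >= min_freq, , drop = FALSE]

print(n_mutated)
print(rownames(kept))
```

With these toy values GENE_B is altered in a single sample and is dropped, while GENE_A (4 samples) and GENE_C (3 samples) are retained.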

Interpreting the results

  • Let's look at the first rows and columns of the results table:
head(result_obj$result[, 1:10], n = 5)
#>              SFE_1 SFE_2         name support_1 support_2     freq_1    freq_2
#> KRAS - TP53   KRAS  TP53  KRAS - TP53       154       221 0.30677291 0.4402390
#> EGFR - KRAS   EGFR  KRAS  EGFR - KRAS        57       154 0.11354582 0.3067729
#> STK11 - TP53 STK11  TP53 STK11 - TP53        59       221 0.11752988 0.4402390
#> BRAF - KRAS   BRAF  KRAS  BRAF - KRAS        35       154 0.06972112 0.3067729
#> KRAS - STK11  KRAS STK11 KRAS - STK11       154        59 0.30677291 0.1175299
#>              overlap  w_overlap max_overlap
#> KRAS - TP53       49 35.8760174         154
#> EGFR - KRAS        0  0.0000000          57
#> STK11 - TP53      13  9.5456386          59
#> BRAF - KRAS        2  0.9821429          35
#> KRAS - STK11      28 25.9230769          59
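The unweighted columns in this table can be read off the printed values: freq_1 for KRAS is 154 / 502 ≈ 0.3068 (support divided by the number of samples), and max_overlap matches the smaller of the two supports. The sketch below recomputes these quantities for one pair on a toy matrix. It is an interpretation inferred from the output above, not taken from the package source, and the weighted column w_overlap is not reproduced.

```r
# Toy presence/absence data for two genes across six samples (invented values).
gam <- rbind(
  GENE_A = c(1, 1, 1, 0, 1, 0),
  GENE_B = c(0, 1, 0, 0, 1, 1)
)

a <- gam["GENE_A", ] > 0
b <- gam["GENE_B", ] > 0

pair_stats <- data.frame(
  support_1   = sum(a),             # samples altered in the first gene
  support_2   = sum(b),             # samples altered in the second gene
  freq_1      = mean(a),            # support_1 / number of samples
  freq_2      = mean(b),
  overlap     = sum(a & b),         # samples altered in both genes
  max_overlap = min(sum(a), sum(b)) # largest overlap the supports allow
)

print(pair_stats)
```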

Filtering significant hits

# Filtering significant hits and counting EDs
result_obj$result %>% filter(nFDR2<=0.25) %>% head(n=2)
#>             SFE_1 SFE_2        name support_1 support_2    freq_1    freq_2
#> KRAS - TP53  KRAS  TP53 KRAS - TP53       154       221 0.3067729 0.4402390
#> EGFR - KRAS  EGFR  KRAS EGFR - KRAS        57       154 0.1135458 0.3067729
#>             overlap w_overlap max_overlap freq_overlap r_overlap w_r_overlap
#> KRAS - TP53      49  35.87602         154    0.3181818  99.02908    58.69631
#> EGFR - KRAS       0   0.00000          57    0.0000000  31.22148    16.85637
#>                   wES wFDR       nES mean_r_nES nFDR cum_freq nFDR2 type  FDR
#> KRAS - TP53 -16.13638    0 -14.18270  -1.953683    0      375     0   ME TRUE
#> EGFR - KRAS -11.91926    0 -10.72977  -1.189488    0      211     0   ME TRUE
result_obj$result %>% filter(nFDR2<=0.25) %>% count(type)
#>   type  n
#> 1   CO 13
#> 2   ME 30
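The same filter-and-count step can be written in base R, which may help when dplyr is not attached. The sketch below uses a toy data frame that borrows the column names name, nES, nFDR2, and type from the output above; the rows and values are invented, and ranking hits by absolute nES is a choice made for this example rather than something the package prescribes.

```r
# Toy results table reusing column names from the output above (invented values).
toy_result <- data.frame(
  name  = c("A - B", "A - C", "B - C", "C - D"),
  nES   = c(-5.2, 3.1, -0.4, 2.0),
  nFDR2 = c(0.00, 0.10, 0.60, 0.30),
  type  = c("ME", "CO", "ME", "CO"),
  stringsAsFactors = FALSE
)

max_fdr <- 0.25  # same threshold as in the filter above

# Keep pairs at or below the FDR threshold.
sig <- toy_result[toy_result$nFDR2 <= max_fdr, ]

# Count significant pairs per type.
type_counts <- table(sig$type)
print(type_counts)

# Rank significant pairs by the magnitude of the effect size.
ranked <- sig[order(abs(sig$nES), decreasing = TRUE), ]
print(ranked)
```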

Plotting a scatter plot of co-mutation

# Filtering significant hits and plotting
options(repr.plot.width = 7, repr.plot.height = 7)
obs_exp_scatter(result = result_obj$result, title = 'TCGA LUAD')


# Print the session information
sessionInfo()
