Skip to contents
  • The goal of SelectSim pacakge is to implement the methodology to infer functional inter-dependencies between functional alterations in cancer.
  • SelectSim package provides function to generate the backgorund model and other utilites functions.

Installation

  • You can install the development version of SelectSim from GitHub with:
# install.packages("devtools")
devtools::install_github("CSOgroup/SelectSim",dependencies = TRUE, build_vignettes = TRUE)

Example

  • We will run SelectSim algorithm on processed LUAD dataset from TCGA provided with the package.
  • Note: This an example for running a processed data. Check other vignette to process the data to create the run_data object needed as input for SelectSim algorithm.
library(SelectSim)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
## Load the data provided with the package
data(luad_run_data, package = "SelectSim")

Data Description & Format

  • The loaded data is list object which consists of
    • M: a list object of GAMs which is presence absence matrix of alterations
    • tmb: a list object of tumor mutation burden as data frame with column names (should be) as sample and mutationn
    • sample.class a named vector of sample annotations
    • alteration.class a named vector of alteration annotations
# Check the data strucutre
str(luad_run_data)
#> List of 3
#>  $ M               :List of 2
#>   ..$ M  :List of 2
#>   .. ..$ missense  : num [1:396, 1:502] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:396] "AKT1" "ALK" "APC" "AR" ...
#>   .. .. .. ..$ : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   .. ..$ truncating: num [1:396, 1:502] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. ..$ : chr [1:396] "AKT1" "ALK" "APC" "AR" ...
#>   .. .. .. ..$ : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   ..$ tmb:List of 2
#>   .. ..$ missense  :'data.frame':    502 obs. of  2 variables:
#>   .. .. ..$ sample  : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   .. .. ..$ mutation: num [1:502] 163 253 270 1328 100 ...
#>   .. ..$ truncating:'data.frame':    502 obs. of  2 variables:
#>   .. .. ..$ sample  : chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>   .. .. ..$ mutation: num [1:502] 24 45 40 206 17 18 73 31 176 108 ...
#>  $ sample.class    : Named chr [1:502] "LUAD" "LUAD" "LUAD" "LUAD" ...
#>   ..- attr(*, "names")= chr [1:502] "TCGA-05-4244-01" "TCGA-05-4249-01" "TCGA-05-4250-01" "TCGA-05-4382-01" ...
#>  $ alteration.class: Named chr [1:396] "MUT" "MUT" "MUT" "MUT" ...
#>   ..- attr(*, "names")= chr [1:396] "AKT1" "ALK" "APC" "AR" ...

Running SelectX

  • We use the function selectX() which generates the background model and results.
  • The parameters for the functions are:
    • M: the list object of GAMs & TMB
    • sample.class: a named vector of samples with covariates
    • alteration.class: a named vector of alteration with covariates
    • min.freq: Number of samples a gene should be mutated in atleast
    • n.permut: Number of simulation to do
    • lambda: Penalty factor used in computing penalty vector
    • tao: Fold chnage factor used in computing penalty vector
    • maxFDR: FDR rate to call significnat results
  • The function returns a list object which contains the background model and results.
result_obj<- selectX(  M = luad_run_data$M,
                      sample.class = luad_run_data$sample.class,
                      alteration.class = luad_run_data$alteration.class,
                      n.cores = 1,
                      min.freq = 10,
                      n.permut = 1000,
                      lambda = 0.3,
                      tao = 1,
                      save.object = FALSE,
                      verbose = FALSE,
                      estimate_pairwise = FALSE,
                      maxFDR = 0.25)
#> Total Time: 4.34 sec elapsed

Intrepreting the results

  • Lets look into the results
head(result_obj$result[,1:10],n=5)
#>              SFE_1 SFE_2         name support_1 support_2     freq_1    freq_2
#> KRAS - TP53   KRAS  TP53  KRAS - TP53       154       221 0.30677291 0.4402390
#> EGFR - KRAS   EGFR  KRAS  EGFR - KRAS        57       154 0.11354582 0.3067729
#> STK11 - TP53 STK11  TP53 STK11 - TP53        59       221 0.11752988 0.4402390
#> BRAF - KRAS   BRAF  KRAS  BRAF - KRAS        35       154 0.06972112 0.3067729
#> KRAS - STK11  KRAS STK11 KRAS - STK11       154        59 0.30677291 0.1175299
#>              overlap  w_overlap max_overlap
#> KRAS - TP53       49 35.8760174         154
#> EGFR - KRAS        0  0.0000000          57
#> STK11 - TP53      13  9.5456386          59
#> BRAF - KRAS        2  0.9821429          35
#> KRAS - STK11      28 25.9230769          59
Filtering significant hits
# Filtering significant hits and counting EDs
result_obj$result %>% filter(nFDR2<=0.25) %>% head(n=2)
#>             SFE_1 SFE_2        name support_1 support_2    freq_1    freq_2
#> KRAS - TP53  KRAS  TP53 KRAS - TP53       154       221 0.3067729 0.4402390
#> EGFR - KRAS  EGFR  KRAS EGFR - KRAS        57       154 0.1135458 0.3067729
#>             overlap w_overlap max_overlap freq_overlap r_overlap w_r_overlap
#> KRAS - TP53      49  35.87602         154    0.3181818  99.02908    58.69631
#> EGFR - KRAS       0   0.00000          57    0.0000000  31.22148    16.85637
#>                   wES wFDR       nES mean_r_nES nFDR cum_freq nFDR2 type  FDR
#> KRAS - TP53 -16.13638    0 -14.18270  -1.953683    0      375     0   ME TRUE
#> EGFR - KRAS -11.91926    0 -10.72977  -1.189488    0      211     0   ME TRUE
result_obj$result %>% filter(nFDR2<=0.25) %>% count(type)
#>   type  n
#> 1   CO 13
#> 2   ME 30
Plotting a scatter plot of co-mutation
# Filtering significant hits and plotting
options(repr.plot.width = 7, repr.plot.height = 7)
obs_exp_scatter(result = result_obj$result,title = 'TCGA LUAD')

SessionInfo

# Print the sessionInfo
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-conda-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /mnt/ndata/arvind/envs/R_4/lib/libopenblasp-r0.3.25.so;  LAPACK version 3.11.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Zurich
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.1.4       SelectSim_0.0.1.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.4       xfun_0.41          bslib_0.6.1        ggplot2_3.4.4     
#>  [5] rstatix_0.7.2      lattice_0.22-5     vctrs_0.6.5        tools_4.3.2       
#>  [9] generics_0.1.3     parallel_4.3.2     tibble_3.2.1       fansi_1.0.6       
#> [13] highr_0.10         pkgconfig_2.0.3    Matrix_1.6-5       desc_1.4.3        
#> [17] ggridges_0.5.5     rngtools_1.5.2     RcppParallel_5.1.6 lifecycle_1.0.4   
#> [21] farver_2.1.1       compiler_4.3.2     stringr_1.5.1      tictoc_1.2        
#> [25] textshaping_0.3.7  munsell_0.5.0      codetools_0.2-19   carData_3.0-5     
#> [29] htmltools_0.5.7    sass_0.4.8         yaml_2.3.8         pillar_1.9.0      
#> [33] pkgdown_2.0.7      car_3.1-2          ggpubr_0.6.0       jquerylib_0.1.4   
#> [37] tidyr_1.3.0        cachem_1.0.8       doRNG_1.8.6        iterators_1.0.14  
#> [41] abind_1.4-5        foreach_1.5.2      tidyselect_1.2.0   digest_0.6.34     
#> [45] stringi_1.8.3      reshape2_1.4.4     purrr_1.0.2        fastmap_1.1.1     
#> [49] grid_4.3.2         colorspace_2.1-0   cli_3.6.2          magrittr_2.0.3    
#> [53] Rfast_2.1.0        utf8_1.2.4         broom_1.0.5        withr_3.0.0       
#> [57] scales_1.3.0       backports_1.4.1    RcppZiggurat_0.1.6 rmarkdown_2.25    
#> [61] ggsignif_0.6.4     ragg_1.2.7         memoise_2.0.1      evaluate_0.23     
#> [65] knitr_1.45         doParallel_1.0.17  rlang_1.1.3        Rcpp_1.0.12       
#> [69] glue_1.7.0         jsonlite_1.8.8     R6_2.5.1           plyr_1.8.9        
#> [73] systemfonts_1.0.5  fs_1.6.3