Description of the dataset

Five cell populations present in pancreatic tumors were considered. The dataset is described in the deconbench project [1] and published on bioRxiv.

Raw transcriptome and methylome profiles of these different cell populations were extracted from various sources (PDX model, tissues or isolated cells).

Cell-type proportions were simulated in silico from a Dirichlet distribution, parameterized with realistic proportions defined by an expert anatomopathologist (Jerome Cros).
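As a rough illustration of this step, the sketch below draws proportion vectors from a Dirichlet distribution by normalizing independent gamma draws. The concentration parameters in `ALPHA` are hypothetical placeholders, not the values chosen by the anatomopathologist, and the function name `sample_dirichlet` is introduced here for illustration only.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one proportion vector from Dirichlet(alpha) using normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical concentration parameters for K = 5 cell types (assumption, not from the source).
ALPHA = [4.0, 3.0, 1.5, 1.0, 0.5]
N_SAMPLES = 30  # cohort size stated in the dataset description

rng = random.Random(0)  # fixed seed so the simulation is reproducible
# One list of K proportions per simulated sample; each list sums to 1.
proportions = [sample_dirichlet(ALPHA, rng) for _ in range(N_SAMPLES)]
```

Larger values in `ALPHA` give a cell type a larger expected share, and a larger sum of `ALPHA` makes samples vary less around that expectation.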

The transcriptomes of the in silico pancreatic tumor mixtures were obtained as \(D = T A\), where \(T\) holds the cell-type profiles (an \(M \times K\) matrix, with \(M\) the number of features and \(K=5\) the number of cell types) and \(A\) holds the cell-type proportions per patient (a \(K \times N\) matrix, with \(N=30\) the number of samples). The proportion matrix \(A\) is common to both omics.
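The mixing model \(D = T A\) can be sketched with a plain matrix product. The toy matrices below (2 features, 3 cell types, 2 samples) are made-up values chosen only to keep the arithmetic easy to follow; `mix_profiles` is a hypothetical helper name.

```python
def mix_profiles(T, A):
    """Return D = T A for T (M x K) and A (K x N), both given as lists of rows."""
    K = len(A)
    if any(len(row) != K for row in T):
        raise ValueError("T must have as many columns as A has rows")
    N = len(A[0])
    return [
        [sum(T[m][k] * A[k][n] for k in range(K)) for n in range(N)]
        for m in range(len(T))
    ]


# Toy cell-type profiles: rows are features, columns are cell types (made-up values).
T = [
    [10.0, 0.0, 2.0],
    [1.0, 5.0, 4.0],
]
# Toy proportions: rows are cell types, columns are samples; each column sums to 1.
A = [
    [0.5, 0.2],
    [0.3, 0.3],
    [0.2, 0.5],
]

D = mix_profiles(T, A)  # 2 x 2 matrix of mixed expression values
```

Each column of `D` is the proportion-weighted sum of the cell-type profiles for one sample, so the first feature of the first sample is 10 × 0.5 + 0 × 0.3 + 2 × 0.2.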

Omics_type = transcriptome

Cancer_type = paad

Cohort_size = 30

Patient_metadata = No

Sample_type = In silico mixture of cell lines, PDX derived cells and FFPE tissues

Preparation of the data

Raw cell-type profile matrices were preprocessed (feature filtering, normalization, signal transformation, sample aggregation) to avoid batch effects.

Feature filtering = selection of protein coding genes (hg38)

Normalisation = edgeR

Transformation = none, linear scale

Aggregation = median
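The aggregation step can be pictured as follows, assuming that "aggregation" means collapsing several replicate profiles of one cell type into a single reference profile by taking the per-feature median. This sketch covers only that step (edgeR normalization is not reproduced here), and the replicate values and the helper name `aggregate_by_median` are illustrative.

```python
from statistics import median


def aggregate_by_median(replicates):
    """Collapse replicate profiles (equal-length feature vectors) into one profile."""
    n_features = len(replicates[0])
    if any(len(r) != n_features for r in replicates):
        raise ValueError("all replicates must have the same number of features")
    return [median(r[i] for r in replicates) for i in range(n_features)]


# Three made-up replicates of one cell type, two features each (linear scale).
replicates = [
    [1.0, 10.0],
    [3.0, 20.0],
    [2.0, 60.0],
]

profile = aggregate_by_median(replicates)  # one reference profile for this cell type
```

The median is less sensitive than the mean to a single outlying replicate, such as the value 60.0 in the second feature above.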

Composition of the test dataset

Transcriptome dataset

## [1] 5
## [1] 21566    30
Gene       sample_1   sample_2   sample_3  sample_4   sample_5
TSPAN6   25.5428504 23.8107563 24.6858019 24.069288 28.5945215
TNMD      0.0084628  0.0058332  0.0046414  0.006628  0.0060237
DPM1     43.6701170 36.8306077 38.5186138 38.560100 46.2323510
SCYL3    11.0820191 13.4562352 16.2704194 15.473919 11.9509177
C1orf112  8.6083792  7.6278147 10.4545485 10.163029  8.8729509
FGR      29.3553811 19.3349987 10.9988095 18.319123 21.7318955
CFH      49.4560793 43.5173661 41.0245146 43.126309 39.9683816
FUCA2    70.5025889 64.0591376 69.3969083 69.593649 66.7881289
GCLC     24.8261538 25.7357443 25.8552780 25.899016 27.0125506
NFYA     26.1139828 25.7320105 26.5634747 26.733890 24.4848352

Expected number of cell types

## [1] 5

Cancer type

## [1] "paad"

Composition of the solution dataset (ground truth)

Source = in silico simulations

Number of expected cell types = 5

Five independent proportion matrices and the corresponding complex expression matrices were generated to score algorithm performance.

## [1] 5
## [1]  5 30
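The scoring metric is not specified in this description. As one possible illustration, the sketch below compares an estimated proportion matrix with a ground-truth matrix using the mean absolute error; both the choice of metric and the toy 2 x 2 matrices are assumptions made for this example.

```python
def mean_absolute_error(A_true, A_est):
    """Mean absolute difference between two proportion matrices of identical shape."""
    if len(A_true) != len(A_est) or any(
        len(rt) != len(re) for rt, re in zip(A_true, A_est)
    ):
        raise ValueError("ground-truth and estimated matrices must have the same shape")
    total = sum(
        abs(t - e)
        for row_true, row_est in zip(A_true, A_est)
        for t, e in zip(row_true, row_est)
    )
    return total / (len(A_true) * len(A_true[0]))


# Made-up ground truth and estimate: rows are cell types, columns are samples.
A_true = [[0.5, 0.2], [0.5, 0.8]]
A_est = [[0.4, 0.3], [0.6, 0.7]]

score = mean_absolute_error(A_true, A_est)  # lower is better; 0 means a perfect match
```

With five ground-truth matrices, such a score could be computed once per matrix and then averaged, although the actual aggregation used for the benchmark is not stated here.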