Description of the dataset

The dataset was published by Kang et al. [1] in PLOS Computational Biology.

In brief, total mRNA was prepared from four cell lines: Namalwa (Burkitt’s lymphoma), Hs343T (fibroblast line derived from a mammary gland adenocarcinoma), hTERT-HME1 (normal mammary epithelial cells immortalized with hTERT), and MCF7 (estrogen receptor-positive breast cancer cell line). The RNA samples were profiled by RNA sequencing in duplicate.

Omics_type = transcriptome

Cancer_type = brca

Cohort_size = 30

Patient_metadata = No

Sample_type = In silico mixture of cell lines
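In silico mixtures are commonly built with a linear mixing model: each simulated sample is a weighted sum of pure cell-type expression profiles, with the weights taken from a proportion matrix whose columns sum to 1. The source does not describe its exact simulation procedure, so the Python sketch below only assumes that model and linear-scale input; the function name `mix_profiles` and the toy numbers are hypothetical. Under this model, a reference of 56646 genes × 4 cell types multiplied by a 4 × 30 proportion matrix yields a 56646 × 30 expression matrix.

```python
def mix_profiles(reference, proportions):
    """Linear mixing: mixture[g][s] = sum_k reference[g][k] * proportions[k][s].

    reference:   genes x cell types, linear-scale expression (hypothetical input)
    proportions: cell types x samples, each column summing to 1
    returns:     genes x samples mixed expression matrix
    """
    n_types = len(proportions)
    n_samples = len(proportions[0])
    if any(len(row) != n_types for row in reference):
        raise ValueError("reference must have one column per cell type")
    return [
        [sum(row[k] * proportions[k][s] for k in range(n_types)) for s in range(n_samples)]
        for row in reference
    ]


# Toy example: 3 genes, 2 cell types, 2 samples (made-up numbers).
ref = [[10.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
props = [[0.25, 1.0], [0.75, 0.0]]
print(mix_profiles(ref, props))  # [[2.5, 10.0], [3.0, 0.0], [2.0, 2.0]]
```

The second toy sample contains only the first cell type, so its column reproduces that cell type's profile exactly.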

Preparation of the data

RNA-sequencing expression data were collected and normalized using edgeR; no transformation was applied, so values are on a linear scale.
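edgeR is an R/Bioconductor package whose usual workflow scales counts by library size, optionally adjusted with TMM normalization factors from `calcNormFactors`, and reports counts per million with `cpm`. The source does not state which edgeR options were used. As a rough illustration only, the Python sketch below computes plain counts per million from a genes × samples count matrix; it deliberately omits TMM factors, and the function name `cpm` and the toy counts are hypothetical.

```python
def cpm(counts):
    """Counts per million for a genes x samples count matrix.

    Each column is divided by its library size (column total) and scaled to 1e6.
    Simplified stand-in: edgeR can additionally rescale library sizes with
    normalization factors (e.g. TMM), which is not done here.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [[row[s] * 1e6 / lib_sizes[s] for s in range(n_samples)] for row in counts]


# Toy counts: 3 genes x 2 samples, each library totalling 1000 reads.
counts = [[250, 100], [750, 300], [0, 600]]
print(cpm(counts))  # [[250000.0, 100000.0], [750000.0, 300000.0], [0.0, 600000.0]]
```

Because no log transformation is listed for this dataset, values like these stay on the linear scale; a log2(x + 1) transform would be a separate, optional step.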

Normalisation = edgeR

Transformation = none, linear scale

Aggregation = median
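The source lists median aggregation without saying what is aggregated. One common case, assumed here, is collapsing several features (for example transcripts or probes) that map to the same gene symbol into a single row by taking the per-sample median. The Python sketch below shows that assumed step; `aggregate_by_median` and the gene names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_median(rows):
    """Collapse rows sharing a gene symbol into one row of per-sample medians.

    rows:    iterable of (gene_symbol, [value per sample]) pairs
    returns: dict mapping gene_symbol -> [median value per sample]
    """
    groups = defaultdict(list)
    for gene, values in rows:
        groups[gene].append(values)
    # zip(*profiles) walks the samples, yielding all values for one sample at a time
    return {
        gene: [median(column) for column in zip(*profiles)]
        for gene, profiles in groups.items()
    }


rows = [
    ("GENE_A", [1.0, 4.0]),
    ("GENE_A", [3.0, 8.0]),
    ("GENE_A", [2.0, 6.0]),
    ("GENE_B", [5.0, 0.0]),
]
print(aggregate_by_median(rows))  # {'GENE_A': [2.0, 6.0], 'GENE_B': [5.0, 0.0]}
```

The median is less sensitive than the mean to a single outlying feature within a gene group.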

Composition of the test dataset

Transcriptome dataset

## [1] 5
## [1] 56646    30
                      sample_1    sample_2    sample_3    sample_4    sample_5
BHLHE40 /// DELEC1   0.6828031   0.9473202   0.5441742   0.1245747   0.8655587
MTARC1 /// MARCHF1   6.6497089   6.9352935   7.7606798   7.3697662   8.6457317
SEPTIN1             15.9085347  12.7724058  14.0699822  15.0968837  13.9262235
MARCHF10             0.6418168   1.3160400   0.9932912   0.7749461   1.4025259
SEPTIN10            91.3654553  97.4414434  85.2664554  86.3254491  77.3517607
MARCHF11             0.0268714   0.1374658   0.0407050   0.0279064   0.1238828
SEPTIN11           189.3891755 247.8283085 195.6794356 185.6063814 225.4692756
SEPTIN12             0.0289296   0.0809382   0.0284863   0.1438283   0.0225955
SEPTIN14             0.2670606   0.0037391   0.2080682   0.1044387   0.2188631
MTARC2 /// MARCHF2   8.6032742  10.7252242   9.2278913   8.7357587  10.3554631

Expected number of cell types

## [1] 4

Cancer type

## [1] "brca"

Composition of the solution dataset (ground truth)

Source = in silico simulations

Number of expected cell types = 4

Five independent proportion matrices, with the corresponding complex expression matrices, were generated to score algorithm performance.

## [1] 5
## [1]  4 30
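The source does not say how the proportion matrices were drawn or which metric scores the algorithms. The Python sketch below assumes a common choice for each: every sample's proportions are drawn from a Dirichlet distribution (obtained by normalizing independent Gamma variates), and estimates are scored by mean absolute error against the ground truth. It produces five 4 × 30 matrices, matching the shapes printed above; the function names, `alpha`, and the seeds are hypothetical.

```python
import random


def random_proportions(n_types, n_samples, alpha=1.0, seed=None):
    """Draw a cell types x samples matrix whose columns are Dirichlet(alpha) samples.

    A Dirichlet draw is obtained by normalizing independent Gamma(alpha, 1) variates.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_samples):
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_types)]
        total = sum(draws)
        columns.append([d / total for d in draws])
    # Transpose so that rows are cell types and columns are samples.
    return [list(row) for row in zip(*columns)]


def mean_absolute_error(estimated, truth):
    """Mean absolute difference over all entries of two same-shaped matrices."""
    diffs = [abs(e - t) for e_row, t_row in zip(estimated, truth) for e, t in zip(e_row, t_row)]
    return sum(diffs) / len(diffs)


# Five ground-truth matrices of 4 cell types x 30 samples.
truth_sets = [random_proportions(4, 30, seed=i) for i in range(5)]
print(len(truth_sets), len(truth_sets[0]), len(truth_sets[0][0]))  # 5 4 30

# Score a naive baseline that predicts equal proportions (0.25) for every sample.
uniform = [[0.25] * 30 for _ in range(4)]
print([mean_absolute_error(uniform, truth) for truth in truth_sets])
```

A deconvolution method's estimated 4 × 30 matrix would take the place of `uniform` when scoring.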