Description of the dataset

The dataset was published [1] by Kang et al. in PLOS Computational Biology.

In brief, total mRNA was prepared from Namalwa (Burkitt’s lymphoma), Hs343T (fibroblast line derived from a mammary gland adenocarcinoma), hTERT-HME1 (normal mammary epithelial cells immortalized with hTERT), and MCF7 (estrogen receptor positive breast cancer cell line). mRNA samples were diluted to 100 ng/μl and mixed in different proportions. The mixed RNA samples were profiled by RNA-sequencing. Sequencing libraries were prepared using the TruSeq RNA Sample Preparation Kit v2 (Illumina).
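
Because the RNA samples were mixed in known proportions, the expression of a mixed sample is often modeled as a proportion-weighted sum of the pure cell-line profiles. The sketch below illustrates that assumption only. The linear model, the expression values, and the proportions are hypothetical and are not taken from the dataset or from the original publication.

```python
# Toy illustration of a linear mixing model.
# All expression values and proportions below are hypothetical.
CELL_LINES = ["Namalwa", "Hs343T", "hTERT-HME1", "MCF7"]

# Hypothetical pure expression profiles (linear scale), three genes each.
pure_profiles = {
    "Namalwa": [10.0, 0.0, 5.0],
    "Hs343T": [2.0, 8.0, 5.0],
    "hTERT-HME1": [0.0, 4.0, 5.0],
    "MCF7": [6.0, 2.0, 5.0],
}


def mix(proportions):
    """Return the expected profile of a mixture, given per-cell-line proportions."""
    total = sum(proportions[c] for c in CELL_LINES)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n_genes = len(pure_profiles[CELL_LINES[0]])
    return [
        sum(proportions[c] * pure_profiles[c][g] for c in CELL_LINES)
        for g in range(n_genes)
    ]


# Equal parts of the four cell lines (hypothetical mixture).
mixed = mix({"Namalwa": 0.25, "Hs343T": 0.25, "hTERT-HME1": 0.25, "MCF7": 0.25})
print(mixed)
```

Under this assumption, a gene expressed at the same level in all four cell lines (the third toy gene) keeps that level in any mixture, while cell-line-specific genes scale with the proportion of their source.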

Omics_type = transcriptome

Cancer_type = brca

Cohort_size = 32

Patient_metadata = No

Sample_type = In vitro mixture of cell lines

Preparation of the data

Expression data from RNA-sequencing were collected and normalized together using edgeR; values are kept on a linear scale, without log transformation.

Normalisation = edgeR

Transformation = none, linear scale
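
Since the metadata lists no transformation, a user who needs log-scale values has to apply one. A minimal sketch follows, using the first five samples of two rows from the test dataset. The pseudocount of 1 is an assumption made here so that zero values stay defined; it is not a setting documented for this dataset.

```python
import math

# First five samples of two rows from the test dataset (linear scale).
linear_values = {
    "SEPTIN11": [151.2923955, 161.7000733, 159.238358, 143.1629723, 251.427702],
    "MARCHF11": [0.0, 0.0, 0.0, 0.0350546, 0.0],
}


def log2_transform(values, pseudocount=1.0):
    """Apply log2(x + pseudocount); the pseudocount keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


log_values = {gene: log2_transform(vals) for gene, vals in linear_values.items()}

for gene, vals in log_values.items():
    print(gene, [round(v, 3) for v in vals])
```

With this pseudocount, a linear value of 0 maps to 0 on the log scale, so genes that are not detected remain at zero after the transformation.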

Composition of the test dataset

Transcriptome dataset

## [1] 1
## [1] 56646    32
                      sample_1    sample_2   sample_3    sample_4   sample_5
BHLHE40 /// DELEC1   0.2723886   0.6524212   0.562481   0.0701092   1.428305
MTARC1 /// MARCHF1  15.8763625  15.8086664  14.033900  14.7930397  13.407634
SEPTIN1              8.1327445   7.5279364  10.743386  11.1824163   4.929955
MARCHF10             1.3619429   1.5055873   1.490574   1.8228390   1.981197
SEPTIN10            34.0874842  30.4128629  24.946030  30.0768438  60.127021
MARCHF11             0.0000000   0.0000000   0.000000   0.0350546   0.000000
SEPTIN11           151.2923955 161.7000733 159.238358 143.1629723 251.427702
SEPTIN12             0.0000000   0.1003725   0.028124   0.0000000   0.000000
SEPTIN14             0.0000000   0.0501862   0.028124   0.0350546   0.000000
MTARC2 /// MARCHF2  11.2846694  11.2417183   9.702796  10.7267065  14.467344
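
Some row names in the excerpt above combine several gene symbols with a ` /// ` separator. The sketch below shows one way to split such names into individual symbols before matching them against a gene list. The helper `split_symbols` is hypothetical, and the reading of ` /// ` as "several symbols attached to one row" is an assumption based only on the row names shown here.

```python
def split_symbols(row_name, sep="///"):
    """Split a composite row name into its individual gene symbols."""
    return [part.strip() for part in row_name.split(sep) if part.strip()]


# Row names copied from the excerpt of the test dataset.
row_names = ["BHLHE40 /// DELEC1", "SEPTIN1", "MTARC2 /// MARCHF2"]

# Map every individual symbol back to the row it came from.
symbol_to_row = {}
for name in row_names:
    for symbol in split_symbols(name):
        symbol_to_row[symbol] = name

for symbol, name in symbol_to_row.items():
    print(symbol, "->", name)
```

Keeping the mapping back to the original row name makes it possible to look up expression values by either symbol without duplicating rows.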

Expected number of cell types

## [1] 4

Cancer type

## [1] "brca"

Composition of the solution dataset (ground truth)

Source = in vitro mixtures

Number of expected cell types = 4

## [1] 1
## [1]  4 32
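
The dimensions reported above (4 rows, 32 columns) suggest one row per cell type and one column per sample, matching the 32 samples of the test dataset. A minimal consistency check could look like the sketch below. It assumes that the ground truth stores proportions that sum to 1 within each sample, which is not stated in this report, and it runs on a hypothetical 4 x 3 matrix rather than on the real data.

```python
def check_ground_truth(proportions, n_cell_types, n_samples, tol=1e-6):
    """Validate a proportion matrix given as a list of rows (one per cell type)."""
    if len(proportions) != n_cell_types:
        raise ValueError("unexpected number of cell types")
    for row in proportions:
        if len(row) != n_samples:
            raise ValueError("unexpected number of samples")
        if any(value < 0 for value in row):
            raise ValueError("negative proportion")
    for j in range(n_samples):
        column_sum = sum(row[j] for row in proportions)
        if abs(column_sum - 1.0) > tol:
            raise ValueError(f"sample {j} does not sum to 1")
    return True


# Hypothetical ground truth: 4 cell types x 3 samples.
toy_ground_truth = [
    [0.10, 0.25, 0.70],
    [0.20, 0.25, 0.10],
    [0.30, 0.25, 0.10],
    [0.40, 0.25, 0.10],
]

is_valid = check_ground_truth(toy_ground_truth, n_cell_types=4, n_samples=3)
print(is_valid)
```

For the real data, the same check would be called with `n_cell_types=4` and `n_samples=32`.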