Description of the dataset
The dataset was published [1] by Kang et al. in PLOS Computational Biology.
In brief, total mRNA was prepared from Namalwa (Burkitt’s lymphoma), Hs343T (fibroblast line derived from a mammary gland adenocarcinoma), hTERT-HME1 (normal mammary epithelial cells immortalized with hTERT), and MCF7 (estrogen receptor positive breast cancer cell line). mRNA samples were diluted to 100 ng/μl and mixed in different proportions. The mixed RNA samples were profiled by RNA-sequencing. Sequencing libraries were prepared using the TruSeq RNA sample preparation kit v2 (Illumina).
Omics_type
= transcriptome
Cancer_type
= brca
Cohort_size
= 32
Patient_metadata
= No
Sample_type
= In vitro mixture of cell lines
Preparation of the data
Expression data from RNA-sequencing were collected, normalized together using edgeR and transformed using log2(x + 1).
Normalisation
= edgeR
Transformation
= log2(x + 1) (pseudo-log2)
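The two steps above can be sketched in base R. This is a minimal illustration, not the pipeline used for this dataset: the count matrix `toy_counts` is made up, and simple library-size scaling to counts per million stands in for the edgeR normalisation, whose exact settings are not given here.

```r
# Toy count matrix (hypothetical values): 3 genes x 2 samples.
toy_counts <- matrix(
  c(0, 10, 990,
    5, 45, 1950),
  nrow = 3,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), c("sample_1", "sample_2"))
)

# Library-size scaling to counts per million (CPM).
# Assumption: this stands in for the edgeR normalisation step.
lib_sizes <- colSums(toy_counts)
toy_cpm <- sweep(toy_counts, 2, lib_sizes, "/") * 1e6

# Pseudo-log2 transformation: log2(x + 1) maps a count of zero to zero.
toy_log <- log2(toy_cpm + 1)
```

Under this transformation an unexpressed gene stays at exactly 0, which is consistent with the `0.0000000` entries in the expression table of the test dataset.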
Composition of the test dataset
Transcriptome dataset
## [1] 1
## [1] 56646 32
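# Rename the columns of the expression matrix to sample_1, sample_2, ...
colnames(test_data[[1]]) <- paste0("sample_", seq_len(ncol(test_data[[1]])))
# Show the first 10 genes for the first 5 samples
knitr::kable(head(test_data[[1]][, 1:5], 10))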
Gene | sample_1 | sample_2 | sample_3 | sample_4 | sample_5 |
---|---|---|---|---|---|
BHLHE40 /// DELEC1 | 0.3475393 | 0.7245814 | 0.6438386 | 0.0977580 | 1.279950 |
MTARC1 /// MARCHF1 | 4.0769321 | 4.0711334 | 3.9101474 | 3.9812170 | 3.848762 |
SEPTIN1 | 3.1910485 | 3.0921967 | 3.5537766 | 3.6067284 | 2.568021 |
MARCHF10 | 1.2399741 | 1.3251488 | 1.3164786 | 1.4971469 | 1.575892 |
SEPTIN10 | 5.1328846 | 4.9732835 | 4.6974419 | 4.9577681 | 5.933738 |
MARCHF11 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0497069 | 0.000000 |
SEPTIN11 | 7.2507001 | 7.3460711 | 7.3240757 | 7.1715569 | 7.979726 |
SEPTIN12 | 0.0000000 | 0.1379920 | 0.0400143 | 0.0000000 | 0.000000 |
SEPTIN14 | 0.0000000 | 0.0706452 | 0.0400143 | 0.0497069 | 0.000000 |
MTARC2 /// MARCHF2 | 3.6187871 | 3.6137342 | 3.4199159 | 3.5517260 | 3.951154 |
Composition of the solution dataset (ground truth)
Source
= in vitro mixtures
Number of expected cell types
= 4
## [1] 1
## [1] 4 32
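The `4 32` dimensions correspond to one row per expected cell type and one column per sample. The sketch below shows checks one might run on a proportion matrix of that shape. The matrix `toy_truth` is randomly generated for illustration only. The row labels reuse the cell line names from the dataset description, and the assumption that each column sums to 1 follows from the samples being mixtures of the four cell lines; neither is confirmed by the output printed above.

```r
set.seed(1)

# Cell lines named in the dataset description; the actual row names of the
# solution matrix are not shown above, so these labels are an assumption.
cell_lines <- c("Namalwa", "Hs343T", "hTERT-HME1", "MCF7")
n_samples <- 32

# Hypothetical proportions: random positive values rescaled per sample.
raw <- matrix(runif(length(cell_lines) * n_samples), nrow = length(cell_lines))
toy_truth <- sweep(raw, 2, colSums(raw), "/")
dimnames(toy_truth) <- list(cell_lines, paste0("sample_", seq_len(n_samples)))

# Checks one might run on a solution matrix of this shape.
stopifnot(
  identical(dim(toy_truth), c(4L, 32L)),
  all(toy_truth >= 0),
  isTRUE(all.equal(unname(colSums(toy_truth)), rep(1, n_samples)))
)
```

The same three checks (shape, non-negativity, columns summing to 1) could be applied to the real solution matrix once it is loaded, provided its values are stored as fractions and not percentages.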