ABOUT
Motivation
Methods for extracting knowledge from Next Generation Sequencing data are in high demand. In this project, we focus on RNA-seq gene expression analysis, and specifically on case–control studies that use rule-based supervised classification algorithms to build a model able to discriminate cases from controls. State-of-the-art algorithms compute a single classification model that contains few features (genes). In contrast, our goal is to elicit more knowledge by computing many classification models, and therefore to identify most of the genes related to the predicted class.
CAMUR
Specialized in Genomic Data Classification
CAMUR (Classifier with Alternative and MUltiple Rule-based models) is a new method and software package able to extract multiple, alternative, and equivalent classification models. CAMUR iteratively computes a rule-based classification model, calculates the power set (or a partial combination) of the features present in the rules, eliminates those combinations from the data set one at a time, and repeats the classification procedure until a stopping criterion is met. CAMUR includes an ad-hoc knowledge repository (database) and a complete querying tool.
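The iterative procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CAMUR implementation: the learner (learn_rule) is a toy stand-in that greedily builds one threshold rule, the stopping criterion (a minimum accuracy and a maximum number of models) is assumed, and the gene names and expression values are hypothetical.

```python
from itertools import chain, combinations


def power_set(features):
    """Return every non-empty subset of `features` as a tuple, smallest first."""
    items = sorted(features)
    return chain.from_iterable(
        combinations(items, size) for size in range(1, len(items) + 1)
    )


def accuracy(conditions, rows, labels):
    """Fraction of samples where the rule 'every f >= t' agrees with the label."""
    hits = sum(
        all(row[f] >= t for f, t in conditions) == label
        for row, label in zip(rows, labels)
    )
    return hits / len(rows)


def learn_rule(rows, labels, allowed, max_conditions=2):
    """Toy stand-in learner (assumption): greedily grow one conjunctive rule."""
    conditions, best = [], accuracy([], rows, labels)
    for _step in range(max_conditions):
        used = {f for f, _t in conditions}
        candidates = [
            (accuracy(conditions + [(f, t)], rows, labels), f, t)
            for f in sorted(allowed - used)
            for t in sorted({row[f] for row in rows})
        ]
        if not candidates:
            break
        acc, f, t = max(candidates, key=lambda c: c[0])
        if acc <= best:  # no improvement: stop growing the rule
            break
        conditions.append((f, t))
        best = acc
    return conditions, best


def camur_sketch(rows, labels, min_accuracy=0.9, max_models=10):
    """Collect alternative models by excluding feature combinations in turn."""
    all_features = frozenset(rows[0])
    queue, seen, models = [frozenset()], set(), []
    while queue and len(models) < max_models:  # assumed stopping criterion
        excluded = queue.pop(0)
        if excluded in seen:
            continue
        seen.add(excluded)
        conditions, acc = learn_rule(rows, labels, all_features - excluded)
        if not conditions or acc < min_accuracy:
            continue  # this branch no longer yields a usable model
        models.append((conditions, acc))
        # Schedule one new run per combination of the features just used.
        for combo in power_set(f for f, _t in conditions):
            queue.append(excluded | frozenset(combo))
    return models


# Hypothetical expression values: three cases (True) and three controls (False).
samples = [
    {"GENE_A": 9, "GENE_B": 8, "GENE_C": 1},
    {"GENE_A": 8, "GENE_B": 7, "GENE_C": 5},
    {"GENE_A": 7, "GENE_B": 9, "GENE_C": 2},
    {"GENE_A": 1, "GENE_B": 2, "GENE_C": 4},
    {"GENE_A": 2, "GENE_B": 1, "GENE_C": 3},
    {"GENE_A": 3, "GENE_B": 3, "GENE_C": 6},
]
labels = [True, True, True, False, False, False]

models = camur_sketch(samples, labels)
for conditions, acc in models:
    print(conditions, acc)
```

On this toy data the sketch finds two alternative models, one based on GENE_A and one based on GENE_B, and stops when only GENE_C remains because no rule on it reaches the assumed accuracy threshold. A single-model classifier would have reported GENE_A alone.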
CamurWeb
CamurWeb offers a simple interface that lets the user run CAMUR to classify large amounts of data quickly, intuitively, and directly from the browser. With CamurWeb, there is no need to download the software or manage configurations.
Cancer Classifications with RNA-Seq
This section shows the results of the RNA-seq classification related to 21 different cancer samples. The experimental data were extracted from the GDC Portal, which contains clinical information, genomic characterization data, and high-level sequence analysis of the tumor genomes.
For each data set, we considered the FPKM values (Fragments Per Kilobase per Million mapped reads), which normalize the raw gene counts by the length of the gene and the total number of fragments.
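As a worked illustration of this normalization, the snippet below applies the standard FPKM definition: the fragment count divided by the gene length in kilobases and by the total number of mapped fragments in millions. The function name and the example numbers are hypothetical, and the exact variant used to produce the portal data may differ.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase of gene per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total fragments must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2,000 bp long, 10 million mapped fragments.
example_value = fpkm(500, 2_000, 10_000_000)
print(example_value)
```

With these numbers the gene is 2 kilobases long and the library holds 10 million fragments, so the value is 500 / (2 × 10) = 25 FPKM.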