“Translational Bioinformatics” collection for PLOS cBio

A review collection on current approaches in translational bioinformatics.

=======================================================


‘Translational Bioinformatics’ is a collection of PLOS Computational Biology Education articles which reads as a “book” to be used as a reference or tutorial for a graduate level introductory course on the science of translational bioinformatics.

Translational bioinformatics is an emerging field that addresses the current challenges of integrating increasingly voluminous amounts of molecular and clinical data. Its aim is to provide a better understanding of the molecular basis of disease, which in turn will inform clinical practice and ultimately improve human health.

The concept of a translational bioinformatics introductory book was originally conceived in 2009 by Jake Chen and Maricel Kann. Each chapter was crafted by leading experts who provide a solid introduction to the topics covered, complete with training exercises and answers. The rapid evolution of this field is expected to lead to updates and new chapters that will be incorporated into this collection.

Collection editors: Maricel Kann, Guest Editor, and Fran Lewitter, PLOS Computational Biology Education Editor.

Download the full Translational Bioinformatics collection here: PDF

Collection URL: www.ploscollections.org/translationalbioinformatics

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002796

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002826

Chapter 2: Data-Driven View of Disease Biology

Casey S. Greene, Olga G. Troyanskaya

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002816

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002805

Chapter 4: Protein Interactions and Disease

Mileidy W. Gonzalez, Maricel G. Kann

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002819

Chapter 5: Network Biology Approach to Complex Diseases

Dong-Yeon Cho, Yoo-Ah Kim, Teresa M. Przytycka

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002820

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002821

Chapter 7: Pharmacogenomics

Konrad J. Karczewski, Roxana Daneshjou, Russ B. Altman

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002817

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002858

Chapter 9: Analyses Using Disease Ontologies

Nigam H. Shah, Tyler Cole, Mark A. Musen

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002827

Chapter 10: Mining Genome-Wide Genetic Markers

Xiang Zhang, Shunping Huang, Zhaojun Zhang, Wei Wang

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002828

Chapter 11: Genome-Wide Association Studies

William S. Bush, Jason H. Moore

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002822

Chapter 12: Human Microbiome Analysis

Xochitl C. Morgan, Curtis Huttenhower

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002808

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002823

Chapter 14: Cancer Genome Analysis

Miguel Vazquez, Victor de la Torre, Alfonso Valencia

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002824

 

Use prior information for prognostic biomarkers or not?

In our recent publication in BMC Bioinformatics, we compared a large number of feature selection methods for finding prognostic biomarkers in six breast cancer gene expression datasets. No method showed a significant advantage in prediction accuracy, feature selection stability, or biological interpretability. This contradicts previous research results: current network-based approaches did not show much benefit in our analysis. Meanwhile, a group from the NKI reported similar results in PLoS ONE. The R code for the algorithms in our paper is available on request.

Prediction performance in terms of area under ROC curve (AUC)


FrSVM: A filter ranking feature selection algorithm

We use a simple filter feature selection algorithm, called FrSVM, which selects the top-ranked genes in a PPI network and then trains an L2-SVM on these top-ranked genes. FrSVM integrates protein-protein interaction (PPI) network information into the feature/gene selection step for prognostic biomarker discovery.

Because the L2-SVM cannot perform feature selection by itself, the ranking of genes is used as the feature selection step. Central genes often play an important role in biological processes, so we use GeneRank to select genes that are central in the network and show large differences in their expression.
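The ranking step can be illustrated with a small base-R sketch of a GeneRank-style score. It is not taken from FrSVM.R: the function name `gene_rank`, the toy network, and the toy expression scores are all assumptions made for illustration. The sketch solves the linear system (I - d W D^-1) r = (1 - d) ex, where W is the adjacency matrix, D holds the node degrees, ex is a non-negative per-gene expression score, and d is the damping factor.

```r
# GeneRank-style ranking on a toy network (illustrative sketch, not FrSVM.R).
# W:  symmetric 0/1 adjacency matrix of the network (hypothetical toy data)
# ex: per-gene score, e.g. absolute expression difference between classes
# d:  damping factor in [0, 1); d = 0 ranks by ex alone
gene_rank <- function(W, ex, d = 0.5) {
  stopifnot(nrow(W) == ncol(W), length(ex) == nrow(W),
            d >= 0, d < 1, any(ex != 0))
  ex <- abs(ex) / max(abs(ex))                   # scale scores to [0, 1]
  deg <- colSums(W)
  deg[deg == 0] <- 1                             # isolated genes: avoid division by zero
  A <- diag(nrow(W)) - d * W %*% diag(1 / deg)   # I - d * W * D^-1
  r <- solve(A, (1 - d) * ex)
  setNames(as.vector(r), rownames(W))
}

genes <- c("g1", "g2", "g3", "g4")
W <- matrix(0, 4, 4, dimnames = list(genes, genes))
W["g1", c("g2", "g3", "g4")] <- 1                # g1 is a hub connected to all others
W[c("g2", "g3", "g4"), "g1"] <- 1
ex <- c(g1 = 0.5, g2 = 1.0, g3 = 0.2, g4 = 0.2)

r <- gene_rank(W, ex, d = 0.5)
top <- names(sort(r, decreasing = TRUE))[1:2]    # keep the two top-ranked genes
```

In this toy example the hub g1 moves ahead of g2 even though g2 has the larger expression score, which is the effect the network prior is meant to have. The columns of the expression matrix named in `top` would then be passed on to the SVM.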

We applied FrSVM to several cancer datasets and observed significantly better prediction performance and higher signature stability. The related manuscript has been posted to arXiv, and the R code for FrSVM is available at:

Code: https://sites.google.com/site/yupengcun/software/frsvm

Paper: http://arxiv.org/abs/1212.3214

Any comments and questions on FrSVM are welcome. The following shows how to run the program:


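1.  Getting gene expression profiles (GEP) and the PPI network

##############################################
# Getting the GEP
#----------------------------------------------
library(GEOquery)
library(limma)     # normalizeBetweenArrays()
library(Biobase)   # exprs()
## getGEO() returns a list of ExpressionSets for a GSE accession; take the first one
a = getGEO("GSExxxxx", destdir="/home/YOURPATH/")[[1]]
## Normalize the GEP with limma; after transposing, samples are in rows
x = t(normalizeBetweenArrays(exprs(a), method="quantile"))
## Define your class labels, y, as a factor with two levels
## (placeholder: replace with one label per sample, in the row order of x)
y = factor("Two Class")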

 

##############################################
# Mapping probeset IDs to Entrez IDs,
# taking the hgu133a platform as an example
#----------------------------------------------
library('hgu133a.db')
mapped.probes <- mappedkeys(hgu133aENTREZID)
refseq <- as.list(hgu133aENTREZID[mapped.probes])
times <- sapply(refseq, length)
mapping <- data.frame(probesetID=rep(names(refseq), times=times), graphID=unlist(refseq), row.names=NULL, stringsAsFactors=FALSE)
mapping <- unique(mapping)

##############################################
# Summarize probesets to genes of x with limma
# ad.ppi: adjacency matrix of the PPI network
#----------------------------------------------
library(limma)   # avereps()
Gsub = ad.ppi
mapping <- mapping[mapping[,'probesetID'] %in% colnames(x),]
int <- intersect(rownames(Gsub), mapping[,"graphID"])
## xn.m: normalized expression matrix (assumed to be x from step 1)
## map2entrez: probeset -> Entrez ID lookup (assumed to be built from mapping)
xn.m = x
map2entrez = setNames(mapping$graphID, mapping$probesetID)

index = intersect(mapping[,'probesetID'], colnames(xn.m))
xn.m <- xn.m[,index]
## average all probesets that map to the same Entrez ID
ex.sum = t(avereps(t(xn.m), ID=map2entrez[index]))

int = intersect(int, colnames(ex.sum))
ex.sum = ex.sum[,int]         ## GEP matched to the PPI network
Gsub = Gsub[int,int]          ## PPI network matched to the GEP


2.  Run FrSVM program

##################################################
# You need to install the following packages to run FrSVM.R:
#    library(ROCR)
#    library(Matrix)
#    library(kernlab)
#
## If you want to run in parallel, you also need to load:
#    library(multicore)
#
## Here is an example of 5 times repeated 10-fold cross-validation
source("../FrSVM.R")
res <- frSVM.cv(x=ex.sum, y=y, folds=10, Gsub=Gsub, repeats=5, parallel=FALSE, cores=2, DEBUG=TRUE, d=0.5, top.uper=0.95, top.lower=0.9)
## the AUC values for the 5 x 10-fold CV
AUC= res$auc
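
For readers unfamiliar with the AUC reported above, the following base-R sketch shows one common way to compute it from classifier scores, using the rank-sum (Mann-Whitney) identity. It is independent of FrSVM.R and of ROCR: the function name `auc_rank` and the toy scores and labels are assumptions for illustration, and nothing here describes how `frSVM.cv` computes or structures `res$auc` internally.

```r
# AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks.
# scores: numeric decision values, larger = more likely positive
# labels: two-class labels; the second factor level is treated as positive
auc_rank <- function(scores, labels) {
  labels <- factor(labels)
  stopifnot(nlevels(labels) == 2, length(scores) == length(labels))
  pos <- labels == levels(labels)[2]
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical toy data: three "poor" and three "good" prognosis samples
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c("poor", "poor", "poor", "good", "good", "good")
auc <- auc_rank(scores, labels)   # 8 of 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation of the two classes.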

 

Current approaches to finding biomarkers by means of machine learning

Finding robust biomarkers in genomic data is a first step towards personalized medicine. Here we take a brief look at how machine learning is used to find biomarkers and at current approaches in this area. For more details, please see the following paper.

Biomarker Gene Signature Discovery Integrating Network Knowledge

Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany
Abstract: Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.