“Translational Bioinformatics” collection for PLOS Computational Biology

A collection of reviews on current approaches in translational bioinformatics.

=======================================================


‘Translational Bioinformatics’ is a collection of PLOS Computational Biology Education articles that reads as a “book” to be used as a reference or tutorial for a graduate-level introductory course on the science of translational bioinformatics.

Translational bioinformatics is an emerging field that addresses the current challenges of integrating increasingly voluminous amounts of molecular and clinical data. Its aim is to provide a better understanding of the molecular basis of disease, which in turn will inform clinical practice and ultimately improve human health.

The concept of a translational bioinformatics introductory book was originally conceived in 2009 by Jake Chen and Maricel Kann. Each chapter was crafted by leading experts who provide a solid introduction to the topics covered, complete with training exercises and answers. The rapid evolution of this field is expected to lead to updates and new chapters that will be incorporated into this collection.

Collection editors: Maricel Kann, Guest Editor, and Fran Lewitter, PLOS Computational Biology Education Editor.

Download the full Translational Bioinformatics collection here: PDF

Collection URL: www.ploscollections.org/translationalbioinformatics

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002796

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002826

Chapter 2: Data-Driven View of Disease Biology

Casey S. Greene, Olga G. Troyanskaya

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002816

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002805

Chapter 4: Protein Interactions and Disease

Mileidy W. Gonzalez, Maricel G. Kann

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002819

Chapter 5: Network Biology Approach to Complex Diseases

Dong-Yeon Cho, Yoo-Ah Kim, Teresa M. Przytycka

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002820

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002821

Chapter 7: Pharmacogenomics

Konrad J. Karczewski, Roxana Daneshjou, Russ B. Altman

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002817

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002858

Chapter 9: Analyses Using Disease Ontologies

Nigam H. Shah, Tyler Cole, Mark A. Musen

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002827

Chapter 10: Mining Genome-Wide Genetic Markers

Xiang Zhang, Shunping Huang, Zhaojun Zhang, Wei Wang

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002828

Chapter 11: Genome-Wide Association Studies

William S. Bush, Jason H. Moore

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002822

Chapter 12: Human Microbiome Analysis

Xochitl C. Morgan, Curtis Huttenhower

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002808

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002823

Chapter 14: Cancer Genome Analysis

Miguel Vazquez, Victor de la Torre, Alfonso Valencia

PLOS Computational Biology: published 27 Dec 2012 | info:doi/10.1371/journal.pcbi.1002824

 

Use prior information to find prognostic biomarkers, or not?

In our recent publication in BMC Bioinformatics, we compared a large number of feature selection methods for finding prognostic biomarkers in six breast cancer gene expression datasets. No method showed a significant advantage in prediction accuracy, feature selection stability, or biological interpretability. This runs against previous research results: current network-based approaches did not show much benefit in our analysis. Meanwhile, a group from the NKI reported similar results in PLoS One. The R code for the algorithms in our paper is available on request.

[Figure: Prediction performance in terms of area under the ROC curve (AUC)]
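The AUC in the figure equals the probability that a randomly chosen positive sample gets a higher predicted score than a randomly chosen negative one. Ties count as one half. The base-R sketch below computes it from ranks, using the Mann-Whitney formulation. The helper name `auc_rank` and the toy scores are hypothetical, and this is not the code used in our paper. Packages such as ROCR report the same quantity.

```r
# Hypothetical helper: AUC via the Mann-Whitney U statistic.
# scores: numeric predictions; labels: logical or 0/1 (TRUE/1 = positive class)
auc_rank <- function(scores, labels) {
  labels <- as.logical(labels)
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  if (n_pos == 0 || n_neg == 0) stop("both classes must be present")
  r <- rank(scores)                       # average ranks, so ties count as 1/2
  u <- sum(r[labels]) - n_pos * (n_pos + 1) / 2
  u / (n_pos * n_neg)
}

# Toy example: 8 of the 9 positive/negative pairs are ordered correctly
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(1, 1, 1, 0, 0, 0)
auc_rank(scores, labels)                  # 8/9, about 0.889
```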


Scientific B-sides

Welcome back! The last post discussed rules 1-3: the importance of doing a postdoc, a concise CV, and a unique research statement. Like the last post, this one is inspired by a Career Development Workshop at ISMB 2012 that I contributed to (download the slides).

There is still one thing missing from a standard application pack:

4. Pretend you care! The teaching statement

Along with the CV and research statement, some places ask you to submit a teaching statement. So write one. But don’t be fooled: it’s pretty low on the priority list (for the hiring committee, even if maybe not for you). Academic employers want three things from you: money, papers, and … then nothing for a long while … teaching. I’m not saying they won’t ask you to teach for many hours a week, but when it comes to evaluating you, it’s money and papers (in that order) which…


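Impressions of the Dagstuhl Seminar

Dagstuhl (达堡, as I call it for short) is Schloss Dagstuhl, a small castle in Saarland, Germany. It is also the home of the Leibniz Center for Informatics, a German research center for computer science. The Dagstuhl Seminars are among the top seminars in computer science. Dagstuhl is modeled on the Mathematical Research Institute of Oberwolfach and strives to provide a platform where scholars can exchange ideas and inspire one another. Dagstuhl's mission:

Schloss Dagstuhl – Leibniz Center for Informatics (German: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH) is the world’s premier venue for informatics. World-class scientists, promising young researchers and practitioners come together to exchange their knowledge and to discuss their research findings.

How Dagstuhl seminars are organized
Participation is by invitation only; you cannot apply directly. The number of participants is usually limited to about 30 so that everyone can interact fully with everyone else. The seminar also has no fixed program. On the morning of the first day, the participants sit in a circle and discuss the agenda for the following days.

Life at Dagstuhl
For historical reasons, Dagstuhl is not easy to reach, and getting there takes several transfers. This remote location also helps participants devote themselves fully to the seminar, since they cannot skip sessions to go sightseeing. For a seminar of about five days, the small castle provides everything you need, including a billiards room and a café. Red and white wines, soft drinks, and coffee are set out right next to the rooms. You simply take what you want to drink, sign your own tab, and settle the bill when you check out. A bottle of good Riesling costs only €9.50 at Dagstuhl, which is quite cheap. At €40 a day including room and board, a Dagstuhl seminar is remarkably inexpensive.

In this fast-changing, Sturm-und-Drang era, it is surely better to pause now and then, go to the countryside with colleagues, talk some science, and work up a sweat. That beats jetting from one conference hall and banquet table to the next, seizing every moment amid clinking glasses. I hope that we, too, can have something like the Oberwolfach mathematics institute or the Dagstuhl Seminars, where masters, young students, and scholars can come together and share the joy of discovery.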

http://en.wikipedia.org/wiki/Dagstuhl
http://en.wikipedia.org/wiki/Mathematical_Research_Institute_of_Oberwolfach

Scientific B-sides

Starting your own group is one of the most important steps in your scientific career — and one of the hardest.

Being invited to a Career Development Workshop at ISMB 2012 made me write down some of the advice that I had got when I was on the job market a few years ago (and even put some of it on slides).

In a diverse and interdisciplinary field like computational biology, it is quite hard to come up with general rules that fit everyone. This is why I went down the self-indulgent route and revisited the CV and research statement I had prepared 4 years ago. (You’ll find a copy in the slides.) Some things are OK, and some things I would improve now; I’ll comment on this later. Let’s start with the basics:


Computational courses in PLoS Computational Biology

Short introductory papers on different areas of computational biology.

Fran Lewitter, Welcome to PLoS Computational Biology “Education”

Kenzie D MacIsaac, Ernest Fraenkel, Practical Strategies for Discovering Regulatory DNA Sequence Motifs, April 2006

Duncan Brown, Kimmen Sjölander, Functional Classification Using Phylogenomic Inference, June 2006

Philip E Bourne, Johanna McEntyre, Biocurators: Contributors to the World of Science, October 2006

FrSVM: A filter ranking feature selection algorithm

We use a simple filter feature selection algorithm called FrSVM. It selects the top-ranked genes in a protein-protein interaction (PPI) network and then trains an L2-SVM on these top-ranked genes. In this way, FrSVM integrates PPI network information into feature (gene) selection for prognostic biomarker discovery.

Because the L2-SVM cannot perform feature selection by itself, the gene ranking serves as the feature selection step. Central genes tend to play important roles in biological processes. We therefore use GeneRank, which combines network centrality with expression changes, to select genes that are central in the network and also show large differences in expression.
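GeneRank adapts Google's PageRank to gene networks. Each gene's rank mixes its own expression-change score with the ranks of its network neighbors:

r = (1 - d) * ex + d * W^T D^-1 r

Here W is the adjacency matrix, D is the diagonal degree matrix, and d is a damping factor. The minimal base-R sketch below solves this linear system directly and then keeps the top k genes. It is not the FrSVM.R code. The helper names `generank` and `top_genes` are our assumptions for illustration, and so is the use of the absolute difference of class means as the expression-change score.

```r
# Minimal GeneRank sketch in base R (not the FrSVM.R implementation).
# W:  symmetric 0/1 adjacency matrix of the PPI network, genes x genes
# ex: non-negative expression-change score per gene, in the row order of W
# d:  damping factor in [0, 1); d = 0 ranks genes by ex alone
generank <- function(W, ex, d = 0.5) {
  n <- nrow(W)
  stopifnot(ncol(W) == n, length(ex) == n, d >= 0, d < 1)
  deg <- rowSums(W)
  deg[deg == 0] <- 1                    # isolated genes: avoid division by zero
  # Solve (I - d * t(W) %*% D^-1) r = (1 - d) * ex for the rank vector r
  A <- diag(n) - d * t(W) %*% diag(1 / deg, nrow = n)
  r <- solve(A, (1 - d) * as.numeric(ex))
  setNames(as.vector(r), rownames(W))
}

# Hypothetical helper: score genes, rank them with GeneRank, keep the top k.
# x: samples x genes matrix; y: two-level factor; W: adjacency over the same genes.
# The score |mean(class 1) - mean(class 2)| is an assumption for illustration.
top_genes <- function(x, y, W, k, d = 0.5) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2, setequal(colnames(x), rownames(W)))
  ex <- abs(colMeans(x[y == levels(y)[1], , drop = FALSE]) -
            colMeans(x[y == levels(y)[2], , drop = FALSE]))
  r <- generank(W, ex[rownames(W)], d)
  names(sort(r, decreasing = TRUE))[seq_len(min(k, length(r)))]
}

# Toy network: gene A is a hub connected to B, C and D
genes <- c("A", "B", "C", "D")
W <- matrix(0, 4, 4, dimnames = list(genes, genes))
W["A", c("B", "C", "D")] <- 1
W[c("B", "C", "D"), "A"] <- 1
round(generank(W, ex = rep(1, 4), d = 0.5), 3)
# A 1.667, B/C/D 0.778: with equal scores, the hub ranks highest

x_toy <- matrix(0, nrow = 6, ncol = 4, dimnames = list(NULL, genes))
x_toy[4:6, "B"] <- 3                  # only gene B differs between the classes
y_toy <- factor(rep(c("good", "poor"), each = 3))
top_genes(x_toy, y_toy, W, k = 2)     # "B" "A": B itself, then its hub neighbor A
```

The sketch uses a direct `solve()` for clarity. For a full genome-scale PPI network, a sparse matrix or a few power iterations would be more economical.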

We applied FrSVM to several cancer datasets and observed significantly better prediction performance and higher signature stability. The related manuscript has been posted to arXiv, and the R code for FrSVM is available at:

Codes: https://sites.google.com/site/yupengcun/software/frsvm

Papers: http://arxiv.org/abs/1212.3214
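Signature stability can be measured in several ways. One simple option is the mean pairwise Jaccard index between the gene sets selected in different cross-validation folds. We assume it here purely for illustration; it is not necessarily the measure used in the manuscript. A value of 1 means every fold picked the same genes, and 0 means no overlap at all. `jaccard` and `signature_stability` are hypothetical helper names.

```r
# Jaccard index of two gene sets: |intersection| / |union|
jaccard <- function(a, b) {
  u <- union(a, b)
  if (length(u) == 0) return(1)          # two empty signatures are identical
  length(intersect(a, b)) / length(u)
}

# Mean Jaccard index over all pairs of signatures (e.g. one per CV fold)
signature_stability <- function(signatures) {
  k <- length(signatures)
  stopifnot(k >= 2)
  pairs <- combn(k, 2)                   # 2 x choose(k, 2) matrix of index pairs
  mean(apply(pairs, 2, function(p) jaccard(signatures[[p[1]]], signatures[[p[2]]])))
}

sigs <- list(c("TP53", "BRCA1", "ESR1"),
             c("TP53", "BRCA1", "ERBB2"),
             c("TP53", "ESR1", "MKI67"))
signature_stability(sigs)                # (0.5 + 0.5 + 0.2) / 3 = 0.4
```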

Any comments and questions on FrSVM are welcome. The following shows how to run the program:


1. Get the gene expression profiles (GEP) and the PPI network.

##############################################
# Getting the GEP
#-----------------------------------------------
library(GEOquery)
library(limma)
## getGEO() returns a list of ExpressionSets for a GSE accession; take the first
a <- getGEO("GSExxxxx", destdir = "/home/YOURPATH/")[[1]]
## Quantile-normalize the GEP with limma; x is samples x probes
x <- t(normalizeBetweenArrays(exprs(a), method = "quantile"))
## Define your class labels, y, as a two-level factor (one label per sample)
y <- factor(class.labels)   # class.labels: placeholder for your own outcome vector

##############################################
# Mapping probe set IDs to Entrez IDs
# (using the hgu133a platform as an example)
#-----------------------------------------------
library(hgu133a.db)
mapped.probes <- mappedkeys(hgu133aENTREZID)
entrez <- as.list(hgu133aENTREZID[mapped.probes])
times <- sapply(entrez, length)
mapping <- data.frame(probesetID = rep(names(entrez), times = times),
                      graphID = unlist(entrez),
                      row.names = NULL, stringsAsFactors = FALSE)
mapping <- unique(mapping)

##############################################
# Summarizing the probe sets of x to genes with limma
# ad.ppi: adjacency matrix of the PPI network, with rows and
#         columns named by Entrez ID
#-----------------------------------------------
Gsub <- ad.ppi
## keep probes that are in x and whose gene is in the network
mapping <- mapping[mapping$probesetID %in% colnames(x) &
                   mapping$graphID %in% rownames(Gsub), ]
## one column per (probe, gene) pair; average the probes of each gene
xn.m <- x[, mapping$probesetID]
ex.sum <- t(avereps(t(xn.m), ID = mapping$graphID))

int <- intersect(rownames(Gsub), colnames(ex.sum))
ex.sum <- ex.sum[, int]   ## GEP matched to the PPI network
Gsub <- Gsub[int, int]    ## PPI network matched to the GEP
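Step 1 has two core operations: quantile normalization and averaging the probes that map to the same gene. Both can be illustrated on toy data in base R, as in the hypothetical sketch below. The names `quantile_normalize` and `average_by_gene` are made up for illustration, and ties are handled more simply than in limma. In practice, use `normalizeBetweenArrays()` and `avereps()` as in the code above.

```r
# Quantile normalization of a probes x arrays matrix: every column is given
# the same distribution (the row means of the column-wise sorted values).
# Ties are broken by position here, which is a simplification.
quantile_normalize <- function(m) {
  ranks <- apply(m, 2, rank, ties.method = "first")
  ref <- rowMeans(apply(m, 2, sort))
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(m)
  out
}

# Collapse a samples x probes matrix to samples x genes by averaging all
# probe columns that share a gene ID.
average_by_gene <- function(x, ids) {
  stopifnot(ncol(x) == length(ids))
  genes <- unique(ids)
  vals <- vapply(genes, function(g) rowMeans(x[, ids == g, drop = FALSE]),
                 numeric(nrow(x)))
  matrix(vals, nrow = nrow(x), dimnames = list(rownames(x), genes))
}

m_toy <- matrix(c(5, 2, 3, 4, 1, 6), nrow = 3,
                dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
qn_toy <- quantile_normalize(m_toy)
qn_toy        # both columns now contain the values 1.5, 3.5, 5.5

x_toy <- matrix(1:6, nrow = 2,
                dimnames = list(c("s1", "s2"), c("p1", "p2", "p3")))
average_by_gene(x_toy, ids = c("7157", "7157", "672"))
# gene "7157" is the mean of p1 and p2; gene "672" is p3 alone
```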


2. Run the FrSVM program

##################################################
# The following packages must be installed to run FrSVM.R:
#    library(ROCR)
#    library(Matrix)
#    library(kernlab)
#
## To run in parallel, you also need to load:
#    library(multicore)
#
## Example: 5 repeats of 10-fold cross-validation
source("../FrSVM.R")
res <- frSVM.cv(x = ex.sum, y = y, folds = 10, Gsub = Gsub, repeats = 5,
                parallel = FALSE, cores = 2, DEBUG = TRUE,
                d = 0.5, top.uper = 0.95, top.lower = 0.9)
## AUC values from the 5 x 10-fold CV
AUC <- res$auc
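For readers new to repeated cross-validation, the sketch below shows one common way to build the fold assignments for "5 times 10-fold" CV. The folds are stratified so that each one keeps roughly the class proportions of the full data set. This is an assumption-based illustration, not the logic inside FrSVM.R. `stratified_folds`, `repeated_folds` and the toy labels are hypothetical.

```r
# Assign each sample to one of k folds, stratified by class.
stratified_folds <- function(y, k) {
  y <- factor(y)
  folds <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    # spread this class evenly over folds 1..k, then shuffle the assignment
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

# One fold vector per repeat, e.g. 5 repeats of 10-fold CV.
# Note: set.seed() changes the global random number state.
repeated_folds <- function(y, k = 10, repeats = 5, seed = 1) {
  set.seed(seed)
  lapply(seq_len(repeats), function(i) stratified_folds(y, k))
}

y_toy <- factor(rep(c("good", "poor"), times = c(30, 20)))
cv_toy <- repeated_folds(y_toy, k = 10, repeats = 5)
length(cv_toy)               # 5 fold vectors, one per repeat
table(cv_toy[[1]], y_toy)    # every fold holds 3 "good" and 2 "poor" samples
```

Each repeat yields 10 train/test splits, so 5 repeats give 50 held-out evaluations in total.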