Scientific B-sides

Welcome back! The last post discussed rules 1-3: the importance of doing a postdoc, a concise CV and a unique research statement. Like the last post, this one is inspired by a Career Development Workshop at ISMB 2012 that I contributed to (download the slides).

There is still one thing missing from a standard application pack:

4. Pretend you care! The teaching statement

Together with the CV and research statement, some places ask you to submit a teaching statement. So write one. But don’t be fooled, it’s pretty low on the priority list (for the hiring committee, even if maybe not for you). Academic employers want three things from you: money, papers, and … long time nothing … teaching. I’m not saying they won’t ask you to teach for many hours a week, but when it comes to you being evaluated it’s money and papers (in that order) which…


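Impressions of a Dagstuhl Seminar

“达堡” is my Chinese nickname for Schloss Dagstuhl, a small castle in Saarland, Germany, which is also home to the Leibniz Center for Informatics, a German research institution for computer science. The Dagstuhl Seminars are among the top workshops in informatics. Modeled on the Oberwolfach mathematical research institute, they strive to provide researchers with a place to exchange ideas and inspire one another. Dagstuhl’s mission statement:

Schloss Dagstuhl – Leibniz Center for Informatics (German: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH) is the world’s premier venue for informatics. World-class scientists, promising young researchers and practitioners come together to exchange their knowledge and to discuss their research findings.

How Dagstuhl seminars are organized
Participation is by invitation only; you cannot apply directly. The number of participants is usually limited to about 30 so that everyone can talk properly with everyone else. There is also no fixed program: on the morning of the first day, the participants sit in a circle and work out the schedule for the following days together.

Life at Dagstuhl
For historical reasons, Dagstuhl is not easy to reach; you have to change trains and buses several times to get there. This location also helps participants concentrate fully on the seminar, since they cannot skip sessions to go sightseeing. For a seminar of about five days, the small castle provides everything you need, plus plenty of entertainment: a billiard room, a café, and red and white wines, soft drinks and coffee placed right next to the rooms. You simply take what you would like to drink, sign for it on your personal tab, and settle the bill when you check out. A good bottle of Riesling costs only 9.50 euros at Dagstuhl, which is quite cheap. At 40 euros a day including room and board, a Dagstuhl seminar is very inexpensive.

In this fast-moving, Sturm-und-Drang age, it is sometimes better to stop, go out to the countryside with colleagues, talk a little science and work up a bit of a sweat, than to race against the clock on discount flights between conference venues and banquet tables, raising toast after toast. I hope that we, too, will one day have something like the Oberwolfach mathematics institute or the Dagstuhl Seminars, where masters, young students and researchers can sit down together and share the joy of discovery.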

http://en.wikipedia.org/wiki/Dagstuhl
http://en.wikipedia.org/wiki/Mathematical_Research_Institute_of_Oberwolfach

Scientific B-sides

Starting your own group is one of the most important steps in your scientific career — and one of the hardest.

Being invited to a Career Development Workshop at ISMB 2012 made me write down some of the advice I received when I was on the job market a few years ago (and even put some of it on slides).

In a diverse and interdisciplinary field like computational biology it is quite hard to come up with general rules that fit everyone. This is why I went down the self-indulgent route and revisited the CV and research statement I had prepared 4 years ago. (You’ll find a copy in the slides.) Some things are OK, some things I would improve now; you will see, I’ll comment on this later. Let’s start with the basics:


Computational courses in PLoS Computational Biology

Short introductory papers on different areas of computational biology.

Fran Lewitter, Welcome to PLoS Computational Biology “Education”

FrSVM: A filter ranking feature selection algorithm

We use a simple filter feature selection algorithm, called FrSVM, which selects the top-ranked genes in a PPI network and then trains an L2-SVM on these top-ranked genes. FrSVM integrates protein-protein interaction (PPI) network information into the feature/gene selection algorithm for prognostic biomarker discovery.

Because an L2-SVM cannot perform feature selection by itself, the ranking of genes is used as the feature selection step. Genes that are central in the network often play an important role in biological processes, so we use GeneRank to select genes that are central in the network and show large differences in their expression.
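GeneRank (Morrison et al., 2005) ranks genes with a PageRank-style random walk on the network that is biased towards genes with large expression changes. Assuming FrSVM uses the standard GeneRank formulation, and that the `d` argument of `frSVM.cv` is its damping factor (the post does not state this explicitly), the score vector $\mathbf{r}$ satisfies:

```latex
\mathbf{r} = (1-d)\,\mathbf{ex} + d\,W^{\top}D^{-1}\mathbf{r},
\qquad D = \operatorname{diag}(\deg_1,\dots,\deg_n),\quad \deg_i = \sum_j w_{ij}

\Longleftrightarrow\quad \bigl(I - d\,W^{\top}D^{-1}\bigr)\,\mathbf{r} = (1-d)\,\mathbf{ex}
```

Here $W$ is the nonnegative adjacency matrix of the PPI network (`Gsub` in the code), $\mathbf{ex}$ holds the absolute expression changes scaled to $[0,1]$ (the scaling does not change the ranking, because the system is linear in $\mathbf{ex}$), and isolated genes are given degree 1 to avoid division by zero. With $d = 0$ the ranking reduces to ranking by expression change alone; as $d$ approaches 1, network connectivity dominates. For $0 \le d < 1$ the system has a unique solution, because every column of $W^{\top}D^{-1}$ sums to at most 1.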

We applied FrSVM to several cancer datasets, where it showed significantly better prediction performance and higher signature stability. The related manuscript is available on arXiv, and the R code for FrSVM is available at:

Codes: https://sites.google.com/site/yupengcun/software/frsvm

Papers: http://arxiv.org/abs/1212.3214

Any comments and questions on FrSVM are welcome. Here is how to run the program:


1. Getting the gene expression profiles (GEP) and the PPI network

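##############################################
# Getting the GEP
#----------------------------------------------
library(GEOquery)
library(Biobase)   # exprs()
library(limma)     # normalizeBetweenArrays()
## getGEO() returns a list of ExpressionSets for a GSE accession; take the first platform
a <- getGEO("GSExxxxx", destdir = "/home/YOURPATH/")[[1]]
## quantile-normalize the GEP with limma; x has samples in rows and probe sets in columns
x <- t(normalizeBetweenArrays(exprs(a), method = "quantile"))
## define your class labels y as a two-level factor with one entry per sample (row of x);
## YOUR_LABELS is a placeholder for your own labels, e.g. relapse vs. no relapse
y <- factor(YOUR_LABELS)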
 

##############################################
# Mapping probe set IDs to Entrez IDs
# (the hgu133a platform is taken as an example)
#----------------------------------------------
library(hgu133a.db)
mapped.probes <- mappedkeys(hgu133aENTREZID)
refseq <- as.list(hgu133aENTREZID[mapped.probes])
times <- sapply(refseq, length)
mapping <- data.frame(probesetID = rep(names(refseq), times = times), graphID = unlist(refseq), row.names = NULL, stringsAsFactors = FALSE)
mapping <- unique(mapping)
## named lookup vector: probe set ID -> Entrez ID
## (a probe set with several Entrez IDs resolves to the first one when indexed by name)
map2entrez <- setNames(mapping$graphID, mapping$probesetID)

##############################################
# Summarizing the probe sets of x to genes with limma
# ad.ppi: adjacency matrix of the PPI network, rows and columns named by Entrez ID
#----------------------------------------------
library(limma)     # avereps()
Gsub <- ad.ppi
mapping <- mapping[mapping[, "probesetID"] %in% colnames(x), ]
int <- intersect(rownames(Gsub), mapping[, "graphID"])
index <- intersect(mapping[, "probesetID"], colnames(x))
xn.m <- x[, index]
colnames(xn.m) <- map2entrez[index]
## average the probe sets that map to the same Entrez gene
ex.sum <- t(avereps(t(xn.m), ID = map2entrez[index]))

int <- intersect(int, colnames(ex.sum))
ex.sum <- ex.sum[, int]         ## GEP matched to the PPI network
Gsub <- Gsub[int, int]          ## PPI network matched to the GEP


2. Running the FrSVM program

##################################################
# The FrSVM.R program needs the following packages to be installed:
#    library(ROCR)
#    library(Matrix)
#    library(kernlab)
#
## To run in parallel, you also need to load:
#    library(multicore)
#
## Example: 10-fold cross-validation repeated 5 times
source("../FrSVM.R")
res <- frSVM.cv(x = ex.sum, y = y, folds = 10, Gsub = Gsub, repeats = 5, parallel = FALSE, cores = 2, DEBUG = TRUE, d = 0.5, top.uper = 0.95, top.lower = 0.9)
## AUC values of the 5 x 10-fold CV
AUC <- res$auc
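The post does not show what `frSVM.cv` does internally. As a rough illustration of the GeneRank ranking step only, the base-R sketch below computes GeneRank scores for a toy four-gene network by solving the linear system directly instead of iterating. The function `generank_sketch`, the toy network and the expression values are hypothetical and are not taken from FrSVM.R.

```r
## Hypothetical helper for illustration only; it is NOT taken from FrSVM.R.
## Computes GeneRank scores by solving (I - d * t(W) %*% D^-1) r = (1 - d) * ex
generank_sketch <- function(W, ex, d = 0.5) {
  W <- as.matrix(W)                        # dense copy of the adjacency matrix
  n <- nrow(W)
  deg <- rowSums(W)
  deg[deg == 0] <- 1                       # isolated genes get degree 1 (no division by zero)
  ex <- abs(ex) / max(abs(ex))             # scale absolute expression change to [0, 1]
  A <- diag(n) - d * t(W) %*% diag(1 / deg, nrow = n)
  r <- solve(A, (1 - d) * ex)              # direct solve instead of power iteration
  setNames(as.vector(r), rownames(W))
}

## Toy example: a 4-gene path network a - b - c - d
genes <- c("a", "b", "c", "d")
W <- matrix(0, 4, 4, dimnames = list(genes, genes))
W["a", "b"] <- W["b", "a"] <- 1
W["b", "c"] <- W["c", "b"] <- 1
W["c", "d"] <- W["d", "c"] <- 1
ex <- c(a = 2, b = 0.5, c = 0.5, d = 0.1)  # made-up expression changes
r <- generank_sketch(W, ex, d = 0.5)
names(sort(r, decreasing = TRUE))          # genes ordered by GeneRank score
```

On real data one could pass `Gsub` together with a per-gene expression-change score for the columns of `ex.sum` (for example an absolute t-statistic between the two classes, which is again an assumption) and keep the top-ranked genes for the SVM. For large networks, a sparse solver or power iteration would be preferable to a dense `solve()`.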

 

Current approaches to finding biomarkers by means of machine learning

Finding robust biomarkers in genomics data is the first step towards personalized medicine. Here we give a short review of how machine learning is used to find biomarkers and of current approaches in this area. For more on these techniques, please see the following papers.

Biomarker Gene Signature Discovery Integrating Network Knowledge

Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany
Abstract: Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.

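Drifting downstream, finishing the dream

In the last month of 2011, moving house and the revision of my paper all got tangled up together, and I was run off my feet. It wasn’t perfect, but both got done.

A summary of what I did this year:

Personalized medicine was a hot topic this year. How to find useful markers in gene chip data with more than 100,000 features is a very interesting problem, because over-treatment is a serious problem for cancer patients: many of them die not of the cancer itself, but because, after aggressive treatment, their weakened bodies can hardly fight off other infections. So if we can find gene markers for cancer relapse, we can give clinicians advice on treatment and reduce the mortality of cancer patients. Perhaps one day a genetic test will be as convenient and simple as an ordinary blood test.

Based on this idea, we developed a method that integrates the protein-protein interaction (PPI) network as prior information into the microarray expression data (mRNA) of cancer patients in order to find marker genes. After trying countless support vector machine (SVM) classifiers, we arrived at a result that was somewhat disappointing but also a little pleasing: simply integrating PPI and mRNA data hardly improves the prediction accuracy of markers, whereas the studies in this area from 2007 until now all report that integrating PPI into mRNA data significantly improves marker prediction accuracy! We then compared our method with 14 other widely recognized classification methods on six commonly used breast cancer datasets, and the results supported our new finding.

Then came organizing our ideas, drawing figures and writing the paper. In August we finally finished the manuscript and submitted it to BMC Bioinformatics. Along the way I experienced my German supervisor’s demand for precision in every detail of a paper; the manuscript went through five rounds of revision before it was finalized. Two months later the comments of the two reviewers came back. Although the reviews were very positive, each reviewer raised seven points. After two months of intensive computation I completed the revision and resubmitted it at the beginning of December. After four intense months, both the first draft and the revision are done, and, as my supervisor wrote in his last email when I resubmitted: “Good luck to your ms, I hold my thumbs!” BMC always advertises fast review and fast publication, but from my experience that is just empty talk.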
