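Impressions of a Dagstuhl Seminar

"Dagstuhl" here is my shorthand for Schloss Dagstuhl, a small castle in the German state of Saarland that is also home to the Leibniz Center for Informatics, Germany's center for informatics research. The Dagstuhl Seminars are among the top seminars in informatics. Modeled on the Oberwolfach mathematical research institute, they strive to give scholars a platform for exchanging ideas and inspiring one another. Dagstuhl's mission: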

Schloss Dagstuhl – Leibniz Center for Informatics (German: Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH) is the world’s premier venue for informatics. World-class scientists, promising young researchers and practitioners come together to exchange their knowledge and to discuss their research findings.

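How Dagstuhl Seminars are organized
The seminar is by invitation only and does not accept direct applications. The number of participants is usually limited to around 30, so that everyone can talk at length with everyone else. The seminar also has no fixed program: on the first morning, participants sit in a circle and discuss the schedule for the following days.

Life at Dagstuhl
For historical reasons, Dagstuhl is not easy to reach, and you need to change several times on the way there. The location also gives participants a good opportunity to devote themselves fully to the meeting, since nobody can skip sessions to go sightseeing. For the five or so days of the meeting, the small castle provides everything one needs, along with entertainment such as a billiard room and a café. Red and white wines, soft drinks and coffee are set out next to the rooms; you take whatever you want to drink, sign for it on your own tab, and settle the bill at check-out. A good bottle of Riesling costs only 9.50 euros at Dagstuhl, which is quite cheap. At 40 euros a day including room and board, the seminar fee is also very reasonable.

In this fast-changing, Sturm und Drang era, it is surely better to stop now and then, go to the countryside, talk a little science with colleagues and work up a sweat, than to fly around on discount tickets, seizing every minute and clinking glasses at one conference venue and banquet table after another. I hope that we too can have meetings like those at the Oberwolfach mathematics center and the Dagstuhl Seminars, where masters, young students and scholars sit together and share the joy of their discoveries.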

http://en.wikipedia.org/wiki/Dagstuhl
http://en.wikipedia.org/wiki/Mathematical_Research_Institute_of_Oberwolfach

Scientific B-sides

Starting your own group is one of the most important steps in your scientific career — and one of the hardest.

Being invited to a Career Development Workshop at ISMB 2012 made me write down some of the advice that I had got when I was on the job market a few years ago (and even put some of it on slides).

In a diverse and interdisciplinary field like computational biology it is quite hard to come up with general rules that fit everyone. This is why I went down the self-indulgent route and revisited the CV and research statement I had prepared 4 years ago. (You’ll find a copy in the slides.) Some things are OK, some things I would improve now; you will see, I’ll comment on this later. Let’s start with the basics:


Computational courses in PLoS Computational Biology

Short introductory papers on different areas of computational biology.

Fran Lewitter, Welcome to PLoS Computational Biology “Education”

FrSVM: A filter ranking feature selection algorithm

We use a simple filter feature selection algorithm, called FrSVM, which selects the top-ranked genes in a PPI network and then trains an L2-SVM on these top-ranked genes. FrSVM integrates protein-protein interaction (PPI) network information into feature/gene selection for prognostic biomarker discovery.

Because an L2-SVM does not perform feature selection by itself, the ranking of genes serves as the feature selection step. Central genes often play an important role in biological processes, so we use GeneRank to select genes that are central in the network and also show large differences in their expression.
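As a rough illustration of the ranking idea, here is a minimal base R sketch of a GeneRank-style score on a toy network. It is not the code in FrSVM.R: the function name `gene_rank`, the gene names `g1` to `g4` and the toy numbers are made up for this example. It assumes the usual GeneRank formulation, in which the rank vector r solves (I - d W D^-1) r = (1 - d) ex, where W is the adjacency matrix, D holds the node degrees and ex is a non-negative measure of expression change.

```r
# GeneRank-style ranking on a toy PPI network (illustrative sketch only).
# W : symmetric 0/1 adjacency matrix of the network
# ex: non-negative expression-change score per gene (e.g. |log fold change|)
# d : weight of the network information (d = 0 ranks by expression alone)
gene_rank <- function(W, ex, d = 0.5) {
  deg <- colSums(W)
  deg[deg == 0] <- 1                        # isolated genes: avoid division by zero
  A <- diag(nrow(W)) - d * t(t(W) / deg)    # I - d * W %*% D^{-1}
  r <- solve(A, (1 - d) * ex)
  names(r) <- rownames(W)
  r
}

# Hypothetical toy network: g1 is a hub connected to g2, g3 and g4.
genes <- c("g1", "g2", "g3", "g4")
W <- matrix(0, 4, 4, dimnames = list(genes, genes))
W["g1", c("g2", "g3", "g4")] <- 1
W[c("g2", "g3", "g4"), "g1"] <- 1

ex <- c(1, 2, 0.5, 0.5)                     # g2 has the largest expression change
r <- gene_rank(W, ex, d = 0.5)

# Genes ordered by score; a filter would keep the top fraction of this list.
ranked <- names(sort(r, decreasing = TRUE))
print(ranked)
```

In this toy case the hub g1 ends up ahead of g2, although g2 has the larger expression change on its own, which is the kind of behaviour a network-based filter is after.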

We applied FrSVM to several cancer datasets, where it showed significantly better prediction performance and higher signature stability. The related manuscript is on arXiv, and the R code for FrSVM is available at:

Code: https://sites.google.com/site/yupengcun/software/frsvm

Paper: http://arxiv.org/abs/1212.3214

Any comments and questions on FrSVM are welcome. The following shows how to run the program:


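1. Getting the gene expression profiles (GEP) and the PPI network

##############################################
# Getting the GEP
#---------------------------------------------
library(GEOquery)
library(limma)
## getGEO() returns a list of ExpressionSets for a GSE accession
a = getGEO("GSExxxxx", destdir="/home/YOURPATH/")
## Normalize the GEP with limma (samples in rows after transposing)
x = t(normalizeBetweenArrays(exprs(a[[1]]), method="quantile"))
## define your class labels, y, as a factor
## (placeholder: replace with one label per sample, i.e. per row of x)
y = factor("Two Class")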

 

##############################################
# mapping probeset IDs to Entrez IDs
# take the hgu133a platform as an example
#---------------------------------------------
library("hgu133a.db")
mapped.probes <- mappedkeys(hgu133aENTREZID)
refseq <- as.list(hgu133aENTREZID[mapped.probes])
times <- sapply(refseq, length)
mapping <- data.frame(probesetID=rep(names(refseq), times=times), graphID=unlist(refseq), row.names=NULL, stringsAsFactors=FALSE)
mapping <- unique(mapping)

##############################################
# Summarize probesets to genes of x by limma
# ad.ppi: adjacency matrix of the PPI network
#---------------------------------------------
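Gsub = ad.ppi
mapping <- mapping[mapping[,"probesetID"] %in% colnames(x),]
int <- intersect(rownames(Gsub), mapping[,"graphID"])

## map2entrez was not defined in the original listing. Assumption: it is a
## named vector probeset ID -> Entrez ID; for probesets that map to several
## Entrez IDs only the first one is kept here.
mapping <- mapping[!duplicated(mapping$probesetID),]
map2entrez <- setNames(mapping$graphID, mapping$probesetID)

index = intersect(mapping[,"probesetID"], colnames(x))
xn.m <- x[,index]
colnames(xn.m) <- map2entrez[index]
## average all probesets that belong to the same gene
ex.sum = t(avereps(t(xn.m), ID=map2entrez[index]))

int = intersect(int, colnames(ex.sum))
ex.sum = ex.sum[,int]         ## GEP matched to the PPI network
Gsub = Gsub[int,int]          ## PPI network matched to the GEP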


2. Run the FrSVM program

##################################################
# You need to install the following packages to run FrSVM.R:
#    library(ROCR)
#    library(Matrix)
#    library(kernlab)
#
## If you want to run in parallel, you also need to load:
#    library(multicore)
#
## Here is an example of a 5 times repeated 10-fold cross-validation
source("../FrSVM.R")
res <- frSVM.cv(x=ex.sum, y=y, folds=10, Gsub=Gsub, repeats=5, parallel=FALSE, cores=2, DEBUG=TRUE, d=0.5, top.uper=0.95, top.lower=0.9)
## the AUC values for the 5 x 10-fold CV
AUC = res$auc
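
For readers who want to check what an AUC value means, the following base R sketch computes the AUC of a single set of predictions with the rank-sum (Mann-Whitney) formula. It is independent of FrSVM.R: the function name `auc_rank` and the toy scores are invented for this example, and nothing is assumed here about how `frSVM.cv` fills `res$auc` internally.

```r
# AUC via the rank-sum formula (illustrative sketch, base R only).
# scores  : numeric decision values, larger = more likely positive
# labels  : class label of each sample
# positive: the label that counts as the positive class
auc_rank <- function(scores, labels, positive) {
  pos   <- labels == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)                          # ties receive average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example with two samples per class (made-up numbers).
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- factor(c("good", "good", "poor", "poor"))
auc <- auc_rank(scores, labels, positive = "poor")
print(auc)
```

The value is the fraction of (positive, negative) pairs in which the positive sample gets the higher score, so 0.5 corresponds to random guessing and 1 to perfect separation.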

 

Current approaches to finding biomarkers by means of machine learning

Finding robust biomarkers in genomics data is a first step towards personalized medicine. Here we take a short look at how machine learning is used to find biomarkers and at the current approaches in this area. For more details on the techniques, please see the following paper.

Biomarker Gene Signature Discovery Integrating Network Knowledge

Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany
Abstract: Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.

Ten years of the human genome maps

Ten years have passed since the publication of the human genome maps in Nature and Science:

  • International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
  • Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

The latest issues of Nature and Science published lots of papers and news on the progress in this area. The Human Genome Project (HGP) did bring genome technology to all areas of biology and changed the style of research. With so much data coming from high-throughput machines, it is a golden age for computational biologists.

A new manuscript on gene duplication models

I updated my manuscript on arXiv and submitted it to a journal. The manuscript presents numerical simulations of the evolutionary fate of a mutant gene at duplicate loci. A diffusion method is used to model the spread of the mutation in a natural population, and Itô's stochastic differential equation is employed to approximate the 4-dimensional Kolmogorov backward equation. For more detail, please see the manuscript below:

Numerical Studies of the Evolutionary Rate of Mutant Allele at Duplicate Loci

Yupeng Cun

Gene duplications are one of major primary driving forces for evolutionary novelty. We took population genetics models of genes duplicate to study how evolutionary forces acting during the fixation of mutant allele at duplicate loci. We study the fixation time of mutant allele at duplicate loci under double null recessive model (DNR) and haploinsufficient model (HI). And we also investigate how selection coefficients with other evolutionary force influence the fixation frequency of mutant allele at duplicate loci. Our results suggest that the selection plays a role in the evolutionary fate of duplicate genes, and tight linkage would help the mutant allele preserved at duplicate loci. Our theoretical simulation agree with the genomics data analysis result well, that selection, rather than drift, plays a important role in the establishment of duplicate loci, and recombination have a great opportunity to be acted upon selection.

Subjects:

Populations and Evolution (q-bio.PE)

Cite as:

arXiv:1007.0333v2 [q-bio.PE]
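
To give a feeling for the diffusion approach, here is a much simpler case than the four-dimensional model of the manuscript: a single locus with genic selection, simulated with the Euler-Maruyama scheme in base R. This is only a sketch under stated assumptions, not the code used for the paper. The function names and parameter values are made up; the allele frequency x is assumed to follow dx = s x (1 - x) dt + sqrt(x (1 - x) / (2N)) dW, with time measured in generations. For this one-dimensional model the Kolmogorov backward equation has a closed-form solution for the fixation probability (Kimura's formula), which gives something to compare the simulation against.

```r
# One-locus Wright-Fisher diffusion with genic selection, simulated with
# the Euler-Maruyama scheme (illustrative sketch, not the manuscript's code).
# p0: initial frequency of the mutant allele
# N : diploid population size
# s : selection coefficient of the mutant allele
simulate_fixation <- function(p0, N, s, dt = 1, max_steps = 1e5) {
  x <- p0
  for (i in seq_len(max_steps)) {
    selection_term <- s * x * (1 - x)             # deterministic part
    noise_sd <- sqrt(x * (1 - x) / (2 * N))       # random genetic drift
    x <- x + selection_term * dt + noise_sd * sqrt(dt) * rnorm(1)
    if (x <= 0) return(0)                         # mutant allele lost
    if (x >= 1) return(1)                         # mutant allele fixed
  }
  NA_real_                                        # not absorbed within max_steps
}

# Fixation probability from the one-dimensional backward equation.
kimura_fixation <- function(p0, N, s) {
  if (s == 0) return(p0)
  (1 - exp(-4 * N * s * p0)) / (1 - exp(-4 * N * s))
}

set.seed(1)
outcomes  <- replicate(500, simulate_fixation(p0 = 0.1, N = 100, s = 0.01))
simulated <- mean(outcomes, na.rm = TRUE)
expected  <- kimura_fixation(p0 = 0.1, N = 100, s = 0.01)
cat("simulated:", simulated, " backward equation:", expected, "\n")
```

With a few hundred replicates the two numbers should agree up to sampling noise; models with several loci generally have no such closed form, which is where numerical simulation becomes the main tool.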

When Machine learning meets molecular evolution

A recent paper, Schwarz et al. 2010, uses a kernel method to reconstruct phylogenetic trees, which is usually done by maximum likelihood estimation. The authors use finite-state transducers (FST) to create an alignment-free kernel for the evolutionary comparison of molecular sequences, and they call it a rational kernel approach. Their method avoids the problem of gaps in aligned sequences. As we know, gaps can influence the accuracy of a phylogenetic tree.

Kernel methods have proved to be a powerful tool for classification, and their method does help to classify sequences in the twilight zone (see the following picture). The result of their paper is a new and accurate way of determining evolutionary distances in the twilight zone of sequence alignments that is suitable for large datasets of homologous sequences.

Methods for phylogenetic/phylogenomic reconstruction are still a challenging problem in evolutionary biology. The paper of Schwarz et al. only looks at misclassification; maybe we will also see kernel methods for estimating divergence time, effective population size, recombination rate and mutation rate in natural populations.

(Phylogenetic trees of the Chlorophyceae, reconstructed by FST distance using the full kernel score (left), by F84 distance estimation on a Muscle alignment (top right) and as a maximum-likelihood tree on the same Muscle alignment (bottom right).)


Social network, machine learning and disease-genes

Some recent papers on how disease gene networks work and on the metastasis of cancer. Machine learning is a good tool for studying the relation between individual genes and disease. Here are the papers:

Infectious Disease Modeling of Social Contagion in Networks

Alison L. Hill, David G. Rand, Martin A. Nowak, Nicholas A. Christakis

Information, trends, behaviors and even health states may spread between contacts in a social network, similar to disease transmission. However, a major difference is that as well as being spread infectiously, it is possible to acquire this state spontaneously. For example, you can gain knowledge of a particular piece of information either by being told about it, or by discovering it yourself. In this paper we introduce a mathematical modeling framework that allows us to compare the dynamics of these social contagions to traditional infectious diseases. We can also extract and compare the rates of spontaneous versus contagious acquisition of a behavior from longitudinal data and can use this to predict the implications for future prevalence and control strategies. As an example, we study the spread of obesity, and find that the current rate of becoming obese is about 2% per year and increases by 0.5 percentage points for each obese social contact, while the rate of recovering from obesity is 4% per year. The rates of spontaneous infection and transmission have steadily increased over time since 1970, driving the increase in obesity prevalence. Our model thus provides a quantitative way to analyze the strength and implications of social contagions.


A Book on Statistical Learning

I just read a book on statistical learning, The Elements of Statistical Learning (2nd ed.). The importance of this book needs no advertising from me. The authors are kind enough to offer an e-print of the book online for free, and they have set up a website for the supplementary material.

Here is their website: http://www-stat.stanford.edu/~tibs/ElemStatLearn/ . I hope you can find the beauty of statistical learning.
