Current approaches to finding biomarkers by means of machine learning

Finding robust biomarkers in genomics data is a first step toward personalized medicine. Here we take a brief look at how machine learning is used to find biomarkers and at the current approaches in this area. For more on these techniques, please see the following papers.

Biomarker Gene Signature Discovery Integrating Network Knowledge

Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany
Abstract: Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.

A new manuscript on gene duplication models

I have updated my manuscript on arXiv and submitted it to a journal. The manuscript presents numerical simulations of the evolutionary fate of a mutant gene at duplicate loci. A diffusion method was used to model the spread of the mutation in a natural population, and Itô's stochastic differential equation was employed to approximate the four-dimensional Kolmogorov backward equation. For more detail, please see my manuscript below:

Numerical Studies of the Evolutionary Rate of Mutant Allele at Duplicate Loci

Yupeng Cun

Gene duplication is one of the major primary driving forces of evolutionary novelty. We used population genetics models of gene duplication to study how evolutionary forces act during the fixation of a mutant allele at duplicate loci. We study the fixation time of a mutant allele at duplicate loci under the double null recessive (DNR) model and the haploinsufficient (HI) model. We also investigate how selection coefficients, together with other evolutionary forces, influence the fixation frequency of a mutant allele at duplicate loci. Our results suggest that selection plays a role in the evolutionary fate of duplicate genes, and that tight linkage helps to preserve the mutant allele at duplicate loci. Our theoretical simulations agree well with the results of genomic data analyses: selection, rather than drift, plays an important role in the establishment of duplicate loci, and recombination has a great opportunity to be acted upon by selection.

Subjects: Populations and Evolution (q-bio.PE)

Cite as: arXiv:1007.0333v2 [q-bio.PE]
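
To give a feel for the diffusion approach mentioned above, here is a minimal sketch in Python. It is not the model from the manuscript: instead of the four-dimensional duplicate-loci system it simulates a single locus with genic selection, using the Euler-Maruyama scheme for the stochastic differential equation dx = s x(1-x) dt + sqrt(x(1-x)/(2N)) dW, and compares the simulated fixation probability with Kimura's classical diffusion formula. The function names, the time step and the parameter values are all illustrative assumptions.

```python
import math
import random


def simulate_fixation(p0, s, n, dt=0.5, rng=None, max_steps=1_000_000):
    """Follow one allele-frequency path until loss (x <= 0) or fixation (x >= 1).

    Assumed one-locus model (not the duplicate-loci model of the manuscript):
    genic selection with coefficient s in a diploid population of size n.
    Returns (fixed, time_in_generations).
    """
    rng = rng or random.Random()
    x = p0
    sqrt_dt = math.sqrt(dt)
    for step in range(1, max_steps + 1):
        drift = s * x * (1.0 - x)                        # mean change per generation
        diffusion = math.sqrt(x * (1.0 - x) / (2 * n))   # sd of change per generation
        # Euler-Maruyama update of the Ito stochastic differential equation
        x += drift * dt + diffusion * sqrt_dt * rng.gauss(0.0, 1.0)
        if x <= 0.0:
            return False, step * dt
        if x >= 1.0:
            return True, step * dt
    raise RuntimeError("path was not absorbed within max_steps")


def kimura_fixation_probability(p0, s, n):
    """Kimura's diffusion result for the same one-locus model."""
    if s == 0:
        return p0
    return (1.0 - math.exp(-4 * n * s * p0)) / (1.0 - math.exp(-4 * n * s))


def estimate_fixation_probability(p0, s, n, replicates=1000, seed=1):
    """Monte Carlo estimate: the fraction of simulated paths that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(simulate_fixation(p0, s, n, rng=rng)[0] for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    p0, s, n = 0.1, 0.01, 100
    print("simulated:", estimate_fixation_probability(p0, s, n))
    print("diffusion formula:", kimura_fixation_probability(p0, s, n))
```

With enough replicates the two numbers should roughly agree, up to sampling noise and the discretization error of the time step. The same idea extends to several coupled frequencies, but the drift and diffusion terms then have to come from the specific model being studied.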

When machine learning meets molecular evolution

A recent paper, Schwarz et al. 2010, uses a kernel method to reconstruct phylogenetic trees, a task that is usually done by maximum likelihood estimation. The authors use finite-state transducers (FSTs) to create an alignment-free kernel for the evolutionary comparison of molecular sequences, and they call it a rational kernel approach. Their method sidesteps the problem of gaps in aligned sequences. As is well known, gaps can influence the accuracy of a phylogenetic tree.
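
As a rough illustration of what "alignment-free kernel" means, here is a minimal Python sketch. It does not implement the FST-based rational kernel of Schwarz et al.; it swaps in a much simpler k-mer spectrum kernel, which also compares sequences without aligning them, and turns the normalized kernel score into a distance. The function names and the choice k=3 are assumptions made for this example only.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of a and b (no alignment needed)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    if len(cb) < len(ca):
        ca, cb = cb, ca  # iterate over the smaller table
    return sum(count * cb[kmer] for kmer, count in ca.items())


def kernel_distance(a, b, k=3):
    """Distance induced by the cosine-normalized kernel: sqrt(2 - 2 * cosine)."""
    kaa = spectrum_kernel(a, a, k)
    kbb = spectrum_kernel(b, b, k)
    if kaa == 0 or kbb == 0:
        raise ValueError("both sequences must be at least k characters long")
    cosine = spectrum_kernel(a, b, k) / math.sqrt(kaa * kbb)
    return math.sqrt(max(0.0, 2.0 - 2.0 * cosine))


if __name__ == "__main__":
    # Toy sequences of unequal length; no gaps or alignment are involved.
    seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGAACG", "s3": "TTGCATTGCA"}
    names = sorted(seqs)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(x, y, round(kernel_distance(seqs[x], seqs[y]), 3))
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method such as neighbor joining. Because the sequences are never aligned, insertions and deletions only change which k-mers are present and never introduce gap columns.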

Kernel methods have proven to be a powerful tool for classification, and their method does help to classify sequences in the twilight zone (see the figure below). The result of their paper is a new and accurate way of determining evolutionary distances in the twilight zone of sequence alignments that is suitable for large datasets of homologous sequences.

Methods for phylogenetic and phylogenomic reconstruction remain a challenging problem in evolutionary biology. Schwarz et al.'s paper only addresses classification; perhaps kernel methods could also be used to estimate divergence times, effective population size, recombination rate and mutation rate in natural populations.

(Phylogenetic trees of the Chlorophyceae, reconstructed using the FST distance with the full kernel score (left), F84 distance estimation on a Muscle alignment (top right), and a maximum-likelihood tree on the same Muscle alignment (bottom right).)
