Current approaches to finding biomarkers by means of machine learning

Finding robust biomarkers in genomic data is a first step towards personalized medicine. Here we take a brief look at how machine learning is used to find biomarkers and at the current approaches in this area. For more on the techniques involved, please see the following papers.

Biomarker Gene Signature Discovery Integrating Network Knowledge

Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany
Abstract: Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typical low reproducibility of these signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview about so-far proposed approaches.

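Going with the current, finishing the dream

In the last month of 2011, moving house and the revision of my paper got tangled together and kept me frantically busy. Neither went perfectly, but both were completed successfully.

A summary of what I did over the year:

Personalized medicine was a hot topic this year. Finding useful markers in more than 100,000 gene chips (microarrays) is a very interesting problem, because overtreatment is a big issue for cancer patients: many die not from the cancer itself but because, after excessive treatment, their weakened bodies can hardly fend off other viruses. So if gene markers for cancer relapse can be found, clinicians can be given advice on medication, which would reduce mortality among cancer patients. Perhaps one day genetic testing will be as convenient and simple as an ordinary blood test. Based on this idea, we developed a method that integrates the protein-protein interaction (PPI) network as prior information into the microarray expression data (mRNA) of cancer patients in order to find marker genes. After trying countless support vector machine (SVM) classifiers, we arrived at a result that was a little disheartening but also a little pleasing: simply integrating PPI and mRNA data hardly improves the prediction accuracy of the markers, whereas the research in this area from 2007 until now has all supported the claim that integrating PPI into mRNA data significantly improves it! Afterwards, on 6 commonly used breast cancer datasets, we compared 14 other classification methods generally regarded as the best, and the results supported our new finding.

Then came organizing my thoughts, making the figures and writing the paper. In August I finally finished it and submitted it to BMC Bioinformatics. Along the way I experienced my German supervisor's exacting demands on every detail of the paper; it took 5 rounds of changes before the manuscript was finalized. Two months later the comments from the 2 reviewers came back. Although they were very positive, each reviewer raised 7 requests for changes. After two months of intense computation I completed the revision and resubmitted it in early December. So after 4 tense months, both the first draft and the revised manuscript were done, just as my supervisor wrote in his last email to me about the resubmission: "Good luck to your ms, I hold my thumbs!". BMC has always advertised fast review and fast publication, but from my experience that seems to be little more than talk.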


A simple R program for short sequence assembly

Given the set of sequences S = { ATC, CCA, CAG, TCC, AGT }, perform an overlap assembly of the sequences using a greedy approach. We can use R to approach this problem:

Pseudocode for the greedy approach (suboptimal solution)

Define overlap ( si, sj ) as the length of the longest prefix of sj that matches a suffix of si.

1. Calculate pairwise overlap of strings
2. Merge a pair with maximum overlap
3. Repeat steps 1 and 2 until there is only one string

R code:
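A minimal sketch of the greedy procedure described above, not necessarily the code from the full post. The function names `overlap` and `greedy_assemble` are illustrative assumptions, and ties between pairs with the same overlap are broken by taking the first pair found, which is one arbitrary choice among several.

```r
# Length of the longest prefix of `b` that matches a suffix of `a`.
overlap <- function(a, b) {
  max_len <- min(nchar(a), nchar(b))
  # Try the longest candidate first and stop at the first match.
  for (k in rev(seq_len(max_len))) {
    if (substr(a, nchar(a) - k + 1, nchar(a)) == substr(b, 1, k)) {
      return(k)
    }
  }
  0
}

# Greedy assembly: repeatedly merge the pair with the largest overlap.
# Assumption: ties go to the first pair found in (i, j) scan order.
greedy_assemble <- function(reads) {
  reads <- unique(reads)
  while (length(reads) > 1) {
    best_k <- -1
    best_i <- 0
    best_j <- 0
    # Step 1: pairwise overlaps over all ordered pairs (i, j), i != j.
    for (i in seq_along(reads)) {
      for (j in seq_along(reads)) {
        if (i != j) {
          k <- overlap(reads[i], reads[j])
          if (k > best_k) {
            best_k <- k
            best_i <- i
            best_j <- j
          }
        }
      }
    }
    # Step 2: merge the best pair, dropping the overlapping prefix of reads[best_j].
    tail_part <- substr(reads[best_j], best_k + 1, nchar(reads[best_j]))
    merged <- paste0(reads[best_i], tail_part)
    # Step 3: replace the two strings by the merged one and repeat.
    reads <- c(reads[-c(best_i, best_j)], merged)
  }
  reads
}

S <- c("ATC", "CCA", "CAG", "TCC", "AGT")
print(greedy_assemble(S))
# [1] "ATCCAGT"
```

With this tie-breaking rule the five reads merge into ATCCAGT, which contains every read in S as a substring. Because the approach is greedy, a different tie-breaking rule or a different input can give a longer, suboptimal superstring.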

USnews: Personalized Medicine

In 2003, after more than a decade of research, the Human Genome Project was completed by the U.S. Department of Energy and the National Institutes of Health.

The goals of the Human Genome Project were to learn the order of the 3 billion units of DNA that go into making a human genome, as well as to identify all of the genes located in this vast amount of data. By 2003, almost all of the pairs of chemicals that make up the units had been put in the correct sequence—enough for a pronouncement of success. The individual genes within the long strands of DNA, and the elements that control the genes, are still in the process of being identified. Current counts indicate that the human genome contains 22,000 to 23,000 genes.

One of the early hopes of the genomic project was to pinpoint specific genes that caused common diseases. Scientists now think the answer is more complex, with many diseases the result of multiple genes interacting. Nevertheless, the information garnered from the genome project has the potential to forever transform healthcare. Many believe that genome-based medicine, frequently called personalized medicine, is the future of healthcare—the next logical step in a world in which more is known about human genetics, disease, and wellness than ever before.

Of all the scientific and social promises that stem from advances in our understanding of the human genome, genomic medicine may be the most eagerly awaited. The prospect of examining a person’s entire genome, or at least a large portion of it, in order to make individualized risk predictions and treatment decisions is tantalizingly within reach.


Machine Learning Course

A machine learning course from Stanford. It provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing, are also discussed.
Lecture 1


Lecture 2


Ten years of the human genome maps

Ten years have passed since the publication of the human genome maps in Nature and Science:

  • International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
  • Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

The latest issues of Nature and Science published lots of papers and news on the progress in this area. The Human Genome Project (HGP) did bring genome technology to all areas of biology and is changing the style of research. With so much data coming from high-throughput machines, it is a golden age for computational biologists.
