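Training course content: an in-depth explanation of the basic ideas of programming and the thinking behind the R language, with plenty of hands-on practice in R programming and data processing, plus Q&A sessions! You will study several worked examples of analysis in R, including basic statistics, analysis of GEO gene-chip (microarray) data, and downloading and analyzing TCGA data.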


A new R package for network-based biomarker discovery released

A new R package, netClass, has been released. netClass integrates network information, such as a protein-protein interaction network or KEGG, into mRNA classification, and can also combine miRNA with mRNA data through a miRNA-mRNA interaction network for biomarker discovery. We call this method stSVM; it has already been published in PLoS ONE (Cun et al. 2013). Besides stSVM, netClass implements several other methods. The full list is:

  1. AEP (average gene expression of pathway), Guo et al., BMC Bioinformatics 2005, 6:58.
  2. PAC (pathway activity classification), Lee E, et al., PLoS Comput Biol 4(11): e1000217.
  3. hubc (hub nodes classification), Taylor et al. (2009), Nat. Biotech., doi: 10.1038/nbt.152
  4. frSVM (filter via top ranked genes), Cun et al., arXiv:1212.3214; Winter et al., PLoS Comput Biol 8(5): e1002511.
  5. stSVM (network smoothed t-statistic), Cun et al., PLoS ONE.

netClass can be downloaded from SourceForge (http://sourceforge.net/projects/netclassr/) or CRAN (http://cran.r-project.org/web/packages/netClass/). For more details on netClass, refer to these four papers:

Lecture on Machine Learning

Probabilistic Graphical Models


Discriminative Learning of Sum-Product Networks


Graphical Models via Generalized Linear Models


Classification with Deep Invariant Scattering Networks


Dirichlet Process: Practical Course


Hilbert Space Embedding for Dirichlet Process Mixtures


Exploring transcription regulation through cell-to-cell variability



Understanding Gene Regulatory Networks and Their Variations



Rich Probabilistic Models for Holistic Scene Understanding


A simple R program for short sequence assembly

Given the set of sequences S = { ATC, CCA, CAG, TCC, AGT }, use R to perform overlap assembly (greedy approach) of the given sequences. We can use R to approach this problem:

Pseudocode for the greedy approach (suboptimal solution)

Define overlap ( si, sj ) as the length of the longest prefix of sj that matches a suffix of si.

1. Calculate pairwise overlap of strings
2. Merge a pair with maximum overlap
3. Repeat steps 1 and 2 until only one string remains

R code: continue reading in the full post.
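
The author's code is not shown in this excerpt. As a rough illustration of the greedy steps above, here is a minimal sketch in base R. The function names `overlap` and `greedy_assemble` are made up for this example, and it makes some simplifying assumptions: ties between equally good pairs are broken by taking the first pair found, and reverse complements and reads fully contained in other reads are not handled.

```r
# Length of the longest suffix of `a` that equals a prefix of `b`.
overlap <- function(a, b) {
  max_len <- min(nchar(a), nchar(b))
  for (k in rev(seq_len(max_len))) {          # try the longest overlap first
    if (substr(a, nchar(a) - k + 1, nchar(a)) == substr(b, 1, k)) {
      return(k)
    }
  }
  0
}

# Greedy assembly: repeatedly merge the ordered pair with the largest overlap.
greedy_assemble <- function(reads) {
  reads <- unique(reads)
  while (length(reads) > 1) {
    best <- list(len = -1, i = NA, j = NA)
    # Step 1: pairwise overlaps (ordered pairs, i != j)
    for (i in seq_along(reads)) {
      for (j in seq_along(reads)) {
        if (i == j) next
        len <- overlap(reads[i], reads[j])
        if (len > best$len) best <- list(len = len, i = i, j = j)
      }
    }
    # Step 2: merge the best pair, dropping the overlapping prefix of the second read
    tail_part <- substr(reads[best$j], best$len + 1, nchar(reads[best$j]))
    merged <- paste0(reads[best$i], tail_part)
    # Step 3: replace the two reads by the merged string and repeat
    reads <- c(reads[-c(best$i, best$j)], merged)
  }
  reads
}

S <- c("ATC", "CCA", "CAG", "TCC", "AGT")
greedy_assemble(S)
# [1] "ATCCAGT"
```

With this tie-breaking rule the merges are ATC + TCC, CCA + CAG, ATCC + CCAG, and finally ATCCAG + AGT. A different tie-breaking rule could give a different (and possibly longer) result, which is why the greedy approach is only a suboptimal heuristic.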