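Applications of R in Biomedical Data Processing: Session 1

November 4-5, 2017, Session 1

Kunming

Course content: An in-depth introduction to the basic ideas of programming and the thinking behind the R language, with plenty of hands-on R programming and data-processing practice plus Q&A. Participants work through several examples of analysis in R, including basic data statistics, analysis of gene-chip (microarray) data from GEO, and downloading and analyzing TCGA data.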

A new R package for network-based biomarker discovery released

A new R package, netClass, has been released. netClass integrates network information, such as a protein-protein interaction network or KEGG, into mRNA classification, and can also combine miRNA with mRNA data through a miRNA-mRNA interaction network for biomarker discovery. We call this method stSVM; it has already been published in PLoS ONE (Cun et al. 2013). Alongside stSVM, netClass implements the following methods:

  1. AEP (average gene expression of pathway), Guo et al., BMC Bioinformatics 2005, 6:58.
  2. PAC (pathway activity classification), Lee E, et al., PLoS Comput Biol 4(11): e1000217.
  3. hubc (hub nodes classification), Taylor et al. (2009), Nat. Biotech., doi: 10.1038/nbt.152
  4. frSVM (filter via top ranked genes), Cun et al., arXiv:1212.3214; Winter et al., PLoS Comput Biol 8(5): e1002511.
  5. stSVM (network smoothed t-statistic), Cun et al., PLoS ONE.

netClass can be downloaded from SourceForge (http://sourceforge.net/projects/netclassr/) or CRAN (http://cran.r-project.org/web/packages/netClass/). For more details on netClass, you can refer to these four papers:

Lectures on Machine Learning

Probabilistic Graphical Models

http://videolectures.net/mlss05au_roweis_pgm/

Discriminative Learning of Sum-Product Networks

http://videolectures.net/nips2012_gens_discriminative_learning/

Graphical Models via Generalized Linear Models

http://videolectures.net/nips2012_yang_models/

Classification with Deep Invariant Scattering Networks

http://videolectures.net/nips2012_mallat_classification/

Dirichlet Process: Practical Course

http://videolectures.net/mlss2012_gorur_dirichlet_practical/

Hilbert Space Embedding for Dirichlet Process Mixtures

http://videolectures.net/nipsworkshops2012_muandet_dirichlet/

Exploring transcription regulation through cell-to-cell variability

http://videolectures.net/mlsb2010_friedman_etr/

Understanding Gene Regulatory Networks and Their Variations

http://videolectures.net/nips09_koller_ugrntv/

Rich Probabilistic Models for Holistic Scene Understanding

http://videolectures.net/ijcai2011_koller_scene/

A simple R program for short sequence assembly

Given the set of sequences S = { ATC, CCA, CAG, TCC, AGT }, perform an overlap assembly of the sequences using a greedy approach. We can use R to approach this problem:

Pseudocode for the greedy approach (which may give a suboptimal solution):

Define overlap ( si, sj ) as the length of the longest prefix of sj that matches a suffix of si.
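The definition above can be sketched in R as follows. This is only an illustration, not the post's original code: the function name `overlap` and its argument names are chosen here for convenience, and the sketch assumes plain character strings with no sequencing errors.

```r
# overlap(si, sj): length of the longest prefix of sj that equals a suffix of si.
# Hypothetical helper written for illustration; not taken from the original post.
overlap <- function(si, sj) {
  max_len <- min(nchar(si), nchar(sj))
  # Try the longest candidate first, so the first match is the maximum.
  for (k in rev(seq_len(max_len))) {
    suffix <- substr(si, nchar(si) - k + 1, nchar(si))
    prefix <- substr(sj, 1, k)
    if (suffix == prefix) return(k)
  }
  0L  # no non-empty prefix of sj matches a suffix of si
}

overlap("ATC", "TCC")  # 2: the suffix "TC" of ATC matches the prefix "TC" of TCC
overlap("TCC", "ATC")  # 0: the overlap is directional
```

Note that the overlap is not symmetric, so both orders of each pair have to be checked when looking for the best merge.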

1. Calculate the pairwise overlaps of all strings
2. Merge the pair with the maximum overlap
3. Repeat steps 1 and 2 until only one string remains

R code:
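The original code is not reproduced in this excerpt, so what follows is a minimal sketch of the greedy procedure under stated assumptions. The function names `overlap` and `greedy_assemble` are hypothetical, ties are broken by taking the first best pair found, and reads that share no overlap are simply concatenated.

```r
# Length of the longest prefix of sj that matches a suffix of si.
overlap <- function(si, sj) {
  for (k in rev(seq_len(min(nchar(si), nchar(sj))))) {
    if (substr(si, nchar(si) - k + 1, nchar(si)) == substr(sj, 1, k)) return(k)
  }
  0L
}

# Greedy assembly: repeatedly merge the ordered pair with the largest overlap.
greedy_assemble <- function(reads) {
  while (length(reads) > 1) {
    best_i <- 1L; best_j <- 2L; best_len <- -1L
    # Step 1: compute the overlap for every ordered pair (i, j), i != j.
    for (i in seq_along(reads)) {
      for (j in seq_along(reads)) {
        if (i == j) next
        k <- overlap(reads[i], reads[j])
        if (k > best_len) {
          best_i <- i; best_j <- j; best_len <- k
        }
      }
    }
    # Step 2: merge the best pair, writing the overlapping part only once.
    merged <- paste0(reads[best_i],
                     substr(reads[best_j], best_len + 1, nchar(reads[best_j])))
    # Step 3: replace the two reads by the merged string and repeat.
    reads <- c(reads[-c(best_i, best_j)], merged)
  }
  reads
}

S <- c("ATC", "CCA", "CAG", "TCC", "AGT")
greedy_assemble(S)  # "ATCCAGT"
```

For this input the greedy merges happen to produce a string containing all five reads with every overlap of length 2. In general the greedy choice can lead to a longer result than the shortest possible superstring.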