Computational Genomics, Machine Learning, Medicine Genomics, Programming, R

A new R package for network-based biomarker discovery released

A new R package, netClass, has been released. netClass integrates network information, such as a protein-protein interaction network or KEGG, into mRNA classification, and can also combine miRNA with mRNA data through a miRNA-mRNA interaction network for biomarker discovery. We call this method stSVM; it has already been published in PLoS ONE (Cun et al. 2013). Apart from stSVM, we also implemented the following methods in netClass:

  1. AEP (average gene expression of pathway), Guo et al., BMC Bioinformatics 2005, 6:58.
  2. PAC (pathway activity classification), Lee E, et al., PLoS Comput Biol 4(11): e1000217.
  3. hubc (hub nodes classification), Taylor et al. (2009), Nat. Biotech., doi: 10.1038/nbt.152
  4. frSVM (filter via top ranked genes), Cun et al., arXiv:1212.3214; Winter et al., PLoS Comput Biol 8(5): e1002511.
  5. stSVM (network smoothed t-statistic), Cun et al., PLoS ONE.

netClass can be downloaded from SourceForge (http://sourceforge.net/projects/netclassr/) or CRAN (http://cran.r-project.org/web/packages/netClass/). For more details on netClass, you can refer to these four papers:

Programming

Lecture on Machine Learning

Probabilistic Graphical Models

http://videolectures.net/mlss05au_roweis_pgm/

Discriminative Learning of Sum-Product Networks

http://videolectures.net/nips2012_gens_discriminative_learning/

Graphical Models via Generalized Linear Models

http://videolectures.net/nips2012_yang_models/

Classification with Deep Invariant Scattering Networks

http://videolectures.net/nips2012_mallat_classification/

Dirichlet Process: Practical Course

http://videolectures.net/mlss2012_gorur_dirichlet_practical/

Hilbert Space Embedding for Dirichlet Process Mixtures

http://videolectures.net/nipsworkshops2012_muandet_dirichlet/

Exploring transcription regulation through cell-to-cell variability

http://videolectures.net/mlsb2010_friedman_etr/

 

Understanding Gene Regulatory Networks and Their Variations

http://videolectures.net/nips09_koller_ugrntv/

 

Rich Probabilistic Models for Holistic Scene Understanding

http://videolectures.net/ijcai2011_koller_scene/

Computational Genomics, Programming

A simple R program for short sequence assembly

Given the set of sequences S = { ATC, CCA, CAG, TCC, AGT }, perform an overlap assembly (greedy approach) of these sequences. We can use R to approach this problem:

Pseudocode for the greedy approach (suboptimal solution):

Define overlap(si, sj) as the length of the longest prefix of sj that matches a suffix of si.

1. Calculate pairwise overlap of strings
2. Merge a pair with maximum overlap
3. Repeat steps 1 and 2 until there is only one string

R code: Continue reading “A simple R program for short sequence assembly”
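
The greedy steps above can be sketched in base R roughly as follows. This is a minimal illustration and not necessarily the code from the full post: the function names `overlap` and `greedy_assemble` are made up for this example, ties between equally good pairs are broken by taking the first pair found, and reads that are fully contained inside other reads are not treated specially.

```r
# Length of the longest prefix of `b` that matches a suffix of `a`.
overlap <- function(a, b) {
  for (k in rev(seq_len(min(nchar(a), nchar(b))))) {
    if (substr(a, nchar(a) - k + 1, nchar(a)) == substr(b, 1, k)) return(k)
  }
  0
}

# Greedy overlap assembly (hypothetical helper, suboptimal by design):
# repeatedly merge the ordered pair of reads with the largest overlap.
greedy_assemble <- function(reads) {
  reads <- unique(reads)
  while (length(reads) > 1) {
    best <- c(i = 1, j = 2, k = -1)
    # Step 1: pairwise overlaps over all ordered pairs (i, j), i != j.
    for (i in seq_along(reads)) {
      for (j in seq_along(reads)) {
        if (i == j) next
        k <- overlap(reads[i], reads[j])
        if (k > best["k"]) best <- c(i = i, j = j, k = k)
      }
    }
    # Step 2: merge the best pair, keeping the overlapping part only once.
    i <- best[["i"]]; j <- best[["j"]]; k <- best[["k"]]
    merged <- paste0(reads[i], substr(reads[j], k + 1, nchar(reads[j])))
    # Step 3: replace the two reads by the merged string and repeat.
    reads <- c(reads[-c(i, j)], merged)
  }
  reads
}

S <- c("ATC", "CCA", "CAG", "TCC", "AGT")
print(greedy_assemble(S))
```

With this tie-breaking rule, the example set S assembles to ATCCAGT, which contains every read as a substring. A different tie-breaking rule may merge the reads in another order, and in general the greedy result is not guaranteed to be the shortest possible superstring.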