Computational Genomics

A brief introduction to “apply” in R

A good, practical guideline for “apply” in R.

What You're Doing Is Rather Desperate

At any R Q&A site, you’ll frequently see an exchange like this one:

Q: How can I use a loop to […insert task here…] ?
A: Don’t. Use one of the apply functions.

So, what are these wondrous apply functions and how do they work? I think the best way to figure out anything in R is to learn by experimentation, using embarrassingly trivial data and functions.

If you fire up your R console, type “??apply” and scroll down to the functions in the base package, you’ll see something like this:

Let’s examine each of those.

1. apply
Description: “Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.”

OK – we know about vectors/arrays and functions, but what are these “margins”? Simple: either the rows (1), the columns (2) or both (1:2). By “both”, we mean “apply the…
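The margin idea is easy to try on embarrassingly trivial data. The snippet below is a minimal sketch and is not taken from the original post; the matrix `m` and the variable names are made up for illustration.

```r
# A trivial 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# Margin 1: the function receives one row at a time.
row_sums <- apply(m, 1, sum)

# Margin 2: the function receives one column at a time.
col_sums <- apply(m, 2, sum)

# Margins 1:2: the function receives one cell at a time,
# so the result keeps the shape of the input matrix.
doubled <- apply(m, 1:2, function(x) x * 2)

print(row_sums)  # 9 12
print(col_sums)  # 3 7 11
print(doubled)
```

With margin 1 you get one value per row, with margin 2 one value per column, and with 1:2 a matrix of the same dimensions as `m`.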


Computational Genomics

Inferring tumour evolution 2 – Comparison to classical phylogenetics

Scientific B-sides

Quick recap: Last time we talked about tumor evolution and I presented a toy example to introduce key concepts. I also introduced the intra-tumor phylogeny problem: Given a sample of the genomes of clones in a tumour, reconstruct its ‘life history’. This problem consists of two sub-problems: (1) identification of clones, and (2) inferring evolutionary relationships between clones.

This problem falls into the general area of reconstructing phylogenetic trees — so how does inferring clonal trees compare to classical phylogenetic methods?


Computational Genomics

Inferring tumour evolution 1 – The intra-tumour phylogeny problem

Scientific B-sides

“Cancer evolves dynamically as clonal expansions supersede one another driven by shifting selective pressures, mutational processes, and disrupted cancer genes. These processes mark the genome, such that a cancer’s life history is encrypted in the somatic mutations present,”

write Nik-Zainal et al. in the abstract of their 2012 Cell paper ‘The life history of 21 breast cancers’. The key figure of their paper shows a phylogenetic tree of tumor development in a patient. The paper contains lots of computational work on analyzing and interpreting mutations based on deep-sequencing data, but (and this is a big, surprising but) the very last step of putting together the tree was done manually. Half the paper describes the reasoning that Peter Campbell and his group used to condense all the evidence they had gathered from genomic data into the tree, but there is no algorithm.



A practical guideline for writing research papers

Ten Simple Rules for Writing Research Papers

Computational Genomics, Machine Learning, Medicine Genomics, Programming, R

A new R package for network-based biomarker discovery released

A new R package, netClass, has been released. netClass integrates network information, such as a protein-protein interaction network or KEGG, into mRNA classification, and can also combine miRNA with mRNA data through a miRNA-mRNA interaction network for biomarker discovery. We call this method stSVM; it has already been published in PLoS ONE (Cun et al. 2013). Apart from stSVM, we also implemented the following methods in netClass:

  1. AEP (average gene expression of pathway), Guo et al., BMC Bioinformatics 2005, 6:58.
  2. PAC (pathway activity classification), Lee E. et al., PLoS Comput Biol 4(11): e1000217.
  3. hubc (hub nodes classification), Taylor et al. (2009), Nat. Biotech., doi: 10.1038/nbt.152
  4. frSVM (filter via top ranked genes), Cun et al., arXiv:1212.3214; Winter et al., PLoS Comput Biol 8(5): e1002511.
  5. stSVM (network smoothed t-statistic), Cun et al., PLoS ONE.
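To give a feel for the simplest of these ideas, here is a minimal base-R sketch of averaging gene expression over the members of a pathway, taking the AEP name literally. It does not use the netClass API, and the published method may differ in its details; the toy matrix `expr`, the `pathways` list, and all gene and pathway names are hypothetical.

```r
# Hypothetical toy data: 4 genes (rows) x 3 samples (columns).
expr <- matrix(c( 1,  2,  3,
                  3,  4,  5,
                 10, 10, 10,
                  0,  2,  4),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("g1", "g2", "g3", "g4"),
                               c("s1", "s2", "s3")))

# Hypothetical pathway membership: pathway name -> gene identifiers.
pathways <- list(pA = c("g1", "g2"), pB = c("g3", "g4"))

# For each pathway, average the expression of its member genes per sample.
# sapply() returns samples x pathways, so transpose to pathways x samples.
activity <- t(sapply(pathways, function(genes) {
  colMeans(expr[genes, , drop = FALSE])
}))

print(activity)
```

The resulting pathways-by-samples matrix could then be passed to any classifier in place of the gene-level matrix.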

netClass can be downloaded from SourceForge or CRAN. For more details on netClass, you can refer to these four papers:


Lecture on Machine Learning

Probabilistic Graphical Models

Discriminative Learning of Sum-Product Networks

Graphical Models via Generalized Linear Models

Classification with Deep Invariant Scattering Networks

Dirichlet Process: Practical Course

Hilbert Space Embedding for Dirichlet Process Mixtures

Exploring transcription regulation through cell-to-cell variability


Understanding Gene Regulatory Networks and Their Variations


Rich Probabilistic Models for Holistic Scene Understanding