Computational Genomics, Medicine Genomics, System Biology

Sclust paper published in Nature Protocols

After years of effort, our Sclust paper has finally been published in Nature Protocols. Enjoy!

Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust

  • Nature Protocols volume 13, pages 1488–1501 (2018)
  • doi:10.1038/nprot.2018.033
Published: 24 May 2018


The genomes of cancer cells constantly change during pathogenesis. This evolutionary process can lead to the emergence of drug-resistant mutations in subclonal populations, which can hinder therapeutic intervention in patients. Data derived from massively parallel sequencing can be used to infer these subclonal populations using tumor-specific point mutations. The accurate determination of copy-number changes and tumor impurity is necessary to reliably infer subclonal populations by mutational clustering. This protocol describes how to use Sclust, a copy-number analysis method with a recently developed mutational clustering approach. In a series of simulations and comparisons with alternative methods, we have previously shown that Sclust accurately determines copy-number states and subclonal populations. Performance tests show that the method is computationally efficient, with copy-number analysis and mutational clustering taking <10 min. Sclust is designed such that even non-experts in computational biology or bioinformatics with basic knowledge of the Linux/Unix command-line syntax should be able to carry out analyses of subclonal populations.
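To see why purity and copy number matter for mutational clustering, here is a minimal sketch of the commonly used relation between a mutation's variant allele frequency (VAF) and its cancer cell fraction (CCF). This is not Sclust's actual algorithm or interface. The function name, its parameters, and the example numbers are assumptions made purely for illustration.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of cancer cells carrying a point mutation.

    Hypothetical helper (not part of Sclust). It inverts the standard
    expectation
        VAF = purity * multiplicity * CCF
              / (purity * tumor_cn + (1 - purity) * normal_cn)
    where `multiplicity` is the number of mutated copies per cancer cell.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    mean_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return vaf * mean_copies / (purity * multiplicity)


# Illustrative values: a diploid region in a sample with 70% tumor cells.
print(f"{cancer_cell_fraction(0.35, purity=0.7, tumor_cn=2):.2f}")  # 1.00, clonal
print(f"{cancer_cell_fraction(0.14, purity=0.7, tumor_cn=2):.2f}")  # 0.40, subclonal
```

A wrong purity or copy-number estimate shifts every CCF value, which is why both have to be determined accurately before mutations are clustered into subclones.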

Computational Genomics, Medicine Genomics

A new fast method for copy-number calling, tumor purity estimation and subclone inference in cancer genomes

Our new method has finally been published in Nature Protocols. We developed a series of methods and an accompanying combined C++/R software package, Sclust (around 1.5 GB; it is a large file, so download with care). With Sclust, you can perform copy-number calling, tumor purity estimation, and inference of clonal and subclonal structure from paired normal-tumor whole-genome or whole-exome sequencing data.


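1. It performs copy-number calling, tumor purity estimation and subclonal inference accurately.

2. Subclonal inference is extremely fast: clonal inference on 4,000 to 6,000 SNVs takes only 3 to 5 seconds on a personal computer.

3. Sclust reports the copy-number multiplicity variation of each cluster; at present only a few tools offer this feature.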


Contact us by email. Some background on clonal inference follows.


Computational Genomics, Medicine Genomics, System Biology

A useful course on biomedical data analysis

Biomedical Data Science:

Chapter 0 – Introduction

Chapter 1 – Inference

Chapter 2 – Exploratory Data Analysis

Chapter 3 – Robust Statistics

Chapter 4 – Matrix Algebra

Chapter 5 – Linear Models

Chapter 6 – Inference for High-Dimensional Data

Chapter 7 – Statistical Modeling

Chapter 8 – Distance and Dimension Reduction

Chapter 9 – Practical Machine Learning

Chapter 10 – Batch Effects

525.5x: Introduction to Bioconductor: Annotation and analysis

Setup and basics on biological background (Week 1)

Focus on data structure and management (Week 2)

Focus on genomic ranges (Week 3a)

Focus on genomic annotation (Week 3b)

Testing genome-scale hypotheses (Week 4)

525.6x: High-performance computing for reproducible genomics with Bioconductor

Visualization of genome scale data (Week 1)

Scalable genomic analysis (Week 2)

Multi-omic data integration (Week 3)

Fostering reproducible genome-scale analysis (Week 4)

Legacy material from 2015 Introduction to Bioconductor

RNA-seq data analysis

Variant Discovery and Genotyping

ChIP-seq data analysis

DNA methylation data analysis

Footnotes for all lectures



Computational Genomics

【c】Frontiers in Single Cell Genomics, Suzhou

Frontiers in Single Cell Genomics


We are pleased to announce the Cold Spring Harbor Asia conference on Frontiers in Single Cell Genomics which will be held in Suzhou, China, located approximately 60 miles west of Shanghai. The conference will begin at 7:00pm on the evening of Monday November 7, and will conclude after lunch on November 11, 2016.


Computational Genomics

A brief introduction to “apply” in R

A good, practical guideline for “apply” in R.

What You're Doing Is Rather Desperate

At any R Q&A site, you’ll frequently see an exchange like this one:

Q: How can I use a loop to […insert task here…] ?
A: Don’t. Use one of the apply functions.

So, what are these wondrous apply functions and how do they work? I think the best way to figure out anything in R is to learn by experimentation, using embarrassingly trivial data and functions.

If you fire up your R console, type “??apply” and scroll down to the functions in the base package, you’ll see something like this:

Let’s examine each of those.

1. apply
Description: “Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.”

OK – we know about vectors/arrays and functions, but what are these “margins”? Simple: either the rows (1), the columns (2) or both (1:2). By “both”, we mean “apply the…
