Computational Genomics


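After a short period in industry R&D, I returned to academia in July 2018 as a PI at the Kunming Institute of Botany, Chinese Academy of Sciences. I lead a bioinformatics laboratory that works on de novo assembly of second- and third-generation genome sequencing data, genetic variation analysis, and related comparative genomics. My research focuses on applying statistics and probability theory and machine learning (statistical learning) algorithms to current problems in computational biology, with a particular interest in plant genome data.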



Computational Genomics, Medicine Genomics, System Biology

Sclust paper published in Nature Protocols

After years of effort, our Sclust paper has finally been published in Nature Protocols. Enjoy!

Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust

  • Nature Protocols volume 13, pages 1488–1501 (2018)
  • doi:10.1038/nprot.2018.033
Published: 24 May 2018


The genomes of cancer cells constantly change during pathogenesis. This evolutionary process can lead to the emergence of drug-resistant mutations in subclonal populations, which can hinder therapeutic intervention in patients. Data derived from massively parallel sequencing can be used to infer these subclonal populations using tumor-specific point mutations. The accurate determination of copy-number changes and tumor impurity is necessary to reliably infer subclonal populations by mutational clustering. This protocol describes how to use Sclust, a copy-number analysis method with a recently developed mutational clustering approach. In a series of simulations and comparisons with alternative methods, we have previously shown that Sclust accurately determines copy-number states and subclonal populations. Performance tests show that the method is computationally efficient, with copy-number analysis and mutational clustering taking <10 min. Sclust is designed such that even non-experts in computational biology or bioinformatics with basic knowledge of the Linux/Unix command-line syntax should be able to carry out analyses of subclonal populations.
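To illustrate why copy number and purity must be known before mutations can be clustered, here is a minimal sketch of the commonly used relation between a mutation's variant allele frequency (VAF) and its cancer cell fraction (CCF). It is not taken from Sclust and does not reproduce its algorithm. The function names and parameters are hypothetical, and the sketch assumes that normal cells carry two copies of the locus.

```python
def expected_vaf(purity, tumor_cn, multiplicity=1, ccf=1.0, normal_cn=2):
    """Expected VAF of a somatic SNV in a tumor/normal cell mixture.

    purity       -- fraction of tumor cells in the sample (0 < purity <= 1)
    tumor_cn     -- total copy number of the locus in tumor cells
    multiplicity -- number of tumor copies that carry the mutation
    ccf          -- fraction of tumor cells that carry the mutation
    normal_cn    -- copy number in normal cells (assumed 2 here)
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1 or multiplicity > tumor_cn:
        raise ValueError("multiplicity must be between 1 and tumor_cn")
    # Mutated alleles per average cell, divided by all alleles per average cell.
    mutated = purity * ccf * multiplicity
    total = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutated / total


def ccf_from_vaf(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Invert expected_vaf: estimate the CCF from an observed VAF."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    total = purity * tumor_cn + (1.0 - purity) * normal_cn
    return vaf * total / (purity * multiplicity)


if __name__ == "__main__":
    # The same clonal, single-copy mutation at 80% purity,
    # once in a two-copy region and once in a four-copy region.
    for cn in (2, 4):
        vaf = expected_vaf(purity=0.8, tumor_cn=cn)
        print(f"tumor_cn={cn}: expected VAF={vaf:.3f}, "
              f"recovered CCF={ccf_from_vaf(vaf, 0.8, cn):.3f}")
```

Under these assumptions, a clonal mutation in a two-copy region is expected at a VAF of 0.40, while the same mutation in a four-copy region is expected at about 0.22. Clustering raw VAFs would therefore place the two in different groups even though both are present in every tumor cell, which is why the copy-number state and purity have to be estimated first.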

Computational Genomics, Medicine Genomics

A new fast method for copy number calling, tissue purity estimating and subclone inferring in cancer genome

Our new method has finally been published in Nature Protocols. We developed a series of methods and an accompanying software package, Sclust, written in a combination of C++ and R (around 1.5 GB; it is a large file, so download with care). With Sclust, you can perform copy-number calling, tumor purity estimation, and inference of clonal and subclonal structure from paired normal-tumor whole-genome or whole-exome sequencing data.


1. It performs copy-number calling, tumor purity estimation, and subclonal inference accurately.

2. Subclonal inference is extremely fast: clonal inference for 4,000 to 6,000 SNVs takes only 3 to 5 seconds on a personal computer.

3. Sclust reports the copy-number (multiplicity) variation for each cluster; at present only a few software packages offer this feature.


Contact us by email. Some background on clonal inference follows.


Computational Genomics, Medicine Genomics, System Biology

A useful course of biomedical data analysis

Biomedical Data Science:

Chapter 0 – Introduction

Chapter 1 – Inference

Chapter 2 – Exploratory Data Analysis

Chapter 3 – Robust Statistics

Chapter 4 – Matrix Algebra

Chapter 5 – Linear Models

Chapter 6 – Inference for High-Dimensional Data

Chapter 7 – Statistical Modeling

Chapter 8 – Distance and Dimension Reduction

Chapter 9 – Practical Machine Learning

Chapter 10 – Batch Effects

525.5x: Introduction to Bioconductor: Annotation and analysis

Setup and basics on biological background (Week 1)

Focus on data structure and management (Week 2)

Focus on genomic ranges (Week 3a)

Focus on genomic annotation (Week 3b)

Testing genome-scale hypotheses (Week 4)

525.6x: High-performance computing for reproducible genomics with Bioconductor

Visualization of genome scale data (Week 1)

Scalable genomic analysis (Week 2)

Multi-omic data integration (Week 3)

Fostering reproducible genome-scale analysis (Week 4)

Legacy material from 2015 Introduction to Bioconductor

RNA-seq data analysis

Variant Discovery and Genotyping

ChIP-seq data analysis

DNA methylation data analysis

Footnotes for all lectures



Computational Genomics

【c】Frontiers in Single Cell Genomics, Suzhou

Frontiers in Single Cell Genomics


We are pleased to announce the Cold Spring Harbor Asia conference on Frontiers in Single Cell Genomics which will be held in Suzhou, China, located approximately 60 miles west of Shanghai. The conference will begin at 7:00pm on the evening of Monday November 7, and will conclude after lunch on November 11, 2016.
