Biomedical Data Science: http://genomicsclass.github.io/book/

## PH525x series – Biomedical Data Science

## Links and resources

- R markdown source files
- ePub version on Leanpub
- Links to the HarvardX class pages
- External resources and books
- Finding more help for data analysis

## Chapter 0 – Introduction

- Introduction [Rmd]
- Getting started [Rmd]
- Getting started exercises
- dplyr introduction [Rmd]
- dplyr introduction exercises
- Mathematical notation [Rmd]

## Chapter 1 – Inference

- Random variables [Rmd]
- Random variables exercises
- Populations and samples [Rmd]
- Populations and samples exercises
- CLT and t-distribution [Rmd]
- CLT and t-distribution exercises
- CLT in practice [Rmd]
- CLT in practice exercises
- t-test in practice [Rmd]
- Confidence intervals [Rmd]
- Power calculations [Rmd]
- Power calculations exercises
- Monte Carlo [Rmd]
- Monte Carlo exercises
- Permutation tests [Rmd]
- Permutation tests exercises
- Association tests [Rmd]
- Association tests exercises

## Chapter 2 – Exploratory Data Analysis

## Chapter 3 – Robust Statistics

## Chapter 4 – Matrix Algebra

- Introduction to using regression [Rmd]
- Introduction to using regression exercises
- Matrix notation [Rmd]
- Matrix notation exercises
- Matrix operations [Rmd]
- Matrix operations exercises
- Matrix algebra examples [Rmd]
- Matrix algebra examples exercises

## Chapter 5 – Linear Models

- Linear models introduction [Rmd]
- Linear models introduction exercises
- Expressing design formula [Rmd]
- Expressing design formula exercises
- Linear models in practice [Rmd]
- Linear models in practice exercises
- Standard errors [Rmd]
- Standard errors exercises
- Interactions and contrasts [Rmd]
- Interactions and contrasts exercises
- Collinearity [Rmd]
- Collinearity exercises
- QR and regression [Rmd]
- Linear models going further [Rmd]

## Chapter 6 – Inference for High-Dimensional Data

- Introduction to high-throughput data [Rmd]
- Introduction to high-throughput data exercises
- Inference for high-throughput data [Rmd]
- Inference for high-throughput data exercises
- Multiple testing [Rmd]
- Multiple testing exercises
- EDA for high-throughput data [Rmd]
- EDA for high-throughput data exercises

## Chapter 7 – Statistical Modeling

- Modeling [Rmd]
- Modeling exercises
- Bayes' theorem [Rmd]
- Bayes' theorem exercises
- Hierarchical models [Rmd]
- Hierarchical models exercises

## Chapter 8 – Distance and Dimension Reduction

- Distance [Rmd]
- Distance exercises
- PCA motivation [Rmd]
- SVD [Rmd]
- SVD exercises
- Projections [Rmd]
- Rotations [Rmd]
- MDS [Rmd]
- MDS exercises
- PCA [Rmd]

## Chapter 9 – Practical Machine Learning

- Clustering and heatmaps [Rmd]
- Clustering and heatmaps exercises
- Conditional expectation [Rmd]
- Conditional expectation exercises
- Smoothing [Rmd]
- Smoothing exercises
- Machine learning [Rmd]
- Cross-validation [Rmd]
- Cross-validation exercises

## Chapter 10 – Batch Effects

- Introduction to batch effects [Rmd]
- Confounding [Rmd]
- Confounding exercises
- EDA with PCA [Rmd]
- EDA with PCA exercises
- Adjusting with linear models [Rmd]
- Adjusting with linear models exercises
- Factor analysis [Rmd]
- Factor analysis exercises
- Adjusting with factor analysis [Rmd]
- Adjusting with factor analysis exercises

## 525.5x: Introduction to Bioconductor: Annotation and analysis

### Setup and basics on biological background (Week 1)

- Installing Bioconductor and finding help [Rmd]
- Three data types: reference DNA sequence, DNA variants, and gene expression [Rmd]
- Mapping/alignment software (optional) [Rmd]

### Focus on data structure and management (Week 2)

- Management of genome-scale data: Object-oriented solutions [Rmd]
- SummarizedExperiment in depth [Rmd]
- Management and processing of large numbers of BED files [Rmd]

### Focus on genomic ranges (Week 3a)

### Focus on genomic annotation (Week 3b)

- General overview of Bioconductor annotation [Rmd]
- Cheat sheet on Bioconductor annotation [Rmd]
- Translating addresses between genome builds: liftOver [Rmd]

### Testing genome-scale hypotheses (Week 4)

- Biological vs. technical variability [Rmd]
- t tests and multiple comparisons [Rmd]
- Moderated t tests via limma [Rmd]
- Introducing gene sets and gene set analysis [Rmd]
- Gene set analysis using the roast algorithm [Rmd]

## 525.6x: High-performance computing for reproducible genomics with Bioconductor

### Visualization of genome scale data (Week 1)

- Sketching the binding landscape over chromosomes with ggbio’s karyogram layout [Rmd]
- Plotting data in the context of genomic features with Gviz [Rmd]
- Visualizing NGS data [Rmd]
- Graphical user interfaces for multivariate data with shiny [Rmd]
- Clustering gene expression data with shiny [Rmd]
- Final remarks on visualization [Rmd]

### Scalable genomic analysis (Week 2)

- Overview of BiocParallel usage [Rmd]
- Introduction to Bioconductor’s Amazon Machine Image for cluster creation and use in EC2 [Rmd]
- Interfacing to external data: SQLite, tabix, HDF5 [Rmd]
- Benchmarking various out-of-memory solutions [Rmd]
- Sharded GRanges for scalable integrative analysis [Rmd]

### Multi-omic data integration (Week 3)

- Basic examples of multi-omic integration [Rmd]
- Using RTCGAToolbox outputs to integrate clinical, mutation, expression and methylation assays [Rmd]
- TCGA application: kataegis and rainfall plot [Rmd]

### Fostering reproducible genome-scale analysis (Week 4)

## Legacy material from 2015 Introduction to Bioconductor

- Installing Bioconductor and finding help
- Annotating phenotypes and molecular function
- The ExpressionSet Container
- IRanges and GRanges
- Operating on GRanges
- Cheat sheet for genomic annotation
- Translating addresses between reference builds with liftOver
- Cheat sheet for GRanges and other Bioconductor objects
- Importing NGS data with Bioconductor
- NGS read counting
- Technical versus biological variability
- Statistical Inference with Bioconductor
- Using limma
- Gene Sets Analysis
- Gene Sets Analysis in R

## RNA-seq data analysis

- Downloading and unzipping fastq files
- Genomic alignment with STAR
- Transcriptome alignment with RSEM
- RNA-seq at the gene-level: EDA, DE and SVA
- Differential exon usage
- Exploring plots of isoform-level expression with Cufflinks/cummeRbund

## Variant Discovery and Genotyping

- Genome variation from 2014

## ChIP-seq data analysis

- ChIP-seq from 2014

## DNA methylation data analysis

- DNA Methylation Data Analysis
- Reading 450K idat files with the minfi package
- Interactive visualization of DNA methylation data analysis
- Statistical Inference in the Analysis of DNA methylation Data

## Footnotes for all lectures

## Acknowledgments