RNAseq: Differential Expression

Julin Maloof

Find genes that have different transcript levels:

In theory: simple.
Genes expressed at higher levels in the plants should generate more sequence fragments.
Therefore we can count the number of reads mapping to a particular gene.
- If IMB211 has more counts than R500 for a particular gene, that may indicate higher expression in IMB211.
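As a minimal sketch of the counting idea, the snippet below tallies reads per gene for two samples. The read-to-gene assignments and gene names are made up for illustration; in practice they would come from an alignment step.

```python
from collections import Counter

# Hypothetical read-to-gene assignments, one entry per mapped read.
imb211_reads = ["geneA", "geneB", "geneA", "geneA", "geneC"]
r500_reads = ["geneA", "geneB", "geneB", "geneC"]

def count_reads(assignments):
    """Count how many reads were assigned to each gene."""
    return Counter(assignments)

imb211_counts = count_reads(imb211_reads)
r500_counts = count_reads(r500_reads)

# Print the raw counts side by side for every gene seen in either sample.
for gene in sorted(set(imb211_counts) | set(r500_counts)):
    print(gene, imb211_counts[gene], r500_counts[gene])
```

A raw difference like this is only suggestive: it does not yet account for library size, replication, or the count distribution.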

Whiteboard…

RNAseq count data is not normally distributed

What if you have more overall reads in one library as compared to the other?

What if one gene is expressed so highly in one sample that it dominates the read counts?
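Both questions can be illustrated with a small sketch. Scaling each library to counts per million corrects for sequencing depth, but a single dominant gene can still distort the scaled values of every other gene. The counts below are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}

# Hypothetical libraries. library_2 was sequenced more deeply, and geneC is
# so highly expressed there that it takes up most of the reads.
library_1 = {"geneA": 100, "geneB": 200, "geneC": 700}
library_2 = {"geneA": 200, "geneB": 400, "geneC": 9400}

cpm_1 = counts_per_million(library_1)
cpm_2 = counts_per_million(library_2)

# geneA and geneB keep the same ratio to each other in both libraries,
# yet after scaling both look lower in library_2 because of geneC.
for gene in library_1:
    print(gene, round(cpm_1[gene]), round(cpm_2[gene]))
```

Here geneA has more raw reads in library_2, but a lower scaled value, which is why a simple total-count scaling may not be enough.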

Small number of replicates (~ 3 per sample type), large number of tests (30,000 - 40,000)
- Also need multiple testing correction
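The slide does not name a correction method. One widely used choice is the Benjamini-Hochberg false discovery rate adjustment, sketched below as an assumption rather than as the method used in this course. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

With 30,000 to 40,000 tests, the adjustment matters far more than in this four-gene toy case.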

One common method of RNAseq quantification is Reads (or Fragments) Per Kilobase of gene length per Million reads mapped (RPKM or FPKM)

Don't use it (at least for statistical analysis)

Whiteboard…

Problems:

- 1 read in a 100bp gene and 10 reads in a 1000bp gene both have the same RPKM. (Why is this a problem?)
- Assumes that a uniform normalization is appropriate
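The first problem can be checked with a short calculation. The sketch below uses the standard RPKM definition and assumes a library of one million mapped reads, a number chosen only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Assumed library size, for illustration only.
total_mapped = 1_000_000

short_gene = rpkm(read_count=1, gene_length_bp=100, total_mapped_reads=total_mapped)
long_gene = rpkm(read_count=10, gene_length_bp=1000, total_mapped_reads=total_mapped)

print(short_gene, long_gene)
```

Both genes get the same value even though one estimate rests on a single read and the other on ten. RPKM hides that difference in evidence, which a statistical test on raw counts can use.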

Whiteboard…

In edgeR a model matrix is created to describe the possible experimental effects on gene expression.

Our additive model matrix has one row per sample and one column each for the intercept, genotype (gtR500), and treatment (trtDP).

The 1s and 0s specify which effects are present in each sample and are used in the statistical model:

\[ expression \sim intercept + gt + trt \]

Which is shorthand for

\[ expression \sim intercept + gtR500\_Effect*gtR500 + trtDP\_Effect*trtDP \]
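To make the shorthand concrete, here is a minimal sketch that builds the 1s and 0s for four hypothetical samples and evaluates the expanded formula. The sample layout, the untreated label "NDP", and the effect sizes are assumptions for illustration. This is plain Python, not edgeR code.

```python
def additive_model_matrix(samples):
    """Return rows of [intercept, gtR500, trtDP] for (genotype, treatment) pairs."""
    return [[1, int(gt == "R500"), int(trt == "DP")] for gt, trt in samples]

def predicted_expression(row, intercept, gtR500_effect, trtDP_effect):
    """Evaluate intercept + gtR500_Effect*gtR500 + trtDP_Effect*trtDP for one sample."""
    return intercept * row[0] + gtR500_effect * row[1] + trtDP_effect * row[2]

# Hypothetical samples: one per genotype-by-treatment combination.
# "NDP" is an assumed label for the reference (non-DP) treatment.
samples = [("IMB211", "NDP"), ("IMB211", "DP"), ("R500", "NDP"), ("R500", "DP")]
matrix = additive_model_matrix(samples)

for sample, row in zip(samples, matrix):
    # The effect sizes 5.0, 2.0 and -1.0 are made up for illustration.
    print(sample, row, predicted_expression(row, 5.0, 2.0, -1.0))
```

A sample with IMB211 and the reference treatment has 0 in both effect columns, so its predicted expression is just the intercept.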

To test whether a factor is an important determinant of gene expression:

Compare models with and without that term.

For example, to test if genotype is important, we would compare

(full model) \( expression \sim intercept + gt + trt \)

(reduced model) \( expression \sim intercept + trt \)

If the full model fits the data significantly better than the reduced model, then we conclude that genotype is important.
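The comparison can be sketched as a likelihood ratio test for a single gene. This is a simplified illustration, not edgeR's procedure: it assumes Poisson counts (edgeR uses a negative binomial model) and a balanced design with every genotype-by-treatment combination present, which lets the fitted means be written in closed form. All counts are hypothetical.

```python
import math

def poisson_loglik(counts, means):
    """Poisson log-likelihood of the counts given their fitted means."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(counts, means))

def lrt_genotype(data):
    """Compare intercept + gt + trt against intercept + trt for one gene.

    data maps (genotype, treatment) to a list of replicate counts.
    Assumes equal replicate numbers per cell and non-zero totals.
    """
    gts = sorted({g for g, _ in data})
    trts = sorted({t for _, t in data})
    n_rep = len(next(iter(data.values())))
    assert all(len(reps) == n_rep for reps in data.values())
    grand = sum(sum(reps) for reps in data.values())
    gt_tot = {g: sum(sum(data[(g, t)]) for t in trts) for g in gts}
    trt_tot = {t: sum(sum(data[(g, t)]) for g in gts) for t in trts}
    ys, mu_full, mu_reduced = [], [], []
    for (g, t), reps in data.items():
        for y in reps:
            ys.append(y)
            # Full model: fitted means reproduce genotype and treatment totals.
            mu_full.append(gt_tot[g] * trt_tot[t] / (grand * n_rep))
            # Reduced model: fitted mean depends on treatment only.
            mu_reduced.append(trt_tot[t] / (len(gts) * n_rep))
    stat = 2 * (poisson_loglik(ys, mu_full) - poisson_loglik(ys, mu_reduced))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Chi-square tail probability with 1 degree of freedom (two genotypes).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts for one gene, three replicates per combination.
gene_counts = {
    ("IMB211", "NDP"): [10, 12, 11],
    ("IMB211", "DP"): [20, 22, 21],
    ("R500", "NDP"): [30, 33, 31],
    ("R500", "DP"): [60, 66, 63],
}
stat, p_value = lrt_genotype(gene_counts)
print(stat, p_value)
```

A large statistic and small p-value mean the genotype term improves the fit more than chance alone would explain. The 1 degree of freedom applies because the two models differ by a single coefficient.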