Julin Maloof
Find genes that have different transcript levels:
Whiteboard…
One common method of RNAseq quantification is Reads (or Fragments) Per Kilobase of gene length per Million reads mapped (RPKM or FPKM).
Don't use it (at least for statistical analysis).
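For reference, the quantity itself is simple arithmetic. A minimal sketch in Python; the function name and the example numbers are made up for illustration and do not come from any real data set:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_mapped)


# Hypothetical gene: 500 reads on a 2,000 bp gene,
# in a library with 10 million mapped reads.
example_value = rpkm(500, 2_000, 10_000_000)
print(example_value)  # 25.0
```

Note that the result depends on the total number of mapped reads in the library, not only on the gene itself.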
Whiteboard…
Problems:
Whiteboard…
In edgeR, a model matrix is created to describe the possible experimental effects on gene expression.
Our additive model matrix will look like this:
Sample | (Intercept) | gtR500 | trtDP |
---|---|---|---|
IMB211_DP_1 | 1 | 0 | 1 |
IMB211_DP_2 | 1 | 0 | 1 |
IMB211_DP_3 | 1 | 0 | 1 |
IMB211_NDP_1 | 1 | 0 | 0 |
IMB211_NDP_2 | 1 | 0 | 0 |
IMB211_NDP_3 | 1 | 0 | 0 |
R500_DP_1 | 1 | 1 | 1 |
R500_DP_2 | 1 | 1 | 1 |
R500_DP_3 | 1 | 1 | 1 |
R500_NDP_1 | 1 | 1 | 0 |
R500_NDP_2 | 1 | 1 | 0 |
R500_NDP_3 | 1 | 1 | 0 |
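A matrix like the one above can be generated from the sample labels. The following is a minimal Python sketch, assuming (as the column names gtR500 and trtDP suggest) that IMB211 and NDP are the reference levels; the helper name `design_row` is hypothetical. In R, a matrix of this kind is typically produced with `model.matrix()`.

```python
genotypes = ["IMB211", "R500"]
treatments = ["DP", "NDP"]
replicates = [1, 2, 3]


def design_row(gt, trt):
    # Treatment (dummy) coding: every sample gets the intercept,
    # plus a 1 for each non-reference level it carries.
    return [1, int(gt == "R500"), int(trt == "DP")]


# Sample name -> [(Intercept), gtR500, trtDP]
design = {
    f"{gt}_{trt}_{rep}": design_row(gt, trt)
    for gt in genotypes
    for trt in treatments
    for rep in replicates
}

print(f"{'Sample':<14}(Intercept)  gtR500  trtDP")
for sample, (intercept, gt_r500, trt_dp) in design.items():
    print(f"{sample:<14}{intercept:^11}  {gt_r500:^6}  {trt_dp:^5}")
```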
The 1s and 0s specify which effects are present in each sample and are used in the statistical model:
\[ expression \sim intercept + gt + trt \]
Which is shorthand for
\[ expression \sim intercept + gtR500\_Effect \times gtR500 + trtDP\_Effect \times trtDP \]
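To see how the 1s and 0s switch effects on and off, here is a small Python sketch that evaluates the expanded formula for each of the four groups. The coefficient values are invented for illustration (arbitrary expression units); they are not estimates from any data.

```python
# Hypothetical coefficients for a single gene (made-up numbers)
intercept = 5.0       # expected expression in the reference group (IMB211, NDP)
gtR500_effect = 2.0   # change associated with the R500 genotype
trtDP_effect = -1.0   # change associated with the DP treatment


def expected_expression(gtR500, trtDP):
    # gtR500 and trtDP are the 0/1 entries from the model matrix
    return intercept + gtR500_effect * gtR500 + trtDP_effect * trtDP


groups = {
    "IMB211_NDP": (0, 0),
    "IMB211_DP": (0, 1),
    "R500_NDP": (1, 0),
    "R500_DP": (1, 1),
}

expected = {name: expected_expression(gt, trt) for name, (gt, trt) in groups.items()}
for name, value in expected.items():
    print(name, value)
# IMB211_NDP 5.0
# IMB211_DP 4.0
# R500_NDP 7.0
# R500_DP 6.0
```

With additive coding, the R500_DP group simply receives both effects on top of the intercept.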
To test whether a factor is an important determinant of gene expression, compare models with and without that term.
For example, to test if genotype is important, we would compare
(full model) \( expression \sim intercept + gt + trt \)
to
(reduced model) \( expression \sim intercept + trt \)
If the full model fits the data significantly better than the reduced model, then we conclude that genotype is important.
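The comparison can be illustrated with a likelihood ratio test. The sketch below is a simplification under stated assumptions: edgeR models counts with a negative binomial distribution, whereas this example swaps in a plain Poisson model (so it ignores overdispersion), and it relies on the closed-form Poisson fit that exists for a balanced two-factor design. The counts are made up for one hypothetical gene.

```python
import math

# Made-up counts for one hypothetical gene: (genotype, treatment) -> 3 replicates
counts = {
    ("IMB211", "DP"): [20, 24, 22],
    ("IMB211", "NDP"): [30, 28, 33],
    ("R500", "DP"): [41, 45, 38],
    ("R500", "NDP"): [62, 58, 60],
}
n_rep = 3
n_genotypes = 2


def total(index, level):
    # Sum of counts over all samples whose factor at `index` equals `level`
    return sum(sum(ys) for cell, ys in counts.items() if cell[index] == level)


def log_likelihood(fitted):
    # Poisson log-likelihood, dropping the log(y!) term shared by both models
    return sum(
        y * math.log(fitted[cell]) - fitted[cell]
        for cell, ys in counts.items()
        for y in ys
    )


grand = sum(sum(ys) for ys in counts.values())

# Full model (gt + trt): balanced design, so the fitted mean per sample is
# (genotype total * treatment total) / (grand total * replicates per cell)
full = {(g, t): total(0, g) * total(1, t) / (grand * n_rep) for g, t in counts}

# Reduced model (trt only): each sample gets the mean of its treatment group
reduced = {(g, t): total(1, t) / (n_genotypes * n_rep) for g, t in counts}

lr_stat = 2 * (log_likelihood(full) - log_likelihood(reduced))

# One term (gt) was dropped, so compare to a chi-square with 1 degree of freedom;
# for 1 df the upper tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.3g}")
```

A small p-value here means the model that includes genotype fits these toy counts significantly better than the model without it.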