Julin Maloof
Goal: find genes whose transcript levels differ between samples (differential expression):
Whiteboard…
One common method of RNAseq quantification is RPKM (or FPKM): Reads (or Fragments) Per Kilobase of gene length per Million mapped reads.
Don't use it (at least not for statistical analysis).
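For reference, the RPKM arithmetic for a single gene is sketched below in Python, which is used here only for illustration and is not part of the edgeR workflow. The function name `rpkm` and the example numbers are hypothetical. FPKM is the same calculation with fragments (read pairs) in place of reads.

```python
def rpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of gene length per Million mapped reads."""
    kilobases = gene_length_bp / 1_000                # gene length in kb
    millions_mapped = total_mapped_reads / 1_000_000  # library size in millions of reads
    return count / kilobases / millions_mapped


if __name__ == "__main__":
    # Hypothetical gene: 500 reads on a 2,000 bp gene, 10 million mapped reads in the library
    print(rpkm(500, 2_000, 10_000_000))  # 500 / 2 kb / 10 million -> 25.0
```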
Whiteboard…
Problems:
Whiteboard…
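The specific problems were worked through on the whiteboard. As one illustration (not necessarily the list from lecture), the sketch below shows a well-known weakness of scaling by total mapped reads: if one gene takes a larger share of a library, every other gene receives fewer reads, and per-million scaling reports those genes as lower even when their true abundance is unchanged. The libraries, gene names, and counts are hypothetical, and gene length is left out because it cancels when the same gene is compared across samples.

```python
def per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale each gene's count by the library's total mapped reads (per million)."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical libraries with the same sequencing depth (1,000 mapped reads).
# Assume genes A and B have the same true abundance in both samples; only gene C
# is far more highly expressed in sample 2, so it takes a bigger share of the reads.
lib1 = {"A": 100, "B": 100, "C": 800}
lib2 = {"A": 10, "B": 10, "C": 980}

if __name__ == "__main__":
    norm1, norm2 = per_million(lib1), per_million(lib2)
    for gene in lib1:
        # gene, scaled value in library 1, scaled value in library 2, fold difference
        print(gene, norm1[gene], norm2[gene], norm1[gene] / norm2[gene])
```

Here gene A appears 10-fold lower in library 2 although, by construction, only gene C changed. edgeR instead works on raw counts and estimates library-specific normalization factors (TMM by default in `calcNormFactors`) intended to reduce this kind of composition bias.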
In edgeR, a model matrix is created to describe the possible experimental effects on gene expression.
Our additive model matrix will look like this:
Sample | (Intercept) | gtR500 | trtDP
---|---|---|---
IMB211_DP_1 | 1 | 0 | 1
IMB211_DP_2 | 1 | 0 | 1
IMB211_DP_3 | 1 | 0 | 1
IMB211_NDP_1 | 1 | 0 | 0
IMB211_NDP_2 | 1 | 0 | 0
IMB211_NDP_3 | 1 | 0 | 0
R500_DP_1 | 1 | 1 | 1
R500_DP_2 | 1 | 1 | 1
R500_DP_3 | 1 | 1 | 1
R500_NDP_1 | 1 | 1 | 0
R500_NDP_2 | 1 | 1 | 0
R500_NDP_3 | 1 | 1 | 0
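The table can be generated directly from the sample names. The sketch below is a hypothetical Python helper (`additive_design` is not an edgeR function) that reproduces it, assuming IMB211 and NDP are the reference (baseline) levels, as the column names gtR500 and trtDP imply. In R, `model.matrix(~gt + trt)` builds the same kind of matrix from factor columns.

```python
# Sample names in the same order as the table: genotype_treatment_replicate
SAMPLES = [
    f"{gt}_{trt}_{rep}"
    for gt in ("IMB211", "R500")
    for trt in ("DP", "NDP")
    for rep in (1, 2, 3)
]


def additive_design(samples: list[str]) -> dict[str, tuple[int, int, int]]:
    """Build rows of the additive design matrix: (Intercept, gtR500, trtDP)."""
    rows = {}
    for name in samples:
        gt, trt, _rep = name.split("_")
        rows[name] = (
            1,                         # intercept: present in every sample
            1 if gt == "R500" else 0,  # gtR500: 1 for R500, 0 for the reference IMB211
            1 if trt == "DP" else 0,   # trtDP: 1 for DP, 0 for the reference NDP
        )
    return rows


if __name__ == "__main__":
    print("Sample | (Intercept) | gtR500 | trtDP")
    for name, row in additive_design(SAMPLES).items():
        print(name, *row, sep=" | ")
```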
The 1s and 0s specify which effects are present in each sample and are used in the statistical model:
expression ∼ intercept + gt + trt
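Which is shorthand for
expression ∼ intercept + gtR500_Effect × gtR500 + trtDP_Effect × trtDP
where gtR500 and trtDP are the 0/1 values from the model matrix and each _Effect is a coefficient estimated from the data.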
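To see what the additive model implies for each group, the sketch below plugs hypothetical coefficient values into the formula for every genotype × treatment combination. In edgeR's generalized linear models this linear combination is on the log scale of expected counts; the numbers here are illustrative only and are not estimates from real data.

```python
# Hypothetical coefficient values, chosen only for illustration (not estimates from real data)
coef = {"intercept": 5.0, "gtR500": 1.5, "trtDP": -0.5}


def predict(gtR500: int, trtDP: int, coef: dict[str, float]) -> float:
    """Additive model: intercept + gtR500_Effect * gtR500 + trtDP_Effect * trtDP."""
    return coef["intercept"] + coef["gtR500"] * gtR500 + coef["trtDP"] * trtDP


if __name__ == "__main__":
    for gt_label, gt in (("IMB211", 0), ("R500", 1)):
        for trt_label, trt in (("NDP", 0), ("DP", 1)):
            print(gt_label, trt_label, predict(gt, trt, coef))
```

This prints 5.0, 4.5, 6.5 and 6.0 for IMB211 NDP, IMB211 DP, R500 NDP and R500 DP. Because the model is additive, the DP effect (-0.5 here) is identical in both genotypes; letting the treatment effect differ by genotype would require an interaction term.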
To test whether a factor is an important determinant of gene expression:
Compare models with and without that term.
For example, to test if genotype is important, we would compare
(full model) expression ∼ intercept + gt + trt
to
(reduced model) expression ∼ intercept + trt
If the full model fits the data significantly better than the reduced model, then we conclude that genotype is important.
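edgeR carries out this comparison for every gene by fitting negative binomial GLMs and, for example, testing the dropped coefficient with `glmLRT`. As a simplified stand-in, the sketch below swaps in an ordinary least-squares (Gaussian) model on hypothetical log-scale values for a single gene and computes a likelihood ratio statistic for the full versus the reduced model. The helper names and data are hypothetical; the point is the logic of a nested-model comparison, not reproducing edgeR's results.

```python
import math


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(design: list[tuple], y: list[float]) -> float:
    """Residual sum of squares of the least-squares fit of y on the design columns."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def lrt_gaussian(full: list[tuple], reduced: list[tuple], y: list[float]) -> tuple[float, float]:
    """Likelihood ratio statistic and p-value for nested Gaussian models.

    The p-value uses a chi-square with 1 df, so it is only valid when the full
    model has exactly one more column than the reduced model.
    """
    n = len(y)
    stat = max(n * math.log(rss(reduced, y) / rss(full, y)), 0.0)  # clamp rounding noise
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
    return stat, p_value


# Hypothetical log-scale expression values for ONE gene, three replicates per group.
# Keys are the full-model design rows (intercept, gtR500, trtDP).
toy = {
    (1, 0, 1): [4.4, 4.5, 4.6],  # IMB211 DP
    (1, 0, 0): [4.9, 5.0, 5.1],  # IMB211 NDP
    (1, 1, 1): [5.9, 6.0, 6.1],  # R500 DP
    (1, 1, 0): [6.4, 6.5, 6.6],  # R500 NDP
}
full = [row for row, values in toy.items() for _ in values]
y = [v for values in toy.values() for v in values]
reduced = [(row[0], row[2]) for row in full]  # drop the gtR500 column

if __name__ == "__main__":
    stat, p = lrt_gaussian(full, reduced, y)
    print(f"LR statistic = {stat:.2f}, p = {p:.2e}")
```

A large statistic (small p-value) means that dropping genotype makes the fit much worse, so genotype matters for this toy gene. The chi-square distribution with 1 degree of freedom is used because the two models differ by one column (gtR500).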