Final Exam Info and Study Guide

The final is NOT comprehensive. It will focus on topics covered since the midterm, but because the class is cumulative, some questions may require you to apply concepts from earlier in the class.

The final will be in two parts: a take-home part and an in-class part. The take-home part will be open book/web and unlimited time. The in-class part will be closed book but you can bring one hand-written page of notes.

In Class part

This will be a concept and knowledge-based exam and will not include true scripting/coding. You may be asked to pseudo-code or to find a code error.

You can bring one hand-written page of notes.

Study guide

Example Questions

Question 1 Consider the following data from an Illumina sequencing experiment:

A00887:346:H2VK2DSX2:1:1141:3884:31986	163	scaffold_0	43675	255	70M89N73M	=	43736	293
CATTTCTCACCTCCTCAAGGCAACTTTCAAGCTCCTTCAATTCTTCATCCTCCGAGAAGCTCACTGTGGCTTGTTTGATTGTGTTCTTCAAATGCATCTCAGCACTAAAGAGCTCTCGCCTGCTTCCTGTGGACACTGAGATC
FI,5FFFFFFFFFFFFFFFFFFF,FFFFFF:FFFFFF,FF:FFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF,FF:FFFFFFFFFFFFFF:
NH:i:1	HI:i:1	AS:i:276	nM:i:6

A: (2pts) The data above comes from what kind of file?

B: (3pts) Is the sequencing quality encoded in Phred+33 or Phred+64? (Note: I would give you this chart for the exam.)

C: (2pts) Convert the quality of the fourth base to a Phred score.

D: (2pts) Convert the Phred score from part C to a p-value. If you aren’t sure how to do this, give the formula.

E: (3pts) Explain what the probability from D means in terms of confidence in the sequence. If you are stuck on C or D, pick a p-value to illustrate (for example, what does a p-value of 0.01 mean in this context?).

F: (2pts) What does the number “43675” refer to?

G: (2pts) What does the number “255” refer to?

H: (2pts) What does “70M89N73M” mean?
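
For self-checking parts C and D while studying, the two conversions can be sketched in Python. This is a minimal sketch, not course-provided code: the function names are made up for illustration, the default offset of 33 is only an assumption (replace it with whatever you decide in part B), and the example quality character and CIGAR string are invented rather than taken from the read above. The last helper only splits a CIGAR string into (length, operation) pairs; interpreting the operations is still up to you.

```python
import re

def phred_score(qual_char, offset=33):
    """Convert one quality character to a Phred score Q.

    offset=33 is an assumption for illustration; pass offset=64 for Phred+64.
    """
    return ord(qual_char) - offset

def error_probability(q):
    """Convert a Phred score Q to a probability using P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def split_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs without interpreting them."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

# Invented example values, not the ones from the exam question.
example_char = "?"
q = phred_score(example_char)
p = error_probability(q)
print(f"char={example_char!r} Q={q} P={p}")
print(split_cigar("5M2I3M"))
```

If you get stuck on part D, the formula in `error_probability` is the one to write down.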

Question 2 You have performed a differential gene expression analysis to find genes expressed differentially in the developing petals of red vs white roses. You have a list of 963 genes that are higher in red petals and 1204 genes that are higher in white petals. Briefly describe three follow-up analyses you could do with these gene lists to try to understand the biological basis for the differences in petal color. For each analysis, explain what you would hope to learn. Write two sentences for each analysis.

Question 3 Examine the gene co-expression network graph below.

A: What is a node in this graph? Describe what the nodes in this graph represent.

B: What is an edge in this graph? Describe what the edges in this graph represent.

C: Which gene has the highest degree centrality? Explain your choice.

D: Which gene has the highest betweenness centrality? Explain your choice.
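
To practice telling the two centrality measures in parts C and D apart, here is a minimal pure-Python sketch on an invented network. The gene names and edges are hypothetical and unrelated to the exam graph. Degree centrality is computed as degree divided by (n - 1), and betweenness is computed as an unnormalized count of shortest paths passing through each node, using a breadth-first version of Brandes' algorithm. In practice a library such as NetworkX would normally be used; this sketch only shows that the gene with the most connections is not necessarily the gene that bridges the most shortest paths.

```python
from collections import deque

# Hypothetical co-expression edges: two tight clusters joined through geneD.
edges = [
    ("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneH"),
    ("geneB", "geneC"), ("geneB", "geneH"), ("geneC", "geneH"),
    ("geneA", "geneD"), ("geneD", "geneE"),
    ("geneE", "geneF"), ("geneE", "geneG"), ("geneF", "geneG"),
    ("geneF", "geneI"), ("geneG", "geneI"),
]

# Build an undirected adjacency list.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}                      # BFS distance from s
        sigma = {v: 0 for v in adj}        # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}       # predecessors on shortest paths
        order = []                         # nodes in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}      # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

deg = degree_centrality(adj)
btw = betweenness_centrality(adj)
top_degree = max(deg, key=deg.get)
top_betweenness = max(btw, key=btw.get)
print("highest degree centrality:", top_degree)
print("highest betweenness centrality:", top_betweenness)
```

In this invented network the hub of the larger cluster has the most edges, while the gene linking the two clusters carries the most shortest paths even though it has only two edges.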

Question 4 You are examining a .vcf file. On different rows you see the following entries. What does each of these mean?

A: 0/1 B: 1/1 C: 0/0

Question 5 You have performed a metagenomics analysis of the microbial communities in the soil of a tomato farm and the soil of a nearby grassland. You find that the alpha diversity of the tomato farm soil is much lower than that of the grassland soil. What does this mean? What could be some reasons for this difference?
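
As a reminder of what an alpha diversity number summarizes, here is a minimal Python sketch that computes two common within-sample measures, richness and the Shannon index, from taxon counts. The counts are invented for illustration, the function names are hypothetical, and the sketch assumes the Shannon index with the natural logarithm; the metric used in class or on the exam may differ.

```python
import math

def richness(counts):
    """Number of taxa observed at least once in a sample."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Invented read counts per taxon, one list per soil sample.
grassland_counts = [25, 25, 25, 25]   # four taxa, evenly represented
farm_counts = [85, 5, 5, 5]           # same four taxa, one dominates

for name, counts in [("grassland", grassland_counts), ("farm", farm_counts)]:
    print(name, "richness:", richness(counts),
          "Shannon:", round(shannon_index(counts), 3))
```

Both invented samples have the same richness, so the difference in the Shannon index here comes entirely from evenness. Lower alpha diversity can reflect fewer taxa, a less even community, or both.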

Note: not all topics covered in the final will be represented in the example questions above. The example questions are meant to give you a sense of the types of questions that will be on the exam, but they are not meant to be an exhaustive list of topics or question types.

Take Home

There is no study guide for this part; it will be open book/web with unlimited time. You will be given data sets and asked to do analyses similar to what you have done in the labs and assignments.