Midterm Info and Study Guide
23 Apr 2026
The midterm will be in two parts.
Part one will be in class and closed book, but you can bring one hand-written page of notes.
Part two will be take home and open book.
Part One–In Class
This will be a concept and knowledge-based exam and will not include true scripting/coding. You may be asked to pseudo-code or to find a code error.
You can bring one hand-written page of notes.
Study guide
- Know the general format of Linux and R commands.
- Be able to explain / interpret and create relative and absolute file paths.
- Know which Linux commands you would use to:
  - rename a file
  - copy a file
  - move a file
  - navigate the file system
  - look at file size
  - see the contents of a file
- Know how to create and access variables in Linux. What does $ do in Linux?
- What does $(CMD) do in Linux? (Where CMD is a Linux command.)
- Know the difference between git add, git commit, git pull, and git push.
- Know the structure of a for loop in Linux. Be able to pseudo-code a for loop.
- BLAST
  - What is the format of a FASTA file?
  - What are the different BLAST programs and how do you choose?
  - What is word size, and how do you choose?
  - What is an e-value and how do you interpret it?
- Be able to interpret a PCA plot of SNP data.
- How can you tell if population structure is a problem for your trait?
- Be able to interpret a phylogenetic tree.
- Explain how information is passed from the UI to the server in a Shiny app.
- How does GWAS work?
- What is a QQ plot? Be able to interpret one.
- What is population structure and how does it affect GWAS?
- Be able to interpret a Manhattan plot.
- Genome Assembly
  - What does BUSCO stand for, and how is it used to evaluate genome assemblies?
  - What is N50 and how is it calculated? How is it used to evaluate genome assemblies?
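As a study aid for the N50 item above, here is a minimal sketch of one common way to compute N50: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running sum first reaches at least half of the total assembly size. The contig lengths are made-up toy values, not data from the course, and the variable names (lengths, total, running, n50) are only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical contig lengths (in bp) for a toy assembly; not real data.
lengths="80 70 50 40 30 20 10"

# Total assembly size is the sum of all contig lengths.
total=0
for len in $lengths
do
    total=$((total + len))
done

# Walk through the contigs from longest to shortest, keeping a running sum.
# N50 is the length of the contig that brings the running sum to at least
# half of the total assembly size.
running=0
n50=0
for len in $(echo $lengths | tr ' ' '\n' | sort -nr)
do
    running=$((running + len))
    if [ $((running * 2)) -ge "$total" ]
    then
        n50=$len
        break
    fi
done

echo "total=$total N50=$n50"
```

With these toy values the total is 300 bp, and the two longest contigs (80 + 70 = 150) already cover half of it, so the sketch reports an N50 of 70.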
Example Questions
1A: You have cloned a gene of interest from the Eastern Monarch Butterfly population and you want to find its ortholog in the Western Monarch Butterfly via a nucleotide BLAST search. You expect the sequences to be very similar. Considering the word sizes we looked at in the BLAST lab, what word size would you use? Why? (Conceptual question, no code needed)
1B: Will your choice in part 1A increase or decrease sensitivity (the ability to detect distant homologs)? Why? (Conceptual question, no code needed)
1C: If you wanted to BLAST all of the genes in the Eastern Monarch Butterfly genome against those in the Western Monarch Butterfly genome to find close orthologs, would that change your word size choice?
2 You want to rename the file monarch_genome.fasta to monarch_genome_v1.fasta. What Linux command would you use? Include the command and how the file names would be used in the command.
3 What git commands would you use to update your local repository with the latest changes from the remote repository, and then to submit your changes to the remote repository?
4 Compare and contrast git pull and git clone: how are these related and what makes them different?
5 You have 100 FASTA files in a directory. You want to run a separate BLAST search on each of them. Write a pseudo-code for loop that would accomplish this task. You do not need to write the actual BLAST command, just the structure of the loop and how you would use the file names in the loop.
6 The code below does not work. What is the error?
testFiles=$(ls test*.txt)
for file in $testFiles
do
cat file
done
The output is
cat: file: No such file or directory
cat: file: No such file or directory
cat: file: No such file or directory
Note: not all topics covered in the midterm will be represented in the example questions above. The example questions are meant to give you a sense of the types of questions that will be on the exam, but they are not meant to be an exhaustive list of topics or question types.
Part Two–Take Home
No study guide for this; it will be open book/web and unlimited time. You will be given data sets and asked to do analyses similar to what you have done in the labs and assignments.