Midterm Info and Study Guide

The midterm will be in two parts.

Part one will be in class and closed book, but you can bring one hand-written page of notes.

Part two will be take home and open book.

Part One–In Class

This will be a concept and knowledge-based exam and will not include true scripting/coding. You may be asked to pseudo-code or to find a code error.

You can bring one hand-written page of notes.

Study guide

  • Know the general format of Linux and R commands.
  • Be able to explain / interpret and create relative and absolute file paths.
  • Know which Linux commands you would use to
    • rename a file
    • copy a file
    • move a file
    • navigate the file system
    • look at file size
    • see the contents of a file
  • Know how to create and access variables in Linux. What does $ do in Linux?
  • What does $(CMD) do in Linux? (Where CMD is a Linux command).
  • Know the difference between git add, git commit, git pull and git push.
  • Know the structure of a for loop in Linux. Be able to pseudo-code a for loop.
  • BLAST
    • What is the format of a FASTA file?
    • What are the different BLAST programs and how do you choose?
    • What is word size, and how do you choose?
    • What is an e-value and how do you interpret it?
  • Be able to interpret a PCA plot of SNP data.
  • How can you tell if population structure is a problem for your trait?
  • Be able to interpret a phylogenetic tree.
  • Explain how information is passed from the UI to the server in a Shiny app.
  • How does GWAS work?
    • What is a QQ plot? Be able to interpret one.
    • What is population structure and how does it affect GWAS?
    • Be able to interpret a Manhattan plot.
  • Genome Assembly
    • What does BUSCO stand for, and how is it used to evaluate genome assemblies?
    • What is N50 and how is it calculated? How is it used to evaluate genome assemblies?
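As a study aid for the shell items in the list above, here is a minimal sketch of creating and reading a variable, capturing command output with $(CMD), and looping over files. The directory and file names (workdir, sample_a.fasta, sample_b.fasta) are made up for illustration and are not course files; this is one way to write it, not the only one.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and FASTA files, created only for this demo.
workdir=$(mktemp -d)            # $(CMD) runs CMD and substitutes its output
printf '>seq1\nACGT\n' > "$workdir/sample_a.fasta"
printf '>seq2\nGGCC\n' > "$workdir/sample_b.fasta"

# Create a variable with NAME=value (no spaces around =); read it with $NAME.
species="monarch"
echo "Species: $species"

# Command substitution: store the output of a pipeline in a variable.
n_files=$(ls "$workdir"/*.fasta | wc -l)
echo "Number of FASTA files: $n_files"

# for loop: the loop variable is named without $, and read back with $.
for fasta in "$workdir"/*.fasta
do
    # Count header lines (those starting with >) to get the sequence count.
    n_seqs=$(grep -c '^>' "$fasta")
    echo "$(basename "$fasta") contains $n_seqs sequence(s)"
done

# Clean up the scratch directory.
rm -r "$workdir"
```

Note where $ appears and where it does not: a variable is assigned by its bare name and its value is retrieved by putting $ in front of the name.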

Example Questions

1A: You have cloned a gene of interest from the Eastern Monarch Butterfly population and you want to find its ortholog in the Western Monarch Butterfly via a nucleotide BLAST search. You expect the sequences to be very similar. Considering the word sizes we looked at in the BLAST lab, what word size would you use? Why? (Conceptual question, no code needed)

1B: Will your choice in part 1A cause an increased or decreased sensitivity (ability to detect distant homologs)? Why? (Conceptual question, no code needed)

1C: If you wanted to BLAST all of the genes in the Eastern Monarch Butterfly genome against those in the Western Monarch Butterfly genome to find close orthologs, would that change your word size choice?

2 You want to rename the file monarch_genome.fasta to monarch_genome_v1.fasta. What Linux command would you use? Include the command and how the file names would be used in the command.

3 What git commands would you use to update your local repository with the latest changes from the remote repository, and then to submit your changes to the remote repository?

4 Compare and contrast git pull and git clone: how are these related and what makes them different?

5 You have 100 FASTA files in a directory. You want to run a separate BLAST search on each of them. Write a pseudo-code for loop that would accomplish this task. You do not need to write the actual BLAST command, just the structure of the loop and how you would use the file names in the loop.

6 The code below does not work. What is the error?

testFiles=$(ls test*.txt)

for file in $testFiles
    do
        cat file
    done

The output is

cat: file: No such file or directory
cat: file: No such file or directory
cat: file: No such file or directory

Note: not all topics covered in the midterm will be represented in the example questions above. The example questions are meant to give you a sense of the types of questions that will be on the exam, but they are not meant to be an exhaustive list of topics or question types.

Part Two–Take Home

No study guide for this; it will be open book/web and unlimited time. You will be given data sets and asked to do analyses similar to what you have done in the labs and assignments.

Welcome to BIS180L

Dear Students,

Welcome to BIS180L. I am looking forward to teaching you in this class. This class will give you hands-on experience with bioinformatics data analysis.

Class will meet in person in TLC 2216.

Pre-recorded Lectures

For the most part, I will provide lectures in a pre-recorded format. This will allow the best use of our time together in the classroom. Pre-recorded lectures will be available no later than 9 AM the day before the lecture/lab time. You will need to watch them in advance and answer the embedded quiz questions via PlayPosit. The quiz is due at 9 AM the day of lecture. This gives me and the TA time to review questions before class.

There is no video due for the March 31 class.

Class meeting time

Even though lectures are pre-recorded, we will still meet at 1:10 PM.

In class

We will devote the beginning of each class / lab to questions on the lecture material. You will then work on the lab material in small groups of two or three. John (TA) and I will be available to help when you have questions.

I expect you to be in class/lab each day at 1:10. If you have a Covid or other health emergency that prevents you from coming to class, please let me know (in advance if possible).

Outside of class

You can expect to spend a fair bit of time outside of class working on the assignments. You can do this either in one of the computer labs or on your laptop. If you are not willing to put in time outside of class, this is probably not the best class for you. If you do put in the time outside of class, you will be rewarded by learning a great deal about how to perform bioinformatics analyses.

Reading Material / Text Book

We will use Vince Buffalo’s excellent book Bioinformatics Data Skills.

This book is available online for free through the UC Davis Library (on campus or VPN connection required). UC is licensed for simultaneous access by 28 people. If that doesn’t work for you, you can buy it directly from the publisher or from Amazon. As of this posting, Amazon is cheaper and has used copies as well.

Additionally, we will use Hadley Wickham’s also excellent book R for Data Science, which is available online for free. If you would like a physical copy, here is the Amazon link.