Welcome to BIS180L

Julin Maloof
Lecture 01

Course Personnel

Instructor: Julin Maloof jnmaloof@ucdavis.edu

Teaching Assistant: Matt Davis mtdavis@ucdavis.edu

What this course is about

The goal of this course is to introduce you to the tools and thinking required for bioinformatics analysis.

This is a computer-based class. No bench work.

Learning Objectives

At the end of this course you should be able to:

  • Use cloud computing resources like Jetstream and AWS
  • Navigate the Linux operating system at the command line for file management and invocation of bioinformatics programs
  • Automate repetitive computational tasks using loops
  • Write scripts in R for data analysis and display
  • Perform genome-wide association studies (GWAS)
  • Analyze Illumina sequencing data to find sequence polymorphisms

Learning Objectives, Continued

  • Tentative: Perform a genome assembly
  • Tentative: Find methylated regions of the genome
  • Analyze RNA-seq data to find differentially expressed genes and pathways
  • Make and deploy Shiny web applications
  • Reconstruct genetic co-expression networks
  • Analyze metagenomics data
  • Apply best practices for reproducible data analyses via git version control and markdown notebooks

Amount of Work (Bad News / Good News)

  • This class will require a lot of work outside of lab hours.
  • But you will gain a solid foundation in bioinformatic analysis, focused on second-generation sequencing data.

Why R and Linux?

  • Both R and Linux have command-line interfaces
  • Antiquated?
  • NO! Written code provides flexibility, creativity, and power not available in any other way
  • Linux (or Unix / Mac)
    • Outstanding built-in tools for data crunching
    • Provides access to hundreds of bioinformatics programs
    • Easy to automate repetitive tasks

Why R and Linux?

  • R
    • Powerful scripting, statistical, data processing, and graphical capabilities
    • Many bioinformatics packages are developed in R
    • Easy to automate repetitive tasks

And

coding is cool

“Every student deserves the opportunity to learn computer programming. Coding can unlock creativity and open doors for an entire generation of American students. We need more coders — not just in the tech industry, but in every industry.” – Mark Pincus, CEO and Founder, Zynga

“Coding is engaging and empowering. It’s a necessary 21st Century skill.” – Jan Cuny, Program Officer, National Science Foundation

And

coding is cool

“Code has become the 4th literacy. Everyone needs to know how our digital world works, not just engineers.” – Mark Surman, Executive Director, The Mozilla Foundation

“If you can program a computer, you can achieve your dreams. A computer doesn’t care about your family background, your gender, just that you know how to code. But we’re only teaching it in a small handful of schools, why?” – Dick Costolo, Former CEO, Twitter

Do you want to succeed in this class?

Secrets for success:

  1. You are learning a new language; treat it as such.
  2. Go slow. If your goal is to get out of this room as soon as possible you will not succeed.
  3. Read the lab instructions carefully. See #2.
  4. Do not cut and paste code. Type it; you will learn it better.
  5. Take the time to understand a command. THINK!
  6. Do ask for help when you are stuck or confused.
  7. Be kind to yourself and your community.

Course Schedule

  • Most lectures pre-recorded
  • Do we want to keep this format? We will revisit this at the end of this week
  • Lecture videos and embedded Playposit quizzes are due 9 AM Tue/Thu
  • Videos are released at least 24 hours in advance
  • To take the quiz, be sure to start the video from Canvas > Assignments
  • To watch again, go to Canvas > Media Gallery
  • Tuesdays, Thursdays
    • Lab 1:10 - 5:00
  • Fridays
    • Discussion 1:10 - 2:00
    • Varied use
      • Q & A (with students answering questions as well as asking them)
      • Lecture
      • Keep on working

(Tentative) Course Outline

  • Week 1:
    • Linux fundamentals
    • Markdown, git repositories
    • Sequence analysis and BLAST
  • Week 2:
    • Analyze BLAST in R
  • Week 3:
    • Tidyverse
    • Alignment, tree building
  • Week 4:
    • SNPs
    • Population structure, GWAS
  • Week 5:
    • Build a web app
  • Week 6:
    • Genome Assembly
    • Methylation Analysis
  • Weeks 7-9:
    • Illumina reads
    • RNAseq
  • Week 9:
    • Genetic Networks
  • Week 10:
    • Metagenomics

Course Grading

  • 45% Lab assignments
  • 25% Take home midterm (Available May 03, Due May 09, 1:10 PM)
  • 25% Take home final (Available June 07, Due June 12, 9:00 AM)
  • 5% Playposit lecture quizzes + possible discussion Qs + Attendance
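
The four weights combine as a weighted average. The minimal Python sketch below shows how. It assumes each component is recorded as a percentage from 0 to 100. The category names and example scores are hypothetical, and this is an illustration, not the official gradebook calculation.

```python
# Weights from the grading breakdown above; they sum to 1.0 (100%).
# The category names are hypothetical labels chosen for illustration.
WEIGHTS = {
    "labs": 0.45,
    "midterm": 0.25,
    "final": 0.25,
    "quizzes_attendance": 0.05,
}


def course_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each category in WEIGHTS to a percentage (0-100).
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)


# Hypothetical scores, not real student data.
example = {"labs": 90, "midterm": 80, "final": 84, "quizzes_attendance": 100}
print(round(course_grade(example), 2))  # 86.5
```

With these made-up numbers, the lab average alone contributes 0.45 × 90 = 40.5 points. That is why consistent work on lab assignments matters so much.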

Do your own work

Developing code is an interactive process. Both your friends and the web can be excellent resources.

However, any direct copying of text or code from any person (in this class, on the web, etc.) is considered plagiarism in the context of this course.

If you receive inspiration or ideas from an external source, give attribution.

Course Website

Reference Text

Author: Vince Buffalo

Title: Bioinformatics Data Skills

This is an excellent book that covers much of the material from lab.

You can use it to clarify ideas that were not clear in class, or for more in-depth coverage of the material.

I particularly recommend reading it in depth if you plan to build on the skills learned in this class.

It is available online for free through the UC Davis Library. See the links on the course website.

Reference Text 2

Authors: Hadley Wickham and Garrett Grolemund

Title: R for Data Science

This is another excellent book that more generally covers data manipulation and analysis in R.

One of the authors, Hadley Wickham, has written several very nice additions to R (the tidyverse packages) that we will use in this class.

It is available online for free.

Bioinformatics best practices

  1. Clear documentation
  2. Reproducible results
  3. Informative names (files, variables, functions)
  4. Logical organization
  5. Documents/Data in open (non-proprietary) formats
    • This is essential for achieving 1 and 2

Chicken and Egg Problem

  • You need some knowledge before you can do anything
  • But you need to do some things to set up your machine before you can gain knowledge
  • As a result the first few days of this class can feel a little disorienting
  • It will get better

Today's Lab

  1. Get a virtual Linux machine running
  2. Clone Assignment 01 repo into RStudio
  3. Practice Markdown
  4. Learn a little Linux command line

Virtual Linux machine

  • The computer lab machines run Windows
  • Bioinformatics on a Windows machine is painful, although it is getting better (R is fine)
  • Solution: virtual machine!
  • We use Jetstream2, a cloud computing resource available through the NSF-funded Access-CI, to run a virtual Linux machine in the cloud
  • You can connect to your virtual machine from any computer, including your laptop or home computer

What is a virtual machine?

  • Universities have many very large computers (a.k.a. servers) in various places around the country
  • These servers can be split up to be many separate “virtual” machines, each emulating an individual computer
  • You can connect to these virtual machines, and it is just like having your own computer, except that it is in the “cloud”
  • Terminology:
    • Each virtual machine is an instance of a machine image.
    • You can think of the image as a snapshot (or template) of a machine that captures the OS, the installed programs, etc.
    • The image that you are using is called BIS180L and was created by John Davis and me for this class.

Connecting to the virtual machine

  • There are two ways to connect to your instance:
  1. Using Virtual Network Computing (VNC). This displays the instance's graphical desktop on your local computer.
  2. Using a secure shell (SSH). This provides a text-only connection to your instance, which is an advantage on slow internet connections.
  • You can also transfer files from your instance to other computers using SFTP (Secure File Transfer Protocol)

Demo VM connections

Quick demo of VNC and SSH

Other virtual machine notes

As detailed in the lab notes for today:

  • We have created a virtual machine instance for each of you
  • The first lab section details how to connect.

FAQ: Can I use my own computer?

  • Can you use your own computer to connect to your Virtual Machine?
    • Yes. This is easy, and you will probably want to do it so that you can work outside of class
    • But for today, please first connect using the lab PCs
    • Also, eduroam is slow, so if everyone in here is on their own laptop on Wi-Fi, things won't work very well
  • Can you use your own computer to run the analyses directly?
    • For R/RStudio analyses, probably (but make sure your R is up to date!)
    • For Linux analyses, possibly, if you are running a Mac or Linux machine. The first lab manual links to installation notes for Linux machines. I have a similar, although out-of-date, document for Mac.
    • We will NOT provide troubleshooting help for your own computer (other than helping you connect it to the VM)

Today's Lab

  1. Get a virtual Linux machine running
  2. Clone Assignment 01 repo into RStudio
  3. Practice Markdown
  4. Learn a little Linux command line

Markdown

Markdown is a text-based formatting system for quickly and easily generating nicely formatted output.

This presentation, as well as the entire BIS180L website, was written in Markdown.

It supports three of the best practices listed earlier:

  1. Clear documentation
  2. Reproducibility
  3. Open formats

Markdown vs docx

What if we want to produce this?

[Image: the formatted example output]

The Markdown file that generates it is:

Markdown vs docx

What if we want to produce this?

[Image: the formatted example output]

The Word file that generates it is:

Today's Lab

  1. Get a virtual Linux machine running
  2. Clone Assignment 01 repo into RStudio
  3. Practice Markdown
  4. Learn a little Linux command line

Linux Command Line

We will work through a tutorial developed by Ian Korf.

You are learning a new language; treat it as such.

  • Keep notes
  • Be patient
  • Practice and repetition help
  • It will get easier

Quick orientation to Linux command line

  • Instead of clicking on an icon, you type a command
    • command [options] [file-path]
  • Directory structure
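
The small Python sketch below makes the `command [options] [file-path]` pattern concrete. Python is used only for illustration; in lab you will type these commands directly in the shell. The sketch splits a command line into those three parts. `split_command` is a hypothetical helper. It assumes, as a simplification, that every token beginning with "-" is an option and that every other token after the command is a file path. Real programs interpret their own options, so this is not a general parser.

```python
import shlex


def split_command(line):
    """Split a command line into (command, options, file_paths).

    Hypothetical helper for illustration. Simplifying assumption: tokens
    starting with "-" are options; all other tokens after the command are
    treated as file paths. Real programs parse their own options, so an
    option argument such as the "5" in "head -n 5 notes.txt" would be
    misclassified as a path here.
    """
    tokens = shlex.split(line)  # shell-like splitting that respects quotes
    if not tokens:
        raise ValueError("empty command line")
    command, rest = tokens[0], tokens[1:]
    options = [tok for tok in rest if tok.startswith("-")]
    paths = [tok for tok in rest if not tok.startswith("-")]
    return command, options, paths


print(split_command("ls -l -h /usr/bin"))
# ('ls', ['-l', '-h'], ['/usr/bin'])
```

For example, `split_command('wc -l "my data.txt"')` returns `('wc', ['-l'], ['my data.txt'])`. The quotes keep a file name that contains a space together as a single path, just as they do when you type the command in the shell.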

Assignments to turn in for this lab

  • Assignments will be turned in via github
  • There will be a separate repo for each assignment
  • Detailed instructions are given on the lab page for today
  • We can help if you get stuck
  • For today's lab you will need to turn in four files (two Markdown files and their corresponding .html counterparts).
  • Due before class on Tuesday, but there will be new material this Thursday

Slack Channels

Slack is our main tool for communication (other than lecture and one-on-one lab time).

There are two UC Davis Slack channels that I have added you to:

  • bis180l-student-announcements-2024
    • Posting on this channel should be limited to me or John, although you can reply to posts with follow-up questions
  • bis180l-student-help-2024
    • This is the place to seek help. If you are confused or stuck someone else is as well.
    • If you know the answer to a question please contribute

Slack and Class Etiquette

  • Be kind and respectful to each other
  • Remember that we all come from different backgrounds
  • Promote questions and discussion
  • Think about how your Slack posts could be viewed by others

Refresher: Secrets for success

Website Tour

Let's go have fun!