BAMs and SNPs and IGV

Julin N. Maloof

Lab Goals

By then end of the last lab you should have mapped Illumina reads to the B. rapa genomic reference.

Our goals today are to:

  1. Learn about the SAM/BAM format
  2. View the mapped reads in a genome browser
  3. Identify SNPs between the reference and each of the strains that we sequenced.

SAM/ BAM Files

SAM/BAM files provide info about the alignment of each read to the genome.

More information here and here

SAM/BAM Fields

Col Field Type Brief Description
1 QNAME String Query template NAME
2 FLAG Int bitwise FLAG
3 RNAME String References sequence NAME
4 POS Int 1- based leftmost mapping POSition
5 MAPQ Int MAPing Quality
6 CIGAR String CIGAR String
7 RNEXT String Ref. name of the mate/next read
8 PNEXT Int Position of the mate/next read
9 TLEN Int observed Template LENgth
10 SEQ String segment SEQuence
11 QUAL String ASCII of Phred-scaled base QUALity+33

SAM/BAM

Let's take a look at one…

SAM/BAM CIGAR

code descrtiption
M alignment match
I insertion to the reference
D deletion from the reference
N skipped region from the reference
X sequence mismatch

SAM/BAM bitwise FLAG

Bit Description
1 0x1 template having multiple fragments
2 0x2 proper alignment
4 0x4 fragment unmapped
8 0x8 next fragment in the template unmapped
16 0x10 SEQ being reverse complemented
32 0x20 SEQ of the next fragment in the template being reverse complemented
64 0x40 the first fragment in the template
128 0x80 the last fragment in the template
256 0x100 secondary alignment
512 0x200 not passing filters
1024 0x400 PCR or optical duplicate

Discussion:

Not all reads map to the reference. Why might a read not map to the reference?

Discuss for 2-3 minutes. Try to answer the question.

IGV

The integrated genomics viewer allows us to visualize the results of our read mapping.

Let's take a look…

Criteria for calling SNPs

Discuss in groups:

If you were designing a SNP caller, what are important criteria for calling SNPs?

What might cause an erroneous call?

Useful SNPs for F2 cross

  • Goal is to map the genetic basis of a trait difference (which region of the genome is responsible)
  • To do this we need to know the parental origin of chromosomal segments in each F2.
  • Which SNPs are informative for this question? (Illustrate on board)

VCF file

Variant Call Format files are the output from the SNP caller and provide information about each possible SNP.

You will see these as you work through the lab. I can go over these files in dicussion tomorrow.