BLAST Questions (and Answers)

Julin Maloof
April 02, 2026

How does the BLAST algorithm determine where to extend when searching a genome?

  • BLAST works in two stages: seeding, then extension
  • In stage 1 (seeding), it breaks your query sequence into chunks (words)
    • For megablast the default word size is 28
  • It then compares the words in your query to a database of all words in the genome(s) that you are searching
  • If a word in your query matches a sequence in the database, then stage 2 (extension) starts from that match
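
The seeding stage can be sketched in a few lines of shell. This is only a toy illustration, not how BLAST is implemented: the two sequences are made up, the word size is 4 rather than the megablast default of 28 so the words fit on screen, and a plain substring search stands in for BLAST's indexed word database.

```shell
# Toy sketch of BLAST stage 1 (seeding); not how BLAST is actually implemented.
# The sequences and word size are made up; megablast's real default is 28.
query="ACGTACGGA"
subject="TTACGTACCA"
word_size=4

seed_hits=""
start=1
last_start=$(( ${#query} - word_size + 1 ))
while [ "$start" -le "$last_start" ]; do
  # Cut out the word that begins at position $start (1-based).
  end=$(( start + word_size - 1 ))
  word=$(printf '%s' "$query" | cut -c "${start}-${end}")
  # An exact match anywhere in the subject counts as a seed hit.
  case "$subject" in
    *"$word"*) seed_hits="${seed_hits}${seed_hits:+ }${word}" ;;
  esac
  start=$(( start + 1 ))
done

echo "Seed hits: $seed_hits"
```

Here four of the six query words (ACGT, CGTA, GTAC, TACG) are found in the subject. Each of those hits is a place where extension could begin.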

I still need help understanding the difference between megablast and dc-megablast.

  • Remember that BLAST initially searches for matching words
  • In megablast the default word length is 28 nucleotides (or 11 for blastn)
  • megablast and blastn require that all nucleotides in the word match EXACTLY
  • dc-megablast (discontiguous megablast) does not require every position in the word to match; some positions are ignored
    • By default, 11 of the 18 positions in the word must match
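
A rough sketch of the difference, assuming a made-up template in which 1 marks a position that must match and 0 marks a position that is ignored. The 7-position template and both words are hypothetical and chosen for readability; this is not the template dc-megablast actually uses.

```shell
# Toy sketch of a discontiguous word match. The template is made up
# (5 of 7 positions checked) and is not the one dc-megablast really uses.
template="1101011"   # 1 = position must match, 0 = position is ignored
query_word="ACGTACG"
subject_word="ACTTTCG"

match="yes"
pos=1
while [ "$pos" -le "${#template}" ]; do
  t=$(printf '%s' "$template" | cut -c "$pos")
  q=$(printf '%s' "$query_word" | cut -c "$pos")
  s=$(printf '%s' "$subject_word" | cut -c "$pos")
  # Only positions marked 1 in the template are compared.
  if [ "$t" = "1" ] && [ "$q" != "$s" ]; then
    match="no"
  fi
  pos=$(( pos + 1 ))
done

# An exact (megablast-style) comparison of the same two words, for contrast.
if [ "$query_word" = "$subject_word" ]; then exact="yes"; else exact="no"; fi

echo "discontiguous match: $match, exact match: $exact"
```

The two words differ only at positions 3 and 5, which the template ignores, so the discontiguous comparison reports a match while the exact comparison does not. On the command line the search type is chosen with blastn's -task option (for example, -task dc-megablast).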

Am I understanding correctly that gaps count as mismatches when calculating the alignment score?

No, gaps are their own category with their own penalty. A gap generally lowers the score more than a single mismatch does.
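
As a worked example of how the separate penalties combine, here is the arithmetic for one hypothetical alignment. All of the numbers are illustrative assumptions; the actual reward and penalty values depend on the BLAST task and settings.

```shell
# Toy alignment score; the values below are illustrative, not guaranteed defaults.
reward=2        # added per matching position
penalty=3       # subtracted per mismatch
gap_open=5      # subtracted once per gap
gap_extend=2    # subtracted per gapped position

matches=20
mismatches=2
gaps=1          # number of separate gaps
gap_length=3    # total gapped positions

score=$(( matches * reward - mismatches * penalty - gaps * gap_open - gap_length * gap_extend ))
echo "score = $score"
```

With these numbers the score is 40 - 6 - 5 - 6 = 23. The single 3-position gap costs 11, while the two mismatches together cost 6.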

Is megablast used by default, then, since dc-megablast is now discontinued?

dc-megablast is not discontinued (confirmed on the NCBI website today). The "dc" stands for discontiguous, not discontinued.

megablast is the default on both the web and the command line.

Are nested for-loops possible?

Yes, and you will get to practice this today.
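
A minimal sketch of a nested for loop in the shell. The task and genome names are placeholders used only for printing; nothing is actually run.

```shell
# Nested for loops: the inner loop runs in full for each pass of the outer loop.
# The task and genome names are placeholders used only for printing.
n=0
for task in megablast dc-megablast blastn; do
  for db in genomeA genomeB; do
    echo "would run: $task against $db"
    n=$(( n + 1 ))
  done
done
echo "total combinations: $n"
```

Three values in the outer loop times two in the inner loop gives six combinations in total.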

Save often!

Save your work very often! Don't learn this the hard way!

And add, commit, push!

Set up git merge strategy

I forgot to set this up on the image. Please open a terminal on your instance and enter:

git config --global pull.rebase false

This tells git pull to merge rather than rebase, which preserves the history of both branches and makes the merge its own commit.

Demo of git functionality in RStudio