Illumina Q&A

Julin Maloof

2026-05-12

Questions

  • I didn’t understand the FASTQ line with the PHRED scores, specifically how there are different versions. I understood the general idea, but I got the question wrong and I wasn’t sure why.

  • How do you determine the [phred] scale of a .fastq? Give an example if possible.

FASTQ

FASTQ files have 4 lines of information for each read:

@HWUSI-EAS100R:6:73:941:1973#0/1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCC
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**

  1. @SEQID
  2. Sequence
  3. Starts with “+”; the rest of the line is usually blank (the sequence ID may optionally be repeated here)
  4. Quality information
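
The four-line layout can be sketched in Python as below. This is only an illustration: `read_fastq` is a made-up helper name, not a library function, and it assumes each record occupies exactly four lines (no wrapped sequences).

```python
from itertools import islice

def read_fastq(lines):
    """Yield (seq_id, sequence, quality) tuples from an iterable of FASTQ lines.

    Hypothetical helper; assumes every record is exactly four lines long.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, sequence, plus, quality = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality

example = [
    "@HWUSI-EAS100R:6:73:941:1973#0/1",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCC",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**",
]
records = list(read_fastq(example))
seq_id, sequence, quality = records[0]
print(seq_id)
print(len(sequence), len(quality))  # one quality character per base
```

Note that the quality line always has the same number of characters as the sequence line.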

PHRED QUALITY

  • Each base has a quality score representing how confident the machine is that the base is correct.

  • These are called PHRED scores and range from 0 to ~40, where

\(\mathrm{PHRED} = -10 \times \log_{10}(p)\)

and \(p\) is the probability that the reported base is wrong.

  • QUESTION: If there is a 1 in 100 chance that the base is wrong, what is the PHRED score? (Try this in R)
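
A 1 in 100 chance means \(p = 0.01\), so the formula gives a PHRED score of 20. The same arithmetic as a short Python sketch (in R the equivalent expression would be `-10 * log10(0.01)`); the two function names are made up for this illustration:

```python
import math

def phred_from_error_prob(p):
    """PHRED score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def error_prob_from_phred(q):
    """Inverse: error probability implied by PHRED score q."""
    return 10 ** (-q / 10)

print(phred_from_error_prob(0.01))   # 1 in 100 chance the base is wrong
print(error_prob_from_phred(40))     # chance of error at the top of the usual range
```

Every 10 points on the PHRED scale corresponds to a 10-fold drop in the error probability.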

PHRED Qualities, part 2

But how can the following encode PHRED qualities?

!''*((((***+))%%%++)(%%%%).1***-+*''))**

In computerese each character is represented internally as a number. This is called the ASCII code.

For example, ! has an ASCII code of 33, * has an ASCII code of 42, etc.

Thus, these characters represent numbers, and numbers can represent quality.
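
To see the numbers behind the characters, Python's built-in `ord()` returns the code of a character (and `chr()` goes the other way). A small sketch using the quality line shown above:

```python
# ASCII codes of the two characters mentioned in the text
for ch in "!*":
    print(ch, ord(ch))

# The same lookup for every character of the quality line
quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**"
codes = [ord(ch) for ch in quality]
print(codes[:8])  # [33, 39, 39, 42, 40, 40, 40, 40]
```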

Why use characters instead of numbers?

PHRED Qualities, part 3

To add an additional wrinkle, the ASCII codes must be converted to the actual PHRED scores by subtracting a fixed offset.

Why? ASCII codes 0 to 31 are non-printing control characters and 32 is the space, so none of them can serve as a visible quality character.

Additionally, different starting points have been used:

Diagram showing different PHRED quality encoding schemes including Sanger (Phred+33), Solexa (Solexa+64), and Illumina 1.3+, 1.5+, and 1.8+ (Phred+64 or Phred+33). The diagram displays ASCII character ranges from 33 to 126, with corresponding PHRED quality scores ranging from 0 to 41. Different encoding schemes are shown in different colored rows with their typical quality score ranges.
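
Converting a quality line to scores is then just "ASCII code minus offset". A minimal sketch, assuming the Phred+33 scale for the first example read (its `!` character, code 33, sits below the start of every +64 scale); `decode_quality` is a made-up name:

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to a list of scores (ASCII code - offset)."""
    return [ord(ch) - offset for ch in quality]

quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**"
scores = decode_quality(quality, offset=33)
print(scores[:8])                 # [0, 6, 6, 9, 7, 7, 7, 7]
print(min(scores), max(scores))   # 0 16

# With the wrong offset the scores come out far below the range of any scale
print(min(decode_quality(quality, offset=64)))
```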

How do we determine the PHRED scale?

Look for characters that are unique to one scale and check whether they appear in the quality lines of your file.
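
That rule of thumb can be sketched as a small function. The cutoffs are assumptions read off the commonly used encoding diagram: characters with codes below 59 (`;`) appear only in the +33 scales, and characters above 74 (`J`) are not expected in +33 Illumina data. `guess_offset` is a hypothetical helper, and data from other platforms may not follow these ranges.

```python
def guess_offset(quality_strings):
    """Return 33 or 64 when the characters decide it, otherwise None.

    Hypothetical heuristic; the cutoffs 59 and 74 are assumptions taken
    from the usual encoding diagram.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return None
    if min(codes) < 59:
        return 33    # below ';' : only possible with a +33 scale
    if max(codes) > 74:
        return 64    # above 'J' : outside the usual +33 Illumina range
    return None      # every character fits both scales

print(guess_offset(["!''*((((***+))%%%++)(%%%%).1***-+*''))**"]))  # 33
print(guess_offset(["hhhhhhhhhhgggggffff"]))                        # 64
```

The more reads you feed in, the more likely it is that a deciding character shows up.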


Playposit question review

Your FASTQ file has qualities listed as “E”. Which scale was used in this file?
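
One way to weigh the options is to work out what score “E” would stand for under each offset. A small sketch (only the two offsets named in the diagram are compared; the variable names are made up):

```python
char = "E"
code = ord(char)  # ASCII code 69

scores = {}
for name, offset in [("Phred+33", 33), ("Phred+64", 64)]:
    score = code - offset
    scores[name] = score
    p_error = 10 ** (-score / 10)  # error probability implied by that score
    print(f"{name}: score {score}, error probability {p_error:.4f}")
```

“E” decodes to 36 with an offset of 33 and to 5 with an offset of 64, so the two readings imply very different confidence in the base.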

Example

What is the PHRED scale for:

@A00953:413:HJ7WMDSX2:2:1101:27597:1031 1:N:0:AAGTGTAT+CATTAATT
AACACCTGCCCTATATACTCTCTCTCTCTCGTAGTTTCTACTTCCAGAACTCATCTCTCTCTCC
+
FFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFF:F:F,FFFFF,:FFFFFF,:F
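
A sketch of how one might check this read: list the distinct quality characters with their ASCII codes and look for any that fall below 59, the lowest code used by the +64 scales in the diagram (that cutoff is an assumption taken from the diagram).

```python
quality = "FFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFF:F:F,FFFFF,:FFFFFF,:F"

# Distinct characters and their ASCII codes
codes = {ch: ord(ch) for ch in sorted(set(quality))}
print(codes)  # {',': 44, ':': 58, 'F': 70}

# Characters that cannot occur in a +64 scale (code below 59)
only_in_33 = [ch for ch, code in codes.items() if code < 59]
print(only_in_33)  # [',', ':']

# Scores if the offset is 33
scores33 = {ch: code - 33 for ch, code in codes.items()}
print(scores33)  # {',': 11, ':': 25, 'F': 37}
```

Because `,` and `:` lie below the +64 range, this read points to a Phred+33 scale.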