2026-05-12
I didn’t understand the FASTQ line with the PHRED scores, specifically how there are different versions. I understood the general idea, but I got the question wrong and I wasn’t sure why.
How do you determine the [phred] scale of a .fastq? Give an example if possible.
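FASTQ files have 4 lines of information for each read: an identifier line starting with @, the sequence, a separator line starting with +, and the quality string.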
@HWUSI-EAS100R:6:73:941:1973#0/1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCC
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**
Each base has a quality score representing how confident the machine is that the base is correct.
These are called PHRED scores and typically range from 0 to ~40, where
\(\mathrm{PHRED} = -10 \log_{10}(p)\)
and \(p\) is the probability that the reported base is wrong.
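To make the formula concrete, here is a minimal Python sketch that converts between error probability and PHRED score. The function names are my own and are not part of any FASTQ tool.

```python
import math


def phred_from_error_prob(p):
    """Return the PHRED score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def error_prob_from_phred(q):
    """Return the error probability that corresponds to PHRED score q."""
    return 10 ** (-q / 10)


# Print the error probability for a few common PHRED scores.
for q in (10, 20, 30, 40):
    print(f"Q{q}: p = {error_prob_from_phred(q):g}")
```

Every 10 points on the PHRED scale is a tenfold drop in error probability, so a score of 30 means about 1 chance in 1000 that the base is wrong.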
But how can the following encode PHRED qualities?
!''*((((***+))%%%++)(%%%%).1***-+*''))**
In computerese, each character is represented internally as a number. This number is called the character’s ASCII code.
For example, “!” has an ASCII code of 33, “*” has an ASCII code of 42, etc.
Thus, these characters represent numbers, and numbers can represent quality.
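You can look up these codes yourself. As a small sketch, Python's built-in ord() returns the code of a character and chr() goes the other way. The variable names here are only for illustration.

```python
# The quality line from the example read above.
quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**"

# ord() gives the ASCII code of one character.
codes = [ord(ch) for ch in quality]
print(codes[:4])  # prints [33, 39, 39, 42]

# chr() converts a code back into its character.
print(chr(33), chr(42))  # prints ! *
```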
Why use characters instead of numbers?
To add an additional wrinkle, the ASCII codes must be converted to the actual PHRED scores.
Why? ASCII codes 0 - 32 are control characters and whitespace (32 is the space), which are invisible when printed, so they can’t be used.
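Since the first visible character is “!” (ASCII 33), the simplest conversion subtracts 33, so that “!” means a quality of 0. The sketch below assumes that starting point of 33 for the example read above. The name decode_quality is hypothetical.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to a list of PHRED scores.

    offset is the ASCII code that stands for quality 0 (assumed 33 here).
    """
    return [ord(ch) - offset for ch in quality]


scores = decode_quality("!''*((((***+))%%%++)(%%%%).1***-+*''))**")
print(scores[:4])   # prints [0, 6, 6, 9]
print(max(scores))  # prints 16
```

If the wrong starting point is assumed, every score is shifted by the same amount, which is why it matters to know which one a file uses.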
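Additionally, different starting points (offsets) have been used. The two most common are 33 (Sanger and Illumina 1.8+, often written Phred+33), where “!” means a quality of 0, and 64 (Illumina 1.3 to 1.7, often written Phred+64), where “@” means a quality of 0.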
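Look for characters that are unique to one scale and see if they appear in your quality lines. For example, punctuation such as “!”, “#” or “,” and the digits all have ASCII codes below 64, so they can only occur when the starting point is 33, while in typical Illumina data lowercase letters only occur when the starting point is 64.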
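The same check can be sketched in Python. This is a heuristic, not a definitive test. It assumes only two starting points (33 and 64), that codes below 59 never appear with a starting point of 64, and that codes above 74 (“J”) are not expected with a starting point of 33 in typical Illumina data. The name guess_offset and the two cutoffs are my assumptions.

```python
def guess_offset(quality_lines):
    """Guess the ASCII starting point (33 or 64) from quality strings.

    Returns None when the characters seen fit both scales.
    The cutoffs 59 and 74 are heuristic assumptions, see the text above.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None  # nothing to inspect
    if min(codes) < 59:
        return 33    # characters this low are impossible with a start of 64
    if max(codes) > 74:
        return 64    # characters this high are unusual with a start of 33
    return None      # ambiguous: look at more reads


example = ["!''*((((***+))%%%++)(%%%%).1***-+*''))**"]
print(guess_offset(example))  # prints 33
```

A single read can be ambiguous, so in practice it is safer to pass in the quality lines of many reads before trusting the answer.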
Your FASTQ file has qualities listed as “E”. Which scale was used in this file?
What is the PHRED scale for:
@A00953:413:HJ7WMDSX2:2:1101:27597:1031 1:N:0:AAGTGTAT+CATTAATT
AACACCTGCCCTATATACTCTCTCTCTCTCGTAGTTTCTACTTCCAGAACTCATCTCTCTCTCC
+
FFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFF:F:F,FFFFF,:FFFFFF,:F
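One way to work through this exercise is to tabulate the distinct characters in the quality line together with the score each would have under a starting point of 33 and of 64. The sketch below only prints the table. A negative score in a column rules that starting point out. The variable names are for illustration only.

```python
from collections import Counter

# Quality line of the read above.
quality = "FFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFF:F:F,FFFFF,:FFFFFF,:F"

# Count how often each distinct character occurs.
counts = Counter(quality)

print("char  ascii  score(-33)  score(-64)  count")
for ch in sorted(counts):
    code = ord(ch)
    print(f"{ch:>4}  {code:>5}  {code - 33:>10}  {code - 64:>10}  {counts[ch]:>5}")
```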