Quality Scores

A quality score (or Q-score) expresses an error probability. In particular, it serves as a convenient and compact way to communicate very small error probabilities.

Given an assertion, A, the quality score, Q(A), expresses the probability that A is not true, P(~A), according to the relationship:

Q(A) = -10 log10(P(~A))

where P(~A) is the estimated probability of an assertion A being wrong.

Each increase of 10 in the quality score corresponds to a tenfold decrease in error probability, as the following table shows:

Quality score, Q(A)    Error probability, P(~A)
10                     0.1
20                     0.01
30                     0.001
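The relationship above can be checked numerically. The following is a minimal sketch, not part of any Illumina software; the function names q_to_error_prob and error_prob_to_q are hypothetical and chosen only for illustration. It inverts the formula to recover P(~A) = 10^(-Q(A)/10) and applies the formula directly to go the other way.

```python
import math


def q_to_error_prob(q: float) -> float:
    """Return the error probability P(~A) for a quality score Q(A).

    Inverts Q = -10 * log10(P), giving P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


def error_prob_to_q(p: float) -> float:
    """Return the quality score Q(A) for an error probability P(~A)."""
    # log10 is undefined at 0, and a probability cannot exceed 1.
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Print the error probability for the three scores in the table.
    for q in (10, 20, 30):
        print(f"Q{q}: error probability ~ {q_to_error_prob(q):.3g}")
```

Floating-point results may differ from the tabulated values in the last few digits, so compare them with a tolerance rather than for exact equality.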

On supported Illumina systems, Q-scores are automatically binned. The specific binning applied depends on the current Q-table. For more information, see the white paper Reducing Whole Genome Data Storage Footprint, available from the Illumina website.

Quality Score Encoding

In FASTQ files, quality scores are encoded into a compact form, which uses only 1 byte per quality value. In this encoding, the quality score is represented as the character with an ASCII code equal to its value + 33. The following table demonstrates the relationship between the encoding character, its ASCII code, and the quality score represented.

When Q-score binning is in use, the subset of Q-scores applied by the bins is displayed.

Symbol    ASCII Code    Q-Score
!         33            0
"         34            1
#         35            2
$         36            3
%         37            4
&         38            5
'         39            6
(         40            7
)         41            8
*         42            9
+         43            10
,         44            11
-         45            12
.         46            13
/         47            14
0         48            15
1         49            16
2         50            17
3         51            18
4         52            19
5         53            20
6         54            21
7         55            22
8         56            23
9         57            24
:         58            25
;         59            26
<         60            27
=         61            28
>         62            29
?         63            30
@         64            31
A         65            32
B         66            33
C         67            34
D         68            35
E         69            36
F         70            37
G         71            38
H         72            39
I         73            40
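The offset-33 encoding described above can be sketched in a few lines of code. This is an illustrative example only, not Illumina's implementation; the names PHRED_OFFSET, MAX_Q, encode_quality, and decode_quality are hypothetical. The upper bound of 93 is an assumption that is not taken from this document: it is the largest score whose character (ASCII 126) is still printable, whereas the table above covers only scores 0 through 40.

```python
PHRED_OFFSET = 33  # ASCII code = Q-score + 33, as described above
MAX_Q = 93         # assumption: 93 + 33 = 126, the last printable ASCII code


def encode_quality(scores):
    """Encode a sequence of integer Q-scores as a FASTQ quality string."""
    chars = []
    for q in scores:
        if not 0 <= q <= MAX_Q:
            raise ValueError(f"Q-score {q} is outside the range 0..{MAX_Q}")
        chars.append(chr(q + PHRED_OFFSET))
    return "".join(chars)


def decode_quality(quality_string):
    """Decode a FASTQ quality string into a list of integer Q-scores."""
    scores = []
    for ch in quality_string:
        q = ord(ch) - PHRED_OFFSET
        if not 0 <= q <= MAX_Q:
            raise ValueError(f"character {ch!r} is not a valid quality symbol")
        scores.append(q)
    return scores


if __name__ == "__main__":
    # Symbols taken from the table rows for Q-scores 0, 10, 20, 30, and 40.
    print(decode_quality("!+5?I"))
    print(encode_quality([0, 10, 20, 30, 40]))
```

Each character maps to exactly one score, so encoding and then decoding returns the original list, and the quality string always has the same length as the read it annotates.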
