Phred Quality Values and ABI 3700 Data

We are checking the accuracy of the quality values that phred assigns to ABI 3700 chromatograms when it uses quality value lookup tables calibrated for ABI 373/377 chromatograms. We used phred version 990722 for the tests described here.

Our goal is to determine the accuracy of the phred quality values on 3700 data using the current lookup tables in phred and whether we must modify phred for these data.

We obtained chromatograms and consensus sequences for finished human BAC projects from the Genome Sequencing Center at Washington University in Saint Louis, selecting projects in which at least 10% of the chromatograms are ABI 3700 chromatograms. The ABI 3700 chromatograms in these projects were generated using dye primer and dye terminator chemistries with the POP5 matrix, and dye terminator chemistry with the POP6 matrix. Essentially standard run conditions were used to generate these data, except for the dye terminator POP6 runs, which were run at 37 °C. Table 1 summarizes the numbers of aligned reads and bases that we used for this work.

Table 1. Aligned Reads and Bases
chemistry    matrix   number of projects   number of reads   number of bases
primer       POP5                      9              5767           3477864
terminator   POP5                     18             10177           6354554
terminator   POP6                     12              8274           4354631

Methods

We performed the following procedures to process the data for these tests.

Results

Dye Primer POP5
The dye primer POP5 quality value accuracy plot shows good quality value accuracy up to about quality value 25. For larger phred quality values, the observed quality values are progressively lower, meaning that phred underestimates the error rates. We examined discrepancies with assigned quality values of 40 and higher and found a greater tendency to form compressions in comparison to slab gel runs. Many of the additional compressions have stem/loop motifs that are not a problem with slab gels. We consider the number of aligned bases used in this test to be marginal and hope to obtain additional data in the near future to improve our confidence in the result.
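The comparison behind an accuracy plot of this kind can be illustrated with a small sketch. Phred defines a quality value q in terms of the estimated error probability p as q = -10 log10(p) (Ewing & Green 1998), and the observed quality value for a group of bases is the same transform applied to their observed error rate. The Python below is only an illustration under stated assumptions, not the code used for these tests: the function names and the input format (pairs of a phred quality value and a flag marking a discrepancy with the finished consensus) are hypothetical.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a phred quality value to its estimated error probability."""
    return 10 ** (-q / 10)


def observed_quality(aligned_bases):
    """Compute observed quality values per assigned phred quality value.

    aligned_bases: iterable of (phred_q, is_error) pairs, one per aligned
    base, where is_error marks a discrepancy with the consensus.
    (Hypothetical input format, chosen for illustration only.)

    Returns {phred_q: observed_q}, where observed_q is
    -10 * log10(errors / bases), or None if no errors were seen
    at that quality value (the observed quality is then unbounded).
    """
    counts = defaultdict(lambda: [0, 0])  # phred_q -> [bases, errors]
    for q, is_error in aligned_bases:
        counts[q][0] += 1
        if is_error:
            counts[q][1] += 1

    result = {}
    for q, (n_bases, n_errors) in sorted(counts.items()):
        if n_errors:
            result[q] = -10 * math.log10(n_errors / n_bases)
        else:
            result[q] = None
    return result


# Toy data, not taken from the projects described here.
toy = (
    [(20, False)] * 990 + [(20, True)] * 10
    + [(30, False)] * 990 + [(30, True)] * 10
)
print(observed_quality(toy))
```

With this toy input, bases assigned quality value 30 that show 10 discrepancies in 1000 aligned bases have an observed quality value of 20, which is the kind of shortfall described above for the higher phred quality values.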

Dye Terminator POP5
The dye terminator POP5 quality value accuracy plot shows consistently good agreement between the phred and observed quality values up to quality value 30. For higher phred quality values, the observed quality values vary around the phred values without an apparent trend, suggesting that the variations are due to statistical fluctuations resulting from the relatively small number of aligned bases used for the test.
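A rough calculation shows why a small number of aligned bases produces scatter at high quality values. The sketch below assumes that errors are independent, so that the error count in a quality bin is approximately Poisson distributed, and reports the range of observed quality values corresponding to one standard deviation around the expected error count. The function names are hypothetical, and the base counts are illustrative rather than taken from Table 1.

```python
import math


def expected_errors(q, n_bases):
    """Expected number of errors among n_bases bases of phred quality q."""
    return n_bases * 10 ** (-q / 10)


def observed_quality_range(q, n_bases):
    """Approximate +/- 1 sigma range of the observed quality value.

    Assumes a Poisson error count, so sigma = sqrt(expected errors).
    Returns (q_low, q_high); q_high is infinite when the lower error
    count reaches zero, i.e. when seeing no errors at all is plausible.
    """
    mu = expected_errors(q, n_bases)
    sigma = math.sqrt(mu)
    hi_errors = mu + sigma
    lo_errors = max(mu - sigma, 0.0)

    q_low = -10 * math.log10(hi_errors / n_bases)
    if lo_errors > 0:
        q_high = -10 * math.log10(lo_errors / n_bases)
    else:
        q_high = math.inf
    return q_low, q_high


# Illustrative bin sizes only.
for n in (5_000, 100_000, 10_000_000):
    print(n, observed_quality_range(40, n))
```

Under these assumptions, a bin of 100,000 bases at quality value 40 is expected to contain about 10 errors, and the observed quality value could plausibly fall anywhere from about 38.8 to 41.7. With only a few thousand bases in the bin, observing no errors at all is within the expected range, so the observed quality value is poorly constrained.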

Dye Terminator POP6
The dye terminator POP6 quality value accuracy plot shows consistently good agreement between the phred and observed quality values up to and slightly above quality value 30. For higher phred quality values, the observed quality values again vary around the phred values without a clear trend, suggesting that the variations are due to the relatively small number of aligned bases. We need more data to confirm this.

Conclusions

Based on these limited data sets, it appears that the current phred version (990722) assigns quality values with good accuracy up to quality value 25 for all tested dye chemistry/matrix run combinations. For dye primer chemistry run in the POP5 matrix, the phred quality values above 25 show a trend of progressively overestimating the quality. This trend appears to be due to a greater tendency of the strands to form compressions during electrophoresis in comparison to slab gel runs, suggesting that we will need to modify phred to recognize a greater range of stem/loop motifs, and possibly create a quality value lookup table specifically for this chemistry/matrix combination. For dye terminator chemistry run in the POP5 and POP6 matrices, the phred quality values maintain good accuracy up to about quality value 30. Between phred quality values 30 and 40, the observed quality values exhibit modest, apparently random, variation around the phred quality values; the variation increases above quality value 40. This indicates that the phred quality values are generally valid for these dye terminator data, but we need additional data to improve our confidence in the tests.

References and Notes

Ewing, B. & Green, P. Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186-194 (1998).

Acknowledgements

We express appreciation for the efforts of the Genome Sequencing Center of Washington University to produce high quality finished projects, which are essential for these tests, and acknowledge their generosity and invaluable assistance in providing these projects to us.

This page was updated on 11 August 2000.