SPADA Draft Documents

[Figure 6 plot: number of bacterial genomes in GenBank (genome count, log scale from 1 to 1,000,000) and % complete bacterial genomes (0 to 120), plotted against year (1990 to 2030).]

Figure 6. Plot showing the total number of bacterial genomes in GenBank and the percentage of complete bacterial genomes as a function of year. These plots were made by parsing the “prokaryotes.txt” file (ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/) that NCBI provides as an inventory of all bacterial genomes.
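The parsing step described in the caption could be sketched roughly as follows. This is a minimal illustration, not the script used to produce Figure 6. It assumes the inventory is tab-delimited with a header row, that the release date and assembly status are stored in columns named "Release Date" and "Status", and that finished assemblies are labeled "Complete Genome". These names are assumptions and should be checked against the header of the downloaded file. It also assumes that both the count and the percentage are cumulative over years. The function name genome_counts_by_year is hypothetical.

```python
import csv
import io
from collections import Counter


def genome_counts_by_year(handle, date_col="Release Date", status_col="Status",
                          complete_label="Complete Genome"):
    """Return {year: (cumulative_total, cumulative_percent_complete)}.

    The column names and the "complete" label are assumptions about the
    inventory layout; adjust them to match the actual file header.
    """
    totals = Counter()    # genomes released in each year
    complete = Counter()  # complete genomes released in each year
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        date = (row.get(date_col) or "").strip()
        # Skip rows whose date is missing or does not start with a 4-digit year.
        if len(date) < 4 or not date[:4].isdigit():
            continue
        year = int(date[:4])
        totals[year] += 1
        if (row.get(status_col) or "").strip() == complete_label:
            complete[year] += 1

    result = {}
    running_total = 0
    running_complete = 0
    for year in sorted(totals):
        running_total += totals[year]
        running_complete += complete[year]
        result[year] = (running_total, 100.0 * running_complete / running_total)
    return result


# Tiny made-up sample standing in for the real inventory file.
sample = (
    "#Organism/Name\tRelease Date\tStatus\n"
    "Org A\t1995/07/28\tComplete Genome\n"
    "Org B\t2010/03/01\tContig\n"
    "Org C\t2010/05/12\tComplete Genome\n"
    "Org D\t-\tScaffold\n"
)
counts = genome_counts_by_year(io.StringIO(sample))
for year, (total, pct) in counts.items():
    print(year, total, round(pct, 1))
```

For a real run, the StringIO sample would be replaced by an open handle on the downloaded inventory file. The resulting dictionary can then be plotted with the count on a logarithmic axis and the percentage on a linear axis.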


Exclusivity and Background Databases

To check for false-positive amplifications, it is useful to construct exclusivity and environmental background databases. For computational efficiency, we recommend populating the exclusivity database with near-neighbor sequences (i.e., organisms that are phylogenetically distinct from, but closely related to, those in the inclusivity dataset). All other distantly related organisms that may be present in the sample matrix and might cause false positives can be placed into the background database. Typically, the background database consists of unrelated genomes that are contaminants in the sample matrix preparation such as the
