DNA databases and DNA profiling


Everything I’ve read about DNA profiling and DNA databases suggests the following:

  1. A universal national DNA database should be constructed
  2. DNA profiling for this database should switch to SNP genotyping

DNA database – The U.S. national and state governments operate a number of DNA databases. Most law enforcement databases are integrated under the FBI’s CODIS system, which now contains several million DNA profiles in its “offender” index. Privacy advocates raise a number of concerns about these databases, but most political action concerns the criteria for getting profiles into and out of these databases. As far as I can tell, all of the concerns about inclusion/exclusion criteria would be circumvented by the existence of a universal database. For example, issues of contention include:

  • What crimes warrant the collection of DNA?
  • Should DNA be automatically collected at arrest or not until conviction?
  • If DNA is collected at arrest, should DNA profiles be expunged if no conviction is made?
  • Does the mass collection of DNA raise the risk of false positives and subsequent false convictions? [Note: ostensibly it does, especially when imperfect forensic profiles are used to search for a match.]
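To make the false-positive concern concrete, here is a rough sketch of how the chance of a purely coincidental hit grows with database size. The match probability and database sizes below are hypothetical round numbers chosen only for illustration, and the calculation assumes every profile in the database is unrelated to the true source and matches independently.

```python
import math

def expected_coincidental_matches(database_size, match_probability):
    """Expected number of unrelated profiles that match by chance alone."""
    return database_size * match_probability

def prob_at_least_one_coincidental_match(database_size, match_probability):
    """1 - (1 - p)^N, computed with log1p/expm1 to stay accurate for tiny p."""
    return -math.expm1(database_size * math.log1p(-match_probability))

# Hypothetical: a degraded crime-scene profile that a random person
# would match with probability 1 in a billion.
partial_profile_p = 1e-9

# Hypothetical database sizes: an offender index vs. a universal database.
for size in (5_000_000, 300_000_000):
    expected = expected_coincidental_matches(size, partial_profile_p)
    at_least_one = prob_at_least_one_coincidental_match(size, partial_profile_p)
    print(f"N={size:>11,}  expected={expected:.4f}  P(>=1)={at_least_one:.4f}")
```

Under these assumptions the risk scales almost linearly with the number of profiles searched, which is why a weak forensic profile is more dangerous in a large database than in a small one.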

The retention of DNA samples is a second concern for privacy advocates. This is a real issue, which should be addressed either by maximizing protections for stored samples or by discarding samples once they have been profiled. Other concerns are aimed at the use of DNA databases in criminal prosecution. These criticisms apply regardless of a database's size or scope, but there is reason to believe that a universal database would draw more attention to the caveats of DNA evidence and so improve matters. Along those lines, there are a number of benefits which come from universal coverage:

  • Universal coverage is perhaps the best way to ensure proper privacy protection and oversight of the database.
  • False positives would be more easily detected and corrected.
  • The advantages to law enforcement would be obvious.
  • Paternity would be known for all children.
  • It would have beneficial uses for identification outside of law-enforcement.

DNA profiling – The most common form of DNA profiling used for DNA databases (and other DNA-identification applications) is STR genotyping. Even with the best foreseeable technological advancements, STR genotyping has many disadvantages relative to SNP genotyping. If we were to implement a universal DNA database, it would be prudent to make the switch to SNP genotyping.

  • While STR genotyping is currently performed on ~13-16 highly polymorphic loci, it would be technically trivial to genotype hundreds (or thousands or more!) of biallelic SNPs.
  • High-throughput SNP genotyping platforms are advanced, and the pace of development (i.e. reduction in costs) is enormous.
  • SNP genotyping is technically simpler than STR genotyping, and it would be easier to miniaturize.
  • Huge databases of SNPs are already known, making it possible to select a panel of SNPs to meet almost any reasonable requirements. For example, SNPs could be chosen to minimize the chance that they are actually markers for socially-important phenotypic differences between individuals or groups.
  • Multiple correlated SNPs can be chosen for redundancy against genotyping error.
  • Poor-quality forensic samples can be more accurately assigned to database profiles when there are hundreds (or thousands) of points of comparison, in contrast to the 13 STRs used in CODIS.
  • You can imagine the on-demand genotyping of a select subset of SNPs as an identity-verification scheme.
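The trade-off between a few highly polymorphic STRs and many biallelic SNPs can be sketched with a toy calculation. Everything numeric here is an assumption made for illustration: loci are treated as independent and in Hardy-Weinberg equilibrium, each hypothetical STR has 10 equally common alleles, and each hypothetical SNP has a minor allele frequency of 0.5. Real panels have uneven allele frequencies, so the actual figures would differ.

```python
import math
from itertools import combinations_with_replacement

def genotype_match_probability(allele_freqs):
    """Chance that two unrelated people share a genotype at one locus,
    assuming Hardy-Weinberg proportions: the sum of squared genotype frequencies."""
    total = 0.0
    indices = range(len(allele_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        if i == j:
            genotype_freq = allele_freqs[i] ** 2                 # homozygote
        else:
            genotype_freq = 2 * allele_freqs[i] * allele_freqs[j]  # heterozygote
        total += genotype_freq ** 2
    return total

def log10_panel_match_probability(panel):
    """log10 of the match probability across independent loci.
    Logs avoid floating-point underflow for panels of thousands of SNPs."""
    return sum(math.log10(genotype_match_probability(freqs)) for freqs in panel)

# Hypothetical loci (assumed frequencies, not real population data).
str_locus = [0.1] * 10   # 10 equally common alleles
snp_locus = [0.5, 0.5]   # biallelic, minor allele frequency 0.5

str_panel_log10 = log10_panel_match_probability([str_locus] * 13)
snp_locus_log10 = math.log10(genotype_match_probability(snp_locus))

# Number of such SNPs needed to match the discriminating power of 13 such STRs.
snps_needed = math.ceil(str_panel_log10 / snp_locus_log10)

print(f"13 STRs:   log10(match probability) = {str_panel_log10:.1f}")
print(f"1000 SNPs: log10(match probability) = {1000 * snp_locus_log10:.1f}")
print(f"SNPs needed to equal 13 STRs: {snps_needed}")
```

In this toy setup each STR is worth a handful of SNPs, so a panel of a few dozen SNPs would already equal 13 STRs, and a panel of hundreds or thousands would leave a large margin for loci that fail to type in a degraded sample.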

Previous posts: [1],[2],[3]


5 Comments

  1. I saw a talk by Bruce Weir a month or so ago on this topic that was very interesting. He just moved out to chair the UW biostatistics department, and testified on DNA evidence in the OJ trial. He made a number of points I found interesting: 
    1) Testing genetic identity is much easier than testing genetic relatedness. Estimating relatedness requires genotyping many more sites than we do now. Much of his work is on how to estimate the probability of identity-by-descent (ie relatedness) of two sequences of DNA. 
    2) SNPs are about 4x (if I recall the exact figure correctly) less informative than STRs. 
    3)  
     
    In my own opinion, the weak link in DNA profiling is sample handling (and similarly, database security). The scientific technique has become effectively unquestionable, but the rest of the chain of proof is as human as ever.

  2. 1. based on ~13 STRs alone, this is correct. 
    2. i would have guessed that the STRs in use would have been even more “informative” but I don’t know exactly how that’s measured. most STRs have 15-50 alleles, whereas the SNPs that we’d use would have just 2 
    3. yes, this is correct. however, the power of 1k or 10k SNPs to ascertain identity would eliminate problems associated with degraded samples. contaminated samples would continue to be a problem, especially if trace amounts of DNA are being used to find a match. thus, when the only evidence linking a person to a crime is DNA, the most probable explanation should be some kind of human error.

  3. Universal coverage would also make it easier to find potential donors for things like bone marrow transplants, and so on.

  4. loki, you’d need to genotype HLA to get tissue matches.

  5. I don’t see why the f***ing government should have the power to take DNA samples from me. Even if I trusted my present government (under the control of Tony B. Liar), how can I be sure it won’t be replaced by one that I don’t trust – who would then have access to all the data?
