Using Algorithms to Track Down Sex Criminals
In hospitals and police evidence rooms all over the US, thousands of “rape kits” sit untested, and in some cities they may be thrown away entirely. These kits hold potential DNA evidence that could identify or convict the perpetrator of a sexual assault, but a backlog in processing them keeps that information out of reach. Now, though, a recent Stanford University study suggests that a machine learning algorithm can predict which samples in a kit are most likely to yield usable DNA more accurately than human forensic examiners can. That could speed up the prosecution of sexual assault cases and shorten victims’ long wait for justice.
The Challenge of Collecting Evidence
Prosecuting sexual assault cases presents a number of challenges for law enforcement and the legal system. The number of sexual assault cases that make it to trial is far lower than for any other type of crime, and recent statistics indicate that only five out of every 1,000 rapists ever end up in prison. Lawyers acknowledge that sexual assault cases are difficult to prosecute, and there are many reasons for the low number of convictions. One of these obstacles is a lack of physical evidence tying a suspect to the assault.
The so-called “rape kit,” formally known as a Sexual Assault Evidence Kit (SAEK), can provide that evidence. After a sexual assault, medical staff may perform a sexual assault forensic exam, using the kit to collect and store physical evidence such as hair or semen left on the victim’s body, clothes or belongings. These samples are tested for traces of DNA, and any DNA profile obtained is then uploaded for matching to CODIS (the Combined DNA Index System), a national database maintained by the FBI that includes DNA profiles of known offenders.
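What does “matching” involve? The sketch below is a heavily simplified, hypothetical illustration, not how CODIS actually searches: each DNA profile is modeled as a mapping from a short tandem repeat (STR) locus name to the allele values observed there, and an evidence profile “matches” a database profile only if the two agree at every locus they share, with some minimum number of shared loci. The locus names (D8S1179, D21S11, TH01, FGA) are real STR markers used in forensic DNA typing, but every allele value, the `min_loci` threshold and the function names are invented for illustration.

```python
# Toy illustration of database matching; NOT the actual CODIS search algorithm.
# A profile maps an STR locus name to the set of allele values seen at that locus
# (one value if homozygous, two if heterozygous). All data here is invented.

Profile = dict[str, frozenset[int]]


def profiles_match(evidence: Profile, reference: Profile, min_loci: int = 3) -> bool:
    """True if the profiles share at least `min_loci` loci and agree at every shared locus."""
    shared = evidence.keys() & reference.keys()
    if len(shared) < min_loci:
        return False
    return all(evidence[locus] == reference[locus] for locus in shared)


def search_database(evidence: Profile, database: dict[str, Profile], min_loci: int = 3) -> list[str]:
    """Return the IDs of every database profile that matches the evidence profile."""
    return [pid for pid, ref in database.items() if profiles_match(evidence, ref, min_loci)]


# Hypothetical offender profiles (invented allele values).
database: dict[str, Profile] = {
    "offender-001": {"D8S1179": frozenset({13, 14}), "D21S11": frozenset({29, 30}),
                     "TH01": frozenset({6, 9}), "FGA": frozenset({21, 24})},
    "offender-002": {"D8S1179": frozenset({12, 13}), "D21S11": frozenset({28, 31}),
                     "TH01": frozenset({7, 9}), "FGA": frozenset({22, 23})},
}

# A partial evidence profile: only three loci could be typed from the sample.
evidence: Profile = {"D8S1179": frozenset({13, 14}), "D21S11": frozenset({29, 30}),
                     "TH01": frozenset({6, 9})}

print(search_database(evidence, database))  # ['offender-001']
```

Real forensic comparisons also have to deal with partial and mixed profiles and report the statistical weight of a match; this sketch ignores all of that.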
But testing the samples collected in SAEKs is time-consuming and costly, averaging around $1,000 per kit. Testing must be performed by forensic professionals, who try to save time by focusing on the specimens they judge most likely to yield DNA out of the many collected in any given kit. The cost and time involved lead to a massive backlog, and many kits are not processed and tested in time to pursue a sexual assault case. In many cities, the kits simply sit in storage; in others, kits more than 30 days old can simply be thrown away. That happened in New York City, where hospitals have destroyed 840 kits since 2012.
Now, though, a new approach made possible by advanced machine learning could help labs decide which samples in a kit to test, so that DNA evidence is typed more quickly and accurately and becomes available sooner for use in prosecuting assault cases.
Machine Learning Makes Testing Faster and More Accurate
According to a recent report from Stanford University’s Institute for Human-Centered Artificial Intelligence, or HAI, Stanford business professor Lawrence M. Wein has developed a machine learning algorithm that can predict which biological samples in SAEKs are more likely to provide DNA—and its results are more accurate than the recommendations of human examiners.
Wein’s initial research drew on data from the San Francisco Police Department’s Criminalistics Laboratory, which tests SAEKs and collects and stores information on the samples within them deemed most likely to contain DNA. The SFPD database contained data from 868 kits tested over a two-year span (2017 to 2019), which allowed Wein’s team to develop a machine learning model precise enough to predict which items in an SAEK would be most likely to yield DNA profiles worth uploading to the CODIS database.
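The core idea, ranking the items in a kit by how likely each is to yield DNA, can be sketched with a simple classifier. This is a minimal sketch under loud assumptions: it uses logistic regression trained by gradient descent, which is not necessarily the model Wein’s team used, and its two features, `from_body` (1 if the swab was taken from the victim’s body, 0 if from clothing or belongings) and `days_to_exam`, along with the training labels, are invented for illustration and are not drawn from the SFPD data.

```python
import math

# Hypothetical training data: one row per previously tested sample.
# Features (invented for illustration): [from_body, days_to_exam]
# Label: 1 if the sample yielded a usable DNA profile, 0 otherwise.
X = [[1, 0], [1, 1], [1, 2], [1, 3], [0, 0], [0, 1], [0, 2], [0, 3]]
y = [1, 1, 1, 0, 1, 0, 0, 0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: list[float], x: list[float]) -> float:
    """Predicted probability of a usable DNA profile; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X, y, lr: float = 0.1, epochs: int = 2000) -> list[float]:
    """Fit logistic-regression weights with batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi  # gradient of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def rank_samples(w: list[float], kit: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Order a kit's samples from most to least likely to yield DNA."""
    scored = [(name, predict(w, x)) for name, x in kit.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


w = train_logistic(X, y)
new_kit = {"clothing_item": [0, 1], "body_swab": [1, 1], "belongings_item": [0, 3]}
for name, p in rank_samples(w, new_kit):
    print(f"{name}: {p:.2f}")
```

A model like this only ranks samples; how many of them to test remains a policy and budget decision.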
Wein’s results, recently published in the Proceedings of the National Academy of Sciences, showed a 41 percent increase in the number of DNA test results that could be uploaded to CODIS. And although testing every sample in a kit would be more expensive, the algorithm more than doubles the yield of useful samples and could eventually increase the number of sampling targets by 47 percent.
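Percentages like these come from comparing the expected number of usable DNA profiles under two different ways of choosing which samples to test. A 41 percent increase means, for example, that a lab that previously obtained 100 uploadable profiles would obtain about 141 under the new approach. The sketch below shows that kind of comparison for a single hypothetical kit with a budget of two tests; the per-sample probabilities, the examiner’s picks and the budget are all invented, and the result does not reproduce the study’s figures.

```python
# Hypothetical predicted probabilities that each sample in one kit yields
# a CODIS-eligible DNA profile (invented numbers).
probs = {"s1": 0.8, "s2": 0.6, "s3": 0.3, "s4": 0.2, "s5": 0.1}
budget = 2                     # hypothetical: the lab can afford two tests for this kit
examiner_picks = ["s2", "s3"]  # hypothetical human selection


def expected_profiles(picks: list[str]) -> float:
    """Expected number of usable profiles: the sum of the chosen samples' probabilities."""
    return sum(probs[s] for s in picks)


def model_picks(k: int) -> list[str]:
    """Choose the k samples with the highest predicted probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


def percent_increase(new: float, old: float) -> float:
    return 100.0 * (new - old) / old


human = expected_profiles(examiner_picks)
model = expected_profiles(model_picks(budget))
print(f"examiner: {human:.2f}, model: {model:.2f}, "
      f"increase: {percent_increase(model, human):.1f}%")
```

Summing probabilities gives the expected count by linearity of expectation, so no assumption about independence between samples is needed for this particular calculation.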
These initial results suggest that combining sophisticated machine learning algorithms with “big data” (massive datasets collected from multiple sources) could both speed up SAEK processing and improve its accuracy throughout the US, and could encourage the release of more federal and state funding for processing the kits.
How Does Machine Learning Work?
Machine learning is a branch of artificial intelligence: it gives computers and other systems the ability to learn from experience and past data, so that they can evaluate situations and make decisions without being explicitly programmed for each one. Everyday applications of machine learning include search engines and platforms that offer suggestions based on a user’s past actions.
Like humans, machines can learn, and they can be “taught” in a variety of ways to identify patterns and make predictions. An algorithm provides a blueprint for the task the machine is expected to do. In supervised learning, the approach suited to prediction tasks like Wein’s, the machine is trained on examples whose correct answers are already known: its predictions are checked against those answers, and it adjusts itself to reduce its errors. Other approaches, such as reinforcement learning, instead reward the machine for good decisions. Eventually the machine’s predictions become accurate enough that it can perform the task on new cases on its own.
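The check-and-correct loop of supervised learning can be seen in one of the oldest learning algorithms, the perceptron. This is a deliberately tiny illustration, not the method used in Wein’s study: the four labeled training points are arbitrary, and the weights change only when a prediction is wrong, repeating until every example is classified correctly.

```python
# Labeled training examples: (features, label), with labels +1 or -1.
# The data is arbitrary; the label is +1 only when both features are 1.
examples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]


def classify(w: list[float], b: float, x: tuple[int, int]) -> int:
    """Predict +1 if the weighted sum is positive, otherwise -1."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


def train_perceptron(examples, max_epochs: int = 20) -> tuple[list[float], float]:
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in examples:
            if classify(w, b, x) != label:
                # Wrong answer: nudge the weights toward the correct label.
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
                mistakes += 1
        if mistakes == 0:  # every example is now classified correctly
            break
    return w, b


w, b = train_perceptron(examples)
print([classify(w, b, x) for x, _ in examples])  # [-1, -1, -1, 1]
```

Correct predictions leave the model unchanged; only errors drive learning, which is the essence of training against known answers.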
Access to large amounts of data is essential for machine learning, and cloud-based storage platforms make more data available than ever before. In general, the more relevant data a model is trained on, the more accurate its predictions tend to become. That makes it possible to carry out complex operations on massive datasets in a fraction of the time it would take humans, while reducing the “human error” that can produce wrong or incomplete results.
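One simple way to see why more data helps: if a model has to estimate a rate from past cases, such as the fraction of a given sample type that yields DNA, the statistical uncertainty of that estimate shrinks with the square root of the number of cases. The sketch below uses the standard error of a proportion, sqrt(p(1 - p)/n), with an invented rate of p = 0.3; it illustrates diminishing uncertainty and does not model any real lab’s data.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent cases."""
    return math.sqrt(p * (1 - p) / n)


p = 0.3  # hypothetical true rate at which a sample type yields DNA
for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: standard error = {standard_error(p, n):.4f}")
# n =    100: standard error = 0.0458
# n =   1000: standard error = 0.0145
# n =  10000: standard error = 0.0046
```

Each tenfold increase in data cuts the uncertainty by only about a factor of 3.2 (the square root of 10), so the gains diminish, and more data does not correct for biased or mislabeled records.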
In that way, the combination of sophisticated machine learning processes and the large datasets provided by the SFPD made Wein’s initial research possible and demonstrated its potential to change the way sexual assault cases are investigated and prosecuted.
“After this traumatic experience,” says Wein, “for the kits to just sit there and never get tested is unspeakable.” But as his work shows, advanced machine learning algorithms can help make “rape kit” testing faster, cheaper and more accurate. Reducing the backlog of untested kits could lead to more convictions in sexual assault cases and offer closure to victims forced to wait months or even years for their cases to move forward.