Combining next-generation sequencing technology with a novel encoding strategy, a Harvard geneticist writes his book in life's language, storing 1,000 times more data in DNA than the previous record
Although George Church's next book doesn't hit the shelves until Oct. 2, it has already passed an enviable benchmark: 70 billion copies, roughly triple the combined total of the top 100 books of all time.
And they fit on your thumbnail.
That's because Church, the Robert Winthrop Professor of Genetics at Harvard Medical School and a founding core faculty member of the Wyss Institute for Biomedical Engineering at Harvard University, and his team encoded the book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, in DNA, which they then read and copied.
Biology's databank, DNA has long tantalized researchers with its potential as a storage medium: fantastically dense, stable, energy efficient and proven to work over a timespan of some 3.5 billion years. While not the first project to demonstrate the potential of DNA storage, Church's team married next-generation sequencing technology with a novel strategy to encode 1,000 times the largest amount of data previously stored in DNA.
The team reports its results in the Aug. 17 issue of the journal Science.
The researchers used binary code to preserve the text, images and formatting of the book. While the total amount of data is roughly what a 5 ¼-inch floppy disk once held, the density of the bits is nearly off the charts: 5.5 petabits, or 5.5 million gigabits, per cubic millimeter. "The information density and scale compare favorably with other experimental storage methods from biology and physics," said Sri Kosuri, a senior scientist at the Wyss Institute and senior author on the paper. The team also included Yuan Gao, a former Wyss postdoc who is now an associate professor of biomedical engineering at Johns Hopkins University.
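Assuming the usual decimal SI prefixes (peta = 10^15, giga = 10^9), converting the reported density into gigabits works out as follows:

```latex
5.5\ \text{petabits/mm}^3
  = 5.5 \times 10^{15}\ \text{bits/mm}^3
  = \frac{5.5 \times 10^{15}}{10^{9}}\ \text{gigabits/mm}^3
  = 5.5 \times 10^{6}\ \text{gigabits/mm}^3
```

By the same prefixes, 1 million gigabits is exactly 1 petabit.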
And whereas some experimental media, such as quantum holography, require extremely cold temperatures and tremendous energy, DNA is stable at room temperature. "You can drop it wherever you want, in the desert or your backyard, and it will be there 400,000 years later," Church said.
Reading and writing in DNA is slower than in other media, however, which makes it better suited for archival storage of massive amounts of data, rather than for quick retrieval or data processing. "Imagine that you had really cheap video recorders everywhere," Church said. "Just paint walls with video recorders. And for the most part they just record and no one ever goes to them. But if something really good or really bad happens you want to go and scrape the wall and see what you got. So something that's molecular is so much more energy efficient and compact that you can consider applications that were impossible before."
In theory, about four grams of DNA could store all the digital data humankind creates in one year.
Although other projects have encoded data in the DNA of living bacteria, the Church team used commercial DNA microchips to create standalone DNA. "We purposefully avoided living cells," Church said. "In an organism, your message is a tiny fraction of the whole cell, so there's a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn't earn its keep, if it isn't evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it."
In another departure, the team rejected so-called "shotgun sequencing," which reassembles long DNA sequences by identifying overlaps in short strands. Instead, they took their cue from information technology and encoded the book in 96-bit data blocks, each with a 19-bit address to guide reassembly. Including JPEG images and HTML formatting, the book's code required 54,898 of these data blocks, each a unique DNA sequence. "We wanted to illustrate how the modern world is really full of zeroes and ones, not As through Zs alone," Kosuri said.
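The addressing idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the team's actual pipeline: the function names, the choice to put the 19-bit address before the 96-bit payload, and the zero-padding of the final block are all assumptions made for the sketch, and it skips both the conversion of bits into nucleotides and any handling of sequencing errors. What it does show is why addresses make overlap-based assembly unnecessary: blocks can come back in any order and still be put back together by sorting on their address.

```python
import random

BLOCK_BITS = 96    # payload bits per block (figure from the article)
ADDRESS_BITS = 19  # address bits per block (figure from the article)


def encode_blocks(bits: str) -> list[str]:
    """Split a bitstring into addressed blocks (hypothetical layout).

    Each block is a 19-bit address followed by a 96-bit payload.
    The final block is right-padded with zeros (an assumption of this sketch).
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    n_blocks = -(-len(bits) // BLOCK_BITS)  # ceiling division
    if n_blocks > 2 ** ADDRESS_BITS:
        raise ValueError("too many blocks for a 19-bit address space")
    blocks = []
    for i in range(n_blocks):
        payload = bits[i * BLOCK_BITS:(i + 1) * BLOCK_BITS].ljust(BLOCK_BITS, "0")
        address = format(i, f"0{ADDRESS_BITS}b")  # zero-padded binary index
        blocks.append(address + payload)
    return blocks


def decode_blocks(blocks: list[str], length: int) -> str:
    """Reassemble blocks received in any order by sorting on their address."""
    ordered = sorted(blocks, key=lambda b: int(b[:ADDRESS_BITS], 2))
    payload = "".join(b[ADDRESS_BITS:] for b in ordered)
    return payload[:length]  # drop the zero padding from the last block


# Usage: encode a title as ASCII bits, shuffle the blocks to mimic strands
# read back in arbitrary order, then reassemble by address.
title = "Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves"
message = "".join(format(byte, "08b") for byte in title.encode("ascii"))
blocks = encode_blocks(message)
random.shuffle(blocks)
restored = decode_blocks(blocks, len(message))
print(restored == message)
```

At the article's scale, 54,898 blocks × 96 payload bits comes to 5,270,208 bits, about 5.27 megabits or roughly 659 kilobytes, which is in floppy-disk territory. A 19-bit address can label up to 2^19 = 524,288 blocks, comfortably more than the 54,898 needed.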
The team discussed including a DNA copy with each print edition of Regenesis. But in the book, Church and his co-author, the science writer Ed Regis, argue for careful supervision of synthetic biology and the policing of its products and tools. Practicing what they preach, the authors decided against a DNA insert—at least until there has been far more discussion of the safety, security and ethics of using DNA this way. "Maybe the next book," Church said.