Shower thought here:
Imagine sequencing seed DNA and storing it in a digital seed bank. Useful genes could later be printed out and edited back into crops. A backup drive for life.
Sci-fi? Probably not:
- The cost of gene sequencing is falling faster than Moore’s Law.
- DNA printing is routine. There are even startups developing tabletop gene printers.
- CRISPR has made it fast and cheap to edit genes, and it’s improving rapidly.
An obvious question that comes to mind is storage. Genes contain a lot of information, and seeds themselves are highly efficient natural storage mechanisms… All that information packed into tiny living DNA pods that can survive drying.
But seeds have to be kept cold and dry. The rule of thumb is that reducing water content by 1% or temperature by 10 degrees Fahrenheit will double a seed’s life span.
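To see how that rule of thumb compounds, here's a tiny sketch. It assumes the two effects simply multiply (each 1% of moisture removed and each 10°F of cooling counts as one doubling) and that the rule holds across whatever range you plug in, which in practice it only roughly does. `lifespan_multiplier` is a made-up helper name.

```python
def lifespan_multiplier(moisture_drop_pct: float, temp_drop_f: float) -> float:
    """Rough storage-life multiplier under the rule of thumb.

    Assumption: every 1 percentage point of moisture removed and every
    10 degrees Fahrenheit of cooling each count as one doubling, and the
    doublings stack multiplicatively.
    """
    doublings = moisture_drop_pct / 1.0 + temp_drop_f / 10.0
    return 2.0 ** doublings


if __name__ == "__main__":
    # Drying by 2 points and cooling by 30°F: 2 + 3 = 5 doublings.
    print(lifespan_multiplier(2, 30))  # 32.0
```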
Seed banks are our current approach to “datacenters” for genetic diversity. The Svalbard Global Seed Vault is embedded in permafrost… the ideal environment for keeping seeds around a long time.
Backing It Up
Of course, having a digital backup wouldn’t hurt, and digitizing the genetic information could be useful for lots of other reasons. Gene data is a valuable resource for scientific analysis. Indeed, the Jodrell Laboratory maintains a digital DNA barcode bank (not full sequences).
So how much space would full genomes take? Time for some silly back-of-the-envelope math. Whole-genome sequencing generates a lot of data: there are about six billion base pairs in each human diploid genome. Storing that can take anywhere from 125MB (if you’re just storing mutations) to 200GB.
Word on the street is that all of the video on YouTube comes out to about 100 petabytes.
1PB = 1,000,000 GB
100PB = 100,000,000 GB
Human genome = 200 GB
100,000,000 GB / 200 GB = 500,000
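The same arithmetic as a few lines of Python, using decimal units (1 PB = 10^6 GB) and the rough 100 PB and 200 GB figures above. `genomes_that_fit` is just an illustrative helper.

```python
# Back-of-the-envelope: how many 200 GB human genomes fit in ~100 PB?
GB_PER_PB = 1_000_000          # decimal units: 1 PB = 10^6 GB
YOUTUBE_GB = 100 * GB_PER_PB   # the "word on the street" 100 PB estimate
HUMAN_GENOME_GB = 200          # upper-end per-genome figure from above


def genomes_that_fit(capacity_gb: int, genome_gb: int) -> int:
    """Number of whole genomes that fit in capacity_gb (rounded down)."""
    return capacity_gb // genome_gb


print(genomes_that_fit(YOUTUBE_GB, HUMAN_GENOME_GB))  # 500000
```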
So we could store 500k individual human genomes in the space of one YouTube. Not terribly efficient. For reference, the Millennium Seed Bank physically stores 34,000 species and 1,980,405,036 individual seeds.
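However, YMMV with seed genomes. The human genome is about 6 billion base pairs, or 6,000Mbp, counting both sets of chromosomes; by contrast, a tomato genome is about 900Mbp and soybean about 1,115Mbp (those plant figures count a single set, so the comparison below flatters the plants by about 2x).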
Full human genome = 6,000Mbp and 200GB
Full soy genome = 1,115Mbp
6,000 / 1,115 = 5.381
200GB / 5.381 = 37.167GB
Let’s say 40GB per full plant genome. That means you could store roughly 5x as many tomato or soybean genomes as human genomes in the same space.
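Here's the scaling step as code, with tomato thrown in too. Like the math above, it assumes storage grows linearly with genome length; `estimated_size_gb` is a hypothetical helper.

```python
HUMAN_MBP = 6_000  # human genome length used above, in Mbp
HUMAN_GB = 200     # storage figure used above for a full human genome


def estimated_size_gb(genome_mbp: float) -> float:
    """Scale the 200 GB human figure linearly by genome length."""
    return HUMAN_GB * genome_mbp / HUMAN_MBP


for name, mbp in [("soybean", 1_115), ("tomato", 900)]:
    size = estimated_size_gb(mbp)
    # How many of these fit in the space of one human genome
    print(f"{name}: {size:.1f} GB, {HUMAN_GB / size:.1f}x as many as human genomes")
# soybean: 37.2 GB, 5.4x as many as human genomes
# tomato: 30.0 GB, 6.7x as many as human genomes
```

Rounding up to 40GB gives exactly 5x (200 / 40), so the unrounded ratios land a bit higher.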
And we could get creative. The difference between individuals in a species is the sum of their mutations, and we can store those mutation diffs in roughly 125MB per individual.
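What might a “mutation diff” look like? Here's a toy sketch: each individual is a list of (position, reference bases, alternate bases) records against the base genome, loosely modeled on the POS/REF/ALT columns of a VCF file (but 0-based here, where VCF is 1-based). The names `Variant`, `snp_diff`, and `apply_diff` are hypothetical, and real variant pipelines are far more involved.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One difference from the base genome: starting at 0-based `pos`,
    the base-genome bases `ref` are replaced by `alt`."""
    pos: int
    ref: str
    alt: str


def snp_diff(base: str, individual: str) -> list[Variant]:
    """Diff two equal-length sequences, recording single-base substitutions only."""
    if len(base) != len(individual):
        raise ValueError("this sketch only diffs equal-length sequences")
    return [Variant(i, b, x) for i, (b, x) in enumerate(zip(base, individual)) if b != x]


def apply_diff(base: str, variants: list[Variant]) -> str:
    """Rebuild an individual from the base genome plus its (non-overlapping) variants.

    Working from the highest position down keeps lower offsets valid, so
    insertions and deletions (ref and alt of different lengths) also work.
    """
    seq = base
    for v in sorted(variants, key=lambda r: r.pos, reverse=True):
        if seq[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        seq = seq[:v.pos] + v.alt + seq[v.pos + len(v.ref):]
    return seq


base = "ACGTACGTAC"
individual = "ACGAACGTTC"
diff = snp_diff(base, individual)
print(diff)  # [Variant(pos=3, ref='T', alt='A'), Variant(pos=8, ref='A', alt='T')]
assert apply_diff(base, diff) == individual
```

Only the two differing positions get stored; the hope is that for a real genome the diff list stays tiny next to the billions of bases it's measured against.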
What if we were to sequence 34,000 species, then store individuals of those species as diffs against a “base genome”? Let’s say we store 34,000 full “base” genomes, one for each species in the Millennium Seed Bank, and that each genome takes about 40GB.
34,000 species * 40GB = 1,360,000GB
YouTube = 100,000,000GB
100,000,000GB - 1,360,000GB = 98,640,000GB
125MB = 0.125GB
98,640,000GB / 0.125GB = 789,120,000
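The same budget in Python, doing the final division in MB so everything stays exact integer math. The constants are just the rough figures from above.

```python
YOUTUBE_GB = 100 * 1_000_000   # ~100 PB in decimal GB
SPECIES = 34_000               # species held by the Millennium Seed Bank
BASE_GENOME_GB = 40            # rounded size of one full "base" plant genome
DIFF_MB = 125                  # ~125 MB per individual stored as a mutation diff

base_total_gb = SPECIES * BASE_GENOME_GB        # space for all base genomes
remaining_gb = YOUTUBE_GB - base_total_gb       # space left over for diffs
individuals = remaining_gb * 1_000 // DIFF_MB   # convert to MB, then divide

print(f"{base_total_gb:,} GB of base genomes, {remaining_gb:,} GB left")
print(f"{individuals:,} individuals")
# 1,360,000 GB of base genomes, 98,640,000 GB left
# 789,120,000 individuals
```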
34k species + 789,120,000 individuals in the space of a YouTube. Still not as efficient as traditional seed banks, but it seems within the realm of plausibility to create a useful digital seed bank.
It makes me wonder if we couldn’t take a SETI@home approach to the storage problem… donate a bit of your hard drive space to a BitTorrent swarm that keeps those valuable seed genomes alive. Lots of Copies Keep Stuff Safe.
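Why lots of copies helps, in one line of math: if each peer independently loses its copy with probability p, all n copies vanish with probability p^n. That independence is a big assumption (peers in a real swarm come and go, and failures can be correlated), and the 20% loss rate below is a made-up number.

```python
def prob_all_copies_lost(p_loss: float, copies: int) -> float:
    """Chance that every replica is gone, assuming each of `copies` peers
    independently loses its copy with probability `p_loss`."""
    return p_loss ** copies


# Hypothetical per-peer loss rate of 20% over some period.
for n in (1, 3, 10):
    print(f"{n:>2} copies: {prob_all_copies_lost(0.2, n):.2e}")
```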
Disclaimer: this is me being curious. I’m no expert. Please chime in if you have corrections.