Big Data: Managing the Flow of Information in Synthetic Biology

October 20, 2015

Big Data. It’s impossible to read business-focused news without running into this term, which is rapidly reaching saturation. But what does it actually mean? And how does it apply to the world of synthetic biology?

Take an enzyme. Make a single point mutation and check its activity. That’s data. Publish the result. Still data. Look at all the publications which have ever examined mutations in your enzyme, and use them to build a model of the factors controlling activity? That’s big data. Couple that to information obtained from the genome sequences of several thousand bacteria to identify interesting homologues for further development? Also big data.

As we develop more and more assays, with more and more information being provided by each, the amount of information that we have to deal with rapidly goes beyond our ability to comprehend. Better methods are needed to sort this data, store it, analyse it, and pull meaning out of the flood of information passing by every moment.

We’ve gathered some of the foremost industry experts in big data and machine learning for SynBioBeta SF 2015, so that they can talk about the role these play in synthetic biology. This session will be moderated by Andrew Phillips, head of the Bio-Computation Group at Microsoft Research, whose experience in computer modelling of biological systems makes him the perfect choice to work with our speakers.

Enzymatic Design: Arzeda

Arzeda is a synthetic biology company located in Seattle, WA. They focus on developing new microbial factories, in particular through the design of enzymes to catalyse novel chemical reactions. The goal is to produce industrially valuable compounds which cannot otherwise be produced by biological methods. Arzeda uses in silico enzyme engineering, searching through a vast database of potential enzyme structures to find those which show the best activity in simulated assays.
Once a short-list of candidates has been chosen, Arzeda expresses these proteins for in vitro assays and further development. Using computer simulations wherever possible makes the design process faster and cheaper, allowing the company to develop new products quickly.

Arzeda is currently partnered with DuPont Pioneer to develop new crop traits for upcoming seed variants, and is also developing novel enzymatic pathways to produce industrial precursor chemicals such as levulinic acid and 1,3-butadiene. We’ll be hearing more about these plans from Dr Yih-En Andrew Ban, who will be speaking at the upcoming SynBioBeta SF conference.

Setting the Odds: Koliber Biosciences

Founded in 2014, Koliber Biosciences is a synthetic biology start-up with a strong focus on the fields of data analysis, modelling, and statistics-based experimental design. They use this expertise to assist other companies in several areas, most notably gene pathway optimisation, assay development, and scaling preliminary observations to large-scale predictive models. All of these require the application of statistical techniques to large data sets, thus falling firmly under the ‘big data’ umbrella. To hear more about life under the umbrella, we’re lucky enough to have an upcoming talk by Ewa Lis, founder and CTO of Koliber Biosciences. This position is the latest step in a storied career which includes time as a scientist and team leader at Genomatica, Life Technologies, and Biolight Harvesting, as well as a PhD from the renowned Scripps Research Institute.

Synthesise and optimise: DNA2.0

California-based DNA2.0 was founded in 2003, offering an integrated synthetic biology pipeline covering every step from gene design through to synthesis and cloning.
Combined with their systems for algorithm-assisted protein engineering, DNA2.0 has carved out a strong position in the field of gene synthesis – a position supported by an IP portfolio covering 13 patents, 40+ in-house publications, and a remarkably fruitful Arctic research collaboration. Impressively, they have also released a number of their patented sequences to the scientific community, IP-free, in order to encourage further innovation.

Tri-color plasmid designs, assembled by DNA2.0. Source: Tech Museum iGEM Team, 2014.

DNA2.0’s success, particularly in protein optimisation, relies on algorithmic processing of large protein structure data sets to determine ideal designs. As such, computing technology plays a large part in the services that they can offer. To tell us more about this, we’ll be hearing from Alan Villalobos, current VP Synthetic Biology / Director of Information Technology at DNA2.0.

These speakers will be on stage from 4:30 pm – 5:05 pm during Session 11: Big Data & Machine Learning for Biology. We look forward to seeing you there!