Our mission is to make biology easier to engineer. Ginkgo is constructing, editing, and redesigning the living world to address the world’s growing challenges in health, energy, food, materials, and more. Our bioengineers use an in-house automated foundry to design and build new organisms. Today, our foundry is developing over 40 different organisms to make products across multiple industries.
Also making use of our foundry is Joyn Bio, a joint venture between Bayer and Ginkgo. Joyn Bio is dedicated to addressing unmet needs in agriculture by applying synthetic biology approaches to engineer microbial solutions for growers globally. To support this important partnership and help drive its research, we’re looking for an experienced Senior Software Data Engineer to join our team. This role will be critical in architecting the software platform that supports the analytics and machine learning that will ultimately define how our bioengineering is performed at scale.
Our programming languages of choice are Python and SQL, as well as DNA, but you must be someone who loves writing elegant code in any language. Most importantly, you should be passionate about making biology the next engineering discipline. The 20th century was all about bits and the awesome technology of computers. The 21st century is all about atoms and the awesome technology of biology – and Ginkgo is at the forefront of this revolution.
As an experienced data pipeline builder and data wrangler who enjoys building data systems from the ground up, you’re excited by the prospect of optimizing (or even redesigning) Ginkgo’s data architecture to support our next generation of products and data initiatives. You’ll be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. You’ll also support our software developers, database architects, data analysts, and data scientists on data initiatives, and ensure that data delivery architecture remains consistent across ongoing projects.
You’ll be working in close collaboration with a data science team at Joyn Bio to address the data needs of its shared projects with Ginkgo.
Responsibilities:
- Create and maintain optimal data pipeline architecture
- Identify, design, and implement internal process improvements, including automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies
- Use appropriate tools to analyze the data pipeline and provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics
- Work with stakeholders including the Executive, Product, Data Science, Design, and Computational Biology teams to assist with data-related technical issues and support their data infrastructure needs
- Keep our data secure
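To give a flavor of the extract-transform-load work described above, here is a minimal sketch in Python. It uses the standard library’s sqlite3 as a stand-in for a production warehouse such as Redshift, and all table and column names (`raw_measurements`, `clean_measurements`, `raw_yield`) are hypothetical illustrations, not Ginkgo’s actual schema:

```python
import sqlite3

def extract(conn):
    """Pull raw rows from a hypothetical source table."""
    return conn.execute(
        "SELECT sample_id, raw_yield FROM raw_measurements"
    ).fetchall()

def transform(rows):
    """Drop null readings and normalize yields to a 0-1 scale."""
    valid = [(sid, y) for sid, y in rows if y is not None]
    max_yield = max(y for _, y in valid)
    return [(sid, y / max_yield) for sid, y in valid]

def load(conn, rows):
    """Write cleaned rows to a hypothetical analytics table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clean_measurements "
        "(sample_id TEXT, norm_yield REAL)"
    )
    conn.executemany("INSERT INTO clean_measurements VALUES (?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    # In-memory database with a few toy rows, one of them null.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_measurements (sample_id TEXT, raw_yield REAL)")
    conn.executemany(
        "INSERT INTO raw_measurements VALUES (?, ?)",
        [("s1", 2.0), ("s2", None), ("s3", 4.0)],
    )
    load(conn, transform(extract(conn)))
    print(conn.execute(
        "SELECT sample_id, norm_yield FROM clean_measurements ORDER BY sample_id"
    ).fetchall())
    # → [('s1', 0.5), ('s3', 1.0)]
```

In practice each of these steps would be a task in a workflow manager such as Airflow or Luigi, with the transform pushed down to SQL or Spark as data volumes grow.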
Desired Experience and Capabilities:
- Master’s degree in Computer Science, Statistics, Informatics, Information Systems, or a related quantitative field
- At least 5 years of data engineering experience
- Advanced knowledge of database design best practices, as well as experience working with relational databases, data warehouses, and big data platforms
- Proven capability of building and optimizing “big data” data pipelines, architectures, and data sets
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
- Strong analytical skills in relation to working with unstructured datasets
- Experience building processes that support data transformation, data structures, metadata, dependency, and workload management
- Working knowledge of message queuing, stream processing, and highly scalable “big data” data stores
- Strong project management and organizational skills
- High level of comfort with supporting the data needs of multiple teams, systems, and products
- Strong level of motivation and self-direction
Desired Software Tools/Expertise:
- Big data tools: Hadoop, Hive, Spark, Kafka, etc.
- Relational SQL databases, including Redshift
- Data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- AWS cloud services: EC2, EMR, RDS, Redshift
- Object-oriented/functional scripting languages: Python, Java, C++, Scala, etc.
- Linux (working knowledge)