Multifactorial optimisation of a molecular biology process with LabGenius

Combining active learning with Antha’s multifactorial optimisation for a single-stranded DNA (ssDNA) extension reaction

 

Key Findings

LabGenius combined Antha with their proprietary EVA platform:

 
  • Antha enabled LabGenius to physically execute multifactorial experimental designs in an automated and autonomous manner to optimise their ssDNA extension process.

  • Antha improved efficiency, generating up to a 70% time saving on experimental design and execution, and reducing lab consumable use by up to 30%.

  • Integrating Antha enabled LabGenius’ EVA platform to autonomously optimise and facilitate library generation and construction above a minimum efficiency threshold.

  • Antha’s automated multifactorial design interpretation, experimental planning and execution save up to 15 hours per DNA library optimisation.

 
 
 

“Outside of all of the benefits that Antha has provided us for multifactorial experiment optimisation it has empowered our automation mindset.” 

LabGenius User Testimonial

 

Staffan from LabGenius describes the benefits Antha has brought to their science.

“Antha has enabled us at LabGenius to target a new area of work in our automation efforts not possible before. It is what we call on-the-fly automation.”

LabGenius User Testimonial

 

Introduction

LabGenius have developed EVA, an autonomous AI-driven evolution engine for the discovery of next-generation protein therapeutics. LabGenius use their cutting-edge platform technology to develop new protein-based drugs with world-leading pharmaceutical companies.

The EVA platform utilises an AI-driven DNA library design strategy. It also plans, optimises and physically executes, via Antha, the production of high-quality DNA libraries encoding a host of evolved therapeutic proteins of interest, ranging from enzymes to antibodies. Engineering objectives range from improved or altered binding affinity or specificity to thermostability or resistance to proteases. With every iteration of an experimental campaign, the EVA platform learns and improves itself for future experiments, mapping the sequence-to-function landscape of protein therapeutics.

LabGenius started working with Antha in 2017 and have since automated upwards of 20 unique workflows of value to them. Integral to LabGenius’ EVA platform is the ability for Antha to rapidly and flexibly interpret and drive the physical execution of multifactorial optimisation experiments on liquid handling platforms in a hardware-agnostic manner. 

 
 
 

 Executive Summary

LabGenius employed Antha in their experimental labs to improve the efficiency and robustness of single-stranded DNA (ssDNA) extension, a key process in their DNA library generation method. Recovering the maximum diversity during single strand extension of DNA fragments encoding degeneracy is critical to the EVA platform.

Every ssDNA extension reaction requires a different priming oligonucleotide. The ability to comprehensively investigate and re-optimise the extension process for every new DNA library fragment without adding significant cycle time is of utmost importance, and critical to improving efficiency.

The repeated context-specific optimisation of the ssDNA extension process would be extremely challenging to execute manually within the expected time frames of DNA library generation, owing to the number of experimental runs and the complexity of the experimental designs. However, by integrating Antha’s rapid and flexible digital interpretation and physical execution of complex multifactorial experimental designs into the EVA platform, LabGenius can autonomously design, plan and physically execute process optimisation as part of their end-to-end platform.

The Antha-executed optimisation process has provided a reusable framework to identify robust experimental conditions, on a case-by-case basis, that achieve double-stranded DNA (dsDNA) recovery efficiencies above the minimum threshold required for maximising DNA library diversity. Antha’s automated multifactorial design interpretation, experimental planning and execution save up to 5 hrs per DNA library fragment optimisation compared with executing the same experiments manually. Given that up to three library fragments are often optimised per full library generation, Antha saves up to 15 hrs per full library optimisation.

 
 

 Active Learning and Protein Engineering

In 1978 an oligonucleotide method was first demonstrated for altering specific bases in a target gene.[1] This provided the ability to sample substitution mutations in proteins, leading the way to understanding the molecular mechanisms of biocatalysis, substrate specificity, enantioselectivity, stability and protein folding. Fast forward to the modern day and we now have access to an extensive molecular biology toolbox. We can sample substitution, insertion and deletion mutations, and even expand the genetic code to exploit the novel chemistries of non-canonical amino acids.

With these new tools we can now exploit the protein sequence landscape to produce high value improved and/or novel proteins to satisfy market need. The technology developments have contributed to an estimated protein engineering market size of >$1b in 2018 with an expected compound annual growth rate (CAGR) of 15.5% placing the estimated global market worth at >$2.5b by 2024.
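The market figures above can be sanity-checked with the standard compound growth formula. The sketch below is illustrative only: the function names are hypothetical, and it assumes six years of compounding (2018 to 2024) at the quoted 15.5% rate.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound a starting value forward at a fixed annual growth rate."""
    return start_value * (1.0 + cagr) ** years


def required_start(end_value: float, cagr: float, years: int) -> float:
    """Starting value needed to reach end_value after compounding."""
    return end_value / (1.0 + cagr) ** years


# Values in billions of dollars; 2018 -> 2024 is assumed to be six periods.
print(round(project_value(1.0, 0.155, 6), 2))
print(round(required_start(2.5, 0.155, 6), 2))
```

Under these assumptions, a 2018 market of exactly $1b would grow to roughly $2.4b, so the ">$2.5b by 2024" estimate implies a 2018 starting value slightly above $1b, which is consistent with the ">$1b" wording.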

Traditional protein engineering approaches such as rational design (site-directed mutagenesis) and directed evolution (random mutagenesis) have brought us closer to understanding the protein sequence-structure-function relationship.[2] However, we are still a long way from being able to predictively engineer proteins to elicit a single specific desired improvement, and further still from being able to engineer multiple improvements simultaneously.

The increased throughput afforded by liquid handling automation means a greater wealth of data can be acquired to help us better navigate the protein sequence landscape and identify mutant variants with desirable improved properties. However, higher throughput brings additional screening cost.

Throughput alone does not get us close enough to the goal of understanding how to predictively design proteins with improved, altered or novel functions. A step change is required to marry the throughput afforded by automation with smart experimental design and data analysis.[3]

A new wave of advances in protein engineering techniques does just this, combining multifactorial experimental design, the increased throughput of automation and machine learning algorithms for design generation. This approach identifies patterns in multidimensional data sets that the human brain is not capable of perceiving. Because it works without prior human knowledge or imposed bias, it also avoids recognising pseudo or false patterns.

LabGenius are driving this new method of protein engineering forward with their EVA platform. 

In nature, organisms evolve slowly over many generations. Evolution is a remarkably inefficient and unpredictable process. LabGenius are unpicking the rules that underpin biological systems such that their machine learning algorithms can predict which mutations will improve a biological design. 

The predictive power of their novel machine learning algorithms constantly improves with data from every protein engineering campaign conducted, along with a constantly growing wealth of open-access data sets.

The EVA platform takes as input the DNA sequence of a known protein scaffold of interest, or the library sequence of a previous iteration, coupled to next generation sequencing and screening data. Through application of its machine learning algorithms, it generates a proposed library design or re-design targeting specific mutation sites in the target protein. The mutation sites identified can come from known sequence-to-function models and public data for the desired scaffold, alongside human-informed target sites if desired.

The next step is producing a DNA library that satisfies the generated design. After DNA library generation, protein production and functional screening, design objective data is fed back into the EVA platform for subsequent rounds of library generation and screening (Figure 1). This allows for an intelligent navigation through protein sequence space until protein variants are identified that satisfy the design objectives whilst simultaneously improving the underlying machine learning methodology for subsequent iterations or campaigns.

 
 

Figure 1. Classical vs Machine Learning driven directed evolution protein engineering methods and the EVA platform. Top panel, classical directed evolution techniques use an iterative approach of library diversity generation and variant screening whereby improved variants are used as template sequences for subsequent iterations and knowledge of unimproved variants is discarded. Classical directed evolution approaches tend to sample mutation space local to that of the parent, limiting the sequence space investigated. Middle panel, Machine Learning driven directed evolution approaches use the data derived from variant screening of both improved and unimproved variants to direct the next round of mutations to be sampled. Machine Learning approaches investigate broader sequence space, allowing for multiple solutions to be identified (Figure adapted from [3]). Bottom panel, the LabGenius EVA platform uses Active Learning algorithms to explore a broad sequence space and fitness landscape, providing a DNA library design, and applies a DoE and Active Learning multifactorial optimisation to each library generation process, followed by Machine Learning on collected data to inform subsequent iterations.

 

 

 DNA Library Generation

 

Key to any protein engineering campaign is the generation of mutations in target DNA encoding the protein of interest. LabGenius’ proprietary process produces high quality mutated DNA libraries through a single strand DNA extension and cloning process.[4] In this process each ssDNA starting unit containing genetic code degeneracy at key sites, identified by the EVA machine learning algorithms, is converted to dsDNA. 

A critical criterion for DNA library production is maximising sequence diversity whilst minimising sequence bias. Polymerase Chain Reaction (PCR) based library generation approaches are known to yield libraries with sequence bias and often result in a loss of diversity when amplifying DNA templates with encoded degeneracy. LabGenius circumvent the latter problem via a single strand DNA extension process (Figure 2), delivering greater diversity in comparison to PCR amplification methods by negating the effects of amplification bias inherent to PCR.

The process of single strand DNA extension, much like PCR, requires annealing of a complementary oligonucleotide to prime against the ssDNA template for 5’ to 3’ polymerase extension. The priming oligonucleotide contains additional DNA sequence at its 5’ terminus that encodes a uracil residue (a nucleotide predominantly seen in RNA) critical to downstream DNA library fragment assembly and cloning by the USER method (Figure 2).[5] The ssDNA extension step is driven by a uracil-competent polymerase. The downstream USER cloning method subsequently utilises enzymes that cut the resulting dsDNA product only at the uracil base, revealing a sequence-specific single-stranded overhang ready for annealing to similarly prepared DNA library fragments that carry the complementary overhang (Figure 2).

Every pairing of DNA library fragment and priming oligonucleotide differs in sequence, for example in guanine-cytosine (GC) content, secondary structure formation and priming oligonucleotide annealing temperature. As a result, the initial annealing and single strand DNA extension process can have highly varying efficiencies.

Unlike PCR amplification, single strand DNA extension is a 1:1 process: each ssDNA template yields at most one dsDNA molecule, so conversion of ssDNA to dsDNA cannot exceed 100%. The efficiency of dsDNA recovery therefore dictates how much of the diversity of the initial library sequences is retained.

At LabGenius a minimum recovery rate threshold is required to give a representative diversity of the starting ssDNA library fragment. However, optimisation is required on a case-by-case basis to achieve or exceed the minimum recovery rate threshold as the efficiency of the ssDNA extension process can be highly variable given different starting material.
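The recovery check described above reduces to a simple ratio. The following is a minimal sketch, assuming recovery is expressed in molar terms (dsDNA molecules recovered per ssDNA molecule supplied); the function names and the example threshold are hypothetical, since the source does not state how LabGenius compute or set the threshold.

```python
def recovery_percent(dsdna_pmol: float, ssdna_input_pmol: float) -> float:
    """dsDNA recovered as a percentage of the ssDNA starting material.

    Because extension is 1:1, a value above 100 would indicate a
    measurement problem rather than a genuinely better reaction.
    """
    if ssdna_input_pmol <= 0:
        raise ValueError("ssDNA input must be positive")
    return 100.0 * dsdna_pmol / ssdna_input_pmol


def meets_threshold(recovery: float, minimum: float) -> bool:
    """True when the recovery reaches the required minimum."""
    return recovery >= minimum


# Hypothetical numbers: 10 pmol ssDNA in, 2.5 pmol dsDNA recovered.
recovery = recovery_percent(2.5, 10.0)
print(recovery, meets_threshold(recovery, minimum=25.0))
```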

 
 

Figure 2. Single strand DNA extension and USER cloning method. The LabGenius library generation and build process starts with a ssDNA extension process, utilising priming oligonucleotides encoding a uracil base for USER cloning compatibility and homology sequences for neighbouring DNA parts. After primer annealing and ssDNA extension, DNA excision at the incorporated uracil base on each dsDNA fragment reveals single-stranded overhangs with complementary sequence to neighbouring DNA parts. Entry plasmid DNA is prepared by a similar method, ready for overhang annealing and ligation of generated dsDNA library fragments (Figure adapted from [4]).

 

Active Learning, multifactorial optimisation and Antha

 

To address the case- and context-specific optimisation of the ssDNA extension process, LabGenius are employing an active learning driven multifactorial optimisation methodology.

Active Learning (AL), also known as optimal experimental design, is a special case of Machine Learning (ML) which uses a learning algorithm to interactively request additional data from the user. This is done in an iterative manner until the desired objective is achieved. 

LabGenius employ a hybrid experimental approach, starting with a Design of Experiments (DoE) multifactorial experimental design (e.g. Fractional Factorial, Full Factorial or Response Surface) that allows them to screen key factors in their ssDNA extension process. Data collected from this initial experimental design is analysed by a Bayesian ‘exploit and explore’ AL algorithm to navigate through the experimental landscape in an autonomous manner. In subsequent iterations the algorithm exploits regions of design space that the prior data shows to perform well, whilst exploring new regions to better characterise the experimental landscape, potentially identifying condition sets that fall on alternative experimental optima.
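The ‘exploit and explore’ idea can be illustrated with a small Bayesian upper-confidence-bound rule. This is a sketch under stated assumptions, not the EVA algorithm, whose details the source does not give: each candidate condition gets an independent Normal posterior (known noise variance, hypothetical prior values), and the next condition to test is the one with the highest posterior mean plus a multiple of its posterior standard deviation. A production system would more likely use a surrogate model that shares information between neighbouring conditions.

```python
from math import sqrt


def posterior(observations, prior_mean=15.0, prior_var=100.0, noise_var=4.0):
    """Normal-Normal conjugate update with known noise variance.

    Returns the posterior (mean, variance) of the true response.
    All default values are hypothetical.
    """
    n = len(observations)
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1.0 / precision


def next_condition(data, kappa=2.0):
    """Pick the condition maximising posterior mean + kappa * posterior sd.

    kappa = 0 is pure exploitation; larger kappa favours conditions
    with few or no observations (exploration).
    """
    def score(condition):
        mean, var = posterior(data[condition])
        return mean + kappa * sqrt(var)

    return max(data, key=score)


# Hypothetical recoveries for three condition sets; one is untested.
example_data = {
    ("low", "low"): [14.0, 16.0],
    ("high", "low"): [22.0, 24.0],
    ("high", "high"): [],
}
print(next_condition(example_data, kappa=2.0))  # favours the untested condition
print(next_condition(example_data, kappa=0.0))  # favours the best observed mean
```

The single parameter kappa makes the trade-off explicit: the same data leads to a different next experiment depending on how much weight uncertainty is given.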

Using the Antha DoE suite of elements (Figure 3), LabGenius are able to rapidly and flexibly define an automated liquid handling protocol. Antha interprets a DoE design file, simulates all of the resulting low-level liquid handling instructions and then executes the experimental run on a Gilson PIPETMAX™ (Figure 4). The LabGenius EVA platform automatically generates the DoE design file and Antha workflow, saving an estimated several hours compared with hard coding an equivalent method into alternative liquid handling platforms’ proprietary software. The alternative software also lacks the breadth and flexibility that the Antha workflow provides.

Antha’s DoE elements provide features that allow you to rapidly prototype and optimise DoE execution in silico whilst minimising the estimated physical execution run time, number of liquid handling transfers and consumables used during the run. The Antha workflow prepares intermediate mixtures of user-specified factors from the design file, before further distributing and mixing these master mixes in such a way as to achieve the final set point concentrations of all factors in a single run (Figures 3 and 4). In doing so for these DoE designs, Antha provided an estimated 35% saving in the number of liquid handling actions required for execution, equating to an estimated 6 hr saving in total run time across a full library optimisation. The saving on liquid handling steps and execution run time also translates to an estimated 30% saving in pipette tip usage (963 tips, or roughly ten 96-tip racks) per library optimisation.
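Reaching a set point concentration from a stock solution follows the dilution relation C1·V1 = C2·V2. The sketch below shows that calculation for one run of a design; it is an illustration only and does not represent how Antha plans master mixes or transfers. The function name, factor names, concentrations and volumes are all hypothetical.

```python
def transfer_volumes(set_points, stocks, final_volume_ul):
    """Volume of each stock needed to hit its set point in one well.

    set_points and stocks map factor name -> concentration (same units
    per factor). The remainder of the well is made up with diluent.
    """
    volumes = {}
    for factor, target in set_points.items():
        stock = stocks[factor]
        if target > stock:
            raise ValueError(f"{factor}: set point exceeds stock concentration")
        # C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
        volumes[factor] = final_volume_ul * target / stock
    diluent = final_volume_ul - sum(volumes.values())
    if diluent < 0:
        raise ValueError("stock volumes exceed the final reaction volume")
    volumes["diluent"] = diluent
    return volumes


# Hypothetical single run from a design file, 20 uL final volume.
run = {"factor_1": 2.0, "factor_2": 0.5}
stocks = {"factor_1": 10.0, "factor_2": 5.0}
print(transfer_volumes(run, stocks, final_volume_ul=20.0))
```

Runs that share set points for several factors can draw on a common intermediate mixture, which is the kind of grouping that reduces the number of transfers and tips.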

All planning for the execution of the DoE is taken care of by Antha, which guides the user on required reagents, volumes and labware, and provides detailed schematics of how to set up the input plates and the liquid handling platform (Figure 4).

 
 
 

Figure 3. Simple, rapid and flexible workflow prototyping in Antha’s Workflow Editor. The graphical user interface of Antha’s Workflow Editor lets a biological scientist rapidly prototype automated liquid handling workflows by programming at a higher level of abstraction than most hardware vendors’ automated liquid handling software. The DoE workflow used for driving the execution of both DoE iterations in this study is shown here.

 
 
 

Figure 4. Simple user set up. A screenshot of Antha’s Preview page directing the user on how to set up their automated liquid handling platform. All low-level decisions are taken care of by Antha, so the user isn’t required to determine deck layouts before conducting a physical run in the lab. This decreases the risk of human error and the need for repetitive physical dry runs in the lab before executing with samples.

 

 Significant Factors and Modelled Factor Interactions

 

For each iteration of multifactorial optimisation carried out by LabGenius, a PicoGreen™ dsDNA quantification assay kit was used to provide a quantitative measure of dsDNA recovery as a percentage of the ssDNA starting material used in the ssDNA extension reactions. LabGenius also used Antha to prepare their PicoGreen™ analysis plates after the ssDNA extension reaction optimisation runs were completed, including preparation of serially diluted standards, transfer of test samples to analysis plates and addition of PicoGreen™ reagents.
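Quantification against serially diluted standards typically means fitting a standard curve and reading sample concentrations off it. The sketch below assumes a linear fluorescence response over the range used and fits it by ordinary least squares; the dilution scheme, units and signal values are hypothetical and are not taken from the LabGenius protocol.

```python
def serial_dilution(top_concentration, dilution_factor, steps):
    """Concentrations of a serial dilution series, highest first."""
    return [top_concentration / dilution_factor ** i for i in range(steps)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_signal(fluorescence, slope, intercept):
    """Invert the standard curve to estimate a sample concentration."""
    return (fluorescence - intercept) / slope


# Hypothetical two-fold standards (ng/mL) and idealised plate-reader signals.
standards = serial_dilution(1000.0, 2, 5)
signals = [5.0 * c + 20.0 for c in standards]
slope, intercept = fit_line(standards, signals)
print(concentration_from_signal(520.0, slope, intercept))
```

The estimated dsDNA concentration, converted to an amount, can then be divided by the ssDNA input to give the recovery percentage used as the optimisation response.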

The initial factor screening DoE design used in this case study was a space fill design investigating the effects of five different factors, with between 10 and 30 levels each, on the efficiency of the ssDNA extension reaction. The initial factor screen was carried out across 48 experimental runs and, given the number of factor levels, would have been extremely complex to prepare manually. After multifactorial design generation by the EVA platform, however, the interpretation and physical execution of the design through Antha took 1 hour 17 min on a Gilson PIPETMAX™ liquid handling platform. Antha provided up to an estimated 5 hr time saving per optimisation iteration compared with the LabGenius team setting up an experiment of this complexity manually.
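One common way to build a space-filling design is Latin hypercube sampling, in which each factor's range is split into as many strata as there are runs and every stratum is used exactly once. The sketch below is an assumption for illustration: the source does not say which space-filling method EVA used, and a Latin hypercube gives one distinct level per run rather than the 10 to 30 levels per factor reported, so the actual design evidently differed. Factor names and ranges are hypothetical.

```python
import random


def latin_hypercube(ranges, n_runs, seed=0):
    """Latin hypercube design: one point per stratum for every factor.

    ranges maps factor name -> (low, high). Returns a list of runs,
    each a dict of factor name -> set point.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_runs
        # One random point inside each of the n_runs equal-width strata.
        points = [low + (i + rng.random()) * width for i in range(n_runs)]
        # Shuffle so strata are paired at random across factors.
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in ranges} for i in range(n_runs)]


# Five hypothetical factors over 48 runs, as in the screening design size.
factor_ranges = {f"factor_{i}": (0.0, 10.0) for i in range(1, 6)}
design = latin_hypercube(factor_ranges, n_runs=48)
print(len(design), design[0])
```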

The initial multifactorial experimental design identified key factors and factor interactions contributing to the efficiency of dsDNA recovery. The initial data showed an average dsDNA recovery of 15 a.u. across the majority of the experimental conditions investigated, but with key areas of the experimental design space giving up to a 2-fold increase in dsDNA recovery efficiency above average (Figure 5 A). Analysis of two of the factors in the screening design, within the concentration ranges investigated, revealed that Factor 2 had a positive impact on dsDNA recovery efficiency, with increasing concentration of Factor 1 having a synergistic effect (Figure 5 B).

The remaining three of the five factors under investigation had a negative impact on the efficiency of the ssDNA extension reaction to varying degrees, and also showed significant interactions with one another (Figure 5 C & D). Factors 3 and 4 both impacted the ssDNA extension reaction negatively; however, increasing concentrations of Factor 4 in the reaction changed the effect of increasing Factor 3 concentration from negative to positive. Although this interaction produced a positive inflection in the response, the overall impact of these factors on the reaction efficiency was negative. Factors 4 and 5, on the other hand, each negatively affected the efficiency of the reaction, and together had a synergistic negative effect greater than that of either factor alone (Figure 5 D).

Although three of the five factors investigated had a negative impact on overall dsDNA recovery, the power of a multifactorial experimental campaign is exemplified here: it identifies non-intuitive factor interactions that would otherwise be missed with a one factor at a time (OFAT) experimental approach.
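Why OFAT misses interactions can be shown with the smallest possible case, a 2×2 factorial. The sketch below computes main effects and the interaction effect from four responses; the numbers are hypothetical, chosen so that one factor's effect switches sign depending on the level of the other, loosely mirroring the behaviour described for Factors 3 and 4.

```python
def factorial_effects(y_ll, y_hl, y_lh, y_hh):
    """Effects from a 2x2 factorial.

    y_ab is the response with factor A at level a and factor B at
    level b (l = low, h = high). Returns (main_a, main_b, interaction).
    """
    main_a = ((y_hl + y_hh) - (y_ll + y_lh)) / 2
    main_b = ((y_lh + y_hh) - (y_ll + y_hl)) / 2
    interaction = ((y_ll + y_hh) - (y_hl + y_lh)) / 2
    return main_a, main_b, interaction


# Hypothetical responses: raising A hurts when B is low, helps when B is high.
y_ll, y_hl, y_lh, y_hh = 20.0, 12.0, 10.0, 14.0
print(factorial_effects(y_ll, y_hl, y_lh, y_hh))

# OFAT from the (low, low) baseline sees only the two single-factor changes
# and, assuming additivity, predicts the (high, high) response as:
ofat_prediction = y_ll + (y_hl - y_ll) + (y_lh - y_ll)
print(ofat_prediction, "predicted vs", y_hh, "observed")
```

With these numbers the OFAT prediction for the high/high corner is 2.0 against an observed 14.0; the gap is exactly the interaction that only the factorial design can estimate.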


 
 
 

Figure 5. Significant Factors and Modelled Factor Interaction response surfaces. (A) dsDNA recovery responses from the first DoE derived multifactorial experimental design identified a mean recovery of ~15% with regions of the design space that also elicited better and worse dsDNA recovery. (B) Key influencing factors were identified that acted synergistically to improve dsDNA recovery. (C and D) The remaining three factors in the design were identified as negatively influencing dsDNA recovery. Data analysis and graphing performed in JMP 14.1.0.

 

Exceeding a minimum dsDNA recovery threshold

 

The subsequent Active Learning derived experimental design, based on the first iteration data, aimed to exploit the reaction conditions that gave rise to the higher dsDNA recovery efficiencies observed in the first iteration. It focussed on the two remaining factors, those with a positive impact, whilst exploring design space outside of the factor level ranges initially investigated (Figure 6 A). The second iteration design considered these two factors with six and seven levels respectively, across 16 experimental condition sets with 5 replicates of each (Figure 6 A). The data from the second iteration identified a robust experimental landscape in which > 81% of the reaction conditions investigated gave a dsDNA recovery between 25 and 30 a.u., exceeding the objective response target of 25 a.u. (Figure 6 B).
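A robustness figure of this kind can be computed as the fraction of condition sets whose replicate mean meets the target. The sketch below uses hypothetical data and a hypothetical function name; the source does not state whether its ">81%" was derived from replicate means or from the modelled response surface, although 13 of 16 condition sets would give 81.25%.

```python
from statistics import mean


def fraction_meeting_target(replicates_by_condition, target):
    """Fraction of conditions whose mean replicate response >= target."""
    means = [mean(values) for values in replicates_by_condition.values()]
    return sum(m >= target for m in means) / len(means)


# Hypothetical data: 16 condition sets x 5 replicates = 80 runs.
conditions = {f"condition_{i}": [26.0] * 5 for i in range(13)}
conditions.update({f"condition_{i}": [20.0] * 5 for i in range(13, 16)})
print(fraction_meeting_target(conditions, target=25.0))
```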

 
 
 

Figure 6. Active Learning driven multifactorial experimental design and modelled factor interaction response surface. (A) A contour map representation of a multifactorial experimental design generated autonomously by a Bayesian inference ‘exploit and explore’ algorithm. The design exploits the design space of the two remaining factors, which gave the best dsDNA recovery efficiencies in the first design iteration, whilst exploring the design space further to identify any potential alternative experimental optima. (B) The modelled factor interaction response surface for the two remaining factors shows that a robust experimental design space has been achieved, in which >81% of experimental conditions provide a dsDNA recovery efficiency greater than the minimum target required. Data analysis and graphing performed in JMP 14.1.0.

 

Conclusion

 

By using Antha as an extension to the EVA platform, LabGenius are able to physically execute multifactorial experimental designs in an autonomous manner to optimise their ssDNA extension process on a case by case basis. This is a key step for LabGenius’ protein engineering mutagenesis library generation (Figure 1 and 2). With Antha, the complexity and number of experimental runs in each multifactorial experimental iteration were beyond that which is considered feasible by manual execution (5 factors with up to 30 levels over more than 80 experimental runs), delivering greater power and biological insight.

Antha provides LabGenius with up to 13 hrs time saving on experimental design, planning and physical execution per library, equating to ~65 hrs saved per week given an average of five library optimisations (Figure 7). Alongside these time savings, equivalent to the weekly hours of about 1.5 full-time employees, Antha can reduce labware usage by up to 30%. The combination of LabGenius’ EVA platform with Antha facilitates autonomous optimisations for library generation and construction above the minimum efficiency threshold required for the diverse, high-quality mutagenesis libraries essential to the LabGenius process.

Comparison of time and resource savings afforded by Antha over manual execution exemplifies the rapidity and flexibility required when optimising a process on a case by case basis. 

 
 
 

Figure 7. Comparison of hands-on and walkaway time for manual and automated library optimisation. Using both EVA and Antha provides greater than 70% time saving on overall experimental design, planning and execution and up to 95% saving in hands-on time in the lab.

 

User Experience

 

Table 1. User experience comparisons for Antha vs programming alternative liquid handling software vs manual execution of the experiment

“Where previously the cost of development has been prohibitively large. We would not develop automation for an experiment we only intend to run once, but with Antha, assuming labware has already been validated, the time to develop automation has been decreased 10-fold. This means it is now time and cost viable for us to automate experiments on-the-fly.”

LabGenius User Testimonial