Implementing Design of Experiments (DOE): A practical example
You may have an understanding of what Design of Experiments (DOE) is in theory. But what happens when DOE collides with the real world?
Implementing DOE in a busy laboratory is, of course, a nuanced topic, and there are plenty of ways to approach it.
DOE implementation with a practical example: 7 elements to consider
Let’s jump straight in with a real-life example. Imagine that we want to optimize the expression of a target protein in bacterial cell culture.
Based on our experience with DOE campaigns, here are the most important elements to consider, and how we’d approach them in this scenario:
1. Using DOE tools vs doing it manually
In theory, you can create, execute, and analyze this DOE example (and any other DOE, for that matter) with little more than a pipette, pen, and paper.
But it’ll be hard to do more than scratch the surface without the proper tools. For something as complex as protein expression, you’re going to need a hefty toolbox to help you with each stage.
Software for DOE
Let’s begin with DOE software.
DOE rests on a well-established and robust mathematical foundation. Technically, you can do the math by hand. But it’s hard work, error-prone, and requires specialized mathematical knowledge. Using DOE software helps reduce the risk of a mathematical slip.
And thankfully, over the last few years, DOE software has become more accessible to scientists—which lowers the barriers to entry for non-statisticians.
By creating and assessing different designs, analyzing the data, and building models with software, you’ll also find it easier to decide your next action or iteration.
Automation hardware for DOE
Biological DOEs, including our example of optimizing protein production, typically involve liquid handling and analytics.
Manually handling small quantities of liquid is feasible.
That being said, DOEs are more complex than most protocols. They typically employ dozens or hundreds of runs—and the variations between runs are minute. For DOEs that surpass a certain scale, you’d end up driving yourself mad trying to manually pipette 10 or more liquids into wells that are millimeters apart. All at variable volumes, in an unpredictable layout.
At the scale needed to unleash the full potential of DOE, working by hand without making a single error would be nothing short of a miracle, even for the most practiced pipettor.
Automation hardware relieves you of that burden and radically speeds up time to insight.
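Part of what makes manual pipetting so error-prone is that a design specifies target concentrations, while a liquid handler (or a person) needs transfer volumes for every well. The minimal Python sketch below shows that conversion using C1V1 = C2V2. The stock concentrations, well volume, factor names, and the tuple-based worklist format are all hypothetical; a real liquid handler will expect its own file format.

```python
# Hypothetical stock concentrations (mM) for two media additives.
STOCKS_MM = {"zinc": 10.0, "manganese": 50.0}
WELL_VOLUME_UL = 200.0  # assumed final volume per well

# Each run specifies target final concentrations (mM); values are illustrative.
design = [
    {"well": "A1", "zinc": 0.1, "manganese": 0.5},
    {"well": "A2", "zinc": 0.5, "manganese": 0.5},
    {"well": "A3", "zinc": 0.1, "manganese": 2.0},
    {"well": "A4", "zinc": 0.5, "manganese": 2.0},
]

def to_worklist(design, stocks, final_volume):
    """Convert target concentrations into per-well transfer volumes using C1*V1 = C2*V2."""
    rows = []
    for run in design:
        added = 0.0
        for component, stock in stocks.items():
            volume = run[component] * final_volume / stock  # V1 = C2 * V2 / C1
            rows.append((run["well"], component, round(volume, 2)))
            added += volume
        if added > final_volume:
            raise ValueError(f"{run['well']}: additives exceed the well volume")
        # Top up with base medium so every well reaches the same final volume.
        rows.append((run["well"], "base_medium", round(final_volume - added, 2)))
    return rows

for row in to_worklist(design, STOCKS_MM, WELL_VOLUME_UL):
    print(row)
```

Even this toy design needs 12 transfers across 4 wells; a real design with 10 or more liquids over dozens of wells quickly runs into hundreds.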
Worth noting: If you go down the automation route, you will, of course, need to integrate the output of your DOE software with the software that controls your lab automation. Automation engineers can help make the transition from manual to automated liquid handling and ease DOE implementation, though we know this can create a new bottleneck. And DOE can be complex enough without also worrying about shifting toward fully automated experimentation.
That’s why at Synthace, we’ve created a more accessible kind of DOE software—the kind that doesn’t require an automation engineer’s specialist scripting or coding knowledge.
But we digress...
2. Framing your question as a hypothesis
A large part of the power of DOE lies in the process. It’s a campaign approach encompassing screening, refinement and iteration, optimization, and robustness assessment.
So before you begin, you need to sketch out a plan for your campaign. And as with every scientific experiment, you always start by framing your question as a hypothesis.
Returning to our protein expression example: Producing a target protein in bacteria depends on a complex interaction between genetics and environment.
So our hypothesis would be: By varying aspects of genetics and environment, we will discover which factors are important and how they affect one another. This will help us optimize production.
3. Choosing factors based on what you know...
After forming a hypothesis, the next stage is to start thinking about which factors to investigate, and how to change them.
By factors, we mean variables in your experiment. There can be genetic factors and environmental factors. In our protein optimization with DOE, an example of a genetic factor could be which promoter we use, and an environmental one could be the overnight growth temperature.
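As a minimal sketch of how factors and their levels can be represented, the Python snippet below enumerates a full factorial design (every combination of every level) for one genetic and one environmental factor. The promoter names and temperatures are hypothetical placeholders, not recommendations.

```python
from itertools import product

# Hypothetical factors for the protein expression example.
factors = {
    "promoter": ["pA", "pB", "pC"],   # genetic factor: categorical, placeholder names
    "temperature_C": [25, 37],        # environmental factor: continuous, tested at low/high levels
}

# Full factorial: one run per combination of levels across all factors.
runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(runs))  # 3 promoters x 2 temperatures = 6 runs
for run in runs:
    print(run)
```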
To avoid spending too much effort re-learning things that are already known, we recommend using all of the knowledge you can get.
For instance, if you know which growth media achieve high yields when you’re trying to optimize protein production with DOE, there’s usually no need to confirm this experimentally.
However, you can investigate a biologically plausible change to the media (e.g., zinc availability may be limiting) alongside other media, genetic and process factors, and interactions (e.g., between zinc and manganese).
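To show what estimating an interaction such as zinc × manganese involves (the kind of math DOE software handles for you), here is a minimal Python sketch that fits two main effects and their interaction to a two-level, two-factor full factorial in coded units (-1 = low, +1 = high). The responses are generated from a known formula rather than measured, so you can check that the fit recovers the coefficients. The shortcut relies on the columns of a full two-level factorial being orthogonal; other designs need a general least-squares fit.

```python
from itertools import product

# Two-level full factorial in coded units: -1 = low level, +1 = high level.
design = [dict(zip(("zinc", "manganese"), levels)) for levels in product((-1, 1), repeat=2)]

def synthetic_response(run):
    # Known structure used only to check the fit (not real data):
    # intercept 10, zinc +2, manganese +3, zinc x manganese interaction +1.
    z, m = run["zinc"], run["manganese"]
    return 10 + 2 * z + 3 * m + 1 * z * m

responses = [synthetic_response(run) for run in design]

# Model columns: intercept, two main effects, and their interaction.
columns = {
    "intercept": [1 for _ in design],
    "zinc": [run["zinc"] for run in design],
    "manganese": [run["manganese"] for run in design],
    "zinc:manganese": [run["zinc"] * run["manganese"] for run in design],
}

# Orthogonal coded columns: each least-squares coefficient is sum(column * response) / runs.
n = len(design)
coefficients = {
    name: sum(x * y for x, y in zip(col, responses)) / n
    for name, col in columns.items()
}
print(coefficients)
```

Each coefficient is the change in response per coded unit; some texts instead report an effect as the full low-to-high change, which is twice this value.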
4. ... Without relying on your knowledge completely
Having said that, familiarity should not breed complacency. There’s a fine line to walk here: It’s all too easy to design experiments that confirm, rather than test, hypotheses. Not having a well-developed and robust theoretical framework for your experiment will prevent you from getting to grips with the complexity of your system.
So, be open-minded. Don’t assume you know everything. DOE helps you investigate your system in an unbiased way, which often reveals new insights and generates novel hypotheses.
For instance, the formulae for many cell growth media are handed down and used unquestioningly by generations of scientists. After all, why would you risk taking something out if your cells might not grow properly?
But calculated risks are part of science.
Cell growth is complex and there’s no perfect medium that gives excellent results in every possible case. It’s likely that many ingredients aren’t necessary for specific applications or may even be harmful: High levels of zinc may inhibit the growth of certain bacteria, for example.
Investigating the composition of such apparently standard parts of the workflow can be useful: Some “unnecessary” components of the media can be very expensive, while others are actively harmful for the specific application.
5. Getting your measurements right
Results for your DOE are only as good as the quality of your measurement data. So for your DOE to work, your measurements have to be in order.
What can go wrong? There are two related problems: Noise and sensitivity.
Noise is about how reproducible the signal is. If you measure the same thing 3 times, how much do the results vary? This defines the resolution of your experiment: Noisier assays make it harder to distinguish real changes from random variation. Noise is especially worth watching during the earlier stages, where many runs will produce low or no signal. Distinguishing a weak but real signal from no signal at all will be critical for informing the next iteration.
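A minimal Python sketch of both ideas: the coefficient of variation (CV) summarizes how much replicate measurements vary, and a blank-based threshold helps separate weak signal from no signal. The readings are illustrative, not real data, and the mean + 3 SD cutoff is a common convention that you should adapt to your own assay.

```python
from statistics import mean, stdev

def cv_percent(replicates):
    """Coefficient of variation: sample standard deviation as a percentage of the mean."""
    return 100 * stdev(replicates) / mean(replicates)

def above_noise_floor(reading, blanks, k=3):
    """Treat a reading as real signal only if it exceeds mean(blanks) + k * SD(blanks).
    k = 3 is a common convention, not a universal rule."""
    return reading > mean(blanks) + k * stdev(blanks)

# Illustrative values (arbitrary units), not real data.
replicates = [0.52, 0.48, 0.50]   # same sample measured 3 times
blanks = [0.05, 0.06, 0.04]       # blank wells with no sample

print(f"CV = {cv_percent(replicates):.1f}%")
print(above_noise_floor(0.07, blanks), above_noise_floor(0.20, blanks))
```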
Sensitivity is more about the range of signals that you can detect. This usually comes down to a device’s upper and lower detection limits. If you don’t take these into account, you risk losing a lot of information on signals outside those limits, which is a big problem when it comes to modeling DOE data.
Sensitivity could come up in our working DOE example as a side effect of the assay protocol. The simplest way to detect protein expression might be to use crude lysates with a Bradford assay. But you’d need to ensure that the dynamic range of the plate reader doesn’t restrict sensitivity; testing multiple dilutions is one common way to mitigate this. And because a Bradford assay measures total protein, you’d also want a proper negative control strategy to deal with noise from background expression of non-target proteins.
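Testing multiple dilutions only helps if you then use a reading inside the instrument’s working range. This Python sketch picks the least-diluted reading that falls within hypothetical lower and upper limits and back-calculates the undiluted signal, assuming the signal is proportional to concentration within that range (in practice you would convert readings with a standard curve first). Returning None, rather than a clipped value, flags the sample for re-assay instead of silently losing information.

```python
# Hypothetical working range of the plate reader for this assay (absorbance units).
LOWER_LIMIT = 0.1
UPPER_LIMIT = 1.5

def in_range_estimate(readings_by_dilution):
    """Return (dilution_factor, back-calculated undiluted signal) for the least-diluted
    reading inside the working range, or None if every reading is out of range.

    readings_by_dilution maps a dilution factor (e.g. 10 for 1:10) to a blank-corrected
    reading. Assumes signal is proportional to concentration within the range.
    """
    for factor in sorted(readings_by_dilution):
        reading = readings_by_dilution[factor]
        if LOWER_LIMIT <= reading <= UPPER_LIMIT:
            return factor, reading * factor
    return None  # nothing usable: re-assay rather than clip

# Illustrative readings for one lysate at three dilutions (not real data).
print(in_range_estimate({1: 2.4, 5: 0.9, 25: 0.18}))
```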
6. Avoiding the "big bang"—and breaking up your experiment into stages
DOE lets you investigate lots of factors at once, so naturally you’ll have plenty of factors to choose from. But you can’t test them all at once. You’ll need to resist the temptation of a “big bang”: trying to investigate all your factors in depth with 1 massive experiment. This would be impractical, if not impossible.
When thinking about what influences the optimal expression of a target protein, for example, you’ll have to choose between dozens of factors, like variations in the genetic payload: plasmid type or the coding, promoter, and terminator sequences. The molecular biology techniques used to assemble and transform the payload, the host strain details, and growth conditions such as temperatures and times could also influence expression.
Most of the possible combinations will have little if any effect on the expression profile. The problem is you don’t know which!
Thankfully, the solution is simple: Do your experiment in stages. Begin your DOE campaign by investigating a broad set of factors in limited detail. This eliminates dead ends and leaves a smaller, more interesting, and more influential set. Later experiments can fill in the missing details.
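To see why a staged approach pays off, the Python sketch below prints how quickly a full two-level factorial grows with the number of factors, then builds a half-fraction screening design for 4 factors by setting the fourth factor to the product of the first three (D = ABC). This is one standard fractional factorial construction, shown for illustration; DOE software offers many other screening designs.

```python
from itertools import product

def full_factorial(k):
    """All 2**k combinations of k two-level factors in coded units (-1/+1)."""
    return [list(levels) for levels in product((-1, 1), repeat=k)]

def half_fraction_4_factors():
    """A 2**(4-1) fractional factorial: full factorial in A, B, C with D = A*B*C
    (defining relation I = ABCD), halving the run count from 16 to 8."""
    return [[a, b, c, a * b * c] for a, b, c in full_factorial(3)]

for k in (4, 6, 8, 10):
    print(f"{k} factors: {2 ** k} runs for a full two-level factorial")

screen = half_fraction_4_factors()
print(len(screen), "runs in the half fraction")
```

The price of halving the run count is aliasing: in this design, main effects are not confounded with two-factor interactions, but pairs of two-factor interactions (e.g., AB and CD) are confounded with each other. That is usually acceptable in a screening stage, because later experiments can resolve the interactions that matter.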
7. Giving your DOE campaign a sanity check
Before you start, look over your DOE campaign. Make sure that you understand exactly what you’re proposing to do in each stage, and whether it makes biological sense.
Will all your runs be biologically plausible?
When you’re looking at the early stages of your DOE campaign, remember that the aim is to investigate high and low levels of continuous factors. For our protein optimization example, we’d want to focus on things like concentrations of media components, to establish ranges to investigate.
And while the highest and lowest levels of each factor may make sense in isolation, their combination may not be possible. For instance, investigating the effect of several carbon sources on bacterial growth could involve a low level or zero for each source individually. That’s fine for any single source, because the bacteria may be able to grow on the others. But a run that sets every source to zero gives the bacteria no carbon at all, which would obviously prevent growth. Equally, large amounts of several carbon sources at once could overwhelm the bacteria. So you may want to set limits for total carbon.
Biologically implausible runs waste time and resources and can compromise the overall results, especially if they occur multiple times. Understanding how combinations of levels would influence the system is critical: It will make a huge difference to the success of your experiment. No DOE design package or statistician can give you these answers.
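One simple way to picture a total-carbon limit is to filter a grid of candidate runs, as in this Python sketch. The carbon sources, levels, and limits are hypothetical. Filtering a factorial grid like this breaks its balance; many DOE packages instead let you specify such constraints and build a design over the constrained region directly.

```python
from itertools import product

# Hypothetical carbon sources and levels (g/L); names and numbers are placeholders.
sources = ["glucose", "glycerol", "lactose"]
levels = [0, 5, 10]

# Assumed limits on total carbon: above zero so every run can grow,
# and capped so the bacteria are not overwhelmed.
MIN_TOTAL, MAX_TOTAL = 5, 15

candidates = [dict(zip(sources, combo)) for combo in product(levels, repeat=len(sources))]
plausible = [run for run in candidates if MIN_TOTAL <= sum(run.values()) <= MAX_TOTAL]

print(len(candidates), "candidate runs,", len(plausible), "within the total-carbon limits")
# -> 27 candidate runs, 16 within the total-carbon limits
```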
Have you also thought about your positive and negative controls?
It's good scientific practice to use positive and negative controls. But these aren't included in the DOE experimental design, and they're important to think about.
The experimental design will contain the points required to estimate the effects and interactions that you are investigating. DOE also assumes that you can easily measure the response for each run.
You should treat all of these as experimental runs. While the design may sometimes include runs that could function as controls (e.g., the zero-carbon example above), that's not their purpose. So you need to make sure that you add the required controls and replicate runs separately.
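A minimal Python sketch of adding replicated controls to a design’s run list, rather than relying on the design to supply them. The run IDs, control definitions (for example, an empty-vector negative control), and the number of replicates are all hypothetical.

```python
# Hypothetical design runs exported from DOE software (only two shown for brevity).
design_runs = [
    {"id": "run01", "promoter": "pA", "temperature_C": 25},
    {"id": "run02", "promoter": "pB", "temperature_C": 37},
]

# Controls are added alongside the design, not as part of it. Placeholder definitions:
controls = [
    {"id": "neg_ctrl", "role": "negative control", "construct": "empty vector"},
    {"id": "pos_ctrl", "role": "positive control", "construct": "known good expresser"},
]

def build_plate(design_runs, controls, control_replicates=3):
    """Tag design runs, then append each control replicated control_replicates times."""
    plate = [dict(run, role="design") for run in design_runs]
    for control in controls:
        for rep in range(1, control_replicates + 1):
            plate.append(dict(control, replicate=rep))
    return plate

plate = build_plate(design_runs, controls)
print(len(plate), "wells")
```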
Can you make iterating easier by making some of your runs identical?
We also advocate including a few repeated runs from earlier stages, particularly when iterating, to help check whether your system is behaving the same way.
Otherwise, you could end up with runs that all look suspiciously different from what earlier experiments led you to expect. Because the new runs have little or nothing in common with the earlier ones, it can be difficult to identify errors that affect large sets of runs, such as a malfunctioning machine.
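As a rough sketch of how repeated runs can flag a system-wide problem, the Python function below compares the runs shared between two iterations. If the median fold change drifts well away from 1, something may be affecting the whole plate, such as an instrument fault. The run IDs, responses, and the 1.5-fold tolerance are illustrative assumptions.

```python
from statistics import median

def bridging_check(previous, current, tolerance=1.5):
    """Compare runs repeated across two iterations.

    previous/current map a run ID to its response in each iteration. Returns the
    median fold change (current / previous) over the shared runs and whether it lies
    within [1/tolerance, tolerance]. The tolerance is an assumed, illustrative threshold.
    """
    shared = sorted(set(previous) & set(current))
    if not shared:
        raise ValueError("no repeated runs to compare")
    fold_changes = [current[run] / previous[run] for run in shared]
    drift = median(fold_changes)
    return drift, 1 / tolerance <= drift <= tolerance

# Illustrative responses (not real data): run21 is new, the other three are repeats.
stage1 = {"run03": 120.0, "run07": 45.0, "run11": 300.0}
stage2 = {"run03": 58.0, "run07": 20.0, "run11": 155.0, "run21": 80.0}
print(bridging_check(stage1, stage2))
```

Here every repeated run comes back at roughly half its earlier value, so the check fails, which is a prompt to investigate the equipment or protocol before trusting the new runs.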
What did we learn from this example? For DOE, the scientist holds the key
If these 7 elements are too much for you to take in all in one go, just remember this: Software and automation, as well as experts in statistics and lab automation, are all valuable allies.
But your greatest ally is your scientific knowledge and instincts: It's up to you to make sure that your experiments ask the right questions in the right way.
Just remember to temper this with open-mindedness: Be critical of what you think you already know. After all, you have nothing to lose but your cognitive bias.
Interested in learning more about DOE? Make sure to check out our other DOE blogs, download our DOE for biologists ebook, or watch our DOE Masterclass webinar series.
Michael "Sid" Sadowski, PhD
Michael Sadowski, aka Sid, is the Director of Scientific Software at Synthace, where he leads the company’s DOE product development. In his 10 years at the company he has consulted on dozens of DOE campaigns, many of which included aspects of Quality by Design (QbD).