Bandit Algorithms for Website Optimization
John Myles White
When looking for ways to improve your website, how do you decide which changes to make? And which changes to keep? This concise book shows you how to use multiarmed bandit algorithms to measure the real-world value of any modifications you make to your site. Author John Myles White shows you how this powerful class of algorithms can help you boost website traffic, convert visitors to customers, and increase many other measures of success.
This is the first developer-focused book on bandit algorithms, which were previously described only in research papers. You’ll quickly learn the benefits of several simple algorithms—including the epsilon-Greedy, Softmax, and Upper Confidence Bound (UCB) algorithms—by working through code examples written in Python, which you can easily adapt for deployment on your own website.
- Learn the basics of A/B testing—and recognize when it’s better to use bandit algorithms
- Develop a unit testing framework for debugging bandit algorithms
A Bernoulli arm rewards you with a value of 1 some percentage of the time and with a value of 0 the rest of the time. This 0/1 framework is a very simple way to simulate situations like clickthroughs or user signups: a potential user arrives at your site; you select an arm for them, showing them, for example, one specific color of logo; finally, either they sign up for the site (and give you a reward of 1) or they don't (and give you a reward of 0). If 2% of people who see a red logo sign up and 5% of people who see a green logo sign up, the green logo is clearly the better arm to play.
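A minimal sketch of an arm that fits this 0/1 framework, assuming the only interface we need is a draw method that returns a single simulated reward:

    import random

    class BernoulliArm:
        """An arm that pays out 1.0 with probability p and 0.0 otherwise."""
        def __init__(self, p):
            self.p = p  # probability of a reward (e.g., the signup rate)

        def draw(self):
            # Simulate one visitor: success with probability p.
            if random.random() > self.p:
                return 0.0
            return 1.0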
    import random

    means = [0.1, 0.1, 0.1, 0.1, 0.9]
    n_arms = len(means)
    random.shuffle(means)
    arms = [BernoulliArm(mu) for mu in means]

This will set up an array that contains five arms. Four of them output a reward 10% of the time, while the best of them outputs a reward 90% of the time. This is a very black-and-white situation that you won't see in the real world, but that makes it a nice starting point for testing our algorithms. We'll be using it throughout the rest of this book.

Simulating the Arms of a Bandit Problem
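As a quick sanity check of this setup (assuming the BernoulliArm sketch shown earlier), you can draw from each arm many times and confirm that the observed reward rates match the means you configured:

    # Draw from each arm 10,000 times and report the observed reward rate.
    for i, arm in enumerate(arms):
        rate = sum(arm.draw() for _ in range(10000)) / 10000.0
        print("Arm %d: observed rate %.3f (true mean %.3f)" % (i, rate, arm.p))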
First, we will calculate a different scale for the reward rates by exponentiating our estimates of rA and rB. Using this new scale, we will choose Arm A with probability exp(rA) / (exp(rA) + exp(rB)) and Arm B with probability exp(rB) / (exp(rA) + exp(rB)). This naive exponential rescaling has the virtue of not behaving strangely if someone used negative numbers as rates of success, since the call to exp will turn any negative numbers into positive numbers and ensure that the probabilities we compute always lie strictly between 0 and 1.
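A minimal sketch of this exponential rescaling for the two-arm case; the function name and the example rates here are illustrative:

    import math
    import random

    def softmax_choice(r_a, r_b):
        """Pick Arm A or Arm B with probability proportional to exp(rate)."""
        z = math.exp(r_a) + math.exp(r_b)
        p_a = math.exp(r_a) / z  # probability of choosing Arm A
        if random.random() < p_a:
            return "A"
        return "B"

    # Even a negative "rate" still yields a valid selection probability:
    print(softmax_choice(-0.1, 0.3))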
• Can you make use of external information about people's tastes in the products you might advertise to them?
• How much traffic does your site receive? Is the system you're building going to scale up? How much traffic can your algorithm handle before it starts to slow your site down?
• How much will you have to distort the setup we've introduced once you admit that visitors to real websites arrive concurrently, rather than sequentially as in our simulations?

A/A Testing

Banal though it sounds, the real world is messier than our simulations.
You should explore the potential value of every option. Don't close your mind without giving something a fair shot. At the same time, just one experience should be enough to convince you that some mistakes aren't worth repeating.

Everybody's gotta grow up sometime

You should make sure that you explore less over time. No matter what you're doing, it's important that you don't spend your whole life trying out every crazy idea that comes into your head. In the bandit algorithms we've tried, we've seen this lesson play out in the value of annealing, which makes an algorithm explore less and less as it accumulates experience.
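One way to picture "exploring less over time" is to anneal the exploration probability of an epsilon-Greedy algorithm. The 1/log(t) schedule below is only an illustrative sketch, not the book's exact annealing rule:

    import math

    def annealed_epsilon(t):
        """Exploration probability that decays as more visitors arrive (t >= 1)."""
        # Early on we explore nearly all the time; after thousands of visitors
        # we almost always exploit the best-looking arm instead.
        return min(1.0, 1.0 / math.log(t + 1.0000001))

    for t in [1, 10, 100, 10000]:
        print(t, annealed_epsilon(t))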