Baby Steps to Epsilon Greedy, UCB1 and Thompson Sampling: Introduction to Reinforcement Learning

Which team’s winning rate do you wish to update?

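def update(self, x):
    # x is the latest result for this team
    self.N += 1.0
    # Incremental mean: fold x into the running average of the N results so far
    self.previous_winning_rate = ((self.N - 1) * self.previous_winning_rate + x) / self.N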
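The update above is an incremental mean: it folds each new result into the running average without storing past results. Here is a minimal sketch of it in use, assuming a hypothetical `Team` class, a hypothetical `running_rate` helper, and results coded as 1 for a win and 0 for a loss (none of these are specified in the original):

```python
class Team:
    """Hypothetical container for one team's running winning rate."""

    def __init__(self):
        self.N = 0.0                      # number of results seen so far
        self.previous_winning_rate = 0.0  # running average of those results

    def update(self, x):
        # Assumption: x is 1 for a win and 0 for a loss.
        self.N += 1.0
        self.previous_winning_rate = (
            (self.N - 1) * self.previous_winning_rate + x
        ) / self.N


def running_rate(outcomes):
    """Feed a sequence of results through update() and return the final rate."""
    team = Team()
    for x in outcomes:
        team.update(x)
    return team.previous_winning_rate
```

Under these assumptions, `running_rate([1, 0, 1, 1])` should agree with the plain average of the list (three wins out of four), up to floating-point rounding.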

UCB1 score: winning_rate + sqrt(2 * log(total_trial) / team_trial)
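To see how this score drives the choice of team, here is a small sketch. It assumes `log` is the natural logarithm and that a team with no trials yet is tried first. The function names `ucb1_score` and `pick_team` are made up for illustration and do not come from the original.

```python
import math


def ucb1_score(winning_rate, total_trial, team_trial):
    """Winning rate plus an exploration bonus that shrinks as team_trial grows."""
    if team_trial == 0:
        # Assumption: an untried team gets top priority.
        return float("inf")
    return winning_rate + math.sqrt(2 * math.log(total_trial) / team_trial)


def pick_team(winning_rates, team_trials):
    """Return the index of the team with the highest UCB1 score."""
    total_trial = sum(team_trials)
    scores = [
        ucb1_score(rate, total_trial, n)
        for rate, n in zip(winning_rates, team_trials)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `pick_team([0.5, 0.4], [100, 2])`, the second team is chosen even though its observed rate is lower, because it has only two trials and so its bonus is much larger. Once both teams have been tried equally often, the higher winning rate decides.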

Thompson Sampling (Bayesian Bandit)

