How Future Sight AI Works
Below is a modified version of the script for the video above, expanded with content that was cut for time and edited to be more suitable for reading. Parts not found in the video are marked with this background.
How an A.I. is Becoming the World's Best Pokemon Player
To master a game, you need a plan, and a good place to start is figuring out how many situations you might need one for. For simple games, that number starts small. But as a game adds pieces and places for them to go, its number of possible situations, and the difficulty of mastering it, skyrockets. The higher the number, the closer a computer that plays the game at a human level comes to the limits of what we can build.
Despite its worldwide popularity, surprisingly few people know that underneath the veil of a kids' game, Pokemon is one of the most complex two-player strategy games out there. With this much complexity, building a computer that can outplay humans, let alone those who've played competitively for over a decade, sounds impossible, right? I don't think so...
Hi, and welcome to The Third Build. This is about my journey in creating Future Sight AI: a program which learns to play Pokemon like a human and at a competitive level. I will cover not only the main steps of how I made it, but also how well it did, what its next plans are, and how you can battle it yourself. Let me start by clarifying that intro and talking about what goes into playing competitive Pokemon.
Competitive Pokemon’s Depth
Most of you will know Pokemon as a game where you go around catching fun creatures. However, the purpose of all this catching is for your caught Pokemon to participate in battles where you and your opponent send out Pokemon to face off. Each turn you choose one of 4 different moves or switch to one of your 6 team members, with the goal of bringing the health of your opponent's team to zero. The complexity comes from what these move and switch options can do, and how many possibilities sit behind each of them.
Let's use the four moves as an example. Each can be one of 18 different types – which come with their own web of strengths and weaknesses – and can have any combination of effects: doing damage, healing yourself, poisoning your target, changing the weather, and so much more. But before you pick between any of those moves in battle, you must choose them from the list of ~40 useful ones each Pokemon can learn. Looking at this information alone, an average Pokemon already has 91,390 different ways you can use it, or, more importantly, that it can be used against you. And again, that number only represents picking the moves. If I explained all the options for choosing each of your team members, all their stats, or even the stats of their stats, you'd see why that number, as a count of the ways a battle can start, might feel like an understatement.
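As a sanity check on that 91,390 figure: picking 4 moves out of a pool of ~40, where order doesn't matter and a move can't repeat, is a plain combination, which a couple of lines of Python can confirm:

```python
from math import comb

# Picking a 4-move set from ~40 useful moves is a combination:
# order doesn't matter and a move can't be chosen twice.
USEFUL_MOVES = 40
MOVES_PER_SET = 4

print(comb(USEFUL_MOVES, MOVES_PER_SET))  # → 91390
```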
Before I continue, I do have a request for the professional data scientists. This is my first data science project and I'm here for any feedback you can give (especially on what I've done wrong). Just don't be pretentious; I have no patience for that.
With all that intro stuff laid out, it’s time to dive into how Future Sight AI works.
Applying Machine Learning to Pokemon
There are 3 major tasks this AI must pull off to be effective: understand whether it's winning, predict how its choices will play out, and determine what it's not told about the opponent's team. We're going to start with the one this project began because of and cannot work without.
Surprisingly enough, this whole project started from my interest in the NBA. I noticed websites like ESPN would post a team's current chance of winning during games, wanted to know how they made those predictions, and figured the best way to understand them was to try making my own. Once I got a little into that project, it occurred to me that a sporting event and a Pokemon battle shouldn't be all that different from a data perspective, so I tried making one for this game too.
Figuring out how close you are to winning in basketball is relatively straightforward, since most of it comes down to the current score and how much time is left. Pokemon, however, doesn't have such concrete indicators of winning. If you, say, used the amount of health you have left as your score, well, anyone who's played this game can tell you that having more HP than your opponent is no guarantee you'll win. Some Pokémon and moves have exploiting that expectation as their whole gimmick!
To do this properly, I had to decipher how a bunch of different factors work together to convey how well you're doing in a battle. And the go-to tool for solving problems like this nowadays is machine learning.
The reason machine learning is great for an application like this is that, once you have a model set up, it can sort out hard-to-solve problems for you. You give it a set of factors you think are part of the answer, the model finds how important each factor is and how to interpret it, and it produces an answer by putting those factors together according to those learned importances.
As a starting point, I decided the following attributes could be useful in determining a player’s likelihood of winning:
Whether they're currently in or out of battle
Current status condition
Player's Side Attributes
Side Conditions (Light Screen, Tailwind, etc.)
Entry Hazards (Stealth Rocks, Spikes, etc.)
Volatile Status (Leech Seed, Curse, etc.)
Last move used
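To make the idea concrete, here is a minimal sketch (all field names and encodings here are illustrative, not FSAI's actual schema) of how attributes like these get flattened into the fixed-length numeric feature vector a model expects:

```python
# Illustrative sketch: turn battle-state attributes into a flat numeric
# feature vector. Every field name below is made up for the example.
STATUS_IDS = {None: 0, "burn": 1, "paralysis": 2, "poison": 3, "sleep": 4, "freeze": 5}

def encode_pokemon(mon):
    return [
        1.0 if mon["active"] else 0.0,     # in or out of battle
        float(STATUS_IDS[mon["status"]]),  # current status condition
        mon["hp_fraction"],                # remaining HP as 0.0-1.0
    ]

def encode_side(side):
    features = []
    for mon in side["pokemon"]:            # one block per team member
        features.extend(encode_pokemon(mon))
    # side-wide attributes: side conditions, entry hazards, etc.
    features.append(1.0 if side["light_screen"] else 0.0)
    features.append(float(side["spike_layers"]))
    return features
```

A real encoder would also cover the species, volatile statuses, last move used, and so on; the point is just that every attribute ends up as a number at a fixed position in the vector.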
The Pokemon fans might be wondering why ideas like each Pokemon's type or ability aren't factored in. That's because, at the time, I thought those factors don't vary much for a particular Pokemon, so just knowing which Pokemon it was would do a good enough job of representing them.
The biggest requirement for creating most machine learning models is having examples of the data you're putting into it where the answer is already known, so the model can learn how much each factor relates to that answer. For me, this meant I had to somehow find a ton of battles where I could see both players' teams, exactly what happened every turn, and the answer of who won. Thankfully, there's Pokemon Showdown: a fan-made simulator where people play competitive Pokemon against others and, most importantly, can save their finished battles to the website to be viewed by anyone later.
Now all I had to do was figure out a way to get all those battles and put them into a format TensorFlow could understand. At first, that meant manually running a program every day to download all the battles saved on their public server. However, when someone on their team noticed my project, they provided me with 2,000,000 battles to train on, which was way more than enough to get the job done. Showdown is run by a bunch of fellow data nerds, so they have a program where they'll give researchers anonymized logs of past battles for their projects. I did choose poor timing in asking for the logs, as my allotment arrived between the two parts of Pokemon Sword and Shield's DLC, so my data for the current games predated both releases. Still, I was able to use the information to support past generations and improve my log collecting to supplement my data for the current generation.
I then trained the model so that every turn of a battle was a separate example, with the answer being whether player 1 ended up winning. In the end, I had a model that could correctly predict who would win from any given turn with up to 81% accuracy. I haven't yet mentioned just how much randomness can turn a battle around, like when a strong move that won't always hit misses, so getting it right 100% of the time is impossible; that makes this accuracy near the upper limit of what's reasonable.
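The labelling scheme is simple enough to sketch: every turn becomes its own example, and all of a battle's turns share one label, namely whether player 1 eventually won. Assuming a hypothetical log format (the real Showdown logs look nothing this tidy), it might look like:

```python
def battles_to_examples(battles):
    """Flatten battle logs into (features, label) training pairs.
    Every turn is one example; the label is whether player 1 won the
    battle that turn came from. (Hypothetical log format for illustration.)"""
    examples = []
    for battle in battles:
        label = 1 if battle["winner"] == "p1" else 0
        for turn_features in battle["turns"]:  # one feature vector per turn
            examples.append((turn_features, label))
    return examples
```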
Because of how I structured my code, I realized I was just a few modifications away from creating models that could help predict your opponent's next move. It mostly took switching the answer from whether player 1 would win to what player 1 did in the following turn, and I was able to create three more models representing the different aspects of what a player can do in a turn, with even greater accuracy than the first one.
Getting these models to this level of accuracy was a whole project unto itself, so I decided to make these predictions a stand-alone product for others to use. If you go to your web browser of choice's extension store and look up "Pokemon Battle Predictor", you'll find a free extension I made (which I admittedly don't update as often as I should) that lets you view the predictions these models make for the battle you're playing in or watching live. It works for most singles formats, and you can find more details on this page on this website.
Seeing a Battle’s Future
This step is where Future Sight AI gets its name, as it's all about looking at how the battle might play out before the AI makes its next move. The basic steps are: figure out what move options you and your opponent can choose, see what happens in all the different combinations of your move choice and theirs, and pick the move that leads to the most situations favorable to you.
I didn't know this when I started, but this method of looking at possible turns to choose your most favorable move is the same one used by the best chess-playing AIs, and because my machine learning model can tell which future turns give you a better chance to win, those same techniques can now be applied to Pokemon. There are many differences that make applying this method harder for this game, but the 2 biggest are that you don't know your opponent's options for sure, and this game's random chance.
The chart above of the turns which result from both players' move combinations is already a best-case scenario that assumes you know everything your opponent can do, which won't happen often. If you want to cover all their possible moves, the opponent's side of the chart gets much larger. The chart also assumes every move always happens the same way. A ton of moves have a chance to miss, burn their target, or trigger other random side effects, and each of those creates another turn to consider. All this transforms the chart into the one below. And don't forget: all these situations are just for one turn, and to investigate future turns, you'd have to explore at least this many options for every turn that branches off from this first one.
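Stripped of all Pokemon detail, this search is a small expectimax-style loop: for each of our moves, average the predicted win chance over the opponent's replies, weighting each random outcome of a turn by its probability. A toy sketch, where the `simulate` and `win_chance` callables stand in for the real battle engine and prediction model:

```python
def expected_win_chance(state, my_move, opp_moves, simulate, win_chance):
    """Average win chance of my_move across the opponent's replies,
    weighting each random outcome of a turn by its probability."""
    total = 0.0
    for opp_move in opp_moves:
        # simulate() yields (resulting_state, probability) pairs for one
        # joint move choice, covering misses, side effects, and so on.
        for next_state, prob in simulate(state, my_move, opp_move):
            total += prob * win_chance(next_state)
    return total / len(opp_moves)

def best_move(state, my_moves, opp_moves, simulate, win_chance):
    """Pick the move with the best expected outcome."""
    return max(my_moves, key=lambda m: expected_win_chance(
        state, m, opp_moves, simulate, win_chance))
```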
If you've played the game, you might be thinking "yeah, that is a huge number, but I don't consider nearly that many options when I play." You can do that because you've spent time learning what's common and what small fraction of that huge number you actually need to worry about. Getting a computer to understand what you know intuitively is the tricky part of all this, but that's where the machine learning comes into play.
Let's update that diagram again to reflect this. Since we have models that can tell us the opponent's most likely moves, we can eliminate the less likely ones. And because we know our chance of winning in any situation, we can remove our own move choices that lead to much worse outcomes than the others. Add in a limit on how likely a random event must be to be worth considering, and the AI can reduce the number of turns it must explore several times over.
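Those three pruning rules are each a one-line filter in spirit. A sketch, with purely illustrative thresholds:

```python
def prune_opponent_moves(moves, move_probs, min_prob=0.05):
    """Drop opponent options the prediction models consider unlikely."""
    return [m for m, p in zip(moves, move_probs) if p >= min_prob]

def prune_own_moves(moves, win_chances, margin=0.15):
    """Drop our own options that score far below our best one."""
    best = max(win_chances)
    return [m for m, w in zip(moves, win_chances) if w >= best - margin]

def prune_chance_outcomes(outcomes, min_prob=0.10):
    """Ignore random events too rare to be worth exploring.
    outcomes: list of (state, probability) pairs."""
    return [(s, p) for s, p in outcomes if p >= min_prob]
```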
Even though I reduced the number of turns the AI explores, there's still a lot of computation to do within its 15-second time limit. Technically, players have up to 2 and a half minutes to make their decisions, but at the end of each turn you gain back at most 15 seconds, so if you consistently spend more than 15 seconds per turn deciding, you'll eventually lose by timeout. Thankfully, exploring different outcomes of a turn doesn't need to happen in a particular order, so if it could process multiple outcomes at the same time, the AI could get its results far faster. That speed advantage was so great it left me no choice but to bite the bullet and make the project multithreaded, allowing it to run turn exploration on multiple processors.
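In Python terms (a sketch of the shape of the idea, not the project's actual worker setup), fanning the independent outcome evaluations across a pool of workers looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

def explore_in_parallel(outcome_states, evaluate, workers=8):
    """Score turn outcomes concurrently; they don't depend on each other,
    so order of execution is irrelevant and results come back in input
    order. For CPU-bound pure-Python work you would reach for a
    ProcessPoolExecutor instead, since threads share the GIL."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, outcome_states))
```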
My laptop, the computer the AI mainly ran on, has 4 processing cores with 2 threads each to explore turns with. That's fine, but even using all of them, I was lucky to get through exploring 1 turn ahead before time ran out. When I started looking at ways to increase the number of processors, I realized my best, if not only, solution (it felt like the moment I was ready to buy a computer was the same moment the chip shortage became an issue) was taking the AI to the cloud. This had me super excited, since a cloud server can have pretty much any number of cores: 32, 64, 256, you name it, a cloud server can handle it. That excitement disappeared when I found I was limited to only 16 cores. Don't get me wrong, doubling is good, but compared to what it could've been, it's a little disappointing. I was so ready to rock out on a 64-core machine and run the house, but alas, this means the AI can only look just shy of 3 turns ahead in the 15-second time limit. And I'm fine with that for the first run of this. Speaking of the first run, let's talk about how it did.
How Future Sight AI Did
Here's something even the most dedicated users of Showdown haven't seen before. When you play against random opponents, you're given a rating based on how well you've done against them. This is a graph of how many people hold each rating. Now, this system does have its flaws, as people's ratings tend to fluctuate quite a bit; I mitigated that by making sure the AI played enough games for its rating to become stable. As for how I got this information: because I had that system set up to download battles off Showdown daily, I found myself with a mini database of all of Showdown's battles. This graph won't reflect every battle, but it does represent 50,000 battles from the past month, which should be more than enough for a good estimate.
For our analysis, I added the yellow line to the graph, which shows what percentile of players you're in at each rating. I also shifted the graph to start at 1001, as the count of players at the base rating of 1000 is filled with far too much noise; there's a high chance those are people who used the system once and left. I've provided a picture of the unshifted graph if you want to see it, but note that if I had used that graph for my conclusions, all of these points would sit at a much higher percentage of the player base. The extra points on the graph are there so you can see where other relevant battlers place.

You might notice that my rating, despite my being the creator of the AI, is quite low. That's because, despite playing competitive Pokemon for the better part of the past ten years and knowing way too much about how this game works, I'm just not good at the game. Obviously, I'm not going to compare myself to a computer with perfect knowledge of every aspect of the game, but I think this still works as evidence that, even though my AI knows exactly what the game can do, there's more to playing this game well than knowing how it works.

As for random moves: during my runs with the AI, I wanted a baseline of what it would do if it wasn't working at all. To get that, I set it to randomly choose games in which, instead of using the AI, it would just pick random moves. I considered having a separate bot start from the bottom of the ladder and see how high it could climb, but that would have produced a lot of fluke wins, and I wanted an idea of what random moves could do at the level the AI was playing at. By the end of the four runs that constitute the AI's current version, there were 60 battles in which it chose exclusively random moves. Of those 60 battles, it won zero.
Now, I'll be the first to admit that winning zero of those battles is an anomaly; after that many, either somebody should have forfeited or the game's random chance should have played in its favor at least once. However, the fact that winning zero out of 60 was even possible was more than enough proof to me that you must be far better than random moves to even have a chance.
Another point of context comes from Showdown itself, as they publish some of this information, but only for the top 500 players in the world. As I'm writing this, the rating of player 500 has moved slightly, to 1706.
Now, the AI's average rating is 1547 and its maximum rating is 1630. That average puts it in the top 10% of all players, and that maximum puts it above the top 5%. These numbers come from the aforementioned four runs of the current version, where each run is the AI battling random opponents every 24 hours. Between runs, I intentionally lost enough games to drop the AI's rating by 100 points, to see if it could climb back to where it was without the benefit of any starting bonus a previous run might have given it. In retrospect, I should've had the AI start at the same rating each time, but it's quite difficult and time-consuming to move your rating to a particular spot; if you dip too low, getting back up to the starting point can be a challenge in itself. And about that 1630: it means the AI was remarkably close to being in the top 500 in the world. Its entry into the 1600s isn't an anomaly either; the 1630 rating came three weeks ago, but when I ran it again a few days ago, it got back into the 1600s no problem!
Now, I do want to clarify that just because it has that rating doesn't mean it plays well every time, because there are still some vital gaps in its logic, tied to fundamental parts of what it does, that are holding it back. The first of the two, and this is truly a topic unto itself that I'll have to save for later, is that the AI builds its own teams. If you watched the video, you might have noticed I left a long pause after saying that, because the importance of this cannot be overstated. It's widely believed that team building is far harder than battling, so for the sake of completeness, in making sure FSAI can handle everything that goes into playing this game, I'm potentially putting it at a great disadvantage by having it tackle the harder side. In short, it builds teams by trying to create a well-rounded group that counters the Pokémon it expects to see most often, while not being countered itself. For right now, I have it playing it safe: FSAI can pick whatever Pokémon it wants for a team, but each Pokemon's set doesn't deviate far from what's standard, as I'd rather all its teams be fine than risk making bad ones.
Now, the second part is a little more complicated. Since I knew the AI would be playing against people at a bunch of different skill levels while trying to raise its own rating, I programmed in a way for it to play differently against players of different ranks. You can't approach different player levels the same way and expect to win; for example, the worst players might go for the obvious move you wouldn't expect a better player to make. This might be a byproduct of how I set up its runs, but it seems this programming had an unintended consequence.
One would expect the AI to win more against worse players and lose more against better ones, but when I looked at its wins and losses grouped by player ranking, they were all close to 50%. That means it may have been playing down to its opponent's skill level, which I guess is useful in its own right, but I would have preferred if it, you know, played better! I feel like only a little adjustment is needed to get it out of this state, but for now, it has this weird limitation.
Inverse Damage Calculation
One of the important things to do when making an AI for a game, especially one where you're not all that great a player yourself, is to learn how the best players play. As part of that, I spent a lot of time watching YouTube videos of people playing competitively, and as anyone who watches competitive Pokémon YouTube knows, if you do that long enough, you're going to come across Wolfe Glick. I'm not going to go over his whole resume, but if there's an important competition to win in this space, odds are he's won it. Once my AI was in a working state, I'd watch videos with the mindset of "they just made a particular play; could my AI do the same?" Most of the time, the answer was yes, and when it wasn't, FSAI was a few tweaks away from getting there. However, there was a play he made that was so wild yet so on point that it made me rethink everything. Here's a link to the play if you want to see it, but basically, it starts with his opponent using a move that would seemingly make no sense to click, except that it has an extra effect of changing the weather. There's also this not-all-that-popular move called Weather Ball, which changes its type based on what the weather is. Because that first move didn't make sense, Wolfe went through all the reasons why it would, and lo and behold, he realized that if someone on his opponent's team had Weather Ball, with the weather now changed, they would have a super effective move against the rest of his team. If he hadn't noticed it, he might well have lost the battle.
Not only did I see a great example of how deep this game can be, but I was also left dumbfounded about how my AI could ever pull off such a play. A lot of thinking later, I realized there were a million steps he needed to take to get there, none of which my AI currently did.
This pointed out that I wasn't doing anything to determine the stats, moves, items, and everything else your opponent can customize about their team. Realizing I hadn't addressed such an important part of playing this game, I started down the path of discovering just how much information you can gain about your opponent's team that you're not directly told, and the first stop was inverse damage calculation.
That's a term I use for finding the stats of your opponent's team based on the amount of damage both sides do to each other. This is best explained through an example. Here's a situation where our opponent does damage to us. When you click a damaging move in Pokemon, the amount of damage done is determined by an equation whose main components are the power of the move, its user's attack stat and level, and its target's defenses. We only run an inverse damage calculation once the damage has already been done, so we can rearrange the equation to find the attack stat which caused that amount of damage. Since the game tells us our opponent's level, we know our own defense stat, and the power of the move that hit us can be looked up, there's only one more number we need to correctly find their attack stat. However, that number is where things get complicated. It represents the general damage modifier, which is the product of potentially dozens of known and unknown effects active during a battle that change how much damage a move does. The two effects that can make the greatest difference, yet that we're least likely to know, are the item the attacker is holding and their special ability.
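Here is a sketch of that rearrangement using the core of the standard damage formula (ignoring the game's intermediate rounding steps, which a real implementation has to account for):

```python
def damage(level, power, attack, defense, modifier):
    """Core of the damage formula, without the game's rounding:
    ((2*level/5 + 2) * power * attack / defense / 50 + 2) * modifier"""
    return ((2 * level / 5 + 2) * power * attack / defense / 50 + 2) * modifier

def inverse_attack(observed_damage, level, power, defense, modifier):
    """The same equation rearranged to solve for the attacker's
    offensive stat from the damage we just observed."""
    base = observed_damage / modifier - 2  # undo the modifier and the flat +2
    return base * 50 * defense / ((2 * level / 5 + 2) * power)
```

Given the right modifier, the inverse exactly round-trips the forward calculation; the hard part, as described above, is that the modifier itself hides the unknown item and ability.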
There are a ton of items and some abilities which affect how much damage an attack does, which could make sorting out which ones they have a tedious process. But if we combine the knowledge of how much each of them modifies damage with what the opponent's Pokemon most commonly carries, we can efficiently find the most likely combination of item and ability that makes this equation return a plausible stat for that Pokémon.
Hopefully that solution made sense for finding a stat when they're attacking you. But when you're attacking them, I just cannot sit here and try to explain it! Even though I figured it out, the solution to do it properly almost goes over my own head. Here's the thing: when solving for attack, it's just that one stat you must worry about, but for finding defenses, there are two stats (either HP and Defense, or HP and Special Defense) that change how much damage a move does, plus a third stat connected to one of them but not the other (Defense and Special Defense have no correlation). On top of that, you still must make sure the items and abilities match up to give you the correct stat. And no, linear algebra just wasn't viable here, so getting to this solution was such a mess.
And one more thing about damage calculation: no matter what you do, your answer is still partly a guess. Almost every move does 1 of 16 different randomly chosen amounts of damage, so there's no way of knowing whether the hit you just took rolled the most damage possible for their stat, the least, or anything in between. But as you land more hits on them and they on you, you'll have calculated their stat enough times that the average of all those calculations will lead you close to the right answer.
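That averaging can be made slightly smarter: each hit's back-calculated stat is scaled by that hit's hidden roll, so dividing the average by the mean roll (the 16 rolls form a uniform 85%-100% damage multiplier in the main games) removes the bias. A simple estimator sketch:

```python
import statistics

DAMAGE_ROLLS = [r / 100 for r in range(85, 101)]  # the 16 possible rolls
MEAN_ROLL = statistics.mean(DAMAGE_ROLLS)         # 0.925

def stat_from_hits(per_hit_estimates):
    """Average many per-hit stat estimates, then divide out the mean
    roll so the estimate converges on the true stat as hits accumulate."""
    return statistics.mean(per_hit_estimates) / MEAN_ROLL
```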
Now, I understand I don't need exact stats, as an estimate is probably good enough, but this computer already has to make too many guesses for me to miss an opportunity to find something it can know for sure.
There are some other tricks I used to find the opponent's items or abilities, like checking whether something that would have happened if they held a particular item didn't happen, but finding their stats turns out to be the most helpful in figuring out what they can do. Once the AI knows their stats, maybe their item, and a couple of moves, it can look through a list of the most common sets that Pokémon runs, find the one most similar to what it has already observed, and use that set to fill in the remaining unknowns.
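The matching step itself can be as simple as scoring each known set by how many observed details it agrees with. An illustrative sketch (the scoring here is made up; a real version would presumably weigh usage rates and the estimated stats too):

```python
def most_similar_set(observed, common_sets):
    """Pick the common set that agrees with the most observed details.
    observed: {"moves": set of moves seen so far, "item": item or None}"""
    def agreement(candidate):
        score = len(observed["moves"] & set(candidate["moves"]))
        if observed.get("item") and observed["item"] == candidate["item"]:
            score += 1
        return score
    return max(common_sets, key=agreement)
```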
This is by no means an exact science and there's a lot more that can be done in this area, but this already goes an exceedingly long way in making its predictions more accurate.
Now, I said there were three tasks that make this AI work, but there's actually a fourth. I'll tell you what it does, but the how will have to come later, since I only finished it recently and it's going to take some time to put into words. Consider this: all three parts I've mentioned exist just to help the AI look through thousands of future turns, and although that's a good strategy, I couldn't help but think there must be a more efficient way. While writing code to help the AI get more out of fewer turns, I realized that with a few changes, it could also be used to guide the machine learning predictions. And after a bit of testing, it appeared this guiding code, on its own, outperformed the machine learning models not just in accuracy, but even more so in speed. It was such an improvement that the AI in its current form doesn't use machine learning at all. I still don't know if that's a good thing, but because of this 4th phase, the AI went from doing well because it could think ahead, to doing great because it could think… sideways.

It should be noted that this part of the video was filmed two months ago, so I have since made a bit of a dent in being able to explain this. Basically, the machine learning models had their weights trained on a battle format as a whole, and then those weights were set in stone for every battle. That's fine for a general idea of how important things are, but in a game that varies as much from battle to battle as this one, those static values could lead to misguided decisions. So I wrote code that analyzes each battle on its own, which results in behavior similar to the AI calculating those weights on the fly; it no longer understands the game on a format-by-format basis, but rather on a turn-by-turn basis. This allows it to be far more dynamic in its analysis of the game and therefore improves all its predictions.
And because information no longer has to go through the TensorFlow machine learning pipeline, it skips a large speed bottleneck and runs way faster.
I could keep going forever on this, because for every five lines of code, I feel like I could add a page to this script. And considering there are almost 20,000 lines in here, there's so much more I could dive into; I didn't even talk about how it physically plays the game! I'm definitely making this topic an ongoing series on this channel. It won't be every video, but expect this AI to stick around, so I recommend you do as well.
Battle the AI!
In the meantime, how would you like to battle it yourself? For the week of this video being uploaded, Future Sight AI is open for challengers. All you have to do is go to Pokemon Showdown, type in its username, have a team ready in one of its supported formats, and get ready to see what it's made of firsthand. If you're not the battling type, you're already in the right place: if you switch to the main Future Sight AI page of this website, you can watch it battle alongside its analysis of each game. Also, it should be noted it will be running at its absolute lowest performance setting, which should still be fine, but I get it if that's disappointing. That's just how it has to be, because if I had this thing running for a week at full strength on a server, the only thing this AI would be beating is my wallet! But if you're really interested in challenging it at its peak, just hit me up and I'll see if we can make something work.
I have every intention of making it able to play against others in the official games on Switch; it should be straightforward once the computer can read a screen, so that's one thing I can guarantee is in this AI's future.
Speaking of the future: yes, doubles is coming.
In case you don't know, the format of tournaments hosted by Pokemon themselves is battles where, instead of having one Pokemon out per side, there are two, and I am so ready to take on that side of this project. I tried to keep doubles in mind while coding and have most of the ideas for implementing it worked out; really, the only thing stopping me is that it will just take time. At the end of the day, I'm just one guy working on this. Well, for now, but talking about stuff like that is what Friday's for.
As for my other plans, I have some fun things in mind, like seeing how well it can play through the story of one of the games. And, of course, there's no shortage of Pokemon challenges I could apply this to; maybe we can find the best solution to some of them, or push the limits of what challenges we think are possible by throwing it at some tough ones.
But I have my sights on a particular problem I'd like to solve once and for all. There's a popular quote among Pokémon fans, one that frankly I believe applies to life in general, which, paraphrasing, says: if you're truly skilled, you can win with your favorites.
And there have been several people who fit this truly-skilled criterion who've tried to be the proof. But their successes or failures can never make it definitive, and that comes down to never knowing where the weakest link is: is it the players on either side, as even the best have bad days? Is it your team, whose sets might not have been the best they could be? Is it both?
However, a computer that can keep its play not only consistent but consistently good fixes both those problems, as the AI won't make a wrong calculation, or forget a mechanic, or just get tilted, or anything. Also, with some more coding, the AI could be used to find the best possible sets for a particular team. With all that in mind, I think this might be our best chance to see this through. As for what team to use: over my decade of playing this game, I've developed a pretty concrete idea of the team of 6 I'd want to put up to this task. They aren't all my favorites, but collectively, they make up my favorite team. In its current state, Future Sight AI isn't ready to do this question justice, but one day, when I'm ready, they'll get their time to shine.