I Forced a Bot to Play 40,000 Games of Mono Red - The EV of Experimental Frenzy
Experimental Frenzy is a beautifully complex Magic card. Since its printing in GRN it has found a solid home in standard Mono Red decks, and has even made its way into modern affinity, most notably in the hands of Matt Sperling at MC London. Throughout its time in standard, however, it's never quite been the agreed-upon best draw engine for Mono Red. Printed in the same set as Risk Factor, and now facing a new challenger in Chandra, Fire Artisan from WAR, Frenzy is constantly having to fight for inclusion in the 75s of red mages.
As someone who wants to play these decks and make reasonable decisions derived from sound logic, I have found choosing between these options rather difficult. The main problem with trying to make this choice is that Frenzy in particular is an incredibly random card with a terrible fail state (you find Mountain Mountain on top), and just about the highest ceiling possible from a draw engine (see this clip). Not being able to effectively measure the expected use case leads to a lot of nebulous discussion around how good it actually is, without providing actionable information as to whether it's better than the other options. So, in order to improve my understanding of Frenzy and how it plays out, I forced a bot to play 10,000 games of Mono Red with Experimental Frenzy in play, and am here to report on my findings.
We'll start with the simplest case: we've got 4 lands in play and no other permanents, no cards in hand, and we slam a Frenzy. For each case we look at, I've made my bot play 10,000 games. This produces reasonably smooth results, and the simulations finish in a couple of minutes. The best way to look at the 10,000 games played with this setup is to plot the inverted cumulative distribution function of the damage dealt at the end of the 4th Frenzy turn (counting the turn we play Frenzy as the 0th turn), that is, the probability of having dealt at least a given amount of damage:
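For anyone who wants to build the same kind of curve, here is a minimal sketch of how an inverted CDF can be computed from a list of per-game damage totals. The function name `inverted_cdf` and the toy numbers are illustrative assumptions, not code or data taken from the simulation itself.

```python
from collections import Counter


def inverted_cdf(damages):
    """Return {d: P(damage >= d)} for every d from 0 up to max(damages)."""
    n = len(damages)
    counts = Counter(damages)
    result = {}
    at_least = n  # every game dealt at least 0 damage
    for d in range(max(damages) + 1):
        result[d] = at_least / n
        at_least -= counts[d]  # games with exactly d damage drop out for d + 1
    return result


# Toy data for illustration only, NOT simulation output.
sample = [0, 3, 6, 6, 8, 8, 10, 14]
curve = inverted_cdf(sample)
print(curve[8])  # fraction of toy games that dealt at least 8 damage
```

Reading a value off `curve` is the same as reading the plot: pick the damage on the x-axis and the stored value is the probability of having dealt at least that much.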
I'm not a fan of coffee to be honest, but invention I can get behind. (Art by Simon Dominic)
Making the Bot
Magic is hard. Really hard. Researchers recently showed that Magic is the most computationally complex game we know of. As such, we're going to have to make some simplifications in order to get a bot to be able to play Magic. Firstly, we're going to ignore our opponent. Of course, opponents are a rather important part of any game of Magic, but we're here to get a baseline for how Frenzy works. Once we have that baseline, we can add back in our understanding of how interaction works and modify the result based on, for example, how likely we think our opponent is to Mortify our Frenzy. Another simplification we're going to make is to assume that the board is empty apart from our lands when we play Frenzy, with 2 instants or sorceries in the graveyard. This allows our Ghitu Lavarunners to have 2 power and haste at all times, and is, I believe, a reasonable assumption to be making by the time a Frenzy is resolved.
Now that our opponent is out of the picture we need a way to make decisions. This is where an algorithm becomes important. An algorithm is a process by which we make decisions, and can often be expressed in the form "if X then Y." As Magic players we often internalise these types of decision-making processes and use them to form heuristics, such as "if bird then bolt." The first step to becoming a good player is to learn these heuristics, and the first step to becoming a great player is learning when to ignore them. Unfortunately, teaching a bot when to break rules just involves writing more and more rules until you run out of space on your computer. Because of this, our bot isn't going to be a great player. It won't take the perfect line every time, but it will take a decent line every time. I'll discuss more about the implications of this, but for now, here is how our bot will make decisions:
This decision-making process is the basis of the simulation. I'm not sure this is the best possible hierarchy of choices, but I believe it to be close to the best in the vast majority of situations. It is easy to come up with situations where this process will lead to suboptimal lines, but that will be true for any such algorithm unless we add many more layers of complexity.
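To give a flavour of what an "if X then Y" hierarchy looks like in code, here is a heavily simplified sketch of a single Frenzy turn after the one on which Frenzy was cast. The three rules, the `frenzy_turn` name, and the `(name, cost, damage)` card tuples are placeholders for illustration, not the actual hierarchy used by the bot. Creatures, Light Up the Stage, and spectacle are all ignored.

```python
def frenzy_turn(library, lands_in_play):
    """Play one simplified turn off the top of the library.

    library: list of (name, cost, damage) tuples; index 0 is the top card.
             Lands are represented with cost None.
    Returns (damage_dealt, lands_in_play) and mutates `library`.
    """
    if library:
        library.pop(0)  # draw step: the card goes to hand, where Frenzy strands it
    land_drop_used = False
    mana = lands_in_play  # all lands untap at the start of the turn
    damage = 0
    while library:
        name, cost, dmg = library[0]
        if cost is None:
            # Rule 1: if the top card is a land and we have a land drop, play it.
            if land_drop_used:
                break  # a second land on top ends the turn
            library.pop(0)
            land_drop_used = True
            lands_in_play += 1
            mana += 1
        elif cost <= mana:
            # Rule 2: if we can afford the spell on top, cast it at the opponent.
            library.pop(0)
            mana -= cost
            damage += dmg
        else:
            # Rule 3: otherwise we are stuck and pass the turn.
            break
    return damage, lands_in_play


# Hypothetical stacked library, top card first.
library = [
    ("Shock", 1, 2),
    ("Mountain", None, 0),
    ("Lightning Strike", 2, 3),
    ("Mountain", None, 0),
    ("Shock", 1, 2),
]
result = frenzy_turn(library, lands_in_play=4)
print(result)
```

In this example the first Shock is drawn and stranded in hand, the Mountain is played, the Lightning Strike is cast, and the second Mountain on top stops the turn: the Mountain Mountain fail state in miniature.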
For those of you who are interested, this simulation was implemented in Python, and if you're really interested you can find it here or contact me via twitter.
The Decklist
The decklist we'll be using for this simulation is the rather straightforward main deck from Ethan Gaieski at SCG Richmond:
4 Fanatical Firebrand
4 Ghitu Lavarunner
4 Goblin Chainwhirler
4 Runaway Steam-Kin
4 Viashino Pyromancer
3 Experimental Frenzy
4 Lightning Strike
4 Shock
4 Wizard's Lightning
4 Light Up the Stage
2 Skewer the Critics
19 Mountain
There's nothing particularly fancy about this mainboard; it's just the standard streamlined suite of 4-ofs with 3 Frenzy and a small gap filled with 2 Skewers that was common at the beginning of this standard format. This isn't the most up to date list, and isn't one I'd recommend taking into a tournament at this stage in the metagame, but it is, I believe, the best case scenario for when you resolve Frenzy. More recent iterations of Mono Red have taken advantage of Teferi, Time Raveler's effect on the format (pushing out counterspells), and have gone much bigger mainboard with 20 lands and some combination of 4 Chandra/Frenzy. Examples of this type of build can be seen in Martin Juza's recent article on the deck and also in the MPL lists for the last couple of weeks. Despite this shift, I want to look at the best case scenario for a resolved Frenzy to give a baseline as to what we can expect. Ethan's list above does this, maximising the density of damaging spells and playing a lower land and Frenzy count. This deck might not be the most likely to resolve a Frenzy in time, but it's the list of cards you want left over when you do get one in play.
Now that we've chosen a decklist, we need to populate the library in the simulation, which is harder than it might seem. We've assumed that we have a certain number of lands in play (e.g. 6), but no other permanents, and that there are 2 instants or sorceries in the graveyard. If we take the naïve approach and remove a Frenzy, 2 spells and 6 lands, then our library is skewed towards nonland cards, when in reality there would also be a bunch of creatures and spells already cast, in exile, or still in our hand. The most reasonable approach I've found is to just leave the entire deck as is, and treat the lands in play and spells in the graveyard as "extra". This preserves the land/spell ratio we would expect to find when Frenzy hits play without having to get overly fancy with randomly determining what cards were played during the game or making too many unfounded assumptions about the game state. The likely effect of this is that our results will skew more centrally to an even land/spell distribution, although I doubt it will be a strong enough effect to be meaningful.
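In code, the "leave the deck as is" approach amounts to shuffling all 60 cards and tracking the lands in play separately. A minimal sketch, where `DECKLIST`, `build_library`, and the seeded `random.Random` are illustrative names and choices rather than the simulation's actual code:

```python
import random

# Ethan Gaieski's main deck as listed above (60 cards).
DECKLIST = {
    "Fanatical Firebrand": 4,
    "Ghitu Lavarunner": 4,
    "Goblin Chainwhirler": 4,
    "Runaway Steam-Kin": 4,
    "Viashino Pyromancer": 4,
    "Experimental Frenzy": 3,
    "Lightning Strike": 4,
    "Shock": 4,
    "Wizard's Lightning": 4,
    "Light Up the Stage": 4,
    "Skewer the Critics": 2,
    "Mountain": 19,
}


def build_library(decklist, rng):
    """Expand the decklist into one card per entry and shuffle it.

    Nothing is removed for the Frenzy in play, the lands in play, or the
    spells in the graveyard: those are treated as "extra" cards.
    """
    library = [name for name, count in decklist.items() for _ in range(count)]
    rng.shuffle(library)
    return library


rng = random.Random(0)  # fixed seed so a run can be reproduced
library = build_library(DECKLIST, rng)
lands_in_play = 4  # tracked separately from the library
land_ratio = library.count("Mountain") / len(library)
print(len(library), round(land_ratio, 3))
```

Because no cards are removed, the library keeps the deck's 19-in-60 land ratio regardless of how many lands we assume are already in play.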
PSA: Teferis Time Raveler, Hero of Dominaria, and Mage of Zhalfir are now called Teffa, Tefairy, and Tefable respectively. This update brought to you by my mate @Archwanderer and I. (Art by Chris Rallis)
I Love it When a Plot Comes Together
IT'S ALIVE!!!! (Art by Chris Seaman)
The way to read this plot is to find the damage you want to deal on the x-axis, and the y-axis will tell you the probability that you've dealt at least this much damage by the end of the 4th Frenzy turn. For example, the probability of having dealt at least 10 damage by the end of the 4th Frenzy turn is approximately 0.37. The average amount of damage dealt is 8.4, while the median is 8. This makes reasonable sense given that there is a large number of values in the 6 to 8 range (the plot drops quickly), and a long tail of more extreme values (most of which involve multiple Steam Kins for mana and Light Up the Stage to clear lands). This long tail skews the average upwards a bit, so bear that in mind. The standard deviation in damage dealt is 5.1, which is a rather large variation, and explains why it's so hard to evaluate Frenzy properly. This is an objective measurement of just how much variance is involved in the results of such Frenzied Experiments.
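The gap between the mean and the median is exactly what a long right tail does to a distribution. A toy illustration with made-up numbers (not simulation output), using Python's `statistics` module:

```python
import statistics

# Made-up damage totals: most games cluster around 6 to 10,
# and one game goes off with a chain of spells.
toy_damage = [6, 6, 7, 8, 8, 8, 9, 10, 12, 25]

mean = statistics.mean(toy_damage)
median = statistics.median(toy_damage)
spread = statistics.pstdev(toy_damage)  # population standard deviation

print(mean, median, round(spread, 2))
```

The single 25-damage game pulls the mean up to 9.9 while the median stays at 8, which is the same direction of skew as in the real results.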
What this tells us is hard to analyse exactly, but my first impression is that taking 4 turns uncontested to have a reasonable chance of dealing 8 damage is a little longer than I would have expected. In terms of gameplay, this indicates that the conventional wisdom of "play out your hand before you drop Frenzy" is rather sound. With 2 or 3 cards in hand on turn 4 of the game you're likely to be able to have a higher, more consistent, and more reactive damage output based on your opponent's play by playing out those cards, building up a larger number of lands in play, and then dropping Frenzy once you've expended your other resources. This is especially true if the cards you have are creatures, since their damage output scales linearly the more time they're in play.
More Coffee
Now that we understand this simple situation, what happens when we start to add a few more lands or have a different number of turns available? Well, that's where the next plot comes in:
This plot shows the average amount of damage dealt by the end of each turn depending on how many lands you have in play when you drop Frenzy. We can see our previous result as the blue dot on the 4th Frenzy turn, with the same average damage result of 8.4. The type of polynomial growth we see here makes sense. Every single part of this process (mana available, number of creatures in play, the ease with which we can turn on spectacle) snowballs into more and more damage with each passing turn. We also see the immense power of Frenzy when played with access to 7 mana, dealing on average over 15 damage by the end of the 4th turn. This shouldn't be necessary in most cases, but we now know that the vast majority of games will be over by the 3rd turn if we resolve a Frenzy with 7 mana. I'd like to add that I think these numbers could be increased by about 0.5-1 damage each turn with optimal play. As mentioned above, there are only so many rules I can give the bot, and there are many scenarios where it will make a suboptimal decision, most of which involve Light Up the Stage cards and when to attack with Steam Kin. The damage output could also be increased by starting with a Steam Kin in play, but again we're after a baseline. Another thing to note is that this is just damage directed at our opponent's face. If you're using Frenzy as a control card, say against White Weenie such as in the Kannister clip I linked above, your damage efficiency won't be as high.
How Does it Compare?
Now that we have a better understanding of the expected damage output of Frenzy, how do we compare it to its main competitor at the moment, Chandra, Fire Artisan? Although I may tackle a bot that can play with Chandra in the future, the decision making process is far more complicated due to the multiple options available. We still have access to the card we draw each turn, and then add a card from Chandra. This very quickly becomes a rather difficult prioritisation problem where you need to decide how much to value casting the Chandra card before it expires/which card you value casting more out of the two and when to go to combat. As soon as you resolve a Light Up the Stage the complexity of the problem gets even worse, and we're once again faced with the realisation that Magic is actually pretty hard.
Despite all this, we do have an important piece of information regarding Chandra's output: she deals 7 damage on the 3rd turn if uncontested (turns 0, 1, and 2 we use her +1, turn 3 we use her -7). This places her ultimate well over our 4 and 5 mana Frenzies, even before we account for the damage dealt by the cards we draw each turn, and the 3 cards her +1 gives us access to. Another distinction to remember is that an uncontested Chandra will always deal a minimum of 7 damage on the 3rd turn, which will likely be enough to end the game, while a 4 mana Frenzy will on average have dealt 5 damage by that point. Even considering the potentially higher maximum damage of Frenzy (31 was the highest output from my simulations), the reliability of Chandra and the likelihood she will end the game seem to be too good to pass up. Taking this to an extreme hypothetical, it's like choosing between a card that is 50% to deal 100 damage to your opponent and a card that is 100% to deal 20. Sure, there might exist games where you need the overkill, but you're giving up substantial equity in the games where you don't.
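The hypothetical can be made concrete by separating expected damage from the chance of actually being lethal. A small sketch, assuming the opponent is at 20 life and that the 50% card deals nothing when it misses (both are assumptions for illustration, as is the `summarize` helper):

```python
def summarize(outcomes, life=20):
    """outcomes: list of (probability, damage) pairs.

    Returns (expected damage, probability that damage >= life).
    """
    expected = sum(p * d for p, d in outcomes)
    p_lethal = sum(p for p, d in outcomes if d >= life)
    return expected, p_lethal


high_variance = [(0.5, 100), (0.5, 0)]  # 50% to deal 100, assumed 0 otherwise
reliable = [(1.0, 20)]                  # always deals exactly 20

print(summarize(high_variance))  # (50.0, 0.5)
print(summarize(reliable))       # (20.0, 1.0)
```

The high-variance card has 2.5 times the expected damage but wins only half as often, because damage beyond the opponent's life total is worth nothing.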
That's Enough Coffee For Today
Overall, I'm rather pleased with the results of my simulation. Although I'm sure the processes could use a little work, I'm reasonably confident that my data is representative of an uncontested Frenzy. Although I will miss chaining together spells off Steam Kins at an absurd rate, I believe I now have a compelling case that Chandra is just a more efficient and reliable top end for Mono Red as it stands. She is certainly a better turn 4 play, and any advantage of a 7 mana Frenzy over her seems to be overkill. On the other hand, she is more vulnerable to being partially or completely dealt with, as opponents have the opportunity to attack her directly. Even when this happens, however, Chandra deals at least 5 damage and outperforms most 2 turn Frenzies, and even the 4 mana Frenzy on the 3rd turn.
This is the point where our knowledge of, and experience with, a format becomes important. Although at the time of writing Mortify in particular has a reasonably high standing within the format, this could easily change. With the rise of a wide variety of super friends decks, planeswalker-specific removal such as The Elderspell has become more common too. If your opponent is likely to have a reliable way of removing Chandra without dealing damage to her, then perhaps the inconsistent but explosive Frenzy is going to pressure your opponent more effectively, ensuring that they can't draw an answer to both your Frenzy and the extra cards you have in hand.
Turns out blowing stuff up is both an art and a science. (Art by Yongjae Choi)
This isn't a very definitive answer to the binary question of "Frenzy or Chandra?" but I feel much more confident moving forwards that I'll be able to make the right decision with a better fundamental understanding of the behaviours of Experimental Frenzy, and I hope that you do too. As always, if you've got any questions or comments please feel free to contact me here or via twitter, @ArcKayNine
Until next time,
May all your stages stay lit.
Amazing, informative, well done sir. My friends and I thank you
Thanks, this was amazing to read and contraticed my intial believes completly. Math rocks and you do too!