Thursday, March 31, 2011

Metrics (Part III)

In Part I, game designer Ian Schreiber outlines the debate between metrics-driven design and the more touchy-feely intuition-based design. In Part II, he explains the difficulties with trying to measure the "fun" in your game. In Part III, he tackles the issues of measuring difficulty, game balance, and the value of games to players.

Another example: measuring difficulty


Player difficulty, like fun, is another thing that’s basically impossible to measure directly, but what you can measure is progression, and failure to progress. Measures of progression are going to be different depending on your game.

For a game that presents skill-based challenges like a retro arcade game, you can measure things like how long it takes the player to clear each level, how many times they lose a life on each level, and importantly, where and how they lose a life. Collecting this information makes it really easy to see where your hardest points are, and if there are any unintentional spikes in your difficulty curve. I understand that Valve does this for their FPS games, and that they actually have a visualizer tool that will not only display all of this information, but actually plot it overlaid on a map of the level, so you can see where player deaths are clustered. Interestingly, starting with Half-Life 2 Episode 2 they actually have live reporting and uploading from players to their servers, and they have displayed their metrics on a public page (which probably helps with the aforementioned privacy concerns, because players can see for themselves exactly what is being uploaded and how it’s being used).

Yet another example: measuring game balance


What if instead you want to know if your game is fair and balanced? That’s not something you can measure directly either. However, you can track just about any number attached to any player, action or object in the game, and this can tell you a lot about both normal play patterns, and also the relative balance of strategies, objects, and anything else.

For example, suppose you have a strategy game where each player can take one of four different actions each turn, and you have a way of numerically tracking each player’s standing. You could record each turn, what action each player takes, and how it affects their respective standing in the game.

Or, suppose you have a CCG where players build their own decks, or a Fighting game where each player chooses a fighter, or an RTS where players choose a faction, or an MMO or tabletop RPG where players choose a race/class combination. Two things you can track here are which choices seem to be the most and least popular, and also which choices seem to have the highest correlation with actually winning. Note that this is not always the same thing; sometimes the big, flashy, cool-looking thing that everyone likes because it’s impressive and easy to use is still easily defeated by a sufficiently skilled player who uses a less well-known strategy. Sometimes, dominant strategies take months or even years to emerge through tens of thousands of games played; the Necropotence card in Magic: the Gathering saw almost no play for six months or so after release, until some top players figured out how to use it, because it had this really complicated and obscure set of effects… but once people started experimenting with it, they found it to be one of the most powerful cards ever made. So, both popularity and correlation with winning are two useful metrics here.

If a particular game object sees a lot more use than you expected, that can certainly signal a potential game balance issue. It may also mean that this one thing is just a lot more compelling to your target audience for whatever reason – for example, in a high fantasy game, you might be surprised to find more players creating Elves than Humans, regardless of balance issues… or maybe you wouldn’t be that surprised. Popularity can be a sign in some games that a certain play style is really fun compared to the others, and you can sometimes migrate that into other characters or classes or cards or what have you in order to make the game overall more fun.

If a game object sees less use than expected, again that can mean it’s underpowered or overcosted. It might also mean that it’s just not very fun to use, even if it’s effective. Or it might mean it is too complicated to use, it has a high learning curve relative to the rest of the game, and so players aren’t experimenting with it right away (which can be really dangerous if you’re relying on playtesters to actually, you know, playtest, if they leave some of your things alone and don’t play with them).

Metrics have other applications besides game objects. For example, one really useful area is in measuring beginning asymmetries, a common one being the first-player advantage (or disadvantage). Collect a bunch of data on seating arrangements versus end results. This happens a lot with professional games and sports; for example, I think statisticians have calculated the home-field advantage in American Football to be about 2.3 points, and depending on where you play the first-move advantage in Go is 6.5 or 7.5 points (in this latter case, the half point is used to prevent tie games). Statistics from Settlers of Catan tournaments have shown a very slight advantage to playing second in a four-player game, on the order of a few hundredths of a percent; normally we could discard that as random variation, but the sheer number of games that have been played gives the numbers some weight.

A Note on Ethics

The ethical consideration here is that a lot of these metrics look at player behavior but they don’t actually look at the value added (or removed) from the players’ lives. Some games, particularly those on Facebook which have evolved to make some of the most efficient use of metrics of any games ever made, have also been accused (by some people) of being blatantly manipulative, exploiting known flaws in human psychology to keep their players playing (and giving money) against their will. Now, this sounds silly when taken to the extreme, because we think of games as something inherently voluntary, so the idea of a game “holding us prisoner” seems strange. On the other hand, any game you’ve played for an extended period of time is a game you are emotionally invested in, and that emotional investment does have cash value. If it seems silly to you that I’d say a game “makes” you spend money, consider this: suppose I found all of your saved games and put them in one place. Maybe some of these are on console memory cards or hard disks. Maybe some of them are on your PC hard drive. For online games, your “saved game” is on some company’s server somewhere. And then suppose I threatened to destroy all of them… but not to worry, I’d replace the hardware. So you get free replacements of your hard drive and console memory cards, a fresh account on every online game you subscribe to, and so on. And then suppose I asked you, how much would you pay me to not do that. And I bet when you think about it, the answer is more than zero, and the reason is that those saved games have value to you! And more to the point, if one of these games threatened to delete all your saves unless you bought some extra downloadable content, you would at least consider it… not because you wanted to gain the content, but because you wanted to not lose your save.

To be fair, all games involve some kind of psychological manipulation, just like movies and books and all other media (there’s that whole thing about suspending our disbelief, for example). And most people don’t really have a problem with this; they still see the game experience itself as a net value-add to their life, by letting them live more in the hours they spend playing than they would have lived had they done other activities.

But just like difficulty curves, the difference between value added and taken away is not constant; it’s different from person to person. This is why we have things like MMOs that enhance the lives of millions of subscribers, while also causing horrendous bad events in the lives of a small minority that lose their marriage and family to their game obsession, or that play for so long without attending to basic bodily needs that they keel over and die at the keyboard.

So there is a question of how far we can push our players to give us money, or just to play our game at all, before we cross an ethical line… especially in the case where our game design is being driven primarily by money-based metrics. As before, I invite you to think about where you stand on this, because if you don’t know, the decision will be made for you by someone else who does.

[This article is an excerpt from Level 8: Metrics and Statistics, part of Ian Schreiber's course on game balance called Game Balance Concepts.]

Ian Schreiber has been making games professionally since 2000, first as a programmer and then as a game designer. He currently teaches game design classes for Savannah College of Art and Design and Columbus State Community College. He has worked on five shipped games and hundreds of shipped students. You can learn more about Ian at his blog, Teaching Game Design.

Monday, March 21, 2011

April 2011 Poll

Please vote for the April 2011 topic! As always, feel free to suggest more topics!  Look at the submission guidelines for Topics and Blog Entries.

You'll see the poll to the right. The choices are:
  • Gamification
  • Emergence
  • Dealing with Communities (How Fans Affect Our Games)
Emergence (from Pascal BĂ©langer)
What is emergence to you? How do you deal with it? What place do you let it take in your games/design? And in your opinion, can we make emergent games that would still be art? Or even better, are there any valid existing examples of such games.
Jesper Juul on Emergence vs progression: http://www.jesperjuul.net/text/openandtheclosed.html
 
Please vote by March 28.  Thank you!

Saturday, March 19, 2011

Metrics (Part II)

In Part I, game designer Ian Schreiber outlines the debate between metrics-driven design and the more touchy-feely intuition-based design. In Part II, he explains the difficulties with trying to measure the "fun" in your game.

How much to measure?

Suppose you want to take some metrics in your game so you can go back and do statistical analysis to improve your game balance. What metrics do you actually take – that is, what exactly do you measure?

There are two schools of thought that I’ve seen. One is to record anything and everything you can think of, log it all, mine it later. The idea is that you’d rather collect too much information and not use it, than to not collect a piece of critical info and then have to re-do all your tests.

Another school of thought is that “record everything” is fine in theory, but in practice you either have this overwhelming amount of extraneous information from which you’re supposed to find this needle in a haystack of something useful, or potentially worse, you mine the heck out of this data mountain to the point where you’re finding all kinds of correlations and relationships that don’t actually exist. By this way of thinking, instead you should figure out ahead of time what you’re going to need for your next playtest, measure that and only that, and that way you don’t get confused when you look at the wrong stuff in the wrong way later on.

Again, think about where you stand on the issue.

Personally, I think a lot depends on what resources you have. If it’s you and a few friends making a small commercial game in Flash, you probably don’t have time to do much in the way of intensive data mining, so you’re better off just figuring out the useful information you need ahead of time, and add more metrics later if a new question occurs to you that requires some data you aren’t tracking yet. If you’re at a large company with an army of actuarial statisticians with nothing better to do than find data correlations all day, then sure, go nuts with data collection and you’ll probably find all kinds of interesting things you’d never have thought of otherwise.

What specific things do you measure?

That’s all fine and good, but whether you say “just get what we need” or “collect everything we can,” neither of those is an actual design. At some point you need to specify what, exactly, you need to measure.

Like game design itself, metrics is a second-order problem. Most of the things that you want to know about your game, you can’t actually measure directly, so instead you have to figure out some kind of thing that you can measure that correlates strongly with what you’re actually trying to learn.

Example: measuring fun

Let’s take an example. In a single-player Flash game, you might want to know if the game is fun or not, but there’s no way to measure fun. What correlates with fun, that you can measure? One thing might be if players continue to play for a long time, or if they spend enough time playing to finish the game and unlock all the achievements, or if they come back to play multiple sessions (especially if they replay even after they’ve “won”), and these are all things you can measure. Now, keep in mind this isn’t a perfect correlation; players might be coming back to your game for some other reason, like if you’ve put in a crop-withering mechanic that punishes them if they don’t return, or something. But at least we can assume that if a player keeps playing, there’s probably at least some reason, and that is useful information. More to the point, if lots of players stop playing your game at a certain point and don’t come back, that tells us that point in the game is probably not enjoyable and may be driving players away. (Or if the point where they stopped playing was the end, maybe they found it incredibly enjoyable but they beat the game and now they’re done, and you didn’t give a reason to continue playing after that. So it all depends on when.)

Player usage patterns are a big deal, because whether people play, how often they play, and how long they play are (hopefully) correlated with how much they like the game. For games that require players to come back on a regular basis (like your typical Facebook game), the two buzzwords you hear a lot are Monthly Active Uniques and Daily Active Uniques (MAU and DAU). The “Active” part of that is important, because it makes sure you don’t overinflate your numbers by counting a bunch of old, dormant accounts belonging to people who stopped playing. The “Unique” part is also important, since one obsessive guy who checks FarmVille ten times a day doesn’t mean he counts as ten users. Now, normally you’d think Monthly and Daily should be equivalent, just multiply Daily by 30 or so to get Monthly, but in reality the two will be different based on how quickly your players burn out (that is, how much overlap there is between different sets of daily users). So if you divide MAU/DAU, that tells you something about how many of your players are new and how many are repeat customers.

For example, suppose you have a really sticky game with a small player base, so you only have 100 players, but those players all log in at least once per day. Here your MAU is going to be 100, and your average DAU is also going to be 100, so your MAU/DAU is 1. Now, suppose instead that you have a game that people play once and never again, but your marketing is good, so you get 100 new players every day but they never come back. Here your average DAU is still going to be 100, but your MAU is around 3000, so your MAU/DAU is about 30 in this case. So that’s the range, MAU/DAU goes between 1 (for a game where every player is extremely loyal) to 28, 30 or 31 depending on the month (representing a game where no one ever plays more than once).

A word of warning: a lot of metrics, like the ones Facebook provides, might use different ways of computing these numbers so that one set of numbers isn’t comparable to another. For example, I saw one website that listed the “worst” MAU/DAU ratio in the top 100 applications as 33-point-something, which should be flatly impossible, so clearly the numbers somewhere are being messed with (maybe they took the Dailies from a different range of dates than the Monthlies or something). And then some people compute this as a %, meaning on average, what percentage of your player pool logs in on a given day, which should range from a minimum of about 3.33% (1/30 of your monthly players logging in each day) to 100% (all of your monthly players log in every single day). This is computed by taking DAU/MAU (instead of MAU/DAU) and multiplying by 100 to get a percentage. So if you see any numbers like this from analytics websites, make sure you’re clear on how they’re computing the numbers so you’re not comparing apples to oranges.

Why is it important to know this number? For one thing, if a lot of your players keep coming back, it probably means you’ve got a good game. For another, it means you’re more likely to make money on the game, because you’ve got the same people stopping by every day… sort of like how if you operate a brick-and-mortar storefront, an individual who just drops in to window-shop may not buy anything, but if that same individual comes in and is “just looking” every single day, they’re probably going to buy something from you eventually.

[This article is an excerpt from Level 8: Metrics and Statistics, part of Ian Schreiber's course on game balance called Game Balance Concepts.]

Ian Schreiber has been making games professionally since 2000, first as a programmer and then as a game designer. He currently teaches game design classes for Savannah College of Art and Design and Columbus State Community College. He has worked on five shipped games and hundreds of shipped students. You can learn more about Ian at his blog, Teaching Game Design.

Thursday, March 17, 2011

Metrics (Part I)

In this article, game designer Ian Schreiber outlines the debate between metrics-driven design and the more touchy-feely intuition-based design.

Metrics

Here’s a common pattern in artistic and creative fields, particularly things like archaeology or art preservation or psychology or medicine where it requires a certain amount of intuition but at the same time there is still a “right answer” or “best way” to do things. The progression goes something like this:

  1. Practitioners see their field as a “soft science”; they don’t know a whole lot about best principles or practices. They do learn how things work, eventually, but it’s mostly through trial and error.
  2. Someone creates a technology that seems to solve a lot of these problems algorithmically. Practitioners rejoice. Finally, we’re a hard science! No more guesswork! Most younger practitioners abandon the “old ways” and embrace “science” as a way to solve all their field’s problems. The old guard, meanwhile, sees it as a threat to how they’ve always done things, and eyes it skeptically.
  3. The limitations of the technology become apparent after much use. Practitioners realize that there is still a mysterious, touchy-feely element to what they do, and that while some day the tech might answer everything, that day is a lot farther off than it first appeared. Widespread disillusionment occurs as people no longer want to trust their instincts because theoretically technology can do it better, but people don’t want to trust the current technology because it doesn’t work that great yet. The young turks acknowledge that this wasn’t the panacea they thought; the old guard acknowledge that it’s still a lot more useful than they assumed at first. Everyone kisses and makes up.
  4. Eventually, people settle into a pattern where they learn what parts can be done by computer algorithms, and what parts need an actual creative human thinking, and the field becomes stronger as the best parts of each get combined. But learning which parts go best with humans and which parts are best left to computers is a learning process that takes a while.
Currently, game design seems to be just starting Step 2. We’re hearing more and more people anecdotally saying why metrics and statistical analysis saved their company. We hear about MMOs that are able to solve their game balance problems by looking at player patterns, before the players themselves learn enough to exploit them. We hear of Zynga changing the font color from red to pink which generates exponentially more click-throughs from players to try out other games. We have entire companies that have sprung up solely to help game developers capture and analyze their metrics. The industry is falling in love with metrics, and I’ll go on record predicting that at least one company that relies entirely on metrics-driven design will fail, badly, by the time this whole thing shakes out, because they will be looking so hard at the numbers that they’ll forget that there are actually human players out there who are trying to have fun in a way that can’t really be measured directly. Or maybe not. I’ve been wrong before.

At any rate, right now there seems to be three schools of thought on the use of metrics:
  • The old school Zynga model: design almost exclusively by metrics. Love it or hate it, 60 Million monthly active unique players laugh at your feeble intuition-based design.
  • Rebellion against the old school Zynga model: metrics are easy to misunderstand, easy to manipulate, and are therefore dangerous and do more harm than good. If you measure player activity and find out that more players use the login screen than any other in-game action, that doesn’t mean you should add more login screens to your game out of some preconceived notion that if a player does it, it’s fun. If you design using metrics, you push yourself into designing the kinds of games that can be designed solely by metrics, which pushes you away from a lot of really interesting video game genres.
  • The moderate road: metrics have their uses, they help you tune your game to find local “peaks” of joy. They help you take a good game and make it just a little bit better, by helping you explore the nearby design space. However, intuition also has its uses; sometimes you need to take broad leaps in unexplored territory to find the global “peaks,” and metrics alone will not get you there, because sometimes you have to make a game a little worse in one way before it gets a lot better in another, and metrics won’t ever let you do that.
Think about it for a bit and decide where you stand, personally, as a designer. What about the people you work with on a team (if you work with others on a team)?

[This article is an excerpt from Level 8: Metrics and Statistics, part of Ian Schreiber's course on game balance called Game Balance Concepts.]

Ian Schreiber has been making games professionally since 2000, first as a programmer and then as a game designer. He currently teaches game design classes for Savannah College of Art and Design and Columbus State Community College. He has worked on five shipped games and hundreds of shipped students. You can learn more about Ian at his blog, Teaching Game Design.

Monday, March 14, 2011

March 2011: Using Metrics

Metrics.  For some, metrics is a golden savior:  the pathway to riches!  By learning what users respond to, you can quickly capitalize on this info and rocket your game to success.  Is it really that easy?  Surely, you need to know how to interpret the data so that you are not led astray.

When David Michael and I presented at the Serious Games Conference D.C., we explained that while it might be easy to mine user actions for data, it might be hard to tell if users had actually "learned" anything.  People who make surveys know that it's not always that easy to ask the right questions.  For one, are you even measuring what you want to measure?  Maybe the level of engagement is not shown in that indicator.  After all, this is just a bunch of numbers.  All subject to interpretation.

At this year's GDC, metrics certainly made an impression, but there were cautionary tales warning people not to get too beholden by them.  Metrics don't solve the world's problems.  They don't tell you WHY, just WHAT.

With this in mind, I'd like to know:
  • Are you using metrics?  If so, why?
  • What do you feel about using metrics and will it help the industry as a whole?
  • How are you using metrics?  And how does this help you? 
  • If we just cater to a certain audience's likes and dislikes, are we limiting our audience to a certain type?