In the ultra-competitive world of digital entertainment, a fundamental shift is occurring. While the age-old practice of "moving product off the shelf" remains, more and more video game publishers are releasing online and mobile games that rely on keeping players engaged, playing longer, and purchasing in-game currency to maximize revenue. To track customer behavior and make actionable business decisions, organizations require structured data, metrics, and customer segments on a per-game basis. Example business questions include: How do changes in prices and promotions affect sales? What is player retention over time? How does revenue measure against forecast? How much should be spent on new games or in-game content?

As I see it, answering these questions and the many others like them involves three main steps.

Extract semi-structured data from raw log files

Games track what happens during play by writing to log files, which accumulate millions of rows of 'events' every minute that a game is live. These logs offer the most direct insight into a player's in-game behavior, and harnessing the information contained within this mountain of data opens the door to understanding why customers behave the way they do. As the analytics of customer behavior matures, more data points and metrics can be added to the log files to satisfy reporting needs.

The two main raw events we're concerned with are a player logging in to the game, and a player's in-game purchase habits. Based on the volume and the velocity at which this data is generated, Big Data technologies such as Spark, Databricks, and Impala are frequently employed to ingest the raw files into a semi-structured format for either command-line or GUI-based querying.
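To make the extraction step concrete, here is a minimal sketch of filtering raw log lines down to the two event types of interest. It assumes a JSON-lines log with `ts`, `event`, `player_id`, and `amount` fields; that format and every name in it are hypothetical, and at real volumes this logic would run inside an engine such as Spark rather than a single Python process.

```python
import json
from datetime import datetime

# Hypothetical raw log lines; a real game will have its own format and fields.
RAW_LOG = """\
{"ts": "2024-03-01T10:15:00", "event": "login", "player_id": "p1"}
{"ts": "2024-03-01T10:20:00", "event": "purchase", "player_id": "p1", "amount": 4.99}
{"ts": "2024-03-01T11:00:00", "event": "level_up", "player_id": "p2"}
not valid json
{"ts": "2024-03-02T09:00:00", "event": "login", "player_id": "p2"}
"""

# Only logins and purchases feed the metrics discussed in this article.
EVENTS_OF_INTEREST = {"login", "purchase"}


def parse_events(raw):
    """Yield one dict per login or purchase event, skipping malformed lines."""
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate corrupt lines instead of failing the whole batch
        if not isinstance(record, dict):
            continue
        if record.get("event") not in EVENTS_OF_INTEREST:
            continue
        yield {
            "ts": datetime.fromisoformat(record["ts"]),
            "event": record["event"],
            "player_id": record["player_id"],
            # Logins carry no amount, so default to zero.
            "amount": float(record.get("amount", 0.0)),
        }


events = list(parse_events(RAW_LOG))
for event in events:
    print(event["ts"].date(), event["event"], event["player_id"], event["amount"])
```

Skipping malformed lines rather than raising is a design choice for this sketch; a production pipeline would more likely route them to a reject file for inspection.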

Build a data warehouse with dimensions and base-level facts

Once the in-game data is extracted, it can be modeled into a traditional star schema warehouse along with account-level metadata. During the ETL process from the Big Data tables into a more traditional relational database, such as AWS Redshift, the most basic in-game events (sign-ons and in-game spending) can be transformed into base-level information: day-by-day customer churn and transaction deltas. Leveraging a data warehouse service such as AWS Redshift allows an organization to react quickly to the growing needs of its analytics practice, without the typical lag time associated with expanding on-premises data infrastructure. With these rudimentary data points in place, a whole array of high-level metrics is now possible.
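As a rough illustration of this step, the sketch below loads a few events into a staging table and rolls them up into one fact row per player per day, alongside a small player dimension. It uses Python's built-in `sqlite3` module as a stand-in for a warehouse such as Redshift, and all table names, column names, and sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical parsed events: (event_date, player_id, event_type, amount)
events = [
    ("2024-03-01", "p1", "login", 0.0),
    ("2024-03-01", "p1", "login", 0.0),
    ("2024-03-01", "p1", "purchase", 4.99),
    ("2024-03-01", "p2", "login", 0.0),
    ("2024-03-02", "p1", "login", 0.0),
    ("2024-03-02", "p1", "purchase", 1.99),
    ("2024-03-02", "p1", "purchase", 2.00),
]
# Hypothetical account-level metadata for the player dimension.
accounts = [("p1", "US"), ("p2", "DE")]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
conn.executescript("""
CREATE TABLE stg_event (
    event_date TEXT, player_id TEXT, event_type TEXT, amount REAL
);
CREATE TABLE dim_player (
    player_id TEXT PRIMARY KEY, country TEXT
);
CREATE TABLE fact_daily_activity (
    event_date TEXT,
    player_id  TEXT REFERENCES dim_player(player_id),
    logins     INTEGER,
    purchases  INTEGER,
    revenue    REAL,
    PRIMARY KEY (event_date, player_id)
);
""")
conn.executemany("INSERT INTO stg_event VALUES (?, ?, ?, ?)", events)
conn.executemany("INSERT INTO dim_player VALUES (?, ?)", accounts)

# Roll raw events up to one row per player per day.
conn.execute("""
INSERT INTO fact_daily_activity
SELECT event_date,
       player_id,
       SUM(CASE WHEN event_type = 'login' THEN 1 ELSE 0 END),
       SUM(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END),
       ROUND(SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END), 2)
FROM stg_event
GROUP BY event_date, player_id
""")
conn.commit()

rows = conn.execute(
    "SELECT * FROM fact_daily_activity ORDER BY event_date, player_id"
).fetchall()
for row in rows:
    print(row)
```

Keeping the fact table at a per-player, per-day grain is one reasonable choice: it is far smaller than the raw event stream, yet detailed enough to derive churn and spending trends later.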

Transform base data tables into advanced business metrics

Once the base data is populating the data warehouse, it's time to turn that data into actionable business information. Metrics such as Monthly Active Users (MAU), Average Revenue per User (ARPU), Average Revenue per Paying User (ARPPU), and various retention rates can all be calculated from the two base metrics described above. Depending on the size of the data set being processed, this reporting layer of calculations can either be done in real time through a view, or be calculated and stored in a dedicated reporting table through the use of a custom SQL script. Additionally, relating these metrics to each individual customer account allows for powerful segmentation analysis along locations, playing patterns, and other in-game characteristics.
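The sketch below shows how these metrics might be derived from daily activity rows. It assumes common definitions (ARPU as revenue divided by active users in the month, ARPPU as revenue divided by paying users, and day-N retention as the share of a first-day cohort seen again exactly N days later); exact definitions vary between studios, and the function names and sample data are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical base facts: (activity_date, player_id, revenue for that day)
daily_activity = [
    (date(2024, 3, 1), "p1", 4.99),
    (date(2024, 3, 1), "p2", 0.0),
    (date(2024, 3, 2), "p1", 0.0),
    (date(2024, 3, 8), "p1", 5.01),
    (date(2024, 3, 15), "p3", 0.0),
    (date(2024, 4, 1), "p2", 0.0),
]


def monthly_metrics(rows, year, month):
    """Return MAU, revenue, ARPU, and ARPPU for one calendar month."""
    in_month = [r for r in rows if r[0].year == year and r[0].month == month]
    active = {pid for _, pid, _ in in_month}
    payers = {pid for _, pid, rev in in_month if rev > 0}
    revenue = sum(rev for _, _, rev in in_month)
    return {
        "mau": len(active),
        "revenue": revenue,
        # Guard against empty months to avoid dividing by zero.
        "arpu": revenue / len(active) if active else 0.0,
        "arppu": revenue / len(payers) if payers else 0.0,
    }


def day_n_retention(rows, cohort_date, n):
    """Share of players first seen on cohort_date who are active n days later."""
    first_seen = {}
    for day, pid, _ in rows:
        if pid not in first_seen or day < first_seen[pid]:
            first_seen[pid] = day
    cohort = {pid for pid, day in first_seen.items() if day == cohort_date}
    if not cohort:
        return None  # no new players on that date
    target = cohort_date + timedelta(days=n)
    retained = {pid for day, pid, _ in rows if day == target and pid in cohort}
    return len(retained) / len(cohort)


march = monthly_metrics(daily_activity, 2024, 3)
print(march)
print(day_n_retention(daily_activity, date(2024, 3, 1), 1))
```

In a warehouse, the same logic would typically live in a SQL view or a scheduled script that fills a reporting table, as described above; plain Python is used here only to keep the arithmetic visible.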


Being able to produce numerous insightful business metrics from two simple event types is a powerful capability. As more event types are "brought into the fold" of data analysis, the potential value that expansive data engineering brings to the business grows rapidly. Additionally, as this example shows, more and more businesses are employing a wide array of technologies to move and transform data from its source (game logs in this case) to the final reporting environment, incorporating both cutting-edge Big Data ingestion and tried-and-tested relational warehouses. Playing to the strengths of various technologies along the path from data to information helps a business maximize its return on investment in business intelligence.

About the Author

Mahlon Graham is a data-focused IT professional with a wide array of industry experience specializing in data engineering. Mr. Graham has helped numerous clients expand their data-driven decision-making capabilities by designing and implementing data warehousing and business intelligence solutions.