Plenty to read!

Analyze Video Game Data for Personal Data Projects


VIDEO GAMES ARE THE IDEAL DATA SOURCE

…to learn Fabric, Power BI, or train your data analyst skills, in general.


LEARNING BY DOING DATA PROJECTS

An effective way to learn tools and practice your data skills is to conduct personal data projects. These projects are typically self-motivated, exploratory exercises to investigate questions and data that are personally relevant and interesting to you. In a previous article, I described how you can use personal data for this, like from personal devices (such as Fitbit), services (such as Netflix), or personal tracking (such as Digital Shadow Projects).

In this article, I discuss another valuable data source for these projects: video games. If you or your friends and family enjoy playing video games, they can be a spectacular source of data for these projects. Specifically, the complexity of video games and their file structures makes them ideal to simulate the technical complexity of a ‘real’ data project. Furthermore, because of their scale and complexity, you have to carefully consider your approach and design when using this data, again much like a real data project.

Additionally, I describe some key considerations for you to design and execute a data project successfully such that it is a valuable addition to your portfolio and CV.

The purpose of this article is twofold:

1. To explain how to get data from video games for data projects, and why this is valuable for learning.
2. To give some tips for how to have a successful personal data project that's valuable for your CV.

 

 

VIDEO GAMES AS DATA SOURCES

A video game is at its core just a lot of complex data and systems. These data and systems must seamlessly integrate to create an interactive experience for users. From objects to players, there are numerous properties and attributes that must be continuously tracked to produce the end result. Furthermore, all this information needs to coalesce into a concise and effective user experience for players to understand what decisions and actions they should take. In a lot of ways, a video game isn’t so different from a BI solution:

  • Both have to integrate, manage, and aggregate a lot of information.

  • This information has to be effectively and concisely presented to a user.

  • The user takes this information and uses it to drive decisions and actions to achieve their objectives.

  • The user may need different granularity of information, from a high-level overview to drilldown details.

Because of this complexity and workflow, video game data can be an ideal candidate for your next portfolio project. However, it also means that you should put more effort into defining the project scope and goals up-front to be successful.

Example of a Power BI report created (back in 2020) with data collected from a Minecraft server.

 

 

TIPS FOR CONDUCTING EFFECTIVE PERSONAL DATA PROJECTS

When conducting a personal data project, there are some key considerations to keep in mind for you to be successful (and also to just finish it).

ANSWER A SPECIFIC QUESTION

A good data project (both personal and professional) clearly defines its purpose. The first thing you should do is clearly articulate the questions you’re trying to answer with this project. Write out your goals and describe how the data could help you achieve them. The simplest goals address your curiosity. However, the best goals are functional in nature. Functional goals help you achieve something, like better performance.

Two examples of goals are below for a game where you fight monsters in consecutive, repeated runs:

  • Curiosity: With which weapon do I have the most monster kills or the highest damage?
    Answering this question is a one-off exercise. There’s no specific objective in the game, you’re just trying to understand some information that the game maybe doesn’t currently tell you.

  • Functional: What’s my fastest run? What’s my average? How can I complete the run faster?
    Answering this question requires more thought. The specific objective is to shorten the run times. To achieve this objective you have to actually analyze this data to find possible answers, and repeatedly use the solution to measure whether changing your decisions or actions results in more favorable outcomes.

Once you’ve defined these questions, it also helps to consider how you’ll explain them to someone else. Why did you have these questions, and why should someone care about answering them?
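The functional question above (fastest run, average run, and what makes a run faster) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the run log, its field names (`run`, `weapon`, `seconds`), and the values are all hypothetical, and a real game would need its own extraction step first.

```python
from statistics import mean

# Hypothetical run log: one record per completed run, durations in seconds.
runs = [
    {"run": 1, "weapon": "sword", "seconds": 1500},
    {"run": 2, "weapon": "bow", "seconds": 1320},
    {"run": 3, "weapon": "sword", "seconds": 1410},
    {"run": 4, "weapon": "bow", "seconds": 1290},
]


def summarize(runs):
    """Return the fastest run and the average duration across all runs."""
    fastest = min(runs, key=lambda r: r["seconds"])
    average = mean(r["seconds"] for r in runs)
    return fastest, average


def average_by(runs, key):
    """Average duration per value of `key`, to compare decisions."""
    groups = {}
    for r in runs:
        groups.setdefault(r[key], []).append(r["seconds"])
    return {value: mean(durations) for value, durations in groups.items()}


fastest, average = summarize(runs)
by_weapon = average_by(runs, "weapon")
print(f"Fastest: run {fastest['run']} ({fastest['seconds']} s)")
print(f"Average: {average:.0f} s")
print(f"Average by weapon: {by_weapon}")
```

Grouping by a decision you control (here, the weapon) is what turns the curiosity question into a functional one: you can change the decision, play again, and see whether the average moves.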

DESIGN SOMETHING YOU’LL ACTUALLY USE

The best personal data projects are the ones where you are your own user. With one-off analyses, you can be messy and quick to answer the question, after which, you don’t need to answer that question again. However, if you’re regularly using the solution, you’ll need to set up automated, robust processes and designs that are effective and reliable.

Some examples of goals that require you to build a reusable solution:

  • Improving your performance in the game
    (such as with games that have repeated sessions, missions, or matches, like Hades or Slay the Spire)

  • Helping you plan
    (such as with factory-building and automation games, like Satisfactory or Oxygen Not Included)

  • Monitoring a server and showing the top players for different metrics
    (such as with open-ended, creative games like Minecraft)

SET A SMALL SCOPE AND EXPAND IT INCREMENTALLY IF NEEDED

The majority of personal data projects lose steam before they’re completed. A big reason for this is that people bite off more than they can chew. To avoid this, you should set a narrow scope with limited questions you want to answer. Focus on achieving something small. It’s better to create something small and effective that works end-to-end rather than something large and ambitious that remains unfinished.

This is particularly important when analyzing video game data, because of how vastly complex it is. It’s very easy to get trapped in “postage stamp collecting”, meaning you start gathering large amounts of metrics and attributes without a clear idea of what you’ll do with this information. Remain focused on the question you defined up-front, and consider focusing on one question at a time.

 

 

WHERE TO FIND DATA FOR YOUR VIDEO GAMES

Video game data are valuable for personal data projects because of their complexity. This means you have to consider things like purpose, scope, and design, much like you would for a real data project. Because of the data volumes and high cardinality, you also might need more sophisticated tools than Power BI Desktop, which makes a project like this perfect for learning about and testing Microsoft Fabric. But where do you find this information, and how can you start analyzing it?

There are several options available to you, described in the below sections.

 

VIDEO GAME SAVE FILES

Games are played over multiple sessions. As such, the player’s progress needs to be saved, so they can resume where they left off. These save files contain a snapshot of the game’s state, including all of the attributes and metrics at that point in time. As such, finding and analyzing these save files can be one of the easiest ways to analyze video game data.

Example of a save file for the game Stardew Valley. Highlighted is the “money” metric, or how much money your character currently has as of the save file.

To find video game save files, you can usually see an “open local files” option in the game menu, or in the menu of the vendor (Steam, GOG). If you can’t find it, it’s simple to Google where these save files are located.

You’ll want to consider the following points when analyzing video game files, though:

  • Human-readability: Most save files aren’t human-readable, because if a player opens and alters one, they could cheat to give themselves an advantage. For example, they could modify the “money” property to a large amount, or unlock new abilities. This makes it difficult, but not impossible, to analyze these files. To do this, you have to parse the save file, either with a parser that’s publicly available (Satisfactory Calculator is a good example) or with one you write yourself. Writing your own is a very difficult exercise, as you essentially need to “crack” the file structure. Some games do have human-readable save files (JSON or XML), such as Minecraft, Stardew Valley, and Rimworld.

  • Point-in-time: Save files represent a static state; a fixed point-in-time. To observe trends, you need to design a way to snapshot these save files (or the information you want the trend for). A simple way to do this is to just copy the save file to a Blob storage at a fixed interval, or even when changes are detected.

  • Relevant information: Open the file and identify where the relevant information is before you start transforming it for analysis. Often, you’ll need less than 5% of the total data to answer your question.
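The points above can be sketched together in Python: snapshot a save file only when its contents change, then read just the one metric you need. This is a minimal sketch under assumptions. It copies to a local folder rather than Blob storage (an upload call would replace the copy step), and the XML path `./player/money` is a simplified, hypothetical layout that you would need to check against your own save file.

```python
import hashlib
import shutil
import time
import xml.etree.ElementTree as ET
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, used to detect changes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot_if_changed(save_path, snapshot_dir, last_digest=None):
    """Copy the save file into snapshot_dir if it differs from last_digest.

    Returns (digest, copied_path), where copied_path is None when the
    file is unchanged and no copy was made.
    """
    digest = file_digest(save_path)
    if digest == last_digest:
        return digest, None
    snapshot_dir = Path(snapshot_dir)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    # The digest prefix keeps names unique within the same second.
    target = snapshot_dir / f"{stamp}_{digest[:8]}_{Path(save_path).name}"
    shutil.copy2(save_path, target)
    return digest, target


def read_money(save_path):
    """Read one metric from an XML save (hypothetical element path)."""
    root = ET.parse(save_path).getroot()
    node = root.find("./player/money")
    return int(node.text) if node is not None else None
```

Calling `snapshot_if_changed` on a schedule and passing back the digest it returned last time gives you one copy per change, and `read_money` over the resulting folder gives you the trend without loading the rest of each file into your model.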

 

VIDEO GAME MODDING

It’s common for games to support community modifications (mods). These mods can extend content, provide new functionality, or make any other changes. Modding frameworks are often very accessible, providing APIs to easily access game content and data while you play.

You can design a small mod that collects data from your game as you play at fixed intervals, or upon events. This real-time handling of information can be a really interesting way to experiment with certain tools. For example, you can design a mod that writes to a real-time analytics workload so that you get useful analytics and reports while you play.

In the below example, I wrote a small mod that collects information from the farming simulator game Stardew Valley as I played and writes it to a CSV. The objective was to track the cumulative money earned and correlate that to other variables (although I did go a bit overboard). However, the real objective was to generate a sufficiently large and complex dataset that I could analyze and enrich in Fabric for demos.

I hardly know any C#, but writing this mod was barely more complex than creating a Tabular Editor script, and it worked great.
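On the analysis side, a CSV like the one such a mod writes can be explored with a few lines of Python before it ever reaches Fabric. The sketch below computes cumulative money earned and a Pearson correlation with one other variable. The column names (`day`, `money_earned`, `crops_harvested`) and the values are hypothetical stand-ins, not the actual output of the mod described above.

```python
import csv
import io
from itertools import accumulate
from math import sqrt

# Hypothetical mod output: one row per in-game day.
SAMPLE_CSV = """day,money_earned,crops_harvested
1,100,2
2,150,3
3,200,3
4,250,5
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: int(value) for name, value in row.items()} for row in reader]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has no variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rows = load_rows(SAMPLE_CSV)
earned = [row["money_earned"] for row in rows]
crops = [row["crops_harvested"] for row in rows]

cumulative = list(accumulate(earned))  # running total of money earned
correlation = pearson(earned, crops)

print("Cumulative money earned:", cumulative)
print(f"Correlation with crops harvested: {correlation:.2f}")
```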

 

Example of the workflow I used to analyze the resulting data. Data was extracted from the game using a mod, and stored in a location accessible to a Fabric Dataflow Gen2. From there, it’s the typical workflow of landing the data in the Lakehouse, shaping and staging it, before enriching it with other information. The resulting analysis-ready tables can be analyzed in a Power BI dataset with a connected report the player sees.

 

SCREEN-SCRAPING WHILE RUNNING THE GAME

A third option is to read and scrape the screen of the game as you play. With this approach, you monitor the screen and extract information at a fixed interval. This approach is typically best for multiplayer games where you don’t have access to the files, but you still want to analyze the game’s data. Alternatively, you can analyze recordings instead of live data. While this can be quite complex to set up, once you’ve got it running, you don’t have to deal with the game’s files and their opaque formats.
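The fixed-interval polling described above could look roughly like the Python sketch below. It leaves the capture step open on purpose: `read_screen_text` is a hypothetical function you supply, for example a screenshot library combined with an OCR engine. The HUD text format (`Gold: 1,250`) is also an assumption and will differ per game.

```python
import re
import time

# Hypothetical HUD text such as "Gold: 1,250"; adjust the pattern to your game.
GOLD_PATTERN = re.compile(r"Gold:\s*([\d,]+)")


def parse_gold(text):
    """Extract the gold value from recognized screen text, or None if absent."""
    match = GOLD_PATTERN.search(text)
    return int(match.group(1).replace(",", "")) if match else None


def poll(read_screen_text, samples, interval_seconds=1.0, sleep=time.sleep):
    """Sample the screen `samples` times at a fixed interval.

    read_screen_text is a caller-supplied function (hypothetical here) that
    captures the screen and returns the text it recognized. Frames where
    the metric is not visible are skipped.
    """
    readings = []
    for i in range(samples):
        value = parse_gold(read_screen_text())
        if value is not None:
            readings.append((time.time(), value))
        if i < samples - 1:
            sleep(interval_seconds)
    return readings
```

Keeping the capture function separate from the parsing means the same loop works for live play and for recordings: only the function that produces the text changes.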

 

TO CONCLUDE

If you or your friends and family play video games, you can analyze the data for a personal data project. The complexity and volume of this data force you to create end-to-end workflows, and to think more carefully about the questions you’re trying to answer and the specific scope and approach you’ll take.

The easiest way to do this is by analyzing human-readable save file data, such as the JSON and XML files from games like Minecraft, Stardew Valley and Rimworld. In a future blogpost, I’ll explain how you can use this information to seed and enrich an enterprise-sized dataset, by using this video game data to create your own simulated sales dataset for demos and portfolio reports.

