Plenty to read!

Learn Power BI with Sample Datasets: Part 2 - Personal Data Projects


PRACTICE POWER BI WITH YOUR OWN DATA

…taken from online services you use, to create analyses & dashboards meaningful for you.


Choosing the right test data to learn Power BI is important. You should get data that represents ‘real’ scenarios and also aligns with your learning goals.

LEARNING BY DOING

When learning Power BI, we need data to practice using features and functionalities. Everyone learns differently, but it is insufficient to only read or watch videos - even if they are full courses - to learn Power BI. We need to get our hands dirty. In practice, however, this is difficult. Where do we find data or databases to use for practicing? How can we fit that data to our learning goals?

In this series, we look at different ways to find or generate sample data we can use to generate reports or full solutions for learning, demos & portfolio projects. These articles are written assuming no prior technical knowledge, explaining things in plain language and assuming these sample data are used exclusively for personal use and learning.

This article is about using personal data projects to learn Power BI, sharing some examples of where to find and collect personal data, and best practices when working on these ‘digital shadow’ projects.



Sqlbi have created a free tool, providing orders datasets of various sizes ready-made, or a means to customize our own Contoso sample dataset with different parameters. This dataset can be inflated to large sizes of over 100 million rows, and pairs well with essential DAX learning resources like dax.do and the sqlbi DAX courses.


There are many services and platforms collecting data about you every day, and easy means to get that data. From social media to geolocation and biometrics, we look at common datasets and use-cases for retrieving your personal data to create Power BI solutions that are useful to you & test your skills. This is particularly interesting for testing your Power Query knowledge, as these datasets often require a lot of effort to wrangle under control. It can also just be a lot of fun.


Depending on your goals, you might find that the datasets available don’t sufficiently fit your needs. In these cases, it’s often easiest to generate your own, fake data for your project. There are many tools & methods to do this, but in this part we look at 3 methods using Excel, Power Query and Python.

 

PART 2: PERSONAL DATA PROJECTS


 

I. INTRODUCTION

Data is collected about almost everything we do. Putting it together, we can almost construct a ‘digital shadow’ of ourselves.

PERSONAL DATA ABOUNDS

It is common knowledge nowadays that the services we use collect vast amounts of data about us as users. Not only our activity — clicks, likes, comments or listens — but also metadata about that activity. How long you spend looking at a webpage section, what times of the day you typically check that app, and so on. The more connected services we use, the more data there is. From Bluetooth toothbrushes to motion-sensing kitty-cams, nearly everything in our environment today is producing data.

It’s truly mind-boggling to reflect upon how much of our daily lives is collected and stored. Thankfully, recent changes to data privacy legislation have been making this (somewhat) more transparent. With many services, you can request your personal information as a flat-file export or even programmatically with an API. In some regions, companies are even legally obliged to make this possible. However, that doesn’t mean they make it easy, or even legible. For us as data professionals, this poses a unique challenge.

Personal data is a tremendously valuable source for learning. With it, we can educate ourselves about the information collected about us while also honing our data-wrangling and analysis skills. By retrieving our personal data from a wide variety of sources, then shaping and consolidating it, we can create interesting and useful data solutions for ourselves. Sometimes referred to as digital shadow projects, these are particularly popular around the New Year, as we reflect on what we’ve done and how we’ve changed, as evidenced by our own data. As human beings, we are inherently more interested in our own information… it is about us, after all. When looking at a dashboard of our own data, we have the full context. We can draw conclusions or form interpretations that no one else can. Even corporations have picked up on the fact that this triggers us. Such dashboards are appearing in more and more software products, such as the ‘Spotify Wrapped’ or ‘Steam Review’, serving analytics about the music you’ve listened to and games you’ve played this year, respectively. Given how widely these are shared online, this is clearly a valuable and innovative marketing strategy. But how can we get value from this, ourselves? Why is this personal data interesting for learning data skills, and how can we execute such a personal data project successfully?

 

II. WHY PERSONAL DATA?

REPURPOSING PERSONAL DATA FOR LEARNING

How can we learn Power BI with our personal data? Well, mainly thanks to Power Query, Power BI is actually the perfect tool to wrangle with personal data. Power Query provides an easy solution for us to shape and transform the data, while we can analyze it with DAX and visualize the results in a report. All of this is self-contained to an offline Power BI Desktop file we use, ourselves, entirely for free.

But why should we use personal data to learn Power BI? Why is personal data so good at teaching us the right skills and mentality?

 

Practice datasets — even the best ones — are almost always clean, complete and accurate. This doesn’t typically reflect ‘real’ data scenarios, where the data we’re given needs to be scrutinized with intensity and cleaned to ensure a quality solution is created. Thankfully, personal data gives us plenty of chances to practice this, because it’s definitely not sparkly clean. The services that collect and deliver that data are often imperfect, as well as our use of those services. Since we know ourselves best, we can easily see flaws or gaps in the data, and consider how to deal with them.

Example: Fitbit data
Maybe you forgot to wear your Fitbit for a few days, so the data is missing. Or maybe you notice that - hypothetically - after a firmware update, your heart rate was mysteriously ~5 bpm slower. How do you deal with these data challenges when analyzing it?
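Before deciding how to handle gaps like these, it helps to find them first. The following is a minimal sketch in Python, assuming the data has already been reduced to one step total per day; the dates, values and the `missing_days` helper are hypothetical and only illustrate the idea of comparing recorded days against a complete calendar.

```python
from datetime import date, timedelta

def missing_days(daily_steps):
    """Return the dates with no record between the first and last recorded day."""
    recorded = set(daily_steps)
    start, end = min(recorded), max(recorded)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [day for day in all_days if day not in recorded]

# Hypothetical daily step totals; 3 and 4 January have no record.
daily_steps = {
    date(2022, 1, 1): 8200,
    date(2022, 1, 2): 10450,
    date(2022, 1, 5): 7600,
}

gaps = missing_days(daily_steps)
print(gaps)
```

The same comparison against a complete date table is what a date dimension gives you in a Power BI model; the sketch only makes the gap explicit.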


Similarly, practice datasets often come in a pre-arranged structure fit-for-purpose. Again, this doesn’t often reflect reality. In data projects, the data we use needs to go through ETL (extract, transform & load) pipelines to transform it to the right shape: the final, “business-ready” form of the data. Since our personal data is often shaped for the services that use and collect it, we almost always have to spend a huge amount of effort to shape it. From my own experience, about 90% of my time on personal data projects is spent building the solution to collect and transform the data; only the last 10% is spent on visualization and analysis. This perfectly reflects many of the enterprise data projects I’ve undertaken, as well.

Example: YouTube Watch History data
The data as you get it could be shaped in a .json file that you need to flatten and transform to a table format. Further, it provides no information about categories or tags. What if you want to see how many movie trailers you watched this year? How do you categorize this? Also, the ‘length’ column seems to be in a strange unit. What unit is that, and how can you transform it so you see the length in MM:SS? Where can you find data about how long you watched, or on which device?
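To make the flattening and unit conversion concrete, here is a minimal sketch in Python. The field names (`title`, `time`, `channel`, `length`) and the assumption that `length` is in seconds are hypothetical; check your own export to see what it actually contains.

```python
import json

# Hypothetical export: the field names and the seconds unit are assumptions.
raw = """
[
  {"title": "Movie Trailer A", "time": "2022-12-01T20:15:00Z",
   "channel": {"name": "Studio One"}, "length": 152},
  {"title": "Lo-fi Mix", "time": "2022-12-02T09:00:00Z",
   "channel": {"name": "Chill Beats"}, "length": 3725}
]
"""

def to_mmss(seconds):
    """Format a duration in seconds as MM:SS (minutes may exceed 59)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def flatten(record):
    """Turn one nested record into a flat row with one value per column."""
    return {
        "title": record["title"],
        "watched_at": record["time"],
        "channel": record.get("channel", {}).get("name"),
        "length_mmss": to_mmss(record["length"]),
    }

rows = [flatten(r) for r in json.loads(raw)]
for row in rows:
    print(row)
```

In Power Query, the equivalent steps would be expanding the nested record columns and adding a custom column for the formatted duration.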


Even though many services are obliged to give you a way to retrieve your personal data, most of them don’t make it very easy. These data requests often require you to go through a laborious series of online forms and checkboxes, or serve you only parts of the data at once. They aren’t obliged to make it easy, nor to provide you ways to easily automate this data retrieval, like through an API. Again, this also reflects the reality of many data projects. That forecast file provided by Bob from America and Janet from England — how can we add that to our data solution? How can we update it automatically? What if Bob or Janet change the structure or column names in the Excel file? Apparently the forecast changes every year, how do we deal with that? Further, the business also wants to include customer reviews of our products, and the company stock price & ticker on the executive overview page… how can we add that information?

 

There are of course many other practical reasons. For example, our personal data is ‘real’ but also free, it’s large, granular and complex, and since it’s from ourselves, we can easily come up with questions to address with whatever data solution we make.

 

III. TIPS FOR A SUCCESSFUL PERSONAL DATA PROJECT

YOUR DIGITAL SHADOW PROJECT

Good data solutions address a question or problem, and help us successfully take actions to address it. We obviously have personal goals like health, so why not try to build data solutions that help us reach them?

As mentioned in the previous article in this series, just messing around with data in Power BI won’t ensure you learn anything valuable. You need to first define some goals; some questions you want answered. You need to hypothesize, research and design the solution you want to make, before you make it. If you do not do this, you will end up easily overwhelmed with the information you’ve collected, or uncertain about what to do with it.

The best way to do this is to examine the available data, framing it in the context of your personal goals and challenges. What questions do you have about yourself, or what problems do you face that this data can help you address? How can this data help you make better decisions and take successful actions so you reach your goals, or solve your problems?

Below are 5 tips to keep in mind before you set out on your own digital shadow project:


This ensures your project has a main focus and objective. Really make sure this is well-defined before you try to “analyze all the things”.

Some off-the-cuff examples:

  • ‘Was I more active this year compared to last year?’

  • ‘Am I watching more Netflix lately compared to average?’

  • ‘Did I kill more mobs in my Minecraft game compared to last time?’

  • ‘How does my sleep compare to that of my partner?’

  • ‘How active are my cats during the day?’

Since good data solutions are also actionable, think about what actions you intend to take from the thing you are making. How will it be useful for you? How will it bring value?

More examples:

  • ‘I want to reach a minimum daily stepcount of 10k steps. I can build a Power BI solution based on my Fitbit, Apple Health or Google Fit data that sends me data alerts to move if I’m below 8k steps by 17:00.’

  • ‘I want to eat more healthily. I can build a Power BI solution based on my bank account and PayPal data to get visibility on how much I spend ordering fast food, and send me alerts informing me how much I’ve saved if I order less than average in a given month compared to previous months.’


Many personal data projects fail because they’re too big. After a while, you’ll get fatigued and give up. Instead, define a narrow scope and build something quickly. You can always add to it, later. A small but complete project is better than an ambitious one, unfinished.


Create something that you will really want to or need to use on a daily or weekly basis. This will teach you very quickly whether you’re creating something useful, and as your own user, you can practice building something iterative, starting with an MVP and refining it based on your own feedback (to yourself).


The ideal personal data project is also a portfolio item. You should keep in mind, how will you explain this to someone else? A peer, or someone interviewing you? Remember that not everyone shares common interests, so the question & purpose of the project needs to be clear, as well as the challenges you faced and how you overcame them. Why should they care?


It’s easy to get wrapped up in your own project, particularly once you start seeing interesting trends and insights. However, this is also an opportunity to practice your discipline in creating diligent documentation. This is to your own benefit, because you will need to come back and check how you did something when you want to apply it in a professional setting. If it’s not documented, you’ll waste time reverse-engineering it, or learning it again.


So someone else did this already. Who cares? There’s no rule that says you have to be unique; you have to be the first. Don’t pay attention to what other people are doing, and certainly don’t compare yourself to others. If you’re working on a personal data project, just do it. Learn from it. Finish it. It will be worth it, but only if you keep going.

 

IV. PERSONAL DATA SOURCES - EXAMPLES & CASES

In this section, we’ll examine five different personal data sources, and projects you can make with them. It is not the purpose to provide a tutorial for each data source; this article simply links to where you can go to get the data, with a bit of context, and some suggested questions to analyze.

Streaming platforms are ubiquitous, particularly since the pandemic. Most people have subscriptions to one or more platforms, such as Netflix or Amazon Prime. For some of these platforms, getting your data is easy. For others… not so much. Below are some ways to get this data as of December 2022:

1. Netflix:
Go to your account settings,

  • Click on a profile and select ‘Viewing Activity’ to view or download (bottom right)

  • Select ‘Download Your Personal Information’.
    When I try either of these methods myself, however, I get an error with my profile, and the request can take up to 30 days to process before you get your data.

What I see when requesting a download of profile-specific information.

While getting data from Netflix is the easiest process, it also apparently takes the longest to get the data. 30 Days?!

2. Amazon Prime:
In your account settings, watch history is browsable as a separate tab. Downloading your data is not available through the Amazon Prime page, but is something you can do with a separate request through Amazon itself, described here. From the available menu options, though, it seems like this might not return watch history, but only settings. If that is the case, the only way to get your watch history into a file is to scrape it from the profile page in your browser, for example using Power Automate Desktop or Python.

The process through Amazon is not so intuitive and also not guaranteed to give information about Prime watch history.

3. Disney Plus or Hulu:
From the Disney Plus app, there does not appear to be any means to view or download your watch history at this time. Reading through the Disney Privacy Policy, the only way to get this information appears to be contacting Disney ‘Guest Services’ or contacting their EMEA Data Protection Officer. It is clearly stated in their Privacy Policy that you have the right to request access to personal information, however.

4. Apple TV:
For Apple, the process is similar to Amazon. It’s possible to view your watch history from within the app, but downloading a copy requires navigating through a specific, general Apple process, detailed here.

QUESTIONS FOR ANALYZING YOUR STREAMING DATA

Once you’ve transformed your data and loaded it into Power BI, build your model and DAX to answer the below questions in a 2-page report. Page 1 should focus more on how much you watch, and page 2 should focus on what you watch; a drilldown from the Page 1 ‘trend’ view.

  1. On average, how much streaming do you do per day? Per week?

  2. What times of the day do you tend to watch streaming the most? How does this vary over the week?

  3. Have you been watching more streaming lately (in the last 2 weeks) compared to average? Use a rolling 14-day average to compare. What about if you use a 28-day average? 7-day average?

  4. Can you identify differences between people in your household? Who watches the most / least?

  5. What genres, shows and movies are most popular in your household?

  6. Which television episode or movie have you re-watched the most times?

  7. Are there specific genres you tend to watch at specific times?

  8. How many series have you abandoned before finishing them, according to your streaming data?

  9. How many series have you binged (binge = 3+ episodes watched in one sitting / back-to-back )?

  10. What is the series you have been watching for the longest period of time (longest gap between first and latest watch)?
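Question 3 above compares a rolling average with the overall average. In Power BI you would typically write this as a DAX measure; the logic itself can be sketched in Python as below, assuming you already have one total per day, ordered from oldest to newest. The values and the `rolling_mean` helper are hypothetical.

```python
def rolling_mean(values, window):
    """Trailing mean over the last `window` values; None until the window is full."""
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        result.append(running / window if i >= window - 1 else None)
    return result

# Hypothetical minutes of streaming per day, oldest first (16 days).
minutes_per_day = [60, 0, 45, 90, 30, 120, 0, 75, 60, 15, 0, 45, 150, 180, 200, 210]

rolling_14 = rolling_mean(minutes_per_day, 14)
overall = sum(minutes_per_day) / len(minutes_per_day)
watching_more_lately = rolling_14[-1] > overall
print(rolling_14[-1], overall, watching_more_lately)
```

Changing the `window` argument to 7 or 28 shows how sensitive the answer is to the period you choose.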

Streaming services are popular not just for video content, but also audio content. The same kinds of questions can be asked about our music listening habits as our video-watching habits. Since we often listen to music while doing other things, and since songs are typically much shorter than videos, this data is often much larger and richer. Further, there are a lot of public resources you can use to get additional metadata about your music. For example, thanks to Matthew Roche, I learned about the open source MusicBrainz database, which catalogues metadata about millions of tracks in a relational database you can freely download and use. I have loaded a copy of this database into a Power BI Datamart, and regularly use it to analyze my own music data, or query for additional music from artists I’ve newly discovered.

  1. Spotify:
    The best resource for analyzing your Spotify data is from Matthew Roche’s blog post about the topic. It is a first class example of how to get and analyze personal data in a useful and interesting way; a model project. I highly recommend you read the entire post. Additional instructions exist on the Spotify data privacy page, though it doesn’t seem like listening history is yet available via their API.

  2. YouTube Music:
    Inspired by Matthew, I’ve written my own post about doing the same with YouTube Music data, since I don’t use Spotify. This data is available via Takeout; you can retrieve it quite easily. Unfortunately, it is very ‘bare bones’; you will need to go through a lot of data transformation effort to clean the data and enrich it with additional metadata from, e.g., MusicBrainz.

  3. Apple Music:
    Apple has their own service which provides analytics about your listening history called Apple Replay, but it’s unclear whether you can easily download this information for yourself. To retrieve this information at length, you’ll need to follow the same process as getting your Watch History from Apple TV; via privacy.apple.com data requests.

You can further extend this by also including podcast listening history from other podcast-specific sources. The sources above should include Podcasts by default (and you will need to filter them out to get only music).

QUESTIONS FOR ANALYZING YOUR MUSIC LISTENING HISTORY

To reiterate, it’s strongly encouraged that you consider using the MusicBrainz metadata database to enrich your analysis. Without it, you will only have a small number of attributes. This also gives you some experience with managing a database and fitting it to transactional data to answer specific questions.

  1. In general, how much music did you listen to on a daily average? Express this using HH:MM. Is it more or less than the previous year? If you only have one year of data, is the average increasing or decreasing over time (using a rolling average)?

  2. When in the day do you tend to listen to music the most? How does this compare across the week?

  3. What are your top genres? Your top artists? Your top albums? Your top songs? If you view your top genres or artists by playtime instead of play count, do you get different results? How do your top played artists compare to your list of ‘favourite 10 artists’, ignoring this data?

  4. How have your listening habits changed over time? Are there specific genres that you have been listening more to, lately, compared to others? What about artists?

  5. How many artists have you discovered this year (artists that you did not listen to in past years; the first listen was this calendar year or in the last 12 months)? What proportion of your total listens does this account for?

  6. What percentage of your listens are from albums / songs released this year vs. those released in previous years?

  7. What is the oldest song that you’ve listened to? What is the newest song?

  8. If you sort your listens by release year, do you see a specific pattern that makes sense?

  9. Do you tend to listen to longer songs, or shorter songs? What correlation do you observe between the number of plays and the average song play time?

  10. How do your listening habits compare to album ratings or sales? Can you find any correlation or pattern? Can you obtain any public data reflecting the popularity of these tracks, and if so, how do your own listening habits compare to the aggregate?
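Question 3 asks whether ranking by playtime gives a different result than ranking by play count. A minimal sketch in Python shows why the two can diverge, assuming each listen has been reduced to an artist and the seconds played; the artists and durations are made up.

```python
from collections import Counter

# Hypothetical listens: (artist, seconds played).
listens = [
    ("Artist A", 180), ("Artist A", 175), ("Artist A", 190),
    ("Artist B", 620), ("Artist B", 540),
    ("Artist C", 240),
]

play_count = Counter()
play_time = Counter()
for artist, seconds in listens:
    play_count[artist] += 1        # one more play for this artist
    play_time[artist] += seconds   # accumulate total seconds listened

top_by_count = [artist for artist, _ in play_count.most_common()]
top_by_time = [artist for artist, _ in play_time.most_common()]
print(top_by_count)
print(top_by_time)
```

Here the artist with fewer but longer tracks leads by playtime while trailing by play count, which is the kind of difference worth showing side by side in a report.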

The first time I used takeout.google.com my jaw hit the floor. I requested everything and started going through it, and it was just … staggering. The sheer volume and scope of the information collected was immense, particularly the data about mobile phone usage (if you’ve had an Android device). It’s really impressive that Google have put together such a simple and elegant tool to retrieve all this information. Compared to other platforms and services, this really seems to go above and beyond. That doesn’t mean it’s complete, however. As you go through the available data you may notice that a lot of data seems incomplete; there are undoubtedly a lot of attributes and categories not included with the information, or it is over-aggregated to a daily or monthly level.

QUESTIONS FOR ANALYZING YOUR GOOGLE DATA

Since the scope of this data is so vast, you will really need to narrow what you’re trying to analyze before you request the data. I strongly advise against doing an “all-in” download of everything. Instead, select specific things you are interested in and download only the relevant information.

YouTube Watch History

  1. How much YouTube do you watch per day, on average? Per week? Does this make more sense if you view it as total hours watched, or the total number of different videos?

  2. What kinds of videos do you watch on YouTube?

  3. Which content creator do you watch the most?

  4. What percentage of the content you view is from creators you are subscribed to?

  5. Are you watching more or fewer YouTube videos over time? (Use the 14-day average)

  6. Do you tend to watch longer (>10 min), shorter (< 2.5 min) or medium (2.5-10 min) videos?

  7. What YouTube videos have you watched most frequently?

  8. When do you tend to watch YouTube videos? Do you watch more videos during your work hours on workdays, or outside work hours?

  9. What percentage of videos that you watch do you ‘like’ or add to a list?

  10. What percentage of videos do you watch vs. play in the background (e.g. ‘Lo-fi Hip Hop Study Mix’)?

Phone Activity

  1. What apps do you use the most? How does this compare to the prior year?

  2. Are the most popular apps you use being used more or less frequently by you?

  3. How long are your phone usage sessions? Assume a ‘session’ is a series of ‘usage’ activities, where any lapse in activity of more than 10 seconds ends the session.

  4. etc…
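The session definition in question 3 can be sketched in Python as follows, assuming each ‘usage’ activity has been reduced to a single timestamp. The timestamps and the `sessionize` helper are hypothetical; only the 10-second gap comes from the question above.

```python
from datetime import datetime, timedelta

def sessionize(timestamps, max_gap=timedelta(seconds=10)):
    """Group timestamps into (start, end) sessions.

    A gap longer than `max_gap` between consecutive events starts a new session.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= max_gap:
            sessions[-1][1] = ts          # extend the current session
        else:
            sessions.append([ts, ts])     # start a new session
    return [(start, end) for start, end in sessions]

# Hypothetical usage events on one day.
events = [
    datetime(2022, 12, 1, 12, 0, 0),
    datetime(2022, 12, 1, 12, 0, 4),
    datetime(2022, 12, 1, 12, 0, 9),
    datetime(2022, 12, 1, 12, 0, 30),
    datetime(2022, 12, 1, 12, 0, 35),
]

sessions = sessionize(events)
durations = [(end - start).total_seconds() for start, end in sessions]
print(sessions)
print(durations)
```

In Power Query, the same idea is usually implemented by sorting, adding an index, and comparing each row with the previous one.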

Social media platforms are among the easiest software from which you can retrieve your personal information. Since most social media platforms are public-facing, they often also provide APIs you can use to collect information more easily. This information is usually very large and text-dense, letting you experiment with techniques like text analysis, natural language processing or network analysis.

It’s essential when working with this data to take care of what you are downloading and how you share it. When you download personal information from Social Media, you also are indirectly obtaining personal information of others, including private messages, group chats, and files that have been shared (even if archived or deleted, for some services). Depending on where you reside, collecting this information may come with specific data privacy restrictions and regulations. Before you download and use this information - particularly if you plan on sharing what you make - be diligent in checking your responsibilities with this information.

To avoid problems, it’s important to be responsible in how you handle and analyze this data. This is just like when analyzing personally identifiable information (PII) in real life. Researching and creating secure data storage and handling solutions protects both you and your network.

  1. Twitter:
    Under ‘Settings and Privacy > Your Account’ exists the option to download an archive of your data. This gives you a compressed flat-file dump of all your information, including tweets & more. If you want to collect information like tweets and metrics from other accounts, Twitter also has an API that is extensively used and documented. You can find plenty of tutorials online to get started using it.

  2. Facebook:
    Facebook provides a similar experience to retrieve your data from your profile. Note that this exclusively pertains to Facebook and not other Meta services like Instagram, WhatsApp or Oculus. Meta do provide a number of APIs which can be used to interact with their services, such as the Facebook Graph API documented here.

  3. WhatsApp:
    To download your WhatsApp data you can either export it per chat or parse the back-up you have saved to a cloud service like Google Drive or iCloud, though these back-ups are encrypted and will need to be handled by your account, first.

  4. Instagram:
    Downloading your Instagram data follows a similar process to Facebook. You can retrieve your data in similar formats or access it via the Instagram API.

  5. LinkedIn:
    In LinkedIn you can similarly download your data from your profile. There are APIs available that let you perform actions and gather information, which you can find online.

    There are of course many other social platforms like TikTok, Snapchat, YouTube, and so forth. Usually, retrieving your personal information from these services is straightforward.


QUESTIONS FOR ANALYZING SOCIAL MEDIA DATA

If you decide to download your social media data, you need to spend more time defining the questions and problem statements you intend to address, as well as the scope. Are you looking at only one platform, or do you intend to aggregate platforms together? If you aggregate them, how will you define your metrics? Is a ‘Retweet’ on Twitter equivalent to a ‘Reshare’ on LinkedIn? What about on WhatsApp, which is only ‘private’ chats?

Next, you also need to define carefully how you will handle PII of other people. Are you allowed to download and store this data at all in your region? Will you anonymize the data? How can you safely do this without losing valuable descriptive attributes? What do you intend to do with this information: compare yourself to other people, or find groups, segments or networks?
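One common way to replace names with stable codes before analysis is a keyed hash. The sketch below uses Python's standard `hmac` and `hashlib` modules; the key, the `user_` prefix and the `pseudonymize` helper are hypothetical choices for illustration. Note that this is pseudonymization rather than full anonymization: anyone holding the key can recompute the mapping, and other attributes may still identify a person.

```python
import hashlib
import hmac

# Hypothetical key: generate your own random value and keep it out of shared files.
SECRET_KEY = b"replace-with-a-private-random-key"

def pseudonymize(name, key=SECRET_KEY):
    """Map a name to a stable code so the same person always gets the same label."""
    normalized = name.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return "user_" + digest[:12]

# The same person maps to the same code; different people map to different codes.
print(pseudonymize("Bob"))
print(pseudonymize(" bob "))
print(pseudonymize("Janet"))
```

Because the code is stable, you can still count messages per person or build a network graph without storing the original names in your model.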

Once you’ve carefully considered what you will do, you can define some concrete questions. Below are some examples:

  1. In general, have you been using social media more or less over the last 12 months? Use a rolling average.

  2. Which social media platform do you use the ‘most’? How do you quantify using the ‘most’ — # posts? Total post length? Number of times you opened the app on your phone or browser?

  3. Is there a time of day where you use social media more or less? How does this differ across platforms? How does this differ across days of the week?

  4. On which social media platform do you receive the most engagement? How does it correlate with your post or usage frequency? Is the frequency or length of your posts, for example, correlated with how much engagement you receive? How do you define engagement… # likes? Comments? Sum of all engagements?

  5. What key words tend to show up most frequently in your posts? What key words are most highly correlated with content engagement? What topics are most popular if ‘popularity’ is defined by engagement?

  6. Have you been receiving more or less engagement over time? Can you explain that?

  7. Does the time when you post have an effect on the number of engagements you receive?

  8. Of the people who ‘follow’ you, where do they come from? What is their primary language of posting in their own content? What topics do they post the most about, and how does that compare to your own?

  9. How many of the people who ‘follow’ you are likely bots, or ‘dead’ accounts that haven’t posted in over 1 year?

  10. What percentage of people who ‘follow’ you have engaged with content you have posted in the last 6 months?
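For question 5, a simple word count is often a reasonable first step before trying heavier text-analysis techniques. The following is a minimal sketch in Python; the posts, the stopword list and the `keyword_counts` helper are hypothetical, and a real project would likely need a fuller stopword list for the languages you post in.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list.
STOPWORDS = {"the", "a", "to", "and", "of", "in", "is", "my", "for", "with"}

def keyword_counts(posts):
    """Count how often each non-stopword appears across all posts."""
    counts = Counter()
    for post in posts:
        words = re.findall(r"[a-z0-9']+", post.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts

# Hypothetical posts.
posts = [
    "New blog post: learn Power BI with personal data",
    "Power Query is great for messy data",
    "My data model for the Power BI report",
]

counts = keyword_counts(posts)
print(counts.most_common(3))
```

Joining these counts to engagement per post is then one way to explore which keywords tend to appear in your most engaged-with content.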

An obvious choice for personal data analysis, wearables and personal devices give us an easy way to collect meaningful information about our activities and health. From our cellphones to smart watches and IoT devices, the possibilities are endless. Unlike other forms of personal data analysis, this is also very common in communities which use these devices. As a result, for Power BI, there are a lot of freely available custom connectors to retrieve this data directly into a dataset.

  1. Fitbit:
    Fitbit has an API you can use to retrieve your data, either from Power Query or programmatically through other means like Python. While relatively straightforward, the request limit is rather conservative, so it takes a long time to get historical data. For batch, one-off downloads of your information, it’s better to use their data export page. Once you’ve cleaned and shaped the data, you can then use the API to collect only recent information.

  2. Garmin:
    Garmin similarly provides a download service from the web. Their API seems similar to that of Fitbit, and is documented here.

  3. Strava:
    For avid cyclists, Strava is the go-to place to analyze your activity. It’s also quite popular in the Power BI community, as a custom connector has been made by Kasper de Jonge. If you prefer a bulk export, that is also possible as described here, and you can also use the actual API.

  4. Google Fit & Google Devices:
    Downloaded from takeout.google.com, as mentioned above.

  5. Apple Smartwatches:
    Downloaded from privacy.apple.com, as mentioned above.

  6. Samsung Smartwatches:
    Downloaded from the mobile app, as described here.

There are of course many other devices you can use, but these are some of the most popular. When working with this data, you will need to carefully shape and clean it, reflecting about what you do with outliers, like days when you forgot to charge the wearable, or if you see differences when you switched devices.

 

QUESTIONS FOR ANALYZING DATA FROM WEARABLES & PERSONAL DEVICES

These questions will depend on your personal health and activity goals. They are usually the easiest to define and make actionable, since we all want to be healthy. Again, be really clear in what you want to answer instead of setting out to analyze whether you are ‘more active’ or ‘healthier’ compared to a previous period. Some of these devices provide data down to and past the minute granularity, which can result in over one billion records if you’ve been using these devices for 5+ years.

The below questions are phrased with respect to stepcount and distance travelled by foot, but you can easily replace this with something else like heart rate, sleep, distance biked, weight, etc.

  1. What is your average daily stepcount? How is this expressed in units of distance (e.g. km walked per day)? How does this compare to the national average in the country where you live? How does this compare to a previous period (year, month, etc.)?

  2. Are you becoming less active over time? Use a rolling average to examine this.

  3. What kinds of anomalies do you see in the data? Which days in the last years were you most active, and can you explain this? Can you include this on a time series with annotations to tell a story?

  4. During the week, when do you tend to be most active? How does this differ when you work from home vs. work at the office? What about on days that you work vs. days you have off?

  5. Find a specific period where you were biologically / physically different from ‘normal’. For example, when you were sick with the flu for a week, or recovering from a surgery. Is this period distinct from your average?

  6. If you have switched devices at any point, do you see a difference in your metrics before vs. after switching the device? Can this be explained by confounding variables like changes in your environment or lifestyle?

  7. Can you perform a forecast of your data? If so, what do you project your activity will be in the next week? The next month? The next year? How much certainty does this prediction have? Use a simple algorithm to predict or forecast this, like the Facebook Prophet forecast.

  8. Can you correlate your activity with the weather? Do you tend to be more active on warm, sunny days compared to cold, rainy days, for example? What is the general correlation between your activity and the temperature?

  9. If you have access to location data, can you create a map showing where you are most active, or what your typical running / walking routes are?

  10. Do you think this data is accurate? If yes / no, how can you prove this? What kinds of data can you use to validate this information and show that it is or is not accurate, within a certain percentage?
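For question 8, the ‘general correlation’ between activity and temperature can be expressed as a Pearson correlation coefficient. Below is a minimal sketch in Python with the formula written out; the temperature and step values are made up, and the `pearson` helper is only illustrative. Keep in mind that a correlation does not tell you why the two move together.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical daily values: temperature in degrees Celsius and step count.
temperature = [2, 8, 15, 21, 25]
steps = [5200, 6100, 8300, 9900, 11000]

r = pearson(temperature, steps)
print(round(r, 3))
```

A value near 1 means activity rises with temperature, a value near -1 means it falls, and a value near 0 means there is no clear linear relationship.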

 

TO CONCLUDE

There are countless sources of personal data you can retrieve, online. Analyzing this data not only helps you learn data and critical thinking skills, but also makes you more aware of the kinds of information corporations are collecting about you. Since this personal data is typically large, messy, and not ideally structured for analysis, it is the perfect opportunity to learn how to wrangle data with Power Query and deal with issues like missing or incomplete information.

It is essential to be mindful about what we are collecting, and how we will share it. We must take care to handle data diligently, particularly more sensitive personal information of ourselves, and information that may indirectly be about others, like from social media. We need to be responsible with this information, and also clear about what we analyze, how we will explain it to others, and how we might get value from it, ourselves. In doing that exercise, we train the right muscles to help us when we reach the ‘real’ data battlefield.


Advance Your Career & Learning with Data Communication

Learn Power BI with Sample Datasets: Part 1 - Contoso Data Generator
