Google Data Analytics Capstone Project
I think of all Capstone projects and Case studies- hiring managers sure love their case studies, as things done in partial fulfilment of an end-goal. They are usually tied to the display of eligibilty of a job or proof of proficiency.
I am still not sure why I did this, maybe it was only learning how to analyze quantitative data and simultaneously scratch the itch in me that is obsessed with learning or maybe I am still trying to answer this question I asked 3 years ago. What I do know is that this was a lot of fun for me, I am very proud of what I did and I am going to do many more.
What follows is my attempt at answering this question ‘How can a wellness technology company catered to women play it smart?’ The tools I used for this are Google sheets, MySQL, Tableau and Canva.
But first, an executive summary:
The Google Data Analytics course offered by Google, which is what the capstone project is for, outlines 6 stages that an analyst must go through when analyzing data. These stages are:
Ask
The overarching questions that I was required to answer with my analysis were:
1. What are some trends in smart device usage?
2. How could these trends help influence Bellabeat marketing strategy?
With the help of AI, I set the exact questions based on the above that I hoped to answer from the dataset, these are the questions which I then answered in my analysis:
- Are there specific times of the day when users have the most steps or burn the most calories?
- What is the correlation between each user’s average steps and calories burned?
- What is the spread of activity and inactivity on a daily basis?
- What is the percentage of average steps taken each day compared to the daily recommended steps?
- What is the daily step count and sleep hour classification per user?
- What is the average number of sleep hours for each user?
- How much time do users spend in bed vs how much time are they asleep?
Prepare
In the prepare phase, I did the following:
- Downloaded the data in Zip file format from Kaggle
- Created a Schema in MySql which was the major tool for my analysis
It was during this phase I discovered some of the limitations of the data set, which were:
- The dataset was only on 30 users
- The timeframe for the data collected was only one month.
- Some parts of the dataset were incomplete with the table on the weight information on the users containing data for only 8 people.
Process
This featured extensive data cleaning and wrangling. During this phase, these were the actions I performed:
- Changed the datetypes from DD/MM/YYYY to YYYY/MM/DD.
- Added an ID column for unique IDs since the data was long and each unique ID had data in different rows.
- Got rid of null values
This process helped me achieve the following
- Streamline the dataset using only 4 of the 8 tables in the dataset.
- Create tables in the schema and then columns in each of the tables. Ensuring that the data types were correct and matched what was in the Google Sheets files.
- Uploaded the cleaned data to MySQL.
Analyze and Share
To query my data and answer the questions I had set in the Ask Phase, I used SQL.
But first, some general information on the data:
- Average total steps = 7638
- Average Active hours = 3.8 hours or 228 minutes
- Average Sedentary hours = 16.5 hours or 991 minutes
- Average sleep time = 491.4 minutes.
Are there specific times of the day when users have the most steps or burn the most calories?
Average Steps Taken at different hours in the day
select
Distinct ActivityHour as Time_of_day,
sum(StepTotal) as Steps
from hourlysteps_merged
Group by Time_of_day
Order by Steps Desc;
Average Calories burned at different hours in the day
select
ActivityHour as Time_in_the_day,
round (avg (Calories)) as Calories
from hourlycalories_merged
Group by ActivityHour
Order By Calories Desc;
Visualization
What is the correlation between each user’s average steps and calories burned?
- The data revealed that on average users took most steps by 6 PM which was also the same hour they had the most calorie burned.
- This was also the only time that they matched the amount of calories burned with the number of steps taken.
Average Calories burned and steps taken at different hours in the day
select
(dailysteps_merged.UserId) as Users,
round (avg (dailycalories_merged.Calories)) as Calories,
round (avg (dailysteps_merged.StepTotal)) as Steps
from dailycalories_merged
join
dailysteps_merged
on
dailycalories_merged.Id = dailysteps_merged.Id
Group by Users;
Visualization
- The more steps a user took on average the more calories they burned, showing a positive correlation between the two variables.
What is the spread of activity and inactivity on a daily basis?
Spread of Activity and Inactivity on A Daily basis
select dayname(ActivityDate) as day_of_week,
round(avg(LightlyActiveMinutes)) as average_lightlyactive_minutes,
round(avg(FairlyActiveMinutes)) as average_fairlyactive_minutes,
round(avg(VeryActiveMinutes))as average_veryactive_minutes,
round(avg(SedentaryMinutes)) as average_inactive_minutes
from dailyactivity_merged
group by day_of_week
order by 'day_of_week';
Visualization
- For the one month period that this data was collected, users were mostly inactive or lightly active
- With Saturday having the highest amount of light active minutes and Monday having the most inactive minutes.
What is the percentage of average steps taken each day compared to the daily recommended steps?
- The daily recommended step count is 10,000, the data shows that on average users did not reach this target.
Average Steps Taken on each day of the week
select
dayname(ActivityDay) as day_of_week,
round (avg (StepTotal)) as Steps
from dailysteps_merged
Group by day_of_week;
Visualization
- The day with the highest step count is Saturday while Sunday has the lowest.
- Thursday and Friday have the exact amount of steps on average.
What is the daily step count and sleep hour classification per user?
In the same vein of recommended step count, a specific number of sleep hours is recommended to get optimal rest and for this, I set it at 7–8 hours. Then I did the same for step count per user. This was followed by grouping each user into categories that fit that sleeping habits and step counts.
Daily Step Count Classification per user
select
distinct UserId as Users,
round(avg(TotalSteps)) as Average_total_steps,
CASE
When round(avg(TotalSteps)) < 10000 THEN 'Not Enough Steps'
When round(avg(TotalSteps)) = 10000 THEN 'Enough Steps'
Else 'That is Enough Steps'
End as avg_user_steps
from dailyactivity_merged
group by Users;
Daily Sleep hours Classification per user
select
distinct UserId AS Users,
round (avg(TotalMinutesAsleep) / 60) AS Average_sleep_time_in_hours,
CASE
When round(avg(TotalMinutesAsleep) / 60) BETWEEN 1 AND 6 THEN 'insufficient sleep'
When round(avg(TotalMinutesAsleep) / 60) BETWEEN 7 AND 8 THEN 'optimal sleep'
else 'Okay, Sleeping Beauty'
End AS Sleep_Time_Groupings
From fit_bit_data.sleepday_merged
Group by Users;
Visualization
What is the average number of sleep hours for each user?
This is discover how many hours each user sleeps in relation to all the other users as displayed with a bubble chart
Average Sleep Hours Per User
select
distinct UserId as Users,
round(AVG(TotalMinutesAsleep)/60) as Average_sleep_time_in_hours
FROM fit_bit_data.sleepday_merged
Group by Users;
Visualization
How much time do users spend in bed vs how much time are they asleep?
This is discover how many hours each user sleeps in relation to all the other users as displayed with a bubble chart
Daily Time in Bed Vs Time Asleep in Minutes
select
dayname(SleepDay) as day_of_week,
round (avg(TotalMinutesAsleep), 2) as Time_Asleep,
round (avg(TotalTimeInBed), 2) as Time_in_bed
from sleepday_merged
Group by day_of_week;
Visualization
- There is not a large disparity in the number of minutes users spend in bed compared to how many minutes they are asleep.
- Participants spent about 39 minutes in bed before falling asleep.
- Sunday the day with the least amount of steps also has the highest amount of time in bed and asleep.
Act (Recommendations)
Based on the analysis carried out, here are a few recommendations for the company to consider in regard to its marketing.
- Reward Rest Days: Users already spend Sundays doing limited activities, so advertise Sundays as the perfect day for rest, and instead of tracking activity, let it track rest and reward users based on how little they did on Sundays. Positioning themselves as a health company that also values rest, sets them apart in the market. Think of it as a cheat day but for rest instead of food.
- Activity Reminders: The data shows that users are mostly non-active or lightly active, the company should look into push notifications that remind users to move around a bit after long periods of inactivity with quirk quips like “Fancy some stretching?” or “Your legs asked that we tell you to move”. This places the company as a fun health buddy that is looking out for you.
- User Communities: To encourage activity and drive competition, the company can create sleep and activity levels accompanied by access to a buddy system. Everyone gets to know their activity and sleep levels. Then based on these, they can choose to join communities of other women at the same level as them.
- User Education: Creating educational content that is tailored to each user and teaches them how to either get more sleep, more activity or rest more in general or get to levels higher than they currently at.