Image source: NPR Health News
_h indicate estimates of
how many hours the child engaged in that activity_score indicate scores the
child received on various behavioral tasks administers by the
researchers_rt indicate the
rates at which the children performed the behavioral tasksplay_action_h is the number of hours a
child spends playing action video games per day. If this is NA, that
likely means the child does not engage in this behavior. However, there
are variables that indicate a score for a behavior test, in which case
we cannot use that data point for that child.# Load libraries & dataset
library(tidyverse)
library(mice)
library(VIM)
library(ggfortify)
library(MKinfer)
library(ggplot2)
library(gtsummary)
library(modelsummary)
dat <- read.csv('../Final/mmi_kids/data/mmi_kids_gva_v2021.2.csv')For initial exploration, I am interested in seeing if there are any differences by age and gender in various behaviors and scores. I hypothesize that the older kids may use media more, as their brains become more developed and they become more interested in socializig with their friends.
Additionally, we can see from the EDA see there are a few outliers in the estimated media hours per day - there is definitely something wrong with these, as daily hours can only realistically go up to 24. The largest outlier well exceeds this at 35 hours.
dat %>%
ggplot(aes(x=age)) +
geom_histogram(binwidth = 1, fill="#69b3a2", color="#e9ecef") +
ggtitle("Distribution of Children's Ages") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
scale_x_continuous(breaks = c(8, 9, 10, 11, 12, 13)) +
ylab("Frequency") +
xlab("Age")Not too many younger kids, and weird that there is one 13-year-old - the study said the ages went up to 12. Not a huge deal, but a bit odd!
dat %>%
ggplot( aes(x=read_h)) +
geom_histogram(binwidth = 1, fill="#69b3a2", color="#e9ecef") +
ggtitle("Distribution of Reading Hours per Day") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("Frequency") +
xlab("Hours per Day")The grand majority of kids do not seem to read very much - no more than 1 hour per day. Curious if this includes reading as part of school work or not.
dat %>%
ggplot( aes(x=media_h)) +
geom_histogram(binwidth = 1, fill="#69b3a2", color="#e9ecef") +
ggtitle("Distribution of Estimated Total Media Use per Day") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("Frequency") +
xlab("Hours per Day")This was a bit surprising to me - 10+ hours per day of media for some students seems excessive. We obviously have an outlier at 30+ hours of media per day, as that’s not possible in 24 hours!
dat %>%
ggplot( aes(x=age, y=media_h, color=gender)) +
geom_point(alpha=0.8, size=4) +
ggtitle("Total Estimated Media Use per Day by Age & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("Hours per Day") +
xlab("Age")It seems there is a slight positive relationship between age and hours of media consumed per day. There is not much of a relationship between gender and hours of media consumed. Finally, we see the 30+ hour data point come back to haunt us here too.
That outlier be haunting us like a dang ghost.
dat %>%
ggplot( aes(x=k6_score, y=play_nonaction_h, color=gender)) +
geom_point(alpha=0.8, size=4) +
ggtitle("Non-Action Video Game Hours per Day by Distress Score & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("Hours per Day") +
xlab("Distress Score")Looking at just non-action video game play, we see females leading the pack with the higher hours played. However, we see males at the high end of distress. Overall, the data points are rather spread and I would be hesitant to make any sweeping conclusions here without more data to back them up.
…does cooking in Breath of the Wild count as non-action game play??
dat %>%
ggplot( aes(x=k6_score, y=play_action_h, color=gender)) +
geom_point(alpha=0.8, size=4) +
ggtitle("Action Video Game Hours per Day by\nDistress Score & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("Hours per Day") +
xlab("Distress Score")Looking at hours of action video games played per day, we see almost exclusively males at the high ends for both hours played and distress scores. Video game play and gender might be good candidates for explanatory variables later on in this analysis!
Action games, AKA lots of BOOM POW BAM explosions n' stuff!
dat %>%
ggplot( aes(x=mw_score, y=play_action_h, color=gender)) +
geom_point( alpha=0.8, size=4) +
ggtitle("Action Video Game Hours per Day by\nMind Wandering Score & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("Hours per Day") +
xlab("Mind Wandering Score")Keeping on the topic of action video games, I wanted to likewise look at the Mind-Wandering Score that was collected by researchers. We do not see anything too extreme other than a handful of male players who spent a lot of time playing these games and have a slightly higher mind wandering score. However, I find it especially interesting that the students with the highest mind wandering scores actually have very low action video game hours. This could be informative!
PCA is a method to reduce the dimensionality of our dataset and distill the information that helps best explain the variance in the data. Looking at the PCA helps us determine which features give us the most salient information about patterns we are looking for.
Two methods of evaluating the PCA are using a scree plot and biplot with loadings.
Did someone say ‘salient’?
pca <- prcomp(na.omit(dat[, c('age', 'grades', 'read_h', 'video_h', 'music_h', 'write_h', 'create_h',
'play_action_h', 'play_nonaction_h', 'surf_h', 'skype_h', 'talk_h', 'mmi_score',
'k6_score', 'sdq_score', 'ct_score', 'cp_score', 'sleep_score', 'mw_score',
'grit_score', 'toi_score', 'd2_rt', 'sart_rt', 'blast_rt')]),
center = TRUE, scale. = TRUE)
screeplot(pca, type="lines")In the scree plot, we see that the first principle component* captures the most variation, and then the second captures some variation, and so on. Past the 2nd principle component, we see less and less variation being explained. Therefore, it seems the first and second components distilled the most information.
autoplot(pca, data = na.omit(dat), colour = "school_year", loadings=TRUE, loadings.label=TRUE)For the PCA biplot with loadings, there was not a clear separation by school year. The labels get a bit cluttered, but we see that:
No cheating on your responses please!
I chose to use the MICE package to impute missing values. I have used this in the past and was pleased with the results. Basically, this package uses multiple regression to predict what the missing values are.
I like the MICE method because it does not assume that the data are normally distributed and also assumes the data are missing randomly. These assumptions were appropriate for this project. Source
MICE also has really nice plots to visualize and evaluate missing values - let’s go ahead and use that first!
# Remove student with fake data
dat <- dat[dat$kid != 'k029', ]
# Remove rows with more than 50% NA
dat_drop <- dat[which(rowMeans(!is.na(dat)) > 0.60), ]
# Select the numerical columns with the missing data
dat_mis <- select(dat_drop, c('grades':'blast_miss_rate'))
# Create a really nice MICE plot to show missing value patterns
mice_plot <- aggr(dat_mis, col=c('navyblue','yellow'),
numbers=TRUE, sortVars=TRUE,
labels=names(dat_mis), cex.axis=.7,
gap=3, ylab=c("Missing data","Pattern"))##
## Variables sorted by number of missings:
## Variable Count
## grit_score 0.50000000
## sdq_score 0.46923077
## cp_score 0.43846154
## mw_score 0.39230769
## k6_score 0.38461538
## play_action_h 0.21538462
## play_nonaction_h 0.21538462
## ct_score 0.20000000
## blast_rt 0.11538462
## blast_fa_rate 0.11538462
## blast_miss_rate 0.11538462
## toi_score 0.09230769
## mmi_week 0.07692308
## mmi_weekend 0.07692308
## mmi_score 0.07692308
## sart_fa_rate 0.07692308
## sart_miss_rate 0.07692308
## sart_rt 0.06923077
## grades 0.03076923
## sleep_score 0.01538462
## d2_rt 0.01538462
## d2_fa_rate 0.01538462
## d2_miss_rate 0.01538462
## read_h 0.00000000
## video_h 0.00000000
## music_h 0.00000000
## write_h 0.00000000
## create_h 0.00000000
## play_h 0.00000000
## surf_h 0.00000000
## skype_h 0.00000000
## talk_h 0.00000000
## total_hours_weekday 0.00000000
## total_hours_weekendday 0.00000000
## media_h 0.00000000
## mm_frequency 0.00000000
cp_score) also has a lot of missing
values, and I would actually like to use this variable in my analysis. I
feel comfortable moving forward with the MICE imputation, but will
closely evaluate the results when considering the Child’s Parent
Score.# Note that this code chunk below is set NOT to run because I imputed the data beforehand and already saved the file. The imputation takes several minutes, so I have saved the imputed dataset and loaded it here.
# Perform the imputation
dat_imputed <- mice(dat_mis, m=5, maxit = 50, method = 'pmm', seed = 500)
# Select one of the imputed datasets to use (Using 2 here)
dat_imp_complete <- complete(dat_imputed, 2)
# Grab the categorical variables that we did not impute
dat_cat <- select(dat_drop, c('kid':'gender'))
# Concat the imputed data to the categorical cols
imputed_full <- cbind(dat_cat, dat_imp_complete)
write.csv(imputed_full, '../Final/data_imputed.csv')# Load the data for real this time
imputed_full <- read.csv('data_imputed.csv')Thank you, MICE!
My goal with the t-tests is to explore if there are any differences between the teachers in relation to the grades that students self-reported. A teacher or professor can have a huge impact on a student’s specific learning style, so I am interested to see if there is anything of interest here.
teachers <- na.omit(unique(imputed_full$teacher))
B <- length(teachers)
# Compute the number of pairs, and create an appropriately
# sized data frame to store the p-values in:
n_comb <- choose(B, 2)
p_values <- data.frame(group1 = vector("character", n_comb),
group2 = vector("character", n_comb),
p = vector("numeric", n_comb))
# Loop over all pairs:
k <- 1 # Counter variable
for(i in 1:(B-1))
{
for(j in (i+1):B)
{
# Run a t-test for the current pair:
test_res <- perm.t.test(grades ~ teacher,
data = subset(imputed_full,
teacher == teachers[i] | teacher == teachers[j]))
# Store the p-value:
p_values[k, ] <- c(teachers[i], teachers[j], test_res$p.value)
# Increase the counter variable:
k <- k + 1
}
}
p_values$bonf <- p.adjust(p_values$p, method = "bonferroni")
p_values$holm <- p.adjust(p_values$p, method = "holm")
p_values$bh <- p.adjust(p_values$p, method = "BH")
knitr::kable(p_values)| group1 | group2 | p | bonf | holm | bh |
|---|---|---|---|---|---|
| t10 | t03 | 0.468360647115994 | 1.0000000 | 1.0000000 | 0.7806011 |
| t10 | t04 | 0.28071266454471 | 1.0000000 | 1.0000000 | 0.5741850 |
| t10 | t01 | 0.388483808720804 | 1.0000000 | 1.0000000 | 0.6992709 |
| t10 | t09 | 0.901778587556985 | 1.0000000 | 1.0000000 | 0.9660096 |
| t10 | t05 | 0.755028048660544 | 1.0000000 | 1.0000000 | 0.8941122 |
| t10 | t06 | 0.274885827342811 | 1.0000000 | 1.0000000 | 0.5741850 |
| t10 | t02 | 0.256146403903182 | 1.0000000 | 1.0000000 | 0.5741850 |
| t10 | t11 | 0.053810926604514 | 1.0000000 | 1.0000000 | 0.2017910 |
| t10 | t08 | 0.444687305800653 | 1.0000000 | 1.0000000 | 0.7696511 |
| t03 | t04 | 0.616908370414151 | 1.0000000 | 1.0000000 | 0.8617225 |
| t03 | t01 | 0.867871086762281 | 1.0000000 | 1.0000000 | 0.9660096 |
| t03 | t09 | 0.35024199485285 | 1.0000000 | 1.0000000 | 0.6567037 |
| t03 | t05 | 0.163412829363141 | 1.0000000 | 1.0000000 | 0.4085321 |
| t03 | t06 | 0.607002652951452 | 1.0000000 | 1.0000000 | 0.8617225 |
| t03 | t02 | 0.557177799314374 | 1.0000000 | 1.0000000 | 0.8617225 |
| t03 | t11 | 0.00136552922824547 | 0.0614488 | 0.0559867 | 0.0122898 |
| t03 | t08 | 0.936215590071695 | 1.0000000 | 1.0000000 | 0.9660096 |
| t04 | t01 | 0.708527367919974 | 1.0000000 | 1.0000000 | 0.8617225 |
| t04 | t09 | 0.109389790425134 | 1.0000000 | 1.0000000 | 0.3076588 |
| t04 | t05 | 0.0508572002316935 | 1.0000000 | 1.0000000 | 0.2017910 |
| t04 | t06 | 0.989274655974988 | 1.0000000 | 1.0000000 | 0.9892747 |
| t04 | t02 | 0.936445586374526 | 1.0000000 | 1.0000000 | 0.9660096 |
| t04 | t11 | 0.000286663575707051 | 0.0128999 | 0.0123265 | 0.0043000 |
| t04 | t08 | 0.703982957075637 | 1.0000000 | 1.0000000 | 0.8617225 |
| t01 | t09 | 0.223798756338605 | 1.0000000 | 1.0000000 | 0.5300497 |
| t01 | t05 | 0.0999812100126979 | 1.0000000 | 1.0000000 | 0.2999436 |
| t01 | t06 | 0.699322176886671 | 1.0000000 | 1.0000000 | 0.8617225 |
| t01 | t02 | 0.637958834903266 | 1.0000000 | 1.0000000 | 0.8617225 |
| t01 | t11 | 0.000615208030700147 | 0.0276844 | 0.0258387 | 0.0069211 |
| t01 | t08 | 0.944542689277285 | 1.0000000 | 1.0000000 | 0.9660096 |
| t09 | t05 | 0.52701919165529 | 1.0000000 | 1.0000000 | 0.8469951 |
| t09 | t06 | 0.0922414651866604 | 1.0000000 | 1.0000000 | 0.2964904 |
| t09 | t02 | 0.0816817835583894 | 1.0000000 | 1.0000000 | 0.2827446 |
| t09 | t11 | 0.0062261899432337 | 0.2801785 | 0.2428214 | 0.0400255 |
| t09 | t08 | 0.336588406816577 | 1.0000000 | 1.0000000 | 0.6567037 |
| t05 | t06 | 0.0443688597832433 | 1.0000000 | 1.0000000 | 0.1996599 |
| t05 | t02 | 0.0397568856321453 | 1.0000000 | 1.0000000 | 0.1987844 |
| t05 | t11 | 0.0371274479848394 | 1.0000000 | 1.0000000 | 0.1987844 |
| t05 | t08 | 0.161389857588044 | 1.0000000 | 1.0000000 | 0.4085321 |
| t06 | t02 | 0.919585386736334 | 1.0000000 | 1.0000000 | 0.9660096 |
| t06 | t11 | 0.000255851852561101 | 0.0115133 | 0.0112575 | 0.0043000 |
| t06 | t08 | 0.699510619211028 | 1.0000000 | 1.0000000 | 0.8617225 |
| t02 | t11 | 0.000242860178467621 | 0.0109287 | 0.0109287 | 0.0043000 |
| t02 | t08 | 0.649124526964742 | 1.0000000 | 1.0000000 | 0.8617225 |
| t11 | t08 | 0.0018461002686262 | 0.0830745 | 0.0738440 | 0.0138458 |
Wow this is interesting! Teacher 11 shows up as having a very low p-value with almost all other teachers. - The confidence intervals do NOT include zero for teacher 11’s comparisons, so we can be quite confident that the relationship we are seeing is valid. - In fact, since I was running so many hypothesis tests I made sure to use a couple different p-value corrections: - While there was no hope at all for finding relationships between most of the other teachers and their grades, teacher 11 was still statistically significant. - It seems the self-reported grades were truly quite different for this teacher. I cannot resist, so I will make a couple brief exploratory graphs here.
ggplot(imputed_full, aes(x = reorder(teacher, -grades), y = grades)) +
geom_bar(position = "dodge",
stat = "summary",
fun = "mean",
fill="#69b3a2") +
# Fancy title
ggtitle("Teachers by Average Self-Reported Student Grades") +
xlab("Teacher") +
ylab("Average Student Grades") +
# Move the title to the center
theme(plot.title = element_text(hjust = 0.5))We clearly see that Teacher T11 has much lower self-reported student grades. Let’s dig further into which grades this teacher covers - it is possible they have special needs or a tough class.
knitr::kable(table(imputed_full$teacher, imputed_full$school_year))| 5P | 6P | 7P | 8P | special needs | |
|---|---|---|---|---|---|
| t01 | 0 | 0 | 13 | 0 | 0 |
| t02 | 0 | 0 | 10 | 0 | 0 |
| t03 | 0 | 0 | 0 | 13 | 0 |
| t04 | 7 | 7 | 0 | 0 | 0 |
| t05 | 5 | 8 | 0 | 0 | 0 |
| t06 | 0 | 0 | 0 | 15 | 0 |
| t08 | 6 | 3 | 0 | 0 | 0 |
| t09 | 0 | 0 | 15 | 0 | 0 |
| t10 | 8 | 8 | 0 | 0 | 0 |
| t11 | 0 | 0 | 0 | 0 | 12 |
Aha!, teacher T11 teaches exclusively special needs classes. This certainly explains the grades we are seeing. Special needs students may not have self-reported their grades the same way as neuro-typical students, and it’s even possible that someone else entered the grades for these students. We will definitely need to keep this in mind as we continue with our analyses.
Next, I’m curious to see if there is any difference in the number of hours boys and girls play video games per day.
ladies <- imputed_full[imputed_full$gender == "female",]
gents <- imputed_full[imputed_full$gender == "male",]
print(wilcox.test(ladies$play_h, gents$play_h))##
## Wilcoxon rank sum test with continuity correction
##
## data: ladies$play_h and gents$play_h
## W = 948, p-value = 3.656e-08
## alternative hypothesis: true location shift is not equal to 0
print(paste("Girls average hours of video games per day: ", round(mean(ladies$play_h), 2)))## [1] "Girls average hours of video games per day: 0.46"
print(paste("Boys average hours of video games per day: ", round(mean(gents$play_h), 2)))## [1] "Boys average hours of video games per day: 1.17"
We indeed see a difference between the number of hours that boys and girls play video games. The boys played an average of just over an hour of video games per day, while the girls played about 45 minutes per day.
ggplot(imputed_full, aes(x=gender, y=play_h, color=gender, fill=gender) ) +
geom_violin() +
# Fancy title
ggtitle("Hours of Video Games Played Per Day by Gender") +
xlab("Gender") +
ylab("Hours of Video Games Played ") +
# Move the title to the center
theme(plot.title = element_text(hjust = 0.5))Moving on, I am interested in seeing if there is any correlation between a child’s Distress Score and the Sleep Score that they received.
corr <- cor.test(imputed_full$k6_score, imputed_full$sleep_score,
use = "pairwise.complete")
print(corr)##
## Pearson's product-moment correlation
##
## data: imputed_full$k6_score and imputed_full$sleep_score
## t = -4.8231, df = 128, p-value = 3.943e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5286497 -0.2359037
## sample estimates:
## cor
## -0.3921605
ggplot(imputed_full, aes(k6_score, sleep_score)) +
geom_point() +
# Fancy title
ggtitle("Sleep Score by Distress Score") +
xlab("Distress Score") +
ylab("Sleep Score") +
# Move the title to the center
theme(plot.title = element_text(hjust = 0.5))I was listening to a Malcolm Gladwell podcast about how teachers had a strong impact on the performance of their students. Since we want to explore categorical variables for the Chi-Squared Test, I am interested in exploring if there is a relationship between the teacher and the gender of the student.
knitr::kable(table(imputed_full$teacher, imputed_full$gender))| female | male | |
|---|---|---|
| t01 | 6 | 7 |
| t02 | 5 | 5 |
| t03 | 6 | 7 |
| t04 | 6 | 8 |
| t05 | 5 | 8 |
| t06 | 7 | 8 |
| t08 | 4 | 5 |
| t09 | 7 | 8 |
| t10 | 11 | 5 |
| t11 | 6 | 6 |
chisq.test(table(imputed_full$teacher, imputed_full$gender),
simulate.p.value = TRUE)##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: table(imputed_full$teacher, imputed_full$gender)
## X-squared = 3.5066, df = NA, p-value = 0.9505
Not super surprising, but still encouraging to see that the gender of students does not seem to interact with the teachers.
Sorry if this analysis has disappointed you, Malcolm!!
I am interested in the K-6 distress scale score
(k6_score) and the Strength and Difficulties
Questionnaire score (sdq_score). NOTE: I am
thinking these two variables may appear to be similar on the surface,
but I am interested in exploring if there are any underlying factors
that differ.
The K-6 distress scale measures psychological distress including anxiety and depression. The higher the score, the higher the levels of distress.
The Strength and Difficulties score measures levels of socio-emotional difficulty. The higher the score, the more social and emotional difficulties the child is experiencing.
My hypothesis is that there is no difference between K-6 distress and socio-emotional difficulty in terms of their predictors.
My goal is to fit identical models for both dependent variables so I can see if there are any underlying differences between the two.
# Create some one-hot encoded variables so we can use them
imputed_full$age_group <- as.factor(imputed_full$age_group)
imputed_full$gender <- as.factor(imputed_full$gender)
# Fit linear regression model for K6 distress
m_k6 <- lm(k6_score ~ age_group + gender + grades + read_h + video_h + music_h +
create_h + play_action_h + play_nonaction_h + surf_h + skype_h +
talk_h + mmi_score + sdq_score + ct_score + cp_score +
sleep_score + mw_score + toi_score + d2_rt + sart_rt + blast_rt,
data = imputed_full)
# Pretty up the results
m_k6 %>%
gtsummary::tbl_regression() %>%
gtsummary::bold_labels() %>%
gtsummary::bold_p(t=0.05) %>%
gtsummary::add_glance_source_note()| Characteristic | Beta | 95% CI1 | p-value |
|---|---|---|---|
| age_group | |||
| [ 8.0,10.2) | — | — | |
| [10.2,11.2) | -1.2 | -2.5, 0.02 | 0.054 |
| [11.2,12.7] | -1.2 | -2.5, 0.18 | 0.087 |
| gender | |||
| female | — | — | |
| male | 0.32 | -0.82, 1.5 | 0.6 |
| grades | 1.7 | -2.5, 5.9 | 0.4 |
| read_h | 0.28 | -0.38, 0.93 | 0.4 |
| video_h | 0.56 | -0.05, 1.2 | 0.070 |
| music_h | 0.08 | -0.51, 0.68 | 0.8 |
| create_h | -0.81 | -1.6, -0.05 | 0.038 |
| play_action_h | -0.44 | -1.3, 0.38 | 0.3 |
| play_nonaction_h | -1.0 | -2.0, -0.10 | 0.030 |
| surf_h | 0.74 | -0.72, 2.2 | 0.3 |
| skype_h | 0.23 | -0.48, 0.94 | 0.5 |
| talk_h | -0.34 | -0.66, -0.02 | 0.038 |
| mmi_score | 0.05 | -0.52, 0.63 | 0.9 |
| sdq_score | 0.32 | 0.20, 0.45 | <0.001 |
| ct_score | 0.00 | -0.04, 0.04 | >0.9 |
| cp_score | -1.9 | -7.0, 3.3 | 0.5 |
| sleep_score | -1.8 | -5.2, 1.5 | 0.3 |
| mw_score | 3.2 | 0.42, 6.0 | 0.024 |
| toi_score | 0.07 | -0.01, 0.15 | 0.10 |
| d2_rt | 3.8 | 1.1, 6.6 | 0.007 |
| sart_rt | 7.3 | -1.0, 16 | 0.086 |
| blast_rt | -4.7 | -7.2, -2.2 | <0.001 |
| R² = 0.574; Adjusted R² = 0.482; Sigma = 2.51; Statistic = 6.22; p-value = <0.001; df = 23; Log-likelihood = -291; AIC = 632; BIC = 703; Deviance = 668; Residual df = 106; No. Obs. = 130 | |||
| 1 CI = Confidence Interval | |||
Fabulous. First, we have an R-square value of 0.48, which is not too shabby for a social science dataset! This means that our model is explaining about half of the variation that we are seeing in the data.
Which variables seem to relate to a child having more distress? We see significant, positive relationships with the following variables:
sdq_score)mw_score)d2_rt)imputed_full %>%
ggplot( aes(x=sdq_score, y=k6_score) ) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("K6 Distress Score vs Strengths & Difficulties Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("K6 Distress Score") +
xlab("SDQ Score")imputed_full %>%
ggplot( aes(x=mw_score, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("K6 Distress Score vs Mind Wandering Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("K6 Distress Score") +
xlab("Mind Wandering Score")The Response Time score refers to the amount of time in seconds that it took students to respond to a task given by the researchers.
imputed_full %>%
ggplot( aes(x=d2_rt, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("K6 Distress Score vs D2 Response Time") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("K6 Distress Score") +
xlab("D2 Response Time")From the regression model, we see significant inverse relationships with Distress and the following variables:
create_h)play_nonaction_h)talk_h)blast_rt)imputed_full %>%
ggplot( aes(x=create_h, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs Daily Content Creation") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("K6 Distress Score") +
xlab("Daily Content Creation Hours")imputed_full %>%
ggplot( aes(x=play_nonaction_h, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs by Daily Non-Action Video Games") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("K6 Distress Score Score") +
xlab("Daily Non-Action Video Game Hours")imputed_full %>%
ggplot( aes(x=talk_h, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs Daily Non-Tech Talking") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("K6 Distress Score") +
xlab("Daily Non-Tech Talking Hours")“BLAST” refers to a response time test where a higher score means a better response time.
imputed_full %>%
ggplot( aes(x=blast_rt, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs BLAST Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("K6 Distress Score") +
xlab("BLAST Score")# Fit linear regression model for Strength & Difficulties score
m_sdq <- lm(sdq_score ~ age_group + gender + grades + read_h + video_h + music_h +
create_h + play_action_h + play_nonaction_h + surf_h + skype_h +
talk_h + mmi_score + k6_score + ct_score + cp_score +
sleep_score + mw_score + toi_score + d2_rt + sart_rt + blast_rt,
data = imputed_full)
m_sdq %>%
gtsummary::tbl_regression() %>%
gtsummary::bold_labels() %>%
gtsummary::bold_p(t=0.05) %>%
gtsummary::add_glance_source_note()| Characteristic | Beta | 95% CI1 | p-value |
|---|---|---|---|
| age_group | |||
| [ 8.0,10.2) | — | — | |
| [10.2,11.2) | 0.92 | -0.80, 2.6 | 0.3 |
| [11.2,12.7] | 0.76 | -1.1, 2.6 | 0.4 |
| gender | |||
| female | — | — | |
| male | -0.85 | -2.4, 0.70 | 0.3 |
| grades | -8.8 | -14, -3.3 | 0.002 |
| read_h | -0.35 | -1.2, 0.54 | 0.4 |
| video_h | -1.0 | -1.8, -0.18 | 0.017 |
| music_h | -0.54 | -1.3, 0.26 | 0.2 |
| create_h | -0.21 | -1.3, 0.85 | 0.7 |
| play_action_h | -0.06 | -1.2, 1.1 | >0.9 |
| play_nonaction_h | 1.4 | 0.12, 2.7 | 0.032 |
| surf_h | 1.1 | -0.88, 3.1 | 0.3 |
| skype_h | -0.54 | -1.5, 0.42 | 0.3 |
| talk_h | 0.22 | -0.22, 0.66 | 0.3 |
| mmi_score | 1.1 | 0.37, 1.9 | 0.004 |
| k6_score | 0.60 | 0.36, 0.83 | <0.001 |
| ct_score | -0.01 | -0.07, 0.04 | 0.7 |
| cp_score | 5.1 | -1.8, 12 | 0.15 |
| sleep_score | -8.7 | -13, -4.4 | <0.001 |
| mw_score | -1.8 | -5.6, 2.1 | 0.4 |
| toi_score | 0.02 | -0.09, 0.14 | 0.7 |
| d2_rt | -2.5 | -6.4, 1.3 | 0.2 |
| sart_rt | -3.0 | -15, 8.4 | 0.6 |
| blast_rt | 2.0 | -1.6, 5.6 | 0.3 |
| R² = 0.622; Adjusted R² = 0.540; Sigma = 3.41; Statistic = 7.58; p-value = <0.001; df = 23; Log-likelihood = -331; AIC = 712; BIC = 783; Deviance = 1,235; Residual df = 106; No. Obs. = 130 | |||
| 1 CI = Confidence Interval | |||
Adjusted R-squared of the model is 0.54, which is not bad. We are explaining about 54% of the variation in our data with these variables.
We see three significant increasers of the Strengths and Difficulties score:
play_nonaction_h)mmi_score)k6_score)imputed_full %>%
ggplot( aes(x=play_nonaction_h, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("Strengths & Difficulties Score vs Daily Non-Action Video Games") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("SDQ Score") +
xlab("Daily Non-Action Video Games Hours")imputed_full %>%
ggplot( aes(x=mmi_score, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("Strengths & Difficulties Score vs Media Multitasking") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("SDQ Score") +
xlab("Media Multitasking")THIS IS THE SAME GRAPH AS ABOVE but I am showing it again for uniformity!
imputed_full %>%
ggplot( aes(x=k6_score, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("Strengths & Difficulties Score vs K6 Distress Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("SDQ Score") +
xlab("K6 Distress Score")Overall, there are some interesting nuggets on the decreasers for the Strengths and Difficulties score:
imputed_full %>%
ggplot( aes(x=grades, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("Strengths & Difficulties Score vs Grades") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("SDQ Score") +
xlab("Grades")imputed_full %>%
ggplot( aes(x=video_h, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("Strengths & Difficulties Score vs Sleep Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("SDQ Score") +
xlab("Daily Hours Watching Videos")NOTE: This graph honestly seems weird! I would have expected a much clearer inverse relationship, but the datapoints are quite spread. Further investigation required!
imputed_full %>%
ggplot( aes(x=sleep_score, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("Strengths & Difficulties Score vs Sleep Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
) +
ylab("SDQ Score") +
xlab("Sleep Score")autoplot(m_k6, which = 1:6, ncol = 2, label.size = 3)Observations:
NOTE: I had investigated outliers from a previous look at this data and eliminated them before this attempt at this analysis.
autoplot(m_sdq, which = 1:6, ncol = 2, label.size = 3)Observations:
Hopefully my diagnostic plots were a bit more optimistic!
Some parting thoughts and recommendations:
Stress, anxiety, and depression can have a spider-web effect into many other parts of a person’s life - it’s astonishing to already see these effects with kids at such young ages in life.
It is possible that more distress could cloud thinking and therefore response time.
Consider spending a little extra time creating content rather than consuming content.
Get a good night’s rest when possible! This is possibly the most important piece of advice!
Create some new content just like Bob Ross