Image source: NPR Health News
_h
indicate estimates of
how many hours the child engaged in that activity_score
indicate scores the
child received on various behavioral tasks administers by the
researchers_rt
indicate the
rates at which the children performed the behavioral tasksplay_action_h
is the number of hours a
child spends playing action video games per day. If this is NA, that
likely means the child does not engage in this behavior. However, there
are variables that indicate a score for a behavior test, in which case
we cannot use that data point for that child.# Load libraries & dataset
library(tidyverse)
library(mice)
library(VIM)
library(ggfortify)
library(MKinfer)
library(ggplot2)
library(gtsummary)
library(modelsummary)
<- read.csv('../Final/mmi_kids/data/mmi_kids_gva_v2021.2.csv') dat
For initial exploration, I am interested in seeing if there are any differences by age and gender in various behaviors and scores. I hypothesize that the older kids may use media more, as their brains become more developed and they become more interested in socializig with their friends.
Additionally, we can see from the EDA see there are a few outliers in the estimated media hours per day - there is definitely something wrong with these, as daily hours can only realistically go up to 24. The largest outlier well exceeds this at 35 hours.
%>%
dat ggplot(aes(x=age)) +
geom_histogram(binwidth = 1, fill="#69b3a2", color="#e9ecef") +
ggtitle("Distribution of Children's Ages") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) scale_x_continuous(breaks = c(8, 9, 10, 11, 12, 13)) +
ylab("Frequency") +
xlab("Age")
Not too many younger kids, and weird that there is one 13-year-old - the study said the ages went up to 12. Not a huge deal, but a bit odd!
%>%
dat ggplot( aes(x=read_h)) +
geom_histogram(binwidth = 1, fill="#69b3a2", color="#e9ecef") +
ggtitle("Distribution of Reading Hours per Day") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("Frequency") +
xlab("Hours per Day")
The grand majority of kids do not seem to read very much - no more than 1 hour per day. Curious if this includes reading as part of school work or not.
%>%
dat ggplot( aes(x=media_h)) +
geom_histogram(binwidth = 1, fill="#69b3a2", color="#e9ecef") +
ggtitle("Distribution of Estimated Total Media Use per Day") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("Frequency") +
xlab("Hours per Day")
This was a bit surprising to me - 10+ hours per day of media for some students seems excessive. We obviously have an outlier at 30+ hours of media per day, as that’s not possible in 24 hours!
%>%
dat ggplot( aes(x=age, y=media_h, color=gender)) +
geom_point(alpha=0.8, size=4) +
ggtitle("Total Estimated Media Use per Day by Age & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("Hours per Day") +
xlab("Age")
It seems there is a slight positive relationship between age and hours of media consumed per day. There is not much of a relationship between gender and hours of media consumed. Finally, we see the 30+ hour data point come back to haunt us here too.
That outlier be haunting us like a dang ghost.
%>%
dat ggplot( aes(x=k6_score, y=play_nonaction_h, color=gender)) +
geom_point(alpha=0.8, size=4) +
ggtitle("Non-Action Video Game Hours per Day by Distress Score & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("Hours per Day") +
xlab("Distress Score")
Looking at just non-action video game play, we see females leading the pack with the higher hours played. However, we see males at the high end of distress. Overall, the data points are rather spread and I would be hesitant to make any sweeping conclusions here without more data to back them up.
…does cooking in Breath of the Wild count as non-action game play??
%>%
dat ggplot( aes(x=k6_score, y=play_action_h, color=gender)) +
geom_point(alpha=0.8, size=4) +
ggtitle("Action Video Game Hours per Day by\nDistress Score & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("Hours per Day") +
xlab("Distress Score")
Looking at hours of action video games played per day, we see almost exclusively males at the high ends for both hours played and distress scores. Video game play and gender might be good candidates for explanatory variables later on in this analysis!
Action games, AKA lots of BOOM POW BAM explosions n' stuff!
%>%
dat ggplot( aes(x=mw_score, y=play_action_h, color=gender)) +
geom_point( alpha=0.8, size=4) +
ggtitle("Action Video Game Hours per Day by\nMind Wandering Score & Gender") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("Hours per Day") +
xlab("Mind Wandering Score")
Keeping on the topic of action video games, I wanted to likewise look at the Mind-Wandering Score that was collected by researchers. We do not see anything too extreme other than a handful of male players who spent a lot of time playing these games and have a slightly higher mind wandering score. However, I find it especially interesting that the students with the highest mind wandering scores actually have very low action video game hours. This could be informative!
PCA is a method to reduce the dimensionality of our dataset and distill the information that helps best explain the variance in the data. Looking at the PCA helps us determine which features give us the most salient information about patterns we are looking for.
Two methods of evaluating the PCA are using a scree plot and biplot with loadings.
Did someone say ‘salient’?
<- prcomp(na.omit(dat[, c('age', 'grades', 'read_h', 'video_h', 'music_h', 'write_h', 'create_h',
pca 'play_action_h', 'play_nonaction_h', 'surf_h', 'skype_h', 'talk_h', 'mmi_score',
'k6_score', 'sdq_score', 'ct_score', 'cp_score', 'sleep_score', 'mw_score',
'grit_score', 'toi_score', 'd2_rt', 'sart_rt', 'blast_rt')]),
center = TRUE, scale. = TRUE)
screeplot(pca, type="lines")
In the scree plot, we see that the first principle component* captures the most variation, and then the second captures some variation, and so on. Past the 2nd principle component, we see less and less variation being explained. Therefore, it seems the first and second components distilled the most information.
autoplot(pca, data = na.omit(dat), colour = "school_year", loadings=TRUE, loadings.label=TRUE)
For the PCA biplot with loadings, there was not a clear separation by school year. The labels get a bit cluttered, but we see that:
No cheating on your responses please!
I chose to use the MICE package to impute missing values. I have used this in the past and was pleased with the results. Basically, this package uses multiple regression to predict what the missing values are.
I like the MICE method because it does not assume that the data are normally distributed and also assumes the data are missing randomly. These assumptions were appropriate for this project. Source
MICE also has really nice plots to visualize and evaluate missing values - let’s go ahead and use that first!
# Remove student with fake data
<- dat[dat$kid != 'k029', ]
dat
# Remove rows with more than 50% NA
<- dat[which(rowMeans(!is.na(dat)) > 0.60), ]
dat_drop
# Select the numerical columns with the missing data
<- select(dat_drop, c('grades':'blast_miss_rate'))
dat_mis
# Create a really nice MICE plot to show missing value patterns
<- aggr(dat_mis, col=c('navyblue','yellow'),
mice_plot numbers=TRUE, sortVars=TRUE,
labels=names(dat_mis), cex.axis=.7,
gap=3, ylab=c("Missing data","Pattern"))
##
## Variables sorted by number of missings:
## Variable Count
## grit_score 0.50000000
## sdq_score 0.46923077
## cp_score 0.43846154
## mw_score 0.39230769
## k6_score 0.38461538
## play_action_h 0.21538462
## play_nonaction_h 0.21538462
## ct_score 0.20000000
## blast_rt 0.11538462
## blast_fa_rate 0.11538462
## blast_miss_rate 0.11538462
## toi_score 0.09230769
## mmi_week 0.07692308
## mmi_weekend 0.07692308
## mmi_score 0.07692308
## sart_fa_rate 0.07692308
## sart_miss_rate 0.07692308
## sart_rt 0.06923077
## grades 0.03076923
## sleep_score 0.01538462
## d2_rt 0.01538462
## d2_fa_rate 0.01538462
## d2_miss_rate 0.01538462
## read_h 0.00000000
## video_h 0.00000000
## music_h 0.00000000
## write_h 0.00000000
## create_h 0.00000000
## play_h 0.00000000
## surf_h 0.00000000
## skype_h 0.00000000
## talk_h 0.00000000
## total_hours_weekday 0.00000000
## total_hours_weekendday 0.00000000
## media_h 0.00000000
## mm_frequency 0.00000000
cp_score
) also has a lot of missing
values, and I would actually like to use this variable in my analysis. I
feel comfortable moving forward with the MICE imputation, but will
closely evaluate the results when considering the Child’s Parent
Score.# Note that this code chunk below is set NOT to run because I imputed the data beforehand and already saved the file. The imputation takes several minutes, so I have saved the imputed dataset and loaded it here.
# Perform the imputation
<- mice(dat_mis, m=5, maxit = 50, method = 'pmm', seed = 500)
dat_imputed
# Select one of the imputed datasets to use (Using 2 here)
<- complete(dat_imputed, 2)
dat_imp_complete
# Grab the categorical variables that we did not impute
<- select(dat_drop, c('kid':'gender'))
dat_cat
# Concat the imputed data to the categorical cols
<- cbind(dat_cat, dat_imp_complete)
imputed_full
write.csv(imputed_full, '../Final/data_imputed.csv')
# Load the data for real this time
<- read.csv('data_imputed.csv') imputed_full
Thank you, MICE!
My goal with the t-tests is to explore if there are any differences between the teachers in relation to the grades that students self-reported. A teacher or professor can have a huge impact on a student’s specific learning style, so I am interested to see if there is anything of interest here.
<- na.omit(unique(imputed_full$teacher))
teachers <- length(teachers)
B
# Compute the number of pairs, and create an appropriately
# sized data frame to store the p-values in:
<- choose(B, 2)
n_comb <- data.frame(group1 = vector("character", n_comb),
p_values group2 = vector("character", n_comb),
p = vector("numeric", n_comb))
# Loop over all pairs:
<- 1 # Counter variable
k for(i in 1:(B-1))
{for(j in (i+1):B)
{# Run a t-test for the current pair:
<- perm.t.test(grades ~ teacher,
test_res data = subset(imputed_full,
== teachers[i] | teacher == teachers[j]))
teacher # Store the p-value:
<- c(teachers[i], teachers[j], test_res$p.value)
p_values[k, ] # Increase the counter variable:
<- k + 1
k
}
}
$bonf <- p.adjust(p_values$p, method = "bonferroni")
p_values$holm <- p.adjust(p_values$p, method = "holm")
p_values$bh <- p.adjust(p_values$p, method = "BH")
p_values
::kable(p_values) knitr
group1 | group2 | p | bonf | holm | bh |
---|---|---|---|---|---|
t10 | t03 | 0.468360647115994 | 1.0000000 | 1.0000000 | 0.7806011 |
t10 | t04 | 0.28071266454471 | 1.0000000 | 1.0000000 | 0.5741850 |
t10 | t01 | 0.388483808720804 | 1.0000000 | 1.0000000 | 0.6992709 |
t10 | t09 | 0.901778587556985 | 1.0000000 | 1.0000000 | 0.9660096 |
t10 | t05 | 0.755028048660544 | 1.0000000 | 1.0000000 | 0.8941122 |
t10 | t06 | 0.274885827342811 | 1.0000000 | 1.0000000 | 0.5741850 |
t10 | t02 | 0.256146403903182 | 1.0000000 | 1.0000000 | 0.5741850 |
t10 | t11 | 0.053810926604514 | 1.0000000 | 1.0000000 | 0.2017910 |
t10 | t08 | 0.444687305800653 | 1.0000000 | 1.0000000 | 0.7696511 |
t03 | t04 | 0.616908370414151 | 1.0000000 | 1.0000000 | 0.8617225 |
t03 | t01 | 0.867871086762281 | 1.0000000 | 1.0000000 | 0.9660096 |
t03 | t09 | 0.35024199485285 | 1.0000000 | 1.0000000 | 0.6567037 |
t03 | t05 | 0.163412829363141 | 1.0000000 | 1.0000000 | 0.4085321 |
t03 | t06 | 0.607002652951452 | 1.0000000 | 1.0000000 | 0.8617225 |
t03 | t02 | 0.557177799314374 | 1.0000000 | 1.0000000 | 0.8617225 |
t03 | t11 | 0.00136552922824547 | 0.0614488 | 0.0559867 | 0.0122898 |
t03 | t08 | 0.936215590071695 | 1.0000000 | 1.0000000 | 0.9660096 |
t04 | t01 | 0.708527367919974 | 1.0000000 | 1.0000000 | 0.8617225 |
t04 | t09 | 0.109389790425134 | 1.0000000 | 1.0000000 | 0.3076588 |
t04 | t05 | 0.0508572002316935 | 1.0000000 | 1.0000000 | 0.2017910 |
t04 | t06 | 0.989274655974988 | 1.0000000 | 1.0000000 | 0.9892747 |
t04 | t02 | 0.936445586374526 | 1.0000000 | 1.0000000 | 0.9660096 |
t04 | t11 | 0.000286663575707051 | 0.0128999 | 0.0123265 | 0.0043000 |
t04 | t08 | 0.703982957075637 | 1.0000000 | 1.0000000 | 0.8617225 |
t01 | t09 | 0.223798756338605 | 1.0000000 | 1.0000000 | 0.5300497 |
t01 | t05 | 0.0999812100126979 | 1.0000000 | 1.0000000 | 0.2999436 |
t01 | t06 | 0.699322176886671 | 1.0000000 | 1.0000000 | 0.8617225 |
t01 | t02 | 0.637958834903266 | 1.0000000 | 1.0000000 | 0.8617225 |
t01 | t11 | 0.000615208030700147 | 0.0276844 | 0.0258387 | 0.0069211 |
t01 | t08 | 0.944542689277285 | 1.0000000 | 1.0000000 | 0.9660096 |
t09 | t05 | 0.52701919165529 | 1.0000000 | 1.0000000 | 0.8469951 |
t09 | t06 | 0.0922414651866604 | 1.0000000 | 1.0000000 | 0.2964904 |
t09 | t02 | 0.0816817835583894 | 1.0000000 | 1.0000000 | 0.2827446 |
t09 | t11 | 0.0062261899432337 | 0.2801785 | 0.2428214 | 0.0400255 |
t09 | t08 | 0.336588406816577 | 1.0000000 | 1.0000000 | 0.6567037 |
t05 | t06 | 0.0443688597832433 | 1.0000000 | 1.0000000 | 0.1996599 |
t05 | t02 | 0.0397568856321453 | 1.0000000 | 1.0000000 | 0.1987844 |
t05 | t11 | 0.0371274479848394 | 1.0000000 | 1.0000000 | 0.1987844 |
t05 | t08 | 0.161389857588044 | 1.0000000 | 1.0000000 | 0.4085321 |
t06 | t02 | 0.919585386736334 | 1.0000000 | 1.0000000 | 0.9660096 |
t06 | t11 | 0.000255851852561101 | 0.0115133 | 0.0112575 | 0.0043000 |
t06 | t08 | 0.699510619211028 | 1.0000000 | 1.0000000 | 0.8617225 |
t02 | t11 | 0.000242860178467621 | 0.0109287 | 0.0109287 | 0.0043000 |
t02 | t08 | 0.649124526964742 | 1.0000000 | 1.0000000 | 0.8617225 |
t11 | t08 | 0.0018461002686262 | 0.0830745 | 0.0738440 | 0.0138458 |
Wow this is interesting! Teacher 11 shows up as having a very low p-value with almost all other teachers. - The confidence intervals do NOT include zero for teacher 11’s comparisons, so we can be quite confident that the relationship we are seeing is valid. - In fact, since I was running so many hypothesis tests I made sure to use a couple different p-value corrections: - While there was no hope at all for finding relationships between most of the other teachers and their grades, teacher 11 was still statistically significant. - It seems the self-reported grades were truly quite different for this teacher. I cannot resist, so I will make a couple brief exploratory graphs here.
ggplot(imputed_full, aes(x = reorder(teacher, -grades), y = grades)) +
geom_bar(position = "dodge",
stat = "summary",
fun = "mean",
fill="#69b3a2") +
# Fancy title
ggtitle("Teachers by Average Self-Reported Student Grades") +
xlab("Teacher") +
ylab("Average Student Grades") +
# Move the title to the center
theme(plot.title = element_text(hjust = 0.5))
We clearly see that Teacher T11 has much lower self-reported student grades. Let’s dig further into which grades this teacher covers - it is possible they have special needs or a tough class.
::kable(table(imputed_full$teacher, imputed_full$school_year)) knitr
5P | 6P | 7P | 8P | special needs | |
---|---|---|---|---|---|
t01 | 0 | 0 | 13 | 0 | 0 |
t02 | 0 | 0 | 10 | 0 | 0 |
t03 | 0 | 0 | 0 | 13 | 0 |
t04 | 7 | 7 | 0 | 0 | 0 |
t05 | 5 | 8 | 0 | 0 | 0 |
t06 | 0 | 0 | 0 | 15 | 0 |
t08 | 6 | 3 | 0 | 0 | 0 |
t09 | 0 | 0 | 15 | 0 | 0 |
t10 | 8 | 8 | 0 | 0 | 0 |
t11 | 0 | 0 | 0 | 0 | 12 |
Aha!, teacher T11 teaches exclusively special needs classes. This certainly explains the grades we are seeing. Special needs students may not have self-reported their grades the same way as neuro-typical students, and it’s even possible that someone else entered the grades for these students. We will definitely need to keep this in mind as we continue with our analyses.
Next, I’m curious to see if there is any difference in the number of hours boys and girls play video games per day.
<- imputed_full[imputed_full$gender == "female",]
ladies <- imputed_full[imputed_full$gender == "male",]
gents
print(wilcox.test(ladies$play_h, gents$play_h))
##
## Wilcoxon rank sum test with continuity correction
##
## data: ladies$play_h and gents$play_h
## W = 948, p-value = 3.656e-08
## alternative hypothesis: true location shift is not equal to 0
print(paste("Girls average hours of video games per day: ", round(mean(ladies$play_h), 2)))
## [1] "Girls average hours of video games per day: 0.46"
print(paste("Boys average hours of video games per day: ", round(mean(gents$play_h), 2)))
## [1] "Boys average hours of video games per day: 1.17"
We indeed see a difference between the number of hours that boys and girls play video games. The boys played an average of just over an hour of video games per day, while the girls played about 45 minutes per day.
ggplot(imputed_full, aes(x=gender, y=play_h, color=gender, fill=gender) ) +
geom_violin() +
# Fancy title
ggtitle("Hours of Video Games Played Per Day by Gender") +
xlab("Gender") +
ylab("Hours of Video Games Played ") +
# Move the title to the center
theme(plot.title = element_text(hjust = 0.5))
Moving on, I am interested in seeing if there is any correlation between a child’s Distress Score and the Sleep Score that they received.
<- cor.test(imputed_full$k6_score, imputed_full$sleep_score,
corr use = "pairwise.complete")
print(corr)
##
## Pearson's product-moment correlation
##
## data: imputed_full$k6_score and imputed_full$sleep_score
## t = -4.8231, df = 128, p-value = 3.943e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5286497 -0.2359037
## sample estimates:
## cor
## -0.3921605
ggplot(imputed_full, aes(k6_score, sleep_score)) +
geom_point() +
# Fancy title
ggtitle("Sleep Score by Distress Score") +
xlab("Distress Score") +
ylab("Sleep Score") +
# Move the title to the center
theme(plot.title = element_text(hjust = 0.5))
I was listening to a Malcolm Gladwell podcast about how teachers had a strong impact on the performance of their students. Since we want to explore categorical variables for the Chi-Squared Test, I am interested in exploring if there is a relationship between the teacher and the gender of the student.
::kable(table(imputed_full$teacher, imputed_full$gender)) knitr
female | male | |
---|---|---|
t01 | 6 | 7 |
t02 | 5 | 5 |
t03 | 6 | 7 |
t04 | 6 | 8 |
t05 | 5 | 8 |
t06 | 7 | 8 |
t08 | 4 | 5 |
t09 | 7 | 8 |
t10 | 11 | 5 |
t11 | 6 | 6 |
chisq.test(table(imputed_full$teacher, imputed_full$gender),
simulate.p.value = TRUE)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: table(imputed_full$teacher, imputed_full$gender)
## X-squared = 3.5066, df = NA, p-value = 0.9505
Not super surprising, but still encouraging to see that the gender of students does not seem to interact with the teachers.
Sorry if this analysis has disappointed you, Malcolm!!
I am interested in the K-6 distress scale score
(k6_score
) and the Strength and Difficulties
Questionnaire score (sdq_score
). NOTE: I am
thinking these two variables may appear to be similar on the surface,
but I am interested in exploring if there are any underlying factors
that differ.
The K-6 distress scale measures psychological distress including anxiety and depression. The higher the score, the higher the levels of distress.
The Strength and Difficulties score measures levels of socio-emotional difficulty. The higher the score, the more social and emotional difficulties the child is experiencing.
My hypothesis is that there is no difference between K-6 distress and socio-emotional difficulty in terms of their predictors.
My goal is to fit identical models for both dependent variables so I can see if there are any underlying differences between the two.
# Create some one-hot encoded variables so we can use them
$age_group <- as.factor(imputed_full$age_group)
imputed_full$gender <- as.factor(imputed_full$gender)
imputed_full
# Fit linear regression model for K6 distress
<- lm(k6_score ~ age_group + gender + grades + read_h + video_h + music_h +
m_k6 + play_action_h + play_nonaction_h + surf_h + skype_h +
create_h + mmi_score + sdq_score + ct_score + cp_score +
talk_h + mw_score + toi_score + d2_rt + sart_rt + blast_rt,
sleep_score data = imputed_full)
# Pretty up the results
%>%
m_k6 ::tbl_regression() %>%
gtsummary::bold_labels() %>%
gtsummary::bold_p(t=0.05) %>%
gtsummary::add_glance_source_note() gtsummary
Characteristic | Beta | 95% CI1 | p-value |
---|---|---|---|
age_group | |||
[ 8.0,10.2) | — | — | |
[10.2,11.2) | -1.2 | -2.5, 0.02 | 0.054 |
[11.2,12.7] | -1.2 | -2.5, 0.18 | 0.087 |
gender | |||
female | — | — | |
male | 0.32 | -0.82, 1.5 | 0.6 |
grades | 1.7 | -2.5, 5.9 | 0.4 |
read_h | 0.28 | -0.38, 0.93 | 0.4 |
video_h | 0.56 | -0.05, 1.2 | 0.070 |
music_h | 0.08 | -0.51, 0.68 | 0.8 |
create_h | -0.81 | -1.6, -0.05 | 0.038 |
play_action_h | -0.44 | -1.3, 0.38 | 0.3 |
play_nonaction_h | -1.0 | -2.0, -0.10 | 0.030 |
surf_h | 0.74 | -0.72, 2.2 | 0.3 |
skype_h | 0.23 | -0.48, 0.94 | 0.5 |
talk_h | -0.34 | -0.66, -0.02 | 0.038 |
mmi_score | 0.05 | -0.52, 0.63 | 0.9 |
sdq_score | 0.32 | 0.20, 0.45 | <0.001 |
ct_score | 0.00 | -0.04, 0.04 | >0.9 |
cp_score | -1.9 | -7.0, 3.3 | 0.5 |
sleep_score | -1.8 | -5.2, 1.5 | 0.3 |
mw_score | 3.2 | 0.42, 6.0 | 0.024 |
toi_score | 0.07 | -0.01, 0.15 | 0.10 |
d2_rt | 3.8 | 1.1, 6.6 | 0.007 |
sart_rt | 7.3 | -1.0, 16 | 0.086 |
blast_rt | -4.7 | -7.2, -2.2 | <0.001 |
R² = 0.574; Adjusted R² = 0.482; Sigma = 2.51; Statistic = 6.22; p-value = <0.001; df = 23; Log-likelihood = -291; AIC = 632; BIC = 703; Deviance = 668; Residual df = 106; No. Obs. = 130 | |||
1 CI = Confidence Interval |
Fabulous. First, we have an R-square value of 0.48, which is not too shabby for a social science dataset! This means that our model is explaining about half of the variation that we are seeing in the data.
Which variables seem to relate to a child having more distress? We see significant, positive relationships with the following variables:
sdq_score
)mw_score
)d2_rt
)%>%
imputed_full ggplot( aes(x=sdq_score, y=k6_score) ) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("K6 Distress Score vs Strengths & Difficulties Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("K6 Distress Score") +
xlab("SDQ Score")
%>%
imputed_full ggplot( aes(x=mw_score, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("K6 Distress Score vs Mind Wandering Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("K6 Distress Score") +
xlab("Mind Wandering Score")
The Response Time score refers to the amount of time in seconds that it took students to respond to a task given by the researchers.
%>%
imputed_full ggplot( aes(x=d2_rt, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("K6 Distress Score vs D2 Response Time") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("K6 Distress Score") +
xlab("D2 Response Time")
From the regression model, we see significant inverse relationships with Distress and the following variables:
create_h
)play_nonaction_h
)talk_h
)blast_rt
)%>%
imputed_full ggplot( aes(x=create_h, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs Daily Content Creation") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("K6 Distress Score") +
xlab("Daily Content Creation Hours")
%>%
imputed_full ggplot( aes(x=play_nonaction_h, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs by Daily Non-Action Video Games") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("K6 Distress Score Score") +
xlab("Daily Non-Action Video Game Hours")
%>%
imputed_full ggplot( aes(x=talk_h, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs Daily Non-Tech Talking") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("K6 Distress Score") +
xlab("Daily Non-Tech Talking Hours")
“BLAST” refers to a response time test where a higher score means a better response time.
%>%
imputed_full ggplot( aes(x=blast_rt, y=k6_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("K6 Distress Score vs BLAST Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("K6 Distress Score") +
xlab("BLAST Score")
# Fit linear regression model for Strength & Difficulties score
<- lm(sdq_score ~ age_group + gender + grades + read_h + video_h + music_h +
m_sdq + play_action_h + play_nonaction_h + surf_h + skype_h +
create_h + mmi_score + k6_score + ct_score + cp_score +
talk_h + mw_score + toi_score + d2_rt + sart_rt + blast_rt,
sleep_score data = imputed_full)
%>%
m_sdq ::tbl_regression() %>%
gtsummary::bold_labels() %>%
gtsummary::bold_p(t=0.05) %>%
gtsummary::add_glance_source_note() gtsummary
Characteristic | Beta | 95% CI1 | p-value |
---|---|---|---|
age_group | |||
[ 8.0,10.2) | — | — | |
[10.2,11.2) | 0.92 | -0.80, 2.6 | 0.3 |
[11.2,12.7] | 0.76 | -1.1, 2.6 | 0.4 |
gender | |||
female | — | — | |
male | -0.85 | -2.4, 0.70 | 0.3 |
grades | -8.8 | -14, -3.3 | 0.002 |
read_h | -0.35 | -1.2, 0.54 | 0.4 |
video_h | -1.0 | -1.8, -0.18 | 0.017 |
music_h | -0.54 | -1.3, 0.26 | 0.2 |
create_h | -0.21 | -1.3, 0.85 | 0.7 |
play_action_h | -0.06 | -1.2, 1.1 | >0.9 |
play_nonaction_h | 1.4 | 0.12, 2.7 | 0.032 |
surf_h | 1.1 | -0.88, 3.1 | 0.3 |
skype_h | -0.54 | -1.5, 0.42 | 0.3 |
talk_h | 0.22 | -0.22, 0.66 | 0.3 |
mmi_score | 1.1 | 0.37, 1.9 | 0.004 |
k6_score | 0.60 | 0.36, 0.83 | <0.001 |
ct_score | -0.01 | -0.07, 0.04 | 0.7 |
cp_score | 5.1 | -1.8, 12 | 0.15 |
sleep_score | -8.7 | -13, -4.4 | <0.001 |
mw_score | -1.8 | -5.6, 2.1 | 0.4 |
toi_score | 0.02 | -0.09, 0.14 | 0.7 |
d2_rt | -2.5 | -6.4, 1.3 | 0.2 |
sart_rt | -3.0 | -15, 8.4 | 0.6 |
blast_rt | 2.0 | -1.6, 5.6 | 0.3 |
R² = 0.622; Adjusted R² = 0.540; Sigma = 3.41; Statistic = 7.58; p-value = <0.001; df = 23; Log-likelihood = -331; AIC = 712; BIC = 783; Deviance = 1,235; Residual df = 106; No. Obs. = 130 | |||
1 CI = Confidence Interval |
Adjusted R-squared of the model is 0.54, which is not bad. We are explaining about 54% of the variation in our data with these variables.
We see three significant increasers of the Strengths and Difficulties score:
play_nonaction_h
)mmi_score
)k6_score
)%>%
imputed_full ggplot( aes(x=play_nonaction_h, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("Strengths & Difficulties Score vs Daily Non-Action Video Games") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("SDQ Score") +
xlab("Daily Non-Action Video Games Hours")
%>%
imputed_full ggplot( aes(x=mmi_score, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("Strengths & Difficulties Score vs Media Multitasking") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("SDQ Score") +
xlab("Media Multitasking")
THIS IS THE SAME GRAPH AS ABOVE but I am showing it again for uniformity!
%>%
imputed_full ggplot( aes(x=k6_score, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="red") +
ggtitle("Strengths & Difficulties Score vs K6 Distress Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("SDQ Score") +
xlab("K6 Distress Score")
Overall, there are some interesting nuggets on the decreasers for the Strengths and Difficulties score:
%>%
imputed_full ggplot( aes(x=grades, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("Strengths & Difficulties Score vs Grades") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("SDQ Score") +
xlab("Grades")
%>%
imputed_full ggplot( aes(x=video_h, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("Strengths & Difficulties Score vs Sleep Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("SDQ Score") +
xlab("Daily Hours Watching Videos")
NOTE: This graph honestly seems weird! I would have expected a much clearer inverse relationship, but the datapoints are quite spread. Further investigation required!
%>%
imputed_full ggplot( aes(x=sleep_score, y=sdq_score)) +
geom_point(alpha=0.8, size=4, color="blue") +
ggtitle("Strengths & Difficulties Score vs Sleep Score") +
theme(
plot.title = element_text(size=15, hjust = 0.5)
+
) ylab("SDQ Score") +
xlab("Sleep Score")
autoplot(m_k6, which = 1:6, ncol = 2, label.size = 3)
Observations:
NOTE: I had investigated outliers from a previous look at this data and eliminated them before this attempt at this analysis.
autoplot(m_sdq, which = 1:6, ncol = 2, label.size = 3)
Observations:
Hopefully my diagnostic plots were a bit more optimistic!
Some parting thoughts and recommendations:
Stress, anxiety, and depression can have a spider-web effect into many other parts of a person’s life - it’s astonishing to already see these effects with kids at such young ages in life.
It is possible that more distress could cloud thinking and therefore response time.
Consider spending a little extra time creating content rather than consuming content.
Get a good night’s rest when possible! This is possibly the most important piece of advice!
Create some new content just like Bob Ross