library(tidyverse)
library(mgcv)
library(itsadug)
Laryngeal analysis (gam)
1 Notes
- The analysis goal is to extract GAM measures and correlate GAM-based measures with behavioral data.
- The model to fit each individual’s data is: bam(uV ~ s(time, condition, bs = ‘fs’, m = 1) + s(time, stim, bs = ‘fs’, m = 1), data = df, discrete = TRUE)
- condition is the interaction term of poa (dorsal vs. glottal), direction (high_to_low vs. low_to_high) x stim_role (standard vs. deviant)
2 Libraries
3 Load data and cleaning
# F0 data
<- read_csv("~/OneDrive - University of Toronto/Projects/Laryngeal/data/gam_F0.csv")
raw # data cleaning and organization
<- raw %>%
df_f0 pivot_longer(cols = 4:ncol(raw), names_to = "time", values_to = "uV") %>%
separate(col = condition, into = c("poa", "blc_stan", "blc_devi", "stim_role", "item")) %>%
unite(col = "block", c("blc_stan", "blc_devi"), sep = "_") %>%
mutate(group = as.factor(group),
participant = as.factor(participant),
poa = as.factor(poa),
block = as.factor(block),
stim_role = as.factor(stim_role),
item = as.factor(item),
time = as.numeric(time)) %>%
mutate(
immn_direction = as.factor(case_when(
=="highStan_lowDevi" & stim_role=="devi" ~ "high_to_low",
block=="lowStan_highDevi" & stim_role=="stan" ~ "high_to_low",
block=="lowStan_highDevi" & stim_role=="devi" ~ "low_to_high",
block=="highStan_lowDevi" & stim_role=="stan" ~ "low_to_high",
block
))%>%
) unite(col = "condition", c("poa", "immn_direction", "stim_role"), sep = "/", remove = FALSE) %>%
mutate(condition = as.factor(condition)) %>%
droplevels()
# VOT data
<- read_csv("~/OneDrive - University of Toronto/Projects/Laryngeal/data/gam_vot.csv")
raw # data cleaning and organization
<- raw %>%
df_vot pivot_longer(cols = 4:ncol(raw), names_to = "time", values_to = "uV") %>%
separate(col = condition, into = c("poa", "blc_stan", "blc_devi", "stim_role", "item")) %>%
unite(col = "block", c("blc_stan", "blc_devi"), sep = "_") %>%
mutate(group = as.factor(group),
participant = as.factor(participant),
poa = as.factor(poa),
block = as.factor(block),
stim_role = as.factor(stim_role),
item = as.factor(item),
time = as.numeric(time)) %>%
mutate(
immn_direction = as.factor(case_when(
=="highStan_lowDevi" & stim_role=="devi" ~ "long_to_short",
block=="lowStan_highDevi" & stim_role=="stan" ~ "long_to_short",
block=="lowStan_highDevi" & stim_role=="devi" ~ "short_to_long",
block=="highStan_lowDevi" & stim_role=="stan" ~ "short_to_long",
block
))%>%
) unite(col = "condition", c("poa", "immn_direction", "stim_role"), sep = "/", remove = FALSE) %>%
mutate(condition = as.factor(condition)) %>%
droplevels()
The dataset consists of the following columns:
- group = participant group (vot vs. f0)
- participant = participant number
- time = time in milliseconds (ranges from -200 to 800, in 4 ms bins)
- poa = place of articulation (dorsal vs. glottal)
- stim_role = stimulus role (standard vs. deviant)
- uV = ERP amplitude in microvoltages
- immn_direction = direction for computing iMMN
4 GAM modeling and GAM-based individual measures
We extract the following GAM-based individual measures:
- trad_erp: average amplitude of observed data in specified time window
- model_area: Modelled Area = geometric area (amplitude * time) under the GAM curve. This measures the area under the peak (or maximum if there is no peak); only looking for positive area’s (or negative areas)
- peak_height: Height Modelled Peak = height of the peak of the GAM smooth, or the highest point if no peak
- NMP: Normalized Modelled Peak = a measure of robustness of the peak in units of SDs. I.e., how reliably does this subject show the peak? If value is above 1, then the 95% confidence bands do not overlap, and we can be certain the peak is there. If value is between 0 and 1, then there is a lot of variation between the items.
- half_area_latency: Modelled Area Median Latency = fractional area latency, i.e. latency at 50% of the area (midpoint)
- model_peak_time: Modelled Peak Latency = latency of the modelled peak
Additionally we compute the following measures, for extra information:
- trad_norm_erp: normalized traditional average
- hasPeak: TRUE/FALSE; is there a peak in the modeled signal in the search window?
- gam_erp: average of the GAM smooth in the time window
For all these measures holds that we look at a specified time window or search window. The traditional measure requires a narrower time window (e.g. 150-300 ms post stimulus onset), the GAM measures require a search window which can be wider (here we use 0 to 800 ms post stimulus onset). We only look for negative peaks.
The next section of code runs the GAMs for one single participant, extracts the measures and creates the plots (in a separate pdf document).
The model to fit each individual’s data is:
bam(uV ~ s(time, condition, bs = ‘fs’, m = 1) + s(time, stim, bs = ‘fs’, m = 1), discrete = TRUE)
# define search window
= 0; search_max = 800;
search_min # define classic erp window (this can be from permutation test)
<- 150; trad_max <- 300
trad_min
# loop over groups
for (group in c("f0", "vot")) {
# get data
<- get(paste0("df", "_", group))
df # define poa levels
<- levels(df$poa)
poas # define direction levels
<- levels(df$immn_direction)
directions
# initialize the full dataframe
<- data.frame()
df_gam
# get participant list
<- levels(df$participant)
participants
# loop over participants for modeling
for (participant in participants) {
# get single participant data
<- df %>%
df_tmp filter(participant == participant) %>%
droplevels()
# modeling
<- bam(uV ~
model # poa * stim_role + # fixed effects
s(time, by = condition) + # smooth for each poa x direction x stim_role
s(time, item, bs = "fs", m = 1), # random smooth by item
data = df_tmp,
discrete = TRUE) # for large data for speed
# summary(model)
# %%%%% extract peak height, peak time, and NMP %%%%%%
# get values and SE for every individual time point
<- min(model$model[, "time"])
min_time <- max(model$model[, "time"])
max_time = length(seq(min_time, max_time))
nval
# initialize stim index
<- 1
stim_ind
# loop over POAs
for (poa_ind in 1:length(poas)) {
# loop over directions
for (direction_ind in 1:length(directions)) {
# get poa and direction labels
<- poas[poa_ind]
poa <- directions[direction_ind]
direction
# extract modeled standard and deviant data
<- itsadug::get_modelterm(model, select=stim_ind, n.grid = nval, as.data.frame = TRUE)
devi <- stim_ind+1
stim_ind <- itsadug::get_modelterm(model, select=stim_ind, n.grid = nval, as.data.frame = TRUE)
stan <- stim_ind+1
stim_ind
# get difference for MMN
<- devi %>%
dat mutate(condition = "mmn",
fit = devi$fit - stan$fit,
se.fit = sqrt(devi$se.fit^2 + stan$se.fit^2))
# subset search data
<- dat[dat$time>=search_min & dat$time<=search_max, ]
sdat
# get derivative and search for peak (derivative=0) of a negativity (previous derivative value < 0, which means the actual ERP waveform is decreasing before this point)
<- data.frame(diff(dat$fit)/diff(dat$time)) # derivative
drv colnames(drv) <- 'dYdX'
$time <- rowMeans(embed(dat$time,2)) # center the X values for plotting
drv$dYdX.next <- c(drv$dYdX[2:nrow(drv)],NA)
drv$dYdX.prev <- c(NA,drv$dYdX[1:(nrow(drv)-1)])
drv
# MMN peak: going down (<0) then going up (>0)
$local_peak <- ((drv$dYdX < 0 & drv$dYdX.next > 0) | (drv$dYdX.next > 0 & drv$dYdX == 0 & drv$dYdX.prev < 0))
drv
# if at least one local peak in the search time window
if (sum(drv[drv$time>=search_min & drv$time<=search_max, ]$local_peak, na.rm=TRUE) >= 1) {
= TRUE
hasPeak # get all peak times
<- drv[which(drv$local_peak & drv$time>=search_min & drv$time<=search_max), "time"]
all_peak_times # initialize peak height with some larger value
<- Inf
peak_height # loop over local peak times
for (peak_ind in 1:length(all_peak_times)) {
# get the two fitted data points centering the local peak
= dat[dat$time >= floor(all_peak_times[peak_ind]) & dat$time <= ceiling(all_peak_times[peak_ind]), ]
peakdat # if the current height is smaller than the original peak height
if ( min(peakdat$fit) < peak_height) {
# update peak height
<- min(peakdat$fit)
peak_height # update peak time
<- all_peak_times[peak_ind]
peak_time # update se
<- peakdat[which.min(peakdat$fit),]$se.fit
peak_se # update NMP (original code doesn't have 1.96 factor)
<- peak_height / (1.96*peak_se) # relative peak measure (if < 1 then 95%CI overlaps with 0 at point of peak)
NMP
}
}else { # if no local peak
} <- FALSE
hasPeak # get general peak in search span
<- dat[dat$time>=search_min & dat$time<=search_max, ] # subset data
subdat # find peak
<- min(subdat$fit)
peak_height <- which.min(subdat$fit)
peak_index # get time
<- subdat[peak_index, "time"] # first time value with peak value
peak_time <- subdat[peak_index, ]$se.fit
peak_se <- peak_height / (1.96*peak_se)
NMP
}
# if we are looking for a valley but the value is positive, then no correct positivity/negativity
if (peak_height >= 0) {
<- NA
peak_height <- NA
peak_time <- NA
peak_se <- NA
NMP
}
# if peak time is the first point, there is so no real minimum
if (peak_time == min(sdat$time)) {
<- NA
peak_height <- NA
peak_time <- NA
peak_se <- NA
NMP
}
# get average of the GAM smooth in the tradition erp time window
= mean(dat[dat$time>=trad_min & dat$time<=trad_max, ]$fit)
gam_erp
# get area and fractional are latency
if (is.na(peak_time)) {
<- NA
area <- NA
half_area_latency else {
} # initialize are
<- 0
area = round(peak_time) # start time to take integral from
start # firsttime = search_min
# lasttime = search_max
# area to the right from the peak
for (i in start:search_max) {
= sdat[sdat$time == i,]$fit
val
# if derivative <=0
if (val <= 0) {
= area + abs(val)
area else { # end of peak, so stop going in this direction
} break
}= i
lasttime
}# area to the left from the peak
= start-1
beforestart if (beforestart >= search_min) {
for (j in beforestart:search_min) { # to the left from the peak
= sdat[sdat$time == j,]$fit
val # if derivative <=0
if (val <= 0) {
= area + abs(val)
area else { # end of peak, so stop going in this direction
} break
}= j
firsttime
}
}
# get half area latency
<- 0
halfarea for (k in firsttime:lasttime) {
= sdat[sdat$time == k, ]$fit
val = halfarea + abs(val)
halfarea if (halfarea >= 0.5 * area) {
= k
half_area_latency break
}
}# get area and fractional are latency end
}
# get measures for the current condition
<- data.frame(participant, poa, direction, hasPeak, area, peak_height, peak_se, NMP, peak_time, half_area_latency, gam_erp)
tmp_row # add to the extracted data
<- rbind(df_gam, tmp_row)
df_gam
#%%%%% plot for each condition start %%%%%
# get data for plotting
$fit.plus.se = dat$fit + dat$se.fit
dat$fit.minus.se = dat$fit - dat$se.fit
dat
# raw ERP data for plotting
<- model$model %>%
erp_df group_by(condition, time) %>%
summarize(mean_uV = mean(uV)) %>%
pivot_wider(names_from = condition, values_from = mean_uV) %>%
mutate(
uV_diff = .data[[paste0(poas[poa_ind], "/", directions[direction_ind], "/", "devi")]] - .data[[paste0(poas[poa_ind], "/", directions[direction_ind], "/", "stan")]]
)
# df for shades
<- sdat %>%
segments_df filter(time>=firsttime & time<=lasttime) %>%
mutate(x = time, xend = time,
y = 0, yend = fit)
# plotting
<- ggplot(dat, aes(x = time, y = fit)) +
fig # ribbon
geom_ribbon(aes(ymin = fit - se.fit, ymax = fit + se.fit), fill = "skyblue", alpha = 0.3) +
# gam modelled mmn
geom_line(color = "blue", linewidth = 1) +
# shade the search window
annotate("rect", xmin = search_min, xmax = search_max, ymin = -Inf, ymax = Inf, fill = "red", alpha = 0.1) +
# add horizontal and vertical zero lines
geom_vline(xintercept = 0, linetype = "solid", alpha = 0.2) +
geom_hline(yintercept = 0, linetype = "solid", alpha = 0.2) +
# mark peak time
geom_vline(xintercept=peak_time, linetype = "dashed", linewidth=1) +
# mark fractional area latency
geom_vline(xintercept=half_area_latency, linetype = "dotted", linewidth=1) +
# add derivative (scaled for visualization)
geom_line(data = drv, aes(x = time, y = dYdX*100), color = 'red', linetype = "dashed") +
# add shades to the modeled are
geom_segment(data = segments_df, aes(x=x, xend=xend, y=y, yend=yend), color = "blue", alpha = 0.1) +
# raw difference erp
geom_line(data = erp_df, aes(time, uV_diff), linetype = "dotted", color="black", linewidth=0.5) +
theme_bw() +
labs(title = paste0(participant, "_", poa, "_", direction, "_identityMMN"),
subtitle = paste0("NMP = ",round(NMP,digits=2)," (uV: ",round(peak_height,digits=2),", SE: ", round(peak_se,digits=2), ")"),
x = "Time (ms)", y = "uV")
# print(fig)
ggsave(plot = fig, width = 8, height = 5, units = "in", dpi = 300, filename = paste0("~/OneDrive - University of Toronto/Projects/Laryngeal/figures/gam/GAM_iMMN_", participant, "_", poa, "_", direction, ".png"))
#### plot for each condition end ####
# loop over directions end
} # loop over POAs end
} # loop over participants end
} # loop over groups end }