library(knitr)
library(tidyverse)
library(tidyr)
library(dplyr)
library(hrbrthemes)
library(ggplot2)
library(car)
library(RColorBrewer)
library(ggpubr)
::opts_chunk$set(echo = TRUE,
knitreval = TRUE)
Background
An LC50 determination for nickel on Botryllus schlosseri was completed in Spring Quarter 2024. Specimens were snap frozen at the 24 and 96 hour mark to assess for accumulation of Superoxide Dismutase 1 (SOD1). SOD1 is an endogenous antioxidant with a primary function involving the removal of reactive oxygen species (ROS).
Nickel genotoxicity functions indirectly through the resultant intracellular accumulation of ROS in most animal cells. ROS cause double and single stranded breaks to the DNA which inevitably may result in mutations forming at the cellular attempts to repair the DNA.
Here we explore the effects of increasing concentrations of nickel on the accumulation of SOD1 in B. schlosseri.
Retrieving Data from Google Sheets
SOD Data
Make sure you have made your Google sheet publicly available to anyone that has the link. If you make any updates to the sheet just re-curl the data, meaning just re-run the code below.
I apologize for not making a relative path. Just modify what is after “tee” to your own directory path.
curl -L https://docs.google.com/spreadsheets/d/1vNxX2tBdEa0Ibyd4mFdru5sAZx0aipv37-y-hkqvJq4/export?exportFormat=csv | tee /Users/valeste/Documents/Git/tough-tunicates/posts/LC50-SOD/SOD.csv
BCA Data
curl -L https://docs.google.com/spreadsheets/d/1mKhd95gn_tith8fJbYjX3jGyrO-Y4mWKCoCKdTVZFHY/export?exportFormat=csv | tee /Users/valeste/Documents/Git/tough-tunicates/posts/LC50-SOD/BCA.csv
Read in the data to your global environment
<- read.csv("SOD.csv")
sod
#let's take a peak
head(sod)
well_content tech_rep hom_num date_assay treatment hpe sod_calc
1 Hom 1 1:3 R1 R1 1 7/2/24 0 24 2.305306
2 Hom 1 1:3 R2 R2 1 7/2/24 0 24 2.179302
3 Hom 2 1:3 R1 R1 2 7/2/24 0 24 4.959112
4 Hom 2 1:3 R2 R2 2 7/2/24 0 24 3.800808
5 Hom 3 1:3 R1 R1 3 7/2/24 0 24 3.495769
6 Hom 3 1:3 R2 R2 3 7/2/24 0 24 1.962912
<- read.csv("BCA.csv")
bca
#another little peak
head(bca)
protein_conc.mg.mL well_content hom_num tech_rep well plate assay_date
1 0.2905277 Hom 1 1:4 R1 1 R1 A1 2 7/4/24
2 0.2385391 Hom 1 1:4 R2 1 R2 B1 2 7/4/24
3 0.2388732 Hom 1 1:4 R3 1 R3 C1 2 7/4/24
4 0.3881566 Hom 2 1:4 R1 2 R1 A2 2 7/4/24
5 0.3379439 Hom 2 1:4 R2 2 R2 B2 2 7/4/24
6 0.3473041 Hom 2 1:4 R3 2 R3 C2 2 7/4/24
treatment
1 0
2 0
3 0
4 0
5 0
6 0
In the BCA data frame, we are missing the hpe associated with the each sample. We can add this on to a new data frame that also combines it with the SOD data after we take the averages and standard errors of the technical replicates of the BCA assay.
Note that as of July 3, 2024, the data set for the 10 mg/L is incomplete in this data frame and should be taken with a grain of salt or removed from the data frame below. The 0 mg/L treatment is also incomplete with homogenate 37 needing to be re-evaluated.
Data Munging!
Filter Data
Clean up SOD data
<- sod %>%
df_clean_sod filter(!is.na(sod_calc)) %>% # remove any rows with NA
filter(treatment != 10) # remove the 10 mg/L treatment since the reps have not all been processed yet
Clean up BCA data
<- bca %>%
df_clean_bca filter(!is.na(protein_conc.mg.mL)) %>% # remove any rows with NA
filter(treatment != 10) # remove the 10 mg/L treatment since the reps have not all been processed yet
Average assays’ technical replicates
Get the mean and standard error of the technical replicates for SOD:
<- df_clean_sod %>%
df_avg_sod group_by(hom_num, treatment, hpe) %>%
summarize(
sod_avg = mean(sod_calc, na.rm = TRUE),
sod_se = sd(sod_calc, na.rm = TRUE) / sqrt(n())
)
`summarise()` has grouped output by 'hom_num', 'treatment'. You can override
using the `.groups` argument.
<- df_clean_bca %>%
df_avg_bca group_by(hom_num, treatment) %>%
summarize(
bca_avg = mean(protein_conc.mg.mL, na.rm = TRUE),
bca_se = sd(protein_conc.mg.mL, na.rm = TRUE) / sqrt(n())
)
`summarise()` has grouped output by 'hom_num'. You can override using the
`.groups` argument.
<- inner_join(df_avg_sod, df_avg_bca, by = c("hom_num", "treatment"))
combo head(combo)
# A tibble: 6 × 7
# Groups: hom_num, treatment [6]
hom_num treatment hpe sod_avg sod_se bca_avg bca_se
<int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 1 0 24 2.24 0.0630 0.256 0.0173
2 2 0 24 4.38 0.579 0.358 0.0154
3 3 0 24 2.73 0.766 0.357 0.00688
4 7 1 24 8.36 0.513 0.542 0.0430
5 8 1 24 0.487 0.100 0.483 0.0125
6 9 1 24 2.81 1.05 0.345 0.0184
The column sod_avg is in activity units/mL and the column bca_avg is in mg/mL.
<- combo %>%
combo mutate(sod_u_mg = sod_avg / bca_avg)
Exploratory Plots
Note that the plot below is the unormalized data. A BCA was run on the samples as well to quantify total protein content.
ggplot(combo, aes(x = factor(treatment), y = sod_avg, fill = factor(treatment))) +
geom_violin(trim = FALSE) +
facet_wrap(~ hpe, scales = "free_y") +
labs(title = "Violin Plot of SOD Activity (u/mL) by Treatment and HPE",
fill = "Nickel Concentration",
x = "Treatment",
y = "SOD Activity")
SOD1 activity normalized to the protein content per sample:
ggplot(combo, aes(x = factor(treatment), y = sod_u_mg, fill = factor(treatment))) +
geom_violin(trim = FALSE) +
facet_wrap(~ hpe, scales = "free_y") +
labs(title = "Violin Plot of SOD Activity (u/mg) by Treatment and HPE",
fill = "Nickel Concentration",
x = "Treatment",
y = "SOD Activity")
Statistical Analysis
We will run a two-way ANOVA on the data exploring sod1 activity units/mg of protein by hours post exposure and nickel concentration.
# Perform ANOVA
<- aov(sod_u_mg ~ hpe * treatment, data = combo)
anova_result
# Summarize ANOVA results
summary(anova_result)
Df Sum Sq Mean Sq F value Pr(>F)
hpe 1 2 1.58 0.017 0.897
treatment 1 96 95.96 1.037 0.315
hpe:treatment 1 42 42.47 0.459 0.502
Residuals 41 3795 92.56
ANOVA Conclusions
SOD1 activity does not differ across nickel concentration treatment groups or even the control at either the 24 or 96 hours post exposure mark.
Next Steps?
There could be various reasons why we did not observe the expected effect of increased SOD activity with increasing concentrations of nickel.
Power Analysis
Currently this will only work below if the groups are of equal sample size. Because I have yet to re-process the homogenate 37 (the final replicate for the control) this analysis cannot work.
library(pwr)
<- aov(sod_u_mg ~ hpe * treatment, data = combo)
anova_result <- summary(anova_result)
summary_result
# Extract the sum of squares
<- sum(summary_result[[1]][, "Sum Sq"])
ss_total <- summary_result[[1]]["Residuals", "Sum Sq"]
ss_residual
# Calculate eta squared
<- 1 - (ss_residual / ss_total)
eta_squared
# Calculate effect size f
<- sqrt(eta_squared / (1 - eta_squared)) effect_size_f
# Number of groups in your ANOVA (e.g., hpe * treatment)
<- length(unique(combo$hpe)) * length(unique(combo$treatment))
num_groups
# Significance level (commonly set to 0.05)
<- 0.05
alpha
# Sample size (total number of observations)
<- nrow(combo)
sample_size
# Conduct power analysis
<- pwr.anova.test(k = num_groups, f = effect_size_f, sig.level = alpha, n = sample_size / num_groups)
power_analysis
# Print the result
print(power_analysis)
Balanced one-way analysis of variance power calculation
k = 6
n = 7.5
f = 0.1920704
sig.level = 0.05
power = 0.1238005
NOTE: n is number in each group
Note that n = 7.5 is in relation to the unequal sample sizes per group which is in relation to the missing control sample, homogenate 37.
Power (0.1238005): The power of 0.1238 indicates that this study has a low probability (approximately 12.38%) of detecting the true effect (f = 0.1920704) under the current conditions (sample sizes, effect size, and significance level).
Considerations: To increase power, we may need to consider increasing the sample size per group, using more sensitive measures, or adjusting your study design to reduce variability.