Last updated: 2024-07-23

Checks: 7 0

Knit directory: DOX_24_Github/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

Seed: set.seed(20240723)

The command set.seed(20240723) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 85b4737

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 85b4737. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store

Untracked files:
    Untracked:  Johnson_DOX_24_1.Rmd
    Untracked:  Johnson_DOX_24_2.Rmd
    Untracked:  Johnson_DOX_24_3.Rmd
    Untracked:  Johnson_DOX_24_4.Rmd
    Untracked:  Johnson_DOX_24_5.Rmd
    Untracked:  Johnson_DOX_24_7.Rmd
    Untracked:  Johnson_DOX_24_RUV_Limma.Rmd
    Untracked:  Johnson_DOX_24_RUV_Limma.html
    Untracked:  VennDiagram.2024-07-23_18-08-30.log
    Untracked:  VennDiagram.2024-07-23_18-08-31.log

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/DDA_DIA_Comparison.Rmd) and HTML (docs/DDA_DIA_Comparison.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	85b4737	Omar-Johnson	2024-07-23	more updates for paper

Library

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load data

DDA_Toptable <- read.csv(file = "/Users/omarjohnson/Documents/Projects/Dox_Proteomics/Data/Data_Frames/Important_DF_Final/toptable_Dupcor.csv")

DIA_Toptable <- read.csv(file ="/Users/omarjohnson/Documents/Projects/Dox_Proteomics/Data/Data_Frames/DIA_proteins/RUVIII_Imputed/Toptable_summary_RUVIII.csv" , header = TRUE)


DDA_Ab <- read.csv(file = "/Users/omarjohnson/Documents/Projects/Dox_Proteomics/Data/Data_Frames/Abundance.csv", header = TRUE)

DIA_Ab <- read.csv(file = "/Users/omarjohnson/Documents/Projects/Dox_Proteomics/Data/Data_Frames/RNA_Protein_DF/Data_ind_prot_DOX_24hr.csv", header = TRUE)


Meta <- read.csv(file = "/Users/omarjohnson/Documents/Projects/Dox_Proteomics/Data/Data_Frames/DIA_proteins/RUVIII_Imputed/Meta.csv", header = TRUE, row.names = 1)

Get data for DDA & DIA LF_Changes

DDA_Toptable %>% head()

      logFC  AveExpr         t      P.Value    adj.P.Val        B Protein
1 -3.553817 23.99047 -16.74339 2.054348e-08 6.951913e-05 9.416789  Q15327
2 -2.141093 24.25375 -13.55227 1.444049e-07 2.443330e-04 7.859382  O76021
3 -1.634296 19.97143 -12.65816 2.690260e-07 3.034614e-04 7.328583  Q96LW7
4 -4.342380 20.84232 -11.92097 4.632261e-07 3.918893e-04 6.853153  Q969Q1
5 -1.165897 20.44496 -11.62228 5.822611e-07 3.940743e-04 6.649959  P19388
6 -1.762280 20.67219 -10.83369 1.092754e-06 6.163133e-04 6.081777  Q96GQ7

DDA_Toptable %>% dim()

[1] 3384    7

DIA_Toptable %>% head()

       X     logFC  AveExpr         t      P.Value    adj.P.Val        B
1 Q15061  2.436903 18.60721  26.00939 1.499311e-07 1.499311e-07 7.414993
2 O76021 -2.224860 21.98800 -18.86036 1.191879e-06 1.191879e-06 6.029757
3 P43694 -1.821642 18.81578 -15.98230 3.129665e-06 3.129665e-06 5.276514
4 P01130 -1.819689 20.01194 -15.77595 3.483331e-06 3.483331e-06 5.186033
5 Q96P11 -1.843458 17.28018 -14.33937 5.735316e-06 5.735316e-06 4.764600
6 Q6EMK4  1.838032 19.30419  14.22737 6.455771e-06 6.455771e-06 4.660093
  Protein threshold_P
1  Q15061        TRUE
2  O76021        TRUE
3  P43694        TRUE
4  P01130        TRUE
5  Q96P11        TRUE
6  Q6EMK4        TRUE

DIA_Toptable %>% dim()

[1] 4178    9

DDA_and_DIA <-  merge(DDA_Toptable, 
                          DIA_Toptable, by.x = "Protein", by.y = "Protein")

# colnames.x correspond to DDA, and colnames.y correspond to DIA. 
DDA_and_DIA %>% head()

     Protein     logFC.x AveExpr.x        t.x  P.Value.x adj.P.Val.x       B.x
1 A0A0B4J2D5  0.20404967  28.45442  1.9017522 0.08766216   0.2869147 -5.030079
2     A0FGR8  0.16807616  23.26372  0.7831563 0.45246181   0.6682978 -6.371292
3     A1A4S6  1.25691548  19.09886  1.3350552 0.21271597   0.4479346 -5.806614
4     A1L0T0 -0.25213697  24.34430 -2.3804932 0.03960844   0.1911431 -4.283075
5     A2RRP1 -0.30334810  24.20335 -2.0043164 0.07410958   0.2637085 -4.875179
6     A3KMH1 -0.02219401  25.50247 -0.1750399 0.86469240   0.9318851 -6.677851
           X     logFC.y AveExpr.y         t.y P.Value.y adj.P.Val.y       B.y
1 A0A0B4J2D5  0.13528476  23.88276  1.31734301 0.2349365   0.2349365 -6.046336
2     A0FGR8  0.09891217  20.14964  1.18755311 0.2786322   0.2786322 -6.195353
3     A1A4S6  0.14066972  16.85673  1.35338152 0.2233311   0.2233311 -6.003703
4     A1L0T0 -0.11886476  21.50372 -1.89930794 0.1048450   0.1048450 -5.294689
5     A2RRP1  0.01039713  21.36635  0.09930159 0.9240748   0.9240748 -6.919855
6     A3KMH1  0.03750700  22.53324  0.22802321 0.8270631   0.8270631 -6.895697
  threshold_P
1       FALSE
2       FALSE
3       FALSE
4       FALSE
5       FALSE
6       FALSE

DDA_and_DIA %>% dim()

[1] 3058   15

S5 - All proteins LogFC corr.

# Model all proteins across acquisition types 
# Fit linear model
model_LogFC <- lm(logFC.x ~ logFC.y, data = DDA_and_DIA)

# Get R-squared
r2 <- summary(model_LogFC)$r.squared
summary(model_LogFC)


Call:
lm(formula = logFC.x ~ logFC.y, data = DDA_and_DIA)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2458 -0.1185  0.0256  0.1501  1.8535 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.072848   0.005623  -12.96   <2e-16 ***
logFC.y      0.887968   0.015554   57.09   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3109 on 3056 degrees of freedom
Multiple R-squared:  0.5161,    Adjusted R-squared:  0.5159 
F-statistic:  3259 on 1 and 3056 DF,  p-value: < 2.2e-16

#------ Plot for both same sign and not same sign
ggplot2::ggplot(DDA_and_DIA, aes(x = logFC.x, y = logFC.y, color = threshold_P)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  annotate("text", x = Inf, y = Inf, vjust = 2, hjust = 2,
           label = paste("R^2 = ", round(r2, digits = 2), sep = ""),
           color = "black") +
  scale_color_manual(values = c("red", "blue")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'

DA proteins in DIA LogFC corr.

# Model differentially abundant (from DIA) proteins across acquisition types 
# Fit linear model

Diff_ab_DDA_and_DIA  <- DDA_and_DIA[DDA_and_DIA$adj.P.Val.y < 0.05,  ]

 
Diff_ab_DDA_and_DIA %>% head()

   Protein     logFC.x AveExpr.x        t.x    P.Value.x adj.P.Val.x        B.x
15  E9PAV3 -0.41996261  25.78583 -2.6237182 0.0263003158  0.15505273 -3.8867830
25  O00203  0.06090409  22.74772  0.4796207 0.6422585277  0.81177071 -6.5708717
40  O00442  0.32365861  19.81594  2.3054213 0.0449252846  0.20297352 -4.4036650
41  O00461 -0.47579646  22.78785 -5.2536544 0.0004278982  0.01540433  0.2552059
48  O00541 -2.31904608  20.50036 -4.8248426 0.0007877502  0.02117750 -0.3639773
50  O00567  0.24497484  28.51856  1.1617050 0.2734821244  0.51512631 -6.0084778
        X    logFC.y AveExpr.y       t.y    P.Value.y  adj.P.Val.y       B.y
15 E9PAV3 -0.3998172  18.17394 -3.300592 1.587372e-02 1.587372e-02 -3.343002
25 O00203  0.2596199  19.86848  3.353728 1.467563e-02 1.467563e-02 -3.264772
40 O00442  0.5893604  17.23225  4.133940 5.836938e-03 5.836938e-03 -2.269778
41 O00461 -0.2739090  18.74024 -2.919348 2.575340e-02 2.575340e-02 -3.860469
48 O00541 -1.0325875  20.20522 -9.474677 7.036491e-05 7.036491e-05  2.402861
50 O00567  0.4649937  21.77611  4.679509 3.234119e-03 3.234119e-03 -1.630077
   threshold_P
15        TRUE
25        TRUE
40        TRUE
41        TRUE
48        TRUE
50        TRUE

Diff_ab_DDA_and_DIA %>% dim()

[1] 858  15

model_LogFC <- lm(logFC.x ~ logFC.y, data = Diff_ab_DDA_and_DIA)

# Get R-squared
r2 <- summary(model_LogFC)$r.squared
summary(model_LogFC)


Call:
lm(formula = logFC.x ~ logFC.y, data = Diff_ab_DDA_and_DIA)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2012 -0.1322  0.0290  0.1583  1.6386 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.09764    0.01206  -8.093 1.98e-15 ***
logFC.y      0.90515    0.02052  44.103  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3514 on 856 degrees of freedom
Multiple R-squared:  0.6944,    Adjusted R-squared:  0.694 
F-statistic:  1945 on 1 and 856 DF,  p-value: < 2.2e-16

#------ Plot for both same sign and not same sign
ggplot2::ggplot(DDA_and_DIA, aes(x = logFC.x, y = logFC.y, color = threshold_P)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  annotate("text", x = Inf, y = Inf, vjust = 2, hjust = 2,
           label = paste("R^2 = ", round(r2, digits = 2), sep = ""),
           color = "black") +
  scale_color_manual(values = c("red", "blue")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'

DA protein overlap for DDA & DIA corr.

# Model differentially abundant proteins that are differentially abundant for DIA AND DDA methods 
# Fit linear model
DDA_and_DIA %>% head()

     Protein     logFC.x AveExpr.x        t.x  P.Value.x adj.P.Val.x       B.x
1 A0A0B4J2D5  0.20404967  28.45442  1.9017522 0.08766216   0.2869147 -5.030079
2     A0FGR8  0.16807616  23.26372  0.7831563 0.45246181   0.6682978 -6.371292
3     A1A4S6  1.25691548  19.09886  1.3350552 0.21271597   0.4479346 -5.806614
4     A1L0T0 -0.25213697  24.34430 -2.3804932 0.03960844   0.1911431 -4.283075
5     A2RRP1 -0.30334810  24.20335 -2.0043164 0.07410958   0.2637085 -4.875179
6     A3KMH1 -0.02219401  25.50247 -0.1750399 0.86469240   0.9318851 -6.677851
           X     logFC.y AveExpr.y         t.y P.Value.y adj.P.Val.y       B.y
1 A0A0B4J2D5  0.13528476  23.88276  1.31734301 0.2349365   0.2349365 -6.046336
2     A0FGR8  0.09891217  20.14964  1.18755311 0.2786322   0.2786322 -6.195353
3     A1A4S6  0.14066972  16.85673  1.35338152 0.2233311   0.2233311 -6.003703
4     A1L0T0 -0.11886476  21.50372 -1.89930794 0.1048450   0.1048450 -5.294689
5     A2RRP1  0.01039713  21.36635  0.09930159 0.9240748   0.9240748 -6.919855
6     A3KMH1  0.03750700  22.53324  0.22802321 0.8270631   0.8270631 -6.895697
  threshold_P
1       FALSE
2       FALSE
3       FALSE
4       FALSE
5       FALSE
6       FALSE

Diff_ab_DDA_and_DIA  <- DDA_and_DIA[(DDA_and_DIA$adj.P.Val.y < 0.05) & (DDA_and_DIA$P.Value.x < 0.05),  ]

Diff_ab_DDA_and_DIA %>% head()

   Protein    logFC.x AveExpr.x       t.x    P.Value.x adj.P.Val.x        B.x
15  E9PAV3 -0.4199626  25.78583 -2.623718 0.0263003158  0.15505273 -3.8867830
40  O00442  0.3236586  19.81594  2.305421 0.0449252846  0.20297352 -4.4036650
41  O00461 -0.4757965  22.78785 -5.253654 0.0004278982  0.01540433  0.2552059
48  O00541 -2.3190461  20.50036 -4.824843 0.0007877502  0.02117750 -0.3639773
51  O00571 -0.5290939  26.05640 -4.666530 0.0009934630  0.02334638 -0.5995461
67  O14656  0.4511342  20.15187  3.168577 0.0105478677  0.09013632 -2.9849006
        X    logFC.y AveExpr.y       t.y    P.Value.y  adj.P.Val.y       B.y
15 E9PAV3 -0.3998172  18.17394 -3.300592 1.587372e-02 1.587372e-02 -3.343002
40 O00442  0.5893604  17.23225  4.133940 5.836938e-03 5.836938e-03 -2.269778
41 O00461 -0.2739090  18.74024 -2.919348 2.575340e-02 2.575340e-02 -3.860469
48 O00541 -1.0325875  20.20522 -9.474677 7.036491e-05 7.036491e-05  2.402861
51 O00571 -0.5036178  22.30600 -3.113365 2.025297e-02 2.025297e-02 -3.599754
67 O14656  0.4239736  20.26135  2.960466 2.470779e-02 2.470779e-02 -3.810087
   threshold_P
15        TRUE
40        TRUE
41        TRUE
48        TRUE
51        TRUE
67        TRUE

Diff_ab_DDA_and_DIA %>% dim()

[1] 428  15

model_LogFC <- lm(logFC.x ~ logFC.y, data = Diff_ab_DDA_and_DIA)

# Get R-squared
r2 <- summary(model_LogFC)$r.squared
summary(model_LogFC)


Call:
lm(formula = logFC.x ~ logFC.y, data = Diff_ab_DDA_and_DIA)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.04668 -0.11188  0.05095  0.15284  1.43115 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.07325    0.01908  -3.838 0.000143 ***
logFC.y      1.06036    0.02644  40.108  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3647 on 426 degrees of freedom
Multiple R-squared:  0.7906,    Adjusted R-squared:  0.7901 
F-statistic:  1609 on 1 and 426 DF,  p-value: < 2.2e-16

#------ Plot for both same sign and not same sign
ggplot2::ggplot(Diff_ab_DDA_and_DIA, aes(x = logFC.x, y = logFC.y)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  annotate("text", x = Inf, y = Inf, vjust = 2, hjust = 2,
           label = paste("R^2 = ", round(r2, digits = 2), sep = ""),
           color = "black") +
  scale_color_manual(values = c("red", "blue")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'

LogFC corr. for non-DA

# Model differentially abundant proteins that are NOT differentially abundant for DIA AND DDA methods 
# Fit linear model
DDA_and_DIA %>% head()

     Protein     logFC.x AveExpr.x        t.x  P.Value.x adj.P.Val.x       B.x
1 A0A0B4J2D5  0.20404967  28.45442  1.9017522 0.08766216   0.2869147 -5.030079
2     A0FGR8  0.16807616  23.26372  0.7831563 0.45246181   0.6682978 -6.371292
3     A1A4S6  1.25691548  19.09886  1.3350552 0.21271597   0.4479346 -5.806614
4     A1L0T0 -0.25213697  24.34430 -2.3804932 0.03960844   0.1911431 -4.283075
5     A2RRP1 -0.30334810  24.20335 -2.0043164 0.07410958   0.2637085 -4.875179
6     A3KMH1 -0.02219401  25.50247 -0.1750399 0.86469240   0.9318851 -6.677851
           X     logFC.y AveExpr.y         t.y P.Value.y adj.P.Val.y       B.y
1 A0A0B4J2D5  0.13528476  23.88276  1.31734301 0.2349365   0.2349365 -6.046336
2     A0FGR8  0.09891217  20.14964  1.18755311 0.2786322   0.2786322 -6.195353
3     A1A4S6  0.14066972  16.85673  1.35338152 0.2233311   0.2233311 -6.003703
4     A1L0T0 -0.11886476  21.50372 -1.89930794 0.1048450   0.1048450 -5.294689
5     A2RRP1  0.01039713  21.36635  0.09930159 0.9240748   0.9240748 -6.919855
6     A3KMH1  0.03750700  22.53324  0.22802321 0.8270631   0.8270631 -6.895697
  threshold_P
1       FALSE
2       FALSE
3       FALSE
4       FALSE
5       FALSE
6       FALSE

Diff_ab_DDA_and_DIA  <- DDA_and_DIA[(DDA_and_DIA$adj.P.Val.y > 0.05) & (DDA_and_DIA$P.Value.x > 0.05),  ]

Diff_ab_DDA_and_DIA %>% head()

     Protein     logFC.x AveExpr.x        t.x  P.Value.x adj.P.Val.x       B.x
1 A0A0B4J2D5  0.20404967  28.45442  1.9017522 0.08766216   0.2869147 -5.030079
2     A0FGR8  0.16807616  23.26372  0.7831563 0.45246181   0.6682978 -6.371292
3     A1A4S6  1.25691548  19.09886  1.3350552 0.21271597   0.4479346 -5.806614
5     A2RRP1 -0.30334810  24.20335 -2.0043164 0.07410958   0.2637085 -4.875179
6     A3KMH1 -0.02219401  25.50247 -0.1750399 0.86469240   0.9318851 -6.677851
7     A4D1E9 -0.29082103  21.22996 -1.6046912 0.14098100   0.3630873 -5.456428
           X    logFC.y AveExpr.y        t.y P.Value.y adj.P.Val.y       B.y
1 A0A0B4J2D5 0.13528476  23.88276 1.31734301 0.2349365   0.2349365 -6.046336
2     A0FGR8 0.09891217  20.14964 1.18755311 0.2786322   0.2786322 -6.195353
3     A1A4S6 0.14066972  16.85673 1.35338152 0.2233311   0.2233311 -6.003703
5     A2RRP1 0.01039713  21.36635 0.09930159 0.9240748   0.9240748 -6.919855
6     A3KMH1 0.03750700  22.53324 0.22802321 0.8270631   0.8270631 -6.895697
7     A4D1E9 0.08757962  22.56991 0.56521074 0.5920505   0.5920505 -6.746126
  threshold_P
1       FALSE
2       FALSE
3       FALSE
5       FALSE
6       FALSE
7       FALSE

Diff_ab_DDA_and_DIA %>% dim()

[1] 1909   15

model_LogFC <- lm(logFC.x ~ logFC.y, data = Diff_ab_DDA_and_DIA)

# Get R-squared
r2 <- summary(model_LogFC)$r.squared
summary(model_LogFC)


Call:
lm(formula = logFC.x ~ logFC.y, data = Diff_ab_DDA_and_DIA)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.22341 -0.10389  0.01458  0.12212  1.21599 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.028204   0.005237  -5.386  8.1e-08 ***
logFC.y      0.491414   0.026258  18.715  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.224 on 1907 degrees of freedom
Multiple R-squared:  0.1552,    Adjusted R-squared:  0.1547 
F-statistic: 350.3 on 1 and 1907 DF,  p-value: < 2.2e-16

#------ Plot for both same sign and not same sign
ggplot2::ggplot(Diff_ab_DDA_and_DIA, aes(x = logFC.x, y = logFC.y)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  annotate("text", x = Inf, y = Inf, vjust = 2, hjust = 2,
           label = paste("R^2 = ", round(r2, digits = 2), sep = ""),
           color = "black") +
  scale_color_manual(values = c("red", "blue")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'

dev.off()

null device 
          1

Compare protein list

# How many peptides overlap between DDA and DIA methods? 
DDA_and_DIA %>% nrow()

[1] 3058

# Load library
library(VennDiagram)

Loading required package: grid

Loading required package: futile.logger

# Define protein sets
ALL_DDA <- DDA_Toptable$Protein
ALL_DIA <- DIA_Toptable$Protein
  


# Generate Venn diagram
venn.plot <- venn.diagram(
  x = list(ALL_DDA,ALL_DIA),
  filename = NULL,
  category.names = c("ALL_DDA", "ALL_DIA"),
  output = TRUE
)
grid.draw(venn.plot)

dev.off()

null device 
          1

Compare DA lists

# How many differentially abundant peptides overlap between DDA and DIA methods? 

# Load library
library(VennDiagram)
 
# Define protein sets
DE_DDA <- DDA_Toptable[DDA_Toptable$P.Value < 0.05, ]$Protein
DE_DIA <- DIA_Toptable[DIA_Toptable$P.Value < 0.05, ]$Protein
  


# Generate Venn diagram
venn.plot <- venn.diagram(
  x = list(DE_DDA,DE_DIA),
  filename = NULL,
  category.names = c("DEL_DDA", "DE_DIA"),
  output = TRUE
)
grid.draw(venn.plot)

Common_DE_DDA_DIA <- intersect(DE_DDA,DE_DIA)

Common_DE_DDA_DIA_DF <- DDA_and_DIA[DDA_and_DIA$Protein %in% Common_DE_DDA_DIA, ]

S4B DA protein LogFC corr. in both

DE_DDA <- DDA_Toptable[DDA_Toptable$P.Value < 0.05, ]$Protein
DE_DIA <- DIA_Toptable[DIA_Toptable$P.Value < 0.05, ]$Protein
Common_DE_DDA_DIA <- intersect(DE_DDA,DE_DIA)
Common_DE_DDA_DIA_DF <- DDA_and_DIA[DDA_and_DIA$Protein %in% Common_DE_DDA_DIA, ]
model_LogFC <- lm(logFC.x ~ logFC.y, data = Common_DE_DDA_DIA_DF)

# Get R-squared
r2 <- summary(model_LogFC)$r.squared
summary(model_LogFC)


Call:
lm(formula = logFC.x ~ logFC.y, data = Common_DE_DDA_DIA_DF)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.04668 -0.11188  0.05095  0.15284  1.43115 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.07325    0.01908  -3.838 0.000143 ***
logFC.y      1.06036    0.02644  40.108  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3647 on 426 degrees of freedom
Multiple R-squared:  0.7906,    Adjusted R-squared:  0.7901 
F-statistic:  1609 on 1 and 426 DF,  p-value: < 2.2e-16

#------ Plot for both same sign and not same sign
ggplot2::ggplot(Common_DE_DDA_DIA_DF, aes(x = logFC.x, y = logFC.y)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  annotate("text", x = Inf, y = Inf, vjust = 2, hjust = 2,
           label = paste("R^2 = ", round(r2, digits = 2), sep = ""),
           color = "black") +
  scale_color_manual(values = c("red", "blue")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'

Correlate abundance values

Meta

    Samples  Ind  Rep    Cond     Cond_Ind
S1       S1 B_77  Bio     Dox     B_77_Dox
S3       S3 C_87  Bio     Dox     C_87_Dox
S5       S5 A_48 Tech     Dox     A_48_Dox
S7       S7 A_48 Tech     Dox     A_48_Dox
S9       S9 A_48 Tech     Dox     A_48_Dox
S2       S2 B_77  Bio Control B_77_Control
S4       S4 C_87  Bio Control C_87_Control
S6       S6 A_48 Tech Control A_48_Control
S8       S8 A_48 Tech Control A_48_Control
S10     S10 A_48 Tech Control A_48_Control

DDA_Ab %>% head()

  Entrez.Gene.ID Ensembl.Gene.ID Gene.Symbol Accession
1           4625 ENSG00000092054        MYH7    P12883
2           4624 ENSG00000197616        MYH6    P13533
3           7273 ENSG00000155657         TTN    Q8WZ42
4             70 ENSG00000159251       ACTC1    P68032
5             58 ENSG00000143632       ACTA1    P68133
6             60 ENSG00000075624        ACTB    P60709
  Abundance..F1..Sample..Dox..n.a Abundance..F3..Sample..Dox..n.a
1                        5.25e+10                        7.41e+10
2                        1.00e+09                        1.47e+09
3                        1.39e+10                        2.08e+10
4                        3.92e+10                        4.11e+10
5                        1.72e+07                        5.81e+07
6                        4.02e+09                        8.43e+08
  Abundance..F5..Sample..Dox..Dox Abundance..F7..Sample..Dox..Dox
1                        5.28e+10                        6.82e+10
2                        5.11e+09                        4.25e+09
3                        2.19e+10                        2.49e+10
4                        3.76e+10                        3.54e+10
5                        3.85e+07                        5.07e+07
6                        2.04e+09                        1.62e+09
  Abundance..F9..Sample..Dox..Dox Abundance..F2..Control..Veh..n.a
1                        6.43e+10                         4.33e+10
2                        2.62e+09                         1.50e+09
3                        2.34e+10                         1.29e+10
4                        3.69e+10                         3.54e+10
5                        6.19e+07                         1.64e+07
6                        1.61e+09                         4.16e+09
  Abundance..F4..Control..Veh..n.a Abundance..F6..Control..Veh..Veh
1                         7.41e+10                         7.00e+10
2                         1.17e+09                         2.42e+09
3                         2.25e+10                         2.52e+10
4                         2.96e+10                         3.56e+10
5                         3.44e+07                         5.42e+07
6                         6.45e+08                         1.92e+09
  Abundance..F8..Control..Veh..Veh Abundance..F10..Control..Veh..Veh
1                         5.64e+10                          5.26e+10
2                         3.67e+09                          3.27e+09
3                         2.46e+10                          2.21e+10
4                         2.98e+10                          2.91e+10
5                         3.36e+07                          4.05e+07
6                         1.81e+09                          1.56e+09

DDA_Ab %>% dim()

[1] 4504   14

DIA_Ab %>% head()

  Protein.Group Protein.Ids Protein.Names   Genes First.Protein.Description
1    A0A0B4J2A2  A0A0B4J2A2            NA PPIAL4C                        NA
2    A0A0B4J2D5  A0A0B4J2D5            NA  GATD3B                        NA
3    A0A494C071  A0A494C071            NA   PWWP4                        NA
4        A0AVT1      A0AVT1            NA    UBA6                        NA
5        A0FGR8      A0FGR8            NA   ESYT2                        NA
6        A0JLT2      A0JLT2            NA   MED19                        NA
          S1       S3       S5       S7       S9         S2       S4       S6
1   979954.0  1470000  1460000  1560000  1700000   940535.0  2020000  1460000
2 16000000.0 17000000 15800000 14400000 15500000 15300000.0 13500000 13600000
3   381352.0   429735   287237   412285   380499   433456.0   540008   376304
4    62810.7   135522   159045   123045   123823    49388.1   144860   127488
5  1240000.0  1120000   969130  1040000  1020000  1120000.0  1120000  1080000
6   217977.0   181404   182657   145408   208660   224881.0   202427   160150
        S8      S10
1  1390000  1620000
2 13200000 12600000
3   460580   355701
4    84684   102306
5  1120000  1060000
6   126127   209726

DIA_Ab %>% dim()

[1] 4261   15

DDA_Ab %>% colnames()

 [1] "Entrez.Gene.ID"                    "Ensembl.Gene.ID"                  
 [3] "Gene.Symbol"                       "Accession"                        
 [5] "Abundance..F1..Sample..Dox..n.a"   "Abundance..F3..Sample..Dox..n.a"  
 [7] "Abundance..F5..Sample..Dox..Dox"   "Abundance..F7..Sample..Dox..Dox"  
 [9] "Abundance..F9..Sample..Dox..Dox"   "Abundance..F2..Control..Veh..n.a" 
[11] "Abundance..F4..Control..Veh..n.a"  "Abundance..F6..Control..Veh..Veh" 
[13] "Abundance..F8..Control..Veh..Veh"  "Abundance..F10..Control..Veh..Veh"

# Subset the columns referring to abundance
DDA_Ab <-DDA_Ab[,4:14]
DDA_Ab %>% head()

  Accession Abundance..F1..Sample..Dox..n.a Abundance..F3..Sample..Dox..n.a
1    P12883                        5.25e+10                        7.41e+10
2    P13533                        1.00e+09                        1.47e+09
3    Q8WZ42                        1.39e+10                        2.08e+10
4    P68032                        3.92e+10                        4.11e+10
5    P68133                        1.72e+07                        5.81e+07
6    P60709                        4.02e+09                        8.43e+08
  Abundance..F5..Sample..Dox..Dox Abundance..F7..Sample..Dox..Dox
1                        5.28e+10                        6.82e+10
2                        5.11e+09                        4.25e+09
3                        2.19e+10                        2.49e+10
4                        3.76e+10                        3.54e+10
5                        3.85e+07                        5.07e+07
6                        2.04e+09                        1.62e+09
  Abundance..F9..Sample..Dox..Dox Abundance..F2..Control..Veh..n.a
1                        6.43e+10                         4.33e+10
2                        2.62e+09                         1.50e+09
3                        2.34e+10                         1.29e+10
4                        3.69e+10                         3.54e+10
5                        6.19e+07                         1.64e+07
6                        1.61e+09                         4.16e+09
  Abundance..F4..Control..Veh..n.a Abundance..F6..Control..Veh..Veh
1                         7.41e+10                         7.00e+10
2                         1.17e+09                         2.42e+09
3                         2.25e+10                         2.52e+10
4                         2.96e+10                         3.56e+10
5                         3.44e+07                         5.42e+07
6                         6.45e+08                         1.92e+09
  Abundance..F8..Control..Veh..Veh Abundance..F10..Control..Veh..Veh
1                         5.64e+10                          5.26e+10
2                         3.67e+09                          3.27e+09
3                         2.46e+10                          2.21e+10
4                         2.98e+10                          2.91e+10
5                         3.36e+07                          4.05e+07
6                         1.81e+09                          1.56e+09

colnames(DDA_Ab) <- c("Accession", "S1" , "S3" , "S5" , "S7", "S9", "S2", "S4", "S6" , "S8", "S10")
DDA_Ab %>% head()

  Accession       S1       S3       S5       S7       S9       S2       S4
1    P12883 5.25e+10 7.41e+10 5.28e+10 6.82e+10 6.43e+10 4.33e+10 7.41e+10
2    P13533 1.00e+09 1.47e+09 5.11e+09 4.25e+09 2.62e+09 1.50e+09 1.17e+09
3    Q8WZ42 1.39e+10 2.08e+10 2.19e+10 2.49e+10 2.34e+10 1.29e+10 2.25e+10
4    P68032 3.92e+10 4.11e+10 3.76e+10 3.54e+10 3.69e+10 3.54e+10 2.96e+10
5    P68133 1.72e+07 5.81e+07 3.85e+07 5.07e+07 6.19e+07 1.64e+07 3.44e+07
6    P60709 4.02e+09 8.43e+08 2.04e+09 1.62e+09 1.61e+09 4.16e+09 6.45e+08
        S6       S8      S10
1 7.00e+10 5.64e+10 5.26e+10
2 2.42e+09 3.67e+09 3.27e+09
3 2.52e+10 2.46e+10 2.21e+10
4 3.56e+10 2.98e+10 2.91e+10
5 5.42e+07 3.36e+07 4.05e+07
6 1.92e+09 1.81e+09 1.56e+09

# Identify duplicated values
DDA_Ab$duplicated_name <- duplicated(DDA_Ab$Accession)

# This will return a logical vector where TRUE indicates the position of duplicates in the column.
duplicated_rows <- DDA_Ab[DDA_Ab$duplicated_name == TRUE, ]
print(duplicated_rows)

    Accession       S1       S3       S5       S7       S9       S2       S4
61     P04264 3.87e+08 1.45e+08 2.58e+08 7.07e+08 3.17e+08 1.19e+10 1.11e+09
276    P35908       NA       NA       NA       NA       NA 5.31e+06 1.14e+06
997    P04040       NA       NA       NA       NA       NA       NA       NA
          S6       S8      S10 duplicated_name
61  8.03e+08 4.61e+08 5.45e+08            TRUE
276       NA 5.00e+05 4.90e+05            TRUE
997       NA       NA       NA            TRUE

duplicated_rows %>% rownames()

[1] "61"  "276" "997"

# remove duplicated rows
DDA_Ab <- DDA_Ab[-c(61 , 276, 997), ]

# Make rownames of the data frame accession IDs
rownames(DDA_Ab) <- DDA_Ab$Accession

# Change colnames 
colnames(DDA_Ab)

 [1] "Accession"       "S1"              "S3"              "S5"             
 [5] "S7"              "S9"              "S2"              "S4"             
 [9] "S6"              "S8"              "S10"             "duplicated_name"

DDA_Ab <- DDA_Ab[, -c(1, 12)]


DDA_Ab %>% head()

             S1       S3       S5       S7       S9       S2       S4       S6
P12883 5.25e+10 7.41e+10 5.28e+10 6.82e+10 6.43e+10 4.33e+10 7.41e+10 7.00e+10
P13533 1.00e+09 1.47e+09 5.11e+09 4.25e+09 2.62e+09 1.50e+09 1.17e+09 2.42e+09
Q8WZ42 1.39e+10 2.08e+10 2.19e+10 2.49e+10 2.34e+10 1.29e+10 2.25e+10 2.52e+10
P68032 3.92e+10 4.11e+10 3.76e+10 3.54e+10 3.69e+10 3.54e+10 2.96e+10 3.56e+10
P68133 1.72e+07 5.81e+07 3.85e+07 5.07e+07 6.19e+07 1.64e+07 3.44e+07 5.42e+07
P60709 4.02e+09 8.43e+08 2.04e+09 1.62e+09 1.61e+09 4.16e+09 6.45e+08 1.92e+09
             S8      S10
P12883 5.64e+10 5.26e+10
P13533 3.67e+09 3.27e+09
Q8WZ42 2.46e+10 2.21e+10
P68032 2.98e+10 2.91e+10
P68133 3.36e+07 4.05e+07
P60709 1.81e+09 1.56e+09

DDA_Ab %>% dim()

[1] 4501   10

DDA_Ab <- na.omit(DDA_Ab)
DDA_Ab %>% dim()

[1] 3384   10

DIA_Ab %>% head()

  Protein.Group Protein.Ids Protein.Names   Genes First.Protein.Description
1    A0A0B4J2A2  A0A0B4J2A2            NA PPIAL4C                        NA
2    A0A0B4J2D5  A0A0B4J2D5            NA  GATD3B                        NA
3    A0A494C071  A0A494C071            NA   PWWP4                        NA
4        A0AVT1      A0AVT1            NA    UBA6                        NA
5        A0FGR8      A0FGR8            NA   ESYT2                        NA
6        A0JLT2      A0JLT2            NA   MED19                        NA
          S1       S3       S5       S7       S9         S2       S4       S6
1   979954.0  1470000  1460000  1560000  1700000   940535.0  2020000  1460000
2 16000000.0 17000000 15800000 14400000 15500000 15300000.0 13500000 13600000
3   381352.0   429735   287237   412285   380499   433456.0   540008   376304
4    62810.7   135522   159045   123045   123823    49388.1   144860   127488
5  1240000.0  1120000   969130  1040000  1020000  1120000.0  1120000  1080000
6   217977.0   181404   182657   145408   208660   224881.0   202427   160150
        S8      S10
1  1390000  1620000
2 13200000 12600000
3   460580   355701
4    84684   102306
5  1120000  1060000
6   126127   209726

DIA_Ab %>% colnames()

 [1] "Protein.Group"             "Protein.Ids"              
 [3] "Protein.Names"             "Genes"                    
 [5] "First.Protein.Description" "S1"                       
 [7] "S3"                        "S5"                       
 [9] "S7"                        "S9"                       
[11] "S2"                        "S4"                       
[13] "S6"                        "S8"                       
[15] "S10"

# Subset the columns referring to abundance
DIA_Ab <-DIA_Ab[,c(2, 6:15)]
DIA_Ab %>% head()

  Protein.Ids         S1       S3       S5       S7       S9         S2
1  A0A0B4J2A2   979954.0  1470000  1460000  1560000  1700000   940535.0
2  A0A0B4J2D5 16000000.0 17000000 15800000 14400000 15500000 15300000.0
3  A0A494C071   381352.0   429735   287237   412285   380499   433456.0
4      A0AVT1    62810.7   135522   159045   123045   123823    49388.1
5      A0FGR8  1240000.0  1120000   969130  1040000  1020000  1120000.0
6      A0JLT2   217977.0   181404   182657   145408   208660   224881.0
        S4       S6       S8      S10
1  2020000  1460000  1390000  1620000
2 13500000 13600000 13200000 12600000
3   540008   376304   460580   355701
4   144860   127488    84684   102306
5  1120000  1080000  1120000  1060000
6   202427   160150   126127   209726

colnames(DIA_Ab) <- c("Accession", "S1" , "S3" , "S5" , "S7", "S9", "S2", "S4", "S6" , "S8", "S10")
DIA_Ab %>% head()

   Accession         S1       S3       S5       S7       S9         S2       S4
1 A0A0B4J2A2   979954.0  1470000  1460000  1560000  1700000   940535.0  2020000
2 A0A0B4J2D5 16000000.0 17000000 15800000 14400000 15500000 15300000.0 13500000
3 A0A494C071   381352.0   429735   287237   412285   380499   433456.0   540008
4     A0AVT1    62810.7   135522   159045   123045   123823    49388.1   144860
5     A0FGR8  1240000.0  1120000   969130  1040000  1020000  1120000.0  1120000
6     A0JLT2   217977.0   181404   182657   145408   208660   224881.0   202427
        S6       S8      S10
1  1460000  1390000  1620000
2 13600000 13200000 12600000
3   376304   460580   355701
4   127488    84684   102306
5  1080000  1120000  1060000
6   160150   126127   209726

# Identify duplicated values
DIA_Ab$duplicated_name <- duplicated(DIA_Ab$Accession)

# This will return a logical vector where TRUE indicates the position of duplicates in the column.
duplicated_rows <- DIA_Ab[DIA_Ab$duplicated_name == TRUE, ]
print(duplicated_rows)

 [1] Accession       S1              S3              S5             
 [5] S7              S9              S2              S4             
 [9] S6              S8              S10             duplicated_name
<0 rows> (or 0-length row.names)

duplicated_rows %>% rownames()

character(0)

# remove duplicated rows-None to remove for DIA 


# Make rownames of the data frame accession IDs
rownames(DIA_Ab) <- DIA_Ab$Accession

# Change colnames 
colnames(DIA_Ab)

 [1] "Accession"       "S1"              "S3"              "S5"             
 [5] "S7"              "S9"              "S2"              "S4"             
 [9] "S6"              "S8"              "S10"             "duplicated_name"

DIA_Ab <- DIA_Ab[, -c(1, 12)]


DIA_Ab <- na.omit(DIA_Ab)



Meta

    Samples  Ind  Rep    Cond     Cond_Ind
S1       S1 B_77  Bio     Dox     B_77_Dox
S3       S3 C_87  Bio     Dox     C_87_Dox
S5       S5 A_48 Tech     Dox     A_48_Dox
S7       S7 A_48 Tech     Dox     A_48_Dox
S9       S9 A_48 Tech     Dox     A_48_Dox
S2       S2 B_77  Bio Control B_77_Control
S4       S4 C_87  Bio Control C_87_Control
S6       S6 A_48 Tech Control A_48_Control
S8       S8 A_48 Tech Control A_48_Control
S10     S10 A_48 Tech Control A_48_Control

# DDA_Ab_Veh <- DDA_Ab[, c(6:10)]
# DIA_Ab_Veh <- DIA_Ab[,c(6:10)]

DDA_Ab_Veh <- DDA_Ab
DIA_Ab_Veh <- DIA_Ab

DDA_Ab_Veh %>% head()

             S1       S3       S5       S7       S9       S2       S4       S6
P12883 5.25e+10 7.41e+10 5.28e+10 6.82e+10 6.43e+10 4.33e+10 7.41e+10 7.00e+10
P13533 1.00e+09 1.47e+09 5.11e+09 4.25e+09 2.62e+09 1.50e+09 1.17e+09 2.42e+09
Q8WZ42 1.39e+10 2.08e+10 2.19e+10 2.49e+10 2.34e+10 1.29e+10 2.25e+10 2.52e+10
P68032 3.92e+10 4.11e+10 3.76e+10 3.54e+10 3.69e+10 3.54e+10 2.96e+10 3.56e+10
P68133 1.72e+07 5.81e+07 3.85e+07 5.07e+07 6.19e+07 1.64e+07 3.44e+07 5.42e+07
P60709 4.02e+09 8.43e+08 2.04e+09 1.62e+09 1.61e+09 4.16e+09 6.45e+08 1.92e+09
             S8      S10
P12883 5.64e+10 5.26e+10
P13533 3.67e+09 3.27e+09
Q8WZ42 2.46e+10 2.21e+10
P68032 2.98e+10 2.91e+10
P68133 3.36e+07 4.05e+07
P60709 1.81e+09 1.56e+09

DIA_Ab_Veh %>% head()

                   S1       S3       S5       S7       S9         S2       S4
A0A0B4J2A2   979954.0  1470000  1460000  1560000  1700000   940535.0  2020000
A0A0B4J2D5 16000000.0 17000000 15800000 14400000 15500000 15300000.0 13500000
A0A494C071   381352.0   429735   287237   412285   380499   433456.0   540008
A0AVT1        62810.7   135522   159045   123045   123823    49388.1   144860
A0FGR8      1240000.0  1120000   969130  1040000  1020000  1120000.0  1120000
A0JLT2       217977.0   181404   182657   145408   208660   224881.0   202427
                 S6       S8      S10
A0A0B4J2A2  1460000  1390000  1620000
A0A0B4J2D5 13600000 13200000 12600000
A0A494C071   376304   460580   355701
A0AVT1       127488    84684   102306
A0FGR8      1080000  1120000  1060000
A0JLT2       160150   126127   209726

DDA_Ab_Veh_Avg <- rowMeans(DDA_Ab_Veh)
DDA_Ab_Veh_Avg_DF <- as.data.frame(DDA_Ab_Veh_Avg)
DDA_Ab_Veh_Avg_DF$Pro <- rownames(DDA_Ab_Veh_Avg_DF)
DDA_Ab_Veh_Avg_DF %>% head()

       DDA_Ab_Veh_Avg    Pro
P12883     6.0830e+10 P12883
P13533     2.6480e+09 P13533
Q8WZ42     2.1220e+10 Q8WZ42
P68032     3.4970e+10 P68032
P68133     4.0550e+07 P68133
P60709     2.0228e+09 P60709

DIA_Ab_Veh_Avg <- rowMeans(DIA_Ab_Veh)
DIA_Ab_Veh_Avg_DF <- as.data.frame(DIA_Ab_Veh_Avg)
DIA_Ab_Veh_Avg_DF$Pro <- rownames(DIA_Ab_Veh_Avg_DF)
DIA_Ab_Veh_Avg_DF %>% head()

           DIA_Ab_Veh_Avg        Pro
A0A0B4J2A2      1460048.9 A0A0B4J2A2
A0A0B4J2D5     14690000.0 A0A0B4J2D5
A0A494C071       405715.7 A0A494C071
A0AVT1           111297.2     A0AVT1
A0FGR8          1088913.0     A0FGR8
A0JLT2           185941.7     A0JLT2

Merged_Veh_Avg <- merge(DDA_Ab_Veh_Avg_DF, DIA_Ab_Veh_Avg_DF, by = "Pro")
Merged_Veh_Avg %>% head()

         Pro DDA_Ab_Veh_Avg DIA_Ab_Veh_Avg
1 A0A0B4J2D5      369700000     14690000.0
2     A0FGR8       10417000      1088913.0
3     A1A4S6         815615       247943.3
4     A1L0T0       21560000      3611000.0
5     A2RRP1       19790000      1585000.0
6     A3KMH1       48710000      2353000.0

Merged_Veh_Avg %>% dim()

[1] 3026    3

Merged_Veh_Avg$DDA_Ab_Veh_Avg_log2 <- log2(Merged_Veh_Avg$DDA_Ab_Veh_Avg)
Merged_Veh_Avg$DIA_Ab_Veh_Avg_log2 <- log2(Merged_Veh_Avg$DIA_Ab_Veh_Avg)




model_Abundances <- lm(DDA_Ab_Veh_Avg_log2 ~ DIA_Ab_Veh_Avg_log2, data = Merged_Veh_Avg)

# Get R-squared
r2 <- summary(model_Abundances)$r.squared
summary(model_Abundances)


Call:
lm(formula = DDA_Ab_Veh_Avg_log2 ~ DIA_Ab_Veh_Avg_log2, data = Merged_Veh_Avg)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.1626  -0.7112   0.0494   0.7583   7.1270 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          0.37743    0.25002    1.51    0.131    
DIA_Ab_Veh_Avg_log2  1.12452    0.01197   93.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.238 on 3024 degrees of freedom
Multiple R-squared:  0.7449,    Adjusted R-squared:  0.7448 
F-statistic:  8830 on 1 and 3024 DF,  p-value: < 2.2e-16

#------ Plot for both same sign and not same sign
ggplot2::ggplot(Merged_Veh_Avg, aes(x = DDA_Ab_Veh_Avg_log2, y = DIA_Ab_Veh_Avg_log2)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  annotate("text", x = Inf, y = Inf, vjust = 2, hjust = 2,
           label = paste("R^2 = ", round(r2, digits = 2), sep = ""),
           color = "black") +
  scale_color_manual(values = c("red", "blue")) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'

sessionInfo()

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] VennDiagram_1.7.3   futile.logger_1.4.3 lubridate_1.9.2    
 [4] forcats_1.0.0       stringr_1.5.0       dplyr_1.1.2        
 [7] purrr_1.0.2         readr_2.1.4         tidyr_1.3.0        
[10] tibble_3.2.1        ggplot2_3.4.3       tidyverse_2.0.0    
[13] workflowr_1.7.1    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.11          lattice_0.21-8       getPass_0.2-2       
 [4] ps_1.7.5             rprojroot_2.0.3      digest_0.6.33       
 [7] utf8_1.2.3           R6_2.5.1             futile.options_1.0.1
[10] evaluate_0.21        httr_1.4.7           highr_0.10          
[13] pillar_1.9.0         rlang_1.1.1          rstudioapi_0.15.0   
[16] whisker_0.4.1        callr_3.7.3          jquerylib_0.1.4     
[19] Matrix_1.5-4.1       rmarkdown_2.24       labeling_0.4.2      
[22] splines_4.2.0        munsell_0.5.0        compiler_4.2.0      
[25] httpuv_1.6.11        xfun_0.40            pkgconfig_2.0.3     
[28] mgcv_1.9-0           htmltools_0.5.6      tidyselect_1.2.0    
[31] fansi_1.0.4          tzdb_0.4.0           withr_2.5.0         
[34] later_1.3.1          nlme_3.1-163         jsonlite_1.8.7      
[37] gtable_0.3.4         lifecycle_1.0.3      formatR_1.14        
[40] git2r_0.32.0         magrittr_2.0.3       scales_1.2.1        
[43] cli_3.6.1            stringi_1.7.12       cachem_1.0.8        
[46] farver_2.1.1         fs_1.6.3             promises_1.2.1      
[49] bslib_0.5.1          generics_0.1.3       vctrs_0.6.3         
[52] lambda.r_1.2.4       tools_4.2.0          glue_1.6.2          
[55] hms_1.1.3            processx_3.8.2       fastmap_1.1.1       
[58] yaml_2.3.7           timechange_0.2.0     colorspace_2.1-0    
[61] knitr_1.43           sass_0.4.7

DDA_DIA_Comparison

Omar Johnson

2024-07-23