Hi,
I have a question. I have obtained scores ranging from 0 to 10 from GPT-4 API prompts and from ChatGPT prompts, both run on the same biological processes. I would now like to correlate these two sets of scores, or measure their concordance, and I am looking for advice on how best to represent or correlate them. I tried a ggplot
heatmap (code below), but it wasn't very helpful: I could not relate each row (feature) to each column (process), and there are "Inf" values in the data frame. I would like to compare the scores for each feature in each process (column) and assess their concordance. I am not sure, but perhaps a heatmap coupled with a boxplot or violin plot would help in this case?
PS: log(Method_1 / Method_2) returns Inf where Method_2 is zero (and Method_1 is positive), -Inf where Method_1 is zero (and Method_2 is positive), and NaN where both values are zero.
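To illustrate the point above, here is a minimal sketch with made-up toy vectors (`m1` and `m2` are hypothetical, not my real scores). It also shows one possible workaround, adding a pseudocount of 1 before taking the log. That pseudocount is only an assumption on my side and may not be the right choice for these scores.

```r
# Toy vectors (hypothetical values, chosen to hit every edge case)
m1 <- c(4, 3, 0, 0)
m2 <- c(2, 0, 5, 0)

# Plain log ratio: one finite value, then Inf, -Inf and NaN
raw <- log(m1 / m2)
print(raw)

# Assumed workaround: add a pseudocount so that zeros stay finite
pseudocount <- 1
shifted <- log((m1 + pseudocount) / (m2 + pseudocount))
print(shifted)
```

With the pseudocount, a pair of zeros gives a log ratio of exactly 0, which at least reads as "no difference" on the heatmap colour scale.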
The scores are stored in two R data frames; templates are given below:
API scores
dput(Method_1)
#> A_Process B_Process C_Process D_Process E_Process F_Process G_Process
#> Feature_1 9 5 0 5 7 0 3
#> Feature_2 2 6 2 4 7 7 4
#> Feature_3 0 0 0 0 0 0 2
#> Feature_4 2 4 1 1 7 0 2
#> Feature_5 0 6 0 0 6 7 2
#> H_Process I_Process J_Process
#> Feature_1 4 6 2
#> Feature_2 0 0 0
#> Feature_3 5 7 7
#> Feature_4 3 5 6
#> Feature_5 3 5 6
ChatGPT scores
dput(Method_2)
#> A_Process B_Process C_Process D_Process E_Process F_Process G_Process
#> Feature_1 1 5 2 3 7 2 1
#> Feature_2 9 6 1 3 6 7 0
#> Feature_3 9 7 2 6 6 2 3
#> Feature_4 0 8 6 6 8 7 4
#> Feature_5 8 5 2 2 6 6 3
#> H_Process I_Process J_Process
#> Feature_1 1 3 8
#> Feature_2 0 5 5
#> Feature_3 4 4 7
#> Feature_4 4 4 7
#> Feature_5 5 4 6
log_ratios <- log(Method_1 / Method_2)
# Heatmap visualization
library(ggplot2)
library(reshape2)
# Assuming 'log_ratios' is a dataframe where the rownames are features
log_ratios$Feature <- rownames(log_ratios)
log_ratios_melted <- melt(log_ratios, id.vars = "Feature")
# Now plotting
ggplot(log_ratios_melted, aes(variable, Feature, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0) +
  theme_minimal() +
  xlab("Process") +
  ylab("Feature") +
  ggtitle("Heatmap of Concordance")
ggplot(log_ratios_melted, aes(x = Feature, y = value)) +
  geom_violin(trim = FALSE) +
  facet_wrap(~ variable, scales = "free_y") + # Ensure 'variable' is the column with process names
  theme_minimal() +
  xlab("Feature") +
  ylab("Log Ratio") +
  ggtitle("Question wise distribution of concordance")
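Another direction I was considering, as a rough sketch only: rebuild the two tables from the values printed above, use a plain difference instead of the log ratio (so zeros do not produce Inf), and compute a Spearman correlation per feature. The helper `make_scores` and the names `score_diff`, `cor_by_feature` and `mad_by_process` are my own placeholders, and I am not sure Spearman is the most appropriate concordance measure here.

```r
# Rebuild the two score tables from the values printed above
processes <- paste0(LETTERS[1:10], "_Process")
features  <- paste0("Feature_", 1:5)

# Hypothetical helper: values are supplied row by row (one feature per row)
make_scores <- function(values) {
  m <- matrix(values, nrow = 5, byrow = TRUE,
              dimnames = list(features, processes))
  as.data.frame(m)
}

Method_1 <- make_scores(c(
  9, 5, 0, 5, 7, 0, 3, 4, 6, 2,
  2, 6, 2, 4, 7, 7, 4, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 2, 5, 7, 7,
  2, 4, 1, 1, 7, 0, 2, 3, 5, 6,
  0, 6, 0, 0, 6, 7, 2, 3, 5, 6))

Method_2 <- make_scores(c(
  1, 5, 2, 3, 7, 2, 1, 1, 3, 8,
  9, 6, 1, 3, 6, 7, 0, 0, 5, 5,
  9, 7, 2, 6, 6, 2, 3, 4, 4, 7,
  0, 8, 6, 6, 8, 7, 4, 4, 4, 7,
  8, 5, 2, 2, 6, 6, 3, 5, 4, 6))

# Simple difference: stays finite even when a score is zero
score_diff <- Method_1 - Method_2

# Spearman correlation between the two methods, one value per feature
# (each feature is compared across the 10 processes)
cor_by_feature <- sapply(features, function(f) {
  cor(unlist(Method_1[f, ]), unlist(Method_2[f, ]), method = "spearman")
})

# Mean absolute difference per process (0 would mean identical scores)
mad_by_process <- colMeans(abs(score_diff))

print(round(cor_by_feature, 2))
print(round(mad_by_process, 2))
```

If this makes sense, `score_diff` could be melted and passed to the same `geom_tile()` heatmap in place of `log_ratios`, since it contains no Inf or NaN values.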
Best Regards,
Abdul