How best to visualize, compare, or correlate scores obtained from the GPT-4 API and ChatGPT?

Hi,

I have a question. I have scores ranging from 0 to 10, obtained from GPT-4 API prompts and from ChatGPT prompts, both run on the same biological processes. I would now like to correlate these two sets of scores, or measure their concordance, and I am looking for advice on how best to represent or correlate them. I tried a ggplot heatmap (code below), but it wasn't very helpful: I could not tell how each row (feature) relates to each column (process), and the data frame contains "Inf" values. What I want is to correlate each feature with each process (column) and assess concordance. Perhaps a heatmap combined with a box plot or violin plot would help here, but I am not sure.
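One way to put a number on agreement, rather than reading it off a heatmap, is to compute for each process a rank correlation (Spearman) and Lin's concordance correlation coefficient (CCC) across the five features. This is a minimal sketch under the assumption that "concordance" means agreement on the same 0 to 10 scale; `lin_ccc()` is a hypothetical helper written out by hand (not a package function), and the scores are the two tables from this post. With only five features per process, and many tied integer scores, these coefficients will be very noisy.

```r
# Scores from the post: rows = features, columns = processes
procs <- paste0(LETTERS[1:10], "_Process")
feats <- paste0("Feature_", 1:5)
Method_1 <- matrix(c(9, 5, 0, 5, 7, 0, 3, 4, 6, 2,
                     2, 6, 2, 4, 7, 7, 4, 0, 0, 0,
                     0, 0, 0, 0, 0, 0, 2, 5, 7, 7,
                     2, 4, 1, 1, 7, 0, 2, 3, 5, 6,
                     0, 6, 0, 0, 6, 7, 2, 3, 5, 6),
                   nrow = 5, byrow = TRUE, dimnames = list(feats, procs))
Method_2 <- matrix(c(1, 5, 2, 3, 7, 2, 1, 1, 3, 8,
                     9, 6, 1, 3, 6, 7, 0, 0, 5, 5,
                     9, 7, 2, 6, 6, 2, 3, 4, 4, 7,
                     0, 8, 6, 6, 8, 7, 4, 4, 4, 7,
                     8, 5, 2, 2, 6, 6, 3, 5, 4, 6),
                   nrow = 5, byrow = TRUE, dimnames = list(feats, procs))

# Hypothetical helper: Lin's concordance correlation coefficient,
# using population (1/n) moments as in Lin's original definition.
# Unlike Pearson/Spearman, it penalizes a shift between the two methods.
lin_ccc <- function(x, y) {
  mx <- mean(x); my <- mean(y)
  sxy <- mean((x - mx) * (y - my))
  sx2 <- mean((x - mx)^2); sy2 <- mean((y - my)^2)
  2 * sxy / (sx2 + sy2 + (mx - my)^2)
}

# One row per process: agreement between the two methods across the features
agreement <- data.frame(
  process  = procs,
  spearman = sapply(procs, function(p)
    cor(Method_1[, p], Method_2[, p], method = "spearman")),
  ccc      = sapply(procs, function(p) lin_ccc(Method_1[, p], Method_2[, p])),
  row.names = NULL
)
print(agreement)

# Overall agreement across all 50 feature/process pairs
overall_ccc <- lin_ccc(as.vector(Method_1), as.vector(Method_2))
print(overall_ccc)
```

The same `sapply()` pattern over `feats` and rows (`Method_1[f, ]`) gives per-feature agreement across the ten processes instead.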

PS: log(Method_1 / Method_2) returns Inf where only Method_2 is zero, -Inf where only Method_1 is zero, and NaN where both are zero.
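A common workaround for the Inf/NaN problem is to add a pseudocount to both scores before taking the ratio, so that log((Method_1 + 1) / (Method_2 + 1)) is finite for any non-negative scores. The pseudocount of 1 is an assumption, not a standard value, and it shrinks ratios between small scores. The example pairs below are taken from the tables in this post (API first, ChatGPT second). Another option is the plain difference Method_1 - Method_2, which is always finite on a 0 to 10 scale.

```r
# Example (API, ChatGPT) score pairs from the tables, including zeros
api     <- c(9, 0, 0, 5)
chatgpt <- c(1, 9, 0, 5)

# Raw log ratio: -Inf where only the API score is 0, NaN where both are 0
raw_log_ratio <- log(api / chatgpt)
print(raw_log_ratio)

# Pseudocount of 1 (an assumed choice): finite for every pair of scores >= 0
pseudo_log_ratio <- log((api + 1) / (chatgpt + 1))
print(pseudo_log_ratio)

# Same quantity written with log1p(), which computes log(1 + x)
stopifnot(isTRUE(all.equal(pseudo_log_ratio, log1p(api) - log1p(chatgpt))))
```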

The scores are stored in two R data frames, shown below:

GPT-4 API scores (Method_1)

Method_1
#>           A_Process B_Process C_Process D_Process E_Process F_Process G_Process
#> Feature_1         9         5         0         5         7         0         3
#> Feature_2         2         6         2         4         7         7         4
#> Feature_3         0         0         0         0         0         0         2
#> Feature_4         2         4         1         1         7         0         2
#> Feature_5         0         6         0         0         6         7         2
#>           H_Process I_Process J_Process
#> Feature_1         4         6         2
#> Feature_2         0         0         0
#> Feature_3         5         7         7
#> Feature_4         3         5         6
#> Feature_5         3         5         6

ChatGPT scores (Method_2)

Method_2
#>           A_Process B_Process C_Process D_Process E_Process F_Process G_Process
#> Feature_1         1         5         2         3         7         2         1
#> Feature_2         9         6         1         3         6         7         0
#> Feature_3         9         7         2         6         6         2         3
#> Feature_4         0         8         6         6         8         7         4
#> Feature_5         8         5         2         2         6         6         3
#>           H_Process I_Process J_Process
#> Feature_1         1         3         8
#> Feature_2         0         5         5
#> Feature_3         4         4         7
#> Feature_4         4         4         7
#> Feature_5         5         4         6
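The tables above are printed data frames rather than dput() output. For anyone who wants to reproduce the problem, a minimal way to rebuild them is to paste the printed rows into read.table(); calling dput() on the result then gives the copy-pasteable form. The names Method_1 and Method_2 follow the post; read_scores() and hdr are hypothetical helpers introduced only for this sketch.

```r
# Column header shared by both tables: A_Process ... J_Process
hdr <- paste(paste0(LETTERS[1:10], "_Process"), collapse = " ")

# Hypothetical helper: because the header has one field fewer than the data
# rows, read.table() uses the first column (Feature_x) as the row names
read_scores <- function(rows) {
  read.table(text = paste(c(hdr, rows), collapse = "\n"), header = TRUE)
}

Method_1 <- read_scores(c(   # GPT-4 API scores
  "Feature_1 9 5 0 5 7 0 3 4 6 2",
  "Feature_2 2 6 2 4 7 7 4 0 0 0",
  "Feature_3 0 0 0 0 0 0 2 5 7 7",
  "Feature_4 2 4 1 1 7 0 2 3 5 6",
  "Feature_5 0 6 0 0 6 7 2 3 5 6"))

Method_2 <- read_scores(c(   # ChatGPT scores
  "Feature_1 1 5 2 3 7 2 1 1 3 8",
  "Feature_2 9 6 1 3 6 7 0 0 5 5",
  "Feature_3 9 7 2 6 6 2 3 4 4 7",
  "Feature_4 0 8 6 6 8 7 4 4 4 7",
  "Feature_5 8 5 2 2 6 6 3 5 4 6"))

# dput() prints R code that recreates the object exactly
dput(Method_1)
```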
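
Code I tried:

library(ggplot2)
library(reshape2)

# Element-wise log ratio of API to ChatGPT scores; this gives Inf, -Inf or NaN
# wherever one or both scores are zero (see the PS above)
log_ratios <- log(Method_1 / Method_2)

# Assumes Method_1 and Method_2 are data frames, so 'log_ratios' is one too;
# copy the feature row names into a column so melt() can use them as the id
log_ratios$Feature <- rownames(log_ratios)
log_ratios_melted <- melt(log_ratios, id.vars = "Feature")

# Heatmap: one tile per feature/process (red = API higher, blue = ChatGPT higher)
ggplot(log_ratios_melted, aes(variable, Feature, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0) +
  theme_minimal() +
  xlab("Process") +
  ylab("Feature") +
  ggtitle("Heatmap of Concordance")

# Violin plot: each feature has only one value per process, so group by
# process (x = variable) to get one distribution of five log ratios per process;
# non-finite values (Inf, NaN) are dropped by ggplot with a warning
ggplot(log_ratios_melted, aes(x = variable, y = value)) +
  geom_violin(trim = FALSE) +
  geom_jitter(width = 0.1) +
  theme_minimal() +
  xlab("Process") +
  ylab("Log ratio") +
  ggtitle("Process-wise distribution of concordance")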
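Because every score lies between 0 and 10, the plain difference API - ChatGPT never produces Inf, and printing it inside each tile makes it easy to see which feature/process cell is which. A paired scatter plot against the line y = x is another common way to show agreement between two raters. This sketch assumes ggplot2 (already used above) and rebuilds the scores as matrices; the object names api, chat, long, p_heat and p_scatter are illustrative only.

```r
library(ggplot2)

procs <- paste0(LETTERS[1:10], "_Process")
feats <- paste0("Feature_", 1:5)
# Same scores as in the tables (rows = features, columns = processes)
api <- matrix(c(9, 5, 0, 5, 7, 0, 3, 4, 6, 2,
                2, 6, 2, 4, 7, 7, 4, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 2, 5, 7, 7,
                2, 4, 1, 1, 7, 0, 2, 3, 5, 6,
                0, 6, 0, 0, 6, 7, 2, 3, 5, 6),
              nrow = 5, byrow = TRUE, dimnames = list(feats, procs))
chat <- matrix(c(1, 5, 2, 3, 7, 2, 1, 1, 3, 8,
                 9, 6, 1, 3, 6, 7, 0, 0, 5, 5,
                 9, 7, 2, 6, 6, 2, 3, 4, 4, 7,
                 0, 8, 6, 6, 8, 7, 4, 4, 4, 7,
                 8, 5, 2, 2, 6, 6, 3, 5, 4, 6),
               nrow = 5, byrow = TRUE, dimnames = list(feats, procs))

# Long format: one row per feature/process pair, both scores side by side
# (as.vector() reads a matrix column by column, i.e. process by process)
long <- data.frame(
  Feature = rep(feats, times = length(procs)),
  Process = rep(procs, each = length(feats)),
  API     = as.vector(api),
  ChatGPT = as.vector(chat)
)
# Difference is always finite on a 0-10 scale (range -10 to 10)
long$diff <- long$API - long$ChatGPT

# Heatmap of differences with the value written in each tile
p_heat <- ggplot(long, aes(Process, Feature, fill = diff)) +
  geom_tile(colour = "grey80") +
  geom_text(aes(label = diff)) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-10, 10)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(fill = "API - ChatGPT")

# Paired scatter: points on the dashed y = x line mean the methods agree;
# jittered slightly because many pairs share the same integer scores
p_scatter <- ggplot(long, aes(ChatGPT, API, colour = Process)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  geom_jitter(width = 0.15, height = 0.15) +
  coord_equal(xlim = c(0, 10), ylim = c(0, 10)) +
  theme_minimal()

print(p_heat)
print(p_scatter)
```

The long data frame is also a convenient input for facet_wrap(~ Process) or for box plots of diff per process.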

Best Regards,
Abdul