Testing Ada 002 and Other Embedders with Large Texts

Hi, community.

In my research, I needed to compare the semantic textual similarity of generated text to a ‘standard’ text. Specifically, I had GPT generate an answer, and wanted to compare the generated answer to the ideal (standard) answer. The problem for me, though, was that my generated and standard answers were around 500-600 tokens. Standard datasets like SemEval exist for sentence-based comparison, but to my knowledge no datasets exist for large bodies of text. To solve this issue, I used different translations of the Bible.

Above you see Mark Chapter 1 compared against different translations (i.e., American Standard Version, Basic Bible English, King James, New International, New KJV, New Living Translation, and World English Bible).

I then compared different lengths (“NIV 1st-10”), slightly altered versions of Mark 1, and an excerpt from the US Constitution as negative tests. Finally, I also compared two versions of the Bible in German (Hoffnung fuer Alle and Schlachter 2000) and two in Spanish.
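For anyone who wants to try the same kind of comparison, here is a minimal sketch of the scoring step. It assumes the embedding vectors have already been fetched from whichever model is under test, and that cosine similarity is the metric; the function names and the toy 3-dimensional vectors are made up for illustration and are not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_table(reference, candidates):
    """Score each named candidate vector against the reference, best first."""
    scores = {name: cosine_similarity(reference, vec)
              for name, vec in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
toy = {
    "translation_a": [0.9, 0.1, 0.0],
    "unrelated_text": [0.0, 0.2, 0.9],
}
ranking = similarity_table([1.0, 0.0, 0.0], toy)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With real embeddings, the reference would be one translation of Mark 1 and the candidates would be the other translations and the negative tests.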

I would love your feedback on this approach, and your thoughts on shortcomings and potential improvements. (Praise is also cautiously accepted. 🙂)


You did the work, always deserving praise.

It is an extremely specific investigation of one discrimination task, a comparison of multiple awkward old texts, and it doesn’t really generalize.

Another Bible embedding project online was insightful, letting you find who’s writing about dragons and unicorns (yes really), and who is plagiarizing other books wholesale (yes really).

To pursue this further, you might follow another line of inquiry I’ve been curious about: can ada benefit from sentence-wise embeddings of large text chunks for retrieval of the whole text? Would a weighted average (or other algorithm) of 20 runs stepping through the text have higher or lower metrics vs the single embedding run of the whole?
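A rough sketch of the pooling half of that idea, assuming the per-chunk vectors have already been fetched from the embedding model. Weighting each chunk by its token count is only one guess at a sensible weight, and `pooled_embedding` is a made-up name, not part of any library.

```python
import math


def pooled_embedding(chunk_vectors, weights=None):
    """Weighted mean of chunk vectors, rescaled to unit length."""
    if not chunk_vectors:
        raise ValueError("need at least one chunk vector")
    if weights is None:
        weights = [1.0] * len(chunk_vectors)  # plain average by default
    if len(weights) != len(chunk_vectors):
        raise ValueError("need exactly one weight per chunk vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    dim = len(chunk_vectors[0])
    mean = [0.0] * dim
    for vec, weight in zip(chunk_vectors, weights):
        if len(vec) != dim:
            raise ValueError("all chunk vectors must have the same length")
        for i, x in enumerate(vec):
            mean[i] += weight * x / total

    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0:
        raise ValueError("pooled vector is zero and cannot be normalized")
    return [x / norm for x in mean]


# Two toy chunk vectors; the weights stand in for hypothetical token counts.
pooled = pooled_embedding([[1.0, 0.0], [0.0, 1.0]], weights=[3, 1])
```

The pooled vector could then be compared against the single whole-text embedding on the same retrieval queries to see which one scores better.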

Praise

NLT seems suspiciously abnormal. These results aren’t really what I was expecting. I guess we’ll have to take a closer look and double check.

Hi! So I tried to reproduce your results, and the numbers generally check out. I couldn’t reproduce your low NLT scores, which I feel is a good thing. Maybe you mixed in a foreign-language version? lu17 here is the German Lutheran version.

I also tried a search retrieval of individual lines of NLT (split by newline):

mark1_nlt text

`mark1_nlt = """Mark 1

1 This is the Good News about Jesus the Messiah, the Son of God. It began
2 just as the prophet Isaiah had written: “Look, I am sending my messenger ahead of you, and he will prepare your way.
3 He is a voice shouting in the wilderness, ‘Prepare the way for the LORD ’s coming! Clear the road for him!’ ”
4 This messenger was John the Baptist. He was in the wilderness and preached that people should be baptized to show that they had repented of their sins and turned to God to be forgiven.
5 All of Judea, including all the people of Jerusalem, went out to see and hear John. And when they confessed their sins, he baptized them in the Jordan River.
6 His clothes were woven from coarse camel hair, and he wore a leather belt around his waist. For food he ate locusts and wild honey.
7 John announced: “Someone is coming soon who is greater than I am—so much greater that I’m not even worthy to stoop down like a slave and untie the straps of his sandals.
8 I baptize you with water, but he will baptize you with the Holy Spirit!”
9 One day Jesus came from Nazareth in Galilee, and John baptized him in the Jordan River.
10 As Jesus came up out of the water, he saw the heavens splitting apart and the Holy Spirit descending on him like a dove.
11 And a voice from heaven said, “You are my dearly loved Son, and you bring me great joy.”
12 The Spirit then compelled Jesus to go into the wilderness,
13 where he was tempted by Satan for forty days. He was out among the wild animals, and angels took care of him.
14 Later on, after John was arrested, Jesus went into Galilee, where he preached God’s Good News.
15 “The time promised by God has come at last!” he announced. “The Kingdom of God is near! Repent of your sins and believe the Good News!”
16 One day as Jesus was walking along the shore of the Sea of Galilee, he saw Simon and his brother Andrew throwing a net into the water, for they fished for a living.
17 Jesus called out to them, “Come, follow me, and I will show you how to fish for people!”
18 And they left their nets at once and followed him.
19 A little farther up the shore Jesus saw Zebedee’s sons, James and John, in a boat repairing their nets.
20 He called them at once, and they also followed him, leaving their father, Zebedee, in the boat with the hired men.
21 Jesus and his companions went to the town of Capernaum. When the Sabbath day came, he went into the synagogue and began to teach.
22 The people were amazed at his teaching, for he taught with real authority—quite unlike the teachers of religious law.
23 Suddenly, a man in the synagogue who was possessed by an evil spirit began shouting,
24 “Why are you interfering with us, Jesus of Nazareth? Have you come to destroy us? I know who you are—the Holy One of God!”
25 Jesus cut him short. “Be quiet! Come out of the man,” he ordered.
26 At that, the evil spirit screamed, threw the man into a convulsion, and then came out of him.
27 Amazement gripped the audience, and they began to discuss what had happened. “What sort of new teaching is this?” they asked excitedly. “It has such authority! Even evil spirits obey his orders!”
28 The news about Jesus spread quickly throughout the entire region of Galilee.
29 After Jesus left the synagogue with James and John, they went to Simon and Andrew’s home.
30 Now Simon’s mother-in-law was sick in bed with a high fever. They told Jesus about her right away.
31 So he went to her bedside, took her by the hand, and helped her sit up. Then the fever left her, and she prepared a meal for them.
32 That evening after sunset, many sick and demon-possessed people were brought to Jesus.
33 The whole town gathered at the door to watch.
34 So Jesus healed many people who were sick with various diseases, and he cast out many demons. But because the demons knew who he was, he did not allow them to speak.
35 Before daybreak the next morning, Jesus got up and went out to an isolated place to pray.
36 Later Simon and the others went out to find him.
37 When they found him, they said, “Everyone is looking for you.”
38 But Jesus replied, “We must go on to other towns as well, and I will preach to them, too. That is why I came.”
39 So he traveled throughout the region of Galilee, preaching in the synagogues and casting out demons.
40 A man with leprosy came and knelt in front of Jesus, begging to be healed. “If you are willing, you can heal me and make me clean,” he said.
41 Moved with compassion, Jesus reached out and touched him. “I am willing,” he said. “Be healed!”
42 Instantly the leprosy disappeared, and the man was healed.
43 Then Jesus sent him on his way with a stern warning:
44 “Don’t tell anyone about this. Instead, go to the priest and let him examine you. Take along the offering required in the law of Moses for those who have been healed of leprosy. This will be a public testimony that you have been cleansed.”
45 But the man went and spread the word, proclaiming to everyone what had happened. As a result, large crowds soon surrounded Jesus, and he couldn’t publicly enter a town anywhere. He had to stay out in the secluded places, but people from everywhere kept coming to him."""`
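A line-wise retrieval like this could be set up roughly as follows. This is only a sketch: `toy_embed` is a bag-of-words stand-in so the snippet runs without an API key, not the embedding model used for the results in this thread, and swapping in a real model would mean replacing both `toy_embed` and `cosine` with dense-vector versions.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedder: lowercase word counts. NOT a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a)  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_lines(query, text, top_k=3):
    """Split text on newlines and rank the non-empty lines against the query."""
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(line)), index, line)
              for index, line in enumerate(lines)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Three shortened lines from the chapter, just to exercise the function.
sample = "\n".join([
    "9 One day Jesus came from Nazareth in Galilee, and John baptized him in the Jordan River.",
    "21 Jesus and his companions went to the town of Capernaum.",
    "33 The whole town gathered at the door to watch.",
])
top = search_lines("went to Capernaum", sample, top_k=1)
```

Each result is a `(score, line_index, line_text)` tuple, where `line_index` is the zero-based position after the split.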

For anyone wondering: 22 (line 21) talks about going to Capernaum, which happens in both Mark 1 and Mark 2, but in Mark 2 it’s literally the first line.

Overall, there’s not much you can read out of this data IMO, other than that data hygiene is important.
