You can expect that OpenAI’s AI models themselves do get a bit of “activation” from seeing markdown; we really don’t know how OpenAI trained them beyond corpus pretraining.
This is a question the embeddings model can answer about itself, through the difference in the similarity scores it returns.
Here is Python embeddings code where I created a non-markdown version of some markdown text as the second string, sent to the text-embedding-3-large model:
import os, base64, httpx
import numpy as np
texts = [r"""
## Why “parallel streams” isn’t great here
You *can* do something like:
```python
async def get_chat_response(...) -> tuple[AsyncIterator[str], AsyncIterator[UIEvent]]:
    ...
```
But then the application must concurrently consume both streams:
* coordinate termination (when assistant ends vs when UI ends),
* ensure neither iterator blocks the other,
* handle exceptions from either side,
* cancel both properly on exit.
That pushes complexity into your “application layer,” which you explicitly want to avoid.
## Alternative: callback/hook for sideband messages
If you insist that the main application only iterate assistant text, you can keep:
```python
AsyncIterator[str]
```
…and deliver sideband messages via a callback:
```python
from typing import Awaitable, Callable
UIHook = Callable[[Event], Awaitable[None]]
async def get_chat_response(prompt: str, *, stream: bool = False, ui: UIHook | None = None) -> AsyncIterator[str]:
    if ui:
        await ui(Event(type="ui", text="Calling model..."))
    ...
```
""".strip(),
# --- string 2 ---
r"""
Why “parallel streams” isn’t great here
You *can* do something like:
async def get_chat_response(...) -> tuple[AsyncIterator[str], AsyncIterator[UIEvent]]:
    ...
But then the application must concurrently consume both streams:
• coordinate termination (when assistant ends vs when UI ends),
• ensure neither iterator blocks the other,
• handle exceptions from either side,
• cancel both properly on exit.
That pushes complexity into your “application layer,” which you explicitly want to avoid.
Alternative: callback/hook for sideband messages
If you insist that the main application only iterate assistant text, you can keep AsyncIterator[str], and deliver sideband messages via a callback:
from typing import Awaitable, Callable
UIHook = Callable[[Event], Awaitable[None]]
async def get_chat_response(prompt: str, *, stream: bool = False, ui: UIHook | None = None) -> AsyncIterator[str]:
    if ui:
        await ui(Event(type="ui", text="Calling model..."))
    ...
""".strip(),
# ... up to 2048 strings
]
dimensions = 3072  # 1536 max for text-embedding-3-small; 3072 max for 3-large (ada-002 is fixed at 1536, no dimensions parameter)
params = {
    "model": "text-embedding-3-large",
    "input": texts,
    "encoding_format": "float",  # "base64" | "float"
    "dimensions": dimensions,
}
try:
    with httpx.Client(timeout=1800) as client:
        resp = client.post(
            "https://api.openai.com/v1/embeddings",
            headers={"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
            json=params,
        )
        resp.raise_for_status()
except httpx.HTTPStatusError as e:
    print(f"Request failed: {e}")
    if e.response is not None:
        try:
            # print body error messages from OpenAI
            print("Error response body:\n", e.response.text)
        except Exception:
            raise
    raise
except httpx.RequestError as e:
    print(f"Request error: {e}")
    raise
else:
    print(f"For {len(texts)} texts, received a {len(resp.content)}-byte body")
    print(f"Snippet: {resp.text[:600]}")

response_dict = resp.json()
data_list = response_dict["data"]
data_list.sort(key=lambda d: d["index"])
n = len(data_list)  # received item count
embeddings = np.empty((n, dimensions), dtype=np.float32)  # reserve the full block

# Each "embedding" return may be a base64 string of raw float32 bytes,
# or a plain list of floats.
# example b64: "embedding": "Lr4Zvt1heb+KPae7Af1YPdYGJD4=" (5 dimensions)
for row_idx, item in enumerate(data_list):
    e = item["embedding"]
    vector = (
        np.frombuffer(base64.b64decode(e, validate=True), dtype=np.float32, count=dimensions)
        if isinstance(e, str)
        else np.asarray(e, dtype=np.float32)
    )
    embeddings[row_idx] = vector

# `embeddings` is now an (n, dimensions) NumPy array in the same order as `texts`
print(embeddings.shape)  # (count, dimensions), such as (2, 3072)
print("Similarity: ", np.dot(embeddings[0], embeddings[1]))  # dot product comparison
At the end, a dot product gives the semantic similarity, since the returned vectors are pre-normalized to unit length.
(2, 3072)
Similarity: 0.9751775
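As a quick sanity check (a minimal sketch, reusing the `embeddings` array from the script above), you can confirm the vectors come back unit-normalized, which is why a plain dot product is already the cosine similarity:

```python
import numpy as np

# OpenAI embedding vectors are returned pre-normalized to (approximately) unit L2 norm,
# so a dot product between two of them equals their cosine similarity.
norms = np.linalg.norm(embeddings, axis=1)
print(norms)  # expect values very close to 1.0

# Explicit cosine similarity, for comparison with the raw dot product above
cosine = float(np.dot(embeddings[0], embeddings[1]) / (norms[0] * norms[1]))
print("Cosine similarity:", cosine)
```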
So they live in the same space, but as with any token change (and here there are several), the AI understanding, the math, and the result are going to be different.
I then completely change the intention behind the second text by appending:
### QUESTION
What is this passage discussing
"""
Which, despite putting the text into quite a different state, still gives: Similarity: 0.95712936
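(For reference, a minimal sketch of how that modification might be made, assuming you reuse the script above and simply extend the second string before re-requesting embeddings; the exact suffix formatting is an assumption:)

```python
# Hypothetical reconstruction: append the re-purposing question to the second text,
# then re-run the same embeddings request and dot-product comparison as above.
question_suffix = "\n### QUESTION\nWhat is this passage discussing"
texts[1] = texts[1] + question_suffix
```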
In a ranking situation, where you’re getting back good vs. bad chunks, you’ll likely need to consider your underlying question. I’ll “ask” a query and see how both of those passages score against it:
Similarity: 0.32220316
Similarity: 0.32865414
It looks like there’s not going to be much shift in how these compare to other chunks in a document-retrieval situation.
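A minimal sketch of that comparison, assuming a small helper that condenses the same /v1/embeddings request as the main script (the query wording here is illustrative, not the exact one used for the scores above):

```python
import os
import httpx
import numpy as np


def embed(strings: list[str], model: str = "text-embedding-3-large", dims: int = 3072) -> np.ndarray:
    """Return an (n, dims) float32 array of embeddings for `strings`."""
    with httpx.Client(timeout=120) as client:
        resp = client.post(
            "https://api.openai.com/v1/embeddings",
            headers={"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"},
            json={"model": model, "input": strings,
                  "encoding_format": "float", "dimensions": dims},
        )
        resp.raise_for_status()
    data = sorted(resp.json()["data"], key=lambda d: d["index"])
    return np.asarray([d["embedding"] for d in data], dtype=np.float32)


# Example query (an assumption, not the original test query)
query_vec = embed(["How should a chat API deliver UI status events alongside assistant text?"])[0]

# `embeddings` is the (2, 3072) array from the main script: markdown vs. plain passage
scores = embeddings @ query_vec
for i, score in enumerate(scores):
    print(f"passage {i}: similarity {score:.8f}")
```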
If you ran a large trial of varied texts with and without markdown, and algorithmically profiled all the dimensions returned by an embeddings model, you might find that there are a few value positions that carry a “markdown detected” characteristic. Identify those, use them exclusively, and make a semantic markdown detector AI?
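A rough sketch of what that profiling could look like (entirely hypothetical; `md_vecs` and `plain_vecs` are assumed to be aligned arrays of embeddings for markdown and de-markdowned versions of many documents):

```python
import numpy as np

# md_vecs, plain_vecs: (num_docs, dimensions) arrays, one row per document,
# embedding the markdown and plain-text version of the same content.
diff = md_vecs - plain_vecs

# Score each dimension by how consistently it shifts when markdown is present:
# a large mean shift relative to its spread suggests a "markdown detected" axis.
mean_shift = diff.mean(axis=0)
consistency = np.abs(mean_shift) / (diff.std(axis=0) + 1e-9)

top = np.argsort(consistency)[::-1][:10]
print("Candidate 'markdown' dimensions:", top)
print("Mean shift at those positions:", mean_shift[top])
```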