Fixing "'ascii' codec can't encode character '\u2014'" error with the OpenAI API during vector store embedding

While working on a RAG project, I tried to store the loaded pages in a vector store to build a web-referenced RAG pipeline, but I ran into the error "'ascii' codec can't encode character '\u2014' in position 160: ordinal not in range(128)". I tried various fixes (forcing UTF-8 encoding, replacing \u2014 with a plain dash, etc.), but nothing worked. With Cohere's embedding model the same documents embed without any issue, which leads me to believe the problem is a difference in internal encoding. Is there any way to resolve this when using OpenAI's API?
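For context, the error itself is reproducible with a bare str.encode("ascii"), so presumably something in the request path is encoding the text as ASCII. A minimal standalone reproduction (not the actual library code):

```python
text = "Unicode\u2014that is, an em dash"

# ASCII has no representation for U+2014, so this raises the same error:
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print(err)  # 'ascii' codec can't encode character '\u2014' in position 7: ordinal not in range(128)

# UTF-8 round-trips the same text with no problem:
assert text.encode("utf-8").decode("utf-8") == text
```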
Below is the code.

import os
from urllib.parse import quote
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings 
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma

# Attempted fix; note: PYTHONUTF8 is only read at interpreter startup,
# so setting it from inside the script has no effect
os.environ["PYTHONUTF8"] = "1"


urls = [
    "https://ko.wikipedia.org/wiki/%EC%9C%A0%EB%8B%88%EC%BD%94%EB%93%9C",
]
# Keep '%' in safe so the already percent-encoded URL isn't double-encoded
urls = [quote(url, safe=":/%") for url in urls]

key = os.environ["OPENAI_API_KEY"]  # API key loaded from the environment
embd = OpenAIEmbeddings(model="text-embedding-3-small", api_key=key)

# Load
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512,
    chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

vectorstore = Chroma.from_documents(
    documents=doc_splits,
    embedding=embd,
)

retriever = vectorstore.as_retriever()

I'm running this in a Conda virtual environment, and I've tried everything I could think of (setting environment variables, forcing encodings, converting the text, etc.), but nothing has worked.
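For completeness, the em-dash replacement I mentioned was along these lines; the strings below are illustrative stand-ins for the chunks coming out of the splitter, not the real page content:

```python
# Stand-ins for the real doc_splits contents (the actual text comes from
# WebBaseLoader; these strings are only examples):
chunks = ["Unicode\u2014a universal character set", "plain ascii text"]

# The replacement attempt: swap every em dash for a plain hyphen.
cleaned = [c.replace("\u2014", "-") for c in chunks]

# After the swap these particular strings encode to ASCII cleanly,
# yet the OpenAI embedding call still failed with the same error.
for c in cleaned:
    c.encode("ascii")
```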