Hi,
I have created an application where the input file is dynamic. The input files are chunked, and the chunks are passed to the Azure OpenAI embedding model text-embedding-ada-002.
The resulting embeddings are then appended into a single list to form a single field entry in the Azure index.
But when I do so, I get the following error:
OperationNotAllowed
Message: The request is invalid. Details: actions : 0: The vector field 'content_vector' dimensionality must match the field definition's 'dimensions' property. Expected: '2048'. Actual: '104448'.
I know this error comes from Azure, but I want to know whether I'm concatenating the embeddings incorrectly.
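To clarify what I mean by concatenation, here is a minimal sketch with dummy vectors (the values and the 3-dim size are made up, just to show the difference between extend and append on plain lists):

```python
chunk_vec_1 = [0.1, 0.2, 0.3]  # dummy "embedding" for chunk 1
chunk_vec_2 = [0.4, 0.5, 0.6]  # dummy "embedding" for chunk 2

# extend flattens everything into one long vector
flat = []
flat.extend(chunk_vec_1)
flat.extend(chunk_vec_2)
print(len(flat))  # 6: the dimensions add up across chunks

# append keeps one vector per chunk
nested = []
nested.append(chunk_vec_1)
nested.append(chunk_vec_2)
print(len(nested))  # 2: one entry per chunk
```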
Chunking code

def get_chunks(text):
    Logging.log_info("Starting text chunking.")
    try:
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=int(CHUNK_SIZE),
            chunk_overlap=int(OVERLAP),
            length_function=len,
            is_separator_regex=False,
        )
        texts = text_splitter.create_documents([text])
        Logging.log_info(f"Text split result: {texts}, number of chunks: {len(texts)}")
        return texts
    except Exception as error:
        Logging.log_errors(f'Could not chunk the text: {error}')
Azure OpenAI code
def get_vector_embeddings(text):
    try:
        Logging.log_info("Trying to get vector embeddings")
        client = AzureOpenAI(
            api_key=AZURE_OPENAI_KEY,
            api_version=API_VERSION,
            azure_endpoint=AZURE_ENDPOINT,
        )
        embeddings = client.embeddings.create(input=[text], model=model_name).data[0].embedding
        Logging.log_info(f"Vector embedding success: {embeddings}")
        # .embedding is already a plain Python list of floats, so no .tolist() is needed
        return embeddings
    except Exception as error:
        Logging.log_errors(f'Could not get vector embeddings: {error}')
Embedding append to list
if get_token_count(represented_text) > 8000:
    chunks = get_chunks(represented_text)
    embeddings = []
    for document in chunks:
        embeddings.extend(get_vector_embeddings(document.page_content))
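For reference, text-embedding-ada-002 returns 1536-dimensional vectors, so the length of the combined list after the loop above is 1536 times the number of chunks. A quick sketch of that arithmetic (the chunk count of 68 is hypothetical, chosen only because it reproduces the 'Actual' value in my error):

```python
ADA_002_DIMS = 1536  # documented output size of text-embedding-ada-002
num_chunks = 68      # hypothetical chunk count for one of my documents

# simulate the loop: extend one flat list with one dummy vector per chunk
combined = []
for _ in range(num_chunks):
    combined.extend([0.0] * ADA_002_DIMS)

print(len(combined))  # 104448, the 'Actual' value from the error
```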
PS: I'm using LangChain's RecursiveCharacterTextSplitter to get the chunks.