Retrieval is unusably slow - not what was promised

I tried to create an assistant for retrieval over 60 MB of plain text across 20 files. GPT-4 Turbo complained about its 10,000 tokens-per-minute limit, so I switched to GPT-3.5 Turbo, which has a 90,000 TPM limit. It still did not return an answer even after 30 minutes. I expected retrieval to work via embeddings, but it looks like it feeds the full text of all files into the current GPT call as input, which I'm afraid will be costly and looks unusable.
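For scale, here is a rough back-of-envelope estimate (assuming the common ~4 characters per token heuristic for English text; actual tokenization varies) of what pushing all 60 MB through the model as raw context would mean against a 90,000 TPM limit:

```python
# Rough estimate of token volume if retrieval naively stuffs all file text
# into the model context. Assumes ~4 characters per token, a common
# rule-of-thumb for English text (actual counts depend on the tokenizer).
FILE_BYTES = 60 * 1024 * 1024  # 60 MB of plain text
CHARS_PER_TOKEN = 4

approx_tokens = FILE_BYTES // CHARS_PER_TOKEN
print(f"~{approx_tokens:,} tokens")  # ~15,728,640 tokens

# At a 90,000 tokens-per-minute limit, just streaming that volume through
# the API once would take roughly:
TPM_LIMIT = 90_000
minutes = approx_tokens / TPM_LIMIT
print(f"~{minutes:.0f} minutes at {TPM_LIMIT:,} TPM")  # ~175 minutes
```

If that estimate is anywhere near right, a single full pass over the files would blow well past any interactive time budget, which is consistent with the 30-minute wait I saw.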

While the post about retrieval said they would use embeddings, there was no explicit mention of how large the file input needs to be before the model treats it as big enough to use embeddings. 20 files totaling 60 MB looks big enough to me, but it might not fit their criteria.

Also, the assistant has been a bit slow in general, at least for me, but 30 minutes is far too long.