Is there a better way to store dates in embeddings?
I would like to share my experience with adding dates in a specific format when creating an embedding. When I include the date in a format such as “ABC born on 1 March 2023,” each word appears to act like a keyword in the embedding. As a result, relevant records are pulled for search queries about specific events, such as “ABC born on March 2023,” “ABC born on 1 March 2023,” or “Find who born in 2023.” I don't know whether this is the best way, because I have to reformat the dates before embedding. Is there a best practice for this?
Could you maybe pre-process and make sure to add several date formats? I’ve not done a lot of embedding myself yet, so I may be way off base here. What about 3/1/2023, etc? Maybe write a quick and dirty script to query a lower level API (Curie?) and feed it one date and get back the date + variations in a normalized format?
Yes, I can pre-process the embedding data so that all dates use a standard format like 1 March 2023. This format works because we can then query the data by full date, by month, or by year. I don't know if it is the best practice, so if anyone handles this in a better way, I would like to know.
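A minimal sketch of that pre-processing step, for anyone who wants a starting point. The list of input formats and the helper names (`parse_date`, `normalize_date`) are assumptions for illustration, not something from this thread, and the sketch reads 3/1/2023 as month first (US style).

```python
from datetime import datetime

# Input formats this sketch tries, in order. This list is an assumption;
# extend it to match the formats that actually occur in your data.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y"]

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def parse_date(text):
    """Parse a date string using the first matching format."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {text!r}")


def normalize_date(text):
    """Rewrite a date as day, full month name, year (e.g. 1 March 2023)."""
    d = parse_date(text)
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


print(normalize_date("3/1/2023"))       # 1 March 2023
print(normalize_date("2023-03-01"))     # 1 March 2023
print(normalize_date("March 1, 2023"))  # 1 March 2023
```

Ambiguous inputs such as 3/1/2023 are the weak point here, so it helps to know which convention the source data uses before normalizing.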
The best answer depends on the specific use case. In addition to pre-processing, you could post-process: keep the date in metadata associated with the embedding and use it to filter or sort the results.
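Here is a rough sketch of the metadata idea, assuming a simple in-memory list rather than any particular vector database. The `Record` class, the `search` function, and the two-dimensional toy vectors are all hypothetical; real vectors would come from your embedding model.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    text: str
    event_date: date  # kept as metadata next to the embedding
    vector: list      # embedding of `text` (toy values in this sketch)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(records, query_vector, start, end, top_k=5):
    """Rank by similarity, then post-process: filter and sort by date."""
    ranked = sorted(records, key=lambda r: cosine(r.vector, query_vector),
                    reverse=True)
    hits = ranked[:top_k]
    in_range = [r for r in hits if start <= r.event_date <= end]
    return sorted(in_range, key=lambda r: r.event_date)


records = [
    Record("Sales pipeline review at the Boston office",
           date(2023, 8, 23), [0.9, 0.1]),
    Record("Meeting with client G at the corporate office",
           date(2023, 5, 24), [0.8, 0.3]),
]

# Toy query vector; keep only events that fall in May 2023.
results = search(records, [1.0, 0.2], date(2023, 5, 1), date(2023, 5, 31))
for r in results:
    print(r.event_date.isoformat(), r.text)
```

Because the filter runs after the top_k cut, a narrow date range can return fewer than top_k results. Filtering before ranking is the usual alternative if that matters.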
If you are going to embed the date, I’d 1) place it at the beginning of the embedding input and 2) write it in a long-form format like “Tuesday, May 2nd, 2023”.
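Something like the following could build that input. It is only a sketch: `ordinal`, `long_form`, and `embedding_input` are made-up helper names, and the colon separator between the date and the text is an arbitrary choice.

```python
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n):
    """Return n with its English ordinal suffix (1st, 2nd, 3rd, 4th, ...)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def long_form(d):
    """Format a date in long form: weekday, month, ordinal day, year."""
    return f"{DAYS[d.weekday()]}, {MONTHS[d.month - 1]} {ordinal(d.day)}, {d.year}"


def embedding_input(d, text):
    """Put the long-form date at the beginning of the text to embed."""
    return f"{long_form(d)}: {text}"


print(embedding_input(date(2023, 5, 2), "Meeting with client G"))
# Tuesday, May 2nd, 2023: Meeting with client G
```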
Thanks. When the embedding input is written in this format, the embedding cares a lot about the date and time. For example:
On August 23, 2023 at 10:30 AM, there will be a Sales pipeline review at the Boston office.
On May 24, 2023 at 4:00 PM, there will be a Meeting with client G at the corporate office.