Storing API data - cloud bucket vs vector db

Hi everyone,

I’m looking to ingest weather-related data from a public website (for example, BBC Weather) via an API and store it for downstream analytical use cases.

I’m trying to decide on the most appropriate storage approach and would appreciate some guidance. The two options I’m considering are:

  • Storing the raw data in a cloud object store (e.g., S3/GCS/Azure Blob)

  • Storing the data in a vector database

Given that the primary goal is analytics (historical analysis, trends, aggregations, etc.), which option would be more suitable, and why? Are there any best practices or alternative architectures I should consider?

Thanks in advance for your insights.

Kind regards

A vector database is the only AI product that you have mentioned. However, it returns results ranked by semantic similarity, based on language and topic. Semantic search is a poor fit for raw numeric data.

There’s likely no need to use cloud storage, either, unless you are also automating the ingestion with a cloud worker. Data analysis will likely run over the data in bulk. If you need to query subsets, you might use a traditional relational database, but what you retrieve from the API will probably be plain tabular (2D) data.

Thus, this general application is something to design with the assistance of an AI rather than something powered by AI. I don’t see where AI, the focus of this forum and OpenAI’s products, would fit into it.
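
To make the relational option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are all made up for illustration; a real feed would have its own fields, and any relational database would work the same way.

```python
import sqlite3

# Hypothetical schema: one row per (location, timestamp) reading.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        location TEXT NOT NULL,
        ts       INTEGER NOT NULL,  -- UNIX timestamp in seconds
        temp_c   REAL,
        PRIMARY KEY (location, ts)
    )
    """
)

# Made-up sample rows standing in for parsed API responses.
rows = [
    ("London", 1700000000, 9.5),
    ("London", 1700003600, 10.5),
    ("Leeds", 1700000000, 7.0),
    ("Leeds", 1700003600, 8.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)


def average_temp_by_location(connection, start_ts, end_ts):
    """Return {location: mean temperature} for readings in [start_ts, end_ts]."""
    cur = connection.execute(
        "SELECT location, AVG(temp_c) FROM readings "
        "WHERE ts BETWEEN ? AND ? GROUP BY location ORDER BY location",
        (start_ts, end_ts),
    )
    return dict(cur.fetchall())


averages = average_temp_by_location(conn, 1700000000, 1700003600)
```

Aggregations and subset queries like this are what the analytics use case needs, and none of it involves embeddings.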

Thanks @_j for your input, that sounds logical!

I would go with a basic database. My all-time favorite here would be a DynamoDB table with ZipCode (string) as the hash (partition) key and the integer UNIX timestamp as the sort key, so you can pull in data over different time frames. The records should be small enough to fit within DynamoDB’s item size limit.
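
A rough sketch of that key design, written as plain dictionaries in the shape the low-level boto3 DynamoDB client expects. The table name WeatherReadings and the helper names are assumptions for illustration, and nothing here talks to AWS.

```python
def table_definition(table_name="WeatherReadings"):
    """Parameters for create_table: ZipCode as hash key, UNIX as sort key."""
    return {
        "TableName": table_name,  # hypothetical table name
        "AttributeDefinitions": [
            {"AttributeName": "ZipCode", "AttributeType": "S"},  # string
            {"AttributeName": "UNIX", "AttributeType": "N"},     # number
        ],
        "KeySchema": [
            {"AttributeName": "ZipCode", "KeyType": "HASH"},   # partition key
            {"AttributeName": "UNIX", "KeyType": "RANGE"},     # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def time_range_query(zip_code, start_ts, end_ts, table_name="WeatherReadings"):
    """Parameters for query: all readings for one ZipCode in a time window."""
    return {
        "TableName": table_name,
        # "#ts" is a placeholder so the attribute name cannot clash with
        # a DynamoDB reserved word.
        "KeyConditionExpression": "ZipCode = :zip AND #ts BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#ts": "UNIX"},
        "ExpressionAttributeValues": {
            ":zip": {"S": zip_code},
            ":start": {"N": str(start_ts)},  # numbers are sent as strings
            ":end": {"N": str(end_ts)},
        },
    }


create_params = table_definition()
query_params = time_range_query("10001", 1700000000, 1700086400)
```

With credentials configured, these would be passed as `boto3.client("dynamodb").create_table(**create_params)` and `.query(**query_params)`.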

If there is a semantic piece, you can augment each record in the table with its embedding vector. For search, gather the vectors into a serialized data object and compute the semantic correlations in memory (use NumPy for speed), without all the overhead of a turn-key vector store. Then pass the ZipCode/UNIX keys of the highest correlations to your table to fetch the content to inject into the LLM prompt.
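
Roughly, the in-memory search could look like the sketch below. It uses plain Python to stay dependency-free; with NumPy the same scoring becomes a single matrix-vector product over normalized vectors. The record layout, the tiny 3-dimensional embeddings, and the function names are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(query_vec, records, k=2):
    """Return the (ZipCode, UNIX) keys of the k most similar records."""
    scored = [
        ((r["ZipCode"], r["UNIX"]), cosine(query_vec, r["embedding"]))
        for r in records
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in scored[:k]]


# Made-up records as they might be loaded from the table into memory.
records = [
    {"ZipCode": "10001", "UNIX": 1700000000, "embedding": [1.0, 0.0, 0.0]},
    {"ZipCode": "10001", "UNIX": 1700003600, "embedding": [0.0, 1.0, 0.0]},
    {"ZipCode": "94105", "UNIX": 1700000000, "embedding": [0.9, 0.1, 0.0]},
]

best = top_matches([1.0, 0.0, 0.0], records, k=2)
```

The returned keys are what you would hand back to the table to fetch the full records for the prompt.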
