Feeding my own PHP application data to OpenAI and asking related questions

Hi,
Can anybody help? I want to feed my own PHP application data into OpenAI and ask dynamic questions about it. The data is more than 10K records, and it increases day by day, so I will need to feed it on a daily basis.

Which API is best suited for large datasets?

Please suggest the best solution.

Any help will be highly appreciated.

where is the data coming from? a file? a database?

you can use either the Chat Completions or the Assistants API, then use function calling to interface with the source of your data.


I am putting all my data into a file stored on my server. See my code below for reference:

public function chatgpt_ask_question(Request $request) {

        $question     = $request->input('question');
        $jsonFilePath = public_path('daily_summary.json');
        $context      = file_get_contents($jsonFilePath);
        $answer       = $this->openAIService->askQuestion($question, $context);

        return response()->json(['answer' => 'HVG AI: '.$answer]);
    }
public function askQuestion($question, $context) {
        $contextChunks = $this->chunkText($context, $this->maxTokens - 1000);  // Reserving tokens for the question and response

        $responses = [];
        foreach ($contextChunks as $chunk) {
            $responses[] = $this->sendRequest($chunk, $question);
        }

        return implode("\n", $responses);
    }
private function sendRequest($context, $question)  {
        $response = $this->client->chat()->create([
            'model' => 'gpt-4o',  // Adjust as necessary
            'messages' => [
                ['role' => 'system', 'content' => 'You are a helpful assistant.'],
                ['role' => 'user', 'content' => $context . "\n\nQuestion: " . $question],
            ],
            'max_tokens' => 150,
            'temperature' => 0.7,
        ]);

        return $response['choices'][0]['message']['content'];
    }
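The chunkText() helper referenced above is not shown in the thread; a minimal sketch, assuming a rough 4-characters-per-token heuristic (an exact count would need a tokenizer library), could look like this:

```php
<?php
// Sketch of a chunkText() helper: split text into whitespace-delimited
// chunks that each stay under a rough character budget derived from the
// token limit. The 4-chars-per-token ratio is an assumption.
function chunkText(string $text, int $maxTokens): array {
    $maxChars = $maxTokens * 4;          // rough chars-per-token estimate
    $chunks   = [];
    $words    = preg_split('/\s+/', trim($text));
    $current  = '';
    foreach ($words as $word) {
        // Start a new chunk when adding the next word would overflow.
        if ($current !== '' && strlen($current) + strlen($word) + 1 > $maxChars) {
            $chunks[] = $current;
            $current  = '';
        }
        $current = $current === '' ? $word : $current . ' ' . $word;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}
```

Note that sending every chunk in a loop, as askQuestion() does, still pays for the whole file on every question; the replies below suggest retrieving only the relevant chunk instead.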

i see that you are attaching everything in the user message. this will be okay for short runs, but as you mentioned, your data is growing every day. you want to refer to the data only when it is needed. i am not sure what kind of data you have, but you might want to define some functions that pull info from this json file based on the user's inquiry. let's say your daily_summary.json contains the top upvoted posts and is labeled as such. you can define a simple function, get_top_upvoted_post, and when it is invoked you read the top upvoted posts and send them back to the API as the tool output.


@supershaneski FYI, the json file contains consumer name, email, orders, state, country, etc. (the data is more than 10K records). As an example, the questions could be: how many users are from the USA? what is today's order total? etc.

i see. in that case, let's define a simple function:

{
  "name": "get_users_by_country",
  "parameters": {
    "type": "object",
    "properties": {
      "country": {
        "type": "string",
        "description": "country code based on ISO 3166-2",
        "enum": [
          "UA",
          "UG",
          "US",
          "UY",
          "UZ",
          "VA",
          "VC"
        ]
      }
    },
    "required": [
      "country"
    ]
  },
  "description": "Get number of users by given country"
}

this will be invoked when you ask questions like "how many users from USA?".

you do not need to feed the whole json. you can easily get your answer by parsing the json and sending just the result back as the tool output.
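A minimal PHP sketch of that tool-output step, matching the get_users_by_country definition above (the record shape and variable names here are assumptions for illustration):

```php
<?php
// Sketch: answer get_users_by_country locally by counting matching
// records, then send only the count back as the tool output, instead
// of feeding the whole JSON file to the model.
function get_users_by_country(array $consumers, string $country): int {
    $count = 0;
    foreach ($consumers as $consumer) {
        if (($consumer['country'] ?? '') === $country) {
            $count++;
        }
    }
    return $count;
}

// Example: decode the tool call arguments from the API response, run
// the function, and json_encode() the result as the tool output.
$consumers = [
    ['email' => 'a@example.com', 'country' => 'US'],
    ['email' => 'b@example.com', 'country' => 'US'],
    ['email' => 'c@example.com', 'country' => 'UA'],
];
$args = json_decode('{"country": "US"}', true); // would come from the tool call
echo json_encode(['count' => get_users_by_country($consumers, $args['country'])]);
// prints {"count":2}
```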


OK, but the problem is that the question could be anything related to the data, and I cannot define a function for every possible question.

for all the easy questions, provide functions. for the rest, use semantic search by doing RAG. this means chunking your data and then getting the embedding for each chunk. when there is new data, just chunk it, get the embedding, and save it in your DB. then create a general inquiry function like get_user_inquiry. it might be invoked like this:

get_user_inquiry({"inquiry": "which order contains product ABC?"})

you get the embedding for "which order contains product ABC?" and use semantic search through your saved embeddings. see the Embeddings API docs for reference.
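A minimal sketch of the semantic-search step in PHP, using cosine similarity over stored chunk embeddings (the toy vectors below stand in for real Embeddings API output):

```php
<?php
// Sketch: compare the query embedding against stored chunk embeddings
// with cosine similarity and return the best-matching chunk's text.
function cosineSimilarity(array $a, array $b): float {
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

function bestMatch(array $queryEmbedding, array $chunks): string {
    $bestScore = -2.0; // below any possible cosine similarity
    $bestText  = '';
    foreach ($chunks as $chunk) {
        $score = cosineSimilarity($queryEmbedding, $chunk['embedding']);
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestText  = $chunk['text'];
        }
    }
    return $bestText;
}
```

The matched chunk text, not the whole file, is what you then send back to the model as the tool output for get_user_inquiry.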


@supershaneski I implemented the Embeddings API, but it is still not working. I guess my data is so large that OpenAI is not able to handle it.
Do you have any other suggestions?

Try testing it with a small dataset just to see how it works.


with small data it is working fine. I tested with the gpt-4o model.

You might look into a RAG solution: only add the context you need for the user's query. That way you stay under the limit on every run.

Or you could work on pruning your thread once it gets too large: use tiktoken (or estimate token usage on your end) and manage its size.
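tiktoken is a Python library, so from PHP a rough character-based estimate is a common stand-in for pruning decisions. A sketch (the 4-chars-per-token heuristic and the function names are assumptions):

```php
<?php
// Rough token estimate: ~4 characters per token for English text.
// Good enough for deciding when to prune, not for exact billing.
function estimateTokens(string $text): int {
    return (int) ceil(strlen($text) / 4);
}

// Drop the oldest messages until the estimated total fits the budget,
// always keeping at least the most recent message.
function pruneMessages(array $messages, int $tokenBudget): array {
    $total = array_sum(array_map(fn($m) => estimateTokens($m['content']), $messages));
    while ($total > $tokenBudget && count($messages) > 1) {
        $removed = array_shift($messages);
        $total  -= estimateTokens($removed['content']);
    }
    return $messages;
}
```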

Can you share some code that you used? Did it throw errors, or just not give you the right answers? It could be how you set it up…

Hi @paulBellow, see the code below. I input the question and ask the relevant question based on this json file.

public function chatgpt_ask_question(Request $request){
        if (Auth::check()) {
            $data = [
                'title' => 'Feed Data'
            ];
            $ask_qstn = $request->input('question');
            if($ask_qstn != ''){
                $jsonFilePath = public_path('daily_summary.json');
                $summary      = json_decode(file_get_contents($jsonFilePath), true);
                $prompt       = $ask_qstn." <br> ". json_encode($summary, JSON_PRETTY_PRINT);
                
                $response     = $this->queryChatGPT($prompt);
               
                return response()->json(['success' => 200,'message' => $response['choices'][0]['message']['content']]);
            }
            
            return view('admin.actions.chatgpt');
        }
        return view('admin.auth.login');
    }

public function queryChatGPT($prompt) {
        $client = new Client();
        $apiKey = 'sk-*****************LqzhC***************';
        $response = $client->post('https://api.openai.com/v1/chat/completions', [
            'headers' => [
                'Authorization' => 'Bearer ' . $apiKey,
                'Content-Type' => 'application/json',
            ],
            'json' => [
                'model' => 'gpt-4o',
                'messages' => [
                    ['role' => 'user', 'content' => $prompt]
                ],
                'max_tokens' => 100,
            ],
        ]);
        
        $body = $response->getBody();
        $result = json_decode($body, true);
       
        return $result;
    }

here's a simple RAG sample:

  1. for testing, i made a text file with this content:

mango.txt

Mango
A mango is an edible stone fruit produced by the tropical tree Mangifera indica. It is believed to have originated between northwestern Myanmar, Bangladesh, and northeastern India.[1] M. indica has been cultivated in South and Southeast Asia since ancient times resulting in two types of modern mango cultivars: the “Indian type” and the “Southeast Asian type”.[2][3] Other species in the genus Mangifera also produce edible fruits that are also called “mangoes”, the majority of which are found in the Malesian ecoregion.[4]
Worldwide, there are several hundred cultivars of mango. Depending on the cultivar, mango fruit varies in size, shape, sweetness, skin color, and flesh color, which may be pale yellow, gold, green, or orange.[1] Mango is the national fruit of India, Pakistan and the Philippines,[5][6] while the mango tree is the national tree of Bangladesh.[7]
Etymology
The English word mango (plural “mangoes” or “mangos”) originated in the 16th century from the Portuguese word, manga, from the Malay mangga, and ultimately from the Tamil man (“mango tree”) + kay (“fruit”).[8][9] The scientific name, Mangifera indica, refers to a plant bearing mangoes in India.[9]

  2. i then chunked the text data into 3 parts; then, using the text-embedding-3-small model, i called the Embeddings API to get the vector data for each chunk:
text-embeddings [
  {
    embedding: [
         0.04287149,     0.01839651,   0.025731485,   0.02845928,
      ... 1436 more items
    ],
    text: 'Mango A mango is an edible stone fruit produced by the tropical tree Mangifera indica. It is believed to have originated between northwestern Myanmar, Bangladesh, and northeastern India. [1] M. indica has been cultivated in South and Southeast Asia since ancient times resulting in two types of modern mango cultivars: the "Indian type" and the "Southeast Asian type".'
  },
  {
    embedding: [
        0.02501605,   0.027974246,    0.05487668,   0.064951696,  -0.023451207,      ... 1436 more items
    ],
    text: '[4]  Worldwide, there are several hundred cultivars of mango. Depending on the cultivar, mango fruit varies in size, shape, sweetness, skin color, and flesh color, which may be pale yellow, gold, green, or orange. [1] Mango is the national fruit of India, Pakistan and the Philippines,[5][6] while the mango tree is the national tree of Bangladesh.'
  },
  {
    embedding: [
        0.042156275,  0.010176958,   0.008787963,   0.037421804, -0.024364166,
       -0.005610028,   0.01934865,  0.0076097483,  -0.008755534,  -0.03653544,
              ... 1436 more items
    ],
    text: '. [8][9] The scientific name, Mangifera indica, refers to a plant bearing mangoes in India. [9]'
  }
]
  3. now, using Chat Completions, i sent a query:

where does mango originated from?

  4. i called the Embeddings API to get the query's vector data.

  5. using semantic search, i compared the query's vector data against the previously saved embeddings, got a hit, and sent it back to the Chat Completions API:

{
  id: 'chatcmpl-9bkU4pR1GPUwVJboUfXIjwudNmIqm',
  object: 'chat.completion',
  created: 1718783848,
  model: 'gpt-3.5-turbo-0125',
  choices: [
    {
      index: 0,
      message: [Object],
      logprobs: null,
      finish_reason: 'stop'
    }
  ],
  usage: { prompt_tokens: 385, completion_tokens: 69, total_tokens: 454 },
  system_fingerprint: null
}
{
  role: 'assistant',
  content: 'The mango is believed to have originated between northwestern Myanmar, Bangladesh, and northeastern India.\n' +
    '\n' +
    'I found this information in the file "mango.txt" where it states: "A mango is an edible stone fruit produced by the tropical tree Mangifera indica. It is believed to have originated between northwestern Myanmar, Bangladesh, and northeastern India."'
}
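The final step of the walkthrough above, in PHP terms, is building the prompt from only the retrieved chunk rather than the whole file (the function name and message wording here are hypothetical):

```php
<?php
// Sketch: once the best-matching chunk has been found via semantic
// search, include only that chunk as context in the Chat Completions
// messages, keeping the prompt small regardless of total data size.
function buildRagPrompt(string $question, string $retrievedChunk): array {
    return [
        ['role' => 'system', 'content' => "Answer using only the provided context.\n\nContext:\n" . $retrievedChunk],
        ['role' => 'user', 'content' => $question],
    ];
}
```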


thanks @supershaneski. I also chunked my data; it is a json file. See the example below; maybe it will give you an idea of what I actually need.

Sample json data
It is a single record. I have around 10k+ records in the same json format. I did the chunking, but unfortunately it shows a gateway timeout.

{"consumers":{"email":"example@example.com","full_name":"Abc xzy","consumer_type":"subsriber","investment_amount":"","address":"qwerty","city":"Avon","state":"CT","postal_code":"123456","ip_address":"8.8.8.8","purchased":"Yes","date_paid":"19-06-2024","purchase_date":"19-06-2024"}}

Below are the questions:

  1. How many purchased with "Yes"?
  2. How many users are from the US?
  3. Which date has the most purchases?

There are many more questions I can ask related to this data.
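For what it's worth, each of those three example questions can be answered with a plain aggregation over the decoded records, which makes them good candidates for function-calling tools instead of feeding all 10k records to the model. A sketch (field names follow the sample record above):

```php
<?php
// Sketch: local aggregations over the decoded consumer records that a
// function-calling tool could return as its output.
function countPurchased(array $records): int {
    return count(array_filter($records, fn($r) => ($r['purchased'] ?? '') === 'Yes'));
}

function countByState(array $records, string $state): int {
    return count(array_filter($records, fn($r) => ($r['state'] ?? '') === $state));
}

// Date (as stored, e.g. "19-06-2024") with the most purchases.
function topPurchaseDate(array $records): string {
    $byDate = [];
    foreach ($records as $r) {
        if (($r['purchased'] ?? '') === 'Yes') {
            $byDate[$r['purchase_date']] = ($byDate[$r['purchase_date']] ?? 0) + 1;
        }
    }
    arsort($byDate);
    return array_key_first($byDate) ?? '';
}
```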

Thanks

I solved this myself by using the vector store method.


Nice. Share some code in case someone else stumbles on this thread? :slight_smile:

I would share it once it is finished; there are still a couple of issues, and the AI sometimes gives wrong responses.