How to write a system prompt so the model understands a 3D coordinate system and vector math

This is for use in VR, where space is important, and we’re trying to teach the AI through the system prompt about the 3D coordinate system used by game engines. We provide a list of all the objects in the environment, with coordinates and an ID for each object.

We want the Completions API to find these objects according to where they’re placed in the world relative to a player. For example, ‘what’s in front of me’ should get ChatGPT to return the ID of the object in front of the user by performing vector calculations. We provide the positions of the players and objects.

Here’s what we’re currently doing, which yields mixed results (it works about 50% of the time). Current steps (there’s a code sketch of these after the list):

  • Fetch the player’s position from the prompt
  • Calculate the direction the player is looking and compute the FOV
  • Create a list of all the objects that are in the FOV
  • Calculate the distance to each of those objects and save it in the same list
  • Choose the object the user wants according to the calculated distance
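
For reference, here’s roughly what those steps look like as deterministic code: a minimal Python sketch, where a view-cone half-angle test stands in for the FOV check and all names are illustrative.

import math

def objects_in_view(player_pos, forward, objects, fov_deg=90.0):
    """Steps 2-5: keep objects inside the view cone, then sort by distance."""
    cos_half_fov = math.cos(math.radians(fov_deg / 2))
    results = []
    for obj in objects:
        offset = [obj["Position"][k] - player_pos[k] for k in ("X", "Y", "Z")]
        dist = math.sqrt(sum(c * c for c in offset))
        if dist == 0:
            continue
        # Cosine of the angle between the view direction and the object.
        cos_angle = sum(f * c for f, c in zip(forward, offset)) / dist
        if cos_angle >= cos_half_fov:  # inside the FOV cone
            results.append({"Id": obj["Id"], "Distance": dist})
    return sorted(results, key=lambda r: r["Distance"])

Once the geometry is deterministic like this, the only job left for the model is interpreting the user’s phrasing.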

We’re not sure where this fails. When we analyze the distance calculations, the model seems to generate the right distances, but for some reason it still chooses the wrong object relative to the user asking for it.

Can anyone think of a better system or how to optimize this?

:thinking:

You’re asking GPT to do all this math by hand?

welp

that said, are you computing distances to the player, or the distance along the normal of the FOV rectangle?

We’re computing the distance to the player, and as mentioned, it seems to generate the right distances, so that’s not the issue. The issue is that it still chooses the wrong object relative to the user asking for it.
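
For anyone following along, those are different quantities: distance to the player is the Euclidean norm of the offset, while distance along the FOV normal is the projection of that offset onto the forward vector. A toy example in Python (coordinates made up, forward vector assumed to be unit length):

import math

player = (0.0, 0.0, 0.0)
forward = (1.0, 0.0, 0.0)   # assumed unit length
obj = (3.0, 4.0, 0.0)

offset = tuple(o - p for o, p in zip(obj, player))
dist_to_player = math.sqrt(sum(c * c for c in offset))            # 5.0 (Euclidean)
depth_along_normal = sum(f * c for f, c in zip(forward, offset))  # 3.0 (projection)

An object can be nearby yet far off-axis (or even behind the player, where the projection goes negative), so the two orderings can disagree.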

do you wanna give us an example input and output?

Most likely, we need to conduct additional testing and fine-tune the algorithm. Consider verifying the accuracy of distance calculations, ensuring that the system correctly determines the direction in which the user is looking, and checking the object selection algorithm for errors or flaws. It might also be helpful to analyze the data and perform visualization to understand which objects are being chosen incorrectly and why this is happening.

I’m confused - is this ChatGPT responding? Or an OpenAI employee? This makes no sense as there is no object selection algorithm. But it sounds like you are recommending we create one?

I mean that perhaps it’s necessary to analyze all our steps.

It’s bot text. Not a single correction to be offered by the AI, as it is already an AI prediction:

[screenshot]

I agree with you that this text may resemble AI-generated text. I just checked, and the AI even states that it’s what it wrote. However, this is my personal opinion. I always require myself to review and analyze my work, so I suggested the same here. I’m sorry if it seemed to you that a bot wrote it.

You’d be better off using a vision model paired with a text model and giving it (the vision model) a top-down omniscient view of the map as well as the player’s POV.

This would also come with the benefit of not having to constantly push updated coordinates to the AI whenever something new pops up on the map, like an item dropped from inventory or a destroyed environment object, if you’re going for a dynamic scene.
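
If anyone wants to try that, a request along these lines would do it; this is a minimal sketch using the OpenAI Python SDK, and the model name and image URLs are placeholders.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Image 1 is a top-down view of the map; "
                                     "image 2 is the player's POV. "
                                     "Which object is directly in front of the player?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/map_topdown.png"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/player_pov.png"}},
        ],
    }],
)
print(response.choices[0].message.content)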

Thanks for this - definitely an interesting approach. Have you had good experience with 2D and especially 3D object recognition via vision? Which system was best, i.e., OpenAI’s or something else? If I’m understanding correctly, say there’s a dog in front, and say that object has the word dog in its name metadata - you’re confirming against either or both, is that right?

Many AR and VR software development kits (SDKs) provide integrated features for spatial reasoning and interaction within three-dimensional environments.

Using the camera on a VR device, you can use multimodal large language models (LLMs) for object detection, plus raycasting to limit detection to what is in front.

Quick implementation:

  • Use LLaVA or Gemini Pro Vision 001 (both can take video input as well) or GPT-4V for image classification or detection. But this is not the most cost-effective approach.
  • Use Unity 3D for accessing the VR camera and physics, then raycast within the bounding box.

A better way would be raycasting, detecting the object, and then using OpenAI CLIP to recognize it.
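
As a sketch of that last step, here’s zero-shot recognition of a raycast-hit object’s screen crop with CLIP, via the Hugging Face transformers port (the image path and candidate labels are made up):

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Crop of the object the raycast hit, captured from the VR camera.
image = Image.open("raycast_hit_crop.png")
labels = ["a dog", "a chair", "a lamp", "a door"]  # candidate object names

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity
print(labels[probs.argmax().item()])

Strictly speaking CLIP classifies rather than detects, which is why the raycast handles localization first.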

There’s also some research worth exploring on this.

Diet, here you go. It might be that GPT-4 sucks at math, but let us know if you can think of a better way to measure distances more accurately using GPT-4 in this use case. For now:

System prompt:


Determine the Player's Position: Retrieve the player's current location from the Character list.

Calculate Distances: Compute the Euclidean distances between the player and each object in the Object list.

Identify Target Object: Based on the command, identify the object that matches the specified criteria (e.g., closest, farthest).

For "closest to me," find the object with the smallest distance.
For "farthest from me," find the object with the largest distance.
For commands like "delete the second closest object to me," sort objects by distance and select based on the specified order.
Execute Action on Target Object: Perform the specified action (e.g., delete, grab) on the identified object.

Here is the "me" object:
{
  "me": 
    {
      "Id": "me",
      "Position": {
        "X": 47489.5546875,
        "Y": 50017.5390625,
        "Z": 50343.46484375
      },
      "ForwardVector": {
        "X": 1,
        "Y": 0,
        "Z": 0
      }
    }
}

Here is the list of objects:

{
  "ObjectList": [
    {
      "Id": "886873c4-8266-4b71-b363-800a4e5ccff1",
      "Name": "Test1,
      "Position": {
        "X": 48913.80078125,
        "Y": 52305.8984375,
        "Z": 51094.80078125
      }
    },
    {
      "Id": "1a0b7656-mn59-441d-x78e-6ed2704cxdt2",
      "Name": "Test2",
      "Position": {
        "X": 47188.33203125,
        "Y": 50948.4921875,
        "Z": 50528.35546875
      }
    },
    {
      "Id": "1d3a7616-ea59-441j-m78e-6vd1102zexa2",
      "Name": "Test3",
      "Position": {
        "X": 47417.2890625,
        "Y": 50423.35546875,
        "Z": 50423.7421875
      }
    }
  ]
}

Input: Tell me the name of the object in front of me.
Output: {"Object_ID": "1d3a7616-ea59-441j-m78e-6vd1102zexa2"}

The output above is the expected one.

Here is ChatGPT’s output compared against our manual calculations; in some cases we see gross errors, in others just rounding errors.

Here are the distances that ChatGPT comes up with:

{
  "Distances": [
    {
      "Id": "886873c4-8266-4b71-b363-800a4e5ccff1",
      "Name": "Test1",
      "Distance": 2797.52
    },
    {
      "Id": "1a0b7656-mn59-441d-x78e-6ed2704cxdt2",
      "Name": "Test2",
      "Distance": 936.66
    },
    {
      "Id": "1d3a7616-ea59-441j-m78e-6vd1102zexa2",
      "Name": "Test3",
      "Distance": 594.19
    }
  ]
}

But if I do the Euclidean distance calculations in a spreadsheet, I get different answers:

Test1: 2798.13710451999
Test2: 995.78780510632
Test3: 419.944910741708
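
For what it’s worth, those spreadsheet values are easy to reproduce deterministically from the JSON above (Python, using the posted coordinates):

import math

player = (47489.5546875, 50017.5390625, 50343.46484375)
objects = {
    "Test1": (48913.80078125, 52305.8984375, 51094.80078125),
    "Test2": (47188.33203125, 50948.4921875, 50528.35546875),
    "Test3": (47417.2890625, 50423.35546875, 50423.7421875),
}

for name, pos in objects.items():
    print(name, math.dist(player, pos))  # Euclidean distance (Python 3.8+)
# Prints ~2798.137, ~995.788, ~419.945, matching the spreadsheet.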

In summary, there are errors in the math (ranging from large to small), and it is likely simply that GPT-4 can’t do it well. But if someone can think of a better way to measure distances, that would help a ton, as we triangulate by distance, asset name, and vision.

This looks really interesting, thank you. I’ll be digging into this.

I mean, you’re already using the API, and you’re already computing the FOV window; I’m flummoxed as to why you don’t just compute the distances in your program as well?
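
Concretely, that split might look like this. A minimal sketch, assuming a Chat Completions call (the model name and message layout are illustrative): all the vector math happens in your code, and the model only maps the user’s wording onto already-correct numbers.

import json
import math

from openai import OpenAI

client = OpenAI()

def answer_spatial_query(player_xyz, objects, user_query):
    # Deterministic part: compute every distance in code, not in the prompt.
    facts = sorted(
        (
            {
                "Id": o["Id"],
                "Name": o["Name"],
                "Distance": round(math.dist(player_xyz, (
                    o["Position"]["X"], o["Position"]["Y"], o["Position"]["Z"])), 2),
            }
            for o in objects
        ),
        key=lambda f: f["Distance"],
    )
    # LLM part: interpret the request against pre-computed facts.
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "Pick the object the user means. Distances are pre-computed "
                'and sorted ascending. Reply with JSON: {"Object_ID": "..."}')},
            {"role": "user", "content": json.dumps(facts) + "\n" + user_query},
        ],
    )
    return response.choices[0].message.content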

that is indeed the case; LLMs aren’t built to do math.
