Can you please help with these scenario-based queries?
- Model: GPT-4
- Input Tokens: 100
- Max. Tokens: 500
Case A:
Output Tokens: 300 (within max tokens)
Questions:
- How many tokens will we be charged for: 100 + 300, or 100 + 500?
Case B:
Output Tokens: 800 (exceeds max tokens)
Questions:
- How many tokens will we be charged for: 100 + 800?
- Please confirm whether we receive the entire 800 tokens in the response.
In both cases above, assume we're well within the model's context window. My understanding is that the response is truncated when it exceeds the context window.
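For what it's worth, here is a small sketch of the arithmetic I'm assuming. It rests on the assumption that billing covers the input tokens plus the output tokens actually generated, and that generation stops once `max_tokens` is reached; `billed_tokens` is a hypothetical helper for illustration, not an API call.

```python
def billed_tokens(input_tokens: int, attempted_output_tokens: int, max_tokens: int) -> int:
    """Assumed billing model: input tokens plus output tokens actually
    generated, where generation is cut off at the max_tokens cap."""
    actual_output = min(attempted_output_tokens, max_tokens)
    return input_tokens + actual_output

# Case A: 300 output tokens, within the 500-token cap -> 100 + 300
print(billed_tokens(100, 300, 500))  # 400

# Case B: the model would have produced 800 tokens, but generation
# stops at the 500-token cap, so the response is truncated -> 100 + 500
print(billed_tokens(100, 800, 500))  # 600
```

Under this assumption, Case B would never yield (or bill for) the full 800 tokens, since the cap truncates the response at 500 output tokens.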