Use Responses API + Structured Outputs, with tool/function calling as the enforcement layer.
Do not fine-tune for this. Fine-tuning is the wrong direction.
1. The Correct Architecture (What Actually Works)
Core stack:
- Responses API → conversational control + intent detection
- Structured Outputs (JSON schema) → guaranteed final JSON
- Tool / function calling → hard validation + slot completion
- State machine (outside the model) → quote-type flow control
Fine-tuning is unnecessary and will reduce reliability for this task.
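The "state machine outside the model" can be sketched as a few explicit states with deterministic transitions. The state names and transition table below are illustrative assumptions, not a prescribed design:

```python
from enum import Enum, auto

# Minimal flow-control sketch; the states and transitions are assumptions
# for illustration. The model never controls this machine directly.
class QuoteState(Enum):
    DETECT_INTENT = auto()
    COLLECT_FIELDS = auto()
    CONFIRM = auto()
    EMIT_JSON = auto()

TRANSITIONS = {
    QuoteState.DETECT_INTENT: QuoteState.COLLECT_FIELDS,
    QuoteState.COLLECT_FIELDS: QuoteState.CONFIRM,
    QuoteState.CONFIRM: QuoteState.EMIT_JSON,
}

def advance(state: QuoteState, blocked: bool) -> QuoteState:
    """Hold the current state while a required field is missing; otherwise move on."""
    return state if blocked else TRANSITIONS.get(state, state)
```

Because the machine lives in your code, a hallucinated model turn can at worst stall a state; it can never skip one.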
2. Intent Detection Without Explicit Asking
You want the quote type detected silently, without the assistant asking "which kind of quote do you want?"
Do this in-system, not by questioning the user.
In your system prompt, define intent classification rules:
```
Classify user intent into one of:
- QUICK_QUOTE
- STANDARD_QUOTE
- TRACKING_QUOTE

Rules:
- If the user provides a tracking number → TRACKING_QUOTE
- If the user references multiple items with individual dimensions → STANDARD_QUOTE
- If the user gives totals only (count, weight, volume, size) → QUICK_QUOTE
- If ambiguous, default to QUICK_QUOTE and escalate only if missing fields block the quote
```
The model will classify without asking.
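If you want a deterministic safety net alongside the prompt, the same rules can be mirrored in code as a heuristic pre-classifier. Everything here (function name, sample tracking-number patterns) is an illustrative assumption; the model remains the primary classifier:

```python
import re

# Hypothetical heuristic mirror of the system-prompt rules. The tracking
# patterns are sample formats only; real carriers vary.
TRACKING_RE = re.compile(r"\b[A-Z]{2}\d{9,}\b|\b1Z[0-9A-Z]{16}\b")
DIMENSION_RE = re.compile(r"\d+\s*[x×]\s*\d+\s*[x×]\s*\d+", re.I)

def classify_intent(message: str) -> str:
    if TRACKING_RE.search(message):
        return "TRACKING_QUOTE"
    # Two or more per-item dimension triples → itemized standard quote
    if len(DIMENSION_RE.findall(message)) >= 2:
        return "STANDARD_QUOTE"
    return "QUICK_QUOTE"  # ambiguous → default, per the rules above
```

A mirror like this is useful for logging disagreements with the model's classification, not for replacing it.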
3. Fixed Parameters (Pickup Type, Service Level)
You are correct: the model must lock certain parameters.
Do not let the model invent them dynamically.
Best practice: bake the locked values into the payload your system controls and mark them as locked. Example:
```json
{
  "pickup_type": {
    "value": "commercial",
    "locked": true
  },
  "service_level": {
    "value": "standard",
    "locked": true
  }
}
```
Tell the model:
If the user does not explicitly request a different pickup type or service level, do not ask and do not change defaults.
This avoids unnecessary questions.
4. Slot-Filling Without Being Annoying
Rule: Ask only when a required field blocks quote generation
Each quote type has a minimum viable schema:
- Quick Quote (≈20 fields)
- Standard Quote
- Tracking Quote
The model should:
- Ask one question at a time
- Ask only for missing required fields
- Never ask for fields not required by the detected intent
5. Structured Output (This Is the Key)
At the end, you force the model to emit JSON that matches a schema.
Example schema (simplified):
```json
{
  "quote_type": "QUICK_QUOTE | STANDARD_QUOTE | TRACKING_QUOTE",
  "pickup_type": "commercial",
  "service_level": "standard",
  "data": {
    "tracking_id": "string | null",
    "totals": {
      "items": "number | null",
      "weight": "number | null",
      "volume": "number | null"
    },
    "items": [
      {
        "length": "number",
        "width": "number",
        "height": "number",
        "weight": "number"
      }
    ]
  }
}
```
Use the Responses API with the output format set to your JSON schema (in current SDKs this lives under `text.format` with `type: "json_schema"` and `strict: true`; the older Chat Completions API used `response_format`).
This guarantees the final message always parses and conforms to the schema: required keys present, correct types, no stray prose.
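As a sketch of what the request body looks like, assuming the current Responses API shape (check your SDK version; the exact nesting has changed across releases). The schema is trimmed, and `gpt-4o` is a placeholder model name:

```python
import json

# Trimmed strict-mode schema: every object lists all properties as required
# and sets additionalProperties to false, as strict mode demands.
QUOTE_SCHEMA = {
    "type": "object",
    "properties": {
        "quote_type": {"type": "string",
                       "enum": ["QUICK_QUOTE", "STANDARD_QUOTE", "TRACKING_QUOTE"]},
        "pickup_type": {"type": "string"},
        "service_level": {"type": "string"},
        "data": {
            "type": "object",
            "properties": {"tracking_id": {"type": ["string", "null"]}},
            "required": ["tracking_id"],
            "additionalProperties": False,
        },
    },
    "required": ["quote_type", "pickup_type", "service_level", "data"],
    "additionalProperties": False,
}

def build_request(user_message: str) -> dict:
    """Assemble the request body; pass it to client.responses.create(**body)."""
    return {
        "model": "gpt-4o",  # placeholder
        "input": user_message,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "shipping_quote",
                "schema": QUOTE_SCHEMA,
                "strict": True,
            }
        },
    }
```

The network call itself is omitted; the point is that the schema, not the prompt, is what makes the final JSON reliable.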
6. Where Function Calling Fits
Function calling is not your primary interface; it is your enforcer.
Use it for the hard validation and slot completion described above: the model proposes a tool call, your code checks the arguments, and any rejection (missing or invalid field) goes back as the tool result for the model to resolve on the next turn.
This loop is fast and reliable.
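The server side of that loop can be sketched as a single handler. The tool name `generate_quote`, the field set, and the return shape are all hypothetical; the actual tool-call transport is omitted:

```python
# Hypothetical enforcer for a `generate_quote` tool call. Runs in your code
# when the model invokes the tool; the model only sees the returned dict.
REQUIRED = {"quote_type", "pickup_type", "service_level"}

def handle_generate_quote(args: dict) -> dict:
    """Return a quote stub, or the missing fields for the model to ask about."""
    missing = sorted(REQUIRED - args.keys())
    if missing:
        # Tool result the model sees; it should ask for exactly these fields.
        return {"status": "incomplete", "missing_fields": missing}
    return {"status": "ok", "quote_id": "Q-0001"}  # placeholder quote payload
```

Because validation happens in your code, the model can never mark a quote complete on its own.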
7. Why You Should NOT Fine-Tune
Fine-tuning:
- Hard-codes behavior you actually want flexible
- Makes intent detection worse, not better
- Breaks when business rules change
- Costs more and slows iteration
Fine-tuning is for stable style, tone, or narrow high-volume classification tasks, not for evolving business logic.
You don't need it here.
8. Recommended Final Stack
Use this:
- Responses API
- System prompt defining:
  - intent rules
  - locked defaults
  - minimal-question policy
- Structured Outputs (JSON schema)
- Optional function/tool calls for validation
Do not use: fine-tuning.
Bottom Line (Blunt Take)
If you lock the defaults, let the system prompt classify intent silently, ask only for fields that actually block the quote, and enforce the final JSON with Structured Outputs, you'll get a clean, deterministic quoting assistant that feels natural and doesn't interrogate the user.