Expanding on what @PaulBellow said, there’s only so much you can do in terms of keeping the model “on rails.”
If you absolutely need to keep it in check, the most reliable way is to insert moderation layers between the user and the model, on both the incoming and outgoing sides.
Essentially, you’d evaluate the incoming prompt to ensure it’s likely to elicit only a philosophy-related response. If not, intercept the prompt (don’t even send it to the model) and return some sort of default message advising the user to keep the conversation on topic.
Then, as a failsafe against some sort of “jailbreak,” you’d check the response on the way out to ensure it’s appropriate and on topic. If it is not, you respond with the same advice to keep it on topic (possibly with a warning), and you simply wouldn’t include the off-topic generated response in the future context.
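The output side can be sketched the same way. Again, the keyword check is a naive placeholder for a real classifier, and the function names are made up; the point is that an off-topic reply is replaced with a warning and kept out of the conversation history:

```python
# Toy output failsafe: if the generated reply drifts off topic, show a
# canned warning instead and keep the off-topic text out of future
# context. is_on_topic() is a naive placeholder for a real classifier.

WARNING = ("Let's keep things philosophical. "
           "Off-topic requests will be ignored.")

def is_on_topic(text: str) -> bool:
    hints = ("philosoph", "ethic", "metaphys", "epistem", "virtue")
    t = text.lower()
    return any(h in t for h in hints)

def finalize_turn(history: list[dict], user_msg: str,
                  model_reply: str) -> str:
    """Append the turn to history, substituting a warning for an
    off-topic reply so it never pollutes future context."""
    shown = model_reply if is_on_topic(model_reply) else WARNING
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": shown})
    return shown
```

Because the warning (not the off-topic text) is what lands in `history`, a successful jailbreak can’t compound itself across turns.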
In addition to the system message, you could also prepend a preface to every user prompt telling the model to respond only in a philosophical context, and add the same instruction at the end of the prompt, sandwiching the user’s input between the caveat that it must only discuss philosophy.
With a good system message, bi-directional filtering, and a sandwiched user prompt, you should be able to keep the model locked in its lane.
Thanks for the help! I’ve tried a lot of questions so far and ChatGPT answers them all.
I’m using gpt-3.5-turbo with only the required parameters for the Chat Completions API: model and messages (each message having a role and content).