Yes — it turns out that with the 3.5-turbo models, while a very small top_p setting does lock in the top token more reliably than actually setting temperature to 0, there is still non-determinism in the logits themselves, so the "top" token can change rank partway through long outputs.
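A quick way to see this for yourself is to fire the same prompt repeatedly with a tiny top_p and compare the completions. Here is a minimal sketch (not my exact script) assuming the openai Python SDK (v1+) with an API key in the environment; the prompt and run count are placeholders:

```python
# Minimal sketch: probe determinism of gpt-3.5-turbo-instruct with a tiny top_p.
# Assumes the openai Python SDK (v1+) and OPENAI_API_KEY set in the environment.
from collections import Counter
from openai import OpenAI

client = OpenAI()

PROMPT = "Write a 200-word story about a lighthouse keeper."  # placeholder prompt
RUNS = 10  # number of identical calls to compare

texts = []
for _ in range(RUNS):
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=PROMPT,
        max_tokens=256,
        top_p=1e-9,   # tiny nucleus: only the single most-likely token should survive
        logprobs=2,   # also return top-2 logprobs so near-ties between tokens are visible
    )
    texts.append(resp.choices[0].text)

# If the model were deterministic, every run would produce an identical string;
# any split in the counts below shows the logits jittered between calls.
for text, count in Counter(texts).most_common():
    print(f"{count}/{RUNS} runs: {text[:60]!r}...")
```

With logprobs returned, you can also walk token-by-token to the first position where two runs diverge and see how close the top two logprobs were there.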
I did a thorough investigation here, using the gpt-3.5-turbo-instruct model: