Why does Codex not improve its sampling strategy by incorporating a syntax check of some sort, i.e., prohibiting (sequences of) tokens that cannot be generated without breaking syntactic validity?
Reasons I could come up with: the programming language may be unknown; syntax can't be checked while the code is incomplete; syntax errors from Codex may already be rare enough not to matter.
Would it be as simple as running the top-k completions through a language server to filter out snippets with errors? Maybe other checks could also be run? Maybe you could even get Codex to auto-generate unit tests for the completions and see which ones pass?
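As a minimal sketch of the post-hoc filtering idea (assuming the completions are Python, and using the standard-library `ast` parser as a stand-in for a full language server; `filter_completions` is a hypothetical helper, not anything Codex exposes):

```python
import ast

def syntactically_valid(snippet: str) -> bool:
    """Return True if the snippet parses as Python source."""
    try:
        ast.parse(snippet)
        return True
    except SyntaxError:
        return False

def filter_completions(completions: list[str]) -> list[str]:
    """Keep only candidate completions that pass the syntax check."""
    return [c for c in completions if syntactically_valid(c)]

# Two hypothetical top-k candidates: one parses, one is missing a colon.
candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",
]
valid = filter_completions(candidates)  # only the first survives
```

Note this only works on completions that form a complete parse unit, which is exactly the "syntax uncheckable if not fully completed" objection above; a real system would need an error-tolerant or incremental parser.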
With GPT-3, the model could learn concepts from one language and apply them in another.
While I think this also holds for Codex, newer languages like Julia get completions that don't even compile, so fine-tuning might help.
However, Codex can also be used for language transfer (e.g. Python to Ruby), so per-language fine-tuning, besides maybe not being scalable, would also lose some of the cross-language capabilities of the general model.