Why doesn't Codex use syntax-aware sampling to improve on nucleus sampling?

As described in Chen et al. 2021:

We use nucleus sampling (Holtzman et al., 2020) with top-p = 0.95 for all sampling evaluation in this work

For reference: Holtzman et al., 2020
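For readers who haven't seen the method, here is a minimal sketch of nucleus (top-p) sampling over a toy next-token distribution; the token probabilities are made up for illustration, not from Codex:

```python
import random

def nucleus_sample(probs: dict[str, float], p: float = 0.95) -> str:
    # Sort tokens by probability, keep the smallest set whose cumulative
    # mass reaches p (the "nucleus"), renormalize, and sample from it.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for tok, pr in items:
        kept.append((tok, pr))
        mass += pr
        if mass >= p:
            break
    total = sum(pr for _, pr in kept)
    r = random.random() * total
    for tok, pr in kept:
        r -= pr
        if r <= 0:
            return tok
    return kept[-1][0]  # guard against floating-point rounding
```

With a small p the nucleus can shrink to a single token, which makes the behavior easy to check: `nucleus_sample({"a": 0.9, "b": 0.05, "c": 0.05}, p=0.5)` always returns `"a"`.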

Why doesn't Codex improve its sampling strategy by incorporating a syntax check of some sort, i.e., prohibiting (sequences of) tokens that cannot be generated without breaking syntactic validity?
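As a rough illustration of what such a check could look like, here is a hypothetical sketch that filters nucleus-sampling candidates with a cheap prefix check (bracket balance); a real syntax-aware sampler would need an incremental parser, since most languages can't be fully validated on a partial prefix:

```python
def brackets_ok(prefix: str) -> bool:
    # A prefix check: mismatched closing brackets make the prefix invalid.
    # Unclosed brackets are fine, since the code may still be completed.
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in prefix:
        if ch in '([{':
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack[-1] != pairs[ch]:
                return False
            stack.pop()
    return True

def filter_candidates(prefix: str, candidates: list[str]) -> list[str]:
    # Keep only candidate next tokens whose extension stays plausibly valid.
    return [tok for tok in candidates if brackets_ok(prefix + tok)]
```

For example, after the prefix `"f(x"` the candidate `"]"` would be masked out while `")"` and `" + 1"` survive.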

Reasons I could come up with: the programming language may be unknown, syntax can't be checked on incomplete code, and Codex's syntax errors may be rare enough not to matter.


Would it be as simple as running the top-k completions through a language server to filter out snippets with errors? Maybe other checks could also be run? Maybe you could even get Codex to auto-generate unit tests for the completions to see which pass?
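For Python specifically, a crude stand-in for the language-server idea is to keep only completions that parse; `ast.parse` here is just a placeholder for a real diagnostic pass (it catches syntax errors but not type or name errors):

```python
import ast

def filter_by_syntax(completions: list[str]) -> list[str]:
    # Keep only completed snippets that parse as valid Python.
    good = []
    for code in completions:
        try:
            ast.parse(code)
            good.append(code)
        except SyntaxError:
            pass
    return good
```

This only works on whole snippets, which echoes the objection above: a mid-generation prefix usually won't parse, so a post-hoc filter over k finished samples is easier than constraining each token.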


Hmm… a fine-tuned Codex, maybe :thinking:
Because I need some VHDL done.

With GPT-3, the model could learn concepts in one language and apply them in another.
While I think this also holds for Codex, newer languages like Julia get uncompilable completions, so fine-tuning might help.
However, Codex can also be used for language-to-language transfer (e.g. Python to Ruby), so fine-tuning a separate model per language, besides being hard to scale, would also lose some of the cross-language capabilities of the overall Codex model.