Codex DaVinci acts weird instead of summarising

The larger Codex model is acting weird; instead of summarising/explaining the code it seems to be quoting or paraphrasing the next functions from the training corpus:

Our code example is:

def split_phylogeny(p, level="s"):
    """
    Return either the full or truncated version of a QIIME-formatted taxonomy string.

    :type p: str
    :param p: A QIIME-formatted taxonomy string: k__Foo; p__Bar; ...

    :type level: str
    :param level: The different level of identification are kingdom (k), phylum (p),
                  class (c),order (o), family (f), genus (g) and species (s). If level is
                  not provided, the default level of identification is species.

    :rtype: str
    :return: A QIIME-formatted taxonomy string up to the classification given
            by param level.
    """
    level = level+"__"
    result = p.split(level)
    return result[0]+level+result[1].split(";")[0]

And now our original docstring is:

    Return either the full or truncated version of a QIIME-formatted taxonomy string.

    :type p: str
    :param p: A QIIME-formatted taxonomy string: k__Foo; p__Bar; ...

    :type level: str
    :param level: The different level of identification are kingdom (k), phylum (p),
                  class (c),order (o), family (f), genus (g) and species (s). If level is
                  not provided, the default level of identification is species.

    :rtype: str
    :return: A QIIME-formatted taxonomy string up to the classification given
            by param level.

Our Codex docstring is:


def get_rdp_lineage(rdp_id, rdp_file):
    """
    Return the lineage associated to a given RDP id.

    :type rdp_id: str
    :param rdp_id: The RDP id to retrieve the lineage from.

    :type rdp_file: str
    :param rdp_file: The path to the RDP file.

    :rtype: str
    :return: The lineage associated to the RDP id. 
    """
    with open(rdp_file, "r") as f:
        for line in f:
            if line.startswith(rdp_id):
                return line.split("\t")[1].strip() 
    return ""


def get_rdp_taxonomy(rdp_id, rdp_file):
    """
    Return the taxonomy associated to a given RDP id.

    :type rdp_id: str
    :param rdp_id: The RDP id to retrieve the taxonomy from.

    :type rdp_file: str
    :param. 

Is there a way to mitigate that?
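
One workaround I can imagine, purely as an untested sketch: end the prompt with an open docstring-style cue so that prose is the likeliest continuation, and then cut whatever comes back at the first sign of new code. The helper names, the wording of the cue and the list of stop strings below are all my own assumptions, and nothing here calls Codex itself; I have not verified that this changes the model's behaviour.

```python
# Hypothetical helpers; the cue text and stop strings are assumptions, not a documented recipe.
STOP_SEQUENCES = ('"""', "\ndef ", "\nclass ")


def build_summary_prompt(code):
    """Append an open docstring-style cue after the code to be summarised."""
    return code.rstrip() + '\n\n"""\nHere is what the function above does:\n'


def truncate_completion(completion, stop_sequences=STOP_SEQUENCES):
    """Cut the completion at the earliest stop sequence, if one occurs."""
    cut = len(completion)
    for stop in stop_sequences:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut].strip()


if __name__ == "__main__":
    # A completion that starts with a new function is reduced to an empty string,
    # which at least makes the failure easy to detect and retry.
    print(repr(truncate_completion("\n\ndef get_rdp_lineage(rdp_id, rdp_file):")))
```

If the API in use accepts stop sequences, the same strings could presumably be passed there instead of truncating afterwards.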

Now if it were up to me, I’d just train a great big huge (m)T5, feed it the whole of GitHub, and then try fine-tuning it for lots of nice tasks like summarisation, translation and so on. But Codex is presumably GPT(-3)-based, and as far as I can tell it shouldn’t behave like that.