Sequence prediction prompts

I have a set of example sequences of fixed length n, built from the 10 digits 0-9 (each element of a sequence is one of these digits).
1 2 4 6 7 9 7
1 9 4 5 6 8 0

I would like to fine-tune a model so that, given a sequence of length k < n, it predicts the next number in the sequence.
How do I go about creating the prompts for fine-tuning and prediction?


I used the Python library transformers:

pip install transformers

This example uses Hugging Face's transformers library to fine-tune and predict with a pre-trained GPT-2 model. Make sure to install the library first.

import random
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPT2Config
from transformers import TextDataset, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments

# Parameters
k = 3
n = 7
num_examples = 1000
epochs = 5

# Generate random sequences
sequences = [' '.join([str(random.randint(0, 9)) for _ in range(n)]) for _ in range(num_examples)]

# Create input-output pairs
def create_pairs(sequence, k):
    return [(sequence[i:i+k], sequence[i+k]) for i in range(len(sequence) - k)]

pairs = [create_pairs(seq.split(), k) for seq in sequences]
pairs = [pair for sublist in pairs for pair in sublist]
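To make the pair format concrete: with n = 7 and k = 3, each sequence yields n - k = 4 (context, target) pairs. A quick standalone check using the same create_pairs logic as above:

```python
def create_pairs(sequence, k):
    # Slide a window of k digits over the sequence; the digit that
    # follows each window is the prediction target.
    return [(sequence[i:i+k], sequence[i+k]) for i in range(len(sequence) - k)]

seq = '1 2 4 6 7 9 7'.split()
pairs = create_pairs(seq, 3)
# First pair: context ['1', '2', '4'] with target '6'
```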

# Prepare dataset
train_data = '\n'.join([f'input: {" ".join(pair[0])} output: {pair[1]}' for pair in pairs])

# Save train_data to a file
with open('train_data.txt', 'w') as f:
    f.write(train_data)

# Load pre-trained GPT-2 model and tokenizer
model_name = 'gpt2'
config = GPT2Config.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name, config=config)

# Prepare data for training
train_dataset = TextDataset(tokenizer=tokenizer, file_path='train_data.txt', block_size=128)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Configure training arguments
training_args = TrainingArguments(
    output_dir='./results',            # where checkpoints are written
    num_train_epochs=epochs,
    per_device_train_batch_size=8,     # illustrative value; tune for your hardware
    save_steps=500,
    save_total_limit=2,
)

# Create Trainer and fine-tune the model
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()

# Function for predicting next number in a sequence
def predict_next_number(sequence, tokenizer, model):
    input_str = f'input: {sequence} output: '
    input_ids = tokenizer.encode(input_str, return_tensors='pt')
    output = model.generate(input_ids, max_new_tokens=1, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
    decoded_output = tokenizer.decode(output[0])
    next_number = decoded_output.strip().split()[-1]
    return next_number

# Test the prediction function
input_sequence = '3 8 1'
next_number = predict_next_number(input_sequence, tokenizer, model)
print(f"Input sequence: {input_sequence}\nPredicted next number: {next_number}")

This script generates random sequences of length n with digits from 0 to 9, creates input-output pairs, fine-tunes a GPT-2 model, and predicts the next number given a context of length k. Note that this is a simple example; you may need to adjust the parameters or preprocess the data differently to get better results.
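After fine-tuning, you would probably also want to measure how often the model gets the next digit right on held-out sequences. Here is a minimal evaluation sketch (evaluate and predict_fn are illustrative names; pass in the predict_next_number function from above, wrapped to supply the tokenizer and model):

```python
def evaluate(sequences, k, predict_fn):
    # Count next-digit hits over every (context, target) pair in the
    # held-out sequences. predict_fn takes a space-separated context
    # string of k digits and returns the predicted digit as a string.
    correct = total = 0
    for seq in sequences:
        digits = seq.split()
        for i in range(len(digits) - k):
            context = ' '.join(digits[i:i+k])
            target = digits[i+k]
            if predict_fn(context) == target:
                correct += 1
            total += 1
    return correct / total if total else 0.0
```

For the model above you would call it as, e.g., evaluate(held_out_sequences, k, lambda s: predict_next_number(s, tokenizer, model)).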