@nelson
76 sec for 1k tokens using Llama 2, 13 billion parameters @ 4-bit quantization
./main -t 16 -m ./models/llama2/llama-2-13b-chat.ggmlv3.q4_0.bin --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 --in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -p "[INST] <<SYS>> You are a helpful, respectful and honest assistant. <</SYS>> Write a story about llamas. [/INST]"
llama_print_timings: eval time = 50167.78 ms / 645 runs ( 77.78 ms per token, 12.86 tokens per second)
llama_print_timings: eval time = 57875.98 ms / 762 runs ( 75.95 ms per token, 13.17 tokens per second)
llama_print_timings: eval time = 38968.11 ms / 510 runs ( 76.41 ms per token, 13.09 tokens per second)
84 sec for 1k tokens using Llama 2, 13 billion parameters @ 8-bit quantization
./main -t 16 -m ./models/llama2/llama-2-13b-chat.ggmlv3.q8_0.bin --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 --in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -p "[INST] <<SYS>> You are a helpful, respectful and honest assistant. <</SYS>> Write a story about llamas. [/INST]"
llama_print_timings: eval time = 41616.74 ms / 494 runs ( 84.24 ms per token, 11.87 tokens per second)
llama_print_timings: eval time = 37273.97 ms / 444 runs ( 83.95 ms per token, 11.91 tokens per second)
llama_print_timings: eval time = 55865.34 ms / 652 runs ( 85.68 ms per token, 11.67 tokens per second)
187 sec for 1k tokens using Llama 2, 70 billion parameters @ 4-bit quantization
./main -t 16 -gqa 8 -m ./models/llama2/llama-2-70b-chat.ggmlv3.q4_0.bin --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 --in-prefix-bos --in-prefix ' [INST] ' --in-suffix ' [/INST]' -i -p "[INST] <<SYS>> You are a helpful, respectful and honest assistant. <</SYS>> Write a story about llamas. [/INST]"
llama_print_timings: eval time = 88664.63 ms / 471 runs ( 188.25 ms per token, 5.31 tokens per second)
llama_print_timings: eval time = 87636.06 ms / 470 runs ( 186.46 ms per token, 5.36 tokens per second)
llama_print_timings: eval time = 105132.86 ms / 564 runs ( 186.41 ms per token, 5.36 tokens per second)
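The "sec for 1k tokens" figures above come straight from the timing lines: milliseconds per token is numerically the same as seconds per 1,000 tokens. Here is a minimal Python sketch of that arithmetic, assuming the log lines look like the ones quoted in this post. The function name `seconds_per_1k_tokens` and the regex are my own illustration and are not part of llama.cpp.

```python
import re

# Matches the generation "eval time" lines as quoted in this post.
# Anchoring on "llama_print_timings:" followed directly by "eval time"
# skips "prompt eval time" lines, which measure prompt processing instead.
TIMING_RE = re.compile(
    r"llama_print_timings:\s+eval time\s*=\s*([\d.]+)\s*ms\s*/\s*(\d+)\s*runs"
)

def seconds_per_1k_tokens(log_lines):
    """Pool all runs and return seconds needed to generate 1,000 tokens."""
    total_ms = 0.0
    total_tokens = 0
    for line in log_lines:
        match = TIMING_RE.search(line)
        if match:
            total_ms += float(match.group(1))
            total_tokens += int(match.group(2))
    if total_tokens == 0:
        raise ValueError("no 'eval time' lines found")
    # ms per token * 1000 tokens / 1000 ms per second == ms per token
    return total_ms / total_tokens

if __name__ == "__main__":
    # The three 13B q4_0 runs from this post.
    q4_13b = [
        "llama_print_timings: eval time = 50167.78 ms / 645 runs ( 77.78 ms per token, 12.86 tokens per second)",
        "llama_print_timings: eval time = 57875.98 ms / 762 runs ( 75.95 ms per token, 13.17 tokens per second)",
        "llama_print_timings: eval time = 38968.11 ms / 510 runs ( 76.41 ms per token, 13.09 tokens per second)",
    ]
    print(f"{seconds_per_1k_tokens(q4_13b):.1f} sec per 1k tokens")
```

Pooling the runs weights each one by its token count, so the result can differ slightly from a plain average of the three "ms per token" values. For the runs above it lands within about a second of the headline numbers.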
Note: For whatever reason, the 70B model only uses 2 CPUs and 1.4 GB of memory, while the 13B model uses 16 CPUs and 2.5 GB of memory.
I pulled the C++ code a few days ago, and it was recently updated to support the 70B model. Hopefully, over time, they can speed up the 70B model so it uses more of my computer's resources.
Here is a sample output using the 70B model at 4-bit quantization:
Once upon a time, in the rolling hills of the Andes, there lived a group of llamas. These llamas were known for their soft, warm fur and their gentle dispositions. They spent their days roaming the green fields, munching on grass and enjoying the fresh mountain air.
One llama in particular, named Luna, was very curious. She loved to explore the surrounding hills and valleys, always looking for new adventures. One day, while wandering through a dense thicket of trees, Luna stumbled upon a hidden cave.
Inside the cave, Luna found a treasure trove of glittering crystals and shiny rocks. She had never seen anything like it before and was immediately captivated. She spent hours admiring the sparkling gems and even tried to imitate their colors by twirling her fur in different ways.
As the sun began to set, Luna reluctantly left the cave and returned to her herd. But she couldn’t stop thinking about the crystals and rocks she had seen. She told all her friends about her discovery, but they didn’t believe her.
“There’s no way you found a cave full of treasure,” said one llama. “You must have been seeing things.”
Luna was determined to prove them wrong. The next day, she led the herd to the cave, and they were all amazed by its beauty. Together, they explored every nook and cranny, marveling at the sparkling gems and shiny rocks.
From that day on, the llamas made regular visits to the cave, always discovering new hidden treasures. And Luna, the curious llama who had found it all, was hailed as a hero by her herd. She had shown them that even in their own backyard, there was still so much to explore and discover.
As the years went by, the llamas continued to visit the cave, and it became a special place for them. They would go there to celebrate special occasions, like birthdays and anniversaries, and they would always leave an offering of grass or leaves as a thank you to the cave for its treasures.
And Luna, well, she never lost her sense of curiosity and wonder. She continued to explore the world around her, always looking for new adventures and hidden treasures. But she never forgot the magical cave that had started it all, and she made sure to visit it often, remembering the day she discovered its secrets and the joy it had brought to her and her herd.