How does temperature impact next token prediction in LLMs?

TLDR
1. At a temperature of 1, the probability values are the same as those derived from the standard softmax function.
2. Raising the temperature inflates the probabilities of the less likely tokens, thereby broadening the range of potential candidates (or diversity) for the model’s next token prediction.
3. Lowering the temperature, on the other hand, pushes the probability of the most likely token toward 1.0, boosting the model’s confidence. As the temperature approaches 0, sampling becomes effectively greedy, removing nearly all uncertainty from the model’s choice.


Introduction
Large Language Models (LLMs) are versatile generative models suited for a wide array of tasks. They can produce consistent, repeatable outputs or generate creative content by placing unlikely words together. The “temperature” setting allows users to fine-tune the model’s output, controlling the degree of predictability.

Let’s take a hypothetical example to understand the impact of temperature on the next token prediction.

We asked an LLM to complete the sentence, “This is a wonderful _____.” Let’s assume the potential candidate tokens are:

|   token    | logit |
|------------|-------|
| day        |   5.0 |
| space      |   2.2 |
| furniture  |   2.0 |
| experience |   4.5 |
| problem    |   3.0 |
| challenge  |   2.7 |

The logits are passed through a softmax function, which converts them into a probability distribution: non-negative values that sum to one, with one probability estimate per token.

The standard softmax function is defined as:

softmax(x)_i = exp(x_i) / Σ_{j=1}^{n} exp(x_j)

Let’s calculate the probability estimates in Python.

import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider


def softmax(logits):
    # Note: np.exp can overflow for large logits; a numerically
    # stable variant is used later in softmax_with_temperature.
    exps = np.exp(logits)
    return exps / np.sum(exps)


data = {
    "tokens": ["day", "space", "furniture", "experience", "problem", "challenge"],
    "logits": [5, 2.2, 2.0, 4.5, 3.0, 2.7]
}
df = pd.DataFrame(data)
df['probabilities'] = softmax(df['logits'].values)
df
| No. |   tokens   | logits | probabilities |
|-----|------------|--------|---------------|
|   0 | day        |    5.0 |      0.512106 |
|   1 | space      |    2.2 |      0.031141 |
|   2 | furniture  |    2.0 |      0.025496 |
|   3 | experience |    4.5 |      0.310608 |
|   4 | problem    |    3.0 |      0.069306 |
|   5 | challenge  |    2.7 |      0.051343 |
ax = sns.barplot(x="tokens", y="probabilities", data=df)
ax.set_title('Softmax Probability Estimates')
ax.set_ylabel('Probability')
ax.set_xlabel('Tokens')
plt.xticks(rotation=45)
for bar in ax.patches:
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(), f'{bar.get_height():.2f}',
            ha='center', va='bottom', fontsize=10, rotation=0)
plt.show()

The softmax function with temperature is defined as follows:

softmax_T(x)_i = exp(x_i / T) / Σ_{j=1}^{n} exp(x_j / T)

where T is the temperature, x_i is the i-th component of the input vector (the logits), and n is the number of components in the vector.

def softmax_with_temperature(logits, temperature):
    if temperature <= 0:
        temperature = 1e-10  # Prevent division by zero or negative temperatures
    scaled_logits = logits / temperature
    exps = np.exp(scaled_logits - np.max(scaled_logits))  # Numerical stability improvement
    return exps / np.sum(exps)


def plot_interactive_softmax(temperature):
    probabilities = softmax_with_temperature(df['logits'], temperature)
    plt.figure(figsize=(10, 5))
    bars = plt.bar(df['tokens'], probabilities, color='blue')
    plt.ylim(0, 1)
    plt.title(f'Softmax Probabilities at Temperature = {temperature:.2f}')
    plt.ylabel('Probability')
    plt.xlabel('Tokens')
    # Add text annotations
    for bar, probability in zip(bars, probabilities):
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2, yval, f"{probability:.2f}", ha='center', va='bottom', fontsize=10)
    plt.show()

interactive_plot = interactive(plot_interactive_softmax, temperature=FloatSlider(value=1, min=0, max=2, step=0.01, description='Temperature'))
interactive_plot
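If ipywidgets is not available (for example, outside a notebook), the same comparison can be made non-interactively. The sketch below reuses the logits from the table above and prints the full distribution at a few temperatures; the temperature values chosen here are illustrative, not from the original post:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Clamp to a tiny positive value to avoid division by zero
    temperature = max(temperature, 1e-10)
    scaled = np.asarray(logits, dtype=float) / temperature
    exps = np.exp(scaled - np.max(scaled))  # numerical stability
    return exps / np.sum(exps)

tokens = ["day", "space", "furniture", "experience", "problem", "challenge"]
logits = [5, 2.2, 2.0, 4.5, 3.0, 2.7]

for T in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    print(f"T={T}: " + ", ".join(f"{t}={p:.2f}" for t, p in zip(tokens, probs)))
```

At T = 1.0 the printed values match the probabilities in the DataFrame above; at T = 0.5 the distribution sharpens toward “day”, and at T = 2.0 it flattens.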

At T = 1,

dividing the logits by 1 leaves them unchanged, so the probabilities are identical to those derived from the standard softmax function.

At T > 1,

dividing the logits by a temperature greater than 1 shrinks the gaps between them, which inflates the probabilities of the less likely tokens and flattens the distribution. This broadens the range of plausible candidates (or diversity) for the model’s next token prediction.

At T < 1,

dividing the logits by a temperature below 1 widens the gaps between them, pushing the probability of the most likely token toward 1.0 and boosting the model’s confidence. As the temperature approaches 0, sampling becomes effectively greedy, removing nearly all uncertainty from the model’s choice.
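One way to quantify this effect (an illustrative sketch, not part of the original post) is to measure the Shannon entropy of the resulting distribution: entropy near 0 means a near-deterministic choice, while higher entropy means more spread-out, diverse sampling.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    temperature = max(temperature, 1e-10)  # guard against T <= 0
    scaled = np.asarray(logits, dtype=float) / temperature
    exps = np.exp(scaled - np.max(scaled))  # numerical stability
    return exps / np.sum(exps)

def entropy(probs):
    # Shannon entropy in bits; 0 bits means a fully deterministic choice
    probs = probs[probs > 0]
    return float(-np.sum(probs * np.log2(probs)))

logits = [5, 2.2, 2.0, 4.5, 3.0, 2.7]
for T in (0.1, 0.5, 1.0, 2.0):
    print(f"T={T}: entropy = {entropy(softmax_with_temperature(logits, T)):.3f} bits")
```

For these logits the entropy grows steadily with temperature, confirming that low temperatures concentrate probability mass on the top token while high temperatures spread it out.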

Conclusion

LLMs leverage the temperature parameter to offer flexibility in their predictions. At a temperature of 1, the model samples from the unmodified softmax distribution. Increasing the temperature introduces greater diversity by amplifying less likely tokens. Conversely, decreasing the temperature makes the predictions more focused, concentrating probability on the most likely token and reducing uncertainty. This adaptability allows users to tailor LLM outputs to a wide array of tasks, striking a balance between creative exploration and deterministic output.

Unless otherwise noted, all images are by the author.


How does temperature impact next token prediction in LLMs? was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
