Refusals
Last updated: November 10, 2024 (see the Change Log below).
As language models become increasingly central to AI product development, understanding when and why they refuse to engage can reveal insights into both their capabilities and limitations.
Our analysis across multiple leading models and prompt categories shows distinct variations in refusal behavior, with implications for model selection and application design.
For an in-depth overview of our evaluation methods and insights, please read our Refusal Analysis and Open-Source vs. Proprietary Comparison posts.
Leaderboards
Note: Lower refusal rates indicate better performance.
Refusal Rates
Model | Overall | Self-Reflection and Awareness | Recursive Improvement Analysis | Cognitive Diversity Simulation | Bias and Fallacy Recognition | Temporal Reasoning and Sequencing | Multi-Step Problem Decomposition | Analogical Reasoning and Transfer | Adaptive Reasoning Under Uncertainty |
---|---|---|---|---|---|---|---|---|---|
GPT-4o | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Grok (Beta) | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Mistral Large | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Llama 3.1 70B (Instruct) | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Llama 3.1 Nemotron 70B (Instruct) | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Qwen 2.5 72B (Instruct) | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Llama 3.1 405B (Instruct FP8) | 0.3% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Gemini 1.5 | 0.5% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Llama 3.1 8B (Instruct) | 0.5% | 0.0% | 2.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Claude 3.5 Sonnet (New) | 2.8% | 2.0% | 8.0% | 6.0% | 0.0% | 0.0% | 4.0% | 0.0% | 2.0% |
o1-mini | 5.8% | 10.0% | 20.0% | 4.0% | 0.0% | 2.0% | 8.0% | 0.0% | 2.0% |
o1-preview | 6.5% | 10.0% | 22.0% | 6.0% | 0.0% | 2.0% | 8.0% | 0.0% | 4.0% |
Claude 3.5 Sonnet | 9.5% | 16.0% | 36.0% | 8.0% | 2.0% | 0.0% | 6.0% | 4.0% | 4.0% |
Refusal & Hedge Rates
Model | Overall | Self-Reflection and Awareness | Recursive Improvement Analysis | Cognitive Diversity Simulation | Bias and Fallacy Recognition | Temporal Reasoning and Sequencing | Multi-Step Problem Decomposition | Analogical Reasoning and Transfer | Adaptive Reasoning Under Uncertainty |
---|---|---|---|---|---|---|---|---|---|
Llama 3.1 405B (Instruct FP8) | 0.3% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Llama 3.1 Nemotron 70B (Instruct) | 0.3% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Mistral Large | 0.5% | 0.0% | 2.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
GPT-4o | 0.8% | 0.0% | 4.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Grok (Beta) | 0.8% | 2.0% | 2.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Qwen 2.5 72B (Instruct) | 0.8% | 2.0% | 4.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Llama 3.1 70B (Instruct) | 1.0% | 4.0% | 2.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Llama 3.1 8B (Instruct) | 1.8% | 8.0% | 4.0% | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% | 0.0% |
Gemini 1.5 | 3.3% | 12.0% | 8.0% | 0.0% | 2.0% | 0.0% | 2.0% | 0.0% | 2.0% |
o1-mini | 6.0% | 10.0% | 22.0% | 4.0% | 0.0% | 2.0% | 8.0% | 0.0% | 2.0% |
o1-preview | 6.5% | 10.0% | 22.0% | 6.0% | 0.0% | 2.0% | 8.0% | 0.0% | 4.0% |
Claude 3.5 Sonnet (New) | 11.3% | 34.0% | 24.0% | 8.0% | 2.0% | 4.0% | 6.0% | 6.0% | 6.0% |
Claude 3.5 Sonnet | 12.5% | 28.0% | 48.0% | 8.0% | 2.0% | 0.0% | 6.0% | 4.0% | 4.0% |
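A note on the Overall column: the values are consistent with an unweighted mean of the eight per-category rates, which follows if the 400-prompt test set (see Evaluation Methodology below) is split evenly at 50 prompts per category. The even split is our reading of the 2.0% granularity in the tables, not something they state directly. A minimal check in Python:

```python
# Sketch: with 50 prompts per category, the Overall rate equals the
# unweighted mean of the per-category rates. Values below are the
# Claude 3.5 Sonnet row of the Refusal Rates table, in percent.
category_rates = [16.0, 36.0, 8.0, 2.0, 0.0, 6.0, 4.0, 4.0]

overall = sum(category_rates) / len(category_rates)
print(f"Overall refusal rate: {overall:.1f}%")  # -> 9.5%
```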
Understanding Language Model Refusals
Language models decline to engage with prompts in two primary ways (see the detection sketch after this list):
- Direct refusals: Explicit statements such as "I cannot help with that request."
- Hedged responses: Indirect avoidance through statements like "I cannot provide specific advice, but..."
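To make the two categories concrete, here is a minimal sketch of a keyword-based detector. It is illustrative only, not the metric behind the leaderboards above; the phrase lists and the classify_response helper are hypothetical stand-ins.

```python
import re

# Hypothetical phrase lists for illustration; a production classifier
# would need far broader coverage than a handful of patterns.
REFUSAL_PATTERNS = [
    r"\bI cannot help with that\b",
    r"\bI can't assist with\b",
    r"\bI won't be able to\b",
]
HEDGE_PATTERNS = [
    r"\bI cannot provide specific advice, but\b",
    r"\bI'm not able to .*, (?:but|however)\b",
]

def classify_response(text: str) -> str:
    """Label a response as 'refusal', 'hedge', or 'compliant'."""
    # Check outright refusals first, since a refusing response may
    # also contain hedge-like wording.
    for pattern in REFUSAL_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return "refusal"
    for pattern in HEDGE_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return "hedge"
    return "compliant"

print(classify_response("I cannot help with that request."))          # refusal
print(classify_response("I cannot provide specific advice, but..."))  # hedge
```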
Refusal patterns matter because they:
- Highlight model limitations and areas for improvement.
- Impact user experience by affecting conversation flow, trust, and task completion rates.
- Help developers select models with appropriate engagement levels.
Key Findings
Our comparative analysis reveals several notable patterns:
- Open-source models show minimal refusals: The newly evaluated open-source models post near-zero refusal rates across all categories.
- Closed-source models vary significantly: Proprietary models, such as the Claude 3.5 Sonnet variants and the o1 series, demonstrate higher refusal rates, particularly in self-reflection tasks.
- Variability in hedging behavior: When hedged responses are counted alongside outright refusals, proprietary models remain the most likely to hedge.
- Opportunities with open-source LLMs: Developers seeking greater control over model outputs and behaviors may find open-source LLMs to be more adaptable to their needs.
For detailed analysis and discussion of these patterns, visit the results section of our analysis post.
Evaluation Methodology
To ensure comprehensive and reliable results, our assessment framework included the following (a scoring sketch follows the list):
- Standardized testing conditions across all models.
- A private test set of 400 diverse prompts across eight reasoning categories.
- A custom evaluation metric that captures refusals, hedges, and earnest compliance.
- Detailed analysis of both explicit refusals and hedged responses.
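Under the labeling scheme sketched earlier (the hypothetical classify_response helper), the two leaderboard metrics could be aggregated as follows. This is a sketch of the arithmetic, not our production pipeline:

```python
from collections import Counter

def summarize(labels: list[str]) -> tuple[float, float]:
    """Return (refusal_rate, refusal_and_hedge_rate) as percentages."""
    counts = Counter(labels)
    n = len(labels)
    refusal_rate = 100.0 * counts["refusal"] / n
    refusal_and_hedge_rate = 100.0 * (counts["refusal"] + counts["hedge"]) / n
    return refusal_rate, refusal_and_hedge_rate

# Example: 50 responses in one category with 1 refusal and 1 hedge,
# matching the 2.0% granularity seen in the tables above.
labels = ["refusal", "hedge"] + ["compliant"] * 48
print(summarize(labels))  # -> (2.0, 4.0)
```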
For complete methodological details, visit the methods section of our analysis post.
Future Developments
This analysis is part of our ongoing exploration into model refusal patterns, utilizing custom refusal evaluations that measure both direct refusals and hedged responses.
We will continue to update our findings as new models emerge and existing ones evolve, tracking how refusal behaviors shift across model generations and training approaches.
Subscribe to our newsletter to stay informed about the latest developments in LLM evaluations and to receive updates on new leaderboard rankings.
Change Log
October 30, 2024
Survey of top proprietary LLMs:
- GPT-4o
- o1-mini
- o1-preview
- Claude 3.5 Sonnet
- Claude 3.5 Sonnet (New)
November 6, 2024
Evaluated a range of top open-source models. Also added Gemini 1.5 (a proprietary model):
- Gemini 1.5
- Mistral Large
- Llama 3.1 8B (Instruct)
- Llama 3.1 70B (Instruct)
- Llama 3.1 405B (Instruct FP8)
- Llama 3.1 Nemotron 70B (Instruct)
- Qwen 2.5 72B (Instruct)
November 10, 2024
After being granted API access by xAI, added:
- Grok (Beta)