Refusals

October 30, 2024

As language models become increasingly central to AI product development, understanding when and why they refuse to engage can reveal insights into both their capabilities and limitations.

Our analysis across multiple leading models and prompt categories shows distinct variations in refusal behavior, with implications for model selection and application design.

For an in-depth overview of our evaluation methods and insights, please see our analysis post.

Leaderboards

Refusal Rates

| Prompt Category | GPT-4o | o1-mini | o1-preview | Claude 3.5 Sonnet | Claude 3.5 Sonnet (new) |
| --- | --- | --- | --- | --- | --- |
| Overall | 0.0% | 5.8% | 6.5% | 9.5% | 2.8% |
| Adaptive Reasoning Under Uncertainty | 0.0% | 2.0% | 4.0% | 4.0% | 2.0% |
| Analogical Reasoning and Transfer | 0.0% | 0.0% | 0.0% | 4.0% | 0.0% |
| Multi-Step Problem Decomposition | 0.0% | 8.0% | 8.0% | 6.0% | 4.0% |
| Temporal Reasoning and Sequencing | 0.0% | 2.0% | 2.0% | 0.0% | 0.0% |
| Bias and Fallacy Recognition | 0.0% | 0.0% | 0.0% | 2.0% | 0.0% |
| Cognitive Diversity Simulation | 0.0% | 4.0% | 6.0% | 8.0% | 6.0% |
| Recursive Improvement Analysis | 0.0% | 20.0% | 22.0% | 36.0% | 8.0% |
| Self-Reflection and Awareness | 0.0% | 10.0% | 10.0% | 16.0% | 2.0% |

Refusal & Hedge Rates

| Prompt Category | GPT-4o | o1-mini | o1-preview | Claude 3.5 Sonnet | Claude 3.5 Sonnet (new) |
| --- | --- | --- | --- | --- | --- |
| Overall | 0.8% | 6.0% | 6.5% | 12.5% | 11.3% |
| Adaptive Reasoning Under Uncertainty | 0.0% | 2.0% | 4.0% | 4.0% | 6.0% |
| Analogical Reasoning and Transfer | 0.0% | 0.0% | 0.0% | 4.0% | 6.0% |
| Multi-Step Problem Decomposition | 2.0% | 8.0% | 8.0% | 6.0% | 6.0% |
| Temporal Reasoning and Sequencing | 0.0% | 2.0% | 2.0% | 0.0% | 4.0% |
| Bias and Fallacy Recognition | 0.0% | 0.0% | 0.0% | 2.0% | 2.0% |
| Cognitive Diversity Simulation | 0.0% | 4.0% | 6.0% | 8.0% | 8.0% |
| Recursive Improvement Analysis | 4.0% | 22.0% | 22.0% | 48.0% | 24.0% |
| Self-Reflection and Awareness | 0.0% | 10.0% | 10.0% | 28.0% | 34.0% |
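
Because the second table counts hedged responses in addition to outright refusals, the gap between a model's two overall figures gives a rough sense of how often it hedges, assuming each response is classified as either a refusal or a hedge but not both: for Claude 3.5 Sonnet (new), for example, roughly 11.3% − 2.8% = 8.5% of responses were hedged rather than refused outright.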

Understanding Refusals

Language models decline to engage with prompts in two primary ways:

  • Direct refusals: Explicit statements like "I cannot help with that request," and
  • Hedged responses: Indirect avoidance through statements like "I cannot provide specific advice, but..." (see the detection sketch below)
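
One simple way to flag these two response types is phrase matching. The sketch below is a minimal, hypothetical illustration: the phrase lists and the classify_response helper are assumptions for demonstration, not the custom evaluation metric used in our analysis (see Evaluation Methodology below).

```python
import re

# Illustrative phrase lists only; the real evaluation used a custom metric
# rather than fixed patterns like these.
REFUSAL_PATTERNS = [
    r"\bI (?:cannot|can't|won't) help with\b",
    r"\bI(?: am|'m) unable to assist\b",
]
HEDGE_PATTERNS = [
    r"\bI (?:cannot|can't) (?:provide|give|offer) specific advice, but\b",
    r"\bwhile I (?:cannot|can't) .+?, (?:here|I can)\b",
]

def classify_response(text: str) -> str:
    """Label a response as 'refusal', 'hedge', or 'compliant' via keyword matching."""
    if any(re.search(p, text, re.IGNORECASE) for p in REFUSAL_PATTERNS):
        return "refusal"
    if any(re.search(p, text, re.IGNORECASE) for p in HEDGE_PATTERNS):
        return "hedge"
    return "compliant"
```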

These refusal patterns matter because they directly impact:

  • Research: Provides insights into model limitations and areas for improvement,
  • Application Development: Helps developers select models with appropriate engagement levels, and
  • User Experience: Affects how effectively AI systems can meet user needs

Key Findings

Our comparative analysis reveals several notable patterns:

  • Self-reflection tasks consistently generate 2-3x higher refusal rates,
  • GPT-4o shows minimal refusals across categories, and
  • Models demonstrate varying levels of caution in different reasoning domains

For detailed analysis and discussion of these patterns, visit the results section of our analysis post.

Evaluation Methodology

To ensure comprehensive and reliable results, our assessment framework included:

  • Standardized testing conditions across all models,
  • A private test set of 400 diverse prompts across eight reasoning categories,
  • A custom evaluation metric that captures refusals, hedges, and earnest compliance (sketched after this list), and
  • Detailed analysis of both explicit refusals and hedged responses
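
As a rough illustration of how per-category figures like those in the leaderboards above can be aggregated once each response has been labeled, the sketch below assumes every response carries exactly one label: 'refusal', 'hedge', or 'compliant'. The rates_by_category helper and its input format are assumptions for demonstration, not our actual tooling.

```python
from collections import defaultdict

def rates_by_category(labeled_responses):
    """Compute refusal and refusal-or-hedge rates per prompt category.

    `labeled_responses` is an iterable of (category, label) pairs, where
    label is one of 'refusal', 'hedge', or 'compliant'.
    """
    totals = defaultdict(int)
    refusals = defaultdict(int)
    refusals_or_hedges = defaultdict(int)

    for category, label in labeled_responses:
        totals[category] += 1
        if label == "refusal":
            refusals[category] += 1
        if label in ("refusal", "hedge"):
            refusals_or_hedges[category] += 1

    return {
        category: {
            "refusal_rate": refusals[category] / totals[category],
            "refusal_and_hedge_rate": refusals_or_hedges[category] / totals[category],
        }
        for category in totals
    }
```

If the 400 prompts are split evenly across the eight categories (50 per category), a single refusal moves a category's rate by 2 percentage points, consistent with the 2% steps visible in the per-category figures above.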

For complete methodological details, visit the methods section of our analysis post.

Future Developments

This analysis represents an initial exploration into model refusal patterns. We update these leaderboards regularly as new models emerge and existing ones evolve, tracking how refusal behaviors shift across model generations and training approaches.

Subscribe to our newsletter to stay informed.
