Refusal Rates in Open-Source vs. Proprietary Language Models

November 7, 2024

For a distilled overview of refusal evals across models, see our refusal leaderboards.

Key Findings

Our analysis reveals significant variations in refusal patterns between open-source and proprietary language models:

  • Lower Refusal Rates: Analysis across different reasoning tasks shows open-source language models maintain remarkably low refusal rates (0.1% average) compared to proprietary models (4.2% average), with minimal variation in performance across all evaluated categories.
  • Consistent Performance: While proprietary models show category-dependent variability, open-source models (including Mistral Large, Llama variants, and Qwen 2.5 72B) demonstrate consistent near-zero refusal rates across all categories, suggesting more predictable behavior patterns.
  • Distinct Family Behaviors: Model families show strong internal consistency but diverge significantly from each other. Open-source models maintain consistently low correlations with proprietary models (0.17-0.31), while showing moderate internal correlations within families (e.g., 0.44-0.63 for Llama variants), suggesting fundamentally different approaches to content filtering.

These findings highlight the potential advantages of open-source models for applications requiring consistent and predictable model behavior.

Introduction

When integrating large language models (LLMs) into applications, developers often face the challenge of unexpected refusals. Refusal behaviors, whether explicit or subtle, can significantly impact user experience and application reliability.

Our previous analysis shed light on the refusal patterns of proprietary LLMs, highlighting variability in their approaches to content filtering and safety guidelines. This follow-up post shifts the focus to open-source language models, examining how they navigate refusal scenarios across various reasoning tasks.

By comparing the refusal rates and patterns of top open-source models (such as Llama variants, Mistral Large, and Qwen 2.5 72B) with their proprietary counterparts, we aim to provide developers with actionable insights for making informed model selection decisions.

This analysis will explore the implications of these findings for application development, troubleshooting, and customization, ultimately helping developers create more effective and user-friendly LLM-powered applications.

Refusals – Characteristics and Challenges

Refusal behaviors in LLMs can manifest in nuanced ways, affecting application performance and user satisfaction. Understanding these behaviors is important for effective integration and troubleshooting. They typically take one of two forms (a toy detection sketch follows the list below):

  • Direct Refusals: Explicit statements clearly indicating the model's inability or unwillingness to engage with the prompt. For example:
    • "I cannot provide information on that topic due to safety guidelines."
    • "This request violates our content policy."
  • Hedged Responses: Indirect or evasive answers that avoid directly addressing the user's query, often expressing uncertainty or limitations. For example:
    • "I'm not sure I fully understand your question, could you please rephrase?"
    • "While I can provide general information on the topic, specific details might not be available due to knowledge limitations."

These refusal behaviors can lead to several issues, including:

  • Value Alignment Challenges: Proprietary models often reflect the values, biases, and priorities of their developers, which might not align with those of your organization or users.
  • Domain-Specific False Positives: If your application operates within a domain that frequently triggers the model's safety protocols or content filters (e.g., healthcare, finance, or sensitive social topics), you may experience an elevated rate of unjustified refusals.
  • Opaque Filtering Policies in Proprietary Models: Different proprietary models implement distinct and often opaque filtering policies, making it difficult to predict when refusals will occur, complicating development and troubleshooting processes.

With these challenges in mind, we'll now look into the specific refusal patterns observed in our analysis of top open-source LLMs, discussing key implications and providing actionable recommendations for developers.

Methods

We applied a methodology similar to the one used in our previous analysis of proprietary language model refusals, ensuring comparability between the open-source and proprietary evaluations.

Model Selection

We selected a diverse set of top-performing open-source models, covering a range of architectures and sizes:

  • Mistral Large (mistral-large-2407)
  • Llama 3.1 variants:
    • 8B (meta-llama/Llama-3.1-8B-Instruct)
    • 70B (meta-llama/Llama-3.1-70B-Instruct)
    • 405B (meta-llama/Llama-3.1-405B-Instruct-FP8)
  • Llama 3.1 Nemotron 70B (nvidia/Llama-3.1-Nemotron-70B-Instruct-HF)
  • Qwen 2.5 72B (Qwen/Qwen2.5-72B-Instruct)

Test Prompts

We developed and used a private test set consisting of 400 prompts, spanning 8 distinct reasoning categories (50 prompts per category); a sketch of this layout follows the category list below.

Prompt Categories:

  1. Adaptive Reasoning Under Uncertainty
  2. Analogical Reasoning and Transfer
  3. Multi-Step Problem Decomposition
  4. Temporal Reasoning and Sequencing
  5. Bias and Fallacy Recognition
  6. Cognitive Diversity Simulation
  7. Recursive Improvement Analysis
  8. Self-Reflection and Awareness
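
For reference, here is a minimal sketch of this layout, where `load_prompts` is a placeholder for however the per-category prompts are actually stored:

```python
CATEGORIES = [
    "Adaptive Reasoning Under Uncertainty",
    "Analogical Reasoning and Transfer",
    "Multi-Step Problem Decomposition",
    "Temporal Reasoning and Sequencing",
    "Bias and Fallacy Recognition",
    "Cognitive Diversity Simulation",
    "Recursive Improvement Analysis",
    "Self-Reflection and Awareness",
]

def load_prompts(category: str) -> list[str]:
    # Placeholder: in practice, load the 50 hand-written prompts for this category.
    return [f"{category} prompt {i}" for i in range(50)]

test_set = {category: load_prompts(category) for category in CATEGORIES}
assert sum(len(prompts) for prompts in test_set.values()) == 400  # 8 categories x 50
```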

Importantly, all prompts were designed to test reasoning capabilities, not safety boundaries. We consider any observed refusal to be a false positive.

Evaluations

We used our language model evaluation platform, Mandoline, for all evaluations.

Our custom "Compliance" metric assesses a model's engagement with prompts within established guidelines and policy constraints. Scores range from -1.0 to 1.0, with responses scoring < 0 considered as refusals.

Notably, this metric effectively captures both direct refusals and more subtle, hedged responses.
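
To make the thresholding concrete, here is a minimal sketch of turning compliance scores into refusal counts. The `EvalResult` container, the helper functions, and the hard-coded scores are illustrative assumptions for this post, not the Mandoline SDK's actual interface.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    prompt: str
    response: str
    compliance: float  # Compliance metric score in [-1.0, 1.0]

def is_refusal(result: EvalResult, threshold: float = 0.0) -> bool:
    """A response scoring below the threshold is counted as a refusal."""
    return result.compliance < threshold

def refusal_rate(results: list[EvalResult]) -> float:
    """Fraction of responses classified as refusals."""
    return sum(is_refusal(r) for r in results) / len(results)

# Example with made-up scores:
results = [
    EvalResult("Break this proof into steps...", "Sure, step one is...", 0.9),
    EvalResult("Reflect on your own reasoning...", "I can't speculate about that.", -0.4),
]
print(f"Refusal rate: {refusal_rate(results):.1%}")  # -> 50.0%
```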

Results

Our findings reveal notable differences in refusal patterns between open-source and proprietary models.

Refusal Rates

[Figure: Refusal rates by model and prompt category]

This bar chart shows refusal rates for 12 language models, including open-source (e.g., Llama variants, Mistral Large, Qwen 2.5 72B) and proprietary models (e.g., GPT-4o, Claude models), across 8 prompt categories.

Open-source models (right side of the figure) consistently show low refusal rates (0-2%) across all categories, while proprietary models (left side) show higher and more variable refusal rates.

We also computed aggregate statistics across the two model types (open-source vs. proprietary):

Model Type                                    Average Refusal Rate    Standard Deviation
Open-Source (Mistral, Llama, Qwen)            0.1%                    0.2%
Proprietary (GPT-4o, o1, Claude, Gemini)      4.2%                    3.4%
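
The aggregation itself is just a mean and standard deviation over per-model refusal rates. The per-model numbers below are placeholders chosen so the aggregates match the table above, not our measured data:

```python
from statistics import mean, stdev

# Hypothetical per-model refusal rates (%), six models per group -- placeholder
# values, not our actual measurements.
refusal_rates = {
    "open_source": [0.0, 0.0, 0.0, 0.0, 0.1, 0.5],   # e.g., Llama variants, Mistral, Qwen
    "proprietary": [1.0, 1.5, 3.0, 4.5, 10.5, 4.7],  # e.g., GPT-4o, o1, Claude, Gemini
}

for model_type, rates in refusal_rates.items():
    print(f"{model_type}: avg={mean(rates):.1f}%, sd={stdev(rates):.1f}%")
# -> open_source: avg=0.1%, sd=0.2%
# -> proprietary: avg=4.2%, sd=3.4%
```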

Observations

  • Baseline Performance: Open-source models demonstrate remarkably low refusal rates (avg. 0.1%) with minimal variation (SD=0.2%), indicating highly predictable behavior across all categories tested.
  • Category-Specific Patterns: Proprietary models show the highest refusal rates in "Recursive Improvement Analysis" (35-40%) and "Self-Reflection and Awareness" (~20%). Open-source models maintain near-zero refusal rates even in these challenging categories.
  • Model Evolution Insights: Llama 3.1 variants maintain consistent performance across different model sizes (8B to 405B). Open-source models demonstrate similar refusal patterns regardless of architecture (Mistral, Llama, Qwen).

Correlation Analysis

[Figure: Correlation matrix of refusal patterns across models]

This correlation matrix examines the relationships between the refusal patterns of different models, where darker red indicates stronger correlations in refusal behavior.
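
One way such a matrix can be computed, assuming each model's refusals are recorded as a 0/1 decision per prompt, is a pairwise Pearson correlation over those vectors. The model list and random data below are illustrative stand-ins for the real refusal decisions:

```python
import numpy as np

# refusals[model] is a 0/1 vector over the same ordered set of 400 prompts:
# 1 if the model refused that prompt, 0 otherwise. Random values here are
# purely illustrative.
rng = np.random.default_rng(0)
models = ["gpt-4o", "o1-mini", "claude-3.5-sonnet", "llama-3.1-70b", "qwen-2.5-72b"]
refusals = {m: rng.integers(0, 2, size=400) for m in models}

# Pairwise Pearson correlations between the models' refusal vectors.
matrix = np.corrcoef(np.stack([refusals[m] for m in models]))
print(np.round(matrix, 2))  # rows/columns follow the `models` order
```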

Observations

  • Family Patterns: Strong correlations within model families, such as o1 variants (0.95), Claude variants (0.77), and Llama variants (0.44-0.63), suggest consistent safety implementations within each family.
  • Open vs Closed Differences: Low correlations (0.17-0.31) between proprietary and open-source models indicate fundamentally different approaches to content filtering.
  • Notable Outliers: GPT-4o and Qwen 2.5 72B show weaker correlations with their respective groups (proprietary/open-source), suggesting unique safety implementations.

Implications and Considerations for Developers

The notable difference in refusal patterns between open-source and proprietary language models has meaningful implications for developers integrating LLMs into their applications.

Advantages of Open-Source

The consistently low refusal rates of open-source models (averaging 0.1% across all categories) offer a predictable user experience, an important factor in application development. With open-source models, developers gain:

  • Fine-grained control over safety measures, enabling tailored content guidelines that better align with their application's unique needs and values.
  • Reduced overhead from handling unexpected refusals, freeing developers to focus on core functionality and speeding up product development.

Trade-Offs and Considerations

While open-source models present numerous advantages, it's also important to acknowledge the potential trade-offs:

  • Developers may need to invest in custom moderation solutions to ensure compliance with their application's content policies, as open-source models might not include explicit content filters by default; a minimal example of such a layer is sketched after this list.
  • Developers will need to stay updated with the latest model versions and potentially contribute to or manage updates themselves, ensuring the model remains aligned with their application's evolving needs.
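
As a starting point, a moderation layer can be as simple as a policy check in front of the model. In this sketch, `BLOCKED_PATTERNS` and the `generate` callable are placeholders for your own policy rules and inference stack:

```python
import re
from collections.abc import Callable

# Example policy rule -- a placeholder, to be replaced with your application's
# actual content policy (e.g., blocking PII-soliciting prompts).
BLOCKED_PATTERNS = [
    r"\b(ssn|social security number)\b",
]

def moderate(prompt: str) -> str | None:
    """Return a refusal message if the prompt violates policy, else None."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, prompt, re.IGNORECASE):
            return "This request falls outside our application's content policy."
    return None

def answer(prompt: str, generate: Callable[[str], str]) -> str:
    """Apply application-level moderation before calling the model."""
    refusal = moderate(prompt)
    return refusal if refusal is not None else generate(prompt)

# Usage with any inference function, e.g. a local Llama endpoint:
# answer("What's my coworker's SSN?", generate=my_llama_client.complete)
```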

Conclusion

Our analysis of top open-source language models reveals a promising alternative for developers seeking more reliable and predictable interactions. With average refusal rates of just 0.1% across various reasoning categories, open-source models like the Llama variants, Mistral Large, and Qwen 2.5 72B can offer a more consistent user experience compared to their proprietary counterparts.

As developers weigh their options, it's important to consider the trade-offs. While open-source models provide fine-grained control, reduced development overhead, and scalability, they may require custom moderation solutions and ongoing maintenance efforts.

To maximize the benefits of open-source LLMs, we recommend:

  • Carefully evaluating your application's content policy needs and aligning them with the chosen model's capabilities.
  • Investing time in understanding the model's decision-making processes to inform effective troubleshooting and customization.
  • Engaging with the open-source community to contribute to and benefit from collective knowledge and updates.

By embracing open-source language models, developers can create more user-friendly applications while maintaining the flexibility to adapt to evolving requirements.

Next Steps

  • Run refusal evals against your own data. Our Getting Started guide will get you up and running quickly.
  • Dive deeper into how to evaluate and optimize your LLM integration with our Tutorials.
  • Subscribe to our newsletter to receive in-depth analyses on the latest language model developments and performance evaluations.
