Mandoline API Reference

Authentication

To use the Mandoline API:

Sign up for a Mandoline account.
Get your API key from the account page.

Installation

To install the Mandoline Node.js SDK:

npm install mandoline

Setup

Initialize the Mandoline client with your API key:

import { Mandoline } from "mandoline";
 
const mandoline = new Mandoline({ apiKey: "your-api-key" });

Or use an environment variable:

// Set MANDOLINE_API_KEY in your environment
const mandoline = new Mandoline();

Data Models

Here are the main data models used in Mandoline:

type UUID = string;
 
type SerializableDict = { [key: string]: any };
type NullableSerializableDict = SerializableDict | null;
 
type StringArray = ReadonlyArray<string>;
type NullableStringArray = StringArray | null;
 
interface Metric {
  id: UUID;
  createdAt: string;
  updatedAt: string;
  name: string;
  description: string;
  tags?: NullableStringArray;
}
 
interface MetricCreate {
  name: string;
  description: string;
  tags?: NullableStringArray;
}
 
interface MetricUpdate {
  name?: string;
  description?: string;
  tags?: NullableStringArray;
}
 
interface Evaluation {
  id: UUID;
  createdAt: string;
  updatedAt: string;
  metricId: UUID;
  prompt: string;
  prompt_image?: string;
  response?: string;
  response_image?: string;
  properties?: NullableSerializableDict;
  score: number;
}
 
interface EvaluationCreate {
  metricId: UUID;
  prompt: string;
  prompt_image?: string;
  response?: string;
  response_image?: string;
  properties?: NullableSerializableDict;
}
 
interface EvaluationUpdate {
  properties?: NullableSerializableDict;
}

Metrics

Metrics are used to evaluate specific aspects of LLM performance. To learn more about metrics, see our Core Concepts guide.

Create a Metric

Creates a new evaluation metric.

async createMetric(metric: MetricCreate): Promise<Metric>

Parameters:

metric: MetricCreate object
- name: string (required)
- description: string (required)
- tags: NullableStringArray (optional)

Returns: Promise<Metric>

Example:

const newMetric = await mandoline.createMetric({
  name: "Response Clarity",
  description: "Measures how clear and understandable the LLM's response is",
  tags: ["clarity", "communication"],
});

Get a Metric

Fetches a specific metric by its unique identifier.

async getMetric(metricId: UUID): Promise<Metric>

Parameters:

metricId: UUID (required)

Returns: Promise<Metric>

Example:

const metric = await mandoline.getMetric(
  "550e8400-e29b-41d4-a716-446655440000",
);

List Metrics

Fetches a list of metrics with optional filtering.

async getMetrics(options?: {
  skip?: number;
  limit?: number;
  tags?: NullableStringArray;
  filters?: SerializableDict;
}): Promise<Metric[]>

Parameters:

options: (optional)
- skip: number (optional, default: 0)
- limit: number (optional, default: 100, max: 1000)
- tags: NullableStringArray (optional)
- filters: SerializableDict (optional)

Returns: Promise<Metric[]>

Example:

const metrics = await mandoline.getMetrics({
  skip: 0,
  limit: 50,
  tags: ["clarity", "communication"],
});

Update a Metric

Modifies an existing metric's attributes.

async updateMetric(metricId: UUID, update: MetricUpdate): Promise<Metric>

Parameters:

metricId: UUID (required)
update: MetricUpdate object
- name: string (optional)
- description: string (optional)
- tags: NullableStringArray (optional)

Returns: Promise<Metric>

Example:

const updatedMetric = await mandoline.updateMetric(
  "550e8400-e29b-41d4-a716-446655440000",
  {
    description: "Updated description for the metric",
    // Fields not included will not be updated
  },
);
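
The comment in the example above says that omitted fields are left unchanged. The sketch below mimics that partial-update behaviour locally so the semantics are easy to see. `applyMetricUpdate` is a hypothetical helper, not part of the SDK; the real merge happens on the Mandoline side, and treating an explicit null for tags as a value to store is an assumption.

```typescript
// Trimmed copies of the data models above, so the sketch is self-contained.
interface Metric {
  id: string;
  createdAt: string;
  updatedAt: string;
  name: string;
  description: string;
  tags?: ReadonlyArray<string> | null;
}

interface MetricUpdate {
  name?: string;
  description?: string;
  tags?: ReadonlyArray<string> | null;
}

// Hypothetical local illustration: only fields present in `update` are applied.
function applyMetricUpdate(metric: Metric, update: MetricUpdate): Metric {
  const merged: Metric = { ...metric };
  if (update.name !== undefined) merged.name = update.name;
  if (update.description !== undefined) merged.description = update.description;
  // Assumption: an explicit null is a value to store, not "leave unchanged".
  if (update.tags !== undefined) merged.tags = update.tags;
  return merged;
}
```

Under these assumptions, passing only a description leaves name and tags as they were, and the input object is not mutated.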

Delete a Metric

Removes a metric permanently.

async deleteMetric(metricId: UUID): Promise<void>

Parameters:

metricId: UUID (required)

Returns: Promise<void>

Example:

await mandoline.deleteMetric("550e8400-e29b-41d4-a716-446655440000");

Evaluations

Evaluations in Mandoline apply metrics to specific LLM interactions. To learn more about evaluations, see our Core Concepts guide.

Create an Evaluation

Performs an evaluation for a single metric on a prompt-response pair. Supports both text and image inputs.

async createEvaluation(evaluation: EvaluationCreate): Promise<Evaluation>

Parameters:

evaluation: EvaluationCreate object
- metricId: UUID (required)
- prompt: string (required)
- prompt_image: string (optional)
- response: string (optional)
- response_image: string (optional)
- properties: NullableSerializableDict (optional)

Returns: Promise<Evaluation>

Note: At least one of response or response_image must be provided. Images should be passed as base64-encoded data URLs (e.g. data:image/[type];base64,[data]).

Example:

// Text-only evaluation
const textEvaluation = await mandoline.createEvaluation({
  metricId: "550e8400-e29b-41d4-a716-446655440000",
  prompt: "Explain quantum computing",
  response: "Quantum computing uses quantum mechanics...",
  properties: { model: "my-llm-model-v1" },
});
 
// Image-based evaluation
const imageEvaluation = await mandoline.createEvaluation({
  metricId: "550e8400-e29b-41d4-a716-446655440000",
  prompt: "Describe this image",
  prompt_image: "data:image/jpeg;base64,/9j/4AAQSkZJRg...",
  response: "The image shows a sunset over mountains",
  properties: { model: "my-vision-model-v1" },
});
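
The two constraints from the note above (at least one of response or response_image, and images given as data URLs) can be checked before sending a request. The following is a minimal sketch: `validateEvaluationInput` and the `IMAGE_DATA_URL` pattern are hypothetical names, and the regular expression is only one reasonable reading of the data:image/[type];base64,[data] format, not the check the API itself performs.

```typescript
// Subset of EvaluationCreate relevant to the checks below.
interface EvaluationInput {
  prompt: string;
  prompt_image?: string;
  response?: string;
  response_image?: string;
}

// Assumed shape of "data:image/[type];base64,[data]".
const IMAGE_DATA_URL = /^data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+\/]+={0,2}$/i;

// Returns a list of human-readable problems; an empty list means the input looks valid.
function validateEvaluationInput(input: EvaluationInput): string[] {
  const problems: string[] = [];
  if (input.response === undefined && input.response_image === undefined) {
    problems.push("Provide at least one of response or response_image.");
  }
  const images: Array<[string, string | undefined]> = [
    ["prompt_image", input.prompt_image],
    ["response_image", input.response_image],
  ];
  for (const [field, value] of images) {
    if (value !== undefined && !IMAGE_DATA_URL.test(value)) {
      problems.push(field + " must look like data:image/[type];base64,[data].");
    }
  }
  return problems;
}
```

Calling this before createEvaluation lets malformed input fail locally instead of consuming a rate-limited request.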

Note: This is a compute-heavy operation, so it is rate-limited to 3 requests per second. If you exceed this limit, you'll receive a RateLimitExceeded error.
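
One way to stay under the limit described above is to space requests out on the client. The sketch below is an assumption-laden illustration, not an SDK feature: `RateLimiter` and `throttled` are hypothetical names, and the approach simply enforces a minimum gap between request start times. It does not catch or retry a RateLimitExceeded error.

```typescript
// Hypothetical client-side limiter: enforces a minimum gap between request starts.
class RateLimiter {
  private nextSlotMs = 0;
  private readonly minGapMs: number;

  constructor(requestsPerSecond: number) {
    // 3 requests per second gives a gap of 334 ms (rounded up).
    this.minGapMs = Math.ceil(1000 / requestsPerSecond);
  }

  // Reserves the next free slot and returns how long the caller should wait, in ms.
  reserve(nowMs: number): number {
    const startAt = Math.max(nowMs, this.nextSlotMs);
    this.nextSlotMs = startAt + this.minGapMs;
    return startAt - nowMs;
  }
}

// Waits for the reserved slot, then runs the task.
async function throttled<T>(limiter: RateLimiter, task: () => Promise<T>): Promise<T> {
  const waitMs = limiter.reserve(Date.now());
  if (waitMs > 0) {
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
  return task();
}

// Usage (assumes an initialized `mandoline` client and an `input` object):
//   const limiter = new RateLimiter(3);
//   const result = await throttled(limiter, () => mandoline.createEvaluation(input));
```

A single limiter instance should be shared by every call site that creates evaluations; separate instances would each allow 3 requests per second.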

Get an Evaluation

Fetches details of a specific evaluation.

async getEvaluation(evaluationId: UUID): Promise<Evaluation>

Parameters:

evaluationId: UUID (required)

Returns: Promise<Evaluation>

Example:

const evaluation = await mandoline.getEvaluation(
  "550e8400-e29b-41d4-a716-446655440000",
);

List Evaluations

Fetches a list of evaluations with optional filtering.

async getEvaluations(options?: {
  skip?: number;
  limit?: number;
  metricId?: UUID;
  properties?: NullableSerializableDict;
  filters?: SerializableDict;
}): Promise<Evaluation[]>

Parameters:

options: (optional)
- skip: number (optional, default: 0)
- limit: number (optional, default: 100, max: 1000)
- metricId: UUID (optional)
- properties: NullableSerializableDict (optional)
- filters: SerializableDict (optional)

Returns: Promise<Evaluation[]>

Example:

const evaluations = await mandoline.getEvaluations({
  skip: 0,
  limit: 50,
  metricId: "550e8400-e29b-41d4-a716-446655440000",
  properties: { model: "my-llm-model-v1" },
});

Update an Evaluation

Modifies an existing evaluation's properties.

async updateEvaluation(evaluationId: UUID, update: EvaluationUpdate): Promise<Evaluation>

Parameters:

evaluationId: UUID (required)
update: EvaluationUpdate object
- properties: NullableSerializableDict (optional)

Returns: Promise<Evaluation>

Example:

const updatedEvaluation = await mandoline.updateEvaluation(
  "550e8400-e29b-41d4-a716-446655440000",
  {
    properties: { reviewed: true },
  },
);

Delete an Evaluation

Removes an evaluation permanently.

async deleteEvaluation(evaluationId: UUID): Promise<void>

Parameters:

evaluationId: UUID (required)

Returns: Promise<void>

Example:

await mandoline.deleteEvaluation("550e8400-e29b-41d4-a716-446655440000");

Evaluate Multiple Metrics

Performs evaluations across multiple metrics for a given prompt-response pair. Supports both text and image inputs.

async evaluate(
  metrics: Metric[],
  prompt: string,
  prompt_image?: string,
  response?: string,
  response_image?: string,
  properties?: NullableSerializableDict,
): Promise<Evaluation[]>

Parameters:

metrics: Metric[] (required) - An array of metrics to evaluate against
prompt: string (required) - The prompt to evaluate
response: string (optional) - The response to evaluate
properties: NullableSerializableDict (optional) - Additional properties to include with the evaluations
prompt_image: string (optional) - Base64-encoded image as a data URL
response_image: string (optional) - Base64-encoded image as a data URL

Note: At least one of response or response_image must be provided. Images should be passed as base64-encoded data URLs (e.g. data:image/[type];base64,[data]).

Returns: Promise<Evaluation[]>

Example:

const metrics = await mandoline.getMetrics({ tags: ["depth"] });
const evaluations = await mandoline.evaluate(
  metrics,
  "Explain the theory of relativity",
  "The theory of relativity, proposed by Albert Einstein...",
  { model: "my-llm-model-v1" },
);
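
Each returned Evaluation carries a metricId rather than a metric name, so a small join against the metrics array makes the results easier to read. The sketch below assumes only the id, name, metricId, and score fields from the data models above; `scoresByMetricName` is a hypothetical helper, not part of the SDK.

```typescript
// Minimal shapes taken from the Metric and Evaluation models above.
interface MetricRef {
  id: string;
  name: string;
}

interface ScoredEvaluation {
  metricId: string;
  score: number;
}

// Maps each metric name to the score of its evaluation.
// Falls back to the raw metricId when no matching metric is supplied.
function scoresByMetricName(
  metrics: ReadonlyArray<MetricRef>,
  evaluations: ReadonlyArray<ScoredEvaluation>,
): Record<string, number> {
  const nameById: Record<string, string> = {};
  for (const metric of metrics) {
    nameById[metric.id] = metric.name;
  }
  const scores: Record<string, number> = {};
  for (const evaluation of evaluations) {
    const label = nameById[evaluation.metricId] ?? evaluation.metricId;
    scores[label] = evaluation.score;
  }
  return scores;
}

// Usage with the example above:
//   const summary = scoresByMetricName(metrics, evaluations);
```

If two metrics share a name, the later evaluation overwrites the earlier one in this sketch; keying by metricId avoids that.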

Advanced Concepts

Pagination

Mandoline uses offset-based pagination for listing metrics and evaluations:

skip: Number of items to skip before returning results.
limit: Maximum number of items to return in a single request.

Example:

// Get first 50 metrics
const firstPage = await mandoline.getMetrics({ limit: 50 });
 
// Get next 50 metrics
const secondPage = await mandoline.getMetrics({ skip: 50, limit: 50 });

Because limit is capped at 1000, retrieving more than 1000 items requires multiple requests.
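
Collecting every item across several requests can be sketched as a loop that advances skip by the page size until a short page comes back. `fetchAll` and `PageFetcher` are hypothetical names, not SDK exports; the helper takes any function with the same skip/limit options as getMetrics or getEvaluations, and it assumes a page shorter than limit means there are no further results.

```typescript
// Options shared by the list endpoints described above.
type PageOptions = { skip: number; limit: number };

// Any function that returns one page of items for the given offset and size.
type PageFetcher<T> = (options: PageOptions) => Promise<T[]>;

// Requests pages sequentially until one comes back shorter than pageSize.
async function fetchAll<T>(fetchPage: PageFetcher<T>, pageSize: number = 1000): Promise<T[]> {
  const items: T[] = [];
  for (let skip = 0; ; skip += pageSize) {
    const page = await fetchPage({ skip, limit: pageSize });
    items.push(...page);
    if (page.length < pageSize) {
      break; // a short (or empty) page is taken to mean the end of the results
    }
  }
  return items;
}

// Usage (assumes an initialized `mandoline` client):
//   const allMetrics = await fetchAll((options) => mandoline.getMetrics(options));
```

Offset-based pagination can skip or repeat items if records are created or deleted while the loop is running, so the result is a best-effort snapshot.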

Multimodal Evaluation
