Mandoline API Reference

Table of Contents

  1. Authentication
  2. Installation
  3. Setup
  4. Data Models
  5. Metrics
  6. Evaluations
  7. Advanced Concepts

Authentication

To use the Mandoline API:

  1. Sign up for a Mandoline account.
  2. Get your API key from the account page.

Installation

To install the Mandoline Node.js SDK:

npm install mandoline

Setup

Initialize the Mandoline client with your API key:

import { Mandoline } from "mandoline";
 
const mandoline = new Mandoline({ apiKey: "your-api-key" });

Or use an environment variable:

// Set MANDOLINE_API_KEY in your environment
const mandoline = new Mandoline();

Data Models

Here are the main data models used in Mandoline:

type UUID = string;
 
type SerializableDict = { [key: string]: any };
type NullableSerializableDict = SerializableDict | null;
 
type StringArray = ReadonlyArray<string>;
type NullableStringArray = StringArray | null;
 
interface Metric {
  id: UUID;
  createdAt: string;
  updatedAt: string;
  name: string;
  description: string;
  tags?: NullableStringArray;
}
 
interface MetricCreate {
  name: string;
  description: string;
  tags?: NullableStringArray;
}
 
interface MetricUpdate {
  name?: string;
  description?: string;
  tags?: NullableStringArray;
}
 
interface Evaluation {
  id: UUID;
  createdAt: string;
  updatedAt: string;
  metricId: UUID;
  prompt: string;
  response: string;
  properties?: NullableSerializableDict;
  score: number;
}
 
interface EvaluationCreate {
  metricId: UUID;
  prompt: string;
  response: string;
  properties?: NullableSerializableDict;
}
 
interface EvaluationUpdate {
  properties?: NullableSerializableDict;
}

Metrics

Metrics are used to evaluate specific aspects of LLM performance. To learn more about metrics, see our Core Concepts guide.

Create a Metric

Creates a new evaluation metric.

async createMetric(metric: MetricCreate): Promise<Metric>

Parameters:

  • metric: MetricCreate object
    • name: string (required)
    • description: string (required)
    • tags: NullableStringArray (optional)

Returns: Promise<Metric>

Example:

const newMetric = await mandoline.createMetric({
  name: "Response Clarity",
  description: "Measures how clear and understandable the LLM's response is",
  tags: ["clarity", "communication"],
});

Get a Metric

Fetches a specific metric by its unique identifier.

async getMetric(metricId: UUID): Promise<Metric>

Parameters:

  • metricId: UUID (required)

Returns: Promise<Metric>

Example:

const metric = await mandoline.getMetric(
  "550e8400-e29b-41d4-a716-446655440000",
);

List Metrics

Fetches a list of metrics with optional filtering.

async getMetrics(options?: {
  skip?: number;
  limit?: number;
  tags?: NullableStringArray;
  filters?: SerializableDict;
}): Promise<Metric[]>

Parameters:

  • options: (optional)
    • skip: number (optional, default: 0)
    • limit: number (optional, default: 100, max: 1000)
    • tags: NullableStringArray (optional)
    • filters: SerializableDict (optional)

Returns: Promise<Metric[]>

Example:

const metrics = await mandoline.getMetrics({
  skip: 0,
  limit: 50,
  tags: ["clarity", "communication"],
});

Update a Metric

Modifies an existing metric's attributes.

async updateMetric(metricId: UUID, update: MetricUpdate): Promise<Metric>

Parameters:

  • metricId: UUID (required)
  • update: MetricUpdate object
    • name: string (optional)
    • description: string (optional)
    • tags: NullableStringArray (optional)

Returns: Promise<Metric>

Example:

const updatedMetric = await mandoline.updateMetric(
  "550e8400-e29b-41d4-a716-446655440000",
  {
    description: "Updated description for the metric",
    // Fields not included will not be updated
  },
);

Delete a Metric

Removes a metric permanently.

async deleteMetric(metricId: UUID): Promise<void>

Parameters:

  • metricId: UUID (required)

Returns: Promise<void>

Example:

await mandoline.deleteMetric("550e8400-e29b-41d4-a716-446655440000");

Evaluations

Evaluations in Mandoline apply metrics to specific LLM interactions. To learn more about evaluations, see our Core Concepts guide.

Create an Evaluation

Performs an evaluation for a single metric on a prompt-response pair.

async createEvaluation(evaluation: EvaluationCreate): Promise<Evaluation>

Parameters:

  • evaluation: EvaluationCreate object
    • metricId: UUID (required)
    • prompt: string (required)
    • response: string (required)
    • properties: NullableSerializableDict (optional)

Returns: Promise<Evaluation>

Example:

const newEvaluation = await mandoline.createEvaluation({
  metricId: "550e8400-e29b-41d4-a716-446655440000",
  prompt: "Explain quantum computing",
  response: "Quantum computing uses quantum mechanics...",
  properties: { model: "my-llm-model-v1" },
});

Note: This operation is compute-heavy, so it is rate limited to 3 requests per second. If you exceed this limit, you'll receive a RateLimitExceeded error.
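One way to stay under the limit when creating many evaluations is to space the calls out on the client side. The sketch below is a minimal, generic throttle and is not part of the Mandoline SDK: runThrottled, sleep, and the 334 ms default interval are hypothetical names and values chosen for illustration. It runs tasks sequentially, so it trades throughput for simplicity.

```typescript
// Resolves after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs tasks one at a time, leaving at least minIntervalMs between the start
// of one task and the start of the next. (Hypothetical helper, not an SDK call.)
async function runThrottled<T>(
  tasks: ReadonlyArray<() => Promise<T>>,
  minIntervalMs = 334, // about 3 requests per second
): Promise<T[]> {
  const results: T[] = [];
  let remaining = tasks.length;
  for (const task of tasks) {
    const startedAt = Date.now();
    results.push(await task());
    remaining -= 1;
    const elapsed = Date.now() - startedAt;
    // No need to wait after the final task.
    if (remaining > 0 && elapsed < minIntervalMs) {
      await sleep(minIntervalMs - elapsed);
    }
  }
  return results;
}

// Usage sketch (assumes `mandoline`, `metricId`, and `pairs` already exist):
// const evaluations = await runThrottled(
//   pairs.map((pair) => () => mandoline.createEvaluation({ metricId, ...pair })),
// );
```

Each task is wrapped in a function so that the request is only issued when the throttle reaches it, rather than when the array is built.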

Get an Evaluation

Fetches details of a specific evaluation.

async getEvaluation(evaluationId: UUID): Promise<Evaluation>

Parameters:

  • evaluationId: UUID (required)

Returns: Promise<Evaluation>

Example:

const evaluation = await mandoline.getEvaluation(
  "550e8400-e29b-41d4-a716-446655440000",
);

List Evaluations

Fetches a list of evaluations with optional filtering.

async getEvaluations(options?: {
  skip?: number;
  limit?: number;
  metricId?: UUID;
  properties?: NullableSerializableDict;
  filters?: SerializableDict;
}): Promise<Evaluation[]>

Parameters:

  • options: (optional)
    • skip: number (optional, default: 0)
    • limit: number (optional, default: 100, max: 1000)
    • metricId: UUID (optional)
    • properties: NullableSerializableDict (optional)
    • filters: SerializableDict (optional)

Returns: Promise<Evaluation[]>

Example:

const evaluations = await mandoline.getEvaluations({
  skip: 0,
  limit: 50,
  metricId: "550e8400-e29b-41d4-a716-446655440000",
  properties: { model: "my-llm-model-v1" },
});

Update an Evaluation

Modifies an existing evaluation's properties.

async updateEvaluation(evaluationId: UUID, update: EvaluationUpdate): Promise<Evaluation>

Parameters:

  • evaluationId: UUID (required)
  • update: EvaluationUpdate object
    • properties: NullableSerializableDict (optional)

Returns: Promise<Evaluation>

Example:

const updatedEvaluation = await mandoline.updateEvaluation(
  "550e8400-e29b-41d4-a716-446655440000",
  {
    properties: { reviewed: true },
  },
);

Delete an Evaluation

Removes an evaluation permanently.

async deleteEvaluation(evaluationId: UUID): Promise<void>

Parameters:

  • evaluationId: UUID (required)

Returns: Promise<void>

Example:

await mandoline.deleteEvaluation("550e8400-e29b-41d4-a716-446655440000");

Evaluate Multiple Metrics

Performs evaluations across multiple metrics for a given prompt-response pair.

async evaluate(
  metrics: Metric[],
  prompt: string,
  response: string,
  properties?: NullableSerializableDict
): Promise<Evaluation[]>

Parameters:

  • metrics: Metric[] (required) - An array of metrics to evaluate against
  • prompt: string (required) - The prompt to evaluate
  • response: string (required) - The response to evaluate
  • properties: NullableSerializableDict (optional) - Additional properties to include with the evaluations

Returns: Promise<Evaluation[]>

Example:

const metrics = await mandoline.getMetrics({ tags: ["depth"] });
const evaluations = await mandoline.evaluate(
  metrics,
  "Explain the theory of relativity",
  "The theory of relativity, proposed by Albert Einstein...",
  { model: "my-llm-model-v1" },
);
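Because evaluate returns one Evaluation per metric, it can be handy to join the results back to metric names. The following is a small sketch that relies only on the id, name, metricId, and score fields listed under Data Models; scoresByMetricName and the two trimmed-down types are hypothetical helpers, not SDK exports, and no assumption is made about the range of score.

```typescript
// Minimal shapes; the full Metric and Evaluation interfaces are listed under Data Models.
type MetricRef = { id: string; name: string };
type ScoredEvaluation = { metricId: string; score: number };

// Builds a { metricName: score } lookup by joining evaluations to metrics on id.
// (Hypothetical helper, not an SDK call.)
function scoresByMetricName(
  metrics: ReadonlyArray<MetricRef>,
  evaluations: ReadonlyArray<ScoredEvaluation>,
): Record<string, number> {
  const nameById = new Map<string, string>();
  for (const metric of metrics) nameById.set(metric.id, metric.name);

  const scores: Record<string, number> = {};
  for (const evaluation of evaluations) {
    // Fall back to the raw id if the metric is not in the supplied list.
    const key = nameById.get(evaluation.metricId) ?? evaluation.metricId;
    scores[key] = evaluation.score;
  }
  return scores;
}

// Usage sketch (assumes `metrics` and `evaluations` from the example above):
// const summary = scoresByMetricName(metrics, evaluations);
```

If two metrics share a name, the later evaluation overwrites the earlier one, so key by metricId instead when names are not unique.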

Advanced Concepts

Pagination

Mandoline uses offset-based pagination for listing metrics and evaluations:

  • skip: Number of items to skip before returning results.
  • limit: Maximum number of items to return in a single request.

Example:

// Get first 50 metrics
const firstPage = await mandoline.getMetrics({ limit: 50 });
 
// Get next 50 metrics
const secondPage = await mandoline.getMetrics({ skip: 50, limit: 50 });

Because limit is capped at 1000, retrieving more than 1000 items requires multiple requests with increasing skip values.
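Fetching a full collection this way comes down to a loop over skip. The sketch below shows one possible approach; fetchAll and PageOptions are hypothetical names, not part of the SDK. It assumes that a page shorter than the requested limit means the end of the list has been reached, which the reference does not state explicitly.

```typescript
// Paging options in the shape accepted by getMetrics and getEvaluations.
type PageOptions = { skip: number; limit: number };

// Collects every item by requesting consecutive pages until a short page
// is returned. (Hypothetical helper, not an SDK call.)
async function fetchAll<T>(
  fetchPage: (options: PageOptions) => Promise<T[]>,
  pageSize = 100,
): Promise<T[]> {
  const all: T[] = [];
  let skip = 0;
  while (true) {
    const page = await fetchPage({ skip, limit: pageSize });
    all.push(...page);
    // Assumption: a page shorter than pageSize means nothing is left to fetch.
    if (page.length < pageSize) break;
    skip += pageSize;
  }
  return all;
}

// Usage sketch (assumes an initialized `mandoline` client):
// const allMetrics = await fetchAll((options) => mandoline.getMetrics(options));
```

When the total is an exact multiple of pageSize, the loop issues one extra request that returns an empty page before stopping.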