
Using the Lambda Inference API

The Lambda Inference API lets you use large language models (LLMs) without setting up a server, and no limits are placed on the rate of requests. Because it's compatible with the OpenAI API, it can serve as a drop-in replacement in applications that currently use the OpenAI API. See, for example, our guide on integrating the Lambda Inference API into VS Code.

To use the Lambda Inference API, first generate a Cloud API key from the dashboard. You can also use a Cloud API key that you've already generated.

In the examples below, you can replace llama-4-maverick-17b-128e-instruct-fp8 with any of the available models. You can obtain a list of the available models using the /models endpoint. Replace <API-KEY> with your actual Cloud API key.
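If you prefer not to hard-code the key in your scripts, you can read it from an environment variable instead. A minimal sketch, assuming you've exported the key under the (hypothetical) name LAMBDA_API_KEY:

import os

# Hypothetical variable name; set it first with: export LAMBDA_API_KEY=<API-KEY>
openai_api_key = os.environ["LAMBDA_API_KEY"]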

Creating chat completions

The /chat/completions endpoint takes a list of messages that make up a conversation, then outputs a response.

First, create and activate a Python virtual environment.
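A minimal setup on Linux or macOS looks like this (the environment name lambda-env is just an example):

python -m venv lambda-env
source lambda-env/bin/activate

Then, install the OpenAI Python API library by running: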

pip install openai

Run, for example:

from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambda.ai/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "llama-4-maverick-17b-128e-instruct-fp8"

chat_completion = client.chat.completions.create(
    messages=[{
        "role": "system",
        "content": "You are an expert conversationalist who responds to the best of your ability."
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role":
        "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }],
    model=model,
)

print(chat_completion)

You should see output similar to:

ChatCompletion(id='chatcmpl-5dbe7101-bc9f-4f05-a6c2-16f9dc153203', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The 2020 World Series was played at Globe Life Field in Arlington, Texas. The Dodgers defeated the Tampa Bay Rays in the series, winning 4 games to 2. Globe Life Field was the home stadium of the Texas Rangers, but it was used as a neutral site due to COVID-19 pandemic restrictions.', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False}, 'self_harm': {'filtered': False}, 'sexual': {'filtered': False}, 'violence': {'filtered': False}, 'jailbreak': {'filtered': False, 'detected': False}, 'profanity': {'filtered': False, 'detected': False}})], created=1743968155, model='llama-4-maverick-17b-128e-instruct-fp8', object='chat.completion', service_tier=None, system_fingerprint='', usage=CompletionUsage(completion_tokens=65, prompt_tokens=69, total_tokens=134, completion_tokens_details=None, prompt_tokens_details=None))
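You can also stream the response as it's generated. The sketch below uses the OpenAI client's standard streaming interface and assumes the endpoint honors stream=True, as OpenAI-compatible APIs typically do; client and model are the same as above:

# Minimal streaming sketch (assumes the endpoint supports stream=True).
stream = client.chat.completions.create(
    messages=[{
        "role": "user",
        "content": "Who won the World Series in 2020?"
    }],
    model=model,
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta of the assistant's reply.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()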

Alternatively, to call the endpoint directly with curl, run:

curl -sS https://api.lambda.ai/v1/chat/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-4-maverick-17b-128e-instruct-fp8",
        "messages": [
          {
            "role": "system",
            "content": "You are an expert conversationalist who responds to the best of your ability."
          },
          {
            "role": "user",
            "content": "Who won the world series in 2020?"
          },
          {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020."
          },
          {
            "role": "user",
            "content": "Where was it played?"
          }
        ]
      }' | jq .

You should see output similar to:

{
  "id": "chatcmpl-ff1c7776-e842-4d80-af86-2952d3b2d595",
  "object": "chat.completion",
  "created": 1743968336,
  "model": "llama-4-maverick-17b-128e-instruct-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas. The Dodgers played against the Tampa Bay Rays, winning the series 4 games to 2. Globe Life Field was the home stadium of the Texas Rangers, but it was used as a neutral site due to COVID-19 pandemic restrictions."
      },
      "finish_reason": "stop",
      "content_filter_results": {
        "hate": {
          "filtered": false
        },
        "self_harm": {
          "filtered": false
        },
        "sexual": {
          "filtered": false
        },
        "violence": {
          "filtered": false
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "profanity": {
          "filtered": false,
          "detected": false
        }
      }
    }
  ],
  "usage": {
    "prompt_tokens": 69,
    "completion_tokens": 65,
    "total_tokens": 134,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": ""
}

Creating completions

The /completions endpoint takes a single text string (a prompt) as input and outputs a response. By contrast, the /chat/completions endpoint takes a list of messages as input.

To use the /completions endpoint:

First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:

pip install openai

Run, for example:

from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambda.ai/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "llama-4-maverick-17b-128e-instruct-fp8"

response = client.completions.create(
    prompt="Computers are",
    temperature=0,
    model=model,
)

print(response)

You should see output similar to:

Completion(id='cmpl-651020a3-26f2-47a9-9e73-6bc0f36e25f0', choices=[CompletionChoice(finish_reason='length', index=0, logprobs=Logprobs(text_offset=None, token_logprobs=None, tokens=None, top_logprobs=None), text=' being used more and more in education - IELTS Writing Essay Sample\nComputers')], created=1743968497, model='llama-4-maverick-17b-128e-instruct-fp8', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=16, prompt_tokens=4, total_tokens=20, completion_tokens_details=None, prompt_tokens_details=None))
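The finish_reason='length' in this output means generation stopped at the default token limit. To allow a longer completion, pass max_tokens, a standard OpenAI-compatible parameter; a minimal sketch using the same client and model:

# Sketch: raise the generation limit with max_tokens (standard
# OpenAI-compatible parameter); client and model are the same as above.
response = client.completions.create(
    prompt="Computers are",
    temperature=0,
    max_tokens=64,
    model=model,
)
print(response.choices[0].text)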

Alternatively, to call the endpoint directly with curl, run:

curl -sS https://api.lambda.ai/v1/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama-4-maverick-17b-128e-instruct-fp8",
        "prompt": "Computers are",
        "temperature": 0
      }' | jq .

You should see output similar to:

{
  "id": "cmpl-d8ea14eb-7398-43e6-82d1-8dec587e1e04",
  "object": "text_completion",
  "created": 1743968623,
  "model": "llama-4-maverick-17b-128e-instruct-fp8",
  "choices": [
    {
      "text": " being used more and more in education. Some people say that this is a positive",
      "index": 0,
      "finish_reason": "length",
      "logprobs": {
        "tokens": null,
        "token_logprobs": null,
        "top_logprobs": null,
        "text_offset": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 16,
    "total_tokens": 20,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}

Listing models

The /models endpoint lists the models available for use through the Lambda Inference API.

To use the /models endpoint:

First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:

pip install openai

Run:

from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambda.ai/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

print(client.models.list())

You should see output similar to:

SyncPage[Model](data=[Model(id='llama3.3-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama-4-maverick-17b-128e-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.2-3b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='llama-4-scout-17b-16e-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-8b', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-nemotron-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.2-11b-vision-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='lfm-40b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-405b', created=1724347380, object='model', owned_by='lambda'), Model(id='qwen25-coder-32b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-8b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='deepseek-llama3.3-70b', created=1724347380, object='model', owned_by='lambda'), Model(id='deepseek-r1-671b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-70b', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-405b-instruct-fp8', created=1724347380, object='model', owned_by='lambda')], object='list')
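The returned SyncPage is iterable, so you can print just the model IDs. A small sketch using the same client:

# Print only the model IDs; SyncPage iterates over its data entries.
for m in client.models.list():
    print(m.id)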

Alternatively, to call the endpoint directly with curl, run:

curl -sS https://api.lambda.ai/v1/models -H "Authorization: Bearer <API-KEY>" | jq .

You should see output similar to:

{
  "object": "list",
  "data": [
    {
      "id": "llama3.3-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama-4-maverick-17b-128e-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.2-3b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama-4-scout-17b-16e-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-8b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-nemotron-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "lfm-40b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-405b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "qwen25-coder-32b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-8b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "deepseek-llama3.3-70b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "deepseek-r1-671b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-70b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-405b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    }
  ]
}
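To print just the model IDs from the command line, you can filter the same response with jq:

curl -sS https://api.lambda.ai/v1/models -H "Authorization: Bearer <API-KEY>" | jq -r '.data[].id'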

Note

Currently, the following models are available:

  • deepseek-llama3.3-70b
  • deepseek-r1-671b
  • deepseek-v3-0324
  • hermes3-405b
  • hermes3-70b
  • hermes3-8b
  • lfm-40b
  • llama-4-maverick-17b-128e-instruct-fp8
  • llama-4-scout-17b-16e-instruct
  • llama3.1-405b-instruct-fp8
  • llama3.1-70b-instruct-fp8
  • llama3.1-8b-instruct
  • llama3.1-nemotron-70b-instruct-fp8
  • llama3.2-11b-vision-instruct
  • llama3.2-3b-instruct
  • llama3.3-70b-instruct-fp8
  • qwen25-coder-32b-instruct