Using the Lambda Inference API#
The Lambda Inference API lets you use large language models (LLMs) without needing to set up a server, and it places no limits on the rate of requests. Because it's compatible with the OpenAI API, it can serve as a drop-in replacement for applications that currently use the OpenAI API. See, for example, our guide on integrating the Lambda Inference API into VS Code.
To use the Lambda Inference API, first generate a Cloud API key from the dashboard. You can also use a Cloud API key that you've already generated.
In the examples below, you can replace llama-4-maverick-17b-128e-instruct-fp8 with any of the available models. You can obtain a list of the available models using the /models endpoint. Replace <API-KEY> with your actual Cloud API key.
Creating chat completions#
The /chat/completions endpoint takes a list of messages that make up a conversation, then outputs a response.
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:
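pip install openai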
Run, for example:
from openai import OpenAI

# Point the OpenAI client at the Lambda Inference API
openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambda.ai/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "llama-4-maverick-17b-128e-instruct-fp8"

# Send a multi-turn conversation; earlier turns give the model context
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are an expert conversationalist who responds to the best of your ability.",
        },
        {
            "role": "user",
            "content": "Who won the world series in 2020?",
        },
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {
            "role": "user",
            "content": "Where was it played?",
        },
    ],
    model=model,
)

print(chat_completion)
You should see output similar to:
ChatCompletion(id='chatcmpl-5dbe7101-bc9f-4f05-a6c2-16f9dc153203', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The 2020 World Series was played at Globe Life Field in Arlington, Texas. The Dodgers defeated the Tampa Bay Rays in the series, winning 4 games to 2. Globe Life Field was the home stadium of the Texas Rangers, but it was used as a neutral site due to COVID-19 pandemic restrictions.', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False}, 'self_harm': {'filtered': False}, 'sexual': {'filtered': False}, 'violence': {'filtered': False}, 'jailbreak': {'filtered': False, 'detected': False}, 'profanity': {'filtered': False, 'detected': False}})], created=1743968155, model='llama-4-maverick-17b-128e-instruct-fp8', object='chat.completion', service_tier=None, system_fingerprint='', usage=CompletionUsage(completion_tokens=65, prompt_tokens=69, total_tokens=134, completion_tokens_details=None, prompt_tokens_details=None))
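The response object carries the full completion metadata. If you want only the assistant's reply, index into the first choice:

print(chat_completion.choices[0].message.content)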
Run:
curl -sS https://api.lambda.ai/v1/chat/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick-17b-128e-instruct-fp8",
    "messages": [
      {
        "role": "system",
        "content": "You are an expert conversationalist who responds to the best of your ability."
      },
      {
        "role": "user",
        "content": "Who won the world series in 2020?"
      },
      {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
      },
      {
        "role": "user",
        "content": "Where was it played?"
      }
    ]
  }' | jq .
You should see output similar to:
{
  "id": "chatcmpl-ff1c7776-e842-4d80-af86-2952d3b2d595",
  "object": "chat.completion",
  "created": 1743968336,
  "model": "llama-4-maverick-17b-128e-instruct-fp8",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The 2020 World Series was played at Globe Life Field in Arlington, Texas. The Dodgers played against the Tampa Bay Rays, winning the series 4 games to 2. Globe Life Field was the home stadium of the Texas Rangers, but it was used as a neutral site due to COVID-19 pandemic restrictions."
      },
      "finish_reason": "stop",
      "content_filter_results": {
        "hate": {
          "filtered": false
        },
        "self_harm": {
          "filtered": false
        },
        "sexual": {
          "filtered": false
        },
        "violence": {
          "filtered": false
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "profanity": {
          "filtered": false,
          "detected": false
        }
      }
    }
  ],
  "usage": {
    "prompt_tokens": 69,
    "completion_tokens": 65,
    "total_tokens": 134,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": ""
}
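The /chat/completions endpoint also accepts the OpenAI-style stream parameter. The following is a minimal sketch, assuming streaming is supported for your account and chosen model; it prints tokens as they arrive instead of waiting for the full response:

from openai import OpenAI

client = OpenAI(
    api_key="<API-KEY>",
    base_url="https://api.lambda.ai/v1",
)

# Request a streamed response; each chunk carries an incremental delta
stream = client.chat.completions.create(
    model="llama-4-maverick-17b-128e-instruct-fp8",
    messages=[{"role": "user", "content": "Who won the world series in 2020?"}],
    stream=True,  # assumes the endpoint supports OpenAI-style streaming
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()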
Creating completions#
The /completions endpoint takes a single text string (a prompt) as input, then outputs a response. In comparison, the /chat/completions endpoint takes a list of messages as input.
To use the /completions endpoint:
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:
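pip install openai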
Run, for example:
from openai import OpenAI

# Point the OpenAI client at the Lambda Inference API
openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambda.ai/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "llama-4-maverick-17b-128e-instruct-fp8"

# temperature=0 makes the completion as deterministic as possible
response = client.completions.create(
    prompt="Computers are",
    temperature=0,
    model=model,
)

print(response)
You should see output similar to:
Completion(id='cmpl-651020a3-26f2-47a9-9e73-6bc0f36e25f0', choices=[CompletionChoice(finish_reason='length', index=0, logprobs=Logprobs(text_offset=None, token_logprobs=None, tokens=None, top_logprobs=None), text=' being used more and more in education - IELTS Writing Essay Sample\nComputers')], created=1743968497, model='llama-4-maverick-17b-128e-instruct-fp8', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=16, prompt_tokens=4, total_tokens=20, completion_tokens_details=None, prompt_tokens_details=None))
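Note that the completion above stopped with finish_reason='length' after 16 tokens, which suggests the endpoint applies the usual OpenAI default of max_tokens=16. To request a longer completion, pass max_tokens explicitly:

response = client.completions.create(
    model=model,
    prompt="Computers are",
    temperature=0,
    max_tokens=128,  # raise the apparent default cap of 16 tokens
)
print(response.choices[0].text)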
Run:
curl -sS https://api.lambda.ai/v1/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-maverick-17b-128e-instruct-fp8",
    "prompt": "Computers are",
    "temperature": 0
  }' | jq .
You should see output similar to:
{
  "id": "cmpl-d8ea14eb-7398-43e6-82d1-8dec587e1e04",
  "object": "text_completion",
  "created": 1743968623,
  "model": "llama-4-maverick-17b-128e-instruct-fp8",
  "choices": [
    {
      "text": " being used more and more in education. Some people say that this is a positive",
      "index": 0,
      "finish_reason": "length",
      "logprobs": {
        "tokens": null,
        "token_logprobs": null,
        "top_logprobs": null,
        "text_offset": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 16,
    "total_tokens": 20,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
Listing models#
The /models endpoint lists the models available for use through the Lambda Inference API.
To use the /models endpoint:
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running:
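pip install openai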
Run:
from openai import OpenAI

# Point the OpenAI client at the Lambda Inference API
openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambda.ai/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Print the list of available models
print(client.models.list())
You should see output similar to:
SyncPage[Model](data=[Model(id='llama3.3-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama-4-maverick-17b-128e-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.2-3b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='llama-4-scout-17b-16e-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-8b', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-nemotron-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.2-11b-vision-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='lfm-40b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-405b', created=1724347380, object='model', owned_by='lambda'), Model(id='qwen25-coder-32b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-8b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='deepseek-llama3.3-70b', created=1724347380, object='model', owned_by='lambda'), Model(id='deepseek-r1-671b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-70b', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-405b-instruct-fp8', created=1724347380, object='model', owned_by='lambda')], object='list')
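The SyncPage object is easier to scan if you print just the model IDs by iterating over its data attribute:

for model in client.models.list().data:
    print(model.id)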
Run:
curl -sS https://api.lambda.ai/v1/models \
  -H "Authorization: Bearer <API-KEY>" | jq .
You should see output similar to:
{
  "object": "list",
  "data": [
    {
      "id": "llama3.3-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama-4-maverick-17b-128e-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.2-3b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama-4-scout-17b-16e-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-8b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-nemotron-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "lfm-40b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-405b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "qwen25-coder-32b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-8b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "deepseek-llama3.3-70b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "deepseek-r1-671b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-70b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-405b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    }
  ]
}
Note
Currently, the following models are available:
deepseek-llama3.3-70b
deepseek-r1-671b
deepseek-v3-0324
hermes3-405b
hermes3-70b
hermes3-8b
lfm-40b
llama-4-maverick-17b-128e-instruct-fp8
llama-4-scout-17b-16e-instruct
llama3.1-405b-instruct-fp8
llama3.1-70b-instruct-fp8
llama3.1-8b-instruct
llama3.1-nemotron-70b-instruct-fp8
llama3.2-11b-vision-instruct
llama3.2-3b-instruct
llama3.3-70b-instruct-fp8
qwen25-coder-32b-instruct
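Because the set of available models changes over time, it can be worth verifying a model ID against the /models endpoint before sending requests. Below is a minimal sketch, reusing the client from the earlier examples; is_available is a hypothetical helper, not part of the API:

def is_available(client, model_id: str) -> bool:
    # Compare the requested ID against what /models currently returns
    return any(m.id == model_id for m in client.models.list().data)

model = "llama-4-maverick-17b-128e-instruct-fp8"
if is_available(client, model):
    print(f"{model} is available")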