# Using the Lambda Inference API
The Lambda Inference API enables you to use large language models (LLMs) without needing to set up your own server, and places no limits on the rate of requests. The Lambda Inference API can be used as a drop-in replacement for applications currently using the OpenAI API. See, for example, our guide on integrating the Lambda Inference API into VS Code.
To use the Lambda Inference API, first generate a Cloud API key from the dashboard. You can also use a Cloud API key that you've already generated.
In the examples below, you can replace `deepseek-r1-671b` with any of the available models. You can obtain a list of the available models using the `/models` endpoint. Replace `<API-KEY>` with your actual Cloud API key.
## Creating chat completions
The `/chat/completions` endpoint takes a list of messages that make up a conversation, then outputs a response.
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running `pip install openai`.
Run, for example:
```python
from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "deepseek-r1-671b"

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant named Hermes, made by Nous Research.",
        },
        {
            "role": "user",
            "content": "Who won the world series in 2020?",
        },
        {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020.",
        },
        {
            "role": "user",
            "content": "Where was it played?",
        },
    ],
    model=model,
)

print(chat_completion)
```
You should see output similar to:
```text
ChatCompletion(id='chatcmpl-ee4d3972-e171-4cda-b3bb-aaad5e7dd34e', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Okay, the user asked where the 2020 World Series was played. Let me start by recalling the previous conversation. The user first asked who won the 2020 World Series, and I correctly answered the Los Angeles Dodgers. Now, the follow-up question is about the location.\n\nI need to remember that the 2020 season was impacted by the COVID-19 pandemic. Normally, the World Series is held at the home ballparks of the participating teams. But due to the pandemic, there were significant changes. I think MLB used a neutral site for the playoffs and World Series that year to minimize travel and reduce infection risks.\n\nI should verify the specific venue. From what I recall, the Texas Rangers' Globe Life Field in Arlington, Texas, was the host. It was a new stadium, opened in 2020, and they used it as a neutral site. The entire postseason, including the World Series, was played there. Let me double-check that information to make sure. Yes, sources confirm that the 2020 World Series games were all held at Globe Life Field. Also, it was the first time the World Series was played entirely at a single neutral site since 1944. I should mention that it was due to the pandemic and that the Dodgers won there. That should answer the user's question accurately and provide context about why the location was different.\n</think>\n\nThe 2020 World Series was held entirely at **Globe Life Field** in Arlington, Texas, due to the COVID-19 pandemic. This marked the first time in MLB history that the World Series was played at a single neutral site rather than alternating between the home ballparks of the two competing teams (the Los Angeles Dodgers and the Tampa Bay Rays).\n\nThe Dodgers clinched their title in this new stadium, which opened earlier that year.", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None), content_filter_results={'hate': {'filtered': False}, 'self_harm': {'filtered': False}, 'sexual': {'filtered': False}, 'violence': {'filtered': False}, 'jailbreak': {'filtered': False, 'detected': False}, 'profanity': {'filtered': False, 'detected': False}})], created=1742579082, model='deepseek-r1-671b', object='chat.completion', service_tier=None, system_fingerprint='', usage=CompletionUsage(completion_tokens=372, prompt_tokens=50, total_tokens=422, completion_tokens_details=None, prompt_tokens_details=None))
```
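The `content` returned by `deepseek-r1-671b` includes the model's reasoning followed by a `</think>` tag and then the final answer, as in the output above. If you only want the answer, you can split on that tag. A minimal sketch, using a short sample string in place of a live response (for a live call, the content would come from `chat_completion.choices[0].message.content`):

```python
# Sketch: separate DeepSeek R1's reasoning from its final answer.
# The sample string below stands in for a live response's message content
# so this snippet runs without an API key.
sample_content = (
    "Okay, the user asked where the 2020 World Series was played. "
    "...reasoning continues...\n</think>\n\n"
    "The 2020 World Series was held at Globe Life Field in Arlington, Texas."
)

# Everything before the closing tag is reasoning; everything after is the answer.
reasoning, _, answer = sample_content.partition("</think>")
print(answer.strip())
```

Models that don't emit reasoning (for example, the Hermes or Llama models) return the answer directly, so `partition` leaves the string unchanged in that case (`answer` is empty and the text stays in `reasoning`), which is worth checking for if you use this pattern across models.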
Run:
```bash
curl -sS https://api.lambdalabs.com/v1/chat/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1-671b",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant named Hermes, made by Nous Research."
          },
          {
            "role": "user",
            "content": "Who won the world series in 2020?"
          },
          {
            "role": "assistant",
            "content": "The Los Angeles Dodgers won the World Series in 2020."
          },
          {
            "role": "user",
            "content": "Where was it played?"
          }
        ]
      }' | jq .
```
You should see output similar to:
```json
{
  "id": "chatcmpl-34686031-4c70-47b8-bafc-2300532790ab",
  "object": "chat.completion",
  "created": 1742579524,
  "model": "deepseek-r1-671b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Okay, the user asked, \"Where was it played?\" referring to the 2020 World Series. From the previous conversation, I had already told them that the Dodgers won in 2020. Now they want to know the location.\n\nFirst, I need to recall where the 2020 World Series was held. I remember that due to the COVID-19 pandemic, MLB made some changes that year. Normally, the World Series games are hosted at the home stadiums of the participating teams. But in 2020, to minimize travel and reduce the risk of infection, all the playoff games, including the World Series, were held at neutral sites. \n\nThe specific venue for the 2020 World Series was Globe Life Field in Arlington, Texas. That's the home stadium of the Texas Rangers. So even though the Dodgers and the Rays were the teams playing, the games took place at a neutral location. I should mention that it was the first time the entire World Series was played at a single neutral site. Also, maybe explain the reason behind this decision, which was the pandemic. The user might be interested in knowing why it was held there instead of the usual home-and-away format. \n\nWait, let me double-check the details. Globe Life Field opened in 2020, right? So it was a new stadium. The Rangers moved there from their old stadium, Globe Life Park. That could be a point to mention. Also, since the Dodgers won, even though the games were in Texas, the Dodgers' home is in Los Angeles, but that didn't affect the location. \n\nI should structure the answer to first state the location, then explain the reason (COVID-19), mention it was a neutral site, and perhaps add a note about it being the first time. That should cover the user's question comprehensively. Make sure the information is accurate. Let me confirm: 2020 World Series, Dodgers vs. Rays, held at Globe Life Field in Arlington, Texas. Yes, that's correct. The neutral site part is important because it's a departure from the norm. \n\nThe user might also be curious if this has happened before, so noting that it was the first time adds context. \n\nI think that's all. The answer should be concise but informative, addressing both the location and the reason behind it.\n</think>\n\nThe 2020 World Series was held at **Globe Life Field** in **Arlington, Texas**. Due to the COVID-19 pandemic, MLB implemented a neutral-site format for the postseason, with all games played at this venue to minimize travel and health risks. This marked the first time in World Series history that the entire series was played at a single neutral location. The stadium, home to the Texas Rangers, had just opened earlier that year."
      },
      "finish_reason": "stop",
      "content_filter_results": {
        "hate": {
          "filtered": false
        },
        "self_harm": {
          "filtered": false
        },
        "sexual": {
          "filtered": false
        },
        "violence": {
          "filtered": false
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "profanity": {
          "filtered": false,
          "detected": false
        }
      }
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 568,
    "total_tokens": 618,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": ""
}
```
## Creating completions
The `/completions` endpoint takes a single text string (a prompt) as input, then outputs a response. In comparison, the `/chat/completions` endpoint takes a list of messages as input.
To use the `/completions` endpoint:
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running `pip install openai`.
Run, for example:
```python
from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "deepseek-r1-671b"

response = client.completions.create(
    prompt="Computers are",
    temperature=0,
    model=model,
)

print(response)
```
You should see output similar to:
```text
Completion(id='cmpl-0edfa9cd-bc25-4fe5-9fec-80c0317e9bd7', choices=[CompletionChoice(finish_reason='length', index=0, logprobs=Logprobs(text_offset=None, token_logprobs=None, tokens=None, top_logprobs=None), text=" all around us. They're in our phones, our cars, our thermost")], created=1742579235, model='deepseek-r1-671b', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=16, prompt_tokens=4, total_tokens=20, completion_tokens_details=None, prompt_tokens_details=None))
```
Run:
```bash
curl -sS https://api.lambdalabs.com/v1/completions \
  -H "Authorization: Bearer <API-KEY>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1-671b",
        "prompt": "Computers are",
        "temperature": 0
      }' | jq .
```
You should see output similar to:
```json
{
  "id": "cmpl-1086fdaa-cae9-4ee7-bede-ca29c5b5915c",
  "object": "text_completion",
  "created": 1742579704,
  "model": "deepseek-r1-671b",
  "choices": [
    {
      "text": " used for many purposes. What is one thing that computers cannot do?\n\nComputers",
      "index": 0,
      "finish_reason": "length",
      "logprobs": {
        "tokens": null,
        "token_logprobs": null,
        "top_logprobs": null,
        "text_offset": null
      }
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 16,
    "total_tokens": 20,
    "prompt_tokens_details": null,
    "completion_tokens_details": null
  }
}
```
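For `/completions`, the generated text is in `choices[0].text`, and `finish_reason` tells you why generation stopped; `"length"` in the output above means the token limit was reached rather than a natural stopping point. A small sketch of reading those fields, using a trimmed copy of the JSON shown above so it runs without an API key:

```python
import json

# Trimmed sample of the /completions JSON response shown above.
sample = json.loads("""
{
  "object": "text_completion",
  "choices": [
    {
      "text": " used for many purposes. What is one thing that computers cannot do?\\n\\nComputers",
      "index": 0,
      "finish_reason": "length"
    }
  ]
}
""")

choice = sample["choices"][0]
print(choice["text"])
if choice["finish_reason"] == "length":
    # Generation stopped at the token limit, not a natural stop; a live
    # request could pass a larger max_tokens for a longer completion.
    print("(output was truncated)")
```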
## Listing models
The `/models` endpoint lists the models available for use through the Lambda Inference API.
To use the `/models` endpoint:
First, create and activate a Python virtual environment. Then, install the OpenAI Python API library by running `pip install openai`.
Run:
```python
from openai import OpenAI

openai_api_key = "<API-KEY>"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

print(client.models.list())
```
You should see output similar to:
```text
SyncPage[Model](data=[Model(id='llama3.3-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.2-3b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-8b', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-nemotron-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-70b-instruct-fp8', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.2-11b-vision-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='lfm-40b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-405b', created=1724347380, object='model', owned_by='lambda'), Model(id='qwen25-coder-32b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-8b-instruct', created=1724347380, object='model', owned_by='lambda'), Model(id='deepseek-llama3.3-70b', created=1724347380, object='model', owned_by='lambda'), Model(id='deepseek-r1-671b', created=1724347380, object='model', owned_by='lambda'), Model(id='hermes3-70b', created=1724347380, object='model', owned_by='lambda'), Model(id='llama3.1-405b-instruct-fp8', created=1724347380, object='model', owned_by='lambda')], object='list')
```
Run:

```bash
curl -sS https://api.lambdalabs.com/v1/models \
  -H "Authorization: Bearer <API-KEY>" | jq .
```

You should see output similar to:
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama3.3-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.2-3b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-8b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-nemotron-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-70b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.2-11b-vision-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "lfm-40b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-405b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "qwen25-coder-32b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-8b-instruct",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "deepseek-llama3.3-70b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "deepseek-r1-671b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "hermes3-70b",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    },
    {
      "id": "llama3.1-405b-instruct-fp8",
      "object": "model",
      "created": 1724347380,
      "owned_by": "lambda"
    }
  ]
}
```
Note

Currently, the following models are available:

- `llama3.3-70b-instruct-fp8`
- `llama3.2-3b-instruct`
- `hermes3-8b`
- `llama3.1-nemotron-70b-instruct-fp8`
- `llama3.1-70b-instruct-fp8`
- `llama3.2-11b-vision-instruct`
- `lfm-40b`
- `hermes3-405b`
- `qwen25-coder-32b-instruct`
- `llama3.1-8b-instruct`
- `deepseek-llama3.3-70b`
- `deepseek-r1-671b`
- `hermes3-70b`
- `llama3.1-405b-instruct-fp8`
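Each model object returned by `/models` has an `id` field, so in the Python client the available ids are `[m.id for m in client.models.list().data]`. A minimal offline sketch of checking that the model you plan to use is in the list, hard-coding a few ids from the list above rather than making a live call:

```python
# Sketch: verify a model id is available before sending requests.
# In a live script, build the list from the API instead:
#   ids = [m.id for m in client.models.list().data]
ids = ["deepseek-r1-671b", "hermes3-405b", "llama3.1-8b-instruct"]

model = "deepseek-r1-671b"
available = model in ids
print(f"{model} available: {available}")
```

Checking availability up front gives a clearer error than letting a chat request fail with an unknown-model response.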