Rate Limits
LangSmith has rate limits which are designed to ensure the stability of the service for all users.
To ensure access and stability, LangSmith will respond with HTTP Status Code 429 indicating that rate or usage limits have been exceeded under the following circumstances:
Scenarios
Temporary throughput limit over a 1 minute period at our application load balancer
This 429 is the result of exceeding a fixed number of API calls over a 1 minute window on a per API key/access token basis. The start of the window will vary slightly (it is not guaranteed to start at the start of a clock minute) and may change depending on application deployment events.
Once the maximum number of calls has been received, we respond with a 429 until 60 seconds have elapsed from the start of the evaluation window, and then the process repeats.
This 429 is thrown by our application load balancer and is a mechanism in place for all LangSmith users independent of plan tier to ensure continuity of service for all users.
Method | Endpoint | Limit | Window |
---|---|---|---|
DELETE | Sessions | 30 | 1 minute |
POST or PATCH | Runs | 5000 | 1 minute |
POST | Feedback | 5000 | 1 minute |
* | * | 2000 | 1 minute |
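
The fixed-window behavior described above can be illustrated with a small counter. This is a minimal sketch, not LangSmith's actual load balancer logic: the class name `FixedWindowCounter` is hypothetical, and it assumes the window opens on the first call after the previous window has expired.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedWindowCounter:
    """Illustrative fixed-window counter (hypothetical, not LangSmith's code)."""

    limit: int                      # max calls allowed per window, e.g. 30 for DELETE Sessions
    window_seconds: float = 60.0    # 1 minute window, as in the table above
    window_start: Optional[float] = None
    count: int = 0

    def allow(self, now: float) -> bool:
        """Return True if a call at time `now` (in seconds) is within the limit."""
        # Assumption: a new window opens on the first call, or on the first
        # call after the current window has fully elapsed.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        # Limit reached: a server using this scheme would answer 429 here.
        return False
```

With `limit=30`, the 31st call inside the same 60 seconds is rejected, and calls are accepted again once 60 seconds have passed since the window started.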
The LangSmith SDK takes steps to minimize the likelihood of reaching these limits on run-related endpoints by batching up to 100 runs from a single session ID into a single API call.
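If you call the API directly, a similar batching approach can reduce the number of calls you make. The following is a rough sketch of the idea only, not the SDK's implementation: the function name `batch_runs` and the `session_id` dictionary key are hypothetical, and the sketch does not show how batches are sent.

```python
from collections import defaultdict
from typing import Iterator

MAX_BATCH_SIZE = 100  # from the text: up to 100 runs per API call


def batch_runs(runs: list[dict], max_batch_size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Group runs by session, then yield batches of at most `max_batch_size` runs."""
    by_session: dict[str, list[dict]] = defaultdict(list)
    for run in runs:
        # "session_id" is an assumed field name for illustration.
        by_session[run["session_id"]].append(run)
    for session_runs in by_session.values():
        # Each batch contains runs from one session only.
        for start in range(0, len(session_runs), max_batch_size):
            yield session_runs[start:start + max_batch_size]
```

Under these assumptions, 250 runs from one session become three calls instead of 250.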
Plan-level hourly trace event limit
This 429 is the result of reaching your maximum hourly events ingested. The limit is evaluated in a fixed window that starts at the beginning of each clock hour in UTC and resets at the top of each new hour.
An event in this context is the creation or update of a run. So if a run is created and then updated in the same hourly window, that counts as 2 events against this limit.
This is thrown by our application and varies by plan tier. Organizations on our Startup/Plus and Enterprise plan tiers have higher hourly limits than those on our Free and Developer plan tiers, which are designed for personal use.
Plan | Limit | Window |
---|---|---|
Developer (no payment on file) | 50,000 events | 1 hour |
Developer (with payment on file) | 250,000 events | 1 hour |
Startup/Plus | 500,000 events | 1 hour |
Enterprise | Custom | Custom |
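
To estimate how close your application is to this limit, you can count run create and update events per UTC clock hour. The sketch below is illustrative only: the function name `events_per_hour` is hypothetical, and it assumes you already have timezone-aware timestamps for each create or update call you made.

```python
from collections import Counter
from datetime import datetime, timezone


def events_per_hour(event_times: list[datetime]) -> Counter:
    """Count events per UTC clock hour. Timestamps must be timezone-aware."""
    counts: Counter = Counter()
    for ts in event_times:
        # Truncate to the start of the UTC clock hour the event falls in.
        hour = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return counts
```

A run created at 10:05 UTC and updated at 10:50 UTC contributes 2 events to the 10:00 window, while an update at 11:01 UTC counts against the 11:00 window instead.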
Plan-level hourly trace data ingest limit
This 429 is the result of reaching the maximum amount of data ingested across your trace inputs, outputs, and metadata. The limit is evaluated in a fixed window that starts at the beginning of each clock hour in UTC and resets at the top of each new hour.
Typically, inputs, outputs, and metadata are sent on both run creation and update events. So if a run is 2.0MB in size at creation and 3.0MB in size when updated in the same hourly window, that counts as 5.0MB against this limit.
This is thrown by our application and varies by plan tier. Organizations on our Startup/Plus and Enterprise plan tiers have higher hourly limits than those on our Free and Developer plan tiers, which are designed for personal use.
Plan | Limit | Window |
---|---|---|
Developer (no payment on file) | 500MB | 1 hour |
Developer (with payment on file) | 2.5GB | 1 hour |
Startup/Plus | 5.0GB | 1 hour |
Enterprise | Custom | Custom |
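
As a worked example of the arithmetic above, the following sketch estimates how many runs fit in an hourly data budget when each run sends a payload at creation and again at update. The function name `runs_within_budget` is hypothetical, and the calculation assumes every run has the same payload sizes.

```python
def runs_within_budget(create_mb: float, update_mb: float, hourly_limit_mb: float) -> int:
    """Number of whole runs that fit in the hourly limit, counting both payloads."""
    per_run_mb = create_mb + update_mb  # both the create and the update are ingested
    return int(hourly_limit_mb // per_run_mb)
```

With the 2.0MB and 3.0MB example from the text and the 500MB Developer limit, `runs_within_budget(2.0, 3.0, 500)` gives 100 runs per hour.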
Plan-level monthly unique traces limit
This 429 is the result of reaching your maximum monthly traces ingested. The limit is evaluated in a fixed window that starts at the beginning of each calendar month in UTC and resets at the beginning of each new month.
This is thrown by our application and applies only to the Developer Plan Tier when there is no payment method on file.
Plan | Limit | Window |
---|---|---|
Developer (no payment on file) | 5,000 traces | 1 month |
Self-configured monthly usage limits
This 429 is the result of reaching your usage limit as configured by your organization admin. The limit is evaluated in a fixed window that starts at the beginning of each calendar month in UTC and resets at the beginning of each new month.
This is thrown by our application and varies by organization based on their configured settings.
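Because the plan-level hourly and monthly windows reset on UTC clock boundaries, you can compute when a limit will next reset. This is a minimal sketch under that assumption; the function names `next_hourly_reset` and `next_monthly_reset` are hypothetical and are not part of any LangSmith API.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_reset(now: datetime) -> datetime:
    """Start of the next UTC clock hour. `now` must be timezone-aware."""
    now = now.astimezone(timezone.utc)
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def next_monthly_reset(now: datetime) -> datetime:
    """Start of the next UTC calendar month. `now` must be timezone-aware."""
    now = now.astimezone(timezone.utc)
    # Roll over to January of the following year after December.
    year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
    return datetime(year, month, 1, tzinfo=timezone.utc)
```

For example, `next_hourly_reset(datetime.now(timezone.utc))` returns the time at which the hourly event and data limits start a new window.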
Handling 429 responses in your application
Since some 429 responses are temporary and the same request may succeed on a subsequent call, if you are directly calling the LangSmith API in your application we recommend implementing retry logic with exponential backoff and jitter.
For convenience, LangChain applications built with the LangSmith SDK have this capability built in.
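If you are calling the API directly, retry logic with exponential backoff and jitter might look like the sketch below. It is one possible approach, not the SDK's implementation: the function name `call_with_backoff`, its parameters, and the default delays are all assumptions, and `send` stands in for whatever function performs your HTTP request and returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, object]],
    max_retries: int = 5,        # assumed default, tune for your workload
    base_delay: float = 0.5,     # seconds; assumed default
    max_delay: float = 30.0,     # upper bound on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Call `send`, retrying on HTTP 429 with exponential backoff and full jitter."""
    attempt = 0
    while True:
        status, body = send()
        # Return on any non-429 response, or once retries are exhausted.
        if status != 429 or attempt >= max_retries:
            return status, body
        # Exponential growth: base_delay * 2^attempt, capped at max_delay.
        cap = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: wait a random duration between 0 and the cap so that
        # many clients do not retry at the same moment.
        sleep(random.uniform(0, cap))
        attempt += 1
```

The `sleep` parameter is injectable so the function can be tested without waiting. If the final attempt still returns 429, the caller receives that 429 and can decide whether to drop, queue, or log the request.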
Note that if you are saturating the endpoints for extended periods of time, retries may not be effective, as your application will eventually accumulate a backlog large enough to exhaust all retries.
If that is the case, we would like to discuss your needs more specifically. Please reach out to LangSmith Support with details about your application's throughput needs and sample code, and we can work with you to understand whether the best approach is fixing a bug, changing your application code, or moving to a different LangSmith plan.