API Rate Limiting
This document describes the rate limiting applied to the public Zaptec API and explains how to design your integration to work well within it.
To ensure stability, reliability, and fair usage for all clients, our API employs a rate-limiting system. This guide explains how the rate limiting works and the best practices you should follow to keep your integration running smoothly.
How Rate Limiting Works
Our system identifies clients based on the Subject (sub) claim within your JSON Web Token (JWT). Based on this unique identifier, your requests are allocated to a specific rate-limiting tier.
Each tier has two key properties:
- Rate: The sustainable number of requests you can make per second.
- Burst: A small, additional allowance to handle short spikes in traffic. Think of it as a bucket: the rate is how fast the bucket refills, and the burst is the bucket's total capacity. As long as there is capacity in the bucket, your requests are processed instantly.
Our configuration uses nodelay, which means requests are never queued or slowed down. They are either processed immediately (if within the limit) or rejected (if the limit is exceeded).
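The rate/burst model above can be sketched as a token bucket. The following is a minimal client-side illustration of the behavior, not the server's actual implementation:

```python
import time

class TokenBucket:
    """Token-bucket sketch: `rate` tokens refill per second, `burst` is capacity."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst  # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # processed immediately
        return False      # would be rejected with 429 (nodelay: no queuing)

# General Access tier: 10 requests/sec sustained, burst of 15
bucket = TokenBucket(rate=10, burst=15)
```

With a full bucket, 15 requests in a tight loop succeed immediately; the 16th is rejected until refill catches up.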
Rate Limiting Tiers
At the time of writing, only one rate-limiting tier is configured for end users.
| Name | Rate limit | Burst |
| --- | --- | --- |
| General Access | 10 requests/sec | 15 requests |
Rate limiting for this tier is applied on a per-user basis.
What Happens When You're Rate Limited?
If you exceed the rate and burst allowance for your tier, our server will reject the request and respond with:
HTTP Status Code: 429 Too Many Requests
Response Header: Retry-After: 1
The Retry-After header indicates the minimum number of seconds you should wait before attempting another request. In our current configuration, this will always be 1 second.
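A client can honor this header with a small wrapper around its HTTP call. In this sketch, `do_request` is a placeholder for any function that performs the request and returns the status code, headers, and body (e.g. built on `requests` or `http.client`):

```python
import time

def request_with_retry_after(do_request, max_attempts: int = 3):
    """Call `do_request` (returns (status_code, headers, body)) and retry on
    429, waiting at least the number of seconds given by Retry-After."""
    for _ in range(max_attempts):
        status, headers, body = do_request()
        if status != 429:
            return status, body
        # Retry-After is the minimum wait in seconds (currently always 1).
        wait = float(headers.get("Retry-After", "1"))
        time.sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```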
Best Practices for API Integration
Following these guidelines will help you build a robust and reliable integration while avoiding rate limits.
- Handle 429 Responses Gracefully. Your application must be designed to handle 429 status codes. When you receive one, check the Retry-After header and pause subsequent requests for at least that duration.
- Implement Exponential Backoff. Simply retrying immediately isn't enough. The best strategy is exponential backoff with jitter: increase the wait time between retries after each consecutive failure. Below is an example of a simple retry with backoff in Python for an arbitrary function. Adding "jitter" (a small, random delay) prevents multiple clients from retrying at the exact same time, which is critical if the integration code is deployed to edge devices (e.g. a home automation control plane).
```python
import time
import random

def retry_with_backoff(func, max_retries=5, base_delay_ms=100):
    attempt = 0
    while attempt < max_retries:
        try:
            return func()
        except Exception as e:
            attempt += 1
            print(f"Attempt {attempt} failed: {e}")
            if attempt >= max_retries:
                print("All retries have failed.")
                raise
            # Exponential backoff: 2^attempt * base delay, plus up to 50% jitter.
            exponential_backoff = (2 ** attempt) * base_delay_ms
            jitter = random.uniform(0, exponential_backoff * 0.5)
            total_delay_ms = exponential_backoff + jitter
            print(f"Waiting for {total_delay_ms:.2f}ms before next retry...")
            time.sleep(total_delay_ms / 1000.0)
```
- Cache Data on Your End. Avoid making redundant API calls. If you frequently need data that doesn't change often, cache it on your server or client side to reduce the number of requests you send.
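A minimal time-to-live (TTL) cache is often enough for this. The sketch below is illustrative; `fetch` stands in for whatever function actually calls the API:

```python
import time

class TTLCache:
    """Tiny time-based cache: re-fetch a value only after `ttl` seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and entry[0] > now:
            return entry[1]          # still fresh: no API call made
        value = fetch()              # stale or missing: one API call
        self._store[key] = (now + self.ttl, value)
        return value
```

Repeated lookups within the TTL window are served from memory and consume none of your rate allowance.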
- Control Concurrency. Be mindful of how many parallel requests your application is making. Firing off hundreds of requests simultaneously is a quick way to exhaust your burst allowance. If you need to perform bulk operations, queue them and process them at a rate that respects your tier's limits.
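One simple way to do this is to process a queue sequentially and pace the calls so the sustained rate is never exceeded. This sketch assumes a `handle` function that makes one API call per item:

```python
import time

def process_paced(items, handle, rate_per_sec: float = 10):
    """Process `items` one at a time, spacing calls so they never exceed
    `rate_per_sec` sustained requests (General Access tier: 10/sec)."""
    interval = 1.0 / rate_per_sec
    results = []
    for item in items:
        start = time.monotonic()
        results.append(handle(item))
        elapsed = time.monotonic() - start
        if elapsed < interval:
            # Sleep off the remainder of this item's time slot.
            time.sleep(interval - elapsed)
    return results
```

This deliberately ignores the burst allowance and stays within the base rate, which keeps headroom free for any other traffic your application generates.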
- Use subscription-based endpoints if available. For enterprise integrations, Zaptec provides user-group-based notification systems that can be used to receive updates from chargers.