Caching in FastAPI
The following outlines the benefits of caching in FastAPI, when to use it, and how to implement it.
Observed performance gains
We built a minimal FastAPI app with two endpoints performing the same recursive Fibonacci calculation, one cached and one uncached, to demonstrate efficiency improvements. Using a TestClient, we ran repeated requests to both endpoints and measured response times.
For the uncached endpoint, each request triggers a full recalculation, causing computation times to grow exponentially with input size. In contrast, the cached endpoint computes the result only once on a cache miss, then serves subsequent cache hits within the 60-second cache window almost instantly, clearly illustrating the performance benefits of caching.
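The same effect can be reproduced outside the web layer. Below is a minimal, self-contained sketch (not our actual test harness): a naive recursive Fibonacci paired with a hand-rolled `ttl_cache` helper, invented here to stand in for fastapi-cache's in-memory backend.

```python
import time
from functools import wraps

def fib(n: int) -> int:
    """Naive recursive Fibonacci, deliberately expensive."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def ttl_cache(expire: float):
    """Minimal in-memory TTL cache, loosely mimicking @cache(expire=...)."""
    store = {}

    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            entry = store.get(args)
            if entry is not None and now - entry[1] < expire:
                return entry[0]        # cache hit: served straight from memory
            result = func(*args)       # cache miss: run the full computation
            store[args] = (result, now)
            return result
        return wrapper
    return decorator

cached_fib = ttl_cache(expire=60)(fib)

start = time.perf_counter()
first = cached_fib(30)   # miss: full recursive computation
miss_time = time.perf_counter() - start

start = time.perf_counter()
second = cached_fib(30)  # hit within the 60-second window: dictionary lookup
hit_time = time.perf_counter() - start

print(f"miss: {miss_time:.4f}s, hit: {hit_time:.6f}s")
```

The first call pays the full recursion cost; the second returns the stored result, which is the same miss/hit pattern the benchmark below measures.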
```
Testing fibonacci(20)...
--------------------------------------------------
Uncached requests:
  Request 1: 0.0018s
  Request 2: 0.0010s
  Request 3: 0.0009s
Cached requests:
  Request 1: 0.0010s
  Request 2: 0.0005s
  Request 3: 0.0003s

Results for fibonacci(20):
  Average uncached: 0.0012s
  Average cached (all requests): 0.0006s
  Average cached (successive only): 0.0004s
  Speed improvement (all): 2.1x faster
  Speed improvement (successive): 3.1x faster
  Time saved (all): 0.6ms per request
  Time saved (successive): 0.8ms per request

Testing fibonacci(30)...
--------------------------------------------------
Uncached requests:
  Request 1: 0.0776s
  Request 2: 0.0776s
  Request 3: 0.0772s
Cached requests:
  Request 1: 0.0787s
  Request 2: 0.0005s
  Request 3: 0.0003s

Results for fibonacci(30):
  Average uncached: 0.0775s
  Average cached (all requests): 0.0265s
  Average cached (successive only): 0.0004s
  Speed improvement (all): 2.9x faster
  Speed improvement (successive): 187.8x faster
  Time saved (all): 51.0ms per request
  Time saved (successive): 77.1ms per request

Testing fibonacci(35)...
--------------------------------------------------
Uncached requests:
  Request 1: 0.8487s
  Request 2: 0.8573s
  Request 3: 0.8504s
Cached requests:
  Request 1: 0.8528s
  Request 2: 0.0005s
  Request 3: 0.0003s

Results for fibonacci(35):
  Average uncached: 0.8521s
  Average cached (all requests): 0.2845s
  Average cached (successive only): 0.0004s
  Speed improvement (all): 3.0x faster
  Speed improvement (successive): 1996.1x faster
  Time saved (all): 567.6ms per request
  Time saved (successive): 851.7ms per request
```

Through testing, we observed that caching delivers increasingly dramatic speed-ups as computational complexity grows. Average uncached response times rose from about 1 ms for fibonacci(20) to over 850 ms for fibonacci(35), while successive cached requests (cache hits) consistently stayed under 1 ms. This performance gain occurs because cached endpoints retrieve pre-computed results directly from memory instead of re-executing the expensive Fibonacci calculation.
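To see why the uncached times grow so quickly, the naive recursion can be instrumented to count how many calls each request triggers (a rough sketch; the counts follow from the Fibonacci recurrence):

```python
call_count = 0

def fib(n: int) -> int:
    """Naive recursive Fibonacci, instrumented to count invocations."""
    global call_count
    call_count += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

counts = {}
for n in (20, 30):
    call_count = 0
    fib(n)
    counts[n] = call_count

print(counts)  # {20: 21891, 30: 2692537}
```

A cache hit replaces millions of recursive calls with a single memory lookup, which is why hit latency stays flat while uncached latency explodes.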
Selecting endpoints for caching
In-memory caching is best suited for endpoints that are called frequently within short intervals and return identical responses for the same input parameters. As shown in our test case, caching converts expensive computations into fast memory lookups. The same principle applies to database queries: frequently repeated queries can be cached to significantly reduce response times.
However, not all endpoints benefit equally from caching. For endpoints with inherently fast execution times, caching overhead may offset any performance gains, making improvements negligible.
The cache hit ratio is another critical factor. A hit ratio of at least 30% (i.e., 30% of requests are served from the cache) is generally required to justify the memory cost. Low hit ratios often signal high cache-key cardinality, where too many unique input combinations prevent cache reuse and waste memory on single-use entries.
This is particularly important because in-memory caches do not proactively expire entries: stale data is only removed when a new request triggers cache cleanup. Each unique request generates a unique cache key, and as the number of unique requests grows, these keys accumulate in memory. Infrequently accessed endpoints retain their cached results, leading to elevated RAM usage and potential memory leaks. It is therefore important to monitor the number of unique cache keys generated by each endpoint and keep it within a reasonable range to prevent uncontrolled memory growth.
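This failure mode can be sketched with a plain dict standing in for the in-memory backend and an invented `cache_key` helper (real fastapi-cache keys are derived differently, but the growth pattern is the same): low-cardinality inputs reuse a handful of keys, while high-cardinality inputs mint one single-use key per request.

```python
store = {}  # stands in for the in-memory cache backend

def cache_key(endpoint: str, **params) -> str:
    """Illustrative helper: one key per unique endpoint + parameter combination."""
    return endpoint + "?" + "&".join(f"{k}={v}" for k, v in sorted(params.items()))

def handle_request(endpoint: str, **params) -> str:
    key = cache_key(endpoint, **params)
    if key not in store:                      # miss: a new key accumulates
        store[key] = f"response for {key}"
    return store[key]

# 1000 requests spread over only 3 distinct inputs: 3 keys, ~99.7% hit ratio
for i in range(1000):
    handle_request("/fib", n=i % 3)
print(len(store))  # 3

# 1000 requests with all-unique inputs: 1000 single-use keys, zero reuse
for i in range(1000):
    handle_request("/search", q=f"query-{i}")
print(len(store))  # 1003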
Finally, in-memory caching should be avoided for endpoints returning large payloads (e.g., images, videos, or datasets exceeding 5–10 MB). Since in-memory caches store data directly in application RAM, large responses can quickly exhaust available memory and degrade system stability.
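A quick back-of-envelope check makes the risk concrete (the payload size and key count below are illustrative assumptions, not measurements):

```python
# Illustrative assumptions, not measurements; adjust for your workload.
payload_mb = 8        # size of one cached response, e.g. an image
unique_keys = 200     # distinct parameter combinations alive in one TTL window

ram_mb = payload_mb * unique_keys
print(f"~{ram_mb} MB (~{ram_mb / 1024:.1f} GB) of responses held in RAM")
# ~1600 MB (~1.6 GB): enough to destabilize a typical application container
```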
Good use cases for caching
- Endpoints called frequently with the same parameters
- Computationally expensive operations
- Static data lookups
Bad use cases for caching
- User-specific data
- Real-time data
- Endpoints with low request frequency
- Large payloads (>5–10 MB)
- Endpoints with fast execution
- Endpoints with high-cardinality inputs (many unique parameter combinations)
Implementation guide
Import the FastAPICache classes and initialize the cache in lifespan.py:
```python
...
from fastapi_cache import FastAPICache
from fastapi_cache.backends.inmemory import InMemoryBackend
...

@asynccontextmanager
async def lifespan(_: FastAPI) -> AsyncGenerator[None, None]:
    """Lifecycle event for the FastAPI app."""
    ...
    FastAPICache.init(InMemoryBackend())
    yield
```

In the endpoints themselves, import the cache decorator and add it above the function definition, specifying the cache expiration time in seconds using the expire parameter.
```python
from fastapi import APIRouter
from fastapi_cache.decorator import cache

picture_requests_router = APIRouter(tags=["ARO", "Picture Requests"])

@picture_requests_router.get("/example-endpoint")
@cache(expire=60)  # Cache for 1 minute
async def example(word: str) -> str:
    ...
```