Caching in FastAPI
The following outlines the benefits of caching in FastAPI, when to use it, and how to implement it.
Observed performance gains
We built a minimal FastAPI app with two endpoints performing the same recursive Fibonacci calculation, one cached and one uncached, to demonstrate efficiency improvements. Using a TestClient, we ran repeated requests to both endpoints and measured response times.
For the uncached endpoint, each request triggers a full recalculation, causing computation times to grow exponentially with input size. In contrast, the cached endpoint computes the result only once on a cache miss, then serves subsequent cache hits within the 60-second cache window almost instantly, clearly illustrating the performance benefits of caching.
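The same effect can be reproduced outside the web layer. Below is a minimal, self-contained sketch (not our actual test harness): a naive recursive Fibonacci paired with a hand-rolled `ttl_cache` helper, invented here to stand in for fastapi-cache's in-memory backend.

```python
import time
from functools import wraps

def fib(n: int) -> int:
    """Naive recursive Fibonacci, deliberately expensive."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def ttl_cache(expire: float):
    """Minimal in-memory TTL cache, loosely mimicking @cache(expire=...)."""
    store = {}

    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            entry = store.get(args)
            if entry is not None and now - entry[1] < expire:
                return entry[0]        # cache hit: served straight from memory
            result = func(*args)       # cache miss: run the full computation
            store[args] = (result, now)
            return result
        return wrapper
    return decorator

cached_fib = ttl_cache(expire=60)(fib)

start = time.perf_counter()
first = cached_fib(30)   # miss: full recursive computation
miss_time = time.perf_counter() - start

start = time.perf_counter()
second = cached_fib(30)  # hit within the 60-second window: dictionary lookup
hit_time = time.perf_counter() - start

print(f"miss: {miss_time:.4f}s, hit: {hit_time:.6f}s")
```

The first call pays the full recursion cost; the second returns the stored result, which is the same miss/hit pattern the benchmark below measures.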
```
Testing fibonacci(20)...
--------------------------------------------------
Uncached requests:
  Request 1: 0.0018s
  Request 2: 0.0010s
  Request 3: 0.0009s
Cached requests:
  Request 1: 0.0010s
  Request 2: 0.0005s
  Request 3: 0.0003s

Results for fibonacci(20):
  Average uncached: 0.0012s
  Average cached (all requests): 0.0006s
  Average cached (successive only): 0.0004s
  Speed improvement (all): 2.1x faster
  Speed improvement (successive): 3.1x faster
  Time saved (all): 0.6ms per request
  Time saved (successive): 0.8ms per request

Testing fibonacci(30)...
--------------------------------------------------
Uncached requests:
  Request 1: 0.0776s
  Request 2: 0.0776s
  Request 3: 0.0772s
Cached requests:
  Request 1: 0.0787s
  Request 2: 0.0005s
  Request 3: 0.0003s

Results for fibonacci(30):
  Average uncached: 0.0775s
  Average cached (all requests): 0.0265s
  Average cached (successive only): 0.0004s
  Speed improvement (all): 2.9x faster
  Speed improvement (successive): 187.8x faster
  Time saved (all): 51.0ms per request
  Time saved (successive): 77.1ms per request

Testing fibonacci(35)...
--------------------------------------------------
Uncached requests:
  Request 1: 0.8487s
  Request 2: 0.8573s
  Request 3: 0.8504s
Cached requests:
  Request 1: 0.8528s
  Request 2: 0.0005s
  Request 3: 0.0003s

Results for fibonacci(35):
  Average uncached: 0.8521s
  Average cached (all requests): 0.2845s
  Average cached (successive only): 0.0004s
  Speed improvement (all): 3.0x faster
  Speed improvement (successive): 1996.1x faster
  Time saved (all): 567.6ms per request
  Time saved (successive): 851.7ms per request
```

Through testing, we observed that caching delivers increasingly dramatic speed-ups as computational complexity grows. Average uncached response times rose from about 1 ms for fibonacci(20) to over 850 ms for fibonacci(35), while successive cached requests (cache hits) consistently stayed under 1 ms. This performance gain occurs because cached endpoints retrieve pre-computed results directly from memory instead of re-executing the expensive Fibonacci calculation.
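To see why the uncached times grow so quickly, the naive recursion can be instrumented to count how many calls each request triggers (a rough sketch; the counts follow from the Fibonacci recurrence):

```python
call_count = 0

def fib(n: int) -> int:
    """Naive recursive Fibonacci, instrumented to count invocations."""
    global call_count
    call_count += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

counts = {}
for n in (20, 30):
    call_count = 0
    fib(n)
    counts[n] = call_count

print(counts)  # {20: 21891, 30: 2692537}
```

A cache hit replaces millions of recursive calls with a single memory lookup, which is why hit latency stays flat while uncached latency explodes.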
Selecting endpoints for caching
In-memory caching is best suited for endpoints that are called frequently within short intervals and return identical responses for the same input parameters. As shown in our test case, caching converts expensive computations into fast memory lookups. The same principle applies to database queries: frequently repeated queries can be cached to significantly reduce response times.
However, not all endpoints benefit equally from caching. For endpoints with inherently fast execution times, caching overhead may offset any performance gains, making improvements negligible.
The cache hit ratio is another critical factor. A hit ratio of at least 30% (i.e., 30% of requests are served from the cache) is generally required to justify the memory cost. Low hit ratios often signal high cache-key cardinality, where too many unique input combinations prevent cache reuse and waste memory on single-use entries.
This is particularly important because in-memory caches do not proactively expire entries: stale data is only removed when a new request triggers cache cleanup. Each unique request generates a unique cache key, and as the number of unique requests grows, these keys accumulate in memory. Infrequently accessed endpoints retain their cached results, leading to elevated RAM usage and potential memory leaks. It is therefore important to monitor the number of unique cache keys generated by each endpoint and keep it within a reasonable range to prevent uncontrolled memory growth.
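This failure mode can be sketched with a plain dict standing in for the in-memory backend and an invented `cache_key` helper (real fastapi-cache keys are derived differently, but the growth pattern is the same): low-cardinality inputs reuse a handful of keys, while high-cardinality inputs mint one single-use key per request.

```python
store = {}  # stands in for the in-memory cache backend

def cache_key(endpoint: str, **params) -> str:
    """Illustrative helper: one key per unique endpoint + parameter combination."""
    return endpoint + "?" + "&".join(f"{k}={v}" for k, v in sorted(params.items()))

def handle_request(endpoint: str, **params) -> str:
    key = cache_key(endpoint, **params)
    if key not in store:                      # miss: a new key accumulates
        store[key] = f"response for {key}"
    return store[key]

# 1000 requests spread over only 3 distinct inputs: 3 keys, ~99.7% hit ratio
for i in range(1000):
    handle_request("/fib", n=i % 3)
print(len(store))  # 3

# 1000 requests with all-unique inputs: 1000 single-use keys, zero reuse
for i in range(1000):
    handle_request("/search", q=f"query-{i}")
print(len(store))  # 1003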
Finally, in-memory caching should be avoided for endpoints returning large payloads (e.g., images, videos, or datasets exceeding 5–10 MB). Since in-memory caches store data directly in application RAM, large responses can quickly exhaust available memory and degrade system stability.
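A quick back-of-envelope check makes the risk concrete (the payload size and key count below are illustrative assumptions, not measurements):

```python
# Illustrative assumptions, not measurements; adjust for your workload.
payload_mb = 8        # size of one cached response, e.g. an image
unique_keys = 200     # distinct parameter combinations alive in one TTL window

ram_mb = payload_mb * unique_keys
print(f"~{ram_mb} MB (~{ram_mb / 1024:.1f} GB) of responses held in RAM")
# ~1600 MB (~1.6 GB): enough to destabilize a typical application container
```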
Good use cases for caching
- Endpoints called frequently with the same parameters
- Computationally expensive operations
- Static data lookups
Bad use cases for caching
- User-specific data
- Real-time data
- Endpoints with low request frequency
- Large payloads (>5–10 MB)
- Endpoints with fast execution
- Endpoints with high-cardinality inputs (many unique parameter combinations)
Implementation guide
Import the FastAPICache classes and initialize the cache in lifespan.py:
```python
...
from fastapi_cache import FastAPICache
from fastapi_cache.backends.inmemory import InMemoryBackend
...

@asynccontextmanager
async def lifespan(_: FastAPI) -> AsyncGenerator[None, None]:
    """Lifecycle event for the FastAPI app."""
    ...
    FastAPICache.init(InMemoryBackend())
    yield
```

In the endpoints themselves, import the cache decorator and add it above the function definition, specifying the cache expiration time in seconds using the expire parameter.
```python
from fastapi import APIRouter
from fastapi_cache.decorator import cache

picture_requests_router = APIRouter(tags=["ARO", "Picture Requests"])

@picture_requests_router.get("/example-endpoint")
@cache(expire=60)  # Cache for 1 minute
async def example(word: str) -> str:
    ...
```