Skip to main content

Command Palette

Search for a command to run...

Designing Web APIs: Chapter 6 - Scaling APIs

Updated
10 min read
Designing Web APIs: Chapter 6 - Scaling APIs
M

I'm Mohammed, a passionate software engineer. I've been working on server-side operations for the past six years, but my love for technology started when I was a kid. I love talking about Backend Development, Performance Optimization, Scalability, System Design and Databases. Join me as I explore these topics, sharing insights and experiences from my journey in the tech world.

When others rely on your API, availability and reliability become critical. You need to make sure it’s always up, responsive, and fast, even during unexpected usage spikes. A sudden spike in traffic can degrade your service quality or even crash your application, especially if it depends heavily on your APIs.

To handle a growing number of API calls, several optimizations can be made at the application level. These include improving database queries, sharding data, adding missing indexes, leveraging caching, processing heavy tasks asynchronously, writing efficient code, and fine-tuning web servers.

Beyond these technical fixes, scaling also means improving API design, setting smart usage policies, and helping developers use your API efficiently. Thoughtful changes like these can make a big difference in handling growth smoothly.

To ensure your API can handle growth in both traffic and use cases, it needs to scale effectively. In this chapter, we’ll focus on key practices like:

• Scaling throughput

• Evolving API design

• Paginating APIs

• Rate-limiting APIs

• Developer SDKs

✨ Scaling Throughput

As your API usage grows, the number of API calls per second increases. Optimizing your API for scaling is essential, and the first step is to identify potential bottlenecks that could limit its performance.

Finding the Bottlenecks

Scaling requires understanding where your API struggles. By monitoring and collecting usage data, you can identify and address the key bottlenecks, which often fall into four categories:

Disk I/O
Bottlenecks can occur due to expensive database queries or local disk access, slowing down data retrieval and processing.

Network I/O
Network-related issues are common when your API relies on external services, resulting in slow cross-data center API calls.

CPU
Inefficient code that performs costly computations can strain CPU resources, leading to slow response times.

Memory
Insufficient RAM can cause memory bottlenecks, particularly when handling large volumes of data or concurrent requests.

Cloud platforms like AWS (CloudWatch) offers tools to measure and monitor these bottlenecks, providing insights into performance and health.

Profiling to Identify Issues

Once you identify the most frequently used API methods, profiling tools can help find which parts of the code consume excessive resources. Profiling in production may reveal issues not present in development due to the differing conditions in each environment. By enabling production profiling for a small subset of traffic, you gain deeper insights into performance.

Code Profiling
Profiling your code helps identify CPU or memory-intensive functions that are slowing down your API. This allows you to optimize specific code paths that are inefficient.

Database Profiling
Tools like MySQL's slow query log can help identify database queries that are slowing down performance, allowing you to target and optimize these bottlenecks.

Network I/O Testing
Load testing simulates peak traffic to assess how the API performs under high demand. This is useful for preparing for anticipated spikes, such as those seen during major shopping events like Black Friday.

Adding Computing Resources

To scale your application, you can add more computing resources in two ways:

  1. Vertical Scaling: Increase the power of existing servers by adding more CPUs, RAM, or disk storage.

  2. Horizontal Scaling: Add more server instances to your resource pool, allowing the load to be distributed across multiple servers.

A typical large web application uses a Load Balancer to distribute requests across web servers. To scale databases, data is partitioned across multiple servers This is also referred to as Database Sharding, where each server stores part of the data. Database Replication further distributes the load, improving performance, reliability, and scalability.

Cloud application scaling has many complexities and some of them can increase the development time of your application. Unless you have scaling problems, you probably don’t want to add that complexity.

Database Indexes

Indexing optimizes data retrieval by helping the database locate rows quickly without scanning every entry. For example, creating an index on the email column in a users table speeds up queries that search by email.

However, too many indexes can be problematic. They consume additional storage and slow down operations like inserts, updates, or deletes, as the indexes must also be updated. Indexes are most useful on columns frequently used in WHERE, ORDER BY, or GROUP BY clauses.

Caching

Caching is a popular technique to handle high throughput by storing data in memory, making it faster to retrieve compared to disk storage. Tools like Memcached are often used to store responses from database queries. By analyzing database logs, you can identify slow or frequently used queries, and cache their results to speed up future lookups. If the data isn't found in the cache, a database query retrieves it, and the result is stored in the cache for future use.

Remember to invalidate the cache when data is updated, either by deleting it or letting it expire if update delays are acceptable. To improve performance further, Edge Caching stores API results closer to end users, reducing load on the server and enhancing throughput.

Doing Expensive Operations Asynchronously

For API requests that take a long time, consider performing expensive operations asynchronously to speed up response times, handling these operations outside the main request flow, improving efficiency and responsiveness. For example, when a user uploads a file, you don’t need to include the indexing process in the same request. Instead, index the file in an offline job that runs asynchronously.

✨Evolving Your API Design

Introducing New Data Access Patterns

As your API grows, it may face scaling challenges due to increased usage or unexpected developer needs. Identifying bottlenecks, such as excessive polling or returning too much data, is critical. Solutions include introducing new data access patterns and leveraging technologies like WebSockets, WebHooks, or GraphQL to optimize performance and reduce server load.

Examples:

  • Zapier: Reduced polling by supporting WebHooks, cutting server load significantly.

  • Twitter: Introduced a streaming API to replace frequent polling for real-time tweets.

  • GitHub: Adopted GraphQL to minimize API calls and provide tailored data responses.

  • Slack: Shifted from a WebSocket-based RTM API to a WebHook-based Events API, improving scalability for both developers and Slack.

By evolving your API design and engaging with developers for feedback, you can address scalability challenges effectively.

Supporting Bulk Endpoints

Bulk endpoints enable developers to perform operations on multiple items in a single API call, reducing HTTP round trips and database load.

For example, Slack previously required separate calls to invite each user to a channel. By adding bulk support, developers could invite multiple users in one request, improving efficiency for both Slack and its users.

This approach saves resources and simplifies usage, making it a valuable strategy for scalable API design.

POST /api/conversations.invite
HOST slack.com
Content-Type: application/json
Authorization: Bearer xoxp-165018607-jqf4sbdaq2a
{
"channel":"C0GEV71UG",
"users":["W1234567890","U2345678901", "U3456789012"]
}

Adding New Options to Filter Results

APIs can offer filtering options that allow developers to retrieve only the data they need. Common filters include:

  • Search Filters: Let developers query specific results using keywords, regex, or matches, reducing unnecessary data.

  • Date Filters: Return results before or after a timestamp, helpful for retrieving only new or relevant data (e.g., Twitter or Slack).

  • Order Filters: Enable sorting by attributes like price or popularity, reducing post-processing work.

  • Field Selection: Allow developers to include or exclude specific fields to save computation time and reduce payload size.

These filters minimize server load and optimize API responses, enhancing both scalability and usability.

✨Paginating APIs

When handling large datasets, APIs can improve performance and scalability by implementing pagination. Instead of returning thousands of items in a single call, which can overload backends and slow down clients. Pagination breaks data into smaller chunks, reducing response times and making results more manageable.

Two common methods of pagination include:

  • Offset-Based Pagination: Allows clients to specify an offset and limit to fetch subsets of data. While simple, it can struggle with consistency in rapidly changing datasets.

  • Cursor-Based Pagination: Uses a cursor or token to mark the starting point of the next set of results. This method is more reliable for datasets that change frequently, as it avoids issues with skipped or duplicated items.

Each method has its pros and cons, and choosing the right one depends on your API’s use case.

🔔 Look out for an upcoming article where we'll dive deeper into the differences between these methods!

✨Rate-Limiting APIs

Rate-limiting is a system used to control the number of requests an API allows within a specified time period. It helps manage traffic, prevent overloads, and ensure the stability of the API. Here's why it's important:

  • Protect infrastructure: Prevent server overload and safeguard the system from attacks, like Denial of Service (DoS).

  • Protect product: Stop abuse, such as mass registrations, spam, or misuse of API features.

A few things to consider when developing your rate- limiting policy:

  • Global vs. per-endpoint limits

    You can apply a single global limit or have specific limits for high-traffic endpoints.

  • Rate limits by entity
    Apply limits based on user, app, or IP address (depending on the authentication method).

  • Allow traffic bursts
    Some APIs support brief traffic surges without exceeding the limit using algorithms like the token bucket.

  • Exceptions

    You don’t need to have a single rate-limiting policy or a single set of rate limits (or quota) that are applicable to all your developers. Trusted developers can request higher limits, but verify their use case and ensure your system can handle it.

Implementation Strategies
When building a rate-limiting system, it’s important to ensure that it doesn't negatively impact your API's response time. To maintain high performance and scalability, many API services use in-memory data stores, such as Redis or Memcached, to handle rate-limiting. These systems allow for fast read and write operations, making them ideal for tracking the number of API requests.

Common algorithms used to implement rate-limiting include:

  • Token bucket

  • Fixed-window counter

  • Sliding-window counter

🔔 We will dive deeper into these algorithms and explain them in detail in a future article.

✨Developer SDKs

A developer software development kit (SDK) is a powerful set of tools that allows developers to build applications on a specific platform. By offering SDKs, you're not only simplifying the integration process for developers but also guiding them toward following best practices when working with your API. When developers can work with your API more effectively, it encourages optimal usage patterns, which ultimately aids in scaling your platform.

In this section, we’ll explore key considerations to keep in mind when designing SDKs that can help scale your API.

Rate-Limiting Support

Your SDK should automatically handle rate limits by parsing rate-limit headers, adjusting request rates, and retrying after a 429 error based on the specified time.

Pagination Support

To avoid hitting rate limits, your SDK should manage paginated responses and gracefully handle errors, while limiting the number of pages fetched in a loop.

Using Gzip Compression

Using gzip compression reduces network bandwidth usage, improving performance and lowering costs despite additional CPU resources for compression.

Caching Frequently Used Data

Implement caching to reduce API calls, improving scalability. Ensure proper expiration and caching policies are in place.

Error Handling and Exponential Back-Off

Error handling is crucial but often overlooked by developers. To improve error handling in your SDK, it’s important to include checks for invalid requests before they hit your servers. For instance, your SDK can reject an API call locally if it’s missing a required parameter.

Your SDK should also support actions to take when an API request fails. Some failures, such as authorization errors, cannot be retried, so your SDK should surface appropriate errors for these. For other errors, your SDK should automatically retry the request.

To avoid overloading your servers, implement exponential back-off. This strategy involves retrying failed requests over increasing intervals, helping reduce the frequency of requests and allowing the system to recover during outages.

SDK Best Practices

Building SDKs that help scale your API requires attention to detail. Here are some best practices to consider:

  • Stability, security, and reliability. Bugs in your SDK can cause significant disruptions for developers, and updates may be challenging to deploy. Testing before releasing your SDK is essential.

  • For mobile SDKs, optimization is key. Focus on reducing size, memory usage, CPU usage, network interactions, and battery consumption.

  • Simplify complex operations, such as OAuth, in your SDK to streamline the onboarding experience for developers.

  • Handle rate limits and errors. Your SDK should be designed to protect your API servers from overload by avoiding too many concurrent calls.

  • Troubleshooting should be straightforward. Enable logging and surface clear error messages to assist developers in discovering issues.

  • The way you distribute your SDK impacts its adoption. Utilize platforms like npm, Composer, or pip to make it easy for developers to access your SDK.

Conclusion

Scaling an API isn’t just about handling more requests; it’s about optimizing efficiency. Identify scalability challenges, assess if API calls can be reduced, and improve design for better usage. Developer feedback is essential in achieving successful API scaling.

In Chapter 7, we will discuss how to approach change with an eye for consistency as well as how to ensure backward compatibility as your API grows.

References

Designing Web APIs: Building APIs That Developers Love

Thank you for reading, and I hope you found the insights helpful! 💖

578 views

Designing Web APIs

Part 2 of 7

This series summarizes the book Designing Web APIs, guiding readers to shift from API users to API providers. It covers best practices for creating effective APIs, focusing on design, implementation, and maintenance for building robust, scalable APIs

Up next

Designing Web APIs: Chapter 5 - Design In Practice

Now that we’ve covered API paradigms, security, and best practices, it’s time to put these concepts into action. This chapter focuses on designing APIs through a hands-on approach using an imaginary example, emphasizing user-focused design and practi...