API optimization using SQL (n+1) optimization

SebinSebin

In the ever-evolving world of technology, speed and efficiency are key pillars in providing a seamless user experience. When it comes to processing time-series data through an Express endpoint, the initial setup can be the difference between lightning-fast results and frustrating delays. In this blog post, we embark on a journey to uncover the challenges faced by an Express endpoint tasked with handling vast amounts of time-series data, and how we transformed it into a high-performance system.

The Original Approach:

Our story begins with an Express endpoint tasked with querying extensive time-series data spanning over seven days. The endpoint needed to serve multiple users simultaneously, but there was a problem – the processing time was far from optimal, often taking several seconds to complete. To understand the root of the issue, let's delve into the original approach.

Step 1: Querying Tenant-Specific Customer Data

The first step involved querying the database to obtain all customers associated with a particular tenant. The query looked like this:

SELECT * FROM customer WHERE tenant_id='some-uuid'

The response provided an array of customer names, such as ["Cust1", "Cust2", ..., "Cust n"].

Step 2: Iterating Through Customers

Next, a for loop was used to iterate through each customer in the array. For each customer, another query was executed to retrieve devices related to that customer:

SELECT * FROM device WHERE customer='custId'`

The response yielded an array of device names, such as ["device1", "device2", ..., "device n"].

Step 3: Iterating Through Devices

Continuing down the rabbit hole, another for loop was employed to iterate through each device in the array. For each device, yet another query was executed to retrieve data for that device:

SELECT * FROM device_attribute WHERE deviceId='deviceId'

Finally, the code would look something like this:

const customerQuery = `SELECT * FROM customer WHERE tenant_id='${tenantId}'`;
const customers = await database.query(customerQuery);
for (const customer of customers) {
  const customerId = customer.id;
  const deviceQuery = `SELECT * FROM device WHERE customer='${customerId}'`;
  const devices = await database.query(deviceQuery);
  for (const device of devices) {
    const deviceId = device.id;
    const dataQuery = `SELECT * FROM device_attribute WHERE deviceId='${deviceId}'`;
    const deviceAttributes = await database.query(dataQuery);
  }
}

Challenges:

This approach presented several challenges that hindered performance optimization:

  1. Multiple Database Queries: The approach required multiple database queries, causing a significant overhead in terms of database interactions.

  2. Nested Loops: The nested loop structure compounded the problem, as it added complexity and slowed down the processing.

Optimizing the Endpoint:

To enhance the Express endpoint's performance, a more efficient strategy was employed, which involved the following key changes.

Change 1: Use the IN Clause

In the optimized code snippet, we first retrieve the customer IDs and use the IN clause to query devices for those customers. This approach minimizes the number of queries executed, improving efficiency and potentially reducing database load. Here's how it looks:

const customerQuery = `SELECT * FROM customer WHERE tenant_id='${tenantId}'`;
const customers = await database.query(customerQuery);
const customerIds = customers.map(customer => customer.id);
const deviceQuery = `SELECT * FROM device WHERE customer IN (${customerIds.join(', ')})`;
const devices = await database.query(deviceQuery);
for (const device of devices) {
  const deviceId = device.id;
  const dataQuery = `SELECT * FROM device_attribute WHERE deviceId='${deviceId}'`;
  const deviceAttributes = await database.query(dataQuery);
}

Using the IN clause can be faster because:

  • It reduces the number of round-trips between your application and the database, which can be a significant performance improvement, especially when dealing with a large number of IDs.

  • Databases are optimized to efficiently process IN clauses, which allows them to perform the operation in a more optimized manner.

  • It simplifies your code by allowing you to consolidate the queries into one, making it easier to manage and read.

Change 2: Using JOINs

By using SQL JOINs, you can retrieve all the necessary data in a single query, which can significantly reduce the number of database interactions and potentially improve query performance. This approach is generally more efficient and optimized for database operations. Here's an example:

const query = `
  SELECT c.*, d.*, da.*
  FROM customer AS c
  JOIN device AS d ON c.id = d.customer
  JOIN device_attribute AS da ON d.id = da.deviceId
  WHERE c.tenant_id = '${tenantId}'
`;
const result = await database.query(query);``

Conclusion:

By streamlining the data retrieval process, reducing database queries, and optimizing the overall workflow, the Express endpoint was transformed into a much faster and more responsive system. This optimization not only improved the user experience but also reduced the strain on the database, leading to better overall system performance.