Production Ready GraphQL

Part 2: Performance

Now that we've covered the basic structure, let's look at how to make it perform well at scale.

Nested Resolvers

We've already started to implement a nested resolver pattern in part 1.

The alternative is to return everything you need from the root resolver, but this quickly causes problems such as data over-fetching.

We also mentioned that the graph could be connected in exponentially complex ways, so returning everything you need from the root resolver would quickly become impractical.

Imagine a query that looks like this:

query {
  posts {
    id
    title
    comments {
      id
      text
      user {
        id
        name
      }
    }
  }
}

Here we would need to find all posts with all their comments and, for each comment, return the user who created it.

const posts = (root, args, context) => {
  return context.db.posts.find({
    include: {
      comments: {
        include: {
          user: true,
        },
      },
    },
  });
};

This approach does not scale. As the schema becomes more complex, the posts resolver grows more complex with it.

The posts resolver would also need to know how the comment and user nodes are related, which tightly couples the code and makes it less resilient to change.
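To make the over-fetching cost concrete, here is a small sketch that counts the rows an eager root resolver reads. The in-memory tables, the `rowsRead` counter, and the `eagerPosts` and `leanPosts` helpers are hypothetical stand-ins for `context.db`, not part of the service from part 1.

```typescript
// Hypothetical in-memory tables standing in for context.db.
type UserRow = { id: number; name: string };
type CommentRow = { id: number; text: string; postId: number; userId: number };
type PostRow = { id: number; title: string };

const users: UserRow[] = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
];
const comments: CommentRow[] = [
  { id: 1, text: "First", postId: 1, userId: 1 },
  { id: 2, text: "Second", postId: 1, userId: 2 },
  { id: 3, text: "Third", postId: 2, userId: 1 },
];
const posts: PostRow[] = [
  { id: 1, title: "Hello" },
  { id: 2, title: "World" },
];

// Counts every row touched so the cost of each strategy is visible.
let rowsRead = 0;

// Eager root resolver: always joins comments and users,
// whether or not the query selected them.
const eagerPosts = () =>
  posts.map((post) => {
    rowsRead += 1;
    const postComments = comments
      .filter((comment) => comment.postId === post.id)
      .map((comment) => {
        rowsRead += 2; // the comment row plus its user row
        return { ...comment, user: users.find((u) => u.id === comment.userId) };
      });
    return { ...post, comments: postComments };
  });

// Lean root resolver: returns only the post rows.
const leanPosts = () =>
  posts.map((post) => {
    rowsRead += 1;
    return post;
  });

// A query such as `{ posts { id title } }` only needs the post rows.
rowsRead = 0;
eagerPosts();
const eagerRows = rowsRead; // 2 posts + 3 comments + 3 users

rowsRead = 0;
leanPosts();
const leanRows = rowsRead; // 2 posts
```

Even with this tiny data set the eager resolver reads four times as many rows for a query that never asked for comments or users, and the gap widens with every relationship added to the include tree.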

By using nested resolvers, we can throw any query at our service and see it handled gracefully with no extra effort on our part. Each node remains more "pure" in the sense that it only implements logic directly related to itself.

The best approach here is to use nested resolvers that each resolve a relationship only one level deep.

// nodes/Post/resolvers.ts
const resolvers = {
  Post: {
    comments: (post, args, context) => {
      return context.db.comments.find({
        where: {
          postId: post.id,
        },
      });
    },
  },
};

// nodes/Comment/resolvers.ts
const resolvers = {
  Comment: {
    user: (comment, args, context) => {
      return context.db.users.findOne(comment.userId);
    },
  },
};

With this in place, it doesn't matter where in a query a node is requested. It will always resolve.

# Request all posts with all their comments and
# for each comment return the user who created it.
query {
  posts {
    id
    title
    comments {
      id
      text
      user {
        id
        name
      }
    }
  }
}

# Request all comments and the user who created each one.
query {
  comments {
    id
    text
    user {
      id
      name
    }
  }
}
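To illustrate why the same `Comment.user` resolver serves both queries above, here is a heavily simplified sketch of how an executor might walk a selection. The `execute` function, the `Selection` shape, and the in-memory `db` are assumptions made for this example. A real GraphQL server works from a parsed query and a typed schema, but the lookup by parent type and field name is the part that matters here.

```typescript
// Hypothetical in-memory rows standing in for context.db.
const db = {
  posts: [{ id: 1, title: "Hello" }],
  comments: [{ id: 10, text: "Nice post", postId: 1, userId: 7 }],
  users: [{ id: 7, name: "Ada" }],
};

type Row = Record<string, unknown>;
// A resolved field: the type name of what it returns plus the data itself.
type Resolved = { type: string; value: Row | Row[] | undefined };
type Resolver = (parent: Row) => Resolved;
// A selection is a scalar field name or [fieldName, nestedSelections].
type Selection = string | [string, Selection[]];

// One resolver per relationship, each only one level deep.
const resolvers: Record<string, Record<string, Resolver>> = {
  Query: {
    posts: () => ({ type: "Post", value: db.posts }),
    comments: () => ({ type: "Comment", value: db.comments }),
  },
  Post: {
    comments: (post) => ({
      type: "Comment",
      value: db.comments.filter((c) => c.postId === post.id),
    }),
  },
  Comment: {
    user: (comment) => ({
      type: "User",
      value: db.users.find((u) => u.id === comment.userId),
    }),
  },
};

// Walk the selections, looking resolvers up by the parent's type name only.
const execute = (typeName: string, parent: Row, selections: Selection[]): Row => {
  const result: Row = {};
  for (const selection of selections) {
    if (typeof selection === "string") {
      result[selection] = parent[selection]; // scalar: copy from the parent row
      continue;
    }
    const [field, nested] = selection;
    const { type, value } = resolvers[typeName][field](parent);
    result[field] = Array.isArray(value)
      ? value.map((row) => execute(type, row, nested))
      : value && execute(type, value, nested);
  }
  return result;
};

const userFields: Selection = ["user", ["id", "name"]];

// posts -> comments -> user
const viaPosts = execute("Query", {}, [
  ["posts", ["id", ["comments", ["id", userFields]]]],
]);
// comments -> user
const viaComments = execute("Query", {}, [["comments", ["id", userFields]]]);
```

Both calls reach the user through the same `Comment.user` entry, because the walk never looks at the path that led to a comment, only at its type.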

N+1 Problem

Nested resolvers can introduce a problem of their own.

The N+1 problem occurs when you loop through the results of a query and perform one additional query per result, resulting in n queries plus the original one (n+1). This is a common problem with ORMs, particularly in combination with GraphQL, because it is not always immediately obvious that your code is generating inefficient queries.

Imagine we want to make the following query again:

query {
  posts {
    id
    title
    comments {
      id
      text
      user {
        id
        name
      }
    }
  }
}

Resolved naively, every invocation of a resolver would make its own call to our database.

First, select all posts:

SELECT id, title FROM posts;
-- Returns 3 posts with ids 1, 2, 3

For each post, run a separate query for its comments:

SELECT id, text, userId FROM comments WHERE postId = 1;
SELECT id, text, userId FROM comments WHERE postId = 2;
SELECT id, text, userId FROM comments WHERE postId = 3;

For each comment, run a separate query for the user who created it. The same user may be fetched several times if they wrote multiple comments:

SELECT id, name FROM users WHERE id = 1;
SELECT id, name FROM users WHERE id = 1;
SELECT id, name FROM users WHERE id = 1;
SELECT id, name FROM users WHERE id = 2;
SELECT id, name FROM users WHERE id = 2;
SELECT id, name FROM users WHERE id = 2;
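The count can be reproduced with a small sketch. The row data, the `queryLog` array, and the `find*` helpers below are hypothetical stand-ins for the database calls, chosen to match the walkthrough above (3 posts, 2 comments per post, 2 distinct users).

```typescript
// Hypothetical rows matching the walkthrough above.
const postRows = [{ id: 1 }, { id: 2 }, { id: 3 }];
const commentRows = [
  { id: 1, postId: 1, userId: 1 },
  { id: 2, postId: 1, userId: 1 },
  { id: 3, postId: 2, userId: 1 },
  { id: 4, postId: 2, userId: 2 },
  { id: 5, postId: 3, userId: 2 },
  { id: 6, postId: 3, userId: 2 },
];
const userRows = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
];

// Every call to a find* helper counts as one database round trip.
const queryLog: string[] = [];

const findPosts = () => {
  queryLog.push("posts");
  return postRows;
};
const findCommentsByPost = (postId: number) => {
  queryLog.push(`comments WHERE postId = ${postId}`);
  return commentRows.filter((c) => c.postId === postId);
};
const findUser = (userId: number) => {
  queryLog.push(`users WHERE id = ${userId}`);
  return userRows.find((u) => u.id === userId);
};

// Resolve the nested query the naive way: one call per parent row.
const resolved = findPosts().map((post) => ({
  ...post,
  comments: findCommentsByPost(post.id).map((comment) => ({
    ...comment,
    user: findUser(comment.userId),
  })),
}));

const userQueries = queryLog.filter((q) => q.startsWith("users"));
console.log(queryLog.length); // 1 + 3 + 6 = 10 round trips
console.log(new Set(userQueries).size); // only 2 distinct user lookups
```

Six of the ten round trips are user lookups, yet only two of them are distinct. That redundancy is what batching and deduplication remove.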

The number of round trips to the database will grow rapidly, but this can be avoided using dataloaders. Some ORMs, such as Prisma, already implement a dataloader pattern as part of their query optimization.

A dataloader lets you batch multiple lookups together and perform only one query per batch.

Taking a look at our user resolver, we could use a dataloader instead.

const resolvers = {
  Comment: {
    user: (comment, args, context) => {
      // Direct call to database (Inefficient)
      // return context.db.users.findOne(comment.userId);

      // Using a dataloader (Efficient)
      return userDataloader.load(comment.userId);
    },
  },
};

The implementation of this dataloader would look something like:

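import DataLoader from "dataloader";

// `db` is assumed to be the same database client that is exposed as context.db.
const batchUserFunction = async (ids) => {
  const users = await db.users.findAll({
    whereIn: ids,
  });
  // DataLoader expects one result per id, in the same order as the ids.
  const usersById = new Map(users.map((user) => [user.id, user]));
  return ids.map((id) => usersById.get(id) ?? new Error(`User ${id} not found`));
};

export default new DataLoader(batchUserFunction);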

This translates to a single database query for all users. The dataloader deduplicates the requested ids and returns the correct user for each call, provided the batch function returns its results in the same order as the ids it receives.

SELECT id, name FROM users WHERE id IN (1, 2);
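To show roughly how the batching and deduplication can work, here is a deliberately tiny stand-in for a dataloader. It is a sketch of the idea only, not the implementation of the dataloader package, and `createLoader`, `userTable`, and `batchCalls` are names invented for this example.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;
type Pending<K, V> = {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
};

// Simplified loader: collects keys requested in the same tick and
// hands them to batchFn in a single call.
const createLoader = <K, V>(batchFn: BatchFn<K, V>) => {
  const cache = new Map<K, Promise<V>>();
  let queue: Pending<K, V>[] = [];

  const flush = async () => {
    const batch = queue;
    queue = [];
    try {
      // batchFn must return values in the same order as the keys it was given.
      const values = await batchFn(batch.map((item) => item.key));
      batch.forEach((item, index) => item.resolve(values[index]));
    } catch (error) {
      batch.forEach((item) => item.reject(error));
    }
  };

  return {
    load: (key: K): Promise<V> => {
      const cached = cache.get(key);
      if (cached) return cached; // duplicate key: reuse the pending promise
      const promise = new Promise<V>((resolve, reject) => {
        // First key of a new batch: flush after the current tick's calls are queued.
        if (queue.length === 0) Promise.resolve().then(flush);
        queue.push({ key, resolve, reject });
      });
      cache.set(key, promise);
      return promise;
    },
  };
};

// Hypothetical user table and a batch function that records each call.
const userTable = [
  { id: 1, name: "Ada" },
  { id: 2, name: "Grace" },
];
const batchCalls: number[][] = [];
const userLoader = createLoader(async (ids: number[]) => {
  batchCalls.push(ids); // stands in for: SELECT ... WHERE id IN (ids)
  return ids.map((id) => userTable.find((user) => user.id === id));
});

// Six loads in the same tick, as the six Comment.user resolvers would issue.
const loaded = Promise.all([1, 1, 1, 2, 2, 2].map((id) => userLoader.load(id)));
```

The six `load` calls collapse into one batch containing the ids 1 and 2. Note that the cache in this sketch is never cleared. The dataloader documentation recommends creating loaders per request so that cached values are not shared between requests.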

Read more about dataloaders: https://github.com/graphql/dataloader

With a combination of nested resolvers and dataloaders, we can avoid data over-fetching and ensure we don't make too many round trips to the underlying data source.