Part 5: Incremental Builds
The term “Incremental Builds” is not specific to Gatsby; it’s a concept rather than a specific technology or implementation. Generally speaking, it means that subsequent runs of an action only change data incrementally instead of redoing the full action each time. This is often achieved by populating a cache on the first run, listening for changes, and then updating only what has changed.
Gatsby supports Incremental Builds to varying degrees and in different places:
- The open source framework incrementally builds various steps in its build pipeline without requiring any special setup from users. These optimizations are a more generic approach and work with every site, regardless of its use case or sources.
- Oftentimes a large portion of the build is spent on sourcing data from remote APIs through Gatsby source plugins. Source plugin authors can write their plugin in such a way that on subsequent runs data is only sourced incrementally. This behavior is called Delta updates in this part of the tutorial. This optimization can further be supercharged if the source plugin is approved for Cloud Builds on Gatsby Cloud.
- Source plugins can also create/update/delete GraphQL nodes in an incremental fashion instead of recreating them all on every run (using Node APIs such as createNode and deleteNode). This functionality is directly coupled with Delta updates.
By the end of this part of the tutorial, you will be able to:
- Use Node APIs to optimize node creation
- Describe in more detail how delta updates work
Before diving into the details, here’s a high-level overview of how Incremental Builds work:
On the left side you have your API (CMS, Database, etc.) and on the right side the Gatsby source plugin.
During the first run of the source plugin a couple of things happen:
- All available data from the API is requested and sourced
- Once sourcing is done, a sync token (e.g. a token returned by the API or a generated timestamp) is saved. It’ll be used on subsequent runs.
- All GraphQL nodes are created
Then the data behind the API changes (e.g. a content editor fixed a typo in the CMS) and a new build is kicked off. During the second run of the source plugin, only this change is processed:
- The sync token is sent along with the request to the API
- The API only returns the changed data, not the full data
- A new sync token is saved
- According to the delta update, new nodes are created and nodes referencing old data are removed
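The two runs described above can be sketched as a small simulation. This is only an illustration under stated assumptions: FakeApi, its change log, and the numeric sync token are hypothetical stand-ins for a remote API, and the Map stands in for Gatsby's data layer.

```typescript
// Hypothetical in-memory API that records every change in an append-only log.
// The sync token is simply the log position the client has already seen.
type Change =
  | { type: "upsert"; id: string; title: string }
  | { type: "delete"; id: string }

interface SyncResponse {
  changes: Change[]
  nextSyncToken: number
}

class FakeApi {
  private log: Change[] = []

  upsert(id: string, title: string): void {
    this.log.push({ type: "upsert", id, title })
  }

  remove(id: string): void {
    this.log.push({ type: "delete", id })
  }

  // Without a token everything is returned (first run);
  // with a token only the changes recorded after it (delta update).
  sync(syncToken = 0): SyncResponse {
    return { changes: this.log.slice(syncToken), nextSyncToken: this.log.length }
  }
}

// Plugin side: a local copy of the data, updated from the returned changes.
function applyChanges(store: Map<string, string>, changes: Change[]): void {
  for (const change of changes) {
    if (change.type === "upsert") store.set(change.id, change.title)
    else store.delete(change.id)
  }
}

const api = new FakeApi()
api.upsert("post-1", "Helo World")
api.upsert("post-2", "Second post")

const store = new Map<string, string>()

// First run: full sync, then remember the token.
const firstRun = api.sync()
applyChanges(store, firstRun.changes)

// A content editor fixes a typo and removes a post.
api.upsert("post-1", "Hello World")
api.remove("post-2")

// Second run: only the two new changes come back.
const secondRun = api.sync(firstRun.nextSyncToken)
applyChanges(store, secondRun.changes)
```

The second call transfers two log entries instead of the whole data set, which is where the time saving comes from.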
Here’s a quick analogy with a tool you might use daily:
- You’re using git clone to copy a git repository to your local machine. You’re fetching all information and all commits. (First run in the diagram above)
- After a while you come back to your local clone and see that other contributors have since pushed commits to the upstream repository. You use git pull to fetch the latest information. Instead of fetching all information and commits again, git only fetches the new commits. (Second run in the diagram above)
As explained in the introduction, a source plugin can have a great impact on the time it takes for a user to see their changes applied. For a source plugin to support Incremental Builds both the API and source plugin have to support the concept of delta updates (which you learned about in Part 4).
Ideally, the delta updates from your API contain all CRUD (Create/Update/Delete) actions that occurred since the last update.
The steps below outline the order in which you should structure your sourceNodes API so that data is sourced in an ideal way.
Please note: Currently this tutorial’s example API doesn’t support delta updates. Thus you’ll see code examples from gatsby-source-contentful to illustrate the explanations where necessary. Please let us know through the “Was this doc helpful to you?” form at the bottom of this page if you wish to see the example API directly support delta updates.
Gatsby aggressively garbage collects nodes between runs. This means that nodes that were created in the previous run but are not created in the current run will be deleted. You can tell Gatsby to keep old but still valid nodes around by “touching” them. For this you need to use the touchNode API.
Garbage collection: In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage.
The touchNode API is primarily useful for source plugins fetching nodes from a remote system that can return only nodes that have been updated. The source plugin then touches all the nodes that haven’t been updated but still exist so Gatsby knows to keep them. Since Gatsby only tries to delete stale nodes on the initial run, you need to “touch” them only once.
This technique can be shown in your source plugin, so open the plugin/src/source-nodes.ts file and follow the steps:
- Create an isFirstSource boolean variable above the sourceNodes function. After the first run you’ll set it to false, so on subsequent runs the code path it guards won’t get called again.
- Make getNodes and the touchNode action available inside sourceNodes by destructuring them from gatsbyApi (touchNode lives on gatsbyApi.actions).
- Inside sourceNodes, add a block that only runs while isFirstSource is true.
- Inside the isFirstSource block, use the getNodes function to get an array of nodes back (it returns all nodes that are currently in Gatsby’s data layer). Filter out any nodes that are not from your source plugin and then “touch” the rest of them.
plugin is the name of your source plugin for this tutorial. Adjust it to the actual name once you write your actual source plugin.
With the code complete, your source plugin won’t discard GraphQL nodes that should still exist on subsequent runs.
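As a minimal sketch of the touchNode technique described above: the NodeLike and GatsbyApiLike interfaces are simplified stand-ins written for this example so it runs on its own (a real plugin would use the types shipped with the gatsby package and export sourceNodes), and the owner check assumes the plugin is named plugin as in this tutorial.

```typescript
// Minimal stand-ins for the parts of Gatsby's API used here. In a real plugin
// these types come from the "gatsby" package.
interface NodeLike {
  id: string
  internal: { type: string; owner: string }
}

interface GatsbyApiLike {
  getNodes: () => NodeLike[]
  actions: { touchNode: (node: NodeLike) => void }
}

// Module-level flag: true only until sourceNodes has run once in this process.
let isFirstSource = true

function sourceNodes(gatsbyApi: GatsbyApiLike): void {
  const { getNodes, actions } = gatsbyApi
  const { touchNode } = actions

  if (isFirstSource) {
    // Keep every node this plugin created in an earlier run.
    // "plugin" is this tutorial's plugin name; use your own plugin's name.
    getNodes()
      .filter((node) => node.internal.owner === "plugin")
      .forEach((node) => touchNode(node))

    isFirstSource = false
  }

  // ...fetch the delta update and create/delete nodes here
}
```

Keeping the flag at module level (outside the function) is what lets it survive between calls of sourceNodes within the same process.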
The details of this step completely depend on the remote API you’re working with. Visit the API’s documentation to check if it supports delta updates and how you can use them.
Regardless of what kind of sync token you need to save (be it a timestamp, a token, or something else), you’ve already learned in Part 4 how you can use the cache API to save and use that token.
Use that sync token on subsequent calls to the API to only receive delta updates.
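A minimal sketch of that round trip, assuming a cache object with async get and set methods like the one Gatsby passes to plugins. The CacheLike interface, the DeltaResponse shape, the fetchDelta callback, and the sync-token key name are all hypothetical and only exist to keep the example self-contained.

```typescript
// Stand-in for Gatsby's cache helper, which exposes async get/set methods.
interface CacheLike {
  get: (key: string) => Promise<unknown>
  set: (key: string, value: unknown) => Promise<unknown>
}

// Hypothetical shape of what a delta-capable API hands back.
interface DeltaResponse<T> {
  items: T[]
  nextSyncToken: string
}

// Hypothetical key name; any string unique to your plugin works.
const SYNC_TOKEN_KEY = "sync-token"

async function fetchWithSyncToken<T>(
  cache: CacheLike,
  fetchDelta: (syncToken?: string) => Promise<DeltaResponse<T>>
): Promise<T[]> {
  // undefined on the very first run, so the API returns everything.
  const stored = await cache.get(SYNC_TOKEN_KEY)
  const syncToken = typeof stored === "string" ? stored : undefined

  const { items, nextSyncToken } = await fetchDelta(syncToken)

  // Persist the new token only after the request succeeded.
  await cache.set(SYNC_TOKEN_KEY, nextSyncToken)
  return items
}
```

Saving the token after the fetch means a failed request leaves the old token in place, so the next run asks for the same delta again instead of silently skipping changes.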
First, a unique sourceId string is generated from values that are set in the plugin’s options, including environment. The sourceId is then used when storing and looking up the sync token.
The next part might be unfamiliar to you since the tutorial hasn’t mentioned it so far: the setPluginStatus API. In Part 4 you learned how to use the cache API to save a sync token. You can also use the setPluginStatus API to achieve the same end result. While the Contentful example uses setPluginStatus, we’d recommend using the cache API instead due to its simpler usage.
Regardless, in the end a syncToken is saved inside a variable. This syncToken is passed to the fetchContent utility function, and what comes back is currentSyncData, the delta update.
The fetchContent utility function itself uses Contentful’s Sync API to get delta updates.
As mentioned in the beginning, what this fetchContent utility function would look like for you really depends on your API. (Note: this tutorial has its own equivalent utility function for fetching data.)
The details of this step completely depend on the remote API you’re working with. Visit the API’s documentation to check what data (and in which shape) you’re getting back on delta updates.
Let’s assume the data you’ve got back groups the changes by operation: created, updated, and deleted items.
You then can use different Node APIs for each of those CRUD operations:
- Create: You can use the createNode API to create new GraphQL nodes.
- Update: Also call createNode to update nodes.
- Delete: Use the deleteNode API to delete nodes.
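The mapping from CRUD operations to Node APIs listed above can be sketched as follows. The Delta and Post shapes and the Post node type are hypothetical (your API decides what a delta update really looks like), and the GatsbyApiLike interface is a simplified stand-in for the helpers Gatsby passes to sourceNodes so the example can run on its own.

```typescript
// Minimal stand-ins for the Gatsby helpers used below; in a real plugin they
// are provided on the argument passed to sourceNodes.
interface NodeInput {
  id: string
  internal: { type: string; contentDigest: string }
  [field: string]: unknown
}

interface GatsbyApiLike {
  actions: {
    createNode: (node: NodeInput) => void
    deleteNode: (node: NodeInput) => void
  }
  getNode: (id: string) => NodeInput | undefined
  createNodeId: (input: string) => string
  createContentDigest: (input: unknown) => string
}

// Hypothetical delta shape; your API decides what it really looks like.
interface Post {
  id: string
  title: string
}

interface Delta {
  created: Post[]
  updated: Post[]
  deleted: { id: string }[]
}

function applyDelta(gatsbyApi: GatsbyApiLike, delta: Delta): void {
  const { actions, getNode, createNodeId, createContentDigest } = gatsbyApi

  // Create and update both go through createNode: calling it again with the
  // same node id replaces the existing node.
  for (const post of [...delta.created, ...delta.updated]) {
    actions.createNode({
      ...post,
      id: createNodeId(`Post-${post.id}`),
      internal: { type: "Post", contentDigest: createContentDigest(post) },
    })
  }

  // deleteNode expects the node object, so look it up by its Gatsby id first.
  for (const { id } of delta.deleted) {
    const node = getNode(createNodeId(`Post-${id}`))
    if (node) actions.deleteNode(node)
  }
}
```

Deriving the node id from the remote id with the same createNodeId input in both loops is what lets an update or delete find the node that an earlier run created.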
Good job! You’ve completed one of the more theoretical parts of this tutorial.
Take a moment to think back on what you’ve learned so far. Challenge yourself to answer the following questions from memory:
- What are Incremental Builds and how do they function in the context of Gatsby source plugins?
- What is the use case for the touchNode API?
- Which Node APIs can you use to create/update/delete GraphQL nodes?
Key takeaways
- Source plugin authors can write their plugin in such a way that on subsequent runs data is only sourced incrementally through delta updates.
- Both your remote API and source plugin have to support delta updates.
- You can use the touchNode API to tell Gatsby to keep existing GraphQL nodes around during garbage collection.
- Most APIs use a sync token that you need to send along with the request to the API to receive a delta update.
- You can use createNode and deleteNode to create/update/delete GraphQL nodes according to the delta update.
Share Your Feedback!
Our goal is for this tutorial to be helpful and easy to follow. We’d love to hear your feedback about what you liked or didn’t like about this part of the tutorial.
Use the “Was this doc helpful to you?” form at the bottom of this page to let us know what worked well and what we can improve.
In Part 6 you’ll learn how Image CDN can greatly improve image processing in your source plugin.