This article was originally posted on my blog (original link). I'm reposting here in the hopes that it helps more people get started with Gatsby!
Migrating a blog to Gatsby
Abstract: Gatsby is a great tool for building a blog. In part 1 I did the more basic task of migrating an existing React site to Gatsby. This time I migrated my blog, which was a lot more involved and required a lot more Gatsby-specific knowledge.
Here's the gist of what I'm going to cover:
- Preparing an existing blog for migration
- Configuring Gatsby to handle markdown
- Querying your markdown files using GraphQL
- Adding custom data to the generated GraphQL schema
- Turning all your markdown files into static pages
Let's jump in.
Preparing your existing blog for migration
NOTE: If you don't already have a blog or want to create one from scratch there's a tutorial for exactly that right here.
Let's move some files around. Gatsby gives you a good amount of flexibility when it comes to file structure, but for consistency with the docs I'm going to use the suggested file structure for migrating my blog. How you handle this step will depend on what you're migrating from. I am migrating from Hexo, which is very similar to Jekyll in how it structures files.
Clean up your source repo
For the first step, clear everything other than your actual post content out of the repo. For me, this meant everything that wasn't under the
source/ directory (that's a Hexo convention). One way to do this is to take everything not relevant to the upcoming Gatsby blog and move it into its own directory that doesn't interfere with anything. I chose to create
hexo.bak/ where all my old blog files would live (except for the content).
You could also simply delete everything other than your raw content. It's up to you. But once you're done with this cleanup you should have made a decision on where to hold your content, and moved everything else away or removed it.
Here's what that looks like for me:
For the rest of this post I'll ignore the
hexo.bak/ directory because it's not relevant to Gatsby.
Set up Gatsby
You need to copy all the standard Gatsby boilerplate into your directory. There are many ways you could do this but I'll go over what I did.
To get all the Gatsby files you can use the Gatsby CLI.
However you get Gatsby initialized in your repository root, afterwards you should have a file structure that looks something like this:
Now run the Gatsby dev server to make sure everything works:
NOTE: If you open up
package.json you can see what the
develop script is doing.
Boom ?! The default site is up.
Rendering a list of posts
Let's customize that landing page to render a list of posts. You will also probably want to customize the header and overall layout.
Customizing the layout
This is pretty simple. Just modify the primary layout file that was generated:
You can also customize the styles in
src/layouts/index.css. Stylus, Sass, Less, etc are also supported if you add the appropriate plugin. Here's the list (there's a page on the website too, but the source is more up to date).
Sidenote: You can also create your own plugin to do whatever you want, which I talked about in part 1.
Customizing the landing page
Also straightforward, just edit:
This file is where we'll actually render out the list of posts. So where the hell does that data come from??
Querying data with GraphQL
Now we're getting in to the meat of Gatsby and one of the areas where it really shines: Data sources. You can pull in data from anywhere to be rendered in your blog, but for our use case the only data source will be the file system (aka the markdown files stored on your hard drive).
But first, let's check out [GraphiQL]. It's an excellent playground for testing out GraphQL queries in any GraphQL project. Gatsby ships with it enabled by default, thank goodness. GraphQL can actually be oddly opaque without this excellent tool.
http://localhost:8000/___graphql in the browser and you'll be greeted with this lovely dev tool:
I recommend getting to know this tool if you're not already familiar. You will be coming back to this often to find the right query to pull data for your pages.
Querying the file system
If you play around with GraphiQL you'll notice there's not that much there. Let's fix that. We need to teach Gatsby how to query the file system. Luckily this is so common it's been done for you. Install the file system source plugin:
gatsby-config.js to both use the plugin and tell it what directory to source files from. Add this to the
As you can see on my system I keep all my markdown files under
content/_posts/ which is reflected in the
path option for the plugin.
Now restart the dev server and open GraphiQL up again. You should have access to the
allFile root type. Try running this query:
This will list all the files in the directory you specified to the plugin. You can query all sorts of information about the files. Just investigate the fields available to you under
node in GraphQL.
Pro tip: Hit ctrlspace to trigger autocomplete and bring up the list of all available fields.
Being able to query files is a big win, and if you have a directory of HTML files this is all you will need. But if you want to render markdown files as HTML you will need another plugin. Let's add that now:
As before, add it to the
plugins field in
This particular plugin can also take its own plugins via the
plugins option. I've left it empty but this is where you can add things like syntax highlighting or auto-linking of headers. Here's the current list: https://www.npmjs.com/search?q=gatsby-remark
Save and restart your dev server, then go into GraphiQL and try out the new
This query gives you the full HTML for all your markdown files. If you are using frontmatter you can also access that here. I'm assuming you have a
title field in your frontmatter:
Now you have access to the full HTML of your posts as well as the titles. With this we have enough information to render a list of posts on the front page.
Getting GraphQL data into your components
Gatsby has the concept of the
pageQuery. For every page you create you can specify a
pageQuery that will pass data into the default export of that page.
This is a simplified example, but there are a few things going on that might not be intuitive.
- In the render method we first check for errors, and return early if any are found
- If no error are found we render a link for each item in the array:
- We export a
pageQueryvariable that is constructed using the magic
The error handling is pretty straightforward, if a bit verbose, as long as you know what graphql responses look like. In case you didn't know, if you get an error in a graphql query the response will contain the
errors array. We check for this array and handle it accordingly.
Now let's looks specifically at where we render a link for each blog post:
Notice that the data shape is exactly what we specified in the GraphQL query. This may seem like a lot of nesting just to get at an array of data, but GraphQL emphasizes clarity over conciseness. You'll notice that if you run your GraphQL query in GraphiQL the data will have the exact shape described above.
And that brings us finally to the page query:
This is how you get data from Gatsby into your react components. Make sure you don't misspell
pageQuery otherwise you won't get what you want.
Also note that
graphql is just some magic global variable. Your linter will probably complain about it being undefined and you will just have to ignore it. Personally I think it would be more clear if
graphql was imported from Gatsby, but the project is still young so the API could change at some point ¯\( ツ )/¯
Linking to blog posts
But the links don't link anywhere… where's that
Let's remedy that. Import the
Link component and swap it for the simple
<a> tag that was in there before:
But what does it link to? What is the URL of each blog post?
That's an open question because it depends on your data and how you structured it before. For example, if you included the intended URL in the frontmatter of each post it's a simple matter of updating your query to include that:
Many existing Gatsby examples use
path within each markdown file's frontmatter to designate the url. For example:
In this case
node.frontmatter.path would be used to construct URLs. If this is the case for you then you're probably all set for the index page.
But what if the URL for each post is NOT in the frontmatter?
This was exactly my situation. The URL was actually derived from the title of the post so I had to figure out how to augment the GraphQL fields with my own data. Namely the URL of the post derived from the post title.
Adding custom data to the GraphQL schema
Aside: Gatsby is super extensible. It's the primary reason I switched from Hexo which worked well enough for my use case.
In order to extend this particular part of Gatsby you need to create a
gatsby-node.js file. This file let's you work with all of Gatsby's plugin hooks that are run in node. The GraphQL server is run in node, so this is where we add custom fields. Example:
Source code for gatsby-node.js here.
If you've worked with GraphQL before this should look very familiar. In fact, as you can see the string type is imported directly from GraphQL and not from Gatsby.
You check the type of node and if it's a type you're interested in you resolve with some fields. Fields in GraphQL require a
type and a way to
resolve the value.
I've omitted the implementation of
getURL here, but you can see the source code here (NOTE: in the source it's called
getSlug instead of
You can use this technique to add any field you want to your GraphQL schema. Now you should be all set to render
Link components that actually point somewhere interesting 👍.
Generating pages from markdown files
This is where it all comes together. If you finished the last section you would have ended up with a bunch of links that point to the correct URL but when you tried visiting the URL there was nothing there 😕. This is because Gatsby hasn't yet generated an additional pages. It's still just rendering whatever is in your
By default, Gatsby will create a static HTML page for everything under
src/pages/. At this point we've discussed
src/pages/index.js extensively. It will be the
index.html page of your site, and thus your landing page.
pages/ directory and you are good to go. For example,
src/pages/about.js would generate an
about.html page. Simple.
But almost everyone will want to generate some pages based on data, not on the files in the pages directory. Gatsby let's us do this.
Generating custom pages
The key here is again to hook in to one of Gatsby's many plugin hooks. In this case,
createPages. In the same
gatsby-node.js file as before:
At the most basic level this method of page creation is quite simple: Grab the
createPage function from the API and call it with some props.
pathis required. This is the path that your page will have as a generated HTML file. It's the URL of your final page.
componentis also required. It's the file containing the react component that will be used to render this particular page.
contextis optional but I've included it here because it will be important soon. This lets you pass data down to the react component specified in the
componentoption as well as the
The API is actually pretty simple: To generate a new page call
createPage with some props. So in pseudo code:
I've included the pseudo code to highlight the fact that nothing too magical is going on here. We just need to call create page for every post we want to create. The implementation is a bit more verbose, but that's still all it's doing.
So in order to make this work we also need to be able to query GraphQL just like we do in the page query. Gatsby let's us do exactly that by giving us access to the
graphql object and letting us return a promise so that we can do async work.
Notice that the query is very similar to the
pageQuery in index.js but it's not identical. This time we actually want the
id because it will allow the post template to use the ID to query one single blog post.
Rendering individual posts
If you've made it to this point rendering individual posts is quite straightforward. You need to:
- Create the
postTemplatefile referenced in
- Export your template component as the default export
- Add a
pageQuerythat will fetch the blog post to render
Here it is in all it's glory:
If you're not used to GraphQL syntax the
pageQuery might be a little intimidating, but it's all standard GraphQL so if you take the time to learn GraphQL on its own you will be able to use that knowledge here. I.e. it is not Gatsby-specific.
The important thing to note here is that
$id is passed in via
gatsby-node.js. That's how the post data and processed HTML string make their way into props. Then it's just a matter of rendering as you would with any other component.
Where to go from here
Here are some ideas:
- Add previous and next buttons to each post
- Create a remark plugin to add custom block types
- Aggregate tags from your frontmatter and generate pages for all posts of a specific tag
Some of these—such as pagination—are implemented on my blog. You can find the source code here:
In my opinion Gatsby provides a few killer features:
- Extensible through a powerful plugin API.
- Supports arbitrary data sources that can be easily queried using GraphQL.
- Splits your code automatically so you don't have to worry about bundle size increasing as a function of the number of pages you render.
It's not a perfect project (looking at you global
graphql object) and it's still under heavy development, so you may run in to bugs, but in my view the pros heavily outweigh the cons. It's a best-in-class static site generator and well worth the adoption time if you want to customize your blog.
If anything was unclear or you have more questions feel free to ask me on Twitter.