gatsby-transformer-rehype


This is an HTML-to-HTML transformer. It parses HTML files and GraphQL HTML nodes using rehype. This package is heavily inspired by gatsby-transformer-remark, the difference being that the content source is HTML instead of Markdown.

The general idea of this package is to convert an input HTML fragment into a HAST syntax tree, which is stored in the HtmlAst object. HtmlAst is passed down to all plugins provided in the options. Plugins are allowed to mutate HtmlAst and thereby apply the requested transformations to the original HTML. Finally, gatsby-transformer-rehype serializes HtmlAst back to regular HTML.
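The flow described above can be illustrated with a small, dependency-free sketch. It is not the package's implementation: the tree is written by hand instead of being parsed by rehype, and `addTargetToLinks` and `toHtml` are hypothetical helpers invented for this illustration.

```javascript
// A hand-built HAST fragment for:
// <p>Hello <a href="http://example.com">world</a></p>
const htmlAst = {
  type: "root",
  children: [
    {
      type: "element",
      tagName: "p",
      properties: {},
      children: [
        { type: "text", value: "Hello " },
        {
          type: "element",
          tagName: "a",
          properties: { href: "http://example.com" },
          children: [{ type: "text", value: "world" }],
        },
      ],
    },
  ],
};

// Hypothetical plugin: mutates the tree in place by adding
// target="_blank" to every link, then returns the same tree.
function addTargetToLinks(node) {
  if (node.type === "element" && node.tagName === "a") {
    node.properties.target = "_blank";
  }
  (node.children || []).forEach(addTargetToLinks);
  return node;
}

// Toy serializer standing in for rehype's stringify step
// (no escaping, no void elements; for illustration only).
function toHtml(node) {
  if (node.type === "text") return node.value;
  const inner = (node.children || []).map(toHtml).join("");
  if (node.type === "root") return inner;
  const attrs = Object.entries(node.properties || {})
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  return `<${node.tagName}${attrs}>${inner}</${node.tagName}>`;
}

// Run every plugin in order, then turn the tree back into HTML.
const plugins = [addTargetToLinks];
const transformed = plugins.reduce((ast, plugin) => plugin(ast), htmlAst);
const html = toHtml(transformed);
```

Because each plugin receives and returns the same tree, the order of the plugins array decides which transformations later plugins get to see.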

This plugin also creates a tableOfContents field. The table of contents is generated by analyzing all headings from level one to six within the given HTML blob.

Install

yarn add gatsby-transformer-rehype

How to use

// In your gatsby-config.js
plugins: [
  {
    resolve: `gatsby-transformer-rehype`,
    options: {
      // Condition for selecting an existing GraphQL node (optional)
      // If not set, the transformer operates on file nodes.
      filter: node => node.internal.type === `GhostPost`,
      // Only needed when using filter (optional, default: node.html)
      // Source location of the html to be transformed
      source: node => node.html,
      // Additional fields of the sourced node can be added here (optional)
      // These fields are then available on the htmlNode on `htmlNode.context`
      contextFields: [],
      // Fragment mode (optional, default: true)
      fragment: true,
      // Space mode (optional, default: `html`)
      space: `html`,
      // EmitParseErrors mode (optional, default: false)
      emitParseErrors: false,
      // Verbose mode (optional, default: false)
      verbose: false,
      // Plugins configs (optional but most likely you need one)
      plugins: [],
    },
  },
],

The filter option lets you transform HTML that lives on other GraphQL nodes. Together with the source option, you can also point the transformer at a different location for the source HTML. If your HTML is sourced from files, the file nodes must have the mediaType text/html.
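The selection rules above can be sketched as a plain function. This is an assumption-laden illustration, not the package's code: `shouldTransform` is a hypothetical helper, and the node objects are simplified stand-ins for real Gatsby nodes.

```javascript
// Hypothetical helper: decides whether a node would be picked up.
// With a filter, the filter alone decides. Without one, only file
// nodes whose media type is text/html qualify.
function shouldTransform(node, { filter } = {}) {
  if (typeof filter === "function") return Boolean(filter(node));
  return node.internal.mediaType === "text/html";
}

const options = {
  filter: (node) => node.internal.type === "GhostPost",
  source: (node) => node.html,
};

// Simplified stand-ins for Gatsby nodes.
const ghostPost = { internal: { type: "GhostPost" }, html: "<h1>Hi</h1>" };
const htmlFile = { internal: { type: "File", mediaType: "text/html" } };
const markdownFile = { internal: { type: "File", mediaType: "text/markdown" } };

// With the filter set, only the GhostPost node is selected,
// and source tells the transformer where its HTML lives.
const selected = [ghostPost, htmlFile, markdownFile].filter((node) =>
  shouldTransform(node, options)
);
const sources = selected.map((node) => options.source(node));
```

Calling `shouldTransform(htmlFile)` without options falls back to the media type check, which is why HTML files need no filter.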

The following parts of options are passed down to rehype as options:

  • options.fragment
  • options.space
  • options.emitParseErrors
  • options.verbose

The details of the rehype options above can be found in the rehype-parse documentation.

This transformer is most useful when combined with Gatsby rehype plugins, which you can install to customize how HTML is transformed. The following gatsby-rehype-* plugins are currently available:

If you are missing a plugin, consider collaborating with me to contribute your own. Writing plugins for gatsby-transformer-rehype is easy!

Parsing algorithm

Each HTML file or HTML GraphQL node is parsed into a node of type HtmlRehype.

This plugin adds fields to the HtmlRehype GraphQL node, including html, htmlAst and internal.content. internal.content holds the source HTML, while html holds the transformed HTML. All transformations should be made on htmlAst, which is passed to all sub-plugins. Other Gatsby plugins can add further fields.

Table of Contents

Data for the table of contents is generated from the HTML blob by visiting all h1, h2, …, h6 elements. The result is stored in the tableOfContents field, a JSON array with the following structure:

[
  {
    id: String,
    heading: String,
    items: [],
  },
]

The id field contains the heading id as given in the h1, h2, …, h6 elements. The heading field contains the heading title.

The array represents a nested tree, where first level headings are in the root array. If a heading has subheadings they will be placed in items. The elements of items replicate the same structure. The maximum level of nesting is 6.

The generateTableOfContents function is passed to all sub-plugins for on-demand creation. This function takes htmlAst as an argument and returns the tableOfContents tree structure.
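The nesting rules of this section can be sketched as follows. This is a minimal illustration and not the package's generateTableOfContents: `textOf`, `collectHeadings`, `nestHeadings` and `sketchTableOfContents` are hypothetical names. It also assumes that a heading deeper than its predecessor becomes that predecessor's child, and that a heading with no shallower predecessor lands in the root array.

```javascript
// Concatenate the text content of a HAST node.
function textOf(node) {
  if (node.type === "text") return node.value;
  return (node.children || []).map(textOf).join("");
}

// Collect h1..h6 elements in document order as { id, heading, level }.
function collectHeadings(node, found = []) {
  const match = node.type === "element" && /^h([1-6])$/.exec(node.tagName);
  if (match) {
    found.push({
      id: (node.properties && node.properties.id) || "",
      heading: textOf(node),
      level: Number(match[1]),
    });
  }
  (node.children || []).forEach((child) => collectHeadings(child, found));
  return found;
}

// Turn the flat list into the nested { id, heading, items } structure.
function nestHeadings(headings) {
  const root = [];
  const stack = []; // currently open headings, each deeper than the previous
  for (const { id, heading, level } of headings) {
    const entry = { id, heading, items: [] };
    // Close every open heading that is not shallower than this one.
    while (stack.length > 0 && stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    const siblings =
      stack.length > 0 ? stack[stack.length - 1].entry.items : root;
    siblings.push(entry);
    stack.push({ level, entry });
  }
  return root;
}

function sketchTableOfContents(htmlAst) {
  return nestHeadings(collectHeadings(htmlAst));
}

// Small hand-built tree: two h1 headings, the first with two h2 children.
const h = (level, id, text) => ({
  type: "element",
  tagName: `h${level}`,
  properties: { id },
  children: [{ type: "text", value: text }],
});
const exampleAst = {
  type: "root",
  children: [
    h(1, "intro", "Intro"),
    h(2, "setup", "Setup"),
    h(2, "usage", "Usage"),
    h(1, "faq", "FAQ"),
  ],
};
const toc = sketchTableOfContents(exampleAst);
```

Since only six heading levels exist, the stack never holds more than six entries, which matches the maximum nesting depth stated above.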

How to query

A sample GraphQL query to get HtmlRehype nodes:

{
  allHtmlRehype {
    edges {
      node {
        html
        tableOfContents
      }
    }
  }
}
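
As a rough sketch of how the result of the query above might be consumed, the nested tableOfContents array can be turned into list markup with a short recursive function. The `data` object is a made-up example shaped like the query result, and `renderToc` is a hypothetical helper that does no HTML escaping.

```javascript
// Turn a tableOfContents array into nested <ul> markup.
function renderToc(items) {
  if (!items || items.length === 0) return "";
  const entries = items
    .map(
      ({ id, heading, items: children }) =>
        `<li><a href="#${id}">${heading}</a>${renderToc(children)}</li>`
    )
    .join("");
  return `<ul>${entries}</ul>`;
}

// Made-up query result with a single HtmlRehype node.
const data = {
  allHtmlRehype: {
    edges: [
      {
        node: {
          html: '<h1 id="intro">Intro</h1><h2 id="setup">Setup</h2>',
          tableOfContents: [
            {
              id: "intro",
              heading: "Intro",
              items: [{ id: "setup", heading: "Setup", items: [] }],
            },
          ],
        },
      },
    ],
  },
};

// One rendered table of contents per node.
const tocs = data.allHtmlRehype.edges.map(({ node }) =>
  renderToc(node.tableOfContents)
);
```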

Access from parents

Your source HTML comes either from a file or from some other HTML GraphQL node. Assuming that you sourced your HTML from the GhostPost node, you can also reach your transformed HTML through the child node:

{
  allGhostPost {
    edges {
      node {
        childHtmlRehype {
            html
            tableOfContents
        }
      }
    }
  }
}

This allows for minimal changes in your original GraphQL queries.

Troubleshooting

gatsby-transformer-rehype hooks into Gatsby's onCreateNode API, which is only called when a new node is created. If nodes were generated in a previous run, they may have been cached, in which case onCreateNode is not called again. During development, or after adding new plugins to the options, remember to run

yarn clean

in order to trigger the transformer again. Please always run yarn clean before reporting a bug to this project.

Contributions

PRs are welcome! Consider contributing to this project if you are missing a feature that would also be useful for others. Explore this guide to get some more ideas.

Copyright & License

Copyright (c) 2020 styxlab - Released under the MIT license.