Valhalla

Take a deep dive into our latest launch, the Valhalla Content Hub!

ContactSign Up for Free
Community Plugin
View plugin on GitHub

NPM version Actions Build Status Reliability Rating Coverage FOSSA Status

gatsby-plugin-merge-robots

Based off the default gatsby robots.txt plugin. Create robots.txt for your Gatsby site on build.

Have another app/service/subdomain not managed by your gatsby site, but need to consolidate to one robots.txt? gatsby-plugin-merge-robots is a drop-in replacement for gatsby-plugin-robots-txt with one added parameter:

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        policy: [{userAgent: '*', allow: '/'}],
        external: ["https://example.com/blog/robots.txt"]
      }
    }
  ]
};

The plugin will pull and refactor the external robots.txt by removing unneessary metadata, transforming the path to be compatible with your root level robots.txt, then appending it to the output of your gatsby robots.txt. Magic! ⚡

Install

yarn add gatsby-plugin-merge-robots

or

npm install --save gatsby-plugin-merge-robots

How To Use

gatsby-config.js

module.exports = {
  siteMetadata: {
    siteUrl: 'https://www.example.com'
  },
  plugins: ['gatsby-plugin-merge-robots']
};

Options

This plugin uses generate-robotstxt to generate content of robots.txt and it has the following options:

Name Type Default Description
host String ${siteMetadata.siteUrl} Host of your site
sitemap String / String[] ${siteMetadata.siteUrl}/sitemap/sitemap-index.xml Path(s) to sitemap.xml
policy Policy[] [] List of Policy rules
configFile String undefined Path to external config file
output String /robots.txt Path where to create the robots.txt

gatsby-config.js

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        policy: [{userAgent: '*', allow: '/'}]
      }
    }
  ]
};

env-option

gatsby-config.js

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        env: {
          development: {
            policy: [{userAgent: '*', disallow: ['/']}]
          },
          production: {
            policy: [{userAgent: '*', allow: '/'}]
          }
        }
      }
    }
  ]
};

The env key will be taken from process.env.GATSBY_ACTIVE_ENV first ( see Gatsby Environment Variables for more information on this variable), falling back to process.env.NODE_ENV. When this is not available then it defaults to development.

You can resolve the env key by using resolveEnv function:

gatsby-config.js

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        resolveEnv: () => process.env.GATSBY_ENV,
        env: {
          development: {
            policy: [{userAgent: '*', disallow: ['/']}]
          },
          production: {
            policy: [{userAgent: '*', allow: '/'}]
          }
        }
      }
    }
  ]
};

configFile-option

You can use the configFile option to set specific external configuration:

gatsby-config.js

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        configFile: 'robots-txt.config.js'
      }
    }
  ]
};

robots-txt.config.js

module.exports = {
  host: 'https://www.example.com',
  sitemap: 'https://www.example.com/sitemap.xml',
  policy: [{userAgent: '*'}]
};

Netlify

If you would like to disable crawlers for deploy-previews you can use the following snippet:

gatsby-config.js

const {
  NODE_ENV,
  URL: NETLIFY_SITE_URL = 'https://www.example.com',
  DEPLOY_PRIME_URL: NETLIFY_DEPLOY_URL = NETLIFY_SITE_URL,
  CONTEXT: NETLIFY_ENV = NODE_ENV
} = process.env;
const isNetlifyProduction = NETLIFY_ENV === 'production';
const siteUrl = isNetlifyProduction ? NETLIFY_SITE_URL : NETLIFY_DEPLOY_URL;

module.exports = {
  siteMetadata: {
    siteUrl
  },
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        resolveEnv: () => NETLIFY_ENV,
        env: {
          production: {
            policy: [{userAgent: '*'}]
          },
          'branch-deploy': {
            policy: [{userAgent: '*', disallow: ['/']}],
            sitemap: null,
            host: null
          },
          'deploy-preview': {
            policy: [{userAgent: '*', disallow: ['/']}],
            sitemap: null,
            host: null
          }
        }
      }
    }
  ]
};

query-option

By default the site URL will come from the Gatsby node site.siteMeta.siteUrl. Like in Gatsby’s sitemap plugin an optional GraphQL query can be used to provide a different value from another data source as long as it returns the same shape:

gatsby-config.js

module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        query: `{
          site: MyCustomDataSource {
            siteMetadata {
              siteUrl
            }
          }
        }`
      }
    }
  ]
};

Contributors

License

MIT License

FOSSA Status

© 2022 Gatsby, Inc.