Scrape URLs in Google Sheets for SEO.

With this script you can input a URL and our simple SEO scraper will extract data to quickly identify technical SEO issues. 🚀



How to add the Script to Google Sheets

1. Copy the script below:

/**
 * Simple Google Sheets SEO scraper to extract data from a URL.  
 *
 * @param {"https://keywordsinsheets.com"} url - input the target URL.
 * @param {false} allHeadings [OPTIONAL] Set to true to return all h2/h3 headings as a comma-separated list; defaults to false (first heading only).
 * @param {"body"} selector [OPTIONAL] Include a CSS selector to scrape custom content.
 * @customfunction
 */
function seoscraper(url, allHeadings, selector) {

    // Check that a URL was supplied and includes a protocol
    if (!url) {
        return "You need to enter a valid URL.";
    } else if (!/^https?:\/\//i.test(url)) {
        return "You need to include the URL protocol, e.g. http:// or https://.";
    }

    // Default allHeadings to false (return only the first heading)
    allHeadings = allHeadings || false;

    const data = [];

    try {
        // Fetch the URL
        const fetch = UrlFetchApp.fetch(url, {
            muteHttpExceptions: true,
            followRedirects: false
        });
        
        // load URL into Cheerio
        const content = fetch.getContentText();
        const $ = Cheerio.load(content);

        // get all of the text in the body
        const body = $("body").text();

        // Function to format text
        const trimText = (text) => text.trim().toLowerCase();

        // get the response code
        const status = fetch.getResponseCode();

        // Get the page title
        const title = $("title").text().trim();

        // Get the meta description (blank if missing)
        const description = $("meta[name='description']").attr("content") || "";

        // Get the canonical URL (blank if missing)
        const canonical = $("link[rel='canonical']").attr("href") || "";

        // Get the meta robots (blank if missing)
        const robots = $("meta[name='robots']").attr("content") || "";

        // get the wordcount of a page
        const wordCount = body.trim().split(/\s+/).length;

        // Get the H1
        const h1 = trimText($("h1").first().text());

        // Get the H2
        const h2 = allHeadings ? $("h2").map((i, e) => trimText($(e).text())).get().toString() : trimText($("h2").first().text());

        // Get the H3
        const h3 = allHeadings ? $("h3").map((i, e) => trimText($(e).text())).get().toString() : trimText($("h3").first().text());

        // Get the content from the optional CSS selector (blank if none supplied)
        const customContent = selector ? $(selector).text().trim() : "";

        // Push all of the data into the array if 200 response
        if (status === 200) {
            data.push([status, title, description, h1, h2, h3, robots, canonical, wordCount, customContent]);
        } else {
            data.push([status]);
        }

        return data;

    } catch (err) {
        return "Cannot scrape URL: " + err;
    }

}
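The script's built-in URL check is deliberately simple. If you want to experiment with stricter validation outside Apps Script, here is a minimal plain-JavaScript sketch; the helper name `isValidUrl` is hypothetical and not part of the script above:

```javascript
// Hypothetical stricter check: accept only absolute http(s) URLs
// with at least a dotted hostname after the protocol.
function isValidUrl(url) {
  return typeof url === "string" && /^https?:\/\/\S+\.\S+/i.test(url);
}

console.log(isValidUrl("https://keywordsinsheets.com")); // true
console.log(isValidUrl("keywordsinsheets.com"));         // false (no protocol)
console.log(isValidUrl("myhttpsite"));                   // false
```

You could swap this into the validation block at the top of the function if the plain protocol check lets through too many malformed inputs.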

2. Head over to Google Sheets

Or if you’re really smart, create a new sheet by going to: https://sheets.new

Select Apps Script from the Extensions menu (in older versions of Sheets, it's Script editor under the Tools menu).

Paste the script and save it.


3. Add the Cheerio Library in Apps Script

Add the Cheerio library using the script ID below. Cheerio makes it much easier to parse the HTML returned from the requested page.

1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0
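Cheerio gives you jQuery-style selectors (like `$("title").text()` in the script) over the fetched HTML. Without it you would be hand-rolling regex parsing, which is fragile; this plain-JavaScript sketch of a hypothetical `extractTitle` helper shows roughly what the library saves you from:

```javascript
// Hand-rolled alternative to Cheerio's $("title").text().
// Works for simple pages but breaks on edge cases Cheerio handles.
function extractTitle(html) {
  const match = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  return match ? match[1].trim() : "";
}

console.log(extractTitle("<head><title> My Page </title></head>")); // "My Page"
```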

If you’re not sure how to add a library in Apps Script: in the editor, click the + next to Libraries in the left sidebar, paste the ID above, click Look up, then Add.


4. Add the formula to any cell in your sheet

=seoscraper(A1)

Replace A1 with any cell that includes your URL (the page you want to scrape).

The following columns of data will be returned:

  1. Status Code
  2. Page Title
  3. Meta Description
  4. H1
  5. H2 (first heading)
  6. H3 (first heading)
  7. Meta Robots
  8. Canonical URL
  9. Word Count
  10. Custom Selector Content (empty unless the selector parameter is supplied)

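The Word Count column simply splits the page's body text on runs of whitespace and counts the pieces. The same logic can be sketched and tested in plain JavaScript:

```javascript
// Same logic as the script: trim, split on whitespace runs, count.
function wordCount(text) {
  return text.trim().split(/\s+/).length;
}

console.log(wordCount("  Simple   SEO scraper\nfor Google Sheets  ")); // 6
```

Note that, like the script, an empty body still counts as 1, because splitting an empty string yields one empty element.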
=seoscraper(A1,true,"body")

You can also add two further parameters to the function.

allHeadings

If you set allHeadings to true, it will return all h2 and h3 headings as comma-separated lists, rather than only the first heading of each.
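The behaviour mirrors this plain-JavaScript sketch, where the `headings` array stands in for the matched h2/h3 elements (the helper name `pickHeadings` is illustrative, not part of the script):

```javascript
// allHeadings = true  → all headings joined with commas (Array.toString)
// allHeadings = false → only the first heading (or "" if none)
function pickHeadings(headings, allHeadings) {
  return allHeadings ? headings.toString() : (headings[0] || "");
}

const h2s = ["features", "pricing", "faq"];
console.log(pickHeadings(h2s, true));  // "features,pricing,faq"
console.log(pickHeadings(h2s, false)); // "features"
```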

selector

You can also add a custom CSS selector if there are particular areas of a page you would like to scrape. The scraped text is added as the final column of returned data.

*Note: some websites may fail to fetch because of bot-detection measures.


Thanks for stopping by 👋

I’m Andrew Charlton, the Google Sheets nerd behind Keywords in Sheets. 🤓

Questions? Get in touch with me on social or comment below 👇

