petitviolet blog

    Subscribe to a Web Page Using Google Apps Script



    Some web pages offer an RSS feed, so we can get notified through Feedly, Slack, etc. when they are updated. Others, however, have no such feature. This post describes how to subscribe to web pages that don't offer an RSS feed or anything similar.

    HOW TO

    Use Google Apps Script (a.k.a. GAS) to check a web page periodically and, if there is an update, send a notification by email.

    In this post, we use as an example, even though it has /rss.xml for its RSS feed. Since the blog looks like the image below, we can extract the date of the latest published post.

    structure of

    The "UpdatedAt" dates can be found within <small>...</small> HTML tags, so let's use this structure to get UpdatedAt. For this example, I'm going to use a regular expression to pick the date string out of the HTML. If you'd like to parse more complicated HTML and select elements, there is a library, Parser, that you can import with 1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw.


    In GAS, we can use UrlFetchApp#fetch and then HTTPResponse#getContentText to get the HTML.
    The date format on the blog is yyyy-MM-dd inside a <small> HTML tag, so we can find it with the pattern /<small>(\d{4}-\d{2}-\d{2})<\/small>/. A code snippet for extracting the latest date from the blog is as follows:

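    const URL = '';
    // Fetch the page and decode the response body as UTF-8
    const html = UrlFetchApp.fetch(URL).getContentText('UTF-8');
    // Capture the first yyyy-MM-dd date wrapped in <small> tags
    const pattern = /<small>(\d{4}-\d{2}-\d{2})<\/small>/;
    const updatedAt = html.match(pattern)[1];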
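
    Note that html.match(pattern) returns null when nothing matches, for example after the page layout changes, so indexing [1] directly would throw. A minimal sketch of a null-safe variant follows. The helper name extractLatestDate and the sample HTML are assumptions for illustration and are not part of the original script; it also assumes the newest post appears first in the page.

```javascript
// Hypothetical helper (not in the original script): returns the first
// yyyy-MM-dd date wrapped in <small>...</small>, or null if none is found.
function extractLatestDate(html) {
  const pattern = /<small>(\d{4}-\d{2}-\d{2})<\/small>/;
  const matched = html.match(pattern);
  return matched ? matched[1] : null;
}

// In GAS, the HTML would come from UrlFetchApp:
//   const html = UrlFetchApp.fetch(URL).getContentText('UTF-8');
// Here a made-up sample string stands in for the fetched page.
const sample = '<h2>Post</h2><small>2020-05-17</small><small>2020-04-01</small>';
const latest = extractLatestDate(sample); // the first match is returned
```

    With this shape, the caller can skip the run (or send an alert) when the result is null instead of failing with a TypeError.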

    The next step is managing the subscriber's state, that is, detecting whether the latest date we obtained is new or not.
    We also need to avoid notifying more than once for the same update.
    A GAS project can be attached to a Google Spreadsheet, so we can use the sheet as state storage.

    const CELL = 'A1';
    const sheet = SpreadsheetApp.getActiveSheet();
    const lastUpdatedAt = sheet.getRange(CELL).getValue();
    const today = Utilities.formatDate(new Date(), 'JST', 'yyyy-MM-dd');
    if (lastUpdatedAt === null || lastUpdatedAt == updatedAt) {
        return; // skip if it's the first time or already notified
    }
    // store the last updatedAt in the sheet
    sheet.getRange(CELL).setValue(updatedAt);
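
    The comparison itself can be pulled out into a small pure function, which makes it easy to check outside of GAS. The following is only a sketch under assumptions: the names toDateString and hasNewUpdate are hypothetical, an empty cell is assumed to come back as an empty string, and the spreadsheet may hand back a Date object instead of text if it auto-converts the stored value, in which case it needs normalizing before the comparison.

```javascript
// Hypothetical helpers (not in the original script).
// Converts a cell value to a 'yyyy-MM-dd' string. A Date is formatted with
// its local-time fields; anything else is stringified and trimmed.
function toDateString(value) {
  if (value instanceof Date) {
    const pad = (n) => String(n).padStart(2, '0');
    return `${value.getFullYear()}-${pad(value.getMonth() + 1)}-${pad(value.getDate())}`;
  }
  return String(value == null ? '' : value).trim();
}

// True only when a previously stored date exists and differs from the latest one.
function hasNewUpdate(stored, latest) {
  const last = toDateString(stored);
  return last !== '' && last !== latest;
}
```

    In this sketch the first run (empty cell) does not count as an update, so the caller would still need to write the latest date to the cell to give later runs something to compare against.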

    The last step is sending a notification when an update is observed.
    How to notify depends on what you want, but in GAS the easiest way is to send an email with MailApp.sendEmail.

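    const subject = "Update from";
    const name = "GAS robot";
    // updatedAt is the date extracted from the page earlier
    const body = `<a href="${URL}">${URL}</a> updated at ${updatedAt}`;
    // MAIL_TO is the recipient address, defined elsewhere in the script
    MailApp.sendEmail({
        to: MAIL_TO,
        subject: subject,
        name: name,
        htmlBody: body,
        body: body
    });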

    That's it! If you'd like to get notified in Slack instead, the chat.postMessage API should work.
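
    As a rough sketch of the Slack route, the request to chat.postMessage can be sent with UrlFetchApp. The function names buildSlackRequest and notifySlack are hypothetical, and the token is assumed to be a Slack bot token that is allowed to post to the given channel; none of this comes from the original script.

```javascript
// Hypothetical helper (not in the original script): builds the URL and
// UrlFetchApp options for Slack's chat.postMessage Web API method.
function buildSlackRequest(token, channel, text) {
  return {
    url: 'https://slack.com/api/chat.postMessage',
    options: {
      method: 'post',
      contentType: 'application/json; charset=utf-8',
      headers: { Authorization: `Bearer ${token}` },
      payload: JSON.stringify({ channel: channel, text: text }),
      muteHttpExceptions: true, // inspect the response instead of throwing on HTTP errors
    },
  };
}

// Sends the message; only runs inside GAS, where UrlFetchApp is available.
function notifySlack(token, channel, text) {
  const request = buildSlackRequest(token, channel, text);
  const response = UrlFetchApp.fetch(request.url, request.options);
  // Slack reports failures as { ok: false, error: '...' } in the JSON body.
  const result = JSON.parse(response.getContentText());
  if (!result.ok) {
    throw new Error(`Slack API error: ${result.error}`);
  }
}
```

    Keeping the request construction separate from the UrlFetchApp call is just one possible design; it lets the payload be checked without hitting the network.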

    Then you can set up a time-based trigger to run this code, so that you'll get notified when the page is updated.
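
    Triggers can be created from the script editor UI, or programmatically with ScriptApp. A minimal sketch of the latter follows, assuming the check-and-notify logic lives in a function named checkUpdate (a hypothetical name) and that an hourly check is frequent enough:

```javascript
// Hypothetical name of the function that performs the check and notification.
const HANDLER = 'checkUpdate';

// Run once by hand: removes existing triggers for HANDLER so they don't
// pile up, then schedules HANDLER to run every hour.
function installTrigger() {
  ScriptApp.getProjectTriggers()
    .filter((trigger) => trigger.getHandlerFunction() === HANDLER)
    .forEach((trigger) => ScriptApp.deleteTrigger(trigger));

  ScriptApp.newTrigger(HANDLER)
    .timeBased()
    .everyHours(1)
    .create();
}
```

    Deleting the old triggers first keeps repeated runs of installTrigger from scheduling the same function several times.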


    The whole code is available at


    Google Apps Script is really useful in many cases since it offers the following:

    • A JavaScript runtime
    • Spreadsheet integration, which can serve as simple storage
    • A scheduler to call JavaScript functions periodically

    So I'd say GAS can cover most personal use cases.