Saying stuff about stuff.

Caching dependencies on GitHub Actions

I previously wrote about caching gems on CircleCI. Although this is even easier to achieve with GitHub Actions, there’s still a useful approach that minimises the overall time spent installing dependencies when a workflow contains many jobs.

The thing I want to prevent is each job having to download and install its own dependencies. When this occurs, one of the jobs will finish first and write to the cache, and then we’ll see something like “Cache hit occurred on the primary key BIG-LONG-KEY, not saving cache” in the Post Run actions/cache@v2 output of the other jobs. Nothing will blow up, but all of those separate jobs installing dependencies are wasting billable time.

The simple fix is to declare an initial job whose sole purpose is to install the dependencies and make them available to the other jobs. Rather than store the installed dependencies as a workflow artifact, I prefer to treat this step as warming a cache, which may be a little more verbose but I think is simpler overall. With this approach the cache is also treated as a performance optimisation, and each dependent job is still able to run if the cache isn’t present for some reason.

Caching gem dependencies

Here’s an example of installing gems. Note the job cache_gems and how the other jobs declare it as a dependency by specifying needs: cache_gems:

on: [push]

jobs:
  cache_gems:
    name: Cache gems
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: ruby/setup-ruby@v1
      with:
        # We always need to tell ruby/setup-ruby to cache the gems for us.
        bundler-cache: true

  brakeman:
    name: Brakeman
    runs-on: ubuntu-latest

    # Tell this job to wait for the `cache_gems` job to successfully complete
    # before it runs.
    needs: cache_gems

    steps:
    - uses: actions/checkout@v2

      # Each dependent job still needs to install Ruby and ensure that it reads
      # from the cache before installing the gems (which is as simple as
      # specifying `bundler-cache: true` when using ruby/setup-ruby).
    - uses: ruby/setup-ruby@v1
      with:
        bundler-cache: true

      # Now run your actual CI command.
    - run: bundle exec brakeman --quiet --run-all-checks

  rspec:
    name: RSpec
    runs-on: ubuntu-latest
    needs: cache_gems
    steps:
    - uses: actions/checkout@v2
    - uses: ruby/setup-ruby@v1
      with:
        bundler-cache: true
    - run: bundle exec rspec

  rubocop:
    name: Rubocop
    runs-on: ubuntu-latest
    needs: cache_gems
    steps:
    - uses: actions/checkout@v2
    - uses: ruby/setup-ruby@v1
      with:
        bundler-cache: true
    - run: bundle exec rubocop --parallel

Caching JavaScript dependencies

I use the same technique for installing JavaScript dependencies, but it’s a little more verbose because ruby/setup-ruby takes care of caching for us, whereas with actions/setup-node we have to do it ourselves using actions/cache.

  cache_node_modules:
    name: Cache node_modules
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-node@v1

      # actions/setup-node isn't quite as friendly as ruby/setup-ruby and we
      # have to configure our own caching. The following will read from the
      # cache and later write to it if the job runs successfully.
    - uses: actions/cache@v2
      with:
        path: node_modules
        key: ${{ runner.os }}-yarn-v1-${{ hashFiles('**/yarn.lock') }}
        restore-keys: |
          ${{ runner.os }}-yarn-v1-

      # Now actually install the dependencies.
    - run: yarn install --frozen-lockfile

  jest:
    name: Jest
    runs-on: ubuntu-latest
    needs: cache_node_modules
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-node@v1

      # Once again we need to download the cached dependencies.
    - uses: actions/cache@v2
      with:
        path: node_modules
        key: ${{ runner.os }}-yarn-v1-${{ hashFiles('**/yarn.lock') }}
        restore-keys: |
          ${{ runner.os }}-yarn-v1-

      # But we should also try to install the dependencies in case the cache
      # hasn't been properly warmed for some reason.
    - run: yarn install --frozen-lockfile

      # And here's your CI command.
    - run: yarn test

  prettier:
    name: Prettier
    runs-on: ubuntu-latest
    needs: cache_node_modules
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-node@v1
    - uses: actions/cache@v2
      with:
        path: node_modules
        key: ${{ runner.os }}-yarn-v1-${{ hashFiles('**/yarn.lock') }}
        restore-keys: |
          ${{ runner.os }}-yarn-v1-
    - run: yarn install --frozen-lockfile
    - run: yarn prettier --check app/javascript

Running system tests

System tests may require both Ruby and JavaScript dependencies, which can be provided with the following changes:

   rspec:
     name: RSpec
     runs-on: ubuntu-latest
-    needs: cache_gems
+    needs: [cache_gems, cache_node_modules]
     steps:
     - uses: actions/checkout@v2
     - uses: ruby/setup-ruby@v1
       with:
         bundler-cache: true
+    - uses: actions/setup-node@v1
+    - uses: actions/cache@v2
+      with:
+        path: node_modules
+        key: ${{ runner.os }}-yarn-v1-${{ hashFiles('**/yarn.lock') }}
+        restore-keys: |
+          ${{ runner.os }}-yarn-v1-
+    - run: yarn install --frozen-lockfile
     - run: bundle exec rspec

My YAML reference

I find myself having to write ever more YAML nowadays and whilst it seems pretty simple at first (isn’t it just key/value?) having a bit more knowledge can be helpful. So here are a bunch of YAML things that I find useful — and that I continue to forget and have to look up again.

Strings

It’s not always necessary to “quote” a string and because of this there are lots of subtle ways that things can go awry — The Norway Problem being a classic example — but there are also lots of features that can help with formatting strings.

Formatting with |

The | treats its following lines as a block of multiline text:

text: |
  # Markdown Heading

  These separate lines
  will remain separate lines
  with no extra indentation.
"# Markdown Heading\n\nThese separate lines\nwill remain separate lines\nwith no extra indentation.\n"

There’s also |- which removes the final trailing newline.

Formatting with >

The > joins its following lines with a space:

text: >
  These separate lines
  will become one long line
  joined with spaces and
  with no indentation.
"These separate lines will become one long line joined with spaces and with no indentation.\n"

Using > can aid readability by splitting a single long command over many lines:

step:
  run: >
    NODE_ENV=production
    SOME=more
    ENV=vars
    npm build
"NODE_ENV=production SOME=more ENV=vars npm build\n"

There’s also >- which joins its following lines with a space and removes the final trailing newline.
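
To compare the four indicators side by side I find it easiest to parse the same two lines with each of them. This is only a minimal sketch, assuming Ruby’s bundled yaml library; the results variable and the one/two text are made up for illustration:

```ruby
require 'yaml'

# Parse the same two lines with each block indicator and collect the results.
results = %w[| |- > >-].to_h do |indicator|
  [indicator, YAML.safe_load("text: #{indicator}\n  one\n  two\n")['text']]
end

results.each { |indicator, text| puts format('%-2s => %p', indicator, text) }
# |  => "one\ntwo\n"
# |- => "one\ntwo"
# >  => "one two\n"
# >- => "one two"
```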

Note that lines will only continue to be joined while the indentation level remains the same (this has caught me out in the past). So the following will join the first two lines with a space but the rest with newlines:

text: >
  These separate lines
  will NOT become one long line
    joined with spaces and
  with no indentation.
"These separate lines will NOT become one long line\n  joined with spaces and\nwith no indentation.\n"

Here’s what the spec says:

each line break is folded to a space unless it ends an empty or a more-indented line

Multiline with no extra formatting

The behaviour of >- appears to be similar to the default behaviour for a multiline string in that the lines are joined with a space and there’s no trailing newline — the big difference seems to be that indentation changes are ignored:

text:
  These separate lines
    will become one long line
      joined with spaces and
  with no indentation.
"These separate lines will become one long line joined with spaces and with no indentation."

This, again, can aid readability of a long command by splitting it over many lines with different indentation:

step:
  run:
    ./run_a_command
      --with=a
      --big=list
      --of=arguments
      -- and/file/paths
"./run_a_command --with=a --big=list --of=arguments -- and/file/paths"

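The difference in how indentation is treated can be checked directly. A rough sketch, again assuming Ruby’s bundled yaml library, with throwaway variable names:

```ruby
require 'yaml'

# A plain multiline string: changes of indentation are ignored.
plain = YAML.safe_load("text:\n  one\n    two\n  three\n")['text']

# The same lines with >-: the more-indented line keeps its line breaks.
folded = YAML.safe_load("text: >-\n  one\n    two\n  three\n")['text']

p plain  # => "one two three"
p folded # => "one\n  two\nthree"
```
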
Maps

Key/value pairs are “simple” and can be nested:

key: value
nested:
  key: value
{
  "key": "value",
  "nested": {
    "key": "value"
  }
}

But this is exactly the sort of thing that caused The Norway Problem because the values can be anything and may not result in what you expected:

a: true
b: false
c: null
d: YES
e: NO
f: hello
g: 1.234
h: a long unquoted string
{
  "a": true,
  "b": false,
  "c": null,
  "d": true,
  "e": false,
  "f": "hello",
  "g": 1.234,
  "h": "a long unquoted string"
}
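
The usual fix is to quote any value that must stay a string. Here’s a small sketch of the classic case, assuming Ruby’s bundled yaml library (which applies the YAML 1.1 boolean rules); the countries key is just an illustration:

```ruby
require 'yaml'

# Unquoted, Norway's country code is read as a boolean.
unquoted = YAML.safe_load('countries: [GB, NO, SE]')

# Quoting the value keeps it as a string.
quoted = YAML.safe_load('countries: [GB, "NO", SE]')

p unquoted['countries'] # => ["GB", false, "SE"]
p quoted['countries']   # => ["GB", "NO", "SE"]
```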

Collections

I think of arrays as Markdown bullet lists — they can also be nested:

- one
- two
- three
-
  - nested
  - array
["one", "two", "three", ["nested", "array"]]

And can be written “inline”:

- one
- two
- three
- [nested, array]
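
The two forms should parse to the same thing. A quick sketch to confirm it, assuming Ruby’s bundled yaml library:

```ruby
require 'yaml'

block = YAML.safe_load(<<~DOC)
  - one
  - two
  - three
  -
    - nested
    - array
DOC

inline = YAML.safe_load(<<~DOC)
  - one
  - two
  - three
  - [nested, array]
DOC

p block           # => ["one", "two", "three", ["nested", "array"]]
p block == inline # => true
```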

Anchors (&)

Anchors act as variables and can be used to reduce repetition. Here’s an example from the spec where & declares the named anchor SS and * is used to reference it further on through the document:

---
hr:
  - Mark McGwire
  # Following node labeled SS
  - &SS Sammy Sosa
rbi:
  - *SS # Subsequent occurrence
  - Ken Griffey

Anchors can be used to DRY up CI config (although, at the time of writing, they can’t be used in the GitHub Actions YAML) or a Rails database.yml:

default: &default
  adapter: postgresql
  encoding: unicode
  pool: 5

development:
  <<: *default
  database: app_development

test:
  <<: *default
  database: app_test
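
To see what the <<: *default lines expand to, the file can be parsed with Ruby. This is a sketch under a couple of assumptions: it uses a trimmed-down copy of the config above, and it passes aliases: true because safe_load rejects aliases unless they’re explicitly enabled:

```ruby
require 'yaml'

config = <<~DOC
  default: &default
    adapter: postgresql
    pool: 5

  test:
    <<: *default
    database: app_test
DOC

# Aliases have to be enabled explicitly when using safe_load.
parsed = YAML.safe_load(config, aliases: true)

# The keys from `default` are merged into `test` alongside its own keys.
p parsed['test']['adapter']  # => "postgresql"
p parsed['test']['pool']     # => 5
p parsed['test']['database'] # => "app_test"
```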

Comments

A comment starts with a #:

# Commented line.
- hello
# Interleaved comment.
- there # Another comment.
["hello", "there"]

But, as you may have noticed from a previous example, a # can appear in a multiline string when using | or > without being interpreted as a comment:

text: |
  # Markdown Heading

  These separate lines
  will remain separate lines
  with no extra indentation.
"# Markdown Heading\n\nThese separate lines\nwill remain separate lines\nwith no extra indentation.\n"

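Outside of block strings a # only starts a comment when it begins a line or follows whitespace, and never inside quotes. A small sketch, assuming Ruby’s bundled yaml library and some made-up keys:

```ruby
require 'yaml'

doc = YAML.safe_load(<<~DOC)
  comment: value # this is dropped
  no_space: value#kept
  quoted: "value # kept"
DOC

p doc['comment']  # => "value"
p doc['no_space'] # => "value#kept"
p doc['quoted']   # => "value # kept"
```
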
How to quickly test a snippet of YAML with Ruby

While writing this I encountered loads of little mistakes in my YAML and often had to verify that the output was what I expected. To check I used the DATA/__END__ trick:

require 'yaml'

pp YAML.load(DATA.read)

__END__

a: true
b: false
c: null
d: YES
e: NO
f: hello
g: 1.234
h: a long unquoted string
{"a"=>true,
 "b"=>false,
 "c"=>nil,
 "d"=>true,
 "e"=>false,
 "f"=>"hello",
 "g"=>1.234,
 "h"=>"a long unquoted string"}

My Prettier preferences and why

Over the years my own JavaScript code-formatting preferences have evolved but they don’t match Prettier’s defaults. I wondered whether I’d notice a difference so I started a recent new project with Prettier’s — and the community’s — defaults so I could find out.

After a few weeks I’m surprised (and more than a little pleased) to find that I’ve really noticed the difference and have switched back to my old, better, preferences. Each of the settings makes a noticeable difference when I’m writing code so here they are and why.

Single quotes (--single-quote)

Prettier makes quotes consistent either way so what’s the difference whether you write ' or "? The thing is that in general it’s easier to type a single quote than a double quote — because the former doesn’t require holding shift — and in particular with Vim it’s easier to type ci' (change inside quote) than ci". Simple as that.

Trailing commas (--trailing-comma es5)

Having a trailing comma on the final line of a multi-line object means that adding, removing, or moving entries doesn’t turn me into a comma juggler and force my brain to parse and validate the code. It also results in a cleaner diff so that only a single line is changed.

No semicolon (--no-semi)

This one is surely the most controversial because it involves semicolons. Consider the following:

const fourLetterShoutyWords = aListOfWords
  .map(word => word.toUpperCase())
  .sort()
  .filter(word => word.length === 4);

If I want to move the filter() line above the sort() it’ll take the semicolon with it and I’ll get a file.js|4 col 3| Unexpected token error. But with no semicolons, much like with trailing commas, I’m able to move code around more freely and without having to visually parse and validate the code myself.

Where Prettier makes a huge difference to fans of no semicolons is with that old foe: forgetting to put one where it’s definitely necessary. This issue used to be something to fear (though it’s been many years since I last encountered it):

entirelyContrived = 1

(() => entirelyContrived++)()

Oops, what’s the problem: TypeError: 1 is not a function?

With Prettier enforcing no semicolons the reformatted code makes it more visually obvious that you’re actually attempting to call a function 1():

entirelyContrived = 1(() => entirelyContrived++)()

Conclusion

I have no doubt others will disagree with my choices but I’m happy to have found that they work for me and aren’t just subjective.

Getting the last day of the month or year in JavaScript

It’s easy to get the first day of a month but what about the last day of a month? Looking on Stack Overflow this is quite a common ask but it’s not something I’d ever encountered and the solution is quite interesting so I thought I’d describe it here.

It turns out the way to get the last day of a month is to ask for the day before the first day of the following month! I particularly like that phrasing because it’s both a description for humans and an explanation of the code. Here’s an example:

const month = 5 // June.

const startOfMonth = new Date(2019, month, 1)
// Sat Jun 01 2019...

const endOfMonth = new Date(2019, month + 1, 0)
// Sun Jun 30 2019...

Obviously zero is the day before the first day of a month?! And zero isn’t a special case; you can keep counting back:

new Date(2019, 6, 1)
// Mon Jul 01 2019...

new Date(2019, 6, 0)
// Sun Jun 30 2019...

new Date(2019, 6, -1)
// Sat Jun 29 2019...

// Keep going...

new Date(2019, 6, -364)
// Sun Jul 01 2018...

It works with months too, and across years:

new Date(2019, 11)
// Sun Dec 01 2019...

new Date(2019, 12) // ¿December + 1?
// Wed Jan 01 2020...

The two can be combined to get the last day of the year by asking for the zeroth day of the thirteenth month (remembering it’s zero-indexed, so month number 12):

new Date(2019, 12, 0)
// Tue Dec 31 2019...

Or the final second before the new year:

new Date(2019, 12, 0, 24, 0, -1)
// Tue Dec 31 2019 23:59:59 GMT+0000 (Greenwich Mean Time)

And it’s the same behaviour when you mutate a date object — it does the cleverness on write so it reads back normally.

const date = new Date(2019, 6)
// Mon Jul 01 2019...

date.setMonth(12)
// date is now Wed Jan 01 2020...

date.getFullYear()
// 2020

date.getMonth()
// 0

It seems a bit weird at first but it behaves entirely consistently, as described in the specs (even the first one from 1997), and is unusually developer-friendly (particularly compared to the zero-indexed month debacle). I can’t believe I didn’t know this already.
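
The “day before the first day of the following month” phrasing carries over to other languages too. As a rough comparison, here’s a Ruby sketch; Ruby’s Date doesn’t accept a day of zero, so the step back is explicit, and last_day_of_month is a hypothetical helper of my own naming:

```ruby
require 'date'

# Hypothetical helper: the last day of a month is the day before the first
# day of the following month. `month` is 1-indexed here, unlike JavaScript.
def last_day_of_month(year, month)
  Date.new(year, month, 1).next_month.prev_day
end

puts last_day_of_month(2019, 6)  # => 2019-06-30
puts last_day_of_month(2019, 12) # => 2019-12-31
puts last_day_of_month(2020, 2)  # => 2020-02-29
```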

Ruby’s identity method

When Ruby 2.2 added #itself I couldn’t think of anything I’d previously encountered where it would have been useful, but I’ve finally used it in the wild. I had an array of integers and wanted to count their occurrences. I initially reached for each_with_object but was hoping for something more meaningful, intention-revealing, and succinct.

[3, 1, 2, 1, 5].each_with_object(Hash.new(0)) { |i, memo| memo[i] += 1 }
# => {3=>1, 1=>2, 2=>1, 5=>1}

I remembered something about an identity method — #itself — and by using it you get the following (which is definitely more succinct):

[3, 1, 2, 1, 5].group_by(&:itself)
# => {3=>[3], 1=>[1, 1], 2=>[2], 5=>[5]}

So now I could write a method to extract pairs:

def pairs(array)
  array
    .group_by(&:itself)
    .select { |_, v| v.size == 2 }
end

pairs([3, 1, 2, 1, 5])
# => {1=>[1, 1]}

pairs([9, 1, 2, 2, 9])
# => {9=>[9, 9], 2=>[2, 2]}

In particular I wanted to check whether the array contained two groups each containing two items:

def two_pairs?(array)
  pairs(array).size == 2
end

two_pairs?([3, 1, 2, 1, 5])
# => false

two_pairs?([9, 1, 2, 2, 9])
# => true

Enumerable#tally

Since I wrote this, Ruby 2.7 was released and added Enumerable#tally, which produces the same result as my initial code that counts occurrences using #each_with_object:

[3, 1, 2, 1, 5].tally
# => {3=>1, 1=>2, 2=>1, 5=>1}
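
As a sanity check, the three ways of counting can be compared with each other. This is a sketch that assumes Ruby 2.7 or later; the variable names and the tally-based two_pairs? are my own illustration rather than the code used above:

```ruby
numbers = [3, 1, 2, 1, 5]

with_object = numbers.each_with_object(Hash.new(0)) { |i, memo| memo[i] += 1 }
with_itself = numbers.group_by(&:itself).transform_values(&:size)
with_tally  = numbers.tally

p with_object == with_tally # => true
p with_itself == with_tally # => true

# Hypothetical rewrite of two_pairs? on top of tally: exactly two values
# must each occur exactly twice.
def two_pairs?(array)
  array.tally.count { |_, n| n == 2 } == 2
end

p two_pairs?([3, 1, 2, 1, 5]) # => false
p two_pairs?([9, 1, 2, 2, 9]) # => true
```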

Pre-#itself hack

Looking around the internet it turns out that people were already getting around the lack of an identity method by calling a no-op method that returns self:

[3, 1, 2, 1, 5].group_by(&:to_i)
# => {3=>[3], 1=>[1, 1], 2=>[2], 5=>[5]}

self and #itself, what’s the difference?

To be honest, although I expected it to fail, I had my fingers crossed when trying to use self like so:

# This doesn’t work.
[3, 1, 2, 1, 5].group_by(&:self)
# => NoMethodError (undefined method `self' for 3:Integer)

It doesn’t work but I didn’t know why. It turns out that self isn’t a method; it’s a special keyword, whereas #itself is a method defined on Object (although the Ruby 2.2 release notes say it’s defined on Kernel).
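
The distinction is easy to poke at in irb. A small sketch; the last line deliberately has no expected output because it just reports whichever class or module owns the method on the Ruby you’re running:

```ruby
# `self` is a keyword, so there is no method of that name to call.
p 3.respond_to?(:self)   # => false
p(defined?(self))        # => "self"

# `itself` is an ordinary public method, so `&:itself` can turn it into a block.
p 3.respond_to?(:itself) # => true
p 3.itself.equal?(3)     # => true

# Shows which class or module owns the method.
p 3.method(:itself).owner
```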


Updated 2021-02-19: Added the section on Enumerable#tally.