Advent of Code 2022

These are my solutions to Day 1 of Advent of Code 2022. Today’s puzzle gives us a list, in a very specific format, of the calories in the snacks each of a group of elves is carrying, and asks us to work out how many calories each elf is carrying in total.

Most of the difficulty in this problem comes from dealing with the slightly non-standard data format: each elf lists all of their snacks’ values, one per line, and a blank line separates one elf’s values from the next elf’s.

This is the sort of job which command-line tools are actually very good at, though they’re a bit neglected for it these days. So I chose to solve the problem with a pair of bash one-liners (one for each of today’s two parts), and I’ll explain how they work below.

I’ve also replicated the solution in Python with the standard scientific stack (which in this case was just numpy). You could, without much extra work, solve this without numpy too.

Bash

So first, my solution:

 
cat input | paste -s -d, | sed -e 's/,,/\n/g' | awk -F, '{sum = 0; for (i=1; i<=NF; i++) sum += $i} {print sum}' | cat --number | sort -k 2 -n -r | head -n 1

113	71506

But this isn’t very readable. I’ll break it down into steps. This is a very typical-looking bash “pipeline” - the vertical bar characters are called “pipes” in unix parlance, and they basically pass the output of the command on the left to the input of the command on the right.

cat input

The first command, cat input, just opens the file of inputs which I’d downloaded. Normally it would spit these out to the standard output, but instead it’s passing them along the pipeline. The file’s large, so I’m not going to run this in the notebook on its own.

paste

The paste command merges lines together. With the -s (“serial”) option it joins all the lines of a single file into one long line, rather than merging several files side by side. By default it separates the joined lines with a <TAB> character, but by specifying -d, we can change the “delimiter” to a comma.
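To make the effect of this step concrete, here is a rough Python sketch of what paste -s -d, does to the text. The sample string is made up for illustration and is not the real puzzle input.

```python
# Made-up sample in the puzzle's format (not the real input).
sample = "100\n200\n\n300\n\n400\n500\n600\n"

# Rough equivalent of `paste -s -d,`: drop the final newline, then
# join every line (including the empty ones) with commas.
joined = ",".join(sample.rstrip("\n").split("\n"))
print(joined)  # 100,200,,300,,400,500,600
```

The blank lines show up as the double commas, which is what the next stage of the pipeline looks for.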

sed

Sed is a program designed to make edits to “streams” (it’s the “stream editor”). Here I’m getting it to edit the output of the paste command, replacing each double comma with a newline. The blank line separating each elf’s list has become an empty field between two commas, so a double comma marks the boundary between one elf’s record and the next.
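The same substitution can be sketched in Python with a plain string replacement. The joined string here is an assumed, made-up example of what the paste step might produce.

```python
# Assumed output of the paste step for a made-up sample.
joined = "100,200,,300,,400,500,600"

# Same idea as the sed substitution: every double comma becomes a
# line break, leaving one comma-separated record per elf.
records = joined.replace(",,", "\n")
print(records)
```

After this step each line holds one elf’s values, separated by commas.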

awk

Awk is another classic mini-programming language, and alongside sed provides a set of primitive data analysis tools which are available on almost any unix-like machine. This section is a bit cryptic, but awk is designed to work with columns of data, and it has internal variables which identify each column. $1 is the first column, $5 the fifth, and so on.

The program works on each line of the input in turn:
- Set sum to zero.
- For each value of i between 1 and NF (which is a special variable indicating the number of fields in the line):
  - Add the value of column i to sum.
- After the for loop, print the value of sum.

The -F, flag is used to tell awk that each field is separated by a comma.
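If the awk syntax is unfamiliar, the loop can be read as something like the following Python sketch. The records string is made-up sample data, and the names are my own rather than anything awk defines.

```python
# Made-up records, one line per elf, fields separated by commas.
records = "100,200\n300\n400,500,600"

totals = []
for line in records.splitlines():    # awk runs the program once per line
    fields = line.split(",")         # like -F, (split the line on commas)
    total = 0                        # sum = 0
    for field in fields:             # for (i=1; i<=NF; i++)
        total += int(field)          #     sum += $i
    totals.append(total)             # print sum
print(totals)  # [300, 300, 1500]
```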

cat

This step’s actually not required, but I added it in to check things were working. You can take it out, but you’ll also need to change the following step.

cat --number takes the input, and adds a line number at the start of each line. I did this because I assumed I’d be asked which elf was carrying the most calories at some point, but I wasn’t.

sort

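This one’s fairly simple: by default it sorts lines of data from low to high. The -k 2 option tells it to sort on the second column (since my first column is the line number in my solution) and the -r reverses the sort (so it’s high to low). One thing to watch is that sort compares text unless told otherwise, which only matches numeric order when every total has the same number of digits; the -n option makes it compare numbers instead.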
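As a small illustration of why the comparison mode matters when sorting, here is a Python sketch contrasting text ordering with numeric ordering. The values are made up, and chosen to have different lengths.

```python
# Made-up totals of differing length, held as text like lines in a pipeline.
totals = ["9000", "71506", "12000"]

# Text comparison in reverse order, like sort -r without -n.
by_text = sorted(totals, reverse=True)
# Numeric comparison in reverse order, like sort -n -r.
by_number = sorted(totals, key=int, reverse=True)

print(by_text)    # ['9000', '71506', '12000']
print(by_number)  # ['71506', '12000', '9000']
```

When every value has the same number of digits the two orderings agree, so the difference is easy to miss.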

head

This command just takes the top values of the input, and returns them to the output. I specified -n 1 as I only want the first line, which is the highest value.

Because it’s the last stage in the pipeline, the output is printed to the standard output.

The second part just wants us to find the sum of the three highest values, so we don’t need to make many changes. For simplicity I removed the line numbers, and I’ve changed the head command to head -n 3 to give the three highest values.

 
cat input | paste -s -d, | sed -e 's/,,/\n/g' | awk -F, '{sum = 0; for (i=1; i<=NF; i++) sum += $i} {print sum}' | sort -n -r | head -n 3 | paste -sd+ | bc

209603

paste

Our old friend paste has reappeared. This time I’ve changed the delimiter to a + character, so the three values are joined into a single line with a + between each pair.

bc

The last step is bc, a simple calculator program which reads arithmetic expressions such as 1+1 from its input and prints the result. (Bash can do integer arithmetic itself with $((...)), but bc slots more naturally onto the end of a pipeline.) We’ve just passed it a list of three numbers separated by + characters, so this will give us our second answer.
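The final two stages can be sketched in Python as building the expression and then adding up its terms. The three values are made up, and the sum over the split string only stands in for bc; it is not how bc works internally.

```python
# Made-up stand-ins for the three highest totals, as lines of text.
top_three = ["1500", "300", "300"]

# Like paste -sd+: join the lines into one expression.
expression = "+".join(top_three)
# Stand-in for bc: add up the terms of the expression.
result = sum(int(term) for term in expression.split("+"))

print(expression)  # 1500+300+300
print(result)      # 2100
```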

Python

First of all, let’s load in the data.

 
with open('input', 'r') as datafile:
    data = datafile.read()

Now let’s replace the empty lines with semicolons, and then the line breaks with commas.

 
data = data.replace("\n\n", ";").replace("\n", ",")

I’m going to use numpy to deal with the CSV (comma-separated values) data which I’ve created, but you can actually do this with standard-library Python too.

 
import numpy as np

Let’s split the data into new lines again at each semicolon, and then parse each line with numpy’s fromstring function, which reads separated numbers out of a string, to make numpy arrays from them. We can then calculate the sum of each array. We do this for each line in a list comprehension.

 
data = [np.fromstring(line, sep=',').sum() for line in data.split(";")]

Now let’s sort the data, and choose the final element, which will be the largest.

 
sorted(data)[-1]

71506.0

Now let’s take the second part of the problem, where we need to find the sum of the three highest values.

Let’s do a quick sanity check to see the three numbers…

 
sorted(data)[-3:]

[68729.0, 69368.0, 71506.0]

Then getting the sum is trivial.

 
sum(sorted(data)[-3:])

209603.0
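For completeness, here is a minimal sketch of the standard-library route mentioned earlier, with no numpy. The function name elf_totals and the sample text are my own inventions for illustration; it assumes the input has one integer per line and a blank line between elves.

```python
def elf_totals(text):
    """Return one calorie total per elf from blank-line-separated text."""
    blocks = text.strip().split("\n\n")   # one block of lines per elf
    return [sum(int(value) for value in block.split()) for block in blocks]


# Made-up sample in the puzzle's format (not the real input).
sample = "100\n200\n\n300\n\n400\n500\n600\n"
totals = elf_totals(sample)

print(max(totals))               # 1500
print(sum(sorted(totals)[-3:]))  # 2100
```

To run it on the real data you would pass in the contents of the input file instead of the sample string. Because the values stay as integers, the answers come out without the trailing .0.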