
Parsing JSON in shell scripts using JQ

I’ve always patted myself on the back for my early decision to build a career as a Linux systems administrator rather than becoming a developer or, gasp, a Windows systems administrator. In the early aughts, there were not very many Linux sysadmin jobs anywhere, even in the major centre where I lived in Canada. It was a tough career to get started in, and I built it up by podcasting about Linux, taking any contract work I could find, and - finally - landing my first legit Linux systems administrator position with a colleague who remains a good friend today.

In the early years, I’d do anything to get work to add to my portfolio in order to build legitimacy. During those years, and indeed still today, I have learned hundreds of tools, languages, concepts, models, and swear words. In that huge pool of skills I’ve learned and forgotten over the years, one tool has stood the test of time and has been able to handle almost anything I’ve thrown at it. That tool is basic, everyday shell scripting. However, there are a few things that shell scripts don’t do well natively, and one of those is parsing JSON-formatted data. That wasn’t a big deal 10 years ago, but now JSON is a very standard text data format and I encounter it constantly.

The most brutish of all options for handling JSON in a shell script is to treat it like any other text and use sed/awk/grep to find what you need. A slightly harder option is to switch to something with better native JSON handling, such as pretty much any actual programming language. But the most elegant answer is to leverage what Linux tools do best: do one thing well, and pipe them together.
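
To make that brutish option concrete, here’s a rough sketch of scraping region names out of pretty-printed JSON (the regions file we’ll create in a moment) with nothing but grep and cut. It works, but only for as long as the formatting never changes:

$ grep -o '"RegionName": "[^"]*"' regions | cut -d'"' -f4
eu-north-1
ap-south-1
eu-west-3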

Enter jq, a command line JSON query/parsing tool that allows sysadmins to handle JSON formatted data using familiar shell script concepts.

jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

It’s always best to learn by example, so let’s parse some JSON. For my examples, I’m going to use the Amazon Web Services (AWS) command line tools to grab some information about EC2 instances. Many APIs output JSON, and half the world is using AWS, so it’s a familiar place to start for many.

Let’s look at how we’d use jq to extract the instance IDs of each of our instances, regardless of what region they’re in. This isn’t an AWS post, but one way to do this is to use the aws command line tool, and grab your active regions using something like this:

$ aws ec2 --region us-east-1 describe-regions > regions

This will result in a file named “regions” containing JSON that looks something like the example below. I am only including three regions, but every region that is enabled in your account will be represented in the JSON.

{
    "Regions": [
        {
            "OptInStatus": "opt-in-not-required", 
            "Endpoint": "ec2.eu-north-1.amazonaws.com", 
            "RegionName": "eu-north-1"
        }, 
        {
            "OptInStatus": "opt-in-not-required", 
            "Endpoint": "ec2.ap-south-1.amazonaws.com", 
            "RegionName": "ap-south-1"
        }, 
        {
            "OptInStatus": "opt-in-not-required", 
            "Endpoint": "ec2.eu-west-3.amazonaws.com", 
            "RegionName": "eu-west-3"
        }
    ]
}

There’s lots of stuff we can use this file for, but if you just want to pull the region names out of it, you can use jq like this:

$ jq '.Regions[].RegionName' regions
"eu-north-1"
"ap-south-1"
"eu-west-3"

If you add the -r switch to jq, the quotation marks will be removed to give you “raw” output. You can even put it all together in one call like this to do away with the intermediate step of saving the regions to a file:

$ aws ec2 --region us-east-1 describe-regions | jq -r '.Regions[].RegionName'
eu-north-1
ap-south-1
eu-west-3

We’re halfway there. Now let’s pull the instance IDs out of each region using something like this:

$ for region in `jq -r '.Regions[].RegionName' regions`;  do  echo $region; aws ec2 describe-instances --region $region | jq -r "try .Reservations[].Instances[].InstanceId" ; done
eu-north-1
ap-south-1
i-0ace7f7696218ebce
i-02112404f13ca11a1
i-0844cafbadfe53ce9
i-06ade579a2wc5d5af
i-00378947jd66cf9a7
i-085cd87c9cc8811ef
i-053052ecwfdfee23c
i-0bf92c6ak0a7c9918
eu-west-3

From this we learn that I do not have any instances in the eu-north-1 and eu-west-3 regions, but I have 8 instances in the ap-south-1 region. This alone is a huge time saver when compared to the incessant clicking and slow refresh rate of the AWS web interface.

Note the [] after Regions. That means “all the elements in Regions”. But, like any good array, you can access elements by index if you need to:

$ jq -r '.Regions[0].RegionName' regions
eu-north-1
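
A couple of related array tricks worth knowing, assuming the same regions file as above: jq can count the elements with length and take slices with [start:end].

$ jq '.Regions | length' regions
3
$ jq -r '.Regions[0:2][].RegionName' regions
eu-north-1
ap-south-1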

Those are some examples of how to select elements using jq, but it is MUCH more powerful than that. For example, jq has conditional structures like if/else. Let’s see how many instances are stopped in the ap-south-1 region:

$ for region in `jq -r '.Regions[1].RegionName' regions`;  do  echo $region; aws ec2 describe-instances --region $region | jq 'if .Reservations[].Instances[].State.Name == "stopped" then "Stopped" else empty end' ; done
ap-south-1
"Stopped"

Currently, Amazon does not charge compute costs for stopped instances, but that could change one day, and being able to simply iterate through your instances to find wasted billing may reduce your costs. I’ve used a similar method to find instances with things like “test” or people’s names in them. Those are almost always forgotten EC2 instances spinning away, eating up costs, and pruning them is part of a prudent billing management process.
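
I won’t reproduce that exact audit here, but a sketch of it might look something like the following - the Name tag and the “test” pattern are illustrative assumptions about how your instances happen to be tagged:

$ aws ec2 describe-instances --region ap-south-1 \
    | jq -r '.Reservations[].Instances[]
             | select(.Tags[]? | .Key == "Name" and (.Value | test("test"; "i")))
             | .InstanceId'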

Like any working sysadmin, I learn just enough about any given tool to scratch my current itch. Because of that, I can’t hope to touch on all the things jq can do - the sheer breadth of its capabilities is amazing. There is also a very good support ecosystem around it, from the official jq manual and tutorial to plenty of community examples.
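
As one last taste of that breadth, here’s a one-liner (again, just a sketch against the same region as above) that summarizes instances by state using group_by and map. The output is an array of {state, count} objects, one per distinct instance state:

$ aws ec2 describe-instances --region ap-south-1 | jq '[.Reservations[].Instances[]] | group_by(.State.Name) | map({state: .[0].State.Name, count: length})'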

I hope that is enough info to pique your interest in jq. Shell scripts are the workhorse of any Linux farm because they have almost zero dependencies, so there’s a high degree of “write once, use many” success.
