JSON From the Command Line

I want to share a command-line tool I have recently learned: jq. It provides a fast way to inspect and transform JSON files, especially when they become very large. I want to give a quick overview of how it works and then show some examples.

Giving jq input and output

jq works as a filter, which means that it takes input and gives you output. The handiest way to work with it is via redirects and pipes. In the simplest case, if your input comes from a file, no redirect is necessary:

jq --some-jq-flag myfile.json

This will take myfile.json as input and print your output to stdout. Say you want to redirect that output to a new file?

jq --some-jq-flag myfile.json > my_modified_file.json

(You may be tempted to cat the file into jq, but this is a frowned-upon pattern called useless use of cat, or UUOC.)

What if your input comes from a grep or curl command or some other process?

curl https://my-website | jq --some-jq-flag > my_modified_file.json

Finally, piping the jq output into another command (let’s see how many lines our output has) works as expected:

curl https://my-website | jq --some-jq-flag | wc -l
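To make the patterns above concrete, here is a small sketch you can run as-is. The sample file contents are made up for the demonstration, printf stands in for curl or grep, and the identity filter '.' stands in for --some-jq-flag.

```shell
# Hypothetical sample data, created only for this demonstration.
workdir=$(mktemp -d)
printf '{"name":"jq","kind":"filter"}\n' > "$workdir/myfile.json"

# 1. Input file passed as an argument (no redirect needed).
from_arg=$(jq '.' "$workdir/myfile.json")

# 2. Input file supplied on stdin via a redirect.
from_stdin=$(jq '.' < "$workdir/myfile.json")

# 3. Input piped in from another process (printf stands in for curl or grep).
from_pipe=$(printf '{"name":"jq","kind":"filter"}\n' | jq '.')

# jq output piped onward into another command; tr strips the padding
# that some wc implementations add.
line_count=$(jq '.' "$workdir/myfile.json" | wc -l | tr -d ' ')

echo "$from_arg"
echo "lines: $line_count"
```

All three ways of feeding jq should give identical output, so which one you pick mostly comes down to where the data already lives.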

But What Does It Do?

Pretty Print

Perhaps one of its most common and easy-to-see uses is to “pretty print” JSON data. We can tell jq to use the identity filter (.) (i.e. do not alter the data) and it will pretty print by default. This is accomplished via the following:

some_input | jq '.'

A useful way to quickly pretty print the output of, say, a Python script is to have the script just “print” the data, redirect that output to a file, and then run the file through jq, e.g.:

my_script_that_prints_json.py > tmp.json
jq '.' < tmp.json
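Here is a minimal sketch of what pretty printing does to the data. The compact JSON string is a made-up example standing in for whatever your script prints.

```shell
# Hypothetical compact JSON, standing in for the output of a script.
compact='{"user":{"id":1,"tags":["a","b"]}}'

# The identity filter leaves the data alone; jq indents it by default.
pretty=$(printf '%s\n' "$compact" | jq '.')
echo "$pretty"

# Compacting the pretty output again gives back the original text,
# which shows that only the whitespace changed.
roundtrip=$(printf '%s\n' "$pretty" | jq -c '.')
echo "$roundtrip"
```

The first echo spreads the object over several indented lines, and the second collapses it back to a single line.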

Array to Non-array and Back

Say someone has given you a JSON file that contains many objects, but they are not comma separated and have no brackets to identify them as an array (this happens often). You can read the file in “slurp” mode with the identity filter and jq will create an array out of your objects.

jq -s '.' bracketless_file.json > array_of_objects.json
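A small sketch of slurp mode in action, assuming a made-up input file with three tiny objects:

```shell
# Hypothetical input: three objects, one after another, no commas or brackets.
workdir=$(mktemp -d)
printf '{"id":1}\n{"id":2}\n{"id":3}\n' > "$workdir/bracketless_file.json"

# -s (slurp) reads every object in the input into one array.
jq -s '.' "$workdir/bracketless_file.json" > "$workdir/array_of_objects.json"

# Show the result on a single line to make the brackets and commas obvious.
array_compact=$(jq -c '.' "$workdir/array_of_objects.json")
echo "$array_compact"   # [{"id":1},{"id":2},{"id":3}]
```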

Now imagine the opposite: you have your array of JSON objects but you want to “remove the array.” Switching back and forth between these forms can be useful depending on the tool with which you want to manipulate the data. You can accomplish this with the array iterator filter (.[]), which emits each element of the array as its own output:

jq '.[]' array.json > non_array.json
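As a quick sanity check that the two forms really are interchangeable, this sketch (with a made-up three-element array) unwraps the array and then slurps the result back together:

```shell
# Hypothetical input: a single top-level array of objects.
workdir=$(mktemp -d)
printf '[{"id":1},{"id":2},{"id":3}]\n' > "$workdir/array.json"

# .[] emits each element of the array as a separate JSON value.
jq '.[]' "$workdir/array.json" > "$workdir/non_array.json"

# Count the separate values by printing each on one line.
object_count=$(jq -c '.' "$workdir/non_array.json" | wc -l | tr -d ' ')

# Slurping the unwrapped objects rebuilds the original array.
roundtrip=$(jq -c -s '.' "$workdir/non_array.json")

echo "$object_count"   # 3
echo "$roundtrip"      # [{"id":1},{"id":2},{"id":3}]
```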

Converting to JSONL

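Say you need to convert your array of JSON objects into JSONL format, or “newline-delimited JSON.” A common use case for this (for me) is converting JSON into an appropriate format to be loaded into BigQuery (Google’s SQL data warehouse on GCP). The appropriate flag here is the compact flag (-c), which prints each output value on a single line. Combine it with the array iterator so that every object in the array becomes its own line (with the plain identity filter, the whole array would end up on one line):

jq -c '.[]' non_compact.json > my_new_file.jsonl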

(Don’t forget that file extensions are “meaningless” on a UNIX-style system.)
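Here is a sketch of the conversion, assuming the input is a single top-level array (the two records are made up). Iterating over the array with .[] while using -c gives one object per line. For comparison, -c with the identity filter keeps the whole array on a single line.

```shell
# Hypothetical input: a pretty-printed array of two objects.
workdir=$(mktemp -d)
printf '[\n  {"id": 1, "name": "a"},\n  {"id": 2, "name": "b"}\n]\n' > "$workdir/non_compact.json"

# One compact object per line: this is the JSONL shape.
jq -c '.[]' "$workdir/non_compact.json" > "$workdir/my_new_file.jsonl"
cat "$workdir/my_new_file.jsonl"
# {"id":1,"name":"a"}
# {"id":2,"name":"b"}

# For comparison: the identity filter keeps the array as one single line.
whole_array=$(jq -c '.' "$workdir/non_compact.json")
echo "$whole_array"   # [{"id":1,"name":"a"},{"id":2,"name":"b"}]
```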

Accessing Keys and Data

Say you have a file with an array of JSON objects and want to know how many objects are in your array. We can pipe the identity filter into jq’s built-in length function:

# Spaces are allowed if you think it is more readable:
# jq '. | length' array.json
jq '.|length' array.json
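A tiny sketch with made-up inline data. Besides counting array elements, length also works on objects, where it returns the number of keys.

```shell
# length on an array counts its elements.
array_len=$(printf '[{"id":1},{"id":2},{"id":3}]' | jq '. | length')

# length on an object counts its keys; the leading ". |" is optional.
object_len=$(printf '{"a":1,"b":2}' | jq 'length')

echo "$array_len"    # 3
echo "$object_len"   # 2
```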

Imagine you have the following data and want to know how many elements are in a specific list:

$ cat data.json
{
  "object": [
    {...},
    ...
    {...}
  ],
  "another object":
    ...
}

# Get number of elements in "object"
jq '.object|length' data.json

Accessing the value of a specific key is simple as well. Taking the same example data as above:

jq '.object' data.json

This will print out the value associated with that key!
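To try this end to end, here is a sketch with a small stand-in for data.json. The contents are invented for illustration and only mirror the shape shown above. It also shows one way to reach a key that contains a space, using bracket syntax.

```shell
# Hypothetical stand-in for data.json, mirroring the shape of the example.
workdir=$(mktemp -d)
cat > "$workdir/data.json" <<'EOF'
{
  "object": [
    {"id": 1},
    {"id": 2}
  ],
  "another object": {"note": "hypothetical"}
}
EOF

# Value stored under the "object" key (compact, for easy reading).
object_value=$(jq -c '.object' "$workdir/data.json")

# Number of elements in that list.
object_len=$(jq '.object | length' "$workdir/data.json")

# Keys with spaces need the bracket form instead of the dot shorthand.
other_value=$(jq -c '.["another object"]' "$workdir/data.json")

echo "$object_value"   # [{"id":1},{"id":2}]
echo "$object_len"     # 2
echo "$other_value"    # {"note":"hypothetical"}
```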

More information

I have barely scratched the surface of what you can do with this tool. Feel free to check out the excellent documentation on the website, and you can always run jq --help or man jq.