Read YAML input file in BASH, C/C++ and Python

Utpal Kumar   3 minute read      

YAML is a data serialization format designed to be easy for humans to read and for machines to parse. It serves the same role as JSON and XML but is typically less verbose (no closing tags or heavy punctuation). In this post, we will see how to read a YAML file in Bash, C/C++ and Python.

The one mental model

The same people.yml — a set of key/value pairs — can be parsed in any language; the effort just differs. Python (PyYAML) is nearly free, Bash leans on sed/awk text-munging, and C needs the libyaml library and manual node walking. Same data out, very different work in.

[Figure: Reading one YAML file three ways. people.yml (key: value pairs) is parsed by Bash (sed + awk), C (libyaml), and Python (PyYAML), all yielding the same data: name, age, gender…]
One file, three parsers, identical key/value data — the difference is how much code each takes.

Create a simple YAML file

Let us first create a simple YAML file to read later using the Bash, C and Python scripts, and save it as people.yml.

name: "Andrew" 
age: 22 
gender: "M"
country: "USA"

Using Bash

Now, let us read people.yml using a Bash script. This script was obtained from the GitHub gist of pkuczynski (see references).


function parse_yaml {
   local prefix=$2
   local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
   sed -ne "s|^\($s\):|\1|" \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p"  $1 |
   awk -F$fs '{
      indent = length($1)/2;
      vname[indent] = $2;
      for (i in vname) {if (i > indent) {delete vname[i]}}
      if (length($3) > 0) {
         vn=""; for (i=0; i<indent; i++) {vn=(vn)(vname[i])("_")}
         printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, $2, $3);
      }
   }'
}


# Quote the expansions so values containing spaces are not word-split
eval "$(parse_yaml people.yml)"
echo "$name"
echo "$country"

For each key/value pair in the YAML file, the script creates a Bash variable named after the key and assigns it the corresponding value. Nested keys are flattened, with the parent names joined by underscores.
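To make the sed/awk pipeline easier to follow, here is a rough Python sketch of the same idea: match each `key: value` line, strip one pair of surrounding quotes, track parent keys by indentation (two spaces per level, as the awk code assumes), and emit a shell-style assignment. The function name `yaml_to_shell_assignments` is hypothetical, and this is a simplification of the gist's behavior, not a port of it.

```python
import re

# One `key: value` line: leading indent, key, and the trimmed value.
LINE_RE = re.compile(r'^(\s*)([A-Za-z0-9_]+)\s*:\s*(.*?)\s*$')

def yaml_to_shell_assignments(text, prefix=""):
    """Turn simple `key: value` YAML into `name="value"` strings."""
    parents = {}   # indentation level -> key seen most recently at that level
    out = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        indent = len(m.group(1)) // 2          # assumes 2-space indentation
        key, value = m.group(2), m.group(3)
        # Strip one pair of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parents[indent] = key
        # Forget keys that belong to deeper, already finished levels.
        for level in [k for k in parents if k > indent]:
            del parents[level]
        if value:
            path = "".join(parents.get(i, "") + "_" for i in range(indent))
            out.append(f'{prefix}{path}{key}="{value}"')
    return out

sample = 'name: "Andrew"\nage: 22\ngender: "M"\ncountry: "USA"\n'
for assignment in yaml_to_shell_assignments(sample):
    print(assignment)
```

Like the Bash version, this only understands flat or simply nested `key: value` lines; lists, multi-line strings and anchors are out of reach for this kind of line-by-line text matching.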

Using C/C++

Reading YAML in C/C++ is slightly trickier. The two most detailed blog articles I found on reading YAML files in C are listed in the references. I recommend that interested readers go through those articles.

#include <stdio.h>
#include <yaml.h>
#include <assert.h>

int main()
{
    FILE *file;
    yaml_parser_t parser;
    yaml_document_t document;
    yaml_node_t *node;
    int i = 1;

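    file = fopen("people.yml", "rb");
    if (!file) {
        perror("people.yml");
        return 1;
    }

    // Called outside assert() so the call is not compiled out under NDEBUG
    if (!yaml_parser_initialize(&parser)) {
        fclose(file);
        return 1;
    }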

    yaml_parser_set_input_file(&parser, file);

    if (!yaml_parser_load(&parser, &document)) {
        goto done;
    }

    // iterate through each node
    while(1) {
        node = yaml_document_get_node(&document, i);
        if(!node) break;
        if(node->type == YAML_SCALAR_NODE) {
            if (node->data.scalar.style == YAML_PLAIN_SCALAR_STYLE) { // keys in people.yml are unquoted (plain) scalars
                printf("%s: ", node->data.scalar.value);
                i++;
                node = yaml_document_get_node(&document, i); // assumes the value is stored as the next node
                if (!node) break;
                printf("%s (%d)\n", node->data.scalar.value, node->data.scalar.style); // value, with its scalar style in parentheses
            }
        }
        i++;
    }
    yaml_document_delete(&document);


    done:
      yaml_parser_delete(&parser); // free the parser's memory
      fclose(file);

    return 0;
}

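This program uses the libyaml library, which handles both parsing and emitting YAML; the code above only parses. yaml_parser_load reads the input stream and builds a document, and the loop then walks the document's nodes by index. A plain (unquoted) scalar is treated as a key and the node after it as its value. The number printed in parentheses is the value's scalar style: 1 is plain and 3 is double-quoted.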
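The node walk in the C program has a close analogue in PyYAML, which may help readers who find the libyaml version hard to follow. The sketch below uses `yaml.compose` to build the node graph without converting it to Python objects, then visits each key/value pair of the top-level mapping. The helper name `walk_pairs` and the inline `PEOPLE_YML` string are assumptions made for illustration; PyYAML reports the style as a character (`None` for plain, `'"'` for double-quoted) rather than libyaml's integer.

```python
import yaml

# Same content as people.yml, inlined so the example is self-contained.
PEOPLE_YML = '''\
name: "Andrew"
age: 22
gender: "M"
country: "USA"
'''

def walk_pairs(text):
    """Return (key, value, style) for each scalar pair in a top-level mapping."""
    root = yaml.compose(text, Loader=yaml.SafeLoader)
    pairs = []
    if isinstance(root, yaml.MappingNode):
        # A mapping node stores its content as a list of (key_node, value_node).
        for key_node, value_node in root.value:
            if isinstance(key_node, yaml.ScalarNode) and isinstance(value_node, yaml.ScalarNode):
                pairs.append((key_node.value, value_node.value, value_node.style))
    return pairs

for key, value, style in walk_pairs(PEOPLE_YML):
    print(f"{key}: {value} (style={style!r})")
```

At the node level every scalar value is still a string (`"22"`, not `22`); the conversion to an integer only happens when a loader constructs Python objects.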

To compile and run the script:

>> gcc read_yaml.c -o read_yaml -lyaml
>> ./read_yaml
name: Andrew (3)
age: 22 (1)
gender: M (3)
country: USA (3)
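The numbers in parentheses are values of libyaml's `yaml_scalar_style_t` enum. As a reading aid, the following Python sketch mirrors that enum and decodes one line of the program's output. The names `ScalarStyle` and `describe` are hypothetical helpers, not part of libyaml.

```python
from enum import IntEnum

class ScalarStyle(IntEnum):
    """Mirror of libyaml's yaml_scalar_style_t, in declaration order."""
    ANY = 0
    PLAIN = 1
    SINGLE_QUOTED = 2
    DOUBLE_QUOTED = 3
    LITERAL = 4
    FOLDED = 5

def describe(line):
    """Split a 'key: value (style)' output line into (key, value, ScalarStyle)."""
    text, _, style = line.rpartition(" (")
    key, _, value = text.partition(": ")
    return key, value, ScalarStyle(int(style.rstrip(")")))

for line in ["name: Andrew (3)", "age: 22 (1)"]:
    key, value, style = describe(line)
    print(f"{key} = {value!r}, written as a {style.name} scalar")
```

So "Andrew" was double-quoted in people.yml, while 22 was written without quotes.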

Using Python

Reading YAML files is straightforward in Python: the PyYAML library parses a file directly into native Python objects.

import yaml

def read_yml(ymlfile):
    with open(ymlfile) as file:
        out_dict = yaml.load(file, Loader=yaml.FullLoader)
    return out_dict

people_dict = read_yml(ymlfile="people.yml")
print(people_dict)

>> python read_yml.py
{'name': 'Andrew', 'age': 22, 'gender': 'M', 'country': 'USA'}

Tip: for untrusted input, prefer yaml.safe_load(file) over yaml.load(..., Loader=FullLoader). safe_load refuses to construct arbitrary Python objects, avoiding a class of code-execution risks; for a simple key/value file like this it returns the exact same dict.
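A minimal sketch of the tip above, assuming PyYAML is installed. The wrapper `load_people` and the inline `PEOPLE_YML` string are hypothetical names used only for this illustration. It parses the same content with `yaml.safe_load` and shows that a document carrying a Python-specific object tag is rejected with a `yaml.YAMLError` instead of being executed.

```python
import yaml

# Same content as people.yml, inlined so the example is self-contained.
PEOPLE_YML = '''\
name: "Andrew"
age: 22
gender: "M"
country: "USA"
'''

def load_people(text):
    """Parse YAML text with the safe loader; return None if it is rejected."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as err:
        print(f"could not parse YAML: {err}")
        return None

people = load_people(PEOPLE_YML)
print(people)
print(type(people["age"]).__name__)   # unquoted 22 is loaded as an integer

# A document that asks for a Python callable to be invoked is refused.
hostile = "!!python/object/apply:os.getcwd []"
print(load_people(hostile))
```

Catching `yaml.YAMLError` covers both malformed YAML and tags that the safe loader does not recognize, so one handler is enough for a small script like this.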

Check your understanding

What does PyYAML's yaml.safe_load return for people.yml?

Recap

Without scrolling up — how do the three compare? To read the same people.yml:

  • Python (PyYAML): yaml.safe_load(file) → a dict. Easiest by far.
  • Bash: a sed + awk function turns each key: value into a shell variable.
  • C (libyaml): open the file, load the document, and walk scalar nodes by hand.

Pick the tool that matches where your program lives — but for anything non-trivial, a real YAML library (PyYAML, libyaml) beats hand-rolled text parsing.

References

  1. pkuczynski/parse_yaml.sh — the Bash YAML parser used above.
  2. Parsing YAML files in C with libyaml — Andrew (wpsoftware.net).
  3. YAML documents parsing with libyaml in C — Stas Kobzar.


