Automating Templated JSON Fuzzing / Unit Testing

May 29, 2022 •

• ☕️☕️☕️☕️ 20 min read

Introduction

JSON (JavaScript Object Notation) is a widely used lightweight data-interchange format. It is commonly used to share data between decoupled components / systems, store data persistently, and import / export data in / out of systems. From a security and functional testing perspective, verifying how applications handle JSON structures is important. Weak JSON validation can result in abrupt failures, race conditions, and can have serious security consequences depending on the context / use case.

JSON structures are a collection of unstructured (schemaless) key-value pairs and arrays. The unstructured nature of JSON introduces additional validation complexity in application logic. For example, a JSON structure may be valid JSON but may not contain the key-value attributes / structure you are expecting. Ultimately this results in most applications centralising JSON deserialisation logic, and custom implementing attribute validation logic on a per use case basis. The lack of centralisation for validation logic increases the likelihood of something going wrong. Likewise, it also increases the complexity of remediating issues when something goes wrong.

Testing JSON validation is context specific and delivers the most value when you have a known good sample, or template to work from. For example, if an application expects the key username and only parses this attribute, testing random key attributes will provide little to no value (unless you’re targeting the JSON parser itself). The likelihood of identifying relevant key attributes without a Corpus / template is low, especially in complex nested JSON structures. The favourable alternative for testing application logic is to use a known good JSON structure as a template. You can then use the template to manipulate key-attribute to test how the application handles corner cases.

Creating and testing JSON permutation manually is prone to human error, inconsistent, and generally tedious. This post presents a JSON parsing algorithm in Python that generates permutations of JSON structures that can be used to automate testing / general purpose fuzzing activities.

Background

The majority of modern programming languages have inbuilt support for serialising / deserialising JSON data. This post focuses on Python because it is:

Widely used and re-usable
A Dynamic programming language
Has inbuilt support for JSON through the JSON library

Point 2 and 3 are important from an implementation perspective because it reduces the complexity / dependency graph. The JSON library in point 3 provides a mechanism to deserialise / serialise JSON data, this means JSON objects can be converted to Python native objects and vise versa. The focus of this post will be the deserialisation function json.loads(), and the resulting Python Data Structures generated. To demonstrate this, the following example shows how to deserialise JSON structures and access attributes.

import json

json_string = """
{
    "website": "nikola.dev"
}
"""

# Output: JSON string is of type <class 'str'>
print(f"JSON string is of type {type(json_string)}")

deserialised_json = json.loads(json_string)

# Output: Deserialised JSON is of type <class 'dict'>
print(f"Deserialised JSON is of type {type(deserialised_json)}")

# Output: Website is: nikola.dev
print(f"Website is: {deserialised_json.get('website')}")

The same concept applies for nested JSON structures as can be seen below. Note that the dict and list structures are Python native data structures. So a list can be accessed via an index, and a dict can be accessed via a key.

import json

json_string = """
{
    "array_example": [
        1,
        2
    ],
    "dict_example": {
        "key": "value"
    }
}
"""

# Output: JSON string is of type <class 'str'>
print(f"JSON string is of type {type(json_string)}")

deserialised_json = json.loads(json_string)

# Output: Array example is of type <class 'list'>
print(f"Array example is of type {type(deserialised_json.get('array_example'))}")


# Output: Entry in array example is of type <class 'int'>
print(f"Entry in array example is of type {type(deserialised_json.get('array_example')[0])}")

# Output: Dict example is of type <class 'dict'>
print(f"Dict example is of type {type(deserialised_json.get('dict_example'))}")

# Output: Entry in dict example is of type <class 'str'>
print(f"Entry in dict example is of type {type(deserialised_json.get('dict_example').get('key'))}")

Requirements

There are three requirements for the solution: 1) parse structures recursively, 2) be granular, and 3) be context aware. The first point means the algorithm should handle complex / nested structures, ensuring the algorithm is versatile and provides complete coverage. The second point means the solution should only change one key attribute in a template at a time. The third point means the algorithm should know what attribute it is changing, and how to navigate to that attribute in a structure independently. To demonstrate the importance of point 2 and 3, consider the following basic JSON object:

{
    "id": "123",
    "name": "jane"
}

A simple parsing algorithm may deserialise the JSON object and change every key’s value to PAYLOAD as a test case.

{
    "id": "PAYLOAD",
    "name": "PAYLOAD"
}

Let’s say the receiving application consuming the modified JSON object requires a valid id value. In the example above, the receiving application would not exercise any logic relating to the name parameter because the id key is not valid. The application would see the id value is PAYLOAD, not 123, and return an error immediately. Only changing one key attribute at a time is important to maximise application code reach (control flow depth). Splitting the test case above into two test cases maximises code reach.

{
    "id": "PAYLOAD",
    "name": "jane"
}

{
    "id": "123",
    "name": "PAYLOAD"
}

Design

The proposed solution is to split the problem into two parts: JSON object parsing, and attribute modification. The path_finder component takes a JSON object and maps out the paths to each key-attribute using the Depth First Search algorithm. The injector component takes the paths for each parameter and uses it to walk the JSON structure to manipulate a value. Decoupling the JSON structure parsing logic from the attribute modification logic has three main advantages:

Maintainability - Reduces code complexity making the solution easier to debug, and easier to unit test.
Traceability - Enables surrounding logic to intelligently understand what was changed in a JSON template to achieve a given output.
Extensibility - Simplifies parent child relationship mappings. For example, path a/b/c tells us that c is a child of b which is a child of a. We can use this to test logical permutations such as a/b, ignoring c, or just a by itself, ignoring b/c.

The disadvantage of decoupling the structure parsing logic from the attribute modification logic is reduced performance. The parser has to walk JSON objects multiple times, reducing the overall efficiency of the solution. The JSON parser’s runtime complexity is O(V+E), and injector is O(n). Despite this, the advantages still outweigh the disadvantages in this use case.

To demonstrate the high level design of path_finder and injector from end to end let’s walk through an example. The following structure has 6 primitive values. 4/6 are visualised for demo purposes hobbies/0/priority and hobbies/1/name are not visualised to avoid overloading the diagram. path_finder should generate paths to each primitive value in the structure, these paths are referred to as parameter_paths.

                                         +----------------+
                                         |                |
                                         | [              |
                           +------------>|     "username" |
                           |             | ]              |
                           |             |                |
                           |             +----------------+
                           |
                           |             +----------------+
                           |             |                |
                           |             | [              |
{                          |             |     "hobbies", |
    "username": "Jane", ---+         +-->|     0,         |
    "hobbies": [                     |   |     "name"     |
        {                            |   | ]              |
            "name": "climbing", -----+   |                |
            "priority": "high"           +----------------+
        },
        {                                +----------------+
            "name": "skating",           |                |
            "priority": "medium" ---+    | [              |
        }                           |    |     "hobbies", |
    ],                              +--->|     1,         |
    "job": "Physicist" ---+              |     "priority" |
}                         |              | ]              |
                          |              |                |
                          |              +----------------+
                          |
                          |              +------------+
                          |              |            |
                          |              | [          |
                          +------------->|     "job"  |
                                         | ]          |
                                         |            |
                                         +------------+

injector should use the parameter_paths to manipulate the structure. For example, injector could use the parameter path hobbies/1/name in the structure above to replace the value skating to PAYLOAD. Note that this should be the only value that is modified in the template JSON object.

{
    "username": "Jane",
    "hobbies": [
        {
            "name": "climbing",
            "priority": "high"
        },
        {
            "name": "PAYLOAD",
            "priority": "medium"
        }
    ],
    "job": "Physicist"
}

Path Finder Implementation

path_finder generates a list of lists that contains the path to each primitive value in a deserialised JSON structure. A path to a key attribute in a structure is known as a parameter_path. To demonstrate this, below is an example of the parameter paths generated for a basic JSON structure. Note, within the algorithm the result is represented as [["name], ["age]], not as two separate lists as shown below for demonstration purposes.

                          +-------------+
                          |             |
                          | [           |
                      +-->|     "name"  |
+------------------+  |   | ]           |
|                  |  |   |             |
| {                +--+   +-------------+
|   "name":"test", |
|   "age": "100"   |
| }                +--+   +-------------+
|                  |  |   |             |
+------------------+  |   | [           |
                      +-->|     "age"   |
                          | ]           |
                          |             |
                          +-------------+

Implementing path_finder in Python is straightforward using the Depth First Search algorithm. Python’s dynamically typed nature means we can have a single function that accepts a deserialised JSON structure and recursively evaluates the result. The function determines if an input is a list or a dict, depending on which it is, the function either invokes a list handler, or a dict handler. If it is neither, we know we have hit the bottom of the nested structure (a Python primitive value). The list and dict handlers iterate on the input structure and recursively call the original function for each index / entry. Programmatically this looks like:

def map_structure(structure, stack=None, depth=0):

    # call the list handler
    if isinstance(structure, dict):
        _dict_handler(input_dict=structure, stack=stack, depth=depth)  

     # call the list handler
    if isinstance(structure, list):
        _list_handler(input_list=structure, stack=stack, depth=depth)

    return stack  # return the stack of parameters used to get to the primitive

The list and dict handlers are almost identical, the only difference is that the list handler iterates on an index and the dict handler iterates on a key. For demonstration purposes let’s look at the dict handler:

def _dict_handler(input_dict, stack=None, depth=0):
    # Check if there are any parent parameters, if not this is the first level
    if stack is None:
        stack = []

    parameter_paths = []  # List to store completed parameter paths

    for key, value in input_dict.items():  # Iterate on each key attribute 
        stack = stack[:depth]  # Remove non-applicable keys for variable length children
        stack.append(key)  # append current key to parameter path chain 

        # Call the previous function for each key in the structure
        result = self.map_structure(structure=value, stack=stack, depth=depth + 1)

        # We either extend / append depending on if the value is a primitive or list/dict.
        # This is done to avoid multi depth structures returning nested list / dict results
        if not isinstance(value, list) and not isinstance(value, dict):
            parameter_paths.append(result)
        else:
            parameter_paths.extend(result)

    return parameter_paths  # return the completed parameter path

The full source for all of the path_finder functions can be found in path_finder.py. To demonstrate how these functions work together, I’ve included log statements at the beginning and end of the map_structure() function. As well as log statements at the end of the _list_handler and _dict_handler functions. Consider the following structure. Note the structure has complex relationships with variable length / type children.

{
    "name": "blah",
    "hobbies": [
        "climbing",
        [
            "skating"
        ],
        "walking"
    ]
}

Observe how the list and dict handlers recursively call the map_structure() function, building a stack of parameters used to navigate to a primitive value.

Starting - structure: {'name': 'blah', 'hobbies': ['climbing', ['skating'], 'walking']} - stack: None - depth: 0
Starting - structure: blah - stack: ['name'] - depth: 1
Done - ['name']
Starting - structure: ['climbing', ['skating'], 'walking'] - stack: ['hobbies'] - depth: 1
Starting - structure: climbing - stack: ['hobbies', 0] - depth: 2
Done - ['hobbies', 0]
Starting - structure: ['skating'] - stack: ['hobbies', 1] - depth: 2
Starting - structure: skating - stack: ['hobbies', 1, 0] - depth: 3
Done - ['hobbies', 1, 0]
List handler done: [['hobbies', 1, 0]]
Starting - structure: walking - stack: ['hobbies', 2] - depth: 2
Done - ['hobbies', 2]
List handler done: [['hobbies', 0], ['hobbies', 1, 0], ['hobbies', 2]]
Dict handler done:  [['name'], ['hobbies', 0], ['hobbies', 1, 0], ['hobbies', 2]]

Final result: [['name'], ['hobbies', 0], ['hobbies', 1, 0], ['hobbies', 2]]

This approach scales with complex nested JSON structures. path_finder recursively traverses nested structures using Depth First Search, meaning even unusual nested structures are handled elegantly.

                                                            +--------+
                                                            |        |
                                                            | [      |
                                                            |     0, |
                                                            |     0, |
                                                   +------->|     0, |
                                                   |        |     0, |
                                                   |        |     0  |
                                                   |        | ]      |
+-----------------------------------------+        |        |        |
|                                         |        |        +--------+
| [                                       |        |
|     [                                   +--------+        +-----------+
|         [                               |                 |           |
|             [                           |                 | [         |
|                 [                       |                 |     0,    |
|                     "seriously",        |                 |     0,    |
|                     {                   |                 |     0,    |
|                         "why": "though" +---------------->|     0,    |
|                     },                  |                 |     1,    |
|                     "jeez"              |                 |     "why" |
|                 ]                       |                 | ]         |
|             ]                           |                 |           |
|         ]                               |                 +-----------+
|     ]                                   +--------+
| ]                                       |        |        +--------+
|                                         |        |        |        |
+-----------------------------------------+        |        | [      |
                                                   |        |     0, |
                                                   |        |     0, |
                                                   +------->|     0, |
                                                            |     0, |
                                                            |     2  |
                                                            | ]      |
                                                            |        |
                                                            +--------+

Injector Implementation

The parameter paths generated by path_finder can then be used by injector to manipulate a template (the deserialised JSON structure). This post focuses on three types of manipulations categorised by test scenarios: modifying, removing, and reordering key-attributes. For simplicity, these three scenarios are grouped into two concepts algorithmically: generating parameter permutations (modifying key-attributes), and structural permutations (removing and reordering key-attributes). This section demonstrates these concepts with practical examples, and talks about the implementation.

Parameter Permutations

Parameter permutations refers to modifying a key-attribute parameter. For example, changing a key-attribute from one value to another. This is the most common use case for most general purpose unit testing and fuzzing. To demonstrate this, the structure below will generate 3 parameter permutations for each parameter with the payload PAYLOAD. Note that injector only modifies only one parameter at a time and walks the template using the parameter path generated by path_finder.

                                                   +---------------------------------+
                                                   |                                 |
                                                   | {                               |
                                                   |     "hobbies": [                |
                                                   |         {                       |
                                                   |             "name": "PAYLOAD",  |
                                                   |             "gear": [           |
                                         +-------->|                 "helmet",       |
                                         |         |                 "roller blades" |
                                         |         |             ]                   |
                                         |         |         }                       |
                                         |         |     ]                           |
                                         |         | }                               |
                                         |         |                                 |
                                         |         +---------------------------------+
                                         |
+---------------------------------+      |         +---------------------------------+
|                                 |      |         |                                 |
| {                               |      |         | {                               |
|     "hobbies": [                +------+         |     "hobbies": [                |
|         {                       |                |         {                       |
|             "name": "skating",  |                |             "name": "skating",  |
|             "gear": [           |                |             "gear": [           |
|                 "helmet",       +--------------->|                 "PAYLOAD",      |
|                 "roller blades" |                |                 "roller blades" |
|             ]                   |                |             ]                   |
|         }                       |                |         }                       |
|     ]                           +-------+        |     ]                           |
| }                               |       |        | }                               |
|                                 |       |        |                                 |
+---------------------------------+       |        +---------------------------------+
                                          |
                                          |        +--------------------------------+
                                          |        |                                |
                                          |        | {                              |
                                          |        |     "hobbies": [               |
                                          |        |         {                      |
                                          |        |             "name": "skating", |
                                          |        |             "gear": [          |
                                          +------->|                 "helmet",      |
                                                   |                 "PAYLOAD"      |
                                                   |             ]                  |
                                                   |         }                      |
                                                   |     ]                          |
                                                   | }                              |
                                                   |                                |
                                                   +--------------------------------+

From an implementation perspective, path_finder has already done the hard work. injector simply walks the JSON structure using the parameter paths. This takes the form of a glorified for loop with some deep copying logic to avoid unintentionally changing template values. Python is particularly useful here because the same logic works for both dict and list data types, see the snippet below. Note, the full source for this function can be found in modify_attribute_in_structure_by_path().

target_dict = copy.deepcopy(structure)

current = target_dict
for index, k in enumerate(path):

    # We have not reached our destination yet, keep digging
    if index + 1 != len(path):
        current = current[k]  # Update current reference 
        continue

    # We hit the target attribute
    current[k] = "VALUE_WE_WANT_TO_INJECT" 
    break

return target_dict

Structural Permutations

Structural permutations are split into two parts, removing key-attributes, and re-ordering key-attributes. Both of these concepts are grouped together under structural permutations because they focus on manipulating the structure of a deserialised JSON object. Ultimately, both removing and re-ordering key attributes have the same end goal of generating output that can be used to test how an application validates the structure of a JSON object. Unlike parameter permutations which focus more on exercising an application’s input validation logic.

From a test case perspective, structural permutations are particularly useful for complex JSON structures. This is because multiple parent child key-attribute relationships are: 1) harder to validate, and 2) harder to comprehensively test. This is a great way to validate how robust an application’s JSON key-attribute structural validation is.

Removing key attributes

Removing key attributes generates permutations of a structure with each point of the chronological parameter path removed. For example, the path a/b/c would generate three permutations. a/b removing c, a removing b/c, and None removing a/b/c. To demonstrate this, the structure below will generate 6 missing attribute permutations, one for each logical attribute.

Pro tip: it’s easier to read the diagram bottom up.

                                                       +----+
                                                       |    |
                                       +-------------->| {} |
                                       |               |    |
                                       |               +----+
                                       |
                                       |               +-------------------+
                                       |               |                   |
                                       |               | {                 |
                                       |     +-------->|     "hobbies": [] |
                                       |     |         | }                 |
                                       |     |         |                   |
                                       |     |         +-------------------+
                                       |     |
                                       |     |         +---------------------------------+
                                       |     |         |                                 |
                                       |     |         | {                               |
                                       |     |         |     "hobbies": [                |
                                       |     |         |         {                       |
                                       |     |         |             "gear": [           |
                                       |     |         |                 "helmet",       |
                                       |     |    +--->|                 "roller blades" |
                                       |     |    |    |             ]                   |
                                       |     |    |    |         }                       |
                                       |     |    |    |     ]                           |
                                       |     |    |    | }                               |
                                       |     |    |    |                                 |
+---------------------------------+    |     |    |    +---------------------------------+
|                                 -----+     |    |
| {                               |          |    |    +-------------------------------+
|     "hobbies": [                -----------+    |    |                               |
|         {                       |               |    | {                             |
|             "name": "skating",  ----------------+    |     "hobbies": [              |
|             "gear": [           |                    |         {                     |
|                 "helmet",       -------------------->|             "name": "skating" |
|                 "roller blades" |                    |         }                     |
|             ]                   -----------------+   |     ]                         |
|         }                       |                |   | }                             |
|     ]                           ----------+      |   |                               |
| }                               |         |      |   +-------------------------------+
|                                 |         |      |
+---------------------------------+         |      |   +---------------------------------+
                                            |      |   |                                 |
                                            |      |   | {                               |
                                            |      |   |     "hobbies": [                |
                                            |      |   |         {                       |
                                            |      |   |             "name": "skating",  |
                                            |      |   |             "gear": [           |
                                            |      +-->|                 "roller blades" |
                                            |          |             ]                   |
                                            |          |         }                       |
                                            |          |     ]                           |
                                            |          | }                               |
                                            |          |                                 |
                                            |          +---------------------------------+
                                            |
                                            |          +--------------------------------+
                                            |          |                                |
                                            |          | {                              |
                                            |          |     "hobbies": [               |
                                            |          |         {                      |
                                            |          |             "name": "skating", |
                                            |          |             "gear": [          |
                                            +--------->|                 "helmet"       |
                                                       |             ]                  |
                                                       |         }                      |
                                                       |     ]                          |
                                                       | }                              |
                                                       |                                |
                                                       +--------------------------------+

Note that there is automatic deduplication built in to avoid duplicate tests. For example, without deduplication the structure below would generate four missing payloads for a and b because they both have the same parent. The diagram below demonstrates that only 3 are generated.

                                   +----+
                                   |    |
                              +--->| {} | <--Would generate this twice without deduplication
                              |    |    |
                              |    +----+
                              |
+------------------------+    |    +------------------------+
|                        |    |    |                        |
| {                      +----+    | {                      |
|     "deduplication": [ |         |     "deduplication": [ |
|         "a",           +-------->|         "b"            |
|         "b"            |         |     ]                  |
|     ]                  +---+     | }                      |
| }                      |   |     |                        |
|                        |   |     +------------------------+
+------------------------+   |
                             |     +------------------------+
                             |     |                        |
                             |     | {                      |
                             |     |     "deduplication": [ |
                             +---->|         "a"            |
                                   |     ]                  |
                                   | }                      |
                                   |                        |
                                   +------------------------+

From an implementation perspective, injector simply walks the JSON structure using the parameter path, exactly like the parameter permutations variant. The only difference is once the target attribute is reached, the value is deleted instead of replaced. The full source for this function can be found in remove_attribute_in_structure_by_path().

# Take a deep copy to avoid accidentally changing the original template
target_dict = copy.deepcopy(structure)

current = target_dict
for index, k in enumerate(path):

    # We have not reached our destination yet, keep digging
    if index + 1 != len(path):  # path is the parameter path we want to remove
        current = current[k]  # update our current reference
        continue

    del current[k]  # delete the attribute
    break

return target_dict

Reordering key attributes

Reordering key-attributes generates permutations for a structure with the payload at each point in the chronological parameter path. For example, the path a/b/c would generate two permutations. One for a/b, ignoring c, and one for a, ignoring b/c. It is not uncommon for applications to assume b/c exists if a exists, or c exists if a/b exists. To demonstrate this, the structure below will generate 3 permutations for each parent in the parameter path with the payload STRUCTURE PAYLOAD.

                                                   +-----------------------------------------+
                                                   |                                         |
                                                   | {                                       |
                                                   |     "hobbies": [                        |
                                                   |         {                               |
                                         +-------->|             "name": "skating",          |
                                         |         |             "gear": "STRUCTURE PAYLOAD" |
                                         |         |         }                               |
                                         |         |     ]                                   |
                                         |         | }                                       |
+---------------------------------+      |         |                                         |
|                                 |      |         +-----------------------------------------+
| {                               |      |
|     "hobbies": [                +------+         +-----------------------------+
|         {                       |                |                             |
|             "name": "skating",  |                | {                           |
|             "gear": [           |                |     "hobbies": [            |
|                 "helmet",       +--------------->|         "STRUCTURE PAYLOAD" |
|                 "roller blades" |                |     ]                       |
|             ]                   |                | }                           |
|         }                       |                |                             |
|     ]                           +-------+        +-----------------------------+
| }                               |       |
|                                 |       |        +------------------------------------+
+---------------------------------+       |        |                                    |
                                          |        | {                                  |
                                          +------->|     "hobbies": "STRUCTURE PAYLOAD" |
                                                   | }                                  |
                                                   |                                    |
                                                   +------------------------------------+

Note that there is automatic deduplication built in to avoid duplicate tests. For example, the structure below would generate two structural payloads for a and b as they both have the same parent. The diagram below demonstrates that only one is generated.

+------------------------+
|                        |      +------------------------------------------+
| {                      |      |                                          |
|     "deduplication": [ |      | {                                        |
|         "a",           +----->|     "deduplication": "STRUCTURE PAYLOAD" |
|         "b"            |      | }                                        |
|     ]                  |      |                                          |
| }                      |      +------------------------------------------+
|                        |
+------------------------+

From an implementation perspective, reordering key attributes uses the exact same function as the parameter permutations variant. The difference is what is targeted by the modify_attribute_in_structure_by_path() function. Instead of targeting a full parameter path from path_finder, we slice the path one index off at a time and inject a value at that point. The full source can be found in generate_structure_payloads_by_path(), this looks like:

def generate_structure_payloads_by_path(structure, path, value_to_inject):
    return[
            self.modify_attribute_in_structure_by_path(
                structure=structure,
                path=path[:-index],  # Cut the last key off, don't fuzz the primitive
                value_to_inject=value_to_inject,
            )
            for index, _ in enumerate(path)
            if path[:-index] != list()  # Avoid redundant payloads when there is nothing to change
    ]

Conclusion

This post practically demonstrates implementing the Depth First Search algorithm to map out a JSON structure, and then shows how the parameter paths can be used to generate JSON structure permutations. The permutations include three test cases: modifying key-attributes, removing key-attributes, and reordering key-attributes. The full source code with examples can be found on Github, see templated-json-fuzzer. Helper functions are exposed to practically use parameter permutations (modifying key-attributes), and structural permutations (removing key-attributes and reordering key-attributes), see here in the templated-json-fuzzer source.

The permutations generated using the algorithm presented in this post can be plugged into any system. This can be used to automate unit tests, general purpose fuzzing, and more. I’ve used this in the past to test web API user input parsing logic, file parsing for thick client applications, and automate unit tests.

Happy hacking!