Smooks DFDL Filter in a Cross Domain Solution

In the cyber security space, a cross domain solution is a bridge connecting two different security domains, permitting data to flow from one domain into another while minimising the associated security risks. A filter, or more formally a verification engine, is a suggested component in a cross domain solution.

A filter inspects the content flowing through the bridge. Data failing inspection is captured for investigation by the security team. Given this brief description, I argue for the following properties in a verification engine:

Validation: syntactically and semantically validates complex data formats
Content-based routing: routes valid data to its destination while invalid (e.g., malformed or malicious) data is routed to a different channel
Data streaming: filters data of any size, which implies parsing the data as a stream and then reassembling it
Open to scrutiny: the filter’s source code should be available for detailed evaluation. Open source software, by definition, is a prime example of this property
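
To make the first three properties concrete, here is a toy sketch in plain Java. It is not part of Smooks or DFDL: the class name HeaderCheckingFilter and its method are hypothetical, and the only "validation" it performs is checking that the stream starts with the ASCII characters NITF, which is how a NITF file header begins. A real verification engine would validate the whole structure, not just the first four bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical toy filter: validates a fixed header, then streams the rest in chunks. */
final class HeaderCheckingFilter {

    // A NITF file header starts with the profile name "NITF".
    private static final byte[] MAGIC = "NITF".getBytes(StandardCharsets.US_ASCII);

    /**
     * Copies the input to the destination if the header check passes,
     * otherwise to the quarantine channel. Returns true on the valid route.
     */
    static boolean filter(InputStream in, OutputStream destination, OutputStream quarantine) {
        try {
            // Validation: read only as many bytes as the check needs.
            byte[] header = in.readNBytes(MAGIC.length);
            boolean valid = Arrays.equals(header, MAGIC);

            // Content-based routing: pick the channel from the validation result.
            OutputStream target = valid ? destination : quarantine;
            target.write(header);

            // Data streaming: a fixed-size buffer keeps memory use constant
            // regardless of how large the input is.
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                target.write(buffer, 0, read);
            }
            target.flush();
            return valid;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch decides the route after four bytes, which is only safe because the check is that shallow. A filter that validates the full content has to hold back or stage its output until validation has finished.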

Together, Smooks, with its strong support for SAX event streams and XPath-driven routing, and DFDL, with its transformation and validation features, exhibit the above properties. Picture a situation where NITF (National Imagery Transmission Format) files need to be imported from an untrusted system into a trusted one. Widely used in national security systems, NITF is a binary file format that encapsulates imagery (e.g., JPEG) and its metadata. As part of the import, a filter is needed to unpack the NITF stream, verify that it conforms to the expected format, and repack it before it is routed to its destination. Should verification fail, the bad data is put aside for human intervention. This is how such a filter is described in a Smooks config:

dfdl:parser validates the binary content streaming from the input source (i.e., the untrusted system) and converts it to an event stream that fires the pipelines. Driving the input’s validation and transformation behaviour is the XML schema nitf.dfdl.xsd, copied from the public DFDL schema repository and tweaked to route the data depending on its correctness. The first tweak is to turn invalid NITF data into hex with an InvalidData element wrapped around it like so:

The second tweak is to nest valid data within a ValidData element:

After ingestion, Smooks fires one of the following paths:

a. Happy path on encountering events that are not descendants of the InvalidData node [1]. A pipeline executes dfdl:unparser to reassemble the data in its original format, then replaces the XML execution result stream with the reassembled binary data, which is delivered to the destination (i.e., the trusted system).

b. Unhappy path on encountering the InvalidData node. This path’s pipeline emits the hex content of InvalidData to a side output resource named deadLetterStream.
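
The choice between the two paths can be illustrated outside Smooks with a few lines of plain Java and the JDK's StAX parser. This is only a sketch of the selection logic, not how Smooks evaluates selectors: the class PathSelector is hypothetical, and the NITF and Header element names in the usage example are made up for illustration. Only InvalidData and ValidData come from the schema tweaks described above.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: decides which path a parsed document should take. */
final class PathSelector {

    /**
     * Returns the hex content of the first InvalidData element (unhappy path),
     * or an empty Optional if no such element exists (happy path).
     */
    static Optional<String> invalidHex(String parsedXml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(parsedXml));
            try {
                // Walk the event stream until an InvalidData start tag appears.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "InvalidData".equals(reader.getLocalName())) {
                        // Assumes InvalidData holds text only (the hex string).
                        return Optional.of(reader.getElementText().trim());
                    }
                }
                return Optional.empty();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

A caller would write the returned hex to its dead-letter channel when the Optional is present, and hand the document to the reassembly step otherwise:

```java
// Element names other than InvalidData and ValidData are illustrative only.
Optional<String> bad = PathSelector.invalidHex("<NITF><InvalidData>DEADBEEF</InvalidData></NITF>");
Optional<String> good = PathSelector.invalidHex("<NITF><ValidData><Header>x</Header></ValidData></NITF>");
// bad.isPresent() is true; good.isPresent() is false
```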

Voilà, a low-cost, efficient verification engine implemented with a few lines of XML. The complete source code of this example is available online.

[1] Smooks 2 RC1 will feature support for selector negation.