The last milestone umbrella release of Smooks 2 is out. Among the things we’ve focused on in this release is performance. In particular, the DFDL cartridge was upgraded to Apache Daffodil 3 in order to leverage Daffodil’s new streaming capabilities. This means that the DFDL cartridge, along with the specialised EDI and EDIFACT cartridges, is no longer memory-bound, which allows developers to unleash Smooks’s full potential when processing large flat files. Other noteworthy performance improvements are:
Changing the default SAX parser implementation from Apache Xerces to FasterXML’s Woodstox: benchmarks consistently showed Woodstox outperforming Xerces
Eliminating CPU and memory hot spots
Replacing synchronised data structures with thread-safe lock-free alternatives
Forking Apache Xerces and modifying it to provide a more efficient DOM implementation
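For readers unfamiliar with the idea of swapping synchronised data structures for lock-free ones, here is a minimal, generic Java sketch. It is not taken from the Smooks code base: the class name LockFreeSketch and the helper recordEvents are hypothetical, and the choice of ConcurrentLinkedQueue and AtomicLong is only an assumption about the kind of replacement involved. Both are standard JDK classes that rely on non-blocking compare-and-swap operations rather than a shared lock.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: not Smooks code. All names here are hypothetical.
public class LockFreeSketch {

    // Runs `threads` workers that each record `perThread` events and
    // returns {number of queued events, value of the counter}.
    static long[] recordEvents(int threads, int perThread) throws InterruptedException {
        // A synchronised equivalent would be
        // Collections.synchronizedList(new ArrayList<>()) plus a long guarded
        // by a synchronized block; every writer would contend for one lock.
        Queue<Integer> events = new ConcurrentLinkedQueue<>(); // non-blocking queue
        AtomicLong counter = new AtomicLong();                 // CAS-based counter

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    events.add(i);             // no lock is taken here
                    counter.incrementAndGet(); // atomic increment without a lock
                }
            });
        }

        // Wait for all workers to finish before reading the totals.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return new long[] { events.size(), counter.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = recordEvents(4, 10_000);
        System.out.println("queued=" + totals[0] + ", counted=" + totals[1]);
    }
}
```

With four workers recording 10,000 events each, both totals come out at 40,000 without any thread ever blocking on a shared monitor. Whether such a swap pays off in practice depends on how contended the structure is, which is why measuring before and after matters.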
All these optimisations have translated to a massive jump in throughput as demonstrated by our benchmarks. In one benchmark consisting of a non-trivial config, Smooks churned through a 1.5 GB real-world XML dataset in under 20 minutes! Not bad considering that the benchmark ran in a desktop environment. This definitely doesn’t mark the end of our performance tuning exercise. To keep us on our toes, the latest Smooks snapshot is benchmarked every time code is checked in, and the results archived for review.
Aside from better performance, pipelines have made their grand debut in 2.0.0-M3. As discussed in an earlier blog post, a pipeline is a flexible, yet simple, Smooks construct that isolates the processing of a targeted event from its main processing as well as from the processing of other pipelines. With pipelines, you can enrich data, rename/remove elements or attributes, and much more. Read 2.0.0-M3’s docs to learn more about pipelines.
Last but not least, Smooks’s Java namespaces were re-organised to provide a cleaner and more intuitive package structure. Broadly speaking, Smooks classes now fall under two top-level packages:
org.smooks.api: represents the Java contract between the developer and Smooks. Developers can safely assume that referencing interfaces within this package will not lead to breakage in their applications when upgrading to minor or patch versions of Smooks.
org.smooks.engine: represents Smooks’s internals. Whenever possible, developers should avoid referencing this package’s classes since no guarantee is given about their backwards compatibility between Smooks releases.
Milestone 3 brings the last of the major changes planned for Smooks 2. Future releases prior to GA will be release candidates addressing any shortcomings or bugs. As always, feedback is more than welcome on Smooks’s user forum.