Comparing multiple directories (Java)

Several tools compare directories recursively, but most of them handle only two directories at a time (like my favorite, WinMerge). KDiff3 goes as far as comparing three files or directories.

As for me, I needed to compare up to five directories, so I implemented my own algorithm. Read on for more information.

At work, we have several configuration directories for our application, one for each environment (development, integration, pre-production, production, local). When I wanted to reconcile these configurations, I implemented a comparison algorithm.

The basics

What is a directory, if not a tree structure? The challenge was therefore to compare a variable number of tree structures. To start at the beginning, let us define what a tree structure is…

The tree structure

Module: tree-comparer-model

For my algorithm, a tree is only a node with children. For convenience, the root is qualified with an ID.

During my first attempt at writing the comparison algorithm, I realized it was a shame to restrict it to files, so I made the trees generic. Thus, the ID can be any comparable object (being comparable is a constraint due to the way I compare trees, but String seems to be a good type for the ID).

Each node also carries information which can be stored as any comparable object. It is imperative that objects are comparable because children must be stored as sorted lists.
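To make the description above more concrete, here is a minimal sketch of what such a model could look like. The names Node, Tree, addChild and getChildren are assumptions made for illustration and are not taken from the tree-comparer-model module; the only points borrowed from the text are the generic comparable content, the sorted children and the ID on the root.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical node type: content plus children kept in sorted order. */
final class Node<T extends Comparable<T>> implements Comparable<Node<T>> {
    private final T content;
    private final List<Node<T>> children = new ArrayList<>();

    Node(T content) {
        this.content = content;
    }

    T getContent() {
        return content;
    }

    /** Read-only view of the children, always sorted by content. */
    List<Node<T>> getChildren() {
        return Collections.unmodifiableList(children);
    }

    /** Inserts the child where it belongs so the list stays sorted. */
    Node<T> addChild(Node<T> child) {
        int index = Collections.binarySearch(children, child);
        // A negative result encodes the insertion point as -(point) - 1.
        children.add(index < 0 ? -index - 1 : index, child);
        return this;
    }

    /** Nodes are ordered by their content. */
    @Override
    public int compareTo(Node<T> other) {
        return content.compareTo(other.content);
    }
}

/** Hypothetical tree type: a root node qualified with an ID. */
final class Tree<I extends Comparable<I>, T extends Comparable<T>> {
    private final I id;
    private final Node<T> root;

    Tree(I id, Node<T> root) {
        this.id = id;
        this.root = root;
    }

    I getId() {
        return id;
    }

    Node<T> getRoot() {
        return root;
    }
}
```

Keeping the children sorted at insertion time is a design choice of this sketch: it means the comparison never has to sort anything itself and can simply walk the lists in order.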

The comparison algorithm

Module: tree-comparer-algorithm

The main idea is to generate a tree storing the version of the node in each of the trees being compared. It is null-safe, and handles the case of nodes being present in only some of the selected trees.

The comparison of trees is based on the ordering of nodes (and therefore on the method T.compareTo(T) of Node<T>), and the comparison of the nodes themselves is based on the method T.equals(Object).

The algorithm is fully generic.
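One way to picture the idea is as an N-way merge over the sorted children of each tree. The sketch below is my reading of the description, not the code of the tree-comparer-algorithm module: SortedNode, DiffNode, TreeComparer and their members are hypothetical names, and SortedNode is a stripped-down stand-in whose children are assumed to be added in sorted order.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in node for this sketch; children must be added in sorted order. */
final class SortedNode<T extends Comparable<T>> {
    final T content;
    final List<SortedNode<T>> children = new ArrayList<>();

    SortedNode(T content) {
        this.content = content;
    }

    SortedNode<T> add(SortedNode<T> child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical result node: one version slot per compared tree. */
final class DiffNode<T> {
    final List<T> versions; // a null slot means "absent from that tree"
    final List<DiffNode<T>> children = new ArrayList<>();

    DiffNode(List<T> versions) {
        this.versions = versions;
    }

    /** True when every tree has this node and all versions are equal. */
    boolean isIdentical() {
        T first = versions.get(0);
        if (first == null) {
            return false;
        }
        for (T version : versions) {
            if (!first.equals(version)) {
                return false;
            }
        }
        return true;
    }
}

final class TreeComparer {
    /** Compares the nodes found at the same position in each tree (null = absent). */
    static <T extends Comparable<T>> DiffNode<T> compare(List<SortedNode<T>> nodes) {
        List<T> versions = new ArrayList<>();
        for (SortedNode<T> node : nodes) {
            versions.add(node == null ? null : node.content);
        }
        DiffNode<T> result = new DiffNode<>(versions);

        int[] cursor = new int[nodes.size()]; // next unread child in each tree
        while (true) {
            // Find the smallest child content not consumed yet, using compareTo.
            T min = null;
            for (int i = 0; i < nodes.size(); i++) {
                SortedNode<T> head = head(nodes.get(i), cursor[i]);
                if (head != null && (min == null || head.content.compareTo(min) < 0)) {
                    min = head.content;
                }
            }
            if (min == null) {
                break; // every child list is exhausted
            }
            // Align the trees on that child; trees that lack it contribute null.
            List<SortedNode<T>> aligned = new ArrayList<>();
            for (int i = 0; i < nodes.size(); i++) {
                SortedNode<T> head = head(nodes.get(i), cursor[i]);
                if (head != null && head.content.compareTo(min) == 0) {
                    aligned.add(head);
                    cursor[i]++;
                } else {
                    aligned.add(null);
                }
            }
            result.children.add(compare(aligned));
        }
        return result;
    }

    private static <T extends Comparable<T>> SortedNode<T> head(SortedNode<T> node, int index) {
        return node != null && index < node.children.size() ? node.children.get(index) : null;
    }
}
```

Calling TreeComparer.compare with the list of root nodes yields a DiffNode whose children each carry one slot per input tree. The ordering (compareTo) decides which nodes line up, while equals decides whether the lined-up versions are identical.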

The filesystem implementation

Module: tree-comparer-file

This is a simple implementation of my algorithm for files. Obtaining a diff tree is interesting but hard to exploit, so I also implemented a report generator to print this as HTML.

File nodes have checksum, name and size attributes, which are used to test equality.
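As an illustration, the content of a file node could be sketched as follows. FileInfo and its factory methods are hypothetical names, SHA-256 is an assumption (the checksum algorithm is not specified here), and ordering by name alone is also an assumption about how files get aligned between directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Objects;

/** Hypothetical file node content: ordered by name, equal on name, size and checksum. */
final class FileInfo implements Comparable<FileInfo> {
    private final String name;
    private final long size;
    private final String checksum;

    private FileInfo(String name, long size, String checksum) {
        this.name = name;
        this.size = size;
        this.checksum = checksum;
    }

    /** Reads the whole file into memory, which is acceptable for small configuration files. */
    static FileInfo of(Path file) throws IOException {
        return of(file.getFileName().toString(), Files.readAllBytes(file));
    }

    static FileInfo of(String name, byte[] data) {
        return new FileInfo(name, data.length, sha256(data));
    }

    // Assumption: SHA-256 stands in for whatever checksum the real module uses.
    private static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Ordering uses the name only, so same-named files line up across directories. */
    @Override
    public int compareTo(FileInfo other) {
        return name.compareTo(other.name);
    }

    /** Equality uses all three attributes, so a changed file is reported as different. */
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof FileInfo)) {
            return false;
        }
        FileInfo other = (FileInfo) obj;
        return size == other.size && name.equals(other.name) && checksum.equals(other.checksum);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, size, checksum);
    }
}
```

In this sketch compareTo is deliberately not consistent with equals: two files with the same name but different contents compare as 0, which aligns them in the diff tree, yet they are not equal, which flags them as modified.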

To make the report easier to use, I relied on the diff_match_patch algorithm to display the changes between files. These changes are included in the generated HTML file, which uses Bootstrap’s styles.

And it worked! I reconciled my configuration directories (and weeded out the latest bugs on that occasion), comparing five directories at a time. Great job! 🙂

Published by

Cyrille Chopelet
