I've spent some time reading both this document and the code, and it seems that paths found by the long pathfinder are not cached but recomputed for each unit.
Now, I think a cache can significantly improve performances, especially in the cases where you select a bunch of units that are close to each other and you order them to move to a point that's quite far: in these cases, the units tend to converge on a single line, so why to compute the path for each of them?
So my proposal is to keep a cache of N paths (the most recent? The most frequently used? FIFO?); when a unit needs a path, we first search if in the cache there is already a path with the same target and a starting point that is less than X tiles away from the unit and we use the short range pathfinder to join that path (on which waypoint?), after having checked that that path can be reached, of course.
Possible improvements:
Reuse a cached path also if the end point in less than X tiles from the unit target (and there are not better cached paths);
Reuse part of a path also if the unit position and target are near to intermediate waypoints of the cached path (here the difficult part is to efficiently search if a point is contained in a path);
What do you think?