Optimizing Laszlo Explorer: Laziness is good

One of the big changes in the recently released OpenLaszlo 3.2 is that we removed KRANK. KRANK was an optimization that sped up application launch by effectively executing part of the launch process at compile time. Unfortunately, KRANK was a fragile solution that only really worked with targeting Flash 6. It was expensive to maintain and wasn't going to survive the transition to multiple runtimes. So we made the decision to remove it.

One side effect was that with the absence of KRANK several OpenLaszlo applications no longer launched in an acceptable amount of time. The most prominent of these was Laszlo Explorer, the main way new Laszlo developers explore the capabilities of the language. Without KRANK, Laszlo Explorer took almost 20 seconds to launch on my 1.6Mhz PowerBook. Clearly too slow.

So what to do? I decided to take a look at why this application took so long to launch. If we weren't going to support KRANK, we would need to dig in and understand where the time was really going. I ended up improving launch speed by just about 90%, and learned some important lessons about OpenLaszlo and the tree component.

I started with the sort of optimizations I've heard our Studios folks discuss. For example, I centralized a number of similar XPath constraint expressions into a single ondata method, reducing the number of delegates and the number of XPath evaluations done during setup. These relatively simple changes brought me a 3% speedup: nice for an hours' work, but not nearly enough.

Next, I noticed that each tree node in the Explorer menu contained a relatively large number of subviews and subnodes. For example, there were two complementary state nodes, used to control the size and position of the node label depending on its depth in the tree. Each of these states costs a node allocation, so converting that code to a single function called from the oninit handler saved time.

There were also animatorgroups that were active only when a tree node was opening or closing. By turning the animator groups into classes and instantiating them only when needed I was able to save the cost of allocation of two deeply nested animator groups.

By reducing the subnode count by 50%, I saved another 5% on launch, for a total savings of 8%. Still not enough.

At this point, I started experimenting with OpenLaszlo's built-in profiler. I'll write a separate post about this useful tool, so for now let me just describe the breakthrough finding: even though only nine tree nodes are visible when the Explorer launches, something like 280 tree node instances were being allocated!

A bit of code inspection quickly revealed the problem. It was in the LZX code declaring the tree structure itself:

   <basetree name='menu' datapath='navdata:/menu'>
     <navbutton width='160' level='0'>
        <navbutton width='160' level='1'>
          <navbutton width='160' level='2'>
            <navbutton width='160' level='3'>
            </navbutton>
          </navbutton>
        </navbutton>
      </navbutton>
   </basetree>

To understand why this is slow, you need to understand a bit about how node allocation interacts with node replication. Replication acts by replacing a single child node (usually a view) with multiple nodes, one for each data element returned by the original node's datapath XPath expression. This replacement process only occurs after the original node has been allocated and initialized, and—crucially—constructed its own subnodes.

With hindsight, one problem is clear: at the first level, a nav button is constructed, including three nested subviews. This nav button is then replicated seven times, each with three nested subviews. The second level nav buttons are then data bound and replicated, each initially having two nested subviews. And so on for the third and fourth nested subviews. If a given node in the XML data doesn't have subnodes, nested nav buttons are still allocated to level four in a single chain. The result is that the tree is balanced to level four: there are always four nav buttons allocated along each branch of the tree, regardless of XML data structure. So even though there are only 170 nodes in the XML structure, we allocate 280 tree nodes.

But the bigger problem is that we are allocating anything but the initially visible nodes in the first place! Why aren't we just allocating the seven topmost tree nodes, plus the two visible in the one open topmost node? Saving 40% by only allocating nodes that will be data bound is one thing, but why not save over 90% by only allocating nodes when they become visible?

I was able to quickly engineer a change to the navbutton class such that it would only allocate subnodes when it was opened. This change required dynamically allocating the initial subnode on open and then letting it be replicated. There were a few subtleties: deferring the 'open' animation until replication was complete (solution thanks to work done by the community on the laszlo-user mailing list), and continuing to support the ability to deep link into the Explorer menu structure. But the change turned out to be surprisingly compact and generic. And Laszlo Explorer now launches in about two seconds on my laptop!

We're exploring the possibility of making a similar change to future versions of the basetree so that everyone can share in this optimization. Lazy allocation of subnodes requires a more data-driven usage model, so this would not be a transparent change.

I'll note in closing that before doing this work I had gone along with the conventional wisdom that the recursive design of basetree makes it slow. I no longer agree with this assessment: with properly lazy allocation of subnodes, there's no inherent slowness to keep us from enjoying this elegant component.

Technorati Tags:
, ,

5 Responses to “Optimizing Laszlo Explorer: Laziness is good”

  1. Raju Says:

    Hi Jim, excellent breakdown and explanation of the problem. One of the things I’m most afraid with OpenLazlo development is the unnecessary instantiation of a high number of views/subnodes on application startup. The performance with the Flash 8 plugin improved a lot, but still you have to carefully consider which parts of your applications have to be instantiated and ready for usage when the splash screen is removed.

    The last note is interesting: I would have thought, too, that the recursive design of the basetree would be responsible for the slowness. But there are always surprises around when improving application performance. Looking forward to your post on the OpenLazslo profiler.

  2. Simon O'Connor Says:

    Great breakdown of performance tips and a good insight into how some things work within Laszlo.

    On your last note also, while you say that the component can be made to be effecient and is elegant, my issue with it has always been it’s handling of dynamic data, which is very poor. I have switched our application over to the new tree in the incubator of 3.2 and found it works wonderfully with dynamic data, and also seems quicker to run also.

    What are your thoughts on its makeup in comparison. (As it directly alters the dataset its representing, as well as being a somewhat ‘flat’ design rather than recursive)

  3. jgrandy Says:

    Simon - I’m not myself sure whether the purely recursive design of tree, or the flattened design of opttree, could stand alone given the underlying performance constraints we operate with. The original tree design can be made fast if it is completely lazy — but that requires specifying subnode classname and letting the tree instantiate its children on demand rather than using the declarative syntax that we like so much. On the other hand, opttree is very fast but isn’t as appropriate if the secondary levels of the tree have very different look and feel from the top levels.

    I do think the original tree could be made to support dynamic data with a moderate amount of work; I just don’t think that was considered by the original designer.

  4. Tom Says:

    I think to be fair 20 seconds is damn quick if you are infact running it on 1.6Mhz powerbook?

  5. Me Says:

    Tom: lol.