Thursday, October 16, 2008

Are you listening?

Did you know that the Eclipse SDK alone has over 250 different listener interfaces? In the following screenshot, look at the size of the scrollbar slider!

This is only the tip of the iceberg - corresponding to all these listener interfaces, there are about 250 different Event classes. This is so that later on, additional event data can be added without breaking binary API compatibility by passing a dedicated event object to the listener. (Remember that where we forgot to use event objects, we ended up with warts like IPerspectiveListener4.)

To top it off, there are probably about 500 methods for adding and removing all these kinds of listeners. addLaunchLabelChangedListener/removeLaunchLabelChangedListener, addLocalSiteChangedListener/removeLocalSiteChangedListener, addLogEntryListener/removeLogEntryListener and so on.

In the end, when you think about it at a more abstract level, there are only a couple of possible changes that can happen in a program. A value can change, from an old value to a new value. A set can change, by adding or removing elements. A list can change, by adding elements at a certain index, removing elements at an index, or moving or replacing elements.

How about we use only three listener interfaces, IValueChangeListener, ISetChangeListener, and IListChangeListener? Or we settle on something that looks like EMF's notification API. We might need pre- and post-notifications, and maybe also changes to maps, so the resulting number of listener interfaces might be closer to ten. Definitely not 500 - and I only looked at the base Eclipse SDK...

What do you think? Can we make the Eclipse API more consistent and uniform in this way? Do you agree that all these domain-specific listener interfaces, each with a corresponding event class, and add/remove methods, are a source of accidental complexity, and settling on a small number of generic listener APIs can reduce code bloat?

Tuesday, October 07, 2008

Avoiding Bloat

Martin Oberhuber asked on the e4 mailing list what had happened to the pervasive architectural themes that were identified at the summit, such as reducing bloat, too many listeners, and becoming more asynchronous. I started writing a response, focusing on one of the topics, bloat, and it quickly became more than just an email response so I am posting it here.

Before we go into the details, let me state the obvious: It is pretty much guaranteed that we will cause more bloat, overall, for the case of the Eclipse SDK based on the new e4 platform, as long as that SDK still contains 3.x plug-ins that require compatibility layers. This is because all the old (bloated?) functionality and the new (lean?) functionality will be there at the same time.

It seems the best we can do is to avoid bloating the new platform itself, when it is used without any compatibility layers. Unfortunately, we have all these cool new technologies that we would like to use - EMF, CSS, declarative UIs, data binding, cross-compiling of Java to ActionScipt, being able to use multiple languages, client-server split, etc. Put them together and the likely result is bloat. Or is there a way to avoid bloat and use cool new technology at the same time?

So what is bloat? Let's look at Wikipedia's definition of software bloat (thanks John Arthorne for pointing me to it):

Software bloat, also known as bloatware or elephantware, is a term used in both a neutral and disparaging sense, to describe the tendency of newer computer programs to be larger, or to use larger amounts of system resources (mass storage space, processing power or memory) than necessary for the same or similar benefits from older versions to its users.

Let me dive into one concrete example, to show why this is a hard problem:

Code bloat through redundancy, caused by low-level API occurs when clients of a low-level API have to write the same boilerplate code over and over again. Think of all the code we have to write for SWT layouts, for example:
Composite contents = new Composite(parentComposite, SWT.NONE);
contents.setLayoutData(new GridData(GridData.FILL_BOTH));
GridLayout layout = new GridLayout();
layout.marginHeight = convertVerticalDLUsToPixels(IDialogConstants.VERTICAL_MARGIN);
layout.marginWidth = convertHorizontalDLUsToPixels(IDialogConstants.HORIZONTAL_MARGIN);
layout.verticalSpacing = convertVerticalDLUsToPixels(IDialogConstants.VERTICAL_SPACING);
layout.horizontalSpacing = convertHorizontalDLUsToPixels(IDialogConstants.HORIZONTAL_SPACING);
layout.numColumns = 2;

Label label = new Label(contents, SWT.LEFT);
GridData data = new GridData();
data.horizontalAlignment = GridData.FILL;

filenameField = new Text(contents, SWT.SINGLE | SWT.BORDER);
data = new GridData();
data.horizontalAlignment = GridData.FILL;
data.grabExcessHorizontalSpace = true;
Whenever there is a low-level way of doing things, you can come up with a higher-level way and reduce the code size. Of course, you are only reducing the overall code size when the higher-level abstraction is used widely enough to amortize the cost of its implementation. In our SWT layout example, you could write instead:
Composite contents = new Composite(parentComposite, SWT.NONE);
contents.setLayoutData(new GridData(GridData.FILL_BOTH));

new Label(contents, SWT.LEFT).setText(label);

filenameField = new Text(contents, SWT.SINGLE | SWT.BORDER);

Point defaultMargins = LayoutConstants.getMargins();
defaultMargins.x, defaultMargins.y).generateLayout(contents);
Now this is looking a lot shorter, and maybe even more elegant. However, even if GridLayoutFactory is used widely enough to amortize the additional footprint caused by its implementation, there are still two problems: first, the original code ran faster, and second, you now have to learn two APIs - the higher-level one, and the lower-level one when the abstraction gets in your way.

You can see where I am going - there is no clear cut solution to this. It is really a hard problem, and in many cases, we will have to trade off one of the factors disk size, memory size, CPU consumption against the others.

Taking it just a little further, here is another idea, taken from the wikipedia article on code bloat:

The difference in code density between various languages is so great that often less memory is needed to hold both a program written in a "compact" language (such as a domain-specific programming language, Microsoft P-Code, or threaded code), plus an interpreter for that compact language (written in native code), than to hold that program written directly in native code.

So if we had a domain-specific language for creating SWT widgets and specifying their layout, we could get away with no Java code at all! I don't know if the .class file is a space efficient encoding for SWT widget hierarchies and layouts, but even if it is, consider this: The byte code for creating the widgets will stay in memory for as long as its class is referenced. Chances are that this will be a very long time; at least for the time that particular part of the UI is materialized somewhere. By comparison, if we had a domain-specific language, it would have to be read once to create the widgets and layout, after which the memory could be freed.

So maybe we can have our cake and eat it too! After thinking about this a bit, I am all excited about using cool new technologies, as long as they don't cause bloat.

We also have to be very carfeful not to use multiple redundant technologies to achieve the same thing, because that is another source of bloat. As in, for example, letting everyone plug in their favourite domain specific language for creating SWT widgets and layouts. This kind of redundancy would be just as bad as redundancy through repetitive boilerplate code, so let's pick one way of doing declarative UIs!

Note that there are lots of other sources of bloat, for example, unneeded functionality, too many layers of abstraction, or unnecessary flexibility. I am running out of time but it is probably interesting to think about these as well. I'd like to know if you have any pointers for me in the comments!

If avoiding bloat is one of the goals of e4, we need to keep this goal in mind all the time. Every bit of functionality should be pulling its own weight. For example, do not add convenience API unless its additional weight can be justified by reduced weight somewhere else.

I believe we should start watching our weight from the very beginning, and from time to time, it is probably healthy to discuss the weight of the various pieces. I can't wait until we have some kind of continuous build in place, so that we can make it visible for everyone how big (or small!) the components are, and how they are growing (or shrinking!) over time.

We could also borrow some ideas from the business world and introduce budgets. You want to provide a component for declarative UI? How about you get an allowance of 300 K? Would that be enough?

What do you think?