Tuesday, October 07, 2008

Avoiding Bloat

Martin Oberhuber asked on the e4 mailing list what had happened to the pervasive architectural themes that were identified at the summit, such as reducing bloat, too many listeners, and becoming more asynchronous. I started writing a response focusing on one of the topics, bloat, and it quickly became more than just an email response, so I am posting it here.

Before we go into the details, let me state the obvious: It is pretty much guaranteed that we will cause more bloat, overall, for the case of the Eclipse SDK based on the new e4 platform, as long as that SDK still contains 3.x plug-ins that require compatibility layers. This is because all the old (bloated?) functionality and the new (lean?) functionality will be there at the same time.

It seems the best we can do is to avoid bloating the new platform itself, when it is used without any compatibility layers. Unfortunately, we have all these cool new technologies that we would like to use - EMF, CSS, declarative UIs, data binding, cross-compiling of Java to ActionScript, being able to use multiple languages, client-server split, etc. Put them together and the likely result is bloat. Or is there a way to avoid bloat and use cool new technology at the same time?

So what is bloat? Let's look at Wikipedia's definition of software bloat (thanks John Arthorne for pointing me to it):

Software bloat, also known as bloatware or elephantware, is a term used in both a neutral and disparaging sense, to describe the tendency of newer computer programs to be larger, or to use larger amounts of system resources (mass storage space, processing power or memory) than necessary for the same or similar benefits from older versions to its users.

Let me dive into one concrete example, to show why this is a hard problem:

Code bloat through redundancy, caused by low-level API, occurs when clients of that API have to write the same boilerplate code over and over again. Think of all the code we have to write for SWT layouts, for example:
Composite contents = new Composite(parentComposite, SWT.NONE);
contents.setLayoutData(new GridData(GridData.FILL_BOTH));
GridLayout layout = new GridLayout();
layout.marginHeight = convertVerticalDLUsToPixels(IDialogConstants.VERTICAL_MARGIN);
layout.marginWidth = convertHorizontalDLUsToPixels(IDialogConstants.HORIZONTAL_MARGIN);
layout.verticalSpacing = convertVerticalDLUsToPixels(IDialogConstants.VERTICAL_SPACING);
layout.horizontalSpacing = convertHorizontalDLUsToPixels(IDialogConstants.HORIZONTAL_SPACING);
layout.numColumns = 2;
contents.setLayout(layout);

Label label = new Label(contents, SWT.LEFT);
GridData data = new GridData();
data.horizontalAlignment = GridData.FILL;
label.setLayoutData(data);

filenameField = new Text(contents, SWT.SINGLE | SWT.BORDER);
data = new GridData();
data.horizontalAlignment = GridData.FILL;
data.grabExcessHorizontalSpace = true;
filenameField.setLayoutData(data);
Whenever there is a low-level way of doing things, you can come up with a higher-level way and reduce the code size. Of course, you are only reducing the overall code size when the higher-level abstraction is used widely enough to amortize the cost of its implementation. In our SWT layout example, you could write instead:
Composite contents = new Composite(parentComposite, SWT.NONE);
contents.setLayoutData(new GridData(GridData.FILL_BOTH));

new Label(contents, SWT.LEFT).setText(label);

filenameField = new Text(contents, SWT.SINGLE | SWT.BORDER);

Point defaultMargins = LayoutConstants.getMargins();
GridLayoutFactory.fillDefaults().numColumns(2).margins(
defaultMargins.x, defaultMargins.y).generateLayout(contents);
Now this is looking a lot shorter, and maybe even more elegant. However, even if GridLayoutFactory is used widely enough to amortize the additional footprint caused by its implementation, there are still two problems: first, the original code ran faster, and second, you now have to learn two APIs - the higher-level one, and the lower-level one when the abstraction gets in your way.
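
To make the amortization argument concrete, here is a minimal sketch of the break-even arithmetic. The class name, method names, and all the numbers are made up for illustration; nothing here was measured on SWT or JFace.

```java
// Hypothetical numbers for illustration only; none of them are measured.
final class AbstractionBreakEven {

    /**
     * Smallest number of call sites at which a helper of the given size
     * is no longer a net cost. All sizes must use the same unit (bytes, lines, ...).
     */
    static long breakEvenCallSites(long helperSize, long lowLevelSizePerUse, long highLevelSizePerUse) {
        long savedPerUse = lowLevelSizePerUse - highLevelSizePerUse;
        if (savedPerUse <= 0) {
            throw new IllegalArgumentException("the helper must shrink each call site");
        }
        // Ceiling division: there is no such thing as a partial call site.
        return (helperSize + savedPerUse - 1) / savedPerUse;
    }

    /** Net change in total size; a negative result means the code base got smaller. */
    static long netSizeChange(long helperSize, long lowLevelSizePerUse, long highLevelSizePerUse, long callSites) {
        return helperSize - callSites * (lowLevelSizePerUse - highLevelSizePerUse);
    }

    public static void main(String[] args) {
        // Assumed sizes in bytes: the helper itself, and one call site before and after.
        long helper = 6_000, lowLevel = 400, highLevel = 150;
        System.out.println("break-even call sites: " + breakEvenCallSites(helper, lowLevel, highLevel));
        System.out.println("net change at 10 call sites: " + netSizeChange(helper, lowLevel, highLevel, 10));
        System.out.println("net change at 100 call sites: " + netSizeChange(helper, lowLevel, highLevel, 100));
    }
}
```

With these assumed sizes, the helper only pays for itself once a couple of dozen call sites use it. Below that, the "shorter" code makes the whole thing bigger.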

You can see where I am going - there is no clear-cut solution to this. It is really a hard problem, and in many cases we will have to trade off one of the factors (disk size, memory size, CPU consumption) against the others.

Taking it just a little further, here is another idea, taken from the Wikipedia article on code bloat:

The difference in code density between various languages is so great that often less memory is needed to hold both a program written in a "compact" language (such as a domain-specific programming language, Microsoft P-Code, or threaded code), plus an interpreter for that compact language (written in native code), than to hold that program written directly in native code.

So if we had a domain-specific language for creating SWT widgets and specifying their layout, we could get away with no Java code at all! I don't know if the .class file is a space-efficient encoding for SWT widget hierarchies and layouts, but even if it is, consider this: the byte code for creating the widgets will stay in memory for as long as its class is referenced. Chances are that this will be a very long time; at least for the time that particular part of the UI is materialized somewhere. By comparison, if we had a domain-specific language, it would have to be read once to create the widgets and layout, after which the memory could be freed.
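
As a rough sketch of the "read once, then free" idea: the toy interpreter below reads a made-up, indentation-based description and builds a tree of stand-in nodes. The format, the class names, and the Node type are all assumptions for illustration. It does not use SWT, and it is not XSWT or any other existing format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TinyUiDescription {

    /** Stand-in for a real widget; a real interpreter would create SWT controls here. */
    static final class Node {
        final String type;
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String type) {
            this.type = type;
        }
    }

    /**
     * Builds the tree from lines of the form "type key=value ...", where two
     * spaces of indentation mean "child of the previous, less indented line".
     * The returned tree holds no reference to the description text.
     */
    static Node build(String description) {
        Node root = null;
        Deque<Node> open = new ArrayDeque<>(); // ancestors of the current line, innermost first
        for (String line : description.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int indent = 0;
            while (line.charAt(indent) == ' ') {
                indent++;
            }
            int depth = indent / 2;
            String[] tokens = line.trim().split("\\s+");
            Node node = new Node(tokens[0]);
            for (int i = 1; i < tokens.length; i++) {
                int eq = tokens[i].indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("expected key=value: " + tokens[i]);
                }
                node.properties.put(tokens[i].substring(0, eq), tokens[i].substring(eq + 1));
            }
            // Close every node that is at the same depth or deeper than this line.
            while (open.size() > depth) {
                open.pop();
            }
            if (open.isEmpty()) {
                if (root != null) {
                    throw new IllegalArgumentException("only one root allowed");
                }
                root = node;
            } else {
                open.peek().children.add(node);
            }
            open.push(node);
        }
        if (root == null) {
            throw new IllegalArgumentException("empty description");
        }
        return root;
    }

    public static void main(String[] args) {
        String description = "composite columns=2\n"
                + "  label text=File:\n"
                + "  text style=border\n";
        Node root = build(description);
        System.out.println(root.type + " with " + root.children.size() + " children");
    }
}
```

Once build returns, nothing holds on to the description string any more, so it can be garbage collected. The interpreter itself stays loaded, of course, but it is shared by every piece of UI described this way.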

So maybe we can have our cake and eat it too! After thinking about this a bit, I am all excited about using cool new technologies, as long as they don't cause bloat.

We also have to be very careful not to use multiple redundant technologies to achieve the same thing, because that is another source of bloat: for example, letting everyone plug in their favourite domain-specific language for creating SWT widgets and layouts. This kind of redundancy would be just as bad as redundancy through repetitive boilerplate code, so let's pick one way of doing declarative UIs!

Note that there are lots of other sources of bloat, for example, unneeded functionality, too many layers of abstraction, or unnecessary flexibility. I am running out of time but it is probably interesting to think about these as well. I'd like to know if you have any pointers for me in the comments!

If avoiding bloat is one of the goals of e4, we need to keep this goal in mind all the time. Every bit of functionality should be pulling its own weight. For example, do not add convenience API unless its additional weight can be justified by reduced weight somewhere else.

I believe we should start watching our weight from the very beginning, and from time to time, it is probably healthy to discuss the weight of the various pieces. I can't wait until we have some kind of continuous build in place, so that we can make it visible for everyone how big (or small!) the components are, and how they are growing (or shrinking!) over time.

We could also borrow some ideas from the business world and introduce budgets. You want to provide a component for declarative UI? How about you get an allowance of 300 K? Would that be enough?
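
A budget check like this could be a very small step in a build. The sketch below assumes the build already knows each component's size in bytes. The component names and all numbers are invented, except for the 300 K allowance floated above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SizeBudget {

    /**
     * Compares measured component sizes against their budgets and returns one
     * report line per component that is over budget or has no budget at all.
     */
    static List<String> overBudget(Map<String, Long> actualBytes, Map<String, Long> budgetBytes) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, Long> entry : actualBytes.entrySet()) {
            Long budget = budgetBytes.get(entry.getKey());
            if (budget == null) {
                report.add(entry.getKey() + ": no budget assigned");
            } else if (entry.getValue() > budget) {
                report.add(entry.getKey() + ": " + (entry.getValue() - budget) + " bytes over budget");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Hypothetical components and sizes; only the 300 K figure comes from the post.
        Map<String, Long> actual = new LinkedHashMap<>();
        actual.put("declarative-ui", 350L * 1024);
        actual.put("css", 100L * 1024);

        Map<String, Long> budget = new LinkedHashMap<>();
        budget.put("declarative-ui", 300L * 1024);
        budget.put("css", 150L * 1024);

        for (String line : overBudget(actual, budget)) {
            System.out.println(line);
        }
    }
}
```

Whether an over-budget component should fail the build or just show up in a report is a policy question that this sketch leaves open.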

What do you think?


Gunnar said...

I like Ed's pictures more. :p

Ed Merks said...

Boris, those yummy cakes are an excellent illustration of why bloat is so tempting as to be practically unavoidable.

I'm glad you pointed out how the effort to avoid bloat, e.g., provide only trivially simple primitives, often is exactly what causes bloat in the final application, e.g., dozens of uncoordinated, overlapping convenience APIs, or reams of duplicated boilerplate code. I was reminded of this just yesterday when I noticed that IPluginElement is just like IConfigurationElement which is just like org.w3c.dom.Element. You have to ask yourself how many applications ultimately don't end up using DOM in some way...

Anonymous said...

Is it really one DSL to both create and layout the SWT Widgets? Or does it make more sense to limit the DSL to layout and use (instantiate)?

I guess it depends a bit on what is meant by create. Do you mean design new widgets or implement existing widgets?

Either way a compact and concise language to lay out the screens would be awesome.

David Carver said...

Personally, I think that the 3.x series needs a round of refactoring to help reduce its own lines of code. This will then help with e4, as the amount of compatibility layer that needs to stay will be adjusted. Also, I would hope that e4 is only talking about a public API compatibility layer and not an internal API compatibility layer. Because if the latter happens, you are boxing yourself into a corner. The API compatibility layer should only be for public API, and it should all be marked as deprecated.

Boris Bokowski said...

Scott, I was thinking of a language (or really, a standard format) for instantiating existing widgets, which would include setting properties like text, tooltip, icon etc. as well as specifying the layout.

XSWT comes to mind, but there are lots of similar approaches with varying degrees of maturity. We just have to decide which one to use and then work on making it real.

Boris Bokowski said...

David, yes, we are planning to only support API-clean plug-ins. (And by "API" I mean "public API" because there is no such thing as a non-public API... it either is API or it is not.)

Btw, it's an interesting idea to mark everything as deprecated, I like it! (This would also force us, for every old API that we mark as deprecated, to explain the replacement API clients should use.)

David Carver said...

Don't forget about XUL, XFORMS, and I know XAML has already been mentioned as a meta data DSL for graphical UI interfaces.

Steve said...

Declarative UI is way cool and popular with the kids. One reason it "reduces bloat" is that the syntax is more compact than Java.


Steve said...

Another way to reduce bloat is to use API (when it is there).

filenameField = new Text(contents, SWT.SINGLE | SWT.BORDER);
filenameField.setLayoutData(new GridData(SWT.FILL, SWT.CENTER, true, false));


Boris Bokowski said...

Andi, of course nothing is ever really new. What I am excited about is that there might actually be good reasons, sometimes, for working on "cool kids" stuff.