Saturday, September 1, 2012

Wish list for java array (or numeric collections)

If you have been doing numeric calculations in Java for some time, you'll have learned to both love and hate numeric arrays.

For the love part:

  • they are safe. No garbage data. All access is guarded.
  • they are fast. A simple loop to calculate the average of an array of 100,000 elements on JDK 1.7, Intel Core i7 Q840 1.87GHz, takes about 100,000 nanoseconds, or one nanosecond per element.

For the hate part:

  • there are 6 number arrays (byte, short, int, long, float double). If you need to write a function that works with all, you either have to convert (expensive) or provide 6 implementations (can't use primitive with generics, and generics and arrays don't mix well anyway)
  • you can't have immutable references. If you return an array from an object, you have the option of being efficient but unsafe (you return the array directly, thus exposing your internal state) or being safe but inefficient (you return a copy, even if 90% of the uses are reads)
  • you can't give "views" of other arrays. To pass a sub-array you need to actually copy it. NIO buffers, for example, have to implement that functionality, like so many other have to in other contexts, including me.
  • their are concrete classes. Or better, we don't have a superclass for primitive/number collections.

This props up various attempts, all incompatible, to patch these deficiencies. If you search in ohloh for DoubleList
http://code.ohloh.net/search?s=DoubleList&browser=Default&pp=0&fl=Java&m...
you'll find some of them.

Naturally, I am in that group too. I have my "solution", which can't really be a solution because it's not standard. So, if I have to use some java math library to do something, I still have to convert in and out. Yet, I'll go through some of the details, because some things do work, and it's the kind of things I'd like to see.

What we need are numeric collections. The current collections don't mesh well, for the usual reasons: they are collection of objects, not primitives; generics does not work with primitives; generics is invariant.

The basic thing one need is an iterator for external iteration (yes, internal iteration is nice, but it becomes difficult when you need to iterate over more than one collection at a time, possibly in different patterns). Following both the java.lang.Number and java.util.Iterator conventions, I have:

public interface IteratorNumber {

boolean hasNext();
float nextFloat();
double nextDouble();
byte nextByte();
short nextShort();
int nextInt();
long nextLong();
}

In the same way that Iterator allows iteration of a collection, and Number allows to use a numeric value "without caring" with a single binding, this interface allows to iterate a numeric collection with a single binding. One can also image implementation of these that do a safe cast (i.e. throw an exception if the convertion overflows), or wrappers that do that. Again, similarly to java.util.Number and its sub-classes, I have:

public abstract class IteratorByte implements IteratorNumber {

@Override
public float nextFloat() {
return (float) nextByte();
}

// Implement all except nextByte()

}

This accomplishes two purposes: when implementing I only have to implement one method, and allows the client to detect what type of numer it is (a byte instead of a double). So that, in case one _does_ want to have different bindings for the different types, one can still do it!

Some kind of mechanism like this is the basic minimum. One could think of using Iterator and Iterator, but they have the following problems:

  • next().doubleValue() means there was an object created in between (the Number or Double). If the code can be inlined and escape analysis kicks in, that's not an issue. But, given that this is a basic interface, and you may have several implementations, this would not (currently) happen.
  • given that generics are invariant, you'd have to always use Iterator instead of Iterator. And working with wildcards all over the places is not pretty.

On the other side, IteratorDouble as implemented before, could easily implement Iterator too. The Double next() would only have one implementation (in the IteratorDouble superclass), which would refer to the nextDouble() call, so it could be inlined more easily, escape analysis may kick in and so on.

Regardless of the details: I think there is a need for a numeric iterator! :-)

Apart from the iterator, we need some basic collections. With lack of imagination (which is often enemy of good design), I have a CollectionNumber, which only supports sequential access, a ListNumber, which support random/indexed access. These have abstract implementations in CollectionDouble and ListDouble, again so that most of the implementation is done and so that a consumer can tell which type is in the collection.

Lastly, there is a set of ArrayDouble, ArrayFloat, ... which are just wrappers around arrays. And here's the punchline: if you code to these _directly_, through the magic of inlining, you suffer _no_ performance loss. Zero. Which means: using these set of interfaces, I am losing _nothing_. I can still provide 6 different bindings to ArrayXxx, which would all perform in the same way like I'd do with plain arrays. But, if performance is not that criticial, I _can_ code to ListNumber or CollectionNumber, and cover all cases in one. In fact, if only a couple of concrete classes are instanced, you'll also suffer no performance loss.

Again, regardless of the details: we need something like this.

I want to be able to make the decision to code to the general case first. Once I see that performance is an issues, I can take, say, the ArrayDouble case and give a second implementation for that, simply by switching on the type of the class, and then rewriting the same method with a different parameter type (the rest stays the same). In fact, it'd be nice if the VM did that for me, but that's another story.

The other nice thing about the wrapper array class is that I can disable the write. It's "read-only", not as good as immutable, but in many cases it works the same.

You can also create a "safe copy wrapper", which if accessed in read only, does not copy, but at the first write, would make the copy of the array.

These few things cover at least the problems I mentioned before. The solution I showed I think fits already the patterns established by java.lang.Number and the Collection classes. So at least does not feel "foreign".

No comments:

Post a Comment