Working with Java arrays

Newly I’ve had a question at Wizpert about how to print the contents of an array faster than iterating over the array itself if the display order of the elements does not matter?

The asker thought about using threads — however I was for the plain old solution. Let me explain why…

For the examples of this article I will use Java 8 because it has some new functionality on how arrays can be used.

The first idea was to simply create an application which prints out an array containing 1_000_000  integer elements with the common for-each loop and with another solution like threads like the asker wanted. However this would result not quite true because the integers have to be converted to strings prior printing them to System.out — and between the two different runs the garbage collector can start and it messes up the results.

The solution was to create a micro benchmark application — or at least a micro benchmark light.

Micro benchmarks

You can ask what is a micro benchmark? And why do I create a light version instead of a full one?

If we look up Benchmark in Wikipedia we can read:

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term ‘benchmark’ is also mostly utilized for the purposes of elaborately-designed benchmarking programs themselves.

Going along with this thought we could define micro benchmarks as measuring the performance of a part of a computer application or functionality. In this case we measure the performance of printing the list with various implementations.

And why is it light?

Because there are some rules how to create a micro benchmark. For example looking at the compiled code or garbage collection. Because these can mess up the performance measurement.

The compiled code can be other than you write in your IDE. This is because the compiler optimizes where it can — and sometimes it eradicates the code you expect to call. For example if you write a for loop without content like this:

for(int i = 0; i <array.length; i++) {
    array[i];
}

the compiler eradicates the whole for loop and your application terminates in an instant.

Another best practice for micro benchmarks is to do not use System.out.print statements because writing to an IO stream consumes time and it will change the results. Nevertheless I have to use printing because this is what I will measure: which method is fast to print an array of integers where the order is unimportant.

So I use a light variation of micro benchmarking where I try to embed best practices with some extensions to get the application work like it should — with IO operations.

Solutions

I’ve never thought but I ended up with 9 different solutions how to go through an array and print the contents to the console if the order of the items does not matter. Some of the solutions are duplicates of each other but they differ in a minimalistic change which could influence the result.

Let’s see the different approaches:

  1. normal printing: looping through the array in a normal for loop using System.out.print to print the elements — without any spaces between the numbers
  2. printing from the beginning and the end: with a normal for loop I iterate over the arrays and print the next element from the beginning of the array and from the end of the array
  3. printing with a stream: here I convert the array to a stream and use the forEach method with System.out.print to print the elements of the array
  4. printing with a parallel stream: this is like the previous example but I convert the stream to a parallel one
  5. printing divided: here I divided the array in two halves and print the next element from the first half and the second half; if the size of the array is odd the at the end I print the number in the middle
  6. printing with for-each: the same case than normal printing but here I use the enhanced for loop of Java, the so called for-in loop
  7. printing with threads: this solution prints the results using a queue and threads
  8. first and last 2:
  9. printing a single string: in this solution I use string joining with streams’ collect method and print a single line of string at the end

I won’t add every line of code here, you can look the sources up at the end of the article. However let’s see some of the solutions: the basic one with a normal loop (#1), the requested one with threads (#7) and the single string based on Java 8 (#9).

You already know how to print an array of elements to the console, but let’s take a look at it again:

long printNormal(final int[] array) {
    LocalDateTime start = LocalDateTime.now();
    for (int i = 0; i < array.length; i++) {
        System.out.print(array[i]);
    }
    return start.until(LocalDateTime.now(), ChronoUnit.MILLIS);
}

Well, this is the same old for loop to print the elements of the array — the only difference is in the timing — which is needed for measurement.

Usage with threads is a bit complicated because you have to share the information between the threads and have to eliminate that two threads print the same number.

private static final ConcurrentLinkedQueue<Integer> QUEUE = new ConcurrentLinkedQueue<>();

class Printer implements Runnable {

    @Override
    public void run() {
        Integer a = null;
        while ((a = QUEUE.poll()) != null) {
            System.out.print(a);
        }
    }
}

long printThreads(final int[] array) {
    for (int a : array) {
        QUEUE.add(a);
    }
    LocalDateTime start = LocalDateTime.now();
    Thread t1 = new Thread(new Printer());
    Thread t2 = new Thread(new Printer());
    Thread t3 = new Thread(new Printer());
    Thread t4 = new Thread(new Printer());
    t1.start();
    t2.start();
    t3.start();
    t4.start();
    try {
        t1.join();
    } catch (InterruptedException e1) {
        e1.printStackTrace();
    }
    try {
        t2.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    try {
        t3.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    try {
        t4.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }

    return start.until(LocalDateTime.now(), ChronoUnit.MILLIS);
}

The example above is a bit complex and the thread count is hard coded — so the results can vary and there is the possibility that the amount of threads slows down the execution. However with some minor change you can make the application work with a different amount of threads and make it configurable too.

And the last one is my favourite. It is lean and simple, utilizing the new possibilities of Java 8:

long printStreamJoiner(final int[] array) {
    LocalDateTime start = LocalDateTime.now();
    System.out.println(IntStream.of(array).mapToObj(String::valueOf).collect(Collectors.joining(" ")));
    return start.until(LocalDateTime.now(), ChronoUnit.MILLIS);
}

This one-liner is very straightforward.

Runtimes

Depending on the environment and configuration the results can vary. And there are significant differences between runtimes so eventually you have to be careful which one to choose — but mostly it is because garbage collection and other things like this.

For example when I run the complete benchmark inside a JBoss Developer Studio on Windows 7 I get slowly increasing runtimes for each method listed above — if I execute all functions after each other. Switching to Cygwin yields a much faster result.

Runtime for normal is 8.336
Runtime for first and last is 19.450
Runtime for stream is 31.779
Runtime for parallel stream is 46.225
Runtime for division is 57.891
Runtime for foreach is 72.657
Runtime for threads is 94.172
Runtime for first and last 2 is 106.345
Runtime for stream joiner is 145.461
Runtime for normal is 14.443
Runtime for first and last is 11.629
Runtime for stream is 11.780
Runtime for parallel stream is 12.272
Runtime for division is 11.768
Runtime for foreach is 11.566
Runtime for threads is 11.396
Runtime for first and last 2 is 5.974
Runtime for stream joiner is 168

As you can see, the results are really different. It looks like that the DevStudio performs slower each time a new iteration is started. So I switched to the command line and executed the code there too.

Runtime for normal is 15.128
Runtime for first and last is 11.754
Runtime for stream is 11.638
Runtime for parallel stream is 12.854
Runtime for division is 11.593
Runtime for foreach is 11.680
Runtime for threads is 11.525
Runtime for first and last 2 is 5.764
Runtime for stream joiner is 166

As you can see, using the console is faster too. In this case I did no warm-up prior the runs so this is one on-the-fly or dry run.

To see a more detailed result set I executed the functions in a “standalone” version too: I executed only one of the nine functions with 5 warm-up iterations (where no time is measured) and 5 measured rounds.

When I run the different solutions in the Developer Studio I get slow results at the end… The warm-up is real fast but then one iteration takes almost as much time than the complete warm-up did. I think some GC is kicked off in the background which makes the execution slow.

If I change the warm-up iteration count to 0, the results are fast again. The problem is that I cannot simply log the GC to the console because I print the results there too. This means the IO would be used for two types of messages — and I would not see the GC messages properly. The solution would be to enable the GC logging into a file — but I will analyse this in another article in the future.

Conclusion and sources

Depending on your environment the runtime can vary. So do not let fool yourself when running your application in your IDE: it can be slow however when running outside in a native environment it can be fast enough. Creating a micro benchmark is not easy — even if you want to create a light version of it. Interesting performance differences can only be viewed with GC in mind — and log because GC can impact the performance.

Naturally the presented runs do not have any information to use them in statistical calculations because the rule there is to have at least 30 measurements to have some significant and usable information.

The sources are available at my BitBucket repository. Beside the sources the results of the executions are available too — separated by environment.

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s