Composing Objects – Java Concurrency in Practice

Hi, Java dudes! In my previous post, Sharing Objects – Java Concurrency in Practice, I reviewed how objects can be published and accessed from multiple threads in a safe manner. Today’s topic is how thread-safe components can be combined and enriched with new functionality.

While it is possible to write a thread-safe application that stores all its state in public static fields, it is much easier to build one by combining thread-safe components. By doing so we can delegate every thread-safety issue to the application tier where it needs to be handled: the thread-safe class deals with how access to its inner state is synchronized, while on a higher level we can think about how these thread-safe components can be combined to represent the application state properly.

In order to prevent concurrency issues, we have to think about state ownership. Ownership is not part of the language itself; it is defined by class design. Therefore programmers have great autonomy to make the right choices – and the bad ones too. A class usually does not own the objects passed to its methods. In many cases, ownership and encapsulation go together: the object encapsulates the state it owns and owns the state it encapsulates. Collections, however, often share ownership of the contained objects with the client code that inserted them.

The Java monitor pattern is an example of ownership through encapsulation.

An object following the Java monitor pattern encapsulates all its mutable state and guards it with the object’s own intrinsic lock [JCiP]

Synchronized collections offered by the standard JDK follow the Java monitor pattern. Synchronization is implemented as a wrapper layer that controls all access to the underlying collection. A synchronized list of Strings can be created like this:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
...
List<String> syncStrings = Collections.synchronizedList(new ArrayList<>());

It is critical that the reference to the backing collection does not escape; otherwise thread safety can be undermined. Operations on a synchronized collection are mutually exclusive – only one thread at a time can work with the collection. In most cases this is too restrictive and results in poor application performance. There are other alternatives, but more on that later.
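As a hedged sketch (class and field names are made up for illustration), keeping a direct reference to the backing list around and handing it out is exactly the kind of escape that undermines the wrapper:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EscapedBackingList {
    private final List<String> backing = new ArrayList<>();           // should stay private
    private final List<String> safe = Collections.synchronizedList(backing);

    public List<String> getSafeView() {
        return safe;        // fine: all access goes through the synchronized wrapper
    }

    public List<String> getBacking() {
        return backing;     // unsafe: callers bypass the wrapper and its lock entirely
    }
}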

Now let’s suppose we have a class that is already thread-safe but lacks an operation that we need. What can we do in such a situation? We have multiple options, but not all of them are correct:

  • put new synchronized code in a helper class, ☠
  • extend the original class, ☠
  • modify the original class to support the desired operation, or
  • add new functionality in a class encapsulating the original one.

Using a synchronized method of a helper class is probably the worst possible approach. Whatever lock the passed-in object uses internally, it is certainly not the intrinsic lock that the helper’s synchronized method acquires. This only gives us the illusion of safety: other threads can still modify the state of the passed-in object.
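A minimal sketch of this broken pattern, in the spirit of the book’s ListHelper example (the class name here is illustrative):

import java.util.List;

// Broken: synchronizes on the helper's own intrinsic lock, not on the lock the
// list uses internally, so another thread can modify the list between the
// contains() check and the add() call.
public class BadListHelper<E> {
    public synchronized void putIfAbsent(List<E> list, E x) {
        if (!list.contains(x)) {
            list.add(x);
        }
    }
}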

Extending the original class – where possible – could work, but it is fragile: the underlying class might silently change its synchronization policy and thereby jeopardize the thread safety of the subclass. Another problem is that not all classes expose enough of their locking state to make this approach possible (for example, when they use a private lock object).
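Where the original class does document its locking policy, extension can work; here is a sketch along the lines of the book’s Vector-based example, relying on the fact that Vector synchronizes its methods on the Vector instance itself:

import java.util.Vector;

// Works only because Vector's methods synchronize on the Vector object itself;
// if that policy ever changed, this subclass would silently break.
class PutIfAbsentVector<E> extends Vector<E> {
    public synchronized boolean putIfAbsent(E x) {
        boolean absent = !contains(x);
        if (absent) {
            add(x);
        }
        return absent;
    }
}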

The safest way to add a new atomic operation is to modify the original class consistently with its original design. In this case, all code that implements the synchronization policy is contained in a single source file – easier to understand and maintain.

If changing the existing class is not an option then using composition is the best alternative. This is how standard synchronized collections are implemented. And just as with synchronized collections, it’s crucial that all access to the underlying object must go through the wrapper class.

Don’t force clients or other developers to make risky guesses about the thread-safety of your code.

Document a class’s thread safety guarantees for its clients; document its synchronization policy for its maintainers. [JCiP]

Did you know that java.text.SimpleDateFormat is not thread-safe? Okay, maybe you did, but did you know that this wasn’t explicitly documented until JDK 1.4? And this is the worst that can happen: you assume something is thread-safe that is not. This is why you should always document your code.

...
@ThreadSafe
public class ImprovedList<T> {

    @GuardedBy("this")
    private final List<T> innerList;

    public ImprovedList(List<T> list) {
        this.innerList = list;
    }

    public synchronized void putIfAbsent(T elem) {
        if (!innerList.contains(elem)) {
            innerList.add(elem);
        }
    }

    // Other List operations must also be exposed as synchronized methods that
    // delegate to innerList, so that every access goes through this wrapper's lock.
}

Annotations @ThreadSafe and @GuardedBy are not part of the standard JDK; they come from the book’s companion net.jcip.annotations package.

That was all for now. Next week we are going to look at what building blocks Java provides that we can use as Lego™ pieces to build our application. Best wishes until we meet again here 😉

P.S.: instead of java.text.SimpleDateFormat, use its thread-safe Java 8 alternative, java.time.format.DateTimeFormatter – where applicable.

Resource
[JCiP] Java Concurrency in Practice by Brian Goetz, ISBN-10: 0321349601

Sharing Objects – Java Concurrency in Practice

Welcome, Java enthusiast! Today I’m going to deal with issues around sharing objects in a multi-threaded environment. In last week’s post, Java Concurrency in Practice – Thread Safety, I focused on preventing multiple threads from accessing shared state at the same time. In this post, I’m going to deal with a more subtle aspect of synchronization, visibility: how to publish changes made by one thread so that they can be safely read by other threads.

public class NotVisible {

  private static class WorkerThread extends Thread {
    public int value = 1;
    public boolean finished = false;   // not volatile: writes may never become visible
    public void run() {
      while (!finished)
        Thread.yield();
      System.out.println(value);       // may print 1 instead of 2 – or never run at all
    }
  }

  public static void main(String[] args) {
    WorkerThread t = new WorkerThread();
    t.start();
    t.value = 2;
    t.finished = true;
  }

}

The class NotVisible demonstrates what can go wrong without proper synchronization. While it seems reasonable to assume that the code above will print 2, it can actually loop forever, because there is no guarantee that the values set by the main thread will ever become visible to the “worker” thread, nor that they will become visible in the same order. It can therefore also happen that the loop finishes but prints 1 instead of 2.

“in the absence of synchronization, the Java Memory Model permits the compiler to reorder operations and cache values in registers, and permits CPUs to reorder operations and cache values in processor-specific caches.” [JCiP]

Therefore:

“Attempts to reason about the order in which memory actions ‘must’ happen in insufficiently synchronized multithreaded programs will almost certainly be incorrect.” [JCiP]

In insufficiently synchronized programs reader threads can see out-of-date values. Stale data can cause serious issues and failures like unexpected exceptions, broken computations, corrupted data, or infinite loops.

Intrinsic locking is one of the mechanisms that can guarantee that changes made by one thread will be visible to other threads in a predictable manner.

“Locking is not just about mutual exclusion; it is also about memory visibility. To ensure that all threads see the most up-to-date values of shared mutable variables, the reading and writing threads must synchronize on a common lock.” [JCiP]

Meaning that:

“When thread A executes a synchronized block, and subsequently thread B enters a synchronized block guarded by the same lock, the values of variables that were visible to A prior to releasing the lock are guaranteed to be visible to B upon acquiring the lock.” [JCiP]

There is an alternative way to propagate changes predictably: volatile variables. When a field is declared volatile, the compiler and the JVM are put on notice that the variable is shared, so it must not be cached and access to it must not be reordered with other memory operations. Volatile variables provide visibility but not atomicity, so we cannot use them when a write to the variable depends on its current value.

“Locking can guarantee both visibility and atomicity; volatile variables can only guarantee visibility.” [JCiP]
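As a sketch of how this applies to the earlier NotVisible example, marking the flag volatile is enough to guarantee both termination and the expected output, because (since Java 5) writes made before a volatile write become visible to a thread that subsequently reads that volatile field:

// Minimal fix for the earlier example: the volatile flag guarantees that the
// worker eventually sees finished == true, and that the ordinary write to value
// made before the volatile write is visible as well.
public class NowVisible {

  private static class WorkerThread extends Thread {
    public int value = 1;
    public volatile boolean finished = false;
    public void run() {
      while (!finished)
        Thread.yield();
      System.out.println(value);   // guaranteed to print 2
    }
  }

  public static void main(String[] args) {
    WorkerThread t = new WorkerThread();
    t.start();
    t.value = 2;          // ordinary write...
    t.finished = true;    // ...published by the subsequent volatile write
  }
}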

Publishing an object means making it available outside of its current scope. Using good OOP practices like encapsulation is not strictly necessary for writing safe concurrent programs, but it certainly makes it easier to reason about correctness. Publishing internal state variables can compromise not just encapsulation but the thread safety of your application: other classes or threads can intentionally or carelessly misuse the published state and break your design. Sometimes publishing is obvious, e.g. storing a reference in a public static field. In other cases it can be more subtle, like passing an object to an overridable – neither private nor final – method.

An object is in a predictable, consistent state only after its constructor returns. This is why:

“Do not allow the this reference to escape during construction.” [JCiP]

We can let the this reference escape during construction either explicitly (by passing it to another object) or implicitly, for example by publishing an instance of an inner class, which carries a hidden reference to the enclosing object.
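A hedged sketch of the implicit case; EventSource, EventListener, and Event are hypothetical interfaces standing in for any callback API:

// Hypothetical callback API, used only for illustration.
interface Event {}
interface EventListener { void onEvent(Event e); }
interface EventSource { void registerListener(EventListener listener); }

public class ThisEscape {
    private final int answer;

    public ThisEscape(EventSource source) {
        // The anonymous inner class holds a hidden reference to the enclosing
        // ThisEscape instance, so 'this' escapes before construction completes:
        // a callback could run while answer is still at its default value 0.
        source.registerListener(new EventListener() {
            public void onEvent(Event e) {
                doSomething(e);
            }
        });
        this.answer = 42;
    }

    void doSomething(Event e) {
        System.out.println("answer = " + answer);
    }
}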

No synchronization is needed if the data is accessed only from a single thread. What an ingenious idea! It is actually so important that it has its own name: thread confinement. For example, JDBC connection pools use thread confinement to ensure correct program behaviour. The Connection object itself is not required to be thread-safe, but the pool won’t dispense the same connection to another thread until the owning thread – which acquired the connection from the pool in a thread-safe manner – returns it.

The ThreadLocal class allows us to maintain a separate copy of a value on a per-thread basis. It provides get and set methods that return or set the value for the currently executing thread. When a thread calls get for the first time, the initialValue() method is executed to construct the initial value.
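A common use of this is sketched below, assuming a Java 8+ runtime (ThreadLocal.withInitial is essentially a shorthand for overriding initialValue()): give each thread its own SimpleDateFormat instance, since that class is not thread-safe.

import java.text.SimpleDateFormat;
import java.util.Date;

public class PerThreadDateFormat {

    // Each thread lazily gets its own formatter instance on first access.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(Date date) {
        return FORMAT.get().format(date);   // safe: no instance is shared across threads
    }
}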

Stack confinement is a special form of thread confinement: local variables live only on the executing thread’s stack, so if data is reachable only through local variables, thread safety is not an issue.

An immutable object is one whose state cannot be changed after construction.

“Immutable objects are always thread-safe.” [JCiP]

It’s technically possible to have an immutable object without all of its fields being final. The String class, for example, lazily computes its hash code the first time it is actually needed, but because that value is derived deterministically from immutable state, String is still considered immutable.
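Here is a simplified sketch of that idiom (not the real String source): the cached hash field is not final, but because every thread derives the same value from the immutable state, the race is benign.

public final class Point {
    private final int x;
    private final int y;
    private int hash;   // not final: lazily computed, deterministic result

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {                  // may be recomputed by several threads,
            h = 31 * x + y;            // but they all derive the same value
            hash = h;                  // benign race: worst case is redundant work
        }
        return h;
    }
}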

So far, we have been focusing on how not to share objects among multiple threads. Of course, sometimes we simply have to share, and then we have to do it safely. If the object we would like to share is not immutable, we have to follow the safe publication idioms listed below.

To publish an object safely, both the reference to the object and the object’s state must be made visible to other threads at the same time. A properly constructed object can be safely published by:

  • Initializing an object reference from a static initializer
  • Storing a reference to it into a volatile field or AtomicReference
  • Storing a reference to it into a final field of a properly constructed object
  • Storing a reference to it into a field that is properly guarded by a lock
[JCiP]
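A brief sketch of the first two idioms (class and field names are illustrative):

import java.util.concurrent.atomic.AtomicReference;

public class Publication {

    // Idiom 1: initialized in a static initializer, which the JVM executes with
    // internal synchronization before any thread can see the field.
    public static final Holder EAGER = new Holder(42);

    // Idiom 2: stored in a volatile field or an AtomicReference.
    private static volatile Holder published;
    private static final AtomicReference<Holder> publishedRef = new AtomicReference<>();

    public static void publish() {
        published = new Holder(42);          // safe publication via a volatile write
        publishedRef.set(new Holder(42));    // safe publication via AtomicReference
    }

    static class Holder {
        private final int n;
        Holder(int n) { this.n = n; }
        int value() { return n; }
    }
}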

The Java Memory Model offers a special guarantee for immutable objects: they don’t have to be published safely. But they have to be constructed properly: all fields have to be final and their state must not change. This guarantee does not hold for effectively immutable objects: they have to be published safely, but afterwards they can be accessed without further synchronization.

Whoa, this was a lot of information. I’m so glad I decided to write my own study notes, because it helps me a lot to better understand what I’ve read. I also hope that I can help others as well. I recommend reading the book and using my notes to recapitulate 😉

Resource
[JCiP] Java Concurrency in Practice by Brian Goetz, ISBN-10: 0321349601

Java Concurrency in Practice – Thread Safety

In my last post, Java Concurrency in Practice: Study Notes, I finished with the statement that all code paths accessing shared state must be thread-safe. But what is thread safety? A piece of code is said to be thread-safe when it continues to behave correctly when accessed from multiple threads. To put it in a slightly more formal way:

“A class is thread-safe if it behaves correctly when accessed from multiple threads, regardless of the scheduling or interleaving of the execution of those threads by the runtime environment, and with no additional synchronization or other coordination on the part of the calling code” [JCiP].

At the heart of this definition there is something we call correctness. As we programmers often don’t get precise specifications, we have to define correctness as something we recognize when we see it: the code works. Since a single-threaded environment is just an edge case of a multi-threaded environment, a program, class, or any piece of code “cannot be thread-safe if it is not even correct in a single-threaded environment” [JCiP].

Let’s continue with another definition:

“stateless objects are always thread-safe” [JCiP].

But, what do we mean by the state of the object? An object’s state includes any data that can affect its externally visible behavior.

“An object’s state is its data, stored in state variables such as instance or static fields. An object’s state may include fields from other, dependent objects” [JCiP].

For example, a HashMap’s state is defined not just by its own fields but also by the state of the key-value pairs it contains. Now, let’s suppose that we have an object with state accessible from multiple threads. We often define a set of actions over the object together with pre- and post-conditions and invariants that must hold before or after any action. These invariants rule out part of the object’s state space as invalid. If an object is correctly implemented, no sequence of operations can get it into an invalid state.

Changing state can be tricky. What seems like a single-step operation is often not atomic, which means that it does not execute as one indivisible operation.

“Operations A and B are atomic with respect to each other if, from the perspective of a thread executing A, when another thread executes B, either all of B has executed or none of it has” [JCiP].

In the CarFactory example below, the state of the factory instance is represented by the member variable nextId. The code looks totally innocent until we realize that incrementing nextId is not an atomic operation. It consists of three separate actions: fetch the current value, add one to it, and write the new value back. If multiple threads access the CarFactory instance, then with some unlucky timing two cars can get the same id. Depending on the application, this can have fatal consequences.

...
@NotThreadSafe
class CarFactory {

  private long nextId = 1;
 
  public Car createCar() {
    Car c = new Sedan();
    c.setId(nextId++);   // read-modify-write: three separate actions, not atomic
    return c;
  }
  ...
}

The possibility of getting incorrect results through unlucky timing is so important in concurrent programming that it has its own name: race condition.

“A race condition occurs when the correctness of a computation depends on the relative timing or interleaving of multiple threads by the runtime” [JCiP].

To avoid race conditions, we have to prevent other threads from using a variable while we are in the middle of modifying it: other threads may observe or modify the state only before we start or after we finish, but not in the middle. One way to make our CarFactory thread-safe is to use the AtomicLong library class. Because getAndIncrement is atomic, we no longer have to worry about unlucky timing. (Annotations @NotThreadSafe and @ThreadSafe are not part of the standard JDK; they are used throughout this article and in the book to differentiate between safe and unsafe patterns.)

...
import java.util.concurrent.atomic.AtomicLong;

@ThreadSafe
class CarFactory {

  private final AtomicLong nextId = new AtomicLong(1);
 
  public Car createCar() {
    Car c = new Sedan();
    c.setId(nextId.getAndIncrement());
    return c;
  }
  ...
}

But what happens if the state of the object is spread across multiple variables? Is it enough to replace every member variable with its atomic version? Of course not.

“To preserve state consistency, update related state variables in a single atomic operation” [JCiP].

Locking helps us preserve state consistency by ensuring that a critical section can only be executed by a single thread at a time. Java provides the synchronized block as a built-in locking mechanism.

synchronized (lock) {
   // Access or modify shared state guarded by lock
}
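As a hedged sketch of why related variables must be updated together, here is a range class whose invariant (lower <= upper) is guarded by a single private lock; making the two fields individually atomic would not preserve the invariant:

public class NumberRange {

    private final Object lock = new Object();
    private int lower = 0;   // invariant: lower <= upper, guarded by lock
    private int upper = 0;

    public void setLower(int value) {
        synchronized (lock) {
            if (value > upper) {
                throw new IllegalArgumentException("lower cannot exceed upper");
            }
            lower = value;
        }
    }

    public void setUpper(int value) {
        synchronized (lock) {
            if (value < lower) {
                throw new IllegalArgumentException("upper cannot be below lower");
            }
            upper = value;
        }
    }

    public boolean contains(int value) {
        synchronized (lock) {
            return value >= lower && value <= upper;
        }
    }
}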

Every Java object has an intrinsic lock that can be used for synchronization purposes. This internal lock is automatically acquired when entering a synchronized block and automatically released when leaving it – even if the block is left by throwing an exception. A special case of a synchronized block is a synchronized method: by declaring a method synchronized, we synchronize on the object’s intrinsic lock, or, in the case of a static method, on the Class<?> object.

Intrinsic locks are reentrant: if a thread tries to acquire a lock that it already holds, the request succeeds. Reentrancy means that locks are acquired on a per-thread rather than a per-invocation basis. It is implemented by associating an acquisition count and an owning thread with each lock.
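A short sketch, in the spirit of the book’s example, of why reentrancy matters: without it, the subclass method below would deadlock against itself when calling the superclass method that synchronizes on the same lock.

class Widget {
    public synchronized void doSomething() {
        // ... base behaviour, runs while holding this object's intrinsic lock
    }
}

class LoggingWidget extends Widget {
    @Override
    public synchronized void doSomething() {
        System.out.println(toString() + ": calling doSomething");
        super.doSomething();   // re-acquires a lock this thread already holds: OK, reentrant
    }
}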

From a performance point of view, locking has its own challenges. We will discuss them in detail in a later chapter. For now, just remember one simple rule:

“Avoid holding locks during lengthy computations or operations at risk of not completing quickly such as network or console I/O” [JCiP].

In the next post, I’m going to deal with sharing objects: memory visibility, immutability, what is safe and unsafe publication. Stay tuned!

Resource
[JCiP] Java Concurrency in Practice by Brian Goetz, ISBN-10: 0321349601

Java Concurrency in Practice: Study Notes

A Personal Standpoint

Not long ago a guy from the HR department of the company I work for asked if I was interested in interviewing candidates for open positions. And I said something like: “Sure, why not.” On the first couple of occasions I was only an observer: I asked a question or two, but the interview itself was conducted by a more experienced colleague. Then I started doing interviews on my own over the phone and began taking up the leading role in face-to-face interviews. Of course, every interview is different and each interviewer has a favorite set of questions, but more or less all interviews are conducted based on a script prescribing which specific topics should be visited. As I took on more and more responsibility for interviews, I noticed that there was a topic that made me very uncomfortable. Often I just asked a couple of basic questions or sometimes even skipped it “because of a time constraint”. You have probably guessed by now: it was concurrency. Of course, I had a basic concept of what concurrency is and what its main pitfalls are. However, I only had limited knowledge and experience with writing concurrent Java programs. I thought to myself that it would be a good time to study concurrency in Java until it turned from a weakness into one of my strongest points.

Study. Ok, but from What?

One can learn a lot about concurrency in Java just by reading the documentation of the concurrency-related Java classes like java.lang.Thread. However, to get a broader picture and detailed explanations, I was looking for a good book to learn from. After a little bit of googling, I noticed that most sources point to a single book: Java Concurrency in Practice by Brian Goetz [JCiP].

[Book cover: Java Concurrency in Practice by Brian Goetz]

After reading the book I can say that it wasn’t a coincidence. It’s a wonderful book: it’s deep but also written in a way that is easy to understand, and most common problems and their solutions are illustrated with code snippets. To put it simply: it’s a must read for every Java developer.

Study Notes

The best way to acquire new knowledge is not just to read books but also to take notes; by doing so, our brain is “forced” to process the new information once more. This is one of the reasons why I have done the same with other books – check the posts in the Bookshelf category. Another reason is that I hope others can benefit from my notes too. If you don’t have the time to read the full book, or you are not sure whether it’s worth the time, just check my notes. In this case, I decided to go chapter by chapter because there is so much to process. I’ll present my notes on each chapter on a weekly basis. It won’t take more than 5-10 minutes to read them, and at the end I’ll share some extra content too. Without further ado, here comes the first chapter.

Chapter 1: Introduction

Writing concurrent programs is hard, and maintaining them is arguably even harder. So why bother with concurrency in the first place? In a nutshell: it is the easiest way to tap the computing power of multiprocessor systems, and it is often easier to write a complicated asynchronous program as multiple pieces of code that run concurrently, each doing only a single, well-defined task.

The main motivating factors behind the development of operating systems that allowed multiple programs to execute simultaneously were:

  • Resource utilization – programs sometimes have to wait for external events that are out of their control, for example an I/O operation to finish. While one program waits, others can do useful work.
  • Fairness – multiple users and programs may have equal claims on the computer’s resources. In a multi-user or multi-process environment, it is more desirable for each program to get a chance to do some work than to let one program run to completion before starting another.
  • Convenience – “It is often easier or more desirable to write several programs that each perform a single task and have them coordinate with each other as necessary than to write a single program that performs all the tasks” [JCiP].

Individual programs run in isolated processes: resources such as memory, file handles, and security credentials are allocated to them separately by the operating system. If they need to, processes can communicate through inter-process communication mechanisms: sockets, signal handlers, shared memory, and files. “The same concerns (resource utilization, fairness, and convenience) that motivated the development of processes also motivated the development of threads. Threads allow multiple streams of program control flow to coexist within a process. They share process-wide resources such as memory and file handles, but each thread has its own program counter, stack, and local variables.” [JCiP]

In most modern operating systems the basic unit of scheduling is the thread, not the process. All threads within a process have access to the same heap, which allows fine-grained data sharing between them. However, uncoordinated access from multiple threads can leave shared data in an inconsistent state, resulting in undefined program behavior.

Java’s built-in support for threads is a double-edged sword. On one hand, it simplifies the development of concurrent applications. On the other hand, developers need to be aware of thread-safety issues. “Thread safety can be unexpectedly subtle because, in the absence of sufficient synchronization, the ordering of operations in multiple threads is unpredictable and sometimes surprising” [JCiP]. Fortunately, Java provides a number of synchronization mechanisms to coordinate shared access. But, in the absence of such synchronization, “the compiler, hardware, and runtime are allowed to take substantial liberties with the timing and ordering of actions, such as caching variables in registers or processor-local caches where they are temporarily (or even permanently) invisible to other threads” [JCiP].

When writing concurrent applications we must never compromise on safety: we must ensure that “nothing bad ever happens”. Although that is essential, we also want to make sure that “something good eventually happens”, meaning that the program should never get into a state where it is permanently unable to make progress. On top of that, we often want “good things to happen quickly”. It would be a waste of effort to rewrite an application to use multiple threads and end up with worse performance than the single-threaded version.

Threads really are everywhere. “When the JVM starts, it creates threads for JVM housekeeping tasks (garbage collection, finalization) and the main thread for running the main method” [JCiP]. Sometimes concurrency is introduced by using frameworks: the developer writes business logic that seems to be a simple sequential series of steps (see convenience as a reason for concurrency above), but when it is plugged into the framework it may execute in parallel with other tasks, thus requiring that all code paths accessing shared state be thread-safe.