Type Safe Heterogenous Containers in Java

I recently shifted from Python to Java. This article discusses two problems I’ve faced in Java that don’t exist in Python:

  1. Java’s type system cannot represent containers or maps with heterogeneous value types. Python dicts can hold anything.
  2. Java’s methods lack named and optional parameters. These useful language features are provided by Python, Ruby, and C#.

As a response to these problems, this article presents a small library that makes them a bit less painful. We build on Effective Java’s item #29: consider typesafe heterogeneous containers. The library code is available at https://github.com/stevewedig/blog and is published to Maven for easy usage as a dependency. This is the second in a series of Java libraries I’m sharing on this blog (the first was Value Objects in Java & Python).

Article Outline

  1. Two problems with Java
  2. Get and cast: Heterogeneous maps without type safety
  3. TypeMap: Effective Java’s type safe heterogeneous map
  4. Symbol: A key with a value type parameter
  5. SymbolMap: An improved type safe heterogeneous map
  6. SymbolBus: Heterogeneous event publishing/subscribing
  7. Using SymbolMap for named and optional parameters
  8. SymbolFormat: Parsing & writing SymbolMaps
  9. SymbolSchema: Validating the symbols in SymbolMaps

1. Two problems with Java

Problem 1. Java lacks containers and maps with heterogeneous values.

As expected, Java objects support fields with heterogeneous types. For example a Person object can have a “name” field of type String and an “age” field of type Integer. Logically the fields form a Map from field names (“name” and “age”) to field values of different types (a String and an Integer). Such a Map would contain heterogeneous value types. Despite this being a perfectly reasonable Map, Java’s type system does not support it.

In this contrived example you could (and should) just use a Person object, however in other circumstances heterogeneous maps are more convenient. Falling back to objects also doesn’t work when implementing reusable heterogeneous containers, because the set of keys is open. (Examples include Favorites in Effective Java item #29, and this article’s SymbolMap and SymbolBus.)

A static type system can never represent all correct programs, so we should expect to bump up against this from time to time. However the lack of support for maps with heterogeneous value types is an inconvenient omission. They are a natural computational construct, and many namespaces contain heterogeneous values, including elemental namespaces such as parameters, scopes, object fields, etc.

Later in the article we look at ways to outsmart the type system in order to implement type safe heterogeneous containers.

Problem 2. Java lacks named and optional parameters

Named parameters (also called keyword parameters or keyword arguments) are a feature of Python, Ruby, and C#, among other languages. When calling a function or method, named parameters give you the ability to specify a parameter’s value by name, instead of by position. This means you don’t have to remember the positional order of parameters, which frequently makes code easier to write, read, and refactor.

Optional parameters (also called default arguments or parameters with default values) are another language feature. When defining a function you have the ability to provide a default value for a parameter. If a parameter has a default value, then that parameter becomes optional. If a caller doesn’t provide a value for that parameter, the default value will be used.

Named and optional parameters have synergy: if a method’s 3rd and 4th parameters have default values, then named parameters enable you to provide a value for the 4th parameter without providing one for the 3rd.

Here’s what named and optional parameters look like in Python:

# ======================================
# defining a class
# ======================================

class Article(object):

    # published, author, and tags have default values
    def __init__(self, url, title, published = Opt.absent, author = None, tags = frozenset()):
        self.url = url
        self.title = title
        self.published = published
        self.author = author
        self.tags = tags

# ======================================
# creating instances
# ======================================

# providing all params by position
article1 = Article(
    "http://url.com",
    "title",
    Opt(1408170918),
    "Bob",
    frozenset(["software"]),
)

# providing all params by name (order doesn't matter)
article2 = Article(
    tags = frozenset(["software"]),
    author = "Bob",
    published = Opt(1408170918),
    title = "title",
    url = "http://url.com",
)

# param 1 by position, param 2 by name, params 3-5 using default values
article3 = Article("http://url.com", title = "title")

# param 1 & 2 by position, param 4 by name, params 3 & 5 using default values
article4 = Article("http://url.com", "title", author = "Bob")

# (Note that Opt can be imported from this library which is described here.)

Later in the article we look at ways to mimic named and optional parameters in Java.

2. Get and cast: Heterogeneous maps without type safety

The closest Java’s type system can get to a map with heterogeneous value types is Map<String, Object>. Unfortunately Map<String, Object>.get() returns values of type Object, so we have to downcast them to their underlying types. This approach is demonstrated in TestGetAndCast. Instead of manually casting we use CastLib.get() which does it for us, with the convenience of generic method type inference.

This is nice and simple, but as shown in the test, the downcasts are not statically type safe. This isn’t the end of the world: dynamic typing works fine for Python, Ruby, Clojure, etc. However static types are a primary reason why I use Java, so it would be preferable to preserve this benefit. Fortunately we can outsmart the type system in order to implement type safe heterogeneous maps…
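To make the weakness concrete, here is a minimal sketch of get and cast. The helper named get below is a hypothetical stand-in I wrote for illustration, not the library's CastLib.get(), and the class name is made up as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "get and cast"; not the library's CastLib.
class GetAndCastSketch {

    // Unchecked downcast whose target type is inferred from the call site.
    @SuppressWarnings("unchecked")
    static <Value> Value get(Map<String, Object> map, String key) {
        return (Value) map.get(key);
    }

    public static void main(String[] args) {
        Map<String, Object> person = new HashMap<>();
        person.put("name", "bob");
        person.put("age", 20);

        String name = get(person, "name"); // inferred as String
        Integer age = get(person, "age");  // inferred as Integer
        System.out.println(name + " is " + age);

        try {
            // This compiles, but the stored value is really a String.
            Integer wrong = get(person, "name");
            System.out.println(wrong);
        } catch (ClassCastException e) {
            System.out.println("failed at runtime, not at compile time");
        }
    }
}
```

The mistake on the last lookup is only discovered when the program runs, which is exactly the static safety we would like to keep.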

3. TypeMap: Effective Java’s type safe heterogeneous map

Joshua Bloch’s Effective Java should be read by every Java developer. Item #29 in the book is “Consider typesafe heterogeneous containers”. The idea is represented by this interface:

interface TypeMap {

    <Value> void put(Class<Value> type, Value value);

    <Value> Value get(Class<Value> type);
}

TypeMap provides a type safe mapping from type objects to heterogeneous values. The signature of put() ensures that values associated with a type object are instances of that type. This knowledge is reflected in the signature of get(): when we lookup by type object, we know that the returned value will be an instance of that type.

TypeMapClass implements TypeMap which is demonstrated in TestTypeMap. Behind the scenes we wrap a Map<Class<?>, Object> and do a downcast in the get() method, however the signature of put() guarantees that this cast is safe. I’ve modified the interface above by having put() return the map, so maps can be built fluently.
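As a rough illustration of the wrapping described above, here is a minimal sketch in the spirit of Effective Java's Favorites example. The class name TypeMapSketch is hypothetical and this is not the library's TypeMapClass. I use Class.cast() for the downcast, which is one way to do it without an unchecked warning.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; the library's TypeMapClass may differ.
class TypeMapSketch {

    private final Map<Class<?>, Object> values = new HashMap<>();

    // The signature ties the value's type to the key. Returning this
    // allows maps to be built fluently.
    <Value> TypeMapSketch put(Class<Value> type, Value value) {
        values.put(type, value);
        return this;
    }

    // Class.cast is a checked downcast; put() guarantees that it succeeds.
    <Value> Value get(Class<Value> type) {
        return type.cast(values.get(type));
    }

    public static void main(String[] args) {
        TypeMapSketch map = new TypeMapSketch()
            .put(String.class, "bob")
            .put(Integer.class, 20);

        String name = map.get(String.class); // no cast at the call site
        Integer age = map.get(Integer.class);
        System.out.println(name + " " + age);
    }
}
```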

TypeMap achieves our goal of heterogeneous values with static type safety, however it has two problems:

  1. TypeMap cannot contain multiple values of the same type. This is an unusual and undesirable restriction for a map.
  2. TypeMap doesn’t quite work with parameterized value types such as List<Integer>. Although List.class is available at runtime, List<Integer>.class does not exist. Java implements generics with type erasure: type arguments are discarded at compile time, so List<Integer> and List<String> share the single class object List.class.

Fortunately we can solve these problems by using different keys…

4. Symbol: A key with a value type parameter

TypeMap’s technique for static type safety is based on having keys that are parameterized by value type. This parameter enables us to enforce value type during put() and it tells us the expected value type during get(). Type objects are convenient keys because they are built-in, however they also cause TypeMap’s two problems.

We can solve this by creating our own parameterized keys. A Symbol is a key with a value type parameter. In the next section we solve TypeMap’s two problems by using symbols as keys instead of type objects.

Here’s how you create symbols:

import static com.stevewedig.blog.symbol.SymbolLib.symbol;

Symbol<String> $name = symbol("name");
Symbol<Integer> $age = symbol("age");

The details of Symbol are demonstrated in TestSymbol:

  • Symbols are key objects with a Value parameter.
  • Symbols have name fields to make key errors meaningful. However symbols behave as entity objects, so two symbols with the same name are different keys. This behavior means that while symbol name conflicts would be confusing, they do not create bugs.
  • I use $ as a prefix for symbol variables, which I’ve found to be readable.
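The entity behavior described in the bullets above can be sketched in a few lines. SymbolSketch is a hypothetical class written for this illustration, and the library's Symbol may be implemented differently. The point is simply that equals() and hashCode() are inherited from Object, so identity decides equality.

```java
// Hypothetical sketch of a symbol; the library's Symbol may differ.
final class SymbolSketch<Value> {

    private final String name;

    SymbolSketch(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // equals() and hashCode() are deliberately not overridden, so each
    // symbol is equal only to itself (entity semantics).

    @Override
    public String toString() {
        return "$" + name; // the name only exists to make errors readable
    }

    public static void main(String[] args) {
        SymbolSketch<String> $name1 = new SymbolSketch<>("name");
        SymbolSketch<String> $name2 = new SymbolSketch<>("name");

        System.out.println($name1.equals($name2)); // false: same name, different keys
        System.out.println($name1.equals($name1)); // true
    }
}
```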

5. SymbolMap: An improved type safe heterogeneous map

SymbolMap preserves TypeMap’s technique for type safety while replacing the type object keys with Symbol keys:

interface SymbolMap {

    <Value> void put(Symbol<Value> symbol, Value value);

    <Value> Value get(Symbol<Value> symbol);
}

Using Symbols as keys solves TypeMap’s two problems:

  1. We can store multiple values of the same type by associating them with different symbols.
  2. We can seamlessly store values of generic types such as List<Integer>.
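Both points can be seen in a small self-contained sketch. The names SymbolMapSketch and its nested Symbol are hypothetical and much simpler than the library's classes. The sketch stores two String values under different symbols and a List<Integer> under a third.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; not the library's SymbolMap implementation.
class SymbolMapSketch {

    // A key that carries its value type as a type parameter.
    static final class Symbol<Value> {
        final String name;

        Symbol(String name) {
            this.name = name;
        }
    }

    private final Map<Symbol<?>, Object> values = new HashMap<>();

    <Value> SymbolMapSketch put(Symbol<Value> symbol, Value value) {
        values.put(symbol, value);
        return this;
    }

    // The unchecked cast is safe because put() is the only way in.
    @SuppressWarnings("unchecked")
    <Value> Value get(Symbol<Value> symbol) {
        return (Value) values.get(symbol);
    }

    public static void main(String[] args) {
        Symbol<String> $first = new Symbol<>("first");
        Symbol<String> $last = new Symbol<>("last");
        Symbol<List<Integer>> $scores = new Symbol<>("scores");

        SymbolMapSketch map = new SymbolMapSketch()
            .put($first, "bob")                     // two values of type String...
            .put($last, "smith")                    // ...under different symbols
            .put($scores, Arrays.asList(1, 2, 3));  // a parameterized value type

        String last = map.get($last);
        List<Integer> scores = map.get($scores);
        System.out.println(last + " " + scores);
    }
}
```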

SymbolMap is demonstrated in TestSymbolMap. The actual interface is split into SymbolMap (the base interface), SymbolMap.Solid (the immutable version), and SymbolMap.Fluid (the mutable version):

// A type safe mapping from symbols to values of matching types
interface SymbolMap {

    <Value> Value get(Symbol<Value> symbol);

    SymbolMap.Fluid fluid(); // mutable copy

    SymbolMap.Solid solid(); // immutable copy
    
    // A SymbolMap that is immutable and a value object
    interface Solid extends SymbolMap {
    }

    // A SymbolMap that is mutable and an entity object
    interface Fluid extends SymbolMap {

        // returns the map itself so maps can be built fluently
        <Value> SymbolMap.Fluid put(Symbol<Value> symbol, Value value);
    }
}

These interfaces are implemented by SymbolMapSolidClass and SymbolMapFluidClass, each inheriting from SymbolMapMixin. The immutable maps are value objects and the mutable maps are entity objects. You can convert between the two by calling solid() or fluid().

As shown in TestSymbolMap, SymbolMap.Fluid is used as a fluent builder for SymbolMap.Solid. After adding a static import from SymbolLib, here’s the syntax you end up with:

import static com.stevewedig.blog.symbol.SymbolLib.map;

SymbolMap params = map().put($name, "bob").put($age, 20).solid();

assertEquals("bob", params.get($name));

The rest of this article explores ways to use SymbolMap, but first let’s take a brief look at SymbolBus…

6. SymbolBus: Heterogeneous event publishing/subscribing

SymbolBus is another example of using Symbols to create a type safe heterogeneous container.

Event handling comes up a lot. We see it in GUI software, in the domain model, in asynchronous software, in loosely coupled systems, etc. The general concept is inversion of control (the Hollywood principle).

Two event handling patterns are observer and publish-subscribe (pubsub). In the observer pattern, a subject maintains a collection of its observers and notifies each observer when an event occurs. Unfortunately the subject and observer know about each other, and this coupling can limit flexibility and modularity. The pubsub pattern reduces coupling by having subject and observer communicate through an intermediate event bus.

SymbolBus is a simple event bus that can route heterogeneous events. Here is the interface:

interface SymbolBus {

    <Event> void publish(Symbol<Event> symbol, Event event);

    <Event> void subscribe(Symbol<Event> symbol, Act1<Event> callback);

    <Event> void unsubscribe(Symbol<Event> symbol, Act1<Event> callback);
}

SymbolBus uses symbols as logical channels. Publishers publish events to symbols, and subscribers subscribe to symbols. Event types are homogeneous within a channel, and heterogeneous between channels.

SymbolBusClass implements SymbolBus which is demonstrated in TestSymbolBus. As usual for Java, callbacks are anonymous classes: I use Act1<Input> from my LambdaLib, which is demonstrated in TestLambdaLib. SymbolBus also includes subscribeAll() to receive all events and subscribeMisses() to receive only events without any subscribers, which is useful for debugging.
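Here is a minimal sketch of how such a bus could be put together. It is not the library's SymbolBusClass: the class and nested Symbol are hypothetical, I've swapped Act1 for java.util.function.Consumer (which assumes Java 8 or later), and subscribeAll() and subscribeMisses() are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative sketch only; uses Consumer in place of the library's Act1.
class SymbolBusSketch {

    static final class Symbol<Event> {
        final String name;

        Symbol(String name) {
            this.name = name;
        }
    }

    private final Map<Symbol<?>, List<Consumer<?>>> subscribers = new HashMap<>();

    <Event> void subscribe(Symbol<Event> symbol, Consumer<Event> callback) {
        subscribers.computeIfAbsent(symbol, key -> new ArrayList<>()).add(callback);
    }

    <Event> void unsubscribe(Symbol<Event> symbol, Consumer<Event> callback) {
        List<Consumer<?>> callbacks = subscribers.get(symbol);
        if (callbacks != null) {
            callbacks.remove(callback);
        }
    }

    // The unchecked cast is safe because subscribe() only accepts callbacks
    // whose event type matches the symbol.
    @SuppressWarnings("unchecked")
    <Event> void publish(Symbol<Event> symbol, Event event) {
        List<Consumer<?>> callbacks = subscribers.get(symbol);
        if (callbacks == null) {
            return;
        }
        // Iterate over a copy so callbacks may unsubscribe while handling.
        for (Consumer<?> callback : new ArrayList<>(callbacks)) {
            ((Consumer<Event>) callback).accept(event);
        }
    }

    public static void main(String[] args) {
        Symbol<String> $login = new Symbol<>("login");
        Symbol<Integer> $score = new Symbol<>("score");

        SymbolBusSketch bus = new SymbolBusSketch();
        List<String> log = new ArrayList<>();

        bus.subscribe($login, user -> log.add("login: " + user));
        bus.subscribe($score, points -> log.add("score: " + (points + 1)));

        bus.publish($login, "bob"); // a String channel
        bus.publish($score, 41);    // an Integer channel
        System.out.println(log);
    }
}
```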

(A quality alternative to SymbolBus is Guava’s EventBus. Like TypeMap, EventBus is a mapping from type to value, more specifically it’s a mapping from event type to subscriber set. So I believe it suffers from one of TypeMap’s problems: we cannot have multiple event streams containing the same event types.)

7. Using SymbolMap for named and optional parameters

As discussed, one of Java’s problems is the lack of support for named and optional parameters. However there are ways to mimic this behavior:

  • Use method overloading to provide default values. This is less flexible and more verbose than Python’s optional values. (It also doesn’t help with named parameters.)
  • Create a fluent interface with named methods for building up the state of an object. Default values can be defined for optional fields that are not set.
  • Accept a parameter map as an input. Default values can be defined for optional fields that are missing. This is a common practice in JavaScript, and was the standard practice in Ruby before named parameters were added in 2.0. Parameter maps contain heterogeneous values, so this technique is less appealing in Java until you have SymbolMap (or a similar tool).
  • Here’s a StackOverflow post discussing other creative tricks of varying practicality (which are not addressed in this post).

In TestArticleCreation we compare four Java techniques for creating articles that are analogous to the Python example:

  1. ArticleWithPositional is a traditional class with the constructor accepting five positional parameters.
  2. ArticleWithMutation is a class allowing an instance’s state to be incrementally added via fluent setters.
  3. ArticleWithBuilder has a separate builder class with fluent setters. Guava provides similar builders for its immutable collections.
  4. ArticleWithSymbols is a class whose constructor accepts named parameters in the form of a SymbolMap.

These classes all support a form of default values, and #2-4 mimic named parameters. ArticleWithPositional provides default values using overloading. ArticleWithMutation and ArticleWithBuilder provide default values for optional fields whose setters are not called. ArticleWithSymbols provides default values when pulling values out of the SymbolMap using these convenience methods:

interface SymbolMap {

    ...

    // return the value or defaultValue if missing or null
    <Value> Value getDefault(Symbol<Value> symbol, Value defaultValue);

    // return Optional.of(value), or Optional.absent if missing or null
    <Value> Optional<Value> getOptional(Symbol<Value> symbol);

    // return the value, or null if missing
    <Value> Value getNullable(Symbol<Value> symbol);
}

class ArticleWithSymbols {

    private final String url;
    private final String title;
    private final Optional<Integer> published; // optional
    private final String author; // nullable
    private final ImmutableSet<String> tags; // default

    private static final ImmutableSet<String> defaultTags = ImmutableSet.of();

    public ArticleWithSymbols(SymbolMap params) {
        url = params.get($url);
        title = params.get($title);
        published = params.getOptional($published);
        author = params.getNullable($author);
        tags = params.getDefault($tags, defaultTags);
    }
}
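To show how these lookups might behave, here is a self-contained sketch. Everything in it is hypothetical: the Params class stands in for SymbolMap, java.util.Optional replaces Guava's Optional, and I'm assuming that a plain get() on a missing required parameter fails with an error naming the symbol.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; Params is a stand-in for the library's SymbolMap.
class NamedParamsSketch {

    static final class Symbol<Value> {
        final String name;

        Symbol(String name) {
            this.name = name;
        }
    }

    static final class Params {
        private final Map<Symbol<?>, Object> values = new HashMap<>();

        <Value> Params put(Symbol<Value> symbol, Value value) {
            values.put(symbol, value);
            return this;
        }

        // Required parameter (assumption: missing keys fail, naming the symbol).
        @SuppressWarnings("unchecked")
        <Value> Value get(Symbol<Value> symbol) {
            Object value = values.get(symbol);
            if (value == null) {
                throw new IllegalArgumentException("missing parameter: " + symbol.name);
            }
            return (Value) value;
        }

        // Optional parameter: empty if missing or null.
        @SuppressWarnings("unchecked")
        <Value> Optional<Value> getOptional(Symbol<Value> symbol) {
            return Optional.ofNullable((Value) values.get(symbol));
        }

        // Optional parameter with a default value.
        <Value> Value getDefault(Symbol<Value> symbol, Value defaultValue) {
            return getOptional(symbol).orElse(defaultValue);
        }
    }

    static final Symbol<String> $url = new Symbol<>("url");
    static final Symbol<String> $title = new Symbol<>("title");
    static final Symbol<String> $author = new Symbol<>("author");

    public static void main(String[] args) {
        // "Named" parameters: order doesn't matter, and $author is omitted.
        Params params = new Params().put($title, "title").put($url, "http://url.com");

        System.out.println(params.get($url));
        System.out.println(params.getDefault($author, "anonymous"));
        System.out.println(params.getOptional($author).isPresent());
    }
}
```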

All four of these classes are clumsy compared to the elegance of real named and optional parameters. However these kinds of techniques seem to be the best we can do in Java. Deciding which to use, or which combination to use, is a case-by-case judgement call. Here are some of their tradeoffs:

  • ArticleWithPositional is the easiest to implement, but it doesn’t provide named parameters. Default values are provided by defining a constructor with fewer inputs. This is awkward because you have to provide a new method for every combination of default values you want to support, which may not even be possible if two optional parameters have the same type. On the positive side, overloading enables the compiler to statically verify that a client is providing a valid combination of parameters. An alternative or complement to defaults via overloading is defaults via null, where the constructor converts null values into default values.
  • ArticleWithMutation mimics named parameters using named setters, so it offers a better client experience. However it requires more implementation effort because we have to create and maintain a setter method for every parameter. We also cannot make the article immutable (mark the fields as final), even if we only intend for the article to be mutated during construction. It’s also possible for the article to be in an invalid state where required fields haven’t been set.
  • ArticleWithBuilder solves ArticleWithMutation’s problems: fields can be marked as final and an article cannot be in an invalid state. The tradeoff is the extra effort of implementing a separate builder class.
  • ArticleWithSymbols also allows you to define parameters by name, however it requires a bit less implementation effort than fluent setters: we only need a symbol per field instead of a setter method per field. We also get the benefits of ArticleWithBuilder without requiring a separate builder class: fields can be final and there’s no risk of invalid instances. Another important benefit is that parameter maps give us separation between configuration and creation. This kind of coupling reduction is generally desirable, and such separation can have practical benefits. As an example I’ve included a generic copy-with-mutations method which is demonstrated in TestArticleCreation. (Copy-with-mutations can be particularly useful when working with immutable objects.)
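For comparison with the parameter map approach, here is a compact sketch of the builder tradeoff described in the bullets above. It is a hypothetical, trimmed-down class written for this illustration (three fields only, java.util.Optional instead of Guava), not the library's ArticleWithBuilder.

```java
import java.util.Optional;

// Illustrative sketch only; not the library's ArticleWithBuilder.
class ArticleBuilderSketch {

    static final class Article {
        final String url;              // required
        final String title;            // required
        final Optional<String> author; // optional

        // Validation happens once, so an Article can never be invalid.
        private Article(Builder builder) {
            if (builder.url == null || builder.title == null) {
                throw new IllegalStateException("url and title are required");
            }
            this.url = builder.url;
            this.title = builder.title;
            this.author = Optional.ofNullable(builder.author);
        }
    }

    // The extra class is the cost: one field and one setter per parameter.
    static final class Builder {
        private String url;
        private String title;
        private String author;

        Builder url(String url) { this.url = url; return this; }
        Builder title(String title) { this.title = title; return this; }
        Builder author(String author) { this.author = author; return this; }

        Article build() { return new Article(this); }
    }

    public static void main(String[] args) {
        Article article = new Builder().title("title").url("http://url.com").build();
        System.out.println(article.title + " by " + article.author.orElse("unknown"));
    }
}
```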

I should also mention that another benefit of fluent interfaces is the ability to introduce domain logic in the methods, moving you towards an embedded domain-specific language. This is discussed with lovely clarity in Martin Fowler’s book: Domain-Specific Languages.

8. SymbolFormat: Parsing and writing SymbolMaps

Map<String, String> is a ubiquitous intermediate between our programs and various serialization formats.

These formats have varying serialization details such as delimiters, escaping, and ordering, however they all share the same logical interface:

interface ConfigFormat {

  Map<String, String> parse(String content) throws ParseError;

  String write(Map<String, String> strMap);
}

As a concrete example, let’s use configuration files for the remainder of this section. More specifically let’s use Java’s .properties file format which can be manipulated via java.util.Properties. Here’s the content of a sample .properties configuration file:

userName = bob
threadCount = 4
version = 2.3
precision = 0.01
createTables = 1
launchNukes = false
logLevel = warning
point = {"x": 7, "y": 8}
adminEmails = alice@example.com
thresholds = 10, 20, 30

PropLib provides a .properties backed implementation of ConfigFormat, as demonstrated in TestPropLib. Using this ConfigFormat we can easily convert between the file content string and a Map<String, String>. That’s the easy part. The hard part emerges when we realize that we need heterogeneous types for values, not just strings. For example the value for "threadCount" needs to be the integer 4, not the string "4". The value for "thresholds" needs to be ImmutableList.of(10, 20, 30), not the string "10, 20, 30". And so forth.
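The easy part can be sketched directly with java.util.Properties. This is a hypothetical stand-in for the parsing half of a ConfigFormat, not the library's PropLib, and the class and method names are my own. The last two lines show why string values are not enough.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Illustrative sketch only; not the library's PropLib.
class PropertiesToMapSketch {

    // Parse .properties content into a plain string map.
    static Map<String, String> parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> map = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            map.put(key, props.getProperty(key));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> config =
            parse("userName = bob\nthreadCount = 4\nthresholds = 10, 20, 30\n");

        String threadCount = config.get("threadCount");
        System.out.println(threadCount + 1);                   // 41 (string concatenation)
        System.out.println(Integer.parseInt(threadCount) + 1); // 5 (what we actually want)
    }
}
```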

You could manually parse config values wherever they are used, but this is painful. What we really want is to work with a SymbolMap instead of a Map<String, String>. That’s where the SymbolTranslator interface comes in, converting between what we have and what we want:

interface SymbolTranslator {

    SymbolMap parse(Map<String, String> strMap) throws ParseError;

    Map<String, String> write(SymbolMap symbolMap);
}

Once we have a SymbolTranslator, we can chain that together with a ConfigFormat to implement the convenient SymbolFormat interface:

interface SymbolFormat {

    SymbolMap parse(String content) throws ParseError;

    String write(SymbolMap symbolMap);
}

TestSymbolFormat shows how to build and use a SymbolFormat. We build a SymbolTranslator, we import a ConfigFormat, and then we chain them together to get a SymbolFormat. However to fully understand what is going on here, we have to explore a couple of small base interfaces from com.stevewedig.blog.translate:

// A Parser parses, so it converts from Syntax to Model.
interface Parser<Syntax, Model> {
    Model parse(Syntax syntax) throws ParseError;
}

// A Writer writes, so it converts from Model to Syntax.
interface Writer<Syntax, Model> {
    Syntax write(Model model);
}

// A Translator parses and writes, so it converts between Syntax and Model
interface Translator<Syntax, Model> extends Parser<Syntax, Model>, Writer<Syntax, Model> {
}

// FormatParser is a "type alias" binding Parser's Syntax to String.
interface FormatParser<Model> extends Parser<String, Model> {
}

// FormatWriter is a "type alias" binding Writer's Syntax to String.
interface FormatWriter<Model> extends Writer<String, Model> {
}

// Format is a "type alias" binding Translator's Syntax to String.
interface Format<Model> extends 
FormatParser<Model>, FormatWriter<Model>, Translator<String, Model> {
}

As demonstrated in TestFormatLib, FormatLib defines a bunch of common formats (such as intFormat and boolFlagFormat) and shows how easy it is to create your own. Format, FormatParser, and FormatWriter are “type aliases” that bind Syntax to String. Unfortunately Java doesn’t provide real type aliases, so I use subtypes as aliases. (This is theoretically an antipattern because subtyping creates new nominal types, which means that while Format<Model> can be used as a Translator<String, Model>, the reverse is unfortunately not true. However in practice the proliferation of parameters can make code unreadable, so I think this “antipattern” can be worth the tradeoff. I have no idea why Java doesn’t have real type aliases yet.)
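To give a feel for what a value format involves, here is a sketch of an integer format and a list format built on top of it. These are hypothetical stand-ins rather than FormatLib's own definitions. In particular, the flattened Format interface, the unchecked ParseError, and the listFormat helper are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the library's FormatLib.
class FormatSketch {

    // Assumption: ParseError is modeled here as an unchecked exception.
    static class ParseError extends RuntimeException {
        ParseError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Flattened version of the Format interface: String in, Model out, and back.
    interface Format<Model> {
        Model parse(String syntax);

        String write(Model model);
    }

    static final Format<Integer> INT_FORMAT = new Format<Integer>() {
        @Override
        public Integer parse(String syntax) {
            try {
                return Integer.valueOf(syntax.trim());
            } catch (NumberFormatException e) {
                throw new ParseError("not an integer: " + syntax, e);
            }
        }

        @Override
        public String write(Integer model) {
            return model.toString();
        }
    };

    // Lifts an item format into a comma separated list format.
    static <Item> Format<List<Item>> listFormat(final Format<Item> itemFormat) {
        return new Format<List<Item>>() {
            @Override
            public List<Item> parse(String syntax) {
                List<Item> items = new ArrayList<>();
                for (String part : syntax.split(",")) {
                    items.add(itemFormat.parse(part));
                }
                return items;
            }

            @Override
            public String write(List<Item> model) {
                StringBuilder out = new StringBuilder();
                for (Item item : model) {
                    if (out.length() > 0) {
                        out.append(", ");
                    }
                    out.append(itemFormat.write(item));
                }
                return out.toString();
            }
        };
    }

    public static void main(String[] args) {
        Format<List<Integer>> thresholds = listFormat(INT_FORMAT);
        List<Integer> parsed = thresholds.parse("10, 20, 30");
        System.out.println(parsed);                   // [10, 20, 30]
        System.out.println(thresholds.write(parsed)); // 10, 20, 30
    }
}
```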

Now that we know about these base interfaces, we can take a look at the interfaces used in TestSymbolFormat:

// ConfigFormat is a "type alias" binding Format's Model to Map<String, String>
interface ConfigFormat extends Format<Map<String, String>> {
}

// SymbolParser is a Parser converting from Map<String, String> to SymbolMap
interface SymbolParser extends Parser<Map<String, String>, SymbolMap> {
}

// SymbolWriter is a Writer converting from SymbolMap to Map<String, String>
interface SymbolWriter extends Writer<Map<String, String>, SymbolMap> {
}

// SymbolTranslator is a Translator converting between Map<String, String> and SymbolMap
interface SymbolTranslator 
extends SymbolParser, SymbolWriter, Translator<Map<String, String>, SymbolMap> {
}

// SymbolFormat is a "type alias" binding Format's Model to SymbolMap
interface SymbolFormat extends Format<SymbolMap> {
}

With the preliminaries out of the way, we can review TestSymbolFormat more carefully. The test demonstrates the steps for using SymbolFormat:

  1. Define the symbols.
  2. Use a builder to create a SymbolTranslator (associating each symbol with a value format).
  3. Get a ConfigFormat. In the test we import one backed by java.util.Properties.
  4. Create a SymbolFormat by chaining together the ConfigFormat and SymbolTranslator.
  5. Use the SymbolFormat to convert between content strings and SymbolMaps.
  6. Once we’ve got a SymbolMap we can directly use it in our application, or we can use it as a parameter map for creating an object.
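Steps 1, 2, and 5 can be sketched as follows. This is not the library's SymbolTranslator or its builder: the class names, the add() method, and the use of java.util.function.Function as a stand-in for a value format are all assumptions for illustration, and only the parsing direction is shown.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only; parsing direction of a symbol translator.
class SymbolTranslatorSketch {

    static final class Symbol<Value> {
        final String name;

        Symbol(String name) {
            this.name = name;
        }
    }

    static final class TypedMap {
        private final Map<Symbol<?>, Object> values = new HashMap<>();

        <Value> void put(Symbol<Value> symbol, Value value) {
            values.put(symbol, value);
        }

        @SuppressWarnings("unchecked")
        <Value> Value get(Symbol<Value> symbol) {
            return (Value) values.get(symbol);
        }
    }

    // Pairs a symbol with the parser for its value, so the types line up.
    private static final class Binding<Value> {
        final Symbol<Value> symbol;
        final Function<String, Value> parser;

        Binding(Symbol<Value> symbol, Function<String, Value> parser) {
            this.symbol = symbol;
            this.parser = parser;
        }

        void parseInto(String text, TypedMap target) {
            target.put(symbol, parser.apply(text));
        }
    }

    private final Map<String, Binding<?>> bindings = new LinkedHashMap<>();

    // Step 2: associate each symbol with a value parser.
    <Value> SymbolTranslatorSketch add(Symbol<Value> symbol, Function<String, Value> parser) {
        bindings.put(symbol.name, new Binding<>(symbol, parser));
        return this;
    }

    // Step 5: convert the string map into a typed map.
    TypedMap parse(Map<String, String> strMap) {
        TypedMap result = new TypedMap();
        for (Map.Entry<String, String> item : strMap.entrySet()) {
            Binding<?> binding = bindings.get(item.getKey());
            if (binding == null) {
                throw new IllegalArgumentException("unknown key: " + item.getKey());
            }
            binding.parseInto(item.getValue(), result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Step 1: define the symbols.
        Symbol<String> $userName = new Symbol<>("userName");
        Symbol<Integer> $threadCount = new Symbol<>("threadCount");

        SymbolTranslatorSketch translator = new SymbolTranslatorSketch()
            .add($userName, text -> text)
            .add($threadCount, Integer::valueOf);

        Map<String, String> strMap = new HashMap<>();
        strMap.put("userName", "bob");
        strMap.put("threadCount", "4");

        TypedMap config = translator.parse(strMap);
        int threads = config.get($threadCount); // an Integer, not the string "4"
        System.out.println(config.get($userName) + " uses " + (threads + 1) + " threads");
    }
}
```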

SymbolFormat is extensible along two axes. The first is the ability to provide whatever value formats your application needs. The second is the ability to provide whatever ConfigFormat your application needs. We’ve used a .properties file as an example, but as mentioned, Map<String, String> is ubiquitous.

Although SymbolFormat could be used for object serialization, I don’t recommend it. Once you’re working with objects instead of SymbolMaps, Java provides more convenient serialization mechanisms:

  • The Serializable interface lets you write and parse binary. Once you’ve got a binary representation, you can choose to trade CPU for disk space and IO by zipping it.
  • The Jackson ObjectMapper lets you easily write and parse JSON for objects that expose their state through getters and setters, public fields, or Jackson annotations.

9. SymbolSchema: Validating the symbols in SymbolMaps

A SymbolSchema can validate the symbols in a SymbolMap (its keys). This is occasionally useful as an assertion, especially when time elapses between the creation of a SymbolMap and its usage.

SymbolSchemaClass implements SymbolSchema which is demonstrated in TestSymbolSchema. A SymbolSchema contains a set of required symbols and a set of optional symbols. It can validate the keys of a SymbolMap, or any other set of symbols. This validation detects:

  • When required symbols are missing.
  • When an unexpected symbol is present.
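
The two checks can be sketched with plain set operations. SymbolSchemaSketch and its helper setOf are hypothetical names for this illustration, and I'm assuming validation reports problems by throwing. The library's SymbolSchemaClass may report errors differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; not the library's SymbolSchemaClass.
class SymbolSchemaSketch {

    static final class Symbol<Value> {
        final String name;

        Symbol(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return "$" + name;
        }
    }

    static Set<Symbol<?>> setOf(Symbol<?>... symbols) {
        return new HashSet<>(Arrays.asList(symbols));
    }

    private final Set<Symbol<?>> required;
    private final Set<Symbol<?>> optional;

    SymbolSchemaSketch(Set<Symbol<?>> required, Set<Symbol<?>> optional) {
        this.required = required;
        this.optional = optional;
    }

    // Throws if a required symbol is missing or an unexpected symbol is present.
    void validate(Set<Symbol<?>> present) {
        Set<Symbol<?>> missing = new HashSet<>(required);
        missing.removeAll(present);

        Set<Symbol<?>> unexpected = new HashSet<>(present);
        unexpected.removeAll(required);
        unexpected.removeAll(optional);

        if (!missing.isEmpty() || !unexpected.isEmpty()) {
            throw new IllegalArgumentException(
                "missing: " + missing + ", unexpected: " + unexpected);
        }
    }

    public static void main(String[] args) {
        Symbol<String> $url = new Symbol<>("url");
        Symbol<String> $title = new Symbol<>("title");
        Symbol<String> $author = new Symbol<>("author");
        Symbol<Integer> $bogus = new Symbol<>("bogus");

        SymbolSchemaSketch schema =
            new SymbolSchemaSketch(setOf($url, $title), setOf($author));

        schema.validate(setOf($url, $title));          // required only: passes
        schema.validate(setOf($url, $title, $author)); // with optional: passes

        try {
            schema.validate(setOf($url, $bogus)); // $title missing, $bogus unexpected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```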

One thought on “Type Safe Heterogenous Containers in Java”

  1. Haakon Løtveit

    Of course you can have maps that hold multiple types. However, you lose out on type safety. This is 100% exactly the same as in Python.

    Say you want to use Strings as keys, and store stuff:

    Map<String, Object> pythonicMap = new HashMap<>();
    pythonicMap.put("a number", Integer.valueOf(3)); // Should be autoboxed, but Explicit is better than Implicit.. :p
    pythonicMap.put("some text", "Ash nazg durbatulûk, ash nazg gimbatul,\nAsh nazg thrakatulûk agh burzum-ishi krimpatul.");

    And we can get things back out too. We just need to be explicit about what we claim that these things are.

    int number = (Integer) pythonicMap.get("a number");
    String someText = (String) pythonicMap.get("some text");

    See? Just like Python. However, since we’re leaving the type safety behind, we need to tell the compiler when we need to get some value back out.
