Value Objects in Java & Python

So far I’ve posted a Software Developer’s Reading List and Why & How I Write Java. After all that high-level jabbering it’s time for something concrete with actual code. This article discusses value objects: what they are, how to implement them, and when to use them.

I think object-oriented languages should have value objects built-in, like structs in C# (with caveats). Unfortunately they aren’t built into Python or Java (yet). However we can implement value objects on our own, and this article includes small libraries to make it easy for Java and Python.

Article outline

  1. What are value objects?
  2. How to implement value objects?
  3. When to use value objects?
  4. Java library usage
  5. Python library usage

1. What are value objects?

A value object is an object that behaves as a value, not an entity. To make sense of that sentence, let’s build up to it…

Domain model concepts

  • Domain Model: A domain model is a conceptual representation of a problem domain. Domain models are never perfect or complete, but like metaphors, analogies, or statistics, effective domain models can capture salient aspects of the subject area. Eric Evans’ seminal book Domain Driven Design discusses evolving an effective domain model and translating it into a software representation. Of particular interest here is the fact that nouns in the domain model are either values or entities.
  • Value: A value is an abstract attribute which is timeless and unchanging. This is best illustrated by examples…
    • Blue is a value. It is used as an attribute in various contexts: the sky is blue, my jeans are blue, my eyes are blue, my favorite color is blue, etc. While I could change my favorite color to red, the concept of blue itself is timeless and unchanging.
    • 21 is a value: 21 is the result of 10 + 11, it is the best score in blackjack, it is the drinking age in the US, etc. Your age may change from 21 to 22, but the concept of 21 is timeless and unchanging.
    • $5 is a value: a bill can be worth $5, a debt can be $5, a price can be $5, etc. A product’s price could change from $5 to $7, but the concept of $5 is timeless and unchanging. $5 is a composite value containing the currency value $ and the integer value 5. (Money values are the subject of the money pattern in PEAA.)
    • A mailing address is a value: you can write it on envelopes, you can type it into applications, you can tell it to people, etc. In fact a mailing address gets copied all over the place. You might move to a new address, but all those old copies still reference the same concept: your old address. A mailing address is a composite value containing a street value, a city value, a country value, etc.
    • Other examples of composite values include attributes like points in a coordinate system, a URL, or a string of characters.
  • Entity: An entity is something that exists. An entity has an identity over time that is independent from the entity’s current state (attributes). A person is an entity: their identity persists beyond changes in appearance, age, status, relationships, and even beyond death. The same is true for any discrete thing made of physical atoms including my dog, my coffee mug, a particular $5 bill, etc. Entities can also be non-physical abstract nouns or legal fictions such as Google, the United States, or a particular department in a university. These organizations have an identity. Entities with different identities are different even if they happen to have the same state (attributes). So two guys named Bob are different even if they have matching attributes such as same name, age, hair color, etc.
  • Identifier (Id): Identifiers (ids) are what we use to distinguish the identity of entities. Humans can use faces as identifiers to recognize other humans. Software can similarly use biometrics such as facial recognition, fingerprints, eye scanning, etc. We also use names as identifiers, which works ok as long as each name is unique within the namespace. Governments issue national ids to uniquely identify their citizens. The same approach is used with vehicles, guns, pets, etc. In software systems, entities that need to be persisted or distributed are often assigned a globally unique identifier such as an auto-increment database id, a UUID, or a URI. Notice that adding an id to a relational database table causes the rows to behave like entities instead of values. (Not to get lost in metaphysics, but I suppose physical location at a point in time can also be viewed as an identifier: if two people (collections of physical atoms) are standing at the same place at the same time then we know they are the same person. If you’ve got a little free time, perhaps consider this thought experiment: would you use a teleporter if one existed? If you’ve got a lot of free time, try debating it with a friend.)

Programming language concepts

Domain models are translated into programming constructs…

  • Primitive Data Types: In a programming language or data definition language, the primitive data types are the elemental building blocks of data. The family of primitives is fairly standard, although availability and granularity vary between languages. Atomic types include boolean, integer (short, int, long), real (float, double), string, character, and binary (byte, bit). Homogeneous composite types include list, set and map. Heterogeneous composite types are provided by structs.
  • Struct Type: A struct (or record) is a mapping from field name to field data. In many environments struct field lookup is accomplished via dotted access syntax (x.y). With mutable structs the field mapping can be updated, often via dotted assignment syntax (x.y = z). A struct type defines a field type for every field name. So a struct type is just a mapping from field name to field type. A struct type can be viewed as the schema for a family of isomorphic structs.
  • Object: An object combines internal state with external behavior, which is at the heart of object-oriented programming (and also the actor model). In OO languages, an object’s internal state is a hidden struct: the object’s fields. An object’s external behavior is a collection of methods provided to clients for interacting with its internal state. To ensure that the internal state is properly hidden (encapsulated), an object’s clients must be unaware of which methods simply return fields on the struct, and which methods return the results of derived computations (this is the uniform access principle).
  • Class: A class is a family of objects that share the same struct type and methods (behavior). In many languages classes do more than this, but for our purposes here we can ignore these extras (e.g. implementation inheritance hierarchies, nominal type hierarchies).
  • Entity Object: An entity object is an object that behaves like an entity: it has an identity and can have mutable state. In Java and Python instances of user-defined classes are entity objects by default. The runtime system assigns internally unique identifiers to all objects, and by default these ids are used for object equality and hashing. (In Java you can access the system id’s hash via System.identityHashCode(), and compare ids via the == operator. In Python you can access the system id via id(), and compare ids via the is operator.)
  • Value Object: A value object is an object that behaves like a value: it doesn’t have an identity and its state is immutable (timeless and unchanging). If two value objects have the same state (fields) and behavior (class), then they should be referentially transparent: it should be transparent to the program’s semantics which object we reference, use, return, pass around, distribute, etc. In the next section we discuss how to implement value objects.
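
The default entity behavior described above can be seen in a few lines of Python. This is only an illustrative sketch: the Person class and its fields are hypothetical names invented for the example, not part of any library discussed in this article.

```python
class Person(object):
    """A plain user-defined class: instances are entity objects by default."""

    def __init__(self, name, age):
        self.name = name
        self.age = age


bob1 = Person("Bob", 30)
bob2 = Person("Bob", 30)

# The two Bobs have matching state...
same_state = (bob1.name, bob1.age) == (bob2.name, bob2.age)

# ...but default equality falls back to comparing system ids (identity).
default_equal = bob1 == bob2
same_identity = bob1 is bob2
ids_differ = id(bob1) != id(bob2)

# An entity can change state while keeping its identity.
id_before = id(bob1)
bob1.age = 31
identity_kept = id(bob1) == id_before
```

Here same_state is true while default_equal and same_identity are false, which is exactly the "two guys named Bob" situation from the entity definition.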

Tangentially related concepts

Feel free to skip these, but I include them to hopefully clarify some terminology…

  • Other senses of the term “value”: The term “value” is unfortunately overloaded. In this article we use value to mean “non-entity”. I wish there was a different term for non-entity, but I haven’t been able to think of one, and at this point the term “value object” has historical precedent. The Wikipedia article for value discusses a different sense of the word: an expression that cannot be evaluated any further. So the expression “1 + 2” is not a value, but “3” is. That article also mentions l-values vs. r-values, which is the distinction between expressions on the left and right of an assignment. More generally, variables are often said to store values, and maps/dicts are mappings from keys to associated values.
  • Pass-by-value & pass-by-reference: These terms deceptively sound relevant to value objects. The terminology around evaluation strategy is rather confusing. Java and Python are pass-by-value only. This means function parameter variables are new memory slots with the parameters’ data copied into them at call time. However things get a little tricky for objects (non-primitives) because the copied parameter data is the object reference (a special value containing the object’s system id). So the reference itself is passed-by-value. However there is only one underlying referenced object, so mutations to that shared object are visible to both the caller and callee. In addition to pass-by-value, C++ and C# also support pass-by-reference, which means that the parameter variable is not a new memory slot, but instead the memory slot is shared between caller and callee. A good way to understand this distinction is to understand the swap example here. In conclusion, evaluation strategies involve value & reference variables, which are different from the subject of this blog post: value & entity objects.
  • JavaBean vs. POJO vs. Data Transfer Object (DTO): Stack Overflow answer
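
To make the pass-by-value point above concrete, here is a small Python sketch. The function names rebind and mutate are hypothetical, chosen only for this illustration. Rebinding the parameter changes only the callee's own slot, while mutating through the copied reference changes the one shared object.

```python
def rebind(items):
    # The parameter is a new slot holding a copy of the reference.
    # Rebinding it does not affect the caller's variable.
    items = ["replaced"]
    return items


def mutate(items):
    # Mutating through the copied reference changes the shared object,
    # so the caller sees the change.
    items.append("added")


original = ["a"]

rebind(original)
after_rebind = list(original)   # snapshot: still ['a']

mutate(original)
after_mutate = list(original)   # snapshot: now ['a', 'added']
```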

2. How to implement value objects?

As I mentioned, I think value objects should be built into object-oriented programming languages, like structs in C# (with caveats). There is an official proposal (Reddit thread) from John Rose, Brian Goetz, and Guy Steele to add a similar construct to Java. I hope it gets adopted. The proposal discusses interesting language design issues and opportunities.

Even without direct language support it’s fairly straightforward to implement value objects (and the Java and Python libraries introduced later make it quite easy). By following five rules we can create user-defined classes whose instances behave like value objects. Our goal is for value objects with the same state (fields) and behavior (class) to be referentially transparent: it should be transparent to the program’s semantics which instance we reference, use, return, pass around, distribute, etc. (The Java proposal says “code like a class, works like an int”, and links to a description of value-based classes.)

Rule 1: Override the equality method to compare state & behavior

Override the equality method to return true if the objects’ state (fields) and behavior (classes) are equal. The method to override in Java is Object.equals() and in Python it is object.__eq__().

(As an advanced note, we can omit fields that are derivable from other fields. We must also omit any fields that are used for locally caching computations because those are implementation details unrelated to the object’s logical state.)
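
As a rough sketch of rule 1 in Python (not the code from the library introduced later), a hand-written __eq__ might look like the following. Point, Pixel, and the _state() helper are hypothetical names made up for this example.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _state(self):
        # The fields that make up the object's logical state.
        return (self.x, self.y)

    def __eq__(self, other):
        # Behavior: the classes must match exactly.
        if type(self) is not type(other):
            return NotImplemented
        # State: the fields must match.
        return self._state() == other._state()

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result


class Pixel(Point):
    """Same fields as Point, but a different class (behavior)."""


same = Point(1, 2) == Point(1, 2)              # same class, same state
different_state = Point(1, 2) != Point(1, 3)   # same class, other state
different_class = Point(1, 2) == Pixel(1, 2)   # same state, other class
```

Note that in Python 3 a class that defines __eq__ without __hash__ becomes unhashable, which is one more reason rule 2 always accompanies rule 1.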

Rule 2: Override the hashing method to hash state & behavior

Objects should have the same hash if they are equal, so whenever we override equality, we should also override hashing. For value objects this means returning a hash of the object’s state (fields) and behavior (class).

The method to override in Java is Object.hashCode() and in Python it is object.__hash__(). In Java you can return Objects.hash(getClass(), field1, field2, ...). In Python you can return hash((type(self), field1, field2, ...)).
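
Here is a minimal Python sketch of rules 1 and 2 together. The Point class is a hypothetical example rather than the library's implementation; the hash covers the class plus the fields, mirroring what the equality method compares.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Needed on Python 2; harmless on Python 3.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Hash the behavior (class) together with the state (fields).
        return hash((type(self), self.x, self.y))


a = Point(1, 2)
b = Point(1, 2)

hashes_match = hash(a) == hash(b)   # equal objects share a hash
lookup = {a: "found"}[b]            # b finds the entry stored under a
unique = len({a, b, Point(3, 4)})   # a and b collapse into one set member
```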

Rule 3: Disregard the system provided object ids

All objects in Java and Python have system provided object ids, including our user-defined value objects. However values aren’t supposed to have an identity, so these ids should never be compared for value objects (doing so would break referential transparency). In Java this means value objects should be compared for sameness via Object.equals(), and not via the == operator. In Python this means value objects should be compared for sameness via == (which calls object.__eq__()), and not via the is operator.

One way to remember rule #3 is to compare value objects as you would compare strings. This is because strings behave like built-in value objects, which is why it’s usually a bug to compare the system id of string objects.
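
The difference between the two comparisons in Python can be sketched as follows. Color is a hypothetical value class written just for this example.

```python
class Color(object):
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash((type(self), self.name))


blue1 = Color("blue")
blue2 = Color("blue")

# Correct for value objects: compares state via __eq__.
equal_by_state = blue1 == blue2

# Incorrect for value objects: compares system ids, which differ
# because these are two separately created objects.
same_object = blue1 is blue2
```

Here equal_by_state is true and same_object is false, so code that used the is operator would treat two identical values as different.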

Rule 4: The struct should be immutable

Values are timeless and unchanging, so value objects should be immutable. This means a value object’s fields shouldn’t change after object creation.

In Java rule #4 means all fields should be final. In Python rule #4 means having the discipline not to set or delete any fields after the object is created.
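
If you would rather have the runtime enforce rule 4 than rely on discipline, one option in newer Python versions is a frozen dataclass. This is a different technique from the library described later, and it assumes Python 3.7 or newer; the Image class and its fields are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError, replace


@dataclass(frozen=True)
class Image:
    # frozen=True blocks attribute assignment after __init__ and,
    # together with the generated __eq__, also generates __hash__.
    url: str
    height: int
    width: int


img = Image("http://image.com", 20, 30)

try:
    img.height = 40          # attempted mutation
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# "Changing" a value means creating a new value; the original is untouched.
taller = replace(img, height=40)
```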

Rule 5: The struct fields should be values

Value objects are composite values, which means their nested fields should also be values. In Java and Python this means only referencing…

  • Atomic primitives: These include boolean, integer (short, int, long), real (float, double), string, char, byte, and also enums. As an example, an RssImage value object might include a URL string and integers for height and width.
  • Other user-defined value objects: As an example, an RssArticle value object might include a nested RssImage value object, along with the article’s title and URL.
  • Immutable collections that behave as values: Both Java and Python have immutable collections available that behave as values. For Java, Guava provides ImmutableList, ImmutableSet, ImmutableMap, ImmutableMultiset, ImmutableMultimap, ImmutableBiMap, and Optional (which was apparently added to Java 8). For Python, tuple and frozenset are built-in, and I’ve included frozendict and Opt in the Python Library. As an example of a nested collection, an RssFeed value object might include an ImmutableList of RssArticle value objects, along with the feed’s title and URL.

(In Domain Driven Design Eric Evans says that value objects can reference entity objects. He mentions the example of a value object representing a driving route between several cities which are entity objects. This makes sense to me at the domain model level, but I don’t think those logical references should be implemented as programming language references. Instead the route value object could store the ids of referenced cities. Presumably the ids would be ints or strings, so the route object would remain composed of values. If we instead directly reference entity objects from value objects, I think we would lose some of the benefits of value objects, particularly in the areas of functional design, convenient distribution, and denormalized persistence. Note that a Java description of value-based classes does allow references to mutable objects, so presumably there are multiple perspectives on the issue. Anyway, this article’s Java and Python libraries will work fine if you want to reference entities, just don’t reference objects whose equality relationships and hashes change with mutation. In Java that includes ArrayList, HashSet, and HashMap, and in Python that includes list, set, and dict.)
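
The RssImage, RssArticle, and RssFeed examples above can be sketched in Python using only built-ins. In this sketch namedtuple is a stand-in for real value classes (it ignores the class when comparing), and the field names are assumptions made for illustration.

```python
from collections import namedtuple

RssImage = namedtuple("RssImage", ["url", "height", "width"])
RssArticle = namedtuple("RssArticle", ["title", "url", "image"])
RssFeed = namedtuple("RssFeed", ["title", "url", "articles"])


def make_feed(articles):
    return RssFeed("Feed", "http://feed.com", articles)


image = RssImage("http://image.com", 20, 30)
article = RssArticle("Post", "http://feed.com/1", image)

# Values all the way down: a tuple of articles is itself a value.
feed_a = make_feed((article,))
feed_b = make_feed((article,))
feeds_equal = feed_a == feed_b
feed_hashes_equal = hash(feed_a) == hash(feed_b)

# A mutable list breaks the chain: the feed can no longer be hashed.
broken_feed = make_feed([article])
try:
    hash(broken_feed)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# frozenset is the built-in immutable set, usable as a dict key.
tags = frozenset(["java", "python"])
tag_lookup = {tags: "both"}[frozenset(["python", "java"])]
```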

Library support for implementing value objects

Later in this article we introduce Java and Python libraries that can do most of the boilerplate for you. For more details about implementing value objects take a look at how these libraries work.

3. When to use value objects?

I view choosing between entity or value as one of the three core decisions I make when designing a class:

  1. Mutable or immutable? Do objects of this class need to be mutable? My default choice is immutability.
  2. Entity or value? If the objects are mutable, they must be entities. If the objects are immutable, then we decide whether they should behave as entities or values. My default choice is to make them value objects.
  3. Serializable? Should the object be serializable? Serialization is necessary for persistence and distribution, and both Java and Python provide built-in serialization mechanisms. Java has Serializable (and Jackson is a popular third-party library for JSON), and Python has pickle. Although convenient, these serialization mechanisms effectively publish the hidden struct type of your class. This makes the class harder to refactor once its instances are persisted or depended on by remote clients.

Benefits of value objects

Rich Hickey’s keynote The Value of Values may be a good place to start. He seems to oversell a bit, but he’s smarter than me so I’ll reserve judgement. In my own experience, here are the situations where I’ve found value objects to be useful:

  • Functional design: State mutation and side effects are complexity-creating monsters. We can combat this complexity by employing functional design when “appropriate”, regardless of programming language used. Functional design increases the amount of referentially transparent code, which decreases the amount of code where sequencing and distribution can cause confusion and bugs. An important difference between object-oriented programming and functional programming is what gets passed around. In OO we generally pass around mutable entities, and in FP we pass around immutable values. So embracing functional design requires the ability to define new value types. We can do this in OO languages using value objects. I usually start programming in a functional style, and then I add mutable objects when the problem space forces me to. So most of my objects start as value objects until proven otherwise. Here is Michael Feathers’ perspective on blending FP and OO. Newer languages such as Scala also advocate combining FP and OO.
  • Domain models: As I mentioned in the definition of value objects, nouns in the domain model are either entities or values. We can implement composite values from the domain model as value objects.
  • Hash-based collections: Value objects’ equality and hashing are defined in terms of state equality and hashing. There are many situations where this content hashing behavior is what you want. For example value objects make great keys for intraprocess caches.
  • Flyweight pattern: The flyweight pattern reduces memory usage by ensuring that there aren’t duplicate objects in memory. Instead of directly instantiating objects, clients acquire objects via a factory method that handles duplication prevention. Since instances are shared between unrelated usage contexts, they must be immutable. Value objects are a natural fit for this pattern due to their immutability and referential transparency.
  • Cached properties: Value objects (as defined in this article) are transitively immutable. This means a derived property on a value object can be cached indefinitely after it is first computed, assuming the property returns another value.
  • Convenient testing: Value objects work great with assertEquals(). Different instances of value objects can be equal because their equality and hashing are defined in terms of state. This is often useful in tests. Instead of inspecting parts of an object returned by the component under test, you can directly compare it to a manually created expected result. (I guess this is syntactically like a crappy version of the predicate part of pattern matching.)
  • Convenient distribution: How can you send an entity to another process? You generally have to devise globally unique identifiers (ids) so processes can agree on which entities they are talking about. Values are easier to distribute because you can just send a copy of the state and forget about it.
  • Denormalized persistence: When persisting mutable entities, normalization is usually required so there is only one canonical place to do updates. For a particular entity everyone references the same copy of the data, often via a logical id reference. There are downsides: the logical indirection hinders performance, the centralization can create a bottleneck or single point of failure, and translating the runtime object graph from physical pointers to logical ids contributes to the impedance mismatch. Fortunately we have more flexibility with value objects due to their immutability and referential transparency. We can make copies of value objects without affecting program behavior, which means normalization is no longer necessary. So we can embed nested value objects onto containing objects’ tables (as discussed in DDD and the embedded value pattern in PEAA). We can also serialize an entire object with a subgraph and store it as text, json, or binary (as discussed in the serialized LOB pattern in PEAA). This leads you down the path towards schemaless persistence, which can be done with NoSQL databases or traditional relational databases.
  • Efficiency?: The official proposal for adding value objects to Java describes how natively supported value objects could be more efficient than the currently supported reference objects. This is because the runtime wouldn’t have to keep track of their identities and they could be placed on the stack instead of the heap. This benefit won’t be available in Java or Python unless value objects become natively supported.
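
Two of the benefits above, hash-based collections and convenient testing, can be sketched together in Python. LatLng, describe(), and the module-level cache are hypothetical names invented for this example.

```python
class LatLng(object):
    def __init__(self, lat, lng):
        self.lat = lat
        self.lng = lng

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return (self.lat, self.lng) == (other.lat, other.lng)

    def __hash__(self):
        return hash((type(self), self.lat, self.lng))


_cache = {}
computed = []  # records each cache miss, to show the cache is working


def describe(point):
    # Value objects make good cache keys: any instance with the same
    # state finds the same cache entry.
    if point not in _cache:
        computed.append(point)
        _cache[point] = "(%s, %s)" % (point.lat, point.lng)
    return _cache[point]


first = describe(LatLng(1, 2))
second = describe(LatLng(1, 2))   # a different instance, but a cache hit


def midpoint(a, b):
    return LatLng((a.lat + b.lat) / 2.0, (a.lng + b.lng) / 2.0)


# In a test, compare the whole result to a hand-built expected value
# instead of inspecting its parts one field at a time.
result_matches = midpoint(LatLng(0, 0), LatLng(2, 4)) == LatLng(1.0, 2.0)
```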

Books discussing value object usage

Eric Evans’ Domain Driven Design (2003) has the best description of entities vs. values that I know of (chapter 5). Martin Fowler mentions value objects in Refactoring (1999) and Patterns of Enterprise Application Architecture (2002). In Refactoring there are three sequential related patterns: replace data value with object, change value to reference, and change reference to value. In PEAA the related patterns are: value object, money, embedded value, and serialized LOB.

The Gang of Four’s Design Patterns (1995) doesn’t use the term “Value Object” but the flyweight pattern is closely related. They mention the distinction between intrinsic and extrinsic state. Flyweight objects are shared so they should only store intrinsic state: the state that is common to all of the usage contexts. If you and I both point at the same DateTime object, then an example of intrinsic state would be the Unix timestamp. An example of extrinsic state would be a particular format string. You and I might want to format the same date differently, so we provide this extrinsic state as a parameter to the format() method.
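
The intrinsic vs. extrinsic distinction can be sketched with Python's built-in datetime, whose instances are immutable. This assumes Python 3 for the timezone import, and the two format strings are arbitrary examples.

```python
from datetime import datetime, timezone

# Intrinsic state: the moment in time, shared by everyone holding the object.
shared = datetime.fromtimestamp(0, tz=timezone.utc)

# Extrinsic state: each caller passes their own format string as a parameter.
mine = shared.strftime("%Y-%m-%d")
yours = shared.strftime("%d/%m/%Y")
```

The shared object is never modified, so mine is "1970-01-01" and yours is "01/01/1970" without either caller affecting the other.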

Articles related to value objects

4. Java library usage

(Edit: I’ve been told that AutoValue and Lombok are alternative ways to implement value objects in Java, both of which use annotations + code generation. My library is intentionally low-tech and GWT compatible without a fuss. However code generation can provide a better syntax, including boilerplate reduction only possible via metaprogramming.)

The Java library code is published on Github here: https://github.com/stevewedig/blog/. It is also published to Maven for easy usage as a dependency. The value objects library is the first in a series of libraries I’ll be adding to this project and describing on this blog.

To get started, take a look at TestValueMixinExample. Implementing value objects is as easy as inheriting from ValueMixin and implementing the fields() array, which is an array of alternating field names and field values. ValueMixin implements toString(), equals(), and hashCode() for you. The toString() implementation renders the class name and internal struct using Guava’s toStringHelper(). As an example, an Image string might look like: "Image{[url=http://image.com, height=Optional.of(20), width=Optional.of(30)]}".

Although we can usually inherit from ValueMixin, this isn’t always possible because Java only supports single implementation inheritance. In this situation implementing value object behavior requires copy and pasting a few methods, as demonstrated in TestValueWithoutMixin. I’ve tried to minimize the amount of code duplication by pulling out the logic into the ObjectHelperClass.

The library also provides EntityMixin which only implements toString(), in the same way as ValueMixin. EntityMixin is demonstrated in TestEntityMixinExample. As a practice most of my classes inherit from either ValueMixin or EntityMixin (unless they need a different superclass for some reason). As shown in TestEntityMixinExample, subclasses of these mixins can use assertStateEquals() and assertStateNotEquals() for testing purposes. These methods are convenient when you want to compare state regardless of whether the objects are entities or values, and regardless of whether their classes match.

Lastly, you’ll want to use Guava’s immutable collections and Optional in your value objects, as discussed in the how to implement value objects section. Instances of these classes behave as values, which is demonstrated in TestGuavaValues.

5. Python library usage

(Edit: Paddy3118 on Reddit has pointed out that subclassing namedtuple is an alternative to subclassing ValueMixin from my library. On one hand I prefer using built-ins as much as possible, on the other hand the syntax is a bit rough in my opinion, especially with default values. Perhaps in the next version of my library I’ll keep the syntax the same, but have it create a subclass of a namedtuple underneath the hood.)
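
For reference, the namedtuple approach mentioned in the edit above might look roughly like this. It is a sketch using only the standard library, and the Image fields and defaults are assumptions made for illustration.

```python
from collections import namedtuple


class Image(namedtuple("Image", ["url", "height", "width"])):
    # An empty __slots__ prevents a per-instance __dict__, so no extra
    # attributes can be attached after creation.
    __slots__ = ()

    def __new__(cls, url, height=None, width=None):
        # Default values require overriding __new__, which is the
        # rough part of the syntax.
        return super(Image, cls).__new__(cls, url, height, width)


a = Image("http://image.com", 20, 30)
b = Image("http://image.com", 20, 30)

equal = a == b
hashes_equal = hash(a) == hash(b)
defaults_applied = Image("http://image.com") == Image("http://image.com", None, None)

try:
    a.height = 40
    mutation_blocked = False
except AttributeError:
    mutation_blocked = True

# Caveat: tuple equality ignores the class, so an Image equals a plain
# tuple with the same contents. Rule 1 asks for the class to match too.
equals_plain_tuple = a == ("http://image.com", 20, 30)
```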

The Python library code is published on Github here: https://github.com/stevewedig/value-objects-in-python/. It is also published to PyPI for easy installation with pip.

To get started, take a look at test_value_mixin_example.py. Implementing value objects is as easy as inheriting from ValueMixin and indicating the field names via the __init__ method’s parameter names. As a convenience you don’t have to set the field attributes in the body of __init__ because the mixin handles that for you.

ValueMixin implements __str__, __unicode__, __repr__, __eq__, and __hash__ for you. The logic for these methods has been pulled out into the ObjectHelper. The __str__ implementation renders the class name and internal struct. As an example, an Image string might look like: "Image{url=http://image.com, height=Opt(20), width=Opt(30)}".

The library also provides EntityMixin which only implements __str__, __unicode__, and __repr__, in the same way as ValueMixin. EntityMixin also has the same convenience of not having to set the attributes in __init__. EntityMixin is demonstrated in test_entity_mixin_example.py. As a practice most of my classes inherit from either ValueMixin or EntityMixin.

The library also provides 3 tools that I use a lot with value objects:

Lastly, Vagrant (with VirtualBox) is a great way to run sandboxed experiments on your dev machine. I use Vagrant plus Tox to run this library’s tests in Python 2.x and 3.x without affecting my dev machine. I’ve included the three shell scripts I use. After going through the quick Vagrant tutorial you’ll be ready to try them out:

  1. Setup the VM on your dev machine: 1_enter_vm.sh
  2. Provision and test inside the VM: 2_run_inside_vm.sh
  3. The provision script: 3_provision.sh

 


5 thoughts on “Value Objects in Java & Python”

  1. Morten Christensen

    I have also been looking at how to make and maintain java value objects without fuss. My proposed solution is a bit different as it is simply an annotation processor based on interfaces that generates the necessary code – no need to inherit from anything and no dependencies in generated code. Also I needed something which is extremely customizable and uses standard java stuff without risk of tool integration problems. You can check my take on this at “http://valjogen.41concepts.com”. Let me know what you think?

    1. Steve Wedig Post author

      Cool, I think annotations are the path to the most pleasant syntax possible without language support. I think AutoValue can also be added to your list of libraries that also take this approach. My hesitation with adopting annotations is for two reasons: 1. It makes GWT compatibility more difficult, and 2. With too much code generation going on, I worry about losing a clear mental model for what is going on. I’m open to the possibility that #2 is an invalid concern, and that for Java I should jump on the annotation bandwagon. TBD I guess.

      1. Morten Christensen

        The VALJOGen annotations are source level only, so they should not affect anything at runtime like serialisation (not tried GWT though). The code generated will in most cases be exactly as if written by hand using modern Java 7 features like the Objects class. There are no runtime dependencies in the generated code (apart from Java itself and your own stuff). The generated code will be visible in a folder like “generated-sources” so you will never be in doubt what is going on. There is no magic going on that affects your “mental model”.

        1. Steve Wedig Post author

          Thanks for the added information. Admittedly I haven’t dug into the details of annotation based code generation.

