Wednesday, November 28, 2018

OOP Concepts: The Aggregate Class

One fundamental principle in object-oriented programming (OOP) is hiding data. The idea is that you use a class to encapsulate data associated with some entity in the business model. All data access and manipulation should be performed by methods on the class.

This principle also applies to collections. In this post, I'll cover the details of a specific pattern for hiding collections, called the Aggregate Class. An Aggregate Class follows the same principles as any other class (data hiding, separation of concerns, and so on) and offers the same benefits: reuse, contained changes, and centralized business logic.

The Problem With Functional Style in OO Languages


With so many developers using ORMs and Linq (and similar lambda-based operations in languages other than C#), we see a trend toward functional-style programming in OO languages such as C# and Java. We should still be careful to protect our data, since OO languages don't work the same way functional languages do.

In OO languages, collections such as lists and arrays still allow you to change the state of each element. Even if the collection itself is read-only, it only holds references to the underlying objects, and those objects are still mutable. You could, perhaps, use structs in C#, which are value types, so reading one out of a read-only collection gives you a copy rather than a reference. But then again, you might not control the types in question, and you'd still have to go the extra mile to handle state changes properly.
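To make the read-only point concrete, here's a minimal sketch. The Teacher type, its Email property, and the ReadOnlyDemo helper are hypothetical, for illustration only. List<T>.AsReadOnly() returns a ReadOnlyCollection<T>, which has no public Add or Remove, but its elements are the very same object references as in the original list.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical mutable class, similar to a typical POCO.
public class Teacher
{
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class ReadOnlyDemo
{
    // Wraps a list in a read-only view, then mutates an element through that view.
    public static ReadOnlyCollection<Teacher> ExposeAndMutate(List<Teacher> teachers)
    {
        ReadOnlyCollection<Teacher> readOnly = teachers.AsReadOnly();

        // readOnly.Add(...) would not compile: ReadOnlyCollection<T> has no public Add.
        // But the indexer returns the same Teacher reference stored in the list,
        // so the element's state can still be changed.
        readOnly[0].Email = null;

        return readOnly;
    }
}
```

After calling ExposeAndMutate, the original list's first teacher has a null Email: the wrapper protected the list's membership, not its items.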

If you're really doing functional programming, you won't mutate the state of a record. Some OO concepts have bled into functional languages like Lisp and F#. Basically, the lines have blurred on both sides. When it comes to OO languages, classes are mutable by default. The whole point is to mutate state within a class instance! So, we need to be careful to protect the state of collections from corruption by external manipulation.

Enter the Aggregate

The Aggregate Class is just the thing to keep our data within collections protected. The aggregate class helps by containing changes to the collection. Also, it gives us a centralized place to organize our collection logic. That's really important too!

As an example, let's consider an application for managing people within a school system. There are several types of people in a school; the major classifications are students, teachers, staff, and administrators. There may be a database table containing basic information about all persons: name, phone number, etc. But each type of person will also have information specific to that type.

Specifically, a teacher may be tenured. Say a teacher earns tenure after five years. You can write a filter using Linq to get all tenured teachers like this:


var tenuredTeachers = persons.Where(
    person => person.Role == "Teacher" && 
    (DateTime.Now - person.StartDate).Days > (365 * 5)
);

As you can see, it can be a little messy. And of course, this encourages copy-paste coding because you might need the logic elsewhere. How do we fix this?

First of all, we'll make a Teacher class and put an IsTenured property on that to contain the logic.

var tenuredTeachers = persons.Where(
    person => person is Teacher teacher && teacher.IsTenured
);

Already looking cleaner!
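Here's one way the Person and Teacher types behind that filter might look. This is a sketch: StartDate comes from the earlier Linq filter, while Name and the TenureYears constant are hypothetical. It uses DateTime.AddYears instead of counting 365-day blocks, so leap years don't skew the result.

```csharp
using System;

public class Person
{
    public string Name { get; set; }
    public DateTime StartDate { get; set; }
}

public class Teacher : Person
{
    // Hypothetical constant for the "tenure after five years" rule.
    private const int TenureYears = 5;

    // The tenure rule now lives in one place instead of in every Linq query.
    public bool IsTenured => StartDate.AddYears(TenureYears) <= DateTime.Now;
}
```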

Next, let's do away with the "person is Teacher" by using OfType<Teacher>:

var tenuredTeachers = persons.OfType<Teacher>().Where(
    person => person.IsTenured
);

Even better! OfType<T> filters the collection down to the elements of the specified subtype. But must we repeat this code every time we need teachers? No! We can use the Aggregate Class. Let's make one like this:

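class Persons
{
    private readonly IEnumerable<Person> _persons;

    public Persons(IEnumerable<Person> persons) { _persons = persons; }

    public IEnumerable<Teacher> Teachers { get { return _persons.OfType<Teacher>(); } }
}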

And we can use that to get teachers wherever we need them. But we can go a bit further and create a Teachers Aggregate Class too.

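class Persons
{
    private readonly IEnumerable<Person> _persons;

    public Persons(IEnumerable<Person> persons) { _persons = persons; }

    public Teachers Teachers { get { return Teachers.Get(_persons); } }
}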

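class Teachers
{
    private readonly IEnumerable<Teacher> _teachers;

    // factory method
    public static Teachers Get(IEnumerable<Person> persons)
    {
        return new Teachers(persons.OfType<Teacher>());
    }

    private Teachers(IEnumerable<Teacher> teachers)
    {
        _teachers = teachers;
    }

    // Wrap the filtered sequence in a new aggregate rather than returning a raw IEnumerable<Teacher>
    public Teachers TenuredTeachers { get { return new Teachers(_teachers.Where(t => t.IsTenured)); } }
}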

And with that, we can return only the tenured teachers and contain the logic. Now in our business objects, controllers, or wherever we need to get tenured teachers, we can use this Teachers Aggregate.

var tenuredTeachers = persons.Teachers.TenuredTeachers;

And there we have some nice clean code!
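Putting the pieces together, here's a minimal self-contained sketch of the Persons and Teachers aggregates. The Student class and the Count and Names members are hypothetical additions so the result can be inspected without handing out Teacher objects. Note that `Teachers.Get(...)` inside Persons compiles even though a property is also named Teachers: C# resolves the static call to the type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public DateTime StartDate { get; set; }
}

public class Teacher : Person
{
    public bool IsTenured => StartDate.AddYears(5) <= DateTime.Now;
}

public class Student : Person { }

// Aggregate over all people in the school.
public class Persons
{
    private readonly List<Person> _persons;

    public Persons(IEnumerable<Person> persons)
    {
        // Copy the input so outside code can't add or remove people behind our back.
        _persons = persons.ToList();
    }

    public Teachers Teachers => Teachers.Get(_persons);
}

// Aggregate over teachers only.
public class Teachers
{
    private readonly IEnumerable<Teacher> _teachers;

    private Teachers(IEnumerable<Teacher> teachers)
    {
        _teachers = teachers;
    }

    // Factory method: narrows a sequence of people down to teachers.
    public static Teachers Get(IEnumerable<Person> persons) =>
        new Teachers(persons.OfType<Teacher>());

    public Teachers TenuredTeachers => new Teachers(_teachers.Where(t => t.IsTenured));

    // Hypothetical read-only queries: callers see counts and names, not Teacher objects.
    public int Count => _teachers.Count();
    public IEnumerable<string> Names => _teachers.Select(t => t.Name);
}
```

Usage matches the line above: `var tenuredTeachers = persons.Teachers.TenuredTeachers;` returns another Teachers aggregate, so any further tenure-related logic has an obvious home.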

Applying Functions to Aggregates

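We want to protect the data within the collections. One way to do that is to pass functions into the Aggregate Class instead of pulling the collection out of it. It resembles the visitor pattern (strictly speaking, it's closer to internal iteration: the aggregate loops over its own items and calls your function on each one), and it goes a little like this: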

persons.Teachers.Apply(t => notification.Notify(t) );

// or

persons.Teachers.Apply( notification.Notify );

In this case, we're passing the "Notify" function to the Teachers Aggregate, which calls it once for each teacher. Callers never get hold of the underlying collection itself.
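The post doesn't show Apply itself, so here's a minimal sketch of what it could look like. The Notification class and its Sent list are hypothetical stand-ins for the `notification` object above, and the Teachers constructor is public only to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
}

public class Teachers
{
    private readonly List<Teacher> _teachers;

    public Teachers(IEnumerable<Teacher> teachers)
    {
        _teachers = teachers.ToList();
    }

    // The aggregate drives the iteration; callers only supply what to do per teacher.
    public void Apply(Action<Teacher> action)
    {
        foreach (var teacher in _teachers)
            action(teacher);
    }
}

// Hypothetical notification service that records who was notified.
public class Notification
{
    public List<string> Sent { get; } = new List<string>();

    public void Notify(Teacher teacher) => Sent.Add(teacher.Name);
}
```

Both call styles from the post work against this sketch: `teachers.Apply(t => notification.Notify(t))` and the method-group form `teachers.Apply(notification.Notify)`.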

Even with the Apply function, we aren't really protecting the data unless the Teacher class protects its own data. A typical pattern you'll see is that the Teacher or Person class is a POCO, meaning it just has public properties. Such POCOs are really just DTOs: they're intended for transferring data, and you shouldn't be manipulating that data all over your code. This is where you see a real divergence between the intention of OO and how the languages are often used in practice.
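By contrast with a POCO, here's a minimal sketch of a Teacher that protects its own data. The ChangeEmail method and its "@" check are hypothetical stand-ins for whatever rules your business model actually has. Callers can still read Email, but a statement like `teacher.Email = null;` no longer compiles, because the setter is private.

```csharp
using System;

public class Teacher
{
    public string Name { get; }

    // Readable by anyone, but only changeable through ChangeEmail.
    public string Email { get; private set; }

    public Teacher(string name, string email)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
        ChangeEmail(email);
    }

    // The one place where the (hypothetical) email rule is enforced.
    public void ChangeEmail(string email)
    {
        if (string.IsNullOrWhiteSpace(email) || !email.Contains("@"))
            throw new ArgumentException("A valid email address is required.", nameof(email));
        Email = email;
    }
}
```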

On the flip side, functional programming also operates on records. But in that case, the default is to create new records as the result of applying a function. A Map method in a Teachers class might look like this:


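// Assumes Teacher has a Clone() method that returns a Teacher
// (ICloneable.Clone returns object, so that would need a cast)
public IEnumerable<Teacher> Map( Func<Teacher, Teacher> map )
{
    return _teachers.Select( t => map( t.Clone() ) );
}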

In this Map method, each element of the internal collection is copied and then passed to the map delegate. Surely, this is an odd mix of OO concepts and functional principles. The Aggregate Class shouldn't really return a collection of its internal data, even if it's a copy. Map is a generic kind of method that belongs in the lower levels of your application stack; inside an aggregate, it adds nothing. Think about it this way: what business function does "Map" perform? None. Which brings us to putting business methods in Aggregate Classes, which is the proper way to do things in the OO paradigm.
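For completeness, here's what the Map method above needs in order to compile and behave as described. It's a sketch under stated assumptions: Clone is a hypothetical method implemented with MemberwiseClone (a shallow copy), and EmailOf is a hypothetical helper used only to check that the aggregate's own item was left alone. The trailing ToList() makes the mapping eager, so the delegate runs once per Map call rather than on every enumeration of the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Teacher
{
    public string Name { get; set; }
    public string Email { get; set; }

    // Shallow copy: enough here because the only fields are strings, which are immutable.
    public Teacher Clone() => (Teacher)MemberwiseClone();
}

public class Teachers
{
    private readonly List<Teacher> _teachers;

    public Teachers(IEnumerable<Teacher> teachers) => _teachers = teachers.ToList();

    // Each element is copied before the delegate sees it, so the delegate
    // can only change the copies, never the aggregate's own items.
    public IEnumerable<Teacher> Map(Func<Teacher, Teacher> map) =>
        _teachers.Select(t => map(t.Clone())).ToList();

    // Hypothetical helper for inspecting an item's email by name.
    public string EmailOf(string name) => _teachers.Single(t => t.Name == name).Email;
}
```

Even working correctly, Map only guards against corruption; it still doesn't answer the business question raised above.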

Business Methods in Aggregate Classes

What do you actually need to do with Teachers? We've already seen a case for "notify tenured teachers." We can expose other useful subsets of teachers, like TeachersWithAbsences. But if you really feel the need to offer arbitrary filtering, the items in the set should prevent modification of their internal data AND the set should prevent callers from altering which items it contains. You can run into trouble with filters if they allow the collection to change:


// Dangerous Notify method...
// (notification is assumed to be a field of the Teachers aggregate)
public async Task<List<NotifyResult>> NotifyAsync( Func<IEnumerable<Teacher>, IEnumerable<Teacher>> filter )
{
    var results = new List<NotifyResult>();
    foreach( var teacher in filter( _teachers ) )
        results.Add( await teacher.NotifyAsync( notification ) );
    return results;
}

// Dangerous call to Dangerous Notify method...

... await teachers.NotifyAsync( all => all.Select( t => t.IsTenured ? t : null ) );

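Here, the programmer maps each non-tenured teacher to null, so the loop inside NotifyAsync throws a NullReferenceException as soon as it reaches one. The implementation of NotifyAsync, while intended to be as flexible as possible, invites this danger because the filter can return any sequence at all, not just a subset of the teachers. A better implementation prevents modification to the internal collection by accepting only a predicate: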

// Better Notify method...
public async Task<List<NotifyResult>> NotifyAsync( Func<Teacher, bool> filter )
{
    var results = new List<NotifyResult>();
    foreach( var teacher in _teachers.Where( filter ) )
        results.Add( await teacher.NotifyAsync( notification ) );
    return results;
}

// Dangerous call to Better Notify method...

... await teachers.NotifyAsync( teacher => 
  { 
    teacher.Email = null;
    return teacher.IsTenured; 
  } );

Here, the collection can't be changed, so it's better. But unless the underlying items are adequately protected, they can still be modified in dangerous ways. This example is a bit obvious, but subtler versions of the same trouble occur whenever underlying data can be manipulated where it shouldn't be.
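One way to protect the items themselves is to hand the filter a read-only view of each teacher. This is a sketch, assuming a hypothetical ITeacherInfo interface and a hypothetical NamesMatching method (kept synchronous for brevity in place of NotifyAsync). Because the filter only sees ITeacherInfo, a lambda like the one above that assigns `teacher.Email = null` would not compile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-only view: filters can read these properties, nothing else.
public interface ITeacherInfo
{
    string Name { get; }
    bool IsTenured { get; }
}

public class Teacher : ITeacherInfo
{
    public string Name { get; set; }
    public string Email { get; set; }
    public bool IsTenured { get; set; }
}

public class Teachers
{
    private readonly List<Teacher> _teachers;

    public Teachers(IEnumerable<Teacher> teachers) => _teachers = teachers.ToList();

    // The filter receives each teacher typed as ITeacherInfo, which has no Email
    // and no setters, so it can decide membership but not change the data.
    public List<string> NamesMatching(Func<ITeacherInfo, bool> filter) =>
        _teachers.Where(t => filter(t)).Select(t => t.Name).ToList();
}
```

An interface only narrows what the compiler allows: a determined caller could still downcast an ITeacherInfo back to Teacher. Passing a separate read-only wrapper object, or a filter type that never exposes the items at all, closes that gap.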

We can go all the way and protect the underlying data, either by denying access to the underlying items altogether or by limiting how much of them is exposed.

// Best Notify method...
public async Task<List<NotifyTeacherResult>> NotifyAsync( NotifyTeachersFilter filter )
{
    var results = new List<NotifyTeacherResult>();
    foreach( var teacher in filter.GetFiltered( _teachers ) )
        results.Add( await teacher.NotifyAsync( notification ) );
    return results;
}

public class NotifyTeachersFilter
{
    // Write-only from the outside; null means "don't filter on tenure"
    public bool? IsTenured { private get; set; }

    internal IEnumerable<Teacher> GetFiltered( IEnumerable<Teacher> teachers )
    {
        return teachers.Where(t => IsTenured == null || t.IsTenured == IsTenured);
    }
}

With the filter type, we've entirely walled off access to the underlying items in the collection. This is a simple example of how to use filter types with an Aggregate Class. You can go further by passing a collection of filters or even an ordered collection.
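Here's a minimal runnable sketch of the filter-type approach. Several names are hypothetical: the `message` parameter stands in for the `notification` field, Teacher.NotifyAsync just formats a string instead of sending anything, and NotifyTeacherResult carries only a summary so the caller never receives a Teacher. In this sketch a null IsTenured means "don't filter on tenure".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class Teacher
{
    public string Name { get; set; }
    public bool IsTenured { get; set; }

    // Hypothetical stand-in for sending a real notification.
    internal Task<string> NotifyAsync(string message) =>
        Task.FromResult($"{Name}: {message}");
}

// Hypothetical result type: carries what the caller needs, not the Teacher.
public class NotifyTeacherResult
{
    public string Summary { get; }
    public NotifyTeacherResult(string summary) => Summary = summary;
}

public class NotifyTeachersFilter
{
    // Callers can set criteria but never touch the teachers themselves.
    public bool? IsTenured { private get; set; }

    internal IEnumerable<Teacher> GetFiltered(IEnumerable<Teacher> teachers) =>
        teachers.Where(t => IsTenured == null || t.IsTenured == IsTenured);
}

public class Teachers
{
    private readonly List<Teacher> _teachers;

    public Teachers(IEnumerable<Teacher> teachers) => _teachers = teachers.ToList();

    public async Task<List<NotifyTeacherResult>> NotifyAsync(NotifyTeachersFilter filter, string message)
    {
        var results = new List<NotifyTeacherResult>();
        foreach (var teacher in filter.GetFiltered(_teachers))
            results.Add(new NotifyTeacherResult(await teacher.NotifyAsync(message)));
        return results;
    }
}
```

A caller writes `await teachers.NotifyAsync(new NotifyTeachersFilter { IsTenured = true }, "...")`; adding new criteria means adding properties to the filter type, not widening access to the collection.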

Final Thoughts

I want to conclude by saying that using Linq is not precisely the same as functional programming. Sure, you can and should bring some FP concepts into an OO language. But remember that the language itself is built for OO programming. The principles won't translate 100%, and you can quickly end up shooting yourself in the foot by trying to do FP in an OO language. If you want FP in earnest, it's better to switch to a functional language like F# so you get the full support and benefits. When using FP concepts in OO languages, keep the OO principles in mind and take functional practices with a grain of salt. They can help, but use them with caution!