Monoids and Semigroups - Functional Programming with Scala

Back with another geeky topic! In this one, we will apply yet another discrete maths concept being applied in scala. Semigroups and Monoids, yet seemingly impractical in theory but finds its application in everyday code.

In simple terms, Monoid is a special case of Semi-group, which is, in fact, a special case of Groupoid. So, let's refresh our discrete maths theories before jumping into its scala version.

Groupoid

Formal Definition

An algebraic structure on a set S with a binary operator. The only restriction on the operator is closure (i.e., applying the binary operator to two elements of a given set S returns a value which is itself a member of S ).

If a,b ∈ S, * is a binary opeartor, then a * b = c, where c ∈ S

Semigroup

Formal Definition

A mathematical object defined for a set and a binary operator in which the multiplication operation is associative. No other restrictions are placed on a semigroup; thus a semigroup need not have an identity element and its elements need not have inverses within the semigroup. A semigroup is an associative groupoid.

If a,b,c ∈ S, * is a binary opeartor, then (a * b) * c = (a * b) * c = d , where d ∈ S

Monoid

Formal Definition

A monoid is a set that is closed under an associative binary operation and has an identity element I ∈ S such that for all a ∈ S, aI = Ia = a .

Definition in Action

Monoid can be defined using the definition that it has a binary operator and Identity and its binary operation is associative in nature.

trait Semigroup[S] {
  def binOp(x: S, y: S): S
}

trait Monoid[S] extends Semigroup[S] {
  def identity: S
}

object Monoid {
  def apply[S](implicit monoid: Monoid[S]) =
    monoid
}

Following ensure that it satisfies the law associated:

def associativeLaw[A](x: S, y: S, z: S)
      (implicit m: Monoid[S]): Boolean = {
  m.binOp(x, m.binOp(y, z)) ==
    m.binOp(m.binOp(x, y), z)
}

def identityLaw[A](x: A)
      (implicit m: Monoid[A]): Boolean = {
  (m.binOp(x, m.identity) == x) &&
    (m.binOp(m.identity, x) == x)
}

Putting it in Practice

Now let's create a Monoid for Boolean type and let's see how many monoids can be created. Binary operations that I could think of for Boolean Set S would be, OR AND XOR XNOR.

OR:

implicit val boolOrMonoid: Monoid[Boolean] =
  new Monoid[Boolean] {
    def binOp(a: Boolean, b: Boolean) = a || b
    def identity = false
  }

AND:

implicit val boolAndMonoid: Monoid[Boolean] =
  new Monoid[Boolean] {
    def binOp(a: Boolean, b: Boolean) = a && b
    def identity = true
  }

XOR:

implicit val boolXorMonoid: Monoid[Boolean] =
  new Monoid[Boolean] {
    def binOp(a: Boolean, b: Boolean) = (a && !b) || (!a && b)
    def identity = false
  }

XNOR:

implicit val boolXorMonoid: Monoid[Boolean] =
  new Monoid[Boolean] {
    def binOp(a: Boolean, b: Boolean) = (!a || b) && (a || !b)
    def identity = true
  }

Similarly, we can achieve the same for Integers, Monoids that I can think of Addition and Multiplication. Why not Subtraction and Division? 😏 We can extend the same theory to Strings and other primitive types. Then advance it to create Monoids for complex class Types.

Note: While designing monoids it should obey both the laws and its definition.

Real-World Application

In distributed systems, we come across CRDTs - Conflict-free Replication Data Types. CRDTs in simple terms are the data types that can be concurrently merged in any order. Think of a scenario in distributed systems, where the huge number of nodes in the system get the updated data. Now the challenge is that one node may receive an update where others may not. But we need to ensure data consistency throughout. So in distributed systems, we follow a principle of eventual consistency. It's not like ACID principle, where you have consistent data at all times, but with eventual consistency, the system becomes consistent eventually without data loss.

One of the CRDT type, an operation based or commutative CRDT uses the concept of commutative monoids. One of the popular operation-based CRDT is G-set (Grow only set). By strict mathematical definition, G-set can be defined as:

Data set A // Set named A
    initial ∅ // initially empty
update add(element e)
    A := A ∪ {e} // Set union element
query lookup(element e) : boolean b
    let b = (e ∈ A) // if e belongs to A
compare (S, T) : boolean b
    let b = (S.A ⊆ T.A) // if S is a subset of T
merge (S, T) : payload U
    let U.A = S.A ∪ T.A // Union of two Sets

Elements can only be added to a GSet and never removed. When the GSet diverges, it can be easily merged by calculating the union of two sets.

For a more comprehensive introduction to CRDT and its types, I found this interesting article by Nezih.

It should be quite an interesting task for you to explore the application of monoids in Big Data's map and reduce.

Conclusion

Monoids or any topic in discrete maths per se are being widely used across paradigms and systems and we don't realise the very basics we learnt as a part of our computer science course, making small but impactful changes. I guess that will be an interesting topic to talk about the hidden and unsung heroes of our course theory which are silently making great impacts. According to you, which is that one underestimated concept?

Monoids and Semigroups - Functional Programming with Scala - 101 Series

Groupoid

Formal Definition

Semigroup

Formal Definition

Monoid

Formal Definition

Definition in Action

Putting it in Practice

Real-World Application

Conclusion

Recent Posts

Comments