A Field Guide for OOP in R: S3 vs S4 vs R5

This tutorial gives an overview of OOP in R, covering S3, S4, and reference (R5) classes.

This post is a summary based on four posts by Hadley Wickham.

1  Overview

Central to any object-oriented system are the concepts of class and method.

  • A class defines the behaviour of objects by describing their attributes and their relationship to other classes.
  • The class is also used when selecting methods, functions that behave differently depending on the class of their input.
  • Classes are usually organised in a hierarchy: if a method does not exist for a child, then the parent’s method is used instead; the child inherits behaviour from the parent. 

R’s three OO systems differ in how classes and methods are defined:

  1. S3 classes — implements a style of OO programming called generic-function OO.
    • Most OOP languages, such as Java and C++, implement message-passing OO, where messages (methods) are sent to objects and the object determines which method to call. A typical method call is object_name.method_name(……).
    • S3 is different in that it is the generic function that determines which method to call. A typical method call is method_name(object_name, ……).
    • S3 is a very casual system, having no formal definition of classes.
  2. S4 classes — works similarly to S3, but is more formal.
    • S4 has formal class definitions, which describe the representation and inheritance for each class, and has special helper functions for defining generics and methods.
    • S4 also has multiple dispatch, which means that generic functions can pick methods based on the class of any number of arguments, not just one.
  3. Reference classes — called RC for short, are quite different from S3 and S4.
    • RC implements message-passing OO, so methods belong to classes, not functions.
    • The $ operator is used to separate objects and methods, so method calls look like object_name$method_name(……).
    • RC objects are also mutable: they don’t use R’s usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.
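
As a rough side-by-side sketch of the three call styles described above (the names speak, dog, Dog, describe, and Counter are made up for illustration and do not come from the original posts):

```r
library(methods)

# S3: the generic function picks the method from the class attribute.
speak <- function(x, ...) UseMethod("speak")
speak.dog <- function(x, ...) "Woof"
d3 <- structure(list(), class = "dog")
s3_result <- speak(d3)

# S4: formal class definition; the generic can dispatch on several arguments.
setClass("Dog", representation(name = "character"))
setGeneric("describe", function(x, y) standardGeneric("describe"))
setMethod("describe", signature("Dog", "numeric"),
          function(x, y) paste(x@name, "with number", y))
s4_result <- describe(new("Dog", name = "Rex"), 1)

# RC: methods belong to the class, and objects are modified in place.
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    add = function() {
      count <<- count + 1
      invisible(.self)
    }
  )
)
cnt <- Counter$new(count = 0)
alias <- cnt   # no copy is made: both names refer to the same object
cnt$add()
rc_result <- alias$count   # the change is visible through the alias
```

The last two lines are the point of the RC part: with an S3 or S4 object, modifying cnt would have left alias untouched.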

 

2  S3 Classes

S3 is R’s first and simplest OO system. It is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages.

S3 is informal and ad hoc, but it has a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. 

 

2.1  Recognizing objects, generic functions, and methods

2.1.1  Recognizing S3 objects

There is no simple way in base R to test whether an object is an S3 object; the closest test is is.object(x) & !isS4(x). An easier way is to use the pryr::otype() function.
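
A minimal base-R sketch of that test, wrapped in a helper whose name (is_s3) is hypothetical:

```r
# Hypothetical helper: TRUE for objects with a class attribute that are not S4.
is_s3 <- function(x) is.object(x) & !isS4(x)

df <- data.frame(a = 1:3)
is_s3(df)    # a data frame carries a class attribute
is_s3(1:3)   # a plain integer vector has no class attribute
```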

2.1.2  Recognizing generic functions and methods

S3 methods do not belong to objects or classes; instead they belong to generic functions.

  • A function is an S3 generic if its source code calls UseMethod().
  • Some S3 generics, such as sum() and cbind(), do not call UseMethod(); instead they call C functions to do method dispatch. These functions are called internal generics.

An easy way to test whether a function is an S3 generic is to use pryr::ftype().

Given a class, the job of an S3 generic is to call the appropriate S3 method. An S3 method has a name of the form generic_name.class_name(). However, such names can be confused with ordinary function names that happen to contain “.”; in that case, pryr::ftype() can be used to tell them apart.

For example, t.data.frame() is the S3 method for data frames belonging to the S3 generic t(), which implements the transpose operation. The similar-looking function t.test() is itself a generic function for the t-test, not a method of t().
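
If pryr is not installed, a rough base-R heuristic is to look for a call to UseMethod() in the function body. The helper name calls_use_method is hypothetical, and the check does not detect internal generics such as sum():

```r
# Heuristic: a closure is treated as an S3 generic if its body mentions UseMethod.
calls_use_method <- function(f) {
  is.function(f) && !is.primitive(f) && "UseMethod" %in% all.names(body(f))
}

calls_use_method(t)              # t() is a generic
calls_use_method(stats::t.test)  # t.test() is a generic too, not a method of t()
calls_use_method(t.data.frame)   # a method, so it does not call UseMethod()
```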

You can use the methods() function to do the following:

  • List all the methods belonging to a generic function.
  • List all the generics that have a method for a given class.
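
Both uses of methods() can be sketched as follows; the exact listing depends on which packages are loaded:

```r
# All methods that belong to the generic t().
generic_methods <- methods("t")
print(generic_methods)

# All generics that have a method for the class "data.frame".
class_methods <- methods(class = "data.frame")
print(class_methods)
```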

 

2.2  Defining S3 classes and creating S3 objects

Being a simple and ad hoc system, S3 has no formal definition of a class. You just take an existing base object and set its class attribute, which can be done in two ways:

  1. Create the object and assign the class in one step with structure().
  2. Create the base object first, then assign the class with class<-().

Comments:

  • S3 objects are usually built on top of lists, or atomic vectors with attributes. You can also turn functions into S3 objects. 
  • You can use the inherits(x, "classname") function to check whether an object inherits from a specific class.
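
Both ways of creating an S3 object, followed by an inherits() check, might look like this (the class names foo and bar are placeholders):

```r
# 1. Create the object and assign the class in one step.
foo <- structure(list(), class = "foo")

# 2. Create the base object first, then set the class.
bar <- list()
class(bar) <- "bar"

class(foo)            # the class attribute
inherits(foo, "foo")  # foo inherits from "foo"
inherits(bar, "foo")  # bar does not
```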

Most S3 classes provide a constructor function whose name is usually the same as the class name, for example factor() and data.frame(). You should use them when available, as this ensures that you’re creating the class with the correct components. 
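
For your own classes you can write a constructor in the same spirit. The following is only a sketch with a hypothetical rectangle class; the checks are an assumption about what "correct components" means here:

```r
# Hypothetical constructor: validates its inputs, then sets the class attribute.
rectangle <- function(width, height) {
  stopifnot(is.numeric(width), length(width) == 1)
  stopifnot(is.numeric(height), length(height) == 1)
  structure(list(width = width, height = height), class = "rectangle")
}

r <- rectangle(2, 3)
class(r)

# Invalid input is rejected before an object is ever created.
bad <- tryCatch(rectangle("two", 3), error = function(e) e)
```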

 

2.3  Creating new methods and generics

2.3.1  Add method to existing generic functions

To add a method to an existing generic function, you just create a regular function with the name generic_name.class_name(……).

Note that there is no check to make sure that what the method returns matches what the generic is expected to return. It’s up to you to make sure that your method doesn’t violate the expectations of existing code.
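
A small sketch of both points, using the existing generic mean() and two made-up classes (measurement and oddity):

```r
# A method for the existing generic mean() and the class "measurement".
mean.measurement <- function(x, ...) mean(x$values, ...)

m <- structure(list(values = c(1, 2, 3)), class = "measurement")
mean(m)   # dispatches to mean.measurement()

# Nothing stops a method from returning something the generic's users
# would not expect, such as a string instead of a number.
mean.oddity <- function(x, ...) "not a number"
o <- structure(list(), class = "oddity")
mean(o)
```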

2.3.2  Create new generic function and add methods

First create a new generic function that calls UseMethod().

  • The UseMethod() function takes two arguments: (1) a string representing the name of the generic function; and (2) the argument to use for method dispatch.
  • If you omit the second argument in UseMethod(), then it will dispatch on the first argument to the function.
  • There is no need to pass any arguments of the generic function to UseMethod(), and you shouldn’t do so.

After you have created the generic, you can define methods in the way described above.
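
Putting the two steps together, a minimal sketch with a hypothetical generic area() and two made-up classes:

```r
# Step 1: the generic only calls UseMethod(); it dispatches on its first argument.
area <- function(shape, ...) UseMethod("area")

# Step 2: methods are regular functions named generic.class.
area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

circle <- structure(list(r = 1), class = "circle")
square <- structure(list(side = 2), class = "square")

area(circle)
area(square)
```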

 

2.4  Method dispatch

S3 method dispatch is relatively simple. UseMethod() creates a vector of function names, like paste0("generic", ".", c(class(x), "default")), and looks for methods in the order in which they appear in the class vector, from left to right.

The “default” class makes it possible to set up a fall back method for otherwise unknown classes. 
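
The lookup order can be sketched with a hypothetical generic greet() and made-up classes child and parent:

```r
greet <- function(x) UseMethod("greet")
greet.child   <- function(x) "child method"
greet.parent  <- function(x) "parent method"
greet.default <- function(x) "default method"

obj <- structure(list(), class = c("child", "parent"))

# The names UseMethod() will try, in order.
candidates <- paste0("greet", ".", c(class(obj), "default"))

greet(obj)                                            # first match: greet.child
greet(structure(list(), class = c("other", "parent")))  # no greet.other, so greet.parent
greet(1:3)                                            # no class-specific method: greet.default
```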

Note that it is dangerous to call a method directly; instead, you should call the generic function. 

Once UseMethod() has found the correct method, it’s invoked in a special way. Rather than creating a new evaluation environment, it uses the environment of the current function call (the call to the generic), so any assignments or evaluations that were made before the call to UseMethod() will be accessible to the method. The arguments that were used in the call to the generic are passed on to the method in the same order they were received. Be aware that the forwarding of local variables describes older versions of R; as of R 4.4.0, variables assigned in the generic before the call to UseMethod() are no longer passed on to the method, so it is best not to rely on this behaviour.

 

2.5  Inheritance

The NextMethod() function provides a simple inheritance mechanism for method dispatch. It works like UseMethod(), but instead of dispatching on the first element of the class vector, it will dispatch based on the second (or subsequent) element.

The exact details are a little tricky: NextMethod() doesn’t actually work with the class attribute of the object; it uses a special variable (.Class) to keep track of which class to call next. This means that manually changing the class of the object will have no impact on the inheritance.

Methods invoked as a result of a call to NextMethod() behave as if they had been invoked from the previous method. The arguments to the inherited method are in the same order and have the same names as in the call to the current method, and are therefore the same as in the call to the generic. However, the expressions for the arguments are the names of the corresponding formal arguments of the current method. Thus the arguments will have values that correspond to their value at the time NextMethod() was invoked. Unevaluated arguments remain unevaluated. Missing arguments remain missing.

If NextMethod() is called in a situation where there is no further class (and no default method) to dispatch to, it will raise an error.
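
The following sketch illustrates both the basic use of NextMethod() and the .Class behaviour. The generic baz() and the classes A, B, C, and D are placeholders:

```r
baz <- function(x) UseMethod("baz")
baz.A <- function(x) "A"
baz.B <- function(x) "B"

# baz.C does its own work, then delegates to the next class in the vector.
baz.C <- function(x) c("C", NextMethod())
ca <- structure(1, class = c("C", "A"))
cb <- structure(1, class = c("C", "B"))
baz(ca)   # "C" "A"
baz(cb)   # "C" "B"

# Changing the class inside the method does not redirect NextMethod():
# dispatch continues from the classes recorded when the method was chosen.
baz.D <- function(x) {
  class(x) <- "A"
  NextMethod()
}
da <- structure(1, class = c("D", "A"))
db <- structure(1, class = c("D", "B"))
baz(da)   # "A"
baz(db)   # "B", not "A"
```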
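
A sketch of the failing case, using a hypothetical generic qux() and a single-element class vector. The exact wording of the error message may vary between R versions, so it is captured rather than quoted:

```r
qux <- function(x) UseMethod("qux")
qux.only <- function(x) NextMethod()

obj <- structure(list(), class = "only")

# There is no second class and no qux.default(), so NextMethod() fails.
failure <- tryCatch(qux(obj), error = function(e) e)
inherits(failure, "error")
conditionMessage(failure)

# Providing a default method gives NextMethod() something to fall back on.
qux.default <- function(x) "default"
recovered <- qux(obj)
```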

 

3  S4 Classes

3.1  S4 versus S3

Similarities:

  • S4 works in a similar way to S3, but it adds formality and rigour.
  • Methods still belong to functions, not classes.

Differences:

  • Classes have formal definitions which describe their fields and inheritance structures (parent classes).
  • Method dispatch can be based on multiple arguments to a generic function, not just one.
  • There is a special operator, @, for extracting slots (aka fields) from an S4 object.

All S4 related code is stored in the methods package, which is always available when you’re running R interactively, but may not be available when running R in batch mode. For this reason, it’s a good idea to explicitly include library(methods) whenever you’re using S4. 

 

3.2  Defining S4 classes and creating objects

When it comes to defining classes, in S3 you just set the class attribute of a (base) object, while in S4 you must define the representation of a class with setClass() and create new objects with new().

A class has three key properties:

  1. a name: an alpha-numeric string that identifies the class. By convention, S4 class names use UpperCamelCase.
  2. representation: a list of slots (or attributes), giving their names and classes. For example, a person class might be represented by a character name and a numeric age, as follows: representation(name = "character", age = "numeric")
  3. a character vector of classes that it inherits from, or in S4 terminology, contains. Note that S4 supports multiple inheritance, but this should be used with extreme caution as it makes method lookup extremely complicated.

As an example, consider a superclass Person and a subclass Employee.
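
The original code for this example is not reproduced here; a minimal sketch of what such a pair of classes could look like is given below. The slot boss and the object names alice and mike are assumptions chosen for illustration:

```r
library(methods)

# Superclass: setClass() also returns a generator function for the class.
Person <- setClass("Person",
  representation(name = "character", age = "numeric"))

# Subclass: adds one slot and inherits the others via contains.
setClass("Employee",
  representation(boss = "Person"),
  contains = "Person")

alice <- Person(name = "Alice", age = 40)   # via the generator
mike  <- new("Employee", name = "Mike", age = 25, boss = alice)   # via new()

is(mike, "Person")   # an Employee is also a Person
mike@boss@name

# An omitted slot gets the default object of its class (here an empty numeric).
bob <- new("Person", name = "Bob")
bob@age
```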

Comments:

  • If you omit a slot when creating an instance of a class, the slot is initialised with the default object of its class.
  • When creating an object of a class, in addition to using the new() function, you can also use a constructor function (usually with the same name as the class) if one is provided.
  • The getSlots() function returns a description of all the slots of a class.
  • You can use @ or the slot() function to access the slots of an S4 object, but this is not recommended. Instead, you should define accessor methods for accessing slots.
  • Unlike S3, S4 checks that all of the slots have the correct type, and it throws an error when it finds a type mismatch.
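
Several of these points can be sketched together. The accessor generic personName() is a hypothetical name, not part of the methods package:

```r
library(methods)

setClass("Person", representation(name = "character", age = "numeric"))

# Slot names and their classes.
slot_info <- getSlots("Person")
slot_info

# A slot value of the wrong type is rejected when the object is created.
bad <- tryCatch(
  new("Person", name = "Alice", age = "forty"),
  error = function(e) "type mismatch"
)

# An accessor method hides the slot access from callers.
setGeneric("personName", function(obj) standardGeneric("personName"))
setMethod("personName", "Person", function(obj) obj@name)

alice <- new("Person", name = "Alice", age = 40)
personName(alice)
```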

You can specify default values for the slots with a prototype.

Note that a change to the superclass Person propagates to objects of the subclass (such as an Employee object mike) only after you re-run the definition of the subclass Employee. Since R is an interactive programming language, it’s possible to create new classes or redefine existing classes at any time. This can be a problem when you’re interactively experimenting with S4. If you modify a class, make sure you also recreate any objects of that class, otherwise you’ll end up with invalid objects.
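
A sketch of a prototype on the Person class, with Employee defined afterwards so that it picks up the defaults. The NA defaults and the slot boss are assumptions for illustration:

```r
library(methods)

# Person with default slot values supplied through a prototype.
setClass("Person",
  representation = representation(name = "character", age = "numeric"),
  prototype = prototype(name = NA_character_, age = NA_real_))

# Employee is (re)defined after Person, so it inherits the new defaults.
setClass("Employee",
  representation = representation(boss = "Person"),
  contains = "Person")

mike <- new("Employee", name = "Mike")
mike@age   # NA from the prototype rather than an empty numeric vector
```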

Further notes about slots and contains:

  • In slots and contains you can use S4 classes, S3 classes registered with setOldClass(), or the implicit class of a base type.
  • In slots you can also use the special class ANY which does not restrict the input.
  • If an S4 object contains (inherits from) an S3 class or a base type, it will have a special .Data slot which contains the underlying base type or S3 object.
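
A short sketch of the last two points, using made-up class names (Celsius and Box):

```r
library(methods)

# A class that inherits from the base type numeric.
setClass("Celsius", contains = "numeric")
temp <- new("Celsius", c(20.5, 22))
temp@.Data          # the underlying numeric vector
is(temp, "numeric")

# A slot of class ANY accepts any kind of object.
setClass("Box", representation(content = "ANY"))
text_box <- new("Box", content = "some text")
list_box <- new("Box", content = list(1, "a"))
```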

     

3.3  Checking validity

 

 

 

 

 

 

 

3.5  Recognizing objects, generic functions, and methods

 

 

 

 

 

4  R5 Classes

 

 

 
