Tuesday 8 April 2014

(Pure) virtual functions

Virtual functions are bound up with two C++ features, inheritance and polymorphism. They are designed to allow different behavior between base and derived classes.

1. Virtual functions
Syntactically, virtual functions are functions whose declaration in a class is preceded by the "virtual" keyword. How the compiler achieves the functionality of "virtual" is implementation-specific. One common way is to use a virtual table and a virtual pointer.

*********************************************************************************
class Base {
};

class Base2{

private:
  char m_x;
};

class Base3{
public:
  virtual void Foo() {}
private:
  char m_x;
};

class Base1 {
public:
  virtual void Foo() {}
};

class Derived1 : Base1 {
public:
  void Foo() {}
private:
  char m_x;
};
*********************************************************************************

Once a virtual function is added to a class, the compiler generates a virtual table for that class, typically stored once per class in a read-only part of the program image rather than in each object. I discussed it in more detail in another blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/virtual-table.html. At the same time a virtual pointer is added to every object of the class, whether the object is an instance of the base class or of a derived class.

*********************************************************************************
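// values from a typical 32-bit build; exact sizes are implementation-specific
sizeof(Base) = 1; // empty class
sizeof(Base1) = 4; // size of virtual pointer
sizeof(Base2) = 1; // char
sizeof(Base3) = 8; // char + virtual pointer + padding for memory alignment
sizeof(Derived1) = 8; // char + virtual pointer + padding for memory alignment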
*********************************************************************************
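The exact numbers depend on the platform, so here is a minimal sketch that checks only the relationships between the sizes. It reuses the class shapes from above (members made public for brevity); VirtualOverhead() is a hypothetical helper, and the "one pointer plus padding" expectation is an assumption about mainstream compilers, not a guarantee of the C++ standard.

```cpp
#include <cstddef>

class Base {};                 // empty class

class Base2 {                  // one char, no virtual function
public:
  char m_x;
};

class Base1 {                  // virtual function only
public:
  virtual void Foo() {}
};

class Base3 {                  // virtual function and one char
public:
  virtual void Foo() {}
  char m_x;
};

// Extra bytes paid for making the one-char class polymorphic:
// usually one pointer plus alignment padding.
std::size_t VirtualOverhead() { return sizeof(Base3) - sizeof(Base2); }
```

Printing sizeof for each class on your own compiler is the quickest way to see what your platform actually does.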

The virtual pointer is set up in the order in which the object is initialized. I have discussed this order in another blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/the-order-of-object-initializationdestr.html. Within an inheritance hierarchy, initialization runs from the most base class to the most derived class. Conceptually, each constructor in turn either adds an entry for a virtual function that does not exist yet, or updates an entry that exists already; in a typical implementation this is done by pointing the virtual pointer at the table of the class whose constructor is currently running. Once construction is complete, the virtual pointer refers to a list of function addresses holding the most derived override of each virtual function.

*********************************************************************************
class A {
public:
  std::string E();
  virtual void F();
  virtual double G(int);
  virtual int H(double);
};

class B : public A {
public:
  void F(int);
  void F();
  virtual double G(int);
  virtual double M(double);
  virtual long N(long);

};

class C: public B {
public:
  double G(int);
  double M(double);
  virtual void X();
};

C c;
*********************************************************************************

When an instance of class C is created, the constructors are called in the order class A, class B and then class C. The content after each constructor call looks as follows. (Normally the virtual table also includes type information, the information returned by the typeid operator, which is used later for dynamic_cast.)

After A::A()
************************************
| class A
| A::F()
| A::G(int)
| A::H(double)
************************************

After B::B()
************************************
| class B // updated to B now
| B::F() // updated
| B::G(int) // updated
| A::H(double)
| B::M(double) // added
| B::N(long) // added
************************************

After C::C()
************************************
| class C // updated to C now
| B::F()
| C::G(int) // updated
| A::H(double)
| C::M(double) // updated
| B::N(long)
| C::X() // added
************************************

When a virtual function is called at run time, the virtual pointer of the object is followed to its table and the entry for that function is read from a slot whose position is fixed at compile time; nothing has to be searched. The overhead is simply some pointer arithmetic, de-referencing a (virtual) pointer and an indirect call, which is trivial for most applications.
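The tables above can be observed from the outside with a small sketch. It is a cut-down version of the A/B/C hierarchy; the std::string return values and the CallAll() helper are assumptions added here so that the selected override becomes visible.

```cpp
#include <string>

struct A {
  virtual ~A() {}
  virtual std::string F() { return "A::F"; }
  virtual std::string G(int) { return "A::G"; }
  virtual std::string H(double) { return "A::H"; }
};

struct B : A {
  virtual std::string F() { return "B::F"; }
  virtual std::string G(int) { return "B::G"; }
  virtual std::string N(long) { return "B::N"; }
};

struct C : B {
  virtual std::string G(int) { return "C::G"; }
  virtual std::string X() { return "C::X"; }
};

// Every call goes through a reference to the most base class, so each one
// is resolved at run time from the dynamic type of the object.
std::string CallAll(A& a) {
  return a.F() + " " + a.G(0) + " " + a.H(0.0);
}
```

Passing a C object yields "B::F C::G A::H", which matches the last table: F comes from B, G from C and H is still the one from A.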

2. The issues with virtual functions
Performance penalty:
Virtual dispatch only becomes a performance issue when the function body is very small and the function is called a great many times. I have discussed this in more detail, and proposed an alternative design pattern, in http://cpluspluslearning-petert.blogspot.co.uk/2014/03/design-pattern-curiously-recurring.html.

Memory usage:
As I described in section 1, virtual functions add one virtual pointer to each object. This is usually negligible for classes with a lot of data. But it does push memory consumption up for classes that have few or no data members, especially when the classes are used for packaging data and are therefore often stored in a container (for instance std::vector). And the impact can be larger still if memory fragmentation and memory alignment are taken into account. I have briefly discussed how memory can affect overall program performance. Please read this blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/performance-running-out-of-memory-heap.html, if interested.

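Virtual does not mix well with keywords like:
- inline;
- static;
- constexpr;
All these keywords ask for something to be settled at compile time, whereas virtual is used for run time binding. The rules differ for each of them, though. "static" can never be combined with "virtual", because a static member function has no object (and therefore no virtual pointer) to dispatch on. "constexpr" could not be combined with "virtual" at the time of writing; C++20 later lifted that restriction. "inline" is legal together with "virtual" (a virtual function defined inside the class body is implicitly inline), but it is only a hint: a call dispatched through a base class pointer or reference generally cannot be inlined, whereas a call on an object whose exact type the compiler can prove may still be.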
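As a small illustration of the inline and static cases, consider the sketch below. Shape, Triangle and the two helper functions are hypothetical names, and whether a particular call really gets inlined is up to the optimizer.

```cpp
struct Shape {
  virtual ~Shape() {}
  // Defined in the class body, so implicitly inline and virtual: legal.
  virtual int Sides() const { return 0; }
  // static virtual int Count();   // error: a static member cannot be virtual
};

struct Triangle : Shape {
  virtual int Sides() const { return 3; }
};

// Dynamic type unknown here: the call is normally dispatched at run time.
int SidesViaBase(const Shape& s) { return s.Sides(); }

// The object is known to be a Triangle: the compiler may bind the call
// statically and inline it (an optimization, not a guarantee).
int SidesOfLocal() {
  Triangle t;
  return t.Sides();
}
```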

What functions can be virtual functions?
Any non-static member function except a constructor (member function templates are excluded as well). A constructor cannot be virtual because it is the function that creates the object; before it runs there is no object whose dynamic type could select an override. As mentioned above, "virtual" cannot be used together with "static" either. Just bear in mind that constructor and destructor behave differently when it comes to virtual: a constructor can't be declared virtual but a destructor can. And it is actually quite important to declare the destructor virtual when a derived class instance is de-allocated via a base class pointer. The virtual destructor makes sure the whole hierarchy of destructors is called; without it the behavior is undefined. (See this blog entry for details, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/the-order-of-object-initializationdestr.html, if interested.)
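A minimal sketch of the virtual destructor at work follows. Resource, FileResource and the DtorLog() helper are hypothetical; the log only exists so that the order of the destructor calls can be inspected.

```cpp
#include <string>

// Hypothetical log: each destructor appends its name to it.
std::string& DtorLog() {
  static std::string log;
  return log;
}

struct Resource {
  // virtual: deleting a derived object via Resource* is well defined
  virtual ~Resource() { DtorLog() += "~Resource "; }
};

struct FileResource : Resource {
  ~FileResource() { DtorLog() += "~FileResource "; }
};

std::string DestroyViaBase() {
  DtorLog().clear();
  Resource* r = new FileResource();
  delete r;                 // runs ~FileResource(), then ~Resource()
  return DtorLog();
}
```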

Do not call a virtual function in the constructor:
This is one of the most dangerous pitfalls of virtual functions. The mistake is to expect a virtual function called in a constructor to behave polymorphically. It does not, so don't be shocked when it fails to do what you expected. The reason is the lifetime of objects. Please read this blog entry about it, http://cpluspluslearning-petert.blogspot.co.uk/2014/03/the-order-of-object-initializationdestr.html, if interested. The life of an object starts only after its constructor has finished. And as shown in Section 1, what the virtual pointer points to is updated step by step while going through the hierarchy of initialization, from the base class to the derived class. While a base class constructor is running, the derived part of the object does not exist yet: its data members are not initialized and the virtual pointer has not been updated to the derived class. The C++ standard therefore specifies that a virtual function called from a constructor (or destructor) resolves to the version of the class whose constructor is running, or of one of its bases, and never to an override in a more derived class. If the function called this way is pure virtual, the behavior is undefined. For your own health and safety, do not design a class that depends on such calls.
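The effect is easy to reproduce. In the sketch below (Widget, Button and m_NameSeenInCtor are hypothetical names) the base class constructor records what the virtual call returned while the object was still under construction.

```cpp
#include <string>

struct Widget {
  Widget() {
    // Only the Widget part exists at this point, so this call
    // resolves to Widget::Name() even when a Button is being built.
    m_NameSeenInCtor = Name();
  }
  virtual ~Widget() {}
  virtual std::string Name() const { return "Widget"; }

  std::string m_NameSeenInCtor;
};

struct Button : Widget {
  virtual std::string Name() const { return "Button"; }
};
```

For a Button object, m_NameSeenInCtor holds "Widget", while calling Name() on the finished object returns "Button".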

Call the virtual function after the object is initialized. Two ways to achieve that:
- Use wrapper class to make sure it happens safely
- Use factory method pattern to create objects and do not allow users to create objects directly from constructor.
*********************************************************************************
#include <memory>

class Foo {
public:
  virtual ~Foo() {}
  virtual void Init() {}
};

class FooD : public Foo {
public:
  virtual void Init() {}
};

// wrapper class
class FooWrapper {
public:
  FooWrapper() {
    // creating the proper type might need info passed to this constructor
    m_Foo.reset(new Foo());
    // m_Foo.reset(new FooD());

    // the Foo/FooD object is fully constructed here, so Init() is dispatched polymorphically
    m_Foo->Init();
  }

private:
  // std::unique_ptr replaces std::auto_ptr (deprecated in C++11, removed in C++17)
  std::unique_ptr<Foo> m_Foo;
};

// factory pattern
std::unique_ptr<Foo> CreateFoo(int type)
{
  std::unique_ptr<Foo> fooPtr;
  switch (type) {
  case 0: // hypothetical type code for Foo
    fooPtr.reset(new Foo());
    break;
  //......
  case 1: // hypothetical type code for FooD
    fooPtr.reset(new FooD());
    break;
  }

  if (fooPtr) {
    fooPtr->Init();
  }

  return fooPtr;
}
*********************************************************************************

The key point of both workarounds is to prevent users from creating objects directly through the constructor: hide the constructors from the users and expose only the wrapper class or the factory method to them. One technique is to declare the constructors of Foo/FooD non-public and to declare the wrapper class and the factory method friends of Foo/FooD.
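Hiding the constructors could look like the sketch below. It is only one possible arrangement: the bool parameter of CreateFoo, the State() accessor and the m_State member are assumptions made here to keep the example short and observable.

```cpp
#include <memory>
#include <string>

class Foo {
public:
  virtual ~Foo() {}
  const std::string& State() const { return m_State; }

protected:
  Foo() : m_State("constructed") {}   // not callable by users
  virtual void Init() { m_State = "Foo initialized"; }
  std::string m_State;

  friend std::unique_ptr<Foo> CreateFoo(bool derived);
};

class FooD : public Foo {
private:
  FooD() {}
  virtual void Init() { m_State = "FooD initialized"; }

  friend std::unique_ptr<Foo> CreateFoo(bool derived);
};

// The only way to obtain an object: construct it completely first,
// then call Init(), which is now dispatched polymorphically.
std::unique_ptr<Foo> CreateFoo(bool derived) {
  std::unique_ptr<Foo> foo(derived ? static_cast<Foo*>(new FooD()) : new Foo());
  foo->Init();
  return foo;
}
```

A user writing "FooD f;" gets a compile error, so an un-initialized object can never escape.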

Default values of virtual functions are not dynamically bound
Default argument values are not dynamically bound. They are always bound statically, from the static type of the pointer/reference used in the call, even for virtual functions. Therefore, when calling a virtual function via a base class pointer/reference, don't expect the default value to behave polymorphically.

*********************************************************************************
class Base {
public:
    virtual void Foo(int = 10);
};

class Derived : public Base {
public:
    void Foo(int = 20);
};

Derived d;
d.Foo(); // calls Derived::Foo(20)
Base b;
b.Foo(); // calls Base::Foo(10)
Base& bRef = d;
bRef.Foo(); // calls Derived::Foo(10): body from Derived, default value from Base
*********************************************************************************

A virtual function is bound dynamically but its default values are not. The best practice is not to use default values for virtual functions, and certainly not to try to change a default value in a derived class.
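The sketch below makes the mismatch measurable and shows one possible way around it. The int return values (with +1000 marking the derived body) and the SafeBase/SafeDerived pair are assumptions introduced for illustration: a non-virtual function owns the default value and forwards to a virtual function that has none.

```cpp
struct Base {
  virtual ~Base() {}
  virtual int Foo(int x = 10) { return x; }          // returns the argument received
};

struct Derived : Base {
  virtual int Foo(int x = 20) { return x + 1000; }   // +1000 marks the Derived body
};

// Alternative: the default value lives in a non-virtual function only.
struct SafeBase {
  virtual ~SafeBase() {}
  int Bar(int x = 10) { return DoBar(x); }
protected:
  virtual int DoBar(int x) { return x; }
};

struct SafeDerived : SafeBase {
protected:
  virtual int DoBar(int x) { return x + 1000; }
};
```

With Base and Derived the result of Foo() depends on the static type used for the call (1020 via Derived, 1010 via Base&); with SafeBase and SafeDerived, Bar() gives 1010 either way.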

Virtual control and name hiding
Please see my other blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/04/c-features-override-controls.html.

Virtual function with multiple inheritance
Please see my other blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/04/multiple-inheritance-why-and-its.html.

3. Pure virtual function
In syntax a pure virtual function is a virtual function declaration with "= 0" appended. It enforces two points:
- Every derived class that is to be instantiated has to override this function; a derived class that does not is itself abstract.
- No instance of an abstract class (a class having at least one pure virtual function) can be created.

*********************************************************************************
class AbstractClass{
public:
  virtual void Foo() = 0;
};

class Derived1 : public AbstractClass {
public:
  void Foo() {}

};
*********************************************************************************

The requirement to override the pure virtual functions is enforced by the compiler at compile time.

These semantics guarantee:
*********************************************************************************
AbstractClass ac; // error
void Bar(AbstractClass ac) ; // error
void Bar1(AbstractClass& ac); // ok
void Bar2(AbstractClass* acPtr); // ok
AbstractClass* acPtr = new Derived1() ; // ok
Derived1 d1;// ok
AbstractClass& acRef = d1; // ok
*********************************************************************************
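The same guarantees can be checked without triggering a compile error, using the standard std::is_abstract trait (C++11). This is a minimal sketch; Foo() returns int and CallThroughBase() is a hypothetical helper, both added only to make the call observable.

```cpp
#include <type_traits>

class AbstractClass {
public:
  virtual ~AbstractClass() {}
  virtual int Foo() = 0;
};

class Derived1 : public AbstractClass {
public:
  virtual int Foo() { return 1; }
};

// Checked at compile time; no object is created for these checks.
static_assert(std::is_abstract<AbstractClass>::value, "cannot be instantiated");
static_assert(!std::is_abstract<Derived1>::value, "overrides Foo(), so it can be");

// References and pointers to the abstract class are fine.
int CallThroughBase(AbstractClass& ac) { return ac.Foo(); }
```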

What functions can be pure virtual functions?
Any function that can be virtual (see section 2), so not a constructor. The destructor is included.

Difference between a normal pure virtual function and a pure virtual destructor
A normal pure virtual function does not require an implementation. A pure virtual destructor, however, does require one, because destroying a derived object always calls the destructors up the hierarchy, including the base class destructor. Without it the build fails, typically with an unresolved symbol at link time.
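A minimal sketch of a pure virtual destructor with its mandatory definition follows. Plugin, AudioPlugin and the s_BaseDtorCalls counter are hypothetical names; the counter merely shows that the base destructor body really runs.

```cpp
class Plugin {
public:
  virtual ~Plugin() = 0;          // pure: Plugin is abstract
  static int s_BaseDtorCalls;
};

int Plugin::s_BaseDtorCalls = 0;

// Still has to be defined, because ~AudioPlugin() calls it.
Plugin::~Plugin() { ++s_BaseDtorCalls; }

class AudioPlugin : public Plugin {};   // implicit destructor overrides the pure one

int DestroyViaBasePointer() {
  Plugin* p = new AudioPlugin();
  delete p;                       // ~AudioPlugin(), then Plugin::~Plugin()
  return Plugin::s_BaseDtorCalls;
}
```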

Default implementation of pure virtual functions.
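It is absolutely legitimate to provide a default implementation for a pure virtual function. The definition has to be written outside the class body, and a derived class can call it explicitly with a qualified name. Herb Sutter lists a few scenarios in which a default implementation for a pure virtual function makes sense in [1].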
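Roughly, this looks as follows. The Logger classes are hypothetical and only illustrate the mechanics, not any of the scenarios from [1].

```cpp
#include <string>

class Logger {
public:
  virtual ~Logger() {}
  virtual std::string Format(const std::string& msg) = 0;   // still pure
};

// The body of a pure virtual function must be defined outside the class.
std::string Logger::Format(const std::string& msg) { return "[log] " + msg; }

class PlainLogger : public Logger {
public:
  // Has to override, but opts in to the default explicitly.
  virtual std::string Format(const std::string& msg) { return Logger::Format(msg); }
};

class LoudLogger : public Logger {
public:
  // Builds on the default instead of replacing it.
  virtual std::string Format(const std::string& msg) { return Logger::Format(msg) + "!"; }
};
```

The derived classes are still forced to make a conscious choice, which is the difference from an ordinary virtual function with a body.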

4. Scenarios to use pure virtual functions
Interface class:
Unlike Java and C#, C++ has no "interface" keyword. An interface serves as the public API/gateway to another module, which provides an excellent boundary/de-coupling between modules. In C++ an interface class is realized as a pure abstract class, whose functions are all pure virtual and which has no data members.

*********************************************************************************
class Interface {
public:
  virtual ~Interface() {} // lets implementations be deleted via an Interface pointer
  virtual void Foo() = 0;
  virtual std::string GetName() = 0;
};
*********************************************************************************

This is one of the usages of pure virtual functions. Actually it is the most important one in my opinion.
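A sketch of how client code stays independent of the implementation is given below. Impl, its call counter and UseInterface() are hypothetical, and a virtual destructor is added to the interface as a precaution.

```cpp
#include <string>

class Interface {
public:
  virtual ~Interface() {}
  virtual void Foo() = 0;
  virtual std::string GetName() = 0;
};

// Hypothetical implementation hidden behind the interface.
class Impl : public Interface {
public:
  Impl() : m_Calls(0) {}
  virtual void Foo() { ++m_Calls; }
  virtual std::string GetName() {
    return "Impl called " + std::to_string(m_Calls) + " time(s)";
  }
private:
  int m_Calls;
};

// Client code depends only on Interface, never on Impl.
std::string UseInterface(Interface& i) {
  i.Foo();
  return i.GetName();
}
```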

Class exists conceptually but not in reality:
This is often a result of the process of abstraction and encapsulation. When designing the software architecture, the concrete objects are often lifted into an abstract entity (class), which holds their common properties (data members) and behaviors (functions). The entity exists only conceptually and not in reality. But in programming the conceptual entity is often created and passed around in order to achieve better software decoupling and modularity.

For instance a car conceptually has four wheels, steering and lights, and can drive forward/backward and turn left/right. This is what a car is conceptually, obtained by lifting out everything that all cars have in common. But in reality a conceptual car does not exist, because only branded cars are available on the market, for instance BMW, General Motors or Mercedes.

*********************************************************************************
class Car {
public:
    virtual void TurnRight();
    virtual void DriveForward();
    virtual void Reverse();

private:
    int m_NumberOfGears;
};

class Mercedes : public Car {

};

class BMW : public Car {

};

// requirement: Car itself must not be instantiable (how to achieve this is discussed below)
Car c; // fail
void Process(Car);// fail
void Process(Car&); // ok
void Process(Car*); //ok
Car* cPtr = new Mercedes(); //ok
Process(cPtr); //ok

*********************************************************************************

In some cases the conceptual entity (here Car) has a default behavior/basic implementation for all its virtual functions, and the derived classes (Mercedes, BMW) may or may not override them. The programmers still want better code de-coupling or modularity, so the top (conceptual) class is used outside the module but must not be instantiated. If all the virtual functions have their own default implementation, the destructor is the usual candidate to be declared as a pure virtual function to meet the requirement. This is one technique to achieve it. Another technique is to make the top class's constructor protected and use the factory method pattern to create instances of the derived classes only. For more details please see my other blog entry, http://cpluspluslearning-petert.blogspot.co.uk/2014/04/design-pattern-factory-method-pattern.html
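The first technique applied to the car example might look like this. It is a sketch under a few assumptions: the member functions return std::string so that the chosen behavior can be seen, only two of the functions are kept, and Process() is reduced to the reference overload.

```cpp
#include <string>
#include <type_traits>

class Car {
public:
  virtual ~Car() = 0;   // the only pure member: keeps Car abstract
  virtual std::string TurnRight() { return "turn right"; }       // defaults shared
  virtual std::string DriveForward() { return "drive forward"; } // by all brands
};

Car::~Car() {}          // a pure virtual destructor still needs a definition

class Mercedes : public Car {};   // keeps every default

class BMW : public Car {
public:
  virtual std::string DriveForward() { return "drive forward, sportily"; }
};

static_assert(std::is_abstract<Car>::value, "Car c; would not compile");

std::string Process(Car& c) { return c.TurnRight() + ", " + c.DriveForward(); }
```

Mercedes and BMW can be instantiated because their implicitly declared destructors override the pure one.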

Bibliography:
[1] Herb Sutter, "More Exceptional C++ - 40 New Engineering Puzzles, Programming Problems, and Solutions", 2002
