Friday 30 May 2014

Aggregates in C++11

One way to learn about aggregates is to start from the C++ standard, reading the definition and the related clauses. Here I would like to go the other way: I will start from how aggregates are used and work backwards to the requirements defined in the standard.

1. Aggregates in the C++03 Standard
C++03 inherits C-style initializer lists, which can be used on aggregates. This means that an aggregate can be initialized directly via "{}", and its member variables are initialized in the order in which the values appear in the {}-list. This implies two requirements:
    - The mapping from the values in the list to the members is fixed at compile time
    - The exact memory layout is known at compile time
Below I show what exactly these two requirements mean in terms of the C++ standard.

Example 1: initializer lists on classes (here "classes" refers to class, struct and union, as in the standard)
//********************************************************************************
struct Foo {
    int x;
    double y;
};

Foo f = {1, 10.0};

struct Bar {
    int a;
    double b;
    Foo f;
    double c;
};
Bar bar = {2, 3.0, {1, 10.0}, 2.0};
//********************************************************************************

In Example 1, "Bar bar" is initialized as follows:
    bar.a = 2
    bar.b = 3.0
    bar.f.x = 1
    bar.f.y = 10.0
    bar.c = 2.0
This tells us that sizeof(Bar) = sizeof(int) + sizeof(double) + sizeof(Foo) + sizeof(double), where sizeof(Foo) = sizeof(int) + sizeof(double), if memory padding and alignment are ignored (see my other blog entry in Performance - Running out of memory (heap)); with padding, the real size can only be larger. The compiler has to know this size at compile time. It also has to know the exact layout of Bar, i.e. the order of its members in memory; only then can it tell which member of "bar" each value in the list initializes.
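The size and layout argument above can be checked with a small sketch. It reuses Foo and Bar from Example 1 and relies on offsetof and static_assert (C++11); the helper make_bar is hypothetical. Non-static members with the same access control are placed at increasing addresses in declaration order, and padding can only add bytes, so the size checks use >= rather than ==.

```cpp
#include <cassert>
#include <cstddef>  // offsetof

struct Foo {
    int x;
    double y;
};

struct Bar {
    int a;
    double b;
    Foo f;
    double c;
};

// Members with the same access control are laid out in declaration order,
// which is what lets the compiler match the i-th value in {} to the i-th member.
static_assert(offsetof(Bar, a) < offsetof(Bar, b), "a precedes b");
static_assert(offsetof(Bar, b) < offsetof(Bar, f), "b precedes f");
static_assert(offsetof(Bar, f) < offsetof(Bar, c), "f precedes c");

// Padding and alignment can only make an object larger than the sum of its members.
static_assert(sizeof(Foo) >= sizeof(int) + sizeof(double), "Foo holds x and y");
static_assert(sizeof(Bar) >= sizeof(int) + 2 * sizeof(double) + sizeof(Foo),
              "Bar holds a, b, f and c");

// Hypothetical helper: builds the object from Example 1.
Bar make_bar() {
    Bar bar = {2, 3.0, {1, 10.0}, 2.0};
    return bar;
}
```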

Let's think about what could affect the memory size and layout of a class in C++: virtual functions and inheritance.
In typical implementations, a class with virtual functions stores a hidden virtual table pointer in every object, which increases the size of the object. More importantly, the C++ standard says nothing about such a pointer, let alone where it should be located in the object (it could be at the start or anywhere else). Therefore there is no portable way to initialize an object via an initializer list if its class has virtual functions.
Inheritance also increases the size of derived classes. Normally the size of Derived equals the size of its own non-static member variables plus the size of Base (this is not exactly true for empty classes or classes with virtual functions). In practice it is safe to claim that sizeof(Derived) >= sizeof(Base); the two are equal, for example, when Derived adds no data members and no virtual functions of its own. And, just as with the virtual table pointer, the C++ standard does not define where the Base subobject is placed within Derived. Therefore there is no portable way to initialize an object via an initializer list if its class has base classes.
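To make the two size effects concrete, here is a sketch with hypothetical classes Plain, Polymorphic, Base and Derived. It assumes a mainstream compiler (GCC, Clang or MSVC) that stores a hidden virtual table pointer in every polymorphic object; the standard itself does not require this, so the size comparisons are observations about such implementations, not guarantees.

```cpp
#include <cassert>

// Assumption: the compiler implements virtual functions with a hidden vptr
// stored inside each object (true for GCC, Clang and MSVC, not required by the standard).
struct Plain {
    int x;
    void f() {}                 // non-virtual: no per-object storage
};

struct Polymorphic {
    int x;
    virtual void f() {}         // virtual: the object typically gains a vptr
    virtual ~Polymorphic() {}
};

struct Base { int x; };
struct Derived : Base { int y; };  // a Base subobject plus y

// Hypothetical helper: Plain is an aggregate, so brace initialization works.
Plain make_plain() {
    Plain p = {1};
    return p;
}

// Polymorphic q = {1};   // error: a class with virtual functions is not an aggregate
// Derived d = {{1}, 2};  // error in C++11/14; C++17 allows public bases in aggregates
```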

Here we can conclude that an aggregate cannot use any C++ feature that adds hidden data to its objects or affects the order of its members in memory. In other words, an aggregate can use any C++ feature, as long as it has no impact on the memory layout:
    - No virtual functions
    - No base classes
    - As many static (public/protected/private) member variables as you like
    - As many non-virtual (public/protected/private) member functions as you like

One more thing to keep in mind: because an initializer list initializes the member variables of an aggregate directly, all of those member variables have to be visible/accessible from outside. This means that all non-static member variables of an aggregate class have to be public. (Static member variables do not take up memory in the objects, because they reside in the global/static data section and are shared by all objects. They can be accessed directly through the class name plus "::".)
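A sketch of these rules, using a hypothetical struct Point: static member variables and member functions, whatever their access, take no slot in the {}-list, because only the public non-static members x and y live inside each object. The names Point, count, kMax, inRange, valid and make_point are illustrative only, and the size check assumes a common implementation with no padding between two ints.

```cpp
#include <cassert>

struct Point {
    int x;                              // public non-static members: set by the {}-list
    int y;

    static int count;                   // static: lives outside every Point object
    int sum() const { return x + y; }   // member functions take no space in the object
    bool valid() const { return inRange(); }

private:
    static const int kMax = 100;        // a private static member is allowed
    bool inRange() const { return x < kMax && y < kMax; }  // so is a private function
};

int Point::count = 0;

// Hypothetical helper: only x and y appear in the braces.
Point make_point() {
    Point p = {3, 4};
    return p;
}
```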

Example 2: initializer lists on arrays
//********************************************************************************
// Schematic: n and m stand for sizes, F1..Fm and X1..Xm for values.
// user-defined type
class Foo { /* ... */ };
Foo fooArr[n] = {F1, F2, ..., Fm};
// built-in type
int a[n] = {X1, X2, X3, ..., Xm};
//********************************************************************************

Here F1, F2, ..., Fm are instances of Foo, and X1, X2, ..., Xm are values of the built-in type int. There are three possible relationships between n and m:
    n = m: every element is initialized as specified
    n > m: the first m elements are initialized as specified and the remaining n - m elements are value-initialized (for int, this means zero)
    n < m: the program is ill-formed and the compiler rejects it
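Here is a concrete version of the three cases, with small numbers picked for illustration (the helper names partial_ints and last_foo are hypothetical). Local arrays are used on purpose: a global array would be zero-initialized anyway, so it could not show that the missing elements are filled in by the initialization rule.

```cpp
#include <cassert>

struct Foo {
    int x;
    double y;
};

// n = m: with the size omitted, n is deduced from the m = 3 values.
int b[] = {1, 2, 3};

// n = 5, m = 3: copies a local array out so the caller can inspect it.
void partial_ints(int (&out)[5]) {
    int a[5] = {7, 8, 9};          // a[3] and a[4] are value-initialized to 0
    for (int i = 0; i < 5; ++i) {
        out[i] = a[i];
    }
}

// n = 4, m = 1: returns the last element, which was not listed.
Foo last_foo() {
    Foo fooArr[4] = {{1, 10.0}};   // fooArr[1..3] are value-initialized to {0, 0.0}
    return fooArr[3];
}

// n < m is ill-formed:
// int c[2] = {1, 2, 3};           // error: too many initializers
```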

As the n > m case shows, the remaining n - m objects are initialized as default objects, so the class has to be default-constructible. A class that declares any constructor of its own no longer gets an implicitly generated default constructor, which is one way to see why an aggregate cannot have user-declared constructors. The same requirement applies one level down: every member variable that is not listed must also have a default value. Built-in types (zero), other aggregates and classes with a default constructor all qualify. A C++ reference has no default value, so a reference member must always be initialized explicitly. Note that the standard does not forbid reference members, or members of non-aggregate class types, in an aggregate; it only makes it ill-formed to leave such a member without a usable default.
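A sketch of that last point, with hypothetical types IntRef and Named (C++11, using std::string as an example of a non-aggregate member type): both are aggregates, as long as the reference member is always listed and every unlisted member can be value-initialized.

```cpp
#include <cassert>
#include <string>

// An aggregate with a reference member.
struct IntRef {
    int& r;
    int  v;
};

// Hypothetical helper: the reference is named explicitly in the braces.
int read_through_ref() {
    int target = 5;
    IntRef ok = {target, 1};
    ok.r = 42;                   // writes through to target
    return target;
}

// IntRef bad = {};              // error: the reference r cannot be value-initialized

// std::string is not an aggregate, yet Named is.
struct Named {
    std::string name;
    int id;
};

// Hypothetical helper: id is not listed, so it is value-initialized to 0.
Named make_named() {
    Named n = {"widget"};
    return n;
}
```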

Example 3: arrays of a non-aggregate class
//********************************************************************************
class Foo {
public:
    Foo(int x) : m_x(x) {} // suppresses the implicit default constructor,
                           // so Foo is not an aggregate
    int m_x;
};

Foo arrFoo1[] = {Foo(1), Foo(2), Foo(3)};  // ok
Foo arrFoo2[3] = {Foo(1), Foo(2), Foo(3)}; // ok
// Foo arrFoo3[3] = {Foo(1)};              // error: arrFoo3[1] and arrFoo3[2] need Foo()
//********************************************************************************

Keep in mind that all arrays in C++03 are aggregates, but not every initializer list for them is legal. In Example 3, Foo is not an aggregate class because it has a user-defined constructor, yet arrFoo1 and arrFoo2 are legally initialized aggregates. arrFoo3 is not, because it is the n > m case from Example 2: the remaining 3 - 1 = 2 elements have to be initialized as default objects, and Foo does not provide a default constructor.
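One way to make the arrFoo3 line legal, sketched below, is to give Foo a default constructor. Foo is still not an aggregate, but the array is, and the two unlisted elements are then initialized with Foo(). The default value 0 and the helper element_of_arrFoo3 are assumptions for illustration.

```cpp
#include <cassert>

class Foo {
public:
    Foo() : m_x(0) {}          // default constructor for the unlisted elements
    Foo(int x) : m_x(x) {}
    int m_x;
};

// Hypothetical helper: builds the array from Example 3 and reads element i.
int element_of_arrFoo3(int i) {
    Foo arrFoo3[3] = {Foo(1)}; // arrFoo3[1] and arrFoo3[2] are built with Foo()
    return arrFoo3[i].m_x;
}
```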

Here is the list of things worth keeping in mind about aggregates in C++03:
    - No virtual functions
    - No base classes
    - No user-declared constructors (only the implicitly declared ones), and this includes the copy constructor
    - No limitation on the copy assignment operator and destructor (user-defined ones allowed)
    - Reference members allowed, but they must always be initialized explicitly
    - Member variables of any other type, as long as every member left out of the list can be value-initialized
    - All non-static member variables public
    - Any public/protected/private static member variables
    - Any public/protected/private static/non-static (non-virtual) member functions
    - Every array is an aggregate
    - Array initialization works on arrays of both aggregate and non-aggregate classes
    - In the n > m case shown in Example 2, the element type must be default-constructible

2. Improvements in the C++11 Standard
C++11 brings no significant change to aggregates as such. However, some features newly introduced in C++11 relax the requirements/definition of aggregates.

Feature 1: explicitly defaulted member functions
For more details about this feature, please refer to my other blog entry, Explicitly defaulted/deleted member functions.
This is not really an improvement. It simply provides a notation for explicitly declaring that a class uses the default constructor generated by the compiler.

Example 4
//********************************************************************************
// C++11
class Foo {
public:
    Foo() = default;
    int m_x;
};
//********************************************************************************

C++11 allows "= default" to declare explicitly that Foo uses the default constructor generated by the compiler. Because such a constructor is not user-provided, Foo is still an aggregate in C++11. In C++03, by contrast, any user-declared constructor prevents a class from being an aggregate.
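A quick check of this, as a sketch: since Foo() = default is not user-provided, Foo from Example 4 still accepts brace initialization. The helper make_and_read is hypothetical. This only holds for C++11 through C++17; C++20 excludes classes with any user-declared constructor from aggregates again, so the code does not compile there.

```cpp
#include <cassert>

class Foo {
public:
    Foo() = default;   // explicitly defaulted, not user-provided
    int m_x;
};

// Hypothetical helper: aggregate initialization of Foo (valid in C++11 to C++17).
int make_and_read() {
    Foo f = {42};
    return f.m_x;
}
```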

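Feature 2: default values for class member variables
For more details about this feature, please refer to my other blog entry, Improvement on object construction.

This is a significant improvement in C++11 over C++03: member variables can be given default values other than zero. Without it, the members of Foo and Bar in Example 1 can only default to {0, 0.0} and {0, 0.0, {0, 0.0}, 0.0}, and only when the objects are value-initialized (for example with "= {}") or have static storage duration; a local "Bar bar;" leaves them indeterminate. One caveat: in C++11 itself, a class with default member initializers is no longer an aggregate. This restriction was lifted in C++14.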
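To illustrate the zero defaults mentioned above, here is a short sketch using the types from Example 1: empty braces value-initialize every member. The helper name value_initialized is hypothetical.

```cpp
#include <cassert>

struct Foo {
    int x;
    double y;
};

struct Bar {
    int a;
    double b;
    Foo f;
    double c;
};

// Hypothetical helper: empty braces value-initialize every member, so all are zero.
// A plain local "Bar bar;" would leave the members indeterminate instead.
Bar value_initialized() {
    Bar bar = {};
    return bar;
}
```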

Example 5: default values in C++11
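//********************************************************************************
// Needs C++14: in C++11 a class with default member initializers is not an
// aggregate, so the member initializer "Foo f = {2, 20.0};" does not compile.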
struct Foo {
    int x = 1;
    double y = 1.0;
};

struct Bar {
    int a = 10;
    double b = 10.0;
    Foo f = {2, 20.0};
    double c = 30.0;
};

Bar barArr[3];
//********************************************************************************

barArr will contain 3 Bar objects, each with the default value {10, 10.0, {2, 20.0}, 30.0} specified in the declaration. In C++03, the same result needs extra code to re-initialize every object to these values.
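A sketch that exercises Example 5, assuming a C++14 compiler: default-initialized Bar objects pick up every default member initializer, and since C++14 brace initialization can override just the leading members while the rest keep their defaults. The helpers defaulted and partly_overridden are hypothetical.

```cpp
#include <cassert>

// Requires C++14 or later: in C++11 these classes would not be aggregates.
struct Foo {
    int x = 1;
    double y = 1.0;
};

struct Bar {
    int a = 10;
    double b = 10.0;
    Foo f = {2, 20.0};
    double c = 30.0;
};

Bar barArr[3];                 // each element uses the default member initializers

// Hypothetical helper: a local object also gets the defaults.
Bar defaulted() {
    Bar bar;
    return bar;
}

// Hypothetical helper: a and b come from the list, f and c keep their defaults.
Bar partly_overridden() {
    Bar bar = {1, 2.0};
    return bar;
}
```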

