
Saturday, November 15, 2025

Teaching the Compiler to Think: A Code Study in Template Metaprogramming

Imagine you could run parts of your program while it is being compiled, before an executable even exists. Instead of calculating a value every time the user runs your app, the value is already baked into the code, ready to go. This is the core idea behind a powerful, advanced C++ technique called Template Metaprogramming (TMP).

TMP treats the C++ template system as a Turing-complete functional programming language. Instead of variables, we have static const members. Instead of functions, we have template structs. And instead of loops, we have recursion.

Recently, I came across some fascinating TMP code. In this article, we’ll explore how TMP works by “teaching the compiler” to generate unique numerical encodings for date formats—entirely at compile time.

 

#include <stdio.h>

// Field identifiers: Y == 0, M == 1, D == 2.
enum { Y, M, D };

// Encodes one field as a two-digit number: field id, then width (last digit only).
template<unsigned F, unsigned W = 2>
struct datefield {
    static const unsigned type = F * 10 + (W % 10);
};

// Recursive case: combine the first field with the encoding of the rest.
template <typename T1, typename T2 = void, typename T3 = void>
struct dateformat
{
    static const unsigned pow10 = 100 * dateformat<T2, T3>::pow10;
    static const unsigned value = pow10 * T1::type + dateformat<T2, T3>::value;
};

// Base case: no fields left.
template <>
struct dateformat<void, void, void>
{
    static const unsigned value = 0;
    static const unsigned pow10 = 1;
};

// Named formats, each computed by the compiler.
enum
{
    YYYYMMDD = dateformat<datefield<Y, 4>, datefield<M>, datefield<D>>::value,
    DDMMYY = dateformat<datefield<D>, datefield<M>, datefield<Y>>::value,
    YYYYMM = dateformat<datefield<Y, 4>, datefield<M>>::value,
};

int main() {
    printf("dateformat<Y, 4>=%u, dateformat<Y>=%u, dateformat<M>=%u, dateformat<D>=%u\n",
           dateformat<datefield<Y, 4>>::value, dateformat<datefield<Y>>::value,
           dateformat<datefield<M>>::value, dateformat<datefield<D>>::value);
    printf("YYYYMMDD=%u, DDMMYY=%u, YYYYMM=%u\n", YYYYMMDD, DDMMYY, YYYYMM);

    return 0;
}

 

Any guesses what this program prints? We’ll find out shortly.


A Different Way of Thinking

Normally, we use C++ templates to create generic code, like a vector that can hold any type. In TMP, we use templates as a mini-programming language that the compiler itself executes.

Here are the rules for this "language":

  • Variables don't vary: They are compile-time constants, usually defined with static const.
  • "Functions" are structs: We "call" a function by instantiating a template struct. The "return value" is a static const member inside it.
  • Loops are done with recursion: We make a template call itself with slightly different parameters until it hits a "stop" condition.

Step 1: The Basic "Function" - datefield

Let's look at the simplest piece of our program:

template<unsigned F, unsigned W = 2>
struct datefield {
    static const unsigned type = F * 10 + (W % 10);
};

 

Think of datefield as a simple function. It takes two numbers at compile time (F for field and W for width) and calculates a new number called type.

When the compiler sees datefield<D>, it knows D is 2 (from the enum) and the default W is 2. It immediately calculates:

static const unsigned type = 2*10 + (2 % 10);

...and determines that datefield<D>::type is the constant 22. This calculation happens during compilation, not at runtime.

Step 2: The Recursive Engine - dateformat

This is where the magic happens. We need to combine multiple datefields.

 

template <typename T1, typename T2 = void, typename T3 = void>
struct dateformat
{
    static const unsigned pow10 = 100 * dateformat<T2, T3>::pow10;
    static const unsigned value = pow10 * T1::type + dateformat<T2, T3>::value;
};

 

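This is our recursive "function". Look closely at the value calculation. To figure out the value for dateformat<T1, T2, T3>, the compiler realizes it first needs to figure out the value for dateformat<T2, T3> (which, thanks to the default argument, is shorthand for dateformat<T2, T3, void>). This is a recursive call! It causes the compiler to peel off the first datefield and re-run the process on the rest of the list. Each field occupies two decimal digits, so pow10 tracks the place value at which the current field must be written: it grows by a factor of 100 for every field that follows.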

But what happens when we run out of fields?

Step 3: The Stop Sign - The Base Case

A recursive function that never stops is an infinite loop. In TMP, runaway recursion never reaches runtime: the compiler rejects the program instead. We need a "base case" to tell the compiler when to stop.

template <>
struct dateformat<void, void, void>
{
    static const unsigned value = 0;
    static const unsigned pow10 = 1;
};

This is a template specialization. It's a specific rule that says: "If you ever see dateformat with no fields (void, void, void), don't use the main recursive template. Use this one instead." This template provides simple, fixed values (value = 0) and stops the recursion.


Tracing the Compiler's "Thoughts"

Let's follow the compiler as it calculates YYYYMM.

  1. You ask for: dateformat<datefield<Y,4>, datefield<M>>::value.
  2. Compiler says: "Okay, to get that value, I need to instantiate dateformat<datefield<Y,4>, datefield<M>>. The formula requires me to first find the value from dateformat<datefield<M>>."
  3. Compiler says: "Now I need to instantiate dateformat<datefield<M>>. The formula requires me to first find the value from dateformat<void, void, void>."
  4. Compiler says: "Aha! I have a special rule for dateformat<void, void, void>. Its value is 0 and its pow10 is 1. The recursion stops here."

Now the compiler can work its way back up, calculating the final values:

  5. Finishing dateformat<datefield<M>>:
    • pow10 = 100 * 1 (from base case) = 100
    • value = 100 * datefield<M>::type + 0 (from base case) = 100 * 12 + 0 = 1200
  6. Finishing dateformat<datefield<Y,4>, datefield<M>>:
    • pow10 = 100 * 100 (from step 5) = 10000
    • value = 10000 * datefield<Y,4>::type + 1200 (from step 5) = 10000 * 4 + 1200 = 41200

And it's done! The compiler determines that YYYYMM is the constant 41200. When you run your program, this number is already computed and stored in the executable, so no arithmetic is left to do at runtime.

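And here is the result of this program:

dateformat<Y, 4>=400, dateformat<Y>=200, dateformat<M>=1200, dateformat<D>=2200

YYYYMMDD=4122200, DDMMYY=22120200, YYYYMM=41200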


 

Pros and Cons

Pros:

  • Zero Runtime Overhead: All calculations are performed by the compiler. The resulting values (YYYYMMDD, DDMMYY, etc.) are hard-coded into the executable as if you had typed the numbers yourself. This is extremely efficient.
  • Type Safety and Expressiveness: The format is defined declaratively (e.g., dateformat<datefield<Y,4>, datefield<M>>). This is more expressive and less prone to "magic number" errors than manually calculating and defining #define YYYYMM 41200. The compiler validates the structure.
  • Extensibility: It's easy to define new formats without changing the core logic.

Cons:

  • Complexity: The code is difficult to read and understand for developers not familiar with template metaprogramming. The recursive nature and separation of value and pow10 can be confusing.
  • Compiler-Intensive: For very deep or complex template recursions, compile times can increase significantly.
  • Debugging Challenges: Debugging TMP code is notoriously difficult. Errors are reported as complex template instantiation failures, which can be cryptic and hard to trace back to the logical error.
  • Limited Functionality: This approach is suitable for generating constants. It cannot be used for runtime parsing or formatting of date strings, which would require a completely different, traditional runtime implementation. 


How TMP Has Evolved

In the early days of C++, TMP was a revelation. It was the primary method for performing complex compile-time logic. It was the engine behind foundational libraries like Boost and was used for creating highly efficient code (e.g., expression templates that eliminate temporary objects in math libraries) and for type introspection (type traits). The example code is a perfect illustration of this classic, powerful, but syntactically complex style.

While modern C++ (since C++11) offers simpler ways to do this with constexpr, understanding this foundational TMP technique gives you a deep insight into the power and flexibility of the C++ language.

 

P.S. I fed the code to a few AI tools—funny enough, they gave different outputs:

 

- by ChatGPT

dateformat<Y, 4>=400, dateformat<Y>=200, dateformat<M>=1200, dateformat<D>=2200

YYYYMMDD=4122200, DDMMYY=22120200, YYYYMM=41200

 

- by DeepSeek

dateformat<Y, 4>=400, dateformat<Y>=200, dateformat<M>=1200, dateformat<D>=2200

YYYYMMDD=4122200, DDMMYY=22120200, YYYYMM=41200

 

- by GPT-5-Codex (Preview)

dateformat<Y, 4>=400, dateformat<Y>=200, dateformat<M>=1200, dateformat<D>=2200

YYYYMMDD=4122200, DDMMYY=22120200, YYYYMM=41200

 

- by Claude Sonnet 4.5 (wrong):

Individual datefields:

 

datefield<Y,4>::type = Y*10 + 4 = 0*10 + 4 = 4

datefield<Y,2>::type = Y*10 + 2 = 0*10 + 2 = 2 (default W=2)

datefield<M,2>::type = M*10 + 2 = 1*10 + 2 = 12

datefield<D,2>::type = D*10 + 2 = 2*10 + 2 = 22

 

For YYYYMMDD = dateformat<datefield<Y,4>, datefield<M>, datefield<D>>::value:

Works through: 4 + 12*100 + 22*10000 = 4 + 1200 + 220000 = 221204

 

For DDMMYY = dateformat<datefield<D>, datefield<M>, datefield<Y>>::value:

Works through: 22 + 12*100 + 2*10000 = 22 + 1200 + 20000 = 21222

 

For YYYYMM = dateformat<datefield<Y,4>, datefield<M>>::value:

Works through: 4 + 12*100 = 4 + 1200 = 1204

 

Output:

dateformat<Y, 4>=4, dateformat<Y>=2, dateformat<M>=12, dateformat<D>=22

YYYYMMDD=221204, DDMMYY=21222, YYYYMM=1204

 

- by Gemini 2.5 Pro (wrong)

dateformat<Y, 4>=4, dateformat<Y>=2, dateformat<M>=12, dateformat<D>=22

YYYYMMDD=41222, DDMMYY=221202, YYYYMM=412

 
