rygo6

Welcome to the Raspberry Pi 4 on my desk running OpenBSD. Future blog

Templates. Good feature. Bad default.

Templates are one of those features that are very convenient, but they also create one of the worst possible incentives in a programming language.

The issue is that templates duplicate assembly. If you write a whole class as a template and use it with different types, the compiler generates a separate copy of the assembly for each type. If you have multi-type templates that rely on other templates, you end up with a copy of the assembly for every combination of nested types you actually use. This compounds non-linearly as you add more templates, and as more templates rely on other templates.
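To make the "every combination" point concrete, here is a minimal sketch. `Table`, `instantiation_total`, and `count_combinations` are hypothetical names made up for this illustration. It only counts distinct instantiations at run time through a function-local static. It does not inspect the generated assembly, which you would need a disassembler for.

```cpp
// Running total of distinct template instantiations that have been called.
// Hypothetical helper for this illustration only.
int& instantiation_total() {
    static int total = 0;
    return total;
}

// Hypothetical two-parameter template, standing in for something like a map.
template <typename K, typename V>
struct Table {
    static int touch() {
        // This static exists once per (K, V) pair, so the increment runs
        // once per distinct instantiation, not once per call.
        static int id = ++instantiation_total();
        return id;
    }
};

// Two key types times two value types gives four separate instantiations.
int count_combinations() {
    Table<int, int>::touch();
    Table<int, float>::touch();
    Table<float, int>::touch();
    Table<float, float>::touch();
    Table<int, int>::touch();  // already instantiated, total is unchanged
    return instantiation_total();
}
```

Add a third key type and a third value type and the same pattern gives nine instantiations. Each one is a separate function body as far as the language is concerned.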

Aside from increasing compile times and binary bloat, this can have a performance impact, which is something I've become acutely aware of in the past few months. If you have hot path code, there can be significant gains from setting it up so that the code path reuses as many of the same instructions as possible. It is akin to reusing data that is already in a CPU cache. The same applies to instructions: it is better to keep reusing instructions that are already cached where possible.

There are some cases where you do want to duplicate assembly for different types. Vectorizable 3D math operations are an example of where you might want this. But assuming it's better to duplicate the assembly for every type should not be the default. It's a terrible default.

Technically the compiler can do some deduplication if the assembly is exactly the same, but in practice I find this is like anything else where you assume the compiler is smarter than you: it often doesn't do what you want. It seems to struggle with more complex methods, and there is a minefield of edge cases that will stop it from deduplicating. The ability to deduplicate also varies from compiler to compiler. Even if it does deduplicate well enough to avoid runtime issues, that still adds compile time. It also cannot automatically change the code to deal with types of differing sizes, even if such changes would be trivial, so there are scenarios where you'd have to do it manually anyway.

If you want to be sure you are not duplicating assembly for a type, it is better to avoid templates. Instead, write the code so that you pass in an offset, count, and stride, and then work with raw pointers. The generated assembly can then be reused for many types, reducing binary bloat, reducing compile time, and potentially enabling performance gains depending on what exactly the code does. Then, if you want, you can make a simple macro to fill in size and stride for some syntactic sugar.
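A rough sketch of what that can look like, assuming plain-data structs with a `float` field you want to sum. `sum_f32`, `SUM_F32_FIELD`, `Particle`, and `Body` are hypothetical names invented for this example, not from any library.

```cpp
#include <cstddef>  // std::size_t, offsetof
#include <cstring>  // std::memcpy

// One non-template routine: sums the float found `offset` bytes into each of
// `count` elements that are spaced `stride` bytes apart.
float sum_f32(const void* base, std::size_t offset, std::size_t count,
              std::size_t stride) {
    const unsigned char* p = static_cast<const unsigned char*>(base) + offset;
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        float value;
        std::memcpy(&value, p + i * stride, sizeof value);  // sidesteps aliasing rules
        total += value;
    }
    return total;
}

// Sugar: fill in offset and stride from the element type and field name.
#define SUM_F32_FIELD(array, count, Type, field) \
    sum_f32((array), offsetof(Type, field), (count), sizeof(Type))

// Two unrelated layouts that both happen to carry a float `mass`.
struct Particle { float x, y, z; float mass; };
struct Body { double position[3]; float mass; int id; };

float total_mass_example() {
    Particle particles[3] = {{0.0f, 0.0f, 0.0f, 1.0f},
                             {0.0f, 0.0f, 0.0f, 2.0f},
                             {0.0f, 0.0f, 0.0f, 3.0f}};
    Body bodies[2] = {{{0.0, 0.0, 0.0}, 4.0f, 1},
                      {{0.0, 0.0, 0.0}, 5.0f, 2}};
    // Both calls go through the same sum_f32 body.
    return SUM_F32_FIELD(particles, 3, Particle, mass) +
           SUM_F32_FIELD(bodies, 2, Body, mass);
}
```

The tradeoff is that the compiler no longer checks the field type for you, so this only suits simple, standard-layout data where `offsetof` is well defined.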

The way templates are presented incentivizes their use for everything that needs to deal with differing types. You do get a lot of C++ codebases doing this, potentially causing lots of binary and instruction bloat. Meanwhile, templates nicely smooth over the potentially terrible thing happening underneath and hide it.

This is one of those things where the feature is technically nice and can be very useful in certain circumstances, but the incentive it introduces into the language is terrible. It incentivizes being careless about assembly duplication and not paying attention to what you are doing.

This is why I personally tend to think that templates, in retrospect, aren't a good feature, and that macros are sufficient, even though macros are harder to use than templates for comparably complex things. If you are struggling with a macro that is too complex, it's probably because the macro is doing too much and you shouldn't be doing that. That difficulty disincentivizes using macros for entire methods or systems, and instead pushes the codebase towards simpler macros for little bits of syntactic sugar here and there, or simple operations.