Learning C++ part 5

Compiler-specific attributes

__attribute__((always_inline))

__attribute__((always_inline)) is a GCC and Clang compiler-specific attribute that instructs the compiler to inline a function whenever possible. Unlike the standard inline keyword, which merely suggests that a function should be inlined, always_inline is a much stronger request and overrides many of the compiler's normal inlining heuristics.

Under normal circumstances, compilers decide whether to inline a function based on factors such as function size, complexity, recursion, and the current optimization level. Applying always_inline tells the compiler to ignore many of these heuristics and aggressively embed the function body directly at each call site.

One notable difference is that always_inline forces inlining even when compiling with -O0, where ordinary inline functions are typically left as regular function calls. The compiler can still fail to inline when it is technically impossible, such as when the function definition is not visible at the call site, the function is recursive, or it is called indirectly through a function pointer. GCC diagnoses a failed always_inline request on a direct call as an error rather than a warning; Clang is generally more lenient and may simply leave the call in place.

This attribute is commonly combined with static inline for small utility functions defined in header files. This provides the efficiency of a macro while preserving type safety, proper scoping, and easier debugging.

static inline __attribute__((always_inline))
int Add(int a, int b) {
    return a + b;
}

int main() {
    int result = Add(5, 7);
    return 0;
}

When should you use it?

Performance-critical code such as tight loops, packet processing, or low-latency systems where eliminating function call overhead can have a measurable impact.

Small helper functions that are called extremely frequently throughout your codebase.

Header-only libraries where you want utility functions to be embedded directly into the caller instead of generating separate function calls.

[[nodiscard]]

[[nodiscard]] is a standard C++ attribute introduced in C++17 that tells the compiler to warn whenever a function's return value is ignored. It helps prevent bugs caused by accidentally discarding important results such as error codes, newly created objects, or status values.

Without [[nodiscard]], a programmer can accidentally call a function and ignore its result without any compiler warning. Applying the attribute encourages correct usage by reminding the developer that the returned value is important.

This attribute is particularly useful for functions that return success or failure indicators, resource handles, smart pointers, or objects whose creation has side effects. Since C++20, the C++ standard library also applies [[nodiscard]] to several of its own APIs, such as std::empty() and the empty() member functions of containers, helping prevent mistakes where developers treat a query as a command (for example, calling empty() when they meant clear()).

#include <iostream>

[[nodiscard]]
int CalculateSum(int a, int b) {
    return a + b;
}

int main() {
    // Compiler warning: return value is ignored.
    CalculateSum(10, 20);

    int result = CalculateSum(5, 5);

    std::cout << "Sum: " << result << std::endl;

    return 0;
}

Common use cases

Functions that return error codes or status objects where ignoring the result could hide failures.

Factory functions that create resources or objects that should not be silently discarded.

Query functions whose names could easily be mistaken for commands, such as empty().

__attribute__((flatten))

__attribute__((flatten)) is a GCC attribute, also accepted by Clang, that tells the compiler to inline every call made inside the annotated function wherever possible. Instead of evaluating each function call individually using its normal inlining heuristics, GCC attempts to inline the entire call hierarchy into a single function body; Clang's handling of nested calls may differ.

In effect, the annotated function becomes "flattened", with many of its helper functions expanded directly into it. This can significantly reduce function call overhead in hot code paths and may expose further optimization opportunities, such as constant propagation, dead code elimination, and loop optimizations.

The compiler still cannot inline functions whose definitions are unavailable, functions explicitly marked with noinline, or cases where inlining is impossible. Because the same code may be duplicated into multiple callers, excessive use of flatten can significantly increase binary size.

#include <iostream>

void UtilityFunction() {
    std::cout << "Doing work..." << std::endl;
}

__attribute__((flatten))
void CriticalLoop() {
    for (int i = 0; i < 1000; i++) {
        UtilityFunction();
    }
}

int main() {
    CriticalLoop();
    return 0;
}

When should you use it?

Performance-critical functions where nearly all execution time is spent inside a single call chain.

Hot loops that repeatedly invoke many small helper functions and where removing function call overhead can improve performance.

Use sparingly, as aggressive inlining can increase executable size, reduce instruction cache efficiency, and even hurt performance in some situations.

noexcept

noexcept is a C++ specifier that declares a function will not let any exception escape. If an exception does escape a function marked noexcept, the program calls std::terminate(). Whether the stack is unwound before termination is implementation-defined, so destructors of local objects are not guaranteed to run.

One of the most important uses of noexcept is for move constructors and move assignment operators. The C++ Standard Library relies on this guarantee to safely optimize container operations, particularly for containers such as std::vector.

Consider what happens when a std::vector runs out of capacity. It allocates a larger block of memory and must relocate all of its existing elements into the new storage. If the element type has a move constructor that is marked noexcept, the vector can simply move every element, which is typically much cheaper than copying.

If the move constructor is not marked noexcept, the container cannot safely assume that moving every object will succeed. If an exception were thrown halfway through the relocation, some objects would already have been moved from while others would remain in the old storage, making it difficult to provide the strong exception guarantee.

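To preserve correctness, containers such as std::vector fall back to copying elements instead of moving them when the move constructor might throw and the type is copyable (the mechanism behind this choice is std::move_if_noexcept). If the type cannot be copied at all, the potentially throwing move is used anyway and the strong guarantee is lost. Copying often performs deep copies, additional memory allocations, and significantly more work than moving, potentially resulting in much slower reallocation.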

Marking functions noexcept can also enable additional compiler optimizations. Since the compiler knows the function cannot throw, it may omit exception handling metadata such as stack unwinding tables, producing smaller binaries and allowing more aggressive optimization.

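#include <cstddef>

class Buffer {
public:
    // Allocates a zero-initialized array of 'size' ints.
    explicit Buffer(std::size_t size)
        : data_(new int[size]()), size_(size) {}

    // Releases the owned array; deleting nullptr is a no-op.
    ~Buffer() { delete[] data_; }

    // Takes over the other buffer's storage and leaves it empty.
    // Declaring the move operations implicitly deletes the copy operations.
    Buffer(Buffer&& other) noexcept
        : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;

            data_ = other.data_;
            size_ = other.size_;

            other.data_ = nullptr;
            other.size_ = 0;
        }

        return *this;
    }

private:
    int* data_ = nullptr;
    std::size_t size_ = 0;
};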

When should you use it?

Mark move constructors and move assignment operators noexcept whenever they truly cannot throw exceptions. This enables the STL to use move operations during container reallocation.

Use noexcept for small utility functions that are guaranteed not to fail, making your API's exception guarantees explicit.

Do not mark a function noexcept unless you are certain it cannot throw. If an exception escapes such a function, the program will terminate immediately.

Variadic templates

Variadic templates allow a function or class template to accept an arbitrary number of template arguments. They are built around parameter packs, which represent zero or more types or values. Combined with std::forward, they enable perfect forwarding, allowing a wrapper function to pass any number of arguments to another function while preserving each argument's original value category (lvalue or rvalue) and cv-qualifiers.

The expression std::forward<Args>(args)... performs a parameter pack expansion. The trailing ... expands both the type pack (Args) and the function parameter pack (args) simultaneously, producing one std::forward call for every argument.

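#include <utility>  // std::forward

// TargetFunction stands for any function or overload set declared elsewhere.
template <typename... Args>
void Wrapper(Args&&... args) {
    TargetFunction(std::forward<Args>(args)...);
}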

Breaking down the syntax

typename... Args declares a template parameter pack, allowing the template to capture zero or more types.

Args&&... args declares a function parameter pack. Because Args is deduced, each Args&& becomes a forwarding reference (formerly known as a universal reference), capable of binding to both lvalues and rvalues.

std::forward<Args>(args)... expands into a comma-separated list of perfectly forwarded arguments, ensuring that each argument retains the same value category it had when originally passed to the wrapper.

Why is std::forward necessary?

Although Args&& can bind to an rvalue, once the parameter is given a name inside the function (for example, args), it becomes an lvalue expression. If you simply write

TargetFunction(args...);

every argument is treated as an lvalue, even if the caller originally passed an rvalue. As a result, move constructors and rvalue overloads can no longer be selected.

std::forward conditionally casts each argument back to an rvalue only if it was originally passed as an rvalue. Otherwise, it forwards the argument as an lvalue. This behavior is what makes perfect forwarding possible.

int x = 10;

Wrapper(x, 42, std::string("temporary"));

The parameter pack expansion becomes:

TargetFunction(
    std::forward<int&>(args0),          // x -> forwarded as int&
    std::forward<int>(args1),           // 42 -> forwarded as int&&
    std::forward<std::string>(args2)    // temporary -> forwarded as std::string&&
);

When should you use variadic templates?

Wrapper functions that forward arguments to another function while preserving move semantics.

Generic factories such as std::make_unique and std::make_shared, which forward constructor arguments to the object being created.

Generic libraries, logging frameworks, containers, and utility functions that need to accept any number of arguments without writing multiple overloads.

std::pmr::monotonic_buffer_resource

std::pmr::monotonic_buffer_resource is a high-performance memory resource introduced in C++17 as part of the Polymorphic Memory Resource (PMR) library. It implements a monotonic (or bump) allocator, where memory is allocated sequentially from a buffer and is never individually freed. Instead, all allocated memory is released at once, either when release() is called or when the memory resource itself is destroyed.

Unlike traditional allocators, which must keep track of every allocation and deallocation, a monotonic buffer resource simply advances an internal pointer whenever memory is requested. Since deallocate() is effectively a no-op, allocation is extremely fast and incurs very little bookkeeping overhead.

If the supplied buffer becomes full, the resource automatically allocates a larger memory block from an upstream memory resource, which by default is the resource returned by std::pmr::get_default_resource() (initially std::pmr::new_delete_resource()). Future allocations continue from this new block, so allocation keeps succeeding as long as the upstream resource can supply memory. The newly allocated blocks are also returned to the upstream resource only when release() is called or the memory resource is destroyed.

Because memory is only reclaimed when the resource's lifetime ends, std::pmr::monotonic_buffer_resource is best suited for workloads where many objects share the same lifetime, such as parsing, request handling, compilers, game engines, or temporary data structures that are discarded together.

#include <array>
#include <cstddef>
#include <iostream>
#include <memory_resource>
#include <vector>

int main() {
    // Fixed-size buffer allocated on the stack.
    std::array<std::byte, 1024> buffer;

    // Construct a monotonic memory resource using the buffer.
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(),
        buffer.size()
    );

    // Allocate vector storage from the memory resource.
    std::pmr::vector<int> numbers(&pool);

    numbers.push_back(10);
    numbers.push_back(20);
    numbers.push_back(30);

    std::cout << "Vector size: "
              << numbers.size()
              << '\n';

    // All memory allocated by 'pool' is released together
    // when 'pool' goes out of scope.
    return 0;
}

Why is it so fast?

Allocation simply increments an internal pointer instead of searching for free blocks or maintaining allocation metadata.

Individual deallocations are ignored, eliminating the overhead of free lists, coalescing, and fragmentation management.

Many small allocations can be satisfied from a single contiguous memory buffer, significantly reducing calls to the heap allocator.

Things to watch out for

Since individual allocations are never freed, memory usage only grows over the lifetime of the resource. If the resource lives for the entire duration of a long-running application, memory consumption can continue increasing even if the containers themselves are cleared.

If the initial buffer is too small, the resource transparently falls back to its upstream allocator. While correct, this introduces heap allocations and may reduce the performance benefits of using a monotonic allocator.

std::pmr::monotonic_buffer_resource is not thread-safe. If multiple threads allocate from the same resource concurrently, external synchronization is required.

When should you use it?

Temporary objects that all become unreachable at roughly the same time, such as compiler ASTs, parsers, serializers, or request-processing pipelines.

High-performance applications that perform thousands or millions of small allocations, where the overhead of the general-purpose heap allocator becomes significant.

Low-latency systems, including trading engines and game engines, where deterministic allocation performance is often more important than reclaiming memory immediately.