Specifics of C++ lambdas

Dmitry Soshnikov
Oct 10, 2019

Lambda functions, introduced in C++11, are a convenient way of passing different types of callbacks, doing parametrized operations, etc.

And as in many other languages, C++ lambdas are closures. That is, they can capture bindings from the outer scope and use them inside. However, they do this with some C++ specifics.

Note: for a generic description of closures in view of the Funarg problem, see this article.

Mutable capture by value: no sharing

First let’s consider the following JavaScript example:

function makeCounter(init = 0) {
  let value = init;
  return {
    increment() {
      return ++value;
    },
    decrement() {
      return --value;
    },
  };
}

const counter = makeCounter(0);

counter.increment(); // 1
counter.increment(); // 2
counter.decrement(); // 1

A direct translation of this code doesn’t work in C++.

To mutate a captured value inside a lambda, the lambda should either be marked as mutable, or it should capture the variable by reference.

And while the mutable lambdas allow mutating the value, they don’t allow sharing this value with other lambdas. Each closure just gets its own copy of the value:

#include <utility>  // std::make_pair

// Attempt 1: mutable lambda, capture by value.
// Doesn't allow sharing.
auto makeMutableValueCounter(int init = 0) {
  auto value = init;
  return std::make_pair(
    [value]() mutable {
      return ++value;
    },
    [value]() mutable {
      return --value;
    }
  );
}

...

auto c1 = makeMutableValueCounter(0);

c1.first(); // 1
c1.first(); // 2, works!
c1.second(); // -1, not really!
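To see the two options from this section side by side, here is a minimal sketch in a scope where the variable outlives both lambdas, so capturing by reference is safe. The function name captureModesDemo is made up for illustration, and the assertions simply document what each call is expected to return.

```cpp
#include <cassert>

// Hypothetical demo function; the name is illustrative only.
// Returns the final value of the outer variable.
int captureModesDemo() {
  int value = 0;

  // Option 1: mutable lambda. It mutates its own private copy.
  auto incCopy = [value]() mutable { return ++value; };
  assert(incCopy() == 1);
  assert(incCopy() == 2);
  assert(value == 0);  // the outer variable is untouched

  // Option 2: capture by reference. It mutates the outer variable,
  // which is safe here because `value` outlives the lambda.
  auto incRef = [&value] { return ++value; };
  assert(incRef() == 1);
  assert(value == 1);  // the change is visible outside

  return value;
}
```

The mutable lambda keeps counting on its own copy, while the by-reference lambda changes the variable everyone else sees.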

Stack references: no Upwards Funarg

Due to C++’s memory organization, in particular the allocation of local variables on the stack, C++ lambdas do not easily and directly support the Upwards Funarg.

Note: an Upwards Funarg is a function that is returned to the outside from another function.

OK, coming from other functional languages, one could assume that to support sharing, JavaScript captures the bindings by reference in both closures. JavaScript really does store a reference to a captured environment; however, attempts to map this directly to C++ do not give the desired result.

In C++ local variables are allocated on the stack, and stack frames are automatically deallocated when a function returns. Using a reference to a stack-allocated variable after its frame is gone is undefined behavior.

#include <utility>  // std::make_pair

// Attempt 2: capture by reference: undefined behavior
//
// NOTE: invalid solution, never refer to stack variables
// if the frame is already deallocated.
auto makeReferenceCounter(int init = 0) {
  auto value = init;
  return std::make_pair(
    [&value]() {
      return ++value;
    },
    [&value]() {
      return --value;
    }
  );
}

...

auto c2 = makeReferenceCounter(0);

c2.first(); // 1
c2.first(); // 2, works?
c2.second(); // ?, not really, UB!

Shared objects: correct Upwards Funarg

In fact, the terms “closure” and “stack-allocated” don’t play well together.

Of course, if we need to deal with an Upwards Funarg, the data must be allocated on the heap. JavaScript just does it automatically, in contrast with C++, which is in general a pretty “manual” language when it comes to memory management.

Note: per the ECMAScript specification a closure just stores a reference to a captured environment. JS engines though can optimize this case, and heap-allocate only the bindings which are actually used in closures.

So we need to mimic what JS does with manually allocated heap data.

#include <memory>   // std::make_shared
#include <utility>  // std::make_pair

// Attempt 3: correct solution via shared pointer.
auto makeSharedCounter(int init = 0) {
  // Share value allocated on the heap.
  auto value = std::make_shared<int>(init);
  return std::make_pair(
    // First closure, increment.
    [value] {
      return ++*value;
    },
    // Second closure, decrement.
    [value] {
      return --*value;
    }
  );
}

...

auto c3 = makeSharedCounter(0);

c3.first(); // 1
c3.first(); // 2, works!
c3.second(); // 1, yes, works!

The actual pointer is still captured by value. However, the data it points to is heap-allocated, so it can be safely shared even after the stack frame of makeSharedCounter is gone.

Unfortunately (or fortunately?), C++ doesn’t have a built-in tracing garbage collector, as JavaScript does, so again we have to do it explicitly via the shared pointer (which is a form of direct Reference Counting GC).

The shared pointer in this case explicitly underlines that the data is shared (often “Explicit is better than implicit”, however implicit is often more convenient, as we have seen in the JavaScript example).
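The reference counting can be observed directly. The sketch below is illustrative only (sharedStateDemo is a made-up name): it checks std::shared_ptr::use_count() before and after the closures are created, then drops the local owner to show that the closures alone keep the integer alive.

```cpp
#include <cassert>
#include <memory>

// Hypothetical demo function; the name is illustrative only.
// Returns the counter value after inc, inc, dec.
int sharedStateDemo() {
  auto value = std::make_shared<int>(0);
  assert(value.use_count() == 1);  // only the local pointer owns the int

  // Each closure stores its own copy of the pointer (not of the int).
  auto inc = [value] { return ++*value; };
  auto dec = [value] { return --*value; };
  assert(value.use_count() == 3);  // local pointer + two closures

  // Drop the local owner; the closures keep the int alive.
  value.reset();

  inc();         // 1
  inc();         // 2
  return dec();  // 1: both closures operate on the same int
}
```

Once the last closure is destroyed, the count drops to zero and the integer is freed, with no tracing collector involved.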

Syntactic sugar for functors

What is a lambda in C++ anyway?

Well, in fact C++ had lambdas in all but name for years before the lambda syntax appeared. Pre-processing and source-to-source translation (transpiling) are used very actively in C++.

A lambda function is just syntactic sugar for a functor.

When we have code like this:

int x = 10;

auto fn = [x](int i) {
  return i + x;
};

C++ just transpiles it to a class with an overloaded call operator:

int x = 10;

class __lambda_7_14 {
  // Captured binding `x`.
  int x;

public:
  __lambda_7_14(int _x) : x{_x} {}

  // `const`, because the lambda is not marked `mutable`.
  inline int operator()(int i) const {
    return i + x;
  }
};

__lambda_7_14 fn = __lambda_7_14{x};

As we can see, the captures are stored as private data in this class. When capturing by reference, it just stores the references there. And now it is clear why each closure receives its own copy unless it captures by reference.
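For the by-reference case, a hand-written approximation might look as follows. This is a sketch, not actual compiler output: the class name LambdaByRef and the function byRefDemo are invented, and real compilers are free to represent reference captures differently.

```cpp
#include <cassert>

// Rough hand-written equivalent of `[&x](int i) { return i + x; }`.
// The class name is made up for illustration.
class LambdaByRef {
  // Captured by reference: the member is a reference, not a copy.
  int& x;

public:
  explicit LambdaByRef(int& _x) : x{_x} {}

  int operator()(int i) const {
    return i + x;
  }
};

// Hypothetical demo function comparing the functor with a real lambda.
int byRefDemo() {
  int x = 10;
  LambdaByRef fn{x};
  auto lambda = [&x](int i) { return i + x; };

  x = 20;  // the change is visible through both callables

  assert(fn(1) == lambda(1));
  return fn(1);  // 21
}
```

Because only a reference is stored, the functor sees later changes to x, and it becomes invalid as soon as x goes out of scope.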

gcc: Leaking captures

Some compilers though may leak the captured values as mangled public members. For example, gcc just prepends a double underscore to the captured name, and stores it in the public section:

int main() {
  auto fn = [x = 2]() {
    return x;
  };
  fn.__x = 4;
  return fn();
}

And the generated code:

main:
  movl $4, %eax
  ret

This though shouldn’t be a real problem: names containing a double underscore are reserved for the implementation in C++, so a compiler/transpiler can do whatever it wants with them. It is just an interesting observation of implementation details becoming observable in user code.

Again, this is observed only with some versions of gcc, and is not observed with clang.

Name mangling is a standard technique used by language implementations. For example, Python implements __private attributes by just renaming them: the name is prefixed with an underscore and the class name (_ClassName__private). Single-underscore _protected names are only a convention and are not renamed.

No capture for structured bindings (yet)

Another feature of C++ lambdas is that they can’t capture structured bindings.

The following code doesn’t compile:

auto [x, y] = std::make_tuple(1, 2);

// error: 'x' in capture list does
// not name a variable:
auto fn = [x] { return x; };

The problem here is that the C++17 standard explicitly states that lambdas can only capture variables, and structured bindings do not introduce “variables”.

A workaround, until this is fixed in the spec, is to use actual declared variables:

int x, y;
std::tie(x, y) = std::make_tuple(1, 2);

auto fn = [x] { return x; }; // OK!

The trunk version of gcc at the time of writing does compile the original version with structured bindings, but clang does not yet.
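Another workaround that keeps the structured binding is an init-capture, which declares a new closure member initialized from an arbitrary expression. The sketch below assumes a C++17 compiler; the function name captureBindingDemo is made up for illustration.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical demo function; the name is illustrative only.
int captureBindingDemo() {
  auto [x, y] = std::make_tuple(1, 2);

  // Init-capture: `x = x` creates a new closure member named `x`,
  // initialized from the structured binding `x` (same for `y`).
  auto fn = [x = x, y = y] { return x + y; };

  assert(fn() == 3);
  return fn();
}
```

The closure gets its own copies, so this behaves like capture by value; writing [&x = x] instead would store a reference.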

Compile-time evaluation

C++ compilers have pretty advanced optimizers, and can evaluate whole pieces of code at compile time, as long as the code is constexpr or the compiler can infer the result automatically.

Since lambdas are just functors under the hood, and C++ compilers can often reduce functors to a single result value, using lambdas can be pretty cheap at runtime, or even cost nothing at all.

There are classic examples of evaluating Fibonacci numbers using templates in C++. Here’s another example of a full compile-time evaluation using a lambda function:

#include <utility>  // std::exchange

int main() {
  auto fib = [i = 0, n = 1]() mutable {
    return (i = std::exchange(n, n + i));
  };
  fib(); // 1
  fib(); // 1
  fib(); // 2
  fib(); // 3
  fib(); // 5
  return fib(); // 8
}

And the generated code for this is just:

main:
  movl $8, %eax
  retq
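The example above relies on the optimizer. When compile-time evaluation has to be guaranteed rather than hoped for, a lambda can be used in a constant expression. This sketch assumes C++17, where a lambda's call operator is implicitly constexpr if it qualifies; the names add and sumTo are invented for illustration.

```cpp
// A lambda usable in constant expressions (C++17 and later).
constexpr auto add = [](int a, int b) { return a + b; };

// Checked by the compiler: no runtime code is needed for this.
static_assert(add(2, 3) == 5, "evaluated at compile time");

// A lambda used inside a constexpr function (hypothetical helper).
constexpr int sumTo(int n) {
  auto step = [](int acc, int i) { return acc + i; };
  int acc = 0;
  for (int i = 1; i <= n; ++i) {
    acc = step(acc, i);
  }
  return acc;
}

static_assert(sumTo(4) == 10, "1 + 2 + 3 + 4");
```

If either expression could not be evaluated at compile time, the static_assert would fail to compile instead of silently falling back to runtime.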

Despite these implementation specifics, lambda functions were a great addition to the language: they allow building more convenient and elegant code, which is the main purpose of syntactic sugar in programming languages in general.

I hope this small note brought something interesting and useful to you! As usual, I’ll be glad to answer any questions in the comments.
