Move Semantics
In a managed language, “transferring ownership” of a list is invisible — you assign the variable to a new name, and the runtime quietly shares one underlying buffer. There’s no concept of “did this copy a megabyte or just rebind a pointer?” because the runtime hides the difference.
C++ doesn’t hide it. By default, std::vector<float> y = compute(x); copies — allocates a fresh heap buffer, memcpys every float across. For a million-element tensor that’s a ~4 ms tax on a single line. Across a thousand-tensor forward pass, those copies become the program.
Move semantics are the language feature that says: “I’m done with this — you can take its guts.” Instead of duplicating the buffer, the move just hands over the pointer and zeroes the source. Same observable result. About 10,000× faster.
This lesson is about the machinery: rvalue references (T&&), std::move, and why every modern C++ container is built on top of them.
TL;DR
- A copy of a vector<float> of size N allocates a fresh heap buffer and memcpys N floats. A move swaps internal pointers and leaves the source empty. Same result; ~10,000× faster.
- C++11 added rvalue references (T&&) and std::move to express “I’m done with this; you can take its guts.” Modern C++ depends on them everywhere — every standard container, every smart pointer.
- The rule of zero is the modern C++ design ideal: write classes that don’t need a destructor, copy constructor, or move constructor — let the compiler synthesize them from member types. PyTorch’s tensor classes follow this almost completely.
- Return-value optimization (RVO) has made many naive returns “free” for years; move semantics fill in the gaps RVO can’t reach (e.g., returning one of two locals).
- For ML systems code: heap allocations on a hot path are the enemy. Move semantics + small-buffer optimization + arena allocators are the toolkit for keeping tensors out of malloc.
The two kinds of expressions
C++ classifies every expression as either an lvalue or an rvalue:
- lvalue: has a name you can take the address of. x, arr[3], *p. Persistent.
- rvalue: a temporary, no name. 5, a + b, the value just returned by a function. About to disappear.
The distinction matters because rvalues are safe to steal from. They have no future user — no one will read them after this expression ends. So the language gives you a way to say “this one is movable”:
int x = 5;
int& l = x; // OK — bind lvalue ref to lvalue
int&& r = 5; // OK — bind rvalue ref to rvalue (the temporary 5)
int& bad = 5; // ERROR — can't bind lvalue ref to rvalue
int&& bad2 = x; // ERROR — x is an lvalue (it has a name)

T&& is the type that says “I want a temporary.” A function overloaded on T&& is the function that gets called when the argument is “about to disappear.”
Copy vs move, in pictures
A copy duplicates the buffer. A move steals the pointer.
In code:
void take(const std::vector<float>& v); // copy version
void take(std::vector<float>&& v); // move version
std::vector<float> a(1000);
take(a); // calls copy version (a is an lvalue)
take(std::move(a)); // calls move version; a is now in a "moved-from" state

std::move(a) is just a cast — it produces an rvalue reference to a so the move overload is selected. The actual data movement happens inside the move constructor, not inside std::move itself.
After std::move(a), a is valid but unspecified. Don’t read from it; do destroy it.
The move constructor
A class with a heap buffer should provide both ways:
struct Buffer {
float* data;
size_t n;
// Copy: allocate + memcpy. SLOW.
Buffer(const Buffer& other) : n(other.n) {
data = new float[n];
std::memcpy(data, other.data, n * sizeof(float));
}
// Move: steal pointer. FAST.
Buffer(Buffer&& other) noexcept : data(other.data), n(other.n) {
other.data = nullptr;
other.n = 0;
}
~Buffer() { delete[] data; }
};

std::vector, std::unique_ptr, std::string, every well-behaved C++ container — all roughly this shape. The noexcept on the move constructor matters: without it, vector::resize falls back to copying for safety.
Rule of zero, three, five
- Rule of zero: don’t define any of the five special members (destructor, copy ctor, move ctor, copy assignment, move assignment). Let the compiler synthesize them from your member types. This is the goal.
- Rule of three: if you need one of (destructor, copy ctor, copy assignment), you probably need all three.
- Rule of five: same with move ctor + move assignment added.
Modern C++ favors rule-of-zero design: have your class hold its resources via std::vector, std::unique_ptr, etc., and the compiler does the right thing. Hand-written destructors are a sign you should reach for one of those wrappers instead.
RVO — when the compiler does even better
std::vector<float> make_big() {
std::vector<float> v(1'000'000);
// ... fill v ...
return v; // RVO: constructed directly in caller's frame, no copy
}
auto x = make_big(); // no allocation, no copy

The compiler is allowed (and required, since C++17 for some cases) to elide the copy here. Even without std::move, returning a local by value is essentially free.
Don’t write return std::move(v) — it can defeat RVO and force a move where the compiler would have done zero copies. return v; is right.
Where it shows up in ML
Every PyTorch op returns a Tensor by value. Internally the tensor is a small struct (~64 bytes) wrapping a refcounted storage pointer. Move semantics let you write:
y = relu(x @ w + b)

and have it produce one final tensor with no intermediate-tensor copies. The C++ side moves intermediates as it goes; the Python side never knows.
When you call .contiguous() on a non-contiguous tensor, that does allocate. Most other ops operate via views or in-place. Knowing which ops allocate (and which don’t) is the Contiguous vs Non-Contiguous discipline.
The cost of a copy
The shape to look for: copy time scales with N, move time stays O(1). That’s exactly the C++ vector picture; reproducing it in Python just adds the GIL’s constant overhead on top.
Key takeaways
- Copies allocate; moves swap pointers. ~10000× speed gap on big buffers.
- T&& and std::move are the language machinery. Every standard container provides both copy and move.
- Rule of zero is the modern design ideal — let compiler-synthesized members do the work.
- Don’t return std::move(v) — it defeats copy elision.
- All modern ML systems code is built on this. PyTorch, every CUDA wrapper, every tensor library.
Go deeper
- Docs: cppreference — std::move. Canonical reference. The "Notes" section covers when std::move is and isn't the right call.
- Video: Howard Hinnant — Everything You Ever Wanted to Know About Move Semantics. The author of std::move and unique_ptr. Long but the canonical talk.
- Blog: Herb Sutter — GotW #93: Auto Variables. Best practical guide to when copies happen and when they're elided.
- Paper: A Proposal to Add Move Semantics to C++ (N1377). The original proposal. Useful for understanding the design space.
- Docs: isocpp FAQ — C++11 Language Features. Move semantics in context with the rest of C++11.
Prereq: Stack vs Heap. Move semantics are a way to avoid the heap traffic that a naive copy would cause.