
Move Semantics

In a managed language, “transferring ownership” of a list is invisible — you assign the variable to a new name, and the runtime quietly shares one underlying buffer. There’s no concept of “did this copy a megabyte or just rebind a pointer?” because the runtime hides the difference.

C++ doesn’t hide it. By default, std::vector<float> y = compute(x); copies — allocates a fresh heap buffer, memcpys every float across. For a million-element tensor that’s a ~4 ms tax on a single line. Across a thousand-tensor forward pass, those copies become the program.

Move semantics are the language feature that says: “I’m done with this — you can take its guts.” Instead of duplicating the buffer, the move just hands over the pointer and zeroes the source. Same observable result. About 10,000× faster.

This lesson is about the machinery: rvalue references (T&&), std::move, and why every modern C++ container is built on top of them.

TL;DR

  • A copy of a vector<float> of size N allocates a fresh heap buffer and memcpys N floats. A move swaps internal pointers and leaves the source empty. Same result; ~10000× faster.
  • C++11 added rvalue references (T&&) and std::move to express “I’m done with this; you can take its guts.” Modern C++ depends on them everywhere — every standard container, every smart pointer.
  • The rule of zero is the modern C++ design ideal: write classes that don’t need a destructor, copy constructor, or move constructor — let the compiler synthesize them from member types. PyTorch’s tensor classes follow this almost completely.
  • Return-value optimization (RVO) has made many naive returns “free” for years; move semantics fill in the gaps RVO can’t reach (e.g., returning one of two locals).
  • For ML systems code: heap allocations on a hot path are the enemy. Move semantics + small-buffer optimization + arena allocators are the toolkit for keeping tensors out of malloc.

The two kinds of expressions

C++ classifies every expression as either an lvalue or an rvalue:

  • lvalue: has a name you can take the address of. x, arr[3], *p. Persistent.
  • rvalue: a temporary, no name. 5, a + b, the value just returned by a function. About to disappear.

The distinction matters because rvalues are safe to steal from. They have no future user — no one will read them after this expression ends. So the language gives you a way to say “this one is movable”:

int x = 5;
int& l = x;      // OK — bind lvalue ref to lvalue
int&& r = 5;     // OK — bind rvalue ref to rvalue (the temporary 5)
int& bad = 5;    // ERROR — can't bind lvalue ref to rvalue
int&& bad2 = x;  // ERROR — x is an lvalue (it has a name)

T&& is the type that says “I want a temporary.” A function overloaded on T&& is the function that gets called when the argument is “about to disappear.”

Copy vs move, in pictures

A copy duplicates the buffer. A move steals the pointer.

In code:

void take(const std::vector<float>& v);  // copy version
void take(std::vector<float>&& v);       // move version

std::vector<float> a(1000);
take(a);             // calls copy version (a is an lvalue)
take(std::move(a));  // calls move version; a is now in a "moved-from" state

std::move(a) is just a cast — it produces an rvalue reference to a so the move overload is selected. The actual data movement happens inside the move constructor, not inside std::move itself.
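In fact, the cast fits in a few lines. A sketch of an equivalent (my_move is a hypothetical name; the standard version behaves the same way):

```cpp
#include <type_traits>

// Equivalent in spirit to std::move: strip any reference from T,
// then cast to an rvalue reference. No data moves here — it's a cast.
template <typename T>
typename std::remove_reference<T>::type&& my_move(T&& t) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(t);
}
```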

After std::move(a), a is valid but unspecified. Don’t read from it; destroying it, or assigning it a fresh value, is fine.
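A sketch of what “valid but unspecified” permits in practice (reading the old contents is out; destruction and assignment are always safe):

```cpp
#include <utility>
#include <vector>

bool moved_from_demo() {
    std::vector<float> a(1000, 1.0f);
    std::vector<float> b = std::move(a);  // b steals a's buffer

    // a is now valid but unspecified: don't read its elements.
    // Assigning a new value puts it back into a known state:
    a = std::vector<float>(10, 2.0f);     // a is fully usable again

    return b.size() == 1000 && a.size() == 10;
}
```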

The move constructor

A class with a heap buffer should provide both ways:

struct Buffer {
    float* data;
    size_t n;

    // Copy: allocate + memcpy. SLOW.
    Buffer(const Buffer& other) : n(other.n) {
        data = new float[n];
        std::memcpy(data, other.data, n * sizeof(float));
    }

    // Move: steal pointer. FAST.
    Buffer(Buffer&& other) noexcept : data(other.data), n(other.n) {
        other.data = nullptr;
        other.n = 0;
    }

    ~Buffer() { delete[] data; }
};

std::vector, std::unique_ptr, std::string, every well-behaved C++ container — all roughly this shape. The noexcept on the move constructor matters: without it, vector::resize falls back to copying for safety.
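You can check the property vector relies on with a type trait. A sketch with two illustrative types (names hypothetical):

```cpp
#include <type_traits>

struct NothrowMove {
    NothrowMove(NothrowMove&&) noexcept = default;  // vector will move these
};

struct MaybeThrowMove {
    MaybeThrowMove() = default;
    MaybeThrowMove(const MaybeThrowMove&) = default;
    MaybeThrowMove(MaybeThrowMove&&) {}  // not noexcept: resize copies instead
};

static_assert(std::is_nothrow_move_constructible<NothrowMove>::value,
              "vector::resize can safely move NothrowMove elements");
static_assert(!std::is_nothrow_move_constructible<MaybeThrowMove>::value,
              "vector::resize falls back to copying MaybeThrowMove elements");
```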

Rule of zero, three, five

  • Rule of zero: don’t define any of (destructor, copy ctor, move ctor, copy assignment, move assignment). Let the compiler synthesize them from your member types. This is the goal.
  • Rule of three: if you need one of (destructor, copy ctor, copy assignment), you probably need all three.
  • Rule of five: same with move ctor + move assignment added.

Modern C++ favors rule-of-zero design: have your class hold its resources via std::vector, std::unique_ptr, etc., and the compiler does the right thing. Hand-written destructors are a sign you should reach for one of those wrappers instead.
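For contrast with the hand-written Buffer above, a rule-of-zero sketch: holding the storage in a std::vector means the compiler synthesizes a correct destructor, copy, and move from the member.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Rule of zero: no destructor, copy, or move written by hand.
// The compiler-synthesized special members reuse std::vector's own
// copy (allocate + copy) and move (pointer steal).
struct ZeroBuffer {
    std::vector<float> data;
    explicit ZeroBuffer(std::size_t n) : data(n, 0.0f) {}
};

bool zero_buffer_demo() {
    ZeroBuffer a(1000);
    ZeroBuffer b = a;             // deep copy, synthesized
    ZeroBuffer c = std::move(a);  // O(1) move, synthesized
    return b.data.size() == 1000 && c.data.size() == 1000;
}
```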

RVO — when the compiler does even better

std::vector<float> make_big() {
    std::vector<float> v(1'000'000);
    // ... fill v ...
    return v;  // RVO: constructed directly in caller's frame, no copy
}

auto x = make_big();  // no allocation, no copy

The compiler is allowed (and, for returned temporaries, required since C++17) to elide the copy here. Even without std::move, returning a local by value is essentially free.

Don’t write return std::move(v) — it can defeat RVO and force a move where the compiler would have done zero copies. return v; is right.
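The “one of two locals” gap mentioned in the TL;DR, sketched: NRVO can’t apply because the compiler doesn’t know which local will be returned, but since C++11 the return statement implicitly moves the chosen local, so no std::move is needed there either.

```cpp
#include <vector>

// NRVO is impossible here: the compiler can't construct both a and b
// directly in the caller's frame. But each `return` implicitly treats
// the local as an rvalue, so the result is moved, not copied.
std::vector<float> pick(bool first) {
    std::vector<float> a(1000, 1.0f);
    std::vector<float> b(2000, 2.0f);
    if (first) return a;  // implicit move
    return b;             // implicit move
}
```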

Where it shows up in ML

Every PyTorch op returns a Tensor by value. Internally the tensor is a small struct (~64 bytes) wrapping a refcounted storage pointer. Move semantics let you write:

y = relu(x @ w + b)

and have it produce one final tensor with no intermediate-tensor copies. The C++ side moves intermediates as it goes; the Python side never knows.

When you call .contiguous() on a non-contiguous tensor, that does allocate. Most other ops operate via views or in-place. Knowing which ops allocate (and which don’t) is the Contiguous vs Non-Contiguous discipline.

Run it in your browser — the cost of a copy

Time list-copy vs list-swap in pure Python (no C++ compiler in the browser, but the proportion is the same).

The shape — copy scales with N, move stays O(1) — is exactly the C++ vector picture, just with Python’s GIL adding overhead.
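Since the in-browser runner is Python-only, here is a native sketch of the same measurement to run locally (exact timings vary by machine; the shape is what matters):

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

// Times copy vs move of a vector of n floats; returns {copy_ns, move_ns}.
std::pair<long long, long long> time_copy_vs_move(std::size_t n) {
    using clock = std::chrono::steady_clock;
    std::vector<float> src(n, 1.0f);

    auto t0 = clock::now();
    std::vector<float> copied = src;            // O(N): allocate + copy every float
    auto t1 = clock::now();
    std::vector<float> moved = std::move(src);  // O(1): steal the pointer
    auto t2 = clock::now();

    std::printf("N=%zu copied=%zu moved=%zu\n", n, copied.size(), moved.size());

    auto ns = [](clock::time_point a, clock::time_point b) {
        return static_cast<long long>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count());
    };
    return {ns(t0, t1), ns(t1, t2)};
}
```

At a million elements the copy is in the hundreds of microseconds while the move is tens of nanoseconds — the ~10,000× gap the TL;DR cites.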

Quick check

Fill in the blank
The cast that converts an lvalue into an rvalue (so a move constructor will be selected):
Two words including the namespace; canonical name.
Quick check
A function returns a `std::vector<float>` by value. Should you `return std::move(v)`?

Key takeaways

  1. Copies allocate; moves swap pointers. ~10000× speed gap on big buffers.
  2. T&& and std::move are the language machinery. Every standard container provides both copy and move.
  3. Rule of zero is the modern design ideal — let compiler-synthesized members do the work.
  4. Don’t return std::move(v) — it defeats copy elision.
  5. All modern ML systems code is built on this. PyTorch, every CUDA wrapper, every tensor library.

Go deeper

Prereq: Stack vs Heap. Move semantics are a way to avoid the heap traffic that a naive copy would cause.

