3 The Formalism of Quantum Mechanics

In the previous section we made good headway in building some intuition for simple quantum systems by solving the Schrödinger equation and trying to interpret the results. But, at times, it felt like we were making up the rules as we went along. Our goal in this section is to bring some order to the table and describe the mathematical structure that underlies quantum mechanics.

There are four facets of quantum mechanics. These are

  • States

  • Observables

  • Time Evolution

  • Measurement

We’ve already said all there is to say about time evolution, at least for these lectures: the evolution of the wavefunction is governed by the time-dependent Schrödinger equation. Here we will elaborate on the other three.

3.1 States

Recall the basic tenet: a quantum state is described by a normalisable wavefunction ψ(𝐱,t), where normalisable means that it obeys

∫ d³x |ψ|² < ∞ (3.62)

Until now, it largely appears that quantum mechanics is, like many other areas of physics, all about solving differential equations. That’s not entirely wrong, but it misses a key fact: at heart, quantum mechanics is really a theory of linear algebra. This will become apparent as we unveil the bigger picture.

We’ve already got a sniff of this linear algebra aspect in the principle of superposition. If ψ(𝐱,t) and ϕ(𝐱,t) are both viable states of a system, then so too is any linear combination αψ(𝐱,t)+βϕ(𝐱,t) with α, β ∈ 𝐂. Mathematically, this is the statement that the states form a vector space over the complex numbers.

3.1.1 The Inner Product

There is one important structure on our vector space: an inner product. This is a map that takes two vectors and spits out a number.

We’re very used to inner products for finite dimensional vector spaces. Given two vectors in 𝐑ᴺ – call them u and v – the inner product is simply u·v. We introduce new notation and write the inner product on 𝐑ᴺ as

⟨u|v⟩ = u·v

The vertical line in the symbol on the left-hand side is a quantum mechanical affectation. In other areas of mathematics, the inner product is usually denoted as ⟨u,v⟩, or sometimes (u,v). The notation with the vertical line is part of a wider set of conventions used in quantum mechanics known as Dirac notation. For our present purposes, the vertical line will be the only sign that we’re early adopters of this notation.

A slight amendment to the inner product is needed if we’re dealing with a complex vector space like 𝐂N. In this case, the inner product is

⟨u|v⟩ = u*·v (3.63)

where we take the complex conjugation of the first vector before we take the dot product. This has the advantage that the inner product of any vector with itself is necessarily non-negative

⟨u|u⟩ = u*·u ≥ 0

This means that we get to use the inner product to define the length of a vector ‖u‖ = √⟨u|u⟩. Inner products with this property are said to be positive definite.

This now brings us to the infinite dimensional vector space of functions that we work with in quantum mechanics. The inner product between two complex functions ψ(𝐱) and ϕ(𝐱), each a function on 𝐑³, is defined by

⟨ψ|ϕ⟩ = ∫ d³x ψ*(𝐱) ϕ(𝐱) (3.64)

Note, again, the star in the first argument which ensures that the inner product of a function with itself is positive definite. In fact, our normalisability requirement (3.62) can be restated in terms of the inner product

⟨ψ|ψ⟩ < ∞

From the definition, we see that the inner product is linear in the second argument

⟨ψ|αϕ1 + ϕ2⟩ = α⟨ψ|ϕ1⟩ + ⟨ψ|ϕ2⟩

for any α ∈ 𝐂. However, it is anti-linear in the first argument, which simply means that the complex conjugation on ψ gives us

⟨αψ1 + ψ2|ϕ⟩ = α*⟨ψ1|ϕ⟩ + ⟨ψ2|ϕ⟩

Alternatively, this anti-linear statement follows by noticing that complex conjugation exchanges the two entries in the inner product

⟨ψ|ϕ⟩* = ⟨ϕ|ψ⟩

We’ll make use of some of these simple mathematical identities as we proceed.
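As an aside, these identities are easy to play with numerically. Below is a minimal sketch (our own illustration, not part of the lectures) checking positive-definiteness, linearity in the second argument, anti-linearity in the first, and conjugate symmetry for the inner product on 𝐂³, using NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner(u, v):
    # <u|v> = u* . v : np.vdot conjugates its first argument
    return np.vdot(u, v)

u = rng.normal(size=3) + 1j * rng.normal(size=3)
v = rng.normal(size=3) + 1j * rng.normal(size=3)
alpha = 2.0 - 1.0j

# positive definiteness: <u|u> is real and non-negative
assert abs(inner(u, u).imag) < 1e-12 and inner(u, u).real >= 0

# linear in the second argument, anti-linear in the first
assert np.isclose(inner(u, alpha * v), alpha * inner(u, v))
assert np.isclose(inner(alpha * u, v), np.conj(alpha) * inner(u, v))

# complex conjugation exchanges the two entries: <u|v>* = <v|u>
assert np.isclose(np.conj(inner(u, v)), inner(v, u))
```

The same four properties carry over verbatim to the function-space inner product (3.64), with the dot product replaced by an integral.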

In more fancy mathematical language, a vector space, whether finite or infinite dimensional, with a positive-definite inner product is called a Hilbert space. (In fact this statement is only almost true: there is an extra requirement in the case of an infinite dimensional vector space that the space is “complete” which, roughly speaking, means that as your vector evolves it can’t suddenly exit the space.)

Hilbert space is the stage for quantum mechanics. The first thing you should do when describing any quantum system is specify its Hilbert space. Indeed, you can already see this in the examples from the last section: if we’re dealing with a particle on a line or particle on a circle then states are specified by normalisable functions on the line or circle respectively.

Again some fancy mathematical language. If a particle is moving on some space M then the relevant Hilbert space is called ℋ = L²(M), which is the space of square-normalisable functions on M. Here the word “square” and the superscript 2 both reflect the fact that you should integrate |ψ|² to determine whether it’s normalisable. The Hilbert spaces in the last section were ℋ = L²(𝐑) and ℋ = L²(𝐒¹).

As our theories get more advanced and abstract, we need different Hilbert spaces. Sometimes these are simpler: for example, there is a lot of interesting physics lurking in the finite dimensional Hilbert space ℋ = 𝐂² where a state is just a two-dimensional complex vector. But sometimes the Hilbert spaces are vastly more complicated, like the space in quantum field theory where M itself is an infinite dimensional space of functions and L²(M) is something ghastly and poorly understood. In these lectures we’ll meet nothing more complicated than ℋ = L²(𝐑³), the space of normalisable functions on 𝐑³.

3.2 Operators and Observables

The quantum state is described by a wavefunction. But how does this wavefunction encode the information contained in the state?

It’s useful to first return to the classical world. There we know that the state of a particle is described by its position 𝐱 and velocity 𝐱˙. Equivalently, but more usefully for what’s to come, we can say that it’s encoded in the position 𝐱 and momentum 𝐩=m𝐱˙ of the particle.

Given the state, the questions that we can ask are all possible functions of 𝐱 and 𝐩. We call these observables. Among the more trivial observables are position and momentum themselves. But we can also use our knowledge of the state to calculate other observables such as angular momentum

𝐋=𝐱×𝐩

or energy

E = 𝐩²/2m + V(𝐱)

In general, any function f(𝐱,𝐩) can be considered a classical observable. Clearly all of them are simple to evaluate once you know the state 𝐱 and 𝐩.

What about in the quantum world? The state is now the wavefunction ψ(𝐱) which, abstractly, we think of as a vector in a Hilbert space. What is the observable associated to, say, momentum or angular momentum?

In quantum mechanics, all observables are represented by operators on the Hilbert space. In the present case, this is an object O^ that acts on the wavefunction ψ(𝐱) and gives you back a new function. Throughout these lectures we will dress any such operator with a hat. (As we mentioned previously, this convention tends to get dropped in later quantum courses.)

To be more precise, observables in quantum mechanics are related to a particular kind of linear operator. Here “linear” means that operators obey

O^[αψ1(𝐱)+ψ2(𝐱)]=αO^[ψ1(𝐱)]+O^[ψ2(𝐱)]

for any α ∈ 𝐂.

Returning to our analogy with finite dimensional vector spaces: if you’re dealing with N-dimensional vectors, then the corresponding linear operators are just N×N matrices. Now, however, we’re dealing with infinite dimensional vector spaces – or, more precisely, Hilbert spaces – where the vectors themselves are functions. What replaces the matrices? The answer, as we will now see, is differential operators.

We’ll start by giving some physical examples. As this section proceeds, we’ll see what it actually means to say that an observable is represented by an operator.

  • For a particle moving in 3d there are three position operators, one for each direction. Combining these into a vector, they are simply 𝐱^=𝐱. These operators act on a wavefunction by multiplication

    x^ψ(𝐱)=xψ(𝐱),y^ψ(𝐱)=yψ(𝐱),z^ψ(𝐱)=zψ(𝐱)

    You can also take any function of 𝐱 and turn it into an operator. This acts in the obvious way, with V(𝐱^)ψ(𝐱)=V(𝐱)ψ(𝐱).

  • The momentum operator for a particle moving in 3d is

𝐩^ = -iħ∇

    or, written out long hand,

p^x = -iħ ∂/∂x ,  p^y = -iħ ∂/∂y ,  p^z = -iħ ∂/∂z

    Again, these should be thought of in terms of their action on the wavefunctions so, for example,

    p^x ψ(𝐱) = -iħ ∂ψ/∂x
  • Other operators now follow from our specification of position and momentum. The angular momentum operator is shorthand for a collection of three such operators,

    𝐋^ = -iħ 𝐱 × ∇

    We’ll devote Section 4.1 to a detailed study of the properties of this operator. Finally the energy operator is

    H^ = -(ħ²/2m) ∇² + V(𝐱)

    The energy operator is the only one that gets a special name, with H^ for Hamiltonian rather than E^ for energy. This is because of the special role it plays in time evolution of the state through the Schrödinger equation.
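The action of these operators is easy to see concretely on a computer. The following sketch (our own illustration, not part of the lectures; units with ħ = m = 1) puts a wavefunction on a 1d grid: x^ acts by multiplication while p^ and H^ act by finite-difference differentiation. For the harmonic oscillator potential V = x²/2, the Gaussian is the ground state and H^ψ ≈ ½ψ.

```python
import numpy as np

hbar, m = 1.0, 1.0                       # units with hbar = m = 1
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
psi = np.exp(-x**2 / 2) / np.pi**0.25    # normalised Gaussian

x_psi = x * psi                          # x-hat: multiply by x
p_psi = -1j * hbar * np.gradient(psi, dx)  # p-hat: -i hbar d/dx

# H-hat for the harmonic oscillator, V = x^2/2
d2psi = np.gradient(np.gradient(psi, dx), dx)
H_psi = -hbar**2 / (2 * m) * d2psi + 0.5 * x**2 * psi

# the Gaussian is the ground state: H psi ≈ (1/2) psi away from the grid edges
ratio = H_psi[800:1200] / psi[800:1200]
assert np.allclose(ratio, 0.5, atol=1e-3)
```

The finite-difference derivative is only an approximation to the true operator, which is why the check is restricted to the middle of the grid.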

3.2.1 Eigenfunctions and Eigenvalues

Given an N×N matrix A, it is often useful to determine its eigenvalues. Recall that these are a collection of (up to) N numbers λ which solve the equation

Au = λu   for some vector u

The corresponding vector u is then called the eigenvector.

We can play the same game with operators. Given an operator O^, its eigenvalues solve the equation

O^ψ(𝐱) = λψ(𝐱)   for some function ψ(𝐱)

The corresponding function ψ(𝐱) is called the eigenfunction or eigenstate. The collection of all eigenvalues is called the spectrum of the operator O^.

This has an important interpretation in physics. If you measure the value of a physical observable O^ then the answer that you get will be one of the eigenvalues of O^. This is the key idea that relates operators to observables and deserves a special place in a box:

The outcome of any measurement of O^ lies in the spectrum of O^.

We’ve already met this idea in the previous section. The time independent Schrödinger equation is simply the eigenvalue equation for the Hamiltonian

H^ψ=Eψ

The spectrum of the Hamiltonian determines the possible energy levels of the quantum system. In most cases this spectrum is discrete, like the harmonic oscillator. In other simple, but mildly pathological cases, the spectrum is continuous, like a particle on a line.

All physical observables have an associated eigenvalue equation. For the momentum operator this is

𝐩^ψ = 𝐩ψ   ⟺   -iħ∇ψ = 𝐩ψ

Eigenvalue equations in quantum mechanics often look like the first equation above where you have to squint to notice the hat on the left-hand side which elevates the statement above a tautology. (As you can imagine, this has the potential to get confusing when the hat is dropped in subsequent lectures!)

The momentum eigenvalue equation can easily be solved. The (unnormalised) momentum eigenstates are

ψ = e^{i𝐤·𝐱}   ⟹   𝐩^ψ = ħ𝐤 ψ

These are called plane wave states on account of the fact that the wave travels in the 𝐤 direction and the function is uniform in the plane perpendicular to 𝐤. The associated momentum eigenvalue is 𝐩 = ħ𝐤. As we mentioned previously, this is the grown-up version of de Broglie’s relationship between momentum and wavelength.
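As a quick numerical sanity check (ours, in 1d with ħ = 1, not part of the lectures), one can verify on a grid that a plane wave is indeed an approximate eigenstate of a finite-difference momentum operator, with eigenvalue ħk:

```python
import numpy as np

hbar = 1.0
k = 2.5
x = np.linspace(0, 10, 4001)
dx = x[1] - x[0]
psi = np.exp(1j * k * x)                   # plane wave e^{ikx}

p_psi = -1j * hbar * np.gradient(psi, dx)  # p-hat acting on psi

# away from the endpoints, p psi ≈ hbar k psi
assert np.allclose(p_psi[10:-10], hbar * k * psi[10:-10], atol=1e-3)
```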

As we saw in Section 2, there’s something a little bit fiddly about momentum eigenstates. Everything is nice if we put the particle on a finite volume space, say a torus 𝐓³. In fancy language, we take the Hilbert space to be L²(𝐓³). This means that each spatial direction is periodic. We’ll take the sizes of the three dimensions to be Li, with i=1,2,3, meaning that we identify

xi ≡ xi + Li   i = 1, 2, 3

The requirement that the wavefunctions are single valued functions then tells us that

e^{i ki Li} = 1   (no sum over i)

which is satisfied provided that the wavenumbers only take certain discrete values

ki = 2πn/Li   ⟹   pi = 2πħn/Li   with n ∈ 𝐙

So when the particle lives on a torus, the spectrum of the momentum operator is discrete. Moreover, in this case there’s no difficulty in normalising the momentum eigenstates e^{i𝐤·𝐱}.

In contrast, if the particle lives in an infinite space, so the Hilbert space is L²(𝐑³), then the spectrum of the momentum operator is continuous. But the corresponding eigenstates e^{i𝐤·𝐱} do not, strictly speaking, live in the Hilbert space L²(𝐑³) because they’re not L²! This is the same issue that we saw for a one-dimensional particle. It’s a slightly annoying technical feature of quantum mechanics.

Finally, what about the position operator? To avoid any unnecessary confusion, I’ll briefly revert to a one-dimensional system. A position eigenstate must then obey

x^ψ(x)=xψ(x)=Xψ(x) (3.65)

where X ∈ 𝐑 is the eigenvalue. This is a strange looking equation. Clearly we must have ψ(x) = 0 whenever x ≠ X, but the equation does leave open the possibility that ψ(X) ≠ 0. There’s no function with this property, but there is a generalised function that does the job. This is the Dirac delta function δ(x-X), roughly defined as

δ(x-X) = { ∞   if x = X
         { 0   otherwise

but with the additional property that it integrates to something nice:

∫_{-∞}^{+∞} dx δ(x-X) = 1

To check that ψ(x)=δ(x-X) does indeed solve (3.65), we can integrate it against an arbitrary function f(x). We want to check that the following holds

∫_{-∞}^{+∞} dx x δ(x-X) f(x) =? ∫_{-∞}^{+∞} dx X δ(x-X) f(x)

for all f(x). But this is true: both sides evaluate to Xf(X).

We see that the spectrum of the position operator is continuous, with the eigenvalue any X ∈ 𝐑. But, as with the momentum operator, the eigenfunctions are not normalisable:

∫_{-∞}^{+∞} dx |ψ|² = ∫_{-∞}^{+∞} dx |δ(x-X)|² = δ(0) = ∞

This time it doesn’t help if you put your system on a periodic space: the position eigenstates remain non-normalisable. The only get-out is to consider space itself to be a discrete lattice. This is an interesting direction and will be discussed in the lectures on Solid State Physics. However, just like the plane wave momentum eigenstates, the position eigenstates are too useful to throw away completely.

There is a straightforward generalisation of the story above to position eigenstates in 3d. Now we have

𝐱^ψ(𝐱) = 𝐗ψ(𝐱)   ⟹   ψ(𝐱) = δ³(𝐱-𝐗) = δ(x-X)δ(y-Y)δ(z-Z)

Again, these states are non-normalisable.

3.2.2 Hermitian Operators

Not any old linear operator qualifies as a physical observable in quantum mechanics. We should restrict attention to those operators that are Hermitian.

To explain this, we first introduce the concept of an adjoint operator. Given an operator O^, its adjoint O^† is defined by the requirement that the following inner-product relation holds

⟨ψ|O^ϕ⟩ = ⟨O^†ψ|ϕ⟩

for all states ψ and ϕ. Written in terms of integrals of wavefunctions, this means

∫ d³x ψ* O^ϕ = ∫ d³x (O^†ψ)* ϕ

An operator is said to be Hermitian or self-adjoint if

O^† = O^

(In fact there’s a tiny bit of a lie in that previous statement. The terms “Hermitian” and “self-adjoint” are only synonymous for finite dimensional Hilbert spaces. When we’re dealing with infinite dimensional Hilbert spaces of functions there is a subtle distinction between them that we’re not going to get into in these lectures.)

All physical observables correspond to Hermitian operators. Let’s check that this is the case for the operators that we just met. First the position operator. We have

⟨ψ|𝐱^ϕ⟩ = ∫ d³x ψ* 𝐱 ϕ = ∫ d³x (𝐱ψ)* ϕ = ⟨𝐱^ψ|ϕ⟩

So we learn that 𝐱^† = 𝐱^ as promised. Here the calculation was trivial: it is just a matter of running through definitions. The calculation for the momentum operator is marginally less trivial. We have

⟨ψ|𝐩^ϕ⟩ = ∫ d³x ψ* (-iħ∇ϕ)
        = ∫ d³x iħ(∇ψ*) ϕ
        = ∫ d³x (-iħ∇ψ)* ϕ = ⟨𝐩^ψ|ϕ⟩

In the second line we integrated by parts, picking up one minus sign, and in the third line we took the factor of i inside the brackets with the complex conjugation giving us a second minus sign. The two minus signs cancel out to give 𝐩^† = 𝐩^. This is one way to see why the momentum operator necessarily comes with a factor of i.

Note that, when we integrated by parts, we threw away the total derivative. This is justified on the grounds that the states ψ and ϕ are normalisable and so asymptote to zero at infinity.

The Hermiticity of the Hamiltonian follows immediately because it’s a real function of x^ and p^. Alternatively, you can check explicitly that the Hamiltonian is Hermitian by doing the kind of integration-by-parts calculation that we saw above.
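A finite stand-in for this calculation (our own sketch, not part of the lectures) is to discretise p^ = -iħ d/dx on a periodic grid as a matrix. The central-difference derivative matrix is real and antisymmetric, so multiplying by -iħ gives a matrix that is Hermitian, with real eigenvalues, just as the integration-by-parts argument promises.

```python
import numpy as np

N = 50
hbar, dx = 1.0, 0.1

# central-difference derivative with periodic boundary conditions
D = np.diag(np.ones(N - 1), 1) - np.diag(np.ones(N - 1), -1)
D[0, -1], D[-1, 0] = -1.0, 1.0
D /= 2 * dx

P = -1j * hbar * D                  # the discretised momentum operator
assert np.allclose(P, P.conj().T)   # Hermitian: P = P-dagger

# eigenvalues of a Hermitian matrix are real
evals = np.linalg.eigvals(P)
assert np.allclose(evals.imag, 0, atol=1e-10)
```

The periodic boundary conditions play the role of the boundary term that we threw away when integrating by parts.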

Properties of Hermitian Matrices

Hermitian operators have a number of properties that make them the right candidates for physical observables. To explain these, we’re first going to regress to the simpler world of N×N complex matrices acting on N-dimensional complex vectors.

A complex matrix A is said to be Hermitian if it obeys A† := (A*)ᵀ = A. This means that the components of the matrix obey Aij = A*ji. To see that this coincides with our previous definition, recall that the inner product for two complex vectors is (3.63)

⟨u|v⟩ = u*·v

For a Hermitian matrix A we have

⟨u|Av⟩ = u*·Av = u*i Aij vj = u*i A*ji vj = (Au)*·v = ⟨Au|v⟩

in agreement with our earlier, more general definition.

The eigenvalues and eigenvectors of a Hermitian matrix A are determined by the equation

Aun=λnun

where n = 1, …, N labels the eigenvalues and eigenvectors. These have a number of special properties. The first two are:

  • The eigenvalues λn are real.

  • If two eigenvalues are distinct, λn ≠ λm, then their corresponding eigenvectors are orthogonal: u*n·um = 0.

Both are simple to prove. We take the inner product of A between two different eigenvectors.

⟨un|Aum⟩ = ⟨Aun|um⟩   ⟹   λm u*n·um = λ*n u*n·um
 ⟹   (λ*n - λm) u*n·um = 0

Set n = m to find λ*n = λn so that the eigenvalue is real. Alternatively, pick distinct eigenvalues λn ≠ λm to show that the eigenvectors are orthogonal, u*n·um = 0 if m ≠ n.

If the matrix has two or more eigenvalues that coincide, then it’s still possible to pick the eigenvectors to be orthogonal. We won’t prove this statement, but it follows straightforwardly by just picking an orthogonal basis for the space spanned by the eigenvectors.

There is one final property of Hermitian matrices that we won’t prove: they always have N eigenvectors. This is important. Because the eigenvectors can be taken to be orthogonal, they necessarily span the space 𝐂N, meaning that any vector can be expanded in terms of eigenvectors

v = Σ_{n=1}^{N} an un

for some complex coefficients an.
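All of these properties can be seen at a glance with NumPy. The sketch below (ours, not from the lectures) builds a random Hermitian matrix and checks: real eigenvalues, orthonormal eigenvectors, and the expansion of an arbitrary vector in the eigenbasis with coefficients an = ⟨un|v⟩.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
B = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
A = B + B.conj().T                    # make a Hermitian matrix: A = A-dagger

evals, U = np.linalg.eigh(A)          # columns of U are the eigenvectors u_n
assert np.allclose(evals.imag, 0)     # eigenvalues are real
assert np.allclose(U.conj().T @ U, np.eye(N))   # eigenvectors are orthonormal

# expand an arbitrary vector: v = sum_n a_n u_n with a_n = <u_n|v>
v = rng.normal(size=N) + 1j * rng.normal(size=N)
a = U.conj().T @ v
assert np.allclose(U @ a, v)
```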

With these results in mind, let’s now return to Hermitian operators that act on infinite dimensional Hilbert spaces.

Properties of Hermitian Operators

The eigenvalue equation for a Hermitian operator is

O^ϕn=λnϕn

where λn is the eigenvalue and we’ve shifted notation slightly from our earlier discussion and will refer to eigenfunctions as ϕn. Now the label runs over n = 1, …, ∞, reflecting the fact that we’ve got an infinite dimensional Hilbert space.

For a Hermitian operator O^, the eigenvalues and eigenfunctions have the following two properties:

  • The eigenvalues λn are real.

  • If two eigenvalues are distinct, λn ≠ λm, then their corresponding eigenfunctions are orthogonal: ⟨ϕn|ϕm⟩ = 0.

Recall our important boxed statement from earlier: the outcome of any measurement of O^ is given by one of the eigenvalues of O^. The first of the properties above ensures that, happily, the result of any measurement is guaranteed to be a real number. That’s a good thing. If you go into a lab you’ll see dials and needles and rulers and other complicated things. All of them measure real numbers, never complex.

The proof of the two properties above is identical to the proof for finite matrices. If O^ is Hermitian, we have

⟨ϕn|O^ϕm⟩ = ⟨O^ϕn|ϕm⟩   ⟹   λm⟨ϕn|ϕm⟩ = λ*n⟨ϕn|ϕm⟩
 ⟹   (λ*n - λm)⟨ϕn|ϕm⟩ = 0

Again, set n = m to find λ*n = λn so that the eigenvalue is real. Alternatively, pick distinct eigenvalues λn ≠ λm to show that the eigenfunctions are orthogonal, which now means

⟨ϕn|ϕm⟩ = ∫ d³x ϕ*n ϕm = 0   if m ≠ n

An operator with distinct eigenvalues is said to have a non-degenerate spectrum. If, on the other hand, two or more eigenvalues coincide then we say it has a degenerate spectrum. In this case, as with matrices, it’s still possible to pick orthogonal eigenfunctions. Again, we won’t prove this. However, it means that in all cases, if we normalise the eigenstates then we can always take them to obey

⟨ϕn|ϕm⟩ = ∫ d³x ϕ*n ϕm = δmn (3.66)

Eigenfunctions that obey this are said to be orthonormal.

Finally, one last important statement that we make without proof: the eigenfunctions of any Hermitian operator are complete. This means that we can expand any wavefunction ψ(𝐱) in terms of eigenstates of a given operator O^

ψ(𝐱) = Σ_{n=1}^{∞} an ϕn(𝐱) (3.67)

for some complex coefficients an. If you know both ψ and the normalised eigenfunctions ϕn, then it’s simple to get an expression for these coefficients: we use the orthonormality relation (3.66), together with linearity of the inner product, to get

⟨ϕn|ψ⟩ = Σm am ⟨ϕn|ϕm⟩ = Σm am δmn = an

Or, in terms of integrals,

an = ∫ d³x ϕ*n(𝐱) ψ(𝐱)

Relatedly, the norm of a wavefunction ψ(𝐱) is

‖ψ(𝐱)‖² := ∫ d³x |ψ(𝐱)|² = Σm,n ∫ d³x a*n ϕ*n am ϕm = Σn |an|²

where we have again used the orthonormality condition (3.66). We see that a wavefunction is normalised only if Σn |an|² = 1.
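These expansion formulae can be checked numerically. The sketch below (our choice of example, not from the lectures) uses the infinite-well eigenfunctions √2 sin(nπx) on [0,1], which are orthonormal, and verifies that the coefficients an = ⟨ϕn|ψ⟩ recover a known superposition and that Σn |an|² = ‖ψ‖².

```python
import numpy as np

x = np.linspace(0, 1, 20001)
dx = x[1] - x[0]

def phi(n):
    # infinite-well eigenfunctions, orthonormal on [0, 1]
    return np.sqrt(2) * np.sin(n * np.pi * x)

def inner(f, g):
    # <f|g> = integral of f* g, approximated on the grid
    return np.sum(np.conj(f) * g) * dx

# orthonormality, eq. (3.66)
assert np.isclose(inner(phi(1), phi(1)), 1.0, atol=1e-4)
assert np.isclose(inner(phi(1), phi(2)), 0.0, atol=1e-4)

# expand a normalised superposition and recover its coefficients
psi = 0.6 * phi(1) + 0.8 * phi(3)
a = np.array([inner(phi(n), psi) for n in range(1, 6)])
assert np.allclose(a, [0.6, 0, 0.8, 0, 0], atol=1e-4)

# Parseval: sum |a_n|^2 equals the norm-squared of psi
assert np.isclose(np.sum(np.abs(a)**2), inner(psi, psi), atol=1e-4)
```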

3.2.3 Momentum Eigenstates and the Fourier Transform

The idea that any function can be expressed as a sum of eigenstates of a Hermitian operator is not totally unfamiliar. When the operator in question is the momentum operator p^, this coincides with the Fourier transform of a function.

To keep things simple, we’ll work in one dimension. We’ll start by considering a particle on a circle of radius R, in which case the momentum eigenstates solve

p^ϕn(x) = -iħ dϕn/dx = pn ϕn(x)

As we saw above, the eigenstates and eigenvalues are

ϕn = (1/√(2πR)) e^{inx/R}   and   pn = ħn/R ,   n ∈ 𝐙

Notice that we’ve now normalised the wavefunction. You can check that the eigenstates are orthonormal, obeying

⟨ϕn|ϕm⟩ = (1/2πR) ∫₀^{2πR} dx e^{i(m-n)x/R} = δmn

Any periodic complex function on the circle, such that ψ(x)=ψ(x+2πR), can be expanded in terms of these eigenstates, meaning that

ψ(x) = (1/√(2πR)) Σ_{n∈𝐙} an e^{inx/R}

where, in contrast to our previous expression (3.67), we now label the eigenstates by the integers n ∈ 𝐙 rather than just the positive integers. (The two are equivalent…welcome to Hilbert’s hotel.) We have

an = ∫₀^{2πR} dx ϕ*n ψ = (1/√(2πR)) ∫₀^{2πR} dx e^{-inx/R} ψ(x)

In this context, an are known as Fourier coefficients.

We can also think about what happens to the particle on the line. Now the eigenstates are labelled by a continuous variable k ∈ 𝐑 rather than a discrete variable,

ϕk(x) = (1/√(2π)) e^{ikx}

As we’ve seen several times, these states are non-normalisable. Nonetheless, there is a version of the orthogonality relation (3.66), which is now

⟨ϕk|ϕk′⟩ = (1/2π) ∫ dx e^{i(k′-k)x} = δ(k-k′) (3.68)

where δ(k) is the Dirac delta function. Similarly, we can expand a function ψ(x) in these eigenstates, with the sum over states in (3.67) now replaced by an integral

ψ(x) = (1/√(2π)) ∫_{-∞}^{+∞} dk ψ~(k) e^{ikx} (3.69)

The Fourier coefficients an are now promoted to a full function ψ~(k). We can use the orthogonality condition (3.68) to invert the expression above and write ψ~(k) in terms of the original function ψ(x),

ψ~(k) = (1/√(2π)) ∫_{-∞}^{+∞} dx ψ(x) e^{-ikx} (3.70)

The relations (3.69) and (3.70) are, as advertised, the Fourier transform of a function.
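In numerical work, the discrete stand-in for the pair (3.69)–(3.70) is the fast Fourier transform. The sketch below (ours, not part of the lectures; NumPy’s unnormalised FFT convention) transforms a Gaussian, inverts, and checks the discrete version of Plancherel’s theorem, i.e. that the norm agrees in position and momentum space.

```python
import numpy as np

N = 1024
x = np.linspace(-20, 20, N, endpoint=False)
psi = np.exp(-x**2 / 2)                # a Gaussian wavepacket

psi_tilde = np.fft.fft(psi)            # discrete analogue of (3.70)
psi_back = np.fft.ifft(psi_tilde)      # discrete analogue of (3.69)
assert np.allclose(psi_back.real, psi, atol=1e-12)

# Plancherel: sum |psi|^2 = (1/N) sum |psi_tilde|^2 in numpy's convention
assert np.isclose(np.sum(np.abs(psi)**2), np.sum(np.abs(psi_tilde)**2) / N)
```

The factor of 1/N is an artefact of NumPy’s normalisation convention for `fft`; the continuum formulae (3.69)–(3.70) split the 2π symmetrically instead.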

3.3 Measurement

Now we turn to the final facet of quantum mechanics: measurement. Suppose that we have a quantum system in a given state ψ(𝐱). We want to measure an observable that is associated to an operator O^ whose eigenvalue equation is

O^ϕn=λnϕn

We stated before that the possible result of any measurement is taken from the spectrum of eigenvalues {λn}. But how does this result depend on the state ψ?

Here is the answer. We first decompose the state ψ in terms of orthonormal eigenstates of O^,

ψ(𝐱) = Σn an ϕn(𝐱)

We will take both ϕn and the wavefunction ψ to be normalised, which means that Σn |an|² = 1. Assuming that the spectrum is non-degenerate, the probability that the result of the measurement gives the answer λn is given by the Born rule,

Prob(λn)=|an|2 (3.71)

A normalised wavefunction ensures that the sum of probabilities is Σn Prob(λn) = 1.

We make a measurement and get the result λn for some particular n. Now there’s no doubt about the value of the observable: we’ve just measured it and know that it’s λn. To reflect this, there is a “collapse of the wavefunction” which jumps to

ψ(𝐱)ϕn(𝐱) (3.72)

This ensures that if we make a second measurement of O^ immediately after the first then we get the same answer: λn.
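The Born rule (3.71) is a statement about statistics over repeated experiments, and it is easy to simulate. The sketch below (our illustration; the two eigenvalues are hypothetical numbers chosen for the example) samples measurement outcomes with probabilities |an|² and checks that the observed frequencies converge.

```python
import numpy as np

rng = np.random.default_rng(42)

a = np.array([0.6, 0.8j])            # state a_1 phi_1 + a_2 phi_2, normalised
probs = np.abs(a)**2                 # Born rule: Prob(lambda_n) = |a_n|^2
assert np.isclose(probs.sum(), 1.0)

eigenvalues = np.array([1.5, -0.5])  # a hypothetical spectrum {lambda_n}
outcomes = rng.choice(eigenvalues, size=100_000, p=probs)

# relative frequencies approach |a_1|^2 = 0.36 and |a_2|^2 = 0.64
freq = np.mean(outcomes == 1.5)
assert abs(freq - 0.36) < 0.01
```

Each individual sample also models the collapse (3.72): after the draw, the state is the eigenstate matching the outcome.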

The discontinuous jump of the wavefunction (3.72) is very different in character from the smooth evolution under the guidance of the time dependent Schrödinger equation. In particular, the collapse of the wavefunction is not even a linear operation: if you add two wavefunctions together, you first have to normalise the sum to compute the probability and that’s a simple, but non-linear calculation.

Given that measurement of quantum states is performed by stuff that’s made out of atoms and therefore should itself be governed by the Schrödinger equation, it seems like there should be a more unified description of how states evolve. We’ll make a few further comments about this in Section 3.5.

There’s a simple generalisation of the story above when the spectrum is degenerate. In that case, the probability that we get a given result λ is

Prob(λ) = Σ_{n: λn=λ} |an|² (3.73)

After the measurement, the wavefunction collapses to ψ → C Σ_{n: λn=λ} an ϕn where C is the appropriate normalisation factor.

First One Thing, Then The Other

Let’s look at one obvious implication of the measurement procedure described above. Suppose that we measure a (non-degenerate) observable O^ and get a definite result, say λ6. Immediately after the measurement the system sits in the state

ψ(𝐱)=ϕ6(𝐱)

We already noted that if we again measure O^ then there’s no doubt about the outcome: we get the result λ6 again for sure. The general statement is that the system has a definite value for some observable only if it sits in an eigenstate of that observable.

Now we measure a different observable, M^. In general, this will have a different set of eigenstates,

M^χn(𝐱)=μnχn(𝐱)

and a different set of outcomes {μn}. There’s no reason that the eigenstates of M^ should have anything to do with the eigenstates of O^: they will typically be entirely different functions. We now have to play the game all over again: we take our state, which we know is ϕ6, and expand

ϕ6(𝐱) = Σm bm χm(𝐱)

There is no certainty about what we get when we measure M^: the result μm appears with probability Prob(μm)=|bm|2 and the wavefunction will immediately collapse to the corresponding eigenstate χm(𝐱). Suppose that we do the experiment and find the result μ17. Now the wavefunction collapses again, to

ϕ6(𝐱)χ17(𝐱)

Now comes the rub. We are capricious and decide to go back and measure the original observable O^. It wasn’t so long ago that we measured it and found the result λ6. But now, having measured M^, there’s no guarantee that we’ll get λ6 again! Instead, we need to go through the same process and expand our wavefunction

χ17(𝐱) = Σn cn ϕn(𝐱)

The probability that we get the result λ6 again is |c6|2. But now it’s just one among many options. There’s no reason to have |c6|=1.

There are different words that we can drape around this. First, it’s clear that the measurement of a quantum system is never innocent. You can’t just take a small peek and walk away pretending that you’ve not done anything. Instead, any measurement of a quantum system necessarily disturbs the state.

Moreover, if the state originally had a specific value for one observable O^ and you measure a different observable M^ then you destroy the original property of the state. It’s tempting to say that there’s no way to know the values of both O^ and M^ at the same time. But that’s not what the mathematics is telling us. Instead, the quantum particle can’t have specific values of O^ and M^ at the same time. Indeed, most of the time it doesn’t have a specific value for either since its wavefunction is a superposition of eigenstates. But, provided the eigenstates of the two operators don’t coincide, if the quantum state takes a definite value for one observable, then it cannot for the other. We’ll quantify this idea in Section 3.4 where we introduce the Heisenberg uncertainty relations.
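The whole story plays out already in a two-state toy model (our own illustration, not from the lectures): two Hermitian 2×2 matrices with different eigenbases. Starting in a definite eigenstate of one observable, expanding in the other’s eigenbasis gives a genuine spread of outcomes, and after collapse the original observable is no longer certain.

```python
import numpy as np

O = np.array([[1.0, 0.0], [0.0, -1.0]])   # eigenstates along the axes
M = np.array([[0.0, 1.0], [1.0, 0.0]])    # eigenstates along the diagonals

_, O_vecs = np.linalg.eigh(O)
_, M_vecs = np.linalg.eigh(M)

phi = O_vecs[:, 1]                 # start in a definite O-eigenstate
b = M_vecs.conj().T @ phi          # expand in M-eigenbasis: b_m = <chi_m|phi>

# both M-outcomes are equally likely: no definite M-value
assert np.allclose(np.abs(b)**2, [0.5, 0.5])

# after measuring M and collapsing to chi_0, the O-value is uncertain again
chi = M_vecs[:, 0]
c = O_vecs.conj().T @ chi
assert np.allclose(np.abs(c)**2, [0.5, 0.5])
```

Note that O and M here do not commute, which is exactly the situation formalised in Section 3.4.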

The lectures on Topics in Quantum Mechanics have a section on the Foundations of Quantum Mechanics where ideas of measurement are discussed in more detail.

3.3.1 Expectation Values

Take a normalised state ψ(𝐱) and measure the observable O^, with eigenstates and eigenvalues given by

O^ϕn=λnϕn

As we’ve seen, if we expand the original state in orthonormal eigenfunctions

ψ(𝐱) = Σn an ϕn(𝐱)

then the probability that we measure λn is |an|2 (assuming a non-degenerate spectrum). This means that if we have many systems, all in the same state ψ, and perform the same measurement on each then the results will differ but the average will be

⟨O^⟩ψ = Σn |an|² λn

Note the little subscript ψ on the angular brackets, reminding us that the average value depends on the state of the system. We call this average the expectation value. It has a nice expression in terms of the inner product.
 
Claim: For a normalised wavefunction ψ(𝐱), the expectation value can be written as

⟨O^⟩ψ = ⟨ψ|O^ψ⟩ = ∫ d³x ψ*(𝐱) O^ψ(𝐱) (3.74)

If ψ(𝐱) is un-normalised, this should be replaced by

⟨O^⟩ψ = ⟨ψ|O^ψ⟩ / ⟨ψ|ψ⟩

which follows from (3.74) by first normalising ψ.

Proof: We don’t need to do anything clever: just follow our previous definition. It’s simplest to start with a normalised wavefunction and work backwards:

∫ d³x ψ*(𝐱) O^ψ(𝐱) = Σn,m ∫ d³x a*n am ϕ*n(𝐱) O^ϕm(𝐱)
                  = Σn,m ∫ d³x a*n am λm ϕ*n(𝐱) ϕm(𝐱)
                  = Σn,m a*n am λm δnm = Σn |an|² λn

where, in the last step, we used the orthonormality of the eigenstates (3.66).

We can look at some simple examples to check that this coincides with our earlier results. The expectation values of the position operator is

⟨𝐱^⟩ψ = ∫ d³x 𝐱 |ψ(𝐱)|² (3.75)

This is telling us that |ψ(𝐱)|2 must have the interpretation of the probability distribution over position. This, of course, was how we first motivated the wavefunction in Section 1.1.

The expectation value of the momentum operator is

⟨𝐩^⟩ψ = -iħ ∫ d³x ψ*(𝐱) ∇ψ(𝐱)

We already met the 1d version of this in Section 2.1.4 when looking at Gaussian wavepackets. It’s fruitful to think about the wavefunction written in terms of momentum eigenstates. Recall that this is just the Fourier transform (3.69)

ψ(𝐱) = (1/2π)^{3/2} ∫ d³k ψ~(𝐤) e^{i𝐤·𝐱}

Substituting this into our expression for ⟨𝐩^⟩ψ above we find

⟨𝐩^⟩ψ = 1/(2π)³ ∫ d³k d³k′ ħ𝐤 ψ~*(𝐤′) ψ~(𝐤) ∫ d³x e^{-i(𝐤′-𝐤)·𝐱}
      = ∫ d³k d³k′ ħ𝐤 ψ~*(𝐤′) ψ~(𝐤) δ³(𝐤′-𝐤)
      = ∫ d³k ħ𝐤 |ψ~(𝐤)|²

where, to go from the first to second line, we’ve used the Fourier representation of the Dirac delta function, essentially just three copies of the 1d result (3.68). The upshot is that we learn something rather nice: while the mod-squared of the original wavefunction |ψ(𝐱)|2 has the interpretation as a probability distribution over position, the mod-squared of its Fourier transform |ψ~(𝐤)|2 has the interpretation as a probability distribution over momentum.

3.4 Commutation Relations

There is an algebraic way of formalising whether two observables can be simultaneously measured or whether, as is usually the case, a measurement of one messes up a previous measurement of the other. The underlying structure is known as commutation relations.

Given two operators O^ and M^, their commutator is defined to be

[O^,M^]=O^M^-M^O^ (3.76)

The key idea is familiar from matrices. If you multiply two matrices A and B then the order matters. Typically AB is not the same as BA. The commutator [A,B] captures the difference between these. Clearly [A,B]=-[B,A] and the same also holds for operators: [O^,M^]=-[M^,O^].

Some care should be taken with formulae like (3.76). The operators have a job to do: they act on functions. It’s important to remember this when manipulating these kind of equations. A good rule of thumb is to think of operator equations like (3.76) as meaning

[O^,M^]ψ=O^M^ψ-M^O^ψ   for all ψ(𝐱) (3.77)

When we do calculations, at least initially, it’s useful to use this phrasing to avoid any slip ups.

For all the examples in these lectures, the fundamental operators are position 𝐱^ and the momentum operator 𝐩^. It will prove useful to compute their commutators.
 
Claim: The commutation relations of 𝐱^=(x^1,x^2,x^3) and 𝐩^=(p^1,p^2,p^3) are

[x^i, x^j] = [p^i, p^j] = 0   and   [x^i, p^j] = iħ δij (3.78)

These are known as the canonical commutation relations on account of the important role that they play in quantum mechanics. Note that the right-hand side of the final relation is just a constant, but this too should be viewed as an operator: it’s the trivial operator that just multiplies any function by a constant.

The last commutation relation is perhaps surprising. In 1d, it reads simply [x̂, p̂] = iℏ. This kind of relation is not possible for finite matrices A and B: the analogous result would be [A, B] = iℏ𝟙 with 𝟙 the unit matrix. But if we take the trace of the left-hand side we find

Tr[A,B]=Tr(AB-BA)=0

because Tr(AB) = Tr(BA). Yet clearly Tr 𝟙 ≠ 0. However, infinite matrices, a.k.a. operators, can do things that finite matrices cannot, the canonical commutation relations (3.78) among them.
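The trace obstruction is easy to see numerically. A minimal check with hypothetical random complex matrices (any size and seed will do):

```python
import numpy as np

# Finite matrices cannot realise [A, B] = i*hbar*1: the trace of any
# commutator vanishes, while Tr(i*hbar*1) = i*hbar*N does not.
# Hypothetical random 5x5 example.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

print(np.trace(A @ B - B @ A))  # ≈ 0, up to floating-point rounding
```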
 
Proof: The canonical commutation relations follow straightforwardly from the definitions

x̂_i = x_i   and   p̂_i = −iℏ ∂/∂x_i (3.79)

Recalling that we should view all operators as acting on functions, as in (3.77), the commutation of position operators is

[x̂_i, x̂_j]ψ(𝐱) = (x_i x_j − x_j x_i) ψ(𝐱) = 0   for all ψ(𝐱)

while the commutation of momentum operators is

[p̂_i, p̂_j]ψ(𝐱) = −ℏ² (∂²/∂x_i∂x_j − ∂²/∂x_j∂x_i) ψ(𝐱) = 0   for all ψ(𝐱)

which follows because the order in which we take partial derivatives doesn’t matter. That leaves us just with the more interesting commutation relation. It is

[x̂_i, p̂_j]ψ(𝐱) = −iℏ (x_i ∂/∂x_j − (∂/∂x_j) x_i) ψ(𝐱)

We can see the issue: the first derivative hits only the function ψ. Meanwhile the second derivative acts on x_i ψ, giving us an extra term from the product rule. Moreover, the x_i ∂ψ/∂x_j terms cancel, leaving us only with this extra term

[x̂_i, p̂_j]ψ(𝐱) = iℏ (∂x_i/∂x_j) ψ(𝐱) = iℏ δ_ij ψ(𝐱)   for all ψ(𝐱)

This is the claimed result.
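If you want to watch the product rule do its work without pen and paper, the 1d canonical commutation relation can be checked symbolically. A sketch using sympy, acting on an arbitrary function as in (3.77):

```python
import sympy as sp

# Symbolic check of [x, p] psi = i*hbar*psi in 1d, with p = -i*hbar*d/dx
# acting on an arbitrary function psi(x).
x, hbar = sp.symbols('x hbar', real=True)
psi = sp.Function('psi')(x)

def p(f):
    # the momentum operator in the position representation
    return -sp.I * hbar * sp.diff(f, x)

lhs = x * p(psi) - p(x * psi)                # [x, p] acting on psi
print(sp.simplify(lhs - sp.I * hbar * psi))  # -> 0
```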

Note that the proof ultimately boils down to the operator equation

x_i ∂/∂x_j − (∂/∂x_j) x_i = −δ_ij

When you first meet these kinds of equations, it’s not immediately obvious why they’re true. As we saw above, the right way to proceed is to check that the equation holds when evaluated on an arbitrary function ψ(𝐱).

The logic that we’ve presented above is to first introduce the position and momentum operators and subsequently derive the canonical commutation relations. This, it turns out, is inverted. In more advanced presentations of quantum mechanics, the canonical commutation relations (3.78) are the starting point. There is then a theorem – due to Stone and von Neumann – that, roughly speaking, says that one can derive the form of the operators (3.79) from the canonical commutation relations.

One reason why the canonical commutation relations hold such a prominent position is because very similar equations can be found in classical mechanics. In that context, the relation involves neither commutators nor ℏ, but something called a Poisson bracket. This is a concept that arises in the more sophisticated, Hamiltonian approach to classical mechanics. You can read about this in the lectures on Classical Dynamics. For now, let me just say that while the physics of quantum mechanics is a huge departure from our classical world, the underlying mathematical structure turns out to be surprisingly close.

Occasionally on our quantum travels, we will run into two operators that commute, meaning

[O^,M^]=0

This is interesting because …
 
Claim: [O^,M^]=0 if and only if O^ and M^ share the same eigenfunctions.
 
Proof: First suppose that both O^ and M^ share the same eigenfunctions, so that

Ôϕ_n(𝐱) = λ_n ϕ_n(𝐱)   and   M̂ϕ_n(𝐱) = μ_n ϕ_n(𝐱)

Then any function ψ(𝐱) can be expanded in terms of these eigenstates as ψ = Σ_n a_n ϕ_n, giving

[Ô, M̂]ψ = Σ_n a_n [Ô, M̂]ϕ_n = Σ_n a_n (λ_n μ_n − μ_n λ_n) ϕ_n = 0   for all ψ(𝐱)

Conversely, suppose that [O^,M^]=0. We will restrict ourselves to the case where O^ has a non-degenerate spectrum. (The claim holds more generally but we have to work a little harder.) If ϕ is an eigenstate of O^, then we have

Ôϕ = λϕ   ⟹   M̂Ôϕ = λM̂ϕ
              ⟹   ÔM̂ϕ = λM̂ϕ

where the second line follows because [O^,M^]=0. But this tells us that M^ϕ is also an eigenstate of O^ with eigenvalue λ. By assumption, there is only one eigenstate with eigenvalue λ, which means that we must have

M̂ϕ = μϕ

for some μ. But this is the statement that ϕ is also an eigenstate of M^. The two operators need not have the same eigenvalues, but they do have the same eigenstates.

The physical interpretation of the claim follows from our discussion in Section 3.3. It is possible for a quantum system to have simultaneous values for commuting observables. If you measure O^ and then measure M^, the state will not be perturbed as long as [O^,M^]=0. (Actually this last statement does require that the spectrum of O^ is non-degenerate.) If you then measure O^ again, you’ll get the same answer as the first time. We’ll meet examples of commuting operators in Section 4.1 when we look at the angular momentum operators in more detail.
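The finite-dimensional version of this claim is easy to play with. In the hypothetical sketch below, M is built as a polynomial in O, which guarantees [O, M] = 0, and we check that an eigenvector of O is automatically an eigenvector of M:

```python
import numpy as np

# Two commuting Hermitian matrices share eigenvectors. Build M as a
# polynomial in O, so [O, M] = 0 holds exactly by construction.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
O = X + X.T                      # real symmetric, generically non-degenerate
M = O @ O + 3 * O                # commutes with O

print(np.allclose(O @ M - M @ O, 0))   # True: they commute

# Each eigenvector of O is also an eigenvector of M (with a different
# eigenvalue, here lambda^2 + 3*lambda).
evals, vecs = np.linalg.eigh(O)
v = vecs[:, 0]
Mv = M @ v
mu = v @ Mv                      # candidate eigenvalue of M
print(np.allclose(Mv, mu * v))   # True: v is an eigenvector of M too
```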

3.4.1 The Heisenberg Uncertainty Principle

A probability distribution is characterised by its moments. The simplest of these is the average, or mean, which is the expectation value (3.74). The next simplest is the variance, which gives some indication of the spread of the distribution.

In quantum mechanics the variance is called uncertainty. In a state ψ, the uncertainty of an observable O^ is denoted ΔψO and is defined as

(Δ_ψO)² = ⟨(Ô − ⟨Ô⟩_ψ)²⟩_ψ
        = ⟨Ô² − 2Ô⟨Ô⟩_ψ + ⟨Ô⟩_ψ²⟩_ψ
        = ⟨Ô²⟩_ψ − ⟨Ô⟩_ψ²

In statistics, the uncertainty ΔψO is also known as the standard deviation of the distribution. Using our expression (3.74) for the expectation value, and assuming a normalised wavefunction, we can write the uncertainty as

(Δ_ψO)² = ⟨ψ|Ô²ψ⟩ − ⟨ψ|Ôψ⟩² (3.80)
        = ∫ d³x ψ* Ô² ψ − (∫ d³x ψ* Ô ψ)²

To motivate the name “uncertainty”, suppose that the state of the system is an eigenstate of O^,

ψ(𝐱)=ϕ(𝐱)   with   O^ϕ(𝐱)=λϕ(𝐱)

Then the uncertainty of O^ in this state is

(Δ_ϕO)² = ⟨ϕ|Ô²ϕ⟩ − ⟨ϕ|Ôϕ⟩² = λ² − λ² = 0

This makes sense: in an eigenstate ϕ we know exactly what the value of O^ is, so there is no uncertainty. Of course, in the same state the uncertainty for other observables will necessarily be non-zero. Indeed, unless [O^,M^]=0, the uncertainties ΔψO and ΔψM cannot both vanish.
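The matrix analogue of this computation is a one-liner: for a Hermitian matrix and one of its unit eigenvectors, ⟨O²⟩ − ⟨O⟩² vanishes. A hypothetical random example:

```python
import numpy as np

# In an eigenstate there is no uncertainty: for a Hermitian matrix O and a
# unit eigenvector v, <O^2> - <O>^2 = 0.
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
O = X + X.T                       # real symmetric, so Hermitian
evals, vecs = np.linalg.eigh(O)
v = vecs[:, 0]                    # unit eigenvector with eigenvalue evals[0]

mean = v @ O @ v                  # <O> equals the eigenvalue
mean_sq = v @ O @ O @ v           # <O^2> equals the eigenvalue squared
print(abs(mean_sq - mean ** 2))   # ≈ 0 up to rounding
```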

In fact, the story is more interesting. It turns out for certain pairs of observables, as you reduce the uncertainty in one, the uncertainty in the other necessarily grows. One such pair is our favourite position and momentum. It’s simplest if we work in one dimension. (The generalisation to higher dimensions is trivial.) The canonical commutation relations (3.78) tell us that

[x̂, p̂] = iℏ (3.81)

This is all we need to derive the following result:
 
Claim: In any state ψ, we necessarily have

Δ_ψx Δ_ψp ≥ ℏ/2 (3.82)

This is the mathematical expression of the Heisenberg uncertainty relation. Because it follows purely from (3.81), any operators that obey the same commutation relation will have the same uncertainty relation.
 
Proof: To make life simple, we will assume that we have a state for which ⟨x̂⟩_ψ = ⟨p̂⟩_ψ = 0. (If this isn’t the case, you can rescue the proof below by defining new operators X̂ = x̂ − ⟨x̂⟩_ψ and P̂ = p̂ − ⟨p̂⟩_ψ which do obey ⟨X̂⟩_ψ = ⟨P̂⟩_ψ = 0.)

Next, consider the one-parameter family of states defined by

Ψ_s(x) = (p̂ − isx̂) ψ(x)   with   s ∈ 𝐑

The norm of this state is, like all others, positive definite, meaning

⟨Ψ_s|Ψ_s⟩ = ∫ dx |Ψ_s|² ≥ 0

Translated back to the original state ψ, this tells us that

0 ≤ ⟨(p̂ − isx̂)ψ | (p̂ − isx̂)ψ⟩ = ⟨ψ | (p̂ + isx̂)(p̂ − isx̂) ψ⟩

where the equality uses the fact that both x^ and p^ are Hermitian. Expanding out, we have

0 ≤ ⟨ψ | (p̂² + is[x̂, p̂] + s²x̂²) ψ⟩ = ⟨ψ | (p̂² − sℏ + s²x̂²) ψ⟩

where, this time, the equality uses the commutation relation [x̂, p̂] = iℏ. The p̂² and x̂² terms in this expression are just the uncertainties (3.80), a fact that follows because we’ve chosen ⟨x̂⟩_ψ = ⟨p̂⟩_ψ = 0. So we have

(Δ_ψp)² − sℏ + s²(Δ_ψx)² ≥ 0   for all s ∈ 𝐑

Now we’re in familiar territory. This is a quadratic in s and the inequality tells us that it has either one root or no roots. This can only be true if the discriminant “b2-4ac” is non-positive, i.e.

4(Δ_ψx)²(Δ_ψp)² ≥ ℏ²

This is the Heisenberg uncertainty relation (3.82).

You can have some fun trying to come up with experiments that might evade the Heisenberg uncertainty relation, and then see how Nature finds a way to avoid this conclusion and ensure that (3.82) is always satisfied.

For example, the obvious way to figure out where a particle is sitting is to look at it. But if you want to resolve something on a distance scale Δx, then you have to use light of wavelength λ ≲ Δx. Ultimately, light is made up of particles called photons. We haven’t said anything so far about photons in these lectures. (We’ll make a few comments in Section 4.4.) However, all we need to know is that, like all other quantum particles, photons have a momentum given by the de Broglie formula (2.14)

p = 2πℏ/λ

Clearly there’s a pay-off: if you want to be sure that the particle is sitting in a smaller and smaller region Δx then you have to hit it with higher and higher momentum photons, and these will impart some recoil on the particle of order Δp ∼ 2πℏ/Δx. The intrusive nature of measurement means that we can’t know both the position and momentum to better than ΔpΔx ∼ 2πℏ. The Heisenberg uncertainty relation (3.82) provides the more accurate bound.

For more examples along the same lines, you could do worse than look up the intellectual jousting of the original Einstein-Bohr debates. Einstein’s thinking is pellucid in its clarity; Bohr appears muddled, bordering on obscurantist. Yet Bohr had the overwhelming advantage of being right.

The Gaussian Wavepacket Revisited

We already met a baby version of the Heisenberg uncertainty relation when we studied the Gaussian wavepacket in Section 2.1. We can return to this example now that we have a better idea of what we’re doing.

We’ll forget about time dependence (as we have for much of this section) and look at the normalised Gaussian wavefunction in one dimension

ψ(x) = (a/π)^{1/4} e^{−ax²/2}

It’s not hard to check that ⟨x̂⟩_ψ = ⟨p̂⟩_ψ = 0. What about the uncertainty?

The uncertainty in position is given by

(Δ_ψx)² = ⟨x̂²⟩_ψ = √(a/π) ∫_{−∞}^{+∞} dx x² e^{−ax²} (3.83)

This integral is straightforward. We start from the Gaussian integral

∫_{−∞}^{+∞} dx e^{−ax²} = √(π/a)

Then we differentiate both sides with respect to a. This gives

∂/∂a ∫_{−∞}^{+∞} dx e^{−ax²} = −∫_{−∞}^{+∞} dx x² e^{−ax²} = ∂/∂a √(π/a) = −(1/2)√(π/a³)

So we have the result

∫_{−∞}^{+∞} dx x² e^{−ax²} = (1/2)√(π/a³)

Substituting this into (3.83), we get the uncertainty in position to be

(Δ_ψx)² = 1/(2a)

This is to be expected: it is the usual variance of a Gaussian distribution.

For the uncertainty in momentum, we have

(Δ_ψp)² = ⟨p̂²⟩_ψ = √(a/π) ∫_{−∞}^{+∞} dx e^{−ax²/2} [−ℏ² (d²/dx²) e^{−ax²/2}]
        = √(a/π) ℏ² ∫_{−∞}^{+∞} dx (a − a²x²) e^{−ax²}
        = √(a/π) ℏ² [a √(π/a) − (a²/2)√(π/a³)] = (1/2) aℏ²

Multiplying these results together gives

Δ_ψx Δ_ψp = ℏ/2

We see that the Gaussian wavepacket is rather special: it saturates the bound from the Heisenberg uncertainty relation. The class of Gaussian wavefunctions, parameterised by a, does the best job possible of balancing the twin requirements of localising in both position and momentum.
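The Gaussian integrals above can be checked symbolically, keeping ℏ explicit. A sketch with sympy:

```python
import sympy as sp

# Symbolic check that the Gaussian wavepacket saturates the Heisenberg
# bound: Delta x * Delta p = hbar/2, with hbar kept explicit.
x, a, hbar = sp.symbols('x a hbar', positive=True)
psi = (a / sp.pi) ** sp.Rational(1, 4) * sp.exp(-a * x**2 / 2)

norm = sp.integrate(psi**2, (x, -sp.oo, sp.oo))            # should be 1
dx2 = sp.integrate(x**2 * psi**2, (x, -sp.oo, sp.oo))      # (Delta x)^2
dp2 = sp.integrate(psi * (-hbar**2) * sp.diff(psi, x, 2),
                   (x, -sp.oo, sp.oo))                     # (Delta p)^2

print(sp.simplify(norm))   # 1
print(sp.simplify(dx2))    # 1/(2*a)
print(sp.simplify(dp2))    # a*hbar**2/2
print(sp.sqrt(sp.simplify(dx2 * dp2)))  # hbar/2: the bound is saturated
```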

3.5 Interpretations of Quantum Mechanics

There is, I think it’s fair to say, a level of disquiet about certain aspects of quantum mechanics.

In large part, this nervousness rests on the two ways in which the wavefunction can evolve. For much of the time, this evolution is governed by the time dependent Schrödinger equation. There is nothing random about this process: it is as deterministic and reversible as anything in classical mechanics. But then you decide to take a peek. Or, in more standard language, to make a measurement. At this point the wavefunction changes discontinuously, collapsing to one of the eigenstates in a probabilistic fashion.

Related to this are two distinct ways of thinking about the wavefunction. On the one hand, the wavefunction clearly captures certain innate properties of the system. This is exhibited in the interference pattern of the double slit experiment or, as we will see in the next section, the spectral lines of the hydrogen atom. On the other hand, the wavefunction also encodes some aspect of our knowledge of the system, as manifest in the probabilistic nature of measurement. The dichotomy posed by philosophers is whether the wavefunction is ontic, meaning describing some external aspect of the world, or epistemic, meaning relating only to our knowledge. The clear answer from quantum mechanics is that the wavefunction is both.

In this section we will briefly look at some of the ways in which people have grappled with these questions. I should warn you that we haven’t yet covered enough background to describe these in any detail. This means that, in some places, I will do little more than drop some relevant names which should, if nothing else, give you a starting point to explore the rabbit warren of articles on the subject.

Finally, before we proceed, I want to point out that a much better name for the title of this section would be “Interpretations of Classical Mechanics”. At the fundamental level, the world is quantum and probabilistic. Yet, from this emerges the classical world with its deterministic laws of physics. If there’s a question to answer at all, it’s how the latter arises from the former. If, instead, you’re looking for an explanation of quantum behaviour in terms of your prejudiced, classical worldview then you’ve got it backwards. That’s like turning to botany in the hope that it will help you understand the properties of quarks.

3.5.1 Hidden Variables

In some ways, hidden variables theories are not an interpretation of quantum mechanics at all. Instead, they are a rewriting of quantum mechanics: a novel and unspecified, underlying theory, with new degrees of freedom – the so-called hidden variables – that are designed to reproduce the rules of quantum mechanics. The hope is that these hidden variables may allow us to return to the comfortable, rosy classical world, where everything is predestined. In such a theory, the probabilities of quantum mechanics arise because of our ignorance of these underlying variables and are no different from probabilities in, say, the weather.

Hidden variables theories have an advantage over other interpretations. This is because, in some circumstances, they may deviate from the predictions of quantum mechanics, allowing an experiment to distinguish between the two. After all, the power of science derives from our ability to be wrong. Hidden variables theories hold a special place among interpretations of quantum mechanics because they are falsifiable. In fact, not only are they falsifiable, they are falsified!

It may seem odd to spend time discussing an interpretation that is known to be wrong. But because there are observable differences between hidden variables and common-or-garden quantum mechanics, this alternative viewpoint has taught us much more about the nature of the quantum world than any other interpretation and that’s a story worth telling.

The story starts with Einstein who, together with Podolsky and Rosen, was concerned about the instantaneous collapse of the wavefunction. The word “instantaneous” does not sit well with the tenets of special relativity, where the ordering of spacelike separated events can be different for different observers. In Minkowski space, simultaneity is tantamount to acausality.

In 1935, the trio of EPR conceived of a thought experiment in which the knowledge of a measurement performed in one part of space would instantaneously give rise to knowledge of an experiment performed in a far flung region of the universe. This, they argued, was untenable and so quantum mechanics must be incomplete: to be compatible with locality and causality it should be replaced by a new theory at shorter distance scales. These are the hidden variables.

Thirty years later, the physicist John Bell presented a glorious judo move of an argument in which he used the strength of the EPR result against them. He derived an inequality, now known as the Bell inequality, which is necessarily satisfied by any local, causal hidden variable theory but is violated by quantum mechanics. In other words, he found a way to experimentally distinguish between hidden variables and quantum mechanics. Needless to say, when the experiments were done they came down firmly on the side of quantum mechanics.

The Bell inequality greatly sharpened our understanding of quantum mechanics and helped us see what it can and can’t do. We now understand that there is no acausal behaviour in the collapse of the wavefunction. There is no way that one can use the collapse of the wavefunction to transmit information faster than the speed of light. Instead, the acausality only arises if, like EPR, you insist on some underlying classical mechanism to explain the observed quantum behaviour.

Strictly speaking, hidden variables theories are not quite dead. The Bell arguments only rule out local hidden variables theories, meaning those whose dynamics is compatible with special relativity and, consequently, causality. But these properties were one of the main motivations for introducing hidden variables in the first place! Moreover, there is overwhelming evidence that dynamics in our universe is local, and so little reason to think that, at the fundamental level, things are non-local but that somehow this is hidden from our view.

Indeed, the lesson to take from EPR and Bell is that quantum mechanics is subtle and clever and achieves things through local dynamics that we may have naively thought impossible. Furthermore, this way of thinking has been astonishingly fruitful: one can draw a clear line from the arguments of Bell to applications such as quantum computing that harness some of the more surprising aspects of the quantum world. You can read more about the EPR argument and the Bell inequalities and their applications in the lectures on Topics in Quantum Mechanics.

3.5.2 Copenhagen and Many Worlds

When people talk about an interpretation of quantum mechanics, they don’t usually have in mind a new theory that will ultimately replace quantum mechanics, nor something that can be tested with experiment. Instead, they are searching for some comforting words that they can drape around the equations to help them sleep better at night.

Here I describe two attempts at constructing these words. Neither are comforting. In fact, both are jarring. They do, however, demonstrate the unsettling novelty that necessarily accompanies quantum mechanics.

The Copenhagen interpretation was the approach favoured by many (but not all) of the founding fathers of quantum mechanics, and advocated most strongly, but not always clearly, by Niels Bohr. The idea is to take the collapse of the wavefunction seriously. The universe in which we live should be divided into two: the quantum world, described by wavefunctions and the Schrödinger equation, and the familiar classical world described by Newtonian laws of physics. The measurement process provides a bridge between the two, where the nebulous nature of the quantum transmutes into concrete statements of the classical. The price you pay for this is probability.

There is much that is deeply unsatisfying about the Copenhagen interpretation. Both you and I, and (for some of us) our experimental apparatus, are made of atoms which should also obey the laws of quantum mechanics, and it seems very odd to deny this. Moreover, it’s far from clear where the dividing line should be drawn between classical and quantum. In fact, it seems preferable to keep the dividing line deliberately fuzzy and ill defined since it helps render certain unpalatable questions illegitimate. While unsatisfying, it does seem that the Copenhagen interpretation is a consistent logical viewpoint, although it may take a lifetime of study to be able to nimbly deflect the awkward questions with the skill of Niels Bohr.

The second approach is known as the many worlds interpretation. It was first suggested in the 1950s by Hugh Everett III and gained traction in the subsequent decades. The idea is to take the time dependent Schrödinger equation seriously. A particle that passes through two slits is described by a wavefunction that is non-vanishing in the vicinity of both slits. We say that the particle is in a superposition of states, sitting in two places at the same time. If there’s a detection apparatus placed on one slit then the wavefunction doesn’t collapse: instead it continues to obey the Schrödinger equation. The detection apparatus now also sits in a superposition of “particle detected” and “particle not detected”. When we subsequently look at the apparatus, we too split into a superposition. And so on. The “many worlds” are the different branches of the wavefunction. All of this follows from simply putting our faith in the veracity of the Schrödinger equation.

There is much that is deeply unsatisfying about the many worlds interpretation. The collapse of the wavefunction is largely ignored and the all-important Born rule for constructing probabilities must be added by hand. Relatedly, there is nothing that explains why I only feel myself in one branch of the wavefunction, rather than in a superposition of all possible outcomes. (Admittedly, I don’t know what I would feel if I were, in fact, in a superposition of states.)

There is one important facet of quantum mechanics that brings balm to both interpretations above, a process known as quantum decoherence. This is one of those topics that is beyond the scope of these lectures, but roughly speaking the idea of decoherence is that as an increasingly large number of particles become entangled in the superposition, so it becomes increasingly difficult to exhibit any quantum interference effects. This, at least, explains why peculiar quantum properties cannot be observed for macroscopic objects. It also helps in understanding how the blurry Copenhagen dividing line can arise, or how the splitting of the many worlds might practically occur. Decoherence is, like the Bell inequalities, an important part of our understanding of quantum dynamics. But it falls short of resolving all the difficulties of either interpretation.

The two interpretations above are painted only with the very broadest brush. There is an almost infinitely bifurcating tree of different viewpoints, with fights frequently breaking out in otherwise dull conferences between the Everettian People’s Front and the People’s Front of Everett. (Splitters.) Of course, the fiercest arguments are between Copenhagenists and Many Worlders, but they now tend to go to different conferences.

There is, however, an alternative approach, a third way. This is the approach advertised in the heading of this section: you should accept both Copenhagen and many worlds, together with anything that lies in between. At any given moment, choose the one that gives you the warmest feeling when applied to the problem at hand. If you experience a nagging sense of shame or inconsistency in adopting all viewpoints, simply shrug it off. The church of your choice matters not one iota for the simple reason that there is no experimental way to distinguish between them. Should that situation ever change, then so too should your perspective.

Before proceeding, I should mention one pseudo-experimental hope that is often raised in conjunction with the interpretation of quantum mechanics. This is the suggestion that a better understanding of the meaning of quantum mechanics is needed before we can solve some thorny problem, usually mooted as something like quantum gravity or human consciousness. For example, you might want to think about quantum cosmology, where we apply quantum mechanics to the entire universe and it is difficult to envisage the role of an external observer who can collapse the wavefunction. Although a popular opinion, it seems to me that the underlying logic is weak, roughly following the lines “we don’t understand Topic X and I’m nervous about quantum mechanics so probably they’re related”. The same logic could equally well be applied to, say, high temperature superconductivity or the question of why the Higgs boson has its particular mass, but it’s never suggested that these can be solved only after invoking the right interpretation of quantum mechanics because it would sound silly. I suspect that the idea that a novel interpretation of quantum mechanics is needed before we can make progress on other important problems is similarly ill-judged.

3.5.3 Shut Up and Calculate

The phrase “shut up and calculate” was coined by the solid state physicist David Mermin as a somewhat cynical, but largely approving take on what is, by any measure, the most popular interpretation of quantum mechanics among practising physicists.

Mermin’s point is a simple one. Quantum mechanics is, by some margin, the most successful scientific framework of all time. The formalism provides unambiguous answers to any experiment we care to perform and the fact that these answers are necessarily of a statistical nature is more than offset by the fact that these answers are right.

Embracing quantum mechanics has led, in the century since its discovery, to an unprecedented understanding of the world and resulted in an enormous body of work that, collectively, represents one of the great triumphs of human intellect. Sitting within this body are the subjects of atomic and molecular physics, condensed matter physics, statistical physics, quantum information, mathematical physics, particle physics, and early universe cosmology. Anyone who pauses at the starting point, having deep thoughts about what it all means and muttering the words “ontic” and “epistemic”, is in danger of missing the stunning quantum vistas that await just around the corner. The true meaning of quantum mechanics can be found in the answers it gives about the world we inhabit.