4 Divide and Conquer Algorithms
This section introduces a powerful algorithmic paradigm, divide-and-conquer. This paradigm is closely related to recursion, and we introduce recursive algorithms as well.
Objectives. After learning this material, you should be able to:
- Solve problems by writing recursive divide-and-conquer algorithms.
- Prove correctness of recursive divide-and-conquer algorithms using structural induction.
- Analyze the time complexity of recursive divide-and-conquer algorithms using structural induction.
4.1 Recursion and Divide-and-Conquer
You have probably seen recursive algorithms before, so we will recall them briefly. An algorithm is recursive if, in order to compute its output, it calls itself on a modified input. An example of a recursive algorithm is this naive implementation of the Fibonacci numbers:
Algorithm 4.1 (Fibonacci numbers)
We will be looking at recursive algorithms that fall into the following algorithmic paradigm.
Definition 4.1 (Divide-and-conquer) A divide-and-conquer algorithm splits the input into parts, solves a useful subproblem on each part, then combines the solutions together to produce an answer to the original problem.
Often, divide-and-conquer algorithms are recursive in nature because the subproblems are of the same form as the original problem. This book will focus on recursive examples.
4.2 Binary Search
For our first example, we’ll look at a very important algorithm: binary search. In this problem, we are given a sorted list of elements coming from some universe. Given an element, we need to find its location in the list, if present. The elements could be numbers, strings, or other objects. For simplicity, we will assume they are numbers, but the algorithm easily generalizes to other cases.
The Search Problem:
- Input: a sorted array A of numbers; a number x.
- Output: the index of A where x should be inserted to maintain the sorted order. If x is present, this should be an index where x is located.
The binary search algorithm is as follows. We will 0-index A for this algorithm. We use A[i:j] to denote the subarray of A from element i to j inclusive. If i > j, such as A[0:-1] or A[5:4], this produces an empty array. We write A[i:end] to denote the subarray starting at index i and going to the end of the array. We assume that a subarray can be created, or referenced, in constant time.
Algorithm 4.2 (Binary search)
Remember that with algorithms, we want to analyze correctness and efficiency (time and space). Let’s start with correctness.
4.2.1 Correctness
Proposition 4.1 Binary search correctly solves the binary search problem, i.e. given a valid input, it produces a correct output.
Proof. We prove correctness by induction on the length of A. The base case is when A has length 0. In this case, the algorithm returns 0, which is the correct index.
For the inductive step, let the length of A be at least 1 and assume the algorithm is correct on all arrays shorter than A. Let \(i = int(len(A)/2)\), where int() rounds down. We note that because \(len(A) \geq 1\), we have \(i < len(A)\), which means that the array \(A[0:i-1]\) and the array \(A[i+1:end]\) are both shorter than A.
There are three cases. First, if \(x == A[i]\), then the algorithm correctly returns index \(i\), and we are done.
Second, suppose \(x < A[i]\). By inductive hypothesis, our recursive call on line 8 correctly returns the index to insert \(x\) in the sublist A[0:i-1]. We claim this index is also correct for inserting \(x\) into A. This follows immediately if the return value is in \(0,\dots,i-1\), which implies that \(x\) should be inserted somewhere before the end of the sublist. The other case is if the return value is \(i\), meaning an insertion at the end of the sublist, implying \(x > A[i-1]\). This is still correct, because we have supposed \(x < A[i]\), so \(i\) is the correct position.
Third, suppose \(x > A[i]\). By inductive hypothesis, our recursive call on line 10 correctly returns the index to insert \(x\) in the sublist A[i+1:end]. If this index is some \(j > 0\), then x should be inserted somewhere after the beginning of this sublist, which means the same location, shifted by \(i+1\), is correct to insert x into A. If the index is \(0\), this implies that \(x \leq A[i+1]\). On the other hand, we have \(x > A[i]\), so \(i+1\) is correct.
The exact code of binary search and its correctness proof are notoriously fiddly due to several edge cases. In such situations, it is helpful to keep in mind the famous quote of Donald Knuth: “Beware this code. I have not tested it, only proven it correct.”
4.2.2 Runtime analysis
Now, we will analyze efficiency. We will focus on running time (can you bound the space use yourself?).
Proposition 4.2 Binary search runs in time \(O(\log(n))\), where \(n = len(A)\).
Proof. First, we prove by induction that, when we call binary_search(A,x), it makes at most \(\log(n) + 1\) recursive calls total, where the logarithm is base 2. The first base case is when n == 0. In this case, we make \(0\) recursive calls. Since \(\log(0)\) is not defined, we will define this case as being satisfied. The other base case is when n == 1. In this case, we either make no recursive calls, because \(x == A[0]\), or we make one recursive call on a length-0 subarray. The maximum number of recursive calls is therefore \(1 = \log(1) + 1\), as claimed.
For the inductive step, let \(n \geq 2\). In this case, binary_search(A,x) either makes no recursive calls, or makes a call in line 8, or makes a call in line 10 (but not both). In either case, the call is for a subarray of length at most n/2. By inductive hypothesis, after that call, at most \(\log(n/2) + 1\) more calls are made. This makes the total number at most \(1 + \log(n/2) + 1 = 1 + \log(n) - \log(2) + 1 = 1 + \log(n)\), as claimed.
Now we look at each call to binary_search(). By counting the operations, we can see that any call does at most a constant number of steps, about 10, plus any steps done by its own recursive call. Since we proved that we visit the function binary_search() at most \(\log(n) + 1\) times total, this gives a total running time bound of \(10(\log(n) + 1) = 10\log(n) + 10 = O(\log(n))\).
4.3 Mergesort
As with binary search, Mergesort is a recursive divide-and-conquer problem. It also applies to lists of elements, and as with binary search above, we will focus on the case where the elements are numbers, but the ideas can easily extend to other types of list elements.
The Sorting Problem:
- Input: an array A of numbers.
- Output: an array containing the same numbers in sorted order.
The idea of Mergesort is to split the list in half and sort the left and right halves separately. Then, we need to run the merge subroutine to combine the two sorted sublists into our final sorted list.
Algorithm 4.3 (Mergesort)
mergesort(A):
if len(A) <= 1:
return A
let i = int(len(A)/2)
let B = mergesort(A[0:i])
let C = mergesort(A[i+1:end])
return merge(B, C)
merge(B, C):
let D = []
let i = 0
let j = 0
while i < len(B) or j < len(C):
if j >= len(C) or (i < len(B) and B[i] < C[j]):
append B[i] to D
i += 1
else:
append C[j] to D
j += 1
return D4.3.1 Correctness analysis
First, we need a lemma that the “merge” subroutine is correct.
Lemma 4.1 The subroutine merge(B,C), given two sorted lists, outputs a sorted list containing all of the elements in B and C.
Proof. All elements of B and C eventually get added to D because we only increment i when we add B[i] to D and similarly for j and C[j]. Now we just need to show that D is sorted. To do so, we claim that in the while loop of lines 5-11, each iteration, the smallest remaining element of B[i:end] and C[j:end] is added to D. This follows because B is sorted, so its minimum remaining element is located at B[i], and similarly for C[j], and we add the smaller of these to D. (If we have already added all elements of B, then \(i > len(B)\), so we add C[j]; and vice versa.) Because we always append the smallest remaining element to D, D is sorted.
Proposition 4.3 Mergesort correctly sorts the input.
Proof. By induction. For base cases, if len(A) is 0 or 1, mergsort(A) returns A, which is correct.
For the inductive case, let \(len(A) \geq 2\). The elements of A are partitioned into B and C in lines 7 and 8. Each of these is shorter than A, so by inductive hypothesis, B and C are correctly sorted by the recursive calls. By Lemma 4.1, merge(B,C) returns a sorted version of A, so mergesort is correct.
4.3.2 Runtime analysis
We again assume that subarrays can be referenced in constant time.
Lemma 4.2 merge(B,C) runs in time at most 10(len(B) + len(C)) + 4.
Proof. We first analyze the while loop, lines 5-11. Inside the loop, we take a constant number of steps, upper-bounded by about 10. How many iterations does the while loop take? Each iteration, we either increase i or j, but not both. We stop incrementing i when it reaches len(B) and we stop incrementing j when it reaches len(C). When both happen, the while loop completes. Therefore, there are len(B) + len(C) iterations of the while loop, so lines 5-11 take a total of at most 10(len(B) + len(C)) time. Adding 4 steps for lines 2-4 and line 12 gives a running time bound of 10(len(B) + len(C)) + 4.
Proposition 4.4 mergesort(A) runs in time \(O(n \log(n))\), where \(n = len(A)\).
Proof. We will use a similar strategy to binary search: we calculate how much work happens in each call to mergesort(), then we calculate the number of calls. However, we will see that this situation is a bit more complex.
It will be helpful to have an expression for the running time of mergesort on a given input size. Define T(n) to be the running time of mergesort on lists of length n.
First, how much work is done in mergesort(A) itself, ignoring recursive calls? The answer is about 6 steps plus the work done in merge(B,C), which gives a total (using the Lemma) of 10 + 10(len(B) + len(C)). We can observe that len(B) + len(C) = len(A), so this runtime is 10 + 10 len(A). To simplify analysis, let’s consider the case \(len(A) \geq 1\), so we can upper-bound this work by 10 + 10 len(A) \(\leq\) 20 len(A).
Now, we need to consider the recursive calls. To simplify the analysis, we will pretend that \(len(A) = 2^k\) for some integer \(k\), i.e. the length is a power of \(2\). That implies that, because we divide the array in half each time, every subarray we create throughout the algorithm has even length until the length reaches \(1\).
We recursively call mergesort twice, each on a list of length n/2. This gives the following recurrence relation:
\[\begin{align*} T(n) = 2T(n/2) + 20n \end{align*}\]
Here is how it relates to the code:
Verbally, the running time of mergesort on a list of length \(n\) is equal to twice its running time on lists of length \(n/2\), plus an extra \(20n\) steps needed to merge the lists and complete the rest of the algorithm.
Solving the recurrence. Now the question is: what formula T(n) solves the recurrence? The method we will use is called the recursion tree method. We create a tree where each node represents a call to mergesort() and its two children represent the two recursive calls it makes. We can make the following key observations.
- Each child node represents a call on an array of half the length of its parent. Therefore, a node at distance \(t\) from the root represents a call to mergesort() with an array of length \(len(A)/2^t = 2^{k-t}\).
- Therefore, the height of the tree is \(k\), because after \(k\) halvings, we end up with an array of length \(1\) and make no more recursive calls.
- The recursion tree is a complete binary tree, which implies that it has \(2^t\) nodes at distance exactly \(t\) from the root, for each \(t=0,\dots,k\).
We are now ready to add up the total time complexity by summing over the “layers” of the recursion tree. At each layer \(t=0,\dots,k\), there are \(2^t\) nodes. Each one does a total amount of work 10 + 10 len(input) = \(10 + 10 2^{k-t}\). Summing, the total time complexity is: \[\begin{align*} T(n) &= \sum_{t=0}^k 2^t \left( 20 \cdot 2^{k-t} \right) \\ &= 20 \sum_{t=0}^k 2^k \\ &= 20 k 2^k . \end{align*}\]
Recalling that \(n = len(A)\) and \(k = \log(n)\), we can rewrite this running time bound as \(T(n) = 20 n \log(n) = O(n \log(n))\).
As a note, we can check that \(T(n) = 20n \log(n)\) satisfies our original recurrence:
\[\begin{align*} 2T(n/2) + 20n &= 2 \left( 20(n/2)\log(n/2) \right) + 20n \\ &= 2 \left( 10n (\log(n) - 1) \right) + 20n \\ &= 2 \left( 10n \log(n) - 10n \right) + 20n \\ &= 20n \log(n) - 20n + 20n \\ &= 20n \log(n) . \end{align*}\]
4.4 Integer Multiplication
Let’s consider one more problem before we discuss how to analyze divide-and-conquer algorithms in general.
The Integer Multiplication Problem:
- Input: two positive n-bit integers \(x,y\).
- Output: the product \(xy\).
The “grade-school” algorithm is to multiply each bit of \(x\) by each bit of \(y\) and take the results, suitably shifted, and add them all up. We won’t go into to the analysis, but this takes \(O(n^2)\) time, as we must consider all pairs of bits.
Can we multiply faster than that? Famous mathematicians thought no. But in fact, we can, using a divide-and-conquer algorithm. First, we’ll look at an approach that is not ultimately faster, but paves the way.
Algorithm 4.4 (Divide-and-Conquer Multiplication)
multiply(x, y): # assume both have n bits
let n = len(x)
if n <= 1:
return x*y
let x0,x1 = split(x)
let y0,y1 = split(y)
a = multiply(x1,y1)
b = multiply(x1,y0)
c = multiply(x0,y1)
d = multiply(x0,y0)
return a*2^n + (b+c)*2^(n/2) + d
// Divide an integer into upper and lower halves
split(x):
let x0 = x % 2^(n/2) // % is the remainder operation
let x1 = int(x/2^(n/2)) // int() rounds down
return x0, x1Let’s take correctness as given and analyze this algorithm using the recurrence-tree method. Let \(T(n)\) be the running time of multiply(). The split() subroutine is \(O(n)\) time using bit operations. Line 11 is actually \(O(n)\) as well, because adding is \(O(n)\) and multiplying by a power of two is just a bit shift, so also \(O(n)\). That leaves lines 7-10. Each calls multiply() on inputs of half the original size, so they take \(T(n/2)\) time each.
\[\begin{align*} T(n) &= 4 T(n/2) + O(n) . \end{align*}\]
Let \(n = 2^k\). The tree method gives that at each level \(t=0,1,\dots,\log(n)\), there are \(4^t\) nodes and each node does \(n/2^t = 2^{k-t}\) work.
\[\begin{align*} T(n) &= \sum_{t=0}^{k} 4^t (2^{k-t}) \\ &= \sum_{t=0}^{k} 2^{2t} 2^{k-t} \\ &= \sum_{t=0}^{k} 2^{t+k} \\ &= 2^k \sum_{t=0}^{k} 2^t \\ &= 2^k (2^{k+1} - 1) \\ &\approx (2^k)^2 \\ &= n^2 . \end{align*}\] A more rigorous proof wouldn’t hurt, but we conclude the algorithm is \(O(n^2)\). No faster! However, a clever observation allows us to speed it up asymptotically. We’ll see this next.
4.4.1 Karatsuba’s multiplication algorithm
We know that addition and subtraction of \(n\)-bit numbers are \(O(n)\) time operations. Asymptotically, these are much, much cheaper than multiplication, which is \(O(n^2)\) as far as we know. Therefore, we should be willing to do a few extra additions and subtractions if they can help us save even one multiplication. That is the idea behind the next algorithm.
Remember that \(x = 2^n (x_1) + x_0\) and \(y = 2^n (y_1) + y_0\), where \(x_1,x_0,y_1,y_0\) are each \(n/2\) bits. We want to compute \(xy = (2^n x_1 + x_0)(2^n y_1 + y_0) = 2^{2n} x_1y_1 + 2^{n}(x_1y_0 + x_0y_1) + x_0y_0\). Let’s take for granted that we’ll need to compute \(x_1y_1\) and \(x_0y_0\). Can we get the middle term any easier?
The key observation is that we can compute the following product of \(n/2\) bit numbers: \((x_1 + x_0)(y_1 + y_0) = x_1y_1 + x_0y_1 + x_1y_0 + x_0y_0\). Subtracting off \(x_1y_1\) and \(x_0y_0\) leaves us with the middle term we need, \(x_0y_1 + y_0x_1\).
Algorithm 4.5 (Karatsuba Multiplication)
We now show that Karatsuba’s algorithm runs faster than the \(O(n^2)\) algorithms above.
Proposition 4.5 Karatsuba multiplication runs in time \(O(n^c)\) where \(c = \log_2(3) \approx 1.58.\)
Proof. We first need to write the recurrence. We have:
multiply(x, y): # T(n) steps
let n = len(x)
if n <= 1: # O(1)
return x*y # O(1)
let x0,x1 = split(x) # O(n)
let y0,y1 = split(y) # O(n)
a = multiply(x1,y1) # T(n/2)
d = multiply(x0,y0) # T(n/2)
x_sum = x1 + x0 # O(n)
y_sum = y1 + y0 # O(n)
product = multiply(x_sum, y_sum) # T(n/2)
b = a + d - product # O(n)
return a*2^n + b*2^(n/2) + d # O(n)We can write this recurrence as: \[\begin{align*} T(n) &= 3 T(n/2) + O(n) . \end{align*}\] It turns out that the constants on the \(O(n)\) don’t matter for the big-O solution to the recurrence. We can analyze it with the tree method. Just as with mergesort, there are \(k = \log_2(n)\) levels of the tree, because we cut the input size in half at each level. But now, there are \(3^t\) nodes at level \(t=0,1,\dots,\log_2(n)\). In each node at level \(t\), we have an input size of \(n/2^t\), so we do at most \(C n/2^t\) work at that node for some \(C\). The total is: \[\begin{align*} T(n) &= \sum_{t=0}^{k} 3^t C \frac{n}{2^t} \\ &= Cn \sum_{t=0}^k \left(\frac{3}{2}\right)^t \\ &= C' n \left(\frac{3}{2}\right)^{k} & \text{for some $C'$ (geometric series)} \\ &= C' 3^k \\ &= C' 2^{k \log_2(3)} \\ &= C' n^{\log_2(3)} . \end{align*}\] We used that the sum of an increasing geometric series, in this case \(\frac{3}{2} + \left(\frac{3}{2}\right)^2 + \cdots + \left(\frac{3}{2}\right)^k\), is bounded by a constant times its final element.
There are even faster algorithms than Karatsuba’s for integer multiplication. They are also based on the idea of divide and conquer.