10 Greedy Algorithms and Knapsack
This section introduces greedy algorithms and the knapsack problem. Greedy is an important general algorithmic paradigm to know, including its pitfalls.
Objectives. After learning this material, you should be able to:
- Explain the concept of a “greedy” algorithm.
- Recognize exchange arguments for greedy proofs of correctness.
- Recognize variants of the knapsack problem and prove when greedy is optimal, or provide a counterexample when it is not.
10.1 Definition and knapsack
We call an algorithm greedy if it takes a sequence of decisions that each look “locally” optimal at the current moment, without appearing to worry about global solution quality. To get the idea, let’s look at an example.
Knapsack problem with uniform weights.
- Input: a list of items \(i = 1,\dots,n\), with a value \(v_i \geq 0\) for each. Also, an integer \(W \geq 0\).
- Output: the subset of \(W\) of the items that have the largest total value (i.e. sum of the values).
The problem’s name comes from the image that our knapsack can fit at most \(k\) items, and we want to put the highest-value total set in it. Before continuing, can you solve this problem?
A natural “greedy” algorithm is the following, which we’ll state informally rather than in pseudocode:
- repeat W times:
- add the highest-value item remaining to the knapsack.
This algorithm is greedy: it makes a sequence of decisions that are optimal in the moment, i.e. taking the highest-value item available. But it turns out to be globally optimal too. This is probably not surprising in this case, but proving it will introduce some important ideas.
Proposition 10.1 The greedy algorithm is correct, i.e. returns the subset of size \(W\) with highest total value.
Proof. We use an exchange argument, which involves showing that any other solution can be improved by swapping or exchanging part of its solution out for the greedy algorithm’s solution.
Let us re-name the items so that \(v_1 \geq v_2 \geq \cdots \geq v_n\). The greedy algorithm’s solution set is \(\{1,\dots,W\}\) with value \(\sum_{j=1}^W v_j\).
To prove it’s optimal, consider any other solution set \(S\), a subset of \(\{1,\dots,n\}\) of size at most \(W\). Suppose \(i > W\) and \(i \in S\). Then there must be some item \(j \in \{1,\dots,W\}\) such that \(j \not\in S\). Notice that \(j < i\), so \(v_j \geq v_i\).
Let’s make a swap: remove \(i\) from \(S\) and add \(j\) instead. The value of \(S\) goes up by \(v_j - v_i \geq 0\). So the value of \(S\) only improves.
Now let’s just repeat this process over and over. Eventually, \(S = \{1,\dots,W\}\) because we swap in all of the first \(W\) items that were missing. The value of \(S\) only goes up each time. So the greedy solution set is optimal.
While there’s no universal definition of greedy algorithms, many of them share the above components:
- The algorithm’s job is to construct a solution set.
- The goal is to maximize some total “value” of the solution across the entire set (a global objective).
- The algorithm proceeds by “greedily” choosing an item that looks the best “locally” in the current moment, and repeating this until the solution set is constructed.
We will see several other examples, but first, let’s look at variants of the problem where greedy fails.
10.2 Failure of greedy algorithms
For algorithm designers, almost as important as designing correct algorithms is identifying failure points of incorrect ones. Let’s look at a variant of knapsack where the greedy algorithm fails.
Knapsack problem.
- Input: a list of items \(i = 1,\dots,n\), with a value \(v_i \geq 0\) for each and a weight \(w_i \geq 0\) for each. Also, a weight limit \(W \geq 0\).
- Output: the subset of the items that have the largest total value (i.e. sum of the values), whose total weight is at most \(W\).
Let’s do an example.
| Item \(i\) | Value \(v_i\) | Weight \(w_i\) |
|---|---|---|
| 1 | 10 | 1 |
| 2 | 15 | 3 |
| 3 | 7 | 2 |
Suppose the weight limit is \(W = 4\). Our greedy algorithm is the same as before, but must respect the weight limit: add the highest value item, then the next, and so on. If an item doesn’t fit in our weight limit, skip it, and once no more items fit, we’re done.
Exercise 10.1 On the above instance with a weight limit \(W=4\), what output does this greedy algorithm produce? Is that optimal?
Solution.
Greedy first takes item 2, which has a value of 15. It has weight 3. It then takes item 1, with a value of 10 and weight of 1. The total value is now 25 and total weight is 4. Now, we can’t take any more items and stop.
Yes, 25 is the optimal possible solution for this instance. We can’t fit all three items, and we have the two most valuable items.
Now comes the key question. Can you solve this one?
Exercise 10.2 Give a different weight limit \(W\) so that, on the above instance, the greedy algorithm fails to find the optimal solution. Explain how.
Solution.
We can use \(W=3\). In this case, greedy will take item 2 and then stop, for a total value of 15. But we could take items 1 and 3 instead for a total value of 17, while still having a weight of 3.
It’s worth reflecting on why greedy failed here. We needed to take into account the resources being used up (i.e. weight) by the items, not just their value.
Here is another exercise. It considers a different type of greedy rule that tries to account for the weights.
Exercise 10.3 Consider the following greedy algorithm for knapsack: always take the item with smallest weight, until the knapsack is full (i.e. no more items fit in the weight limit). Is this algorithm always optimal? If so, give a proof. If not, give a counterexample and explain.
Example solution.
It is not always optimal. There are many possible counterexamples, but the general idea is to make low-weight items have very low values.
| Item \(i\) | Value \(v_i\) | Weight \(w_i\) |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 10 | 2 |
Suppose the weight limit is \(W=2\). This greedy algorithm will take items 1 and 2, for a total value of 2. But the optimal solution is to take just item 3, for a value of 10.