Homework Assignment #3

Exercise I. What is the result of Build-Max-Heap on the input A = [16, 4, 3, 9, 1, 44, 29]? Illustrate all intermediate steps.

What is the result of increasing the key 1 to 34 once the max-heap is built?

Exercise II. What is the result of the following sequence of operations on a (i) standard BST, (ii) an AVL-tree and (iii) a RB-tree: Insert(16), Insert(4), Insert(3), Insert(9), Insert(1), Insert(44), Insert(29), Delete(1), Delete(4), Insert(34).

Exercise III. The Birthday Paradox: If we assume that each person’s birthday is picked uniformly at random (and independently of everyone else in the room), then, even though there are 365 days in a year (disregarding leap-day) — in a party of 30 random people it holds with overwhelming probability that at least one pair of people have the same birthday. In this question, you will show why this holds.

Suppose that the are N possible colors, and each person is assigned a color uniformly at random and independently from anyone else.

• Show that for any two people i and j it holds that Pr[i and j have same color] = 1 . N

So intuitively, in a set of N people, we are likely to see at least one pair of people with the same color.

√

• Showthatifwetakeasetofn=⌈ 2N+1⌉people,thenwecanexpecttohaveatleastonepairof

The Birthday Paradox is the fact that this happens for much smaller sets — of the order of O( √

N).

people with the same color. (Make sure you define the suitable Bernoulli random variables.)

Exercise IV. Sorting Problems:

(a) Suppose that we want to sort an array A that contains n + k elements, where the first n elements are

sorted, and the latter k elements are arbitrary.

(An example with n = 8 and k = 2 is A = [1,4,17,19,31,33,34,73,72,12] )

nk

Assume n is very large (say, n = 106) and that k is a small constant (say, k = 10). Which of the sorting methods we have seen so far (InsertionSort, MergeSort, HeapSort, QuickSort) would be especially suitable for such a task? Explain why and justify your answer.

(b) Give an algorithm using the techniques and data structures you have seen in the course so far, for the following problem. The input is an array A of size n of integers and some k ≤ n. The output is the k largest elements of A. Your algorithm must run in O(n + k log n). Explain your algorithm and analyze its running time. What is the asymptotic upper bound on k for which your algorithm is still linear in n?

1

Problem 1. (45 pts) The Selection problem takes two inputs: an array (unsorted) A of n pairwise comparable elements, and an integer k ∈ [1, n]; and outputs the element x s.t. #{y ∈ A : y ≤ x} = k.

If we assume all elements in A are unique (which you may assume for the rest of the question), then your goal is to return the unique element which is strictly greater than exactly k − 1 other elements and strictly smaller than exactly n − k other element.

Note that you cannot assume k is constant. In fact, a common application of the Selection problem is to set k = n and return the median of the elements in A.

2

(a) (3 pts) Give an algorithm that takes Θ(n log(n)) time for the Selection problem.

In the remainder of the question, our goal is to design an algorithm for the Selection problem that runs in linear time in the number of elements in the array, namely runtime of O(n) where n = p − r + 1. Here is attempt #1:

Select(A, p, r, k)

s ← Partition(A, p, r) index ← s − p + 1

if (index = k) then

return A[s]

if (index > k) then

∗∗ precondition: (p − 1) + k ≤ r

return Select(A,p,(s−1),k) if (index < k) then
return Select(A,(s+1),r,k−index)
(b) (5 pts) Prove that Select indeed solves the Selection problem. I.e., that Select indeed returns x
that would appear in the k-th place had we sorted A.
(c) (2 pts) We invoke Select(A,1,n,k). Show that the best-case running time of Select is O(n) and
give an example of a best-case instance.
(d) (3 pts) We invoke Select(A, 1, n, k). Show that the worst-case running time of Select is Ω(n2) and give an example of a worst-case instance.
The randomized version of Select is described below.
RSelect(A, p, r, k ̇) ∗∗ precondition: p − 1 + k ≤ r i ← uniformly chosen random integer in {p, p + 1, . . . , r} exchange A[i] ↔ A[r]
s ← Partition(A, p, r)
index ← s − p + 1
if (index = k) then
return A[s]
if (index > k) then

return RSelect(A,p,(s−1),k) if (index < k) then
return RSelect(A,(s+1),r,k−index)
(e) (8 pts) Let f(n) denote the expect runtime of RSelect in the worst-case, i.e. f(n) = E[T(n)]. Prove
that there exists a constant c s.t.
f(1)≤c,and f(n)≤c·n+2 f(i)
∗∗ From here on the code is like Select
n−1 n
i=n/2
Hint: In the worst case, we always recurse on the largest of the two subarrays Partition creates.
2
(f) (2 pts) Prove that E[T(n)] = Θ(n).
And here is a deterministic version of RSelect (you may assume n is divisible by 5 or 10)
DSelect(A, p, r, k) ∗∗ precondition: p − 1 + k ≤ r if (r−p<5) then
Sort(A[p, ..., r]) and return A[k].
∗∗ Sorting can be done using even InsertionSort since any
∗∗ (reasonable) sorting algorithm on ≤ 5 elements takes O(1) time Let B be an array of length n/5.
for (i from 1 to n/5)
B[i] ← DSelect(A, (5(i − 1) + 1), 5i, 3)
key←DSelect(B,1,n/5,⌊1 ·n/5⌋) 2
for (j from 1 to n) if (A[j] = key) then
i←j exchange A[i] ↔ A[r]
s ← Partition(A, p, r) index ← s − p + 1
if (index = k) then
return A[s]
if (index > k) then

∗∗ From here on the code is like RSelect

return DSelect(A,p,(s−1),k) if (index < k) then
return DSelect(A,(s+1),r,k−index)
(g) (12 pts) Prove that the key(= A[i]) chosen by the recursive call of DSelect on B satisfies
#{x∈A:x≤key}≥ 3 n ,andsimilarly #{x∈A:x≥key}≥ 3 n 10 10
You may assume n is divisible by 10 and that all elements in A are unique.
(h) (10 pts) Write down the recursion describing the runtime T (n) of DSelect and prove that T (n) = Θ(n).
Hint: You may rely on previous homework without reproving them.
Problem 2. (10 pts) Lower bound on Ramsey numbers:
A clique on k nodes is the complete graph on these n nodes: between any two nodes there’s an edge.
Thus, a clique on k nodes has k edges. 2
(a) (2 pts) Suppose we color the edges of a clique on k nodes randomly, independently and uniformly in
red or blue. Show that the probability that the clique is monochromatic — namely, all edges are colored
− k2 −k−2 by the same color — is 2 2 .
k+1
(b) (1 pts) Argue by induction that for any k ≥ 3 we have 22 < 1.
k!
(c) (7 pts) Suppose we color the edges of a clique on n nodes randomly, independently and uniformly
in red or blue. Show that the expected number of monochromatic cliques of size k we create is strictly smaller than 1 given that k ≥ 2 log(n) ≥ 3.
Hint: Make sure you define the suitable Bernoulli random variables. You may also use the na ̈ıve bound: n = n·(n−1)·...·(n−k+1) ≤ nk .
k k! k!
Explanation: The Ramsey number R(k, l) is the smallest natural n such that any coloring of the edges
of the complete graph over n nodes in red or blue must produce either a monochromatic red clique of size k
k or a monochromatic blue clique of size l. We argue that you have just proven that R(k, k) > ⌊2 2 ⌋. 3

Note that by definition, the number of monochromatic k-cliques is an integer. If the expected number of monochromatic k-cliques is smaller than 1, then there has to be at least one coloring of the edges of a 2k/2-clique that creates no red k-clique and no blue k-clique! (This is because expectation is averaging — for each coloring of the edges in red and blue we count the number of monochromatic k-cliques we create and then average those numbers. Since the average of those integers is smaller than 1 then one of these integers must be smaller than 1 and therefore 0.)

This is a classic result by Erd ̋os (1947), which started the probabilistic method: providing examples without actually specifying them, but rather just pointing out to the fact that such examples must exist – using probability.

Problem 3. (30 pts)

(a) (15 pts) Your TAs love cereals — for the prize in the box. They have purchased n boxes of cereal, one of which contains the big (and heavy) prize. Instead of opening and emptying all n boxes, they’d like to use your help.

Design an algorithm that uses the scales in their lab — on which you can weigh any set of boxes against any other set of boxes and see which one is heavier — and minimizes the number of weighs required until the one heavier box that contains the prize is found.

(b) (15 pts) Prove that your algorithm is optimal. That is, if the algorithm that you design makes in the worst-case w weighs, then show that any other algorithm that is guaranteed to find the prize-box must also make w weighs in the worst-case.

Note: Ideally, your upper- and lower-bound will match exactly (or up to ±1). If you can’t get them to match exactly, have your bounds be asymptotically the same.

Problem 4. (25 pts) Various Sorting Problems.

(a) (10 pts) Propose an efficient algorithm for the following problem. The input is composed of a heap A of size n, and an additional element x. The algorithm’s objective is to return (say, print to the screen) all elements in A that are greater than or equal to x (note that x is not necessarily in A). The running time of your algorithm should be O(k) where k is the number of elements reported by the algorithm. Prove the correctness and runtime of the algorithm.

(b) (15 pts) Given a BST T, let us add a field size that keeps track of the number of nodes in T.

1. (2pt) How would you alter Insert() to update the size of the tree (and all its subtrees) when we

add a new node to T, in a way that doesn’t increase the runtime of Insert()?

2. (3pt) How would you alter Delete() to update the size of the tree (and all its subtrees) when we

remove a node from T, in a way that doesn’t increase the runtime of Delete()?

3. (10pt) Write a code (that uses the size field) for the quantile problem: given a BST T and a value v, Quantile(T, v) should return the number of elements in T which are ≤ v. Argue the correctness of your code, and justify its runtime.

Note how similar are the Select() problem and the Quantile() problem (except for the fact you were asked to solve the Quantile() problem only on BST, but we could define a similar problem for an array of numbers: Quantile(A, v)= #elements in A which are ≤ v.)

4