27.3-1

Explain how to coarsen the base case of $\text{P-MERGE}$.

Replace the condition on line 2 with a check that $n < k$ for some base-case size $k$. Then, instead of copying just the one remaining element of $A$ into its proper spot in $B$, call a serial merge on the two remaining sorted segments of $A$ and copy the result into the correct positions of $B$.
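
The coarsening can be sketched as follows in Python (a serial stand-in for the CLRS routines; `K` and `p_merge_coarsened` are illustrative names, not from the text, and the two recursive calls are the ones P-MERGE would spawn in parallel):

```python
import bisect
import heapq

K = 8  # base-case size; tune empirically

def p_merge_coarsened(T, p1, r1, p2, r2, A, p3):
    """Merge sorted runs T[p1..r1] and T[p2..r2] into A starting at p3
    (all indices 0-based, inclusive)."""
    n1 = r1 - p1 + 1
    n2 = r2 - p2 + 1
    if n1 < n2:                       # ensure the first run is the longer one
        p1, r1, p2, r2 = p2, r2, p1, r1
        n1, n2 = n2, n1
    n = n1 + n2
    if n < K:
        # Coarsened base case: merge the two sorted runs serially.
        A[p3:p3 + n] = list(heapq.merge(T[p1:r1 + 1], T[p2:r2 + 1]))
        return
    q1 = (p1 + r1) // 2               # median position of the longer run
    x = T[q1]
    q2 = bisect.bisect_left(T, x, p2, r2 + 1)
    q3 = p3 + (q1 - p1) + (q2 - p2)
    A[q3] = x
    p_merge_coarsened(T, p1, q1 - 1, p2, q2 - 1, A, p3)   # would be spawned
    p_merge_coarsened(T, q1 + 1, r1, q2, r2, A, q3 + 1)
```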

27.3-2

Instead of finding a median element in the larger subarray, as $\text{P-MERGE}$ does, consider a variant that finds a median element of all the elements in the two sorted subarrays using the result of Exercise 9.3-8. Give pseudocode for an efficient multithreaded merging procedure that uses this median-finding procedure. Analyze your algorithm.

By a slight modification of Exercise 9.3-8 we can find the median of all the elements in two sorted arrays of total length $n$ in $O(\lg n)$ time. We'll modify $\text{P-MERGE}$ to use this fact. Let $\text{MEDIAN}(T, p_1, r_1, p_2, r_2)$ be the function which returns a pair $q$, where $q.pos$ is the position of the median of all the elements of $T$ which lie between positions $p_1$ and $r_1$, and between positions $p_2$ and $r_2$, and $q.arr$ is $1$ if the position is between $p_1$ and $r_1$, and $2$ otherwise.
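
The underlying selection on two sorted arrays can be sketched in Python as follows. This returns only the value of the $k$-th smallest element; the $\text{MEDIAN}$ used above additionally reports the position and which array it lies in. The function names are illustrative:

```python
def kth_of_two_sorted(a, b, k):
    """k-th smallest (1-indexed) element of the union of the sorted lists a
    and b, in O(lg(len(a) + len(b))) time: each step discards a constant
    fraction of the candidates."""
    if not a:
        return b[k - 1]
    if not b:
        return a[k - 1]
    if k == 1:
        return min(a[0], b[0])
    i = min(len(a), k // 2)
    j = min(len(b), k - i)            # i + j <= k
    if a[i - 1] <= b[j - 1]:
        return kth_of_two_sorted(a[i:], b, k - i)   # a[:i] cannot hold the answer
    else:
        return kth_of_two_sorted(a, b[j:], k - j)   # b[:j] cannot hold the answer

def median_of_two_sorted(a, b):
    """Lower median of all elements of the two sorted lists."""
    n = len(a) + len(b)
    return kth_of_two_sorted(a, b, (n + 1) // 2)
```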

P-MEDIAN-MERGE(T, p[1], r[1], p[2], r[2], A, p[3])
    n[1] = r[1] - p[1] + 1
    n[2] = r[2] - p[2] + 1
    if n[1] < n[2]              // ensure that n[1] ≥ n[2]
        exchange p[1] with p[2]
        exchange r[1] with r[2]
        exchange n[1] with n[2]
    if n[1] == 0              // both empty?
        return
    q = MEDIAN(T, p[1], r[1], p[2], r[2])
    if q.arr == 1
        q[2] = BINARY-SEARCH(T[q.pos], T, p[2], r[2])
        q[3] = p[3] + q.pos - p[1] + q[2] - p[2]
        A[q[3]] = T[q.pos]
        spawn P-MEDIAN-MERGE(T, p[1], q.pos - 1, p[2], q[2] - 1, A, p[3])
        P-MEDIAN-MERGE(T, q.pos + 1, r[1], q[2], r[2], A, q[3] + 1)
        sync
    else
        q[2] = BINARY-SEARCH(T[q.pos], T, p[1], r[1])
        q[3] = p[3] + q.pos - p[2] + q[2] - p[1]
        A[q[3]] = T[q.pos]
        spawn P-MEDIAN-MERGE(T, p[1], q[2] - 1, p[2], q.pos - 1, A, p[3])
        P-MEDIAN-MERGE(T, q[2], r[1], q.pos + 1, r[2], A, q[3] + 1)
        sync

The work is characterized by the recurrence $T_1(n) = O(\lg n) + 2T_1(n/2)$, whose solution tells us that $T_1(n) = O(n)$. The work is at least $\Omega(n)$ since we need to examine each element, so the work is $\Theta(n)$. The span satisfies the recurrence

$$\begin{aligned} T_\infty(n) & = O(\lg n) + O(\lg(n/2)) + T_\infty(n/2) \\ & = O(\lg n) + T_\infty(n/2) \\ & = \Theta(\lg^2 n), \end{aligned}$$

by exercise 4.6-2.

27.3-3

Give an efficient multithreaded algorithm for partitioning an array around a pivot, as is done by the $\text{PARTITION}$ procedure on page 171. You need not partition the array in place. Make your algorithm as parallel as possible. Analyze your algorithm. ($\textit{Hint:}$ You may need an auxiliary array and may need to make more than one pass over the input elements.)

Suppose that there are $c$ different processors, the array has length $n$, and we use its last element as the pivot. Split the first $n - 1$ entries into chunks of size $\lceil n/c \rceil$ and give one chunk to each processor. Each processor counts the number of elements in its chunk that are less than the pivot.

Next, compute the running sums of these counts. Place the chunk counts along the leaves of a binary tree and sum adjacent pairs going up; this takes time $\lg(\min\{c, n\})$, the log of the number of leaves. From there, we can compute all the running sums for the chunks, also in logarithmic time: keep track, at each internal node, of the sum over all leaves strictly to its left (its "left-cousin" value), which is found by adding the left sibling's sum value to the left-cousin value of the parent, with the root's left-cousin value initialized to $0$. This again takes time proportional to the depth of the tree, so is $\lg(\min\{c, n\})$.

The left-cousin value at a chunk's leaf is the index at which that chunk's elements less than the pivot should be placed. The chunk's elements that are greater than or equal to the pivot go just after the positions reserved by the root's total: their starting position is the root's sum value, plus the number of elements in earlier chunks, minus that chunk's left-cousin value. Each processor then scatters its chunk in time $O(n/c)$.

Altogether the total work is $O(n)$ and the span is $O(\lg n)$, so the parallelization is $O(n/\lg n)$.
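
A serial Python simulation of this scheme (the names are illustrative; the two per-chunk passes and the prefix-sum step are the parts that would run in parallel):

```python
from itertools import accumulate

def parallel_partition(A, c):
    """Partition A around its last element (the pivot), not in place,
    using c (simulated) processors; returns the new array and the number
    of elements less than the pivot."""
    n = len(A)
    pivot = A[-1]
    size = -(-(n - 1) // c)                    # ceil((n - 1) / c)
    chunks = [A[s:s + size] for s in range(0, n - 1, size)]
    # pass 1 (one chunk per processor): count elements < pivot
    less = [sum(1 for x in ch if x < pivot) for ch in chunks]
    # running sums of the counts (a parallel prefix tree, lg(min(c, n)) span)
    offsets = [0] + list(accumulate(less))
    total_less = offsets[-1]
    B = [None] * n
    B[total_less] = pivot                      # pivot separates the two regions
    # pass 2 (one chunk per processor): scatter into the output array
    for idx, ch in enumerate(chunks):
        lo = offsets[idx]                      # next slot for "< pivot"
        hi = total_less + 1 + idx * size - offsets[idx]  # next slot for ">= pivot"
        for x in ch:
            if x < pivot:
                B[lo] = x
                lo += 1
            else:
                B[hi] = x
                hi += 1
    return B, total_less
```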

27.3-4

Give a multithreaded version of $\text{RECURSIVE-FFT}$ on page 911. Make your implementation as parallel as possible. Analyze your algorithm.

P-RECURSIVE-FFT(a)
    n = a.length
    if n == 1
        return a
    w[n] = e^{2 * π * i / n}
    a0 = (a[0], a[2], ..., a[n - 2])
    a1 = (a[1], a[3], ..., a[n - 1])
    y0 = spawn P-RECURSIVE-FFT(a0)
    y1 = P-RECURSIVE-FFT(a1)
    sync
    parallel for k = 0 to n / 2 - 1
        w = w[n]^k        // each iteration computes its own twiddle factor
        y[k] = y0[k] + w * y1[k]
        y[k + n / 2] = y0[k] - w * y1[k]
    return y

$\text{P-RECURSIVE-FFT}$ parallelizes over the two recursive calls, and using a parallel for works because each iteration of the loop touches an independent set of variables, provided each iteration computes its own twiddle factor $w_n^k$ rather than updating a shared variable. The work is $\Theta(n \lg n)$, as in the serial version. The span satisfies $T_\infty(n) = T_\infty(n/2) + \Theta(\lg n) = \Theta(\lg^2 n)$, giving a parallelization of $\Theta(n \lg n)/\Theta(\lg^2 n) = \Theta(n/\lg n)$.
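
A serial Python simulation of the procedure (comments mark the parts that would be spawned or run as a parallel for; $n$ is assumed to be a power of 2, as in RECURSIVE-FFT):

```python
import cmath

def p_recursive_fft(a):
    """DFT of a (length a power of 2), mirroring P-RECURSIVE-FFT."""
    n = len(a)
    if n == 1:
        return list(a)
    y0 = p_recursive_fft(a[0::2])     # spawn
    y1 = p_recursive_fft(a[1::2])
    # sync
    wn = cmath.exp(2j * cmath.pi / n)
    y = [0j] * n
    for k in range(n // 2):           # parallel for: iterations are independent
        w = wn ** k                   # per-iteration twiddle, no shared state
        y[k] = y0[k] + w * y1[k]
        y[k + n // 2] = y0[k] - w * y1[k]
    return y
```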

27.3-5 \star

Give a multithreaded version of $\text{RANDOMIZED-SELECT}$ on page 216. Make your implementation as parallel as possible. Analyze your algorithm. ($\textit{Hint:}$ Use the partitioning algorithm from Exercise 27.3-3.)

Randomly pick a pivot element and swap it with the last element, so that the array is in the correct format for the procedure described in Exercise 27.3-3, then run that partitioning procedure. As an intermediate step, it computes the number of elements less than the pivot ($T.root.sum$), so keep track of that value after partitioning ends. If the desired order statistic is greater than $T.root.sum$, recurse on the subarray of elements greater than or equal to the pivot, decreasing the order statistic sought by $T.root.sum$; otherwise, leave the order statistic unchanged and recurse on the subarray of elements less than the pivot. Much of the analysis in Section 9.2 still applies, with the time needed for partitioning replaced by the runtime of the algorithm from Exercise 27.3-3. The work is unchanged from the serial case, because when $c = 1$ that algorithm reduces to the serial partitioning algorithm. For the span, the $O(n)$ term in the recurrence halfway down page 218 can be replaced with an $O(\lg n)$ term. It can be seen with the substitution method that the solution to this is logarithmic:

$$E[T(n)] \le \frac{2}{n} \sum_{k = \lfloor n/2 \rfloor}^{n - 1} C\lg k + O(\lg n) \le O(\lg n).$$

So, the total span of this algorithm will still just be O(lgn)O(\lg n).
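
A serial Python sketch of the selection (with $i$ 1-indexed; the three comprehension passes stand in for the parallel partitioning routine of Exercise 27.3-3, which would do them in one parallel pass):

```python
import random

def p_randomized_select(A, i):
    """i-th smallest element of A (1-indexed), by randomized partitioning."""
    if len(A) == 1:
        return A[0]
    pivot = random.choice(A)
    less = [x for x in A if x < pivot]      # these passes would be the
    equal = [x for x in A if x == pivot]    # parallel partition, span O(lg n)
    greater = [x for x in A if x > pivot]
    if i <= len(less):
        return p_randomized_select(less, i)
    elif i <= len(less) + len(equal):
        return pivot
    else:
        return p_randomized_select(greater, i - len(less) - len(equal))
```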

27.3-6 \star

Show how to multithread $\text{SELECT}$ from Section 9.3. Make your implementation as parallel as possible. Analyze your algorithm.

Let $\text{MEDIAN}(A)$ denote a brute-force method which returns the median element of the array $A$. We will only use this to find the median of small arrays, in particular, those of size at most $5$, so it will always run in constant time. We also let $A[i..j]$ denote the array whose elements are $A[i], A[i + 1], \ldots, A[j]$. The function $\text{P-PARTITION}(A, x)$ is a multithreaded function which partitions $A$ around the input element $x$ and returns the number of elements in $A$ which are less than or equal to $x$. Using a parallel for loop, its span is logarithmic in the number of elements in $A$. The work is the same as the serialization, which is $\Theta(n)$ according to Section 9.3. The span satisfies the recurrence

$$\begin{aligned} T_\infty(n) & = \Theta(\lg(n/5)) + T_\infty(n/5) + \Theta(\lg n) + T_\infty(7n/10 + 6) \\ & \le \Theta(\lg n) + T_\infty(n/5) + T_\infty(7n/10 + 6). \end{aligned}$$

Using the substitution method we can show that $T_\infty(n) = O(n^\epsilon)$ for some $\epsilon < 1$. In particular, $\epsilon = 0.9$ works. This gives a parallelization of $\Omega(n^{0.1})$.
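
To check that $\epsilon = 0.9$ suffices, substitute the inductive hypothesis $T_\infty(n) \le c n^{0.9}$ into the recurrence:

$$T_\infty(n) \le c\left(\frac{n}{5}\right)^{0.9} + c\left(\frac{7n}{10} + 6\right)^{0.9} + \Theta(\lg n) \le \left(\left(\tfrac{1}{5}\right)^{0.9} + \left(\tfrac{7}{10}\right)^{0.9}\right) c n^{0.9} + o(n^{0.9}),$$

and since $(1/5)^{0.9} + (7/10)^{0.9} \approx 0.235 + 0.725 = 0.96 < 1$, the slack of roughly $0.04\,c n^{0.9}$ absorbs the lower-order terms for sufficiently large $n$.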

P-SELECT(A, i)
    n = A.length
    if n == 1
        return A[1]
    let T[1..ceil(n / 5)] be a new array
    parallel for j = 0 to floor(n / 5) - 1
        T[j + 1] = MEDIAN(A[5j + 1..5j + 5])
    if n mod 5 != 0
        T[ceil(n / 5)] = MEDIAN(A[5 * floor(n / 5) + 1..n])
    x = P-SELECT(T, ceil(ceil(n / 5) / 2))
    k = P-PARTITION(A, x)
    if k == i
        return x
    elseif i < k
        return P-SELECT(A[1..k - 1], i)
    else
        return P-SELECT(A[k + 1..n], i - k)
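
A serial Python sketch of the procedure (with $i$ 1-indexed and elements assumed distinct; the group-median loop and the partitioning pass are the parts that would run in parallel):

```python
def p_select(A, i):
    """i-th smallest element of A (1-indexed, distinct elements)."""
    n = len(A)
    if n <= 5:
        return sorted(A)[i - 1]
    # lower medians of groups of 5 (a parallel for in the multithreaded version)
    groups = [A[j:j + 5] for j in range(0, n, 5)]
    T = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    x = p_select(T, (len(T) + 1) // 2)          # median of the medians
    less = [v for v in A if v < x]              # P-PARTITION would do these
    greater = [v for v in A if v > x]           # passes in parallel, span O(lg n)
    k = len(less) + 1                           # rank of the pivot x
    if i == k:
        return x
    elif i < k:
        return p_select(less, i)
    else:
        return p_select(greater, i - k)
```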