##### Lecture 15 • 2017/03/14
• Greedy – decompose & reduce problem – top-down approach
• Dynamic programming – solve all possible sub-problems and use solutions to solve larger problems – bottom-up approach
• Going back to the activity selection problem: given activities, find subset with greatest total activity duration, where no two activity intervals overlap
• Greedy algorithm (Lecture 7) doesn’t always work, eg activities (0, 2), (1, 8), (7, 9)
• Greedy would pick (0, 2) & (7, 9) whereas the best solution is (1, 8)
• Binary choice
• Let OPT(j) denote the best solution for activities 1 to j
• Let p(j) denote the largest index i < j such that activity i is compatible with activity j
• Case 1: j is in the optimal solution
• Compute w(j) + OPT(p(j))
• Case 2: j is not in optimal solution
• Compute OPT(j – 1)
• The maximum of the two cases denotes the optimal solution up to j
• If we draw out the recursive calls, we’ll notice that we often compute values of OPT(i), where i < j, multiple times. Instead, we can compute those values once, store them, and reuse them next time.
• Also notice that every OPT(j) depends only on OPT values at indices < j
• Memoization – cache results of each subproblem; lookup as needed
• Running time – O(n logn)
• Sort by finishing time O(n logn)
• Compute p(*) O(n logn)
• Compute each OPT value once O(1) (each call either returns a memoized value, or performs one sum and one max)
• At most 2n recursive calls, O(n)
• Note that running time is O(n) for jobs presorted by start & finishing times
• Dynamic programming goal – solve sub-problems in order so that once you reach larger problems, the sub-solutions are already computed
• However, after the call, the algorithm computes the optimal value, not the activity set
• We may find the solution by backtracking
• For every potential j, if w(j) + OPT(p(j)) > OPT(j – 1), j is included in the set
• Keep checking previous activities to find full set
• Pseudocode
```
/*
 * Pseudocode for finding the max-weight activity set
 */
public class DynamicProgramming {

    int[] M //M[j] holds OPT(j)
    int[] w //w[j] holds the weight of activity j
    int[] p //p[j] holds the latest compatible predecessor of activity j

    /**
     * Finds the max possible weight;
     * fills M in the process
     *
     * @return max weight
     */
    int findMaxWeight() {
        M.setAllValuesTo(EMPTY)
        M[0] = 0
        return findMaxWeight(M.length - 1)
    }

    int findMaxWeight(j) {
        if (M[j] == EMPTY)
            M[j] = max(w[j] + findMaxWeight(p[j]), findMaxWeight(j - 1))
        return M[j]
    }

    /**
     * Finds the activity set given the completion of the arrays above
     *
     * @param j index of activity in question (start at last index)
     * @return optimal set
     */
    int[] findSolution(j) {
        if (j == 0) return []
        if (w[j] + M[p[j]] > M[j - 1])
            return findSolution(p[j]) + [j] //j is in the solution set
        return findSolution(j - 1) //j is not in the solution set
    }
}
```
• Example

| activity j | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| predecessor p(j) | 0 | 0 | 2 | 2 | 3 |
| best weight M[j] | 2 | 3 | 4 | 9 | 9 |
| w(j) + M[p(j)] | 2 | 3 | 4 | 9 | 8 |
| M[j – 1] | 0 | 2 | 3 | 4 | 9 |

• Reconstruction yields a2 & a4
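The recurrence and the backtracking can also be sketched as runnable Python (a minimal translation of the pseudocode above; the weights `w` and predecessors `p` below are read off the example table, with index 0 as a sentinel):

```python
# Weights and predecessors taken from the worked example
# (p[j] = largest i < j compatible with j; index 0 is a sentinel)
w = [0, 2, 3, 1, 6, 4]
p = [0, 0, 0, 2, 2, 3]

def max_weight(w, p):
    n = len(w) - 1
    M = [0] * (n + 1)          # M[j] = OPT(j)
    for j in range(1, n + 1):  # bottom-up: every OPT(i), i < j, is ready
        M[j] = max(w[j] + M[p[j]], M[j - 1])
    return M

def solution(M, w, p, j):
    """Backtrack through M to recover the chosen activity indices."""
    chosen = []
    while j > 0:
        if w[j] + M[p[j]] > M[j - 1]:
            chosen.append(j)    # j is in the optimal set
            j = p[j]
        else:
            j -= 1              # j is not; look at the previous activity
    return sorted(chosen)

M = max_weight(w, p)
print(M[-1])                          # 9
print(solution(M, w, p, len(w) - 1))  # [2, 4]
```

The bottom-up loop replaces the memoized recursion: since OPT(j) only depends on smaller indices, filling M left to right guarantees every needed sub-solution is already computed.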
##### Lecture 16 • 2017/03/16
• Pairwise Sequence Alignment
• Goal: map letters between two strings (a & b) such that the “distance” (see below) between the strings is minimized
• Letters must remain in order, but spaces can be added between them
• Definitions
• Match – letters are identical
• Substitution – letters are different
• Insertion – letter of b is mapped to empty character
• Deletion – letter of a is mapped to empty character
• Indels – group covering insertions & deletions
• Can be used to find similarities in amino acid sequences
• Counting alignments
• We may observe that any alignment of a and b must end with (a, -), (a, b), or (-, b) (deletion, match/substitution, or insertion)
• If we define c(m, n) as the # of alignments between prefixes of lengths m and n, these three cases give c(m, n) = c(m – 1, n) + c(m – 1, n – 1) + c(m, n – 1)
• Base case – c(0, n) = c(m, 0) = c(0, 0) = 1
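The counting recurrence can be checked directly with memoization (a small Python sketch; these counts are known as the Delannoy numbers):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def c(m, n):
    """Number of alignments of strings of lengths m and n."""
    if m == 0 or n == 0:
        return 1  # only one way to align against the empty string
    # last column is a deletion, a match/substitution, or an insertion
    return c(m - 1, n) + c(m - 1, n - 1) + c(m, n - 1)

print(c(2, 2))  # 13 alignments of two 2-letter strings
```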
• Levenshtein Distance
• Minimal # of substitutions, insertions & deletions to transform one into the other
• Each of those actions adds one to the total distance
• Edit distance
• If every edit operation has a positive cost & an inverse operation with equal cost, the edit distance is metric
• d(x, y) ≥ 0 (non-negativity)
• d(x, y) = 0 iff x = y (coincidence axiom)
• d(x, y) = d(y, x) (symmetry)
• d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)
• Optimal sub-structure – sub-alignments of an optimal alignment are also optimal
• Cut-and-paste proof
• Backtracking
• Each move associated with one edit operation
• Vertical – insertion
• Diagonal – match/substitution
• Horizontal – deletion
• Find move that was used to find the value of the cell
• Apply recursively
• Analysis
• Comparing two strings of lengths m & n takes Θ(mn) time and Θ(mn) space
• It’s easy to save space and compute the optimal value in O(m + n) space by computing OPT(i, *) from OPT(i – 1, *) only; however, recovering the alignment itself is then harder
• Example – a = CT, b = ATTG

|   | - | A | T | T | G |
|---|---|---|---|---|---|
| - | 0 | 1 | 2 | 3 | 4 |
| C | 1 | 1 | 2 | 3 | 4 |
| T | 2 | 2 | 1 | 2 | 3 |
• Pseudocode
```
/*
 * Algorithm for finding the optimal sequence alignment of two strings
 */
public class NeedlemanWunsch {

    String a, b //two strings to compare; for our sake, both strings start with '-'
                //and the first real letter is at index 1
    int[][] d //d[i][j] holds the minimum distance between a[1..i] and b[1..j]

    //get cost for two characters
    int delta(m, n) {
        return m == n ? 0 : 1
    }

    /*
     * Compute lowest distance and store values for backtracking
     */
    int getLowestDistance() {
        //map the borders (base case; assumes the other string is empty,
        //so distances increase by one each time)
        for (int i = 0; i < a.length(); i++)
            d[i][0] = i
        for (int j = 0; j < b.length(); j++)
            d[0][j] = j
        for (int i = 1; i < a.length(); i++)
            for (int j = 1; j < b.length(); j++)
                d[i][j] = min(d[i - 1][j] + delta(a[i], '-'), //deletion
                              d[i - 1][j - 1] + delta(a[i], b[j]), //match/substitution
                              d[i][j - 1] + delta('-', b[j])) //insertion
        return d[a.length() - 1][b.length() - 1]
    }

    /*
     * Find match pairs between the two strings; set i & j to last index initially
     */
    Pair[] getSolution(i, j) {
        if (i == 0 && j == 0) return []
        if (i == 0) return [Pair('-', b[j])] + getSolution(0, j - 1) //only insertions left
        if (j == 0) return [Pair(a[i], '-')] + getSolution(i - 1, 0) //only deletions left
        if (d[i - 1][j] + delta(a[i], '-') == d[i][j])
            return [Pair(a[i], '-')] + getSolution(i - 1, j) //deletion occurred
        if (d[i - 1][j - 1] + delta(a[i], b[j]) == d[i][j])
            return [Pair(a[i], b[j])] + getSolution(i - 1, j - 1) //match/substitution occurred
        return [Pair('-', b[j])] + getSolution(i, j - 1) //insertion occurred
    }
}
```
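As a runnable check of the table-filling step (a Python sketch using 0-indexed strings instead of the '-'-prefixed convention above), applied to the CT/ATTG example:

```python
def edit_distance(a, b):
    """Fill the (m+1) x (n+1) Levenshtein matrix bottom-up."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of the first i letters of a
    for j in range(n + 1):
        d[0][j] = j  # insert all of the first j letters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                          # deletion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]), # match/substitution
                          d[i][j - 1] + 1)                          # insertion
    return d

d = edit_distance("CT", "ATTG")
print(d[-1][-1])  # 3
```

The returned matrix matches the example table row for row, and the bottom-right cell is the Levenshtein distance.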
##### Lecture 17 • 2017/03/21
• To add onto the pairwise sequencing from the last lecture, the approach may be modified with different weightings to provide different results, eg delta scores of 1 and -1 in bioinformatics
• Dijkstra’s algorithm & negative weights
• Weighted version of BFS – priority queue rather than FIFO queue
• Greedy choice – picks lightest edge at each step
• How do we deal with negative weight edges?
• Re-insertion in queue leads to exponential running time
• Adding constants to each edge to make them positive changes the question, because paths with different numbers of edges are incremented with different values
• Bellman-Ford algorithm
• Allows negative-weights
• Compute d[v] (shortest-path weights) and π[v] (predecessors) for all v ∈ V
• Return true if no negative-weight cycles are reachable from the source, false otherwise
• If Bellman-Ford has not converged after V(G) – 1 iterations, there cannot be a shortest path tree, so there must be a negative weight cycle (longest path w/o cycles is V(G) – 1 in length)
• Pseudocode
```
/*
 * Finds shortest paths from a source node in a graph
 * Allows for negative-weight edges and will catch negative cycles if they occur
 *
 * Time Complexity O(VE)
 */
public class BellmanFord {

    /**
     * Compute shortest paths and check that they exist
     *
     * @param G the graph
     * @param s source node
     * @return true if shortest paths exist, false otherwise (negative cycle)
     */
    boolean bellmanFord(G, s) {
        for (v in G.vertices)
            d[v] = (v == s) ? 0 : INFINITY //initialize distances
        for (int i = 0; i < G.size() - 1; i++)
            for ((u, v) in G.edges)
                relax(u, v, w(u, v)) //set d[v] to min(d[v], d[u] + w(u, v))
        for ((u, v) in G.edges)
            if (d[v] > d[u] + w(u, v)) return false //still relaxable: negative cycle
        return true
    }
}
```
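The same algorithm in runnable form (a Python sketch with an edge list; the example graph is hypothetical):

```python
import math

def bellman_ford(n, edges, s):
    """edges: (u, v, weight) triples. Returns (d, ok); ok is False
    when a negative-weight cycle is reachable from s."""
    d = [math.inf] * n
    d[s] = 0
    for _ in range(n - 1):       # a shortest path has at most n - 1 edges
        for u, v, w in edges:
            if d[u] + w < d[v]:  # relax(u, v, w)
                d[v] = d[u] + w
    for u, v, w in edges:        # extra pass: any improvement means a cycle
        if d[u] + w < d[v]:
            return d, False
    return d, True

# graph with a negative edge but no negative cycle
edges = [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 2)]
d, ok = bellman_ford(4, edges, 0)
print(d, ok)  # [0, 4, 1, 3] True
```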
• Dynamic Programming in Bellman-Ford
• Let d(i, j) = cost of shortest path from s to i with at most j hops
• Cases
•  i = s & j = 0 → d(i, j) = 0
•  i ≠ s & j = 0 → d(i, j) = ∞
•  j > 0 → d(i, j) = min(min{d(k, j – 1) + w(k, i) : i ∈ Adj(k)}, d(i, j – 1))
• Either a valid predecessor’s cost + the edge weight, or no change (already achieved with fewer hops)
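A direct translation of the d(i, j) table (a Python sketch; the example graph is hypothetical, and predecessors of i are found by scanning the edge list):

```python
import math

# example graph: edges as (u, v, weight)
edges = [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 2)]
n, s = 4, 0

# d[j][i] = cost of the shortest path from s to i using at most j hops
d = [[math.inf] * n for _ in range(n)]
d[0][s] = 0
for j in range(1, n):
    for i in range(n):
        d[j][i] = d[j - 1][i]          # already achieved with fewer hops
        for (k, v, w) in edges:
            if v == i:                 # k is a predecessor of i
                d[j][i] = min(d[j][i], d[j - 1][k] + w)
print(d[n - 1])  # [0, 4, 1, 3]
```

If row n − 1 still differs from row n − 2 after the loop, the algorithm has not converged, which is exactly the negative-cycle condition from above.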
##### Lecture 18 • 2017/03/23
• Divide & Conquer
• Divide – split problems into smaller sub-problems
• Conquer – solve sub-problems recursively, or use base cases
• Combine – merge sub-solutions, eventually rebuilding a solution to the original problem
• Example – Merge Sort
• Divide – split n-elements to subsequences of n/2 elements
• Conquer – sort recursively using merge sort; once array has size 1, simply return that array
• Combine – merge two sorted subsequences to produce sorted answer
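The three steps map directly onto code (a minimal Python sketch):

```python
def merge_sort(a):
    if len(a) <= 1:
        return a                 # conquer base case: size <= 1 is sorted
    mid = len(a) // 2            # divide into two halves
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0      # combine: merge the two sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 8, 1, 9, 3]))  # [1, 2, 3, 5, 8, 9]
```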
• Multiplication – Divide & Conquer
• For a value x with n bits, let xL = x / 2^(n/2) (integer division), and let xR = x % 2^(n/2)
• Instead of using grade-school multiplication (O(n^2)), we may compute x * y through their components
x * y = 2^n · xL·yL + 2^(n/2) · (xL·yR + xR·yL) + xR·yR
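This decomposition can be implemented directly; with Karatsuba’s trick (whose recurrence appears in the next lecture), the middle term needs only one extra multiplication instead of two (a Python sketch, splitting on bit length):

```python
def karatsuba(x, y):
    """Multiply non-negative ints via three half-size products."""
    if x < 10 or y < 10:
        return x * y  # base case: small operands
    half = max(x.bit_length(), y.bit_length()) // 2
    xl, xr = x >> half, x & ((1 << half) - 1)  # x = xl * 2^half + xr
    yl, yr = y >> half, y & ((1 << half) - 1)
    ll = karatsuba(xl, yl)
    rr = karatsuba(xr, yr)
    # (xl + xr)(yl + yr) - ll - rr = xl*yr + xr*yl, saving one product
    mid = karatsuba(xl + xr, yl + yr) - ll - rr
    return (ll << (2 * half)) + (mid << half) + rr

print(karatsuba(1234, 5678))  # 7006652
```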
##### Lecture 19 • 2017/03/28
• Merge sort running time
• T(n) = execution time for size n = 2 * T(n/2) + n (2 sub calls + merge time)
• Binary Search T(n) = T(n/2) + 1
• Karatsuba T(n) = 3 * T(n/2) + n
• Master Theorem – solves common divide & conquer runtimes
• General: T(n) = a · T(n/b) + f(n)
• a ≥ 1: # of subproblems
• b > 1: factor by which the subproblem size decreases
• f(n): work to divide/merge subproblems
• Recursion tree
• k = logb(a)
• logb(n): # of levels
• a^i: # of subproblems at level i
• n/b^i: size of each subproblem at level i
• Case 1 – cost dominated by cost of leaves
• If f(n) = O(n^(k – ε)) for some ε > 0
• T(n) = Θ(n^k)
• Eg T(n) = 3T(n/2) + n → T(n) = Θ(n^(log2(3)))
• Case 2 – cost evenly distributed among levels
• If f(n) = Θ(n^k · log^p(n))
• T(n) = Θ(n^k · log^(p + 1)(n))
• Eg T(n) = 2T(n/2) + Θ(n log n) → T(n) = Θ(n log^2(n))
• Case 3 – cost dominated by cost of root
• If f(n) = Ω(n^(k + ε)) for some ε > 0, and if a · f(n/b) ≤ c · f(n) for some c < 1 for all sufficiently large n (holds if f(n) = Θ(n^(k + ε)))
• T(n) = Θ(f(n))
• Eg T(n) = 3T(n/4) + n^5 → T(n) = Θ(n^5)
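Case 1 can be sanity-checked numerically: for T(n) = 3T(n/2) + n, the ratio T(n) / n^(log2(3)) settles toward a constant as n grows (a Python sketch):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = 3T(n/2) + n, eg Karatsuba's recurrence (Case 1)."""
    if n <= 1:
        return 1
    return 3 * T(n // 2) + n

k = math.log2(3)  # k = logb(a) = log2(3) ≈ 1.585
for m in (10, 13, 16):
    n = 2 ** m
    print(round(T(n) / n ** k, 3))  # ratio approaches a constant (here, 3)
```

Since f(n) = n = O(n^(k – ε)) for this recurrence, the leaves dominate and T(n) = Θ(n^(log2(3))), which is what the bounded ratio shows.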