##### Lecture 15 • 2017/03/14

- Algorithm paradigms
- Greedy – decompose & reduce problem – top-down approach
- Dynamic programming – solve all possible sub-problems and use solutions to solve larger problems – bottom-up approach

- Going back to the activity selection problem: given weighted activities (here, weight = duration), find the subset with the greatest total weight, where no two activity intervals overlap
- Greedy algorithm (Lecture 7) doesn’t always work, eg activities (0, 2), (1, 8), (7, 9)
- Greedy would pick (0, 2) & (7, 9) whereas the best solution is (1, 8)

- Binary choice
- Let OPT(j) denote the best solution for activities 1 to j
- Let p(j) denote the largest index i < j such that activity i is compatible with activity j
- Case 1: j is in the optimal solution
  - Compute weight_j + OPT(p(j))
- Case 2: j is not in the optimal solution
  - Compute OPT(j – 1)
- OPT(j) is the maximum of the two cases
- If we draw out the recursive calls, we’ll notice that we often compute values of OPT(i), where i < j, multiple times. Instead, we can compute those values once, store them, and reuse them next time.
- Also notice that every OPT(j) depends only on OPT values at indices < j

- Memoization – cache results of each subproblem; lookup as needed
- Running time – O(n log n)
  - Sort by finishing time – O(n log n)
  - Compute p(·) for all activities – O(n log n)
  - Each OPT value is computed once – O(1) apiece (either return the cached value or perform one sum & one max)
  - At most 2n recursive calls – O(n) total

- Note that running time is O(n) for jobs presorted by start & finishing times
- Dynamic programming goal – solve sub-problems in order so that once you reach larger problems, the sub-solutions are already computed
- However, the call computes only the optimal value, not the activity set itself
- We may recover the solution by backtracking
  - For every candidate j (starting from the last index), if weight_j + OPT(p(j)) > OPT(j – 1), then j is included in the set and we continue from p(j); otherwise we continue from j – 1
  - Keep checking previous activities to recover the full set

- Pseudocode

```
/*
 * Pseudocode for finding the max weight activity set
 */
public class DynamicProgramming {

    int[] M  // holds OPT values
    int[] w  // holds weight of each activity
    int[] p  // holds latest compatible predecessor

    /**
     * Finds the max possible weight;
     * fills M in the process
     *
     * @return max weight
     */
    int findMaxWeight() {
        M.setAllValuesTo(EMPTY)
        M[0] = 0
        return findMaxWeight(M.length - 1)
    }

    int findMaxWeight(j) {
        if (M[j] == EMPTY)
            M[j] = max(w[j] + findMaxWeight(p[j]), findMaxWeight(j - 1))
        return M[j]
    }

    /**
     * Finds the activity set given the completed arrays above
     *
     * @param j index of activity in question (start at last index)
     * @return optimal set
     */
    int[] findSolution(j) {
        if (j == 0) return []
        if (w[j] + M[p[j]] > M[j - 1])
            return findSolution(p[j]) + [j]   // j is in the solution set
        return findSolution(j - 1)            // j is not in the solution set
    }
}
```
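The same recurrence runs as-is in Python; below is a sketch (the memoization decorator and the 1-indexed `acts` list are illustrative choices, not the course's code), using the (0, 2), (1, 8), (7, 9) instance from above with weight = duration:

```python
from functools import lru_cache

# Activities sorted by finish time: (start, finish, weight); 1-indexed, index 0 unused.
# Weights here are just durations, matching the earlier example.
acts = [None, (0, 2, 2), (1, 8, 7), (7, 9, 2)]

def predecessor(j):
    """p(j): largest i < j whose finish time is <= activity j's start (0 if none)."""
    for i in range(j - 1, 0, -1):
        if acts[i][1] <= acts[j][0]:
            return i
    return 0

p = [0] + [predecessor(j) for j in range(1, len(acts))]

@lru_cache(maxsize=None)
def opt(j):
    """Max total weight achievable with activities 1..j (memoized binary choice)."""
    if j == 0:
        return 0
    return max(acts[j][2] + opt(p[j]), opt(j - 1))

def solution(j):
    """Backtrack through the memoized values to recover the chosen activity set."""
    if j == 0:
        return []
    if acts[j][2] + opt(p[j]) > opt(j - 1):
        return solution(p[j]) + [j]
    return solution(j - 1)
```

Here `opt(3)` returns 7 and `solution(3)` returns `[2]`, i.e. the single activity (1, 8) beats the greedy pick.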

- Example

| activity | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| predecessor p(j) | 0 | 0 | 2 | 2 | 3 |
| best weight M[j] | 2 | 3 | 4 | 9 | 9 |
| w_j + M[p(j)] | 2 | 3 | 4 | 9 | 8 |
| M[j – 1] | 0 | 2 | 3 | 4 | 9 |

- Reconstruction yields a_2 & a_4

##### Lecture 16 • 2017/03/16

- Pairwise Sequence Alignment
- Goal: Map letters between two strings (a & b) such that the “distance” (see below) between the strings is minimized
- Letters must remain in order, but spaces can be added between them

- Definitions
- Match – letters are identical
- Substitution – letters are different
- Insertion – letter of b is mapped to empty character
- Deletion – letter of a is mapped to empty character
- Indels – group covering insertions & deletions

- Can be used to find similarities in amino acid sequences
- Counting alignments
- We may observe that any alignment of a and b must end with (a, -), (a, b), or (-, b) (deletion, match/substitution, insertion)
- If we define c(m, n) as the # of alignments formed between prefixes of lengths m and n, we see that c(m, n) = c(m – 1, n) + c(m – 1, n – 1) + c(m, n – 1) (the three cases are disjoint, so the counts add)
- Base case – c(0, n) = c(m, 0) = c(0, 0) = 1
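The counting recurrence can be evaluated with a short memoized sketch (the function name `c` mirrors the notes; base cases as above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def c(m, n):
    """Number of alignments between strings of lengths m and n."""
    if m == 0 or n == 0:
        return 1  # only gaps remain: exactly one alignment
    # the last column is a deletion, a match/substitution, or an insertion
    return c(m - 1, n) + c(m - 1, n - 1) + c(m, n - 1)
```

For example `c(1, 1)` is 3 and `c(2, 2)` is 13; the counts grow exponentially, which is why dynamic programming rather than enumeration is needed.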

- Levenshtein Distance
- Minimal # of substitutions, insertions & deletions to transform one into the other
- Each of those actions adds one to the total distance

- Edit distance
- If every edit operation has a positive cost & an inverse operation with equal cost, the edit distance is a metric
- d(x, y) ≥ 0 (separate axiom)
- d(x, y) = 0 iff x = y (coincidence axiom)
- d(x, y) = d(y, x) (symmetry)
- d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)

- Optimal sub-structure – sub-alignment of optimal alignments are also optimal
- Cut-and-paste proof

- Backtracking
- Each move is associated with one edit operation (with a on the rows and b on the columns, as in the pseudocode below)
  - Vertical – deletion
  - Diagonal – match/substitution
  - Horizontal – insertion

- Find move that was used to find the value of the cell
- Apply recursively

- Analysis
  - Comparing two strings of lengths m & n takes Θ(mn) time and Θ(mn) space
  - It’s easy to save space and compute the values in O(m + n) space by computing OPT(i, ·) from OPT(i – 1, ·) only; however, recovering the alignment is then harder
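A minimal sketch of the two-row space saving (unit costs assumed; the `prev`/`curr` names are illustrative):

```python
def edit_distance_two_rows(a, b):
    """Levenshtein distance in O(len(b)) extra space: keep only the
    previous row of the DP table while filling it row by row."""
    prev = list(range(len(b) + 1))           # row 0: distances vs. the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i]                           # column 0: deleting i letters of a
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr.append(min(prev[j] + 1,         # deletion
                            prev[j - 1] + cost,  # match/substitution
                            curr[j - 1] + 1))    # insertion
        prev = curr
    return prev[-1]
```

This yields the optimal value only; since earlier rows are discarded, backtracking for the alignment itself no longer works directly.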

- Example (a = “CT” down the side, b = “ATTG” across the top; the bottom-right cell is the edit distance)

| | - | A | T | T | G |
|---|---|---|---|---|---|
| - | 0 | 1 | 2 | 3 | 4 |
| C | 1 | 1 | 2 | 3 | 4 |
| T | 2 | 2 | 1 | 2 | **3** |

- Pseudocode
```
/*
 * Algorithm for finding the optimal sequence alignment for two strings
 */
public class NeedlemanWunsch {

    String a, b   // two strings to compare; for our sake, both strings start with '-'
                  // and the first real letter is at index 1

    int[][] d     // matrix holding the minimum distance between prefixes

    // cost of aligning two characters
    int delta(m, n) {
        return m == n ? 0 : 1
    }

    /*
     * Compute the lowest distance and store values for backtracking
     */
    int getLowestDistance() {
        // fill the borders (base case: one of the strings is empty),
        // so distances increase by one each step
        for (int i = 0; i < a.length(); i++) d[i][0] = i
        for (int j = 0; j < b.length(); j++) d[0][j] = j
        for (int i = 1; i < a.length(); i++)
            for (int j = 1; j < b.length(); j++)
                d[i][j] = min(d[i - 1][j] + delta(a[i], '-'),      // deletion
                              d[i - 1][j - 1] + delta(a[i], b[j]), // match/substitution
                              d[i][j - 1] + delta('-', b[j]))      // insertion
        return d[a.length() - 1][b.length() - 1]
    }

    /*
     * Find match pairs between the two strings; set i & j to the last index initially
     */
    Pair[] getSolution(i, j) {
        if (i == 0 && j == 0) return []
        if (i > 0 && d[i - 1][j] + delta(a[i], '-') == d[i][j])
            return [Pair(a[i], '-')] + getSolution(i - 1, j)        // deletion occurred
        if (i > 0 && j > 0 && d[i - 1][j - 1] + delta(a[i], b[j]) == d[i][j])
            return [Pair(a[i], b[j])] + getSolution(i - 1, j - 1)   // match/substitution occurred
        return [Pair('-', b[j])] + getSolution(i, j - 1)            // insertion occurred
    }
}
```
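A runnable Python sketch of the same DP plus backtracking (0-indexed strings rather than the '-'-padded 1-indexed convention above; unit costs assumed):

```python
def align(a, b):
    """Edit distance between a and b, plus one optimal alignment
    recovered by backtracking through the full DP table."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): d[i][0] = i          # borders: vs. empty prefixes
    for j in range(n + 1): d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion (vertical)
                          d[i - 1][j - 1] + cost,   # match/substitution (diagonal)
                          d[i][j - 1] + 1)          # insertion (horizontal)
    # backtrack: find which move produced each cell's value
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            pairs.append((a[i - 1], b[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], '-')); i -= 1
        else:
            pairs.append(('-', b[j - 1])); j -= 1
    return d[m][n], pairs[::-1]
```

On the worked example, `align("CT", "ATTG")` returns distance 3 with a four-column alignment whose only match is the pair of T's.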

##### Lecture 17 • 2017/03/21

- To add onto the pairwise sequence alignment from the last lecture, the approach may be modified with different weightings to provide different results, e.g. +1/−1 scores for delta in bioinformatics
- Dijkstra’s algorithm & negative weights
- Weighted version of BFS – priority queue rather than FIFO queue
- Greedy choice – picks lightest edge at each step
- How do we deal with negative weight edges?
- Re-insertion in queue leads to exponential running time
- Adding constants to each edge to make them positive changes the question, because paths with different numbers of edges are incremented with different values

- Bellman-Ford algorithm
- Allows negative-weight edges
- Computes d[v] (shortest-path weights) and π[v] (predecessors) for all v ∈ V
- Returns true if no negative-weight cycle is reachable from the source, false otherwise
- If Bellman-Ford has not converged after |V| – 1 iterations, there cannot be a shortest-path tree, so there must be a negative-weight cycle (the longest simple path has at most |V| – 1 edges)

- Pseudocode

```
/*
 * Find shortest paths from a source node in a graph
 * Allows negative-weight edges and detects negative cycles if they occur
 *
 * Time complexity: O(VE)
 */
public class BellmanFord {

    /**
     * Compute shortest paths from s
     *
     * @param G the graph
     * @param s source node
     * @return true if shortest paths exist, false otherwise (negative cycle)
     */
    boolean bellmanFord(G, s) {
        d[s] = 0                                  // d[v] = ∞ for all other v
        for (int i = 0; i < G.size() - 1; i++)
            for (u, v in G.edges)
                relax(u, v, w(u, v))              // set d[v] = min(d[v], d[u] + w(u, v))
        for (u, v in G.edges)
            if (d[v] > d[u] + w(u, v))
                return false                      // still relaxable: negative cycle
        return true
    }
}
```
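A runnable sketch under an edge-list representation (the `(u, v, w)` triples are an assumption; the graph API in the pseudocode is abstract):

```python
INF = float('inf')

def bellman_ford(n, edges, s):
    """Shortest paths from s over nodes 0..n-1; edges = [(u, v, w), ...].
    Returns (distances, True), or (None, False) if a negative cycle is reachable."""
    d = [INF] * n
    d[s] = 0
    for _ in range(n - 1):              # longest simple path has <= n - 1 edges
        for u, v, w in edges:
            if d[u] + w < d[v]:         # relax
                d[v] = d[u] + w
    for u, v, w in edges:               # still relaxable => negative cycle
        if d[u] + w < d[v]:
            return None, False
    return d, True
```

Note the negative edge is handled correctly where Dijkstra's greedy choice could fail.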

- Dynamic Programming in Bellman-Ford
  - Let d(i, j) = cost of the shortest path from s to i using at most j hops
  - Cases
    - i = s & j = 0: d(i, j) = 0
    - i ≠ s & j = 0: d(i, j) = ∞
    - j > 0: d(i, j) = min( min{ d(k, j – 1) + w(k, i) : i ∈ Adj(k) }, d(i, j – 1) )
  - Either a valid predecessor’s distance plus the connecting edge’s weight, or no change (already achieved with fewer hops)
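The hop-limited recurrence can be evaluated directly (a sketch; the recursion over j is for clarity, not efficiency):

```python
INF = float('inf')

def shortest_by_hops(n, edges, s, j):
    """d(i, j): cost of the shortest path from s to i using at most j hops,
    computed directly from the recurrence above; edges = [(k, i, w), ...]."""
    if j == 0:
        return [0 if i == s else INF for i in range(n)]   # base cases
    prev = shortest_by_hops(n, edges, s, j - 1)
    d = list(prev)                      # "no change" case: d(i, j - 1)
    for k, i, w in edges:               # predecessor case: d(k, j - 1) + w(k, i)
        if prev[k] + w < d[i]:
            d[i] = prev[k] + w
    return d
```

Running it for j = 1, 2, … shows the distances converging, e.g. a two-hop path through a negative edge improving on a direct one-hop edge.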

##### Lecture 18 • 2017/03/23

- Divide & Conquer
- Divide – split problems into smaller sub-problems
- Conquer – solve sub-problems recursively, or use base cases
- Combine – merge sub-solutions until the original problem is solved

- Example – Merge Sort
- Divide – split n-elements to subsequences of n/2 elements
- Conquer – sort recursively using merge sort; once array has size 1, simply return that array
- Combine – merge two sorted subsequences to produce sorted answer
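The three steps above map onto a short sketch (list slicing for the divide step is an illustrative choice):

```python
def merge_sort(arr):
    """Divide: split in half; conquer: sort each half recursively;
    combine: merge the two sorted halves."""
    if len(arr) <= 1:
        return arr                      # base case: size 1 (or 0) is already sorted
    mid = len(arr) // 2
    left, right = merge_sort(arr[:mid]), merge_sort(arr[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # combine: linear-time merge
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```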

- Multiplication – Divide & Conquer
  - For a value x with n bits, let x_L = x / 2^(n/2) (the high half) and x_R = x % 2^(n/2) (the low half)
  - Instead of using grade-school multiplication (O(n^2)), we may compute x * y through their components:

    x * y = 2^n x_L y_L + 2^(n/2) (x_L y_R + x_R y_L) + x_R y_R
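Karatsuba's trick (the T(n) = 3T(n/2) + n recurrence in Lecture 19) computes the middle term with one multiplication instead of two, via (x_L + x_R)(y_L + y_R) − x_L y_L − x_R y_R. A sketch, assuming Python's arbitrary-precision ints and an arbitrary small-number cutoff for the base case:

```python
def karatsuba(x, y):
    """Multiply via three recursive half-size products instead of four:
    x*y = 2^n xL yL + 2^(n/2) (xL yR + xR yL) + xR yR."""
    if x < 16 or y < 16:
        return x * y                          # base case: small operands
    n = max(x.bit_length(), y.bit_length())
    half = n // 2
    xl, xr = x >> half, x & ((1 << half) - 1)  # high / low halves of x
    yl, yr = y >> half, y & ((1 << half) - 1)
    a = karatsuba(xl, yl)
    b = karatsuba(xr, yr)
    c = karatsuba(xl + xr, yl + yr) - a - b    # = xl*yr + xr*yl, one product
    return (a << (2 * half)) + (c << half) + b
```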

##### Lecture 19 • 2017/03/28

- Merge sort running time
- T(n) = execution time for input size n = 2T(n/2) + n (two recursive calls + linear merge time)
- Binary Search T(n) = T(n/2) + 1
- Karatsuba T(n) = 3 * T(n/2) + n

- Master Theorem – solves common divide-and-conquer recurrences
  - General form: T(n) = a T(n/b) + f(n)
    - a ≥ 1: # of subproblems
    - b > 1: factor by which the subproblem size decreases
    - f(n): work to divide/merge subproblems

- Recursion tree
  - k = log_b a
  - log_b n: # of levels
  - a^i: # of subproblems at level i
  - n/b^i: size of each subproblem at level i

- Case 1 – cost dominated by cost of the leaves
  - If f(n) = O(n^(k – ε)) for some ε > 0
  - T(n) = Θ(n^k)
  - Eg T(n) = 3T(n/2) + n → T(n) = Θ(n^(log_2 3))

- Case 2 – cost evenly distributed among the levels
  - If f(n) = Θ(n^k log^p n)
  - T(n) = Θ(n^k log^(p + 1) n)
  - Eg T(n) = 2T(n/2) + Θ(n log n) → T(n) = Θ(n log^2 n)

- Case 3 – cost dominated by cost of the root
  - If f(n) = Ω(n^(k + ε)) for some ε > 0, and a f(n/b) ≤ c f(n) for some c < 1 and all sufficiently large n (holds if f(n) = Θ(n^(k + ε)))
  - T(n) = Θ(f(n))
  - Eg T(n) = 3T(n/4) + n^5 → T(n) = Θ(n^5)