
Lecture 15 • 2017/03/14
  • Algorithm paradigms
    • Greedy – decompose & reduce problem – top-down approach
    • Dynamic programming – solve all possible sub-problems and use solutions to solve larger problems – bottom-up approach
  • Going back to the activity selection problem: given activities, find subset with greatest total activity duration, where no two activity intervals overlap
    • Greedy algorithm (Lecture 7) doesn’t always work, eg activities (0, 2), (1, 8), (7, 9)
      • Greedy would pick (0, 2) & (7, 9) whereas the best solution is (1, 8)
    • Binary choice
      • Let OPT(j) denote the best solution for activities 1 to j
      • Let p(j) denote the largest index i < j such that activity i is compatible with activity j
      • Case 1: j is in the optimal solution
        • Compute w(j) + OPT(p(j))
      • Case 2: j is not in optimal solution
        • Compute OPT(j – 1)
      • The maximum of the two cases denotes the optimal solution up to j
      • If we draw out the recursive calls, we’ll notice that we often compute values of OPT(i), where i < j, multiple times. Instead, we can compute those values once, store them, and reuse them next time.
      • Also notice that every OPT(j) depends only on OPT values at indices < j
  • Memoization – cache results of each subproblem; lookup as needed
  • Running time – O(n log n)
    • Sort by finishing time: O(n log n)
    • Compute p(*): O(n log n)
    • Compute each OPT in O(1) (a call either returns the memoized value, or does one sum and one max and stores the result)
      • At most 2n recursive calls, so O(n) total
  • Note that running time is O(n) for jobs presorted by start & finishing times
  • Dynamic programming goal – solve sub-problems in order so that once you reach larger problems, the sub-solutions are already computed
  • However, the algorithm as described computes only the optimal value, not the activity set itself
    • We may find the solution by backtracking
    • For every potential j, if w(j) + OPT(p(j)) > OPT(j – 1), then j is included in the set
    • Keep checking previous activities to find full set
  • Pseudocode
    /*
     * Pseudocode for finding max weight activity set
     */
    public class DynamicProgramming {
    
        int[] M //holds OPT
        int[] w //holds weight of activity
        int[] p //holds latest valid predecessor
    
        /**
         * Finds the max possible weight;
         * fills arrays above in the process
         *
         * @return max weight
         */
        int findMaxWeight() {
            M.setAllValueTo(EMPTY)
            M[0] = 0
            return findMaxWeight(M.length - 1)
        }
    
        int findMaxWeight(j) {
            if (M[j] == EMPTY)
                M[j] = max(w[j] + findMaxWeight(p[j]), findMaxWeight(j - 1))
            return M[j]
        }
    
        /**
         * Find the activity set given the completion of the arrays above
         *
         * @param j index of activity in question (start at last index)
         * @return optimal set
         */
        int[] findSolution(j) {
            if (j == 0) return []
            if (w[j] + M[p[j]] > M[j - 1]) return findSolution(p[j]) + [j]
            return findSolution(j - 1) //j is not in solution set
        }
    }
    
  • Example

      activity        1  2  3  4  5
      predecessor     0  0  2  2  3
      Best weight M   2  3  4  9  9
      w(j) + M[p(j)]  2  3  4  9  8
      M[j - 1]        0  2  3  4  9

    • Reconstruction yields a2 & a4
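  • As a sanity check, the table can be reproduced with a runnable Java sketch (bottom-up rather than memoized; the weights w = {2, 3, 1, 6, 4} are inferred from the table rows, and index 0 is a dummy so activities run 1..5):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WeightedActivities {
    // index 0 is a dummy so activities run 1..n, as in the notes
    static int[] w = {0, 2, 3, 1, 6, 4}; // weights (inferred from the example table)
    static int[] p = {0, 0, 0, 2, 2, 3}; // latest compatible predecessor
    static int[] M = new int[w.length];  // M[j] = OPT(j)

    static int findMaxWeight() {
        M[0] = 0;
        for (int j = 1; j < w.length; j++)       // bottom-up instead of memoized recursion
            M[j] = Math.max(w[j] + M[p[j]], M[j - 1]);
        return M[w.length - 1];
    }

    static List<Integer> findSolution() {
        findMaxWeight();                          // ensure M is filled
        List<Integer> set = new ArrayList<>();
        int j = w.length - 1;
        while (j > 0) {
            if (w[j] + M[p[j]] > M[j - 1]) {      // j is in the optimal set
                set.add(j);
                j = p[j];
            } else {
                j = j - 1;                        // j is not in the optimal set
            }
        }
        Collections.reverse(set);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(findMaxWeight());  // 9
        System.out.println(findSolution());   // [2, 4]
    }
}
```

    Running it reproduces M[5] = 9 and the reconstructed set {a2, a4} from the table.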
Lecture 16 • 2017/03/16
  • Pairwise Sequence Alignment
    • Goal: map letters between two strings (a & b) such that the “distance” (see below) between the strings is minimized
    • Letters must remain in order, but spaces can be added between them
  • Definitions
    • Match – letters are identical
    • Substitution – letters are different
    • Insertion – letter of b is mapped to empty character
    • Deletion – letter of a is mapped to empty character
    • Indels – group covering insertions & deletions
  • Can be used to find similarities in amino acid sequences
  • Counting alignments
    • We may observe that any alignment of a and b must end with (a, -), (a, b), or (-, b) (deletion, match/substitution, insertion)
    • If we define c(m, n) as the # of alignments between prefixes of lengths m and n, summing over those three endings gives c(m, n) = c(m – 1, n) + c(m – 1, n – 1) + c(m, n – 1)
    • Base case – c(0, n) = c(m, 0) = c(0, 0) = 1
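  • The counting recurrence can be verified directly (a minimal Java sketch; these counts are the Delannoy numbers and grow quickly, so memoization would be needed beyond small m & n):

```java
public class AlignmentCount {
    // c(m, n) = number of alignments between strings of lengths m and n
    static long c(int m, int n) {
        if (m == 0 || n == 0) return 1;  // base case: align the rest against gaps
        return c(m - 1, n)               // ends with (a, -)  deletion
             + c(m - 1, n - 1)           // ends with (a, b)  match/substitution
             + c(m, n - 1);              // ends with (-, b)  insertion
    }

    public static void main(String[] args) {
        System.out.println(c(2, 2)); // 13
    }
}
```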
  • Levenshtein Distance
    • Minimal # of substitutions, insertions & deletions to transform one into the other
    • Each of those actions adds one to the total distance
  • Edit distance
    • If every edit operation has a positive cost & an inverse operation with equal cost, the edit distance is metric
    • d(x, y) ≥ 0 (separate axiom)
    • d(x, y) = 0 iff x = y (coincidence axiom)
    • d(x, y) = d(y, x) (symmetry)
    • d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)
  • Optimal sub-structure – sub-alignments of optimal alignments are also optimal
    • Cut-and-paste proof
  • Backtracking
    • Each move associated with one edit operation
      • Vertical – deletion
      • Diagonal – match/substitution
      • Horizontal – insertion
    • Find move that was used to find the value of the cell
    • Apply recursively
  • Analysis
    • Comparing two strings of lengths m & n takes Θ(mn) time and Θ(mn) space
    • It’s easy to save space and compute values in O(m + n) space by keeping only OPT(i – 1, *) while computing OPT(i, *); however, recovering the alignment is then harder
  • Example

      *  -  A  T  T  G
      -  0  1  2  3  4
      C  1  1  2  3  4
      T  2  2  1  2  3
  • Pseudocode
    /*
     * Algorithm for finding the optimal sequence alignment for two strings
     */
    public class NeedlemanWunsch {
    
        String a, b //two strings to compare; for our sake, both strings start with '-'
        //and the first real letter is at index 1
        int[][] d //matrix holding minimum distance up to two characters
    
        //get cost for two characters
        int delta(m, n) {
            return m == n ? 0 : 1
        }
    
        /*
         * Compute lowest distance and store values for backtracking
         */
        int getLowestDistance() {
            //map the borders (base case; assumption that one of the strings is empty)
            //therefore, distances will increase by one each time
            for (int i = 0; i < a.length(); i++)
                d[i][0] = i
            for (int j = 0; j < b.length(); j++)
                d[0][j] = j
            for (int i = 1; i < a.length(); i++)
                for (int j = 1; j < b.length(); j++)
                    d[i][j] = min(d[i - 1][j] + delta(a[i], '-'), //deletion
                            d[i - 1][j - 1] + delta(a[i], b[j]), //match/substitution
                            d[i][j - 1] + delta('-', b[j])) //insertion
            return d[a.length() - 1][b.length() - 1]
        }
    
        /*
         * Find match pairs between the two strings; set i & j to last index initially
         */
        Pair[] getSolution(i, j) {
            if (i == 0 && j == 0) return []
            if (i == 0) return [Pair('-', b[j])] + getSolution(i, j - 1) //only insertions remain
            if (j == 0) return [Pair(a[i], '-')] + getSolution(i - 1, j) //only deletions remain
            if (d[i - 1][j] + delta(a[i], '-') == d[i][j])
                return [Pair(a[i], '-')] + getSolution(i - 1, j) //deletion occurred
            if (d[i - 1][j - 1] + delta(a[i], b[j]) == d[i][j])
                return [Pair(a[i], b[j])] + getSolution(i - 1, j - 1) //match/substitution occurred
            return [Pair('-', b[j])] + getSolution(i, j - 1) //insertion occurred
        }
    }
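  • A runnable distillation of getLowestDistance() above (unit costs, i.e. Levenshtein distance; here the strings are plain 0-indexed Java strings, so the '-' border convention is handled by the extra matrix row/column instead):

```java
public class EditDistance {
    // Levenshtein distance via the d-matrix from the notes (unit costs)
    static int distance(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i;  // border: delete all of a's prefix
        for (int j = 0; j <= n; j++) d[0][j] = j;  // border: insert all of b's prefix
        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++) {
                int delta = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + delta,   // match/substitution
                           Math.min(d[i - 1][j] + 1,          // deletion
                                    d[i][j - 1] + 1));        // insertion
            }
        return d[m][n];
    }

    public static void main(String[] args) {
        System.out.println(distance("CT", "ATTG")); // 3, matching the example table
    }
}
```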
    
Lecture 17 • 2017/03/21
  • To add onto the pairwise sequence alignment from the last lecture, the approach may be modified with different weightings to provide different results, eg 1 and -1 for delta in bioinformatics
  • Dijkstra’s algorithm & negative weights
    • Weighted version of BFS – priority queue rather than FIFO queue
    • Greedy choice – picks lightest edge at each step
    • How do we deal with negative weight edges?
      • Re-insertion in queue leads to exponential running time
      • Adding constants to each edge to make them positive changes the question, because paths with different numbers of edges are incremented with different values
  • Bellman-Ford algorithm
    • Allows negative-weights
    • Compute d[v] (shortest-path weights) and π[v] (predecessors) for all v ∈ V
    • Return true if no negative-weight cycles are reachable from the source, false otherwise
    • If Bellman-Ford has not converged after |V(G)| – 1 iterations, there cannot be a shortest-path tree, so there must be a negative-weight cycle (a longest path w/o cycles has at most |V(G)| – 1 edges)
  • Pseudocode
    /*
     * Find shortest path between two nodes in a graph
     * Allows for negative-weight edges and will catch negative cycles if they occur
     *
     * Time Complexity O(VE)
     */
    public class BellmanFord {
    
        /**
         * Check if shortest path exists
         *
         * @param G the graph
         * @param s source node
         * @return true if path exists, false otherwise (negative cycle)
         */
        boolean bellmanFord(G, s) {
            d.setAllValueTo(INFINITY) //initialize distances
            d[s] = 0
            for (int i = 0; i < G.size() - 1; i++)
                for (u, v in G.edges)
                    relax(u, v, w(u, v)) //set d[v] as min(d[v], d[u] + w(u, v))
            for (u, v in G.edges)
                if (d[v] > d[u] + w(u, v)) return false //mismatch found
            return true
        }
    }
    
  • Dynamic Programming in Bellman-Ford
    • Let d(i, j) = cost of shortest path from s to i with at most j hops
    • Cases
      • i = s & j = 0 → 0
      • i ≠ s & j = 0 → ∞
      • j > 0 → min(min{d(k, j – 1) + w(k, i) : (k, i) ∈ E}, d(i, j – 1))
        Either a valid predecessor's distance + the weight of the edge into i, or no change (already achieved with fewer hops)
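  • A runnable Java sketch of Bellman-Ford (the {u, v, weight} edge triples and the null return on a negative cycle are conventions of this sketch, not of the notes):

```java
import java.util.Arrays;

public class BellmanFordDemo {
    static final int INF = Integer.MAX_VALUE / 2;  // halved to avoid overflow on INF + w

    // edges[k] = {u, v, w}; returns d[] or null if a negative cycle is reachable from s
    static int[] shortestPaths(int n, int[][] edges, int s) {
        int[] d = new int[n];
        Arrays.fill(d, INF);
        d[s] = 0;
        for (int i = 0; i < n - 1; i++)         // |V| - 1 relaxation passes
            for (int[] e : edges)
                if (d[e[0]] + e[2] < d[e[1]])   // relax(u, v, w)
                    d[e[1]] = d[e[0]] + e[2];
        for (int[] e : edges)                   // one extra pass: any improvement now
            if (d[e[0]] + e[2] < d[e[1]])       // means a reachable negative cycle
                return null;
        return d;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1, 4}, {0, 2, 5}, {1, 2, -3}, {2, 3, 2}};
        System.out.println(Arrays.toString(shortestPaths(4, edges, 0))); // [0, 4, 1, 3]
    }
}
```

    Note the negative edge (1, 2, -3) still yields correct distances, which Dijkstra's greedy choice would miss.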
Lecture 18 • 2017/03/23
  • Divide & Conquer
    • Divide – split problems into smaller sub-problems
    • Conquer – solve sub-problems recursively, or use base cases
    • Combine – merge two sub-problems and eventually reach the original problem again
  • Example – Merge Sort
    • Divide – split n-elements to subsequences of n/2 elements
    • Conquer – sort recursively using merge sort; once array has size 1, simply return that array
    • Combine – merge two sorted subsequences to produce sorted answer
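  • The three steps map directly onto code (a minimal Java sketch; the copyOfRange allocations trade extra copying for brevity):

```java
import java.util.Arrays;

public class MergeSortDemo {
    static int[] sort(int[] a) {
        if (a.length <= 1) return a;                            // base case: size 1
        int mid = a.length / 2;                                 // divide
        int[] left = sort(Arrays.copyOfRange(a, 0, mid));       // conquer left half
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length)); // conquer right half
        int[] merged = new int[a.length];                       // combine
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            merged[k++] = left[i] <= right[j] ? left[i++] : right[j++];
        while (i < left.length) merged[k++] = left[i++];
        while (j < right.length) merged[k++] = right[j++];
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[]{5, 2, 8, 1, 9, 3}))); // [1, 2, 3, 5, 8, 9]
    }
}
```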
  • Multiplication – Divide & Conquer
    • For a value x with n bits, let xL = x / 2^(n/2), and let xR = x mod 2^(n/2)
    • Instead of using grade-school multiplication (O(n^2)), we may compute x * y through their components
      x * y = 2^n * xL*yL + 2^(n/2) * (xL*yR + xR*yL) + xR*yR
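  • A quick numeric check of the identity (n = 8 bits is an assumed example size; this version uses all four sub-products, whereas Karatsuba gets away with three):

```java
public class SplitMultiply {
    // Verify x * y = 2^n * xL*yL + 2^(n/2) * (xL*yR + xR*yL) + xR*yR
    static long multiply(long x, long y, int n) {
        long half = 1L << (n / 2);          // split point 2^(n/2)
        long xL = x / half, xR = x % half;  // high and low halves of x
        long yL = y / half, yR = y % half;  // high and low halves of y
        return (1L << n) * xL * yL
             + half * (xL * yR + xR * yL)
             + xR * yR;                     // four sub-products; Karatsuba would use three
    }

    public static void main(String[] args) {
        System.out.println(multiply(200, 123, 8));               // 24600
        System.out.println(multiply(200, 123, 8) == 200L * 123); // true
    }
}
```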
Lecture 19 • 2017/03/28
  • Merge sort running time
    • T(n) = execution time for size n; for merge sort, T(n) = 2 * T(n/2) + n (2 recursive calls + merge time)
    • Binary Search T(n) = T(n/2) + 1
    • Karatsuba T(n) = 3 * T(n/2) + n
  • Master Theorem – solves common divide-and-conquer runtimes
    • General: T(n) = a * T(n/b) + f(n)
    • a ≥ 1: # of subproblems
    • b > 1: factor by which the subproblem size decreases
    • f(n): work to divide/merge subproblems
  • Recursion tree
    • k = logb(a)
    • logb(n): # of levels
    • a^i: # of subproblems at level i
    • n/b^i: size of subproblem at level i
  • Case 1 – cost dominated by cost of leaves
    • If f(n) = O(n^(k – ε)) for some ε > 0
    • T(n) = Θ(n^k)
    • Eg T(n) = 3T(n/2) + n → T(n) = Θ(n^(log2(3)))
  • Case 2 – cost evenly distributed among levels
    • If f(n) = Θ(n^k * log^p(n))
    • T(n) = Θ(n^k * log^(p + 1)(n))
    • Eg T(n) = 2T(n/2) + Θ(n log n) → T(n) = Θ(n log^2(n))
  • Case 3 – cost dominated by cost of root
    • If f(n) = Ω(n^(k + ε)) for some ε > 0, and a * f(n/b) ≤ c * f(n) for some c < 1 ∀ sufficiently large n (holds if f(n) = Θ(n^(k + ε)))
    • T(n) = Θ(f(n))
    • Eg T(n) = 3T(n/4) + n^5 → T(n) = Θ(n^5)