The City College CUNY Class: CSC 22000 Algorithms

Professor: Peter Brass, NAC 8/208, office hours: Wed 3-4pm; email: peter@cs.ccny.cuny.edu, phjmbrass@gmail.com (homework submission)

4 HW projects: each an implementation of an algorithm in C/C++.
Interface & test code provided; the submission must pass on the prof's computer. No partial credit (pass/fail only).

Recommended reading: Kernighan/Ritchie's The C Programming Language.

Recommended compiler: gcc/g++

Grading:
median of (HW, midterm, final) + adjustments (positive for early HW submission; negative for too many resubmissions, resubmitting without fixing, or code that does not compile) = course grade.
To get an A: 1) pass all 4 HW and get an A on either the midterm or the final; or 2) pass 1 HW and get an A on both the midterm and the final. All tests are open book (printouts only, no laptop).

Look up algorithms on Wikipedia. Everything is in there.


22 Responses to The City College CUNY Class: CSC 22000 Algorithms

  1. timlyg says:

    Sorting n numbers:
    Given n numbers in an array, we want to sort them in increasing order so:

    Bubble Sort Algorithm
    idea, sample run, code, runtime (worst case scenario / upper bound).
    Two loops:
    the outer loop runs at most n times;
    each pass of the inner loop takes at most n*c time, where c is the time of one test-and-exchange.
    Upper bound for bubblesort on an array of length n: runtime < c * n^2.
    How many exchange operations do we need in the worst case?
    call (i,j) a transposition of array a[] if a[i]>a[j] and i < j.

    Each exchange in bubblesort (of two neighboring elements) decreases the number of such wrong pairs (i,j) by one. If we start with a decreasingly sorted array, then all pairs (i,j) with i < j are wrong pairs.
    The number of such pairs is n(n-1)/2, so the time required to sort a decreasing array into increasing order by bubblesort is >= c_1 * n(n-1)/2, where c_1 is the time of one exchange.

    So there are arrays of length n with runtime > c_1 * n(n-1)/2 ~ c_1 * n^2/2, where c_1 is the time of one exchange.
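
    A minimal C++ sketch of bubblesort as described above; the function name and the early-exit flag are my own additions, not the course's required interface:

    #include <utility>   // std::swap

    // Bubblesort: repeatedly exchange out-of-order neighbors.
    // Worst case (decreasing input): ~n^2/2 exchanges, O(n^2) time.
    void bubblesort(int a[], int n)
    {   for (int pass = 0; pass < n - 1; pass++)    // outer loop: at most n-1 passes
        {   bool exchanged = false;
            for (int i = 0; i + 1 < n - pass; i++)  // inner loop: test-and-exchange
                if (a[i] > a[i + 1])
                {   std::swap(a[i], a[i + 1]);
                    exchanged = true;
                }
            if (!exchanged) break;                  // already sorted: stop early
        }
    }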

  2. timlyg says:

    02/03/2016
    Asymptotic Analysis / O-Notation
    (Landau symbols)
    Define some abbreviations

    1) f(n) = O(g(n)) is an abbreviation for:

    there is a constant C such that for all n, f(n) < C g(n).
    f, g are assumed to be positive.

    2) f(n) = \Omega(g(n)) is an abbreviation for:
    there is a constant C > 0 such that for all n, f(n) > C g(n).

    3) f(n) = \Theta(g(n)) is an abbreviation for:
    there are constants C_1, C_2 > 0 such that for all n, C_1 g(n) < f(n) < C_2 g(n).

    4) f(n) = o(g(n)) is an abbreviation for:
    for all C > 0, there is an n_0 such that for all n > n_0, f(n) < C g(n).

    5) f(n) = \omega(g(n)) is an abbreviation for:
    for all C > 0, there is an n_0 such that for all n > n_0, f(n) > C g(n).

    Comparing two explicitly given functions:
    if lim f(n)/g(n) exists and is < +\infty, then f(n) = O(g(n));
    if the limit = 0, then f(n) = o(g(n));
    if the limit exists and is > 0, then f(n) = \Theta(g(n)).

    If f(n) = O(g(n)) and g is increasing,
    then \sum_{i=1}^{n} f(i) = O(n g(n)).
    Why?
    \sum_{i=1}^{n} f(i) \leq \sum_{i=1}^{n} C g(i) \leq C n g(n),
    since each of the n summands g(i) is \leq g(n) because g is increasing.

    Selectionsort / Insertionsort
    Incremental sorting: we maintain a sorted part of the array and add new elements at the right place
    Insertionsort: Want to sort array a[]
    maintain sorted part a[0]...a[i-1]
    Insert a[i] in the right place, shifting everything behind it backward.

    Analysis of insertionsort:
    The inner loop shifts at most i elements, so it takes O(i).
    The outer loop runs i = 1...(n-1),
    so the total time is \sum_{i=1}^{n-1} O(i) = O(n^2).

    Selectionsort idea:
    Want to avoid shifting the elements of the sorted array.
    We can do this by the right choice of the next element:
    if we always insert at the end, we don't have to shift.
    So: first find the largest element, put it in the last place; then find the second largest, put it in the second-to-last place... (prof's method)
    Analysis: runtime O(n^2), since finding the largest of i elements takes O(i).

    We saw three different algorithms;
    each solves the sorting problem in O(n^2),
    and each needs \Omega(n^2) for some input arrays.

    For testing a sorting function you want to generate an array which is "mixed" but whose sorted result you know.
    Select a prime p and a constant b that is not a multiple of p; then (i*b) mod p for i = 0...p-1
    is a permutation of 0, 1, ..., p-1.
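
    A sketch combining insertionsort as described with the test-array trick above; the concrete values p = 101, b = 37 and all names are my own choices:

    #include <cstdio>

    // Insertionsort: grow the sorted prefix a[0..i-1], insert a[i] by shifting.
    void insertionsort(int a[], int n)
    {   for (int i = 1; i < n; i++)
        {   int x = a[i], j = i;
            while (j > 0 && a[j - 1] > x)    // inner loop: O(i) shifts
            {   a[j] = a[j - 1]; j--; }
            a[j] = x;
        }
    }

    int main(void)
    {   const int p = 101, b = 37;           // p prime, b not a multiple of p
        int a[p];
        for (int i = 0; i < p; i++)          // "mixed" input with known sorted result
            a[i] = (i * b) % p;              // a permutation of 0..p-1
        insertionsort(a, p);
        for (int i = 0; i < p; i++)
            if (a[i] != i) { printf("FAILED at %d\n", i); return 1; }
        printf("OK\n");
        return 0;
    }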

  3. timlyg says:

    02/08/2016
    Fast Sorting Methods I: Mergesort
    Idea: Divide and Conquer
    Divide our input array into two halves, takes O(1); sort each half separately <- recursive call, takes 2 * time(n/2);
    merge the sorted arrays, takes O(n).
    Recurrence relation for the runtime: time(n) \leq 2 * time(n/2) + O(n)

    Merging two sorted arrays of combined length n takes O(n) time.
    repeat until one array is empty,
    compare first element of both arrays
    move the smaller to the output array
    then move the rest of the other array to the output array.

    Prove bound for runtime using INDUCTION
    Want to show: time(n) = O(n log n),
    means there is a C such that time(n) \leq C n log n.
    We have: time(n) \leq 2 time(n/2) + O(n),
    means there is a B such that time(n) \leq 2 time(n/2) + Bn;
    B is given, C we can choose.

    Want to show: time(n) \leq C n log n for some C (we choose).
    Have: time(n) \leq 2 time(n/2) + Bn (B given).
    Proof by induction: assume we already have, for all k < n,
    time(k) \leq C k log k;
    then time(n/2) \leq C (n/2) log(n/2) = C(n/2)log(n) - C(n/2)log 2
    (log taken to base 2, so log 2 = 1).
    So time(n) \leq 2(C(n/2)log(n) - C(n/2)log 2) + Bn = Cn log(n) - Cn log 2 + Bn.
    Choose C \geq B, then -Cn + Bn \leq 0.
    So time(n) \leq Cn log(n).

    So, mergesort needs O(n log n) to sort n numbers.

    Exercise:
    Show
    Ans:
    We have:

    we want:
    , we choose A
    Inductive Assumption: for k < n
    then
    So,
    Choose A such that

    Mergesort without recursion: breaking the array into two (even side and odd side), back in the classic Fortran days.
    When the system restricts stack usage (deep recursion), look into: ulimit -s unlimited
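
    A sketch of the merge step described above; the caller supplies the output buffer (this interface is my own, not the homework's):

    // Merge sorted a[0..n1-1] and b[0..n2-1] into out[0..n1+n2-1]: O(n1+n2).
    void merge(const int a[], int n1, const int b[], int n2, int out[])
    {   int i = 0, j = 0, k = 0;
        while (i < n1 && j < n2)             // repeat until one array is empty:
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];   // move the smaller
        while (i < n1) out[k++] = a[i++];    // then move the rest of the other
        while (j < n2) out[k++] = b[j++];    // array to the output array
    }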

  4. timlyg says:

    02/10/2016
    Prof. encouraged us to go for the CUNY Math Challenge.
    In the fall there will be the ACM programming contest: the International Collegiate Programming Contest.
    Look at the UVA Problem Archives if interested (link?).

    Quicksort
    idea: Randomized Algorithm
    Means: the algorithm makes random choices during execution, so two runs on the same data will not be the same.
    We ask for the worst case, over all inputs, of the expected runtime.

    Idea of Quicksort:
    want to sort an array of n elements.
    We choose a random element of the array as the comparison element (pivot).
    Divide the array into the elements smaller than the comparison element (they go in front) and those larger.
    Recursively sort the smaller part and the larger part.

    How to divide array a[] into two parts.
    See cxx file.

    Runtime f(n) is a random variable.

    If the pivot ends up at position i:
    recursion on the smaller part: f(i),
    on the larger part: f(n-i),
    time to divide into smaller, larger: O(n).
    Claim: this implies f(n) = O(n log n).
    Have: f(n) \leq f(i) + f(n-i) + Cn, C is given.
    Want: f(n) \leq A n log n (we choose A).
    Inductive assumption: f(k) \leq A k log k for k < n;
    then f(n) \leq A i log i + A (n-i) log(n-i) + Cn.
    Write i = xn; then i log i + (n-i) log(n-i) = n log n + n(x log x + (1-x) log(1-x))
    [the n(x log x + (1-x) log(1-x)) term is what is left over]

    Define the function H(x) = x log(x) + (1-x) log(1-x) (negative of the entropy function)
    for 0 < x < 1; H(x) \leq 0, and H(x) \leq H(1/4) for 1/4 \leq x \leq 3/4.

    H(1/4) < 0.
    Choose A \geq C/|H(1/4)|; then A n H(x) + Cn \leq 0 for splits with 1/4 \leq x \leq 3/4 (such balanced-enough splits happen with constant probability, which is where the expectation comes in), so f(n) \leq
    A n log(n) as claimed.

    Quicksort is supposed to be the fastest in practice; other comparably fast sorts are not too important.
    qsort libraries usually use the middle element as pivot (how do they even pick a good middle element?), not a randomized one, although randomization is essential to quicksort's analysis.
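
    A sketch of randomized quicksort along these lines; the homework's actual divide step is in the provided cxx file, so this is only my illustration (rand() pivot, simple in-place partition):

    #include <cstdlib>   // rand()
    #include <utility>   // std::swap

    // Sort a[lo..hi] (inclusive). Random pivot => expected O(n log n).
    void quicksort(int a[], int lo, int hi)
    {   if (lo >= hi) return;
        int p = lo + rand() % (hi - lo + 1);    // random comparison element
        std::swap(a[p], a[hi]);                 // park the pivot at the end
        int pivot = a[hi], m = lo;              // a[lo..m-1]: elements < pivot
        for (int i = lo; i < hi; i++)
            if (a[i] < pivot)
                std::swap(a[i], a[m++]);
        std::swap(a[m], a[hi]);                 // pivot into its final place
        quicksort(a, lo, m - 1);                // smaller part
        quicksort(a, m + 1, hi);                // larger part
    }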

  5. timlyg says:

    02/17/2016
    Fast sorting Methods:
    Mergesort: divide and conquer algorithm
    quicksort: randomized algorithm
    Heapsort: Use of data structure

    Data Structure:
    We give an interface; it has to implement the given operations correctly, and as fast as possible.
    Heap data structure: maintains a set of key values (or (key, object) pairs).
    Heap operations (used by heapsort): insert, findmin, deletemin.
    Many ways to implement it, all with O(log n) for all operations.

    Array-based heap
    we store the elements currently in the heap in an array H[]
    with properties:
    1) we always use a prefix of H[], so if there are currently i elements in the heap, they are in positions H[0]...H[i-1]
    2) heap order: if there are i elements in the heap,
    then heap[j] <= heap[2*j+1] if 2j+1 <= i-1,
    and heap[j] <= heap[2*j+2] if 2j+2 <= i-1

    Insert: If there are currently i elements, then by the no-gaps condition the new element needs to use the first empty space H[i]; but then it may violate heap order, so we exchange it with its parent, toward the root H[0], until heap order is satisfied. O(log n)

    Deletemin: We have to remove the element at the root H[0] (the minimum); we also have to make the heap one element shorter (no gaps).
    We have to restore heap order.

    Classical method: move the last element into the gap at the root,
    then move it toward the leaves until heap order is satisfied.
    To move item H[j], we have to compare H[2j+1] and H[2j+2]
    with H[j]; if the smaller of H[2j+1], H[2j+2] is smaller than H[j], exchange with that one.
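
    A sketch of the array-based heap with both operations; the fixed capacity and the struct layout are my own simplifications, not the course's interface:

    // Array-based min-heap: H[0..count-1], H[j] <= H[2j+1], H[j] <= H[2j+2].
    struct Heap { int H[1000]; int count; };   // fixed capacity for the sketch

    void insert(Heap *h, int key)              // O(log n)
    {   int j = h->count++;                    // first empty space (no gaps)
        h->H[j] = key;
        while (j > 0 && h->H[(j - 1) / 2] > h->H[j])    // restore heap order
        {   int t = h->H[j]; h->H[j] = h->H[(j-1)/2]; h->H[(j-1)/2] = t;
            j = (j - 1) / 2;                   // exchange toward the root
        }
    }

    int deletemin(Heap *h)                     // O(log n), classical method
    {   int min = h->H[0];
        h->H[0] = h->H[--h->count];            // last element into the gap
        int j = 0;
        while (2*j + 1 < h->count)
        {   int c = 2*j + 1;                   // compare H[2j+1] and H[2j+2]
            if (c + 1 < h->count && h->H[c + 1] < h->H[c]) c = c + 1;
            if (h->H[j] <= h->H[c]) break;     // heap order satisfied
            int t = h->H[j]; h->H[j] = h->H[c]; h->H[c] = t;
            j = c;                             // continue toward the leaves
        }
        return min;
    }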

  6. timlyg says:

    02/22/2016
    See notes

  7. timlyg says:

    02/24/2016
    See notes

  8. timlyg says:

    02/29/2016
    Topological Sorting
    Given a directed graph (vertices, directed edges)
    We want to find a sequence of the vertices such that all edges go from earlier to later vertices (edges go from left to right)
    Or show that no such sequence exists.

    If there is a directed cycle, then no such sequence exists. If no such sequence exists, then there is a directed cycle.

    Algorithm: Find a vertex with no incoming edges. Put that first, remove it from the graph. Repeat.
    If there is no such vertex, then there is a directed cycle.

    Suppose we have a directed graph in which each vertex has at least one incoming edge. Then this graph has a directed cycle.
    Proof: Choose any starting vertex, and take a predecessor of it. Repeat: we get a sequence of vertices and edges.

    This is an infinite sequence, but there is only a finite number of vertices. So there will be a first vertex that has occurred earlier; the part of the sequence between its two occurrences is a directed cycle.

    We found a directed cycle.

    Algorithm (more detail):
    Make a pass over all directed edges; keep track of the indegree of each vertex (array of indegrees, initially 0; for edge i->j increment indegree[j]).
    Takes O(E).
    Make a pass over all vertices; put the vertices of indegree 0 on a stack. Takes O(V).
    Take a vertex from the stack (if the stack is empty before all vertices are output, no such sequence exists), put it next in the output sequence. Take all edges out of that vertex (O(E) over the whole run), decrease the indegree of their end vertices; if an indegree becomes 0, put that vertex on the stack.

    Topological sorting can be done in O(V + E)
    if the vertices are numbered 0...v-1.
    (Else use a search tree or other dictionary structure to map vertex IDs to numbers.)
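
    A sketch of this algorithm for vertices numbered 0...v-1 with adjacency lists; the interface is my own:

    #include <vector>
    #include <stack>
    using namespace std;

    // Topological sort; adj[i] = list of edges i->j.
    // Returns an empty vector if there is a directed cycle. O(V + E).
    vector<int> topsort(const vector<vector<int>>& adj)
    {   int v = adj.size();
        vector<int> indeg(v, 0);
        for (int i = 0; i < v; i++)            // one pass over all edges
            for (int j : adj[i]) indeg[j]++;
        stack<int> st;
        for (int i = 0; i < v; i++)            // one pass over all vertices
            if (indeg[i] == 0) st.push(i);
        vector<int> order;
        while (!st.empty())
        {   int x = st.top(); st.pop();
            order.push_back(x);                // next in the output sequence
            for (int j : adj[x])               // remove x's outgoing edges
                if (--indeg[j] == 0) st.push(j);
        }
        if ((int)order.size() < v) order.clear();   // cycle: no such sequence
        return order;
    }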

    Median finding / Rank Order
    We are given n items, and want to find the k-th smallest of them (k = n/2 gives the median).
    If we sort them, we can just look it up. But sorting takes O(n log n).
    For k = 1 or k = n,
    we can answer in O(n).
    What about other k?

    Can be done in O(n) expected time for any k.

    Algorithm Quickselect:
    Want to find the k-th smallest of n given elements.
    Select a random comparison element (pivot) a[j]; divide the array into smaller and larger elements.
    Let b be the number of i with a[i] < a[j].
    If b < k, we recursively search for the (k-b)-th smallest among the array items >= a[j];
    else, we search for the k-th smallest among the array items < a[j].
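
    A sketch of quickselect as described (iterative instead of recursive, which is equivalent here; k is 1-based within the current subarray). The C++ library offers the same functionality as std::nth_element.

    #include <cstdlib>   // rand()
    #include <utility>   // std::swap

    // Return the k-th smallest of a[lo..hi], k = 1..(hi-lo+1). Expected O(n).
    int quickselect(int a[], int lo, int hi, int k)
    {   while (true)
        {   int p = lo + rand() % (hi - lo + 1);    // random pivot
            std::swap(a[p], a[hi]);
            int pivot = a[hi], m = lo;
            for (int i = lo; i < hi; i++)           // partition: < pivot to front
                if (a[i] < pivot) std::swap(a[i], a[m++]);
            std::swap(a[m], a[hi]);
            int b = m - lo;                         // # elements < pivot
            if (k == b + 1) return a[m];            // the pivot is the answer
            if (k <= b)      hi = m - 1;            // search the smaller part
            else { k -= b + 1; lo = m + 1; }        // search the larger part
        }
    }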

    Balanced Search Trees
    (Height-balanced trees, leaf tree model)

    *Leaf tree model: Data is stored only in the leaves; interior nodes exist just for the left/right decision. In interior nodes there is no test for n->key == query_key; that only happens in the leaf.

    *Node tree model: Each key comes with its data object; in each node, test whether n->key == query_key.

    Find function:
    - needs to encode when a node is a leaf;
    - needs to store the associated data in the leaf.
    Any convention of your choice.

    Insert a new key into the tree: go down to the leaf where the key should go.
    There is already a different key there. So:
    - Create two nodes (leaves) with the old key and the new key; the previous leaf becomes an interior node.
    - If old_key < new_key, then the old_key leaf goes left and the new_key leaf goes right (and vice versa otherwise).
    - The key in the new interior node is max(old_key, new_key).
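
    A sketch of find and insert in the leaf tree model, under one possible set of the conventions mentioned above (right == NULL marks a leaf; left subtree keys < node key <= right subtree keys; empty-tree handling and the leaf's data pointer are left out):

    #include <cstddef>   // NULL

    struct Node { int key; Node *left, *right; };   // right == NULL marks a leaf;
                                                    // in a leaf, left can hold the data

    // Find: interior nodes only decide left/right; the key test happens in the leaf.
    Node *find(Node *n, int query)
    {   while (n->right != NULL)
            n = (query < n->key) ? n->left : n->right;
        return (n->key == query) ? n : NULL;
    }

    // Insert into a nonempty tree: go down to the leaf, then split it into two
    // leaves below a new interior node whose key is max(old_key, new_key).
    void insert(Node *n, int new_key)
    {   while (n->right != NULL)
            n = (new_key < n->key) ? n->left : n->right;
        if (n->key == new_key) return;              // key already present
        Node *old_leaf = new Node{ n->key, n->left, NULL };
        Node *new_leaf = new Node{ new_key, NULL, NULL };
        n->left  = (old_leaf->key < new_leaf->key) ? old_leaf : new_leaf;
        n->right = (old_leaf->key < new_leaf->key) ? new_leaf : old_leaf;
        n->key   = (old_leaf->key > new_leaf->key) ? old_leaf->key : new_leaf->key;
    }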

  9. timlyg says:

    03/02/2016
    Balancing trees: rotations
    Universal operations - a tree can be transformed into any other search tree with the same leaves and interior keys by rotations.

    Want to show: a height-balanced tree with n leaves has height O(log n).

    Equivalently: the minimum number of leaves grows exponentially with the height,
    at least b^h for some number b > 1.

    leaves(h) = minimum number of leaves of a height-balanced tree of height h.
    Want to show: leaves(h) > b^h, for some b > 1.

    Minimal trees = Fibonacci trees:
    h = 0: 1 leaf
    h = 1: 2 leaves
    h = 2: 3 leaves
    ...
    in general leaves(h) = leaves(h-1) + leaves(h-2): one subtree of height h-1, one of height h-2 (the Fibonacci recurrence).

  10. timlyg says:

    03/07/2016
    Height Balanced Tree
    Binary search tree: in each interior node, the height of the left subtree and the height of the right subtree differ by at most one.
    HBT: insert/delete can possibly affect only the heights of the nodes on the path to the changed leaf, not other nodes.

    Rotations of tree (for insertion/deletion)

  11. timlyg says:

    03/09/2016
    O-notation: sorting functions according to their growth rate.
    So: given some functions,
    sort them in increasing growth rate.
    Given functions f(n), g(n):
    if lim f(n)/g(n) = +\infty,
    then f(n) grows faster than g(n).

    Things to remember: (log n)^K grows slower than n,
    for any constant K.

    If you have functions with n both in the base and in the exponent, rewrite everything into the exponent (f(n) = 2^{log f(n)}); e.g. n^{log n} = 2^{(log n)^2}.

    Test Review:

    Example: Give a function for which f(n+1) - f(n) = \Omega(n).
    Ans:
    Try exponent 1: f(n) = n.
    (n+1) - n = 1, which is not \Omega(n).
    Try exponent 2: f(n) = n^2.
    (n+1)^2 - n^2 = 2n+1 = \Omega(n).
    Found: f(n) = n^2.

    Example: f(n) - f(n/2) = \Theta(1)
    Ans: f(n) = log n

    Bubblesort, insertion sort, selection sort: all O(n^2).
    Mergesort, quicksort: O(n log n).
    Heapsort: O(n log n); heap insert/delete/find: O(log n).

    Decision tree: Lower bound for comparison-based sorting

    Linear time sorting with additional assumptions: counting sort, radix sort

    Linear time median selection; topological sorting

    Search Trees (leaf tree model)

    Height-balanced search trees

  12. timlyg says:

    03/28/2016
    Graph Algorithms
    1) Graph Exploration, Connectivity: Can we get there from here?
    State Space Exploration

    Graph: Given a start vertex, and a method to generate, for each vertex, its neighbors. We want to visit/generate all vertices that can be reached from the start vertex.
    (Directed graph: if u is generated as a neighbor of v, it is not required (so far) that v is generated as a neighbor of u.)

    BFS uses a queue,
    DFS uses a stack;
    both need a structure to maintain which vertices have been visited (search tree, hash table, bit field).

    Algorithm: start vertex S given
    put S on the stack / in the queue
    while stack/queue not empty
    {  take vertex x from the stack/queue
       if (x not visited)
       {  mark x as visited,
          generate all neighbors y_1...y_k of x,
          for each y_i: if (y_i not visited) insert y_i on the stack / in the queue
       }
    }

    Each vertex can be added to the queue once from each neighbor, so an upper bound for the number of add-to-queue operations is \sum_v deg(v) = 2e.
    So there are \leq 2e add-to-queue operations, and also \leq 2e remove-from-queue operations.
    So \leq 2e steps in total, where a step includes:
    - generate a neighbor O(1)
    - insert/delete from queue O(1)
    - mark visited / check if visited O(1) or O(log n)
    Total time of BFS/DFS:
    O(e) if the vertices are numbered,
    O(e log v) if the vertices have arbitrary integer IDs.

    Why BFS works:
    BFS generates/visits vertices in sequence of increasing distance from the start vertex, because the queue always has this structure.
    So we can get shortest paths from BFS.

    To use BFS to find shortest paths, we keep track, for each vertex in our queue, of the vertex from which it was visited.
    We get a tree of shortest paths pointing back to the start vertex; to get a path from the start vertex, we have to invert the sequence, e.g. by putting it on a stack.
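
    A BFS sketch that records for each vertex from where it was visited, giving the tree of shortest paths (interface mine, vertices numbered):

    #include <vector>
    #include <queue>
    using namespace std;

    // BFS from S; parent[v] = from where v was first reached (-1 = unreached).
    // Distances increase along the queue, so parent[] is a shortest-path tree.
    vector<int> bfs_tree(const vector<vector<int>>& adj, int S)
    {   vector<int> parent(adj.size(), -1);
        queue<int> q;
        parent[S] = S; q.push(S);
        while (!q.empty())
        {   int x = q.front(); q.pop();
            for (int y : adj[x])
                if (parent[y] == -1)           // not yet visited
                {   parent[y] = x; q.push(y); }
        }
        return parent;
    }
    // To print the path S -> v, follow parent[] from v and invert it on a stack.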

    DFS without state copies:
    On our stack we put the move, not the state
    Needs undirected graph: every move allows an undo.
    algorithm given

  13. timlyg says:

    03/30/2016
    Shortest paths in Graphs with Edge lengths

    Do shortest paths exist? Not if there are negative cycles: then there are no shortest paths; the length can get as negative as we want.

    If there are no negative cycles, then there are always shortest paths. Because: assume we had an infinite sequence of shorter and shorter paths; then the paths get longer and longer (more and more edges). But we have only a fixed number of vertices, so the paths have to repeat some cycle again and again, and that cycle would have to be negative.

    If there are no negative edges, then the algorithm of choice is Dijkstra's algorithm.
    Given a graph with positive edgelengths and a start vertex s, we simultaneously compute shortest paths to all vertices of the graph.
    Maintain for each vertex the length of the best path through finished vertices that we currently know.
    Initially: s finished, dist(s) = 0;
    all neighbors v1...vk of s are boundary,
    dist(vi) = edgelength(s,vi).
    Repeat until there are no boundary vertices left
    { take the boundary vertex x with dist(x) minimal,
      change x to finished.
      For each neighbor y of x
      { if (y is not finished) set y to boundary,
        dist(y) = min(dist(y), dist(x) + edgelength(x,y)) } }

    Dijkstra with a heap:
    v times: select the smallest element from the heap => total O(v log v)
    \leq 2e times (for each neighbor): either insert into the heap O(log v), or decrease a distance in the heap O(log v) => total O(e log v)
    Total time O(e log v)
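
    A Dijkstra sketch; instead of a decrease-key operation in the heap it re-inserts vertices and skips stale entries ("lazy deletion"), which keeps the same O(e log v) bound:

    #include <vector>
    #include <queue>
    #include <functional>
    using namespace std;
    typedef pair<long long,int> PLI;              // (distance, vertex)

    // adj[u] = list of (neighbor, edgelength), edgelengths >= 0.
    vector<long long> dijkstra(const vector<vector<pair<int,int>>>& adj, int s)
    {   const long long INF = 1LL << 60;
        vector<long long> dist(adj.size(), INF);
        priority_queue<PLI, vector<PLI>, greater<PLI>> heap;
        dist[s] = 0;
        heap.push({0, s});
        while (!heap.empty())
        {   auto [d, x] = heap.top(); heap.pop(); // boundary vertex, dist minimal
            if (d > dist[x]) continue;            // stale entry: x already finished
            for (auto [y, len] : adj[x])          // update x's neighbors
                if (dist[x] + len < dist[y])
                {   dist[y] = dist[x] + len;
                    heap.push({dist[y], y});
                }
        }
        return dist;
    }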

  14. timlyg says:

    04/04/2016
    Minimum Spanning Trees
    Undirected graph with edgelengths, edgelengths >= 0
    We want the shortest subgraph that connects all vertices. Tree that connects all vertices.

    Prim's Algorithm
    Choose a start vertex (any).
    Look at all outgoing edges of the current tree; add the cheapest outgoing edge.
    Repeat until everything is connected.

    Need a heap, and a structure to keep track of visited vertices.

    Choose a start vertex, mark the start vertex visited.
    All others: not visited.
    Insert all outgoing edges of the start vertex into the heap; repeat while the heap is not empty
    {
      take the minimum weight edge ab from the heap
      if a visited && b not visited:
        mark b visited, select edge ab for the tree, add all outgoing edges of b to the heap
      else if b visited && a not visited:
        mark a visited, select edge ab for the tree, add all outgoing edges of a to the heap
      (else both are visited: discard the edge)
    }

    Analysis: a loop which enters and deletes each edge into/from the heap;
    each edge is entered into the heap at most twice
    (once from each end point).
    Entering/deleting from the heap takes O(log V) time.
    Marking vertices as visited and looking up visited status, if done by a search tree, takes O(log V).
    We do this for each edge, so total O(E log V).

    Need to show correctness:
    the tree that Prim's algorithm produces is really the minimum spanning tree.
    Assume the opposite: there is a spanning tree
    whose total length is smaller than that of the tree Prim's algorithm produces.

    T_prim: tree produced by Prim's algorithm
    T_opt: the minimum spanning tree
    Assumption: length(T_opt) < length(T_prim)
    Let ab be the first edge chosen by Prim's algorithm that is not in T_opt,
    so all edges chosen by Prim's algorithm before ab are also in T_opt.

    1) Look up Minimum Spanning Tree, Prim's Algorithm on Wikipedia.
    2) Can fine-tune the algorithm; especially, it should stop earlier: stop when all vertices are connected.
    3) Kruskal's algorithm

  15. timlyg says:

    04/06/2016
    MST: Prim, Kruskal's Algorithm
    Kruskal

    Set union structure: root-directed trees
    with union by rank
    Each vertex gets one outgoing edge/pointer to another vertex
    if that pointer is NULL, the vertex is a root.
    Initially every vertex is root
    Later: some vertices point to other vertices, and we can follow pointers until we find root.

    Same-component test for vertices u, v: for both vertices follow the pointers until you find the root;
    if root(u) = root(v): same component,
    else: different components.

    To join the components containing vertices u, v:
    find root(u), root(v),
    test that they are distinct, point one to the other.

    Each vertex gets an integer,
    the rank; initially each rank = 0.
    We always point the smaller-rank root to the larger-rank root on joining;
    if the ranks are the same, we increment the rank of the new root.

    Along any path the ranks are increasing, so the length of any path is <= the maximum rank.
    A component whose root has rank k has at least 2^k vertices, so the maximum rank is <= log v.

    Complexity of same_component & join_component test is O(log v)
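
    A sketch of the set-union structure with union by rank as described; the fixed array size is a simplification of mine (no path compression, matching the O(log v) bound above):

    // Set-union structure: root-directed trees with union by rank.
    struct SetUnion
    {   int parent[10000], rank_[10000];     // fixed size for the sketch
        void init(int n)
        {   for (int i = 0; i < n; i++) { parent[i] = -1; rank_[i] = 0; } }
        int root(int u)                      // follow pointers to the root
        {   while (parent[u] != -1) u = parent[u];
            return u;
        }
        bool same_component(int u, int v) { return root(u) == root(v); }
        void join(int u, int v)              // point smaller-rank root to larger
        {   int ru = root(u), rv = root(v);
            if (ru == rv) return;            // already in the same component
            if (rank_[ru] < rank_[rv]) parent[ru] = rv;
            else if (rank_[rv] < rank_[ru]) parent[rv] = ru;
            else { parent[rv] = ru; rank_[ru]++; }   // equal ranks: increment
        }
    };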

    Kruskal's Alg.:
    First: sort the edges, takes O(E log E) = O(E log V).
    Then for each edge (in increasing order of length), do the same-component test, possibly join components: O(E log V).

    End of distance problems in graphs.

    Next:
    Maximum Flows in Graphs
    (bottleneck connections).
    Directed graph, capacities for each edge.
    Assign each edge a flow value:
    1) flow value <= edge capacity
    2) flow value >= 0
    3) for all vertices but s, t (source and target)
    we have inflow = outflow.
    Flow preservation: \sum_u flow(u,v) = \sum_w flow(v,w).

    An assignment of flow values to directed edges that satisfies these conditions is a valid flow. Maximum Flow problem:
    maximize the flow from s to t
    = the maximum outflow of s in any valid flow.

    Application in UPS?

  16. timlyg says:

    04/11/2016
    Maximum Flow Problems
    Given a directed graph
    each edge has a capacity
    there are two special vertices s (source), t (sink);
    aim: maximize outflow of s = inflow of t.

    A valid flow
    assigns each directed edge a flow value
    which is between 0 and the capacity of the edge, such that for each vertex (excluding s, t)
    the inflow of the vertex = the outflow of the vertex.

    Idea of algorithm
    Residual capacity graph
    given a flow, the residual capacity graph represents the possible changes of the flow

    Residual capacities:
    a new directed graph on the same vertex set (again a flow problem);
    each edge in the original graph is replaced by two edges:
    (u,v): forward edge, residual capacity: capacity(u,v) - flow(u,v)
    (v,u): backward edge, residual capacity: flow(u,v)

    ---------
    If the original graph already had two edges (u,v), (v,u),
    then we generate forward and backward edges for both, and merge those in the same direction.

    Cut: divide vertex set into two classes, one containing s, the other containing t.
    Capacity of the cut:
    The sum of edge capacities of edges going across the cut from s-side to the t-side

    Since any flow from s to t has to cross the cut: for any flow and any cut,
    flow from s to t <= capacity of the cut separating s, t.

    Algorithm:
    given the capacities and s, t,
    set the initial flow to 0 on all edges.
    repeat {
      construct the residual capacity graph
      find a path from s to t with residual capacities > 0
      increase the flow along that path by the path's minimum residual capacity
    }
    until no path is found. Then the flow is maximal:
    in the final residual capacity graph
    there is no path from s to t,
    so let X be the vertices that can be reached from s along edges with residual capacity > 0, and
    Y be the remaining vertices;
    then X, Y is a cut separating s, t such that the capacity of this cut = the flow.

    All edges going from X to Y have residual capacity 0: their capacity is fully used.

    All edges from Y to X have residual capacity 0 on their backward edge, so they are not used at all.

    Max-flow min-cut theorem.

    One iteration takes O(E);
    each iteration increases the flow value by at least 1,
    so the number of iterations is \leq the maximum flow,
    so time O(E * maxflow).
    If all edge capacities are \leq C,
    then the outflow of s is \leq C*V, so time O(C*E*V).

    This algorithm is only pseudo-polynomial: the runtime depends on the capacity values, not just on the size of the graph.

    Look up preflow-push (push-relabel).
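
    A sketch of the augmenting-path algorithm above on a capacity matrix, using BFS to find a path with residual capacity > 0; the interface and the matrix representation are my own choices:

    #include <vector>
    #include <queue>
    #include <algorithm>
    using namespace std;

    // cap[u][v] = capacity of edge (u,v); cap is copied and then holds the
    // residual capacities. Returns the maximum flow value. O(E * maxflow).
    int maxflow(vector<vector<int>> cap, int s, int t)
    {   int n = cap.size(), flow = 0;
        while (true)
        {   vector<int> pred(n, -1);               // BFS for an s-t path with
            queue<int> q;                          // residual capacity > 0
            pred[s] = s; q.push(s);
            while (!q.empty() && pred[t] == -1)
            {   int u = q.front(); q.pop();
                for (int v = 0; v < n; v++)
                    if (pred[v] == -1 && cap[u][v] > 0) { pred[v] = u; q.push(v); }
            }
            if (pred[t] == -1) return flow;        // no path left: flow is maximal
            int m = 1 << 30;                       // minimum capacity on the path
            for (int v = t; v != s; v = pred[v]) m = min(m, cap[pred[v]][v]);
            for (int v = t; v != s; v = pred[v])
            {   cap[pred[v]][v] -= m;              // forward: capacity used up
                cap[v][pred[v]] += m;              // backward: can be undone
            }
            flow += m;
        }
    }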

  17. timlyg says:

    04/13/2016
    May 18 Final
    May 11 Final Review

    Dynamic Programming

    Example: evaluate the n-th Fibonacci number:
    f(n) = f(n-1) + f(n-2), f(1) = f(2) = 1
    Bad idea:
    int f(int n)
    { if (n <= 2) return 1;
      else return f(n-1) + f(n-2);
    }
    The total number of leaves in the call tree is f(n) ~ (1.6)^n.

    Second idea: bottom up.
    Bottom-up evaluation of f(n) takes O(n).
    Top-down evaluation of f(n) takes ~(1.6)^n.
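
    The bottom-up version in code (using long long, since f(n) grows exponentially):

    // Bottom-up evaluation of f(n), n >= 1: O(n) instead of ~(1.6)^n.
    long long f(int n)
    {   long long a = 1, b = 1;            // f(1), f(2)
        for (int i = 3; i <= n; i++)
        {   long long c = a + b; a = b; b = c; }
        return b;
    }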

    Dynamic programming:
    We are given a problem that can be decomposed into smaller problems of the same type.
    Instead of recursively decomposing, we tabulate all small problems that can occur in this decomposition process,
    then solve them from smallest to largest, reducing each problem to subproblems for which we already have the answer.

    Complexity is determined by the number of subproblems we generate.
    Subproblems are typically stored in a table, so this dynamic programming table is a feature of all DP algorithms.

    Knapsack Problem
    Given items 1...n; item i has value a(i), weight b(i).
    We can carry at most weight L; which items should we select to maximize the value?

    Subproblem[i,w]: the maximum value we can achieve if the weight limit is w (1 \leq w \leq L) and we can use only items 1, ..., i.
    First column: subproblem[1,w] =
    a) a(1), if w \geq b(1)
    b) 0, else

    Fill in column i+1 (columns 1...i already finished):
    subproblem[i+1,w] =
    a) max(a(i+1) + subproblem(i, w - b(i+1)), subproblem(i,w)), if b(i+1) \leq w
    b) subproblem(i,w), else

    Complexity of finding subproblem(n,L) (our original problem): O(1) per table entry, so O(n*L) total.

    Knapsack complexity: the size of the table.
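
    A sketch of the table-filling with 0-based item indices (interface mine; assumes n >= 1):

    #include <vector>
    #include <algorithm>
    using namespace std;

    // Knapsack: value a[i], weight b[i], weight limit L.
    // table[i][w]: best value using items 0..i with weight limit w.
    int knapsack(const vector<int>& a, const vector<int>& b, int L)
    {   int n = a.size();
        vector<vector<int>> table(n, vector<int>(L + 1, 0));
        for (int w = 0; w <= L; w++)                // first row: item 0 only
            table[0][w] = (w >= b[0]) ? a[0] : 0;
        for (int i = 1; i < n; i++)                 // fill row i from row i-1
            for (int w = 0; w <= L; w++)
            {   table[i][w] = table[i-1][w];        // skip item i
                if (b[i] <= w)                      // or take it
                    table[i][w] = max(table[i][w], a[i] + table[i-1][w - b[i]]);
            }
        return table[n-1][L];                       // O(n*L) table entries
    }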

    Edit distance of strings
    Define edit operations:
    insert a character,
    delete a character,
    exchange one character for another.
    The distance of two strings is the minimum cost of a sequence of edit operations that changes one string into the other.

    Edit distance of s_1...s_n to t_1...t_m:
    subproblem[i,j] = edit distance of s_1...s_i to t_1...t_j
    subproblem[0,j] = edit distance from the empty string to t_1...t_j = \sum_{k=1}^{j} insertcost(t_k)
    subproblem[i,0] = edit distance from s_1...s_i to the empty string = \sum_{k=1}^{i} deletecost(s_k)

    For i,j \geq 1:
    subproblem[i,j] = min {
    subproblem[i-1,j] + deletecost(s_i),
    subproblem[i,j-1] + insertcost(t_j),
    subproblem[i-1,j-1] + exchangecost(s_i,t_j)
    }
    (with exchangecost(x,x) = 0)

    Complexity: subproblem(n,m) = the original problem, O(1) per entry, so O(mn).

    This is not just useful for dictionaries, but also in bioinformatics.
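
    A sketch for the common special case of unit costs (insert = delete = 1, exchange = 0 for equal characters, else 1); general cost functions slot into the same recurrence:

    #include <string>
    #include <vector>
    #include <algorithm>
    using namespace std;

    // d[i][j] = edit distance of s_1...s_i to t_1...t_j.
    int editdistance(const string& s, const string& t)
    {   int n = s.size(), m = t.size();
        vector<vector<int>> d(n + 1, vector<int>(m + 1));
        for (int i = 0; i <= n; i++) d[i][0] = i;    // delete all of s_1...s_i
        for (int j = 0; j <= m; j++) d[0][j] = j;    // insert all of t_1...t_j
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
                d[i][j] = min({ d[i-1][j] + 1,                        // delete s_i
                                d[i][j-1] + 1,                        // insert t_j
                                d[i-1][j-1] + (s[i-1] != t[j-1]) });  // exchange
        return d[n][m];                              // O(n*m)
    }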

  18. timlyg says:

    04/18/2016
    Floyd-Warshall all pairs shortest path
    Look up details.

    The F-W algorithm needs O(V^3) time.
    Space can be reduced to O(V^2),
    because S[i,j,k-1] is not needed after S[i,j,k] has been computed.

    F-W code (S[i,j,k] = length of the shortest i-to-j path using only intermediate vertices 1...k):
    { create table S[n][n][n+1]
      for (i = 1...n) for (j = 1...n)
      { if (edge (i,j) exists) S[i,j,0] = edgelength(i,j);
        else S[i,j,0] = +infinity;
      }
      for (k = 1...n) for (i = 1...n) for (j = 1...n)
        S[i,j,k] = min(S[i,j,k-1], S[i,k,k-1] + S[k,j,k-1]);
    }
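
    The same algorithm in real C++ with the O(V^2) space reduction applied (S[i][j] is overwritten in place; assumes no negative cycles; INF entries and the zero diagonal are set up by the caller):

    #include <vector>
    using namespace std;

    // S[i][j] starts as edgelength(i,j) (a large INF value if there is no edge,
    // 0 on the diagonal) and ends as the shortest-path distance from i to j.
    void floyd_warshall(vector<vector<long long>>& S)  // O(V^3) time, O(V^2) space
    {   int n = S.size();
        for (int k = 0; k < n; k++)                    // allow vertex k as intermediate
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (S[i][k] + S[k][j] < S[i][j])
                        S[i][j] = S[i][k] + S[k][j];
    }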

    Single-source shortest paths with negative edgelengths allowed:
    Bellman-Ford algorithm
    (similar to Dijkstra, but easier).

    Coin change problem

  19. timlyg says:

    04/20/2016
    Complexity Classes
    Up to now: studied complexity of algorithms
    next: complexity of problems

    Sample problems:
    1. sorting n numbers
    2. find minimum spanning tree
    3. Find shortest path
    4. find the shortest closed walk that visits each vertex (traveling salesman problem)
    5. Assign three colors to vertices of a graph so that neighboring vertices have different colors

    Decision Problems: Problems with yes/no answer
    Sample problems:
    1. Is the minimum spanning tree length \leq k?
    2. Can the graph be 3-colored?
    3. Does the graph have a Hamiltonian circuit (a cycle that visits each vertex exactly once)?
    4. Can a given boolean formula be satisfied?

    Class P: decision problems for which an algorithm exists that solves each problem of size n in O(n^C) time, C is some constant.
    Polynomial time Algorithms

    Leads to P = NP problem

    Class NP: those decision problems for which an algorithm exists that takes a problem instance (size n)
    and additionally O(n^C) bits of evidence as input, and:
    if the answer to the problem is yes,
    then for some evidence the algorithm will answer "yes";
    if the answer to the problem is no,
    then for any evidence the algorithm will answer "no";
    and the algorithm runs in O(n^C) time.

    Class NP: problems for which a "yes" answer can be checked in polynomial time, given some certificate of polynomial length (some way to certify the "yes" answer).
    Example: boolean formula satisfiability is in NP,
    since given a satisfying assignment we can check it,
    and any non-satisfying assignment will be rejected.
    Sample problems in NP:
    1. Satisfiability Problem (SAT)
    2. Graph 3-colorability
    3. Hamiltonian Circuit
    4. Maximum Independent Set

    P \subseteq NP.
    The famous problem is whether P = NP:
    PhD level, unsolved.
    Also unknown: which of the inclusions P \subseteq NP \subseteq PSPACE \subseteq EXPTIME are strict (only P != EXPTIME is known), i.e. whether these classes collapse.
    There appears to be a movie about this.

    Reducibility of problems:
    I have two problems, Pr_1, Pr_2,
    and any instance of Pr_1 can be reduced by some polynomial-time preprocessing to an instance of Pr_2 with the same answer.
    If I can solve Pr_2 in polynomial time, then I can solve Pr_1 in polynomial time.
    If Pr_1 cannot be solved in polynomial time,
    then Pr_2 cannot be solved in polynomial time.
    So if Pr_1 reduces in polynomial time to Pr_2,
    then Pr_2 is at least as hard as Pr_1.

    A problem is NP-complete
    if it is in NP and every problem in NP can be reduced to it.

    Theorem: SAT is NP-complete. (Cook's theorem; proven.)
    Outline of prof's proof:
    We want to show: any problem in NP can be reduced to SAT.
    How is a problem in NP given?
    The problem is given as a piece of code that takes a problem instance and an additional certificate, and answers yes/no.
    Need to transform:
    - that piece of code
    - a problem instance of size n
    into a boolean satisfiability problem with the same answer.
    That piece of code takes O(n^C) time to execute on problem size n,
    so it needs only O(n^C) storage/memory.

    Model the possible executions of that piece of code on a computer:
    O(n^C) boolean variables corresponding to the memory cells.
    Each variable exists in O(n^C) copies, for the O(n^C) time steps.
    Table: time steps (y-axis, O(n^C)) x memory addresses (x-axis, O(n^C)):
    O(n^{2C}) boolean variables in total.
    Connect each variable from one timestep to the next:
    the possible transitions from one step to the next can also be modeled as a boolean formula.
    Get: a boolean formula of size O(n^{2C}) that describes all possible executions of the program on the input placed in the memory cells at time 0.
    Assign the problem instance to the problem input bits; leave the certificate bits open.
    Then ask the SAT solver: "are there any certificate bits such that the program run will end with yes?"
    If the answer is yes, then the answer to the original problem is yes, else no.
    So the SAT instance has the same answer as the original problem.

    For samples of NP complete problems, Prof recommended this book.

    Finite Automata = Finite Automaton = Finite State Machine (FSM)
    Turing Machine

  20. timlyg says:

    05/02/2016
    Arithmetic on Long Integers
    - addition: no problem, add two n-bit numbers in O(n)
    - subtraction: same
    - multiplication: schoolbook method O(n^2) => Karatsuba algorithm
    - division: can be done in O(time of multiplication)
    - exponentiation
    - modulo a number

    Karatsuba-Ofman algorithm for multiplication of n-bit numbers.
    Idea: recursively reduce one n-bit multiplication to n/2-bit multiplications.
    A x B, A, B n bit numbers
    A = A_upper * 2^(n/2) + A_lower
    B = B_upper * 2^(n/2) + B_lower

    A_upper, B_upper: upper n/2 bits;
    A_lower, B_lower: lower n/2 bits.

    AxB = A_upper * B_upper * 2^n + A_upper * B_lower * 2^(n/2) + A_lower * B_upper * 2^(n/2) + A_lower * B_lower * 2^0
    ----
    So one n-bit multiplication is reduced to four n/2-bit multiplications, some shifts, some additions:
    t(n) \leq 4t(n/2) + O(n).

    This recursion gives t(n) = O(n^2), no gain.

    Need to rewrite the 4 terms so that only three multiplications are necessary.

    Karatsuba-Ofman algorithm:
    Sum1 = A_upper + A_lower
    Sum2 = B_upper + B_lower
    Prod1 = Sum1 * Sum2 = A_upper * B_upper + A_upper * B_lower + A_lower * B_upper + A_lower * B_lower
    Prod2 = A_upper * B_upper
    Prod3 = A_lower * B_lower

    Thus,
    A*B = Prod2 * 2^n + (Prod1 - Prod2 - Prod3) * 2^(n/2) + Prod3 * 2^0

    Reduced one n-bit multiplication to 3 n/2-bit multiplications plus additions, subtractions, shifts:
    t(n) \leq 3t(n/2) + O(n) => t(n) = O(n^{log_2 3}) = O(n^{1.58...})
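
    A one-level illustration of the trick, multiplying two 64-bit numbers with three 32x32-bit multiplications instead of four (real long-integer code recurses on arrays of digits; __uint128_t is a gcc/g++ extension):

    typedef unsigned long long u64;
    typedef __uint128_t u128;

    // One Karatsuba step: A*B from Prod1, Prod2, Prod3 as derived above.
    u128 karatsuba64(u64 A, u64 B)
    {   u64 Au = A >> 32, Al = A & 0xffffffffULL;   // upper/lower 32 bits
        u64 Bu = B >> 32, Bl = B & 0xffffffffULL;
        u64 prod2 = Au * Bu;                        // 32x32 -> 64 bits, no overflow
        u64 prod3 = Al * Bl;
        u128 prod1 = (u128)(Au + Al) * (Bu + Bl);   // the sums can be 33 bits wide
        u128 middle = prod1 - prod2 - prod3;        // = Au*Bl + Al*Bu
        return ((u128)prod2 << 64) + (middle << 32) + prod3;
    }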

    Fourier's Transform mentioned.
    Polynomial interpolation mentioned!

    Basic operation in cryptography (RSA):
    a^b mod c,
    where a, b, c are long integers.
    First attempt: compute a^b as a*a*...*a, b times;
    takes O(b) multiplications, where b is a long integer ~2^1024.
    Not possible!

    Better way: reduce to O(log b) multiplications (after each multiplication, reduce mod c).
    Example: (123)^456 = ((123)^2)^228 = ((((123)^2)^2)^2)^57; let A = (((123)^2)^2)^2, then
    A^57 = A*(A^2)^28 = A*((A^2)^2)^14 = A*(((A^2)^2)^2)^7; let B = ((A^2)^2)^2, then
    B^7 = B*B^6 = B*(B^2)^3 = B*B^2*(B^2)^2.
    So we need only < 2 log b multiplications to compute a^b.
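
    A sketch of this square-and-multiply idea for word-sized operands (real RSA needs long-integer arithmetic; __uint128_t, a gcc/g++ extension, keeps the intermediate products from overflowing):

    typedef unsigned long long u64;

    // a^b mod c by repeated squaring: < 2 log2(b) multiplications.
    u64 modexp(u64 a, u64 b, u64 c)
    {   u64 result = 1;
        a %= c;
        while (b > 0)
        {   if (b & 1)                          // odd exponent: a^b = a * (a^2)^(b/2)
                result = (__uint128_t)result * a % c;
            a = (__uint128_t)a * a % c;         // square, then reduce mod c
            b >>= 1;
        }
        return result;
    }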

    Next, Newton's Method:
    given f,
    want to find x such that f(x) = 0

  21. timlyg says:

    05/04/2016
    Algorithms on Strings
    linear time string matching
    Given two strings s and t. We want to decide whether s is a substring of t.
    m = length(s), n = length(t)
    Does there exist an i such that for j = 0...(m-1)
    s[j] = t[i+j]

    Start recording...

    Self-overlaps of s:
    ...prefix_k(x) = suffix_k(x)

    ...outline how to construct the table of maximal self-overlaps of prefixes of s (the failure table of Knuth-Morris-Pratt matching).

  22. timlyg says:

    05/09/2016
    Traveling Salesman Problem TSP

    Given n cities that have to be visited.
    With their distances, find the closed tour which visits all cities exactly once, and has minimal cost.

    Given a graph with edge lengths.
    Want the cycle through all vertices of the graph of minimum cost.
    Assume that the edge lengths satisfy the triangle inequality d(a,c) \leq d(a,b) + d(b,c).

    Trivial algorithm for an exact solution:
    the closed tour has to include vertex 1;
    cut it open at vertex 1, then the other n-1 vertices are visited in some sequence. Try all (n-1)! sequences: complexity O((n-1)!), unreasonable already for small n.

    Better exact algorithm (Held-Karp algorithm):
    Dynamic programming idea:
    Tabulate the shortest path that starts at vertex 1,
    ends at vertex i, and uses exactly the vertex set S, with {1, i} \subseteq S \subseteq {1,...,n}.
    Do this for all S, and all i \in S.
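
    A sketch of the Held-Karp table with vertex sets as bitmasks, vertices 0-based with the tour starting at vertex 0 (interface mine):

    #include <vector>
    #include <algorithm>
    using namespace std;

    // d[i][j] = distance matrix. best[S][i] = shortest path from 0 to i using
    // exactly the vertex set S (S must contain 0 and i). O(2^n * n^2).
    int heldkarp(const vector<vector<int>>& d)
    {   int n = d.size(), FULL = 1 << n;
        const int INF = 1 << 29;
        vector<vector<int>> best(FULL, vector<int>(n, INF));
        best[1][0] = 0;                                // S = {0}: empty path
        for (int S = 1; S < FULL; S++)
        {   if (!(S & 1)) continue;                    // S must contain vertex 0
            for (int i = 0; i < n; i++)
            {   if (best[S][i] >= INF) continue;
                for (int j = 0; j < n; j++)            // extend the path to j
                    if (!(S & (1 << j)))
                        best[S | 1 << j][j] = min(best[S | 1 << j][j],
                                                  best[S][i] + d[i][j]);
            }
        }
        int ans = INF;                                 // close the tour back to 0
        for (int i = 1; i < n; i++) ans = min(ans, best[FULL-1][i] + d[i][0]);
        return ans;
    }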

    TSP is NP-complete:
    it contains as a special case the Hamiltonian circuit problem (given a graph, is there a closed tour which visits each vertex exactly once?).
    Hamiltonian circuit is also NP-complete.
    Apparently difficult problems; yet large instances have been solved exactly:
    - there are easy instances;
    - in any problem instance, the main difficulty is to show that the proposed solution is optimal.

    There are some techniques to generate lower bounds for a TSP instance; if the lower bound meets the length of the given tour, then the given tour is optimal.

    One lower bound technique: "rivers and moats".

    Approximation algorithm for TSP:
    Choose a minimum spanning tree (MST); go around the tree, skipping vertices you have already visited. By the triangle inequality, this tour is at most twice as long as the optimal tour (which itself contains a spanning tree, so MST length \leq optimal tour length).
