Line breaking slides


    The problem

      Dividing words into lines

      Esthetic results (big differences, how to quantify)

      optimization problem (bad news)

      Not NP complete (good news)

    What is unesthetic?

      paragraphs with big holes in them

      paragraphs with words squeezed together

      paragraphs with words pulled apart

      uneven paragraphs

    strategies

      choice: use dynamic programming or a greedy algorithm

      Define penalty function

      minimize penalty function over paragraph
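      One way to make "define a penalty function" concrete is sketched below. The slides do not say which penalty is used, so the squared-unused-width measure, the zero penalty for the final line, and the names `line_penalty` and `paragraph_penalty` are all assumptions for illustration. Line length is taken to be the plain sum of word lengths.

```python
import math

def line_penalty(word_lengths, limit, is_last=False):
    """Hypothetical penalty for one line: squared unused width."""
    used = sum(word_lengths)
    if used > limit:
        return math.inf          # overfull line: treated as not allowed
    if is_last:
        return 0                 # assumption: a short final line is free
    return (limit - used) ** 2

def paragraph_penalty(lines, limit):
    """Total penalty of a paragraph given as a list of lines of word lengths."""
    return sum(
        line_penalty(line, limit, is_last=(k == len(lines) - 1))
        for k, line in enumerate(lines)
    )
```

      Minimizing over the paragraph then means choosing the split into lines for which `paragraph_penalty` is smallest.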

    Greedy algorithms

      make the locally best choice at each step, and hope the overall result is the best

      greedy text breaking

      fill up each line until full

      l = 1;
      i = 1;
      while (i <= n)
          linestart[l] = i
          length[l] = wordlength[i]
          while (i < n && length[l] + wordlength[i+1] <= limit)
             length[l] += wordlength[i+1]
             i += 1
          endwhile
          i += 1
          l += 1
      endwhile
      
      At the end:
      
      linestart[l] has index of initial word in line l.
      length[l] has length of that line.
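      The greedy strategy can be sketched as runnable Python. This is a minimal sketch, not the course's reference code: the function name `greedy_breaks` is hypothetical, indices are 0-based rather than 1-based, and line length is the plain sum of word lengths as in the pseudocode.

```python
def greedy_breaks(word_lengths, limit):
    """Return (line_starts, line_lengths) by filling each line until full."""
    line_starts = []
    line_lengths = []
    i = 0
    n = len(word_lengths)
    while i < n:
        line_starts.append(i)
        length = word_lengths[i]      # a line always takes at least one word
        i += 1
        # keep adding words while the next one still fits
        while i < n and length + word_lengths[i] <= limit:
            length += word_lengths[i]
            i += 1
        line_lengths.append(length)
    return line_starts, line_lengths
```

      For example, `greedy_breaks([3, 2, 4, 5, 1], 6)` yields line starts `[0, 2, 3]` with lengths `[5, 4, 6]`.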
      

    What is dynamic programming?

      an optimization strategy that takes advantage of the structure of the problem

      based on divide and conquer

      based on remembering intermediate results

      Simplest case: memoizing a function

      Outside a search context, consider the Fibonacci function:
      f(n) = (n > 1) f(n-1) + f(n-2)
      (else) n
      This is inefficient (exponential time) unless we remember
      all values of f(n) once we calculate them.
      Dynamic programming works when we can simplify
      exponential divide and conquer by remembering
      a polynomial number of results of smaller problems,
      whose solutions can be used to solve the larger
      problem.
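      The effect of remembering results can be seen in a short Python sketch, using the usual base cases f(0) = 0 and f(1) = 1. The names `fib_naive` and `fib_memo` are illustrative; `functools.lru_cache` is the standard-library memoizer.

```python
from functools import lru_cache

def fib_naive(n):
    """Plain divide and conquer: recomputes the same subproblems repeatedly."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recurrence, but each f(n) is computed once and then remembered."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)
```

      Both functions return the same values, but `fib_memo` makes only a linear number of distinct calls, while the number of calls made by `fib_naive` grows exponentially with n.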

    dynamic line breaking

      search for minimum penalty function for paragraph

      we will consider the value of adding a break at every point in the paragraph

      But we can ensure that we don't need to consider every combination of the other breaks together with it

      We need to guarantee that no break we consider later can invalidate an earlier optimum

    The Dynamic line breaker

    array cost[i] is the lowest total penalty for
    breaking words i to n into lines, given that
    a line starts at word[i].

    length(i, j) = sum of word lengths from i to j.

    Function Legal(i, j) = boolean test as to
    whether the line from word[i] to word[j] is a
    legal line length.

    nextbreak[i] = if a line starts at word[i], index
    of the first word of the next line in the best
    breaking of words i to n.

    cost[n+1] = 0
    for i = n down to 1
       if length(i, n) <= optimum
          cost[i] = 0
          nextbreak[i] = n + 1
       else if Legal(i, j) for some j
          choose r such that Legal(i, r)
                and cost[r+1] + penalty(line(i, r)) is minimal
          cost[i] = cost[r+1] + penalty(line(i, r))
          nextbreak[i] = r + 1
       else
          cost[i] = infinity

    at the end, if cost[1] < infinity, there is a legal,
    optimal breaking sequence.

    To get that sequence, start at word 1 and follow the
    nextbreak array: nextbreak[1], nextbreak[nextbreak[1]],
    and so on until n + 1 is reached.
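    The backward algorithm can be sketched in Python as follows. Several details are assumptions, not taken from the slides: a line is legal when its summed word lengths fit within `limit`, the penalty is the squared unused width, the final line costs nothing if it fits, and indices are 0-based with `n` as the end sentinel. The name `break_paragraph` is hypothetical.

```python
import math

def break_paragraph(word_lengths, limit):
    """Return (total_cost, line_starts) for a minimum-penalty breaking."""
    n = len(word_lengths)
    cost = [math.inf] * (n + 1)   # cost[i]: best penalty for words i..n-1
    nextbreak = [n] * (n + 1)     # nextbreak[i]: first word of the next line
    cost[n] = 0                   # sentinel: nothing left to break
    for i in range(n - 1, -1, -1):
        length = 0
        for r in range(i, n):     # candidate line holds words i..r
            length += word_lengths[r]
            if length > limit:
                break             # longer lines starting at i are illegal too
            if r == n - 1:
                total = 0         # final line: zero penalty (assumption)
            else:
                total = cost[r + 1] + (limit - length) ** 2
            if total < cost[i]:
                cost[i] = total
                nextbreak[i] = r + 1
    if n == 0 or cost[0] == math.inf:
        return cost[0], []        # empty paragraph, or no legal breaking
    starts = []
    i = 0
    while i < n:                  # follow nextbreak from the first word
        starts.append(i)
        i = nextbreak[i]
    return cost[0], starts
```

    With word lengths `[3, 2, 2, 5]` and a limit of 6, filling lines greedily gives lines `[3, 2] [2] [5]` with penalty 1 + 16 + 0 = 17 under this measure, while the sketch finds `[3] [2, 2] [5]` with penalty 9 + 4 + 0 = 13.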
    

      dynamic line breaking discussion

      For simplicity, the algorithm given works backwards from the end of the paragraph to the beginning, which makes the special case of zero penalty for the final line easy to accommodate. There are reasons, however, to run the algorithm in the other direction. In an interactive system, the line break array is likely to be easier to process incrementally in forward order, since typed input often appears at the end of a paragraph and in any case always follows previously typed data. Making the algorithm incremental in this way requires recalculating penalties starting from before the final line, so that the special case for the last line is handled correctly.

      Breaking lines in a forward direction is also essential if paragraphs with variable line lengths are to be accommodated. While it is easy to parameterize the line length by the current line number, the line number is not known when breaking from the end of a paragraph back to the beginning.
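      A forward variant that supports per-line widths might look like the sketch below. It is one possible formulation, not the method from the slides: it tracks the line number explicitly in the table, which costs extra time and memory. The callback `width_for_line(k)` (width of line k, 0-based), the squared-unused-width penalty, the free final line, and the name `break_forward` are all assumptions.

```python
import math

def break_forward(word_lengths, width_for_line):
    """Forward DP over (lines used, words placed). Returns (cost, line_starts)."""
    n = len(word_lengths)
    if n == 0:
        return 0, []
    # best[k][j]: lowest penalty for placing words 0..j-1 on exactly k lines
    best = [[math.inf] * (n + 1) for _ in range(n + 1)]
    prev = [[None] * (n + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for k in range(n):                  # k lines already placed
        width = width_for_line(k)       # the line number is known going forward
        for i in range(n):              # next line starts at word i
            if best[k][i] == math.inf:
                continue
            length = 0
            for j in range(i, n):       # next line holds words i..j
                length += word_lengths[j]
                if length > width:
                    break
                slack = 0 if j == n - 1 else (width - length) ** 2
                total = best[k][i] + slack
                if total < best[k + 1][j + 1]:
                    best[k + 1][j + 1] = total
                    prev[k + 1][j + 1] = i
    # choose the line count that finishes the paragraph most cheaply
    k = min(range(1, n + 1), key=lambda c: best[c][n])
    if best[k][n] == math.inf:
        return math.inf, []
    total, starts, j = best[k][n], [], n
    while k > 0:                        # walk back through the stored line starts
        j = prev[k][j]
        starts.append(j)
        k -= 1
    return total, starts[::-1]
```

      For instance, with a narrower first line, `break_forward([3, 2, 2, 5], lambda k: 4 if k == 0 else 6)` returns a cost of 5 with lines starting at words 0, 1, and 3.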