Sunday, July 3, 2016

making puzzles by solving puzzles

Constraint Satisfaction Problems

Constraint satisfaction is a useful and very general approach to solving a wide class of puzzle-y problems. If you can phrase a kind of problem as "I need to choose some values (from finite sets) for a bunch of variables, but there are constraints that rule out some combinations of values", that kind of puzzle can probably be solved with constraint satisfaction.

Example puzzles would be things like sudoku (as a constraint problem: it's assigning a decimal digit value to each of 81 variables, subject to the row/column/3x3 uniqueness constraints), crosswords (assign a letter to all the boxes, such that all contiguous boxes spell out words in the dictionary), and the classic "dinner party seating problems" (eg. there are 8 seats at a round table and 8 guests. pick seats for everyone such that Alex can't sit next to Bob, Bob must sit next to either Cathy or Dave, Dave must sit next to Alex but not Edgar, etc). 

Possible Solutions

When there are a lot of variables to choose, it's intractable to exhaustively test every combination of values to find a solution, because there's an exponential number of combinations. Imagine the tree of choices we could potentially make, where at each depth D of the tree, when we've picked the first V values from a domain D, there will be |D|^V branches in the tree, which grows too quickly for brute-force searching.

Constraints as Pruning

However, we don't need to search through most of the tree, because once a constraint makes a tree node invalid, we don't need to explore any of its children. So applying constraints prunes off whole branches of the tree of value assignments*, which eliminates an exponential number of nodes with each pruning, which makes it feasible to explore the tree.

* That is, if you don't have to make many choices for them to apply. If the constraint was "the MD5 hash of your values is 0", you can't get out of doing the exhaustive work.

Recursive Backtracking

The straightforward way to recursively guess-and-test your way to a solution is to randomly pick a value, apply any constraints using that value (which rules out values for other variables), and bailing out if we ever violate a constraint. Eg. something like:

def solve(puzzle):
if any variable has no potential values: return False
if all variables have only one value: return puzzle
pick an undecided var (ideally the one with the fewest remaining values)
for each potential value in var (ideally in most constraints order):
set var to value
apply all constraints
found = solve(puzzle)
if found: return found
  return False

To be clever, you can keep track of the set of potential values for each variable, so that when a constraint rules out a value, you can remove it from the set. Furthermore, constraints can propagate transitively from variable to variable - that is, once you've limited the possible values of one variable, that can in turn limit other variables.

When a constraint is violated by making a bad random choice, then applying the constraint makes one of the set of potential values empty, which makes us fail and undo (backtrack) to the previous guess. There's a great writeup of this technique by Peter Norvig.

Find All Solutions

Some types of puzzle are only valid if they have exactly one solution (Sudoku is an example of this). How could we distinguish between under-constrained sudoku puzzles and valid ones? Our solver stops once it hits the first valid solution. What if it didn't and kept going? We could pretty easily modify our solver into a recursive generator function, ie one that returns all solutions:
def solve_all(puzzle):
if any variable has no potential values:
  return # nothing to yield

if all variables have only one value: 
yield puzzle # the one solution
return
for all undecided vars:
found_value = False
for each potential value in var:
set var to value (in a local copy of puzzle)
apply all constraints
for y in solve_all(puzzle_copy): # assign all other vars
found_value = True
yield y
# couldn't assign any value for v, this branch is dead
if not found_value:
return

So that eg. if we gave it this underconstrained puzzle to solve:
   |  6|
 59|   |  8
2  |  8|
-----------
 45|   |
  3|   |
  6|  3| 54
-----------
   |325|  6
   |   |
   |   |

It would find many solutions, starting with these two (note the last two rows are swapped).
178|236|495
359|147|268
264|598|137
-----------
745|962|381
813|754|629
926|813|754
-----------
481|325|976
592|671|843
637|489|512


178|236|495
359|147|268
264|598|137
-----------
745|962|381
813|754|629
926|813|754
-----------
481|325|976
637|489|512
592|671|843

To make the puzzle a valid one (with only a single solution), we'd have to add more digits to the puzzle. We can pick an arbitrary blank space to reveal a number. In a sense, we can use the solver to help turn incomplete puzzles (not enough information to be valid) into every valid puzzle that it could be.

So if our fancy solver can search for all solutions, what if we gave it the empty 9x9 sudoku grid? Here's the first and the ten thousandth solution (of a huge number!) of solutions to the empty sudoku board, in sorted order (trying the numbers in order, from left to right). In a sense, our solver is now a generator! If we left it running, it would eventually print every solved valid sudoku puzzle.
#1
123|456|789
457|891|236
689|237|145
-----------
214|365|897
375|982|461
896|174|352
-----------
531|628|974
748|519|623
962|743|518

#10,000
123|456|789
457|891|236
689|237|145
-----------
214|365|897
375|982|461
896|174|352
-----------
538|729|614
942|618|573
761|543|928

Too much Information

Of course, although those puzzles are now valid, they're also already solved, which makes them bad puzzles. Ideally, our generator would show only the minimal number of clues to a valid puzzle. Since our solver knows when a puzzle is solved, though, we already know when we've added enough information!

A Backtracking Generator

To turn a simple solver into a simple generator, instead of guessing at assigning variables, we actually assign them, and keep doing that until the puzzle is (uniquely) solvable. Like the solver, it has to backtrack if it hits a dead end (if we pick incompatible values). We can avoid giving away our completed solution by maintaining two copies of the puzzle - one that's minimal (ie. the puzzle we're going to return), and the other that's fully constrained (ie. what can be deduced from the minimal one without any guessing), and we're done when the constrained one is solved.

def generate(minimal, constrained):
if failed(constrained): return False
if solved(constrained): return minimal

while 1:
pick a random variable
pick a random value from the variable's set

assign the value for minimal

assign the value for constrained
run the constraints on constrained

if the constraints fail: 
then that random value isn't valid,
so remove it from the set of values Run the constraints again. If they fail (meaning there are now no valid values for that variable):
return False

if it succeeds, return generate(minimal, constrained)

The reason we have to remove invalid values (instead of just trying another number) is to stop us from getting stuck in an infinite loop - we need to be able to backtrack multiple times to back out of a dead end that started with a grandparent.

But this generates valid mostly-empty puzzles (and the solution at the same time). Sweet!

Eg. this guy:
 38| 5 |  4
   |8  |65
9 7|   |
-----------
 6 |48 |
 71|  6|3
   | 7 |
-----------
 4 |367|8 5
   |  1|79
 26|   |

Difficulty

We could generate harder puzzles by picking variables and values via some heuristics, rather than randomly (just like the ones that improve solvers, I like the similarity). we could also do a better job of noticing when we're done, by counting whether there are 0, 1, or multiple solutions to a puzzle, ie.

def count_solutions(p):
count = 0
for y in yieldall(p):
count = count + 1
if count == 2: break
return count

Then starting the generator with this instead:
def generate(minimal, constrained):
sols = count_solutions(constrained)
if sols is 0: return False
if sols is 1: return minimal

To get the really minimal puzzles (they're probably still not difficult, though!).