Note: it turns out to be faster to do subset search implicitly with chain search using Fork-Fulkerson max flow. (And is even faster using Hopcroft-Karp bipartite maximum matching.)
My Sudoku solver's disjoint subset logical reduction routine logical reduction routine detects when any n
cells in a row, column, or block share the same n
possible values. Those cells are the only cells in that row, column, or block that may contain those values, allowing the corresponding possible values for the other cells to be reduced. Menneske.no has a good explanation
of this operation.
I represent the possible values with a boost::dynamic_bitset. For a cell in a 9-wide sudoku with possible values 1, 3, and 9, the corresponding bitset would be the binary number b101000001. Imagine that another cell in the same row has possible values 2, 3, 6, and 9, and the bitset b011001001. Bitwise AND of their two bitsets would yield b001000001 - cardinality 2 for AND of 2 cells, indicating that these cells form a disjoint pair and that no other cells in the row may contain 3 or 9.
How about identifying disjoint sets of 3, 4, 5, or perhaps even 65535 cells (for a very large Sudoku puzzle)? A clustering algorithm using a modified Hamming distance operator to compare similarity rather than discrepancy might help, but the algorithms I investigated were not trivial to adapt for providing the exact solution required.
At first blush, bitwise ANDing all combinations of cells for all n
values seemed impractical. Consider that 256 choose 2 (to identify all pairs in a 256-wide Sudoku) yields 32640, and 256 choose 10 yields 278,826,214,642,518,400... Some method to avoid searching the incredibly vast number of useless cell sets is required. So, I went outside with a notepad and a beer and set about drawing Sierpinski triangles and considering methods of stacking results to avoid redundant calculation and useless sets. Some sketching and fiddling later, I found the fractal pattern, illustrated below, where A, B, ..., G are cell bitsets. AND the color on the left with the column of the same color on the right to produce the column of that color below in the next iteration.
This works quite well and solves any 3x3 (9 wide) Sudoku in less than a microsecond on my workstation. 16x16 (256 wide) puzzles still take minutes, so I'm thinking about various ways store more book keeping information and avoid the per-row, column, and block AND operations entirely. It will certainly be necessary (along with more RAM!) in order to solve 256x256 (65536-wide) puzzles.