# Choco and Regular Expressions

A quick update to the series on Choco and pipes last week.

An alternative way to specify the neighbour constraints is to use regular expressions. That may seem a little odd, but much of the more advanced constraints in Choco are based around DFAs (Discrete Finite Automata).

DFAs are the “science” behind regular expressions - they are what a regular expression is compiled to (at least in theory; in practice many regular expressions are not strictly regular, and many regex implementations are poor).

How does this work?

Well, a DFA is just a set of rules (transitions) associated with particular states. You start, unsuprisingly, at the start state and look at the first character; the transitions for the start state tell you, given the first character, the next state. At the next state you move according to the transitions there and the next character. And so it repeats, until you have either matched all characters, or there is no suitable transition from your current state.

That’s a pretty powerful and general approach, so building it into Choco makes for a flexible constraint engine. At least in theory. In practice, I found it a little confusing at first, but I’m starting to understand and have just got a simple test case working.

The basic idea is the values associated with variables are the “characters” in
the regular expression (for integer values you really can use a regex -
something like `"123"`

means values 1, 2 and 3 for three variables, while
`"<10><11>"`

is the syntax for 10 and 11 - but you can also construct the DFA
directly, and there are a bunch of simpler API methods, like the one I’ll use
below). Obviously, this means that along with the regular expression you need
to specify which variables it applies to (and their order, which is implicit
in the array used in the API).

Back to my simple test case. If you read the earlier article you may recall that we needed to constrain neighbouring tiles, so that pipes connect correctly. I did that by defining binary relations for neighbouring variables.

An alternative way to provide this constraint is to say that two neighbouring
cells must have values that satisfy a regular expression, where the regular
expression is simply the list of acceptable pairs. And Choco provides a
simple API for this: `Choco.regular(IntegerVariable[], List<int[]>)`

.

Here’s the new code, which replaces the earlier `constrain-pairs`

:

```
(defn constrain-pairs-regular [model n lo hi masks relation? variables swap]
(let [tuples
(java.util.ArrayList.
(map int-array
(for [m1 masks, m2 masks :when (relation? m1 m2)] [m1 m2])))]
(doseq [[v1 v2]
(map #(map variables %)
(for [i (range (dec n)), j (range n)]
(map swap [[i j] [(inc i) j]])))]
(.addConstraint model
(Choco/regular (into-array IntegerVariable [v1 v2]) tuples)))))
```

It’s almost identical to the original (a little simpler), but appears to trigger quite different behaviour in the engine. Unfortunately the new behaviour is not an improvement - the benchmarks are 2-3 times slower (presumably because the DFA machinery has more overhead than the simple binary relations).

Still, it’s an extra weapon to add to the armoury for more complex programs in the future…

As ever, latest code is in the repo.

Andrew

## Related Posts

- Pipes, Clojure and Choco (part 5: Choco)
- Pipes, Clojure and Choco - Optimisation
- Pipes, Clojure and Choco (part 3: Smart Search)
- Pipes, Clojure and Choco (part 2: Direct Search)
- An ORM for C?
- CMake with Check Unit Tests
- Pipes, Clojure and Choco (part 1: Limits of Laziness)
- How to Average Complex Responses
- Javascript Graphics w/ Raphael
- Wacom Bamboo on Linux 3.1