@exchgr I mean, that is how you can make it produce better-than-average software. Given its limitations (it's very fast, but an average developer at best), giving it clear success criteria is the way to make it produce reliable code. A really extensive test suite is one way to give it clear success criteria that hold across many sessions. Static analysis is another good way to keep the machines honest.
I've come to the conclusion that accountability checks the model can run against itself should get most of your budget.