Friday, January 2, 2009

New Benchmarks for 2009

Greetings:

Once again, for 1Q2009 (or maybe for next year), we are looking for new benchmarks. If I can't find any, we'll have to rehash the old ones with a few new twists.

1. Waltz-50
2. WaltzDB-16
3. WaltzDB-200
4. 10K Telecom
5. 100K Telecom
6. MicroBenchmarks
7. Sudoku
8. 64-Queens

The 10K and 100K Telecom benchmarks do NOT exist yet. The 10K is almost finished but, being an over-worked, out-of-work geek, I have not had time to work on it since about June or July of 2008. However, 1Q2009 is focused on Benchmarks 2009, second only to the normal first priorities of God, Family and Job. After that, the secondary focus will be ORF 2009 until October. The WaltzDB-200 has to be written for every engine except OPSJ and Drools, but it is just a matter of writing two Java classes, and that part is pretty simple. It follows the Manners 4, 8, 12, 16, 64, 128, 256 method, where we just keep adding more and more data; there is a rough sketch of the idea below. There is the problem of the code generator, of course.
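For anyone who wants to pitch in, here is roughly what one of those two Java classes amounts to. To be clear, this is NOT the actual benchmark code - the class name, the base line data and the fact format are all made up for illustration - but the scaling idea is the same: take one base region of data and stamp out N copies with offset labels.

import java.io.IOException;
import java.io.PrintWriter;

public class WaltzDBGenerator {

    // One base "region": pairs of point labels that form lines.
    // Illustrative values only - the real base drawing is different.
    private static final int[][] BASE_LINES = {
        {0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}
    };

    // Number of labels used in the base region, so copies never collide.
    private static final int LABEL_SPAN = 4;

    public static void generate(String fileName, int copies) throws IOException {
        PrintWriter out = new PrintWriter(fileName);
        try {
            for (int c = 0; c < copies; c++) {
                int offset = c * LABEL_SPAN;
                for (int[] line : BASE_LINES) {
                    // One fact per line of the drawing, offset per copy.
                    out.printf("(line %d %d)%n", line[0] + offset, line[1] + offset);
                }
            }
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // WaltzDB-200 would just be the base region replicated 200 times.
        generate("waltzdb-200.dat", 200);
    }
}

Change the copies parameter to 16, 200 or whatever and you get the whole WaltzDB series out of one class - that is all the "code generator" really has to do for the data side.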

But, back to the topic at hand: what we need are benchmarks that will stress the entire rulebase, not just a few rules. Waltz is good in that it is a general problem, but what we found (OK, Dr. Forgy found it...) was that, like Manners before it, only a couple of rules were being run most of the time. Fortunately, those two rules did stress the engine rather than just building a Fibonacci-type recursive algorithm, as Manners did.

What we need mostly is consensus. Everyone agrees that what we have is not "sufficient" but no one has (as yet) defined "sufficiency." Certainly, we can talk more about this at ORF 2009 and discuss what a real performance benchmark would contain and what it would stress. The 10K and 100K Telecom benchmarks are not "true" benchmarks - they simply determine how long a particular engine takes to process rows and rows of data from a decision table. BUT they are all that I have for those engines that cannot process "normal" if-then-else rules.
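Since I keep getting asked what "processing rows and rows of data" means in practice, here is a minimal sketch of that kind of timing harness. Again, this is hypothetical: the RuleEngine interface, assertRow() and run() are placeholders that each vendor adapter would have to implement for its own product, and the loader assumes a plain comma-separated decision table.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TelecomTimer {

    // Placeholder for whatever vendor engine is under test.
    interface RuleEngine {
        void assertRow(String[] row); // feed one decision-table row
        void run();                   // fire the rules to completion
    }

    // Naive loader - assumes one decision-table row per line,
    // comma-separated, no quoting.
    static List<String[]> loadRows(String csvFile) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        BufferedReader in = new BufferedReader(new FileReader(csvFile));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split(","));
            }
        } finally {
            in.close();
        }
        return rows;
    }

    // Wall-clock time to assert every row and run the engine once.
    static long timeRows(RuleEngine engine, List<String[]> rows) {
        long start = System.currentTimeMillis();
        for (String[] row : rows) {
            engine.assertRow(row);
        }
        engine.run();
        return System.currentTimeMillis() - start;
    }
}

Even this naive version would need a JVM warm-up pass before the timed run to be halfway fair - something to keep in mind before anyone quotes numbers.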

Comments? Suggestions? Questions? Help is needed if we, the rulebase community, are to move forward and standardize on anything. What we don't want is a simple "toy" benchmark that won't really stress the engine. I have a feeling that either Sudoku or 64-Queens might be our last resort for consensus.

Most of you have blogging rights here - USE THEM! If you don't have blogging rights, apply for them and tell me which US$1M project you have completed to qualify you for this august position and I shall certainly put you on the list of bloggers for ExSCG. :-)

SDG
jco