The Unofficial VS Benchmark Project


NEW DEVELOPMENTS AS OF FEB 1, 2001:

All of the material on this page is now obsolete. I have developed COBOL performance measures for various VS models, several HP models and several IBM RS/6000 models, all without benefit of any outside help. A couple of you contacted me about this project, and for that I thank you. In the end, though, I had to do it myself and I had to wait until circumstances presented opportunities to test HP and IBM platforms on the same basis as the VS.

The numbers I have developed for about 10 different brands and models will not be disclosed just yet. I am still thinking through the implications of what I have discovered.

If you think you have a pressing need to understand VS, HP and RS/6000 performance, you are free to email me. If you have access to Microfocus or AccuCOBOL or Fujitsu COBOL on any interesting platforms and are willing to spend a little time getting my benchmark to run on your system, by all means drop me a line and we’ll talk about it.

Watch for news about COBOL ReSource.


NEW DEVELOPMENTS AS OF JULY/AUGUST 1999:

I have developed a measurement technique that appears to yield CPU speed measurements close to the nanosecond. As described in my original concept of this project (below), the technique measures COBOL verb execution speeds. That’s all we really care about. COBOL measures are far more useful than machine language measures, since few business programs in the real world are written in assembly language and none in raw machine language.

The new benchmark presently contains 43 measurements of COBOL verb and data item combinations. They are not necessarily the 43 one might most wish to see — they are the first 43 I implemented. I am presently measuring:

  • GO TO
  • simple PERFORM
  • MOVE literal zero TO comp 9(2)
  • MOVE literal zero TO comp 9(3)
  • MOVE literal zero TO comp 9(4)
  • MOVE literal zero TO comp 9(5)
  • MOVE literal zero TO comp 9(6)
  • MOVE literal zero TO comp 9(7)
  • MOVE literal zero TO comp 9(8)
  • MOVE literal 12 TO PIC Z9
  • MOVE literal 123 TO PIC ZZ9
  • MOVE literal 1234 TO PIC Z,ZZ9
  • MOVE literal 12345 TO PIC ZZ,ZZ9
  • MOVE literal 123456 TO PIC ZZZ,ZZ9
  • MOVE literal 1234567 TO PIC Z,ZZZ,ZZ9
  • MOVE literal 12345678 TO PIC ZZ,ZZZ,ZZ9
  • MOVE literal “1” TO PIC X(1)
  • MOVE literal “12” TO PIC X(2)
  • MOVE literal “123” TO PIC X(3)
  • MOVE literal “1234” TO PIC X(4)
  • MOVE literal “12345” TO PIC X(5)
  • MOVE literal length 10 TO PIC X(10)
  • MOVE literal length 20 TO PIC X(20)
  • MOVE variable length 132 TO PIC X(132)
  • MOVE literal “A” TO PIC X(2)
  • MOVE literal “A” TO PIC X(3)
  • MOVE literal “A” TO PIC X(4)
  • MOVE literal “A” TO PIC X(5)
  • MOVE literal “A” TO PIC X(10)
  • MOVE literal “A” TO PIC X(20)
  • MOVE literal “A” TO PIC X(132)
  • MOVE binary halfword TO binary halfword
  • MOVE literal TO binary halfword
  • ADD binary halfword TO binary halfword
  • SUBTRACT binary halfword FROM binary halfword
  • MULTIPLY binary halfword BY binary halfword
  • DIVIDE binary halfword INTO binary halfword
  • MOVE binary fullword TO binary fullword
  • MOVE literal TO binary fullword
  • ADD binary fullword TO binary fullword
  • SUBTRACT binary fullword FROM binary fullword
  • MULTIPLY binary fullword BY binary fullword
  • DIVIDE binary fullword INTO binary fullword

Programmatic access to a real time clock is important, but not critical. The VS has a 1/100th-second real time clock. The OS charges CPU clock usage to each task, and that, too, is available to the application. Other platforms may make CPU or “wall clock” time available to the application, and the appropriate call must be formulated for each platform.

METHODOLOGY

The methodology for making relatively precise measurements of processor speed in a given work context is as follows:

Run an empty loop many times, measuring the CPU usage, then run the same loop the same number of times with a unit of work inside it, again measuring CPU usage. Subtract the empty loop CPU usage from the work-unit loop CPU usage. The difference is a measurement of the CPU usage of the unit of work. From it, compute the derived figures: units of work per second, nanoseconds or microseconds per unit of work, and so on.

The details get a bit more complicated, but the two essentials are:

  1. Programmatic access to task or process CPU usage
  2. Running the loop enough times to improve the signal-to-noise ratio to an acceptable level
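The same differential idea can be sketched in a few lines of any language that exposes process CPU time. Here is a rough Python illustration (Python is only a stand-in; the real benchmark is COBOL, and `time.process_time` plays the role the VS EXTRACT call plays below). The payload shown is an arbitrary example, not one of the 43 measures:

```python
import time

N = 2_000_000  # loop repetitions; raise until repeated runs agree closely

def measure(payload):
    """CPU seconds consumed by N iterations of payload()."""
    start = time.process_time()
    for _ in range(N):
        payload()
    return time.process_time() - start

overhead = measure(lambda: None)       # the empty loop
work = measure(lambda: str(123456))    # the same loop plus a unit of work
net = work - overhead                  # CPU cost of N units of work
ns_per_op = net / N * 1e9              # nanoseconds per unit of work
```

The loop-call overhead cancels in the subtraction, which is the whole point of running the empty loop at all.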

If programmatic access to CPU usage is not available on a given platform, resort may be made to programmatic access to a real time “wall clock” if the machine is available for quiet-time testing when only minimal residual and unavoidable OS functions may be using CPU in addition to the benchmark program.

If no programmatic access to task CPU usage or even elapsed time is at hand, a wristwatch may still serve as the time base if the machine can be tested in quiet mode and if the loop is repeated enough times to overcome the human error in reading seconds of elapsed time from a watch. It is quite easy to reduce human timing measurement error to less than 1%.

Loop repetitions must be sufficient to overcome inherent inaccuracy resulting from gross readings of task CPU usage. All digital measurements are accurate to no more than plus or minus one tick of resolution. In the VS we have 1/100th second CPU usage ticks. If we run a loop that takes approximately one second to complete, the very nearly two tick window of error introduces a 2% uncertainty. If we run the loop for twice the repetitions, there is still only a two tick window of error, which is now only 1% of the doubled iterations. If we run the loop for 20 seconds, the error is reduced to 1/10th of 1%. At 200 seconds the error is reduced to 1/100th of 1%.

If we attempt in this manner to measure a CPU that is approximately in the 100-nanosecond speed range, a one-second test run could, at most, involve 10 million instructions (which might translate to 5 million or fewer loop iterations). A safe rule of thumb is that the loop probably cannot run at greater than one-half the fastest instruction execution rate, since at least two instructions will usually be required to comprise the empty loop. A 2-tick error in a 1-second run therefore represents no more than 100,000 loop iterations of our 100-nanosecond machine, and probably fewer, or 2%. A 2-tick error in a ten-second run represents a 0.2% error, and in a 100-second run a 0.02% error, or one part in five thousand.
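The error arithmetic above boils down to one ratio: a roughly fixed 2-tick window divided by the length of the run. A small Python sketch, using the VS 1/100th-second tick from the text:

```python
TICK = 0.01  # the VS charges CPU usage in 1/100th-second ticks

def uncertainty_pct(run_seconds, error_ticks=2):
    """Worst-case measurement uncertainty as a percentage of the run."""
    return error_ticks * TICK / run_seconds * 100.0

# The error window stays ~2 ticks regardless of run length, so the
# percentage uncertainty falls linearly as the run is lengthened:
#   1 s run -> 2%,  20 s run -> 0.1%,  200 s run -> 0.01%
```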

Boiled down, if the empty loop CPU ticks on a VS are at least several hundred, the loop is massive enough to start to give us some meaningful timings of single instructions. If the empty loop overhead runs over a thousand ticks, we’re starting to get some interesting measurement resolution.

Accuracy may be further enhanced by reducing the percentage of the measured CPU usage that is consumed by loop overhead. This is easily done by loading the loop with more than one instance of the payload instruction. If the loop contains 10 instances of the payload, then only 1/10th of the measured loop overhead comes into play in determining the execution time of one of the payload instructions. This reduces the error in the calculation of the overhead and its impact on the accuracy of the payload timing calculation.
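Put as arithmetic, the per-operation figure divides the net ticks by iterations times copies, so the overhead error is diluted across all the copies. A sketch, with hypothetical tick readings chosen only for illustration:

```python
def per_op_ns(work_ticks, oh_ticks, iterations, copies, tick=0.01):
    """Nanoseconds per payload operation when each loop iteration
    holds `copies` instances of the payload instruction.  The loop
    overhead (and its error) is spread across all the copies."""
    net_seconds = (work_ticks - oh_ticks) * tick
    return net_seconds / (iterations * copies) * 1e9

# Hypothetical readings: 10 million iterations, 10 copies per
# iteration, 2,000 work-loop ticks against 1,000 empty-loop ticks
# works out to 100 ns per operation.
```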

Accuracy and resolution are not the same thing. If, for example, we only run the loop for one second, we can still compute the instruction timing to the nanosecond, but the error will be larger than the resolution of the calculation, and so the low order digits will be meaningless. Although not a perfect proof, one empirical form of confirmation that the resolution of the measurement and calculation do not exceed the accuracy is the repeatability of the results. If we are calculating to a finer resolution than the combined accuracy of our technique and the time base allow, we would expect the trailing digits to vary wildly.

The first effort to measure VS performance objectively has not been designed to be as accurate as I suggest, but it has shown remarkably consistent readings, so perhaps I need to review my accuracy analysis — I seem to be getting more consistent results than I should be getting for the accuracy with which I think I am measuring.

DETAILS

For the VS, I am using a loop iteration count of 10 million. I am computing the results to a fraction of a nanosecond and, as expected, the digits around and below the nanosecond level appear to be unreliable. The results are, however, very consistent to within a few nanoseconds.

The methodology, in greater detail, is this:

  1. Synchronize with an increment of the task’s processor tick counter.
  2. Measure the processor usage of an empty loop executed n times.
  3. Synchronize with an increment of the task’s processor tick counter.
  4. Measure the processor usage of the same loop containing a unit of work.
  5. Subtract the empty loop processor usage from the work loop processor usage.
  6. Derive single operation timing and operations per second.
  7. Repeat steps 3 through 6 for various payloads.

The following is illustrative of the implementation, but is not the actual code used in the benchmark, so please do not copy this code and run it, as the results will not integrate with those obtained by the real benchmark program. If you’d like to help get comparable measurements of non-VS systems, contact me and I’ll send you the source code.

------------------------------------------------------------------------------------
* Synchronize with processor tick rollover...
* (This is not for accuracy, it is for consistency, by reducing the 
* wandering of processor clock tick granularity)

     MOVE LOOP-MAX TO LOOP-CTR
     CALL "EXTRACT" USING EXTRACT-KW-PROC-TICKS, TICKS-AFT.
RESYNC1.
     CALL "EXTRACT" USING EXTRACT-KW-PROC-TICKS, TICKS-BEF.
     IF TICKS-BEF = TICKS-AFT GO TO RESYNC1.

* Measure overhead of loop with no payload...

 LOOP0.
     SUBTRACT 1 FROM LOOP-CTR.
     IF LOOP-CTR > 0 GO TO LOOP0.
     CALL "EXTRACT" USING EXTRACT-KW-PROC-TICKS, TICKS-AFT.

* Now we know how many processor clock ticks the empty loop consumes...

     SUBTRACT TICKS-BEF FROM TICKS-AFT GIVING OH-TICKS.

* Measure loop with a payload...
* (The loop counter must be reloaded, since LOOP0 left it at zero)

     MOVE LOOP-MAX TO LOOP-CTR.
[ synchronize with CPU tick counter again, capturing TICKS-BEF ]
 LOOP1.
     MOVE "X" TO CHAR-LEN-1.
     SUBTRACT 1 FROM LOOP-CTR.
     IF LOOP-CTR > 0 GO TO LOOP1.
     CALL "EXTRACT" USING EXTRACT-KW-PROC-TICKS, TICKS-AFT.

* The work loop ticks, less the empty loop overhead, give the net
* processor clock ticks the 1-char MOVEs take...

     SUBTRACT TICKS-BEF FROM TICKS-AFT GIVING RESULT-MOVE-1.
     SUBTRACT OH-TICKS FROM RESULT-MOVE-1.
------------------------------------------------------------------------------------

Clearly the EXTRACT calls may require substitution by something quite different on non-VS platforms. When the benchmark is ported to non-VS systems, the time base will have to be selected and appropriately coded for each platform. LOOP-MAX and the calculations may have to be adjusted to get loop times long enough to provide the desired accuracy. In the worst case the time base may be left out entirely and a wristwatch used instead.
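On a platform with a modern runtime, the EXTRACT substitution might look like the following Python sketch (an assumption about the porting target, not part of the benchmark itself). Probing the clock's granularity matters because it is the analog of the VS 1/100th-second tick and drives how long the loops must run:

```python
import time

def cpu_seconds():
    """Per-process CPU time; a rough stand-in for the VS EXTRACT call
    on platforms that expose process CPU usage to the program."""
    return time.process_time()

# Probe the clock before trusting it: its resolution is the analog of
# the VS 1/100th-second tick, and determines the needed loop length.
info = time.get_clock_info("process_time")
```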

We can compute the overhead of the EXTRACT (or equivalent) call used to obtain the current CPU ticks charged to the task. Since the payload timing loop includes the two complementary parts of the EXTRACT overhead (the part before the capture of the CPU counter and the part after), even though in reverse order, once we know the total overhead of the EXTRACT, that too can be subtracted to get a more precise measure of net payload CPU ticks. This is likely to be equally valid on most non-VS platforms that offer access to task CPU usage measures.
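One way to compute the timing call's own overhead is to make it the payload of the same differential loop. A Python stand-in for the idea (again, `time.process_time` here merely plays the role of EXTRACT):

```python
import time

N = 200_000  # enough repetitions to lift the cost above clock noise

# Empty loop, as before...
start = time.process_time()
for _ in range(N):
    pass
empty = time.process_time() - start

# ...then the same loop with the timing call itself as the payload.
start = time.process_time()
for _ in range(N):
    time.process_time()
clocked = time.process_time() - start

per_call = (clocked - empty) / N   # CPU seconds per timing call
```

Once `per_call` is known, it can be subtracted wherever a reading includes a stray timing call, tightening the net payload figure.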

RESULTS

Yes, I have timing results for simple COBOL verb and operand scenarios from several generations of mid- and high-end VS models. I don’t, however, yet have comparable measures from other types of computer systems. Even though the measures are objective timings, they don’t mean much except in relation to other systems doing comparable work, because we are measuring the speed of execution of units of COBOL work.

So, before I publish any results, I need your help to run the benchmark on non-VS platforms that support COBOL and have some relevance in today’s business data processing world. Mainframes, unix boxes and PCs come to mind, particularly since both of the major non-VS COBOL compilers run on PCs and on various unix platforms.

If you can help, please email me, Thomas Junker, and tell me what platform and what COBOL you have under your control.

AND WHAT ABOUT MIPS?

MIPS (Millions of Instructions Per Second) has fallen out of use as a measure of CPU speed or workload, mainly because no one can agree on what mix of instructions might be “typical” for measuring MIPS. The only raw speed I am looking at in any of this is what we can presume is the machine’s maximum raw execution speed: the short-range Branch, such as most compilers generate for a COBOL GO TO. I am making no effort to develop weighted MIPS measures, since many others have tried that in past decades only to see their efforts replaced by measures of objectively real work such as database transaction rates. While the latter are also somewhat subjective, they may be developed through the study of real-world database applications that are current today.

Straight COBOL, though, is very common in the VS world, and when a VS system is threatened with replacement by another platform running Microfocus COBOL or AccuCOBOL, some of us would like to know the relative performance numbers for the VS and its challenger. It is already plainly evident that most of the business decisions involving replacement of VS systems have no technical basis whatsoever. In many cases they have no business basis, either, and in some notable cases they have crashed and burned with loss of reputation and jobs for the managements involved. A large VS doing real work is not an easy system to replace. I’d like to understand more of the reason for that, especially inasmuch as having the facts at hand might make it possible to avoid some of those disasters by showing that the economical replacement systems do not have the processing capacity of the VS they are intended to replace.

RATIONALE

Some VS customers have spent in excess of US$10 million to replace one or a pair of VS12650-class systems. Spending ten to twenty times the value of an “old” system to replace it with a newer, “cheaper” system is too bizarre for words. Furthermore, in such cases it is not uncommon that the replacement system has had to be upgraded one or more times after migration because the original proposal vastly underestimated the CPU power required to replace the VS. Had the original proposal accurately gauged the size and cost of the necessary replacement system, the project might have been deemed cost-ineffective. Once committed, though, the customer usually feels it necessary to carry through even while experiencing higher costs than anticipated.

In many cases, replacement systems bring shoddy or incomplete applications to a field where the VS application had been highly developed, thoroughly debugged, and had achieved remarkable stability. Management incompetence is evident in cases where the users are unhappy with the new system and almost the only people in the organization who fail to understand how ill-conceived the migration was are the managers who pushed the project forward. It would be funny, were it not such a disaster for employees and shareholders, when companies make strident efforts to replace a VS, only to find that they cannot turn off the VS because volumes of previously unnoticed but critical traffic are unhandled by the new system and continue to be processed on the VS. They then bear the cost of the new system without the savings from removing the old one. Oddly, the managements that find themselves painted into that corner still celebrate their success at having “replaced the Wang.”

The saddest cases are those where, a couple of years later, the actors have moved on to new jobs while the organization has to live with the consequences of a poorly planned and executed migration from a system that was either doing the job perfectly well to begin with or only needed modest upgrade to perform it well.

My hope is that the Unofficial VS Benchmark Project can give us a helpful view of the relative data processing speeds of the VS and its competitors. I have long suspected that, when measured in terms of comparable work, at least some of the competitive systems are poor performers. My reason for suspecting that is that the VS has a DP instruction set, while many other platforms have to perform in CPU-costly software many of the basic DP operations that the VS does in hardware. If another machine has a clock speed four times faster than the VS but typically executes COBOL verbs one-tenth as efficiently, the VS still comes out on top. Being able to show that may help thwart some of the mindless attacks being waged on our valiant and loyal servant, the VS. If, on the other hand, my suspicion turns out to be wrong or only marginally realistic, I’d just as soon know that rather than keep guessing about it.

Thomas Junker
August 28, 1999


ORIGINAL DESCRIPTION OF THE UNOFFICIAL VS BENCHMARK PROJECT:

I would like to compare VS models with non-Wang computers. Since I have little or no access to non-Wang computers (at least not the kind of access that would allow me to construct and run benchmark programs), I would like the help of others who do. If you have programming access to any current non-Wang computer and find this project interesting, you can help at the cost of very little effort.

The idea is to define and implement a set of simple benchmark tests in a common language, tests that can let us see how the VS stacks up against the machines that are winning over the converts from the VS community and also the machines that are commonly in use whether or not they directly draw users away from the VS. Frankly, I have no idea how it will turn out, but I will publish the results regardless. If people are buying silicon snake oil when they abandon the VS, I’d like to be able to show the strengths of the VS. If the VS has weaknesses, I think it would be useful to show those, too, to allow people to make informed choices and to put pressure on Wang to do even better than they have so far done.

Some thoughts on measures I think would be useful, and which I will happily modify and update as I hear from interested parties and commentators, are:

  • Pure, raw processor speed, as fast as it gets
  • Binary integer arithmetic
  • Floating point arithmetic
  • Short COBOL COMP add
  • Long COBOL COMP add
  • Single char move
  • Long string move
  • String concatenations
  • Source lines per second of large COBOL compile

It occurs to me that some measures of OS performance might be interesting, too:

  • File reads per second, fully cached, consecutive and indexed
  • Interprocess communication for large and small messages

These are pretty easy things to measure, provided that task or process CPU time is available to be marked before and after the benchmark code, as it is on the VS. On any platform that does not make task or process CPU usage available to the program, the crude approach of running an exceptionally lengthy benchmark on an otherwise idle system and measuring the elapsed time, by hand if necessary, can still be used.

What I’ve offered for discussion above is just a start. I have reasons for suggesting the items I have, and thoughts about how they can easily be measured. For the most part I’d like to approach this (floating point excepted) in COBOL, for several reasons. First, COBOL is nearly universal, available on almost all platforms. Second, it was designed to describe real data processing work steps. Regardless of the languages of choice on different platforms or at different times, doing decimal fraction arithmetic and moving strings around is pretty basic stuff. In fact, it’s so basic that any machine that has to jump through hoops to do those things will reveal itself to be a lot slower at doing real work than it is at achieving raw MIPS speeds. Third, there should be a close and determinable relationship between fundamental COBOL source operations and machine instructions, which allows some inferences to be drawn about machine speeds.

Email me (below) if this project interests you.