May 2, 2013

Perhaps They Should Have Tested More - Chicago Board Options Exchange


Perhaps the Chicago Board Options Exchange should have tested more



The Chicago Board Options Exchange (CBOE) experienced a 3.5-hour shutdown last Thursday, April 25th, with residual effects lasting through Friday morning.
  • The outage came as CBOE celebrated its 40th anniversary last Friday.
  • The interruption left some auction processes unavailable to traders until Friday morning.
  • A software glitch shut it down for three hours, wrecking trades and shaking confidence.
  • The glitch was related to the planned reconfiguration of the exchange’s systems.
  • The root of Thursday's problem is seen as changes made to the software underpinning the systems that handle "complex" orders.
  • Outage paralyzed the biggest U.S. venue for options trading.
  • The shutdown left customers unable to manage positions in some of the options industry's most heavily traded contracts, which are available only at CBOE. 
  • Traders could not deal in CBOE’s S&P 500 Index options contract, the most active US index option, or in its popular VIX index options.
  • The market's shutdown for more than three hours created chaos for customers wanting to trade the exchange's proprietary contracts, such as options on the Standard & Poor's 500-stock index and the VIX, and left them without key hedging tools.
  • Consequences could have been much worse had the failure occurred on a busier, or more volatile, day.
  • CBOE Holdings Inc. technology staff knew of software issues in the hours before the largest U.S. options exchange suffered a three-hour outage.
  • “We have determined that the catalyst was preliminary staging work related to the planned reconfiguration of our systems in preparation for extended trading hours on CBOE Futures Exchange and eventually CBOE options,” the company said in a letter to traders. “It was this staging work, and not a systems upgrade or new systems load, that exposed and triggered a design flaw in the existing messaging infrastructure configuration.”
  • The solution has to be more investment in creaky IT infrastructure.
  • CBOE and its peers won't get away with that "it-doesn't-happen-much-around-here" explanation for very long.

Perhaps there was no way to avoid this problem (although some reports indicate that CBOE's technical staff knew of problems in advance of Thursday's outage).

Perhaps, as Chairman and Chief Executive Officer Bill Brodsky wrote in a follow-up letter to traders, "Unfortunately, the nature of a software bug is sometimes only identifiable once the system is operationally ready."

But, perhaps CBOE should have tested more.
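
If the trigger really was staging work that touched an existing configuration rather than a new code load, one inexpensive test is to diff the staged configuration against production and refuse to proceed whenever a setting that live order handling depends on has changed. The sketch below is purely hypothetical - the file names, keys, and JSON format are my assumptions, not anything CBOE has described - but it illustrates the kind of cheap, loud pre-deployment check that "testing more" can mean in practice.

```python
# Hypothetical pre-deployment check: compare a staged messaging configuration
# against production and fail if any setting the live order path depends on
# has changed. Not CBOE's actual system - names and keys are illustrative.

import json

# Illustrative settings assumed to matter to the live "complex order" path.
CRITICAL_KEYS = {"message_broker", "complex_order_queue", "max_message_size"}

def load_config(path):
    """Load a JSON configuration file into a dict."""
    with open(path) as f:
        return json.load(f)

def risky_changes(production, staged):
    """Return the critical settings whose values differ between the two configs."""
    return {
        key for key in CRITICAL_KEYS
        if production.get(key) != staged.get(key)
    }

if __name__ == "__main__":
    prod = load_config("config/production.json")
    stage = load_config("config/staged.json")
    changed = risky_changes(prod, stage)
    if changed:
        # Stop the staging work and force a review or regression run first.
        raise SystemExit(f"Staged config changes live settings: {sorted(changed)}")
    print("Staged configuration leaves live settings untouched.")
```

A check like this would not prove the staged change is safe, but it would at least turn a silent configuration difference into an explicit decision made before trading opens.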



CBOE Says Software Bug Resolved

CBOE Letter to traders explaining what happened

CBOE Preaches to Vegas Choir as ‘Glitch’ Crashes Exchange

CBOE Staff Knew of Problems in Advance

CBOE details software bug that shut down options trading

CBOE's Systems Preparation for After Hours Caused Outage

How a "Harlem Shake" shook Wall Street

Time for a Reboot For Tech on Street



And yet another CBOE outage just one week later!
Thursday's isolated delay "was triggered by a product maintenance function being performed for a symbol change to one of the classes in the cluster," CBOE CEO William Brodsky and CEO-designate Ed Tilly said in a memo to clients.
"The software maintenance function was recently updated in preparation for the extended trading hours initiative."

CBOE has 2nd outage in a week
http://www.chicagotribune.com/business/breaking/chi-cboe-2nd-outage-20130502,0,2170747.story

CBOE defers start of extended trading day after recent outages
http://www.baltimoresun.com/business/sns-rt-us-cboe-delay-brodskybre9420k4-20130503,0,7588326.story




This article originally appeared in my blog: All Things Quality
My name is Joe Strazzere and I'm currently a Director of Quality Assurance.
I like to lead, to test, and occasionally to write about leading and testing.
Find me at http://AllThingsQuality.com/.

4 comments:

  1. Isn't the CEO basically saying that sometimes production issues are inevitable? And it is fair to say that, although it's less forgivable in an industry like CBOE's.

    One thing I'm not sure about is whether they took a deliberate risk, knowing the issue would occur, or a calculated risk with a higher probability of the issue not happening. Either way, I'm sure they are in some kind of trouble with their customers.

  2. @logilife - Inevitable, meaning "testing doesn't matter"? Inevitable meaning "more testing wouldn't have helped"? Or inevitable meaning that "bugs were likely to make their way into production given the amount of money they spent on testing"? Perhaps.

  3. "inevitable" in this context would mean no matter how much and how long one may test, things like these may happen. However there is certainly a lesson to learn in each case.

  4. I agree that things like these may happen. I worry that incidents are too often tossed off as "there's nothing you can do." You can review your development, testing, and deployment practices. You can determine which parts are adequate and which parts need more time and money spent on them. Then you can choose to spend more time and money, or not.
