AMAZON today has identified the cause of the error that brought much of the internet grinding to a halt this week: someone with dodgy fingers typed the wrong command.
On Wednesday morning large slabs of the internet were brought to a go-slow, as the Amazon S3 cloud server that powers a lot of back-end operations on websites went offline for nearly four hours.
Today Amazon released a post mortem of the massive failure and amid the jargon of “GET, LIST, PUT and DELETE requests” comes the key point that someone typed a command incorrectly and everything went bad.
Amazon chief technology officer Werner Vogels summed up the findings of the report with a tweet suggesting he would get a tattoo to remind himself of the failures of technology.
Amazon says the IT maintenance person was following an “established playbook” in carrying out a procedure intended to take a small number of servers in the giant cloud storage system in Virginia offline.
“Unfortunately, one of the inputs to the command was entered incorrectly,” Amazon says.
There is nothing in the official Amazon report about what the employee said when he accidentally stuffed the internet — perhaps they are saving that for a book that no doubt would be an Amazon bestseller.
According to analysis by Cyence, the four-hour outage cost some of the biggest companies in the world up to $211 million in downtime and lost performance.
Web performance monitoring company Apica reported 54 of the top 100 internet retailers were affected by the outage.
While only three websites (Lululemon, One Kings Lane and Express) went totally offline, top e-tailers affected included the Disney Store, which was running 1165 per cent slower, the US Target store, which slowed by 991 per cent, and Nike, whose performance dropped by 642 per cent.
Other services affected included Slack, Quora and the US Securities and Exchange Commission.
The Amazon site was not affected by the outage, although Amazon could not update its status page at one stage so that the dashboard incorrectly showed it was operating at full performance.
Amazon says in the post mortem that it has implemented safeguards so that any incorrect input won’t bring the whole cloud server down again.