Today's article comes courtesy of understanding your customer's environment, and that emphasizing style over substance can break the bank.
Our story starts with an urgent request for assistance from a system administrator, a minor application update has resulted in a nearly five-fold increase in total network traffic at a remote office. On Monday morning, when the morning shift arrived network traffic began to spike. It grew, and grew, far beyond normal traffic levels for the office and stayed there, night and day, until the last shift signed off on Friday evening.
Think about this for a moment.
The application's network traffic did not increase five-fold. The entire office's network traffic consumption increased from ~2Mbps (24-hour average) to nearly 9Mbps (24-hour average) with peaks topping out at 10Mbps.
The first stage is always denial.
How could a minor update to a server-based web application that mostly consisted of a couple minor bug fixes, one new summary screen, and some UI tweaks be the problem? It must be something else, there isn't anything new or unusual in the application update.
Step 1: Isolating the Problem
Easy enough to do, by using more detailed reports we could see the initial spike in unusual network activity began at 10am on the Friday before, the same time as the outage window when the application update was deployed.
Also, there were no other maintenance activities ongoing at that time, no file transfers, and no other system updates scheduled or otherwise.
Step 2: Reproducing the Issue
To confirm, we conducted a test where all instances of the application on every workstation on the site was closed, and by so doing network traffic returned to its normal (low) stable state.
One by one, each workstation re-loaded the application and by time the last workstation was reconnected the network traffic was back at its five-fold peak. It was during this process of re-loading that the system administrator noticed something unusual. A new UI feature had been added to some screens, when a specific piece of data was in a warning state - the text box would flash orange. And every time a new screen loaded that happened to display one of these flashing orange boxes, network traffic would spike sharply.
In the end, the system administrator noticed 10 out of more than 100 workstations happened to be displaying this flashing orange box. Returning to shut down just these 10 flashing displays suddenly dropped network traffic back to normal once more. When they were re-loaded, network traffic would spike once more.
Step 3: Cause and Solution
So how can a simple CSS and HTML5 flashing text box on 10 screen possibly start pushing 4 times the total network traffic of an office of more than 100? The technique to highlight a text box had been used elsewhere in the application already, and the style control was rendered entirely in-browser with no additional server requests being made.
How can a box no larger than 400px by 300px with a lovely fade-in/fade-out HTML5 animation possibly consume 800Kbps per screen?
Animated. For the first time in this application, the developer (to satisfy a requested requirement) used a simple animation cycle to flash the control orange on a 4-second cycle. It was a lovely effect, understated, but unmistakable and was easily visible when displayed on a large screen from across the room.
What wasn't clear however, was that each workstation did not use a browser to view the application. Instead, each workstation used a Remote Desktop (RDP) session to a centralized (offsite) server which then connected to the application in a browser window.
Instead of being rendered in a local browser session and consuming no network resources at all, the beautifully animated flashing textbox was being rendered and streamed as uncompressed full-motion video over the network to the site office.
10 workstations streaming uncompressed video for 24-hours a day, 5 days a week.
Step 4: Resolution
In the end the problem was easy to solve. Rip out the animation, and using a simple steady-state orange highlight to communicate the information.
But it was an important lesson. You must be aware of how your customer is using the tools that you build, and how the simplest of design choices can cause a major impact under the wrong circumstances.