Monday, July 28, 2014

Web Performance Standards: Finding Value in User Surveys

Studies that purport to establish or define performance standards for web page loading times typically take one of three forms: examinations and measurements of physiological traits, empirical studies based on abandonment, or surveys based on participants’ emotional response. Of these three, surveys are the least likely to produce reliable results because they are based on participants’ subjective self-assessment of their tolerance levels rather than on precise, concrete, measurable actions. A participant’s tolerance for loading times may also vary significantly with factors such as age, experience, task, time of day, and others[8].

However, the goal of defining performance standards is to establish a level at which the typical user of a web page will be satisfied. Setting a target at the point where 50% of study participants abandon the web page before it finishes loading is a poor proxy for user satisfaction. To aim for user satisfaction, the targets must be faster than the typical user’s frustration level – the emotional tipping point that must be reached before a user decides to abandon a web page. Thus surveys, by necessity, play an important part in understanding how to define an effective performance standard.

This article examines two significant web performance surveys, conducted by JupiterResearch in 2006 and Forrester Consulting in 2009, which attempt to produce generalizations about users’ satisfaction with web page performance and their tolerance thresholds. The rest of this article reviews the methodology used by these two surveys and potential deficiencies in those methodologies, and then describes an experimental survey that compares how participant responses differ when similar survey questions are asked but the method for providing a response differs. The results of the survey are then presented and compared, and conclusions are drawn regarding the value and usefulness of the research.

Published Survey-Based Standards

JupiterResearch (2006)

Retail Web Site Performance, Consumer Reaction to a Poor Online Shopping Experience. June 1, 2006 prepared for Akamai Technologies, Inc.

Key Finding:
“Overall, 28 percent of online shoppers will not wait longer than four seconds for a Web site page to load before leaving. Broadband users are even less tolerant of slow rendering. A full one-third of online shoppers with a broadband connection are unwilling to wait more than four seconds (compared with 19 percent of online shoppers with a dial-up connection).”[6]

Methodology:
JupiterResearch conducted a survey of 1,058 online shoppers. Among other questions posed to respondents, the one we are concerned with is “Typically, how long are you willing to wait for a single Web page to load before leaving the Web site? (Select one.)”.

The options presented were:
  • Less than 1 second
  • 1 to 2 seconds
  • 3 to 4 seconds
  • 5 to 6 seconds
  • More than 6 seconds

Forrester Consulting (2009)

eCommerce Web Site Performance Today, An Updated Look At Consumer Reaction To A Poor Online Shopping Experience. August 17, 2009 prepared for Akamai Technologies, Inc.

This report is directly analogous to the 2006 report by JupiterResearch, as JupiterResearch was acquired by Forrester in 2008[7].

Key Finding:
“Forty-seven percent of consumers expect a Web page to load in 2 seconds or less.”[2]

Methodology:
Forrester Consulting conducted a survey of 1,048 online shoppers. Among other questions posed to respondents, the one we are concerned with is “What are your expectations for how quickly a Web site should load when you are browsing or searching for a product?”.

The options presented were:
  • Less than 1 second
  • 1 second
  • 2 seconds
  • 3 seconds
  • More than 4 seconds

Comparison

Forrester Consulting references the previous 2006 study and attributes the difference in the key findings to increasing broadband access among US consumers. However, the report does not explain why the expectations of broadband users would increase so dramatically in three years. The 2006 survey reports that 33% of broadband users will not wait more than 4 seconds, while the 2009 survey reports that at least 47% of broadband users will not wait more than 2 seconds.

Comparing similar units, the percentage of broadband users who will not wait more than 2 seconds for a web page to load increases from 12% in 2006 to 47% in 2009. The paper offers no reason for this four-fold increase in only three years.

The 2009 paper also claims that “This methodology was consistent with the 2006 study methodology.” Although the basic format of the survey and how it was conducted remained the same, the options presented to respondents for this question are dramatically different. In 2006, 73.5% of respondents answered with “5 to 6 seconds” or “more than 6 seconds”. Respondents in 2009 were no longer given the option to make this distinction; everyone beyond the 4-second mark was lumped into a single “more than 4 seconds” category.

Because the 2009 survey presented such a limited range of options, respondents who would naturally have answered with a larger time may have reconsidered their answer when presented with options dramatically different from their initial expectation. Instead of choosing the option closest to their initial thought (more than 4 seconds), they may have selected an option closer to the middle. This raises the possibility that either or both of these surveys are subject to Central Tendency Bias or Position Bias[3].


Experimental Survey Methodology

The purpose of this survey is to determine if the choice of answer structure and the options presented in the 2006 and 2009 surveys impacted the answers given by the respondents. In order to make this determination, this survey presented a single question to respondents that closely matched the question presented in the 2006 and 2009 surveys. The differing factor is that this survey allowed respondents to answer however they wished using a free-form answer field instead of selecting a predefined option.

The primary question presented in this survey is: “When opening a typical webpage, how long (in seconds) will you wait for it to load before feeling frustrated or taking some kind of action? Taking an action may include doing something else while you wait (switching windows/tabs), reloading the page, or giving up and going somewhere else.”

Three additional demographic questions were also included in the survey:
  • Age? - Options: under 18, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, over 75.
  • Gender? - Freeform entry
  • At what level of proficiency do you use the internet? - Options: I am a web application developer, I am a content creator, I use it for work, I use it for personal use regularly (>3 times/week), I use it for personal use occasionally (<= 3 times/week).

Findings

The results of this survey are based on 78 online responses from Canada and the US. The answers given by respondents ranged from 0.5s to 60s. As shown in Table 1, this survey produced responses that were significantly higher and much more broadly distributed than the responses to the 2006 and 2009 surveys, as expected. However, this survey’s median and mode closely match those from the 2006 survey; compressing all of the freeform responses that exceeded 6 seconds into a single >6s option would have produced the same mode. In contrast, the 2009 survey paints a vastly different picture.

Table 1: Survey Average Values Comparison

           Freeform Survey    2006 Survey (Jupiter)    2009 Survey (Forrester)
  Median   5.00s              5-6s                     3s
  Mean     9.82s              5.80s*                   2.51s*
  Mode     5.00s              >6s                      3s
  SD       11.41s             1.57s*                   1.01s*
*Mean and Standard Deviation values for 2006 Survey (Jupiter) and 2009 Survey (Forrester) were calculated using the midpoint value in each option range, and using the highest value + 1s for the highest option. These calculations are not intended to be exact, but are used in this context for comparative purposes only.
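
To make the midpoint approximation concrete, the short Python sketch below computes a weighted mean and standard deviation from binned option counts using the rule described in the note above (the midpoint of each closed range, and the lower bound plus 1s for the open-ended top option). The bin counts shown are illustrative placeholders only, not the frequencies published in the original reports[2][6].

    # Approximate the mean and standard deviation of responses that were
    # reported only as option ranges, using each range's midpoint (and the
    # lower bound + 1s for the open-ended top option), per the table note.
    import math

    def binned_mean_sd(midpoints, counts):
        """Weighted mean and population standard deviation from binned data."""
        n = sum(counts)
        mean = sum(m * c for m, c in zip(midpoints, counts)) / n
        var = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / n
        return mean, math.sqrt(var)

    # 2006 (Jupiter) option midpoints: <1s, 1-2s, 3-4s, 5-6s, >6s
    midpoints_2006 = [0.5, 1.5, 3.5, 5.5, 7.0]
    # Hypothetical counts for illustration only -- NOT the published frequencies.
    counts_2006 = [20, 100, 180, 300, 458]

    mean, sd = binned_mean_sd(midpoints_2006, counts_2006)
    print(f"approx. mean = {mean:.2f}s, approx. SD = {sd:.2f}s")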

Grouping the responses to this survey into the same options presented by the 2006 and 2009 surveys produces a clearer comparison, as shown in Figure 1 and Figure 2.
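
The grouping step itself amounts to a histogram over the option boundaries. The Python sketch below bins a set of free-form responses into the 2006 option ranges; the response values are placeholders rather than the actual survey data, and the cut points between options are placed halfway across the gaps the published options leave (an assumption, since neither report states how boundary values should be treated).

    # Group free-form response times (in seconds) into the 2006 survey's
    # option ranges so the two distributions can be compared side by side.
    import numpy as np

    # Placeholder responses for illustration -- not the actual survey data.
    responses = np.array([0.5, 2, 3, 4, 5, 5, 5, 6, 8, 10, 15, 30, 60])

    # Cut points chosen halfway between adjacent options (<1s, 1-2s, 3-4s,
    # 5-6s, >6s); this boundary handling is an assumption.
    edges = [0, 1, 2.5, 4.5, 6.5, np.inf]
    labels = ["<1s", "1-2s", "3-4s", "5-6s", ">6s"]

    counts, _ = np.histogram(responses, bins=edges)
    for label, count in zip(labels, counts):
        print(f"{label:>5}: {100 * count / len(responses):.1f}%")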


Figure 1: Response Frequency by Option, Freeform vs 2006 (Jupiter) Survey


Although the 2006 survey suffers from compression at the largest interval, which contains nearly half of all responses, its results closely match the results of this survey. A slight shift towards faster web page loading time expectations can be seen between 2006, when the JupiterResearch survey was completed, and 2014, when this survey was completed. This shift is evident in the increase in the percentage of responses in the <1s and 1-2s categories and the corresponding decrease in responses in the 3-4s, 5-6s, and >6s categories.

An independent two-sample Student’s t-test[1] for unequal variances (Welch’s t-test), applied to the compressed results shown in Figure 1, produced a value of P = 0.1650.
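
The article does not spell out exactly how the compressed results were fed into the t-test, so the sketch below is one plausible reconstruction rather than the exact procedure used: each survey’s binned counts are expanded into samples at the option midpoints and an independent two-sample Welch t-test is run on the two samples with scipy. The counts are placeholders, not the real data, so the printed P value will not match the figures quoted here.

    # Welch's (unequal-variance) independent two-sample t-test on two
    # binned response distributions, expanded to their option midpoints.
    # This is a plausible reconstruction, not the article's exact procedure,
    # and the counts below are placeholders rather than the real survey data.
    import numpy as np
    from scipy import stats

    midpoints = [0.5, 1.5, 3.5, 5.5, 7.0]     # <1s, 1-2s, 3-4s, 5-6s, >6s

    counts_freeform = [4, 8, 14, 30, 22]      # placeholder bin counts
    counts_2006 = [20, 100, 180, 300, 458]    # placeholder bin counts

    sample_freeform = np.repeat(midpoints, counts_freeform)
    sample_2006 = np.repeat(midpoints, counts_2006)

    t_stat, p_value = stats.ttest_ind(sample_freeform, sample_2006,
                                      equal_var=False)
    print(f"t = {t_stat:.3f}, P = {p_value:.4f}")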

Figure 2: Response Frequency by Option, Freeform vs 2009 (Forrester) Survey


The 2009 survey results show no such similarity; the distribution of responses follows a vastly different pattern from the responses to both the 2006 survey and this survey. The same independent two-sample t-test[1] for unequal variances, applied to the compressed results shown in Figure 2, produced a value of P = 7.147 × 10⁻⁸.

Conclusions

The results of this freeform survey and the 2006 JupiterResearch survey are in close agreement (P > 0.05), which indicates that it is unlikely the structure of the question or the options presented to respondents introduced significant bias into that survey. The results of the 2009 Forrester Consulting survey, however, disagree greatly (P < 0.01), which suggests that the 2009 survey was subject to some form of bias, likely imparted by the presentation of the response options to the respondents.

All surveys that attempt to quantitatively define an emotional response (frustration) will produce results that are imprecise and limited by respondents’ ability to accurately self-evaluate. Patience is a volatile thing, fluctuating wildly between different users and within a single user depending on their current state of mind[4][5].

The freeform survey in particular is also limited by its small sample size; increasing the number of respondents to a sample size of >1,000 would provide a better comparison and strengthen confidence in the conclusions.

Business Applicability of Results

When working within a business context, it is important to understand how to apply the results of web page performance surveys when establishing a set of performance guidelines or requirements. Setting performance standards using the median result is typically ineffective – aiming for a standard at which half of your users abandon your web page out of frustration is not an effective business goal. At the opposite end, setting a standard at which every user will be satisfied with the performance of every page may be unrealistic; factors such as connection type, geographic location, and processing time can make such a target unachievable.

A more typical approach is to use a high percentile of the survey results as a business performance standard, setting a goal to meet the performance level that would satisfy 90% of your users. Figure 3 compares the three surveys and their responses that correspond to several response percentile levels.

Figure 3: Response Time Percentile Comparison


The percentile analysis of the survey responses lets us estimate the percentage of users who would be satisfied when a specific web page response time standard is achieved.
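
One way to read such a standard directly off raw tolerance responses is through low percentiles: the response time at the p-th percentile of the tolerance distribution is, approximately, the standard that would satisfy (100 - p)% of respondents. The Python sketch below illustrates this calculation on placeholder data; the actual values behind Figure 3 come from the three surveys discussed above.

    # Derive candidate performance standards from tolerance responses:
    # the p-th percentile of the tolerance distribution is (roughly) the
    # load time that would still satisfy (100 - p)% of respondents.
    import numpy as np

    # Placeholder tolerance responses in seconds -- not the survey data.
    responses = np.array([0.5, 1, 2, 2, 3, 4, 5, 5, 5, 6, 8, 10, 15, 20, 60])

    for satisfied_pct in (99, 95, 90, 80):
        threshold = np.percentile(responses, 100 - satisfied_pct)
        print(f"to satisfy ~{satisfied_pct}% of respondents, pages should "
              f"load in <= {threshold:.2f}s")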

We can observe that the percentage of users who would be satisfied with a web page performance standard of 1s or less is >=95% in the freeform survey, >=99% in the 2006 (Jupiter) survey, and >=90% in the 2009 (Forrester) survey. It follows that, in general, at least 90% of all users would be satisfied if a web page performance standard of 1s or less were achieved.

Similarly, the percentage of users who would be satisfied with a performance standard of 2s or less is >=85% in the freeform survey, >=99% in the 2006 (Jupiter) survey, and >=80% in the 2009 (Forrester) survey. Thus at least 80% of all users would be satisfied if a standard of 2s or less were achieved.

These observations assume the worst-case scenario: that the survey providing the most aggressive performance targets is the most accurate of the three. This provides a good lower bound for user satisfaction, but perhaps not a good expected level of user satisfaction. If we give each of the surveys equal weight, we can average their response time percentile values to determine an expected level of user satisfaction.

Based on the average percentile value, we see that our expected level of user satisfaction with a performance standard of 1s or less is >=99%, and for a standard of 2s or less it is >=90%. Thus given a business case where we aim to achieve at least a 90% rate of user satisfaction with our web site performance, we expect that our web page response times would need to be 2s or less.

References

[1]   Encyclopaedia Britannica (2014), Student's t-test. Available at: http://www.britannica.com/EBchecked/topic/569907/Students-t-test
[2]   Forrester Consulting (2009), eCommerce Web Site Performance Today. Available at: http://www.damcogroup.com/white-papers/ecommerce_website_perf_wp.pdf
[3]   Gingery, Tyson (2009), Survey Research Definitions: Central Tendency Bias, Cvent: Web Surveys, Dec 22, 2009. Available at: http://survey.cvent.com/blog/market-research-design-tips-2/survey-research-definitions-central-tendency-bias
[4]   Gozlan, Marc (2013), A stopwatch on the brain’s perception of time, Guardian Weekly, Jan 1, 2013. Available at: http://www.theguardian.com/science/2013/jan/01/psychology-time-perception-awareness-research
[5]   Hotchkiss, Jon (2013), How Bad Is Our Perception of Time? Very!, Huffington Post – The BLOG, Sep 19, 2013. Available at: http://www.huffingtonpost.com/jon-hotchkiss/how-bad-is-our-perception_b_3955696.html
[6]   JupiterResearch (2006), Retail Web Site Performance. Available at: http://www.akamai.com/dl/reports/Site_Abandonment_Final_Report.pdf
[7]   Kaplan, David (2008), Forrester Buys JupiterResearch for $23 Million, Forbes Magazine, Jul 31, 2008. Available at: https://web.archive.org/web/20080915011602/http://www.forbes.com/technology/2008/07/31/forrester-buys-jupiter-research-tech-cx_pco_0731paidcontent.html
[8]   Shneiderman, Ben (1984), Response Time and Display Rate in Human Performance with Computers, Computing Surveys 16, no. 3 (1984): pages 265-285. Available at: http://dl.acm.org/citation.cfm?id=2517

Appendix A – Freeform Survey Response Distribution and Demographics