Studies that purport to establish or define performance
standards for web page loading times typically take one of three forms:
examinations and measurements of physiological traits, empirical studies based
on abandonment, or surveys based on participants’ emotional response. Of these
three, surveys are least likely to produce reliable results as they are based
on participants’ subjective self-assessment of their tolerance levels and not
on precise, concrete, measurable actions. Moreover, a participant’s tolerance
for loading times may vary significantly with factors such as age, experience,
task, and time of day, among others[8].
However, the goal for defining performance standards is to
establish a level at which the typical user of a web page will be satisfied.
Setting a target performance level at a point where 50% of study participants
abandon the web page before it completes loading is a poor target for user
satisfaction. To aim for user satisfaction, the targets that are set
must be faster than the typical user’s frustration level, the emotional tipping
point that must be reached before a user decides to abandon a web page.
Thus surveys, by necessity, play an important part in understanding how to
define an effective performance standard.
This article examines two significant web performance
surveys conducted by JupiterResearch in 2006 and Forrester Consulting in 2009
which attempt to produce generalizations about users’ satisfaction with web
page performance and their tolerance thresholds. The rest of this article
reviews the methodology used by these two surveys and potential deficiencies in
those methodologies, and then describes an experimental survey to compare how
participant responses differ when similar survey questions are used but the
method for providing their response differs. The results of the survey are then
presented and compared. Conclusions are drawn regarding the value and
usefulness of the research.
Published Survey-Based Standards
JupiterResearch (2006)
Retail Web Site Performance: Consumer Reaction to a Poor Online Shopping
Experience. June 1, 2006, prepared for Akamai Technologies, Inc.
Key Finding:
“Overall, 28 percent of online
shoppers will not wait longer than four seconds for a Web site page to load
before leaving. Broadband users are even less tolerant of slow rendering. A
full one-third of online shoppers with a broadband connection are unwilling to
wait more than four seconds (compared with 19 percent of online shoppers with a
dial-up connection).”[6]
Methodology:
JupiterResearch conducted a survey of 1,058 online shoppers. Among the questions posed to respondents, the one we are concerned with is
“Typically, how long are you willing to wait for a single Web page to load
before leaving the Web site? (Select one.)”.
The options presented were:
- Less than 1 second
- 1 to 2 seconds
- 3 to 4 seconds
- 5 to 6 seconds
- More than 6 seconds
Forrester Consulting (2009)
eCommerce Web Site Performance Today: An Updated Look At Consumer Reaction To
A Poor Online Shopping Experience. August 17, 2009, prepared for Akamai
Technologies, Inc.
This report is directly analogous to the 2006 report by
JupiterResearch, as the latter firm was acquired by Forrester in 2008[7].
Key Finding:
“Forty-seven percent of consumers
expect a Web page to load in 2 seconds or less.”[2]
Methodology:
Forrester Consulting conducted a survey of 1,048 online
shoppers. Among the questions posed to respondents, the one we are concerned
with is “What are your expectations for how quickly a Web site should load when
you are browsing or searching for a product?”.
The options presented were:
- Less than 1 second
- 1 second
- 2 seconds
- 3 seconds
- More than 4 seconds
Comparison
Forrester Consulting references the previous 2006 study and
attributes the difference in their key findings to increasing access to
broadband among US customers. However, the report offers no explanation of why
the expectations of broadband users would increase so dramatically in three
years. The 2006 survey reports that
33% of broadband users will not wait more than 4 seconds, while the 2009 survey
reports that at least 47% of broadband users will not wait more than 2 seconds.
Comparing similar units, the percentage of broadband users
who will not wait more than 2 seconds for a web page to load increases from
12% in 2006 to 47% in 2009. The paper suggests no reason for this roughly
four-fold increase in only three years.
The 2009 paper also makes the claim that “This methodology
was consistent with the 2006 study methodology.” Although the basic format of
the survey and how it was conducted remained the same, the options being
presented to the respondents for this question are dramatically different. In
2006, 73.5% of respondents answered with “5 to 6 seconds” or “more than 6
seconds”. Respondents in 2009 were no longer given the option to make this
distinction; all such answers were lumped into the single “More than 4
seconds” category.
Because the 2009 survey presented such a limited range of options,
respondents who would naturally have answered by selecting an option with a
larger time may have reconsidered their answer when presented with options
dramatically different from their initial expectation. Instead of choosing the
most appropriate answer based on their initial thought (more than 4 seconds),
they may instead have selected an option closer to the middle. This raises the
possibility that either or both of these surveys are subject to central
tendency bias or position bias.[3]
Experimental Survey Methodology
The purpose of this survey is to determine whether the
answer structure and the options presented in the 2006 and 2009 surveys
impacted the answers given by respondents. To make this determination, this
survey presented a single question that closely matched the question used in
the 2006 and 2009 surveys. The key difference is that this survey allowed
respondents to answer however they wished, using a free-form answer field
instead of selecting a predefined option.
The primary question presented in this survey is: “When
opening a typical webpage, how long (in seconds) will you wait for it to load
before feeling frustrated or taking some kind of action? Taking an action may
include doing something else while you wait (switching windows/tabs), reloading
the page, or giving up and going somewhere else.”
Three additional demographic questions were also included in
the survey:
- Age? - Options: under 18, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, over 75.
- Gender? - Freeform entry
- At what level of proficiency do you use the internet? - Options: I am a web application developer, I am a content creator, I use it for work, I use it for personal use regularly (>3 times/week), I use it for personal use occasionally (<= 3 times/week).
Findings
The results of this survey are based on 78 online responses
from Canada and the US. The answers given by the respondents ranged from 0.5s
to 60s. As shown in Table 1, this survey produced responses that were
significantly higher and much more broadly distributed than the responses to
the 2006 and 2009 surveys, as expected. However, we can also see that this
survey’s resulting Median and Mode closely match those of the 2006 survey:
compressing all of the freeform responses that exceeded 6 seconds into a
single >6s option would have resulted in the same Mode value. In contrast,
the 2009 survey portrays a vastly different picture.
Table 1: Survey Average Values Comparison

|        | Freeform Survey | 2006 Survey (Jupiter) | 2009 Survey (Forrester) |
|--------|-----------------|-----------------------|-------------------------|
| Median | 5.00s           | 5-6s                  | 3s                      |
| Mean   | 9.82s           | 5.80s*                | 2.51s*                  |
| Mode   | 5.00s           | >6s                   | 3s                      |
| SD     | 11.41s          | 1.57s*                | 1.01s*                  |
*Mean and Standard Deviation values for
2006 Survey (Jupiter) and 2009 Survey
(Forrester) were calculated using the midpoint value in each option range, and
using the highest value + 1s for the highest option. These calculations are not
intended to be exact, but are used in this context for comparative purposes
only.
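As a concrete illustration of this midpoint approximation, the sketch below applies it to the 2006 option ranges; the response proportions are placeholders chosen only to show the calculation, not the published distribution.

```python
# Sketch of the midpoint approximation described in the Table 1 footnote.
# Option ranges are from the 2006 (Jupiter) survey; the response proportions
# below are illustrative placeholders, not the published figures.
import math

# (option label, representative value in seconds): midpoint of each range,
# and highest value + 1s for the open-ended top option.
options_2006 = [
    ("Less than 1 second", 0.5),
    ("1 to 2 seconds", 1.5),
    ("3 to 4 seconds", 3.5),
    ("5 to 6 seconds", 5.5),
    ("More than 6 seconds", 7.0),
]

# Hypothetical share of respondents choosing each option (sums to 1.0).
proportions = [0.02, 0.10, 0.15, 0.30, 0.43]

def binned_mean_sd(values, weights):
    """Weighted mean and standard deviation over binned responses."""
    mean = sum(v * w for v, w in zip(values, weights))
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights))
    return mean, math.sqrt(variance)

values = [v for _, v in options_2006]
mean, sd = binned_mean_sd(values, proportions)
print(f"Approximate mean: {mean:.2f}s, approximate SD: {sd:.2f}s")
```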
Grouping the responses to this survey into the same options
presented by the 2006 and 2009 surveys produces a clearer comparison, as shown
in Figure 1 and Figure 2.
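A minimal sketch of this grouping step follows; the sample responses are invented stand-ins for the 78 freeform answers, and the bucket boundaries assume the 2006 option labels map onto half-open ranges.

```python
# Sketch of grouping freeform answers (seconds) into the 2006 survey's
# option buckets; the sample responses are placeholders, not the actual data.
from collections import Counter

def bucket_2006(seconds):
    """Map a freeform answer to one of the 2006 (Jupiter) survey options."""
    if seconds < 1:
        return "<1s"
    if seconds <= 2:
        return "1-2s"
    if seconds <= 4:
        return "3-4s"
    if seconds <= 6:
        return "5-6s"
    return ">6s"

sample_responses = [0.5, 2, 3, 5, 5, 5, 8, 10, 15, 30, 60]  # placeholder data
counts = Counter(bucket_2006(r) for r in sample_responses)
for label in ["<1s", "1-2s", "3-4s", "5-6s", ">6s"]:
    print(f"{label}: {counts[label] / len(sample_responses):.0%}")
```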
Figure 1: Response Frequency by Option, Freeform vs 2006 (Jupiter) Survey
Although the 2006 survey suffers from compression at the
largest interval, which contains nearly half of all responses, its results are
a close match to the results of this survey. A slight shift toward faster web
page loading time expectations can be seen between 2006, when the
JupiterResearch survey was completed, and 2014, when this survey was completed.
This shift is evident in the increase in the percentage of responses in the <1s
and 1-2s categories and the corresponding decrease in responses in the 3-4s,
5-6s, and >6s categories.
The Student’s t-test[1], when applied to the compressed results shown in Figure 1 using an independent two-sample t-test for unequal variances, produced a value of P = 0.1650.
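For reference, a sketch of how such a comparison can be computed is shown below: each survey’s binned counts are expanded into per-respondent representative values (using the same midpoint convention as Table 1) and passed to Welch’s unequal-variance t-test via scipy. The bin counts are hypothetical placeholders, not the actual survey data.

```python
# Welch's (unequal-variance) two-sample t-test on binned survey responses.
# Bin counts below are hypothetical placeholders, not the actual data.
from scipy import stats

midpoints = [0.5, 1.5, 3.5, 5.5, 7.0]     # <1s, 1-2s, 3-4s, 5-6s, >6s

freeform_counts = [4, 10, 14, 30, 20]     # hypothetical bin counts (n = 78)
jupiter_counts = [21, 74, 185, 357, 421]  # hypothetical bin counts (n = 1,058)

# Expand each bin into that many respondents at the bin's representative value.
freeform = [m for m, c in zip(midpoints, freeform_counts) for _ in range(c)]
jupiter = [m for m, c in zip(midpoints, jupiter_counts) for _ in range(c)]

t_stat, p_value = stats.ttest_ind(freeform, jupiter, equal_var=False)
print(f"t = {t_stat:.3f}, P = {p_value:.4f}")
```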
Figure 2: Response Frequency by Option, Freeform vs 2009 (Forrester) Survey
The 2009 survey results show no such similarity; the
distribution of responses follows a vastly different pattern from the responses
to both the 2006 survey and this survey. The Student’s t-test[1], when applied
to the compressed results shown in Figure 2 using an independent two-sample
t-test for unequal variances, produced a value of P = 7.147 × 10⁻⁸.
Conclusions
There is significant agreement between the results of this
freeform survey and the 2006 JupiterResearch survey (P > 0.05), which
indicates that it is unlikely that significant bias was caused by the
structure of the question or the options presented to the respondents for that
survey. However, the results of the 2009 Forrester Consulting survey disagree
greatly (P < 0.01), which suggests that the 2009 survey was subject to some
form of bias, likely imparted by the presentation of the response options to
the respondents.
All surveys that are conducted in an attempt to
quantitatively define an emotional response (frustration) are going to produce
results that are imprecise and limited by the ability of respondents to
accurately self-evaluate. Patience is volatile, fluctuating wildly
between different users and within a single user depending on their
current state of mind.[4][5]
The freeform survey in particular is also limited by its
small sample size; increasing the number of respondents to a sample size of
more than 1,000 would provide a better comparison and strengthen confidence in
the conclusions.
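As a rough illustration of why a larger sample helps, the sketch below uses this survey’s own figures (SD = 11.41s, n = 78) to show how the standard error of the mean would shrink at the suggested n = 1,000; the 1.96 multiplier assumes an approximately normal sampling distribution.

```python
# Rough margin-of-error comparison for the current and proposed sample sizes.
import math

sd = 11.41              # standard deviation of the freeform responses (seconds)
for n in (78, 1000):
    sem = sd / math.sqrt(n)        # standard error of the mean
    half_width = 1.96 * sem        # approximate 95% confidence half-width
    print(f"n = {n}: SEM ≈ {sem:.2f}s, 95% CI ≈ ±{half_width:.2f}s")
```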
Business Applicability of Results
It is important when working within a business context to
understand how to apply the results obtained by web page performance surveys
when establishing a set of performance guidelines or requirements. It is
typically ineffective to set performance standards using the median result:
it is not an effective business goal to aim for a standard at which half of
your users abandon your web page out of frustration. At the opposite end,
setting a standard at which every user will be satisfied with the performance
of every page may be unrealistic; factors such as connection type, geographic
location, and processing time can prevent such a standard from being
achievable.
A more typical approach is to use a high percentile of the
survey results as a business performance standard, for example setting a goal
to meet the performance level that would satisfy 90% of your users. Figure 3
compares the three surveys’ responses at several percentile levels.
Figure 3: Response Time Percentile Comparison
The percentile analysis of the survey responses lets us draw conclusions about the expected percentage of satisfied users when a specific web page response time standard is achieved.
We can observe that the percentage of users who would be satisfied with a web page performance standard of 1s or less is >=95% in the freeform survey, >=99% in the 2006 (Jupiter) survey, and >=90% in the 2009 (Forrester) survey. From this we can conclude that, in general, at least 90% of all users would be satisfied if a web page performance standard of 1s or less were achieved.
Similarly, for a performance standard of 2s or less the corresponding percentages are >=85% in the freeform survey, >=99% in the 2006 (Jupiter) survey, and >=80% in the 2009 (Forrester) survey. Thus at least 80% of all users would be satisfied if a standard of 2s or less were achieved.
These observations rest on the worst-case assumption that the survey providing the most aggressive performance targets is the most accurate of the three. This gives a good lower bound for user satisfaction, but perhaps not a good expected level of user satisfaction. If we give each of the surveys equal weight, we can average their response time percentile values to determine an expected level of user satisfaction.
Based on the average percentile values, our expected level of user satisfaction with a performance standard of 1s or less is >=99%, and for a standard of 2s or less it is >=90%. Thus, given a business case in which we aim to achieve at least a 90% rate of user satisfaction with our web site performance, we expect that our web page response times would need to be 2s or less.
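A small sketch of this percentile-style reading follows: for a candidate load-time standard, a survey’s satisfied share is taken as the fraction of respondents whose stated tolerance meets or exceeds that standard, and the three surveys are then combined by worst case and by equal-weight average. The tolerance lists are invented placeholders, not the actual survey data.

```python
# Sketch of reading satisfaction rates off tolerance distributions and
# combining the three surveys; tolerance lists are placeholders only.
def satisfied_share(tolerances, standard_s):
    """Fraction of respondents whose tolerance meets or exceeds the standard."""
    return sum(1 for t in tolerances if t >= standard_s) / len(tolerances)

# Hypothetical tolerance samples (seconds) standing in for the three surveys.
surveys = {
    "freeform": [0.5, 2, 3, 5, 5, 5, 8, 10, 15, 30, 60],
    "jupiter_2006": [1.5, 3.5, 3.5, 5.5, 5.5, 7, 7, 7, 7, 7],
    "forrester_2009": [0.5, 1, 1, 2, 2, 2, 3, 3, 3, 5],
}

for standard in (1.0, 2.0):
    shares = [satisfied_share(sample, standard) for sample in surveys.values()]
    print(f"{standard:.0f}s standard: worst case {min(shares):.0%}, "
          f"equal-weight average {sum(shares) / len(shares):.0%}")
```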
References
[1] Encyclopaedia Britannica (2014), Student's t-test. Available at: http://www.britannica.com/EBchecked/topic/569907/Students-t-test
[2] Forrester Consulting (2009), eCommerce Web Site Performance Today. Available at: http://www.damcogroup.com/white-papers/ecommerce_website_perf_wp.pdf
[3] Gingery, Tyson (2009), Survey Research Definitions: Central Tendency Bias, Cvent Web Surveys blog, Dec 22, 2009. Available at: http://survey.cvent.com/blog/market-research-design-tips-2/survey-research-definitions-central-tendency-bias
[4] Gozlan, Marc (2013), A stopwatch on the brain’s perception of time, Guardian Weekly, Jan 1, 2013. Available at: http://www.theguardian.com/science/2013/jan/01/psychology-time-perception-awareness-research
[5] Hotchkiss, Jon (2013), How Bad Is Our Perception of Time? Very!, Huffington Post – The Blog, Sep 19, 2013. Available at: http://www.huffingtonpost.com/jon-hotchkiss/how-bad-is-our-perception_b_3955696.html
[6] JupiterResearch (2006), Retail Web Site Performance. Available at: http://www.akamai.com/dl/reports/Site_Abandonment_Final_Report.pdf
[7] Kaplan, David (2008), Forrester Buys JupiterResearch for $23 Million, Forbes Magazine, Jul 31, 2008. Available at: https://web.archive.org/web/20080915011602/http://www.forbes.com/technology/2008/07/31/forrester-buys-jupiter-research-tech-cx_pco_0731paidcontent.html
[8] Shneiderman, Ben (1984), Response Time and Display Rate in Human Performance with Computers, Computing Surveys 16, no. 3 (1984): 265-285. Available at: http://dl.acm.org/citation.cfm?id=2517