A/B test case study – Arenaturist booking engine layout
Data-driven decision making
When determining the configuration of elements on your website, it is always a better strategy to make decisions based on actual data. This becomes even more crucial when the elements in question affect your conversion process.
To put things in the real (or rather, virtual) world: how do you decide which location is best for your booking engine? What color palette works best? Which version of the text drives more conversions? Wouldn’t it be marvelous if every option came with a clear value attached to it, i.e., how many clicks, conversions, etc. it generated? If we had this data when making the decision, life would be much simpler, and our sites much more effective.
Data-driven decision making is, basically, exactly that: collecting data so that you have the required numbers in front of you when you make those tough decisions.
There are many approaches to acquiring and analyzing such data. A/B testing is by far the most widely used in the web and SEO industries, mostly because it is easy to apply and produces clear-cut results.
What is A/B testing?
Simply put, A/B testing is a process in which you compare the performance of two designs that are identical in every respect except one: the element you want to examine.
Like all good tests, A/B testing is a statistical process, and as such it requires a clear hypothesis, well-defined parameters and a correctly calculated sample size in order to reach a statistically valid decision.
Sounds too complicated? Don’t let that scare you away. A/B testing is so common in the web industry that there are many free tools that handle all of the statistics for you, and Google Analytics even has experiments built into the Analytics system (see the Google Analytics Experiments overview).
All you need is to make sure you have:
- Two designs identical in all parameters except the element you want to test
- The ability to randomly serve the different designs to users
- A well-defined goal that allows you to measure the performance of each design. For example: goal completion (filling out a form), an event (clicking on a button) or e-commerce parameters (conversion rate). It can even be a parameter like bounce rate (if the purpose is to reduce it) or any other parameter, as long as it is meaningful to you and measuring it gives a mathematically comparable measure of each design’s performance.
Below is an A/B test case study that illustrates the importance of data-driven decision making.
Arenaturist network booking engine location: an A/B testing case study
Arenaturist is a leading hotel and resort chain in Croatia, managing 19 properties.
In late 2013, Arenaturist decided to launch new versions of its three websites: Arenaturist.com, arenacamps.com and arenaturist.hr.
While creating the new design, Arena faced a dilemma: how should the booking engine be positioned? Two designs were proposed, differing in only one parameter:
Design A: a horizontal layout
Design B: a vertical layout
To maximize online revenue, it was important to choose the layout with the highest conversion rate.
Therefore, an experiment was conducted on all Arena domains.
In general, the experiment was run as an A/B test of the booking engine display. On their first visit, users were randomly presented with either:
- version A of the site (horizontal booking engine display), served from a URL such as http://www.arenaturist.com/croatia_hotels/park_plaza_histria_hotel?mlt=A, or
- version B of the site (vertical booking engine display), served from a URL such as http://www.arenaturist.com/croatia_hotels/park_plaza_histria_hotel?mlt=B.
Each user was also marked with a cookie for 60 days, so that every subsequent visit would show the same version as the first.
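The assignment logic described above (a random draw on the first visit, then a 60-day cookie that pins the version) can be sketched as follows. This is a minimal illustration under stated assumptions, not Arena’s actual implementation: the cookie name `ab_layout` and all helper functions are hypothetical, and only the `mlt` parameter and the 60-day lifetime come from the case study.

```python
import random
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical names: the case study does not document Arena's implementation.
COOKIE_NAME = "ab_layout"            # assumed cookie name
COOKIE_MAX_AGE = 60 * 24 * 60 * 60   # 60 days, expressed in seconds
VARIANTS = ("A", "B")                # A = horizontal, B = vertical booking engine


def assign_variant(cookie_value, rng=random):
    """Return (variant, is_new).

    If the visitor already carries a valid variant cookie, reuse it so the
    version stays stable; otherwise draw A or B uniformly at random.
    """
    if cookie_value in VARIANTS:
        return cookie_value, False
    return rng.choice(VARIANTS), True


def set_cookie_header(variant):
    """Build a Set-Cookie header value that pins the variant for 60 days."""
    return f"{COOKIE_NAME}={variant}; Max-Age={COOKIE_MAX_AGE}; Path=/"


def variant_url(url, variant):
    """Append the mlt query parameter used in the case-study URLs."""
    parts = urlsplit(url)
    prefix = parts.query + "&" if parts.query else ""
    return urlunsplit(parts._replace(query=prefix + urlencode({"mlt": variant})))


if __name__ == "__main__":
    rng = random.Random(42)
    variant, is_new = assign_variant(None, rng)  # first visit: no cookie yet
    print(variant, is_new, set_cookie_header(variant))
```

On a first visit the caller passes `None` (no cookie), sends the returned header and serves `variant_url(...)`; on later visits the stored cookie value is passed back in, and the same variant comes out without a new random draw.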
Visits and site versions were tracked using custom variables in Google Analytics (the built-in Analytics experiments could not be used due to technical limitations), so statistical testing had to be performed manually by the Carmelon agency during data analysis.
Canonical tags were set on the URLs with parameters (such as the examples above) to prevent the indexing of multiple URLs with the same content.
The experiment continued until enough bookings had been made to statistically differentiate between the two layouts.
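One way to derive the canonical URL is to strip the test parameter, so that the A and B URLs both point at the same address. The sketch below assumes that `mlt` is the only test parameter and that the canonical target is the parameter-free URL; the case study does not state exactly which URL its canonical tags pointed to, and the helper names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "mlt" (from the example URLs) is the only A/B test parameter,
# and the canonical URL is the same address with that parameter removed.
TEST_PARAMS = {"mlt"}


def canonical_url(url):
    """Strip A/B test parameters so both variants share one canonical URL."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TEST_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def canonical_link_tag(url):
    """Render the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

Any other query parameters (for example a language switch) are preserved, so only the experiment marker is removed from the canonical address.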
The data showed a 0.32% conversion rate for layout B, compared with 0.23% for layout A: a conversion rate roughly 38% higher for the vertical booking engine.
In absolute numbers, the vertical layout produced 119 transactions from 37,770 visits, while the horizontal layout produced 87 transactions from 38,121 visits. In other words, the vertical layout generated 32 more transactions from 351 fewer visits.
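The comparison above can be checked with a pooled two-proportion z-test, one common way to perform the kind of manual significance test mentioned earlier (the case study does not say which test Carmelon used). The sketch treats each visit as an independent trial, which is a simplification, since returning visitors were pinned to one version by the cookie. With the published figures it gives a z statistic of about 2.3 and a two-sided p-value of about 0.02, i.e. significant at the conventional 5% level.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)   # rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Figures from the case study: A = horizontal, B = vertical.
conv_a, visits_a = 87, 38_121
conv_b, visits_b = 119, 37_770

rate_a = conv_a / visits_a     # about 0.228%
rate_b = conv_b / visits_b     # about 0.315%
lift = rate_b / rate_a - 1     # relative lift of B over A
z, p = two_proportion_z_test(conv_a, visits_a, conv_b, visits_b)
print(f"A: {rate_a:.3%}  B: {rate_b:.3%}  lift: {lift:.0%}  z: {z:.2f}  p: {p:.3f}")
```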
Thus, it was concluded that the test results were in favor of the vertical (B) layout.
Summary table 1:
Conclusion and recommendation
Based on the data above, the Carmelon agency recommended that Arena implement a vertical booking engine layout on all Arena websites.
Notes on statistical validity
It should be noted that experiments such as this are always sensitive to sample size (in this case, the number of bookings). It is therefore crucial to always examine the results statistically, and not to decide based on how the numbers look. That said, the required sample size is sometimes too large to be realistic in a commercial environment, and decisions have to be made at a less strict significance level (i.e., accepting a higher error rate). An important principle when performing these experiments manually is therefore to determine the parameter boundaries before the experiment starts (i.e., what sample size is required, what statistical significance level is acceptable, etc.). Otherwise, the tester risks ending the experiment based on a hunch (or worse, reaching a biased conclusion), which makes the whole experiment pointless.
Such issues can sometimes also be avoided by using automated testing tools such as the built-in experiments in Google Analytics.
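The sample-size boundary can be fixed before the experiment starts using the standard normal-approximation formula for comparing two proportions. In the sketch below, the inputs (a 0.23% baseline, a hoped-for 0.32% rate, a 5% significance level and 80% power) are illustrative assumptions, not the parameters Arena or Carmelon actually used; with these inputs the function asks for roughly 53,000 visits per layout.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(p_base, p_target, alpha=0.05, power=0.80):
    """Visits needed in EACH variant to detect a change from p_base to
    p_target with a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p_base + p_target) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_base * (1 - p_base)
                                 + p_target * (1 - p_target))) ** 2
    return ceil(numerator / (p_target - p_base) ** 2)


# Illustrative inputs only: 0.23% baseline, 0.32% target conversion rate.
print(sample_size_per_variant(0.0023, 0.0032))
```

Because the required sample grows with the inverse square of the difference, halving the detectable difference roughly quadruples the number of visits needed, which is why low-conversion pages such as booking engines often need long experiments.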
Arena’s case is an excellent example of the need for statistical validity: the experiment was extremely sensitive because of the relatively low number of bookings in its early stages. For example, stopping the experiment earlier (halfway through) would have shown no apparent difference between the two layouts, and stopping it 3 days before that would have shown a slight preference for the horizontal version (see tables below). Clearly, had we decided to stop based on a timeframe or any other arbitrary criterion, any conclusion could have been reached.
However, a statistical examination shows that neither of these conclusions is valid, as the sample size (or, more specifically, the difference in the number of bookings yielded by each layout) was not large enough to statistically differentiate between the layouts.
Summary table 2: data for stopping the experiment at an earlier time (no apparent difference between the layouts).
Summary table 3: data for stopping the experiment 3 days before the state described in Table 2 (a slight preference for layout A).
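The sensitivity to the stopping point described above can be illustrated with a hypothetical A/A simulation: both variants share the same conversion rate, so every "significant" result is a false positive. All parameters below (a 5% conversion rate, 2,000 visits per variant, a check every 100 visits) are arbitrary choices made to keep the simulation fast; this is not Arena’s data, only a sketch of why stopping at the first favorable look is unreliable.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold, about 1.96


def z_stat(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic (0 if there are no conversions yet)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_aa(runs=300, visits=2_000, rate=0.05, check_every=100, seed=1):
    """Return (false-positive rate when testing only at the end,
    false-positive rate when stopping at the first significant interim check)."""
    rng = random.Random(seed)
    final_hits = peek_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant_at_some_look = False
        for n in range(1, visits + 1):
            # Both variants convert with the SAME probability (A/A test).
            conv_a += (rng.random() < rate)
            conv_b += (rng.random() < rate)
            if n % check_every == 0 and abs(z_stat(conv_a, n, conv_b, n)) > Z_CRIT:
                significant_at_some_look = True
        final_hits += abs(z_stat(conv_a, visits, conv_b, visits)) > Z_CRIT
        peek_hits += significant_at_some_look
    return final_hits / runs, peek_hits / runs
```

Calling `final_rate, peeking_rate = simulate_aa()` should give a final-only false-positive rate near the nominal 5%, while stopping at the first significant interim check pushes the rate noticeably above it, even though the two "designs" are identical.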
Arena’s case study demonstrates the need for clear, actual data in the decision-making process, and the great value that A/B testing can offer consulting agencies and site owners alike when deciding which elements to use. A/B testing is a (relatively) easy tool to apply, and it yields reliable results that can be used to differentiate between designs based on their performance. However, as with any statistical test, certain considerations must be observed, and careful planning is necessary in order to draw accurate conclusions.