26-07-2021

Googling for a New Home

By Eric Schaap

Using Google search data to predict (socio-)economic indicators was a hot topic even before Seth Stephens-Davidowitz published his renowned book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are in 2017. As early as the beginning of the 2010s, researchers from different fields acknowledged the potential of this novel data source for predicting automobile sales, unemployment claims, travel destination planning, consumer confidence, GDP growth, and more. Search data can reveal hidden correlations between factors that classic theory might never have considered.

Given the success of early research on Google search data, it should come as no surprise that housing market research also turned to this data source. In general, studies have integrated data from selected Google search queries into models predicting future housing prices or unit sales. However, a study employing Google search data should reconsider not only its dependent variable but also the search queries themselves. With regard to the housing market, it is fairly common for researchers to collect data on queries such as real estate agent or agency, house(s) for sale, or simply house(s), as well as on entire search categories related to the real estate market. But should we blindly trust these obvious keywords to predict the market best, or are other (less obvious) keywords better predictors?

Figure 1. House price index (HPI) predictions in the Netherlands.


The answer is that less obvious keywords can indeed be better predictors: for the Dutch housing market, besides the frequently used keywords house (huis) or houses (huizen), terms such as mortgage (hypotheek), Rabobank and property (onroerend goed) also have some predictive power with respect to housing prices. Most interestingly, the keyword rebuild value (herbouwwaarde) alone appears to be a valid predictor of future changes in residential mortgage flows. Rebuild value is a keyword that is predominantly used by the buyer, not the seller. Finding this keyword overcomes a fatal flaw in previous analyses: both sides of the market, i.e., demand and supply, are likely to use Google as a starting point when buying or selling a house and, consequently, the aggregate effect of Google searches on housing prices is ambiguous. Rebuild value, however, more exclusively identifies the demand side of the market, which best predicts real house prices.

Figure 1 shows this effect: the black line shows the actual development of the Dutch house price index (HPI) from January 2016 until January 2019. The dotted lines represent the HPI predicted by different models: one including macroeconomic variables only (blue, labelled benchmark), one adding Google search data from a category (orange, labelled category), and one adding data on specific search terms such as rebuild value (grey, labelled lasso, BIC). Clearly, the model that includes such specific queries capturing the demand side of the market outperforms the models based on the common search categories mentioned earlier.
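To give a feel for what a model labelled "lasso, BIC" might involve, the sketch below fits a lasso regression over several candidate search terms and picks the penalty strength with the lowest Bayesian information criterion (BIC). It is only an illustration under stated assumptions: the data are synthetic (not the series used in the study), the outcome is constructed so that herbouwwaarde and hypotheek matter, and the helper names (standardize, lasso, bic) and the penalty grid are hypothetical choices, not the authors' implementation.

```python
import math
import random

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / sd for v in col]

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(cols, y, lam, sweeps=200):
    """Coordinate-descent lasso for standardized columns and a centered y.

    Minimizes (1 / 2n) * RSS + lam * sum(|beta_j|).
    """
    n = len(y)
    beta = [0.0] * len(cols)
    resid = list(y)
    for _ in range(sweeps):
        for j, xj in enumerate(cols):
            # Correlation of x_j with the residual that excludes x_j's own effect.
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta:
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
    return beta, resid

def bic(resid, beta):
    """BIC with the number of non-zero coefficients as degrees of freedom."""
    n = len(resid)
    rss = sum(r * r for r in resid)
    k = sum(1 for b in beta if b != 0.0)
    return n * math.log(rss / n) + k * math.log(n)

# Synthetic example: only two of the six candidate terms drive the outcome.
random.seed(0)
names = ["herbouwwaarde", "huis", "hypotheek", "Rabobank", "huizen", "onroerend goed"]
n = 120
raw = [[random.gauss(0, 1) for _ in range(n)] for _ in names]
y_raw = [2.0 * raw[0][i] + 1.0 * raw[2][i] + random.gauss(0, 0.5) for i in range(n)]

cols = [standardize(c) for c in raw]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]

# Fit one lasso per penalty value and keep the fit with the lowest BIC.
lambdas = [1.0, 0.5, 0.2, 0.1, 0.05, 0.01]
results = []
for lam in lambdas:
    beta, resid = lasso(cols, y, lam)
    results.append((bic(resid, beta), lam, beta))
best_bic, best_lam, best_beta = min(results)

selected = [name for name, b in zip(names, best_beta) if b != 0.0]
print("penalty chosen by BIC:", best_lam)
print("selected search terms:", selected)
```

The appeal of this kind of procedure is that the researcher does not have to decide in advance which keywords matter: uninformative terms are shrunk to exactly zero, while BIC guards against keeping too many of them.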

Figure 2. Google search term vs. category.


The conclusion that data on specific (yet not necessarily obvious) search terms is more useful than search data from an entire category can be supported visually by comparing the development of the HPI with the Google searches for the term houses and those for the category Property (which includes terms such as ‘houses’ and ‘mortgages’). Figure 2 shows clearly that searches for the keyword houses (blue line) follow price developments much more closely than search behaviour within the category Property (orange line), which slopes downward from 2007 onwards before stabilising in 2012. In contrast, the price index and the searches for houses dip in 2013/14 before moving upwards again.

Thus, Google search data offers researchers and practitioners in the real estate market new ways to analyse and forecast the dynamics of this particular market. However, to fully reap the potential benefits of this data source, the selection of both the variable(s) of interest and the search queries is crucial.

Read the MCRE Working Paper