Building the Data Mosaic with Exabel

When everyone in a market has access to the same data, it's what you do with it that counts.

“If you can think of a data set, it's most likely out there, and if you can't find it, then you aren't looking hard enough.” Those were the words of Matthew Rothman, the deputy of quant strategies for the $52bn hedge fund Millennium Management, at Neudata’s online summer summit on June 15th. “Five years ago we were almost like kids in a candy store” with all of the new and different types of alternative data that were coming onto the market, he said, but “it's been a while since I saw something where I was like, ‘wow, I've never seen something like this before’.”

In the past, when alternative data was in its infancy, the sector was in a state of constant discovery, with new wonders appearing every day. Market practitioners all over the world were simultaneously learning that they could use satellite imagery to measure the heights of the floating lids on oil storage tanks and so reveal inventory levels, or use credit card transaction data to leap ahead of company reporting figures. Armed with these new techniques and flourishing datasets, one could enter the financial markets and reliably make money, confident that the novelty factor meant only a few participants were using the same datasets.

Today, Rothman’s statements speak to a change. The explosion of alt datasets has continued, but as more participants have entered and information about how to use alt data has diffused, the first-mover advantage has been eroded, and the game-changing dataset which alone could generate significant returns doesn’t appear nearly as often. This is only to be expected, of course. Time always adds complexity as understanding of what has gone before spreads. Indeed, another word for this phenomenon is ‘progress’.


The question that follows is: where is the cutting edge now for using alternative data to make money in the markets? Where is the competition to be found today?

The answer, as is often the case, lies in incremental gains. One hedge fund manager recently described to me how his firm benefits from understanding how other market participants are using alternative data in ways that are now becoming ‘old’: his team identifies the alternative data that they are likely using, obtains a more complete dataset or mix of datasets, and then makes money off these simpler practitioners of alternative data.

The greater truth is that, as alternative data continues its implacable march towards becoming ‘just data’, the challenges around it are coming to resemble those of traditional data. In other words, when everyone in a market has access to the same data, it's what you do with it that counts. The competitive edge can now be secured and maintained by taking datasets that might not be fresh and new and using them in new ways.


The way to stay ahead, using creativity and ingenuity to reach answers that sophisticated market players would not view as ‘stale’, is to combine different alternative datasets in ways that improve accuracy levels. This challenge, known as ‘building the data mosaic’, can be extremely hard to do well. Alternative datasets often have wild variances and idiosyncrasies, and to build an effective mosaic these need to be corralled not just against the ‘ground-truth’ market data (i.e. the real stock prices, KPIs, etc. against which any model needs to be trained and which it is trying to predict), but against other alternative datasets as well.


Exabel has created a software platform which makes building the data mosaic commonplace. The best way to illustrate how is with an example.

From an investment perspective, one of the major stories of the last 18 months has been the COVID-19 pandemic and the way consumption levels plummeted and are now gradually recovering as consumers return to pre-pandemic habits. A vivid example of using alternative data to track this phenomenon was demonstrated when Pret A Manger recently made its real-time sales activity available on Bloomberg.

That’s all well and good, but an investor wants to do more than watch developments as they come in: the goal is to predict future prices based on the high-frequency alternative data available. To demonstrate how Exabel’s platform can do this, let's use the example of that beloved Pret A Manger equivalent, Chipotle.

The first layer of the mosaic is the ‘ground truth’ data, i.e. the estimated and reported metrics that are announced publicly in the marketplace. The Exabel platform is already populated with this ‘scaffolding data’ thanks to its partnerships with FactSet and Visible Alpha. If we choose a two-year period from 2019 to 2021, we see a chart that shows Chipotle sales (the yellow line) dropping rapidly in early 2020, exactly as one would expect given pandemic events of the time.

Chipotle Mexican Grill - Credit card transactions (purple) against actual sales (yellow) for the period April 2019 - April 2021

Onto this we lay our first alternative dataset: credit card transactions data (the purple line) from one of our alternative data partners, 1010data. The credit card data leveraged in this example is reported 6 days after the fact, so if it contains predictive value there should be plenty of time to make money from it before quarterly results are released.

Looking at the chart, we see that the two lines are clearly correlated, though the credit card data does not quite capture the full fluctuations of the true data, as credit card sales do not collapse to quite the degree that is seen in total sales. This could be evidence of a cash effect: credit card transactions continued in the pandemic, whereas cash spend implies a physical interaction and so dropped off completely.

The ultimate aim, let’s remember, is to build a model that uses alternative data to predict the true sales line. So we built one. As shown below, credit card sales data can predict sales within a 3.5% prediction error band, a very solid result. But, as we expected after studying the data, the prediction was not quite able to capture the full collapse in sales.

Chipotle Mexican Grill - Predicted sales (blue line) vs actual sales (black line)
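
For readers who want to see the mechanics, here is a minimal Python sketch of the kind of calculation described above: fit a straight line from a card-spend signal to reported sales, then measure how far the predictions sit from the actuals. It is an illustration only, not Exabel's model. The numbers are invented, the helper names `fit_line` and `mape` are hypothetical, and reading ‘prediction error’ as mean absolute percentage error is an assumption made for this sketch, since the post does not define the metric.

```python
# Illustrative sketch only: every figure below is made up,
# not Chipotle, FactSet or 1010data data.

def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.035 means 3.5%)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical quarterly observations: a card-spend index and reported sales.
card_spend = [100.0, 104.0, 108.0, 97.0, 112.0, 118.0]
reported_sales = [1310.0, 1350.0, 1400.0, 1230.0, 1450.0, 1540.0]

intercept, slope = fit_line(card_spend, reported_sales)
predicted_sales = [intercept + slope * c for c in card_spend]
error = mape(reported_sales, predicted_sales)

print(f"in-sample prediction error: {error:.1%}")
```

The error here is measured on the same quarters used for fitting, which flatters the model. A fairer test would fit on earlier quarters and score on later ones.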


Here is where we can look to further improve the accuracy by building a mosaic using a complementary alternative dataset. When we add visits data, also from 1010data, we are better able to capture the steeper drop-off in sales that came as a result of broad Chipotle store closures all over the country.

Chipotle Mexican Grill - Adding visits data (orange line) to the picture, along with actual sales (yellow) and credit card transactions (purple)


This extreme drop depicted in the new visits line (orange), when incorporated into the predictive model, delivers an estimate that tracks actual sales much more closely, driving the prediction error down from 3.5% to 1.9%. Exabel’s platform, in this case, has enabled a user to integrate two alternative datasets, which significantly increased the predictive capacity of the model.

Chipotle Mexican Grill - New predicted sales line using both credit card and visits data (blue) vs actual sales (black)
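
The effect of adding a second signal can be sketched in a few lines of Python. The example below fits one least-squares model on a card-spend series alone and another on card spend plus visits, then compares the two errors. Everything in it is an assumption for illustration: the data are invented (with a deliberate ‘collapse’ quarter that visits capture better than card spend), the function names are hypothetical, the error metric is taken to be mean absolute percentage error, and plain linear regression stands in for whatever modelling the platform actually applies.

```python
# Illustrative sketch: all numbers are invented, not real company data.

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(features, target):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs, features):
    """Apply fitted coefficients to each row of features."""
    return [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], f)) for f in features]


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical quarterly series; the fifth quarter is the "collapse".
card = [100.0, 103.0, 106.0, 95.0, 88.0, 101.0, 108.0, 112.0]
visits = [100.0, 102.0, 104.0, 80.0, 55.0, 85.0, 98.0, 105.0]
sales = [1305.0, 1327.0, 1365.0, 1109.0, 884.0, 1182.0, 1330.0, 1395.0]

card_only = [[c] for c in card]
card_and_visits = [[c, v] for c, v in zip(card, visits)]

error_card = mape(sales, predict(fit_ols(card_only, sales), card_only))
error_both = mape(sales, predict(fit_ols(card_and_visits, sales), card_and_visits))

print(f"card only:     {error_card:.1%}")
print(f"card + visits: {error_both:.1%}")
```

Because both errors are in-sample, the richer model is bound to fit at least as well on squared error. The comparison only becomes meaningful evidence when the second dataset also improves predictions on quarters the model has not seen.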


Although this is just one example, Exabel’s software allows you to look at a broad universe of companies at scale and generate predictions based on multiple alternative datasets in tandem. The results shown below highlight the improvement in prediction error rates across a larger universe of companies.

Using Exabel’s platform to expand beyond a single stock and inspect prediction error rates across a larger universe of companies


So far we have been looking at the benefits of building a mosaic from the perspective of a fundamental investor. While a fundamental investor may focus on the gaps between expected results and actual results in a key performance indicator (KPI) like sales, Exabel's platform can also look at how to generate alpha from alternative data more directly. The platform can enable a quantamental investor to identify the incremental returns generated by commingling multiple datasets in a single strategy. Below is a return profile of a portfolio strategy built around 300 companies. What is quite interesting is that here too we can see the boost that adding visits data to credit/debit transaction data can bring to a portfolio strategy. Although a strategy based on credit card data can yield a 17% annualized return with a Sharpe ratio of 2, the addition of visits data provides an additional 4% annualized boost in returns, with the Sharpe ratio rising to 2.8.

With Exabel’s platform it is possible for a quantamental investor to see the boost that adding visits data to credit/debit transaction data can bring to a portfolio strategy
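
For anyone less familiar with the two headline statistics quoted above, the following Python sketch shows one common way to compute an annualized return and a Sharpe ratio from a series of daily returns. The conventions are assumptions made for this sketch, not something the post specifies: 252 trading days a year, geometric compounding, and a zero risk-free rate by default. The two return series are invented and far too short for real use, so the output is not meant to match the figures quoted above.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization convention


def annualized_return(daily_returns):
    """Geometric annualized return from daily simple returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - daily_risk_free for r in daily_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


# Hypothetical daily returns for two versions of a strategy.
card_only_returns = [0.004, -0.002, 0.003, 0.001, -0.003,
                     0.005, -0.001, 0.002, 0.000, 0.003]
combined_returns = [0.005, -0.001, 0.003, 0.002, -0.002,
                    0.005, 0.000, 0.002, 0.001, 0.003]

for name, series in [("card only", card_only_returns),
                     ("card + visits", combined_returns)]:
    print(f"{name}: annualized return {annualized_return(series):.1%}, "
          f"Sharpe {sharpe_ratio(series):.2f}")
```

The Sharpe ratio rewards a strategy for earning its return smoothly, which is why a second signal that trims the bad days can lift the ratio by more than it lifts the return itself.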


Breaking this down even further, Exabel’s platform can help you parse out the alpha from the strategy, distinguishing between the returns attributable to the visits and card data signals (i.e. ‘alpha’) and those driven largely by exposure to certain factors.

An investor can use the platform to distinguish between the returns attributable to alpha vs those driven by exposure to certain factors
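
The idea of separating alpha from factor exposure can be illustrated with a single-factor regression. The Python sketch below regresses a strategy's returns on one factor's returns: the slope is the factor exposure (beta) and the intercept is the average return the factor does not explain. This is an assumption-laden simplification rather than a description of the platform, which the post does not detail. A real attribution would typically use several factors, and both the return series and the `regress` helper are hypothetical.

```python
# Illustrative sketch: invented returns, single hypothetical factor.

def regress(y, x):
    """OLS of y on one factor x; returns (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Hypothetical per-period returns for a factor and for the strategy.
factor_returns = [0.010, -0.005, 0.007, -0.012, 0.004, 0.009]
strategy_returns = [0.0075, -0.0008, 0.0057, -0.0044, 0.0043, 0.0062]

alpha, beta = regress(strategy_returns, factor_returns)

# Split each period's return into the part explained by the factor
# and the remainder (alpha plus unexplained noise).
factor_part = [beta * f for f in factor_returns]
signal_part = [r - fp for r, fp in zip(strategy_returns, factor_part)]

print(f"beta (factor exposure): {beta:.2f}")
print(f"alpha per period:       {alpha:.4%}")
print(f"total return explained by factor: {sum(factor_part):.4%}")
print(f"total return left to the signal:  {sum(signal_part):.4%}")
```

A strategy whose returns vanish once the factor part is stripped out was mostly riding that factor. One that keeps a positive remainder is showing evidence that the data signals themselves add value.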


Bringing this analysis full circle, Exabel enables you to track the data, alpha signals, and KPI predictions in easy-to-navigate dashboards on an ongoing basis. Scaffolding data, alternative data and quantitative modelling predictions are all updated in real time, allowing an investor not only to build the mosaics, but also to keep track of them on a daily basis. This way it is possible to generate ideas and act on the alternative datasets as soon as the data is processed. Investors can easily identify opportunities, be they mismatches between Wall Street expectations and the predictions or simply alpha signals that help to capture inflections in the data.


Exabel allows a user to put the whole data mosaic in a dashboard: scaffolding data, alternative data and quantitative modelling predictions are all updated in real time


As the alternative data market matures and more datasets are introduced, it will become increasingly complex to identify and extract ideas from the data. The challenges that were previously insurmountable will become the baseline expectation. Today, the edge lies in constructing effective models across multiple alternative data sources, and with Exabel, building a data mosaic can be an integral part of every investment team’s process.

About Exabel

Exabel was founded in Oslo in 2016 by Øyvind Grotmol, a brilliant serial winner of international competitions in maths, physics and programming. In fact, those competitions proved a fertile recruiting ground for Exabel’s early team, many of whom still make up the company’s core, and who have now been supplemented by world-class talent from the investment management, product and technology domains.

Five years on, the Exabel team have created a flexible and intuitive technology platform to simplify both the provider and consumer sides of the alternative data market. Data vendors use the platform to build and distribute easy-to-consume branded data insight offerings on top of their raw data assets. Investment teams can consume these insights and use Exabel’s powerful machine learning analytics and modelling toolset to further refine, combine and evolve them. Exabel creates the zone in which ‘filling the Alt-Data mosaic’ can become reality.

by Mark Fleming-Williams, Host of The Alternative Data Podcast.

Exabel is a financial technology company based in Oslo, London, and New York.