The Alternative Data Discovery Mechanism

The alternative data market seems awash with data marketplaces. For every giant cloud provider looking to increase client traction with a marketplace service, there’s a nimble new company striving to provide the perfect environment for a buyer to match with the ideal dataset for its needs. Which solution will ultimately prove most efficient remains to be seen, but the prevailing trend is evidence of an alternative data market striving to evolve. Such developments also go hand in hand with the increasing importance of the companies providing enabling technology and services that help smooth the process.

A short history of data discovery

To understand this latest development in alternative data discovery, it is useful to examine where it has come from, and that means going back at least twenty years.

The earliest forms of alternative data were almost certainly DIY. At the end of the last millennium, computer-savvy hedge funds were exploring the fast-emerging opportunities on the internet and finding that companies were being remarkably naive about what they were sharing online. A former investor told me how, by scraping the website of the PC retailer Dell, he was able to pick apart its business model and make remarkably precise predictions about its quarterly sales figures.

The ecosystem then evolved in the early 2000s, as rudimentary providers emerged and companies began to intermediate this new market. These earliest purveyors of alternative data would follow the traditional equity research model by buying up raw datatypes – one example I heard was a freshman experimenting with scraped Craigslist data – and crunching them for insights, before selling the resulting signals as research to a hungry hedge fund market.

The following decade saw the arrival of the alternative data catalogue model. Entrepreneurs saw rising demand for raw data from a growing number of hedge funds that wanted to extract their own signals rather than having them curated by an intermediary. These new players developed a relationships-based business model in which buyers and sellers paid to be introduced to one another, often around conference events.

The latest innovation: the data marketplace

Now in the 2020s, the wheel is turning once again, and the latest innovation is the alternative data marketplace. The vast number of alternative datasets on offer, together with the technology now available, has led to the creation of marketplaces that aim to bring new efficiencies to the discovery process, pairing up the hungry buyer with the perfect dataset.

There are different players in this market, with different business models. Some nimble, freshly created companies are seeking to automate the process provided by the alternative data catalogue companies, i.e. making datasets searchable and extracting a fee for each buyer-seller introduction. Another increasingly common form of data marketplace is that offered by the large cloud providers. These behemoths are competing with one another to be the one-stop shop for all corporate remote data needs, and the provision of marketplace services – in some cases simply bringing together two existing clients – helps to make their offering more vibrant and compelling, covering more of a client’s requirements and tying the client ever more closely to the provider.

New solutions, familiar problems

No matter how efficient and advanced these new marketplaces may be, however, a substantial amount of baggage is still involved in buying an alternative dataset – details that have not yet been resolved. Like the data catalogue companies that originated in the decade before, data marketplaces are still largely about making an introduction, leaving the buyer and seller to work out the rest of the process by themselves.

An introduction would be sufficient if buying an alternative dataset were just a ‘point and click’ transaction, but that is not yet the case. Even though prices have been dropping in recent years, alternative datasets still command a hefty premium, meaning the decision to buy is not one that is taken lightly. That decision is made harder by the difficulty of discovering whether a given dataset actually contains value. Even after a buying decision is made, the logistics involved in connecting and routinely operating the data pipelines between buyer and seller can be tortuous and filled with hazards.

An investor seeking to assess value must find answers to the following:

· Is this dataset potent – i.e. correlated to market metrics of interest?

· Is this dataset relevant – i.e. does it tell me anything about specific questions I am interested in?

· Does this data add to my existing point of view – i.e. does it tell me anything I don’t already know?

· Can combining this data with other datasets tell me something new – i.e. if I combine it with other signals, can I find new, unique insights? 
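The first and third of these questions can be made concrete with a simple correlation check. What follows is a minimal sketch in Python, not any vendor's actual method: the series names and every number in it are hypothetical, invented purely for illustration. It measures how strongly a vendor signal tracks a reported metric, and then how strongly it tracks the part of that metric that a signal the investor already owns fails to explain.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after a one-variable least-squares fit on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit on a constant series")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


# Hypothetical quarterly figures, invented for illustration only.
reported_growth = [0.04, 0.01, -0.02, 0.03, 0.05, 0.00, 0.02, -0.01]
vendor_signal = [0.9, 0.3, -0.5, 0.6, 1.1, 0.1, 0.2, -0.4]
existing_signal = [0.5, 0.2, -0.1, 0.1, 0.6, 0.2, 0.3, 0.0]

# "Potent": does the vendor signal move with the metric of interest?
potency = pearson(vendor_signal, reported_growth)

# "Adds to my point of view": does it track what the existing signal misses?
unexplained = residuals(reported_growth, existing_signal)
incremental = pearson(vendor_signal, unexplained)

print(f"potency:     {potency:+.2f}")
print(f"incremental: {incremental:+.2f}")
```

A high first number with a second number near zero would suggest the dataset is potent but largely redundant with what the investor already holds. Eight data points are far too few to conclude anything in practice; the sketch only shows the shape of the test.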

Then there is the problem of data connections. With the alternative data market still in its infancy, there is a lack of standardization when it comes to data types and delivery methods. Linking up the data can take several months, and the connection remains fraught with risk even once it is up and running. And again, this steep data access curve must be climbed whether the dataset is a purchased, routinely delivered one or a new trial dataset under evaluation.


Then a hero comes along

This is where enablement service companies enter the fray. The problem faced by alternative data buyers is not just one of pure discovery, but also of deciding whether the dataset on offer will provide the value the buyer is seeking. Then there is the issue of data delivery and operations – an investment management company is not necessarily equipped to cope with the logistical issues involved in getting vast quantities of data from A to B.

Exabel solves the evaluation problem. Exabel is a software platform specifically built for investment teams to make the most of alternative data. An investor can ingest a sample of vendor data into Exabel’s web-based app and proceed to answer all of the questions above ‘at arm’s length’, testing existing investment strategies and/or using the data to develop new ones. With this platform a buyer can become entirely comfortable with the value that exists in the data, and the potential it has to deliver alpha-generating insights, perhaps in combination with other datasets.

Crux Informatics was created to simplify data integration (broadly, the process of data extraction, transformation and loading, or ‘ETL’). Crux helps companies integrate external (third-party) and related data sources and deliver them to their destinations, enabling them to operate in a cloud-native world with analytics-ready data. Crux is uniquely positioned because of its integrated delivery destinations (including GCP, Amazon, Azure, Snowflake) and the depth of ready-made pipelines on its platform (14K+ datasets, 140+ suppliers).

Companies such as Exabel and Crux are springing up and working together to form a networked ecosystem. Taken together, these partnerships improve the end-to-end data supply chain and the customer’s experience of evaluating, buying and consuming data.


As alternative data continues along its journey to universal adoption, efficiencies are increasing and inherent challenges are being smoothed. Each decade the market has thrown up new ways of answering the data discovery question, with the data marketplace being the latest. But as this evolution progresses, companies are also arriving to solve complementary problems involved in the buying and using of the data. In this way, one by one, the barriers to alternative data’s future growth are being removed.

by Mark Fleming-Williams, Host of The Alternative Data Podcast.

Exabel is a financial technology company based in Oslo, New York and London.
