Morelix: The future of entrepreneurship data – getting to know CrunchBase

Editor’s note: This post was republished with permission from the Ewing Marion Kauffman Foundation’s Policy Dialogue blog.

Here at Kauffman, we are actively thinking about ways in which we can better measure entrepreneurship activity and ecosystems.

In many ways, entrepreneurship data beyond the traditional public and private data sources is still in its relative infancy, and researchers are still learning how to use sources such as social media, crowdsourced, and news-based data.

As one of our forays into exploring the future of entrepreneurship data, we partnered with CrunchBase for a session in San Francisco last month.

Getting to know CrunchBase

Last month, during the American Economic Association (AEA) annual meeting in San Francisco, we held a session with our friends at CrunchBase.

The session covered insights from the CrunchBase team on how the data is assembled and how it can be used; Kauffman’s perspective on emerging datasets like CrunchBase and thoughts on further exploration and funding; a discussion of the advantages and constraints of the data; and presentations by academic and industry users of the data.

About CrunchBase

CrunchBase is a leading platform to discover innovative companies and the people behind them. The CrunchBase Dataset is constantly expanding through contributions from their community of users, investment firms, and network of global partners. It now covers millions of users and businesses around the world. CrunchBase also has an open-access data license available to academic users.

About the session

Below is the agenda for the session, with slide decks and/or working papers shared when possible.

Welcome and Introduction

Arnobio Morelix, Sr. Research Analyst & Program Officer @ Kauffman Foundation

EJ Reedy, Director and Program Officer @ Kauffman Foundation

An Inside Look at CrunchBase Dataset

Gené Teare, Director of Content @ CrunchBase

Comparing CrunchBase Fundraising Data Against Another Source

Yas Motoyama, Director of Research and Policy @ Kauffman Foundation

Very Early Venture Finance: Pitch Competitions and their Judges

Sabrina Howell, Assistant Professor of Finance @ New York University Stern School of Business

Organizational Decision-Making and Information: Angel Investments by Venture Capital Partners

Andy Wu, PhD Candidate in Applied Economics @ University of Pennsylvania and Founding Director and Investor @ Identified Technologies

Ecosystem Attraction Metrics

J-F Gauthier, CFO & Head of BizDev @ Startup Compass Inc.

How Do Accelerators Impact High-Tech Ventures?

Sandy Yu, Postdoctoral fellow @ UC Berkeley | Coleman Fung Institute

Thoughts on the data

One of the main strengths of CrunchBase is, in my opinion, the fact that they have data on both the people (e.g., founders, employees, investors) and the companies. This allows data users to get at information that is not easily accessible elsewhere, such as the connections among different ecosystem players.
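To illustrate the kind of connection analysis that combined people-and-company data makes possible, here is a minimal Python sketch. The records, names, and the `company_links` helper are all made up for illustration; they do not reflect the actual CrunchBase schema or any real data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, company, role) records.
# These are NOT real CrunchBase data and do not mirror its schema.
affiliations = [
    ("Alice", "AlphaCo", "founder"),
    ("Alice", "BetaFund", "investor"),
    ("Bob", "BetaFund", "investor"),
    ("Bob", "GammaLabs", "employee"),
    ("Carol", "GammaLabs", "founder"),
]


def company_links(records):
    """Map each pair of companies to the set of people affiliated with both."""
    # Group companies by the person connected to them.
    companies_by_person = defaultdict(set)
    for person, company, _role in records:
        companies_by_person[person].add(company)

    # Any two companies that share a person are considered linked.
    links = defaultdict(set)
    for person, companies in companies_by_person.items():
        for pair in combinations(sorted(companies), 2):
            links[pair].add(person)
    return dict(links)


links = company_links(affiliations)
for (first, second), people in sorted(links.items()):
    print(f"{first} <-> {second} via {', '.join(sorted(people))}")
```

In this toy example, AlphaCo and BetaFund are linked through Alice, and BetaFund and GammaLabs through Bob. With a dataset that only covered companies, neither link would be visible.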

Like any dataset, it has limitations. One of the main limitations for academic research is that the information they are reporting is typically private, and we do not fully understand potential reporting biases. Usually, when that type of private information is made public, it is because of strategic reasons for the parties involved (e.g., a startup wants to show traction).

Andy Wu, PhD student at Wharton, sums it up well:

“The primary challenge for using Crunchbase is that we don’t fully understand the extent of missing data and more broadly the limitations for crowdsourced data. I suspect that we are missing a huge amount of data on the smallest investment events that go undisclosed without press releases or without SEC Form D filings; to be fair, this is a huge problem with all datasets in entrepreneurial finance. Furthermore, since the data is continually being backfilled, there is an implicit selection bias towards the inclusion of the most successful firms that are easiest to find historical information about.

Regardless, Crunchbase is definitely something for all entrepreneurship researchers to keep an eye on.”

How to access CrunchBase data

Just yesterday CrunchBase launched a new way of accessing their data, and you can learn more about it here. Also, if you are a researcher at a university, you can get free access to all their data here.

Summing it up

CrunchBase is a really exciting dataset for entrepreneurship researchers – even though we are still learning what its main strengths and constraints are.

If you are a researcher using the data and would like to share your thoughts on it or propose ways in which we can better understand and augment the data, I’d love it if you could let me know here.

Arnobio Morelix, Ewing Marion Kauffman Foundation senior research analyst | Courtesy of the Kauffman Foundation

Arnobio Morelix is a senior research analyst and program officer in Research and Policy at the Ewing Marion Kauffman Foundation, where he is a principal investigator on the Kauffman Index of Entrepreneurship, the first and largest index tracking entrepreneurship across city, state, and national levels.

Morelix is also an editor of Kauffman’s entrepreneurship research blog, which is syndicated by Missouri Business Alert.

Find Morelix on Twitter: @arnobiomorelix.

