Machine Learning Models to Investigate Startup Success in Venture Capital Using Crunchbase Dataset
Conference proceedings article
Publication Details
Author list: Laxman Gautam and Naruemon Wattanapongsakorn
Publication year: 2024
URL: https://jcsse2024.computing.psu.ac.th/
Languages: English-United States (EN-US)
Abstract
The venture capital (VC) industry offers opportunities for early-stage investment in startup enterprises, which are characterized by elevated levels of uncertainty and risk. VCs prioritize high-risk, high-return ventures, aiming for rapid expansion despite the potential for swift downturns. Several venture funds like Google and Facebook, and investment funds like SoftBank and Vision Funds Manager, rely heavily on algorithms to support their decision-making with high accuracy. However, the tools available are either inaccessible to most investors or insufficient to help them manage their funds and investments effectively while minimizing risk. This gap can be narrowed with data-driven machine learning techniques that uncover hidden patterns through analysis of the application's central data. In our research, we apply machine learning techniques to the Crunchbase (CB) enterprise dataset, forecasting future success from various critical characteristics and features in order to identify potentially successful businesses. We analyze the dataset with Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Logistic Regression (LR) classifiers. The models achieve promising results: XGBoost and LightGBM perform best on target class prediction, each scoring 79-80% accuracy and an F1 score of 86%. The dataset covers 2 million companies, and the analysis reveals predictive insights into company status. Our model offers valuable support to venture investors in their decision-making process.
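The abstract reports two metrics whose values differ noticeably (accuracy of 79-80% against an F1 score of 86%). The minimal sketch below shows how both are computed from a binary confusion matrix and why the F1 score of the positive class can sit above accuracy when that class is the majority. The function name `accuracy_and_f1` and the counts are hypothetical and chosen only for illustration; they are not taken from the paper or from the Crunchbase dataset.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 of the positive class) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts for 1,000 companies (assumption, not the paper's data):
# the positive class is the majority, so F1 ignores the weaker
# true-negative performance that pulls accuracy down.
tp, fp, fn, tn = 650, 130, 80, 140
accuracy, f1 = accuracy_and_f1(tp, fp, fn, tn)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")  # accuracy=0.790, f1=0.861
```

Because F1 depends only on true positives, false positives, and false negatives, it should be read alongside accuracy and the class balance rather than on its own.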
Keywords
Decision support system, Machine Learning