Data science is a new field of expertise that uses computers to extract information from data through machine learning techniques, a subfield of artificial intelligence. One of the largest problems today for the uninitiated is the threat of vaporware: the selling of an idea rather than a solution. Given the novelty of the field, and knowing there are people out there selling quick-fix remedies to complex problems, how can one best distinguish between the ‘too good to be true’ and the ‘state-of-the-art’ solutions? (State of the art is sometimes abbreviated ‘SoA’, but ‘SOTA’ is more common in IT, where ‘SOA’ could be confused with service-oriented architecture.)
For investors attempting to de-risk a startup or understand its special sauce, the best way to see through possible vaporware is to look at the data being used. Ask yourself whether the data used to build the machine learning model could plausibly capture information about the process the startup is selling. In the end, artificial intelligence is simply math applied to data, so it is up to the startup to come up with sufficient evidence to convince you that the data contains that information. If you have doubts, you can ask them to show you which features are driving the model, or what the computer is looking for in an image or video when it makes its decision.
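One common way to answer the "which features are driving the model" question is permutation importance: shuffle one feature at a time and measure how much the model's accuracy drops. The following is a minimal sketch using only the Python standard library. The toy data, the `toy_model` function, and the feature names are hypothetical and stand in for whatever model and dataset a startup would actually present.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: List[Row], y: List[int]) -> float:
    """Fraction of rows where the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return correct / len(y)


def permutation_importance(
    predict: Callable[[Row], int],
    X: List[Row],
    y: List[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, which breaks its link to the label.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [
                list(row[:j]) + [value] + list(row[j + 1:])
                for row, value in zip(X, column)
            ]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature_0 determines the label, feature_1 is noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]


def toy_model(row: Row) -> int:
    """Stand-in for a trained model; it only looks at feature_0."""
    return 1 if row[0] > 0.5 else 0


scores = permutation_importance(toy_model, X, y)
for name, score in zip(["feature_0", "feature_1"], scores):
    print(f"{name}: importance {score:.3f}")
```

A feature whose shuffling barely changes accuracy is contributing little to the model. If the features a vendor claims are important show no such drop, that is a reason to ask more questions.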
For companies contemplating a merger with or acquisition (M&A) of a technology company, in addition to understanding the data used to generate its fantastical results, we advise signing an NDA to review the source code. There is a large difference between a proof of concept (POC) or minimum viable product (MVP) and a production-ready, scalable solution. For example, many initial iterations will not have fully functional, integrated version control, and cannot be scaled to tens of thousands of users without a complete rewrite or refactoring of the code. Data scientists are not data engineers or developers, and the risk companies run when de-risking or auditing a startup is assuming that the POC or MVP will translate directly into an enterprise solution. If the valuation of the company relies on data, the bias of that data also needs to be assessed. For example, in the healthcare industry or when using biometric information, such as in facial recognition, is the data sufficiently representative of your target population, or was it only captured from Western Europeans? Will the company be labeled racist when its algorithm is found to be unable to distinguish between Asian faces?
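A first, rough check of representativeness is to compare each group's share of the dataset with its share of the target population. The sketch below is a minimal illustration under stated assumptions: the group labels, the target shares, and the 10-percentage-point tolerance are all hypothetical, and a real audit would also compare model performance per group rather than head counts alone.

```python
from collections import Counter
from typing import Dict, List


def representation_gaps(
    sample_groups: List[str], target_shares: Dict[str, float]
) -> Dict[str, float]:
    """Share of each group in the sample minus its share in the target population."""
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in target_shares.items()
    }


def underrepresented(gaps: Dict[str, float], tolerance: float = 0.10) -> List[str]:
    """Groups whose sample share falls short of the target by more than `tolerance`."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


# Hypothetical dataset: one group label per record in the training data.
sample = ["group_a"] * 90 + ["group_b"] * 8 + ["group_c"] * 2

# Hypothetical target population shares (assumed, not measured).
target = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}

gaps = representation_gaps(sample, target)
flagged = underrepresented(gaps)

for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
print("Underrepresented:", flagged)
```

Here `group_b` and `group_c` are flagged because their share of the data falls well below their share of the assumed target population, which is the kind of gap to raise before relying on the data for a valuation.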
For companies with digital exhaust (data generated by existing solutions) and little experience with data or analytics, how do you rank external vendors? Every vendor has its own proprietary software to slice and dice data, but in the end many are software developers and not data scientists, so what they are using is likely a cookie-cutter solution wrapped in a shiny interface. The risks range from small vendors that will not do any of the data preparation needed for machine learning to big players looking to take your data and then create a solution that directly competes with a POC you paid to develop. For example, several fly-by-night vendors will advertise automated machine learning solutions that provide you with the best-fitting model very quickly. What they will not do is the 80% of the work required to prepare the data, nor will they provide the efficiency that lets you leverage your dataset in a reasonable timeframe, i.e. seconds rather than weeks. On the other hand, many of the big names will propose non-exclusive licensing of any derived works from the project, such as other products developed with your data that may directly compete with your product. Others will deliver results without sharing any of the software source code, locking you into that vendor for the entire lifecycle of the product. A good external vendor is one that provides all work generated during a project (source code, results, etc.), makes no claim on the licensing or use of derived works, since that would make it a collaborator rather than an external vendor, and expects only payment for its services. The data is where the value lies, and there are many vendors out there wanting to double-dip by taking payment for their services as well as the value of the data you provide them.
There are many other pitfalls to watch out for in the new field of artificial intelligence, machine learning, and data science. If you feel uncomfortable or have questions, do not hesitate to reach out.