Why Machine Learning Projects Are Icebergs

In machine learning, the allure usually lies in the dazzling advances: the latest models, cutting-edge techniques, and breakthrough success stories. Focusing on them is like admiring the tip of an iceberg while ignoring the far larger mass beneath the surface.

Machine learning projects require an immense amount of support work: data annotation, data validation, drift detection, model evaluation, and model version management, to name a few. These foundational tasks are seldom highlighted in mainstream blogs and LinkedIn posts, yet they are essential to the success of any ML project.

The majority of required ML work is unseen


Production machine learning systems demand far more than their visible, glamorous parts. The labor that supports them rarely grabs headlines, but the systems do not work without it. This is why I liken ML projects to icebergs: end users rarely see or acknowledge the vast amount of behind-the-scenes work that goes into a successful ML project.

LLMs made the iceberg deeper


Large language models (LLMs) have made the iceberg even larger. With LLMs, model evaluation is more complex, fine-tuning is more intricate, and deployment requires more sophisticated infrastructure. Shipping an LLM-based product is far more than loading a new foundation model in a notebook; it is a multifaceted engineering effort.

Feeling overwhelmed by the scope of work? You might consider closed-source model APIs such as OpenAI's, which handle many of these support tasks for you. However, there is a significant trade-off. If your competitive advantage rests on prompts to a model your competitors can call just as easily, your "iceberg" may melt away quickly, much like an ice cream cone on a hot day: unpredictably and messily.

Model APIs can shrink your iceberg into an ice cream cone


Running open-source models brings its own challenges, but the rewards are substantial. It lets you safeguard your intellectual property and, more importantly, protect your clients’ data. This approach offers a more durable competitive edge and helps your iceberg hold up against the competition.

Follow our blog as we dig into how we tackle our own machine learning icebergs. We will share the insights, strategies, and experiences behind the less visible but critical parts of ML projects, and show what it really takes to succeed in this fast-moving field.