I spent the better part of two years at iPromoteu trying to solve what looked like a data problem.
The promotional products industry runs on a set of standards from a consortium called PromoStandards: eight API standards covering product data, pricing, inventory, order status, and more. The goal was to create a common language between suppliers and distributors. And it worked, mostly. We had integrations with roughly ninety vendors. Manual integrations, one at a time, each one a small project unto itself.
The ceiling was obvious. You cannot manually integrate your way to a thousand vendors. You need a different architecture entirely.
So we started designing a data lake. And that is when I learned the lesson that changed how I think about data infrastructure: the data lake was never about storing data. It was about preserving relationships.
What a Data Lake Actually Solves
Here is the problem with point-to-point integrations at scale. Every time you connect Vendor A to Distributor B, you are encoding a relationship in code. That relationship has assumptions baked into it: about field names, about data types, about what "in stock" means versus "available to order." When Vendor A changes their API, those assumptions break. When you have ninety vendors, you have ninety sets of fragile assumptions.
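To make that concrete, here is a sketch of what those encoded assumptions look like. The vendor payloads and field names below are invented, but the shape of the problem is real:

```python
# Hypothetical per-vendor adapters: each one hard-codes a single
# vendor's field names and, worse, that vendor's semantics.

def vendor_a_in_stock(item: dict) -> bool:
    # Vendor A reports a numeric on-hand quantity.
    return item["qtyOnHand"] > 0

def vendor_b_in_stock(item: dict) -> bool:
    # Vendor B reports a status string, and "AVAILABLE" quietly includes
    # items that are merely available to order, not sitting on a shelf.
    return item["availability"] == "AVAILABLE"

# Multiply by ninety vendors, and every function like these is a fragile
# assumption waiting for an API change to break it.
```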
A data lake changes the architecture. Instead of encoding relationships in integrations, you encode them in a schema. The data flows in raw. The relationships are defined once, at the schema level, and every downstream consumer benefits from that definition.
But here is what most data lake projects miss: the schema is the relationship. When you define how a "product" relates to a "supplier" relates to a "price break" relates to a "decoration method," you are not just organizing data. You are building a model of how the industry actually works.
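Here is a minimal sketch of what that can look like, assuming Python dataclasses as the modeling tool. The entities mirror the ones just named; every field name is illustrative, not our actual schema and not the PromoStandards spec:

```python
from dataclasses import dataclass, field

@dataclass
class Supplier:
    supplier_id: str
    name: str

@dataclass
class PriceBreak:
    min_quantity: int
    unit_price: float  # supplier net, which is not distributor net

@dataclass
class DecorationMethod:
    name: str            # e.g. "screen print", "embroidery"
    lead_time_days: int
    price_breaks: list[PriceBreak] = field(default_factory=list)

@dataclass
class Product:
    sku: str
    supplier: Supplier   # every product belongs to exactly one supplier
    decoration_methods: list[DecorationMethod] = field(default_factory=list)

# The relationships are declared once, here. Pricing, search, and order
# entry all inherit the same model instead of re-deriving it per vendor.
```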
The PromoStandards Gap
PromoStandards gave us a vocabulary. Eight standards, each covering a different type of transaction. But vocabulary is not grammar. Knowing what a "product" is does not tell you how a product relates to a decoration method, or how a decoration method relates to a lead time, or how a lead time relates to a rush order surcharge.
That grammar, the relational logic that connects the standards, was what the data lake needed to encode. And encoding it required something that no API spec can give you: domain expertise.
You have to know that a distributor's "net price" is not the same as a supplier's "net price." You have to know that "in stock" means something different for a promotional product that needs decoration than it does for a commodity item that ships as-is. You have to know that the same SKU can have fifteen different decoration options, each with its own pricing matrix, each with its own lead time, and that the relationship between those options is not additive: it is conditional.
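To show what "conditional, not additive" means, here is a sketch that builds on the dataclasses above. The rush-surcharge rule is invented for illustration, but it is exactly the shape of logic that only a domain expert can hand you:

```python
def quote(product: Product, method_name: str, quantity: int,
          rush: bool = False) -> float:
    method = next(m for m in product.decoration_methods
                  if m.name == method_name)
    # Take the cheapest price break the quantity qualifies for.
    eligible = [b for b in method.price_breaks if quantity >= b.min_quantity]
    if not eligible:
        raise ValueError(f"{quantity} is below the minimum order quantity")
    total = min(b.unit_price for b in eligible) * quantity
    if rush:
        # Conditional, not additive: the surcharge depends on which
        # decoration method was chosen, not on the order alone.
        total *= 1.25 if method.lead_time_days > 7 else 1.10
    return total

mug = Product(
    sku="MUG-11OZ",
    supplier=Supplier("S-001", "Acme Drinkware"),
    decoration_methods=[
        DecorationMethod("screen print", lead_time_days=10, price_breaks=[
            PriceBreak(72, 3.10), PriceBreak(288, 2.60)]),
    ],
)
print(quote(mug, "screen print", quantity=300, rush=True))  # 2.60 * 300 * 1.25 = 975.0
```

The 1.25 multiplier is not the point. The point is that nobody outside the industry would know to ask whether rush pricing depends on the decoration method at all.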
That knowledge does not live in documentation. It lives in the heads of people who have been in the industry for twenty years.
What This Means for Product Managers
If you are building a data platform in any industry with complex relational logic (automotive, healthcare, financial services, promotional products), the technical architecture is the easy part. The hard part is the domain modeling.
Before you write a single line of ETL code, you need to answer these questions: What are the entities in this domain? What are the relationships between them? What are the rules that govern those relationships? And critically: where do those rules live today, and how do you extract them?
In most organizations, the rules live in spreadsheets, in the institutional memory of long-tenured employees, and in the workarounds that people have built because the system does not handle edge cases correctly.
Your job as a PM is to surface those rules before the engineers start building. Because once the schema is set, changing it is expensive. And a schema built on incomplete domain knowledge will produce a data lake that stores everything and understands nothing.
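What does a surfaced rule look like once it is written down? Something like the sketch below, where a distinction that used to live in a veteran's head becomes an explicit, testable function. The discount codes and margin are invented; the supplier-versus-distributor net price distinction is the real rule from earlier:

```python
# Hypothetical discount codes mapped to multipliers off list price.
DISCOUNTS = {"A": 0.50, "B": 0.45, "C": 0.40}

def net_price(list_price: float, discount_code: str, *, perspective: str) -> float:
    """Supplier net and distributor net are different numbers.

    Encoding the distinction here, once, beats letting each
    integration guess which "net price" a vendor meant.
    """
    supplier_net = list_price * DISCOUNTS[discount_code]
    if perspective == "supplier":
        return supplier_net
    if perspective == "distributor":
        # Invented rule: distributor net carries a handling margin.
        return round(supplier_net * 1.05, 2)
    raise ValueError(f"unknown perspective: {perspective!r}")

assert net_price(10.00, "C", perspective="supplier") == 4.00
assert net_price(10.00, "C", perspective="distributor") == 4.20
```

The code is trivial. Surfacing the rule behind it is the work.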
The Lesson I Carry Forward
The data lake project at iPromoteu taught me that data infrastructure is really a knowledge management problem. The question is not "how do we store this data?" The question is "how do we encode what we know about this industry into a structure that a machine can use?"
That reframe changes everything. It changes who you talk to first (domain experts, not engineers). It changes what you build first (the schema, not the pipeline). And it changes how you measure success (not by the number of vendors integrated, but by the quality of the relationships encoded).
Data without relationships is just storage. Relationships without data are just theory. The data lake is where they meet, and getting that meeting right is the whole job.