There is a seductive moment in every data platform project when the pipeline starts working. Data flows in. The tables fill up. The dashboards light up with numbers. And everyone in the room feels like the hard part is done.
It is not. The hard part has not started.
Moving data is an engineering problem. Creating value from data is a product problem. And in my experience, most data platform projects confuse the two.
What the Pipeline Actually Does
A data pipeline moves data from a source to a destination. It handles extraction, transformation, and loading. It manages errors, retries, and schema changes. It is genuinely complex engineering work, and good pipeline engineers are worth their weight in gold.
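To make that concrete, here is a minimal sketch of the ETL loop in Python. The names (fetch, transform, load) are hypothetical stand-ins, not a real framework API:

```python
import time

def run_pipeline(fetch, transform, load, max_retries=3):
    """Extract, transform, and load one batch, retrying transient failures."""
    for attempt in range(1, max_retries + 1):
        try:
            records = fetch()                       # extract from the source
            rows = [transform(r) for r in records]  # reshape for the destination
            load(rows)                              # load into the warehouse
            return len(rows)
        except ConnectionError:
            if attempt == max_retries:
                raise
            time.sleep(2 ** attempt)  # back off before retrying
```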
But the pipeline does not decide what data to move. It does not decide how to transform it. It does not decide what questions the data needs to answer. Those are product decisions, and they need to be made before the pipeline is built, not after.
When I was working on the NexusStream data lake at iPromoteu, we made the classic mistake. We started with the pipeline. We said: "Let us get all the PromoStandards data into a central repository, and then we will figure out what to do with it."
The result was a repository full of data that was technically correct and practically unusable. We had product data from ninety suppliers. We had pricing data. We had inventory data. But we could not answer the question that mattered most to the business: "Which supplier should I use for this order, given these constraints?"
That question required relationships that the pipeline had not been designed to preserve. It required a schema that encoded the business logic of sourcing decisions. And it required a product layer that translated raw data into actionable recommendations: an interface, a query model, a set of use cases.
We had a pipeline. We did not have a product.
The Use Case Inversion
The fix was to invert the process. Instead of starting with the data and asking "what can we do with this?", we started with the use cases and asked "what data do we need to support these?"
The use cases were specific: a distributor needs to find three suppliers who can fulfill a rush order for a specific product category, with decoration, at a specific price point, with a minimum order quantity under fifty units. That is a sourcing decision. It requires data about supplier capabilities, lead times, pricing tiers, and decoration methods, and it requires that data to be structured in a way that makes the query fast and the result trustworthy.
Once we had the use cases, the schema became obvious. And once the schema was obvious, the pipeline requirements became specific. We knew exactly what data to extract, how to transform it, and what relationships to preserve.
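Here is a minimal sketch of what that inversion looks like in Python. The names and thresholds (SupplierOffer, find_suppliers, the seven-day lead time) are illustrative assumptions, not the actual NexusStream schema:

```python
from dataclasses import dataclass

@dataclass
class SupplierOffer:
    supplier_id: str
    category: str             # product category
    decoration_methods: set   # e.g. {"embroidery", "screen_print"}
    unit_price: float         # price at the relevant quantity tier
    min_order_qty: int
    lead_time_days: int

def find_suppliers(offers, category, decoration, max_price,
                   max_moq=50, max_lead_days=7, limit=3):
    """The sourcing question: which suppliers can fulfill a rush
    order for this category, with this decoration, under these
    price, quantity, and lead-time constraints?"""
    matches = [
        o for o in offers
        if o.category == category
        and decoration in o.decoration_methods
        and o.unit_price <= max_price
        and o.min_order_qty <= max_moq
        and o.lead_time_days <= max_lead_days
    ]
    # Rank by lead time first (it is a rush order), then by price.
    matches.sort(key=lambda o: (o.lead_time_days, o.unit_price))
    return matches[:limit]
```

Every field in SupplierOffer exists because the query needs it. That is what "the schema became obvious" means in practice: the use case tells you exactly which attributes and relationships the pipeline must deliver.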
The pipeline got simpler. The product got better.
What This Means for Data PMs
If you are a PM on a data platform, your job is not to own the pipeline. Your job is to own the use cases. You need to be able to answer: What decisions does this data need to support? What questions does it need to answer? What would a user do differently if they had this data versus if they did not?
Those answers drive the schema. The schema drives the pipeline. The pipeline drives the infrastructure.
Work in that order, and you build a data product. Work in the reverse order, and you build a data repository, which is a different thing, and a less valuable one.
The pipeline is not the product. The product is the decision it enables.