MLflow on production-near operations data — what changes for data science

What data scientists get in most companies isn’t what they need: sample exports, anonymized snapshots, stale CSVs from the data warehouse team. Reality sits three data hops away. MLflow on a shared operations data foundation changes that, moving from “here is a sample” to “work directly on production data, in a controlled environment”.

Why sample exports are an anti-pattern

Data drift stays invisible. A sample exported six months ago doesn’t represent today’s distribution. A model trained on that sample behaves differently in deployment: sometimes visibly worse, sometimes in ways that are hard to notice.

Edge cases disappear. Sampling smooths reality. Exactly the rare incidents an operational model should detect are usually missing from a 5% sample — or so rare that the model doesn’t learn them.
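
As a rough illustration of how quickly rare incidents drop out of a 5% sample, the following sketch computes the chance that none of them are included. It assumes rows are sampled uniformly and independently, which real export jobs may not do; the function names and the figure of 10 incidents are hypothetical.

```python
def prob_all_missed(n_incidents: int, sample_rate: float) -> float:
    """Probability that a uniform random row sample contains none of the incidents.

    Assumes each row is kept independently with probability `sample_rate`.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return (1.0 - sample_rate) ** n_incidents


def expected_in_sample(n_incidents: int, sample_rate: float) -> float:
    """Expected number of incidents that survive the sampling step."""
    return n_incidents * sample_rate


# Hypothetical case: 10 rare incidents in the full data set, 5% sample.
print(round(prob_all_missed(10, 0.05), 3))   # chance the sample holds no incident at all
print(expected_in_sample(10, 0.05))          # incidents expected in the sample
```

Under these assumptions, roughly six out of ten such samples contain none of the ten incidents, and the rest contain too few to learn from.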

The transition to production is a break. The model is trained on a sample and then released on production data, and only then do the problems show: schemas have shifted slightly, feature distributions differ, edge cases are not covered. Three iterations later, something works that could have worked on the first attempt.

Audit reproducibility is missing. Exactly which data was in place when model version 3.2 was trained? An answer like “we have the snapshot from May” doesn’t hold up in an audit, especially when the model supports decisions with regulatory relevance.

How the workflow changes with a production-near data foundation

Data scientists work directly on the shared data foundation. A pre-integrated browser IDE plus MLflow for experiment tracking, model registry and training runtime all run on the same data that flows through operations. No data export, no synchronization, no lag.

Experiments are reproducible. MLflow captures the training run, hyperparameters, metrics and model artifact, and references the data version used. Anyone who asks in six months “how did version 3.2 come about” gets a complete answer.
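
One way to tie a run to its data is to log a data version and a content hash as run tags. The sketch below is a minimal example, not the platform's actual mechanism: the tag names `data_version` and `data_sha256` and both helper functions are assumptions for illustration, and the source does not say how the data version is recorded. The MLflow calls used (`start_run`, `log_params`, `log_metrics`, `set_tag`) are part of the standard tracking API.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Return a SHA-256 hex digest over the rows in a canonical JSON encoding.

    Keys are sorted, so the digest does not depend on dict key order.
    """
    digest = hashlib.sha256()
    for row in rows:
        encoded = json.dumps(row, sort_keys=True, separators=(",", ":"))
        digest.update(encoded.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def log_training_run(run_name, params, metrics, data_version, rows):
    """Log one training run and attach the data reference as tags (hypothetical helper)."""
    import mlflow  # imported lazily so the fingerprint helper works without MLflow installed

    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        # Hypothetical tag names; any stable naming convention would do.
        mlflow.set_tag("data_version", data_version)
        mlflow.set_tag("data_sha256", dataset_fingerprint(rows))
        return run.info.run_id
```

With tags like these, the question about version 3.2 can be answered by looking up the run and comparing the stored hash against the data as it exists today.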

Deployment is a configuration step. Models from the registry go to production via Kubernetes workflows, on the same infrastructure that ran the training. The break between “model in notebook” and “model in operational service” disappears.

Drift monitoring is built in. When the data distribution in production diverges from the one at training time, that becomes visible as a metric, as an alert, or as a trigger for re-training. Model decay is detected before it costs the business money.
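
The text does not name a specific drift metric. As one common choice, the sketch below computes the Population Stability Index (PSI) between a training-time feature and the same feature in production. The bin count, the smoothing constant and the 0.2 alert threshold are conventional rules of thumb assumed here, not values from the platform.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature.

    Bin edges are taken from the range of `expected` (the training-time data).
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("expected sample has no spread")
    width = (hi - lo) / bins

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # Floor each share at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: production values shifted by 50 relative to training.
train = [float(v) for v in range(100)]
live = [v + 50.0 for v in train]
needs_retraining = psi(train, live) > 0.2  # 0.2 is a rule-of-thumb threshold
```

A value logged per feature and per day gives the metric; crossing the threshold gives the alert or the re-training trigger.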

Which use cases only then become possible

Predictive maintenance on telemetry. Predictive maintenance requires models that react to real-time telemetry. If telemetry has to be synced across three data hops before the model sees it, “predictive” is a lie — the forecast window is eaten by the lag.
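
The arithmetic behind the lost forecast window is simple: the usable warning time is the forecast horizon minus the total lag across all hops. A minimal sketch, with entirely hypothetical numbers and function name:

```python
def usable_lead_time(forecast_horizon_min, hop_lags_min):
    """Minutes of warning left after the data has passed through every hop.

    Returns 0.0 when the total lag is as long as or longer than the horizon.
    """
    total_lag = sum(hop_lags_min)
    return max(0.0, forecast_horizon_min - total_lag)


# Hypothetical: the model forecasts a failure 60 minutes ahead.
via_exports = usable_lead_time(60, [15, 20, 30])  # three hops, 65 minutes of lag in total
on_foundation = usable_lead_time(60, [1])         # one short hop on the shared foundation
```

In the first case the forecast arrives after the event it predicts; in the second almost the whole hour remains.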

Anomaly detection in operational data streams. A model that detects irregularities in fuel consumption, waiting times or incident patterns must run on production data, not on a smoothed sample where the anomalies are missing.

Real-time decision services with ML components. When a decision service scores the risk of missed connections, estimates delay probabilities or derives maintenance recommendations, data freshness matters down to the second. ML on the operational data foundation delivers that without hand-built data pipelines.

Continuous re-training. Continuously re-training models on new data is only practical when training data and production data are the same. Otherwise every iteration goes through the same export bottleneck and takes weeks, not hours.

The lever isn’t “another ML tool” but the shift of the data foundation. Anyone who wants to move data science from sample-export mode to production-near mode can work that out in the Tactical Assessment.
