Five Principles of a Modern Data Platform
A practical lens for evaluating data stacks when latency, governance, and AI readiness all matter.
Data platforms are having a moment of architectural clarity. After years of tool proliferation and 'modern data stack' hype, the patterns that actually hold up in production are now well established. Here are the five principles that guide every data platform we build.
1. Treat data as a product
Data products have owners, SLAs, documentation, and quality standards. They're designed for consumers, not just producers. The shift from 'data as a byproduct' to 'data as a product' is the single most impactful organisational change a data team can make — and it has nothing to do with tooling.
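As a sketch of what 'data as a product' can mean in code, here is a minimal, hypothetical product descriptor: the names and fields are illustrative, not a real spec, but the point is that ownership, SLAs, documentation, and quality checks are declared explicitly rather than implied.

```python
from dataclasses import dataclass, field

# Hypothetical minimal "data product" descriptor. The point is that owner,
# SLA, documentation, and quality checks are explicit, declared metadata.
@dataclass
class DataProduct:
    name: str
    owner: str                  # accountable team, not an individual inbox
    freshness_sla_minutes: int  # how stale consumers can tolerate the data
    description: str
    quality_checks: list = field(default_factory=list)

    def is_publishable(self) -> bool:
        # A product is only exposed to consumers once it has an owner,
        # documentation, and at least one quality check.
        return bool(self.owner and self.description and self.quality_checks)

orders = DataProduct(
    name="orders_daily",
    owner="commerce-data-team",
    freshness_sla_minutes=60,
    description="One row per order, deduplicated, in UTC.",
    quality_checks=["not_null:order_id", "unique:order_id"],
)
print(orders.is_publishable())  # True
```

A contract like this can live next to the pipeline code and be enforced in CI, which is exactly the organisational shift the principle describes: consumers get guarantees, not whatever the producer happened to emit.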
2. Separate storage from compute
Coupling storage and compute — as traditional data warehouses did — creates scaling and cost problems. Modern platforms separate them completely: object storage (S3, GCS, Azure Blob) holds the data, while compute engines (Spark, Flink, DuckDB, Snowflake compute) scale independently against it. This separation is what enables the open table formats (Delta Lake, Iceberg, Hudi) that make the lakehouse architecture viable.
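The separation can be illustrated with a toy, stdlib-only sketch: a local directory stands in for an object store, and each 'engine' is just an independent reader over the same immutable files. Real engines and table formats add far more (transactions, schema, pushdown), but the shape is the same.

```python
import json
import tempfile
from pathlib import Path

# A directory stands in for an object store (think s3://lake/orders/).
storage = Path(tempfile.mkdtemp())

# The writer persists data once, as plain files, with no engine attached.
rows = [{"order_id": 1, "amount": 40.0}, {"order_id": 2, "amount": 60.0}]
(storage / "part-000.jsonl").write_text(
    "\n".join(json.dumps(r) for r in rows)
)

def scan(path: Path):
    """Any compute engine can independently scan the same stored files."""
    for part in sorted(path.glob("*.jsonl")):
        for line in part.read_text().splitlines():
            yield json.loads(line)

# Two "engines" run against the same bytes and scale separately from them.
total = sum(r["amount"] for r in scan(storage))       # engine A: aggregate
big = [r for r in scan(storage) if r["amount"] > 50]  # engine B: filter
print(total, big)  # 100.0 [{'order_id': 2, 'amount': 60.0}]
```

Because the storage layer is just files, adding a new engine never requires migrating data — which is the economic argument for the lakehouse in one line.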
3. Model your data, don't just move it
Raw data ingestion without transformation is a data swamp, not a data platform. dbt (or equivalent) should be a first-class citizen of every modern data platform: transformation as code, version controlled, tested, and documented. Every derived dataset should be traceable back to its source through a transformation DAG.
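The 'traceable back to its source' requirement is just a graph walk. Here is a hypothetical transformation DAG in the spirit of dbt's model graph (model and source names are invented): each derived dataset lists the datasets it is built from, and lineage falls out of recursion.

```python
# Hypothetical transformation DAG: each model names its upstream models.
# Raw sources have no parents.
dag = {
    "raw.orders": [],
    "raw.customers": [],
    "stg_orders": ["raw.orders"],
    "stg_customers": ["raw.customers"],
    "mart_revenue": ["stg_orders", "stg_customers"],
}

def upstream_sources(model: str) -> set:
    """Trace a derived dataset back to its raw sources through the DAG."""
    deps = dag[model]
    if not deps:  # no parents: this is a raw source
        return {model}
    sources = set()
    for dep in deps:
        sources |= upstream_sources(dep)
    return sources

print(sorted(upstream_sources("mart_revenue")))
# ['raw.customers', 'raw.orders']
```

When every transformation is declared this way, 'where did this number come from?' is a query, not an archaeology project.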
4. Build for AI readiness from day one
The data platform of 2024 is also the AI platform of 2025. This means: feature stores for ML model training and serving, vector stores for embedding-based retrieval, metadata catalogues that LLM agents can query, and data quality standards that ML training requires. Retrofitting AI readiness is expensive. Building it in is not.
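To make 'vector stores for embedding-based retrieval' concrete, here is a deliberately tiny in-memory version — enough to show what cosine-similarity retrieval means, not a substitute for a real vector database, and the document IDs and vectors are invented.

```python
import math

# A toy in-memory vector store: add embeddings, query by cosine similarity.
class VectorStore:
    def __init__(self):
        self._items = {}  # item_id -> embedding vector

    def add(self, item_id: str, embedding: list):
        self._items[item_id] = embedding

    def query(self, embedding: list, k: int = 1):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = (math.sqrt(sum(x * x for x in a))
                    * math.sqrt(sum(y * y for y in b)))
            return dot / norm
        ranked = sorted(self._items.items(),
                        key=lambda kv: cosine(embedding, kv[1]),
                        reverse=True)
        return [item_id for item_id, _ in ranked[:k]]

store = VectorStore()
store.add("doc_pricing", [0.9, 0.1, 0.0])
store.add("doc_refunds", [0.1, 0.9, 0.2])
print(store.query([1.0, 0.0, 0.0]))  # ['doc_pricing']
```

The 'retrofit is expensive' point lives in the interfaces: if embeddings, features, and metadata are addressable from day one, swapping this toy for a production store is a deployment change, not a redesign.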
5. Observe everything
Data quality degrades silently. Pipelines fail in partial ways that don't trigger alerts. The only defence is end-to-end observability: pipeline latency monitoring, data freshness SLAs, schema evolution alerts, and anomaly detection on key business metrics. If you can't answer 'is this data correct and current?' in under 30 seconds, you don't have observability — you have logging.
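A freshness SLA check is the simplest version of answering 'is this data current?' programmatically. The function below is an illustrative sketch — the status names and the 2x-SLA escalation threshold are assumptions, not a standard.

```python
from datetime import datetime, timedelta, timezone

# Illustrative freshness check: compare a dataset's last load time
# against its SLA and escalate as the breach grows.
def freshness_status(last_loaded_at, sla, now=None):
    now = now or datetime.now(timezone.utc)
    lag = now - last_loaded_at
    if lag <= sla:
        return "fresh"
    if lag <= 2 * sla:       # assumed escalation threshold
        return "stale"       # breach: alert the owning team
    return "critical"        # sustained breach: block downstream consumers

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
sla = timedelta(hours=1)
print(freshness_status(now - timedelta(minutes=30), sla, now))  # fresh
print(freshness_status(now - timedelta(minutes=90), sla, now))  # stale
print(freshness_status(now - timedelta(hours=3), sla, now))     # critical
```

Wiring a check like this into every pipeline, alongside schema and anomaly checks, is what turns logging into the 30-second answer the principle demands.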
The Nimbus embodiment
Our Nimbus platform was built around these five principles — governed ingestion, open table format storage, dbt transformation, feature store integration, and built-in data quality monitoring. But the principles matter more than any specific tool. Apply them to your stack and the architecture decisions become much clearer.
