Data is the key ingredient to building high-quality, production AI applications. It comes in during the training phase, where more and higher-quality training data enables better models, as well as during the production phase, where understanding the model’s behaviour in production and detecting changes in the predictions and input data are critical to maintaining a production application. However, so far most data management and machine learning tools have been largely separate. In this presentation, we’ll talk about several efforts from Databricks, in Apache Spark, as well as other open source projects, to unify data and AI in order to make it significantly simpler to build production AI applications.