Presentation by David Smith (VP Marketing and Community, Revolution Analytics) to Strata Santa Clara, February 26, 2013. More at http://www.strataconf.com
Putting data science into action requires deploying statistical models into production environments, usually with real-time processing requirements. Every company that relies on predictive models to drive its applications and operations has a different process for model deployment, but by working with many such companies I’ve seen a common pattern emerge. The real-time model deployment process can be broken down into these five stages:
– Data distillation
– Model development
– Model validation and deployment
– Model refresh
– Real-time model scoring
In this talk, I’ll describe the five stages of real-time analytics deployment and the technologies supporting each stage, including Hadoop, R, and data warehousing systems. Drawing on real-life case studies, I’ll share best practices for setting up the technology stack and processes for model deployment.