Building Deep Learning Applications for Big Data

An Introduction to Analytics Zoo: Distributed TensorFlow, Keras and BigDL on Apache Spark


Jason Dai


Recent breakthroughs in artificial intelligence applications have brought deep learning to the forefront of new generations of data analytics. In this tutorial, we will present the practice of, and design tradeoffs in, building large-scale deep learning applications (such as computer vision and NLP) for production data and workflows on Big Data platforms. We will provide an overview of emerging deep learning frameworks for Big Data (e.g., BigDL, TensorFlowOnSpark, Deep Learning Pipelines for Apache Spark), and present the underlying distributed systems and algorithms. More importantly, we will show how to build and productionize end-to-end deep learning application pipelines for Big Data (on top of Analytics Zoo, a unified analytics + AI platform for distributed TensorFlow, Keras and BigDL on Apache Spark), using real-world use cases (such as Azure, World Bank, and Midea/KUKA).


Sunday, January 27, 2019 (1:30PM - 5:30PM)

1:30PM - 1:45PM Motivation
1:45PM - 2:15PM DL frameworks on Apache Spark
2:15PM - 2:45PM Analytics Zoo Overview
2:45PM - 3:15PM Analytics Zoo Examples
3:15PM - 3:45PM Break
3:45PM - 4:15PM Distributed training
4:15PM - 4:35PM Advanced applications
4:35PM - 5:20PM Real-world applications
5:20PM - 5:30PM Q&A