Build Deep Learning Applications for Big Data using Analytics Zoo

Distributed TensorFlow, Keras and BigDL on Apache Spark


Jason Dai


Recent breakthroughs in artificial intelligence applications have brought deep learning to the forefront of a new generation of data analytics. In this tutorial, we will present the practices and design tradeoffs involved in building large-scale deep learning applications (such as computer vision and NLP) on production data and workflows in Big Data platforms. We will provide an overview of emerging deep learning frameworks for Big Data (e.g., BigDL, TensorFlowOnSpark, and Deep Learning Pipelines for Spark), and present the underlying distributed systems and algorithms. More importantly, we will show how to build and productionize deep learning application pipelines for Big Data using Analytics Zoo (an end-to-end data analytics + AI platform for Apache Spark and BigDL), drawing on real-world use cases (e.g., MLSListings, World Bank, and UnionPay).


June 19, 2018 (9AM - 12PM), Room 151 ABCG

9:00 - 9:10 Motivation
9:10 - 9:30 DL frameworks on Apache Spark
9:30 - 9:45 Analytics Zoo for Spark and BigDL
9:45 - 10:00 Analytics Zoo Examples (notebook I, notebook II, notebook III)
10:00 - 10:30 Break
10:30 - 11:00 Distributed training
11:00 - 11:30 Advanced applications (notebook I, notebook II, notebook III)
11:30 - 11:50 Real-world applications (notebook I, notebook II)
11:50 - 12:00 Q&A