BigDL allows users to write deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters to process Big Data.

Analytics Zoo seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink and Ray programs into an integrated pipeline, which can transparently scale from laptops to large clusters to process production big data.

It leverages emerging AI technologies (e.g., Ray, hyperparameter optimization, sequence generation models, etc.) to automatically generate features, select models and tune hyperparameters for time series prediction in a distributed fashion.

Building an experiment platform that uses both DRL algorithms (e.g., imitation learning, DQN, policy gradient, etc.) and computer vision models (e.g., object detection, object tracking, OCR, etc.) to play FIFA18.

  • Time Series Based Network Quality Prediction in
  • Context-Aware Fast Food Recommendation at
  • NLP Based Customer Service Chatbot for
  • Computer Vision Based Product Defect Detection in
  • Etc.

Selected Publications

Automated ML Workflow for Distributed Big Data Using Analytics Zoo

Tutorial in the Conference on Computer Vision and Pattern Recognition (CVPR) 2020

BigDL: A Distributed Deep Learning Framework for Big Data

In ACM Symposium on Cloud Computing (SoCC) 2019

Build Deep Learning Applications for Big Data Platforms Using Analytics Zoo

Tutorial in the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI) 2019

Building Deep Learning Applications on Big Data Platforms

Tutorial in the Conference on Computer Vision and Pattern Recognition (CVPR) 2018

Experience from Hadoop Benchmarking with HiBench: From Micro-Benchmarks Toward End-to-End Pipelines

In Proceedings of the 2013 Workshop Series on Big Data Benchmarking

ReNIC: Architectural Extension to SR-IOV I/O Virtualization for Efficient Replication

ACM Transactions on Architecture and Code Optimization (TACO), January 2012

HiTune: Dataflow-Based Performance Analysis for Big Data Cloud

In USENIX Annual Technical Conference (ATC) 2011

The HiBench benchmark suite: Characterization of the MapReduce-based data analysis

In Proceedings of the 26th International Conference on Data Engineering Workshops (ICDEW) 2010

Design Patterns for Internet-Scale Services

In Proceedings of the 25th International Conference on Data Engineering Workshops (ICDEW) 2009

Towards high-quality I/O virtualization

In Proceedings of SYSTOR 2009

Latency Hiding in Multi-Threading and Multi-Processing of Network Applications

In the 16th International Conference on Parallel Architecture and Compilation Techniques (PACT) 2007

Automatic multithreading and multiprocessing of C programs for IXP

In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) 2005

Automatically Partitioning Packet Processing Applications for Pipelined Architectures

In ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (PLDI 2005)