Deep Learning at 15 Petaflops from Intel and partners

• by Brian Wang

Intel researchers and partners developed supervised convolutional architectures for discriminating signals in high-energy physics (HEP) data, as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Their Intelcaffe-based implementation obtains ∼2 TFLOP/s on a single Cori Phase-II Xeon Phi node. They use a hybrid strategy that keeps nodes synchronous within each node group while communicating asynchronously across groups. With this strategy they scale training of a single model to ∼9600 Xeon Phi nodes, obtaining peak performance of 11.73-15.07 PFLOP/s and sustained performance of 11.41-13.27 PFLOP/s. At scale, their HEP architecture produces state-of-the-art classification accuracy on a dataset with 10 million images, exceeding that achieved by selections on high-level physics-motivated features. Their semi-supervised architecture successfully extracts weather patterns in a 15 TB climate dataset. These results demonstrate that deep learning can be optimized and scaled effectively on many-core HPC systems.
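
To make the hybrid idea concrete, here is a toy single-process sketch in Python. It is not the Intelcaffe implementation: the quadratic objective, the shared "parameter server" variable, and the names `group_gradient` and `train_hybrid` are all assumptions chosen for illustration. Workers inside a group average their gradients in lockstep (standing in for a synchronous all-reduce), while groups push updates to the shared parameters in arbitrary order, each working from its own possibly stale copy.

```python
import random


def group_gradient(w, target, n_workers, rng):
    """Synchronous step inside one group.

    Every worker computes a noisy gradient of 0.5 * (w - target)**2 at the
    same parameter value, and the group averages them. The average stands
    in for an all-reduce across the nodes of the group.
    """
    grads = [(w - target) + rng.gauss(0.0, 0.1) for _ in range(n_workers)]
    return sum(grads) / n_workers


def train_hybrid(n_groups=4, workers_per_group=8, steps=400,
                 lr=0.05, target=3.0, seed=0):
    """Toy hybrid training: synchronous within groups, asynchronous across."""
    rng = random.Random(seed)
    shared_w = 0.0                     # hypothetical parameter-server state
    local_w = [shared_w] * n_groups    # each group's possibly stale copy

    for _ in range(steps):
        # Groups report in arbitrary order; nobody waits for the others.
        g = rng.randrange(n_groups)
        grad = group_gradient(local_w[g], target, workers_per_group, rng)
        shared_w -= lr * grad          # update applied as soon as it arrives
        local_w[g] = shared_w          # only the reporting group refreshes

    return shared_w


if __name__ == "__main__":
    print(train_hybrid())
```

In this sketch the gradient a group sends may have been computed from parameters that other groups have since changed. That staleness is the price of not waiting at a global barrier, and keeping the number of groups small relative to the number of nodes limits how stale any update can get.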

