Vitaliy Chiley is a Machine Learning Research Engineer at Cerebras Systems, a next-generation computing hardware company. We spoke about how DL workloads, including sparse workloads, can run faster on Cerebras hardware.
[00:00:00] Housekeeping
[00:01:08] Preamble
[00:01:50] Vitaliy Chiley Introduction
[00:03:11] Cerebras architecture
[00:08:12] Memory management and FLOP utilisation
[00:18:01] Centralised vs decentralised compute architecture
[00:21:12] Sparsity
[00:23:47] Does Sparse NN imply Heterogeneous compute?
[00:29:21] Cost of distributed memory stores?
[00:31:01] Activation vs weight sparsity
[00:37:52] What constitutes a dead weight to be pruned?
[00:39:02] Is it still a saving if we have to choose between weight and activation sparsity?
[00:41:02] Cerebras is a cool place to work
[00:44:05] What is sparsity? Why do we need to start dense?
[00:46:36] Evolutionary algorithms on Cerebras?
[00:47:57] How can we start sparse? Google RigL
[00:51:44] Inductive priors, why do we need them if we can start sparse?
[00:56:02] Why anthropomorphise inductive priors?
[01:02:13] Could Cerebras run a cyclic computational graph?
[01:03:16] Are NNs locality sensitive hashing tables?
References:
Rigging the Lottery: Making All Tickets Winners [RigL]
https://arxiv.org/pdf/1911.11134.pdf
[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet
https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/
A Spline Theory of Deep Learning [Balestriero]
https://proceedings.mlr.press/v80/balestriero18b.html