Brian McWilliams


I am a research scientist at Disney Research Zurich, heading the Deep Learning and Analytics group. I am particularly interested in deep learning, randomized algorithms and optimization for large-scale learning.

I completed my Ph.D. in 2012 in the Statistics Section of the Department of Mathematics at Imperial College London, under the supervision of Giovanni Montana. Previously, I did an MSc in Informatics, specialising in Machine Learning, Neuroinformatics and Intelligent Robotics, at the University of Edinburgh, and a BEng in Computer Systems Engineering at the University of Warwick.

Until August 2015 I was a postdoctoral researcher and lecturer in the Institute of Machine Learning at ETH Zurich.

Open positions: We have master's thesis projects available in the area of deep learning and generative modelling.


News
May 2017
Shattered Gradients and Neural Taylor Approximations accepted to ICML 2017!
March 2017
Paper with Pixar Animation Studios on denoising MC rendered images using kernel-predicting convnets accepted to SIGGRAPH 2017!
Feb 2017
PriDE (Private Distributed Estimation) is an algorithm for preserving differential privacy in distributed statistical estimation tasks.
The Shattered Gradients Problem describes how gradients whiten as neural networks get deeper, making optimization hard. We show why ResNets fix this issue.
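As a toy illustration of the phenomenon (my own sketch, not code from the paper): in a plain deep ReLU net, the input gradients at two nearby inputs decorrelate as depth grows, while an identity skip connection largely preserves the correlation. The width, depth and initialisation below are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def input_gradient(x, weights, residual):
    """Gradient of sum(output) w.r.t. the input of a deep ReLU net.

    Each layer is h -> relu(W h); with residual=True it is h -> h + relu(W h),
    a crude stand-in for a ResNet block.
    """
    # Forward pass, storing the ReLU masks for the backward pass.
    h = x
    masks = []
    for W in weights:
        pre = W @ h
        mask = (pre > 0).astype(float)
        masks.append(mask)
        h = h + pre * mask if residual else pre * mask
    # Backward pass for the scalar output sum(h_final).
    g = np.ones_like(h)
    for W, mask in zip(reversed(weights), reversed(masks)):
        back = W.T @ (mask * g)
        g = g + back if residual else back
    return g

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

n, depth, trials = 64, 30, 30
# He-style initialisation, shared across both architectures.
weights = [rng.normal(0, np.sqrt(2.0 / n), (n, n)) for _ in range(depth)]

plain, res = [], []
for _ in range(trials):
    x = rng.normal(size=n)
    x2 = x + 0.1 * rng.normal(size=n)   # a nearby input
    plain.append(cosine(input_gradient(x, weights, False),
                        input_gradient(x2, weights, False)))
    res.append(cosine(input_gradient(x, weights, True),
                      input_gradient(x2, weights, True)))

print(f"mean gradient correlation, plain net: {np.mean(plain):.2f}")
print(f"mean gradient correlation, resnet:    {np.mean(res):.2f}")
```

With settings like these, the skip-connected net typically retains a much higher gradient correlation than the plain net at the same depth.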
Nov 2016
Neural Taylor Approximation. We provide the first convergence result for deep neural networks with ReLU activations.
Aug 2016
RadaGrad! Our paper on scalable approximations to full-matrix AdaGrad using random projections is accepted to NIPS 2016!
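The rough idea can be sketched as follows (a simplified illustration under my own assumptions, not the actual RadaGrad algorithm — the paper's projection scheme and corrections are more involved): project each gradient into a random k-dimensional sketch space, accumulate the second-moment matrix there, and precondition with its inverse square root, so the per-step cost involves a k x k rather than a d x d matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

d, k, eta, eps = 100, 10, 0.05, 1e-6

# A toy least-squares problem: minimise mean((A x - b)^2).
A = rng.normal(size=(200, d))
b = rng.normal(size=200)

# Fixed random projection into the sketch space (an arbitrary Gaussian
# choice here; the paper's construction differs).
P = rng.normal(0, 1.0 / np.sqrt(k), size=(k, d))

x = np.zeros(d)
C = np.zeros((k, k))                      # sketched second-moment matrix
for _ in range(100):
    g = 2 * A.T @ (A @ x - b) / len(b)    # full-batch gradient
    s = P @ g                             # project gradient to k dims
    C += np.outer(s, s)
    # Inverse square root of the small k x k matrix via eigendecomposition.
    w, V = np.linalg.eigh(C)
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w + eps)) @ V.T
    # Precondition in the sketch space and lift back. For brevity this
    # simplification never updates components orthogonal to the sketch
    # subspace; the real algorithm corrects for this.
    x -= eta * (P.T @ (inv_sqrt @ s))

print("final loss:", np.mean((A @ x - b) ** 2))
```

The point of the sketch is the cost structure: the eigendecomposition is of a k x k matrix, whereas full-matrix AdaGrad would maintain and factor the d x d matrix of accumulated outer products.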


Publications

Preprints
  • Preserving Differential Privacy Between Features in Distributed Estimation.
    C Heinze-Deml, B McWilliams, N Meinshausen. arXiv.


2017
  • The Shattered Gradients Problem: If resnets are the answer, then what is the question?
    D Balduzzi, M Frean, L Leary, JP Lewis, K Ma, B McWilliams. ICML 2017. arXiv.
  • Neural Taylor Approximation: Convergence and Exploration in Rectifier Networks.
    D Balduzzi, B McWilliams, T Butler-Yeoman. ICML 2017. arXiv.
  • Kernel-predicting Convolutional Networks for Denoising Monte Carlo Renderings.
S Bako, T Vogels, B McWilliams, M Meyer, J Novak, A Harvill, P Sen, T DeRose, F Rousselle. SIGGRAPH 2017.
    Project page.
  • Automatically Learning an Intuitive Animation Interface From a Collection of Human Motion Clips.
    M Lüdi, M Guay, B McWilliams, R W Sumner.


2016
  • Scalable Adaptive Stochastic Optimization Using Random Projections.
    G Krummenacher, B McWilliams, Y Kilcher, J Buhmann, N Meinshausen. NIPS 2016. arXiv.
  • A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation.
    F Perazzi, J Pont-Tuset, B McWilliams, M Gross, L Van Gool, A Sorkine-Hornung. CVPR 2016.
    Project page.
  • DUAL-LOCO: Distributing Statistical Estimation Using Random Projections.
    C Heinze, B McWilliams, N Meinshausen. AISTATS 2016. arXiv. Software (maintained by Christina).


2015
  • Variance Reduced Stochastic Gradient Descent with Neighbors.
T Hofmann, A Lucchi, S Lacoste-Julien, B McWilliams. NIPS 2015. arXiv.
  • DUAL-LOCO: Preserving privacy between features in distributed estimation.
    C Heinze, B McWilliams, N Meinshausen. NIPS WS on Learning and privacy with incomplete data and weak supervision.
  • LOCO: Distributing Ridge Regression with Random Projections.
    C Heinze, B McWilliams, N Meinshausen, G Krummenacher. arXiv.
    Software (maintained by Christina).
  • Learning Representations for Outlier Detection on a Budget.
    B Micenková, B McWilliams, I Assent. arXiv.
  • A Variance Reduced Stochastic Newton Method.
    A Lucchi, B McWilliams, T Hofmann. arXiv.



2013
  • Correlated random features for fast semi-supervised learning.
B McWilliams, D Balduzzi, J Buhmann. NIPS 2013. arXiv. Matlab code. Poster.
  • Pruning random features with correlated kitchen sinks (1 page abstract).
    B McWilliams, D Balduzzi. SPARS 2013.

2012 and earlier


Teaching
Fall Semester 2014

I co-lectured Probabilistic Graphical Models for Image Analysis with Dr. Aurelien Lucchi.

Spring Semester 2014

I was head TA for Computational Intelligence Laboratory. The course is now taught by Prof. Hofmann, with Martin Jaggi and Aurelien Lucchi as head TAs.


The website for the probabilistic graphical models course that Dr. David Balduzzi and I taught at Uni Basel in summer 2013 is located here.



Contact
Disney Research Zurich
Stampfenbachstrasse 48
8006 Zurich