Showing 1–2 of 2 results for author: Hah, T K

Search v0.5.6 released 2020-02-24

arXiv:1901.06433

cs.LG cs.CV

Machine Learning with Clos Networks

Authors: Timothy Whithing, Thiam Khean Hah

Abstract: We present a new methodology for improving the accuracy of small neural networks by applying the concept of a Clos network to achieve maximum expressiveness in a smaller network. We explore the design space and show that more layers are beneficial for the same number of parameters. We also present findings on how the ReLU nonlinearity affects accuracy in separable networks. We present results from early work on the CIFAR-10 dataset.
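The abstract does not say how the Clos concept is mapped onto network layers. The following minimal sketch assumes one plausible mapping. This is our assumption and not the paper's stated method. In that mapping, a dense (r·n) × (r·n) layer is replaced by three grouped (block-diagonal) stages joined by fixed shuffle permutations, mirroring the ingress, middle and egress switches of a three-stage Clos network. Every output still depends on every input, but with fewer weights. All function names (`grouped_linear`, `shuffle`, `make_clos_layer`, `clos_forward`) are hypothetical.

```python
import random


def grouped_linear(x, weights):
    """Block-diagonal (grouped) linear layer.

    x is split into len(weights) equal contiguous groups; weights[g] is a
    list of rows mapping group g's inputs to group g's outputs.
    """
    size = len(x) // len(weights)
    out = []
    for g, w in enumerate(weights):
        chunk = x[g * size:(g + 1) * size]
        for row in w:
            out.append(sum(wi * xi for wi, xi in zip(row, chunk)))
    return out


def shuffle(x, groups):
    """Regroup a (groups, k) layout as (k, groups): output block j collects
    element j of every input group, like the wiring between Clos stages."""
    k = len(x) // groups
    return [x[g * k + j] for j in range(k) for g in range(groups)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_clos_layer(r, n, m, seed=0):
    """Hypothetical Clos-style stand-in for a dense (r*n) x (r*n) layer:
    r ingress groups (n -> m), m middle groups (r -> r), r egress groups (m -> n)."""
    rng = random.Random(seed)
    return {
        "r": r,
        "m": m,
        "ingress": [random_matrix(m, n, rng) for _ in range(r)],
        "middle": [random_matrix(r, r, rng) for _ in range(m)],
        "egress": [random_matrix(n, m, rng) for _ in range(r)],
    }


def relu(v):
    return max(v, 0.0)


def clos_forward(layer, x, act=relu):
    h = [act(v) for v in grouped_linear(x, layer["ingress"])]  # r blocks of m
    h = shuffle(h, layer["r"])                                   # m blocks of r
    h = [act(v) for v in grouped_linear(h, layer["middle"])]    # m blocks of r
    h = shuffle(h, layer["m"])                                   # r blocks of m
    return grouped_linear(h, layer["egress"])                    # r blocks of n


def param_count(layer):
    return sum(len(w) * len(w[0])
               for stage in ("ingress", "middle", "egress")
               for w in layer[stage])


if __name__ == "__main__":
    layer = make_clos_layer(r=4, n=4, m=4)
    print(param_count(layer), "vs dense", 16 * 16)  # 192 vs dense 256
```

With r = n = m = 4, the three stages use 192 weights instead of the 256 weights of a dense 16 × 16 layer. The two shuffles guarantee that each input reaches every middle group and, through it, every output group. The ReLU between stages zeroes some paths for a given input, which is one way the nonlinearity can interact with such separable structures.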

Submitted 18 January, 2019; originally announced January 2019.

arXiv:1901.04969

cs.DC cs.AI

Low Precision Constant Parameter CNN on FPGA

Authors: Thiam Khean Hah, Yeong Tat Liew, Jason Ong

Abstract: We report FPGA implementation results for low-precision CNN convolution layers optimized for sparse and constant parameters. We describe techniques that amortize the cost of common-factor multiplication and automatically leverage dense, hand-tuned LUT structures. We apply this method to corner-case residual blocks of ResNet on a sparse ResNet50 model to assess achievable utilization and frequency, and demonstrate an effective performance of 131 and 23 TOP/chip for the corner-case blocks. The projected performance of a multichip persistent implementation of all ResNet50 convolution layers is 10k images/s/chip at batch size 2. This is 1.37x higher than the V100 GPU upper bound at the same batch size, after normalizing for sparsity.
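The abstract does not spell out the amortization scheme. The following sketch assumes one plausible reading. With constant, low-precision weights, the same input activation is often multiplied by the same weight value in several output channels. That product can be formed once and fanned out. Zero (pruned) weights are skipped entirely. The function names are hypothetical. The Python only counts multiplications and does not model the LUT mapping described in the paper.

```python
def naive_mults(weights):
    """Multiplications in a plain sparse mat-vec: one per nonzero weight."""
    return sum(1 for row in weights for w in row if w != 0)


def shared_factor_matvec(weights, x):
    """Compute y = W @ x for a constant integer matrix W, forming each distinct
    product x[i] * v only once per input column and reusing it in every
    output row whose weight in that column is v. Returns (y, multiplications)."""
    y = [0] * len(weights)
    mults = 0
    for i, xi in enumerate(x):
        products = {}  # distinct weight value in column i -> xi * value
        for o, row in enumerate(weights):
            w = row[i]
            if w == 0:
                continue  # pruned weight: no multiply needed
            if w not in products:
                products[w] = w * xi
                mults += 1
            y[o] += products[w]
    return y, mults


if __name__ == "__main__":
    W = [[3, 0, -2],
         [3, 1, 0],
         [0, 1, -2],
         [3, 0, 5]]
    x = [2, 5, 7]
    y, mults = shared_factor_matvec(W, x)
    print(y, mults, naive_mults(W))  # [-8, 11, -9, 41] 4 8
```

In this toy matrix, the 8 nonzero weights need only 4 distinct products, because the weights 3, 1 and -2 each repeat within a column. Lower weight precision means fewer distinct values per column, so the savings from this kind of sharing tend to grow as precision drops.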

Submitted 11 January, 2019; originally announced January 2019.
