-
Q-CHOP: Quantum constrained Hamiltonian optimization
Authors:
Michael A. Perlin,
Ruslan Shaydulin,
Benjamin P. Hall,
Pierre Minssen,
Changhao Li,
Kabir Dubey,
Rich Rines,
Eric R. Anschuetz,
Marco Pistoia,
Pranav Gokhale
Abstract:
Combinatorial optimization problems that arise in science and industry typically have constraints. Yet the presence of constraints makes them challenging to tackle using both classical and quantum optimization algorithms. We propose a new quantum algorithm for constrained optimization, which we call quantum constrained Hamiltonian optimization (Q-CHOP). Our algorithm leverages the observation that for many problems, while the best solution is difficult to find, the worst feasible (constraint-satisfying) solution is known. The basic idea is to enforce a Hamiltonian constraint at all times, thereby restricting evolution to the subspace of feasible states, and slowly "rotate" an objective Hamiltonian to trace an adiabatic path from the worst feasible state to the best feasible state. We additionally propose a version of Q-CHOP that can start in any feasible state. Finally, we benchmark Q-CHOP against the commonly used adiabatic algorithm with constraints enforced using a penalty term and find that Q-CHOP performs consistently better on a wide range of problems, including textbook problems on graphs, knapsack, combinatorial auction, as well as a real-world financial use case, namely bond exchange-traded fund basket optimization.
Submitted 8 March, 2024;
originally announced March 2024.
-
On the Computational Hardness of Quantum One-Wayness
Authors:
Bruno Cavalar,
Eli Goldin,
Matthew Gray,
Peter Hall,
Yanyi Liu,
Angelos Pelecanos
Abstract:
There is a large body of work studying what forms of computational hardness are needed to realize classical cryptography. In particular, one-way functions and pseudorandom generators can be built from each other, and thus require equivalent computational assumptions to be realized. Furthermore, the existence of either of these primitives implies that $\rm{P} \neq \rm{NP}$, which gives a lower bound on the necessary hardness.
One can also define versions of each of these primitives with quantum output: respectively one-way state generators and pseudorandom state generators. Unlike in the classical setting, it is not known whether either primitive can be built from the other. Although it has been shown that pseudorandom state generators for certain parameter regimes can be used to build one-way state generators, the implication has not been previously known in full generality. Furthermore, to the best of our knowledge, the existence of one-way state generators has no known implications in complexity theory.
We show that pseudorandom states compressing $n$ bits to $\log n + 1$ qubits can be used to build one-way state generators and pseudorandom states compressing $n$ bits to $\omega(\log n)$ qubits are one-way state generators. This is a nearly optimal result since pseudorandom states with fewer than $c \log n$-qubit output can be shown to exist unconditionally. We also show that any one-way state generator can be broken by a quantum algorithm with classical access to a $\rm{PP}$ oracle.
An interesting implication of our results is that a $t(n)$-copy one-way state generator exists unconditionally, for every $t(n) = o(n/\log n)$. This contrasts nicely with the previously known fact that $O(n)$-copy one-way state generators require computational hardness. We also outline a new route towards a black-box separation between one-way state generators and quantum bit commitments.
Submitted 20 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Nonlinear Equivariant Imaging: Learning Multi-Parametric Tissue Mapping without Ground Truth for Compressive Quantitative MRI
Authors:
Ketan Fatania,
Kwai Y. Chau,
Carolin M. Pirkl,
Marion I. Menzel,
Peter Hall,
Mohammad Golbabaee
Abstract:
Current state-of-the-art reconstruction methods for quantitative tissue maps from fast, compressive Magnetic Resonance Fingerprinting (MRF) use supervised deep learning, with the drawback of requiring high-fidelity ground truth tissue map training data, which is limited. This paper proposes NonLinear Equivariant Imaging (NLEI), a self-supervised learning approach to eliminate the need for ground truth for deep MRF image reconstruction. NLEI extends the recent Equivariant Imaging framework to nonlinear inverse problems such as MRF. Only fast, compressed-sampled MRF scans are used for training. NLEI learns tissue mapping using spatiotemporal priors: spatial priors are obtained from the invariance of MRF data to a group of geometric image transformations, while temporal priors are obtained from a nonlinear Bloch response model approximated by a pre-trained neural network. Tested retrospectively on two acquisition settings, we observe that NLEI (self-supervised learning) closely approaches the performance of supervised learning, despite not using ground truth during training.
Submitted 23 November, 2022;
originally announced November 2022.
-
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Authors:
Hsiang-Sheng Tsai,
Heng-Jui Chang,
Wen-Chin Huang,
Zili Huang,
Kushal Lakhotia,
Shu-wen Yang,
Shuyan Dong,
Andy T. Liu,
Cheng-I Jeff Lai,
Jiatong Shi,
Xuankai Chang,
Phil Hall,
Hsuan-Jui Chen,
Shang-Wen Li,
Shinji Watanabe,
Abdelrahman Mohamed,
Hung-yi Lee
Abstract:
Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards introducing a common benchmark to evaluate pre-trained models across various speech tasks. In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain and quality across different types of tasks. It entails freezing pre-trained model parameters, only using simple task-specific trainable heads. The goal is to be inclusive of all researchers, and encourage efficient use of computational resources. We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
Submitted 14 March, 2022;
originally announced March 2022.
-
A Plug-and-Play Approach to Multiparametric Quantitative MRI: Image Reconstruction using Pre-Trained Deep Denoisers
Authors:
Ketan Fatania,
Carolin M. Pirkl,
Marion I. Menzel,
Peter Hall,
Mohammad Golbabaee
Abstract:
Current spatiotemporal deep learning approaches to Magnetic Resonance Fingerprinting (MRF) build artefact-removal models customised to a particular k-space subsampling pattern which is used for fast (compressed) acquisition. This may not be useful when the acquisition process is unknown during training of the deep learning model and/or changes during testing time. This paper proposes an iterative deep learning plug-and-play reconstruction approach to MRF which is adaptive to the forward acquisition process. Spatiotemporal image priors are learned by an image denoiser, i.e. a Convolutional Neural Network (CNN), trained to remove generic white Gaussian noise (not a particular subsampling artefact) from data. This CNN denoiser is then used as a data-driven shrinkage operator within the iterative reconstruction algorithm. This algorithm with the same denoiser model is then tested on two simulated acquisition processes with distinct subsampling patterns. The results show consistent de-aliasing performance against both acquisition schemes and accurate mapping of tissues' quantitative bio-properties. Software available: https://github.com/ketanfatania/QMRI-PnP-Recon-POC
Submitted 10 February, 2022;
originally announced February 2022.
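The plug-and-play idea in the abstract above, a denoiser used as the shrinkage step inside an iterative reconstruction, can be illustrated with a toy 1D sketch. Everything here is an assumption made for illustration: the forward model is a simple 0/1 subsampling mask, the function names `box_denoiser` and `pnp_reconstruct` are hypothetical, and a 3-tap moving average stands in for the pre-trained CNN denoiser. It is not the authors' implementation (see their repository for that).

```python
def box_denoiser(x):
    """Stand-in denoiser (assumption): 3-tap moving average with edge padding."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(x))]


def pnp_reconstruct(y, mask, denoiser, n_iters=60, step=1.0):
    """Alternate a data-consistency gradient step with a denoising step.

    y    : measurements, zero where nothing was sampled
    mask : 1.0 where a sample was acquired, 0.0 elsewhere (toy forward model)
    """
    x = [0.0] * len(y)
    for _ in range(n_iters):
        # Gradient step on 0.5 * ||mask * x - y||^2, which pulls sampled entries toward y.
        x = [xi - step * m * (m * xi - yi) for xi, m, yi in zip(x, mask, y)]
        # The denoiser plays the role of the shrinkage (prior) operator.
        x = denoiser(x)
    return x


# Toy example: a constant signal observed at every other position.
truth = [1.0] * 8
mask = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
y = [m * t for m, t in zip(mask, truth)]
recon = pnp_reconstruct(y, mask, box_denoiser)
```

Because the denoiser is passed in as an argument, the same loop can be reused with a different mask, which mirrors the abstract's point that one denoiser model serves several acquisition patterns.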
-
Geometric Style Transfer
Authors:
Xiao-Chang Liu,
Xuan-Yi Li,
Ming-Ming Cheng,
Peter Hall
Abstract:
Neural style transfer (NST), where an input image is rendered in the style of another image, has been a topic of considerable progress in recent years. Research over that time has been dominated by transferring aspects of color and texture, yet these factors are only one component of style. Other factors of style include composition, the projection system used, and the way in which artists warp and bend objects. Our contribution is to introduce a neural architecture that supports transfer of geometric style. Unlike recent work in this area, we are unique in being general in that we are not restricted by semantic content. This new architecture runs prior to a network that transfers texture style, enabling us to transfer texture to a warped image. This form of network supports a second novelty: we extend the NST input paradigm. Users can input a content/style pair as is common, or they can choose to input a content/texture-style/geometry-style triple. This three-image input paradigm divides style into two parts and so provides significantly greater versatility to the output we can produce. We provide user studies that show the quality of our output, and quantify the importance of geometric style transfer to style recognition by humans.
Submitted 10 July, 2020;
originally announced July 2020.
-
Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data
Authors:
Giulia Carella,
Andy Eschbacher,
Dongjie Fan,
Miguel Álvarez,
Álvaro Arredondo,
Alejandro Polvillo Hall,
Javier Pérez Trufero,
Javier de la Torre
Abstract:
Fine resolution estimates of demographic and socioeconomic attributes are crucial for planning and policy development. While several efforts have been made to produce fine-scale gridded population estimates, socioeconomic features are typically not available at scales finer than Census units, which may hide local heterogeneity and disparity. In this paper we present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes. The method leverages demographic and geographical extensive covariates available at multiple scales and additional Census covariates only available at coarse resolution, which are included in the model hierarchically within a "forward learning" approach. For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions, which are then adjusted to ensure the best possible consistency with the coarser Census data. As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of ~300 spatial resolution. The accuracy of the method is assessed at both spatial scales, first computing a pseudo cross-validation coefficient of determination for the predictions at the block group level and then, for extensive variables only, also for the (unadjusted) predicted counts summed by block group. Based on these scores and on the inspection of the downscaled maps, we conclude that our method is able to provide accurate, smoother, and more detailed socioeconomic estimates than the available Census data.
Submitted 23 June, 2020;
originally announced June 2020.
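The abstract above says the fine-scale gridded predictions are adjusted for consistency with the coarser Census data, without stating how. One common way to do this for extensive (count-like) variables is proportional rescaling within each coarse unit. The sketch below assumes that approach; the function name `adjust_to_totals` and the toy numbers are hypothetical, and this is not claimed to be the paper's method.

```python
def adjust_to_totals(fine_pred, group_ids, group_totals):
    """Rescale fine-scale predictions so each group's cells sum to its known total.

    fine_pred    : predicted value per fine grid cell
    group_ids    : coarse unit (e.g. block group) that each cell belongs to
    group_totals : known coarse-level total per unit
    """
    # Sum of the raw predictions inside each coarse unit.
    sums = {}
    for value, gid in zip(fine_pred, group_ids):
        sums[gid] = sums.get(gid, 0.0) + value

    adjusted = []
    for value, gid in zip(fine_pred, group_ids):
        if sums[gid] > 0:
            adjusted.append(value * group_totals[gid] / sums[gid])
        else:
            # No predicted mass to rescale: spread the total evenly over the unit's cells.
            n_cells = sum(1 for g in group_ids if g == gid)
            adjusted.append(group_totals[gid] / n_cells)
    return adjusted


# Toy example: four grid cells in two block groups.
cells = [2.0, 6.0, 1.0, 1.0]
groups = ["A", "A", "B", "B"]
totals = {"A": 10.0, "B": 4.0}
adjusted = adjust_to_totals(cells, groups, totals)
```

After the adjustment, the cells in group A sum to 10 and those in group B sum to 4, while the relative pattern predicted within each group is preserved.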
-
VGPN: Voice-Guided Pointing Robot Navigation for Humans
Authors:
Jun Hu,
Zhongyu Jiang,
Xionghao Ding,
Peter Hall,
Taijiang Mu
Abstract:
Pointing gestures are widely used in robot navigation approaches nowadays. However, most approaches only use pointing gestures, and these have two major limitations. Firstly, they need to recognize pointing gestures all the time, which leads to long processing time and significant system overheads. Secondly, the user's pointing direction may not be very accurate, so the robot may go to an undesired place. To relieve these limitations, we propose a voice-guided pointing robot navigation approach named VGPN, and implement its prototype on a wheeled robot, TurtleBot 2. VGPN recognizes a pointing gesture only if voice information is insufficient for navigation. VGPN also uses voice information as a supplementary channel to help determine the target position of the user's pointing gesture. In the evaluation, we compare VGPN to the pointing-only navigation approach. The results show that VGPN effectively reduces the processing time cost when pointing gesture is unnecessary, and improves the user satisfaction with navigation accuracy.
Submitted 3 April, 2020;
originally announced April 2020.
-
Artistic Domain Generalisation Methods are Limited by their Deep Representations
Authors:
Padraig Boulton,
Peter Hall
Abstract:
The cross-depiction problem refers to the task of recognising visual objects regardless of their depictions; whether photographed, painted, sketched, etc. In the past, some researchers considered cross-depiction to be domain adaptation (DA). More recent work considers cross-depiction as domain generalisation (DG), in which algorithms extend recognition from one set of domains (such as photographs and coloured artwork) to another (such as sketches). We show that fixing the last layer of AlexNet to random values provides a performance comparable to state of the art DA and DG algorithms, when tested over the PACS benchmark. With support from background literature, our results lead us to conclude that texture alone is insufficient to support generalisation; rather, higher-order representations such as structure and shape are necessary.
Submitted 29 July, 2019;
originally announced July 2019.
-
Proposed Guidelines for the Responsible Use of Explainable Machine Learning
Authors:
Patrick Hall,
Navdeep Gill,
Nicholas Schmidt
Abstract:
Explainable machine learning (ML) enables human learning from ML, human appeal of automated model decisions, regulatory compliance, and security audits of ML models. Explainable ML (i.e. explainable artificial intelligence or XAI) has been implemented in numerous open source and commercial packages and explainable ML is also an important, mandatory, or embedded aspect of commercial predictive modeling in industries like financial services. However, like many technologies, explainable ML can be misused, particularly as a faulty safeguard for harmful black-boxes, e.g. fairwashing or scaffolding, and for other malevolent purposes like stealing models and sensitive training data. To promote best-practice discussions for this already in-flight technology, this short text presents internal definitions and a few examples before covering the proposed guidelines. This text concludes with a seemingly natural argument for the use of interpretable models and explanatory, debugging, and disparate impact testing methods in life- or mission-critical ML systems.
Submitted 29 November, 2019; v1 submitted 8 June, 2019;
originally announced June 2019.
-
Example-Guided Style Consistent Image Synthesis from Semantic Labeling
Authors:
Miao Wang,
Guo-Ye Yang,
Ruilong Li,
Run-Ze Liang,
Song-Hai Zhang,
Peter M. Hall,
Shi-Min Hu
Abstract:
Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image indicating style. We use the term "style" in this problem to refer to implicit characteristics of images, for example: in portraits "style" includes gender, racial identity, age, hairstyle; in full body pictures it includes clothing; in street scenes, it refers to weather and time of day and such like. A semantic label map in these cases indicates facial expression, full body pose, or scene segmentation. We propose a solution to the example-guided image synthesis problem using conditional generative adversarial networks with style consistency. Our key contributions are (i) a novel style consistency discriminator to determine whether a pair of images are consistent in style; (ii) an adaptive semantic consistency loss; and (iii) a training data sampling strategy, for synthesizing style-consistent results to the exemplar.
Submitted 27 June, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Rank3DGAN: Semantic mesh generation using relative attributes
Authors:
Yassir Saquil,
Qun-Ce Xu,
Yong-Liang Yang,
Peter Hall
Abstract:
In this paper, we investigate a novel problem of using generative adversarial networks in the task of 3D shape generation according to semantic attributes. Recent works map 3D shapes into 2D parameter domain, which enables training Generative Adversarial Networks (GANs) for 3D shape generation task. We extend these architectures to the conditional setting, where we generate 3D shapes with respect to subjective attributes defined by the user. Given pairwise comparisons of 3D shapes, our model performs two tasks: it learns a generative model with a controlled latent space, and a ranking function for the 3D shapes based on their multi-chart representation in 2D. The capability of the model is demonstrated with experiments on HumanShape, Basel Face Model and reconstructed 3D CUB datasets. We also present various applications that benefit from our model, such as multi-attribute exploration, mesh editing, and mesh attribute transfer.
Submitted 28 May, 2019; v1 submitted 24 May, 2019;
originally announced May 2019.
-
What and Where: A Context-based Recommendation System for Object Insertion
Authors:
Song-Hai Zhang,
Zhengping Zhou,
Bin Liu,
Xin Dong,
Dun Liang,
Peter Hall,
Shi-Min Hu
Abstract:
In this work, we propose a novel topic consisting of two dual tasks: 1) given a scene, recommend objects to insert, 2) given an object category, retrieve suitable background scenes. A bounding box for the inserted object is predicted in both tasks, which helps downstream applications such as semi-automated advertising and video composition. The major challenge lies in the fact that the target object is neither present nor localized at test time, whereas available datasets only provide scenes with existing objects. To tackle this problem, we build an unsupervised algorithm based on object-level contexts, which explicitly models the joint probability distribution of object categories and bounding boxes with a Gaussian mixture model. Experiments on our newly annotated test set demonstrate that our system outperforms existing baselines on all subtasks, and does so under a unified framework. Our contribution promises future extensions and applications.
Submitted 24 November, 2018;
originally announced November 2018.
-
On the Art and Science of Machine Learning Explanations
Authors:
Patrick Hall
Abstract:
This text discusses several popular explanatory methods that go beyond the error measurements and plots traditionally used to assess machine learning models. Some of the explanatory methods are accepted tools of the trade while others are rigorously derived and backed by long-standing theory. The methods, decision tree surrogate models, individual conditional expectation (ICE) plots, local interpretable model-agnostic explanations (LIME), partial dependence plots, and Shapley explanations, vary in terms of scope, fidelity, and suitable application domain. Along with descriptions of these methods, this text presents real-world usage recommendations supported by a use case and public, in-depth software examples for reproducibility.
Submitted 31 May, 2020; v1 submitted 5 October, 2018;
originally announced October 2018.
-
LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments
Authors:
Dun Liang,
Yuanchen Guo,
Shaokui Zhang,
Song-Hai Zhang,
Peter Hall,
Min Zhang,
Shimin Hu
Abstract:
High Definition (HD) maps play an important role in modern traffic scenes. However, the development of HD maps coverage grows slowly because of the cost limitation. To efficiently model HD maps, we proposed a convolutional neural network with a novel prediction layer and a zoom module, called LineNet. It is designed for state-of-the-art lane detection in an unordered crowdsourced image dataset. And we introduced TTLane, a dataset for efficient lane detection in urban road modeling applications. Combining LineNet and TTLane, we proposed a pipeline to model HD maps with crowdsourced data for the first time. And the maps can be constructed precisely even with inaccurate crowdsourced data.
Submitted 16 July, 2018;
originally announced July 2018.
-
Ranking CGANs: Subjective Control over Semantic Image Attributes
Authors:
Yassir Saquil,
Kwang In Kim,
Peter Hall
Abstract:
In this paper, we investigate the use of generative adversarial networks in the task of image generation according to subjective measures of semantic attributes. Unlike the standard (CGAN) that generates images from discrete categorical labels, our architecture handles both continuous and discrete scales. Given pairwise comparisons of images, our model, called RankCGAN, performs two tasks: it learns to rank images using a subjective measure; and it learns a generative model that can be controlled by that measure. RankCGAN associates each subjective measure of interest to a distinct dimension of some latent space. We perform experiments on UT-Zap50K, PubFig and OSR datasets and demonstrate that the model is expressive and diverse enough to conduct two-attribute exploration and image editing.
Submitted 24 July, 2018; v1 submitted 11 April, 2018;
originally announced April 2018.
-
A Highly Accelerated Parallel Multi-GPU based Reconstruction Algorithm for Generating Accurate Relative Stopping Powers
Authors:
Paniz Karbasi,
Ritchie Cai,
Blake Schultze,
Hanh Nguyen,
Jones Reed,
Patrick Hall,
Valentina Giacometti,
Vladimir Bashkirov,
Robert Johnson,
Nick Karonis,
Jeffrey Olafsen,
Caesar Ordonez,
Keith E. Schubert,
Reinhard W. Schulte
Abstract:
Low-dose Proton Computed Tomography (pCT) is an evolving imaging modality that is used in proton therapy planning which addresses the range uncertainty problem. The goal of pCT is generating a 3D map of Relative Stopping Power (RSP) measurements with high accuracy within clinically required time frames. Generating accurate RSP values within the shortest amount of time is considered a key goal when developing pCT software. Existing pCT software packages have met this time frame and even exceeded this time goal, but they require clusters with hundreds of processors.
This paper describes a novel reconstruction technique using two Graphics Processing Unit (GPU) cores, such as is available on a single Nvidia P100. The proposed reconstruction technique is tested on both simulated and experimental datasets and on two different systems namely Nvidia K40 and P100 GPUs from IBM and Cray. The experimental results demonstrate that our proposed reconstruction method meets both the timing and accuracy with the benefit of having reasonable cost, and efficient use of power.
Submitted 3 February, 2018;
originally announced February 2018.
-
English Conversational Telephone Speech Recognition by Humans and Machines
Authors:
George Saon,
Gakuto Kurata,
Tom Sercu,
Kartik Audhkhasi,
Samuel Thomas,
Dimitrios Dimitriadis,
Xiaodong Cui,
Bhuvana Ramabhadran,
Michael Picheny,
Lynn-Li Lim,
Bergul Roomi,
Phil Hall
Abstract:
One of the most difficult speech recognition tasks is accurate recognition of human to human communication. Advances in deep learning over the last few years have produced major speech recognition improvements on the representative Switchboard conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This then raises two issues - what IS human performance, and how far down can we still drive speech recognition error rates? A recent paper by Microsoft suggests that we have already achieved human performance. In trying to verify this statement, we performed an independent set of human performance measurements on two conversational tasks and found that human performance may be considerably better than what was earlier reported, giving the community a significantly harder goal to achieve. We also report on our own efforts in this area, presenting a set of acoustic and language modeling techniques that lowered the word error rate of our own English conversational telephone LVCSR system to the level of 5.5%/10.3% on the Switchboard/CallHome subsets of the Hub5 2000 evaluation, which - at least at the writing of this paper - is a new performance milestone (albeit not at what we measure to be human performance!). On the acoustic side, we use a score fusion of three models: one LSTM with multiple feature inputs, a second LSTM trained with speaker-adversarial multi-task learning and a third residual net (ResNet) with 25 convolutional layers and time-dilated convolutions. On the language modeling side, we use word and character LSTMs and convolutional WaveNet-style language models.
Submitted 6 March, 2017;
originally announced March 2017.
-
Detecting People in Artwork with CNNs
Authors:
Nicholas Westlake,
Hongping Cai,
Peter Hall
Abstract:
CNNs have massively improved performance in object detection in photographs. However, research into object detection in artwork remains limited. We show state-of-the-art performance on a challenging dataset, People-Art, which contains people from photos, cartoons and 41 different artwork movements. We achieve this high performance by fine-tuning a CNN for this task, thus also demonstrating that training CNNs on photos results in overfitting for photos: only the first three or four layers transfer from photos to artwork. Although the CNN's performance is the highest yet, it remains less than 60% AP, suggesting further work is needed for the cross-depiction problem. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46604-0_57
Submitted 27 October, 2016;
originally announced October 2016.
-
Dense Motion Estimation for Smoke
Authors:
Da Chen,
Wenbin Li,
Peter Hall
Abstract:
Motion estimation for highly dynamic phenomena such as smoke is an open challenge for Computer Vision. Traditional dense motion estimation algorithms have difficulties with non-rigid and large motions, both of which are frequently observed in smoke motion. We propose an algorithm for dense motion estimation of smoke. Our algorithm is robust and fast, and performs better across different types of smoke than other dense motion estimation algorithms, including state-of-the-art and neural network approaches. The key to our contribution is to use skeletal flow, without explicit point matching, to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this paper we describe our algorithm in greater detail, and provide experimental evidence to support our claims.
Submitted 8 September, 2016; v1 submitted 7 September, 2016;
originally announced September 2016.
-
The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs
Authors:
Hongping Cai,
Qi Wu,
Tadeo Corradi,
Peter Hall
Abstract:
The cross-depiction problem is that of recognising visual objects regardless of whether they are photographed, painted, drawn, etc. It is a potentially significant yet under-researched problem. Emulating the remarkable human ability to recognise objects in an astonishingly wide variety of depictive forms is likely to advance both the foundations and the applications of Computer Vision.
In this paper we benchmark classification, domain adaptation, and deep learning methods, demonstrating that none perform consistently well in the cross-depiction problem. Given the current interest in deep learning, it is notable that such methods exhibit the same behaviour as all but one other method: they show a significant fall in performance over inhomogeneous databases compared to their peak performance, which is always over data comprising photographs only. Rather, we find that the methods with strong models of spatial relations between parts tend to be more robust, and therefore conclude that such information is important in modelling object classes regardless of appearance details.
Submitted 1 May, 2015;
originally announced May 2015.
-
Comment: Citation Statistics
Authors:
Peter Gavin Hall
Abstract:
Comment on "Citation Statistics" [arXiv:0910.3529]
Submitted 19 October, 2009;
originally announced October 2009.