Search | arXiv e-print repository

arXiv:2406.19585 [pdf]

doi 10.1016/j.jeurceramsoc.2024.116696

Effect of interfacial Fe3O4 nanoparticles on the microstructure and mechanical properties of textured alumina densified by ultrafast high-temperature sintering

Authors: Rohit Pratyush Behera, Andrew Yun Ru Ng, Zehui Du, Chee Lip Gan, Hortense Le Ferrand

Abstract: Alumina microplatelets coated with a small amount of Fe3O4 can be oriented via a rotating magnetic field to create texture. After ultrafast high-temperature sintering (UHS), Fe atoms are found at the grain boundaries and within the grains, influencing the mechanical properties. Here, we compare the microstructure and mechanical properties of textured alumina prepared with and without Fe3O4 and sintered using UHS or conventional sintering (CS). Microstructural analysis using electron backscattering diffraction (EBSD) indicates that Fe3O4 induces crystallographic defects in the ceramic after UHS. Nanoindentation measurements reveal that the presence of Fe3O4 leads to plastic flow that increases the energy dissipation, reaching ~122% at a maximum load of 1900 mN compared to pristine samples. Overall, due to the concentrated effects of Fe3O4 after UHS, the flexural strength and fracture toughness values are higher than those of the other two samples, reaching values of ~287 MPa and 7 MPa·m^0.5, respectively. These results could be leveraged to produce stronger and tougher ceramics.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 10 pages, 11 figures, contains main manuscript and supplementary file

Journal ref: Journal of the European Ceramic Society 44 (2024) 116696

arXiv:2406.01464 [pdf, other]

Using Convolutional Neural Networks to detect Edge Localized Modes in DIII-D from Doppler Backscattering measurements

Authors: N. Q. X. Teo, V. H. Hall-Chen, K. Barada, R. J. H. Ng, L. Gu, A. K. Yeoh, Q. T. Pratt, X. Garbet, T. L. Rhodes

Abstract: In H-mode tokamak plasmas, the plasma is sometimes ejected beyond the edge transport barrier. These events are known as edge localized modes (ELMs). ELMs cause a loss of energy and damage the vessel walls. Understanding the physics of ELMs and, by extension, how to detect and mitigate them, is an important challenge. In this paper, we focus on two diagnostic methods – D-alpha spectroscopy and Doppler backscattering (DBS). The former detects ELMs by measuring Balmer alpha emission while the latter uses microwave radiation to probe the plasma. DBS has the advantage of having higher temporal resolution and robustness to damage. These advantages of DBS diagnostics may be beneficial for future operational tokamaks and thus data processing techniques for DBS should be developed in preparation. In light of this, we explore the training of neural networks to detect ELMs from DBS data, using D-alpha data as the ground truth. With shots found in the DIII-D database, the model is trained to classify each time step based on the occurrence of an ELM event. The results are promising. When tested on shots similar to those used for training, the model is capable of consistently achieving a high f1-score of 0.93. This score is a performance metric for imbalanced datasets that ranges between 0 and 1. We evaluate the performance of our neural network on a variety of ELMs – grassy, suppressed, and wide pedestal – finding broad applicability. Beyond ELMs, our work demonstrates the wider feasibility of applying neural networks to data from DBS diagnostics.

Submitted 3 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.
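
Note on the f1-score cited in the abstract above: the short sketch below shows how the metric is commonly computed from per-time-step binary labels, and why it is more informative than accuracy when positive events are rare. The function name and the label arrays are hypothetical illustrations, not data or code from the paper.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed from binary labels."""
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correct positive prediction: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 marks a time step with an ELM, 0 one without.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))
```

In this made-up example, 8 of 10 time steps are labelled correctly, yet only one of the two positive steps is found and one alarm is false, so the F1 value is noticeably lower than the accuracy.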

arXiv:2405.01842 [pdf, ps, other]

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

Authors: Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Abstract: To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. SGHateCheck reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly for Singapore and Southeast Asia contexts.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2402.08855 [pdf, other]

GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency

Authors: Catherine Yeh, Gonzalo Ramos, Rachel Ng, Andy Huntington, Richard Banks

Abstract: Large language models (LLMs) are becoming more prevalent and have found ubiquitous use in providing different forms of writing assistance. However, LLM-powered writing systems can frustrate users due to their limited personalization and control, which can be exacerbated when users lack experience with prompt engineering. We see design as one way to address these challenges and introduce GhostWriter, an AI-enhanced writing design probe where users can exercise enhanced agency and personalization. GhostWriter leverages LLMs to learn the user's intended writing style implicitly as they write, while allowing explicit teaching moments through manual style edits and annotations. We study 18 participants who use GhostWriter on two different writing tasks, observing that it helps users craft personalized text generations and empowers them by providing multiple ways to control the system's writing style. From this study, we present insights regarding people's relationship with AI-assisted writing and offer design recommendations for future work.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 29 pages, 12 figures

arXiv:2402.03659 [pdf, other]

doi 10.1145/3589334.3645611

Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models

Authors: Kelvin J. L. Koa, Yunshan Ma, Ritchie Ng, Tat-Seng Chua

Abstract: Explaining stock predictions is generally a difficult task for traditional non-generative deep learning models, where explanations are limited to visualizing the attention weights on important texts. Today, Large Language Models (LLMs) present a solution to this problem, given their known capabilities to generate human-readable explanations for their decision-making process. However, the task of stock prediction remains challenging for LLMs, as it requires the ability to weigh the varying impacts of chaotic social texts on stock prices. The problem gets progressively harder with the introduction of the explanation component, which requires LLMs to explain verbally why certain factors are more important than the others. On the other hand, to fine-tune LLMs for such a task, one would need expert-annotated samples of explanation for every stock movement in the training set, which is expensive and impractical to scale. To tackle these issues, we propose our Summarize-Explain-Predict (SEP) framework, which utilizes a self-reflective agent and Proximal Policy Optimization (PPO) to let an LLM teach itself how to generate explainable stock predictions in a fully autonomous manner. The reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations from input texts. The training samples for the PPO trainer are also the responses generated during the reflective process, which eliminates the need for human annotators. Using our SEP framework, we fine-tune an LLM that can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient for the stock classification task. To justify the generalization capability of our framework, we further test it on the portfolio construction task, and demonstrate its effectiveness through various portfolio metrics.

Submitted 29 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: WWW 2024

arXiv:2311.16047 [pdf]

doi 10.1117/12.2652661

Observer study-based evaluation of TGAN architecture used to generate oncological PET images

Authors: Roberto Fedrigo, Fereshteh Yousefirizi, Zi** Liu, Abhinav K. Jha, Robert V. Bergen, Jean-Francois Rajotte, Raymond T. Ng, Ingrid Bloise, Sara Harsini, Dan J. Kadrmas, Carlos Uribe, Arman Rahmim

Abstract: The application of computer-vision algorithms in medical imaging has increased rapidly in recent years. However, algorithm training is challenging due to limited sample sizes, lack of labeled samples, as well as privacy concerns regarding data sharing. To address these issues, we previously developed (Bergen et al. 2022) a synthetic PET dataset for Head and Neck (H&N) cancer using the temporal generative adversarial network (TGAN) architecture and evaluated its performance segmenting lesions and identifying radiomics features in synthesized images. In this work, a two-alternative forced-choice (2AFC) observer study was performed to quantitatively evaluate the ability of human observers to distinguish between real and synthesized oncological PET images. In the study, eight trained readers, including two board-certified nuclear medicine physicians, read 170 real/synthetic image pairs presented as 2D-transaxial using a dedicated web app. For each image pair, the observer was asked to identify the real image and input their confidence level with a 5-point Likert scale. P-values were computed using the binomial test and Wilcoxon signed-rank test. A heat map was used to compare the response accuracy distribution for the signed-rank test. Response accuracy for all observers ranged from 36.2% [27.9-44.4] to 63.1% [54.8-71.3]. Six out of eight observers did not identify the real image with statistical significance, indicating that the synthetic dataset was reasonably representative of oncological PET images. Overall, this study adds validity to the realism of our simulated H&N cancer dataset, which may be implemented in the future to train AI algorithms while favoring patient confidentiality and privacy protection.

Submitted 27 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.
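
Note on the binomial test mentioned in the abstract above: in a two-alternative forced-choice task, an observer who cannot tell real from synthetic images is expected to be correct about half the time. The sketch below is one way such a test could be set up, assuming a two-sided exact test against a chance level of 0.5. The function name and the example count are hypothetical and are not taken from the study.

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Two-sided exact binomial p-value for k successes in n trials,
    under the null hypothesis that each trial succeeds with probability 0.5."""
    # Probability of every possible number of successes under the null.
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probabilities of all outcomes at least as unlikely as the
    # observed one (small tolerance guards against float round-off).
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical observer: 100 correct identifications out of 170 pairs.
p = binomial_two_sided_p(100, 170)
print(f"p-value: {p:.4f}")
```

A large p-value would mean that the observer's accuracy is compatible with guessing, which is the outcome the abstract reports for six of the eight readers.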

arXiv:2309.00288 [pdf, other]

Stark effect tunable terahertz transitions in finite carbon chains

Authors: R. A. Ng, M. E. Portnoi, R. R. Hartmann

Abstract: We employ a tight-binding model to calculate the optical selection rules of gold-terminated carbyne chains in the presence of an applied electric field. We show that both the magnitude of the edge-state gap and the strength of optical transitions across it can be tuned via the Stark effect. In the case of sufficiently long carbyne chains, the dipole transitions between edge states occur within the THz frequency range.

Submitted 1 September, 2023; originally announced September 2023.

Comments: 12 pages, 6 figures

arXiv:2309.00073 [pdf, other]

doi 10.1145/3583780.3614844

Diffusion Variational Autoencoder for Tackling Stochasticity in Multi-Step Regression Stock Price Prediction

Authors: Kelvin J. L. Koa, Yunshan Ma, Ritchie Ng, Tat-Seng Chua

Abstract: Multi-step stock price prediction over a long-term horizon is crucial for forecasting its volatility, allowing financial institutions to price and hedge derivatives, and banks to quantify the risk in their trading books. Additionally, most financial regulators also require a liquidity horizon of several days for institutional investors to exit their risky assets, in order to not materially affect market prices. However, the task of multi-step stock price prediction is challenging, given the highly stochastic nature of stock data. Current solutions to tackle this problem are mostly designed for single-step, classification-based predictions, and are limited to low representation expressiveness. The problem also gets progressively harder with the introduction of the target price sequence, which also contains stochastic noise and reduces generalizability at test-time. To tackle these issues, we combine a deep hierarchical variational-autoencoder (VAE) and diffusion probabilistic techniques to do seq2seq stock prediction through a stochastic generative process. The hierarchical VAE allows us to learn the complex and low-level latent variables for stock prediction, while the diffusion probabilistic model trains the predictor to handle stock price stochasticity by progressively adding random noise to the stock data. Our Diffusion-VAE (D-Va) model is shown to outperform state-of-the-art solutions in terms of its prediction accuracy and variance. More importantly, the multi-step outputs can also allow us to form a stock portfolio over the prediction length. We demonstrate the effectiveness of our model outputs in the portfolio investment task through the Sharpe ratio metric and highlight the importance of dealing with different types of prediction uncertainties.

Submitted 29 October, 2023; v1 submitted 18 August, 2023; originally announced September 2023.

Comments: CIKM 2023
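
Note on the Sharpe ratio mentioned in the abstract above: the metric is commonly defined as the mean excess return of a portfolio divided by the standard deviation of those excess returns. The sketch below is a minimal illustration under that common definition, with a per-period risk-free rate and no annualization. The function name and the return series are hypothetical and do not come from the paper.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by the sample standard deviation
    of the excess returns (per period, not annualized)."""
    excess = [r - risk_free_rate for r in returns]
    spread = stdev(excess)
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(excess) / spread


# Hypothetical per-period portfolio returns.
portfolio_returns = [0.01, 0.03, 0.02]
print(sharpe_ratio(portfolio_returns))
```

A higher value means more return per unit of volatility, which is why the ratio is a natural score for a portfolio built from multi-step price predictions.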

arXiv:2307.15867 [pdf, other]

A new method to distinguish gravitational-wave signals from detector noise transients with Gravity Spy

Authors: Seraphim Jarov, Sarah Thiele, Siddharth Soni, Julian Ding, Jess McIver, Raymond Ng, Rikako Hatoya, Derek Davis

Abstract: The Advanced LIGO and Advanced Virgo detectors have enabled the confident detection of dozens of mergers of black holes and neutron stars. However, the presence of detector noise transients (glitches) hinders the search for these gravitational wave (GW) signals. We prototyped a restructuring of Gravity Spy's classification model to distinguish between glitches and astrophysical signals. Our method is able to correctly classify three-quarters of retracted candidate events in O3b as non-astrophysical and 100% of the confirmed astrophysical events as true signals. This approach will inform candidate event validation efforts in the latest observing run.

Submitted 5 February, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 11 pages, 7 figures, submitted to PHYSICAL REVIEW D

arXiv:2305.14617 [pdf, other]

COMET-M: Reasoning about Multiple Events in Complex Sentences

Authors: Sahithya Ravi, Raymond Ng, Vered Shwartz

Abstract: Understanding the speaker's intended meaning often involves drawing commonsense inferences to reason about what is not stated explicitly. In multi-event sentences, it requires understanding the relationships between events based on contextual knowledge. We propose COMET-M (Multi-Event), an event-centric commonsense model capable of generating commonsense inferences for a target event within a complex sentence. COMET-M builds upon COMET (Bosselut et al., 2019), which excels at generating event-centric inferences for simple sentences, but struggles with the complexity of multi-event sentences prevalent in natural text. To overcome this limitation, we curate a multi-event inference dataset of 35K human-written inferences. We trained COMET-M on the human-written inferences and also created baselines using automatically labeled examples. Experimental results demonstrate the significant performance improvement of COMET-M over COMET in generating multi-event inferences. Moreover, COMET-M successfully produces distinct inferences for each target event, taking the complete context into consideration. COMET-M holds promise for downstream tasks involving natural text such as coreference resolution, dialogue, and story understanding.

Submitted 23 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.07930 [pdf, other]

FoundWright: A System to Help People Re-find Pages from Their Web-history

Authors: Haekyu Park, Gonzalo Ramos, **a Suh, Christopher Meek, Rachel Ng, Mary Czerwinski

Abstract: Re-finding information is an essential activity; however, it can be difficult when people struggle to express what they are looking for. Through a need-finding survey, we first seek opportunities for improving re-finding experiences, and explore one of these opportunities by implementing the FoundWright system. The system leverages recent advances in language transformer models to expand people's ability to express what they are looking for, through the interactive creation and manipulation of concepts contained within documents. We use FoundWright as a design probe to understand (1) how people create and use concepts, (2) how this expanded ability helps re-finding, and (3) how people engage and collaborate with FoundWright's machine learning support. Our probe reveals that this expanded way of expressing re-finding goals helps people with the task, by complementing traditional searching and browsing. Finally, we present insights and recommendations for future work aiming at developing systems to support re-finding.

Submitted 13 May, 2023; originally announced May 2023.

Comments: 26 pages

arXiv:2304.09977 [pdf, other]

GSpyNetTree: A signal-vs-glitch classifier for gravitational-wave event candidates

Authors: Sofia Alvarez-Lopez, Annudesh Liyanage, Julian Ding, Raymond Ng, Jess McIver

Abstract: Despite achieving sensitivities capable of detecting the extremely small amplitude of gravitational waves (GWs), LIGO and Virgo detector data contain frequent bursts of non-Gaussian transient noise, commonly known as 'glitches'. Glitches come in various time-frequency morphologies, and they are particularly challenging when they mimic the form of real GWs. Given the higher expected event rate in the next observing run (O4), LIGO-Virgo GW event candidate validation will require increased levels of automation. Gravity Spy, a machine learning tool that successfully classified common types of LIGO and Virgo glitches in previous observing runs, has the potential to be restructured as a signal-vs-glitch classifier to accurately distinguish between glitches and GW signals. A signal-vs-glitch classifier used for automation must be robust and compatible with a broad array of background noise, new sources of glitches, and the likely occurrence of overlapping glitches and GWs. We present GSpyNetTree, the Gravity Spy Convolutional Neural Network Decision Tree: a multi-CNN classifier using CNNs in a decision tree sorted via total GW candidate mass tested under these realistic O4-era scenarios.

Submitted 19 April, 2023; originally announced April 2023.

Comments: 19 pages, 12 figures, submitted to Classical and Quantum Gravity

arXiv:2303.03752 [pdf, other]

Quantum thermometry with single molecules in portable nanoprobes

Authors: V. Esteso, R. Duquennoy, R. C. Ng, M. Colautti, P. Lombardi, G. Arregui, E. Chavez-Ángel, C. M. Sotomayor-Torres, P. D. García, M. Hilke, C. Toninelli

Abstract: Understanding heat transport is relevant to developing efficient strategies for thermal management, for instance in microelectronics, as well as for fundamental science purposes. However, measuring temperatures in nanostructured environments and in cryogenic conditions remains a challenging task that requires both high sensitivity and a non-invasive approach. Here we present a portable nanothermometer based on a molecular two-level quantum system that operates in the 3–30 K temperature range, with excellent temperature and spatial resolutions on the order of mK and μm, respectively. We validate the performance of this molecular thermometer on nanostructures, by estimating the thermal conductivity of a patterned silicon membrane. In addition, we demonstrate the two-dimensional temperature mapping of a patterned surface via the simultaneous spectroscopy of all thermometers deposited on a sample. These results demonstrate the potential of this molecular thermometer to explore thermal properties and related phenomena at cryogenic temperatures.

Submitted 7 March, 2023; originally announced March 2023.

Comments: 12 pages, 6 figures

arXiv:2302.09715 [pdf, other]

What happens before and after: Multi-Event Commonsense in Event Coreference Resolution

Authors: Sahithya Ravi, Chris Tanner, Raymond Ng, Vered Shwartz

Abstract: Event coreference models cluster event mentions pertaining to the same real-world event. Recent models rely on contextualized representations to recognize coreference among lexically or contextually similar mentions. However, models typically fail to leverage commonsense inferences, which is particularly limiting for resolving lexically-divergent mentions. We propose a model that extends event mentions with temporal commonsense inferences. Given a complex sentence with multiple events, e.g., "The man killed his wife and got arrested", with the target event "arrested", our model generates plausible events that happen before the target event, such as "the police arrived", and after it, such as "he was sentenced". We show that incorporating such inferences into an existing event coreference model improves its performance, and we analyze the coreferences in which such temporal knowledge is required.

Submitted 21 February, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: Accepted to EACL 2023

arXiv:2210.16370 [pdf, other]

Intermodulation of optical frequency combs in a multimode optomechanical system

Authors: Ryan C. Ng, Paul Nizet, Daniel Navarro-Urrios, Guillermo Arregui, Marcus Albrechtsen, Pedro D. García, Søren Stobbe, Clivia M. Sotomayor-Torres, Guilhem Madiot

Abstract: Phonons offer the possibility to connect the microwave and optical domains while being efficiently transduced with electronic and optical signals. Here, we present a multimodal optomechanical platform, consisting of a mechanical-optical-mechanical resonator configuration. The mechanical modes, with frequencies at 265 MHz and 6.8 GHz, can be simultaneously excited into a phonon lasing regime as supported by a stability analysis of the system. Both the MHz and the GHz modes enter a self-sustained oscillation regime, leading to the intermodulation of two frequency combs in the optical field. We characterize this platform experimentally, demonstrating previously unexplored dynamical regimes. These results suggest the possibility to control multiple mechanical degrees of freedom via a single optical mode, with implications in GHz phononic devices, signal processing, and optical comb sensing applications.

Submitted 3 March, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: 10 pages

arXiv:2207.11754 [pdf, other]

Virtual Reality Therapy for the Psychological Well-being of Palliative Care Patients in Hong Kong

Authors: Daniel Eckhoff, Royce Ng, Alvaro Cassinelli

Abstract: In this paper we introduce novel Virtual Reality (VR) and Augmented Reality (AR) treatments to improve the psychological well-being of patients in palliative care, based on interviews with a clinical psychologist who has successfully implemented VR-assisted interventions on palliative care patients in the Hong Kong hospital system. Our VR- and AR-assisted interventions are adaptations of traditional palliative care therapies which simultaneously facilitate patients' communication with family and friends while isolated in hospital due to physical weakness and COVID-19-related restrictions. The first system we propose is a networked, metaverse platform for palliative care patients to create customized virtual environments with therapists, family and friends which function as immersive and collaborative versions of 'life review' and 'reminiscence therapy'. The second proposed system will investigate the use of Mixed Reality telepresence and haptic touch in an AR environment, which will allow palliative care patients to physically feel friends and family in a virtual space, adding to the sense of presence and immersion in that environment.

Submitted 24 July, 2022; originally announced July 2022.

arXiv:2207.02104 [pdf, ps, other]

doi 10.1109/ASRU46091.2019.9003838

A cross-corpus study on speech emotion recognition

Authors: Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain

Abstract: For speech emotion datasets, it has been difficult to acquire large quantities of reliable data, and acted emotions may be over the top compared to less expressive emotions displayed in everyday life. Lately, larger datasets with natural emotions have been created. Instead of ignoring smaller, acted datasets, this study investigates whether information learnt from acted emotions is useful for detecting natural emotions. Cross-corpus research has mostly considered cross-lingual and even cross-age datasets, and difficulties arise from different methods of annotating emotions causing a drop in performance. To be consistent, four adult English datasets covering acted, elicited and natural emotions are considered. A state-of-the-art model is proposed to accurately investigate the degradation of performance. The system involves a bi-directional LSTM with an attention mechanism to classify emotions across datasets. Experiments study the effects of training models in a cross-corpus and multi-domain fashion and results show the transfer of information is not successful. Out-of-domain models, followed by adapting to the missing dataset, and domain adversarial training (DAT) are shown to be more suitable for generalising to emotions across datasets. This shows positive information transfer from acted datasets to those with more natural emotions and the benefits from training on different corpora.

Submitted 5 July, 2022; originally announced July 2022.

Comments: ASRU 2019

Journal ref: IEEE Workshop on Automatic Speech Recognition and Understanding 2019

arXiv:2206.06913 [pdf, other]

doi 10.1103/PhysRevLett.130.106903

Optomechanical generation of coherent GHz vibrations in a phononic waveguide

Authors: Guilhem Madiot, Ryan C. Ng, Guillermo Arregui, Omar Florez, Marcus Albrechtsen, Søren Stobbe, Pedro D. Garcia, Clivia M. Sotomayor-Torres

Abstract: Nanophononics has the potential for information transfer, in an analogous manner to its photonic and electronic counterparts. The adoption of phononic systems has been limited, due to difficulties associated with the generation, manipulation, and detection of phonons, especially at GHz frequencies. Existing techniques often require piezoelectric materials with an external radiofrequency excitation that are not readily integrated into existing CMOS infrastructures, while non-piezoelectric demonstrations have been inefficient. In this work, we explore the optomechanical generation of coherent phonons in a suspended 2D silicon phononic crystal cavity with a guided mode around 6.8 GHz. By incorporating an air-slot into this cavity, we turn the phononic waveguide into an optomechanical platform that exploits localized photonic modes resulting from inherent fabrication imperfections for the transduction of mechanics. Such a platform exhibits very fine control of phonons using light, and is capable of coherent self-sustained phonon generation via mechanical lasing around 6.8 GHz. The ability to generate high frequency coherent mechanical vibrations within such a simple 2D CMOS-compatible system could be a first step towards the development of sources in phononic circuitry and the coherent manipulation of other solid-state properties.

Submitted 14 June, 2022; originally announced June 2022.

Comments: 13 pages, 4 main text figures, 6 appendix figures

arXiv:2206.06448 [pdf]

Assessing Privacy Leakage in Synthetic 3-D PET Imaging using Transversal GAN

Authors: Robert V. Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Arman Rahmim, Raymond T. Ng

Abstract: Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult in large part due to privacy concerns. For this reason, generative image models are highly sought after to facilitate data sharing. However, 3-D generative models are understudied, and investigation of their privacy leakage is needed. We introduce our 3-D generative model, Transversal GAN (TrGAN), using head & neck PET images which are conditioned on tumour masks as a case study. We define quantitative measures of image fidelity, utility and privacy for our model. These metrics are evaluated in the course of training to identify ideal fidelity, utility and privacy trade-offs and establish the relationships between these parameters. We show that the discriminator of the TrGAN is vulnerable to attack, and that an attacker can identify which samples were used in training with almost perfect accuracy (AUC = 0.99). We also show that an attacker with access to only the generator cannot reliably classify whether a sample had been used for training (AUC = 0.51). This suggests that TrGAN generators, but not discriminators, may be used for sharing synthetic 3-D PET data with minimal privacy risk while maintaining good utility and fidelity.

Submitted 31 October, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2111.01866

arXiv:2205.13741 [pdf, other]

Generating multivariate time series with COmmon Source CoordInated GAN (COSCI-GAN)

Authors: Ali Seyfi, Jean-Francois Rajotte, Raymond T. Ng

Abstract: Generating multivariate time series is a promising approach for sharing sensitive data in many medical, financial, and IoT applications. A common type of multivariate time series originates from a single source such as the biometric measurements from a medical patient. This leads to complex dynamical patterns between individual time series that are hard to learn by typical generation models such as GANs. There is valuable information in those patterns that machine learning models can use to better classify, predict or perform other downstream tasks. We propose a novel framework that takes time series' common origin into account and favors channel/feature relationships preservation. The two key points of our method are: 1) the individual time series are generated from a common point in latent space and 2) a central discriminator favors the preservation of inter-channel/feature dynamics. We demonstrate empirically that our method helps preserve channel/feature correlations and that our synthetic data performs very well in downstream tasks with medical and financial data.

Submitted 14 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: 19 pages, 16 figures

arXiv:2204.03715 [pdf, other]

Gravitationally Lensed Black Hole Emission Tomography

Authors: Aviad Levis, Pratul P. Srinivasan, Andrew A. Chael, Ren Ng, Katherine L. Bouman

Abstract: Measurements from the Event Horizon Telescope enabled the visualization of light emission around a black hole for the first time. So far, these measurements have been used to recover a 2D image under the assumption that the emission field is static over the period of acquisition. In this work, we propose BH-NeRF, a novel tomography approach that leverages gravitational lensing to recover the continuous 3D emission field near a black hole. Compared to other 3D reconstruction or tomography settings, this task poses two significant challenges: first, rays near black holes follow curved paths dictated by general relativity, and second, we only observe measurements from a single viewpoint. Our method captures the unknown emission field using a continuous volumetric function parameterized by a coordinate-based neural network, and uses knowledge of Keplerian orbital dynamics to establish correspondence between 3D points over time. Together, these enable BH-NeRF to recover accurate 3D emission fields, even in challenging situations with sparse measurements and uncertain orbital dynamics. This work takes the first steps in showing how future measurements from the Event Horizon Telescope could be used to recover evolving 3D emission around the supermassive black hole in our Galactic center.

Submitted 7 April, 2022; originally announced April 2022.

Comments: To appear in the IEEE Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Supplemental material including accompanying pdf, code, and video highlight can be found in the project page: http://imaging.cms.caltech.edu/bhnerf/

arXiv:2203.15476 [pdf]

Excitation and detection of acoustic phonons in nanoscale systems

Authors: Ryan C Ng, Alexandros El Sachat, Francisco Cespedes, Martin Poblet, Guilhem Madiot, Juliana Jaramillo-Fernandez, Peng Xiao, Omar Florez, Marianna Sledzinska, Clivia Sotomayor-Torres, Emigdio Chavez-Angel

Abstract: Phonons play a key role in the physical properties of materials, and have long been a topic of study in physics. While the effects of phonons had historically been considered to be a hindrance, modern research has shown that phonons can be exploited due to their ability to couple to other excitations and consequently affect the thermal, dielectric, and electronic properties of solid state systems, greatly motivating the engineering of phononic structures. Advances in nanofabrication have allowed for structuring and phonon confinement even down to the nanoscale, drastically changing material properties. Despite developments in fabricating such nanoscale devices, the proper manipulation and characterization of phonons continues to be challenging. However, a fundamental understanding of these processes could enable the realization of key applications in diverse fields such as topological phononics, information technologies, sensing, and quantum electrodynamics, especially when integrated with existing electronic and photonic devices. Here, we highlight seven of the available methods for the excitation and detection of acoustic phonons and vibrations in solid materials, as well as advantages, disadvantages, and additional considerations related to their application. We then provide perspectives towards open challenges in nanophononics and how the additional understanding granted by these techniques could serve to enable the next generation of phononic technological applications.

Submitted 29 March, 2022; originally announced March 2022.

Comments: 50 pages

arXiv:2202.02166 [pdf, other]

doi 10.1038/s41565-022-01178-1

Engineering nanoscale hypersonic phonon transport

Authors: O. Florez, G. Arregui, M. Albrechtsen, R. C. Ng, J. Gomis-Bresco, S. Stobbe, C. M. Sotomayor-Torres, P. D. García

Abstract: Controlling the vibrations in solids is crucial to tailor their mechanical properties and their interaction with light. Thermal vibrations represent a source of noise and dephasing for many physical processes at the quantum level. One strategy to avoid these vibrations is to structure a solid such that it possesses a phononic stop band, i.e., a frequency range over which there are no available mechanical modes. Here, we demonstrate the complete absence of mechanical vibrations at room temperature over a broad spectral window, with a 5.3 GHz wide band gap centered at 8.4 GHz in a patterned silicon nanostructure membrane measured using Brillouin light scattering spectroscopy. By constructing a line-defect waveguide, we directly measure GHz localized modes at room temperature. Our experimental results of thermally excited guided mechanical modes at GHz frequencies provide an efficient platform for photon-phonon integration with applications in optomechanics and signal processing transduction.

Submitted 9 February, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Journal ref: Nature Nanotechnology 17, 947 (2022)

arXiv:2202.01568 [pdf, other]

doi 10.1103/PhysRevB.106.L041403

Tuning terahertz transitions in cyclo[n]carbon rings

Authors: R. A. Ng, M. E. Portnoi, R. R. Hartmann

Abstract: We develop an analytic model for an ideal polyyne ring which describes the induced THz gap in the molecular spectrum due to the Stark effect. This simple model can also be used to describe an odd-dimered cyclocarbon which has undergone a spontaneous symmetry-breaking event (due to the Jahn-Teller effect) as an effective dipole across an ideal ring. We show that both the size of the gap, and the strength of optical transitions across it, can be modulated by varying the external electric field strength. A THz emission scheme based on optical excitation is proposed, thus paving the way for a new class of THz emitters based on cyclo[n]carbon.

Submitted 13 July, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. B 106, L041403 (2022)

arXiv:2111.10760 [pdf, other]

doi 10.1038/s41598-022-11742-3

Mapping borophene onto graphene: Quasi-exact solutions for guiding potentials in tilted Dirac cones

Authors: R. A. Ng, A. Wild, M. E. Portnoi, R. R. Hartmann

Abstract: We show that if the solutions to the (2+1)-dimensional massless Dirac equation for a given 1D potential are known, then they can be used to obtain the eigenvalues and eigenfunctions for the same potential, orientated at an arbitrary angle, in a tilted anisotropic 2D Dirac material. This simple set of transformations enables all the exact and quasi-exact solutions associated with 1D quantum wells in graphene to be applied to the confinement problem in tilted Dirac materials such as borophene. We also show that smooth electron waveguides in tilted Dirac materials can be used to manipulate the degree of valley polarization of quasiparticles travelling along a particular direction of the channel. We examine the particular case of the hyperbolic secant potential to model realistic top-gated structures for valleytronic applications.

Submitted 21 November, 2021; originally announced November 2021.

Comments: 12 pages, 5 figures and 1 Supplementary Material

Journal ref: Scientific Reports Vol. 12, Article number: 7688 (2022)

arXiv:2111.09465 [pdf, other]

doi 10.1088/1361-6382/ac7278

UniMAP: Model-free detection of unclassified noise transients in LIGO-Virgo data using the Temporal Outlier Factor

Authors: Julian Ding, Raymond Ng, Jess McIver

Abstract: Data from current gravitational wave detectors contains a high rate of transient noise (glitches) that can trigger false detections and obscure true astrophysical events. Existing noise-detection algorithms largely rely on model-based methods that may miss noise transients unwitnessed by auxiliary sensors or with exotic morphologies. We propose the Unicorn Multi-window Anomaly-detection Pipeline (UniMAP): a model-free algorithm to identify and characterize transient noise leveraging the Temporal Outlier Factor (TOF) via a multi-window data-resampling scheme. We show this windowing scheme extends the anomaly detection capabilities of the TOF algorithm to resolve noise transients of arbitrary morphology and duration. We demonstrate the efficacy of this pipeline in detecting glitches during LIGO and Virgo's third observing run, and discuss potential applications.

Submitted 4 May, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: 16 pages, 6 figures, submitted to Classical and Quantum Gravity

arXiv:2111.01866 [pdf]

3-D PET Image Generation with tumour masks using TGAN

Authors: Robert V Bergen, Jean-Francois Rajotte, Fereshteh Yousefirizi, Ivan S Klyuzhin, Arman Rahmim, Raymond T. Ng

Abstract: Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult due to the lack of training data, labeled samples, and privacy concerns. For this reason, a robust generative method to create synthetic data is highly sought after. However, most three-dimensional image generators require additional image input or are extremely memory intensive. To address these issues we propose adapting video generation techniques for 3-D image generation. Using the temporal GAN (TGAN) architecture, we show we are able to generate realistic head and neck PET images. We also show that by conditioning the generator on tumour masks, we are able to control the geometry and location of the tumour in the generated images. To test the utility of the synthetic images, we train a segmentation model using the synthetic images. Synthetic images conditioned on real tumour masks are automatically segmented, and the corresponding real images are also segmented. We evaluate the segmentations using the Dice score and find the segmentation algorithm performs similarly on both datasets (0.65 synthetic data, 0.70 real data). Various radiomic features are then calculated over the segmented tumour volumes for each data set. A comparison of the real and synthetic feature distributions shows that seven of eight feature distributions had statistically insignificant differences (p>0.05). Correlation coefficients were also calculated between all radiomic features and it is shown that all of the strong statistical correlations in the real data set are preserved in the synthetic data set.

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2110.11005 [pdf, other]

Cavity optomechanics with Anderson-localized optical modes

Authors: Guillermo Arregui, Ryan Cecil Ng, Marcus Albrechtsen, Søren Stobbe, Clivia Marfa Sotomayor Torres, Pedro David García

Abstract: Confining photons in cavities enhances the interactions between light and matter. In cavity optomechanics, this enables a wealth of phenomena ranging from optomechanically induced transparency to macroscopic objects cooled to their motional ground state. Previous work in cavity optomechanics employed devices where ubiquitous structural disorder played no role beyond perturbing resonance frequencies and quality factors. More generally, the interplay between disorder, which must be described by statistical physics, and optomechanical effects has thus far been unexplored. Here, we demonstrate how sidewall roughness in air-slot photonic-crystal waveguides can induce sufficiently strong backscattering of slot-guided light to create Anderson-localized modes with quality factors as high as half a million and mode volumes that are below the diffraction limit. We observe how the interaction between these disorder-induced optical modes and in-plane mechanical modes of the slotted membrane is governed by a distribution of coupling rates, which can exceed $g_{\text{o}}/2π\sim 200$ kHz, leading to mechanical amplification up to self-sustained oscillations via optomechanical backaction. Our work constitutes the first steps towards understanding optomechanics in the multiple-scattering regime and opens new perspectives for exploring complex systems with a multitude of mutually coupled degrees of freedom.

Submitted 1 April, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2108.09456 [pdf]

A Modular Design of Continuously Tunable Full Color Plasmonic Pixels with Broken Rotational Symmetry

Authors: Rui Feng, Hao Wang, Yongyin Cao, Ray J. H. Ng, You Sin Tan, Yanxia Zhang, Fangkui Sun, Cheng-Wei Qiu, Joel K. W. Yang, Weiqiang Ding

Abstract: Color tuning is a fascinating and indispensable property in applications such as advanced display, active camouflaging and information encryption. Thus far, a variety of reconfigurable approaches have been implemented to achieve color change. However, it is still a challenge to enable a continuous color tuning over the entire hue range in a simple, stable and rapid manner without changes in configuration and material properties. Here, we demonstrate an all-optical continuously tunable plasmonic pixel scheme via a modular design approach to realize polarization-controlled full color tuning by breaking the intrinsic symmetry of the unit cell layout. The polarization-controlled full color tunable plasmonic pixels consist of three different types of color modules oriented at an angle of 60° with respect to each other, corresponding to three subtractive primary colors. Without changing the structural properties or surrounding environment, the structural colors can be continuously and precisely tuned across all hues by illuminating linearly polarized light with different polarization directions. Meanwhile, the plasmonic pixels can be flexibly customized for various color tuning processes, such as different initial output colors and color tuning sequences, through the appropriate choice of component modules and the elaborate design of module layouts. Furthermore, we extend the color tuning to achromatic colors, white or black, with the utilization of a single module or the introduction of a black module. The proposed polarization-controlled full color tunable plasmonic pixels hold considerable potential to function as next-generation color pixels integrated with liquid-crystal polarizers.

Submitted 21 August, 2021; originally announced August 2021.

Comments: 39 pages, 17 figures

arXiv:2107.07081 [pdf]

Tunable Mie Resonances in the Visible Spectrum

Authors: Li Lu, Zhaogang Dong, Febiana Tijiptoharsono, Ray Jia Hong Ng, Hongtao Wang, Soroosh Daqiqeh Rezaei, Yunzheng Wang, Hai Sheng Leong, Joel K. W. Yang, Robert E. Simpson

Abstract: Dielectric optical nanoantennas play an important role in color displays, metasurface holograms, and wavefront shaping applications. They usually exploit Mie resonances as supported on nanostructures with high refractive index, such as Si and TiO2. However, these resonances normally cannot be tuned. Although phase change materials, such as the germanium-antimony-tellurium alloys and post transition metal oxides, such as ITO, have been used to tune optical antennas in the near infrared spectrum, tunable dielectric antennae in the visible spectrum remain to be demonstrated. In this paper, we designed and experimentally demonstrated tunable dielectric nanoantenna arrays with Mie resonances in the visible spectrum, exploiting phase transitions in wide-bandgap Sb2S3 nano-resonators. In the amorphous state, Mie resonances in these Sb2S3 nanostructures give rise to a strong structural color in reflection mode. Thermal annealing induced crystallization and laser induced amorphization of the Sb2S3 resonators allow the color to be tuned reversibly. We believe these tunable Sb2S3 nanoantennae arrays will enable a wide variety of tunable nanophotonic applications, such as high-resolution color displays, holographic displays, and miniature LiDAR systems.

Submitted 14 July, 2021; originally announced July 2021.

Comments: 36 pages, 5 figures in main text, 9 figures in supporting information

arXiv:2106.12285 [pdf]

Schrodinger's Red Pixel by Quasi Bound-State-In-Continuum

Authors: Zhaogang Dong, Lei Jin, Soroosh Daqiqeh Rezaei, Hao Wang, Yang Chen, Febiana Tjiptoharsono, Jinfa Ho, Sergey Gorelik, Ray Jia Hong Ng, Qifeng Ruan, Cheng-Wei Qiu, Joel K. W. Yang

Abstract: While structural colors are ubiquitous in nature, saturated reds are mysteriously absent. Hence, a longstanding problem is in fabricating nanostructured surfaces that exhibit reflectance approaching the theoretical limit. This limit is termed the Schrodinger red and demands sharp spectral transitions from "stopband" to a high reflectance "passband" with total suppression of higher-order resonances at blue and green wavelengths. Current approaches based on metallic or dielectric nanoantennas are insufficient to simultaneously meet these conditions. Here, for the first time, we designed and fabricated tall Si nanoantenna arrays on quartz substrate to support two partially overlapping y-polarized quasi bound-state-in-the-continuum (q-BIC) modes in the red wavelengths with sharp spectral edges. These structures produce possibly the most saturated and brightest reds with ~80% reflectance, exceeding the red vertex in sRGB and even the cadmium red pigment. We employed a gradient descent algorithm with structures supporting q-BIC as the starting point. Although the current design is polarization dependent, the proposed paradigm has enabled us to achieve the elusive structural red and the design principle could be generalized to Schrodinger's pixels of other colors. The design is suitable for scale up using other nanofabrication techniques for larger area applications, such as red pixels in displays, decorative coatings, and miniaturized spectrometers with high wavelength selectivity.

Submitted 23 June, 2021; originally announced June 2021.

Comments: 40 pages, 4 figures in the main text, 15 figures in the supplementary information

arXiv:2103.14024 [pdf, other]

PlenOctrees for Real-time Rendering of Neural Radiance Fields

Authors: Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, Angjoo Kanazawa

Abstract: We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800x800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve view-dependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu.net/plenoctrees

Submitted 17 August, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

Comments: ICCV 2021 (Oral)

arXiv:2101.07235 [pdf, other]

Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary

Authors: Jean-Francois Rajotte, Sumit Mukherjee, Caleb Robinson, Anthony Ortiz, Christopher West, Juan Lavista Ferres, Raymond T Ng

Abstract: We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary), a generative mechanism enabling collaborative learning. In particular, we show how a data owner with limited and biased data could benefit from other data owners while keeping data from all the sources private. This is a common scenario in medical image analysis where privacy legislation prevents data from being shared outside local premises. FELICIA works for a large family of Generative Adversarial Networks (GAN) architectures including vanilla and conditional GANs as demonstrated in this work. We show that by using the FELICIA mechanism, a data owner with limited image samples can generate high-quality synthetic images with high utility while neither data owner has to provide access to its data. The sharing happens solely through a central discriminator that has access limited to synthetic data. Here, utility is defined as classification performance on a real test set. We demonstrate these benefits on several realistic healthcare scenarios using benchmark image datasets (MNIST, CIFAR-10) as well as on medical images for the task of skin lesion classification. With multiple experiments, we show that even in the worst cases, combining FELICIA with real data gracefully achieves performance on par with real data while most results significantly improve the utility.

Submitted 28 August, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: 10 pages, 10 figures

MSC Class: 68W15 ACM Class: I.2.11

arXiv:2012.02189 [pdf, other]

Learned Initializations for Optimizing Coordinate-Based Neural Representations

Authors: Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P. Srinivasan, Jonathan T. Barron, Ren Ng

Abstract: Coordinate-based neural representations have shown significant promise as an alternative to discrete, array-based representations for complex low dimensional signals. However, optimizing a coordinate-based network from randomly initialized weights for each new signal is inefficient. We propose applying standard meta-learning algorithms to learn the initial weight parameters for these fully-connected networks based on the underlying class of signals being represented (e.g., images of faces or 3D models of chairs). Despite requiring only a minor change in implementation, using these learned initial weights enables faster convergence during optimization and can serve as a strong prior over the signal class being modeled, resulting in better generalization when only partial observations of a given signal are available. We explore these benefits across a variety of tasks, including representing 2D images, reconstructing CT scans, and recovering 3D shapes and scenes from 2D image observations.

Submitted 23 March, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: Project page: https://www.matthewtancik.com/learnit

arXiv:2010.07017 [pdf]

Computational Skills by Stealth in Secondary School Data Science

Authors: Wesley Burr, Fanny Chevalier, Christopher Collins, Alison L Gibbs, Raymond Ng, Chris Wild

Abstract: The unprecedented growth in the availability of data of all types and qualities and the emergence of the field of data science have provided an impetus to finally realizing the implementation of the full breadth of the Nolan and Temple Lang proposed integration of computing concepts into statistics curricula at all levels in statistics and new data science programs and courses. Moreover, data science, implemented carefully, opens accessible pathways to STEM for students for whom neither mathematics nor computer science are natural affinities, and who would traditionally be excluded. We discuss a proposal for the stealth development of computational skills in students' first exposure to data science through careful, scaffolded exposure to computation and its power. The intent of this approach is to support students, regardless of interest and self-efficacy in coding, in becoming data-driven learners, who are capable of asking complex questions about the world around them, and then answering those questions through the use of data-driven inquiry. This discussion is presented in the context of the International Data Science in Schools Project which recently published computer science and statistics consensus curriculum frameworks for a two-year secondary school data science program, designed to make data science accessible to all.

Submitted 8 October, 2020; originally announced October 2020.

Comments: 38 pages, 8 figures

arXiv:2010.05382 [pdf]

doi 10.1038/s41377-020-00403-7

Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy

Authors: Kyrollos Yanny, Nick Antipa, William Liberti, Sam Dehaeck, Kristina Monakhova, Fanglin Linda Liu, Konlin Shen, Ren Ng, Laura Waller

Abstract: Miniature fluorescence microscopes are a standard tool in systems biology. However, widefield miniature microscopes capture only 2D information, and modifications that enable 3D capabilities increase the size and weight and have poor resolution outside a narrow depth range. Here, we achieve the 3D capability by replacing the tube lens of a conventional 2D Miniscope with an optimized multifocal phase mask at the objective's aperture stop. Placing the phase mask at the aperture stop significantly reduces the size of the device, and varying the focal lengths enables a uniform resolution across a wide depth range. The phase mask encodes the 3D fluorescence intensity into a single 2D measurement, and the 3D volume is recovered by solving a sparsity-constrained inverse problem. We provide methods for designing and fabricating the phase mask and an efficient forward model that accounts for the field-varying aberrations in miniature objectives. We demonstrate a prototype that is 17 mm tall and weighs 2.5 grams, achieving 2.76 μm lateral and 15 μm axial resolution across most of the 900x700x390 μm³ volume at 40 volumes per second. The performance is validated experimentally on resolution targets, dynamic biological samples, and mouse brain tissue. Compared with existing miniature single-shot volume-capture implementations, our system is smaller and lighter and achieves a more than 2x better lateral and axial resolution throughout a 10x larger usable depth range. Our microscope design provides single-shot 3D imaging for applications where a compact platform matters, such as volumetric neural imaging in freely moving animals and 3D motion studies of dynamic samples in incubators and lab-on-a-chip devices.

Submitted 11 October, 2020; originally announced October 2020.

Comments: Published with Nature Springer in Light: Science and Applications

Journal ref: Light: Science & Applications 9.1 (2020): 1-13

arXiv:2010.03435 [pdf]

Fabrication Development of a Large Area Grating for Out of Plane Beam Coupling

Authors: Jonathan Trisno, Tong Hua Lee, Parvathi Nair S., You Sin Tan, Ray J. H. Ng, Yingyan Huang, Seng Tiong Ho, Joel K. W. Yang

Abstract: We develop a single-layer waveguide surface grating structure to vertically couple near infrared (NIR) light at ~1.55 μm wavelength from a large area (~100 μm length scale) Si waveguide on a Silicon-On-Insulator (SOI) substrate to free space for high-power laser applications. Our design approach is based on the optimization of local emission angles and the out-coupling intensities. Simulation results show that a focal spot with a 1/e² width of 3.82 μm can be achieved at the desired focal position, with 33% (-4.81 dB) simulated source to free-space focusing efficiency, while initial measurements show an efficiency of 22% (-6.58 dB).

Submitted 7 October, 2020; originally announced October 2020.
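The abstract above quotes each focusing efficiency both as a percentage and in decibels. As a quick arithmetic check, the following sketch applies the standard power-ratio conversion, dB = 10·log10(fraction). The function name `efficiency_to_db` is hypothetical and chosen only for this illustration; it is not taken from the paper.

```python
import math


def efficiency_to_db(fraction):
    """Convert a power efficiency (0 < fraction <= 1) to decibels."""
    if fraction <= 0:
        raise ValueError("efficiency must be positive")
    return 10.0 * math.log10(fraction)


# Efficiencies quoted in the abstract: 33% simulated, 22% measured.
simulated_db = efficiency_to_db(0.33)
measured_db = efficiency_to_db(0.22)

print(f"simulated: {simulated_db:.2f} dB, measured: {measured_db:.2f} dB")
```

Rounded to two decimals, both values agree with the figures in the abstract (-4.81 dB and -6.58 dB).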

arXiv:2009.11362 [pdf, other]

Dense Forecasting of Wildfire Smoke Particulate Matter Using Sparsity Invariant Convolutional Neural Networks

Authors: Renhao Wang, Ashutosh Bhudia, Brandon Dos Remedios, Minnie Teng, Raymond Ng

Abstract: Accurate forecasts of fine particulate matter (PM 2.5) from wildfire smoke are crucial to safeguarding cardiopulmonary public health. Existing forecasting systems are trained on sparse and inaccurate ground truths, and do not take sufficient advantage of important spatial inductive biases. In this work, we present a convolutional neural network which preserves sparsity invariance throughout, and leverages multitask learning to perform dense forecasts of PM 2.5 values. We demonstrate that our model outperforms two existing smoke forecasting systems during the 2018 and 2019 wildfire seasons in British Columbia, Canada, predicting PM 2.5 at a grid resolution of 10 km, 24 hours in advance with high fidelity. Most interestingly, our model also generalizes to meaningful smoke dispersion patterns despite training with irregularly distributed ground truth PM 2.5 values available in only 0.5% of grid cells.

Submitted 23 September, 2020; originally announced September 2020.

Comments: Submitted to the 2020 NeurIPS Workshop on Machine learning in Public Health

arXiv:2009.06764 [pdf, other]

Private data sharing between decentralized users through the privGAN architecture

Authors: Jean-Francois Rajotte, Raymond T Ng

Abstract: More data is almost always beneficial for analysis and machine learning tasks. In many realistic situations, however, an enterprise cannot share its data, either to keep a competitive advantage or to protect the privacy of the data sources, the enterprise's clients for example. We propose a method for data owners to share synthetic or fake versions of their data without sharing the actual data, nor the parameters of models that have direct access to the data. The method proposed is based on the privGAN architecture where local GANs are trained on their respective data subsets with an extra penalty from a central discriminator aiming to discriminate the origin of a given fake sample. We demonstrate that this approach, when applied to subsets of various sizes, leads to better utility for the owners than the utility from their real small datasets. The only shared pieces of information are the parameter updates of the central discriminator. The privacy is demonstrated with white-box attacks on the most vulnerable elements of the architecture and the results are close to random guessing. This method would apply naturally in a federated learning setting.

Submitted 14 September, 2020; originally announced September 2020.

Comments: 6 pages, 9 figures, to be in the proceedings of International Workshop on Privacy and Security in Enterprise Modeling (PriSEM'20)

arXiv:2008.06690 [pdf]

Experimental investigations of acoustic curtains for hospital environment noise mitigations

Authors: Sanjay Kumar, Rui Qin Ng, Heow Pueh Lee

Abstract: The continuous increase of hospital noise levels has become a vital challenge for society. The complex soundscapes in the hospital produce unpleasant noise, which may exceed the prescribed noise level for the patients and healthcare professionals. Previous studies have reported that extended exposure to loud noise may cause auditory and nonauditory disorders in healthcare professionals, medical staff, and patients. Therefore, there is an increased interest in the design and fabrication of effective noise barriers for hospital premises. Herein, we have performed thorough experimental investigations of the acoustical performance of PVC-coated polyester fabrics and 100% pure PVC sheets. The performance of these potential acoustic curtains was found to be superior to that of existing acoustic curtains for hospitals. Also, the results showed that the sound transmission class rating of PVC curtains is much higher than that of existing commercial acoustic curtains.

Submitted 15 August, 2020; originally announced August 2020.

Comments: 16 pages, 8 figures

arXiv:2006.10739 [pdf, other]

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

Authors: Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, Ren Ng

Abstract: We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.

Submitted 18 June, 2020; originally announced June 2020.

Comments: Project page: https://people.eecs.berkeley.edu/~bmild/fourfeat/
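The abstract above describes passing input points through a Fourier feature mapping before an MLP. The following minimal sketch shows one common form of such a mapping, γ(v) = [cos(2πBv), sin(2πBv)], with the rows of B drawn from a zero-mean Gaussian whose standard deviation acts as the tunable bandwidth. The abstract does not spell out the formula, so the exact form, the helper names (`make_frequencies`, `fourier_features`) and the parameter values are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def make_frequencies(num_features, input_dim, scale, seed=0):
    """Sample a num_features x input_dim matrix B with N(0, scale^2) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(input_dim)]
            for _ in range(num_features)]


def fourier_features(point, frequencies):
    """Map a low-dimensional point to [cos(2*pi*B*v), sin(2*pi*B*v)]."""
    projections = [
        2.0 * math.pi * sum(b * x for b, x in zip(row, point))
        for row in frequencies
    ]
    return [math.cos(p) for p in projections] + [math.sin(p) for p in projections]


# Hypothetical example: a 2D coordinate lifted to 8 features (4 cos + 4 sin).
B = make_frequencies(num_features=4, input_dim=2, scale=10.0)
features = fourier_features([0.25, 0.5], B)
print(len(features))
```

In this sketch a larger `scale` emphasizes higher frequencies in the features fed to the network, which is one way to read the "tunable bandwidth" mentioned in the abstract.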

arXiv:2005.08925 [pdf, other]

Portrait Shadow Manipulation

Authors: Xuaner Cecilia Zhang, Jonathan T. Barron, Yun-Ta Tsai, Rohit Pandey, Xiuming Zhang, Ren Ng, David E. Jacobs

Abstract: Casually-taken portrait photographs often suffer from unflattering lighting and shadowing because of suboptimal conditions in the environment. Aesthetic qualities such as the position and softness of shadows and the lighting ratio between the bright and dark parts of the face are frequently determined by the constraints of the environment rather than by the photographer. Professionals address this issue by adding light shaping tools such as scrims, bounce cards, and flashes. In this paper, we present a computational approach that gives casual photographers some of this control, thereby allowing poorly-lit portraits to be relit post-capture in a realistic and easily-controllable way. Our approach relies on a pair of neural networks: one to remove foreign shadows cast by external objects, and another to soften facial shadows cast by the features of the subject and to add a synthetic fill light to improve the lighting ratio. To train our first network we construct a dataset of real-world portraits wherein synthetic foreign shadows are rendered onto the face, and we show that our network learns to remove those unwanted shadows. To train our second network we use a dataset of Light Stage scans of human subjects to construct input/output pairs of input images harshly lit by a small light source, and variably softened and fill-lit output images of each face. We propose a way to explicitly encode facial symmetry and show that our dataset and training procedure enable the model to generalize to images taken in the wild. Together, these networks enable the realistic and aesthetically pleasing enhancement of shadows and lights in real-world portrait images.

Submitted 20 May, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: (updated version); SIGGRAPH 2020; Project webpage: https://people.eecs.berkeley.edu/~cecilia77/project-pages/portrait Video: https://youtu.be/M_qYTXhzyac

arXiv:2005.01322 [pdf, other]

Building Proactive Voice Assistants: When and How (not) to Interact

Authors: O. Miksik, I. Munasinghe, J. Asensio-Cubero, S. Reddy Bethi, S-T. Huang, S. Zylfo, X. Liu, T. Nica, A. Mitrocsak, S. Mezza, R. Beard, R. Shi, R. Ng, P. Mediano, Z. Fountas, S-H. Lee, J. Medvesek, H. Zhuang, Y. Rogers, P. Swietojanski

Abstract: Voice assistants have recently achieved remarkable commercial success. However, the current generation of these devices is typically capable of only reactive interactions. In other words, interactions have to be initiated by the user, which somewhat limits their usability and user experience. We propose that the next generation of such devices should be able to proactively provide the right information in the right way at the right time, without being prompted by the user. However, achieving this is not straightforward, since there is the danger it could interrupt what the user is doing too much, resulting in it being distracting or even annoying. Furthermore, it could unwittingly reveal sensitive/private information to third parties. In this report, we discuss the challenges of developing proactively initiated interactions, and suggest a framework for when it is appropriate for the device to intervene. To validate our design assumptions, we describe, firstly, how we built a functioning prototype and, secondly, a user study that was conducted to assess users' reactions and reflections when in the presence of a proactive voice assistant. This pre-print summarises the state, ideas and progress towards a proactive device as of autumn 2018.

Submitted 4 May, 2020; originally announced May 2020.

Comments: 17 pages, technical report

arXiv:2004.05264 [pdf]

Pseudo-real-time retinal layer segmentation for high-resolution adaptive optics optical coherence tomography

Authors: Worawee Janpongsri, Joey Huang, Ringo Ng, Daniel J. Wahl, Marinko V. Sarunic, Yifan Jian

Abstract: We present a pseudo-real-time retinal layer segmentation for high-resolution Sensorless Adaptive Optics-Optical Coherence Tomography (SAO-OCT). Our pseudo-real-time segmentation method is based on Dijkstra's algorithm that uses the intensity of pixels and the vertical gradient of the image to find the minimum cost in a geometric graph formulation within a limited search region. It segments six retinal layer boundaries in an iterative process according to their order of prominence. The segmentation time is strongly correlated to the number of retinal layers to be segmented. Our program permits en face images to be extracted during data acquisition to guide the depth specific focus control and depth dependent aberration correction for high-resolution SAO-OCT systems. The average processing times for our entire pipeline for segmenting six layers in a retinal B-scan of 496x400 pixels and 240x400 pixels are around 25.60 ms and 13.76 ms, respectively. When reducing the number of layers segmented to only two layers, the time required for a 240x400 pixel image is 8.26 ms.

Submitted 10 April, 2020; originally announced April 2020.

arXiv:2003.08934 [pdf, other]

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Authors: Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng

Abstract: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(θ, φ)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.

Submitted 3 August, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

Comments: ECCV 2020 (oral). Project page with videos and code: http://tancik.com/nerf
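The abstract above says that colors and densities sampled along camera rays are projected into an image with classic volume rendering. The following sketch shows the usual discrete quadrature for a single ray: each sample contributes its color weighted by T_i · (1 − exp(−σ_i · δ_i)), where T_i is the transmittance accumulated from the earlier samples. The function name `composite_ray` and the sample values are hypothetical, chosen for illustration only; this is a sketch of the general technique under those assumptions, not the authors' code.

```python
import math


def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one ray, ordered front to back.

    densities: volume density sigma_i at each sample
    colors:    RGB triple emitted at each sample
    deltas:    distance between adjacent samples
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, weights


# Hypothetical ray: empty space, then a dense red sample that hides a green one.
pixel, weights = composite_ray(
    densities=[0.0, 50.0, 50.0],
    colors=[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    deltas=[1.0, 1.0, 1.0],
)
print(pixel)
```

Every step is a smooth function of the densities and colors, which is consistent with the abstract's remark that volume rendering is naturally differentiable.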

arXiv:1905.13221 [pdf, other]

Video from Stills: Lensless Imaging with Rolling Shutter

Authors: Nick Antipa, Patrick Oare, Emrah Bostan, Ren Ng, Laura Waller

Abstract: Because image sensor chips have a finite bandwidth with which to read out pixels, recording video typically requires a trade-off between frame rate and pixel count. Compressed sensing techniques can circumvent this trade-off by assuming that the image is compressible. Here, we propose using multiplexing optics to spatially compress the scene, enabling information about the whole scene to be sampled from a row of sensor pixels, which can be read off quickly via a rolling shutter CMOS sensor. Conveniently, such multiplexing can be achieved with a simple lensless, diffuser-based imaging system. Using sparse recovery methods, we are able to recover 140 video frames at over 4,500 frames per second, all from a single captured image with a rolling shutter sensor. Our proof-of-concept system uses easily-fabricated diffusers paired with an off-the-shelf sensor. The resulting prototype enables compressive encoding of high frame rate video into a single rolling shutter exposure, and exceeds the sampling-limited performance of an equivalent global shutter system for sufficiently sparse objects.

Submitted 30 May, 2019; originally announced May 2019.

Comments: 8 pages, 7 figures, IEEE International Conference on Computational Photography 2019, Tokyo

arXiv:1905.06326 [pdf, other]

Synthetic Defocus and Look-Ahead Autofocus for Casual Videography

Authors: Xuaner Zhang, Kevin Matzen, Vivien Nguyen, Dillon Yao, You Zhang, Ren Ng

Abstract: In cinema, large camera lenses create beautiful shallow depth of field (DOF), but make focusing difficult and expensive. Accurate cinema focus usually relies on a script and a person to control focus in realtime. Casual videographers often crave cinematic focus, but fail to achieve it. We either sacrifice shallow DOF, as in smartphone videos; or we struggle to deliver accurate focus, as in videos from larger cameras. This paper is about a new approach in the pursuit of cinematic focus for casual videography. We present a system that synthetically renders refocusable video from a deep DOF video shot with a smartphone, and analyzes future video frames to deliver context-aware autofocus for the current frame. To create refocusable video, we extend recent machine learning methods designed for still photography, contributing a new dataset for machine training, a rendering model better suited to cinema focus, and a filtering solution for temporal coherence. To choose focus accurately for each frame, we demonstrate autofocus that looks at upcoming video frames and applies AI-assist modules such as motion, face, audio and saliency detection. We also show that autofocus benefits from machine learning and a large-scale video dataset with focus annotation, where we use our RVR-LAAF GUI to create this sizable dataset efficiently. We deliver, for example, a shallow DOF video where the autofocus transitions onto each person before she begins to speak. This is impossible for conventional camera autofocus because it would require seeing into the future.

Submitted 21 May, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: (V2 author name corrected) SIGGRAPH 2019; project website: https://ceciliavision.github.io/vid-auto-focus/

arXiv:1905.05913 [pdf]

doi 10.1038/s41467-019-12360-w

Structural Color 3D Printing By Shrinking Photonic Crystals

Authors: Yejing Liu, Hao Wang, Jinfa Ho, Ryan C. Ng, Ray J. H. Ng, Valerian H. Hall-Chen, Eleen H. H. Koay, Zhaogang Dong, Hailong Liu, Cheng-Wei Qiu, Julia R. Greer, Joel K. W. Yang

Abstract: The rings, spots and stripes found on some butterflies, Pachyrhynchus weevils, and many chameleons are notable examples of natural organisms employing photonic crystals to produce colorful patterns. Despite advances in nanotechnology, we still lack the ability to print arbitrary colors and shapes in all three dimensions at this microscopic length scale. Commercial nanoscale 3D printers based on two-photon polymerization are incapable of patterning photonic crystal structures with the requisite ~300 nm lattice constant to achieve photonic stopbands/bandgaps in the visible spectrum and generate colors. Here, we introduce a means to produce 3D-printed photonic crystals with a 5x reduction in lattice constants (periodicity as small as 280 nm), achieving sub-100-nm features with a full range of colors. The reliability of this process enables us to engineer the bandstructures of woodpile photonic crystals that match experiments, showing that observed colors can be attributed to either slow light modes or stopbands. With these lattice structures as 3D color volumetric elements (voxels), we printed 3D microscopic scale objects, including the first multi-color microscopic model of the Eiffel Tower, measuring only 39 microns tall with a color pixel size of 1.45 microns. The technology to print 3D structures in color at the microscopic scale promises the direct patterning and integration of spectrally selective devices, such as photonic crystal-based color filters, onto free-form optical elements and curved surfaces.

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1905.05169 [pdf, other]

Zoom To Learn, Learn To Zoom

Authors: Xuaner Cecilia Zhang, Qifeng Chen, Ren Ng, Vladlen Koltun

Abstract: This paper shows that when applying machine learning to digital zoom for photography, it is beneficial to use real, RAW sensor data for training. Existing learning-based super-resolution methods do not use real sensor data, instead operating on RGB images. In practice, these approaches result in loss of detail and accuracy in their digitally zoomed output when zooming in on distant image regions. We also show that synthesizing sensor data by resampling high-resolution RGB images is an oversimplified approximation of real sensor data and noise, resulting in worse image quality. The key barrier to using real sensor data for training is that ground truth high-resolution imagery is missing. We show how to obtain the ground-truth data with optically zoomed images and contribute a dataset, SR-RAW, for real-world computational zoom. We use SR-RAW to train a deep network with a novel contextual bilateral loss (CoBi) that delivers critical robustness to mild misalignment in input-output image pairs. The trained network achieves state-of-the-art performance in 4X and 8X computational zoom.

Submitted 13 May, 2019; originally announced May 2019.

Comments: CVPR 2019, https://ceciliavision.github.io/project-pages/project-zoom.html (paper, video, supp, code, dataset)

arXiv:1905.00889 [pdf, other]

Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines

Authors: Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, Abhishek Kar

Abstract: We present a practical and robust deep learning solution for capturing and rendering novel views of complex real world scenes for virtual exploration. Previous approaches either require intractably dense view sampling or provide little to no guidance for how users should sample views of a scene to reliably render high-quality novel views. Instead, we propose an algorithm for view synthesis from an irregular grid of sampled views that first expands each sampled view into a local light field via a multiplane image (MPI) scene representation, then renders novel views by blending adjacent local light fields. We extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. In practice, we apply this bound to capture and render views of real world scenes that achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views. We demonstrate our approach's practicality with an augmented reality smartphone app that guides users to capture input images of a scene and viewers that enable realtime virtual exploration on desktop and mobile platforms.

Submitted 2 May, 2019; originally announced May 2019.

Comments: SIGGRAPH 2019. Project page with video and code: http://people.eecs.berkeley.edu/~bmild/llff/

Showing 1–50 of 72 results for author: Ng, R