Search | arXiv e-print repository

Helical Phononic Modes Induced by a Screw Dislocation

Authors: Yun Zhou, Robert Davis, Li Chen, Erda Wen, Prabhakar Bandaru, Daniel Sievenpiper

Abstract: In this study, we investigate a one-dimensional (1D) unidirectional phononic waveguide embedded within a three-dimensional (3D) hexagonal close-packed phononic crystal, achieved by the introduction of a screw dislocation. This approach does not rely on the non-trivial topological characteristics of the 3D crystal. We discover that this dislocation induces a pair of helical modes, characterized by their orthogonal $x$- and $y$-directional displacements being out of phase by 90 degrees, which results in a distinctive rotational motion. These helical modes demonstrate directional propagation, tightly linked to the helicity of the screw dislocation. Through considerations of symmetry, we reveal that the emergence of these helical modes is governed by the symmetry of the screw dislocation itself. Our findings not only provide insights into the interplay between dislocation-induced symmetry and wave propagation in phononic systems but also open new avenues for designing directionally selective waveguides without relying on the crystal's topological properties.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 13 pages, 4 figures
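
The abstract above describes helical modes whose $x$- and $y$-displacements are 90 degrees out of phase, giving a rotational motion. As a minimal sketch of that statement only (not the authors' simulation), the Python below builds two unit-amplitude displacement components with a quarter-period offset and checks that the trajectory is circular, with the sense of rotation set by the sign of the phase offset. The function name `handedness` and the sampled signals are illustrative assumptions.

```python
import numpy as np

def handedness(ux, uy):
    """Return +1 for counterclockwise and -1 for clockwise rotation of (ux, uy)."""
    # z-component of u x du, summed along the trajectory:
    # positive when the displacement vector rotates counterclockwise.
    lz = ux[:-1] * np.diff(uy) - uy[:-1] * np.diff(ux)
    return int(np.sign(lz.sum()))

# One period of a harmonic displacement, sampled uniformly (illustrative values).
t = np.linspace(0.0, 2.0 * np.pi, 400)
ux = np.cos(t)
uy_ccw = np.cos(t - np.pi / 2)  # y lags x by 90 degrees, equal to sin(t)
uy_cw = np.cos(t + np.pi / 2)   # y leads x by 90 degrees, equal to -sin(t)

# Equal amplitudes and a 90-degree offset give a circle of constant radius.
radius = np.hypot(ux, uy_ccw)
print(radius.min(), radius.max())
print(handedness(ux, uy_ccw), handedness(ux, uy_cw))
```

Flipping the sign of the phase offset flips the sense of rotation, which is the kind of handedness the abstract links to the helicity of the dislocation.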

arXiv:2403.02545 [pdf, other]

Wukong: Towards a Scaling Law for Large-Scale Recommendation

Authors: Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen

Abstract: Scaling laws play an instrumental role in the sustainable improvement in model quality. Unfortunately, recommendation models to date do not exhibit such laws similar to those observed in the domain of large language models, due to the inefficiencies of their upscaling mechanisms. This limitation poses significant challenges in adapting these models to increasingly more complex real-world datasets. In this paper, we propose an effective network architecture based purely on stacked factorization machines, and a synergistic upscaling strategy, collectively dubbed Wukong, to establish a scaling law in the domain of recommendation. Wukong's unique design makes it possible to capture diverse, any-order of interactions simply through taller and wider layers. We conducted extensive evaluations on six public datasets, and our results demonstrate that Wukong consistently outperforms state-of-the-art models quality-wise. Further, we assessed Wukong's scalability on an internal, large-scale dataset. The results show that Wukong retains its superiority in quality over state-of-the-art models, while holding the scaling law across two orders of magnitude in model complexity, extending beyond 100 GFLOP/example, where prior arts fall short.

Submitted 4 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 12 pages
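
The abstract above builds on factorization machines. For orientation, here is a hedged sketch of the classical second-order factorization machine interaction term, computed with the usual linear-time identity and checked against the brute-force double sum. This is the textbook building block, not Wukong's layer design, which the abstract does not specify; the function names and the random test data are illustrative assumptions.

```python
import numpy as np

def fm_pairwise(x, V):
    """Second-order FM term sum_{i<j} <V[i], V[j]> x[i] x[j] in O(n*k) time."""
    xv = x @ V                   # per-factor weighted sums, shape (k,)
    x2v2 = (x ** 2) @ (V ** 2)   # per-factor sums of squares, shape (k,)
    # (sum_i a_i)^2 - sum_i a_i^2 equals twice the sum over pairs i<j of a_i a_j.
    return 0.5 * float(np.sum(xv ** 2 - x2v2))

def fm_pairwise_naive(x, V):
    """Same quantity via the explicit O(n^2 * k) double loop, for checking."""
    n = len(x)
    return float(sum((V[i] @ V[j]) * x[i] * x[j]
                     for i in range(n) for j in range(i + 1, n)))

rng = np.random.default_rng(0)
x = rng.normal(size=6)        # feature vector (illustrative)
V = rng.normal(size=(6, 3))   # one k=3 embedding per feature (illustrative)
print(fm_pairwise(x, V), fm_pairwise_naive(x, V))
```

The two printed values agree up to floating-point rounding; the fast form is what makes pairwise interactions affordable when the number of features is large.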

arXiv:2403.00877 [pdf, other]

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global embedding lookup process into disjoint towers to exploit data center locality; (2) Tower Module (TM), a synergistic dense component attached to each tower to reduce model complexity and communication volume through hierarchical feature interaction; and (3) Tower Partitioner (TP), a feature partitioner to systematically create towers with meaningful feature interactions and load balanced assignments to preserve model quality and training throughput via learned embeddings. We show that DMT can achieve up to 1.9x speedup compared to the state-of-the-art baselines without losing accuracy across multiple generations of hardware at large data center scales.

Submitted 2 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2308.02511 [pdf, other]

Multifunctional Metasurface: Simultaneous Beam Steering, Polarization Conversion and Phase Offset

Authors: Xiaozhen Yang, Erda Wen, Dinesh Bharadia, Daniel F. Sievenpiper

Abstract: A varactor-based reconfigurable multifunctional metasurface capable of simultaneous beam steering, polarization conversion and phase offset is proposed in this paper. The unit cell is designed to naturally decompose the incident waves into two equal amplitude orthogonal linear components, and by integrating varactors, the reflection phase of the field components can be engineered from $-180^{\circ}$ to $180^{\circ}$. Taking advantage of the infinite states of the varactors, this design integrates a new function, the phase offset. After simulation validation of its capability, a four-layer $7$ by $6$ unit one-dimensional prototype is fabricated as a printed circuit board. It is experimentally demonstrated that it switches between X/Y and circular polarization with more than $10$ dB cross polarization isolation, while reaching $\pm45^{\circ}$ steering and $\pm180^{\circ}$ phase offset.

Submitted 27 July, 2023; originally announced August 2023.

arXiv:2307.11096 [pdf, other]

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

Authors: Xuewei Wang, Qiang Jin, Shengyu Huang, Min Zhang, Xi Liu, Zhengli Zhao, Yukun Chen, Zhengyu Zhang, Jiyan Yang, Ellie Wen, Sagar Chordia, Wenlin Chen, Qin Huang

Abstract: Dividing ads ranking system into retrieval, early, and final stages is a common practice in large scale ads recommendation to balance the efficiency and accuracy. The early stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final stage ranking system to produce the final ads recommendation. As the early and final stage ranking use different features and model architectures because of system constraints, a serious ranking consistency issue arises where the early stage has a low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e. ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we can not only achieve serving cost saving from the model consolidation, but also improve the ads recall and ranking consistency. In the online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), total value and better ads-quality (e.g. reduced ads cross-out rate) in a large scale industrial ads ranking system.

Submitted 12 July, 2023; originally announced July 2023.

Comments: Accepted by AdKDD 23

arXiv:2306.03381 [pdf, other]

VR.net: A Real-world Dataset for Virtual Reality Motion Sickness Research

Authors: Elliott Wen, Chitralekha Gupta, Prasanth Sasikumar, Mark Billinghurst, James Wilmott, Emily Skow, Arindam Dey, Suranga Nanayakkara

Abstract: Researchers have used machine learning approaches to identify motion sickness in VR experience. These approaches demand an accurately-labeled, real-world, and diverse dataset for high accuracy and generalizability. As a starting point to address this need, we introduce `VR.net', a dataset offering approximately 12-hour gameplay videos from ten real-world games in 10 diverse genres. For each video frame, a rich set of motion sickness-related labels, such as camera/object movement, depth field, and motion flow, are accurately assigned. Building such a dataset is challenging since manual labeling would require an infeasible amount of time. Instead, we utilize a tool to automatically and precisely extract ground truth data from 3D engines' rendering pipelines without accessing VR games' source code. We illustrate the utility of VR.net through several applications, such as risk factor detection and sickness level prediction. We continuously expand VR.net and envision its next version offering 10X more data than the current form. We believe that the scale, accuracy, and diversity of VR.net can offer unparalleled opportunities for VR motion sickness research and beyond.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.11899 [pdf, other]

Real-data-driven Real-time Reconfigurable Microwave Reflective Surface

Authors: Erda Wen, Xiaozhen Yang, Daniel F. Sievenpiper

Abstract: Manipulating the electromagnetic (EM) reflection behavior from an arbitrary surface dynamically on arbitrary design goals is an ultimate ambition for many EM stealth and communication problems, yet it is nearly impossible to accomplish with conventional analysis and optimization techniques. In this paper we present a reconfigurable conformal metasurface prototype as well as a workflow that enables it to respond to multiple design targets on the reflection pattern with extremely low on-site computing power and time. The metasurface is driven by a sequential tandem neural network which is pre-trained using actual experimental data, avoiding any possible errors that may arise from calculation, simulation or manufacturing tolerances. This platform empowers the surface to operate accurately in a complex environment including varying incident angle and operating frequency, or even with other scatterers present close to the surface. The proposed data-driven approach requires minimum amount of prior knowledge and human effort yet provides maximized versatility on the reflection control, stepping towards the end form of an artificial-intelligence-based tunable EM surface.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.02207 [pdf, other]

All-passive Microwave-Diode Nonreciprocal Metasurface

Authors: Xiaozhen Yang, Erda Wen, Daniel F. Sievenpiper

Abstract: Breaking reciprocity in the microwave frequency range will have important implications for modern electronic systems. Since it usually involves bulky biasing magnets or complex spatial-temporal modulations, exploring a lightweight, all-passive approach becomes intriguing. Starting from a circuit model, we theoretically demonstrate the nonreciprocal behaviour on a transmission line building block creating a strong field asymmetry with a switchable matching stub to enable two distinct working states. After translating to an electromagnetic model, this concept is first proved by simulation and then experimentally verified on a microstrip-line-based diode-integrated metasurface showing nonreciprocal transmission. This printed circuit board design is expected to find various applications in electromagnetic protecting layers, communication systems, microwave isolators and circulators.

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2304.03450 [pdf, other]

Striving for Authentic and Sustained Technology Use In the Classroom: Lessons Learned from a Longitudinal Evaluation of a Sensor-based Science Education Platform

Authors: Yvonne Chua, Sankha Cooray, Juan Pablo Forero Cortes, Paul Denny, Sonia Dupuch, Dawn L Garbett, Alaeddin Nassani, Jiashuo Cao, Hannah Qiao, Andrew Reis, Deviana Reis, Philipp M. Scholl, Priyashri Kamlesh Sridhar, Hussel Suriyaarachchi, Fiona Taimana, Vanessa Tanga, Chamod Weerasinghe, Elliott Wen, Michelle Wu, Qin Wu, Haimo Zhang, Suranga Nanayakkara

Abstract: Technology integration in educational settings has led to the development of novel sensor-based tools that enable students to measure and interact with their environment. Although reports from using such tools can be positive, evaluations are often conducted under controlled conditions and short timeframes. There is a need for longitudinal data collected in realistic classroom settings. However, sustained and authentic classroom use requires technology platforms to be seen by teachers as both easy to use and of value. We describe our development of a sensor-based platform to support science teaching that followed a 14-month user-centered design process. We share insights from this design and development approach, and report findings from a 6-month large-scale evaluation involving 35 schools and 1245 students. We share lessons learnt, including that technology integration is not an educational goal per se and that technology should be a transparent tool to enable students to achieve their learning goals.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2302.08446 [pdf]

Engineering Robust Metallic Zero-Mode States in Olympicene Graphene Nanoribbons

Authors: Ryan D. McCurdy, Aidan Delgado, Jingwei Jiang, Junmian Zhu, Ethan Chi Ho Wen, Raymond E. Blackwell, Gregory C. Veber, Shenkai Wang, Steven G. Louie, Felix R. Fischer

Abstract: Metallic graphene nanoribbons (GNRs) represent a critical component in the toolbox of low-dimensional functional materials technology serving as 1D interconnects capable of both electronic and quantum information transport. The structural constraints imposed by on-surface bottom-up GNR synthesis protocols along with the limited control over orientation and sequence of asymmetric monomer building blocks during the radical step-growth polymerization has plagued the design and assembly of metallic GNRs. Here we report the regioregular synthesis of GNRs hosting robust metallic states by embedding a symmetric zero-mode superlattice along the backbone of a GNR. Tight-binding electronic structure models predict a strong nearest-neighbor electron hopping interaction between adjacent zero-mode states resulting in a dispersive metallic band. First principles DFT-LDA calculations confirm this prediction and the robust, metallic zero-mode band of olympicene GNRs (oGNRs) is experimentally corroborated by scanning tunneling spectroscopy.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 8 pages, 4 figures

arXiv:2210.02627 [pdf, other]

Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering

Authors: Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, Rajib Rana, Suranga Nanayakkara

Abstract: Retrieval Augment Generation (RAG) is a recent advancement in Open-Domain Question Answering (ODQA). RAG has only been trained and explored with a Wikipedia-based external knowledge base and is not optimized for use in other specialized domains such as healthcare and news. In this paper, we evaluate the impact of joint training of the retriever and generator components of RAG for the task of domain adaptation in ODQA. We propose \textit{RAG-end2end}, an extension to RAG, that can adapt to a domain-specific knowledge base by updating all components of the external knowledge base during training. In addition, we introduce an auxiliary training signal to inject more domain-specific knowledge. This auxiliary signal forces \textit{RAG-end2end} to reconstruct a given sentence by accessing the relevant information from the external knowledge base. Our novel contribution is that, unlike RAG, RAG-end2end does joint training of the retriever and generator for the end QA task and domain adaptation. We evaluate our approach with datasets from three domains: COVID-19, News, and Conversations, and achieve significant performance improvements compared to the original RAG model. Our work has been open-sourced through the Huggingface Transformers library, attesting to our work's credibility and technical consistency.

Submitted 5 October, 2022; originally announced October 2022.

Comments: This paper is awaiting publication at Transactions of the Association for Computational Linguistics. This is a pre-MIT Press publication version. For associated huggingface transformers code, see https://github.com/huggingface/transformers/tree/main/examples/research_projects/rag-end2end-retriever

arXiv:2205.07333 [pdf, other]

Trucks Don't Mean Trump: Diagnosing Human Error in Image Analysis

Authors: J. D. Zamfirescu-Pereira, Jerry Chen, Emily Wen, Allison Koenecke, Nikhil Garg, Emma Pierson

Abstract: Algorithms provide powerful tools for detecting and dissecting human bias and error. Here, we develop machine learning methods to analyze how humans err in a particular high-stakes task: image interpretation. We leverage a unique dataset of 16,135,392 human predictions of whether a neighborhood voted for Donald Trump or Joe Biden in the 2020 US election, based on a Google Street View image. We show that by training a machine learning estimator of the Bayes optimal decision for each image, we can provide an actionable decomposition of human error into bias, variance, and noise terms, and further identify specific features (like pickup trucks) which lead humans astray. Our methods can be applied to ensure that human-in-the-loop decision-making is accurate and fair and are also applicable to black-box algorithmic systems.

Submitted 15 May, 2022; originally announced May 2022.

Comments: To be published in FAccT 2022

arXiv:2203.11014 [pdf, other]

DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction

Authors: Buyun Zhang, Liang Luo, Xi Liu, Jay Li, Zeliang Chen, Weilin Zhang, Xiaohan Wei, Yuchen Hao, Michael Tsang, Wenjun Wang, Yang Liu, Huayu Li, Yasmine Badr, Jongsoo Park, Jiyan Yang, Dheevatsa Mudigere, Ellie Wen

Abstract: Learning feature interactions is important to the model performance of online advertising services. As a result, extensive efforts have been devoted to designing effective architectures to learn feature interactions. However, we observe that the practical performance of those designs can vary from dataset to dataset, even when the order of interactions claimed to be captured is the same. That indicates different designs may have different advantages and the interactions captured by them have non-overlapping information. Motivated by this observation, we propose DHEN - a deep and hierarchical ensemble architecture that can leverage strengths of heterogeneous interaction modules and learn a hierarchy of the interactions under different orders. To overcome the challenge brought by DHEN's deeper and multi-layer structure in training, we propose a novel co-designed training system that can further improve the training efficiency of DHEN. Experiments of DHEN on large-scale dataset from CTR prediction tasks attained 0.27\% improvement on the Normalized Entropy (NE) of prediction and 1.2x better training throughput than state-of-the-art baseline, demonstrating their effectiveness in practice.

Submitted 11 March, 2022; originally announced March 2022.
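
The abstract above reports its quality gain in Normalized Entropy (NE). As a hedged illustration, NE is commonly defined in the click-through-rate literature as the average log loss divided by the entropy of the background click rate, so a model that always predicts the background rate scores exactly 1 and lower is better. The sketch below assumes that common definition; the abstract itself does not state the formula, and the function name and toy data are illustrative.

```python
import math

def normalized_entropy(labels, preds, eps=1e-12):
    """Average log loss divided by the entropy of the background positive rate.

    Assumes labels are 0/1 and contain at least one of each class.
    """
    n = len(labels)
    # Average binary cross-entropy of the predictions, clipped away from log(0).
    log_loss = -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for y, p in zip(labels, preds)
    ) / n
    # Entropy of always predicting the empirical positive rate.
    rate = sum(labels) / n
    baseline = -(rate * math.log(rate) + (1.0 - rate) * math.log(1.0 - rate))
    return log_loss / baseline

labels = [1, 0, 0, 0]
ne_background = normalized_entropy(labels, [0.25] * 4)           # predicts the base rate
ne_informed = normalized_entropy(labels, [0.9, 0.1, 0.1, 0.1])   # sharper predictions
print(ne_background, ne_informed)
```

Because the metric is normalized by the base-rate entropy, it is less sensitive to how rare positives are than raw log loss, which is one reason small relative changes in NE are reported as meaningful.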

arXiv:2201.05907 [pdf, ps, other]

Designing Topological Defect Lines Protected by Gauge-dependent Symmetry Indicators

Authors: Erda Wen, Dia'aaldin J. Bisharat, Robert J. Davis, Xiaozhen Yang, Daniel F. Sievenpiper

Abstract: Symmetry indicators are a modern tool for characterizing topological phases that require only minimal computational expense but provide an elegant means of designing practical devices. This paper demonstrates how a rotational symmetry indicator can be used to construct and characterize a topologically robust waveguide, which is then verified experimentally on a printed circuit board (PCB) platform. The design takes advantage of the real-space gauge-dependency of the symmetry indicators and adopts a $C_6$ lattice with simple shifts, forming a defect line supporting topological edge modes. It is shown that the modes can realize the same features as previous topological waveguides, but in addition possess a greater degree of reconfigurability and the unique ability to form a one-way termination. Moreover, the design illustrates the critical role real space information plays in determining the topological properties of photonic crystals, enabling a wider range of possible realizations.

Submitted 15 January, 2022; originally announced January 2022.

arXiv:2111.04502 [pdf, other]

doi 10.1109/LAWP.2022.3158590

Power-dependent Reflective Metasurface with Self-induced Bandgap

Authors: Xiaozhen Yang, Erda Wen, Daniel F. Sievenpiper

Abstract: A metallic ring based, diode-integrated, low-profile, power-dependent, reflective metasurface working from 3 GHz to 3.6 GHz is proposed in this letter. Unlike the previous study which shifts a band up and down to change the impedance of the surface, the triggering of the diodes directly transforms the structure from a surface wave supportive state to a self-induced bandgap topology if exposed to high power RF illumination. We demonstrate the concept by conducting the EM-circuit co-simulation and measurements for a 6 by 8 unit 2D prototype. Near field scan experiments verify that the proposed topology works in two distinct states, the ON and OFF state, and high-power measurements prove that the reflection varies with the incident signal power. The highest 10 dB decrement in transmission occurs at 3.3 GHz with 52 dBm illumination. This structure can be used to protect sensitive devices from large signals while otherwise supporting a communication channel for small signals.

Submitted 4 November, 2021; originally announced November 2021.

arXiv:2110.12843 [pdf, other]

Broadband time-modulated absorber beyond the Bode-Fano limit by energy trapping

Authors: Xiaozhen Yang, Erda Wen, Daniel F. Sievenpiper

Abstract: Wide-band absorption is a popular topic in microwave engineering to protect sensitive devices against broadband sources. However, the Bode-Fano criterion defines the trade-off between bandwidth and efficiency for all passive, linear, time-invariant systems. In this letter, we propose a broadband absorber beyond the Bode-Fano limit by creating an energy trap using time-modulated switch/diodes. This work starts with an ideal circuit model to prove the concept, followed by two EM realizations - a frequency selective surface (FSS) approach for general bandwidth broadening and a low-profile PCB design. The prototype of the latter is built and measured, demonstrating a Bode-Fano integral larger than one. This approach paves a way to many practical ultra-wide band absorber designs.

Submitted 4 October, 2021; originally announced October 2021.

arXiv:2109.02436 [pdf, other]

ReLaX: Retinal Layer Attribution for Guided Explanations of Automated Optical Coherence Tomography Classification

Authors: Evan Wen, Rebecca Sorenson, Max Ehrlich

Abstract: 30 million Optical Coherence Tomography (OCT) imaging tests are issued annually to diagnose various retinal diseases, but accurate diagnosis of OCT scans requires trained eye care professionals who are still prone to making errors. With better systems for diagnosis, many cases of vision loss caused by retinal disease could be entirely avoided. In this work, we present ReLaX, a novel deep learning framework for explainable, accurate classification of retinal pathologies which achieves state-of-the-art accuracy. Furthermore, we emphasize producing both qualitative and quantitative explanations of the model's decisions. While previous works use pixel-level attribution methods for generating model explanations, our work uses a novel retinal layer attribution method for producing rich qualitative and quantitative model explanations. ReLaX determines the importance of each retinal layer by combining heatmaps with an OCT segmentation model. Our work is the first to produce detailed quantitative explanations of a model's predictions in this way. The combination of accuracy and interpretability can be clinically applied for accessible, high-quality patient care.

Submitted 1 October, 2022; v1 submitted 3 September, 2021; originally announced September 2021.

Comments: ECCV 2022 Medical Computer Vision Workshop

arXiv:2106.11517 [pdf, ps, other]

Fine-tune the Entire RAG Architecture (including DPR retriever) for Question-Answering

Authors: Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Suranga Nanayakkara

Abstract: In this paper, we illustrate how to fine-tune the entire Retrieval Augment Generation (RAG) architecture in an end-to-end manner. We highlighted the main engineering challenges that needed to be addressed to achieve this objective. We also compare how end-to-end RAG architecture outperforms the original RAG architecture for the task of question answering. We have open-sourced our implementation in the HuggingFace Transformers library.

Submitted 21 June, 2021; originally announced June 2021.

Comments: for associated code, see https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever

arXiv:2105.12676 [pdf, other]

Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

Authors: Zhaoxia Deng, Jongsoo Park, Ping Tak Peter Tang, Haixin Liu, Jie Yang, Hector Yuen, Jianyu Huang, Daya Khudia, Xiaohan Wei, Ellie Wen, Dhruv Choudhary, Raghuraman Krishnamoorthi, Carole-Jean Wu, Satish Nadathur, Changkyu Kim, Maxim Naumov, Sam Naghshineh, Mikhail Smelyanskiy

Abstract: Tremendous success of machine learning (ML) and the unabated growth in ML model complexity motivated many ML-specific designs in both CPU and accelerator architectures to speed up the model inference. While these architectures are diverse, highly optimized low-precision arithmetic is a component shared by most. Impressive compute throughputs are indeed often exhibited by these architectures on benchmark ML models. Nevertheless, production models such as recommendation systems important to Facebook's personalization services are demanding and complex: These systems must serve billions of users per month responsively with low latency while maintaining high prediction accuracy, notwithstanding computations with many tens of billions parameters per inference. Do these low-precision architectures work well with our production recommendation systems? They do. But not without significant effort. We share in this paper our search strategies to adapt reference recommendation models to low-precision hardware, our optimization of low-precision compute kernels, and the design and development of tool chain so as to maintain our models' accuracy throughout their lifespan during which topic trends and users' interests inevitably evolve. Practicing these low-precision technologies helped us save datacenter capacities while deploying models with up to 5X complexity that would otherwise not be deployed on traditional general-purpose CPUs. We believe these lessons from the trenches promote better co-design between hardware architecture and software engineering and advance the state of the art of ML in industry.

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2104.05158 [pdf, other]

Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models

Authors: Dheevatsa Mudigere, Yuchen Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, Srinivas Sridharan, Xing Liu, Mustafa Ozdal, Jade Nie, Jongsoo Park, Liang Luo, Jie Amy Yang, Leon Gao, Dmytro Ivchenko, Aarti Basant, Yuxi Hu, Jiyan Yang, Ehsan K. Ardestani, Xiaodong Wang, Rakesh Komuravelli, Ching-Hsiang Chu, Serhat Yilmaz, Huayu Li, Jiyuan Qian, Zhuobo Feng, et al. (28 additional authors not shown)

Abstract: Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 Trillion parameters and show that we can attain 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with dedicated scale-out network, provisioned with high bandwidth, optimal topology and efficient transport (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along row, column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates (v) leveraging reduced precision communications, multi-level memory hierarchy (HBM+DDR+SSD) and pipelining. Furthermore, we develop and briefly comment on distributed data ingestion and other supporting services that are required for the robust and efficient end-to-end training in production environments.

Submitted 26 February, 2023; v1 submitted 11 April, 2021; originally announced April 2021.

arXiv:2102.10484 [pdf, other]

CheXseg: Combining Expert Annotations with DNN-generated Saliency Maps for X-ray Segmentation

Authors: Soham Gadgil, Mark Endo, Emily Wen, Andrew Y. Ng, Pranav Rajpurkar

Abstract: Medical image segmentation models are typically supervised by expert annotations at the pixel-level, which can be expensive to acquire. In this work, we propose a method that combines the high quality of pixel-level expert annotations with the scale of coarse DNN-generated saliency maps for training multi-label semantic segmentation models. We demonstrate the application of our semi-supervised method, which we call CheXseg, on multi-label chest X-ray interpretation. We find that CheXseg improves upon the performance (mIoU) of fully-supervised methods that use only pixel-level expert annotations by 9.7% and weakly-supervised methods that use only DNN-generated saliency maps by 73.1%. Our best method is able to match radiologist agreement on three out of ten pathologies and reduces the overall performance gap by 57.2% as compared to weakly-supervised methods.

Submitted 17 May, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

Comments: Accepted to Medical Imaging with Deep Learning (MIDL) Conference 2021

arXiv:2010.08655 [pdf, other]

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data

Authors: Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, Arun Kejariwal

Abstract: Large scale deep learning provides a tremendous opportunity to improve the quality of content recommendation systems by employing both wider and deeper models, but this comes at great infrastructural cost and carbon footprint in modern data centers. Pruning is an effective technique that reduces both memory and compute demand for model inference. However, pruning for online recommendation systems is challenging due to the continuous data distribution shift (a.k.a non-stationary data). Although incremental training on the full model is able to adapt to the non-stationary data, directly applying it on the pruned model leads to accuracy loss. This is because the sparsity pattern after pruning requires adjustment to learn new patterns. To the best of our knowledge, this is the first work to provide in-depth analysis and discussion of applying pruning to online recommendation systems with non-stationary data distribution. Overall, this work makes the following contributions: 1) We present an adaptive dense to sparse paradigm equipped with a novel pruning algorithm for pruning a large scale recommendation system with non-stationary data distribution; 2) We design the pruning algorithm to automatically learn the sparsity across layers to avoid repeating hand-tuning, which is critical for pruning the heterogeneous architectures of recommendation systems trained with non-stationary data.

Submitted 21 October, 2020; v1 submitted 16 October, 2020; originally announced October 2020.

arXiv:2003.13593 [pdf, other]

How Not to Give a FLOP: Combining Regularization and Pruning for Efficient Inference

Authors: Tai Vu, Emily Wen, Roy Nehoran

Abstract: The challenge of speeding up deep learning models during the deployment phase has been a large, expensive bottleneck in the modern tech industry. In this paper, we examine the use of both regularization and pruning for reduced computational complexity and more efficient inference in Deep Neural Networks (DNNs). In particular, we apply mixup and cutout regularizations and soft filter pruning to the ResNet architecture, focusing on minimizing floating-point operations (FLOPs). Furthermore, by using regularization in conjunction with network pruning, we show that such a combination makes a substantial improvement over each of the two techniques individually.

Submitted 9 April, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: Citations added, typos fixed

arXiv:1902.09451 [pdf, other]

Optimizing Controller Placement for Software-Defined Networks

Authors: Victoria Huang, Gang Chen, Qiang Fu, Elliott Wen

Abstract: Controller placement problem (CPP) is a key issue for Software-Defined Networking (SDN) with distributed controller architectures. This problem aims to determine a suitable number of controllers deployed in important locations so as to optimize the overall network performance. In comparison to communication delay, existing literature on the CPP assumes that the influence of controller workload distribution on network performance is negligible. In this paper, we tackle the CPP that simultaneously considers the communication delay, the control plane utilization, and the controller workload distribution. Due to this reason, our CPP is intrinsically different from and clearly more difficult than any previously studied CPPs that are NP-hard. To tackle this challenging issue, we develop a new algorithm that seamlessly integrates the genetic algorithm (GA) and the gradient descent (GD) optimization method. Particularly, GA is used to search for suitable CPP solutions. The quality of each solution is further evaluated through GD. Simulation results on two representative network scenarios (small-scale and large-scale) show that our algorithm can effectively strike the trade-off between the control plane utilization and the network response time.

Submitted 14 February, 2019; originally announced February 2019.

Journal ref: 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM) (2019) 224-232

Showing 1–24 of 24 results for author: Wen, E