-
Perceptual Assessment and Optimization of HDR Image Rendering
Authors:
Peibei Cao,
Rafal K. Mantiuk,
Kede Ma
Abstract:
High dynamic range (HDR) rendering has the ability to faithfully reproduce the wide luminance ranges in natural scenes, but how to accurately assess the rendering quality is relatively underexplored. Existing quality models are mostly designed for low dynamic range (LDR) images, and do not align well with human perception of HDR image quality. To fill this gap, we propose a family of HDR quality metrics, in which the key step is employing a simple inverse display model to decompose an HDR image into a stack of LDR images with varying exposures. Subsequently, these decomposed images are assessed through well-established LDR quality metrics. Our HDR quality models present three distinct benefits. First, they directly inherit the recent advancements of LDR quality metrics. Second, they do not rely on human perceptual data of HDR image quality for re-calibration. Third, they facilitate the alignment and prioritization of specific luminance ranges for more accurate and detailed quality assessment. Experimental results show that our HDR quality metrics consistently outperform existing models in terms of quality assessment on four HDR image quality datasets and perceptual optimization of HDR novel view synthesis.
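A minimal sketch of the decompose-then-score idea, in plain Python on flat lists of luminance values. Several parts are assumptions made for illustration, not the paper's choices: the inverse display model (exposure scaling, a hypothetical 100 cd/m² peak, clipping, and a 2.2 gamma), the three exposures, the weighted-mean pooling, and the use of PSNR as the LDR metric. The abstract states only that well-established LDR metrics are applied to the decomposed stack.

```python
import math
from typing import Callable, List, Optional, Sequence


def inverse_display(hdr: Sequence[float], exposure: float,
                    peak: float = 100.0, gamma: float = 2.2) -> List[float]:
    """Render linear HDR luminance as one LDR exposure with values in [0, 1].

    Assumed display model: scale by the exposure, normalise by a hypothetical
    peak luminance, clip to [0, 1], then apply inverse gamma encoding.
    """
    out = []
    for value in hdr:
        linear = min(max(value * exposure / peak, 0.0), 1.0)  # saturate highlights
        out.append(linear ** (1.0 / gamma))
    return out


def psnr(ref: Sequence[float], test: Sequence[float], max_val: float = 1.0) -> float:
    """Stand-in LDR metric; any full-reference LDR metric could be swapped in."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def hdr_quality(ref: Sequence[float], test: Sequence[float],
                exposures: Sequence[float] = (0.25, 1.0, 4.0),
                weights: Optional[Sequence[float]] = None,
                ldr_metric: Callable[[Sequence[float], Sequence[float]], float] = psnr) -> float:
    """Score every exposure of the decomposed stack, then pool with a weighted mean.

    Non-uniform weights prioritise luminance ranges: high exposures reveal
    detail in dark regions, low exposures preserve detail in highlights.
    """
    if weights is None:
        weights = [1.0] * len(exposures)
    scores = [ldr_metric(inverse_display(ref, e), inverse_display(test, e))
              for e in exposures]
    # Skip zero-weight exposures so that 0 * inf never produces NaN.
    pooled = sum(w * s for w, s in zip(weights, scores) if w > 0)
    return pooled / sum(weights)
```

Because the LDR metric is just a parameter, a newer LDR metric can be dropped in without re-calibration, which mirrors the first two benefits listed in the abstract.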
Submitted 16 June, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Improving the Performance of R17 Type-II Codebook with Deep Learning
Authors:
Ke Ma,
Yiliang Sang,
Yang Ming,
** Lian,
Chang Tian,
Zhaocheng Wang
Abstract:
The Type-II codebook in Release 17 (R17) exploits the angular-delay-domain partial reciprocity between uplink and downlink channels to select part of angular-delay-domain ports for measuring and feeding back the downlink channel state information (CSI), where the performance of existing deep learning enhanced CSI feedback methods is limited due to the deficiency of sparse structures. To address this issue, we propose two new perspectives of adopting deep learning to improve the R17 Type-II codebook. Firstly, considering the low signal-to-noise ratio of uplink channels, deep learning is utilized to accurately select the dominant angular-delay-domain ports, where the focal loss is harnessed to solve the class imbalance problem. Secondly, we propose to adopt deep learning to reconstruct the downlink CSI based on the feedback of the R17 Type-II codebook at the base station, where the information of sparse structures can be effectively leveraged. Besides, a weighted shortcut module is designed to facilitate the accurate reconstruction. Simulation results demonstrate that our proposed methods could improve the sum rate performance compared with the traditional R17 Type-II codebook and deep learning benchmarks.
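The focal loss mentioned for port selection down-weights easy, well-classified examples, so the few dominant ports are not swamped by the many non-dominant ones. A minimal sketch of the standard binary focal loss, treating each port as a binary "dominant or not" decision; that per-port framing and the default `alpha` and `gamma` values are illustrative assumptions, not the paper's settings.

```python
import math
from typing import Sequence


def binary_focal_loss(p: float, y: int, gamma: float = 2.0,
                      alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Focal loss for one port: p = predicted probability the port is dominant, y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of confident, correct predictions
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs: Sequence[float], labels: Sequence[int], **kwargs) -> float:
    """Average focal loss over all candidate ports (hypothetical helper)."""
    return sum(binary_focal_loss(p, y, **kwargs) for p, y in zip(probs, labels)) / len(probs)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which makes a convenient sanity check.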
Submitted 13 September, 2023;
originally announced October 2023.
-
BRAINTEASER: Lateral Thinking Puzzles for Large Language Models
Authors:
Yifan Jiang,
Filip Ilievski,
Kaixin Ma,
Zhivar Sourati
Abstract:
The success of language models has inspired the NLP community to attend to tasks that require implicit and complex reasoning, relying on human-like commonsense mechanisms. While such vertical thinking tasks have been relatively popular, lateral thinking puzzles have received little attention. To bridge this gap, we devise BRAINTEASER: a multiple-choice Question Answering task designed to test the model's ability to exhibit lateral thinking and defy default commonsense associations. We design a three-step procedure for creating the first lateral thinking benchmark, consisting of data collection, distractor generation, and generation of adversarial examples, leading to 1,100 puzzles with high-quality annotations. To assess the consistency of lateral reasoning by models, we enrich BRAINTEASER based on a semantic and contextual reconstruction of its questions. Our experiments with state-of-the-art instruction- and commonsense language models reveal a significant gap between human and model performance, which is further widened when consistency across adversarial formats is considered. We make all of our code and data available to stimulate work on developing and evaluating lateral thinking models.
Submitted 9 November, 2023; v1 submitted 8 October, 2023;
originally announced October 2023.
-
FDLS: A Deep Learning Approach to Production Quality, Controllable, and Retargetable Facial Performances
Authors:
Wan-Duo Kurt Ma,
Muhammad Ghifary,
J. P. Lewis,
Byungkuk Choi,
Haekwang Eom
Abstract:
Visual effects commonly requires both the creation of realistic synthetic humans as well as retargeting actors' performances to humanoid characters such as aliens and monsters. Achieving the expressive performances demanded in entertainment requires manipulating complex models with hundreds of parameters. Full creative control requires the freedom to make edits at any stage of the production, which prohibits the use of a fully automatic "black box" solution with uninterpretable parameters. On the other hand, producing realistic animation with these sophisticated models is difficult and laborious. This paper describes FDLS (Facial Deep Learning Solver), which is Weta Digital's solution to these challenges. FDLS adopts a coarse-to-fine and human-in-the-loop strategy, allowing a solved performance to be verified and edited at several stages in the solving process. To train FDLS, we first transform the raw motion-captured data into robust graph features. Secondly, based on the observation that the artists typically finalize the jaw pass animation before proceeding to finer detail, we solve for the jaw motion first and predict fine expressions with region-based networks conditioned on the jaw position. Finally, artists can optionally invoke a non-linear finetuning process on top of the FDLS solution to follow the motion-captured virtual markers as closely as possible. FDLS supports editing if needed to improve the results of the deep learning solution and it can handle small daily changes in the actor's face shape. FDLS permits reliable and production-quality performance solving with minimal training and little or no manual effort in many cases, while also allowing the solve to be guided and edited in unusual and difficult cases. The system has been under development for several years and has been used in major movies.
Submitted 26 September, 2023;
originally announced September 2023.
-
Knowledge Base Aware Semantic Communication in Vehicular Networks
Authors:
Le Xia,
Yao Sun,
Dusit Niyato,
Kairong Ma,
Jiawen Kang,
Muhammad Ali Imran
Abstract:
Semantic communication (SemCom) has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to vehicular networks, which normally consume a tremendous amount of resources to achieve stringent requirements on high reliability and low latency. Unfortunately, the unique background knowledge matching mechanism in SemCom makes it challenging to realize efficient vehicle-to-vehicle service provisioning for multiple users at the same time. To this end, this paper identifies and jointly addresses two fundamental problems of knowledge base construction (KBC) and vehicle service pairing (VSP) inherently existing in SemCom-enabled vehicular networks. Concretely, we first derive the knowledge-matching-based queuing latency specific to semantic data packets, and then formulate a latency-minimization problem subject to several KBC- and VSP-related reliability constraints. Afterward, a SemCom-empowered Service Supplying Solution (S$^{\text{4}}$) is proposed along with the theoretical analysis of its optimality guarantee. Simulation results demonstrate the superiority of S$^{\text{4}}$ in terms of average queuing latency, semantic data packet throughput, and user knowledge preference satisfaction compared with two different benchmarks.
Submitted 21 September, 2023;
originally announced September 2023.
-
TOPPQuad: Dynamically-Feasible Time Optimal Path Parametrization for Quadrotors
Authors:
Katherine Mao,
Igor Spasojevic,
M. Ani Hsieh,
Vijay Kumar
Abstract:
Planning time-optimal trajectories for quadrotors in cluttered environments is a challenging, non-convex problem. This paper addresses minimizing the traversal time of a given collision-free geometric path without violating bounds on individual motor thrusts of the vehicle. Previous approaches have either relied on convex relaxations that do not guarantee dynamic feasibility, or have generated overly conservative time parametrizations. We propose TOPPQuad, a time-optimal path parameterization algorithm for quadrotors which explicitly incorporates quadrotor rigid body dynamics and constraints such as bounds on inputs (including motor speeds) and state of the vehicle (including the pose, linear and angular velocity and acceleration). We demonstrate the ability of the planner to generate faster trajectories that respect hardware constraints of the robot compared to several planners with relaxed notions of dynamic feasibility. We also demonstrate how TOPPQuad can be used to plan trajectories for quadrotors that utilize bidirectional motors. Overall, the proposed approach paves a way towards maximizing the efficacy of autonomous micro aerial vehicles while ensuring their safety.
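TOPPQuad itself works with full rigid-body dynamics and per-motor thrust bounds. The sketch below shows only a classic building block of time-optimal path parameterization, on a much simpler model that is an assumption for illustration: a point mass on a straight path with a symmetric acceleration bound and a speed cap, at rest at both ends. Working with u = v², which varies linearly with arc length under constant acceleration, the fastest admissible profile is a forward pass (acceleration limit) followed by a backward pass (deceleration limit). Function and parameter names are hypothetical.

```python
import math
from typing import List


def max_speed_profile(length: float, n_seg: int, a_max: float, v_max: float) -> List[float]:
    """Fastest speed at n_seg + 1 grid points along a straight path.

    Simplified model (assumption): point mass, |acceleration| <= a_max,
    speed <= v_max, at rest at both ends. Works with u = v**2, which changes
    linearly with arc length s under constant acceleration: du/ds = 2a.
    """
    if n_seg < 2:
        raise ValueError("need at least two segments")
    ds = length / n_seg
    u = [v_max ** 2] * (n_seg + 1)
    u[0] = u[-1] = 0.0                      # start and end at rest
    for i in range(n_seg):                  # forward pass: acceleration limit
        u[i + 1] = min(u[i + 1], u[i] + 2.0 * a_max * ds)
    for i in range(n_seg, 0, -1):           # backward pass: deceleration limit
        u[i - 1] = min(u[i - 1], u[i] + 2.0 * a_max * ds)
    return [math.sqrt(x) for x in u]


def traversal_time(v: List[float], ds: float) -> float:
    """Constant acceleration within each segment gives dt = 2 ds / (v_i + v_{i+1})."""
    return sum(2.0 * ds / (v[i] + v[i + 1]) for i in range(len(v) - 1))
```

For a 10 m path with a_max = 1 m/s² and a speed cap that never binds, the profile is triangular and the traversal time is 2√10 ≈ 6.32 s; capping the speed at 2 m/s yields a trapezoidal profile taking 7 s.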
Submitted 10 April, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
DreamLLM: Synergistic Multimodal Comprehension and Creation
Authors:
Runpei Dong,
Chunrui Han,
Yuang Peng,
Zekun Qi,
Zheng Ge,
Jinrong Yang,
Liang Zhao,
Jianjian Sun,
Hongyu Zhou,
Haoran Wei,
Xiangwen Kong,
Xiangyu Zhang,
Kaisheng Ma,
Li Yi
Abstract:
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. The first focuses on the generative modeling of both language and image posteriors by direct sampling in the raw multimodal space. This approach circumvents the limitations and information loss inherent to external feature extractors like CLIP, and a more thorough multimodal understanding is obtained. Second, DreamLLM fosters the generation of raw, interleaved documents, modeling both text and image contents, along with unstructured layouts. This allows DreamLLM to learn all conditional, marginal, and joint multimodal distributions effectively. As a result, DreamLLM is the first MLLM capable of generating free-form interleaved content. Comprehensive experiments highlight DreamLLM's superior performance as a zero-shot multimodal generalist, reaping from the enhanced learning synergy. Project page: https://dreamllm.github.io.
Submitted 15 March, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
A new method for spatially resolving the turbulence driving mixture in the ISM with application to the Small Magellanic Cloud
Authors:
Isabella A. Gerrard,
Christoph Federrath,
Nickolas M. Pingel,
Naomi M. McClure-Griffiths,
Antoine Marchal,
Gilles Joncas,
Susan E. Clark,
Snežana Stanimirović,
Min-Young Lee,
Jacco Th. van Loon,
John Dickey,
Helga Dénes,
Yik Ki Ma,
James Dempsey,
Callum Lynn
Abstract:
Turbulence plays a crucial role in shaping the structure of the interstellar medium. The ratio of the three-dimensional density contrast ($σ_{ρ/ρ_0}$) to the turbulent sonic Mach number ($\mathcal{M}$) of an isothermal, compressible gas describes the ratio of solenoidal to compressive modes in the turbulent acceleration field of the gas, and is parameterised by the turbulence driving parameter: $b=σ_{ρ/ρ_0}/\mathcal{M}$. The turbulence driving parameter ranges from $b=1/3$ (purely solenoidal) to $b=1$ (purely compressive), with $b=0.38$ characterising the natural mixture (1/3 compressive, 2/3 solenoidal) of the two driving modes. Here we present a new method for recovering $σ_{ρ/ρ_0}$, $\mathcal{M}$, and $b$, from observations on galactic scales, using a roving kernel to produce maps of these quantities from column density and centroid velocity maps. We apply our method to high-resolution HI emission observations of the Small Magellanic Cloud (SMC) from the GASKAP-HI survey. We find that the turbulence driving parameter varies between $b\sim 0.3$ and $b\sim 1.0$ within the main body of the SMC, but the median value converges to $b\sim0.51$, suggesting that the turbulence is overall driven more compressively ($b>0.38$). We observe no correlation between the $b$ parameter and HI or H$α$ intensity, indicating that compressive driving of HI turbulence cannot be determined solely by observing HI or H$α$ emission density, and that velocity information must also be considered. Further investigation is required to link our findings to potential driving mechanisms such as star-formation feedback, gravitational collapse, or cloud-cloud collisions.
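The driving parameter itself is a simple ratio; the hard part of the method, estimating $σ_{ρ/ρ_0}$ and $\mathcal{M}$ from column density and centroid velocity maps with a roving kernel, is not reproduced here. A minimal sketch of the final step only, assuming the two quantities are already available per kernel position as co-registered, flattened maps; the helper names and input values are hypothetical.

```python
from typing import List, Sequence

NATURAL_MIXTURE_B = 0.38  # natural mixture of solenoidal and compressive driving


def driving_parameter(sigma_rho: float, mach: float) -> float:
    """b = sigma_(rho/rho_0) / M for an isothermal, compressible gas."""
    if mach <= 0:
        raise ValueError("Mach number must be positive")
    return sigma_rho / mach


def driving_parameter_map(sigma_map: Sequence[float], mach_map: Sequence[float]) -> List[float]:
    """Element-wise b for co-registered (flattened) maps of sigma and M (hypothetical helper)."""
    return [driving_parameter(s, m) for s, m in zip(sigma_map, mach_map)]


def describe_driving(b: float) -> str:
    """Compare b with the natural-mixture value quoted in the abstract."""
    if b > NATURAL_MIXTURE_B:
        return "more compressive than the natural mixture"
    if b < NATURAL_MIXTURE_B:
        return "more solenoidal than the natural mixture"
    return "natural mixture"
```

The SMC median reported above, $b\sim0.51$, falls in the first category.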
Submitted 19 September, 2023;
originally announced September 2023.
-
LASER: LLM Agent with State-Space Exploration for Web Navigation
Authors:
Kaixin Ma,
Hongming Zhang,
Hongwei Wang,
Xiaoman Pan,
Wenhao Yu,
Dong Yu
Abstract:
Large language models (LLMs) have been successfully adapted for interactive decision-making tasks like web navigation. While achieving decent performance, previous methods implicitly assume a forward-only execution mode for the model, where they only provide oracle trajectories as in-context examples to guide the model on how to reason in the environment. Consequently, the model could not handle more challenging scenarios not covered in the in-context examples, e.g., mistakes, leading to sub-optimal performance. To address this issue, we propose to model the interactive task as state space exploration, where the LLM agent transitions among a pre-defined set of states by performing actions to complete the task. This formulation enables flexible backtracking, allowing the model to recover from errors easily. We evaluate our proposed LLM Agent with State-Space ExploRation (LASER) on both the WebShop task and amazon.com. Experimental results show that LASER significantly outperforms previous methods and closes the gap with human performance on the web navigation task.
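A toy sketch of the state-space formulation: each state exposes only the actions valid in it, and some actions lead back to earlier states, which is what makes backtracking possible. The four states, the action names, and the `StateSpaceAgent` class are hypothetical, loosely modelled on a shopping site. The abstract does not give LASER's actual state definitions, and in the real system an LLM, not a caller, chooses among the valid actions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical state space: state -> {action: next_state}.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "search": {"search": "results"},
    "results": {"select_item": "item", "next_page": "results", "back_to_search": "search"},
    "item": {"buy": "done", "back_to_results": "results"},
    "done": {},
}


@dataclass
class StateSpaceAgent:
    state: str = "search"
    trajectory: List[Tuple[str, str]] = field(default_factory=list)

    def valid_actions(self) -> List[str]:
        """Actions permitted in the current state (the menu an LLM would choose from)."""
        return sorted(TRANSITIONS[self.state])

    def step(self, action: str) -> str:
        """Apply an action; invalid actions are rejected instead of silently executed."""
        if action not in TRANSITIONS[self.state]:
            raise ValueError(f"{action!r} is not allowed in state {self.state!r}")
        self.trajectory.append((self.state, action))
        self.state = TRANSITIONS[self.state][action]
        return self.state
```

A mistaken click on an item is recovered from with `back_to_results` rather than by restarting the episode.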
Submitted 21 February, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Testing Bell inequality through $h\toττ$ at CEPC
Authors:
Kai Ma,
Tong Li
Abstract:
The decay of the Higgs boson into two spin-1/2 particles provides an ideal system to reveal quantum entanglement and Bell nonlocality. Future $e^+e^-$ colliders can improve the measurement accuracy of the spin correlation of tau lepton pairs from Higgs boson decays. We show the testability of the Bell inequality through $h\to ττ$ at the Circular Electron Positron Collider (CEPC). Two realistic methods of testing the Bell inequality are investigated, i.e., Törnqvist's method and the Clauser-Horne-Shimony-Holt (CHSH) inequality. In the simulation, we take into account the detector effects of CEPC, including uncertainties for tracks and jets from the $Z$ boson in the production process $e^+e^-\to Zh$. The reconstruction approaches needed to measure quantum entanglement between $τ^+$ and $τ^-$ are described. Finally, we show the sensitivity of CEPC to Bell inequality violation for the two methods.
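For a pair of spin-1/2 particles, the CHSH combination of spin correlations satisfies |S| ≤ 2 in any local hidden-variable theory, while quantum mechanics allows up to 2√2. A minimal sketch that evaluates S from a 3×3 spin-correlation matrix C, using E(a, b) = aᵀCb for unit measurement axes a and b. The textbook spin-singlet matrix C = −I and the chosen axes are illustrative assumptions; the collider analysis instead reconstructs C for the τ pair from its decay products.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def correlation(C: Matrix, a: Sequence[float], b: Sequence[float]) -> float:
    """E(a, b) = a^T C b for unit axes a (first particle) and b (second particle)."""
    return sum(a[i] * C[i][j] * b[j] for i in range(3) for j in range(3))


def chsh(C: Matrix, a: Sequence[float], a2: Sequence[float],
         b: Sequence[float], b2: Sequence[float]) -> float:
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return (correlation(C, a, b) + correlation(C, a, b2)
            + correlation(C, a2, b) - correlation(C, a2, b2))


# Textbook spin singlet, <sigma_i sigma_j> = -delta_ij (illustrative, not the h -> tau tau matrix).
SINGLET: Matrix = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
R_HALF = math.sqrt(0.5)
A, A2 = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)            # z and x axes
B, B2 = (R_HALF, 0.0, R_HALF), (-R_HALF, 0.0, R_HALF)  # bisectors between z and +/-x
```

With these axes, `chsh(SINGLET, A, A2, B, B2)` gives S = −2√2 ≈ −2.83, so |S| exceeds the local bound of 2. For a general C, the largest |S| over all axis choices is 2√(m₁ + m₂), where m₁ and m₂ are the two largest eigenvalues of CᵀC (the Horodecki criterion).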
Submitted 14 September, 2023;
originally announced September 2023.
-
VisActs: Describing Intent in Communicative Visualization
Authors:
Keshav Dasu,
Yun-Hsin Kuo,
Kwan-Liu Ma
Abstract:
Data visualization can be defined as the visual communication of information. One important barometer for the success of a visualization is whether the intents of the communicator(s) are faithfully conveyed. The processes of constructing and displaying visualizations have been widely studied by our community. However, due to the lack of consistency in this literature, there is a growing acknowledgment of a need for frameworks and methodologies for classifying and formalizing the communicative component of visualization. This work focuses on intent and introduces how this concept in communicative visualization mirrors concepts in linguistics. We construct a mapping between the two spaces that enables us to leverage relevant frameworks to apply to visualization. We describe this translation as using the philosophy of language as a base for explaining communication in visualization. Furthermore, we illustrate the benefits and point out several prospective research directions.
Submitted 11 September, 2023;
originally announced September 2023.
-
Character-Oriented Design for Visual Data Storytelling
Authors:
Keshav Dasu,
Yun-Hsin Kuo,
Kwan-Liu Ma
Abstract:
When telling a data story, an author has an intention they seek to convey to an audience. This intention can be of many forms such as to persuade, to educate, to inform, or even to entertain. In addition to expressing their intention, the story plot must balance being consumable and enjoyable while preserving scientific integrity. In data stories, numerous methods have been identified for constructing and presenting a plot. However, there is an opportunity to expand how we think and create the visual elements that present the story. Stories are brought to life by characters; often they are what make a story captivating, enjoyable, memorable, and facilitate following the plot until the end. Through the analysis of 160 existing data stories, we systematically investigate and identify distinguishable features of characters in data stories, and we illustrate how they feed into the broader concept of "character-oriented design". We identify the roles and visual representations data characters assume as well as the types of relationships these roles have with one another. We identify characteristics of antagonists as well as define conflict in data stories. We find the need for an identifiable central character that the audience latches on to in order to follow the narrative and identify their visual representations. We then illustrate "character-oriented design" by showing how to develop data characters with common data story plots. With this work, we present a framework for data characters derived from our analysis; we then offer our extension to the data storytelling process using character-oriented design. To access our supplemental materials please visit https://chaorientdesignds.github.io/
Submitted 14 August, 2023;
originally announced August 2023.
-
Feature-aware conditional GAN for category text generation
Authors:
Xinze Li,
Kezhi Mao,
Fanfan Lin,
Zijian Feng
Abstract:
Category text generation receives considerable attention since it is beneficial for various natural language processing tasks. Recently, the generative adversarial network (GAN) has attained promising performance in text generation, attributed to its adversarial training process. However, text GANs suffer from several issues, including discreteness, training instability, mode collapse, and lack of diversity and controllability. To address these issues, this paper proposes a novel GAN framework, the feature-aware conditional GAN (FA-GAN), for controllable category text generation. In FA-GAN, the generator has a sequence-to-sequence structure for improving sentence diversity, which consists of three encoders, including a special feature-aware encoder and a category-aware encoder, and one relational-memory-core-based decoder with the Gumbel-Softmax activation function. The discriminator has an additional category classification head. To generate sentences with specified categories, the multi-class classification loss is supplemented in the adversarial training. Comprehensive experiments have been conducted, and the results show that FA-GAN consistently outperforms 10 state-of-the-art text generation approaches on 6 text classification datasets. The case study demonstrates that the synthetic sentences generated by FA-GAN can match the required categories and are aware of the features of conditioned sentences, with good readability, fluency, and text authenticity.
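The Gumbel-Softmax activation lets a generator emit a differentiable, nearly one-hot relaxation of a discrete token choice, which is one standard workaround for the discreteness problem of text GANs. A minimal sketch in plain Python; the logits, temperature, and function name are hypothetical, and inside FA-GAN this would run in a neural-network framework so that gradients can flow through the sample.

```python
import math
import random
from typing import List, Optional, Sequence


def gumbel_softmax(logits: Sequence[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + g) / tau) with g ~ Gumbel(0, 1)."""
    rng = rng or random.Random()
    # Gumbel(0, 1) noise by inverse transform: g = -log(-log(U)), U ~ Uniform(0, 1).
    # U is floored at 1e-20 because random() can return exactly 0.0.
    noisy = [(x - math.log(-math.log(max(rng.random(), 1e-20)))) / tau for x in logits]
    m = max(noisy)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lowering `tau` pushes samples toward one-hot vectors, closer to a hard argmax over the vocabulary, at the cost of higher-variance gradients.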
Submitted 2 August, 2023;
originally announced August 2023.
-
Particle swarm optimization with state-based adaptive velocity limit strategy
Authors:
Xinze Li,
Kezhi Mao,
Fanfan Lin,
Xin Zhang
Abstract:
Velocity limit (VL) has been widely adopted in many variants of particle swarm optimization (PSO) to prevent particles from searching outside the solution space. Several adaptive VL strategies have been introduced with which the performance of PSO can be improved. However, the existing adaptive VL strategies simply adjust their VL based on iterations, leading to unsatisfactory optimization results because of the incompatibility between VL and the current searching state of particles. To deal with this problem, a novel PSO variant with state-based adaptive velocity limit strategy (PSO-SAVL) is proposed. In the proposed PSO-SAVL, VL is adaptively adjusted based on the evolutionary state estimation (ESE), in which a high value of VL is set for the global searching state and a low value of VL is set for the local searching state. Besides that, limit handling strategies have been modified and adopted to improve the capability of avoiding local optima. The good performance of PSO-SAVL has been experimentally validated on a wide range of benchmark functions with 50 dimensions. The satisfactory scalability of PSO-SAVL in high-dimension and large-scale problems is also verified. Besides, the merits of the strategies in PSO-SAVL are verified in experiments. Sensitivity analysis for the relevant hyper-parameters in the state-based adaptive VL strategy is conducted, and insights into how to select these hyper-parameters are also discussed.
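A minimal sketch of tying the velocity limit to an estimate of the search state. It uses an ESE-style evolutionary factor f: the mean distance of the global-best particle to the rest of the swarm, normalised between the smallest and largest such mean distances, so larger f roughly indicates a global (exploring) state and f near 0 a converging, local state. The linear map from f to the velocity limit, the coefficient values, the simple boundary clamping, and the sphere test function are all assumptions for illustration; PSO-SAVL's own adjustment rule and its modified limit-handling strategies are not reproduced here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sphere(x: Sequence[float]) -> float:
    """Simple benchmark function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def evolutionary_factor(positions: List[List[float]], best_idx: int) -> float:
    """ESE-style evolutionary factor in [0, 1]; requires at least two particles."""
    n = len(positions)
    mean_dist = [
        sum(math.dist(positions[i], positions[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    d_min, d_max = min(mean_dist), max(mean_dist)
    if d_max - d_min < 1e-12:  # collapsed or perfectly uniform swarm
        return 0.0
    return (mean_dist[best_idx] - d_min) / (d_max - d_min)


def pso_savl(f: Callable[[Sequence[float]], float], dim: int, bounds: Tuple[float, float],
             n: int = 20, iters: int = 200, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
             vl_low: float = 0.05, vl_high: float = 0.5, seed: int = 0):
    """Global-best PSO whose velocity limit follows the evolutionary factor (hypothetical rule)."""
    rng = random.Random(seed)
    lo, hi = bounds
    span = hi - lo
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pval = [f(p) for p in pos]
    g = min(range(n), key=pval.__getitem__)  # index of the global-best particle
    for _ in range(iters):
        # Assumed linear rule: exploring swarm (large f) -> large VL, converging -> small VL.
        vmax = (vl_low + (vl_high - vl_low) * evolutionary_factor(pos, g)) * span
        for i in range(n):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][k]
                     + c1 * r1 * (pbest[i][k] - pos[i][k])
                     + c2 * r2 * (pbest[g][k] - pos[i][k]))
                vel[i][k] = max(-vmax, min(vmax, v))                 # velocity limit
                pos[i][k] = max(lo, min(hi, pos[i][k] + vel[i][k]))  # simple boundary clamp
            val = f(pos[i])
            if val < pval[i]:
                pval[i], pbest[i] = val, pos[i][:]
                if val < pval[g]:
                    g = i
    return pbest[g], pval[g]
```

Calling `pso_savl(sphere, dim=3, bounds=(-5.0, 5.0))` returns the best position found and its objective value.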
Submitted 2 August, 2023;
originally announced August 2023.
-
Artificial-Intelligence-Based Triple Phase Shift Modulation for Dual Active Bridge Converter with Minimized Current Stress
Authors:
Xinze Li,
Xin Zhang,
Fanfan Lin,
Changjiang Sun,
Kezhi Mao
Abstract:
The dual active bridge (DAB) converter has been popular in many applications for its outstanding power density and bidirectional power transfer capacity. Up to now, triple phase shift (TPS) can be considered one of the most advanced modulation techniques for the DAB converter. It can widen the zero voltage switching range and improve power efficiency significantly. Currently, current stress has become an important performance indicator for DAB converters under TPS modulation, with a view to smaller size and higher efficiency. However, minimizing the current stress of a DAB converter under TPS modulation involves two difficulties, one in the analysis process and one in the realization process. Firstly, the three modulation variables (degrees of freedom) in TPS modulation make it challenging to analyze the current stress in different operating modes; this analysis and deduction process carries a heavy computational burden and also suffers from low accuracy. Secondly, to realize TPS modulation, if a lookup table is adopted after the optimization of modulation variables, modulation performance will be unsatisfactory because of the discrete nature of the lookup table. Therefore, an AI-based TPS modulation (AI-TPSM) strategy is proposed in this paper. A neural network (NN) and a fuzzy inference system (FIS) are utilized to deal with the two difficulties mentioned above. With the proposed AI-TPSM, the optimization of TPS modulation for minimized current stress enjoys a high degree of automation, which can relieve engineers' working burden and improve accuracy. Finally, the effectiveness of the proposed AI-TPSM has been experimentally verified with a 1 kW prototype.
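Triple phase shift generalises the simplest DAB scheme, single phase shift (SPS), in which the only control variable is the outer phase shift between the two bridges; TPS adds an inner phase shift within each bridge, giving the three modulation variables that complicate the analysis. As a point of reference, a minimal sketch of the textbook lossless SPS relation P = n·V1·V2·d(1 − d)/(2·f_s·L), with d the outer phase shift divided by π, and its inversion for a requested power. The SPS model, the lossless assumption, and all parameter values are illustrative; this is not the AI-TPSM strategy.

```python
import math


def sps_power(v1: float, v2: float, n: float, f_s: float, L: float, d: float) -> float:
    """Power (W) transferred by a lossless DAB under SPS, phase-shift ratio d in [0, 1].

    n * v2 is the secondary voltage referred to the primary, L the series
    inductance seen from the primary, f_s the switching frequency (Hz).
    """
    return n * v1 * v2 * d * (1.0 - d) / (2.0 * f_s * L)


def sps_phase_shift_for_power(p: float, v1: float, v2: float, n: float,
                              f_s: float, L: float) -> float:
    """Root of d(1 - d) = k with d <= 0.5, the range normally used in practice."""
    k = 2.0 * f_s * L * p / (n * v1 * v2)
    if not 0.0 <= k <= 0.25:
        raise ValueError("requested power is outside the SPS range [0, P_max]")
    return (1.0 - math.sqrt(1.0 - 4.0 * k)) / 2.0
```

With the hypothetical values V1 = 200 V, V2 = 100 V, n = 2, f_s = 100 kHz and L = 50 µH, the maximum power at d = 0.5 is 1 kW, and 500 W requires d = (1 − √0.5)/2 ≈ 0.146. In SPS a single closed-form expression suffices; with TPS the power and current-stress expressions change from one operating mode to another, which is the analysis burden the abstract describes.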
Submitted 1 August, 2023;
originally announced August 2023.
-
Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Authors:
Xinze Li,
Xin Zhang,
Fanfan Lin,
Changjiang Sun,
Kezhi Mao
Abstract:
The dual active bridge (DAB) converter is the key enabler in many popular applications such as wireless charging, electric vehicles and renewable energy. ZVS range and efficiency are two significant performance indicators for the DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, to design a hybrid modulation, a harmonic approach or a piecewise approach is used, but both suffer from a time-consuming model building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates the cumbersome model building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over the entire operating range. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with a maximum of 97.1% and full-range ZVS operation.
Submitted 1 August, 2023;
originally announced August 2023.
-
Classes are not Clusters: Improving Label-based Evaluation of Dimensionality Reduction
Authors:
Hyeon Jeon,
Yun-Hsin Kuo,
Michaël Aupetit,
Kwan-Liu Ma,
**wook Seo
Abstract:
A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be violated; a single class can be fragmented into multiple separated clusters, and multiple classes can be merged into a single cluster. We thus cannot always assure the credibility of the evaluation using class labels. In this paper, we introduce two novel quality measures -- Label-Trustworthiness and Label-Continuity (Label-T&C) -- advancing the process of DR evaluation based on class labels. Instead of assuming that classes are well-clustered in the original space, Label-T&C work by (1) estimating the extent to which classes form clusters in the original and embedded spaces and (2) evaluating the difference between the two. A quantitative evaluation showed that Label-T&C outperform widely used DR evaluation measures (e.g., Trustworthiness and Continuity, Kullback-Leibler divergence) in terms of the accuracy in assessing how well DR embeddings preserve the cluster structure, and are also scalable. Moreover, we present case studies demonstrating that Label-T&C can be successfully used for revealing the intrinsic characteristics of DR techniques and their hyperparameters.
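For reference, the classical Trustworthiness measure that the abstract compares against can be sketched as follows. This is the standard neighborhood-rank definition, not Label-T&C; it penalizes points that enter a point's k nearest neighbors in the embedding without being among its k nearest neighbors in the original space, weighted by their original rank.

```python
import math


def _ranks(points, i):
    """Map every j != i to its distance rank from points[i] (1 = nearest, ties by index)."""
    others = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return {j: r for r, (_, j) in enumerate(others, start=1)}


def trustworthiness(X, Y, k):
    """T(k) = 1 - 2/(n k (2n - 3k - 1)) * sum_i sum_{j in U_k(i)} (r(i, j) - k).

    U_k(i): neighbors of i in the embedding Y that are not among its k nearest
    neighbors in X; r(i, j): rank of j from i in the original space X.
    """
    n = len(X)
    if not 0 < k < n / 2:
        raise ValueError("the normalization assumes 0 < k < n/2")
    penalty = 0
    for i in range(n):
        rank_x, rank_y = _ranks(X, i), _ranks(Y, i)
        knn_x = {j for j, r in rank_x.items() if r <= k}
        knn_y = {j for j, r in rank_y.items() if r <= k}
        for j in knn_y - knn_x:
            penalty += rank_x[j] - k
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty
```

An embedding that preserves every neighborhood scores 1.0. Note that the measure uses no class labels at all, which is the gap the abstract's label-based measures target. For real workloads, scikit-learn provides `sklearn.manifold.trustworthiness`.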
Submitted 11 August, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
MoS$_{2}$/Al$_{0.68}$Sc$_{0.32}$N negative capacitance field-effect transistors
Authors:
Seunguk Song,
Kwan-Ho Kim,
Srikrishna Chakravarthi,
Zirun Han,
Gwangwoo Kim,
Kyung Yeol Ma,
Hyeon Suk Shin,
Roy H. Olsson III,
Deep Jariwala
Abstract:
Al$_{0.68}$Sc$_{0.32}$N (AlScN) has gained attention for its outstanding ferroelectric properties, including a high coercive field and high remnant polarization. Although AlScN-based ferroelectric field-effect transistors (FETs) for memory applications have been demonstrated, a device for logic applications with minimal hysteresis has not been reported. This study reports on the transport characteristics of a MoS$_{2}$ negative capacitance FET (NCFET) based on an AlScN ferroelectric material. We experimentally demonstrate the effect of a dielectric layer in the gate stack on the memory window and subthreshold swing (SS) of the NCFET. We show that the hysteresis behavior of transfer characteristics in the NCFET can be minimized with the inclusion of a non-ferroelectric dielectric layer, which fulfills the capacitance-matching condition. Remarkably, we also observe the NC effect in MoS$_{2}$/AlScN NCFET arrays based on large-area monolayer MoS$_{2}$ synthesized by chemical vapor deposition, showing SS values below the thermionic limit (~36-60 mV/dec) and minimal variation in threshold voltage (< 20 mV).
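The "thermionic limit" follows from the textbook subthreshold-swing relation SS = m · ln(10) · kT/q, where m = 1 + C_s/C_ins is the body factor. A conventional FET has m ≥ 1, so SS cannot fall below ln(10)·kT/q; a negative ferroelectric capacitance in series with the gate dielectric can push m below 1. The sketch below only evaluates this relation and assumes room temperature (300 K), since the abstract does not state the measurement temperature.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def subthreshold_swing_mv_per_dec(temperature_k, body_factor=1.0):
    """Return SS = m * ln(10) * kT/q in mV/decade.

    body_factor (m) = 1 models an ideal conventional FET (the thermionic limit);
    m < 1 models the voltage amplification attributed to negative capacitance.
    """
    return body_factor * math.log(10) * K_B * temperature_k / Q_E * 1e3


limit = subthreshold_swing_mv_per_dec(300.0)                      # about 59.5 mV/dec
amplified = subthreshold_swing_mv_per_dec(300.0, body_factor=0.6)  # about 35.7 mV/dec
```

Under this idealized model, a swing near the lower end of the quoted range (~36 mV/dec) would correspond to an effective body factor of roughly 0.6 at 300 K.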
Submitted 31 July, 2023;
originally announced August 2023.
-
VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
Authors:
Zekun Qi,
Muzhou Yu,
Runpei Dong,
Kaisheng Ma
Abstract:
Conditional 3D generation is undergoing significant advancement, enabling the free creation of 3D content from inputs such as text or 2D images. However, previous approaches have suffered from low inference efficiency, limited generation categories, and restricted downstream applications. In this work, we revisit the impact of different 3D representations on generation quality and efficiency. We propose a progressive generation method through Voxel-Point Progressive Representation (VPP). VPP leverages structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects. VPP can generate high-quality 8K point clouds within 0.2 seconds. Additionally, the masked generation Transformer allows for various 3D downstream tasks, such as generation, editing, completion, and pre-training. Extensive experiments demonstrate that VPP efficiently generates high-fidelity and diverse 3D shapes across different categories, while also exhibiting excellent representation transfer performance. Code will be released at https://github.com/qizekun/VPP.
Submitted 20 October, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models
Authors:
Wei Sun,
Wen Wen,
Xiongkuo Min,
Long Lan,
Guangtao Zhai,
Kede Ma
Abstract:
Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.
Submitted 3 April, 2024; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Competing phases and intertwined orders in coupled wires near the self-dual point
Authors:
Ken K. W. Ma,
Oğuz Türker,
Alexander Seidel,
Kun Yang
Abstract:
The interplay between different quantum phases plays an important role in strongly correlated systems, such as high-$T_c$ cuprates, quantum spin systems, and ultracold atoms. In particular, applications of effective field theory and renormalization group analysis have suggested that the coexistence of density wave (DW) and superfluid (SF) orders can lead to a supersolid phase of ultracold bosons. Here we revisit the problem by considering weakly coupled wires, where we treat the intra-wire interactions exactly via bosonization and the inter-wire couplings using a mean-field theory that becomes asymptotically exact in the limit of high dimensionality. We obtain and solve the mean-field equations for the system near the self-dual point, where each wire has the Luttinger parameter $K=1$ and the inter-wire DW and SF coupling strengths are identical. This allows us to find explicit solutions for the possible supersolid order. An energy comparison between different possible solutions shows that the supersolid order is energetically unfavorable at zero temperature. This suggests that the density wave and superfluid phases are connected by a first-order transition near the self-dual point. We also discuss the relation between our work and the intertwining of charge density wave and superconducting orders in cuprates.
Submitted 19 December, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
SAM-Path: A Segment Anything Model for Semantic Segmentation in Digital Pathology
Authors:
**gwei Zhang,
Ke Ma,
Saarthak Kapse,
Joel Saltz,
Maria Vakalopoulou,
Prateek Prasanna,
Dimitris Samaras
Abstract:
Semantic segmentation of pathological entities has crucial clinical value in computational pathology workflows. Foundation models, such as the Segment Anything Model (SAM), have been recently proposed for universal use in segmentation tasks. SAM shows remarkable promise in instance segmentation on natural images. However, the applicability of SAM to computational pathology tasks is limited due to the following factors: (1) the lack of comprehensive pathology datasets in SAM training and (2) the design of SAM, which is not inherently optimized for semantic segmentation tasks. In this work, we adapt SAM for semantic segmentation by introducing trainable class prompts, followed by further enhancements through the incorporation of a pathology encoder, specifically a pathology foundation model. Our framework, SAM-Path, enhances SAM's ability to conduct semantic segmentation in digital pathology without human input prompts. Through experiments on two public pathology datasets, the BCSS and the CRAG datasets, we demonstrate that fine-tuning with trainable class prompts outperforms vanilla SAM with manual prompts and post-processing by 27.52% in Dice score and 71.63% in IoU. On these two datasets, the proposed additional pathology foundation model further achieves a relative improvement of 5.07% to 5.12% in Dice score and 4.50% to 8.48% in IoU.
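The two reported metrics are the standard overlap scores Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. A minimal sketch for one class on flattened binary masks follows; how the paper averages over classes and images is not stated in the abstract, so any aggregation is left to the caller.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two binary masks given as equal-length 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)  # |A| + |B|
    if total == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    union = total - inter  # |A ∪ B| = |A| + |B| - |A ∩ B|
    return 2 * inter / total, inter / union


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])  # 0.5 and 1/3
```

The two scores are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why percentage changes in one are not directly comparable to changes in the other.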
Submitted 12 July, 2023;
originally announced July 2023.
-
Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning
Authors:
L. L. Ankile,
B. S. Ham,
K. Mao,
E. Shin,
S. Swaroop,
F. Doshi-Velez,
W. Pan
Abstract:
When assisting human users in reinforcement learning (RL), we can represent users as RL agents and study key parameters, called \emph{user traits}, to inform intervention design. We study the relationship between user behaviors (policy classes) and user traits. Given an environment, we introduce an intuitive tool for studying the breakdown of "user types": broad sets of traits that result in the same behavior. We show that seemingly different real-world environments admit the same set of user types and formalize this observation as an equivalence relation defined on environments. By transferring intervention design between environments within the same equivalence class, we can help rapidly personalize interventions.
Submitted 16 July, 2023;
originally announced July 2023.
-
The Rapid ASKAP Continuum Survey III: Spectra and Polarisation In Cutouts of Extragalactic Sources (SPICE-RACS) First Data Release
Authors:
Alec J. M. Thomson,
David McConnell,
Emil Lenc,
Timothy J Galvin,
Lawrence Rudnick,
George Heald,
Catherine L. Hale,
Stefan W. Duchesne,
Craig S. Anderson,
Ettore Carretti,
Christoph Federrath,
B. M. Gaensler,
Lisa Harvey-Smith,
Marijke Haverkorn,
Aidan W. Hotan,
Yik Ki Ma,
Tara Murphy,
N. M. McClure-Griffith,
Vanessa A. Moss,
Shane P. O'Sullivan,
Wasim Raja,
Amit Seta,
Cameron L. Van Eck,
Jennifer L. West,
Matthew T. Whiting
, et al. (1 additional authors not shown)
Abstract:
The Australian SKA Pathfinder (ASKAP) radio telescope has carried out a survey of the entire Southern Sky at 887.5 MHz. The wide area, high angular resolution, and broad bandwidth provided by the low-band Rapid ASKAP Continuum Survey (RACS-low) allow the production of a next-generation rotation measure (RM) grid across the entire Southern Sky. Here we introduce this project as Spectra and Polarisation in Cutouts of Extragalactic Sources from RACS (SPICE-RACS). In our first data release, we image 30 RACS-low fields in Stokes $I$, $Q$, and $U$ at 25'' angular resolution across 744 to 1032 MHz with 1 MHz spectral resolution. Using a bespoke, highly parallelised software pipeline, we are able to rapidly process wide-area spectro-polarimetric ASKAP observations. Notably, we use 'postage stamp' cutouts to assess the polarisation properties of \ncomponents\ radio components detected in total intensity. We find that our Stokes $Q$ and $U$ images have an rms noise of ~80 μJy/PSF, and our correction for instrumental polarisation leakage allows us to characterise components with >1% polarisation fraction over most of the field of view. We produce a broadband polarised radio component catalogue that contains \nrms\ RM measurements over an area of ~1300 deg$^2$ with an average error in RM of $1.6^{+1.1}_{-1.0}$ rad m$^{-2}$ and an average linear polarisation fraction of $3.4^{+3.0}_{-1.6}$%. We determine this subset of components using the conditions that the polarised signal-to-noise ratio is $>8$, the polarisation fraction is above our estimated polarised leakage, and the Stokes $I$ spectrum has a reliable model. Our catalogue provides an areal density of $4\pm2$ RMs deg$^{-2}$, an increase of $\sim4$ times over the previous state of the art (Taylor et al. 2009). This means that, having used just 3% of the RACS-low sky area, we have produced the third-largest RM catalogue to date. This catalogue has broad applications for studying...
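An RM is the slope of the polarisation angle against wavelength squared, χ(λ²) = χ₀ + RM·λ², with χ = ½·atan2(U, Q). The abstract does not describe how SPICE-RACS derives its RMs (wide-band surveys often use RM synthesis instead), so the sketch below shows only the simplest textbook estimator: unwrap the nπ ambiguity in χ channel by channel, then fit a straight line in λ². It assumes noise-free data and channels fine enough that χ changes by much less than π/2 between neighbours.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def fit_rm(freqs_hz, stokes_q, stokes_u):
    """Estimate RM (rad/m^2) and chi0 (rad, defined modulo pi) from chi = chi0 + RM * lambda^2."""
    lam2 = [(C / f) ** 2 for f in freqs_hz]
    order = sorted(range(len(lam2)), key=lam2.__getitem__)  # sort channels by lambda^2
    x = [lam2[i] for i in order]
    chi = [0.5 * math.atan2(stokes_u[i], stokes_q[i]) for i in order]
    # Remove the n*pi ambiguity between adjacent channels.
    unwrapped = [chi[0]]
    for c in chi[1:]:
        d = c - unwrapped[-1]
        d -= math.pi * round(d / math.pi)
        unwrapped.append(unwrapped[-1] + d)
    # Ordinary least-squares line through the (lambda^2, chi) points.
    n = len(x)
    mx, my = sum(x) / n, sum(unwrapped) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, unwrapped))
    rm = sxy / sxx
    return rm, my - rm * mx
```

For example, simulated Q and U across the quoted 744 to 1032 MHz band at 1 MHz resolution, generated with RM = 20 rad m⁻², return an RM of 20 to within rounding error.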
Submitted 14 July, 2023;
originally announced July 2023.
-
Exciton Confinement in Two-Dimensional, In-Plane, Quantum Heterostructures
Authors:
Gwangwoo Kim,
Benjamin Huet,
Christopher E. Stevens,
Kiyoung Jo,
Jeng-Yuan Tsai,
Saiphaneendra Bachu,
Meghan Leger,
Kyung Yeol Ma,
Nicholas R. Glavin,
Hyeon Suk Shin,
Nasim Alem,
Qimin Yan,
Joshua R. Hedrickson,
Joan M. Redwing,
Deep Jariwala
Abstract:
Two-dimensional (2D) semiconductors are promising candidates for optoelectronic applications and quantum information processing due to their inherent out-of-plane 2D confinement. In addition, they offer the possibility of achieving low-dimensional in-plane exciton confinement, similar to zero-dimensional quantum dots, with intriguing optical and electronic properties via strain or composition engineering. However, realizing such laterally confined 2D monolayers and systematically controlling their size-dependent optical properties remain significant challenges. Here, we report the observation of lateral confinement of excitons in epitaxially grown in-plane MoSe2 quantum dots (~15-60 nm wide) inside a continuous matrix of WSe2 monolayer film via a sequential epitaxial growth process. Various optical spectroscopy techniques reveal the size-dependent exciton confinement in the MoSe2 monolayer quantum dots, with an exciton blue shift (12-40 meV) at low temperature compared to continuous monolayer MoSe2. Finally, single-photon emission was also observed from the smallest dots at 1.6 K. Our study opens the door to compositionally engineered, tunable, in-plane quantum light sources in 2D semiconductors.
Submitted 12 July, 2023;
originally announced July 2023.
-
Sampling the Faraday rotation sky of TNG50: Imprint of the magnetised circumgalactic medium around Milky Way-like galaxies
Authors:
Seoyoung Lyla Jung,
N. M. McClure-Griffiths,
Ruediger Pakmor,
Yik Ki Ma,
Alex S. Hill,
Cameron L. Van Eck,
Craig S. Anderson
Abstract:
Faraday rotation measure (RM) is arguably the most practical observational tracer of magnetic fields in the diffuse circumgalactic medium (CGM). We sample synthetic Faraday rotation skies of Milky Way-like galaxies in TNG50 of the IllustrisTNG project by placing an observer inside the galaxies at a solar circle-like position. Our synthetic RM grids emulate the specifications of current and upcoming surveys: the NRAO VLA Sky Survey (NVSS), the Polarisation Sky Survey of the Universe's Magnetism (POSSUM), and a future Square Kilometre Array (SKA1-mid) polarisation survey. It has been suggested that magnetic fields regulate the survival of high-velocity clouds. However, only a small number of magnetised clouds have been detected observationally thus far. In the first part of the paper, we test conditions for the detection of magnetised circumgalactic clouds. Based on the synthetic RM samplings of clouds in the simulations, we predict that upcoming polarimetric surveys will open opportunities for the detection of even low-mass and distant clouds. In the second part of the paper, we investigate the imprint of the CGM in the all-sky RM distribution. We test whether the RM variation produced by the CGM is correlated with global galaxy properties, such as distance to a satellite, specific star formation rate, neutral hydrogen covering fraction, and accretion rate onto the supermassive black hole. We argue that the observed fluctuation in RM measurements on scales smaller than 1 degree, which has been considered an indication of intergalactic magnetic fields, might in fact include a significant contribution from the Milky Way CGM.
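A synthetic RM for one sightline is a discretized line-of-sight integral, RM = 0.812 ∫ n_e B_∥ dl, in rad m⁻² when n_e is in cm⁻³, B_∥ in μG and dl in pc. The sketch below sums this over cells along a sightline; the profiles used are hypothetical toy numbers, not TNG50 data, and the sign convention (B_∥ > 0 pointing toward the observer) is the usual one.

```python
def rotation_measure(n_e_cm3, b_par_ug, dl_pc):
    """RM = 0.812 * sum(n_e * B_par * dl) in rad/m^2.

    n_e_cm3: thermal electron density per cell (cm^-3)
    b_par_ug: line-of-sight field per cell (microgauss, positive toward observer)
    dl_pc: path length through each cell (pc)
    """
    if not (len(n_e_cm3) == len(b_par_ug) == len(dl_pc)):
        raise ValueError("all profiles must have the same number of cells")
    return 0.812 * sum(n * b * dl for n, b, dl in zip(n_e_cm3, b_par_ug, dl_pc))


# Hypothetical uniform slab: 1 kpc of gas with n_e = 1e-3 cm^-3 and B_par = 0.5 uG.
rm_slab = rotation_measure([1e-3] * 100, [0.5] * 100, [10.0] * 100)  # about 0.406 rad/m^2
```

Because the integrand is signed, field reversals along the path partially cancel, which is one reason CGM contributions to RM can be small or can fluctuate on small angular scales.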
Submitted 16 September, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data
Authors:
Qing Xu,
Min Wu,
Xiaoli Li,
Kezhi Mao,
Zhenghua Chen
Abstract:
For many real-world time series tasks, the computational complexity of prevalent deep learning models often hinders their deployment in resource-limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between the model training (source) and deployment (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some existing works have already explored cross-domain knowledge distillation for model compression, they are either biased toward source data or heavily entangled between source and target data. To this end, we design a novel end-to-end framework called Universal and joint knowledge distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align the teacher's and student's representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks.
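The logit-level transfer builds on the standard distillation objective: the KL divergence between temperature-softened teacher and student distributions, scaled by T². The sketch below shows only that generic building block for a single sample; UNI-KD's adversarial discriminators and its weighting of domain-shared samples are not reproduced here, and the temperature value is an arbitrary assumption.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(p_teacher || p_student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl
```

The T² factor keeps the gradient magnitude of this soft-target term roughly independent of the chosen temperature, so it can be mixed with a hard-label loss without retuning weights.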
Submitted 6 July, 2023;
originally announced July 2023.
-
Internal Contrastive Learning for Generalized Out-of-distribution Fault Diagnosis (GOOFD) Framework
Authors:
Xingyue Wang,
Hanrong Zhang,
Ke Ma,
Shuting Tao,
Peng Peng,
Hongwei Wang
Abstract:
Fault diagnosis is essential in industrial processes for monitoring the conditions of important machines. With the ever-increasing complexity of working conditions and demand for safety during production and operation, different diagnosis methods are required, and more importantly, an integrated fault diagnosis system that can cope with multiple tasks is highly desired. However, the diagnosis subtasks are often studied separately, and the currently available methods still need improvement for such a generalized system. To address this issue, we propose the Generalized Out-of-distribution Fault Diagnosis (GOOFD) framework to integrate diagnosis subtasks, such as fault detection, fault classification, and novel fault diagnosis. Additionally, a unified fault diagnosis method based on internal contrastive learning is put forward to underpin the proposed generalized framework. The method extracts features utilizing the internal contrastive learning technique and then recognizes the outliers based on the Mahalanobis distance. Experiments are conducted on a simulated benchmark dataset as well as two practical process datasets to evaluate the proposed framework. As demonstrated in the experiments, the proposed method achieves better performance compared with several existing techniques and thus verifies the effectiveness of the proposed framework.
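The outlier-recognition step can be sketched as follows: fit a mean and covariance to in-distribution feature vectors, then score new samples by Mahalanobis distance. This is a minimal sketch that assumes features are already extracted (the internal contrastive learning encoder is omitted); the tiny `train` set and the max-distance threshold are hypothetical choices for illustration.

```python
import math


def fit_gaussian(features):
    """Mean vector and (slightly regularized) sample covariance of feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in features) / (n - 1)
            for b in range(d)] for a in range(d)]
    for a in range(d):
        cov[a][a] += 1e-6  # small ridge keeps the matrix invertible
    return mean, cov


def _solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    d = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(d):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][d] / aug[i][i] for i in range(d)]


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)), without forming the inverse explicitly."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return math.sqrt(sum(a * b for a, b in zip(diff, _solve(cov, diff))))


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
mean, cov = fit_gaussian(train)
threshold = max(mahalanobis(f, mean, cov) for f in train)  # hypothetical decision rule
```

A sample whose distance exceeds the threshold would be flagged as a fault or a novel fault type. Using the covariance rather than plain Euclidean distance accounts for correlated and differently scaled feature dimensions.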
Submitted 27 June, 2023;
originally announced June 2023.
-
A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems
Authors:
Shilpika,
Bethany Lusch,
Murali Emani,
Filippo Simini,
Venkatram Vishwanath,
Michael E. Papka,
Kwan-Liu Ma
Abstract:
The ability to monitor and interpret hardware system events and behaviors is crucial to improving the robustness and reliability of these systems, especially in a supercomputing facility. The growing complexity and scale of these systems demand an increase in monitoring data collected at multiple fidelity levels and varying temporal resolutions. In this work, we aim to build a holistic analytical system that helps make sense of such massive data, mainly the hardware logs, job logs, and environment logs collected from disparate subsystems and components of a supercomputer system. This end-to-end log analysis system, coupled with visual analytics support, allows users to glean and promptly extract supercomputer usage and error patterns at varying temporal and spatial resolutions. We use multiresolution dynamic mode decomposition (mrDMD), a technique that depicts high-dimensional data as correlated spatial-temporal variation patterns or modes, to extract variation patterns isolated at specified frequencies. Our improvements to the mrDMD algorithm help promptly reveal useful information in the massive environment log dataset, which is then associated with the processed hardware and job log datasets using our visual analytics system. Furthermore, our system can identify the usage and error patterns filtered at user, project, and subcomponent levels. We exemplify the effectiveness of our approach with two use scenarios with the Cray XC40 supercomputer.
Submitted 15 June, 2023;
originally announced June 2023.
-
Locally controlled arrested thermalization
Authors:
Ken K. W. Ma,
Hitesh J. Changlani
Abstract:
The long-time dynamics of quantum systems typically, but not always, results in a thermal steady state. The microscopic processes that lead to or circumvent this fate are of interest, since everyday experience tells us that not all spatial regions of a system heat up or cool down uniformly. This motivates the question: under what conditions can one slow down or completely arrest thermalization locally? Is it possible to construct realistic Hamiltonians and initial states such that a local region is effectively insulated from the rest, or acts like a barrier between two or more regions? We answer this in the affirmative by outlining the conditions that govern the flow of energy and entropy between subsystems. Using these ideas we provide a representative example for how simple few-body states can be used to engineer a "thermal switch" between interacting regions.
Submitted 9 May, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
-
Tailoring Exciton Dynamics in TMDC Heterobilayers in the Quantum Plasmonic Regime
Authors:
Mahfujur Rahaman,
Gwangwoo Kim,
Kyung Yeol Ma,
Seunguk Song,
Hyeon Suk Shin,
Deep Jariwala
Abstract:
Control of excitons in transition metal dichalcogenides (TMDCs) and their heterostructures is fundamentally interesting for tailoring light-matter interactions and exploring their potential applications in high-efficiency optoelectronic and nonlinear photonic devices. While both intra- and interlayer excitons in TMDCs have been heavily studied, their behavior in the quantum tunneling regime, in which the TMDC or its heterostructure is optically excited and concurrently serves as a tunnel junction barrier, remains unexplored. Here, using the degree of freedom of a metallic probe in an atomic force microscope, we investigated both intralayer and interlayer exciton dynamics in TMDC heterobilayers via locally controlled junction current in a finely tuned sub-nanometer tip-sample cavity. Our tip-enhanced photoluminescence measurements reveal significantly different exciton-quantum plasmon coupling for intralayer and interlayer excitons due to the different dipole orientations of the respective electron-hole pairs. Using a steady-state rate equation fit, we extracted field gradients and radiative and nonradiative relaxation rates for excitons in the quantum tunneling regime with and without junction current. Our results show that tip-induced radiative (nonradiative) relaxation of intralayer (interlayer) excitons becomes dominant in the quantum tunneling regime due to the Purcell effect. These findings have important implications for near-field probing of excitonic materials in the strong-coupling regime.
Submitted 9 June, 2023;
originally announced June 2023.
-
NFTVis: Visual Analysis of NFT Performance
Authors:
Fan Yan,
Xumeng Wang,
Ketian Mao,
Wei Zhang,
Wei Chen
Abstract:
A non-fungible token (NFT) is a data unit stored on the blockchain. Nowadays, more and more investors and collectors (NFT traders), who participate in transactions of NFTs, have an urgent need to assess the performance of NFTs. However, NFT traders face two challenges when analyzing NFT performance. First, current rarity models have flaws and are sometimes not convincing. In addition, NFT performance depends on multiple factors, such as images (high-dimensional data), historical transactions (network), and market evolution (time series). It is difficult to consider all of these factors comprehensively and analyze NFT performance efficiently. To address these challenges, we propose NFTVis, a visual analysis system that facilitates assessing individual NFT performance. A new NFT rarity model is proposed to quantify NFTs with images. Four well-coordinated views are designed to represent the various factors affecting the performance of the NFT. Finally, we evaluate the usefulness and effectiveness of our system using two case studies and user studies.
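For context on the "current rarity models" the abstract criticizes, a commonly used trait-frequency score sums, over a token's traits, the inverse of the share of the collection that has the same trait value. The sketch below implements that generic score on a hypothetical four-token collection; it is not NFTVis's proposed image-based rarity model.

```python
from collections import Counter


def trait_rarity_scores(collection):
    """Score each token as sum over its traits of n / count(trait, value).

    collection: list of dicts mapping trait name -> trait value, one per token.
    """
    n = len(collection)
    counts = Counter((trait, value) for traits in collection for trait, value in traits.items())
    return [sum(n / counts[(t, v)] for t, v in traits.items()) for traits in collection]


tokens = [  # hypothetical collection
    {"background": "blue", "hat": "cap"},
    {"background": "blue", "hat": "cap"},
    {"background": "blue", "hat": "crown"},
    {"background": "red", "hat": "cap"},
]
scores = trait_rarity_scores(tokens)  # [8/3, 8/3, 16/3, 16/3]
```

The example also hints at why such scores can be unconvincing: the score looks only at categorical metadata, so two visually very different tokens can receive identical rarity, which is part of what an image-aware model aims to address.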
Submitted 5 June, 2023;
originally announced June 2023.
-
Learning to Relate to Previous Turns in Conversational Search
Authors:
Fengran Mo,
Jian-Yun Nie,
Kaiyu Huang,
Kelong Mao,
Yutao Zhu,
Peng Li,
Yang Liu
Abstract:
Conversational search allows a user to interact with a search system in multiple turns. A query is strongly dependent on the conversation context. An effective way to improve retrieval effectiveness is to expand the current query with historical queries. However, not all previous queries are related to, or useful for expanding, the current query. In this paper, we propose a new method to select relevant historical queries that are useful for the current query. To cope with the lack of labeled training data, we use a pseudo-labeling approach to annotate useful historical queries based on their impact on the retrieval results. The pseudo-labeled data are used to train a selection model. We further propose a multi-task learning framework to jointly train the selector and the retriever during fine-tuning, allowing us to mitigate the possible inconsistency between the pseudo labels and the changed retriever. Extensive experiments on four conversational search datasets demonstrate the effectiveness and broad applicability of our method compared with several strong baselines.
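The pseudo-labeling idea, labeling a historical query as useful when it improves retrieval for the current turn, can be sketched as follows. This is a toy illustration, not the paper's implementation: the term-overlap retriever, the reciprocal-rank criterion, and all function names are hypothetical stand-ins for the actual retriever and metric.

```python
from typing import Dict, List


def overlap_score(query: str, doc: str) -> int:
    """Toy lexical retriever: number of shared lowercase terms."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def reciprocal_rank(query: str, corpus: Dict[str, str], gold_id: str) -> float:
    """Rank documents by score (ties keep corpus order) and return 1/rank of the gold one."""
    ranked = sorted(corpus, key=lambda d: overlap_score(query, corpus[d]), reverse=True)
    return 1.0 / (ranked.index(gold_id) + 1)


def pseudo_label(current: str, history: List[str], corpus: Dict[str, str], gold_id: str) -> List[int]:
    """Label a historical query 1 if expanding the current query with it improves the gold rank."""
    base = reciprocal_rank(current, corpus, gold_id)
    return [int(reciprocal_rank(f"{h} {current}", corpus, gold_id) > base) for h in history]


corpus = {
    "d1": "python list sorting tutorial",
    "d2": "jaguar car top speed",
    "d3": "jaguar animal speed habitat",
}
print(pseudo_label("what is its top speed", ["jaguar animal habitat", "best python tutorial"], corpus, "d3"))
# [1, 0]
```

The first historical turn lifts the gold document from rank 2 to rank 1 and is labeled useful. The second pulls an unrelated document ahead and is labeled not useful. Labels produced this way would then supervise a selection model.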
Submitted 4 June, 2023;
originally announced June 2023.
-
A Study of Situational Reasoning for Traffic Understanding
Authors:
Jiarui Zhang,
Filip Ilievski,
Kaixin Ma,
Aravinda Kollaa,
Jonathan Francis,
Alessandro Oltramari
Abstract:
Intelligent Traffic Monitoring (ITMo) technologies hold the potential for improving road safety/security and for enabling smart city infrastructure. Understanding traffic situations requires a complex fusion of perceptual information with domain-specific and causal commonsense knowledge. Whereas prior work has provided benchmarks and methods for traffic monitoring, it remains unclear whether models can effectively align these information sources and reason in novel scenarios. To address this assessment gap, we devise three novel text-based tasks for situational reasoning in the traffic domain: i) BDD-QA, which evaluates the ability of Language Models (LMs) to perform situational decision-making, ii) TV-QA, which assesses LMs' abilities to reason about complex event causality, and iii) HDT-QA, which evaluates the ability of models to solve human driving exams. We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work, based on natural language inference, commonsense knowledge-graph self-supervision, multi-QA joint training, and dense retrieval of domain information. We associate each method with a relevant knowledge source, including knowledge graphs, relevant benchmarks, and driving manuals. In extensive experiments, we benchmark various knowledge-aware methods against the three datasets, under zero-shot evaluation; we provide in-depth analyses of model performance on data partitions and examine model predictions categorically, to yield useful insights on traffic understanding, given different background knowledge and reasoning strategies.
Submitted 15 July, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Conceptual Design Generation Using Large Language Models
Authors:
Kevin Ma,
Daniele Grandi,
Christopher McComb,
Kosa Goucher-Lambert
Abstract:
Concept generation is a creative step in the conceptual design phase, where designers often turn to brainstorming, mindmapping, or crowdsourcing design ideas to complement their own knowledge of the domain. Recent advances in natural language processing (NLP) and machine learning (ML) have led to the rise of Large Language Models (LLMs) capable of generating seemingly creative outputs from textual prompts. The success of these models has led to their integration and application across a variety of domains, including art, entertainment, and other creative work. In this paper, we leverage LLMs to generate solutions for a set of 12 design problems and compare them to a baseline of crowdsourced solutions. We evaluate the differences between generated and crowdsourced design solutions through multiple perspectives, including human expert evaluations and computational metrics. Expert evaluations indicate that the LLM-generated solutions have higher average feasibility and usefulness while the crowdsourced solutions have more novelty. We experiment with prompt engineering and find that leveraging few-shot learning can lead to the generation of solutions that are more similar to the crowdsourced solutions. These findings provide insight into the quality of design solutions generated with LLMs and begin to evaluate prompt engineering techniques that could be leveraged by practitioners to generate higher-quality design solutions synergistically with LLMs.
Submitted 30 May, 2023;
originally announced June 2023.
-
Electronic structure of few-layer black phosphorus from $μ$-ARPES
Authors:
Florian Margot,
Simone Lisi,
Irène Cucchi,
Edoardo Cappelli,
Andrew Hunter,
Ignacio Gutiérrez-Lezama,
KeYuan Ma,
Fabian von Rohr,
Christophe Berthod,
Francesco Petocchi,
Samuel Poncé,
Nicola Marzari,
Marco Gibertini,
Anna Tamai,
Alberto F. Morpurgo,
Felix Baumberger
Abstract:
Black phosphorus (BP) stands out among two-dimensional (2D) semiconductors because of its high mobility and thickness-dependent direct band gap. However, the quasiparticle band structure of ultrathin BP has remained inaccessible to experiment thus far. Here we use a recently developed laser-based micro-focus angle-resolved photoemission ($μ$-ARPES) system to establish the electronic structure of 2-9 layer BP from experiment. Our measurements unveil ladders of anisotropic, quantized subbands at energies that deviate from the scaling observed in conventional semiconductor quantum wells. We quantify the anisotropy of the effective masses and determine universal tight-binding parameters which provide an accurate description of the electronic structure for all thicknesses.
Submitted 1 June, 2023;
originally announced June 2023.
-
Associated Production of Neutrino and Dark Fermion at Future Lepton Colliders
Authors:
Shao-Feng Ge,
Kai Ma,
Xiao-Dong Ma,
Jie Sheng
Abstract:
Fermionic dark matter can be pair-produced and hence searched for via missing energy at colliders. We extend such probes to the associated production of a neutrino and a dark sector fermion at future $e^+ e^-$ colliders such as CEPC, FCC-ee, ILC, and CLIC. Two typical processes, mono-photon production and electron-positron pair production associated with missing energy, can serve this purpose. While the mono-photon search prevails at CEPC, FCC-ee, and ILC, the $e^+ e^-$ + missing-energy channel contributes more significantly at CLIC with its much higher collision energy $\sqrt s$. Beam polarization can further suppress the SM backgrounds to enhance the signal significance, while differential cross sections can distinguish the Lorentz structures of various effective operators. The combined sensitivity can reach well above 1\,TeV at CEPC/FCC-ee and ILC, and further reaches 30\,TeV at CLIC. Compared with the updated results from the direct detection experiments (XENON1T, PandaX-II, PandaX-4T, LZ, and XENONnT), astrophysical $X/γ$-ray observations, and cosmological constraints for sub-MeV absorption dark matter, the collider searches are actually more sensitive and hence can provide a complementary approach to probing the dark fermions.
Submitted 21 September, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast
Authors:
Guofan Fan,
Zekun Qi,
Wenkai Shi,
Kaisheng Ma
Abstract:
Geometry and color information provided by point clouds are both crucial for 3D scene understanding. These two pieces of information characterize different aspects of point clouds, but existing methods lack an elaborate design for their discrimination and relevance. Hence we explore a 3D self-supervised paradigm that can better utilize the relations of point cloud information. Specifically, we propose a universal 3D scene pre-training framework via Geometry-Color Contrast (Point-GCC), which aligns geometry and color information using a Siamese network. To serve practical application tasks, we design (i) hierarchical supervision with point-level contrast and reconstruction, and object-level contrast based on a novel deep clustering module, to close the gap between pre-training and downstream tasks; and (ii) an architecture-agnostic backbone that adapts to various downstream models. Benefiting from the object-level representation associated with downstream tasks, Point-GCC can directly evaluate model performance, and the results demonstrate the effectiveness of our methods. Transfer learning results on a wide range of tasks also show consistent improvements across all datasets, e.g., new state-of-the-art object detection results on the SUN RGB-D and S3DIS datasets. Code will be released at https://github.com/Asterisci/Point-GCC.
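The abstract does not give the form of the point-level geometry-color contrast. A common choice for aligning two views of the same points is an InfoNCE-style loss, sketched below under that assumption. Row i of the geometry features and row i of the color features are treated as a positive pair, and every other row serves as a negative. The function names and the temperature of 0.1 are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a feature vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(geo: List[Vector], col: List[Vector], temperature: float = 0.1) -> float:
    """One-directional InfoNCE (geometry -> color) with matching indices as positives."""
    g = [normalize(v) for v in geo]
    c = [normalize(v) for v in col]
    loss = 0.0
    for i, gi in enumerate(g):
        logits = [sum(a * b for a, b in zip(gi, cj)) / temperature for cj in c]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]  # cross-entropy with the positive at index i
    return loss / len(g)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)  # aligned is near 0, shuffled is about 10
```

A symmetric variant would average this loss with the color-to-geometry direction. In a Siamese setup, both feature sets would come from branches with shared weights.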
Submitted 1 June, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
RMTable2023 and PolSpectra2023: standards for reporting polarization and Faraday rotation measurements of radio sources
Authors:
C. L. Van Eck,
B. M. Gaensler,
S. Hutschenreuter,
J. Livingston,
Y. K. Ma,
C. J. Riseley,
A. J. M. Thomson,
B. Adebahr,
A. Basu,
M. Birkinshaw,
T. A. Ensslin,
G. Heald,
S. A. Mao,
N. M. McClure-Griffiths
Abstract:
Faraday rotation measures (RMs) have been used for many studies of cosmic magnetism, and in most cases having more RMs is beneficial for those studies. This has led to the development of RM surveys that have produced large catalogs, as well as meta-catalogs collecting RMs from many different publications. However, it has been difficult to take full advantage of all these RMs, as the individual catalogs have been published in many different places and in many different formats. In addition, the polarization spectra used to determine these RMs are rarely published, limiting the ability to re-analyze data as new methods or additional observations become available.
We propose a standard convention for RM catalogs, RMTable2023, and a standard for source-integrated polarized spectra of radio sources, PolSpectra2023. These standards are intended to maximize the value and utility of these data for researchers and to make them easier to access. To demonstrate the use of the RMTable2023 standard, we have produced a consolidated catalog of 55 819 RMs collected from 42 published catalogs.
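The RMs collected in such catalogs are derived from the linear relation between polarization angle and wavelength squared, χ(λ²) = χ₀ + RM·λ². The sketch below recovers an RM from a synthetic Stokes Q/U spectrum by ordinary least squares. It assumes the angles do not wrap by multiples of π across the band; real pipelines often use RM synthesis or QU-fitting to handle that ambiguity. It does not follow the RMTable2023 or PolSpectra2023 column definitions, and all names here are illustrative.

```python
import math
from typing import List, Tuple

C = 299_792_458.0  # speed of light in m/s


def polarization_angle(q: float, u: float) -> float:
    """Polarization angle chi = 0.5 * atan2(U, Q), in radians."""
    return 0.5 * math.atan2(u, q)


def fit_rm(lambda_sq: List[float], chi: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of chi = chi0 + RM * lambda^2; returns (RM, chi0)."""
    n = len(lambda_sq)
    mx = sum(lambda_sq) / n
    my = sum(chi) / n
    sxx = sum((x - mx) ** 2 for x in lambda_sq)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lambda_sq, chi))
    rm = sxy / sxx
    return rm, my - rm * mx


# Synthetic source: RM = 10 rad/m^2, chi0 = 0.3 rad, observed at 1.2-1.5 GHz.
freqs = [1.2e9, 1.3e9, 1.4e9, 1.5e9]
lam_sq = [(C / f) ** 2 for f in freqs]
true_rm, true_chi0 = 10.0, 0.3
q = [math.cos(2 * (true_chi0 + true_rm * l2)) for l2 in lam_sq]
u = [math.sin(2 * (true_chi0 + true_rm * l2)) for l2 in lam_sq]
chi = [polarization_angle(qi, ui) for qi, ui in zip(q, u)]
rm, chi0 = fit_rm(lam_sq, chi)
print(round(rm, 6), round(chi0, 6))  # 10.0 0.3
```

Publishing the per-channel Q and U values, as PolSpectra2023 proposes, is what allows such a fit to be redone later with a different method.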
Submitted 25 May, 2023;
originally announced May 2023.
-
ConvGQR: Generative Query Reformulation for Conversational Search
Authors:
Fengran Mo,
Kelong Mao,
Yutao Zhu,
Yihong Wu,
Kaiyu Huang,
Jian-Yun Nie
Abstract:
In conversational search, the user's real search intent for the current turn is dependent on the previous conversation history. It is challenging to determine a good search query from the whole conversation context. To avoid the expensive re-training of the query encoder, most existing methods try to learn a rewriting model to de-contextualize the current query by mimicking the manual query rewriting. However, manually rewritten queries are not always the best search queries. Training a rewriting model on them would limit the model's ability to produce good search queries. Another useful hint is the potential answer to the question. In this paper, we propose ConvGQR, a new framework to reformulate conversational queries based on generative pre-trained language models (PLMs), one for query rewriting and another for generating potential answers. By combining both, ConvGQR can produce better search queries. In addition, to relate query reformulation to retrieval performance, we propose a knowledge infusion mechanism to optimize both query reformulation and retrieval. Extensive experiments on four conversational search datasets demonstrate the effectiveness of ConvGQR.
Submitted 27 January, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study
Authors:
Muzhou Yu,
Linfeng Zhang,
Kaisheng Ma
Abstract:
The excellent performance of deep neural networks usually comes with a large number of parameters and computations, which limits their use on resource-limited edge devices. To address this issue, many methods such as pruning, quantization and knowledge distillation have been proposed to compress neural networks and have achieved significant breakthroughs. However, most of these compression methods focus on the architecture or the training method of neural networks but ignore the influence of data augmentation. In this paper, we revisit the usage of data augmentation in model compression and give a comprehensive study on the relation between model sizes and their optimal data augmentation policies. To sum up, we have three main observations: (A) Models of different sizes prefer data augmentation with different magnitudes. Hence, in iterative pruning, data augmentation with varying magnitudes leads to better performance than data augmentation with a constant magnitude. (B) Data augmentation with a high magnitude may significantly improve the performance of large models but harm the performance of small models. Fortunately, small models can still benefit from strong data augmentation by first learning it with "additional parameters" and then discarding these "additional parameters" during inference. (C) The prediction of a pre-trained large model can be used to measure the difficulty of data augmentation, and thus serve as a criterion for designing better data augmentation policies. We hope this paper may promote more research on the usage of data augmentation in model compression.
Submitted 22 May, 2023;
originally announced May 2023.
-
An Ensemble Learning Approach for Exercise Detection in Type 1 Diabetes Patients
Authors:
Ke Ma,
Hongkai Chen,
Shan Lin
Abstract:
Type 1 diabetes is a serious disease in which individuals are unable to regulate their blood glucose levels, leading to various medical complications. Artificial pancreas (AP) systems have been developed as a solution for type 1 diabetic patients to mimic the behavior of the pancreas and regulate blood glucose levels. However, current AP systems lack detection capabilities for exercise-induced glucose intake, which can last 4 to 8 hours. This limitation can lead to hypoglycemia, which, if left untreated, could have serious consequences, including death. Existing exercise detection methods are either limited to single-sensor data or use inaccurate models for exercise detection, making them less effective in practice. In this work, we propose an ensemble learning framework that combines a data-driven physiological model and a Siamese network to leverage multiple physiological signal streams for exercise detection with high accuracy. To evaluate the effectiveness of our proposed approach, we used a public dataset of 12 diabetic patients collected in an 8-week clinical trial. Our approach achieves a true positive rate for exercise detection of 86.4% and a true negative rate of 99.1%, outperforming state-of-the-art solutions.
Submitted 11 May, 2023;
originally announced May 2023.
-
Deep Learning Empowered Type-II Codebook: New Paradigm for Enhancing CSI Feedback
Authors:
Ke Ma,
Yiliang Sang,
Yang Ming,
** Lian,
Chang Tian,
Zhaocheng Wang
Abstract:
Deep learning based channel state information (CSI) feedback in frequency division duplex systems has drawn much attention in both academia and industry. In this paper, we focus on integrating the Type-II codebook in the beyond fifth-generation (B5G) wireless systems with deep learning to enhance the performance of CSI feedback. In contrast to its counterpart in Release 16, the Type-II codebook in Release 17 (R17) exploits the angular-delay-domain partial reciprocity between uplink and downlink channels and selects part of angular-delay-domain ports for measuring and feeding back the downlink CSI, where the performance of the conventional deep learning methods is limited due to the deficiency of sparse structures. To address this issue, we propose the new paradigm of adopting deep learning to improve the performance of R17 Type-II codebook. Firstly, considering the relatively low signal-to-noise ratio of uplink channels, deep learning is utilized to refine the selection of the dominant angular-delay-domain ports, where the focal loss is harnessed to solve the class imbalance problem. Secondly, we propose to reconstruct the downlink CSI by way of deep learning based on the feedback of R17 Type-II codebook at the base station, where the information of sparse structures can be effectively leveraged. Finally, a weighted shortcut module is designed to facilitate the accurate reconstruction, and a two-stage loss function with the combination of the mean squared error and sum rate is proposed for adapting to actual multi-user scenarios. Simulation results demonstrate that our proposed angular-delay-domain port selection and CSI reconstruction paradigm can improve the sum rate performance by more than 10% compared with the traditional R17 Type-II codebook and deep learning benchmarks.
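Port selection can be framed as per-port binary classification (dominant or not). Because only a few angular-delay-domain ports are dominant, the classes are imbalanced. The sketch below shows the standard focal loss of Lin et al., -α_t(1 - p_t)^γ log(p_t), which down-weights easy, confidently classified examples. Treating each port as an independent binary label, the function names, and the commonly used defaults α = 0.25 and γ = 2 are assumptions here, not details taken from the paper.

```python
import math
from typing import Sequence


def binary_focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Focal loss for one binary prediction: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def mean_focal_loss(probs: Sequence[float], labels: Sequence[int], alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Average over candidate ports (hypothetical framing: label 1 means the port is dominant)."""
    return sum(binary_focal_loss(p, y, alpha, gamma) for p, y in zip(probs, labels)) / len(probs)


# An easy, confident negative contributes far less than under plain cross-entropy.
ce = -math.log(0.9)
fl = binary_focal_loss(0.1, 0, alpha=0.5, gamma=2.0)
print(fl / ce)  # roughly 0.005, i.e. 0.5 * (1 - 0.9) ** 2
```

With γ = 0, the loss reduces to α-weighted cross-entropy. Increasing γ shifts the training signal toward the rare, harder ports.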
Submitted 30 May, 2023; v1 submitted 14 May, 2023;
originally announced May 2023.
-
A Deep Dive into NFT Rug Pulls
Authors:
**tao Huang,
Ningyu He,
Kai Ma,
Jiang Xiao,
Haoyu Wang
Abstract:
An NFT rug pull is one of the most prominent types of scam, in which the developers of a project abandon it and run away with investors' funds. Although rug pulls have drawn attention from our community, to the best of our knowledge, NFT rug pulls have not been systematically explored. To fill this void, this paper presents the first in-depth study of NFT rug pulls. Specifically, we first compile a list of 253 known NFT rug pulls as our initial ground truth, based on which we perform a pilot study highlighting the key symptoms of NFT rug pulls. Then, we apply a strict rule-based method to flag more rug-pulled NFT projects in the wild, and have labelled 7,487 NFT rug pulls as our extended ground truth. Atop it, we investigate the art of NFT rug pulls, with various tricks including explicit ones that are embedded with backdoors and implicit ones that manipulate the market. To curb the spread of the scam, we further design a prediction model to proactively identify potential rug pull projects at an early stage, before the scam happens. We have implemented a prototype system deployed in the real-world setting for over 5 months. By the time of this writing, our system has raised alarms for 7,821 NFT projects; it can work as a whistleblower that pinpoints rug pull scams in a timely manner, thus mitigating their impacts.
Submitted 10 May, 2023;
originally announced May 2023.
-
Knowledge-enhanced Agents for Interactive Text Games
Authors:
Prateek Chhikara,
Jiarui Zhang,
Filip Ilievski,
Jonathan Francis,
Kaixin Ma
Abstract:
Communication via natural language is a key aspect of machine intelligence, and it requires computational models to learn and reason about world concepts, with varying levels of supervision. Significant progress has been made on fully-supervised non-interactive tasks, such as question-answering and procedural text understanding. Yet, various sequential interactive tasks, as in text-based games, have revealed limitations of existing approaches in terms of coherence, contextual awareness, and their ability to learn effectively from the environment. In this paper, we propose a knowledge-injection framework for improved functional grounding of agents in text-based games. Specifically, we consider two forms of domain knowledge that we inject into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment. Our framework supports two representative model classes: reinforcement learning agents and language model agents. Furthermore, we devise multiple injection strategies for the above domain knowledge types and agent architectures, including injection via knowledge graphs and augmentation of the existing input encoding strategies. We experiment with four models on the 10 tasks in the ScienceWorld text-based game environment, to illustrate the impact of knowledge injection on various model configurations and challenging task settings. Our findings provide crucial insights into the interplay between task properties, model architectures, and domain knowledge for interactive contexts.
Submitted 16 December, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Chain-of-Skills: A Configurable Model for Open-domain Question Answering
Authors:
Kaixin Ma,
Hao Cheng,
Yu Zhang,
Xiaodong Liu,
Eric Nyberg,
Jianfeng Gao
Abstract:
The retrieval model is an indispensable component for real-world knowledge-intensive tasks, e.g., open-domain question answering (ODQA). As separate retrieval skills are annotated for different datasets, recent work focuses on customized methods, limiting the model transferability and scalability. In this work, we propose a modular retriever where individual modules correspond to key skills that can be reused across datasets. Our approach supports flexible skill configurations based on the target domain to boost performance. To mitigate task interference, we design a novel modularization parameterization inspired by sparse Transformer. We demonstrate that our model can benefit from self-supervised pretraining on Wikipedia and fine-tuning using multiple ODQA datasets, both in a multi-task fashion. Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.
Submitted 26 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization
Authors:
Mu Li,
Kanglong Fan,
Kede Ma
Abstract:
Predicting human scanpaths when exploring panoramic videos is a challenging task due to the spherical geometry and the multimodality of the input, and the inherent uncertainty and diversity of the output. Most previous methods fail to give a complete treatment of these characteristics, and thus are prone to errors. In this paper, we present a simple new criterion for scanpath prediction based on principles from lossy data compression. This criterion suggests minimizing the expected code length of quantized scanpaths in a training set, which corresponds to fitting a discrete conditional probability model via maximum likelihood. Specifically, the probability model is conditioned on two modalities: a viewport sequence as the deformation-reduced visual input and a set of relative historical scanpaths projected onto respective viewports as the aligned path input. The probability model is parameterized by a product of discretized Gaussian mixture models to capture the uncertainty and the diversity of scanpaths from different users. Most importantly, the training of the probability model does not rely on the specification of "ground-truth" scanpaths for imitation learning. We also introduce a proportional-integral-derivative (PID) controller-based sampler to generate realistic human-like scanpaths from the learned probability model. Experimental results demonstrate that our method consistently produces better quantitative scanpath results in terms of prediction accuracy (by comparing to the assumed "ground-truths") and perceptual realism (through machine discrimination) over a wide range of prediction horizons. We additionally verify the perceptual realism improvement via a formal psychophysical experiment and the generalization improvement on several unseen panoramic video datasets.
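The link between code length and likelihood can be made concrete for one coordinate. Under a discretized Gaussian mixture, a quantized value's probability is the mixture mass falling in its bin, and its ideal code length is -log₂ of that mass. Minimizing total code length over a training set is therefore maximum-likelihood fitting. The sketch below is a one-dimensional illustration with hypothetical names and parameters, not the paper's model. The abstract's product of discretized Gaussian mixtures would multiply such per-dimension probabilities, which adds their code lengths.

```python
import math
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_gmm_prob(x: float, weights: Sequence[float], means: Sequence[float],
                         sigmas: Sequence[float], bin_width: float) -> float:
    """Probability mass of the quantization bin centered at x under a 1-D Gaussian mixture."""
    lo, hi = x - bin_width / 2, x + bin_width / 2
    return sum(w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s)) for w, m, s in zip(weights, means, sigmas))


def code_length_bits(xs: Sequence[float], weights: Sequence[float], means: Sequence[float],
                     sigmas: Sequence[float], bin_width: float, eps: float = 1e-12) -> float:
    """Total code length -sum(log2 P(bin)) of quantized values xs (given as bin centers)."""
    return sum(-math.log2(max(discretized_gmm_prob(x, weights, means, sigmas, bin_width), eps)) for x in xs)


# Two-component mixture over one hypothetical scanpath coordinate, quantized into unit bins.
w, mu, sd = [0.7, 0.3], [0.0, 5.0], [1.0, 2.0]
total = sum(discretized_gmm_prob(float(k), w, mu, sd, 1.0) for k in range(-40, 41))
print(total)  # very close to 1.0: the bin masses form a proper discrete distribution
print(code_length_bits([0.0, 1.0, 5.0], w, mu, sd, 1.0))
```

Because each bin's mass is a difference of CDFs, the masses sum to one, so no "ground-truth" target is needed. The model is simply rewarded for assigning high probability to the observed quantized scanpaths.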
Submitted 4 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
CORSD: Class-Oriented Relational Self Distillation
Authors:
Muzhou Yu,
Sia Huat Tan,
Kailu Wu,
Runpei Dong,
Linfeng Zhang,
Kaisheng Ma
Abstract:
Knowledge distillation is an effective model compression method, but it has some limitations: (1) feature-based distillation methods focus only on distilling the feature map and fail to transfer the relations among data examples; (2) relational distillation methods are either limited to handcrafted functions for relation extraction, such as the L2 norm, or weak in inter- and intra-class relation modeling. Besides, the feature divergence of heterogeneous teacher-student architectures may lead to inaccurate transfer of relational knowledge. In this work, we propose a novel training framework named Class-Oriented Relational Self Distillation (CORSD) to address these limitations. Trainable relation networks are designed to extract relations from structured data input, and they enable the whole model to better classify samples by transferring relational knowledge from the deepest layer of the model to shallow layers. Besides, auxiliary classifiers are proposed to make the relation networks capture class-oriented relations that benefit the classification task. Experiments demonstrate that CORSD achieves remarkable improvements. Compared to the baseline, averaged accuracy boosts of 3.8%, 1.5% and 4.5% are observed on CIFAR100, ImageNet and CUB-200-2011, respectively.
Submitted 28 April, 2023;
originally announced May 2023.
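CORSD transfers relational knowledge from the deepest layer of a model to its shallow layers. As a simplified, hypothetical illustration, the sketch below replaces the paper's trainable relation networks with fixed cosine similarity (exactly the kind of handcrafted relation function the abstract argues against, used here only for brevity) and omits the auxiliary classifiers. It shows only how a batch-level relation matrix can serve as a distillation target; the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]

def cosine(u: Vector, v: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / max(nu * nv, eps)

def relation_matrix(features: List[Vector]) -> List[List[float]]:
    """Pairwise relations among the samples in a batch (cosine similarity here,
    a stand-in for CORSD's trainable relation networks)."""
    return [[cosine(fi, fj) for fj in features] for fi in features]

def relation_distillation_loss(shallow: List[Vector], deep: List[Vector]) -> float:
    """Mean squared error between shallow-layer and deepest-layer relation
    matrices; the deep relations play the role of the teacher signal."""
    rs, rd = relation_matrix(shallow), relation_matrix(deep)
    n = len(rs)
    return sum((rs[i][j] - rd[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

Because both relation matrices are batch-size by batch-size, the shallow and deep features need not share a dimension, which is one reason relational targets are convenient for layer-to-layer transfer.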
-
LCAUnet: A skin lesion segmentation network with enhanced edge and body fusion
Authors:
Qisen Ma,
Keming Mao,
Gao Wang,
Lisheng Xu,
Yuhai Zhao
Abstract:
Accurate segmentation of skin lesions in dermatoscopic images is crucial for the early diagnosis of skin cancer and for improving patients' survival rates. However, it remains a challenging task due to the irregularity of lesion areas, the fuzziness of boundaries, and other complex interference factors. In this paper, we propose LCAUnet, a novel network that improves complementary representation by fusing edge and body features, which traditional methods often pay little attention to. First, two separate branches are set up for edge and body segmentation, based on CNNs and a Transformer architecture, respectively. Then, an LCAF module fuses the edge and body feature maps of the same level through a local cross-attention operation in the encoder stage. Furthermore, a PGMF module is embedded for feature integration with prior-guided multi-scale adaptation. Comprehensive experiments on the publicly available datasets ISIC 2017, ISIC 2018, and PH2 demonstrate that LCAUnet outperforms most state-of-the-art methods. Ablation studies also verify the effectiveness of the proposed fusion techniques.
Submitted 1 May, 2023;
originally announced May 2023.
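The LCAF module is described only as fusing same-level edge and body feature maps through local cross-attention. The sketch below is a generic, hypothetical version of that operation, not the paper's module: body features act as queries, edge features act as keys and values within a (2r+1)×(2r+1) window clipped at the image border, there are no learned projections, and the residual addition in the illustrative `fuse_edge_body` helper is an assumption.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [height][width][channels]

def local_cross_attention(query_map: FeatureMap, kv_map: FeatureMap, radius: int) -> FeatureMap:
    """For every pixel, attend from query_map to kv_map inside a (2r+1)x(2r+1) window."""
    h, w, c = len(query_map), len(query_map[0]), len(query_map[0][0])
    scale = 1.0 / math.sqrt(c)  # scaled dot-product attention
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            q = query_map[i][j]
            # Key/value vectors from the local window, clipped at the border.
            neighbours = [kv_map[y][x]
                          for y in range(max(0, i - radius), min(h, i + radius + 1))
                          for x in range(max(0, j - radius), min(w, j + radius + 1))]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in neighbours]
            m = max(scores)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            row.append([sum(e * k[ch] for e, k in zip(exps, neighbours)) / z for ch in range(c)])
        out.append(row)
    return out

def fuse_edge_body(body: FeatureMap, edge: FeatureMap, radius: int = 1) -> FeatureMap:
    """Body features query edge features; the attended edge context is added back residually."""
    attended = local_cross_attention(body, edge, radius)
    return [[[b + a for b, a in zip(bp, ap)] for bp, ap in zip(brow, arow)]
            for brow, arow in zip(body, attended)]
```

Restricting attention to a local window keeps the cost proportional to the window size per pixel rather than to the full image, which is the usual motivation for local rather than global cross-attention.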
-
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
Authors:
Ronghuan Wu,
Wanchao Su,
Kede Ma,
**g Liao
Abstract:
Scalable Vector Graphics (SVG) is a popular vector image format that offers good support for interactivity and animation. Despite its appealing characteristics, creating custom SVG content can be challenging for users due to the steep learning curve required to understand SVG grammars or become familiar with professional editing software. Recent advancements in text-to-image generation have inspired researchers to explore vector graphics synthesis using either image-based methods (i.e., text -> raster image -> vector graphics), which combine text-to-image generation models with image vectorization, or language-based methods (i.e., text -> vector graphics script), which rely on pretrained large language models. However, these methods still suffer from limitations in generation quality, diversity, and flexibility. In this paper, we introduce IconShop, a text-guided vector icon synthesis method using autoregressive transformers. The key to the success of our approach is to sequentialize and tokenize SVG paths (and textual descriptions as guidance) into a uniquely decodable token sequence. With that, we are able to fully exploit the sequence learning power of autoregressive transformers, while enabling both unconditional and text-conditioned icon synthesis. Through standard training to predict the next token on a large-scale vector icon dataset accompanied by textual descriptions, the proposed IconShop consistently exhibits better icon synthesis capability than existing image-based and language-based methods, both quantitatively and qualitatively. Meanwhile, we observe a dramatic improvement in generation diversity, which is validated by the objective Uniqueness and Novelty measures. More importantly, we demonstrate the flexibility of IconShop with multiple novel icon synthesis tasks, including icon editing, icon interpolation, icon semantic combination, and icon design auto-suggestion.
Submitted 6 June, 2023; v1 submitted 27 April, 2023;
originally announced April 2023.
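IconShop's key step is turning SVG paths into a uniquely decodable token sequence. A minimal sketch, assuming a hypothetical vocabulary (two special tokens, one token per path command, and one token per point quantized to a 100×100 grid), is shown below; it is not IconShop's actual tokenizer and ignores the text tokens entirely. Unique decodability here follows from two properties: command and coordinate IDs occupy disjoint ranges, and each command takes a fixed number of points.

```python
from typing import List, Tuple

GRID = 100                                    # hypothetical: coordinates snapped to integers in [0, GRID)
SPECIAL = {"<bos>": 0, "<eos>": 1}            # hypothetical special-token IDs
COMMANDS = {"M": 2, "L": 3, "C": 4, "Z": 5}   # hypothetical command-token IDs
ARITY = {"M": 1, "L": 1, "C": 3, "Z": 0}      # number of (x, y) points each command takes
COORD_OFFSET = 6                              # coordinate tokens start after all command IDs
ID_TO_CMD = {v: k for k, v in COMMANDS.items()}

Command = Tuple[str, List[Tuple[float, float]]]

def quantize(v: float) -> int:
    """Snap a coordinate to the integer grid, clamping to [0, GRID - 1]."""
    return min(GRID - 1, max(0, round(v)))

def encode(path: List[Command]) -> List[int]:
    tokens = [SPECIAL["<bos>"]]
    for cmd, points in path:
        if len(points) != ARITY[cmd]:
            raise ValueError(f"{cmd} expects {ARITY[cmd]} points, got {len(points)}")
        tokens.append(COMMANDS[cmd])
        for x, y in points:
            # One token per point: row-major index on the GRID x GRID lattice.
            tokens.append(COORD_OFFSET + quantize(y) * GRID + quantize(x))
    tokens.append(SPECIAL["<eos>"])
    return tokens

def decode(tokens: List[int]) -> List[Tuple[str, List[Tuple[int, int]]]]:
    """Invert encode(); fixed arity plus disjoint ID ranges make the parse unique."""
    if len(tokens) < 2 or tokens[0] != SPECIAL["<bos>"] or tokens[-1] != SPECIAL["<eos>"]:
        raise ValueError("sequence must be wrapped in <bos> ... <eos>")
    path, i = [], 1
    while i < len(tokens) - 1:
        if tokens[i] not in ID_TO_CMD:
            raise ValueError(f"expected a command token at position {i}")
        cmd = ID_TO_CMD[tokens[i]]
        coord_tokens = tokens[i + 1 : i + 1 + ARITY[cmd]]
        if len(coord_tokens) != ARITY[cmd] or any(t < COORD_OFFSET for t in coord_tokens):
            raise ValueError(f"{cmd} at position {i} is missing coordinate tokens")
        points = [((t - COORD_OFFSET) % GRID, (t - COORD_OFFSET) // GRID) for t in coord_tokens]
        path.append((cmd, points))
        i += 1 + ARITY[cmd]
    return path
```

For example, encoding `[("M", [(10.2, 20.7)]), ("Z", [])]` and decoding the result gives back the quantized path `[("M", [(10, 21)]), ("Z", [])]`. A next-token transformer trained on such sequences only ever has to choose among these integer IDs.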