-
Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
Authors:
Hok-Shing Lau,
Mark Huntly,
Nathon Morgan,
Adesua Iyenoma,
Biao Zeng,
Tim Bashford
Abstract:
Speech contains clinically relevant information about some diseases and therefore has potential for use in health assessment. Recent work shows interest in applying deep learning algorithms, especially large pretrained speech models, to Automatic Speech Assessment. One question that has not been explored is how these models derive their outputs from their inputs. In this work, we train and compare two configurations of the Audio Spectrogram Transformer in the context of Voice Disorder Detection and apply the attention rollout method to produce model relevance maps, that is, the computed relevance of spectrogram regions when the model makes predictions. We use these maps to analyse how models make predictions under different conditions and to show that the spread of attention is reduced as a model is finetuned, and that the model's attention is concentrated on specific phoneme regions.
Submitted 29 June, 2024;
originally announced July 2024.
-
Factual Dialogue Summarization via Learning from Large Language Models
Authors:
Rongxin Zhu,
Jey Han Lau,
Jianzhong Qi
Abstract:
Factual consistency is an important quality in dialogue summarization. Large language model (LLM)-based automatic text summarization models generate more factually consistent summaries compared to those by smaller pretrained language models, but they face deployment challenges in real-world applications due to privacy or resource constraints. In this paper, we investigate the use of symbolic knowledge distillation to improve the factual consistency of smaller pretrained models for dialogue summarization. We employ zero-shot learning to extract symbolic knowledge from LLMs, generating both factually consistent (positive) and inconsistent (negative) summaries. We then apply two contrastive learning objectives on these summaries to enhance smaller summarization models. Experiments with BART, PEGASUS, and Flan-T5 indicate that our approach surpasses strong baselines that rely on complex data augmentation strategies. Our approach achieves better factual consistency while maintaining coherence, fluency, and relevance, as confirmed by various automatic evaluation metrics. We also provide access to the data and code to facilitate future research.
Submitted 20 June, 2024;
originally announced June 2024.
-
Evaluating Transparency of Machine Generated Fact Checking Explanations
Authors:
Rui Xing,
Timothy Baldwin,
Jey Han Lau
Abstract:
An important factor when it comes to generating fact-checking explanations is the selection of evidence: intuitively, high-quality explanations can only be generated given the right evidence. In this work, we investigate the impact of human-curated vs. machine-selected evidence for explanation generation using large language models. To assess the quality of explanations, we focus on transparency (whether an explanation cites sources properly) and utility (whether an explanation is helpful in clarifying a claim). Surprisingly, we found that large language models generate similar or higher quality explanations using machine-selected evidence, suggesting carefully curated evidence (by humans) may not be necessary. That said, even with the best model, the generated explanations are not always faithful to the sources, suggesting further room for improvement in explanation generation for fact-checking.
Submitted 18 June, 2024;
originally announced June 2024.
-
Sequential giant planet formation initiated by disc substructure
Authors:
Tommy Chi Ho Lau,
Til Birnstiel,
Joanna Drążkowska,
Sebastian Markus Stammler
Abstract:
Planet formation models are necessary to understand the origins of diverse planetary systems. Circumstellar disc substructures have been proposed as preferred locations of planet formation but a complete formation scenario has not been covered by a single model so far. We aim to study the formation of giant planets facilitated by disc substructure and starting with sub-micron-sized dust. We connect dust coagulation and drift, planetesimal formation, $N$-body gravity, pebble accretion, planet migration, planetary gas accretion and gap opening in one consistent modelling framework. We find rapid formation of multiple gas giants from the initial disc substructure. The migration trap near the substructure allows the formation of cold gas giants. A new pressure maximum is created at the outer edge of the planetary gap, which triggers the next generation of planet formation resulting in a compact chain of giant planets. A high planet formation efficiency is achieved as the first gas giants are effective in preventing dust from drifting further inwards, which preserves materials for planet formation. Sequential planet formation is a promising framework to explain the formation of chains of gas and ice giants.
Submitted 3 July, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
MDeRainNet: An Efficient Neural Network for Rain Streak Removal from Macro-pixel Images
Authors:
Tao Yan,
Weijiang He,
Chenglong Wang,
Xiangjie Zhu,
Yinghui Wang,
Rynson W. H. Lau
Abstract:
Since rainy weather always degrades image quality and poses significant challenges to most computer vision-based intelligent systems, image de-raining has been a hot research topic. Fortunately, in a rainy light field (LF) image, background obscured by rain streaks in one sub-view may be visible in the other sub-views, and implicit depth information and recorded 4D structural information may benefit rain streak detection and removal. However, existing LF image rain removal methods either do not fully exploit the global correlations of 4D LF data or only utilize partial sub-views, resulting in sub-optimal rain removal performance and unequal quality across the de-rained sub-views. In this paper, we propose an efficient network, called MDeRainNet, for rain streak removal from LF images. The proposed network adopts a multi-scale encoder-decoder architecture, which works directly on Macro-pixel images (MPIs) to improve the rain removal performance. To fully model the global correlation between the spatial and the angular information, we propose an Extended Spatial-Angular Interaction (ESAI) module to merge them, in which a simple and effective Transformer-based Spatial-Angular Interaction Attention (SAIA) block is also proposed for modeling long-range geometric correlations and making full use of the angular information. Furthermore, to improve the generalization performance of our network on real-world rainy scenes, we propose a novel semi-supervised learning framework for our MDeRainNet, which utilizes a multi-level KL loss to bridge the domain gap between features of synthetic and real-world rain streaks and introduces colored-residue image guided contrastive regularization to reconstruct rain-free images. Extensive experiments conducted on synthetic and real-world LF images demonstrate that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.
Submitted 15 June, 2024;
originally announced June 2024.
-
Save It for the "Hot" Day: An LLM-Empowered Visual Analytics System for Heat Risk Management
Authors:
Haobo Li,
Wong Kam-Kwai,
Yan Luo,
Juntong Chen,
Chengzhong Liu,
Yaxuan Zhang,
Alexis Kai Hon Lau,
Huamin Qu,
Dongyu Liu
Abstract:
The escalating frequency and intensity of heat-related climate events, particularly heatwaves, emphasize the pressing need for advanced heat risk management strategies. Current approaches, primarily relying on numerical models, face challenges in spatial-temporal resolution and in capturing the dynamic interplay of environmental, social, and behavioral factors affecting heat risks. This has led to difficulties in translating risk assessments into effective mitigation actions. Recognizing these problems, we introduce a novel approach leveraging the burgeoning capabilities of Large Language Models (LLMs) to extract rich and contextual insights from news reports. We hence propose an LLM-empowered visual analytics system, Havior, that integrates the precise, data-driven insights of numerical models with nuanced news report information. This hybrid approach enables a more comprehensive assessment of heat risks and better identification, assessment, and mitigation of heat-related threats. The system incorporates novel visualization designs, such as "thermoglyph" and news glyph, enhancing intuitive understanding and analysis of heat risks. The integration of LLM-based techniques also enables advanced information retrieval and semantic knowledge extraction that can be guided by experts' analytics needs. Our case studies on two cities that faced significant heatwave events and interviews with five experts have demonstrated the usefulness of our system in providing in-depth and actionable insights for heat risk management.
Submitted 7 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors
Authors:
Tianyu Huang,
Yihan Zeng,
Hui Li,
Wangmeng Zuo,
Rynson W. H. Lau
Abstract:
Dynamic 3D interaction has attracted great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects by distilling video generative models. The former requires assigning precise physical properties to the target object; otherwise the simulated results become unnatural. The latter tends to produce videos with minor motions and discontinuous frames, due to the absence of physical constraints in deformation learning. We argue that video generative models, being trained on real-world captured data, are capable of judging physical phenomena in simulation environments. To this end, we propose DreamPhysics in this work, which estimates physical properties of 3D Gaussian Splatting with video diffusion priors. DreamPhysics supports both image- and text-conditioned guidance, optimizing physical parameters via score distillation sampling with frame interpolation and log gradient. Based on a material point method simulator with proper physical parameters, our method can generate 4D content with realistic motions. Experimental results demonstrate that, by distilling the prior knowledge of video diffusion models, inaccurate physical properties can be gradually refined for high-quality simulation. Code is released at: https://github.com/tyhuang0428/DreamPhysics.
Submitted 3 June, 2024;
originally announced June 2024.
-
WTTFNet: A Weather-Time-Trajectory Fusion Network for Pedestrian Trajectory Prediction in Urban Complex
Authors:
Ho Chun Wu,
Esther Hoi Shan Lau,
Paul Yuen,
Kevin Hung,
John Kwok Tai Chui,
Andrew Kwok Fai Lui
Abstract:
Pedestrian trajectory modelling in an urban complex is challenging because pedestrians can have many possible destinations, such as shops, escalators, and attractions. Moreover, weather and time-of-day may affect pedestrian behavior. In this paper, a new weather-time-trajectory fusion network (WTTFNet) is proposed to improve the performance of baseline deep neural network architectures. By incorporating weather and time-of-day information as an embedding structure, a novel WTTFNet based on a gated multimodal unit is used to fuse the multimodal information and the deep representation of trajectories. A joint loss function based on focal loss is used to co-optimize both the deep trajectory features and the final classifier, which helps to improve the accuracy in predicting the intended destination of pedestrians, and hence the trajectories, under possible class imbalance. Experimental results using the Osaka Asia and Pacific Trade Center (ATC) dataset show improved performance of the proposed approach over state-of-the-art algorithms, with a 23.67% increase in classification accuracy and 9.16% and 7.07% reductions in average and final displacement error, respectively. The proposed approach may serve as an attractive way to improve existing baseline trajectory prediction models when they are applied to scenarios influenced by weather and time-of-day conditions. It can be employed in numerous applications such as pedestrian facility engineering, public space development and technology-driven retail.
Submitted 29 May, 2024;
originally announced May 2024.
-
Color Shift Estimation-and-Correction for Image Enhancement
Authors:
Yiyu Li,
Ke Xu,
Gerhard Petrus Hancke,
Rynson W. H. Lau
Abstract:
Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate the color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that over- and under-exposed regions display opposite color tone distribution shifts with respect to each other, which may not be easily normalized in joint modeling as they usually do not have ``normal-exposed'' regions/pixels as reference. In this paper, we propose a novel method to enhance images with both over- and under-exposures by learning to estimate and correct such color shifts. Specifically, we first derive the color feature maps of the brightened and darkened versions of the input image via a UNet-based network, followed by a pseudo-normal feature generator to produce pseudo-normal color feature maps. We then propose a novel COlor Shift Estimation (COSE) module to estimate the color shifts between the derived brightened (or darkened) color feature maps and the pseudo-normal color feature maps. The COSE module corrects the estimated color shifts of the over- and under-exposed regions separately. We further propose a novel COlor MOdulation (COMO) module to modulate the separately corrected colors in the over- and under-exposed regions to produce the enhanced image. Comprehensive experiments show that our method outperforms existing approaches. Project webpage: https://github.com/yiyulics/CSEC.
Submitted 29 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Automated Conversion of Static to Dynamic Scheduler via Natural Language
Authors:
Paul Mingzheng Tang,
Kenji Kah Hoe Leong,
Nowshad Shaik,
Hoong Chuin Lau
Abstract:
In this paper, we explore the potential application of Large Language Models (LLMs) to automatically model constraints and generate code for dynamic scheduling problems given an existing static model. Static scheduling problems are modelled and coded by optimization experts. These models may easily become obsolete, as the underlying constraints may need to be fine-tuned to reflect changes in the scheduling rules. Furthermore, it may be necessary to turn a static model into a dynamic one in order to cope with disturbances in the environment. We propose a Retrieval-Augmented Generation (RAG) based LLM model to automate the process of implementing constraints for Dynamic Scheduling (RAGDyS), without seeking help from an optimization modeling expert. Our framework aims to minimize the technical complexities of mathematical modelling and the computational workload for end-users, thereby allowing them to quickly obtain a new schedule that is close to the original schedule, with changes reflected by natural language constraint descriptions.
Submitted 8 May, 2024;
originally announced May 2024.
-
Randomness and Retention: Using Weak Mean Motion Resonances to Constrain Neptune's Late-Stage Migration
Authors:
Arcelia Hermosillo Ruiz,
Harriet C. P. Lau,
Ruth Murray-Clay
Abstract:
Planet-planetesimal interactions cause a planet to migrate, manifesting as a random walk in semi-major axis. In models for Neptune's migration involving a gravitational upheaval, this planetesimal-driven migration is a side-effect of the dynamical friction required to damp Neptune's orbital eccentricity. This migration is noisy, potentially causing Trans-Neptunian Objects (TNOs) in mean motion resonance to be lost. With $N$-body simulations, we validate a previously derived analytic model for resonance retention and determine unknown coefficients. We identify the impact of random-walk (noisy) migration on resonance retention for resonances up to fourth order lying between 39 au and 75 au. Using a population estimate for the weak 7:3 resonance from the well-characterized Outer Solar System Origins Survey (OSSOS), we rule out two cases: (1) a planetesimal disk distributed between 13.3 and 39.9 au with $\gtrsim$ 30 Earth masses in today's size distribution and $T_{\rm mig} \gtrsim$ 40 Myr and (2) a top-heavy size distribution with $\gtrsim$ 2000 Pluto-sized TNOs and $T_{\rm mig} \gtrsim$ 10 Myr, where $T_{\rm mig}$ is Neptune's migration timescale. We find that low-eccentricity TNOs in the heavily populated 5:2 resonance are easily lost due to noisy migration. Improved observations of the low-eccentricity region of the 5:2 resonance and of weak mean motion resonances with Rubin Observatory's Legacy Survey of Space and Time (LSST) will provide better population estimates, allowing for comparison with our model's retention fractions and providing strong evidence for or against Neptune's random interactions with planetesimals.
Submitted 10 May, 2024;
originally announced May 2024.
-
Overcoming challenges of translating deep-learning models for glioblastoma: the ZGBM consortium
Authors:
Haris Shuaib,
Gareth J Barker,
Peter Sasieni,
Enrico De Vita,
Alysha Chelliah,
Roman Andrei,
Keyoumars Ashkan,
Erica Beaumont,
Lucy Brazil,
Chris Rowland-Hill,
Yue Hui Lau,
Aysha Luis,
James Powell,
Angela Swampillai,
Sean Tenant,
Stefanie C Thust,
Stephen Wastling,
Tom Young,
Thomas C Booth
Abstract:
Objective: To report imaging protocol and scheduling variance in routine care of glioblastoma patients in order to demonstrate challenges of integrating deep-learning models in glioblastoma care pathways. Additionally, to understand the most common imaging studies and image contrasts to inform the development of potentially robust deep-learning models. Methods: MR imaging data were analysed from a random sample of five patients from the prospective cohort across five participating sites of the ZGBM consortium. Reported clinical and treatment data alongside DICOM header information were analysed to understand treatment pathway imaging schedules. Results: All sites perform all structural imaging at every stage in the pathway except for the presurgical study, where at some sites only contrast-enhanced T1-weighted imaging is performed. Diffusion MRI is the most common non-structural imaging type, performed at every site. Conclusion: Imaging protocols and scheduling vary across the UK, making it challenging to develop machine-learning models that could perform robustly at other centres. Structural imaging is performed most consistently across all centres. Advances in knowledge: Successful translation of deep-learning models will likely be based on structural post-treatment imaging unless significant effort is made to standardise non-structural or peri-operative imaging protocols and schedules.
Submitted 7 May, 2024;
originally announced May 2024.
-
Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation
Authors:
Yihao Zhou,
Timothy Tin-Yan Lee,
Kelly Ka-Lee Lai,
Chonglin Wu,
Hin Ting Lau,
De Yang,
Chui-Yi Chan,
Winnie Chiu-Wing Chu,
Jack Chun-Yiu Cheng,
Tsz-Ping Lam,
Yong-Ping Zheng
Abstract:
The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment.
Submitted 6 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
Region-Aware Color Smudging
Authors:
Ying Jiang,
Pengfei Xu,
Congyi Zhang,
Hongbo Fu,
Henry Lau,
Wenping Wang
Abstract:
Color smudge operations in digital painting software enable users to create natural shading effects in high-fidelity paintings by interactively mixing colors. To precisely control results in traditional painting software, users tend to organize flat-filled color regions in multiple layers and smudge them to generate different color gradients. However, the requirement to carefully deal with regions makes the smudging process time-consuming and laborious, especially for non-professional users. This motivates us to investigate how to infer user-desired smudging effects when users smudge over regions in a single layer. To investigate how to improve color smudge performance, we first conduct a formative study. Following the findings of this study, we design SmartSmudge, a novel smudge tool that offers users dynamic smudge brushes and real-time region selection for easily and efficiently generating natural shading effects. We demonstrate the efficiency and effectiveness of the proposed tool via a user study and quantitative analysis.
Submitted 4 May, 2024;
originally announced May 2024.
-
Light Cone Cancellation for Variational Quantum Eigensolver Ansatz
Authors:
Xinjian Yan,
Xinwei Lee,
Ningyi Xie,
Yoshiyuki Saito,
Leo Kurosawa,
Nobuyoshi Asai,
Dongsheng Cai,
HoongChuin Lau
Abstract:
Variational Quantum Algorithms (VQAs) represent a class of algorithms that utilize a hybrid approach, combining classical and quantum computing techniques. In this approach, classical computers serve as optimizers that update circuit parameters to find approximate solutions to complex problems. In this study, we apply a method known as Light Cone Cancellation (LCC) to optimize variational circuits, effectively reducing the required number of qubits and gates for circuit simulation. We then evaluate the performance of LCC on one of the VQAs, the Variational Quantum Eigensolver (VQE), to address the Max-Cut problem. Compared with the Quantum Approximate Optimization Algorithm (QAOA), VQE offers greater degrees of freedom at lower circuit depths. By applying LCC to VQE, we can shift the complexity of circuit simulation from the number of qubits to the number of edges in the graph, i.e., from exponential time to polynomial time. This enables us to solve large problems of up to 50 vertices without actually simulating the entire circuit. From our simulations on 7-qubit and 27-qubit noisy devices, we show that LCC yields higher approximation ratios than the cases without LCC, implying that the effect of noise is reduced when LCC is applied.
Submitted 30 April, 2024;
originally announced April 2024.
-
Quantum Relaxation for Solving Multiple Knapsack Problems
Authors:
Yan Jin,
Monit Sharma,
Hoong Chuin Lau,
Rudy Raymond
Abstract:
Combinatorial problems are a common challenge in business, requiring optimal solutions to be found under specified constraints. While significant progress has been made with variational approaches such as QAOA, most problems addressed are unconstrained (such as Max-Cut). In this study, we investigate a hybrid quantum-classical method for constrained optimization problems, particularly those with knapsack constraints, which occur frequently in financial and supply chain applications. Our proposed method relies firstly on relaxations to local quantum Hamiltonians, defined through commutative maps. Drawing inspiration from quantum random access code (QRAC) concepts, particularly the Quantum Random Access Optimizer (QRAO), we explore QRAO's potential in solving large constrained optimization problems. We employ classical techniques such as Linear Relaxation as a presolve mechanism to handle constraints and to further improve scalability. We compare our approach with QAOA and present the final results for a real-world procurement optimization problem: a sizeable multi-knapsack-constrained problem.
Submitted 30 April, 2024;
originally announced April 2024.
-
Anomalous Long-Distance Coherence in Critically-Driven Cavity Magnonics
Authors:
Ying Yang,
Jiguang Yao,
Yang Xiao,
Pak-Tik Fong,
Hoi-Kwan Lau,
C. -M. Hu
Abstract:
Developing quantum networks necessitates coherently connecting distant systems via remote strong coupling. Here, we demonstrate long-distance coherence in cavity magnonics operating in the linear regime. By locally setting the cavity near critical coupling with travelling photons, non-local magnon-photon coherence is established via strong coupling over a 2-meter distance. We observe two anomalies in this long-distance coherence: first, the coupling strength oscillates with twice the period of conventional photon-mediated couplings; second, clear mode splitting is observed within the cavity linewidth. Neither effect can be explained by conventional coupled-mode theory, which reveals the tip of an iceberg of photon-mediated coupling in systems under critical driving. Our work shows the potential of using critical phenomena for harnessing long-distance coherence in distributed systems.
Submitted 19 April, 2024;
originally announced April 2024.
-
iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer
Authors:
Fengtao Zhou,
Yingxue Xu,
Yanfen Cui,
Shenyan Zhang,
Yun Zhu,
Weiyang He,
Jiguang Wang,
Xin Wang,
Ronald Chan,
Louis Ho Shing Lau,
Chu Han,
Dafu Zhang,
Zhenhui Li,
Hao Chen
Abstract:
Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among patients, with a considerable subset displaying treatment resistance. Ineffective NACT not only leads to adverse effects but also misses the optimal therapeutic window, resulting in a lower survival rate. Moreover, existing multimodal learning methods assume the availability of all modalities for each patient, which does not align with the reality of clinical practice. The limited availability of modalities for each patient would cause information loss, adversely affecting predictive accuracy. In this study, we propose an incomplete multimodal data integration framework for GC (iMD4GC) to address the challenges posed by incomplete multimodal data, enabling precise response prediction and survival analysis. Specifically, iMD4GC incorporates unimodal attention layers for each modality to capture intra-modal information. Subsequently, the cross-modal interaction layers explore potential inter-modal interactions and capture complementary information across modalities, thereby enabling information compensation for missing modalities. To evaluate iMD4GC, we collected three multimodal datasets for GC study: GastricRes (698 cases) for response prediction, GastricSur (801 cases) for survival analysis, and TCGA-STAD (400 cases) for survival analysis. The scale of our datasets is significantly larger than that of previous studies. iMD4GC achieved impressive performance with an 80.2% AUC on GastricRes, 71.4% C-index on GastricSur, and 66.1% C-index on TCGA-STAD, significantly surpassing other compared methods.
Submitted 1 April, 2024;
originally announced April 2024.
-
Giant planet formation in the solar system
Authors:
Anuja Raorane,
Ramon Brasser,
Soko Matsumura,
Tommy Chi Ho Lau,
Man Hoi Lee,
Audrey Bouvier
Abstract:
The formation history of Jupiter has been of interest due to its ability to shape the solar system's history. Yet little attention has been paid to the formation and growth of Saturn and the other giant planets. Here, we explore the implications of the simplest disc and pebble accretion model with steady-state accretion on the formation of giant planets in the solar system through N-body simulations. We conducted a statistical survey of different disc parameters and initial conditions of the protoplanetary disc to establish which combination best reproduces the present outer solar system. We examined the effect of the initial planetesimal disc mass, the number of planetesimals and their size-frequency distribution slope, pebble accretion prescription, and sticking efficiency on the likelihood of forming gas giants and their orbital distribution. The results reveal that the accretion sticking efficiency is the most sensitive parameter for controlling the final masses and number of giant planets. We have been unable to replicate the formation of all three types of giant planets in the solar system in a single simulation. The probability distribution of the final location of the giant planets is approximately constant in $\log r$, suggesting there is a slight preference for formation closer to the Sun but no preference for more massive planets to form closer. The eccentricity distribution has a higher mean for more massive planets, indicating that systems with more massive planets are more violent. The formation timescales of the cores of the gas giants are distinct, suggesting that they formed sequentially.
Submitted 26 March, 2024;
originally announced March 2024.
-
Quantum-Enhanced Simulation-Based Optimization for Newsvendor Problems
Authors:
Monit Sharma,
Hoong Chuin Lau,
Rudy Raymond
Abstract:
Simulation-based optimization is a widely used method to solve stochastic optimization problems. This method aims to identify an optimal solution by maximizing the expected value of the objective function. However, due to its computational complexity, the function cannot be accurately evaluated directly, hence it is estimated through simulation. Because Quantum Amplitude Estimation (QAE) is more efficient than classical Monte Carlo simulation, quantum-enhanced simulation-based optimization frequently outpaces its classical counterpart, resulting in notable performance enhancements in various scenarios. In this work, we make use of a quantum-enhanced algorithm for simulation-based optimization and apply it to solve a variant of the classical Newsvendor problem, which is known to be NP-hard. Such problems provide a building block for supply chain management, particularly in inventory management and procurement optimization under risks and uncertainty.
Submitted 22 April, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
Authors:
Haoyuan Wang,
Wenbo Hu,
Lei Zhu,
Rynson W. H. Lau
Abstract:
Inverse rendering aims at recovering both geometry and materials of objects. It provides a more compatible reconstruction for conventional rendering engines, compared with the neural radiance fields (NeRFs). On the other hand, existing NeRF-based inverse rendering methods cannot handle glossy objects with local light interactions well, as they typically oversimplify the illumination as a 2D environmental map, which assumes infinite lights only. Observing the superiority of NeRFs in recovering radiance fields, we propose a novel 5D Neural Plenoptic Function (NeP) based on NeRFs and ray tracing, such that more accurate lighting-object interactions can be formulated via the rendering equation. We also design a material-aware cone sampling strategy to efficiently integrate lights inside the BRDF lobes with the help of pre-filtered radiance fields. Our method has two stages: the geometry of the target object and the pre-filtered environmental radiance fields are reconstructed in the first stage, and materials of the target object are estimated in the second stage with the proposed NeP and material-aware cone sampling strategy. Extensive experiments on the proposed real-world and synthetic datasets demonstrate that our method can reconstruct high-fidelity geometry/materials of challenging glossy objects with complex lighting interactions from nearby objects. Project webpage: https://whyy.site/paper/nep
Submitted 24 March, 2024;
originally announced March 2024.
-
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Authors:
Zhenwei Wang,
Tengfei Wang,
Gerhard Hancke,
Ziwei Liu,
Rynson W. H. Lau
Abstract:
Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D generation. ThemeStation synthesizes customized 3D assets based on given few exemplars with two goals: 1) unity for generating 3D assets that thematically align with the given exemplars and 2) diversity for generating 3D assets with a high degree of variations. To this end, we design a two-stage framework that draws a concept image first, followed by a reference-informed 3D modeling stage. We propose a novel dual score distillation (DSD) loss to jointly leverage priors from both the input exemplars and the synthesized concept image. Extensive experiments and user studies confirm that ThemeStation surpasses prior works in producing diverse theme-aware 3D models with impressive quality. ThemeStation also enables various applications such as controllable 3D-to-3D generation.
Submitted 15 May, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Detecting quantum chaos via pseudo-entropy and negativity
Authors:
Song He,
Pak Hang Chris Lau,
Long Zhao
Abstract:
Quantum informatic quantities such as entanglement entropy are useful in detecting quantum phase transitions. Recently, a new entanglement measure called pseudo-entropy was proposed which is a generalization of the more well-known entanglement entropy. It has many nice properties and is useful in the study of post-selection measurements. In this paper, one of our goals is to explore the properties of pseudo-entropy and study the effectiveness of it as a quantum chaos diagnostic, i.e. as a tool to distinguish between chaotic and integrable systems. Using various variants of the SYK model, we study the signal of quantum chaos captured in the pseudo-entropy and relate it to the spectral form factor (SFF) and local operator entanglement (LOE). We also explore another quantity called the negativity of entanglement which is a useful entanglement measure for a mixed state. We generalized it to accommodate the transition matrix and called it pseudo-negativity in analogy to pseudo-entropy. We found that it also nicely captures the spectral properties of a chaotic system and hence also plays a role as a tool of quantum chaos diagnostic.
Submitted 9 March, 2024;
originally announced March 2024.
-
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Authors:
Yuhao Liu,
Zhanghan Ke,
Fang Liu,
Nanxuan Zhao,
Rynson W. H. Lau
Abstract:
Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with handling diverse low-level tasks that require details preservation. To overcome this limitation, we present a new Diff-Plugin framework to enable a single pre-trained diffusion model to generate high-fidelity results across a variety of low-level tasks. Specifically, we first propose a lightweight Task-Plugin module with a dual branch design to provide task-specific priors, guiding the diffusion process in preserving image content. We then propose a Plugin-Selector that can automatically select different Task-Plugins based on the text instruction, allowing users to edit images by indicating multiple low-level tasks with natural language. We conduct extensive experiments on 8 low-level vision tasks. The results demonstrate the superiority of Diff-Plugin over existing methods, particularly in real-world scenarios. Our ablations further validate that Diff-Plugin is stable, schedulable, and supports robust training across different dataset sizes.
Submitted 28 May, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
A Sentiment Consolidation Framework for Meta-Review Generation
Authors:
Miao Li,
Jey Han Lau,
Eduard Hovy
Abstract:
Modern natural language generation systems with Large Language Models (LLMs) exhibit the capability to generate a plausible summary of multiple documents; however, it is uncertain if they truly possess the capability of information consolidation to generate summaries, especially on documents with opinionated information. We focus on meta-review generation, a form of sentiment summarisation for the scientific domain. To make scientific sentiment summarization more grounded, we hypothesize that human meta-reviewers follow a three-layer framework of sentiment consolidation to write meta-reviews. Based on the framework, we propose novel prompting methods for LLMs to generate meta-reviews and evaluation metrics to assess the quality of generated meta-reviews. Our framework is validated empirically as we find that prompting LLMs based on the framework -- compared with prompting them with simple instructions -- generates better meta-reviews.
Submitted 4 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
RelayAttention for Efficient Large Language Model Serving with Long System Prompts
Authors:
Lei Zhu,
Xinjiang Wang,
Wayne Zhang,
Rynson W. H. Lau
Abstract:
A practical large language model (LLM) service may involve a long system prompt, which specifies the instructions, examples, and knowledge documents of the task and is reused across requests. However, the long system prompt causes throughput/latency bottlenecks as the cost of generating the next token grows w.r.t. the sequence length. This paper aims to improve the efficiency of LLM services that involve long system prompts. Our key observation is that handling these system prompts requires heavily redundant memory accesses in existing causal attention computation algorithms. Specifically, for batched requests, the cached hidden states (i.e., key-value pairs) of system prompts are transferred from off-chip DRAM to on-chip SRAM multiple times, each corresponding to an individual request. To eliminate such a redundancy, we propose RelayAttention, an attention algorithm that allows reading these hidden states from DRAM exactly once for a batch of input tokens. RelayAttention is a free lunch: it maintains the generation quality while requiring no model retraining, as it is based on a mathematical reformulation of causal attention. We have observed significant performance improvements to a production-level system, vLLM, through integration with RelayAttention. The improvements are even more profound with longer system prompts.
Submitted 30 May, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
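The RelayAttention abstract above rests on the fact that softmax attention over a concatenated key-value cache can be split into two partial attentions and recombined exactly. The following is a minimal pure-Python sketch of that general identity for a single decoding-step query, not the authors' implementation; the function names `attend` and `merge` and the system/request split are hypothetical and chosen only for illustration.

```python
import math

def attend(q, keys, values):
    """Softmax attention for one query; also returns the log-sum-exp of the scores."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)                              # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) / z for j in range(dim)]
    return out, m + math.log(z)

def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial attention outputs into attention over the union of their keys."""
    # alpha is the share of total softmax mass held by part A:
    # exp(lse_a) / (exp(lse_a) + exp(lse_b)).
    alpha = 1.0 / (1.0 + math.exp(lse_b - lse_a))
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(out_a, out_b)]
```

Under these assumptions, `merge(*attend(q, k_sys, v_sys), *attend(q, k_req, v_req))` should match `attend(q, k_sys + k_req, v_sys + v_req)[0]` up to floating-point error, which is why the system-prompt part can be computed separately from the per-request part.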
-
Delving into Dark Regions for Robust Shadow Detection
Authors:
Huankang Guan,
Ke Xu,
Rynson W. H. Lau
Abstract:
Shadow detection is a challenging task as it requires a comprehensive understanding of shadow characteristics and global/local illumination conditions. We observe from our experiment that state-of-the-art deep methods tend to have higher error rates in differentiating shadow pixels from non-shadow pixels in dark regions (i.e., regions with low-intensity values). Our key insight to this problem is that existing methods typically learn discriminative shadow features from the whole image globally, covering the full range of intensity values, and may not learn the subtle differences between shadow and non-shadow pixels in dark regions. Hence, if we can design a model to focus on a narrower range of low-intensity regions, it may be able to learn better discriminative features for shadow detection. Inspired by this insight, we propose a novel shadow detection approach that first learns global contextual cues over the entire image and then zooms into the dark regions to learn local shadow representations. To this end, we formulate an effective dark-region recommendation (DRR) module to recommend regions of low-intensity values, and a novel dark-aware shadow analysis (DASA) module to learn dark-aware shadow features from the recommended dark regions. Extensive experiments show that the proposed method outperforms the state-of-the-art methods on three popular shadow detection datasets. Code is available at https://github.com/guanhuankang/ShadowDetection2021.git.
Submitted 21 February, 2024;
originally announced February 2024.
-
CMA-R: Causal Mediation Analysis for Explaining Rumour Detection
Authors:
Lin Tian,
Xiuzhen Zhang,
Jey Han Lau
Abstract:
We apply causal mediation analysis to explain the decision-making process of neural models for rumour detection on Twitter. Interventions at the input and network level reveal the causal impacts of tweets and words in the model output. We find that our approach CMA-R -- Causal Mediation Analysis for Rumour detection -- identifies salient tweets that explain model predictions and shows strong agreement with human judgements for critical tweets determining the truthfulness of stories. CMA-R can further highlight causally impactful words in the salient tweets, providing another layer of interpretability and transparency into these black-box rumour detection systems. Code is available at: https://github.com/ltian678/cma-r.
Submitted 12 February, 2024;
originally announced February 2024.
-
Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy
Authors:
Efe Bozkir,
Süleyman Özdel,
Ka Hei Carrie Lau,
Mengdi Wang,
Hong Gao,
Enkelejda Kasneci
Abstract:
Advances in artificial intelligence and human-computer interaction will likely lead to extended reality (XR) becoming pervasive. While XR can provide users with interactive, engaging, and immersive experiences, non-player characters are often utilized in pre-scripted and conventional ways. This paper argues for using large language models (LLMs) in XR by embedding them in avatars or as narratives to facilitate inclusion through prompt engineering and fine-tuning the LLMs. We argue that this inclusion will promote diversity for XR use. Furthermore, the versatile conversational capabilities of LLMs will likely increase engagement in XR, helping XR become ubiquitous. Lastly, we speculate that combining the information provided to LLM-powered spaces by users and the biometric data obtained might lead to novel privacy invasions. While exploring potential privacy breaches, examining user privacy concerns and preferences is also essential. Therefore, despite challenges, LLM-powered XR is a promising area with several opportunities.
Submitted 20 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Recasting Regional Lighting for Shadow Removal
Authors:
Yuhao Liu,
Zhanghan Ke,
Ke Xu,
Fang Liu,
Zhenwei Wang,
Rynson W. H. Lau
Abstract:
Removing shadows requires an understanding of both lighting conditions and object textures in a scene. Existing methods typically learn pixel-level color mappings between shadow and non-shadow images, in which the joint modeling of lighting and object textures is implicit and inadequate. We observe that in a shadow region, the degradation degree of object textures depends on the local illumination, while simply enhancing the local illumination cannot fully recover the attenuated textures. Based on this observation, we propose to condition the restoration of attenuated textures on the corrected local lighting in the shadow region. Specifically, we first design a shadow-aware decomposition network to estimate the illumination and reflectance layers of shadow regions explicitly. We then propose a novel bilateral correction network to recast the lighting of shadow regions in the illumination layer via a novel local lighting correction module, and to restore the textures conditioned on the corrected illumination layer via a novel illumination-guided texture restoration module. We further annotate pixel-wise shadow masks for the public SRD dataset, which originally contains only image pairs. Experiments on three benchmarks show that our method outperforms existing state-of-the-art shadow removal methods.
Submitted 1 February, 2024;
originally announced February 2024.
-
VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality
Authors:
Ying Jiang,
Chang Yu,
Tianyi Xie,
Xuan Li,
Yutao Feng,
Huamin Wang,
Minchen Li,
Henry Lau,
Feng Gao,
Yin Yang,
Chenfanfu Jiang
Abstract:
As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. Our project page is available at: https://yingjiang96.github.io/VR-GS/.
Submitted 4 May, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Can the giant planets of the Solar System form via pebble accretion in a smooth protoplanetary disc?
Authors:
Tommy Chi Ho Lau,
Man Hoi Lee,
Ramon Brasser,
Soko Matsumura
Abstract:
Prevailing $N$-body planet formation models typically start with lunar-mass embryos and show a general trend of rapid migration of massive planetary cores to the inner Solar System in the absence of a migration trap. This setup cannot capture the evolution from a planetesimal to embryo, which is crucial to the final architecture of the system. We aim to model planet formation with planet migration starting with planetesimals of $\sim10^{-6}$ -- $10^{-4}M_\oplus$ and reproduce the giant planets of the Solar System. We simulated a population of 1,000 -- 5,000 planetesimals in a smooth protoplanetary disc, which was evolved under the effects of their mutual gravity, pebble accretion, gas accretion, and planet migration, employing the parallelized $N$-body code SyMBAp. We find that the dynamical interactions among growing planetesimals are vigorous and can halt pebble accretion for excited bodies. While a set of results without planet migration produces one to two gas giants and one to two ice giants beyond 6 au, massive planetary cores readily move to the inner Solar System once planet migration is in effect. Dynamical heating is important in a planetesimal disc and the reduced pebble encounter time should be considered in similar models. Planet migration remains a challenge to form cold giant planets in a smooth protoplanetary disc, which suggests an alternative mechanism is required to stop them at wide orbits.
Submitted 25 March, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Authors:
Tianyu Huang,
Yihan Zeng,
Zhilu Zhang,
Wan Xu,
Hang Xu,
Songcen Xu,
Rynson W. H. Lau,
Wangmeng Zuo
Abstract:
3D generation has raised great attention in recent years. With the success of text-to-image diffusion models, the 2D-lifting technique becomes a promising route to controllable 3D generation. However, these methods tend to present inconsistent geometry, which is also known as the Janus problem. We observe that the problem is caused mainly by two aspects, i.e., viewpoint bias in 2D diffusion models and overfitting of the optimization objective. To address it, we propose a two-stage 2D-lifting framework, namely DreamControl, which optimizes coarse NeRF scenes as 3D self-prior and then generates fine-grained objects with control-based score distillation. Specifically, adaptive viewpoint sampling and boundary integrity metric are proposed to ensure the consistency of generated priors. The priors are then regarded as input conditions to maintain reasonable geometries, in which conditional LoRA and weighted score are further proposed to optimize detailed textures. DreamControl can generate high-quality 3D content in terms of both geometry consistency and texture fidelity. Moreover, our control-based optimization guidance is applicable to more downstream tasks, including user-guided generation and 3D animation. The project page is available at https://github.com/tyhuang0428/DreamControl.
Submitted 12 March, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
On the Backreaction of Dirac Matter in JT Gravity and SYK Model
Authors:
Pak Hang Chris Lau,
Chen-Te Ma,
Jeff Murugan,
Masaki Tezuka
Abstract:
We model backreaction in AdS$_2$ JT gravity via a proposed boundary dual Sachdev-Ye-Kitaev quantum dot coupled to Dirac fermion matter and study it from the perspective of quantum entanglement and chaos. The boundary effective action accounts for the backreaction through a linear coupling of the Dirac fermions to the Gaussian-random two-body Majorana interaction term in the low-energy limit. We calculate the time evolution of the entanglement entropy between graviton and Dirac fermion fields for a separable initial state and find that it initially increases and then saturates to a finite value. Moreover, in the limit of a large number of fermions, we find a maximally entangled state between the Majorana and Dirac fields in the saturation region, implying a transition of the von Neumann algebra of observables from type I to type II. This transition in turn indicates a loss of information in the holographically dual emergent spacetime. We corroborate these observations with a detailed numerical computation of the averaged nearest-neighbor gap ratio of the boundary spectrum and provide a useful complement to quantum entanglement studies of holography.
Submitted 25 March, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
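The abstract above mentions the averaged nearest-neighbor gap ratio as a chaos diagnostic. As a rough illustration only, the sketch below uses the definition commonly found in the literature, r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) with s_n the spacing between consecutive sorted levels; whether the paper uses exactly this form is an assumption, and the function name `mean_gap_ratio` and the toy spectra are hypothetical.

```python
from typing import Sequence


def mean_gap_ratio(levels: Sequence[float]) -> float:
    """Average of r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) over a spectrum."""
    ordered = sorted(levels)
    # Spacings between consecutive energy levels.
    spacings = [b - a for a, b in zip(ordered, ordered[1:])]
    ratios = []
    for s_prev, s_next in zip(spacings, spacings[1:]):
        larger = max(s_prev, s_next)
        if larger == 0.0:
            continue  # both spacings vanish: ratio undefined, skipped here
        ratios.append(min(s_prev, s_next) / larger)
    if not ratios:
        raise ValueError("need at least three non-degenerate levels")
    return sum(ratios) / len(ratios)


# Toy spectra (not data from the paper).
evenly_spaced = mean_gap_ratio([0.0, 1.0, 2.0, 3.0])  # spacings 1, 1, 1
alternating = mean_gap_ratio([0.0, 1.0, 3.0, 4.0])    # spacings 1, 2, 1
```

In practice the average is compared against reference values, commonly quoted as roughly 0.39 for uncorrelated (Poisson) levels and roughly 0.53 for the Gaussian orthogonal ensemble.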
-
Unsupervised Lexical Simplification with Context Augmentation
Authors:
Takashi Wada,
Timothy Baldwin,
Jey Han Lau
Abstract:
We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models. Given a target word and its context, our method generates substitutes based on the target context and also additional contexts sampled from monolingual data. We conduct experiments in English, Portuguese, and Spanish on the TSAR-2022 shared task, and show that our model substantially outperforms other unsupervised systems across all languages. We also establish a new state-of-the-art by ensembling our model with GPT-3.5. Lastly, we evaluate our model on the SWORDS lexical substitution data set, achieving a state-of-the-art result.
Submitted 1 November, 2023;
originally announced November 2023.
-
TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields
Authors:
Tianyu Huang,
Yihan Zeng,
Bowen Dong,
Hang Xu,
Songcen Xu,
Rynson W. H. Lau,
Wangmeng Zuo
Abstract:
Recent works learn 3D representation explicitly under text-3D guidance. However, limited text-3D data restricts the vocabulary scale and text control of generations. Generators may easily fall into a stereotype concept for certain text prompts, thus losing open-vocabulary generation ability. To tackle this issue, we introduce a conditional 3D generative model, namely TextField3D. Specifically, rather than using the text prompts as input directly, we suggest to inject dynamic noise into the latent space of given text prompts, i.e., Noisy Text Fields (NTFs). In this way, limited 3D data can be mapped to the appropriate range of textual latent space that is expanded by NTFs. To this end, an NTFGen module is proposed to model general text latent code in noisy fields. Meanwhile, an NTFBind module is proposed to align view-invariant image latent code to noisy fields, further supporting image-conditional 3D generation. To guide the conditional generation in both geometry and texture, multi-modal discrimination is constructed with a text-3D discriminator and a text-2.5D discriminator. Compared to previous methods, TextField3D includes three merits: 1) large vocabulary, 2) text consistency, and 3) low latency. Extensive experiments demonstrate that our method achieves a potential open-vocabulary 3D generation capability.
Submitted 14 March, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Individually Rational Collaborative Vehicle Routing through Give-And-Take Exchanges
Authors:
Paul Mingzheng Tang,
Ba Phong Tran,
Hoong Chuin Lau
Abstract:
In this paper, we are concerned with the automated exchange of orders between logistics companies in a marketplace platform to optimize total revenues. We introduce a novel multi-agent approach to this problem, focusing on the Collaborative Vehicle Routing Problem (CVRP) through the lens of individual rationality. Our proposed algorithm applies the principles of Vehicle Routing Problem (VRP) to pairs of vehicles from different logistics companies, optimizing the overall routes while considering standard VRP constraints plus individual rationality constraints. By facilitating cooperation among competing logistics agents through a Give-and-Take approach, we show that it is possible to reduce travel distance and increase operational efficiency system-wide. More importantly, our approach ensures individual rationality and faster convergence, which are important properties of ensuring the long-term sustainability of the marketplace platform. We demonstrate the efficacy of our approach through extensive experiments using real-world test data from major logistics companies. The results reveal our algorithm's ability to rapidly identify numerous optimal solutions, underscoring its practical applicability and potential to transform the logistics industry.
Submitted 31 August, 2023;
originally announced August 2023.
-
A Feasibility-Preserved Quantum Approximate Solver for the Capacitated Vehicle Routing Problem
Authors:
Ningyi Xie,
Xinwei Lee,
Dongsheng Cai,
Yoshiyuki Saito,
Nobuyoshi Asai,
Hoong Chuin Lau
Abstract:
The Capacitated Vehicle Routing Problem (CVRP) is an NP-optimization problem (NPO) that arises in various fields including transportation and logistics. The CVRP extends from the Vehicle Routing Problem (VRP), aiming to determine the most efficient plan for a fleet of vehicles to deliver goods to a set of customers, subject to the limited carrying capacity of each vehicle. As the number of possible solutions skyrockets when the number of customers increases, finding the optimal solution remains a significant challenge. Recently, the Quantum Approximate Optimization Algorithm (QAOA), a quantum-classical hybrid algorithm, has exhibited enhanced performance in certain combinatorial optimization problems compared to classical heuristics. However, its ability diminishes notably in solving constrained optimization problems including the CVRP. This limitation primarily arises from the typical approach of encoding the given problems as penalty-inclusive binary optimization problems. In this case, the QAOA faces challenges in sampling solutions satisfying all constraints. Addressing this, our work presents a new binary encoding for the CVRP, with an alternative objective function of minimizing the shortest path that bypasses the vehicle capacity constraint of the CVRP. The search space is further restricted by the constraint-preserving mixing operation. We examine and discuss the effectiveness of the proposed encoding under the framework of the variant of the QAOA, Quantum Alternating Operator Ansatz (AOA), through its application to several illustrative examples. Compared to the typical QAOA approach, the proposed method not only preserves the feasibility but also achieves a significant enhancement in the probability of measuring optimal solutions.
Submitted 21 April, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Language-based Photo Color Adjustment for Graphic Designs
Authors:
Zhenwei Wang,
Nanxuan Zhao,
Gerhard Hancke,
Rynson W. H. Lau
Abstract:
Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuitive system that can assist both experts and novices on graphic design. Given a graphic design containing a photo that needs to be recolored, our model can predict the source colors and the target regions, and then recolor the target regions with the source colors based on the given language-based instruction. The multi-granularity of the instruction allows diverse user intentions. The proposed novel task faces several unique challenges, including: 1) color accuracy for recoloring with exactly the same color from the target design element as specified by the user; 2) multi-granularity instructions for parsing instructions correctly to generate a specific result or multiple plausible ones; and 3) locality for recoloring in semantically meaningful local regions to preserve original image semantics. To address these challenges, we propose a model called LangRecol with two main components: the language-based source color prediction module and the semantic-palette-based photo recoloring module. We also introduce an approach for generating a synthetic graphic design dataset with instructions to enable model training. We evaluate our model via extensive experiments and user studies. We also discuss several practical applications, showing the effectiveness and practicality of our approach. Code and data for this paper are at: https://zhenwwang.github.io/langrecol.
Submitted 6 August, 2023;
originally announced August 2023.
-
FGo: A Directed Grey-box Fuzzer with Probabilistic Exponential cut-the-loss Strategies
Authors:
Harvey Lau
Abstract:
Traditional coverage grey-box fuzzers perform a breadth-first search of the state space of Program Under Test (PUT). This aimlessness wastes a lot of computing resources. Directed grey-box fuzzing focuses on the target of PUT and has become one of the most popular topics of software testing. The early termination of unreachable test cases is a method to improve directed grey-box fuzzing. However, existing solutions have two problems: firstly, reachability analysis needs to introduce extra technologies (e.g., static analysis); secondly, the performance of reachability analysis and auxiliary technologies lacks versatility.
We propose FGo, a probabilistic exponential cut-the-loss directed grey-box fuzzer. FGo terminates unreachable test cases early with exponentially increasing probability. Compared to other technologies, FGo makes full use of the unreachable information contained in iCFG and doesn't generate any additional overhead caused by reachability analysis. Moreover, it is easy to generalize to all PUT. This strategy based on probability is perfectly adapted to the randomness of fuzzing.
The experiment results show that FGo is 106% faster than AFLGo in reproducing crashes. We compare multiple parameters of the probabilistic exponential cut-the-loss algorithm and analyze them in detail. In addition, to enhance the interpretability of FGo, this paper discusses the difference between the theoretical performance and the practical performance of the probabilistic exponential cut-the-loss algorithm.
Submitted 16 September, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Unsupervised Paraphrasing of Multiword Expressions
Authors:
Takashi Wada,
Yuji Matsumoto,
Timothy Baldwin,
Jey Han Lau
Abstract:
We propose an unsupervised approach to paraphrasing multiword expressions (MWEs) in context. Our model employs only monolingual corpus data and pre-trained language models (without fine-tuning), and does not make use of any external resources such as dictionaries. We evaluate our method on the SemEval 2022 idiomatic semantic text similarity task, and show that it outperforms all unsupervised systems and rivals supervised systems.
Submitted 2 June, 2023;
originally announced June 2023.
-
Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization
Authors:
Rongxin Zhu,
Jianzhong Qi,
Jey Han Lau
Abstract:
A series of datasets and models have been proposed for summaries generated for well-formatted documents such as news articles. Dialogue summaries, however, have been underexplored. In this paper, we present the first dataset with fine-grained factual error annotations named DIASUMFACT. We define fine-grained factual error detection as a sentence-level multi-label classification problem, and we evaluate two state-of-the-art (SOTA) models on our dataset. Both models yield sub-optimal results, with a macro-averaged F1 score of around 0.25 over 6 error classes. We further propose an unsupervised model ENDERANKER via candidate ranking using pretrained encoder-decoder models. Our model performs on par with the SOTA models while requiring fewer resources. These observations confirm the challenges in detecting factual errors from dialogue summaries, which call for further studies, for which our dataset and results offer a solid foundation.
Submitted 25 May, 2023;
originally announced May 2023.
-
RWKV: Reinventing RNNs for the Transformer Era
Authors:
Bo Peng,
Eric Alcaide,
Quentin Anthony,
Alon Albalak,
Samuel Arcadinho,
Stella Biderman,
Huanqi Cao,
Xin Cheng,
Michael Chung,
Matteo Grella,
Kranthi Kiran GV,
Xuzheng He,
Haowen Hou,
Jiaju Lin,
Przemyslaw Kazienko,
Jan Kocon,
Jiaming Kong,
Bartlomiej Koptyra,
Hayden Lau,
Krishna Sri Ipsit Mantri,
Ferdinand Mom,
Atsushi Saito,
Guangyu Song,
Xiangru Tang,
Bolun Wang
, et al. (9 additional authors not shown)
Abstract:
Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintaining constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.
Submitted 10 December, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation
Authors:
Miao Li,
Eduard Hovy,
Jey Han Lau
Abstract:
We present PeerSum, a novel dataset for generating meta-reviews of scientific papers. The meta-reviews can be interpreted as abstractive summaries of reviews, multi-turn discussions and the paper abstract. These source documents have rich inter-document relationships with an explicit hierarchical conversational structure, cross-references and (occasionally) conflicting information. To introduce the structural inductive bias into pre-trained language models, we introduce Rammer (Relationship-aware Multi-task Meta-review Generator), a model that uses sparse attention based on the conversational structure and a multi-task training objective that predicts metadata features (e.g., review ratings). Our experimental results show that Rammer outperforms other strong baseline models in terms of a suite of automatic evaluation metrics. Further analyses, however, reveal that Rammer and other models struggle to handle conflicts in source documents of PeerSum, suggesting meta-review generation is a challenging task and a promising avenue for further research.
Submitted 23 October, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Parallelization of the Symplectic Massive Body Algorithm (SyMBA) $N$-body Code
Authors:
Tommy Chi Ho Lau,
Man Hoi Lee
Abstract:
Direct $N$-body simulations of a large number of particles, especially in the study of planetesimal dynamics and planet formation, have been computationally challenging even with modern machines. This work presents the combination of fully parallelized $N^2/2$ interactions and the incorporation of the GENGA code's close encounter pair grouping strategy to enable MIMD parallelization of the Symplectic Massive Body Algorithm (SyMBA) with OpenMP on multi-core CPUs in a shared-memory environment. SyMBAp (SyMBA parallelized) preserves the symplectic nature of SyMBA and shows good scalability, with a speedup of 30.8 times with 56 cores in a simulation with 5,000 fully interactive particles.
Submitted 14 April, 2023;
originally announced April 2023.
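To put the reported scaling figure in context, the short sketch below turns the quoted speedup of 30.8 on 56 cores into a parallel efficiency. It also estimates the serial fraction that Amdahl's law would imply. Applying Amdahl's law here is an assumption made for illustration and is not an analysis from the paper; the function names are hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup: float, cores: int) -> float:
    """Serial fraction f solving S = 1 / (f + (1 - f) / p) for given S and p."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Figures quoted in the abstract: 30.8x speedup on 56 cores.
efficiency = parallel_efficiency(30.8, 56)
serial_fraction = amdahl_serial_fraction(30.8, 56)

print(f"parallel efficiency: {efficiency:.2f}")
print(f"implied serial fraction (Amdahl, assumed): {serial_fraction:.4f}")
```

Under these assumptions the efficiency is 0.55 and the implied serial fraction is about 1.5%. Real codes also lose time to synchronization and memory bandwidth, which this simple model ignores.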
-
Neural Preset for Color Style Transfer
Authors:
Zhanghan Ke,
Yuhao Liu,
Lei Zhu,
Nanxuan Zhao,
Rynson W. H. Lau
Abstract:
In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding artifacts and supporting high-resolution inputs with a small memory footprint. Second, we develop a two-stage pipeline by dividing the task into color normalization and stylization, which allows efficient style switching by extracting color styles as presets and reusing them on normalized input images. Due to the unavailability of pairwise datasets, we describe how to train Neural Preset via a self-supervised strategy. Various advantages of Neural Preset over existing methods are demonstrated through comprehensive evaluations. Notably, Neural Preset enables stable 4K color style transfer in real-time without artifacts. Besides, we show that our trained model can naturally support multiple applications without fine-tuning, including low-light image enhancement, underwater image correction, image dehazing, and image harmonization. Project page with demos: https://zhkkke.github.io/NeuralPreset
Submitted 24 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Reproduction number of SARS-CoV-2 Omicron variants, China, December 2022-January 2023
Authors:
Yuan Bai,
Zengyang Shao,
Xiao Zhang,
Ruohan Chen,
Lin Wang,
Sheikh Taslim Ali,
Tianmu Chen,
Eric H. Y. Lau,
Dong-Yan Jin,
Zhanwei Du
Abstract:
China adjusted the zero-COVID strategy in late 2022, triggering an unprecedented Omicron wave. We estimated the time-varying reproduction numbers of 32 provincial-level administrative divisions from December 2022 to January 2023. We found that the pooled estimate of initial reproduction numbers is 4.74 (95% CI: 4.41, 5.07).
Submitted 19 March, 2023;
originally announced March 2023.
-
DeltaScore: Fine-Grained Story Evaluation with Perturbations
Authors:
Zhuohan Xie,
Miao Li,
Trevor Cohn,
Jey Han Lau
Abstract:
Numerous evaluation metrics have been developed for natural language generation tasks, but their effectiveness in evaluating stories is limited as they are not specifically tailored to assess intricate aspects of storytelling, such as fluency and interestingness. In this paper, we introduce DELTASCORE, a novel methodology that employs perturbation techniques for the evaluation of nuanced story aspects. Our central proposition posits that the extent to which a story excels in a specific aspect (e.g., fluency) correlates with the magnitude of its susceptibility to particular perturbations (e.g., the introduction of typos). Given this, we measure the quality of an aspect by calculating the likelihood difference between pre- and post-perturbation states using pre-trained language models. We compare DELTASCORE with existing metrics on storytelling datasets from two domains in five fine-grained story aspects: fluency, coherence, relatedness, logicality, and interestingness. DELTASCORE demonstrates remarkable performance, revealing a surprising finding that a specific perturbation proves highly effective in capturing multiple aspects.
Submitted 2 November, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
MetaTroll: Few-shot Detection of State-Sponsored Trolls with Transformer Adapters
Authors:
Lin Tian,
Xiuzhen Zhang,
Jey Han Lau
Abstract:
State-sponsored trolls are the main actors of influence campaigns on social media and automatic troll detection is important to combat misinformation at scale. Existing troll detection models are developed based on training data for known campaigns (e.g., the influence campaign by Russia's Internet Research Agency on the 2016 US Election), and they fall short when dealing with novel campaigns with new targets. We propose MetaTroll, a text-based troll detection model based on the meta-learning framework that enables high portability and parameter-efficient adaptation to new campaigns using only a handful of labelled samples for few-shot transfer. We introduce campaign-specific transformer adapters to MetaTroll to "memorise" campaign-specific knowledge so as to tackle catastrophic forgetting, where a model "forgets" how to detect trolls from older campaigns due to continual adaptation. Our experiments demonstrate that MetaTroll substantially outperforms baselines and state-of-the-art few-shot text classification models. Lastly, we explore simple approaches to extend MetaTroll to multilingual and multimodal detection. Source code for MetaTroll is available at: https://github.com/ltian678/metatroll-code.git.
Submitted 13 March, 2023;
originally announced March 2023.
-
Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization
Authors:
Miao Li,
Jianzhong Qi,
Jey Han Lau
Abstract:
Multi-document summarization (MDS) aims to generate a summary for a number of related documents. We propose HGSUM, an MDS model that extends an encoder-decoder architecture, to incorporate a heterogeneous graph to represent different semantic units (e.g., words and sentences) of the documents. This contrasts with existing MDS models which do not consider different edge types of graphs and as such do not capture the diversity of relationships in the documents. To preserve only key information and relationships of the documents in the heterogeneous graph, HGSUM uses graph pooling to compress the input graph. And to guide HGSUM to learn compression, we introduce an additional objective that maximizes the similarity between the compressed graph and the graph constructed from the ground-truth summary during training. HGSUM is trained end-to-end with graph similarity and standard cross-entropy objectives. Experimental results over MULTI-NEWS, WCEP-100, and ARXIV show that HGSUM outperforms state-of-the-art MDS models. The code for our model and experiments is available at: https://github.com/oaimli/HGSum.
Submitted 11 March, 2023;
originally announced March 2023.