-
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Authors:
Yuxuan Zhang,
Tianheng Cheng,
Rui Hu,
Lei Liu,
Heng Liu,
Long** Ran,
Xiaoxin Chen,
Wenyu Liu,
Xinggang Wang
Abstract:
Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (EVF-SAM). EVF-SAM is a simple yet effective referring segmentation method which exploits multimodal prompts (i.e., image and text) and comprises a pre-trained vision-language model to generate referring prompts and a SAM model for segmentation. Surprisingly, we observe that: (1) multimodal prompts and (2) vision-language models with early fusion (e.g., BEIT-3) are beneficial for prompting SAM for accurate referring segmentation. Our experiments show that the proposed EVF-SAM based on BEIT-3 can obtain state-of-the-art performance on RefCOCO/+/g for referring expression segmentation and demonstrate the superiority of prompting SAM with early vision-language fusion. In addition, the proposed EVF-SAM with 1.32B parameters achieves remarkably higher performance while reducing nearly 82% of parameters compared to previous SAM methods based on large multimodal models.
Submitted 3 July, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
-
Recy-ctronics: Designing Fully Recyclable Electronics With Varied Form Factors
Authors:
Tingyu Cheng,
Zhihan Zhang,
Han Huang,
Yingting Gao,
Wei Sun,
Gregory D. Abowd,
HyunJoo Oh,
Josiah Hester
Abstract:
For today's electronics manufacturing process, the emphasis on stable functionality, durability, and fixed physical forms is designed to ensure long-term usability. However, this focus on robustness and permanence complicates the disassembly and recycling processes, leading to significant environmental repercussions. In this paper, we present three approaches that leverage easily recyclable materials, specifically polyvinyl alcohol (PVA) and liquid metal (LM), alongside accessible manufacturing techniques to produce electronic components and systems with versatile form factors. Our work centers on the development of recyclable electronics through three methods: 1) creating sheet electronics by screen printing LM traces on PVA substrates; 2) developing foam-based electronics by immersing mechanically stirred PVA foam into an LM solution; and 3) fabricating recyclable electronic tubes by injecting LM into mold cast PVA tubes, which can then be woven into various structures. To further assess the sustainability of our proposed methods, we conducted a life cycle assessment (LCA) to evaluate the environmental impact of our recyclable electronics in comparison to their conventional counterparts.
Submitted 13 June, 2024;
originally announced June 2024.
-
Towards Clinical AI Fairness: Filling Gaps in the Puzzle
Authors:
Mingxuan Liu,
Yilin Ning,
Salinelat Teixayavong,
Xiaoxuan Liu,
Mayli Mertens,
Yuqing Shang,
Xin Li,
Di Miao,
Jie Xu,
Daniel Shu Wei Ting,
Lionel Tim-Ee Cheng,
Jasmine Chiat Ling Ong,
Zhen Ling Teo,
Ting Fang Tan,
Narrendar RaviChandran,
Fei Wang,
Leo Anthony Celi,
Marcus Eng Hock Ong,
Nan Liu
Abstract:
The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness, a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical advancements and their practical clinical applications, resulting in a lack of contextualized discussion of AI fairness in clinical settings. Through a detailed evidence gap analysis, our review systematically pinpoints several deficiencies concerning both healthcare data and the provided AI fairness solutions. We highlight the scarcity of research on AI fairness in many medical domains where AI technology is increasingly utilized. Additionally, our analysis highlights a substantial reliance on group fairness, aiming to ensure equality among demographic groups from a macro healthcare system perspective; in contrast, individual fairness, focusing on equity at a more granular level, is frequently overlooked. To bridge these gaps, our review advances actionable strategies for both the healthcare and AI research communities. Beyond applying existing AI fairness methods in healthcare, we further emphasize the importance of involving healthcare professionals to refine AI fairness concepts and methods to ensure contextually relevant and ethically sound AI applications in healthcare.
Submitted 28 May, 2024;
originally announced May 2024.
-
Claw-free minimal matching covered graphs
Authors:
Yipei Zhang,
Xiumei Wang,
**jiang Yuan,
C. T. Ng,
T. C. E. Cheng
Abstract:
A matching covered graph $G$ is minimal if for each edge $e$ of $G$, $G-e$ is not matching covered. An edge $e$ of a matching covered graph $G$ is removable if $G-e$ is also matching covered. Thus a matching covered graph is minimal if and only if it is free of removable edges. For bipartite graphs, Lovász and Plummer gave a characterization of bipartite minimal matching covered graphs. For bricks, Lovász showed that the only bricks that are minimal matching covered are $K_4$ and $\overline{C_6}$. In this paper, we present a complete characterization of minimal matching covered graphs that are claw-free. Moreover, for cubic claw-free matching covered graphs that are not minimal matching covered, we obtain the number of their removable edges (with respect to their bricks), and then prove that they have at least 12 removable edges (the bound is sharp).
Submitted 27 May, 2024;
originally announced May 2024.
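The definitions in the abstract above (removable edge, minimal matching covered graph) can be checked by brute force on small graphs. The sketch below is illustrative only and is not the paper's method: it assumes the standard definition of a matching covered graph (connected, at least one edge, every edge in some perfect matching), and all function names are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """Represent an undirected edge as a frozenset of its two endpoints."""
    return frozenset((u, v))


def perfect_matchings(vertices, edges):
    """Yield every perfect matching (a frozenset of edges) by brute force."""
    if not vertices:
        yield frozenset()
        return
    v = min(vertices)  # the smallest unmatched vertex must be matched next
    for e in edges:
        if v in e:
            rest = vertices - e
            rest_edges = {f for f in edges if not (f & e)}
            for m in perfect_matchings(rest, rest_edges):
                yield m | {e}


def is_connected(vertices, edges):
    """Depth-first search over the edge set."""
    if not vertices:
        return False
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        for e in edges:
            if v in e:
                stack.extend(e - {v})
    return seen == set(vertices)


def is_matching_covered(vertices, edges):
    """Assumed definition: connected, nonempty, every edge in a perfect matching."""
    if not edges or not is_connected(vertices, edges):
        return False
    covered = set()
    for m in perfect_matchings(vertices, edges):
        covered |= m
    return covered == set(edges)


def removable_edges(vertices, edges):
    """Edges e such that G - e is still matching covered."""
    return {e for e in edges if is_matching_covered(vertices, edges - {e})}


def is_minimal(vertices, edges):
    """Matching covered with no removable edge."""
    return is_matching_covered(vertices, edges) and not removable_edges(vertices, edges)


# Small test graphs: K_4, the triangular prism (complement of C_6), and K_{3,3}.
K4 = (frozenset(range(4)), {edge(u, v) for u, v in combinations(range(4), 2)})
PRISM = (
    frozenset(range(6)),
    {edge(0, 1), edge(1, 2), edge(0, 2), edge(3, 4), edge(4, 5), edge(3, 5),
     edge(0, 3), edge(1, 4), edge(2, 5)},
)
K33 = (frozenset(range(6)), {edge(a, b) for a in range(3) for b in range(3, 6)})
```

On these inputs, `is_minimal` holds for $K_4$ and for the prism $\overline{C_6}$, matching the two bricks named in the abstract, while every edge of $K_{3,3}$ is removable. The enumeration is exponential, so this is only suitable for very small graphs.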
-
OzDES Reverberation Mapping Program: Stacking analysis with H$β$, Mg II and C IV
Authors:
Umang Malik,
Rob Sharp,
A. Penton,
Z. Yu,
P. Martini,
B. E. Tucker,
T. M. Davis,
G. F. Lewis,
C. Lidman,
M. Aguena,
O. Alves,
J. Annis,
J. Asorey,
D. Bacon,
D. Brooks,
A. Carnero Rosell,
J. Carretero,
T. -Y. Cheng,
L. N. da Costa,
M. E. S. Pereira,
J. De Vicente,
P. Doel,
I. Ferrero,
J. Frieman,
G. Giannini
, et al. (25 additional authors not shown)
Abstract:
Reverberation mapping is the leading technique used to measure direct black hole masses outside of the local Universe. Additionally, reverberation measurements calibrate secondary mass-scaling relations used to estimate single-epoch virial black hole masses. The Australian Dark Energy Survey (OzDES) conducted one of the first multi-object reverberation mapping surveys, monitoring 735 AGN up to $z\sim4$, over 6 years. The limited temporal coverage of the OzDES data has hindered recovery of individual measurements for some classes of sources, particularly those with shorter reverberation lags or lags that fall within campaign season gaps. To alleviate this limitation, we perform a stacking analysis of the cross-correlation functions of sources with similar intrinsic properties to recover average composite reverberation lags. This analysis leads to the recovery of average lags in each redshift-luminosity bin across our sample. We present the average lags recovered for the H$β$, Mg II and C IV samples, as well as multi-line measurements for redshift bins where two lines are accessible. The stacking analysis is consistent with the Radius-Luminosity relations for each line. Our results for the H$β$ sample demonstrate that stacking has the potential to improve upon constraints on the $R-L$ relation, which have been derived only from individual source measurements until now.
Submitted 9 May, 2024;
originally announced May 2024.
-
ZeST: Zero-Shot Material Transfer from a Single Image
Authors:
Ta-Ying Cheng,
Prafull Sharma,
Andrew Markham,
Niki Trigoni,
Varun Jampani
Abstract:
We propose ZeST, a method for zero-shot material transfer to an object in the input image given a material exemplar image. ZeST leverages existing diffusion adapters to extract an implicit material representation from the exemplar image. This representation is used to transfer the material onto the object in the input image with a pre-trained inpainting diffusion model, using depth estimates as a geometry cue and grayscale object shading as illumination cues. The method works on real images without any training, resulting in a zero-shot approach. Both qualitative and quantitative results on real and synthetic datasets demonstrate that ZeST outputs photorealistic images with transferred materials. We also show the application of ZeST to perform multiple edits and robust material assignment under different illuminations. Project Page: https://ttchengab.github.io/zest
Submitted 9 April, 2024;
originally announced April 2024.
-
SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors
Authors:
Chenyang Ma,
Kai Lu,
Ta-Ying Cheng,
Niki Trigoni,
Andrew Markham
Abstract:
Current state-of-the-art spatial reasoning-enhanced VLMs are trained to excel at spatial visual question answering (VQA). However, we believe that higher-level 3D-aware tasks, such as articulating dynamic scene changes and motion planning, require a fundamental and explicit 3D understanding beyond current spatial VQA datasets. In this work, we present SpatialPIN, a framework designed to enhance the spatial reasoning capabilities of VLMs through prompting and interacting with priors from multiple 3D foundation models in a zero-shot, training-free manner. Extensive experiments demonstrate that our spatial reasoning-imbued VLM performs well on various forms of spatial VQA and can extend to help in various downstream robotics tasks such as pick and stack and trajectory planning.
Submitted 6 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
The XMM Cluster Survey: Automating the estimation of hydrostatic mass for large samples of galaxy clusters I -- Methodology, Validation, & Application to the SDSSRM-XCS sample
Authors:
D. J. Turner,
P. A. Giles,
A. K. Romer,
J. Pilling,
T. K. Lingard,
R. Wilkinson,
M. Hilton,
E. W. Upsdell,
R. Al-Serkal,
T. Cheng,
R. Eappen,
P. J. Rooney,
S. Bhargava,
C. A. Collins,
J. Mayers,
C. Miller,
R. C. Nichol,
M. Sahén,
P. T. P. Viana
Abstract:
We describe features of the X-ray: Generate and Analyse (XGA) open-source software package that have been developed to facilitate automated hydrostatic mass ($M_{\rm hydro}$) measurements from XMM X-ray observations of clusters of galaxies. This includes describing how XGA measures global, and radial, X-ray properties of galaxy clusters. We then demonstrate the reliability of XGA by comparing simple X-ray properties, namely the X-ray temperature and gas mass, with published values presented by the XMM Cluster Survey (XCS), the Ultimate XMM eXtragaLactic survey project (XXL), and the Local Cluster Substructure Survey (LoCuSS). XGA-measured values for temperature are, on average, within 1% of the values reported in the literature for each sample. XGA gas masses for XXL clusters are shown to be ${\sim}$10% lower than previous measurements (though the difference is only significant at the $\sim$1.8$σ$ level); LoCuSS $R_{2500}$ and $R_{500}$ gas mass re-measurements are 3% and 7% lower, respectively (representing a 1.5$σ$ and 3.5$σ$ difference). Like-for-like comparisons of hydrostatic mass are made to LoCuSS results, which show that our measurements are $10{\pm}3\%$ ($19{\pm}7\%$) higher for $R_{2500}$ ($R_{500}$). The comparison between $R_{500}$ masses shows significant scatter. Finally, we present new $M_{\rm hydro}$ measurements for 104 clusters from the SDSS DR8 redMaPPer XCS sample (SDSSRM-XCS). Our SDSSRM-XCS hydrostatic mass measurements are in good agreement with multiple literature estimates, and represent one of the largest samples of consistently measured hydrostatic masses. We have demonstrated that XGA is a powerful tool for X-ray analysis of clusters; it will render complex-to-measure X-ray properties accessible to non-specialists.
Submitted 12 March, 2024;
originally announced March 2024.
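For context on the quantity $M_{\rm hydro}$ in the abstract above, the textbook hydrostatic-equilibrium mass estimate for a spherically symmetric cluster is sketched below. The abstract does not state the exact form XGA implements, so this is an assumption; the symbols $T(r)$ (gas temperature profile), $\rho_{\rm g}(r)$ (gas density profile) and $\mu$ (mean molecular weight) are introduced here and do not appear in the source.

```latex
M_{\rm hydro}(<r) = -\frac{k_{\rm B}\, T(r)\, r}{G\, \mu\, m_{\rm p}}
\left[ \frac{{\rm d}\ln \rho_{\rm g}(r)}{{\rm d}\ln r}
     + \frac{{\rm d}\ln T(r)}{{\rm d}\ln r} \right]
```

Both logarithmic slopes are typically negative at large radii, so the enclosed mass comes out positive. This also shows why radial temperature and density profiles, and not only global properties, are needed for the measurement.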
-
Dark Energy Survey Year 3 results: likelihood-free, simulation-based $w$CDM inference with neural compression of weak-lensing map statistics
Authors:
N. Jeffrey,
L. Whiteway,
M. Gatti,
J. Williamson,
J. Alsing,
A. Porredon,
J. Prat,
C. Doux,
B. Jain,
C. Chang,
T. -Y. Cheng,
T. Kacprzak,
P. Lemos,
A. Alarcon,
A. Amon,
K. Bechtol,
M. R. Becker,
G. M. Bernstein,
A. Campos,
A. Carnero Rosell,
R. Chen,
A. Choi,
J. DeRose,
A. Drlica-Wagner,
K. Eckert
, et al. (66 additional authors not shown)
Abstract:
We present simulation-based cosmological $w$CDM inference using Dark Energy Survey Year 3 weak-lensing maps, via neural data compression of weak-lensing map summary statistics: power spectra, peak counts, and direct map-level compression/inference with convolutional neural networks (CNN). Using simulation-based inference, also known as likelihood-free or implicit inference, we use forward-modelled mock data to estimate posterior probability distributions of unknown parameters. This approach allows all statistical assumptions and uncertainties to be propagated through the forward-modelled mock data; these include sky masks, non-Gaussian shape noise, shape measurement bias, source galaxy clustering, photometric redshift uncertainty, intrinsic galaxy alignments, non-Gaussian density fields, neutrinos, and non-linear summary statistics. We include a series of tests to validate our inference results. This paper also describes the Gower Street simulation suite: 791 full-sky PKDGRAV dark matter simulations, with cosmological model parameters sampled with a mixed active-learning strategy, from which we construct over 3000 mock DES lensing data sets. For $w$CDM inference, for which we allow $-1<w<-\frac{1}{3}$, our most constraining result uses power spectra combined with map-level (CNN) inference. Using gravitational lensing data only, this map-level combination gives $Ω_{\rm m} = 0.283^{+0.020}_{-0.027}$, ${S_8 = 0.804^{+0.025}_{-0.017}}$, and $w < -0.80$ (with a 68 per cent credible interval); compared to the power spectrum inference, this is more than a factor of two improvement in dark energy parameter ($Ω_{\rm DE}, w$) precision.
Submitted 4 March, 2024;
originally announced March 2024.
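The idea of likelihood-free inference from forward-modelled mock data, described in the abstract above, can be illustrated with a toy example. The sketch below uses plain rejection sampling (approximate Bayesian computation), which is a much simpler technique than the neural compression and inference used in the paper. The simulator, prior range, tolerance and function names are all hypothetical.

```python
import random


def simulate_summary(theta, n, rng):
    """Toy forward model: sample mean of n draws from Normal(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def rejection_abc(observed, n, draws, tol, rng):
    """Keep prior draws whose simulated summary lands within tol of the data.

    The accepted parameter values approximate the posterior without ever
    evaluating a likelihood; only forward simulations are needed.
    """
    accepted = []
    for _ in range(draws):
        theta = rng.uniform(-3.0, 3.0)  # assumed flat prior on [-3, 3]
        if abs(simulate_summary(theta, n, rng) - observed) < tol:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
posterior = rejection_abc(observed=0.8, n=20, draws=5000, tol=0.1, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

Rejection sampling scales poorly with the dimension of the summary statistic, which is one common motivation for compressing the data to a few informative numbers before inference.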
-
FedFDP: Fairness-Aware Federated Learning with Differential Privacy
Authors:
Xinpeng Ling,
Jie Fu,
Kuncan Wang,
Huifa Li,
Tong Cheng,
Zhili Chen
Abstract:
Federated learning (FL) is a new machine learning paradigm to overcome the challenge of data silos and has garnered significant attention. However, through our observations, a globally effective trained model may exhibit performance disparities across different clients. This implies that the models jointly trained by clients may lead to unfair outcomes. On the other hand, relevant studies indicate that the transmission of gradients or models in federated learning can also give rise to privacy leakage issues, such as membership inference attacks.
To address the first issue mentioned above, we propose a fairness-aware federated learning algorithm, termed FedFair. Building upon FedFair, we introduce privacy protection to form the FedFDP algorithm to address the second issue mentioned above. In FedFDP, we devise a fairness-aware clipping strategy to achieve differential privacy while adjusting fairness. Additionally, for the extra uploaded loss values, we present an adaptive clipping approach to maximize utility. Furthermore, we theoretically prove that our algorithm converges and ensures differential privacy. Lastly, extensive experimental results demonstrate that FedFair and FedFDP significantly outperform state-of-the-art solutions in terms of model performance and fairness. Code and data are accessible at https://anonymous.4open.science/r/FedFDP-5607.
Submitted 20 May, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
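The abstract above does not spell out its fairness-aware or adaptive clipping rules, so the sketch below shows only the generic step they build on: clipping an update to a fixed L2 norm and adding Gaussian noise, as commonly done in differentially private training. It is not FedFDP itself, and the function names and parameters are hypothetical.

```python
import math
import random


def clip_by_norm(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm (no-op if already small)."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def privatize(vec, max_norm, noise_multiplier, rng):
    """Clip, then add Gaussian noise with std = noise_multiplier * max_norm.

    Clipping bounds each client's influence (the sensitivity); the noise
    scale is tied to that bound. Privacy accounting is out of scope here.
    """
    clipped = clip_by_norm(vec, max_norm)
    sigma = noise_multiplier * max_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


rng = random.Random(0)
noisy_update = privatize([3.0, 4.0], max_norm=1.0, noise_multiplier=0.5, rng=rng)
```

A fixed `max_norm` treats every client identically; an adaptive or fairness-aware scheme would presumably vary the threshold per client or per round, which this sketch does not attempt.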
-
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Authors:
Chun-Hsiao Yeh,
Ta-Ying Cheng,
He-Yen Hsieh,
Chuan-En Lin,
Yi Ma,
Andrew Markham,
Niki Trigoni,
H. T. Kung,
Yubei Chen
Abstract:
Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models. First, current personalization techniques fail to reliably extend to multiple concepts -- we hypothesize this to be due to the mismatch between complex scenes and simple text descriptions in the pre-training dataset (e.g., LAION). Second, given an image containing multiple personalized concepts, there is no holistic metric that evaluates performance on not just the degree of resemblance of personalized concepts, but also whether all concepts are present in the image and whether the image accurately reflects the overall text description. To address these issues, we introduce Gen4Gen, a semi-automated dataset creation pipeline utilizing generative models to combine personalized concepts into complex compositions along with text descriptions. Using this, we create a dataset called MyCanvas that can be used to benchmark the task of multi-concept personalization. In addition, we design a comprehensive metric comprising two scores (CP-CLIP and TI-CLIP) for better quantifying the performance of multi-concept, personalized text-to-image diffusion methods. We provide a simple baseline built on top of Custom Diffusion with empirical prompting strategies for future researchers to evaluate on MyCanvas. We show that by improving data quality and prompting strategies, we can significantly increase multi-concept personalized image generation quality, without requiring any modifications to model architecture or training algorithms.
Submitted 23 February, 2024;
originally announced February 2024.
-
Learning Continuous 3D Words for Text-to-Image Generation
Authors:
Ta-Ying Cheng,
Matheus Gadelha,
Thibault Groueix,
Matthew Fisher,
Radomir Mech,
Andrew Markham,
Niki Trigoni
Abstract:
Current controls over diffusion models (e.g., through text or ControlNet) for image generation fall short in recognizing abstract, continuous attributes like illumination direction or non-rigid shape change. In this paper, we present an approach for allowing users of text-to-image models to have fine-grained control of several attributes in an image. We do this by engineering special sets of input tokens that can be transformed in a continuous manner -- we call them Continuous 3D Words. These attributes can, for example, be represented as sliders and applied jointly with text prompts for fine-grained control over image generation. Given only a single mesh and a rendering engine, we show that our approach can be adopted to provide continuous user control over several 3D-aware attributes, including time-of-day illumination, bird wing orientation, dollyzoom effect, and object poses. Our method is capable of conditioning image creation with multiple Continuous 3D Words and text descriptions simultaneously while adding no overhead to the generative process. Project Page: https://ttchengab.github.io/continuous_3d_words
Submitted 13 February, 2024;
originally announced February 2024.
-
Automating the audit of electronic invoices with a soft robot
Authors:
Tian Jun Cheng,
Chia Jung Chen,
Yao Lin Ong,
Yi Fang Yang,
Guang Yih Sheu
Abstract:
Taiwan's Chi Mei Medical Center has met four challenges raised in published robotic process automation (RPA) studies: automating a dynamic process, designing feasible human-robot collaboration, incorporating other emerging technologies, and bringing positive business impacts. Its executives convened a committee to implement electronic invoicing. This implementation includes the creation of a software robot that automatically downloads cloud electronic invoice (E-invoice) data from Taiwan's E-invoice platform and detects inconsistencies between that data and on-premise data. The bot operates while internal auditors are out of the office. The auditors were satisfied with this software robot, since the only remaining work is verifying the detected inconsistencies. The Chi Mei Medical Center measured the time and costs before and after adopting software robots to audit E-invoices; consequently, it welcomed more bots to automate other business processes. In conclusion, integrating a software robot with other emerging technologies mitigates the possible errors produced by this bot. Good human-robot collaboration relies on considering the human perspective when choosing RPA tasks. Free bot creators are sufficient to verify that automating a business process using a bot is a reasonable investment.
Submitted 6 February, 2024;
originally announced February 2024.
-
Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum
Authors:
Tin Sum Cheng,
Aurelien Lucchi,
Anastasis Kratsios,
David Belius
Abstract:
We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.
Submitted 29 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
YOLO-World: Real-Time Open-Vocabulary Object Detection
Authors:
Tianheng Cheng,
Lin Song,
Yixiao Ge,
Wenyu Liu,
Xinggang Wang,
Ying Shan
Abstract:
The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.
Submitted 22 February, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Objects With Lighting: A Real-World Dataset for Evaluating Reconstruction and Rendering for Object Relighting
Authors:
Benjamin Ummenhofer,
Sanskar Agrawal,
Rene Sepulveda,
Yixing Lao,
Kai Zhang,
Tianhang Cheng,
Stephan Richter,
Shenlong Wang,
German Ros
Abstract:
Reconstructing an object from photos and placing it virtually in a new environment goes beyond the standard novel view synthesis task, as the appearance of the object has to adapt not only to the novel viewpoint but also to the new lighting conditions. Yet evaluations of inverse rendering methods rely on novel view synthesis data or simplistic synthetic datasets for quantitative analysis. This work presents a real-world dataset for measuring the reconstruction and rendering of objects for relighting. To this end, we capture the environment lighting and ground truth images of the same objects in multiple environments, allowing us to reconstruct the objects from images taken in one environment and quantify the quality of the rendered views for the unseen lighting environments. Further, we introduce a simple baseline composed of off-the-shelf methods, test several state-of-the-art methods on the relighting task, and show that novel view synthesis is not a reliable proxy to measure performance. Code and dataset are available at https://github.com/isl-org/objects-with-lighting .
Submitted 13 April, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects
Authors:
Tianhang Cheng,
Wei-Chiu Ma,
Kaiyu Guan,
Antonio Torralba,
Shenlong Wang
Abstract:
Our world is full of identical objects (e.g., cans of coke, cars of the same model). These duplicates, when seen together, provide additional and strong cues for us to effectively reason about 3D. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multiple identical objects. SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6DoF pose for all instances. An inverse graphics pipeline is subsequently employed to jointly reason about the shape, material of the object, and the environment light, while adhering to the shared geometry and material constraint across instances. Our primary contributions involve utilizing object duplicates as a robust prior for single-image inverse graphics and proposing an in-plane rotation-robust Structure from Motion (SfM) formulation for joint 6-DoF object pose estimation. By leveraging multi-view cues from a single image, SfD generates more realistic and detailed 3D reconstructions, significantly outperforming existing single image reconstruction models and multi-view reconstruction approaches with a similar or greater number of observations.
Submitted 10 January, 2024;
originally announced January 2024.
-
Magnetar-powered Neutrinos and Magnetic Moment Signatures at IceCube
Authors:
Ting Cheng,
Hao-Jui Kuan,
Ying-Ying Li,
Vedran Brdar
Abstract:
The IceCube collaboration pioneered the detection of $\mathcal{O}{(\text{PeV})}$ neutrino events and the identification of astrophysical sources of high-energy neutrinos. In this study, we explore scenarios in which high-energy neutrinos are produced in the vicinity of astrophysical objects with strong magnetic field, such as magnetars. While propagating through such magnetic field, neutrinos experience spin precession induced by their magnetic moments, and this impacts their helicity and flavor composition at Earth. Considering both flavor composition of high-energy neutrinos and Glashow resonance events we find that detectable signatures may arise at neutrino telescopes, such as IceCube, for presently unconstrained neutrino magnetic moments in the range between $\mathcal{O}(10^{-15})~μ_B$ and $\mathcal{O}(10^{-12})~μ_B$.
Submitted 10 June, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Infrared anomalies in ultrathin Ti3C2Tx MXene films
Authors:
Meng Li,
Tao Cheng,
Gongze Liu,
He Huang,
Keqiao Li,
Yang Li,
Jiayue Yang,
Baoling Huang
Abstract:
Visible transparent but infrared reflective materials are ideal candidates for both transparent conductive films and low-emissivity glass, which are highly desired in a broad variety of areas such as touchscreens and displays, photovoltaics, smart windows, and antistatic coatings. Ultrathin Ti3C2Tx MXene films are emerging as promising low-emissivity transparent candidates. However, the fundamental IR properties of Ti3C2Tx have not been revealed experimentally due to daunting challenges in the preparation of continuous, large-area, and ultrathin films of optical quality on flat substrates. Herein, we proposed a tape-free transfer method that can help prepare centimeter-size and ultrathin (down to 8 nm) Ti3C2Tx films on diverse optical substrates. Benefitting from this method, the refractive index and permittivity for Ti3C2Tx were successfully measured. Ti3C2Tx films exhibit large in-plane permittivity in the IR region, yielding maximum IR reflectance of 88% for bulk films. Interestingly, three anomalies were found in ultrathin Ti3C2Tx films: strong dispersion in the permittivity, interlayer space-dependent optical properties, and abnormally high IR absorption for a 15-nm-thick film. These anomalies are important guidelines in the design of Ti3C2Tx-based low-emissivity transparent films and other related devices, and may inspire other intriguing applications such as ultrathin IR absorption coatings and tunable IR optical devices.
Submitted 15 December, 2023;
originally announced December 2023.
-
An Incremental Unified Framework for Small Defect Inspection
Authors:
Jiaqi Tang,
Hao Lu,
Xiaogang Xu,
Ruizheng Wu,
Sixing Hu,
Tong Zhang,
Tsz Wa Cheng,
Ming Ge,
Ying-Cong Chen,
Fugee Tsung
Abstract:
Artificial Intelligence (AI)-driven defect inspection is pivotal in industrial manufacturing. Yet, many methods, tailored to specific pipelines, grapple with diverse product portfolios and evolving processes. Addressing this, we present the Incremental Unified Framework (IUF), which can reduce the feature conflict problem when continuously integrating new objects in the pipeline, making it advantageous in object-incremental learning scenarios. Employing a state-of-the-art transformer, we introduce Object-Aware Self-Attention (OASA) to delineate distinct semantic boundaries. Semantic Compression Loss (SCL) is integrated to optimize non-primary semantic space, enhancing network adaptability for novel objects. Additionally, we prioritize retaining the features of established objects during weight updates. Demonstrating prowess in both image and pixel-level defect inspection, our approach achieves state-of-the-art performance, proving indispensable for dynamic and scalable industrial inspections. Our code will be released at https://github.com/jqtangust/IUF.
Submitted 24 January, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions
Authors:
Ziheng Zeng,
Kellen Tan Cheng,
Srihari Venkat Nanniyur,
Jianing Zhou,
Suma Bhat
Abstract:
Idiomatic expression (IE) processing and comprehension have challenged pre-trained language models (PTLMs) because their meanings are non-compositional. Unlike prior works that enable IE comprehension through fine-tuning PTLMs with sentences containing IEs, in this work, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. This extends the established ATOMIC2020 graph, converting PTLMs into knowledge models (KMs) that encode and infer commonsense knowledge related to IE use. Experiments show that various PTLMs can be converted into KMs with IEKG. We verify the quality of IEKG and the ability of the trained KMs with automatic and human evaluation. Through applications in natural language understanding, we show that a PTLM injected with knowledge from IEKG exhibits improved IE comprehension ability and can generalize to IEs unseen during training.
Submitted 10 December, 2023;
originally announced December 2023.
-
A Unified Particle-Based Solver for Non-Newtonian Behaviors Simulation
Authors:
Chunlei Li,
Yang Gao,
Jiayi He,
Tianwei Cheng,
Shuai Li,
Aimin Hao,
Hong Qin
Abstract:
In this paper, we present a unified framework to simulate non-Newtonian behaviors. We combine viscous and elasto-plastic stress into a unified particle solver to achieve various non-Newtonian behaviors ranging from fluid-like to solid-like. Our constitutive model is based on a Generalized Maxwell model, which incorporates viscosity, elasticity and plasticity in one non-linear framework in a unified way. On the one hand, taking advantage of the viscous term, we construct a series of strain-rate dependent models for classical non-Newtonian behaviors such as shear-thickening, shear-thinning, Bingham plastic, etc. On the other hand, benefiting from the elasto-plastic model, we empower our framework with the ability to simulate solid-like non-Newtonian behaviors, i.e., visco-elasticity/plasticity. In addition, we enrich our method with a heat diffusion model to make our method flexible in simulating phase change. Through sufficient experiments, we demonstrate a wide range of non-Newtonian behaviors ranging from viscous fluid to deformable objects. We believe this non-Newtonian model will enhance the realism of physically-based animation, which has great potential for computer graphics.
Submitted 7 December, 2023;
originally announced December 2023.
-
A General Framework for Portfolio Construction Based on Generative Models of Asset Returns
Authors:
Tuoyuan Cheng,
Kan Chen
Abstract:
In this paper, we present an integrated approach to portfolio construction and optimization, leveraging high-performance computing capabilities. We first explore diverse pairings of generative model forecasts and objective functions used for portfolio optimization, which are evaluated using performance-attribution models based on LASSO. We illustrate our approach using extensive simulations of crypto-currency portfolios, and we show that the portfolios constructed using the vine-copula generative model and the Sharpe-ratio objective function consistently outperform. To accommodate a wide array of investment strategies, we further investigate portfolio blending and propose a general framework for evaluating and combining investment strategies. We employ an extension of the multi-armed bandit framework and use value models and policy models to construct eclectic blended portfolios based on past performance. We consider similarity and optimality measures for value models and employ probability-matching ("blending") and a greedy algorithm ("switching") for policy models. The eclectic portfolios are also evaluated using LASSO models. We show that the value model utilizing cosine similarity and logit optimality consistently delivers robust superior performances. The extent of outperformance by eclectic portfolios over their benchmarks significantly surpasses that achieved by individual generative model-based portfolios over their respective benchmarks.
Submitted 6 December, 2023;
originally announced December 2023.
-
A novel method of restoration path optimization for the AC-DC bulk power grid after a major blackout
Authors:
Chao Yang,
Gaoshen Liang,
Tianle Cheng,
Yang Li,
Shaoyan Li
Abstract:
The restoration control of the modern alternating current-direct current (AC-DC) hybrid power grid after a major blackout is difficult and complex. Taking into account the interaction between the line-commutated converter high-voltage direct current (LCC-HVDC) and the AC power grid, this paper proposes a novel optimization method of restoration path to reconfigure the skeleton network for the blackout power grid. Based on the system strength, the supporting capability of the AC power grid for the LCC-HVDC is first analysed from the aspects of start-up and operation of LCC-HVDCs. Subsequently, the quantitative relationship between the restoration path and the restoration characteristic of LCC-HVDC is derived in detail based on the system strength indices of the short-circuit capacity and the frequency regulation capability. Then, an optimization model of restoration path considering non-tree paths is formulated and a feasible optimization algorithm is proposed to achieve the optimal path restoration scheme. A modified IEEE 39-bus system and a partial power grid of Southwest China are simulated to show that the proposed method is suitable for the restoration of AC-DC power grids and can improve restoration efficiency. This research can be an important guidance for operators to rapidly restore the AC-DC power grid.
Submitted 27 October, 2023;
originally announced November 2023.
-
Converse Flexoelectricity of Low-Dimensional Bismuth Selenite (Bi2Se3) Revealed by Piezoresponse Force Microscopy (PFM)
Authors:
Qiong Liu,
S. S. Nanthakumar,
Bin Li,
Teresa Cheng,
Florian Bittner,
Chenxi Ma,
Fei Ding,
Lei Zheng,
Bernhard Roth,
Xiaoying Zhuang
Abstract:
Many kinds of two-dimensional (2D) van der Waals (vdW) materials have been demonstrated to exhibit electromechanical coupling effects, which makes them promising candidates for next-generation devices, such as piezotronics and nanogenerators. Recently, flexoelectricity was found to account for the out-of-plane electromechanical coupling in many 2D transition metal dichalcogenides (TMDs) that only exhibit in-plane piezoelectricity. However, low dimensional vdW three-dimensional (3D) topological insulators (TIs) have been overlooked regarding their electromechanical properties. In this study, for the first time, we experimentally investigate the electromechanical coupling of low dimensional 3D TIs with a centrosymmetric crystal structure, where a binary compound, bismuth selenite (Bi2Se3), is taken as an example. The results of piezoresponse force microscope (PFM) tests on the Bi2Se3 nanoflakes show that the material exhibits both out-of-plane and in-plane electromechanical responses. The Bi2Se3 nanoflake with a thickness of 37 nm possesses an effective out-of-plane piezoelectric coefficient of ~0.65 pm V$^{-1}$. With careful analyses, the electromechanical responses are verified to arise from the converse flexoelectricity. The measured effective out-of-plane piezoelectric coefficient is mainly contributed by flexoelectric coefficient, μ_39, which is estimated to be approximately 0.13 nC m$^{-1}$. However, it is rather difficult to obtain the in-plane component of the flexoelectric tensor from the in-plane PFM measurements since the direction of the in-plane stress is always not normal to the AFM cantilever axis. The results provide useful guidance for understanding the flexoelectric effect of low dimensional vdW materials with centrosymmetric crystal structures. Moreover, the work can pave the way to explore the electromechanical devices based on the flexoelectricity of vdW TIs.
Submitted 10 November, 2023;
originally announced November 2023.
-
3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets
Authors:
Ta-Ying Cheng,
Matheus Gadelha,
Soren Pirk,
Thibault Groueix,
Radomir Mech,
Andrew Markham,
Niki Trigoni
Abstract:
We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image representations to cluster images with geometrically similar shapes and find common image correspondences between them. We then exploit these correspondences to obtain rough camera estimates as initialization for bundle-adjustment. Finally, for every image cluster, we apply a progressive bundle-adjusting reconstruction method to learn a neural occupancy field representing the underlying shape. We show that this procedure is robust to several types of errors introduced in previous steps (e.g., wrong camera poses, images containing dissimilar shapes, etc.), allowing us to obtain shape and pose annotations for images in-the-wild. When using images from Pix3D chairs, our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques, both quantitatively and qualitatively. Furthermore, we show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset. Project Page: https://ttchengab.github.io/3dminerOfficial
Submitted 29 October, 2023;
originally announced October 2023.
-
Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers
Authors:
Daniela Teodorescu,
Tiffany Cheng,
Alona Fyshe,
Saif M. Mohammad
Abstract:
Research in psychopathology has shown that, at an aggregate level, the patterns of emotional change over time -- emotion dynamics -- are indicators of one's mental health. One's patterns of emotion change have traditionally been determined through self-reports of emotions; however, there are known issues with accuracy, bias, and ease of data collection. Recent approaches to determining emotion dynamics from one's everyday utterances address many of these concerns, but it is not yet known whether these measures of utterance emotion dynamics (UED) correlate with mental health diagnoses. Here, for the first time, we study the relationship between tweet emotion dynamics and mental health disorders. We find that each of the UED metrics studied varied by the user's self-disclosed diagnosis. For example: average valence was significantly higher (i.e., more positive text) in the control group compared to users with ADHD, MDD, and PTSD. Valence variability was significantly lower in the control group compared to ADHD, depression, bipolar disorder, MDD, PTSD, and OCD but not PPD. Rise and recovery rates of valence also exhibited significant differences from the control. This work provides important early evidence for how linguistic cues pertaining to emotion dynamics can play a crucial role as biosocial markers for mental illnesses and aid in the understanding, diagnosis, and management of mental health disorders.
Submitted 4 November, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Boosting output performance of contact-separation mode triboelectric nanogenerators by adopting discontinuity and fringing effect: experiment and modelling studies
Authors:
Teresa Cheng,
Han Hu,
Navid Valizadeh,
Qiong Liu,
Florian Bittner,
Ling Yang,
Timon Rabczuk,
Xiaoning Jiang,
Xiaoying Zhuang
Abstract:
Triboelectric nanogenerators (TENGs) are promising self-powering supplies for a diverse range of intelligent sensing and monitoring devices, especially due to their capability of harvesting electric energy from low frequency and small-scale mechanical motions. Inspired by the fact that contact-separation mode TENGs with small contact areas harvest high electrical outputs due to fringing effect, this study employed discontinuity on the dielectric side of contact-separation mode TENGs to promote fringing electric fields for the enhancement of electrical outputs. The results reveal that the TENGs with more discontinuities show higher overall electric performance. Compared to pristine TENGs, the TENGs with cross discontinuities increased the surface charge by 50% and the power density by 114%. However, one should avoid generating discontinuities on tribonegative side of TENGs using metal blade within a positive-ion atmosphere due to the neutralization through electrically conductive metal blade. The computational simulation validated that the TENGs with discontinuities obtained higher electrical outputs, and further investigated the effect of discontinuity gap size and array distance on TENGs performance. This study has provided a promising method for the future design of TENGs using discontinuous structures.
Submitted 25 October, 2023;
originally announced October 2023.
-
Properly colored even cycles in edge-colored complete balanced bipartite graphs
Authors:
Shanshan Guo,
Fei Huang,
Jinjiang Yuan,
C. T. Ng,
T. C. E. Cheng
Abstract:
Consider a complete balanced bipartite graph $K_{n,n}$ and let $K^c_{n,n}$ be an edge-colored version of $K_{n,n}$ that is obtained from $K_{n,n}$ by having each edge assigned a certain color. A subgraph $H$ of $K^c_{n,n}$ is called properly colored (PC) if every two adjacent edges of $H$ have distinct colors. $K_{n,n}^c$ is called properly vertex-even-pancyclic if for every vertex $u\in V(K_{n,n}^c)$ and for every even integer $k$ with $4 \leq k \leq 2n$, there exists a PC $k$-cycle containing $u$. The minimum color degree $δ^c(K^c_{n,n})$ of $K^c_{n,n}$ is the largest integer $k$ such that for every vertex $v$, there are at least $k$ distinct colors on the edges incident to $v$. In this paper we study the existence of PC even cycles in $K_{n,n}^c$. We first show that, for every integer $t\geq 3$, every $K^c_{n,n}$ with $δ^c(K^c_{n,n})\geq \frac{2n}{3}+t$ contains a PC 2-factor $H$ such that every cycle of $H$ has a length of at least $t$. By using the probabilistic method and absorbing technique, we use the above result to further show that, for every $\varepsilon>0$, there exists an integer $n_0(\varepsilon)$ such that every $K^c_{n,n}$ with $n\geq n_0(\varepsilon)$ is properly vertex-even-pancyclic, provided that $δ^c(K^c_{n,n})\geq (\frac{2}{3}+\varepsilon)n$.
Submitted 7 October, 2023;
originally announced October 2023.
-
A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression
Authors:
Tin Sum Cheng,
Aurelien Lucchi,
Ivan Dokmanić,
Anastasis Kratsios,
David Belius
Abstract:
Existing statistical learning guarantees for general kernel regressors often yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels naturally appear in several machine learning problems, e.g., when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task when performing transfer learning. We address this gap for finite-rank kernel ridge regression (KRR) by deriving sharp non-asymptotic upper and lower bounds for the KRR test error of any finite-rank KRR. Our bounds are tighter than previously derived bounds on finite-rank KRR, and unlike comparable results, they also remain valid for any regularization parameters.
Submitted 3 October, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
A Tale of Two Cultures: Comparing Interpersonal Information Disclosure Norms on Twitter
Authors:
Mainack Mondal,
Anju Punuru,
Tyng-Wen Scott Cheng,
Kenneth Vargas,
Chaz Gundry,
Nathan S Driggs,
Noah Schill,
Nathaniel Carlson,
Josh Bedwell,
Jaden Q Lorenc,
Isha Ghosh,
Yao Li,
Nancy Fulda,
Xinru Page
Abstract:
We present an exploration of cultural norms surrounding online disclosure of information about one's interpersonal relationships (such as information about family members, colleagues, friends, or lovers) on Twitter. The literature identifies the cultural dimension of individualism versus collectivism as being a major determinant of offline communication differences in terms of emotion, topic, and content disclosed. We decided to study whether such differences also occur online in the context of Twitter when comparing tweets posted in an individualistic (U.S.) versus a collectivist (India) society. We collected more than 2 million tweets posted in the U.S. and India over a 3 month period which contain interpersonal relationship keywords. A card-sort study was used to develop this culturally-sensitive saturated taxonomy of keywords that represent interpersonal relationships (e.g., ma, mom, mother). Then we developed a high-accuracy interpersonal disclosure detector based on dependency-parsing (F1-score: 86%) to identify when the words refer to a personal relationship of the poster (e.g., "my mom" as opposed to "a mom"). This allowed us to identify the 400K+ tweets in our data set which actually disclose information about the poster's interpersonal relationships. We used a mixed methods approach to analyze these tweets (e.g., comparing the amount of joy expressed about one's family) and found differences in emotion, topic, and content disclosed between tweets from the U.S. versus India. Our analysis also reveals how a combination of qualitative and quantitative methods is needed to uncover these differences; using just one or the other can be misleading. This study extends the prior literature on Multi-Party Privacy and provides guidance for researchers and designers of culturally-sensitive systems.
Submitted 26 September, 2023;
originally announced September 2023.
-
GelSplitter: Tactile Reconstruction from Near Infrared and Visible Images
Authors:
Yuankai Lin,
Yulin Zhou,
Kaiji Huang,
Qi Zhong,
Tao Cheng,
Hua Yang,
Zhouping Yin
Abstract:
The GelSight-like visual tactile (VT) sensor has gained popularity as a high-resolution tactile sensing technology for robots, capable of measuring touch geometry using a single RGB camera. However, the development of multi-modal perception for VT sensors remains a challenge, limited by the mono camera. In this paper, we propose the GelSplitter, a new framework that approaches the multi-modal VT sensor with synchronized multi-modal cameras and resembles a more human-like tactile receptor. Furthermore, we focus on 3D tactile reconstruction and implement a compact sensor structure that maintains a comparable size to state-of-the-art VT sensors, even with the addition of a prism and a near infrared (NIR) camera. We also design a photometric fusion stereo neural network (PFSNN), which estimates surface normals of objects and reconstructs touch geometry from both infrared and visible images. Our results demonstrate that the accuracy of RGB and NIR fusion is higher than that of RGB images alone. Additionally, our GelSplitter framework allows for a flexible configuration of different camera sensor combinations, such as RGB and thermal imaging.
Submitted 14 September, 2023;
originally announced September 2023.
-
Dark Energy Survey Year 6 Results: Intra-Cluster Light from Redshift 0.2 to 0.5
Authors:
Yuanyuan Zhang,
Jesse B. Golden-Marx,
Ricardo L. C. Ogando,
Brian Yanny,
Eli S. Rykoff,
Sahar Allam,
M. Aguena,
D. Bacon,
S. Bocquet,
D. Brooks,
A. Carnero Rosell,
J. Carretero,
T. -Y. Cheng,
C. Conselice,
M. Costanzi,
L. N. da Costa,
M. E. S. Pereira,
T. M. Davis,
S. Desai,
H. T. Diehl,
P. Doel,
I. Ferrero,
B. Flaugher,
J. Frieman,
D. Gruen
, et al. (24 additional authors not shown)
Abstract:
Using the full six years of imaging data from the Dark Energy Survey, we study the surface brightness profiles of galaxy cluster central galaxies and intra-cluster light. We apply a "stacking" method to over four thousand galaxy clusters identified by the redMaPPer cluster finding algorithm in the redshift range of 0.2 to 0.5. This yields high signal-to-noise radial profile measurements of the central galaxy and intra-cluster light out to 1 Mpc from the cluster center. Using redMaPPer richness as a cluster mass indicator, we find that the intra-cluster light brightness has a strong mass dependence throughout the 0.2 to 0.5 redshift range, and the dependence grows stronger at a larger radius. In terms of redshift evolution, we find some evidence that the central galaxy, as well as the diffuse light within the transition region between the cluster central galaxy and intra-cluster light within 80 kpc from the center, may be growing over time. At larger radii, more than 80 kpc away from the cluster center, we do not find evidence of additional redshift evolution beyond the cluster mass dependence, which is consistent with the findings from the IllustrisTNG hydrodynamic simulation. We speculate that the major driver of intra-cluster light growth, especially at large radii, is associated with cluster mass growth. Finally, we find that the color of the cluster central galaxy and intra-cluster light displays a radial gradient that becomes bluer at a larger radius, which is consistent with a stellar stripping and disruption origin of intra-cluster light as suggested by simulation studies.
Submitted 1 September, 2023;
originally announced September 2023.
-
Where Would I Go Next? Large Language Models as Human Mobility Predictors
Authors:
Xinglei Wang,
Meng Fang,
Zichao Zeng,
Tao Cheng
Abstract:
Accurate human mobility prediction underpins many important applications across a variety of domains, including epidemic modelling, transport planning, and emergency responses. Due to the sparsity of mobility data and the stochastic nature of people's daily activities, achieving precise predictions of people's locations remains a challenge. While recently developed large language models (LLMs) have demonstrated superior performance across numerous language-related tasks, their applicability to human mobility studies remains unexplored. Addressing this gap, this article delves into the potential of LLMs for human mobility prediction tasks. We introduce a novel method, LLM-Mob, which leverages the language understanding and reasoning capabilities of LLMs for analysing human mobility data. We present concepts of historical stays and context stays to capture both long-term and short-term dependencies in human movement and enable time-aware prediction by using time information of the prediction target. Additionally, we design context-inclusive prompts that enable LLMs to generate more accurate predictions. Comprehensive evaluations of our method reveal that LLM-Mob excels in providing accurate and interpretable predictions, highlighting the untapped potential of LLMs in advancing human mobility prediction techniques. We posit that our research marks a significant paradigm shift in human mobility modelling, transitioning from building complex domain-specific models to harnessing general-purpose LLMs that yield accurate predictions through language instructions. The code for this work is available at https://github.com/xlwang233/LLM-Mob.
Submitted 9 January, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Sufficient conditions for $k$-factors and spanning trees of graphs
Authors:
Guoyan Ao,
Ruifang Liu,
Jinjiang Yuan,
C. T. Ng,
T. C. E. Cheng
Abstract:
For any integer $k\geq1,$ a graph $G$ has a $k$-factor if it contains a $k$-regular spanning subgraph. In this paper we prove a sufficient condition in terms of the number of $r$-cliques to guarantee the existence of a $k$-factor in a graph with minimum degree at least $δ$, which improves the sufficient condition of O \cite{O2021} based on the number of edges. For any integer $k\geq2,$ a spanning $k$-tree of a connected graph $G$ is a spanning tree in which every vertex has degree at most $k$. Motivated by the technique of Li and Ning \cite{Li2016}, we present a tight spectral condition for an $m$-connected graph to have a spanning $k$-tree, which extends the result of Fan, Goryainov, Huang and Lin \cite{Fan2021} from $m=1$ to general $m$. Let $T$ be a spanning tree of a connected graph. The leaf degree of $T$ is the maximum number of leaves adjacent to $v$ in $T$ for any $v\in V(T)$. We provide a tight spectral condition for the existence of a spanning tree with leaf degree at most $k$ in a connected graph with minimum degree $δ$, where $k\geq1$ is an integer.
Submitted 26 August, 2023;
originally announced August 2023.
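The definition of a $k$-factor used in this abstract (a $k$-regular spanning subgraph) can be checked directly on small graphs. The sketch below is a brute-force illustration only, not the clique-count or spectral conditions the paper proves; the function name `has_k_factor` and the edge-list input format are assumptions made for this example, and the graph is assumed to be simple (no loops or repeated edges).

```python
from itertools import combinations


def has_k_factor(n, edges, k):
    """Return True if the simple graph on vertices 0..n-1 has a k-factor.

    A k-factor is a spanning subgraph in which every vertex has degree k,
    so it must contain exactly n*k/2 edges. This is an exhaustive search
    and is only practical for very small graphs.
    """
    if (n * k) % 2 != 0:
        return False  # the degree sum n*k must be even
    needed = n * k // 2
    if needed > len(edges):
        return False
    for subset in combinations(edges, needed):
        degree = [0] * n
        for u, v in subset:
            degree[u] += 1
            degree[v] += 1
        if all(d == k for d in degree):
            return True
    return False


if __name__ == "__main__":
    cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(has_k_factor(4, cycle4, 1))  # perfect matching in C4
    print(has_k_factor(4, cycle4, 2))  # C4 itself is 2-regular
```

The search is exponential in the number of edges, which is exactly why sufficient conditions of the kind studied in the paper are useful for larger graphs.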
-
Simplified partial wave expansion of the Lamb shift
Authors:
J. Sapirstein,
K. T. Cheng
Abstract:
A method for calculating the self energy part of the Lamb shift is revisited. When the electron propagator in an external field is represented as an expansion in partial waves, the original method converges relatively slowly, requiring the calculation of dozens of partial waves. Here we show an improved method in which accurate results can be obtained using a much smaller number of partial waves. The method is illustrated for the ground states of hydrogenlike and lithiumlike boron, and the possibility of high accuracy calculations on lower Z hydrogenic ions is discussed.
Submitted 22 August, 2023;
originally announced August 2023.
-
Tyshkevich's Graph Decomposition and the Distinguishing Numbers of Unigraphs
Authors:
Christine T. Cheng
Abstract:
A $c$-labeling $φ: V(G) \rightarrow \{1, 2, \hdots, c \}$ of graph $G$ is distinguishing if, for every non-trivial automorphism $π$ of $G$, there is some vertex $v$ so that $φ(v) \neq φ(π(v))$. The distinguishing number of $G$, $D(G)$, is the smallest $c$ such that $G$ has a distinguishing $c$-labeling.
We consider a compact version of Tyshkevich's graph decomposition theorem where trivial components are maximally combined to form a complete graph or a graph of isolated vertices. Suppose the compact canonical decomposition of $G$ is $G_{k} \circ G_{k-1} \circ \cdots \circ G_1 \circ G_0$. We prove that $φ$ is a distinguishing labeling of $G$ if and only if $φ$ is a distinguishing labeling of $G_i$ when restricted to $V(G_i)$ for $i = 0, \hdots, k$. Thus, $D(G) = \max \{D(G_i), i = 0, \hdots, k \}$. We then present an algorithm that computes the distinguishing number of a unigraph in linear time.
Submitted 26 August, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
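The definition of the distinguishing number $D(G)$ given in this abstract can be evaluated by exhaustive search on very small graphs. The sketch below is only an illustration of the definition; it is not the linear-time unigraph algorithm described by the author, and the names `automorphisms` and `distinguishing_number` are assumptions introduced for this example.

```python
from itertools import permutations, product


def automorphisms(n, edges):
    """List all automorphisms of the simple graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    autos = []
    for perm in permutations(range(n)):
        # A vertex bijection that maps every edge to an edge also maps
        # non-edges to non-edges, because the edge set is finite.
        if all(frozenset((perm[u], perm[v])) in edge_set for u, v in edges):
            autos.append(perm)
    return autos


def distinguishing_number(n, edges):
    """Smallest c such that some c-labeling is broken by every
    non-trivial automorphism (brute force, small graphs only)."""
    identity = tuple(range(n))
    nontrivial = [p for p in automorphisms(n, edges) if p != identity]
    for c in range(1, n + 1):
        for labels in product(range(c), repeat=n):
            if all(any(labels[v] != labels[p[v]] for v in range(n))
                   for p in nontrivial):
                return c
    return n


if __name__ == "__main__":
    path3 = [(0, 1), (1, 2)]
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(distinguishing_number(3, path3))
    print(distinguishing_number(3, triangle))
```

Given per-component values $D(G_i)$ computed this way, the abstract's result $D(G) = \max \{D(G_i)\}$ would reduce the computation for a decomposed graph to a single `max` over the components.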
-
A Search for Technosignatures Around 11,680 Stars with the Green Bank Telescope at 1.15-1.73 GHz
Authors:
Jean-Luc Margot,
Megan G. Li,
Pavlo Pinchuk,
Nathan Myhrvold,
Larry Lesyna,
Lea E. Alcantara,
Megan T. Andrakin,
Jeth Arunseangroj,
Damien S. Baclet,
Madison H. Belk,
Zerxes R. Bhadha,
Nicholas W. Brandis,
Robert E. Carey,
Harrison P. Cassar,
Sai S. Chava,
Calvin Chen,
James Chen,
Kellen T. Cheng,
Alessia Cimbri,
Benjamin Cloutier,
Jordan A. Combitsis,
Kelly L. Couvrette,
Brandon P. Coy,
Kyle W. Davis,
Antoine F. Delcayre
, et al. (56 additional authors not shown)
Abstract:
We conducted a search for narrowband radio signals over four observing sessions in 2020-2023 with the L-band receiver (1.15-1.73 GHz) of the 100 m diameter Green Bank Telescope. We pointed the telescope in the directions of 62 TESS Objects of Interest, capturing radio emissions from a total of ~11,680 stars and planetary systems in the ~9 arcminute beam of the telescope. All detections were either automatically rejected or visually inspected and confirmed to be of anthropogenic nature. In this work, we also quantified the end-to-end efficiency of radio SETI pipelines with a signal injection and recovery analysis. The UCLA SETI pipeline recovers 94.0% of the injected signals over the usable frequency range of the receiver and 98.7% of the injections when regions of dense RFI are excluded. In another pipeline that uses incoherent sums of 51 consecutive spectra, the recovery rate is ~15 times smaller at ~6%. The pipeline efficiency affects calculations of transmitter prevalence and SETI search volume. Accordingly, we developed an improved Drake Figure of Merit and a formalism to place upper limits on transmitter prevalence that take the pipeline efficiency and transmitter duty cycle into account. Based on our observations, we can state at the 95% confidence level that fewer than 6.6% of stars within 100 pc host a transmitter that is detectable in our search (EIRP > 1e13 W). For stars within 20,000 ly, the fraction of stars with detectable transmitters (EIRP > 5e16 W) is at most 3e-4. Finally, we showed that the UCLA SETI pipeline natively detects the signals detected with AI techniques by Ma et al. (2023).
Submitted 15 October, 2023; v1 submitted 4 August, 2023;
originally announced August 2023.
-
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Authors:
Haoyi Jiang,
Tianheng Cheng,
Naiyu Gao,
Haoyang Zhang,
Tianwei Lin,
Wenyu Liu,
Xinggang Wang
Abstract:
3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal undertaking in autonomous driving, aiming to predict voxel occupancy within volumetric scenes. However, prevailing methodologies primarily focus on voxel-wise feature aggregation, while neglecting instance semantics and scene context. In this paper, we present a novel paradigm termed Symphonies (Scene-from-Insts), that delves into the integration of instance queries to orchestrate 2D-to-3D reconstruction and 3D scene modeling. Leveraging our proposed Serial Instance-Propagated Attentions, Symphonies dynamically encodes instance-centric semantics, facilitating intricate interactions between image-based and volumetric domains. Simultaneously, Symphonies enables holistic scene comprehension by capturing context through the efficient fusion of instance queries, alleviating geometric ambiguity such as occlusion and perspective errors through contextual scene reasoning. Experimental results demonstrate that Symphonies achieves state-of-the-art performance on challenging benchmarks SemanticKITTI and SSCBench-KITTI-360, yielding remarkable mIoU scores of 15.04 and 18.58, respectively. These results showcase the paradigm's promising advancements. The code is available at https://github.com/hustvl/Symphonies.
Submitted 22 November, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
CIMulator: A Comprehensive Simulation Platform for Computing-In-Memory Circuit Macros with Low Bit-Width and Real Memory Materials
Authors:
Hoang-Hiep Le,
Md. Aftab Baig,
Wei-Chen Hong,
Cheng-Hsien Tsai,
Cheng-Jui Yeh,
Fu-Xiang Liang,
I-Ting Huang,
Wei-Tzu Tsai,
Ting-Yin Cheng,
Sourav De,
Nan-Yow Chen,
Wen-Jay Lee,
Ing-Chao Lin,
Da-Wei Chang,
Darsen D. Lu
Abstract:
This paper presents a simulation platform, namely CIMulator, for quantifying the efficacy of various synaptic devices in neuromorphic accelerators for different neural network architectures. Nonvolatile memory devices, such as resistive random-access memory and ferroelectric field-effect transistors, and volatile static random-access memory devices can be selected as synaptic devices. A multilayer perceptron and convolutional neural networks (CNNs), such as LeNet-5, VGG-16, and a custom CNN named C4W-1, are simulated to evaluate the effects of these synaptic devices on the training and inference outcomes. The datasets used in the simulations are MNIST, CIFAR-10, and a white blood cell dataset. By applying batch normalization and appropriate optimizers in the training phase, neuromorphic systems with very low-bit-width or binary weights could achieve high pattern recognition rates that approach software-based CNN accuracy. We also introduce spiking neural networks with RRAM-based synaptic devices for the recognition of MNIST handwritten digits.
Submitted 26 June, 2023;
originally announced June 2023.
-
ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration
Authors:
Jiaqi Ma,
Tianheng Cheng,
Guoli Wang,
Qian Zhang,
Xinggang Wang,
Lefei Zhang
Abstract:
Image restoration aims to reconstruct degraded images, e.g., denoising or deblurring. Existing works focus on designing task-specific methods and there are inadequate attempts at universal methods. However, simply unifying multiple tasks into one universal architecture suffers from uncontrollable and undesired predictions. To address those issues, we explore prompt learning in universal architectures for image restoration tasks. In this paper, we present Degradation-aware Visual Prompts, which encode various types of image degradation, e.g., noise and blur, into unified visual prompts. These degradation-aware prompts provide control over image processing and allow weighted combinations for customized image restoration. We then leverage degradation-aware visual prompts to establish a controllable and universal model for image restoration, called ProRes, which is applicable to an extensive range of image restoration tasks. ProRes leverages the vanilla Vision Transformer (ViT) without any task-specific designs. Furthermore, the pre-trained ProRes can easily adapt to new tasks through efficient prompt tuning with only a few images. Without bells and whistles, ProRes achieves competitive performance compared to task-specific methods, and experiments demonstrate its ability for controllable restoration and adaptation to new tasks. The code and models will be released at https://github.com/leonmakise/ProRes.
Submitted 23 June, 2023;
originally announced June 2023.
-
Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation
Authors:
Jia-Xing Zhong,
Ta-Ying Cheng,
Yuhang He,
Kai Lu,
Kaichen Zhou,
Andrew Markham,
Niki Trigoni
Abstract:
A truly generalizable approach to rigid segmentation and motion estimation is fundamental to 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two interconnected, lightweight heads. These heads predict segmentation masks using point-level invariant features and estimate motion from SE(3) equivariant features, all without the need for category information. Our training strategy is unified and can be implemented online, which jointly optimizes the predicted segmentation and motion by leveraging the interrelationships among scene flow, segmentation mask, and rigid transformations. We conduct experiments on four datasets to demonstrate the superiority of our method. The results show that our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs. To the best of our knowledge, this is the first work designed for category-agnostic part-level SE(3) equivariance in dynamic point clouds.
Submitted 31 October, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Cathodoluminescence spectroscopy of monolayer hexagonal boron nitride
Authors:
K. Shima,
T. S. Cheng,
C. J. Mellor,
P. H. Beton,
C. Elias,
P. Valvin,
B. Gil,
G. Cassabois,
S. V. Novikov,
S. F. Chichibu
Abstract:
Cathodoluminescence (CL) spectroscopy is a powerful technique for studying emission properties of optoelectronic materials because CL is free from excitable bandgap limits and from ambiguous signals due to simple light scattering and resonant Raman scattering potentially involved in the photoluminescence (PL) spectra. However, direct CL measurements of atomically thin two-dimensional materials, such as transition metal dichalcogenides and hexagonal boron nitride (hBN), have been difficult due to the small excitation volume that interacts with high-energy electron beams (e-beams). Herein, distinct CL signals from a monolayer hBN (mBN) epitaxial film grown on a highly oriented pyrolytic graphite substrate are shown by using a home-made CL system capable of large-area and surface-sensitive excitation by an e-beam. The spatially resolved CL spectra at 13 K exhibited a predominant 5.5-eV emission band, which has been ascribed to multilayered aggregates of hBN, markedly at thicker areas formed on the step edges of the substrate. Conversely, a faint peak at 6.04 eV was routinely observed from atomically flat areas. Since the energy agreed with the PL peak of 6.05 eV at 10 K that has been assigned to the recombination of phonon-assisted direct excitons of mBN by Elias et al. [Nat. Commun. 10, 2639 (2019)], the CL peak at 6.04 eV is attributed to the mBN epilayer. The CL results support the transition from an indirect bandgap in bulk hBN to a direct bandgap in mBN, in analogy with molybdenum disulfide. The results also encourage the use of the present CL configuration to elucidate the emission properties of other low-dimensional materials with reduced excitation volumes.
Submitted 17 May, 2023;
originally announced May 2023.
-
Towards clinical AI fairness: A translational perspective
Authors:
Mingxuan Liu,
Yilin Ning,
Salinelat Teixayavong,
Mayli Mertens,
Jie Xu,
Daniel Shu Wei Ting,
Lionel Tim-Ee Cheng,
Jasmine Chiat Ling Ong,
Zhen Ling Teo,
Ting Fang Tan,
Ravi Chandran Narrendar,
Fei Wang,
Leo Anthony Celi,
Marcus Eng Hock Ong,
Nan Liu
Abstract:
Artificial intelligence (AI) has demonstrated the ability to extract insights from data, but the issue of fairness remains a concern in high-stakes fields such as healthcare. Despite extensive discussion and efforts in algorithm development, AI fairness and clinical concerns have not been adequately addressed. In this paper, we discuss the misalignment between technical and clinical perspectives of AI fairness, highlight the barriers to AI fairness' translation to healthcare, advocate multidisciplinary collaboration to bridge the knowledge gap, and provide possible solutions to address the clinical concerns pertaining to AI fairness.
Submitted 26 April, 2023;
originally announced April 2023.
-
Membrane Potential Distribution Adjustment and Parametric Surrogate Gradient in Spiking Neural Networks
Authors:
Siqi Wang,
Tee Hiang Cheng,
Meng-Hiot Lim
Abstract:
As an emerging network model, spiking neural networks (SNNs) have attracted significant research attention in recent years. However, the energy-efficient binary spikes do not work well with gradient descent-based training approaches. The surrogate gradient (SG) strategy has been investigated and applied to circumvent this issue and train SNNs from scratch. Due to the lack of a well-recognized SG selection rule, most SGs are chosen intuitively. We propose the parametric surrogate gradient (PSG) method to iteratively update the SG and eventually determine an optimal surrogate gradient parameter, which calibrates the shape of candidate SGs. In SNNs, the neural potential distribution tends to deviate unpredictably due to quantization error. We evaluate such potential shift and propose a methodology for potential distribution adjustment (PDA) to minimize the loss of undesired pre-activations. Experimental results demonstrate that the proposed methods can be readily integrated with the backpropagation through time (BPTT) algorithm and help modulated SNNs to achieve state-of-the-art performance on both static and dynamic datasets with fewer timesteps.
Submitted 26 April, 2023;
originally announced April 2023.
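The surrogate gradient idea mentioned in this abstract can be illustrated with a generic example: the forward pass emits a binary spike through a non-differentiable step, while the backward pass substitutes the derivative of a smooth function whose sharpness is set by a shape parameter. The sketch below uses a sigmoid-shaped surrogate as a stand-in; it is not the authors' PSG method, and the names `spike`, `sigmoid_surrogate_grad`, `threshold`, and `alpha` are assumptions chosen for this example.

```python
import math


def spike(v, threshold=1.0):
    """Forward pass: emit a binary spike when the membrane potential
    reaches the threshold (a Heaviside step, zero gradient almost everywhere)."""
    return 1.0 if v >= threshold else 0.0


def sigmoid_surrogate_grad(v, threshold=1.0, alpha=4.0):
    """Backward pass stand-in: derivative of sigmoid(alpha * (v - threshold)).

    Uses the identity s * (1 - s) = (1 - tanh(x / 2) ** 2) / 4 so that
    large |x| values do not overflow. A larger alpha gives a narrower,
    taller surrogate around the threshold.
    """
    x = alpha * (v - threshold)
    return alpha * (1.0 - math.tanh(x / 2.0) ** 2) / 4.0


if __name__ == "__main__":
    for v in (0.2, 1.0, 1.8):
        print(v, spike(v), sigmoid_surrogate_grad(v))
```

In a parametric scheme of the kind the abstract describes, a shape parameter such as `alpha` would be updated during training rather than fixed by hand; how that update is performed is specific to the paper and is not reproduced here.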
-
VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene
Authors:
Shaoyu Chen,
Yunchi Zhang,
Bencheng Liao,
Jiafeng Xie,
Tianheng Cheng,
Wei Sui,
Qian Zhang,
Chang Huang,
Wenyu Liu,
Xinggang Wang
Abstract:
High-definition (HD) maps serve as essential infrastructure for autonomous driving. In this work, we build a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD maps of large-scale driving scenes. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as a unified point sequence representation, which can be extended to most map elements in the driving scene. VMA is highly efficient and extensible, requiring negligible human effort, and flexible in terms of spatial scale and element type. We quantitatively and qualitatively validate the annotation performance on real-world urban and highway scenes, as well as the NYC Planimetric Database. VMA significantly improves map generation efficiency and requires little human effort. On average, VMA takes 160 min to annotate a scene with a range of hundreds of meters and reduces the human cost by 52.3%, showing great application value. Code: https://github.com/hustvl/VMA.
Submitted 27 August, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
-
On the second order of Zeta functional equations for Riemann Type
Authors:
Chin-yuan Hu,
Tsung-lin Cheng,
Ie-bin Lian
Abstract:
This paper discusses a new class of functional equations by using both the Poisson summation formula and a Jacobi-type theta function. The class of Riemann-type functional equations is derived from self-reciprocal probability density functions. Finally, the second-order Zeta functional equations of Riemann type are also investigated.
Submitted 21 April, 2024; v1 submitted 19 April, 2023;
originally announced April 2023.
-
TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors
Authors:
Shaoyu Chen,
Tianheng Cheng,
Jiemin Fang,
Qian Zhang,
Yuan Li,
Wenyu Liu,
Xinggang Wang
Abstract:
Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors. To accurately detect small objects with limited computation, we propose a two-stage lightweight detection framework with extremely low computation complexity, termed TinyDet. It enables high-resolution feature maps for dense anchoring to better cover small objects, proposes a sparsely-connected convolution for computation reduction, enhances the early stage features in the backbone, and addresses the feature misalignment problem for accurate small object detection. On the COCO benchmark, our TinyDet-M achieves 30.3 AP and 13.5 AP^s with only 991 MFLOPs, making it the first detector with an AP over 30 at less than 1 GFLOPs; in addition, TinyDet-S and TinyDet-L achieve promising performance under different computation limitations.
Submitted 6 April, 2023;
originally announced April 2023.
-
MobileInst: Video Instance Segmentation on the Mobile
Authors:
Renhong Zhang,
Tianheng Cheng,
Shusheng Yang,
Haoyi Jiang,
Shuai Zhang,
Jiancheng Lyu,
Xin Li,
Xiaowen Ying,
Dashan Gao,
Wenyu Liu,
Xinggang Wang
Abstract:
Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on one single CPU core of Snapdragon 778G Mobile Platform, without other methods of acceleration. On the COCO dataset, MobileInst achieves 31.2 mask AP and 433 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.
Submitted 18 December, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
-
Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
Authors:
Bencheng Liao,
Shaoyu Chen,
Bo Jiang,
Tianheng Cheng,
Qian Zhang,
Wenyu Liu,
Chang Huang,
Xinggang Wang
Abstract:
Online lane graph construction is a promising but challenging task in autonomous driving. Previous methods usually model the lane graph at the pixel or piece level, and recover the lane graph by pixel-wise or piece-wise connection, which breaks down the continuity of the lane. Human drivers focus on and drive along the continuous and complete paths instead of considering lane pieces. Autonomous vehicles also require path-specific guidance from the lane graph for trajectory planning. We argue that the path, which indicates the traffic flow, is the primitive of the lane graph. Motivated by this, we propose to model the lane graph in a novel path-wise manner, which well preserves the continuity of the lane and encodes traffic information for planning. We present a path-based online lane graph construction method, termed LaneGAP, which learns the path end-to-end and recovers the lane graph via a Path2Graph algorithm. We qualitatively and quantitatively demonstrate the superiority of LaneGAP over conventional pixel-based and piece-based methods on the challenging nuScenes and Argoverse2 datasets. Abundant visualizations show LaneGAP can cope with diverse traffic conditions. Code and models will be released at https://github.com/hustvl/LaneGAP for facilitating future research.
Submitted 17 December, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.