Search | arXiv e-print repository

arXiv:2406.20076 [pdf, other]

EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

Authors: Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

Abstract: Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (EVF-SAM). EVF-SAM is a simple yet effective referring segmentation method which exploits multimodal prompts (i.e., image and text) and comprises a pre-trained vision-language model to generate referring prompts and a SAM model for segmentation. Surprisingly, we observe that: (1) multimodal prompts and (2) vision-language models with early fusion (e.g., BEIT-3) are beneficial for prompting SAM for accurate referring segmentation. Our experiments show that the proposed EVF-SAM based on BEIT-3 can obtain state-of-the-art performance on RefCOCO/+/g for referring expression segmentation and demonstrate the superiority of prompting SAM with early vision-language fusion. In addition, the proposed EVF-SAM with 1.32B parameters achieves remarkably higher performance while reducing nearly 82% of parameters compared to previous SAM methods based on large multimodal models.

Submitted 3 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

Comments: Preprint. Code and models are available at: https://github.com/hustvl/EVF-SAM

arXiv:2406.09611 [pdf, other]

Recy-ctronics: Designing Fully Recyclable Electronics With Varied Form Factors

Authors: Tingyu Cheng, Zhihan Zhang, Han Huang, Yingting Gao, Wei Sun, Gregory D. Abowd, HyunJoo Oh, Josiah Hester

Abstract: For today's electronics manufacturing process, the emphasis on stable functionality, durability, and fixed physical forms is designed to ensure long-term usability. However, this focus on robustness and permanence complicates the disassembly and recycling processes, leading to significant environmental repercussions. In this paper, we present three approaches that leverage easily recyclable materials, specifically polyvinyl alcohol (PVA) and liquid metal (LM), alongside accessible manufacturing techniques to produce electronic components and systems with versatile form factors. Our work centers on the development of recyclable electronics through three methods: 1) creating sheet electronics by screen printing LM traces on PVA substrates; 2) developing foam-based electronics by immersing mechanically stirred PVA foam into an LM solution; and 3) fabricating recyclable electronic tubes by injecting LM into mold-cast PVA tubes, which can then be woven into various structures. To further assess the sustainability of our proposed methods, we conducted a life cycle assessment (LCA) to evaluate the environmental impact of our recyclable electronics in comparison to their conventional counterparts.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.17921 [pdf]

Towards Clinical AI Fairness: Filling Gaps in the Puzzle

Authors: Mingxuan Liu, Yilin Ning, Salinelat Teixayavong, Xiaoxuan Liu, Mayli Mertens, Yuqing Shang, Xin Li, Di Miao, Jie Xu, Daniel Shu Wei Ting, Lionel Tim-Ee Cheng, Jasmine Chiat Ling Ong, Zhen Ling Teo, Ting Fang Tan, Narrendar RaviChandran, Fei Wang, Leo Anthony Celi, Marcus Eng Hock Ong, Nan Liu

Abstract: The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness, a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical advancements and their practical clinical applications, resulting in a lack of contextualized discussion of AI fairness in clinical settings. Through a detailed evidence gap analysis, our review systematically pinpoints several deficiencies concerning both healthcare data and the provided AI fairness solutions. We highlight the scarcity of research on AI fairness in many medical domains where AI technology is increasingly utilized. Additionally, our analysis highlights a substantial reliance on group fairness, aiming to ensure equality among demographic groups from a macro healthcare system perspective; in contrast, individual fairness, focusing on equity at a more granular level, is frequently overlooked. To bridge these gaps, our review advances actionable strategies for both the healthcare and AI research communities. Beyond applying existing AI fairness methods in healthcare, we further emphasize the importance of involving healthcare professionals to refine AI fairness concepts and methods to ensure contextually relevant and ethically sound AI applications in healthcare.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17040 [pdf, other]

Claw-free minimal matching covered graphs

Authors: Yipei Zhang, Xiumei Wang, Jinjiang Yuan, C. T. Ng, T. C. E. Cheng

Abstract: A matching covered graph $G$ is minimal if for each edge $e$ of $G$, $G-e$ is not matching covered. An edge $e$ of a matching covered graph $G$ is removable if $G-e$ is also matching covered. Thus a matching covered graph is minimal if and only if it is free of removable edges. For bipartite graphs, Lovász and Plummer gave a characterization of bipartite minimal matching covered graphs. For bricks, Lovász showed that the only bricks that are minimal matching covered are $K_4$ and $\overline{C_6}$. In this paper, we present a complete characterization of minimal matching covered graphs that are claw-free. Moreover, for cubic claw-free matching covered graphs that are not minimal matching covered, we obtain the number of their removable edges (with respect to their bricks), and then prove that they have at least 12 removable edges (the bound is sharp).

Submitted 27 May, 2024; originally announced May 2024.

MSC Class: 05C70; 05C75
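
The definitions in the abstract above (matching covered, removable edge, minimal) can be checked by brute force on small graphs. The following is a minimal sketch, not the authors' method: it assumes the common convention that a matching covered graph is a connected graph with at least one edge in which every edge lies in some perfect matching, and all function names are hypothetical. It enumerates perfect matchings exhaustively, so it is only practical for very small graphs.

```python
from itertools import combinations


def graph(pairs):
    """Build (vertices, edges) from an iterable of vertex pairs."""
    edges = {frozenset(p) for p in pairs}
    vertices = set().union(*edges)
    return vertices, edges


def perfect_matchings(vertices, edges):
    """Yield every perfect matching as a frozenset of edges (brute force)."""
    vertices = sorted(vertices)
    if not vertices:
        yield frozenset()
        return
    v = vertices[0]  # v must be matched by some edge containing it
    for e in edges:
        if v in e:
            (u,) = e - {v}
            rest = [x for x in vertices if x not in e]
            rest_edges = [f for f in edges if v not in f and u not in f]
            for m in perfect_matchings(rest, rest_edges):
                yield m | {e}


def is_connected(vertices, edges):
    """Depth-first search over the edge set."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for e in edges:
            if x in e:
                for y in e:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
    return seen == vertices


def is_matching_covered(vertices, edges):
    """Assumed convention: connected, nonempty, every edge in a perfect matching."""
    edges = set(edges)
    if not edges or not is_connected(vertices, edges):
        return False
    covered = set()
    for m in perfect_matchings(vertices, list(edges)):
        covered |= m
    return covered == edges


def removable_edges(vertices, edges):
    """Edges e such that G - e is still matching covered."""
    edges = set(edges)
    return {e for e in edges if is_matching_covered(vertices, edges - {e})}


def is_minimal_matching_covered(vertices, edges):
    """Matching covered and free of removable edges."""
    return is_matching_covered(vertices, edges) and not removable_edges(vertices, edges)


# K_4: the complete graph on four vertices.
K4 = graph(combinations(range(4), 2))
# K_{3,3}: the complete bipartite graph with parts {0,1,2} and {3,4,5}.
K33 = graph((a, b) for a in range(3) for b in range(3, 6))

print(is_minimal_matching_covered(*K4))
print(len(removable_edges(*K33)))
```

Under this convention the sketch reports $K_4$ as minimal matching covered, in line with the abstract, while every edge of $K_{3,3}$ is removable, so $K_{3,3}$ is matching covered but not minimal.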

arXiv:2405.06100 [pdf, other]

doi 10.1093/mnras/stae1154

OzDES Reverberation Mapping Program: Stacking analysis with H$β$, Mg II and C IV

Authors: Umang Malik, Rob Sharp, A. Penton, Z. Yu, P. Martini, B. E. Tucker, T. M. Davis, G. F. Lewis, C. Lidman, M. Aguena, O. Alves, J. Annis, J. Asorey, D. Bacon, D. Brooks, A. Carnero Rosell, J. Carretero, T. -Y. Cheng, L. N. da Costa, M. E. S. Pereira, J. De Vicente, P. Doel, I. Ferrero, J. Frieman, G. Giannini, et al. (25 additional authors not shown)

Abstract: Reverberation mapping is the leading technique used to measure direct black hole masses outside of the local Universe. Additionally, reverberation measurements calibrate secondary mass-scaling relations used to estimate single-epoch virial black hole masses. The Australian Dark Energy Survey (OzDES) conducted one of the first multi-object reverberation mapping surveys, monitoring 735 AGN up to $z\sim4$, over 6 years. The limited temporal coverage of the OzDES data has hindered recovery of individual measurements for some classes of sources, particularly those with shorter reverberation lags or lags that fall within campaign season gaps. To alleviate this limitation, we perform a stacking analysis of the cross-correlation functions of sources with similar intrinsic properties to recover average composite reverberation lags. This analysis leads to the recovery of average lags in each redshift-luminosity bin across our sample. We present the average lags recovered for the H$β$, Mg II and C IV samples, as well as multi-line measurements for redshift bins where two lines are accessible. The stacking analysis is consistent with the Radius-Luminosity relations for each line. Our results for the H$β$ sample demonstrate that stacking has the potential to improve upon constraints on the $R-L$ relation, which have been derived only from individual source measurements until now.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 20 pages, 15 figures. Accepted by MNRAS

Report number: FERMILAB-PUB-23-381-PPD

arXiv:2404.06425 [pdf, other]

ZeST: Zero-Shot Material Transfer from a Single Image

Authors: Ta-Ying Cheng, Prafull Sharma, Andrew Markham, Niki Trigoni, Varun Jampani

Abstract: We propose ZeST, a method for zero-shot material transfer to an object in the input image given a material exemplar image. ZeST leverages existing diffusion adapters to extract an implicit material representation from the exemplar image. This representation is used to transfer the material using a pre-trained inpainting diffusion model on the object in the input image, using depth estimates as a geometry cue and grayscale object shading as illumination cues. The method works on real images without any training, resulting in a zero-shot approach. Both qualitative and quantitative results on real and synthetic datasets demonstrate that ZeST outputs photorealistic images with transferred materials. We also show the application of ZeST to perform multiple edits and robust material assignment under different illuminations. Project Page: https://ttchengab.github.io/zest

Submitted 9 April, 2024; originally announced April 2024.

Comments: Project Page: https://ttchengab.github.io/zest

arXiv:2403.13438 [pdf, other]

SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors

Authors: Chenyang Ma, Kai Lu, Ta-Ying Cheng, Niki Trigoni, Andrew Markham

Abstract: Current state-of-the-art spatial reasoning-enhanced VLMs are trained to excel at spatial visual question answering (VQA). However, we believe that higher-level 3D-aware tasks, such as articulating dynamic scene changes and motion planning, require a fundamental and explicit 3D understanding beyond current spatial VQA datasets. In this work, we present SpatialPIN, a framework designed to enhance the spatial reasoning capabilities of VLMs through prompting and interacting with priors from multiple 3D foundation models in a zero-shot, training-free manner. Extensive experiments demonstrate that our spatial reasoning-imbued VLM performs well on various forms of spatial VQA and can extend to help in various downstream robotics tasks such as pick and stack and trajectory planning.

Submitted 6 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Project Page: https://dannymcy.github.io/zeroshot_task_hallucination/

arXiv:2403.07982 [pdf, other]

The XMM Cluster Survey: Automating the estimation of hydrostatic mass for large samples of galaxy clusters I -- Methodology, Validation, & Application to the SDSSRM-XCS sample

Authors: D. J. Turner, P. A. Giles, A. K. Romer, J. Pilling, T. K. Lingard, R. Wilkinson, M. Hilton, E. W. Upsdell, R. Al-Serkal, T. Cheng, R. Eappen, P. J. Rooney, S. Bhargava, C. A. Collins, J. Mayers, C. Miller, R. C. Nichol, M. Sahén, P. T. P. Viana

Abstract: We describe features of the X-ray: Generate and Analyse (XGA) open-source software package that have been developed to facilitate automated hydrostatic mass ($M_{\rm hydro}$) measurements from XMM X-ray observations of clusters of galaxies. This includes describing how XGA measures global, and radial, X-ray properties of galaxy clusters. We then demonstrate the reliability of XGA by comparing simple X-ray properties, namely the X-ray temperature and gas mass, with published values presented by the XMM Cluster Survey (XCS), the Ultimate XMM eXtragaLactic survey project (XXL), and the Local Cluster Substructure Survey (LoCuSS). XGA measured values for temperature are, on average, within 1% of the values reported in the literature for each sample. XGA gas masses for XXL clusters are shown to be ${\sim}$10% lower than previous measurements (though the difference is only significant at the $\sim$1.8$σ$ level), LoCuSS $R_{2500}$ and $R_{500}$ gas mass re-measurements are 3% and 7% lower respectively (representing a 1.5$σ$ and 3.5$σ$ difference). Like-for-like comparisons of hydrostatic mass are made to LoCuSS results, which show that our measurements are $10{\pm}3%$ ($19{\pm}7%$) higher for $R_{2500}$ ($R_{500}$). The comparison between $R_{500}$ masses shows significant scatter. Finally, we present new $M_{\rm hydro}$ measurements for 104 clusters from the SDSS DR8 redMaPPer XCS sample (SDSSRM-XCS). Our SDSSRM-XCS hydrostatic mass measurements are in good agreement with multiple literature estimates, and represent one of the largest samples of consistently measured hydrostatic masses. We have demonstrated that XGA is a powerful tool for X-ray analysis of clusters; it will render complex-to-measure X-ray properties accessible to non-specialists.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 24 pages (18 + 6 appendices), 15 figures, submitted to MNRAS; see https://github.com/DavidT3/XCS-Mass-Paper-I-Analysis for the code and samples

arXiv:2403.02314 [pdf, other]

Dark Energy Survey Year 3 results: likelihood-free, simulation-based $w$CDM inference with neural compression of weak-lensing map statistics

Authors: N. Jeffrey, L. Whiteway, M. Gatti, J. Williamson, J. Alsing, A. Porredon, J. Prat, C. Doux, B. Jain, C. Chang, T. -Y. Cheng, T. Kacprzak, P. Lemos, A. Alarcon, A. Amon, K. Bechtol, M. R. Becker, G. M. Bernstein, A. Campos, A. Carnero Rosell, R. Chen, A. Choi, J. DeRose, A. Drlica-Wagner, K. Eckert, et al. (66 additional authors not shown)

Abstract: We present simulation-based cosmological $w$CDM inference using Dark Energy Survey Year 3 weak-lensing maps, via neural data compression of weak-lensing map summary statistics: power spectra, peak counts, and direct map-level compression/inference with convolutional neural networks (CNN). Using simulation-based inference, also known as likelihood-free or implicit inference, we use forward-modelled mock data to estimate posterior probability distributions of unknown parameters. This approach allows all statistical assumptions and uncertainties to be propagated through the forward-modelled mock data; these include sky masks, non-Gaussian shape noise, shape measurement bias, source galaxy clustering, photometric redshift uncertainty, intrinsic galaxy alignments, non-Gaussian density fields, neutrinos, and non-linear summary statistics. We include a series of tests to validate our inference results. This paper also describes the Gower Street simulation suite: 791 full-sky PKDGRAV dark matter simulations, with cosmological model parameters sampled with a mixed active-learning strategy, from which we construct over 3000 mock DES lensing data sets. For $w$CDM inference, for which we allow $-1<w<-\frac{1}{3}$, our most constraining result uses power spectra combined with map-level (CNN) inference. Using gravitational lensing data only, this map-level combination gives $Ω_{\rm m} = 0.283^{+0.020}_{-0.027}$, ${S_8 = 0.804^{+0.025}_{-0.017}}$, and $w < -0.80$ (with a 68 per cent credible interval); compared to the power spectrum inference, this is more than a factor of two improvement in dark energy parameter ($Ω_{\rm DE}, w$) precision.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 19 pages, 15 figures, submitted to Monthly Notices of the Royal Astronomical Society

arXiv:2402.16028 [pdf, other]

FedFDP: Fairness-Aware Federated Learning with Differential Privacy

Authors: Xinpeng Ling, Jie Fu, Kuncan Wang, Huifa Li, Tong Cheng, Zhili Chen

Abstract: Federated learning (FL) is a new machine learning paradigm to overcome the challenge of data silos and has garnered significant attention. However, through our observations, a globally effective trained model may exhibit performance disparities across different clients. This implies that the models jointly trained by clients may lead to unfair outcomes. On the other hand, relevant studies indicate that the transmission of gradients or models in federated learning can also give rise to privacy leakage issues, such as membership inference attacks. To address the first issue mentioned above, we propose a fairness-aware federated learning algorithm, termed FedFair. Building upon FedFair, we introduce privacy protection to form the FedFDP algorithm to address the second issue mentioned above. In FedFDP, we devise a fairness-aware clipping strategy to achieve differential privacy while adjusting fairness. Additionally, for the extra uploaded loss values, we present an adaptive clipping approach to maximize utility. Furthermore, we theoretically prove that our algorithm converges and ensures differential privacy. Lastly, extensive experimental results demonstrate that FedFair and FedFDP significantly outperform state-of-the-art solutions in terms of model performance and fairness. Code and data are accessible at https://anonymous.4open.science/r/FedFDP-5607.

Submitted 20 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.15504 [pdf, other]

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Authors: Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen

Abstract: Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models. First, current personalization techniques fail to reliably extend to multiple concepts -- we hypothesize this to be due to the mismatch between complex scenes and simple text descriptions in the pre-training dataset (e.g., LAION). Second, given an image containing multiple personalized concepts, there is no holistic metric that evaluates performance on not just the degree of resemblance of personalized concepts, but also whether all concepts are present in the image and whether the image accurately reflects the overall text description. To address these issues, we introduce Gen4Gen, a semi-automated dataset creation pipeline utilizing generative models to combine personalized concepts into complex compositions along with text descriptions. Using this, we create a dataset called MyCanvas that can be used to benchmark the task of multi-concept personalization. In addition, we design a comprehensive metric comprising two scores (CP-CLIP and TI-CLIP) for better quantifying the performance of multi-concept, personalized text-to-image diffusion methods. We provide a simple baseline built on top of Custom Diffusion with empirical prompting strategies for future researchers to evaluate on MyCanvas. We show that by improving data quality and prompting strategies, we can significantly increase multi-concept personalized image generation quality, without requiring any modifications to model architecture or training algorithms.

Submitted 23 February, 2024; originally announced February 2024.

Comments: Preprint; Project Page: https://danielchyeh.github.io/Gen4Gen/

arXiv:2402.08654 [pdf, other]

Learning Continuous 3D Words for Text-to-Image Generation

Authors: Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni

Abstract: Current controls over diffusion models (e.g., through text or ControlNet) for image generation fall short in recognizing abstract, continuous attributes like illumination direction or non-rigid shape change. In this paper, we present an approach for allowing users of text-to-image models to have fine-grained control of several attributes in an image. We do this by engineering special sets of input tokens that can be transformed in a continuous manner -- we call them Continuous 3D Words. These attributes can, for example, be represented as sliders and applied jointly with text prompts for fine-grained control over image generation. Given only a single mesh and a rendering engine, we show that our approach can be adopted to provide continuous user control over several 3D-aware attributes, including time-of-day illumination, bird wing orientation, dollyzoom effect, and object poses. Our method is capable of conditioning image creation with multiple Continuous 3D Words and text descriptions simultaneously while adding no overhead to the generative process. Project Page: https://ttchengab.github.io/continuous_3d_words

Submitted 13 February, 2024; originally announced February 2024.

Comments: Project Page: https://ttchengab.github.io/continuous_3d_words

arXiv:2402.04517 [pdf]

Automating the audit of electronic invoices with a soft robot

Authors: Tian Jun Cheng, Chia Jung Chen, Yao Lin Ong, Yi Fang Yang, Guang Yih Sheu

Abstract: Taiwan's Chi Mei Medical Center has completed four challenges mentioned in published robotic process automation (RPA) studies, including automating a dynamic process, designing feasible human-robot collaboration, incorporating other emerging technologies, and bringing positive business impacts. Its executives convened a committee to implement electronic invoicing. This implementation includes the creation of a software robot that automatically downloads cloud electronic invoice (E-invoice) data from Taiwan's E-invoice platform and detects inconsistencies between that data and on-premise data. This bot operates when internal auditors are away from the office. The auditors were satisfied with this software robot, since the only remaining work is verifying the resulting inconsistencies. The Chi Mei Medical Center measured the time and costs before and after adopting software robots to audit E-invoices; consequently, it welcomed more bots automating other business processes. In conclusion, integrating a software robot with other emerging technologies mitigates the possible errors produced by this bot. A good human-robot collaboration relies on the consideration of the human perspective in choosing RPA tasks. Free bot creators are sufficient to verify that automating a business process using a bot is a reasonable investment.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 11 pages, 6 figures, 1 table

arXiv:2402.01297 [pdf, other]

Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

Authors: Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius

Abstract: We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.

Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17270 [pdf, other]

YOLO-World: Real-Time Open-Vocabulary Object Detection

Authors: Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan

Abstract: The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.

Submitted 22 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Work still in progress. Code & models are available at: https://github.com/AILab-CVC/YOLO-World

arXiv:2401.09126 [pdf, other]

Objects With Lighting: A Real-World Dataset for Evaluating Reconstruction and Rendering for Object Relighting

Authors: Benjamin Ummenhofer, Sanskar Agrawal, Rene Sepulveda, Yixing Lao, Kai Zhang, Tianhang Cheng, Stephan Richter, Shenlong Wang, German Ros

Abstract: Reconstructing an object from photos and placing it virtually in a new environment goes beyond the standard novel view synthesis task, as the appearance of the object has to adapt not only to the novel viewpoint but also to the new lighting conditions. Yet evaluations of inverse rendering methods rely on novel view synthesis data or simplistic synthetic datasets for quantitative analysis. This work presents a real-world dataset for measuring the reconstruction and rendering of objects for relighting. To this end, we capture the environment lighting and ground truth images of the same objects in multiple environments, allowing us to reconstruct the objects from images taken in one environment and quantify the quality of the rendered views for the unseen lighting environments. Further, we introduce a simple baseline composed of off-the-shelf methods, test several state-of-the-art methods on the relighting task, and show that novel view synthesis is not a reliable proxy to measure performance. Code and dataset are available at https://github.com/isl-org/objects-with-lighting .

Submitted 13 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted at 3DV 2024, Oral presentation. For the project page see https://github.com/isl-org/objects-with-lighting

arXiv:2401.05236 [pdf, other]

Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects

Authors: Tianhang Cheng, Wei-Chiu Ma, Kaiyu Guan, Antonio Torralba, Shenlong Wang

Abstract: Our world is full of identical objects (e.g., cans of coke, cars of same model). These duplicates, when seen together, provide additional and strong cues for us to effectively reason about 3D. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multiple identical objects. SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6DoF pose for all instances. An inverse graphics pipeline is subsequently employed to jointly reason about the shape, material of the object, and the environment light, while adhering to the shared geometry and material constraint across instances. Our primary contributions involve utilizing object duplicates as a robust prior for single-image inverse graphics and proposing an in-plane rotation-robust Structure from Motion (SfM) formulation for joint 6-DoF object pose estimation. By leveraging multi-view cues from a single image, SfD generates more realistic and detailed 3D reconstructions, significantly outperforming existing single image reconstruction models and multi-view reconstruction approaches with a similar or greater number of observations.

Submitted 10 January, 2024; originally announced January 2024.

Comments: Code: https://github.com/Tianhang-Cheng/SfD

arXiv:2312.14113 [pdf, other]

Magnetar-powered Neutrinos and Magnetic Moment Signatures at IceCube

Authors: Ting Cheng, Hao-Jui Kuan, Ying-Ying Li, Vedran Brdar

Abstract: The IceCube collaboration pioneered the detection of $\mathcal{O}{(\text{PeV})}$ neutrino events and the identification of astrophysical sources of high-energy neutrinos. In this study, we explore scenarios in which high-energy neutrinos are produced in the vicinity of astrophysical objects with strong magnetic field, such as magnetars. While propagating through such magnetic field, neutrinos experience spin precession induced by their magnetic moments, and this impacts their helicity and flavor composition at Earth. Considering both flavor composition of high-energy neutrinos and Glashow resonance events we find that detectable signatures may arise at neutrino telescopes, such as IceCube, for presently unconstrained neutrino magnetic moments in the range between $\mathcal{O}(10^{-15})~μ_B$ and $\mathcal{O}(10^{-12})~μ_B$.

Submitted 10 June, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 27 pages, 10 figures, accepted for publication in JCAP

Report number: USTC-ICTS/PCFT-23-42

arXiv:2312.09573 [pdf]

Infrared anomalies in ultrathin Ti3C2Tx MXene films

Authors: Meng Li, Tao Cheng, Gongze Liu, He Huang, Keqiao Li, Yang Li, Jiayue Yang, Baoling Huang

Abstract: Visible transparent but infrared reflective materials are ideal candidates for both transparent conductive films and low-emissivity glass, which are highly desired in a broad variety of areas such as touchscreens and displays, photovoltaics, smart windows, and antistatic coatings. Ultrathin Ti3C2Tx MXene films are emerging as promising low-emissivity transparent candidates. However, the fundamental IR properties of Ti3C2Tx have not been revealed experimentally due to daunting challenges in the preparation of continuous, large-area, and ultrathin films of optical quality on flat substrates. Herein, we propose a tape-free transfer method that can help prepare centimeter-size and ultrathin (down to 8 nm) Ti3C2Tx films on diverse optical substrates. Benefitting from this method, the refractive index and permittivity for Ti3C2Tx were successfully measured. Ti3C2Tx films exhibit large in-plane permittivity in the IR region, yielding maximum IR reflectance of 88% for bulk films. Interestingly, three anomalies were found in ultrathin Ti3C2Tx films: strong dispersion in the permittivity, interlayer space-dependent optical properties, and abnormally high IR absorption for a 15-nm-thick film. These anomalies are important guidelines in the design of Ti3C2Tx-based low-emissivity transparent films and other related devices, and may inspire other intriguing applications such as ultrathin IR absorption coatings and tunable IR optical devices.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.08917 [pdf, other]

An Incremental Unified Framework for Small Defect Inspection

Authors: Jiaqi Tang, Hao Lu, Xiaogang Xu, Ruizheng Wu, Sixing Hu, Tong Zhang, Tsz Wa Cheng, Ming Ge, Ying-Cong Chen, Fugee Tsung

Abstract: Artificial Intelligence (AI)-driven defect inspection is pivotal in industrial manufacturing. Yet, many methods, tailored to specific pipelines, grapple with diverse product portfolios and evolving processes. Addressing this, we present the Incremental Unified Framework (IUF), which can reduce the feature conflict problem when continuously integrating new objects in the pipeline, making it advantageous in object-incremental learning scenarios. Employing a state-of-the-art transformer, we introduce Object-Aware Self-Attention (OASA) to delineate distinct semantic boundaries. Semantic Compression Loss (SCL) is integrated to optimize non-primary semantic space, enhancing network adaptability for novel objects. Additionally, we prioritize retaining the features of established objects during weight updates. Demonstrating prowess in both image and pixel-level defect inspection, our approach achieves state-of-the-art performance, proving indispensable for dynamic and scalable industrial inspections. Our code will be released at https://github.com/jqtangust/IUF.

Submitted 24 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.06053 [pdf, other]

IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions

Authors: Ziheng Zeng, Kellen Tan Cheng, Srihari Venkat Nanniyur, Jianing Zhou, Suma Bhat

Abstract: Idiomatic expression (IE) processing and comprehension have challenged pre-trained language models (PTLMs) because their meanings are non-compositional. Unlike prior works that enable IE comprehension through fine-tuning PTLMs with sentences containing IEs, in this work, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. This extends the established ATOMIC2020 graph, converting PTLMs into knowledge models (KMs) that encode and infer commonsense knowledge related to IE use. Experiments show that various PTLMs can be converted into KMs with IEKG. We verify the quality of IEKG and the ability of the trained KMs with automatic and human evaluation. Through applications in natural language understanding, we show that a PTLM injected with knowledge from IEKG exhibits improved IE comprehension ability and can generalize to IEs unseen during training.

Submitted 10 December, 2023; originally announced December 2023.

Comments: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

arXiv:2312.04814 [pdf, other]

A Unified Particle-Based Solver for Non-Newtonian Behaviors Simulation

Authors: Chunlei Li, Yang Gao, Jiayi He, Tianwei Cheng, Shuai Li, Aimin Hao, Hong Qin

Abstract: In this paper, we present a unified framework to simulate non-Newtonian behaviors. We combine viscous and elasto-plastic stress into a unified particle solver to achieve various non-Newtonian behaviors ranging from fluid-like to solid-like. Our constitutive model is based on a Generalized Maxwell model, which incorporates viscosity, elasticity and plasticity in one non-linear framework in a unified way. On the one hand, taking advantage of the viscous term, we construct a series of strain-rate dependent models for classical non-Newtonian behaviors such as shear-thickening, shear-thinning, Bingham plastic, etc. On the other hand, benefiting from the elasto-plastic model, we empower our framework with the ability to simulate solid-like non-Newtonian behaviors, i.e., visco-elasticity/plasticity. In addition, we enrich our method with a heat diffusion model to make our method flexible in simulating phase change. Through sufficient experiments, we demonstrate a wide range of non-Newtonian behaviors ranging from viscous fluid to deformable objects. We believe this non-Newtonian model will enhance the realism of physically-based animation, which has great potential for computer graphics.

Submitted 7 December, 2023; originally announced December 2023.

Comments: 12 pages

arXiv:2312.03294 [pdf, other]

doi 10.1016/j.jfds.2023.100113

A General Framework for Portfolio Construction Based on Generative Models of Asset Returns

Authors: Tuoyuan Cheng, Kan Chen

Abstract: In this paper, we present an integrated approach to portfolio construction and optimization, leveraging high-performance computing capabilities. We first explore diverse pairings of generative model forecasts and objective functions used for portfolio optimization, which are evaluated using performance-attribution models based on LASSO. We illustrate our approach using extensive simulations of crypto-currency portfolios, and we show that the portfolios constructed using the vine-copula generative model and the Sharpe-ratio objective function consistently outperform. To accommodate a wide array of investment strategies, we further investigate portfolio blending and propose a general framework for evaluating and combining investment strategies. We employ an extension of the multi-armed bandit framework and use value models and policy models to construct eclectic blended portfolios based on past performance. We consider similarity and optimality measures for value models and employ probability-matching ("blending") and a greedy algorithm ("switching") for policy models. The eclectic portfolios are also evaluated using LASSO models. We show that the value model utilizing cosine similarity and logit optimality consistently delivers robust superior performances. The extent of outperformance by eclectic portfolios over their benchmarks significantly surpasses that achieved by individual generative model-based portfolios over their respective benchmarks.

Submitted 6 December, 2023; originally announced December 2023.

Journal ref: The Journal of Finance and Data Science (2023): 100113

arXiv:2311.06279 [pdf]

doi 10.1049/gtd2.13034

A novel method of restoration path optimization for the AC-DC bulk power grid after a major blackout

Authors: Chao Yang, Gaoshen Liang, Tianle Cheng, Yang Li, Shaoyan Li

Abstract: The restoration control of the modern alternating current-direct current (AC-DC) hybrid power grid after a major blackout is difficult and complex. Taking into account the interaction between the line-commutated converter high-voltage direct current (LCC-HVDC) and the AC power grid, this paper proposes a novel optimization method of restoration path to reconfigure the skeleton network for the blackout power grid. Based on the system strength, the supporting capability of the AC power grid for the LCC-HVDC is first analysed from the aspects of start-up and operation of LCC-HVDCs. Subsequently, the quantitative relationship between the restoration path and the restoration characteristic of LCC-HVDC is derived in detail based on the system strength indices of the short-circuit capacity and the frequency regulation capability. Then, an optimization model of restoration path considering non-tree paths is formulated and a feasible optimization algorithm is proposed to achieve the optimal path restoration scheme. A modified IEEE 39-bus system and a partial power grid of Southwest China are simulated to show that the proposed method is suitable for the restoration of AC-DC power grids and can improve restoration efficiency. This research can be an important guidance for operators to rapidly restore the AC-DC power grid.

Submitted 27 October, 2023; originally announced November 2023.

Comments: Accepted by IET Generation, Transmission & Distribution

Journal ref: IET Generation, Transmission & Distribution 17 (2023) 5240-5251

arXiv:2311.06120 [pdf]

Converse Flexoelectricity of Low-Dimensional Bismuth Selenite (Bi2Se3) Revealed by Piezoresponse Force Microscopy (PFM)

Authors: Qiong Liu, S. S. Nanthakumar, Bin Li, Teresa Cheng, Florian Bittner, Chenxi Ma, Fei Ding, Lei Zheng, Bernhard Roth, Xiaoying Zhuang

Abstract: Many kinds of two-dimensional (2D) van der Waals (vdW) materials have been demonstrated to exhibit electromechanical coupling effects, which makes them promising candidates for next-generation devices, such as piezotronics and nanogenerators. Recently, flexoelectricity was found to account for the out-of-plane electromechanical coupling in many 2D transition metal dichalcogenides (TMDs) that only exhibit in-plane piezoelectricity. However, low dimensional vdW three-dimensional (3D) topological insulators (TIs) have been overlooked regarding their electromechanical properties. In this study, for the first time, we experimentally investigate the electromechanical coupling of low dimensional 3D TIs with a centrosymmetric crystal structure, where a binary compound, bismuth selenite (Bi2Se3), is taken as an example. The results of piezoresponse force microscope (PFM) tests on the Bi2Se3 nanoflakes show that the material exhibits both out-of-plane and in-plane electromechanical responses. The Bi2Se3 nanoflake with a thickness of 37 nm possesses an effective out-of-plane piezoelectric coefficient of ~0.65 pm V-1. With careful analyses, the electromechanical responses are verified to arise from the converse flexoelectricity. The measured effective out-of-plane piezoelectric coefficient is mainly contributed by flexoelectric coefficient, μ_39, which is estimated to be approximately 0.13 nC m-1. However, it is rather difficult to obtain the in-plane component of the flexoelectric tensor from the in-plane PFM measurements since the direction of the in-plane stress is always not normal to the AFM cantilever axis. The results provide useful guidance for understanding the flexoelectric effect of low dimensional vdW materials with centrosymmetric crystal structures. Moreover, the work can pave the way to explore the electromechanical devices based on the flexoelectricity of vdW TIs.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 6 figures

arXiv:2310.19188 [pdf, other]

3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets

Authors: Ta-Ying Cheng, Matheus Gadelha, Soren Pirk, Thibault Groueix, Radomir Mech, Andrew Markham, Niki Trigoni

Abstract: We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image representations to cluster images with geometrically similar shapes and find common image correspondences between them. We then exploit these correspondences to obtain rough camera estimates as initialization for bundle-adjustment. Finally, for every image cluster, we apply a progressive bundle-adjusting reconstruction method to learn a neural occupancy field representing the underlying shape. We show that this procedure is robust to several types of errors introduced in previous steps (e.g., wrong camera poses, images containing dissimilar shapes, etc.), allowing us to obtain shape and pose annotations for images in-the-wild. When using images from Pix3D chairs, our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques, both quantitatively and qualitatively. Furthermore, we show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset. Project Page: https://ttchengab.github.io/3dminerOfficial

Submitted 29 October, 2023; originally announced October 2023.

Comments: In ICCV 2023

arXiv:2310.17369 [pdf, other]

Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers

Authors: Daniela Teodorescu, Tiffany Cheng, Alona Fyshe, Saif M. Mohammad

Abstract: Research in psychopathology has shown that, at an aggregate level, the patterns of emotional change over time -- emotion dynamics -- are indicators of one's mental health. One's patterns of emotion change have traditionally been determined through self-reports of emotions; however, there are known issues with accuracy, bias, and ease of data collection. Recent approaches to determining emotion dynamics from one's everyday utterances address many of these concerns, but it is not yet known whether these measures of utterance emotion dynamics (UED) correlate with mental health diagnoses. Here, for the first time, we study the relationship between tweet emotion dynamics and mental health disorders. We find that each of the UED metrics studied varied by the user's self-disclosed diagnosis. For example: average valence was significantly higher (i.e., more positive text) in the control group compared to users with ADHD, MDD, and PTSD. Valence variability was significantly lower in the control group compared to ADHD, depression, bipolar disorder, MDD, PTSD, and OCD but not PPD. Rise and recovery rates of valence also exhibited significant differences from the control. This work provides important early evidence for how linguistic cues pertaining to emotion dynamics can play a crucial role as biosocial markers for mental illnesses and aid in the understanding, diagnosis, and management of mental health disorders.

Submitted 4 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 9 pages, 5 figures

arXiv:2310.17016 [pdf]

Boosting output performance of contact-separation mode triboelectric nanogenerators by adopting discontinuity and fringing effect: experiment and modelling studies

Authors: Teresa Cheng, Han Hu, Navid Valizadeh, Qiong Liu, Florian Bittner, Ling Yang, Timon Rabczuk, Xiaoning Jiang, Xiaoying Zhuang

Abstract: Triboelectric nanogenerators (TENGs) are promising self-powering supplies for a diverse range of intelligent sensing and monitoring devices, especially due to their capability of harvesting electric energy from low frequency and small-scale mechanical motions. Inspired by the fact that contact-separation mode TENGs with small contact areas harvest high electrical outputs due to fringing effect, this study employed discontinuity on the dielectric side of contact-separation mode TENGs to promote fringing electric fields for the enhancement of electrical outputs. The results reveal that the TENGs with more discontinuities show higher overall electric performance. Compared to pristine TENGs, the TENGs with cross discontinuities increased the surface charge by 50% and the power density by 114%. However, one should avoid generating discontinuities on tribonegative side of TENGs using metal blade within a positive-ion atmosphere due to the neutralization through electrically conductive metal blade. The computational simulation validated that the TENGs with discontinuities obtained higher electrical outputs, and further investigated the effect of discontinuity gap size and array distance on TENGs performance. This study has provided a promising method for the future design of TENGs using discontinuous structures.

Submitted 25 October, 2023; originally announced October 2023.

Comments: 23 pages, 8 figures

arXiv:2310.04962 [pdf, ps, other]

Properly colored even cycles in edge-colored complete balanced bipartite graphs

Authors: Shanshan Guo, Fei Huang, **jiang Yuan, C. T. Ng, T. C. E. Cheng

Abstract: Consider a complete balanced bipartite graph $K_{n,n}$ and let $K^c_{n,n}$ be an edge-colored version of $K_{n,n}$ that is obtained from $K_{n,n}$ by having each edge assigned a certain color. A subgraph $H$ of $K^c_{n,n}$ is called properly colored (PC) if every two adjacent edges of $H$ have distinct colors. $K_{n,n}^c$ is called properly vertex-even-pancyclic if for every vertex $u\in V(K_{n,n}^c)$ and for every even integer $k$ with $4 \leq k \leq 2n$, there exists a PC $k$-cycle containing $u$. The minimum color degree $δ^c(K^c_{n,n})$ of $K^c_{n,n}$ is the largest integer $k$ such that for every vertex $v$, there are at least $k$ distinct colors on the edges incident to $v$. In this paper we study the existence of PC even cycles in $K_{n,n}^c$. We first show that, for every integer $t\geq 3$, every $K^c_{n,n}$ with $δ^c(K^c_{n,n})\geq \frac{2n}{3}+t$ contains a PC 2-factor $H$ such that every cycle of $H$ has a length of at least $t$. By using the probabilistic method and absorbing technique, we use the above result to further show that, for every $\varepsilon>0$, there exists an integer $n_0(\varepsilon)$ such that every $K^c_{n,n}$ with $n\geq n_0(\varepsilon)$ is properly vertex-even-pancyclic, provided that $δ^c(K^c_{n,n})\geq (\frac{2}{3}+\varepsilon)n$.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.00987 [pdf, other]

A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression

Authors: Tin Sum Cheng, Aurelien Lucchi, Ivan Dokmanić, Anastasis Kratsios, David Belius

Abstract: Existing statistical learning guarantees for general kernel regressors often yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels naturally appear in several machine learning problems, e.g., when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task when performing transfer learning. We address this gap for finite-rank kernel ridge regression (KRR) by deriving sharp non-asymptotic upper and lower bounds for the KRR test error of any finite-rank KRR. Our bounds are tighter than previously derived bounds on finite-rank KRR, and unlike comparable results, they also remain valid for any regularization parameters.

Submitted 3 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.15197 [pdf, other]

doi 10.1145/3610045

A Tale of Two Cultures: Comparing Interpersonal Information Disclosure Norms on Twitter

Authors: Mainack Mondal, Anju Punuru, Tyng-Wen Scott Cheng, Kenneth Vargas, Chaz Gundry, Nathan S Driggs, Noah Schill, Nathaniel Carlson, Josh Bedwell, Jaden Q Lorenc, Isha Ghosh, Yao Li, Nancy Fulda, Xinru Page

Abstract: We present an exploration of cultural norms surrounding online disclosure of information about one's interpersonal relationships (such as information about family members, colleagues, friends, or lovers) on Twitter. The literature identifies the cultural dimension of individualism versus collectivism as being a major determinant of offline communication differences in terms of emotion, topic, and content disclosed. We decided to study whether such differences also occur online in the context of Twitter when comparing tweets posted in an individualistic (U.S.) versus a collectivist (India) society. We collected more than 2 million tweets posted in the U.S. and India over a 3 month period which contain interpersonal relationship keywords. A card-sort study was used to develop this culturally-sensitive saturated taxonomy of keywords that represent interpersonal relationships (e.g., ma, mom, mother). Then we developed a high-accuracy interpersonal disclosure detector based on dependency-parsing (F1-score: 86%) to identify when the words refer to a personal relationship of the poster (e.g., "my mom" as opposed to "a mom"). This allowed us to identify the 400K+ tweets in our data set which actually disclose information about the poster's interpersonal relationships. We used a mixed methods approach to analyze these tweets (e.g., comparing the amount of joy expressed about one's family) and found differences in emotion, topic, and content disclosed between tweets from the U.S. versus India. Our analysis also reveals how a combination of qualitative and quantitative methods is needed to uncover these differences; using just one or the other can be misleading. This study extends the prior literature on Multi-Party Privacy and provides guidance for researchers and designers of culturally-sensitive systems.

Submitted 26 September, 2023; originally announced September 2023.

Comments: This work will be presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023). This paper will also be published in The Proceedings of the ACM on Human Computer Interaction

arXiv:2309.08096 [pdf, other]

GelSplitter: Tactile Reconstruction from Near Infrared and Visible Images

Authors: Yuankai Lin, Yulin Zhou, Kaiji Huang, Qi Zhong, Tao Cheng, Hua Yang, Zhou** Yin

Abstract: The GelSight-like visual tactile (VT) sensor has gained popularity as a high-resolution tactile sensing technology for robots, capable of measuring touch geometry using a single RGB camera. However, the development of multi-modal perception for VT sensors remains a challenge, limited by the mono camera. In this paper, we propose the GelSplitter, a new framework that approaches the multi-modal VT sensor with synchronized multi-modal cameras and resembles a more human-like tactile receptor. Furthermore, we focus on 3D tactile reconstruction and implement a compact sensor structure that maintains a comparable size to state-of-the-art VT sensors, even with the addition of a prism and a near infrared (NIR) camera. We also design a photometric fusion stereo neural network (PFSNN), which estimates surface normals of objects and reconstructs touch geometry from both infrared and visible images. Our results demonstrate that the accuracy of RGB and NIR fusion is higher than that of RGB images alone. Additionally, our GelSplitter framework allows for a flexible configuration of different camera sensor combinations, such as RGB and thermal imaging.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.00671 [pdf, other]

Dark Energy Survey Year 6 Results: Intra-Cluster Light from Redshift 0.2 to 0.5

Authors: Yuanyuan Zhang, Jesse B. Golden-Marx, Ricardo L. C. Ogando, Brian Yanny, Eli S. Rykoff, Sahar Allam, M. Aguena, D. Bacon, S. Bocquet, D. Brooks, A. Carnero Rosell, J. Carretero, T. -Y. Cheng, C. Conselice, M. Costanzi, L. N. da Costa, M. E. S. Pereira, T. M. Davis, S. Desai, H. T. Diehl, P. Doel, I. Ferrero, B. Flaugher, J. Frieman, D. Gruen , et al. (24 additional authors not shown)

Abstract: Using the full six years of imaging data from the Dark Energy Survey, we study the surface brightness profiles of galaxy cluster central galaxies and intra-cluster light. We apply a ``stacking'' method to over four thousand galaxy clusters identified by the redMaPPer cluster finding algorithm in the redshift range of 0.2 to 0.5. This yields high signal-to-noise radial profile measurements of the central galaxy and intra-cluster light out to 1 Mpc from the cluster center. Using redMaPPer richness as a cluster mass indicator, we find that the intra-cluster light brightness has a strong mass dependence throughout the 0.2 to 0.5 redshift range, and the dependence grows stronger at a larger radius. In terms of redshift evolution, we find some evidence that the central galaxy, as well as the diffuse light within the transition region between the cluster central galaxy and intra-cluster light within 80 kpc from the center, may be growing over time. At larger radii, more than 80 kpc away from the cluster center, we do not find evidence of additional redshift evolution beyond the cluster mass dependence, which is consistent with the findings from the IllustrisTNG hydrodynamic simulation. We speculate that the major driver of intra-cluster light growth, especially at large radii, is associated with cluster mass growth. Finally, we find that the color of the cluster central galaxy and intra-cluster light displays a radial gradient that becomes bluer at a larger radius, which is consistent with a stellar stripping and disruption origin of intra-cluster light as suggested by simulation studies.

Submitted 1 September, 2023; originally announced September 2023.

Comments: Submitted to MNRAS

arXiv:2308.15197 [pdf, other]

Where Would I Go Next? Large Language Models as Human Mobility Predictors

Authors: Xinglei Wang, Meng Fang, Zichao Zeng, Tao Cheng

Abstract: Accurate human mobility prediction underpins many important applications across a variety of domains, including epidemic modelling, transport planning, and emergency responses. Due to the sparsity of mobility data and the stochastic nature of people's daily activities, achieving precise predictions of people's locations remains a challenge. While recently developed large language models (LLMs) have demonstrated superior performance across numerous language-related tasks, their applicability to human mobility studies remains unexplored. Addressing this gap, this article delves into the potential of LLMs for human mobility prediction tasks. We introduce a novel method, LLM-Mob, which leverages the language understanding and reasoning capabilities of LLMs for analysing human mobility data. We present concepts of historical stays and context stays to capture both long-term and short-term dependencies in human movement and enable time-aware prediction by using time information of the prediction target. Additionally, we design context-inclusive prompts that enable LLMs to generate more accurate predictions. Comprehensive evaluations of our method reveal that LLM-Mob excels in providing accurate and interpretable predictions, highlighting the untapped potential of LLMs in advancing human mobility prediction techniques. We posit that our research marks a significant paradigm shift in human mobility modelling, transitioning from building complex domain-specific models to harnessing general-purpose LLMs that yield accurate predictions through language instructions. The code for this work is available at https://github.com/xlwang233/LLM-Mob.

Submitted 9 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: Major changes: Used the entire FSQ-NYC dataset (table 1). Used Geolife for ablation study (figure 5). Incorporated time-unknown prediction performance (table 2), robustness testing (section 5.6), and ethical statement (appendix). Reformatted the paper using double column template

arXiv:2308.13874 [pdf, other]

Sufficient conditions for $k$-factors and spanning trees of graphs

Authors: Guoyan Ao, Ruifang Liu, **jiang Yuan, C. T. Ng, T. C. E. Cheng

Abstract: For any integer $k\geq1,$ a graph $G$ has a $k$-factor if it contains a $k$-regular spanning subgraph. In this paper we prove a sufficient condition in terms of the number of $r$-cliques to guarantee the existence of a $k$-factor in a graph with minimum degree at least $δ$, which improves the sufficient condition of O \cite{O2021} based on the number of edges. For any integer $k\geq2,$ a spanning $k$-tree of a connected graph $G$ is a spanning tree in which every vertex has degree at most $k$. Motivated by the technique of Li and Ning \cite{Li2016}, we present a tight spectral condition for an $m$-connected graph to have a spanning $k$-tree, which extends the result of Fan, Goryainov, Huang and Lin \cite{Fan2021} from $m=1$ to general $m$. Let $T$ be a spanning tree of a connected graph. The leaf degree of $T$ is the maximum number of leaves adjacent to $v$ in $T$ for any $v\in V(T)$. We provide a tight spectral condition for the existence of a spanning tree with leaf degree at most $k$ in a connected graph with minimum degree $δ$, where $k\geq1$ is an integer.

Submitted 26 August, 2023; originally announced August 2023.

Comments: 17 pages, 3 figures

MSC Class: 05C50; 05C35

arXiv:2308.11727 [pdf, ps, other]

Simplified partial wave expansion of the Lamb shift

Authors: J. Sapirstein, K. T. Cheng

Abstract: A method for calculating the self energy part of the Lamb shift is revisited. When the electron propagator in an external field is represented as an expansion in partial waves, the original method converges relatively slowly, requiring the calculation of dozens of partial waves. Here we show an improved method in which accurate results can be obtained using a much smaller number of partial waves. The method is illustrated for the ground states of hydrogenlike and lithiumlike boron, and the possibility of high accuracy calculations on lower Z hydrogenic ions is discussed.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 10 pages

arXiv:2308.03434 [pdf, ps, other]

Tyshkevich's Graph Decomposition and the Distinguishing Numbers of Unigraphs

Authors: Christine T. Cheng

Abstract: A $c$-labeling $φ: V(G) \rightarrow \{1, 2, \hdots, c \}$ of graph $G$ is distinguishing if, for every non-trivial automorphism $π$ of $G$, there is some vertex $v$ so that $φ(v) \neq φ(π(v))$. The distinguishing number of $G$, $D(G)$, is the smallest $c$ such that $G$ has a distinguishing $c$-labeling. We consider a compact version of Tyshkevich's graph decomposition theorem where trivial components are maximally combined to form a complete graph or a graph of isolated vertices. Suppose the compact canonical decomposition of $G$ is $G_{k} \circ G_{k-1} \circ \cdots \circ G_1 \circ G_0$. We prove that $φ$ is a distinguishing labeling of $G$ if and only if $φ$ is a distinguishing labeling of $G_i$ when restricted to $V(G_i)$ for $i = 0, \hdots, k$. Thus, $D(G) = \max \{D(G_i), i = 0, \hdots, k \}$. We then present an algorithm that computes the distinguishing number of a unigraph in linear time.

Submitted 26 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: 22 pages plus an appendix with 8 pages

arXiv:2308.02712 [pdf, other]

doi 10.3847/1538-3881/acfda4

A Search for Technosignatures Around 11,680 Stars with the Green Bank Telescope at 1.15-1.73 GHz

Authors: Jean-Luc Margot, Megan G. Li, Pavlo Pinchuk, Nathan Myhrvold, Larry Lesyna, Lea E. Alcantara, Megan T. Andrakin, Jeth Arunseangroj, Damien S. Baclet, Madison H. Belk, Zerxes R. Bhadha, Nicholas W. Brandis, Robert E. Carey, Harrison P. Cassar, Sai S. Chava, Calvin Chen, James Chen, Kellen T. Cheng, Alessia Cimbri, Benjamin Cloutier, Jordan A. Combitsis, Kelly L. Couvrette, Brandon P. Coy, Kyle W. Davis, Antoine F. Delcayre , et al. (56 additional authors not shown)

Abstract: We conducted a search for narrowband radio signals over four observing sessions in 2020-2023 with the L-band receiver (1.15-1.73 GHz) of the 100 m diameter Green Bank Telescope. We pointed the telescope in the directions of 62 TESS Objects of Interest, capturing radio emissions from a total of ~11,680 stars and planetary systems in the ~9 arcminute beam of the telescope. All detections were either automatically rejected or visually inspected and confirmed to be of anthropogenic nature. In this work, we also quantified the end-to-end efficiency of radio SETI pipelines with a signal injection and recovery analysis. The UCLA SETI pipeline recovers 94.0% of the injected signals over the usable frequency range of the receiver and 98.7% of the injections when regions of dense RFI are excluded. In another pipeline that uses incoherent sums of 51 consecutive spectra, the recovery rate is ~15 times smaller at ~6%. The pipeline efficiency affects calculations of transmitter prevalence and SETI search volume. Accordingly, we developed an improved Drake Figure of Merit and a formalism to place upper limits on transmitter prevalence that take the pipeline efficiency and transmitter duty cycle into account. Based on our observations, we can state at the 95% confidence level that fewer than 6.6% of stars within 100 pc host a transmitter that is detectable in our search (EIRP > 1e13 W). For stars within 20,000 ly, the fraction of stars with detectable transmitters (EIRP > 5e16 W) is at most 3e-4. Finally, we showed that the UCLA SETI pipeline natively detects the signals detected with AI techniques by Ma et al. (2023).

Submitted 15 October, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: 21 pages, 9 figures, in press at AJ

Journal ref: AJ 166 206 (2023)

arXiv:2306.15670 [pdf, other]

Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

Authors: Haoyi Jiang, Tianheng Cheng, Naiyu Gao, Haoyang Zhang, Tianwei Lin, Wenyu Liu, Xinggang Wang

Abstract: 3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal undertaking in autonomous driving, aiming to predict voxel occupancy within volumetric scenes. However, prevailing methodologies primarily focus on voxel-wise feature aggregation, while neglecting instance semantics and scene context. In this paper, we present a novel paradigm termed Symphonies (Scene-from-Insts), that delves into the integration of instance queries to orchestrate 2D-to-3D reconstruction and 3D scene modeling. Leveraging our proposed Serial Instance-Propagated Attentions, Symphonies dynamically encodes instance-centric semantics, facilitating intricate interactions between image-based and volumetric domains. Simultaneously, Symphonies enables holistic scene comprehension by capturing context through the efficient fusion of instance queries, alleviating geometric ambiguity such as occlusion and perspective errors through contextual scene reasoning. Experimental results demonstrate that Symphonies achieves state-of-the-art performance on challenging benchmarks SemanticKITTI and SSCBench-KITTI-360, yielding remarkable mIoU scores of 15.04 and 18.58, respectively. These results showcase the paradigm's promising advancements. The code is available at https://github.com/hustvl/Symphonies.

Submitted 22 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Technical report. Code and models at: https://github.com/hustvl/Symphonies

arXiv:2306.14649 [pdf, other]

CIMulator: A Comprehensive Simulation Platform for Computing-In-Memory Circuit Macros with Low Bit-Width and Real Memory Materials

Authors: Hoang-Hiep Le, Md. Aftab Baig, Wei-Chen Hong, Cheng-Hsien Tsai, Cheng-Jui Yeh, Fu-Xiang Liang, I-Ting Huang, Wei-Tzu Tsai, Ting-Yin Cheng, Sourav De, Nan-Yow Chen, Wen-Jay Lee, Ing-Chao Lin, Da-Wei Chang, Darsen D. Lu

Abstract: This paper presents a simulation platform, namely CIMulator, for quantifying the efficacy of various synaptic devices in neuromorphic accelerators for different neural network architectures. Nonvolatile memory devices, such as resistive random-access memory, ferroelectric field-effect transistor, and volatile static random-access memory devices, can be selected as synaptic devices. A multilayer perceptron and convolutional neural networks (CNNs), such as LeNet-5, VGG-16, and a custom CNN named C4W-1, are simulated to evaluate the effects of these synaptic devices on the training and inference outcomes. The datasets used in the simulations are MNIST, CIFAR-10, and a white blood cell dataset. By applying batch normalization and appropriate optimizers in the training phase, neuromorphic systems with very low-bit-width or binary weights could achieve high pattern recognition rates that approach software-based CNN accuracy. We also introduce spiking neural networks with RRAM-based synaptic devices for the recognition of MNIST handwritten digits.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.13653 [pdf, other]

ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration

Authors: Jiaqi Ma, Tianheng Cheng, Guoli Wang, Qian Zhang, Xinggang Wang, Lefei Zhang

Abstract: Image restoration aims to reconstruct degraded images, e.g., denoising or deblurring. Existing works focus on designing task-specific methods and there are inadequate attempts at universal methods. However, simply unifying multiple tasks into one universal architecture suffers from uncontrollable and undesired predictions. To address those issues, we explore prompt learning in universal architectures for image restoration tasks. In this paper, we present Degradation-aware Visual Prompts, which encode various types of image degradation, e.g., noise and blur, into unified visual prompts. These degradation-aware prompts provide control over image processing and allow weighted combinations for customized image restoration. We then leverage degradation-aware visual prompts to establish a controllable and universal model for image restoration, called ProRes, which is applicable to an extensive range of image restoration tasks. ProRes leverages the vanilla Vision Transformer (ViT) without any task-specific designs. Furthermore, the pre-trained ProRes can easily adapt to new tasks through efficient prompt tuning with only a few images. Without bells and whistles, ProRes achieves competitive performance compared to task-specific methods and experiments can demonstrate its ability for controllable restoration and adaptation for new tasks. The code and models will be released in https://github.com/leonmakise/ProRes.

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2306.05584 [pdf, other]

Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation

Authors: Jia-Xing Zhong, Ta-Ying Cheng, Yuhang He, Kai Lu, Kaichen Zhou, Andrew Markham, Niki Trigoni

Abstract: A truly generalizable approach to rigid segmentation and motion estimation is fundamental to 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two interconnected, lightweight heads. These heads predict segmentation masks using point-level invariant features and estimate motion from SE(3) equivariant features, all without the need for category information. Our training strategy is unified and can be implemented online, which jointly optimizes the predicted segmentation and motion by leveraging the interrelationships among scene flow, segmentation mask, and rigid transformations. We conduct experiments on four datasets to demonstrate the superiority of our method. The results show that our method excels in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs. To the best of our knowledge, this is the first work designed for category-agnostic part-level SE(3) equivariance in dynamic point clouds.

Submitted 31 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: To appear at NeurIPS 2023

arXiv:2305.09952 [pdf]

doi 10.1038/s41598-023-50502-9

Cathodoluminescence spectroscopy of monolayer hexagonal boron nitride

Authors: K. Shima, T. S. Cheng, C. J. Mellor, P. H. Beton, C. Elias, P. Valvin, B. Gil, G. Cassabois, S. V. Novikov, S. F. Chichibu

Abstract: Cathodoluminescence (CL) spectroscopy is a powerful technique for studying emission properties of optoelectronic materials because CL is free from excitable bandgap limits and from ambiguous signals due to simple light scattering and resonant Raman scattering potentially involved in the photoluminescence (PL) spectra. However, direct CL measurements of atomically thin two-dimensional materials, such as transition metal dichalcogenides and hexagonal boron nitride (hBN), have been difficult due to the small excitation volume that interacts with high-energy electron beams (e-beams). Herein, distinct CL signals from a monolayer hBN, namely mBN, epitaxial film grown on a highly oriented pyrolytic graphite substrate are shown by using a home-made CL system capable of large-area and surface-sensitive excitation by an e-beam. The spatially resolved CL spectra at 13 K exhibited a predominant 5.5-eV emission band, which has been ascribed to originate from multilayered aggregates of hBN, markedly at thicker areas formed on the step edges of the substrate. Conversely, a faint peak at 6.04 eV was routinely observed from atomically flat areas. Since the energy agreed with the PL peak of 6.05 eV at 10 K that has been assigned as being due to the recombination of phonon-assisted direct excitons of mBN by Elias et al. [Nat. Commun. 10, 2639 (2019)], the CL peak at 6.04 eV is attributed to originate from the mBN epilayer. The CL results support the transition from indirect bandgap in bulk hBN to direct bandgap in mBN, in analogy with molybdenum disulfide. The results also encourage to elucidate emission properties of other low-dimensional materials with reduced excitation volumes by using the present CL configuration.

Submitted 17 May, 2023; originally announced May 2023.

Comments: 7 pages, 3 figures

arXiv:2304.13493 [pdf]

Towards clinical AI fairness: A translational perspective

Authors: Mingxuan Liu, Yilin Ning, Salinelat Teixayavong, Mayli Mertens, Jie Xu, Daniel Shu Wei Ting, Lionel Tim-Ee Cheng, Jasmine Chiat Ling Ong, Zhen Ling Teo, Ting Fang Tan, Ravi Chandran Narrendar, Fei Wang, Leo Anthony Celi, Marcus Eng Hock Ong, Nan Liu

Abstract: Artificial intelligence (AI) has demonstrated the ability to extract insights from data, but the issue of fairness remains a concern in high-stakes fields such as healthcare. Despite extensive discussion and efforts in algorithm development, AI fairness and clinical concerns have not been adequately addressed. In this paper, we discuss the misalignment between technical and clinical perspectives of AI fairness, highlight the barriers to AI fairness' translation to healthcare, advocate multidisciplinary collaboration to bridge the knowledge gap, and provide possible solutions to address the clinical concerns pertaining to AI fairness.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.13289 [pdf, other]

Membrane Potential Distribution Adjustment and Parametric Surrogate Gradient in Spiking Neural Networks

Authors: Siqi Wang, Tee Hiang Cheng, Meng-Hiot Lim

Abstract: As an emerging network model, spiking neural networks (SNNs) have aroused significant research attention in recent years. However, the energy-efficient binary spikes do not augur well with gradient descent-based training approaches. Surrogate gradient (SG) strategy is investigated and applied to circumvent this issue and train SNNs from scratch. Due to the lack of a well-recognized SG selection rule, most SGs are chosen intuitively. We propose the parametric surrogate gradient (PSG) method to iteratively update SG and eventually determine an optimal surrogate gradient parameter, which calibrates the shape of candidate SGs. In SNNs, neural potential distribution tends to deviate unpredictably due to quantization error. We evaluate such potential shift and propose methodology for potential distribution adjustment (PDA) to minimize the loss of undesired pre-activations. Experimental results demonstrate that the proposed methods can be readily integrated with backpropagation through time (BPTT) algorithm and help modulated SNNs to achieve state-of-the-art performance on both static and dynamic datasets with fewer timesteps.

Submitted 26 April, 2023; originally announced April 2023.

Comments: 10 pages, 8 figures

arXiv:2304.09807 [pdf, other]

VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

Authors: Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

Abstract: High-definition (HD) map serves as the essential infrastructure of autonomous driving. In this work, we build up a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD map of large-scale driving scene. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geometric patterns as unified point sequence representation, which can be extended to most map elements in the driving scene. VMA is highly efficient and extensible, requiring negligible human effort, and flexible in terms of spatial scale and element type. We quantitatively and qualitatively validate the annotation performance on real-world urban and highway scenes, as well as NYC Planimetric Database. VMA can significantly improve map generation efficiency and require little human effort. On average VMA takes 160min for annotating a scene with a range of hundreds of meters, and reduces 52.3% of the human cost, showing great application value. Code: https://github.com/hustvl/VMA.

Submitted 27 August, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: https://github.com/hustvl/VMA

arXiv:2304.09416 [pdf, ps, other]

On the second order of Zeta functional equations for Riemann Type

Authors: Chin-yuan Hu, Tsung-lin Cheng, Ie-bin Lian

Abstract: This paper discusses a new class of functional equations by using both the Poisson summation formula and a Jacobi type theta function. The class of Riemann type functional equations is derived from self-reciprocal probability density functions. Finally, the second order Zeta functional equations for Riemann type are also investigated.

Submitted 21 April, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: 24 pages

arXiv:2304.03428 [pdf, other]

TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors

Authors: Shaoyu Chen, Tianheng Cheng, Jiemin Fang, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang

Abstract: Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors. To accurately detect small objects with limited computation, we propose a two-stage lightweight detection framework with extremely low computation complexity, termed as TinyDet. It enables high-resolution feature maps for dense anchoring to better cover small objects, proposes a sparsely-connected convolution for computation reduction, enhances the early stage features in the backbone, and addresses the feature misalignment problem for accurate small object detection. On the COCO benchmark, our TinyDet-M achieves 30.3 AP and 13.5 AP^s with only 991 MFLOPs, which is the first detector that has an AP over 30 with less than 1 GFLOPs; besides, TinyDet-S and TinyDet-L achieve promising performance under different computation limitation.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2303.17594 [pdf, other]

MobileInst: Video Instance Segmentation on the Mobile

Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang

Abstract: Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on one single CPU core of Snapdragon 778G Mobile Platform, without other methods of acceleration. On the COCO dataset, MobileInst achieves 31.2 mask AP and 433 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

Submitted 18 December, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: Accepted by AAAI 2024 Main Track; Code will be released

arXiv:2303.08815 [pdf, other]

Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

Authors: Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

Abstract: Online lane graph construction is a promising but challenging task in autonomous driving. Previous methods usually model the lane graph at the pixel or piece level, and recover the lane graph by pixel-wise or piece-wise connection, which breaks down the continuity of the lane. Human drivers focus on and drive along the continuous and complete paths instead of considering lane pieces. Autonomous vehicles also require path-specific guidance from lane graph for trajectory planning. We argue that the path, which indicates the traffic flow, is the primitive of the lane graph. Motivated by this, we propose to model the lane graph in a novel path-wise manner, which well preserves the continuity of the lane and encodes traffic information for planning. We present a path-based online lane graph construction method, termed LaneGAP, which end-to-end learns the path and recovers the lane graph via a Path2Graph algorithm. We qualitatively and quantitatively demonstrate the superiority of LaneGAP over conventional pixel-based and piece-based methods on challenging nuScenes and Argoverse2 datasets. Abundant visualizations show LaneGAP can cope with diverse traffic conditions. Code and models will be released at https://github.com/hustvl/LaneGAP for facilitating future research.

Submitted 17 December, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Showing 1–50 of 212 results for author: Cheng, T