-
Revisiting Scene Text Recognition: A Data Perspective
Authors:
Qing Jiang,
Jiapeng Wang,
Dezhi Peng,
Chongyu Liu,
Lianwen **
Abstract:
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective. We begin by revisiting the six commonly used benchmarks in STR and observe a trend of performance saturation, whereby only 2.91% of the benchmark images cannot be accurately recognized by an ensemble of 13 representative models. While these results are impressive and suggest that STR could be considered solved, we argue that this is primarily due to the less challenging nature of the common benchmarks, which conceals the underlying issues that STR faces. To this end, we consolidate a large-scale real STR dataset, namely Union14M, which comprises 4 million labeled images and 10 million unlabeled images, to assess the performance of STR models in more complex real-world scenarios. Our experiments demonstrate that the 13 models achieve an average accuracy of only 66.53% on the 4 million labeled images, indicating that STR still faces numerous challenges in the real world. By analyzing the error patterns of the 13 models, we identify seven open challenges in STR and develop a challenge-driven benchmark consisting of eight distinct subsets to facilitate further progress in the field. Our exploration demonstrates that STR is far from solved and that leveraging data may be a promising solution. In this regard, we find that utilizing the 10 million unlabeled images through self-supervised pre-training significantly improves the robustness of STR models in real-world scenarios and leads to state-of-the-art performance.
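The two headline statistics above (the 2.91% of benchmark images that no model in the 13-model ensemble recognizes, and the 66.53% average accuracy) are two different aggregations of the same per-model predictions. A minimal Python sketch of both, assuming a prediction counts as correct when it matches the label after lowercasing and dropping non-alphanumeric characters (a common STR convention, not necessarily the paper's exact protocol); the model names and words are made up:

```python
from typing import Dict, List


def normalize(text: str) -> str:
    # Assumed comparison protocol: case-insensitive, alphanumeric characters only.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def ensemble_miss_rate(labels: List[str], predictions: Dict[str, List[str]]) -> float:
    """Fraction of images that no model in the ensemble recognizes correctly."""
    missed = 0
    for i, label in enumerate(labels):
        if all(normalize(preds[i]) != normalize(label) for preds in predictions.values()):
            missed += 1
    return missed / len(labels)


def average_accuracy(labels: List[str], predictions: Dict[str, List[str]]) -> float:
    """Mean of the per-model word accuracies."""
    per_model = []
    for preds in predictions.values():
        correct = sum(normalize(p) == normalize(t) for p, t in zip(preds, labels))
        per_model.append(correct / len(labels))
    return sum(per_model) / len(per_model)


labels = ["cafe", "STOP", "exit", "open"]
predictions = {
    "model_a": ["cafe", "stop", "exlt", "opem"],
    "model_b": ["Cafe", "st0p", "exit", "opem"],
}
print(ensemble_miss_rate(labels, predictions))  # 0.25
print(average_accuracy(labels, predictions))    # 0.5
```

Only the last image is missed by every model, so the ensemble miss rate (0.25) is much lower than the per-model error rate (0.5); the same gap separates the 2.91% and 66.53% figures reported above.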
Submitted 19 July, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
The near room-temperature upsurge of electrical resistivity in Lu-H-N is not superconductivity, but a metal-to-poor-conductor transition
Authors:
Di Peng,
Qiaoshi Zeng,
Fujun Lan,
Zhenfang Xing,
Yang Ding,
Ho-kwang Mao
Abstract:
Since the discovery of superconductivity in mercury at 4 K in 1911, searching for materials that superconduct at higher temperatures, closer to practical conditions, has been an enduring primary goal. The recent report of room-temperature superconductivity at near-ambient pressure in nitrogen-doped lutetium hydride (Lu-H-N) by Dasenbrock-Gammon et al. (hereafter referred to as D-G) appeared to be a great step toward this ultimate goal. Specifically, they claimed evidence of superconductivity in Lu-H-N with a maximum Tc of 294 K at 1 GPa. However, in all independent follow-up studies, researchers worldwide have failed to observe the drastic temperature-dependent resistance change above 200 K in high-pressure synthesized Lu-H-N compounds, a prerequisite for superconductivity, which casts a heavy shadow on the authenticity of the claims. The sober questions are: What is the sample that produces the sharp resistance jump near room temperature? Why are others who follow the D-G synthesis method unable to reproduce it, and why is the success rate (35%) in synthesizing the right sample inexplicably low even for the authors of Ref. 1? What causes the observed sharp resistance jump? Here, with a well-controlled experimental protocol, we repeatedly reproduced the near-room-temperature sudden change of electrical resistance in the Lu-H-N sample, and we quantitatively compared its behavior with that of the initial pure Lu in a normal metallic state. These results enable us to scrutinize the origin of the near-room-temperature sharp resistance change, which we attribute to a metal-to-poor-conductor transition rather than superconductivity.
Submitted 30 June, 2023;
originally announced July 2023.
-
Revisiting $L_q(0\leq q<1)$ Norm Regularized Optimization
Authors:
Shenglong Zhou,
Xianchao Xiu,
Yingnan Wang,
Dingtao Peng
Abstract:
Sparse optimization has advanced considerably in recent decades. For scenarios where the true sparsity is unknown, regularization turns out to be a promising solution. Two popular non-convex regularizations are the so-called $L_0$ norm and the $L_q$ norm with $q\in(0,1)$, which have given rise to extensive research on the induced optimization problems. However, most of this work assumes that the main function is twice continuously differentiable, and the best convergence rate for an algorithm solving the optimization with $q\in(0,1)$ is superlinear. This paper explores $L_q$ norm regularized optimization in a unified way for any $q\in[0,1)$, where the main function has a semismooth gradient. In particular, we establish first-order and second-order optimality conditions under mild assumptions and then integrate the proximal operator and the semismooth Newton method to develop a proximal semismooth Newton pursuit algorithm. Under the second-order sufficient condition, the whole sequence generated by the algorithm converges to a unique local minimizer. Moreover, the convergence is superlinear if the gradient of the main function is semismooth at the local minimizer, and quadratic if it is strongly semismooth there. Hence, this paper achieves a quadratic rate for an algorithm designed to solve the $L_q$ norm regularization problem for any $q\in(0,1)$. Finally, numerical experiments demonstrate its favorable performance compared with several existing solvers.
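For $q=0$, the proximal operator that such algorithms rely on has a closed form: hard thresholding. The sketch below shows that operator inside a plain proximal gradient loop for a least-squares main function. It illustrates the proximal step only; it is not the proximal semismooth Newton pursuit algorithm described above, and the least-squares objective and step-size rule are assumptions:

```python
import numpy as np


def prox_l0(z: np.ndarray, lam: float) -> np.ndarray:
    """Proximal operator of lam * ||x||_0 (hard thresholding).

    Coordinate-wise, keeping z_i costs lam while zeroing it costs z_i**2 / 2,
    so z_i is kept exactly when |z_i| > sqrt(2 * lam).
    """
    return np.where(np.abs(z) > np.sqrt(2.0 * lam), z, 0.0)


def proximal_gradient_l0(A: np.ndarray, b: np.ndarray, lam: float,
                         step: float, iters: int = 200) -> np.ndarray:
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_0 by proximal gradient steps.

    This is plain iterative hard thresholding; `step` should not exceed
    1 / ||A||_2^2 for the usual descent guarantees.
    """
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                   # gradient of the smooth main function
        x = prox_l0(x - step * grad, step * lam)   # prox of (step * lam) * ||.||_0
    return x


A = np.eye(3)
b = np.array([3.0, 0.5, -2.0])
x = proximal_gradient_l0(A, b, lam=1.0, step=1.0)
print(x)  # keeps 3.0 and -2.0, zeroes the 0.5 entry
```

The threshold $\sqrt{2\lambda}$ is what makes the $L_0$ prox discontinuous; for $q\in(0,1)$ the prox generally has no such simple form, which is one reason a unified treatment over $q\in[0,1)$ is nontrivial.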
Submitted 11 August, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
Authors:
Dezhi Peng,
Chongyu Liu,
Yuliang Liu,
Lianwen **
Abstract:
Scene text removal (STR) aims to replace text strokes in natural scenes with visually coherent backgrounds. Recent STR approaches rely on iterative refinements or explicit text masks, resulting in high complexity and sensitivity to the accuracy of text localization. Moreover, most existing STR methods adopt convolutional architectures, while the potential of vision Transformers (ViTs) remains largely unexplored. In this paper, we propose a simple yet effective ViT-based text eraser, dubbed ViTEraser. Following a concise encoder-decoder framework, ViTEraser can easily incorporate various ViTs to enhance long-range modeling. Specifically, the encoder hierarchically maps the input image into the hidden space through ViT blocks and patch embedding layers, while the decoder gradually upsamples the hidden features to the text-erased image with ViT blocks and patch splitting layers. As ViTEraser implicitly integrates text localization and inpainting, we propose a novel end-to-end pretraining method, termed SegMIM, which focuses the encoder and decoder on the text box segmentation and masked image modeling tasks, respectively. Experimental results demonstrate that ViTEraser with SegMIM achieves state-of-the-art performance on STR by a substantial margin and exhibits strong generalization ability when extended to other tasks, e.g., tampered scene text detection. Furthermore, we comprehensively explore the architecture, pretraining, and scalability of the ViT-based encoder-decoder for STR, which provides deep insights into the application of ViTs to the STR field. Code is available at https://github.com/shannanyinxiang/ViTEraser.
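The encoder's patch embedding layers (downsampling) and the decoder's patch splitting layers (upsampling) can be viewed as a pair of inverse rearrangements between spatial resolution and channels. A minimal NumPy sketch of that rearrangement, assuming non-overlapping p x p patches and omitting the learned projections and ViT blocks a real encoder-decoder would use; it is not ViTEraser's code:

```python
import numpy as np


def patch_embed(x: np.ndarray, p: int) -> np.ndarray:
    """Fold non-overlapping p x p patches of an (H, W, C) map into channels.

    Returns an (H/p, W/p, p*p*C) array; a real patch embedding layer would
    follow this with a learned linear projection, omitted here.
    """
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)   # split rows and columns into patches
    x = x.transpose(0, 2, 1, 3, 4)           # (H/p, W/p, p, p, C)
    return x.reshape(h // p, w // p, p * p * c)


def patch_split(y: np.ndarray, p: int) -> np.ndarray:
    """Inverse rearrangement: (H/p, W/p, p*p*C) -> (H, W, C), a p-fold upsampling."""
    hp, wp, d = y.shape
    c = d // (p * p)
    y = y.reshape(hp, wp, p, p, c)
    y = y.transpose(0, 2, 1, 3, 4)           # (H/p, p, W/p, p, C)
    return y.reshape(hp * p, wp * p, c)


image = np.random.rand(8, 12, 3)
tokens = patch_embed(image, 4)
print(tokens.shape)                                # (2, 3, 48)
print(np.array_equal(patch_split(tokens, 4), image))  # True
```

Because the two operations are exact inverses, any information loss in such an encoder-decoder comes from the learned projections and blocks in between, not from the resolution changes themselves.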
Submitted 18 February, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Global gyrokinetic simulations of the impact of magnetic island on ion temperature gradient driven turbulence
Authors:
J. C. Li,
J. Q. Xu,
Y. R. Qu,
Z. Lin,
J. Q. Dong,
X. D. Peng,
J. Q. Li
Abstract:
The effect of island width on the multi-scale interactions between a magnetic island (MI) and ion temperature gradient (ITG) turbulence has been investigated using a global gyrokinetic approach. The coupling between the island and turbulence is found to strengthen as the MI width (w) increases. A vortex flow that is highly sensitive to the width of the magnetic island can be triggered, ultimately resulting in a potent shear flow and a consequent reduction in turbulent transport. Regardless of the island width, the shearing rate induced by the vortex flow is minimal at the O-point and maximal at both reconnection points of the island, i.e., the X-points. The zonal flow (ZF) amplitude depends nonmonotonically on the island width: the ZF is partially suppressed by medium-sized MIs but enhanced by large islands. A larger MI can severely damage the ITG mode structure, resulting in higher turbulent transport at the X-point and lower transport at the O-point. This phenomenon is less distinct for very small islands with w/a below 8% (where a is the minor radius): turbulence near the X-point is hardly affected, although it is still suppressed inside the island. Furthermore, the influence of different island sizes on the turbulent transport level is also discussed.
Submitted 8 June, 2023;
originally announced June 2023.
-
Brainformers: Trading Simplicity for Efficiency
Authors:
Yanqi Zhou,
Nan Du,
Yan** Huang,
Daiyi Peng,
Chang Lan,
Da Huang,
Siamak Shakeri,
David So,
Andrew Dai,
Yifeng Lu,
Zhifeng Chen,
Quoc Le,
Claire Cui,
James Laudon,
Jeff Dean
Abstract:
Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this insight, we develop a complex block, named Brainformer, that consists of a diverse set of layers such as sparsely gated feed-forward layers, dense feed-forward layers, attention layers, and various forms of layer normalization and activation functions. Brainformer consistently outperforms the state-of-the-art dense and sparse Transformers in terms of both quality and efficiency. A Brainformer model with 8 billion activated parameters per token demonstrates 2x faster training convergence and 5x faster step time compared to its GLaM counterpart. In downstream task evaluation, Brainformer also demonstrates a 3% higher SuperGLUE score with fine-tuning compared to GLaM with a similar number of activated parameters. Finally, Brainformer largely outperforms a Primer dense model derived with NAS with similar computation per token on few-shot evaluations.
Submitted 25 April, 2024; v1 submitted 29 May, 2023;
originally announced June 2023.
-
An Empirical Study on the Language Modal in Visual Question Answering
Authors:
Daowan Peng,
Wei Wei,
Xian-Ling Mao,
Yuanyuan Fu,
Dangyang Chen
Abstract:
Generalization beyond in-domain experience to out-of-distribution data is of paramount significance in AI. Recently, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partially due to language prior bias, which, however, hinders their generalization ability in practice. This paper attempts to provide new insights into the influence of the language modality on VQA performance through an empirical study. To this end, we conducted a series of experiments on six models. The results reveal that 1) apart from the prior bias caused by question types, postfix-related bias also notably contributes to inducing biases, and 2) training VQA models with word-sequence-related variant questions improves performance on the out-of-distribution benchmark, with LXMERT even achieving a 10-point gain without any debiasing methods. We examined the underlying reasons behind these experimental results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrate the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark, VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.
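The prior bias caused by question types can be made concrete with a classic diagnostic: answer every question with the most frequent training answer for its question type, without looking at the image. A minimal Python sketch, assuming (hypothetically) that the question type is the first two words of the question; it is neither one of the paper's six models nor its proposed method:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

QA = Tuple[str, str]


def question_type(question: str, n_words: int = 2) -> str:
    # Hypothetical proxy for the question type: its first n lowercase words.
    return " ".join(question.lower().rstrip("?").split()[:n_words])


def fit_prior(train: List[QA]) -> Dict[str, str]:
    """Map each question type to its most frequent training answer."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for question, answer in train:
        counts[question_type(question)][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def prior_accuracy(prior: Dict[str, str], data: List[QA]) -> float:
    """Accuracy of answering from the question type alone (the image is ignored)."""
    hits = sum(prior.get(question_type(q)) == a for q, a in data)
    return hits / len(data)


train = [
    ("What color is the banana?", "yellow"),
    ("What color is the taxi?", "yellow"),
    ("What color is the sky?", "blue"),
    ("Is this a dog?", "yes"),
    ("Is this a cat?", "yes"),
]
prior = fit_prior(train)
print(prior)  # {'what color': 'yellow', 'is this': 'yes'}
test = [("What color is the lemon?", "yellow"), ("Is this a horse?", "no")]
print(prior_accuracy(prior, test))  # 0.5
```

A model that scores well in-domain but close to such a blind baseline on a distribution-shifted split (as VQA-CPv2 is designed to expose) is likely leaning on language priors.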
Submitted 4 September, 2023; v1 submitted 17 May, 2023;
originally announced May 2023.
-
LayerNAS: Neural Architecture Search in Polynomial Complexity
Authors:
Yicheng Fan,
Dana Alon,
**gyue Shen,
Daiyi Peng,
Keshav Kumar,
Yun Long,
Xin Wang,
Fotis Iliopoulos,
Da-Cheng Juan,
Erik Vee
Abstract:
Neural Architecture Search (NAS) has become a popular method for discovering effective model architectures, especially for target hardware. As such, NAS methods that find optimal architectures under constraints are essential. We propose LayerNAS to address the challenge of multi-objective NAS by transforming it into a combinatorial optimization problem, which effectively constrains the search complexity to be polynomial.
For a model architecture with $L$ layers, we perform a layerwise search for each layer, selecting from a set of search options $\mathbb{S}$. LayerNAS groups model candidates based on one objective, such as model size or latency, and searches for the optimal model based on another objective, thereby splitting the cost and reward elements of the search. This approach limits the search complexity to $O(H \cdot |\mathbb{S}| \cdot L)$, where $H$ is a constant set in LayerNAS.
Our experiments show that LayerNAS consistently discovers superior models across a variety of search spaces in comparison to strong baselines, including search spaces derived from NATS-Bench, MobileNetV2, and MobileNetV3.
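The complexity bound above follows from keeping at most $H$ candidates per layer, each expanded with $|\mathbb{S}|$ options, over $L$ layers. The Python sketch below illustrates that idea under simplifying assumptions that are not from the paper: candidates are grouped into $H$ equal-width cost buckets, only the best-reward candidate in each bucket is kept, and `cost` and `reward` can be evaluated on a partial (prefix) architecture. The function name `layerwise_search` is hypothetical:

```python
from typing import Callable, Dict, List, Tuple

Arch = Tuple[int, ...]  # chosen option index for each layer searched so far


def layerwise_search(num_layers: int, num_options: int,
                     cost: Callable[[Arch], float],
                     reward: Callable[[Arch], float],
                     max_cost: float, H: int) -> Tuple[Arch, int]:
    """Keep the best-reward candidate in each of H cost buckets, one layer at a time."""
    candidates: List[Arch] = [()]
    evaluations = 0
    for _ in range(num_layers):
        best: Dict[int, Tuple[float, Arch]] = {}
        for arch in candidates:                 # at most H candidates survive per layer
            for opt in range(num_options):      # |S| options for the next layer
                child = arch + (opt,)
                c = cost(child)
                if c > max_cost:                # respect the cost constraint
                    continue
                bucket = min(int(c / max_cost * H), H - 1)
                r = reward(child)
                evaluations += 1
                if bucket not in best or r > best[bucket][0]:
                    best[bucket] = (r, child)
        candidates = [arch for _, arch in best.values()]
    return max(candidates, key=reward), evaluations


option_cost = [1, 2, 3]  # hypothetical per-layer cost of each option


def toy_cost(arch: Arch) -> float:
    return sum(option_cost[o] for o in arch)


best_arch, n_eval = layerwise_search(3, len(option_cost), toy_cost, toy_cost,
                                     max_cost=9, H=4)
print(best_arch, n_eval)  # (2, 2, 2) 15
```

Here the reward is simply the cost (bigger is better within the budget), so the answer is obvious; the point is that at most $H \cdot |\mathbb{S}| \cdot L = 36$ evaluations are ever needed, however many layers are added, instead of the $|\mathbb{S}|^L$ combinations of exhaustive search.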
Submitted 22 April, 2023;
originally announced April 2023.
-
Equilibrium distribution and diffusion of mixed hydrogen-methane gas in gravity field
Authors:
Shiyao Peng,
Qiao He,
Ducheng Peng,
Xin Ouyang,
Xiaorui Zhang,
Chong Chai,
Lianlai Zhang,
Xu Sun,
Huiqiu Deng,
Wangyu Hu,
Jie Hou
Abstract:
Repurposing existing natural gas pipelines is a promising solution for large-scale transportation of mixed hydrogen-methane gas. However, it remains debatable whether gravitational stratification can notably affect the hydrogen partial pressure in the gas mixture. To address this issue, we combined molecular dynamics simulation with thermodynamic and diffusion theories. Our study systematically examined the equilibrium distribution of hydrogen-methane mixtures in gravity fields. We demonstrated that the partial pressures of both gases decrease with altitude, with hydrogen decreasing more slowly owing to its smaller molar mass. As a result, the volume fraction of hydrogen is highest at the top end of pipes. Stratification is favored by low temperatures and large altitude drops; however, notable stratification occurs only at extremely large altitude drops and is generally negligible even at a drop of 1500 m. Furthermore, we showed that the diffusion time required to reach the equilibrium distribution is proportional to the gas pressure and to the square of the pipeline height; for a 1500 m pipeline at 1 bar, this is approximately 300 years. Therefore, temporary interruptions in pipeline gas transportation will not cause visible stratification. Our work clarifies the effect of gravity on hydrogen-methane gas mixtures and provides quantitative insights for assessing the stratification of gas mixtures in pipelines.
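The qualitative trend (hydrogen thinning out more slowly with height, so its fraction is highest at the top) follows from applying the isothermal barometric law to each component separately. A minimal Python sketch, assuming ideal gases, a uniform temperature, and full equilibrium; it ignores the molecular dynamics and diffusion modelling of the paper. The second helper only rescales the reported 300-year figure using the stated proportionality to pressure and to the square of height:

```python
import math

R = 8.314462618   # molar gas constant, J/(mol K)
G = 9.80665       # standard gravity, m/s^2
M_H2 = 2.016e-3   # molar mass of H2, kg/mol
M_CH4 = 16.04e-3  # molar mass of CH4, kg/mol


def partial_pressure(p_bottom: float, molar_mass: float, height: float, T: float) -> float:
    """Isothermal ideal-gas barometric law: p(h) = p(0) * exp(-M g h / (R T))."""
    return p_bottom * math.exp(-molar_mass * G * height / (R * T))


def h2_fraction_at_top(x_h2_bottom: float, height: float, T: float) -> float:
    """H2 mole (= volume, for ideal gases) fraction at the top of the column.

    Partial pressures are expressed in units of the total pressure at the bottom,
    so the result does not depend on the absolute pressure.
    """
    p_h2 = partial_pressure(x_h2_bottom, M_H2, height, T)
    p_ch4 = partial_pressure(1.0 - x_h2_bottom, M_CH4, height, T)
    return p_h2 / (p_h2 + p_ch4)


def scaled_diffusion_time(t_ref: float, p_ref: float, h_ref: float,
                          p: float, h: float) -> float:
    """Rescale a reference equilibration time using t proportional to p * h**2."""
    return t_ref * (p / p_ref) * (h / h_ref) ** 2


print(h2_fraction_at_top(0.10, 1500.0, 300.0))                # slightly above 0.10
print(scaled_diffusion_time(300.0, 1.0, 1500.0, 1.0, 750.0))  # 75.0
```

Halving the height cuts the equilibration time by a factor of four, which is why only very tall columns held static for very long times could approach the equilibrium profile.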
Submitted 8 April, 2023;
originally announced April 2023.
-
Is a Compact Group with All Dense Subgroups Separable Metrizable?
Authors:
Dekui Peng
Abstract:
A compact group with {\bf all dense subspaces} separable is metrizable. Inspired by this, we conjecture that a compact group with {\bf all dense subgroups} separable is metrizable. Positive answers are given here for two elementary cases, namely, when the compact group is additionally assumed to be abelian or connected. However, a locally compact abelian group with all dense subgroups separable need not be metrizable, even when it has an open compact subgroup. At the end of this note, it is shown that a locally compact group with {\bf all subgroups} separable is metrizable. Our arguments are formulated in a more general form that is not restricted to the countable case.
Submitted 11 March, 2023;
originally announced March 2023.
-
Correspondence-Free Domain Alignment for Unsupervised Cross-Domain Image Retrieval
Authors:
Xu Wang,
Dezhong Peng,
Ming Yan,
Peng Hu
Abstract:
Cross-domain image retrieval aims to retrieve images across different domains in order to uncover cross-domain classificatory or correspondence relationships. This paper studies a less-explored problem, unsupervised cross-domain image retrieval, under the following practical assumptions: (i) no correspondence relationship, and (ii) no category annotations. Aligning and bridging distinct domains without cross-domain correspondence is challenging. To tackle this challenge, we present a novel Correspondence-free Domain Alignment (CoDA) method that effectively eliminates the cross-domain gap through In-domain Self-matching Supervision (ISS) and Cross-domain Classifier Alignment (CCA). Specifically, ISS encapsulates discriminative information into the latent common space through a novel self-matching supervision mechanism. To alleviate the cross-domain discrepancy, CCA aligns distinct domain-specific classifiers. With ISS and CCA, our method encodes the discrimination into a domain-invariant embedding space for unsupervised cross-domain image retrieval. To verify the effectiveness of the proposed method, extensive experiments are conducted on four benchmark datasets against six state-of-the-art methods.
Submitted 23 March, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
-
Rover: An online Spark SQL tuning service via generalized transfer learning
Authors:
Yu Shen,
Xinyuyang Ren,
Yupeng Lu,
Huaijun Jiang,
Huanyong Xu,
Di Peng,
Yang Li,
Wentao Zhang,
Bin Cui
Abstract:
Distributed data analytic engines like Spark are common choices for processing massive data in industry. However, the performance of Spark SQL highly depends on the choice of configurations, and the optimal configurations vary with the executed workloads. Among various alternatives for Spark SQL tuning, Bayesian optimization (BO) is a popular framework that finds near-optimal configurations given a sufficient budget, but it suffers from the re-optimization issue and is not practical in real production. When applying transfer learning to accelerate the tuning process, we notice two domain-specific challenges: 1) most previous work focuses on transferring tuning history, while expert knowledge from Spark engineers has great potential to improve tuning performance but has not been well studied so far; 2) history tasks should be utilized carefully, because using dissimilar ones leads to deteriorated performance in production. In this paper, we present Rover, a deployed online Spark SQL tuning service for efficient and safe search on industrial workloads. To address the challenges, we propose generalized transfer learning to boost tuning performance based on external knowledge, including expert-assisted Bayesian optimization and controlled history transfer. Experiments on public benchmarks and real-world tasks show the superiority of Rover over competitive baselines. Notably, Rover saves an average of 50.1% of the memory cost on 12k real-world Spark SQL tasks in 20 iterations, among which 76.2% of the tasks achieve a significant memory reduction of over 60%.
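The second challenge above (transferring only from sufficiently similar history tasks) can be illustrated with a simple similarity filter over past tasks that seeds a tuner with their best-known configurations. A minimal Python sketch; the workload feature vectors, cosine similarity, threshold, and the function name `warm_start_configs` are all hypothetical choices, and this is not Rover's controlled history transfer. `spark.executor.memory` is a real Spark setting, used here only as an example key:

```python
import math
from typing import Dict, List, Sequence, Tuple

Config = Dict[str, str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def warm_start_configs(target: Sequence[float],
                       history: List[Tuple[Sequence[float], Config]],
                       threshold: float = 0.9, k: int = 3) -> List[Config]:
    """Best-known configs of the k most similar past tasks at or above `threshold`."""
    scored = [(cosine(target, feats), cfg) for feats, cfg in history]
    similar = [(s, cfg) for s, cfg in scored if s >= threshold]  # drop dissimilar tasks
    similar.sort(key=lambda pair: pair[0], reverse=True)
    return [cfg for _, cfg in similar[:k]]


history = [
    ([1.0, 0.0], {"spark.executor.memory": "4g"}),
    ([0.0, 1.0], {"spark.executor.memory": "8g"}),
    ([1.0, 0.1], {"spark.executor.memory": "6g"}),
]
print(warm_start_configs([1.0, 0.0], history))
# [{'spark.executor.memory': '4g'}, {'spark.executor.memory': '6g'}]
```

The threshold is the "safety" knob: raising it transfers from fewer tasks but lowers the risk that a dissimilar workload steers the search toward a poor region.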
Submitted 29 May, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
PyGlove: Efficiently Exchanging ML Ideas as Code
Authors:
Daiyi Peng,
Xuanyi Dong,
Esteban Real,
Yifeng Lu,
Quoc V. Le
Abstract:
The increasing complexity and scale of machine learning (ML) have led to the need for more efficient collaboration among multiple teams. For example, when a research team invents a new architecture like "ResNet," it is desirable for multiple engineering teams to adopt it. However, the effort required for each team to study and understand the invention does not scale well with the number of teams or inventions. In this paper, we present an extension of our PyGlove library to easily and scalably share ML ideas. PyGlove represents ideas as symbolic rule-based patches, enabling researchers to write down the rules for models they have not seen. For example, an inventor can write rules that will "add skip-connections." This permits a network effect among teams: at once, any team can issue patches to all other teams. Such a network effect allows users to quickly surmount the cost of adopting PyGlove by writing less code more quickly, providing a benefit that scales with time. We describe the new paradigm of organizing ML through symbolic patches and compare it to existing approaches. We also perform a case study of a large codebase where PyGlove led to an 80% reduction in the number of lines of code.
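The idea of a rule that will "add skip-connections" to models its author has never seen can be illustrated with a rule applied to every node of a symbolic model tree. The sketch below is plain Python with hypothetical names (`Layer`, `apply_patch`, `add_skip_connections`); it does not use PyGlove's actual API:

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Layer:
    """A node in a symbolic (data-only) model description."""
    kind: str
    children: Tuple["Layer", ...] = ()


Rule = Callable[[Layer], Layer]


def apply_patch(node: Layer, rule: Rule) -> Layer:
    """Apply `rule` bottom-up to every node, returning a new tree."""
    patched_children = tuple(apply_patch(child, rule) for child in node.children)
    return rule(replace(node, children=patched_children))


def add_skip_connections(node: Layer) -> Layer:
    """Rule: wrap every 'block' node in a 'residual' node."""
    if node.kind == "block":
        return Layer("residual", (node,))
    return node


model = Layer("sequential", (
    Layer("block", (Layer("conv"),)),
    Layer("block", (Layer("conv"),)),
    Layer("dense"),
))
patched = apply_patch(model, add_skip_connections)
print([child.kind for child in patched.children])  # ['residual', 'residual', 'dense']
```

Because the rule matches on structure rather than on a specific model, the same patch applies to any team's model that uses the `block` kind; the original tree is left unchanged since the nodes are immutable.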
Submitted 3 February, 2023;
originally announced February 2023.
-
Incomplete Multi-view Clustering via Prototype-based Imputation
Authors:
Haobin Li,
Yunfan Li,
Mouxing Yang,
Peng Hu,
Dezhong Peng,
Xi Peng
Abstract:
In this paper, we study how to achieve two characteristics that are highly desirable in incomplete multi-view clustering (IMvC): i) instance commonality, meaning that within-cluster instances should share a common pattern, and ii) view versatility, meaning that cross-view samples should own view-specific patterns. To this end, we design a novel dual-stream model that employs a dual attention layer and a dual contrastive learning loss to learn view-specific prototypes and model the sample-prototype relationship. When a view is missing, our model recovers the data using the prototypes in the missing view and the sample-prototype relationship inherited from the observed view. Thanks to our dual-stream model, both cluster- and view-specific information can be captured, so that instance commonality and view versatility are preserved to facilitate IMvC. Extensive experiments on six challenging benchmarks demonstrate the superiority of our method compared with 11 approaches. The code will be released.
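The recovery step described above can be sketched as follows: compute how an observed-view sample relates to the observed-view prototypes, then reuse those weights over the missing view's prototypes. This NumPy sketch assumes softmax attention over dot products and prototypes that are already aligned across views (row i of both prototype matrices belongs to the same cluster); it omits the dual attention layer and contrastive losses, and the function names are hypothetical:

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def impute_missing_view(x_obs: np.ndarray, protos_obs: np.ndarray,
                        protos_miss: np.ndarray, temperature: float = 1.0):
    """Recover missing-view features from prototypes.

    x_obs:       (n, d_obs) features of the observed view
    protos_obs:  (k, d_obs) prototypes of the observed view
    protos_miss: (k, d_mis) prototypes of the missing view, aligned with protos_obs
    """
    # (n, k) sample-prototype relationship, learned on the observed view
    weights = softmax(x_obs @ protos_obs.T / temperature, axis=1)
    # Inherit the relationship and mix the missing view's prototypes
    return weights @ protos_miss, weights


x_obs = np.array([[10.0, 0.0]])
protos_obs = np.eye(2)
protos_miss = np.array([[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]])
recovered, w = impute_missing_view(x_obs, protos_obs, protos_miss)
print(recovered.round(3))  # close to the first missing-view prototype
```

A sample that clearly belongs to cluster 0 in the observed view is imputed near cluster 0's prototype in the missing view, which is how the sketch keeps cluster information (commonality) while filling in view-specific features.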
Submitted 29 January, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.
-
Power law hopping of single particles in one-dimensional non-Hermitian quasicrystals
Authors:
Dechi Peng,
Shujie Cheng,
Gao Xianlong
Abstract:
In this paper, a non-Hermitian Aubry-André-Harper model with power-law hoppings ($1/s^{a}$) and quasiperiodic parameter $β$ is studied, where $a$ is the power-law index, $s$ is the hopping distance, and $β$ is a member of the metallic mean family. We find that under a weak non-Hermitian effect, the $P_{\ell=1,2,3,4}$ regimes are preserved, in which the number of ergodic eigenstates is $β$-dependent, namely $β^{\ell}L$ ($L$ is the system size), similar to the Hermitian case. However, the $P_{\ell}$ regimes are destroyed by a strong non-Hermitian effect. Moreover, by analyzing the fractal dimension, we find that the power-law index $a$ gives rise to two types of edges in the single-particle spectrum, i.e., an ergodic-to-multifractal edge for the long-range hopping case ($a<1$) and an ergodic-to-localized edge for the short-range hopping case ($a>1$). Meanwhile, the existence of these two types of edges is found to be robust against the non-Hermitian effect. By employing the Simon-Spencer theory, we analyze the absence of localized states for $a<1$. For the short-range hopping case, using Avila's global theory and the Sarnak method, we consider a specific example with $a=2$ to reveal the presence of the intermediate phase and to analytically locate the intermediate regime and the ergodic-to-multifractal edge, which are consistent with the numerical results.
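The fractal-dimension analysis can be sketched numerically. The NumPy code below builds a Hamiltonian with hopping $1/s^{a}$ and computes $D_2 = -\ln(\mathrm{IPR})/\ln L$ for each right eigenvector, where values near 1 indicate ergodic states and values near 0 indicate localized ones. The specific non-Hermitian term (an imaginary phase $ih$ inside the quasiperiodic potential), the open boundaries, and the parameter values are assumptions for illustration, not necessarily the paper's model:

```python
import numpy as np


def metallic_mean(m: int) -> float:
    """beta_m = (sqrt(m^2 + 4) - m) / 2: golden mean for m = 1, silver for m = 2."""
    return (np.sqrt(m * m + 4) - m) / 2


def hamiltonian(L: int, a: float, V: float, h: float, beta: float,
                J: float = 1.0) -> np.ndarray:
    """Open chain, hopping J / s**a, on-site V cos(2 pi beta j + i h) (assumed form)."""
    j = np.arange(L)
    s = np.abs(j[:, None] - j[None, :]).astype(float)  # hopping distance
    H = np.zeros((L, L), dtype=complex)
    off = s > 0
    H[off] = J / s[off] ** a
    # The imaginary phase i*h makes the potential complex, hence H non-Hermitian.
    H += np.diag(V * np.cos(2 * np.pi * beta * j + 1j * h))
    return H


def fractal_dimensions(H: np.ndarray) -> np.ndarray:
    """D2 = -ln(IPR) / ln(L) for each right eigenvector of H."""
    _, vecs = np.linalg.eig(H)
    vecs = vecs / np.linalg.norm(vecs, axis=0, keepdims=True)
    ipr = np.sum(np.abs(vecs) ** 4, axis=0)
    return -np.log(ipr) / np.log(H.shape[0])


D = fractal_dimensions(hamiltonian(L=89, a=2.0, V=0.5, h=0.1, beta=metallic_mean(1)))
print(D.min(), D.max())
```

Because each normalized eigenvector satisfies $1/L \le \mathrm{IPR} \le 1$, every $D_2$ lies in $[0, 1]$; in practice one sweeps $V$, $a$, and $L$ and looks at how $D_2$ changes across the ordered spectrum to locate mobility edges.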
Submitted 21 January, 2023;
originally announced January 2023.
-
SPTS v2: Single-Point Scene Text Spotting
Authors:
Yuliang Liu,
Jiaxin Zhang,
Dezhi Peng,
Mingxin Huang,
Xinyu Wang,
**gqun Tang,
Can Huang,
Dahua Lin,
Chunhua Shen,
Xiang Bai,
Lianwen **
Abstract:
End-to-end scene text spotting has made significant progress due to the intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and polygons as a prerequisite, which are much more expensive to obtain than single-point annotations. Our new framework, SPTS v2, allows us to train high-performing text-spotting models using single-point annotations. SPTS v2 retains the advantage of the auto-regressive Transformer with an Instance Assignment Decoder (IAD), which sequentially predicts the center points of all text instances within the same sequence, while a Parallel Recognition Decoder (PRD) recognizes text in parallel, significantly reducing the required sequence length. These two decoders share the same parameters and are interactively connected through a simple but effective information transmission process that passes gradients and information. Comprehensive experiments on various existing benchmark datasets demonstrate that SPTS v2 outperforms previous state-of-the-art single-point text spotters with fewer parameters while achieving 19$\times$ faster inference speed. Within the context of our SPTS v2 framework, our experiments suggest a potential preference for the single-point representation in scene text spotting compared with other representations. Such an attempt provides a significant opportunity for scene text spotting applications beyond the realms of existing paradigms. Code is available at: https://github.com/Yuliang-Liu/SPTSv2.
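The sequence that an auto-regressive decoder predicts for single-point annotations can be sketched as quantized center coordinates followed by an end token. A minimal Python sketch, assuming a Pix2Seq-style discretization into `n_bins` bins with the end-of-sequence token id equal to `n_bins`; these choices are hypothetical and the recognition decoder is not modelled:

```python
from typing import List, Sequence, Tuple


def quantize(v: float, size: float, n_bins: int) -> int:
    """Map a coordinate in [0, size] to an integer bin in [0, n_bins - 1]."""
    return min(int(v * n_bins / size), n_bins - 1)


def points_to_sequence(points: Sequence[Tuple[float, float]], width: float,
                       height: float, n_bins: int = 1000) -> List[int]:
    """Encode center points as [x1, y1, x2, y2, ..., EOS]; EOS uses token id n_bins."""
    seq: List[int] = []
    for x, y in points:
        seq += [quantize(x, width, n_bins), quantize(y, height, n_bins)]
    return seq + [n_bins]


def sequence_to_points(seq: Sequence[int], width: float, height: float,
                       n_bins: int = 1000) -> List[Tuple[float, float]]:
    """Decode tokens back to bin-center coordinates, stopping at EOS."""
    points = []
    for i in range(0, len(seq) - 1, 2):
        if seq[i] == n_bins or seq[i + 1] == n_bins:
            break
        points.append(((seq[i] + 0.5) * width / n_bins,
                       (seq[i + 1] + 0.5) * height / n_bins))
    return points


seq = points_to_sequence([(50, 20), (100, 80)], width=200, height=100, n_bins=100)
print(seq)                                                  # [25, 20, 50, 80, 100]
print(sequence_to_points(seq, 200, 100, n_bins=100))        # [(51.0, 20.5), (101.0, 80.5)]
```

Each text instance costs only two location tokens here, which is one intuition for why a single-point representation keeps detection sequences short compared with boxes or polygons.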
Submitted 2 September, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
Densities and Weights of Quotients of Precompact Abelian Groups
Authors:
Dekui Peng
Abstract:
The topological group version of the celebrated Banach-Mazur problem asks whether every infinite topological group has a non-trivial separable quotient group. It is known that compact groups have infinite separable metrizable quotient groups. However, as dense subgroups of compact groups, precompact groups may admit no non-trivial metrizable quotient groups, and hence also no non-trivial separable quotient groups. In this paper, we study the least cardinal $\mathfrak{m}$ (resp. $\mathfrak{n}$) such that every infinite precompact abelian group admits a quotient group with density character $\leq \mathfrak{m}$ (resp. with weight $\leq \mathfrak{n}$). It is shown that if $2^{<\mathfrak{c}}=\mathfrak{c}$, then $\mathfrak{m}=\mathfrak{c}$ and $\mathfrak{n}=2^\mathfrak{c}$.
A more general problem is to describe the set $QW(G)$ of all possible weights of infinite proper quotient groups of a precompact abelian group $G$. We prove that for every subset $E$ of the interval $[\omega, \mathfrak{c}]$, there exists a precompact abelian group $G$ with $QW(G)=E$. If $\omega\in E$, then $G$ can be chosen to be pseudocompact.
In an appendix, we give an example to show that a non-totally disconnected locally compact group may admit no separable quotient groups. This answers an open problem posed in \cite{LMT}.
Submitted 20 July, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
Twin Contrastive Learning for Online Clustering
Authors:
Yunfan Li,
Mouxing Yang,
Dezhong Peng,
Taihao Li,
Jiantao Huang,
Xi Peng
Abstract:
This paper proposes to perform online clustering by conducting twin contrastive learning (TCL) at the instance and cluster levels. Specifically, we find that when the data is projected into a feature space whose dimensionality equals the target number of clusters, the rows and columns of its feature matrix correspond to the instance and cluster representations, respectively. Based on this observation, for a given dataset, the proposed TCL first constructs positive and negative pairs through data augmentations. Thereafter, in the row and column space of the feature matrix, instance- and cluster-level contrastive learning are respectively conducted by pulling together positive pairs while pushing apart the negatives. To alleviate the influence of intrinsic false-negative pairs and rectify cluster assignments, we adopt a confidence-based criterion to select pseudo-labels for boosting both the instance- and cluster-level contrastive learning, which further improves clustering performance. Besides the elegant idea of twin contrastive learning, another advantage of TCL is that it can independently predict the cluster assignment for each instance, and thus effortlessly fits online scenarios. Extensive experiments on six widely used image and text benchmarks demonstrate the effectiveness of TCL. The code will be released on GitHub.
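The row/column observation can be illustrated with a minimal pure-Python sketch. It assumes an N x K matrix of soft cluster assignments for each of two augmented views and applies an NT-Xent-style contrastive loss (temperature-scaled cosine similarities) once to the rows and once to the columns. The function names and temperatures are hypothetical, and the sketch omits confidence-based pseudo-labelling and any extra regularization a full implementation might need (for example, against collapsing all instances into one cluster).

```python
import math


def _unit(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nt_xent(view_a, view_b, tau):
    """Contrastive loss where view_a[i] and view_b[i] form the positive pair and
    every other vector in either view acts as a negative."""
    z = [_unit(v) for v in view_a + view_b]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of i's positive partner in the other view
        logits = [_dot(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        total += -_dot(z[i], z[j]) / tau + math.log(sum(math.exp(s) for s in logits))
    return total / (2 * n)


def columns(matrix):
    return [list(col) for col in zip(*matrix)]


def twin_contrastive_loss(probs_a, probs_b, tau_instance=0.5, tau_cluster=1.0):
    """probs_*: N x K soft assignments for two augmentations of the same batch.
    Rows serve as instance representations, columns as cluster representations."""
    instance_loss = nt_xent(probs_a, probs_b, tau_instance)
    cluster_loss = nt_xent(columns(probs_a), columns(probs_b), tau_cluster)
    return instance_loss + cluster_loss


def assign_online(probs):
    """Each row is assigned on its own, which is what makes online use possible."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]
```

Because `assign_online` looks at one row at a time, a new instance can be clustered as soon as it arrives, without recomputing anything over the rest of the dataset.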
Submitted 20 October, 2022;
originally announced October 2022.
-
Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization
Authors:
Danni Peng,
Sinno Jialin Pan
Abstract:
To address the distribution shifts between training and test data, domain generalization (DG) leverages multiple source domains to learn a model that generalizes well to unseen domains. However, existing DG methods generally suffer from overfitting to the source domains, partly due to the limited coverage of the expected region in feature space. Motivated by this, we propose to perform mixup with data interpolation and extrapolation to cover the potential unseen regions. To prevent the detrimental effects of unconstrained extrapolation, we carefully design a policy to generate the instance weights, named Flatness-aware Gradient-based Mixup (FGMix). The policy employs a gradient-based similarity to assign greater weights to instances that carry more invariant information, and learns the similarity function towards flatter minima for better generalization. On the DomainBed benchmark, we validate the efficacy of various designs of FGMix and demonstrate its superiority over other DG algorithms.
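The difference between interpolation and extrapolation in mixup comes down to the sign of the mixing weights. The sketch below assumes the weights are given and sum to one: all-nonnegative weights give a convex combination inside the span of the instances, while a negative weight pushes the mixed sample outside it. It does not reproduce FGMix's learned, gradient-based weight policy; the function names are hypothetical.

```python
def affine_mix(instances, weights, tol=1e-9):
    """Combine feature vectors with weights that sum to one.
    Weights in [0, 1] interpolate (convex combination); any negative weight
    extrapolates beyond the region spanned by the instances."""
    if len(instances) != len(weights):
        raise ValueError("one weight per instance is required")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    dim = len(instances[0])
    return [sum(w * x[d] for w, x in zip(weights, instances)) for d in range(dim)]


def is_extrapolation(weights):
    return any(w < 0 for w in weights)


if __name__ == "__main__":
    pair = [[0.0, 0.0], [2.0, 2.0]]
    print(affine_mix(pair, [0.5, 0.5]))   # midpoint of the two instances
    print(affine_mix(pair, [1.5, -0.5]))  # lies beyond the first instance
```

Unconstrained negative weights can place samples arbitrarily far from the data, which is why a policy that controls the weights matters.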
Submitted 29 September, 2022;
originally announced September 2022.
-
Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric
Authors:
Pengxin Zeng,
Yunfan Li,
Peng Hu,
Dezhong Peng,
Jiancheng Lv,
Xi Peng
Abstract:
Fair clustering aims to divide data into distinct clusters while preventing sensitive attributes (\textit{e.g.}, gender, race, RNA sequencing technique) from dominating the clustering. Although a number of works have been conducted and achieved huge success recently, most of them are heuristic, and a unified theory for algorithm design is lacking. In this work, we fill this gap by developing a mutual information theory for deep fair clustering and accordingly designing a novel algorithm, dubbed FCMI. In brief, through maximizing and minimizing mutual information, FCMI is designed to achieve four characteristics highly expected by deep fair clustering, \textit{i.e.}, compact, balanced, and fair clusters, as well as informative features. Besides the contributions to theory and algorithm, another contribution of this work is a novel fair clustering metric built upon information theory. Unlike existing evaluation metrics, our metric measures clustering quality and fairness as a whole rather than separately. To verify the effectiveness of the proposed FCMI, we conduct experiments on six benchmarks, including a single-cell RNA-seq atlas, and compare it with 11 state-of-the-art methods in terms of five metrics. The code can be accessed from \url{https://pengxi.me}.
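The fairness side of "minimizing mutual information" can be made concrete by measuring $I(C;G)$ between cluster assignments $C$ and a sensitive attribute $G$: it is zero exactly when group membership carries no information about the cluster. The sketch below is the textbook plug-in estimate from a contingency table of counts; it is not FCMI's differentiable objective or the paper's proposed metric.

```python
import math


def mutual_information(counts):
    """I(C; G) in nats from a contingency table counts[c][g] giving the number
    of samples in cluster c that belong to sensitive-attribute group g."""
    total = sum(sum(row) for row in counts)
    p_c = [sum(row) / total for row in counts]
    p_g = [sum(col) / total for col in zip(*counts)]
    mi = 0.0
    for c, row in enumerate(counts):
        for g, n in enumerate(row):
            if n:  # empty cells contribute nothing
                p_cg = n / total
                mi += p_cg * math.log(p_cg / (p_c[c] * p_g[g]))
    return mi


if __name__ == "__main__":
    print(mutual_information([[10, 10], [5, 5]]))  # groups spread evenly: fair
    print(mutual_information([[10, 0], [0, 10]]))  # each cluster is one group
```

In the second table, the cluster fully reveals the group, so $I(C;G)=\ln 2$, the maximum for two equally sized groups.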
Submitted 20 April, 2023; v1 submitted 25 September, 2022;
originally announced September 2022.
-
Adaptive Meta-learner via Gradient Similarity for Few-shot Text Classification
Authors:
Tianyi Lei,
Honghui Hu,
Qiaoyang Luo,
Dezhong Peng,
Xu Wang
Abstract:
Few-shot text classification aims to classify text in the few-shot scenario. Most previous methods adopt optimization-based meta-learning to learn the task distribution. However, because they neglect both the mismatch between the small number of samples and complex models and the distinction between useful and useless task features, these methods suffer from overfitting. To address this issue, we propose a novel Adaptive Meta-learner via Gradient Similarity (AMGS) method to improve the model's ability to generalize to new tasks. Specifically, the proposed AMGS alleviates overfitting in two ways: (i) acquiring the latent semantic representation of samples and improving model generalization through a self-supervised auxiliary task in the inner loop, and (ii) leveraging the adaptive meta-learner via gradient similarity to add constraints on the gradient obtained by the base-learner in the outer loop. Moreover, we make a systematic analysis of the influence of regularization on the entire framework. Experimental results on several benchmarks demonstrate that the proposed AMGS consistently improves few-shot text classification performance compared with state-of-the-art optimization-based meta-learning approaches.
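A gradient-similarity constraint starts from the cosine similarity between two gradient vectors. The sketch below shows one hypothetical way such a constraint could act: it scales the base-learner gradient by its agreement with a reference gradient (for example, one computed on another batch) and drops it when the two conflict. Both the reference gradient and the scaling rule are assumptions made for illustration; this is not the AMGS meta-learner.

```python
import math


def cosine_similarity(g1, g2):
    n1 = math.sqrt(sum(x * x for x in g1))
    n2 = math.sqrt(sum(x * x for x in g2))
    if n1 == 0 or n2 == 0:
        return 0.0  # undefined direction: treat as no agreement
    return sum(a * b for a, b in zip(g1, g2)) / (n1 * n2)


def constrained_outer_gradient(base_grad, reference_grad):
    """Hypothetical rule: keep the base-learner gradient in proportion to its
    agreement with a reference gradient, and zero it out when they conflict."""
    scale = max(0.0, cosine_similarity(base_grad, reference_grad))
    return [scale * g for g in base_grad]


if __name__ == "__main__":
    print(constrained_outer_gradient([1.0, 2.0], [2.0, 4.0]))    # aligned: kept
    print(constrained_outer_gradient([1.0, 2.0], [-1.0, -2.0]))  # opposed: zeroed
```

Suppressing updates that disagree with the reference direction is one generic way to limit overfitting to a handful of support samples.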
Submitted 28 July, 2023; v1 submitted 10 September, 2022;
originally announced September 2022.
-
FOLIO: Natural Language Reasoning with First-Order Logic
Authors:
Simeng Han,
Hailey Schoelkopf,
Yilun Zhao,
Zhenting Qi,
Martin Riddell,
Wenfei Zhou,
James Coady,
David Peng,
Yujie Qiao,
Luke Benson,
Lucy Sun,
Alex Wardle-Solano,
Hannah Szabo,
Ekaterina Zubova,
Matthew Burtell,
Jonathan Fan,
Yixin Liu,
Brian Wong,
Malcolm Sailor,
Ansong Ni,
Linyong Nan,
Jungo Kasai,
Tao Yu,
Rui Zhang,
Alexander R. Fabbri
, et al. (10 additional authors not shown)
Abstract:
Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate for measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason about the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models. For both NL reasoning and NL-FOL translation, we benchmark multiple state-of-the-art language models. Our results show that a subset of FOLIO presents a challenge for GPT-4, one of the most capable publicly available LLMs.
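Checking that a conclusion follows deductively from a set of premises is what an inference engine automates. As a toy illustration only, the sketch below decides entailment for a tiny fragment of FOL: ground unary facts plus rules of the form $\forall x.\,P_1(x)\wedge\dots\wedge P_k(x)\rightarrow Q(x)$, using forward chaining. It is not the engine FOLIO uses, and general FOL requires a full theorem prover. The predicate and constant names are hypothetical.

```python
def forward_chain(facts, rules):
    """Derive every ground fact reachable from `facts` with unary Horn rules.
    facts: set of (predicate, constant) atoms, e.g. ("Man", "socrates").
    rules: list of (premise_predicates, conclusion_predicate), read as
    forall x. P1(x) and ... and Pk(x) -> Q(x)."""
    derived = set(facts)
    constants = {arg for _, arg in facts}
    changed = True
    while changed:  # repeat until no rule adds a new fact (a fixpoint)
        changed = False
        for premises, conclusion in rules:
            for x in constants:
                if all((p, x) in derived for p in premises) and (conclusion, x) not in derived:
                    derived.add((conclusion, x))
                    changed = True
    return derived


def entails(facts, rules, goal):
    return goal in forward_chain(facts, rules)


if __name__ == "__main__":
    premises = {("Man", "socrates")}
    rules = [(("Man",), "Mortal")]
    print(entails(premises, rules, ("Mortal", "socrates")))
    print(entails(premises, rules, ("Mortal", "plato")))
```

For this Horn fragment, forward chaining is sound and complete. That is why the toy can answer reliably, and also why it does not scale to the full logic FOLIO annotates.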
Submitted 17 May, 2024; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Superconductivity of Cs$_3$C$_{60}$ at atmosphere pressure
Authors:
Di Peng,
Ren-Shu Wang,
Li-Na Zong,
Xiao-Jia Chen
Abstract:
Pressure, as a clean and efficient tool, can bring about unexpected and extraordinary physical and chemical properties of matter. The recent discoveries of superconductivity at nearly room temperature in hydrides highlight the power of pressure in this respect. Capturing such high-Tc superconductivity at atmospheric pressure for technological applications is highly desired. The large-scale growth of diamond through chemical vapor deposition, away from the usual high-pressure and high-temperature conditions, fuels such a hope. Similar to hydrides, Cs-doped C$_{60}$ was also found to exhibit superconductivity under applied pressure, with a Tc of 40 K comparable to that of MgB$_2$. Here, we report the successful realization of superconductivity in Cs-doped C$_{60}$ at atmospheric pressure. The phase is characterized as having a primitive cubic structure in space group $Pa\bar{3}$ with the stoichiometry Cs$_3$C$_{60}$. The superconductivity is evidenced by the observation of both the Meissner effect and a zero-resistance state. Although the pressure effects on superconductivity differ between the newly discovered Cs$_{3}$C$_{60}$ phase and the two known phases with fcc and A15 structures, the evolution of Tc with volume for all these superconductors follows the same universal trend, suggesting the same pairing mechanism. Such a trend, together with the nearly linear dependence of Tc on the lattice constant in structures with smaller unit-cell volumes and the neighbouring antiferromagnetic state in structures with larger unit-cell volumes, suggests that electron-phonon coupling and electron correlations jointly account for the superconductivity in Cs$_3$C$_{60}$. The present results suggest a new route to bringing superconductivity that occurs at high pressures to atmospheric-pressure conditions.
Submitted 19 August, 2022;
originally announced August 2022.
-
Synthesis of Superconducting Phase of La$_{0.5}$Ce$_{0.5}$H$_{10}$ at High Pressures
Authors:
Ge Huang,
Tao Luo,
Philip Dalladay-Simpson,
Liu-Cheng Chen,
Zi-Yu Cao,
Di Peng,
Federico A. Gorelli,
Guo-Hua Zhong,
Hai-Qing Lin,
Xiao-Jia Chen
Abstract:
In recent years, clathrate hydride $Fm\bar{3}m$-LaH$_{10}$ has been proven to be the most extraordinary superconductor, with a critical temperature $T_c$ above 250 K under compression to hundreds of GPa. A general hope is to reduce the stabilization pressure while maintaining the high $T_c$ of this specific phase of LaH$_{10}$. However, strong structural instability distorts the $Fm\bar{3}m$ structure and leads to a rapid decrease of $T_c$ at low pressures. Here, we investigate the phase stability and superconducting behavior of $Fm\bar{3}m$-LaH$_{10}$ with enhanced chemical pre-compression, achieved by partly replacing La with Ce atoms, through both experiments and calculations. To characterize the synthesized hydride explicitly, we choose a lanthanum-cerium alloy with a stoichiometric composition of 1:1. X-ray diffraction and Raman scattering measurements reveal the stabilization of $Fm\bar{3}m$-La$_{0.5}$Ce$_{0.5}$H$_{10}$ in the pressure range of 140-160 GPa. Superconductivity with a $T_c$ of 175$\pm$2 K at 155 GPa is confirmed by the observation of the zero-resistivity state and supported by theoretical calculations. These findings are applicable to future explorations of a wide variety of hydrogen-rich hydrides.
Submitted 10 August, 2022;
originally announced August 2022.
-
Synthesis and Superconductivity in Yttrium-Cerium Hydrides at Moderate Pressures
Authors:
Liu-Cheng Chen,
Tao Luo,
Philip Dalladay-Simpson,
Ge Huang,
Zi-Yu Cao,
Di Peng,
Federico Aiace Gorelli,
Guo-Hua Zhong,
Hai-Qing Lin,
Xiao-Jia Chen
Abstract:
Inspired by the high critical temperature of yttrium superhydride and the low stabilization pressure of superconducting cerium superhydride, we carry out four independent runs to synthesize yttrium-cerium alloy hydrides. The phases are examined by Raman scattering and x-ray diffraction measurements. Superconductivity is detected through the zero-resistance state, with critical temperatures in the range of 97-140 K at pressures ranging from 114 GPa to 120$\pm$4 GPa. The maximum critical temperature of the synthesized hydrides is higher than those reported for cerium hydrides, while the corresponding stabilization pressure is much lower than those for superconducting yttrium hydrides. The structural analysis and theoretical calculations suggest that the Y$_{0.5}$Ce$_{0.5}$H$_9$ phase has the space group $P6_3/mmc$ with a calculated critical temperature of 119 K, in fair agreement with the experiments. These results indicate that alloying superhydrides can indeed maintain relatively high critical temperatures at modest pressures accessible to many laboratories.
Submitted 10 August, 2022;
originally announced August 2022.
-
Robust superconductivity near constant temperature in rubidium-doped C$_{60}$
Authors:
Li-Na Zong,
Ren-Shu Wang,
Di Peng,
Xiao-Jia Chen
Abstract:
To establish the doping-dependent phase diagram of alkali-metal-doped C$_{60}$, we synthesize Rb-doped C$_{60}$ samples with different stoichiometries using an improved wet-chemistry technique. The doping levels determined from the Raman scattering spectra often show the appearance of three electrons, corresponding to a band filling of three, for the synthesized compounds no matter what dopants are used. The coexistence of multiple phases, including the unique Rb$_{3}$C$_{60}$ phase, is identified from the refined x-ray diffraction patterns. The phase fraction of Rb$_{3}$C$_{60}$ is found to vary with doping in a similar manner to the superconducting shielding fraction. These rigorously established correlations among the superconducting transition temperature and the structural and phonon vibrational properties allow us to single out Rb$_{3}$C$_{60}$ as the only superconducting phase, with a nearly constant transition temperature regardless of the doping level. These findings provide an experimental constraint on theoretical developments for superconductivity in fullerides.
Submitted 7 August, 2022;
originally announced August 2022.
-
Full Set of Superconducting Parameters of K$_3$C$_{60}$
Authors:
Ren-Shu Wang,
Di Peng,
Li-Na Zong,
Zeng-Wei Zhu,
Xiao-Jia Chen
Abstract:
Superconducting parameters are key to building or identifying the theory responsible for the mechanism of superconductivity. Such parameters for fulleride superconductors have not been well established despite tremendous efforts over the past 30 years. Here we provide a full set of parameters through a systematic study of a well-characterized K$_{3}$C$_{60}$ sample. The high upper critical field of 33.0$\pm$0.5 T obtained from direct electrical transport measurements, together with the relatively high critical temperature and large critical current density, classifies K$_{3}$C$_{60}$ as a promising three-dimensional superconducting magnet material with the advantage of the abundance of carbon on Earth. This high upper critical field, along with the large reduced superconducting energy gap and strong phonon self-energy effect, supports strong electron-phonon coupling in this superconductor. The evaluation of all self-consistently obtained parameters suggests the unconventional nature of superconductivity in K$_3$C$_{60}$, with joint contributions from strong electron-phonon coupling and electron correlations. These results are important not only for fundamentally understanding superconductivity in fullerides but also for future superconducting magnet developments and applications.
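As an illustration of how a further parameter follows from the upper critical field, the standard Ginzburg-Landau relation $B_{c2} = \Phi_0/(2\pi\xi^2)$, with $\Phi_0 = h/2e$, gives a coherence length of roughly 3.2 nm for 33.0 T. The sketch below evaluates it across the quoted uncertainty. Whether the authors derived their coherence length in exactly this way is not stated here, so treat this as the textbook estimate rather than the paper's procedure.

```python
import math

FLUX_QUANTUM = 2.067833848e-15  # h / (2e) in Wb (equivalently T m^2)


def gl_coherence_length(bc2_tesla):
    """Ginzburg-Landau coherence length in metres from the upper critical field:
    B_c2 = Phi_0 / (2 pi xi^2)  =>  xi = sqrt(Phi_0 / (2 pi B_c2))."""
    return math.sqrt(FLUX_QUANTUM / (2 * math.pi * bc2_tesla))


if __name__ == "__main__":
    for b in (32.5, 33.0, 33.5):
        print(f"B_c2 = {b:4.1f} T -> xi ~ {gl_coherence_length(b) * 1e9:.2f} nm")
```

A higher upper critical field corresponds to a shorter coherence length, and the ±0.5 T uncertainty changes $\xi$ by less than 1%.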
Submitted 6 August, 2022;
originally announced August 2022.
-
PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition
Authors:
Dezhi Peng,
Lianwen Jin,
Yuliang Liu,
Canjie Luo,
Songxuan Lai
Abstract:
Handwritten Chinese text recognition (HCTR) has been an active research topic for decades. However, most previous studies focus solely on the recognition of cropped text line images, ignoring the errors caused by text line detection in real-world applications. Although some approaches aimed at page-level text recognition have been proposed in recent years, they are either limited to simple layouts or require very detailed annotations, including expensive line-level and even character-level bounding boxes. To this end, we propose PageNet for end-to-end weakly supervised page-level HCTR. PageNet detects and recognizes characters and predicts the reading order between them, which makes it more robust and flexible when dealing with complex layouts, including multi-directional and curved text lines. Utilizing the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data; nevertheless, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes of characters and text lines. Extensive experiments conducted on five datasets demonstrate the superiority of PageNet over existing weakly supervised and fully supervised page-level methods. These experimental results may spark further research beyond the realms of existing methods based on connectionist temporal classification or attention. The source code is available at https://github.com/shannanyinxiang/PageNet.
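Once characters are detected and a reading order is predicted, line-level transcripts can be recovered by following successor links. The sketch below assumes a hypothetical output format in which each detected character has at most one predicted successor (or `None` at a line end). A line starts at any character that no other character points to. The format and function are illustrative assumptions and do not reflect PageNet's actual decoding.

```python
def assemble_lines(chars, successor):
    """Group recognized characters into text lines by following reading order.
    chars: recognized characters, indexed by detection id.
    successor: dict mapping a detection id to the id of the next character in
    the same line, or None at the end of a line (hypothetical format)."""
    has_predecessor = {nxt for nxt in successor.values() if nxt is not None}
    lines = []
    for start in range(len(chars)):
        if start in has_predecessor:
            continue  # some other character precedes it, so it is not a line start
        line, current, seen = [], start, set()
        while current is not None and current not in seen:
            seen.add(current)  # stop if a predicted link loops back on itself
            line.append(chars[current])
            current = successor.get(current)
        lines.append("".join(line))
    return lines


if __name__ == "__main__":
    # Two lines: detections 0 -> 1 -> 2 and 3 -> 4.
    print(assemble_lines(list("ABCDE"), {0: 1, 1: 2, 2: None, 3: 4, 4: None}))
```

Since order is encoded as links rather than as a left-to-right scan, the same assembly applies to vertical or curved lines.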
Submitted 29 July, 2022;
originally announced July 2022.
-
Recognition of Handwritten Chinese Text by Segmentation: A Segment-annotation-free Approach
Authors:
Dezhi Peng,
Lianwen Jin,
Weihong Ma,
Canyu Xie,
Hesuo Zhang,
Shenggao Zhu,
Jing Li
Abstract:
Online and offline handwritten Chinese text recognition (HCTR) has been studied for decades. Early methods adopted oversegmentation-based strategies but suffered from low speed, insufficient accuracy, and the high cost of character segmentation annotations. Recently, segmentation-free methods based on connectionist temporal classification (CTC) and the attention mechanism have dominated the field of HCTR. However, people actually read text character by character, especially for ideograms such as Chinese. This raises the question: are segmentation-free strategies really the best solution to HCTR? To explore this issue, we propose a new segmentation-based method for recognizing handwritten Chinese text that is implemented using a simple yet efficient fully convolutional network. A novel weakly supervised learning method is proposed to enable the network to be trained using only transcript annotations; thus, the expensive character segmentation annotations required by previous segmentation-based methods can be avoided. Owing to the lack of context modeling in fully convolutional networks, we propose a contextual regularization method to integrate contextual information into the network during the training stage, which can further improve recognition performance. Extensive experiments conducted on four widely used benchmarks, namely CASIA-HWDB, CASIA-OLHWDB, ICDAR2013, and SCUT-HCCDoc, show that our method significantly surpasses existing methods on both online and offline HCTR and exhibits a considerably higher inference speed than CTC/attention-based approaches.
Submitted 29 July, 2022;
originally announced July 2022.
-
Sub-monolayer Biolasers: Lower Gain, Higher Sensitivity
Authors:
C. Gong,
X. Yang,
S. J. Tang,
Q. Q. Zhang,
Y. Wang,
Y. L. Liu,
Y. C. Chen,
G. D. Peng,
X. Fan,
Y. F. Xiao,
Y. J. Rao,
Y. Gong
Abstract:
Biomarker detection is key to identifying health risks. However, designing sensitive biosensors in a single-use mode for disease diagnosis remains a major challenge. Here, we report sub-monolayer biolasers with remarkable repeatability for ultrasensitive and disposable biomarker detection. The biolaser sensors are designed by employing telecom optical fibers as distributed optical microcavities and pushing the gain molecules down to the sub-monolayer level. We observe a transition from the monolayer biolaser to the sub-monolayer biolaser by tuning the specific conjugation. By reducing the fluorophores down to the threshold density ($\sim 3.2\times10^{-13}$ mol/cm$^2$), we demonstrate that the sensitivity of the sub-monolayer biolaser is enhanced by six orders of magnitude compared with monolayer biolasers. We further achieve an ultrasensitive immunoassay for the Parkinson's disease biomarker alpha-synuclein, with a limit of detection of 0.32 pM in serum. This biosensor, with massive fabrication capability at ultralow cost, provides a general method for the ultrasensitive, disposable detection of disease biomarkers.
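To express the threshold density in more tangible units, multiply by Avogadro's constant and convert cm$^2$ to nm$^2$ ($1\ \text{cm}^2 = 10^{14}\ \text{nm}^2$). The result is about $1.9\times10^{11}$ molecules per cm$^2$, or one fluorophore per roughly 520 nm$^2$. The square-lattice spacing estimate used below is a simplifying assumption, and the function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
NM2_PER_CM2 = 1e14        # 1 cm = 1e7 nm, so 1 cm^2 = 1e14 nm^2


def surface_density_stats(mol_per_cm2):
    """Convert a surface density in mol/cm^2 into molecules per cm^2, area per
    molecule in nm^2, and a mean spacing in nm (square-lattice estimate)."""
    molecules_per_cm2 = mol_per_cm2 * AVOGADRO
    molecules_per_nm2 = molecules_per_cm2 / NM2_PER_CM2
    area_per_molecule_nm2 = 1.0 / molecules_per_nm2
    spacing_nm = math.sqrt(area_per_molecule_nm2)
    return molecules_per_cm2, area_per_molecule_nm2, spacing_nm


if __name__ == "__main__":
    per_cm2, area, spacing = surface_density_stats(3.2e-13)
    print(f"{per_cm2:.2e} molecules/cm^2, ~{area:.0f} nm^2 each, ~{spacing:.0f} nm apart")
```

A spacing of roughly 23 nm between gain molecules shows how sparse the "sub-monolayer" regime is.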
Submitted 5 July, 2022;
originally announced July 2022.
-
Declaration-based Prompt Tuning for Visual Question Answering
Authors:
Yuhang Liu,
Wei Wei,
Daowan Peng,
Feida Zhu
Abstract:
In recent years, the pre-training-then-fine-tuning paradigm has yielded immense success on a wide spectrum of cross-modal tasks, such as visual question answering (VQA), in which a visual-language (VL) model is first optimized via self-supervised task objectives, e.g., masked language modeling (MLM) and image-text matching (ITM), and then fine-tuned to adapt to a downstream task (e.g., VQA) via a brand-new objective function, e.g., answer prediction. The inconsistency of the objective forms not only severely limits the generalization of pre-trained VL models to downstream tasks, but also requires a large amount of labeled data for fine-tuning. To alleviate the problem, we propose an innovative VL fine-tuning paradigm (named Declaration-based Prompt Tuning, abbreviated as DPT), which jointly optimizes the objectives of pre-training and fine-tuning of the VQA model, boosting the effective adaptation of pre-trained VL models to the downstream task. Specifically, DPT reformulates the objective form of the VQA task via (1) textual adaptation, which converts the given questions into declarative sentences for prompt-tuning, and (2) task adaptation, which optimizes the objective function of the VQA problem in the manner of the pre-training phase. Experimental results on the GQA dataset show that DPT outperforms the fine-tuned counterpart by a large margin in accuracy in both fully supervised (2.68%) and zero-shot/few-shot (over 31%) settings. All data and code will be made available to facilitate future research.
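The idea behind textual adaptation is to recast a question as a declarative sentence with a masked slot, so that an MLM-style head can fill in the answer as it did during pre-training. The toy rule-based converter below handles only "What X is Y?" questions and otherwise appends a mask token. It is a hypothetical illustration: the `[MASK]` token and the rule are assumptions, and DPT's actual conversion is not reproduced.

```python
import re

MASK = "[MASK]"


def to_declaration(question):
    """Toy conversion of a question into a masked declarative sentence.
    A real system needs far broader coverage than this single rule."""
    q = question.strip()
    m = re.match(r"^what (\w+) is (.+?)\?$", q, flags=re.IGNORECASE)
    if m:  # e.g. "What color is the car?" -> "The color of the car is [MASK]."
        attribute, subject = m.groups()
        return f"The {attribute.lower()} of {subject} is {MASK}."
    return f"{q} {MASK}"  # fallback: keep the question and append a slot


if __name__ == "__main__":
    print(to_declaration("What color is the car?"))
    print(to_declaration("Where is it?"))
```

Answering then becomes a matter of predicting the token at the masked position, which matches the pre-training objective rather than a separate classification head.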
Submitted 5 May, 2022;
originally announced May 2022.
-
Semantic-Aware Domain Generalized Segmentation
Authors:
Duo Peng,
Yinjie Lei,
Munawar Hayat,
Yulan Guo,
Wen Li
Abstract:
Deep models trained on a source domain lack generalization when evaluated on unseen target domains with different data distributions. The problem becomes even more pronounced when we have no access to target domain samples for adaptation. In this paper, we address domain generalized semantic segmentation, where a segmentation model is trained to be domain-invariant without using any target domain data. Existing approaches to this problem standardize data into a unified distribution. We argue that while such standardization promotes global normalization, the resulting features are not discriminative enough to yield clear segmentation boundaries. To enhance separation between categories while simultaneously promoting domain invariance, we propose a framework including two novel modules: Semantic-Aware Normalization (SAN) and Semantic-Aware Whitening (SAW). Specifically, SAN focuses on category-level center alignment between features from different image styles, while SAW enforces distributed alignment for the already center-aligned features. With the help of SAN and SAW, we encourage both intra-category compactness and inter-category separability. We validate our approach through extensive experiments on widely used datasets (i.e., GTAV, SYNTHIA, Cityscapes, Mapillary, and BDDS). Our approach shows significant improvements over the existing state of the art with various backbone networks. Code is available at https://github.com/leolyj/SAN-SAW
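Category-level center alignment can be pictured as making each class's feature mean identical across image styles. The sketch below does this with a simple, non-learned mean shift. Each (style, class) group is translated so that its mean equals the class mean pooled over all styles, which preserves the spread within each group. This is an assumption-laden illustration of the alignment goal, not the SAN module (which is learned), and it omits SAW's whitening step entirely.

```python
from collections import defaultdict


def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def align_class_centers(features, labels, styles):
    """Shift every (style, class) group so its mean equals the class mean pooled
    over all styles; returns new feature vectors in the original order."""
    by_group = defaultdict(list)
    by_class = defaultdict(list)
    for f, y, s in zip(features, labels, styles):
        by_group[(s, y)].append(f)
        by_class[y].append(f)
    group_mean = {k: _mean(v) for k, v in by_group.items()}
    class_mean = {k: _mean(v) for k, v in by_class.items()}
    aligned = []
    for f, y, s in zip(features, labels, styles):
        shift = [c - g for c, g in zip(class_mean[y], group_mean[(s, y)])]
        aligned.append([x + d for x, d in zip(f, shift)])
    return aligned


if __name__ == "__main__":
    feats = [[0.0], [2.0], [10.0], [14.0]]  # one class seen in two styles
    print(align_class_centers(feats, [0, 0, 0, 0], ["a", "a", "b", "b"]))
```

After the shift, both styles share the class center of 6.5, while the within-style differences (2 and 4) are unchanged.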
Submitted 2 April, 2022;
originally announced April 2022.
-
All-optical determination of one or two emitters using quantum polarization with nitrogen-vacancy centers in diamond
Authors:
Davin Yue Ming Peng,
Josef G. Worboys,
Qiang Sun,
Shuo Li,
Marco Capelli,
Shinobu Onoda,
Takeshi Ohshima,
Philipp Reineck,
Brant C. Gibson,
Andrew D. Greentree
Abstract:
Qubit technologies using nitrogen-vacancy color centers in diamond require precise knowledge of the centers, including the number of emitters within a diffraction-limited spot and their orientations. However, the number of emitters is difficult to determine when there is a finite background, which affects the precision of the resulting quantum protocols. Here we report photoluminescence (PL) intensity and quantum correlation (Hanbury Brown and Twiss) measurements as a function of polarization for one- and two-emitter systems. The sample was made by implanting low concentrations of adenine (C5H5N5) into a low-nitrogen chemical vapor deposition diamond. This approach yielded well-spaced regions with few nitrogen-vacancy centers. By mapping the PL intensity and quantum correlation as a function of polarization, we can distinguish two-emitter systems from single emitters with background, providing a method to quantify the background signal at implanted sites, which may differ from off-site background levels. This approach also provides a valuable new all-optical mechanism for determining whether one or two emitters are present, useful for quantum sensing, communication, and computation tasks.
Submitted 5 June, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
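As background for the Hanbury Brown and Twiss measurements above, the zero-delay correlation of statistically independent sources follows $g^{(2)}(0) = 1 - \sum_i I_i^2 / (\sum_i I_i + B)^2$ for ideal single-photon emitters with intensities $I_i$ plus a Poissonian background $B$. The sketch below evaluates this textbook formula; it is not the authors' analysis model. The cos² polarization dependence, the dipole angles, and the function names are hypothetical simplifications (real NV centers have two orthogonal dipoles and do not fully extinguish).

```python
import math

def g2_zero(emitter_intensities, background=0.0):
    """Zero-delay g2 for independent ideal single-photon emitters plus Poissonian background.

    g2(0) = 1 - sum(I_i^2) / (sum(I_i) + B)^2
    """
    total = sum(emitter_intensities) + background
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 1.0 - sum(i * i for i in emitter_intensities) / total ** 2

def dipole_intensity(theta, dipole_angle, peak=1.0):
    """Hypothetical cos^2 dependence of detected intensity on analyzer angle theta (radians)."""
    return peak * math.cos(theta - dipole_angle) ** 2

# Compare one emitter plus background with two emitters of different (hypothetical) orientations.
for deg in (0, 45, 90):
    th = math.radians(deg)
    one = g2_zero([dipole_intensity(th, 0.0)], background=0.2)
    two = g2_zero([dipole_intensity(th, 0.0), dipole_intensity(th, math.radians(60))])
    print(f"{deg:3d} deg  one+bg: {one:.3f}  two emitters: {two:.3f}")
```

Because the two cases trace different curves as the analyzer angle changes, measuring $g^{(2)}(0)$ versus polarization can in principle separate them, which is the qualitative idea the abstract describes.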
-
Implications of Mortality Displacement for Effect Modification and Selection Bias
Authors:
Honghyok Kim,
Jong-Tae Lee,
Roger D. Peng,
Kelvin C. Fong,
Michelle L. Bell
Abstract:
Mortality displacement is the concept that exposure moves deaths forward in time (e.g., by a few days, several months, or years) relative to when they would have occurred without the exposure; it is a common consideration in environmental time-series studies. Using the concepts of a frail population and loss of life expectancy, it is understood that mortality displacement may decrease the rate ratio (RR). Such decreases are thought to be minimal or substantial depending on the study population. Environmental epidemiologists have interpreted RR in light of mortality displacement. This theoretical paper shows that mortality displacement can be formulated as a built-in selection bias of RR in Cox models due to unmeasured risk factors independent of the exposure of interest, and that mortality displacement can also be viewed as an effect modifier by integrating the concepts of rate and loss of life expectancy. Thus, depending on the framework through which we view bias, mortality displacement can be categorized as selection bias in the bias taxonomy of epidemiology, and can simultaneously be seen as an effect modifier. This dichotomy has useful implications for policy, effect modification, exposure time-window selection, and generalizability, and specifically helps explain why research in epidemiology may produce unexpected and heterogeneous RRs across different studies and sub-populations.
Submitted 25 March, 2022;
originally announced March 2022.
-
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
Authors:
Yingwei Li,
Adams Wei Yu,
Tianjian Meng,
Ben Caine,
Jiquan Ngiam,
Daiyi Peng,
Junyang Shen,
Bo Wu,
Yifeng Lu,
Denny Zhou,
Quoc V. Le,
Alan Yuille,
Mingxing Tan
Abstract:
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving. While prevalent multi-modal methods simply decorate raw lidar point clouds with camera features and feed them directly to existing 3D detection models, our study shows that fusing camera features with deep lidar features instead of raw points can lead to better performance. However, as those features are often augmented and aggregated, a key challenge in fusion is how to effectively align the transformed features from the two modalities. In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations (e.g., rotation) to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion. Based on InverseAug and LearnableAlign, we develop a family of generic multi-modal 3D detection models named DeepFusion, which is more accurate than previous methods. For example, DeepFusion improves the PointPillars, CenterPoint, and 3D-MAN baselines on Pedestrian detection by 6.7, 8.9, and 6.2 LEVEL_2 APH, respectively. Notably, our models achieve state-of-the-art performance on the Waymo Open Dataset and show strong robustness against input corruptions and out-of-distribution data. Code will be publicly available at https://github.com/tensorflow/lingvo/tree/master/lingvo/.
Submitted 15 March, 2022;
originally announced March 2022.
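The InverseAug idea can be illustrated with a single global rotation about the vertical axis. If lidar points were rotated during augmentation, applying the inverse rotation recovers their original coordinates, so they can be projected into the un-augmented camera image. This is a minimal sketch that assumes only a yaw rotation. It is not the Lingvo implementation, and the function names are hypothetical.

```python
import math

def rotate_z(points, yaw):
    """Rotate (x, y, z) points about the z-axis by yaw radians (the augmentation)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def inverse_aug(points, yaw):
    """Undo the yaw augmentation so points line up with the un-augmented camera image."""
    return rotate_z(points, -yaw)

original = [(10.0, 0.0, 1.5), (3.0, 4.0, 0.2)]
augmented = rotate_z(original, math.radians(30))
restored = inverse_aug(augmented, math.radians(30))
for p, q in zip(original, restored):
    print(p, tuple(round(v, 6) for v in q))
```

In a fuller pipeline, each geometric augmentation parameter (rotation, scaling, flipping) would be recorded per example so that the transforms can be inverted in reverse order before the lidar-to-camera projection.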
-
SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text
Authors:
Canjie Luo,
Yuanzhi Zhu,
Lianwen Jin,
Zhe Li,
Dezhi Peng
Abstract:
Large amounts of labeled data are urgently required for the training of robust text recognizers. However, collecting handwriting data of diverse styles, along with an immense lexicon, is considerably expensive. Although data synthesis is a promising way to relieve data hunger, two key issues of handwriting synthesis, namely, style representation and content embedding, remain unsolved. To this end, we propose a novel method that can synthesize parameterized and controllable handwriting Styles for arbitrary-Length and Out-of-vocabulary text based on a Generative Adversarial Network (GAN), termed SLOGAN. Specifically, we propose a style bank to parameterize the specific handwriting styles as latent vectors, which are input to a generator as style priors to achieve the corresponding handwritten styles. The training of the style bank requires only the writer identification of the source images, rather than attribute annotations. Moreover, we embed the text content by providing an easily obtainable printed style image, so that the diversity of the content can be flexibly achieved by changing the input printed image. Finally, the generator is guided by dual discriminators to handle both the handwriting characteristics that appear as separated characters and in a series of cursive joins. Our method can synthesize words that are not included in the training vocabulary and with various new styles. Extensive experiments have shown that high-quality text images with great style diversity and rich vocabulary can be synthesized using our method, thereby enhancing the robustness of the recognizer.
Submitted 23 February, 2022;
originally announced February 2022.
-
Turán Number of Subdivisions of Multipartite Graphs
Authors:
Xiao-Chuan Liu,
Danni Peng,
Xu Yang
Abstract:
In this paper, we initiate the study of the extremal exponent for the $1$-subdivisions of non-bipartite non-complete graphs. Let $K_{C_k,1}$ be the graph obtained by connecting an additional vertex to every vertex of the cycle $C_k$, $k\geq 4$. We show that for $K_{C_k,1}^{\text{sub}}$, the $1$-subdivision of this tripartite graph, the Turán number is $\Theta(n^{4/3})$. We then obtain an upper bound on the extremal number of a family of graphs consisting of (possibly degenerate) $1$-subdivisions of certain tripartite graphs. We also obtain the same upper bound for the $1$-subdivision $(K_{s+1,t}^+)^{\text{sub}}$, where $K_{s+1,t}^+$ is the graph obtained by joining $K_{s,t}$ with one new vertex $r$, connecting $r$ to one vertex in the $s$-part and to all the vertices in the $t$-part.
Submitted 11 March, 2023; v1 submitted 24 December, 2021;
originally announced December 2021.
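To make the definitions concrete, the sketch below builds $K_{C_k,1}$ (the cycle $C_k$ plus one vertex joined to every cycle vertex, i.e., a wheel) and its $1$-subdivision, in which every edge is replaced by a path of length two through a new vertex. It only constructs the graphs and counts vertices and edges. It does not compute any extremal numbers, and the vertex labels are arbitrary choices made for illustration.

```python
def k_ck_1(k):
    """Edges of K_{C_k,1}: cycle on vertices 0..k-1 plus hub vertex k joined to all of them."""
    if k < 4:
        raise ValueError("the paper considers k >= 4")
    cycle = [(i, (i + 1) % k) for i in range(k)]
    spokes = [(i, k) for i in range(k)]
    return cycle + spokes

def one_subdivision(edges):
    """Replace every edge (u, v) by a path u - s - v through a new vertex s."""
    new_edges = []
    for idx, (u, v) in enumerate(edges):
        s = ("s", idx)  # subdivision vertex created for edge number idx
        new_edges += [(u, s), (s, v)]
    return new_edges

def vertices(edges):
    return {x for e in edges for x in e}

for k in (4, 5, 6):
    g = k_ck_1(k)
    sub = one_subdivision(g)
    print(k, len(vertices(g)), len(g), len(vertices(sub)), len(sub))
```

For each $k$, $K_{C_k,1}$ has $k+1$ vertices and $2k$ edges, so its $1$-subdivision has $3k+1$ vertices and $4k$ edges.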
-
An experimental approach to mapping chemical bonds in nanostructured materials
Authors:
Philip N. H. Nakashima,
Ding Peng,
Xiaofen Tan,
Anna N. Mortazavi,
Tianyu Liu,
Joanne Etheridge,
Laure Bourgeois,
David R. Clarke
Abstract:
We introduce a number of techniques in quantitative convergent-beam electron diffraction under development by our group and discuss the basis for measuring interatomic electrostatic potentials (and therefore also electron densities), localised at sub-nanometre scales, with sufficient accuracy and precision to map chemical bonds in and around nanostructures in nanostructured materials. This has not previously been possible, as experimental measurements of bonding have always been restricted to homogeneous single-phase crystals.
Submitted 22 December, 2021;
originally announced December 2021.
-
SPTS: Single-Point Text Spotting
Authors:
Dezhi Peng,
Xinyu Wang,
Yuliang Liu,
Jiaxin Zhang,
Mingxin Huang,
Songxuan Lai,
Shenggao Zhu,
Jing Li,
Dahua Lin,
Chunhua Shen,
Xiang Bai,
Lianwen Jin
Abstract:
Existing scene text spotting (i.e., end-to-end text detection and recognition) methods rely on costly bounding box annotations (e.g., text-line, word-level, or character-level bounding boxes). For the first time, we demonstrate that scene text spotting models can be trained with an extremely low-cost annotation: a single point for each instance. We propose an end-to-end scene text spotting method that tackles scene text spotting as a sequence prediction task. Given an image as input, we formulate the desired detection and recognition results as a sequence of discrete tokens and use an auto-regressive Transformer to predict the sequence. The proposed method is simple yet effective and achieves state-of-the-art results on widely used benchmarks. Most significantly, we show that the performance is not very sensitive to the positions of the point annotations, meaning that points are much easier to annotate, or even to generate automatically, than bounding boxes, which require precise positions. We believe that such a pioneering attempt indicates a significant opportunity for scene text spotting applications at a much larger scale than previously possible. The code is available at https://github.com/shannanyinxiang/SPTS.
Submitted 29 August, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
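The sequence formulation can be sketched as follows. Each text instance contributes its point coordinates, quantized into discrete bins, followed by its character tokens, and an end token closes the sequence. This is a hypothetical illustration: the bin count, character vocabulary, token offsets, and ordering are assumptions and not the exact SPTS design.

```python
NUM_BINS = 1000  # hypothetical number of coordinate bins
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"  # hypothetical character vocabulary
CHAR_OFFSET = NUM_BINS  # character tokens are placed after the coordinate tokens
EOS = CHAR_OFFSET + len(CHARSET)  # end-of-sequence token

def quantize(value, size):
    """Map a pixel coordinate in [0, size) to an integer bin in [0, NUM_BINS)."""
    return min(int(value / size * NUM_BINS), NUM_BINS - 1)

def encode(instances, width, height):
    """instances: list of ((x, y), text); returns one flat token sequence."""
    tokens = []
    for (x, y), text in instances:
        tokens += [quantize(x, width), quantize(y, height)]
        tokens += [CHAR_OFFSET + CHARSET.index(c) for c in text.lower()]
    tokens.append(EOS)
    return tokens

print(encode([((120.0, 48.0), "Stop"), ((300.5, 200.0), "42")], width=640, height=480))
```

Because coordinate tokens lie below `NUM_BINS` and character tokens lie at or above it, a decoder can tell where each new instance begins; an auto-regressive model would predict such a sequence one token at a time.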
-
Learning an Adaptive Meta Model-Generator for Incrementally Updating Recommender Systems
Authors:
Danni Peng,
Sinno Jialin Pan,
Jie Zhang,
Anxiang Zeng
Abstract:
Recommender Systems (RSs) in real-world applications often deal with billions of user interactions daily. To capture the most recent trends effectively, it is common to update the model incrementally using only the newly arrived data. However, this may impede the model's ability to retain long-term information due to potential overfitting and forgetting. To address this problem, we propose a novel Adaptive Sequential Model Generation (ASMG) framework, which generates a better serving model from a sequence of historical models via a meta generator. For the meta generator, we propose to employ Gated Recurrent Units (GRUs) to leverage their ability to capture long-term dependencies. We further introduce novel strategies to use with the GRU meta generator, which not only improve its computational efficiency but also enable more accurate sequential modeling. By instantiating the model-agnostic framework on a general deep learning-based RS model, we demonstrate that our method achieves state-of-the-art performance on three public datasets and one industrial dataset.
Submitted 8 November, 2021;
originally announced November 2021.
-
Redefining the Quantum Supremacy Baseline With a New Generation Sunway Supercomputer
Authors:
Xin Liu,
Chu Guo,
Yong Liu,
Yuling Yang,
Jiawei Song,
Jie Gao,
Zhen Wang,
Wenzhao Wu,
Dajia Peng,
Pengpeng Zhao,
Fang Li,
He-Liang Huang,
Haohuan Fu,
Dexun Chen
Abstract:
A major milestone in the era of noisy intermediate-scale quantum computers is \textit{quantum supremacy} [Nature \textbf{574}, 505 (2019)], claimed on the $53$-qubit Sycamore quantum processor, which can perform a random circuit sampling task within $200$ seconds, while the same task is estimated to require a runtime of $10,000$ years on Summit. This record has been renewed by two recent experiments on the Zuchongzhi $2.0$ ($56$ qubits) and Zuchongzhi $2.1$ ($60$ qubits) quantum processors. On the other side of the quantum supremacy comparison, there have also been continuous improvements in both classical simulation algorithms and the underlying hardware. A fair assessment of the computational advantage of these quantum supremacy experiments requires actually simulating the same problems on current top supercomputers, which has so far been lacking. Here we report full-scale simulations of these problems on the new-generation Sunway supercomputer, based on a customized tensor network contraction algorithm. Our benchmark shows that the most challenging sampling task performed on Sycamore can be accomplished within $1$ week, thus collapsing the quantum supremacy claim of Sycamore. Additionally, we show that the XEB fidelities of the \textit{quantum supremacy circuits} with up to $14$ cycles can be verified in minutes, which also provides a strong consistency check for quantum supremacy experiments. Our results redefine the quantum supremacy baseline using the new-generation Sunway supercomputer.
Submitted 21 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
-
NetDAM: Network Direct Attached Memory with Programmable In-Memory Computing ISA
Authors:
Kevin Fang,
David Peng
Abstract:
Data-intensive applications such as distributed AI training may require multi-terabyte memory capacity and multi-terabit bandwidth. We directly attach the memory to the Ethernet controller, together with some programmable logic, to design an efficient hardware "template" for memory pooling and in-memory/in-network computing. We built an FPGA prototype of NetDAM and, using an MPI-Allreduce communication case, demonstrate that NetDAM can serve as a software- and hardware-friendly programmable architecture and a high-performance alternative to RDMA.
Submitted 28 October, 2021;
originally announced October 2021.
-
Closing the "Quantum Supremacy" Gap: Achieving Real-Time Simulation of a Random Quantum Circuit Using a New Sunway Supercomputer
Authors:
Yong Liu,
Xin Liu,
Fang Li,
Haohuan Fu,
Yuling Yang,
Jiawei Song,
Pengpeng Zhao,
Zhen Wang,
Dajia Peng,
Huarong Chen,
Chu Guo,
Heliang Huang,
Wenzhao Wu,
Dexun Chen
Abstract:
We develop a high-performance tensor-based simulator for random quantum circuits (RQCs) on the new Sunway supercomputer. Our major innovations include: (1) a near-optimal slicing scheme, and a path-optimization strategy that considers both complexity and compute density; (2) a three-level parallelization scheme that scales to about 42 million cores; (3) a fused permutation and multiplication design that improves the compute efficiency for a wide range of tensor contraction scenarios; and (4) a mixed-precision scheme to further improve the performance. Our simulator effectively expands the scope of simulatable RQCs to include the 10×10 (qubits) × (1+40+1) (depth) circuit, with a sustained performance of 1.2 Eflops (single-precision) or 4.4 Eflops (mixed-precision), a new milestone for classical simulation of quantum circuits; it also reduces the simulation sampling time of Google Sycamore to 304 seconds, from the previously claimed 10,000 years.
Submitted 22 November, 2021; v1 submitted 27 October, 2021;
originally announced October 2021.
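The slicing idea in item (1) rests on a simple identity. If one shared index of a tensor contraction is fixed to each of its values, the smaller pieces are contracted independently, and the results are summed, the total equals the full contraction, so each slice can run on a different node. The sketch below checks this with NumPy `einsum` on random tensors. The shapes and the choice of sliced index are arbitrary, and nothing here reflects the Sunway simulator's actual slicing or path-optimization strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three tensors sharing bond indices j, k, m, contracted to a matrix over (i, l).
A = rng.standard_normal((4, 8))      # indices i, j
B = rng.standard_normal((8, 6, 5))   # indices j, k, m
C = rng.standard_normal((6, 5, 3))   # indices k, m, l

full = np.einsum("ij,jkm,kml->il", A, B, C)

# Slice over index j: each slice is an independent, smaller contraction.
sliced = np.zeros_like(full)
for j in range(A.shape[1]):
    sliced += np.einsum("i,km,kml->il", A[:, j], B[j], C)

print(np.allclose(full, sliced))
```

Slicing trades extra total work for lower peak memory per task and embarrassingly parallel execution, which is why choosing which indices to slice matters for large circuits.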
-
Exploring the Sensory Spaces of English Perceptual Verbs in Natural Language Data
Authors:
Roxana Girju,
David Peng
Abstract:
In this study, we explore how language captures the meaning of words, in particular meaning related to sensory experiences, as learned from statistical distributions across texts. We focus on the most frequent perception verbs of English, analyzed from an Agentive vs. Experiential distinction across the five basic sensory modalities: Visual (to look vs. to see), Auditory (to listen vs. to hear), Tactile (to touch vs. to feel), Olfactory (to smell), and Gustatory (to taste). We report on a data-driven approach based on distributional-semantic word embeddings and clustering models to identify and uncover the descriptor sensory spaces of the perception verbs. In the analysis, we identified differences and similarities among the generated descriptors based on qualitative and quantitative differences in the perceptual experience they denote. For instance, our results show that while the perceptual spaces of experiential verbs like to see and to hear reflect a more detached, logical way of knowing and learning, their agentive counterparts (to look, to listen) provide a more intentional, as well as more intimate and intuitive, way of discovering and interacting with the world around us. We believe that such an approach has high potential to expand our understanding and the applicability of such sensory spaces to different fields of social and cultural analysis. Research on the semantic organization of sensory spaces for various applications might benefit from an Agentive/Experiential account to address the complexity of multiple senses wired with each other in still unexplored ways.
Submitted 18 October, 2021;
originally announced October 2021.
-
Global and Local Texture Randomization for Synthetic-to-Real Semantic Segmentation
Authors:
Duo Peng,
Yinjie Lei,
Lingqiao Liu,
Pingping Zhang,
Jun Liu
Abstract:
Semantic segmentation is a crucial image understanding task, where each pixel of an image is assigned a category label. Since pixel-wise ground-truth labeling is tedious and labor-intensive, in practical applications many works exploit synthetic images to train models for real-world image semantic segmentation, i.e., Synthetic-to-Real Semantic Segmentation (SRSS). However, Deep Convolutional Neural Networks (CNNs) trained on the source synthetic data may not generalize well to the target real-world data. In this work, we propose two simple yet effective texture randomization mechanisms, Global Texture Randomization (GTR) and Local Texture Randomization (LTR), for Domain Generalization based SRSS. GTR randomizes the texture of source images into diverse unreal texture styles. It aims to alleviate the network's reliance on texture while promoting the learning of domain-invariant cues. In addition, we find that texture differences do not always occur across the entire image and may appear only in some local areas. Therefore, we further propose an LTR mechanism to generate diverse local regions for partially stylizing the source images. Finally, we implement a regularization of Consistency between GTR and LTR (CGL) to harmonize the two proposed mechanisms during training. Extensive experiments on five publicly available datasets (i.e., GTA5, SYNTHIA, Cityscapes, BDDS and Mapillary) with various SRSS settings (i.e., GTA5/SYNTHIA to Cityscapes/BDDS/Mapillary) demonstrate that the proposed method is superior to state-of-the-art methods for domain generalization based SRSS.
Submitted 5 August, 2021; v1 submitted 5 August, 2021;
originally announced August 2021.
-
Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation
Authors:
Duo Peng,
Yinjie Lei,
Wen Li,
Pingping Zhang,
Yulan Guo
Abstract:
Domain adaptation is critical for success when confronting a lack of annotations in a new domain. Because labeling 3D point clouds is extremely time-consuming, domain adaptation for 3D semantic segmentation is highly desirable. With the rise of multi-modal datasets, large amounts of 2D images are accessible in addition to 3D point clouds. In light of this, we propose to further leverage 2D data for 3D domain adaptation through intra- and inter-domain cross-modal learning. For intra-domain cross-modal learning, most existing works sample the dense 2D pixel-wise features to the same size as the sparse 3D point-wise features, discarding numerous useful 2D features. To address this problem, we propose Dynamic sparse-to-dense Cross Modal Learning (DsCML) to increase the sufficiency of multi-modality information interaction for domain adaptation. For inter-domain cross-modal learning, we further advance Cross Modal Adversarial Learning (CMAL) on 2D and 3D data that contain different semantic content, aiming to promote high-level modal complementarity. We evaluate our model under various multi-modality domain adaptation settings, including day-to-night, country-to-country and dataset-to-dataset, where it brings large improvements over both uni-modal and multi-modal domain adaptation methods in all settings.
Submitted 7 August, 2021; v1 submitted 30 July, 2021;
originally announced July 2021.
-
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Authors:
Shengjie Luo,
Shanda Li,
Tianle Cai,
Di He,
Dinglan Peng,
Shuxin Zheng,
Guolin Ke,
Liwei Wang,
Tie-Yan Liu
Abstract:
The attention module, a crucial component of the Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful attention modules that go beyond the dot-then-exponentiate style, e.g., Transformers with relative positional encoding (RPE). Since relative positional encoding is used by default in many state-of-the-art models, designing efficient Transformers that can incorporate RPE is appealing. In this paper, we propose a novel way to accelerate attention calculation for Transformers with RPE on top of kernelized attention. Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using the Fast Fourier Transform (FFT). With FFT, our method achieves $\mathcal{O}(n\log n)$ time complexity. Interestingly, we further demonstrate that properly using relative positional encoding can mitigate the training instability problem of vanilla kernelized attention. On a wide range of tasks, we empirically show that our models can be trained from scratch without any optimization issues. The learned model performs better than many efficient Transformer variants and is faster than the standard Transformer in the long-sequence regime.
Submitted 2 November, 2021; v1 submitted 23 June, 2021;
originally announced June 2021.
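The key primitive, a Toeplitz matrix-vector product in $\mathcal{O}(n\log n)$ time, can be sketched by embedding the $n\times n$ Toeplitz matrix $T_{ij} = b_{i-j}$ in a $2n\times 2n$ circulant matrix, which the FFT diagonalizes. This NumPy sketch shows only that linear-algebra step, with a random vector standing in for the relative-position terms $b_{i-j}$. It is not the paper's full kernelized-attention implementation, and the function names are illustrative.

```python
import numpy as np

def toeplitz_dense(c, r):
    """Dense Toeplitz matrix with first column c and first row r (r[0] must equal c[0])."""
    n = len(c)
    return np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)] for i in range(n)])

def toeplitz_matvec_fft(c, r, x):
    """Compute T @ x in O(n log n) by embedding T in a 2n x 2n circulant matrix."""
    n = len(c)
    col = np.concatenate([c, [0.0], r[:0:-1]])  # first column of the circulant embedding
    x_pad = np.concatenate([x, np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x_pad))
    return y[:n].real

rng = np.random.default_rng(0)
n = 8
b = rng.standard_normal(2 * n - 1)   # stand-in relative-position terms b_{-(n-1)}, ..., b_{n-1}
c = b[n - 1:]                        # b_0, b_1, ..., b_{n-1}      (i - j >= 0)
r = b[n - 1::-1]                     # b_0, b_{-1}, ..., b_{-(n-1)} (i - j <= 0)
x = rng.standard_normal(n)

print(np.allclose(toeplitz_dense(c, r) @ x, toeplitz_matvec_fft(c, r, x)))
```

The dense product costs $\mathcal{O}(n^2)$, while the FFT route needs three length-$2n$ transforms, which is where the $\mathcal{O}(n\log n)$ complexity quoted in the abstract comes from.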
-
Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter
Authors:
Tianwei Wang,
Yuanzhi Zhu,
Lianwen Jin,
Dezhi Peng,
Zhe Li,
Mengchao He,
Yongpan Wang,
Canjie Luo
Abstract:
Text recognition is a popular research subject with many associated challenges. Despite the considerable progress made in recent years, the text recognition task itself is still constrained to reading cropped line text images and serves as a subtask of optical character recognition (OCR) systems. As a result, the final text recognition result is limited by the performance of the text detector. In this paper, we propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA), which can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFAinference. This enables an ordinary text recognizer to process multi-line text, so that text detection can be dispensed with entirely. Specifically, we integrate IFA into the two most prevalent text recognition streams (attention-based and CTC-based) and propose attention-guided dense prediction (ADP) and Extended CTC (ExCTC). Furthermore, the Wasserstein-based Hollow Aggregation Cross-Entropy (WH-ACE) is proposed to suppress negative predictions to assist in training ADP and ExCTC. We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks while maintaining the fastest speed, and that ADP and ExCTC complement each other across different application scenarios. Code will be available at https://github.com/WangTianwei/Implicit-feature-alignment.
Submitted 10 June, 2021;
originally announced June 2021.
-
Differentiable Architecture Search for Reinforcement Learning
Authors:
Yingjie Miao,
Xingyou Song,
John D. Co-Reyes,
Daiyi Peng,
Summer Yue,
Eugene Brevdo,
Aleksandra Faust
Abstract:
In this paper, we investigate the fundamental question: To what extent are gradient-based neural architecture search (NAS) techniques applicable to RL? Using the original DARTS as a convenient baseline, we discover that the discrete architectures found can achieve up to 250% of the performance of manual architecture designs on both discrete and continuous action space environments, across off-policy and on-policy RL algorithms, at only 3x more computation time. Furthermore, through numerous ablation studies, we systematically verify that not only does DARTS correctly upweight operations during its supernet phase, but it also gradually improves the resulting discrete cells up to 30x more efficiently than random search, suggesting that DARTS is, surprisingly, an effective tool for improving architectures in RL.
Submitted 15 November, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
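For readers unfamiliar with DARTS, its continuous relaxation replaces the discrete choice among candidate operations on an edge with a softmax-weighted mixture, and the final discrete cell keeps the operation with the largest architecture weight $\alpha$. The scalar sketch below illustrates only that relaxation. The candidate operations and $\alpha$ values are hypothetical, and it omits the bilevel optimization and the RL training loop studied in the paper.

```python
import math

# Hypothetical candidate operations on one edge of a cell (scalar stand-ins for layers).
OPS = {
    "identity": lambda x: x,
    "zero": lambda x: 0.0,
    "relu_double": lambda x: max(0.0, 2.0 * x),
}

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, alphas):
    """DARTS-style mixed operation: sum over o of softmax(alpha)_o * o(x)."""
    weights = softmax([alphas[name] for name in OPS])
    return sum(w * op(x) for w, op in zip(weights, OPS.values()))

def discretize(alphas):
    """Keep the operation with the largest architecture weight."""
    return max(alphas, key=alphas.get)

alphas = {"identity": 0.1, "zero": -1.0, "relu_double": 0.8}
print(mixed_op(1.5, alphas), discretize(alphas))
```

Because the mixture is differentiable in $\alpha$, architecture weights can be updated by gradient descent alongside the network weights. The original DARTS also excludes the zero operation when deriving the discrete cell, which this sketch does not do.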
-
Perspective on Data Science
Authors:
Roger D. Peng,
Hilary S. Parker
Abstract:
The field of data science currently enjoys a broad definition that includes a wide array of activities which borrow from many other established fields of study. Having such a vague characterization of a field in the early stages might be natural, but over time maintaining such a broad definition becomes unwieldy and impedes progress. In particular, the teaching of data science is hampered by the seeming need to cover many different points of interest. Data scientists must ultimately identify the core of the field by determining what makes the field unique and what it means to develop new knowledge in data science. In this review we attempt to distill some core ideas from data science by focusing on the iterative process of data analysis and develop some generalizations from past experience. Generalizations of this nature could form the basis of a theory of data science and would serve to unify and scale the teaching of data science to large audiences.
Submitted 13 May, 2021;
originally announced May 2021.