-
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
Authors:
Qiang Zhou,
Chaohui Yu,
Zhibin Wang,
Qi Qian,
Hao Li
Abstract:
Supervised learning based object detection frameworks demand plenty of laborious manual annotations, which may not be practical in real applications. Semi-supervised object detection (SSOD) can effectively leverage unlabeled data to improve the model performance, which is of great significance for the application of object detection models. In this paper, we revisit SSOD and propose Instant-Teaching, a completely end-to-end and effective SSOD framework, which uses instant pseudo labeling with extended weak-strong data augmentations for teaching during each training iteration. To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching$^*$. Extensive experiments on both MS-COCO and PASCAL VOC datasets substantiate the superiority of our framework. Specifically, our method surpasses state-of-the-art methods by 4.2 mAP on MS-COCO when using $2\%$ labeled data. Even with full supervised information of MS-COCO, the proposed method still outperforms state-of-the-art methods by about 1.0 mAP. On PASCAL VOC, we can achieve more than 5 mAP improvement by applying VOC07 as labeled data and VOC12 as unlabeled data.
Submitted 21 March, 2021;
originally announced March 2021.
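The instant pseudo labeling described in the Instant-Teaching abstract can be illustrated with a minimal sketch. The dictionary layout of a prediction, the function name `select_pseudo_labels`, and the 0.9 confidence threshold are all hypothetical and are not taken from the paper; the sketch only shows the general idea of keeping confident detections from a weakly augmented view so that they can supervise a strongly augmented view in the same iteration.

```python
def select_pseudo_labels(predictions, threshold=0.9):
    """Keep detections whose confidence reaches the threshold.

    `predictions` is assumed to be a list of dicts with keys
    "box", "label" and "score" (a hypothetical layout), produced on a
    weakly augmented image. The returned detections would serve as
    pseudo annotations for the strongly augmented view.
    """
    return [p for p in predictions if p["score"] >= threshold]


if __name__ == "__main__":
    weak_view_predictions = [
        {"box": (10, 20, 50, 80), "label": "dog", "score": 0.97},
        {"box": (30, 40, 60, 90), "label": "cat", "score": 0.42},
    ]
    for det in select_pseudo_labels(weak_view_predictions):
        print(det["label"], det["score"])
```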
-
A preliminary study about gravitational wave radiation and cosmic heat death
Authors:
Jianming Zhang,
Qiyue Qian,
Yiqing Guo,
Xin Wang,
Xiao-Dong Li
Abstract:
We study the role of gravitational waves (GW) in the heat death of the universe. Due to GW emission, over a very long period, dynamical systems in the universe suffer persistent mechanical energy dissipation, evolving to a state of universal rest and death. With N-body simulations, we adopt a simple yet representative scheme to calculate the energy loss due to GW emission. For current dark matter systems with mass $\sim10^{12}-10^{15} M_\odot$, we estimate their GW emission timescale as $\sim10^{19}-10^{25}$ years. This timescale is significantly longer than that of any baryonic process in the universe, but still $\sim10^{80}$ times shorter than that of Hawking radiation. We stress that our analysis could be invalid due to many unknowns such as dynamical chaos, the quadrupole moment of halos, angular momentum loss, dynamical friction, central black hole accretion, dark matter decays or annihilations, the properties of dark energy, and the future evolution of the universe.
Submitted 18 May, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition
Authors:
Ming Lin,
Pichao Wang,
Zhenhong Sun,
Hesen Chen,
Xiuyu Sun,
Qi Qian,
Hao Li,
Rong Jin
Abstract:
An accuracy predictor is a key component in Neural Architecture Search (NAS) for ranking architectures. Building a high-quality accuracy predictor usually requires enormous computation. To address this issue, instead of using an accuracy predictor, we propose a novel zero-shot index dubbed Zen-Score to rank the architectures. The Zen-Score represents the network expressivity and positively correlates with the model accuracy. The calculation of Zen-Score only takes a few forward inferences through a randomly initialized network, without training network parameters. Built upon the Zen-Score, we further propose a new NAS algorithm, termed Zen-NAS, which maximizes the Zen-Score of the target network under given inference budgets. Within less than half a GPU day, Zen-NAS is able to directly search high-performance architectures in a data-free style. Compared with previous NAS methods, the proposed Zen-NAS is magnitude times faster on multiple server-side and mobile-side GPU platforms with state-of-the-art accuracy on ImageNet. Our source code and pre-trained models are released on https://github.com/idstcv/ZenNAS.
Submitted 22 August, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Beyond optimization -- supervised learning applications in relativistic laser-plasma experiments
Authors:
Jinpu Lin,
Qian Qian,
Jon Murphy,
Abigail Hsu,
Yong Ma,
Alfred Hero,
Alexander G. R. Thomas,
Karl Krushelnick
Abstract:
We explore the applications of machine learning techniques in relativistic laser-plasma experiments beyond optimization purposes. We predict the beam charge of electrons produced in a laser wakefield accelerator given the laser wavefront change caused by a deformable mirror. Machine learning enables feature analysis beyond merely searching for an optimal beam charge, showing that specific aberrations in the laser wavefront are favored in generating higher beam charges. Supervised learning models allow characterizing the measured data quality as well as recognizing irreproducible data and potential outliers. We also include virtual measurement errors in the experimental data to examine the model robustness under these conditions. This work demonstrates how machine learning methods can benefit data analysis and physics interpretation in a highly nonlinear problem of relativistic laser-plasma interaction.
Submitted 3 January, 2021; v1 submitted 11 November, 2020;
originally announced November 2020.
-
WeMix: How to Better Utilize Data Augmentation
Authors:
Yi Xu,
Asaf Noy,
Ming Lin,
Qi Qian,
Hao Li,
Rong Jin
Abstract:
Data augmentation is a widely used training trick in deep learning to improve the network generalization ability. Despite many encouraging results, several recent studies did point out limitations of the conventional data augmentation scheme in certain scenarios, calling for a better theoretical understanding of data augmentation. In this work, we develop a comprehensive analysis that reveals pros and cons of data augmentation. The main limitation of data augmentation arises from the data bias, i.e. the augmented data distribution can be quite different from the original one. This data bias leads to a suboptimal performance of existing data augmentation methods. To this end, we develop two novel algorithms, termed "AugDrop" and "MixLoss", to correct the data bias in the data augmentation. Our theoretical analysis shows that both algorithms are guaranteed to improve the effect of data augmentation through the bias correction, which is further validated by our empirical studies. Finally, we propose a generic algorithm "WeMix" by combining AugDrop and MixLoss, whose effectiveness is observed from extensive empirical evaluations.
Submitted 2 October, 2020;
originally announced October 2020.
-
Improved Knowledge Distillation via Full Kernel Matrix Transfer
Authors:
Qi Qian,
Hao Li,
Juhua Hu
Abstract:
Knowledge distillation is an effective way for model compression in deep learning. Given a large model (i.e., teacher model), it aims to improve the performance of a compact model (i.e., student model) by transferring the information from the teacher. Various information for distillation has been studied. Recently, a number of works propose to transfer the pairwise similarity between examples to distill relative information. However, most efforts are devoted to developing different similarity measurements, while only a small matrix consisting of examples within a mini-batch is transferred at each iteration, which can be inefficient for optimizing the pairwise similarity over the whole data set. In this work, we aim to transfer the full similarity matrix effectively. The main challenge comes from the size of the full matrix, which is quadratic in the number of examples. To address the challenge, we decompose the original full matrix with the Nyström method. By selecting appropriate landmark points, our theoretical analysis indicates that the loss for transfer can be further simplified. Concretely, we find that the difference between the full kernel matrices of teacher and student can be well bounded by that of the corresponding partial matrices, which only consist of similarities between original examples and landmark points. Compared with the full matrix, the size of the partial matrix is linear in the number of examples, which improves the efficiency of optimization significantly. The empirical study on benchmark data sets demonstrates the effectiveness of the proposed algorithm. Code is available at https://github.com/idstcv/KDA.
Submitted 29 March, 2022; v1 submitted 30 September, 2020;
originally announced September 2020.
-
Semi-Anchored Detector for One-Stage Object Detection
Authors:
Lei Chen,
Qi Qian,
Hao Li
Abstract:
A standard one-stage detector is comprised of two tasks: classification and regression. Anchors of different shapes are introduced for each location in the feature map to mitigate the challenge of regression for multi-scale objects. However, the performance of classification can degrade due to the highly class-imbalanced problem in anchors. Recently, many anchor-free algorithms have been proposed to classify locations directly. The anchor-free strategy benefits the classification task but can be suboptimal for the regression task due to the lack of prior bounding boxes. In this work, we propose a semi-anchored framework. Concretely, we identify positive locations in classification, and associate multiple anchors to the positive locations in regression. With ResNet-101 as the backbone, the proposed semi-anchored detector achieves 43.6% mAP on the COCO data set, which demonstrates state-of-the-art performance among one-stage detectors.
Submitted 10 September, 2020;
originally announced September 2020.
-
Reaffirming the Cosmic Acceleration without Supernova and CMB
Authors:
Xiaolin Luo,
Zhiqi Huang,
Qiyue Qian,
Lu Huang
Abstract:
Recent discussions about supernova magnitude evolution have raised doubts about the robustness of the late-universe acceleration. In a previous letter, Huang performed a null test of the cosmic acceleration by using a Parameterization based on the cosmic Age (PAge), which covers a broad class of cosmological models including the standard $Λ$ cold dark matter model and its many extensions. In this work, we continue to explore the cosmic expansion history with the PAge approximation. Using baryon acoustic oscillations (without a CMB prior on the acoustic scale), gravitational strong lens time delay, and passively evolving early galaxies as cosmic chronometers, we obtain $\gtrsim 4σ$ detections of cosmic acceleration for both flat and nonflat PAge universes. In the nonflat case, we find a $\gtrsim 3σ$ tension between the spatial curvatures derived from baryon acoustic oscillations and strong lens time delay. Implications and possible systematics are discussed.
Submitted 12 December, 2020; v1 submitted 2 August, 2020;
originally announced August 2020.
-
Using the Mark Weighted Correlation Functions to Improve the Constraints on Cosmological Parameters
Authors:
Yizhao Yang,
Haitao Miao,
Qinglin Ma,
Miaoxin Liu,
Cristiano G. Sabiu,
Jaime Forero-Romero,
Yuanzhu Huang,
Limin Lai,
Qiyue Qian,
Yi Zheng,
Xiao-Dong Li
Abstract:
We used the mark weighted correlation functions (MCFs), $W(s)$, to study the large scale structure of the Universe. We studied five types of MCFs with the weighting scheme $ρ^α$, where $ρ$ is the local density, and $α$ is taken as $-1,\ -0.5,\ 0,\ 0.5$, and 1. We found that different MCFs have very different amplitudes and scale-dependence. Some of the MCFs exhibit distinctive peaks and valleys that do not exist in the standard correlation functions. Their locations are robust against the redshifts and the background geometry; however, it is unlikely that they can be used as "standard rulers" to probe the cosmic expansion history. Nonetheless, we find that these features may be used to probe parameters related to the structure formation history, such as the values of $σ_8$ and the galaxy bias. Finally, after conducting a comprehensive analysis using the full shapes of the $W(s)$s and $W_{Δs}(μ)$s, we found that combining different types of MCFs can significantly improve the cosmological parameter constraints. Compared with using only the standard correlation function, the combinations of MCFs with $α=0,\ 0.5,\ 1$ and $α=0,\ -1,\ -0.5,\ 0.5,\ 1$ can improve the constraints on $Ω_m$ and $w$ by $\approx30\%$ and $50\%$, respectively. We find highly significant evidence that MCFs can improve cosmological parameter constraints.
Submitted 1 August, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Neural Architecture Design for GPU-Efficient Networks
Authors:
Ming Lin,
Hesen Chen,
Xiuyu Sun,
Qi Qian,
Hao Li,
Rong Jin
Abstract:
Many mission-critical systems are based on GPUs for inference. They require not only high recognition accuracy but also low latency in response time. Although many studies are devoted to optimizing the structure of deep models for efficient inference, most of them do not leverage the architecture of modern GPUs for fast inference, leading to suboptimal performance. To address this issue, we propose a general principle for designing GPU-efficient networks based on extensive empirical studies. This design principle enables us to search for GPU-efficient network structures effectively by a simple and lightweight method, as opposed to most Neural Architecture Search (NAS) methods, which are complicated and computationally expensive. Based on the proposed framework, we design a family of GPU-Efficient Networks, or GENets for short. We performed extensive evaluations on multiple GPU platforms and inference engines. While achieving $\geq 81.3\%$ top-1 accuracy on ImageNet, GENet is up to $6.4$ times faster than EfficientNet on GPU. It also outperforms most state-of-the-art models that are more efficient than EfficientNet in high precision regimes. Our source code and pre-trained models are available from https://github.com/idstcv/GPU-Efficient-Networks.
Submitted 11 August, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Towards Understanding Label Smoothing
Authors:
Yi Xu,
Yuanhong Xu,
Qi Qian,
Hao Li,
Rong Jin
Abstract:
Label smoothing regularization (LSR) has had great success in training deep neural networks with stochastic algorithms such as stochastic gradient descent and its variants. However, theoretical understanding of its power from the view of optimization is still rare. This study opens the door to a deep understanding of LSR by initiating the analysis. In this paper, we analyze the convergence behaviors of stochastic gradient descent with label smoothing regularization for solving non-convex problems and show that an appropriate LSR can help to speed up the convergence by reducing the variance. More interestingly, we propose a simple yet effective strategy, namely the Two-Stage LAbel smoothing algorithm (TSLA), that uses LSR in the early training epochs and drops it in the later training epochs. We observe from the improved convergence result of TSLA that it benefits from LSR in the first stage and essentially converges faster in the second stage. To the best of our knowledge, this is the first work for understanding the power of LSR via establishing convergence complexity of stochastic methods with LSR in non-convex optimization. We empirically demonstrate the effectiveness of the proposed method in comparison with baselines on training ResNet models over benchmark data sets.
Submitted 2 October, 2020; v1 submitted 20 June, 2020;
originally announced June 2020.
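The two-stage schedule described in the TSLA abstract (label smoothing in the early epochs, plain labels afterwards) can be sketched as follows. This is a minimal illustration, not the authors' implementation: smoothing toward the uniform distribution, the smoothing strength `theta`, and the `switch_epoch` parameter are assumptions made for the example.

```python
def smooth_label(one_hot, theta):
    """Mix a one-hot label with the uniform distribution.

    Returns (1 - theta) * y + theta / K for K classes. Using the uniform
    distribution as the smoothing target is an assumption of this sketch.
    """
    k = len(one_hot)
    return [(1.0 - theta) * y + theta / k for y in one_hot]


def tsla_target(one_hot, epoch, switch_epoch, theta=0.1):
    """Two-stage target: smoothed before `switch_epoch`, one-hot afterwards."""
    if epoch < switch_epoch:
        return smooth_label(one_hot, theta)
    return list(one_hot)


if __name__ == "__main__":
    y = [0.0, 1.0, 0.0, 0.0]
    for epoch in (0, 59, 60, 89):
        print(epoch, tsla_target(y, epoch, switch_epoch=60))
```

The targets produced in the first stage still sum to one, so they can be plugged into a standard cross-entropy loss without other changes.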
-
Weakly Supervised Representation Learning with Coarse Labels
Authors:
Yuanhong Xu,
Qi Qian,
Hao Li,
Rong Jin,
Juhua Hu
Abstract:
With the development of computational power and techniques for data collection, deep learning demonstrates a superior performance over most existing algorithms on visual benchmark data sets. Many efforts have been devoted to studying the mechanism of deep learning. One important observation is that deep learning can learn the discriminative patterns from raw materials directly in a task-dependent manner. Therefore, the representations obtained by deep learning outperform hand-crafted features significantly. However, for some real-world applications, it is too expensive to collect the task-specific labels, such as visual search in online shopping. Compared to the limited availability of these task-specific labels, their coarse-class labels are much more affordable, but representations learned from them can be suboptimal for the target task. To mitigate this challenge, we propose an algorithm to learn the fine-grained patterns for the target task, when only its coarse-class labels are available. More importantly, we provide a theoretical guarantee for this. Extensive experiments on real-world data sets demonstrate that the proposed method can significantly improve the performance of learned representations on the target task, when only coarse-class information is available for training. Code is available at https://github.com/idstcv/CoIns.
Submitted 24 August, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Hierarchically Robust Representation Learning
Authors:
Qi Qian,
Juhua Hu,
Hao Li
Abstract:
With the tremendous success of deep learning in visual tasks, the representations extracted from intermediate layers of learned models, that is, deep features, attract much attention of researchers. Previous empirical analysis shows that those features can contain appropriate semantic information. Therefore, with a model trained on a large-scale benchmark data set (e.g., ImageNet), the extracted features can work well on other tasks. In this work, we investigate this phenomenon and demonstrate that deep features can be suboptimal due to the fact that they are learned by minimizing the empirical risk. When the data distribution of the target task is different from that of the benchmark data set, the performance of deep features can degrade. Hence, we propose a hierarchically robust optimization method to learn more generic features. Considering the example-level and concept-level robustness simultaneously, we formulate the problem as a distributionally robust optimization problem with Wasserstein ambiguity set constraints, and an efficient algorithm with the conventional training pipeline is proposed. Experiments on benchmark data sets demonstrate the effectiveness of the robust deep representations.
Submitted 27 March, 2020; v1 submitted 10 November, 2019;
originally announced November 2019.
-
SoftTriple Loss: Deep Metric Learning Without Triplet Sampling
Authors:
Qi Qian,
Lei Shang,
Baigui Sun,
Juhua Hu,
Hao Li,
Rong Jin
Abstract:
Distance metric learning (DML) is to learn the embeddings where examples from the same class are closer than examples from different classes. It can be cast as an optimization problem with triplet constraints. Due to the vast number of triplet constraints, a sampling strategy is essential for DML. With the tremendous success of deep learning in classifications, it has been applied for DML. When learning embeddings with deep neural networks (DNNs), only a mini-batch of data is available at each iteration. The set of triplet constraints has to be sampled within the mini-batch. Since a mini-batch cannot capture the neighbors in the original set well, it makes the learned embeddings sub-optimal. On the contrary, optimizing SoftMax loss, which is a classification loss, with DNN shows a superior performance in certain DML tasks. It inspires us to investigate the formulation of SoftMax. Our analysis shows that SoftMax loss is equivalent to a smoothed triplet loss where each class has a single center. In real-world data, one class can contain several local clusters rather than a single one, e.g., birds of different poses. Therefore, we propose the SoftTriple loss to extend the SoftMax loss with multiple centers for each class. Compared with conventional deep metric learning algorithms, optimizing SoftTriple loss can learn the embeddings without the sampling phase by mildly increasing the size of the last fully connected layer. Experiments on the benchmark fine-grained data sets demonstrate the effectiveness of the proposed loss function. Code is available at https://github.com/idstcv/SoftTriple
Submitted 14 April, 2020; v1 submitted 11 September, 2019;
originally announced September 2019.
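The idea in the SoftTriple abstract, extending the SoftMax loss with several centers per class, can be illustrated with a small sketch. The abstract does not state the formula, so everything here is an assumption made for illustration: the softmax-weighted pooling of center similarities, the margin `delta`, the scaling factor `lam`, the temperature `gamma`, and their default values. Embeddings and centers are assumed to be unit-norm vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relaxed_similarity(x, centers, gamma):
    """Softmax-weighted average of the similarities to one class's centers."""
    sims = [dot(x, w) for w in centers]
    m = max(sims)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / gamma) for s in sims]
    z = sum(weights)
    return sum(wt / z * s for wt, s in zip(weights, sims))


def soft_triple_loss(x, label, class_centers, lam=20.0, gamma=0.1, delta=0.01):
    """Cross-entropy over per-class relaxed similarities (illustrative sketch).

    `class_centers[c]` is the list of centers of class c. The margin `delta`
    is subtracted from the similarity of the true class only.
    """
    sims = [relaxed_similarity(x, centers, gamma) for centers in class_centers]
    logits = [lam * (s - delta) if c == label else lam * s
              for c, s in enumerate(sims)]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


if __name__ == "__main__":
    centers = [[[1.0, 0.0], [0.0, 1.0]],     # two centers for class 0
               [[-1.0, 0.0], [0.0, -1.0]]]   # two centers for class 1
    print(soft_triple_loss([1.0, 0.0], 0, centers))
    print(soft_triple_loss([1.0, 0.0], 1, centers))
```

With a single center per class and `delta=0`, the sketch reduces to the ordinary SoftMax cross-entropy on scaled inner products, which mirrors the abstract's statement that SoftMax corresponds to the single-center case.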
-
On the connections between algorithmic regularization and penalization for convex losses
Authors:
Qian Qian,
Xiaoyuan Qian
Abstract:
In this work we establish the equivalence of algorithmic regularization and explicit convex penalization for generic convex losses. We introduce a geometric condition for the optimization path of a convex function, and show that if such a condition is satisfied, the optimization path of an iterative algorithm on the unregularized optimization problem can be represented as the solution path of a corresponding penalized problem.
Submitted 7 September, 2019;
originally announced September 2019.
-
GR-MHD disk winds and jets from black holes and resistive accretion disks
Authors:
Christos Vourellis,
Christian Fendt,
Qian Qian,
Scott Noble
Abstract:
We perform GR-MHD simulations of outflow launching from thin accretion disks. As in the non-relativistic case, resistivity is essential for the mass loading of the disk wind. We implemented resistivity in the ideal GR-MHD code HARM3D, extending previous works (Qian et al. 2017, 2018) for larger physical grids, higher spatial resolution, and longer simulation time. We consider an initially thin, resistive disk orbiting the black hole, threaded by a large-scale magnetic flux. As the system evolves, outflows are launched from the black hole magnetosphere and the disk surface. We mainly focus on disk outflows, investigating their MHD structure and energy output in comparison with the Poynting-dominated black hole jet. The disk wind encloses two components -- a fast component dominated by the toroidal magnetic field and a slower component dominated by the poloidal field. The disk wind transitions from sub to super-Alfvénic speed, reaching velocities $\simeq 0.1c$. We provide parameter studies varying spin parameter and resistivity level, and measure the respective mass and energy fluxes. A higher spin strengthens the $B_φ$-dominated disk wind along the inner jet. We disentangle a critical resistivity level that leads to a maximum matter and energy output for both, resulting from the interplay between re-connection and diffusion, which in combination govern the magnetic flux and the mass loading. For counter-rotating black holes the outflow structure shows a magnetic field reversal. We estimate the opacity of the inner-most accretion stream and the outflow structure around it. This stream may be critically opaque for a lensed signal, while the axial jet funnel remains optically thin.
Submitted 24 July, 2019;
originally announced July 2019.
-
DR Loss: Improving Object Detection by Distributional Ranking
Authors:
Qi Qian,
Lei Chen,
Hao Li,
Rong Jin
Abstract:
Most object detection algorithms can be categorized into two classes: two-stage detectors and one-stage detectors. Recently, many efforts have been devoted to one-stage detectors for their simple yet effective architecture. Different from two-stage detectors, one-stage detectors aim to identify foreground objects from all candidates in a single stage. This architecture is efficient but can suffer from the imbalance issue with respect to two aspects: the inter-class imbalance between the number of candidates from foreground and background classes, and the intra-class imbalance in the hardness of background candidates, where only a few candidates are hard to identify. In this work, we propose a novel distributional ranking (DR) loss to handle the challenge. For each image, we convert the classification problem to a ranking problem, which considers pairs of candidates within the image, to address the inter-class imbalance problem. Then, we push the distributions of confidence scores for foreground and background towards the decision boundary. After that, we optimize the rank of the expectations of derived distributions in lieu of original pairs. Our method not only mitigates the intra-class imbalance issue in background candidates but also improves the efficiency of the ranking algorithm. By merely replacing the focal loss in RetinaNet with the developed DR loss and applying ResNet-101 as the backbone, mAP of the single-scale test on COCO can be improved from 39.1% to 41.7% without bells and whistles, which demonstrates the effectiveness of the proposed loss function. Code is available at https://github.com/idstcv/DR_loss.
Submitted 13 April, 2020; v1 submitted 23 July, 2019;
originally announced July 2019.
-
The Implicit Bias of AdaGrad on Separable Data
Authors:
Qian Qian,
Xiaoyuan Qian
Abstract:
We study the implicit bias of AdaGrad on separable linear classification problems. We show that AdaGrad converges to a direction that can be characterized as the solution of a quadratic optimization problem with the same feasible set as the hard SVM problem. We also give a discussion about how different choices of the hyperparameters of AdaGrad might impact this direction. This provides a deeper understanding of why adaptive methods do not seem to have the generalization ability as good as gradient descent does in practice.
△ Less
Submitted 9 June, 2019;
originally announced June 2019.
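As a rough illustration of the setting in the AdaGrad entry above, the following minimal sketch runs diagonal AdaGrad on the logistic loss over a tiny linearly separable data set and reports the normalized weight direction. The function name `adagrad_direction`, the toy data, the choice of logistic loss, and all hyperparameters are assumptions made for illustration; the sketch only observes a direction empirically and does not reproduce the paper's quadratic-program characterization.

```python
import math

def adagrad_direction(X, y, lr=0.1, steps=2000, eps=1e-8):
    """Run diagonal AdaGrad on the mean logistic loss and return w / ||w||.

    Hypothetical helper: loss, step size and iteration count are assumptions.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d          # weights, started at the origin
    G = [0.0] * d          # per-coordinate sum of squared gradients
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # d/dw log(1 + exp(-margin)) = -y * x / (1 + exp(margin))
            coeff = -yi / (1.0 + math.exp(margin))
            for j in range(d):
                grad[j] += coeff * xi[j] / n
        for j in range(d):
            G[j] += grad[j] ** 2
            w[j] -= lr * grad[j] / (math.sqrt(G[j]) + eps)
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [wj / norm for wj in w]

# Toy separable data (an assumption): two points per class, symmetric in the two features.
X = [[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]]
y = [1, 1, -1, -1]
print(adagrad_direction(X, y))
```

Changing `eps` or the initial accumulator `G` is one simple way to probe how the hyperparameters might affect the limiting direction on less symmetric data.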
-
Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement
Authors:
Ming Lin,
Xiaomin Song,
Qi Qian,
Hao Li,
Liang Sun,
Shenghuo Zhu,
Rong Jin
Abstract:
Satellite-based positioning systems such as GPS often suffer from a large amount of noise that degrades the positioning accuracy dramatically, especially in real-time applications. In this work, we consider a data-mining approach to enhance the GPS signal. We build a large-scale high precision GPS receiver grid system to collect real-time GPS signals for training. Gaussian Process (GP) regression is chosen to model the vertical Total Electron Content (vTEC) distribution of the ionosphere of the Earth. Our experiments show that the noise in the real-time GPS signals often exceeds the breakdown point of conventional robust regression methods, resulting in sub-optimal system performance. We propose a three-step approach to address this challenge. In the first step, we perform a set of signal validity tests to separate the signals into clean and dirty groups. In the second step, we train an initial model on the clean signals and then reweight the dirty signals based on the residual error. In the third step, a final model is retrained on both the clean signals and the reweighted dirty signals. In the theoretical analysis, we prove that the proposed three-step approach is able to tolerate a much higher noise level than the vanilla robust regression methods if two reweighting rules are followed. We validate the superiority of the proposed method in our real-time high precision positioning system against several popular state-of-the-art robust regression methods. Our method achieves centimeter positioning accuracy in the benchmark region with probability $78.4\%$, outperforming the second best baseline method by a margin of $8.3\%$. The benchmark takes 6 hours on 20,000 CPU cores or 14 years on a single CPU.
Submitted 3 June, 2019;
originally announced June 2019.
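The three-step scheme described in the GPS entry above (split into clean and dirty signals, fit on clean and reweight dirty by residual, refit on everything) can be sketched in a few lines. This is a simplified stand-in, not the authors' method: weighted least-squares line fitting replaces GP regression, the clean/dirty split is taken as given rather than produced by validity tests, and the Gaussian-shaped weight `exp(-r^2 / (2 * scale^2))` is an assumed reweighting rule. The names `weighted_line_fit` and `three_step_fit` are hypothetical.

```python
import math

def weighted_line_fit(points, weights):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / sw
    my = sum(w * y for (_, y), w in zip(points, weights)) / sw
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights))
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights))
    slope = sxy / sxx
    return slope, my - slope * mx

def three_step_fit(clean, dirty, scale=1.0):
    """Step 1 (the clean/dirty split) is assumed to be done by the caller."""
    # Step 2: initial model on clean data, then reweight dirty data by residual.
    slope0, icpt0 = weighted_line_fit(clean, [1.0] * len(clean))
    dirty_w = [
        math.exp(-((y - (slope0 * x + icpt0)) ** 2) / (2.0 * scale ** 2))
        for x, y in dirty
    ]
    # Step 3: final model on clean data plus reweighted dirty data.
    return weighted_line_fit(clean + dirty, [1.0] * len(clean) + dirty_w)

# Toy data (an assumption): the clean points lie on y = 2x + 1;
# one dirty point agrees with that line and one is a gross outlier.
clean = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
dirty = [(5.0, 11.0), (6.0, 40.0)]
print(three_step_fit(clean, dirty))
```

In this toy setting the consistent dirty point keeps a weight close to one while the outlier is driven to a negligible weight, so the final fit stays near the line through the clean points.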
-
Effect of density on microwave-induced resistance oscillations in back-gated GaAs quantum wells
Authors:
X. Fu,
M. D. Borisov,
M. A. Zudov,
Q. Qian,
J. D. Watson,
M. J. Manfra
Abstract:
We report on microwave-induced resistance oscillations (MIROs) in a tunable-density 30-nm-wide GaAs/AlGaAs quantum well. We find that the MIRO amplitude increases dramatically with carrier density. Our analysis shows that the anticipated increase in the effective microwave power and quantum lifetime with density is not sufficient to explain the observed growth of the amplitude. We further observe that the fundamental oscillation extrema move towards cyclotron resonance with increasing density, which also contradicts theoretical predictions. These findings reveal that the density dependence is not properly captured by existing theories, calling for further studies.
Submitted 5 April, 2019;
originally announced April 2019.
-
Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee
Authors:
Ming Lin,
Shuang Qiu,
Jieping Ye,
Xiaomin Song,
Qi Qian,
Liang Sun,
Shenghuo Zhu,
Rong Jin
Abstract:
A factorization machine (FM) is a popular machine learning model for capturing second order feature interactions. An optimal learning guarantee for FM and its generalized version has not yet been developed. For a rank $k$ generalized FM of $d$ dimensional input, the previous best known sampling complexity is $\mathcal{O}[k^{3}d\cdot\mathrm{polylog}(kd)]$ under Gaussian distribution. This bound is sub-optimal compared to the information theoretical lower bound $\mathcal{O}(kd)$. In this work, we aim to tighten this bound towards optimal and generalize the analysis to sub-gaussian distribution. We prove that when the input data satisfies the so-called $τ$-Moment Invertible Property, the sampling complexity of generalized FM can be improved to $\mathcal{O}[k^{2}d\cdot\mathrm{polylog}(kd)/τ^{2}]$. When the second order self-interaction terms are excluded in the generalized FM, the bound can be improved to the optimal $\mathcal{O}[kd\cdot\mathrm{polylog}(kd)]$ up to the logarithmic factors. Our analysis also suggests that the positive semi-definite constraint in the conventional FM is redundant as it does not improve the sampling complexity while making the model difficult to optimize. We evaluate our improved FM model on a real-time high precision GPS signal calibration task to validate its superiority.
Submitted 30 January, 2019;
originally announced January 2019.
-
First detection of vibrationally excited Glycolaldehyde in the solar-type protostar IRAS 16293-2422
Authors:
Yan Zhou,
Sheng-Li Qin,
Alvaro Sanchez-Monge,
Peter Schilke,
Tie Liu,
Luis A. Zapata,
Di Li,
Yuefang Wu,
Quan Qian,
Xianghua Li
Abstract:
This paper was withdrawn from ApJ following the referee's comments; please treat its contents with caution.
Submitted 20 October, 2019; v1 submitted 3 July, 2018;
originally announced July 2018.
-
Large-scale Distance Metric Learning with Uncertainty
Authors:
Qi Qian,
Jiasheng Tang,
Hao Li,
Shenghuo Zhu,
Rong Jin
Abstract:
Distance metric learning (DML) has been studied extensively in the past decades for its superior performance with distance-based algorithms. Most of the existing methods propose to learn a distance metric with pairwise or triplet constraints. However, the number of constraints is quadratic or even cubic in the number of the original examples, which makes it challenging for DML to handle large-scale data sets. Besides, real-world data may contain various kinds of uncertainty, especially image data. The uncertainty can mislead the learning procedure and degrade performance. By investigating the image data, we find that the original data can be observed from a small set of clean latent examples with different distortions. In this work, we propose the margin preserving metric learning framework to learn the distance metric and latent examples simultaneously. By leveraging the ideal properties of latent examples, the training efficiency can be improved significantly while the learned metric also becomes robust to the uncertainty in the original data. Furthermore, we can show that the metric is learned from latent examples only, but it can preserve the large margin property even for the original data. The empirical study on the benchmark image data sets demonstrates the efficacy and efficiency of the proposed method.
Submitted 25 May, 2018;
originally announced May 2018.
-
Robust Optimization over Multiple Domains
Authors:
Qi Qian,
Shenghuo Zhu,
Jiasheng Tang,
Rong Jin,
Baigui Sun,
Hao Li
Abstract:
In this work, we study the problem of learning a single model for multiple domains. Unlike the conventional machine learning scenario where each domain can have the corresponding model, multiple domains (i.e., applications/users) may share the same machine learning model due to maintenance loads in cloud computing services. For example, a digit-recognition model should be applicable to hand-written digits, house numbers, car plates, etc. Therefore, an ideal model for cloud computing has to perform well on each applicable domain. To address this new challenge from cloud computing, we develop a framework of robust optimization over multiple domains. In lieu of minimizing the empirical risk, we aim to learn a model optimized to the adversarial distribution over multiple domains. Hence, we propose to learn the model and the adversarial distribution simultaneously with the stochastic algorithm for efficiency. Theoretically, we analyze the convergence rate for convex and non-convex models. To the best of our knowledge, we are the first to study the convergence rate of learning a robust non-convex model with a practical algorithm. Furthermore, we demonstrate that the robustness of the framework and the convergence rate can be further enhanced by appropriate regularizers over the adversarial distribution. The empirical study on real-world fine-grained visual categorization and digits recognition tasks verifies the effectiveness and efficiency of the proposed framework.
Submitted 14 November, 2018; v1 submitted 19 May, 2018;
originally announced May 2018.
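The idea in the entry above of learning a model and an adversarial distribution over domains at the same time can be illustrated with a toy min-max problem. The sketch below is an assumption-laden simplification: each domain k has a scalar quadratic loss (w - c_k)^2, the model takes full-gradient descent steps (the paper uses a stochastic algorithm), and the domain weights are updated by a multiplicative-weights ascent step. The name `robust_fit`, the losses, and the step sizes are all hypothetical.

```python
import math

def robust_fit(centers, lr_w=0.05, lr_p=0.05, steps=5000):
    """Toy solver for min_w max_p sum_k p_k * (w - c_k)^2 with p on the simplex."""
    k = len(centers)
    w = 0.0
    p = [1.0 / k] * k                      # adversarial distribution over domains
    for _ in range(steps):
        losses = [(w - c) ** 2 for c in centers]
        # Descent direction for the model under the current domain weights.
        grad_w = sum(pk * 2.0 * (w - c) for pk, c in zip(p, centers))
        # Ascent step for p: upweight domains with larger loss, then renormalize.
        scaled = [pk * math.exp(lr_p * lk) for pk, lk in zip(p, losses)]
        total = sum(scaled)
        p = [s / total for s in scaled]
        w -= lr_w * grad_w
    return w, p

# Three toy domains (an assumption) whose individual optima are 0, 1 and 4.
centers = [0.0, 1.0, 4.0]
w_robust, p_adv = robust_fit(centers)
w_erm = sum(centers) / len(centers)        # minimizer of the average loss
print(w_robust, p_adv, w_erm)
```

On this toy problem the adversarial weights concentrate on the two extreme domains, and the resulting model trades a slightly higher average loss for a lower worst-domain loss than the average-loss minimizer.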
-
Jet launching in resistive GR-MHD black hole - accretion disk systems
Authors:
Qian Qian,
Christian Fendt,
Christos Vourellis
Abstract:
We investigate the launching mechanism of relativistic jets from black hole sources, in particular the strong winds from the surrounding accretion disk. Numerical investigations of the disk wind launching - the simulation of the accretion-ejection transition - have so far almost only been done for non-relativistic systems. From these simulations we know that resistivity, or magnetic diffusivity, plays an important role for the launching process. Here, we extend this treatment to general relativistic magnetohydrodynamics (GR-MHD) applying the resistive GR-MHD code rHARM. Our model setup considers a thin accretion disk threaded by a large-scale open magnetic field. We run a series of simulations with different Kerr parameters, field strengths and diffusivity levels. Indeed we find strong disk winds with, however, mildly relativistic speed, the latter most probably due to our limited computational domain. Further, we find that magnetic diffusivity lowers the efficiency of accretion and ejection, as it weakens the efficiency of the magnetic lever arm of the disk wind. We identify the toroidal magnetic field pressure gradient as the major driving force of the disk wind; however, magneto-centrifugal driving may also contribute. Black hole rotation in our simulations suppresses the accretion rate due to an enhanced toroidal magnetic field pressure that seems to be induced by frame-dragging. Comparing the energy fluxes from the Blandford-Znajek-driven central spine and the surrounding disk wind, we find that the total electromagnetic energy flux is dominated by the total matter energy flux of the disk wind (by a factor 20). The kinetic energy flux of the matter outflow is comparatively small and comparable to the Blandford-Znajek electromagnetic energy flux.
Submitted 25 April, 2018;
originally announced April 2018.
-
Viewport Adaptation-Based Immersive Video Streaming: Perceptual Modeling and Applications
Authors:
Shaowei Xie,
Qiu Shen,
Yiling Xu,
Qiaojian Qian,
Shaowei Wang,
Zhan Ma,
Wenjun Zhang
Abstract:
Immersive video offers the freedom to navigate inside a virtualized environment. Instead of streaming the bulky immersive videos entirely, viewport (also referred to as field of view, FoV) adaptive streaming is preferred. We often stream the high-quality content within the current viewport, while reducing the quality of representation elsewhere to save network bandwidth. Considering that the quality can be refined when the user focuses on a new FoV, in this paper we model the perceptual impact of the quality variations (through adapting the quantization stepsize and spatial resolution) with respect to the refinement duration, and yield a product of two closed-form exponential functions that well explain the joint quantization and resolution induced quality impact. The analytical model is cross-validated using another set of data, where both Pearson and Spearman's rank correlation coefficients are close to 0.98. Our work is devised to optimize the adaptive FoV streaming of the immersive video under limited network resources. Numerical results show that our proposed model significantly improves the quality of experience of users, with about 9.36% BD-Rate (Bjontegaard Delta Rate) improvement on average as compared to other representative methods, particularly under limited bandwidth.
Submitted 16 February, 2018;
originally announced February 2018.
-
Possible nematic to smectic phase transition in a two-dimensional electron gas at half-filling
Authors:
Q. Qian,
J. Nakamura,
S. Fallahi,
G. C. Gardner,
M. J. Manfra
Abstract:
Liquid crystalline phases of matter permeate nature and technology, with examples ranging from cell membranes to liquid-crystal displays. Remarkably, electronic liquid crystal phases can exist in two-dimensional electron systems (2DES) at half Landau level filling in the quantum Hall regime. Theory has predicted the existence of a liquid crystal smectic phase that breaks both rotational and translational symmetries. However, previous experiments in 2DES are most consistent with an anisotropic nematic phase breaking only rotational symmetry. Here we report three transport phenomena at half-filling in ultra-low disorder 2DES: a non-monotonic temperature dependence of the sample resistance, dramatic onset of large time-dependent resistance fluctuations, and a sharp feature in the differential resistance suggestive of depinning. These data suggest that a sequence of symmetry-breaking phase transitions occurs as temperature is lowered: first a transition from an isotropic liquid to a nematic phase and finally to a liquid crystal smectic phase.
Submitted 17 November, 2017;
originally announced November 2017.
-
Microwave-induced resistance oscillations in a back-gated GaAs quantum well
Authors:
X. Fu,
Q. A. Ebner,
Q. Shi,
M. A. Zudov,
Q. Qian,
M. J. Manfra
Abstract:
We performed effective mass measurements employing microwave-induced resistance oscillations in a tunable-density GaAs/AlGaAs quantum well. Our main result is a clear observation of an effective mass increase with decreasing density, in general agreement with earlier studies which investigated the density dependence of the effective mass employing Shubnikov-de Haas oscillations. This finding provides further evidence that microwave-induced resistance oscillations are sensitive to electron-electron interactions and offer a convenient and accurate way to obtain the effective mass.
Submitted 31 August, 2017;
originally announced August 2017.
-
Assigning personality/identity to a chatting machine for coherent conversation generation
Authors:
Qiao Qian,
Minlie Huang,
Haizhou Zhao,
Jingfang Xu,
Xiaoyan Zhu
Abstract:
Endowing a chatbot with personality or an identity is quite challenging but critical to deliver more realistic and natural conversations. In this paper, we address the issue of generating responses that are coherent to a pre-specified agent profile. We design a model consisting of three modules: a profile detector to decide whether a post should be responded using the profile and which key should be addressed, a bidirectional decoder to generate responses forward and backward starting from a selected profile value, and a position detector that predicts a word position from which decoding should start given a selected profile value. We show that general conversation data from social media can be used to generate profile-coherent responses. Manual and automatic evaluation shows that our model can deliver more coherent, natural, and diversified responses.
Submitted 21 June, 2017; v1 submitted 9 June, 2017;
originally announced June 2017.
-
Quantum Lifetime in Ultra-High Quality GaAs Quantum Wells: Relationship to $Δ_{5/2}$ and Impact of Density Fluctuations
Authors:
Qi Qian,
James R. Nakamura,
Saeed Fallahi,
Geoffrey C. Gardner,
John D. Watson,
Silvia Luscher,
Joshua A. Folk,
Gabor A. Csathy,
Michael J. Manfra
Abstract:
We consider quantum lifetime derived from low-field Shubnikov-de Haas oscillations as a metric of quality of the two-dimensional electron gas in GaAs quantum wells that expresses large excitation gaps in the fractional quantum Hall states of the N=1 Landau level. Analysis indicates two salient features: 1) small density inhomogeneities dramatically impact the amplitude of Shubnikov-de Haas oscillations such that the canonical method (cf. Coleridge, Phys. Rev. B \textbf{44}, 3793) for determination of quantum lifetime substantially underestimates $τ_q$ unless density inhomogeneity is explicitly considered; 2) $τ_q$ does not correlate well with quality as measured by $Δ_{5/2}$, the excitation gap of the fractional quantum Hall state at 5/2 filling.
Submitted 20 April, 2017;
originally announced April 2017.
-
High temperature resistivity measured at ν = 5/2 as a predictor of 2DEG quality in the N=1 Landau level
Authors:
Qi Qian,
James R. Nakamura,
Saeed Fallahi,
Geoffrey C. Gardner,
John D. Watson,
Michael J. Manfra
Abstract:
We report a high temperature (T = 0.3K) indicator of the excitation gap $Δ_{5/2}$ at the filling factor $ ν=5/2$ fractional quantum Hall state in ultra-high quality AlGaAs/GaAs two-dimensional electron gases. As the lack of correlation between mobility $μ$ and $Δ_{5/2}$ has been well established in previous experiments, we define, analyze and discuss the utility of a different metric $ρ_{5/2}$, the resistivity at $ν=5/2$, as a high temperature predictor of $Δ_{5/2}$. This high-field resistivity reflects the scattering rate of composite fermions. Good correlation between $ρ_{5/2}$ and $Δ_{5/2}$ is observed in both a density tunable device and in a series of identically structured wafers with similar density but vastly different mobility. This correlation can be explained by the fact that both $ρ_{5/2}$ and $Δ_{5/2}$ are sensitive to long-range disorder from remote impurities, while $μ$ is sensitive primarily to disorder localized near the quantum well.
Submitted 12 April, 2017;
originally announced April 2017.
-
Effect of density on quantum Hall stripe orientation in tilted magnetic fields
Authors:
Q. Shi,
M. A. Zudov,
Q. Qian,
J. D. Watson,
M. J. Manfra
Abstract:
We investigate quantum Hall stripes under in-plane magnetic field $B_\parallel$ in a variable-density two-dimensional electron gas. At filling factor $ν= 9/2$, we observe one, two, and zero $B_\parallel$-induced reorientations at low, intermediate, and high densities, respectively. The appearance of these distinct regimes is due to a strong density dependence of the $B_\parallel$-induced orienting mechanism which triggers the second reorientation, rendering stripes \emph{parallel} to $B_\parallel$. In contrast, the mechanism which reorients stripes perpendicular to $B_\parallel$ showed no noticeable dependence on density. Measurements at $ν= 9/2$ and $11/2$ at the same, tilted magnetic field allow us to rule out density dependence of the native symmetry-breaking field as a dominant factor. Our findings further suggest that screening might play an important role in determining stripe orientation, providing guidance in developing theories aimed at identifying and describing native and $B_\parallel$-induced symmetry-breaking fields.
Submitted 11 April, 2017;
originally announced April 2017.
-
$L^p$ solutions of doubly reflected BSDEs under general assumptions
Authors:
Shengjun Fan,
Qianyun Qian
Abstract:
Under a generalized Mokobodzki condition for reflected BSDEs with two continuous barriers which relates the growth of the generator $g$ and that of the barriers, we establish several existence and uniqueness results on $L^p\ (p>1)$ solutions of doubly reflected BSDEs with generators satisfying a one-sided Osgood condition together with a general growth in the state variable $y$, and a uniform continuity condition or a linear growth condition in the state variable $z$. This Mokobodzki condition is also proved to be necessary for existence of the $L^p$ solutions. And, we prove that the $L^p$ solutions can be approximated by the penalization method and by some sequences of the $L^p$ solutions of doubly reflected BSDEs.
Submitted 21 February, 2021; v1 submitted 15 January, 2017;
originally announced January 2017.
-
Linguistically Regularized LSTMs for Sentiment Classification
Authors:
Qiao Qian,
Minlie Huang,
Jinhao Lei,
Xiaoyan Zhu
Abstract:
Sentiment understanding has been a long-term goal of AI in the past decades. This paper deals with sentence-level sentiment classification. Though a variety of neural network models have been proposed very recently, previous models either depend on expensive phrase-level annotation, so that their performance drops substantially when trained with only sentence-level annotation, or do not fully employ linguistic resources (e.g., sentiment lexicons, negation words, intensity words) and thus cannot produce linguistically coherent representations. In this paper, we propose simple models trained with sentence-level annotation that also attempt to generate linguistically coherent representations by employing regularizers that model the linguistic role of sentiment lexicons, negation words, and intensity words. Results show that our models are effective at capturing the sentiment shifting effect of sentiment, negation, and intensity words, while still obtaining competitive results without sacrificing the models' simplicity.
Submitted 25 April, 2017; v1 submitted 11 November, 2016;
originally announced November 2016.
-
rHARM: Accretion and Ejection in Resistive GR-MHD
Authors:
Qian Qian,
Christian Fendt,
Scott Noble,
Matteo Bugli
Abstract:
Turbulent magnetic diffusivity plays an important role for accretion disks and the launching of disk winds. We have implemented magnetic diffusivity, that is, resistivity, in the general relativistic MHD code HARM. This paper describes the theoretical background of our implementation, its numerical realization, our numerical tests and preliminary applications. The test simulations of the new code rHARM are compared with an analytic solution of the diffusion equation and a classical shock tube problem. We have further investigated the evolution of the magneto-rotational instability (MRI) in tori around black holes for a range of magnetic diffusivities. We find indications of a critical magnetic diffusivity (for our setup) beyond which no MRI develops in the linear regime and for which accretion of torus material to the black hole is delayed. Preliminary simulations of magnetically diffusive thin accretion disks around Schwarzschild black holes that are threaded by a large-scale poloidal magnetic field show the launching of disk winds with mass fluxes of about 50% of the accretion rate. The disk magnetic diffusivity allows for efficient disk accretion that replenishes the mass reservoir of the inner disk area and thus allows for long-term simulations of wind launching for more than 5000 time units.
Submitted 14 October, 2016;
originally announced October 2016.
-
Similarity Learning via Adaptive Regression and Its Application to Image Retrieval
Authors:
Qi Qian,
Inci M. Baytas,
Rong Jin,
Anil Jain,
Shenghuo Zhu
Abstract:
We study the problem of similarity learning and its application to image retrieval with large-scale data. The similarity between pairs of images can be measured by the distances between their high dimensional representations, and the problem of learning the appropriate similarity is often addressed by distance metric learning. However, distance metric learning requires the learned metric to be a PSD matrix, which is computationally expensive and not necessary for the retrieval ranking problem. On the other hand, the bilinear model is shown to be more flexible for large-scale image retrieval tasks; hence, we adopt it to learn a matrix for estimating pairwise similarities under the regression framework. By adaptively updating the target matrix in regression, we can mimic the hinge loss, which is more appropriate for the similarity learning problem. Although the regression problem can have a closed-form solution, the computational cost can be very high. The computational challenges come from two aspects: the number of images can be very large and image features have high dimensionality. We address the first challenge by compressing the data with a randomized algorithm that has a theoretical guarantee. We address the high dimensionality by making a low rank assumption and applying an alternating method to obtain the partial matrix, which has a global optimal solution. Empirical studies on real world image datasets (i.e., Caltech and ImageNet) demonstrate the effectiveness and efficiency of the proposed method.
Submitted 5 December, 2015;
originally announced December 2015.
-
Towards Making High Dimensional Distance Metric Learning Practical
Authors:
Qi Qian,
Rong Jin,
Lijun Zhang,
Shenghuo Zhu
Abstract:
In this work, we study distance metric learning (DML) for high dimensional data. A typical approach for DML with high dimensional data is to perform the dimensionality reduction first before learning the distance metric. The main shortcoming of this approach is that it may result in a suboptimal solution due to the subspace removed by the dimensionality reduction method. In this work, we present a dual random projection frame for DML with high dimensional data that explicitly addresses the limitation of dimensionality reduction for DML. The key idea is to first project all the data points into a low dimensional space by random projection, and compute the dual variables using the projected vectors. It then reconstructs the distance metric in the original space using the estimated dual variables. The proposed method, on one hand, enjoys the light computation of random projection, and on the other hand, alleviates the limitation of most dimensionality reduction methods. We verify both empirically and theoretically the effectiveness of the proposed algorithm for high dimensional DML.
△ Less
Submitted 14 September, 2015;
originally announced September 2015.
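The first step described in the abstract above, projecting the data into a low dimensional space by random projection, can be sketched as follows. This covers only the projection; the dual-variable estimation and the reconstruction of the metric in the original space are not shown. The Gaussian projection matrix, the sizes n, d, and m, and the helper `sq_dists` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 2000, 400              # hypothetical sizes: n points, d -> m dims

X = rng.standard_normal((n, d))      # stand-in for high dimensional data
R = rng.standard_normal((d, m)) / np.sqrt(m)   # scaled Gaussian projection
Z = X @ R                            # projected data, shape (n, m)

def sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    G = A @ A.T
    sq = np.diag(G)
    return sq[:, None] + sq[None, :] - 2.0 * G

# Compare distances before and after projection for every distinct pair.
iu = np.triu_indices(n, k=1)
ratio = sq_dists(Z)[iu] / sq_dists(X)[iu]
```

With this scaling the projected squared distances equal the original ones in expectation, so the entries of `ratio` cluster around 1.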
-
Gate-tunable high mobility remote-doped InSb/In_{1-x}Al_{x}Sb quantum well heterostructures
Authors:
Wei Yi,
Andrey A. Kiselev,
Jacob Thorp,
Ramsey Noah,
Binh-Minh Nguyen,
Steven Bui,
Rajesh D. Rajavel,
Tahir Hussain,
Mark Gyure,
Philip Kratz,
Qi Qian,
Michael J. Manfra,
Vlad S. Pribiag,
Leo P. Kouwenhoven,
Charles M. Marcus,
Marko Sokolich
Abstract:
Gate-tunable high-mobility InSb/In_{1-x}Al_{x}Sb quantum wells (QWs) grown on GaAs substrates are reported. The QW two-dimensional electron gas (2DEG) channel mobility in excess of 200,000 cm^{2}/Vs is measured at T=1.8K. In asymmetrically remote-doped samples with an HfO_{2} gate dielectric formed by atomic layer deposition, parallel conduction is eliminated and complete 2DEG channel depletion is reached with minimal hysteresis in gate bias response of the 2DEG electron density. The integer quantum Hall effect with Landau level filling factor down to 1 is observed. A high-transparency non-alloyed Ohmic contact to the 2DEG with contact resistance below 1 Ω·mm is achieved at 1.8K.
Submitted 23 March, 2015;
originally announced March 2015.
-
Fine-Grained Visual Categorization via Multi-stage Metric Learning
Authors:
Qi Qian,
Rong Jin,
Shenghuo Zhu,
Yuanqing Lin
Abstract:
Fine-grained visual categorization (FGVC) aims to categorize objects into subordinate classes instead of basic classes. One major challenge in FGVC is the co-occurrence of two issues: 1) many subordinate classes are highly correlated and difficult to distinguish, and 2) there is large intra-class variation (e.g., due to object pose). This paper proposes to explicitly address these two issues via distance metric learning (DML). DML addresses the first issue by learning an embedding in which data points from the same class are pulled together while those from different classes are pushed apart; it addresses the second issue by allowing the flexibility that only a portion of the neighbors (not all data points) from the same class need to be pulled together. However, the feature representation of an image is often high dimensional, and DML is known to have difficulty dealing with high dimensional feature vectors, since it would require $\mathcal{O}(d^2)$ for storage and $\mathcal{O}(d^3)$ for optimization. To this end, we propose a multi-stage metric learning framework that divides the large-scale high dimensional learning problem into a series of simple subproblems, achieving $\mathcal{O}(d)$ computational complexity. The empirical study with FGVC benchmark datasets verifies that our method is both effective and efficient compared to state-of-the-art FGVC approaches.
Submitted 4 June, 2015; v1 submitted 3 February, 2014;
originally announced February 2014.
-
Efficient Distance Metric Learning by Adaptive Sampling and Mini-Batch Stochastic Gradient Descent (SGD)
Authors:
Qi Qian,
Rong Jin,
Jinfeng Yi,
Lijun Zhang,
Shenghuo Zhu
Abstract:
Distance metric learning (DML) is an important task that has found applications in many domains. The high computational cost of DML arises from the large number of variables to be determined and the constraint that a distance metric has to be a positive semi-definite (PSD) matrix. Although stochastic gradient descent (SGD) has been successfully applied to improve the efficiency of DML, it can still be computationally expensive: to ensure that the solution is a PSD matrix, it has to project the updated distance metric onto the PSD cone at every iteration, which is an expensive operation. We address this challenge by developing two strategies within SGD, i.e., mini-batch and adaptive sampling, to effectively reduce the number of updates (i.e., projections onto the PSD cone) in SGD. We also develop hybrid approaches that combine the strength of adaptive sampling with that of mini-batch online learning techniques to further improve the computational efficiency of SGD for DML. We prove theoretical guarantees for both the adaptive sampling and the mini-batch based approaches for DML. We also conduct an extensive empirical study to verify the effectiveness of the proposed algorithms for DML.
Submitted 3 April, 2013;
originally announced April 2013.
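The cost that the abstract above attributes to SGD for DML, one projection onto the PSD cone per update, and the mini-batch remedy of projecting once per batch, can be sketched as follows. The projection via eigendecomposition is the standard Frobenius-norm projection; the triplet hinge loss, the learning rate, the margin, and the function names are assumptions made for illustration and are not taken from the paper. Adaptive sampling is not shown.

```python
import numpy as np

def project_to_psd(M):
    """Project a square matrix onto the PSD cone (Frobenius norm)."""
    S = (M + M.T) / 2.0                       # symmetrize first
    eigvals, eigvecs = np.linalg.eigh(S)      # O(d^3) eigendecomposition
    # Zero out negative eigenvalues and rebuild V diag(lambda) V^T.
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

def minibatch_step(M, triplets, lr=0.1, margin=1.0):
    """One mini-batch SGD step on a triplet hinge loss, with ONE projection."""
    grad = np.zeros_like(M)
    for a, p, n in triplets:                  # anchor, similar, dissimilar
        dp, dn = a - p, a - n
        # Hinge is active when the dissimilar point is not far enough away.
        if margin + dp @ M @ dp - dn @ M @ dn > 0:
            grad += np.outer(dp, dp) - np.outer(dn, dn)
    return project_to_psd(M - lr * grad / len(triplets))

# Hypothetical data: 8 random triplets in 5 dimensions.
rng = np.random.default_rng(0)
d = 5
triplets = [tuple(rng.standard_normal((3, d))) for _ in range(8)]
M_new = minibatch_step(np.eye(d), triplets)
```

Averaging the gradient over the batch before projecting means the eigendecomposition runs once per batch rather than once per triplet.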
-
Metric Learning across Heterogeneous Domains by Respectively Aligning Both Priors and Posteriors
Authors:
Qiang Qian,
Songcan Chen
Abstract:
In this paper, we attempt to learn a single metric across two heterogeneous domains, where the source domain is fully labeled and has many samples while the target domain has only a few labeled samples but abundant unlabeled samples. To the best of our knowledge, this task has seldom been addressed. The proposed learning model has a simple underlying motivation: all the samples in both the source and the target domains are mapped into a common space, where both their priors P(sample) and their posteriors P(label|sample) are forced to be respectively aligned as much as possible. We show that the two mappings, from the source domain and from the target domain to the common space, can be reparameterized into a single positive semi-definite (PSD) matrix. We then develop an efficient Bregman projection algorithm to optimize the PSD matrix, using a LogDet function as the regularizer. Furthermore, we show that this model can be easily kernelized, and we verify its effectiveness on a cross-language retrieval task and a cross-domain object recognition task.
Submitted 9 August, 2012;
originally announced August 2012.
-
Raman Model Predicting Hardness of Covalent Crystals
Authors:
Xiang-Feng Zhou,
Quang-Rui Qian,
Jian Sun,
Yongjun Tian,
Hui-Tian Wang
Abstract:
Based on the fact that both hardness and the vibrational Raman spectrum depend on the intrinsic properties of chemical bonds, we propose a new theoretical model for predicting the hardness of a covalent crystal. The quantitative relationship between hardness and vibrational Raman frequencies, deduced from typical zincblende covalent crystals, is shown to apply also to complex multicomponent crystals. This model enables us to nondestructively and indirectly characterize the hardness of novel superhard materials synthesized under ultra-high pressure conditions using in situ Raman spectrum measurements.
Submitted 25 December, 2009;
originally announced December 2009.