
Neural Network-Based Tracking and 3D Reconstruction of Baseball Pitch Trajectories from Single-View 2D Video

Abstract: In this paper, we present a neural network-based approach for tracking and reconstructing the trajectories of baseball pitches from 2D video footage to 3D coordinates. We use OpenCV's CSRT algorithm to accurately track the baseball and fixed reference points in 2D video frames. These tracked pixel coordinates are then used as input features for our neural network model, which comprises multiple fully connected layers that map the 2D coordinates to 3D space. The model is trained on a dataset of labeled trajectories with a mean squared error loss function and the Adam optimizer. Our experimental results demonstrate that this approach achieves high accuracy in reconstructing 3D trajectories from 2D inputs. This method shows great potential for applications in sports analysis, coaching, and enhancing the accuracy of trajectory predictions in various sports.
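To make the described model shape concrete, here is a minimal forward-pass sketch in plain Python. It shows fully connected layers that map tracked pixel coordinates to a 3D point, plus the mean squared error used for training. Several details are hypothetical choices, not the authors' configuration: the layer sizes, the ReLU activation, the uniform initialization, and the input layout (ball position plus two reference points, six pixel values). The Adam training loop is omitted.

```python
import random

def init_mlp(sizes, seed=0):
    # Hypothetical layer sizes, e.g. [6, 32, 32, 3]; weights drawn uniformly in [-0.1, 0.1].
    rng = random.Random(seed)
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    # Fully connected layers with ReLU between them and a linear output layer.
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def mse(pred, target):
    # Mean squared error over the three output coordinates (x, y, z).
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Hypothetical input: ball (u, v) followed by two fixed reference points, in pixels.
features = [640.0, 360.0, 100.0, 700.0, 1180.0, 700.0]
model = init_mlp([6, 32, 32, 3])
xyz = forward(model, features)
print(len(xyz), mse(xyz, [0.0, 18.4, 1.0]))  # hypothetical 3D label
```

An untrained network like this one produces arbitrary outputs. The sketch only fixes the input/output shapes and the loss that an optimizer such as Adam would minimize.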

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.10238 [pdf, other]

Rounding Large Independent Sets on Expanders

Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: We develop a new approach for approximating large independent sets when the input graph is a one-sided spectral expander - that is, the uniform random walk matrix of the graph has the second eigenvalue bounded away from 1. Consequently, we obtain a polynomial time algorithm to find linear-sized independent sets in one-sided expanders that are almost $3$-colorable or are promised to contain an independent set of size $(1/2-ε)n$. Our second result above can be refined to require only a weaker vertex expansion property with an efficient certificate. Somewhat surprisingly, we observe that the analogous task of finding a linear-sized independent set in almost $4$-colorable one-sided expanders (even when the second eigenvalue is $o_n(1)$) is NP-hard, assuming the Unique Games Conjecture. All prior algorithms that beat the worst-case guarantees for this problem rely on bottom eigenspace enumeration techniques (following the classical spectral methods of Alon and Kahale) and require two-sided expansion, meaning a bounded number of negative eigenvalues of magnitude $Ω(1)$. Such techniques naturally extend to almost $k$-colorable graphs for any constant $k$, in contrast to analogous guarantees on one-sided expanders, which are Unique Games-hard to achieve for $k \geq 4$. Our rounding builds on the method of simulating multiple samples from a pseudodistribution introduced by Barak et al. for rounding Unique Games instances.
The key to our analysis is a new clustering property of large independent sets in expanding graphs - every large independent set has a larger-than-expected intersection with some member of a small list - and its formalization in the low-degree sum-of-squares proof system.
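The abstract defines one-sided expansion through the second eigenvalue of the random walk matrix $M = A/d$. The sketch below estimates that eigenvalue for a small $d$-regular graph. It is not the paper's rounding algorithm; it only illustrates the definition. We assume a $d$-regular input, for which $M$ is symmetric with top eigenvector the all-ones vector. Power iteration runs on the lazy walk $(M+I)/2$, so the algebraically largest eigenvalue is found, not the largest in magnitude. That distinction is exactly the one-sided versus two-sided difference.

```python
import math
import random

def second_eigenvalue(adj, iters=500, seed=0):
    """Estimate lambda_2 (second-largest, not largest |.|) of M = A / d for a d-regular graph.

    adj[i] lists the neighbors of vertex i (undirected, no multi-edges assumed).
    """
    n = len(adj)
    d = len(adj[0])
    if any(len(nbrs) != d for nbrs in adj):
        raise ValueError("graph must be d-regular")

    def lazy_walk(v):
        # (M + I) / 2 has all eigenvalues in [0, 1], so power iteration picks out
        # the algebraically largest eigenvalue rather than the largest in magnitude.
        return [(v[i] + sum(v[j] for j in adj[i]) / d) / 2 for i in range(n)]

    def project_and_normalize(v):
        # Remove the all-ones direction (eigenvalue 1 of M), then rescale to unit length.
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("iterate vanished; lazy walk is zero on the complement")
        return [x / norm for x in v]

    rng = random.Random(seed)
    v = project_and_normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(iters):
        v = project_and_normalize(lazy_walk(v))
    mu = sum(a * b for a, b in zip(v, lazy_walk(v)))  # Rayleigh quotient of the lazy walk
    return 2 * mu - 1  # undo the (M + I) / 2 shift

cycle6 = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]
k4 = [[j for j in range(4) if j != i] for i in range(4)]
print(second_eigenvalue(cycle6))  # close to 0.5
print(second_eigenvalue(k4))      # close to -1/3
```

The 6-cycle is bipartite, so $M$ also has eigenvalue $-1$. A two-sided condition on $|\lambda|$ would reject it. The one-sided quantity computed here ignores that eigenvalue and reports $\cos(2\pi/6) = 0.5$.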

Submitted 16 May, 2024; originally announced May 2024.

Comments: 57 pages, 3 figures

arXiv:2405.05373 [pdf, other]

Certifying Euclidean Sections and Finding Planted Sparse Vectors Beyond the $\sqrt{n}$ Dimension Threshold

Authors: Venkatesan Guruswami, Jun-Ting Hsieh, Prasad Raghavendra

Abstract: We consider the task of certifying that a random $d$-dimensional subspace $X$ in $\mathbb{R}^n$ is well-spread - every vector $x \in X$ satisfies $c\sqrt{n} \|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\|x\|_2$. In a seminal work, Barak et al. showed a polynomial-time certification algorithm when $d \leq O(\sqrt{n})$. On the other hand, when $d \gg \sqrt{n}$, the certification task is information-theoretically possible but there is evidence that it is computationally hard [MW21,Cd22], a phenomenon known as the information-computation gap. In this paper, we give subexponential-time certification algorithms in the $d \gg \sqrt{n}$ regime. Our algorithm runs in time $\exp(\widetilde{O}(n^{\varepsilon}))$ when $d \leq \widetilde{O}(n^{(1+\varepsilon)/2})$, establishing a smooth trade-off between runtime and the dimension. Our techniques naturally extend to the related planted problem, where the task is to recover a sparse vector planted in a random subspace. Our algorithm achieves the same runtime and dimension trade-off for this task.
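By Cauchy-Schwarz, the upper bound $\|x\|_1 \leq \sqrt{n}\|x\|_2$ holds for every vector. The content of "well-spread" is therefore the lower bound with constant $c$. The small sketch below computes the ratio $\|x\|_1 / (\sqrt{n}\|x\|_2)$ for a single vector. A flat vector attains ratio 1, and a sparse vector scores much lower, which is why a planted sparse vector breaks well-spreadness. Evaluating individual vectors is not a certificate for all of $X$; certifying the whole subspace is what the paper's algorithms do.

```python
import math

def spread_ratio(x):
    # ||x||_1 / (sqrt(n) * ||x||_2); always in (0, 1] for a nonzero x.
    n = len(x)
    l1 = sum(abs(t) for t in x)
    l2 = math.sqrt(sum(t * t for t in x))
    return l1 / (math.sqrt(n) * l2)

print(spread_ratio([1.0] * 16))          # flat vector: 1.0
print(spread_ratio([1.0] + [0.0] * 15))  # 1-sparse vector: 1/sqrt(16) = 0.25
```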

Submitted 8 May, 2024; originally announced May 2024.

Comments: 32 pages, 2 figures

arXiv:2404.09432 [pdf, other]

The 8th AI City Challenge

Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, **-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, featuring significant enhancements in camera count, number of characters, 3D annotation, and camera matrices, alongside new rules for 3D tracking and incentives for online tracking algorithms. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.

Submitted 14 April, 2024; originally announced April 2024.

Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

arXiv:2403.15004 [pdf]

ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding

Authors: Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Jun-Wei Hsieh, Hui-Kai Su, Wen-Kai Kuo

Abstract: This work presents ParFormer as an enhanced transformer architecture that allows the incorporation of different token mixers into a single stage, hence improving feature extraction capabilities. Integrating both local and global data allows for precise representation of short- and long-range spatial relationships without the need for computationally intensive methods such as shifting windows. Along with the parallel token mixer encoder, we offer the Convolutional Attention Patch Embedding (CAPE) as an enhancement of standard patch embedding to improve token mixer extraction with a convolutional attention module. Our comprehensive evaluation demonstrates that our ParFormer outperforms CNN-based and state-of-the-art transformer-based architectures in image classification and several complex tasks such as object recognition. The proposed CAPE has been demonstrated to benefit the overall MetaFormer architecture, even while utilizing the Identity Mapping Token Mixer, resulting in a 0.5% increase in accuracy. In accuracy, the ParFormer models outperform ConvNeXt and Swin Transformer, representative pure convolution and pure transformer models, respectively. Furthermore, our model surpasses the current leading hybrid transformer by reaching competitive Top-1 scores in the ImageNet-1K classification test. Specifically, our model variants with 11M, 23M, and 34M parameters achieve scores of 80.4%, 82.1%, and 83.1%, respectively. Code: https://github.com/novendrastywn/ParFormer-CAPE-2024

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.02452 [pdf]

Programming the scalable optical learning operator with spatial-spectral optimization

Authors: Yi Zhou, Jih-Liang Hsieh, Ilker Oguz, Mustafa Yildirim, Niyazi Ulas Dinc, Carlo Gigli, Kenneth K. Y. Wong, Christophe Moser, Demetri Psaltis

Abstract: Electronic computers have evolved drastically over the past years with an ever-growing demand for improved performance. However, the transfer of information from memory and high energy consumption have emerged as issues that require solutions. Optical techniques are considered promising solutions to these problems, offering higher speed than their electronic counterparts with reduced energy consumption. Here, we use the optical reservoir computing framework we have previously described (Scalable Optical Learning Operator or SOLO) to program the spatial-spectral output of the light after nonlinear propagation in a multimode fiber. The novelty in the current paper is that the system is programmed through an output sampling scheme, similar to that used in hyperspectral imaging in astronomy. Linear and nonlinear computations are performed by light in the multimode fiber, and the high-dimensional spatial-spectral information at the fiber output is optically programmed before it reaches the camera. We then use a digital computer to classify the programmed output of the multimode fiber using a simple, single-layer network. When combining front-end programming and the proposed spatial-spectral programming, we were able to achieve 89.9% classification accuracy on the dataset consisting of chest X-ray images from COVID-19 patients. At the same time, we obtained a 99% decrease in the number of tunable parameters compared to an equivalently performing digital neural network.
These results show that the performance of programmed SOLO is comparable with cutting-edge electronic computing platforms, albeit with a much-reduced number of electronic operations.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.02363 [pdf, other]

Addressing Long-Tail Noisy Label Learning Problems: a Two-Stage Solution with Label Refurbishment Considering Label Rarity

Authors: Ying-Hsuan Wu, Jun-Wei Hsieh, Li Xin, Shin-You Teng, Yi-Kuan Hsieh, Ming-Ching Chang

Abstract: Real-world datasets commonly exhibit noisy labels and class imbalance, such as long-tailed distributions. While previous research addresses this issue by differentiating noisy and clean samples, reliance on information from predictions based on noisy long-tailed data introduces potential errors. To overcome the limitations of prior works, we introduce an effective two-stage approach by combining soft-label refurbishing with multi-expert ensemble learning. In the first stage of robust soft label refurbishing, we acquire unbiased features through contrastive learning, making preliminary predictions using a classifier trained with a carefully designed BAlanced Noise-tolerant Cross-entropy (BANC) loss. In the second stage, our label refurbishment method is applied to obtain soft labels for multi-expert ensemble learning, providing a principled solution to the long-tail noisy label problem. Experiments conducted across multiple benchmarks validate the superiority of our approach, Label Refurbishment considering Label Rarity (LR^2), achieving remarkable accuracies of 94.19% and 77.05% on simulated noisy CIFAR-10 and CIFAR-100 long-tail datasets, as well as 77.74% and 81.40% on real-noise long-tail datasets, Food-101N and Animal-10N, surpassing existing state-of-the-art methods.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.11590 [pdf, ps, other]

Small Even Covers, Locally Decodable Codes and Restricted Subgraphs of Edge-Colored Kikuchi Graphs

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty, David Munhá Correia, Benny Sudakov

Abstract: Given a $k$-uniform hypergraph $H$ on $n$ vertices, an even cover in $H$ is a collection of hyperedges that touch each vertex an even number of times. Even covers are a generalization of cycles in graphs and are equivalent to linearly dependent subsets of a system of linear equations modulo $2$. As a result, they arise naturally in the context of well-studied questions in coding theory and refuting unsatisfiable $k$-SAT formulas. Analogous to the irregular Moore bound of Alon, Hoory, and Linial (2002), in 2008, Feige conjectured an extremal trade-off between the number of hyperedges and the length of the smallest even cover in a $k$-uniform hypergraph. This conjecture was recently settled up to a multiplicative logarithmic factor in the number of hyperedges (Guruswami, Kothari, and Manohar 2022 and Hsieh, Kothari, and Mohanty 2023). These works introduced a new technique that relates hypergraph even covers to cycles in the associated Kikuchi graphs. Their analysis of these Kikuchi graphs, especially for odd $k$, is rather involved and relies on matrix concentration inequalities. In this work, we give a simple and purely combinatorial argument that recovers the best-known bound for Feige's conjecture for even $k$. We also introduce a novel variant of a Kikuchi graph which, together with this argument, improves the logarithmic factor in the best-known bounds for odd $k$.
As an application of our ideas, we also give a purely combinatorial proof of the improved lower bounds (Alrabiah, Guruswami, Kothari and Manohar, 2023) on 3-query binary linear locally decodable codes.
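The abstract makes two claims: an even cover touches every vertex an even number of times, and this is the same as a linearly dependent set of mod-2 equations. The sketch below checks both views on a toy hypergraph. Each hyperedge $\{v_1, \ldots, v_k\}$ is identified with the coefficient vector of the equation $x_{v_1} + \cdots + x_{v_k} = b \pmod 2$, stored here as an integer bitmask. We assume hyperedges are sets of distinct vertices and that an even cover must be non-empty. The example hypergraph is made up for illustration.

```python
from collections import Counter
from functools import reduce
from operator import xor

def is_even_cover(hyperedges):
    # Non-empty collection in which every vertex is touched an even number of times.
    if not hyperedges:
        return False
    counts = Counter(v for edge in hyperedges for v in set(edge))
    return all(c % 2 == 0 for c in counts.values())

def as_bitmask(edge):
    # Coefficient vector of the mod-2 equation sum_{v in edge} x_v = b, packed into an int.
    return sum(1 << v for v in set(edge))

def sums_to_zero_mod2(hyperedges):
    # The coefficient vectors XOR to zero iff this non-empty subset is a linear dependency.
    return bool(hyperedges) and reduce(xor, map(as_bitmask, hyperedges), 0) == 0

edges = [(0, 1, 2), (2, 3, 4), (0, 1, 5), (3, 4, 5)]  # a toy 3-uniform example
print(is_even_cover(edges), sums_to_zero_mod2(edges))          # True True
print(is_even_cover(edges[:2]), sums_to_zero_mod2(edges[:2]))  # False False
```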

Submitted 21 January, 2024; originally announced January 2024.

Comments: 19 pages

arXiv:2312.16771 [pdf, other]

Scale-Aware Crowd Count Network with Annotation Error Correction

Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Li Xin

Abstract: Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varying pixel distribution with respect to the camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net) that introduces a "scale-aware" architecture with the capability to correct noisy annotations. For the first time, we simultaneously model labeling errors (mean) and scale variations (variance) by spatially varying Gaussian distributions to produce fine-grained heat maps for crowd counting. Furthermore, the proposed adaptive Gaussian kernel variance enables the model to learn dynamically with a low-rank approximation, leading to improved convergence efficiency with comparable accuracy. The performance of SACC-Net is extensively evaluated on four public datasets: UCF-QNRF, UCF_CC_50, NWPU, and ShanghaiTech A-B. Experimental results demonstrate that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness in achieving superior crowd counting accuracy.
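The idea of modeling annotation error as a Gaussian mean and scale as a Gaussian variance can be illustrated with a plain density map. Each annotated head contributes one Gaussian, and each Gaussian is normalized so the map sums to the head count. This is a minimal sketch, not SACC-Net. The per-point offsets and sigmas below are hypothetical inputs; in the paper they are produced by the network and the adaptive, low-rank kernel variance.

```python
import math

def density_map(points, offsets, sigmas, height, width):
    """Sum of per-annotation Gaussians, each normalized to unit mass on the grid.

    points:  annotated head positions (row, col)
    offsets: hypothetical per-point mean shift (d_row, d_col), standing in for annotation error
    sigmas:  hypothetical per-point standard deviation, standing in for scale
    """
    dmap = [[0.0] * width for _ in range(height)]
    for (r0, c0), (dr, dc), s in zip(points, offsets, sigmas):
        mr, mc = r0 + dr, c0 + dc
        kernel = [[math.exp(-((r - mr) ** 2 + (c - mc) ** 2) / (2 * s * s))
                   for c in range(width)] for r in range(height)]
        mass = sum(map(sum, kernel))
        for r in range(height):
            for c in range(width):
                dmap[r][c] += kernel[r][c] / mass  # each person contributes exactly 1
    return dmap

dm = density_map(points=[(5, 5), (10, 12), (15, 4)],
                 offsets=[(0.3, -0.2), (0.0, 0.5), (-0.4, 0.1)],
                 sigmas=[1.0, 2.0, 3.0], height=20, width=20)
print(round(sum(map(sum, dm)), 6))  # estimated count: 3.0
```

Per-kernel normalization keeps the integrated count exact even when a wide kernel spills past the image border. A fixed kernel would instead use the same sigma for every point.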

Submitted 27 December, 2023; originally announced December 2023.

Comments: 7 pages, 6 figures. arXiv admin note: text overlap with arXiv:2211.06835

arXiv:2311.09354 [pdf]

doi 10.1063/5.0189222

Nondestructive, quantitative viability analysis of 3D tissue cultures using machine learning image segmentation

Authors: Kylie J. Trettner, Jeremy Hsieh, Weikun Xiao, Jerry S. H. Lee, Andrea M. Armani

Abstract: Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported as simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measurements to assess the continuity of possible cellular states and responses to perturbation across cell culture conditions is needed. In this work, we demonstrate an image processing algorithm for quantifying cellular viability in 3D cultures without the need for assay-based indicators. We show that our algorithm performs similarly to a pair of human experts in whole-well images over a range of days and culture matrix compositions. To demonstrate potential utility, we perform a longitudinal study investigating the impact of a known therapeutic on pancreatic cancer spheroids. Using images taken with a high-content imaging system, the algorithm successfully tracks viability at the individual spheroid and whole-well level. The method we propose reduces analysis time by 97% in comparison to the experts. Because the method is independent of the microscope or imaging system used, this approach lays the foundation for accelerating progress in, and improving the robustness and reproducibility of, 3D culture analysis across biological and clinical research.

Submitted 11 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 52 total pages, Main text and SI included, 35 figures (5 main text, 30 supplemental), 9 tables, 6 datasets (provided on linked GitHub), linked image files on Zenodo

arXiv:2310.00393 [pdf, ps, other]

New SDP Roundings and Certifiable Approximation for Cubic Optimization

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Lucas Pesenti, Luca Trevisan

Abstract: We give new rounding schemes for SDP relaxations for the problems of maximizing cubic polynomials over the unit sphere and the $n$-dimensional hypercube. In both cases, the resulting algorithms yield an $O(\sqrt{n/k})$ multiplicative approximation in $2^{O(k)} \text{poly}(n)$ time. In particular, we obtain an $O(\sqrt{n/\log n})$ approximation in polynomial time. For the unit sphere, this improves on the rounding algorithms of Bhattiprolu et al. [BGG+17] that need quasi-polynomial time to obtain a similar approximation guarantee. Over the $n$-dimensional hypercube, our results match the guarantee of a search algorithm of Khot and Naor [KN08] that obtains a similar approximation ratio via techniques from convex geometry. Unlike their method, our algorithm obtains an upper bound on the integrality gap of SDP relaxations for the problem and, as a result, also yields a certificate on the optimum value of the input instance. Our results naturally generalize to homogeneous polynomials of higher degree and imply improved algorithms for approximating satisfiable instances of Max-3SAT. Our main motivation is the stark lack of rounding techniques for SDP relaxations of higher-degree polynomial optimization, in sharp contrast to the rich theory of SDP roundings for the quadratic case.
Our rounding algorithms introduce two new ideas: 1) a new polynomial-reweighting-based method to round sum-of-squares relaxations of higher-degree polynomial maximization problems, and 2) a general technique to compress such relaxations down to substantially smaller SDPs by relying on an explicit construction of certain hitting sets. We hope that our work will inspire improved rounding algorithms for polynomial optimization and related problems.
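To pin down the hypercube problem being approximated, here is an exhaustive-search baseline for a cubic form $f(x) = \sum_{(i,j,k)} T_{ijk} x_i x_j x_k$ over $x \in \{-1,+1\}^n$. It takes $2^n$ time, so it serves only as ground truth on tiny instances and is unrelated to the paper's SDP rounding. Because $f$ is homogeneous of odd degree, $f(-x) = -f(x)$ and the maximum is non-negative. An $\alpha$-approximation then means returning some $x$ with $f(x) \geq \mathrm{OPT}/\alpha$. The sparse-dict encoding of $T$ is an illustrative choice.

```python
from itertools import product

def cubic_value(coeffs, x):
    # coeffs maps index triples (i, j, k) to T_ijk; x is a +/-1 vector.
    return sum(t * x[i] * x[j] * x[k] for (i, j, k), t in coeffs.items())

def brute_force_max(coeffs, n):
    # Exhaustive search over {-1, +1}^n: exact, but 2^n time, so only for tiny n.
    best = max(product((-1, 1), repeat=n), key=lambda x: cubic_value(coeffs, x))
    return cubic_value(coeffs, best), best

coeffs = {(0, 1, 2): 1.0, (1, 2, 3): -2.0}  # f(x) = x0 x1 x2 - 2 x1 x2 x3
print(brute_force_max(coeffs, 4))  # optimum value 3.0, e.g. x = (1, 1, 1, -1)
```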

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.16897 [pdf, other]

Efficient Algorithms for Semirandom Planted CSPs at the Refutation Threshold

Authors: Venkatesan Guruswami, Jun-Ting Hsieh, Pravesh K. Kothari, Peter Manohar

Abstract: We present an efficient algorithm to solve semirandom planted instances of any Boolean constraint satisfaction problem (CSP). The semirandom model is a hybrid between worst-case and average-case input models, where the input is generated by (1) choosing an arbitrary planted assignment $x^*$, (2) choosing an arbitrary clause structure, and (3) choosing literal negations for each clause from an arbitrary distribution "shifted by $x^*$" so that $x^*$ satisfies each constraint. For an $n$-variable semirandom planted instance of a $k$-arity CSP, our algorithm runs in polynomial time and outputs an assignment that satisfies all but an $o(1)$-fraction of constraints, provided that the instance has at least $\tilde{O}(n^{k/2})$ constraints. This matches, up to $\mathrm{polylog}(n)$ factors, the clause threshold for algorithms that solve fully random planted CSPs [FPV15], as well as algorithms that refute random and semirandom CSPs [AOW15, AGK21]. Our result shows that despite having worst-case clause structure, the randomness in the literal patterns makes semirandom planted CSPs significantly easier than worst-case instances, where analogous results require $O(n^k)$ constraints [AKK95, FLP16]. Perhaps surprisingly, our algorithm follows a significantly different conceptual framework when compared to the recent resolution of semirandom CSP refutation.
This turns out to be inherent and, at a technical level, can be attributed to the need for relative spectral approximation of certain random matrices - reminiscent of the classical spectral sparsification - which ensures that an SDP can certify the uniqueness of the planted assignment. In contrast, in the refutation setting, it suffices to obtain a weaker guarantee of absolute upper bounds on the spectral norm of related matrices.
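The three-step semirandom model can be sketched as an instance generator for $k$-SAT. Steps (1) and (2) are arbitrary, so they are simply passed in. For step (3) the sketch picks one specific distribution: negations uniform at random, conditioned (by rejection sampling) on $x^*$ satisfying the clause. The model permits any distribution "shifted by $x^*$", and this particular choice is an illustrative assumption that may not match the paper's exact formalization. The generator does not solve anything; it only produces instances of the kind the paper's algorithm handles.

```python
import random

def satisfies(x, clause):
    # Literal (v, neg) is true iff x[v] XOR neg; an OR-clause needs one true literal.
    return any(x[v] != neg for v, neg in clause)

def semirandom_planted_ksat(x_star, scopes, seed=0):
    """Attach literal negations to an arbitrary clause structure so x_star satisfies every clause.

    x_star: planted assignment, a list of bools (step 1, arbitrary).
    scopes: clause structure, each a tuple of k variable indices (step 2, arbitrary).
    Step 3 here: uniform negations conditioned on x_star satisfying the clause
    (one illustrative choice of a distribution shifted by x_star).
    """
    rng = random.Random(seed)
    clauses = []
    for scope in scopes:
        while True:  # accepts with probability 1 - 2^-k, so this terminates quickly
            clause = [(v, rng.random() < 0.5) for v in scope]
            if satisfies(x_star, clause):
                break
        clauses.append(clause)
    return clauses

x_star = [True, False, True, True, False]
scopes = [(0, 1, 2), (1, 3, 4), (0, 2, 4), (2, 3, 4)]  # worst-case structure stand-in
for clause in semirandom_planted_ksat(x_star, scopes):
    print(clause)
```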

Submitted 28 September, 2023; originally announced September 2023.

Comments: FOCS 2023

arXiv:2308.12817 [pdf, other]

MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild

Authors: Yu-Xiang Zeng, Jun-Wei Hsieh, Xin Li, Ming-Ching Chang

Abstract: Detecting small scene text instances in the wild is particularly challenging because irregular positions and nonideal lighting often lead to detection errors. We present MixNet, a hybrid architecture that combines the strengths of CNNs and Transformers, capable of accurately detecting small text from challenging natural scenes, regardless of the orientations, styles, and lighting conditions. MixNet incorporates two key modules: (1) the Feature Shuffle Network (FSNet) to serve as the backbone and (2) the Central Transformer Block (CTBlock) to exploit the 1D manifold constraint of the scene text. We first introduce a novel feature shuffling strategy in FSNet to facilitate the exchange of features across multiple scales, generating high-resolution features superior to those of the popular ResNet and HRNet. The FSNet backbone has achieved significant improvements over many existing text detection methods, including PAN, DB, and FAST. Then we design a complementary CTBlock to leverage center-line-based features similar to the medial axis of text regions and show that it can outperform contour-based approaches in challenging cases when small scene texts appear close together. Extensive experimental results show that MixNet, which mixes FSNet with CTBlock, achieves state-of-the-art results on multiple scene text detection datasets.

Submitted 27 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.07427 [pdf, other]

doi 10.1145/3610092

Nip it in the Bud: Moderation Strategies in Open Source Software Projects and the Role of Bots

Authors: Jane Hsieh, Joselyn Kim, Laura Dabbish, Haiyi Zhu

Abstract: Much of our modern digital infrastructure relies critically upon open-source software. The communities responsible for building this cyberinfrastructure require maintenance and moderation, which is often supported by volunteer efforts. Moderation, as a non-technical form of labor, is a necessary but often overlooked task that maintainers undertake to sustain the community around an OSS project. This study examines the various structures and norms that support community moderation, describes the strategies moderators use to mitigate conflicts, and assesses how bots can play a role in assisting these processes. We interviewed 14 practitioners to uncover existing moderation practices and ways that automation can provide assistance. Our main contributions include a characterization of moderated content in OSS projects and of moderation techniques, as well as perceptions of and recommendations for improving the automation of moderation tasks. We hope that these findings will inform the implementation of more effective moderation practices in open-source communities.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.05954 [pdf, other]

Ellipsoid Fitting Up to a Constant

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Aaron Potechin, Jeff Xu

Abstract: In [Sau11,SPW13], Saunderson, Parrilo and Willsky asked the following elegant geometric question: what is the largest $m= m(d)$ such that there is an ellipsoid in $\mathbb{R}^d$ that passes through $v_1, v_2, \ldots, v_m$ with high probability when the $v_i$s are chosen independently from the standard Gaussian distribution $N(0,I_{d})$? The existence of such an ellipsoid is equivalent to the existence of a positive semidefinite matrix $X$ such that $v_i^{\top}X v_i =1$ for every $1 \leq i \leq m$ - a natural example of a random semidefinite program. SPW conjectured that $m= (1-o(1)) d^2/4$ with high probability. Very recently, Potechin, Turner, Venkat and Wein, as well as Kane and Diakonikolas, proved that $m \geq d^2/\log^{O(1)}(d)$ via certain explicit constructions. In this work, we give a substantially tighter analysis of their construction to prove that $m \geq d^2/C$ for an absolute constant $C>0$. This resolves one direction of the SPW conjecture up to a constant. Our analysis proceeds via the method of Graphical Matrix Decomposition that has recently been used to analyze correlated random matrices arising in various areas [BHK+19]. Our key new technical tool is a refined method to prove singular value upper bounds on certain correlated random matrices that are tight up to absolute dimension-independent constants. In contrast, all previous methods that analyze such matrices lose logarithmic factors in the dimension.
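The equivalence in the abstract turns "an ellipsoid passes through the points" into a checkable condition on a candidate matrix $X$. The condition is that $X$ is symmetric positive semidefinite and $v_i^{\top} X v_i = 1$ for every $i$. The sketch below only verifies a given candidate; constructing $X$ for random Gaussian points is the hard part, and the paper's construction is not reproduced. The PSD test is a Cholesky-style factorization that tolerates zero pivots, and the tolerance `tol` is an arbitrary numerical choice.

```python
import math

def is_psd(X, tol=1e-9):
    # Cholesky-style factorization X = L L^T that tolerates zero pivots (semidefinite case).
    n = len(X)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = X[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot < -tol:
            return False
        if pivot <= tol:
            # Zero pivot: the rest of this column of the Schur complement must vanish.
            if any(abs(X[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) > tol
                   for i in range(j + 1, n)):
                return False
            continue
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (X[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return True

def fits_ellipsoid(X, points, tol=1e-9):
    # The ellipsoid {v : v^T X v = 1} passes through every point, with X symmetric PSD.
    d = len(X)
    symmetric = all(abs(X[i][j] - X[j][i]) <= tol for i in range(d) for j in range(d))
    on_surface = all(
        abs(sum(v[i] * X[i][j] * v[j] for i in range(d) for j in range(d)) - 1.0) <= tol
        for v in points)
    return symmetric and is_psd(X, tol) and on_surface

X = [[0.25, 0.0], [0.0, 1.0]]  # axis-aligned ellipse x^2/4 + y^2 = 1
print(fits_ellipsoid(X, [(2.0, 0.0), (0.0, 1.0), (-2.0, 0.0)]))  # True
print(fits_ellipsoid([[1.0, 0.0], [0.0, -1.0]], [(1.0, 0.0)]))   # False: not PSD
```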

Submitted 12 July, 2023; originally announced July 2023.

Comments: ICALP 2023
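The equivalence stated in the ellipsoid-fitting abstract turns the question into a linear system in the entries of $X$ plus a positive semidefiniteness check. A minimal NumPy sketch, assuming NumPy is available: it computes the minimum-Frobenius-norm matrix $X$ satisfying the $m$ constraints $v_i^{\top}X v_i = 1$ and tests whether it is PSD. This least-squares candidate is an illustrative heuristic chosen here, not the construction analyzed in the paper, and the function names are hypothetical.

```python
import numpy as np


def least_squares_ellipsoid(vs: np.ndarray) -> np.ndarray:
    """Minimum-Frobenius-norm X with v_i^T X v_i = 1 for every row v_i of vs.

    Each constraint <v_i v_i^T, X> = 1 is linear in the entries of X, so the
    constraints form an ordinary (underdetermined) linear system in vec(X).
    """
    m, d = vs.shape
    # Row i of A is vec(v_i v_i^T), so (A @ vec(X))_i = v_i^T X v_i.
    A = np.einsum("ij,ik->ijk", vs, vs).reshape(m, d * d)
    x, *_ = np.linalg.lstsq(A, np.ones(m), rcond=None)
    X = x.reshape(d, d)
    # The minimum-norm solution lies in the span of the symmetric matrices
    # v_i v_i^T; symmetrizing only removes floating-point asymmetry.
    return (X + X.T) / 2


def fits_ellipsoid(vs: np.ndarray, tol: float = 1e-9) -> bool:
    """True if the least-squares candidate is PSD, i.e. a (possibly degenerate)
    ellipsoid passes through all points.

    False does NOT prove that no ellipsoid exists; only this candidate failed.
    """
    X = least_squares_ellipsoid(vs)
    return bool(np.linalg.eigvalsh(X).min() >= -tol)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 20
    for m in (20, 60, 100, 150):
        vs = rng.standard_normal((m, d))
        print(m, fits_ellipsoid(vs))
```

For $d = 20$ the conjectured threshold $d^2/4$ equals 100, so sweeping $m$ across that value gives a quick empirical probe. Because the candidate is only one feasible $X$, the check can under-report fits but never over-report them (up to the numerical tolerance).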

arXiv:2307.00826 [pdf, ps, other]

Levitation by a dipole electric field

Authors: **-Rui Tsai, Hong-Yue Huang, Jih-Kang Hsieh, Yu-Ting Cheng, Ying-Pin Tsai, Cheng-Wei Lai, Yu-Hsuan Kao, Wen-Chi Chen, Fu-Li Hsiao, Po-Heng Lin, Tzay-Ming Hong

Abstract: Floating phenomena are fascinating in many fields, appearing in art, film, and scientific research. The subject is also of practical relevance, with applications such as Penning traps for antimatter confinement and ion traps as essential architectures for quantum computing. In our project, we reproduced the 1893 water bridge experiment using glycerol and observed for the first time that lump-like macroscopic dipole moments can undergo near-periodic oscillations that exhibit floating effects without requiring the classical bridge form. By combining experimental analysis, neural networks, finite element calculations of the Kelvin force, and an exploration of discharging, we gain insights into the mechanisms of motion. Our discovery overturns the previous impression of a bridge floating in the water and leads to a deeper understanding of a new trapping mechanism under strong electric fields with a single pair of electrodes.

Submitted 2 January, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: 5 pages, 5 figures

arXiv:2306.12972 [pdf, ps, other]

doi 10.1145/3596671.3598576

Designing Individualized Policy and Technology Interventions to Improve Gig Work Conditions

Authors: Jane Hsieh, Oluwatobi Adisa, Sachi Bafna, Haiyi Zhu

Abstract: The gig economy is characterized by short-term contract work completed by independent workers who are paid to perform "gigs" and who control when, whether, and how they work. Gig economy platforms (e.g., Uber, Lyft, Instacart) offer workers increased job opportunities, lower barriers to entry, and improved flexibility. However, growing evidence suggests that worker well-being and gig work conditions have become significant societal issues. In designing public-facing policies and technologies to improve gig work conditions, there are inherent tradeoffs between offering individual flexibility and meeting the needs of the entire community. In platform-based gig work, contractors pursue the flexibility of short-term tasks, but policymakers resist segmenting the population when designing policies to support their work. As platforms offer an ever-increasing variety of services, we argue that policymakers and platform designers must provide more targeted and personalized policies, benefits, and protections for platform-based workers, so that they can lead more successful and sustainable gig work careers. In this paper, we present relevant legal and scholarly evidence from the United States to support this position and make recommendations for future innovations in policy and technology.

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.09662 [pdf, other]

Cooperative Multi-Objective Reinforcement Learning for Traffic Signal Control and Carbon Emission Reduction

Authors: Cheng Ruei Tang, Jun Wei Hsieh, Shin You Teng

Abstract: Existing traffic signal control systems rely on oversimplified rule-based methods, and even RL-based methods are often suboptimal and unstable. To address this, we propose a cooperative multi-objective architecture called Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MOMA-DDPG), which estimates multiple reward terms for traffic signal control optimization using age-decaying weights. Our approach involves two types of agents: one optimizes local traffic at each intersection, while the other optimizes global traffic throughput. We evaluate our method using real-world traffic data collected from traffic cameras in an Asian country. Despite the inclusion of a global agent, our solution remains decentralized, as this agent is not needed during inference. Our results demonstrate the effectiveness of MOMA-DDPG, which outperforms state-of-the-art methods across all performance metrics. Additionally, our proposed system minimizes both waiting time and carbon emissions. Notably, this paper is the first to link carbon emissions and global agents in traffic signal control.

Submitted 16 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2205.11291

arXiv:2306.00838 [pdf, other]

The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI

Authors: Ahmed W. Moawad, Anastasia Janas, Ujjwal Baid, Divya Ramakrishnan, Rachit Saluja, Nader Ashraf, Leon Jekel, Raisa Amiruddin, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Sanjay Aneja, Syed Muhammad Anwar, Timothy Bergquist, Evan Calabrese, Veronica Chiang, Verena Chung, Gian Marco Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Juan Eugenio Iglesias, Zhifan Jiang, et al. (206 additional authors not shown)

Abstract: The translation of AI-generated brain metastasis (BM) segmentation into clinical practice relies heavily on diverse, high-quality annotated medical imaging datasets. The BraTS-METS 2023 challenge has gained momentum for testing and benchmarking algorithms using rigorously annotated, internationally compiled real-world datasets. This study presents the results of the segmentation challenge and characterizes the challenging cases that impacted the performance of the winning algorithms. Untreated brain metastases on standard anatomic MRI sequences (T1, T2, FLAIR, T1PG) from eight contributed international datasets were annotated in a stepwise process: published UNET algorithms, then a student, then a neuroradiologist, and finally an approving neuroradiologist. Segmentations were ranked based on lesion-wise Dice and 95th-percentile Hausdorff distance (HD95) scores. False positives (FP) and false negatives (FN) were rigorously penalized, receiving a score of 0 for Dice and a fixed penalty of 374 for HD95. Eight datasets comprising 1303 studies were annotated, with 402 studies (3076 lesions) released on Synapse as publicly available datasets to challenge competitors. Additionally, 31 studies (139 lesions) were held out for validation, and 59 studies (218 lesions) were used for testing. Segmentation accuracy was measured as rank across subjects, with the winning team achieving a LesionWise mean score of 7.9.
Common errors among the leading teams included false negatives for small lesions and misregistration of masks in space. The BraTS-METS 2023 challenge successfully curated well-annotated, diverse datasets and identified common errors, facilitating the translation of BM segmentation across varied clinical environments and providing personalized volumetric reports to patients undergoing BM treatment.

Submitted 17 June, 2024; v1 submitted 1 June, 2023; originally announced June 2023.
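The lesion-wise scoring rule in the BraTS-METS abstract can be made concrete with a short Python sketch. It assumes lesion matching (deciding which predicted lesion corresponds to which ground-truth lesion) has already been done, and it assumes the mean is taken over matched, missed, and spurious lesions together; only the penalty values 0 and 374 come from the abstract. The names are hypothetical, and this is not the challenge's official scoring code.

```python
from typing import Optional, Sequence, Tuple

# Penalties stated in the abstract for false-positive and false-negative lesions.
PENALTY_DICE = 0.0
PENALTY_HD95 = 374.0


def lesion_wise_means(
    matched: Sequence[Optional[Tuple[float, float]]],
    n_false_positives: int,
) -> Tuple[float, float]:
    """Lesion-wise mean Dice and mean HD95.

    matched: one entry per ground-truth lesion, either (dice, hd95) for the
        predicted lesion matched to it, or None if it was missed (false negative).
    n_false_positives: number of predicted lesions matched to no ground-truth lesion.
    """
    dices, hds = [], []
    for pair in matched:
        if pair is None:  # false negative: apply the fixed penalties
            dices.append(PENALTY_DICE)
            hds.append(PENALTY_HD95)
        else:
            dice, hd95 = pair
            dices.append(dice)
            hds.append(hd95)
    # Each false-positive lesion also contributes one penalized entry.
    dices.extend([PENALTY_DICE] * n_false_positives)
    hds.extend([PENALTY_HD95] * n_false_positives)
    if not dices:
        raise ValueError("no lesions on either side; that case is not covered by this sketch")
    n = len(dices)
    return sum(dices) / n, sum(hds) / n


if __name__ == "__main__":
    # Two true lesions (one found, one missed) plus one spurious prediction.
    print(lesion_wise_means([(0.9, 2.0), None], n_false_positives=1))
```

With two ground-truth lesions (one found with Dice 0.9 and HD95 2.0, one missed) plus one false positive, the mean Dice drops to 0.3 and the mean HD95 rises to 250, which illustrates how heavily a single missed or spurious small lesion is penalized.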

arXiv:2305.19170 [pdf]

Forward-Forward Training of an Optical Neural Network

Authors: Ilker Oguz, Junjie Ke, Qifei Wang, Feng Yang, Mustafa Yildirim, Niyazi Ulas Dinc, Jih-Liang Hsieh, Christophe Moser, Demetri Psaltis

Abstract: Neural networks (NN) have demonstrated remarkable capabilities in various tasks, but their computation-intensive nature demands faster and more energy-efficient hardware implementations. Optics-based platforms, using technologies such as silicon photonics and spatial light modulators, offer promising avenues for achieving this goal. However, training multiple trainable layers in tandem with these physical systems poses challenges, as they are difficult to fully characterize and describe with differentiable functions, hindering the use of the error backpropagation algorithm. The recently introduced Forward-Forward Algorithm (FFA) eliminates the need for perfect characterization of the learning system and shows promise for efficient training with large numbers of programmable parameters. The FFA does not require backpropagating an error signal to update the weights; rather, the weights are updated by sending information in only one direction. The local loss function for each set of trainable weights enables low-power analog hardware implementations without resorting to metaheuristic algorithms or reinforcement learning. In this paper, we present an experiment utilizing multimode nonlinear wave propagation in an optical fiber that demonstrates the feasibility of the FFA approach using an optical system. The results show that incorporating optical transforms in multilayer NN architectures trained with the FFA can lead to performance improvements, even with a relatively small number of trainable weights.
The proposed method offers a new approach to the challenge of training optical NNs and provides insights into leveraging physical transformations for enhancing NN performance.

Submitted 10 August, 2023; v1 submitted 30 May, 2023; originally announced May 2023.
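To make the idea of a per-layer local loss concrete, here is a minimal digital NumPy sketch of one Forward-Forward layer in the style of Hinton's original proposal as we understand it: a layer's "goodness" is the sum of its squared ReLU activations, and a logistic loss pushes goodness above a threshold for positive data and below it for negative data. The threshold, learning rate, input normalization, toy data, and the class name `FFLayer` are assumptions for illustration; the sketch does not model the optical fiber transform or the training setup used in the paper.

```python
import numpy as np


def sigmoid_neg(z):
    """Numerically stable 1 / (1 + exp(z)), i.e. sigmoid(-z)."""
    return np.exp(-np.logaddexp(0.0, z))


class FFLayer:
    """One fully connected ReLU layer trained with a local Forward-Forward loss."""

    def __init__(self, n_in, n_out, threshold=2.0, lr=0.03, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)
        self.threshold = threshold
        self.lr = lr

    def forward(self, x):
        # Normalize each input vector so that only its direction is used.
        xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
        return xn, np.maximum(0.0, xn @ self.W + self.b)

    def train_step(self, x_pos, x_neg):
        """One local update: raise goodness on positive data, lower it on negative data."""
        total = 0.0
        for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
            xn, h = self.forward(x)
            goodness = np.sum(h ** 2, axis=1)
            z = sign * (goodness - self.threshold)
            total += np.mean(np.logaddexp(0.0, -z))  # mean of -log sigmoid(z)
            # d(loss)/d(goodness) per sample, including the 1/batch averaging.
            d_good = -sign * sigmoid_neg(z) / len(x)
            # goodness = sum(h^2); the factor h is already 0 where the ReLU is inactive.
            d_pre = 2.0 * h * d_good[:, None]
            self.W -= self.lr * xn.T @ d_pre
            self.b -= self.lr * d_pre.sum(axis=0)
        return total

    def output(self, x):
        """Activations handed to the next layer (which re-normalizes them)."""
        return self.forward(x)[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_pos = rng.normal(size=(64, 10)) + 3.0 * np.eye(10)[0]  # clustered "real" data
    x_neg = rng.normal(size=(64, 10))                        # unstructured "negative" data
    layer = FFLayer(10, 20)
    losses = [layer.train_step(x_pos, x_neg) for _ in range(300)]
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each update uses only the layer's own input and output, so no error signal crosses layer boundaries; a deeper network would stack `FFLayer`s and train each one on the outputs of the layer below.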

arXiv:2305.17449 [pdf, ps, other]

FishEye8K: A Benchmark and Dataset for Fisheye Camera Object Detection

Authors: Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Erkhembayar Ganbold, Jun-Wei Hsieh, Ming-Ching Chang, **-Yang Chen, Byambaa Dorj, Hamad Al Jassmi, Ganzorig Batnasan, Fady Alnajjar, Mohammed Abduljabbar, Fang-Pang Lin

Abstract: With the advance of AI, road object detection has become a prominent topic in computer vision, mostly using perspective cameras. Fisheye lenses provide wide omnidirectional coverage, so fewer cameras are needed to monitor road intersections, but at the cost of view distortion. To our knowledge, there is no existing open dataset prepared for traffic surveillance on fisheye cameras. This paper introduces FishEye8K, an open benchmark dataset for road object detection, which comprises 157K bounding boxes across five classes (Pedestrian, Bike, Car, Bus, and Truck). In addition, we present benchmark results of State-of-The-Art (SoTA) models, including variations of YOLOv5, YOLOR, YOLOv7, and YOLOv8. The dataset comprises 8,000 images recorded in 22 videos using 18 fisheye cameras for traffic monitoring in Hsinchu, Taiwan, at resolutions of 1080$\times$1080 and 1280$\times$1280. The data annotation and validation process was arduous and time-consuming because the ultra-wide panoramic and hemispherical fisheye camera images have large distortion and contain numerous road participants, particularly people riding scooters. To avoid bias, all frames from a particular camera were assigned to either the training set or the test set, maintaining a ratio of about 70:30 for both the number of images and bounding boxes in each class. Experimental results show that YOLOv8 and YOLOR perform best at input sizes 640$\times$640 and 1280$\times$1280, respectively. The dataset will be available on GitHub in PASCAL VOC, MS COCO, and YOLO annotation formats.
The FishEye8K benchmark will contribute significantly to fisheye video analytics and smart city applications.

Submitted 6 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: CVPR Workshops 2023
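The camera-disjoint split described in the FishEye8K abstract can be sketched in Python as follows. This is a simplified, hypothetical version: it balances only the total number of frames, greedily filling the training set to about 70%, whereas the dataset authors also balanced bounding boxes per class. The function and parameter names are illustrative.

```python
import random
from collections import defaultdict


def split_by_camera(frames, train_ratio=0.7, seed=0):
    """Split (camera_id, frame_id) pairs so each camera lands entirely in train or test.

    Cameras are shuffled and added to the training set until it holds at least
    `train_ratio` of all frames; the remaining cameras form the test set.
    """
    by_camera = defaultdict(list)
    for camera_id, frame_id in frames:
        by_camera[camera_id].append((camera_id, frame_id))
    cameras = sorted(by_camera)
    random.Random(seed).shuffle(cameras)
    target = train_ratio * len(frames)
    train, test = [], []
    for camera_id in cameras:
        bucket = train if len(train) < target else test
        bucket.extend(by_camera[camera_id])
    return train, test


if __name__ == "__main__":
    # 18 cameras with 10 frames each (toy numbers, not the real dataset).
    frames = [(cam, f) for cam in range(18) for f in range(10)]
    train, test = split_by_camera(frames)
    print(len(train), len(test))
```

Splitting at the camera level rather than the frame level prevents near-duplicate consecutive frames from the same viewpoint from appearing on both sides, which would otherwise inflate test scores.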

arXiv:2305.04206 [pdf, other]

RATs-NAS: Redirection of Adjacent Trails on GCN for Neural Architecture Search

Authors: Yu-Ming Zhang, Jun-Wei Hsieh, Chun-Chieh Lee, Kuo-Chin Fan

Abstract: Various hand-designed CNN architectures, such as VGG, ResNet, and DenseNet, have been developed and achieve State-of-the-Art (SoTA) performance on different tasks. Neural Architecture Search (NAS) now focuses on automatically finding the best CNN architecture for these tasks. However, verifying a searched architecture is very time-consuming, which has made predictor-based methods an essential branch of NAS. Two commonly used techniques to build predictors are graph-convolution networks (GCN) and multilayer perceptrons (MLP). In this paper, we consider the difference between GCN and MLP on adjacent operation trails and then propose Redirected Adjacent Trails NAS (RATs-NAS) to quickly search for the desired neural network architecture. RATs-NAS consists of two components: the Redirected Adjacent Trails GCN (RATs-GCN) and the Predictor-based Search Space Sampling (P3S) module. RATs-GCN can change trails and their strengths to search for a better neural network architecture. P3S can rapidly focus on tighter intervals of FLOPs in the search space. Based on our observations on cell-based NAS, we believe that architectures with similar FLOPs will perform similarly. Finally, RATs-NAS, consisting of RATs-GCN and P3S, beats WeakNAS, Arch-Graph, and others by a significant margin on three sub-datasets of NASBench-201.

Submitted 8 May, 2023; v1 submitted 7 May, 2023; originally announced May 2023.
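A plain GCN accuracy predictor, of the kind the RATs-NAS abstract contrasts with MLPs, can be sketched as follows. It assumes NumPy, a cell encoded as a node-level DAG with one-hot operation features, symmetric normalization, mean pooling, and a linear head; in practice the weights would be fit by regression on already-trained architectures (not shown). This is the standard GCN propagation rule, not RATs-GCN: the paper's trail redirection is not reproduced, and all names and sizes are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2} of an undirected adjacency."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_predict(A, H, layer_weights, w_out):
    """Predict a scalar score (e.g. accuracy) for one cell graph.

    A: (n, n) directed adjacency of the cell DAG.
    H: (n, f) one-hot operation features, one row per node.
    """
    # A plain GCN ignores edge direction: this is the trail handling RATs-GCN revisits.
    S = normalized_adjacency(np.maximum(A, A.T))
    for W in layer_weights:
        H = np.maximum(0.0, S @ H @ W)  # aggregate neighbors, transform, ReLU
    return float(H.mean(axis=0) @ w_out)  # mean-pool nodes, linear regression head


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_ops, hidden = 6, 5, 8
    A = np.triu((rng.random((n, n)) < 0.5).astype(float), k=1)  # random toy DAG
    H = np.eye(n_ops)[rng.integers(0, n_ops, size=n)]
    weights = [rng.normal(size=(n_ops, hidden)), rng.normal(size=(hidden, hidden))]
    print(gcn_predict(A, H, weights, rng.normal(size=hidden)))
```

Because graph convolution is permutation-equivariant and the pooling is a mean, relabeling the nodes of a cell does not change its predicted score, which is a useful sanity check for any graph-based predictor.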

arXiv:2302.13436 [pdf, other]

doi 10.1145/3563657.3595982

Navigating Multi-Stakeholder Incentives and Preferences: Co-Designing Alternatives for the Future of Gig Worker Well-Being

Authors: Jane Hsieh, Miranda Karger, Lucas Zagal, Haiyi Zhu

Abstract: Gig workers, and the products and services they provide, play an increasingly ubiquitous role in our daily lives. But despite growing evidence that worker well-being on gig economy platforms has become a significant societal problem, few studies have investigated possible solutions. We take a step in this direction by engaging workers, platform employees, and local regulators in a series of speed dating workshops using storyboards based on real-life situations to rapidly elicit stakeholder preferences for addressing financial, physical, and social issues related to worker well-being. Our results reveal that existing public and platform infrastructures fall short in providing workers with the resources needed to perform gigs, surfacing a need for multi-platform collaborations, technological innovations, and changes in regulations, labor laws, and the public's perception of gig workers, among others. Drawing from multi-stakeholder findings, we discuss the implications for technology, policy, and service as well as avenues for collaboration.

Submitted 5 June, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

arXiv:2302.01212 [pdf, other]

Explicit two-sided unique-neighbor expanders

Authors: Jun-Ting Hsieh, Theo McKenzie, Sidhanth Mohanty, Pedro Paredes

Abstract: We study the problem of constructing explicit sparse graphs that exhibit strong vertex expansion. Our main result is the first two-sided construction of imbalanced unique-neighbor expanders, meaning bipartite graphs where small sets contained in both the left and right bipartitions exhibit unique-neighbor expansion, along with algebraic properties relevant to constructing quantum codes. Our constructions are obtained from instantiations of the tripartite line product of a large tripartite spectral expander and a sufficiently good constant-sized unique-neighbor expander, a new graph product we define that generalizes the line product in the work of Alon and Capalbo and the routed product in the work of Asherov and Dinur. To analyze the vertex expansion of graphs arising from the tripartite line product, we develop a sharp characterization of subgraphs that can arise in bipartite spectral expanders; this characterization generalizes results of Kahale and may be of independent interest. By picking appropriate graphs to apply our product to, we give a strongly explicit construction of an infinite family of $(d_1,d_2)$-biregular graphs $(G_n)_{n\ge 1}$ (for large enough $d_1$ and $d_2$) where all sets $S$ with fewer than a small constant fraction of vertices have $Ω(d_1\cdot |S|)$ unique neighbors (assuming $d_1 \leq d_2$). Additionally, we can guarantee that subsets of vertices of size up to $\exp(Ω(\sqrt{\log |V(G_n)|}))$ expand losslessly.

Submitted 15 January, 2024; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: New version contains stronger result, and many new technical ingredients. 45 pages, 2 figures

MSC Class: 05C48 ACM Class: G.2.1; G.2.2
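For readers new to the definition in the unique-neighbor abstract: a vertex is a unique neighbor of a set $S$ if it is adjacent to exactly one vertex of $S$. A brute-force Python sketch (hypothetical helper names, exponential time, only for toy graphs) computes unique neighbors and the worst ratio $|U(S)|/|S|$ over small sets on one side; for the two-sided property, run it again on the adjacency lists of the other side. It says nothing about the paper's explicit construction.

```python
from collections import Counter
from itertools import combinations


def unique_neighbors(adj, S):
    """Vertices adjacent to exactly one vertex of S.

    adj maps each vertex to its list of neighbors (simple graph, no repeated edges).
    """
    counts = Counter(v for u in S for v in adj[u])
    return {v for v, c in counts.items() if c == 1}


def min_unique_neighbor_ratio(adj, max_size):
    """Smallest |U(S)| / |S| over all nonempty S among adj's keys with |S| <= max_size.

    Brute force over all subsets: exponential in max_size, for tiny examples only.
    """
    vertices = list(adj)
    return min(
        len(unique_neighbors(adj, S)) / len(S)
        for k in range(1, max_size + 1)
        for S in combinations(vertices, k)
    )


if __name__ == "__main__":
    # Complete bipartite graph K_{2,3}: left vertices 0, 1; right vertices x, y, z.
    k23 = {0: ["x", "y", "z"], 1: ["x", "y", "z"]}
    print(min_unique_neighbor_ratio(k23, 1), min_unique_neighbor_ratio(k23, 2))
```

In $K_{2,3}$ each single left vertex has three unique neighbors, but the pair of left vertices has none, so unique-neighbor expansion already fails at size 2; the constructions in the abstract guarantee $Ω(d_1 \cdot |S|)$ unique neighbors for all sufficiently small sets.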

arXiv:2212.01287 [pdf, other]

SARAS-Net: Scale and Relation Aware Siamese Network for Change Detection

Authors: Chao-Peng Chen, Jun-Wei Hsieh, **-Yang Chen, Yi-Kuan Hsieh, Bor-Shiun Wang

Abstract: Change detection (CD) aims to find the differences between two images taken at different times and to output a change map indicating whether each region has changed. To generate better change maps, many State-of-The-Art (SoTA) methods design deep learning models with powerful discriminative ability. However, these methods still underperform because they ignore spatial information and scale changes between objects, giving rise to blurry or wrong boundaries. They also neglect the interactive information between the two images. To alleviate these problems, we propose the Scale and Relation-Aware Siamese Network (SARAS-Net). In this paper, three modules (relation-aware, scale-aware, and cross-transformer) are proposed to tackle scene change detection more effectively. To verify our model, we evaluated it on three public datasets, LEVIR-CD, WHU-CD, and DSIFN, and obtained SoTA accuracy. Our code is available at https://github.com/f64051041/SARAS-Net.

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2211.08824 [pdf, other]

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

Authors: Yu-Hsiang Wang, Jun-Wei Hsieh, **-Yang Chen, Ming-Ching Chang, Hung Hin So, Xin Li

Abstract: Despite recent progress in Multiple Object Tracking (MOT), several obstacles such as occlusions, similar objects, and complex scenes remain open challenges. Meanwhile, a systematic study of the cost-performance tradeoff for the popular tracking-by-detection paradigm is still lacking. This paper introduces SMILEtrack, an innovative object tracker that effectively addresses these challenges by integrating an efficient object detector with a Siamese network-based Similarity Learning Module (SLM). The technical contributions of SMILEtrack are twofold. First, we propose an SLM that calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding (SDE) models. The SLM incorporates a Patch Self-Attention (PSA) block inspired by the vision Transformer, which generates reliable features for accurate similarity matching. Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance. Together, these innovations help SMILEtrack achieve an improved trade-off between cost (e.g., running speed) and performance (e.g., tracking accuracy) over several existing state-of-the-art methods, including the popular BYTETrack. SMILEtrack outperforms BYTETrack by 0.4-0.8 MOTA and 2.1-2.2 HOTA points on the MOT17 and MOT20 datasets. Code is available at https://github.com/**yang1117/SMILEtrack_Official

Submitted 22 January, 2024; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: Our paper was accepted by AAAI2024
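Tracking-by-detection trackers associate existing tracks with new detections by solving an assignment problem over a cost matrix. A minimal sketch, assuming NumPy and SciPy's `linear_sum_assignment`, and assuming appearance embeddings have already been produced by a similarity network: it blends IoU and cosine similarity and applies a simple threshold gate. The gate and the parameters `alpha`, `iou_gate`, and `sim_gate` are hypothetical stand-ins; the abstract does not specify the GATE function or the SMC cascade, and this is not SMILEtrack's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INFEASIBLE = 1e6  # large finite cost for gated-out pairs keeps the solver well-posed


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def associate(track_boxes, track_embs, det_boxes, det_embs,
              alpha=0.5, iou_gate=0.1, sim_gate=0.3):
    """Match tracks to detections; returns a list of (track_index, detection_index)."""
    cost = np.full((len(track_boxes), len(det_boxes)), INFEASIBLE)
    for i, (tb, te) in enumerate(zip(track_boxes, track_embs)):
        for j, (db, de) in enumerate(zip(det_boxes, det_embs)):
            overlap, sim = iou(tb, db), cosine(te, de)
            # Hypothetical gate: both the motion and the appearance cue must be plausible.
            if overlap >= iou_gate and sim >= sim_gate:
                cost[i, j] = 1.0 - (alpha * overlap + (1.0 - alpha) * sim)
    rows, cols = linear_sum_assignment(cost)  # minimum-cost one-to-one matching
    return [(int(i), int(j)) for i, j in zip(rows, cols) if cost[i, j] < INFEASIBLE]


if __name__ == "__main__":
    tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
    track_embs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    dets = [(21, 21, 31, 31), (1, 1, 11, 11)]  # same objects, listed in swapped order
    det_embs = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
    print(associate(tracks, track_embs, dets, det_embs))
```

Unmatched tracks and detections (those absent from the returned pairs) would typically be handed to a later, looser matching stage or used to start new tracks; that cascade logic is omitted here.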

arXiv:2211.06835 [pdf, other]

Scale-Aware Crowd Counting Using a Joint Likelihood Density Map and Synthetic Fusion Pyramid Network

Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Bor-Shiun Wang

Abstract: We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points are accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially when counting highly dense crowds that appear far away. To the best of our knowledge, this work is the first to properly handle such noise at multiple scales in end-to-end loss design and thus advance the state of the art in crowd counting. We model the noise of crowd annotation points as a Gaussian and derive the crowd probability density map from the input image. We then approximate the joint distribution of crowd density maps with the full covariance of multiple scales and derive a low-rank approximation for tractability and efficient implementation. The derived scale-aware loss function is used to train the SPF-Net. We show that it outperforms various loss functions on four public datasets: UCF-QNRF, UCF_CC_50, NWPU, and ShanghaiTech A-B. The proposed SPF-Net can accurately predict the locations of people in the crowd despite being trained on noisy annotations.

Submitted 2 January, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: 8 pages, 8 figures, 4 tables
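Modeling each annotation point as a Gaussian yields the familiar density-map target whose integral is the crowd count. A minimal single-scale NumPy sketch, assuming a fixed isotropic `sigma` (a hypothetical parameter); the paper's joint multi-scale model with full covariance and its low-rank approximation are not reproduced here.

```python
import numpy as np


def density_map(points, shape, sigma=4.0):
    """Sum of one normalized isotropic Gaussian per annotated head.

    points: iterable of (x, y) pixel coordinates; shape: (height, width).
    Each Gaussian is renormalized over the image grid, so the map sums to the
    head count even for heads near the border.
    """
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width))
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # each head contributes unit mass
    return density


if __name__ == "__main__":
    heads = [(10, 10), (12, 11), (40, 30)]
    d = density_map(heads, (48, 64))
    print(d.shape, round(float(d.sum()), 6))
```

Summing the map recovers the number of annotated heads, which is how density-based counters read off a count; a larger `sigma` spreads each head's unit mass over more pixels, trading localization for robustness to annotation noise.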

arXiv:2210.11173 [pdf, other]

Mathematical Justification of Hard Negative Mining via Isometric Approximation Theorem

Authors: Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Eliana Cohen, Howie Choset, Lu Li

Abstract: In deep metric learning, the Triplet Loss has emerged as a popular method for learning many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point. Researchers predominantly solve this problem by using triplet mining strategies. While hard negative mining is the most effective of these strategies, existing formulations lack strong theoretical justification for their empirical success. In this paper, we utilize the mathematical theory of isometric approximation to show an equivalence between the Triplet Loss sampled by hard negative mining and an optimization problem that minimizes a Hausdorff-like distance between the neural network and its ideal counterpart function. This provides a theoretical justification for the empirical efficacy of hard negative mining. In addition, our novel application of the isometric approximation theorem provides the groundwork for future forms of hard negative mining that avoid network collapse. Our theory can also be extended to analyze other Euclidean space-based metric learning methods such as Ladder Loss or Contrastive Learning.

Submitted 20 October, 2022; originally announced October 2022.

Comments: 9 pages, 6 figures, submitted to AAAI 2023
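Hard negative mining, in its common "batch-hard" form, picks for each anchor the farthest same-class sample and the closest other-class sample within the batch. A minimal NumPy sketch under that assumption; the paper's exact sampling scheme may differ, and the margin value is arbitrary.

```python
import numpy as np


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean over anchors of max(0, d(a, hardest positive) - d(a, hardest negative) + margin)."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)                    # (n, n) Euclidean distances
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)  # farthest same-class sample
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)  # closest other-class sample
    return float(np.mean(np.maximum(0.0, hardest_pos - hardest_neg + margin)))


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1])
    separated = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
    collapsed = np.zeros((4, 2))  # every sample mapped to the same point
    print(batch_hard_triplet_loss(separated, labels), batch_hard_triplet_loss(collapsed, labels))
```

The collapsed case makes the failure mode discussed in the abstract easy to observe: when every embedding sits at the same point, each anchor's hardest positive and hardest negative are both at distance 0, so the loss equals the margin.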

arXiv:2210.00698 [pdf, other]

NAS-based Recursive Stage Partial Network (RSPNet) for Light-Weight Semantic Segmentation

Authors: Yi-Chun Wang, Jun-Wei Hsieh, Ming-Ching Chang

Abstract: Current NAS-based semantic segmentation methods focus on accuracy improvements rather than light-weight design. In this paper, we propose a two-stage framework to design our NAS-based RSPNet model for light-weight semantic segmentation. The first architecture search determines the inner cell structure, and the second architecture search considers exponentially growing paths to finalize the outer structure of the network. It has been shown in the literature that fusing high- and low-resolution feature maps produces stronger representations. To find the expected macro structure without manual design, we adopt a new path-attention mechanism to efficiently search for suitable paths that fuse useful information for better segmentation. Our search for repeatable micro-structures from cells leads to a superior network architecture for semantic segmentation. In addition, we propose an RSP (Recursive Stage Partial) architecture to search for a light-weight design for NAS-based semantic segmentation. The proposed architecture is efficient, simple, and effective: both the macro- and micro-structure searches can be completed in five days of computation on two V100 GPUs. The light-weight NAS architecture, with only 1/4 the parameter size of SoTA architectures, can achieve SoTA performance on semantic segmentation on the Cityscapes dataset without using any backbone.

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2210.00546 [pdf, other]

Siamese-NAS: Using Trained Samples Efficiently to Find Lightweight Neural Architecture by Prior Knowledge

Authors: Yu-Ming Zhang, Jun-Wei Hsieh, Chun-Chieh Lee, Kuo-Chin Fan

Abstract: In the past decade, many convolutional neural network architectures, such as VGG16, ResNet, and DenseNet, were designed by hand. Each achieved state-of-the-art performance on different tasks in its time. However, such design still relies on human intuition and experience, and trial and error is very time-consuming. Neural Architecture Search (NAS) focuses on this issue. In recent works, the Neural Predictor has improved significantly using only a few trained architectures as training samples. However, the sampling efficiency is already considerable. In this paper, our proposed Siamese-Predictor is inspired by past work on predictor-based NAS. It is built on the proposed Estimation Code, which encodes prior knowledge about the training procedure. The Siamese-Predictor benefits significantly from this idea, which enables it to surpass the current SOTA predictor on NASBench-201. To explore the impact of the Estimation Code, we analyze the relationship between it and accuracy. We also propose the search space Tiny-NanoBench for lightweight CNN architectures. In this well-designed search space, it is easier than in NASBench-201 to find better architectures with few FLOPs. In summary, the proposed Siamese-Predictor is a predictor-based NAS method. It achieves SOTA-level performance, especially with limited computation budgets. Applied to the proposed Tiny-NanoBench, it needs only a few trained samples to find extremely lightweight CNN architectures.

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.01332

Class-Specific Channel Attention for Few-Shot Learning

Authors: Ying-Yu Chen, Jun-Wei Hsieh, Ming-Ching Chang

Abstract: Few-Shot Learning (FSL) has attracted growing attention in computer vision due to its ability to train models without requiring excessive data. FSL is challenging because the training and testing categories (the base vs. novel sets) can be largely diversified. Conventional transfer-based solutions, which aim to transfer knowledge learned from large labeled training sets to target testing sets, are limited because the critical adverse impacts of the shift in task distribution are not adequately addressed. In this paper, we extend transfer-based methods by incorporating the concepts of metric learning and channel attention. To better exploit the feature representations extracted by the feature backbone, we propose the Class-Specific Channel Attention (CSCA) module, which learns to highlight the discriminative channels in each class by assigning each class one CSCA weight vector. Unlike general attention modules designed to learn global-class features, the CSCA module aims to learn local and class-specific features with very efficient computation. We evaluated the performance of the CSCA module on standard benchmarks including miniImageNet, Tiered-ImageNet, CIFAR-FS, and CUB-200-2011. Experiments are performed in inductive and in/cross-domain settings. We achieve new state-of-the-art results.

Submitted 13 December, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

Comments: There are errors in the testing phase, leading to wrong results being listed in the paper

arXiv:2208.04951 [pdf]

Programming Nonlinear Propagation for Efficient Optical Learning Machines

Authors: Ilker Oguz, Jih-Liang Hsieh, Niyazi Ulas Dinc, Uğur Teğin, Mustafa Yildirim, Carlo Gigli, Christophe Moser, Demetri Psaltis

Abstract: The ever-increasing demand for processing data with larger machine learning models requires more efficient hardware solutions due to limitations such as power dissipation and scalability. Optics is a promising contender for providing lower-power computation, since light propagation through a non-absorbing medium is a lossless operation. However, to carry out useful and efficient computations with light, generating and controlling nonlinearity optically is a necessity that is still elusive. Multimode fibers (MMF) have been shown to provide nonlinear effects with microwatts of average power while maintaining parallelism and low loss. In this work, we propose an optical neural network architecture that performs nonlinear optical computation by controlling the propagation of ultrashort pulses in MMF through wavefront shaping. With a surrogate model, optimal sets of parameters are found to program this optical computer for different tasks with minimal use of an electronic computer. We show a remarkable 97% decrease in the number of model parameters, which leads to an overall 99% reduction in digital operations compared to an equivalently performing digital neural network. We further demonstrate that a fully optical implementation can also be performed with competitive accuracies.

Submitted 9 August, 2022; originally announced August 2022.

Comments: 32 pages, 11 figures

arXiv:2208.00122 [pdf, ps, other]

Polynomial-Time Power-Sum Decomposition of Polynomials

Authors: Mitali Bafna, Jun-Ting Hsieh, Pravesh K. Kothari, Jeff Xu

Abstract: We give efficient algorithms for finding power-sum decompositions of an input polynomial $P(x)= \sum_{i\leq m} p_i(x)^d$ with components $p_i$. The case of linear $p_i$s is equivalent to the well-studied tensor decomposition problem, while the quadratic case occurs naturally in studying identifiability of non-spherical Gaussian mixtures from low-order moments. Unlike tensor decomposition, both the unique identifiability and algorithms for this problem are not well understood. For the simplest setting of quadratic $p_i$s and $d=3$, prior work of Ge, Huang and Kakade yields an algorithm only when $m \leq \tilde{O}(\sqrt{n})$. On the other hand, the more general recent result of Garg, Kayal and Saha builds an algebraic approach to handle any $m=n^{O(1)}$ components, but only when $d$ is large enough (while yielding no bounds for $d=3$ or even $d=100$), and it only handles inverse exponential noise. Our results obtain a substantial quantitative improvement on both prior works above, even in the base case of $d=3$ and quadratic $p_i$s. Specifically, our algorithm succeeds in decomposing a sum of $m \sim \tilde{O}(n)$ generic quadratic $p_i$s for $d=3$ and, more generally, the $d$th power-sum of $m \sim n^{2d/15}$ generic degree-$K$ polynomials for any $K \geq 2$. Our algorithm relies only on basic numerical linear algebraic primitives, is exact (i.e., it obtains arbitrarily tiny error up to numerical precision), and handles inverse polynomial noise when the $p_i$s have random Gaussian coefficients.
Our main tool is a new method for extracting the linear span of $p_i$s by studying the linear subspace of low-order partial derivatives of the input $P$. To establish polynomial stability of our algorithm in the average case, we prove inverse polynomial bounds on the smallest singular value of certain correlated random matrices with low-degree polynomial entries that arise in our analyses.

Submitted 29 July, 2022; originally announced August 2022.

Comments: To appear in FOCS 2022
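
For the linear case, which the abstract says is equivalent to tensor decomposition, a short numerical check can make the equivalence concrete. The sketch assumes NumPy, degree $d=3$, and random components chosen purely for illustration. It shows that $\sum_i (a_i \cdot x)^3$ equals the cubic form of the symmetric tensor $\sum_i a_i \otimes a_i \otimes a_i$. It does not implement the paper's decomposition algorithm, and the helper names are hypothetical.

```python
import numpy as np

def power_sum(components, x, d):
    """Evaluate P(x) = sum_i p_i(x)^d for linear p_i(x) = <a_i, x>.

    `components` is an (m, n) array whose rows are the a_i (linear case only;
    the quadratic case studied in the paper is not covered here).
    """
    return float(np.sum((components @ x) ** d))

def symmetric_tensor(components):
    """Build T = sum_i a_i (x) a_i (x) a_i, an order-3 tensor of shape (n, n, n)."""
    return np.einsum("ia,ib,ic->abc", components, components, components)

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))  # hypothetical random components
x = rng.standard_normal(n)

T = symmetric_tensor(A)
# Contracting T with x in all three modes recovers the cubic power sum.
lhs = power_sum(A, x, d=3)
rhs = float(np.einsum("abc,a,b,c->", T, x, x, x))
print(np.isclose(lhs, rhs))  # True
```

Recovering the $a_i$ from $T$ is then exactly a tensor decomposition problem. The quadratic case has no such direct reduction, and that case is what the paper addresses.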

arXiv:2207.10850 [pdf, other]

A simple and sharper proof of the hypergraph Moore bound

Authors: Jun-Ting Hsieh, Pravesh K. Kothari, Sidhanth Mohanty

Abstract: The hypergraph Moore bound is an elegant statement that characterizes the extremal trade-off between the girth - the number of hyperedges in the smallest cycle or even cover (a subhypergraph with all degrees even) and size - the number of hyperedges in a hypergraph. For graphs (i.e., $2$-uniform hypergraphs), a bound tight up to the leading constant was proven in a classical work of Alon, Hoory and Linial [AHL02]. For hypergraphs of uniformity $k>2$, an appropriate generalization was conjectured by Feige [Fei08]. The conjecture was settled up to an additional $\log^{4k+1} n$ factor in the size in a recent work of Guruswami, Kothari and Manohar [GKM21]. Their argument relies on a connection between the existence of short even covers and the spectrum of a certain randomly signed Kikuchi matrix. Their analysis, especially for the case of odd $k$, is significantly complicated. In this work, we present a substantially simpler and shorter proof of the hypergraph Moore bound. Our key idea is the use of a new reweighted Kikuchi matrix and an edge deletion step that allows us to drop several involved steps in [GKM21]'s analysis such as combinatorial bucketing of rows of the Kikuchi matrix and the use of the Schudy-Sviridenko polynomial concentration. Our simpler proof also obtains tighter parameters: in particular, the argument gives a new proof of the classical Moore bound of [AHL02] with no loss (the proof in [GKM21] loses a $\log^3 n$ factor), and loses only a single logarithmic factor for all $k>2$-uniform hypergraphs.
As in [GKM21], our ideas naturally extend to yield a simpler proof of the full trade-off for strongly refuting smoothed instances of constraint satisfaction problems with similarly improved parameters.

Submitted 21 July, 2022; originally announced July 2022.
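
The abstract defines an even cover as a subhypergraph in which every vertex has even degree, and the girth as the number of hyperedges in the smallest one. A minimal Python sketch of that definition follows. The toy hypergraph and helper names are hypothetical, and the brute-force search is exponential, so it only suits tiny examples. It is not the spectral (Kikuchi matrix) argument used in the paper.

```python
from collections import Counter
from itertools import combinations

def is_even_cover(hyperedges):
    """Return True if every vertex appears in an even number of the given hyperedges.

    The empty collection is excluded, since girth only counts non-empty covers.
    """
    if not hyperedges:
        return False
    degree = Counter(v for edge in hyperedges for v in edge)
    return all(count % 2 == 0 for count in degree.values())

def smallest_even_cover_size(hypergraph):
    """Brute-force the size of the smallest even cover (tiny inputs only)."""
    for size in range(1, len(hypergraph) + 1):
        for subset in combinations(hypergraph, size):
            if is_even_cover(list(subset)):
                return size
    return None  # no even cover exists

# A 3-uniform toy hypergraph (hypothetical example).
H = [(1, 2, 3), (3, 4, 5), (1, 2, 6), (4, 5, 6), (1, 7, 8)]
print(smallest_even_cover_size(H))  # 4
```

For 3-uniform hypergraphs, no odd-sized subset can be an even cover, because its total vertex-incidence count is odd. This is why the toy example jumps straight to size 4.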

arXiv:2206.09204 [pdf, ps, other]

Approximating Max-Cut on Bounded Degree Graphs: Tighter Analysis of the FKL Algorithm

Authors: Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: In this note, we describe an $\alpha_{GW} + \tilde{\Omega}(1/d^2)$-factor approximation algorithm for Max-Cut on weighted graphs of degree $\leq d$. Here, $\alpha_{GW}\approx 0.878$ is the worst-case approximation ratio of the Goemans-Williamson rounding for Max-Cut. This improves on previous results for unweighted graphs by Feige, Karpinski, and Langberg and by Florén. Our guarantee is obtained by a tighter analysis of the solution obtained by applying a natural local improvement procedure to the Goemans-Williamson rounding of the basic SDP strengthened with triangle inequalities.

Submitted 18 June, 2022; originally announced June 2022.
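
The pipeline named in the abstract can be sketched in Python/NumPy: Goemans-Williamson hyperplane rounding of SDP vectors, followed by local improvement. Several assumptions apply. The SDP vectors are taken as given, since solving the SDP with triangle inequalities needs a solver and is not shown. The local improvement is assumed to be the simple single-vertex flip rule, and the paper's exact procedure may differ. All helper names are hypothetical.

```python
import numpy as np

def cut_value(weights, signs):
    """Total weight of edges whose endpoints receive different signs."""
    crossing = signs[:, None] != signs[None, :]
    return float(np.sum(weights * crossing) / 2)  # each edge is counted twice

def hyperplane_round(vectors, rng):
    """Goemans-Williamson rounding: side of a random Gaussian hyperplane."""
    g = rng.standard_normal(vectors.shape[1])
    return np.where(vectors @ g >= 0, 1, -1)

def local_improve(weights, signs):
    """Flip single vertices while a flip strictly increases the cut (assumed rule).

    Assumes a symmetric weight matrix with a zero diagonal.
    """
    signs = signs.copy()
    improved = True
    while improved:
        improved = False
        for v in range(len(signs)):
            same = weights[v] @ (signs == signs[v]).astype(float)   # uncut weight at v
            cross = weights[v] @ (signs != signs[v]).astype(float)  # cut weight at v
            if same > cross:  # flipping v gains (same - cross) > 0
                signs[v] = -signs[v]
                improved = True
    return signs

# Hypothetical 4-cycle 0-1-2-3-0 with unit weights.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    W[i, j] = W[j, i] = 1.0
# Assumed SDP vectors: alternating +/- e1 (an optimal solution for this bipartite graph).
V = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])
signs = local_improve(W, hyperplane_round(V, np.random.default_rng(0)))
print(cut_value(W, signs))  # 4.0
```

The flip loop always terminates, because each flip strictly increases the cut and there are finitely many assignments.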

arXiv:2205.11291 [pdf, other]

Cooperative Reinforcement Learning on Traffic Signal Control

Authors: Chi-Chun Chao, Jun-Wei Hsieh, Bor-Shiun Wang

Abstract: Traffic signal control is a challenging real-world problem that aims to minimize overall travel time by coordinating vehicle movements at road intersections. Existing traffic signal control systems still rely heavily on oversimplified information and rule-based methods. Specifically, the periodicity of green/red light alternations can be considered as a prior for better planning of each agent in policy optimization. To better learn such adaptive and predictive priors, traditional RL-based methods can only return a fixed length from a predefined action pool with only local agents. If there is no cooperation between these agents, some agents often conflict with other agents and thus decrease the overall throughput. This paper proposes a cooperative, multi-objective architecture with age-decaying weights to better estimate multiple reward terms for traffic signal control optimization, termed COoperative Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (COMMA-DDPG). Two types of agents run to maximize rewards for different goals: one for local traffic optimization at each intersection and the other for global traffic waiting time optimization. The global agent is used to guide the local agents as a means of aiding faster learning but is not used in the inference phase. We also provide an analysis of solution existence together with a convergence proof for the proposed RL optimization. Evaluation is performed using real-world traffic data collected by traffic cameras in an Asian country.
Our method can effectively reduce the total delay time by 60%. Results demonstrate its superiority compared to SoTA methods.

Submitted 6 August, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

arXiv:2204.04339 [pdf, other]

doi 10.1145/3491102.3517546

A Little Too Personal: Effects of Standardization versus Personalization on Job Acquisition, Work Completion, and Revenue for Online Freelancers

Authors: Jane Hsieh, Yili Hong, Gordon Burtch, Haiyi Zhu

Abstract: As more individuals consider permanently working from home, the online labor market continues to grow as an alternative working environment. While the flexibility and autonomy of these online gigs attract many workers, success depends critically upon self-management and workers' efficient allocation of scarce resources. To achieve this, freelancers may develop alternative work strategies: employing highly standardized schedules and communication patterns while taking on large work volumes, or engaging in smaller numbers of jobs while tailoring their activities to build relationships with individual employers. In this study, we consider this contrast in relation to worker communication patterns. We demonstrate the heterogeneous effects of standardization versus personalization across different stages of a project and examine the relative impact on job acquisition, project completion, and earnings. Our findings can inform the design of platforms and various worker support tools for the gig economy.

Submitted 8 April, 2022; originally announced April 2022.

Comments: CHI'22, April 29-May 5, 2022, New Orleans, LA, USA

arXiv:2203.02445 [pdf, other]

SFPN: Synthetic FPN for Object Detection

Authors: Yu-Ming Zhang, Jun-Wei Hsieh, Chun-Chieh Lee, Kuo-Chin Fan

Abstract: FPN (Feature Pyramid Network) has become a basic component of most SoTA one-stage object detectors. Many previous studies have repeatedly shown that FPN can capture better multi-scale feature maps to more precisely describe objects of different sizes. However, for most backbones such as VGG, ResNet, or DenseNet, the feature maps at each layer are reduced to a quarter of their size by pooling operations or convolutions with stride 2. This down-scaling-by-2 gap is large and prevents the FPN from fusing features smoothly. This paper proposes a new SFPN (Synthetic Fusion Pyramid Network) architecture, which creates various synthetic layers between the layers of the original FPN to enhance the accuracy of light-weight CNN backbones in extracting objects' visual features. Finally, experiments show that the SFPN architecture outperforms both large backbones (VGG16, ResNet50) and light-weight backbones such as MobilenetV2 in terms of AP score.

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2110.08677 [pdf, ps, other]

Algorithmic Thresholds for Refuting Random Polynomial Systems

Authors: Jun-Ting Hsieh, Pravesh K. Kothari

Abstract: Consider a system of $m$ polynomial equations $\{p_i(x) = b_i\}_{i \leq m}$ of degree $D\geq 2$ in an $n$-dimensional variable $x \in \mathbb{R}^n$ such that each coefficient of every $p_i$ and each $b_i$ is chosen at random and independently from some continuous distribution. We study the basic question of determining the smallest $m$ (the algorithmic threshold) for which efficient algorithms can find refutations (i.e., certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing, and certifying pseudo-randomness of Goldreich's candidate generators and their generalizations. We show that for every $d \in \mathbb{N}$, the $(n+m)^{O(d)}$-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever $m \geq O(n) \cdot (\frac{n}{d})^{D-1}$. We prove a lower bound in the restricted low-degree polynomial model of computation, which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all $d$. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-$4$ sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at $m \gtrsim \widetilde{O}(n) \cdot n^{(1-\delta)(D-1)}$ for $2^{n^\delta}$-time algorithms for all $\delta$.

Submitted 16 October, 2021; originally announced October 2021.

arXiv:2109.15150 [pdf, other]

A polyurethane-urea elastomer at low to extreme strain rates

Authors: Jaehee Lee, David Veysset, Alex J. Hsieh, Gregory C. Rutledge, Hansohl Cho

Abstract: A finite strain nonlinear constitutive model is presented to study the extreme mechanical behavior of a polyurethane-urea well suited for many engineering applications. The micromechanically and thermodynamically based constitutive model captures salient features of resilience and dissipation in the material at low to high strain rates. The extreme deformation features are further elucidated by laser-induced micro-particle impact tests on the material, in which an ultrafast strain rate ($>10^6$ s$^{-1}$) occurs. Numerical simulations of the strongly inhomogeneous deformation events are in good agreement with the experimental data, supporting the predictive capabilities of the constitutive model for the extreme deformation features of the PUU material over at least 9 orders of magnitude in strain rate ($10^{-3}$ to $10^{6}$ s$^{-1}$).

Submitted 19 April, 2023; v1 submitted 30 September, 2021; originally announced September 2021.

arXiv:2109.06638 [pdf, other]

Learnable Discrete Wavelet Pooling (LDW-Pooling) For Convolutional Networks

Authors: Bor-Shiun Wang, Jun-Wei Hsieh, Ming-Ching Chang, Ping-Yang Chen, Lipeng Ke, Siwei Lyu

Abstract: Pooling is a simple but essential layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers with fewer options. We introduce Learnable Discrete Wavelet Pooling (LDW-Pooling), which can be applied universally to replace standard pooling operations to better extract features with improved accuracy and efficiency. Motivated by wavelet theory, we adopt low-pass (L) and high-pass (H) filters horizontally and vertically for pooling on a 2D feature map. Feature signals are decomposed into four subbands (LL, LH, HL, HH) to retain features better and avoid information dropping. The wavelet transform ensures that features after pooling can be fully preserved and recovered. We next adopt energy-based attention learning to fine-select crucial and representative features. LDW-Pooling is effective and efficient compared with other state-of-the-art pooling techniques such as WaveletPooling and LiftPooling. Extensive experimental validation shows that LDW-Pooling can be applied to a wide range of standard CNN architectures and consistently outperforms standard (max, mean, mixed, and stochastic) pooling operations.

Submitted 20 October, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted by BMVC 2021
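
The decomposition described in the abstract can be sketched in Python/NumPy: low-pass and high-pass filtering along both axes, giving four half-resolution subbands from which the input is fully recoverable. The sketch assumes fixed orthonormal Haar filters on a single 2D map. LDW-Pooling learns its filters and adds energy-based attention, and neither is shown. The LH/HL naming convention also varies between sources. The helper names are hypothetical.

```python
import numpy as np

def haar_pool(x):
    """One-level 2D Haar decomposition of an (H, W) map with even H and W.

    Returns four half-resolution subbands (LL, LH, HL, HH).
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-pass in both directions
    lh = (a - b + c - d) / 2  # high-pass horizontally
    hl = (a + b - c - d) / 2  # high-pass vertically
    hh = (a - b - c + d) / 2  # high-pass in both directions
    return ll, lh, hl, hh

def haar_unpool(ll, lh, hl, hh):
    """Inverse transform: rebuild the full-resolution map from the four subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

x = np.arange(16, dtype=float).reshape(4, 4)
bands = haar_pool(x)
print([b.shape for b in bands])             # [(2, 2), (2, 2), (2, 2), (2, 2)]
print(np.allclose(haar_unpool(*bands), x))  # True
```

Because the Haar transform is orthonormal, the total energy (sum of squares) of the four subbands equals that of the input. Energy is therefore a natural quantity for weighting subbands.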

arXiv:2108.09996 [pdf, other]

MS-DARTS: Mean-Shift Based Differentiable Architecture Search

Authors: Jun-Wei Hsieh, Ming-Ching Chang, Ping-Yang Chen, Santanu Santra, Cheng-Han Chou, Chih-Sheng Huang

Abstract: Differentiable Architecture Search (DARTS) is an effective continuous relaxation-based network architecture search (NAS) method with low search cost. It has attracted significant attention in Auto-ML research and has become one of the most useful paradigms in NAS. Although DARTS can produce superior efficiency over traditional NAS approaches with better control of complex parameters, it often suffers from stabilization issues, producing deteriorating architectures when discretizing the continuous architecture. We observed a considerable loss of validity, causing a dramatic decline in performance at this final discretization step of DARTS. To address this issue, we propose a Mean-Shift based DARTS (MS-DARTS) to improve stability based on sampling and perturbation. Our approach can improve both the stability and accuracy of DARTS by smoothing the loss landscape and sampling architecture parameters within a suitable bandwidth. We investigate the convergence of our mean-shift approach, together with the effects of bandwidth selection on stability and accuracy. Evaluations performed on CIFAR-10, CIFAR-100, and ImageNet show that MS-DARTS achieves higher performance than other state-of-the-art NAS methods with reduced search cost.

Submitted 9 March, 2022; v1 submitted 23 August, 2021; originally announced August 2021.

Comments: 14 pages
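
The discretization problem the abstract describes can be illustrated with a toy DARTS edge. The relaxed edge outputs a softmax-weighted mix of candidate operations, but the final step keeps only the argmax operation. When architecture weights are nearly tied, the two can differ substantially. The candidate operations below are hypothetical stand-ins. The Gaussian-perturbed average is only an assumed illustration of "sampling architecture parameters within a bandwidth", not the paper's mean-shift update.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical candidate operations on one edge of a DARTS cell.
OPS = [
    lambda x: x,                 # identity / skip connection
    lambda x: np.zeros_like(x),  # "none" (zero) operation
    lambda x: np.maximum(x, 0),  # stand-in for a parametric op (ReLU here)
]

def mixed_op(x, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate ops."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, OPS))

def discretize(x, alpha):
    """Final DARTS step: keep only the op with the largest alpha."""
    return OPS[int(np.argmax(alpha))](x)

def perturbed_mixed_op(x, alpha, bandwidth, n_samples, rng):
    """Average the mixed op over alphas sampled around alpha (assumed Gaussian kernel)."""
    samples = alpha + bandwidth * rng.standard_normal((n_samples, alpha.size))
    return np.mean([mixed_op(x, a) for a in samples], axis=0)

x = np.array([-1.0, 0.5, 2.0])
alpha = np.array([0.40, 0.35, 0.38])  # nearly tied architecture weights
gap = np.abs(mixed_op(x, alpha) - discretize(x, alpha)).max()
smoothed = perturbed_mixed_op(x, alpha, bandwidth=0.1, n_samples=64,
                              rng=np.random.default_rng(0))
print(round(float(gap), 3), smoothed)
```

With a bandwidth of zero, the perturbed average reduces to the plain mixed operation. The bandwidth therefore controls how much of the loss landscape around the current architecture weights is averaged.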

arXiv:2107.04829 [pdf, other]

CSL-YOLO: A New Lightweight Object Detection System for Edge Computing

Authors: Yu-Ming Zhang, Chun-Chieh Lee, Jun-Wei Hsieh, Kuo-Chin Fan

Abstract: The development of lightweight object detectors is essential due to limited computation resources. To reduce the computation cost, how redundant features are generated plays a significant role. This paper proposes a new lightweight convolution method, the Cross-Stage Lightweight (CSL) Module, to generate redundant features from cheap operations. In the intermediate expansion stage, we replaced Pointwise Convolution with Depthwise Convolution to produce candidate features. The proposed CSL-Module can reduce the computation cost significantly. Experiments conducted on MS-COCO show that the proposed CSL-Module can approximate the fitting ability of Convolution-3x3. Finally, we use the module to construct a lightweight detector, CSL-YOLO, which achieves better detection performance than Tiny-YOLOv4 with only 43% of its FLOPs and 52% of its parameters.

Submitted 10 July, 2021; originally announced July 2021.
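
A multiply-accumulate (MAC) count shows why replacing a pointwise convolution with a depthwise one in the expansion stage cuts cost. Generating $c$ new maps from $c$ inputs costs $H W c^2$ MACs with 1x1 convolutions but $H W \cdot 9c$ with 3x3 depthwise convolutions, a ratio of $c/9$. The layer sizes below are hypothetical. The sketch ignores the rest of the CSL module, such as how candidate features are fused, so it only illustrates the arithmetic.

```python
def pointwise_macs(h, w, c_in, c_out):
    """Multiply-accumulates of a 1x1 (pointwise) convolution."""
    return h * w * c_in * c_out

def depthwise_macs(h, w, c, k=3):
    """Multiply-accumulates of a k x k depthwise convolution (one filter per channel)."""
    return h * w * c * k * k

# Hypothetical expansion stage: produce 64 candidate maps from 64 inputs on a 40x40 map.
h, w, c = 40, 40, 64
pw = pointwise_macs(h, w, c, c)  # 64 -> 64 with 1x1 convs
dw = depthwise_macs(h, w, c)     # 64 -> 64 with 3x3 depthwise convs
print(pw, dw, round(pw / dw, 2))  # 6553600 921600 7.11
```

The savings grow with channel count, since the pointwise cost scales with $c^2$ while the depthwise cost scales with $c$.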

arXiv:2106.12710 [pdf, other]

Certifying solution geometry in random CSPs: counts, clusters and balance

Authors: Jun-Ting Hsieh, Sidhanth Mohanty, Jeff Xu

Abstract: An active topic in the study of random constraint satisfaction problems (CSPs) is the geometry of the space of satisfying or almost satisfying assignments as a function of the density, for which a precise landscape of predictions has been made via statistical physics-based heuristics. In parallel, there has been a recent flurry of work on refuting random constraint satisfaction problems, via nailing refutation thresholds for spectral and semidefinite programming-based algorithms, and also on counting solutions to CSPs. Inspired by this, the starting point for our work is the following question: what does the solution space for a random CSP look like to an efficient algorithm? In pursuit of this inquiry, we focus on the following problems about random Boolean CSPs at the densities where they are unsatisfiable but no refutation algorithm is known. 1. Counts. For every Boolean CSP we give algorithms that with high probability certify a subexponential upper bound on the number of solutions. We also give algorithms to certify a bound on the number of large cuts in a Gaussian-weighted graph, and the number of large independent sets in a random $d$-regular graph. 2. Clusters. For Boolean $3$CSPs we give algorithms that with high probability certify an upper bound on the number of clusters of solutions. 3. Balance. We also give algorithms that with high probability certify that there are no "unbalanced" solutions, i.e., solutions where the fraction of $+1$s deviates significantly from $50\%$.
Finally, we also provide hardness evidence suggesting that our algorithms for counting are optimal.

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2012.01724 [pdf, other]

doi 10.1109/TIP.2021.3118953

Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection

Authors: Ping-Yang Chen, Ming-Ching Chang, Jun-Wei Hsieh, Yong-Sheng Chen

Abstract: This paper proposes the Parallel Residual Bi-Fusion Feature Pyramid Network (PRB-FPN) for fast and accurate single-shot object detection. Feature Pyramid (FP) is widely used in recent visual detection; however, the top-down pathway of FP cannot preserve accurate localization due to pooling shifting. The advantage of FP is weakened as deeper backbones with more layers are used. In addition, it cannot maintain accurate detection of both small and large objects at the same time. To address these issues, we propose a new parallel FP structure with bi-directional (top-down and bottom-up) fusion and associated improvements to retain high-quality features for accurate localization. We provide the following design improvements: (1) A parallel bifusion FP structure with a bottom-up fusion module (BFM) to detect both small and large objects at once with high accuracy. (2) A concatenation and re-organization (CORE) module that provides a bottom-up pathway for feature fusion, which leads to the bi-directional fusion FP that can recover lost information from lower-layer feature maps. (3) The CORE feature is further purified to retain richer contextual information. Such CORE purification in both top-down and bottom-up pathways can be finished in only a few iterations. (4) Adding a residual design to CORE leads to a new Re-CORE module that enables easy training and integration with a wide range of deeper or lighter backbones. The proposed network achieves state-of-the-art performance on the UAVDT17 and MS COCO datasets.
Code is available at https://github.com/**yang1117/PRBNet_PyTorch.

Submitted 18 May, 2023; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: Accepted by IEEE Transactions on Image Processing

Journal ref: IEEE Transactions on Image Processing, vol. 30, pp. 9099-9111, 2021

arXiv:1911.12051 [pdf]

Residual Bi-Fusion Feature Pyramid Network for Accurate Single-shot Object Detection

Authors: Ping-Yang Chen, Jun-Wei Hsieh, Chien-Yao Wang, Hong-Yuan Mark Liao, Munkhjargal Gochoo

Abstract: State-of-the-art (SoTA) models have improved the accuracy of object detection by a large margin via an FP (feature pyramid). FP is a top-down aggregation that collects semantically strong features to improve scale invariance in both two-stage and one-stage detectors. However, this top-down pathway cannot preserve accurate object positions due to the shift-effect of pooling. Thus, the advantage of FP in improving detection accuracy disappears when more layers are used. The original FP lacks a bottom-up pathway to offset the lost information from lower-layer feature maps. It performs well in large-sized object detection but poorly in small-sized object detection. A new structure, the "residual feature pyramid", is proposed in this paper. It is bidirectional, fusing both deep and shallow features toward more effective and robust detection of both small-sized and large-sized objects. Due to its "residual" nature, it can be trained and integrated with different backbones (even deeper or lighter ones) more easily than other bi-directional methods. One important property of this residual FP is that accuracy still improves even when more layers are adopted. Extensive experiments on VOC and MS COCO datasets showed that the proposed method achieved SoTA results for highly accurate and efficient object detection.

Submitted 10 December, 2019; v1 submitted 27 November, 2019; originally announced November 2019.

arXiv:1911.11929 [pdf, other]

CSPNet: A New Backbone that can Enhance Learning Capability of CNN

Authors: Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, **-Yang Chen, Jun-Wei Hsieh

Abstract: Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success relies heavily on costly computational resources, which keeps people with inexpensive devices from benefiting from the technology. In this paper, we propose the Cross Stage Partial Network (CSPNet) to mitigate, from the network-architecture perspective, the heavy inference computation required by previous works. We attribute the problem to duplicate gradient information within network optimization. The proposed networks respect the variability of the gradients by integrating feature maps from the beginning and the end of a network stage, which, in our experiments, reduces computation by 20% with equivalent or even superior accuracy on the ImageNet dataset and significantly outperforms state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset. CSPNet is easy to implement and general enough to cope with architectures based on ResNet, ResNeXt, and DenseNet. Source code is at https://github.com/WongKinYiu/CrossStagePartialNetworks.

Submitted 26 November, 2019; originally announced November 2019.
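The abstract describes integrating feature maps from the beginning and the end of a network stage. A minimal sketch of that idea, assuming the cross-stage-partial pattern of splitting a stage's input channels into two parts, routing only one part through the stage's block, and concatenating the result with the untouched part, is shown below. Channels are plain lists of floats, `csp_stage` and its `split` parameter are hypothetical, and the transition layers and the actual reference implementation are not reproduced.

```python
from typing import Callable, List

Channel = List[float]
Block = Callable[[List[Channel]], List[Channel]]


def csp_stage(x: List[Channel], block: Block, split: float = 0.5) -> List[Channel]:
    """Cross-stage partial stage: only part of the channels pass through the block."""
    k = int(len(x) * split)
    shortcut, routed = x[:k], x[k:]  # the shortcut part skips the (expensive) block
    processed = block(routed)        # the block only sees len(x) - k channels
    # Channel-wise concatenation of features from the stage start and the stage end.
    return shortcut + processed
```

With four channels and the default split, the block processes only two of them; the first two pass through unchanged and are concatenated with the block's output. Halving the channels a block sees is where the computational saving in this toy comes from.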

arXiv:1907.05594 [pdf, ps, other]

doi 10.1109/TII.2019.2928520

Pareto Optimal Demand Response Based on Energy Costs and Load Factor in Smart Grid

Authors: Wei-Yu Chiu, Jui-Ting Hsieh, Chia-Ming Chen

Abstract: Demand response for residential users is essential to the realization of modern smart grids. This paper proposes a multiobjective approach to designing a demand response program that considers both the energy costs of residential users and the load factor of the underlying grid. A multiobjective optimization problem (MOP) is formulated, and Pareto optimality is adopted. Stochastic search methods for generating feasible values of the decision variables are proposed. Theoretical analysis shows that the proposed methods can effectively generate and preserve feasible points during the solution process, which comparable methods can hardly achieve. A multiobjective evolutionary algorithm is developed to solve the MOP, producing a Pareto optimal demand response (PODR) program. Simulations reveal that the proposed method outperforms the comparable methods in terms of energy costs while producing a satisfactory load factor. The proposed PODR program systematically balances the needs of the grid and residential users.

Submitted 12 July, 2019; originally announced July 2019.
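The abstract adopts Pareto optimality over two objectives: residential energy cost (to minimize) and grid load factor (to maximize). A minimal sketch of the dominance test and a Pareto-front filter follows, assuming the standard load-factor definition of average load divided by peak load. It does not reproduce the paper's feasibility-preserving stochastic search or its evolutionary algorithm, and the function names and sample (cost, load factor) pairs are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (energy cost, load factor)


def load_factor(load: List[float]) -> float:
    """Average load divided by peak load; 1.0 means a perfectly flat profile."""
    return (sum(load) / len(load)) / max(load)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, `pareto_front([(10, 0.5), (8, 0.6), (9, 0.7), (12, 0.9), (11, 0.4)])` returns `[(8, 0.6), (9, 0.7), (12, 0.9)]`: each survivor trades a higher cost for a better load factor, so none is strictly preferable to another.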

arXiv:1906.01200 [pdf, other]

Learning Neural PDE Solvers with Convergence Guarantees

Authors: Jun-Ting Hsieh, Shengjia Zhao, Stephan Eismann, Lucia Mirabella, Stefano Ermon

Abstract: Partial differential equations (PDEs) are widely used across the physical and computational sciences. Decades of research and engineering have gone into designing fast iterative solution methods. Existing solvers are general purpose but may be suboptimal for specific classes of problems. In contrast to existing hand-crafted solutions, we propose an approach to learn a fast iterative solver tailored to a specific domain. We achieve this by learning to modify the updates of an existing solver using a deep neural network. Crucially, our approach is proven to preserve strong correctness and convergence guarantees. After training on a single geometry, our model generalizes to a wide variety of geometries and boundary conditions and achieves a 2 to 3 times speedup compared to state-of-the-art solvers.

Submitted 4 June, 2019; originally announced June 2019.
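The abstract states that the learned solver modifies an existing solver's updates without losing correctness. One simple way such a guarantee can arise (shown here as an assumption, not the paper's exact formulation) is to add a correction proportional to Psi(u) - u, the change the base solver would make. At a fixed point of the base solver that change is zero, so the fixed point, and hence the solution, is unchanged. The sketch uses Jacobi iteration for -u'' = f in 1D with fixed boundary values and replaces the learned network with a hypothetical fixed scalar `omega`. A learned operator would additionally have to be trained so that the modified iteration still converges.

```python
from typing import Callable, List

Vec = List[float]


def jacobi_step(u: Vec, f: Vec, h: float) -> Vec:
    """One Jacobi sweep for -u'' = f on a uniform grid; u[0] and u[-1] are boundary values."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return new


def corrected_step(u: Vec, f: Vec, h: float, omega: float) -> Vec:
    """Base update plus a correction proportional to (psi - u), which vanishes at a fixed point."""
    psi = jacobi_step(u, f, h)
    # Boundary entries satisfy psi == u, so the correction never moves them.
    return [p + omega * (p - x) for p, x in zip(psi, u)]


def solve(step: Callable[[Vec], Vec], u0: Vec, iters: int) -> Vec:
    """Apply an update rule a fixed number of times."""
    u = u0
    for _ in range(iters):
        u = step(u)
    return u
```

With f = 2 on [0, 1], zero boundary values, and h = 1/8, both `solve(lambda u: jacobi_step(u, f, h), ...)` and the corrected iteration with, say, `omega = -0.2` converge to the grid values of x * (1 - x), which the central-difference scheme reproduces exactly for this quadratic.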

arXiv:1812.00169 [pdf, other]

Vision-Based Gait Analysis for Senior Care

Authors: David Xue, Anin Sayana, Evan Darke, Kelly Shen, Jun-Ting Hsieh, Zelun Luo, Li-Jia Li, N. Lance Downing, Arnold Milstein, Li Fei-Fei

Abstract: As the senior population rapidly increases, it is challenging yet crucial to provide effective long-term care for seniors who live at home or in senior care facilities. Smart senior homes, which have gained widespread interest in the healthcare community, have been proposed to improve the well-being of seniors living independently. In particular, non-intrusive, cost-effective sensors placed in these senior homes enable gait characterization, which can provide clinically relevant information including mobility level and early neurodegenerative disease risk. In this paper, we present a method to perform gait analysis from a single camera placed within the home. We show that we can accurately calculate various gait parameters, demonstrating the potential for our system to monitor the long-term gait of seniors and thus aid clinicians in understanding a patient's medical profile.

Submitted 1 December, 2018; originally announced December 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/78

Showing 1–50 of 66 results for author: Hsieh, J