Search | arXiv e-print repository

FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination

Authors: Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying **, Mingyu Huang, Xiangyang Li, Shuhuan Mei, Shuqiang Jiang

Abstract: Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery a… ▽ More Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery and understanding. Towards this goal, for powerful capabilities across various domains and tasks in Large Language Models (LLMs), we introduce Food-oriented LLM FoodSky to comprehend food data through perception and reasoning. Considering the complexity and typicality of Chinese cuisine, we first construct one comprehensive Chinese food corpus FoodEarth from various authoritative sources, which can be leveraged by FoodSky to achieve deep understanding of food-related data. We then propose Topic-based Selective State Space Model (TS3M) and the Hierarchical Topic Retrieval Augmented Generation (HTRAG) mechanism to enhance FoodSky in capturing fine-grained food semantics and generating context-aware food-relevant text, respectively. Our extensive evaluations demonstrate that FoodSky significantly outperforms general-purpose LLMs in both chef and dietetic examinations, with an accuracy of 67.2% and 66.4% on the Chinese National Chef Exam and the National Dietetic Exam, respectively. FoodSky not only promises to enhance culinary creativity and promote healthier eating patterns, but also sets a new standard for domain-specific LLMs that address complex real-world issues in the food domain. An online demonstration of FoodSky is available at http://222.92.101.211:8200. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 32 pages, 19 figures

arXiv:2405.02207 [pdf]

Water Structure and Electric Fields at the Interface of Oil Droplets

Authors: Lixue Shi, R. Allen LaCour, Xiaoqi Lang, Joseph P. Heindel, Teresa Head-Gordon, Wei Min

Abstract: Mesoscale water-hydrophobic interfaces are of fundamental importance in multiple disciplines, but their molecular properties have remained elusive for decades due to experimental complications and alternate theoretical explanations. Surface-specific spectroscopies, such as vibrational sum-frequency techniques, suffer from either sample preparation issues or the need for complex spectral correction… ▽ More Mesoscale water-hydrophobic interfaces are of fundamental importance in multiple disciplines, but their molecular properties have remained elusive for decades due to experimental complications and alternate theoretical explanations. Surface-specific spectroscopies, such as vibrational sum-frequency techniques, suffer from either sample preparation issues or the need for complex spectral corrections. Here, we report on a robust "in solution" interface-selective Raman spectroscopy approach using multivariate curve resolution to probe hexadecane in water emulsions. Computationally, we use the recently developed monomer field model for Raman spectroscopy to help interpret the interfacial spectra. Unlike with vibrational sum frequency techniques, our interfacial spectra are readily comparable to the spectra of bulk water, yielding new insights. The combination of experiment and theory show that the interface leads to reduced tetrahedral order and weaker hydrogen bonding, giving rise to a substantial water population with dangling OH at the interface. Additionally, the stretching mode of these free OH experiences a ~80 cm-1 red-shift due to a strong electric field which we attribute to the negative zeta potential that is general to oil droplets. These findings are either opposite to, or absent in, the molecular hydrophobic interface formed by small solutes. Together, water structural disorder and enhanced electrostatics are an emergent feature at the mesoscale interface of oil-water emulsions, with an estimated interfacial electric field of ~35-70 MV/cm that is important for chemical reactivity. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2403.10910 [pdf, other]

Graph Regularized NMF with L20-norm for Unsupervised Feature Learning

Authors: Zhen Wang, Wenwen Min

Abstract: Nonnegative Matrix Factorization (NMF) is a widely applied technique in the fields of machine learning and data mining. Graph Regularized Non-negative Matrix Factorization (GNMF) is an extension of NMF that incorporates graph regularization constraints. GNMF has demonstrated exceptional performance in clustering and dimensionality reduction, effectively discovering inherent low-dimensional structu… ▽ More Nonnegative Matrix Factorization (NMF) is a widely applied technique in the fields of machine learning and data mining. Graph Regularized Non-negative Matrix Factorization (GNMF) is an extension of NMF that incorporates graph regularization constraints. GNMF has demonstrated exceptional performance in clustering and dimensionality reduction, effectively discovering inherent low-dimensional structures embedded within high-dimensional spaces. However, the sensitivity of GNMF to noise limits its stability and robustness in practical applications. In order to enhance feature sparsity and mitigate the impact of noise while mining row sparsity patterns in the data for effective feature selection, we introduce the $\ell_{2,0}$-norm constraint as the sparsity constraints for GNMF. We propose an unsupervised feature learning framework based on GNMF\_$\ell_{20}$ and devise an algorithm based on PALM and its accelerated version to address this problem. Additionally, we establish the convergence of the proposed algorithms and validate the efficacy and superiority of our approach through experiments conducted on both simulated and real image data. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: Submitted to IEEE Trans journal

arXiv:2403.10863 [pdf, other]

stMCDI: Masked Conditional Diffusion Model with Graph Neural Network for Spatial Transcriptomics Data Imputation

Authors: Xiaoyu Li, Wenwen Min, Shunfang Wang, Changmiao Wang, Taosheng Xu

Abstract: Spatially resolved transcriptomics represents a significant advancement in single-cell analysis by offering both gene expression data and their corresponding physical locations. However, this high degree of spatial resolution entails a drawback, as the resulting spatial transcriptomic data at the cellular level is notably plagued by a high incidence of missing values. Furthermore, most existing im… ▽ More Spatially resolved transcriptomics represents a significant advancement in single-cell analysis by offering both gene expression data and their corresponding physical locations. However, this high degree of spatial resolution entails a drawback, as the resulting spatial transcriptomic data at the cellular level is notably plagued by a high incidence of missing values. Furthermore, most existing imputation methods either overlook the spatial information between spots or compromise the overall gene expression data distribution. To address these challenges, our primary focus is on effectively utilizing the spatial location information within spatial transcriptomic data to impute missing values, while preserving the overall data distribution. We introduce \textbf{stMCDI}, a novel conditional diffusion model for spatial transcriptomics data imputation, which employs a denoising network trained using randomly masked data portions as guidance, with the unmasked data serving as conditions. Additionally, it utilizes a GNN encoder to integrate the spatial position information, thereby enhancing model performance. The results obtained from spatial transcriptomics datasets elucidate the performance of our methods relative to existing approaches. △ Less

Submitted 16 March, 2024; originally announced March 2024.

Comments: Submitted to IJCAI2024

arXiv:2402.09242 [pdf, other]

doi 10.1109/TIP.2024.3360899

Synthesizing Knowledge-enhanced Features for Real-world Zero-shot Food Detection

Authors: Pengfei Zhou, Weiqing Min, Jiajun Song, Yang Zhang, Shuqiang Jiang

Abstract: Food computing brings various perspectives to computer vision like vision-based food analysis for nutrition and health. As a fundamental task in food computing, food detection needs Zero-Shot Detection (ZSD) on novel unseen food objects to support real-world scenarios, such as intelligent kitchens and smart restaurants. Therefore, we first benchmark the task of Zero-Shot Food Detection (ZSFD) by i… ▽ More Food computing brings various perspectives to computer vision like vision-based food analysis for nutrition and health. As a fundamental task in food computing, food detection needs Zero-Shot Detection (ZSD) on novel unseen food objects to support real-world scenarios, such as intelligent kitchens and smart restaurants. Therefore, we first benchmark the task of Zero-Shot Food Detection (ZSFD) by introducing FOWA dataset with rich attribute annotations. Unlike ZSD, fine-grained problems in ZSFD like inter-class similarity make synthesized features inseparable. The complexity of food semantic attributes further makes it more difficult for current ZSD methods to distinguish various food categories. To address these problems, we propose a novel framework ZSFDet to tackle fine-grained problems by exploiting the interaction between complex attributes. Specifically, we model the correlation between food categories and attributes in ZSFDet by multi-source graphs to provide prior knowledge for distinguishing fine-grained features. Within ZSFDet, Knowledge-Enhanced Feature Synthesizer (KEFS) learns knowledge representation from multiple sources (e.g., ingredients correlation from knowledge graph) via the multi-source graph fusion. Conditioned on the fusion of semantic knowledge representation, the region feature diffusion model in KEFS can generate fine-grained features for training the effective zero-shot detector. Extensive evaluations demonstrate the superior performance of our method ZSFDet on FOWA and the widely-used food dataset UECFOOD-256, with significant improvements by 1.8% and 3.7% ZSD mAP compared with the strong baseline RRFS. Further experiments on PASCAL VOC and MS COCO prove that enhancement of the semantic knowledge can also improve the performance on general ZSD. Code and dataset are available at https://github.com/LanceZPF/KEFS. △ Less

Submitted 14 February, 2024; originally announced February 2024.

Comments: 14 pages, accepted by IEEE Transactions on Image Processing (2024)

arXiv:2312.07473 [pdf]

Quantum Mechanical Treatment of Stimulated Raman Cross Sections

Authors: Wei Min, Xin Gao

Abstract: Stimulated Raman scattering (SRS) has played an increasingly pivotal role in chemistry and photonics. Recently, understanding of light-molecule interaction during SRS was brought to a new quantitative level through the introduction of stimulated Raman cross section, $σ_{SRS}$. Measurements of Raman-active molecules have revealed interesting insights, and theoretical consideration has suggested an… ▽ More Stimulated Raman scattering (SRS) has played an increasingly pivotal role in chemistry and photonics. Recently, understanding of light-molecule interaction during SRS was brought to a new quantitative level through the introduction of stimulated Raman cross section, $σ_{SRS}$. Measurements of Raman-active molecules have revealed interesting insights, and theoretical consideration has suggested an Einstein-coefficient-like relation between $σ_{SRS}$ and the commonly used spontaneous Raman cross sections, $σ_{Raman}$. However, the theoretical underpinning of $σ_{SRS}$ is not known. Herein we provide a full quantum mechanical treatment for $σ_{SRS}$, via both a semi-classical method and a quantum electrodynamic (QED) method. The resulting formula provides a rigorous theory to predict experimental outcome from first principles, and unveils key physical factors rendering $σ_{SRS}$ inherently strong response. Through this formula, we also confirm the validity of the Einstein-coefficient-like equation connecting $σ_{Raman}$ and $σ_{SRS}$ reported earlier, and discuss the inherent symmetry between all spontaneous and stimulated optical processes. Hence the present treatment shall deepen the fundamental understanding of the molecular response during SRS, and facilitate quantitative applications in various experiments. △ Less

Submitted 12 December, 2023; originally announced December 2023.

Comments: 19 pages, 2 figures

arXiv:2311.15328 [pdf, other]

BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images

Authors: Zhanghao Chen, Yifei Sun, Wenjian Qin, Ruiquan Ge, Cheng Pan, Wenming Deng, Zhou Liu, Wenwen Min, Ahmed Elazab, Xiang Wan, Changmiao Wang

Abstract: Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The current dual-energy subtraction imaging technique in t… ▽ More Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The current dual-energy subtraction imaging technique in the clinic requires costly equipment and subjects being exposed to high radiation. To circumvent these issues, deep learning-based image generation algorithms have been proposed. However, existing methods fall short in terms of producing high-quality images and capturing texture details, particularly with pulmonary vessels. To address these issues, this paper proposes a new bone suppression framework, termed BS-Diff, that comprises a conditional diffusion model equipped with a U-Net architecture and a simple enhancement module to incorporate an autoencoder. Our proposed network cannot only generate soft tissue images with a high bone suppression rate but also possesses the capability to capture fine image details. Additionally, we compiled the largest dataset since 2010, including data from 120 patients with high-definition, high-resolution paired CXRs and soft tissue images collected by our affiliated hospital. Extensive experiments, comparative analyses, ablation studies, and clinical evaluations indicate that the proposed BS-Diff outperforms several bone-suppression models across multiple metrics. Our code can be accessed at https://github.com/Benny0323/BS-Diff. △ Less

Submitted 28 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 5 pages, 2 figures, accepted by IEEE ISBI 2024

arXiv:2311.02400 [pdf, other]

From Plate to Production: Artificial Intelligence in Modern Consumer-Driven Food Systems

Authors: Weiqing Min, Pengfei Zhou, Leyi Xu, Tao Liu, Tianhao Li, Mingyu Huang, Ying **, Yifan Yi, Min Wen, Shuqiang Jiang, Ramesh Jain

Abstract: Global food systems confront the urgent challenge of supplying sustainable, nutritious diets in the face of escalating demands. The advent of Artificial Intelligence (AI) is bringing in a personal choice revolution, wherein AI-driven individual decisions transform food systems from dinner tables, to the farms, and back to our plates. In this context, AI algorithms refine personal dietary choices,… ▽ More Global food systems confront the urgent challenge of supplying sustainable, nutritious diets in the face of escalating demands. The advent of Artificial Intelligence (AI) is bringing in a personal choice revolution, wherein AI-driven individual decisions transform food systems from dinner tables, to the farms, and back to our plates. In this context, AI algorithms refine personal dietary choices, subsequently sha** agricultural outputs, and promoting an optimized feedback loop from consumption to cultivation. Initially, we delve into AI tools and techniques spanning the food supply chain, and subsequently assess how AI subfields$\unicode{x2013}$encompassing machine learning, computer vision, and speech recognition$\unicode{x2013}$are harnessed within the AI-enabled Food System (AIFS) framework, which increasingly leverages Internet of Things, multimodal sensors and real-time data exchange. We spotlight the AIFS framework, emphasizing its fusion of AI with technologies such as digitalization, big data analytics, biotechnology, and IoT extensively used in modern food systems in every component. This paradigm shifts the conventional "farm to fork" narrative to a cyclical "consumer-driven farm to fork" model for better achieving sustainable, nutritious diets. This paper explores AI's promise and the intrinsic challenges it poses within the food domain. By championing stringent AI governance, uniform data architectures, and cross-disciplinary partnerships, we argue that AI, when synergized with consumer-centric strategies, holds the potential to steer food systems toward a sustainable trajectory. We furnish a comprehensive survey for the state-of-the-art in diverse facets of food systems, subsequently pinpointing gaps and advocating for the judicious and efficacious deployment of emergent AI methodologies. △ Less

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2310.04689 [pdf, other]

doi 10.1145/3581783.3612661

SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food Detection

Authors: Pengfei Zhou, Weiqing Min, Yang Zhang, Jiajun Song, Ying **, Shuqiang Jiang

Abstract: Food detection is becoming a fundamental task in food computing that supports various multimedia applications, including food recommendation and dietary monitoring. To deal with real-world scenarios, food detection needs to localize and recognize novel food objects that are not seen during training, demanding Zero-Shot Detection (ZSD). However, the complexity of semantic attributes and intra-class… ▽ More Food detection is becoming a fundamental task in food computing that supports various multimedia applications, including food recommendation and dietary monitoring. To deal with real-world scenarios, food detection needs to localize and recognize novel food objects that are not seen during training, demanding Zero-Shot Detection (ZSD). However, the complexity of semantic attributes and intra-class feature diversity poses challenges for ZSD methods in distinguishing fine-grained food classes. To tackle this, we propose the Semantic Separable Diffusion Synthesizer (SeeDS) framework for Zero-Shot Food Detection (ZSFD). SeeDS consists of two modules: a Semantic Separable Synthesizing Module (S$^3$M) and a Region Feature Denoising Diffusion Model (RFDDM). The S$^3$M learns the disentangled semantic representation for complex food attributes from ingredients and cuisines, and synthesizes discriminative food features via enhanced semantic information. The RFDDM utilizes a novel diffusion model to generate diversified region features and enhances ZSFD via fine-grained synthesized features. Extensive experiments show the state-of-the-art ZSFD performance of our proposed method on two food datasets, ZSFooD and UECFOOD-256. Moreover, SeeDS also maintains effectiveness on general ZSD datasets, PASCAL VOC and MS COCO. The code and dataset can be found at https://github.com/LanceZPF/SeeDS. △ Less

Submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted by ACM Multimedia 2023

arXiv:2309.09239 [pdf, other]

Globally Convergent Accelerated Algorithms for Multilinear Sparse Logistic Regression with $\ell_0$-constraints

Authors: Weifeng Yang, Wenwen Min

Abstract: Tensor data represents a multidimensional array. Regression methods based on low-rank tensor decomposition leverage structural information to reduce the parameter count. Multilinear logistic regression serves as a powerful tool for the analysis of multidimensional data. To improve its efficacy and interpretability, we present a Multilinear Sparse Logistic Regression model with $\ell_0$-constraints… ▽ More Tensor data represents a multidimensional array. Regression methods based on low-rank tensor decomposition leverage structural information to reduce the parameter count. Multilinear logistic regression serves as a powerful tool for the analysis of multidimensional data. To improve its efficacy and interpretability, we present a Multilinear Sparse Logistic Regression model with $\ell_0$-constraints ($\ell_0$-MLSR). In contrast to the $\ell_1$-norm and $\ell_2$-norm, the $\ell_0$-norm constraint is better suited for feature selection. However, due to its nonconvex and nonsmooth properties, solving it is challenging and convergence guarantees are lacking. Additionally, the multilinear operation in $\ell_0$-MLSR also brings non-convexity. To tackle these challenges, we propose an Accelerated Proximal Alternating Linearized Minimization with Adaptive Momentum (APALM$^+$) method to solve the $\ell_0$-MLSR model. We provide a proof that APALM$^+$ can ensure the convergence of the objective function of $\ell_0$-MLSR. We also demonstrate that APALM$^+$ is globally convergent to a first-order critical point as well as establish convergence rate by using the Kurdyka-Lojasiewicz property. Empirical results obtained from synthetic and real-world datasets validate the superior performance of our algorithm in terms of both accuracy and speed compared to other state-of-the-art methods. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2308.12126

arXiv:2308.12126 [pdf, other]

An Accelerated Block Proximal Framework with Adaptive Momentum for Nonconvex and Nonsmooth Optimization

Authors: Weifeng Yang, Wenwen Min

Abstract: We propose an accelerated block proximal linear framework with adaptive momentum (ABPL$^+$) for nonconvex and nonsmooth optimization. We analyze the potential causes of the extrapolation step failing in some algorithms, and resolve this issue by enhancing the comparison process that evaluates the trade-off between the proximal gradient step and the linear extrapolation step in our algorithm. Furth… ▽ More We propose an accelerated block proximal linear framework with adaptive momentum (ABPL$^+$) for nonconvex and nonsmooth optimization. We analyze the potential causes of the extrapolation step failing in some algorithms, and resolve this issue by enhancing the comparison process that evaluates the trade-off between the proximal gradient step and the linear extrapolation step in our algorithm. Furthermore, we extends our algorithm to any scenario involving updating block variables with positive integers, allowing each cycle to randomly shuffle the update order of the variable blocks. Additionally, under mild assumptions, we prove that ABPL$^+$ can monotonically decrease the function value without strictly restricting the extrapolation parameters and step size, demonstrates the viability and effectiveness of updating these blocks in a random order, and we also more obviously and intuitively demonstrate that the derivative set of the sequence generated by our algorithm is a critical point set. Moreover, we demonstrate the global convergence as well as the linear and sublinear convergence rates of our algorithm by utilizing the Kurdyka-Lojasiewicz (KŁ) condition. To enhance the effectiveness and flexibility of our algorithm, we also expand the study to the imprecise version of our algorithm and construct an adaptive extrapolation parameter strategy, which improving its overall performance. We apply our algorithm to multiple non-negative matrix factorization with the $\ell_0$ norm, nonnegative tensor decomposition with the $\ell_0$ norm, and perform extensive numerical experiments to validate its effectiveness and efficiency. △ Less

Submitted 24 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.06740 [pdf, other]

Weighted Sparse Partial Least Squares for Joint Sample and Feature Selection

Authors: Wenwen Min, Taosheng Xu, Chris Ding

Abstract: Sparse Partial Least Squares (sPLS) is a common dimensionality reduction technique for data fusion, which projects data samples from two views by seeking linear combinations with a small number of variables with the maximum variance. However, sPLS extracts the combinations between two data sets with all data samples so that it cannot detect latent subsets of samples. To extend the application of s… ▽ More Sparse Partial Least Squares (sPLS) is a common dimensionality reduction technique for data fusion, which projects data samples from two views by seeking linear combinations with a small number of variables with the maximum variance. However, sPLS extracts the combinations between two data sets with all data samples so that it cannot detect latent subsets of samples. To extend the application of sPLS by identifying a specific subset of samples and remove outliers, we propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection, where the $\ell_\infty/\ell_0$-norm constrains are used to select a subset of samples. We prove that the $\ell_\infty/\ell_0$-norm constrains have the Kurdyka-Ł{ojasiewicz}~property so that a globally convergent algorithm is developed to solve it. Moreover, multi-view data with a same set of samples can be available in various real problems. To this end, we extend the $\ell_\infty/\ell_0$-wsPLS model and propose two multi-view wsPLS models for multi-view data fusion. We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property. As well as numerical and biomedical data experiments demonstrate the efficiency of the proposed methods. △ Less

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2303.07840 [pdf, other]

doi 10.1109/TIP.2023.3261749

Precise Facial Landmark Detection by Reference Heatmap Transformer

Authors: Jun Wan, Jun Liu, Jie Zhou, Zhihui Lai, Linlin Shen, Hang Sun, ** Xiong, Wenwen Min

Abstract: Most facial landmark detection methods predict landmarks by map** the input facial appearance features to landmark heatmaps and have achieved promising results. However, when the face image is suffering from large poses, heavy occlusions and complicated illuminations, they cannot learn discriminative feature representations and effective facial shape constraints, nor can they accurately predict… ▽ More Most facial landmark detection methods predict landmarks by map** the input facial appearance features to landmark heatmaps and have achieved promising results. However, when the face image is suffering from large poses, heavy occlusions and complicated illuminations, they cannot learn discriminative feature representations and effective facial shape constraints, nor can they accurately predict the value of each element in the landmark heatmap, limiting their detection accuracy. To address this problem, we propose a novel Reference Heatmap Transformer (RHT) by introducing reference heatmap information for more precise facial landmark detection. The proposed RHT consists of a Soft Transformation Module (STM) and a Hard Transformation Module (HTM), which can cooperate with each other to encourage the accurate transformation of the reference heatmap information and facial shape constraints. Then, a Multi-Scale Feature Fusion Module (MSFFM) is proposed to fuse the transformed heatmap features and the semantic features learned from the original face images to enhance feature representations for producing more accurate target heatmaps. To the best of our knowledge, this is the first study to explore how to enhance facial landmark detection by transforming the reference heatmap information. The experimental results from challenging benchmark datasets demonstrate that our proposed method outperforms the state-of-the-art methods in the literature. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE Transactions on Image Processing, March 2023

arXiv:2210.04399 [pdf, other]

Deep Learning for Logo Detection: A Survey

Authors: Sujuan Hou, Jiacheng Li, Weiqing Min, Qiang Hou, Yanna Zhao, Yuanjie Zheng, Shuqiang Jiang

Abstract: When logos are increasingly created, logo detection has gradually become a research hotspot across many domains and tasks. Recent advances in this area are dominated by deep learning-based solutions, where many datasets, learning strategies, network architectures, etc. have been employed. This paper reviews the advance in applying deep learning techniques to logo detection. Firstly, we discuss a c… ▽ More When logos are increasingly created, logo detection has gradually become a research hotspot across many domains and tasks. Recent advances in this area are dominated by deep learning-based solutions, where many datasets, learning strategies, network architectures, etc. have been employed. This paper reviews the advance in applying deep learning techniques to logo detection. Firstly, we discuss a comprehensive account of public datasets designed to facilitate performance evaluation of logo detection algorithms, which tend to be more diverse, more challenging, and more reflective of real life. Next, we perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy. Subsequently, we summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance. Finally, we analyze the potential challenges and present the future directions for the development of logo detection to complete this survey. △ Less

Submitted 9 October, 2022; originally announced October 2022.

arXiv:2209.14624 [pdf, other]

Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning

Authors: Manas Gupta, Efe Camci, Vishandi Rudy Keneta, Abhishek Vaidyanathan, Ritwik Kanodia, Chuan-Sheng Foo, Wu Min, Lin Jie

Abstract: Pruning neural networks has become popular in the last decade when it was shown that a large number of weights can be safely removed from modern neural networks without compromising accuracy. Numerous pruning methods have been proposed since, each claiming to be better than prior art, however, at the cost of increasingly complex pruning methodologies. These methodologies include utilizing importan… ▽ More Pruning neural networks has become popular in the last decade when it was shown that a large number of weights can be safely removed from modern neural networks without compromising accuracy. Numerous pruning methods have been proposed since, each claiming to be better than prior art, however, at the cost of increasingly complex pruning methodologies. These methodologies include utilizing importance scores, getting feedback through back-propagation or having heuristics-based pruning rules amongst others. In this work, we question whether this pattern of introducing complexity is really necessary to achieve better pruning results. We benchmark these SOTA techniques against a simple pruning baseline, namely, Global Magnitude Pruning (Global MP), that ranks weights in order of their magnitudes and prunes the smallest ones. Surprisingly, we find that vanilla Global MP performs very well against the SOTA techniques. When considering sparsity-accuracy trade-off, Global MP performs better than all SOTA techniques at all sparsity ratios. When considering FLOPs-accuracy trade-off, some SOTA techniques outperform Global MP at lower sparsity ratios, however, Global MP starts performing well at high sparsity ratios and performs very well at extremely high sparsity ratios. Moreover, we find that a common issue that many pruning algorithms run into at high sparsity rates, namely, layer-collapse, can be easily fixed in Global MP. We explore why layer collapse occurs in networks and how it can be mitigated in Global MP by utilizing a technique called Minimum Threshold. We showcase the above findings on various models (WRN-28-8, ResNet-32, ResNet-50, MobileNet-V1 and FastGRNN) and multiple datasets (CIFAR-10, ImageNet and HAR-2). Code is available at https://github.com/manasgupta-1/GlobalMP. △ Less

Submitted 7 January, 2024; v1 submitted 29 September, 2022; originally announced September 2022.

arXiv:2204.10614 [pdf, other]

Modelling graph dynamics in fraud detection with "Attention"

Authors: Susie Xi Rao, Clémence Lanfranchi, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Mo Cheng, Yinan Shan, Yang Zhao, Ce Zhang

Abstract: At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions. Despite the variety of different models for deep learning on graphs, few approaches have been proposed for dealing with graphs that are both heterogeneous and dynamic. In this paper, we propose DyHGN (Dynamic Heterogeneous Graph N… ▽ More At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions. Despite the variety of different models for deep learning on graphs, few approaches have been proposed for dealing with graphs that are both heterogeneous and dynamic. In this paper, we propose DyHGN (Dynamic Heterogeneous Graph Neural Network) and its variants to capture both temporal and heterogeneous information. We first construct dynamic heterogeneous graphs from registration and transaction data from eBay. Then, we build models with diachronic entity embedding and heterogeneous graph transformer. We also use model explainability techniques to understand the behaviors of DyHGN-* models. Our findings reveal that modelling graph dynamics with heterogeneous inputs need to be conducted with "attention" depending on the data structure, distribution, and computation cost. △ Less

Submitted 22 April, 2022; originally announced April 2022.

Comments: Manuscript under review. arXiv admin note: text overlap with arXiv:2012.10831

arXiv:2203.04910 [pdf, other]

doi 10.1145/3575693.3575748

GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture

Authors: Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seung Won Min, Amna Masood, Jeongmin Park, **jun Xiong, CJ Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William Dally, Wen-mei Hwu

Abstract: Graphics Processing Units (GPUs) have traditionally relied on the host CPU to initiate access to the data storage. This approach is well-suited for GPU applications with known data access patterns that enable partitioning of their dataset to be processed in a pipelined fashion in the GPU. However, emerging applications such as graph and data analytics, recommender systems, or graph neural networks… ▽ More Graphics Processing Units (GPUs) have traditionally relied on the host CPU to initiate access to the data storage. This approach is well-suited for GPU applications with known data access patterns that enable partitioning of their dataset to be processed in a pipelined fashion in the GPU. However, emerging applications such as graph and data analytics, recommender systems, or graph neural networks, require fine-grained, data-dependent access to storage. CPU initiation of storage access is unsuitable for these applications due to high CPU-GPU synchronization overheads, I/O traffic amplification, and long CPU processing latencies. GPU-initiated storage removes these overheads from the storage control path and, thus, can potentially support these applications at much higher speed. However, there is a lack of systems architecture and software stack that enable efficient GPU-initiated storage access. This work presents a novel system architecture, BaM, that fills this gap. BaM features a fine-grained software cache to coalesce data storage requests while minimizing I/O traffic amplification. This software cache communicates with the storage system via high-throughput queues that enable the massive number of concurrent threads in modern GPUs to make I/O requests at a high rate to fully utilize the storage devices and the system interconnect. Experimental results show that BaM delivers 1.0x and 1.49x end-to-end speed up for BFS and CC graph analytics benchmarks while reducing hardware costs by up to 21.7x over accessing the graph data from the host memory. Furthermore, BaM speeds up data-analytics workloads by 5.3x over CPU-initiated storage access on the same hardware. △ Less

Submitted 6 February, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: This is an extension to the published conference paper at ASPLOS'23: https://dl.acm.org/doi/abs/10.1145/3575693.3575748

Journal ref: ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2

arXiv:2203.04559 [pdf, other]

Source-free Video Domain Adaptation by Learning Temporal Consistency for Action Recognition

Authors: Yuecong Xu, Jianfei Yang, Haozhi Cao, Keyu Wu, Wu Min, Zhenghua Chen

Abstract: Video-based Unsupervised Domain Adaptation (VUDA) methods improve the robustness of video models, enabling them to be applied to action recognition tasks across different environments. However, these methods require constant access to source data during the adaptation process. Yet in many real-world applications, subjects and scenes in the source video domain should be irrelevant to those in the t… ▽ More Video-based Unsupervised Domain Adaptation (VUDA) methods improve the robustness of video models, enabling them to be applied to action recognition tasks across different environments. However, these methods require constant access to source data during the adaptation process. Yet in many real-world applications, subjects and scenes in the source video domain should be irrelevant to those in the target video domain. With the increasing emphasis on data privacy, such methods that require source data access would raise serious privacy issues. Therefore, to cope with such concern, a more practical domain adaptation scenario is formulated as the Source-Free Video-based Domain Adaptation (SFVDA). Though there are a few methods for Source-Free Domain Adaptation (SFDA) on image data, these methods yield degenerating performance in SFVDA due to the multi-modality nature of videos, with the existence of additional temporal features. In this paper, we propose a novel Attentive Temporal Consistent Network (ATCoN) to address SFVDA by learning temporal consistency, guaranteed by two novel consistency objectives, namely feature consistency and source prediction consistency, performed across local temporal features. ATCoN further constructs effective overall temporal features by attending to local temporal features based on prediction confidence. Empirical results demonstrate the state-of-the-art performance of ATCoN across various cross-domain action recognition benchmarks. △ Less

Submitted 11 July, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted by ECCV 2022, update to camera-ready version with updated title. 22 pages, 5 figures, 7 tables

arXiv:2111.05894 [pdf, other]

Graph Neural Network Training with Data Tiering

Authors: Seung Won Min, Kun Wu, Mert Hidayetoğlu, **jun Xiong, Xiang Song, Wen-mei Hwu

Abstract: Graph Neural Networks (GNNs) have shown success in learning from graph-structured data, with applications to fraud detection, recommendation, and knowledge graph reasoning. However, training GNN efficiently is challenging because: 1) GPU memory capacity is limited and can be insufficient for large datasets, and 2) the graph-based data structure causes irregular data access patterns. In this work,… ▽ More Graph Neural Networks (GNNs) have shown success in learning from graph-structured data, with applications to fraud detection, recommendation, and knowledge graph reasoning. However, training GNN efficiently is challenging because: 1) GPU memory capacity is limited and can be insufficient for large datasets, and 2) the graph-based data structure causes irregular data access patterns. In this work, we provide a method to statistical analyze and identify more frequently accessed data ahead of GNN training. Our data tiering method not only utilizes the structure of input graph, but also an insight gained from actual GNN training process to achieve a higher prediction result. With our data tiering method, we additionally provide a new data placement and access strategy to further minimize the CPU-GPU communication overhead. We also take into account of multi-GPU GNN training as well and we demonstrate the effectiveness of our strategy in a multi-GPU system. The evaluation results show that our work reduces CPU-GPU traffic by 87-95% and improves the training speed of GNN over the existing solutions by 1.6-2.1x on graphs with hundreds of millions of nodes and billions of edges. △ Less

Submitted 10 November, 2021; originally announced November 2021.

arXiv:2108.13775

Discriminative Semantic Feature Pyramid Network with Guided Anchoring for Logo Detection

Authors: Baisong Zhang, Weiqing Min, **g Wang, Sujuan Hou, Qiang Hou, Yuanjie Zheng, Shuqiang Jiang

Abstract: Recently, logo detection has received more and more attention for its wide applications in the multimedia field, such as intellectual property protection, product brand management, and logo duration monitoring. Unlike general object detection, logo detection is a challenging task, especially for small logo objects and large aspect ratio logo objects in the real-world scenario. In this paper, we pr… ▽ More Recently, logo detection has received more and more attention for its wide applications in the multimedia field, such as intellectual property protection, product brand management, and logo duration monitoring. Unlike general object detection, logo detection is a challenging task, especially for small logo objects and large aspect ratio logo objects in the real-world scenario. In this paper, we propose a novel approach, named Discriminative Semantic Feature Pyramid Network with Guided Anchoring (DSFP-GA), which can address these challenges via aggregating the semantic information and generating different aspect ratio anchor boxes. More specifically, our approach mainly consists of Discriminative Semantic Feature Pyramid (DSFP) and Guided Anchoring (GA). Considering that low-level feature maps that are used to detect small logo objects lack semantic information, we propose the DSFP, which can enrich more discriminative semantic features of low-level feature maps and can achieve better performance on small logo objects. Furthermore, preset anchor boxes are less efficient for detecting large aspect ratio logo objects. We therefore integrate the GA into our method to generate large aspect ratio anchor boxes to mitigate this issue. Extensive experimental results on four benchmarks demonstrate the effectiveness of our proposed DSFP-GA. Moreover, we further conduct visual analysis and ablation studies to illustrate the advantage of our method in detecting small and large aspect logo objects. The code and models can be found at https://github.com/Zhangbaisong/DSFP-GA. △ Less

Submitted 6 January, 2023; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: We are very sorry that the result of the whole experiment is wrong because of the wrong derivation of Equation 3, and we would like to withdraw the manuscript to stop the propagation of the mistake

arXiv:2108.04644 [pdf, other]

doi 10.1145/3474085.3475289

FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via Multi-Scale Feature Decoupling Network

Authors: Qiang Hou, Weiqing Min, **g Wang, Sujuan Hou, Yuanjie Zheng, Shuqiang Jiang

Abstract: Food logo detection plays an important role in the multimedia for its wide real-world applications, such as food recommendation of the self-service shop and infringement detection on e-commerce platforms. A large-scale food logo dataset is urgently needed for develo** advanced food logo detection algorithms. However, there are no available food logo datasets with food brand information. To suppo… ▽ More Food logo detection plays an important role in the multimedia for its wide real-world applications, such as food recommendation of the self-service shop and infringement detection on e-commerce platforms. A large-scale food logo dataset is urgently needed for develo** advanced food logo detection algorithms. However, there are no available food logo datasets with food brand information. To support efforts towards food logo detection, we introduce the dataset FoodLogoDet-1500, a new large-scale publicly available food logo dataset, which has 1,500 categories, about 100,000 images and about 150,000 manually annotated food logo objects. We describe the collection and annotation process of FoodLogoDet-1500, analyze its scale and diversity, and compare it with other logo datasets. To the best of our knowledge, FoodLogoDet-1500 is the first largest publicly available high-quality dataset for food logo detection. The challenge of food logo detection lies in the large-scale categories and similarities between food logo categories. For that, we propose a novel food logo detection method Multi-scale Feature Decoupling Network (MFDNet), which decouples classification and regression into two branches and focuses on the classification branch to solve the problem of distinguishing multiple food logo categories. Specifically, we introduce the feature offset module, which utilizes the deformation-learning for optimal classification offset and can effectively obtain the most representative features of classification in detection. In addition, we adopt a balanced feature pyramid in MFDNet, which pays attention to global information, balances the multi-scale feature maps, and enhances feature extraction capability. Comprehensive experiments on FoodLogoDet-1500 and other two benchmark logo datasets demonstrate the effectiveness of the proposed method. The FoodLogoDet-1500 can be found at this https URL. △ Less

Submitted 10 August, 2021; originally announced August 2021.

Comments: This paper has been accepted to ACM MM 2021. The FoodLogoDet-1500, see https://github.com/hq03/FoodLogoDet-1500-Dataset

arXiv:2108.02947 [pdf, other]

doi 10.1016/j.tifs.2022.02.017

A review on vision-based analysis for automatic dietary assessment

Authors: Wei Wang, Weiqing Min, Tianhao Li, Xiaoxiao Dong, Haisheng Li, Shuqiang Jiang

Abstract: Background: Maintaining a healthy diet is vital to avoid health-related issues, e.g., undernutrition, obesity and many non-communicable diseases. An indispensable part of the health diet is dietary assessment. Traditional manual recording methods are not only burdensome but time-consuming, and contain substantial biases and errors. Recent advances in Artificial Intelligence (AI), especially comput… ▽ More Background: Maintaining a healthy diet is vital to avoid health-related issues, e.g., undernutrition, obesity and many non-communicable diseases. An indispensable part of the health diet is dietary assessment. Traditional manual recording methods are not only burdensome but time-consuming, and contain substantial biases and errors. Recent advances in Artificial Intelligence (AI), especially computer vision technologies, have made it possible to develop automatic dietary assessment solutions, which are more convenient, less time-consuming and even more accurate to monitor daily food intake. Scope and approach: This review presents Vision-Based Dietary Assessment (VBDA) architectures, including multi-stage architecture and end-to-end one. The multi-stage dietary assessment generally consists of three stages: food image analysis, volume estimation and nutrient derivation. The prosperity of deep learning makes VBDA gradually move to an end-to-end implementation, which applies food images to a single network to directly estimate the nutrition. The recently proposed end-to-end methods are also discussed. We further analyze existing dietary assessment datasets, indicating that one large-scale benchmark is urgently needed, and finally highlight critical challenges and future trends for VBDA. Key findings and conclusions: After thorough exploration, we find that multi-task end-to-end deep learning approaches are one important trend of VBDA. Despite considerable research progress, many challenges remain for VBDA due to the meal complexity. We also provide the latest ideas for future development of VBDA, e.g., fine-grained food analysis and accurate volume estimation. This review aims to encourage researchers to propose more practical solutions for VBDA. △ Less

Submitted 6 March, 2022; v1 submitted 6 August, 2021; originally announced August 2021.

Comments: Accepted by Trends in Food Science & Technology

arXiv:2107.05869 [pdf, other]

doi 10.1016/j.patter.2022.100484

Applications of knowledge graphs for food science and industry

Authors: Weiqing Min, Chunlin Liu, Leyi Xu, Shuqiang Jiang

Abstract: The deployment of various networks (e.g., Internet of Things [IoT] and mobile networks), databases (e.g., nutrition tables and food compositional databases), and social media (e.g., Instagram and Twitter) generates huge amounts of food data, which present researchers with an unprecedented opportunity to study various problems and applications in food science and industry via data-driven computatio… ▽ More The deployment of various networks (e.g., Internet of Things [IoT] and mobile networks), databases (e.g., nutrition tables and food compositional databases), and social media (e.g., Instagram and Twitter) generates huge amounts of food data, which present researchers with an unprecedented opportunity to study various problems and applications in food science and industry via data-driven computational methods. However, these multi-source heterogeneous food data appear as information silos, leading to difficulty in fully exploiting these food data. The knowledge graph provides a unified and standardized conceptual terminology in a structured form, and thus can effectively organize these food data to benefit various applications. In this review, we provide a brief introduction to knowledge graphs and the evolution of food knowledge organization mainly from food ontology to food knowledge graphs. We then summarize seven representative applications of food knowledge graphs, such as new recipe development, diet-disease correlation discovery, and personalized dietary recommendation. We also discuss future directions in this field, such as multimodal food knowledge graph construction and food knowledge graphs for human health. △ Less

Submitted 16 May, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: 45 pages, 6 figures

Journal ref: Patterns Volume 3, Issue 5 (2022) 100484

arXiv:2104.13171 [pdf, other]

Structured Sparse Non-negative Matrix Factorization with L20-Norm for scRNA-seq Data Analysis

Authors: Wenwen Min, Taosheng Xu, Xiang Wan, Tsung-Hui Chang

Abstract: Non-negative matrix factorization (NMF) is a powerful tool for dimensionality reduction and clustering. Unfortunately, the interpretation of the clustering results from NMF is difficult, especially for the high-dimensional biological data without effective feature selection. In this paper, we first introduce a row-sparse NMF with $\ell_{2,0}$-norm constraint (NMF_$\ell_{20}$), where the basis matr… ▽ More Non-negative matrix factorization (NMF) is a powerful tool for dimensionality reduction and clustering. Unfortunately, the interpretation of the clustering results from NMF is difficult, especially for the high-dimensional biological data without effective feature selection. In this paper, we first introduce a row-sparse NMF with $\ell_{2,0}$-norm constraint (NMF_$\ell_{20}$), where the basis matrix $W$ is constrained by the $\ell_{2,0}$-norm, such that $W$ has a row-sparsity pattern with feature selection. It is a challenge to solve the model, because the $\ell_{2,0}$-norm is non-convex and non-smooth. Fortunately, we prove that the $\ell_{2,0}$-norm satisfies the Kurdyka-Ł{ojasiewicz} property. Based on the finding, we present a proximal alternating linearized minimization algorithm and its monotone accelerated version to solve the NMF_$\ell_{20}$ model. In addition, we also present a orthogonal NMF with $\ell_{2,0}$-norm constraint (ONMF_$\ell_{20}$) to enhance the clustering performance by using a non-negative orthogonal constraint. We propose an efficient algorithm to solve ONMF_$\ell_{20}$ by transforming it into a series of constrained and penalized matrix factorization problems. The results on numerical and scRNA-seq datasets demonstrate the efficiency of our methods in comparison with existing methods. △ Less

Submitted 27 April, 2021; originally announced April 2021.

arXiv:2103.16107 [pdf, other]

Large Scale Visual Food Recognition

Authors: Weiqing Min, Zhiling Wang, Yuxin Liu, Mengjiang Luo, Li** Kang, Xiaoming Wei, Xiaolin Wei, Shuqiang Jiang

Abstract: Food recognition plays an important role in food choice and intake, which is essential to the health and well-being of humans. It is thus of importance to the computer vision community, and can further support many food-oriented vision and multimodal tasks. Unfortunately, we have witnessed remarkable advancements in generic visual recognition for released large-scale datasets, yet largely lags in… ▽ More Food recognition plays an important role in food choice and intake, which is essential to the health and well-being of humans. It is thus of importance to the computer vision community, and can further support many food-oriented vision and multimodal tasks. Unfortunately, we have witnessed remarkable advancements in generic visual recognition for released large-scale datasets, yet largely lags in the food domain. In this paper, we introduce Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images.Compared with existing food recognition datasets, Food2K bypasses them in both categories and images by one order of magnitude, and thus establishes a new challenging benchmark to develop advanced models for food visual representation learning. Furthermore, we propose a deep progressive region enhancement network for food recognition, which mainly consists of two components, namely progressive local feature learning and region feature enhancement. The former adopts improved progressive training to learn diverse and complementary local features, while the latter utilizes self-attention to incorporate richer context with multiple scales into local features for further local feature enhancement. Extensive experiments on Food2K demonstrate the effectiveness of our proposed method. More importantly, we have verified better generalization ability of Food2K in various tasks, including food recognition, food image retrieval, cross-modal recipe retrieval, food detection and segmentation. Food2K can be further explored to benefit more food-relevant tasks including emerging and more complex ones (e.g., nutritional understanding of food), and the trained models on Food2K can be expected as backbones to improve the performance of more food-relevant tasks. We also hope Food2K can serve as a large scale fine-grained visual recognition benchmark. △ Less

Submitted 26 February, 2023; v1 submitted 30 March, 2021; originally announced March 2021.

Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2103.03330 [pdf, other]

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

Authors: Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, **jun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

Abstract: Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training GCN requires the minibatch generator traversing graphs and sampling the sparsely located neighboring nodes to obtain their features. Since real-world graphs often exceed the capacity of GPU memory, current GCN training systems keep the feature table in host memory and rely on the C… ▽ More Graph Convolutional Networks (GCNs) are increasingly adopted in large-scale graph-based recommender systems. Training GCN requires the minibatch generator traversing graphs and sampling the sparsely located neighboring nodes to obtain their features. Since real-world graphs often exceed the capacity of GPU memory, current GCN training systems keep the feature table in host memory and rely on the CPU to collect sparse features before sending them to the GPUs. This approach, however, puts tremendous pressure on host memory bandwidth and the CPU. This is because the CPU needs to (1) read sparse features from memory, (2) write features into memory as a dense format, and (3) transfer the features from memory to the GPUs. In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help. By removing the CPU gathering stage, our method significantly reduces the consumption of the host resources and data access latency. We further present two important techniques to achieve high host memory access efficiency by the GPU: (1) automatic data access address alignment to maximize PCIe packet efficiency, and (2) asynchronous zero-copy access and kernel execution to fully overlap data transfer with training. We incorporate our method into PyTorch and evaluate its effectiveness using several graphs with sizes up to 111 million nodes and 1.6 billion edges. In a multi-GPU training setup, our method is 65-92% faster than the conventional data transfer method, and can even match the performance of all-in-GPU-memory training for some graphs that fit in GPU memory. △ Less

Submitted 14 August, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: Paper accepted for PVLDB Vol 14

arXiv:2102.04640 [pdf, other]

Rethinking the Optimization of Average Precision: Only Penalizing Negative Instances before Positive Ones is Enough

Authors: Zhuo Li, Weiqing Min, Jiajun Song, Yaohui Zhu, Li** Kang, Xiaoming Wei, Xiaolin Wei, Shuqiang Jiang

Abstract: Optimizing the approximation of Average Precision (AP) has been widely studied for image retrieval. Limited by the definition of AP, such methods consider both negative and positive instances ranking before each positive instance. However, we claim that only penalizing negative instances before positive ones is enough, because the loss only comes from these negative instances. To this end, we prop… ▽ More Optimizing the approximation of Average Precision (AP) has been widely studied for image retrieval. Limited by the definition of AP, such methods consider both negative and positive instances ranking before each positive instance. However, we claim that only penalizing negative instances before positive ones is enough, because the loss only comes from these negative instances. To this end, we propose a novel loss, namely Penalizing Negative instances before Positive ones (PNP), which can directly minimize the number of negative instances before each positive one. In addition, AP-based methods adopt a fixed and sub-optimal gradient assignment strategy. Therefore, we systematically investigate different gradient assignment solutions via constructing derivative functions of the loss, resulting in PNP-I with increasing derivative functions and PNP-D with decreasing ones. PNP-I focuses more on the hard positive instances by assigning larger gradients to them and tries to make all relevant instances closer. In contrast, PNP-D pays less attention to such instances and slowly corrects them. For most real-world data, one class usually contains several local clusters. PNP-I blindly gathers these clusters while PNP-D keeps them as they were. Therefore, PNP-D is more superior. Experiments on three standard retrieval datasets show consistent results with the above analysis. Extensive evaluations demonstrate that PNP-D achieves the state-of-the-art performance. Code is available at https://github.com/interestingzhuo/PNPloss △ Less

Submitted 7 May, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

arXiv:2101.07956 [pdf, other]

PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses

Authors: Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, **jun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

Abstract: With the increasing adoption of graph neural networks (GNNs) in the machine learning community, GPUs have become an essential tool to accelerate GNN training. However, training GNNs on very large graphs that do not fit in GPU memory is still a challenging task. Unlike conventional neural networks, mini-batching input samples in GNNs requires complicated tasks such as traversing neighboring nodes a… ▽ More With the increasing adoption of graph neural networks (GNNs) in the machine learning community, GPUs have become an essential tool to accelerate GNN training. However, training GNNs on very large graphs that do not fit in GPU memory is still a challenging task. Unlike conventional neural networks, mini-batching input samples in GNNs requires complicated tasks such as traversing neighboring nodes and gathering their feature values. While this process accounts for a significant portion of the training time, we find existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step. This "all-in-CPU" approach has negative impact on the overall GNN training performance as it over-utilizes CPU resources and hinders GPU acceleration of GNN training. To overcome such limitations, we introduce PyTorch-Direct, which enables a GPU-centric data accessing paradigm for GNN training. In PyTorch-Direct, GPUs are capable of efficiently accessing complicated data structures in host memory directly without CPU intervention. Our microbenchmark and end-to-end GNN training results show that PyTorch-Direct reduces data transfer time by 47.1% on average and speeds up GNN training by up to 1.6x. Furthermore, by reducing CPU utilization, PyTorch-Direct also saves system power by 12.4% to 17.5% during training. To minimize programmer effort, we introduce a new "unified tensor" type along with necessary changes to the PyTorch memory allocator, dispatch logic, and placement rules. As a result, users need to change at most two lines of their PyTorch GNN training code for each tensor object to take advantage of PyTorch-Direct. △ Less

Submitted 19 January, 2021; originally announced January 2021.

arXiv:2101.04285 [pdf, other]

Explainable Deep Behavioral Sequence Clustering for Transaction Fraud Detection

Authors: Wei Min, Weiming Liang, Hang Yin, Zhurong Wang, Mei Li, Alok Lal

Abstract: In e-commerce industry, user behavior sequence data has been widely used in many business units such as search and merchandising to improve their products. However, it is rarely used in financial services not only due to its 3V characteristics - i.e. Volume, Velocity and Variety - but also due to its unstructured nature. In this paper, we propose a Financial Service scenario Deep learning based Be… ▽ More In e-commerce industry, user behavior sequence data has been widely used in many business units such as search and merchandising to improve their products. However, it is rarely used in financial services not only due to its 3V characteristics - i.e. Volume, Velocity and Variety - but also due to its unstructured nature. In this paper, we propose a Financial Service scenario Deep learning based Behavior data representation method for Clustering (FinDeepBehaviorCluster) to detect fraudulent transactions. To utilize the behavior sequence data, we treat click stream data as event sequence, use time attention based Bi-LSTM to learn the sequence embedding in an unsupervised fashion, and combine them with intuitive features generated by risk experts to form a hybrid feature representation. We also propose a GPU powered HDBSCAN (pHDBSCAN) algorithm, which is an engineering optimization for the original HDBSCAN algorithm based on FAISS project, so that clustering can be carried out on hundreds of millions of transactions within a few minutes. The computation efficiency of the algorithm has increased 500 times compared with the original implementation, which makes flash fraud pattern detection feasible. Our experimental results show that the proposed FinDeepBehaviorCluster framework is able to catch missed fraudulent transactions with considerable business values. In addition, rule extraction method is applied to extract patterns from risky clusters using intuitive features, so that narrative descriptions can be attached to the risky clusters for case investigation, and unknown risk patterns can be mined for real-time fraud detection. In summary, FinDeepBehaviorCluster as a complementary risk management strategy to the existing real-time fraud detection engine, can further increase our fraud detection and proactive risk defense capabilities. △ Less

Submitted 11 January, 2021; originally announced January 2021.

Comments: Accepted by AAAI2021 KDF Workshop

arXiv:2012.10831 [pdf, other]

Suspicious Massive Registration Detection via Dynamic Heterogeneous Graph Neural Networks

Authors: Susie Xi Rao, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Mo Cheng, Yinan Shan, Yang Zhao, Ce Zhang

Abstract: Massive account registration has raised concerns on risk management in e-commerce companies, especially when registration increases rapidly within a short time frame. To monitor these registrations constantly and minimize the potential loss they might incur, detecting massive registration and predicting their riskiness are necessary. In this paper, we propose a Dynamic Heterogeneous Graph Neural N… ▽ More Massive account registration has raised concerns on risk management in e-commerce companies, especially when registration increases rapidly within a short time frame. To monitor these registrations constantly and minimize the potential loss they might incur, detecting massive registration and predicting their riskiness are necessary. In this paper, we propose a Dynamic Heterogeneous Graph Neural Network framework to capture suspicious massive registrations (DHGReg). We first construct a dynamic heterogeneous graph from the registration data, which is composed of a structural subgraph and a temporal subgraph. Then, we design an efficient architecture to predict suspicious/benign accounts. Our proposed model outperforms the baseline models and is computationally efficient in processing a dynamic heterogeneous graph constructed from a real-world dataset. In practice, the DHGReg framework would benefit the detection of suspicious registration behaviors at an early stage. △ Less

Submitted 19 December, 2020; originally announced December 2020.

Comments: 8 pages, 1 figure, accepted in the AAAI Workshop on Deep Learning on Graphs 2021

ACM Class: I.2.6

arXiv:2011.12193 [pdf, other]

doi 10.14778/3494124.3494128

xFraud: Explainable Fraud Transaction Detection

Authors: Susie Xi Rao, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Zhiyao Chen, Yinan Shan, Yang Zhao, Ce Zhang

Abstract: At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework which is mainly composed of a detector and an explainer. The xFraud detector can effectively and efficiently predict the legitimacy of incoming transactions. Specific… ▽ More At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework which is mainly composed of a detector and an explainer. The xFraud detector can effectively and efficiently predict the legitimacy of incoming transactions. Specifically, it utilizes a heterogeneous graph neural network to learn expressive representations from the informative heterogeneously typed entities in the transaction logs. The explainer in xFraud can generate meaningful and human-understandable explanations from graphs to facilitate further processes in the business unit. In our experiments with xFraud on real transaction networks with up to 1.1 billion nodes and 3.7 billion edges, xFraud is able to outperform various baseline models in many evaluation metrics while remaining scalable in distributed settings. In addition, we show that xFraud explainer can generate reasonable explanations to significantly assist the business analysis via both quantitative and qualitative evaluations. △ Less

Submitted 25 May, 2022; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: This is the extended version of a full paper to appear in PVLDB 15 (3) (VLDB 2022)

ACM Class: I.2.6

arXiv:2008.10169 [pdf, other]

Tearing Down the Memory Wall

Authors: Zaid Qureshi, Vikram Sharma Mailthody, Seung Won Min, I-Hsin Chung, **jun Xiong, Wen-mei Hwu

Abstract: We present a vision for the Erudite architecture that redefines the compute and memory abstractions such that memory bandwidth and capacity become first-class citizens along with compute throughput. In this architecture, we envision coupling a high-density, massively parallel memory technology like Flash with programmable near-data accelerators, like the streaming multiprocessors in modern GPUs. E… ▽ More We present a vision for the Erudite architecture that redefines the compute and memory abstractions such that memory bandwidth and capacity become first-class citizens along with compute throughput. In this architecture, we envision coupling a high-density, massively parallel memory technology like Flash with programmable near-data accelerators, like the streaming multiprocessors in modern GPUs. Each accelerator has a local pool of storage-class memory that it can access at high throughput by initiating very large numbers of overlap** requests that help to tolerate long access latency. The accelerators can also communicate with each other and remote memory through a high-throughput low-latency interconnect. As a result, systems based on the Erudite architecture scale compute and memory bandwidth at the same rate, tearing down the notorious memory wall that has plagued computer architecture for generations. In this paper, we present the motivation, rationale, design, benefit, and research challenges for Erudite. △ Less

Submitted 23 August, 2020; originally announced August 2020.

Comments: SRC Techcon 2020 paper. Discusses vision of GPU-Centric architecture, Erudite

arXiv:2008.07960 [pdf, other]

Dataset Bias in Few-shot Image Recognition

Authors: Shuqiang Jiang, Yaohui Zhu, Chenlong Liu, Xinhang Song, Xiangyang Li, Weiqing Min

Abstract: The goal of few-shot image recognition (FSIR) is to identify novel categories with a small number of annotated samples by exploiting transferable knowledge from training data (base categories). Most current studies assume that the transferable knowledge can be well used to identify novel categories. However, such transferable capability may be impacted by the dataset bias, and this problem has rar… ▽ More The goal of few-shot image recognition (FSIR) is to identify novel categories with a small number of annotated samples by exploiting transferable knowledge from training data (base categories). Most current studies assume that the transferable knowledge can be well used to identify novel categories. However, such transferable capability may be impacted by the dataset bias, and this problem has rarely been investigated before. Besides, most of few-shot learning methods are biased to different datasets, which is also an important issue that needs to be investigated deeply. In this paper, we first investigate the impact of transferable capabilities learned from base categories. Specifically, we use the relevance to measure relationships between base categories and novel categories. Distributions of base categories are depicted via the instance density and category diversity. The FSIR model learns better transferable knowledge from relevant training data. In the relevant data, dense instances or diverse categories can further enrich the learned knowledge. Experimental results on different sub-datasets of ImagNet demonstrate category relevance, instance density and category diversity can depict transferable bias from base categories. Second, we investigate performance differences on different datasets from dataset structures and different few-shot learning methods. Specifically, we introduce image complexity, intra-concept visual consistency, and inter-concept visual similarity to quantify characteristics of dataset structures. We use these quantitative characteristics and four few-shot learning methods to analyze performance differences on five different datasets. Based on the experimental analysis, some insightful observations are obtained from the perspective of both dataset structures and few-shot learning methods. We hope these observations are useful to guide future FSIR research. △ Less

Submitted 15 March, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

arXiv:2008.05655 [pdf, other]

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Authors: Weiqing Min, Linhu Liu, Zhiling Wang, Zhengdong Luo, Xiaoming Wei, Xiaolin Wei, Shuqiang Jiang

Abstract: Food recognition has received more and more attention in the multimedia community for its various real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed for develo** advanced large-scale food recognition algorithms, as well as for providing the benchmark dataset for such algorithms. To encourage further progress in… ▽ More Food recognition has received more and more attention in the multimedia community for its various real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed for develo** advanced large-scale food recognition algorithms, as well as for providing the benchmark dataset for such algorithms. To encourage further progress in food recognition, we introduce the dataset ISIA Food- 500 with 500 categories from the list in the Wikipedia and 399,726 images, a more comprehensive food dataset that surpasses existing popular benchmark datasets by category coverage and data volume. Furthermore, we propose a stacked global-local attention network, which consists of two sub-networks for food recognition. One subnetwork first utilizes hybrid spatial-channel attention to extract more discriminative features, and then aggregates these multi-scale discriminative features from multiple layers into global-level representation (e.g., texture and shape information about food). The other one generates attentional regions (e.g., ingredient relevant regions) from different regions via cascaded spatial transformers, and further aggregates these multi-scale regional features from different layers into local-level representation. These two types of features are finally fused as comprehensive representation for food recognition. Extensive experiments on ISIA Food-500 and other two popular benchmark datasets demonstrate the effectiveness of our proposed method, and thus can be considered as one strong baseline. The dataset, code and models can be found at http://123.57.42.89/FoodComputing-Dataset/ISIA-Food500.html. △ Less

Submitted 12 August, 2020; originally announced August 2020.

Comments: Accepted by ACM Multimedia 2020

arXiv:2008.05359 [pdf, other]

LogoDet-3K: A Large-Scale Image Dataset for Logo Detection

Authors: **g Wang, Weiqing Min, Sujuan Hou, Shengnan Ma, Yuanjie Zheng, Shuqiang Jiang

Abstract: Logo detection has been gaining considerable attention because of its wide range of applications in the multimedia field, such as copyright infringement detection, brand visibility monitoring, and product brand management on social media. In this paper, we introduce LogoDet-3K, the largest logo detection dataset with full annotation, which has 3,000 logo categories, about 200,000 manually annotate… ▽ More Logo detection has been gaining considerable attention because of its wide range of applications in the multimedia field, such as copyright infringement detection, brand visibility monitoring, and product brand management on social media. In this paper, we introduce LogoDet-3K, the largest logo detection dataset with full annotation, which has 3,000 logo categories, about 200,000 manually annotated logo objects and 158,652 images. LogoDet-3K creates a more challenging benchmark for logo detection, for its higher comprehensive coverage and wider variety in both logo categories and annotated objects compared with existing datasets. We describe the collection and annotation process of our dataset, analyze its scale and diversity in comparison to other datasets for logo detection. We further propose a strong baseline method Logo-Yolo, which incorporates Focal loss and CIoU loss into the state-of-the-art YOLOv3 framework for large-scale logo detection. Logo-Yolo can solve the problems of multi-scale objects, logo sample imbalance and inconsistent bounding-box regression. It obtains about 4% improvement on the average performance compared with YOLOv3, and greater improvements compared with reported several deep detection models on LogoDet-3K. The evaluations on other three existing datasets further verify the effectiveness of our method, and demonstrate better generalization ability of LogoDet-3K on logo detection and retrieval tasks. The LogoDet-3K dataset is used to promote large-scale logo-related research and it can be found at https://github.com/Wang**g1551/LogoDet-3K-Dataset. △ Less

Submitted 12 August, 2020; originally announced August 2020.

arXiv:2006.06890 [pdf, other]

EMOGI: Efficient Memory-access for Out-of-memory Graph-traversal In GPUs

Authors: Seung Won Min, Vikram Sharma Mailthody, Zaid Qureshi, **jun Xiong, Eiman Ebrahimi, Wen-mei Hwu

Abstract: Modern analytics and recommendation systems are increasingly based on graph data that capture the relations between entities being analyzed. Practical graphs come in huge sizes, offer massive parallelism, and are stored in sparse-matrix formats such as CSR. To exploit the massive parallelism, developers are increasingly interested in using GPUs for graph traversal. However, due to their sizes, gra… ▽ More Modern analytics and recommendation systems are increasingly based on graph data that capture the relations between entities being analyzed. Practical graphs come in huge sizes, offer massive parallelism, and are stored in sparse-matrix formats such as CSR. To exploit the massive parallelism, developers are increasingly interested in using GPUs for graph traversal. However, due to their sizes, graphs often do not fit into the GPU memory. Prior works have either used input data pre-processing/partitioning or UVM to migrate chunks of data from the host memory to the GPU memory. However, the large, multi-dimensional, and sparse nature of graph data presents a major challenge to these schemes and results in significant amplification of data movement and reduced effective data throughput. In this work, we propose EMOGI, an alternative approach to traverse graphs that do not fit in GPU memory using direct cacheline-sized access to data stored in host memory. This paper addresses the open question of whether a sufficiently large number of overlap** cacheline-sized accesses can be sustained to 1) tolerate the long latency to host memory, 2) fully utilize the available bandwidth, and 3) achieve favorable execution performance. We analyze the data access patterns of several graph traversal applications in GPU over PCIe using an FPGA to understand the cause of poor external bandwidth utilization. By carefully coalescing and aligning external memory requests, we show that we can minimize the number of PCIe transactions and nearly fully utilize the PCIe bandwidth even with direct cache-line accesses to the host memory. EMOGI achieves 2.92$\times$ speedup on average compared to the optimized UVM implementations in various graph traversal applications. We also show that EMOGI scales better than a UVM-based solution when the system uses higher bandwidth interconnects such as PCIe 4.0. △ Less

Submitted 14 January, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:1911.07924 [pdf, other]

Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification

Authors: **g Wang, Weiqing Min, Sujuan Hou, Shengnan Ma, Yuanjie Zheng, Haishuai Wang, Shuqiang Jiang

Abstract: Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation and contextual advertising. Compared with other types of object images, the real-world logo images have larger variety in logo appearance and more complexity in their background. Therefore, recognizing the logo from images is challenging. To support eff… ▽ More Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation and contextual advertising. Compared with other types of object images, the real-world logo images have larger variety in logo appearance and more complexity in their background. Therefore, recognizing the logo from images is challenging. To support efforts towards scalable logo classification task, we have curated a dataset, Logo-2K+, a new large-scale publicly available real-world logo dataset with 2,341 categories and 167,140 images. Compared with existing popular logo datasets, such as FlickrLogos-32 and LOGO-Net, Logo-2K+ has more comprehensive coverage of logo categories and larger quantity of logo images. Moreover, we propose a Discriminative Region Navigation and Augmentation Network (DRNA-Net), which is capable of discovering more informative logo regions and augmenting these image regions for logo classification. DRNA-Net consists of four sub-networks: the navigator sub-network first selected informative logo-relevant regions guided by the teacher sub-network, which can evaluate its confidence belonging to the ground-truth logo class. The data augmentation sub-network then augments the selected regions via both region crop** and region drop**. Finally, the scrutinizer sub-network fuses features from augmented regions and the whole image for logo classification. Comprehensive experiments on Logo-2K+ and other three existing benchmark datasets demonstrate the effectiveness of proposed method. Logo-2K+ and the proposed strong baseline DRNA-Net are expected to further the development of scalable logo image recognition, and the Logo-2K+ dataset can be found at https://github.com/msn199959/Logo-2k-plus-Dataset. △ Less

Submitted 11 November, 2019; originally announced November 2019.

Comments: Accepted by AAAI2020

arXiv:1910.04499 [pdf, other]

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

Authors: Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yu**g Wang, Bin Cui, Ce Zhang

Abstract: Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem. In this work, we first characterize this phenomenon from the information-theoretic perspective and show that under certain conditions, the mutual information between the output after $l$ layers and the input of GCN… ▽ More Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem. In this work, we first characterize this phenomenon from the information-theoretic perspective and show that under certain conditions, the mutual information between the output after $l$ layers and the input of GCN converges to 0 exponentially with respect to $l$. We also show that, on the other hand, graph decomposition can potentially weaken the condition of such convergence rate, which enabled our analysis for GraphCNN. While different graph structures can only benefit from the corresponding decomposition, in practice, we propose an automatic connectivity-aware graph decomposition algorithm, DeGNN, to improve the performance of general graph neural networks. Extensive experiments on widely adopted benchmark datasets demonstrate that DeGNN can not only significantly boost the performance of corresponding GNNs, but also achieves the state-of-the-art performances. △ Less

Submitted 29 June, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: 20 pages, 5 figures, 5 tables

arXiv:1908.01261 [pdf, other]

Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device

Authors: Seung Won Min, Sitao Huang, Mohamed El-Hadedy, **jun Xiong, Deming Chen, Wen-mei Hwu

Abstract: Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integrations between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache coherence options between CPUs and FPGAs, but these options can have inadvertent effects on the achieved bandwidths depending on applications and data access patter… ▽ More Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA devices provide tighter integrations between software running on CPUs and hardware accelerators. Modern heterogeneous SoC-FPGA platforms support multiple I/O cache coherence options between CPUs and FPGAs, but these options can have inadvertent effects on the achieved bandwidths depending on applications and data access patterns. To provide the most efficient communications between CPUs and accelerators, understanding the data transaction behaviors and selecting the right I/O cache coherence method is essential. In this paper, we use Xilinx Zynq UltraScale+ as the SoC platform to show how certain I/O cache coherence method can perform better or worse in different situations, ultimately affecting the overall accelerator performances as well. Based on our analysis, we further explore possible software and hardware modifications to improve the I/O performances with different I/O cache coherence options. With our proposed modifications, the overall performance of SoC design can be averagely improved by 20%. △ Less

Submitted 3 August, 2019; originally announced August 2019.

arXiv:1905.06269 [pdf, other]

Food Recommendation: Framework, Existing Solutions and Challenges

Authors: Weiqing Min, Shuqiang Jiang, Ramesh Jain

Abstract: A growing proportion of the global population is becoming overweight or obese, leading to various diseases (e.g., diabetes, ischemic heart disease and even cancer) due to unhealthy eating patterns, such as increased intake of food with high energy and high fat. Food recommendation is of paramount importance to alleviate this problem. Unfortunately, modern multimedia research has enhanced the perfo… ▽ More A growing proportion of the global population is becoming overweight or obese, leading to various diseases (e.g., diabetes, ischemic heart disease and even cancer) due to unhealthy eating patterns, such as increased intake of food with high energy and high fat. Food recommendation is of paramount importance to alleviate this problem. Unfortunately, modern multimedia research has enhanced the performance and experience of multimedia recommendation in many fields such as movies and POI, yet largely lags in the food domain. This article proposes a unified framework for food recommendation, and identifies main issues affecting food recommendation including building the personal model, analyzing unique food characteristics, incorporating various context and domain knowledge. We then review existing solutions for these issues, and finally elaborate research challenges and future directions in this field. To our knowledge, this is the first survey that targets the study of food recommendation in the multimedia field and offers a collection of research studies and technologies to benefit researchers in this field. △ Less

Submitted 19 November, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: Accepted by IEEE Transactions on Multimedia

arXiv:1810.09833 [pdf, other]

Hierarchy-Dependent Cross-Platform Multi-View Feature Learning for Venue Category Prediction

Authors: Shuqiang Jiang, Weiqing Min, Shuhuan Mei

Abstract: In this work, we focus on visual venue category prediction, which can facilitate various applications for location-based service and personalization. Considering that the complementarity of different media platforms, it is reasonable to leverage venue-relevant media data from different platforms to boost the prediction performance. Intuitively, recognizing one venue category involves multiple sema… ▽ More In this work, we focus on visual venue category prediction, which can facilitate various applications for location-based service and personalization. Considering that the complementarity of different media platforms, it is reasonable to leverage venue-relevant media data from different platforms to boost the prediction performance. Intuitively, recognizing one venue category involves multiple semantic cues, especially objects and scenes, and thus they should contribute together to venue category prediction. In addition, these venues can be organized in a natural hierarchical structure, which provides prior knowledge to guide venue category estimation. Taking these aspects into account, we propose a Hierarchy-dependent Cross-platform Multi-view Feature Learning (HCM-FL) framework for venue category prediction from videos by leveraging images from other platforms. HCM-FL includes two major components, namely Cross-Platform Transfer Deep Learning (CPTDL) and Multi-View Feature Learning with the Hierarchical Venue Structure (MVFL-HVS). CPTDL is capable of reinforcing the learned deep network from videos using images from other platforms. Specifically, CPTDL first trained a deep network using videos. These images from other platforms are filtered by the learnt network and these selected images are then fed into this learnt network to enhance it. Two kinds of pre-trained networks on the ImageNet and Places dataset are employed. MVFL-HVS is then developed to enable multi-view feature fusion. It is capable of embedding the hierarchical structure ontology to support more discriminative joint feature learning. We conduct the experiment on videos from Vine and images from Foursqure. These experimental results demonstrate the advantage of our proposed framework. △ Less

Submitted 23 October, 2018; originally announced October 2018.

Comments: Accepted by IEEE Transactions on Multimedia

arXiv:1808.07202 [pdf, other]

A Survey on Food Computing

Authors: Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, Ramesh Jain

Abstract: Food is very essential for human life and it is fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding the human behavior, improving the human health and understanding the culinary culture. With the rapid development of social networks, mobile networks, and Internet of Things (IoT), people commonly upload, share, and record food… ▽ More Food is very essential for human life and it is fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding the human behavior, improving the human health and understanding the culinary culture. With the rapid development of social networks, mobile networks, and Internet of Things (IoT), people commonly upload, share, and record food images, recipes, cooking videos, and food diaries, leading to large-scale food data. Large-scale food data offers rich knowledge about food and can help tackle many central issues of human society. Therefore, it is time to group several disparate issues related to food computing. Food computing acquires and analyzes heterogenous food data from disparate sources for perception, recognition, retrieval, recommendation, and monitoring of food. In food computing, computational approaches are applied to address food related issues in medicine, biology, gastronomy and agronomy. Both large-scale food data and recent breakthroughs in computer science are transforming the way we analyze food data. Therefore, vast amounts of work has been conducted in the food area, targeting different food-oriented tasks and applications. However, there are very few systematic reviews, which shape this area well and provide a comprehensive and in-depth summary of current efforts or detail open problems in this area. In this paper, we formalize food computing and present such a comprehensive overview of various emerging concepts, methods, and tasks. We summarize key challenges and future directions ahead for food computing. This is the first comprehensive survey that targets the study of computing technology for the food area and also offers a collection of research studies and technologies to benefit researchers and practitioners working in different food-related fields. △ Less

Submitted 16 July, 2019; v1 submitted 21 August, 2018; originally announced August 2018.

Comments: Accepted by ACM Computing Surveys

arXiv:1808.05329 [pdf, other]

Sequential Behavioral Data Processing Using Deep Learning and the Markov Transition Field in Online Fraud Detection

Authors: Ruinan Zhang, Fanglan Zheng, Wei Min

Abstract: Due to the popularity of the Internet and smart mobile devices, more and more financial transactions and activities have been digitalized. Compared to traditional financial fraud detection strategies using credit-related features, customers are generating a large amount of unstructured behavioral data every second. In this paper, we propose an Recurrent Neural Netword (RNN) based deep-learning str… ▽ More Due to the popularity of the Internet and smart mobile devices, more and more financial transactions and activities have been digitalized. Compared to traditional financial fraud detection strategies using credit-related features, customers are generating a large amount of unstructured behavioral data every second. In this paper, we propose an Recurrent Neural Netword (RNN) based deep-learning structure integrated with Markov Transition Field (MTF) for predicting online fraud behaviors using customer's interactions with websites or smart-phone apps as a series of states. In practice, we tested and proved that the proposed network structure for processing sequential behavioral data could significantly boost fraud predictive ability comparing with the multilayer perceptron network and distance based classifier with Dynamic Time War**(DTW) as distance metric. △ Less

Submitted 15 August, 2018; originally announced August 2018.

Comments: KDD2018 Data Science in Fintech Workshop Paper

arXiv:1807.10956 [pdf, other]

Group-sparse SVD Models and Their Applications in Biological Data

Authors: Wenwen Min, Juan Liu, Shihua Zhang

Abstract: Sparse Singular Value Decomposition (SVD) models have been proposed for biclustering high dimensional gene expression data to identify block patterns with similar expressions. However, these models do not take into account prior group effects upon variable selection. To this end, we first propose group-sparse SVD models with group Lasso (GL1-SVD) and group L0-norm penalty (GL0-SVD) for non-overlap… ▽ More Sparse Singular Value Decomposition (SVD) models have been proposed for biclustering high dimensional gene expression data to identify block patterns with similar expressions. However, these models do not take into account prior group effects upon variable selection. To this end, we first propose group-sparse SVD models with group Lasso (GL1-SVD) and group L0-norm penalty (GL0-SVD) for non-overlap** group structure of variables. However, such group-sparse SVD models limit their applicability in some problems with overlap** structure. Thus, we also propose two group-sparse SVD models with overlap** group Lasso (OGL1-SVD) and overlap** group L0-norm penalty (OGL0-SVD). We first adopt an alternating iterative strategy to solve GL1-SVD based on a block coordinate descent method, and GL0-SVD based on a projection method. The key of solving OGL1-SVD is a proximal operator with overlap** group Lasso penalty. We employ an alternating direction method of multipliers (ADMM) to solve the proximal operator. Similarly, we develop an approximate method to solve OGL0-SVD. Applications of these methods and comparison with competing ones using simulated data demonstrate their effectiveness. Extensive applications of them onto several real gene expression data with gene prior group knowledge identify some biologically interpretable gene modules. △ Less

Submitted 28 July, 2018; originally announced July 2018.

Comments: 14 pages, 4 figures

arXiv:1801.07239 [pdf, other]

Food recognition and recipe analysis: integrating visual content, context and external knowledge

Authors: Luis Herranz, Weiqing Min, Shuqiang Jiang

Abstract: The central role of food in our individual and social life, combined with recent technological advances, has motivated a growing interest in applications that help to better monitor dietary habits as well as the exploration and retrieval of food-related information. We review how visual content, context and external knowledge can be integrated effectively into food-oriented applications, with spec… ▽ More The central role of food in our individual and social life, combined with recent technological advances, has motivated a growing interest in applications that help to better monitor dietary habits as well as the exploration and retrieval of food-related information. We review how visual content, context and external knowledge can be integrated effectively into food-oriented applications, with special focus on recipe analysis and retrieval, food recommendation, and the restaurant context as emerging directions. △ Less

Submitted 22 January, 2018; originally announced January 2018.

Comments: Survey about contextual food recognition and multimodal recipe analysis

arXiv:1710.04792 [pdf, other]

Sparse Weighted Canonical Correlation Analysis

Authors: Wenwen Min, Juan Liu, Shihua Zhang

Abstract: Given two data matrices $X$ and $Y$, sparse canonical correlation analysis (SCCA) is to seek two sparse canonical vectors $u$ and $v$ to maximize the correlation between $Xu$ and $Yv$. However, classical and sparse CCA models consider the contribution of all the samples of data matrices and thus cannot identify an underlying specific subset of samples. To this end, we propose a novel sparse weight… ▽ More Given two data matrices $X$ and $Y$, sparse canonical correlation analysis (SCCA) is to seek two sparse canonical vectors $u$ and $v$ to maximize the correlation between $Xu$ and $Yv$. However, classical and sparse CCA models consider the contribution of all the samples of data matrices and thus cannot identify an underlying specific subset of samples. To this end, we propose a novel sparse weighted canonical correlation analysis (SWCCA), where weights are used for regularizing different samples. We solve the $L_0$-regularized SWCCA ($L_0$-SWCCA) using an alternating iterative algorithm. We apply $L_0$-SWCCA to synthetic data and real-world data to demonstrate its effectiveness and superiority compared to related methods. Lastly, we consider also SWCCA with different penalties like LASSO (Least absolute shrinkage and selection operator) and Group LASSO, and extend it for integrating more than three data matrices. △ Less

Submitted 12 October, 2017; originally announced October 2017.

Comments: 8 pages, 5 figures

ACM Class: I.5.1; H.2.8; G.1.6

arXiv:1609.06480 [pdf, ps, other]

Network-regularized Sparse Logistic Regression Models for Clinical Risk Prediction and Biomarker Discovery

Authors: Wenwen Min, Juan Liu, Shihua Zhang

Abstract: Molecular profiling data (e.g., gene expression) has been used for clinical risk prediction and biomarker discovery. However, it is necessary to integrate other prior knowledge like biological pathways or gene interaction networks to improve the predictive ability and biological interpretability of biomarkers. Here, we first introduce a general regularized Logistic Regression (LR) framework with r… ▽ More Molecular profiling data (e.g., gene expression) has been used for clinical risk prediction and biomarker discovery. However, it is necessary to integrate other prior knowledge like biological pathways or gene interaction networks to improve the predictive ability and biological interpretability of biomarkers. Here, we first introduce a general regularized Logistic Regression (LR) framework with regularized term $λ\|\bm{w}\|_1 + η\bm{w}^T\bm{M}\bm{w}$, which can reduce to different penalties, including Lasso, elastic net, and network-regularized terms with different $\bm{M}$. This framework can be easily solved in a unified manner by a cyclic coordinate descent algorithm which can avoid inverse matrix operation and accelerate the computing speed. However, if those estimated $\bm{w}_i$ and $\bm{w}_j$ have opposite signs, then the traditional network-regularized penalty may not perform well. To address it, we introduce a novel network-regularized sparse LR model with a new penalty $λ\|\bm{w}\|_1 + η|\bm{w}|^T\bm{M}|\bm{w}|$ to consider the difference between the absolute values of the coefficients. And we develop two efficient algorithms to solve it. Finally, we test our methods and compare them with the related ones using simulated and real data to show their efficiency. △ Less

Submitted 21 September, 2016; originally announced September 2016.

Comments: 10 pages, 3 figures

ACM Class: J.3; H.2.8; G.1.6; I.5

arXiv:1603.06035 [pdf, other]

L0-norm Sparse Graph-regularized SVD for Biclustering

Authors: Wenwen Min, Juan Liu, Shihua Zhang

Abstract: Learning the "blocking" structure is a central challenge for high dimensional data (e.g., gene expression data). Recently, a sparse singular value decomposition (SVD) has been used as a biclustering tool to achieve this goal. However, this model ignores the structural information between variables (e.g., gene interaction graph). Although typical graph-regularized norm can incorporate such prior gr… ▽ More Learning the "blocking" structure is a central challenge for high dimensional data (e.g., gene expression data). Recently, a sparse singular value decomposition (SVD) has been used as a biclustering tool to achieve this goal. However, this model ignores the structural information between variables (e.g., gene interaction graph). Although typical graph-regularized norm can incorporate such prior graph information to get accurate discovery and better interpretability, it fails to consider the opposite effect of variables with different signs. Motivated by the development of sparse coding and graph-regularized norm, we propose a novel sparse graph-regularized SVD as a powerful biclustering tool for analyzing high-dimensional data. The key of this method is to impose two penalties including a novel graph-regularized norm ($|\pmb{u}|\pmb{L}|\pmb{u}|$) and $L_0$-norm ($\|\pmb{u}\|_0$) on singular vectors to induce structural sparsity and enhance interpretability. We design an efficient Alternating Iterative Sparse Projection (AISP) algorithm to solve it. Finally, we apply our method and related ones to simulated and real data to show its efficiency in capturing natural blocking structures. △ Less

Submitted 18 March, 2016; originally announced March 2016.

Comments: 8 pages, 2 figures

ACM Class: I.5.1, I.5.3, H.2.8

arXiv:1511.09368 [pdf, ps, other]

A neurodynamic framework for local community extraction in networks

Authors: Shihua Zhang, Guanghua Hu, Wenwen Min

Abstract: To understand the structure and organization of a large-scale social, biological or technological network, it can be helpful to describe and extract local communities or modules of the network. In this article, we develop a neurodynamic framework to describe the local communities which correspond to the stable states of a neuro-system built based on the network. The quantitative criteria to descri… ▽ More To understand the structure and organization of a large-scale social, biological or technological network, it can be helpful to describe and extract local communities or modules of the network. In this article, we develop a neurodynamic framework to describe the local communities which correspond to the stable states of a neuro-system built based on the network. The quantitative criteria to describe the neurodynamic system can cover a large range of objective functions. The resolution limit of these functions enable us to propose a generic criterion to explore multi-resolution local communities. We explain the advantages of this framework and illustrate them by testing on a number of model and real-world networks. △ Less

Submitted 30 November, 2015; originally announced November 2015.

Comments: 4 figures

arXiv:1507.05696 [pdf, ps, other]

Testing and Performance of UFFO Burst Alert & Trigger Telescope

Authors: J. Ripa, M. B. Kim, J. Lee, I. H. Park, J. E. Kim, H. Lim, S. Jeong, A. J. Castro-Tirado, P. H. Connell, C. Eyles, V. Reglero, J. M. Rodrigo, V. Bogomolov, M. I. Panasyuk, V. Petrov, S. Svertilov, I. Yashin, S. Brandt, C. Budtz-Jorgensen, Y. -Y. Chang, P. Chen, M. A. Huang, T. -C. Liu, J. W. Nam, M. -Z. Wang , et al. (4 additional authors not shown)

Abstract: The Ultra-Fast Flash Observatory pathfinder (UFFO-p) is a new space mission dedicated to detect Gamma-Ray Bursts (GRBs) and rapidly follow their afterglows in order to provide early optical/ultraviolet measurements. A GRB location is determined in a few seconds by the UFFO Burst Alert & Trigger telescope (UBAT) employing the coded mask imaging technique and the detector combination of Yttrium Oxyo… ▽ More The Ultra-Fast Flash Observatory pathfinder (UFFO-p) is a new space mission dedicated to detect Gamma-Ray Bursts (GRBs) and rapidly follow their afterglows in order to provide early optical/ultraviolet measurements. A GRB location is determined in a few seconds by the UFFO Burst Alert & Trigger telescope (UBAT) employing the coded mask imaging technique and the detector combination of Yttrium Oxyorthosilicate (YSO) scintillating crystals and multi-anode photomultiplier tubes. The results of the laboratory tests of UBAT's functionality and performance are described in this article. The detector setting, the pixel-to-pixel response to X-rays of different energies, the imaging capability for <50 keV X-rays, the localization accuracy measurements, and the combined test with the Block for X-ray and Gamma-Radiation Detection (BDRG) scintillator detector to check the efficiency of UBAT are all described. The UBAT instrument has been assembled and integrated with other equipment on UFFO-p and should be launched on board the Lomonosov satellite in late-2015. △ Less

Submitted 20 July, 2015; originally announced July 2015.

Comments: journal: Proceedings of Science, Swift: 10 Years of Discovery; conference date: 2-5 December 2014; location: La Sapienza University, Rome, Italy; 7 pages, 4 figures; accepted for publication in July 9 2015

Journal ref: Proceedings of Science 233 (2015) 102

Showing 1–50 of 67 results for author: Min, W