Search | arXiv e-print repository

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

Authors: Jonathan Cui, David A. Araujo, Suman Saha, Md. Faisal Kabir

Abstract: Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters.

Submitted 14 January, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, developed under Penn State University's Multi-Campus Research Experience for Undergraduates Symposium, 2023. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.03042 [pdf, other]

doi 10.1109/SBGames54170.2021.00023

Perception of Personality Traits in Crowds of Virtual Humans

Authors: Lucas Nardino, Enzo Krzmienszki, Vinícius Jurinic Cassol, Diogo Schaffer, Victor Flávio de Andrade Araujo, Rodolfo Migon Favaretto, Felipe Elsner, Gabriel Fonseca Silva, Soraia Raupp Musse

Abstract: This paper proposes a perceptual visual analysis regarding the personality of virtual humans. Many studies have presented findings regarding the way human beings perceive virtual humans with respect to their faces, body animation, motion in the virtual environment, etc. We are interested in investigating the way people perceive visual manifestations of virtual humans' personality traits when they are interactive and organized in groups. Many applications in games and movies can benefit from the findings regarding the perceptual analysis with the main goal to provide more realistic characters and improve the users' experience. We provide experiments with subjects, and the obtained results indicate that, although the effect is very subtle, people perceive extraversion (the personality trait that we measured) in crowds of virtual humans more when interacting with the virtual humans' behaviors than when just observing through a spectator camera.

Submitted 6 October, 2022; originally announced October 2022.

Comments: 9 pages, 7 figures, 1 table

arXiv:1906.11715 [pdf, other]

Evaluating data-flow coverage in spectrum-based fault localization

Authors: Henrique Lemos Ribeiro, Higor Amario de Souza, Roberto Paulo de Andrioli Araujo, Marcos Lordello Chaim, Fabio Kon

Abstract: Background: Debugging is a key task during the software development cycle. Spectrum-based Fault Localization (SFL) is a promising technique to improve and automate debugging. SFL techniques use control-flow spectra to pinpoint the most suspicious program elements. However, data-flow spectra provide more detailed information about the program execution, which may be useful for fault localization. Aims: We evaluate the effectiveness and efficiency of ten SFL ranking metrics using data-flow spectra. Method: We compare the performance of data- and control-flow spectra for SFL using 163 faults from 5 real-world open source programs, which contain from 468 to 4130 test cases. The data- and control-flow spectra types used in our evaluation are definition-use associations (DUAs) and lines, respectively. Results: Using data-flow spectra, up to 50% more faults are ranked in the top-15 positions compared to control-flow spectra. Also, most SFL ranking metrics present better effectiveness using data-flow to inspect up to the top-40 positions. The execution cost of data-flow spectra is higher than control-flow, taking from 22 seconds to less than 9 minutes. Data-flow has an average overhead of 353% for all programs, while the average overhead for control-flow is of 102%. Conclusions: The results suggest that SFL techniques can benefit from using data-flow spectra to classify faults in better positions, which may lead developers to inspect less code to find bugs. The execution cost to gather data-flow is higher compared to control-flow, but it is not prohibitive. Moreover, data-flow spectra also provide information about suspicious variables for fault localization, which may improve the developers' performance using SFL.

Submitted 27 June, 2019; originally announced June 2019.

Comments: 13th International Symposium on Empirical Software Engineering and Measurement (ESEM 2019)

arXiv:1605.03804 [pdf, other]

doi 10.1016/j.neucom.2016.03.099

A Mid-level Video Representation based on Binary Descriptors: A Case Study for Pornography Detection

Authors: Carlos Caetano, Sandra Avila, William Robson Schwartz, Silvio Jamil F. Guimarães, Arnaldo de A. Araújo

Abstract: With the growing amount of inappropriate content on the Internet, such as pornography, arises the need to detect and filter such material. The reason for this is given by the fact that such content is often prohibited in certain environments (e.g., schools and workplaces) or for certain publics (e.g., children). In recent years, many works have been mainly focused on detecting pornographic images and videos based on visual content, particularly on the detection of skin color. Although these approaches provide good results, they generally have the disadvantage of a high false positive rate since not all images with large areas of skin exposure are necessarily pornographic images, such as people wearing swimsuits or images related to sports. Local feature based approaches with Bag-of-Words models (BoW) have been successfully applied to visual recognition tasks in the context of pornography detection. Even though existing methods provide promising results, they use local feature descriptors that require a high computational processing time yielding high-dimensional vectors. In this work, we propose an approach for pornography detection based on local binary feature extraction and BossaNova image representation, a BoW model extension that preserves more richly the visual information. Moreover, we propose two approaches for video description based on the combination of mid-level representations namely BossaNova Video Descriptor (BNVD) and BoW Video Descriptor (BoW-VD). The proposed techniques are promising, achieving an accuracy of 92.40%, thus reducing the classification error by 16% over the current state-of-the-art local features approach on the Pornography dataset.

Submitted 12 May, 2016; originally announced May 2016.

Comments: Manuscript accepted at Elsevier Neurocomputing

Showing 1–4 of 4 results for author: Araujo, D A