-
Full reference point cloud quality assessment using support vector regression
Authors:
Ryosuke Watanabe,
Shashank N. Sridhara,
Haoran Hong,
Eduardo Pavez,
Keisuke Nonaka,
Tatsuya Kobayashi,
Antonio Ortega
Abstract:
Point clouds are a general format for representing realistic 3D objects in diverse 3D applications. Since point clouds have large data sizes, developing efficient point cloud compression methods is crucial. However, excessive compression leads to various distortions, which degrade the point cloud quality perceived by end users. Thus, establishing reliable point cloud quality assessment (PCQA) methods is essential as a benchmark to develop efficient compression methods. This paper presents an accurate full-reference point cloud quality assessment (FR-PCQA) method called full-reference quality assessment using support vector regression (FRSVR) for various types of degradations such as compression distortion, Gaussian noise, and down-sampling. The proposed method demonstrates accurate PCQA by integrating five FR-based metrics covering various types of errors (e.g., considering geometric distortion, color distortion, and point count) using support vector regression (SVR). Moreover, the proposed method achieves a superior trade-off between accuracy and calculation speed because it includes only the calculation of these five simple metrics and SVR, which can perform fast prediction. Experimental results with three types of open datasets show that the proposed method is more accurate than conventional FR-PCQA methods. In addition, the proposed method is faster than state-of-the-art methods that utilize complicated features such as curvature and multi-scale features. Thus, the proposed method provides excellent performance in terms of the accuracy of PCQA and processing speed. Our method is available from https://github.com/STAC-USC/FRSVR-PCQA.
Submitted 15 June, 2024;
originally announced June 2024.
-
Extraction of In-Phase and Quadrature Components by Time-Encoding Sampling
Authors:
Y. H. Shao,
S. Y. Chen,
H. Z. Yang,
F. Xi,
H. Hong,
Z. Liu
Abstract:
A time encoding machine (TEM) is a biologically-inspired scheme to perform signal sampling using timing. In this paper, we study its application to the sampling of bandpass signals. We propose an integrate-and-fire TEM scheme by which the in-phase (I) and quadrature (Q) components are extracted through reconstruction. We design the TEM according to the signal bandwidth and amplitude instead of upper-edge frequency and amplitude as in the case of bandlimited/lowpass signals. We show that the I and Q components can be perfectly reconstructed from the TEM measurements if the minimum firing rate is equal to the Landau rate of the signal. For the reconstruction of I and Q components, we develop an alternating projection onto convex sets (POCS) algorithm in which two POCS algorithms are alternately iterated. For the algorithm analysis, we define a solution space of vector-valued signals and prove that the proposed reconstruction algorithm converges to the correct unique solution in the noiseless case. The proposed TEM can operate regardless of the center frequencies of the bandpass signals. This is quite different from traditional bandpass sampling, where the center frequency should be carefully allocated for the Landau rate and its variations have a negative effect on the sampling performance. In addition, the proposed TEM achieves certain reconstructed signal-to-noise-plus-distortion ratios for small firing rates in thermal noise, which is unavoidably present and will be aliased to the Nyquist band in traditional sampling such that high sampling rates are required. We demonstrate the reconstruction performance and substantiate our claims via simulation experiments.
Submitted 27 May, 2024;
originally announced May 2024.
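As a rough illustration of the integrate-and-fire time encoding described in the abstract above, the following Python sketch encodes a toy bandpass signal into firing times and checks the standard relation between consecutive firing times and the signal integral. The names `bandpass`, `iaf_encode`, `bias`, `kappa`, and `delta`, the toy I and Q components, and all numeric values are assumptions chosen for illustration and are not taken from the paper. The sketch covers encoding only; it does not implement the paper's POCS reconstruction.

```python
import math

def bandpass(t, fc=100.0):
    """Toy bandpass signal x(t) = I(t) cos(2*pi*fc*t) - Q(t) sin(2*pi*fc*t)."""
    i_t = math.cos(2 * math.pi * 5.0 * t)        # hypothetical in-phase part
    q_t = 0.5 * math.sin(2 * math.pi * 3.0 * t)  # hypothetical quadrature part
    w = 2 * math.pi * fc * t
    return i_t * math.cos(w) - q_t * math.sin(w)

def iaf_encode(x, t_end, bias, kappa, delta, dt=1e-5):
    """Integrate (x(t) + bias) / kappa and emit a firing time whenever the
    integrator reaches the threshold delta. Requires bias > max |x(t)|."""
    times, acc = [], 0.0
    for n in range(int(round(t_end / dt))):
        acc += (x(n * dt) + bias) * dt / kappa  # left Riemann step
        if acc >= delta:
            times.append((n + 1) * dt)
            acc -= delta  # subtract the threshold, keeping any overshoot
    return times

def integral(x, a, b, steps=2000):
    """Midpoint-rule integral of x over [a, b]."""
    h = (b - a) / steps
    return h * sum(x(a + (i + 0.5) * h) for i in range(steps))

def max_residual(x, times, bias, kappa, delta):
    """Largest deviation from the identity
    integral of x over [t_k, t_{k+1}] = kappa*delta - bias*(t_{k+1} - t_k)."""
    return max(
        abs(integral(x, a, b) - (kappa * delta - bias * (b - a)))
        for a, b in zip(times, times[1:])
    )

BIAS, KAPPA, DELTA = 2.0, 1.0, 0.005  # assumed values; |x(t)| <= 1.5 < BIAS
times = iaf_encode(bandpass, t_end=0.2, bias=BIAS, kappa=KAPPA, delta=DELTA)
residual = max_residual(bandpass, times, BIAS, KAPPA, DELTA)
print(len(times), "firings; max identity residual:", residual)
```

In this sketch the encoder itself never uses the carrier frequency `fc`; only the toy signal does. The firing times carry the signal's integral over each interval, which is the information a reconstruction algorithm would work from.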
-
RIS-Aided Receive Generalized Spatial Modulation Design with Reflecting Modulation
Authors:
Xinghao Guo,
Yin Xu,
Hanjiang Hong,
De Mi,
Ruiqi Liu,
Dazhi He,
Wenjun Zhang,
Yi-yan Wu
Abstract:
Spatial modulation (SM) transmits additional information bits by the selection of antennas. Generalized spatial modulation (GSM), as an advanced type of SM, can be divided into diversity and multiplexing (MUX) schemes according to whether the symbols carried on the selected antennas are identical or different. Recently, reconfigurable intelligent surface (RIS) assisted SM has exhibited better reception performance compared to conventional SM. To overcome the limitations of SM, this paper combines GSM with RIS and proposes the RIS-aided receive generalized spatial modulation (RIS-RGSM) scheme. The RIS-RGSM diversity scheme is realized via a simple improvement based on the state-of-the-art scheme. To further increase the transmission rate, a novel RIS-RGSM MUX scheme is proposed, where the reflection phase shifts and on/off states of RIS elements are configured to achieve bit mapping. The theoretical bit error rate (BER) of the proposed scheme is derived and agrees well with the simulation results. Numerical simulations show that the RIS-RGSM MUX scheme has better BER performance than the diversity scheme. The proposed scheme can significantly increase the transmission rate and maintain good performance compared to the existing scheme under a limited number of antennas.
Submitted 15 April, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Capacity-based Spatial Modulation Constellation and Pre-scaling Design
Authors:
Xinghao Guo,
Hanjiang Hong,
Yin Xu,
Yi-yan Wu,
Dazhi He,
Wenjun Zhang
Abstract:
Spatial Modulation (SM) can utilize the index of the transmit antenna (TA) to transmit additional information. In this paper, to improve the performance of SM, a non-uniform constellation (NUC) and pre-scaling coefficients optimization design scheme is proposed. The bit-interleaved coded modulation (BICM) capacity calculation formula of the SM system is first derived. The constellation and pre-scaling coefficients are optimized by maximizing the BICM capacity without channel state information (CSI) feedback. Optimization results are given for the multiple-input-single-output (MISO) system with Rayleigh channel. Simulation results show that the proposed scheme provides a meaningful performance gain compared to a conventional SM system without CSI feedback. The proposed optimization design scheme can be a promising technology for future 6G to achieve high efficiency.
Submitted 24 October, 2023;
originally announced October 2023.
-
FieldHAR: A Fully Integrated End-to-end RTL Framework for Human Activity Recognition with Neural Networks from Heterogeneous Sensors
Authors:
Mengxi Liu,
Bo Zhou,
Zimin Zhao,
Hyeonseok Hong,
Hyun Kim,
Sungho Suh,
Vitor Fortes Rey,
Paul Lukowicz
Abstract:
In this work, we propose FieldHAR, an open-source scalable end-to-end RTL framework for complex human activity recognition (HAR) from heterogeneous sensors using artificial neural networks (ANN) optimized for FPGA or ASIC integration. FieldHAR aims to address the lack of apparatus to transform complex HAR methodologies, often limited to offline evaluation, into efficient run-time edge applications. The framework uses parallel sensor interfaces and integer-based multi-branch convolutional neural networks (CNNs) to support flexible modality extensions with synchronous sampling at the maximum rate of each sensor. To validate the framework, we used a sensor-rich kitchen scenario HAR application which was demonstrated in a previous offline study. Through resource-aware optimizations, with FieldHAR the entire RTL solution was created from data acquisition to ANN inference, taking as little as 25% of the logic elements and 2% of the memory bits of a low-end Cyclone IV FPGA, with less than 1% accuracy loss from the original FP32 precision offline study. The RTL implementation also shows advantages over MCU-based solutions, including superior data acquisition performance and the virtual elimination of the ANN inference bottleneck.
Submitted 22 May, 2023;
originally announced May 2023.
-
REMAST: Real-time Emotion-based Music Arrangement with Soft Transition
Authors:
Zihao Wang,
Le Ma,
Chen Zhang,
Bo Han,
Yunfei Xu,
Yikai Wang,
Xinyi Chen,
HaoRong Hong,
Wenbo Liu,
Xinda Wu,
Kejun Zhang
Abstract:
Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies. However, music needs real-time arrangement according to changing emotions, and the fine-grained and mutable nature of the target emotion makes it challenging to balance real-time emotion fit against soft emotion transition. Existing studies mainly focus on achieving real-time emotion fit, while the issue of smooth transition remains understudied, affecting the overall emotional coherence of the music. In this paper, we propose REMAST to address this trade-off. Specifically, we recognize the last timestep's music emotion and fuse it with the current timestep's input emotion. The fused emotion then guides REMAST to generate the music based on the input melody. To adjust music similarity and real-time emotion fit flexibly, we downsample the original melody and feed it into the generation model. Furthermore, we design four music theory features based on domain knowledge to enhance emotion information and employ semi-supervised learning to mitigate the subjective bias introduced by manual dataset annotation. According to the evaluation results, REMAST surpasses the state-of-the-art methods in objective and subjective metrics. These results demonstrate that REMAST achieves real-time fit and smooth transition simultaneously, enhancing the coherence of the generated music.
Submitted 5 February, 2024; v1 submitted 13 May, 2023;
originally announced May 2023.
-
HUST bearing: a practical dataset for ball bearing fault diagnosis
Authors:
Nguyen Duc Thuan,
Hoang Si Hong
Abstract:
In this work, we introduce a practical dataset named HUST bearing, which provides a large set of vibration data on different ball bearings. This dataset contains 90 sets of raw vibration data covering 6 types of defects (inner crack, outer crack, ball crack, and their 2-combinations) on 5 types of bearing at 3 working conditions, with a sample rate of 51,200 samples per second. We performed envelope analysis and order tracking analysis on the introduced dataset to allow an initial evaluation of the data. A number of classical machine learning classification methods are used to identify bearing faults in the dataset using features in different domains. Typical advanced unsupervised transfer learning algorithms are also applied to observe the transferability of knowledge among parts of the dataset. The examined methods achieve varying accuracy on the dataset: up to 100% on the classification task and 60-80% on the unsupervised transfer learning task.
Submitted 2 October, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
On the estimation of the evolutionary power spectral density
Authors:
H. P. Hong
Abstract:
Two popular spectral-based approaches for estimating the evolutionary power spectral density (EPSD) function from the samples of the evolutionary process are based on the short-time Fourier transform (STFT) and the continuous wavelet transform. Both rely on the concept of a slowly varying modulation or EPSD function, although the quantification of the effect of the 'slow' variation in the estimated EPSD is elusive. We propose, in the present study, to use the derivatives of the EPSD function to quantify the smoothness of the EPSD function in the context of estimating the EPSD function. We derive equations for estimating the EPSD by using the S-transform and continuous wavelet transform. These equations are as simple to use as those derived based on the STFT. We also derive the corresponding equations for assessing the residual for the estimated EPSD by using these transforms, including the STFT. The residual provides an approach for identifying or quantifying, in the context of its estimation, the 'slow' variation of the EPSD function. The derived equations and numerical results indicate that the residual depends on both the derivatives of the EPSD function with respect to time and frequency as well as the adopted transform.
Submitted 25 October, 2022;
originally announced November 2022.
-
Motion estimation and filtered prediction for dynamic point cloud attribute compression
Authors:
Haoran Hong,
Eduardo Pavez,
Antonio Ortega,
Ryosuke Watanabe,
Keisuke Nonaka
Abstract:
In point cloud compression, exploiting temporal redundancy for inter predictive coding is challenging because of the irregular geometry. This paper proposes an efficient block-based inter-coding scheme for color attribute compression. The scheme includes integer-precision motion estimation and an adaptive graph-based in-loop filtering scheme for improved attribute prediction. The proposed block-based motion estimation scheme consists of an initial motion search that exploits geometric and color attributes, followed by a motion refinement that only minimizes color prediction error. To further improve color prediction, we propose a vertex-domain low-pass graph filtering scheme that can adaptively remove noise from predictors computed from motion estimation with different accuracy. Our experiments demonstrate significant coding gain over state-of-the-art coding methods.
Submitted 28 October, 2022; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Multi-task Learning for Monocular Depth and Defocus Estimations with Real Images
Authors:
Renzhi He,
Hualin Hong,
Boya Fu,
Fei Liu
Abstract:
Monocular depth estimation and defocus estimation are two fundamental tasks in computer vision. Most existing methods treat depth estimation and defocus estimation as two separate tasks, ignoring the strong connection between them. In this work, we propose a multi-task learning network consisting of an encoder with two decoders to estimate the depth and defocus map from a single focused image. Through the multi-task network, the depth estimation helps the defocus estimation obtain better results in weak-texture regions, and the defocus estimation helps the depth estimation through the strong physical connection between the two maps. We set up a dataset (named ALL-in-3D dataset), which is the first all-real image dataset consisting of 100K sets of all-in-focus images, focused images with focus depth, depth maps, and defocus maps. It enables the network to learn features and solid physical connections between the depth and real defocus images. Experiments demonstrate that the network learns more solid features from the real focused images than from the synthetic focused images. Benefiting from this multi-task structure where different tasks facilitate each other, our depth and defocus estimations achieve significantly better performance than other state-of-the-art algorithms. The code and dataset will be publicly available at https://github.com/cubhe/MDDNet.
Submitted 21 August, 2022;
originally announced August 2022.
-
Segmentation of kidney stones in endoscopic video feeds
Authors:
Zachary A Stoebner,
Daiwei Lu,
Seok Hee Hong,
Nicholas L Kavoussi,
Ipek Oguz
Abstract:
Image segmentation has been increasingly applied in medical settings as recent developments have rapidly expanded the potential applications of deep learning. Urology, specifically, is one field of medicine that is primed for the adoption of a real-time image segmentation system with the long-term aim of automating endoscopic stone treatment. In this project, we explored supervised deep learning models to annotate kidney stones in surgical endoscopic video feeds. In this paper, we describe how we built a dataset from the raw videos and how we developed a pipeline to automate as much of the process as possible. For the segmentation task, we adapted and analyzed three baseline deep learning models (U-Net, U-Net++, and DenseNet) to predict annotations on the frames of the endoscopic videos, with the highest accuracy above 90%. To show clinical potential for real-time use, we also confirmed that our best trained model can accurately annotate new videos at 30 frames per second. Our results demonstrate that the proposed method justifies continued development and study of image segmentation to annotate ureteroscopic video feeds.
Submitted 29 April, 2022;
originally announced April 2022.
-
Fractional Motion Estimation for Point Cloud Compression
Authors:
Haoran Hong,
Eduardo Pavez,
Antonio Ortega,
Ryosuke Watanabe,
Keisuke Nonaka
Abstract:
Motivated by the success of fractional pixel motion in video coding, we explore the design of motion estimation with fractional-voxel resolution for compression of color attributes of dynamic 3D point clouds. Our proposed block-based fractional-voxel motion estimation scheme takes into account the fundamental differences between point clouds and videos, i.e., the irregularity of the distribution of voxels within a frame and across frames. We show that motion compensation can benefit from the higher resolution reference and more accurate displacements provided by fractional precision. Our proposed scheme significantly outperforms comparable methods that only use integer motion. The proposed scheme can be combined with and add sizeable gains to state-of-the-art systems that use transforms such as Region Adaptive Graph Fourier Transform and Region Adaptive Haar Transform.
Submitted 31 January, 2022;
originally announced February 2022.
-
Hybrid Beamforming for Intelligent Reflecting Surface Aided Millimeter Wave MIMO Systems
Authors:
Sung Hyuck Hong,
Jaeyong Park,
Sung-Jin Kim,
Junil Choi
Abstract:
As communication systems that employ millimeter wave (mmWave) frequency bands must use large antenna arrays to overcome the severe propagation loss of mmWave signals, hybrid beamforming has been considered as an integral component of mmWave communications. Recently, intelligent reflecting surface (IRS) has been proposed as an innovative technology that can significantly improve the performance of mmWave communication systems through the use of low-cost passive reflecting elements. In this paper, we study IRS-aided mmWave multiple-input multiple-output (MIMO) systems with hybrid beamforming architectures. We first exploit the sparse-scattering structure and large dimension of mmWave channels to develop the joint design of IRS reflection matrix and hybrid beamformer for narrowband MIMO systems. Then, we generalize the proposed joint design to broadband MIMO systems with orthogonal frequency division multiplexing (OFDM) modulation by leveraging the angular sparsity of frequency-selective mmWave channels. Simulation results demonstrate that the proposed joint designs can significantly enhance the spectral efficiency of the systems of interest and achieve superior performance over the existing designs.
Submitted 27 June, 2022; v1 submitted 28 May, 2021;
originally announced May 2021.
-
Pre-demosaic Graph-based Light Field Image Compression
Authors:
Yung-Hsuan Chao,
Haoran Hong,
Gene Cheung,
Antonio Ortega
Abstract:
An unfocused plenoptic light field (LF) camera places an array of microlenses in front of an image sensor in order to separately capture different directional rays arriving at an image pixel. Using a conventional Bayer pattern, data captured at each pixel is a single color component (R, G or B). The sensed data then undergoes demosaicking (interpolation of RGB components per pixel) and conversion to an array of sub-aperture images (SAIs). In this paper, we propose a new LF image coding scheme based on graph lifting transform (GLT), where the acquired sensor data are coded in the original captured form without pre-processing. Specifically, we directly map raw sensed color data to the SAIs, resulting in sparsely distributed color pixels on 2D grids, and perform demosaicking at the receiver after decoding. To exploit spatial correlation among the sparse pixels, we propose a novel intra-prediction scheme, where the prediction kernel is determined according to the local gradient estimated from already coded neighboring pixel blocks. We then connect the pixels by forming a graph, modeling the prediction residuals statistically as a Gaussian Markov Random Field (GMRF). The optimal edge weights are computed via a graph learning method using a set of training SAIs. The residual data is encoded via low-complexity GLT. Experiments show that at high PSNRs, which are important for archiving and instant storage scenarios, our method significantly outperformed a conventional light field image coding scheme with demosaicking followed by High Efficiency Video Coding (HEVC).
Submitted 6 January, 2022; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Polar-Cap Codebook Design for MISO Rician Fading Channels with Limited Feedback
Authors:
Sung Hyuck Hong,
Sucheol Kim,
Junil Choi,
Wan Choi
Abstract:
Most prior works on designing codebooks for limited feedback systems have not considered the presence of a strong line-of-sight (LOS) channel component. This paper proposes the design of a polar-cap codebook (PCC) for multiple-input single-output (MISO) limited feedback systems subject to Rician fading channels. The codewords of the designed PCC are adaptively constructed according to the instantaneous strength of the LOS channel component. Simulation results show that the codebook can significantly enhance the performance of transmit beamforming in terms of received signal-to-noise ratio (SNR).
Submitted 27 June, 2022; v1 submitted 30 November, 2020;
originally announced November 2020.
-
Laser scanning reflection-matrix microscopy for label-free in vivo imaging of a mouse brain through an intact skull
Authors:
Seokchan Yoon,
Hojun Lee,
Jin Hee Hong,
Yong-Sik Lim,
Wonshik Choi
Abstract:
We present a laser scanning reflection-matrix microscopy combining the scanning of laser focus and the wide-field mapping of the electric field of the backscattered waves for eliminating higher-order aberrations even in the presence of strong multiple light scattering noise. Unlike conventional confocal laser scanning microscopy, we record the amplitude and phase maps of reflected waves from the sample not only at the confocal pinhole, but also at other non-confocal points. These additional measurements allow us to construct a time-resolved reflection matrix, with which the sample-induced aberrations for the illumination and detection pathways are separately identified and corrected. We realized in vivo reflectance imaging of myelinated axons through an intact skull of a living mouse with a spatial resolution close to the ideal diffraction limit. Furthermore, we demonstrated near-diffraction-limited multiphoton imaging through an intact skull by physically correcting the aberrations identified from the reflection matrix. The proposed method is expected to extend the range of applications in which detailed microscopic information from deep within biological tissues is critical.
Submitted 8 October, 2019;
originally announced October 2019.
-
Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs
Authors:
Jonas Kubilius,
Martin Schrimpf,
Kohitij Kar,
Ha Hong,
Najib J. Majaj,
Rishi Rajalingham,
Elias B. Issa,
Pouya Bashivan,
Jonathan Prescott-Roy,
Kailyn Schmidt,
Aran Nayebi,
Daniel Bear,
Daniel L. K. Yamins,
James J. DiCarlo
Abstract:
Deep convolutional artificial neural networks (ANNs) are the leading class of candidate models of the mechanisms of visual processing in the primate ventral stream. While initially inspired by brain anatomy, over the past years, these ANNs have evolved from a simple eight-layer architecture in AlexNet to extremely deep and branching architectures, demonstrating increasingly better object categorization performance, yet bringing into question how brain-like they still are. In particular, typical deep models from the machine learning community are often hard to map onto the brain's anatomy due to their vast number of layers and missing biologically-important connections, such as recurrence. Here we demonstrate that better anatomical alignment to the brain and high performance on machine learning as well as neuroscience measures do not have to be in contradiction. We developed CORnet-S, a shallow ANN with four anatomically mapped areas and recurrent connectivity, guided by Brain-Score, a new large-scale composite of neural and behavioral benchmarks for quantifying the functional fidelity of models of the primate ventral visual stream. Despite being significantly shallower than most models, CORnet-S is the top model on Brain-Score and outperforms similarly compact models on ImageNet. Moreover, our extensive analyses of CORnet-S circuitry variants reveal that recurrence is the main predictive factor of both Brain-Score and ImageNet top-1 performance. Finally, we report that the temporal evolution of the CORnet-S "IT" neural population resembles the actual monkey IT population dynamics. Taken together, these results establish CORnet-S, a compact, recurrent ANN, as the current best model of the primate ventral visual stream.
Submitted 28 October, 2019; v1 submitted 13 September, 2019;
originally announced September 2019.
-
SIAN: software for structural identifiability analysis of ODE models
Authors:
Hoon Hong,
Alexey Ovchinnikov,
Gleb Pogudin,
Chee Yap
Abstract:
Biological processes are often modeled by ordinary differential equations with unknown parameters. The unknown parameters are usually estimated from experimental data. In some cases, due to the structure of the model, this estimation problem does not have a unique solution even in the case of continuous noise-free data. It is therefore desirable to check the uniqueness a priori before carrying out actual experiments. We present a new software package, SIAN (Structural Identifiability ANalyser), that does this. Our software can tackle problems that could not be tackled by previously developed packages.
Submitted 25 December, 2018;
originally announced December 2018.