-
Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase
Authors:
Yoshiki Masuyama,
Natsuki Ueno,
Nobutaka Ono
Abstract:
We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and is applicable to various audio signals. In this paper, we jointly reconstruct the full-band magnitude and phase by considering the bi-level relationships among the time-domain signal, its STFT coefficients, and its mel-spectrogram. The proposed method is formulated as a rigorous optimization problem and estimates the full-band magnitude based on the criterion used in GLA. Our experiments demonstrate the effectiveness of the proposed method on speech, music, and environmental signals.
Submitted 23 July, 2023;
originally announced July 2023.
-
Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons
Authors:
Shoichi Koyama,
Keisuke Kimura,
Natsuki Ueno
Abstract:
Two sound field reproduction methods, weighted pressure matching and weighted mode matching, are theoretically and experimentally compared. Weighted pressure matching and weighted mode matching are generalizations of conventional pressure matching and mode matching, respectively. Both methods are derived by introducing a weighting matrix in the pressure and mode matching. The weighting matrix in the weighted pressure matching is defined on the basis of the kernel interpolation of the sound field from pressure at a discrete set of control points. In the weighted mode matching, the weighting matrix is defined by a regional integration of spherical wavefunctions. It is theoretically shown that the weighted pressure matching is a special case of the weighted mode matching by infinite-dimensional harmonic analysis for estimating expansion coefficients from pressure observations. The differences between the two methods are discussed through experiments.
Submitted 23 March, 2023;
originally announced March 2023.
-
Disappearing of the Fermi level pinning at semiconductor interfaces
Authors:
**peng Yang,
Nobuo Ueno
Abstract:
We identify a universality in the Fermi level change of Schottky junctions based on van der Waals interacting semiconductor interfaces. We show that quasi-Fermi level pinning disappears at a certain thickness of semiconductor films for both intrinsic (undoped) and extrinsic (doped) semiconductors, over a wide range of bulk systems including inorganic, organic, and even organic-inorganic hybridized semiconductors. The Fermi level (EF) position in the energy bandgap is determined not only by the substrate work function but also by the thickness of the semiconductor film, with the final EF located at the position reflecting the thermal equilibrium of the semiconductor itself. Such universalities originate from the charge transfer between the substrate and semiconductor films, as obtained by solving the one-dimensional Poisson's equation. Our calculation resolves some of the conflicting experimental results obtained by ultraviolet photoelectron spectroscopy (UPS) and unifies the general rule for extracting EF positions in energy bandgaps from (i) inorganic to organic semiconductors and (ii) intrinsic (undoped) to extrinsic (doped) semiconductors. Our findings provide a simple analytical scaling for obtaining the quantitative energy diagram as a function of thickness in real devices, thus paving the way for a fundamental understanding of interface physics and the design of functional devices.
Submitted 13 February, 2022;
originally announced February 2022.
-
Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field
Authors:
Keisuke Kimura,
Shoichi Koyama,
Natsuki Ueno,
Hiroshi Saruwatari
Abstract:
A method of optimizing secondary source placement in sound field synthesis is proposed. Such an optimization method will be useful when the allowable placement region and available number of loudspeakers are limited. We formulate a mean-square-error-based cost function, incorporating the statistical properties of possible desired sound fields, for general linear-least-squares-based sound field synthesis methods, including pressure matching and (weighted) mode matching, whereas most of the current methods are applicable only to the pressure-matching method. An efficient greedy algorithm for minimizing the proposed cost function is also derived. Numerical experiments indicated that the placement optimized by the proposed method achieves higher reproduction accuracy than the empirically used regular placement.
Submitted 9 December, 2021;
originally announced December 2021.
-
Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation
Authors:
Shoichi Koyama,
Keisuke Kimura,
Natsuki Ueno
Abstract:
Sound field reproduction methods based on numerical optimization, which aim to minimize the error between synthesized and desired sound fields, are useful in many practical scenarios because of their flexibility in the array geometry of loudspeakers. However, the reproduction performance of these methods in a practical environment has not been sufficiently investigated. We evaluate weighted mode matching, which is a sound field reproduction method based on the spherical wavefunction expansion of the sound field, in comparison with conventional pressure matching. We also introduce a method of infinite-dimensional harmonic analysis for estimating the expansion coefficients of the sound field from microphone measurements. Experimental results indicated that weighted mode matching using the expansion coefficients of the transfer functions estimated by the infinite-dimensional harmonic analysis outperforms conventional pressure matching, especially when the number of microphones is small.
Submitted 22 November, 2021;
originally announced November 2021.
-
Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations
Authors:
Ryosuke Horiuchi,
Shoichi Koyama,
Juliano G. C. Ribeiro,
Natsuki Ueno,
Hiroshi Saruwatari
Abstract:
A method to estimate an acoustic field from discrete microphone measurements is proposed. A kernel-interpolation-based method using the kernel function formulated for sound field interpolation has been used in various applications. The kernel function with directional weighting makes it possible to incorporate prior information on source directions to improve estimation accuracy. However, in prior studies, parameters for directional weighting have been empirically determined. We propose a method to optimize these parameters using observation values, which is particularly useful when prior information on source directions is uncertain. The proposed algorithm is based on discretization of the parameters and representation of the kernel function as a weighted sum of sub-kernels. Two types of regularization for the weights, $L_1$ and $L_2$, are investigated. Experimental results indicate that the proposed method achieves higher estimation accuracy than the method without kernel learning.
Submitted 12 October, 2021; v1 submitted 10 October, 2021;
originally announced October 2021.
-
MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods
Authors:
Shoichi Koyama,
Tomoya Nishida,
Keisuke Kimura,
Takumi Abe,
Natsuki Ueno,
Jesper Brunnström
Abstract:
A new impulse response (IR) dataset called "MeshRIR" is introduced. Currently available datasets usually include IRs at an array of microphones from several source positions under various room conditions, and are mainly designed for evaluating speech enhancement and distant speech recognition methods. On the other hand, methods of estimating or controlling spatial sound fields have been extensively investigated in recent years; however, the current IR datasets are not applicable to validating and comparing these methods because of the low spatial resolution of measurement points. MeshRIR consists of IRs measured at positions obtained by finely discretizing a spatial region. Two subdatasets are currently available: one consists of IRs in a three-dimensional cuboidal region from a single source, and the other consists of IRs in a two-dimensional square region from an array of 32 sources. Therefore, MeshRIR is suitable for evaluating sound field analysis and synthesis methods. This dataset is freely available at https://sh01k.github.io/MeshRIR/ together with code for sample applications.
Submitted 23 July, 2021; v1 submitted 20 June, 2021;
originally announced June 2021.
-
Accessing the conduction band dispersion in CH3NH3PbI3 single crystals
Authors:
**peng Yang,
Haruki Sato,
Hibiki Orio,
Xianjie Liu,
Mats Fahlman,
Nobuo Ueno,
Hiroyuki Yoshida,
Takashi Yamada,
Satoshi Kera
Abstract:
The conduction band structure in methylammonium lead iodide (CH3NH3PbI3) was studied both by angle-resolved two-photon photoemission spectroscopy (AR-2PPE) with low-photon intensity and angle-resolved low-energy inverse photoelectron spectroscopy (AR-LEIPS). Clear energy dispersion of the conduction band along the ΓM direction was observed by these independent methods under different temperatures, and the dispersion was found to be consistent with band calculations under the cubic phase. The effective mass of the electrons at the Γ point was estimated to be (0.20 ± 0.05) m0 at 90 K. The observed energy position was largely different between the AR-LEIPS and AR-2PPE, demonstrating the electron correlation effects on the band structures. The present results also indicate that the surface structure in CH3NH3PbI3 provides the cubic-dominated electronic property even at lower temperatures.
Submitted 23 December, 2020;
originally announced December 2020.
-
Accessing surface Brillouin zone and band structure of picene single crystals
Authors:
Qian Xin,
Steffen Duhm,
Fabio Bussolotti,
Kouki Akaike,
Yoshihiro Kubozono,
Hideo Aoki,
Taichi Kosugi,
Satoshi Kera,
Nobuo Ueno
Abstract:
We have experimentally revealed the band structure and the surface Brillouin zone of insulating picene single crystals (SCs), the mother organic system for a recently discovered aromatic superconductor, with ultraviolet photoelectron spectroscopy (UPS) and low-energy electron diffraction with laser for photoconduction. A hole effective mass of 2.24 m_0 and the hole mobility mu_h >= 9.0 cm^2/Vs (298 K) were deduced in Gamma-Y direction. We have further shown that some picene SCs did not show charging during UPS even without the laser, which indicates that pristine UPS works for high-quality organic SCs.
Submitted 9 October, 2012; v1 submitted 18 April, 2012;
originally announced April 2012.