-
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
Authors:
Wangyou Zhang,
Christoph Boeddeker,
Shinji Watanabe,
Tomohiro Nakatani,
Marc Delcroix,
Keisuke Kinoshita,
Tsubasa Ochiai,
Naoyuki Kamo,
Reinhold Haeb-Umbach,
Yanmin Qian
Abstract:
Recently, the end-to-end approach has been successfully applied to multi-speaker speech separation and recognition in both single-channel and multichannel conditions. However, severe performance degradation is still observed in the reverberant and noisy scenarios, and there is still a large performance gap between anechoic and reverberant conditions. In this work, we focus on the multichannel multi-speaker reverberant condition, and propose to extend our previous framework for end-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend subnetworks including voice activity detection like masks. The techniques significantly stabilize the end-to-end training process. The experiments on the spatialized wsj1-2mix corpus show that the proposed system achieves about 35% WER relative reduction compared to our conventional multi-channel E2E ASR system, and also obtains decent speech dereverberation and separation performance (SDR=12.5 dB) in the reverberant multi-speaker condition while trained only with the ASR criterion.
Submitted 23 February, 2021;
originally announced February 2021.
-
Multimodal Attention Fusion for Target Speaker Extraction
Authors:
Hiroshi Sato,
Tsubasa Ochiai,
Keisuke Kinoshita,
Marc Delcroix,
Tomohiro Nakatani,
Shoko Araki
Abstract:
Target speaker extraction, which aims at extracting a target speaker's voice from a mixture of voices using audio, visual or locational clues, has received much interest. Recently, audio-visual target speaker extraction has been proposed, which extracts target speech by using complementary audio and visual clues. Although audio-visual target speaker extraction offers more stable performance than single-modality methods on simulated data, neither its adaptation to realistic situations nor its evaluation on real recorded mixtures has been fully explored. One of the major issues in handling realistic situations is how to make the system robust to clue corruption, because in real recordings both clues may not be equally reliable, e.g., visual clues may be affected by occlusions. In this work, we propose a novel attention mechanism for multi-modal fusion and its training methods, which make it possible to effectively capture the reliability of the clues and weight the more reliable ones. Our proposals improve signal to distortion ratio (SDR) by 1.0 dB over conventional fusion mechanisms on simulated data. Moreover, we also record an audio-visual dataset of simultaneous speech with realistic visual clue corruption and show that audio-visual target speaker extraction with our proposals works successfully on real data.
Submitted 2 February, 2021;
originally announced February 2021.
-
Speaker activity driven neural speech extraction
Authors:
Marc Delcroix,
Katerina Zmolikova,
Tsubasa Ochiai,
Keisuke Kinoshita,
Tomohiro Nakatani
Abstract:
Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where the speaker activity is obtained from a diarization system. We show that this simple yet practical approach can successfully extract speakers after diarization, which results in improved ASR performance, especially in high overlapping conditions, with a relative word error rate reduction of up to 25%.
Submitted 9 February, 2021; v1 submitted 14 January, 2021;
originally announced January 2021.
-
Neural Network-based Virtual Microphone Estimator
Authors:
Tsubasa Ochiai,
Marc Delcroix,
Tomohiro Nakatani,
Rintaro Ikeshita,
Keisuke Kinoshita,
Shoko Araki
Abstract:
Developing microphone array technologies for a small number of microphones is important due to the constraints of many devices. One direction to address this situation consists of virtually augmenting the number of microphone signals, e.g., based on several physical model assumptions. However, such assumptions are not necessarily met in realistic conditions. In this paper, as an alternative approach, we propose a neural network-based virtual microphone estimator (NN-VME). The NN-VME estimates virtual microphone signals directly in the time domain, by utilizing the precise estimation capability of the recent time-domain neural networks. We adopt a fully supervised learning framework that uses actual observations at the locations of the virtual microphones at training time. Consequently, the NN-VME can be trained using only multi-channel observations and thus directly on real recordings, avoiding the need for unrealistic physical model-based assumptions. Experiments on the CHiME-4 corpus show that the proposed NN-VME achieves high virtual microphone estimation performance even for real recordings and that a beamformer augmented with the NN-VME improves both the speech enhancement and recognition performance.
Submitted 12 January, 2021;
originally announced January 2021.
-
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording
Authors:
Cong Han,
Yi Luo,
Chenda Li,
Tianyan Zhou,
Keisuke Kinoshita,
Shinji Watanabe,
Marc Delcroix,
Hakan Erdogan,
John R. Hershey,
Nima Mesgarani,
Zhuo Chen
Abstract:
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years. Recent research includes extracting target speech by using the target speaker's voice snippet and jointly separating all participating speakers by using a pool of additional speaker signals, which is known as speech separation using speaker inventory (SSUSI). However, all these systems ideally assume that the pre-enrolled speaker signals are available and are only evaluated on simple data configurations. In realistic multi-talker conversations, the speech signal contains a large proportion of non-overlapped regions, where we can derive robust speaker embedding of individual talkers. In this work, we adopt the SSUSI model in long recordings and propose a self-informed, clustering-based inventory forming scheme for long recording, where the speaker inventory is fully built from the input signal without the need for external speaker signals. Experiment results on simulated noisy reverberant long recording datasets show that the proposed method can significantly improve the separation performance across various conditions.
Submitted 18 December, 2020; v1 submitted 17 December, 2020;
originally announced December 2020.
-
Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation
Authors:
Christoph Boeddeker,
Wangyou Zhang,
Tomohiro Nakatani,
Keisuke Kinoshita,
Tsubasa Ochiai,
Marc Delcroix,
Naoyuki Kamo,
Yanmin Qian,
Reinhold Haeb-Umbach
Abstract:
Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural network supported multi-channel source separation with a time-domain training objective function. For the objective we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show the effectiveness, we demonstrate the performance on LibriSpeech based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of only 1.2 percentage points, thus outperforming a conventional permutation invariant training based system and alternative objectives like Scale Invariant Signal-to-Distortion Ratio by a large margin.
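For orientation, the Scale Invariant Signal-to-Distortion Ratio that the abstract names as an alternative objective can be sketched as below. This is the commonly used SI-SDR definition, not the CI-SDR loss proposed in the paper. The function name `si_sdr`, the `eps` stabiliser, and the mean-removal step are assumptions made for this illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    Illustrative sketch only: plain Python lists, no batching.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")

    # Remove the mean so that a constant offset does not affect the score.
    mean_e = sum(estimate) / len(estimate)
    mean_r = sum(reference) / len(reference)
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]

    # Optimal scaling of the reference (projection of the estimate onto it).
    dot = sum(a * b for a, b in zip(e, r))
    ref_energy = sum(b * b for b in r)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-target part and a residual part.
    target = [alpha * b for b in r]
    residual = [a - t for a, t in zip(e, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled by a single scalar, multiplying the estimate by a constant leaves the value essentially unchanged. The CI-SDR criterion described in the abstract instead tolerates a convolutive (filter) mismatch, which this sketch does not model.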
Submitted 8 June, 2021; v1 submitted 30 November, 2020;
originally announced November 2020.
-
Integration of variational autoencoder and spatial clustering for adaptive multi-channel neural speech separation
Authors:
Katerina Zmolikova,
Marc Delcroix,
Lukáš Burget,
Tomohiro Nakatani,
Jan "Honza" Černocký
Abstract:
In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation. The advantage of integrating spatial clustering with a spectral model was shown in several works. As the spectral model, previous works used either factorial generative models of the mixed speech or discriminative neural networks. In our work, we combine the strengths of both approaches by building a factorial model based on a generative neural network, a variational autoencoder. By doing so, we can exploit the modeling power of neural networks while keeping a structured model. Such a model can be advantageous when adapting to new noise conditions, as only the noise part of the model needs to be modified. We show experimentally that our model significantly outperforms a previous factorial model based on a Gaussian mixture model (DOLPHIN), performs comparably to the integration of permutation invariant training with spatial clustering, and enables us to easily adapt to new noise conditions. The code for the method is available at https://github.com/BUTSpeechFIT/vae_dolphin
Submitted 24 November, 2020;
originally announced November 2020.
-
Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds
Authors:
Keisuke Kinoshita,
Marc Delcroix,
Naohiro Tawara
Abstract:
Recent diarization technologies can be categorized into two approaches, i.e., clustering and end-to-end neural approaches, which have different pros and cons. The clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors. While this can be seen as a current state-of-the-art approach that works for various challenging data with reasonable robustness and accuracy, it has the critical disadvantage that it cannot handle overlapped speech, which is inevitable in natural conversational data. In contrast, end-to-end neural diarization (EEND), which directly predicts diarization labels using a neural network, was devised to handle overlapped speech. While EEND, which can easily incorporate emerging deep-learning technologies, has started outperforming the x-vector clustering approach on some realistic datasets, it is difficult to make it work for `long' recordings (e.g., recordings longer than 10 minutes) because of, e.g., its huge memory consumption. Block-wise independent processing is also difficult because it poses an inter-block label permutation problem, i.e., an ambiguity of the speaker label assignments between blocks. In this paper, we propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers. It modifies the conventional EEND framework to simultaneously output global speaker embeddings so that speaker clustering can be performed across blocks to solve the permutation problem. With experiments based on simulated noisy reverberant 2-speaker meeting-like data, we show that the proposed framework works significantly better than the original EEND, especially when the input data is long.
Submitted 4 February, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Far-Field Automatic Speech Recognition
Authors:
Reinhold Haeb-Umbach,
Jahn Heymann,
Lukas Drude,
Shinji Watanabe,
Marc Delcroix,
Tomohiro Nakatani
Abstract:
The machine recognition of speech spoken at a distance from the microphones, known as far-field automatic speech recognition (ASR), has received a significant increase of attention in science and industry, which caused or was caused by an equally significant improvement in recognition accuracy. Meanwhile it has entered the consumer market with digital home assistants with a spoken language interface being its most prominent application. Speech recorded at a distance is affected by various acoustic distortions and, consequently, quite different processing pipelines have emerged compared to ASR for close-talk speech. A signal enhancement front-end for dereverberation, source separation and acoustic beamforming is employed to clean up the speech, and the back-end ASR engine is robustified by multi-condition training and adaptation. We will also describe the so-called end-to-end approach to ASR, which is a new promising architecture that has recently been extended to the far-field scenario. This tutorial article gives an account of the algorithms used to enable accurate speech recognition from a distance, and it will be seen that, although deep learning has a significant share in the technological breakthroughs, a clever combination with traditional signal processing can lead to surprisingly effective solutions.
Submitted 20 September, 2020;
originally announced September 2020.
-
Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation
Authors:
Keisuke Kinoshita,
Thilo von Neumann,
Marc Delcroix,
Tomohiro Nakatani,
Reinhold Haeb-Umbach
Abstract:
Recently, source separation performance was greatly improved by time-domain audio source separation based on the dual-path recurrent neural network (DPRNN). DPRNN is a simple but effective model for long sequential data. While DPRNN is quite efficient in modeling sequential data of the length of an utterance, i.e., about 5 to 10 seconds of data, it is harder to apply it to longer sequences such as whole conversations consisting of multiple utterances. This is simply because, in such a case, the number of time steps consumed by its internal module called the inter-chunk RNN becomes extremely large. To mitigate this problem, this paper proposes a multi-path RNN (MPRNN), a generalized version of DPRNN, that models the input data in a hierarchical manner. In the MPRNN framework, the input data is represented at several (>3) time-resolutions, each of which is modeled by a specific RNN sub-module. For example, the RNN sub-module that deals with the finest resolution may model temporal relationships only within a phoneme, while the RNN sub-module handling the coarsest resolution may capture only the relationship between utterances, such as speaker information. We perform experiments using simulated dialogue-like mixtures and show that MPRNN has greater model capacity and that it outperforms the current state-of-the-art DPRNN framework, especially in online processing scenarios.
Submitted 24 June, 2020;
originally announced June 2020.
-
Listen to What You Want: Neural Network-based Universal Sound Selector
Authors:
Tsubasa Ochiai,
Marc Delcroix,
Yuma Koizumi,
Hiroaki Ito,
Keisuke Kinoshita,
Shoko Araki
Abstract:
Being able to control the acoustic events (AEs) to which we want to listen would allow the development of more controllable hearable devices. This paper addresses the AE sound selection (or removal) problem, which we define as the extraction (or suppression) of all the sounds that belong to one or multiple desired AE classes. Although this problem could be addressed with a combination of source separation followed by AE classification, this is a sub-optimal way of solving the problem. Moreover, source separation usually requires knowing the maximum number of sources, which may not be practical when dealing with AEs. In this paper, we propose instead a universal sound selection neural network that can directly select AE sounds from a mixture given user-specified target AE classes. The proposed framework can be explicitly optimized to simultaneously select sounds from multiple desired AE classes, independently of the number of sources in the mixture. We experimentally show that the proposed method achieves promising AE sound selection performance and can generalize to mixtures with a number of sources unseen during training.
Submitted 10 June, 2020;
originally announced June 2020.
-
Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR
Authors:
Thilo von Neumann,
Christoph Boeddeker,
Lukas Drude,
Keisuke Kinoshita,
Marc Delcroix,
Tomohiro Nakatani,
Reinhold Haeb-Umbach
Abstract:
Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown. To cope with this, we extend an iterative speech extraction system with mechanisms to count the number of sources and combine it with a single-talker speech recognizer to form the first end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition on simulated clean mixtures from WSJ0-2mix and WSJ0-3mix. Among others, we set a new state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our system generalizes well to a larger number of speakers than it ever saw during training, as shown in experiments with the WSJ0-4mix database.
Submitted 21 December, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Jointly optimal denoising, dereverberation, and source separation
Authors:
Tomohiro Nakatani,
Christoph Boeddeker,
Keisuke Kinoshita,
Rintaro Ikeshita,
Marc Delcroix,
Reinhold Haeb-Umbach
Abstract:
This paper proposes methods that can optimize a Convolutional BeamFormer (CBF) for jointly performing denoising, dereverberation, and source separation (DN+DR+SS) in a computationally efficient way. Conventionally, a cascade configuration composed of a Weighted Prediction Error minimization (WPE) dereverberation filter followed by a Minimum Variance Distortionless Response beamformer has been used as the state-of-the-art frontend of far-field speech recognition; however, overall optimality of this approach is not guaranteed. In the blind signal processing area, an approach for jointly optimizing dereverberation and source separation (DR+SS) has been proposed; however, this approach requires huge computing cost, and has not been extended for application to DN+DR+SS. To overcome the above limitations, this paper develops new approaches for jointly optimizing DN+DR+SS in a computationally much more efficient way. To this end, we first present an objective function to optimize a CBF for performing DN+DR+SS based on the maximum likelihood estimation, on an assumption that the steering vectors of the target signals are given or can be estimated, e.g., using a neural network. This paper refers to a CBF optimized by this objective function as a weighted Minimum-Power Distortionless Response (wMPDR) CBF. Then, we derive two algorithms for optimizing a wMPDR CBF based on two different ways of factorizing a CBF into WPE filters and beamformers. Experiments using noisy reverberant sound mixtures show that the proposed optimization approaches greatly improve the performance of the speech enhancement in comparison with the conventional cascade configuration in terms of the signal distortion measures and ASR performance. It is also shown that the proposed approaches can greatly reduce the computing cost with improved estimation accuracy in comparison with the conventional joint optimization approach.
Submitted 2 August, 2020; v1 submitted 20 May, 2020;
originally announced May 2020.
-
Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding
Authors:
Ali Aroudi,
Marc Delcroix,
Tomohiro Nakatani,
Keisuke Kinoshita,
Shoko Araki,
Simon Doclo
Abstract:
The performance of speech enhancement algorithms in a multi-speaker scenario depends on correctly identifying the target speaker to be enhanced. Auditory attention decoding (AAD) methods make it possible to identify the target speaker to whom the listener is attending from single-trial EEG recordings. Aiming at enhancing the target speaker and suppressing interfering speakers, reverberation and ambient noise, in this paper we propose a cognitive-driven multi-microphone speech enhancement system, which combines a neural-network-based mask estimator, weighted minimum power distortionless response convolutional beamformers and AAD. To control the suppression of the interfering speaker, we also propose an extension incorporating an interference suppression constraint. The experimental results show that the proposed system outperforms the state-of-the-art cognitive-driven speech enhancement systems in challenging reverberant and noisy conditions.
Submitted 10 May, 2020;
originally announced May 2020.
-
Improving noise robust automatic speech recognition with single-channel time-domain enhancement network
Authors:
Keisuke Kinoshita,
Tsubasa Ochiai,
Marc Delcroix,
Tomohiro Nakatani
Abstract:
With the advent of deep learning, research on noise-robust automatic speech recognition (ASR) has progressed rapidly. However, ASR performance in noisy conditions of single-channel systems remains unsatisfactory. Indeed, most single-channel speech enhancement (SE) methods (denoising) have brought only limited performance gains over state-of-the-art ASR back-end trained on multi-condition training data. Recently, there has been much research on neural network-based SE methods working in the time-domain showing levels of performance never attained before. However, it has not been established whether the high enhancement performance achieved by such time-domain approaches could be translated into ASR. In this paper, we show that a single-channel time-domain denoising approach can significantly improve ASR performance, providing more than 30 % relative word error reduction over a strong ASR back-end on the real evaluation data of the single-channel track of the CHiME-4 dataset. These positive results demonstrate that single-channel noise reduction can still improve ASR performance, which should open the door to more research in that direction.
Submitted 9 March, 2020;
originally announced March 2020.
-
Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system
Authors:
Keisuke Kinoshita,
Marc Delcroix,
Shoko Araki,
Tomohiro Nakatani
Abstract:
Automatic meeting analysis is an essential fundamental technology required to let, e.g., smart devices follow and respond to our conversations. To achieve optimal automatic meeting analysis, we previously proposed an all-neural approach that jointly solves source separation, speaker diarization and source counting problems in an optimal way (in the sense that all 3 tasks can be jointly optimized through error back-propagation). It was shown that the method could handle simulated clean (noiseless and anechoic) dialog-like data well, and achieved very good performance in comparison with several conventional methods. However, it was not clear whether such an all-neural approach would generalize successfully to more complicated real meeting data containing more spontaneously-speaking speakers, severe noise and reverberation, and how it performs in comparison with the state-of-the-art systems in such scenarios. In this paper, we first consider practical issues required for improving the robustness of the all-neural approach, and then experimentally show that, even in real meeting scenarios, the all-neural approach can perform effective speech enhancement, and simultaneously outperform state-of-the-art systems.
Submitted 9 March, 2020;
originally announced March 2020.
-
Fragmentation modelling of the August 2019 impact on Jupiter
Authors:
Ramanakumar Sankar,
Csaba Palotai,
Ricardo Hueso,
Marc Delcroix,
Ethan Chappel,
Agustin Sanchez-Lavega
Abstract:
On 7th August 2019, an impact flash lasting $\sim1$s was observed on Jupiter. The video of this event was analysed to obtain the lightcurve and determine the energy release and initial mass. We find that the impactor released a total energy of $96-151$ kilotons of TNT, corresponding to an initial mass between $190-260$ metric tonnes with a diameter between $4-10$m. We developed a fragmentation model to simulate the atmospheric breakup of the object and reproduce the lightcurve. We model three different materials: cometary, stony and metallic at speeds of $60$, $65 $ and $70$ km/s to determine the material makeup of the impacting object. The slower cases are best fit by a strong, metallic object while the faster cases require a weaker material.
Submitted 18 February, 2020;
originally announced February 2020.
-
Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention
Authors:
Yuma Koizumi,
Kohei Yatabe,
Marc Delcroix,
Yoshiki Masuyama,
Daiki Takeuchi
Abstract:
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker-independent model. Meanwhile, in speech applications including speech recognition and synthesis, it is known that model adaptation to the target speaker improves the accuracy. Our research question is whether a DNN for speech enhancement can be adapted to unknown speakers without any auxiliary guidance signal in the test phase. To achieve this, we adopt multi-task learning of speech enhancement and speaker identification, and use the output of the final hidden layer of the speaker identification branch as an auxiliary feature. In addition, we use multi-head self-attention for capturing long-term dependencies in the speech and noise. Experimental results on a public dataset show that our strategy achieves state-of-the-art performance and also outperforms conventional methods in terms of subjective quality.
Submitted 14 February, 2020;
originally announced February 2020.
-
Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam
Authors:
Marc Delcroix,
Tsubasa Ochiai,
Katerina Zmolikova,
Keisuke Kinoshita,
Naohiro Tawara,
Tomohiro Nakatani,
Shoko Araki
Abstract:
Target speech extraction, which extracts a single target source in a mixture given clues about the target speaker, has attracted increasing attention. We have recently proposed SpeakerBeam, which exploits an adaptation utterance of the target speaker to extract his/her voice characteristics that are then used to guide a neural network towards extracting speech of that speaker. SpeakerBeam presents a practical alternative to speech separation as it enables tracking speech of a target speaker across utterances, and achieves promising speech extraction performance. However, it sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures, because it is difficult to discriminate the target speaker from the interfering speakers. In this paper, we investigate strategies for improving the speaker discrimination capability of SpeakerBeam. First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation. Besides, we investigate (1) the use of spatial features to better discriminate speakers when microphone array recordings are available, (2) adding an auxiliary speaker identification loss for helping to learn more discriminative voice characteristics. We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures, and outperform TasNet in terms of target speech extraction.
Submitted 23 January, 2020;
originally announced January 2020.
-
End-to-end training of time domain audio separation and recognition
Authors:
Thilo von Neumann,
Keisuke Kinoshita,
Lukas Drude,
Christoph Boeddeker,
Marc Delcroix,
Tomohiro Nakatani,
Reinhold Haeb-Umbach
Abstract:
The rising interest in single-channel multi-speaker speech separation sparked development of End-to-End (E2E) approaches to multi-speaker speech recognition. However, up until now, state-of-the-art neural network-based time domain source separation has not yet been combined with E2E speech recognition. We here demonstrate how to combine a separation module based on a Convolutional Time domain Audio Separation Network (Conv-TasNet) with an E2E speech recognizer and how to train such a model jointly by distributing it over multiple GPUs or by approximating truncated back-propagation for the convolutional front-end. To put this work into perspective and illustrate the complexity of the design space, we provide a compact overview of single-channel multi-speaker recognition systems. Our experiments show a word error rate of 11.0% on WSJ0-2mix and indicate that our joint time domain model can yield substantial improvements over cascade DNN-HMM and monolithic E2E frequency domain systems proposed so far.
Submitted 13 April, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.
-
Saturn atmospheric dynamics one year after Cassini: Long-lived features and time variations in the drift of the Hexagon
Authors:
R. Hueso,
A. Sánchez-Lavega,
J. F. Rojas,
A. A. Simon,
T. Barry,
T. del Río-Gaztelurrutia,
A. Antuñano,
K. M. Sayanagi,
M. Delcroix,
L. N. Fletcher,
E. García-Melendo,
S. Pérez-Hoyos,
J. Blalock,
F. Colas,
J. M. Gómez-Forrellad,
J. L. Gunnarson,
D. Peach,
M. H. Wong
Abstract:
We examine Saturn's atmosphere with observations from ground-based telescopes and Hubble Space Telescope (HST). We present a detailed analysis of observations acquired during 2018. A system of polar storms that appeared on the planet in March 2018 and remained active with a complex phenomenology at least until Sept. is analyzed elsewhere (Sanchez-Lavega et al., in press, 2019). Many of the cloud features in 2018 are long-lived and can be identified in images in 2017, and in some cases, for up to a decade using also Cassini ISS images. Without considering the polar storms, the most interesting long-lived cloud systems are: i) A bright spot in the EZ that can be tracked continuously since 2014 with a zonal velocity of 444 m/s in 2014 and 452 m/s in 2018. This velocity is different from the zonal winds at the cloud level at its latitude during the Cassini mission, and is closer to zonal winds obtained at the time of the Voyager flybys and zonal winds from Cassini VIMS infrared images of the lower atmosphere. ii) A large Anticyclone Vortex, here AV, that formed after the GWS of 2010-2011. This vortex has changed significantly in visual contrast, drift rate and latitude with minor changes in size over the last years. iii) A system of subpolar vortices present at least since 2011. These vortices follow drift rates consistent with zonal winds obtained by Cassini. We also present the positions of the vertices of the North polar hexagon from 2015 to 2018 compared with previous analyses during Cassini (2007-2014), observations with HST, and Voyager data in 1980-1981 to explore the long-term hexagon's drift rate. Variations in the drift rate cannot be fit by seasonal changes. Instead, the different drift rates reinforce the role of the North Polar Spot that was present in the Voyager epoch to cause a faster drift rate of the hexagon at that time compared with the current one.
Submitted 30 September, 2019;
originally announced September 2019.
-
The onset and growth of the 2018 Martian Global Dust Storm
Authors:
Agustín Sánchez-Lavega,
Teresa del Río-Gaztelurrutia,
Jorge Hernández-Bernal,
Marc Delcroix
Abstract:
We analyze the onset and initial expansion of the 2018 Martian Global Dust Storm (GDS 2018) using ground-based images in the visual range. This is the first case of a confirmed GDS initiating in the Northern Hemisphere. A dusty area extending about 1.4x10e5 km^2 and centered at latitude +31.7° $\pm$ 1.8° and west longitude 18° $\pm$ 5°W in Acidalia Planitia was captured on 30 and 31 May 2018 (Ls = 184.9°). From 1 to 8 June, daily image series showed the storm expanding southwards along the Acidalia corridor with velocities of 5 m/s, and simultaneously progressing eastwards and westwards with horizontal velocities ranging from 5 to 40 m/s. By 8 June the dust reached latitude -55° and later penetrated in the South polar region, whereas in the North the dust progression stopped at latitude +46°. We compare the onset and expansion stage of this GDS with the previous confirmed storms.
Submitted 3 June, 2019;
originally announced June 2019.
-
All-neural online source separation, counting, and diarization for meeting analysis
Authors:
Thilo von Neumann,
Keisuke Kinoshita,
Marc Delcroix,
Shoko Araki,
Tomohiro Nakatani,
Reinhold Haeb-Umbach
Abstract:
Automatic meeting analysis comprises the tasks of speaker counting, speaker diarization, and the separation of overlapped speech, followed by automatic speech recognition. This all has to be carried out on arbitrarily long sessions and, ideally, in an online or block-online manner. While significant progress has been made on individual tasks, this paper presents for the first time an all-neural approach to simultaneous speaker counting, diarization and source separation. The NN-based estimator operates in a block-online fashion and tracks speakers even if they remain silent for a number of time blocks, thus learning a stable output order for the separated sources. The neural network is recurrent over time as well as over the number of sources. The simulation experiments show that state of the art separation performance is achieved, while at the same time delivering good diarization and source counting results. It even generalizes well to an unseen large number of blocks.
Submitted 21 February, 2019;
originally announced February 2019.
-
Analysis of Neptune's 2017 Bright Equatorial Storm
Authors:
Edward Molter,
Imke de Pater,
Statia Luszcz-Cook,
Ricardo Hueso,
Joshua Tollefson,
Carlos Alvarez,
Agustín Sánchez-Lavega,
Michael H. Wong,
Andrew I. Hsu,
Lawrence A. Sromovsky,
Patrick M. Fry,
Marc Delcroix,
Randy Campbell,
Katherine de Kleer,
Elinor Gates,
Paul David Lynam,
S. Mark Ammons,
Brandon Park Coy,
Gaspard Duchene,
Erica J. Gonzales,
Lea Hirsch,
Eugene A. Magnier,
Sam Ragland,
R. Michael Rich,
Feige Wang
Abstract:
We report the discovery of a large ($\sim$8500 km diameter) infrared-bright storm at Neptune's equator in June 2017. We tracked the storm over a period of 7 months with high-cadence infrared snapshot imaging, carried out on 14 nights at the 10 meter Keck II telescope and 17 nights at the Shane 120 inch reflector at Lick Observatory. The cloud feature was larger and more persistent than any equatorial clouds seen before on Neptune, remaining intermittently active from at least 10 June to 31 December 2017. Our Keck and Lick observations were augmented by very high-cadence images from the amateur community, which permitted the determination of accurate drift rates for the cloud feature. Its zonal drift speed was variable from 10 June to at least 25 July, but remained a constant $237.4 \pm 0.2$ m s$^{-1}$ from 30 September until at least 15 November. The pressure of the cloud top was determined from radiative transfer calculations to be 0.3-0.6 bar; this value remained constant over the course of the observations. Multiple cloud break-up events, in which a bright cloud band wrapped around Neptune's equator, were observed over the course of our observations. No "dark spot" vortices were seen near the equator in HST imaging on 6 and 7 October. The size and pressure of the storm are consistent with moist convection or a planetary-scale wave as the energy source of convective upwelling, but more modeling is required to determine the driver of this equatorial disturbance as well as the triggers for and dynamics of the observed cloud break-up events.
Submitted 20 November, 2018;
originally announced November 2018.
-
A New, Long-Lived, Jupiter Mesoscale Wave Observed at Visible Wavelengths
Authors:
Amy A. Simon,
Ricardo Hueso,
Peio Inurrigarro,
Agustin Sanchez-Lavega,
Raul Morales-Juberias,
Richard Cosentino,
Leigh N. Fletcher,
Michael H. Wong,
Andrew I. Hsu,
Imke de Pater,
Glenn S. Orton,
Francois Colas,
Marc Delcroix,
Damian Peach,
Josep-Maria Gomez-Forrellad
Abstract:
Small-scale waves were observed along the boundary between Jupiter's North Equatorial Belt and North Tropical Zone, ~16.5° N planetographic latitude in Hubble Space Telescope data in 2012 and throughout 2015 to 2018, observable at all wavelengths from the UV to the near IR. At peak visibility, the waves have sufficient contrast (~10%) to be observed from ground-based telescopes. They have a typical wavelength of about 1.2° (1400 km), variable-length wave trains, and westward phase speeds of a few m/s or less. New analysis of Voyager 2 data shows similar wave trains over at least 300 hours. Some waves appear curved when over cyclones and anticyclones, but most are straight, but tilted, shifting in latitude as they pass vortices. Based on their wavelengths, phase speeds, and faint appearance at high-altitude sensitive passbands, the observed NEB waves are consistent with inertia-gravity waves at the 500-mbar pressure level, though formation altitude is not well constrained. Preliminary General Circulation Model simulations generate inertia-gravity waves from vortices interacting with the environment and can reproduce the observed wavelengths and orientations. Several mechanisms can generate these waves, and all may contribute: geostrophic adjustment of cyclones; cyclone/anticyclone interactions; wind interactions with obstructions or heat pulses from convection; or changing vertical wind shear. However, observations also show that the presence of vortices and/or regions of convection are not sufficient by themselves for wave formation, implying that a change in vertical structure may affect their stability, or that changes in haze properties may affect their visibility.
Submitted 27 July, 2018;
originally announced July 2018.
-
Small impacts on the giant planet Jupiter
Authors:
R. Hueso,
M. Delcroix,
A. Sánchez-Lavega,
S. Pedranghelu,
G. Kernbauer,
J. McKeon,
A. Fleckstein,
A. Wesley,
J. M. Gómez-Forrellad,
J. F. Rojas,
J. Juaristi
Abstract:
Video observations of Jupiter obtained by amateur astronomers over the past eight years have shown five flashes of light of 1-2 s. The first three of these events occurred on 3 June 2010, 20 August 2010, and 10 September 2012. Previous analyses showed that they were caused by the impact of objects of 5-20 m in diameter, depending on their density, with a released energy comparable to superbolides on Earth of the class of the Chelyabinsk airburst. The most recent two flashes on Jupiter were detected on 17 March 2016 and 26 May 2017 and are analyzed here. We characterize the energy involved together with the masses and sizes of the objects that produced these flashes. The rate of similar impacts on Jupiter provides improved constraints on the total flux of impacts on the planet, which can be compared to the amount of exogenic species detected in the upper atmosphere of Jupiter. We extracted light curves of the flashes and calculated the masses and sizes of the impacting objects. An examination of the number of amateur observations of Jupiter as a function of time allows us to interpret the statistics of these detections. The cumulative flux of small objects (5-20 m or larger) that impact Jupiter is predicted to be low (10-65 impacts per year), and only a fraction of them are potentially observable from Earth (4-25 per year in a perfect survey). More impacts will be found in the next years, with Jupiter opposition displaced toward summer in the northern hemisphere. Objects of this size contribute negligibly to the exogenous species and dust in the stratosphere of Jupiter when compared with the continuous flux from interplanetary dust punctuated by giant impacts. Flashes of a high enough energy could produce an observable debris field on the planet. We estimate that a continuous search for these impacts might find these events once every 0.4 to 2.6 years.
Submitted 9 April, 2018;
originally announced April 2018.
-
Neptune long-lived atmospheric features in 2013-2015 from small (28-cm) to large (10-m) telescopes
Authors:
R. Hueso,
I. de Pater,
A. Simon,
A. Sanchez-Lavega,
M. Delcroix,
M. H. Wong,
J. W. Tollefson,
C. Baranec,
K. de Kleer,
S. H. Luszcz-Cook,
G. S. Orton,
H. B. Hammel,
J. M. Gomez-Forrellad,
I. Ordonez-Etxeberria,
L. Sromovsky,
P. Fry,
F. Colas,
J. F. Rojas,
S. Perez-Hoyos,
P. Gorczynski,
J. Guarro,
W. Kivits,
P. Miles,
D. Millika,
P. Nicholas
, et al. (10 additional authors not shown)
Abstract:
Since 2013, observations of Neptune with small telescopes have resulted in several detections of long-lived bright atmospheric features that have also been observed by large telescopes such as Keck II or Hubble. The combination of both types of images allows the study of the long term evolution of major cloud systems in the planet. In 2013 and 2014 two bright features were present on the planet at southern mid latitudes. These may have merged in late 2014, possibly leading to the formation of a single bright feature observed during 2015 at the same latitude. This cloud system was first observed in January 2015 and nearly continuously from July to December 2015 in observations with telescopes in the 2 to 10 meter class and in images from amateur astronomers. These images show the bright spot as a compact feature at 40.1 deg South planetographic latitude well resolved from a nearby bright zonal band that extended from 42 deg South to 20 deg South. Tracking its motion from July to November 2015 suggests a longitudinal oscillation of 16 deg in amplitude with a 90 day period, typical of dark spots on Neptune and similar to the Great Red Spot oscillation in Jupiter. The limited time covered by high-resolution observations only covers one full oscillation and other interpretations of the changing motions could be possible. HST images in September 2015 show the presence of a dark spot at short wavelengths in the southern flank of the bright cloud observed throughout 2015.
Submitted 26 September, 2017;
originally announced September 2017.
-
The need for Professional-Amateur collaborations in studies of Jupiter and Saturn
Authors:
Emmanuel Kardasis,
John H. Rogers,
Glenn Orton,
Marc Delcroix,
Apostolos Christou,
Mike Foulkes,
Padma Yanamandra-Fisher,
Michel Jacquesson,
Grigoris Maravelias
Abstract:
The observation of gaseous giant planets is of high scientific interest. Although they have been the targets of several spacecraft missions, there still remains a need for continuous ground-based observations. As their atmospheres present fast dynamic environments on various time scales, the availability of time at professional telescopes is neither uniform nor of sufficient duration to assess temporal changes. However, numerous amateurs with small telescopes (of 15-40 cm) and modern hardware and software equipment can monitor these changes daily (within the 360-900nm range). Amateurs are able to trace the structure and the evolution of atmospheric features, such as major planetary-scale disturbances, vortices, and storms. Their observations provide a continuous record and it is not uncommon to trigger professional observations in cases of important events, such as sudden onset of global changes, storms and celestial impacts. For example, the continuous amateur monitoring has led to the discovery of fireballs in Jupiter's atmosphere, providing information not only on Jupiter's gravitational influence but also on the properties and populations of the impactors. Photometric monitoring of stellar occultations by the planets can reveal spatial/temporal variability in their atmospheric structure. Therefore, co-ordination and communication between professionals and amateurs is important. We present examples of such collaborations that: (i) engage systematic multi-wavelength observations and databases, (ii) examine the variability of cloud features over timescales from days to decades, (iii) provide, by ground-based professional and amateur observations, the necessary spatial and temporal resolution of features that will be studied by the interplanetary mission Juno, (iv) investigate video observations of Jupiter to identify impacts of small objects, (v) carry out stellar-occultation campaigns.
Submitted 26 March, 2015;
originally announced March 2015.
-
Instrumental Methods for Professional and Amateur Collaborations in Planetary Astronomy
Authors:
O. Mousis,
R. Hueso,
J. -P. Beaulieu,
S. Bouley,
B. Carry,
F. Colas,
A. Klotz,
C. Pellier,
J. -M. Petit,
P. Rousselot,
M. Ali Dib,
W. Beisker,
M. Birlan,
C. Buil,
A. Delsanti,
E. Frappa,
H. B. Hammel,
A. -C. Levasseur-Regourd,
G. S. Orton,
A. Sanchez-Lavega,
A. Santerne,
P. Tanga,
J. Vaubaillon,
B. Zanda,
D. Baratoux
, et al. (35 additional authors not shown)
Abstract:
Amateur contributions to professional publications have increased exponentially over the last decades in the field of Planetary Astronomy. Here we review the different domains of the field in which collaborations between professional and amateur astronomers are effective and regularly lead to scientific publications. We discuss the instruments, detectors, software and methodologies typically used by amateur astronomers to collect the scientific data in the different domains of interest. Amateur contributions to the monitoring of planets and interplanetary matter, characterization of asteroids and comets, as well as the determination of the physical properties of Kuiper Belt Objects and exoplanets are discussed.
Submitted 4 March, 2014; v1 submitted 15 May, 2013;
originally announced May 2013.
-
Overview of Saturn lightning observations
Authors:
G. Fischer,
U. A. Dyudina,
W. S. Kurth,
D. A. Gurnett,
P. Zarka,
T. Barry,
M. Delcroix,
C. Go,
D. Peach,
R. Vandebergh,
A. Wesley
Abstract:
The lightning activity in Saturn's atmosphere has been monitored by Cassini for more than six years. The continuous observations of the radio signatures called SEDs (Saturn Electrostatic Discharges) combine favorably with imaging observations of related cloud features as well as direct observations of flash-illuminated cloud tops. The Cassini RPWS (Radio and Plasma Wave Science) instrument and ISS (Imaging Science Subsystem) in orbit around Saturn also received ground-based support: The intense SED radio waves were also detected by the giant UTR-2 radio telescope, and committed amateurs observed SED-related white spots with their backyard optical telescopes. Furthermore, the Cassini VIMS (Visual and Infrared Mapping Spectrometer) and CIRS (Composite Infrared Spectrometer) instruments have provided some information on chemical constituents possibly created by the lightning discharges and transported upward to Saturn's upper atmosphere by vertical convection. In this paper we summarize the main results on Saturn lightning provided by this multi-instrumental approach and compare Saturn lightning to lightning on Jupiter and Earth.
Submitted 21 November, 2011;
originally announced November 2011.
-
Contribution of amateur observations to Saturn storm studies
Authors:
Marc Delcroix,
Georg Fischer
Abstract:
Since 2004, Saturn Electrostatic Discharges (SEDs), which are the radio signatures of lightning in Saturn's atmosphere, have been observed by the Cassini Radio and Plasma Wave Science instrument (RPWS). Despite their extensive time coverage, these observations lack the resolution and positioning given by imaging around visible wavelengths. Amateur observations from Earth have been increasing in quality and coverage in recent years, bringing information on the positions, drift rates and shape evolution of large visible white spots in Saturn's atmosphere. Combining these two complementary sources has enabled better analysis of the evolution of Saturn's storms.
Submitted 4 November, 2010;
originally announced November 2010.