
Showing 1–27 of 27 results for author: Ellis, D

Searching in archive cs.
  1. arXiv:2401.15456  [pdf, ps, other]

    math.CO cs.CC math.GR math.PR

    Product Mixing in Compact Lie Groups

    Authors: David Ellis, Guy Kindler, Noam Lifshitz, Dor Minzer

    Abstract: If $G$ is a group, we say a subset $S$ of $G$ is product-free if the equation $xy=z$ has no solutions with $x,y,z \in S$. For $D \in \mathbb{N}$, a group $G$ is said to be $D$-quasirandom if the minimal dimension of a nontrivial complex irreducible representation of $G$ is at least $D$. Gowers showed that in a $D$-quasirandom finite group $G$, the maximal size of a product-free set is at most…

    Submitted 3 May, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: References updated

    MSC Class: 05D05; 22E30; 20F69; 22D40; 60B15; 68Q17

  2. arXiv:2308.16139  [pdf, other]

    cs.CV cs.DB cs.LG

    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

    Authors: Jianning Li, Zongwei Zhou, Jiancheng Yang, Antonio Pepe, Christina Gsaxner, Gijs Luijten, Chongyu Qu, Tiezheng Zhang, Xiaoxi Chen, Wenxuan Li, Marek Wodzinski, Paul Friedrich, Kangxian Xie, Yuan Jin, Narmada Ambigapathy, Enrico Nasca, Naida Solak, Gian Marco Melito, Viet Duc Vu, Afaque R. Memon, Christopher Schlachta, Sandrine De Ribaupierre, Rajnikant Patel, Roy Eagleson, Xiaojun Chen, et al. (132 additional authors not shown)

    Abstract: Prior to the deep learning era, shape was commonly used to describe the objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of Shape…

    Submitted 12 December, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 16 pages

    MSC Class: 68T01

  3. Dataset balancing can hurt model performance

    Authors: R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal

    Abstract: Machine learning from training data with a skewed distribution of examples per class can lead to models that favor performance on common classes at the expense of performance on rare ones. AudioSet has a very wide range of priors over its 527 sound event classes. Classification performance on AudioSet is usually evaluated by a simple average over per-class metrics, meaning that performance on rare…

    Submitted 30 June, 2023; originally announced July 2023.

    Comments: 5 pages, 3 figures, ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5

  4. arXiv:2303.17719  [pdf, other]

    cs.CV cs.LG

    Why is the winner the best?

    Authors: Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Sharib Ali, Vincent Andrearczyk, Marc Aubreville, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, Jorge Bernal, Sebastian Bodenstedt, Alessandro Casella, Veronika Cheplygina, Marie Daum, Marleen de Bruijne, Adrien Depeursinge, Reuben Dorent, Jan Egger, David G. Ellis, Sandy Engelhardt, Melanie Ganz, et al. (100 additional authors not shown)

    Abstract: International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To addre…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: accepted to CVPR 2023

  5. arXiv:2210.07856  [pdf, other]

    eess.AS cs.SD

    Description and analysis of novelties introduced in DCASE Task 4 2022 on the baseline system

    Authors: Francesca Ronchini, Samuele Cornell, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Daniel P. W. Ellis

    Abstract: The aim of the Detection and Classification of Acoustic Scenes and Events Challenge Task 4 is to evaluate systems for the detection of sound events in domestic environments using an heterogeneous dataset. The systems need to be able to correctly detect the sound events present in a recorded audio clip, as well as localize the events in time. This year's task is a follow-up of DCASE 2021 Task 4, wi…

    Submitted 14 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022)

  6. arXiv:2208.12415  [pdf, other]

    eess.AS cs.CL cs.SD stat.ML

    MuLan: A Joint Embedding of Music Audio and Natural Language

    Authors: Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel P. W. Ellis

    Abstract: Music tagging and content-based retrieval systems have traditionally been constructed using pre-defined ontologies covering a rigid set of music attributes or text queries. This paper presents MuLan: a first attempt at a new generation of acoustic models that link music audio directly to unconstrained natural language music descriptions. MuLan takes the form of a two-tower, joint audio-text embedd…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: To appear in ISMIR 2022

  7. arXiv:2204.11826  [pdf, other]

    cs.SE cs.HC

    Personality Traits in Game Development

    Authors: Miriam Sturdee, Matthew Ivory, David Ellis, Patrick Stacey, Paul Ralph

    Abstract: Existing work on personality traits in software development excludes game developers as a discrete group. Whilst games are software, game development has unique considerations, so game developers may exhibit different personality traits from other software professionals. We assessed responses from 123 game developers on an International Personality Item Pool Five Factor Model scale and demographic…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: 10 pages, 2 figures, 4 tables

    Journal ref: In proceedings of the International Conference on Evaluation and Assessment in Software Engineering (EASE 2022), June 13–15, Gothenburg, Sweden

  8. arXiv:2204.05703  [pdf, other]

    cs.CV cs.AI cs.LG

    Back to the Roots: Reconstructing Large and Complex Cranial Defects using an Image-based Statistical Shape Model

    Authors: Jianning Li, David G. Ellis, Antonio Pepe, Christina Gsaxner, Michele R. Aizenberg, Jens Kleesiek, Jan Egger

    Abstract: Designing implants for large and complex cranial defects is a challenging task, even for professional designers. Current efforts on automating the design process focused mainly on convolutional neural networks (CNN), which have produced state-of-the-art results on reconstructing synthetic defects. However, existing CNN-based methods have been difficult to translate to clinical practice in craniopl…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: 9 pages

  9. arXiv:2202.12689  [pdf, other]

    eess.SP cs.LG

    Domain Adaptation: the Key Enabler of Neural Network Equalizers in Coherent Optical Systems

    Authors: Pedro J. Freire, Bernhard Spinnler, Daniel Abode, Jaroslaw E. Prilepsky, Abdallah A. I. Ali, Nelson Costa, Wolfgang Schairer, Antonio Napoli, Andrew D. Ellis, Sergei K. Turitsyn

    Abstract: We introduce the domain adaptation and randomization approach for calibrating neural network-based equalizers for real transmissions, using synthetic data. The approach renders up to 99% training process reduction, which we demonstrate in three experimental setups.

    Submitted 25 February, 2022; originally announced February 2022.

    Comments: Paper Accepted at OFC 2022

  10. arXiv:2105.07031  [pdf, other]

    cs.SD eess.AS

    The Benefit Of Temporally-Strong Labels In Audio Event Classification

    Authors: Shawn Hershey, Daniel P W Ellis, Eduardo Fonseca, Aren Jansen, Caroline Liu, R Channing Moore, Manoj Plakal

    Abstract: To reveal the importance of temporal precision in ground truth audio event labels, we collected precise (~0.1 sec resolution) "strong" labels for a portion of the AudioSet dataset. We devised a temporally strong evaluation set (including explicit negatives of varying difficulty) and a small strong-labeled training subset of 67k clips (compared to the original dataset's 1.8M clips labeled at 10 sec…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at ICASSP 2021

  11. arXiv:2105.02132  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Supervised Learning from Automatically Separated Sound Scenes

    Authors: Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

    Abstract: Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and each other is semantically constrained: the sound scene contains the union of source classes and not all classes naturally co-occur. With this motivation, this…

    Submitted 14 September, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  12. arXiv:2011.01143  [pdf, other]

    cs.SD cs.CV eess.AS

    Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

    Authors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

    Abstract: Recent progress in deep learning has enabled many advances in sound separation and visual scene understanding. However, extracting sound sources which are apparent in natural videos remains an open problem. In this work, we present AudioScope, a novel audio-visual sound separation framework that can be trained without supervision to isolate on-screen sound sources from real in-the-wild videos. Pri…

    Submitted 29 May, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: ICLR 2021, 27 pages

  13. arXiv:2011.00803  [pdf, other]

    cs.SD eess.AS

    What's All the FUSS About Free Universal Sound Separation Data?

    Authors: Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel, Nicolas Turpault, Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

    Abstract: We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio data drawn from 357 classes, which are used to create mixtures of one to four sources. To simulate reverberation, an acoustic room simulator is used to generate…

    Submitted 2 November, 2020; originally announced November 2020.

  14. arXiv:2005.00878  [pdf, other]

    cs.SD cs.LG eess.AS

    Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking

    Authors: Eduardo Fonseca, Shawn Hershey, Manoj Plakal, Daniel P. W. Ellis, Aren Jansen, R. Channing Moore, Xavier Serra

    Abstract: The study of label noise in sound event recognition has recently gained attention with the advent of larger and noisier datasets. This work addresses the problem of missing labels, one of the big weaknesses of large audio datasets, and one of the most conspicuous issues for AudioSet. We propose a simple and model-agnostic method based on a teacher-student framework with loss masking to first ident…

    Submitted 25 July, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: Accepted in IEEE Signal Processing Letters, openly accessible at https://ieeexplore.ieee.org/document/9130823

    Journal ref: IEEE Signal Processing Letters, Vol. 27, 2020, pages 1235-1239

  15. arXiv:1912.05869  [pdf, other]

    eess.AS cs.NE cs.SD q-bio.NC

    On Neural Phone Recognition of Mixed-Source ECoG Signals

    Authors: Ahmed Hussen Abdelaziz, Shuo-Yiin Chang, Nelson Morgan, Erik Edwards, Dorothea Kolossa, Dan Ellis, David A. Moses, Edward F. Chang

    Abstract: The emerging field of neural speech recognition (NSR) using electrocorticography has recently attracted remarkable research interest for studying how human brains recognize speech in quiet and noisy surroundings. In this study, we demonstrate the utility of NSR systems to objectively prove the ability of human beings to attend to a single speech source while suppressing the interfering signals in…

    Submitted 12 December, 2019; originally announced December 2019.

    Comments: 5 pages, showing algorithms, results and references from our collaboration during a 2017 postdoc stay of the first author

  16. arXiv:1911.07951  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Improving Universal Sound Separation Using Sound Classification

    Authors: Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, Daniel P. W. Ellis

    Abstract: Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic s…

    Submitted 18 November, 2019; originally announced November 2019.

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  17. arXiv:1911.05894  [pdf, other]

    cs.SD eess.AS stat.ML

    Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision

    Authors: Aren Jansen, Daniel P. W. Ellis, Shawn Hershey, R. Channing Moore, Manoj Plakal, Ashok C. Popat, Rif A. Saurous

    Abstract: Humans do not acquire perceptual abilities in the way we train machines. While machine learning algorithms typically operate on large collections of randomly-chosen, explicitly-labeled examples, human acquisition relies more heavily on multimodal unsupervised learning (as infants) and active learning (as children). With this motivation, we present a learning framework for sound representation and…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: This extended version of a ICASSP 2020 submission under same title has an added figure and additional discussion for easier consumption

  18. arXiv:1906.02975  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Audio tagging with noisy labels and minimal supervision

    Authors: Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Serra

    Abstract: This paper introduces Task 2 of the DCASE2019 Challenge, titled "Audio tagging with noisy labels and minimal supervision". This task was hosted on the Kaggle platform as "Freesound Audio Tagging 2019". The task evaluates systems for multi-label audio tagging using a large set of noisy-labeled data, and a much smaller set of manually-labeled data, under a large vocabulary setting of 80 everyday sou…

    Submitted 19 January, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: DCASE2019 Workshop

  19. arXiv:1901.01189  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Learning Sound Event Classifiers from Web Audio with Noisy Labels

    Authors: Eduardo Fonseca, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, Xavier Serra

    Abstract: As sound event classification moves towards larger datasets, issues of label noise become inevitable. Web sites can supply large volumes of user-contributed audio and metadata, but inferring labels from this metadata introduces errors due to unreliable inputs, and limitations in the mapping. There is, however, little research into the impact of these errors. To foster the investigation of label no…

    Submitted 7 March, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019)

  20. arXiv:1808.00606  [pdf, other]

    cs.SD eess.AS

    AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

    Authors: Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

    Abstract: Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or…

    Submitted 23 August, 2018; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Interspeech, 2018

  21. arXiv:1807.09902  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

    Authors: Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

    Abstract: This paper describes Task 2 of the DCASE 2018 Challenge, titled "General-purpose audio tagging of Freesound content with AudioSet labels". This task was hosted on the Kaggle platform as "Freesound General-Purpose Audio Tagging Challenge". The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the A…

    Submitted 6 October, 2018; v1 submitted 25 July, 2018; originally announced July 2018.

    Comments: Camera ready for DCASE Workshop 2018

  22. arXiv:1711.02209  [pdf, ps, other]

    cs.SD eess.AS stat.ML

    Unsupervised Learning of Semantic Audio Representations

    Authors: Aren Jansen, Manoj Plakal, Ratheet Pandya, Daniel P. W. Ellis, Shawn Hershey, Jiayang Liu, R. Channing Moore, Rif A. Saurous

    Abstract: Even in the absence of any explicit semantic annotation, vast collections of audio recordings provide valuable information for learning the categorical structure of sounds. We consider several class-agnostic semantic constraints that apply to unlabeled nonspeech audio: (i) noise and translations in time do not change the underlying sound category, (ii) a mixture of two sound events inherits the ca…

    Submitted 6 November, 2017; originally announced November 2017.

    Comments: Submitted to ICASSP 2018

  23. arXiv:1611.05136  [pdf, other]

    cs.LG stat.ML

    Machine Learning Approach for Skill Evaluation in Robotic-Assisted Surgery

    Authors: Mahtab J. Fard, Sattar Ameri, Ratna B. Chinnam, Abhilash K. Pandya, Michael D. Klein, R. Darin Ellis

    Abstract: Evaluating surgeon skill has predominantly been a subjective task. Development of objective methods for surgical skill assessment are of increased interest. Recently, with technological advances such as robotic-assisted minimally invasive surgery (RMIS), new opportunities for objective and automated assessment frameworks have arisen. In this paper, we applied machine learning methods to automatica…

    Submitted 15 November, 2016; originally announced November 2016.

    Journal ref: Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016, 19-21 October, 2016, San Francisco, USA

  24. arXiv:1610.07245  [pdf, other]

    cs.RO

    Toward Personalized Training and Skill Assessment in Robotic Minimally Invasive Surgery

    Authors: Mahtab J. Fard, Sattar Ameri, R. Darin Ellis

    Abstract: Despite the immense technology advancement in the surgeries the criteria of assessing the surgical skills still remains based on subjective standards. With the advent of robotic-assisted surgery, new opportunities for objective and autonomous skill assessment is introduced. Previous works in this area are mostly based on structured-based method such as Hidden Markov Model (HMM) which need enormous…

    Submitted 13 November, 2016; v1 submitted 23 October, 2016; originally announced October 2016.

    Comments: Submitted to World Congress on Engineering and Computer Science 2016

    Journal ref: Lecture Notes in Engineering and Computer Science: Proceedings of The World Congress on Engineering and Computer Science 2016, 19-21 October, 2016, San Francisco, USA

  25. arXiv:1609.09430  [pdf, other]

    cs.SD cs.LG stat.ML

    CNN Architectures for Large-Scale Audio Classification

    Authors: Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson

    Abstract: Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying th…

    Submitted 10 January, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

    Comments: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes of latest Audio Set revision. Changed wording to fit 4 page limit with new additions

  26. arXiv:1512.08756  [pdf, other]

    cs.LG cs.NE

    Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems

    Authors: Colin Raffel, Daniel P. W. Ellis

    Abstract: We propose a simplified model of attention which is applicable to feed-forward neural networks and demonstrate that the resulting model can solve the synthetic "addition" and "multiplication" long-term memory problems for sequence lengths which are both longer and more widely varying than the best published results for these tasks.

    Submitted 20 September, 2016; v1 submitted 29 December, 2015; originally announced December 2015.

  27. arXiv:1510.00258  [pdf, other]

    math.MG cs.IT math.CO

    Geometric stability via information theory

    Authors: David Ellis, Ehud Friedgut, Guy Kindler, Amir Yehudayoff

    Abstract: The Loomis-Whitney inequality, and the more general Uniform Cover inequality, bound the volume of a body in terms of a product of the volumes of lower-dimensional projections of the body. In this paper, we prove stability versions of these inequalities, showing that when they are close to being tight, the body in question is close in symmetric difference to a 'box'. Our results are best possible u…

    Submitted 16 January, 2017; v1 submitted 29 September, 2015; originally announced October 2015.

    Comments: 28 pages. Reformatted for Discrete Analysis, but otherwise identical to the previous version

    MSC Class: 52C07; 05D99

    ACM Class: G.2.1