
Showing 1–21 of 21 results for author: Melas-Kyriazi, L

Searching in archive cs.
  1. arXiv:2405.13203  [pdf, other]

    cs.LG cs.CL

    Modeling Real-Time Interactive Conversations as Timed Diarized Transcripts

    Authors: Garrett Tanzer, Gustaf Ahdritz, Luke Melas-Kyriazi

    Abstract: Chatbots built upon language models have exploded in popularity, but they have largely been limited to synchronous, turn-by-turn dialogues. In this paper we present a simple yet general method to simulate real-time interactive conversations using pretrained text-only language models, by modeling timed diarized transcripts and decoding them with causal rejection sampling. We demonstrate the promise…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: GT and GA contributed equally

  2. arXiv:2402.10128  [pdf, other]

    cs.CV cs.GR cs.LG

    GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

    Authors: Abdullah Hamdi, Luke Melas-Kyriazi, Jinjie Mai, Guocheng Qian, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi

    Abstract: Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represe…

    Submitted 24 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: CVPR 2024 paper. project website https://abdullahamdi.com/ges

  3. arXiv:2402.08682  [pdf, other]

    cs.CV cs.AI cs.LG

    IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

    Authors: Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos

    Abstract: Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In th…

    Submitted 13 February, 2024; originally announced February 2024.

  4. arXiv:2401.08741  [pdf, other]

    cs.CV cs.AI cs.LG

    Fixed Point Diffusion Models

    Authors: Xingjian Bai, Luke Melas-Kyriazi

    Abstract: We introduce the Fixed Point Diffusion Model (FPDM), a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling. Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems. C…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Project page: https://lukemelas.github.io/fixed-point-diffusion-models

  5. arXiv:2311.14665  [pdf, other]

    cs.CV

    Understanding Self-Supervised Features for Learning Unsupervised Instance Segmentation

    Authors: Paul Engstler, Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina

    Abstract: Self-supervised learning (SSL) can be used to solve complex visual tasks without human labels. Self-supervised representations encode useful semantic information about images, and as a result, they have already been used for tasks such as unsupervised semantic segmentation. In this paper, we investigate self-supervised representations for instance segmentation without any manual annotations. We fi…

    Submitted 24 November, 2023; originally announced November 2023.

  6. arXiv:2309.16575  [pdf, ps, other]

    cs.CL

    A Benchmark for Learning to Translate a New Language from One Grammar Book

    Authors: Garrett Tanzer, Mirac Suzgun, Eline Visser, Dan Jurafsky, Luke Melas-Kyriazi

    Abstract: Large language models (LLMs) can perform impressive feats with in-context learning or lightweight finetuning. It is natural to wonder how well these models adapt to genuinely new tasks, but how does one find tasks that are unseen in internet-scale training sets? We turn to a field that is explicitly motivated and bottlenecked by a scarcity of web data: low-resource languages. In this paper, we int…

    Submitted 9 February, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Project site: https://lukemelas.github.io/mtob/

  7. arXiv:2308.12453  [pdf, other]

    cs.CV cs.AI cs.LG

    Augmenting medical image classifiers with synthetic data from latent diffusion models

    Authors: Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai

    Abstract: While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drug Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as…

    Submitted 23 August, 2023; originally announced August 2023.

  8. arXiv:2302.10668  [pdf, other]

    cs.CV cs.AI cs.LG

    $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Andrea Vedaldi

    Abstract: Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision. In this paper, we propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process. Our method takes as input a single RGB image along with its camera pose and gradually denoises a set of 3…

    Submitted 23 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Project page: https://lukemelas.github.io/projection-conditioned-point-cloud-diffusion

  9. arXiv:2302.10663  [pdf, other]

    cs.CV cs.AI cs.LG

    RealFusion: 360° Reconstruction of Any Object from a Single Image

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: We consider the problem of reconstructing a full 360° photographic model of an object from a single image of it. We do so by fitting a neural radiance field to the image, but find this problem to be severely ill-posed. We thus take an off-the-shelf conditional image generator based on diffusion and engineer a prompt that encourages it to "dream up" novel views of the object. Using an approach inspi…

    Submitted 23 February, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Project page: https://lukemelas.github.io/realfusion

  10. arXiv:2211.07634  [pdf, other]

    cs.CL cs.LG

    Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding

    Authors: Mirac Suzgun, Luke Melas-Kyriazi, Dan Jurafsky

    Abstract: In open-ended natural-language generation, existing text decoding methods typically struggle to produce text which is both diverse and high-quality. Greedy and beam search are known to suffer from text degeneration and linguistic diversity issues, while temperature, top-k, and nucleus sampling often yield diverse but low-quality outputs. In this work, we present crowd sampling, a family of decodin…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: https://github.com/suzgunmirac/crowd-sampling

  11. arXiv:2207.04043  [pdf, other]

    cs.CL cs.CY cs.LG

    The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications

    Authors: Mirac Suzgun, Luke Melas-Kyriazi, Suproteem K. Sarkar, Scott Duke Kominers, Stuart M. Shieber

    Abstract: Innovation is a major driver of economic and social development, and information about many kinds of innovation is embedded in semi-structured data from patents and patent applications. Although the impact and novelty of innovations expressed in patent data are difficult to measure through traditional means, ML offers a promising set of techniques for evaluating novelty, summarizing contributions,…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: Website: https://patentdataset.org/, GitHub Repository: https://github.com/suzgunmirac/hupd, Hugging Face Datasets: https://huggingface.co/datasets/HUPD/hupd

  12. arXiv:2205.11503  [pdf, other]

    cs.CL

    Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models

    Authors: Mirac Suzgun, Luke Melas-Kyriazi, Dan Jurafsky

    Abstract: We propose a method for arbitrary textual style transfer (TST)--the task of transforming a text into any given style--utilizing general-purpose pre-trained language models. Our method, Prompt-and-Rerank, is based on a mathematical formulation of the TST task, decomposing it into three constituent components: textual similarity, target style strength, and fluency. Specifically, our method first use…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: GitHub page: https://github.com/suzgunmirac/prompt-and-rerank. Project page: https://lukemelas.github.io/prompt-and-rerank/

  13. arXiv:2205.07839  [pdf, other]

    cs.CV cs.AI

    Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: Unsupervised localization and segmentation are long-standing computer vision challenges that involve decomposing an image into semantically-meaningful segments without any labeled data. These tasks are particularly interesting in an unsupervised setting due to the difficulty and cost of obtaining dense image annotations, but existing unsupervised approaches struggle with complex scenes containing…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: Published at CVPR 2022. Project Page: https://lukemelas.github.io/deep-spectral-segmentation

  14. arXiv:2112.02656  [pdf, other]

    cs.LG cs.DC

    Intrinisic Gradient Compression for Federated Learning

    Authors: Luke Melas-Kyriazi, Franklyn Wang

    Abstract: Federated learning is a rapidly-growing area of research which enables a large number of clients to jointly train a machine learning model on privately-held data. One of the largest barriers to wider adoption of federated learning is the communication cost of sending model updates from and to the clients, which is accentuated by the fact that many of these devices are bandwidth-constrained. In thi…

    Submitted 5 December, 2021; originally announced December 2021.

  15. arXiv:2105.08128  [pdf, other]

    cs.CV cs.AI

    PixMatch: Unsupervised Domain Adaptation via Pixelwise Consistency Training

    Authors: Luke Melas-Kyriazi, Arjun K. Manrai

    Abstract: Unsupervised domain adaptation is a promising technique for semantic segmentation and other computer vision tasks for which large-scale data annotation is costly and time-consuming. In semantic segmentation, it is attractive to train models on annotated images from a simulated (source) domain and deploy them on real (target) domains. In this work, we present a novel framework for unsupervised doma…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: CVPR 2021

  16. arXiv:2105.08127  [pdf, other]

    cs.CV cs.AI

    Finding an Unsupervised Image Segmenter in Each of Your Deep Generative Models

    Authors: Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina, Andrea Vedaldi

    Abstract: Recent research has shown that numerous human-interpretable directions exist in the latent space of GANs. In this paper, we develop an automatic procedure for finding directions that lead to foreground-background image separation, and we use these directions to train an image segmentation model without human supervision. Our method is generator-agnostic, producing strong segmentation results with…

    Submitted 17 May, 2021; originally announced May 2021.

    Comments: Project page and GitHub link: https://lukemelas.github.io/unsupervised-image-segmentation & https://github.com/lukemelas/unsupervised-image-segmentation

  17. arXiv:2105.02723  [pdf, other]

    cs.CV

    Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet

    Authors: Luke Melas-Kyriazi

    Abstract: The strong performance of vision transformers on image classification and other vision tasks is often attributed to the design of their multi-head attention layers. However, the extent to which attention is responsible for this strong performance remains unclear. In this short report, we ask: is the attention layer even necessary? Specifically, we replace the attention layer in a vision transforme…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: Short Technical Report. GitHub: https://github.com/lukemelas/do-you-even-need-attention

  18. arXiv:2011.01307  [pdf, other]

    cs.LG cs.AI

    The Mathematical Foundations of Manifold Learning

    Authors: Luke Melas-Kyriazi

    Abstract: Manifold learning is a popular and quickly-growing subfield of machine learning based on the assumption that one's observed data lie on a low-dimensional manifold embedded in a higher-dimensional space. This thesis presents a mathematical perspective on manifold learning, delving into the intersection of kernel learning, spectral graph theory, and differential geometry. Emphasis is placed on the r…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: Undergraduate Thesis (Harvard Mathematics Department)

  19. arXiv:2003.03107  [pdf, other]

    cs.CV

    Show, Edit and Tell: A Framework for Editing Image Captions

    Authors: Fawaz Sammani, Luke Melas-Kyriazi

    Abstract: Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when editing captions, a model is not required to learn information that is already present in the caption (i.e. sentence structure), enabling it to focus on fixing…

    Submitted 6 March, 2020; originally announced March 2020.

    Comments: Accepted to CVPR 2020

  20. arXiv:2002.00733  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Generation-Distillation for Efficient Natural Language Understanding in Low-Data Settings

    Authors: Luke Melas-Kyriazi, George Han, Celine Liang

    Abstract: Over the past year, the emergence of transfer learning with large-scale language models (LM) has led to dramatic performance improvements across a broad range of natural language understanding tasks. However, the size and memory footprint of these large LMs make them difficult to deploy in many scenarios (e.g. on mobile phones). Recent research points to knowledge distillation as a potential solu…

    Submitted 25 January, 2020; originally announced February 2020.

    Comments: EMNLP 2019 Workshop on Deep Learning for Low-resource NLP

  21. arXiv:1908.06938  [pdf, other]

    cs.CL

    Encoder-Agnostic Adaptation for Conditional Language Generation

    Authors: Zachary M. Ziegler, Luke Melas-Kyriazi, Sebastian Gehrmann, Alexander M. Rush

    Abstract: Large pretrained language models have changed the way researchers approach discriminative natural language understanding tasks, leading to the dominance of approaches that adapt a pretrained model for arbitrary downstream tasks. However, it is an open question how to use similar techniques for language generation. Early results in the encoder-agnostic setting have been mostly negative. In this work…

    Submitted 10 September, 2019; v1 submitted 19 August, 2019; originally announced August 2019.