Search | arXiv e-print repository

Disentangled 3D Scene Generation with Layout Learning

Authors: Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski

Abstract: We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.11353 [pdf, other]

doi 10.1145/3613904.3642420

Understanding the Impact of Long-Term Memory on Self-Disclosure with Large Language Model-Driven Chatbots for Public Health Intervention

Authors: Eunkyung Jo, Yuin Jeong, SoHyun Park, Daniel A. Epstein, Young-Ho Kim

Abstract: Recent large language models (LLMs) offer the potential to support public health monitoring by facilitating health disclosure through open-ended conversations but rarely preserve the knowledge gained about individuals across repeated interactions. Augmenting LLMs with long-term memory (LTM) presents an opportunity to improve engagement and self-disclosure, but we lack an understanding of how LTM impacts people's interaction with LLM-driven chatbots in public health interventions. We examine the case of CareCall -- an LLM-driven voice chatbot with LTM -- through the analysis of 1,252 call logs and interviews with nine users. We found that LTM enhanced health disclosure and fostered positive perceptions of the chatbot by offering familiarity. However, we also observed challenges in promoting self-disclosure through LTM, particularly around addressing chronic health conditions and privacy concerns. We discuss considerations for LTM integration in LLM-driven chatbots for public health monitoring, including carefully deciding what topics need to be remembered in light of public health goals.

Submitted 17 February, 2024; originally announced February 2024.

Comments: Accepted to ACM CHI 2024 as a full paper

ACM Class: H.5.2; I.2.7

Journal ref: In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA. ACM, New York, NY, USA

arXiv:2310.15150 [pdf, other]

Online Detection of AI-Generated Images

Authors: David C. Epstein, Ishan Jain, Oliver Wang, Richard Zhang

Abstract: With advancements in AI-generated images coming on a continuous basis, it is increasingly difficult to distinguish traditionally-sourced images (e.g., photos, artwork) from AI-generated ones. Previous detection methods study the generalization from a single generator to another in isolation. However, in reality, new generators are released on a streaming basis. We study generalization in this setting, training on N models and testing on the next (N+k), following the historical release dates of well-known generation methods. Furthermore, images increasingly consist of both real and generated components, for example through image inpainting. Thus, we extend this approach to pixel prediction, demonstrating strong performance using automatically-generated inpainted data. In addition, for settings where commercial models are not publicly available for automatic data generation, we evaluate if pixel detectors can be trained solely on whole synthetic images.

Submitted 23 October, 2023; originally announced October 2023.

Comments: ICCV DeepFake Analysis and Detection Workshop, 2023

arXiv:2308.14411 [pdf]

Community College Articulation Agreement Websites: Students' Suggestions for New Academic Advising Software Features

Authors: David V. Nguyen, Shayan Doroudi, Daniel A. Epstein

Abstract: Articulation agreements provide more transparency about how community college courses will transfer and fulfill university requirements. However, the literature displays conflicting results on whether articulation agreements improve transfer-related outcomes; perhaps one contributor to these conflicting research results is the subpar user experience of articulation agreement reports and the websites that host them. Accordingly, we surveyed and interviewed California community college transfer students to gather their suggestions for new academic-advising-related software features for the ASSIST website. ASSIST is California's official centralized repository of articulation agreement reports between public California community colleges and universities. We analyzed the open-ended survey and interview data using structural coding and thematic analysis. We identified four themes around students' software feature suggestions for ASSIST: (a) features that automate laborious academic advising tasks, (b) features to reduce ambiguity with articulation agreements, (c) features to mitigate mistakes in term-by-term course planning, and (d) features to facilitate online advising from advisors and student peers.

Submitted 30 April, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2307.04500 [pdf]

Optimal Academic Plan Derived from Articulation Agreements: A Preliminary Experiment on Human-Generated and (Hypothetical) Algorithm-Generated Academic Plans

Authors: David V. Nguyen, Shayan Doroudi, Daniel A. Epstein

Abstract: Objective: Community college students typically submit transfer applications to multiple universities. However, each university may have differing lower-division major requirements in order to transfer. Accordingly, our study examined one pain point users may have with ASSIST, which is California's official statewide database of articulation agreements. That pain point is cross-referencing multiple articulation agreements to manually develop an optimal academic plan. Optimal is defined as the minimal set of community college courses that satisfy all the transfer requirements for the multiple universities a student is preparing to apply to. Methods: To address that pain point, we designed a low-fidelity prototype that lists the minimal set of community college courses that a hypothetical optimization algorithm would output based on the user's selected articulation agreements. 24 California college students were tasked with creating an optimal academic plan using either ASSIST (which requires manual optimization) or the optimization prototype (which already provides the minimal set of classes). Results: Experiment participants assigned to use the prototype had less optimality mistakes in their academic plan, were faster in creating their plan, and provided higher usability ratings compared to the ASSIST users. All differences were statistically significant (p < 0.05) and had large effect sizes (d > 0.8).
Conclusions: Our preliminary experiment suggests manually developing optimal academic plans can be error prone and that algorithm-generated academic plans can potentially reduce unnecessary excess community college credits. However, future research needs to move beyond our proof of value of a hypothetical optimization algorithm and towards actually implementing an algorithm.

Submitted 30 April, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

arXiv:2306.00986 [pdf, other]

Diffusion Self-Guidance for Controllable Image Generation

Authors: Dave Epstein, Allan Jabri, Ben Poole, Alexei A. Efros, Aleksander Holynski

Abstract: Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, location, and appearance of objects can be extracted from these representations and used to steer sampling. Self-guidance works similarly to classifier guidance, but uses signals present in the pretrained model itself, requiring no additional models or training. We show how a simple set of properties can be composed to perform challenging image manipulations, such as modifying the position or size of objects, merging the appearance of objects in one image with the layout of another, composing objects from many images into one, and more. We also show that self-guidance can be used to edit real images. For results and an interactive demo, see our project page at https://dave.ml/selfguidance/

Submitted 11 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Project page at https://dave.ml/selfguidance/

arXiv:2208.05456 [pdf, other]

doi 10.1145/1122445.1122456

Revisiting Piggyback Prototyping: Examining Benefits and Tradeoffs in Extending Existing Social Computing Systems

Authors: Daniel A. Epstein, Fannie Liu, Andrés Monroy-Hernández, Dennis Wang

Abstract: The CSCW community has a history of designing, implementing, and evaluating novel social interactions in technology, but the process requires significant technical effort for uncertain value. We discuss the opportunities and applications of "piggyback prototyping", building and evaluating new ideas for social computing on top of existing ones, expanding on its potential to contribute design recommendations. Drawing on about 50 papers which use the method, we critically examine the intellectual and technical benefits it provides, such as ecological validity and leveraging well-tested features, as well as research-product and ethical tensions it imposes, such as limits to customization and violation of participant privacy. We discuss considerations for future researchers deciding whether to use piggyback prototyping and point to new research agendas which can reduce the burden of implementing the method.

Submitted 23 September, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: To appear at the 25th ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW '22)

ACM Class: H.5.3

arXiv:2205.02837 [pdf, other]

BlobGAN: Spatially Disentangled Scene Representations

Authors: Dave Epstein, Taesung Park, Richard Zhang, Eli Shechtman, Alexei A. Efros

Abstract: We propose an unsupervised, mid-level representation for a generative model of scenes. The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features. Blobs are differentiably placed onto a feature grid that is decoded into an image by a generative adversarial network. Due to the spatial uniformity of blobs and the locality inherent to convolution, our network learns to associate different blobs with different entities in a scene and to arrange these blobs to capture scene layout. We demonstrate this emergent behavior by showing that, despite training without any supervision, our method enables applications such as easy manipulation of objects within a scene (e.g., moving, removing, and restyling furniture), creation of feasible scenes given constraints (e.g., plausible rooms with drawers at a particular location), and parsing of real-world images into constituent parts. On a challenging multi-category dataset of indoor scenes, BlobGAN outperforms StyleGAN2 in image quality as measured by FID. See our project page for video results and interactive demo: https://www.dave.ml/blobgan

Submitted 29 July, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

Comments: ECCV 2022. Project webpage available at https://www.dave.ml/blobgan

arXiv:2101.02337 [pdf, other]

Learning Temporal Dynamics from Cycles in Narrated Video

Authors: Dave Epstein, Jiajun Wu, Cordelia Schmid, Chen Sun

Abstract: Learning to model how the world changes as time elapses has proven a challenging problem for the computer vision community. We propose a self-supervised solution to this problem using temporal cycle consistency jointly in vision and language, training on narrated video. Our model learns modality-agnostic functions to predict forward and backward in time, which must undo each other when composed. This constraint leads to the discovery of high-level transitions between moments in time, since such transitions are easily inverted and shared across modalities. We justify the design of our model with an ablation study on different configurations of the cycle consistency problem. We then show qualitatively and quantitatively that our approach yields a meaningful, high-level model of the future and past. We apply the learned dynamics model without further training to various tasks, such as predicting future action and temporally ordering sets of images. Project page: https://dave.ml/mmcc

Submitted 12 September, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

Comments: ICCV 2021

arXiv:2012.04631 [pdf, other]

Globetrotter: Connecting Languages by Connecting Images

Authors: Dídac Surís, Dave Epstein, Carl Vondrick

Abstract: Machine translation between many languages at once is highly challenging, since training with ground truth requires supervision between all language pairs, which is difficult to obtain. Our key insight is that, while languages may vary drastically, the underlying visual appearance of the world remains consistent. We introduce a method that uses visual observations to bridge the gap between languages, rather than relying on parallel corpora or topological properties of the representations. We train a model that aligns segments of text from different languages if and only if the images associated with them are similar and each image in turn is well-aligned with its textual description. We train our model from scratch on a new dataset of text in over fifty languages with accompanying images. Experiments show that our method outperforms previous work on unsupervised word and sentence translation using retrieval. Code, models and data are available on globetrotter.cs.columbia.edu.

Submitted 31 March, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: CVPR 2022 (Oral)

arXiv:2006.15657 [pdf, other]

Learning Goals from Failure

Authors: Dave Epstein, Carl Vondrick

Abstract: We introduce a framework that predicts the goals behind observable human action in video. Motivated by evidence in developmental psychology, we leverage video of unintentional action to learn video representations of goals without direct supervision. Our approach models videos as contextual trajectories that represent both low-level motion and high-level action features. Experiments and visualizations show our trained model is able to predict the underlying goals in video of unintentional action. We also propose a method to "automatically correct" unintentional action by leveraging gradient signals of our model to adjust latent trajectories. Although the model is trained with minimal supervision, it is competitive with or outperforms baselines trained on large (supervised) datasets of successfully executed goals, showing that observing unintentional action is crucial to learning about goals in video. Project page: https://aha.cs.columbia.edu/

Submitted 12 December, 2020; v1 submitted 28 June, 2020; originally announced June 2020.

arXiv:2004.03037 [pdf, other]

Dense Steerable Filter CNNs for Exploiting Rotational Symmetry in Histology Images

Authors: Simon Graham, David Epstein, Nasir Rajpoot

Abstract: Histology images are inherently symmetric under rotation, where each orientation is equally as likely to appear. However, this rotational symmetry is not widely utilised as prior knowledge in modern Convolutional Neural Networks (CNNs), resulting in data hungry models that learn independent features at each orientation. Allowing CNNs to be rotation-equivariant removes the necessity to learn this set of transformations from the data and instead frees up model capacity, allowing more discriminative features to be learned. This reduction in the number of required parameters also reduces the risk of overfitting. In this paper, we propose Dense Steerable Filter CNNs (DSF-CNNs) that use group convolutions with multiple rotated copies of each filter in a densely connected framework. Each filter is defined as a linear combination of steerable basis filters, enabling exact rotation and decreasing the number of trainable parameters compared to standard filters. We also provide the first in-depth comparison of different rotation-equivariant CNNs for histology image analysis and demonstrate the advantage of encoding rotational symmetry into modern architectures. We show that DSF-CNNs achieve state-of-the-art performance, with significantly fewer parameters, when applied to three different tasks in the area of computational pathology: breast tumour classification, colon gland segmentation and multi-tissue nuclear segmentation.

Submitted 20 July, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

arXiv:1911.11237 [pdf, other]

Learning to Learn Words from Visual Scenes

Authors: Dídac Surís, Dave Epstein, Heng Ji, Shih-Fu Chang, Carl Vondrick

Abstract: Language acquisition is the process of learning words from the surrounding scene. We introduce a meta-learning framework that learns how to learn word representations from unconstrained scenes. We leverage the natural compositional structure of language to create training episodes that cause a meta-learner to learn strong policies for language acquisition. Experiments on two datasets show that our approach is able to more rapidly acquire novel words as well as more robustly generalize to unseen compositions, significantly outperforming established baselines. A key advantage of our approach is that it is data efficient, allowing representations to be learned from scratch without language pre-training. Visualizations and analysis suggest visual information helps our approach learn a rich cross-modal representation from minimal examples. Project webpage is available at https://expert.cs.columbia.edu/

Submitted 12 July, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: 26 pages, 12 figures

Journal ref: European Conference on Computer Vision (ECCV), 2020

arXiv:1911.11206 [pdf, other]

Oops! Predicting Unintentional Action in Video

Authors: Dave Epstein, Boyuan Chen, Carl Vondrick

Abstract: From just a short glance at a video, we can often tell whether a person's action is intentional or not. Can we train a model to recognize this? We introduce a dataset of in-the-wild videos of unintentional action, as well as a suite of tasks for recognizing, localizing, and anticipating its onset. We train a supervised neural network as a baseline and analyze its performance compared to human consistency on the tasks. We also investigate self-supervised representations that leverage natural signals in our dataset, and show the effectiveness of an approach that uses the intrinsic speed of video to perform competitively with highly-supervised pretraining. However, a significant gap between machine and human performance remains. The project website is available at https://oops.cs.columbia.edu

Submitted 25 November, 2019; originally announced November 2019.

Comments: 11 pages, 9 figures

arXiv:1807.05620 [pdf, other]

NEUZZ: Efficient Fuzzing with Neural Program Smoothing

Authors: Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, Suman Jana

Abstract: Fuzzing has become the de facto standard technique for finding software vulnerabilities. However, even state-of-the-art fuzzers are not very efficient at finding hard-to-trigger software bugs. Most popular fuzzers use evolutionary guidance to generate inputs that can trigger different bugs. Such evolutionary algorithms, while fast and simple to implement, often get stuck in fruitless sequences of random mutations. Gradient-guided optimization presents a promising alternative to evolutionary guidance. Gradient-guided techniques have been shown to significantly outperform evolutionary algorithms at solving high-dimensional structured optimization problems in domains like machine learning by efficiently utilizing gradients or higher-order derivatives of the underlying function. However, gradient-guided approaches are not directly applicable to fuzzing as real-world program behaviors contain many discontinuities, plateaus, and ridges where the gradient-based methods often get stuck. We observe that this problem can be addressed by creating a smooth surrogate function approximating the discrete branching behavior of target program. In this paper, we propose a novel program smoothing technique using surrogate neural network models that can incrementally learn smooth approximations of a complex, real-world program's branching behaviors. We further demonstrate that such neural network models can be used together with gradient-guided input generation schemes to significantly improve the fuzzing efficiency.
Our extensive evaluations demonstrate that NEUZZ significantly outperforms 10 state-of-the-art graybox fuzzers on 10 real-world programs both at finding new bugs and achieving higher edge coverage. NEUZZ found 31 unknown bugs that other fuzzers failed to find in 10 real world programs and achieved 3X more edge coverage than all of the tested graybox fuzzers for 24 hours running.

Submitted 12 July, 2019; v1 submitted 15 July, 2018; originally announced July 2018.

Comments: To appear in the 40th IEEE Symposium on Security and Privacy, May 20--22, 2019, San Francisco, CA, USA

arXiv:1805.03699 [pdf, other]

Fast and Accurate Tumor Segmentation of Histology Images using Persistent Homology and Deep Convolutional Features

Authors: Talha Qaiser, Yee-Wah Tsang, Daiki Taniyama, Naoya Sakamoto, Kazuaki Nakane, David Epstein, Nasir Rajpoot

Abstract: Tumor segmentation in whole-slide images of histology slides is an important step towards computer-assisted diagnosis. In this work, we propose a tumor segmentation framework based on the novel concept of persistent homology profiles (PHPs). For a given image patch, the homology profiles are derived by efficient computation of persistent homology, which is an algebraic tool from homology theory. We propose an efficient way of computing topological persistence of an image, alternative to simplicial homology. The PHPs are devised to distinguish tumor regions from their normal counterparts by modeling the atypical characteristics of tumor nuclei. We propose two variants of our method for tumor segmentation: one that targets speed without compromising accuracy and the other that targets higher accuracy. The fast version is based on the selection of exemplar image patches from a convolution neural network (CNN) and patch classification by quantifying the divergence between the PHPs of exemplars and the input image patch. Detailed comparative evaluation shows that the proposed algorithm is significantly faster than competing algorithms while achieving comparable results. The accurate version combines the PHPs and high-level CNN features and employs a multi-stage ensemble strategy for image patch labeling. Experimental results demonstrate that the combination of PHPs and CNN features outperforms competing algorithms. This study is performed on two independently collected colorectal datasets containing adenoma, adenocarcinoma, signet and healthy cases.
Collectively, the accurate tumor segmentation produces the highest average patch-level F1-score, as compared with competing algorithms, on malignant and healthy cases from both the datasets. Overall the proposed framework highlights the utility of persistent homology for histopathology image analysis.

Submitted 9 May, 2018; originally announced May 2018.

arXiv:1804.08145 [pdf, other]

doi 10.1016/j.media.2018.12.003

Micro-Net: A unified model for segmentation of various objects in microscopy images

Authors: Shan E Ahmed Raza, Linda Cheung, Muhammad Shaban, Simon Graham, David Epstein, Stella Pelengaris, Michael Khan, Nasir M. Rajpoot

Abstract: Object segmentation and structure localization are important steps in automated image analysis pipelines for microscopy images. We present a convolution neural network (CNN) based deep learning architecture for segmentation of objects in microscopy images. The proposed network can be used to segment cells, nuclei and glands in fluorescence microscopy and histology images after slight tuning of input parameters. The network trains at multiple resolutions of the input image, connects the intermediate layers for better localization and context and generates the output using multi-resolution deconvolution filters. The extra convolutional layers which bypass the max-pooling operation allow the network to train for variable input intensities and object size and make it robust to noisy data. We compare our results on publicly available data sets and show that the proposed network outperforms recent deep learning algorithms.

Submitted 22 January, 2019; v1 submitted 22 April, 2018; originally announced April 2018.

Journal ref: Medical Image Analysis. 52 (2019) 160-173

arXiv:1801.07451 [pdf, other]

Novel digital tissue phenotypic signatures of distant metastasis in colorectal cancer

Authors: Korsuk Sirinukunwattana, David Snead, David Epstein, Zia Aftab, Imaad Mujeeb, Yee Wah Tsang, Ian Cree, Nasir Rajpoot

Abstract: Distant metastasis is the major cause of death in colorectal cancer (CRC). Patients at high risk of developing distant metastasis could benefit from appropriate adjuvant and follow-up treatments if stratified accurately at an early stage of the disease. Studies have increasingly recognized the role of diverse cellular components within the tumor microenvironment in the development and progression of CRC tumors. In this paper, we show that a new method of automated analysis of digitized images from colorectal cancer tissue slides can provide important estimates of distant metastasis-free survival (DMFS, the time before metastasis is first observed) on the basis of details of the microenvironment. Specifically, we determine what cell types are found in the vicinity of other cell types, and in what numbers, rather than concentrating exclusively on the cancerous cells. We then extract novel tissue phenotypic signatures using statistical measurements about tissue composition. Such signatures can underpin clinical decisions about the advisability of various types of adjuvant therapy.

Submitted 23 January, 2018; originally announced January 2018.

arXiv:1703.08658 [pdf, other]

Maximizing the area of intersection of rectangles

Authors: David B. A. Epstein, Mike Paterson

Abstract: This paper attacks the following problem. We are given a large number $N$ of rectangles in the plane, each with horizontal and vertical sides, and also a number $r<N$. The given list of $N$ rectangles may contain duplicates. The problem is to find $r$ of these rectangles, such that, if they are discarded, then the intersection of the remaining $(N-r)$ rectangles has an intersection with as large an area as possible. We will find an upper bound, depending only on $N$ and $r$, and not on the particular data presented, for the number of steps needed to run the algorithm on (a mathematical model of) a computer. In fact our algorithm is able to determine, for each $s\le r$, $s$ rectangles from the given list of $N$ rectangles, such that the remaining $(N-s)$ rectangles have as large an area as possible, and this takes hardly any more time than taking care only of the case $s=r$. Our algorithm extends to $d$-dimensional rectangles. Our method is to exhaustively examine all possible intersections---this is much faster than it sounds, because we do not need to examine all $\binom Ns$ subsets in order to find all possible intersection rectangles. For an extreme example, suppose the rectangles are nested, for example concentric squares of distinct sizes, then the only intersections examined are the smallest $s+1$ rectangles.

Submitted 25 March, 2017; originally announced March 2017.

Comments: 16 pages, 1 figure

ACM Class: F.2.2
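
To make the problem statement in the abstract above concrete, here is a minimal brute-force sketch. It is not the authors' algorithm: it examines all $\binom Nr$ subsets, which is exactly the cost the paper says it avoids. The function names `intersection_area` and `best_discard` and the `(x1, y1, x2, y2)` tuple layout are assumptions made for illustration only.

```python
from itertools import combinations


def intersection_area(rects):
    """Area of the common intersection of axis-aligned rectangles.

    Each rectangle is an (x1, y1, x2, y2) tuple with x1 <= x2 and y1 <= y2
    (an assumed layout, not taken from the paper).
    """
    left = max(r[0] for r in rects)
    bottom = max(r[1] for r in rects)
    right = min(r[2] for r in rects)
    top = min(r[3] for r in rects)
    # An empty intersection has non-positive width or height; clamp to 0.
    return max(0, right - left) * max(0, top - bottom)


def best_discard(rects, r):
    """Try every way of discarding r rectangles (brute force).

    Returns (best_area, discarded_indices) for the first subset, in
    lexicographic order, that maximizes the area of the intersection
    of the remaining rectangles.
    """
    n = len(rects)
    if not 0 <= r < n:
        raise ValueError("r must satisfy 0 <= r < N")
    best_area, best_drop = -1, None
    for drop in combinations(range(n), r):
        dropped = set(drop)
        kept = [rects[i] for i in range(n) if i not in dropped]
        area = intersection_area(kept)
        if area > best_area:
            best_area, best_drop = area, drop
    return best_area, best_drop


# Nested concentric squares, as in the abstract's extreme example.
nested = [(0, 0, 10, 10), (1, 1, 9, 9), (2, 2, 8, 8), (3, 3, 7, 7)]
result = best_discard(nested, 2)
```

On the nested squares, the best choice is to discard the two smallest squares, which matches the abstract's remark that only the smallest rectangles matter in that case.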

Showing 1–19 of 19 results for author: Epstein, D