Search | arXiv e-print repository

It's your turn! -- A collaborative human-robot pick-and-place scenario in a virtual industrial setting

Authors: Brigitte Krenn, Tim Reinboth, Stephanie Gross, Christine Busch, Martina Mara, Kathrin Meyer, Michael Heiml, Thomas Layer-Wagner

Abstract: In human-robot collaborative interaction scenarios, nonverbal communication plays an important role. Signals sent by a human collaborator need to be identified and interpreted by the robotic system, and signals sent by the robot need to be identified and interpreted by the human. In this paper, we focus on the latter. On an industrial robot in a VR environment, we implemented nonverbal behavior signalling to the user that it is now their turn to proceed with a pick-and-place task. The signals were presented in four different test conditions: no signal, robot arm gesture, light signal, combination of robot arm gesture and light signal. Test conditions were presented to the participants in two rounds. The qualitative analysis was conducted with focus on (i) potential signals in human behaviour indicating why some participants immediately took over from the robot whereas others needed more time to explore, (ii) human reactions after the nonverbal signal of the robot, and (iii) whether participants showed different behaviours in the different test conditions. We could not identify signals explaining why some participants were immediately successful and others were not. There was a broad range of behaviors after the robot stopped working, e.g. participants rearranged the objects, looked at the robot or the object, or gestured to the robot to proceed. We found evidence that robot deictic gestures were helpful for the human to correctly interpret what to do next.
Moreover, humans showed a strong tendency to interpret the light signal projected on the robot's gripper as a request to give the object in focus to the robot, whereas the robot's pointing gesture at the object was a strong trigger for the humans to look at the object.

Submitted 28 May, 2021; originally announced May 2021.

Comments: 6 pages, 5 figures, 2 tables, HRI21 Workshop on "Exploring Applications for Autonomous Non-Verbal Human-Robot Interactions" March 8 2021

arXiv:2105.13812 [pdf, other]

A proxemics game between festival visitors and an industrial robot

Authors: Brigitte Krenn, Stephanie Gross, Bernhard Dieber, Horst Pichler, Kathrin Meyer

Abstract: With increased applications of collaborative robots (cobots) in industrial workplaces, behavioural effects of human-cobot interactions need to be further investigated. This is of particular importance as nonverbal behaviours of collaboration partners in human-robot teams significantly influence the experience of the human interaction partners and the success of the collaborative task. During the Ars Electronica 2020 Festival for Art, Technology and Society (Linz, Austria), we invited visitors to exploratively interact with an industrial robot, exhibiting restricted interaction capabilities: extending and retracting its arm, depending on the movements of the volunteer. The movements of the arm were pre-programmed and telecontrolled for safety reasons (which was not obvious to the participants). We recorded video data of these interactions and investigated general nonverbal behaviours of the humans interacting with the robot, as well as nonverbal behaviours of people in the audience. Our results showed that people were more interested in exploring the robot's action and perception capabilities than just reproducing the interaction game as introduced by the instructors. We also found that the majority of participants interacting with the robot approached it up to a distance which would be perceived as threatening or intimidating, if it were a human interaction partner. Regarding bystanders, we found examples where people made movements as if trying out variants of the current participant's behaviour.

Submitted 28 May, 2021; originally announced May 2021.

Comments: 5 pages, 2 pictures, HRI21 Workshop on "Exploring Applications for Autonomous Non-Verbal Human-Robot Interactions" March 8 2021

arXiv:2008.01725 [pdf, other]

A Large Scale Analysis of Android-Web Hybridization

Authors: Abhishek Tiwari, Jyoti Prakash, Sascha Gross, Christian Hammer

Abstract: Many Android applications embed webpages via WebView components and execute JavaScript code within Android. Hybrid applications leverage dedicated APIs to load a resource and render it in a WebView. Furthermore, Android objects can be shared with the JavaScript world. However, bridging the interfaces of the Android and JavaScript world might also incur severe security threats: Potentially untrusted webpages and their JavaScript might interfere with the Android environment and its access to native features. No general analysis is currently available to assess the implications of such hybrid apps bridging the two worlds. To understand the semantics and effects of hybrid apps, we perform a large-scale study on the usage of the hybridization APIs in the wild. We analyze and categorize the parameters to hybridization APIs for 7,500 randomly selected and the 196 most popular applications from the Google Playstore as well as 1000 malware samples. Our results advance the general understanding of hybrid applications, as well as implications for potential program analyses, and the current security situation: We discovered thousands of flows of sensitive data from Android to JavaScript, the vast majority of which could flow to potentially untrustworthy code. Our analysis identified numerous web pages embedding vulnerabilities, which we exemplarily exploited. Additionally, we discovered a multitude of applications in which potentially untrusted JavaScript code may interfere with (trusted) Android objects, both in benign and malign applications.

Submitted 4 August, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2004.10188 [pdf, other]

Residual Energy-Based Models for Text

Authors: Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

Abstract: Current large-scale auto-regressive language models display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmative even if we do not. This suggests that the auto-regressive models can be improved by incorporating the (globally normalized) discriminators into the generative process. We give a formalism for this using the Energy-Based Model framework, and show that it indeed improves the results of the generative models, measured both in terms of perplexity and in terms of human evaluation.

Submitted 21 December, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: long journal version

Journal ref: Journal of Machine Learning Research 21 (2020) 1-41

arXiv:1912.01703 [pdf, other]

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Authors: Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala

Abstract: Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.

Submitted 3 December, 2019; originally announced December 2019.

Comments: 12 pages, 3 figures, NeurIPS 2019

arXiv:1910.12603 [pdf, other]

A blockchain-orchestrated Federated Learning architecture for healthcare consortia

Authors: Jonathan Passerat-Palmbach, Tyler Farnan, Robert Miller, Marielle S. Gross, Heather Leigh Flannery, Bill Gleim

Abstract: We propose a novel architecture for federated learning within healthcare consortia. At the heart of the solution is a unique integration of privacy preserving technologies, built upon native enterprise blockchain components available in the Ethereum ecosystem. We show how the specific characteristics and challenges of healthcare consortia informed our design choices, notably the conception of a new Secure Aggregation protocol assembled with a protected hardware component and an encryption toolkit native to Ethereum. Our architecture also brings in a privacy preserving audit trail that logs events in the network without revealing identities.

Submitted 12 October, 2019; originally announced October 2019.

arXiv:1906.03351 [pdf, other]

Real or Fake? Learning to Discriminate Machine from Human Generated Text

Authors: Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur Szlam

Abstract: Energy-based models (EBMs), a.k.a. un-normalized models, have had recent successes in continuous spaces. However, they have not been successfully applied to model text sequences. While decreasing the energy at training samples is straightforward, mining (negative) samples where the energy should be increased is difficult. In part, this is because standard gradient-based methods are not readily applicable when the input is high-dimensional and discrete. Here, we side-step this issue by generating negatives using pre-trained auto-regressive language models. The EBM then works in the residual of the language model; and is trained to discriminate real text from text generated by the auto-regressive models. We investigate the generalization ability of residual EBMs, a pre-requisite for using them in other applications. We extensively analyze generalization for the task of classifying whether an input is machine or human generated, a natural task given the training loss and how we mine negatives. Overall, we observe that EBMs can generalize remarkably well to changes in the architecture of the generators producing negatives. However, EBMs exhibit more sensitivity to the training set used by such generators.

Submitted 25 November, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

arXiv:1905.12245 [pdf, other]

Automatic Realistic Music Video Generation from Segments of Youtube Videos

Authors: Sarah Gross, Xingxing Wei, Jun Zhu

Abstract: A Music Video (MV) is a video aiming at visually illustrating or extending the meaning of its background music. This paper proposes a novel method to automatically generate, from an input music track, a music video made of segments of YouTube music videos which would fit this music. The system analyzes the input music to find its genre (pop, rock, ...) and finds segmented MVs with the same genre in the database. Then, K-Means clustering is used to group video segments by color histogram, that is, segments of MVs having the same global distribution of colors. A few clusters are randomly selected and then assembled around music boundaries, which are moments where a significant change in the music occurs (for instance, transitioning from verse to chorus). This way, when the music changes, the video color mood changes as well. This work aims at generating high-quality realistic MVs, which could be mistaken for man-made MVs. By asking users to classify the videos in a batch containing professional MVs, amateur-made MVs and MVs generated by our algorithm, we show that our algorithm gives satisfying results, as 45% of generated videos are mistaken for professional MVs and 21.6% are mistaken for amateur-made MVs. More information can be found on the project website: http://ml.cs.tsinghua.edu.cn/~sarah/

Submitted 29 May, 2019; originally announced May 2019.

arXiv:1904.01038 [pdf, other]

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

Authors: Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

Abstract: fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at https://www.youtube.com/watch?v=OtgDdWtHvto

Submitted 1 April, 2019; originally announced April 2019.

Comments: NAACL 2019 Demo paper

arXiv:1812.05380 [pdf, other]

IIFA: Modular Inter-app Intent Information Flow Analysis of Android Applications

Authors: Abhishek Tiwari, Sascha Groß, Christian Hammer

Abstract: Android apps cooperate through message passing via intents. However, when apps do not have identical sets of privileges, inter-app communication (IAC) can accidentally or maliciously be misused, e.g., to leak sensitive information contrary to users' expectations. Recent research considered static program analysis to detect dangerous data leaks due to inter-component communication (ICC) or IAC, but suffers from shortcomings with respect to precision, soundness, and scalability. To solve these issues we propose a novel approach for static ICC/IAC analysis. We perform a fixed-point iteration of ICC/IAC summary information to precisely resolve intent communication with more than two apps involved. We integrate these results with information flows generated by a baseline (i.e. not considering intents) information flow analysis, and resolve if sensitive data is flowing (transitively) through components/apps in order to be ultimately leaked. Our main contribution is the first fully automatic sound and precise ICC/IAC information flow analysis that is scalable for realistic apps due to modularity, avoiding combinatorial explosion: Our approach determines communicating apps using short summaries rather than inlining intent calls, which often requires simultaneously analyzing all tuples of apps. We evaluated our tool IIFA in terms of scalability, precision, and recall. Using benchmarks we establish that precision and recall of our algorithm are considerably better than prominent state-of-the-art analyses for IAC.
Most importantly, applied to the 90 most popular applications from the Google Playstore, IIFA demonstrated its scalability to a large corpus of real-world apps. IIFA reports 62 problematic ICC-/IAC-related information flows via two or more apps/components.

Submitted 13 December, 2018; originally announced December 2018.

arXiv:1811.00164 [pdf, other]

Deep Counterfactual Regret Minimization

Authors: Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm

Abstract: Counterfactual Regret Minimization (CFR) is the leading framework for solving large imperfect-information games. It converges to an equilibrium by iteratively traversing the game tree. In order to deal with extremely large games, abstraction is typically applied before running CFR. The abstracted game is solved with tabular CFR, and its solution is mapped back to the full game. This process can be problematic because aspects of abstraction are often manual and domain specific, abstraction algorithms may miss important strategic nuances of the game, and there is a chicken-and-egg problem because determining a good abstraction requires knowledge of the equilibrium of the game. This paper introduces Deep Counterfactual Regret Minimization, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game. We show that Deep CFR is principled and achieves strong performance in large poker games. This is the first non-tabular variant of CFR to be successful in large games.

Submitted 22 May, 2019; v1 submitted 31 October, 2018; originally announced November 2018.

Journal ref: International Conference on Machine Learning (ICML), 2019

arXiv:1708.06564 [pdf, other]

The Continuous Hint Factory - Providing Hints in Vast and Sparsely Populated Edit Distance Spaces

Authors: Benjamin Paaßen, Barbara Hammer, Thomas William Price, Tiffany Barnes, Sebastian Gross, Niels Pinkwart

Abstract: Intelligent tutoring systems can support students in solving multi-step tasks by providing hints regarding what to do next. However, engineering such next-step hints manually or via an expert model becomes infeasible if the space of possible states is too large. Therefore, several approaches have emerged to infer next-step hints automatically, relying on past students' data. In particular, the Hint Factory (Barnes & Stamper, 2008) recommends edits that are most likely to guide students from their current state towards a correct solution, based on what successful students in the past have done in the same situation. Still, the Hint Factory relies on student data being available for any state a student might visit while solving the task, which is not the case for some learning tasks, such as open-ended programming tasks. In this contribution we provide a mathematical framework for edit-based hint policies and, based on this theory, propose a novel hint policy to provide edit hints in vast and sparsely populated state spaces. In particular, we extend the Hint Factory by considering data of past students in all states which are similar to the student's current state and creating hints approximating the weighted average of all these reference states. Because the space of possible weighted averages is continuous, we call this approach the Continuous Hint Factory.
In our experimental evaluation, we demonstrate that the Continuous Hint Factory can predict more accurately what capable students would do compared to existing prediction schemes on two learning tasks, especially in an open-ended programming task, and that the Continuous Hint Factory is comparable to existing hint policies at reproducing tutor hints on a simple UML diagram task.

Submitted 30 June, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

Journal ref: Journal of Educational Data Mining, 10 (2018) 1-35. Retrieved from https://jedm.educationaldatamining.org/index.php/JEDM/article/view/158

arXiv:1704.06363 [pdf, other]

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Authors: Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

Abstract: Training convolutional networks (CNNs) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNNs that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks. Mixture of experts models are not new (Jacobs et al. 1991, Collobert et al. 2003), but in the past, researchers have had to devise sophisticated methods to deal with data fragmentation. We show empirically that modern weakly supervised data sets are large enough to support naive partitioning schemes where each data point is assigned to a single expert. Because the experts are independent, training them in parallel is easy, and evaluation is cheap for the size of the model. Furthermore, we show that we can use a single decoding layer for all the experts, allowing a unified feature embedding space. We demonstrate that it is feasible (and in fact relatively painless) to train far larger models than could be practically trained with standard CNN architectures, and that the extra capacity can be well used on current datasets.

Submitted 20 April, 2017; originally announced April 2017.

Comments: Appearing in CVPR 2017

arXiv:1611.06430 [pdf, other]

Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks

Authors: Remi Denton, Sam Gross, Rob Fergus

Abstract: We introduce a simple semi-supervised learning approach for images based on in-painting using an adversarial loss. Images with random patches removed are presented to a generator whose task is to fill in the hole, based on the surrounding pixels. The in-painted images are then presented to a discriminator network that judges if they are real (unaltered training images) or not. This task acts as a regularizer for standard supervised training of the discriminator. Using our approach we are able to directly train large VGG-style networks in a semi-supervised fashion. We evaluate on STL-10 and PASCAL datasets, where our approach obtains performance comparable or superior to existing methods.

Submitted 19 November, 2016; originally announced November 2016.

arXiv:1604.02135 [pdf, other]

A MultiPath Network for Object Detection

Authors: Sergey Zagoruyko, Adam Lerer, Tsung-Yi Lin, Pedro O. Pinheiro, Sam Gross, Soumith Chintala, Piotr Dollár

Abstract: The recent COCO object detection dataset presents several new challenges for object detection. In particular, it contains objects at a broad range of scales, less prototypical images, and requires more precise localization. To address these challenges, we test three modifications to the standard Fast R-CNN object detector: (1) skip connections that give the detector access to features at multiple network layers, (2) a foveal structure to exploit object context at multiple object resolutions, and (3) an integral loss function and corresponding network adjustment that improve localization. The result of these modifications is that information can flow along multiple paths in our network, including through features from multiple network layers and from multiple object views. We refer to our modified classifier as a "MultiPath" network. We couple our MultiPath network with DeepMask object proposals, which are well suited for localization and small objects, and adapt our pipeline to predict segmentation masks in addition to bounding boxes. The combined system improves results over the baseline Fast R-CNN detector with Selective Search by 66% overall and by 4x on small objects. It placed second in both the COCO 2015 detection and segmentation challenges.

Submitted 8 August, 2016; v1 submitted 7 April, 2016; originally announced April 2016.

arXiv:1603.01312 [pdf, other]

Learning Physical Intuition of Block Towers by Example

Authors: Adam Lerer, Sam Gross, Rob Fergus

Abstract: Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the block trajectories. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block and (ii) to images of real wooden blocks, where they obtain performance comparable to human subjects.

Submitted 3 March, 2016; originally announced March 2016.

arXiv:1412.6890 [pdf, other]

Software for Distributed Computation on Medical Databases: A Demonstration Project

Authors: Balasubramanian Narasimhan, Daniel L. Rubin, Samuel M. Gross, Marina Bendersky, Philip W. Lavori

Abstract: Bringing together the information latent in distributed medical databases promises to personalize medical care by enabling reliable, stable modeling of outcomes with rich feature sets (including patient characteristics and treatments received). However, there are barriers to aggregation of medical data, due to lack of standardization of ontologies, privacy concerns, proprietary attitudes toward data, and a reluctance to give up control over end use. Aggregation of data is not always necessary for model fitting. In models based on maximizing a likelihood, the computations can be distributed, with aggregation limited to the intermediate results of calculations on local data, rather than raw data. Distributed fitting is also possible for singular value decomposition. There has been work on the technical aspects of shared computation for particular applications, but little has been published on the software needed to support the "social networking" aspect of shared computing, to reduce the barriers to collaboration. We describe a set of software tools that allow the rapid assembly of a collaborative computational project, based on the flexible and extensible R statistical software and other open source packages, that can work across a heterogeneous collection of database environments, with full transparency to allow local officials concerned with privacy protections to validate the safety of the method. We describe the principles, architecture, and successful test results for the site-stratified Cox model and rank-k Singular Value Decomposition (SVD).

Submitted 9 February, 2017; v1 submitted 22 December, 2014; originally announced December 2014.

arXiv:1209.2873 [pdf]

doi 10.1038/srep02197

Extraction of hidden information by efficient community detection in networks

Authors: Juyong Lee, Steven P. Gross, Jooyoung Lee

Abstract: Currently, we are overwhelmed by a deluge of experimental data, and network physics has the potential to become an invaluable method to increase our understanding of large interacting datasets. However, this potential is often unrealized for two reasons: uncovering the hidden community structure of a network, known as community detection, is difficult, and further, even if one has an idea of this community structure, it is not a priori obvious how to efficiently use this information. Here, to address both of these issues, we, first, identify optimal community structure of given networks in terms of modularity by utilizing a recently introduced community detection method. Second, we develop an approach to use this community information to extract hidden information from a network. When applied to a protein-protein interaction network, the proposed method outperforms current state-of-the-art methods that use only the local information of a network. The method is generally applicable to networks from many areas.

Submitted 13 September, 2012; originally announced September 2012.

Comments: 17 pages, 2 figures and 2 tables

Journal ref: Scientific Reports (2013)

arXiv:1202.5398 [pdf, ps, other]

Mod-CSA: Modularity optimization by conformational space annealing

Authors: Juyong Lee, Steven P. Gross, Jooyoung Lee

Abstract: We propose a new modularity optimization method, Mod-CSA, based on stochastic global optimization algorithm, conformational space annealing (CSA). Our method outperforms simulated annealing in terms of both efficiency and accuracy, finding higher modularity partitions with less computational resources required. The high modularity values found by our method are higher than, or equal to, the largest values previously reported. In addition, the method can be combined with other heuristic methods, and implemented in parallel fashion, allowing it to be applicable to large graphs with more than 10000 nodes.

Submitted 25 April, 2012; v1 submitted 24 February, 2012; originally announced February 2012.

Showing 1–19 of 19 results for author: Gross, S