Counterfactual Explanations for Linear Optimization

Authors: Jannis Kurtz, Ş. İlker Birbil, Dick den Hertog

Abstract: The concept of counterfactual explanations (CE) has emerged as one of the important concepts to understand the inner workings of complex AI systems. In this paper, we translate the idea of CEs to linear optimization and propose, motivate, and analyze three different types of CEs: strong, weak, and relative. While deriving strong and weak CEs appears to be computationally intractable, we show that calculating relative CEs can be done efficiently. By detecting and exploiting the hidden convex structure of the optimization problem that arises in the latter case, we show that obtaining relative CEs can be done in the same magnitude of time as solving the original linear optimization problem. This is confirmed by an extensive numerical experiment study on the NETLIB library.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2402.16269 [pdf, other]

From Large Language Models and Optimization to Decision Optimization CoPilot: A Research Manifesto

Authors: Segev Wasserkrug, Leonard Boussioux, Dick den Hertog, Farzaneh Mirzazadeh, Ilker Birbil, Jannis Kurtz, Donato Maragno

Abstract: Significantly simplifying the creation of optimization models for real-world business problems has long been a major goal in applying mathematical optimization more widely to important business and societal decisions. The recent capabilities of Large Language Models (LLMs) present a timely opportunity to achieve this goal. Therefore, we propose research at the intersection of LLMs and optimization to create a Decision Optimization CoPilot (DOCP) - an AI tool designed to assist any decision maker, interacting in natural language to grasp the business problem, subsequently formulating and solving the corresponding optimization model. This paper outlines our DOCP vision and identifies several fundamental requirements for its implementation. We describe the state of the art through a literature survey and experiments using ChatGPT. We show that a) LLMs already provide substantial novel capabilities relevant to a DOCP, and b) major research challenges remain to be addressed. We also propose possible research directions to overcome these gaps. We also see this work as a call to action to bring together the LLM and optimization communities to pursue our vision, thereby enabling much more widespread improved decision-making.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.02552 [pdf, other]

Neur2BiLO: Neural Bilevel Optimization

Authors: Justin Dumouchelle, Esther Julien, Jannis Kurtz, Elias B. Khalil

Abstract: Bilevel optimization deals with nested problems in which a leader takes the first decision to minimize their objective function while accounting for a follower's best-response reaction. Constrained bilevel problems with integer variables are particularly notorious for their hardness. While exact solvers have been proposed for mixed-integer linear bilevel optimization, they tend to scale poorly with problem size and are hard to generalize to the non-linear case. On the other hand, problem-specific algorithms (exact and heuristic) are limited in scope. Under a data-driven setting in which similar instances of a bilevel problem are solved routinely, our proposed framework, Neur2BiLO, embeds a neural network approximation of the leader's or follower's value function, trained via supervised regression, into an easy-to-solve mixed-integer program. Neur2BiLO serves as a heuristic that produces high-quality solutions extremely fast for the bilevel knapsack interdiction problem, the "critical node game" from network security, a donor-recipient healthcare problem, and discrete network design from transportation planning. These problems are diverse in that they have linear or non-linear objectives/constraints and integer or mixed-integer variables, making Neur2BiLO unique in its versatility.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2312.14211 [pdf, ps, other]

Experimenting with Large Language Models and vector embeddings in NASA SciX

Authors: Sergi Blanco-Cuaresma, Ioana Ciucă, Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Kelly E. Lockhart, Felix Grezes, Thomas Allen, Golnaz Shapurian, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Daniel Chivvis, Fernanda de Macedo Alves, Jean-Claude Paquin, Jennifer Bartlett, Mugdha Polimera, Stephanie Jarmak

Abstract: Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when using Retrieval Augmented Generation. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverage this technology while respecting the high level of trust and quality that the project holds.

Submitted 21 December, 2023; originally announced December 2023.

Comments: To appear in the proceedings of the 33rd annual international Astronomical Data Analysis Software & Systems (ADASS XXXIII)

arXiv:2312.08579 [pdf, other]

Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach

Authors: Golnaz Shapurian, Michael J Kurtz, Alberto Accomazzi

Abstract: The automatic identification of planetary feature names in astronomy publications presents numerous challenges. These features include craters, defined as roughly circular depressions resulting from impact or volcanic activity; dorsas, which are elongate raised structures or wrinkle ridges; and lacus, small irregular patches of dark, smooth material on the Moon, referred to as "lake" (Planetary Names Working Group, n.d.). Many feature names overlap with places or people's names that they are named after, for example, Syria, Tempe, Einstein, and Sagan, to name a few (U.S. Geological Survey, n.d.). Some feature names have been used in many contexts, for instance, Apollo, which can refer to mission, program, sample, astronaut, seismic, seismometers, core, era, data, collection, instrument, and station, in addition to the crater on the Moon. Some feature names can appear in the text as adjectives, like the lunar craters Black, Green, and White. Some feature names in other contexts serve as directions, like craters West and South on the Moon. Additionally, some features share identical names across different celestial bodies, requiring disambiguation, such as the Adams crater, which exists on both the Moon and Mars.
We present a multi-step pipeline combining rule-based filtering, statistical relevance analysis, part-of-speech (POS) tagging, named entity recognition (NER) model, hybrid keyword harvesting, knowledge graph (KG) matching, and inference with a locally installed large language model (LLM) to reliably identify planetary names despite these challenges. When evaluated on a dataset of astronomy papers from the Astrophysics Data System (ADS), this methodology achieves an F1-score over 0.97 in disambiguating planetary feature names.

Submitted 17 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.00197 [pdf, other]

Design, Modeling, and Control of a Low-Cost and Rapid Response Soft-Growing Manipulator for Orchard Operations

Authors: Ryan Dorosh, Justin Allen, Zixuan He, Christopher Ninatanta, Jack Coleman, Jack Spieker, Ethan Tuck, Jordan Kurtz, Qin Zhang, Matthew D. Whiting, Jiecai Luo, Manoj Karkee, Ming Luo

Abstract: Tree fruit growers around the world are facing labor shortages for critical operations, including harvest and pruning. There is a great interest in developing robotic solutions for these labor-intensive tasks, but current efforts have been prohibitively costly, slow, or require a reconfiguration of the orchard in order to function. In this paper, we introduce an alternative approach to robotics using a novel and low-cost soft-growing robotic platform. Our platform features the ability to extend up to 1.2 m linearly at a maximum speed of 0.27 m/s. The soft-growing robotic arm can operate with a terminal payload of up to 1.4 kg (4.4 N), more than sufficient for carrying an apple. This platform decouples linear and steering motions to simplify path planning and the controller design for targeting. We anticipate our platform being relatively simple to maintain compared to rigid robotic arms. Herein we also describe and experimentally verify the platform's kinematic model, including the prediction of the relationship between the steering angle and the angular positions of the three steering motors. Information from the model enables the position controller to guide the end effector to the targeted positions faster and with higher stability than without this information. Overall, our research shows promise for using soft-growing robotic platforms in orchard operations.

Submitted 31 October, 2023; originally announced November 2023.

Comments: International Conference on Intelligent Robots and Systems (IROS) 2023

arXiv:2310.04345 [pdf, other]

Neur2RO: Neural Two-Stage Robust Optimization

Authors: Justin Dumouchelle, Esther Julien, Jannis Kurtz, Elias B. Khalil

Abstract: Robust optimization provides a mathematical framework for modeling and solving decision-making problems under worst-case uncertainty. This work addresses two-stage robust optimization (2RO) problems (also called adjustable robust optimization), wherein first-stage and second-stage decisions are made before and after uncertainty is realized, respectively. This results in a nested min-max-min optimization problem which is extremely challenging computationally, especially when the decisions are discrete. We propose Neur2RO, an efficient machine learning-driven instantiation of column-and-constraint generation (CCG), a classical iterative algorithm for 2RO. Specifically, we learn to estimate the value function of the second-stage problem via a novel neural network architecture that is easy to optimize over by design. Embedding our neural network into CCG yields high-quality solutions quickly as evidenced by experiments on two 2RO benchmarks, knapsack and capital budgeting. For knapsack, Neur2RO finds solutions that are within roughly $2\%$ of the best-known values in a few seconds compared to the three hours of the state-of-the-art exact branch-and-price algorithm; for larger and more complex instances, Neur2RO finds even better solutions. For capital budgeting, Neur2RO outperforms three variants of the $k$-adaptability algorithm, particularly on the largest instances, with a 10 to 100-fold reduction in solution time. Our code and data are available at https://github.com/khalil-research/Neur2RO.

Submitted 15 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2301.11113 [pdf, other]

Finding Regions of Counterfactual Explanations via Robust Optimization

Authors: Donato Maragno, Jannis Kurtz, Tabea E. Röber, Rob Goedhart, Ş. Ilker Birbil, Dick den Hertog

Abstract: Counterfactual explanations play an important role in detecting bias and improving the explainability of data-driven classification models. A counterfactual explanation (CE) is a minimal perturbed data point for which the decision of the model changes. Most of the existing methods can only provide one CE, which may not be achievable for the user. In this work we derive an iterative method to calculate robust CEs, i.e. CEs that remain valid even after the features are slightly perturbed. To this end, our method provides a whole region of CEs allowing the user to choose a suitable recourse to obtain a desired outcome. We use algorithmic ideas from robust optimization and prove convergence results for the most common machine learning methods including logistic regression, decision trees, random forests, and neural networks. Our experiments show that our method can efficiently generate globally optimal robust CEs for a variety of common data sets and classification models.

Submitted 26 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

arXiv:2212.00744 [pdf, ps, other]

Improving astroBERT using Semantic Textual Similarity

Authors: Felix Grezes, Thomas Allen, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Pavlos Protopapas

Abstract: The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we: - announce the first public release of the astroBERT language model; - show how astroBERT improves over existing public language models on astrophysics specific tasks; - and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.

Submitted 29 November, 2022; originally announced December 2022.

arXiv:2203.16642 [pdf, other]

Data-driven Prediction of Relevant Scenarios for Robust Combinatorial Optimization

Authors: Marc Goerigk, Jannis Kurtz

Abstract: We study iterative methods for (two-stage) robust combinatorial optimization problems with discrete uncertainty. We propose a machine-learning-based heuristic to determine starting scenarios that provide strong lower bounds. To this end, we design dimension-independent features and train a Random Forest Classifier on small-dimensional instances. Experiments show that our method improves the solution process for larger instances than contained in the training set and also provides a feature importance-score which gives insights into the role of scenario properties.

Submitted 23 December, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

arXiv:2203.01606 [pdf, other]

Ensemble Methods for Robust Support Vector Machines using Integer Programming

Authors: Jannis Kurtz

Abstract: In this work we study binary classification problems where we assume that our training data is subject to uncertainty, i.e. the precise data points are not known. To tackle this issue in the field of robust machine learning the aim is to develop models which are robust against small perturbations in the training data. We study robust support vector machines (SVM) and extend the classical approach by an ensemble method which iteratively solves a non-robust SVM on different perturbations of the dataset, where the perturbations are derived by an adversarial problem. Afterwards for classification of an unknown data point we perform a majority vote of all calculated SVM solutions. We study three different variants for the adversarial problem, the exact problem, a relaxed variant and an efficient heuristic variant. While the exact and the relaxed variant can be modeled using integer programming formulations, the heuristic one can be implemented by an easy and efficient algorithm. All derived methods are tested on random and realistic datasets and the results indicate that the derived ensemble methods have a much more stable behaviour when changing the protection level compared to the classical robust SVM model.

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2202.00777 [pdf, ps, other]

Web accessibility trends and implementation in dynamic web applications

Authors: Timothy W. Hostetler, Shinyi Chen, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Carolyn S. Grant, Edwin Henneken, Donna M. Thompson, Roman Chyla, Golnaz Shapurian, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Stephen McDonald, Felix Grezes

Abstract: The NASA Astrophysics Data System (ADS), a critical research service for the astrophysics community, strives to provide the most accessible and inclusive environment for the discovery and exploration of the astronomical literature. Part of this goal involves creating a digital platform that can accommodate everybody, including those with disabilities that would benefit from alternative ways to present the information provided by the website. NASA ADS follows the official Web Content Accessibility Guidelines (WCAG) standard for ensuring accessibility of all its applications, striving to exceed this standard where possible. Through the use of both internal audits and external expert review based on these guidelines, we have identified many areas for improving accessibility in our current web application, and have implemented a number of updates to the UI as a result of this. We present an overview of some current web accessibility trends, discuss our experience incorporating these trends in our web application, and discuss the lessons learned and recommendations for future projects.

Submitted 1 February, 2022; originally announced February 2022.

Comments: Submitted to ADASS XXXI (2021)

arXiv:2112.00590 [pdf, ps, other]

Building astroBERT, a language model for Astronomy & Astrophysics

Authors: Felix Grezes, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Shinyi Chen, Chris Tanner, Pavlos Protopapas

Abstract: The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2110.11382 [pdf, other]

Efficient and Robust Mixed-Integer Optimization Methods for Training Binarized Deep Neural Networks

Authors: Jannis Kurtz, Bubacarr Bah

Abstract: Compared to classical deep neural networks, their binarized versions can be useful for applications on resource-limited devices due to their reduction in memory consumption and computational demands. In this work we study deep neural networks with binary activation functions and continuous or integer weights (BDNN). We show that the BDNN can be reformulated as a mixed-integer linear program with bounded weight space which can be solved to global optimality by classical mixed-integer programming solvers. Additionally, a local search heuristic is presented to calculate locally optimal networks. Furthermore, to improve efficiency we present an iterative data-splitting heuristic which iteratively splits the training set into smaller subsets by using the k-means method. Afterwards all data points in a given subset are forced to follow the same activation pattern, which leads to a much smaller number of integer variables in the mixed-integer programming formulation and therefore to computational improvements. Finally, for the first time a robust model is presented which enforces robustness of the BDNN during training. All methods are tested on random and real datasets and our results indicate that all models can often compete with or even outperform classical DNNs on small network architectures, confirming the viability for applications having restricted memory or computing power.

Submitted 25 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: added GitHub link for code. arXiv admin note: substantial text overlap with arXiv:2007.03326

arXiv:2106.03107 [pdf, other]

Approximation Algorithms for Min-max-min Robust Optimization and K-Adaptability under Objective Uncertainty

Authors: Jannis Kurtz

Abstract: In this work we investigate the min-max-min robust optimization problem and the k-adaptability robust optimization problem for binary problems with uncertain costs. The idea of the first approach is to calculate a set of k feasible solutions which are worst-case optimal if in each possible scenario the best of the k solutions is implemented. It is known that the min-max-min robust problem can be solved efficiently if k is at least the dimension of the problem, while it is theoretically and computationally hard if k is small. However, nothing is known about the intermediate case, i.e. k lies between one and the dimension of the problem. We approach this open question and present an approximation algorithm which achieves good problem-specific approximation guarantees for the cases where k is close to or where k is a fraction of the dimension. The derived bounds can be used to show that the min-max-min robust problem is solvable in oracle-polynomial time under certain conditions even if k is smaller than the dimension. We extend the previous results to the robust k-adaptability problem. As a consequence we can provide bounds on the number of necessary second-stage policies to approximate the exact two-stage robust problem. We derive an approximation algorithm for the k-adaptability problem which has similar guarantees as for the min-max-min problem. Finally, we test both algorithms on knapsack and shortest path problems and related two-stage variants.
The experiments show that both algorithms calculate solutions with relatively small optimality gap in seconds.

Submitted 15 August, 2023; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: This is a completely revised version of my previous preprint "New complexity results and algorithms for min-max-min robust combinatorial optimization". Some results were removed while several new results were added

arXiv:2012.03470 [pdf, ps, other]

doi 10.3847/1538-3881/abc06e

Center for Astrophysics Optical Infrared Science Archive. I. FAST Spectrograph

Authors: Jessica Mink, Warren R. Brown, Igor V. Chilingarian, Daniel Fabricant, Michael J. Kurtz, Sean Moran, Jaehyon Rhee, Susan Tokarz, William F. Wyatt

Abstract: We announce the public release of 141,531 moderate-dispersion optical spectra of 72,247 objects acquired over the past 25 years with the FAST Spectrograph on the Fred L. Whipple Observatory 1.5-meter Tillinghast telescope. We describe the data acquisition and processing so that scientists can understand the spectra. We highlight some of the largest FAST survey programs, and make recommendations for use. The spectra have been placed in a Virtual Observatory accessible archive and are ready for download.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 17 pages, 18 figures, 8 tables

arXiv:2011.09769 [pdf, other]

Data-Driven Robust Optimization using Unsupervised Deep Learning

Authors: Marc Goerigk, Jannis Kurtz

Abstract: Robust optimization has been established as a leading methodology to approach decision problems under uncertainty. To derive a robust optimization model, a central ingredient is to identify a suitable model for uncertainty, which is called the uncertainty set. An ongoing challenge in the recent literature is to derive uncertainty sets from given historical data that result in solutions that are robust regarding future scenarios. In this paper we use an unsupervised deep learning method to learn and extract hidden structures from data, leading to non-convex uncertainty sets and better robust solutions. We prove that most of the classical uncertainty classes are special cases of our derived sets and that optimizing over them is strongly NP-hard. Nevertheless, we show that the trained neural networks can be integrated into a robust optimization model by formulating the adversarial problem as a convex quadratic mixed-integer program. This allows us to derive robust solutions through an iterative scenario generation process. In our computational experiments, we compare this approach to a similar approach using kernel-based support vector clustering. We find that uncertainty sets derived by the unsupervised deep learning method find a better description of data and lead to robust solutions that outperform the comparison method both with respect to objective value and feasibility.

Submitted 9 September, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

arXiv:2010.01418 [pdf]

doi 10.3847/25c2cfeb.8d12c399

Second Order Operators in the NASA Astrophysics Data System

Authors: Michael J. Kurtz, Roman Chyla

Abstract: Second Order Operators (SOOs) are database functions which form secondary queries based on attributes of the objects returned in an initial query; they can provide powerful methods to investigate complex, multipartite information graphs. The NASA Astrophysics Data System (ADS) has implemented four SOOs (reviews, useful, trending, and similar), which use the citations, references, downloads, and abstract text. This tutorial describes these operators in detail, both alone and in conjunction with other functions. It is intended for scientists and others who wish to make fuller use of the ADS database. Basic knowledge of the ADS is assumed.

Submitted 3 October, 2020; originally announced October 2020.

Comments: ADS Bibcode:2020BAAS...52b0207K, author's version

Journal ref: Bulletin of the American Astronomical Society, Vol. 52, No. 2, id. 0207 2020

arXiv:2009.14323 [pdf]

doi 10.3847/25c2cfeb.704b260e

Enabling Synergy: Improving the Information Infrastructure for Planetary Science

Authors: Michael J. Kurtz, Alberto Accomazzi, Edwin A. Henneken

Abstract: In this whitepaper we advocate that the Planetary Science (PS) community build a discipline-specific digital library, in collaboration with the existing astronomy digital library, ADS. We suggest that the PS data archives increase their level of curation to allow for direct linking between the archival data and the derived journal articles. And we suggest that a new component of the PS information infrastructure be created to collate and curate information on features and objects in our solar system, beginning with the USGS/IAU Gazetteer of Planetary Nomenclature.

Submitted 29 September, 2020; originally announced September 2020.

Comments: 8 pages, submitted to the Planetary Science and Astrobiology Decadal Survey 2023-2032

arXiv:2009.05048 [pdf, ps, other]

Agile methodologies in teams with highly creative and autonomous members

Authors: Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Kris Bukovi

Abstract: The Agile manifesto encourages us to value individuals and interactions over processes and tools, while Scrum, the most adopted Agile development methodology, is essentially based on roles, events, artifacts, and the rules that bind them together (i.e., processes). Moreover, it is generally proclaimed that whenever a Scrum project does not succeed, the reason is that Scrum was not implemented correctly and not that Scrum may have its own flaws. This grants irrefutability to the methodology, discouraging deviations to fit the actual needs and peculiarities of the developers. In particular, the members of the NASA ADS team are highly creative and autonomous, and their motivation can be affected if their freedom is too strongly constrained. We present our experience following Agile principles, reusing certain Scrum elements and seeking the satisfaction of the team members, while rapidly reacting/keeping the project in line with our stakeholders' expectations.

Submitted 10 September, 2020; originally announced September 2020.

Comments: To appear in the proceedings of the 29th annual international Astronomical Data Analysis Software & Systems (ADASS XXIX)

arXiv:2007.03326 [pdf, other]

An Integer Programming Approach to Deep Neural Networks with Binary Activation Functions

Authors: Bubacarr Bah, Jannis Kurtz

Abstract: We study deep neural networks with binary activation functions (BDNN), i.e. the activation function only has two states. We show that the BDNN can be reformulated as a mixed-integer linear program which can be solved to global optimality by classical integer programming solvers. Additionally, a heuristic solution algorithm is presented and we study the model under data uncertainty, applying a two-stage robust optimization approach. We implemented our methods on random and real datasets and show that the heuristic version of the BDNN outperforms classical deep neural networks on the Breast Cancer Wisconsin dataset while performing worse on random data.

Submitted 7 August, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Journal ref: Workshop on Beyond first-order methods in ML systems at the 37th International Conference on Machine Learning, Vienna, Austria, 2020

arXiv:1910.12608 [pdf, other]

Min-Max-Min Robustness for Combinatorial Problems with Discrete Budgeted Uncertainty

Authors: Marc Goerigk, Jannis Kurtz, Michael Poss

Abstract: We consider robust combinatorial optimization problems with cost uncertainty where the decision maker can prepare K solutions beforehand and chooses the best of them once the true cost is revealed. Also known as min-max-min robustness (a special case of K-adaptability), it is a viable alternative to otherwise intractable two-stage problems. The uncertainty set assumed in this paper considers that in any scenario, at most Gamma of the components of the cost vectors will be higher than expected, which corresponds to the extreme points of the budgeted uncertainty set. While the classical min-max problem with budgeted uncertainty is essentially as easy as the underlying deterministic problem, it turns out that the min-max-min problem is NP-hard for many easy combinatorial optimization problems, and not approximable in general. We thus present an integer programming formulation for solving the problem through a row-and-column generation algorithm. While exact, this algorithm can only cope with small problems, so we present two additional heuristics leveraging the structure of budgeted uncertainty. We compare our row-and-column generation algorithm and our heuristics on knapsack and shortest path instances previously used in the scientific literature and find that the heuristics obtain good quality solutions in short computational times.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1905.05257 [pdf, other]

Oracle-Based Algorithms for Binary Two-Stage Robust Optimization

Authors: Nicolas Kämmerling, Jannis Kurtz

Abstract: In this work we study binary two-stage robust optimization problems with objective uncertainty. We present an algorithm to efficiently calculate lower bounds for the binary two-stage robust problem by alternately solving the underlying deterministic problem and an adversarial problem. For the deterministic problem any oracle can be used which returns an optimal solution for every possible scenario. We show that the latter lower bound can be implemented in a branch & bound procedure, where the branching is performed only over the first-stage decision variables. All results even hold for non-linear objective functions which are concave in the uncertain parameters. As an alternative solution method we apply a column-and-constraint generation algorithm to the binary two-stage robust problem with objective uncertainty. We test both algorithms on benchmark instances of the uncapacitated single-allocation hub-location problem and of the capital budgeting problem. Our results show that the branch & bound procedure outperforms the column-and-constraint generation algorithm.

Submitted 16 January, 2020; v1 submitted 13 May, 2019; originally announced May 2019.

arXiv:1904.01542 [pdf, other]

Discrete Optimization Methods for Group Model Selection in Compressed Sensing

Authors: Bubacarr Bah, Jannis Kurtz, Oliver Schaudt

Abstract: In this article we study the problem of signal recovery for group models. More precisely, for a given set of groups, each containing a small subset of indices, and for given linear sketches of the true signal vector, which is known to be group-sparse in the sense that its support is contained in the union of a small number of these groups, we study algorithms which successfully recover the true signal just by the knowledge of its linear sketches. We derive model projection complexity results and algorithms for more general group models than the state-of-the-art. We consider two versions of the classical Iterative Hard Thresholding algorithm (IHT). The classical version iteratively calculates the exact projection of a vector onto the group model, while the approximate version (AM-IHT) uses a head- and a tail-approximation iteratively. We apply both variants to group models and analyse the two cases where the sensing matrix is a Gaussian matrix and a model expander matrix. To solve the exact projection problem on the group model, which is known to be equivalent to the maximum weight coverage problem, we use discrete optimization methods based on dynamic programming and Benders' Decomposition. The head- and tail-approximations are derived by a classical greedy-method and LP-rounding, respectively.

Submitted 27 February, 2020; v1 submitted 2 April, 2019; originally announced April 2019.

arXiv:1903.00297 [pdf]

From Dark Energy to Exolife: Improving the Digital Information Infrastructure for Astrophysics

Authors: Michael J. Kurtz, Alberto Accomazzi

Abstract: Some of the most exciting and promising areas of Astronomy research today are found at the boundaries of the discipline: the search for Exoplanets and Multi-Messenger Astronomy. In order to achieve breakthroughs in these research fields over the next decade, innovation and expansion of the digital information infrastructure which supports this research is required. Astronomy has been well-served by the existence of an open, distributed network of data centers and archives. However, institutional barriers and differing research cultures have prevented cross-disciplinary collaborations, creating fragmented knowledge and stove-piped research activities. This must change in order for the broader community of scientists to work together and solve our most ambitious decadal challenges. Interdisciplinary inquiry is best supported by bringing researchers together at the information discovery level. In order to cross the traditional disciplinary silos we must allow scientists both to explore new ideas and to gain access to new data and knowledge. This is best enabled by providing discovery platforms which allow them to explore and connect different research threads in the literature, identify communities of experts, access and analyze the related published datasets, measurements and catalogs.

Submitted 1 March, 2019; originally announced March 2019.

Comments: 6 pages, whitepaper submitted to Astro2020, the Astronomy and Astrophysics Decadal Survey

arXiv:1901.05463 [pdf, ps, other]

Fundamentals of effective cloud management for the new NASA Astrophysics Data System

Authors: Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Kris Bukovi, Nathan Rapport

Abstract: The new NASA Astrophysics Data System (ADS) is designed with a service-oriented architecture (SOA) that consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker, and deployed in Amazon Web Services (AWS). For complex systems, like the ADS, this loosely coupled architecture can lead to a more scalable, reliable and resilient system if some fundamental questions are addressed. After having experimented with different AWS environments and deployment methods, we decided in December 2017 to go with Kubernetes as our container orchestration. Defining the best strategy to properly set up Kubernetes has proven to be challenging: automatic scaling services and load balancing traffic can lead to errors whose origin is difficult to identify, monitoring and logging the activity that happens across multiple layers for a single request needs to be carefully addressed, and the best workflow for a Continuous Integration and Delivery (CI/CD) system is not self-evident. We present here how we tackle these challenges and our plans for the future.

Submitted 16 January, 2019; originally announced January 2019.

Comments: To appear in the proceedings of the 28th annual international Astronomical Data Analysis Software & Systems (ADASS XXVIII)

arXiv:1803.03598 [pdf]

Merging the Astrophysics and Planetary Science Information Systems

Authors: Michael J. Kurtz, Alberto Accomazzi, Edwin A. Henneken

Abstract: Conceptually, exoplanet research has one foot in the discipline of Astrophysics and the other foot in Planetary Science. Research strategies for exoplanets will require efficient access to data and information from both realms. Astrophysics has a sophisticated, well integrated, distributed information system with archives and data centers which are interlinked with the technical literature via the Astrophysics Data System (ADS). The information system for Planetary Science does not have a central component linking the literature with the observational and theoretical data. Here we propose that the Committee on an Exoplanet Science Strategy recommend that this linkage be built, with the ADS playing the role in Planetary Science which it already plays in Astrophysics. This will require additional resources for the ADS, and the Planetary Data System (PDS), as well as other international collaborators.

Submitted 9 March, 2018; originally announced March 2018.

Comments: Whitepaper submitted to the Committee on an Exoplanet Science Strategy

arXiv:1802.05072 [pdf, ps, other]

Faster Algorithms for Min-max-min Robustness for Combinatorial Problems with Budgeted Uncertainty

Authors: André Chassein, Marc Goerigk, Jannis Kurtz, Michael Poss

Abstract: We consider robust combinatorial optimization problems where the decision maker can react to a scenario by choosing from a finite set of $k$ solutions. This approach is appropriate for decision problems under uncertainty where the implementation of decisions requires preparing the ground. We focus on the case that the set of possible scenarios is described through a budgeted uncertainty set and provide three algorithms for the problem. The first algorithm solves heuristically the dualized problem, a non-convex mixed-integer non-linear program (MINLP), via an alternating optimization approach. The second algorithm solves the MINLP exactly for $k=2$ through a dedicated spatial branch-and-bound algorithm. The third approach enumerates $k$-tuples, relying on strong bounds to avoid a complete enumeration. We test our methods on shortest path instances that were used in the previous literature and on randomly generated knapsack instances, and find that our methods considerably outperform previous approaches. Many instances that were previously not solved within hours can now be solved within few minutes, often even faster.

Submitted 27 March, 2019; v1 submitted 14 February, 2018; originally announced February 2018.

arXiv:1801.00815 [pdf]

doi 10.1007/978-0-585-33110-2_3

Advice from the Oracle: Really Intelligent Information Retrieval

Authors: Michael J. Kurtz

Abstract: What is "intelligent" information retrieval? Essentially this is asking what is intelligence; in this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. 2018 Keywords: Personal Digital Assistant; Supervised Topic Models

Submitted 2 January, 2018; originally announced January 2018.

Comments: Author copy; published 25 years ago at the beginning of the Astrophysics Data System; 2018 keywords added

Journal ref: In: Heck A., Murtagh F. (eds) Intelligent Information Retrieval: The Case of Astronomy and Related Space Sciences. Astrophysics and Space Science Library, vol 182. Springer, Dordrecht (1993)

arXiv:1712.06704 [pdf, ps, other]

Multilingual Topic Models

Authors: Kriste Krstovski, Michael J. Kurtz, David A. Smith, Alberto Accomazzi

Abstract: Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document representation schemes possess different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview on latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse representations into a shared topic space allows us to evaluate representations based on how topically similar they are to the original article. In addition, our proposed approach provides means to discover where different concept vocabularies require improvement.

Submitted 18 December, 2017; originally announced December 2017.

Comments: 18 pages, 9 figures

arXiv:1711.06775 [pdf, other]

Grain boundary phases in bcc metals

Authors: T. Frolov, W. Setyawan, R. J. Kurtz, J. Marian, A. R. Oganov, R. E. Rudd, Q. Zhu

Abstract: We report a computational discovery of novel grain boundary structures and multiple grain boundary phases in elemental bcc tungsten. While grain boundary structures created by the γ-surface method as a union of two perfect half crystals have been studied extensively, it is known that the method has limitations and does not always predict the correct ground states. Here, we use a newly developed computational tool, based on evolutionary algorithms, to perform a grand-canonical search of a high-angle symmetric tilt boundary in tungsten, and we find new ground states and multiple phases that cannot be described using the conventional structural unit model. We use MD simulations to demonstrate that the new structures can coexist at finite temperature in a closed system, confirming these are examples of different GB phases. The new ground state is confirmed by first-principles calculations.

Submitted 17 November, 2017; originally announced November 2017.

Comments: 19 pages, 6 figures

arXiv:1710.08505 [pdf, ps, other]

doi 10.1051/epjconf/201818608001

New ADS Functionality for the Curator

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Steven McDonald, Taylor J. Shaulis, Sergi Blanco-Cuaresma, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton

Abstract: In this paper we provide an update concerning the operations of the NASA Astrophysics Data System (ADS), its services and user interface, and the content currently indexed in its database. As the primary information system used by researchers in Astronomy, the ADS aims to provide a comprehensive index of all scholarly resources appearing in the literature. With the current effort in our community to support data and software citations, we discuss what steps the ADS is taking to provide the needed infrastructure in collaboration with publishers and data providers. A new API provides access to the ADS search interface, metrics, and libraries allowing users to programmatically automate discovery and curation tasks. The new ADS interface supports a greater integration of content and services with a variety of partners, including ORCID claiming, indexing of SIMBAD objects, and article graphics from a variety of publishers. Finally, we highlight how librarians can facilitate the ingest of gray literature that they curate into our system.

Submitted 23 October, 2017; originally announced October 2017.

Comments: Submitted to the Proceedings of Library and Information Services in Astronomy VIII, Strasbourg, France

arXiv:1707.09955 [pdf]

doi 10.1051/epjconf/201818606004

Comparing People with Bibliometrics

Authors: Michael J. Kurtz

Abstract: Bibliometric indicators, citation counts and/or download counts are increasingly being used to inform personnel decisions such as hiring or promotions. These statistics are very often misused. Here we provide a guide to the factors which should be considered when using these so-called quantitative measures to evaluate people. Rules of thumb are given for when to begin using bibliometric measures when comparing otherwise similar candidates.

Submitted 31 July, 2017; originally announced July 2017.

Comments: to appear in Proceedings of Library and Information Science in Astronomy VIII (LISA-8)

arXiv:1706.02153 [pdf]

doi 10.1007/978-3-030-02511-3_32

Usage Bibliometrics as a Tool to Measure Research Activity

Authors: Edwin A. Henneken, Michael J. Kurtz

Abstract: Measures for research activity and impact have become an integral ingredient in the assessment of a wide range of entities (individual researchers, organizations, instruments, regions, disciplines). Traditional bibliometric indicators, like publication and citation based indicators, provide an essential part of this picture, but cannot describe the complete picture. Since reading scholarly publications is an essential part of the research life cycle, it is only natural to introduce measures for this activity in attempts to quantify the efficiency, productivity and impact of an entity. Citations and reads are significantly different signals, so taken together, they provide a more complete picture of research activity. Most scholarly publications are now accessed online, making the study of reads and their patterns possible. Click-stream logs allow us to follow information access by the entire research community, real-time. Publication and citation datasets just reflect activity by authors. In addition, download statistics will help us identify publications with significant impact, but which do not attract many citations. Click-stream signals are arguably more complex than, say, citation signals. For one, they are a superposition of different classes of readers. Systematic downloads by crawlers also contaminate the signal, as does browsing behavior. We discuss the complexities associated with clickstream data and how, with proper filtering, statistically significant relations and conclusions can be inferred from download statistics.
We describe how download statistics can be used to describe research activity at different levels of aggregation, ranging from organizations to countries. These statistics show a correlation with socio-economic indicators. A comparison will be made with traditional bibliometric indicators. We will argue that astronomy is representative of more general trends.

Submitted 7 June, 2017; originally announced June 2017.

Comments: 25 pages, 11 figures, accepted for publication in Handbook of Quantitative Science and Technology Research, Springer

arXiv:1606.01308 [pdf]

Object Kinetic Monte Carlo Simulations of Radiation Damage in Bulk Tungsten Part-II: With a PKA Spectrum Corresponding to 14-MeV Neutrons

Authors: Giridhar Nandipati, Wahyu Setyawan, Howard L. Heinisch, Kenneth J. Roche, Richard J. Kurtz, Brian D. Wirth

Abstract: Object kinetic Monte Carlo was employed to study the effect of dose rate on the evolution of vacancy microstructure in polycrystalline tungsten under neutron bombardment. The evolution was followed up to 1.0 displacement per atom (dpa) with point defects generated in accordance with a primary knock-on atom (PKA) spectrum corresponding to 14-MeV neutrons. The present study includes the effect of grain size (2.0 and 4.0 $μ$m) but excludes the impact of transmutation or pre-existing defects beyond grain boundary sinks. Vacancy cluster density increases with dose rate, while the density of vacancies decreases. Consequently, the average vacancy cluster size and the fraction of vacancies in visible clusters decrease with increasing dose rate. The density of vacancies and vacancy clusters decrease with grain size such that the average size of the clusters remains similar. However, the average size is larger for larger grains at dose rates < $4.5 \times 10^{-7}$ dpa/s. The trend of vacancy accumulation as a function of dose, dose rate, and grain size is similar to that obtained with the High Flux Isotope Reactor (HFIR) PKA spectrum. However, the amount of vacancy accumulation and the vacancy microstructure are quite different. Compared to the HFIR case, we find that even though the dose rates are 2.5 times higher, the density of vacancies and the average vacancy cluster sizes are lower. In addition, a void lattice forms only for the lowest two dose rates ($4.5 \times 10^{-8}$ and $4.5 \times 10^{-9}$ dpa/s).
In contrast, a void lattice formed at all dose rates studied using the HFIR PKA spectrum. We discuss in detail the factors that lead to these different microstructures.

Submitted 3 June, 2016; originally announced June 2016.

arXiv:1603.06885 [pdf, ps, other]

doi 10.3847/0067-0049/224/1/11

SHELS: Complete Redshift Surveys of Two Widely Separated Fields

Authors: Margaret J. Geller, Ho Seong Hwang, Ian P. Dell'Antonio, Harus Jabran Zahid, Michael J. Kurtz, Daniel G. Fabricant

Abstract: The SHELS (Smithsonian Hectospec Lensing Survey) is a complete redshift survey covering two well-separated fields (F1 and F2) of the Deep Lens Survey. Both fields are more than 94% complete to a Galactic extinction corrected R0 = 20.2. Here we describe the redshift survey of the F1 field centered at R.A. = 00h53m25.3s and Decl = 12d33m55s; like F2, the F1 field covers 4 sq deg. The redshift survey of the F1 field includes 9426 new galaxy redshifts measured with Hectospec on the MMT (published here). As a guide to future uses of the combined survey we compare the mass metallicity relation and the distributions of D4000 as a function of stellar mass and redshift for the two fields. The mass-metallicity relations differ by an insignificant 1.6 sigma. For galaxies in the stellar mass range 1.e10 to 1.e11 MSun, the increase in the star-forming fraction with redshift is remarkably similar in the two fields. The seemingly surprising 31-38% difference in the overall galaxy counts in F1 and F2 is probably consistent with the expected cosmic variance given the subtleties of the relative systematics in the two surveys. We also review the Deep Lens Survey cluster detections in the two fields: poorer photometric data for F1 precluded secure detection of the single massive cluster at z = 0.35 that we find in SHELS. Taken together the two fields include 16,055 redshifts for galaxies with R0 <= 20.2 and 20,754 redshifts for galaxies with R <= 20.6. These dense surveys in two well-separated fields provide a basis for future investigations of galaxy properties and large-scale structure.

Submitted 22 March, 2016; originally announced March 2016.

Comments: 24 pages, 6 tables, 13 figures; ApJS, accepted; full data tables available in journal upon publication

arXiv:1602.06343 [pdf, ps, other]

doi 10.3847/0004-637X/818/2/173

HectoMAP and Horizon Run 4: Dense Structures and Voids in the Real and Simulated Universe

Authors: Ho Seong Hwang, Margaret J. Geller, Changbom Park, Daniel G. Fabricant, Michael J. Kurtz, Kenneth J. Rines, Juhan Kim, Antonaldo Diaferio, H. Jabran Zahid, Perry Berlind, Michael Calkins, Susan Tokarz, Sean Moran

Abstract: HectoMAP is a dense redshift survey of red galaxies covering a 53 $deg^{2}$ strip of the northern sky. HectoMAP is 97\% complete for galaxies with $r<20.5$, $(g-r)>1.0$, and $(r-i)>0.5$. The survey enables tests of the physical properties of large-scale structure at intermediate redshift against cosmological models. We use the Horizon Run 4, one of the densest and largest cosmological simulations based on the standard $Λ$ Cold Dark Matter ($Λ$CDM) model, to compare the physical properties of observed large-scale structures with simulated ones in a volume-limited sample covering 8$\times10^6$ $h^{-3}$ Mpc$^3$ in the redshift range $0.22<z<0.44$. We apply the same criteria to the observations and simulations to identify over- and under-dense large-scale features of the galaxy distribution. The richness and size distributions of observed over-dense structures agree well with the simulated ones. Observations and simulations also agree for the volume and size distributions of under-dense structures, voids. The properties of the largest over-dense structure and the largest void in HectoMAP are well within the distributions for the largest structures drawn from 300 Horizon Run 4 mock surveys. Overall the size, richness and volume distributions of observed large-scale structures in the redshift range $0.22<z<0.44$ are remarkably consistent with predictions of the standard $Λ$CDM model.

Submitted 19 February, 2016; originally announced February 2016.

Comments: 20 pages, 16 figures, 1 table. Published in ApJ (818:106, 2016). Paper with high resolution figures is available at https://astro.kias.re.kr/~hshwang/Hwang_etal16_LSS_HectoMAP_HorizonRun4_high.pdf

Journal ref: 2016, ApJ, 818, 173

arXiv:1601.07858 [pdf, ps, other]

Aggregation and Linking of Observational Metadata in the ADS

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Alexandra Holachek, Jonathan Elliott

Abstract: We discuss current efforts behind the curation of observing proposals, archive bibliographies, and data links in the NASA Astrophysics Data System (ADS). The primary data in the ADS is the bibliographic content from scholarly articles in Astronomy and Physics, which ADS aggregates from publishers, arXiv and conference proceeding sites. This core bibliographic information is then further enriched by ADS via the generation of citations and usage data, and through the aggregation of external resources from astronomy data archives and libraries. Important sources of such additional information are the metadata describing observing proposals and high level data products, which, once ingested in ADS, become easily discoverable and citeable by the science community. Bibliographic studies have shown that the integration of links between data archives and the ADS provides greater visibility to data products and increased citations to the literature associated with them.

Submitted 28 January, 2016; originally announced January 2016.

Comments: 4 pages, Proceedings of the ADASS XXV conference

arXiv:1601.01611 [pdf, other]

Automatic Construction of Evaluation Sets and Evaluation of Document Similarity Models in Large Scholarly Retrieval Systems

Authors: Kriste Krstovski, David A. Smith, Michael J. Kurtz

Abstract: Retrieval systems for scholarly literature offer the ability for the scientific community to search, explore and download scholarly articles across various scientific disciplines. Mostly used by the experts in the particular field, these systems contain user community logs including information on user specific downloaded articles. In this paper we present a novel approach for automatically evaluating document similarity models in large collections of scholarly publications. Unlike typical evaluation settings that use test collections consisting of query documents and human annotated relevance judgments, we use download logs to automatically generate a pseudo-relevant set of similar document pairs. More specifically we show that consecutively downloaded document pairs, extracted from a scholarly information retrieval (IR) system, could be utilized as a test collection for evaluating document similarity models. Another novel aspect of our approach lies in the method that we employ for evaluating the performance of the model by comparing the distribution of consecutively downloaded document pairs and random document pairs in log space. Across two families of similarity models, that represent documents in the term vector and topic spaces, we show that our evaluation approach achieves very high correlation with traditional performance metrics such as Mean Average Precision (MAP), while being more efficient to compute.

Submitted 7 January, 2016; originally announced January 2016.

arXiv:1510.09099 [pdf]

doi 10.1002/asi.23689

Measuring Metrics - A forty year longitudinal cross-validation of citations, downloads, and peer review in Astrophysics

Authors: Michael J. Kurtz, Edwin A. Henneken

Abstract: Citation measures, and newer altmetric measures such as downloads, are now commonly used to inform personnel decisions. How well do or can these measures measure or predict the past, current or future scholarly performance of an individual? Using data from the Smithsonian/NASA Astrophysics Data System we analyze the publication, citation, download, and distinction histories of a cohort of 922 individuals who received a U.S. PhD in astronomy in the period 1972-1976. By examining the same and different measures at the same and different times for the same individuals we are able to show the capabilities and limitations of each measure. Because the distributions are lognormal, measurement uncertainties are multiplicative; we show that in order to state with 95% confidence that one person's citations and/or downloads are significantly higher than another person's, the log difference in the ratio of counts must be at least 0.3 dex, which corresponds to a multiplicative factor of two.

Submitted 30 October, 2015; originally announced October 2015.

Comments: Author's version of manuscript accepted for publication in the Journal of the Association for Information Science and Technology (JASIST); 35 pages 16 figures

arXiv:1510.02732 [pdf]

Object Kinetic Monte Carlo Simulations of Radiation Damage in Neutron-Irradiated Tungsten Part-I: Neutron Flux with a PKA Spectrum Corresponding to the High-flux Isotope Reactor

Authors: Giridhar Nandipati, Wahyu Setyawan, Howard L. Heinisch, Kenneth J. Roche, Richard J. Kurtz, Brian D. Wirth

Abstract: Object kinetic Monte Carlo simulations were performed to study the impact of varying dose rate and grain size up to a dose of 1.0 dpa in pure, polycrystalline tungsten, subjected to a neutron irradiation having a PKA spectrum corresponding to the High Flux Isotope Reactor. The present study models defect cluster accumulation in tungsten, but does not consider the impact of transmutation or pre-existing defects beyond the grain boundary sinks, with varying grain size. With increasing dose rate, the vacancy cluster density increases, while the number density of vacancies decreases. Accordingly, the average vacancy cluster size and the fraction of vacancies that are part of visible clusters decrease with increasing dose rate. With increasing grain size, both the number densities of vacancies and vacancy clusters decrease, while both the fraction of vacancies in visible clusters and the average vacancy cluster size increase. This is caused by the pseudo-ripening of the vacancy clusters due to the longer-lived self-interstitial clusters in larger grains. The spatial ordering of vacancy clusters along {110} planes was observed for both grain sizes and all dose rates studied. Interplanar spacing increases with grain size; however, no clear dependence on dose or dose rate was observed. The results of this study show that 1D diffusion of self-interstitial clusters, while necessary, is not sufficient to form a void lattice, and that the diffusion of vacancies is also required. A methodology is suggested for choosing the simulation box dimensions so as to represent more faithfully the effects of one-dimensional migrating self-interstitial-atom clusters.

Submitted 25 March, 2016; v1 submitted 9 October, 2015; originally announced October 2015.

arXiv:1503.05881 [pdf, other]

ADS 2.0: new architecture, API and services

Authors: Roman Chyla, Alberto Accomazzi, Alexandra Holachek, Carolyn S. Grant, Jonathan Elliott, Edwin A. Henneken, Donna M. Thompson, Michael J. Kurtz, Stephen S. Murray, Vladimir Sudilovsky

Abstract: The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS's search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using RESTful web services. Taking one step further, we will discuss how we plan to expose the treasure trove of information hosted by ADS (10 million records and fulltext for much of the Astronomy and Physics refereed literature) to partners interested in using this API. This will provide you (and your intelligent applications) with access to ADS's underlying data to enable the extraction of new knowledge and the ingestion of these results back into the ADS. Using this framework, researchers could run controlled experiments with content extraction, machine learning, natural language processing, etc. In this talk, we will discuss what is already implemented, what will be available soon, and where we are going next.

Submitted 19 March, 2015; originally announced March 2015.

Comments: ADASS Conference 2014

arXiv:1503.04194 [pdf, other]

ADS: The Next Generation Search Platform

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Roman Chyla, James Luker, Carolyn S. Grant, Donna M. Thompson, Alexandra Holachek, Rahul Dave, Stephen S. Murray

Abstract: Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. Starting in 2011, the ADS started to systematically collect, parse and index full-text documents for all the major publications in Physics and Astronomy as well as many smaller Astronomy journals and arXiv e-prints, for a total of over 3.5 million papers. Our citation coverage has doubled since 2010 and now consists of over 70 million citations. We are normalizing the affiliation information in our records and, in collaboration with the CfA library and NASA, we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform which affect all aspects of the system and its operations. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language which supports fielded, unfielded and functional searches. We are currently able to index acknowledgments, affiliations, citations, funding sources, and to the extent that these metadata are available to us they are now searchable under our new platform. The ADS private library system is being enhanced to support reading groups, collaborative editing of lists of papers, tagging, and a variety of privacy settings when managing one's paper collection. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/

Submitted 13 March, 2015; originally announced March 2015.

Comments: Submitted to Library and Information Services in Astronomy VII, Naples, Italy

arXiv:1412.7452 [pdf, other]

doi 10.1088/0953-8984/27/22/225402

Cascade morphology transition in bcc metals

Authors: Wahyu Setyawan, Aaron P. Selby, Niklas Juslin, Roger E. Stoller, Brian D. Wirth, Richard J. Kurtz

Abstract: Energetic atom collisions in solids induce shockwaves with complex morphologies. In this paper, we establish the existence of a morphological transition in such cascades. The order parameter of the morphology is defined as the exponent, $b$, in the defect production curve as a function of cascade energy ($N_F \sim E_{MD}^b$). Response of different bcc metals can be compared in a consistent energy domain when the energy is normalized by the transition energy, $μ$, between the high- and the low-energy regime. Using Cr, Fe, Mo and W data, an empirical formula of $μ$ as a function of displacement threshold energy, $E_d$, is presented for bcc metals.

Submitted 23 December, 2014; originally announced December 2014.

Comments: 7 pages, 6 figures

arXiv:1406.7424 [pdf, other]

doi 10.1016/j.jmp.2015.01.001

Complexity Measures and Concept Learning

Authors: Andreas D. Pape, Kenneth J. Kurtz, Hiroki Sayama

Abstract: The nature of concept learning is a core question in cognitive science. Theories must account for the relative difficulty of acquiring different concepts by supervised learners. For a canonical set of six category types, two distinct orderings of classification difficulty have been found. One ordering, which we call paradigm-specific, occurs when adult human learners classify objects with easily distinguishable characteristics such as size, shape, and shading. The general order occurs in all other known cases: when adult humans classify objects with characteristics that are not readily distinguished (e.g., brightness, saturation, hue); for children and monkeys; and when categorization difficulty is extrapolated from errors in identification learning. The paradigm-specific order was found to be predictable mathematically by measuring the logical complexity of tasks, i.e., how concisely the solution can be represented by logical rules. However, logical complexity explains only the paradigm-specific order but not the general order. Here we propose a new difficulty measurement, information complexity, that calculates the amount of uncertainty remaining when a subset of the dimensions are specified. This measurement is based on Shannon entropy. We show that, when the metric extracts minimal uncertainties, this new measurement predicts the paradigm-specific order for the canonical six category types, and when the metric extracts average uncertainties, this new measurement predicts the general order. Moreover, for learning category types beyond the canonical six, we find that the minimal-uncertainty formulation correctly predicts the paradigm-specific order as well as or better than existing metrics (Boolean complexity and GIST) in most cases.

Submitted 23 January, 2015; v1 submitted 28 June, 2014; originally announced June 2014.

Comments: 27 pages, 7 tables, 1 figure. Accepted for publication in Journal of Mathematical Psychology, in press

Journal ref: Journal of Mathematical Psychology, vol. 64-65, pp. 66-75, 2015

arXiv:1406.4542 [pdf, ps, other]

Computing and Using Metrics in the ADS

Authors: Edwin A. Henneken, Alberto Accomazzi, Michael J. Kurtz, Carolyn S. Grant, Donna Thompson, Jay Luker, Roman Chyla, Alexandra Holachek, Stephen S. Murray

Abstract: Finding measures for research impact, be it for individuals, institutions, instruments or projects, has gained a lot of popularity. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication metrics have even been used in at least one recent court case. To support this need for research impact indicators, the SAO/NASA Astrophysics Data System (ADS) has developed a service which provides a broad overview of various impact measures. In this presentation we discuss how the ADS can be used to quench the thirst for impact measures. We will also discuss a couple of the lesser known indicators in the metrics overview and the main issues to be aware of when compiling publication-based metrics in the ADS, namely author name ambiguity and citation incompleteness.

Submitted 17 June, 2014; originally announced June 2014.

Comments: to appear in proceedings of LISA VII conference, Naples, Italy

arXiv:1405.7704 [pdf, ps, other]

doi 10.1088/0067-0049/213/2/35

SHELS: A Complete Galaxy Redshift Survey with R$\leq$20.6

Authors: Margaret J. Geller, Ho Seong Hwang, Daniel G. Fabricant, Michael J. Kurtz, Ian P. Dell'Antonio, Harus Jabran Zahid

Abstract: The SHELS (Smithsonian Hectospec Lensing Survey) is a complete redshift survey covering two well-separated fields (F1 and F2) of the Deep Lens Survey to a limiting R = 20.6. Here we describe the redshift survey of the F2 field (R.A.$_{2000}$ = 09$^h$19$^m$32.4$^s$ and Decl.$_{2000}$ = +30$^{\circ}$00$^{\prime}$00$^{\prime\prime}$). The survey includes 16,294 new redshifts measured with the Hectospec on the MMT. The resulting survey of the 4 deg$^2$ F2 field is 95\% complete to R = 20.6, currently the densest survey to this magnitude limit. The median survey redshift is $z = 0.3$; the survey provides a view of structure in the range $0.1 \lesssim z \lesssim 0.6$. A movie displays the large-scale structure in the survey region. We provide a redshift, spectral index D$_n$4000, and stellar mass for each galaxy in the survey. We also provide a metallicity for each galaxy in the range $0.2 < z < 0.38$. To demonstrate potential applications of the survey, we examine the behavior of the index D$_n$4000 as a function of galaxy luminosity, stellar mass, and redshift. The known evolutionary and stellar mass dependent properties of the galaxy population are cleanly evident in the data. We also show that the mass-metallicity relation previously determined from these data is robust to the analysis approach.

Submitted 29 May, 2014; originally announced May 2014.

Comments: 45 pages, 16 figures, 7 tables. Data will be available only when the paper is published in Astrophysical Journal Supplements (now submitted). Movie and full resolution figures are available at https://www.cfa.harvard.edu/~mjg/f6movie.mp4 and https://www.cfa.harvard.edu/~mjg/SHELS.pdf

arXiv:1404.6490 [pdf, other]

doi 10.1016/j.jnucmat.2014.12.056

Displacement cascades and defects annealing in tungsten, Part I: defect database from molecular dynamics simulations

Authors: Wahyu Setyawan, Giridhar Nandipati, Kenneth J. Roche, Howard L. Heinisch, Brian D. Wirth, Richard J. Kurtz

Abstract: Molecular dynamics simulations have been used to generate a comprehensive database of surviving defects due to displacement cascades in bulk tungsten. Twenty-one data points of primary knock-on atom (PKA) energies ranging from 100 eV (sub-threshold energy) to 100 keV ($\sim$780$\times E_d$, where $E_d$ is the average displacement threshold energy) have been completed at 300 K, 1025 K and 2050 K. Within this range of PKA energies, two regimes of power-law energy-dependence of the defect production are observed. A distinct power-law exponent characterizes the number of Frenkel pairs produced within each regime. The two regimes intersect at a transition energy which occurs at approximately 250$\times E_d$. The transition energy also marks the onset of the formation of large self-interstitial atom (SIA) clusters (size 14 or more). The observed defect clustering behavior is asymmetric, with SIA clustering increasing with temperature, while the vacancy clustering decreases. This asymmetry increases with temperature such that at 2050 K ($\sim 0.5 T_m$) practically no large vacancy clusters are formed, while large SIA clusters appear in all simulations. The implication of such asymmetry on the long-term defect survival and damage accumulation is discussed. In addition, rare $<$100$>${110} SIA loops are observed.

Submitted 25 April, 2014; originally announced April 2014.

Comments: 10 pages, 6 figures

arXiv:1404.5247 [pdf, other]

doi 10.1016/j.jnucmat.2014.09.067

Displacement cascades and defects annealing in tungsten, Part II: Object kinetic Monte Carlo Simulation of Tungsten Cascade Aging

Authors: Giridhar Nandipati, Wahyu Setyawan, Howard L. Heinisch, Kenneth J. Roche, Richard J. Kurtz, Brian D. Wirth

Abstract: We describe the results of object kinetic Monte Carlo (OKMC) simulations of the annealing of primary cascade damage in bulk tungsten using a comprehensive database of cascades obtained from molecular dynamics [1] as a function of primary knock-on atom (PKA) energy and direction, and temperatures of 300, 1025 and 2050 K. An increase in SIA clustering but decrease in vacancy clustering with temperature, combined with disparate mobilities of SIAs versus vacancies, causes an interesting temperature effect on cascade annealing, which is quite different from what one would expect. The annealing efficiency (ratio of number of defects after and before annealing) exhibits an inverse U-shape curve as a function of temperature. In addition, we will also describe the capabilities of our newly developed OKMC code, KSOME (kinetic simulations of microstructure evolution), used to carry out these simulations.

Submitted 21 April, 2014; originally announced April 2014.

arXiv:1401.1440 [pdf, ps, other]

doi 10.1088/0004-637X/783/1/52

A Redshift Survey of the Strong Lensing Cluster Abell 383

Authors: Margaret J. Geller, Ho Seong Hwang, Antonaldo Diaferio, Michael J. Kurtz, Dan Coe, Kenneth J. Rines

Abstract: Abell 383 is a famous rich cluster (z = 0.1887) imaged extensively as a basis for intensive strong and weak lensing studies. Nonetheless there are few spectroscopic observations. We enable dynamical analyses by measuring 2360 new redshifts for galaxies with r$_{petro} \leq 20.5$ and within 50$^\prime$ of the BCG (Brightest Cluster Galaxy: R.A.$_{2000} = 42.014125^\circ$, Decl$_{2000} = -03.529228^\circ$). We apply the caustic technique to identify 275 cluster members within 7$h^{-1}$ Mpc of the hierarchical cluster center. The BCG lies within $-11 \pm 110$ km s$^{-1}$ and 21 $\pm 56 h^{-1}$ kpc of the hierarchical cluster center; the velocity dispersion profile of the BCG appears to be an extension of the velocity dispersion profile based on cluster members. The distribution of cluster members on the sky corresponds impressively with the weak lensing contours of Okabe et al. (2010) especially when the impact of foreground and background structure is included. The values of R$_{200}$ = $1.22\pm 0.01 h^{-1}$ Mpc and M$_{200}$ = $(5.07 \pm 0.09)\times 10^{14} h^{-1}$ M$_\odot$ obtained by application of the caustic technique agree well with recent completely independent lensing measures. The caustic estimate extends direct measurement of the cluster mass profile to a radius of $\sim 5 h^{-1}$ Mpc.

Submitted 7 January, 2014; originally announced January 2014.

Comments: 29 pages, 9 figures, ApJ accepted

Showing 1–50 of 127 results for author: Kurtz, J