Search | arXiv e-print repository

Experimenting with Large Language Models and vector embeddings in NASA SciX

Authors: Sergi Blanco-Cuaresma, Ioana Ciucă, Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Kelly E. Lockhart, Felix Grezes, Thomas Allen, Golnaz Shapurian, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Daniel Chivvis, Fernanda de Macedo Alves, Jean-Claude Paquin, Jennifer Bartlett, Mugdha Polimera, Stephanie Jarmak

Abstract: Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when using Retrieval Augmented Generation. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverage this technology while respecting the high level of trust and quality that the project holds.

Submitted 21 December, 2023; originally announced December 2023.

Comments: To appear in the proceedings of the 33rd annual international Astronomical Data Analysis Software & Systems (ADASS XXXIII)

arXiv:2312.08579 [pdf, other]

Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach

Authors: Golnaz Shapurian, Michael J Kurtz, Alberto Accomazzi

Abstract: The automatic identification of planetary feature names in astronomy publications presents numerous challenges. These features include craters, defined as roughly circular depressions resulting from impact or volcanic activity; dorsas, which are elongate raised structures or wrinkle ridges; and lacus, small irregular patches of dark, smooth material on the Moon, referred to as "lake" (Planetary Names Working Group, n.d.). Many feature names overlap with places or people's names that they are named after, for example, Syria, Tempe, Einstein, and Sagan, to name a few (U.S. Geological Survey, n.d.). Some feature names have been used in many contexts, for instance, Apollo, which can refer to mission, program, sample, astronaut, seismic, seismometers, core, era, data, collection, instrument, and station, in addition to the crater on the Moon. Some feature names can appear in the text as adjectives, like the lunar craters Black, Green, and White. Some feature names in other contexts serve as directions, like craters West and South on the Moon. Additionally, some features share identical names across different celestial bodies, requiring disambiguation, such as the Adams crater, which exists on both the Moon and Mars.
We present a multi-step pipeline combining rule-based filtering, statistical relevance analysis, part-of-speech (POS) tagging, a named entity recognition (NER) model, hybrid keyword harvesting, knowledge graph (KG) matching, and inference with a locally installed large language model (LLM) to reliably identify planetary names despite these challenges. When evaluated on a dataset of astronomy papers from the Astrophysics Data System (ADS), this methodology achieves an F1-score over 0.97 in disambiguating planetary feature names.

Submitted 17 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2212.00744 [pdf, ps, other]

Improving astroBERT using Semantic Textual Similarity

Authors: Felix Grezes, Thomas Allen, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Pavlos Protopapas

Abstract: The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we announce the first public release of the astroBERT language model; show how astroBERT improves over existing public language models on astrophysics-specific tasks; and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.

Submitted 29 November, 2022; originally announced December 2022.

arXiv:2202.00777 [pdf, ps, other]

Web accessibility trends and implementation in dynamic web applications

Authors: Timothy W. Hostetler, Shinyi Chen, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Carolyn S. Grant, Edwin Henneken, Donna M. Thompson, Roman Chyla, Golnaz Shapurian, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Stephen McDonald, Felix Grezes

Abstract: The NASA Astrophysics Data System (ADS), a critical research service for the astrophysics community, strives to provide the most accessible and inclusive environment for the discovery and exploration of the astronomical literature. Part of this goal involves creating a digital platform that can accommodate everybody, including those with disabilities that would benefit from alternative ways to present the information provided by the website. NASA ADS follows the official Web Content Accessibility Guidelines (WCAG) standard for ensuring accessibility of all its applications, striving to exceed this standard where possible. Through the use of both internal audits and external expert review based on these guidelines, we have identified many areas for improving accessibility in our current web application, and have implemented a number of updates to the UI as a result of this. We present an overview of some current web accessibility trends, discuss our experience incorporating these trends in our web application, and discuss the lessons learned and recommendations for future projects.

Submitted 1 February, 2022; originally announced February 2022.

Comments: Submitted to ADASS XXXI (2021)

arXiv:2112.00590 [pdf, ps, other]

Building astroBERT, a language model for Astronomy & Astrophysics

Authors: Felix Grezes, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Shinyi Chen, Chris Tanner, Pavlos Protopapas

Abstract: The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2012.03470 [pdf, ps, other]

doi 10.3847/1538-3881/abc06e

Center for Astrophysics Optical Infrared Science Archive. I. FAST Spectrograph

Authors: Jessica Mink, Warren R. Brown, Igor V. Chilingarian, Daniel Fabricant, Michael J. Kurtz, Sean Moran, Jaehyon Rhee, Susan Tokarz, William F. Wyatt

Abstract: We announce the public release of 141,531 moderate-dispersion optical spectra of 72,247 objects acquired over the past 25 years with the FAST Spectrograph on the Fred L. Whipple Observatory 1.5-meter Tillinghast telescope. We describe the data acquisition and processing so that scientists can understand the spectra. We highlight some of the largest FAST survey programs, and make recommendations for use. The spectra have been placed in a Virtual Observatory accessible archive and are ready for download.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 17 pages, 18 figures, 8 tables

arXiv:2010.01418 [pdf]

doi 10.3847/25c2cfeb.8d12c399

Second Order Operators in the NASA Astrophysics Data System

Authors: Michael J. Kurtz, Roman Chyla

Abstract: Second Order Operators (SOOs) are database functions which form secondary queries based on attributes of the objects returned in an initial query; they can provide powerful methods to investigate complex, multipartite information graphs. The NASA Astrophysics Data System (ADS) has implemented four SOOs (reviews, useful, trending, and similar), which use the citations, references, downloads, and abstract text. This tutorial describes these operators in detail, both alone and in conjunction with other functions. It is intended for scientists and others who wish to make fuller use of the ADS database. Basic knowledge of the ADS is assumed.

Submitted 3 October, 2020; originally announced October 2020.

Comments: ADS Bibcode:2020BAAS...52b0207K, author's version

Journal ref: Bulletin of the American Astronomical Society, Vol. 52, No. 2, id. 0207 2020

arXiv:2009.14323 [pdf]

doi 10.3847/25c2cfeb.704b260e

Enabling Synergy: Improving the Information Infrastructure for Planetary Science

Authors: Michael J. Kurtz, Alberto Accomazzi, Edwin A. Henneken

Abstract: In this whitepaper we advocate that the Planetary Science (PS) community build a discipline-specific digital library, in collaboration with the existing astronomy digital library, ADS. We suggest that the PS data archives increase their level of curation to allow for direct linking between the archival data and the derived journal articles. And we suggest that a new component of the PS information infrastructure be created to collate and curate information on features and objects in our solar system, beginning with the USGS/IAU Gazetteer of Planetary Nomenclature.

Submitted 29 September, 2020; originally announced September 2020.

Comments: 8 pages, submitted to the Planetary Science and Astrobiology Decadal Survey 2023-2032

arXiv:2009.05048 [pdf, ps, other]

Agile methodologies in teams with highly creative and autonomous members

Authors: Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Kris Bukovi

Abstract: The Agile manifesto encourages us to value individuals and interactions over processes and tools, while Scrum, the most adopted Agile development methodology, is essentially based on roles, events, artifacts, and the rules that bind them together (i.e., processes). Moreover, it is generally proclaimed that whenever a Scrum project does not succeed, the reason is that Scrum was not implemented correctly and not that Scrum may have its own flaws. This grants irrefutability to the methodology, discouraging deviations to fit the actual needs and peculiarities of the developers. In particular, the members of the NASA ADS team are highly creative and autonomous, and their motivation can be affected if their freedom is too strongly constrained. We present our experience following Agile principles, reusing certain Scrum elements and seeking the satisfaction of the team members, while rapidly reacting and keeping the project in line with our stakeholders' expectations.

Submitted 10 September, 2020; originally announced September 2020.

Comments: To appear in the proceedings of the 29th annual international Astronomical Data Analysis Software & Systems (ADASS XXIX)

arXiv:1903.00297 [pdf]

From Dark Energy to Exolife: Improving the Digital Information Infrastructure for Astrophysics

Authors: Michael J. Kurtz, Alberto Accomazzi

Abstract: Some of the most exciting and promising areas of Astronomy research today are found at the boundaries of the discipline: the search for Exoplanets and Multi-Messenger Astronomy. In order to achieve breakthroughs in these research fields over the next decade, innovation and expansion of the digital information infrastructure which supports this research is required. Astronomy has been well-served by the existence of an open, distributed network of data centers and archives. However, institutional barriers and differing research cultures have prevented cross-disciplinary collaborations, creating fragmented knowledge and stove-piped research activities. This must change in order for the broader community of scientists to work together and solve our most ambitious decadal challenges. Interdisciplinary inquiry is best supported by bringing researchers together at the information discovery level. In order to cross the traditional disciplinary silos we must allow scientists both to explore new ideas and to gain access to new data and knowledge. This is best enabled by providing discovery platforms which allow them to explore and connect different research threads in the literature, identify communities of experts, access and analyze the related published datasets, measurements and catalogs.

Submitted 1 March, 2019; originally announced March 2019.

Comments: 6 pages, whitepaper submitted to Astro2020, the Astronomy and Astrophysics Decadal Survey

arXiv:1901.05463 [pdf, ps, other]

Fundamentals of effective cloud management for the new NASA Astrophysics Data System

Authors: Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Kris Bukovi, Nathan Rapport

Abstract: The new NASA Astrophysics Data System (ADS) is designed with a service-oriented architecture (SOA) that consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker, and deployed in Amazon Web Services (AWS). For complex systems, like the ADS, this loosely coupled architecture can lead to a more scalable, reliable and resilient system if some fundamental questions are addressed. After having experimented with different AWS environments and deployment methods, we decided in December 2017 to go with Kubernetes as our container orchestration. Defining the best strategy to properly set up Kubernetes has proven to be challenging: automatic scaling services and load balancing traffic can lead to errors whose origin is difficult to identify, monitoring and logging the activity that happens across multiple layers for a single request needs to be carefully addressed, and the best workflow for a Continuous Integration and Delivery (CI/CD) system is not self-evident. We present here how we tackle these challenges and our plans for the future.

Submitted 16 January, 2019; originally announced January 2019.

Comments: To appear in the proceedings of the 28th annual international Astronomical Data Analysis Software & Systems (ADASS XXVIII)

arXiv:1803.03598 [pdf]

Merging the Astrophysics and Planetary Science Information Systems

Authors: Michael J. Kurtz, Alberto Accomazzi, Edwin A. Henneken

Abstract: Conceptually, exoplanet research has one foot in the discipline of Astrophysics and the other foot in Planetary Science. Research strategies for exoplanets will require efficient access to data and information from both realms. Astrophysics has a sophisticated, well integrated, distributed information system with archives and data centers which are interlinked with the technical literature via the Astrophysics Data System (ADS). The information system for Planetary Science does not have a central component linking the literature with the observational and theoretical data. Here we propose that the Committee on an Exoplanet Science Strategy recommend that this linkage be built, with the ADS playing the role in Planetary Science which it already plays in Astrophysics. This will require additional resources for the ADS, and the Planetary Data System (PDS), as well as other international collaborators.

Submitted 9 March, 2018; originally announced March 2018.

Comments: Whitepaper submitted to the Committee on an Exoplanet Science Strategy

arXiv:1801.00815 [pdf]

doi 10.1007/978-0-585-33110-2_3

Advice from the Oracle: Really Intelligent Information Retrieval

Authors: Michael J. Kurtz

Abstract: What is "intelligent" information retrieval? Essentially this is asking what intelligence is; in this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. 2018 Keywords: Personal Digital Assistant; Supervised Topic Models

Submitted 2 January, 2018; originally announced January 2018.

Comments: Author copy; published 25 years ago at the beginning of the Astrophysics Data System; 2018 keywords added

Journal ref: In: Heck A., Murtagh F. (eds) Intelligent Information Retrieval: The Case of Astronomy and Related Space Sciences. Astrophysics and Space Science Library, vol 182. Springer, Dordrecht (1993)

arXiv:1712.06704 [pdf, ps, other]

Multilingual Topic Models

Authors: Kriste Krstovski, Michael J. Kurtz, David A. Smith, Alberto Accomazzi

Abstract: Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Various document representation schemes possess different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview on latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse representations into a shared topic space allows us to evaluate representations based on how topically similar they are to the original article. In addition, our proposed approach provides means to discover where different concept vocabularies require improvement.

Submitted 18 December, 2017; originally announced December 2017.

Comments: 18 pages, 9 figures

arXiv:1710.08505 [pdf, ps, other]

doi 10.1051/epjconf/201818608001

New ADS Functionality for the Curator

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Steven McDonald, Taylor J. Shaulis, Sergi Blanco-Cuaresma, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton

Abstract: In this paper we provide an update concerning the operations of the NASA Astrophysics Data System (ADS), its services and user interface, and the content currently indexed in its database. As the primary information system used by researchers in Astronomy, the ADS aims to provide a comprehensive index of all scholarly resources appearing in the literature. With the current effort in our community to support data and software citations, we discuss what steps the ADS is taking to provide the needed infrastructure in collaboration with publishers and data providers. A new API provides access to the ADS search interface, metrics, and libraries allowing users to programmatically automate discovery and curation tasks. The new ADS interface supports a greater integration of content and services with a variety of partners, including ORCID claiming, indexing of SIMBAD objects, and article graphics from a variety of publishers. Finally, we highlight how librarians can facilitate the ingest of gray literature that they curate into our system.

Submitted 23 October, 2017; originally announced October 2017.

Comments: Submitted to the Proceedings of Library and Information Services in Astronomy VIII, Strasbourg, France

arXiv:1707.09955 [pdf]

doi 10.1051/epjconf/201818606004

Comparing People with Bibliometrics

Authors: Michael J. Kurtz

Abstract: Bibliometric indicators, citation counts and/or download counts are increasingly being used to inform personnel decisions such as hiring or promotions. These statistics are very often misused. Here we provide a guide to the factors which should be considered when using these so-called quantitative measures to evaluate people. Rules of thumb are given for when to begin using bibliometric measures when comparing otherwise similar candidates.

Submitted 31 July, 2017; originally announced July 2017.

Comments: to appear in Proceedings of Library and Information Science in Astronomy VIII (LISA-8)

arXiv:1706.02153 [pdf]

doi 10.1007/978-3-030-02511-3_32

Usage Bibliometrics as a Tool to Measure Research Activity

Authors: Edwin A. Henneken, Michael J. Kurtz

Abstract: Measures for research activity and impact have become an integral ingredient in the assessment of a wide range of entities (individual researchers, organizations, instruments, regions, disciplines). Traditional bibliometric indicators, like publication and citation based indicators, provide an essential part of this picture, but cannot describe the complete picture. Since reading scholarly publications is an essential part of the research life cycle, it is only natural to introduce measures for this activity in attempts to quantify the efficiency, productivity and impact of an entity. Citations and reads are significantly different signals, so taken together, they provide a more complete picture of research activity. Most scholarly publications are now accessed online, making the study of reads and their patterns possible. Click-stream logs allow us to follow information access by the entire research community, real-time. Publication and citation datasets just reflect activity by authors. In addition, download statistics will help us identify publications with significant impact, but which do not attract many citations. Click-stream signals are arguably more complex than, say, citation signals. For one, they are a superposition of different classes of readers. Systematic downloads by crawlers also contaminate the signal, as does browsing behavior. We discuss the complexities associated with clickstream data and how, with proper filtering, statistically significant relations and conclusions can be inferred from download statistics.
We describe how download statistics can be used to describe research activity at different levels of aggregation, ranging from organizations to countries. These statistics show a correlation with socio-economic indicators. A comparison will be made with traditional bibliometric indicators. We will argue that astronomy is representative of more general trends.

Submitted 7 June, 2017; originally announced June 2017.

Comments: 25 pages, 11 figures, accepted for publication in Handbook of Quantitative Science and Technology Research, Springer

arXiv:1603.06885 [pdf, ps, other]

doi 10.3847/0067-0049/224/1/11

SHELS: Complete Redshift Surveys of Two Widely Separated Fields

Authors: Margaret J. Geller, Ho Seong Hwang, Ian P. Dell'Antonio, Harus Jabran Zahid, Michael J. Kurtz, Daniel G. Fabricant

Abstract: The SHELS (Smithsonian Hectospec Lensing Survey) is a complete redshift survey covering two well-separated fields (F1 and F2) of the Deep Lens Survey. Both fields are more than 94% complete to a Galactic extinction corrected R0 = 20.2. Here we describe the redshift survey of the F1 field centered at R.A. = 00h53m25.3s and Decl = 12d33m55s; like F2, the F1 field covers 4 sq deg. The redshift survey of the F1 field includes 9426 new galaxy redshifts measured with Hectospec on the MMT (published here). As a guide to future uses of the combined survey we compare the mass-metallicity relation and the distributions of D4000 as a function of stellar mass and redshift for the two fields. The mass-metallicity relations differ by an insignificant 1.6 sigma. For galaxies in the stellar mass range 1.e10 to 1.e11 MSun, the increase in the star-forming fraction with redshift is remarkably similar in the two fields. The seemingly surprising 31-38% difference in the overall galaxy counts in F1 and F2 is probably consistent with the expected cosmic variance given the subtleties of the relative systematics in the two surveys. We also review the Deep Lens Survey cluster detections in the two fields: poorer photometric data for F1 precluded secure detection of the single massive cluster at z = 0.35 that we find in SHELS. Taken together the two fields include 16,055 redshifts for galaxies with R0 <= 20.2 and 20,754 redshifts for galaxies with R <= 20.6.
These dense surveys in two well-separated fields provide a basis for future investigations of galaxy properties and large-scale structure.

Submitted 22 March, 2016; originally announced March 2016.

Comments: 24 pages, 6 tables, 13 figures; ApJS, accepted; full data tables available in journal upon publication

arXiv:1602.06343 [pdf, ps, other]

doi 10.3847/0004-637X/818/2/173

HectoMAP and Horizon Run 4: Dense Structures and Voids in the Real and Simulated Universe

Authors: Ho Seong Hwang, Margaret J. Geller, Changbom Park, Daniel G. Fabricant, Michael J. Kurtz, Kenneth J. Rines, Juhan Kim, Antonaldo Diaferio, H. Jabran Zahid, Perry Berlind, Michael Calkins, Susan Tokarz, Sean Moran

Abstract: HectoMAP is a dense redshift survey of red galaxies covering a 53 $deg^{2}$ strip of the northern sky. HectoMAP is 97\% complete for galaxies with $r<20.5$, $(g-r)>1.0$, and $(r-i)>0.5$. The survey enables tests of the physical properties of large-scale structure at intermediate redshift against cosmological models. We use the Horizon Run 4, one of the densest and largest cosmological simulations based on the standard $Λ$ Cold Dark Matter ($Λ$CDM) model, to compare the physical properties of observed large-scale structures with simulated ones in a volume-limited sample covering 8$\times10^6$ $h^{-3}$ Mpc$^3$ in the redshift range $0.22<z<0.44$. We apply the same criteria to the observations and simulations to identify over- and under-dense large-scale features of the galaxy distribution. The richness and size distributions of observed over-dense structures agree well with the simulated ones. Observations and simulations also agree for the volume and size distributions of under-dense structures, voids. The properties of the largest over-dense structure and the largest void in HectoMAP are well within the distributions for the largest structures drawn from 300 Horizon Run 4 mock surveys. Overall the size, richness and volume distributions of observed large-scale structures in the redshift range $0.22<z<0.44$ are remarkably consistent with predictions of the standard $Λ$CDM model.

Submitted 19 February, 2016; originally announced February 2016.

Comments: 20 pages, 16 figures, 1 table. Published in ApJ (818:106, 2016). Paper with high resolution figures is available at https://astro.kias.re.kr/~hshwang/Hwang_etal16_LSS_HectoMAP_HorizonRun4_high.pdf

Journal ref: 2016, ApJ, 818, 173

arXiv:1601.07858 [pdf, ps, other]

Aggregation and Linking of Observational Metadata in the ADS

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Alexandra Holachek, Jonathan Elliott

Abstract: We discuss current efforts behind the curation of observing proposals, archive bibliographies, and data links in the NASA Astrophysics Data System (ADS). The primary data in the ADS is the bibliographic content from scholarly articles in Astronomy and Physics, which ADS aggregates from publishers, arXiv and conference proceeding sites. This core bibliographic information is then further enriched by ADS via the generation of citations and usage data, and through the aggregation of external resources from astronomy data archives and libraries. Important sources of such additional information are the metadata describing observing proposals and high level data products, which, once ingested in ADS, become easily discoverable and citeable by the science community. Bibliographic studies have shown that the integration of links between data archives and the ADS provides greater visibility to data products and increased citations to the literature associated with them.

Submitted 28 January, 2016; originally announced January 2016.

Comments: 4 pages, Proceedings of the ADASS XXV conference

arXiv:1601.01611 [pdf, other]

Automatic Construction of Evaluation Sets and Evaluation of Document Similarity Models in Large Scholarly Retrieval Systems

Authors: Kriste Krstovski, David A. Smith, Michael J. Kurtz

Abstract: Retrieval systems for scholarly literature offer the ability for the scientific community to search, explore and download scholarly articles across various scientific disciplines. Mostly used by the experts in the particular field, these systems contain user community logs including information on user specific downloaded articles. In this paper we present a novel approach for automatically evaluating document similarity models in large collections of scholarly publications. Unlike typical evaluation settings that use test collections consisting of query documents and human annotated relevance judgments, we use download logs to automatically generate a pseudo-relevant set of similar document pairs. More specifically we show that consecutively downloaded document pairs, extracted from a scholarly information retrieval (IR) system, could be utilized as a test collection for evaluating document similarity models. Another novel aspect of our approach lies in the method that we employ for evaluating the performance of the model by comparing the distribution of consecutively downloaded document pairs and random document pairs in log space. Across two families of similarity models, which represent documents in the term vector and topic spaces, we show that our evaluation approach achieves very high correlation with traditional performance metrics such as Mean Average Precision (MAP), while being more efficient to compute.

Submitted 7 January, 2016; originally announced January 2016.

arXiv:1510.09099 [pdf]

doi 10.1002/asi.23689

Measuring Metrics - A forty year longitudinal cross-validation of citations, downloads, and peer review in Astrophysics

Authors: Michael J. Kurtz, Edwin A. Henneken

Abstract: Citation measures, and newer altmetric measures such as downloads, are now commonly used to inform personnel decisions. How well do or can these measures measure or predict the past, current or future scholarly performance of an individual? Using data from the Smithsonian/NASA Astrophysics Data System we analyze the publication, citation, download, and distinction histories of a cohort of 922 individuals who received a U.S. PhD in astronomy in the period 1972-1976. By examining the same and different measures at the same and different times for the same individuals we are able to show the capabilities and limitations of each measure. Because the distributions are lognormal, measurement uncertainties are multiplicative; we show that in order to state with 95% confidence that one person's citations and/or downloads are significantly higher than another person's, the log difference in the ratio of counts must be at least 0.3 dex, which corresponds to a multiplicative factor of two.

Submitted 30 October, 2015; originally announced October 2015.

Comments: Author's version of manuscript accepted for publication in the Journal of the Association for Information Science and Technology (JASIST); 35 pages 16 figures

arXiv:1503.05881 [pdf, other]

ADS 2.0: new architecture, API and services

Authors: Roman Chyla, Alberto Accomazzi, Alexandra Holachek, Carolyn S. Grant, Jonathan Elliott, Edwin A. Henneken, Donna M. Thompson, Michael J. Kurtz, Stephen S. Murray, Vladimir Sudilovsky

Abstract: The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS's search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using RESTful web services. Taking one step further, we will discuss how we plan to expose the treasure trove of information hosted by ADS (10 million records and fulltext for much of the Astronomy and Physics refereed literature) to partners interested in using this API. This will provide you (and your intelligent applications) with access to ADS's underlying data to enable the extraction of new knowledge and the ingestion of these results back into the ADS. Using this framework, researchers could run controlled experiments with content extraction, machine learning, natural language processing, etc. In this talk, we will discuss what is already implemented, what will be available soon, and where we are going next.

Submitted 19 March, 2015; originally announced March 2015.

Comments: ADASS Conference 2014

arXiv:1503.04194 [pdf, other]

ADS: The Next Generation Search Platform

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Roman Chyla, James Luker, Carolyn S. Grant, Donna M. Thompson, Alexandra Holachek, Rahul Dave, Stephen S. Murray

Abstract: Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. Starting in 2011, the ADS started to systematically collect, parse and index full-text documents for all the major publications in Physics and Astronomy as well as many smaller Astronomy journals and arXiv e-prints, for a total of over 3.5 million papers. Our citation coverage has doubled since 2010 and now consists of over 70 million citations. We are normalizing the affiliation information in our records and, in collaboration with the CfA library and NASA, we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform which affect all aspects of the system and its operations. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language which supports fielded, unfielded and functional searches. We are currently able to index acknowledgments, affiliations, citations, funding sources, and to the extent that these metadata are available to us they are now searchable under our new platform. The ADS private library system is being enhanced to support reading groups, collaborative editing of lists of papers, tagging, and a variety of privacy settings when managing one's paper collection. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/

Submitted 13 March, 2015; originally announced March 2015.

Comments: Submitted to Library and Information Services in Astronomy VII, Naples, Italy

arXiv:1406.4542 [pdf, ps, other]

Computing and Using Metrics in the ADS

Authors: Edwin A. Henneken, Alberto Accomazzi, Michael J. Kurtz, Carolyn S. Grant, Donna Thompson, Jay Luker, Roman Chyla, Alexandra Holachek, Stephen S. Murray

Abstract: Finding measures for research impact, be it for individuals, institutions, instruments or projects, has gained a lot of popularity. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication metrics have even been used in at least one recent court case. To support this need for research impact indicators, the SAO/NASA Astrophysics Data System (ADS) has developed a service which provides a broad overview of various impact measures. In this presentation we discuss how the ADS can be used to quench the thirst for impact measures. We will also discuss a couple of the lesser known indicators in the metrics overview and the main issues to be aware of when compiling publication-based metrics in the ADS, namely author name ambiguity and citation incompleteness.

Submitted 17 June, 2014; originally announced June 2014.

Comments: to appear in proceedings of LISA VII conference, Naples, Italy

arXiv:1405.7704 [pdf, ps, other]

doi 10.1088/0067-0049/213/2/35

SHELS: A Complete Galaxy Redshift Survey with R$\leq$20.6

Authors: Margaret J. Geller, Ho Seong Hwang, Daniel G. Fabricant, Michael J. Kurtz, Ian P. Dell'Antonio, Harus Jabran Zahid

Abstract: The SHELS (Smithsonian Hectospec Lensing Survey) is a complete redshift survey covering two well-separated fields (F1 and F2) of the Deep Lens Survey to a limiting R = 20.6. Here we describe the redshift survey of the F2 field (R.A.$_{2000}$ = 09$^h$19$^m$32.4$^s$ and Decl.$_{2000}$ = +30$^{\circ}$00$^{\prime}$00$^{\prime\prime}$). The survey includes 16,294 new redshifts measured with the Hectospec on the MMT. The resulting survey of the 4 deg$^2$ F2 field is 95\% complete to R = 20.6, currently the densest survey to this magnitude limit. The median survey redshift is $z = 0.3$; the survey provides a view of structure in the range 0.1 $\lesssim z \lesssim 0.6$. A movie displays the large-scale structure in the survey region. We provide a redshift, spectral index D$_n$4000, and stellar mass for each galaxy in the survey. We also provide a metallicity for each galaxy in the range 0.2 $< z < 0.38$. To demonstrate potential applications of the survey, we examine the behavior of the index D$_n$4000 as a function of galaxy luminosity, stellar mass, and redshift. The known evolutionary and stellar mass dependent properties of the galaxy population are cleanly evident in the data. We also show that the mass-metallicity relation previously determined from these data is robust to the analysis approach.

Submitted 29 May, 2014; originally announced May 2014.

Comments: 45 pages, 16 figures, 7 tables. Data will be available only when the paper is published in Astrophysical Journal Supplements (now submitted). Movie and full resolution figures are available at https://www.cfa.harvard.edu/~mjg/f6movie.mp4 and https://www.cfa.harvard.edu/~mjg/SHELS.pdf

arXiv:1401.1440 [pdf, ps, other]

doi 10.1088/0004-637X/783/1/52

A Redshift Survey of the Strong Lensing Cluster Abell 383

Authors: Margaret J. Geller, Ho Seong Hwang, Antonaldo Diaferio, Michael J. Kurtz, Dan Coe, Kenneth J. Rines

Abstract: Abell 383 is a famous rich cluster (z = 0.1887) imaged extensively as a basis for intensive strong and weak lensing studies. Nonetheless there are few spectroscopic observations. We enable dynamical analyses by measuring 2360 new redshifts for galaxies with r$_{petro} \leq 20.5$ and within 50$^\prime$ of the BCG (Brightest Cluster Galaxy: R.A.$_{2000} = 42.014125^\circ$, Decl$_{2000} = -03.529228^\circ$). We apply the caustic technique to identify 275 cluster members within 7$h^{-1}$ Mpc of the hierarchical cluster center. The BCG lies within $-11 \pm 110$ km s$^{-1}$ and 21 $\pm 56 h^{-1}$ kpc of the hierarchical cluster center; the velocity dispersion profile of the BCG appears to be an extension of the velocity dispersion profile based on cluster members. The distribution of cluster members on the sky corresponds impressively with the weak lensing contours of Okabe et al. (2010) especially when the impact of foreground and background structure is included. The values of R$_{200}$ = $1.22\pm 0.01 h^{-1}$ Mpc and M$_{200}$ = $(5.07 \pm 0.09)\times 10^{14} h^{-1}$ M$_\odot$ obtained by application of the caustic technique agree well with recent completely independent lensing measures. The caustic estimate extends direct measurement of the cluster mass profile to a radius of $\sim 5 h^{-1}$ Mpc.

Submitted 7 January, 2014; originally announced January 2014.

Comments: 29 pages, 9 figures, ApJ accepted

arXiv:1308.4442 [pdf, ps, other]

doi 10.1086/673499

Measuring Galaxy Velocity Dispersions with Hectospec

Authors: Daniel Fabricant, Igor Chilingarian, Ho Seong Hwang, Michael J. Kurtz, Margaret J. Geller

Abstract: We describe a robust technique based on the ULySS IDL code for measuring velocity dispersions of galaxies observed with the MMT's fiber-fed spectrograph, Hectospec. This procedure is applicable to all Hectospec spectra having a signal-to-noise >5 and weak emission lines. We estimate the internal error in the Hectospec velocity dispersion measurements by comparing duplicate measurements of 171 galaxies. For a sample of 984 galaxies with a median z=0.10, we compare velocity dispersions measured by Hectospec through a 1.5 arcsec diameter optical fiber with those measured by the Sloan Digital Sky Survey (SDSS) and Baryon Oscillation Spectral Survey (BOSS) through 3 arcsec and 2 arcsec diameter optical fibers, respectively. The systematic differences between the Hectospec and the SDSS/BOSS measurements are <7% for velocity dispersions between 100 and 300 km/s; the differences are no larger than the differences among the three BOSS velocity dispersion reductions. We analyze the scatter about the fundamental plane and find no significant redshift dependent systematics in our velocity dispersion measurements to z~0.6. This analysis also confirms our estimation of the measurement errors. In one hour in good conditions, we demonstrate that we achieve 30 km/s velocity dispersion errors for galaxies with an SDSS r fiber magnitude of 21.

Submitted 20 August, 2013; originally announced August 2013.

Comments: 8 figures

arXiv:1304.4656 [pdf, ps, other]

doi 10.1088/0004-637X/786/2/93

Reducing Systematic Error in Cluster Scale Weak Lensing

Authors: Yousuke Utsumi, Satoshi Miyazaki, Margaret J. Geller, Ian P. Dell'Antonio, Masamune Oguri, Michael J. Kurtz, Takashi Hamana, Daniel G. Fabricant

Abstract: Weak lensing provides an important route toward collecting samples of clusters of galaxies selected by mass. Subtle systematic errors in image reduction can compromise the power of this technique. We use the B-mode signal to quantify this systematic error and to test methods for reducing this error. We show that two procedures are efficient in suppressing systematic error in the B-mode: (1) refinement of the mosaic CCD warping procedure to conform to absolute celestial coordinates and (2) truncation of the smoothing procedure on a scale of 10$^{\prime}$. Application of these procedures reduces the systematic error to 20% of its original amplitude. We provide an analytic expression for the distribution of the highest peaks in noise maps that can be used to estimate the fraction of false peaks in the weak lensing $κ$-S/N maps as a function of the detection threshold. Based on this analysis we select a threshold S/N = 4.56 for identifying an uncontaminated set of weak lensing peaks in two test fields covering a total area of $\sim 3$deg$^2$. Taken together these fields contain seven peaks above the threshold. Among these, six are probable systems of galaxies and one is a superposition. We confirm the reliability of these peaks with dense redshift surveys, x-ray and imaging observations. The systematic error reduction procedures we apply are general and can be applied to future large-area weak lensing surveys. Our high peak analysis suggests that with a S/N threshold of 4.5, there should be only 2.7 spurious weak lensing peaks even in an area of 1000 deg$^2$ where we expect $\sim$ 2000 peaks based on our Subaru fields.

Submitted 16 April, 2013; originally announced April 2013.

Comments: 30 pages, Submitted to Astrophysical Journal

arXiv:1209.3786 [pdf, ps, other]

doi 10.1088/0004-637X/767/1/15

Measuring the Ultimate Mass of Galaxy Clusters: Redshifts and Mass Profiles from the Hectospec Cluster Survey (HeCS)

Authors: Kenneth Rines, Margaret J. Geller, Antonaldo Diaferio, Michael J. Kurtz

Abstract: The infall regions of galaxy clusters represent the largest gravitationally bound structures in a $Λ$CDM universe. Measuring cluster mass profiles into the infall regions provides an estimate of the ultimate mass of these haloes. We use the caustic technique to measure cluster mass profiles from galaxy redshifts obtained with the Hectospec Cluster Survey (HeCS), an extensive spectroscopic survey of galaxy clusters with MMT/Hectospec. We survey 58 clusters selected by X-ray flux at 0.1$<$$z$$<$0.3. The survey includes 21,314 unique MMT/Hectospec redshifts for individual galaxies; 10,275 of these galaxies are cluster members. For each cluster we acquired high signal-to-noise spectra for $\sim 200$ cluster members and a comparable number of foreground/background galaxies. The cluster members trace out infall patterns around the clusters. The members define a very narrow red sequence. The velocity dispersions decline with radius; we demonstrate that the determination of the velocity dispersion is insensitive to the inclusion of bluer members (a small fraction of the cluster population). We apply the caustic technique to define membership and estimate the mass profiles to large radii. The ultimate halo mass of clusters (the mass that remains bound in the far future of a $Λ$CDM universe) is on average (1.99$\pm$0.11)$M_{200}$, a new observational cosmological test in essential agreement with simulations. Summed profiles binned in $M_{200}$ and in $L_X$ demonstrate that the predicted NFW form of the density profile is a remarkably good representation of the data in agreement with weak lensing results extending to large radius. The concentration of these summed profiles is also consistent with theoretical predictions.

Submitted 25 April, 2013; v1 submitted 17 September, 2012; originally announced September 2012.

Comments: revised to match version published in ApJ

arXiv:1209.2124 [pdf, other]

doi 10.1371/journal.pone.0046428

A measure of total research impact independent of time and discipline

Authors: Alberto Pepe, Michael J. Kurtz

Abstract: Authorship and citation practices evolve with time and differ by academic discipline. As such, indicators of research productivity based on citation records are naturally subject to historical and disciplinary effects. We observe these effects on a corpus of astronomer career data constructed from a database of refereed publications. We employ a simple mechanism to measure research output using author and reference counts available in bibliographic databases to develop a citation-based indicator of research productivity. The total research impact (tori) quantifies, for an individual, the total amount of scholarly work that others have devoted to his/her work, measured in the volume of research papers. A derived measure, the research impact quotient (riq), is an age independent measure of an individual's research ability. We demonstrate that these measures are substantially less vulnerable to temporal debasement and cross-disciplinary bias than the most popular current measures. The proposed measures of research impact, tori and riq, have been implemented in the Smithsonian/NASA Astrophysics Data System.

Submitted 10 September, 2012; originally announced September 2012.

Comments: 14 pages, 5 figures. PLoS ONE, in press

arXiv:1209.1318 [pdf]

doi 10.7551/mitpress/9445.001.0001

Finding and Recommending Scholarly Articles

Authors: Michael J. Kurtz, Edwin A. Henneken

Abstract: The rate at which scholarly literature is being produced has been increasing at approximately 3.5 percent per year for decades. This means that during a typical 40 year career the amount of new literature produced each year increases by a factor of four. The methods scholars use to discover relevant literature must change. Just like everybody else involved in information discovery, scholars are confronted with information overload. Two decades ago, this discovery process essentially consisted of paging through abstract books, talking to colleagues and librarians, and browsing journals. A time-consuming process, which could even be longer if material had to be shipped from elsewhere. Now much of this discovery process is mediated by online scholarly information systems. All these systems are relatively new, and all are still changing. They all share a common goal: to provide their users with access to the literature relevant to their specific needs. To achieve this each system responds to actions by the user by displaying articles which the system judges relevant to the user's current needs. Recently search systems which use particularly sophisticated methodologies to recommend a few specific papers to the user have been called "recommender systems". These methods are in line with the current use of the term "recommender system" in computer science. We do not adopt this definition, rather we view systems like these as components in a larger whole, which is presented by the scholarly information systems themselves. In what follows we view the recommender system as an aspect of the entire information system; one which combines the massive memory capacities of the machine with the cognitive abilities of the human user to achieve a human-machine synergy.

Submitted 6 September, 2012; originally announced September 2012.

Comments: 14 pages, part of the forthcoming MIT book "Bibliometrics and Beyond: Metrics-Based Evaluation of Scholarly Research" edited by Blaise Cronin and Cassidy R. Sugimoto

arXiv:1209.0125 [pdf, other]

A History of Cluster Analysis Using the Classification Society's Bibliography Over Four Decades

Authors: Fionn Murtagh, Michael J. Kurtz

Abstract: The Classification Literature Automated Search Service, an annual bibliography based on citation of one or more of a set of around 80 book or journal publications, ran from 1972 to 2012. We analyze here the years 1994 to 2011. The Classification Society's Service, as it was termed, has been produced by the Classification Society. In earlier decades it was distributed as a diskette or CD with the Journal of Classification. Among our findings are the following: an enormous increase in scholarly production post approximately 2000; a very major increase in quantity, coupled with work in different disciplines, from approximately 2004; and a major shift also from cluster analysis in earlier times having mathematics and psychology as disciplines of the journals published in, and affiliations of authors, contrasted with, in more recent times, a "centre of gravity" in management and engineering.

Submitted 16 August, 2013; v1 submitted 1 September, 2012; originally announced September 2012.

Comments: 23 pages, 9 figures

MSC Class: 62H30 ACM Class: I.5.3; H.3.3

arXiv:1208.3119 [pdf, ps, other]

doi 10.1088/0004-637X/758/1/25

SHELS: Optical Spectral Properties of WISE 22 μm-selected Galaxies

Authors: Ho Seong Hwang, Margaret J. Geller, Michael J. Kurtz, Ian P. Dell'Antonio, Daniel G. Fabricant

Abstract: We use a dense, complete redshift survey, the Smithsonian Hectospec Lensing Survey (SHELS), covering a 4 square degree region of a deep imaging survey, the Deep Lens Survey (DLS), to study the optical spectral properties of Wide-field Infrared Survey Explorer (WISE) 22 μm-selected galaxies. Among 507 WISE 22 μm-selected sources with (S/N)_{22μm}>3 (~S_{22μm}>2.5 mJy), we identify the optical counterparts of 481 sources (~98%) at R<25.2 in the very deep, DLS R-band source catalog. Among them, 337 galaxies at R<21 have SHELS spectroscopic data. Most of these objects are at z<0.8. The infrared (IR) luminosities are in the range 4.5x10^8 (L_sun) < L_{IR} < 5.4x10^{12} (L_sun). Most 22 μm-selected galaxies are dusty star-forming galaxies with a small (<1.5) 4000 Å break. The stacked spectra of the 22 μm-selected galaxies binned in IR luminosity show that the strength of the [O III] line relative to Hβ grows with increasing IR luminosity. The optical spectra of the 22 μm-selected galaxies also show that there are some (~2.8%) unusual galaxies with very strong [Ne III] λ3869, 3968 emission lines that require hard ionizing radiation such as AGN or extremely young massive stars. The specific star formation rates (sSFRs) derived from the 3.6 and 22 μm flux densities are enhanced if the 22 μm-selected galaxies have close late-type neighbors. The sSFR distribution of the 22 μm-selected galaxies containing active galactic nuclei (AGNs) is similar to the distribution for star-forming galaxies without AGNs.
We identify 48 dust-obscured galaxy (DOG) candidates with large (≳1000) mid-IR to optical flux density ratio. The combination of deep photometric and spectroscopic data with WISE data suggests that WISE can probe the universe to z~2.

Submitted 15 August, 2012; originally announced August 2012.

Comments: 18 pages, 17 figures. To appear in ApJ

arXiv:1201.1616 [pdf, other]

doi 10.1088/0004-637X/757/1/22

CLASH: Precise New Constraints on the Mass Profile of Abell 2261

Authors: Dan Coe, Keiichi Umetsu, Adi Zitrin, Megan Donahue, Elinor Medezinski, Marc Postman, Mauricio Carrasco, Timo Anguita, Margaret J. Geller, Kenneth J. Rines, Antonaldo Diaferio, Michael J. Kurtz, Larry Bradley, Anton Koekemoer, Wei Zheng, Mario Nonino, Alberto Molino, Andisheh Mahdavi, Doron Lemze, Leopoldo Infante, Sara Ogaz, Peter Melchior, Ole Host, Holland Ford, Claudio Grillo, et al. (21 additional authors not shown)

Abstract: We precisely constrain the inner mass profile of Abell 2261 (z=0.225) for the first time and determine this cluster is not "over-concentrated" as found previously, implying a formation time in agreement with ΛCDM expectations. These results are based on strong lensing analyses of new 16-band HST imaging obtained as part of the Cluster Lensing and Supernova survey with Hubble (CLASH). Combining this with revised weak lensing analyses of Subaru wide field imaging with 5-band Subaru + KPNO photometry, we place tight new constraints on the halo virial mass M_vir = (2.2 ± 0.2) × 10^15 M_⊙/h70 (within r ≈ 3 Mpc/h70) and concentration c = 6.2 ± 0.3 when assuming a spherical halo. This agrees broadly with average c(M,z) predictions from recent ΛCDM simulations which span 5 ≲ <c> ≲ 8. Our most significant systematic uncertainty is halo elongation along the line of sight. To estimate this, we also derive a mass profile based on archival Chandra X-ray observations and find it to be ~35% lower than our lensing-derived profile at r2500 ~ 600 kpc. Agreement can be achieved by a halo elongated with a ~2:1 axis ratio along our line of sight. For this elongated halo model, we find M_vir = (1.7 ± 0.2) × 10^15 M_⊙/h70 and c_vir = 4.6 ± 0.2, placing rough lower limits on these values. The need for halo elongation can be partially obviated by non-thermal pressure support and, perhaps entirely, by systematic errors in the X-ray mass measurements.
We estimate the effect of background structures based on MMT/Hectospec spectroscopic redshifts and find these tend to lower M_vir further by ~7% and increase c_vir by ~5%.

Submitted 8 January, 2012; originally announced January 2012.

Comments: Submitted to the Astrophysical Journal. 19 pages, 14 figures

arXiv:1110.1380 [pdf, ps, other]

doi 10.1088/0004-6256/142/4/133

Mapping the Universe: The 2010 Russell Lecture

Authors: Margaret J. Geller, Antonaldo Diaferio, Michael J. Kurtz

Abstract: Redshift surveys are a powerful tool of modern cosmology. We discuss two aspects of their power to map the distribution of mass and light in the universe: (1) measuring the mass distribution extending into the infall regions of rich clusters and (2) applying deep redshift surveys to the selection of clusters of galaxies and to the identification of very large structures (Great Walls). We preview the HectoMAP project, a redshift survey with median redshift z = 0.34 covering 50 square degrees to r = 21. We emphasize the importance and power of spectroscopy for exploring and understanding the nature and evolution of structure in the universe.

Submitted 6 October, 2011; originally announced October 2011.

Comments: 19 pages, 5 figures (2 videos available in the on-line journal article)

Journal ref: Astronomical Journal 2011, Vol. 142, id133

arXiv:1107.2930 [pdf, ps, other]

doi 10.1088/0004-6256/143/4/102

The Faint End of the Luminosity Function and Low Surface Brightness Galaxies

Authors: Margaret J. Geller, Antonaldo Diaferio, Michael J. Kurtz, Ian P. Dell'Antonio, Daniel G. Fabricant

Abstract: SHELS (Smithsonian Hectospec Lensing Survey) is a dense redshift survey covering a 4 square degree region to a limiting R = 20.6. In the construction of the galaxy catalog and in the acquisition of spectroscopic targets, we paid careful attention to the survey completeness for lower surface brightness dwarf galaxies. Thus, although the survey covers a small area, it is a robust basis for computation of the slope of the faint end of the galaxy luminosity function to a limiting M_R = -13.3 + 5 log h. We calculate the faint end slope in the R-band for the subset of SHELS galaxies with redshifts in the range 0.02 <= z < 0.1, SHELS_{0.1}. This sample contains 532 galaxies with R < 20.6 and with a median surface brightness within the half light radius of SB_{50,R} = 21.82 mag arcsec^{-2}. We used this sample to make one of the few direct measurements of the dependence of the faint end of the galaxy luminosity function on surface brightness. For the sample as a whole the faint end slope, alpha = -1.31 +/- 0.04, is consistent with both the Blanton et al. (2005b) analysis of the SDSS and the Liu et al. (2008) analysis of the COSMOS field. This consistency is impressive given the very different approaches of these three surveys. A magnitude limited sample of 135 galaxies with optical spectroscopic redshifts with mean half-light surface brightness, SB_{50,R} >= 22.5 mag arcsec^{-2} is unique to SHELS_{0.1}. The faint end slope is alpha_{22.5} = -1.52 +/- 0.16.
SHELS_{0.1} shows that lower surface brightness objects dominate the faint end slope of the luminosity function in the field, underscoring the importance of surface brightness limits in evaluating measurements of the faint end slope and its evolution.

Submitted 9 March, 2012; v1 submitted 14 July, 2011; originally announced July 2011.

Comments: 34 pages, 13 figures, 3 tables, Astronomical Journal, in press (updated based on review)

arXiv:1106.5644 [pdf, ps, other]

The ADS in the Information Age - Impact on Discovery

Authors: Edwin A. Henneken, Michael J. Kurtz, Alberto Accomazzi

Abstract: The SAO/NASA Astrophysics Data System (ADS) grew up with and has been riding the waves of the Information Age, closely monitoring and anticipating the needs of its end-users. By now, all professional astronomers are using the ADS on a daily basis, and a substantial fraction have been using it for their entire professional career. In addition to being an indispensable tool for professional scientists, the ADS also moved into the public domain, as a tool for science education. In this paper we will highlight and discuss some aspects indicative of the impact the ADS has had on research and the access to scholarly publications. The ADS is funded by NASA Grant NNX09AB39G.

Submitted 28 June, 2011; originally announced June 2011.

Comments: 10 pages, 5 figures, to appear in "Organizations, People and Strategies in Astronomy (OPSA)", volume 8

arXiv:1102.5743 [pdf, ps, other]

doi 10.1088/0004-637X/750/2/168

Testing Weak Lensing Maps With Redshift Surveys: A Subaru Field

Authors: Michael J. Kurtz, Margaret J. Geller, Yousuke Utsumi, Satoshi Miyazaki, Ian P. Dell'Antonio, Daniel G. Fabricant

Abstract: We use a dense redshift survey in the foreground of the Subaru GTO 2 deg^2 weak lensing field (centered at α_{2000} = 16^h04^m44^s; δ_{2000} = 43°11′24″) to assess the completeness and comment on the purity of massive halo identification in the weak lensing map. The redshift survey (published here) includes 4541 galaxies; 4405 are new redshifts measured with the Hectospec on the MMT. Among the weak lensing peaks with a signal-to-noise greater than 4.25, 2/3 correspond to individual massive systems; this result is essentially identical to the Geller et al. (2010) test of the Deep Lens Survey field F2. The Subaru map, based on images in substantially better seeing than the DLS, enables detection of less massive halos at fixed redshift as expected. We demonstrate that the procedure adopted by Miyazaki et al. (2007) for removing some contaminated peaks from the weak lensing map improves agreement between the lensing map and the redshift survey in the identification of candidate massive systems.

Submitted 2 April, 2012; v1 submitted 28 February, 2011; originally announced February 2011.

Comments: Astrophysical Journal accepted version

arXiv:1102.2891 [pdf]

doi 10.1002/aris.2010.1440440108

Usage Bibliometrics

Authors: Michael J. Kurtz, Johan Bollen

Abstract: Scholarly usage data provides unique opportunities to address the known shortcomings of citation analysis. However, the collection, processing and analysis of usage data remains an area of active research. This article provides a review of the state-of-the-art in usage-based informetrics, i.e. the use of usage data to study the scholarly process.

Submitted 14 February, 2011; originally announced February 2011.

Comments: Publisher's PDF (by permission). Publisher web site: books.infotoday.com/asist/arist44.shtml

Journal ref: Annual Review of Information Science and Technology, vol 44, p. 3-64 (2010)

arXiv:1008.0826 [pdf, ps, other]

doi 10.1007/978-1-4419-8369-5_3

The Emerging Scholarly Brain

Authors: Michael J. Kurtz

Abstract: It is now a commonplace observation that human society is becoming a coherent super-organism, and that the information infrastructure forms its emerging brain. Perhaps, as the underlying technologies are likely to become billions of times more powerful than those we have today, we could say that we are now building the lizard brain for the future organism.

Submitted 4 August, 2010; originally announced August 2010.

Comments: to appear in Future Professional Communication in Astronomy-II (FPCA-II) editors A. Heck and A. Accomazzi

arXiv:1006.2823 [pdf, ps, other]

doi 10.1086/657452

Empirical optical k-Corrections for redshifts <= 0.7

Authors: Eduard Westra, Margaret J. Geller, Michael J. Kurtz, Daniel G. Fabricant, Ian Dell'Antonio

Abstract: The Smithsonian Hectospec Lensing Survey (SHELS) is a magnitude limited spectroscopically complete survey for R<=21.0 covering 4 square degrees. SHELS provides a large sample (15,513) of flux calibrated spectra. The wavelength range covered by the spectra allows empirical determination of k-corrections for the g- and r-band from z=0 to ~0.68 and 0.33, respectively, based on large samples of spectra. We approximate the k-corrections using only two parameters in a standard way: Dn4000 and redshift. We use Dn4000 rather than the standard observed galaxy color because Dn4000 is a redshift independent tracer of the stellar population of the galaxy. Our approximations for the k-corrections using Dn4000 are as good as (or better than) those based on observed galaxy color (g-r) (sigma of the scatter is ~0.08 mag). The approximations for the k-corrections are available in an on-line calculator. Our results agree with previously determined analytical approximations from single stellar population (SSP) models fitted to multi-band optical and near-infrared photometry for galaxies with a known redshift. Galaxies with the smallest Dn4000 (the galaxies with the youngest stellar populations) are always attenuated and/or contain contributions from older stellar populations. We use simple single SSP fits to the SHELS spectra to study the influence of emission lines on the k-correction. The effects of emission lines can be ignored for rest-frame equivalent widths ≲ 100 Å depending on required photometric accuracy.
We also provide analytic approximations to the k-corrections determined from our model fits for z<=0.7 as a function of redshift and Dn4000 for ugriz and UBVRI (sigma of the scatter is typically ~0.10 mag). Again, the approximations using Dn4000 are as good as (or better than) those based on a suitably chosen observed galaxy color. We provide all analytical approximations in an on-line calculator.

Submitted 28 December, 2010; v1 submitted 14 June, 2010; originally announced June 2010.

Comments: 48 pages in total (includes 19 figures, 25 tables). Published in PASP. Version with high resolution figures available at http://www.cfa.harvard.edu/~ewestra/publications/. Online calculator at http://tdc-www.cfa.harvard.edu/instruments/hectospec/progs/EOK/. Tables with coefficients differ slightly from first astro-ph version, results barely changed

arXiv:1005.2308 [pdf, ps, other]

doi 10.1007/978-1-4419-8369-5_14

Finding Your Literature Match -- A Recommender System

Authors: Edwin A. Henneken, Michael J. Kurtz, Alberto Accomazzi, Carolyn Grant, Donna Thompson, Elizabeth Bohlen, Giovanni Di Milia, Jay Luker, Stephen S. Murray

Abstract: The universe of potentially interesting, searchable literature is expanding continuously. Besides the normal expansion, there is an additional influx of literature because of interdisciplinary boundaries becoming more and more diffuse. Hence, the need for accurate, efficient and intelligent search tools is bigger than ever. Even with a sophisticated search engine, looking for information can still result in overwhelming results. An overload of information has the intrinsic danger of scaring visitors away, and any organization, for-profit or not-for-profit, in the business of providing scholarly information wants to capture and keep the attention of its target audience. Publishers and search engine engineers alike will benefit from a service that is able to provide visitors with recommendations that closely meet their interests. Providing visitors with special deals, new options and highlights may be interesting to a certain degree, but what makes more sense (especially from a commercial point of view) than to let visitors do most of the work by the mere action of making choices? Hiring psychics is not an option, so a technological solution is needed to recommend items that a visitor is likely to be looking for. In this presentation we will introduce such a solution and argue that it is practically feasible to incorporate this approach into a useful addition to any information retrieval system with enough usage.

Submitted 13 May, 2010; originally announced May 2010.

Comments: Contribution to the proceedings of the colloquium Future Professional Communication in Astronomy II, 13-14 April 2010, Cambridge, Massachusetts. 11 pages, 4 figures.

arXiv:1005.1886 [pdf, other]

Towards a Resource-Centric Data Network for Astronomy

Authors: Alberto Accomazzi, Michael J. Kurtz, Stephen S. Murray

Abstract: Over the past decade, astronomers have been using an increasingly large number of web-based applications and archives to conduct their research. However, despite the early success in creating links across projects and data centers, the promise of a single integrated digital library environment supporting e-science in astronomy has proven elusive. While some of the issues hampering progress in this area are of a technical nature, others are rooted in existing policies which should be re-analyzed if further rapid progress is to be made in this area. This paper describes a proposal that the NASA Astrophysics Data System project has put forth in order to improve its role as one of the primary discovery portals for astronomers, focusing on those aspects which could benefit from an increased level of involvement from the community, namely the effort to expose astronomy resources as linked data, and the harvesting of observational metadata.

Submitted 11 May, 2010; originally announced May 2010.

Comments: 6 pages, 1 figure, proceedings of IAU Special Session 5, "Accelerating the Rate of Astronomical Discovery." To be published in Proceedings of Science

arXiv:1002.0386 [pdf, ps, other]

doi 10.1088/0004-6256/139/5/1857

Triggered Star Formation in Galaxy Pairs at z=0.08-0.38

Authors: Deborah Freedman Woods, Margaret J. Geller, Michael J. Kurtz, Eduard Westra, Daniel G. Fabricant, Ian Dell'Antonio

Abstract: We measure the strength, frequency, and timescale of tidally triggered star formation at redshift z=0.08-0.38 in a spectroscopically complete sample of galaxy pairs drawn from the magnitude-limited redshift survey of 9,825 Smithsonian Hectospec Lensing Survey (SHELS) galaxies with R<20.3. To examine the evidence for tidal triggering, we identify a volume-limited sample of major (|ΔM_R|<1.75, corresponding to mass ratio >1/5) pair galaxies with M_R < -20.8 in the redshift range z=0.08-0.31. The size and completeness of the spectroscopic survey allow us to focus on regions of low local density. The spectrophotometric calibration enables the use of the 4000 Å break (D_n4000), the Hα specific star formation rate (SSFR_{Hα}), and population models to characterize the galaxies. We show that D_n4000 is a useful population classification tool; it closely tracks the identification of emission line galaxies. The sample of major pair galaxies in regions of low local density with low D_n4000 demonstrates the expected anti-correlation between pair-wise projected separation and a set of star formation indicators explored in previous studies. We measure the frequency of triggered star formation by comparing the SSFR_{Hα} in the volume-limited sample in regions of low local density: 32 ± 7% of the major pair galaxies have SSFR_{Hα} at least double the median rate of the unpaired field galaxies. Comparison of stellar population models for pair and for unpaired field galaxies implies a timescale for triggered star formation of ~300-400 Myr.

Submitted 1 February, 2010; originally announced February 2010.

Comments: 25 pages, 15 figures. Accepted to AJ

arXiv:0912.5235 [pdf, ps, other]

Using Multipartite Graphs for Recommendation and Discovery

Authors: Michael J. Kurtz, Alberto Accomazzi, Edwin Henneken, Giovanni Di Milia, Carolyn S. Grant

Abstract: The Smithsonian/NASA Astrophysics Data System exists at the nexus of a dense system of interacting and interlinked information networks. The syntactic and the semantic content of this multipartite graph structure can be combined to provide very specific research recommendations to the scientist/user.

Submitted 30 December, 2009; originally announced December 2009.

Comments: To appear in ADASS XIX, ASP Conf Proc

arXiv:0912.2364 [pdf, ps, other]

doi 10.1088/0004-637X/709/2/832

SHELS: Testing Weak Lensing Maps with Redshift Surveys

Authors: Margaret J. Geller, Michael J. Kurtz, Ian P. Dell'Antonio, Massimo Ramella, Daniel G. Fabricant

Abstract: Weak lensing surveys are emerging as an important tool for the construction of "mass selected" clusters of galaxies. We evaluate both the efficiency and completeness of a weak lensing selection by combining a dense, complete redshift survey, the Smithsonian Hectospec Lensing Survey (SHELS), with a weak lensing map from the Deep Lens Survey (DLS). SHELS includes 11,692 redshifts for galaxies with R < 20.6 in the four square degree DLS field; the survey is a solid basis for identifying massive clusters of galaxies with redshift z < 0.55. The range of sensitivity of the redshift survey is similar to the range for the DLS convergence map. Only four of the twelve convergence peaks with signal-to-noise > 3.5 correspond to clusters of galaxies with M > 1.7 x 10^14 solar masses. Four of the eight massive clusters in SHELS are detected in the weak lensing map yielding a completeness of roughly 50%. We examine the seven known extended cluster x-ray sources in the DLS field: three can be detected in the weak lensing map, three should not be detected without boosting from superposed large-scale structure, and one is mysteriously undetected even though its optical properties suggest that it should produce a detectable lensing signal. Taken together, these results underscore the need for more extensive comparisons among different methods of massive cluster identification.
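
The efficiency and completeness quoted in this abstract are simple ratios of counts. A minimal sketch using only the counts stated above (twelve convergence peaks, four of them matched to massive clusters, eight massive clusters in total); the function names are illustrative and not taken from the paper:

```python
def efficiency(matched_peaks: int, total_peaks: int) -> float:
    """Fraction of lensing peaks that correspond to a real massive cluster."""
    return matched_peaks / total_peaks


def completeness(detected_clusters: int, total_clusters: int) -> float:
    """Fraction of known massive clusters recovered by the lensing map."""
    return detected_clusters / total_clusters


# Counts as stated in the abstract.
eff = efficiency(matched_peaks=4, total_peaks=12)
comp = completeness(detected_clusters=4, total_clusters=8)

print(f"efficiency:   {eff:.0%}")
print(f"completeness: {comp:.0%}")
```

The two ratios share a numerator here (four matched systems) but have different denominators, which is why a selection can be roughly 50% complete while only about a third of its peaks are real massive clusters.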

Submitted 11 December, 2009; originally announced December 2009.

Comments: 34 pages, 16 figures, ApJ accepted

Journal ref: Astrophys.J.709:832-850,2010

arXiv:0911.0417 [pdf, ps, other]

doi 10.1088/0004-637X/708/1/534

Evolution of the Halpha luminosity function

Authors: Eduard Westra, Margaret J. Geller, Michael J. Kurtz, Daniel G. Fabricant, Ian Dell'Antonio

Abstract: The Smithsonian Hectospec Lensing Survey (SHELS) is a window on the star formation history over the last 4 Gyr. SHELS is a spectroscopically complete survey for Rtot < 20.3 over 4 square degrees. We use the 10k spectra to select a sample of pure star forming galaxies based on their Halpha emission line. We use the spectroscopy to determine extinction corrections for individual galaxies and to remove active galaxies in order to reduce systematic uncertainties. We use the large volume of SHELS with the depth of a narrowband survey for Halpha galaxies at z ~ 0.24 to make a combined determination of the Halpha luminosity function at z ~ 0.24. The large area covered by SHELS yields a survey volume big enough to determine the bright end of the Halpha luminosity function from redshift 0.100 to 0.377 for an assumed fixed faint-end slope alpha = -1.20. The bright end evolves: the characteristic luminosity L* increases by 0.84 dex over this redshift range. Similarly, the star formation density increases by 0.11 dex. The fraction of galaxies with a close neighbor increases by a factor of 2-5 for L(Halpha) >~ L* in each of the redshift bins. We conclude that triggered star formation is an important influence for star forming galaxies with Halpha emission.

Submitted 3 November, 2009; originally announced November 2009.

Comments: 26 pages, 23 figures, submitted to ApJ; version with high resolution figures available at http://www.cfa.harvard.edu/~ewestra/publications/

Journal ref: Astrophys.J.708:534-549,2010

arXiv:0909.4789 [pdf]

doi 10.1002/asi.20096

The Bibliometric Properties of Article Readership Information

Authors: Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Markus Demleitner, Stephen S. Murray, Nathalie Martimbeau, Barbara Elwell

Abstract: The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed on-line digital library which has become the dominant means by which astronomers search, access and read their technical literature. Digital libraries such as the NASA Astrophysics Data System permit the easy accumulation of a new type of bibliometric measure, the number of electronic accesses ("reads") of individual articles. We explore various aspects of this new measure. We examine the obsolescence function as measured by actual reads, and show that it can be well fit by the sum of four exponentials with very different time constants. We compare the obsolescence function as measured by readership with the obsolescence function as measured by citations. We find that the citation function is proportional to the sum of two of the components of the readership function. This proves that the normative theory of citation is true in the mean. We further examine in detail the similarities and differences between the citation rate, the readership rate and the total citations for individual articles, and discuss some of the causes. Using the number of reads as a bibliometric measure for individuals, we introduce the read-cite diagram to provide a two-dimensional view of an individual's scientific productivity. We develop a simple model to account for an individual's reads and cites and use it to show that the position of a person in the read-cite diagram is a function of age, innate productivity, and work history.
We show the age biases of both reads and cites, and develop two new bibliometric measures which have substantially less age bias than citations.

Submitted 25 September, 2009; originally announced September 2009.

Comments: ADS bibcode: 2005JASIS..56..111K; this is the second paper (the first is "Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library") from the original article "The NASA Astrophysics Data System: Sociology, Bibliometrics, and Impact", which went on-line in the summer of 2003

Journal ref: The Journal of the American Society for Information Science and Technology, Vol. 56, p. 111 (2005)

arXiv:0909.4786 [pdf]

doi 10.1002/asi.20095

Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library

Authors: Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn Grant, Markus Demleitner, Stephen S. Murray

Abstract: By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create Second Order Bibliometric Operators, a customizable class of collaborative filters which permits substantially improved accuracy in literature queries. Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP) we develop an accurate model for world-wide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that country's per capita GDP. We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. We find that in 2002 it amounted to the equivalent of 736 FTE researchers, or $250 Million, or the astronomical research done in France. Subject headings: digital libraries; bibliometrics; sociology of science; information retrieval

Submitted 25 September, 2009; originally announced September 2009.

Comments: ADS bibcode: 2005JASIS..56...36K; this is a portion ("The bibliometric properties of article readership information" is the other part) of the article "The NASA Astrophysics Data System: Sociology, bibliometrics and impact", which went on-line in the summer of 2003

Journal ref: The Journal of the American Society for Information Science and Technology, Vol. 56, p. 36 (2005)
