-
Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method
Authors:
Jerson Francia,
Derek Hansen,
Ben Schooley,
Matthew Taylor,
Shydra Murray,
Greg Snow
Abstract:
This paper explores the rising concern of utilizing Large Language Models (LLMs) in spear phishing message generation, and their performance compared to human-authored counterparts. Our pilot study compares the effectiveness of smishing (SMS phishing) messages created by GPT-4 and human authors, which have been personalized to willing targets. The targets assessed the messages in a modified ranked-order experiment using a novel methodology we call TRAPD (Threshold Ranking Approach for Personalized Deception). Specifically, targets provide personal information (job title and location, hobby, item purchased online), spear smishing messages are created using this information by humans and GPT-4, targets are invited back to rank-order 12 messages from most to least convincing (and identify which they would click on), and then asked questions about why they ranked messages the way they did. They also guess which messages are created by an LLM and their reasoning. Results from 25 targets show that LLM-generated messages are most often perceived as more convincing than those authored by humans, with messages related to jobs being the most convincing. We characterize different criteria used when assessing the authenticity of messages including word choice, style, and personal relevance. Results also show that targets were unable to identify whether the messages were AI-generated or human-authored and struggled to identify criteria to use in order to make this distinction. This study aims to highlight the urgent need for further research and improved countermeasures against personalized AI-enabled social engineering attacks.
Submitted 18 June, 2024;
originally announced June 2024.
-
Heliophysics Discovery Tools for the 21st Century: Data Science and Machine Learning Structures and Recommendations for 2020-2050
Authors:
R. M. McGranaghan,
B. Thompson,
E. Camporeale,
J. Bortnik,
M. Bobra,
G. Lapenta,
S. Wing,
B. Poduval,
S. Lotz,
S. Murray,
M. Kirk,
T. Y. Chen,
H. M. Bain,
P. Riley,
B. Tremblay,
M. Cheung,
V. Delouille
Abstract:
Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.
Submitted 26 December, 2022;
originally announced December 2022.
-
NVRadarNet: Real-Time Radar Obstacle and Free Space Detection for Autonomous Driving
Authors:
Alexander Popov,
Patrik Gebhardt,
Ke Chen,
Ryan Oldja,
Heeseok Lee,
Shane Murray,
Ruchi Bhargava,
Nikolai Smolyanskiy
Abstract:
Detecting obstacles is crucial for safe and efficient autonomous driving. To this end, we present NVRadarNet, a deep neural network (DNN) that detects dynamic obstacles and drivable free space using automotive RADAR sensors. The network utilizes temporally accumulated data from multiple RADAR sensors to detect dynamic obstacles and compute their orientation in a top-down bird's-eye view (BEV). The network also regresses drivable free space to detect unclassified obstacles. Our DNN is the first of its kind to utilize sparse RADAR signals in order to perform obstacle and free space detection in real time from RADAR data only. The network has been successfully used for perception on our autonomous vehicles in real self-driving scenarios. The network runs faster than real time on an embedded GPU and shows good generalization across geographic regions.
Submitted 1 March, 2023; v1 submitted 28 September, 2022;
originally announced September 2022.
-
Avoiding bias when inferring race using name-based approaches
Authors:
Diego Kozlowski,
Dakota S. Murray,
Alexis Bell,
Will Hulsey,
Vincent Larivière,
Thema Monroe-White,
Cassidy R. Sugimoto
Abstract:
Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of racial based systemic inequalities is an important step towards a more equitable research system. However, because of the lack of robust information on authors' race, few large scale analyses have been performed on this topic. Algorithmic approaches offer one solution, using known information about authors, such as their names, to infer their perceived race. As with any other algorithm, the process of racial inference can generate biases if it is not carefully considered. The goal of this article is to assess the extent to which algorithmic bias is introduced using different approaches for name based racial inference. We use information from the U.S. Census and mortgage applications to infer the race of U.S. affiliated authors in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name based inference varies by race/ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article lays the foundation for more systematic and less biased investigations into racial disparities in science.
Submitted 12 October, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Mixed Likelihood Gaussian Process Latent Variable Model
Authors:
Samuel Murray,
Hedvig Kjellström
Abstract:
We present the Mixed Likelihood Gaussian process latent variable model (GP-LVM), capable of modeling data with attributes of different types. The standard formulation of GP-LVM assumes that each observation is drawn from a Gaussian distribution, which makes the model unsuited for data with e.g. categorical or nominal attributes. Our model, for which we use a sampling based variational inference, instead assumes a separate likelihood for each observed dimension. This formulation results in more meaningful latent representations, and gives better predictive performance for real world data with dimensions of different types.
Submitted 19 November, 2018;
originally announced November 2018.
-
Mapping the world population one building at a time
Authors:
Tobias G. Tiecke,
Xianming Liu,
Amy Zhang,
Andreas Gros,
Nan Li,
Gregory Yetman,
Talip Kilic,
Siobhan Murray,
Brian Blankespoor,
Espen B. Prydz,
Hai-Anh H. Dang
Abstract:
High resolution datasets of population density which accurately map sparsely-distributed human populations do not exist at a global scale. Typically, population data is obtained using censuses and statistical modeling. More recently, methods using remotely-sensed data have emerged, capable of effectively identifying urbanized areas. Obtaining high accuracy in estimation of population distribution in rural areas remains a very challenging task due to the simultaneous requirements of sufficient sensitivity and resolution to detect very sparse populations through remote sensing as well as reliable performance at a global scale. Here, we present a computer vision method based on machine learning to create population maps from satellite imagery at a global scale, with a spatial sensitivity corresponding to individual buildings and suitable for global deployment. By combining this settlement data with census data, we create population maps with ~30 meter resolution for 18 countries. We validate our method, and find that the building identification has an average precision and recall of 0.95 and 0.91, respectively and that the population estimates have a standard error of a factor ~2 or less. Based on our data, we analyze 29 percent of the world population, and show that 99 percent lives within 36 km of the nearest urban cluster. The resulting high-resolution population datasets have applications in infrastructure planning, vaccination campaign planning, disaster response efforts and risk analysis such as high accuracy flood risk analysis.
Submitted 15 December, 2017;
originally announced December 2017.
-
Real-Time Multiple Object Tracking - A Study on the Importance of Speed
Authors:
Samuel Murray
Abstract:
In this project, we implement a multiple object tracker, following the tracking-by-detection paradigm, as an extension of an existing method. It works by modelling the movement of objects by solving the filtering problem, and associating detections with predicted new locations in new frames using the Hungarian algorithm. Three different similarity measures are used, which use the location and shape of the bounding boxes. Compared to other trackers on the MOTChallenge leaderboard, our method, referred to as C++SORT, is the fastest non-anonymous submission, while also achieving decent score on other metrics. By running our model on the Okutama-Action dataset, sampled at different frame-rates, we show that the performance is greatly reduced when running the model - including detecting objects - in real-time. In most metrics, the score is reduced by 50%, but in certain cases as much as 90%. We argue that this indicates that other, slower methods could not be used for tracking in real-time, but that more research is required specifically on this.
Submitted 2 October, 2017; v1 submitted 11 September, 2017;
originally announced September 2017.
-
CODE-RADE - Community Infrastructure for the Delivery of Physics Applications
Authors:
Bruce Becker,
Sean Murray
Abstract:
Scientific computing can in some sense be distilled to the execution of an application - or rather sets of applications which are combined into complex workflows. Due to the complexity and number both of scientific packages as well as computing platforms, delivering these applications to end users has always been a significant challenge through the grid era, and remains so in the cloud era. In this contribution we describe a platform for user-driven, continuous integration and delivery of research applications in a distributed environment - project CODE-RADE. Starting with 6 hypotheses describing the problem at hand, we put forward technical and social solutions to these. Combining widely-used and thoroughly-tested tools, we show how it is possible to manage the dependencies and configurations of a wide range of scientific applications, in an almost fully-automated way. The CODE-RADE platform is a means for developing trust between public computing and data infrastructures on the one hand and various developer and scientific communities on the other hand. Predefined integration tests are specified for any new application, allowing the system to be user-driven. This greatly accelerates time-to-production for scientific applications, while reducing the workload for administrators of HPC, grid and cloud installations. Finally, we will give some insight into how this platform could be extended to address issues of reproducibility and collaboration in scientific research in Africa.
Submitted 31 July, 2017;
originally announced July 2017.
-
Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection
Authors:
Mohammadamin Barekatain,
Miquel Martí,
Hsueh-Fu Shih,
Samuel Murray,
Kotaro Nakayama,
Yutaka Matsuo,
Helmut Prendinger
Abstract:
Despite significant progress in the development of human action detection datasets and algorithms, no current dataset is representative of real-world aerial view scenarios. We present Okutama-Action, a new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transition of actions, significant changes in scale and aspect ratio, abrupt camera movement, as well as multi-labeled actors. As a result, our dataset is more challenging than existing ones, and will help push the field forward to enable real-world applications.
Submitted 15 June, 2017; v1 submitted 9 June, 2017;
originally announced June 2017.
-
ADS 2.0: new architecture, API and services
Authors:
Roman Chyla,
Alberto Accomazzi,
Alexandra Holachek,
Carolyn S. Grant,
Jonathan Elliott,
Edwin A. Henneken,
Donna M. Thompson,
Michael J. Kurtz,
Stephen S. Murray,
Vladimir Sudilovsky
Abstract:
The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS's search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using RESTful web services. Taking one step further, we will discuss how we plan to expose the treasure trove of information hosted by ADS (10 million records and fulltext for much of the Astronomy and Physics refereed literature) to partners interested in using this API. This will provide you (and your intelligent applications) with access to ADS's underlying data to enable the extraction of new knowledge and the ingestion of these results back into the ADS. Using this framework, researchers could run controlled experiments with content extraction, machine learning, natural language processing, etc. In this talk, we will discuss what is already implemented, what will be available soon, and where we are going next.
Submitted 19 March, 2015;
originally announced March 2015.
-
ADS: The Next Generation Search Platform
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Roman Chyla,
James Luker,
Carolyn S. Grant,
Donna M. Thompson,
Alexandra Holachek,
Rahul Dave,
Stephen S. Murray
Abstract:
Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. Starting in 2011, the ADS started to systematically collect, parse and index full-text documents for all the major publications in Physics and Astronomy as well as many smaller Astronomy journals and arXiv e-prints, for a total of over 3.5 million papers. Our citation coverage has doubled since 2010 and now consists of over 70 million citations. We are normalizing the affiliation information in our records and, in collaboration with the CfA library and NASA, we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform which affect all aspects of the system and its operations. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language which supports fielded, unfielded and functional searches. We are currently able to index acknowledgments, affiliations, citations, funding sources, and to the extent that these metadata are available to us they are now searchable under our new platform. The ADS private library system is being enhanced to support reading groups, collaborative editing of lists of papers, tagging, and a variety of privacy settings when managing one's paper collection. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/
Submitted 13 March, 2015;
originally announced March 2015.
-
Computing and Using Metrics in the ADS
Authors:
Edwin A. Henneken,
Alberto Accomazzi,
Michael J. Kurtz,
Carolyn S. Grant,
Donna Thompson,
Jay Luker,
Roman Chyla,
Alexandra Holachek,
Stephen S. Murray
Abstract:
Finding measures for research impact, be it for individuals, institutions, instruments or projects, has gained a lot of popularity. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication metrics have even been used in at least one recent court case. To support this need for research impact indicators, the SAO/NASA Astrophysics Data System (ADS) has developed a service which provides a broad overview of various impact measures. In this presentation we discuss how the ADS can be used to quench the thirst for impact measures. We will also discuss a couple of the lesser known indicators in the metrics overview and the main issues to be aware of when compiling publication-based metrics in the ADS, namely author name ambiguity and citation incompleteness.
Submitted 17 June, 2014;
originally announced June 2014.
-
Finding Your Literature Match -- A Recommender System
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Alberto Accomazzi,
Carolyn Grant,
Donna Thompson,
Elizabeth Bohlen,
Giovanni Di Milia,
Jay Luker,
Stephen S. Murray
Abstract:
The universe of potentially interesting, searchable literature is expanding continuously. Besides the normal expansion, there is an additional influx of literature because of interdisciplinary boundaries becoming more and more diffuse. Hence, the need for accurate, efficient and intelligent search tools is bigger than ever. Even with a sophisticated search engine, looking for information can still result in overwhelming results. An overload of information has the intrinsic danger of scaring visitors away, and any organization, for-profit or not-for-profit, in the business of providing scholarly information wants to capture and keep the attention of its target audience. Publishers and search engine engineers alike will benefit from a service that is able to provide visitors with recommendations that closely meet their interests. Providing visitors with special deals, new options and highlights may be interesting to a certain degree, but what makes more sense (especially from a commercial point of view) than to let visitors do most of the work by the mere action of making choices? Hiring psychics is not an option, so a technological solution is needed to recommend items that a visitor is likely to be looking for. In this presentation we will introduce such a solution and argue that it is practically feasible to incorporate this approach into a useful addition to any information retrieval system with enough usage.
Submitted 13 May, 2010;
originally announced May 2010.
-
The Bibliometric Properties of Article Readership Information
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Markus Demleitner,
Stephen S. Murray,
Nathalie Martimbeau,
Barbara Elwell
Abstract:
The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed on-line digital library which has become the dominant means by which astronomers search, access and read their technical literature. Digital libraries such as the NASA Astrophysics Data System permit the easy accumulation of a new type of bibliometric measure, the number of electronic accesses (``reads'') of individual articles. We explore various aspects of this new measure. We examine the obsolescence function as measured by actual reads, and show that it can be well fit by the sum of four exponentials with very different time constants. We compare the obsolescence function as measured by readership with the obsolescence function as measured by citations. We find that the citation function is proportional to the sum of two of the components of the readership function. This proves that the normative theory of citation is true in the mean. We further examine in detail the similarities and differences between the citation rate, the readership rate and the total citations for individual articles, and discuss some of the causes. Using the number of reads as a bibliometric measure for individuals, we introduce the read-cite diagram to provide a two-dimensional view of an individual's scientific productivity. We develop a simple model to account for an individual's reads and cites and use it to show that the position of a person in the read-cite diagram is a function of age, innate productivity, and work history. We show the age biases of both reads and cites, and develop two new bibliometric measures which have substantially less age bias than citations
Submitted 25 September, 2009;
originally announced September 2009.
-
Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Markus Demleitner,
Stephen S. Murray
Abstract:
By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create Second Order Bibliometric Operators, a customizable class of collaborative filters which permits substantially improved accuracy in literature queries.
Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP) we develop an accurate model for world-wide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that country's per capita GDP.
We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. We find that in 2002 it amounted to the equivalent of 736 FTE researchers, or $250 Million, or the astronomical research done in France.
Subject headings: digital libraries; bibliometrics; sociology of science; information retrieval
Submitted 25 September, 2009;
originally announced September 2009.
-
The Smithsonian/NASA Astrophysics Data System (ADS) Decennial Report
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Stephen S. Murray
Abstract:
Eight years after the ADS first appeared the last decadal survey wrote: "NASA's initiative for the Astrophysics Data System has vastly increased the accessibility of the scientific literature for astronomers. NASA deserves credit for this valuable initiative and is urged to continue it." Here we summarize some of the changes concerning the ADS which have occurred in the past ten years, and we describe the current status of the ADS. We then point out two areas where the ADS is building an improved capability which could benefit from a policy statement of support in the ASTRO2010 report. These are: The Semantic Interlinking of Astronomy Observations and Datasets and The Indexing of the Full Text of Astronomy Research Publications.
Submitted 18 March, 2009;
originally announced March 2009.
-
Use of Astronomical Literature - A Report on Usage Patterns
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these are indicative of how the astronomical literature is used. We will show how the use of the ADS has changed both quantitatively and qualitatively. We will also show that different types of users access the system in different ways. Finally, we show how use of the ADS has evolved over the years in various regions of the world.
The ADS is funded by NASA Grant NNG06GG68G.
Submitted 3 October, 2008; v1 submitted 1 August, 2008;
originally announced August 2008.
-
Finding Astronomical Communities Through Co-readership Analysis
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
Whenever a large group of people are engaged in an activity, communities will form. The nature of these communities depends on the relationship considered. In the group of people who regularly use scholarly literature, a relationship like ``person i and person j have cited the same paper'' might reveal communities of people working in a particular field. On this poster, we will investigate the relationship ``person i and person j have read the same paper''. Using the data logs of the NASA/Smithsonian Astrophysics Data System (ADS), we first determine the population that will participate by requiring that a user queries the ADS at a certain rate. Next, we apply the relationship to this population. The result of this will be an abstract ``relationship space'', which we will describe in terms of various ``representations''. Examples of such ``representations'' are the projection of co-read vectors onto Principal Components and the spectral density of the co-read network. We will show that the co-read relationship results in structure, we will describe this structure and we will provide a first attempt in the classification of this structure in terms of astronomical communities.
The ADS is funded by NASA Grant NNG06GG68G.
Submitted 5 January, 2007;
originally announced January 2007.
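The co-read relationship described in the abstract above ("person i and person j have read the same paper") can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `co_read_matrix`, the input layout, and the toy reading logs are all assumptions made for illustration. The sketch only builds the pairwise co-read counts; a principal-component or spectral analysis would then operate on a matrix of this kind.

```python
from itertools import combinations


def co_read_matrix(reads):
    """Build a symmetric co-read count matrix.

    reads: dict mapping a user id to the set of paper ids that user read
           (a hypothetical stand-in for usage logs).
    Returns (users, matrix), where matrix[i][j] is the number of papers
    read by both users[i] and users[j]. The diagonal is left at zero.
    """
    users = sorted(reads)
    n = len(users)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = len(reads[users[i]] & reads[users[j]])
        matrix[i][j] = matrix[j][i] = shared
    return users, matrix


# Made-up toy logs: u1 and u2 overlap, u3 reads something unrelated.
toy_reads = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3"},
    "u3": {"p9"},
}
users, matrix = co_read_matrix(toy_reads)
for user, row in zip(users, matrix):
    print(user, row)
```

In this toy input, users who share papers end up with non-zero entries, which is the kind of structure a community analysis would look for.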
-
Paper to Screen: Processing Historical Scans in the ADS
Authors:
Donna M. Thompson,
Alberto Accomazzi,
Guenther Eichhorn,
Carolyn Grant,
Edwin Henneken,
Michael J. Kurtz,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
The NASA Astrophysics Data System in conjunction with the Wolbach Library at the Harvard-Smithsonian Center for Astrophysics is working on a project to microfilm historical observatory publications. The microfilm is then scanned for inclusion in the ADS. The ADS currently contains over 700,000 scanned pages of volumes of historical literature. Many of these volumes lack clear pagination or other bibliographic data that are necessary to take advantage of the searching capabilities of the ADS. This paper will address some of the interesting challenges that needed to be resolved during the processing of the Observatory Reports included in the ADS.
Submitted 5 October, 2006;
originally announced October 2006.
-
Data in the ADS -- Understanding How to Use it Better
Authors:
Carolyn S. Grant,
Alberto Accomazzi,
Donna Thompson,
Edwin Henneken,
Guenther Eichhorn,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
The Smithsonian/NASA ADS Abstract Service contains a wealth of data for astronomers and librarians alike, yet the vast majority of usage consists of rudimentary searches. Hints on how to obtain more focused search results by using more of the various capabilities of the ADS are presented, including searching by affiliation. We also discuss the classification of articles by content and by referee status.
The ADS is funded by NASA Grant NNG06GG68G.
Submitted 5 October, 2006;
originally announced October 2006.
-
Creation and use of Citations in the ADS
Authors:
Alberto Accomazzi,
Gunther Eichhorn,
Michael J. Kurtz,
Carolyn S. Grant,
Edwin Henneken,
Markus Demleitner,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
With over 20 million records, the ADS citation database is regularly used by researchers and librarians to measure the scientific impact of individuals, groups, and institutions. In addition to the traditional sources of citations, the ADS has recently added references extracted from the arXiv e-prints on a nightly basis. We review the procedures used to harvest and identify the reference data used in the creation of citations, and the policies and procedures that we follow to avoid double-counting and to eliminate contributions that may not be scholarly in nature. Finally, we describe how users and institutions can easily obtain quantitative citation data from the ADS, both interactively and via web-based programming tools.
The ADS is available at http://ads.harvard.edu.
Submitted 3 October, 2006;
originally announced October 2006.
-
Connectivity in the Astronomy Digital Library
Authors:
Günther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Edwin A. Henneken,
Donna M. Thompson,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
The Astrophysics Data System (ADS) provides an extensive system of links between the literature and other on-line information. Recently, the journals of the American Astronomical Society (AAS) and a group of NASA data centers have collaborated to provide more links between on-line data obtained by space missions and the on-line journals. Authors can now specify which data sets they have used in their article. This information is used by the participants to provide the links between the literature and the data.
The ADS is available at: http://ads.harvard.edu
Submitted 2 October, 2006;
originally announced October 2006.
-
Full Text Searching in the Astrophysics Data System
Authors:
Günther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Edwin A. Henneken,
Donna M. Thompson,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
The Smithsonian/NASA Astrophysics Data System (ADS) provides a search system for the astronomy and physics scholarly literature. All major and many smaller astronomy journals that were published on paper have been scanned back to volume 1 and are available through the ADS free of charge. All scanned pages have been converted to text and can be searched through the ADS Full Text Search System. In addition, searches can be fanned out to several external search systems to include the literature published in electronic form. Results from the different search systems are combined into one results list.
The ADS Full Text Search System is available at: http://adsabs.harvard.edu/fulltext_service.html
Submitted 5 October, 2006; v1 submitted 2 October, 2006;
originally announced October 2006.
-
E-prints and Journal Articles in Astronomy: a Productive Co-existence
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Simeon Warner,
Paul Ginsparg,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
Are the e-prints (electronic preprints) from the arXiv repository being used instead of the journal articles? In this paper we show that the e-prints have not undermined the usage of journal papers in the astrophysics community. As soon as the journal article is published, the astronomical community prefers to read the journal article and the use of e-prints through the NASA Astrophysics Data System drops to zero. This suggests that the majority of astronomers have access to institutional subscriptions and that they choose to read the journal article when given the choice. Within the NASA Astrophysics Data System they are given this choice, because the e-print and the journal article are treated equally, since both are just one click away. In other words, the e-prints have not undermined journal use in the astrophysics community and thus currently do not pose a financial threat to the publishers. We present readership data for the arXiv category "astro-ph" and the 4 core journals in astronomy (Astrophysical Journal, Astronomical Journal, Monthly Notices of the Royal Astronomical Society and Astronomy & Astrophysics). Furthermore, we show that the half-life (the point where the use of an article drops to half the use of a newly published article) for an e-print is shorter than for a journal paper.
The ADS is funded by NASA Grant NNG06GG68G. arXiv receives funding from NSF award #0404553
Submitted 22 September, 2006;
originally announced September 2006.
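The half-life defined in the abstract above (the point where the use of an article drops to half the use of a newly published article) can be sketched as follows. This is an illustrative reading of that definition, not the authors' method: the function name `half_life`, the monthly binning, and the read counts are all assumptions, and the numbers are made up rather than taken from ADS data.

```python
def half_life(reads_per_month):
    """Return the index of the first month in which reads fall to half
    (or less) of the first month's reads, or None if that never happens.

    reads_per_month: list of read counts, month 0 being the month of
                     publication (a hypothetical binning choice).
    """
    if not reads_per_month:
        return None
    threshold = reads_per_month[0] / 2
    for month, reads in enumerate(reads_per_month):
        if reads <= threshold:
            return month
    return None


# Made-up read counts, for illustration only.
toy_eprint_reads = [100, 60, 45, 30]
toy_journal_reads = [100, 90, 70, 55, 48]

print(half_life(toy_eprint_reads))   # first month at or below 50 reads
print(half_life(toy_journal_reads))
```

With these invented series the e-print reaches the threshold earlier than the journal article, which mirrors the direction of the comparison stated in the abstract but says nothing about its actual size.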
-
The Future of Technical Libraries
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Edwin Henneken,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
Technical libraries are currently experiencing very rapid change. In the near future their mission will change, their physical nature will change, and the skills of their employees will change. While some will not be able to make these changes, and will fail, others will lead us into a new era.
Submitted 28 September, 2006;
originally announced September 2006.
-
myADS-arXiv - a Tailor-Made, Open Access, Virtual Journal
Authors:
E. Henneken,
M. J. Kurtz,
G. Eichhorn,
A. Accomazzi,
C. S. Grant,
D. Thompson,
E. Bohlen,
S. S. Murray
Abstract:
The myADS-arXiv service provides the scientific community with a one-stop shop for staying up to date with a researcher's field of interest. The service provides a powerful and unique filter on the enormous amount of bibliographic information added to the ADS on a daily basis. It also provides a complete view of the most relevant papers available in the subscriber's field of interest. With this service, the subscriber will get to know the latest developments, popular trends and the most important papers. This makes the service unique not only from a technical point of view, but also from a content point of view. On this poster we will argue why myADS-arXiv is a tailor-made, open access, virtual journal and we will illustrate its unique character.
Submitted 4 August, 2006;
originally announced August 2006.
-
Effect of E-printing on Citation Rates in Astronomy and Physics
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Donna Thompson,
Stephen S. Murray
Abstract:
In this report we examine the change in citation behavior since the introduction of the arXiv e-print repository (Ginsparg, 2001). It has been observed that papers that initially appear as arXiv e-prints get cited more than papers that do not (Lawrence, 2001; Brody et al., 2004; Schwarz & Kennicutt, 2004; Kurtz et al., 2005a; Metcalfe, 2005). Using the citation statistics from the NASA-Smithsonian Astrophysics Data System (ADS; Kurtz et al., 1993, 2000), we confirm the findings from other studies, we examine the average citation rate to e-printed papers in the Astrophysical Journal, and we show that for a number of major astronomy and physics journals the most important papers are submitted to the arXiv e-print repository first.
Submitted 5 June, 2006; v1 submitted 13 April, 2006;
originally announced April 2006.
-
Bibliographic Classification using the ADS Databases
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Guenther Eichhorn,
Edwin Henneken,
Carolyn S. Grant,
Markus Demleitner,
Stephen S. Murray
Abstract:
We discuss two techniques used to characterize bibliographic records based on their similarity to and relationship with the contents of the NASA Astrophysics Data System (ADS) databases. The first method has been used to classify input text as being relevant to one or more subject areas based on an analysis of the frequency distribution of its individual words. The second method has been used to classify existing records as being relevant to one or more databases based on the distribution of the papers citing them. Both techniques have proven to be valuable tools in assigning new and existing bibliographic records to different disciplines within the ADS databases.
Submitted 31 October, 2005;
originally announced November 2005.
-
The Effect of Use and Access on Citations
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Markus Demleitner,
Edwin Henneken,
Stephen S. Murray
Abstract:
It has been shown (S. Lawrence, 2001, Nature, 411, 521) that journal articles which have been posted without charge on the internet are more heavily cited than those which have not been. Using data from the NASA Astrophysics Data System (ads.harvard.edu) and from the arXiv e-print archive at Cornell University (arXiv.org), we examine the causes of this effect.
Submitted 14 March, 2005;
originally announced March 2005.
-
Automated Resolution of Noisy Bibliographic References
Authors:
Markus Demleitner,
Michael Kurtz,
Alberto Accomazzi,
Günther Eichhorn,
Carolyn S. Grant,
Steven S. Murray
Abstract:
We describe a system used by the NASA Astrophysics Data System to identify bibliographic references obtained from scanned article pages by OCR methods with records in a bibliographic database. We analyze the process generating the noisy references and conclude that the three-step procedure of correcting the OCR results, parsing the corrected string and matching it against the database provides unsatisfactory results. Instead, we propose a method that allows a controlled merging of correction, parsing and matching, inspired by dependency grammars. We also report on the effectiveness of various heuristics that we have employed to improve recall.
Submitted 27 January, 2004;
originally announced January 2004.
-
Run Control and Monitor System for the CMS Experiment
Authors:
M. Bellato,
L. Berti,
V. Brigljevic,
G. Bruno,
E. Cano,
S. Cittolin,
A. Csilling,
S. Erhan,
D. Gigi,
F. Glege,
R. Gomez-Reino,
M. Gulmini,
J. Gutleber,
C. Jacobs,
M. Kozlovszky,
H. Larsen,
I. Magrans,
G. Maron,
F. Meijers,
E. Meschi,
S. Murray,
A. Oh,
L. Orsini,
L. Pollet,
A. Racz,
et al. (8 additional authors not shown)
Abstract:
The Run Control and Monitor System (RCMS) of the CMS experiment is the set of hardware and software components responsible for controlling and monitoring the experiment during data-taking. It provides users with a "virtual counting room", enabling them to operate the experiment and to monitor detector status and data quality from any point in the world. This paper describes the architecture of the RCMS with particular emphasis on its scalability through a distributed collection of nodes arranged in a tree-based hierarchy. The current implementation of the architecture in a prototype RCMS used in test beam setups, detector validations and DAQ demonstrators is documented. A discussion of the key technologies used, including Web Services, and the results of tests performed with a 128-node system are presented.
Submitted 18 June, 2003;
originally announced June 2003.