
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Authors: Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou

Abstract: We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
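The abstract does not spell out the estimator, so the following is only a minimal sketch of corpus-level mixture estimation in general, not the authors' implementation. It assumes each document already has a likelihood `p` under a human-written reference distribution and `q` under an AI-generated reference distribution (both strictly positive), treats the corpus as drawn from the mixture (1 - α)P + αQ, and finds the α in [0, 1] that maximizes the log-likelihood. The function names `score` and `estimate_alpha` are hypothetical.

```python
from typing import Sequence


def score(alpha: float, p: Sequence[float], q: Sequence[float]) -> float:
    """Derivative of sum(log((1 - alpha) * p_i + alpha * q_i)) with respect to alpha."""
    return sum((qi - pi) / ((1.0 - alpha) * pi + alpha * qi) for pi, qi in zip(p, q))


def estimate_alpha(p: Sequence[float], q: Sequence[float], tol: float = 1e-10) -> float:
    """Maximum likelihood estimate of the mixture weight alpha in [0, 1].

    The log-likelihood is concave in alpha, so its derivative is non-increasing
    and the maximizer can be found by bisection on the derivative.
    Assumes every p_i and q_i is strictly positive.
    """
    if score(0.0, p, q) <= 0.0:
        return 0.0  # likelihood is maximized with no AI-generated share
    if score(1.0, p, q) >= 0.0:
        return 1.0  # likelihood is maximized by an all-AI corpus
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, p, q) > 0.0:
            lo = mid  # log-likelihood still increasing: maximizer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy corpus with two hypothetical document "types": P assigns them 0.9 / 0.1,
# Q assigns 0.2 / 0.8; 70 documents are of the first type and 30 of the second.
p = [0.9] * 70 + [0.1] * 30
q = [0.2] * 70 + [0.8] * 30
print(round(estimate_alpha(p, q), 4))  # 0.2857, i.e. 2/7
```

The toy answer can be checked by hand: the mixture probability of the second type, 0.1 + 0.7α, must match its observed frequency 0.3, which gives α = 2/7.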

Submitted 15 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 46 pages, 31 figures, ICML '24

ACM Class: I.2.7

arXiv:2310.03193

The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices

Authors: Hancheng Cao, Jesse Dodge, Kyle Lo, Daniel A. McFarland, Lucy Lu Wang

Abstract: In recent years, funding agencies and journals increasingly advocate for open science practices (e.g. data and method sharing) to improve the transparency, access, and reproducibility of science. However, quantifying these practices at scale has proven difficult. In this work, we leverage a large-scale dataset of 1.1M papers from arXiv that are representative of the fields of physics, math, and computer science to analyze the adoption of data and method link-sharing practices over time and their impact on article reception. To identify links to data and methods, we train a neural text classification model to automatically classify URL types based on contextual mentions in papers. We find evidence that the practice of link-sharing to methods and data is spreading as more papers include such URLs over time. Reproducibility efforts may also be spreading because the same links are being increasingly reused across papers (especially in computer science); and these links are increasingly concentrated within fewer web domains (e.g. Github) over time. Lastly, articles that share data and method links receive increased recognition in terms of citation count, with a stronger effect when the shared links are active (rather than defunct). Together, these findings demonstrate the increased spread and perceived value of data and method sharing practices in open science.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.01783

Can large language models provide useful feedback on research papers? A large-scale empirical analysis

Authors: Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Smith, Yian Yin, Daniel McFarland, James Zou

Abstract: Expert feedback lays the foundation of rigorous research. However, the rapid growth of scholarly production and intricate knowledge specialization challenge the conventional scientific feedback mechanisms. High-quality peer reviews are increasingly difficult to obtain. Researchers who are more junior or from under-resourced settings have especially hard times getting timely feedback. With the breakthrough of large language models (LLM) such as GPT-4, there is growing interest in using LLMs to generate scientific feedback on research manuscripts. However, the utility of LLM-generated feedback has not been systematically studied. To address this gap, we created an automated pipeline using GPT-4 to provide comments on the full PDFs of scientific papers. We evaluated the quality of GPT-4's feedback through two large-scale studies. We first quantitatively compared GPT-4's generated feedback with human peer reviewer feedback in 15 Nature family journals (3,096 papers in total) and the ICLR machine learning conference (1,709 papers). The overlap in the points raised by GPT-4 and by human reviewers (average overlap 30.85% for Nature journals, 39.23% for ICLR) is comparable to the overlap between two human reviewers (average overlap 28.58% for Nature journals, 35.25% for ICLR). The overlap between GPT-4 and human reviewers is larger for the weaker papers.
We then conducted a prospective user study with 308 researchers from 110 US institutions in the field of AI and computational biology to understand how researchers perceive feedback generated by our GPT-4 system on their own papers. Overall, more than half (57.4%) of the users found GPT-4 generated feedback helpful/very helpful and 82.4% found it more beneficial than feedback from at least some human reviewers. While our findings show that LLM-generated feedback can help researchers, we also identify several limitations.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2301.13431

doi 10.1145/3544548.3581108

Breaking Out of the Ivory Tower: A Large-scale Analysis of Patent Citations to HCI Research

Authors: Hancheng Cao, Yujie Lu, Yuting Deng, Daniel A. McFarland, Michael S. Bernstein

Abstract: What is the impact of human-computer interaction research on industry? While it is impossible to track all research impact pathways, the growing literature on translational research impact measurement offers patent citations as one measure of how industry recognizes and draws on research in its inventions. In this paper, we perform a large-scale measurement study primarily of 70,000 patent citations to premier HCI research venues, tracing how HCI research is cited in United States patents over the last 30 years. We observe that 20.1% of papers from these venues, including 60-80% of papers at UIST and 13% of papers in a broader dataset of SIGCHI-sponsored venues overall, are cited by patents, a far greater share than premier venues in science overall (9.7%) and NLP (11%). However, the time lag between a patent and its paper citations is long (10.5 years) and getting longer, suggesting that HCI research and practice may not be efficiently connected.

Submitted 31 January, 2023; originally announced January 2023.

Comments: accepted to CHI 2023

arXiv:2204.10676

A Bayesian actor-oriented multilevel relational event model with hypothesis testing procedures

Authors: Fabio Vieira, Roger Leenders, Daniel McFarland, Joris Mulder

Abstract: Relational event network data are becoming increasingly available. Consequently, statistical models for such data have also surfaced. These models mainly focus on the analysis of single networks, while in many applications, multiple independent event sequences are observed, which are likely to display similar social interaction dynamics. Furthermore, statistical methods for testing hypotheses about social interaction behavior are underdeveloped. Therefore, the contribution of the current paper is twofold. First, we present a multilevel extension of the dynamic actor-oriented model, which allows researchers to model sender and receiver processes separately. The multilevel formulation enables principled probabilistic borrowing of information across networks to accurately estimate drivers of social dynamics. Second, a flexible methodology is proposed to test hypotheses about common and heterogeneous social interaction drivers across relational event sequences. Social interaction data between children and teachers in classrooms are used to showcase the methodology.

Submitted 7 June, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

arXiv:2012.07571

doi 10.1103/PhysRevLett.126.121801

Study of the $K_L \!\to\! \pi^0 \nu \overline{\nu}$ Decay at the J-PARC KOTO Experiment

Authors: KOTO Collaboration, J. K. Ahn, B. Beckford, M. Campbell, S. H. Chen, J. Comfort, K. Dona, M. S. Farrington, K. Hanai, N. Hara, H. Haraguchi, Y. B. Hsiung, M. Hutcheson, T. Inagaki, M. Isoe, I. Kamiji, T. Kato, E. J. Kim, J. L. Kim, H. M. Kim, T. K. Komatsubara, K. Kotera, S. K. Lee, J. W. Lee, G. Y. Lim, et al. (46 additional authors not shown)

Abstract: The rare decay $K_L \!\to\! \pi^0 \nu \overline{\nu}$ was studied with the dataset taken at the J-PARC KOTO experiment in 2016, 2017, and 2018. With a single event sensitivity of $( 7.20 \pm 0.05_{\rm stat} \pm 0.66_{\rm syst} ) \times 10^{-10}$, three candidate events were observed in the signal region. After unveiling them, contaminations from $K^{\pm}$ and scattered $K_L$ decays were studied, and the total number of background events was estimated to be $1.22 \pm 0.26$. We conclude that the number of observed events is statistically consistent with the background expectation. For this dataset, we set an upper limit of $4.9 \times 10^{-9}$ on the branching fraction of $K_L \!\to\! \pi^0 \nu \overline{\nu}$ at the 90% confidence level.
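As a rough illustration of why three observed candidates can be "statistically consistent" with an expectation of 1.22 background events, the sketch below computes a simple Poisson tail probability. It treats the background as exactly 1.22 (ignoring the quoted ±0.26 uncertainty) and is not presented as the collaboration's statistical procedure; `poisson_tail` is a hypothetical helper.

```python
import math


def poisson_tail(n_obs: int, mu: float) -> float:
    """Probability P(N >= n_obs) for N ~ Poisson(mu)."""
    below = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n_obs))
    return 1.0 - below


# Values quoted in the abstract: 3 observed candidates, 1.22 expected background events.
p_value = poisson_tail(3, 1.22)
print(f"P(N >= 3 | mu = 1.22) = {p_value:.3f}")  # 0.125
```

A background-only fluctuation to three or more events happens roughly one time in eight under these simplifying assumptions, which is not unusual.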

Submitted 24 March, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

Comments: 6 pages, 5 figures; published version, no change in the results. Fig. 2 and Fig. 4 were revised. In Fig. 5, the plot of the $P_{t}$ versus ${Z_{\mathrm{vtx}}}$ of the beam-halo $K_L\!\to\!2\gamma$ background events was added as Fig. 5 (b)

Journal ref: Phys. Rev. Lett. 126, 121801 (2021)

arXiv:2010.06657

Will This Idea Spread Beyond Academia? Understanding Knowledge Transfer of Scientific Concepts across Text Corpora

Authors: Hancheng Cao, Mengjie Cheng, Zhepeng Cen, Daniel A. McFarland, Xiang Ren

Abstract: What kind of basic research ideas are more likely to get applied in practice? There is a long line of research investigating patterns of knowledge transfer, but it generally focuses on documents as the unit of analysis and follows their transfer into practice for a specific scientific domain. Here we study translational research at the level of scientific concepts for all scientific fields. We do this through text mining and predictive modeling using three corpora: 38.6 million paper abstracts, 4 million patent documents, and 0.28 million clinical trials. We extract scientific concepts (i.e., phrases) from corpora as instantiations of "research ideas", create concept-level features as motivated by the literature, and then follow the trajectories of over 450,000 new concepts (which emerged between 1995 and 2014) to identify factors that lead only a small proportion of these ideas to be used in inventions and drug trials. Results from our analysis suggest several mechanisms that distinguish which scientific concept will be adopted in practice, and which will not. We also demonstrate that our derived features can be used to explain and predict knowledge transfer with high accuracy. Our work provides greater understanding of knowledge transfer for researchers, practitioners, and government agencies interested in encouraging translational research.

Submitted 13 October, 2020; originally announced October 2020.

Comments: EMNLP 2020 Findings

arXiv:2007.13469

A Preliminary Investigation in the Molecular Basis of Host Shutoff Mechanism in SARS-CoV

Authors: Niharika Pandala, Casey A. Cole, Devaun McFarland, Anita Nag, Homayoun Valafar

Abstract: Recent events leading to the worldwide pandemic of COVID-19 have demonstrated the effective use of genomic sequencing technologies to establish the genetic sequence of this virus. In contrast, the COVID-19 pandemic has demonstrated the absence of computational approaches to understand the molecular basis of this infection rapidly. Here we present an integrated approach to the study of the nsp1 protein in SARS-CoV-1, which plays an essential role in maintaining the expression of viral proteins and further disabling the host protein expression, also known as the host shutoff mechanism. We present three independent methods of evaluating two potential binding sites speculated to participate in host shutoff by nsp1. We have combined results from computed models of nsp1, with deep mining of all existing protein structures (using PDBMine), and binding site recognition (using msTALI) to examine the two sites consisting of residues 55-59 and 73-80. Based on our preliminary results, we conclude that the residues 73-80 appear as the regions that facilitate the critical initial steps in the function of nsp1. Given the 90% sequence identity between nsp1 from SARS-CoV-1 and SARS-CoV-2, we conjecture the same critical initiation step in the function of COVID-19 nsp1.

Submitted 23 July, 2020; originally announced July 2020.

Comments: Consists of 9 pages, 8 figures and 7 tables. 11th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics 2020

arXiv:2006.14918

doi 10.1103/PhysRevD.102.051103

First Search for the $K_L \to \pi^0 \gamma$ Decay

Authors: J. K. Ahn, B. Beckford, M. Campbell, S. H. Chen, J. M. Choi, J. Comfort, K. Dona, M. S. Farrington, N. Hara, H. Haraguchi, Y. B. Hsiung, M. Hutcheson, T. Inagaki, M. Isoe, I. Kamiji, E. J. Kim, J. L. Kim, H. M. Kim, T. K. Komatsubara, K. Kotera, J. W. Lee, G. Y. Lim, C. Lin, Q. S. Lin, Y. Luo, et al. (37 additional authors not shown)

Abstract: We report the first search for the $K_L \to \pi^0 \gamma$ decay, which is forbidden by Lorentz invariance, using the data from 2016 to 2018 at the J-PARC KOTO experiment. With a single event sensitivity of $(7.1\pm 0.3_{\rm stat.} \pm 1.6_{\rm syst.})\times 10^{-8}$, no candidate event was observed in the signal region. The upper limit on the branching fraction was set to be $1.7\times 10^{-7}$ at the 90% confidence level.

Submitted 24 March, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

Journal ref: Phys. Rev. D 102, 051103 (2020)

arXiv:2005.10739

Assessing the Precision and Recall of msTALI as Applied to an Active-Site Study on Fold Families

Authors: Devaun McFarland, Homayoun Valafar

Abstract: Proteins execute various activities required by biological cells. Further, they structurally support and promote important biochemical reactions which functionally are sparked by active-sites. Active-sites are regions where reactions and binding events take place directly; they foster protein purpose. Describing functional relationships depends on factors that incorporate sequence, structure, and the biochemical properties of amino acids that form proteins. Our approach to active-site description is computational, and many other approaches utilizing available protein data fall short of ideal. Successful recognition of functional interactions is crucial to advancements in protein annotation and the bioinformatics field at large. This research outlines our Multiple Structure Torsion Angle Alignment (msTALI) as a suitable strategy for addressing active-site identification by comparing results to other existing methods. Specifically, we address the precision of msTALI across three protein families. Our target proteins are PDBIDs 1A2B, 1B4V, 1B8S, 1COY, 1CXZ, 3COX, 1D7E, 1DPF, 1F9I, 1FTN, 1IJH, 1KOU, 1NWZ, 2PHY, and 1SIC.
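The abstract does not define how precision and recall are scored for msTALI. One common residue-level reading, assumed here purely for illustration, compares the set of residues a method predicts as active-site residues with an annotated reference set. The helper name and residue numbers below are hypothetical and are not taken from the paper.

```python
def precision_recall(predicted: set[int], annotated: set[int]) -> tuple[float, float]:
    """Residue-level precision and recall of a predicted active-site set."""
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical residue numbers, for illustration only.
predicted = {12, 13, 14, 40, 41, 42}
annotated = {40, 41, 42, 43, 44}
print(precision_recall(predicted, annotated))  # (0.5, 0.6)
```

Precision penalizes predicted residues outside the annotated site (12 to 14 here), while recall penalizes annotated residues the method misses (43 and 44).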

Submitted 7 May, 2020; originally announced May 2020.

Comments: 8 pages, 3 figures, 5 tables. This is an extended version of a similar abridged or short version used in conference

arXiv:2004.01291

Mapping Three Decades of Intellectual Change in Academia

Authors: Daniel Ramage, Christopher D. Manning, Daniel A. McFarland

Abstract: Research on the development of science has focused on the creation of multidisciplinary teams. However, while this coming together of people is symmetrical, the ideas, methods, and vocabulary of science have a directional flow. We present a statistical model of the text of dissertation abstracts from 1980 to 2010, revealing for the first time the large-scale flow of language across fields. Results of the analysis include identifying methodological fields that export broadly, emerging topical fields that borrow heavily and expand, and old topical fields that grow insular and retract. Particular findings show a growing split between molecular and ecological forms of biology and a sea change in the humanities and social sciences driven by the rise of gender and ethnic studies.

Submitted 18 June, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

Comments: 10 pages and 6 figures plus appendix of 5 pages and 1 figure

arXiv:1909.02063

doi 10.1073/pnas.1915378117

The Diversity-Innovation Paradox in Science

Authors: Bas Hofstra, Vivek V. Kulkarni, Sebastian Munoz-Najar Galvez, Bryan He, Dan Jurafsky, Daniel A. McFarland

Abstract: Prior work finds a diversity paradox: diversity breeds innovation, and yet, underrepresented groups that diversify organizations have less successful careers within them. Does the diversity paradox hold for scientists as well? We study this by utilizing a near-population of ~1.2 million US doctoral recipients from 1977-2015 and following their careers into publishing and faculty positions. We use text analysis and machine learning to answer a series of questions: How do we detect scientific innovations? Are underrepresented groups more likely to generate scientific innovations? And are the innovations of underrepresented groups adopted and rewarded? Our analyses show that underrepresented groups produce higher rates of scientific novelty. However, their novel contributions are devalued and discounted: e.g., novel contributions by gender and racial minorities are taken up by other scholars at lower rates than novel contributions by gender and racial majorities, and equally impactful contributions of gender and racial minorities are less likely to result in successful scientific careers than for majority groups. These results suggest there may be unwarranted reproduction of stratification in academic careers that discounts diversity's role in innovation and partly explains the underrepresentation of some groups in academia.

Submitted 15 January, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

Comments: Updated paper; tightened up terminology, added better theoretical explanation, tested for a mechanism in the updated paper, added robustness analyses, updated and improved metrics across the board

arXiv:1810.09655

doi 10.1103/PhysRevLett.122.021802

Search for $K_L \!\to\! \pi^0 \nu \overline{\nu}$ and $K_L \!\to\! \pi^0 X^0$ Decays at the J-PARC KOTO Experiment

Authors: KOTO Collaboration, J. K. Ahn, B. Beckford, J. Beechert, K. Bryant, M. Campbell, S. H. Chen, J. Comfort, K. Dona, N. Hara, H. Haraguchi, Y. B. Hsiung, M. Hutcheson, T. Inagaki, I. Kamiji, N. Kawasaki, E. J. Kim, J. L. Kim, Y. J. Kim, J. W. Ko, T. K. Komatsubara, K. Kotera, A. S. Kurilin, J. W. Lee, G. Y. Lim, et al. (45 additional authors not shown)

Abstract: A search for the rare decay $K_L \!\to\! \pi^0 \nu \overline{\nu}$ was performed. With the data collected in 2015, corresponding to $2.2 \times 10^{19}$ protons on target, a single event sensitivity of $( 1.30 \pm 0.01_{\rm stat} \pm 0.14_{\rm syst} ) \times 10^{-9}$ was achieved and no candidate events were observed. We set an upper limit of $3.0 \times 10^{-9}$ for the branching fraction of $K_L \!\to\! \pi^0 \nu \overline{\nu}$ at the 90% confidence level (C.L.), which improved the previous limit by almost an order of magnitude. An upper limit for $K_L \!\to\! \pi^0 X^0$ was also set as $2.4 \times 10^{-9}$ at the 90% C.L., where $X^0$ is an invisible boson with a mass of $135~{\rm MeV}/c^2$.
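With zero observed events, a back-of-the-envelope version of the quoted limit multiplies the single event sensitivity by the 90% C.L. Poisson upper limit on the mean for zero events, N = -ln(0.1) ≈ 2.303. The sketch below assumes no background and ignores the systematic uncertainty on the sensitivity, so it only approximates whatever procedure the collaboration actually used; the function names are hypothetical.

```python
import math


def zero_event_poisson_limit(cl: float = 0.90) -> float:
    """Upper limit on a Poisson mean when zero events are observed and no background is assumed.

    Solves exp(-N) = 1 - cl, i.e. N = -ln(1 - cl); about 2.303 at 90% C.L.
    """
    return -math.log(1.0 - cl)


def branching_fraction_limit(single_event_sensitivity: float, cl: float = 0.90) -> float:
    """Branching-fraction limit = (event-count limit) x (single event sensitivity)."""
    return zero_event_poisson_limit(cl) * single_event_sensitivity


limit = branching_fraction_limit(1.30e-9)
print(f"{limit:.2e}")  # 2.99e-09, consistent with the quoted 3.0e-9
```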

Submitted 26 February, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

Comments: 6 pages, 4 figures; published version, no change in the results. Fig. 1, Fig. 2, and Fig. 3 were revised, and Fig. 4 presenting the $K_L \!\to\! \pi^0 X^0$ limit as a function of the $X^0$ mass was added

Journal ref: Phys. Rev. Lett. 122, 021802 (2019)

arXiv:1609.03637

doi 10.1093/ptep/ptx001

A new search for the $K_{L} \to \pi^0 \nu \overline{\nu}$ and $K_{L} \to \pi^{0} X^{0}$ decays

Authors: J. K. Ahn, K. Y. Baek, S. Banno, B. Beckford, B. Brubaker, T. Cai, M. Campbell, C. Carruth, S. H. Chen, S. Chu, J. Comfort, Y. T. Duh, T. Furukawa, H. Haraguchi, T. Hineno, Y. B. Hsiung, M. Hutcheson, T. Inagaki, M. Isoe, E. Iwai, T. Kamibayashi, I. Kamiji, N. Kawasaki, E. J. Kim, Y. J. Kim, et al. (66 additional authors not shown)

Abstract: We searched for the $CP$-violating rare decay of neutral kaon, $K_{L} \to \pi^0 \nu \overline{\nu}$, in data from the first 100 hours of physics running in 2013 of the J-PARC KOTO experiment. One candidate event was observed while $0.34\pm0.16$ background events were expected. We set an upper limit of $5.1\times10^{-8}$ for the branching fraction at the 90% confidence level (C.L.). An upper limit of $3.7\times10^{-8}$ at the 90% C.L. for the $K_{L} \to \pi^{0} X^{0}$ decay was also set for the first time, where $X^{0}$ is an invisible particle with a mass of 135 MeV/$c^{2}$.

Submitted 27 December, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

Comments: 11 pages, 5 figures, to be published in PTEP

Journal ref: Prog. Theor. Exp. Phys. (2017) 021C01

arXiv:1609.00435

Citation Classification for Behavioral Analysis of a Scientific Field

Authors: David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, Dan Jurafsky

Abstract: Citations are an important indicator of the state of a scientific field, reflecting how authors frame their work, and influencing uptake by future scholars. However, our understanding of citation behavior has been limited to small-scale manual citation analysis. We perform the largest behavioral study of citations to date, analyzing how citations are both framed and taken up by scholars in one entire field: natural language processing. We introduce a new dataset of nearly 2,000 citations annotated for function and centrality, and use it to develop a state-of-the-art classifier and label the entire ACL Reference Corpus. We then study how citations are framed by authors and use both papers and online traces to track how citations are followed by readers. We demonstrate that authors are sensitive to discourse structure and publication venue when citing, that online readers follow temporal links to previous and future work rather than methodological links, and that how a paper cites related work is predictive of its citation count. Finally, we use changes in citation roles to show that the field of NLP is undergoing a significant increase in consensus.

Submitted 1 September, 2016; originally announced September 2016.

arXiv:1606.09305

Experimental Study of Nonlinear Resonances and Anti-resonances in a Forced, Ordered Granular Chain

Authors: Yi**g Zhang, Dmitry Pozharskiy, D. Michael McFarland, Panayotis G. Kevrekidis, Ioannis G. Kevrekidis, Alexander F. Vakakis

Abstract: We experimentally study a one-dimensional uncompressed granular chain composed of a finite number of identical spherical beads with Hertzian interactions. The chain is harmonically excited by an amplitude- and frequency-dependent boundary drive at its left end and has a fixed boundary at its right end. Such ordered granular media represent an interesting new class of nonlinear acoustic metamaterials, since they exhibit essentially nonlinear acoustics and have been designated as 'sonic vacua' due to the fact that their corresponding speed of sound (as defined in classical acoustics) is zero. This paves the way for essentially nonlinear and energy-dependent acoustics with no counterparts in linear theory. We experimentally detect time-periodic, strongly nonlinear resonances whereby the particles (beads) of the granular chain respond at integer multiples of the excitation period, and which correspond to local peaks of the maximum transmitted force at the chain's right, fixed end. In between these resonances we detect a local minimum of the maximum transmitted forces corresponding to an anti-resonance in the stationary-state dynamics. The experimental results of this work confirm previous theoretical predictions, and verify the existence of strongly nonlinear resonance responses in a system with a complete absence of any linear spectrum; as such, the experimentally detected nonlinear resonance spectrum is passively tunable with energy and sensitive to dissipative effects such as internal structural damping in the beads, and friction or plasticity effects.
The experimental results are verified by direct numerical simulations and by numerical stability analysis.
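Why the speed of sound is zero can be seen from the standard Hertzian model of an uncompressed bead chain. The equations below are an assumed textbook form, not taken from the abstract: $u_i$ is the displacement of bead $i$, $m$ its mass, $k$ a contact constant lumping material and geometric factors, and $a$ the bead spacing. Because the contact force grows as the 3/2 power of the overlap $\delta$, its slope at zero overlap vanishes, so the linearized stiffness, and with it the linear sound speed, is zero.

```latex
\begin{aligned}
m\,\ddot{u}_i &= k\,[u_{i-1} - u_i]_+^{3/2} - k\,[u_i - u_{i+1}]_+^{3/2},
  \qquad [x]_+ = \max(x, 0),\\
F(\delta) &= k\,\delta^{3/2}
  \;\Rightarrow\;
  K_{\mathrm{lin}} = \lim_{\delta \to 0^+} \frac{dF}{d\delta}
  = \lim_{\delta \to 0^+} \tfrac{3}{2}\,k\,\delta^{1/2} = 0,\\
c_0 &= a\sqrt{K_{\mathrm{lin}}/m} = 0.
\end{aligned}
```

With no linear stiffness there is no linear spectrum to resonate with, which is why any resonances in such a chain must be purely nonlinear and energy-dependent.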

Submitted 29 June, 2016; originally announced June 2016.

arXiv:1507.01025

doi 10.1103/PhysRevE.92.063203

Nonlinear Resonances and Antiresonances of a Forced Sonic Vacuum

Authors: D. Pozharskiy, Y. Zhang, M. O. Williams, D. M. McFarland, P. G. Kevrekidis, A. F. Vakakis, I. G. Kevrekidis

Abstract: We consider a harmonically driven acoustic medium in the form of a (finite-length) highly nonlinear granular crystal with an amplitude- and frequency-dependent boundary drive. Remarkably, despite the absence of a linear spectrum in the system, we identify resonant periodic propagation whereby the crystal responds at integer multiples of the drive period, and observe that these resonances can lead to local maxima of transmitted force at its fixed boundary. In addition, we identify and discuss minima of the transmitted force ("antiresonances") in between these resonances. Representative one-parameter complex bifurcation diagrams involve period doublings, Neimark-Sacker bifurcations, and multiple isolas (e.g. of period-3, -4 or -5 solutions entrained by the forcing). We combine them in a more detailed, two-parameter bifurcation diagram describing the stability of such responses to both frequency and amplitude variations of the drive. This picture supports an unprecedented example of a (purely) "nonlinear spectrum" in a system which allows no sound wave propagation (due to zero sound speed: the so-called sonic vacuum). We rationalize this behavior in terms of purely nonlinear building blocks: apparent traveling and standing nonlinear waves.

Submitted 3 July, 2015; originally announced July 2015.

Journal ref: Phys. Rev. E 92, 063203 (2015)
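One common way to classify responses that repeat at integer multiples of the drive period is stroboscopic (Poincaré) sampling: record the state once per drive period T and find the smallest p for which the samples repeat every p periods. The helper below is a hypothetical sketch of that check, not the paper's continuation or stability machinery; the tolerance, window size, and the name `detect_period` are assumptions.

```python
from typing import Optional, Sequence


def detect_period(samples: Sequence[Sequence[float]], tol: float = 1e-6,
                  max_period: int = 10, window: int = 20) -> Optional[int]:
    """Smallest p such that the last `window` stroboscopic samples repeat every p drive periods.

    samples[k] is the system state (e.g. bead positions and velocities) recorded at
    t = k * T, with T the drive period. Returns None if no period up to max_period fits.
    """
    n = len(samples)
    start = n - window
    for p in range(1, max_period + 1):
        if start - p < 0:
            break  # not enough history to test this (or any longer) period
        if all(
            max(abs(a - b) for a, b in zip(samples[i], samples[i - p])) < tol
            for i in range(start, n)
        ):
            return p
    return None
```

A period-1 response returns 1, a period-doubled branch returns 2, and an entrained period-3 solution returns 3; `None` means no period up to `max_period` was found, for example after a Neimark-Sacker bifurcation to quasiperiodic motion or when transients have not yet died out.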

arXiv:1406.3907 [pdf, other]

doi 10.1109/TNS.2015.2417312

The Data Acquisition System for the KOTO Experiment

Authors: Yasuyuki Sugiyama, Jia Xu, Monica Tecchio, Nikola Whallon, Duncan McFarland, Jiasen Ma, Manabu Togawa, Yasuhisa Tajima, Mircea Bogdan, Jon Ameel, Myron Campbell, Yau Wai Wah, Joseph Comfort, Taku Yamanaka

Abstract: We developed and built a new system of readout and trigger electronics, based on waveform digitization and pipeline readout, for the KOTO experiment at J-PARC, Japan. KOTO aims at observing the rare kaon decay $K_L \rightarrow \pi^0 \nu \bar{\nu}$. A total of 4000 readout channels from various detector subsystems are digitized by 14-bit 125-MHz ADC modules equipped with a 10-pole Bessel filter in order to reduce pile-up effects. The trigger decision is made every 8 ns using the digitized waveform information. To avoid dead time, the ADC and trigger modules have pipelines in their FPGA chips to store data while waiting for the trigger decision. The KOTO experiment performed its first physics run in May 2013. The data acquisition system worked stably during the run.

Submitted 16 June, 2014; originally announced June 2014.

Comments: 5 pages, 12 figures, Transactions on Nuclear Science, Proceedings of the 19th Real Time Conference, Preprint
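The pipeline idea in the abstract, buffering digitized samples while the trigger decision is pending, can be modeled in software as a fixed-depth ring buffer. The Python class below is a hypothetical sketch of that concept only; the real system implements it in FPGA firmware, and the names `PipelineBuffer`, `depth`, `latency`, and `width` are illustrative assumptions. At the stated 125 MHz sampling rate each buffered sample spans 8 ns, so a trigger latency of L samples corresponds to 8L ns.

```python
from collections import deque


class PipelineBuffer:
    """Fixed-depth pipeline holding the most recent `depth` ADC samples (hypothetical model)."""

    def __init__(self, depth: int):
        self.buf = deque(maxlen=depth)  # oldest samples fall off automatically

    def push(self, sample) -> None:
        """Store one sample per clock tick (8 ns per tick at 125 MHz)."""
        self.buf.append(sample)

    def readout(self, latency: int, width: int) -> list:
        """Return `width` samples ending `latency` ticks before the newest sample.

        Raises ValueError if that window has already left the pipeline.
        """
        if latency + width > len(self.buf):
            raise ValueError("requested window is older than the pipeline depth")
        end = len(self.buf) - latency
        return list(self.buf)[end - width:end]
```

The buffer depth must exceed the trigger latency plus the readout window; otherwise the samples belonging to a triggered event are overwritten before they can be read out, which is the dead-time-free property the FPGA pipelines are meant to guarantee.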

arXiv:1403.6870 [pdf, other]

A modified ziggurat algorithm for generating exponentially- and normally-distributed pseudorandom numbers

Authors: Christopher D McFarland

Abstract: The Ziggurat Algorithm is a very fast rejection sampling method for generating pseudorandom numbers (PRNs) from common statistical distributions. The algorithm divides a distribution into rectangular layers that stack on top of each other (resembling a Ziggurat), subsuming the desired distribution. Random values within these rectangular layers are then sampled by rejection. This implementation splits layers into two types: those constituting the majority that fall completely under the distribution and can be sampled extremely fast without a rejection test, and a few additional layers that encapsulate the fringe of the distribution and require a rejection test. This method offers speedups of 65% for exponentially- and 82% for normally-distributed PRNs when compared to the best available C implementations of these generators. Even greater speedups are obtained when the algorithm is extended to the Python and MATLAB/OCTAVE programming environments.

Submitted 21 April, 2014; v1 submitted 26 March, 2014; originally announced March 2014.
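For reference, the Python sketch below implements the classic floating-point ziggurat for the exponential distribution, using the 256-layer constants published by Marsaglia and Tsang. It is not the modified algorithm described above: every layer here goes through the same `x < X[i + 1]` check, whose fast branch accepts without any density evaluation, and only points landing in a layer's right-hand wedge (or the tail beyond R) need extra work. The table layout and the names `build_table` and `exp_ziggurat` are assumptions for illustration.

```python
import math
import random

R = 7.697117470131487      # right edge of the base layer (256-layer exponential ziggurat)
V = 3.949659822581572e-3   # common area of every layer
N_LAYERS = 256


def build_table() -> list[float]:
    """Layer edges: layer i is the rectangle [0, X[i]] x [exp(-X[i]), exp(-X[i+1])]."""
    x = [0.0] * (N_LAYERS + 1)
    x[0] = V / math.exp(-R)  # virtual width of the base layer: strip plus tail
    x[1] = R
    for i in range(1, N_LAYERS - 1):
        # Equal areas: X[i] * (exp(-X[i+1]) - exp(-X[i])) == V
        x[i + 1] = -math.log(V / x[i] + math.exp(-x[i]))
    x[N_LAYERS] = 0.0        # top of the ziggurat, where exp(-0) == 1
    return x


X = build_table()


def exp_ziggurat(rng: random.Random) -> float:
    """Draw one Exp(1) variate by rejection from the ziggurat."""
    while True:
        i = rng.randrange(N_LAYERS)
        x = rng.random() * X[i]
        if x < X[i + 1]:
            return x  # fast path: the point lies entirely under the curve
        if i == 0:
            # Tail beyond R: memorylessness gives R + Exp(1).
            return R - math.log(1.0 - rng.random())
        # Wedge: draw a height inside the layer and accept if it is under exp(-x).
        f_lo, f_hi = math.exp(-X[i]), math.exp(-X[i + 1])
        if f_lo + rng.random() * (f_hi - f_lo) < math.exp(-x):
            return x
```

Because all layers have area V, picking a layer uniformly and then a point uniformly inside it samples the whole ziggurat uniformly; accepting only points under exp(-x) leaves exactly the exponential density.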

arXiv:1312.1416 [pdf, other]

doi 10.1016/j.nima.2013.12.001

Flash ADC data processing with correlation coefficients

Authors: D. Blyth, M. Gibson, D. Mcfarland, J. R. Comfort

Abstract: The large growth of flash ADC techniques for processing signals, especially in applications of streaming data, raises issues such as data flow through an acquisition system, long-term storage, and greater complexity in data analysis. In addition, experiments that push the limits of sensitivity need to distinguish legitimate signals from noise. The use of correlation coefficients is examined to address these issues. They are found to be quite successful well into the noise region. The methods can also be extended to Field Programmable Gate Array modules for compressing the data flow and greatly enhancing the event rate capabilities.

Submitted 4 December, 2013; originally announced December 2013.

Comments: 9 pages, 2 figures
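A minimal sketch of correlation-based pulse finding: slide a reference pulse template along a digitized waveform and compute the Pearson correlation coefficient at each offset. Because r is insensitive to pulse gain and baseline offset, a high r flags a pulse-shaped signal even when its amplitude is small. The template, the helper names, and the 0.9 acceptance threshold are hypothetical; the abstract does not specify how the coefficients are computed or thresholded.

```python
import math
from typing import Sequence

THRESHOLD = 0.9  # hypothetical acceptance cut on r


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def best_match(waveform: Sequence[float], template: Sequence[float]) -> tuple[int, float]:
    """Slide the template over the digitized waveform; return (offset, r) with the largest r."""
    w = len(template)
    best = (0, -1.0)
    for off in range(len(waveform) - w + 1):
        r = pearson(waveform[off:off + w], template)
        if r > best[1]:
            best = (off, r)
    return best


def is_signal(waveform: Sequence[float], template: Sequence[float],
              threshold: float = THRESHOLD) -> bool:
    """Keep the waveform only if some window looks like the template."""
    return best_match(waveform, template)[1] >= threshold
```

Storing only the best offset and r, instead of the full sample stream, is one way such a test could compress the data flow; the same multiply-accumulate structure is what would make it a candidate for FPGA implementation.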

arXiv:1208.6068 [pdf]

doi 10.1073/pnas.1213968110

The impact of deleterious passenger mutations on cancer progression

Authors: Christopher D McFarland, Gregory V Kryukov, Shamil Sunyaev, Leonid Mirny

Abstract: Cancer progression is driven by a small number of genetic alterations accumulating in a neoplasm. These few driver alterations reside in a cancer genome alongside tens of thousands of other mutations that are widely believed to have no role in cancer and are termed passengers. Many passengers, however, fall within protein-coding genes and other functional elements and can possibly have deleterious effects on cancer cells. Here we investigate the potential of mildly deleterious passengers to accumulate and alter the course of neoplastic progression. Our approach combines evolutionary simulations of cancer progression with the analysis of cancer sequencing data. In our simulations, individual cells stochastically divide, acquire advantageous driver and deleterious passenger mutations, or die. Surprisingly, despite selection against them, passengers accumulate and largely evade selection during progression. Although individually weak, the collective burden of passengers alters the course of progression, leading to several phenomena observed in oncology that cannot be explained by a traditional driver-centric view. We tested predictions of the model using cancer genomic data. We find that many passenger mutations are likely to be damaging and that, in agreement with the model, they have largely evaded purifying selection. Finally, we used our model to explore cancer treatments that exploit the load of passengers by either 1) increasing the mutation rate; or 2) exacerbating their deleterious effects. While both approaches lead to cancer regression, the latter leads to less frequent relapse. Our results suggest a new framework for understanding cancer progression as a balance of driver and passenger mutations.

Submitted 29 August, 2012; originally announced August 2012.

Comments: 10 pages, 4 figures and Supplemental Information
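The simulation described above can be caricatured with a discrete-generation branching process in which each cell carries counts of drivers and passengers. The sketch below assumes a multiplicative fitness (1 + s_d)^n_d / (1 + s_p)^n_p, a per-generation division probability of min(1, fitness / 2) (so an unmutated cell is exactly at replacement), and at most one new driver and one new passenger per daughter. These rates, the population cap, and all names are illustrative assumptions rather than the authors' model.

```python
import random


def fitness(n_drivers: int, n_passengers: int, s_d: float = 0.1, s_p: float = 0.01) -> float:
    """Assumed multiplicative fitness: drivers help, passengers mildly hurt."""
    return (1.0 + s_d) ** n_drivers / (1.0 + s_p) ** n_passengers


def simulate(n0: int = 1000, steps: int = 100, mu_d: float = 1e-4, mu_p: float = 0.05,
             s_d: float = 0.1, s_p: float = 0.01, cap: int = 10_000,
             seed: int = 0) -> list[tuple[int, int]]:
    """Return the final population as a list of (drivers, passengers) tuples."""
    rng = random.Random(seed)
    cells = [(0, 0)] * n0
    for _ in range(steps):
        nxt = []
        for d, p in cells:
            # Divide with probability min(1, fitness / 2); otherwise the cell dies.
            if rng.random() < min(1.0, 0.5 * fitness(d, p, s_d, s_p)):
                for _daughter in range(2):
                    # Each daughter gains at most one new driver and one new passenger.
                    nxt.append((d + (rng.random() < mu_d), p + (rng.random() < mu_p)))
        if len(nxt) > cap:
            nxt = rng.sample(nxt, cap)  # crude carrying capacity
        cells = nxt
        if not cells:
            break  # extinction
    return cells
```

Comparing population size and mean passenger load between runs with `s_p = 0` and `s_p > 0` is one way to explore, in this toy setting only, how many individually weak passengers can add up to a collective drag on growth.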

arXiv:1207.7306 [pdf, ps, other]

Hierarchical Models for Relational Event Sequences

Authors: Christopher DuBois, Carter T. Butts, Daniel McFarland, Padhraic Smyth

Abstract: Interaction within small groups can often be represented as a sequence of events, where each event involves a sender and a recipient. Recent methods for modeling network data in continuous time model the rate at which individuals interact conditioned on the previous history of events as well as actor covariates. We present a hierarchical extension for modeling multiple such sequences, facilitating inferences about event-level dynamics and their variation across sequences. The hierarchical approach allows one to share information across sequences in a principled manner; we illustrate the efficacy of such sharing through a set of prediction experiments. After discussing methods for adequacy checking and model selection for this class of models, the method is illustrated with an analysis of high school classroom dynamics.

Submitted 31 July, 2012; originally announced July 2012.
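To make the modeling setup concrete, the sketch below evaluates the log-likelihood of one event sequence under a log-linear relational event model, in which each ordered pair (i, j) has rate exp(beta · s_ij) and the statistics s_ij change only when an event occurs, and then adds a Gaussian prior that ties each sequence's coefficients to a shared mean mu. The two statistics (intercept and reciprocity), the prior scale tau, and the function names are assumptions for illustration; the abstract does not list the paper's actual covariates or priors.

```python
import math


def pair_stats(i, j, last):
    """Hypothetical statistics for pair (i, j): intercept, and reciprocity (last event was j -> i)."""
    return (1.0, 1.0 if last == (j, i) else 0.0)


def log_likelihood(events, actors, beta):
    """Log-likelihood of time-sorted (time, sender, recipient) events starting at t = 0."""
    pairs = [(i, j) for i in actors for j in actors if i != j]
    ll, t_prev, last = 0.0, 0.0, None
    for t, s, r in events:
        rates = {
            pr: math.exp(sum(b * x for b, x in zip(beta, pair_stats(pr[0], pr[1], last))))
            for pr in pairs
        }
        # Observed event's log-rate minus the total hazard accumulated since the previous event.
        ll += math.log(rates[(s, r)]) - (t - t_prev) * sum(rates.values())
        t_prev, last = t, (s, r)
    return ll


def hierarchical_log_posterior(sequences, actors, betas, mu, tau):
    """Per-sequence log-likelihoods plus a N(mu, tau^2) prior on each beta (up to a constant)."""
    lp = 0.0
    for events, beta in zip(sequences, betas):
        lp += log_likelihood(events, actors, beta)
        lp -= 0.5 * sum(((b - m) / tau) ** 2 for b, m in zip(beta, mu))
    return lp
```

The prior term is what shares information across sequences: a short sequence with little data has its coefficients pulled toward the shared mean mu, while a long sequence can move away from it.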

arXiv:1004.3351 [pdf, other]

Citing for High Impact

Authors: Xiaolin Shi, Jure Leskovec, Daniel A. McFarland

Abstract: The question of citation behavior has always intrigued scientists from various disciplines. While general citation patterns have been widely studied in the literature, we develop the notion of citation projection graphs by investigating the citations among the publications that a given paper cites. We investigate how patterns of citations vary between various scientific disciplines and how such patterns reflect the scientific impact of the paper. We find that idiosyncratic citation patterns are characteristic of low impact papers, while narrow, discipline-focused citation patterns are common for medium impact papers. Our results show that cross-community, or bridging, citation patterns are high risk and high reward, since such patterns are characteristic of both low and high impact papers. Finally, we observe that citation networks have recently been trending toward more bridging and interdisciplinary forms.

Submitted 20 April, 2010; originally announced April 2010.

Comments: 10 pages, 6 figures, 1 table

ACM Class: H.3.7; H.4.0
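The citation projection graph of a paper P can be sketched directly from its definition: the nodes are the works P cites, and there is an edge a → b whenever one of those works cites another. The density summary below is a hypothetical way to separate tightly knit (discipline-focused) reference lists from sparse (idiosyncratic or bridging) ones; the function names and the density measure are assumptions, and the paper may characterize these patterns differently.

```python
def projection_graph(paper: str, cites: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Directed edges (a, b) among the works `paper` cites, where a itself cites b."""
    refs = cites.get(paper, set())
    return {(a, b) for a in refs for b in cites.get(a, set()) if b in refs and b != a}


def projection_density(paper: str, cites: dict[str, set[str]]) -> float:
    """Fraction of possible directed edges present in the projection graph."""
    n = len(cites.get(paper, set()))
    if n < 2:
        return 0.0
    return len(projection_graph(paper, cites)) / (n * (n - 1))
```

Citations from works outside P's reference list are ignored, so the graph reflects only how P's own references relate to one another.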

Showing 1–23 of 23 results for author: Mcfarland, D