-
Designing an Evaluation Framework for Large Language Models in Astronomy Research
Authors:
John F. Wu,
Alina Hyk,
Kiera McCormick,
Christine Ye,
Simone Astarita,
Elina Baral,
Jo Ciuca,
Jesse Cranney,
Anjalie Field,
Kartheik Iyer,
Philipp Koehn,
Jenn Kotler,
Sandor Kruk,
Michelle Ntampaka,
Charles O'Neill,
Joshua E. G. Peek,
Sanjib Sharma,
Mikaeel Yunus
Abstract:
Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is currently no standard for evaluating the use of LLMs in astronomy. Therefore, we present the experimental design for an evaluation study on how astronomy researchers interact with LLMs. We deploy a Slack chatbot that can answer queries from users via Retrieval-Augmented Generation (RAG); these responses are grounded in astronomy papers from arXiv. We record and anonymize user questions and chatbot answers, user upvotes and downvotes to LLM responses, user feedback to the LLM, and retrieved documents and similarity scores with the query. Our data collection method will enable future dynamic evaluations of LLM tools for astronomy.
Submitted 30 May, 2024;
originally announced May 2024.
-
Design and Evaluation of a Tutor Platform for Personalized Vocabulary Learning
Authors:
Ravi Kokku,
Aditya Vempaty,
Tamer Abuelsaad,
Prasenjit Dey,
Tammy Humphrey,
Akimi Gibson,
Jennifer Kotler
Abstract:
This paper presents our experiences in designing, implementing, and piloting an intelligent vocabulary learning tutor. The design builds on several intelligent tutoring design concepts, including graph-based knowledge representation, learner modeling, and adaptive learning content and assessment exposition. Specifically, we design a novel phased learner model approach to enable systematic exposure to words during vocabulary instruction. We also built an example application on the tutor platform that uses a learning activity involving videos and an assessment activity involving word-to-picture association. More importantly, the tutor adapts to the significant variation in children's knowledge at the beginning of kindergarten, and evolves the application at the speed of each individual learner. A pilot study with 180 kindergarten learners allowed the tutor to collect various kinds of activity information suitable for insights and interventions at both the individual and class level. The effort also demonstrates that such a framework supports A/B testing of a variety of hypotheses at scale.
Submitted 9 July, 2018;
originally announced July 2018.
-
Angular decay coefficients of $J/ψ$ mesons at forward rapidity from $p+p$ collisions at $\sqrt{s}=510$ GeV
Authors:
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
M. Alfred,
V. Andrieux,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. T. Atomssa,
T. C. Awes,
C. Ayuso,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont
, et al. (365 additional authors not shown)
Abstract:
We report the first measurement of the full angular distribution for inclusive $J/ψ \rightarrow μ^{+}μ^{-}$ decays in $p+p$ collisions at $\sqrt{s}=510$ GeV. The measurements are made for $J/ψ$ transverse momentum $2<p_{T}<10$ GeV/$c$ and rapidity $1.2<y<2.2$ in the Helicity, Collins-Soper, and Gottfried-Jackson reference frames. In all frames the polar coefficient $λ_θ$ is strongly negative at low $p_{T}$ and becomes close to zero at high $p_{T}$, while the azimuthal coefficient $λ_φ$ is close to zero at low $p_{T}$ and becomes slightly negative at higher $p_{T}$. The frame-independent coefficient $\tilde{λ}$ is strongly negative at all $p_{T}$ in all frames. The data are compared to the theoretical predictions provided by nonrelativistic quantum chromodynamics models.
Submitted 12 April, 2017; v1 submitted 20 December, 2016;
originally announced December 2016.
-
Measurement of the relative yields of $ψ(2S)$ to $ψ(1S)$ mesons produced at forward and backward rapidity in $p$$+$$p$, $p$$+$Al, $p$$+$Au, and $^{3}$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
M. Alfred,
V. Andrieux,
K. Aoki,
N. Apadula,
H. Asano,
C. Ayuso,
B. Azmoun,
V. Babintsev,
M. Bai,
N. S. Bandara,
B. Bannier,
K. N. Barish,
S. Bathe,
A. Bazilevsky,
M. Beaumier,
S. Beckman,
R. Belmont,
A. Berdnikov,
Y. Berdnikov,
D. S. Blau,
M. Boer
, et al. (336 additional authors not shown)
Abstract:
The PHENIX Collaboration has measured the ratio of the yields of $ψ(2S)$ to $ψ(1S)$ mesons produced in $p$$+$$p$, $p$$+$Al, $p$$+$Au, and $^{3}$He$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV over the forward and backward rapidity intervals $1.2<|y|<2.2$. We find that the ratio in $p$$+$$p$ collisions is consistent with measurements at other collision energies. In collisions with nuclei, we find that in the forward ($p$-going or $^{3}$He-going) direction, the relative yield of $ψ(2S)$ mesons to $ψ(1S)$ mesons is consistent with the value measured in $p$$+$$p$ collisions. However, in the backward (nucleus-going) direction, the $ψ(2S)$ is preferentially suppressed by a factor of $\sim$2. This suppression is attributed in some models to breakup of the weakly bound $ψ(2S)$ through final-state interactions with comoving particles, which have a higher density in the nucleus-going direction. These breakup effects may compete with color screening in a deconfined quark-gluon plasma to produce sequential suppression of excited quarkonia states.
Submitted 31 January, 2017; v1 submitted 21 September, 2016;
originally announced September 2016.