Showing 1–23 of 23 results for author: Dolfi, M
  1. arXiv:2406.19102  [pdf, other]

    cs.CL cs.AI cs.IR

    Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs

    Authors: Lokesh Mishra, Sohayl Dhibi, Yusik Kim, Cesar Berrospi Ramis, Shubham Gupta, Michele Dolfi, Peter Staar

    Abstract: Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable quantitative information through tables. Unfortunately, extracting this information is difficult due to high variability in the table structure as…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted at the NLP4Climate workshop in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  2. arXiv:2405.10725  [pdf, other]

    cs.CL cs.IR

    INDUS: Effective and Efficient Language Models for Scientific Applications

    Authors: Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kaylin Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grezes, Megan Ansdell, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics,…

    Submitted 20 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  3. ESG Accountability Made Easy: DocQA at Your Service

    Authors: Lokesh Mishra, Cesar Berrospi, Kasper Dinkla, Diego Antognini, Francesco Fusco, Benedikt Bothur, Maksym Lysak, Nikolaos Livathinos, Ahmed Nassar, Panagiotis Vagenas, Lucas Morin, Christoph Auer, Michele Dolfi, Peter Staar

    Abstract: We present Deep Search DocQA. This application enables information extraction from documents via a question-answering conversational assistant. The system integrates several technologies from different AI disciplines consisting of document conversion to machine-readable format (via computer vision), finding relevant data (via natural language processing), and formulating an eloquent response (via…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted at the Demonstration Track of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI 24)

    Journal ref: AAAI 2024, 38, 23814-23816

  4. ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents

    Authors: Christoph Auer, Ahmed Nassar, Maksym Lysak, Michele Dolfi, Nikolaos Livathinos, Peter Staar

    Abstract: Transforming documents into machine-processable representations is a challenging task due to their complex structures and variability in formats. Recovering the layout structure and content from PDF files or scanned material has remained a key problem for decades. ICDAR has a long tradition in hosting competitions to benchmark the state-of-the-art and encourage the development of novel solutions t…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: ICDAR 2023, 10 pages, 4 figures

  5. arXiv:2209.03648  [pdf, other]

    cs.CV

    FETA: Towards Specializing Foundation Models for Expert Task Applications

    Authors: Amit Alfassy, Assaf Arbelle, Oshri Halimi, Sivan Harary, Roei Herzig, Eli Schwartz, Rameswar Panda, Michele Dolfi, Christoph Auer, Kate Saenko, Peter W. J. Staar, Rogerio Feris, Leonid Karlinsky

    Abstract: Foundation Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high fidelity data synthesis, and out of domain generalization. However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g. retrieval of car manuals technical illustrations from language queries), data for which is either unseen or belonging to a long-tail…

    Submitted 19 December, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

  6. DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis

    Authors: Birgit Pfitzmann, Christoph Auer, Michele Dolfi, Ahmed S Nassar, Peter W J Staar

    Abstract: Accurate document layout analysis is a key requirement for high-quality PDF document conversion. With the recent availability of public, large ground-truth datasets such as PubLayNet and DocBank, deep-learning models have proven to be very effective at layout detection and segmentation. While these datasets are of adequate size to train such models, they severely lack in layout variability since t…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: 9 pages, 6 figures, 5 tables. Accepted paper at SIGKDD 2022 conference

  7. Delivering Document Conversion as a Cloud Service with High Throughput and Responsiveness

    Authors: Christoph Auer, Michele Dolfi, André Carvalho, Cesar Berrospi Ramis, Peter W. J. Staar

    Abstract: Document understanding is a key business process in the data-driven economy since documents are central to knowledge discovery and business insights. Converting documents into a machine-processable format is a particular challenge here due to their huge variability in formats and complex structure. Accordingly, many algorithms and machine-learning methods emerged to solve particular tasks such as…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 11 pages, 7 figures, to be published in IEEE CLOUD 2022

    ACM Class: I.7.5; I.2.1; C.1.4; C.4

  8. arXiv:2102.09395  [pdf, other]

    cs.LG cs.CV cs.IR

    Robust PDF Document Conversion Using Recurrent Neural Networks

    Authors: Nikolaos Livathinos, Cesar Berrospi, Maksym Lysak, Viktor Kuropiatnyk, Ahmed Nassar, Andre Carvalho, Michele Dolfi, Christoph Auer, Kasper Dinkla, Peter Staar

    Abstract: The number of published PDF documents has increased exponentially in recent decades. There is a growing need to make their rich content discoverable to information retrieval tools. In this paper, we present a novel approach to document structure recovery in PDF using recurrent neural networks to process the low-level PDF data representation directly, instead of relying on a visual re-interpretatio…

    Submitted 18 February, 2021; originally announced February 2021.

    Comments: 9 pages, 2 tables, 4 figures, uses aaai21.sty. Accepted at the "Thirty-Third Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-21)". Received the "IAAI-21 Innovative Application Award"

    ACM Class: I.7.5; I.5.1; I.5.2; I.5.4; I.5.5; I.2.1

  9. arXiv:1907.08400  [pdf, other]

    cs.IR cs.LG

    An Information Extraction and Knowledge Graph Platform for Accelerating Biochemical Discoveries

    Authors: Matteo Manica, Christoph Auer, Valery Weber, Federico Zipoli, Michele Dolfi, Peter Staar, Teodoro Laino, Costas Bekas, Akihiro Fujita, Hiroki Toda, Shuichi Hirose, Yasumitsu Orii

    Abstract: Information extraction and data mining in biochemical literature is a daunting task that demands resource-intensive computation and appropriate means to scale knowledge ingestion. Being able to leverage this immense source of technical information helps to drastically reduce costs and time to solution in multiple application fields from food safety to pharmaceutics. We present a scalable document…

    Submitted 19 July, 2019; originally announced July 2019.

    Comments: 4 pages, 1 figure, Workshop on Applied Data Science for Healthcare at KDD, Anchorage, AK, 2019

  10. Understanding repulsively mediated superconductivity of correlated electrons via massively parallel DMRG

    Authors: Adrian Kantian, Michele Dolfi, Matthias Troyer, Thierry Giamarchi

    Abstract: The so-called minimal models of unconventional superconductivity are lattice models of interacting electrons derived from materials in which electron pairing arises from purely repulsive interactions. Showing unambiguously that a minimal model actually can have a superconducting ground state remains a challenge at nonperturbative interactions. We make a significant step in this direction by comput…

    Submitted 6 August, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: Final published version: 18 pages including appendix, 11 figures

    Journal ref: Phys. Rev. B 100, 075138 (2019)

  11. arXiv:1806.02284  [pdf, other]

    cs.DL cs.CV cs.DC

    Corpus Conversion Service: A Machine Learning Platform to Ingest Documents at Scale

    Authors: Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas

    Abstract: Over the past few decades, the amount of scientific articles and technical literature has increased exponentially in size. Consequently, there is a great need for systems that can ingest these documents at scale and make the contained knowledge discoverable. Unfortunately, both the format of these documents (e.g. the PDF format or bitmap images) as well as the presentation of the data (e.g. comple…

    Submitted 24 May, 2018; originally announced June 2018.

    Comments: Accepted paper at KDD 2018 conference

  12. arXiv:1805.09687  [pdf, other]

    cs.DL cs.CL cs.CV cs.DC cs.IR

    Corpus Conversion Service: A machine learning platform to ingest documents at scale [Poster abstract]

    Authors: Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas

    Abstract: Over the past few decades, the amount of scientific articles and technical literature has increased exponentially in size. Consequently, there is a great need for systems that can ingest these documents at scale and make their content discoverable. Unfortunately, both the format of these documents (e.g. the PDF format or bitmap images) as well as the presentation of the data (e.g. complex tables)…

    Submitted 15 May, 2018; originally announced May 2018.

    Comments: Accepted in SysML 2018 (www.sysml.cc)

  13. arXiv:1607.06352  [pdf, ps, other]

    cond-mat.quant-gas cond-mat.str-el physics.comp-ph

    Density redistribution effects in fermionic optical lattices

    Authors: Medha Soni, Michele Dolfi, Matthias Troyer

    Abstract: We simulate a one dimensional fermionic optical lattice to analyse heating due to non-adiabatic lattice loading. Our simulations reveal that, similar to the bosonic case, density redistribution effects are the major cause of heating in harmonic traps. We suggest protocols to modulate the local density distribution during the process of lattice loading, in order to reduce the excess energy. Our num…

    Submitted 21 July, 2016; originally announced July 2016.

    Comments: 10 pages, 16 pages

    Journal ref: Phys. Rev. A 94, 063404 (2016)

  14. arXiv:1510.02026  [pdf, other]

    physics.comp-ph cond-mat.str-el physics.chem-ph

    An Efficient Matrix Product Operator Representation of the Quantum-Chemical Hamiltonian

    Authors: Sebastian Keller, Michele Dolfi, Matthias Troyer, Markus Reiher

    Abstract: We describe how to efficiently construct the quantum chemical Hamiltonian operator in matrix product form. We present its implementation as a density matrix renormalization group (DMRG) algorithm for quantum chemical applications in a purely matrix product based framework. Existing implementations of DMRG for quantum chemistry are based on the traditional formulation of the method, which was devel…

    Submitted 7 October, 2015; originally announced October 2015.

    Comments: 11 pages, 7 figures

    Journal ref: J. Chem. Phys. 143, 244118 (2015)

  15. arXiv:1509.04709  [pdf, other]

    cond-mat.str-el cond-mat.supr-con physics.comp-ph

    Pair Correlations in Doped Hubbard Ladders

    Authors: Michele Dolfi, Bela Bauer, Sebastian Keller, Matthias Troyer

    Abstract: Hubbard ladders are an important stepping stone to the physics of the two-dimensional Hubbard model. While many of their properties are accessible to numerical and analytical techniques, the question of whether weakly hole-doped Hubbard ladders are dominated by superconducting or charge-density-wave correlations has so far eluded a definitive answer. In particular, previous numerical simulations o…

    Submitted 20 November, 2015; v1 submitted 15 September, 2015; originally announced September 2015.

    Comments: 10 pages, 12 figures, data analysis included

    Journal ref: Phys. Rev. B 92, 195139 (2015)

  16. arXiv:1410.5829  [pdf, other]

    cond-mat.quant-gas cond-mat.str-el

    Minimizing nonadiabaticities in optical-lattice loading

    Authors: Michele Dolfi, Adrian Kantian, Bela Bauer, Matthias Troyer

    Abstract: In the quest to reach lower temperatures of ultra-cold gases in optical lattice experiments, non-adiabaticities during lattice loading are one of the limiting factors that prevent the same low temperatures from being reached as in experiments without lattice. Simulating the loading of a bosonic quantum gas into a one-dimensional optical lattice with and without a trap, we find that the redistribution of…

    Submitted 23 March, 2015; v1 submitted 21 October, 2014; originally announced October 2014.

    Comments: 6 pages, 7 figures

    Journal ref: Phys. Rev. A 91, 033407 (2015)

  17. arXiv:1407.0872  [pdf, other]

    cond-mat.str-el physics.comp-ph

    Matrix Product State applications for the ALPS project

    Authors: Michele Dolfi, Bela Bauer, Sebastian Keller, Alexandr Kosenkov, Timothée Ewart, Adrian Kantian, Thierry Giamarchi, Matthias Troyer

    Abstract: The density-matrix renormalization group method has become a standard computational approach to the low-energy physics as well as dynamics of low-dimensional quantum systems. In this paper, we present a new set of applications, available as part of the ALPS package, that provide an efficient and flexible implementation of these methods based on a matrix-product state (MPS) representation. Our appl…

    Submitted 14 October, 2014; v1 submitted 3 July, 2014; originally announced July 2014.

    Comments: 11+5 pages, 8 figures, 2 examples

    Journal ref: Comput. Phys. Commun. 185, 3430 (2014)

  18. Hybridization expansion Monte Carlo simulation of multi-orbital quantum impurity problems: matrix product formalism and improved Monte Carlo sampling

    Authors: Hiroshi Shinaoka, Michele Dolfi, Matthias Troyer, Philipp Werner

    Abstract: We explore two complementary modifications of the hybridization-expansion continuous-time Monte Carlo method, aiming at large multi-orbital quantum impurity problems. One idea is to compute the imaginary-time propagation using a matrix product states representation. We show that bond dimensions considerably smaller than the dimension of the Hilbert space are sufficient to obtain accurate results,…

    Submitted 30 June, 2014; v1 submitted 4 April, 2014; originally announced April 2014.

    Comments: 24 pages, 8 figures

    Journal ref: J. Stat. Mech., P0601 (2014)

  19. arXiv:1401.3017  [pdf, other]

    cond-mat.str-el

    Chiral spin liquid and emergent anyons in a Kagome lattice Mott insulator

    Authors: B. Bauer, L. Cincio, B. P. Keller, M. Dolfi, G. Vidal, S. Trebst, A. W. W. Ludwig

    Abstract: Topological phases in frustrated quantum spin systems have fascinated researchers for decades. One of the earliest proposals for such a phase was the chiral spin liquid put forward by Kalmeyer and Laughlin in 1987 as the bosonic analogue of the fractional quantum Hall effect. Elusive for many years, recent times have finally seen a number of models that realize this phase. However, these models ar…

    Submitted 13 January, 2014; originally announced January 2014.

    Comments: 9 pages, 9 figures; partially supersedes arXiv:1303.6963

    Journal ref: Nature Communications 5, 5137 (2014)

  20. arXiv:1401.2000  [pdf, other]

    cs.CE cond-mat.stat-mech physics.comp-ph

    A model project for reproducible papers: critical temperature for the Ising model on a square lattice

    Authors: M. Dolfi, J. Gukelberger, A. Hehn, J. Imriška, K. Pakrouski, T. F. Rønnow, M. Troyer, I. Zintchenko, F. Chirigati, J. Freire, D. Shasha

    Abstract: In this paper we present a simple, yet typical simulation in statistical physics, consisting of large scale Monte Carlo simulations followed by an involved statistical analysis of the results. The purpose is to provide an example publication to explore tools for writing reproducible papers. The simulation estimates the critical temperature where the Ising model on the square lattice becomes magnet…

    Submitted 9 January, 2014; originally announced January 2014.

    Comments: Authors are listed in alphabetical order by institution and name. 5 pages, 4 figures

  21. arXiv:1303.6963  [pdf, other]

    cond-mat.str-el

    Gapped and gapless spin liquid phases on the Kagome lattice from chiral three-spin interactions

    Authors: Bela Bauer, Brendan P. Keller, Michele Dolfi, Simon Trebst, Andreas W. W. Ludwig

    Abstract: We argue that a relatively simple model containing only SU(2)-invariant chiral three-spin interactions on a Kagome lattice of S=1/2 spins can give rise to both a gapped and a gapless quantum spin liquid. Our arguments are rooted in a formulation in terms of network models of edge states and are backed up by a careful numerical analysis. For a uniform choice of chirality on the lattice, we realize…

    Submitted 7 September, 2014; v1 submitted 27 March, 2013; originally announced March 2013.

    Comments: 5+5 pages, 6+6 figures. Manuscript partially superseded by arXiv:1401.3017

  22. arXiv:1203.6363  [pdf, other]

    cond-mat.quant-gas cond-mat.str-el

    Multigrid Algorithms for Tensor Network States

    Authors: M. Dolfi, B. Bauer, M. Troyer, Z. Ristivojevic

    Abstract: The widely used density matrix renormalization group (DMRG) method often fails to converge in systems with multiple length scales, such as lattice discretizations of continuum models and dilute or weakly doped lattice models. The local optimization employed by DMRG to optimize the wave function is ineffective in updating large-scale features. Here we present a multigrid algorithm that solves these…

    Submitted 12 June, 2012; v1 submitted 28 March, 2012; originally announced March 2012.

    Comments: 5 pages, 7 figures. Accepted for publication in PRL

    Journal ref: Phys. Rev. Lett. 109, 020604 (2012)

  23. arXiv:1103.0740  [pdf, ps, other]

    cond-mat.soft physics.bio-ph q-bio.BM q-bio.QM

    Kinetics of double stranded DNA overstretching revealed by 0.5-2 pN force steps

    Authors: Pasquale Bianco, Lorenzo Bongini, Luca Melli, Mario Dolfi, Vincenzo Lombardi

    Abstract: A detailed description of the conformational plasticity of double stranded DNA (ds) is a necessary framework for understanding protein-DNA interactions. Until now, however, the structure and kinetics of the transition from the basic conformation of ds-DNA (B state) to the 1.7 times longer and partially unwound conformation (S state) have not been defined. The force-extension relation of the ds-DNA of l…

    Submitted 3 March, 2011; originally announced March 2011.

    Comments: 13 pages, 10 figures