Showing 1–20 of 20 results for author: Szymański, P

  1. SRAI: Towards Standardization of Geospatial AI

    Authors: Piotr Gramacki, Kacper Leśniara, Kamil Raczycki, Szymon Woźniak, Marcin Przymus, Piotr Szymański

    Abstract: Spatial Representations for Artificial Intelligence (srai) is a Python library for working with geospatial data. The library can download geospatial data, split a given area into micro-regions using multiple algorithms and train an embedding model using various architectures. It includes baseline models as well as more complex methods from published works. Those capabilities make it possible to us…

    Submitted 23 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted for the 6th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GeoAI 2023)

  2. arXiv:2306.01428  [pdf, other]

    cs.SD cs.LG eess.AS

    Improved DeepFake Detection Using Whisper Features

    Authors: Piotr Kawa, Marcin Plata, Michał Czuba, Piotr Szymański, Piotr Syga

    Abstract: With a recent influx of voice generation methods, the threat introduced by audio DeepFake (DF) is ever-increasing. Several different detection methods have been presented as a countermeasure. Many methods are based on so-called front-ends, which, by transforming the raw audio, emphasize features crucial for assessing the genuineness of the audio sample. Our contribution contains investigating the…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted to INTERSPEECH 2023

  3. highway2vec -- representing OpenStreetMap microregions with respect to their road network characteristics

    Authors: Kacper Leśniara, Piotr Szymański

    Abstract: Recent years brought advancements in using neural networks for representation learning of various language or visual phenomena. New methods freed data scientists from hand-crafting features for common tasks. Similarly, problems that require considering the spatial variable can benefit from pretrained map region representations instead of manually creating feature tables that one needs to prepare t…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: Accepted at GeoAI '22: Proceedings of the 5th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery

  4. arXiv:2211.13112  [pdf, other]

    cs.CL cs.IR cs.LG

    This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish

    Authors: Łukasz Augustyniak, Kamil Tagowski, Albert Sawczyn, Denis Janiak, Roman Bartusiak, Adrian Szymczak, Marcin Wątroba, Arkadiusz Janz, Piotr Szymański, Mikołaj Morzy, Tomasz Kajdanowicz, Maciej Piasecki

    Abstract: The availability of compute and data to train larger and larger language models increases the demand for robust methods of benchmarking the true progress of LM training. Recent years witnessed significant progress in standardized benchmarking for English. Benchmarks such as GLUE, SuperGLUE, or KILT have become de facto standard tools to compare large language models. Following the trend to replica…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 10 pages, 8 pages appendix

    Journal ref: Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS 2022) - https://lepiszcze.ml

  5. Transfer Learning Approach to Bicycle-sharing Systems' Station Location Planning using OpenStreetMap Data

    Authors: Kamil Raczycki, Piotr Szymański

    Abstract: Bicycle-sharing systems (BSS) have become a daily reality for many citizens of larger, wealthier cities in developed regions. However, planning the layout of bicycle-sharing stations usually requires expensive data gathering, surveying travel behavior and trip modelling followed by station layout optimization. Many smaller cities and towns, especially in developing areas, may have difficulty finan…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted to 4th ACM SIGSPATIAL International Workshop on Advances in Resilient and Intelligent Cities

  6. Hex2vec -- Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags

    Authors: Szymon Woźniak, Piotr Szymański

    Abstract: Representation learning of spatial and geographic data is a rapidly developing field which allows for similarity detection between areas and high-quality inference using deep neural networks. Past approaches however concentrated on embedding raster imagery (maps, street or satellite photos), mobility data or road networks. In this paper we propose the first approach to learning vector representati…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted at 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GEOAI '21)

  7. gtfs2vec -- Learning GTFS Embeddings for comparing Public Transport Offer in Microregions

    Authors: Piotr Gramacki, Szymon Woźniak, Piotr Szymański

    Abstract: We selected 48 European cities and gathered their public transport timetables in the GTFS format. We utilized Uber's H3 spatial index to divide each city into hexagonal micro-regions. Based on the timetables data we created certain features describing the quantity and variety of public transport availability in each region. Next, we trained an auto-associative deep neural network to embed each of…

    Submitted 2 November, 2021; v1 submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted at 1st ACM SIGSPATIAL International Workshop on Searching and Mining Large Collections of Geospatial Data (GeoSearch 2021)

  8. arXiv:2110.05573  [pdf, other]

    cs.SI cs.CL cs.LG

    Spatial Data Mining of Public Transport Incidents reported in Social Media

    Authors: Kamil Raczycki, Marcin Szymański, Yahor Yeliseyenka, Piotr Szymański, Tomasz Kajdanowicz

    Abstract: Public transport agencies use social media as an essential tool for communicating mobility incidents to passengers. However, while the short term, day-to-day information about transport phenomena is usually posted in social media with low latency, its availability is short term as the content is rarely made an aggregated form. Social media communication of transport phenomena usually lacks GIS ann…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Preprint, accepted to IWCTS at SIGSPATIAL'21

  9. arXiv:2010.03432  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    WER we are and WER we think we are

    Authors: Piotr Szymański, Piotr Żelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Żyła-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski, Yishay Carmiel

    Abstract: Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR s…

    Submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to EMNLP Findings

  10. arXiv:2010.03088  [pdf, other]

    cs.CL cs.LG stat.ME

    Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing

    Authors: Piotr Szymański, Kyle Gorman

    Abstract: Recent work raises concerns about the use of standard splits to compare natural language processing models. We propose a Bayesian statistical model comparison technique which uses k-fold cross-validation across multiple data sets to estimate the likelihood that one model will outperform the other, or that the two will produce practically equivalent results. We use this technique to rank six Englis…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted to EMNLP2020

  11. arXiv:2004.05985  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

    Authors: Łukasz Augustyniak, Piotr Szymanski, Mikołaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak

    Abstract: Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models, turning punctuation restoration into a challenging task. These errors usually take the form of homonyms. We show how retrofitting of the word embeddings on the domain-specific data can mitigate ASR errors. Our main contribution is a method for better alignment of homonym embeddings…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: submitted to INTERSPEECH'20

  12. arXiv:1909.02851  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Avaya Conversational Intelligence: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations

    Authors: Jan Mizgajski, Adrian Szymczak, Robert Głowski, Piotr Szymański, Piotr Żelasko, Łukasz Augustyniak, Mikołaj Morzy, Yishay Carmiel, Jeff Hodson, Łukasz Wójciak, Daniel Smoczyk, Adam Wróbel, Bartosz Borowik, Adam Artajew, Marcin Baran, Cezary Kwiatkowski, Marzena Żyła-Hoppe

    Abstract: Avaya Conversational Intelligence (ACI) is an end-to-end, cloud-based solution for real-time Spoken Language Understanding for call centers. It combines large vocabulary, real-time speech recognition, transcript refinement, and entity and intent recognition in order to convert live audio into a rich, actionable stream of structured events. These events can be further leveraged with a business rules…

    Submitted 2 September, 2019; originally announced September 2019.

    Comments: Accepted for Interspeech 2019

  13. arXiv:1908.07888  [pdf, other]

    cs.CL

    Towards Better Understanding of Spontaneous Conversations: Overcoming Automatic Speech Recognition Errors With Intent Recognition

    Authors: Piotr Żelasko, Jan Mizgajski, Mikołaj Morzy, Adrian Szymczak, Piotr Szymański, Łukasz Augustyniak, Yishay Carmiel

    Abstract: In this paper, we present a method for correcting automatic speech recognition (ASR) errors using a finite state transducer (FST) intent recognition framework. Intent recognition is a powerful technique for dialog flow management in turn-oriented, human-machine dialogs. This technique can also be very useful in the context of human-human dialogs, though it serves a different purpose of key insight…

    Submitted 21 August, 2019; originally announced August 2019.

  14. arXiv:1812.02956  [pdf, other]

    cs.LG cs.CV cs.NE stat.ML

    LNEMLC: Label Network Embeddings for Multi-Label Classification

    Authors: Piotr Szymański, Tomasz Kajdanowicz, Nitesh Chawla

    Abstract: Multi-label classification aims to classify instances with discrete non-exclusive labels. Most approaches on multi-label classification focus on effective adaptation or transformation of existing binary and multi-class learning approaches but fail in modelling the joint probability of labels or do not preserve generalization abilities for unseen label combinations. To address these issues we propo…

    Submitted 1 January, 2019; v1 submitted 7 December, 2018; originally announced December 2018.

    Comments: submitted to TPAMI

  15. arXiv:1807.00543  [pdf, other]

    cs.CL

    Punctuation Prediction Model for Conversational Speech

    Authors: Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak

    Abstract: An ASR system usually does not predict any punctuation or capitalization. Lack of punctuation causes problems in result presentation and confuses both the human reader and off-the-shelf natural language processing algorithms. To overcome these limitations, we train two variants of Deep Neural Network (DNN) sequence labelling models - a Bidirectional Long Short-Term Memory (BLSTM) and a Convolutiona…

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: Accepted for Interspeech 2018 Conference

  16. arXiv:1707.07913  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Spatio-temporal profiling of public transport delays based on large scale vehicle positioning data from GPS in Wrocław

    Authors: Piotr Szymański, Michał Żołnieruk, Piotr Oleszczyk, Igor Gisterek, Tomasz Kajdanowicz

    Abstract: In recent years many studies of urban mobility based on large data sets have been published: most of them based on crowdsourced GPS data or smart-card data. We present, what is to our knowledge the first, exploration of public transport delay data harvested from a large-scale, official public transport positioning system, provided by the Wrocław Municipality. We evaluate the characteristics of del…

    Submitted 25 July, 2017; originally announced July 2017.

    Comments: accepted to KnowME2017

  17. arXiv:1704.08756  [pdf, other]

    stat.ML cs.LG stat.ME

    A Network Perspective on Stratification of Multi-Label Data

    Authors: Piotr Szymański, Tomasz Kajdanowicz

    Abstract: In the recent years, we have witnessed the development of multi-label classification methods which utilize the structure of the label space in a divide and conquer approach to improve classification performance and allow large data sets to be classified efficiently. Yet most of the available data sets have been provided in train/test splits that did not account for maintaining a distribution of hi…

    Submitted 27 April, 2017; originally announced April 2017.

    Comments: submitted for ECML2017

  18. arXiv:1702.04013  [pdf, ps, other]

    cs.LG stat.ML

    Is a Data-Driven Approach still Better than Random Choice with Naive Bayes classifiers?

    Authors: Piotr Szymański, Tomasz Kajdanowicz

    Abstract: We study the performance of data-driven, a priori and random approaches to label space partitioning for multi-label classification with a Gaussian Naive Bayes classifier. Experiments were performed on 12 benchmark data sets and evaluated on 5 established measures of classification quality: micro and macro averaged F1 score, Subset Accuracy and Hamming loss. Data-driven methods are significantly be…

    Submitted 13 February, 2017; originally announced February 2017.

  19. arXiv:1702.01460  [pdf, other]

    cs.LG cs.MS

    A scikit-based Python environment for performing multi-label classification

    Authors: Piotr Szymański, Tomasz Kajdanowicz

    Abstract: scikit-multilearn is a Python library for performing multi-label classification. The library is compatible with the scikit/scipy ecosystem and uses sparse matrices for all internal operations. It provides native Python implementations of popular multi-label classification methods alongside a novel framework for label space partitioning and division. It includes modern algorithm adaptation methods,…

    Submitted 10 December, 2018; v1 submitted 5 February, 2017; originally announced February 2017.

  20. arXiv:1606.02346  [pdf, other]

    cs.LG cs.PF cs.SI stat.ML

    How is a data-driven approach better than random choice in label space division for multi-label classification?

    Authors: Piotr Szymański, Tomasz Kajdanowicz, Kristian Kersting

    Abstract: We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap and label propagation algorithms. We construct a label co-occurrence graph (both we…

    Submitted 7 June, 2016; originally announced June 2016.