-
Real-Time Energy Measurement for Non-Intrusive Well-Being Monitoring of Elderly People -- a Case Study
Authors:
Mateusz Brzozowski,
Artur Janicki
Abstract:
This article presents a case study demonstrating a non-intrusive method for the well-being monitoring of elderly people. It is based on our real-time energy measurement system, which uses tiny beacons attached to electricity meters. Four participants aged 67-82 years took part in our study. We observed their electric power consumption for approx. a month, and then we analyzed them, taking into acc…
▽ More
This article presents a case study demonstrating a non-intrusive method for the well-being monitoring of elderly people. It is based on our real-time energy measurement system, which uses tiny beacons attached to electricity meters. Four participants aged 67-82 years took part in our study. We observed their electric power consumption for approx. a month, and then we analyzed them, taking into account the participants' notes on their activities. We created typical daily usage profiles for each participant and used anomaly detection to find unusual energy consumption. We found out that real-time energy measurement can give significant insight into someone's daily activities and, consequently, bring invaluable information to caregivers about the well-being of an elderly person, while being discreet and entirely non-intrusive.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
System for Measurement of Electric Energy Using Beacons with Optical Sensors and LoRaWAN Transmission
Authors:
Łukasz Marcul,
Mateusz Brzozowski,
Artur Janicki
Abstract:
In this article, we present the results of experiments with finding an efficient radio transmission method for an electric energy measurement system called OneMeter 2.0. This system offers a way of collecting energy usage data from beacons attached to regular, non-smart meters. In our study, we compared several low power wide area network (LPWAN) protocols, out of which we chose the LoRaWAN protoc…
▽ More
In this article, we present the results of experiments with finding an efficient radio transmission method for an electric energy measurement system called OneMeter 2.0. This system offers a way of collecting energy usage data from beacons attached to regular, non-smart meters. In our study, we compared several low power wide area network (LPWAN) protocols, out of which we chose the LoRaWAN protocol. We verified the energy consumption of a LoRa-based transmission unit, as well as the transmission range between network nodes in urban conditions. We discovered that LoRaWAN-based transmission was highly energy-efficient and offered decent coverage, even in a difficult, dense urban environment.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion
Authors:
Jakub Hoscilowicz,
Pawel Popiolek,
Jan Rudkowski,
Jedrzej Bieniasz,
Artur Janicki
Abstract:
With the help of simple fine-tuning, one can artificially embed hidden text into large language models (LLMs). This text is revealed only when triggered by a specific query to the LLM. Two primary applications are LLM fingerprinting and steganography. In the context of LLM fingerprinting, a unique text identifier (fingerprint) is embedded within the model to verify licensing compliance. In the con…
▽ More
With the help of simple fine-tuning, one can artificially embed hidden text into large language models (LLMs). This text is revealed only when triggered by a specific query to the LLM. Two primary applications are LLM fingerprinting and steganography. In the context of LLM fingerprinting, a unique text identifier (fingerprint) is embedded within the model to verify licensing compliance. In the context of steganography, the LLM serves as a carrier for hidden messages that can be disclosed through a designated trigger.
Our work demonstrates that embedding hidden text in the LLM via fine-tuning, though seemingly secure due to the vast number of potential triggers (any sequence of characters or tokens could serve as a trigger), is susceptible to extraction through analysis of the LLM's output decoding process. We propose a novel approach to extraction called Unconditional Token Forcing. It is premised on the hypothesis that iteratively feeding each token from the LLM's vocabulary into the model should reveal sequences with abnormally high token probabilities, indicating potential embedded text candidates. Additionally, our experiments show that when the first token of a hidden fingerprint is used as an input, the LLM not only produces an output sequence with high token probabilities, but also repetitively generates the fingerprint itself. We also present a method to hide text in such a way that it is resistant to Unconditional Token Forcing, which we named Unconditional Token Forcing Confusion.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Large Language Models for Expansion of Spoken Language Understanding Systems to New Languages
Authors:
Jakub Hoscilowicz,
Pawel Pawlowski,
Marcin Skorupa,
Marcin Sowański,
Artur Janicki
Abstract:
Spoken Language Understanding (SLU) models are a core component of voice assistants (VA), such as Alexa, Bixby, and Google Assistant. In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs) that we fine-tune for machine translation of slot-annotated SLU training data. Our approach improved on the MultiATIS++ benchmark, a primar…
▽ More
Spoken Language Understanding (SLU) models are a core component of voice assistants (VA), such as Alexa, Bixby, and Google Assistant. In this paper, we introduce a pipeline designed to extend SLU systems to new languages, utilizing Large Language Models (LLMs) that we fine-tune for machine translation of slot-annotated SLU training data. Our approach improved on the MultiATIS++ benchmark, a primary multi-language SLU dataset, in the cloud scenario using an mBERT model. Specifically, we saw an improvement in the Overall Accuracy metric: from 53% to 62.18%, compared to the existing state-of-the-art method, Fine and Coarse-grained Multi-Task Learning Framework (FC-MTLF). In the on-device scenario (tiny and not pretrained SLU), our method improved the Overall Accuracy from 5.31% to 22.06% over the baseline Global-Local Contrastive Learning Framework (GL-CLeF) method. Contrary to both FC-MTLF and GL-CLeF, our LLM-based machine translation does not require changes in the production architecture of SLU. Additionally, our pipeline is slot-type independent: it does not require any slot definitions or examples.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Non-Linear Inference Time Intervention: Improving LLM Truthfulness
Authors:
Jakub Hoscilowicz,
Adam Wiacek,
Jan Chojnacki,
Adam Cieslak,
Leszek Michon,
Vitalii Urbanevych,
Artur Janicki
Abstract:
In this work, we explore LLM's internal representation space to identify attention heads that contain the most truthful and accurate information. We further developed the Inference Time Intervention (ITI) framework, which lets bias LLM without the need for fine-tuning. The improvement manifests in introducing a non-linear multi-token probing and multi-token intervention: Non-Linear ITI (NL-ITI), w…
▽ More
In this work, we explore LLM's internal representation space to identify attention heads that contain the most truthful and accurate information. We further developed the Inference Time Intervention (ITI) framework, which lets bias LLM without the need for fine-tuning. The improvement manifests in introducing a non-linear multi-token probing and multi-token intervention: Non-Linear ITI (NL-ITI), which significantly enhances performance on evaluation benchmarks. NL-ITI is tested on diverse multiple-choice datasets, including TruthfulQA, on which we report over 16% relative MC1 (accuracy of model pointing to the correct answer) improvement with respect to the baseline ITI results. Moreover, we achieved a 10% relative improvement over the recently released Truth Forest (TrFf) method that also focused on ITI improvement.
△ Less
Submitted 6 June, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Can We Use Probing to Better Understand Fine-tuning and Knowledge Distillation of the BERT NLU?
Authors:
Jakub Hościłowicz,
Marcin Sowański,
Piotr Czubowski,
Artur Janicki
Abstract:
In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate purpose was to use probing to better understand practical production problems and consequently to build better NLU models. We designed experiments to see how fine-tuning changes the linguistic capabilities of BERT…
▽ More
In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate purpose was to use probing to better understand practical production problems and consequently to build better NLU models. We designed experiments to see how fine-tuning changes the linguistic capabilities of BERT, what the optimal size of the fine-tuning dataset is, and what amount of information is contained in a distilled NLU based on a tiny Transformer. The results of the experiments show that the probing paradigm in its current form is not well suited to answer such questions. Structural, Edge and Conditional probes do not take into account how easy it is to decode probed information. Consequently, we conclude that quantification of information decodability is critical for many practical applications of the probing paradigm.
△ Less
Submitted 27 January, 2023;
originally announced January 2023.
-
YouSkyde: Information Hiding for Skype Video Traffic
Authors:
Wojciech Mazurczyk,
Maciej Karas,
Krzysztof Szczypiorski,
Artur Janicki
Abstract:
In this paper a new information hiding method for Skype videoconference calls - YouSkyde - is introduced. A Skype traffic analysis revealed that introducing intentional losses into the Skype video traffic stream to provide the means for clandestine communication is the most favourable solution. A YouSkyde proof-of-concept implementation was carried out and its experimental evaluation is presented.…
▽ More
In this paper a new information hiding method for Skype videoconference calls - YouSkyde - is introduced. A Skype traffic analysis revealed that introducing intentional losses into the Skype video traffic stream to provide the means for clandestine communication is the most favourable solution. A YouSkyde proof-of-concept implementation was carried out and its experimental evaluation is presented. The results obtained prove that the proposed method is feasible and offer a steganographic bandwidth as high as 0.93 kbps, while introducing negligible distortions into transmission quality and providing high undetectability.
△ Less
Submitted 25 August, 2016;
originally announced August 2016.
-
"The Good, The Bad And The Ugly": Evaluation of Wi-Fi Steganography
Authors:
Krzysztof Szczypiorski,
Artur Janicki,
Steffen Wendzel
Abstract:
In this paper we propose a new method for the evaluation of network steganography algorithms based on the new concept of "the moving observer". We considered three levels of undetectability named: "good", "bad", and "ugly". To illustrate this method we chose Wi-Fi steganography as a solid family of information hiding protocols. We present the state of the art in this area covering well-known hidin…
▽ More
In this paper we propose a new method for the evaluation of network steganography algorithms based on the new concept of "the moving observer". We considered three levels of undetectability named: "good", "bad", and "ugly". To illustrate this method we chose Wi-Fi steganography as a solid family of information hiding protocols. We present the state of the art in this area covering well-known hiding techniques for 802.11 networks. "The moving observer" approach could help not only in the evaluation of steganographic algorithms, but also might be a starting point for a new detection system of network steganography. The concept of a new detection system, called MoveSteg, is explained in detail.
△ Less
Submitted 9 September, 2015; v1 submitted 20 August, 2015;
originally announced August 2015.
-
Steganalysis of Transcoding Steganography
Authors:
Artur Janicki,
Wojciech Mazurczyk,
Krzysztof Szczypiorski
Abstract:
TranSteg (Trancoding Steganography) is a fairly new IP telephony steganographic method that functions by compressing overt (voice) data to make space for the steganogram by means of transcoding. It offers high steganographic bandwidth, retains good voice quality and is generally harder to detect than other existing VoIP steganographic methods. In TranSteg, after the steganogram reaches the receive…
▽ More
TranSteg (Trancoding Steganography) is a fairly new IP telephony steganographic method that functions by compressing overt (voice) data to make space for the steganogram by means of transcoding. It offers high steganographic bandwidth, retains good voice quality and is generally harder to detect than other existing VoIP steganographic methods. In TranSteg, after the steganogram reaches the receiver, the hidden information is extracted and the speech data is practically restored to what was originally sent. This is a huge advantage compared with other existing VoIP steganographic methods, where the hidden data can be extracted and removed but the original data cannot be restored because it was previously erased due to a hidden data insertion process. In this paper we address the issue of steganalysis of TranSteg. Various TranSteg scenarios and possibilities of warden(s) localization are analyzed with regards to the TranSteg detection. A steganalysis method based on MFCC (Mel-Frequency Cepstral Coefficients) parameters and GMMs (Gaussian Mixture Models) was developed and tested for various overt/covert codec pairs in a single warden scenario with double transcoding. The proposed method allowed for efficient detection of some codec pairs (e.g., G.711/G.729), whilst some others remained more resistant to detection (e.g., iLBC/AMR).
△ Less
Submitted 22 October, 2012;
originally announced October 2012.
-
Influence of Speech Codecs Selection on Transcoding Steganography
Authors:
Artur Janicki,
Wojciech Mazurczyk,
Krzysztof Szczypiorski
Abstract:
The typical approach to steganography is to compress the covert data in order to limit its size, which is reasonable in the context of a limited steganographic bandwidth. TranSteg (Trancoding Steganography) is a new IP telephony steganographic method that was recently proposed that offers high steganographic bandwidth while retaining good voice quality. In TranSteg, compression of the overt data i…
▽ More
The typical approach to steganography is to compress the covert data in order to limit its size, which is reasonable in the context of a limited steganographic bandwidth. TranSteg (Trancoding Steganography) is a new IP telephony steganographic method that was recently proposed that offers high steganographic bandwidth while retaining good voice quality. In TranSteg, compression of the overt data is used to make space for the steganogram. In this paper we focus on analyzing the influence of the selection of speech codecs on hidden transmission performance, that is, which codecs would be the most advantageous ones for TranSteg. Therefore, by considering the codecs which are currently most popular for IP telephony we aim to find out which codecs should be chosen for transcoding to minimize the negative influence on voice quality while maximizing the obtained steganographic bandwidth.
△ Less
Submitted 30 January, 2012;
originally announced January 2012.