This study introduces a transformer-based anomaly detection model designed to flexibly analyze log data using semantic, sequential, and temporal features. Unlike previous methods that rely on fixed log sequence lengths, this approach adapts to varying input sizes while revealing that event occurrence plays the most critical role in detecting anomalies. Experiments across multiple datasets show stable and competitive performance, highlighting both the strengths and limitations of current benchmarks and the need for more diverse anomaly datasets.

A Transformer Approach to Log-Based Anomaly Detection


:::info Authors:

  1. Xingfang Wu
  2. Heng Li
  3. Foutse Khomh

:::

Abstract

1 Introduction

2 Background and Related Work

2.1 Different Formulations of the Log-based Anomaly Detection Task

2.2 Supervised vs. Unsupervised

2.3 Information within Log Data

2.4 Fixed-Window Grouping

2.5 Related Works

3 A Configurable Transformer-based Anomaly Detection Approach

3.1 Problem Formulation

3.2 Log Parsing and Log Embedding

3.3 Positional & Temporal Encoding

3.4 Model Structure

3.5 Supervised Binary Classification

4 Experimental Setup

4.1 Datasets

4.2 Evaluation Metrics

4.3 Generating Log Sequences of Varying Lengths

4.4 Implementation Details and Experimental Environment

5 Experimental Results

5.1 RQ1: How does our proposed anomaly detection model perform compared to the baselines?

5.2 RQ2: How much does the sequential and temporal information within log sequences affect anomaly detection?

5.3 RQ3: How much do the different types of information individually contribute to anomaly detection?

6 Discussion

7 Threats to validity

8 Conclusions and References


Abstract

Log data are generated from logging statements in the source code, providing insights into the execution processes of software applications and systems. State-of-the-art log-based anomaly detection approaches typically leverage deep learning models to capture the semantic or sequential information in the log data and detect anomalous runtime behaviors. However, the impacts of these different types of information are not clear. In addition, existing approaches have not captured the timestamps in the log data which can potentially provide more fine-grained temporal information than the sequential information. In this work, we propose a configurable transformer-based anomaly detection model that can capture the semantic, sequential, and temporal information in the log data and allows us to configure the different types of information as the model’s features. Additionally, we train and evaluate the proposed model using log sequences of different lengths, thus overcoming the constraint of existing methods that rely on fixed-length or time-windowed log sequences as inputs. With the proposed model, we conduct a series of experiments with different combinations of input features to evaluate the roles of different types of information (i.e., sequential, temporal, semantic information) in anomaly detection. The model can attain competitive and consistently stable performance compared to the baselines when presented with log sequences of varying lengths. The results indicate that the event occurrence information plays the key role in identifying anomalies, while the impact of the sequential and temporal information is not significant for anomaly detection in the studied public datasets. On the other hand, the findings also reveal the simplicity of the studied public datasets and highlight the importance of constructing new datasets that contain different types of anomalies to better evaluate the performance of anomaly detection models.


1 Introduction

Logging is commonly used among software developers to track the runtime status of software systems. Logs, generated through logging statements within program source code, provide insight into the execution sequence of code. They serve as the primary source of information for understanding system status and performance issues [1]. With logs, practitioners can diagnose system failures and analyze root causes. Initially designed for human readability, logs contain elements of natural language to some extent. As systems and applications grow increasingly complex, the volume of generated logs expands, rendering manual examination impractical and inefficient [2]. Researchers and developers in both academia and industry have developed various automated log analysis approaches, leveraging different types of information within log data [3]. Despite numerous studies aimed at understanding the effectiveness of these approaches, the roles of different types of information in log-based anomaly detection remain unclear.

Log data are semi-structured textual data that follow common structures defined by developers using logging libraries. Typically, an automated log analysis workflow contains pre-processing steps that transform the semi-structured logs into structured logs, using knowledge of the structure of the log data being processed [3]. Logs are generated at runtime by logging statements that comprise both static log templates and dynamic parameters, and these two parts are typically separated for further processing. As we usually do not have access to the logging statements that generate the log messages, log parsers are developed to identify dynamic fields and group the logs by their templates.
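To make the parsing step concrete, here is a minimal, illustrative sketch (not the parser used in the paper) of how dynamic fields in a log message can be masked to recover its static template. The regular expressions and the HDFS-style example message are assumptions for illustration only.

```python
import re

# Minimal illustrative parser: dynamic-looking fields are masked with <*> so that
# messages produced by the same logging statement collapse to one template.
# The patterns below (block IDs, IPs, hex IDs, integers) are assumptions for this example.
DYNAMIC_PATTERNS = [
    r"blk_-?\d+",                # HDFS-style block identifiers
    r"\b\d+\.\d+\.\d+\.\d+\b",   # IPv4 addresses
    r"\b0x[0-9a-fA-F]+\b",       # hexadecimal identifiers
    r"\b\d+\b",                  # plain integers
]

def extract_template(message: str) -> str:
    """Replace dynamic parameters with a placeholder to recover the static template."""
    template = message
    for pattern in DYNAMIC_PATTERNS:
        template = re.sub(pattern, "<*>", template)
    return template

print(extract_template("Received block blk_3587 of size 67108864 from 10.251.42.84"))
# -> "Received block <*> of size <*> from <*>"
```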

Most of the existing log-based anomaly detection approaches work on log sequences [1, 5]. These approaches require the log data to be grouped into log sequences, for which anomalies are then detected. Logs generated by some systems contain certain fields (e.g., the block ID in the HDFS dataset) by which logs can be grouped accordingly. For log data that lack such grouping cues, previous approaches usually adopt a fixed-length or fixed-time sliding-window grouping. These sequences serve as the basic units for log anomaly detection. Besides, some approaches (e.g., Logsy [6]) focus on identifying anomalies associated with certain log templates and, therefore, work on individual log events without considering the context offered by log sequences.

Log representation is an indispensable step in automated log analysis, which transforms textual log messages into numerical vectors [7]. In most classical approaches, the Message Count Vector (MCV) [1, 8, 9], which counts the occurrences of log templates within a log sequence, is used to perform the transformation. In these representation techniques, the sequential information within log sequences is lost. Some approaches instead directly use the sequence of log templates to represent a log sequence [10]. As log messages usually contain natural language, more recent and advanced approaches employ embedding techniques or pre-trained language models from natural language processing to harness the semantic information therein [11, 12]. Differing in their mechanisms, these methods do not necessarily require the input logs to be grouped by their templates during the parsing process.
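As an illustration of the Message Count Vector representation described above, the following sketch counts template occurrences within grouped log sequences. The template IDs, block IDs, and helper function are hypothetical and only meant to show how the order of events is discarded.

```python
from collections import Counter
from typing import Dict, List

def message_count_vector(sequence: List[str], vocabulary: List[str]) -> List[int]:
    """Count how often each known template occurs in one log sequence.

    `sequence` is a list of template IDs obtained from parsing; `vocabulary`
    fixes the dimension and ordering of the resulting vector. Sequential
    information (the order of events) is intentionally discarded.
    """
    counts = Counter(sequence)
    return [counts.get(template, 0) for template in vocabulary]

# Example: group HDFS-style logs by block ID, then vectorize each group.
grouped: Dict[str, List[str]] = {
    "blk_1": ["E5", "E22", "E5", "E11"],
    "blk_2": ["E5", "E5", "E5", "E26"],
}
vocab = ["E5", "E11", "E22", "E26"]
vectors = {block: message_count_vector(events, vocab) for block, events in grouped.items()}
# vectors == {"blk_1": [2, 1, 1, 0], "blk_2": [3, 0, 0, 1]}
```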

Machine learning models that match the obtained log representation are adopted to detect anomalies within log events or sequences. In particular, sequential models (e.g., CNN, LSTM, Transformer) that accept a series of log items as input have been proposed [5]. The use of sequential models is based on the intuition that log events within sequences follow certain patterns and rules [10]. Moreover, previous approaches formulate the anomaly detection task in different ways [1, 5]. Some approaches formulate it as a binary classification problem and train classifiers under a supervised scheme to classify logs or log sequences as anomalous or normal [1]. Other works formulate the task as predicting future log events given sequences of past events [5, 10]: if the actual future event is not among the predicted candidate events, the sequence is considered abnormal. Finally, some works formulate the problem as identifying pattern violations and adopt machine learning techniques, such as clustering and dimensionality reduction, on the represented features to find anomalies.
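The forecasting formulation mentioned above can be sketched as follows. Here `next_event_probs` is a placeholder for a trained sequential model (not an API from the paper), and the window size and top-k threshold are illustrative defaults.

```python
from typing import Callable, Dict, List, Sequence

def is_anomalous(
    window: Sequence[str],
    actual_next: str,
    next_event_probs: Callable[[Sequence[str]], Dict[str, float]],
    top_k: int = 9,
) -> bool:
    """Flag a step as anomalous if the observed next event is not among the
    top-k events predicted from the preceding window (forecasting formulation).

    `next_event_probs` stands in for a trained sequential model (e.g., an LSTM
    or transformer) mapping a window of template IDs to a probability per
    candidate template; it is a placeholder, not an API from the paper.
    """
    probs = next_event_probs(window)
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
    return actual_next not in candidates

def detect_sequence(
    events: List[str],
    next_event_probs: Callable[[Sequence[str]], Dict[str, float]],
    window_size: int = 10,
    top_k: int = 9,
) -> bool:
    """A log sequence is deemed abnormal if any sliding-window step is anomalous."""
    for i in range(window_size, len(events)):
        if is_anomalous(events[i - window_size:i], events[i], next_event_probs, top_k):
            return True
    return False
```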

The existing research and approaches for log-based anomaly detection face several challenges. First, evaluations are carried out under different settings on the same datasets, which hinders fair comparisons of existing approaches [15]. The different grouping settings adopted by different works cause a mismatch in the number of samples and thus affect the fairness of the reported evaluation metrics. Second, although various sequential models are employed, the significance of utilizing sequential information in anomaly detection is unclear [16]. Third, there is a limited number of datasets available for the anomaly detection task [17], and the quality and characteristics of these datasets are not clear despite their wide adoption. Fourth, the timestamp, a field common to all log data that may be informative for some kinds of anomalies associated with system performance, is usually ignored in existing anomaly detection approaches [1, 5]. Understanding the role of the temporal information within timestamps may be beneficial for enhancing the effectiveness of anomaly detection.

Contribution
In this study, we propose a log-based anomaly detection method based on a transformer model. Our proposed model is designed to be flexible and configurable: the input features can be configured to use any combination of semantic, sequential, and temporal information, which provides flexibility for the evaluations. Moreover, the model can handle log sequences of different lengths, which alleviates the requirement of strict settings in the log grouping process commonly adopted in prior works [1, 5]. We evaluate our proposed approach against baseline approaches on four public datasets. Furthermore, we use the proposed model as a tool to conduct extensive evaluations that aim to strengthen the understanding of the anomaly detection task and of the characteristics of the public datasets commonly used to evaluate anomaly detection methods. Specifically, we examine the roles of various types of information inherent in log sequences for the anomaly detection task. For instance, by incorporating the temporal or sequential information from log sequences into their representation, we investigate the importance of utilizing this information for anomaly detection in the studied datasets.
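A rough PyTorch sketch of such a configurable transformer is given below. It is a reconstruction under assumptions (sinusoidal positional encodings for sequential information, a linear projection of inter-event time deltas for temporal information, mean pooling, and a padding mask for variable-length sequences), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ConfigurableLogTransformer(nn.Module):
    """Sketch of a transformer encoder whose input features can be toggled.

    Semantic event embeddings are assumed to come from a pretrained sentence
    encoder applied to parsed templates; `use_sequential` adds sinusoidal
    positional encodings and `use_temporal` projects inter-event time deltas
    (derived from timestamps) into the embedding space.
    """

    def __init__(self, embed_dim: int = 256, num_heads: int = 4, num_layers: int = 2,
                 use_sequential: bool = True, use_temporal: bool = True):
        super().__init__()
        self.use_sequential = use_sequential
        self.use_temporal = use_temporal
        self.time_proj = nn.Linear(1, embed_dim)   # temporal encoding from time deltas
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.classifier = nn.Linear(embed_dim, 2)  # normal vs. anomalous logits

    @staticmethod
    def positional_encoding(seq_len: int, dim: int) -> torch.Tensor:
        pos = torch.arange(seq_len).unsqueeze(1).float()
        i = torch.arange(0, dim, 2).float()
        angles = pos / torch.pow(10000.0, i / dim)
        pe = torch.zeros(seq_len, dim)
        pe[:, 0::2] = torch.sin(angles)
        pe[:, 1::2] = torch.cos(angles)
        return pe

    def forward(self, event_embeddings, time_deltas, padding_mask):
        # event_embeddings: (batch, seq_len, embed_dim) semantic vectors per log event
        # time_deltas:      (batch, seq_len, 1) seconds since the previous event
        # padding_mask:     (batch, seq_len) True at padded positions, which is what
        #                   lets sequences of different lengths share one batch
        x = event_embeddings
        if self.use_sequential:
            x = x + self.positional_encoding(x.size(1), x.size(2)).to(x.device)
        if self.use_temporal:
            x = x + self.time_proj(time_deltas)
        h = self.encoder(x, src_key_padding_mask=padding_mask)
        h = h.masked_fill(padding_mask.unsqueeze(-1), 0.0)
        pooled = h.sum(dim=1) / (~padding_mask).sum(dim=1, keepdim=True)  # mean over real events
        return self.classifier(pooled)
```

Turning `use_sequential` and `use_temporal` off reduces the input to the semantic event embeddings alone, which mirrors the kind of feature-combination comparisons examined in RQ2 and RQ3.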

Research Questions
We organize our study around the following research questions (RQs):

RQ1: How does our proposed anomaly detection model perform compared to the baselines?

RQ2: How much does the sequential and temporal information within log sequences affect anomaly detection?

RQ3: How much do the different types of information individually contribute to anomaly detection?

Paper Organization
The rest of the paper is organized as follows. Section 2 introduces background on the log-based anomaly detection task. Section 3 describes the design of our transformer-based anomaly detection model used in the experiments. Section 4 details the experimental setup. We organize the experimental results around the three research questions in Section 5. In Section 6, we further discuss the results and summarize the findings. Section 2.5 lists the works that are closely related to our study. Section 7 identifies the threats to the validity of our experiments and findings. Finally, we conclude our work in Section 8.

:::info This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.

:::

