Date of Award

Spring 2024

Document Type

Thesis (Undergraduate)

Department

Computer Science

First Advisor

Sarah Masud Preum

Abstract

Transformer architectures have revolutionized deep learning, impacting natural language processing and computer vision. Recently, PatchTST has advanced long-term time-series forecasting by embedding patches of time steps to use as tokens for transformers. This study examines and seeks to enhance PatchTST's embedding techniques. Using eight benchmark datasets, we explore novel token embedding strategies. To this end, we introduce several PatchTST variants that alter the embedding methods of the original paper. These variants consist of the following architectural changes: using CNNs to embed patches into tokens, embedding an aggregate measure of a patch (such as the mean, max, or sum), adding the exponential moving average (EMA) of prior tokens to any given token, and adding a residual between neighboring tokens. Our findings show that CNN-based patch embeddings outperform PatchTST's linear layer strategy, and simple aggregate measures, particularly embedding just the mean of a patch, provide comparable results to PatchTST for some datasets. These insights highlight the potential for optimizing time-series transformers through improved embedding strategies. Additionally, they point to PatchTST's inefficiency at exploiting all information available in a patch during the token embedding process.
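To make the CNN-based variant concrete, the sketch below contrasts PatchTST's linear patch embedding with a small 1D-convolutional alternative. This is an illustrative assumption, not the thesis's exact configuration: the kernel size, hidden channel width, and module names are hypothetical, and only the general idea (replacing the per-patch linear map with a CNN before projecting to the model dimension) follows the abstract.

```python
import torch
import torch.nn as nn


class LinearPatchEmbedding(nn.Module):
    """Baseline PatchTST-style embedding: one linear map per patch of length patch_len."""

    def __init__(self, patch_len: int, d_model: int):
        super().__init__()
        self.proj = nn.Linear(patch_len, d_model)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_len) -> (batch, num_patches, d_model)
        return self.proj(patches)


class ConvPatchEmbedding(nn.Module):
    """Variant (illustrative): run a 1D CNN over each patch before projecting to d_model."""

    def __init__(self, patch_len: int, d_model: int, hidden_channels: int = 16):
        super().__init__()
        # Hypothetical choices: kernel_size=3 and 16 hidden channels are placeholders.
        self.conv = nn.Conv1d(1, hidden_channels, kernel_size=3, padding=1)
        self.proj = nn.Linear(hidden_channels * patch_len, d_model)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        b, n, p = patches.shape
        x = patches.reshape(b * n, 1, p)        # treat each patch as a 1-channel sequence
        x = torch.relu(self.conv(x))            # (b*n, hidden_channels, p)
        x = x.flatten(start_dim=1)              # (b*n, hidden_channels * p)
        return self.proj(x).reshape(b, n, -1)   # (batch, num_patches, d_model)


if __name__ == "__main__":
    patches = torch.randn(8, 42, 16)            # batch of 8 series, 42 patches of length 16
    tokens = ConvPatchEmbedding(patch_len=16, d_model=128)(patches)
    print(tokens.shape)                         # torch.Size([8, 42, 128])
```

The other variants described above (aggregate-measure embeddings, EMA over prior tokens, residuals between neighboring tokens) would modify the same embedding stage rather than the transformer backbone itself.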
