Date of Award
Spring 2024
Document Type
Thesis (Undergraduate)
Department
Computer Science
First Advisor
Sarah Masud Preum
Abstract
Transformer architectures have revolutionized deep learning, impacting natural language processing and computer vision. Recently, PatchTST has advanced long-term time-series forecasting by embedding patches of time steps as tokens for transformers. This study examines and seeks to enhance PatchTST's embedding techniques. Using eight benchmark datasets, we explore novel token embedding techniques. To this end, we introduce several PatchTST variants that alter the embedding methods of the original paper. These variants consist of the following architectural changes: using CNNs to embed patches as tokens, embedding an aggregate measure such as the mean, max, or sum of a patch, adding the exponential moving average (EMA) of prior tokens to a given token, and adding a residual between neighboring tokens. Our findings show that CNN-based patch embeddings outperform PatchTST's linear-layer strategy, and simple aggregate measures, particularly embedding just the mean of a patch, provide results comparable to PatchTST on some datasets. These insights highlight the potential for optimizing time-series transformers through improved embedding strategies. They also point to PatchTST's inefficiency at exploiting all the information available in a patch during the token embedding process.
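For illustration only, the sketch below contrasts a PatchTST-style linear patch embedding with the kinds of variants named in the abstract (CNN-based, mean-aggregate, EMA, and neighboring-token residual). It is a minimal, hypothetical example; the tensor shapes, parameter values, and layer choices are assumptions and do not reproduce the thesis implementation.

```python
# Hypothetical sketch of patch-to-token embedding variants (not the thesis code).
import torch
import torch.nn as nn

patch_len, d_model = 16, 128
x = torch.randn(32, 64, patch_len)  # (batch, num_patches, patch_len)

# PatchTST-style baseline: a linear layer maps each patch of time steps to a token.
linear_embed = nn.Linear(patch_len, d_model)
tokens_linear = linear_embed(x)  # (32, 64, d_model)

# CNN variant: a 1-D convolution over the time steps within each patch.
conv_embed = nn.Conv1d(in_channels=1, out_channels=d_model, kernel_size=patch_len)
b, n, p = x.shape
tokens_cnn = conv_embed(x.reshape(b * n, 1, p)).reshape(b, n, d_model)

# Aggregate variant: embed only the mean of each patch.
mean_embed = nn.Linear(1, d_model)
tokens_mean = mean_embed(x.mean(dim=-1, keepdim=True))

# EMA variant: add an exponential moving average of prior tokens to each token.
alpha = 0.3  # assumed smoothing factor
ema = torch.zeros_like(tokens_linear)
for t in range(1, tokens_linear.shape[1]):
    ema[:, t] = alpha * tokens_linear[:, t - 1] + (1 - alpha) * ema[:, t - 1]
tokens_ema = tokens_linear + ema

# Residual variant: add the preceding patch's token to each token.
tokens_res = tokens_linear.clone()
tokens_res[:, 1:] = tokens_linear[:, 1:] + tokens_linear[:, :-1]
```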
Recommended Citation
Asher, Gabriel L., "Exploring Tokenization Techniques to Optimize Patch-Based Time-Series Transformers" (2024). Computer Science Senior Theses. 47.
https://digitalcommons.dartmouth.edu/cs_senior_theses/47