Date of Award

Spring 6-1-2021

Document Type

Thesis (Undergraduate)

Department or Program

Department of Computer Science

First Advisor

V.S. Subrahmanian


Publicly available estimates suggest that intellectual property (IP) theft costs the U.S. economy between $225 billion and $600 billion each year. In this paper, we propose combating IP theft by generating fake versions of technical documents. If an enterprise system stores n fake documents for each real document, an IP thief must sift through all of them to separate the original from a sea of fakes. This costs the attacker time and money, and it inflicts pain and frustration on the attacker's technical staff.

Leveraging a graph-theoretic approach, we created the Clique-FakeKG algorithm to meet a formal adversary-aware standard: even an attacker who knows our algorithm and every input other than the original Knowledge Graph (KG) must not be able to identify the real graph. We build a distance graph over all the KGs in our input universal set U, in which vertices are KGs and an edge joins two KGs only if the distance between them falls within a desired interval. If an (n + 1)-clique in this distance graph contains the original KG K0, then its other n vertices are fake KGs that lie within the desired distance interval of one another and of K0. In this paper, we first analyze the complexity of this problem and show that it is NP-hard. We then develop Clique-FakeKG to solve it using a probabilistic approach inspired by work in cryptography.
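
The distance-graph construction described above can be illustrated with a small sketch. This is not the Clique-FakeKG algorithm itself, which the abstract describes as probabilistic; it is a brute-force backtracking search that is exponential in the worst case, consistent with the problem being NP-hard. Several details are assumptions made only for illustration: each KG is modeled as a set of (subject, relation, object) triples, the distance is taken to be Jaccard distance, and the names `kg_distance`, `build_distance_graph`, and `find_fakes` are hypothetical.

```python
from itertools import combinations


def kg_distance(a, b):
    """Assumed distance: Jaccard distance between two sets of triples."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_distance_graph(universe, lo, hi, dist=kg_distance):
    """Vertices are indices into `universe`; an edge joins i and j
    only if lo <= dist(universe[i], universe[j]) <= hi."""
    adj = {i: set() for i in range(len(universe))}
    for i, j in combinations(range(len(universe)), 2):
        if lo <= dist(universe[i], universe[j]) <= hi:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def find_fakes(adj, k0, n):
    """Exhaustively search for an (n + 1)-clique containing vertex k0.
    Returns the n other vertices of the clique, or None if none exists."""

    def extend(clique, candidates):
        if len(clique) == n + 1:
            return clique
        for v in sorted(candidates):
            # Keep only later candidates adjacent to v, so every vertex
            # in the growing clique stays pairwise adjacent.
            remaining = {u for u in candidates if u > v and u in adj[v]}
            found = extend(clique + [v], remaining)
            if found is not None:
                return found
        return None

    result = extend([k0], set(adj[k0]))
    return None if result is None else result[1:]


# Toy universe (made-up triples). Index 0 plays the role of K0.
t1 = ("drug", "treats", "fever")
t2 = ("drug", "contains", "aspirin")
t3 = ("aspirin", "is_a", "nsaid")
t4 = ("drug", "made_by", "acme")
universe = [
    frozenset({t1, t2, t3, t4}),                       # K0
    frozenset({t1, t2, t3, ("drug", "made_by", "x")}),  # one triple changed
    frozenset({t1, t2, t4, ("aspirin", "is_a", "y")}),  # another triple changed
    frozenset({t1, t2, t3, t4}),                       # identical to K0: too close
    frozenset({("a", "r", "b"), ("c", "r", "d")}),     # unrelated: too far
]
adj = build_distance_graph(universe, lo=0.3, hi=0.7)
fakes = find_fakes(adj, k0=0, n=2)
```

The lower bound of the interval excludes candidates that are nearly identical to K0 and so would leak the original content, while the upper bound excludes candidates so different that they would be easy to rule out. A clique, rather than a set of neighbors of K0, is required so that the fakes are also mutually within the interval and K0 does not stand out as the center of the set.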

When we tested Clique-FakeKG on human subjects using three diverse real-world datasets, we achieved an 86.8% deception rate: users had difficulty identifying which KG in a set was the original K0 from which all the others were generated.