This page is outdated. Please visit xigaoli.com.
Last update: Aug 9, 2023
I successfully defended my Ph.D. thesis, “Measuring the Role of Automation in Malicious Web Activities”, on August 4, 2023.
Paper accepted at WWW 2023! [Scan Me If You Can: Understanding and Detecting Unwanted Vulnerability Scanning]
Paper accepted at NDSS 2023! [Double and Nothing: Understanding and Detecting Cryptocurrency Giveaway Scams]
Paper accepted at Oakland 2021! [Good Bot, Bad Bot: Characterizing Automated Browsing Activity]
About Me
I am a Ph.D. graduate of the Department of Computer Science at Stony Brook University.
During my Ph.D., I was co-advised by Professor Nick Nikiforakis and Professor Amir Rahmati. My research focuses on web security and machine learning. On one side, I develop systems that measure and classify automated Internet bots using both machine-learning and heuristic approaches, and I capture malicious bot behaviors by developing “fingerprinting” techniques. On the other side, I aim to build lightweight, pragmatic deep learning models and use information retrieval techniques to extract security insights.
Prior to Stony Brook, I worked on file system security and optimization. My work on disaster tolerance for MooseFS can be found here (github), along with a published paper.
Research Projects
Understanding and Detecting Cryptocurrency Giveaway Scams (Paper accepted at NDSS 2023!) [Paper website]
- Created automated tracking systems to capture malicious web pages advertising cryptocurrency giveaway scams.
- Collected 10,079 scam web pages over 6 months and extracted 2,266 cryptocurrency scam wallets.
- Performed the first known quantitative analysis of funds lost to cryptocurrency giveaway scams: attackers have stolen the equivalent of tens of millions of dollars ($26M–$70M).
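Extracting wallets from captured pages could look roughly like the following minimal sketch. It assumes simple regular expressions for Bitcoin (Base58 legacy/P2SH and bech32) and Ethereum address formats. The patterns and the `extract_wallets` helper are hypothetical illustrations, not the tracking system described in the paper, and they match only the address shape without verifying checksums.

```python
import re

# Hypothetical, simplified patterns: they match the *shape* of an address only
# and do NOT verify Base58Check or bech32 checksums.
WALLET_PATTERNS = {
    # Legacy/P2SH Base58 addresses (no 0, O, I, l) or bech32 "bc1..." addresses
    "BTC": re.compile(r"\b(?:[13][a-km-zA-HJ-NP-Z1-9]{25,34}|bc1[a-z0-9]{39,59})\b"),
    # Ethereum: "0x" followed by 40 hex characters
    "ETH": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
}


def extract_wallets(page_text: str) -> dict[str, set[str]]:
    """Return candidate wallet addresses found in a page, grouped by currency."""
    found: dict[str, set[str]] = {}
    for currency, pattern in WALLET_PATTERNS.items():
        matches = set(pattern.findall(page_text))
        if matches:
            found[currency] = matches
    return found


page = "Send 0.1 BTC to 1BoatSLRHtKNngkdXEeobR76b53LETtpyT or ETH to 0x" + "ab" * 20
print(extract_wallets(page))
```

Deduplicating with a set mirrors the fact that the same scam wallet is often reused across many pages. A real pipeline would also validate checksums to discard false positives.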
Understanding and Detecting Unwanted Vulnerability Scanning (Paper accepted at WWW 2023!) [Paper PDF]
- Created a testbed for measuring Web Vulnerability Scanners (WVSs).
- Tested 12 WVSs and recruited 159 users to understand the differences between humans and WVSs.
- Built “ScannerScope”, a high-accuracy, high-performance detection system.
Measuring bot activity in the wild [Paper PDF]
- Created automated systems that deploy honeypot-like web servers to capture web bot activity.
- Developed behavioral fingerprinting techniques to detect bot behavior and intention.
- Analyzed bot behaviors and uncovered malicious intentions such as brute-forcing, probing, and exploiting vulnerabilities.
- Created visualizations of the captured bot dataset to provide security insights.
Disaster-tolerant open-source distributed file system (2015) [github] [paper]
- Developed a hybrid disaster-tolerance model for an open-source distributed file system
- Analyzed system performance in testing and production environments
- Customized and recompiled the CentOS kernel for performance optimization
Malware Classification with Deep Neural Network using Lightweight Emulation
- Developed an automated malware emulation pipeline that emulated 11 million malware samples from the EMBER’17 dataset in under 10 hours
- Extracted malware API call sequences, memory access information, and RWX counters
- Trained LightGBM and character-level CNN models, achieving 0.99 AUROC / 0.98 accuracy
- Developed a hybrid CNN model for malware family classification, reaching 0.96 accuracy
Malicious URL detection for mobile browsers through Deep Neural Network
- Crawled both malicious and benign URLs from multiple sources
- Trained a classifier using CNN and RNN (LSTM) architectures.
- Made the model deployable on mobile devices and built a browser demo integrated with the ML model.
Animal breed classification with deep neural network [github]
- Trained a modified VGG16 model to classify cat/dog images and their specific breeds
- Fine-tuned hyperparameters to achieve the best accuracy.
- Developed a web app interface that classifies animal breeds from a URL.
Empirical study of time-series data from the anime market [github]
- Crawled anime ranking and scoring data from websites covering 2006 to 2021 and built a clean ranking dataset
- Analyzed anime ranking trends and visualized them in a dynamic video [youtube video]
- Analyzed popular anime picture tags on Safebooru, extracting popular tags from 2011 to 2021
- Designed a decay algorithm to measure the popularity of tags over time.
- Built and fine-tuned a multi-label classifier for anime figures based on a modified VGG-19 model; the model predicts likely tags for any anime figure.
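The decay algorithm itself is not described here. As one possible illustration, the sketch below assumes exponential time decay with a hypothetical half-life: each year's tag count is weighted by 0.5 ** (age / half_life), so recent usage counts more than old usage. The function name, input format, and half-life value are all assumptions.

```python
from collections import defaultdict


def decayed_popularity(tag_counts, current_year, half_life=2.0):
    """Score tags by summing yearly counts weighted by exponential decay.

    tag_counts: iterable of (tag, year, count) tuples (hypothetical input format).
    A count that is `half_life` years old contributes half of its raw value.
    """
    scores = defaultdict(float)
    for tag, year, count in tag_counts:
        age = current_year - year
        scores[tag] += count * 0.5 ** (age / half_life)
    return dict(scores)


data = [("smile", 2021, 100), ("smile", 2019, 100), ("maid", 2017, 400)]
print(decayed_popularity(data, current_year=2021))
```

With a 2-year half-life, "maid" has the larger raw count (400 vs. 200) but scores lower (100 vs. 150) because its usage is four years old. Tuning the half-life trades responsiveness to new trends against stability.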
Anime face dataset and generation with a generative adversarial network
- Used face alignment techniques to extract faces from ~30,000 anime portraits and ~2,500 human cosplay faces, building an anime-face-oriented dataset.
- Generated anime faces with StyleGAN2, using 15,000 anime faces aligned through face detection.
Other Projects
Besides my main research threads, I build mini-projects to test new techniques and for fun.
- Safebooru tag trends from 2010 to 2020
- An interesting COVID tracking map built with Plotly
- A simple tool that crawls your hard drive and shows you random pictures every day
- A simple browser fingerprinting tester using FingerprintJS2
- Another mock personal homepage, built with Wrangler and Cloudflare Workers
- A simple but pragmatic tool that blocks SogouInput ads and tracking
Certifications
Engineering Virtual Program Certificate – Goldman Sachs [cert]
- Crawled common passwords from public online resources to build a dictionary.
- Constructed a pre-computed hash table from the dictionary and performed reverse lookups on the given dataset.
- Provided detailed password policy recommendations to improve the organization’s password security.
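The dictionary-based reverse lookup above can be sketched as follows. This minimal version assumes unsalted MD5 hashes (the hash function used in the program is not stated here), and `build_lookup_table` and `crack` are hypothetical names.

```python
import hashlib


def build_lookup_table(words):
    """Precompute a hash -> plaintext table from a password dictionary (assumes unsalted MD5)."""
    return {hashlib.md5(w.encode("utf-8")).hexdigest(): w for w in words}


def crack(hashes, table):
    """Reverse-look-up each hash; hashes not in the table map to None."""
    return {h: table.get(h) for h in hashes}


table = build_lookup_table(["password", "123456", "qwerty"])
leaked = ["5f4dcc3b5aa765d61d8327deb882cf99", "0" * 32]
print(crack(leaked, table))
```

Each lookup is a single dictionary access once the table is built, which is why weak, common passwords fall quickly. A unique per-user salt defeats a single precomputed table, because every salt would need its own table.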
Neural Networks and Deep Learning – Coursera [cert]
- Built a deep neural network for image recognition from scratch
- Optimized hyperparameters for the best accuracy
Misc
I’m losing weight and working out for fitness: I was 106 kg and now I’m 84 kg.
I take pictures for fun, and I also take professional photographs.