Home

This page is outdated. Please visit xigaoli.com.

Last update: Aug 9, 2023

I successfully defended my Ph.D. thesis, “Measuring the Role of Automation in Malicious Web Activities,” on August 4th, 2023.

Paper accepted at WWW 2023! [Scan Me If You Can: Understanding and Detecting Unwanted Vulnerability Scanning]

Paper accepted at NDSS 2023! [Double and Nothing: Understanding and Detecting Cryptocurrency Giveaway Scams]

Paper accepted at Oakland 2021! [Good Bot, Bad Bot: Characterizing Automated Browsing Activity]

About Me

I am a Ph.D. graduate of the Department of Computer Science at Stony Brook University.

During my Ph.D., I was co-advised by Professor Nick Nikiforakis and Professor Amir Rahmati. My research focuses on web security and machine learning. On one hand, I develop systems to measure and classify automated Internet bots through both machine-learning and heuristic approaches, and capture malicious bot behaviors by developing “fingerprinting” techniques. On the other hand, I aim to build lightweight and pragmatic deep learning models and use information retrieval techniques to gain security insights.

Prior to Stony Brook, I worked on file system security and optimization. My work on disaster tolerance for MooseFS can be found here (github), as well as in published work.

Research Projects

Understanding and Detecting Cryptocurrency Giveaway Scams (Paper Accepted at NDSS 2023!) [Paper website]

  • Created automated tracking systems to capture malicious webpages advertising cryptocurrency giveaway scams.
  • Collected 10,079 scam web pages in 6 months and extracted 2,266 cryptocurrency scam wallets.
  • Conducted the world’s first known quantitative analysis of cryptocurrency scam fund loss: attackers have stolen the equivalent of tens of millions of dollars ($26M – $70M).

Understanding and Detecting Unwanted Vulnerability Scanning [Paper PDF]

  • Created a testbed for measuring Web Vulnerability Scanners (WVSs).
  • Tested 12 WVSs and recruited 159 users to understand the differences between humans and WVSs.
  • Built a high-accuracy, high-performance detection system, “ScannerScope”.

Measuring bot activity in the wild [Paper PDF]

  • Created automated systems that deploy honeypot-like web servers to capture web bot activities.
  • Developed behavioral fingerprinting techniques to detect bot behavior and intention.
  • Analyzed bot behaviors and discovered malicious bot intentions, including brute-forcing, probing, and exploiting vulnerabilities.
  • Created visualizations of the captured bot dataset to provide security insights.

Disaster-tolerant open-source distributed file system (2015) [github] [paper]

  • Developed a hybrid disaster-tolerant model for an open-source distributed file system
  • Analyzed system performance in testing and production environments
  • Customized and recompiled CentOS kernel for performance optimization

Malware Classification with Deep Neural Network using Lightweight Emulation

  • Developed an automated malware emulation pipeline and emulated 11 million malware samples in under 10 hours for the EMBER’17 dataset
  • Extracted malware API call sequences, memory access information, and RWX counters
  • Trained LightGBM and character-level CNN models, achieving 0.99 AUROC / 0.98 accuracy
  • Developed a hybrid CNN model for classifying malware families, reaching 0.96 accuracy

Malicious URL detection for mobile browsers through Deep Neural Network 

  • Crawled both malicious and benign URLs from multiple sources
  • Trained a classifier using CNN and RNN (LSTM) models.
  • Made the model available on mobile and built a browser demo integrated with the ML model.

Animal breed classification with deep neural network [github]

  • Trained a modified VGG16 model to classify cat/dog images and their specific breeds
  • Fine-tuned hyperparameters to achieve the best accuracy.
  • Developed a web app interface to classify animal breeds from a URL.

Empirical study with time series data from the anime market [github]

  • Crawled anime ranking and scoring data from websites, covering 2006 to 2021, and built a clean ranking dataset
  • Analyzed anime ranking trends and visualized them in a dynamic video [youtube video]
  • Analyzed popular anime picture tags on Safebooru and extracted popular tags from 2011 to 2021
  • Designed a decay algorithm to measure the popularity of tags over time.
  • Built and fine-tuned a multi-label classifier for anime figures based on a modified VGG-19 model; the model can predict likely tags for any anime figure.
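
The page does not say how the tag-popularity decay works. As a rough illustration only, the sketch below assumes an exponential decay with a configurable half-life, so that older tagged posts count for less than recent ones. The function name `decayed_popularity`, the `half_life` parameter, and the sample years are all hypothetical and are not taken from the project.

```python
import math


def decayed_popularity(post_years, current_year, half_life=2.0):
    """Score a tag by summing one exponentially decayed weight per tagged post.

    Assumption: a post's weight halves every `half_life` years. This is an
    illustrative choice, not the decay rule used in the actual project.
    """
    rate = math.log(2) / half_life
    return sum(
        math.exp(-rate * (current_year - year))
        for year in post_years
        if year <= current_year  # ignore posts dated after the reference year
    )


# Hypothetical upload years for posts carrying two different tags.
old_tag = [2011, 2011, 2012, 2012, 2013]
new_tag = [2020, 2021, 2021]

old_score = decayed_popularity(old_tag, current_year=2021)
new_score = decayed_popularity(new_tag, current_year=2021)
```

With these made-up inputs the newer tag scores higher despite having fewer posts, which is the effect a decay-based popularity measure is meant to capture.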

Anime face dataset and generation through generative adversarial network

  • Used a face alignment technique to extract faces from ~30,000 anime portraits and ~2,500 cosplay human faces, building an anime-face-oriented dataset.
  • Generated anime faces with StyleGAN2, using 15,000 anime faces aligned through face detection.

Other Projects

Beyond my major research threads, I build mini-projects to test new techniques and for fun.

Certifications

Engineering Virtual Program Certificate – Goldman Sachs [cert]

  • Crawled common passwords from public online resources to build a dictionary.
  • Constructed a precomputed hash table from the dictionary and performed reverse lookups on a given dataset.
  • Provided detailed password policy recommendations to improve the organization’s password security.
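
The dictionary-to-hash-table workflow above can be sketched roughly as follows. This is a minimal illustration, not the code written for the certificate: it assumes unsalted SHA-256 hashes, and the word list, the function names `build_lookup` and `reverse_lookup`, and the sample "leaked" hashes are all hypothetical.

```python
import hashlib


def build_lookup(dictionary, algorithm="sha256"):
    """Precompute a table mapping each candidate's hex digest to its password."""
    table = {}
    for word in dictionary:
        digest = hashlib.new(algorithm, word.encode("utf-8")).hexdigest()
        table[digest] = word
    return table


def reverse_lookup(table, hashes):
    """Return {digest: password} for every digest present in the table."""
    return {h: table[h] for h in hashes if h in table}


# Hypothetical dictionary of common passwords.
common_passwords = ["123456", "password", "qwerty"]
table = build_lookup(common_passwords)

# Hypothetical dataset: one hash of a common password, one unknown hash.
leaked = [hashlib.sha256(b"password").hexdigest(), "0" * 64]
cracked = reverse_lookup(table, leaked)
```

A lookup like this only works against unsalted hashes. Per-user salts and slow password-hashing functions are the usual policy recommendations for defeating it.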

Neural Networks and Deep Learning – Coursera [cert]

  • Manually created a deep neural network for image recognition
  • Optimized hyper-parameters for best accuracy

Misc

[My github]

I’m losing weight and working out for fitness: I was 106 kg and now I’m 84 kg.

I take pictures for fun, but I also do professional photography.