Youngseo Son
I am a PhD candidate in Computer Science at Stony Brook University.
I am very grateful to have H. Andrew Schwartz as my advisor.
My research focuses on Natural Language Processing (NLP) for social media analysis, language modeling, information extraction, and data analysis. I collaborate with psychologists and computational linguists on human-centered language modeling to improve accuracy on a range of NLP tasks, from traditional ones (e.g., sentiment analysis) to novel ones such as discourse style analysis for psychological assessment and well-being measurement. In particular, I focus on discourse relation parsing to extract key information for targeted tasks, such as the opinions or reasons behind the sentiment of a review or a political stance, and on finding correlations between discourse styles and human variables such as personality.
[CV]
Experience
Data Scientist PhD Intern
Data Sciences and Analytics Group, Pacific Northwest National Laboratory
I am very grateful for the experience of working with Svitlana Volkova as my PI and Maria Glenski as my mentor.
DARPA SocialSim Project
Developed SocialSim modules to analyze information and graph evolution and the cross-platform spread of misinformation and disinformation on social media (e.g., Twitter, Reddit, GitHub)
Collaborated with Prasha Shrestha on detecting coordinated efforts and analyzing the trends and spread mechanisms of cryptocurrencies on social media
Summer 2019
Research Scientist Intern
World Well-Being Project (WWBP), the University of Pennsylvania
Developed a discourse relation parser for social media to capture counterfactual thinking in tweets.
NLP pipeline: a joint model combining a rule-based model (regular expressions with Tweet Brown Clusters) and a statistical model (Linear SVM with discourse unit extraction)
Led the project with the help of Prof. Lyle Ungar and Anneke Buffone of the WWBP team.
Summer 2016
Teaching Assistant
Stony Brook University
Graduate Courses: Big Data Analytics (Fall 2016, Fall 2017)
Undergraduate Courses: Senior Software Engineering, Computer Science III, Advanced Game Programming, and Computer Music
Spring 2014 – Fall 2017
Software Engineer Intern
Dassault Systemes
Development of the Product Lifecycle Management (PLM) Web Application (ENOVIA)
Worked on updating PLM chart display and data visualization functions with senior software engineers from PLM Development and pre-sales team members
December 2012 – February 2013
Information Technology Specialist (25B)
ROK Army & US Army
Worked with the 2nd Infantry Division, Eighth United States Army
Managed networks, computers, peripheral devices, and the online portal of the battalion
August 2010 – May 2012
News
Invited Talks at INFORMS 2019
Two Session Talks on NLP Applications for Decision Support and Social Media Mining
Presenting our causal explanation analysis research at the Machine Reading and Comprehension for Science-Practice Knowledge Synthesis session
Presenting our human-centered NLP social media mining technique at the Social Media Mining: Techniques and Applications session
October 20–23, 2019
Selected Publications
Suicide Risk Assessment with Multi-level Dual-Context Language and BERT
Matthew Matero, Akash Idnani, Youngseo Son, Salvatore Giorgi, Huy Vu, Mohammadzaman Zamani, Parth Limbachiya, Sharath Chandra Guntuku, H. Andrew Schwartz
Ranked No. 1 in predicting Reddit users' suicide risk levels from their SuicideWatch and non-SuicideWatch posts (Task B). Developed user-factor-adapted RNN models with post-level attention using BERT and psychological language model representations of Reddit posts
NAACL 2019 CLPsych
The Language of Well-Being: Tracking Fluctuations in Emotion Experience through Everyday Speech
Jessie Sun, H. Andrew Schwartz, Youngseo Son, Margaret Kern, Simine Vazire
LDA topic modeling to capture momentary emotions from language (validated by a replication in the second year). Explored Linguistic Inquiry and Word Count (LIWC) categories and open-vocabulary models to analyze correlations between language and momentary emotion.
Journal of Personality and Social Psychology
Causal Explanation Analysis on Social Media
Youngseo Son, Nipun Bayas, H. Andrew Schwartz
NLP pipeline: a joint model combining a causality classifier (Linear SVM) and a causal explanation identifier (bidirectional LSTM), applied to downstream tasks (Facebook demographic analysis and Yelp review sentiment cause detection)
[Data] [Code] [Pretrained Models]
EMNLP 2018
Human Centered NLP with User-Factor Adaptation
Veronica E. Lynn, Youngseo Son, Vivek Kulkarni, Niranjan Balasubramanian, H. Andrew Schwartz
Feature adaptation of NLP models using human variables (age, gender, and personality) for downstream tasks (POS tagging, PP-attachment, sentiment, sarcasm, and stance)
[Code]
EMNLP 2017
Recognizing Counterfactual Thinking in Social Media Texts
Youngseo Son, Anneke Buffone, Anthony Janocko, Allegra Larche, Joseph Raso, Kevin Zembroski, H. Andrew Schwartz, Lyle Ungar
NLP pipeline: a joint model combining a rule-based model (regular expressions with Tweet Brown Clusters that capture social-media-specific variations of discourse connectives) and a statistical model (Linear SVM)
[Data]
ACL 2017
Education
Stony Brook University
Doctor of Philosophy
Computer Science
GPA: 3.89/4.00
August 2015 – Present
Stony Brook University
Bachelor of Science
Summa Cum Laude
Departmental Honors
Computer Science
GPA: 3.93/4.00
August 2013 – May 2015
Ajou University
Bachelor of Engineering
Computer Engineering
GPA: 4.36/4.50
March 2009 – August 2013
Recent Projects
9/11 World Trade Center Project
Project in collaboration with the Stony Brook WTC Wellness Program and Stony Brook Medicine
Analyzing interviews with people who were at the scene of the 9/11 WTC attack.
Correlating linguistic features of the subjects with their mental and physical health.
Using LDA topic clustering, discourse relation parsing, and sentiment/emotion lexicons.
August 2017 – Present
The Language of Well-Being Project
Project in collaboration with the University of California, Davis and the University of Melbourne
Correlating linguistic features of people's everyday language with changes in their emotions.
Conducting LDA topic clustering over transcripts of the participants' daily speech for emotion analysis.
Using n-grams, Linguistic Inquiry and Word Count (LIWC), and sentiment/emotion lexicons.
July 2017 – Present
Awards & Scholarships
Special CS Department Chair Fellowship – August 2015
Stony Brook University Computer Science Award of Honor – May 2015
Stony Brook University URECA Stipend – Summer 2014
Stony Brook University Outstanding Academic Achievement Awards – Fall 2013, Spring 2014
Stony Brook University Dean's List Nomination – Fall 2013, Spring 2014
Ajou University Superior Academic Performance Scholarship – Fall 2009 to Spring 2013
US Army Best Korean Augmentation to the United States Army (KATUSA) of 2012 – May 2012
US Army Best Warrior 2012 – April 2012
