Hello, and welcome to my website. My name is Matthew Huh, and I am a data science and analytics professional. Web design may not be my forte, but I figure it's the best way to showcase anything and everything that may not fit neatly into my resume.
Over the years, I've worked on geospatial analytics, inventory management, tracking, and sales forecasting for fresh fruit; digital customer experience analytics and retail banking at Capital One; and fraud models for detecting and remediating fraud attacks in Azure and Xbox at Microsoft. My job skills range from writing and optimizing SQL queries to designing dashboards in Tableau and registering and updating data in Salesforce / Confluence, but my passion lies in using Python for data science, automation, and artificial intelligence.
Overview: Energy usage has been an interesting topic over the last century with the rise of automobiles, pollution, and environmental regulations. This project takes a broad look at what energy consumption and production have looked like from 1946 to 2016, and extrapolates potential trends from the existing data.
Skills: Data exploration & visualization, linear regression
Tools: Python (numpy, pandas, matplotlib, scipy)
Overview: Kickstarter allows creators to transform ideas into actual products, but a lot of projects fail to meet expectations or get off the ground. What factors dictate the success or failure of a project, and what should creators do to bolster their odds?
Skills: Data exploration & visualization, feature engineering & selection, machine learning (logistic regression, random forest, gradient boosting)
Tools: Python (numpy, pandas, matplotlib, seaborn, plotly, scipy, scikit-learn)
Overview: For the most part, people are free to choose which news outlets they read and follow. In the United States, there is a near-endless list of sites that people can choose from to get their daily news, and over time they develop preferences for the sites they are more attached to and do their best to avoid the rest. What I would like to examine in this project is whether it is possible to differentiate between several publications based on their works.
Skills: Natural language processing, clustering (k-means, spectral, & affinity propagation), machine learning (logistic regression, random forest, gradient boosting)
Tools: Python (numpy, pandas, matplotlib, plotly, scipy, scikit-learn, nltk, re, spacy)
Overview: Social media is a treasure trove of textual data, as it allows users all over the world to express themselves and share whatever they want to say, get attention, and even start movements. For this project, I will be evaluating what people think about the choices they have in their travels using publicly available Twitter mentions.
Skills: Tweepy (Twitter API), natural language processing, clustering, machine learning (logistic regression, random forest, gradient boosting, neural networks)
Tools: Python (numpy, pandas, plotly, scipy, scikit-learn, nltk, re, wordcloud)
Overview: Can machine learning be used to predict whether a patient will develop cancer based on other health factors, and if so, which ones?
Overview: What statistical tests do you need to use to properly implement A/B testing across different sample groups? Let's find out using data from the European Social Survey.
Overview: What does it take to guess the price of a house?
Overview: So, what does the market for renting units on Airbnb look like?
Overview: What types of apps exist in the Play Store, and how do users perceive most of them?
Overview: Can we determine what types of bank debts are worth pursuing?