Gibberish Detection Using Brown Corpus and NLP Techniques

Today I am going to share a Python script that lets you detect gibberish, or unusual words in English and other European languages, using NLP techniques. To give you a little background, the Brown Corpus is a collection of roughly 1 million words of American English text. Despite comprising only English words,…

Merging Human-In-The-Loop with Trust & Safety – The Best of Both Worlds

From Facebook’s data breach to rising identity theft around the world and strict European GDPR laws, Trust & Safety is becoming a critical part of major data-driven global companies. With more than two years of experience as a Trust & Safety Data Analyst, I’ve built various data models highly effective at surfacing anomalies among…

Introduction to Applied Natural Language Processing with Python

Data source: NLP GitHub page. In this blog post I am excited to share a simple natural language processing methodology that I learned very recently. The methodology is called semantic analysis, and you can run it using Python’s NLTK (Natural Language Toolkit) package. There are many functions in the NLTK package that can achieve semantic analysis. The…

Abusive Behaviors in Software Platforms – Trust & Safety Perspective

Guideline: This article shares some observations I have made about abusive users and contributors on social media sites and crowdsourcing products, and the challenges and opportunities in uncovering these users. I hope it sheds some light for data analysts/scientists dealing with the trust & safety side of crowdsourcing platforms,…

Support Vector Machine (Supervised Machine Learning) using R

In this article I will talk about a supervised machine learning tool known as the Support Vector Machine (SVM). I chose to write about SVM because it is one of the most commonly used and most easily implemented machine learning techniques. In addition, its methodology bears some similarities to the K-Nearest Neighbor (KNN) supervised…