Capstone Project Notes

Fake News

Examples

https://github.com/davidmasse/freelancer-rates

https://github.com/slieb74/NBA-Shot-Analysis

https://github.com/cpease00/etf_forecasting

https://github.com/NaokoSuga/gentrification_yelp

https://github.com/mrethana/news_bias_final

https://github.com/paulinaczheng/twitter_flu_tracking

Project

lists of online sources
cleaned dataset (9.1GB)
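import socket

import geoip2.database

# Resolve the site's domain to an IP address.
ip = socket.gethostbyname('nike.com')

# Look up the IP's country in a local GeoLite2 database file.
with geoip2.database.Reader('GeoLite2-Country_20190305/GeoLite2-Country.mmdb') as reader:
    response = reader.country(ip)

print(response.country.iso_code)  # 'US' when these notes were written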

Workflow

  1. Data Collection

    Collect news articles from a set of credible and non-credible websites. Get training labels from OpenSources, a professionally curated database.

  2. Sampling

    Sample from the corpus so that, for each day of data collection, the training set contains an equal number of unique articles from credible and non-credible sources.

  3. Classifier

    Build an ensemble classifier that combines the predictions of two separate models: (a) a "content-only" model (Multinomial Naive Bayes) and (b) a "context-only" model (Adaptive Boosting).
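
The sampling step (step 2) could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the record fields `date`, `credible`, and `text`, the function name `balanced_daily_sample`, and the toy articles are all assumptions made for the example. It treats articles with identical text as duplicates and, for each day, draws the same number of articles from each class.

```python
import random
from collections import defaultdict


def balanced_daily_sample(articles, seed=0):
    """Per day, keep equal numbers of unique credible and non-credible articles."""
    rng = random.Random(seed)

    # Group unique articles by (day, label); duplicates are detected by exact text.
    seen_texts = set()
    groups = defaultdict(list)
    for article in articles:
        if article["text"] in seen_texts:
            continue
        seen_texts.add(article["text"])
        groups[(article["date"], article["credible"])].append(article)

    sample = []
    for day in sorted({day for day, _ in groups}):
        credible = groups.get((day, True), [])
        non_credible = groups.get((day, False), [])
        # The smaller class sets how many articles can be drawn from each side.
        n = min(len(credible), len(non_credible))
        sample.extend(rng.sample(credible, n))
        sample.extend(rng.sample(non_credible, n))
    return sample


# Toy records, invented for illustration only.
articles = [
    {"date": "2019-03-05", "credible": True, "text": "a"},
    {"date": "2019-03-05", "credible": True, "text": "b"},
    {"date": "2019-03-05", "credible": True, "text": "a"},  # duplicate of the first
    {"date": "2019-03-05", "credible": False, "text": "c"},
    {"date": "2019-03-06", "credible": True, "text": "d"},
    {"date": "2019-03-06", "credible": False, "text": "e"},
    {"date": "2019-03-06", "credible": False, "text": "f"},
]
sample = balanced_daily_sample(articles)
```

Downsampling to the smaller class is only one way to balance the set; on a day where one class has no articles, this sketch keeps nothing for that day.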

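The classifier step (step 3) might look something like the sketch below, using scikit-learn. The notes do not say how the two models' predictions are combined, so averaging their class probabilities (soft voting) is an assumption here. The toy texts, the two context features, and the helper name `ensemble_proba` are likewise invented for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration only (1 = non-credible, 0 = credible).
texts = [
    "senate passes budget bill after long debate",
    "shocking miracle cure doctors hate revealed",
    "central bank holds interest rates steady",
    "you will not believe this secret conspiracy",
]
# Hypothetical context features per article: [source hosted in US, ads on page].
context = np.array([[1, 3], [0, 25], [1, 5], [0, 30]])
labels = np.array([0, 1, 0, 1])

# "Content-only" model: bag-of-words counts fed to Multinomial Naive Bayes.
content_model = make_pipeline(CountVectorizer(), MultinomialNB())
content_model.fit(texts, labels)

# "Context-only" model: AdaBoost over the non-text features.
context_model = AdaBoostClassifier(n_estimators=10, random_state=0)
context_model.fit(context, labels)


def ensemble_proba(new_texts, new_context):
    """Average the two models' class probabilities (assumed combination rule)."""
    p_content = content_model.predict_proba(new_texts)
    p_context = context_model.predict_proba(new_context)
    return (p_content + p_context) / 2.0


proba = ensemble_proba(["secret miracle cure revealed"], np.array([[0, 28]]))
prediction = proba.argmax(axis=1)  # column index equals the label, since classes are 0 and 1
```

Both models are fitted on the same labels, so their probability columns line up and can be averaged directly. A weighted average or a meta-classifier stacked on the two outputs would be equally plausible readings of "considers the predictions of two separate models".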