Resources

Notes from SharpestMinds

A list of links we find useful, organized by category. If you want to suggest one, message an admin on Slack!

Technical topics

Tutorials

SQL

GANs

The original GAN tutorial by Ian Goodfellow is a great introduction: https://www.youtube.com/watch?v=HGYYEUSm-0Q

Reinforcement learning

The tutorials by Thomas Simonini are free and very good: https://simoninithomas.github.io/Deep_reinforcement_learning_Course/

The RL book by Richard Sutton and Andrew Barto is a must-read if you want to study the topic seriously: http://incompleteideas.net/book/RLbook2018.pdf

Cheat sheets

Datasets

Tools

Applying & interviewing

Application strategy

Interview questions

Good writing

Projects list

A partial list of the projects SharpestMinds students have built or are building. To be used for inspiration!

This list is under construction 🏗

If you'd like to update a project or add yours, let an admin know on Slack, or submit a pull request.

All projects

  • I'm working on a data science project with a Machine Learning Engineer at American Express to parse ingredients from user-taken images of food labels and provide Wikipedia links. I'm also considering adding NLP and recommendation systems to suggest alternative products for people with allergies. (Pranavraj Thilagaraj with mentor (Andy) Kok-Leong Seow)

  • Designed and deployed a web application that allows users to upload an MP3 file and see a prediction of their age range. The app uses a model I built for Neurolex Labs. Used React, Flask, Docker, and AWS. (Rajesh Singh with mentor (Andy) Kok-Leong Seow)

  • Working with my mentor on a signal processing model that helps catch early signs of autism in young children and a multi-modal emotion classifier on the MELD dataset. (Dyllan McCreary with mentor Joe Papa)

  • Building a personalized food recommendation system by analyzing user genome data & nutrients in different food products. Combining data from various sources and using machine learning to process genome data to recommend healthy products. (Atishay Jain with mentor Oren Shklarsky)

  • Used the gensim and nltk libraries for topic modeling on Twitter data. Working on loading the Twitter data into MongoDB and then processing and modeling it. The final goal is to bring the model to a production-level, deployable state with Docker. (Sahana Adiga with mentor Kiran Mantripragada)

  • Working on a production-grade project to predict clustering of Bird’s electric scooter geolocations based on city features under the mentorship of Susan Holcomb, former Head of Data at Pebble. (Perry Johnson with mentor Susan Holcomb)

  • Creating new datasets to classify bots using the Twitter API. (Ganesh Nomula with mentor Susan Holcomb)

  • Data preprocessing, model building, and evaluation for an NLP-focused project. Working under the supervision of Arman Didandeh, Manager: Technology – Digital Integration (DSpace Innovation Lab) at Deloitte. (Bety E. Rodriguez-Milla with mentor Arman Didandeh)

  • Completing an exploratory analysis of the U.S. feature film industry under the direct mentorship of Betty Zhang, Data Scientist at Rubikloud Technologies. Collected an original dataset of over 2,000 films released between 2010 and 2018 by scraping sites like Box Office Mojo and Wikipedia and by querying the Online Movie Database API. Using SQL and Python to analyze differences in film profitability across genres, run time-series analysis of films’ domestic box office, and measure the impact of key personnel (actors/directors) on various success metrics. (Will Barker with mentor Betty Zhang)

  • Working on a production-grade project to predict calls to the fire department and their response time under the mentorship of Larkin Liu, Data Scientist at Loblaw Digital. Collected over 4.5 million call records received by the San Francisco Fire Department; currently integrating them with web-scraped historical and live weather data. (Shashank Badavanahalli Rajashekar with mentor Larkin Liu)

  • Researches Automatic Term Extraction methods that are effective in indexing highly unusual, technical documents. Implements an automatic term extraction method using word vectors trained on the local corpus as well as a global corpus, using Python in conjunction with gensim, nltk, Keras, and TensorFlow. (Alec Robinson with mentor Ehsan Amjadian)

  • Acquired proprietary datasets by networking with the City of Boulder Open Data team. Built an FAQ chatbot with custom NLP and ML systems, cleaning and reformatting data. Deployed the chatbot on Google Cloud with Docker, Flask, and Dialogflow. (Will Scott with mentor Sowmya Vajjala)

  • Under the guidance of Larkin Liu (Data Scientist at Loblaw Digital), established the ability to design, deploy, and fine-tune a machine learning model to predict click-through rate with the goal of improving conversions. Demonstrated the ability to refactor existing codebases and deploy them to the cloud, reducing compute cost by 1/50. Exhibited the ability to engineer features and tune hyperparameters based on data analysis and statistical tests to improve the accuracy of predicting clicks by 2%. (Pierre Damiba with mentor Larkin Liu)
