Resources
Last updated
A list of links we find useful, organized by category. If you want to suggest one, message an admin on Slack!
Proper folder structure for a data science project: (⭐️⭐️⭐️⭐️⭐️)
Web scraping: (⭐️⭐️⭐️⭐️⭐️)
Data pipeline tools:
Training neural networks: (⭐️⭐️⭐️⭐️⭐️)
Google Cloud Platform (GCP):
Great visual tutorial on statistics: (⭐️⭐️⭐️⭐️⭐️)
Highly recommended set of math and stats videos, which make great interview prep:
Excellent conceptual treatment of data science and ML (textbook): (⭐️⭐️⭐️⭐️⭐️)
A good resource for learning Python:
Online data science master's (free):
Jupyter notebooks for everything:
On the NLP side, here are a bunch of great links for multiclass classification with BERT:
SQL
GANs
Reinforcement learning
A partial list of the projects SharpestMinds students have built or are building. To be used for inspiration!
This list is under construction 🏗
If you'd like to update a project or add yours, let an admin know on Slack, or submit a pull request.
The original GAN tutorial by Ian Goodfellow is a great introduction:
The tutorials by Thomas Simonini are very good and free:
The RL book by Richard Sutton is a must-read if you want to study the subject seriously:
Big list of data science cheat sheets:
Deep learning cheat sheet:
Data visualization cheat sheet:
Not exactly a cheat sheet, but a lot of mentees strongly recommend Anki for helping them remember important concepts:
One of our mentors kindly created these flash cards for machine learning interview prep:
Another good set of ML flash cards:
Searchable list of datasets for machine learning:
List of datasets to practice NLP:
GitHub repo of datasets for NLP:
Google Dataset Search: (⭐️⭐️⭐️⭐️⭐️)
Tool to quickly label your data: (⭐️⭐️⭐️⭐️⭐️)
An amazing company that turns any website into an API so you don't need to scrape it (mostly): (⭐️⭐️⭐️⭐️⭐️)
Data version control, like git but for datasets and models:
Build autoscaling AWS infrastructure with visual diagrams:
This is a great Medium publication, fully devoted to crushing the ML/AI application and interview process:
Colorful description of the strategy one person took to find a job in AI:
Must-read if you're trying to get a job at a FAANG or similar big company: (⭐️⭐️⭐️⭐️⭐️)
A fantastic strategic document for the technical interview process. Not ML-specific, but most of its tips will still apply: (⭐️⭐️⭐️⭐️⭐️)
Comprehensive instructions on how to prep for technical interviews in ML: (⭐️⭐️⭐️⭐️⭐️)
A Google Doc by one of our own, listing interview questions on all kinds of topics: (⭐️⭐️⭐️⭐️⭐️)
List of ML interview questions:
Google AI interview questions:
How to bomb an interview and still get the job: (⭐️⭐️⭐️⭐️⭐️)
Massive list of top-notch interview questions: (⭐️⭐️⭐️⭐️⭐️)
Not interview questions per se, but an article on topics that very often come up in interviews. Strongly recommended by our mentors: (⭐️⭐️⭐️⭐️⭐️)
Incredible video explanations of 70 popular interview questions, by a former Google SWE. Highly recommended by our mentees: (⭐️⭐️⭐️⭐️⭐️) (Not free 💰)
Blog post full of interview questions:
Probably the best blog post on the Internet on how to write well (60-second read): (⭐️⭐️⭐️⭐️⭐️)
How to write a good cold outreach email when you're trying to get hired, by one of our own: (⭐️⭐️⭐️⭐️⭐️)
Grammarly, which automatically checks your writing:
Sapling.ai, better than Grammarly and especially good for email: (⭐️⭐️⭐️⭐️⭐️)
I'm working with a Machine Learning Engineer at American Express on a data science project that parses ingredients from user-taken images of food labels and provides Wikipedia links for them. I'm also considering adding NLP and recommendation systems to suggest alternative products for people with allergies. ( with mentor )
Designed and deployed a web application that allows users to upload an MP3 file and see a prediction of their age range. The app uses a model I built for Neurolex Labs. Used React, Flask, Docker, and AWS. ( with mentor )
Working with my mentor on a signal processing model that helps catch early signs of autism in young children and a multi-modal emotion classifier on the MELD dataset. ( with mentor )
Building a personalized food recommendation system by analyzing user genome data & nutrients in different food products. Combining data from various sources and using machine learning to process genome data to recommend healthy products. ( with mentor )
Used the gensim and nltk libraries for topic modeling on Twitter data. Working on loading the Twitter data into MongoDB and then processing and modeling it. The final goal is to bring the model to a production-level, deployable state using Docker. ( with mentor )
Working on a production-grade project to predict clustering of Bird’s electric scooter geolocations based on city features under the mentorship of Susan Holcomb, former Head of Data at Pebble. ( with mentor )
Creating new datasets to classify bots using the Twitter API. ( with mentor )
Data preprocessing, model building and evaluation, for an NLP-focused project. Working under the supervision of Arman Didandeh, Manager: Technology – Digital Integration (DSpace Innovation Lab) at Deloitte. ( with mentor )
Completing an exploratory analysis of the U.S. feature film industry under the direct mentorship of Betty Zhang, Data Scientist at Rubikloud Technologies. Collected an original dataset of over 2,000 films released between 2010 and 2018 through web scraping of sites like Box Office Mojo, Wikipedia, and the Online Movie Database API. Using SQL and Python to analyze differences in film profitability across multiple genres, perform time-series analysis of films’ domestic box office results, and measure the impact of certain key personnel (actors/directors) on various success metrics. ( with mentor )
Working on a production-grade project to predict calls to the fire department and their response time under the mentorship of Larkin Liu, Data Scientist at Loblaw Digital. Collected over 4.5 million call records received by the San Francisco Fire Department and currently integrating them with web-scraped historical and live weather data. ( with mentor )
Researches Automatic Term Extraction methods that are effective in indexing highly unusual, technical documents. Implements an automatic term extraction method using word vectors trained on the local corpus as well as a global corpus, using Python in conjunction with gensim, nltk, Keras, and TensorFlow. ( with mentor )
Acquired proprietary datasets by networking with the City of Boulder Open Data team. Built an FAQ chatbot with custom NLP and ML systems, cleaning and reformatting data. Deployed the chatbot on Google Cloud with Docker, Flask, and Dialogflow. ( with mentor )
Under the guidance of Larkin Liu (Data Scientist at Loblaw Digital), established the ability to design, deploy, and fine-tune a machine learning model to predict click-through rate with the goal of improving conversions. Demonstrated the ability to refactor existing codebases and deploy them to the cloud, reducing compute cost by 1/50. Exhibited the ability to engineer features and tune hyperparameters based on data analysis and statistical tests to improve the accuracy of predicting clicks by 2%. ( with mentor )