
Resources


Last updated 4 years ago

Notes from SharpestMinds

Links

A list of links we find useful, divided up by categories. If you want to suggest one, message an admin on Slack!

Technical topics

Tutorials

  • Proper folder structure for a data science project: https://drivendata.github.io/cookiecutter-data-science/ (⭐️⭐️⭐️⭐️⭐️)
  • Web scraping: https://automatetheboringstuff.com/chapter11/ (⭐️⭐️⭐️⭐️⭐️)
  • Data pipeline tools: https://data-flair.training/blogs/
  • Training neural networks: https://karpathy.github.io/2019/04/25/recipe/ (⭐️⭐️⭐️⭐️⭐️)
  • Google Cloud Platform (GCP): https://www.coursera.org/learn/gcp-big-data-ml-fundamentals
  • Great visual tutorial on statistics: http://seeing-theory.brown.edu (⭐️⭐️⭐️⭐️⭐️)
  • Highly recommended set of math and stats videos, which make great interview prep: https://www.youtube.com/channel/UCUcpVoi5KkJmnE3bvEhHR0Q
  • Excellent conceptual treatment of data science and ML (textbook): http://www-bcf.usc.edu/~gareth/ISL/ (⭐️⭐️⭐️⭐️⭐️)
  • A good resource for learning Python: https://greenteapress.com/wp/think-python/
  • Online data science Master's (free): https://github.com/datasciencemasters/go
  • Jupyter notebooks for everything: https://github.com/donnemartin/data-science-ipython-notebooks
  • On the NLP side, here are a bunch of great links for multiclass classification with BERT:
    • https://www.kaggle.com/thebrownviking20/bert-multiclass-classification
    • https://github.com/karta282950/bert-multiclass
    • https://towardsdatascience.com/building-a-multi-label-text-classifier-using-bert-and-tensorflow-f188e0ecdc5d
    • https://medium.com/huggingface/multi-label-text-classification-using-bert-the-mighty-transformer-69714fa3fb3d

SQL

  • https://websitesetup.org/sql-cheat-sheet/
  • https://sqlzoo.net
  • https://mode.com/sql-tutorial/
  • https://www.codewars.com/?language=sql
  • https://confident-goldberg-55d959.netlify.com/

GANs

  • The original GAN tutorial by Ian Goodfellow is a great one for an intro: https://www.youtube.com/watch?v=HGYYEUSm-0Q

Reinforcement learning

  • The tutorials by Thomas Simonini are very good and free: https://simoninithomas.github.io/Deep_reinforcement_learning_Course/
  • The RL book by Richard Sutton is a must-read if you want to study the subject seriously: http://incompleteideas.net/book/RLbook2018.pdf

Cheat sheets

  • Big list of data science cheat sheets: https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-science-pdf-f22dc900d2d7
  • Deep learning cheat sheet: https://github.com/afshinea/stanford-cs-230-deep-learning
  • Data visualization cheat sheet: https://datavizcatalogue.com
  • Not exactly a cheat sheet, but a lot of mentees strongly recommend Anki for helping them remember important concepts: https://apps.ankiweb.net
  • Mentor Ray Phan kindly created these flash cards for machine learning interview prep: https://drive.google.com/file/d/12PImST4J6gJr9cjx-P2QPze7dgD3OCR_/view?usp=sharing
  • Another good set of ML flash cards: https://drive.google.com/file/d/1aRHhw-uOj5CDFVsHlG4lEBQd6X2yUe7g/view?usp=sharing

Datasets

  • Searchable list of datasets for machine learning: https://www.datasetlist.com
  • List of datasets to practice NLP: https://machinelearningmastery.com/datasets-natural-language-processing/
  • GitHub repo of datasets for NLP: https://github.com/niderhoff/nlp-datasets
  • Google Dataset Search: https://toolbox.google.com/datasetsearch (⭐️⭐️⭐️⭐️⭐️)

Tools

  • Tool to quickly label your data: https://labelbox.com (⭐️⭐️⭐️⭐️⭐️)
  • An amazing company that turns any website into an API so you don't need to scrape it (mostly): https://dashblock.com (⭐️⭐️⭐️⭐️⭐️)
  • Data version control, like git but for datasets and models: https://dvc.org/
  • Build autoscaling AWS infrastructure with visual diagrams: https://cloudcraft.co

Applying & interviewing

Application strategy

  • This is a great Medium publication, fully devoted to crushing the ML/AI application and interview process: https://medium.com/acing-ai
  • Colorful description of the strategy one person took to find a job in AI: https://blog.usejournal.com/what-i-learned-from-interviewing-at-multiple-ai-companies-and-start-ups-a9620415e4cc
  • Must-read if you're trying to get a job at a FAANG or similar big company: https://blog.usejournal.com/i-interviewed-at-six-top-companies-in-silicon-valley-in-six-days-and-stumbled-into-six-job-offers-fe9cc7bbc996 (⭐️⭐️⭐️⭐️⭐️)
  • A fantastic strategic document for the technical interview process. Not ML-specific, but most of its tips will still apply: https://yangshun.github.io/tech-interview-handbook/introduction/ (⭐️⭐️⭐️⭐️⭐️)
  • Comprehensive instructions on how to prep for technical interviews in ML: https://github.com/ShuaiW/data-science-question-answer (⭐️⭐️⭐️⭐️⭐️)

Interview questions

  • A Google Doc by our very own Amber Teng listing interview questions on all kinds of different topics: https://docs.google.com/document/d/1xmb5CLm4CZarhpThgB6PCn8vKS2UzcNsAticWxXBZ94 (⭐️⭐️⭐️⭐️⭐️)
  • List of ML interview questions: https://www.springboard.com/blog/machine-learning-interview-questions/
  • Google AI interview questions: https://medium.com/acing-ai/google-ai-interview-questions-acing-the-ai-interview-1791ad7dc3ae
  • How to bomb an interview and still get the job: https://datasciencecareermap.com/2019/06/06/how-to-bomb-an-interview-and-still-get-the-job/ (⭐️⭐️⭐️⭐️⭐️)
  • Massive list of top-notch interview questions: https://www.amazon.com/Heard-Data-Science-Interviews-Interview/dp/1727287320 (⭐️⭐️⭐️⭐️⭐️)
  • Not interview questions per se, but an article on topics that very often come up in interviews. Strongly recommended by our mentors: https://towardsdatascience.com/lessons-from-how-to-lie-with-statistics-57060c0d2f19 (⭐️⭐️⭐️⭐️⭐️)
  • Incredible video explanations of 70 popular interview questions, by a former Google SWE. Highly recommended by our mentees: https://www.algoexpert.io/product (⭐️⭐️⭐️⭐️⭐️) (Not free 💰)
  • Blog post full of interview questions: https://blog-datasciencedojo-com.cdn.ampproject.org/c/s/blog.datasciencedojo.com/data-science-interview-questions/amp/

Good writing

  • Probably the best blog post on the Internet on how to write well (60 second read): https://dilbertblog.typepad.com/the_dilbert_blog/2007/06/the_day_you_bec.html (⭐️⭐️⭐️⭐️⭐️)
  • How to write a good cold outreach email when you're trying to get hired, by our very own Susan Holcomb: https://datasciencecareermap.com/2019/05/09/how-to-write-a-cold-email/ (⭐️⭐️⭐️⭐️⭐️)
  • Grammarly, which automatically checks your writing: https://www.grammarly.com
  • Sapling.ai, better than Grammarly and especially good for email: https://sapling.ai (⭐️⭐️⭐️⭐️⭐️)

Projects list

A partial list of the projects SharpestMinds students have built or are building. To be used for inspiration!

This list is under construction 🏗

If you'd like to update a project or add yours, let an admin know on Slack, or submit a pull request.

All projects

  • Pranavraj Thilagaraj: I'm working on this data science project, to parse out ingredients from user-taken images of food labels and provide Wikipedia links, with a Machine Learning Engineer at American Express. I had this thought of further implementing some NLP and recommendation systems to provide alternate options for products for people with allergies. (with mentor (Andy) Kok-Leong Seow)

  • Rajesh Singh: Designed and deployed a web application that allows users to upload an MP3 file and see a prediction of their age range. The app uses a model I built for Neurolex Labs. Used React, Flask, Docker, and AWS. (with mentor (Andy) Kok-Leong Seow)

  • Dyllan McCreary: Working with my mentor on a signal processing model that helps catch early signs of autism in young children and a multi-modal emotion classifier on the MELD dataset. (with mentor Joe Papa)

  • Atishay Jain: Building a personalized food recommendation system by analyzing user genome data and nutrients in different food products. Combining data from various sources and using machine learning to process genome data to recommend healthy products. (with mentor Oren Shklarsky)

  • Sahana Adiga: Used the gensim and nltk libraries for topic modeling on Twitter data. Working on getting the Twitter data into MongoDB and then processing and modeling it. The final step is to bring the model to a production-level deployment with Docker. (with mentor Kiran Mantripragada)

  • Perry Johnson: Working on a production-grade project to predict clustering of Bird’s electric scooter geolocations based on city features under the mentorship of Susan Holcomb, former Head of Data at Pebble. (with mentor Susan Holcomb)

  • Ganesh Nomula: Creating new datasets to classify bots using the Twitter API. (with mentor Susan Holcomb)

  • Bety E. Rodriguez-Milla: Data preprocessing, model building and evaluation for an NLP-focused project. Working under the supervision of Arman Didandeh, Manager: Technology – Digital Integration (DSpace Innovation Lab) at Deloitte. (with mentor Arman Didandeh)

  • Will Barker: Completing an exploratory analysis of the U.S. feature film industry under the direct mentorship of Betty Zhang, Data Scientist at Rubikloud Technologies. Collected an original dataset of over 2,000 films released between 2010 and 2018 through web scraping of sites like Box Office Mojo, Wikipedia, and the Online Movie Database API. Using SQL and Python to analyze differences in film profitability across genres, perform time-series analysis of films’ domestic box office, and measure the impact of certain key personnel (actors/directors) on various success metrics. (with mentor Betty Zhang)

  • Shashank Badavanahalli Rajashekar: Working on a production-grade project to predict calls to the fire department and their response time under the mentorship of Larkin Liu, Data Scientist at Loblaw Digital. Collected over 4.5 million call records received by the San Francisco Fire Department and currently integrating this with web-scraped historical and live weather data. (with mentor Larkin Liu)

  • Alec Robinson: Researches Automatic Term Extraction methods that are effective in indexing highly unusual, technical documents. Implements an automatic term extraction method using word vectors trained on the local corpus as well as a global corpus, using Python in conjunction with gensim, nltk, Keras, and TensorFlow. (with mentor Ehsan Amjadian)

  • Will Scott: Acquired proprietary datasets by networking with the City of Boulder Open Data team. Built an FAQ chatbot with custom NLP and ML systems, cleaning and reformatting data. Deployed the chatbot on Google Cloud with Docker, Flask, and Dialogflow. (with mentor Sowmya Vajjala)

  • Pierre Damiba: Under the guidance of Larkin Liu (Data Scientist at Loblaw Digital), established the ability to design, deploy, and fine-tune a machine learning model to predict click-through rate with the goal of improving conversions. Demonstrated the ability to refactor existing codebases and deploy them to the cloud reducing compute cost by 1/50. Exhibited the ability to engineer features and tune hyperparameters based on data analysis and statistical tests to improve the accuracy of predicting clicks by 2%. (with mentor Larkin Liu)
MIT Deep Learning 6.S191
pix2pix: Image-to-image translation with a conditional GAN | TensorFlow Core