scrapbook
  • "Unorganized" Notes
  • The Best Public Datasets for Machine Learning and Data Science
  • Practice Coding
  • plaid-API project
  • Biotech
    • Machine Learning vs. Deep Learning
  • Machine Learning for Computer Graphics
  • Books (on GitHub)
  • Ideas/Thoughts
  • Ziva for feature animation: Stylized simulation and machine learning-ready workflows
  • Tools
  • 🪶math
    • Papers
    • Math for ML (coursera)
      • Linear Algebra
        • Wk1
        • Wk2
        • Wk3
        • Wk4
        • Wk5
      • Multivariate Calculus
    • Improving your Algorithms & Data Structure Skills
    • Algorithms
    • Algorithms (MIT)
      • Lecture 1: Algorithmic Thinking, Peak Finding
    • Algorithms (khan academy)
      • Binary Search
      • Asymptotic notation
      • Sorting
      • Insertion sort
      • Recursion
      • Solve Hanoi recursively
      • Merge Sort
      • Representing graphs
      • The breadth-first search algorithm
      • Breadth First Search in JavaScript
      • Breadth-first vs Depth-first Tree Traversal in Javascript
    • Algorithms (udacity)
      • Social Network
    • Udacity
      • Linear Algebra Refresher /w Python
    • math-notes
      • functions
      • differential calculus
      • derivative
      • extras
      • Exponentials & logarithms
      • Trigonometry
    • Probability (MIT)
      • Unit 1
        • Probability Models and Axioms
        • Mathematical background: Sets; sequences, limits, and series; (un)countable sets.
    • Statistics and probability (khan academy)
      • Analyzing categorical data
      • Describing and comparing distributions
      • Outliers Definition
      • Mean Absolute Deviation (MAD)
      • Modeling data distribution
      • Exploring bivariate numerical data
      • Study Design
      • Probability
      • Counting, permutations, and combinations
      • Binomial variables
        • Binomial Distribution
        • Binomial mean and standard deviation formulas
        • Geometric random variable
      • Central Limit Theorem
      • Significance Tests (hypothesis testing)
    • Statistics (hackerrank)
      • Mean, Medium, Mode
      • Weighted Mean
      • Quartiles
      • Standard Deviation
      • Basic Probability
      • Conditional Probability
      • Permutations & Combinations
      • Binomial Distribution
      • Negative Binomial
      • Poisson Distribution
      • Normal Distribution
      • Central Limit Theorem
      • Important Concepts in Bayesian Statistics
  • 📽️PRODUCT
    • Product Strategy
    • Product Design
    • Product Development
    • Product Launch
  • 👨‍💻coding
    • of any interest
    • Maya API
      • Python API
    • Python
      • Understanding Class Inheritance in Python 3
      • 100+ Python challenging programming exercises
      • coding
      • Iterables vs. Iterators vs. Generators
      • Generator Expression
      • Stacks (LIFO) / Queues (FIFO)
      • What does -1 mean in numpy reshape?
      • Fold Left and Right in Python
      • Flatten a nested list of lists
      • Flatten a nested dictionary
      • Traverse A Tree
      • How to Implement Breadth-First Search
      • Breadth First Search
        • Level Order Tree Traversal
        • Breadth First Search or BFS for a Graph
        • BFS for Disconnected Graph
      • Trees and Tree Algorithms
      • Graph and its representations
      • Graph Data Structure Interview Questions
      • Graphs in Python
      • GitHub Repo's
    • Python in CG Production
    • GLSL/HLSL Shading programming
    • Deep Learning Specialization
      • Neural Networks and Deep Learning
      • Untitled
      • Untitled
      • Untitled
    • TensorFlow for AI, ML, and DL
      • Google ML Crash Course
      • TensorFlow C++ API
      • TensorFlow - coursera
      • Notes
      • An Introduction to different Types of Convolutions in Deep Learning
      • One by One [ 1 x 1 ] Convolution - counter-intuitively useful
      • SqueezeNet
      • Deep Compression
      • An Overview of ResNet and its Variants
      • Introducing capsule networks
      • What is a CapsNet or Capsule Network?
      • Xception
      • TensorFlow Eager
    • GitHub
      • Project README
    • Agile - User Stories
    • The Open-Source Data Science Masters
    • Coding Challenge Websites
    • Coding Interview
      • leetcode python
      • Data Structures
        • Arrays
        • Linked List
        • Hash Tables
        • Trees: Basic
        • Heaps, Stacks, Queues
        • Graphs
          • Shortest Path
      • Sorting & Searching
        • Depth-First Search & Breadth-First Search
        • Backtracking
        • Sorting
      • Dynamic Programming
        • Dynamic Programming: Basic
        • Dynamic Programming: Advanced
    • spaCy
    • Pandas
    • Python Packages
    • Julia
      • jupyter
    • macos
    • CPP
      • Debugging
      • Overview of memory management problems
      • What are lvalues and rvalues?
      • The Rule of Five
      • Concurrency
      • Avoiding Data Races
      • Mutex
      • The Monitor Object Pattern
      • Lambdas
      • Maya C++ API Programming Tips
      • How can I read and parse CSV files in C++?
      • Cpp NumPy
    • Advanced Machine Learning
      • Wk 1
      • Untitled
      • Untitled
      • Untitled
      • Untitled
  • data science
    • Resources
    • Tensorflow C++
    • Computerphile
      • Big Data
    • Google ML Crash Course
    • Kaggle
      • Data Versioning
      • The Basics of Rest APIs
      • How to Make an API
      • How to deploying your API
    • Jupiter Notebook Tips & Tricks
      • Jupyter
    • Image Datasets Notes
    • DS Cheatsheets
      • Websites & Blogs
      • Q&A
      • Strata
      • Data Visualisation
      • Matplotlib etc
      • Keras
      • Spark
      • Probability
      • Machine Learning
        • Fast Computation of AUC-ROC score
    • Data Visualisation
    • fast.ai
      • deep learning
      • How to work with Jupyter Notebook on a remote machine (Linux)
      • Up and Running With Fast.ai and Docker
      • AWS
    • Data Scientist
    • ML for Beginners (Video)
    • ML Mastery
      • Machine Learning Algorithms
      • Deep Learning With Python
    • Linear algebra cheat sheet for deep learning
    • DL_ML_Resources
    • Awesome Machine Learning
    • web scraping
    • SQL Style Guide
    • SQL - Tips & Tricks
  • 💡Ideas & Thoughts
    • Outdoors
    • Blog
      • markdown
      • How to survive your first day as an On-set VFX Supervisor
    • Book Recommendations by Demi Lee
  • career
    • Skills
    • learn.co
      • SQL
      • Distribution
      • Hypothesis Testing Glossary
      • Hypothesis Tests
      • Hypothesis & AB Testing
      • Combinatorics Continued and Maximum Likelihood Estimation
      • Bayesian Classification
      • Resampling and Monte Carlo Simulation
      • Extensions To Linear Models
      • Time Series
      • Distance Metrics
      • Graph Theory
      • Logistic Regression
      • MLE (Maximum Likelihood Estimation)
      • Gradient Descent
      • Decision Trees
      • Ensemble Methods
      • Spark
      • Machine Learning
      • Deep Learning
        • Backpropagation - math notation
        • PRACTICE DATASETS
        • Big Data
      • Deep Learning Resources
      • DL Datasets
      • DL Tutorials
      • Keras
      • Word2Vec
        • Word2Vec Tutorial Part 1 - The Skip-Gram Model
        • Word2Vec Tutorial Part 2 - Negative Sampling
        • An Intuitive Explanation of Convolutional Neural Networks
      • Mod 4 Project
        • Presentation
      • Mod 5 Project
      • Capstone Project Notes
        • Streaming large training and test files into Tensorflow's DNNClassifier
    • Carrier Prep
      • The Job Search
        • Building a Strong Job Search Foundation
        • Key Traits of Successful Job Seekers
        • Your Job Search Mindset
        • Confidence
        • Job Search Action Plan
        • CSC Weekly Activity
        • Managing Your Job Search
      • Your Online Presence
        • GitHub
      • Building Your Resume
        • Writing Your Resume Summary
        • Technical Experience
      • Effective Networking
        • 30 Second Elevator Pitch
        • Leveraging Your Network
        • Building an Online Network
        • Linkedin For Research And Networking
        • Building An In-Person Network
        • Opening The Line Of Communication
      • Applying to Jobs
        • Applying To Jobs Online
        • Cover Letters
      • Interviewing
        • Networking Coffees vs Formal Interviews
        • The Coffee Meeting/ Informational Interview
        • Communicating With Recruiters And HR Professional
        • Research Before an Interview
        • Preparing Questions for Interviews
        • Phone And Video/Virtual Interviews
        • Cultural/HR Interview Questions
        • The Salary Question
        • Talking About Apps/Projects You Built
        • Sending Thank You's After an Interview
      • Technical Interviewing
        • Technical Interviewing Formats
        • Code Challenge Best Practices
        • Technical Interviewing Resources
      • Communication
        • Following Up
        • When You Haven't Heard From an Employer
      • Job Offers
        • Approaching Salary Negotiations
      • Staying Current in the Tech Industry
      • Module 6 Post Work
      • Interview Prep
  • projects
    • Text Classification
    • TERRA-REF
    • saildrone
  • Computer Graphics
  • AI/ML
  • 3deeplearning
    • Fast and Deep Deformation Approximations
    • Compress and Denoise MoCap with Autoencoders
    • ‘Fast and Deep Deformation Approximations’ Implementation
    • Running a NeuralNet live in Maya in a Python DG Node
    • Implement a Substance like Normal Map Generator with a Convolutional Network
    • Deploying Neural Nets to the Maya C++ API
  • Tools/Plugins
  • AR/VR
  • Game Engine
  • Rigging
    • Deformer Ideas
    • Research
    • brave rabbit
    • Useful Rigging Links
  • Maya
    • Optimizing Node Graph for Parallel Evaluation
  • Houdini
    • Stuff
    • Popular Built-in VEX Attributes (Global Variables)
Powered by GitBook
On this page
  • Understanding data races
  • Passing data to a thread by value
  • Overwriting the copy constructor
  • Passing data using move semantics
  • Move semantics and uniqueness
  1. coding
  2. CPP

Avoiding Data Races

PreviousConcurrencyNextMutex

Last updated 3 years ago

Understanding data races

One of the primary sources of error in concurrent programming are data races. They occur, when two concurrent threads are accessing the same memory location while at least one of them is modifying (the other thread might be reading or modifying). In this scenario, the value at the memory location is completely undefined. Depending on the system scheduler, the second thread will be executed at an unknown point in time and thus see different data at the memory location with each execution. Depending on the type of program, the result might be anything from a crash to a security breach when data is read by a thread that was not meant to be read, such as a user password or other sensitive information. Such an error is called a „data race“ because two threads are racing to get access to a memory location first, with the content at the memory location depending on the result of the race.

The following diagram illustrates the principle: One thread wants to increment a variable x, whereas the other thread wants to print the same variable. Depending on the timing of the program and thus the order of execution, the printed result might change each time the program is executed.

In this example, one safe way of passing data to a thread would be to carefully synchronize the two threads using either join() or the promise-future concept that can guarantee the availability of a result. Data races are always to be avoided. Even if nothing bad seems to happen, they are a bug and should always be treated as such. Another possible solution for the above example would be to make a copy of the original argument and pass the copy to the thread, thereby preventing the data race.

Passing data to a thread by value

In the following example, an instance of the proprietary class Vehicle is created and passed to a thread by value, thus making a copy of it.

Note that the class Vehicle has a default constructor and an initializing constructor. In the main function, when the instances v0 and v1 are created, each constructor is called respectively. Note that v0 is passed by value to a Lambda, which serves as the thread function for std::async. Within the Lambda, the id of the Vehicle object is changed from the default (which is 0) to a new value 2. Note that the thread execution is paused for 500 milliseconds to guarantee that the change is performed well after the main thread has proceeded with its execution.

In the main thread, immediately after starting up the worker thread, the id of v0 is changed to 3. Then, after waiting for the completion of the thread, the vehicle id is printed to the console. In this program, the output will always be the following:

Passing data to a thread in this way is a clean and safe method as there is no danger of a data race - at least when atomic data types such as integers, doubles, chars or booleans are passed.

When passing a complex data structure however, there are sometimes pointer variables hidden within, that point to a (potentially) shared data buffer - which might cause a data race even though the programmer believes that the copied data will effectively preempt this. The next example illustrates this case by adding a new member variable to the Vehicle class, which is a pointer to a string object, as well as the corresponding getter and setter functions.

The output of the program looks like this:

The basic program structure is mostly identical to the previous example with the object v0 being copied by value when passed to the thread function. This time however, even though a copy has been made, the original object v0 is modified, when the thread function sets the new name. This happens because the member _name is a pointer to a string and after copying, even though the pointer variable has been duplicated, it still points to the same location as its value (i.e. the memory location) has not changed. Note that when the delay is removed in the thread function, the console output varies between "Vehicle 2" and "Vehicle 3", depending on the system scheduler. Such an error might go unnoticed for a long time. It could show itself well after a program has been shipped to the client - which is what makes this error type so treacherous.

Classes from the standard template library usually implement a deep copy behavior by default (such as std::vector). When dealing with proprietary data types, this is not guaranteed. The only safe way to tell whether a data structure can be safely passed is by looking at its implementation: Does it contain only atomic data types or are there pointers somewhere? If this is the case, does the data structure implement the copy constructor (and the assignment operator) correctly? Also, if the data structure under scrutiny contains sub-objects, their respective implementation has to be analyzed as well to ensure that deep copies are made everywhere.

Unfortunately, one of the primary concepts of object-oriented programming - information hiding - often prevents us from looking at the implementation details of a class - we can only see the interface, which does not tell us what we need to know to make sure that an object of the class may be safely passed by value.

Overwriting the copy constructor

The problem with passing a proprietary class is that the standard copy constructor makes a 1:1 copy of all data members, including pointers to objects. This behavior is also referred to as "shallow copy". In the above example we would have liked (and maybe expected) a "deep copy" of the object though, i.e. a copy of the data to which the pointer refers. A solution to this problem is to create a proprietary copy constructor in the class Vehicle. The following piece of code overwrites the default copy constructor and can be modified to make a customized copy of the data members.

Expanding on the code example from above, please implement the code required for a deep copy so that the program always prints "Vehicle 3" to the console, regardless of the delay within the thread function.

Passing data using move semantics

Even though a customized copy constructor can help us to avoid data races, it is also time (and memory) consuming. In the following, we will use move semantics to implement a more effective way of safely passing data to a thread.

A move constructor enables the resources owned by an rvalue object to be moved into an lvalue without physically copying it. Rvalue references support the implementation of move semantics, which enables the programmer to write code that transfers resources (such as dynamically allocated memory) from one object to another.

To make use of move semantics, we need to provide a move constructor (and optionally a move assignment operator). Copy and assignment operations whose sources are rvalues automatically take advantage of move semantics. Unlike the default copy constructor however, the compiler does not provide a default move constructor.

To define a move constructor for a C++ class, the following steps are required:

  1. Define an empty constructor method that takes an rvalue reference to the class type as its parameter

  2. In the move constructor, assign the class data members from the source object to the object that is being constructed

  3. Assign the data members of the source object to default values.

When launching the thread, the Vehicle object v0 can be passed using std::move() - which calls the move constructor and invalidates the original object v0 in the main thread.

Move semantics and uniqueness

As with the above-mentioned copy constructor, passing by value is usually safe - provided that a deep copy is made of all the data structures within the object that is to be passed. With move semantics, we can additionally use the notion of uniqueness to prevent data races by default. In the following example, a unique_pointer instead of a raw pointer is used for the string member in the Vehicle class.

As can be seen, the std::string has now been changed to a unique pointer, which means that only a single reference to the memory location it points to is allowed. Accordingly, the move constructor transfers the unique pointer to the worker by using std::move and thus invalidates the pointer in the main thread. When calling v0.getName(), an exception is thrown, making it clear to the programmer that accessing the data at this point is not permissible - which is the whole point of using a unique pointer here as a data race will now be effectively prevented.

The point of this example has been to illustrate that move semantics on its own is not enough to avoid data races. The key to thread safety is to use move semantics in conjunction with uniqueness. It is the responsibility of the programmer to ensure that pointers to objects that are moved between threads are unique.

👨‍💻
Image
Image
Image