Multicore Computing: Cores, Threads, And Parallelism


Why Multicore?

  • In a multithreaded environment, achieving higher performance is the main reason to move to multicore processors.
  • The added complexity of software development must be managed with specialized knowledge and training.
  • Managing hardware parallelism is the main challenge in achieving scalable performance, so multicore platforms aim to reduce this complexity and relieve the burden of manual thread management.

Cores and Threads

  • A core is the part of the processor that executes instructions.
  • There are two kinds of threads. A software thread is a stream of instructions submitted to the processor for execution. A hardware thread is the set of hardware resources (for example, a logical CPU) used to execute a software thread.

Data Parallelism

In data parallelism, the same operation is applied to each element of a data set; because the element-wise operations are independent, they can be performed in parallel.
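
As a minimal sketch of this idea (assuming an OpenMP-capable compiler; the squaring operation is only an illustration), the same operation is applied to every element of an array and the independent element-wise operations run in parallel:

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double data[N];

    /* The same operation is applied to each element; the iterations are
       independent, so they can execute in parallel on the available cores. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] = data[i] * data[i];

    printf("data[0] = %f\n", data[0]);
    return 0;
}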

The same data set can also be processed by several tasks that are independent of each other (task parallelism).


Pipelining combines task and data parallelism: a stream of data flows through several independent stages, each data element passes through every stage, and different items can occupy different stages at the same time.

Scalability

A parallel system's scalability is its capacity to keep the speedup proportional to the total number of processors as more processors are added.

Amdahl's Law

The speedup of a program using multiple processors in parallel computing is limited by the time needed for the serial fraction of the problem.

If a problem of size W has a serial component Ws, the speedup of the program on p processors is

    S = W / (Ws + (W - Ws)/p)

Assume that Ws = 20% of W and W - Ws = 80% of W. Then

    S = 1 / (0.2 + 0.8/p), which approaches 5 as p grows.

Thus Amdahl's Law implies that the speedup in this case can never exceed 5, no matter how many processors are used, so parallel computing appears worthwhile only for a small number of processors.

Gustafson's Law counters this pessimistic conclusion: when the problem size grows with the number of processors, the parallel portion scales as well and efficient speedup remains achievable. The scaled speedup is

    S = α + p(1 - α) = p - α(p - 1)

where p represents the total number of processors and α represents the serial portion of the execution time. Larger problem sizes therefore give improved speedup.

The execution time of the program on a parallel computer is (a + b), where

  • a is the sequential time and b is the parallel time.
  • The total amount of work to be done in parallel varies linearly with the number of processors, so b stays fixed as p is varied; the same work on a serial computer would take (a + p*b).

The speedup is therefore (a + p*b)/(a + b). Defining α = a/(a + b), the sequential fraction of the execution time,

    S = (a + p*b)/(a + b) = α + p(1 - α)

A scalable parallel system can always be made cost-optimal by adjusting the number of processors and the problem size.

In a sequential program the order of operations is fixed, but in parallel programming the precise order of operations can differ from run to run. This leads to non-deterministic behavior of the result, such as varying round-off errors, deadlocks, and race conditions.

Synchronization is a mechanism used to control concurrent access to shared resources.

Race conditions

A race condition occurs when multiple tasks read from and write to the same memory location concurrently. The usual cause of a race condition is improper synchronization.
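
A minimal sketch of a race condition (assuming an OpenMP compiler): several threads increment the same shared counter without synchronization, so updates are lost and the final value is unpredictable.

#include <stdio.h>
#include <omp.h>

int main(void) {
    long counter = 0;

    /* Many threads read and write the same memory location (counter)
       without synchronization: a race condition. */
    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++)
        counter++;                 /* unsynchronized read-modify-write */

    /* Usually prints a value smaller than 1000000; protecting the update
       with an atomic or critical construct would remove the race. */
    printf("counter = %ld\n", counter);
    return 0;
}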

MPI

MPI stands for Message Passing Interface and is widely used for parallel computing on distributed-memory systems. MPI specifies both a protocol and the semantics of its operations.
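
A minimal MPI program, shown as a sketch (compile with mpicc, run with mpiexec), illustrating the basic structure every MPI program shares:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;

    MPI_Init(&argc, &argv);                   /* start up MPI              */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);  /* this process's rank       */
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);  /* total number of processes */

    printf("Hello from process %d of %d\n", my_rank, comm_sz);

    MPI_Finalize();                           /* shut down MPI             */
    return 0;
}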

Broadcast

Data held by one process can be sent to all processes in a communicator using a broadcast.

Example:

void Get_input(
    int     my_rank,
    int     comm_sz,
    double* a_p,
    double* b_p,
    int*    n_p) {

    if (my_rank == 0) {          /* Process 0 reads the user input ... */
        printf("Enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
    }
    /* ... and broadcasts it to all other processes */
    MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD);
}

Tags and Wildcards

  • MPI_ANY_SOURCE – allows a receive to match a message from any sender, which helps avoid unnecessary waiting.
  • MPI_ANY_TAG – allows a receive to match a message carrying any tag (a usage sketch follows the list).
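
A hedged usage sketch (the values and tags are arbitrary illustrations): every non-zero process sends one value to process 0, which accepts the messages in whatever order they arrive by using the wildcards.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank != 0) {
        double value = 10.0 * my_rank;
        MPI_Send(&value, 1, MPI_DOUBLE, 0, my_rank /* tag */, MPI_COMM_WORLD);
    } else {
        double value;
        MPI_Status status;
        /* Accept the comm_sz - 1 messages in any arrival order. */
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(&value, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("Got %.1f from process %d (tag %d)\n",
                   value, status.MPI_SOURCE, status.MPI_TAG);
        }
    }

    MPI_Finalize();
    return 0;
}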

MPI_Reduce

A lengthy sequence of sends and receives can be replaced by a single collective invocation. The syntax is given below; a usage sketch follows the prototype.

MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

Prototype:

int MPI_Reduce(
    void*        input_data_p   /* in  */,
    void*        output_data_p  /* out */,
    int          count          /* in  */,
    MPI_Datatype datatype       /* in  */,
    MPI_Op       operator       /* in  */,
    int          dest_process   /* in  */,
    MPI_Comm     comm           /* in  */
);
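
A hedged usage sketch (the local values are arbitrary): each process contributes a local partial result, and MPI_Reduce combines them with MPI_SUM into a single total on process 0.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    double local_int = 1.0 / (my_rank + 1);   /* this process's partial result */
    double total_int = 0.0;

    /* Combine all local values with MPI_SUM; the result lands on process 0. */
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("Global sum = %f\n", total_int);

    MPI_Finalize();
    return 0;
}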

MPI_Scatter

This function takes the entire vector on the source process and delivers to each process exactly the block of components it requires. The syntax is given below; a usage sketch follows the prototype.

int MPI_Scatter(
    void*        send_buf_p,  /* pointer to the data to divide        */
    int          send_count,  /* number of elements sent per process  */
    MPI_Datatype send_type,
    void*        recv_buf_p,  /* pointer to a local vector            */
    int          recv_count,  /* local_n (size of local vector)       */
    MPI_Datatype recv_type,
    int          src_proc,    /* source of the data                   */
    MPI_Comm     comm );
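
A hedged usage sketch (local_n = 4 is an arbitrary block size): process 0 owns a vector of comm_sz * local_n elements and scatters one block of local_n elements to every process.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    const int local_n = 4;          /* elements per process */
    double *a = NULL;
    double local_a[4];

    if (my_rank == 0) {             /* only the source process owns the full vector */
        a = malloc(comm_sz * local_n * sizeof(double));
        for (int i = 0; i < comm_sz * local_n; i++) a[i] = i;
    }

    /* Each process receives its own block of local_n elements. */
    MPI_Scatter(a, local_n, MPI_DOUBLE,
                local_a, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("Process %d received %.0f .. %.0f\n",
           my_rank, local_a[0], local_a[local_n - 1]);

    if (my_rank == 0) free(a);
    MPI_Finalize();
    return 0;
}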

MPI_Gather

Scattering the vector is only half of the job; the results must also be gathered back, which is achieved with MPI_Gather. The syntax is given below; a usage sketch follows the prototype.

int MPI_Gather(
    void*        send_buf_p,  /* data to send           */
    int          send_count,
    MPI_Datatype send_type,
    void*        recv_buf_p,  /* data to receive/gather */
    int          recv_count,
    MPI_Datatype recv_type,
    int          dest_proc,
    MPI_Comm     comm );
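
A hedged usage sketch, the mirror image of the scatter above: every process fills its local block and MPI_Gather assembles the full vector on process 0.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    const int local_n = 4;
    double local_a[4];
    for (int i = 0; i < local_n; i++)
        local_a[i] = my_rank * local_n + i;   /* this process's block */

    double *a = NULL;
    if (my_rank == 0)
        a = malloc(comm_sz * local_n * sizeof(double));

    /* Collect every process's block into the full vector on process 0. */
    MPI_Gather(local_a, local_n, MPI_DOUBLE,
               a, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        printf("Gathered %d elements on process 0\n", comm_sz * local_n);
        free(a);
    }

    MPI_Finalize();
    return 0;
}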

Data consolidation

Data consolidation is the integration of data from multiple sources into a single source.

MPI_Barrier

A barrier is a synchronization point in the program: each process that calls MPI_Barrier is suspended until every process in the communicator has reached the barrier, after which all of them resume. Its main purpose is to improve the correctness of the program by preventing data races between program phases.
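
A minimal sketch of a barrier separating two phases of work: no process starts phase 2 until every process has finished phase 1.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    printf("Process %d: phase 1\n", my_rank);   /* local work */

    /* No process continues past this point until all processes reach it. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("Process %d: phase 2\n", my_rank);

    MPI_Finalize();
    return 0;
}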

OpenMP

Main directives

In the Fortran form, directives are written as a directive/end-directive pair. The syntax is given below.

!$OMP  directive

    [ structured block of code ]

!$OMP end  directive

The format of C/C++ directives is given below.

Example: #pragma omp parallel
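
A minimal complete C example of the directive above (a sketch; it assumes an OpenMP-capable compiler, e.g. gcc with -fopenmp):

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* The structured block below is executed by every thread in the team. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        printf("Hello from thread %d of %d\n", id, nthreads);
    }   /* implicit barrier at the end of the parallel region */

    return 0;
}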

Task based programming

When a thread encounters a task construct, a new task is generated from the code in the structured block. The task's data environment is created according to the data-sharing attribute clauses. The encountering thread may execute the task immediately, or the task may be deferred and executed later by any thread in the team. The syntax is given below, followed by a short sketch.


#pragma omp task [clause, clause, …]

structured-block
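
A hedged sketch of task-based programming (the linked list and the process() function are illustrative only): one thread walks the list and creates a task per node, while the other threads in the team execute the tasks.

#include <stdio.h>
#include <omp.h>

typedef struct node { int value; struct node *next; } node;

static void process(node *p) { printf("processing %d\n", p->value); }

void traverse(node *head) {
    #pragma omp parallel               /* create a team of threads         */
    #pragma omp single                 /* one thread generates the tasks   */
    for (node *p = head; p != NULL; p = p->next) {
        #pragma omp task firstprivate(p)   /* each node becomes a task     */
        process(p);
    }                                  /* implicit barrier: all tasks done */
}

int main(void) {
    node c = {3, NULL}, b = {2, &c}, a = {1, &b};
    traverse(&a);
    return 0;
}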

Mutual Exclusion

Mutual exclusion is a mechanism that ensures concurrent threads do not access the same resource (critical section) at the same time.

Example (Peterson's algorithm for two processes P0 and P1):

P0: flag[0] = true;            | P1: flag[1] = true;
    turn = 1;                  |     turn = 0;
    while (flag[1] == true     |     while (flag[0] == true
           && turn == 1)       |            && turn == 0)
    { /* busy wait */ }        |     { /* busy wait */ }
    /* critical section */     |     /* critical section */
    /* end of critical section */ |  /* end of critical section */
    flag[0] = false;           |     flag[1] = false;

P0 and P1 can never be in the critical section at the same time.

Locks

Locking is used to control which thread may access a shared resource. A lock gives a thread exclusive access to a variable or data structure, and locks are necessary for ensuring the correct behaviour of multithreaded programs.

Eg:

omp_lock_t writelock;

omp_init_lock(&writelock);

#pragma omp parallel for
for (i = 0; i < x; i++)
{
    // some stuff
    omp_set_lock(&writelock);
    // one thread at a time stuff
    omp_unset_lock(&writelock);
    // some stuff
}

omp_destroy_lock(&writelock);

Memory Consistency

To obtain better efficiency, the processor can reorder memory reads and writes. This reordering is described by weak memory consistency models, and the main issue is that algorithms and programs that assume a strict ordering of memory operations may no longer execute correctly under a weak consistency model. The Peterson algorithm shown above, in which processes P0 and P1 execute concurrently, is the standard example: its correctness depends on the order of the flag and turn accesses, so on weak consistency models that ordering has to be enforced explicitly.

Parallel for and data dependencies

OpenMP can parallelize for loops, but not while or do-while loops. The syntax is given below.

#pragma omp parallel for [clauses]

for_statement

// execute for_statement in parallel

When the computation in one iteration depends on the results of previous iterations, the loop has data dependencies (loop-carried dependencies) and cannot be parallelized safely.

Example:

fibo[0] = fibo[1] = 1;
#pragma omp parallel for num_threads(thread_count)
for (i = 2; i < n; i++)
    fibo[i] = fibo[i-1] + fibo[i-2];
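
The loop above carries a dependence (fibo[i] needs fibo[i-1] and fibo[i-2]), so parallelizing it gives wrong results. By contrast, a loop whose iterations are independent parallelizes safely; a minimal sketch:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double x[N], y[N], a = 2.0;
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

    /* No iteration reads a value written by another, so this is safe. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[N-1] = %f\n", y[N - 1]);
    return 0;
}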

Parallel sorting

Sorting is one of the classical tasks in computing: it rearranges numerical values into a specified order.
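
One classical parallel sort is odd-even transposition sort; a hedged OpenMP sketch is shown below (within each phase the compare-exchange operations touch disjoint pairs, so they can run in parallel).

#include <stdio.h>
#include <omp.h>

/* Odd-even transposition sort: n phases; each phase's swaps are independent. */
void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; phase++) {
        int start = (phase % 2 == 0) ? 1 : 2;
        #pragma omp parallel for
        for (int i = start; i < n; i += 2) {
            if (a[i - 1] > a[i]) {             /* compare-exchange neighbours */
                int tmp = a[i - 1];
                a[i - 1] = a[i];
                a[i] = tmp;
            }
        }
    }
}

int main(void) {
    int a[] = {5, 2, 9, 1, 7, 3};
    int n = sizeof a / sizeof a[0];
    odd_even_sort(a, n);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}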

Hybrid Programming: MPI+OpenMP

With pure MPI on distributed-memory systems, resources within a node are not shared in an optimal way and communication is expensive. Hybrid MPI+OpenMP addresses these issues and can improve performance in several respects, such as grain size and communication cost.
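
A minimal hedged sketch of the hybrid model: MPI handles communication between processes (typically one per node), while OpenMP threads share memory inside each process.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    int provided, my_rank;

    /* Ask MPI for a thread-support level suitable for OpenMP regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Coarse grain across MPI processes, fine grain across OpenMP threads. */
    #pragma omp parallel
    printf("MPI rank %d, OpenMP thread %d\n", my_rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}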


Techniques for performance improvement

To improve the performance of parallel programs the following aspects need to be considered.

  1. Degree of parallelization available
  2. Grain size
  3. Locality

Generally, grain size is the amount of work given to a thread, process, or processor. A large unit of work is split into many smaller chunks, and the size of these chunks is the grain size. Since the performance of a parallel system depends on its parallel programs, the chosen chunk size directly affects parallel performance.

Two different types of locality are available:

  1. Temporal locality

When a processor accesses a memory location, it is likely to access the same location again soon, so the revisit is quick and inexpensive.

  2. Spatial (data) locality

When a processor accesses a memory location, it is likely to access nearby locations soon afterwards.

Locality is what makes loops that index through arrays run efficiently, and it is what makes cache memory so effective.
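
A hedged illustration of locality: a C array is stored row by row, so making the inner loop run over the last subscript visits neighbouring addresses and lets the cache do its job.

#include <stdio.h>

#define N 1024

static double a[N][N];

int main(void) {
    double sum = 0.0;

    /* Good locality: the inner loop walks consecutive addresses (row-major),
       so a cache line fetched for a[i][j] is reused for a[i][j+1], ...      */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Swapping the loops (j outer, i inner) strides through memory and
       typically runs noticeably slower on large arrays.                     */
    printf("sum = %f\n", sum);
    return 0;
}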

Parallel overhead

Parallel overhead arises during parallel execution, for example from bad locality caused by non-local, interfering reads and writes. It can be reduced in three ways, described below (a sketch of the first two follows the list).

  1. Loop scheduling

To improve performance it may be necessary to use different schedules for different test cases.

  2. Conditionally executing in parallel

This reduces the parallelization overhead for small argument values.

  3. Replicating work

Having threads replicate a small amount of work can make thread interaction faster than a barrier synchronization.
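
Hedged OpenMP sketches of the first two techniques (the chunk size 64 and the threshold 5000 are arbitrary illustrations):

#include <omp.h>

void work(double *x, int n) {
    /* 1. Loop scheduling: a dynamic schedule with a chunk size can balance
          uneven iteration costs better than the default static schedule.   */
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 2.0;

    /* 2. Conditional parallel execution: run sequentially when n is small,
          so the parallel overhead is not paid for tiny inputs.              */
    #pragma omp parallel for if (n > 5000)
    for (int i = 0; i < n; i++)
        x[i] = x[i] + 1.0;
}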

Loop transformations

Loops are where most parallelizable work is found, so transforming loops is essential for getting better performance on multicore architectures. Common loop transformations are listed below (a sketch of fusion follows the list).

  • Loop fission
  • Loop fusion
  • Loop inversion
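
A hedged sketch of loop fusion (loop fission is simply the reverse transformation): two loops over the same range are merged so that each element of a is reused while it is still in cache.

void fuse_example(double *a, const double *b, double *c, int n) {
    /* Before: two separate passes over the same index range. */
    for (int i = 0; i < n; i++) a[i] = b[i] + 1.0;
    for (int i = 0; i < n; i++) c[i] = a[i] * 2.0;

    /* After loop fusion: one pass, better temporal locality for a[i]. */
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1.0;
        c[i] = a[i] * 2.0;
    }
}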

MPI programs can be debugged with serial debuggers: an instance of gdb is attached to each separate process. Dedicated parallel debuggers such as TotalView are also used.

gdb allows the programmer to see what is going on inside another program during its execution. GDB runs on Microsoft Windows and on Unix variants. Basic gdb commands are given below.

  • run – executes the program
  • Ctrl-C – stops execution
  • continue – continues execution
  • list – shows the source code around the point where the program stopped
  • quit – quits the debugger

VTune for profiling parallel programs

The performance of parallel programs can depend on various factors, listed below.

  • Available hardware and software resources;
  • Communication cost;
  • Parallel overhead related to threads control;
  • Optimizations;

Advantages and limitations

A GPU provides many processing cores and thousands of parallel execution units on a single card, together with fast memory interfaces. Its parallelism is largely limited to data parallelism, because the same instruction is applied across the elements of a data set.

In the GPU execution model, the GPU executes a kernel function as an array of threads: all threads run the same code but may take different paths, and each thread has a unique ID used for control decisions and data indexing. Threads are grouped into blocks, and blocks are grouped together into a grid.

GPU Programming

GPUs are programmed with specialized library functions, with compiler directives such as OpenACC, and with specialized languages and language extensions such as CUDA and OpenCL.

OpenACC

OpenACC (Open Accelerators) targets accelerators including GPUs. It uses directives in C/C++ code to help the compiler offload chosen computations to accelerators. OpenACC programming is similar to OpenMP programming.

Example:

void saxpy(int n, float a, float *x, float *y)
{
    #pragma acc kernels
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}

CUDA

CUDA is a parallel computing platform and programming model created by NVIDIA. The NVIDIA CUDA Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications: a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimization.

The CUDA programming model uses both the CPU and the GPU: the host is the CPU and its memory, and the device is the GPU and its memory.

Code running on the host can manage memory on both the host and the device, and it launches kernels, which are functions executed on the device by many GPU threads in parallel.

Example:

for (int i = 0; i < N; i++)     /* initialization of host arrays */
    { x[i] = 1.0f; y[i] = 2.0f; }

/* copy the contents of the host arrays to the device arrays */
cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

Parallelization of prefix operations

Assume that we have a sequence of integers x0, x1, x2, … (in an array). We need to compute

  • y0 = x0
  • y1 = x0 + x1
  • y2 = x0 + x1+ x2

Here we use a parallel algorithm for the prefix sum (a sketch in code follows the list):

  1. Compute the sums of consecutive pairs:

    z0 = x0 + x1, z1 = x2 + x3, etc.

  2. Compute the prefix sums of z0, z1, z2, …:

    w0 = z0 = x0 + x1

    w1 = z0 + z1 = x0 + x1 + x2 + x3

  3. Compute the required prefix sums from x and w:

    y0 = x0, y1 = w0, y2 = w0 + x2, y3 = w1, …


  4. The prefix sums of the z sequence (step 2) are obtained by applying the same algorithm recursively, in parallel.
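
A hedged C sketch of this scheme (assuming n is a power of two; OpenMP is used for the parallel steps and plain recursion for step 2):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Computes the inclusive prefix sums y[i] = x[0] + ... + x[i]. */
void prefix_sum(const int *x, int *y, int n) {
    if (n == 1) { y[0] = x[0]; return; }

    int half = n / 2;
    int *z = malloc(half * sizeof(int));
    int *w = malloc(half * sizeof(int));

    /* Step 1: sums of consecutive pairs, in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < half; i++)
        z[i] = x[2 * i] + x[2 * i + 1];

    /* Steps 2 and 4: prefix sums of z by applying the algorithm recursively. */
    prefix_sum(z, w, half);

    /* Step 3: combine w and x to obtain the required prefix sums. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = (i % 2 == 1) ? w[i / 2]
                            : (i == 0 ? x[0] : w[i / 2 - 1] + x[i]);

    free(z);
    free(w);
}

int main(void) {
    int x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int y[8];
    prefix_sum(x, y, 8);
    for (int i = 0; i < 8; i++) printf("%d ", y[i]);   /* 1 3 6 10 15 21 28 36 */
    printf("\n");
    return 0;
}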

Relaxed Sequential Execution (RSE)

Relaxed sequential execution is the ability to run a parallel program sequentially; since the same program can also run in parallel, the model is called relaxed sequential. The RSE model is used to obtain correct results without the parallel execution process, and it makes debugging and verification much easier.
