Multicore Computing: Cores, Threads, And Parallelism


Why Multicore?

  • In a multithreaded environment, achieving higher performance is the main reason to move to multicore processors.
  • The added complexity of software development must be managed with specialized knowledge and training.
  • Managing hardware parallelism is the main challenge in achieving scalable performance, so multicore platforms aim to reduce this complexity and relieve the burden of manual thread management.

Cores and Threads

  • A core is the part of the processor that executes instructions.
  • There are two kinds of threads. A software thread is a stream of instructions submitted to the processor for execution. A hardware thread is the set of hardware resources (for example, a logical CPU) used to execute a software thread.

Data Parallelism

In data parallelism, the same operation is applied to each element of a data set; because the element-wise operations are independent, they can be performed in parallel.
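
As a minimal sketch of this idea (assuming an OpenMP-capable compiler; the squaring operation is only an illustration), the same operation is applied to every element of an array and the independent element-wise operations run in parallel:

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double data[N];

    /* The same operation is applied to each element; the iterations are
       independent, so they can execute in parallel on the available cores. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] = data[i] * data[i];

    printf("data[0] = %f\n", data[0]);
    return 0;
}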

The same data set can also be processed by several tasks that are independent of each other (task parallelism).


Pipelining combines task and data parallelism: a stream of data flows through several independent stages, each data element passes through every stage, and different items can occupy different stages at the same time.

Scalability

A parallel system's scalability is its capacity to keep the speedup proportional to the total number of processors as more processors are added.

Amdahl's Law

The speedup of a program using multiple processors in parallel computing is limited by the time needed for the serial fraction of the problem.

If a problem of size W has a serial component Ws, the speedup of the program on p processors is

    S = W / (Ws + (W - Ws)/p)

Assume that Ws = 20% of W and W - Ws = 80% of W. Then

    S = 1 / (0.2 + 0.8/p), which approaches 5 as p grows.

Thus Amdahl's Law implies that the speedup in this case can never exceed 5, no matter how many processors are used, so parallel computing appears worthwhile only for a small number of processors.

Gustafson's Law counters this pessimistic conclusion: when the problem size grows with the number of processors, the parallel portion scales as well and efficient speedup remains achievable. The scaled speedup is

    S = α + p(1 - α) = p - α(p - 1)

where p represents the total number of processors and α represents the serial portion of the execution time. Larger problem sizes therefore give improved speedup.

The execution time of the program on a parallel computer is (a + b), where

  • a is the sequential time and b is the parallel time.
  • The total amount of work to be done in parallel varies linearly with the number of processors, so b stays fixed as p is varied; the same work on a serial computer would take (a + p*b).

The speedup is therefore (a + p*b)/(a + b). Defining α = a/(a + b), the sequential fraction of the execution time,

    S = (a + p*b)/(a + b) = α + p(1 - α)

A scalable parallel system can always be made cost-optimal by adjusting the number of processors and the problem size.

In a sequential program the order of operations is fixed, but in parallel programming the precise order of operations can differ from run to run. This leads to non-deterministic behavior of the result, such as varying round-off errors, deadlocks, and race conditions.

Synchronization is a mechanism used to control concurrent access to shared resources.

Race conditions

A race condition occurs when multiple tasks read from and write to the same memory location concurrently. The usual cause of a race condition is improper synchronization.
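
A minimal sketch of a race condition (assuming an OpenMP compiler): several threads increment the same shared counter without synchronization, so updates are lost and the final value is unpredictable.

#include <stdio.h>
#include <omp.h>

int main(void) {
    long counter = 0;

    /* Many threads read and write the same memory location (counter)
       without synchronization: a race condition. */
    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++)
        counter++;                 /* unsynchronized read-modify-write */

    /* Usually prints a value smaller than 1000000; protecting the update
       with an atomic or critical construct would remove the race. */
    printf("counter = %ld\n", counter);
    return 0;
}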

MPI

MPI stands for Message Passing Interface and is widely used for parallel computing on distributed-memory systems. MPI specifies both a protocol and the semantics of its operations.
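
A minimal MPI program, shown as a sketch (compile with mpicc, run with mpiexec), illustrating the basic structure every MPI program shares:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;

    MPI_Init(&argc, &argv);                   /* start up MPI              */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);  /* this process's rank       */
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);  /* total number of processes */

    printf("Hello from process %d of %d\n", my_rank, comm_sz);

    MPI_Finalize();                           /* shut down MPI             */
    return 0;
}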

Broadcast

Data held by one process can be sent to all processes in a communicator using a broadcast.

Example:

void Get_input(
    int     my_rank,
    int     comm_sz,
    double* a_p,
    double* b_p,
    int*    n_p) {

    if (my_rank == 0) {          /* Process 0 reads the user input ... */
        printf("Enter a, b, and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
    }
    /* ... and broadcasts it to all other processes */
    MPI_Bcast(a_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b_p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(n_p, 1, MPI_INT, 0, MPI_COMM_WORLD);
}

Tags and Wildcards

  • MPI_ANY_SOURCE – allows a receive to match a message from any sender, which helps avoid unnecessary waiting.
  • MPI_ANY_TAG – allows a receive to match a message carrying any tag (a usage sketch follows the list).
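
A hedged usage sketch (the values and tags are arbitrary illustrations): every non-zero process sends one value to process 0, which accepts the messages in whatever order they arrive by using the wildcards.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank != 0) {
        double value = 10.0 * my_rank;
        MPI_Send(&value, 1, MPI_DOUBLE, 0, my_rank /* tag */, MPI_COMM_WORLD);
    } else {
        double value;
        MPI_Status status;
        /* Accept the comm_sz - 1 messages in any arrival order. */
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(&value, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("Got %.1f from process %d (tag %d)\n",
                   value, status.MPI_SOURCE, status.MPI_TAG);
        }
    }

    MPI_Finalize();
    return 0;
}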

MPI_Reduce

A lengthy sequence of sends and receives can be replaced by a single collective invocation. The syntax is given below; a usage sketch follows the prototype.

MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

Prototype:

int MPI_Reduce(
    void*        input_data_p   /* in  */,
    void*        output_data_p  /* out */,
    int          count          /* in  */,
    MPI_Datatype datatype       /* in  */,
    MPI_Op       operator       /* in  */,
    int          dest_process   /* in  */,
    MPI_Comm     comm           /* in  */
);
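
A hedged usage sketch (the local values are arbitrary): each process contributes a local partial result, and MPI_Reduce combines them with MPI_SUM into a single total on process 0.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    double local_int = 1.0 / (my_rank + 1);   /* this process's partial result */
    double total_int = 0.0;

    /* Combine all local values with MPI_SUM; the result lands on process 0. */
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("Global sum = %f\n", total_int);

    MPI_Finalize();
    return 0;
}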

MPI_Scatter

This function takes the entire vector on the source process and delivers to each process exactly the block of components it requires. The syntax is given below; a usage sketch follows the prototype.

int MPI_Scatter(
    void*        send_buf_p,  /* pointer to the data to divide        */
    int          send_count,  /* number of elements sent per process  */
    MPI_Datatype send_type,
    void*        recv_buf_p,  /* pointer to a local vector            */
    int          recv_count,  /* local_n (size of local vector)       */
    MPI_Datatype recv_type,
    int          src_proc,    /* source of the data                   */
    MPI_Comm     comm );
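
A hedged usage sketch (local_n = 4 is an arbitrary block size): process 0 owns a vector of comm_sz * local_n elements and scatters one block of local_n elements to every process.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    const int local_n = 4;          /* elements per process */
    double *a = NULL;
    double local_a[4];

    if (my_rank == 0) {             /* only the source process owns the full vector */
        a = malloc(comm_sz * local_n * sizeof(double));
        for (int i = 0; i < comm_sz * local_n; i++) a[i] = i;
    }

    /* Each process receives its own block of local_n elements. */
    MPI_Scatter(a, local_n, MPI_DOUBLE,
                local_a, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("Process %d received %.0f .. %.0f\n",
           my_rank, local_a[0], local_a[local_n - 1]);

    if (my_rank == 0) free(a);
    MPI_Finalize();
    return 0;
}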

MPI_Gather

Scattering the vector is only half of the job; the results must also be gathered back, which is achieved with MPI_Gather. The syntax is given below; a usage sketch follows the prototype.

int MPI_Gather(
    void*        send_buf_p,  /* data to send           */
    int          send_count,
    MPI_Datatype send_type,
    void*        recv_buf_p,  /* data to receive/gather */
    int          recv_count,
    MPI_Datatype recv_type,
    int          dest_proc,
    MPI_Comm     comm );
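
A hedged usage sketch, the mirror image of the scatter above: every process fills its local block and MPI_Gather assembles the full vector on process 0.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    const int local_n = 4;
    double local_a[4];
    for (int i = 0; i < local_n; i++)
        local_a[i] = my_rank * local_n + i;   /* this process's block */

    double *a = NULL;
    if (my_rank == 0)
        a = malloc(comm_sz * local_n * sizeof(double));

    /* Collect every process's block into the full vector on process 0. */
    MPI_Gather(local_a, local_n, MPI_DOUBLE,
               a, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        printf("Gathered %d elements on process 0\n", comm_sz * local_n);
        free(a);
    }

    MPI_Finalize();
    return 0;
}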

Data consolidation

Data consolidation is the integration of data from multiple sources into a single source.

MPI_Barrier

A barrier is a synchronization point in the program: each process that calls MPI_Barrier is suspended until every process in the communicator has reached the barrier, after which all of them resume. Its main purpose is to improve the correctness of the program by preventing data races between program phases.
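
A minimal sketch of a barrier separating two phases of work: no process starts phase 2 until every process has finished phase 1.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int my_rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    printf("Process %d: phase 1\n", my_rank);   /* local work */

    /* No process continues past this point until all processes reach it. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("Process %d: phase 2\n", my_rank);

    MPI_Finalize();
    return 0;
}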

OpenMP

Main directives

In the Fortran form, directives are written as a directive/end-directive pair. The syntax is given below.

!$OMP  directive

    [ structured block of code ]

!$OMP end  directive

The format of C/C++ directives is given below.

Example: #pragma omp parallel
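
A minimal complete C example of the directive above (a sketch; it assumes an OpenMP-capable compiler, e.g. gcc with -fopenmp):

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* The structured block below is executed by every thread in the team. */
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        printf("Hello from thread %d of %d\n", id, nthreads);
    }   /* implicit barrier at the end of the parallel region */

    return 0;
}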

Task based programming

When a thread encounters a task construct, a new task is generated from the code in the structured block. The task's data environment is created according to the data-sharing attribute clauses. The encountering thread may execute the task immediately, or the task may be deferred and executed later by any thread in the team. The syntax is given below, followed by a short sketch.


#pragma omp task [clause, clause, …]

structured-block
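
A hedged sketch of task-based programming (the linked list and the process() function are illustrative only): one thread walks the list and creates a task per node, while the other threads in the team execute the tasks.

#include <stdio.h>
#include <omp.h>

typedef struct node { int value; struct node *next; } node;

static void process(node *p) { printf("processing %d\n", p->value); }

void traverse(node *head) {
    #pragma omp parallel               /* create a team of threads         */
    #pragma omp single                 /* one thread generates the tasks   */
    for (node *p = head; p != NULL; p = p->next) {
        #pragma omp task firstprivate(p)   /* each node becomes a task     */
        process(p);
    }                                  /* implicit barrier: all tasks done */
}

int main(void) {
    node c = {3, NULL}, b = {2, &c}, a = {1, &b};
    traverse(&a);
    return 0;
}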

Mutual Exclusion

Mutual exclusion is a mechanism that ensures concurrent threads do not access the same resource (critical section) at the same time.

Example (Peterson's algorithm for two processes P0 and P1):

P0: flag[0] = true;            | P1: flag[1] = true;
    turn = 1;                  |     turn = 0;
    while (flag[1] == true     |     while (flag[0] == true
           && turn == 1)       |            && turn == 0)
    { /* busy wait */ }        |     { /* busy wait */ }
    /* critical section */     |     /* critical section */
    /* end of critical section */ |  /* end of critical section */
    flag[0] = false;           |     flag[1] = false;

P0 and P1 can never be in the critical section at the same time.

Locks

Locking is used to control which thread may access a shared resource. A lock gives a thread exclusive access to a variable or data structure, and locks are necessary for ensuring the correct behaviour of multithreaded programs.

Eg:

omp_lock_t writelock;

omp_init_lock(&writelock);

#pragma omp parallel for
for (i = 0; i < x; i++)
{
    // some stuff
    omp_set_lock(&writelock);
    // one thread at a time stuff
    omp_unset_lock(&writelock);
    // some stuff
}

omp_destroy_lock(&writelock);

Memory Consistency

To obtain better efficiency, the processor can reorder memory reads and writes. This reordering is described by weak memory consistency models, and the main issue is that algorithms and programs that assume a strict ordering of memory operations may no longer execute correctly under a weak consistency model. The Peterson algorithm shown above, in which processes P0 and P1 execute concurrently, is the standard example: its correctness depends on the order of the flag and turn accesses, so on weak consistency models that ordering has to be enforced explicitly.

Parallel for and data dependencies

OpenMP can parallelize for loops, but not while or do-while loops. The syntax is given below.

#pragma omp parallel for [clauses]

for_statement

// execute for_statement in parallel

When the computation in one iteration depends on the results of previous iterations, the loop has data dependencies (loop-carried dependencies) and cannot be parallelized safely.

Example:

fibo[0] = fibo[1] = 1;
#pragma omp parallel for num_threads(thread_count)
for (i = 2; i < n; i++)
    fibo[i] = fibo[i-1] + fibo[i-2];
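
The loop above carries a dependence (fibo[i] needs fibo[i-1] and fibo[i-2]), so parallelizing it gives wrong results. By contrast, a loop whose iterations are independent parallelizes safely; a minimal sketch:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    double x[N], y[N], a = 2.0;
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

    /* No iteration reads a value written by another, so this is safe. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[N-1] = %f\n", y[N - 1]);
    return 0;
}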

Parallel sorting

Sorting is one of the classical tasks in computing: it rearranges numerical values into a specified order.
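
One classical parallel sort is odd-even transposition sort; a hedged OpenMP sketch is shown below (within each phase the compare-exchange operations touch disjoint pairs, so they can run in parallel).

#include <stdio.h>
#include <omp.h>

/* Odd-even transposition sort: n phases; each phase's swaps are independent. */
void odd_even_sort(int a[], int n) {
    for (int phase = 0; phase < n; phase++) {
        int start = (phase % 2 == 0) ? 1 : 2;
        #pragma omp parallel for
        for (int i = start; i < n; i += 2) {
            if (a[i - 1] > a[i]) {             /* compare-exchange neighbours */
                int tmp = a[i - 1];
                a[i - 1] = a[i];
                a[i] = tmp;
            }
        }
    }
}

int main(void) {
    int a[] = {5, 2, 9, 1, 7, 3};
    int n = sizeof a / sizeof a[0];
    odd_even_sort(a, n);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}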

Hybrid Programming: MPI+OpenMP

With pure MPI on distributed-memory systems, resources within a node are not shared in an optimal way and communication is expensive. Hybrid MPI+OpenMP addresses these issues and can improve performance in several respects, such as grain size and communication cost.
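
A minimal hedged sketch of the hybrid model: MPI handles communication between processes (typically one per node), while OpenMP threads share memory inside each process.

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    int provided, my_rank;

    /* Ask MPI for a thread-support level suitable for OpenMP regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Coarse grain across MPI processes, fine grain across OpenMP threads. */
    #pragma omp parallel
    printf("MPI rank %d, OpenMP thread %d\n", my_rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}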


Techniques for performance improvement

To improve the performance of parallel programs the following aspects need to be considered.

  1. Degree of parallelization available
  2. Grain size
  3. Locality

Generally, grain size is the amount of work given to a thread, process, or processor. A large unit of work is split into many smaller chunks, and the size of these chunks is the grain size. Since the performance of a parallel system depends on its parallel programs, the chosen chunk size directly affects parallel performance.

Two different types of locality are available:

  1. Temporal locality

When a processor accesses a memory location, it is likely to access the same location again soon, so the revisit is quick and inexpensive.

  2. Spatial (data) locality

When a processor accesses a memory location, it is likely to access nearby locations soon afterwards.

Locality is what makes loops that index through arrays run efficiently, and it is what makes cache memory so effective.
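
A hedged illustration of locality: a C array is stored row by row, so making the inner loop run over the last subscript visits neighbouring addresses and lets the cache do its job.

#include <stdio.h>

#define N 1024

static double a[N][N];

int main(void) {
    double sum = 0.0;

    /* Good locality: the inner loop walks consecutive addresses (row-major),
       so a cache line fetched for a[i][j] is reused for a[i][j+1], ...      */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Swapping the loops (j outer, i inner) strides through memory and
       typically runs noticeably slower on large arrays.                     */
    printf("sum = %f\n", sum);
    return 0;
}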

Parallel overhead

Parallel overhead arises during parallel execution, for example from bad locality caused by non-local, interfering reads and writes. It can be reduced in three ways, described below (a sketch of the first two follows the list).

  1. Loop scheduling

To improve performance it may be necessary to use different schedules for different test cases.

  2. Conditionally executing in parallel

This reduces the parallelization overhead for small argument values.

  3. Replicating work

Having threads replicate a small amount of work can make thread interaction faster than a barrier synchronization.
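
Hedged OpenMP sketches of the first two techniques (the chunk size 64 and the threshold 5000 are arbitrary illustrations):

#include <omp.h>

void work(double *x, int n) {
    /* 1. Loop scheduling: a dynamic schedule with a chunk size can balance
          uneven iteration costs better than the default static schedule.   */
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; i++)
        x[i] = x[i] * 2.0;

    /* 2. Conditional parallel execution: run sequentially when n is small,
          so the parallel overhead is not paid for tiny inputs.              */
    #pragma omp parallel for if (n > 5000)
    for (int i = 0; i < n; i++)
        x[i] = x[i] + 1.0;
}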

Loop transformations

Loops are where most parallelizable work is found, so transforming loops is essential for getting better performance on multicore architectures. Common loop transformations are listed below (a sketch of fusion follows the list).

  • Loop fission
  • Loop fusion
  • Loop inversion
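
A hedged sketch of loop fusion (loop fission is simply the reverse transformation): two loops over the same range are merged so that each element of a is reused while it is still in cache.

void fuse_example(double *a, const double *b, double *c, int n) {
    /* Before: two separate passes over the same index range. */
    for (int i = 0; i < n; i++) a[i] = b[i] + 1.0;
    for (int i = 0; i < n; i++) c[i] = a[i] * 2.0;

    /* After loop fusion: one pass, better temporal locality for a[i]. */
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1.0;
        c[i] = a[i] * 2.0;
    }
}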

MPI programs can be debugged with serial debuggers: an instance of gdb is attached to each separate process. Dedicated parallel debuggers such as TotalView are also used.

gdb allows the programmer to see what is going on inside another program during its execution. GDB runs on Microsoft Windows and on Unix variants. Basic gdb commands are given below.

  • run – executes the program
  • Ctrl-C – stops execution
  • continue – continues execution
  • list – shows the source code around the point where the program stopped
  • quit – quits the debugger

VTune for profiling parallel programs

The performance of parallel programs can depend on various factors, listed below.

  • Available hardware and software resources;
  • Communication cost;
  • Parallel overhead related to threads control;
  • Optimizations;

Advantages and limitations

A GPU provides many processing cores and thousands of parallel execution units on a single card, together with fast memory interfaces. Its parallelism is largely limited to data parallelism, because the same instruction is applied across the elements of a data set.

In the GPU execution model, the GPU executes a kernel function as an array of threads: all threads run the same code but may take different paths, and each thread has a unique ID used for control decisions and data indexing. Threads are grouped into blocks, and blocks are grouped together into a grid.

GPU Programming

GPUs are programmed with specialized library functions, with compiler directives such as OpenACC, and with specialized languages and language extensions such as CUDA and OpenCL.

OpenACC

OpenACC (Open Accelerators) targets accelerators including GPUs. It uses directives in C/C++ code to help the compiler offload chosen computations to accelerators. OpenACC programming is similar to OpenMP programming.

Example:

void saxpy(int n, float a, float *x, float *y)
{
    #pragma acc kernels
    for (int i = 0; i < n; ++i)
        y[i] = a*x[i] + y[i];
}

CUDA

CUDA is a parallel computing platform and programming model created by NVIDIA. The NVIDIA CUDA Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications: a compiler for NVIDIA GPUs, math libraries, and tools for debugging and optimization.

The CUDA programming model uses both the CPU and the GPU: the host is the CPU and its memory, and the device is the GPU and its memory.

Code running on the host can manage memory on both the host and the device, and it launches kernels, which are functions executed on the device by many GPU threads in parallel.

Example:

for (int i = 0; i < N; i++)     /* initialization of host arrays */
    { x[i] = 1.0f; y[i] = 2.0f; }

/* copy the contents of the host arrays to the device arrays */
cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

Parallelization of prefix operations

Assume that we have a sequence of integers x0, x1, x2, … (in an array). We need to compute

  • y0 = x0
  • y1 = x0 + x1
  • y2 = x0 + x1+ x2

Here we use a parallel algorithm for the prefix sum (a sketch in code follows the list):

  1. Compute the sums of consecutive pairs:

    z0 = x0 + x1, z1 = x2 + x3, etc.

  2. Compute the prefix sums of z0, z1, z2, …:

    w0 = z0 = x0 + x1

    w1 = z0 + z1 = x0 + x1 + x2 + x3

  3. Compute the required prefix sums from x and w:

    y0 = x0, y1 = w0, y2 = w0 + x2, y3 = w1, …


  4. The prefix sums of the z sequence (step 2) are obtained by applying the same algorithm recursively, in parallel.
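
A hedged C sketch of this scheme (assuming n is a power of two; OpenMP is used for the parallel steps and plain recursion for step 2):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Computes the inclusive prefix sums y[i] = x[0] + ... + x[i]. */
void prefix_sum(const int *x, int *y, int n) {
    if (n == 1) { y[0] = x[0]; return; }

    int half = n / 2;
    int *z = malloc(half * sizeof(int));
    int *w = malloc(half * sizeof(int));

    /* Step 1: sums of consecutive pairs, in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < half; i++)
        z[i] = x[2 * i] + x[2 * i + 1];

    /* Steps 2 and 4: prefix sums of z by applying the algorithm recursively. */
    prefix_sum(z, w, half);

    /* Step 3: combine w and x to obtain the required prefix sums. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = (i % 2 == 1) ? w[i / 2]
                            : (i == 0 ? x[0] : w[i / 2 - 1] + x[i]);

    free(z);
    free(w);
}

int main(void) {
    int x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int y[8];
    prefix_sum(x, y, 8);
    for (int i = 0; i < 8; i++) printf("%d ", y[i]);   /* 1 3 6 10 15 21 28 36 */
    printf("\n");
    return 0;
}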

Relaxed Sequential Execution (RSE)

Relaxed sequential execution is the ability to run a parallel program sequentially; since the same program can also run in parallel, the model is called relaxed sequential. The RSE model is used to obtain correct results without the parallel execution process, and it makes debugging and verification much easier.
