Cross-Language Plagiarism Detection Tool: Guidelines And Project Description

Guidelines for Progress Report and Final Presentation

This application takes a text file as input from the user. The file may be written in English or Hindi, and users provide it as a Unicode file. The application then searches the internet for similar files and provides us with the results that are relevant to the uploaded text file. To achieve this, we create a “Hindi representation” of the sentence in English.
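
As a minimal sketch of the input step, the snippet below reads a text file and guesses whether it is Hindi or English by counting characters in the Devanagari Unicode block (U+0900 to U+097F). The function names `detect_language` and `read_input`, and the assumption that the file is UTF-8 encoded, are ours for illustration and are not part of the project description.

```python
def detect_language(text: str) -> str:
    """Guess 'hindi' or 'english' from the share of Devanagari letters."""
    # Devanagari occupies the Unicode block U+0900..U+097F.
    devanagari = sum(1 for ch in text if "\u0900" <= ch <= "\u097F")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if devanagari == 0 and latin == 0:
        return "unknown"
    return "hindi" if devanagari >= latin else "english"


def read_input(path: str) -> tuple[str, str]:
    """Read a text file (assumed to be UTF-8) and return (text, language)."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    return text, detect_language(text)


print(detect_language("This is a plain English sentence."))
print(detect_language("यह एक हिंदी वाक्य है।"))
```

A character-count heuristic like this is enough to route a file to the translation step, but it would need refinement for mixed-language documents.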

We reviewed many articles on the internet related to the development of a cross-language plagiarism tool. An answer [1] on Stack Overflow suggested that we could develop this application in Python using the NLTK and Gensim libraries, by building an LDA or LSA model of the document. We can then use the Google Search API to search for the resulting keywords. NLTK [2] is the Natural Language Toolkit for natural language processing. It provides libraries for classification, tokenization, stemming, tagging, parsing, semantic reasoning, etc.

In [5], Chow et al. discuss semantic plagiarism, in which a sentence is restructured or some terms are replaced with their synonyms. Both cross-language and semantic plagiarism are hard to detect because the fingerprints of the plagiarized text and the source text differ. The plagiarism detection tools that are currently available are not capable of detecting such cases.
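
The effect of synonym substitution on fingerprints can be illustrated with a small sketch. Here a fingerprint is assumed to be the set of hashes of overlapping three-word shingles, which is one common scheme and not necessarily the one used in [5]; the helper names `fingerprints` and `jaccard` and the example sentences are hypothetical.

```python
import hashlib


def fingerprints(text: str, n: int = 3) -> set[str]:
    """Hash every run of n consecutive words (a 'shingle')."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of fingerprints that two documents have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"
paraphrased = "the fast brown fox leaps over the idle dog"

print(jaccard(fingerprints(source), fingerprints(copied)))
print(jaccard(fingerprints(source), fingerprints(paraphrased)))
```

Replacing only three words with synonyms changes every three-word shingle in this example, so the verbatim copy matches completely while the paraphrase shares no fingerprints with the source.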

Chow et al. [5] propose a new approach to detecting both cross-language and semantic plagiarism. The query document is first shortened using a fuzzy swarm-based summarisation approach; the summary gives the most important keywords in the document. Input summary documents are translated into English using the Google Translate Application Programming Interface (API) before the words are stemmed and the stop words are removed. The tokenized documents are sent to the Google AJAX Search API to search for similar documents throughout the World Wide Web. The Stanford Parser and WordNet are used to determine the semantic similarity between the suspected documents and the source documents. The Stanford Parser assigns each term in a sentence its corresponding role, such as noun, verb or adjective. Each sentence is then represented in predicate form, and similarity is measured on those predicates using information from the WordNet taxonomy. The testing dataset is built from two sets of input documents produced using different plagiarism techniques.

Bird et al. [3] cover the scope of using the NLTK toolkit for natural language processing. We are considering a methodology in which a Token class represents a unit of text such as a word, a sentence or a piece of a document. Kuhn et al. [4] describe the application of semantic classification trees to natural language understanding. The paper covers speech understanding, semantic classification, machine learning, natural language and decision-tree-based capabilities for a translator application.
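
A Token class of the kind described could look roughly like the sketch below. This is our own illustrative data class, not NLTK's actual API; the field names and the `word_tokens` helper are assumptions, and the simple regular expression is only intended for English text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A unit of text together with its position in the source document."""
    text: str
    kind: str   # "word", "sentence" or "document"
    start: int  # offset of the first character
    end: int    # offset just past the last character


def word_tokens(document: str) -> list[Token]:
    """Split English text into word tokens, keeping character offsets."""
    return [Token(m.group(), "word", m.start(), m.end())
            for m in re.finditer(r"\w+", document)]


for token in word_tokens("Plagiarism is bad."):
    print(token)
```

Keeping the offsets on each token makes it possible to map a match back to the original file later, for example when highlighting similar passages.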

These papers discuss speech classification, machine learning with artificial neural networks, decision trees, tokenization and several other methods.

In [6], Ferrero et al. discuss different state-of-the-art methods for detecting plagiarism. The methods used in the experiment include Cross-Language Character N-Gram (CL-CnG), Cross-Language Conceptual Thesaurus-based Similarity (CL-CTS), Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Explicit Semantic Analysis (CL-ESA) and Translation + Monolingual Analysis (T+MA). According to the authors, each method behaves in a similar way across different language pairs. There is a strong correlation not only across languages but also across the text units that were considered. If a method is efficient on a particular language pair, it will be similarly efficient on another language pair as long as enough lexical resources are available for these languages. There was also a strong correlation across types of text when the authors investigated the behaviour of the methods on different types of text for a particular language pair. They found that a method could be optimized on one collection of text and applied efficiently to another collection. Finally, they concluded that methods behave differently in clustering matched and mismatched units, even if they seem similar in performance.
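
To give a feel for the simplest of these methods, the sketch below compares two texts by the cosine similarity of their character trigram counts, which is the basic idea behind CL-CnG. It is a simplified illustration rather than the configuration evaluated in [6], and the function names and example phrases are ours. As far as we understand, this approach relies on the two languages sharing a script, so an English and Hindi pair would first need transliteration or translation.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after basic normalisation."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


english = char_ngrams("international cooperation")
spanish = char_ngrams("cooperación internacional")
unrelated = char_ngrams("the weather is sunny today")

print(cosine(english, spanish))
print(cosine(english, unrelated))
```

The related English and Spanish phrases share many trigrams (such as "int", "coo" and "ion"), so they score higher than the unrelated sentence.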

Project Description

The project activities are shown below (Barrón-Cedeño, Gupta and Rosso, 2013).

Developing a cross-language plagiarism detection tool

   User management

   Document management

   Translation of input documents

      Translate the plagiarized Hindi documents into English

      Improve the effectiveness of the detection process

      Use Google Translate API

   Removing Stop Words

      Before passing the translated documents for comparison through the Internet

      Remove the stop words in the translated text

   Stemming Words

      Remove the affixes

      Generate root word

      Pattern matching

      Text Stemmer and Porter Stemmer

      Use of Porter Stemming algorithm

      Removing the commoner morphological and inflexional endings from words in English

   Identifying Similar Documents

      Collection of documents that are located around the World Wide Web

      Enables small and characteristic fragments translation

      Query documents or texts are inserted

      Use of Google AJAX Search API

   Comparison of Similar Pattern

      Detect plagiarism

      Represent the sentence uniquely.

   Summary of the Result

      Gathering the result

      Plagiarism detection is displayed

      Highlight the similarities between the two files.
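
The stop-word removal, stemming, comparison and highlighting activities above can be sketched together as follows. This is a minimal illustration under several assumptions: the stop-word list is a tiny hypothetical sample, `naive_stem` is a crude suffix stripper standing in for the Porter stemming algorithm (it is not the Porter algorithm), and the input is assumed to be already translated into English.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to", "was", "were"}
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, drop stop words, then stem the remaining words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


def shared_terms(query: str, candidate: str) -> set[str]:
    """Stems that occur in both texts."""
    return set(preprocess(query)) & set(preprocess(candidate))


def highlight(text: str, terms: set[str]) -> str:
    """Wrap every word whose stem is a shared term in square brackets."""
    def mark(match: re.Match) -> str:
        word = match.group()
        return f"[{word}]" if naive_stem(word.lower()) in terms else word
    return re.sub(r"[A-Za-z]+", mark, text)


terms = shared_terms("Students copied essays", "A student copies an essay")
print(highlight("A student copies an essay", terms))
```

In a fuller implementation the stemmer and stop-word list would come from a library such as NLTK, and the comparison would run against documents retrieved from the web rather than a single candidate string.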

Resource Name | Type | Initials | Max. Units | Std. Rate | Accrue At | Base Calendar
--- | --- | --- | --- | --- | --- | ---
Project Manager | Work | P | 100% | $1,000.00/hr | Prorated | Standard
System Analyst | Work | S | 100% | $1,000.00/hr | Prorated | Standard
Developer | Work | D | 100% | $1,000.00/hr | Prorated | Standard
Designer | Work | D | 100% | $1,000.00/hr | Prorated | Standard
Technical Writer | Work | T | 100% | $1,000.00/hr | Prorated | Standard
Code Designer | Work | C | 100% | $1,000.00/hr | Prorated | Standard

The overall project activities are shown below (Chauhan, Arora and Singhal, 2017).

Task Name | Duration | Start | Finish | Predecessors | Resource Names
--- | --- | --- | --- | --- | ---
Developing a cross-language plagiarism detection tool | 60 days | Wed 9/12/18 | Tue 12/4/18 | |
   User management | 1 day | Wed 9/12/18 | Wed 9/12/18 | | Designer, Developer
   Document management | 2 days | Thu 9/13/18 | Fri 9/14/18 | 2 | Designer, Project Manager, Technical Writer
   Translation of input documents | 8 days | Mon 9/17/18 | Wed 9/26/18 | 3 |
      Translate the plagiarized Hindi documents into English | 2 days | Mon 9/17/18 | Tue 9/18/18 | | Code Designer, Developer, System Analyst
      Improve the effectiveness of the detection process | 3 days | Wed 9/19/18 | Fri 9/21/18 | 5 | Developer
      Use Google Translate API | 3 days | Mon 9/24/18 | Wed 9/26/18 | 6 | Code Designer, Designer
   Removing Stop Words | 5 days | Thu 9/27/18 | Wed 10/3/18 | 4 |
      Before passing the translated documents for comparison through the Internet | 2 days | Thu 9/27/18 | Fri 9/28/18 | | Designer, System Analyst
      Remove the stop words in the translated text | 3 days | Mon 10/1/18 | Wed 10/3/18 | 9 | Developer, Code Designer
   Stemming Words | 15 days | Thu 10/4/18 | Wed 10/24/18 | 8 |
      Remove the affixes | 3 days | Thu 10/4/18 | Mon 10/8/18 | | Designer
      Generate root word | 4 days | Tue 10/9/18 | Fri 10/12/18 | 12 | System Analyst
      Pattern matching | 2 days | Mon 10/15/18 | Tue 10/16/18 | 13 | Designer
      Text Stemmer and Porter Stemmer | 2 days | Wed 10/17/18 | Thu 10/18/18 | 14 | Developer
      Use of Porter Stemming algorithm | 2 days | Fri 10/19/18 | Mon 10/22/18 | 15 | Developer
      Removing the commoner morphological and inflexional endings from words in English | 2 days | Tue 10/23/18 | Wed 10/24/18 | 16 | Developer, System Analyst
   Identifying Similar Documents | 10 days | Thu 10/25/18 | Wed 11/7/18 | 11 |
      Collection of documents that are located around the World Wide Web | 2 days | Thu 10/25/18 | Fri 10/26/18 | | System Analyst
      Enables small and characteristic fragments translation | 3 days | Thu 10/25/18 | Mon 10/29/18 | | Developer
      Query documents or texts are inserted | 3 days | Tue 10/30/18 | Thu 11/1/18 | 20 | System Analyst, Technical Writer
      Use of Google AJAX Search API | 4 days | Fri 11/2/18 | Wed 11/7/18 | 21 | Code Designer, Developer
   Comparison of Similar Pattern | 10 days | Thu 11/8/18 | Wed 11/21/18 | 18 |
      Detect plagiarism | 4 days | Tue 11/13/18 | Fri 11/16/18 | 24 | Code Designer, Project Manager, System Analyst
      Represent the sentence uniquely. | 3 days | Mon 11/19/18 | Wed 11/21/18 | 25 | System Analyst, Technical Writer
   Summary of the Result | 9 days | Thu 11/22/18 | Tue 12/4/18 | 23 |
      Gathering the result | 2 days | Thu 11/22/18 | Fri 11/23/18 | | Project Manager, System Analyst
      Plagiarism detection is displayed | 3 days | Mon 11/26/18 | Wed 11/28/18 | 28 | Designer, Developer, Project Manager
      Highlight the similarities between the two files. | 4 days | Thu 11/29/18 | Tue 12/4/18 | 29 | Code Designer, Developer
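
The durations in the schedule count working days, not calendar days. The short sketch below checks this for the overall project and for the Stemming Words activity, assuming a Monday to Friday working week with no holidays (our reading of the Standard base calendar); the function name `working_days` is ours.

```python
from datetime import date, timedelta


def working_days(start: date, finish: date) -> int:
    """Count Monday-to-Friday days from start to finish, inclusive."""
    days = 0
    current = start
    while current <= finish:
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days += 1
        current += timedelta(days=1)
    return days


# Whole project: Wed 9/12/18 to Tue 12/4/18
print(working_days(date(2018, 9, 12), date(2018, 12, 4)))   # → 60
# Stemming Words: Thu 10/4/18 to Wed 10/24/18
print(working_days(date(2018, 10, 4), date(2018, 10, 24)))  # → 15
```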

Project charter is shown below.

Resource cost status is shown below (Ehsan and Shakery, 2016).

Project Activities Cost is shown below.

Name | Fixed Cost | Actual Cost | Remaining Cost | Cost | Baseline Cost | Cost Variance
--- | --- | --- | --- | --- | --- | ---
Developing a cross-language plagiarism detection tool | $0.00 | $0.00 | $912,000.00 | $912,000.00 | $0.00 | $912,000.00
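
The cost figures are related by simple arithmetic. The sketch below assumes the usual Microsoft Project definitions, where cost is actual cost plus remaining cost and cost variance is cost minus baseline cost; the variable names are ours.

```python
from decimal import Decimal

# Figures taken from the project activities cost table.
actual_cost = Decimal("0.00")
remaining_cost = Decimal("912000.00")
baseline_cost = Decimal("0.00")

# Assumed definitions: Cost = Actual + Remaining, Variance = Cost - Baseline.
cost = actual_cost + remaining_cost
cost_variance = cost - baseline_cost

print(f"Cost: ${cost:,.2f}")
print(f"Cost variance: ${cost_variance:,.2f}")
```

Because no baseline was saved (the baseline cost is $0.00), the whole planned cost shows up as variance.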

MS Project file is attached here.

Plagiarism is becoming a difficult problem for the academic community. Detecting plagiarism at different levels is an important issue. The complexity increases when we look for plagiarism in source code, which may be written in the same language or translated into a different language (Franco-Salvador et al., 2016). This kind of plagiarism is found not only in academic work but also in industries dealing with software design. The main difficulty with source code plagiarism is that different programming languages may have different syntax.

Based on the language homogeneity or heterogeneity of the texts being compared, plagiarism detection can be classified as monolingual or cross-lingual. The cross-language plagiarism detection process is similar to the external plagiarism detection task, with a few modifications in the heuristic retrieval and detailed analysis stages (Gelbukh, 2009). The cross-language heuristic retrieval stage aims to retrieve the collection of candidate source documents from the data set. Translating the input document from the query language to the source language may be required in this stage. The cross-language detailed analysis stage measures the cross-language similarity between sections of the suspicious document and sections of the candidate documents retrieved in the previous stage (Kashkur, Parshutin and Borisov, 2010).
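
The two stages can be sketched as below. The sketch assumes that translation has already happened, so all texts are in English, and it uses plain word-overlap (Jaccard) similarity as a stand-in for a real cross-language similarity measure. The function names, the threshold and the tiny corpus are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def split_sentences(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def heuristic_retrieval(suspicious: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    """Stage 1: rank whole documents and keep the top_k candidates."""
    query = tokens(suspicious)
    ranked = sorted(corpus, key=lambda name: jaccard(query, tokens(corpus[name])), reverse=True)
    return ranked[:top_k]


def detailed_analysis(suspicious: str, candidate: str, threshold: float = 0.5):
    """Stage 2: compare sentence pairs and keep those above the threshold."""
    pairs = []
    for s in split_sentences(suspicious):
        for c in split_sentences(candidate):
            score = jaccard(tokens(s), tokens(c))
            if score >= threshold:
                pairs.append((s, c, score))
    return pairs


corpus = {
    "doc_a": "Plagiarism detection compares suspicious text with sources. Weather is nice.",
    "doc_b": "Cooking pasta requires boiling water.",
}
suspicious = "Plagiarism detection compares suspicious text with sources."

for name in heuristic_retrieval(suspicious, corpus, top_k=1):
    print(name, detailed_analysis(suspicious, corpus[name]))
```

Splitting the work this way keeps the expensive sentence-level comparison limited to the few documents that the cheap first stage considers plausible sources.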

Language used: JavaScript.

The software design for the cross-language plagiarism detection tool is illustrated below (Kasprowicz and Wada, 2014).

If you type any text, it changes from English to Hindi (Potthast et al., 2010).

Conclusion

With the project accomplished, we hope to be able to find plagiarism related to any article on the web, given an input file to our application.

References

[1] ‘How to develop a plagiarism detector?’ stackoverflow.com/questions/1193408, retrieved on 11 August 2018.

[2] NLTK 3.3 documentation for the Natural Language Toolkit, retrieved from nltk.org on 11 August 2018.

[3] Steven Bird, Edward Loper. NLTK: The Natural Language Toolkit.

[4] Roland Kuhn, Renato De Mori. The Application of Semantic Classification Trees to Natural Language Understanding.

[5] Chow Kok Kent, Naomie Salim. Web Based Cross Language Semantic Plagiarism Detection, 03 January 2012.

[6] Jeremy Ferrero, Laurent Besacier, Didier Schwab, Frederic Agnes. Deep Investigation of Cross-Language Plagiarism Detection Methods.

Barrón-Cedeño, A., Gupta, P. and Rosso, P. (2013). Methods for cross-language plagiarism detection. Knowledge-Based Systems, 50, pp.211-217.

Chauhan, S., Arora, A. and Singhal, Y. (2017). Plagiarism Detection of C Program using Assembly Language. International Journal of Computer Applications, 158(3), pp.17-22.

Ehsan, N. and Shakery, A. (2016). Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information. Information Processing & Management, 52(6), pp.1004-1017.

Franco-Salvador, M., Gupta, P., Rosso, P. and Banchs, R. (2016). Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language. Knowledge-Based Systems, 111, pp.87-99.

Gelbukh, A. (2009). Computational Linguistics and Intelligent Text Processing. Heidelberg: Springer.

Kashkur, M., Parshutin, S. and Borisov, A. (2010). Research into Plagiarism Cases and Plagiarism Detection Methods. Scientific Journal of Riga Technical University. Computer Sciences, 42(1).

Kasprowicz, D. and Wada, H. (2014). Methods for automated detection of plagiarism in integrated-circuit layouts. Microelectronics Journal, 45(9), pp.1212-1219.

Lee, Y. (2012). Plagiarism Detection among Source Codes using Adaptive Methods. KSII Transactions on Internet and Information Systems.

METHODS FOR INTRINSIC PLAGIARISM DETECTION. (2017). Informatics and Applications.

Potthast, M., Barrón-Cedeño, A., Stein, B. and Rosso, P. (2010). Cross-language plagiarism detection. Language Resources and Evaluation, 45(1), pp.45-62.
