subject

Yates' big file sorter

for this assignment, you will design and implement a sort program that still works correctly, even when there is not enough memory to store an entire input file. this will give you practice with collections, files, exceptions, and sorting algorithms.

program description

at runtime, your program should read the name of a "large" input file to sort, and the name to give the sorted output file.

the program will sort the lines in the input file, and save the sorted lines in the output file. your sort algorithm will be a hybrid of the mergesort algorithm and java's built-in sorting algorithm as a two-phase process, as described below.

phase 1: breaking the input file into manageable chunks

in the first phase, the sort algorithm will read in a fixed number of lines from the input file, and store them into an array. it should pick a manageable number that will fit into memory. next, it will sort that array using java's built-in arrays. sort method. then it will store the sorted array in a temporary file called "temp_0_0.txt".

the algorithm will repeatedly read in chunks of lines until it has read in the entire contents of the input file. each time it reads in a chunk of lines from the input file, it stores that chunk in an array, sorts the array, and saves the array in another temporary file ("temp_0_1.txt", "temp_0_2.txt", "temp_0_n. txt"). notice that it is reading in an amount that will fit into memory each time, so that it does not run out of memory.

after this first phase is complete, each of the "temp_0_i. txt" files is in sorted order, and together they contain all of the stuff that was in the input file. the second phase must merge the files together, while making sure that the merged files remain in sorted order.

phase 2: merging the chunks together

remember from class that the mergesort algorithm repeatedly merges two small sorted arrays into a larger sorted array. here, you will do the same, but instead repeatedly merge two small sorted files into a larger sorted file. take care that you only read in one line of each file into memory at a time don't read the entire files into memory, or you will run out of memory!

your sort algorithm should begin phase 2 by merging "temp_0_0.txt" and "temp_0_1.txt", and saving it to a new file called "temp_1_0.txt". next, it will merge "temp_0_2.txt" with "temp_0_3.txt", and save the merged file as "temp_1_1.txt". this will repeat until there are no more "temp_0_i. txt" files left. if there are an odd number of these files, the last one will have nothing to merge with. that's ok, it can be merged in later iterations. just copy it into the next available "temp_1_i. txt".

after merging pairs of the "temp_0_i. txt" files, the sort algorithm needs to begin merging pairs of the "temp_1_i. txt" files. it will begin by merging "temp_1_0.txt" and "temp_1_1.txt", and saving the result to "temp_2_0.txt". then, it will merge "temp_1_2.txt" and "temp_1_3.txt", and save the result to "temp_2_1.txt", and so on.

each time through the set of temporary files, the merging process cuts the number of temporary files roughly in half, because it merges two files into one. (i say "roughly" because there may be an odd number of files, in which case the last file does not get merged with anything.) this phase of the sorting must keep repeating, until there are only two temporary files left. when that happens, it should merge those temporary files, but instead of saving the result to another temporary file, it should save it to the output file. then the sort is finished.

guidelines for testing your program

you may find a reasonably large text file (~7 mb) here, which you may use to test your program. it contains aesop's fables, the complete works of shakespeare, mary shelley's frankenstein, and mark twain's the adventures of huckleberry finn.

the file is certainly small enough to fit into memory i didn't want to you to have to deal with an enormous file. however, you should choose a setting so that your program reads in less than the whole file at a time. at first, try reading in around 1/20 of the lines at a time into memory during the first phase. you should test your program with several different settings.

unit tests

write a junit test to make sure that your merge() function works correctly. be sure to test that it merges arrays that are the same size and two arrays that are not the same size.

ansver
Answers: 1

Another question on Computers and Technology

question
Computers and Technology, 21.06.2019 22:00
What must you do before formatting a paragraph?
Answers: 1
question
Computers and Technology, 22.06.2019 09:40
It is vital to research each of the services you plan to disable before implementing any change, especially on critical machines such as the: a. servers in the test environment. b. domain controller and other infrastructure servers. c. desktops that have previously been attacked. d. desktops used by upper-level management.
Answers: 2
question
Computers and Technology, 23.06.2019 06:30
Martha is designing a single-player game. her manager suggests that she plan the design to incorporate future modifications. which principle of game design relates to planning for future modifications?
Answers: 1
question
Computers and Technology, 23.06.2019 13:10
What is domain name system (dns)? allows dynamic ip address allocation so users do not have to have a preconfigured ip address to use the network converts ip addresses into domains, or identifying labels that use a variety of recognizable naming conventions the efficient coexistence of telephone, video, and data communication within a single network, offering convenience and flexibility not possible with separate infrastructures the integration of communication channels into a single service
Answers: 2
You know the right answer?
Yates' big file sorter

for this assignment, you will design and implement a sort program...
Questions
question
Mathematics, 19.11.2021 19:20
question
Mathematics, 19.11.2021 19:20
Questions on the website: 13722367