UFO ET IT

대용량 데이터를 텍스트 파일 Java로 작성하는 가장 빠른 방법

ufoet 2020. 11. 19. 22:23
반응형

대용량 데이터를 텍스트 파일 Java로 작성하는 가장 빠른 방법


text [csv] 파일에 엄청난 데이터를 써야합니다. BufferedWriter를 사용하여 데이터를 썼고 174MB의 데이터를 쓰는 데 약 40 초가 걸렸습니다. 이것이 자바가 제공 할 수있는 가장 빠른 속도입니까?

bufferedWriter = new BufferedWriter ( new FileWriter ( "fileName.csv" ) );

참고 : 이 40 초에는 결과 집합에서 레코드를 반복하고 가져 오는 시간도 포함됩니다. :). 174MB는 결과 집합의 400000 개 행에 사용됩니다.


BufferedWriter를 제거하고 FileWriter를 직접 사용해보십시오. 최신 시스템에서는 어쨌든 드라이브의 캐시 메모리에 쓰기 만 할 가능성이 높습니다.

175MB (4 백만 문자열)를 작성하는 데 4 ~ 5 초가 걸립니다. 이것은 80GB, 7200RPM Hitachi 디스크로 Windows XP를 실행하는 듀얼 코어 2.4GHz Dell에 있습니다.

레코드 검색 시간과 파일 쓰기 시간을 분리 할 수 ​​있습니까?

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class FileWritingPerfTest {


private static final int ITERATIONS = 5;
private static final double MEG = (Math.pow(1024, 2));
private static final int RECORD_COUNT = 4000000;
private static final String RECORD = "Help I am trapped in a fortune cookie factory\n";
private static final int RECSIZE = RECORD.getBytes().length;

public static void main(String[] args) throws Exception {
    List<String> records = new ArrayList<String>(RECORD_COUNT);
    int size = 0;
    for (int i = 0; i < RECORD_COUNT; i++) {
        records.add(RECORD);
        size += RECSIZE;
    }
    System.out.println(records.size() + " 'records'");
    System.out.println(size / MEG + " MB");

    for (int i = 0; i < ITERATIONS; i++) {
        System.out.println("\nIteration " + i);

        writeRaw(records);
        writeBuffered(records, 8192);
        writeBuffered(records, (int) MEG);
        writeBuffered(records, 4 * (int) MEG);
    }
}

private static void writeRaw(List<String> records) throws IOException {
    File file = File.createTempFile("foo", ".txt");
    try {
        FileWriter writer = new FileWriter(file);
        System.out.print("Writing raw... ");
        write(records, writer);
    } finally {
        // comment this out if you want to inspect the files afterward
        file.delete();
    }
}

private static void writeBuffered(List<String> records, int bufSize) throws IOException {
    File file = File.createTempFile("foo", ".txt");
    try {
        FileWriter writer = new FileWriter(file);
        BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize);

        System.out.print("Writing buffered (buffer size: " + bufSize + ")... ");
        write(records, bufferedWriter);
    } finally {
        // comment this out if you want to inspect the files afterward
        file.delete();
    }
}

private static void write(List<String> records, Writer writer) throws IOException {
    long start = System.currentTimeMillis();
    for (String record: records) {
        writer.write(record);
    }
    writer.flush();
    writer.close();
    long end = System.currentTimeMillis();
    System.out.println((end - start) / 1000f + " seconds");
}
}

메모리 매핑 파일 시도 (내 m / c, 코어 2 듀오, 2.5GB RAM에 174MB를 쓰는 데 300m / s 소요) :

byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes();
int number_of_lines = 400000;

FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines);
for (int i = 0; i < number_of_lines; i++)
{
    wrBuf.put(buffer);
}
rwChannel.close();

통계를 위해서만 :

시스템은 새 SSD가있는 오래된 Dell입니다.

CPU : Intel Pentium D 2,8 Ghz

SSD : 패트리어트 인페르노 120GB SSD

4000000 'records'
175.47607421875 MB

Iteration 0
Writing raw... 3.547 seconds
Writing buffered (buffer size: 8192)... 2.625 seconds
Writing buffered (buffer size: 1048576)... 2.203 seconds
Writing buffered (buffer size: 4194304)... 2.312 seconds

Iteration 1
Writing raw... 2.922 seconds
Writing buffered (buffer size: 8192)... 2.406 seconds
Writing buffered (buffer size: 1048576)... 2.015 seconds
Writing buffered (buffer size: 4194304)... 2.282 seconds

Iteration 2
Writing raw... 2.828 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.078 seconds
Writing buffered (buffer size: 4194304)... 2.015 seconds

Iteration 3
Writing raw... 3.187 seconds
Writing buffered (buffer size: 8192)... 2.109 seconds
Writing buffered (buffer size: 1048576)... 2.094 seconds
Writing buffered (buffer size: 4194304)... 2.031 seconds

Iteration 4
Writing raw... 3.093 seconds
Writing buffered (buffer size: 8192)... 2.141 seconds
Writing buffered (buffer size: 1048576)... 2.063 seconds
Writing buffered (buffer size: 4194304)... 2.016 seconds

보시다시피 원시 메서드는 버퍼링 속도가 느립니다.


전송 속도는 Java에 의해 제한되지 않을 수 있습니다. 대신 나는 의심 할 것이다 (특정 순서없이)

  1. 데이터베이스에서 전송 속도
  2. 디스크로의 전송 속도

전체 데이터 세트를 읽은 다음 디스크에 쓰면 JVM이 메모리를 할당하고 db rea / disk 쓰기가 순차적으로 발생하므로 시간이 더 오래 걸립니다. 대신 db에서 읽는 모든 읽기에 대해 버퍼링 된 작성기에 기록하므로 작업이 동시 작업에 더 가까울 것입니다 (이 작업을 수행하는지 여부는 모르겠습니다)


DB에서 이러한 대량 읽기의 경우 Statement의 가져 오기 크기 를 조정할 수 있습니다 . DB에 대한 많은 왕복을 절약 할 수 있습니다.

http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29


package all.is.well;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import junit.framework.TestCase;

/**
 * @author Naresh Bhabat
 * 
Following  implementation helps to deal with extra large files in java.
This program is tested for dealing with 2GB input file.
There are some points where extra logic can be added in future.


Pleasenote: if we want to deal with binary input file, then instead of reading line,we need to read bytes from read file object.



It uses random access file,which is almost like streaming API.


 * ****************************************
Notes regarding executor framework and its readings.
Please note :ExecutorService executor = Executors.newFixedThreadPool(10);

 *  	   for 10 threads:Total time required for reading and writing the text in
 *         :seconds 349.317
 * 
 *         For 100:Total time required for reading the text and writing   : seconds 464.042
 * 
 *         For 1000 : Total time required for reading and writing text :466.538 
 *         For 10000  Total time required for reading and writing in seconds 479.701
 *
 * 
 */
public class DealWithHugeRecordsinFile extends TestCase {

	static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
	static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
	static volatile RandomAccessFile fileToWrite;
	static volatile RandomAccessFile file;
	static volatile String fileContentsIter;
	static volatile int position = 0;

	public static void main(String[] args) throws IOException, InterruptedException {
		long currentTimeMillis = System.currentTimeMillis();

		try {
			fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw");//for random write,independent of thread obstacles 
			file = new RandomAccessFile(FILEPATH, "r");//for random read,independent of thread obstacles 
			seriouslyReadProcessAndWriteAsynch();

		} catch (IOException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		Thread currentThread = Thread.currentThread();
		System.out.println(currentThread.getName());
		long currentTimeMillis2 = System.currentTimeMillis();
		double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
		System.out.println("Total time required for reading the text in seconds " + time_seconds);

	}

	/**
	 * @throws IOException
	 * Something  asynchronously serious
	 */
	public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
		ExecutorService executor = Executors.newFixedThreadPool(10);//pls see for explanation in comments section of the class
		while (true) {
			String readLine = file.readLine();
			if (readLine == null) {
				break;
			}
			Runnable genuineWorker = new Runnable() {
				@Override
				public void run() {
					// do hard processing here in this thread,i have consumed
					// some time and eat some exception in write method.
					writeToFile(FILEPATH_WRITE, readLine);
					// System.out.println(" :" +
					// Thread.currentThread().getName());

				}
			};
			executor.execute(genuineWorker);
		}
		executor.shutdown();
		while (!executor.isTerminated()) {
		}
		System.out.println("Finished all threads");
		file.close();
		fileToWrite.close();
	}

	/**
	 * @param filePath
	 * @param data
	 * @param position
	 */
	private static void writeToFile(String filePath, String data) {
		try {
			// fileToWrite.seek(position);
			data = "\n" + data;
			if (!data.contains("Randomization")) {
				return;
			}
			System.out.println("Let us do something time consuming to make this thread busy"+(position++) + "   :" + data);
			System.out.println("Lets consume through this loop");
			int i=1000;
			while(i>0){
			
				i--;
			}
			fileToWrite.write(data.getBytes());
			throw new Exception();
		} catch (Exception exception) {
			System.out.println("exception was thrown but still we are able to proceeed further"
					+ " \n This can be used for marking failure of the records");
			//exception.printStackTrace();

		}

	}
}

참고 URL : https://stackoverflow.com/questions/1062113/fastest-way-to-write-huge-data-in-text-file-java

반응형