java - Very large heap size when reading large files
I have an ASCII file consisting of 180 columns of digits and 60,000 rows; the file size is ~80 MB.
I need to read the file into a 2D array of size 180x60000.
File structure example:
gsrv01: 946177 946061 ..[many columns].. 8359486 8359485 0 end total 184
.. [ many rows ] ..
gsrv01: 945998 946259 ..[many columns].. 8359489 8359487 1 end total 184
When I read the file, memory usage climbs to 800 MB. I use the data file in a GUI application, so the total memory consumption reaches 1200 MB, which is unacceptable.
Am I doing the reading right? How can I reduce memory usage?
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReadBigData {

    public static void main(String[] args) {
        String pathFileName = "e:\\data\\8.txt";
        long startTime = System.nanoTime();
        new ReadBigData(pathFileName);
        long endTime = System.nanoTime();
        long duration = (endTime - startTime);
        // divide by 1000000 for milliseconds
        double dur = (double) duration / 1000000 / 1000;
        System.out.println("Elapsed: " + dur + " sec.");
        try {
            System.in.read(); // to wait after execution
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public ReadBigData(String pathFileName) {
        // list containing the data
        List<List<Double>> dataTableList = new ArrayList<List<Double>>();
        Pattern spacePattern = Pattern.compile("\\s+"); // split on whitespace or tab
        String regex = "^gsrv01:\\s+(.*)\\s+(\\d+)\\s+end total.*"; // . -- any symbol, * -- repeated 0 or more times
        Pattern pattern = Pattern.compile(regex);
        try {
            FileInputStream inputStream = new FileInputStream(pathFileName);
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
            String line = null;
            while ((line = bufferedReader.readLine()) != null) {
                Matcher matches = pattern.matcher(line);
                while (matches.find()) { // slow!!!
                    String columnsStr = matches.group(1);
                    List<String> columnsList = Arrays.asList(spacePattern.split(columnsStr, 0)); // fast
                    List<Double> list = new ArrayList<Double>();
                    for (String str : columnsList) {
                        list.add(Double.parseDouble(str));
                    }
                    dataTableList.add(list);
                }
            }
            inputStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // list to array (manual unboxing; a List<Double> cannot be copied
        // directly into a double[])
        double[][] dataTable = new double[dataTableList.size()][];
        for (int i = 0; i < dataTableList.size(); i++) {
            List<Double> row = dataTableList.get(i);
            double[] rowArray = new double[row.size()];
            for (int j = 0; j < row.size(); j++) {
                rowArray[j] = row.get(j);
            }
            dataTable[i] = rowArray;
        }
    }
}
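A note on the memory arithmetic: 180 × 60,000 double values occupy roughly 86 MB as a primitive double[][]. Stored as List<List<Double>>, every value becomes a separate boxed Double object (roughly 16-24 bytes each) plus a 4-8 byte reference in the backing array, so the same data can easily reach 800 MB. A minimal sketch of reading straight into a preallocated primitive array follows, assuming the 180 numeric columns immediately follow the gsrv01: prefix on each line; PrimitiveRead and the ROWS/COLS constants are illustrative names, not from the original code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PrimitiveRead {
    static final int ROWS = 60000; // row count stated in the question
    static final int COLS = 180;   // column count stated in the question

    // Reads the file into a preallocated primitive array: no Double boxing,
    // no intermediate lists, so the heap cost stays near the raw data size.
    public static double[][] read(String pathFileName) throws IOException {
        double[][] table = new double[ROWS][COLS];
        try (BufferedReader reader = new BufferedReader(new FileReader(pathFileName))) {
            String line;
            int row = 0;
            while ((line = reader.readLine()) != null && row < ROWS) {
                // tokens[0] is the "gsrv01:" prefix; the numeric columns follow
                String[] tokens = line.trim().split("\\s+");
                for (int col = 0; col < COLS; col++) {
                    table[row][col] = Double.parseDouble(tokens[col + 1]);
                }
                row++;
            }
        }
        return table;
    }
}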
There's the Stream API for processing sets of data of unquantified size. Depending on the amount of numbers, you may want to remove the nested stream and use a for-loop instead.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public static List<double[]> read(String pathFileName) {
    Pattern pattern = Pattern.compile("^gsrv01:\\s+(.*)\\s+(\\d+)\\s+end total.*");
    try (FileInputStream in = new FileInputStream(pathFileName);
         InputStreamReader stream = new InputStreamReader(in);
         BufferedReader reader = new BufferedReader(stream)) {
        return reader.lines()
                .map(pattern::matcher)
                .filter(Matcher::matches)
                .map(matcher -> matcher.group(1))
                .map(s -> s.split("\\s+"))
                .map(strings -> Arrays.stream(strings)
                        .mapToDouble(Double::parseDouble)
                        .toArray())
                .collect(Collectors.toList());
    } catch (IOException e) {
        return Collections.emptyList();
    }
}

public static void main(String[] args) {
    System.out.println(read("8.txt").size());
}
This method parsed 59,292 lines of numbers from the attached 80 MB file in less than 3 seconds on a 6-year-old laptop.
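If you need the double[][] from the question rather than a List<double[]>, the conversion is a one-liner, since each list element is already a primitive array and no per-value unboxing occurs — a small usage sketch:

// Copies only the row references, not the individual values.
double[][] dataTable = read("8.txt").toArray(new double[0][]);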