java - Efficient searching according to "starts with" -
i facing issue write logic below problem.
i have 2 arraylists of strings:
list1: contains 5 million strings
list2: create on users input , contains strings/characters(ex. a,b,c,g,l,pd,sp,mta)
now have split list1 multiple lists according startswith strings in list2 in above case. need create 8 lists starts 'a', 'b','c', 'g', 'l','pd', 'sp' , 'mta'
but condition above have iterate list1 or list2 once. i.e. worst complexity algorithm should size of list1 (5 million).
it allowed use collections.sort() method
code have tried
// create list search strings. list<string> charlist = new arraylist<string>(); charlist.add("a"); charlist.add("b"); charlist.add("e"); charlist.add("z"); charlist.add("4"); charlist.add("1"); charlist.add("zi"); list<string> recordlist = new arraylist<string>(); // creating dummy data 100 character in live environment can // around 50 lakhs strings (int = 0; < 100; i++) { char[] chars = "abcdefghijklmnopqrstuvwxyzabcgdkl0123456789".tochararray(); stringbuilder sb = new stringbuilder(); random random = new random(); (int i1 = 0; i1 < 6; i1++) { char c = chars[random.nextint(chars.length)]; sb.append(c); } string output = sb.tostring(); recordlist.add(output); } // adding data mannually recordlist.add("zink"); recordlist.add("zebra"); recordlist.add("zzzzzz"); collections.sort(charlist, string.case_insensitive_order); collections.sort(recordlist, string.case_insensitive_order); system.out.println("recordlist ===>" + recordlist); system.out.println("***************************************************"); system.out.println("charlist ===>" + charlist); system.out.println("***************************************************"); list<list> lists = new arraylist<list>(); int startindex = 0, charpointer = 0; while (startindex < recordlist.size() && charpointer < charlist.size()) { list<string> temp = new arraylist<string>(); boolean ishit = false; string currentrecord = recordlist.get(startindex); string partitionsattement = charlist.get(charpointer); while (currentrecord.startswith(partitionsattement.touppercase()) || currentrecord.startswith(partitionsattement.tolowercase())) { temp.add(recordlist.get(startindex)); ishit = true; startindex++; } if (!ishit) { startindex++; } if (!temp.isempty()) { lists.add(temp); system.out.println(charlist.get(charpointer) + "====>" + temp); } charpointer++; }
just using string
startswith
method won't work in case. consider happens if first pattern not match input - you'll loop through strings in input list without finding match, though subsequent pattern matches exist.
what need instead compare each pattern against initial characters of each input string , process accordingly. let's have input string str
, pattern pat
. let substr
first pat.length()
characters of str
. can compare substr
, pat
using string
comparetoignorecase
method. there 3 cases consider:
substr < pat
move next input string.
substr == pat
add str
output pat
, move next input string.
substr > pat
move next pattern.
here's code illustrate (i've kept variable names possible).
list<list<string>> output = new arraylist<>(); for(int i=0; i<charlist.size(); i++) output.add(new arraylist<string>()); int startindex=0; int charpointer=0; while(startindex < recordlist.size() && charpointer < charlist.size()) { string charstr = charlist.get(charpointer); string recstr = recordlist.get(startindex); int cmp; if(recstr.length() < charstr.length()) { cmp = -1; } else { string recsubstr = recstr.substring(0, charstr.length()); cmp = recsubstr.comparetoignorecase(charstr); } if(cmp <= 0) { if(cmp == 0) output.get(charpointer).add(recstr); startindex++; } else { charpointer++; } } for(int i=0; i<charlist.size(); i++) { system.out.println(charlist.get(i) + " : " + output.get(i)); }
also, should note when include pattern starts pattern (e.g. "zi"
, "z"
) longer pattern never matched, since shorter 1 capture inputs.
Comments
Post a Comment