java - Efficient searching according to "starts with" -


i facing issue write logic below problem.

i have 2 arraylists of strings:

  • list1: contains 5 million strings

  • list2: create on users input , contains strings/characters(ex. a,b,c,g,l,pd,sp,mta)

now have split list1 multiple lists according startswith strings in list2 in above case. need create 8 lists starts 'a', 'b','c', 'g', 'l','pd', 'sp' , 'mta'

but condition above have iterate list1 or list2 once. i.e. worst complexity algorithm should size of list1 (5 million).

it allowed use collections.sort() method

code have tried

    // create list search strings.     list<string> charlist = new arraylist<string>();     charlist.add("a");     charlist.add("b");     charlist.add("e");     charlist.add("z");     charlist.add("4");     charlist.add("1");     charlist.add("zi");      list<string> recordlist = new arraylist<string>();      // creating dummy data 100 character in live environment can     // around 50 lakhs strings     (int = 0; < 100; i++) {         char[] chars = "abcdefghijklmnopqrstuvwxyzabcgdkl0123456789".tochararray();         stringbuilder sb = new stringbuilder();         random random = new random();         (int i1 = 0; i1 < 6; i1++) {             char c = chars[random.nextint(chars.length)];             sb.append(c);         }         string output = sb.tostring();         recordlist.add(output);     }      // adding data mannually     recordlist.add("zink");     recordlist.add("zebra");     recordlist.add("zzzzzz");     collections.sort(charlist, string.case_insensitive_order);     collections.sort(recordlist, string.case_insensitive_order);      system.out.println("recordlist ===>" + recordlist);     system.out.println("***************************************************");     system.out.println("charlist ===>" + charlist);     system.out.println("***************************************************");      list<list> lists = new arraylist<list>();      int startindex = 0, charpointer = 0;     while (startindex < recordlist.size() && charpointer < charlist.size()) {         list<string> temp = new arraylist<string>();          boolean ishit = false;         string currentrecord = recordlist.get(startindex);         string partitionsattement = charlist.get(charpointer);         while (currentrecord.startswith(partitionsattement.touppercase())                 || currentrecord.startswith(partitionsattement.tolowercase())) {             temp.add(recordlist.get(startindex));             ishit = true;             startindex++;         }         if (!ishit) {             startindex++;         }         if (!temp.isempty()) {             lists.add(temp);             system.out.println(charlist.get(charpointer) + "====>" + temp);         }         charpointer++;     } 

just using string startswith method won't work in case. consider happens if first pattern not match input - you'll loop through strings in input list without finding match, though subsequent pattern matches exist.

what need instead compare each pattern against initial characters of each input string , process accordingly. let's have input string str , pattern pat. let substr first pat.length() characters of str. can compare substr , pat using string comparetoignorecase method. there 3 cases consider:

substr < pat move next input string.

substr == pat add str output pat , move next input string.

substr > pat move next pattern.

here's code illustrate (i've kept variable names possible).

list<list<string>> output = new arraylist<>(); for(int i=0; i<charlist.size(); i++) output.add(new arraylist<string>());  int startindex=0; int charpointer=0;               while(startindex < recordlist.size() && charpointer < charlist.size()) {     string charstr = charlist.get(charpointer);     string recstr = recordlist.get(startindex);      int cmp;     if(recstr.length() < charstr.length())      {         cmp = -1;     }     else     {         string recsubstr = recstr.substring(0, charstr.length());                    cmp = recsubstr.comparetoignorecase(charstr);                }      if(cmp <= 0)     {         if(cmp == 0) output.get(charpointer).add(recstr);         startindex++;     }     else      {         charpointer++;     } }  for(int i=0; i<charlist.size(); i++) {     system.out.println(charlist.get(i) + " : " + output.get(i)); } 

also, should note when include pattern starts pattern (e.g. "zi" , "z") longer pattern never matched, since shorter 1 capture inputs.


Comments

Popular posts from this blog

ubuntu - PHP script to find files of certain extensions in a directory, returns populated array when run in browser, but empty array when run from terminal -

php - How can i create a user dashboard -

javascript - How to detect toggling of the fullscreen-toolbar in jQuery Mobile? -