Kraken Aho-Corasick

Kraken Aho-Corasick implements the widely used Aho-Corasick algorithm for fast multi-pattern matching. It also supports continuous pattern matching across fragments, so a keyword split between two consecutive chunks of input is still detected.

Author

Usage

  • Create your own pattern class that implements Pattern.
    class MyPattern implements Pattern {
    	private int id;
    	private String keyword;
    
    	public MyPattern(int id, String keyword) {
    		this.id = id;
    		this.keyword = keyword;
    	}
    	
    	public int getId() {
    		return id;
    	}
    
    	@Override
    	public byte[] getKeyword() {
    		// StandardCharsets (java.nio.charset, Java 7+) avoids the checked
    		// UnsupportedEncodingException, so this method never returns null.
    		return keyword.getBytes(StandardCharsets.UTF_8);
    	}
    
    	@Override
    	public String toString() {
    		return "rule id=" + id + ", keyword=" + keyword;
    	}
    }
    
  • Build and compile the Aho-Corasick state machine
    AhoCorasickSearch ac = new AhoCorasickSearch();
    ac.addKeyword(new MyPattern(1, "EICAR"));
    ac.addKeyword(new MyPattern(2, "ANTIVIRUS"));
    ac.compile();
    
  • Scan over multiple fragments
    String eicar1 = "X5O!P%@AP[4\\PZX54(P^)7CC)7}$EI";
    String eicar2 = "CAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*";
    
    SearchContext context = new SearchContext();
    List<Pair> results = new ArrayList<Pair>();
    
    // first segment arrives
    results.addAll(ac.search(eicar1.getBytes("utf-8"), context));
    
    // second segment arrives; the context carries over the partial "EI" match
    results.addAll(ac.search(eicar2.getBytes("utf-8"), context));
    
    for (Pair pair : results) {
    	System.out.println(pair);
    }
    
    Only the small search context has to be kept between calls, not the earlier fragments. This is very useful for TCP stream reassembly.
  • Result (pos is the offset of the match from the start of the first fragment)
    pos=28, pattern=(rule id=1, keyword=EICAR)
    pos=43, pattern=(rule id=2, keyword=ANTIVIRUS)
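
To illustrate why a small context is enough to continue a match across fragments, here is a minimal, self-contained sketch of a streaming Aho-Corasick matcher. It is not the Kraken implementation: the class StreamingAcSketch, its Context class, and the output format are hypothetical names chosen for this example. The only assumption it demonstrates is that the context needs to hold the current automaton state and the number of bytes consumed so far.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Illustrative sketch only; not the Kraken Aho-Corasick source. */
class StreamingAcSketch {
    /** Hypothetical per-stream context: current state and bytes consumed so far. */
    static final class Context {
        int state = 0;
        long consumed = 0;
    }

    private final List<Map<Byte, Integer>> next = new ArrayList<>(); // trie edges
    private final List<Integer> fail = new ArrayList<>();            // failure links
    private final List<List<String>> out = new ArrayList<>();        // keywords ending at a state

    StreamingAcSketch() {
        newState(); // state 0 is the root
    }

    private int newState() {
        next.add(new HashMap<>());
        fail.add(0);
        out.add(new ArrayList<>());
        return next.size() - 1;
    }

    /** Inserts the UTF-8 bytes of the keyword into the trie. */
    void addKeyword(String keyword) {
        int s = 0;
        for (byte b : keyword.getBytes(StandardCharsets.UTF_8)) {
            Integer t = next.get(s).get(b);
            if (t == null) {
                t = newState();
                next.get(s).put(b, t);
            }
            s = t;
        }
        out.get(s).add(keyword);
    }

    /** Computes failure links breadth-first; call once after all keywords are added. */
    void compile() {
        Queue<Integer> queue = new ArrayDeque<>(next.get(0).values());
        while (!queue.isEmpty()) {
            int s = queue.remove();
            for (Map.Entry<Byte, Integer> e : next.get(s).entrySet()) {
                int child = e.getValue();
                int f = fail.get(s);
                while (f != 0 && !next.get(f).containsKey(e.getKey())) {
                    f = fail.get(f);
                }
                Integer viaFail = next.get(f).get(e.getKey());
                fail.set(child, viaFail != null ? viaFail : 0);
                // Inherit keywords that end at the failure target (proper suffixes).
                out.get(child).addAll(out.get(fail.get(child)));
                queue.add(child);
            }
        }
    }

    /** Scans one fragment, resuming from and updating the given context. */
    List<String> search(byte[] fragment, Context ctx) {
        List<String> hits = new ArrayList<>();
        for (byte b : fragment) {
            int s = ctx.state;
            while (s != 0 && !next.get(s).containsKey(b)) {
                s = fail.get(s);
            }
            Integer t = next.get(s).get(b);
            ctx.state = (t != null) ? t : 0;
            ctx.consumed++;
            for (String keyword : out.get(ctx.state)) {
                // Start offset of the match, counted from the start of the stream.
                long start = ctx.consumed - keyword.getBytes(StandardCharsets.UTF_8).length;
                hits.add("pos=" + start + ", keyword=" + keyword);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        StreamingAcSketch ac = new StreamingAcSketch();
        ac.addKeyword("EICAR");
        ac.addKeyword("ANTIVIRUS");
        ac.compile();
        Context ctx = new Context();
        List<String> hits = new ArrayList<>();
        hits.addAll(ac.search("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EI".getBytes(StandardCharsets.UTF_8), ctx));
        hits.addAll(ac.search("CAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*".getBytes(StandardCharsets.UTF_8), ctx));
        hits.forEach(System.out::println);
    }
}
```

Run on the same two fragments as above, the sketch reports EICAR at offset 28 and ANTIVIRUS at offset 43. After the first call the context sits in the state reached by "EI", so the second call completes the keyword without seeing the first fragment again.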
    

Maven configuration

<dependency>
  <groupId>org.krakenapps</groupId>
  <artifactId>kraken-ahocorasick</artifactId>
  <version>1.0.0</version>
</dependency>

History

  • 1.0.0 release (2010-08-24)