Kraken Aho-Corasick
Kraken Aho-Corasick implements the widely used Aho-Corasick algorithm for fast multi-pattern matching. It also supports continuous pattern matching across fragments, so a keyword split over two buffers is still detected.
Author
- delmitz (delmitz@nchovy.com)
Usage
- Create your own pattern class which implements Pattern.
    class MyPattern implements Pattern {
        private int id;
        private String keyword;

        public MyPattern(int id, String keyword) {
            this.id = id;
            this.keyword = keyword;
        }

        public int getId() {
            return id;
        }

        @Override
        public byte[] getKeyword() {
            try {
                return keyword.getBytes("utf-8");
            } catch (UnsupportedEncodingException e) {
                return null;
            }
        }

        @Override
        public String toString() {
            return "rule id=" + id + ", keyword=" + keyword;
        }
    }
- Build and compile the Aho-Corasick state machine.
    AhoCorasickSearch ac = new AhoCorasickSearch();
    ac.addKeyword(new MyPattern(1, "EICAR"));
    ac.addKeyword(new MyPattern(2, "ANTIVIRUS"));
    ac.compile();
- Scan over multiple fragments
    String eicar1 = "X5O!P%@AP[4\\PZX54(P^)7CC)7}$EI";
    String eicar2 = "CAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*";

    SearchContext context = new SearchContext();
    List<Pair> results = new ArrayList<Pair>();

    // arrived first segment
    results.addAll(ac.search(eicar1.getBytes("utf-8"), context));

    // arrived second segment
    results.addAll(ac.search(eicar2.getBytes("utf-8"), context));

    for (Pair pair : results) {
        System.out.println(pair);
    }

  You only need to keep a small search context between calls, not the earlier fragments themselves. This is very useful for TCP stream reassembly.
- Result
    pos=28, pattern=(rule id=1, keyword=EICAR)
    pos=43, pattern=(rule id=2, keyword=ANTIVIRUS)
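
How fragment scanning can work

Matching can resume across fragments because an Aho-Corasick automaton only needs its current state, plus a running byte offset, to continue where it stopped. The following is a minimal, self-contained sketch of that idea. It is not the Kraken implementation: the class `StreamingAhoCorasick`, its nested `Context` class, and the string-based result format are hypothetical names chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; NOT the Kraken Aho-Corasick source. */
public class StreamingAhoCorasick {
    /** One trie node: child edges, failure link, and keywords ending here. */
    private static final class Node {
        final Map<Byte, Node> next = new HashMap<>();
        Node fail;
        final List<String> outputs = new ArrayList<>();
    }

    /** Hypothetical search context: automaton state plus bytes consumed so far. */
    public static final class Context {
        private Node state;
        private long consumed;
    }

    private final Node root = new Node();

    /** Inserts the UTF-8 bytes of a keyword into the trie. */
    public void addKeyword(String keyword) {
        Node node = root;
        for (byte b : keyword.getBytes(StandardCharsets.UTF_8)) {
            node = node.next.computeIfAbsent(b, k -> new Node());
        }
        node.outputs.add(keyword);
    }

    /** Breadth-first pass that sets failure links and inherits their outputs. */
    public void compile() {
        Deque<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Byte, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                Node target = f.next.get(e.getKey());
                child.fail = (target != null) ? target : root;
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    /** Scans one fragment, resuming from and then updating the given context. */
    public List<String> search(byte[] fragment, Context ctx) {
        List<String> hits = new ArrayList<>();
        Node state = (ctx.state != null) ? ctx.state : root;
        for (byte b : fragment) {
            while (state != root && !state.next.containsKey(b)) {
                state = state.fail;
            }
            Node target = state.next.get(b);
            state = (target != null) ? target : root;
            ctx.consumed++;
            for (String keyword : state.outputs) {
                // Report the stream-wide offset of the keyword's first byte.
                long start = ctx.consumed - keyword.getBytes(StandardCharsets.UTF_8).length;
                hits.add("pos=" + start + ", keyword=" + keyword);
            }
        }
        ctx.state = state; // the only data carried over to the next fragment
        return hits;
    }

    public static void main(String[] args) {
        StreamingAhoCorasick ac = new StreamingAhoCorasick();
        ac.addKeyword("EICAR");
        ac.addKeyword("ANTIVIRUS");
        ac.compile();

        Context ctx = new Context();
        List<String> hits = new ArrayList<>();
        hits.addAll(ac.search("X5O!P%@AP[4\\PZX54(P^)7CC)7}$EI".getBytes(StandardCharsets.UTF_8), ctx));
        hits.addAll(ac.search("CAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*".getBytes(StandardCharsets.UTF_8), ctx));
        hits.forEach(System.out::println);
    }
}
```

In this sketch the first fragment ends in the middle of "EICAR", so the first call reports nothing. The second call resumes from the saved state and reports the keyword at stream offset 28, followed by "ANTIVIRUS" at offset 43, which matches the positions shown in the result above.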
Maven configuration
    <dependency>
        <groupId>org.krakenapps</groupId>
        <artifactId>kraken-ahocorasick</artifactId>
        <version>1.0.0</version>
    </dependency>
History
- 1.0.0 release (2010-08-24)
