This came about in 2006 after I attended a talk on bioinformatics.
I had the idea of making an email client that would take the
methods of bioinformatics and apply them to spam detection.
FindPatterns searches through its input and outputs the sequences
that are repeated. Because it's intended for text files, control
characters are ignored.
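The page does not describe how FindPatterns actually finds its matches, so the following is only a naive Python sketch of the general idea: drop control characters, then report every substring of at least a minimum length that occurs more than once. The names `strip_control` and `find_repeated` are hypothetical, and treating "control characters" as ASCII codes below 32 plus 127 is an assumption.

```python
from collections import Counter


def strip_control(text):
    """Drop ASCII control characters (assumed: codes below 32, and 127)."""
    return "".join(ch for ch in text if ord(ch) >= 32 and ord(ch) != 127)


def find_repeated(text, min_len=3):
    """Map each repeated substring of at least min_len symbols to its count.

    Overlapping occurrences are counted. This is a brute-force illustration,
    not the algorithm FindPatterns itself uses. Assumes min_len >= 1.
    """
    cleaned = strip_control(text)
    repeats = {}
    length = min_len
    # A substring as long as the whole input can only occur once.
    while length < len(cleaned):
        counts = Counter(
            cleaned[i:i + length] for i in range(len(cleaned) - length + 1)
        )
        found = {s: c for s, c in counts.items() if c > 1}
        if not found:
            # Any longer repeat would contain a repeat of this length,
            # so there is nothing more to find.
            break
        repeats.update(found)
        length += 1
    return repeats
```

For example, `find_repeated("abc\x07abc")` returns `{"abc": 2}`: the bell character is dropped, leaving `abcabc`. The default of 3 mirrors the documented default minimum match length.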
FindPatterns [filename] [-b] [-e] [-i] [-n] [-o] [-s] [-v] [-m<n>] [-l<n>] [-g<n>] [-?|h]
- filename
  - Attempt to read input from this file; otherwise use stdin.
- -b
  - Keep a buffer to count repeated matches (!o -> b: with -o off, -b is implied.)
- -e
  - Echo input.
- -i
  - Case-insensitive matching (not implemented.)
- -n
  - Don't display matches at the end.
- -o
  - Output matches immediately as they are found.
- -s
  - Silent mode: plain output with no extra characters.
- -v
  - Verbose comments while outputting.
- -g<n>
  - Set memory buffer granularity to the closest power of two lower than <n> bytes (default 1024.)
- -l<n>
  - Set match limit to <n> matches (default 4096; 0 means no limit.)
- -m<n>
  - Set minimum match length to <n> symbols (default 3.)
- -?|h
  - Display this help screen and exit.
Adding -<s>- will turn off switch <s>; for example, -e- turns off input echo.
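To make the switch conventions concrete, here is a small Python sketch of how arguments in this style could be parsed. It is not taken from the FindPatterns source: the function name `parse_args`, the option dictionary layout, and the choice to start every on/off switch as off are all assumptions made for illustration. Only the switch letters and the numeric defaults come from the list above.

```python
BOOLEAN_SWITCHES = "beinosv"                        # on/off switches listed above
NUMERIC_DEFAULTS = {"g": 1024, "l": 4096, "m": 3}   # documented defaults


def parse_args(args):
    """Parse FindPatterns-style arguments into a dict (illustrative only)."""
    # Assumption: every on/off switch starts off; the page does not say.
    opts = {s: False for s in BOOLEAN_SWITCHES}
    opts.update(NUMERIC_DEFAULTS)
    opts["filename"] = None
    opts["help"] = False
    for arg in args:
        if not arg.startswith("-"):
            opts["filename"] = arg          # bare argument: the input file
            continue
        name, rest = arg[1:2], arg[2:]
        if name in ("?", "h") and rest == "":
            opts["help"] = True
        elif name in NUMERIC_DEFAULTS and rest.isdigit():
            opts[name] = int(rest)          # e.g. -m5 sets the minimum length
        elif name and name in BOOLEAN_SWITCHES and rest in ("", "-"):
            opts[name] = (rest == "")       # -e turns on, -e- turns off
        else:
            raise ValueError(f"unrecognised argument: {arg}")
    return opts
```

With this sketch, `parse_args(["mail.txt", "-e", "-e-", "-m5"])` leaves echo off, sets the minimum match length to 5, and keeps the other defaults. Later arguments override earlier ones, which is one plausible reading of how a trailing `-` cancels a switch.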
Also included is a simple KillSpam email client that takes the patterns
generated by FindPatterns and eliminates all the emails that contain a
matching pattern.
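The filtering step can be sketched in a few lines of Python. How KillSpam actually loads patterns or fetches mail is not described here, so this assumes the emails are already available as strings and that a "match" is a plain substring test; the name `kill_spam` is hypothetical.

```python
def kill_spam(emails, patterns):
    """Return only the emails that contain none of the given patterns.

    Assumes plain, case-sensitive substring matching. Empty patterns are
    skipped because an empty string would match every email.
    """
    usable = [p for p in patterns if p]
    return [
        body for body in emails
        if not any(p in body for p in usable)
    ]
```

For instance, with the patterns `["cheap pills"]`, an email containing "Buy cheap pills now" would be dropped while one saying "Lunch at noon?" would be kept.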
Source code for FindPatterns and KillSpam.
The mail client for Windows console (included with the source code.)
From http://neil.chaosnet.org/code/FindPatterns/.