A heuristic approach to feature extraction and compression for written languages