These .eml files are sample spam emails for testing your FuzzyOCR installation. Assuming you are using the default settings, your output should match the output listed here.

Use spamassassin -t < samplefile.eml to test :)
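The command above checks one file at a time. The following is a minimal Python sketch that runs the same command over every .eml file in a directory and prints only the FUZZY_OCR* rule lines, which makes comparison with the listings below easier. It assumes that spamassassin is on your PATH and that the report uses the " pts rule name description" layout shown in this README. The helper names fuzzy_ocr_lines and main are hypothetical and are not part of FuzzyOCR or SpamAssassin.

```python
import re
import shutil
import subprocess
import sys
from pathlib import Path

# One rule hit in the SpamAssassin report: score, rule name, description, e.g.
# " 5.0 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image"
RULE_LINE = re.compile(r"^\s*-?\d+(?:\.\d+)?\s+([A-Z0-9_]+)\s")


def fuzzy_ocr_lines(report: str) -> list[str]:
    """Return the report lines of FUZZY_OCR* hits plus their indented
    continuation lines (a heuristic based on the layout in this README)."""
    keep = False
    out = []
    for line in report.splitlines():
        match = RULE_LINE.match(line)
        if match:
            # A new rule line: keep it only if it is one of FuzzyOCR's rules.
            keep = match.group(1).startswith("FUZZY_OCR")
        elif not line.startswith(" "):
            # Blank or unindented line: the current rule's description ended.
            keep = False
        if keep:
            out.append(line)
    return out


def main(sample_dir: str = ".") -> None:
    if shutil.which("spamassassin") is None:
        print("spamassassin not found on PATH", file=sys.stderr)
        return
    for eml in sorted(Path(sample_dir).glob("*.eml")):
        # Equivalent to running: spamassassin -t < eml
        with eml.open("rb") as fh:
            result = subprocess.run(
                ["spamassassin", "-t"], stdin=fh,
                capture_output=True, encoding="utf-8", errors="replace",
            )
        print(f"== {eml.name} ==")
        print("\n".join(fuzzy_ocr_lines(result.stdout)) or "(no FUZZY_OCR hits)")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved under a name of your choice (for example, a hypothetical run_samples.py), it could be run as "python3 run_samples.py path/to/samples". Only the rule lines are filtered, so the scores and word lists can be checked against the expected output by eye.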

corrupted-gif.eml: Contains a corrupted GIF image. Additionally, I changed its content-type to image/jpeg, so the output should show:

 1.5 FUZZY_OCR_WRONG_CTYPE  BODY: Mail contains an image with wrong
                            content-type set
                            Image has format "GIF" but content-type is
                            "image/jpeg"
 5.0 FUZZY_OCR_CORRUPT_IMG  BODY: Mail contains a corrupted image
                            Corrupt image: GIF-LIB error: Image is
                            defective, decoding aborted.
  10 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "stock" in 2 lines
                            "investor" in 1 lines
                            "company" in 1 lines
                            "price" in 2 lines
                            "trade" in 1 lines
                            "service" in 1 lines
                            (8 word occurrences found)

animated-gif.eml: Contains an animated GIF with four frames. With the default settings, and also with "focr_gif_max_frames 3", the output should be:

  10 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "stock" in 2 lines
                            "company" in 3 lines
                            "trade" in 1 lines
                            "penis" in 1 lines
                            "growth" in 1 lines
                            (8 word occurrences found)

Note: Please verify that you get this output both with and without the setting mentioned above, because a different test is used when that setting is active.

jpeg.eml: Contains a JPEG image. The output should show:

 6.0 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "viagra" in 2 lines
                            "cialis" in 1 lines
                            "levitra" in 1 lines
                            (4 word occurrences found)

png.eml: Contains a PNG image. The output should show:

  24 FUZZY_OCR              BODY: Mail contains an image with common spam text inside
                            Words found:
                            "stock" in 1 lines
                            "investor" in 3 lines
                            "company" in 2 lines
                            "money" in 1 lines
                            "buy" in 1 lines
                            "price" in 6 lines
                            "trade" in 2 lines
                            "service" in 2 lines
                            "software" in 2 lines
                            "levitra" in 1 lines
                            "legal" in 1 lines
                            (22 word occurrences found)
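Each "(N word occurrences found)" line above is the sum of the per-word line counts. For png.eml, for example, 1 + 3 + 2 + 1 + 1 + 6 + 2 + 2 + 2 + 1 + 1 = 22. If your output differs from a listing, the following Python sketch can parse a "Words found:" block into word counts so that two listings can be compared directly. The regular expressions assume the exact line format shown in this README, and parse_words_found is a hypothetical helper that is not part of FuzzyOCR.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as:  "price" in 6 lines   and   (22 word occurrences found)
WORD_LINE = re.compile(r'^\s*"([^"]+)" in (\d+) lines?\s*$')
TOTAL_LINE = re.compile(r"^\s*\((\d+) word occurrences? found\)\s*$")


def parse_words_found(text: str) -> Tuple[Dict[str, int], Optional[int]]:
    """Collect word -> line-count pairs and the reported total
    from a FuzzyOCR "Words found:" listing."""
    counts: Dict[str, int] = {}
    total = None
    for line in text.splitlines():
        if (m := WORD_LINE.match(line)):
            counts[m.group(1)] = int(m.group(2))
        elif (m := TOTAL_LINE.match(line)):
            total = int(m.group(1))
    return counts, total


# Expected listing for jpeg.eml, copied from this README.
EXPECTED_JPEG = '''
                            "viagra" in 2 lines
                            "cialis" in 1 lines
                            "levitra" in 1 lines
                            (4 word occurrences found)
'''

counts, total = parse_words_found(EXPECTED_JPEG)
# The total equals the sum of the per-word counts: 2 + 1 + 1 = 4.
assert sum(counts.values()) == total
print(counts, total)  # {'viagra': 2, 'cialis': 1, 'levitra': 1} 4
```

To check your own result, parse the corresponding block of your output the same way and compare the two dictionaries. Equal dictionaries mean that the same words were found in the same number of lines.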