Table 4 The dictionary and parse sizes for prefixes of a database of Salmonella genomes, with three settings of the parameters w and p

From: Prefix-free parsing for building big BWTs

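| Number of genomes | Size (MB) | \(w = 6, p = 20\) Dict. | \(w = 6, p = 20\) Parse | \(w = 6, p = 20\) % | \(w = 8, p = 50\) Dict. | \(w = 8, p = 50\) Parse | \(w = 8, p = 50\) % | \(w = 10, p = 100\) Dict. | \(w = 10, p = 100\) Parse | \(w = 10, p = 100\) % |
|---|---|---|---|---|---|---|---|---|---|---|
| 50 | 249 | 68 | 43 | 44 | *77* | *20* | 39 | 91 | 10 | 40 |
| 100 | 485 | 83 | 85 | 35 | *99* | *39* | 28 | 122 | 19 | 29 |
| 500 | 2436 | 273 | 424 | 29 | 314 | 194 | 21 | *377* | *96* | 19 |
| 1000 | 4861 | 475 | 847 | 27 | 541 | 388 | 19 | *643* | *192* | 17 |
| 5000 | 24936 | 2663 | 4334 | 28 | 2915 | 1987 | 20 | *3196* | *985* | 17 |
| 10,000 | 49420 | 4190 | 8611 | 26 | 4652 | 3939 | 17 | *5176* | *1955* | 14 |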

  1. Again, all sizes are reported in megabytes; percentages are the sums of the sizes of the dictionaries and parses, divided by the sizes of the uncompressed files
  2. For each prefix, the sizes are in italics for the settings with the best overall compression
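
As a minimal sketch of the percentage defined in the first note, the snippet below recomputes it for the 10,000-genome row and picks the setting with the smallest value. The function name `compression_percent` is hypothetical and not taken from the paper. Because the sizes here are the rounded megabyte figures from the table, a recomputed percentage can differ from the published one by about a point in other rows.

```python
def compression_percent(dict_mb: float, parse_mb: float, uncompressed_mb: float) -> float:
    """Return (dictionary + parse) size as a percentage of the uncompressed size."""
    return 100.0 * (dict_mb + parse_mb) / uncompressed_mb


# Values copied from the 10,000-genome row of the table (all in MB).
uncompressed_mb = 49420
settings = {
    "w=6, p=20": (4190, 8611),    # (dictionary, parse)
    "w=8, p=50": (4652, 3939),
    "w=10, p=100": (5176, 1955),
}

percents = {
    name: compression_percent(d, p, uncompressed_mb)
    for name, (d, p) in settings.items()
}

# The best overall compression is the setting with the smallest percentage.
best = min(percents, key=percents.get)

for name, pct in percents.items():
    print(f"{name}: {pct:.1f}%")
print("best overall compression:", best)
```

For this row the smallest percentage belongs to \(w = 10, p = 100\), even though that setting has the largest dictionary: the much smaller parse more than offsets it.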