# @(#)HISTORY 2.1.8.3
Changes as of 10/11/99
-- versions: TPCH 1.2.0a, TPCR 1.1.0a
-- Correction to segmented updates that was causing an extra file to be
generated
-- Porting changes for DigUnix
Changes as of 08/28/99
-- versions: TPCH 1.2.0, TPCR 1.1.0
-- reduced parameter substitution range for Q18
-- added new option to specify location of dists file (-b)
-- added DBGEN option to suppress all output (-q)
Changes as of 08/16/99
-- versions: TPCH 1.1.0a, TPCR 1.0.1e
-- prevent "reuse" of original data in update files
-- correction to lint target in makefile.suite
-- removal of vestigial l_partkey predicate from 21.sql
-- reorder lineitem/order join in q5
-- removal of table aliases from 2.sql
-- randomize seeding of qgen RNG to close bug 52
-- correct possible round off error in segmented update files
-- corrected soft copy answer set for Q22
-- corrected precision of answer set for Q19
Changes as of 07/08/99
-- versions: TPCH 1.1.0, TPCR 1.0.1
-- WORKLOAD must be set to either TPCH or TPCR in the makefile
-- unneeded reference to part table removed from q21 template
Changes as of 06/04/99
-- version 1.0.1d
-- Restarted version numbering to match specification revisions for
TPC-H and TPC-R
-- Corrected answer set for Q13
-- Corrected parameter substitutions for Q16, Q17, Q19, Q20, Q21, Q22
-- Corrected RNG initialization in qgen.c
-- added adhoc.c adhoc.h to code base to support randomized data sets;
currently disabled
-- replaced calls to UnifInt() row_stop with call to NthElement()
-- Corrected a problem that caused small negative money values to print as
a positive value
-- Simplification of PR_xxx macros
-- QGEN building correct parameter logs again
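The negative money fix in this release concerns a class of bug that is easy to reproduce. A minimal sketch, assuming money is held as a signed count of cents and printed as dollars and cents; format_money is a hypothetical helper, not a dbgen routine, and the actual cause in dbgen may have differed:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of dbgen): format a signed amount held
 * in cents as "[-]dollars.cents".
 *
 * A naive version that prints cents / 100 and the absolute value of
 * cents % 100 shows -5 cents as "0.05": in C99 and later, integer
 * division truncates toward zero, so the dollars part is 0 and the
 * sign is lost. Emitting the sign separately avoids that.
 */
static int format_money(char *buf, size_t len, long cents)
{
    const char *sign = (cents < 0) ? "-" : "";
    unsigned long mag = (cents < 0) ? 0UL - (unsigned long)cents
                                    : (unsigned long)cents;

    return snprintf(buf, len, "%s%lu.%02lu", sign, mag / 100, mag % 100);
}
```

With this helper, -5 cents formats as "-0.05" and 250 cents as "2.50".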
******************
* NOTE NOTE NOTE *
******************
Below this line the file refers to TPC-D, which was retired in favor of
TPC-H and TPC-R. Since the new specifications are numbered from 1.0.0,
the program version was reset.
******************
* NOTE NOTE NOTE *
******************
Changes as of 01/05/99
-- version 2.0.1
-- added 1999 to the copyright notice
-- corrected C++ compilation problem
-- sub-select phrasing corrected in Q4, Q21, Q22
-- added support for segmenting update files (contributed by Larry Kemp, HP)
Changes as of 12/08/98
-- version 2.0.0
-- removed permute.h from clean target in makefile
Changes as of 11/17/98
-- version 2.0.0 Alpha 8
-- corrected o_custkey overrun bug
-- removed upper bound on -C command option
-- added static permute.h to distribution to match the specification
Changes as of 10/23/98
-- version 2.0.0 Alpha 7
-- removed references to DSS_SEED and SEED_TAG
-- minor query template cleanup
-- V2 answer sets added
-- correction to hd_sparse for SF > 300
-- added static declaration to row types in gen_tbl to fix update problem
-- permuted params to Q22
Changes as of 5/19/98
-- version 2.0.0 Alpha6b
-- removed trailing apostrophe from dists.dss nouns for Tandem loader
-- corrected mk_sparse() problem with alpha6
-- added 64b support for NCR/Metaware
-- corrected revision problem with 2.0.0.6
Changes as of 5/7/98
-- version 2.0.0 Alpha6
-- corrected generation of parent/child tables in parallel
-- renamed ORDER table to ORDERS table
-- revision of DBGEN synced with revision of 2.0 specification
-- portability changes to process termination provided by John Matzka
-- portability changes for Watcom C provided by Andrew Eisenberg
-- indentation of specifications/templates now matches
-- queries now include a consistent header format
Changes as of 4/28/98
-- version 2.0.0 Alpha5
-- NO RELEASE OF ALPHA 5 ; skipped to sync spec/DBGEN revision levels
Changes as of 4/6/98
-- version 2.0.0 Alpha4
-- corrected parallel table generation
-- minor corrections to query templates
-- portability changes for HP
Changes as of 3/24/98
-- version 2.0.0 Alpha3
-- include substitution parameters for Q22
-- correct substitution parameters for Q16 under AIX
-- include permute.h until unix/NT makefile fix
-- correct orderkey generation
Changes as of 3/20/98
-- version 2.0.0 Alpha2
-- correct runtime malloc error from bad INIT_HUGE macro
-- improve pseudo text distribution in comments
-- fix problem with parallelism of data gen
-- re-enable generation of parent/child tables
-- remove recombination code for parallel flat files
Changes as of 3/11/98
-- version 2.0.0 Alpha1
-- removed the TIME table
-- removed the need for seed files
-- made 1GB the validation database size
-- add pseudo text support in comments
-- correct character selection in a_rnd()
-- correct population of P_NAME
-- removed unclaimed variants
-- added new queries 18-22, replaced Q13
Changes as of 2/6/98
-- version 1.3.1
-- Revised 64 bit support to clean up bcd2_bin() and mk_sparse()
-- Add 64b support for NT
Changes as of 12/31/97
-- version 1.3.0
-- support for seed generation > 1TB (data gen still to be tested)
-- rework of 64b support
-- added bcd support for subtraction, comparison, modulo
-- added 1998 to the copyright notice
-- clarified comments in dists.dss
-- corrected substitution problem in Q11
-- standardized fopen() error messages with OPEN_CHECK()
-- introduced PATH_SEP in config.h to allow changes in path separators
Changes as of 12/15/96
-- version 1.2.0
-- corrected typos in queries 8a, 8c, 8d, 11a, 12F and 14F, 17a
-- added variant 15c
-- defined MAX_SCALE and MIN_SCALE; issued error messages for SF > 1000
since implementation is incomplete
-- seed file generation can now be resumed with dbgen -R <n> ...
-- corrected slight compile bug under Solaris 2.5.1
-- documented compile problems under SunOS
Changes as of 8/1/96
-- version 1.1.0D
-- included new variants for queries 8 and 15
-- re-introduced answer sets in the source tree
Changes as of 5/1/96
-- version 1.1.0C
-- unified version numbering of DBGEN and QGEN
-- updated BUGS list
-- removed FAQ from soft appendix; web site will keep the current
version of the FAQ
-- added 1996 to the copyright notice
-- corrected bug in PR_DATE macro; NO CHANGE TO DATA SET
-- properly initialize param values for cleaner logging
-- adjusted output format of Q11 param to allow scaling to 1TB
-- corrected typos in variant 14c
-- corrected data type for YEAR in variant 8c
-- corrected typos in variant 10a
-- added variant 8d
Changes as of 1/23/96
-- qgen version 1.1.0B
-- include support for ANSI semantics
-- improved patch for seed sensitivity
Changes as of 1/23/96
-- updated BUGS list
-- dbgen version 1.1.0A
-- patch to limit BCD2 fields to 12 characters for columnar output
-- qgen version 1.1.0A
-- patch to fix the "unknown flag" problem
-- patch to fix the seed sensitivity problem
Changes as of 12/19/95
-- updated BUGS list
-- dbgen version 1.1.0
-- upped default value of MAX_CHILDREN to 1000
-- corrected naming of detail tables in incremental load
-- corrected range delete output
-- forced delete files to truncate existing files
-- removed fixed size tables from seed generation
-- corrected overflow problem with large scale seed generation
-- allow date generation as MM-DD-YY based on config.h #define
-- correct truncation problem with columnar output in PR_VSTR()
-- added support for Windows NT
-- added PLATFORM macro to makefile, removed platform defines from
config.h
-- removed MAX_CHILDREN define from config.h (set to 1000 in dss.h)
-- qgen version 1.1.0
-- correct SET_OUTPUT macro to TDAT
-- use %ld in output for q17; portability
-- add support for SQLSERVER database dialect
-- add support for SYBASE database dialect
-- adjust parameter ranges for Q1, Q3, Q6
-- add -T/-t option to usage summary
-- added support for Windows NT
Changes as of 09/01/95
-- qgen version 1.0.1
-- formalized version numbering
-- -p now generates correct query permutations
-- added separate version number for qgen
-- corrected Q3 substitution problem
-- updated permissible range for Q10
-- corrected rowcount_dflt and the MAX row indicator (-1)
-- expanded param logging to include all possible parameters
-- allowed qgen's -d option to be used at all scale factors
-- made parameter substitution permutation-independent
-- added qgen support for END_TRAN (-E) and DFLT_NUM (-N)
-- correct handling of :n directive
-- added more complete explanation of QGEN to README
-- rename of random to rndm, for portability
-- dbgen version 1.0.1
-- formalized version numbering
-- inclusion of SF=1 seed file
-- correct typo in usage() update example
-- patch to driver.c to allow correct updates
-- documentation change to README to clarify seed/stage/update
interaction
-- corrected minor glitch in "open failed" error msg in print.c
-- added missing line continuation to makefile.suite
-- seed files are now based on scale factor and number of generators
-- seed files now hold seeds for one "step" of a given build
-- clean up of parallel load routines
-- inclusion of faster seed generation routines from Susanne Englert
-- removed the -E(xisting) option
-- assure proper scaling of O_CUSTKEY
-- corrected default update percentage
-- proper handling of child tables with '-O f'
-- removed seed files from the distribution
-- modified rpb_routine() to limit contribution of partkey in
retailprice
-- added '-S(tep)' option to allow multi-stage loads
-- roll in of 32 bit speed_seed routines from Dick Shelton
-- miscellaneous typo corrections in the documentation
-- cleanup of usage output
Changes as of 05/08/95
-- version 1.0
-- add Teradata defines to tpcd.h for QGEN
-- add :c to query templates for database CONNECT syntax
-- add examples of DBGEN and QGEN usage to README
-- add -T option to qgen to allow time table usage
-- query template names only require .sql suffix; rest is arbitrary
Changes as of 03/13/95
-- version 9.1
-- surround DBNAME with ifndef in config.h
-- remove -DDBNAME from makefile.suite
-- sync varchar handling with 9.1 draft
Changes as of 02/21/95
-- version 9.0a
-- fixed bug in qgen that incorrectly included rnd.h
-- included revised DDL with changes for char/varchar and l_quantity
-- updated DBGEN help message to include new single table options for
order/lineitem and part/partsupp
-- included handling for multi-set seed files TPCDSEED.xxx
-- generated seeds up through 400GB; headed to 1TB!
-- ANSI lint cleanup; more needed
-- UF2 now defaults to key lists; use "-O r" to generate key ranges
also note, this routine does NOT use the BCD2_*
routines. As a result, it WILL fail if the keys being deleted
exceed 32 bits. Since this would require ~660 update iterations,
this seems an acceptable oversight
Changes as of 01/19/95
-- version 9.0
-- allowed command line seeding of RNG for QGEN
-- order and number of params in QGEN now matches
presentation in spec
-- fixed bug in time table format of O_ORDERDATE
-- changed l_QUANTITY to FLOAT in dss.ddl
-- reworked QGEN options to be more useful
-- allowed creation of sparse keys beyond 32 bits (for 1TB)
-- removed unused '#ifdef' and associated code
-- allowed independent generation of master/detail tables
(eg, order/lineitem)
Changes as of 12/06/94
-- version 8.6
-- fixed renaming of flat files for child tables
-- various documentation fixes
-- added naming convention section to Porting.Notes
-- added -DIBM flag to config.h
-- synced up QGEN with draft 8.1
Changes as of 10/25/94
-- version 8.5a
-- corrected bug in columnar output of pr_supp
-- added pr_drange to generate a list of order keys to be
deleted instead of generating SQL
-- added '-O d' to generate range delete as SQL
-- updated default values for QGEN to sync with spec 8.1
-- corrected MK_SPARSE to reflect groups of 8
-- corrected a bug in o_orderstatus
-- regenerated seed files for SF in [1,10]
-- ANSI cleanup (primarily function declarations)
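The MK_SPARSE correction in this release refers to sparse order keys. A minimal sketch of one way to build keys in groups of 8, assuming the low 3 bits of a dense index are kept and 2 zero bits are inserted above them; the function name and both constants are illustrative assumptions, not taken from this release:

```c
#include <stdint.h>

#define KEEP_BITS 3  /* assumed: 2^3 = 8 consecutive keys per group */
#define GAP_BITS  2  /* assumed: 2 inserted bits, so groups start 32 apart */

/* Hypothetical helper: map a non-negative dense index to a sparse key. */
static int64_t sparse_key(int64_t index)
{
    int64_t low = index & ((INT64_C(1) << KEEP_BITS) - 1);
    int64_t high = index >> KEEP_BITS;

    /* Shift the high part past the kept bits plus the gap, then
     * restore the low bits unchanged. */
    return (high << (KEEP_BITS + GAP_BITS)) | low;
}
```

Under these assumptions, indexes 1 to 7 map to keys 1 to 7, index 8 maps to 32, and index 16 maps to 64.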
Changes as of 10/11/94
-- version 8.5
-- remove deletes/inserts to other than order/lineitem
-- increased cardinality for part.type part.container
-- '-r' argument is now integer; percentage in basis points
-- initial roll-in of new update scheme
-- added BBB comments to supplier table
Changes as of 9/27/94
-- version 8.4
-- all money calculations now use integer math. This should
bring everyone's data sets into exact agreement.
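Integer math helps here because floating-point results can differ slightly between compilers and platforms, while integer arithmetic on cents is exact. A minimal sketch, assuming prices in cents and discount and tax in whole percent; line_total_cents is a hypothetical name, and the truncating division is an assumption rather than the rule dbgen applies:

```c
#include <stdint.h>

/*
 * Hypothetical helper: extended price less discount plus tax, all in
 * integer cents. discount_pct and tax_pct are whole percentages
 * (0-100). Both divisions truncate, so the result does not depend on
 * any platform's floating-point behavior.
 */
static int64_t line_total_cents(int64_t unit_price_cents, int64_t quantity,
                                int64_t discount_pct, int64_t tax_pct)
{
    int64_t extended = unit_price_cents * quantity;
    int64_t discounted = extended * (100 - discount_pct) / 100;

    return discounted * (100 + tax_pct) / 100;
}
```

For example, 3 units at 1999 cents with a 10% discount and 8% tax come to 5828 cents.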
Changes as of 9/21/94
-- version 8.3b
-- fixed handling of MAX_STREAM
-- added floor function to RPRICE bridge
-- misc lint cleanup (type fixes, new prototypes, etc.)
-- MONEY format becomes lf for DOS
-- further cleanup of PR_VSTR and its length argument
-- change to parameter generation for Q6 to allow for float
discount
Changes as of 9/15/94
-- version 8.3a
-- isolated MONEY format for Unisys (Lf) using DOS
-- made sure all arguments to MAKE_MONEY were doubles
-- rolled in NEW_PTEXT to allow Berni to experiment
Changes as of 9/12/94
-- version 8.3
-- added -T n and -T r to usage to match getopt() and README
-- changed PR_MONEY to remove leading blanks
-- included revised DDL from Berni
-- included some MVS portability fixes in re malloc.h
-- cleaned up error messages in qgen and made #define ofp usage
universal
-- additional DOS portability changes
-- added {c,a}len to provide specific length for columnar
output of varchar
-- added PR_VSTR to handle varchar printing under MVS
-- fixed bit masking in a_rnd and cleaned up prototype match
with V_STR
-- PR_MONEY now uses %Lf
-- added revised pseudo text under NEW_PTEXT ifdef for
experiments
Changes as of 9/09/94
-- version 8.2
-- l_discount and l_tax are now fractional (per teleconference)
-- money calculations moved to scaled integer math to clean up
answer sets
-- changed PR_FLT() to PR_MONEY to clarify usage
-- portability changes for SYBASE: dbname --> db_name
STATUS --> DBGEN_STATUS
-- added nations2 to dists.dss to handle qgen needs for now
-- reintroduced #ifndef DOS
-- reintroduced U2200 define to control kill_load()
-- broke out nation and region separately in -T option
-- updated dss.ddl based on mail from Berni
Changes as of 8/31/94
-- version 8.1
-- scaling for clerks needed to be 1000 (was 100)
-- added qgen parameter for scale
-- changed qgen parameter from s)tream to p)ermutation
-- synced qgen parameter values with 8.0 spec
-- corrected duplications in dists.dss
Changes as of 8/24/94
-- version 8.0
-- added sparse keys to lineitem/order
-- added varchar generation for comments/addresses
-- added variable lineitems/orders
-- removed ifdef for normalized code_tables
-- included code for parameter generation and template->EQT
routines
-- updated README and Porting.Notes to reflect QGEN
-- included DDL and RI examples from Berni
Changes as of 6/15/94
-- version 7.0b (numbers now match spec revision)
-- rework of code tables to properly map nation/region; when
compiled with -DCODE_TABLES distributions are taken from
code.dss and two additional fields are generated for
customers and suppliers, [cs]_ncode and [cs]_rcode,
immediately following [cs]_region
-- replaced ifdef's around DEAD_DATA with opposites. DEAD_DATA
is now the default
-- worked through code to see that it conformed to 7.0
specification
-- adjusted scale factors/rowcounts for 1 GB == sf1
-- brought help message in line with current code
-- fixed order per customer at 10
-- make suppkey scalable in lineitem/partsupp
Changes as of 4/25/94
-- version 1.5
-- added the customers with no orders; Compile with -DDEAD_DATA
to activate the change.
-- added the code table for nation and region;
Compile with -DCODE_TABLES to activate the change.
Changes as of 3/17/94
-- version 1.41
-- completed implementation of JULIAN_DAY after talks with Berni
-- misc cleanup in usage/README files
-- removed all tabs and capped line length at 75
-- added -n option to allow naming of inline-loaded database
Changes as of 3/16/94
-- version 1.4
-- prototyped julian day/month for query re-write work. Compile
with -DJULIAN_DAY to enable
-- removed gen_times() from driver.c
-- added VMS ifdef to config.h to clean up fork/signal issues
-- added ICL ifdef to config.h to clean up getopt() issues
-- changed header file references to config.h from machine.h
Changes as of 3/2/94
-- version 1.31
-- corrected format of C_NAME to match S_NAME and O_CLERK
-- re-allowed fractional scale factors < 1 (updates not
contiguous)
-- added DSS_CONFIG environment variable
-- reworked read_dist() to look for DSS_DIST in DSS_CONFIG
-- updated the README file
Changes as of 2/16/94
-- version 1.3
-- added command line options for parallel load and data set
expansion
-- changed dists.dss delimiter to | for portability
-- limited scale factors to integer values
-- added command line option for seed file generation
-- added all seed files to distribution for SFs 1 - 10
-- moved machine.h to config.h and added MAX_CHILDREN define
-- added 'f' flag to options to allow renaming of output files
-- added generation of SQL delete statements to match updates
(Note: updates are still single-threaded; -C is cleared
by -U)
-- corrected field sizing in dsstypes.h typedefs to match v 6.4
-- update percentage default set to 1%
Changes as of 12/3/93
-- version 1.2
-- added command line option to adjust update percentage
-- fixed update generation for proper primary key ordering
-- renamed UUSR/PRC to RUSSIA/CHINA in dists.dss
-- cleaned up phone number generation to be consistent
regardless of order of evaluation
-- adjusted size of lineitem comment to bring data in line with
100 MB == SF=1
Changes as of 10/15/93
-- added command line option for update data creation
-- miscellaneous porting and cleanup changes
-- reworked table generation to allow reuse for updates
-- added comment field to tdefs structure
-- added load_state and store_state to sync data gen and
update gen
Changes as of 7/26/93
-- combined loader and header stubs in load_stubs.c
-- separated Revision History (this file) from README
-- simplified makefile
-- removed redundancies from colors distribution
-- added getopt() for portability
-- created Porting.Notes
-- adjusted scaling rules
-- added help option to the command line
Changes as of 2/26/93
-- combined all typedefs in one header: dsstypes.h
-- combined flat file generation in print.ec
-- combined typedef population in build.ec
-- added -P to control rowcnt scaling (P for percentage)
-- added -D option for Direct data generation and added
appropriate hooks in tdefs[] structure
-- added -F option for flat file generation
-- reused -T option (use -P 0.1 to build test size database)
now accepts suboptions c,o,p,s for single table builds.
-- dropped -M option (scaling is now by rowcount)
-- added -O option for optional controls. Currently defined:
-O t -- generate optional time table and join fields in
order/lineitem
-O h -- generate headers for flat file output
-O m -- generate fixed column-length output
-- removed dynamic memory allocation, redundant calls to
UnifInt, etc to improve performance
Changes as of 1/12/92
-- julian() changed to handle orders->orderdate correctly
-- rflag distributions corrected in dists.dss
-- sea, gold removed from color distribution to clean up substring
problems
-- part->number and supplier-> adjusted for 1-based indexing
-- time->day changed to be day of month, not day of year
-- t.week changed to be week in year, not day of week
Changes as of 11/18/92
-- checked line length and tab for transmission
-- another chapter in the portability wars. added #include
"machine.h" to dss.h (which is included by everyone else). Any
machine particular porting changes should go here.
-- fixed fixed-field formats to prevent double printing
-- expanded PR_FLT formats to %010.2
Changes as of 10/21/92
-- added fixed format and column header handling; users of headers
will have to define the header functions to be called in
int (*tdefs.header)()
Changes as of 10/09/92:
-- added ansi prototypes and recompiled with gcc -ansi. users may
need to change the CC definition in the makefile and the contents
of CFLAGS to reflect their particular ansi compiler.
-- replaced all int references with long
-- replaced all float references with double
-- found and fixed odate/julian problem TS mentioned in 10/09 phone
call
Changes as of 9/09/92:
-- Park/Miller random number generator included
-- clerk scaling changed to 100 * scale
-- parts.name always built from 5 selections from colors set
-- test scaling changed to ~60MB (TEST_SCALING == 10)
-- logarithmic scaling removed
-- mfgcost removed and retail/supplier cost bounds adjusted
-- agg_str memory leak fixed
-- independent RNG streams on a per column basis
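The Park/Miller generator and per-column streams listed for this release can be sketched as follows. The recurrence is the published minimal-standard one, seed = 16807 * seed mod (2^31 - 1); the stream table and its names are illustrative assumptions, not the layout the generator actually uses:

```c
#include <stdint.h>

#define PM_MULTIPLIER INT64_C(16807)       /* 7^5 */
#define PM_MODULUS    INT64_C(2147483647)  /* 2^31 - 1 */

/* One seed per column, so drawing a value for one column never
 * disturbs the sequence of another (column names are hypothetical). */
enum { COL_NAME, COL_PRICE, COL_COMMENT, COL_COUNT };
static int64_t column_seed[COL_COUNT] = { 1, 1, 1 };

/* Advance the given column's stream; returns a value in [1, 2^31 - 2]. */
static int64_t next_random(int column)
{
    column_seed[column] = (PM_MULTIPLIER * column_seed[column]) % PM_MODULUS;
    return column_seed[column];
}
```

Starting from seed 1, the 10,000th value of a stream is 1043618065, the check value given by Park and Miller; the other columns' seeds are untouched.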
This is the revised data generator for DSS.
The rewrite tried to accomplish four things: (1) identify and isolate
all the implicit assumptions about limits, bounds, ranges,
distributions, etc.; (2) standardize the way any given table was
generated/printed to ease understanding and maintenance; (3) bring the
generator in line with the current work of the committee and the
excellent spec that Indira put together; (4) provide an easy way to
adjust distributions and string contents, and to facilitate
experimentation to get a better idea of the impact of data population
changes.
The files included are:
driver.c    ------- main and the calling routines for the generators
dist.c      ------- should really be named dss_util.c; misc routines
customer.c  ------- generation and print routines for customer table
orders.c    ------- generation and print routines for order table
parts.c     ------- generation and print routines for parts/partsupp
suppliers.c ------- generation and print routines for suppliers table
time.c      ------- generation and print routines for time table
customer.h  ------- associated header files; contain structure
dss.h               definitions. dss.h holds the large number of
orders.h            assumptions and values that have been used as
parts.h             IFDEFs.
suppliers.h
time.h
dists.dss   ------- string selections and weights; used to build
                    distributions
Running make will create an executable (using the compiler flags in
CFLAGS, the ld flags in LDFLAGS and the libraries in LIBS [-O, -s,
and -lm by default]) which will create flat files suitable for dbload.