Command Line Interface

There are several options when using datacraft from the command line.

Inline JSON/YAML

To test small spec fragments, you can use the --inline <spec> flag. Most of the examples use inline YAML, since it works in both Linux and Windows command prompts. Inline JSON is also supported. Example:

datacraft --inline "{ handle: { type: cc-word, config: {min: 3, mean: 5 } } }" -i 5 --log-level off --printkey

handle -> wzS
handle -> 9LRO
handle -> aeWMH
handle -> enXw_u
handle -> nTI

Log Levels

You can change the logging level to one of 'critical', 'fatal', 'error', 'warning', 'warn', 'info', 'debug', 'off', 'stop', 'disable' by using the -l or --log-level flag. See the example above.

Registry Help

Many of the Field Spec types, formatters, and casters are stored in the datacraft Registry. Use the commands below to list and print out help for the various registered entities.

List Types

To see a list of the registered types that can be used in specs use the --type-list command line flag:

datacraft --type-list

INFO [22-May-2050 06:20:02 AM] Starting Loading Configurations...
INFO [22-May-2050 06:20:02 AM] Loading custom type loader: core
INFO [22-May-2050 06:20:02 AM] Loading custom type loader: xeger
calculate
char_class
cc-ascii
cc-lower
cc-upper
...
uuid
values
xeger

Type Usage

To get detailed usage for the types, use the --type-help flag. With no arguments, the flag prints the usage for every registered type. You can limit the output to specific types by providing them as arguments to the --type-help flag:

# lists all usage
datacraft --type-help
# only lists help for calculate type
datacraft --type-help calculate -l warn

-------------------------------------
calculate | Example Spec:
{
  "height_in": {
    "type": "values",
    "data": [60, 70, 80, 90]
  },
  "height_cm": {
    "type": "calculate",
    "fields": [
      "height_in"
    ],
    "formula": "{{ height_in }} * 2.54"
  }
}
datacraft -s spec.json -i 3 -r 1 --format json -x -l off
{"height_in": 60, "height_cm": 152.4}
{"height_in": 70, "height_cm": 177.8}
{"height_in": 80, "height_cm": 203.2}

-------------------------------------

Specify the -o <directory> option to create a file type-help.txt, with the full usage info:

datacraft --type-help -o .
INFO [22-May-2050 01:13:15 PM] Starting Loading Configurations...
INFO [22-May-2050 01:13:15 PM] Loading custom type loader: core
INFO [22-May-2050 01:13:15 PM] Loading custom type loader: xeger
INFO [22-May-2050 01:13:15 PM] Wrote data to .\type-help.txt

Caster List

Use the --cast-list command line flag to list the available casting operators. Entries that look like string -> str -> s show the aliases that can be used in place of the full caster name. For example, the following spec uses both full caster names and aliases:

{
    "age1": {
        "type": "rand_range",
        "data": [1, 100],
        "config": {
            "cast": "int"
        }
    },
    "age2:rand_range?cast=i": [1, 100],
    "age3:rand_range?cast=round3;str;f": [1, 100]
}

datacraft -s cast.json -i 1 -x -l off --format json-pretty
[
    {
        "age1": 44,
        "age2": 74,
        "age3": 78.535
    }
]

The age1 and age2 fields both cast the value to an integer. The age3 field illustrates the use of multiple casters, which are applied from left to right: the value is first rounded to three decimal places, then cast to a string, and finally cast to a floating point number.
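The left-to-right chaining of casters can be pictured with a small Python sketch. The CASTERS table and the apply_casters helper are hypothetical names made up for illustration; they only mirror the aliases used in the spec above and are not datacraft's own registry or API.

```python
# Hypothetical caster table for illustration only; not datacraft's registry.
CASTERS = {
    "int": int, "i": int,
    "float": float, "f": float,
    "str": str, "s": str,
    "round3": lambda value: round(float(value), 3),
}


def apply_casters(value, cast_config):
    """Apply each caster named in a ';'-separated string, left to right."""
    for name in cast_config.split(";"):
        value = CASTERS[name](value)
    return value


# Round to three places, then to a string, then back to a float.
print(apply_casters(78.53491, "round3;str;f"))
# The alias "i" behaves the same as "int".
print(apply_casters(44.9, "i"))
```

Because each caster receives the output of the previous one, the order of the names in the cast string matters.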

Formatter List

Use the command line --format-list flag to print out the list of registered formatters.

datacraft --format-list -l warn

json
json-pretty
csv
csvh
csv-with-header
yaml

Formatting Output

The default is to write the generated values out to the console. Use the --printkey flag to print the key with the value:

datacraft --inline "{ id:uuid, ts:date }" -i 2 --log-level off

6f1fad06-9eaa-4eb1-b6c4-e842682ce7d2
28-11-2050
493fff93-34e6-437f-bd82-71b1dee7219d
27-11-2050

datacraft --inline "{ id:uuid, ts:date }" -i 2 --log-level off --printkey

id -> 9275840a-bb1e-4ec6-ae88-702d7a1906c9
ts -> 14-11-2050
id -> 899f8928-b5f3-4c8e-9443-5ba5f41f81a9
ts -> 11-12-2050

Sometimes it is useful to dump the generated data in a format that is easier to consume or view. Use the -f or --format flag to specify one of json, json-pretty, or csv. The json format prints a flat version of each record that takes up a single line per iteration. The json-pretty format prints an indented version of each record that spans multiple lines. The csv format outputs each record as a line of comma-separated values. If you want a header row with the csv output, use the csv-with-header or csvh format. Examples:

datacraft --inline "{ id:uuid, ts:date }" -i 2 -r 1 --log-level off --format json -x

{"id": "732376df-9adc-413e-8493-73555fae51f9", "ts": "21-04-2050"}
{"id": "d826774a-1eeb-4e35-8253-0b00a514c0d1", "ts": "02-04-2050"}

datacraft --inline "{ id:uuid, ts:date }" -i 2 --log-level off --format json-pretty -x

[
    {
        "id": "4a75d0fc-46b7-4c9b-82f1-c87dcee13674",
        "ts": "09-04-2050"
    },
    {
        "id": "62db293b-d8f8-4c9a-8653-6dba8713bab9",
        "ts": "13-04-2050"
    }
]

datacraft --inline "{ id:uuid, ts:date }" -i 2 --log-level off --format csv -x

f8b87f46-ebda-4364-a042-21e6ac117762,09-12-2050
3b0c236c-3882-4242-9f3b-053ab3da4be8,12-12-2050

datacraft --inline "{ id:uuid, ts:date.iso.us }" -i 2 --log-level off --format csvh -x

id,ts
1d79ebca-9cc4-4de2-8af3-0cfc1bbd7c55,2022-07-23T19:12:41.683306
a41e1f3a-3954-406b-b022-fc54f43f6aab,2022-07-25T10:23:19.766581
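To make the difference between these output shapes concrete, here is a minimal Python sketch that renders the same records in each style using only the standard library. The function names (to_json_lines, to_json_pretty, to_csv) are assumptions for illustration and do not correspond to datacraft's formatter implementations.

```python
import csv
import io
import json

# Two records copied from the json example above.
records = [
    {"id": "732376df-9adc-413e-8493-73555fae51f9", "ts": "21-04-2050"},
    {"id": "d826774a-1eeb-4e35-8253-0b00a514c0d1", "ts": "02-04-2050"},
]


def to_json_lines(rows):
    """One flat JSON object per line, like the json format with one record each."""
    return "\n".join(json.dumps(row) for row in rows)


def to_json_pretty(rows, indent=4):
    """An indented JSON list spanning multiple lines, like json-pretty."""
    return json.dumps(rows, indent=indent)


def to_csv(rows, header=False):
    """Comma-separated values; header=True adds a first line of field names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if header and rows:
        writer.writerow(rows[0].keys())
    for row in rows:
        writer.writerow(row.values())
    return buffer.getvalue().rstrip("\n")


print(to_json_lines(records))
print(to_json_pretty(records))
print(to_csv(records, header=True))
```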

Records Per File

When writing results to a file, the default behavior is to write all records to a single file. You can modify this by specifying the -r or --records-per-file command line argument. The behavior is different when hosting the generated data with the --server option. In this case the default is to return a single record at a time. Use the same --records-per-file command line argument to return more than one record per request.

Examples:

datacraft --inline "{timestamp:date: {}}" -i 4 -r 2 --log-level off --format json -x
[{"timestamp": "25-04-2050"}, {"timestamp": "06-04-2050"}]
[{"timestamp": "09-04-2050"}, {"timestamp": "09-04-2050"}]

datacraft --inline "{timestamp:date: {}}" -i 4 -r 4 --log-level off --format json -x
[{"timestamp": "22-04-2050"}, {"timestamp": "03-04-2050"}, {"timestamp": "10-04-2050"}, {"timestamp": "06-04-2050"}]
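The grouping controlled by --records-per-file can be thought of as splitting the stream of generated records into consecutive batches. The following Python sketch shows that idea; batch_records is a hypothetical helper written for this illustration, not a datacraft function.

```python
from itertools import islice


def batch_records(records, records_per_file):
    """Yield lists of at most records_per_file items, preserving order."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, records_per_file))
        if not batch:
            return
        yield batch


days = ("25-04-2050", "06-04-2050", "09-04-2050", "09-04-2050")
timestamps = [{"timestamp": day} for day in days]

# Four records in groups of two produce two batches.
for batch in batch_records(timestamps, 2):
    print(batch)
```

If the number of records is not a multiple of the batch size, the last batch in this sketch is simply shorter.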

Templated Data

If you have a fragment of data that is templated, you can use Datacraft to inject data from the DataSpec into the templated file. Datacraft uses the Jinja2 templating engine format for specifying the variable names to substitute. For example:

<order>
    <product>
        <id>{{product_id}}</id>
        <name>Widget 22</name>
        <quantity>{{quantity}}</quantity>
    </product>
    <total>{{total_price}}</total>
</order>

You could then write a spec like:

{
  "product_id": {
    "type": "cc-digits", "config": { "count": 13 }
  },
  "quantity": {
    "type": "rand_int_range", "data": [1, 10]
  },
  "total_price": {
    "type": "rand_range", "data": [1.75, 1111.75], "config": { "cast": "round2" }
  }
}

In order to apply the data to the template, use the -t flag:

$ datacraft -s spec.json -t template.xml -i 1 -l off
<order>
    <product>
        <id>1837671592021</id>
        <name>Widget 22</name>
        <quantity>6</quantity>
    </product>
    <total>592.08</total>
</order>

The -t flag will also work for an inline template as a string e.g.:

$ datacraft -s sample.json -i 3 -t 'Ingredients: {{ ingredients | safe }}' -l off
Ingredients: "garlic", "onions"
Ingredients: "mushrooms", "potatoes", "garlic", "bell peppers"
Ingredients: "potatoes", "mushrooms"
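To illustrate what applying a record to a template means, here is a tiny Python sketch that fills {{ name }} placeholders from one generated record. It is a regex-based stand-in, not Jinja2: it handles plain variable names only and ignores filters such as | safe, loops, and conditionals. The render helper and the PLACEHOLDER pattern are assumptions made for this example.

```python
import re

# Matches {{name}} or {{ name }}; a simplified stand-in for Jinja2 variables.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, record):
    """Replace each {{ name }} with str(record[name]); unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda match: str(record[match.group(1)]), template)


template = (
    "<id>{{product_id}}</id>"
    "<quantity>{{ quantity }}</quantity>"
    "<total>{{total_price}}</total>"
)
# One record with the values from the rendered order shown earlier.
record = {"product_id": "1837671592021", "quantity": 6, "total_price": 592.08}

print(render(template, record))
```

Each iteration produces one record, and each record is rendered through the template once.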

Apply Raw

The --apply-raw command line flag will treat the argument of the -s flag as the raw-data that should be applied to the template. This can be helpful when working on adjusting the template that is being generated. You can dump the generated data from N iterations using the --format json or --format json-pretty then use this as raw input to the template file.

Debugging Specifications

There are several shorthand formats for creating specifications. These ultimately get turned into a full spec format. It may be useful to see what the full spec looks like after all the transformations have taken place. Use the --debug-spec flag to dump the internal form of the specification for inspection. Use the --debug-spec-yaml flag to dump the spec as YAML.

datacraft --inline "geo:geo.pair?start_lat=-99.0: {}" --log-level off --debug-spec

{
   "geo": {
       "config": {
           "start_lat": "-99.0"
       },
       "type": "geo.pair"
   }
}

datacraft --inline "geo:geo.pair?start_lat=-99.0: {}" --log-level off --debug-spec-yaml

geo:
  type: geo.pair
  config:
    start_lat: '-99.0'
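The expansion from shorthand key to full spec can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions, not datacraft's parser: expand_shorthand is a hypothetical function, it assumes multiple parameters are separated by &, and it covers only the name:type?param=value form. It does show why start_lat ends up as the string '-99.0' in the dumps above: values taken from the URL-style parameters are never converted to numbers.

```python
def expand_shorthand(key, value):
    """Expand 'name:type?param=value&...' plus its value into a full field spec."""
    name, _, rest = key.partition(":")
    type_name, _, query = rest.partition("?")
    spec = {"type": type_name}
    if isinstance(value, dict):
        # A mapping such as {} or {"config": {...}} is merged into the spec.
        spec.update(value)
    else:
        # Anything else (for example a list) is treated as the data element.
        spec["data"] = value
    if query:
        config = dict(spec.get("config", {}))
        for pair in query.split("&"):
            param, _, raw = pair.partition("=")
            config[param] = raw  # URL-style parameters stay strings
        spec["config"] = config
    return {name: spec}


print(expand_shorthand("geo:geo.pair?start_lat=-99.0", {}))
print(expand_shorthand("age2:rand_range?cast=i", [1, 100]))
```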

Schema Level Validation

Most of the default supported field spec types have JSON based schemas defined for them. Schema based validation is turned off by default. Use the --strict command line flag to turn on the strict schema based checks for types that have schemas defined. Examples:

datacraft --inline "geo:geo.pair?start_lat=-99.0: {}" --log-level info -i 2 --format json --strict

INFO [13-Nov-2050 02:59:25 PM] Starting Loading Configurations...
INFO [13-Nov-2050 02:59:25 PM] Starting Processing...
WARNING [13-Nov-2050 02:59:25 PM] '-99.0' is not of type 'number'
ERROR [13-Nov-2050 02:59:25 PM] Failed to validate spec type: geo.pair with spec: {'config': {'start_lat': '-99.0'}, 'type': 'geo.pair'}

In the instance above the start latitude is interpreted as a string. If we reformat the inline spec:

datacraft --inline "{geo:geo.pair: {config: {start_lat: -99.0}}}" --log-level info -i 2 --format json --strict

INFO [13-Nov-2050 03:00:57 PM] Starting Loading Configurations...
INFO [13-Nov-2050 03:00:57 PM] Starting Processing...
WARNING [13-Nov-2050 03:00:57 PM] -99.0 is less than the minimum of -90
ERROR [13-Nov-2050 03:00:57 PM] Failed to validate spec type: geo.pair with spec: {'config': {'start_lat': -99.0}, 'type': 'geo.pair'}

This time validation fails for the expected reason that the start_lat is out of the valid range.

datacraft --inline "demo:unicode_range: {}" -i 3 --strict

INFO [13-Nov-2050 03:07:36 PM] Starting Loading Configurations...
INFO [13-Nov-2050 03:07:36 PM] Starting Processing...
WARNING [13-Nov-2050 03:07:36 PM] 'data' is a required property

Here we are told that we are missing a required property for the unicode_range spec. You can always use the --type-help flag to get a usable example for any type:

$ datacraft --type-help unicode_range
-------------------------------------
unicode_range | Example Spec:
{
  "text": {
    "type": "unicode_range",
    "data": ["3040", "309f"],
    "config": {
      "mean": 5
    }
  }
}

$ datacraft -s spec.json -i 3 --format json -x -l off
[{"text": "ぢたゝわすづそぜるく"}, {"text": "も"}, {"text": "゚ぷつ゛ざくしが゘び"}]
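The two start_lat failures shown earlier in this section come from different kinds of checks: a type check and a range check. The Python sketch below hand-rolls both to show the distinction. It is not the schema validator datacraft uses; check_latitude is a hypothetical helper, and the maximum of 90 is an assumption, since only the minimum of -90 appears in the log output above.

```python
def check_latitude(value, minimum=-90, maximum=90):
    """Return a list of problems; the list is empty when the value passes."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return [f"{value!r} is not of type 'number'"]
    problems = []
    if value < minimum:
        problems.append(f"{value} is less than the minimum of {minimum}")
    if value > maximum:  # assumed upper bound, not shown in the logs
        problems.append(f"{value} is greater than the maximum of {maximum}")
    return problems


print(check_latitude("-99.0"))  # string from the URL-style shorthand
print(check_latitude(-99.0))    # real number, but out of range
print(check_latitude(45.5))     # passes both checks
```

The type check runs first, which is why the string form of the value never reaches the range check.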

Default Values

There are some default values used when a given spec does not provide them. These defaults can be viewed using the --debug-defaults flag.

datacraft --debug-defaults -l off

{
    "sample_mode": false,
    "combine_join_with": "",
    "char_class_join_with": "",
    "geo_as_list": false,
    ...
    "json_indent": 4,
    "large_csv_size_mb": 250,
    "data_dir": "./data",
    "csv_file": "data.csv",
    "mac_addr_separator": ":"
}

The general convention is to use the type as a prefix for the key that it affects. For example, the default join_with config param is an empty string for the combine and char_class types, as the output above shows, while for the geo type it is a comma. You can save this information to disk by specifying the -o or --outdir flag.

Override Defaults

To override the default values, pass a file of overrides with the --defaults /path/to/custom_defaults.json flag, or specify individual overrides with --set-defaults key=value.

datacraft --debug-defaults -l off --defaults /path/to/custom_defaults.json

{
    "sample_mode": "true",
    "combine_join_with": "",
    "char_class_join_with": "",
    ...
    "large_csv_size_mb": 250,
    "data_dir": "./data",
    "csv_file": "data.csv",
    "mac_addr_separator": ":"
}

datacraft --debug-defaults -l off --set-defaults date_format="%Y_%m_%d" sample_mode="true"

{
    "sample_mode": "true",
    "combine_join_with": "",
    "char_class_join_with": "",
    "geo_as_list": false,
    ...
    "date_format": "%Y_%m_%d",
    "geo_precision": 4,
    "csv_file": "data.csv",
    "mac_addr_separator": ":"
}
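The effect of the two override mechanisms can be sketched as a layered dictionary merge. The Python below is illustrative only: merge_defaults is a hypothetical helper, BASE_DEFAULTS holds just a few of the keys listed above, and the precedence (file overrides first, then key=value overrides) is an assumption rather than documented behavior. It also shows why sample_mode appears as the string "true" in the output: values passed on the command line arrive as strings.

```python
import json

# A small subset of the defaults shown above.
BASE_DEFAULTS = {
    "sample_mode": False,
    "combine_join_with": "",
    "json_indent": 4,
    "csv_file": "data.csv",
}


def merge_defaults(base, file_overrides=None, set_overrides=()):
    """Layer file overrides, then key=value overrides, on top of the base defaults."""
    merged = dict(base)  # copy so the base defaults are left untouched
    merged.update(file_overrides or {})
    for item in set_overrides:
        key, _, value = item.partition("=")
        merged[key] = value  # command-line values stay strings
    return merged


merged = merge_defaults(
    BASE_DEFAULTS,
    set_overrides=["date_format=%Y_%m_%d", "sample_mode=true"],
)
print(json.dumps(merged, indent=4))
```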