Field Specs

Field Spec Structure

There are several ways to define a Field Spec. There is the full spec format, and a variety of short hand notations.

The Full Format

Each of the core built in types has a JSON schema. The full format is what is used to validate against this schema. Other shorthand formats are processed into the full format. Each Type Handler requires different pieces of information. For most types, the key fields are type, data, and config. Below is the general Field Spec structure.

{
  "type": "<the type>",
  "config": {
    "key1": "value1",
    ...
    "keyN": "valueN"
  },
  "data": ["the data"],
  "ref": "REF_POINTER_IF_USED",
  "refs": ["USES", "MORE", "THAN", "ONE"],
  "fields": { "for": {}, "nested": {}, "types": {} }
}

Values Shorthand

The values type is very common and so has a shorthand notation. Below is an example full Field Spec for some values types fields and the same spec in shorthand notation.

{
  "field1": {
    "type": "values",
    "data": [1, 2, 3, 4, 5]
  },
  "field2": {
    "type": "values",
    "data": {"A": 0.5, "B": 0.3, "C": 0.2}
  },
  "field3": {
    "type": "values",
    "data": "CONSTANT"
  }
}

Shorthand Format:

{
  "field1": [1, 2, 3, 4, 5],
  "field2": {
    "A": 0.5,
    "B": 0.3,
    "C": 0.2
  },
  "field3": "CONSTANT"
}

The value after the field name is just the value of the data element from the full Field Spec. Config params can be added to the key using the URL syntax described below.

Inline Key Type Shorthand

Some specs lend themselves to being easily specified with few parameters. One short hand way to do this is the use a colon in the key to specify the type after the field name. For example {"id:uuid":{}}. This says the field id is of type uuid and has no further configuration. If no type is specified, the field is assumed to be a values type.

Inline Key Config Shorthand

It is also possible to specify configuration parameters in the key by using URL style parameters. For example.

{
  "network:ipv4?cidr=192.168.0.0/16": {}
}

The network field is of type ipv4 and the required cidr param is specified in the key.

Spec Configuration

There are two ways to configure a spec. One is by providing a config element in the Field Spec and the other is by using a URL parameter format in the key. For example, the following two fields will produce the same values:

{
  "ONE": {
    "type": "values",
    "data": [1, 2, 3],
    "config": {
      "prefix": "TEST",
      "suffix": "@DEMO"
    }
  },
  "TWO?prefix=TEST&suffix=@DEMO": {
    "type": "values",
    "data": [1, 2, 3]
  }
}

Common Configurations

There are some configuration values that can be applied to all or a subset of types. These are listed below

key

argument

effect

prefix

string

Prepends the value to all results

suffix

string

Appends the value to all results

quote

string

Wraps the resulting value on both sides with the provided string

cast

i,int,f,float,s,str,string

For numeric types, will cast results the provided type

join_with

string

For types that produce multiple values, use this string to join them

as_list

yes,true,on

For types that produce multiple values, return as list without joining

Example:

{
  "field": {
    "type": "values",
    "data": ["world", "beautiful", "destiny"],
    "config": {
      "prefix": "hello "
    }
  }
}

Count Config Parameter

Several types support a count config parameter (cnt works too). The value of the count parameter can be any of the supported values specs formats. For example a constant 3, list [2, 3, 7], or weighted map {"1": 0.5, "2": 0.3, "3": 0.2 }. This will produce the number of values by creating a value supplier for the count based on the supplied parameter. Most of the time if the count is greater than 1, the values will be returned as an array. Some types support joining the values by specifying the join_with parameter. Some types will let you explicitly set the as_list parameter to force the results to be returned as an array and not the default for the given type.

Count Distributions

Another way to specify a count is to use a count distribution. This is done with the count_dist param. The param takes a string argument which is the distribution along with its required arguments in function call form with parameters explicitly named. See the table below.

distribution

required arguments

optional args

examples

uniform

start,end

“uniform(start=10, end=30)”

“uniform(start=1, end=3)”

guass

mean,stddev

min,max

“gauss(mean=2, stddev=1)”

guassian

“guassian(mean=7, stddev=1, min=4)”

normal

“normal(mean=25, stddev=10, max=40)”

normal, guassian, and gauss are all aliases for a Normal Distribution.

Example:

{
  "field": {
    "type": "char_class",
    "data": "visible",
    "config": {
      "count_dist": "normal(mean=5, stddev=2, min=1, max=9)"
    }
  }
}

Custom Count Distributions

Custom distributions can be supplied using custom code loading and the @datacraft.registry.distribution decorator:

from scipy.stats import gamma
import datacraft

class _GammaDist(datacraft.Distribution):
    def __init__(self, a: float):
        self.a = a

    def next_value(self):
        return gamma.rvs(self.a)

@datacraft.registry.distribution('gamma')
def _gamma_distribution(a, **kwargs) -> datacraft.Distribution:
    """ example custom distribution """
    return _GammaDist(float(a))

Casting Values

The CasterInterface exists to modify the results of generated data in small ways. An example would be the rand_range type that produces floating point numbers within a given range. If you want an integer in the range provided by the supplier, you can use the "cast": "int" config param. Below is a table of all of the built in caster types. Custom casters can be registered with the @datacraft.registry.casters decorator as well. See example below.

Built in Casters

name

description

input

output

int

casts floats or string floats to integers

44.567

44

i

alias for int

float

casts float strings or integers to floats

44

44.0

‘44.23’

44.23

‘44.23’

44.23

f

alias for float

string

casts any type to a string

123

‘123’

44.23

‘44.23’

True

‘True’

str

alias for string

s

alias for string

hex

casts integer objects to hexidecimal form

123

‘0x7b’

1023

‘0x3ff’

h

alias for hex

lower

casts to string and lower cases value

‘aBcD’

‘abcd’

123

‘123’

True

‘true’

l

alias for lower

upper

casts to string and upper cases value

‘aBcD’

‘ABCD’

123

‘123’

True

‘TRUE’

u

alias for upper

trim

removes leading and trailing whitespace

‘ val ‘

‘val’

t

alias for trim

round

round to nearest integer

44.567

45

44.123

44

round0

round to ones, type is float

44.567

45.0

round1

round to first decimal place

44.567

45.6

44.123

44.1

round2

round to second decimal place

44.567

45.57

44.123

44.12

same for round3 up to round9

zfill1

zero fill to one character

1

1

‘’

0

zfill2

zero fill to two characters

1

01

22

22

zfill3

zero fill to three characters

c

00c

# 4 characters -> no fill

44.1

44.1

same for up to zfill10

Custom Value Casters

Custom casters can be supplied using custom code loading and the @datacraft.registry.casters decorator:

from typeing import Any
import datacraft

class _ReverseCaster(datacraft.CasterInterface):
    def cast(self, value: Any) -> str:
        return str(value)[::-1]

@datacraft.registry.casters('reverse')
def _reverse_caster() -> datacraft.CasterInterface:
    """ example custom caster """
    return _ReverseCaster()