An encoder converts a value to a sparse distributed representation. More...

Inheritance diagram for Encoder:

Public Member Functions
def	getWidth
	Should return the output width, in bits. More...

def	encodeIntoArray
	Encodes inputData and puts the encoded value into the numpy output array, which is a 1-D array of length returned by getWidth(). More...

def	setLearning
	Set whether learning is enabled. More...

def	setFieldStats
	This method is called by the model to set the statistics like min and max for the underlying encoders if this information is available. More...

def	encode
	Convenience wrapper for encodeIntoArray. More...

def	getScalarNames
	Return the field names for each of the scalar values returned by getScalars. More...

def	getDecoderOutputFieldTypes
	Returns a sequence of field types corresponding to the elements in the decoded output field array. More...

def	setStateLock
	Setting this to true freezes the state of the encoder This is separate from the learning state which affects changing parameters. More...

def	getEncoderList

def	getScalars
	Returns a numpy array containing the sub-field scalar value(s) for each sub-field of the inputData. More...

def	getEncodedValues
	Returns the input in the same format as is returned by topDownCompute(). More...

def	getBucketIndices
	Returns an array containing the sub-field bucket indices for each sub-field of the inputData. More...

def	scalarsToStr
	Return a pretty print string representing the return values from getScalars and getScalarNames(). More...

def	getDescription
	This returns a list of tuples, each containing (name, offset). More...

def	getFieldDescription
	Return the offset and length of a given field within the encoded output. More...

def	encodedBitDescription
	Return a description of the given bit in the encoded output. More...

def	pprintHeader
	Pretty-print a header that labels the sub-fields of the encoded output. More...

def	pprint
	Pretty-print the encoded output using ascii art. More...

def	decode
	Takes an encoded output and does its best to work backwards and generate the input that would have generated it. More...

def	decodedToStr
	Return a pretty print string representing the return value from decode().

def	getBucketValues
	Returns a list of items, one for each bucket defined by this encoder. More...

def	getBucketInfo
	Returns a list of EncoderResult namedtuples describing the inputs for each sub-field that correspond to the bucket indices passed in 'buckets'. More...

def	topDownCompute
	Returns a list of EncoderResult namedtuples describing the top-down best guess inputs for each sub-field given the encoded output. More...

def	closenessScores
	Compute closeness scores between the expected scalar value(s) and actual scalar value(s). More...

def	getDisplayWidth
	Calculate width of display for bits plus blanks between fields. More...

def	formatBits
	Copy one array to another, inserting blanks between fields (for display) If leftpad is one, then there is a dummy value at element 0 of the arrays, and we should start our counting from 1 rather than 0. More...

Detailed Description

An encoder converts a value to a sparse distributed representation.

This is the base class for encoders that are compatible with the OPF. The OPF requires that values can be represented as a scalar value for use in places like the CLA Classifier. The Encoder superclass implements:

encode() - returns a numpy array encoding the input; syntactic sugar on top of encodeIntoArray. If pprint, prints the encoding to the terminal
pprintHeader() - prints a header describing the encoding to the terminal
pprint() - prints an encoding to the terminal

Methods/properties that must be implemented by subclasses:

getDecoderOutputFieldTypes() - must be implemented by leaf encoders returns [nupic.data.fieldmeta.FieldMetaType.XXXXX] (e.g., [nupic.data.fieldmetaFieldMetaType.float])
getWidth() - returns the output width, in bits
encodeIntoArray() - encodes input and puts the encoded value into the numpy output array, which is a 1-D array of length returned by getWidth()
getDescription() - returns a list of (name, offset) pairs describing the encoded output

Member Function Documentation

def closenessScores	(	self,
		expValues,
		actValues,
		fractional = `True`
	)

Compute closeness scores between the expected scalar value(s) and actual scalar value(s).

The expected scalar values are typically those obtained from the getScalars() method. The actual scalar values are typically those returned from the topDownCompute() method.

This method returns one closeness score for each value in expValues (or actValues which must be the same length). The closeness score ranges from 0 to 1.0, 1.0 being a perfect match and 0 being the worst possible match.

If this encoder is a simple, single field encoder, then it will expect just 1 item in each of the expValues and actValues arrays. Multi-encoders will expect 1 item per sub-encoder.

Each encoder type can define it's own metric for closeness. For example, a category encoder may return either 1 or 0, if the scalar matches exactly or not. A scalar encoder might return a percentage match, etc.

Parameters

expValues	Array of expected scalar values, typically obtained from getScalars()
actValues	Array of actual values, typically obtained from topDownCompute()

Returns: Array of closeness scores, one per item in expValues (or actValues).

def decode	(	self,
		encoded,
		parentFieldName = `''`
	)

Takes an encoded output and does its best to work backwards and generate the input that would have generated it.

 In cases where the encoded output contains more ON bits than an input
 would have generated, this routine will return one or more ranges of inputs
 which, if their encoded outputs were ORed together, would produce the
 target output. This behavior makes this method suitable for doing things
 like generating a description of a learned coincidence in the SP, which
 in many cases might be a union of one or more inputs.

 If instead, you want to figure the *most likely* single input scalar value
 that would have generated a specific encoded output, use the topDownCompute()
 method.

 If you want to pretty print the return value from this method, use the
 decodedToStr() method.

Parameters

encoded	The encoded output that you want decode
parentFieldName	The name of the encoder which is our parent. This name is prefixed to each of the field names within this encoder to form the keys of the dict() in the retval.

Returns

tuple(fieldsDict, fieldOrder) (see below for details)

 fieldsDict is a dict() where the keys represent field names
 (only 1 if this is a simple encoder, > 1 if this is a multi
 or date encoder) and the values are the result of decoding each
 field. If there are  no bits in encoded that would have been
 generated by a field, it won't be present in the dict. The
 key of each entry in the dict is formed by joining the passed in
 parentFieldName with the child encoder name using a '.'.

 Each 'value' in fieldsDict consists of (ranges, desc), where
 ranges is a list of one or more (minVal, maxVal) ranges of
 input that would generate bits in the encoded output and 'desc'
 is a pretty print description of the ranges. For encoders like
 the category encoder, the 'desc' will contain the category
 names that correspond to the scalar values included in the
 ranges.

 The fieldOrder is a list of the keys from fieldsDict, in the
 same order as the fields appear in the encoded output.

 TODO: when we switch to Python 2.7 or 3.x, use OrderedDict

 Example retvals for a scalar encoder:

     {'amount':  ( [[1,3], [7,10]], '1-3, 7-10' )}
     {'amount':  ( [[2.5,2.5]],     '2.5'       )}

 Example retval for a category encoder:

     {'country': ( [[1,1], [5,6]], 'US, GB, ES' )}

 Example retval for a multi encoder:

     {'amount':  ( [[2.5,2.5]],     '2.5'       ),
      'country': ( [[1,1], [5,6]],  'US, GB, ES' )}

def encode	(	self,
		inputData
	)

Convenience wrapper for encodeIntoArray.

 This may be less efficient because it allocates a new numpy array every
 call.

Parameters

inputData TODO: document

Returns: a numpy array with the encoded representation of inputData

def encodedBitDescription	(	self,
		bitOffset,
		formatted = `False`
	)

Return a description of the given bit in the encoded output.

 This will include the field name and the offset within the field.

Parameters

bitOffset	Offset of the bit to get the description of
formatted	If True, the bitOffset is w.r.t. formatted output, which includes separators

Returns: tuple(fieldName, offsetWithinField)

def encodeIntoArray	(	self,
		inputData,
		output
	)

Encodes inputData and puts the encoded value into the numpy output array, which is a 1-D array of length returned by getWidth().

 Note: The numpy output array is reused, so clear it before updating it.

Parameters

inputData	Data to encode. This should be validated by the encoder.
output	numpy 1-D array of same length returned by getWidth()

def formatBits	(	self,
		inarray,
		outarray,
		scale = `1`,
		blank = `255`,
		leftpad = `0`
	)

Copy one array to another, inserting blanks between fields (for display) If leftpad is one, then there is a dummy value at element 0 of the arrays, and we should start our counting from 1 rather than 0.

Parameters

inarray	TODO: document
outarray	TODO: document
scale	TODO: document
blank	TODO: document
leftpad	TODO: document

def getBucketIndices	(	self,
		inputData
	)

Returns an array containing the sub-field bucket indices for each sub-field of the inputData.

To get the associated field names for each of the buckets, call getScalarNames().

Parameters

inputData The data from the source. This is typically a object with members.

Returns: array of bucket indices

def getBucketInfo	(	self,
		buckets
	)

Returns a list of EncoderResult namedtuples describing the inputs for each sub-field that correspond to the bucket indices passed in 'buckets'.

 To get the associated field names for each of the values, call getScalarNames().

Parameters

buckets The list of bucket indices, one for each sub-field encoder. These bucket indices for example may have been retrieved from the getBucketIndices() call. A list of EncoderResult namedtuples. Each EncoderResult has three attributes:

value: This is the value for the sub-field in a format that is consistent with the type specified by getDecoderOutputFieldTypes(). Note that this value is not necessarily numeric.
scalar: The scalar representation of value. This number is consistent with what is returned by getScalars(). This value is always an int or float, and can be used for numeric comparisons
encoding This is the encoded bit-array (numpy array) that represents 'value'. That is, if 'value' was passed to encode(), an identical bit-array should be returned

def getBucketValues ( self )

Returns a list of items, one for each bucket defined by this encoder.

 Each item is the value assigned to that bucket, this is the same as the
 EncoderResult.value that would be returned by getBucketInfo() for that
 bucket and is in the same format as the input that would be passed to
 encode().

 This call is faster than calling getBucketInfo() on each bucket individually
 if all you need are the bucket values.

 **Must be overridden by subclasses.**

Returns: list of items, each item representing the bucket value for that bucket.

def getDecoderOutputFieldTypes ( self )

Returns a sequence of field types corresponding to the elements in the decoded output field array.

The types are defined by nupic.data.fieldmeta.FieldMetaType.

Returns: list of nupic.data.fieldmeta.FieldMetaType objects

def getDescription ( self )

This returns a list of tuples, each containing (name, offset).

 The 'name' is a string description of each sub-field, and offset is the bit
 offset of the sub-field for that encoder.

 For now, only the 'multi' and 'date' encoders have multiple (name, offset)
 pairs. All other encoders have a single pair, where the offset is 0.

 **Must be overridden by subclasses.**

Returns: list of tuples containing (name, offset)

def getDisplayWidth ( self )

Calculate width of display for bits plus blanks between fields.

Returns: width of display for bits plus blanks between fields

def getEncodedValues	(	self,
		inputData
	)

Returns the input in the same format as is returned by topDownCompute().

 For most encoder types, this is the same as the input data.
 For instance, for scalar and category types, this corresponds to the numeric
 and string values, respectively, from the inputs. For datetime encoders, this
 returns the list of scalars for each of the sub-fields (timeOfDay, dayOfWeek, etc.)

 This method is essentially the same as getScalars() except that it returns
 strings

Parameters

inputData The input data in the format it is received from the data source

Returns: A list of values, in the same format and in the same order as they are returned by topDownCompute.

def getEncoderList ( self )

Returns: a reference to each sub-encoder in this encoder. They are returned in the same order as they are for getScalarNames() and getScalars().

def getFieldDescription	(	self,
		fieldName
	)

Return the offset and length of a given field within the encoded output.

Parameters

fieldName Name of the field

Returns: tuple(offset, width) of the field within the encoded output

def getScalarNames	(	self,
		parentFieldName = `''`
	)

Return the field names for each of the scalar values returned by getScalars.

Parameters

parentFieldName The name of the encoder which is our parent. This name is prefixed to each of the field names within this encoder to form the keys of the dict() in the retval.

Returns: array of field names

def getScalars	(	self,
		inputData
	)

Returns a numpy array containing the sub-field scalar value(s) for each sub-field of the inputData.

To get the associated field names for each of the scalar values, call getScalarNames().

For a simple scalar encoder, the scalar value is simply the input unmodified. For category encoders, it is the scalar representing the category string that is passed in. For the datetime encoder, the scalar value is the the number of seconds since epoch.

The intent of the scalar representation of a sub-field is to provide a baseline for measuring error differences. You can compare the scalar value of the inputData with the scalar value returned from topDownCompute() on a top-down representation to evaluate prediction accuracy, for example.

Parameters

inputData The data from the source. This is typically a object with members

Returns: array of scalar values

def getWidth ( self )

Should return the output width, in bits.

Returns: output width in bits

def pprint	(	self,
		output,
		prefix = `""`
	)

Pretty-print the encoded output using ascii art.

Parameters

output	to print
prefix	printed before the header if specified

def pprintHeader	(	self,
		prefix = `""`
	)

Pretty-print a header that labels the sub-fields of the encoded output.

This can be used in conjuction with pprint.

Parameters

prefix printed before the header if specified

def scalarsToStr	(	self,
		scalarValues,
		scalarNames = `None`
	)

Return a pretty print string representing the return values from getScalars and getScalarNames().

Parameters

scalarValues	input values to encode to string
scalarNames	optional input of scalar names to convert. If None, gets scalar names from getScalarNames()

Returns: string representation of scalar values

def setFieldStats	(	self,
		fieldName,
		fieldStatistics
	)

This method is called by the model to set the statistics like min and max for the underlying encoders if this information is available.

Parameters

fieldName	name of the field this encoder is encoding, provided by multiencoder
fieldStatistics	dictionary of dictionaries with the first level being the fieldname and the second index the statistic ie: fieldStatistics['pounds']['min']

def setLearning	(	self,
		learningEnabled
	)

Set whether learning is enabled.

Parameters

learningEnabled whether learning should be enabled

def setStateLock	(	self,
		lock
	)

Setting this to true freezes the state of the encoder This is separate from the learning state which affects changing parameters.

Implemented in subclasses.

def topDownCompute	(	self,
		encoded
	)

Returns a list of EncoderResult namedtuples describing the top-down best guess inputs for each sub-field given the encoded output.

These are the values which are most likely to generate the given encoded output. To get the associated field names for each of the values, call getScalarNames().

Parameters

encoded The encoded output. Typically received from the topDown outputs from the spatial pooler just above us.

Returns: A list of EncoderResult namedtuples. Each EncoderResult has three attributes:

value: This is the best-guess value for the sub-field in a format that is consistent with the type specified by getDecoderOutputFieldTypes(). Note that this value is not necessarily numeric.

scalar: The scalar representation of this best-guess value. This number is consistent with what is returned by getScalars(). This value is always an int or float, and can be used for numeric comparisons.

encoding This is the encoded bit-array (numpy array) that represents the best-guess value. That is, if 'value' was passed to encode(), an identical bit-array should be returned.

The documentation for this class was generated from the following file:

nupic/encoders/base.py

Public Member Functions

Detailed Description

Member Function Documentation