Input / Output

FileRecordStream

class nupic.data.file_record_stream.FileRecordStream(streamID, write=False, fields=None, missingValues=None, bookmark=None, includeMS=True, firstRecord=None)

CSV-file-based RecordStream implementation.

appendRecord(record, inputBookmark=None)

Saves the record in the underlying CSV file.

record: a list of Python objects that will be string-ified

Returns: nothing
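
As a rough illustration of what "string-ified" means for a record, the sketch below writes a list of Python objects as one CSV row using only the standard library. It is not the nupic implementation: the helper name append_record, the in-memory buffer, and the sample values are all assumptions made for this example.

```python
import csv
import io


def append_record(csv_writer, record):
    """Write one record as a CSV row, converting every value with str().

    Hypothetical helper for illustration; not part of nupic.
    """
    csv_writer.writerow([str(value) for value in record])


# In-memory stand-in for the underlying CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

append_record(writer, ["2010-07-02 00:00:00", 5.3, 12])
append_record(writer, ["2010-07-02 00:15:00", 5.5, 14])

csv_text = buffer.getvalue()
print(csv_text)
```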

appendRecords(records, inputRef=None, progressCB=None)

Saves multiple records in the underlying storage.

records: an array of records, each as in appendRecord()
inputRef: reference to the corresponding input (not applicable in the case of file storage)
progressCB: callback to report progress

Returns: nothing

clearStats()

Resets stats collected so far.

getBookmark()

Returns an anchor to the current position in the data. Passing this anchor to the constructor makes the current position the first record returned.
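
The bookmark round trip described above can be sketched with a small in-memory stand-in. ListStream is a hypothetical class written for this example and is not the nupic class. Its bookmark is a plain integer position purely for illustration; the format of a real bookmark is not specified here and should be treated as opaque.

```python
class ListStream:
    """Hypothetical in-memory stand-in for a record stream (not nupic)."""

    def __init__(self, rows, bookmark=None):
        self._rows = rows
        # Assumption: a bookmark is simply the index of the next record.
        self._pos = 0 if bookmark is None else bookmark

    def getNextRecord(self):
        """Return the next row, or None at end of stream."""
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def getBookmark(self):
        """Return an anchor to the current position."""
        return self._pos


rows = [["a", 1], ["b", 2], ["c", 3]]

first = ListStream(rows)
first.getNextRecord()
first.getNextRecord()
bookmark = first.getBookmark()

# A new stream built from the bookmark starts at the saved position.
resumed = ListStream(rows, bookmark=bookmark)
resumed_first = resumed.getNextRecord()
```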

getDataRowCount()

Returns: count of data rows in dataset (excluding header lines)

getError()

Returns errors saved in the stream.

getFieldNames()

Returns an array of field names associated with the data.

getFields()

Returns a sequence of nupic.data.fieldmeta.FieldMetaInfo name/type/special tuples for each field in the stream.

getLastRecords(numRecords)

Returns a tuple (successCode, recordsArray), where:

successCode: True if the stream had enough records to return, False otherwise
recordsArray: an array of the last numRecords records available when the call was made

Records appended while getLastRecords() is running will not be returned until the next call to either getNextRecord() or getLastRecords().

getNextRecord(useCache=True)

Returns next available data record from the file.

retval: a data row (a list or tuple) if available; None if there are no more records in the table (End of Stream, EOS); an empty sequence (list or tuple) when timing out while waiting for the next record.

getNextRecordIdx()

Returns the index of the record that will be read next by getNextRecord().
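
Because getNextRecord() has three distinct outcomes (a data row, None at end of stream, or an empty sequence on timeout), a read loop needs to tell them apart. The sketch below shows one way to do that. The function read_all, its max_timeouts parameter, and the scripted responses are assumptions for this example; the callable passed in stands for any function with the return convention described for getNextRecord().

```python
def read_all(get_next_record, max_timeouts=3):
    """Collect rows until end of stream, tolerating a few timeouts.

    Hypothetical helper for illustration; not part of nupic.
    """
    rows = []
    timeouts = 0
    while True:
        record = get_next_record()
        if record is None:
            # End of stream: no more records.
            break
        if len(record) == 0:
            # Empty sequence: timed out waiting for the next record.
            timeouts += 1
            if timeouts >= max_timeouts:
                break
            continue
        rows.append(record)
    return rows, timeouts


# Scripted responses standing in for a real stream.
responses = iter([[1, "a"], [], [2, "b"], None])
rows, timeouts = read_all(lambda: next(responses))
```

Testing for None before testing the length matters, since None has no length and an empty sequence is not the end of the stream.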

getRecordsRange(bookmark=None, range=None)

Returns a range of records, starting from the bookmark. If ‘bookmark’ is None, records are read starting from the first available one. If ‘range’ is None, all available records are returned (caution: this could be a lot of records and require a lot of memory).

getStats()

Parses the file using a dedicated reader and collects field stats. The file is never parsed for stats unless the user of FileRecordStream invokes getStats().

Returns: a dictionary of stats. In the current implementation, the min and max fields are supported. An example of the returned dictionary:

{
  'min' : [f1_min, f2_min, None, None, fn_min],
  'max' : [f1_max, f2_max, None, None, fn_max]
}

(where fx_min/fx_max are set for scalar fields, or None if not)
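
To make the shape of that dictionary concrete, the sketch below computes per-column minimum and maximum values over a few rows and leaves None in the positions of non-scalar columns. It is a minimal illustration, not the nupic reader: the function column_min_max, the scalar_columns argument, and the sample rows are assumptions for this example.

```python
def column_min_max(rows, scalar_columns):
    """Return {'min': [...], 'max': [...]} with None for non-scalar columns.

    Hypothetical helper for illustration; not part of nupic.
    """
    width = len(rows[0]) if rows else 0
    mins = [None] * width
    maxs = [None] * width
    for row in rows:
        for i in scalar_columns:
            value = row[i]
            mins[i] = value if mins[i] is None else min(mins[i], value)
            maxs[i] = value if maxs[i] is None else max(maxs[i], value)
    return {"min": mins, "max": maxs}


rows = [
    [5.3, "a", 12],
    [4.1, "b", 20],
    [6.0, "c", 7],
]

# Columns 0 and 2 hold scalars; column 1 holds strings and is skipped.
stats = column_min_max(rows, scalar_columns=[0, 2])
print(stats)
```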

isCompleted()

Returns True if all records are already in the stream, or False if more records are expected.

next()

Implements the iterator protocol.
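
One plausible way for an iterator to sit on top of a getNextRecord()-style call is sketched below. The wrapper class RecordIterator is hypothetical, and the choice to stop iteration when the callable returns None is an assumption about the behaviour rather than something stated here. The next alias mirrors the Python 2 spelling of the protocol alongside the Python 3 __next__ method.

```python
class RecordIterator:
    """Hypothetical iterator wrapper for illustration; not part of nupic."""

    def __init__(self, get_next_record):
        self._get_next_record = get_next_record

    def __iter__(self):
        return self

    def __next__(self):
        record = self._get_next_record()
        if record is None:
            # Assumption: None marks the end of the stream.
            raise StopIteration
        return record

    # Python 2 name for the same method.
    next = __next__


# Scripted responses standing in for a real stream.
source = iter([[1, "a"], [2, "b"], None])
collected = list(RecordIterator(lambda: next(source)))
```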

recordsExistAfter(bookmark)

Returns True iff there are records left after the bookmark.

rewind()

Puts the stream back at the beginning of the file.

seekFromEnd(numRecords)

Seeks to numRecords from the end and returns a bookmark to the new position.

setAutoRewind(autoRewind)

Controls whether getNextRecord() should automatically rewind the source when EOF is reached.

autoRewind: True = getNextRecord() will automatically rewind the source on EOF; False = getNextRecord() will not automatically rewind the source on EOF

setCompleted(completed=True)

Marks the stream as completed or not completed (True or False).

setError(error)

Saves the specified error in the stream.

setTimeout(timeout)

Sets the read timeout.