Dataformat

Here are brief descriptions of the main classes defined in the dataformat library.

DataObject

A class to read, save, and operate on rather general "row and column based" data. The data is expected to be stored with one entry per line and a fixed number of (typically tab-separated) columns. Each column has a specific type, which may be continuous values, discrete values, or symbols (i.e. strings). Internally, all discrete and symbolic data is stored as integers and continuous data as floats. In more detail, each data entry (row) is stored as a vector of the union intfloat, which can hold either an integer or a float. Unknown values may occur anywhere in the data; each one is stored internally in the intfloat as the integer -1. A Format object keeps track of the format of each column and how to interpret the internal values. The format can either be read from file or constructed internally.

Some selected methods:

DataObject(char* formatfile, char* datafile);
DataObject(Format* format, char* datafile);
Constructors. Read in data from file, using either a format file or an internally constructed format description. If the data file name is omitted, an empty DataObject is constructed, which can then be filled with entries.

int size();
The number of entries in the DataObject.

int length();
The number of values (columns) of each entry (row) in the DataObject.

union intfloat* newentry();
Creates and returns a new entry (row) in the DataObject.

void sort(int (*func)(union intfloat* v1, union intfloat* v2));
Sorts the data entries according to the ordering function.

int save(char* datafile);
Saves the DataObject to file.

Format* format();
Returns the Format associated with the DataObject.

To access the element in row i, column j of the DataObject data, use (*data)[i][j].i or (*data)[i][j].f, depending on whether it is a discrete or a continuous value.

Format and FormatSpec

The Format object keeps track of the formats of the values in the DataObject.
For each column it has a FormatSpec, which can be asked to interpret a value. Some methods are:

form->nth(int n) : Returns the FormatSpec of the n-th column.
form->nth(n)->type() : Returns an integer that identifies the column type: 2 means discrete, 3 means continuous, and 4 means symbolic. Use .f for continuous values and .i for discrete and symbolic values.
form->nth(n)->represent(intfloat v) : Returns the string representation of the value, as it is stored on file. Useful for finding the symbolic representation of an internal value.
form->nth(n)->interpret(char* str) : Returns the internal intfloat representation corresponding to the string value. Useful for finding the integer corresponding to a symbolic value.
form->nth(n)->getnum() : For discrete and symbolic attributes, returns the number of different possible values in that column.

IndexTable

A helper class that assigns integer indices to strings. It is used for symbolic attributes, but can also be used separately to map back and forth between unique integers and strings. It is implemented as a combined hash table and linked list, to get both fast lookup and sorted order. Some methods:

IndexTable() : Constructor. Creates a new empty table.
void insert(char* str) : Inserts a new string, in sort order.
void insert_last(char* str) : Inserts a new string last, i.e. without resorting and thus without changing previous indices.
int get(char* str) : Returns the index of the string.
char* nth(int n) : Returns the string with index n.

TokenLink* readtokens(FILE* f);

Another useful helper function, which reads lines from files and parses them into tokens. It is used by DataObject when reading data files, but can be used separately for reading lines of tab- or space-separated tokens from any file. A TokenLink is a singly linked list declared as:

struct TokenLink {
  char* token;
  TokenLink* next;
};

Other functions used together with readtokens are:

void freetokens(TokenLink* list) : Must be used to free the list returned by readtokens.
int length(TokenLink* list) : Returns the number of tokens in the list.
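
To make the TokenLink list handling concrete, here is a minimal, self-contained sketch of how such a list can be built, counted, and freed. It does not use the dataformat library and is not its implementation: split_line, count_tokens, and free_list are hypothetical stand-ins for readtokens, length, and freetokens. The sketch also assumes that tokens are separated by runs of spaces or tabs and that each token is heap-allocated; the library may treat empty fields and memory ownership differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same layout as the TokenLink declaration in the text.
struct TokenLink {
  char* token;
  TokenLink* next;
};

// Hypothetical stand-in for readtokens(): splits one in-memory line
// (rather than reading from a FILE*) into a singly linked list.
// Assumption: runs of spaces, tabs, and line endings act as one separator.
TokenLink* split_line(const char* line) {
  assert(line != nullptr);
  const char* seps = " \t\r\n";
  TokenLink* head = nullptr;
  TokenLink** tail = &head;  // points at the link field to fill next
  const char* p = line;
  for (;;) {
    p += std::strspn(p, seps);              // skip leading separators
    std::size_t n = std::strcspn(p, seps);  // length of the next token
    if (n == 0) break;                      // end of line reached
    TokenLink* link = new TokenLink;
    link->token = new char[n + 1];
    std::memcpy(link->token, p, n);
    link->token[n] = '\0';
    link->next = nullptr;
    *tail = link;  // append, preserving token order
    tail = &link->next;
    p += n;
  }
  return head;
}

// Hypothetical stand-in for length(): counts the links in the list.
int count_tokens(const TokenLink* list) {
  int n = 0;
  for (; list != nullptr; list = list->next) ++n;
  return n;
}

// Hypothetical stand-in for freetokens(): releases every token and link.
void free_list(TokenLink* list) {
  while (list != nullptr) {
    TokenLink* next = list->next;  // save before deleting the link
    delete[] list->token;
    delete list;
    list = next;
  }
}
```

In this sketch, a line such as "sunny\t21.5  yes" yields three links in input order, and an empty line yields a null list. Every list obtained from split_line should be passed to free_list exactly once, mirroring the rule that lists returned by readtokens must be released with freetokens.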