"Phoenix Project" Neural Network Data

The data used in the first series of experiments with neural networks is drawn from work done at Johnson Space Center in 1974 on Spacelab photographs of the Phoenix metropolitan area. The photographs were re-photographed through a circular aperture corresponding to a 2-km-diameter circle on the ground, spaced so that the originally photographed area was covered by 433 of these circular "events". Each event photograph was then processed through equipment that produced and photographed a Fraunhofer Diffraction Pattern (FDP). Two distinct sampling geometries were used to record the light intensity patterns of these FDPs: a radial wedge sampling and an annular ring sampling. The resulting 195 data points for each event were the basis of further analysis.

During the 1974 project, 57 "features" were empirically extracted from the FDP data points. A variety of methods were used to identify classification groupings for the original events. These groupings were narrowed down from 96 to 5 land use classes, and Fisher discriminant tests were developed and refined to yield over 90% classification accuracy using as few as 9 of the features as input.

For the neural network experiments, 292 of the total events were selected to provide coverage of the 5 land use classes. The classes, and the number of events representing each, are:

Class             # Events
_____________     ________
R(esidential)           31
F(arm)                  75
M(ountain)              29
W(ater)                 31
U(rban)                126
                  ________
                       292

The data format for all files on these two volumes is that of NeuralWorks input (.nna) files. The first part of each record contains all the floating-point numbers that make up the event data points, plus an event ID number. The second part contains the desired network output values, in this case a 5-element vector of binary digits, followed by the letter code for the land use class (see the table above) and an event number matching the one in the middle of the record.
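To make the target encoding concrete, here is a minimal Python sketch that turns a land use class letter into a 5-element binary target vector and maps a network's 5 outputs back to a letter. The description does not say which vector position belongs to which class, so the R, F, M, W, U order of the table above is an assumption, as are the names encode_class and decode_output; check the actual files before relying on it.

```python
# Assumed (not stated in the data description): position i of the 5-element
# output vector corresponds to the i-th class in the table order R, F, M, W, U.
CLASS_ORDER = ("R", "F", "M", "W", "U")


def encode_class(letter: str) -> list[int]:
    """Return the 5-element binary target vector for a land use class letter."""
    letter = letter.strip().upper()
    if letter not in CLASS_ORDER:
        raise ValueError(f"unknown land use class: {letter!r}")
    return [1 if c == letter else 0 for c in CLASS_ORDER]


def decode_output(outputs: list[float]) -> str:
    """Pick the class whose output unit has the largest activation."""
    if len(outputs) != len(CLASS_ORDER):
        raise ValueError(f"expected {len(CLASS_ORDER)} outputs, got {len(outputs)}")
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return CLASS_ORDER[best]


print(encode_class("W"))                           # [0, 0, 0, 1, 0]
print(decode_output([0.1, 0.7, 0.05, 0.1, 0.2]))   # F
```

Choosing the largest output, rather than thresholding each unit at 0.5, always yields exactly one class per event, which lines up with the single class letter carried at the end of each record.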
The order of the last two fields in each record, the land use class letter and the event number, may vary from file to file: event number followed by land use class, or vice versa.

Because the records are longer than one text line, each one spans multiple "continuation" lines. NeuralWorks prefixes every text line after the first within a single record with an "&" character. All records are plain ASCII text.

NOTE ON THE FORMAT STRUCTURE FOR THE RECORDS: These data sets have been converted from an earlier version of NeuralWorks that separated the input and output data into distinct record types. Each record type could carry comments after the data portion; the event ID and the land use class are such comments. The conversion routine neither removes these comments nor inserts the current comment delimiter, "!". As a result, each record in these data files contains a comment (the event ID) in the middle, separating the input portion from the output portion. When developing and specifying your network, you must take this into account in the I/O Parameters section. For the 55-value feature data, be sure to specify that the input begins in column 2 and the output in column 57 (54 features plus an event ID put the output data in column 57).

This diskette contains both the raw data for the 292 sample events, that is, the 195 light intensity data points from the FDPs, and the feature data (features 1-54 only) for the same events. The feature data sets contain the original values, without normalization, and are split into two data sets: the 148-event subset used for network training, and the 144-event subset used for recall and generalization testing.

Feature Files     Content
_____________     ___________________
PhTrain55.nna     148 training events
PhTest55.nna      144 recall events

These files have the following frequency distributions over the 5 land use classes:

                  PhTrain   PhTest
                  _______   ______
R(esidential)          14       17
F(arm)                 40       35
M(ountain)             13       16
W(ater)                16       15
U(rban)                65       61
                  _______   ______
                      148      144

Also on the diskette are the raw FDP data point data sets.
These contain, for each event, the 100 radial wedge geometry points and the 95 annular ring geometry points, each normalized to a scale of 0-1000. Again, the files use the format described above. The files are:

File Name         Contents
______________    ___________________
PhTrain195.NNA    100 training events
PhTest195.NNA     77 recall events
                  115 events not used in this series

Their land use class frequency distributions are:

                  PhTrain   PhTest   NotUsed   Total
                  _______   ______   _______   _____
R(esidential)          18       13         0      31
F(arm)                 21       22        32      75
M(ountain)             20        9         0      29
W(ater)                20       11         0      31
U(rban)                21       22        83     126
                  _______   ______   _______   _____
                      100       77       115     292
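Because of the stray event ID comment and the "&" continuation lines, reading the feature files (PhTrain55.nna, PhTest55.nna) outside NeuralWorks takes some care. The following Python sketch joins each record's continuation lines and splits the result using the column positions given for the 55-value feature data (input from column 2, output from column 57). It assumes that a "column" means a whitespace-separated field and simply skips whatever occupies column 1; the names Record, join_continuations and parse_record are hypothetical, not part of NeuralWorks.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# 1-based "column" positions stated for the 55-value feature files.
# Reading a "column" as a whitespace-separated field is an assumption.
INPUT_START = 2     # first of the 54 feature values
N_FEATURES = 54
OUTPUT_START = 57   # first of the 5 desired-output values
N_OUTPUTS = 5
CLASS_LETTERS = frozenset("RFMWU")


@dataclass
class Record:
    features: list[float]
    event_id: str
    targets: list[float]
    land_use: str
    event_number: str


def join_continuations(lines: Iterable[str]) -> Iterator[str]:
    """Merge each record's "&"-prefixed continuation lines into one string."""
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines between records
        if line.startswith("&"):
            if current is None:
                raise ValueError("continuation line before any record")
            current += " " + line[1:]  # drop the "&" marker itself
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current


def parse_record(text: str) -> Record:
    """Split one joined record into inputs, mid-record event ID, and outputs."""
    fields = text.split()
    out = OUTPUT_START - 1                      # 0-based index of first output
    if len(fields) < out + N_OUTPUTS + 2:
        raise ValueError(f"record too short: {len(fields)} fields")
    start = INPUT_START - 1                     # 0-based index of first feature
    features = [float(f) for f in fields[start:start + N_FEATURES]]
    event_id = fields[start + N_FEATURES]       # the leftover mid-record comment
    targets = [float(f) for f in fields[out:out + N_OUTPUTS]]
    a, b = fields[out + N_OUTPUTS:out + N_OUTPUTS + 2]
    # The class letter and event number may appear in either order.
    if a.upper() in CLASS_LETTERS:
        letter, number = a, b
    elif b.upper() in CLASS_LETTERS:
        letter, number = b, a
    else:
        raise ValueError(f"no land use class letter in {a!r}, {b!r}")
    return Record(features, event_id, targets, letter.upper(), number)
```

For a file opened as fh, [parse_record(r) for r in join_continuations(fh)] would give one Record per event. The two trailing fields are told apart by checking which one is a class letter, so files with either ordering are handled by the same code.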
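For the raw FDP files, the 195 input values of a record can be separated into the two sampling geometries and brought from the 0-1000 scale down to 0-1, a common input range for network training. This minimal Python sketch assumes the 100 radial wedge points precede the 95 annular ring points within each record, which the description suggests by listing them in that order but does not state; split_fdp and to_unit_range are hypothetical names.

```python
N_WEDGE = 100        # radial wedge sampling points per event
N_RING = 95          # annular ring sampling points per event
SCALE_MAX = 1000.0   # both geometries are normalized to 0-1000


def split_fdp(points: list[float]) -> tuple[list[float], list[float]]:
    """Split a raw 195-point FDP vector into its wedge and ring parts.

    Assumes (not stated in the data description) that the wedge points
    come before the ring points within each record.
    """
    if len(points) != N_WEDGE + N_RING:
        raise ValueError(f"expected {N_WEDGE + N_RING} points, got {len(points)}")
    return points[:N_WEDGE], points[N_WEDGE:]


def to_unit_range(points: list[float]) -> list[float]:
    """Rescale values from the files' 0-1000 scale to 0-1."""
    for p in points:
        if not 0.0 <= p <= SCALE_MAX:
            raise ValueError(f"value {p} lies outside the 0-1000 scale")
    return [p / SCALE_MAX for p in points]
```

Dividing by the fixed scale maximum, rather than by each file's observed maximum, keeps training and recall events on exactly the same scale.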