- The labels and subjects for both the training and test sets were merged with the measurements from their respective sets.
- All data was merged into a single data set.
- The activity labels were transformed into a factor with levels "WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS", "SITTING", "STANDING", and "LAYING". These correspond directly to the label values 1 through 6 respectively.
- Feature columns were named according to the entries in "UCI HAR Dataset/features.txt".
- A regular expression was used to select all columns that represent a mean or standard deviation measurement.
- Finally, the data was summarized by splitting it into per-subject chunks and averaging every column within each chunk. The per-subject averages were then recombined into a single data frame and saved to disk.
- All columns are numeric.
- The data set is sorted by subject.
- There are 30 rows corresponding to 30 subjects.
- There are 80 columns. The first column identifies the subject for whom the row was computed, and columns 2-80 hold the averaged measurement data.
- All measurements are the per-subject average of either a mean or a standard deviation feature.
  Ex: tidy[1,"tBodyAcc-mean()-X"] is the average of all mean readings of the X-axis accelerometer for subject 1.
- The measurements are averages of sensor signals generated by a Samsung Galaxy S II. Each measurement is unitless, having been normalized to the range [-1, 1].
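
The processing steps and the resulting layout described above can be sketched in R on a small made-up data set. This is a minimal illustration, not the original script: the helper `make_set`, the toy feature list, the three subjects, and the pattern `"mean|std"` are all assumptions (the actual regular expression is not shown here), and the output goes to a temporary file rather than the real output path.

```r
# Toy stand-ins for the real files; sizes and values are invented for illustration.
set.seed(1)
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_levels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")

# Hypothetical helper: builds one set (training or test) with subject and
# activity labels already attached to the measurements.
make_set <- function(subjects, n_per_subject) {
  n <- length(subjects) * n_per_subject
  x <- as.data.frame(matrix(runif(n * length(features), -1, 1), nrow = n))
  names(x) <- features
  data.frame(subject = rep(subjects, each = n_per_subject),
             activity = sample(1:6, n, replace = TRUE),
             x, check.names = FALSE)
}

train <- make_set(c(1, 3), 4)
test <- make_set(2, 4)

# Combine both sets into a single data set.
all_data <- rbind(train, test)

# Map label values 1 through 6 onto the activity names.
all_data$activity <- factor(all_data$activity, levels = 1:6,
                            labels = activity_levels)

# Keep the subject plus every mean or standard deviation column
# (assumed pattern; the original expression may differ).
keep <- grepl("mean|std", names(all_data))
selected <- all_data[, c("subject", names(all_data)[keep])]

# Split into per-subject chunks, average each chunk, and recombine.
chunks <- split(selected[, -1, drop = FALSE], selected$subject)
averages <- do.call(rbind, lapply(chunks, colMeans))
tidy <- data.frame(subject = as.numeric(rownames(averages)), averages,
                   check.names = FALSE)
rownames(tidy) <- NULL

# Save to disk (temporary file used here as a placeholder path).
out_file <- tempfile(fileext = ".txt")
write.table(tidy, out_file, row.names = FALSE)
```

With the toy data, `tidy` has one row per subject, is sorted by subject, and contains only numeric columns, mirroring the layout described above. `tidy[1, "tBodyAcc-mean()-X"]` equals the mean of that column over all of subject 1's rows in `all_data`.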