Feature Transformation #24
Good question.
From what you have said, my understanding is that you have 34,992 time series in your training set, each of length 25 (and, likewise, 17,472 time series in your test set, each of length 25). If so, as you say, you should expect an output shape of [34,992, 9,996] (and [17,472, 9,996]). This suggests that the dataset is univariate, as otherwise the input shape would presumably be [34,992, c, 25] (for c channels), etc. If this is correct, you should be using the univariate version of MiniRocket. However, you also say that your data is a multivariate time series from multiple sensors.
If this is the case, I would interpret your input dimensions as representing a single time series of length 34,992 with 25 channels (in which case your input should be shaped [1, 25, 34,992], etc.). Basically, we need to clarify the exact format and shape of your data. Does this help at all?
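To make the two readings above concrete, here is a minimal NumPy sketch (not from the original thread) that rearranges a 2D array of the shape in question into the 3D layout [n_instances, n_channels, n_timepoints], which is the NumPy layout sktime's panel transformers such as MiniRocket accept. The random values are only a stand-in for the real SCADA data.

```python
import numpy as np

# Stand-in for the user's data: 34,992 rows x 25 columns (random values).
n_rows, n_cols = 34_992, 25
data_2d = np.random.default_rng(0).standard_normal((n_rows, n_cols))

# Reading 1: each row is its own univariate series of length 25.
# Layout [n_instances, n_channels, n_timepoints] -> (34992, 1, 25).
as_univariate_panel = data_2d[:, np.newaxis, :]

# Reading 2: the whole table is ONE multivariate series: 25 channels
# (one per sensor column) observed over 34,992 time steps -> (1, 25, 34992).
as_single_multivariate = data_2d.T[np.newaxis, :, :]

print(as_univariate_panel.shape)     # (34992, 1, 25)
print(as_single_multivariate.shape)  # (1, 25, 34992)
```

Under reading 1, MiniRocket would be expected to return one row of 9,996 features per instance, i.e. (34992, 9996); under reading 2 there is only one instance, so a single row of shape (1, 9996) would be expected.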
Thank you for your answer. However, I don't understand the input shape part. My time series data has 34,992 rows (samples) and 25 columns (features). I am also trying to run multivariate MiniRocket on the benchmark dataset PenDigits, but I am still getting a lot of errors. I am sharing my code with you; maybe you could help me out?
This implementation is similar to the one provided in the documentation https://github.com/sktime/sktime/blob/main/examples/minirocket.ipynb. However, I am still getting errors.
Also, I would greatly appreciate it if you could provide a detailed methodology or documentation for MiniRocket on multivariate time series. Link to the PenDigits dataset: http://www.timeseriesclassification.com/description.php?Dataset=PenDigits
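Maybe load the data from the .ts file into a pandas DataFrame:
```python
from sktime.datasets import load_from_tsfile_to_dataframe

# file_path: path to the .ts file (defined elsewhere)
X_train, y_train = load_from_tsfile_to_dataframe(file_path)
```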
Could you share some example data similar to your SCADA data (especially in the same format)?
By the way, is there any way to use data with categorical values, for example green, red, black, brown?
Yes, you can use one of the encoders from scikit-learn (for example, OneHotEncoder or OrdinalEncoder) to transform the categorical values into numbers.
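As a rough illustration of the encoding step, the sketch below one-hot encodes a categorical channel in plain Python, turning one channel of colour labels into one numeric 0/1 channel per category (similar in spirit to scikit-learn's OneHotEncoder). The function name one_hot_channels is hypothetical. One-hot encoding is shown rather than a single integer code on the assumption that MiniRocket's convolutions treat values as magnitudes, so integer codes would impose an arbitrary ordering such as black < brown < green; this is a design choice, not something stated in the thread.

```python
def one_hot_channels(values, categories=None):
    """Turn one categorical channel into one 0/1 channel per category.

    `values` is a sequence of labels over time, e.g. ["red", "green", ...].
    Returns (categories, channels) where channels[k][t] is 1.0 if
    values[t] == categories[k], else 0.0.
    """
    if categories is None:
        # Sorted so the channel order is deterministic.
        categories = sorted(set(values))
    channels = [[1.0 if v == c else 0.0 for v in values] for c in categories]
    return categories, channels


colours = ["green", "red", "red", "black", "brown", "green"]
cats, chans = one_hot_channels(colours)
print(cats)      # ['black', 'brown', 'green', 'red']
print(chans[2])  # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]  (the 'green' channel)
```

Each resulting 0/1 list can then be stacked alongside the numeric sensor channels of the same series.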
Thank you! But did you pad it manually? |
Also, what is TPU?
@angus924, could you please look into this? Also, I tried running MiniRocket on my SCADA data and it gives me poor accuracy on the test data (around 55%). I am planning to change the classifier to a non-linear one, maybe an LSTM. Do you think that would be the right approach, i.e., applying the MiniRocket feature transformation and then running an LSTM on the transformed features? I also need to understand the feature transformation for multivariate time series. I am running the BasicMotions dataset, and this is what I get:
The BasicMotions dataset has 40 rows (samples) and 6 columns (features), and it is transformed into shape (40, 9996). The kernels are applied to each feature, right?
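To help with the question of how the kernels relate to the channels, here is a simplified plain-Python sketch of MiniRocket's building blocks, written from the MiniRocket paper rather than copied from sktime: 84 fixed length-9 kernels (weight 2 at three positions, -1 at the other six), a dilated convolution, and PPV pooling (the proportion of convolution outputs greater than a bias). The sketch assumes, following the reference multivariate implementation, that a kernel's convolution outputs over a random subset of channels are summed before pooling, so the number of output features does not grow with the number of channels. That is consistent with 6-channel BasicMotions giving 9,996 features, assuming sktime's default of 10,000 kernels rounded down to a multiple of 84. Padding, the choice of dilations, and the quantile-based biases are omitted (the bias is simply passed in), and all function names are hypothetical.

```python
from itertools import combinations

KERNEL_LENGTH = 9


def minirocket_kernels():
    # 9 choose 3 = 84 kernels: weight 2 at three positions, -1 elsewhere,
    # so every kernel's weights sum to zero.
    return [
        [2 if i in twos else -1 for i in range(KERNEL_LENGTH)]
        for twos in combinations(range(KERNEL_LENGTH), 3)
    ]


def dilated_conv(x, kernel, dilation):
    # "Valid" sliding dot product (no padding) with gaps of `dilation`.
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * x[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(x) - span)
    ]


def ppv(values, bias):
    # Proportion of positive values: fraction of outputs exceeding the bias.
    return sum(v > bias for v in values) / len(values)


def multichannel_feature(series, channels, kernel, dilation, bias):
    # Convolve each selected channel, sum the outputs time step by time
    # step, then pool the summed output into ONE feature.
    outputs = [dilated_conv(series[c], kernel, dilation) for c in channels]
    summed = [sum(step) for step in zip(*outputs)]
    return ppv(summed, bias)


kernels = minirocket_kernels()
print(len(kernels))         # 84
print((10_000 // 84) * 84)  # 9996

# Toy 2-channel series of length 20: a rising ramp and a flat line.
series = [list(range(20)), [0] * 20]
k = kernels[-1]  # weight 2 at positions 6, 7, 8
print(multichannel_feature(series, [0, 1], k, dilation=1, bias=0))  # 1.0
```

Under these assumptions, each column of the (40, 9996) output corresponds to one (kernel, dilation, bias) combination pooled over a subset of channels, rather than to one kernel applied to one column.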
Hello,
I am trying to run MiniRocket on my dataset, which is a SCADA dataset containing data from multiple sensors recorded over a period of time. It's a multivariate time series, so I am using the multivariate version of MiniRocket from sktime. However, the features are not being transformed the way I expect.
First, I ran the following code on my own SCADA dataset:
```python
from sktime.transformations.panel.rocket import MiniRocketMultivariate

minirocket_multi = MiniRocketMultivariate()
X_train_transform = minirocket_multi.fit_transform(X_train)  # fit on the training data and transform it
X_test_transform = minirocket_multi.transform(X_test)        # reuse the fitted transform on the test data
```
This is the output I get:
```
----------------------Before Transformation------------------------------
X_train: (34992, 25)
X_test: (17472, 25)
----------------------After Transformation------------------------------
X_train: (1, 9996)
X_test: (1, 9996)
```
However, I expected the shapes of X_train and X_test after transformation to be (34992, 9996) and (17472, 9996). Could you please help me with this? Why is it transforming only a single sample and not the rest?
I should also mention that I loaded the data from a pickle file containing a pandas DataFrame:
```python
import pickle

# train_file: path to the pickled training DataFrame (defined elsewhere)
with open(train_file, "rb") as f:
    data_train = pickle.load(f)

X_train_wt = data_train.iloc[:, :-1]  # all columns except the last: sensor features
y_train_wt = data_train.iloc[:, -1]   # last column: labels
```
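If the SCADA table really is one long multivariate recording (the second reading in the reply above), one common way to obtain many instances for classification is to cut it into fixed-length windows. The sketch below is an assumption about how the data could be prepared, not something proposed in the thread: it applies NumPy's sliding_window_view to random stand-ins for X_train_wt.to_numpy() and y_train_wt.to_numpy() and produces a 3D array in the [n_instances, n_channels, n_timepoints] layout. The window length, the step, and the labelling rule (each window takes the label of its last row) are hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Random stand-ins for X_train_wt.to_numpy() and y_train_wt.to_numpy():
# 34,992 time steps x 25 sensor columns, one label per row.
rng = np.random.default_rng(0)
features = rng.standard_normal((34_992, 25))
labels = rng.integers(0, 2, size=34_992)

window = 50  # hypothetical window length (time steps per instance)
step = 10    # hypothetical stride between window starts

# Windowing along axis 0 appends the window axis last, giving
# (n_windows, 25, window) = [n_instances, n_channels, n_timepoints].
windows = sliding_window_view(features, window, axis=0)[::step]
# The view is read-only; copy it into a regular contiguous array.
windows = np.ascontiguousarray(windows)

# Hypothetical labelling rule: each window takes the label of its last row.
window_labels = labels[window - 1 :: step]

print(windows.shape)        # (3495, 25, 50)
print(window_labels.shape)  # (3495,)
```

With these hypothetical settings, passing windows to fit_transform instead of the 2D table would be expected to give an output of shape (3495, 9996), with window_labels as the matching targets.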