Feature Transformation #24

Open
rubbiyasultan opened this issue Jun 21, 2023 · 13 comments

Comments

@rubbiyasultan

rubbiyasultan commented Jun 21, 2023

Hello,

I am trying to run MiniRocket on my dataset, which is a SCADA dataset containing data from multiple sensors over a period of time. It is a multivariate time series, so I am using the multivariate version of MiniRocket from sktime. However, the features are not being transformed the way they are supposed to be.

Initially, I ran the following chunk of code on my personal SCADA dataset:

from sktime.transformations.panel.rocket import MiniRocketMultivariate

minirocket_multi = MiniRocketMultivariate()
X_train_transform = minirocket_multi.fit_transform(X_train)
X_test_transform = minirocket_multi.transform(X_test)

This is the output that I am getting,

----------------------Before Transformation------------------------------
X_train: (34992, 25)
X_test: (17472, 25)
----------------------After Transformation------------------------------
X_train: (1, 9996)
X_test: (1, 9996)

However, I think that after transformation the shapes of X_train and X_test should be (34992, 9996) and (17472, 9996). Could you please help me with this? Why is it transforming just a single sample and not the rest?

Also, I should mention that I loaded the data from a pickle file containing a pandas DataFrame.

import pickle

with open(train_file, "rb") as f:
    data_train = pickle.load(f)
X_train_wt = data_train.iloc[:, :-1]  # all columns except the last
y_train_wt = data_train.iloc[:, -1]  # last column

@Sandy4321

good question

@angus924
Owner

From what you have said, my understanding is that you have 34,992 time series in your training set, each of length 25 (and, likewise, 17,472 time series in your test set, each of length 25). If so, as you say, you should expect an output shape of [34,992, 9,996] (and [17,472, 9,996]). This suggests that the dataset is univariate, as otherwise the input shape would presumably be [34,992, c, 25] (e.g., for c channels), etc.

If this is correct, you should be using the univariate version of MiniRocket.

However, you also say:

Its a multivariate time series

If this is the case, I would interpret your input dimensions as representing a single time series of length 34,992 with 25 channels (in which case your input should be shaped [1, 25, 34,992], etc).

Basically, we need to clarify the exact format and shape of your data.

Does this help at all?
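
To make the two readings of the input shape concrete, here is a minimal NumPy-only sketch. It is not taken from the thread's code: the array `table` and its random contents are hypothetical stand-ins for the SCADA table, and only the shapes matter.

```python
import numpy as np

# Hypothetical stand-in for the SCADA table:
# 34,992 rows (time steps) x 25 columns (sensors).
rng = np.random.default_rng(0)
table = rng.normal(size=(34_992, 25)).astype(np.float32)

# Reading A: 34,992 separate univariate series, each of length 25.
# The table already has the layout (n_instances, n_timepoints).
X_univariate = table
print(X_univariate.shape)  # (34992, 25)

# Reading B: one multivariate series with 25 channels and 34,992 time steps.
# Move time to the last axis, then add a leading instance axis.
X_single = table.T[np.newaxis, :, :]
print(X_single.shape)  # (1, 25, 34992)
```

Under reading B there is only one instance, which would be consistent with the (1, 9996) output reported above: one row of features per instance.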

@rubbiyasultan
Author

Thank you for your answer. However, I don't understand the input shape part. My time series data has 34992 rows/samples and 25 columns/features. I am also trying to run the multivariate MiniRocket on the benchmark dataset PenDigits, but I am still getting a lot of errors. I am sharing my code with you. Maybe you could help me out?


from sktime.datasets import load_from_tsfile_to_dataframe
from sktime.transformations.panel.rocket import MiniRocketMultivariate

# Specify the path to the .ts file
file_path = "data_benchmark/PenDigits/PenDigits_TRAIN.ts"

# Load the data from the .ts file into a pandas DataFrame
X_train, y_train = load_from_tsfile_to_dataframe(file_path)

# Print the data and target shapes
print("X_train:", X_train.shape)
print("y_train:", y_train.shape)

# Specify the path to the .ts file
file_path = "data_benchmark/PenDigits/PenDigits_TEST.ts"

# Load the data from the .ts file into a pandas DataFrame
X_test, y_test = load_from_tsfile_to_dataframe(file_path)

minirocket_multi = MiniRocketMultivariate()
X_train_transform = minirocket_multi.fit_transform(X_train)
# X_test_transform = minirocket_multi.transform(X_test)

This implementation is similar to what you have provided in the documentation https://github.com/sktime/sktime/blob/main/examples/minirocket.ipynb.

However, I am still getting errors.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[331], line 3
      1 # MiniRocket transformation
      2 minirocket_multi = MiniRocketMultivariate()
----> 3 X_train_transform = minirocket_multi.fit_transform(X_train)

File ~/.local/lib/python3.10/site-packages/sktime/transformations/base.py:620, in BaseTransformer.fit_transform(self, X, y)
    555 """Fit to data, then transform it.
    556 
    557 Fits the transformer to X and y and returns a transformed version of X.
   (...)
    616         Example: i-th instance of the output is the i-th window running over `X`
    617 """
    618 # Non-optimized default implementation; override when a better
    619 # method is possible for a given algorithm.
--> 620 return self.fit(X, y).transform(X, y)

File ~/.local/lib/python3.10/site-packages/sktime/transformations/base.py:439, in BaseTransformer.fit(self, X, y)
    437 # we call the ordinary _fit if no looping/vectorization needed
    438 if not vectorization_needed:
--> 439     self._fit(X=X_inner, y=y_inner)
    440 else:
    441     # otherwise we call the vectorized version of fit
    442     self._vectorize("fit", X=X_inner, y=y_inner)

File ~/.local/lib/python3.10/site-packages/sktime/transformations/panel/rocket/_minirocket_multivariate.py:117, in MiniRocketMultivariate._fit(self, X, y)
    115 *_, n_timepoints = X.shape
    116 if n_timepoints < 9:
--> 117     raise ValueError(
    118         (
    119             f"n_timepoints must be >= 9, but found {n_timepoints};"
    120             " zero pad shorter series so that n_timepoints == 9"
    121         )
    122     )
    123 self.parameters = _fit_multi(
    124     X, self.num_kernels, self.max_dilations_per_kernel, self.random_state_
    125 )
    126 return self

ValueError: n_timepoints must be >= 9, but found 8; zero pad shorter series so that n_timepoints == 9

Also, I would highly appreciate it if you could provide detailed methodology/documentation regarding MiniRocket for multivariate time series.

Link to PenDigits dataset: http://www.timeseriesclassification.com/description.php?Dataset=PenDigits

@Sandy4321

This is what the data looks like in the original code:
[image]

@Sandy4321

Maybe

# Load the data from the .ts file into a pandas DataFrame
X_train, y_train = load_from_tsfile_to_dataframe(file_path)

provides a different format?

@Sandy4321

Could you share data similar to your SCADA data (especially in the same format)? For example:
https://data.world/datasets/scada

@Sandy4321

OK, I fixed the issue
ValueError: n_timepoints must be >= 9, but found 8; zero pad shorter series so that n_timepoints == 9
You need to pad the data so that each time series has more than 8 samples:
[image]

This is the padded data:
[image]
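
For reference, a minimal sketch of that kind of zero padding, assuming the data has already been converted to a NumPy array shaped (n_instances, n_channels, n_timepoints). This is not necessarily the three lines used above, and `X_short` is a made-up array.

```python
import numpy as np

# Hypothetical batch shaped (n_instances, n_channels, n_timepoints)
# with only 8 time points per series.
X_short = np.arange(5 * 2 * 8, dtype=np.float32).reshape(5, 2, 8)

min_length = 9  # minimum length named in the error message
n_missing = max(0, min_length - X_short.shape[-1])

# Append zeros along the time axis only; instances and channels are untouched.
X_padded = np.pad(
    X_short,
    pad_width=((0, 0), (0, 0), (0, n_missing)),
    mode="constant",
)

print(X_short.shape, "->", X_padded.shape)  # (5, 2, 8) -> (5, 2, 9)
```

The same padding has to be applied to the test set so that both sets have the same series length.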

@Sandy4321

By the way, is there any chance of using data with categorical values? For example:

green, red, black, brown

@rubbiyasultan
Author

OK, I fixed the issue ValueError: n_timepoints must be >= 9, but found 8; zero pad shorter series so that n_timepoints == 9. You need to pad the data so that each time series has more than 8 samples. [image]

This is the padded data: [image]

Thank you! But did you pad it manually?

@rubbiyasultan
Author

rubbiyasultan commented Jun 30, 2023

by the way , any chances to use data with categorical values for example

green, red, black, brown

Yes, you can use an encoder from sklearn (for example, OneHotEncoder or OrdinalEncoder) to transform the categorical values.
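
A rough sketch of what such an encoding could look like for a single series. It uses plain NumPy to stay self-contained (sklearn's OneHotEncoder does the equivalent job); the `colours` and `temperature` arrays are invented for illustration, and whether one-hot channels actually work well with MiniRocket is not established in this thread.

```python
import numpy as np

# Hypothetical categorical channel: one colour label per time step.
colours = np.array(["green", "red", "black", "brown", "red", "green"])

# Map each label to an integer code, then to a one-hot row.
categories, codes = np.unique(colours, return_inverse=True)
one_hot = np.eye(len(categories), dtype=np.float32)[codes]  # (n_timepoints, n_categories)

# Hypothetical continuous channel of the same length.
temperature = np.linspace(20.0, 25.0, num=len(colours), dtype=np.float32)

# Stack into (n_channels, n_timepoints): 1 continuous + 4 one-hot channels.
series = np.vstack([temperature[np.newaxis, :], one_hot.T])

print(categories)
print(series.shape)  # (5, 6)
```

Each category becomes its own 0/1 channel, so a mixed series ends up as an all-numeric array with more channels than the original.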

@Sandy4321

Thank you! But did you pad it manually?
Yes, only 3 lines.
Yes you can use encode command from sklearn tpu transform the categorical values
Cool, thanks.
Could you share a code example?

@Sandy4321

At least, what is "tpu"?
And if you have a code example with a dataset to try, for multivariate time series with a mixture of continuous and categorical values, please share it.

@rubbiyasultan
Author

rubbiyasultan commented Jul 3, 2023

@angus924 could you please look into this? Also, I tried running MiniRocket on the SCADA data, and it is giving me poor accuracy on the test data (around 55%). I am planning to change the classifier to a non-linear one, maybe an LSTM. Do you think that would be the right approach: applying the MiniRocket feature transformation and then running an LSTM on the result?

Also, I need to understand the feature transformation for multivariate time series. I am running the BasicMotions dataset, and this is what I get:

from sktime.datasets import load_basic_motions
from sktime.transformations.panel.rocket import MiniRocketMultivariate

# Load the data
X_train, y_train = load_basic_motions(split="train", return_X_y=True)
X_test, y_test = load_basic_motions(split="test", return_X_y=True)
print("-------------before transformation--------")
print(X_train.shape)
print(X_test.shape)

# MiniRocket transformation
minirocket_multi = MiniRocketMultivariate()
X_train_transform = minirocket_multi.fit_transform(X_train)
X_test_transform = minirocket_multi.transform(X_test)

print("-------------after transformation--------")
print(X_train_transform.shape)
print(X_test_transform.shape)

Output:
-------------before transformation--------
(40, 6)
(40, 6)
-------------after transformation--------
(40, 9996)
(40, 9996)

The BasicMotions dataset has 40 rows (samples) and 6 columns (features), and it is transformed into (40, 9996). The kernels are applied to each feature, right?
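
On the output width, a small worked calculation may help. It rests on an assumption not stated in this thread: that MiniRocket uses a fixed set of 84 kernels and rounds the requested number of features (10,000 by default) down to a multiple of 84.

```python
# Assumption: 84 base kernels, and the requested feature count
# (10,000 by default) is rounded down to a multiple of 84.
n_base_kernels = 84
requested_features = 10_000

n_features = n_base_kernels * (requested_features // n_base_kernels)
print(n_features)  # 9996
```

If that assumption holds, every instance produces one row of 9996 features regardless of how many channels it has, which matches the (40, 9996) shape above.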
