This quickstart describes a flow to predict future ETH price via a local AI model.
It is used for the Web3 ATL hackathon that runs Oct 26 - Nov 6, 2022.
- Kickoff: Oct 26, 2022
- Submission deadline: Nov 6, 2022 at 11:59pm Eastern Time.
- Prediction times: Nov 7, 2022 at 1:00am, 2:00am, ..., 11:00am, 12:00pm (noon) Eastern Time. (12 predictions total)
- Winners announced: within one week
Here are the steps:
- Basic Setup
- Get data locally
- Make predictions
- Publish & share predictions
Prerequisites:
- Linux/MacOS
- Python 3.8.5+
- Arweave Bundlr:
npm install -g @bundlr-network/client
Now, let's install the Python libraries. Open a terminal and run:
# Initialize virtual environment and activate it.
python3 -m venv venv
source venv/bin/activate
# Avoid errors for the step that follows
pip3 install wheel
# Install libraries
pip3 install ocean-lib matplotlib pybundlr ccxt
You'll be using the Polygon network, so please ensure that you have a Polygon account that holds some MATIC (at least a few dollars' worth).
In the terminal:
export REMOTE_TEST_PRIVATE_KEY1=<your Polygon private key>
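Before going further, it can help to confirm that the variable is visible to Python and has a plausible shape. The sketch below assumes the key is exported as 64 hexadecimal characters, optionally prefixed with 0x; `looks_like_private_key` is a hypothetical helper, not part of ocean-lib.

```python
import os
import re

def looks_like_private_key(value):
    """Return True if value is 64 hex characters, optionally prefixed with 0x."""
    if value is None:
        return False
    return re.fullmatch(r"(0x)?[0-9a-fA-F]{64}", value) is not None

# Read the same variable that create_alice_wallet() reads later on.
key = os.getenv("REMOTE_TEST_PRIVATE_KEY1")
if looks_like_private_key(key):
    print("REMOTE_TEST_PRIVATE_KEY1 is set and has the expected shape")
else:
    print("REMOTE_TEST_PRIVATE_KEY1 is missing or malformed")
```

The check only looks at the format; it does not confirm that the account holds MATIC.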
Do the steps in "Appendix: Load helper functions".
In the terminal, run Python: python
In the Python console:
ocean = create_ocean_instance()
alice_wallet = create_alice_wallet(ocean) #you're Alice
Here, use whatever data you wish.
It can be static data or streams, free or priced, raw data or feature vectors or otherwise. It can be published via Ocean, or not.
The main README links to some options.
Here, build whatever AI/ML model you want, leveraging the data from the previous step.
This demo flow skips building a model because the next step will simply generate random predictions.
Submit one prediction per hour, on the hour, for a 12-hour period: Nov 7, 2022 at 1am, 2am, 3am, 4am, 5am, 6am, 7am, 8am, 9am, 10am, 11am, 12pm (EST). That makes 12 predictions in total, so the output is a list with 12 items.
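To see the twelve target times explicitly, here is a minimal sketch that lists them in both EST and UTC. It assumes "EST" means a fixed UTC-5 offset; the variable names are illustrative only.

```python
import datetime

# Assumption: EST is a fixed UTC-5 offset (no daylight saving on Nov 7, 2022).
EST = datetime.timezone(datetime.timedelta(hours=-5), name="EST")

first = datetime.datetime(2022, 11, 7, 1, 0, tzinfo=EST)  # 1:00am EST
target_times_est = [first + datetime.timedelta(hours=h) for h in range(12)]
target_times_utc = [t.astimezone(datetime.timezone.utc) for t in target_times_est]

# Print each target time in both zones, one line per prediction.
for est_time, utc_time in zip(target_times_est, target_times_utc):
    print(f"{est_time:%Y-%m-%d %H:%M} EST = {utc_time:%Y-%m-%d %H:%M} UTC")
```

Item i of your prediction list corresponds to line i of this printout.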
Here's an example with random numbers. In the same Python console:
#get predicted ETH values
mean, stddev = 1300, 25.0
pred_vals = np.random.normal(loc=mean, scale=stddev, size=(12,)).tolist() #plain Python floats
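We use normalized mean-squared error (NMSE) as the accuracy measure. As implemented in calc_nmse (see the appendix), each error is divided by the range of the actual values, and the result is the square root of the mean of the squared scaled errors. Lower is better.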
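As a small worked example of the measure, the sketch below mirrors the arithmetic of the appendix's calc_nmse using only the standard library. `nmse_sketch` is a hypothetical stand-in, and the prices are made up for illustration.

```python
import math

def nmse_sketch(actual, predicted):
    """Hypothetical stand-in for calc_nmse: range-scaled errors, squared, averaged, square-rooted."""
    assert len(actual) == len(predicted)
    value_range = max(actual) - min(actual)  # zero if all actual values are equal
    scaled_sq_errors = [((p - a) / value_range) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(scaled_sq_errors) / len(scaled_sq_errors))

actual = [100.0, 110.0, 120.0]     # made-up "true" prices, range = 20
predicted = [102.0, 108.0, 123.0]  # errors of +2, -2, +3
score = nmse_sketch(actual, predicted)
print(f"score = {score:.4f}")  # score = 0.1190
```

Because the errors are divided by the range of the actual values, a flat actual series (range of zero) would cause a division by zero; real price data over 12 hours is unlikely to be perfectly flat.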
In the same Python console:
# get the time range we want to test for
start_dt = datetime.datetime.utcnow() - datetime.timedelta(hours=24) #UTC, because the helpers treat naive datetimes as UTC; must be >= 12h ago
start_dt = round_to_nearest_hour(start_dt) # so that times line up
target_uts = target_12h_unixtimes(start_dt)
print_datetime_info("target times", target_uts)
# get the actual ETH values at that time
import ccxt
allcex_x = ccxt.binance().fetch_ohlcv('ETH/USDT', '1h')
allcex_uts = [xi[0]/1000 for xi in allcex_x]
allcex_vals = [xi[4] for xi in allcex_x]
print_datetime_info("allcex times", allcex_uts)
cex_vals = filter_to_target_uts(target_uts, allcex_uts, allcex_vals)
# now, we have predicted and actual values. Let's find error, and plot!
nmse = calc_nmse(cex_vals, pred_vals)
print(f"NMSE = {nmse}")
plot_prices(cex_vals, pred_vals)
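To make the data handling above concrete: ccxt returns each candle as [timestamp in ms, open, high, low, close, volume], so index 0 is the time and index 4 is the close price. The sketch below uses synthetic candles and, as a simplification, matches timestamps exactly, whereas filter_to_target_uts picks the nearest candle within one second.

```python
# Synthetic hourly candles in the ccxt layout:
# [timestamp_ms, open, high, low, close, volume]
hour_ms = 3600 * 1000
t0_ms = 1667800800 * 1000  # an arbitrary hour boundary, in milliseconds
candles = [
    [t0_ms + i * hour_ms, 1500.0 + i, 1510.0 + i, 1490.0 + i, 1505.0 + i, 10.0]
    for i in range(5)
]

# Same extraction as in the flow above: seconds from index 0, close from index 4.
uts = [c[0] / 1000 for c in candles]
closes = [c[4] for c in candles]

# Simplified alignment: look up the close price at each target unixtime.
close_at = dict(zip(uts, closes))
target_uts = [1667800800 + h * 3600 for h in (1, 2, 3)]
picked = [close_at[ut] for ut in target_uts]
print(picked)  # [1506.0, 1507.0, 1508.0]
```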
Keep iterating on the "Make predictions" step until you're satisfied with accuracy. Then...
In the same Python console:
file_name = "/tmp/pred_vals.csv"
save_list(pred_vals, file_name)
The csv will look something like:
[1503.134,1512.490,1498.982,...,1590.673]
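The file is simply the text form of a Python list. A minimal round-trip sketch, using a temporary directory and `ast.literal_eval` for parsing (an assumption here; the appendix helpers may parse differently):

```python
import ast
import tempfile
from pathlib import Path

pred_vals = [1503.134, 1512.49, 1498.982]  # shortened, made-up example values

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "pred_vals.csv"
    path.write_text(str(pred_vals))   # writes the list as text
    text = path.read_text()
    loaded = ast.literal_eval(text)   # parses the text back into a list

print(text)                  # [1503.134, 1512.49, 1498.982]
print(loaded == pred_vals)   # True
```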
You'll upload to Arweave permanent decentralized file storage, via Bundlr. This makes the predictions tamper-proof. Bundlr enables you to pay via MATIC from your Polygon account.
In the same Python console:
from pybundlr import pybundlr
file_name = "/tmp/pred_vals.csv"
url = pybundlr.fund_and_upload(file_name, "matic", alice_wallet.private_key)
#e.g. url = "https://arweave.net/qctEbPb3CjvU8LmV3G_mynX74eCxo1domFQIlOBH1xU"
print(f"Your csv url: {url}")
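If you want to check later that the uploaded copy matches what you saved, one option is to record a hash of the local file and compare it against a hash of the downloaded file. `sha256_of_file` is a hypothetical helper, not part of pybundlr or ocean-lib, and the demo file below stands in for /tmp/pred_vals.csv.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(file_name):
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(file_name).read_bytes()).hexdigest()

# Demo on a throwaway file; in the real flow you would pass your own csv path.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "pred_vals.csv"
    demo_file.write_text("[1503.134, 1512.49, 1498.982]")
    digest = sha256_of_file(demo_file)

print(f"sha256 = {digest}")
```

Two files with identical bytes always give the same digest, so a mismatch means the content differs.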
In the same Python console:
name = "ETH predictions " + str(time.time()) #time for unique name
(data_nft, datatoken, asset) = ocean.assets.create_url_asset(name, url, alice_wallet, wait_for_aqua=False)
data_nft.set_metadata_state(metadata_state=5, from_wallet=alice_wallet)
print(f"New asset created, with did={asset.did}, and datatoken.address={datatoken.address}")
Write down the did and datatoken.address. You'll need to share them with the hackathon hosts.
In the same Python console:
to_address="0xA54ABd42b11B7C97538CAD7C6A2820419ddF703E" #official judges address
datatoken.mint(to_address, ocean.to_wei(10), alice_wallet)
You're done! Thanks for being part of this competition.
For reference, here is how the judges will evaluate submissions after the deadline. In the terminal:
export REMOTE_TEST_PRIVATE_KEY1=<judges' private key, having address 0xA54A..>
Do the steps in "Appendix: Load helper functions".
In the same Python console:
#setup
ocean = create_ocean_instance()
alice_wallet = create_alice_wallet(ocean) #you're Alice
#get predicted ETH values
did = <value shared by you>
file_name = ocean.assets.download_file(did, alice_wallet)
pred_vals = load_list(file_name)
#get actual ETH values (final)
ETH_USDT_did = "did:op:0dac5eb4965fb2b485181671adbf3a23b0133abf71d2775eda8043e8efc92d19"
file_name = ocean.assets.download_file(ETH_USDT_did, alice_wallet)
allcex_uts, allcex_vals = load_from_ohlc_data(file_name)
print_datetime_info("CEX data info", allcex_uts)
start_dt = datetime.datetime(2022, 11, 7, 6, 00) #Nov 7, 2022 at 1:00am EST (=06:00 UTC, since EST is UTC-5)
target_uts = target_12h_unixtimes(start_dt)
print_datetime_info("target times", target_uts)
cex_vals = filter_to_target_uts(target_uts, allcex_uts, allcex_vals)
#calc nmse, plot
nmse = calc_nmse(cex_vals, pred_vals)
print(f"NMSE = {nmse}")
plot_prices(cex_vals, pred_vals)
Appendix: Load helper functions
If the Python console isn't already open: python
In the Python console, copy and paste everything below:
#imports
import datetime
from datetime import timezone
import numpy as np
from pathlib import Path
import os
import time
import matplotlib
import matplotlib.pyplot as plt
from ocean_lib.example_config import ExampleConfig
from ocean_lib.ocean.ocean import Ocean
from ocean_lib.web3_internal.wallet import Wallet
#helper functions: setup
def create_ocean_instance() -> Ocean:
    config = ExampleConfig.get_config("https://polygon-rpc.com") # points to Polygon mainnet
    config["BLOCK_CONFIRMATIONS"] = 1 #faster
    ocean = Ocean(config)
    return ocean

def create_alice_wallet(ocean: Ocean) -> Wallet:
    config = ocean.config_dict
    alice_private_key = os.getenv('REMOTE_TEST_PRIVATE_KEY1')
    alice_wallet = Wallet(ocean.web3, alice_private_key, config["BLOCK_CONFIRMATIONS"], config["TRANSACTION_TIMEOUT"])
    bal = ocean.from_wei(alice_wallet.web3.eth.get_balance(alice_wallet.address))
    print(f"alice_wallet.address={alice_wallet.address}. bal={bal}")
    assert bal > 0, "Alice needs MATIC"
    return alice_wallet

#helper functions: time
def to_unixtime(dt: datetime.datetime):
    #must account for timezone, otherwise it's off
    ut = dt.replace(tzinfo=timezone.utc).timestamp()
    dt2 = datetime.datetime.utcfromtimestamp(ut) #to_datetime() approach
    assert dt2 == dt, f"dt: {dt}, dt2: {dt2}"
    return ut

def to_unixtimes(dts: list) -> list:
    return [to_unixtime(dt) for dt in dts]

def to_datetime(ut) -> datetime.datetime:
    dt = datetime.datetime.utcfromtimestamp(ut)
    ut2 = dt.replace(tzinfo=timezone.utc).timestamp() #to_unixtime() approach
    assert ut2 == ut, f"ut: {ut}, ut2: {ut2}"
    return dt

def to_datetimes(uts: list) -> list:
    return [to_datetime(ut) for ut in uts]

def round_to_nearest_hour(dt: datetime.datetime) -> datetime.datetime:
    return (dt.replace(second=0, microsecond=0, minute=0, hour=dt.hour)
            + datetime.timedelta(hours=dt.minute//30))

def pretty_time(dt: datetime.datetime) -> str:
    return dt.strftime('%Y/%m/%d, %H:%M:%S')

def print_datetime_info(descr:str, uts: list):
    dts = to_datetimes(uts)
    print(descr + ":")
    print(f" starts on: {pretty_time(dts[0])}")
    print(f" ends on: {pretty_time(dts[-1])}")
    print(f" {len(dts)} datapoints")
    print(f" time interval between datapoints: {(dts[1]-dts[0])}")

def target_12h_unixtimes(start_dt: datetime.datetime) -> list:
    target_dts = [start_dt + datetime.timedelta(hours=h) for h in range(12)]
    target_uts = to_unixtimes(target_dts)
    return target_uts

#helper-functions: higher level
def load_from_ohlc_data(file_name: str) -> tuple:
    """Returns (list_of_unixtimes, list_of_close_prices)"""
    with open(file_name, "r") as file:
        data_str = file.read().rstrip().replace('"', '')
    x = eval(data_str) #list of lists
    uts = [xi[0]/1000 for xi in x]
    vals = [xi[4] for xi in x]
    return (uts, vals)

def filter_to_target_uts(target_uts:list, unfiltered_uts:list, unfiltered_vals:list) -> list:
    """Return filtered_vals -- values at the target timestamps"""
    filtered_vals = [None] * len(target_uts)
    for i, target_ut in enumerate(target_uts):
        time_diffs = np.abs(np.asarray(unfiltered_uts) - target_ut)
        tol_s = 1 #should always align within e.g. 1 second
        target_ut_s = pretty_time(to_datetime(target_ut))
        assert min(time_diffs) <= tol_s, \
            f"Unfiltered times is missing target time: {target_ut_s}"
        j = np.argmin(time_diffs)
        filtered_vals[i] = unfiltered_vals[j]
    return filtered_vals

#helpers: save/load list
import ast

def save_list(list_: list, file_name: str):
    """Save a file shaped: [1.2, 3.4, 5.6, ..]"""
    p = Path(file_name)
    p.write_text(str(list_))

def load_list(file_name: str) -> list:
    """Load from a file shaped: [1.2, 3.4, 5.6, ..]"""
    p = Path(file_name)
    s = p.read_text()
    list_ = ast.literal_eval(s) #parses literals only; safer than eval() on a downloaded file
    return list_

#helpers: prediction performance
def calc_nmse(y, yhat) -> float:
    """Errors are scaled by the range of y, squared, averaged, then square-rooted"""
    assert len(y) == len(yhat)
    y, yhat = np.asarray(y), np.asarray(yhat)
    range_y = max(y) - min(y)
    nmse = np.sqrt(np.average(((yhat - y) / range_y) ** 2))
    return nmse

def plot_prices(cex_vals, pred_vals):
    matplotlib.rcParams.update({'font.size': 22})
    x = [h for h in range(0,12)]
    assert len(x) == len(cex_vals) == len(pred_vals)
    fig, ax = plt.subplots()
    ax.plot(x, cex_vals, '--', label="CEX values")
    ax.plot(x, pred_vals, '-', label="Pred. values")
    ax.legend(loc='lower right')
    plt.ylabel("ETH price")
    plt.xlabel("Hour")
    fig.set_size_inches(18, 18)
    plt.xticks(x)
    plt.show()
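The time helpers above attach UTC to naive datetimes before converting to unixtime. The following self-contained sketch illustrates why: calling timestamp() on a naive datetime interprets it as local time, so the result shifts with the machine's timezone. The variable names are illustrative only.

```python
import datetime
from datetime import timezone

naive = datetime.datetime(2022, 11, 7, 12, 0)  # intended to mean 12:00 UTC

# Attaching UTC first gives the same unixtime on every machine.
ut_utc = naive.replace(tzinfo=timezone.utc).timestamp()

# Without tzinfo, Python assumes the machine's local timezone.
ut_local = naive.timestamp()
offset_hours = (ut_local - ut_utc) / 3600
print(f"utc-based unixtime: {ut_utc}")
print(f"local interpretation differs by {offset_hours} hours on this machine")

# Round trip back to a naive UTC datetime.
back = datetime.datetime.fromtimestamp(ut_utc, tz=timezone.utc).replace(tzinfo=None)
print(back == naive)  # True
```

On a machine whose clock is set to UTC the reported difference is zero, which is why this kind of bug can go unnoticed until the code runs elsewhere.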