Quickstart: Predict Future ETH Price: For Web3 ATL Hackathon

This quickstart describes a flow to predict future ETH price via a local AI model.

It is used for the Web3 ATL hackathon that runs Oct 26 - Nov 6, 2022.

  • Kickoff: Oct 26, 2022
  • Submission deadline: Nov 6, 2022 at 11:59pm Eastern Time.
  • Predictions are for: Nov 7, 2022 at 1:00am, 2:00am, ..., 11:00am, 12:00 noon Eastern Time (12 predictions total)
  • Winners announced: within one week

Here are the steps:

  1. Setup
  2. Get data locally
  3. Make predictions
  4. Publish & share predictions

1. Setup

1.1 Prerequisites & installation

Prerequisites:

  • Linux/macOS
  • Python 3.8.5+
  • Arweave Bundlr: npm install -g @bundlr-network/client

Now, let's install Python libraries. Open a terminal and:

# Initialize virtual environment and activate it.
python3 -m venv venv
source venv/bin/activate

# Avoid errors for the step that follows
pip3 install wheel

# Install libraries
pip3 install ocean-lib matplotlib pybundlr ccxt

1.2 Create Polygon Account (One-Time)

You'll be using the Polygon network, so please ensure that you have a Polygon account that holds some MATIC (at least a few dollars' worth). More info.

1.3 Set envvars, for Polygon address

In the terminal:

export REMOTE_TEST_PRIVATE_KEY1=<your Polygon private key>

1.4 Load helper functions

Do the steps in "Appendix: Load helper functions".

1.5 Setup in Python, for Polygon

In the terminal, run Python: python

In the Python console:

ocean = create_ocean_instance()
alice_wallet = create_alice_wallet(ocean) #you're Alice

2. Get data locally

Here, use whatever data you wish.

It can be static data or streams, free or priced, raw data or feature vectors or otherwise. It can be published via Ocean, or not.

The main README links to some options.

3. Make predictions

3.1 Build a simple AI model

Here, build whatever AI/ML model you want, leveraging the data from the previous step.

This demo flow skips building a model because the next step will simply generate random predictions.
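
As one possible starting point, the sketch below fits a straight line to recent hourly closing prices and extrapolates it 12 hours ahead. It is a hypothetical baseline, not part of the hackathon tooling: the name `linear_trend_forecast` and the sample prices are illustrative assumptions, and a real entry would likely use a richer model.

```python
import numpy as np

def linear_trend_forecast(recent_closes, n_ahead=12):
    """Extrapolate a least-squares line fitted to recent prices (hypothetical helper)."""
    y = np.asarray(recent_closes, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)  # fit y ~ slope*t + intercept
    t_future = np.arange(len(y), len(y) + n_ahead)
    return [float(v) for v in slope * t_future + intercept]

# illustrative (made-up) hourly closes, oldest first
recent_closes = [1290.0, 1294.0, 1291.0, 1297.0, 1301.0, 1299.0]
pred_vals_trend = linear_trend_forecast(recent_closes)
print(f"{len(pred_vals_trend)} predictions, first = {pred_vals_trend[0]:.2f}")
```

The output is a plain list of 12 floats, the same shape that the later steps expect for `pred_vals`.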

3.2 Run the AI model to make future ETH price predictions

Make one prediction for every hour, on the hour, over a 12-hour period: Nov 7, 2022 at 1am, 2am, 3am, 4am, 5am, 6am, 7am, 8am, 9am, 10am, 11am, 12pm (EST). That is 12 predictions in total, so the output is a list with 12 items.

Here's an example with random numbers. In the same Python console:

#get predicted ETH values
mean, stddev = 1300, 25.0
pred_vals = list(np.random.normal(loc=mean, scale=stddev, size=(12,)))
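
To make the prediction schedule concrete, this sketch lists the 12 target times in both Eastern Time and UTC. It assumes Eastern Time on Nov 7, 2022 is EST (UTC-5), because US daylight saving time ended on Nov 6, 2022. The variable names are illustrative.

```python
import datetime

# Assumption: fixed UTC-5 offset (EST) applies on Nov 7, 2022
EST = datetime.timezone(datetime.timedelta(hours=-5), "EST")
first_target = datetime.datetime(2022, 11, 7, 1, 0, tzinfo=EST)

# 12 hourly targets: 1am, 2am, ..., 12pm EST
targets_est = [first_target + datetime.timedelta(hours=h) for h in range(12)]
targets_utc = [dt.astimezone(datetime.timezone.utc) for dt in targets_est]

for est, utc in zip(targets_est, targets_utc):
    print(f"{est:%b %d %H:%M} EST = {utc:%H:%M} UTC")
```

The i-th item of the predictions list corresponds to the i-th time printed here.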

3.3 Calculate NMSE

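We use normalized mean-squared error (NMSE) as the accuracy measure. As implemented in calc_nmse (see "Appendix: Load helper functions"), each error is divided by the range of the actual values, and the result is the square root of the mean of the squared normalized errors. Lower is better.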
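
Written out, the measure computed by `calc_nmse` in the appendix is the following, where $y_i$ are the actual prices, $\hat{y}_i$ are the predictions, and $N = 12$. The symbols are notation chosen here for illustration.

$$
\mathrm{NMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\hat{y}_i - y_i}{\max_j y_j - \min_j y_j}\right)^{2}}
$$

For example, with actual values (100, 110, 120) and predictions (105, 110, 115), the range is 20, the normalized errors are (0.25, 0, -0.25), and the result is $\sqrt{(0.0625 + 0 + 0.0625)/3} \approx 0.204$. Because of the square root, the value is a root-mean-squared error expressed as a fraction of the actual price range.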

In the same Python console:

# get the time range we want to test for
# use UTC, because the helper functions treat naive datetimes as UTC
start_dt = datetime.datetime.utcnow() - datetime.timedelta(hours=24) #must be >= 12h ago
start_dt = round_to_nearest_hour(start_dt) # so that times line up
target_uts = target_12h_unixtimes(start_dt)
print_datetime_info("target times", target_uts)

# get the actual ETH values at that time
import ccxt
allcex_x = ccxt.binance().fetch_ohlcv('ETH/USDT', '1h')
allcex_uts = [xi[0]/1000 for xi in allcex_x]
allcex_vals = [xi[4] for xi in allcex_x]
print_datetime_info("allcex times", allcex_uts)

cex_vals = filter_to_target_uts(target_uts, allcex_uts, allcex_vals)

# now, we have predicted and actual values. Let's find error, and plot!
nmse = calc_nmse(cex_vals, pred_vals)
print(f"NMSE = {nmse}")
plot_prices(cex_vals, pred_vals)

Keep iterating in step 3 until you're satisfied with accuracy. Then...
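
One way to judge whether accuracy is good enough is to compare against a naive baseline. The sketch below is an illustration under stated assumptions: it scores a persistence forecast (repeat the last known price) with the same normalized error as `calc_nmse`, using made-up numbers. `normalized_rmse` and `persistence_forecast` are hypothetical names, not part of the helper functions.

```python
import numpy as np

def normalized_rmse(y, yhat):
    """Same formula as calc_nmse in the appendix, restated so this block runs alone."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    range_y = y.max() - y.min()
    return float(np.sqrt(np.mean(((yhat - y) / range_y) ** 2)))

def persistence_forecast(last_price, n_ahead=12):
    """Naive baseline: predict that the price stays at its last known value."""
    return [float(last_price)] * n_ahead

# made-up numbers, for illustration only
actual = [1300.0, 1304.0, 1309.0, 1307.0, 1302.0, 1298.0,
          1301.0, 1306.0, 1311.0, 1314.0, 1310.0, 1305.0]
model_preds = [a + 2.0 for a in actual]  # pretend model: always 2.0 too high
baseline_preds = persistence_forecast(last_price=1296.0)

print(f"model NMSE    = {normalized_rmse(actual, model_preds):.3f}")
print(f"baseline NMSE = {normalized_rmse(actual, baseline_preds):.3f}")
```

If a model does not beat such a baseline across several past 12-hour windows, it may need more work before submission.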

4. Publish predictions

4.1 Save the predictions as a csv file

In the same Python console:

file_name = "/tmp/pred_vals.csv"
save_list(pred_vals, file_name)

The csv will look something like:

[1503.134,1512.490,1498.982,...,1590.673]
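
Before uploading, it may be worth checking that the saved file parses back into exactly 12 numbers. The following sketch assumes the bracketed-list format shown above; `check_predictions_file` is a hypothetical helper, not part of the hackathon code.

```python
import ast
import math
from pathlib import Path

def check_predictions_file(file_name, n_expected=12):
    """Parse a file shaped [1.2, 3.4, ..] and verify its length and contents."""
    vals = ast.literal_eval(Path(file_name).read_text().strip())
    assert isinstance(vals, list), "file must contain a single list"
    assert len(vals) == n_expected, f"expected {n_expected} values, got {len(vals)}"
    assert all(isinstance(v, (int, float)) and math.isfinite(v) for v in vals), \
        "all values must be finite numbers"
    return vals
```

In the same Python console, `check_predictions_file("/tmp/pred_vals.csv")` returns the parsed list, or raises an error if the file is not a plain list of 12 finite numbers.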

4.2 Put the csv online

You'll upload to Arweave permanent decentralized file storage, via Bundlr. This makes the predictions tamper-proof. Bundlr enables you to pay via MATIC from your Polygon account.

In the same Python console:

from pybundlr import pybundlr
file_name = "/tmp/pred_vals.csv"
url = pybundlr.fund_and_upload(file_name, "matic", alice_wallet.private_key)
#e.g. url = "https://arweave.net/qctEbPb3CjvU8LmV3G_mynX74eCxo1domFQIlOBH1xU"
print(f"Your csv url: {url}")

4.3 Publish Ocean asset

In the same Python console:

name = "ETH predictions " + str(time.time()) #time for unique name
(data_nft, datatoken, asset) = ocean.assets.create_url_asset(name, url, alice_wallet, wait_for_aqua=False)
data_nft.set_metadata_state(metadata_state=5, from_wallet=alice_wallet)
print(f"New asset created, with did={asset.did}, and datatoken.address={datatoken.address}")

Write down the did and datatoken.address. You'll need to share them with the hackathon hosts.

4.4 Share predictions with judges

In the same Python console:

to_address="0xA54ABd42b11B7C97538CAD7C6A2820419ddF703E" #official judges address
datatoken.mint(to_address, ocean.to_wei(10), alice_wallet)

You're done! Thanks for being part of this competition.

Appendix: What judges will do

In the terminal:

export REMOTE_TEST_PRIVATE_KEY1=<judges' private key, having address 0xA54A..>

Do the steps in "Appendix: Load helper functions".

In the same Python console:

#setup
ocean = create_ocean_instance()
alice_wallet = create_alice_wallet(ocean) #you're Alice

#get predicted ETH values
did = <value shared by you>
file_name = ocean.assets.download_file(did, alice_wallet)
pred_vals = load_list(file_name)

#get actual ETH values (final)
ETH_USDT_did = "did:op:0dac5eb4965fb2b485181671adbf3a23b0133abf71d2775eda8043e8efc92d19"
file_name = ocean.assets.download_file(ETH_USDT_did, alice_wallet)
allcex_uts, allcex_vals = load_from_ohlc_data(file_name)
print_datetime_info("CEX data info", allcex_uts)

start_dt = datetime.datetime(2022, 11, 7, 6, 0) #Nov 7, 2022 at 1:00am EST (UTC-5) = 06:00 UTC; US daylight saving ended Nov 6
target_uts = target_12h_unixtimes(start_dt)
print_datetime_info("target times", target_uts)

cex_vals = filter_to_target_uts(target_uts, allcex_uts, allcex_vals)

#calc nmse, plot
nmse = calc_nmse(cex_vals, pred_vals)
print(f"NMSE = {nmse}")
plot_prices(cex_vals, pred_vals)

Appendix: Load helper functions

If the Python console isn't already open: python

In the Python console, copy and paste everything below:

#imports
import datetime
from datetime import timezone
import numpy as np
from pathlib import Path
import os
import time

import matplotlib
import matplotlib.pyplot as plt
    
from ocean_lib.example_config import ExampleConfig
from ocean_lib.ocean.ocean import Ocean
from ocean_lib.web3_internal.wallet import Wallet


#helper functions: setup
def create_ocean_instance() -> Ocean:
    config = ExampleConfig.get_config("https://polygon-rpc.com") # points to Polygon mainnet
    config["BLOCK_CONFIRMATIONS"] = 1 #faster
    ocean = Ocean(config)
    return ocean


def create_alice_wallet(ocean: Ocean) -> Wallet:
    config = ocean.config_dict
    alice_private_key = os.getenv('REMOTE_TEST_PRIVATE_KEY1')
    alice_wallet = Wallet(ocean.web3, alice_private_key, config["BLOCK_CONFIRMATIONS"], config["TRANSACTION_TIMEOUT"])
    bal = ocean.from_wei(alice_wallet.web3.eth.get_balance(alice_wallet.address))
    print(f"alice_wallet.address={alice_wallet.address}. bal={bal}")
    assert bal > 0, "Alice needs MATIC"
    return alice_wallet


#helper functions: time
def to_unixtime(dt: datetime.datetime):
    #must account for timezone, otherwise it's off
    ut = dt.replace(tzinfo=timezone.utc).timestamp()
    dt2 = datetime.datetime.utcfromtimestamp(ut) #to_datetime() approach
    assert dt2 == dt, f"dt: {dt}, dt2: {dt2}"
    return ut


def to_unixtimes(dts: list) -> list:
    return [to_unixtime(dt) for dt in dts]


def to_datetime(ut) -> datetime.datetime:
    dt = datetime.datetime.utcfromtimestamp(ut)
    ut2 = dt.replace(tzinfo=timezone.utc).timestamp() #to_unixtime() approach
    assert ut2 == ut, f"ut: {ut}, ut2: {ut2}"
    return dt


def to_datetimes(uts: list) -> list:
    return [to_datetime(ut) for ut in uts]


def round_to_nearest_hour(dt: datetime.datetime) -> datetime.datetime:
    return (dt.replace(second=0, microsecond=0, minute=0, hour=dt.hour)
            + datetime.timedelta(hours=dt.minute//30))


def pretty_time(dt: datetime.datetime) -> str:
    return dt.strftime('%Y/%m/%d, %H:%M:%S')


def print_datetime_info(descr:str, uts: list):
    dts = to_datetimes(uts)
    print(descr + ":")
    print(f"  starts on: {pretty_time(dts[0])}")
    print(f"    ends on: {pretty_time(dts[-1])}")
    print(f"  {len(dts)} datapoints")
    print(f"  time interval between datapoints: {(dts[1]-dts[0])}")


def target_12h_unixtimes(start_dt: datetime.datetime) -> list:
    target_dts = [start_dt + datetime.timedelta(hours=h) for h in range(12)]
    target_uts = to_unixtimes(target_dts)
    return target_uts


#helper-functions: higher level
import ast #for safely parsing list literals read from files


def load_from_ohlc_data(file_name: str) -> tuple:
    """Returns (list_of_unixtimes, list_of_close_prices)"""
    with open(file_name, "r") as file:
        data_str = file.read().rstrip().replace('"', '')
    x = ast.literal_eval(data_str) #list of lists; avoids eval() on downloaded data
    uts = [xi[0]/1000 for xi in x]
    vals = [xi[4] for xi in x]
    return (uts, vals)


def filter_to_target_uts(target_uts:list, unfiltered_uts:list, unfiltered_vals:list) -> list:
    """Return filtered_vals -- values at the target timestamps"""
    filtered_vals = [None] * len(target_uts)
    for i, target_ut in enumerate(target_uts):
        time_diffs = np.abs(np.asarray(unfiltered_uts) - target_ut)
        tol_s = 1 #should always align within e.g. 1 second
        target_ut_s = pretty_time(to_datetime(target_ut))
        assert min(time_diffs) <= tol_s, \
            f"Unfiltered times is missing target time: {target_ut_s}"
        j = np.argmin(time_diffs)
        filtered_vals[i] = unfiltered_vals[j]
    return filtered_vals


#helpers: save/load list
def save_list(list_: list, file_name: str):
    """Save a file shaped: [1.2, 3.4, 5.6, ..]"""
    p = Path(file_name)
    p.write_text(str([float(v) for v in list_])) #plain floats, so the file is a pure list literal


def load_list(file_name: str) -> list:
    """Load from a file shaped: [1.2, 3.4, 5.6, ..]"""
    p = Path(file_name)
    s = p.read_text()
    list_ = ast.literal_eval(s) #avoids eval() on downloaded data
    return list_


#helpers: prediction performance
def calc_nmse(y, yhat) -> float:
    """Root of the mean squared error, with each error normalized by the range of y"""
    assert len(y) == len(yhat)
    y, yhat = np.asarray(y), np.asarray(yhat)
    range_y = max(y) - min(y)
    nmse = np.sqrt(np.average(((yhat - y) / range_y) ** 2))
    return nmse


def plot_prices(cex_vals, pred_vals):
    matplotlib.rcParams.update({'font.size': 22})
    
    x = [h for h in range(0,12)]
    assert len(x) == len(cex_vals) == len(pred_vals)
    
    fig, ax = plt.subplots()
    ax.plot(x, cex_vals, '--', label="CEX values")
    ax.plot(x, pred_vals, '-', label="Pred. values")
    ax.legend(loc='lower right')
    plt.ylabel("ETH price")
    plt.xlabel("Hour")
    fig.set_size_inches(18, 18)
    plt.xticks(x)
    plt.show()