Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup Pass #124

Merged
merged 29 commits into from
Apr 24, 2024
Merged

Cleanup Pass #124

merged 29 commits into from
Apr 24, 2024

Conversation

kfsone
Copy link
Contributor

@kfsone kfsone commented Apr 23, 2024

I'm trying to revisit various decisions I'd made during the early days that now cause problems either in performance with IDEs or get linters really bent out of shape.

There's also some work here to try and beautify things - although I'm introducing those sparingly and doing them in-place in the plugin where they seem relevant.

Some small amount of performance tuning, but it's not going to really make a noticeable difference yet.

Strongly recommend you review the individual changes; I think trying to read the entire diff would be brain-hurting...

I'm creating a pull-request before I've had chance to finish watching a complete import, mostly because of the size and to give Eyeonus a chance to ask others to review and/or reject/request changes.

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

Everything looks good code-wise, I just have to ask:

Why are some of your commits using the angular prefixes and some aren't?

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

Some tests are failing in tox:
tests/test_trade.py:76: AssertionError
=========================== short test summary info ============================
FAILED tests/test_cache.py::TestCache::test_parseSupply - assert 2 == 3
FAILED tests/test_trade.py::TestTrade::test_buy - assert 'Cost Units Dis...
FAILED tests/test_trade.py::TestTrade::test_nav - AssertionError: assert 'Sys...
FAILED tests/test_trade.py::TestTrade::test_market - AssertionError: assert F...
=================== 4 failed, 51 passed, 4 skipped in 1.46s ====================

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

Everything looks good code-wise, I just have to ask:

Why are some of your commits using the angular prefixes and some aren't?

Not familiar with the term?

Also I have another commit to add: I'd added a UNIQUE(fdev_id) to StationItem and dumpPrices did not like that.

Saw your fix on the supply levels, good catch; also saw you had to fix my whitespace - I'm trying to turn off the settings I have locally to strip trailing whitespace. LMK if I'm still doing it.

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

Do you have contributor docs for setting up tox / etc?

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

@eyeonus BTW - What IDE/tooling are you using? (I mostly use VSCode, but when I'm trying to be good about python I use PyCharm, and I'm just setting that up for td now)

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

Looking at the errors; it looks like we weren't ever actually running flake8 without running all the tests. Once flake8 is enabled ... it warns. A lot -- see incoming extra change to this branch.

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

One thing a lot of python linters try to enforce is the distinction between

if happy:
  life = good
# and
if happy(
  life=good
)

so the standard formatters/styles expect keyword assignments to have no space around the equals, and include that in default arguments:

def poop(stink=bad):

unless there's a type annotation:

def poop(stink: Level = bad)

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

I'm happy to fix-up code or configure linters - just lmk your preferences.

tox -e flake8

with this branch will warnapalooza at ya.

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

I'm at work right now, so just as a quick reply

I'm currently using Eclipse with PyDev, planning on switching over to PyCharm when I have the free time

angular is a commit commenting style that python-semantic-release uses to determine whether to increment the version number, whether to publish a release, automatically annotate the release with what changes occurred since last release, etc.

chore:, fix:, refactor: are examples

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

Did not know about that, will adjust accordingly. Like it.

I was doing a burn-thru of warnings and errors, getting myself back to grips with the code and tooling - I hopefully got us closer to being able to use pylint which has saved my bacon hard several times, even though it's a pain sometimes.

Tomorrow I'll actually fix the errors - I think I know what they are already, and I wanted to have my ide running so I could step thru meaningfully and see it for sure :)

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

I'll fix the commit messages in the morning. For now, I just checked in the fix that was causing the test failures.

@kfsone kfsone force-pushed the kfsone/cleanup-pass branch 3 times, most recently from 0f93713 to c9b28b9 Compare April 23, 2024 08:38
@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

That should cleanup the descriptions.

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

Regarding the fdev_id in the Item table: We're using the fdev_id as the item_id nowadays, so it wouldn't be a bad idea to refactor the fdev_id away completely.

I believe the only things that use the fdev_id are my eddblink plugin, the spansh plugin, and the listener, which is a different repo.

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

On another note, that's a lot of work done while I was at work. Color me impressed. And thanks for cleaning up the commit messages.

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

Just a heads-up, someone wrote commandenv.colorize() and used it to color the output in tradecalc, not sure if that'll interfere with the rich coloring you've been doing.

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

@eyeonus I saw that, and left it alone -- the colorizing is just automatic by writing through rich, and the colorize implementation in tradecalc could be replaced with use of the rich/themeing.

Also - the build failure above is failing on deploying to pypi which I would hope I'm guaranteed to fail on from my branch :)

Right now I'm looking at some ideas from the "1 billion rows challenge" and how people solved that with python. The first kick in the teeth was a nice simple one: it's way faster to process files in binary or even ascii than utf-8. in the code where we're already using "blocks()" to speed it up, you can make it 8x faster by opening the file in 'rb' and then counting b'\n'. I'm doing more tests.

Also - I saw the optimization to go back from building the list to periodically closing the database. Agh. We're caught here between just needing someone to store the data and using SQL because we have a database as our store.

The idea really was that people would go thru TradeDB directly, with sqlite being a fallback way to access the data if something was amiss or went wrong.

I wonder if we shouldn't just get rid of the DB back end entirely and just pickle the python data. That used to be slow but it's fast as fek these days, and it would mean one source of truth rather than two. We already make it so that during imports you build a -new- database with the changes, this wouldn't be any different.

It would mean that you couldn't inspect the data with sqlite or anything.

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

@eyeonus I saw that, and left it alone -- the colorizing is just automatic by writing through rich, and the colorize implementation in tradecalc could be replaced with use of the rich/themeing.

Also - the build failure above is failing on deploying to pypi which I would hope I'm guaranteed to fail on from my branch :)

Yup. That one only "succeeds" if the branch is release/v1, because obviously we don't want to publish releases otherwise

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

As far as the DB is concerned, everything in TD uses it, including TradeDB. The csv and prices files exist solely for rebuilding the DB if something goes wrong and it gets corrupted, lost, etc.

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

Also:

https://docs.python.org/3/library/pickle.html:

Warning

The pickle module is not secure. Only unpickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

Consider signing data with hmac if you need to ensure that it has not been tampered with.

Safer serialization formats such as json may be more appropriate if you are processing untrusted data. See Comparison with json.

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

Yeah, we wouldn't be replacing the exchange formats with pickle. It's just a super fast way to "freeze" a snapshot of python's memory representation of objects:

import pickle
with open("test.pkl", "wb") as fh:
  pickle.dump({"data": [1,2,3], "names": set(("you", "me"))}, fh)

restored = pickle.load(open("test.pkl", "rb"))

gives

>>> print(restored)
{'data': [1, 2, 3], 'names': {'me', 'you'}}

But I don't think that it does as good a job at save/restore for actual entire object trees (as opposed to lists/tables of complex objects) as we'd produce if System has a direct link to Station that we want, rather than just an id reference.

I took a look at what it would require to pull of sanely, and the work to describe the structures nicely and modern-pythonically brings you also the entire way towards what would need doing in order to describe the schema to SQLAlchemy, at which point when SQLite is a bottle neck, people could simply swap the backend engine

if getOption("use_mysql"):
  tradedb.engine = SqlAlchemy.Engines.MySQL  # or so
elif getOption("use_postgres"):
  tradedb.engine .. giant_turd
  ...
  
 class TradeDB:
   def get_db(self):
     if self.db: return self.conn
     self.db= engine.connect(...)
     
     sol_system = sqlalchemy.select(System).where(name="Sol")
     print("sol is at", sol.pos_x, sol.pos_y, sol.pos_z)

@kfsone
Copy link
Contributor Author

kfsone commented Apr 23, 2024

image

@eyeonus
Copy link
Owner

eyeonus commented Apr 23, 2024

Let me just say, from my attempts to reign in the import time on eddblink with what is now a 2.5GB file, that SQL is definitely a bottleneck. It runs pretty quickly for the first 22,000,000 entries or so, but once the DB is over 3GB in size, it slows way down.

I mean, it gets slightly slower the whole way through, (inserting item 20,000 takes slightly longer than inserting item 10,000) but the difference becomes noticeable around 50% and gets worse from there.

kfsone and others added 15 commits April 23, 2024 18:22
- Adds Progresser to spansh_plugin as an experimental ui presentation layer using rich,
-- active progress bars with timers that run asynchronously (so they shouldn't have a significant performance overhead),
-- opt-out to plain-text mode incase it needs turning off quickly,
- present statistics with the view to give you a sense of progress,
- small perf tweaks
The idea is that flake8 is really fast, so you give it it's own environment and it comes back and says "YA MADE A TYPO OLIVER YA DID IT AGAIN WHY YOU ALWAYS..." *cough*, sorry, anyway.

I usually run tox one of two ways:

```
tox -e flake8   # for a quick what-did-I-type-wrong check
tox --parallel  # run em all, but, like, in parallel so I don't retire before you finish
```
turning on flake8 made linters very shouty
flake8 now runs its own environment that is JUST flake8 so it's fast, it doesn't install the package for itself;
added lots of flake8 ignores for formatting issues I'm unclear on
added a lot more ignores and some file-specific ignores, but generally got it to a point where flake8 becomes proper useful.
This fixes all of the tox warnings that I haven't disabled, and several that I did.
fixed a few actual problems while I was at it.
When I first created the 'str' methods, it was because I wanted a display name but repr and str both confused me at the time; I've now renamed it text but apparently I forgot that in Py2.x def str() worked like __str__...
@kfsone
Copy link
Contributor Author

kfsone commented Apr 24, 2024

nothing heavy, just some routine commands I could run on each box to make sure the data didn't just get ingested but it also didn't get digested ;)

@eyeonus
Copy link
Owner

eyeonus commented Apr 24, 2024

You could do the commands tox does to check nothing is broken?

@eyeonus
Copy link
Owner

eyeonus commented Apr 24, 2024

Maybe ask around on the EDCD discord? I'm sure the people there would have lots of suggestions

@kfsone
Copy link
Contributor Author

kfsone commented Apr 24, 2024

I haven't been able to test on the mac, it takes forever downloading the json file and without updates: the http headers tell us the gzip'd size but requests decompresses the file as it goes, and that .json file is, uhm, lets say very compressible? It's not compressing by a percentage, it's nearly compressing by an order of magnitude :)

@eyeonus
Copy link
Owner

eyeonus commented Apr 24, 2024

Yeah, it'd be nice if the download had progress updates.

Also, yeah, 1.4 GB -> 8.9 GB is quite the compression.

@kfsone
Copy link
Contributor Author

kfsone commented Apr 24, 2024

this will conflict after you merge the transfer-timeouts fix, but after that if you're ready to merge this one I think she's good.

@eyeonus eyeonus merged commit 7c60def into eyeonus:release/v1 Apr 24, 2024
10 of 11 checks passed
@eyeonus
Copy link
Owner

eyeonus commented Apr 24, 2024

Something got borked:

elite@quoth tradedangerous]$ trade buildcache -f
NOTE: Rebuilding cache file: this may take a few moments.
/home/elite/.local/bin/trade: /home/elite/tradedangerous/tddata/TradeDangerous.prices:19 ERROR Unrecognized line/syntax,
got: "Agronomic Treatment 5430 5568 ? 489H 2024-04-23 20:49:51". 

That's the first commodity in the file, and it has the correct format.

@eyeonus
Copy link
Owner

eyeonus commented Apr 24, 2024

I'm fairly certain it's from the changes made to newItemPriceRe in cache.py

@eyeonus
Copy link
Owner

eyeonus commented Apr 24, 2024

Yup, that was it. Reverting it fixed the problem.

@kfsone kfsone deleted the kfsone/cleanup-pass branch April 25, 2024 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants