Cleanup Pass #124
Conversation
Everything looks good code-wise, I just have to ask: why are some of your commits using the angular prefixes and some aren't?
Some tests are failing in tox:
Not familiar with the term? Also, I have another commit to add: I'd added a UNIQUE(fdev_id) to StationItem and dumpPrices did not like that. Saw your fix on the supply levels, good catch; also saw you had to fix my whitespace - I'm trying to turn off the settings I have locally to strip trailing whitespace. LMK if I'm still doing it.
Do you have contributor docs for setting up tox / etc?
@eyeonus BTW - what IDE/tooling are you using? (I mostly use VSCode, but when I'm trying to be good about python I use PyCharm, and I'm just setting that up for td now)
Looking at the errors: it looks like we were never actually running flake8 except as part of running all the tests. Once flake8 is enabled ... it warns. A lot -- see the incoming extra change to this branch.
One thing a lot of python linters try to enforce is the distinction between keyword arguments and ordinary assignment statements: the standard formatters/styles expect keyword assignments to have no space around the equals, and the same goes for default arguments, unless there's a type annotation, in which case the spaces come back (see the sketch below).
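To illustrate, a generic PEP 8 / flake8 example rather than code from this branch:
```python
# flake8's E251/E252 checks: no spaces around '=' for keyword arguments and
# plain defaults, but spaces when the parameter carries a type annotation.
def fetch(url, timeout=30):                     # default argument: no spaces
    return (url, timeout)


def fetch_typed(url: str, timeout: int = 30):   # annotated default: spaces
    return (url, timeout)


fetch("https://example.com", timeout=10)        # keyword argument: no spaces
result = fetch("https://example.com")           # ordinary assignment: spaces
```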
I'm happy to fix up code or configure linters - just lmk your preferences.
Running tox with this branch will warnapalooza at ya.
I'm at work right now, so just as a quick reply: I'm currently using Eclipse with PyDev, planning on switching over to PyCharm when I have the free time. Angular is a commit commenting style that python-semantic-release uses to determine whether to increment the version number, whether to publish a release, to automatically annotate the release with what changes occurred since the last release, etc.
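Hypothetical examples of the Angular-style prefixes python-semantic-release keys off; the exact bump each prefix triggers depends on the project's configuration:
```
feat(eddblink): handle UNIQUE(fdev_id) in StationItem   # typically a minor version bump
fix(tradecalc): correct supply-level handling           # typically a patch version bump
chore: strip trailing whitespace                        # typically no release
```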
Did not know about that, will adjust accordingly. Like it. I was doing a burn-thru of warnings and errors, getting myself back to grips with the code and tooling - I hopefully got us closer to being able to use pylint, which has saved my bacon hard several times, even though it's a pain sometimes. Tomorrow I'll actually fix the errors - I think I know what they are already, and I wanted to have my IDE running so I could step thru meaningfully and see for sure :)
I'll fix the commit messages in the morning. For now, I just checked in a fix for what was causing the test failures.
force-pushed from 0f93713 to c9b28b9
That should clean up the descriptions.
Regarding the fdev_id in the Item table: we're using the fdev_id as the item_id nowadays, so it wouldn't be a bad idea to refactor the fdev_id away completely. I believe the only things that use the fdev_id are my eddblink plugin, the spansh plugin, and the listener, which is a different repo.
On another note, that's a lot of work done while I was at work. Color me impressed. And thanks for cleaning up the commit messages.
Just a heads-up, someone wrote commandenv.colorize() and used it to color the output in tradecalc; not sure if that'll interfere with the rich coloring you've been doing.
@eyeonus I saw that, and left it alone -- the colorizing is just automatic by writing through rich, and the colorize implementation in tradecalc could be replaced with use of the rich theming. Also - the build failure above is failing on deploying to pypi, which I would hope I'm guaranteed to fail on from my branch :)

Right now I'm looking at some ideas from the "1 billion rows challenge" and how people solved that with python. The first kick in the teeth was a nice simple one: it's way faster to process files in binary or even ascii than utf-8. In the code where we're already using "blocks()" to speed it up, you can make it 8x faster by opening the file in 'rb' and then counting b'\n' (see the sketch below). I'm doing more tests.

Also - I saw the optimization to go back from building the list to periodically closing the database. Agh. We're caught here between just needing somewhere to store the data and using SQL because we have a database as our store. The idea really was that people would go thru TradeDB directly, with sqlite being a fallback way to access the data if something was amiss or went wrong. I wonder if we shouldn't just get rid of the DB back end entirely and just pickle the python data. That used to be slow but it's fast as fek these days, and it would mean one source of truth rather than two. We already make it so that during imports you build a -new- database with the changes, so this wouldn't be any different. It would mean that you couldn't inspect the data with sqlite or anything.
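A minimal sketch of that line-counting trick, with an assumed helper name and block size rather than the branch's actual code:
```python
# Count lines by reading raw bytes and counting b'\n', skipping UTF-8 decoding
# entirely; on large dumps this runs several times faster than text-mode iteration.
def count_lines(path, block_size=1 << 20):
    total = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(block_size)
            if not block:
                break
            total += block.count(b"\n")
    return total
```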
Yup. That one only "succeeds" if the branch is
As far as the DB is concerned, everything in TD uses it, including TradeDB. The csv and prices files exist solely for rebuilding the DB if something goes wrong and it gets corrupted, lost, etc.
Also: https://docs.python.org/3/library/pickle.html:
Yeah, we wouldn't be replacing the exchange formats with pickle. It's just a super fast way to "freeze" a snapshot of python's memory representation of objects and thaw it back out later, something along the lines of the sketch below.
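A stand-in example (not the original snippet from this comment) of that freeze/thaw round trip, using a made-up `systems` structure:
```python
# Dump an in-memory structure to disk and load it back; HIGHEST_PROTOCOL picks
# the fastest, most compact binary format available.
import pickle

systems = {"Sol": {"stations": ["Abraham Lincoln", "Daedalus"]}}

with open("systems.pickle", "wb") as fh:
    pickle.dump(systems, fh, protocol=pickle.HIGHEST_PROTOCOL)

with open("systems.pickle", "rb") as fh:
    restored = pickle.load(fh)

assert restored == systems  # the snapshot round-trips intact
```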
But I don't think it does as good a job at save/restore for entire object trees (as opposed to flat lists/tables of complex objects), which is what we'd end up with if System holds a direct link to Station rather than just an id reference. I took a look at what it would require to pull off sanely, and the work to describe the structures nicely and modern-pythonically also takes you most of the way towards what would need doing to describe the schema to SQLAlchemy, at which point, when SQLite is a bottleneck, people could simply swap the backend engine.
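A rough sketch of what "describing the schema to SQLAlchemy" could look like; the class and column names here are illustrative guesses, not TradeDangerous's actual schema:
```python
# Declarative ORM models: relationship() gives System a direct link to its
# Stations, while ForeignKey keeps the id reference for the SQL side.
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class System(Base):
    __tablename__ = "System"
    system_id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    stations: Mapped[list["Station"]] = relationship(back_populates="system")


class Station(Base):
    __tablename__ = "Station"
    station_id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    system_id: Mapped[int] = mapped_column(ForeignKey("System.system_id"))
    system: Mapped[System] = relationship(back_populates="stations")
```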
Let me just say, from my attempts to rein in the import time on eddblink with what is now a 2.5 GB file, that SQL is definitely a bottleneck. It runs pretty quickly for the first 22,000,000 entries or so, but once the DB is over 3 GB in size, it slows way down. I mean, it gets slightly slower the whole way through (inserting item 20,000 takes slightly longer than inserting item 10,000), but the difference becomes noticeable around 50% and gets worse from there.
- Adds Progresser to spansh_plugin as an experimental UI presentation layer using rich,
  - active progress bars with timers that run asynchronously (so they shouldn't have a significant performance overhead),
  - opt-out to plain-text mode in case it needs turning off quickly,
- presents statistics with a view to giving you a sense of progress,
- small perf tweaks
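For a sense of what the rich side of that looks like, a generic sketch rather than the plugin's actual Progresser class:
```python
# A rich progress bar with an elapsed-time column; rich refreshes the display
# on its own thread, so the work loop only pays for cheap update() calls.
import time
from rich.progress import Progress, BarColumn, TextColumn, TimeElapsedColumn

with Progress(
    TextColumn("[bold blue]{task.description}"),
    BarColumn(),
    TimeElapsedColumn(),
) as progress:
    task = progress.add_task("Importing stations", total=1000)
    for _ in range(1000):
        time.sleep(0.001)              # stand-in for real import work
        progress.update(task, advance=1)
```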
The idea is that flake8 is really fast, so you give it its own environment and it comes back and says "YA MADE A TYPO OLIVER YA DID IT AGAIN WHY YOU ALWAYS..." *cough*, sorry, anyway. I usually run tox one of two ways:
```
tox -e flake8   # for a quick what-did-I-type-wrong check
tox --parallel  # run 'em all, but, like, in parallel so I don't retire before you finish
```
turning on flake8 made linters very shouty
flake8 now runs in its own environment that is JUST flake8, so it's fast and doesn't install the package for itself; added lots of flake8 ignores for formatting issues I'm unclear on
added a lot more ignores and some file-specific ignores, but generally got it to a point where flake8 becomes proper useful.
This fixes all of the tox warnings that I haven't disabled, and several that I did.
fixed a few actual problems while I was at it.
When I first created the 'str' methods, it was because I wanted a display name but repr and str both confused me at the time; I've now renamed it to text, but apparently I forgot that in Py2.x def str() worked like __str__...
force-pushed from 5137077 to 0396566
Nothing heavy, just some routine commands I could run on each box to make sure the data didn't just get ingested but it also didn't get digested ;)
You could run the commands tox does, to check nothing is broken?
Maybe ask around on the EDCD discord? I'm sure the people there would have lots of suggestions.
I haven't been able to test on the mac; it takes forever downloading the json file, and without progress updates: the http headers tell us the gzip'd size, but requests decompresses the file as it goes, and that .json file is, uhm, let's say very compressible? It's not compressing by a percentage, it's nearly compressing by an order of magnitude :)
Yeah, it'd be nice if the download had progress updates. Also, yeah, 1.4 GB -> 8.9 GB is quite the compression. |
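One possible way to get those progress updates despite the decompression, sketched with a placeholder URL and filename rather than the plugin's actual download code: compare response.raw.tell(), which counts compressed bytes pulled off the wire, against the Content-Length header.
```python
# Stream the download, writing decompressed chunks, while reporting progress
# against the gzip'd size the server advertised in Content-Length.
import requests

url = "https://example.com/galaxy_stations.json"   # placeholder URL
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    compressed_total = int(response.headers.get("Content-Length", 0))
    with open("galaxy_stations.json", "wb") as out:
        for chunk in response.iter_content(chunk_size=1 << 20):
            out.write(chunk)                        # decompressed data
            if compressed_total:
                done = response.raw.tell()          # compressed bytes read so far
                print(f"\rdownloaded {done / compressed_total:6.1%}", end="")
print()
```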
This will conflict after you merge the transfer-timeouts fix, but after that, if you're ready to merge this one, I think she's good.
Something got borked:
That's the first commodity in the file, and it has the correct format. |
I'm fairly certain it's from the changes made to |
Yup, that was it. Reverting it fixed the problem. |
I'm trying to revisit various decisions I'd made during the early days that now cause problems either in performance with IDEs or get linters really bent out of shape.
There's also some work here to try and beautify things - although I'm introducing those sparingly and doing them in-place in the plugin where they seem relevant.
Some small amount of performance tuning, but it's not really going to make a noticeable difference yet.
Strongly recommend you review the individual changes; I think trying to read the entire diff would be brain-hurting...
I'm creating a pull request before I've had a chance to finish watching a complete import, mostly because of the size, and to give Eyeonus a chance to ask others to review and/or reject/request changes.