Server hosting upgrade #148
Comments
As hoped, native support from the OS. Saves me a lot of cocking about.
|
I already have one: gate.kfs.org (I worked at Demon '93-00, so post, gate, etc. are in my blood. Also, I can't type "demo" to save my life, the muscle memory to hit 'n' is still too strong). STRONGLY recommend you get a virtual env set up so the system python doesn't get squirrely.
Maybe add some aliases to your startup shell rc. And don't be doing it as root, lol. |
Hey - I hadn't created any user accounts at that point, there was only root. Also, I got into the bad habit of doing stuff as root on SunOS 3.5 back in '89 and it's ingrained and unlikely to stop any time soon. I don't use sudo nearly as much as I ought to. That said, I currently run the TD server as an unprivileged user and will be doing so again. System python (out of the box) is 3.9, but 3.12 is a package from the centos repo, so it's also a system python. I will add appropriate aliases for the TD user when I get around to creating it. I've got to set up apache before I can do that, or it's not going to serve much of anything. I'll look at venv (it's a python thing, so naturally I know {next to} nothing about it). ObAnecdote: Guess who typed Is gate.kfsone.org a fixed IP or a DDNS entry? |
Note to self: Listener needs distutils, but this is now rolled into setuptools in Python 3.12, you forgetful bastard. |
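(For the record, a quick way to sanity-check that the setuptools shim covers it - the interpreter path and output here are illustrative:)

```bash
# Python 3.12 dropped distutils from the stdlib; setuptools ships a shim,
# so installing it makes plain "import distutils" work again.
python3.12 -m pip install --upgrade setuptools
python3.12 -c "import distutils; print(distutils.__file__)"
# expect a path ending in .../site-packages/setuptools/_distutils/__init__.py
```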
Kids!
quick recap, no prejudice:
Running the python binary from inside a venv, it will constrain itself to only the packages installed in/by that environment, and likewise if you invoke its pip. More usefully, "activating" the venv sets environment variables so that you don't have to keep remembering which python you're using, and typically morphs your prompt to show the venv name - a rough sketch of the whole flow is below:
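A sketch with illustrative paths and names (not the exact commands from the original comment), showing the create / direct-invoke / activate flow:

```bash
# create a venv tied to the interpreter you want
python3.12 -m venv ~/td-venv

# calling the venv's python or pip directly only sees packages inside the venv
~/td-venv/bin/python -m pip install requests
~/td-venv/bin/python -c "import requests; print(requests.__version__)"

# or activate it: sets VIRTUAL_ENV, prepends the venv to PATH, tweaks the prompt
source ~/td-venv/bin/activate
# prompt now looks something like: (td-venv) user@host:~$
```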
There's a ton of stuff that leverages these sandboxes: IDEs like VS Code, PyCharm, Eclipse, Qt Creator; tools like tox. There are a couple of super-useful python tools that leverage it: poetry (which assumes a certain directory layout and uses it to make it trivial to get larger python apps up and running) and pipx (which lets you run python packages without installing them).
When I was working at Demon, I got to watch Ronald Khoo in a hurry do:
(ermin being one of the two hot-swapping centers of the world for the network, the other being? hint: center of world = cow)
|
I don't have access to the router for my LAN at this point. If and when that changes I'll look into getting a DDNS set up. |
I hope I remember correctly and you said you use Ubuntu. This might help some instructions. That said, we've managed this long without your having ssh access - it probably would be handy, but if it looks like a struggle, I'm sure we'll continue on just fine. |
Nope. Arch. But! https://wiki.archlinux.org/title/Dynamic_DNS |
"all" boils down to creating the virtualenv dir, and activating it in the .profile or .bashrc file. Commands like pip will happily uninstall packages or install upgrades in a different part of the python modules path than they were originally in. virtualenv is the secret sauce to python not intermittently needing obliterating and reinstalling.
next time you open a login-shell as tradeuser, it will be using its own python environment safely, and recreating it is a simple matter of deleting the venv dir and repeating those commands. |
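For concreteness, a minimal sketch of that setup - the user name, venv path and rc file are assumptions, not a prescription:

```bash
# as tradeuser: create the venv once
python3.12 -m venv "$HOME/td-venv"

# make every login shell drop into it automatically
echo 'source "$HOME/td-venv/bin/activate"' >> "$HOME/.bashrc"

# recreating it from scratch is just delete-and-repeat
rm -rf "$HOME/td-venv" && python3.12 -m venv "$HOME/td-venv"
```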
All I'm seeing here is how python is a fundamentally broken piece of software that can't keep its versioning straight and breaks things, so rather than fix it, they invented this whole business I now am expected to learn. I mean, your cut and paste drop in to .bashrc is fine, but does that mean it will take a whole copy of python, /usr/lib/python3.n and whatever else, then do I install modules there or into the system? Do I do this for every python app leading to multiple copies of python strewn across the machine? Is that their grand solution? What a mess!
This is exactly the kind of crap I am upgrading the host OS to avoid, with a user having its own environment, bin, lib, share, etc because the OS won't run modern software. Now you are telling me I still need to make a special environment because python sucks by design! |
You don't have to use virtual environments. IFF you want to keep X from interfering with Y, you can put one or the other into a venv, like, for example, if your system (X) needs to use python2, but your program uses python3 (Y). If you're only planning on using this server for the purposes of hosting the TD server, then I don't personally think you need to bother with it. |
They aren't supposed to interfere at all, that's why I'm kinda horrified. All the binaries have the version (e.g. python3.12, pip3.12) and the support files are also in version specific directories (e.g. /usr/lib/python3.12). The scenario Oliver is painting appears to be one where python will mess itself up regardless. |
It might, if you run things using |
In principle, they shouldn't interfere. In practice they won't interfere unless there is 3rd party influence such as ... using an operating system package manager to install parts of python. It's really just a microcosm of the issue that all package managers end up with: running brew and port and then updating a package from the apple store? You could use yum, snap, and apt all on the same machine without problems until one of them needs a different GLIBC. I'm jaded because I've been hot-supporting my employer's python toolchain and over the last year it's been kicking my ass as a result of different cadences of python adoption/keep-up between various different vectors. It's taken for granted that MacOS ships with python, but actually https://developer.apple.com/documentation/macos-release-notes/macos-catalina-10_15-release-notes#Scripting-Language-Runtimes
You'll also find various inflection points with python on modern machines these days where python-facing things will say "oh, I can't do that, because such-and-such is managing the packages". It isn't really python's fault - they have a well-spelled-out way to organize things - and people have trampled it. And - for what little it's worth - virtualenv existed long, long before this was really a serious issue; it was created mostly for CI and development purposes. Another way to think of it is a "green install" of python that knows to stay inside its box. It will try to use symlinks, but it can only do that so much. |
I was going to demonstrate the os-package vs python-package conflict with centos and yum, but TIL: centos has been discontinued? https://www.redhat.com/en/topics/linux/centos-linux-eol |
Eg1. Imagine Wheel 2.5 moves exceptions into "wheel.exceptions" and errors into "wheel.errors".
Eg2. yum/apt again know nothing about Python packaging so they remove the bits of wheel they put there, but the directory is not empty so it is left there. wheel.py goes away but exceptions.py remains, and weird behaviors ensue - the package appears to be both there and not there. Eg3. During this update, your python-level reliance on the zlib package counts for nothing and stuff breaks. Other places I've seen things get fouled up:
What makes it start to get really murky is when you learn that a few years ago pip added support for user-installed packages - the equivalent of /bin, /usr/bin, /usr/local/bin, /opt/bin etc. "I can avoid that by not using it" - except there are 3rd party things that specifically use --user without asking; it was considered a best practice for a while, until people realized that you can now have wheel 2.5 installed in user-pkgs and wheel 2.1.3 installed in site-pkgs, and oddly "from wheel import exceptions" works while "wheel" itself is the 2.1.3 version... It's also no different than the problems you run into when you have multiple compilers for a given language installed; a recurring pita as a linux dev these days is that if you install gcc before clang, unless you're very specific with env vars and command-line parameters, you will end up unable to compile really random things because clang is using some of gcc's headers... I wouldn't (don't) allow external users into a machine I don't want to have to maintain without 'jailing' them into a virtualenv with:
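(Something along these lines - a sketch of the idea rather than the exact commands, with made-up paths:)

```bash
# give the external account its own venv...
python3 -m venv /home/guest/.venv

# ...and activate it in every login shell, so their pip installs land in the
# venv instead of anywhere the system python cares about
echo 'source "$HOME/.venv/bin/activate"' >> /home/guest/.bash_profile
```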
for all I know it hasn't actually saved me any headaches, but I do know that I had a lot of such headaches before I started doing it :) |
I'll read the longer post in a bit, but Centos has been EOL'd; Centos Stream is the replacement. The new host is rocking Centos Stream 9. Considering they kept the same version numbering, I dunno why they felt the need to change the name and make a fuss about it, but whatever. Other than the basic installation of 3.12, I'll be installing modules (any and all modules) with python3.12 -m pip install, so python will be handling that. Anything the system might need will default to grabbing (and I guess maybe breaking) python3.9, which is the default - I won't be touching or using that for TD, so I think we'll be fine. Otherwise I have to learn and understand venv properly and completely, because I'll have to support it. External users won't have root access, so they wouldn't be able to add modules etc anyway, unless they build their own python in their $HOME, which is what I've had to do on the older host and what I'm trying to avoid in general. |
@kfsone Can you give me an alternate contact method which is safe for me to send you a password for login to Quoth? |
If you view trade.py it's in there - I'd rather not also add it in a ticket comment, I don't want copilot getting all excited. |
Trom/Eye, catch me up on what the goal here is - is a different layout of the data a possibility? I found one that can reduce a 2.5GB listings.csv to 1GB - so at the very least that's ~2GB less to write, transfer, store, and read. It also gzip compresses (with default compression) down to 102MB, where the listings.csv compresses to 250MB. I made an experimental, jsonl-based format, but the tricks might be worth using in the csv.
In the second, we also had the numeric fields turned into actual numbers for us as we went. The tricks I used might be applicable to the csv format to reduce that:
It looks like this:
The receiver will have to patch the item-id list - I compacted that by printing the change in id number.
The first line summarizes the format: "epoch" is the number I subtracted from each timestamp; the item_ids are an incremental list, so the second entry isn't item id 1, it's 128049152+1, etc; "doc" lets us include a link to an explanation. After that, the remaining lines are either a dict for a new station, or a list.
s is the station id relative to the previous one - so this is 0 + 128000000.
item_index, supply, demand - here the item index is 0, so it is 128049152 from the first line; there are 131 more lines for this station and then
s is the station_id offset relative to the last station, i.e. this is 128000000 + 512. Code: https://gist.github.com/kfsone/99ed69bf64103de8577210b5220c7b74 |
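For illustration, a minimal sketch of the delta-encoding idea described above - the field names and layout here are illustrative, not the exact gist format:

```python
import json

def encode(listings, item_ids, epoch):
    """listings: (station_id, item_id, supply, demand, timestamp) rows, sorted by station."""
    index_of = {item_id: i for i, item_id in enumerate(item_ids)}
    last_station = 0
    lines = []
    for station_id, item_id, supply, demand, ts in listings:
        if station_id != last_station:
            # new station: a dict carrying the offset from the previous station id
            lines.append(json.dumps({"s": station_id - last_station, "t": ts - epoch}))
            last_station = station_id
        # per-item rows become short lists keyed by index into the item_ids table
        lines.append(json.dumps([index_of[item_id], supply, demand]))
    return "\n".join(lines)
```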
Second option: A binary format. I don't yet know whether this is something I'm going to suggest for TD to replace the .prices and .db files with, or a transfer format.

A large part of the performance overhead at the moment is getting the data off disk and into a useful format, and then searching it. With SQLite we're paying for swings and roundabouts, and then using the space to play football. I think the next phase needs to be moving to our own binary format(s).

I've written another experiment that reads in a clean listings.csv and then builds a binary-formatted file wastefully - that is, for every station, it allocates enough room to store 512 items, or 8kb of listing data per station. Each station then has two 64-byte "availability masks" which are stored in a separate, contiguous set of blocks so that you can rapidly scan for stations selling specific items. This turns a 2.5GB listings csv into a 2.94GB .data file.

What I could perhaps do is put all the files together in a subdirectory so that if we need to regenerate/update it, we can do it in a second folder and then swap folders atomically.

This is very much a draft version, and honestly doing bit-twiddling in python feels ... icky; I'd be more inclined to do it in c/c++/go/rust/zig. But even so, the draft python version reads the 2.5gb listings.csv file in 105 seconds, and writes out the data file (headers, items, stations, supply, and demand) in 30s, with lots of room for optimization - it could quite reasonably be parallelized so that the writes happen while listings.csv is being read. Code: https://gist.github.com/kfsone/68ae786cd3fe1e4fca36bfc222934900 |
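To make the "availability mask" idea concrete, a small sketch - the 512-slot / 64-byte figures come from the description above, everything else is illustrative:

```python
ITEMS_PER_STATION = 512   # fixed item slots per station => 64-byte mask

def build_mask(item_indices):
    """Set one bit per item index the station actually lists
    (one mask each for supply and demand in the scheme described above)."""
    mask = bytearray(ITEMS_PER_STATION // 8)
    for idx in item_indices:
        mask[idx // 8] |= 1 << (idx % 8)
    return bytes(mask)

def sells(mask, idx):
    """Scan a station's mask without touching its 8KB listing block."""
    return bool(mask[idx // 8] & (1 << (idx % 8)))
```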
I'm fine with any changes to the data format, both for import purposes and storing in the database, as long as whatever is used has the appropriate information, such as using the 'official' FDev IDs and names for everything, takes account of movable stations, and isn't horribly difficult to squash the bugs we will inevitably encounter. As far as the timestamp thing is concerned, the reason stations have a timestamp and the items have their own timestamp as well is that a station's market is not always updated whenever a station is. I'm not entirely sure, and the EDCD schemas don't really help give me a better clue, but it looks like a station gets updated whenever a CMDR docks, but the market is only updated when a CMDR opens the market after docking. I could be wrong about this, but I do know it is possible to have a different timestamp on the items versus the station itself; I've seen quite a few instances of this occurring. It might be better to use the market timestamp. I believe the station timestamp >= the market timestamp, but I'd have to find a few instances of them being different in order to verify my assumption |
Regarding specifically the import data format, I believe @Tromador would concur, but if not I'm certain he will say so. |
My goal is to get the hosting up to date and thus get rid of all the kludges I have had to implement to get an application which relies on modern software running on a host with a twenty year old operating system. I've been doing a couple of bits with the DNS (or rather, sending instructions to my friendly hostmaster) which hopefully will be done by close of business tomorrow and I should be able to start migrating services then, including TD. From my point of view it's then done.

Anything with data formats you guys want to change is fine with me and probably will be easier and more supportable on the current CentOS than what we have now. I have mariadb installed already or can set up postgres. I've also worked with stuff like mongodb in the past if you want to move away from sql completely, makes little difference to me. If you want to go to a proprietary binary format, that's fine too. Just so long as it's supportable.

I don't think eyeonus realises just how much I firewalled a lot of stuff away from him when we were having a lot of problems, but there was a period when I was getting emails or private messages every other day that listener had died or some such. Wasn't any point bothering eyeonus with issues he knew about, so I just did what was necessary to provide service - doing the standard unsung hero sysadmin tasks that are traditionally unappreciated lol. So I have to be able to support whatever you want to implement. Beyond that, the world is the mollusc of your choice.

These are long term goals though, no? I want TD on the new host asap. I'll then want some testing help, then once we are happy it's a DNS change and we are done. |
@eyeonus separating the "station intrinsic" date from the "market update" date into two fields would still be a huge gain. I suspect - but defer - that it's very unlikely that you'll get out-of-sequence updates to an item over time. For the per-item listing to be valuable you'd need something like: T+0, stn 52 items 1, 3, 17, 31 updated - and to cope with that we have to do a per-line-item date transformation and comparison, which is painful when multiplied out by the number of rows we have :) |
(incidentally, I noticed in the current listings.csv there are records from 2020, and also some records that have a price or units value of 2-to-the-power-of-31 minus one, which looks suspiciously like bad data to me. The next largest value was an order of magnitude smaller...) |
I do know all the items always have the same timestamp, since they all get updated by the same commodities schema message, so I don't think we'd need to have a full timestamp for every item, just the first. I don't know how easy that would be to implement in comparison nor how much it would save. Regarding old/bad data, I don't know how much we could do about that, except maybe disqualify it if it seems hokey when it's encountered in the source data? I am surprised that there exist stations that haven't been visited even once in several years, but then again there are a lot of stations. |
Remember that I was asking about old stations the other day? I did visit a couple with the oldest data, the ones as you say not visited for years, and it quickly became obvious why. From a trade viewpoint, they just suck. Unless there is some good non-trade reason to go there, it's hard to imagine any trade algorithm recommending a Commander visit them. It may be that in some cases old garbage data is the cause, but mostly they really have little to offer players. Often they don't have much in the way of other services, may not have large pads, are in out of the way places, or a combination of these. The only reasons I can think of to go there are, as I did, out of random curiosity, or possibly an altruistic desire to upload new and clean data. My own trip didn't last long; visiting these places was awkward and boring. |
Hmm, I see no reason not to just remove them from our data, then. Since the listener already has an automatic maxage set, they won't be re-added from the Spansh data just from being too old. Maybe we should add a purger to the listener to remove old data automatically? |
I've always been loath to remove functionality that someone may want on occasion for some odd reason. As with many things that TD does, I couldn't find any alternative application which can give such reports on unvisited stations. So - if we keep it in the listener then I guess anyone who does want ancient data can import it to their local db with the spansh plugin and that will be fine. The option will be there for anyone who really wants it. |
No need to remove it - remember, guys, I'm not playing ED at the moment so my questions are often seeded by ignorance not some unspoken ultra knowledge :) |
It's not removing anything if there's another way to do it. Keeping a leaner standard dataset whilst leaving a method to get the larger dataset if desired seems like a good idea to me. It's similar to the skipvend option in eddblink. If a player has a use for the extended data, they can get it, whilst for normal use we have a more efficient solution. |
The server seems to be no longer updating. Last updates shown are 21st and 22nd May. |
Nothing to do with the upgrade, we haven't migrated yet. Opened a new issue. |
So, should I implement automatic purging? |
Sure for the server, depending on processing time for purge vs time saved elsewhere. Obviously if purge adds a huge time burden then it's not so helpful. |
How old would you say is a good age to purge? 1 month? 1 year? I'm guessing somewhere in between.... I think the best place to put it would be in the update checker, immediately before calling the export. |
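A sketch of what an automatic purge could look like, assuming the listener keeps market rows in an SQLite table with a modified timestamp - the table and column names here are assumptions:

```python
import sqlite3

def purge_old_listings(db_path, max_age_days=30):
    """Delete market rows whose last update is older than max_age_days."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success
        conn.execute(
            "DELETE FROM StationItem WHERE modified < datetime('now', ?)",
            (f"-{int(max_age_days)} days",),
        )
    conn.close()
```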
No, I think a month is fine. Anything older than that has increasing chances of being increasingly inaccurate and a month's worth of data is still what I like to call "really a lot". Certainly plenty to make money from trading, which ultimately is the point of the application. If people want the older data for any reason, they can import spansh once in a while. Once they've used such data to visit an "old data" station, then (presuming they are using edmc or equivalent) that goes into current data in any event. Honestly we'd probably get away with a fortnight and it would be fit for purpose, I think a month is more than plenty.
|
Alright, give me a bit to implement, I'll message you on discord once I've pushed it. |
Use
All using default settings and on an NVME m2 drive.
As you note in your next comment, you really should just use a binary format at this point. However, if printable characters really are strongly needed, here are some more tricks, since frankly the file's at this point already not particularly human readable anyway.
Oh, also,
That's correct, upon docking with a station the station's basic information gets updated (provided the commander has relevant telemetry tools), but only upon opening the outfitting page, the shipyard page, or the commodities page do the related journal files get updated, and consequently only then does new data get sent to EDDN. This also means it holds true that a station's update time must be less than or equal to its commodities' update times, for those prices to be "current" (i.e. someone didn't just dock then leave). |
You guys might want to move your data format & compression discussions out of this ticket. Once I'm happy the existing application is working on the new host I do plan to close this ticket, so off topic stuff might be lost. Unless it's directly related to setup on the new host, can it be discussed elsewhere please.

A note about hardware though, being as nvme m2 drives were mentioned - The TD Linux host machine is a vhost running (I believe) on a Windows server with Hyper-V. I have no clue on the precise hardware spec of the windows host, but bearing in mind when I was a director I imagine the plan is still making the best use of older hardware because it fulfills the requirement and is much less expensive. I would not be surprised to find a large array of spinning rust, possibly in a separate rackmount enclosure from the CPUs (likely lots of Xeons). When I did work there, we had half a dozen racks full of stuff in an air conditioned server room with a pair of redundant leased line Internet connections coming from two different directions (was fun getting permission to dig up the car park). |
Yea, that's a good idea. Though don't worry about closing the issue, GitHub issues retain their history even when closed (unless someone goes out of their way to delete it). Did want to clarify, though, for the hardware thing: I figured the TD server was going to be more storage-speed gated than CPU gated, based on what you've mentioned around in the past. That's actually why I mentioned NVME M2s, my intention was to basically say "these are purely CPU numbers, storage related speeds won't make it any faster at least", i.e. the performance could only go down from my results above due to storage. In reality I'd expect that server to compress a bit better actually, due to likely having better CPUs / more CPUs than my desktop computer has, granted that really only will matter if you wanted to use |
Actually, honestly not sure about that. Because TD runs on a single core, a lot of the possibilities of threading on Xeon CPUs aren't taken advantage of. We briefly (by we I mean I bitched and eyeonus actually did the work) flirted with the idea of proper multiprocessing rather than the current threading (which in python means single core) but it was determined to be "hard and maybe to be addressed later" on account of (iirc) some global variables which need to be available to all threads at all times. So in practice, my home PC runs TD faster, because the individual cores on my i9 are normally faster. Since going to python 3.12 it does appear to offload a small part of the load onto a second processor, which I guess is python being clever in some way, but still the speed of a single Xeon core on the server isn't particularly special. If it could be rewritten to take advantage of proper SMP, then I imagine the server would be blindingly fast.

That said, there are a couple of tasks (spansh import for example) which do seem to happen much faster on the server, and I think this must be due to the speed of the storage. I would actually be astonished if storage is really a bottleneck. I don't know if they are using spinning rust or solid state atm (I guess I could ask... ok, email sent) but either way it will be many discrete drives all serving data simultaneously, so it's almost certainly the speed of the bus, not the speed of the individual drives, that bottlenecks the storage, and they won't have skimped on that (or at least we didn't when I was a director and the other two directors haven't changed).

Progress report: I have the web server working, finally. For some reason, there always seems to be some idiot niggle with getting it going, the border firewall, internal firewall, the stupid context, something. And generally not the same niggle as the last time you did this with an earlier version of OS/Apache. I've asked for a temp hostname to be set up and once my hostmaster sorts that for me I should be able to give a test url for you lot to abuse. |
Storage solution is an HP 2050 SAN full of spinning rust in various RAID arrays via SAS. This is attached to a high availability cluster with failover, all on 8Gb fibre. So as suspected, slightly older, but proper data centre kit. |
And here I am without even a NAS to call my own ;) |
Anyone who is willing to test the new server, it's up and running. You'll need to change the URL in the EDDB plugin code from elite.tromador.com to test.tromador.com where appropriate. |
You can also pass |
Hey, last couple of weeks have been hectic, I should get time to contribute again this weekend. |
Has anyone been able to test this for me? (other than @eyeonus). One more successful test from someone would be a good confidence boost and then I can send it live. |
Sorry, Tromador. I am away from home for a week and have no pc with me. |
I've been using it to test the listener without any problems for a few hours now, if that helps your confidence |
I'll get the DNS switched next week if it's still being well behaved. And move the certificates I guess... ugh... hate fiddly jobs with lots of tiny files.
|
Seems to be working for me, had to drop to http instead of https in the test domain as the certificate still seems to be a bit iffy |
It takes approximately the same amount of space as a TD instance. There is a listener script I wrote which uses TD to produce the files the server provides. As to mirroring, that's Trom's bailiwick, I don't have any idea how to do that. |
Certificate should be fixed. I am extremely unwell. End of line. |
My colleagues at AOC have set me up with a minimal Centos 9 install, so I can start migrating services including TD to it.
Opening this as a kinda placeholder, to give updates, beg for testing and handle any requests for hosting changes, given we're starting with a clean slate.
@eyeonus I know I've mentioned this before and your eyes glazed over, but if you can set up a DDNS record for your machine, I can have the firewall opened up so you can SSH and/or SFTP directly to the machine. I would also be prepared to extend this to @kfsone if necessary (either on fixed IP address, or DDNS basis).
Dynamic DNS is a way to have a permanent DNS record for your home network, even if your ISP changes your IP address regularly. Usually you'll need to configure your router to send the IP address information to your DDNS provider.
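If the router can't do it, most providers also accept a periodic HTTP update from a machine on the LAN - a sketch with a placeholder provider URL, hostname and token:

```bash
# substitute whatever your DDNS provider actually documents
curl -fsS "https://ddns.example.com/update?hostname=myhost.example.net&token=SECRET"

# typically run from cron, e.g. every five minutes:
# */5 * * * * curl -fsS "https://ddns.example.com/update?hostname=myhost.example.net&token=SECRET" >/dev/null
```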
Lots more information about DDNS.
The place where I get free DDNS hosting.