
Distribute chunks over slaves #53

Open
wants to merge 2 commits into master

Conversation

@JJK801 commented Feb 19, 2017

Hi,

This PR aims to distribute transcoding over multiple slaves at a time.

Some input arguments are tweaked (-ss, -segment_start_number) and others added (-t) in order to delegate a set of segments (defined by SEGMENTS_PER_NODE, default 5) to each host and iterate over the timeline as the slaves finish their sets (allowing hosts with various CPU capacities on the same network).
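
For illustration only, here is a rough sketch of how that per-slave windowing could look; SEGMENTS_PER_NODE, segment_time, base_args and build_slave_args are placeholder names for this sketch, not the actual code in prt.py:

# Hypothetical sketch, not the PR's actual implementation: each slave gets a
# window of SEGMENTS_PER_NODE segments by adjusting its ffmpeg arguments.
SEGMENTS_PER_NODE = 5


def build_slave_args(base_args, segment_time, window_index):
    """Return ffmpeg args restricted to one slave's window of segments."""
    window_length = segment_time * SEGMENTS_PER_NODE
    start = window_index * window_length
    return base_args + [
        "-ss", str(start),                      # seek to this window's start
        "-t", str(window_length),               # transcode only this window
        "-segment_start_number",                # keep the output segment
        str(window_index * SEGMENTS_PER_NODE),  # numbering consistent across slaves
    ]

(In a real ffmpeg invocation the placement of -ss and -t relative to the input matters; the sketch only shows the bookkeeping.)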

Tested on my configuration and works like a charm:

  • 1x Banana Pi M1 => PMS + PRT Slave + various services
  • 1x Raspberry Pi 3 => PRT Slave + various services

Transcoding is very fast considering the CPUs used (it flawlessly plays 1080p movies over the network).

Because of the NFS shares, it needs a fast local network between the server and slaves for high-resolution movies.

Also be careful to put the most powerful CPUs first to reduce the initial load time.

TODO:

  • Speed up slave availability checks (2.5 to 3 seconds on each iteration with my configuration); a possible approach is sketched below
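
(Not part of this PR.) One possible way to cut that per-iteration delay would be to probe all slaves concurrently instead of one after another; the sketch below assumes Python 3 (or the futures backport on 2.7), and get_load() is a hypothetical stand-in for however PRT actually checks a host:

from concurrent.futures import ThreadPoolExecutor


def check_hosts(hostnames, get_load):
    """Probe every host in parallel; return {hostname: load} for reachable ones."""
    with ThreadPoolExecutor(max_workers=max(len(hostnames), 1)) as pool:
        loads = dict(zip(hostnames, pool.map(get_load, hostnames)))
    # Drop hosts that returned no load (offline or unreachable)
    return {host: load for host, load in loads.items() if load is not None}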

prt.py Outdated
# If no load is returned, then it is likely that the host
# is offline or unreachable
log.debug("Couldn't get load for host '%s'" % hostname)
del servers[hostname]

I'm not sure that you're able to remove elements from a dictionary whilst iterating over it.


This will work in python 2.7, but not python 3.x. In python 3.x dict.items() returns an iterator, whereas in python 2.7 it does not.
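
For reference, one way to make the removal safe on both Python 2 and 3 is to iterate over a snapshot of the items, so the dict itself can be modified inside the loop (sketch only, assuming servers maps hostnames to a load value that is None when the host is unreachable):

for hostname, load in list(servers.items()):
    if load is None:
        # If no load is returned, the host is likely offline or unreachable
        log.debug("Couldn't get load for host '%s'" % hostname)
        del servers[hostname]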

@JJK801 (Author) Feb 20, 2017

OK, thanks. I'm not a Python developer at all; I will fix it ASAP.

prt.py Outdated
ss+=segment_time*SEGMENTS_PER_NODE
q.put((proc, hostname))

if init is True:

if init
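
That is, drop the redundant "is True" comparison and rely on the boolean directly, e.g.:

# Equivalent, more idiomatic form of the check quoted above:
if init:
    pass  # same body as in the PR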

prt.py Outdated
log.debug("Available servers : %s\n" % servers)

return servers


PEP 8: two blank lines between functions, please.

@liviynz (Collaborator) commented Feb 20, 2017

First up, I'd like to say kudos for the original thinking behind your approach; it's good to see people showing initiative when too many are content to reuse whatever they find via Google! Genuine praise...

Now comes the other side, I'm afraid. Personally, I think we shouldn't bring this sort of function into PRT without, at the very least, total head-to-foot scrutiny under a microscope... and that's if at all, as it's a very complex type of function at its core, with a minefield of problems lurking under the surface.

I'll admit that I haven't dived into the code yet, as I've only just started catching up with all my git notifications, and this stood out as something I had to flag when I read the brief about it. That said, I'm coming at this from the perspective of someone who has been a technical lead on the design, build and support of enterprise cluster deployments. They are some of the trickiest miracles to perform! Even very expensive clustering software from the industry leaders has issues, and that's on enterprise-class hardware, while minimizing risk with things like direct access to data via SAN arrays rather than network shares, resilient systems to maximize node uptime, etc. So introducing a function that takes input from an external source (PMS) that we have no control over and that changes continuously with each version, then tries to distribute and load-balance individual jobs across multiple nodes, which could be a random mixture of OSes and hardware, all while having to read/write over NFS, and still work reliably by producing a successful output every time (or even almost every time), really is a large piece of muddy land filled with death, with a pot of gold on the horizon...

In a controlled proof-of-concept environment the issues can be minimal, but when it goes into the wild it will be a totally different situation, where we will likely be overloaded with questions and issues built on very complex configurations, which will consume the project.

Yes, I'm just one person, and some will say I'm being ridiculously pessimistic about this function, but I felt I had to make my point anyway because of how I see the big picture and what a difficult journey it is likely to be for this to succeed.

Lastly, I'd like to make something else clear: I'm by no means saying the idea, code or work behind this is bad in itself! I like the concept from a theoretical perspective, but I think bolting it into PRT (which, remember, can be deployed across such a mixture of platforms by anyone who wants to) is a bad move for the PRT project, due to the number of moving parts required to work perfectly when splitting single jobs across multiple nodes and then reassembling them. It highlights how vulnerable the process is to every node involved (plus the supporting infrastructure for those nodes) working correctly for a successful run, each time a job is transcoded...

That's my view anyway, and I've made my point in an extremely lengthy (some will say too lengthy) manner, so I'll stop here, hoping it's taken in the constructive but highly concerned way I meant it.
Liviy

@JJK801 (Author) commented Feb 20, 2017

@liviynz Thanks for your feedback

I completely understand your point of view and I agree. But we aren't discussing an enterprise software project; this is more of a tweak for those who want to optimize their media server at a small cost (e.g. like me, using cheap hardware to transcode high-quality movies). The installation method, the lack of automated testing and the manual configuration show that it's clearly a POC that the creator shares with us (thanks to him).

IMO Plex should work on this subject (as we prove here that it isn't such a big challenge), but as they don't, we can continue to improve this project and maybe provide some ideas for a future version through a working POC.

I have many ideas to improve this project, and I submitted only the "essential" part here (the distributed transcoding) because it has no real BC breaks: it just improves the background algorithm and, ultimately, the user experience.

My work on this PR is not finished, as I still have to make it more fault tolerant (by retrying the transcoding if segments fail), but everything I saw using it on my configuration was quite positive (the same quality as the default Plex transcoder, but much faster).

FYI, here are the improvement ideas mentioned above:

  • Package the lib so it can be auto-installed (deb, rpm, etc.)
  • Implement service discovery (mDNS maybe?) so we don't need to configure each slave on both sides (a rough sketch follows below)
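
A very rough illustration of how mDNS-based discovery could look (not part of this PR), using the third-party python-zeroconf package; the "_prt._tcp.local." service type and the SlaveListener class are made up for this sketch:

import socket

from zeroconf import ServiceBrowser, Zeroconf


class SlaveListener(object):
    """Collect PRT slaves as they announce themselves on the local network."""

    def __init__(self):
        self.slaves = {}

    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info and info.addresses:
            self.slaves[name] = (socket.inet_ntoa(info.addresses[0]), info.port)

    def remove_service(self, zc, type_, name):
        self.slaves.pop(name, None)

    def update_service(self, zc, type_, name):
        pass


zc = Zeroconf()
listener = SlaveListener()
browser = ServiceBrowser(zc, "_prt._tcp.local.", listener)

Each slave would then register a matching service on the LAN (e.g. via the same library or avahi), so the master never needs a hand-maintained host list.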

It's up to the maintainers to choose whether or not they want to merge it; it doesn't change anything for me, as I'm using my fork.

Thanks

@wnielson (Owner)

First off, thank you @JJK801 for your work on this! I haven't had time to look through the code yet, but I agree with @liviynz that if we incorporate this feature, we will have to do so carefully (perhaps as an optional feature).

As to your other suggested improvements, I'm intrigued by the idea of automatic slave discovery. We have some other features on the todo list which we could use some help with as well :)

@yacn commented Feb 22, 2017

This is exactly what I've wanted from PRT. Definitely want to keep an eye on this PR.

@liviynz (Collaborator) commented Feb 22, 2017

The two main challenges, as I see it, that we have with PRT as it stands are:

  1. We're at the mercy of the PMS devs. They're being very cool and rapidly changing their code to bring out new features, etc., but these constant changes in the core component mean we're always on the back foot, having to change our code or find workarounds to keep PRT working with it. A prime example: PMS 1.4+ is currently incompatible with PRT, and this has happened at least once, maybe twice, since 1.3.
  2. We are two people doing this via open source, and PRT is popular! There are three different contact areas that I'm aware of, and try to keep across, where we get requests to help users with PRT issues. Rarely is it an issue with PRT itself; more often it's either a misunderstanding on their part about setup/use, or they have a different/unique setup outside the recommended one, etc. This all keeps us busy in the time we can allocate to PRT, so when deciding what to add we have to be careful not to be too ambitious and leave ourselves totally swamped by users asking for help.

Hopefully that gives a bit better background. I'm totally up for reviewing the code and talking it through with the boss man, but for the reasons I mentioned before I'm very cautious. I know we're not dealing with enterprise, but my point was that I see quite a few issues arise even with the best hardware and clustering software that costs hundreds of thousands of dollars; scaling that down to open-source code that could be running on virtually any Linux distro on any hardware leaves a lot of grey area for problems, purely from the sheer variation involved.

I need to catch up on my coding anyway, so I'm definitely keen to catch up internally to talk about this and some other items. My biggest issue is that I'm not native with Python; if this were built around a language I am native with, I'd be hammering out quality new functions quite quickly. E.g. auto-detection of slaves: I have code sitting here right now that does just that! I wrote it for another project and it works perfectly, it even allows for quick communication via SSH, etc., but it's not in Python, unfortunately.

Anyway, better get back to my paid job I suppose haha

@zachbunyard

@JJK801 Did the video, the audio, or both get transcoded (did PMS say it was Direct Play, Direct Stream or Transcode)? Is your transcode directory on the Banana Pi, the Raspberry Pi, or somewhere else? Where is the media coming from? Do you know what the bitrate of that 1080p video was? I understand that this will increase the rate at which you can pump out transcoded material, but I'm curious whether it could help with the I/O issues the Raspberry Pi faces when using NFS to read from the library and then again to write to the transcode directory, particularly with high-bitrate media (~10,000 kbps).
