Wednesday, January 25, 2017

Resource building

As you may have noticed, I like to push my tech to its limits. Because of that I recently decided to switch to a much better way of building resources. I took my app, which contained everything, and split it into 3 separate ones.
They run as separate processes that talk to each other over TCP/IP. Compared to the previous setup this solution has a lot of advantages. Here are some of them:
  • If one node crashes I can just restart it and keep the whole setup running.
  • The whole node setup is scalable, so I can have 100 nodes.
  • I can run nodes on different machines.
  • I can dispatch new versions of the building nodes to different machines automatically.
  • I can build specialized Python nodes for some of the work.
  • The application doesn't need to contain all this shitty resource compilation code.
  • The whole concept became a lot easier to control.
So as you can see, there is a lot of improvement :) compared to the previous setup.

And here a recent discussion with a friend comes to my mind:
By telling only the truth you can easily manipulate people.
What you read above is not a lie. But it also doesn't fully represent reality. This post is about the missing part.

When I decided to switch to this system everything looked exactly like what you read above: so many possibilities and cool features. In reality, problems came up that still follow me to this day.

If one node crashes I can just restart it and keep the whole setup running.
Well, yes, I can, but the question is what to do with the resource that crashed it. The crash may be unrelated to the resource itself, e.g. a memory leak or memory stomping in some other compilation code (yes, these happen in all code bases and they are f....g hard to find). On the other hand, we may try to restart the node and compile the resource again. But it has to be done only on this new node and not on any other, because otherwise we would be back where we started.
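One possible policy for this, as a rough sketch only: count crashes per resource, retry on the freshly restarted node, and stop retrying once the same resource has crashed a node too many times. The class name, the threshold and the resource path below are all made up for illustration and are not taken from the real build system.

```python
from collections import defaultdict

MAX_CRASHES_PER_RESOURCE = 2  # hypothetical threshold, tune to taste


class CrashTracker:
    """Decides what to do with a resource whose build crashed a node."""

    def __init__(self, max_crashes=MAX_CRASHES_PER_RESOURCE):
        self.max_crashes = max_crashes
        self.crashes = defaultdict(int)  # resource path -> crash count
        self.quarantined = set()         # resources we stopped retrying

    def on_node_crash(self, resource):
        """Return 'retry' (on the restarted node only) or 'quarantine'."""
        self.crashes[resource] += 1
        if self.crashes[resource] >= self.max_crashes:
            # The resource itself is the likely culprit, so report it
            # instead of letting it take down more nodes.
            self.quarantined.add(resource)
            return "quarantine"
        # A single crash may be a leak or memory stomp unrelated to the
        # resource, so give it one more chance on a clean process.
        return "retry"


tracker = CrashTracker()
print(tracker.on_node_crash("textures/rock.png"))  # retry
print(tracker.on_node_crash("textures/rock.png"))  # quarantine
```

The point of the counter is that one crash says very little, while repeated crashes on a clean process point at the resource.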

The whole node setup is scalable, so I can have 100 nodes.
Dreams :] 100 processes running on one machine would kill the performance of all nodes, especially on slower machines (like my laptop). There is also the issue of dispatching work to all these nodes, which is a big problem on its own. Right now I'm doing some trivial dependency building just to make use of the 4 nodes I have, and most of the time I use only 1, but well, this is still a work in progress. A proper solution will involve some prediction and scaling the number of internal nodes depending on the amount of work.
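Scaling the number of nodes with the amount of work could look roughly like the sketch below. It is only an illustration: the function name, the `jobs_per_node` value and the idea of capping at the CPU count are assumptions, not the actual scheduler.

```python
import os


def pick_node_count(pending_jobs, jobs_per_node=8, max_nodes=None):
    """Guess how many build nodes to run for the current queue size."""
    if max_nodes is None:
        # More processes than cores mostly just fight each other.
        max_nodes = os.cpu_count() or 1
    if pending_jobs <= 0:
        return 1  # keep one node warm for the next request
    wanted = -(-pending_jobs // jobs_per_node)  # ceiling division
    return max(1, min(wanted, max_nodes))


print(pick_node_count(3, max_nodes=4))     # 1
print(pick_node_count(20, max_nodes=4))    # 3
print(pick_node_count(1000, max_nodes=4))  # 4
```

A real version would need the "prediction" part too, for example weighting jobs by how long similar resources took to build last time.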
 
I can run nodes on different machines.
Yes, I can, but should I? :) Well, I still need to do some testing here to see how it will work. Most resource building is really simple, so just communicating with another machine over TCP/IP may introduce a delay that is noticeable compared to the build itself. I thought about getting 5-10 Orange Pi machines to use as a cluster. I want to do it for fun, to see if I can handle a nicely scalable number of machines. There are also some challenges here, e.g.:
  • Collecting statistics for future improvement and balancing of the system.
  • Handling crashes of building nodes and reporting the issue.
  • Reconnecting a node after losing the network.
  • Handling power failures (they may corrupt some data).
  • Caching some files for quicker execution.
  • Handling running out of disk space.
  • Estimating where to send work (sometimes it may be more optimal to just build data locally than to do it remotely).
And these are just the ones that came to my mind right now.
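The last point, deciding where to send work, can be sketched as a simple cost comparison. This is a back-of-the-envelope model with made-up parameter names. It assumes the build times and the network numbers can be measured or estimated from the collected statistics.

```python
def should_build_remotely(input_bytes, output_bytes,
                          local_build_s, remote_build_s,
                          bandwidth_bytes_per_s, round_trip_s):
    """Return True if sending the job to a remote node is expected to be faster."""
    # Time to ship the source data there and the built resource back.
    transfer_s = (input_bytes + output_bytes) / bandwidth_bytes_per_s
    remote_total_s = round_trip_s + transfer_s + remote_build_s
    return remote_total_s < local_build_s


# Tiny resource: the network overhead outweighs any gain.
print(should_build_remotely(1_000, 1_000, 0.01, 0.01, 1e7, 0.002))  # False

# Heavy resource: the transfer is small compared to the build time saved.
print(should_build_remotely(1_000_000, 1_000_000, 10.0, 3.0, 1e7, 0.002))  # True
```

It ignores queueing on the remote node and cached files, which would both shift the result in practice.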
 
I can dispatch new versions of the building nodes to different machines automatically.
Yes, I can :) but I would need to build infrastructure to restart the processes and handle a few other things, and I really don't have much time for this (there are so many other things to do :/).
 
I can build specialized Python nodes for some of the work.
I'm still considering this because the idea is cool, but maintaining it would keep me up whole nights. So for now I'm saying "no" to this idea :/
 
The application doesn't need to contain all this shitty resource compilation code.
This is good :) but to build resources in a separate application I sometimes need to duplicate code. I want the building nodes to be as separated from the higher level engine features and code as I can make them.
 
The whole concept became a lot easier to control.
The concept became clearer and easier to control, but the execution got a lot more complicated and introduced all these issues and problems that I didn't have previously. But well, it's our job as programmers to solve problems (I just have too many of them in my code base right now :( )
 
So yeah. The cool system that you read about at the beginning is still there, but it probably sounds a lot messier right now. It also comes with its own set of problems which weren't there before. I don't need to hide them, because why would I? Sadly, some articles/papers skip this most fun part of our work:
  • What went wrong?
  • What may go wrong?
  • Where does the whole system fail?
My release build finished (on my laptop it takes so long to rebuild the project :( ) so I can return to coding. See you next time.

Greg
