Ruby Concurrency and Parallelism: A Practical Tutorial

Let's start by clearing up an all-too-common point of confusion among Ruby developers; namely: concurrency and parallelism are not the same thing (i.e., concurrent != parallel).

In particular, Ruby concurrency is when two tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean, though, that they'll ever both be running at the same instant (e.g., multiple threads on a single-core machine). In contrast, parallelism is when two tasks literally run at the same time (e.g., multiple threads on a multicore processor).

The key point here is that concurrent threads and/or processes will not necessarily be running in parallel.

This tutorial provides a practical (rather than theoretical) treatment of the various techniques and approaches that are available for concurrency and parallelism in Ruby.

For more real-world Ruby examples, see our article about Ruby Interpreters and Runtimes.

Our Test Example

For a simple test case, I'll create a Mailer class and add a Fibonacci function (rather than the sleep() method) to make each request more CPU-intensive, as follows:

    class Mailer

      def self.deliver(&block)
        mail = MailBuilder.new(&block).mail
        mail.send_mail
      end

      Mail = Struct.new(:from, :to, :subject, :body) do
        def send_mail
          fib(30)
          puts "Email from: #{from}"
          puts "Email to  : #{to}"
          puts "Subject   : #{subject}"
          puts "Body      : #{body}"
        end

        def fib(n)
          n < 2 ? n : fib(n-1) + fib(n-2)
        end
      end

      class MailBuilder
        def initialize(&block)
          @mail = Mail.new
          instance_eval(&block)
        end

        attr_reader :mail

        %w(from to subject body).each do |m|
          define_method(m) do |val|
            @mail.send("#{m}=", val)
          end
        end
      end
    end

We can then invoke this Mailer class as follows to send mail:

    Mailer.deliver do
      from    "eki@eqbalq.com"
      to      "jill@example.com"
      subject "Threading and Forking"
      body    "Some content"
    end

(Note: The source code for this test case is available here on github.)

To establish a baseline for comparison purposes, let's begin by doing a simple benchmark, invoking the mailer 100 times:

    puts Benchmark.measure{
      100.times do |i|
        Mailer.deliver do
          from    "eki_#{i}@eqbalq.com"
          to      "jill_#{i}@example.com"
          subject "Threading and Forking (#{i})"
          body    "Some content"
        end
      end
    }

This yielded the following results on a quad-core processor with MRI Ruby 2.0.0p353:

    15.250000   0.020000  15.270000 ( 15.304447)
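A quick aside on reading that output: Benchmark.measure returns a Benchmark::Tms object, and the four printed columns are user CPU time, system CPU time, their total, and (in parentheses) elapsed wall-clock time. A minimal sketch that pulls out those fields by name:

```ruby
require "benchmark"

# Benchmark.measure returns a Benchmark::Tms; the printed columns are
# utime (user CPU), stime (system CPU), total, and real (wall clock).
tms = Benchmark.measure do
  100_000.times { Math.sqrt(1234.5678) }
end

puts "user: #{tms.utime}, system: #{tms.stime}, total: #{tms.total}, real: #{tms.real}"
```

For single-process CPU-bound work like ours, total and real stay close together; a parallel speedup shows up as real dropping well below total.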

Multiple Processes vs. Multithreading

There is no "one size fits all" answer when it comes to deciding whether to use multiple processes or to multithread your Ruby application. The table below summarizes some of the key factors to consider.

Processes | Threads
Uses more memory | Uses less memory
If parent dies before children have exited, children can become zombie processes | All threads die when the process dies (no chance of zombies)
More expensive for forked processes to switch context since the OS needs to save and reload everything | Threads have considerably less overhead since they share address space and memory
Forked processes are given a new virtual memory space (process isolation) | Threads share the same memory, so need to control and deal with concurrent memory issues
Requires inter-process communication | Can "communicate" via queues and shared memory
Slower to create and destroy | Faster to create and destroy
Easier to code and debug | Can be significantly more complex to code and debug
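The memory rows of the table can be seen directly in code: a counter bumped inside a thread is visible to the parent, while the same bump inside a forked child stays in the child's private copy. A minimal sketch (fork() is POSIX-only, so this won't run on Windows):

```ruby
counter = 0

# Threads share the parent's memory, so this increment is visible after join.
Thread.new { counter += 1 }.join

# A forked child gets its own copy of the address space, so its increment
# never reaches the parent's counter.
pid = fork { counter += 1 }
Process.wait(pid)

puts counter  # => 1 (only the thread's increment survived)
```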

Examples of Ruby solutions that use multiple processes:

  • Resque: A Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
  • Unicorn: An HTTP server for Rack applications designed to only serve fast clients on low-latency, high-bandwidth connections and take advantage of features in Unix/Unix-like kernels.

Examples of Ruby solutions that use multithreading:

  • Sidekiq: A full-featured background processing framework for Ruby. It aims to be simple to integrate with any modern Rails application and to deliver much higher performance than other existing solutions.
  • Puma: A Ruby web server built for concurrency.
  • Thin: A very fast and simple Ruby web server.

Multiple Processes

Before we look into Ruby multithreading options, let's explore the easier path of spawning multiple processes.

In Ruby, the fork() system call is used to create a "copy" of the current process. This new process is scheduled at the operating system level, so it can run concurrently with the original process, just as any other independent process can. (Note: fork() is a POSIX system call and is therefore not available if you are running Ruby on a Windows platform.)

OK, so let's run our test case, but this time using fork() to employ multiple processes:

    puts Benchmark.measure{
      100.times do |i|
        fork do
          Mailer.deliver do
            from    "eki_#{i}@eqbalq.com"
            to      "jill_#{i}@example.com"
            subject "Threading and Forking (#{i})"
            body    "Some content"
          end
        end
      end
      Process.waitall
    }

(Process.waitall waits for all child processes to exit and returns an array of process statuses.)
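It's worth seeing those return values concretely: fork returns the child's pid in the parent (and nil in the child, or runs the given block), and Process.waitall hands back pairs of pid and Process::Status, which lets you confirm every child exited and with what code. A small sketch:

```ruby
# Fork three children; each exits with its index as the status code.
pids = 3.times.map { |i| fork { exit(i) } }

statuses = Process.waitall  # array of [pid, Process::Status] pairs

statuses.each do |pid, status|
  puts "child #{pid} exited with code #{status.exitstatus}"
end

# The set of reaped pids matches the pids fork returned to the parent.
puts(statuses.map(&:first).sort == pids.sort)  # => true
```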

This code now yields the following results (again, on a quad-core processor with MRI Ruby 2.0.0p353):

          0.000000   0.030000  27.000000 (  3.788106)                  

Not too shabby! We made the mailer ~5x faster by just modifying a couple of lines of code (i.e., using fork()).

Don't get overly excited though. Although it might be tempting to use forking since it's an easy solution for Ruby concurrency, it has a major drawback, which is the amount of memory that it will consume. Forking is somewhat expensive, especially if Copy-on-Write (CoW) is not utilized by the Ruby interpreter that you're using. If your app uses 20MB of memory, for example, forking it 100 times could potentially consume as much as 2GB of memory!

Also, although multithreading has its own complexities as well, there are a number of complexities that need to be considered when using fork(), such as shared file descriptors and semaphores (between parent and child forked processes), the need to communicate via pipes, and so on.
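As a taste of that pipe-based communication, here's a minimal parent/child exchange over IO.pipe. Note that each side must close the pipe ends it doesn't use; a parent that keeps the write end open can leave the reader blocked forever waiting for EOF, which is one classic way processes reading from pipes get stuck:

```ruby
reader, writer = IO.pipe

pid = fork do
  reader.close          # the child only writes
  writer.puts "Mail delivered!"
  writer.close
end

writer.close            # the parent only reads; keeping this end open
                        # would stop the reader from ever seeing EOF
message = reader.read
reader.close
Process.wait(pid)

puts message  # prints "Mail delivered!"
```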

Ruby Multithreading

OK, so now let's try to make the same program faster using Ruby multithreading techniques instead.

Multiple threads within a single process have considerably less overhead than a corresponding number of processes since they share address space and memory.

With that in mind, let's revisit our test case, but this time using Ruby's Thread class:

    threads = []

    puts Benchmark.measure{
      100.times do |i|
        threads << Thread.new do
          Mailer.deliver do
            from    "eki_#{i}@eqbalq.com"
            to      "jill_#{i}@example.com"
            subject "Threading and Forking (#{i})"
            body    "Some content"
          end
        end
      end
      threads.map(&:join)
    }

This code now yields the following results (again, on a quad-core processor with MRI Ruby 2.0.0p353):

    13.710000   0.040000  13.750000 ( 13.740204)

Bummer. That sure isn't very impressive! So what's going on? Why is this producing almost the same results as we got when we ran the code synchronously?

The answer, which is the bane of existence of many a Ruby programmer, is the Global Interpreter Lock (GIL). Thanks to the GIL, CRuby (the MRI implementation) doesn't really support threading.

The Global Interpreter Lock is a mechanism used in computer language interpreters to synchronize the execution of threads so that only one thread can execute at a time. An interpreter that uses a GIL will always allow exactly one thread, and one thread only, to execute at a time, even if run on a multi-core processor. Ruby MRI and CPython are two of the most common examples of popular interpreters that have a GIL.

So back to our problem: how can we exploit multithreading in Ruby to improve performance in light of the GIL?

Well, in the MRI (CRuby), the unfortunate answer is that you're basically stuck and there's very little that multithreading can do for you.

Ruby concurrency without parallelism can still be very useful, though, for tasks that are IO-heavy (e.g., tasks that need to frequently wait on the network). So threads can still be useful in the MRI, for IO-heavy tasks. There is a reason threads were, after all, invented and used even before multi-core servers were common.
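The reason is that MRI releases the GIL while a thread is blocked on IO (or sleeping), so the waits overlap even though the Ruby code itself doesn't run in parallel. A rough sketch, using sleep as a stand-in for a slow network call: five 0.2-second "requests" complete in roughly 0.2 seconds total, not 1 second, even on MRI:

```ruby
require "benchmark"

# sleep stands in for a blocking network call; MRI releases the GIL
# while a thread is blocked, so the five waits overlap.
elapsed = Benchmark.realtime do
  5.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end

puts format("%.2f", elapsed)  # roughly 0.2, not 1.0
```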

That said, if you have the option of using a version other than CRuby, you can use an alternative Ruby implementation such as JRuby or Rubinius, since they don't have a GIL and they do support real parallel Ruby threading.

(Figure: threaded with JRuby)

To prove the point, here are the results we get when we run the exact same threaded version of the code as before, but this time on JRuby (instead of CRuby):

    43.240000   0.140000  43.380000 (  5.655000)

Now we're talkin'!

But…

Threads Aren't Free

The improved performance with multiple threads might lead one to believe that we can just keep adding more threads, basically infinitely, to keep making our code run faster and faster. That would indeed be nice if it were true, but the reality is that threads are not free and so, sooner or later, you will run out of resources.

Permit'south say, for case, that we want to run our sample mailer not 100 times, merely 10,000 times. Let's encounter what happens:

    threads = []

    puts Benchmark.measure{
      10_000.times do |i|
        threads << Thread.new do
          Mailer.deliver do
            from    "eki_#{i}@eqbalq.com"
            to      "jill_#{i}@example.com"
            subject "Threading and Forking (#{i})"
            body    "Some content"
          end
        end
      end
      threads.map(&:join)
    }

Boom! I got an error on my OS X 10.8 after spawning around 2,000 threads:

          can't create Thread: Resource temporarily unavailable (ThreadError)                  

As expected, sooner or later we start thrashing or run out of resources entirely. So the scalability of this approach is clearly limited.

Thread Pooling

Fortunately, there is a better way; namely, thread pooling.

A thread pool is a group of pre-instantiated, reusable threads that are available to perform work as needed. Thread pools are particularly useful when there are a large number of short tasks to be performed rather than a small number of longer tasks. This prevents having to incur the overhead of creating a thread a large number of times.

A key configuration parameter for a thread pool is typically the number of threads in the pool. These threads can either be instantiated all at once (i.e., when the pool is created) or lazily (i.e., as needed until the maximum number of threads in the pool has been created).

When the pool is handed a job to perform, it assigns the task to one of the currently idle threads. If no threads are idle (and the maximum number of threads have already been created), it waits for a thread to complete its work and become idle, and then assigns the job to that thread.


So, returning to our example, we'll start by using Queue (since it's a thread-safe data type) and employ a simple implementation of the thread pool:

    require "./lib/mailer"
    require "benchmark"
    require "thread"

    POOL_SIZE = 10

    jobs = Queue.new

    10_000.times{|i| jobs.push i}

    workers = (POOL_SIZE).times.map do
      Thread.new do
        begin
          while x = jobs.pop(true)
            Mailer.deliver do
              from    "eki_#{x}@eqbalq.com"
              to      "jill_#{x}@example.com"
              subject "Threading and Forking (#{x})"
              body    "Some content"
            end
          end
        rescue ThreadError
        end
      end
    end

    workers.map(&:join)

In the above code, we started by creating a jobs queue for the jobs that need to be performed. We used Queue for this purpose since it's thread-safe (so if multiple threads access it at the same time, it will maintain consistency), which avoids the need for a more complicated implementation requiring the use of a mutex.
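That thread-safety is easy to check for yourself: many threads pushing to a Queue concurrently never lose or duplicate an item, with no explicit locking on our side. A quick sketch:

```ruby
require "thread"

queue = Queue.new

# 10 threads each push 100 items concurrently; Queue's internal locking
# keeps every push intact.
10.times.map do |t|
  Thread.new { 100.times { |i| queue.push([t, i]) } }
end.each(&:join)

puts queue.size  # => 1000
```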

We then pushed the IDs of the mailers to the jobs queue and created our pool of 10 worker threads.

Within each worker thread, we pop items from the jobs queue.

Thus, the life-cycle of a worker thread is to continuously wait for tasks to be put into the jobs queue and execute them.
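An aside for newer Rubies (2.3+): Queue#close gives a cleaner shutdown than rescuing the ThreadError that pop(true) raises on an empty queue. Once the queue is closed and drained, pop simply returns nil and each worker's loop falls through naturally. A sketch of the same pool shape with that pattern (the work is stubbed to keep it self-contained):

```ruby
require "thread"

POOL_SIZE = 10
jobs    = Queue.new
results = Queue.new

100.times { |i| jobs.push(i) }
jobs.close  # Ruby 2.3+: pop returns nil once the queue is closed and empty

workers = POOL_SIZE.times.map do
  Thread.new do
    while (id = jobs.pop)           # nil ends the loop; no rescue needed
      results.push("mail #{id}")    # stand-in for Mailer.deliver
    end
  end
end

workers.each(&:join)
puts results.size  # => 100
```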

So the good news is that this works and scales without any problems. Unfortunately, though, this is fairly complicated even for our simple tutorial.

Celluloid

Thanks to the Ruby Gem ecosystem, much of the complexity of multithreading is neatly encapsulated in a number of easy-to-use Ruby Gems out-of-the-box.

A great example is Celluloid, one of my favorite Ruby gems. The Celluloid framework is a simple and clean way to implement actor-based concurrent systems in Ruby. Celluloid enables people to build concurrent programs out of concurrent objects just as easily as they build sequential programs out of sequential objects.

In the context of our discussion in this post, I'm specifically focusing on the Pools feature, but do yourself a favor and check it out in more detail. Using Celluloid you'll be able to build multithreaded Ruby programs without worrying about nasty problems like deadlocks, and you'll find it trivial to use other more sophisticated features like Futures and Promises.
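Celluloid itself is an external gem, but plain Ruby gives a taste of the future idea it builds on: Thread#value blocks until the thread finishes and hands back its result, so you can kick off work now and collect the answer later. A minimal sketch:

```ruby
# Start the computation immediately; the main thread is free to do
# other work until it actually needs the result.
future = Thread.new { (1..10).reduce(:+) }

# ... other work could happen here ...

puts future.value  # => 55 (blocks only if the thread hasn't finished yet)
```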

Here's how simple a multithreaded version of our mailer program is using Celluloid:

    require "./lib/mailer"
    require "benchmark"
    require "celluloid"

    class MailWorker
      include Celluloid

      def send_email(id)
        Mailer.deliver do
          from    "eki_#{id}@eqbalq.com"
          to      "jill_#{id}@example.com"
          subject "Threading and Forking (#{id})"
          body    "Some content"
        end
      end
    end

    mailer_pool = MailWorker.pool(size: 10)

    10_000.times do |i|
      mailer_pool.async.send_email(i)
    end

Clean, easy, scalable, and robust. What more can you ask for?

Background Jobs

Of course, another potentially viable alternative, depending on your operational requirements and constraints, would be to use background jobs. A number of Ruby Gems exist to support background processing (i.e., saving jobs in a queue and processing them later without blocking the current thread). Notable examples include Sidekiq, Resque, Delayed Job, and Beanstalkd.

For this post, I'll use Sidekiq and Redis (an open source key-value cache and store).

First, let's install Redis and run it locally:

    brew install redis
    redis-server /usr/local/etc/redis.conf

With our local Redis instance running, let's take a look at a version of our sample mailer program (mail_worker.rb) using Sidekiq:

    require_relative "../lib/mailer"
    require "sidekiq"

    class MailWorker
      include Sidekiq::Worker

      def perform(id)
        Mailer.deliver do
          from    "eki_#{id}@eqbalq.com"
          to      "jill_#{id}@example.com"
          subject "Threading and Forking (#{id})"
          body    "Some content"
        end
      end
    end

We can trigger Sidekiq with the mail_worker.rb file:

          sidekiq  -r ./mail_worker.rb                  

Then from IRB:

    ⇒  irb
    >> require_relative "mail_worker"
    => true
    >> 100.times{|i| MailWorker.perform_async(i)}
    2014-12-20T02:42:30Z 46549 TID-ouh10w8gw INFO: Sidekiq client with redis options {}
    => 100

Awesomely simple. And it can scale easily by just changing the number of workers.

Another option is to use Sucker Punch, one of my favorite asynchronous RoR processing libraries. The implementation using Sucker Punch will be very similar. We'll just need to include SuckerPunch::Job rather than Sidekiq::Worker, and use MailWorker.new.async.perform() rather than MailWorker.perform_async().

Conclusion

High concurrency is not only achievable in Ruby, but is also simpler than you might think.

One feasible approach is simply to fork a running process to multiply its processing power. Another technique is to take advantage of multithreading. Although threads are lighter than processes, requiring less overhead, you can still run out of resources if you start too many threads concurrently. At some point, you may find it necessary to use a thread pool. Fortunately, many of the complexities of multithreading are made easier by leveraging any of a number of available gems, such as Celluloid and its Actor model.

Another way to handle time-consuming processes is by using background processing. There are many libraries and services that allow you to implement background jobs in your applications. Some popular tools include database-backed job frameworks and message queues.

Forking, threading, and background processing are all feasible alternatives. The decision as to which one to use depends on the nature of your application, your operational environment, and requirements. Hopefully this tutorial has provided a useful introduction to the options available.


Source: https://www.toptal.com/ruby/ruby-concurrency-and-parallelism-a-practical-primer
