Python Subprocess

A running program is called a process. Each process has its own system state, which includes memory, lists of open files, a program counter that keeps track of the instruction being executed, and a call stack used to hold the local variables of functions.

Normally, a process executes statements one after another in a single sequence of control flow, which is sometimes called the main thread of the process. At any given time, the program is doing only one thing.

A program can create new processes using library functions such as those found in the os or subprocess modules, for example os.fork(), subprocess.Popen(), and so on. However, these processes, known as subprocesses, run as completely independent entities, each with its own private system state and main thread of execution.

Because a subprocess is independent, it executes concurrently with the original process. That is, the process that created the subprocess can go on to work on other things while the subprocess carries out its own work in the background.
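To make the idea of concurrent execution concrete, here is a minimal sketch. It assumes a second Python interpreter (located through sys.executable) as a stand-in for any long-running command. The child_code string is hypothetical, and the snippet is written so that it also runs on Python 3.

```python
import subprocess
import sys

# Hypothetical child workload: a second Python interpreter that sleeps briefly.
child_code = "import time; time.sleep(0.5); print('child done')"

# Popen() starts the child and returns without waiting for it to finish.
child = subprocess.Popen([sys.executable, "-c", child_code],
                         stdout=subprocess.PIPE)

# The parent keeps working while the child sleeps in the background.
parent_result = sum(range(1000))

# communicate() collects the child's output and waits for it to exit.
child_output, _ = child.communicate()
print(child_output.decode().strip())
```

The parent computes parent_result before the child has finished; only communicate() blocks until the child exits.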

Subprocess Module

The subprocess module allows us to:

  1. spawn new processes
  2. connect to their input/output/error pipes
  3. obtain their return codes

It offers a higher-level interface than some of the other available modules, and is intended to replace the following functions:

  1. os.system()
  2. os.spawn*()
  3. os.popen*()
  4. popen2.*()
  5. commands.*()

We cannot use UNIX commands in a Python script as if they were Python code. For example, echo name causes a syntax error because echo is not a built-in statement or function in Python. In a Python script, we use print name instead.

To run UNIX commands, we need to create a subprocess that runs the command. The recommended approach to invoking subprocesses is to use the convenience functions for all the use cases they can handle. For anything else, the underlying Popen interface can be used directly.

os.system()

The simplest way of running a UNIX command is to use os.system().

>>> import os
>>> os.system('echo $HOME')
/user/khong
0

>>> # or we can use
>>> os.system('echo %s' %'$HOME')
/user/khong
0

As expected, the value of $HOME was written to stdout (the terminal). We also got a return value of 0, the exit status of the command, which means it ran without error.

os.system('command with args') passes the command and its arguments to our system’s shell. This means we can run multiple commands at once and set up pipes and input/output redirections:

os.system('command_1 < input_file | command_2 > output_file')
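As a concrete instance of that pattern, the following sketch feeds a small file through sort and head and redirects the result to another file. The file names and contents are made up for illustration, and a POSIX shell with the sort and head utilities is assumed.

```python
import os
import tempfile

# Work in a scratch directory; the file names below are hypothetical.
workdir = tempfile.mkdtemp()
input_file = os.path.join(workdir, "input.txt")
output_file = os.path.join(workdir, "output.txt")

with open(input_file, "w") as f:
    f.write("pear\napple\nfig\n")

# sort reads input_file, head keeps the first two lines,
# and the shell redirects the result into output_file.
status = os.system("sort < '%s' | head -n 2 > '%s'" % (input_file, output_file))

with open(output_file) as f:
    result = f.read()
print(status)
print(result)
```

The whole pipeline is handled by the shell; Python only sees the final exit status.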

If we run os.system('echo $HOME') in Python's IDLE, we see only the 0, because the command writes its stdout to a terminal rather than to the IDLE window. To see the command output, we can redirect it to a file and then read from it:

>>> import os
>>> os.system('echo $HOME > outfile')
0
>>> f = open('outfile','r')
>>> f.read()
'/user/khong\n'

os.popen()

Open a pipe to or from a command. The return value is an open file object connected to the pipe, which can be read or written depending on whether mode is 'r' (default) or 'w'. The bufsize argument has the same meaning as the corresponding argument to the built-in open() function. The exit status of the command (encoded in the format specified for wait()) is available as the return value of the close() method of the file object, except that when the exit status is zero (termination without errors), None is returned.

>>> import os
>>> stream = os.popen('echo $HOME')
>>> stream.read()
'/user/khong\n'
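The behaviour of close() described above can be checked with a short sketch. The two commands are arbitrary examples, a POSIX shell is assumed, and the exact non-zero value returned by close() is platform-dependent, so the code only distinguishes None from not None.

```python
import os

# Read mode (the default): the stream is connected to the command's stdout.
stream = os.popen("echo hello")
text = stream.read()
ok_status = stream.close()    # None, because the command exited with status 0

# A command that fails: close() then returns an encoded, non-None status.
stream = os.popen("exit 3")
stream.read()
bad_status = stream.close()

print(repr(text))
print(ok_status is None, bad_status is not None)
```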

os.popen() does the same thing as os.system except that it gives us a file-like stream object that we can use to access standard input/output for that process. There are 3 other variants of popen that all handle the i/o slightly differently.

If we pass everything as a single string, the command is passed to the shell; if we pass the arguments as a list, we don’t need to worry about escaping anything.

However, it’s been deprecated since version 2.6: This function is obsolete. Use the subprocess module.

subprocess.call()

This works much like the Popen class and takes all of the same arguments, but it simply waits until the command completes and gives us the return code.

subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)

Run the command described by args. Wait for command to complete, then return the returncode attribute.

>>> import os
>>> os.chdir('/')
>>> import subprocess
>>> subprocess.call(['ls','-l'])
total 181
drwxr-xr-x    2 root root  4096 Mar  3  2012 bin
drwxr-xr-x    4 root root  1024 Oct 26  2012 boot
...

The command line arguments are passed as a list of strings, which avoids the need for escaping quotes or other special characters that might be interpreted by the shell.

>>> import subprocess
>>> subprocess.call('echo $HOME')
Traceback (most recent call last):
...
OSError: [Errno 2] No such file or directory
>>>
>>> subprocess.call('echo $HOME', shell=True)
/user/khong
0

Setting the shell argument to a true value causes subprocess to spawn an intermediate shell process and tell it to run the command. In other words, using an intermediate shell means that variables, glob patterns, and other special shell features in the command string are processed before the command is run. Here, in the example, $HOME was expanded before the echo command ran. This is a command that relies on shell expansion, whereas ls -l is a simple command that needs no shell.
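The difference between the two forms can be seen side by side in the sketch below. It uses check_output(), which is covered later in this tutorial, purely to capture the text. The show_arg helper command is hypothetical: a child Python interpreter that prints its first argument unchanged. A POSIX shell is assumed for the shell=True call.

```python
import os
import subprocess
import sys

# Hypothetical helper command: a child Python that prints its first argument.
show_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# List form, no shell: '$HOME' reaches the child as a literal string.
literal = subprocess.check_output(show_arg + ["$HOME"]).decode().strip()

# String form with shell=True: the shell expands $HOME before echo runs.
expanded = subprocess.check_output("echo $HOME", shell=True).decode().strip()

print(literal)
print(expanded)
```

Without a shell, no expansion happens at all, which is why the list form is the safer default when the arguments come from user input.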

Here is some sample code (PyGoogle/FFMpeg/iframe_extract.py). It downloads a YouTube video and then extracts the I-frames into a subfolder:

'''
iframe_extract.py - download video and ffmpeg i-frame extraction
Usage: 
(ex) python iframe_extract.py -u https://www.youtube.com/watch?v=dP15zlyra3c
This code does two things:
1. Download using youtube-dl
cmd = ['youtube-dl', '-f', videoSize, '-k', '-o', video_out, download_url]
2. Extract i-frames via ffmpeg
cmd = [ffmpeg,'-i', inFile,'-f', 'image2','-vf',
        "select='eq(pict_type,PICT_TYPE_I)'",'-vsync','vfr', imgFilenames]
'''

from __future__ import unicode_literals
import youtube_dl

import sys
import os
import subprocess
import argparse
import glob

# sys.platform is 'win32' on Windows, so test the prefix rather than "Windows"
if sys.platform.startswith("win"):
    FFMPEG_BIN = "ffmpeg.exe"
    MOVE = "move"
    MKDIR = "mkdir"
else:
    FFMPEG_BIN = "ffmpeg"
    MOVE = "mv"
    MKDIR = "mkdir"


def iframe_extract(inFile):
# ffmpeg -i inFile -f image2 -vf \
#   "select='eq(pict_type,PICT_TYPE_I)'" -vsync vfr oString%03d.png

    # infile : video file name 
    #          (ex) 'FoxSnowDive-Yellowstone-BBCTwo.mp4'
    imgPrefix = inFile.split('.')[0]
    # imgPrefix : image file 

    # start extracting i-frames
    home = os.path.expanduser("~")
    ffmpeg = home + '/bin/ffmpeg'

    imgFilenames = imgPrefix + '%03d.png'
  
    cmd = [ffmpeg,'-i', inFile,'-f', 'image2','-vf',
        "select='eq(pict_type,PICT_TYPE_I)'",'-vsync','vfr', imgFilenames]
    
    # create iframes
    print("creating iframes ....")
    subprocess.call(cmd)

    # Move the extracted iframes to a subfolder
    # imgPrefix is used as a subfolder name that stores iframe images
    cmd = 'mkdir -p ' + imgPrefix
    os.system(cmd)
    print("make subdirectory %s" % cmd)
    mvcmd = 'mv ' + imgPrefix + '*.png ' + imgPrefix
    print("moving images to subdirectory %s" % mvcmd)
    os.system(mvcmd)



def get_info_and_download(download_url):

    # Get video meta info and then download using youtube-dl

    ydl_opts = {}

    # get meta info from the video
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        meta = ydl.extract_info(download_url, download=False)

    # renaming the file 
    # remove special characters from the file name
    print('meta[title]=%s' %meta['title'])
    out = ''.join(c for c in meta['title'] if c.isalnum() or c =='-' or c =='_' ) 
    print('out=%s' %out)
    extension = meta['ext']
    video_out = out + '.' + extension
    print('video_out=%s' %video_out)
    videoSize = 'bestvideo[height<=540]+bestaudio/best[height<=540]'
    cmd = ['youtube-dl', '-f', videoSize, '-k', '-o', video_out, download_url]
    print('cmd=%s' %cmd)

    # download the video
    subprocess.call(cmd)

    # Sometimes output file has format code in name such as 'out.f248.webm'
    # so, in this case, we want to rename it 'out.webm' 
    found = False
    extension_list = ['mkv', 'mp4', 'webm']
    for e in extension_list:
       glob_str = '*.' + e
       for f in glob.glob(glob_str):
          if out in f:
             if os.path.isfile(f):
                video_out = f
                found = True
                break
       if found:
          break
       
    # call iframe-extraction : ffmpeg
    print('before iframe_extract() video_out=%s' %video_out)
    iframe_extract(video_out)
    return meta



def check_arg(args=None):

# Command line options
# Currently, only the url option is used

    parser = argparse.ArgumentParser(description='download video')
    parser.add_argument('-u', '--url',
                        help='download url',
                        required=True)
    parser.add_argument('-i', '--infile',
                        help='input to iframe extract')
    parser.add_argument('-o', '--outfile',
                        help='output name for iframe image')

    results = parser.parse_args(args)
    return (results.url,
            results.infile,
            results.outfile)


# Usage sample:
#    syntax: python iframe_extract.py -u url
#    (ex) python iframe_extract.py -u https://www.youtube.com/watch?v=dP15zlyra3c

if __name__ == '__main__':
    u,i,o = check_arg(sys.argv[1:])
    meta = get_info_and_download(u)
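The script above builds its mkdir and mv commands as strings and hands them to os.system(), which depends on a UNIX shell. As an alternative sketch, not the script author's approach, the same move step could be done with the standard library alone. The function name move_frames is hypothetical.

```python
import glob
import os
import shutil


def move_frames(img_prefix):
    """Move img_prefix*.png into a subfolder named img_prefix; return new paths."""
    # Equivalent of 'mkdir -p': create the folder only if it is missing.
    if not os.path.isdir(img_prefix):
        os.makedirs(img_prefix)
    moved = []
    # Equivalent of 'mv prefix*.png prefix', without going through a shell.
    for path in sorted(glob.glob(img_prefix + "*.png")):
        target = os.path.join(img_prefix, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved
```

Calling move_frames(imgPrefix) at the end of iframe_extract() would take the place of the two os.system() calls, and it avoids quoting problems when the prefix contains spaces.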

 

subprocess.check_call()
subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False)

The check_call() function works like call() except that the exit code is checked, and if it indicates an error happened then a CalledProcessError exception is raised.

>>> import subprocess
>>> subprocess.check_call(['false'])
Traceback (most recent call last):
...
subprocess.CalledProcessError: Command '['false']' returned non-zero exit status 1
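In a script we would normally catch the exception rather than let it propagate. The sketch below assumes a child Python interpreter that exits with status 3 as a stand-in for any failing command; failing_cmd is a hypothetical name.

```python
import subprocess
import sys

# Hypothetical failing command: a child Python that exits with status 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

try:
    subprocess.check_call(failing_cmd)
    exit_status = 0
except subprocess.CalledProcessError as err:
    # err.returncode holds the child's exit status, err.cmd the command run.
    exit_status = err.returncode
    print("command failed with status %d" % exit_status)
```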

 

subprocess.check_output()
subprocess.check_output(args, *, stdin=None, stderr=None, 
                                 shell=False, universal_newlines=False)

The standard input and output channels for the process started by call() are bound to the parent’s input and output. That means the calling program cannot capture the output of the command. To capture the output, we can use check_output() for later processing.

>>> import subprocess
>>> output = subprocess.check_output(['ls','-l'])
>>> print output
total 181
drwxr-xr-x    2 root root  4096 Mar  3  2012 bin
drwxr-xr-x    4 root root  1024 Oct 26  2012 boot
...
>>> output = subprocess.check_output('echo $HOME', shell=True)
>>> print output
/user/khong

This function was added in Python 2.7.
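The stderr and universal_newlines parameters in the signature above can be combined to capture both output streams as text. This is a minimal sketch: the child_code string is a made-up child program that writes one line to each stream, run through sys.executable.

```python
import subprocess
import sys

# Hypothetical child that writes one line to stdout and one to stderr.
child_code = (
    "import sys\n"
    "sys.stdout.write('to stdout\\n')\n"
    "sys.stdout.flush()\n"
    "sys.stderr.write('to stderr\\n')\n"
)

combined = subprocess.check_output(
    [sys.executable, "-c", child_code],
    stderr=subprocess.STDOUT,     # fold stderr into the captured stream
    universal_newlines=True,      # return text with '\n' line endings
)
lines = combined.splitlines()
print(lines)
```

Without stderr=subprocess.STDOUT, the second line would go to the parent's stderr and would not appear in the returned value.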

 

subprocess.Popen()

The underlying process creation and management in this module is handled by the Popen class. It offers a lot of flexibility so that developers are able to handle the less common cases not covered by the convenience functions.

subprocess.Popen() executes a child program in a new process. On Unix, the class uses os.execvp()-like behavior to execute the child program. On Windows, the class uses the Windows CreateProcess() function.

args of subprocess.Popen()

class subprocess.Popen(args, bufsize=0, executable=None, 
                       stdin=None, stdout=None, stderr=None, 
                       preexec_fn=None, close_fds=False, 
                       shell=False, cwd=None, env=None, universal_newlines=False, 
                       startupinfo=None, creationflags=0)
    1. args:
      should be a sequence of program arguments or else a single string. By default, the program to execute is the first item in args if args is a sequence. If args is a string, the interpretation is platform-dependent. It is recommended to pass args as a sequence.

 

    1. shell:
      shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
      On Unix with shell=True, the shell defaults to /bin/sh.

      1. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them.
      2. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself. That is to say, Popen does the equivalent of:
        Popen(['/bin/sh', '-c', args[0], args[1], ...])
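The sequence case is easy to get wrong, so here is a small sketch of what it actually does. It assumes a POSIX system where the shell is /bin/sh. The strings "first" and "second" are arbitrary placeholders.

```python
import subprocess

# With shell=True and a sequence, only the first item is the command string.
# The remaining items become the shell's own parameters: $0, $1, and so on.
proc = subprocess.Popen(["echo $0 $1", "first", "second"],
                        shell=True, stdout=subprocess.PIPE)
output, _ = proc.communicate()
shell_args = output.decode().strip()
print(shell_args)
```

The extra items are not appended to the echo command line; they are visible only through $0 and $1 inside the command string. This is why a string is the recommended form whenever shell=True is used.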

 

    1. bufsize:
      if given, has the same meaning as the corresponding argument to the built-in open() function:

      1. 0 means unbuffered
      2. 1 means line buffered
      3. any other positive value means use a buffer of (approximately) that size
      4. A negative bufsize means to use the system default, which usually means fully buffered
      5. The default value for bufsize is 0 (unbuffered)

 

    1. executable:
      specifies a replacement program to execute. It is very seldom needed.

 

    1. stdin, stdout and stderr:
      1. specify the executed program’s standard input, standard output and standard error file handles, respectively.
      2. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None.
      3. PIPE indicates that a new pipe to the child should be created.
      4. With the default settings of None, no redirection will occur; the child’s file handles will be inherited from the parent.
      5. Additionally, stderr can be STDOUT, which indicates that the stderr data from the child process should be captured into the same file handle as for stdout.
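The redirection options above can be sketched as follows. The upper_filter command is hypothetical: a child Python interpreter that upper-cases whatever arrives on its standard input, used here in place of a real filter program.

```python
import subprocess
import sys

# Hypothetical filter: a child Python that upper-cases its standard input.
upper_filter = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]

proc = subprocess.Popen(upper_filter,
                        stdin=subprocess.PIPE,      # new pipe to the child
                        stdout=subprocess.PIPE,     # new pipe from the child
                        stderr=subprocess.STDOUT)   # stderr shares stdout's pipe

# communicate() writes the input, closes stdin, reads until EOF, then waits.
out, err = proc.communicate(b"hello pipe\n")
print(out)
```

Because stderr was sent to STDOUT rather than to its own PIPE, the second value returned by communicate() is None.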

 

    1. preexec_fn:
      if set to a callable object, this object will be called in the child process just before the child is executed. (Unix only)

 

    1. close_fds:
      if true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed (Unix only). On Windows, if close_fds is true then no handles will be inherited by the child process. Note that on Windows, we cannot set close_fds to true and also redirect the standard handles by setting stdin, stdout or stderr.

 

    1. cwd:
      if not None, the child’s current directory will be changed to cwd before it is executed. Note that this directory is not considered when searching for the executable, so we can’t specify the program’s path relative to cwd.

 

    1. env:
      if not None, it must be a mapping that defines the environment variables for the new process; these are used instead of inheriting the current process’ environment, which is the default behavior.
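      The cwd and env arguments can be tried out with a short sketch. The GREETING variable and the scratch directory are made up for illustration, and the child is a Python interpreter started through the absolute path in sys.executable, so the cwd setting does not affect how the program is found.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()

# Copy the parent's environment and add one hypothetical variable, GREETING.
child_env = dict(os.environ)
child_env["GREETING"] = "hello"

# The child reports its working directory and the variable it received.
report = "import os; print(os.getcwd()); print(os.environ['GREETING'])"
proc = subprocess.Popen([sys.executable, "-c", report],
                        cwd=workdir, env=child_env,
                        stdout=subprocess.PIPE)
out, _ = proc.communicate()
child_cwd, greeting = out.decode().splitlines()
print(child_cwd)
print(greeting)
```

      Starting from a copy of os.environ keeps variables such as PATH available to the child; passing a mapping that holds only GREETING would replace the environment entirely.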

 

    1. universal_newlines:
      if True, the file objects stdout and stderr are opened as text files in universal newlines mode. Lines may be terminated by any of '\n', the Unix end-of-line convention, '\r', the old Macintosh convention, or '\r\n', the Windows convention. All of these external representations are seen as '\n' by the Python program.

 

    1. startupinfo:
      if given, will be a STARTUPINFO object, which is passed to the underlying CreateProcess function. (Windows only)

 

    1. creationflags:
      if given, can be CREATE_NEW_CONSOLE or CREATE_NEW_PROCESS_GROUP. (Windows only)