bdutil is a command-line script for managing Hadoop instances on Google Compute Engine. It handles deployment, configuration, and shutdown of those instances.


bdutil depends on the Google Cloud SDK. It is supported in any POSIX-compliant Bash shell, version 3 or greater.
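
The two prerequisites above can be checked before running bdutil. The following is a minimal sketch, not part of bdutil itself: the helper names `major_is_supported` and `cloud_sdk_is_installed` are hypothetical, and the check assumes the Cloud SDK exposes the `gcloud` command on your PATH.

```shell
# Sketch: check the prerequisites for bdutil (Bash 3+ and the Cloud SDK).
# Both helper names are hypothetical and exist only for illustration.

# Succeeds when the given major version number is 3 or greater.
major_is_supported() {
  [ "${1:-0}" -ge 3 ]
}

# Succeeds when the Cloud SDK's gcloud command is found on the PATH.
cloud_sdk_is_installed() {
  command -v gcloud >/dev/null 2>&1
}

# In Bash, BASH_VERSINFO holds the major version in its first element;
# in other shells the variable is unset and the check falls back to 0.
major_is_supported "${BASH_VERSINFO:-0}" \
  || echo "bdutil requires Bash version 3 or greater" >&2

cloud_sdk_is_installed \
  || echo "Google Cloud SDK (gcloud) was not found on the PATH" >&2
```

The script only prints warnings; it does not exit, so it can be sourced from an existing setup script without side effects.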


See the command-line deployment to learn how to set up your Hadoop instances using bdutil.

  1. Install the Google Cloud SDK if not already installed.
  2. Install bdutil.

    Follow these steps to install bdutil on your computer:

    1. Get a copy of the bdutil package in one of these ways:
      1. Download the zip file containing the latest stable bdutil release.
      2. Clone the bdutil GitHub repository.
    2. If you downloaded the .zip file, navigate to the download location and extract the file.
    3. Open the bdutil folder (it will be named bdutil-version).

    Now you have the bdutil scripts installed on your local computer.

  3. Modify the following variables in the file:
    • PROJECT – Set to the project ID for all bdutil commands. The project value is resolved in the following order of precedence (1 overrides 2, and 2 overrides 3):
      1. -p flag value, or if not specified then
      2. PROJECT value in, or if not specified then
      3. gcloud default project value
    • CONFIGBUCKET – Set to a Google Cloud Storage bucket that your project has read/write access to.
  4. Run bdutil --help for a list of commands.
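
The override order for PROJECT described in step 3 can be modelled with a small shell function. This is only an illustration of the precedence rule, not bdutil's actual implementation; the function name `resolve_project` and the sample project IDs are hypothetical.

```shell
# Sketch of the PROJECT precedence rule (hypothetical helper, not bdutil code).
#   $1 = value passed with the -p flag (empty if not given)
#   $2 = PROJECT value from the configuration file (empty if not set)
#   $3 = gcloud default project (empty if not set)
# Prints the first non-empty value, in that order.
resolve_project() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  elif [ -n "$2" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$3"
  fi
}

# The -p flag wins over everything else.
resolve_project "flag-project" "file-project" "gcloud-project"

# Without -p, the PROJECT variable from the file is used.
resolve_project "" "file-project" "gcloud-project"

# With neither, the gcloud default project is the fallback.
resolve_project "" "" "gcloud-project"
```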

The script implements the following commands, which are very similar:

  • bdutil create creates and starts instances, but does not apply most configuration settings. You can call bdutil run_command_steps on the instances afterward to apply configuration settings to them. You would typically use bdutil deploy instead.
  • bdutil deploy creates and starts instances with all the configuration options specified in the command line and any included configuration scripts.
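
The difference between the two commands can be sketched as two invocation paths. The example below is a dry run by default, so it only prints the commands; the `run` wrapper, the `DRY_RUN` variable, and the project ID `my-project-id` are hypothetical, and placing `-p` before the command name is an assumption that should be confirmed with bdutil --help.

```shell
# Sketch: one-step deploy versus create followed by run_command_steps.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Placeholder project ID, not a real project.
PROJECT_ID="my-project-id"

# Usual path: create the instances and apply all configuration in one step.
run bdutil -p "$PROJECT_ID" deploy

# Alternative path: create the instances first, then apply configuration.
run bdutil -p "$PROJECT_ID" create
run bdutil -p "$PROJECT_ID" run_command_steps
```

Setting DRY_RUN=0 runs the commands for real, which requires bdutil to be on your PATH and creates billable instances.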

Components installed

The latest release of bdutil is 1.3.3. This release of bdutil installs the following versions of open source components:

  • Hadoop 2 – 2.7.1
  • Spark – 1.5.0
  • Pig – 0.12
  • Hive – 1.2.1