Drupal 8: Migrating data from JSON files

The beauty of Drupal 8's built-in Migrate module is its flexibility. Lots of people will likely use it to migrate their Drupal 6 or 7 sites, but that's not all it can do. In fact, it's capable of migrating data from just about any data source PHP can read.

The first few times I used Migrate in Drupal 8, I was migrating data from a MySQL database into Drupal. See my previous posts about the topic here.

Recently, my team had a new task: Migrating data from a series of JSON files into Drupal nodes. As the resident data migration enthusiast here at Metal Toad, I decided to take on this task. It was a unique challenge and let me explore some different parts of the Migrate infrastructure.

Let's say we have a series of JSON files, which are all in a single directory. There might be hundreds of them, and they all have the same structure:

{ 
'title': "page 1" 
'body': '<div class="post">Lorem Ipsum dolor sit amet...</div>' 
'date': '2017-01-19' 
}

In real life, there will probably be more fields than this, but we're keeping it short for simplicity.

Our Custom Module

For this exercise, we're going to put our migration into a custom module called "json_migrate". The module's YAML file contains just the basic information:

File drupal/modules/custom/json_migrate/json_migrate.yml:

name: Migrate from JSON files
description: Custom migration module to import data from JSON files
package: Migration
type: module
core: 8.x
dependencies:
  - drupal:migrate
  - migrate_plus:migrate_plus
  - migrate_tools:migrate_tools

The Migration Definition

Our migration definition YAML file is fairly simple. It defines a custom source plugin (json_page) and the built-in "entity:node" destination plugin. The field mappings look just like any other migration.

File drupal/modules/custom/json_migrate/config/install/migrate_plus.migration.json_page.yml:

id: json_page 
label: JSON Page Migration 
Group: json_example 
source: 
  plugin: json_page 
destination: 
  plugin: entity:node 
  bundle: article 
process: 
  type: 
    plugin: default_value 
    default_value: article 
  title: title 
created: date 
  changed: date 
  'body/format': 
    plugin: default_value 
    default_value: basic_html 
  'body/value': body
  langcode: 
    plugin: default_value 
    default_value: en
  uid: 
    plugin: default_value 
    default_value: 1
 
# use forced module dependency so uninstall/reinstall works properly 
dependencies: 
  enforced: 
    module: 
      - json_migrate 

You can see above how the fields in the JSON are going to map to the Drupal fields. Since our JSON data doesn't include data like the user ID and language code, we're providing default values for these fields.

The Source Plugin

The interesting bit in this migration is the source plugin. It needs to provide its own iterator logic, which will loop through the JSON files in the file system and enumerate them as objects which can be migrated.

File drupal/modules/custom/json_migrate/src/Plugin/migrate/source/JsonPage.php:

<?php 
namespace Drupal\json_migrate\Plugin\migrate\source; 
use Drupal\migrate\Plugin\migrate\source\SourcePluginBase; 
use Drupal\migrate\Row; 
 
/** 
* Source plugin to import data from JSON files 
* @MigrateSource( 
*   id = "json_page" 
* ) 
*/ 
class JsonPage extends SourcePluginBase { 
 
  public function prepareRow(Row $row) { 
    $title = $row->getSourceProperty('title');  
    // make sure the title isn't too long for Drupal 
    if (strlen($title) > 255) { 
      $row->setSourceProperty('title', substr($title,0,255)); 
    }  
    return parent::prepareRow($row); 
  } 
 
  public function getIds() { 
    $ids = [ 
      'json_filename' => [ 
        'type' => 'string' 
      ] 
    ]; 
    return $ids; 
  } 
 
  public function fields() { 
    return array( 
      'url' => $this->t('URL'), 
      'title' => $this->t('Title'), 
      'body' => $this->t('Body'), 
      'date' => $this->t('Date Published'), 
      'json_filename' => $this-t{"Source JSON filename") 
    ); 
  } 
 
  public function __toString() { 
    return "json data"; 
  } 
 
  /** 
   * Initializes the iterator with the source data. 
   * @return \Iterator 
   *   An iterator containing the data for this source. 
   */ 
  protected function initializeIterator() { 
 
    // loop through the source files and find anything with a .json extension 
    $path = dirname(DRUPAL_ROOT) . "/source-data/*.json"; 
    $filenames = glob($path); 
    $rows = []; 
    foreach ($filenames as $filename) { 
      // using second argument of TRUE here because migrate needs the data to be 
      // associative arrays and not stdClass objects. 
      $row = json_decode(file_get_contents($filename), true); // sets the title, body, etc. 
      $row['json_filename'] = $filename;
 
      // migrate needs the date as a UNIX timestamp 
      try { 
        // put your source data's time zone here, or just use strtotime() if it's already in UTC 
        $d = new \DateTime($date, new \DateTimeZone('America/Los_Angeles')); 
         $row['date'] = $d->format('U'); 
      } catch (\Exception $e) { 
        echo "Exception: " . $e->getMessage() . "\n"; 
        $row['date'] = time();  // fallback – set it to now so we don't have errors 
      } 
 
      // append it to the array of rows we can import 
      $rows[] = $row; 
    } 
 
    // Migrate needs an Iterator class, not just an array
    return new \ArrayIterator($rows); 
  } 
} 

Our Source plugin class extendsSourcePluginBase. This means we need to create our own iterator object, which provides all the rows to import. In the initializeIterator() method, we loop through the JSON files on the file system and do some work on their contents:

  • Parse the contents into an associative array
  • Convert the date string to a Unix timestamp
  • Save the filename into the array so we can use it later
  • Add the resulting array to our $rows array

Once we have looped through all the files, we create anArrayIteratorfrom the result. Migrate needs an iterator instead of a simple array. Fortunately, PHP's built-in \ArrayIterator class does the trick.

Up in the prepareRow() method, we don't need to do much. The JSON decoding has already been done ininitializeIterator(). All we do here is truncate the title field if it's longer than the maximum 255 characters of a standard Drupal node title. (We could have even done that ininitializeIterator() if we had wanted.)

Why did we do the decoding in ininitializeIterator() instead of doing all the transformation work in prepareRow()? During my initial testing, the getIds() method was throwing errors if the ID fields weren't set in ininitializeIterator(), so I saved myself a headache and moved most of the processing into ininitializeIterator().

Now we can enable our new module:

drush en-y json-migrate

Running drush migrate-status will show us our new migration:

Group: JSON Example (json_example) Status Total Imported Unprocessed Last imported
json_page                          Idle   123   0        123

And we can run our migration with:

drush migrate-import json_page

This will loop through each of the JSON files and import the data. Go to Admin -> Content of your Drupal site. The new nodes should be visible there.

Happy coding!

Comments

Hi,
thank you, this is great example.
Will you create other one for taxonomy ?

After several tutorials, I cannot get this to work. Yours is the only one that includes a plugin. That begs the question: is it necessary to have a source file. Not even the 'advanced' sample that comes with migrate_plus has one.

Thank you for the great article!
Should the name be `drupal/modules/custom/json_migrate/json_migrate.info.yml`. I think the `info` part is missing from the module.

filemodules/custom/json_migrate/config/install/migrate_plus.migration.json_page.yml : A colon cannot be used in an unquoted mapping value at line 26 (near "
"). in C:\Users\Administrador\Desktop\acquia\chart\core\lib\Drupal\Core\Config\FileStorage.php:117

Thanks for the example. I'm new to D8 but used to do a lot with D7/drush. I reproduced these module files and renamed the .yml to .info.yml but I get this:

`The website encountered an unexpected error. Please try again later.`

Add new comment

Restricted HTML

  • Allowed HTML tags: <a href hreflang> <em> <strong> <cite> <blockquote cite> <code> <ul type> <ol start type> <li> <dl> <dt> <dd> <h2 id> <h3 id> <h4 id> <h5 id> <h6 id>
  • You can enable syntax highlighting of source code with the following tags: <code>, <blockcode>, <cpp>, <java>, <php>. The supported tag styles are: <foo>, [foo].
  • Web page addresses and email addresses turn into links automatically.
  • Lines and paragraphs break automatically.

About the Author

Keith Dechant, Software Architect

Keith has been working on the web almost since it was new, writing his first HTML in 1996 and his first PHP code in 1999. Along the way, he has written everything from e-commerce websites to mailing list software to large web applications for industrial and non-profit organizations. Recently, he has done a lot of work with ASP.NET, Django, Drupal 7 and 8, and AngularJS.

Keith is a native of Illinois who moved to Portland in 2009. He has been to 49 US states, five continents, three former Soviet republics, and two countries that no longer exist. He likes hiking, mountain climbing, classic video games, and the Oxford comma.

Ready for transformation?