As part of a big cleanup of my web sites, I am moving a wordpress.com site into my default Ghost Docker stack. It's a flying blog, with a few hundred MB of images. The problem is that Ghost is pretty fussy with its importer, so trying to load the Wordpress.com export directly into Ghost does not compute.
You can enable plugins in Wordpress.com if you pay them money and upgrade your plan to "Business". If you want to do that, go nuts, install the Ghost plugin, export in Ghost format and the job is done.
I'm going to use a temporary/staging Wordpress.org server. I have an install method for building a self-hosted app with Docker and Caddy for HTTPS, and the same process can easily be modified for a temporary wordpress.org server. The basic process I will be following is:
- Export from wordpress.com
- Import to a temporary wordpress.org
- Export with Ghost plugin
- Import to Ghost
Be warned, depending on the size of your blog, the temporary container will need a bit of underlying power to process the import from wordpress.com. On my usual small 1GB memory Lightsail server, it imported for a while before crashing.
I will use a higher powered server instead: Lightsail bills pro rata for temporary use, so I can go larger with 8GB and 2 CPU, then delete the server once finished. Watching the import while it ran, about 3.5GB of memory was in use in total (including buffers), so a 4GB server would probably do it.
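If you want to watch memory yourself while the import runs, something like the following is one way to do it. This is a minimal sketch, assuming a Linux host with /proc/meminfo; the function name mem_in_use_mb is my own invention, and it counts buffers and cache as "used", which is the same rough measure I quoted above.

```shell
# Print memory in use in MB (MemTotal minus MemFree, so buffers/cache count as used).
# Takes an optional path to a meminfo-style file; defaults to /proc/meminfo.
mem_in_use_mb() {
    awk '/^MemTotal:/ {total=$2}
         /^MemFree:/  {free=$2}
         END {printf "%d\n", (total - free) / 1024}' "${1:-/proc/meminfo}"
}

# Example: print a reading every 5 seconds while the import runs (Ctrl-C to stop).
#   while true; do mem_in_use_mb; sleep 5; done
```

Running free -m or docker stats in a second terminal gives a similar picture with less typing.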
Export from Wordpress.com
The basic exporter on wordpress.com allows you to export the data as a zipped XML, and the media as a tar file. It's the first one we want, "Export Content".
Create a temporary/staging Wordpress.org server
This isn't a tutorial about creating a Wordpress docker container, but it should not be too hard based on my Caddy & Ghost guide. And I have added a Docker Compose reference at the end of this post. However, note:
- Use an Apache based Wordpress Docker image for simplicity. At the time of writing, the image "wordpress:apache" should do it.
- You'll need a temporary domain to point at your temporary WP server. You can use a subdomain off one you already own, if that applies. Caddy brokers incoming connections and sends to the appropriate "internal" network container if you happen to be sharing a server.
On the Wordpress install screen, disable search engine crawling and note down the info in the other fields.
Once you have logged in, delete all posts, pages and comments. It probably doesn't hurt to go through the Settings too: General -> set the time zone and formats, Media -> uncheck organise my uploads, Comments -> uncheck allow people to submit comments. Also verify that Permalinks match the original wordpress.com URLs ("Day and Name"), which should be the default.
Import to your temporary Wordpress.org server
Prepare the import file
On your local drive, unzip the backed up "content" file, and among the extracted files you should have a small XML file.
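As a quick sanity check before importing, you can count the entries in the XML. This is a rough sketch: it assumes the export is in the usual Wordpress WXR layout where every post, page and attachment is an item element on its own line, and the file names in the comments are placeholders, not the real names of your export.

```shell
# Count <item> entries (posts, pages, attachments) in a Wordpress export XML.
# grep -c counts matching lines, so this assumes one <item> tag per line.
count_wxr_items() {
    grep -c '<item>' "$1"
}

# Example (placeholder file names):
#   unzip myblog-export.zip -d wp-export
#   count_wxr_items wp-export/myblog.xml
```

If the number is wildly different from what you expect from the .com dashboard, it is worth re-running the export before going any further.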
Run the importer
Install then run the Wordpress importer.
Import and you should get some options.
- You can choose to assign posts to a new author or an existing.
- Choose to download and install attachments. It will pull them from the .com locations and install in the local media library.
- Submit to run the import. This might take a long time for larger blogs. Note there is a risk your temporary server could crash if it's small. I recommend a minimum of 4GB memory.
- When you get the "All done" screen, have a look through the blog, and check the paths of everything. All the images should have links under your local URL.
- Optionally - back up the freshly imported server. I did this in Linux by going into /data, and doing a cp -rp on the two wptmp directories. E.g.
$ cd /data
$ cp -rp wptmp wptmp.bak
$ sudo cp -rp wptmp_mariadb wptmp_mariadb.bak
Cleanup the server ready for export
Remember Ghost exports only take Posts, Pages, text and images, and Tags. Not comments, shortcodes, themes or any of the other possible things out there. Any Categories will be ignored and you should convert any you want to keep to Tags before export. Ghost docs suggest a plugin called "Taxonomy Converter" but it looks a bit rubbish and it's very, very old.
I had a poke around and "Term Taxonomy Converter" seems a little better, but it's a low bar. I note that the built-in Import tools list a "Categories and Tags Converter" just above the Wordpress Importer; you might have better luck there.
Optional: I have no particular existing traffic to worry about, so now that I am free of the .com shackles, I am going to change to "Post name" permalinks. If you have established traffic you may not want to, as this will create a whole bunch of different paths and break backlinks or SEO.
Export from your staging site
Party time. Install the Ghost plugin on your temporary server and activate it. Can you smell the excitement?
I'm downloading a 100MB zip file.
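Before loading this into Ghost, it may be worth checking that the export no longer points at images hosted on the .com side. The sketch below makes two assumptions I have not verified for every site: that the old media URLs sit on a host ending in files.wordpress.com, and that the zip unpacks to a text (JSON) file you can grep. The function name and the file names in the comments are hypothetical; adjust the pattern if your old image URLs look different.

```shell
# Count references to wordpress.com-hosted media in a text file.
# Prints 0 when nothing is left pointing at the old site.
count_dotcom_refs() {
    grep -o 'files\.wordpress\.com' "$1" | wc -l | tr -d ' '
}

# Example (placeholder file names):
#   unzip ghost-export.zip -d ghost-export
#   count_dotcom_refs ghost-export/export.json
```

A non-zero count suggests some attachments were not pulled into the staging media library during the import.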
Import the staging site export to Ghost
In your target Ghost blog, delete everything; deleting the Ghost user should get rid of all the intro posts. I have turned off newsletters and set membership access to nobody. Optionally, you may wish to set the site to private (Settings -> General -> Make this site private) until you have a chance to compare paths and presentation.
I go to Settings -> Labs -> Import content and load up the file. There seems to be no progress indicator so I wait ...
The system resources on this smaller 1GB VPS seem to cope with a 100MB import file. After several minutes the screen returns with "Import successful with warnings." In my case, these were: a duplicate user, as I already had the blog one set up; and two Posts with dates in the wrong format, which it imported as Drafts. I can live with that.
The blog is photographic, so I have a chance to make use of the Ghost gallery mode now.
I edit the draft posts which are a little screwed up, and rebase the text based on the original .com site. I also use the Ghost article sidebar to set the publish date to match the original publish date and bring it back online. The post URLs look good.
The site basics look to be in place, so I clean up some old images with docker image prune -a, and delete the wptmp temporary Lightsail server I created.
Remove subdomains no longer used.
Remove or set to private the original wordpress.com site.
I snapshot backup the new server ahead of many smaller tweaks to come.
And the longer task.. much tweaking of the new site. Navigation, CSS, favicon, image reorgs etc etc.
I hope this road has been worth following! All the best.
Docker Compose sample file for a temporary Wordpress instance
If you are going to use these examples, first
cd /data ; mkdir wptmp wptmp_mariadb
sudo chown www-data wptmp
sudo chown 999:root wptmp_mariadb
- Use a custom password in the file
- Point the domain to the parent Caddy instance
- Set up Caddy and Caddyfile to go to wptmp on port 80
- Remember what you have done in order to remove it all after completion. Also check out the docker prune commands, such as docker system prune.
wptmp:
  image: wordpress:apache
  restart: unless-stopped
  user: "33"
  environment:
    WORDPRESS_DB_HOST: wptmp_mariadb # This must match the DB service name
    WORDPRESS_DB_NAME: wptmp
    WORDPRESS_DB_USER: wptmp_user
    WORDPRESS_DB_PASSWORD: secure_pass_123
  volumes:
    - /data/wptmp:/var/www/html
  networks:
    - internal
  depends_on:
    - wptmp_mariadb

wptmp_mariadb: # This DB service name is also the DB Host name
  image: mariadb:10
  command: --max-allowed-packet=128MB
  restart: unless-stopped
  volumes:
    - /data/wptmp_mariadb:/var/lib/mysql
  environment:
    MARIADB_ROOT_PASSWORD: secure_pass_123
    MYSQL_DATABASE: wptmp # This name is internal to the DB
    MYSQL_USER: wptmp_user
    MYSQL_PASSWORD: secure_pass_123
  networks:
    - internal
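For the Caddy step in the list above, the site block can be tiny. Here is a sketch that just prints one for a given domain so you can paste it into your Caddyfile. It assumes Caddy v2 (the reverse_proxy directive) and that Caddy shares the "internal" network with the wptmp container; wptmp.example.com is a placeholder domain and caddy_site_block is a name I made up.

```shell
# Print a Caddyfile site block that proxies a domain to the wptmp container on port 80.
caddy_site_block() {
    domain="$1"
    printf '%s {\n    reverse_proxy wptmp:80\n}\n' "$domain"
}

# Placeholder domain; use your own temporary subdomain.
caddy_site_block wptmp.example.com
```

Append the output to your existing Caddyfile and reload Caddy; it should then request a certificate for the temporary domain the same way it does for your other sites.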