Import from Microsoft Word

Importing from Microsoft Word is complex because Word is not a structured format, so the results depend heavily on the quality of the input. The most important factor is usually whether Word styles have been applied consistently.

There are three options for importing from Microsoft Word:

  • Use the Import Wizard to directly import the MS Word content. It can handle most well-structured Word documents.

  • Convert to DocBook using the (optional) Oxygen XML editor with the Paligo plugin.

  • Convert to XML with a purchased customization from support. This is used if your content is not easily imported directly. For very large documents or many documents, this method is sometimes preferable.

To import content from Microsoft Word into Paligo, you will need to apply certain styles to your content and make some adjustments. This is because MS Word content is unstructured, and it needs to be given some form of structure before it can be imported into Paligo.

We recommend that you save a copy of your Word file for use as a Word document. Then use another copy for the Paligo import.

To prepare your MS Word file for import, complete the following tasks.

We recommend that you give your MS Word document a title. To create the title, add your title text and apply the Title style to it. For more information on using styles, see the official Microsoft Word documentation.

Microsoft-word-title.png

If you try to import a Word document that does not have a title, Paligo will apply a default title, such as "Article d3e1".

In some cases, MS Word documents that have no title may cause the import process to fail. If this happens, you may see the following error message:

"The import file could not be created. Nothing to import. The intermediate transformation resulted in an empty document."

Paligo uses the style names from MS Word to map your Word content to the appropriate elements in Paligo. For example, Paligo uses the Heading 1 style in Word to determine which parts of a Word document should be a top-level topic in Paligo.

For the import and mapping to work, it is important that you use the styles for formatting your MS Word content. You can create and apply styles from the Styles panel in Word:

Styles panel from Microsoft Word document. It has a current style field, a new style button, a select all button and a list of styles to choose from.

To work with Paligo, the styles need to have specific names:

  • Create a style called Title and apply it to the name of your document, for example, the title on the front cover. Make sure that the title is the first text that appears in the document.

    Note

    If you do not give your document a title, the import may fail, or the publication will get a default name like "Article d4e1" when you import the document into Paligo.

  • If your Word document has a subtitle, create a Subtitle style and apply it to the subtitle text.

  • Make sure your headings use styles named Heading 1 through to Heading 6.

  • If you have code examples, create a paragraph style called Source Code and apply it to the code text. This has to be a paragraph style and not an inline style that is only applied to words inside a sentence.

The names of the styles are important for the Paligo import, so please make sure they match the names given above.
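As a rough pre-import check, the style names can be verified with a short script. This is a minimal sketch, not part of Paligo: it assumes the .docx file is a standard Office Open XML package with `word/styles.xml` and `word/document.xml`, and the function names are hypothetical. The comparison ignores case because Word may store built-in names in lowercase (for example "heading 1") in the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx packages.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def used_paragraph_style_names(docx_path):
    """Return the display names of the paragraph styles applied in a .docx file."""
    with zipfile.ZipFile(docx_path) as docx:
        styles = ET.fromstring(docx.read("word/styles.xml"))
        body = ET.fromstring(docx.read("word/document.xml"))

    # Map internal style IDs (such as "Heading1") to display names.
    id_to_name = {}
    for style in styles.iter(W + "style"):
        name = style.find(W + "name")
        if name is not None:
            id_to_name[style.get(W + "styleId")] = name.get(W + "val")

    # Paragraphs reference styles by ID, so translate each ID to its name.
    used = set()
    for p_style in body.iter(W + "pStyle"):
        style_id = p_style.get(W + "val")
        used.add(id_to_name.get(style_id, style_id))
    return used


def missing_styles(docx_path, required=("Title", "Heading 1")):
    """Return the required style names that no paragraph in the document uses."""
    used = {name.lower() for name in used_paragraph_style_names(docx_path)}
    return [name for name in required if name.lower() not in used]
```

For a document with code examples, you could call `missing_styles("manual.docx", required=("Title", "Heading 1", "Source Code"))`, where `manual.docx` is a placeholder file name. An empty list means every required style is applied at least once.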

Tip

To find out how to set up styles, see the Microsoft documentation.

Paligo uses the hierarchy of heading styles to determine which parts of your document should be converted into top-level topics, second-level topics, third-level topics, and so on. For this to work correctly, it is important that you use the heading levels as intended and consistently in your MS Word doc.

  • Use Heading 1 for chapter headings

  • Use Heading 2 for first-level subsections in chapters

  • Use Heading 3 for second-level subsections in chapters

  • Use Heading 4 for third-level subsections in chapters

Always use them in this order and do not skip a heading level, for example, do not have a Heading 1 followed by a Heading 3. The sequence of Headings should always be Heading 1, Heading 2, Heading 3, and so on.
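The rule about not skipping heading levels can be expressed as a small check. The sketch below is illustrative only and is not how Paligo performs the import: `find_skipped_levels` is a hypothetical helper that takes the paragraph style names in document order and reports every heading that is more than one level deeper than the heading before it.

```python
import re


def find_skipped_levels(style_names):
    """Return (index, previous_level, level) for each heading that skips a level."""
    problems = []
    previous = 0  # level 0 stands for the document title
    for index, name in enumerate(style_names):
        match = re.fullmatch(r"heading ([1-6])", name.strip().lower())
        if not match:
            continue  # not a heading style, so it does not affect the hierarchy
        level = int(match.group(1))
        if level > previous + 1:
            problems.append((index, previous, level))
        previous = level
    return problems


# Example: a Heading 1 followed directly by a Heading 3 is reported.
print(find_skipped_levels(["Title", "Heading 1", "Heading 3", "Heading 2"]))
```

Moving back up the hierarchy (for example from Heading 3 to Heading 1) is not flagged, because only a jump to a deeper level can skip a level.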

Typically, technical communication is organized into three or four levels of content for ease of use. If you have more levels, you can use Heading 5 and Heading 6. But if you need to use Headings 5 and 6, it could suggest a problem with the information architecture. It may be worth reconsidering the structure and reorganising it into 3 or 4 levels maximum.

Your MS Word document most likely contains some features that are needed for Word, but they are either not needed in Paligo or they cannot be imported. These include page breaks, a table of contents, and review comments.

  • Paligo cannot import text that is inside frames. If you have text in frames, you will need to move it into the regular content or it will be lost.

  • If there are any active review comments or tracked changes, make sure the comments are resolved and the changes are accepted before you import.

  • Remove the header and footer content.

  • Remove all page breaks and section breaks.

  • Remove the Word table of contents (TOC). Paligo will generate its own table of contents when you publish in Paligo.

Note

In some cases, graphics on the front cover can be a problem with importing. If you have front cover graphics, it can be a good idea to remove them and then add them to Paligo manually after the import.
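To spot leftovers before importing, you could scan the main document part for the corresponding WordprocessingML markup. This is a sketch under assumptions: `find_leftovers` is a hypothetical helper, it only inspects `word/document.xml` (so headers and footers, which live in separate parts, are not covered), and the counts are rough indicators rather than exact totals.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx packages.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def find_leftovers(docx_path):
    """Return a dict of Word-only features still present in the document body."""
    with zipfile.ZipFile(docx_path) as docx:
        body = ET.fromstring(docx.read("word/document.xml"))

    def count(tag):
        return sum(1 for _ in body.iter(W + tag))

    counts = {
        # Insertions and deletions recorded by track changes.
        "tracked changes": count("ins") + count("del"),
        # Anchors that point to review comments.
        "comments": count("commentReference"),
        # Manual page breaks.
        "page breaks": sum(
            1 for br in body.iter(W + "br") if br.get(W + "type") == "page"
        ),
        # Section breaks are stored in the paragraph properties of the
        # paragraph that ends the section.
        "section breaks": sum(
            1 for ppr in body.iter(W + "pPr") if ppr.find(W + "sectPr") is not None
        ),
        # Framed paragraphs and text box content.
        "frames or text boxes": count("framePr") + count("txbxContent"),
        # Complex fields whose instruction starts with TOC.
        "table of contents fields": sum(
            1
            for instr in body.iter(W + "instrText")
            if (instr.text or "").strip().upper().startswith("TOC")
        ),
    }
    return {label: n for label, n in counts.items() if n}
```

An empty dictionary suggests that none of these features remain in the body, but it is still worth checking the document visually in Word.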

In Microsoft Word, there are different ways to add content that is indented inside a list item (or is not indented, but really should be!). This can make it difficult to import in a consistent way.

For the cleanest import, we recommend that you format your indented list item content like this in MS Word:

  1. Create a new list item for the indented content.

  2. Add the text or image for the indented content in the new step.

  3. Position the cursor at the start of the new list item and press backspace. The list item formatting is removed so that the content is now indented as part of the previous step.

When you create indented content in MS Word in this way, Paligo will import the list and give it the expected structure:

correct-list-structure.png

Note that each list item has a listitem element and then the text for the list item is inside a para element. Here, we have imported a list that has an indented image for the third list item. The image uses a mediaobject element in Paligo and you can see that it is inside the listitem.

correct-list-structure-indented-image.png

For indented paragraphs, the paragraph is inserted in a para element inside the list item:

correct-list-structure-indented-para.png

You could have sublists indented in the same way. The important part is that the content is nested inside the listitem to which it relates.

Note

If you have content in a list and it was indented in a different way in MS Word, it may import as:

  • Regular paragraphs that break the flow of the list.

  • Literallayout elements, which are valid Paligo XML, but are not the recommended way to structure lists. In the editor, the list item can look like code, for example:

    literallayout-import.png

In both cases, you can use the Paligo editor to fix the issues. You can use the XML tree to indent the regular paragraphs and create new listitems for the literallayouts and move the content into those. Or you may prefer to address the problems in the MS Word document and re-import the content, so that the issues are fixed at the source.

Paligo can import tables from Microsoft Word, but for best results we recommend that you:

  • Avoid using large tables that cover multiple pages in MS Word.

    While large tables can import correctly, they may go beyond the boundaries of the Paligo editor, making them more difficult to edit (you will need to use the source code editor). It is better to try and break large tables down into smaller sets of tables. This makes them more usable in both Paligo and MS Word.

    When redesigning your tables, think about what information your users need from each table. Do they need to compare items? If yes, then can those related items be organized into smaller groups, perhaps smaller tables organized by product type or product name? If no, then think about making smaller tables for groups of related items rather than one oversized table.

  • Try to use simple tables where possible.

    Tables with merged cells will sometimes import as two separate columns, depending on how they are formatted in the background code in MS Word. The more complex the table, the greater the chance that you will need to manually edit the table in Paligo to get the result you want.

  • Use tables appropriately.

    Tables are a good option for presenting data, but are sometimes used for information that would work better as a regular paragraph. For example, if you have a cell with many paragraphs in it, then this could be a good candidate for being a section in its own right. The table cell would then just need a link to the appropriate section.

    If you intend to publish to the web, it is also worth noting that tables can be more difficult to use on small devices such as smartphones. This is especially true of larger tables where vertical and horizontal scrolling is needed.

If your code examples use a customized style in your MS Word document, the style must be renamed to Source Code before the document is imported into Paligo. Otherwise, the code listings that use the customized style will not be imported.

Tip

For more information about styles in MS Word, see the Microsoft Support page.

In this instruction, the customized style is called "Linux Commands".

  1. Open the document in MS Word.

  2. Open the Styles pane.

    MS_Word_StylesPane_small.png
  3. Locate the customized commands style (in this example called "Linux Commands").

  4. In the context menu, choose Modify Style.

  5. Rename the style to "Source Code" (instead of "Linux Commands").

    MSWord_StylesPane_ModifyStyle_small.png
  6. Select OK.

  7. Save the document, zip it and import it to Paligo.

    All content using the customized commands style should now be imported as programlisting elements, since "Source Code" is the style name recognized by the tool that converts from Word.

The simplest method is to import the MS Word files directly by using the Import Wizard. Paligo can handle most well-structured Word documents.

  1. Zip each individual Word file. Do not include multiple Word files in one zip file.

    Tip

    To find out about zipping files, refer to the help for your operating system.

  2. Use the Import Wizard to import the zip files.

    Select Word (.docx) as the type of file to import.
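If you have many Word files, zipping each one individually can be scripted. The following is a minimal sketch using Python's standard `zipfile` module; `zip_each_docx` is a hypothetical helper, and it assumes all the .docx files are in one folder. It creates one zip file per Word file, in line with the rule above, and skips the temporary `~$` lock files that Word creates while a document is open.

```python
import zipfile
from pathlib import Path


def zip_each_docx(folder):
    """Create one .zip per .docx file in the folder and return the zip paths."""
    created = []
    for docx in sorted(Path(folder).glob("*.docx")):
        if docx.name.startswith("~$"):
            continue  # temporary lock file created by Word, not a real document
        archive = docx.with_suffix(".zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            # Store only the file name so the zip has no folder structure.
            zf.write(docx, arcname=docx.name)
        created.append(archive)
    return created
```

Each resulting zip file can then be uploaded separately in the Import Wizard.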

By using Oxygen XML editor, you can convert the content to a DocBook document before importing it.

  1. In Oxygen, create a DocBook 5.1 Article, by selecting File > New, and then selecting the proper DocBook template.

    CreateArticle.png
  2. Remove the first "sect1" element.

    SelectSect1.png
  3. Insert a new section element by pressing Enter on your keyboard and selecting Section in the element list. This element will not really be used, and we'll remove it at the end, but it is needed because of a quirk in Oxygen.

    InsertSection.png
  4. Place the cursor inside the section element.

  5. Save the document with any name you choose.

    It is however important to save it with a name, otherwise images will not be properly saved.

  6. Copy the text from your Word document. You should leave out any Table of Contents or similar, since it won't be needed.

  7. Paste the content into the section element in Oxygen.

    You will get a warning saying that it needs to place it inside the closest Article element. Go ahead and accept that.

    OxygenWarning.png
  8. When you paste the content, it will be automatically converted to the proper XML elements.

  9. Remove the empty section tag. It will be at the bottom or at the top of the document.

  10. Save the document again.

  11. Use the Import Wizard to import the resulting DocBook document.

The third method involves a preconversion to XML using a script package, which requires a purchased customization. It is slightly more complex, but it gives good results. It is especially suitable for very large documents or large numbers of documents, as it is very fast and can be tuned further to suit your content.

Note

Contact Paligo support if you would like to use this method instead of the above procedures.

Depending on the complexity of your content and of mapping its structure, there may be a charge for this conversion.

If you are experiencing problems with the MS Word import, you may find useful information about the most common issues in the following sections.

Note

If you are experiencing different types of problems with your MS Word import, please contact customer support for help.

If the import process does not complete, make sure you have prepared the document as described in Prepare MS Word Document for Import. As a minimum, it needs to have a title and use headings correctly.

Note

If you continue to experience problems, contact customer support for help.

If Paligo's Word import process completes successfully but the import folders contain no topics, or some topics are missing, it is likely due to incorrect formatting.

Make sure that your Word document has a title and uses headings correctly, as described in Prepare MS Word Document for Import.

If you prepare your MS Word document correctly, the import should create a topic for each section of content that has a heading. If you continue to experience problems, contact customer support for help.

If you imported a table from MS Word and the table extends beyond the boundaries of the Paligo editor, this is due to the table being too wide. You can still edit it in Paligo - use the scroll bar at the bottom of the Paligo editor to scroll horizontally to the additional cells.

For HTML outputs, Paligo will add a scrollbar feature to the web page so that your readers can scroll to cells beyond the display area.

For PDF outputs, the table will go off the edge of the page. For this reason, you should consider setting the table to display as landscape rather than portrait, see Rotate a Table (PDF). You may also need to redesign the table if it is too large for the page size, for example, you may need to create several smaller tables instead.

If you imported content from MS Word into Paligo and the list numbering is incorrect, the first thing to do is publish the content. The published content may have the correct numbering. This is because Paligo cannot use some of Word's numbering systems in the editor, but it can give a list item an override attribute with a number. This number is only applied when the content is published.

Another potential cause of incorrect numbering is that a list may be imported as several lists rather than one list. This can happen if the MS Word content has text or images that break the flow of the list and you used Word's "continue numbering" feature to make the list look correct. For example, let's say you had this list in MS Word:

A numbered list with list items numbered 1 to 3. The third list item is followed by an image that is at the same level as the entire list. It is followed by another list item that is numbered 4.

The image has been added outside of the list item for step 3. This breaks the flow of the list, and step 4 is actually a new list. Word's "continue numbering" feature makes it look like a continuation of the previous list.

When this is imported into Paligo, you get step 4 numbered as 1, like this:

A numbered list with list items numbered 1 to 3. The third list item is followed by an image that is at the same level as the entire list. It is followed by another list item that is numbered 1.

This is because step 4 is actually a new list and the "continue numbering" feature is not recognised.

To solve this, you can either:

  • Fix it in the MS Word file

    We recommend that you insert the image (or any text between steps) as a new list item and then use backspace to delete the numbering. This will give you an indented image (or text) that is part of the preceding list item, which is correct formatting. Then step 4 will be a list item in the same list rather than a new list.

    For details on adding content inside a list item, see Prepare MS Word Document for Import.

  • Use the XML tree in Paligo to move the image inside listitem 3. Then add list item 4 to the end of the first list. Finally, remove the extra orderedlist element (it represents the second list).

    XML tree view of an ordered list. At the top level it has ordered list, mediaobject, and then another ordered list. The mediaobject is incorrectly at the top level. It should be inside the third list item in the first orderedlist.
    XML tree view of an ordered list. At the top level it has ordered list. It has 4 list items at the second level. Inside the third list item there is a para and a mediaobject. Inside the 4th list item there is a para.

    Before (left) and after (right). The before image shows the imported structure. The after image shows the corrected structure.
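The same restructuring can be sketched in code to make the target structure explicit. This is an illustration only, not a Paligo feature: it assumes the topic XML is available as a plain (non-namespaced) element tree, and `merge_split_lists` is a hypothetical helper. It only merges two ordered lists when nothing but mediaobject elements sits between them, since a paragraph between two lists may mean they really are separate lists.

```python
import xml.etree.ElementTree as ET


def merge_split_lists(parent):
    """Merge 'orderedlist, mediaobject(s), orderedlist' siblings into one list."""
    first, stray = None, []
    for child in list(parent):
        if child.tag == "orderedlist":
            if first is not None and stray and len(first):
                last_item = first[-1]  # the list item the stray content belongs to
                for element in stray:
                    parent.remove(element)
                    last_item.append(element)
                # Move the items of the second list to the end of the first list.
                first.extend(list(child))
                parent.remove(child)
                stray = []
            else:
                first, stray = child, []
        elif child.tag == "mediaobject" and first is not None:
            stray.append(child)
        else:
            first, stray = None, []  # other content ends the candidate run
    return parent


section = ET.fromstring(
    "<section>"
    "<orderedlist>"
    "<listitem><para>Step one.</para></listitem>"
    "<listitem><para>Step two.</para></listitem>"
    "<listitem><para>Step three.</para></listitem>"
    "</orderedlist>"
    "<mediaobject/>"
    "<orderedlist><listitem><para>Step four.</para></listitem></orderedlist>"
    "</section>"
)
merge_split_lists(section)
print(ET.tostring(section, encoding="unicode"))
```

After the call, the section contains a single orderedlist with four list items, and the mediaobject is inside the third list item after its para.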

If your lists in MS Word use a soft return (shift + enter) to place content on the next line, it will be imported into Paligo inside a literallayout element. You can see it in the Paligo editor as there is a shaded box around the content and, if you select it, you can see the literallayout element in the Element Structure Menu.

Paligo topic containing a numbered list. The first list item has a shaded box around it. In the element structure menu, the literallayout element is highlighted.

While this is valid, it is not the way text and images are usually added to a list item in Paligo.

The more commonly used structure for this content is:

<orderedlist>
  <listitem>
    <para>This is step one.</para>
    <para>In MS Word, this line was created by using a soft return at the end of the previous line.</para>
  </listitem>
  <listitem>
    <para>This is step two.</para>
  </listitem>
</orderedlist>

So the difference is that the extra line of text in the step is inside an additional para element rather than a literallayout element. If the content in the list was an image, it would be inside a mediaobject structure instead. But in both cases, they should be inside the list item.

To correct your content you can either:

  • Edit the lists in MS Word

    Remove the soft returns and then add a new list item for the text or image. This will create an extra step that you do not want, but that is intended for now. Next, position the cursor at the start of the new list item and press backspace. This will make the text/image an indented part of the previous list item. In this form, it will import into Paligo cleanly, with no literallayout element.

    For details on adding content inside a list item, see Prepare MS Word Document for Import.

  • Edit the lists in Paligo

    For text between lists, add a para element inside the listitem element and then add your text content to that.

    For images between lists, either insert an image inside the listitem element or use the XML tree to move the mediaobject element into the listitem.

Tip

To learn how to

Paligo imports Microsoft Word numbered lists as ordered lists. If you want them to be imported as procedures, it may be possible to do that as a customization project (there is usually a fee for customization projects). Please contact customer support for details.

The Paligo editor always shows tables as portrait. For HTML outputs, wide tables are given a scroll bar so that users can access all of the data in the table. For PDF outputs, you can set them to display in landscape mode instead, see Rotate a Table (PDF).