Load Mode

Related Topics:  Selecting different modes  |  Create Mode  |  Presets  |  Schema Overview  |  Column specification syntax

"Load Mode" is used to import data from a text file or Excel® spreadsheet into the database.

Values for link and attribute fields can be extracted from the file or invariate values can be provided explicitly.

It is possible to create new instances and also to add or replace data in instances which have already been created.

For information about exactly what is required for each table, see the schema overview.

To discover more about individual attributes, right-click on the name of the attribute and select What is this attribute? from the menu that is displayed.

Some simple examples are presented below, for more complete examples of "Load Mode" usage please see the Tutorials.

Quick links to sections of this document:

Overview

Each line of the file, or row of the spreadsheet, should correspond to a single instance to be loaded into the database; however lines can be ignored as required.

Data is loaded by identifying three things: (a) which of the lines contain data (b) how to break these lines up into columns and (c) which of those columns contain useful values and where these values should be stored.

A set of controls, called the parsing settings, are provided for identifying which lines of the file (or rows of the spreadsheet) are of interest and for indicating how to split lines into columns.

The identification of values of interest is done using the column specification syntax. Values can be extracted from one or more columns of the file and can be manipulated in a wide variety of different ways as they are loaded.

The Preset feature allows you to store frequently used settings for later use. The current parsing settings and column specifications can be saved by pressing the Save current settings as a new preset button and providing a descriptive name for the saved settings. This name will then appear in the drop-down list. When the name is selected from this list, the previously saved settings will be retrieved.

The User Interface

The interface is divided into four areas:

1. The file chooser which contains the Select file button used for picking a data file and the View file contents button which opens a window showing the contents of the selected file.

2. The preset controls which allow quick access to previously defined settings and contains buttons for saving the current settings for future re-use, removing presets and so on.

3. The parser controls which are used to define which lines of a file (or rows of a spreadsheet) are to be used and how those lines should be converted into columns.

4. The data setting controls which are used to set up the mapping between the data columns and the various data fields in the database.

Each of these sets of controls are explained in more detail in the following sections.

The Data Loading Process

The procedure begins by choosing the file to import from. The Select file button activates a file chooser dialog box with which to select the file. Files can be either plain text or Excel® spreadsheets. After selecting a file, maxdLoad2 must be shown which parts of the file are of interest and whereabouts in the database these parts should end up.

Working With Excel® Spreadsheets

If the spreadsheet file contains mutliple worksheets then when the file is first opened, a dialogue box is displayed showing the names of the worksheets and asking which one should be selected. Selection of a particular worksheet can be done automatically by adding a ":" followed by the worksheet number to the end of the file name, for example:  c:\data\my_stuff.xls:3  selects the third worksheet from the file.

No parameters needs to be set to identify the columns in an Excel® spreadsheet file. All of the column detection settings are ignored when reading data from Excel® files.

Working With Tabular Files

The first task is to split the lines of the file into columns. Several of the parsing settings influence how the file is converted into columns for display in this view namely the 'text encoding', 'line prefix', 'line suffix' and 'column delimiter'.

Having the right value for these settings is crucial for correct data extraction. The tabular view of the file displayed by the View file contents button is a handy way to check that the lines have been correctly converted into a columns. If the tabular view appears to be incorrect, close it, adjust the settings accordingly and press the View file contents button again.

Creating or Updating ?

"Load Mode" can be used both for creating new instances in the database or for updating existing instances. Updating covers both adding additional data to existing instances or replacing old data values with new ones.

The Name Matching control selects how maxdLoad2 will respond when it sees that the name assigned to an instance coming from the data file matches the name of one or more instances already in the database.

As detailed here, the three possible settings for Name Matching enable either updates to be perfomed as required, or for new instance creation to be forced.

An imporant feature to consider when updating existing instances is that it is possible to specify an incomplete set of information for an instance. If, for example, you are adding information to some existing "Feature" instances then you do not have to respecify the "ArrayType" link (unless of course you want to change it). When the Validate button is used to check the data, or the actual loading is initiated, a warning will be displayed that there are one or more missing items and you will be offer the choice of continuing or cancelling the operation. If you are confident that the missing information has already been loaded previously, then it is safe to continue.

Identifying The Interesting Data

The 'interesting' data (i.e. the data that should be loaded into the database) is identified by selecting which of the lines of the file (or rows of the spreadsheet) should be processed and which columns (or combinations of columns) should be used from these lines.

Once again, the tabular view of the file contents is useful as it displays the line (row) and column numbers and provides mouse "tooltips" which identify specific cells in the table when the mouse is hovered over them.

Two things need to be done in order to show maxdLoad2 which data to load into the database; firstly indicating which lines of the file (or row of the spreadsheet) contain data and secondly, identifying which data should be extracted from the columns on those lines and placed into names, links and attributes in the database.

1. Select which lines to ignore

The parsing settings controls (on the left-hand side of the window) are used to define one or conditions against which each line of the file (or row of the spreadsheet) is checked. If any of the conditions are true, then the line will be ignored, otherwise it will be included.

Ignore the first ... lines

The specified number of lines, counting from the top of the file, will be ignored.

Ignore the last ... lines

The specified number of lines, counting from the bottom of the file, will be ignored.

Ignore lines until ... seen

All lines before and including the first one to match this regular expression will be ignored.

Ignore lines once ... seen

Once a line which matches this regular expression is seen, that line and all subsequent lines will be ignored.

Ignore lines beginning with ...

If this option is selected, then any lines which start with the specified sequence of characters will be ignored.

Ignore blank lines

If this option is selected, then all lines which are blank will be ignored (otherwise such a line will cause an error).

2. Extracting data from columns

Once the interesting lines have been identified, each of them is processed in turn by converting it into columns (using the settings outlined above) and then the values in one or more of those columns will be used to create a new instance in the database, or update an existing instance.

The right hand side of the window contains input fields which correspond exactly to the set of fields that are displayed in a "Create Mode" form for the same schema element.

The only mandantory item which must be provided is the column (or combination of columns) which contains the name for the instance. Clearly, without the name, it is impossible to create a new instance or identify an existing instance for updating.

The requirements for providing values for the link fields and attribute fields are exactly the same as in "Create Mode". That is, fields which require a value will be identified by their colour.

Values for all fields are provided using the column specification syntax which enables any combination of static statings and information extracted from the data file to be constructed. Some examples of how this is done are found below.

For convenience, the fields have a popup context menu (accessible by either a right-mouse-button click or holding ALT or CTRL whilst doing a left-mouse-button click).

Available on the menu for all types of fields, the 'Insert' sub-menu allows access to some common special characters such as mathematical symbols and Greek letters.
For Link fields, the 'Recent instances' sub-menu allows easy selection of recently created instances for the respective schema object being linked to.
Found on the menu for Attribute fields, the 'Possible values' sub-menu provides a convenient way of picking a valid value for attributes which have predefined vocabularies.

Saving settings for future re-use

The complete collections of parameters that have been specified for a loading a particular data format (namely both the parser settings and the data settings) can be saved for future re-use.

To save the parameters, press the Save current settings as a new preset button and provide a memorable name to decribe them. These settings can then be retrieved

Previewing, Validation and Loading

The Validate reads the data from the file, applying the parser and data settings and attempts to detect any errors such as missing or illegal values and overly long names.

The Preview button opens a window which shows how the data in the file will be interpreted. The fields must be valid before a preview can be displayed. If any of the required fields are missing, or if the values in the file are invalid then a message describing problem the will be displayed instead.

Once all of the data settings have been configured and validating them suggest that all is well, the Load button will begin to load the new instances into the database. When loading large numbers of instance, a progress bar will give information about the current state of the task and a Cancel button will appear which enables the operation to be aborted.