
How to Parse a Geographic Position

From Henry Support


You can extend the rules that Henry uses to detect a geographic position (latitude and longitude) when importing plain text (using Edit | "Paste into new Overlay", File | "Create Overlays from a Text File...", or the Henry Overlay Tool).

This is an advanced option. Before editing, make a copy of the SupportedCharacters.xml file (referred to below) so that you can reinstate it if you make a mistake.
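The backup can be made by hand in Explorer. For anyone who prefers to script it, the following is a minimal Python sketch. The function name back_up and the ".bak" suffix are illustrative choices, not part of Henry, and the path in the usage comment is only an example; use the location shown in Help | About.

```python
import shutil
from pathlib import Path


def back_up(config_path: Path) -> Path:
    """Copy config_path to a sibling file with '.bak' appended; return the copy's path."""
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)  # copy2 also preserves file timestamps
    return backup_path


# Example call (hypothetical path, adjust to your installation):
# back_up(Path(r"c:\programdata\chersoft\henry\SupportedCharacters.xml"))
```

To reinstate the original, copy the .bak file back over SupportedCharacters.xml.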

Henry detects a geographic position in text by looking for patterns. For example, it knows that "E" and "W" are used to indicate that the preceding numbers are a longitude.
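To illustrate what pattern-based detection means, here is a small Python sketch. It is not Henry's parser, and the pattern, the names ANGLE and parse_angles, and the accepted symbols are all assumptions made for the example. It only shows how a cardinal letter after a number can decide whether the number is a latitude or a longitude.

```python
import re

# Illustrative pattern only: degrees, optional minutes, then a cardinal letter.
ANGLE = re.compile(
    r"(?P<deg>\d{1,3})[°º*]\s*"                # degrees followed by a degree symbol
    r"(?:(?P<min>\d{1,2}(?:\.\d+)?)['′]\s*)?"  # optional minutes with a minute symbol
    r"(?P<card>[NSEWnsew])"                    # cardinal indicator
)


def parse_angles(text):
    """Return (kind, signed decimal degrees) for each angle found in text."""
    results = []
    for m in ANGLE.finditer(text):
        value = int(m.group("deg")) + float(m.group("min") or 0) / 60
        card = m.group("card").upper()
        if card in "SW":
            value = -value  # south and west are negative
        kind = "lon" if card in "EW" else "lat"
        results.append((kind, value))
    return results


print(parse_angles("51°30'N 000°07.5'W"))
```

In this sketch the set of characters inside each square-bracket group plays the role that the configurable character lists play in Henry.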

You can add to some of these patterns by editing the SupportedCharacters.xml file.

Note: The location of SupportedCharacters.xml is shown in Help | About. The default location, c:\program files (x86)\CherSoft\Henry\, is read-only, so you will need to copy the file to a writable location, such as c:\programdata\chersoft\henry\, and then tell Henry about the new location using the CherSoft Registry Configuration tool (CRE.exe), which is found in the same directory as Henry.exe (typically c:\program files (x86)\CherSoft\Henry\). For example:

Image:CRE SupportedCharacters.png

The following elements of the geographic position can be configured. Each is represented by an XML structure consisting of tags in angle brackets:

Element                  Type      Comment
Degrees                  <Value>   The Unicode character values that represent a degree symbol
Minutes                  <Value>   The Unicode character values that represent a minutes symbol
Seconds                  <Value>   The Unicode character values that represent a seconds symbol
Decimal                  <Value>   The Unicode character values that represent a decimal point
LatLonSeparator          <Value>   The Unicode character values that are allowed between the latitude and longitude parts
AngleComponentSeparator  <Value>   The Unicode character values that are allowed between the degrees and minutes, and between the minutes and seconds, of either latitude or longitude
LatPrefix                <String>  The text that is allowed before the latitude part (required only if a LonPrefix was found)
LonPrefix                <String>  The text that is allowed before the longitude part (required only if a LatPrefix was found)
Minus                    <Value>   The Unicode character values that represent a minus sign (used only when the cardinal indicators N, S, E, W are not used)
N                        <Value>   The Unicode character values that represent the north cardinal indicator
S                        <Value>   The Unicode character values that represent the south cardinal indicator
E                        <Value>   The Unicode character values that represent the east cardinal indicator
W                        <Value>   The Unicode character values that represent the west cardinal indicator
Space                    <Value>   The Unicode character values that are treated as white space between parts of the position

<Value>s need to be entered as four-digit hexadecimal Unicode values (for example, 00B0 for the degree sign), which can be obtained from [1]

<String>s need to be entered as text.
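If you have the character itself but not its Unicode value, a short Python sketch like the one below can produce it. The helper name to_value is made up for this example, and the four-digit uppercase format is an assumption based on the values shown on this page.

```python
import unicodedata


def to_value(ch: str) -> str:
    """Return the hexadecimal Unicode value of one character, padded to four digits."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return format(ord(ch), "04X")


# Print the value and official Unicode name of a few candidate characters.
for ch in "°º*-n":
    print(to_value(ch), unicodedata.name(ch))
```

For example, to_value("°") gives 00B0, which is the text to place inside a <Value> tag.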

An example of the characters that will be interpreted as a degree symbol (here º 00BA, ° 00B0, * 002A and - 002D):

	<SupportedCharacter>
		<CharacterType>Degrees</CharacterType>
		<UnicodeValues>
			<Value>00BA</Value>
			<Value>00B0</Value>
			<Value>002A</Value>
			<Value>002D</Value>
		</UnicodeValues>
	</SupportedCharacter>

An example of the characters that will be interpreted as a north cardinal indicator (here the lower-case letter n, 006E):

	<SupportedCharacter>
		<CharacterType>N</CharacterType>
		<UnicodeValues>
			<Value>006E</Value>
		</UnicodeValues>
	</SupportedCharacter>
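After editing, it can help to check that the file is still well-formed and that each value decodes to the character you intended. The Python sketch below does this for a fragment shaped like the examples above. The <Root> wrapper is a stand-in, since the real file's root element is not shown on this page, and the function name characters_by_type is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <Root> is a placeholder for the real root element.
SAMPLE = """
<Root>
  <SupportedCharacter>
    <CharacterType>N</CharacterType>
    <UnicodeValues>
      <Value>006E</Value>
    </UnicodeValues>
  </SupportedCharacter>
</Root>
"""


def characters_by_type(xml_text):
    """Map each CharacterType to the characters its <Value> entries decode to."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    result = {}
    for entry in root.iter("SupportedCharacter"):
        kind = entry.findtext("CharacterType")
        values = [v.text.strip() for v in entry.iter("Value")]
        result[kind] = [chr(int(v, 16)) for v in values]  # hex value to character
    return result


print(characters_by_type(SAMPLE))
```

A parse error, or a character in the output that you did not expect, suggests a mistyped tag or value.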

