More MS Word for DOS file parsing in the name of GSM
It seems that from TS 09.02 version 4.2.0 on, the editors have actually
put markers as "hidden text" inside the Word document, which allow
better automatic detection of where a given ASN.1 module starts, is interrupted
by plain text, continues and ends. The following screen shot (from Section 14
of the above-mentioned document) is human-readable hidden text explaining the
So now I'm adding this format as a second option to my extraction tool.
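To illustrate the idea of marker-driven extraction: the following is only a minimal sketch, not the code of the actual tool. The four marker strings are hypothetical placeholders (the real hidden-text markers are not reproduced here), and the input is assumed to be already split into text lines with the hidden text made visible.

```python
# Hypothetical marker strings; the real hidden-text markers in TS 09.02 differ.
BEGIN, PAUSE, RESUME, END = "#ASN1-BEGIN", "#ASN1-PAUSE", "#ASN1-RESUME", "#ASN1-END"


def extract_marked(lines):
    """Return only the lines that lie inside a marked ASN.1 region."""
    out = []
    active = False  # True while we are inside ASN.1 text
    for line in lines:
        stripped = line.strip()
        if stripped in (BEGIN, RESUME):
            active = True
        elif stripped in (PAUSE, END):
            active = False
        elif active:
            out.append(line)
    return out


document = [
    "14 Abstract syntax of the MAP protocol",
    BEGIN,
    "MAP-Protocol DEFINITIONS ::=",
    "BEGIN",
    PAUSE,
    "Some explanatory plain text in between.",
    RESUME,
    "END",
    END,
    "15 Next section",
]

for line in extract_marked(document):
    print(line)
```

The state is a single flag because the markers only ever switch between "inside ASN.1" and "inside plain text"; nested regions are assumed not to occur.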
Please note: The tool I wrote yesterday (working fine with version 3.x.y of 09.02)
is available from asn1_docextract.git.
Later today it should also support the >= 4.2.0 annotations outlined in this
Implementing a custom MS Word for DOS file parser to properly do GSM SS7
Yes, I'm not kidding!
In recent months, I've been writing quite a bit of GSM MAP (Mobile Application
Part) code. MAP is the protocol used heavily in the GSM core network and
especially on the roaming interfaces between different operators. It is
specified in GSM TS 09.02 and later 3GPP TS 29.002.
The protocol specification relies on ASN.1 description of the messages as well
as the regular BER encoding rules. ASN.1 is this marvelous technology that
allows a protocol to be specified in an abstract and formal notation, in an
extensible way, removing all the problems of human-written marshalling code,
full of errors and differences due to different developers interpreting a
human-readable specification in different ways.
So far so good. You think it should be simple to write a parser and generator
for MAP messages: Simply feed the specification into the ASN.1 compiler of
your choice, and it will generate code in the target language you require.
As long as both sides of the communication do that using exactly the same
revision of the specification (and don't make implementation mistakes), this
will work. The reality looks very different, though :( When I test my code
against something like one million real-world messages captured on a
production SS7 roaming interface, it produces errors already on packet number
six of that trace.
The problem is: The protocol designers have not specified the first versions
in a really extensible way, i.e. a given operation originally only returned
one atomic data field, and it was later extended to return a sequence of data
fields. Thus, there is one additional level of hierarchy in the encoding.
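The extra level of hierarchy is easy to see at the byte level. Here is a minimal sketch with a hand-written BER TLV encoder; the payload bytes and the choice of OCTET STRING as the "atomic" type are made up for illustration and are not taken from any particular MAP operation.

```python
OCTET_STRING = 0x04  # universal, primitive
SEQUENCE = 0x30      # universal, constructed


def ber_tlv(tag: int, value: bytes) -> bytes:
    """Encode one BER TLV with a single-byte tag and definite length."""
    length = len(value)
    if length < 0x80:
        header = bytes([tag, length])               # short form
    else:
        len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
        header = bytes([tag, 0x80 | len(len_bytes)]) + len_bytes  # long form
    return header + value


payload = bytes.fromhex("21436587f9")  # hypothetical field content

# Early version: the result is the bare atomic field.
old_result = ber_tlv(OCTET_STRING, payload)
# Later version: the same field, wrapped in a SEQUENCE so more can follow.
new_result = ber_tlv(SEQUENCE, ber_tlv(OCTET_STRING, payload))

print(old_result.hex())  # 040521436587f9
print(new_result.hex())  # 3007040521436587f9
```

A decoder generated from the later specification expects tag 0x30 first and rejects the old encoding, which starts with 0x04, and vice versa.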
Not only that, but in their infinite wisdom, the designers of MAP have also
failed to include versioning information in each and every message header.
Instead, it is part of the application context name, which is only
part of the first message of every conversation.
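In practice this means a decoder has to remember the version per conversation. A minimal sketch of that bookkeeping follows; it assumes that the version is the last arc of the application context name OID, that conversations are keyed by a transaction ID, and that a missing application context name means version 1. The class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def ac_version(ac_name: Tuple[int, ...]) -> int:
    """Return the version, assumed to be the last arc of the AC name OID."""
    return ac_name[-1]


class DialogueVersions:
    """Remember the MAP version negotiated in the first message of a dialogue."""

    def __init__(self) -> None:
        self._by_tid: Dict[bytes, int] = {}

    def on_first_message(self, tid: bytes,
                         ac_name: Optional[Tuple[int, ...]]) -> None:
        # Assumption: no application context name at all means version 1.
        self._by_tid[tid] = ac_version(ac_name) if ac_name else 1

    def version_for(self, tid: bytes) -> Optional[int]:
        # None means we never saw the start of this conversation.
        return self._by_tid.get(tid)

    def on_last_message(self, tid: bytes) -> None:
        self._by_tid.pop(tid, None)


tracker = DialogueVersions()
tracker.on_first_message(b"\x00\x01", (0, 4, 0, 0, 1, 0, 1, 3))
tracker.on_first_message(b"\x00\x02", None)
print(tracker.version_for(b"\x00\x01"))  # 3
print(tracker.version_for(b"\x00\x02"))  # 1
print(tracker.version_for(b"\x00\x03"))  # None
```

The `None` case is the painful one with captured traces: if the capture starts in the middle of a conversation, the version simply cannot be known and has to be guessed.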
Furthermore, different versions of the MAP specifications disagree on
whether certain fields are deemed optional or not. This is further
complicated by somewhat strange versioning habits. There is the Revision
number of the TS 09.02 (like 3.8.10), then there is a different version
number encoded in the corresponding ASN.1 files like 'version9(9)' and
individual operations then have v1/v2/v3 in their application context
An even wiser decision must have been to remove the description
of older messages from the later versions of the specifications. So even
specifications published in the year 2000 no longer include definitions of
messages that were still part of them 5 years earlier. Why does it matter?
Because today, in 2011, you still see MAP messages on the international SS7
interfaces that are encoded in some of the earliest versions of the MAP protocol!
And if all of this was not enough, the biggest bummer is: For most of
the releases of the specification, the ASN.1 text files are not distributed
separately, but they are interspersed with human-readable text in the
actual specification documents (which can be 600 pages long, nothing you want
Even worse: If you go to the ETSI homepage and download the PDF version of old
09.02 specs, they will actually provide a PDF with a scanned paper print-out,
i.e. no searching and no copy+pasting.
Luckily, the 3GPP has made the history of 3.8.0 and later available on their FTP
server. But they are in MS Word for DOS format, like they were written
originally. This format cannot be opened by OpenOffice, and as far as I
know not even by any of the Windows Word versions that MS has released in
the last 10 years.
So what did I do? I actually installed MS
Word 5.5 for DOS (provided as Freeware from Microsoft) and ran it in
DOSEMU, to convert the specs into RTF format. This way I can at least open
them and look at them in a modern text processor.
But this still does not solve the copy+paste problem.
I finally found antiword, but it
mainly focuses on Word for Windows files and only does rudimentary text
extraction from Word for DOS files. But hey, there is an online copy of
chapter 16 from the File Formats Handbook, apparently published by
Dr.Dobb's (who remembers them!!) at some time in the past.
So what did I do? I wrote some custom parser for those old Word/DOS files,
which parses the paragraph format descriptions and tries to identify those
sections that contain the ASN.1 code. As they are almost the only part in the
specification that is enclosed with a border line on all four sides, this
should work pretty well. Early results are quite promising!
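The border heuristic itself is simple once the paragraph formatting has been parsed. The sketch below assumes the parser has already produced one record per paragraph with four border flags; the `Paragraph` class and its field names are hypothetical and do not reflect the actual Word for DOS on-disk structures.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Paragraph:
    """Hypothetical result of parsing one paragraph and its formatting."""
    text: str
    border_top: bool = False
    border_bottom: bool = False
    border_left: bool = False
    border_right: bool = False


def is_boxed(p: Paragraph) -> bool:
    """True if the paragraph has a border line on all four sides."""
    return p.border_top and p.border_bottom and p.border_left and p.border_right


def extract_boxed_text(paragraphs: Iterable[Paragraph]) -> str:
    """Concatenate the text of all fully boxed paragraphs."""
    return "\n".join(p.text for p in paragraphs if is_boxed(p))


paragraphs = [
    Paragraph("14.1 General"),
    Paragraph("MAP-Protocol DEFINITIONS ::=", True, True, True, True),
    Paragraph("A note with only a top rule.", border_top=True),
    Paragraph("BEGIN", True, True, True, True),
]

print(extract_boxed_text(paragraphs))
```

Requiring all four sides (rather than any border) is what keeps tables and ruled notes out of the result, at the price of missing ASN.1 text if a stylesheet ever drops one of the lines.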
My hope is now that the ETSI stylesheets did not change too much over time,
i.e. that this parser will be able to extract the ASN.1 spec for all of the
protocol versions that I can find. If that works, I can run them through
a validator, then pretty-print them and put them all in one git tree in
chronological order. And maybe at some point in 2011, we will have the
marvels of a unified diff between two different MAP versions.
The strange part is: Diff was developed in the 1970s, GSM in the late 1980s.
They should have known about it back then, and used a revision control system
like SCCS to record all the changes they made in the specification.
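Once two versions of a module exist as plain text, producing that unified diff takes only a few lines. This is a sketch using Python's standard difflib; the two ASN.1 fragments and the file names are invented for illustration and are not copied from any revision of 09.02.

```python
import difflib


def asn1_diff(old: str, new: str, old_name: str, new_name: str) -> str:
    """Return a unified diff between two ASN.1 module texts."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


# Hypothetical fragments: an atomic result that later became a SEQUENCE.
old_text = "Result ::= OCTET STRING\n"
new_text = "Result ::= SEQUENCE {\n    value OCTET STRING\n}\n"

diff = asn1_diff(old_text, new_text, "old.asn", "new.asn")
print(diff, end="")
```

With all versions committed to git in chronological order, `git diff` between two commits gives the same result without any extra code; the sketch is only useful when comparing files outside of a repository.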
I guess this all is a glimpse of how a digital archaeologist of the 22nd century
must feel when analyzing ancient artefacts and trying to understand what the
heck his ancestors have been doing back then.
UPDATE: The tool can be found at http://cgit.osmocom.org/cgit/asn1_docextract/
Travelling to Belem/Brazil to talk about OpenPCD and OsmocomBB at UFPA
Tomorrow I'll be leaving for a 10-day trip to the signal processing lab of UFPA (Federal
University of Para) in Belem, Brazil. I was kindly invited by Prof.
Aldebaro Klautau to hold some lectures and lab exercises regarding Free
Software (+Hardware) RFID projects like OpenPCD as well as Free Software GSM
projects like OsmocomBB.
I would love to use that opportunity to spend some more time in Brazil for
holidays, but my schedule really doesn't allow for anything like that at
this time. It's always sad to have to miss such a chance. It would be exactly
the right time of the year to spend some time at the beaches of Pernambuco or