Thursday, 9 October 2025

GSM SS7 message encode/decode, GSM alphabet

There are 42 less commonly used accented and Greek chars in the GSM alphabet.

A Python string for testing (most of) the GSM chars:

most_of_gsm = """@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&'()*+,-./ 0123456789 :;<=>?¡ ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿ abcdefghijklmnopqrstuvwxyzäöñüà"""
 

Accented and Greek chars, plus what we could call the more unusual or funny ones:

    £ ¥ èéùìòÇØøÅå Δ ΦΓΛΩΠΨΣΘΞÆæßÉ ¤ ¡ ÄÖÑܧ ¿ äöñüà
 

More normal chars in quite common use:
    @ $ _ !"# %&'()*+,-./ 0123456789 :;<=>? 
    ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
 

Extended GSM chars; the euro sign € is the newest addition:

     ^ {}\ [~] |  €
 

Excluding CTRL chars \n \r and extended ctrl chars.
 

https://en.wikipedia.org/wiki/GSM_03.38
https://en.wikipedia.org/wiki/Data_Coding_Scheme 

 

# Including CTRL chars 0x0a \x0a LF, 0x0d \x0d CR, not including 0x1b \x1b ESC

# and spaces just to space out or annoy 

alphabet = """ @£$¥èéùìòÇ\nØø\rÅå Δ_ΦΓΛΩΠΨΣΘΞ ÆæßÉ  !"#¤%&'()*+,-./ 0123456789 :;<=>? ¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ ¿abcdefghijklmnopqrstuvwxyzäöñüà """

exp_alphabet = """ \x00\x01\x02\x03""" + \
            """\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f """ + \
            """\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a \x1c\x1d\x1e\x1f """ + \
            """ !"#\x24%&'()*+,-./ 0123456789 :;<=>? \x40ABCDEFGHIJKLMNOPQRSTUVWXYZ""" + \
            """\x5b\x5c\x5d\x5e\x5f """ + \
            """\x60abcdefghijklmnopqrstuvwxyz""" + \
            """\x7b\x7c\x7d\x7e\x7f """


 

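# extended = escaped: each char is sent as 0x1b followed by one more septet

# the backslash is doubled so that Python keeps one literal backslash
alphabet = """^ {}\\ [~] |€"""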

exp_alphabet = """\x1b\x14 \x1b\x28\x1b\x29\x1b\x2f \x1b\x3c\x1b\x3d\x1b\x3e \x1b\x40\x1b\x65"""
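The mapping between the two strings above can be sketched as a lookup table. This is a minimal sketch, not a full codec: the names GSM_BASIC, GSM_EXT and gsm_encode are made up for illustration, the table follows the GSM 03.38 basic character set, and only the extension chars listed above are handled (no extended ctrl chars). The result is one byte per septet, before any 7-bit packing.

```python
# GSM 03.38 basic character set; the string index is the septet value.
# Position 0x1b is the escape to the extension table.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Extension table: each char is sent as ESC (0x1b) plus the septet below.
GSM_EXT = {
    "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65,
}


def gsm_encode(text):
    """Return one byte per septet (unpacked) for a str of GSM chars."""
    out = bytearray()
    for ch in text:
        if ch in GSM_EXT:
            out += bytes([0x1B, GSM_EXT[ch]])
        elif ch == "\x1b":
            raise ValueError("a bare ESC is not encoded by this sketch")
        else:
            # str.index raises ValueError for chars outside the table
            out.append(GSM_BASIC.index(ch))
    return bytes(out)


print(gsm_encode("è€").hex())
```

Comparing gsm_encode(alphabet) with exp_alphabet encoded as Latin-1 bytes would be one way to use the test strings above.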

 

How to decode an SMS-DELIVER GSM message?

From the TCAP, we extract the UserInfo part of MT_ForwardSM_Arg.

It is just a test string, with bogus OA and DA inside:

RPUI = 
04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 01 
41 40 78 04 02 81 40 20 10 08 04 02 81 40 20 10 
08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 
02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 81 
40 20 10 08 04 02 81 40 20 10 08 04 02 81 40 20 
10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 
04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 
81 40 20 10 08 04 02 81 40 20 10 08 
 

RPUI = 
040c9101100000000000005201707101414078040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008

echo -n "04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 
01 41 40 78 04 02 81 40 20 10 08 04 02 81 40 20 
10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 
04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 
81 40 20 10 08 04 02 81 40 20 10 08 04 02 81 40 
20 10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 
08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 
02 81 40 20 10 08 04 02 81 40 20 10 08 " | xxd -r -p |wc 
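The same byte count can be checked in Python. A small sketch, assuming the dump splits into a 19-octet header followed by the 7-octet pattern repeated 15 times; the variable names are just for illustration.

```python
# bytes.fromhex skips the spaces between the hex pairs
header = "04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 01 41 40 78"
user_data = "04 02 81 40 20 10 08 " * 15

rpui = bytes.fromhex(header + " " + user_data)
print(len(rpui))  # 124, which is 0x7c
```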

How can we decode this?

Cheat: we know this should contain a 120-char message, the same e-grave (è) char 120 times.

   The e-grave char is encoded as 0x04 in the GSM alphabet,

   so we expect this string to hold 120 chars of value 4,

   and we think each char uses 7 bits.
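If each char really takes 7 bits, the octet count follows from simple arithmetic. A tiny sketch (packed_octets is a made-up helper name):

```python
def packed_octets(septets):
    """Octets needed to hold `septets` 7-bit chars, rounded up."""
    return (septets * 7 + 7) // 8


print(packed_octets(120))  # 105 octets for the 120 e-grave chars
print(packed_octets(160))  # 140 octets for a full 160-char message
```

So 120 chars should occupy 105 octets of the 124-octet string, which leaves 19 octets for the header fields.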

 

Note on Wireshark: unfortunately this was not SCTP encapsulated but TCP/IP encapsulated. I had an older Wireshark, with a hand-built change in the decoding plugins, that could decode SS7, but it is on an old machine now :-(. See below: I found an online decoder that worked, BUT it doesn't explain how the decode is done.

Gemini gave some good pointers on decoding and referenced the right specs, but this was a hard task for it, and it was hallucinating and making up decodings more often than giving good info.

 

ETSI 3GPP standards 3GPP TS 23.040 and 3GPP TS 23.038 

We can decode the PDU with info in these 2 standards. 

https://www.etsi.org/deliver/etsi_ts/129300_129399/129338/16.00.00_60/ts_129338v160000p.pdf

SM-RP-UI: AVP code 3301, clause 6.3.3.3, value type OctetString, flags M, V, may encrypt: No
6.3.3.3 SM-RP-UI
The SM-RP-UI is of type OctetString and it shall contain a short message transfer protocol data unit (TPDU) which is
defined in 3GPP TS 23.040 [3] and represents the user data field carried by the short message service relay sub-layer
protocol. Its maximum length is of 200 octets.

9.2.2.1    SMS‑DELIVER type

Basic elements of the SMS‑DELIVER type:

Abbr.    Reference                P 1)  R 2)  Description
..
TP‑DCS   TP‑Data‑Coding‑Scheme    M     o     Parameter identifying the coding scheme within the TP‑User‑Data.
..
TP‑UDL   TP‑User‑Data‑Length      M     I     Parameter indicating the length of the TP‑User‑Data field to follow.
TP‑UD    TP‑User‑Data             O     3)

1)    Provision;    Mandatory (M) or Optional (O).
2)    Representation;    Integer (I), bit (b), 2 bits (2b), Octet (o), 7 octets (7o), 2‑12 octets (2‑12o).
3)    Dependent on the TP‑DCS.

 

9.2.3    Definition of the TPDU parameters
9.2.3.1    TP‑Message‑Type‑Indicator (TP‑MTI)

..

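"please decode this SS7 message" .. and finally a hand decode to confirm :-7, using 3GPP TS 23.040.
The 04 7c just before it is not part of the TPDU: it is the OCTET STRING tag (04) and length (7c = 124 octets, the size of the whole TPDU) from the MAP layer.
  So 7c = 124 is not TP‑UDL. The first 04 inside the TPDU is the octet that holds TP-MTI etc.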
sm_RP_UI = 
04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 01 
41 40 78 04 02 81 40 20 10 08 04 02 81 40 20 10 
08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 
02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 81 
40 20 10 08 04 02 81 40 20 10 08 04 02 81 40 20 
10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 
04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 
81 40 20 10 08 04 02 81 40 20 10 08 

From the octet after 78, the sequence 04 02 81 40 20 10 08 repeats 15 times: 15 * 7 octets = 70 + 35 = 105 octets.
   15 * 7 * 8 = 840 bits, / 7 = 15 * 8 septets = 120 septets = 120 e-grave chars, each encoded as 0x04. PHEW!

0x78 = 120 decimal, 120 septets in sequence == TP‑User‑Data‑Length (TP‑UDL)

TP‑User‑Data‑Length (TP‑UDL)
If the TP‑User‑Data is coded using the GSM 7 bit default alphabet, the TP‑User‑Data‑Length field gives an integer representation of the number of septets within the TP‑User‑Data field to follow.
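The header fields in front of the user data can be picked apart with a short parser. This is a sketch under assumptions that are not spelled out above: the field order (first octet, TP-OA, TP-PID, TP-DCS, TP-SCTS, TP-UDL, TP-UD) is taken from the SMS-DELIVER layout in 3GPP TS 23.040, the address is assumed to be numeric with nibble-swapped digits, and there is assumed to be no user data header (TP-UDHI = 0). The function and key names are made up.

```python
def swap_nibbles(data):
    """Semi-octet digits: each octet holds two digits, low nibble first."""
    return "".join(f"{b & 0x0F:x}{b >> 4:x}" for b in data)


def parse_sms_deliver(tpdu):
    pos = 0
    first = tpdu[pos]; pos += 1
    mti = first & 0x03                    # 0 means SMS-DELIVER
    oa_digits = tpdu[pos]; pos += 1       # address length, counted in digits
    toa = tpdu[pos]; pos += 1             # type of address, 0x91 = international
    oa_octets = (oa_digits + 1) // 2
    oa = swap_nibbles(tpdu[pos:pos + oa_octets])[:oa_digits]
    pos += oa_octets
    pid = tpdu[pos]
    dcs = tpdu[pos + 1]
    pos += 2
    scts = swap_nibbles(tpdu[pos:pos + 7])  # YYMMDDhhmmss + time zone digits
    pos += 7
    udl = tpdu[pos]; pos += 1
    return {"mti": mti, "toa": toa, "oa": oa, "pid": pid, "dcs": dcs,
            "scts": scts, "udl": udl, "ud": tpdu[pos:]}


tpdu = bytes.fromhex("040c9101100000000000005201707101414078"
                     + "04028140201008" * 15)
fields = parse_sms_deliver(tpdu)
print(fields["oa"], fields["dcs"], fields["scts"], fields["udl"],
      len(fields["ud"]))
```

For the test message this gives a DCS of 0 (GSM 7 bit default alphabet), a TP-UDL of 120 and 105 octets of user data, which matches the septet count worked out by hand.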

 

04 02 81 40 20 10 08 seq repeats 15 times exactly, handy,

7 octets

00000100 00000010 10000001 01000000 00100000 00010000 00001000

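7 octet to 8 septet conversion: the MSBs left over from each octet are carried into the LSBs of the next septet. It looked like screwball ASN.1 encoding stuff at first, but it turns out not to be ASN.1 at all. Each septet below is written as the low bits of octet n followed by the bits carried from octet n-1: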

0000100 000010 0 00001 00 0000 100 000 0100 00 00100 0 000100 0000100

0000100 0000100 0000100 0000100 0000100 0000100 0000100 0000100

04 04 04 04 04 04 04 04
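The hand conversion above can be sketched in Python with a bit accumulator. This is a minimal sketch (unpack_septets is a made-up name); it takes the septet count as an argument, since that is what TP-UDL carries for the 7 bit alphabet.

```python
def unpack_septets(data, count):
    """Unpack `count` 7-bit values from packed octets, low bits first."""
    septets = []
    acc = 0    # leftover bits from earlier octets sit in the low end
    nbits = 0  # number of valid bits in acc
    for octet in data:
        acc |= octet << nbits  # the new octet goes above the leftover bits
        nbits += 8
        while nbits >= 7 and len(septets) < count:
            septets.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    return septets


print(unpack_septets(bytes.fromhex("04028140201008"), 8))
```

Every seventh octet yields two septets, which is where the eighth char in each group of 7 octets comes from.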

9.1.2.4    Alphanumeric representation
A field which uses alphanumeric representation shall consist of a number of 7‑bit characters represented as the default alphabet defined in 3GPP TS 23.038 [9].

AND, in 3GPP TS 23.038 we can find how septets are packed into bytes:

6.1.2    Character packing
6.1.2.1    SMS Packing
6.1.2.1.1    Packing of 7-bit characters

e.g. eight characters in seven octets:

bit number:  7  6  5  4  3  2  1  0

octet 1:     2g 1a 1b 1c 1d 1e 1f 1g
octet 2:     3f 3g 2a 2b 2c 2d 2e 2f
octet 3:     4e 4f 4g 3a 3b 3c 3d 3e
octet 4:     5d 5e 5f 5g 4a 4b 4c 4d
octet 5:     6c 6d 6e 6f 6g 5a 5b 5c
octet 6:     7b 7c 7d 7e 7f 7g 6a 6b
octet 7:     8a 8b 8c 8d 8e 8f 8g 7a
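Going the other way, the packing in that table can be sketched as follows. This is a rough sketch with a made-up function name: each septet is placed above the bits already waiting, and a full octet is emitted whenever 8 bits are available.

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, first char in the low bits."""
    out = bytearray()
    acc = 0
    nbits = 0
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # last partial octet, padded with zero bits
    return bytes(out)


print(pack_septets([4] * 8).hex(" "))  # 04 02 81 40 20 10 08
```

Packing eight e-grave septets reproduces the 7-octet sequence that repeats in the test message.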

 

SO, TP-UD encoding IS NOT ASN.1: the TPDU is defined in 3GPP TS 23.040, the septet packing in 3GPP TS 23.038, and it all depends on the DCS.

https://www.w3.org/Protocols/HTTP-NG/asn1.html 

https://www.erlang.org/doc/apps/asn1/asn1_getting_started#character-strings

https://www.oss.com/asn1/resources/asn1-made-simple/asn1-quick-reference/packed-encoding-rules.html

"In the UNALIGNED variant, character strings are encoded in the fewest number of bits necessary"

https://coolaj86.com/articles/asn1-for-dummies/ with encoding to bytes examples

Gemini managed to explain this ok:

No, the TP-UD (Transfer Protocol - User Data) field, which contains the actual text of the SMS, is not defined using ASN.1 (Abstract Syntax Notation One).

The TP-UD field is defined directly within the constraints of the SMS-TPDU (3GPP TS 23.040) and its contents are governed by the Data Coding Scheme (TP-DCS).

How the TP-UD is Defined and Encoded

The TP-UD is simply a sequence of bytes. Its structure and meaning are dictated entirely by the preceding fields in the SMS-TPDU:

1. TP-UDL (User Data Length)

The TP-UDL field specifies the length of the TP-UD field.

If the message uses GSM 7-bit encoding (the default alphabet), the length is measured in 7-bit characters. The characters are then packed into the minimum number of bytes required (e.g., 160 characters fit into 140 bytes).

If the message uses UCS-2/UTF-16 encoding, the length is measured in bytes.

2. TP-DCS (Data Coding Scheme)

The TP-DCS is the single most important field for decoding the TP-UD. This byte specifies the alphabet used:

TP-DCS Value   Encoding Standard            Max Characters (Single Segment)
0x00 - 0x03    GSM 7-bit Default Alphabet   160 characters
0x08           UCS-2 (UTF-16)               70 characters (since each character takes two bytes)
0x04           8-bit Data                   140 bytes

The TP-UD is simply the raw byte stream of the text, encoded according to the rule set by the TP-DCS.
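The DCS values in that table all fall in what 3GPP TS 23.038 calls the general data coding group, where bits 3 and 2 select the alphabet. A hedged sketch of that one case follows; the names are made up, and the other coding groups (message waiting indication and so on) are deliberately not handled.

```python
ALPHABETS = {0: "GSM 7-bit", 1: "8-bit data", 2: "UCS2", 3: "reserved"}


def dcs_alphabet(dcs):
    """Alphabet for a DCS in the general data coding group (bits 7..6 = 00)."""
    if dcs & 0xC0:
        raise ValueError("other coding groups are not handled in this sketch")
    return ALPHABETS[(dcs >> 2) & 0x03]


for value in (0x00, 0x04, 0x08):
    print(hex(value), dcs_alphabet(value))
```

The test message has DCS 0x00, so its user data is GSM 7-bit and TP-UDL counts septets.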


Contrast with MAP/TCAP (The ASN.1 Layers)

The reason for the confusion is that the protocols surrounding the SMS-TPDU are defined using ASN.1:

TCAP (Transaction Capabilities Application Part): The TCAP dialogue and components are defined entirely in ASN.1.

MAP (Mobile Application Part): MAP operations (like mo-ForwardSM) and their parameters are defined using ASN.1.

The SMS-TPDU itself is often wrapped in a MAP Parameter: The entire SMS-TPDU (including the TP-UD) is usually embedded inside a MAP ASN.1 construct as an OCTET STRING parameter. The ASN.1 defines the container but not the internal content of the TP-UD field itself.
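The ASN.1 container shows up in the capture as a BER tag and length in front of the TPDU. A small sketch of reading one tag/length pair, assuming single-octet tags and definite lengths only (read_tlv_header is a made-up name):

```python
def read_tlv_header(buf, pos=0):
    """Return (tag, length, offset of the value) for one BER element."""
    tag = buf[pos]
    first = buf[pos + 1]
    if first < 0x80:                 # short form: the octet is the length
        return tag, first, pos + 2
    n = first & 0x7F                 # long form: n length octets follow
    if n == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    length = int.from_bytes(buf[pos + 2:pos + 2 + n], "big")
    return tag, length, pos + 2 + n


print(read_tlv_header(bytes.fromhex("047c")))    # (4, 124, 2)
print(read_tlv_header(bytes.fromhex("6581aa")))  # (101, 170, 3)
```

Tag 04 with length 7c is the OCTET STRING holding the 124-octet TPDU. Every enclosing element has its own length, so changing the TPDU size means updating each of them.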

 

This online decoder worked, but it doesn't break down the decode, so you cannot rely on it for testing:

https://www.smsdeliverer.com/online-sms-pdu-decoder.aspx

sm_RP_UI = 
040c9101100000000000005201707101414078040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008

 

If the TP‑User‑Data is coded using compressed GSM 7 bit default alphabet or compressed 8 bit data or compressed UCS2 [24] data, the TP‑User‑Data‑Length field gives an integer representation of the number of octets after compression within the TP‑User‑Data field to follow.
YEOW

It's not compressed though.

 

 

 

 

gsm_map_with_PasteSS7egrave5.pcap 

# this did not quite work out! :-(

�ò�\0\0\0\0\0\0\0\0\0\0\0�\0\0\0\0��\0\0��    \0\0\0\0\0\0E\0 4\0\0���t X X\0\0\0\06\0\0�\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0�\0\0у(b!    \0
�\0r(\0 \0r(`Ale��H�I� +l�����,0���0��00| �\0\0\0\0\0\0RpqA@x�@ �@ �@ �@ �@ �@ �@ �@ �@ �@ �@ �@ �@ �@ �@ \0\0 pad with \0\0 ?
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Wireshark https://datatracker.ietf.org/doc/id/draft-gharris-opsawg-pcap-00.html
 I _think_ total bytes including wireshark file head = 321 bytes
 file header 24 octets and packet record 16 octets (including 2 4octet packet lengths)
    timestamps ��\0\0��    \0
    packet lengths �\0\0\0 �\0\0\0   OR �\0\0\0 �\0\0\0  .. NEED TO INCREASE
      �\0\0\0 = 218, change to 281 = 256 + 25 = \0\0 I think
        try more ? 282 ?
    try more again 283? ? 0x1b 0x01
 I _think_ total bytes excluding wireshark file + packet heads = 281 bytes
 Ethernet \0
 IP4 E\0�4\0\0���t
    including tot length in 3rd 4th octets � OR OR ..
    01cc == total length 460 it says IPv4 tot length exceeds packet length (204 bytes
       THEN 267 bytes) so change to 256+11 =        + CTRL-I a TAB I think ?
         no, not ctrl-I ctrl-K    
       IPV4 length is the length from the 45(=E) to the end.
         256+12 =  
     and increase by 1 again         256+13 =
 SCTP  X X\0\0\0\06\0\0�\0\0\0\0\0\0\0\0\0\0\0
    sport 2904  X dport 2904  X vtag:\0\0\0\0 cksum:6
    DATA CHUNK ctype:\0(data) cflag:(3)
       clen:\0� = 172 NEED TO INCREASE THIS   \0œ ?? chunk length œž ??
       clen:\0� = 00ac it says payload length: 156 chunk len 00ac=172
       \254 octal = 10101100 = 0xac yep duh = 172
         172+60 about right 232 = 0xe9 ? = 11101001 = \351
     emacs: how to insert a raw byte? The C-q (quoted-insert) command takes an octal code: C-q 033 for \x1b, C-q 351 for \xe9. HAH, ironic: it inserted é as 2 bytes, the UTF-8 encoding c3a9 (which is e-acute; e-grave would be c3a8)
     235 = 0xec   echo "e9 ec" |xxd -r -p -g1 >> ~/Downloads/gsm_map_with_PasteSS7egrave4.pcap
     189=������   � - 0xea  try � = 0xeb
     e9 - 4 gives us e5 ? too short by lots by 6 ? e9+2=eb �
     e9 to ea ������
 MTP2\0\0\0�\0\0�
     01 version, 00 reserved, 06 message class, 01 DATA (this looks like an M2UA header carrying the MTP2 user data),
     of which 0000009c = message length 156, 0300 = param tag Protocol Data 1, 0092 = param len 146
       NEED to increase +40 ? +60 ? yep 9c to 0xda and 92 to 0xd0 or thereabouts
       cf to d3 are: � d0 ����
       d4 to df are: ������ da ������
       tried d0 and da � � .. TRY d3 and dd � �
    ����������������������������������
    echo "c2cc c3cd c4ce c5cf c6d0 c7d1 c8d2 c9d3 cad4 cbd5 ccd6 cdd7 ced8 cfd9 d0da 
d1db d2dc" |xxd -r -p -g1 >> ~/Downloads/gsm_map_with_PasteSS7egrave4.pcap
      TRIED DB D1 �\0\0у(b   SUBTRACT 2 try d9cf \331\317 ��
      BAHH OK the 9c includes from start of MTP2 count 94hex to end packet
      BAHH OK the 92 includes 2bytes for itself count 90hex to end of packet
         d3 to end of packet => use d3+8 = db
         cf to end of packet => use cf+2 = d1  doublecheck 92+a=9c d1+a=db
     dbd1 is ��   MAYBE add 1 dcd2 is ��  maybe sub 1, sub 2
     dad0 �� d9cf  ��
     
 MTP3�(b!
 SCCP  - there are 3 pointers to CdPa CgPa and ?
   WHAT is that 0x6c just before tcap part pasted in 0x65etcetc ?
   wireshark highlights as if it is a length ?
   CHANGE IT TO AC ???
 SCCP     \0
�\0r(\0 \0r(`Ale��H
      3 POINTERS:
      CdPa: �\0r(\0
      CgPa:  \0r(`A
      HUH:  l....
      0x6c = l might be length  .. ac is � I think .. 10101100 YEP
         need another 2 ? ae \256 needed �
      MAYBE NOT
   
 TCAP
   65 81 aa ..
   otid dtid components:1 item == GSM MAP
   01 01 2c opcode 44
   81? 8f ? 
   DA 80 06 013011111111
   OA 04 07 91020130101030
   0x04 0x7c after OA .. 0x7c maybe is length of PDU.
We get a "contained item exceeds length of containing item" error message.
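Rather than typing length bytes into the file by hand, the outermost lengths could be patched with struct. This is only a sketch under assumptions: a classic pcap file with a single packet record, a 24-octet file header and a 16-octet record header as noted above, and a made-up function name. It fixes the pcap record lengths only; the IPv4 total length, SCTP chunk length, message and parameter lengths and the BER lengths inside still need the same treatment, each at its own offset.

```python
import struct


def fix_first_record_length(raw):
    """Set incl_len and orig_len of the only record to the real payload size."""
    magic, = struct.unpack_from("<I", raw, 0)
    if magic == 0xA1B2C3D4:
        endian = "<"        # file written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"        # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    payload_len = len(raw) - 24 - 16   # file header + one record header
    out = bytearray(raw)
    # the record header starts at offset 24: ts_sec, ts_usec, incl_len, orig_len
    struct.pack_into(endian + "II", out, 32, payload_len, payload_len)
    return bytes(out)


# Synthetic file: 24 + 16 + 281 = 321 bytes with a stale length of 218
demo = (struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
        + struct.pack("<IIII", 0, 0, 218, 218)
        + bytes(281))
fixed = fix_first_record_length(demo)
print(struct.unpack_from("<II", fixed, 32))  # (281, 281)
```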


 
