There are 39 less commonly used accented and Greek characters in the GSM 7-bit default alphabet (42 if you also count the LF, CR and ESC control codes).
A Python string for testing most of the GSM characters:
most_of_gsm = """@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&'()*+,-./ 0123456789 :;<=>?¡ ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿ abcdefghijklmnopqrstuvwxyzäöñüà"""
Accented and Greek characters, and what we could call the more unusual or funny ones:
£ ¥ èéùìòÇØøÅå Δ ΦΓΛΩΠΨΣΘΞÆæßÉ ¤ ¡ ÄÖÑܧ ¿ äöñüà
The more normal characters, in quite common use:
@ $ _ !"# %&'()*+,-./ 0123456789 :;<=>?
ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
Extended GSM characters; the euro sign € is the newest addition to these:
^ {}\ [~] | €
The lists above exclude the control characters \n and \r and the extended-table control characters.
https://en.wikipedia.org/wiki/GSM_03.38
https://en.wikipedia.org/wiki/Data_Coding_Scheme
# Including CTRL chars 0x0a \x0a LF, 0x0d \x0d CR, not including 0x1b \x1b ESC
# and spaces just to space out or annoy
alphabet = """ @£$¥èéùìòÇ\nØø\rÅå Δ_ΦΓΛΩΠΨΣΘΞ ÆæßÉ !"#¤%&'()*+,-./ 0123456789 :;<=>? ¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ ¿abcdefghijklmnopqrstuvwxyzäöñüà """
exp_alphabet = """ \x00\x01\x02\x03""" + \
"""\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f """ + \
"""\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a \x1c\x1d\x1e\x1f """ + \
""" !"#\x24%&'()*+,-./ 0123456789 :;<=>? \x40ABCDEFGHIJKLMNOPQRSTUVWXYZ""" + \
"""\x5b\x5c\x5d\x5e\x5f """ + \
"""\x60abcdefghijklmnopqrstuvwxyz""" + \
"""\x7b\x7c\x7d\x7e\x7f """
# extended = escaped
alphabet = """^ {}\\ [~] |€"""  # \\ is a single literal backslash
exp_alphabet = """\x1b\x14 \x1b\x28\x1b\x29\x1b\x2f \x1b\x3c\x1b\x3d\x1b\x3e \x1b\x40\x1b\x65"""
How do we decode a GSM SMS-DELIVER message?
From the TCAP, we extract the UserInfo part (sm_RP_UI) of MT_ForwardSM_Arg.
It's just a test string, with bogus OA and DA inside ...
RPUI =
04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 01
41 40 78 04 02 81 40 20 10 08 04 02 81 40 20 10
08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 04
02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 81
40 20 10 08 04 02 81 40 20 10 08 04 02 81 40 20
10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 08
04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 02
81 40 20 10 08 04 02 81 40 20 10 08
RPUI =
040c9101100000000000005201707101414078040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008
echo -n "04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71
01 41 40 78 04 02 81 40 20 10 08 04 02 81 40 20
10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 08
04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 02
81 40 20 10 08 04 02 81 40 20 10 08 04 02 81 40
20 10 08 04 02 81 40 20 10 08 04 02 81 40 20 10
08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 04
02 81 40 20 10 08 04 02 81 40 20 10 08 " | xxd -r -p |wc
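The same byte count in Python, as a sketch: bytes.fromhex skips whitespace (including newlines, since Python 3.7), so the dump can be pasted in as-is. RPUI_HEX is just a local name.

```python
# Same check as `xxd -r -p | wc`: parse the hex dump and count octets.
RPUI_HEX = """
04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 01
41 40 78 04 02 81 40 20 10 08 04 02 81 40 20 10
08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 04
02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 81
40 20 10 08 04 02 81 40 20 10 08 04 02 81 40 20
10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 08
04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 02
81 40 20 10 08 04 02 81 40 20 10 08
"""

rpui = bytes.fromhex(RPUI_HEX)  # whitespace, including newlines, is ignored
print(len(rpui))  # 124
```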
How can we decode this?
Cheat: we know this should contain a 120-character message: the same e-grave (è) character 120 times.
The GSM encoding of e-grave is 0x04,
so we expect this string to hold 120 characters of value 4, repeated,
and we expect each character to use 7 bits.
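A quick check of that hypothesis, sketched in Python (packed_octets is a hypothetical helper name): n septets need ceil(7n/8) octets, so 120 characters should take 105 of the 124 octets, leaving 19 for the TPDU header fields.

```python
def packed_octets(septets: int) -> int:
    """Octets needed to hold `septets` 7-bit characters: ceil(7n / 8)."""
    return (7 * septets + 7) // 8


ud_octets = packed_octets(120)
print(ud_octets, 124 - ud_octets)  # 105 19
```

The same formula gives the familiar limit: packed_octets(160) is 140, so 160 characters fit into 140 octets.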
Note on Wireshark: unfortunately this capture was not SCTP encapsulated but TCP/IP encapsulated. I had an older, hand-built Wireshark with modified decoding plugins that could decode SS7, but it's on an old machine now :-(. Further below is an online decoder I found that worked, BUT it doesn't explain how the decoding is done.
Gemini gave some good pointers on decoding and referenced the right specs, but this was a hard task for it: it hallucinated and made up decodings more often than it gave good info.
ETSI 3GPP standards 3GPP TS 23.040 and 3GPP TS 23.038
We can decode the PDU with info in these 2 standards.
https://www.etsi.org/deliver/etsi_ts/129300_129399/129338/16.00.00_60/ts_129338v160000p.pdf
SM-RP-UI AVP:3301 6.3.3.3 OctetString M, V No
6.3.3.3 SM-RP-UI
The SM-RP-UI is of type OctetString and it shall contain a short message transfer protocol data unit (TPDU) which is
defined in 3GPP TS 23.040 [3] and represents the user data field carried by the short message service relay sub-layer
protocol. Its maximum length is of 200 octets.
9.2.2.1 SMS‑DELIVER type
Basic elements of the SMS‑DELIVER type:
Abbr. | Reference | P 1) | R 2) | Description |
..
TP‑DCS | TP‑Data‑Coding‑Scheme | M | o | Parameter identifying the coding scheme within the TP‑User‑Data. |
..
TP‑UDL | TP‑User‑Data‑Length | M | I | Parameter indicating the length of the TP‑User‑Data field to follow. |
TP‑UD | TP‑User‑Data | O | 3) | |
1) Provision; Mandatory (M) or Optional (O).
2) Representation; Integer (I), bit (b), 2 bits (2b), Octet (o), 7 octets (7o), 2‑12 octets (2‑12o).
3) Dependent on the TP‑DCS.
9.2.3 Definition of the TPDU parameters
9.2.3.1 TP‑Message‑Type‑Indicator (TP‑MTI)
..
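please decode this SS7 message .. finally, hand decode to confirm :-) using 3GPP TS 23.040
04 7c comes just before the PDU. I think that 04 is the ASN.1 OCTET STRING tag and 0x7c = 124 is its length in octets, the size of the whole 124-octet sm_RP_UI dump below. (The PDU's own first octet, also 04, holds TP‑MTI etc.)
So 7c = 124 is NOT the TP‑UDL; that is the 0x78 further on.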
sm_RP_UI =
04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 01
41 40 78 04 02 81 40 20 10 08 04 02 81 40 20 10
08 04 02 81 40 20 10 08 04 02 81 40 20 10 08 04
02 81 40 20 10 08 04 02 81 40 20 10 08 04 02 81
40 20 10 08 04 02 81 40 20 10 08 04 02 81 40 20
10 08 04 02 81 40 20 10 08 04 02 81 40 20 10 08
04 02 81 40 20 10 08 04 02 81 40 20 10 08 04 02
81 40 20 10 08 04 02 81 40 20 10 08
FROM the 04 after 78: the sequence 04 02 81 40 20 10 08 repeats 15 times, 15*7 octets = 70+35 = 105 octets.
15*7*8 = 840 bits, /7 = 15*8 = 120 septets = 120 e-grave characters, each encoded as 0x04. PHEW!
0x78 = 120 decimal: 120 septets in sequence == TP‑User‑Data‑Length (TP‑UDL)
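A sketch of splitting the SMS-DELIVER header fields in the order given in 3GPP TS 23.040 9.2.2.1, assuming a numeric (semi-octet) originating address and no TP-UDHI user data header. parse_sms_deliver and swap_semi_octets are hypothetical names; the TP-SCTS is returned as raw swapped digits and the time zone sign bit is ignored.

```python
def swap_semi_octets(data: bytes) -> str:
    """Semi-octet (swapped BCD) digits: low nibble first, then high nibble."""
    return "".join(f"{b & 0x0F:X}{b >> 4:X}" for b in data)


def parse_sms_deliver(pdu: bytes) -> dict:
    """Split an SMS-DELIVER TPDU header (sketch: numeric OA, no TP-UDHI)."""
    first = pdu[0]
    oa_digits = pdu[1]                 # TP-OA length counts digits, not octets
    oa_octets = (oa_digits + 1) // 2
    i = 3 + oa_octets                  # skip first octet, OA length, type of address, OA digits
    return {
        "mti": first & 0x03,                     # 0b00 = SMS-DELIVER
        "no_more_messages": bool(first & 0x04),  # TP-MMS bit
        "udhi": bool(first & 0x40),              # TP-UDHI bit
        "toa": pdu[2],                           # 0x91 = international, ISDN
        "oa": swap_semi_octets(pdu[3:3 + oa_octets])[:oa_digits],
        "pid": pdu[i],
        "dcs": pdu[i + 1],
        "scts": swap_semi_octets(pdu[i + 2:i + 9]),  # YYMMDDhhmmss + time zone digits
        "udl": pdu[i + 9],
        "ud": pdu[i + 10:],
    }


pdu = (bytes.fromhex("04 0c 91 01 10 00 00 00 00 00 00 52 01 70 71 01 41 40 78")
       + bytes.fromhex("04 02 81 40 20 10 08") * 15)
fields = parse_sms_deliver(pdu)
print(fields["oa"], fields["dcs"], fields["scts"], fields["udl"], len(fields["ud"]))
# 100100000000 0 25100717101404 120 105
```

So the 04 first octet is TP-MTI = 00 (SMS-DELIVER) with the TP-MMS bit set, TP-DCS is 0x00, and 0x78 sits exactly where TP-UDL should be, followed by 105 octets of user data.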
TP‑User‑Data‑Length (TP‑UDL)
If the TP‑User‑Data is coded using the GSM 7 bit default alphabet, the TP‑User‑Data‑Length field gives an integer representation of the number of septets within the TP‑User‑Data field to follow.
The sequence 04 02 81 40 20 10 08 repeats exactly 15 times, handy.
Those 7 octets in binary:
00000100 00000010 10000001 01000000 00100000 00010000 00001000
7-octet to 8-septet conversion: the leftover MSBs of each octet carry over to become the LSBs of the next septet (I first took this for screwball ASN.1 encoding stuff; it isn't, see below)
0000100 000010 0 00001 00 0000 100 000 0100 00 00100 0 000100 0000100
0000100 0000100 0000100 0000100 0000100 0000100 0000100 0000100
04 04 04 04 04 04 04 04
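The same unpacking, sketched in Python: treat the octets as one little-endian integer and peel off 7 bits at a time. The septet count comes from TP-UDL rather than from the octet count, because 7 octets can hold 8 septets and a trailing zero septet would otherwise be ambiguous. unpack_septets is a hypothetical name.

```python
def unpack_septets(data: bytes, count: int) -> list[int]:
    """Unpack `count` 7-bit septets packed LSB-first (3GPP TS 23.038 6.1.2.1.1)."""
    bits = int.from_bytes(data, "little")   # octet 1 becomes the lowest 8 bits
    return [(bits >> (7 * k)) & 0x7F for k in range(count)]


print(unpack_septets(bytes.fromhex("04 02 81 40 20 10 08"), 8))  # [4, 4, 4, 4, 4, 4, 4, 4]
```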
9.1.2.4 Alphanumeric representation
A field which uses alphanumeric representation shall consist of a number of 7‑bit characters represented as the default alphabet defined in 3GPP TS 23.038 [9].
AND, in 3GPP TS 23.038 we can find how septets are packed into bytes:
6.1.2 Character packing
6.1.2.1 SMS Packing
6.1.2.1.1 Packing of 7-bit characters
e.g. eight characters in seven octets:
bit number:
7 6 5 4 3 2 1 0
2g 1a 1b 1c 1d 1e 1f 1g
3f 3g 2a 2b 2c 2d 2e 2f
4e 4f 4g 3a 3b 3c 3d 3e
5d 5e 5f 5g 4a 4b 4c 4d
6c 6d 6e 6f 6g 5a 5b 5c
7b 7c 7d 7e 7f 7g 6a 6b
8a 8b 8c 8d 8e 8f 8g 7a
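Packing is the reverse. A sketch following the bit layout in the table above (pack_septets is a hypothetical name): character 1 lands in the low 7 bits of octet 1, the LSB of character 2 fills bit 7 of octet 1, and so on.

```python
def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit values LSB-first into octets, as in the table above."""
    bits = 0
    for k, s in enumerate(septets):
        bits |= (s & 0x7F) << (7 * k)        # character k+1 starts at bit 7k
    n_octets = (7 * len(septets) + 7) // 8   # unused high bits of the last octet stay 0
    return bits.to_bytes(n_octets, "little")


print(pack_septets([4] * 8).hex(" "))  # 04 02 81 40 20 10 08
```

Packing 120 septets of 0x04 gives the 15 repeats of 04 02 81 40 20 10 08 seen in the PDU.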
SO, the TP-UD encoding IS NOT ASN.1: it's defined in 3GPP TS 23.040 and depends on the TP-DCS.
https://www.w3.org/Protocols/HTTP-NG/asn1.html
https://www.erlang.org/doc/apps/asn1/asn1_getting_started#character-strings
https://www.oss.com/asn1/resources/asn1-made-simple/asn1-quick-reference/packed-encoding-rules.html
"In the UNALIGNED variant, character strings are encoded in the fewest number of bits necessary"
https://coolaj86.com/articles/asn1-for-dummies/ with encoding to bytes examples
Gemini managed to explain this ok:
No, the TP-UD (Transfer Protocol - User Data) field, which contains the actual text of the SMS, is not defined using ASN.1 (Abstract Syntax Notation One).
The TP-UD field is defined directly within the constraints of the SMS-TPDU (3GPP TS 23.040) and its contents are governed by the Data Coding Scheme (TP-DCS).
How the TP-UD is Defined and Encoded
The TP-UD is simply a sequence of bytes. Its structure and meaning are dictated entirely by the preceding fields in the SMS-TPDU:
1. TP-UDL (User Data Length)
The TP-UDL field specifies the length of the TP-UD field.
If the message uses GSM 7-bit encoding (the default alphabet), the length is measured in 7-bit characters (septets). The characters are then packed into the minimum number of octets required (e.g., 160 characters fit into 140 octets).
If the message uses UCS-2/UTF-16 encoding, the length is measured in bytes.
2. TP-DCS (Data Coding Scheme)
The TP-DCS is the single most important field for decoding the TP-UD. This byte specifies the alphabet used:
TP-DCS Value | Encoding Standard | Max Characters (Single Segment) |
0x00 - 0x03 | GSM 7-bit Default Alphabet | 160 characters |
0x08 | UCS-2 (UTF-16) | 70 characters (since each character takes two bytes) |
0x04 | 8-bit Data | 140 bytes |
The TP-UD is simply the raw byte stream of the text, encoded according to the rule set by the TP-DCS.
Contrast with MAP/TCAP (The ASN.1 Layers)
The reason for the confusion is that the protocols surrounding the SMS-TPDU are defined using ASN.1:
TCAP (Transaction Capabilities Application Part): The TCAP dialogue and components are defined entirely in ASN.1.
MAP (Mobile Application Part): MAP operations (like mo-ForwardSM) and their parameters are defined using ASN.1.
The SMS-TPDU itself is often wrapped in a MAP parameter: the entire SMS-TPDU (including the TP-UD) is usually embedded inside a MAP ASN.1 construct as an OCTET STRING parameter. The ASN.1 defines the container but not the internal content of the TP-UD field itself.
This ONLINE decoder worked, but it doesn't break down the decode, so you cannot rely on it for testing:
https://www.smsdeliverer.com/online-sms-pdu-decoder.aspx
sm_RP_UI =
040c9101100000000000005201707101414078040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008040281402010080402814020100804028140201008
If the TP‑User‑Data is coded using compressed GSM 7 bit default alphabet or compressed 8 bit data or compressed UCS2 [24] data, the TP‑User‑Data‑Length field gives an integer representation of the number of octets after compression within the TP‑User‑Data field to follow.
YEOW
It's not compressed though: TP-DCS is 0x00, so the compression bit (bit 5) is 0.
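To check compression and alphabet from the TP-DCS, a small sketch covering only the general data coding groups (bits 7..6 = 00 or 01) and the 1111 data coding/message class group from 3GPP TS 23.038; the message waiting and reserved groups raise NotImplementedError. dcs_info is a hypothetical name.

```python
def dcs_info(dcs: int) -> dict:
    """Alphabet and compression from TP-DCS (sketch; only some coding groups)."""
    if dcs >> 6 in (0b00, 0b01):             # general data coding (01 = automatic deletion)
        alphabet = ("GSM 7-bit default", "8-bit data", "UCS2", "reserved")[(dcs >> 2) & 0x03]
        compressed = bool(dcs & 0x20)        # bit 5
    elif dcs >> 4 == 0b1111:                 # data coding / message class group
        alphabet = "8-bit data" if dcs & 0x04 else "GSM 7-bit default"
        compressed = False
    else:
        raise NotImplementedError(f"TP-DCS 0x{dcs:02x} not handled in this sketch")
    # TP-UDL counts septets only for uncompressed GSM 7-bit, otherwise octets.
    udl_unit = "septets" if alphabet == "GSM 7-bit default" and not compressed else "octets"
    return {"alphabet": alphabet, "compressed": compressed, "udl_unit": udl_unit}


print(dcs_info(0x00))
# {'alphabet': 'GSM 7-bit default', 'compressed': False, 'udl_unit': 'septets'}
```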
gsm_map_with_PasteSS7egrave5.pcap
# this did not quite work out! :-(
[Raw pcap bytes as seen in a text editor, mostly non-printable and lost in copy/paste. Near the end the TPDU shows through: RpqA@x is the printable part of the timestamp 52 01 70 71 01 41 40 plus the TP-UDL 0x78, followed by the 15 repeats of the user data, then \0\0 and a long run of 0s. Pad with \0\0 ?]
Wireshark pcap file format: https://datatracker.ietf.org/doc/id/draft-gharris-opsawg-pcap-00.html
I _think_ the total bytes including the Wireshark file header = 321 bytes:
file header 24 octets and packet record header 16 octets (including two 4-octet packet lengths)
timestamps: 8 octets, mostly non-printable
packet lengths: incl_len and orig_len, 4 octets each, little-endian .. NEED TO INCREASE
\xda\0\0\0 = 218, change to 281 = 256 + 25 = \x19\x01\0\0 I think
try more? 282?
try more again, 283? = 0x1b 0x01
I _think_ total bytes excluding wireshark file + packet heads = 281 bytes
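Rather than patching the lengths by hand in emacs, the pcap headers could be generated. A sketch with Python's struct module, assuming the classic microsecond pcap format (magic 0xa1b2c3d4, version 2.4, little-endian) and the Ethernet link type (1); pcap_file is a hypothetical name and the timestamps default to 0.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4   # classic pcap, microsecond timestamps
LINKTYPE_ETHERNET = 1


def pcap_file(packet: bytes, linktype: int = LINKTYPE_ETHERNET,
              ts_sec: int = 0, ts_usec: int = 0) -> bytes:
    """One-packet pcap file: 24-octet file header + 16-octet record header + packet."""
    file_header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, linktype)
    # incl_len and orig_len are both the real packet length, little-endian
    record_header = struct.pack("<IIII", ts_sec, ts_usec, len(packet), len(packet))
    return file_header + record_header + packet


f = pcap_file(bytes(281))        # 281 zero octets as a stand-in frame
print(len(f), f[32:36])          # 321 b'\x19\x01\x00\x00'
```

For a 281-octet frame this gives 24 + 16 + 281 = 321 octets, with 281 stored as \x19\x01\x00\x00 in both length fields. Writing it out would be something like open(path, "wb").write(pcap_file(frame)), where path and frame are your own.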
Ethernet \0
IPv4 E\0[..]4\0\0[..]t
including the total length in the 3rd and 4th octets [..] OR [..] OR ..
01cc == total length 460; Wireshark says the IPv4 total length exceeds the packet length (204 bytes,
THEN 267 bytes), so change to 256+11 = \x01 + CTRL-I, a TAB, I think?
No, not CTRL-I (0x09) but CTRL-K (0x0b).
The IPv4 length is the length from the 45 (= E) to the end.
256+12 = 268 = 0x010c
and increase by 1 again: 256+13 = 269 = 0x010d
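The same fix sketched in Python: write len(ip) big-endian into octets 2-3. Changing the header also invalidates the IPv4 header checksum (octets 10-11), so this sketch recomputes it with the RFC 791 ones' complement sum. The function names are hypothetical.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 header checksum: ones' complement of the ones' complement sum."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fix_ipv4_length(ip: bytearray) -> None:
    """Set Total Length (octets 2-3) to len(ip), then refresh the header checksum."""
    ihl = (ip[0] & 0x0F) * 4                 # header length in octets, 20 for 0x45
    struct.pack_into("!H", ip, 2, len(ip))   # big-endian, counted from the 0x45 to the end
    struct.pack_into("!H", ip, 10, 0)        # checksum is computed with this field zeroed
    struct.pack_into("!H", ip, 10, ipv4_checksum(bytes(ip[:ihl])))


ip = bytearray(b"\x45" + bytes(268))     # stand-in 269-octet IPv4 packet
fix_ipv4_length(ip)
print(ip[2:4].hex())                     # 010d
```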
SCTP: X X\0\0\0\06\0\0[..]\0\0\0\0\0\0\0\0\0\0\0
sport 2904 (0x0b58, the X) dport 2904 (0x0b58) vtag: \0\0\0\0 cksum: 6...
DATA chunk type: \0 (DATA), chunk flags: 3
chunk length: \0[..] = 172 NEED TO INCREASE THIS \0œ ?? chunk length œž ??
clen:\0� = 00ac it says payload length: 156 chunk len 00ac=172
\254 octal = 10101100 = 0xac yep duh = 172
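Sketching the SCTP arithmetic: a DATA chunk's length field covers its 16-octet header (type, flags, length, TSN, stream identifier, stream sequence number, payload protocol identifier) plus the user data, but not the padding to a 4-octet boundary (RFC 4960). So 156 octets of payload gives 172 = 0xac. The SCTP CRC32c checksum is not recomputed here, and the function names are hypothetical.

```python
import struct

SCTP_DATA_CHUNK_HEADER = 16  # type, flags, length, TSN, stream id, stream seq no, PPID


def sctp_data_chunk_length(payload_len: int) -> int:
    """Chunk Length field value: header + user data, excluding trailing padding."""
    return SCTP_DATA_CHUNK_HEADER + payload_len


def padded_to_4(n: int) -> int:
    """Octets the chunk actually occupies once padded to a 4-octet boundary."""
    return (n + 3) & ~3


clen = sctp_data_chunk_length(156)
print(clen, struct.pack("!H", clen).hex(), oct(clen))  # 172 00ac 0o254
```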
172+60 = 232 (0xe8), about right; try 0xe9 = 233 = 11101001 = \351 octal
emacs: how do I insert a raw byte like \x1b? C-q (the quoted-insert command) followed by the octal code, then RET.
Oh right, \xe9: C-q 033 for ESC, C-q 0351 for \xe9, which gave é. HAH, ironically it
inserted 2 bytes: é (e-acute) in UTF-8, c3 a9.
236 = 0xec: echo "e9 ec" | xxd -r -p -g1 >> ~/Downloads/gsm_map_with_PasteSS7egrave4.pcap
189 = [..] - 0xea, try [..] = 0xeb
e9 - 4 gives us e5? Too short, by lots, by 6? e9+2 = eb
e9 to ea [..]
MTP2 \0\0\0[..]\0\0[..]
01 version, 00 reserved, 06 class, 01 DATA,
then 0000009c = message length 156, 0300 = parameter Protocol Data 1, 0092 = parameter length 146
NEED to increase +40 ? +60 ? yep 9c to 0xda and 92 to 0xd0 or thereabouts
cf to d3 are: [..] d0 [..]
d4 to df are: [..] da [..]
tried d0 and da .. TRY d3 and dd
echo "c2cc c3cd c4ce c5cf c6d0 c7d1 c8d2 c9d3 cad4 cbd5 ccd6 cdd7 ced8 cfd9 d0da
d1db d2dc" |xxd -r -p -g1 >> ~/Downloads/gsm_map_with_PasteSS7egrave4.pcap
TRIED DB D1 [..] SUBTRACT 2, try d9cf = \331\317 octal
BAHH OK the 9c includes from start of MTP2 count 94hex to end packet
BAHH OK the 92 includes 2bytes for itself count 90hex to end of packet
d3 to end of packet => use d3+8 = db
cf to end of packet => use cf+2 = d1 doublecheck 92+a=9c d1+a=db
dbd1? MAYBE add 1: dcd2? maybe sub 1, sub 2:
dad0? d9cf?
MTP3 [..](b!
SCCP: there are 3 pointers, to CdPa, CgPa and ? (I think the third points to the data part)
WHAT is that 0x6c just before the TCAP part I pasted in (0x65 etc.)?
Wireshark highlights it as if it is a length?
CHANGE IT TO AC???
SCCP \0[..]
[..]\0r(\0 \0r(`Ale[..]H
3 POINTERS:
CdPa: [..]\0r(\0
CgPa: \0r(`A
HUH: l....
0x6c = l might be a length .. 0xac shows as [..] I think .. 10101100 YEP
need another 2? 0xae = \256 octal needed
MAYBE NOT
TCAP
65 81 aa ..
otid dtid components:1 item == GSM MAP
01 01 2c opcode 44 (mt-forwardSM)
81? 8f ?
DA 80 06 013011111111
OA 04 07 91020130101030
0x04 0x7c after the OA: 0x7c is probably the length of the PDU (0x04 being the OCTET STRING tag of sm_RP_UI, and 0x7c = 124 octets).
We get a "contained item exceeds length of containing item" error message.
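My guess at that error: when the sm_RP_UI grows, every enclosing BER length (the 04 7c OCTET STRING itself, the MT_ForwardSM_Arg SEQUENCE, the component, and the TCAP 65 81 aa message) has to grow by the same amount, and a length that crosses 127 also changes form (7c fits in one octet, 0x80 and up needs 81 xx). A sketch of BER definite-length decoding and encoding; ber_length, ber_length_bytes and tlv_size are hypothetical names and indefinite lengths are not handled.

```python
def ber_length(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a BER definite length at buf[pos]; return (length, octets used)."""
    first = buf[pos]
    if first < 0x80:                        # short form: one octet, 0..127
        return first, 1
    n = first & 0x7F                        # long form: 0x8n then n length octets
    if n == 0:
        raise ValueError("indefinite length (0x80) not handled in this sketch")
    return int.from_bytes(buf[pos + 1:pos + 1 + n], "big"), 1 + n


def ber_length_bytes(length: int) -> bytes:
    """Shortest definite-length encoding of `length`."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv_size(length: int) -> int:
    """Total octets of a TLV with a one-octet tag and `length` content octets."""
    return 1 + len(ber_length_bytes(length)) + length


print(ber_length(bytes.fromhex("65 81 aa"), 1))  # (170, 2)
print(tlv_size(170))                             # 173
```

So the TCAP message 65 81 aa occupies 1 + 2 + 170 = 173 octets in total, which is the figure any enclosing length (e.g. the SCCP data part, if it holds exactly this message) would need to account for.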