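Dynamic library link order

Summary

Concepts

  1. -L sets, at link time, the search path for explicitly linked libraries (high priority).
  2. -rpath sets, at link time, the search path for directly or indirectly linked libraries (highest priority). It is also written into the binary as RPATH, which sets the run-time search path for the binary's direct and indirect shared-library dependencies (highest priority). Note: some systems enable the linker option --enable-new-dtags by default, so -rpath produces RUNPATH instead. Pass the linker option --disable-new-dtags to get RPATH.
  3. -rpath-link sets, at link time, the search path for directly or indirectly linked libraries (high priority). It is not recorded in the binary.
  4. LD_LIBRARY_PATH is searched at run time for direct and indirect dependencies. It ranks second, below RPATH.
  5. RUNPATH is written into the binary and sets the run-time search path for the binary's direct dependencies only (lower priority than LD_LIBRARY_PATH). When RUNPATH is present, the RPATH in the same binary is ignored.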
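The run-time ordering in items 2, 4 and 5 can be sketched as a small function. This is only an illustration of the rules listed above; the function name and the tuple format are hypothetical, and the system defaults that the loader tries afterwards are left out.

```python
def runtime_search_order(rpath, runpath, ld_library_path):
    """Order in which the listed sources are consulted for a direct dependency.

    Each argument is a list of directories (empty if the source is absent).
    Hypothetical helper: it models only the three sources named in the notes.
    """
    order = []
    if rpath and not runpath:          # RPATH is ignored when RUNPATH exists
        order.append(("RPATH", rpath))
    if ld_library_path:
        order.append(("LD_LIBRARY_PATH", ld_library_path))
    if runpath:                        # RUNPATH comes after LD_LIBRARY_PATH
        order.append(("RUNPATH", runpath))
    return order


print(runtime_search_order(["/opt/a"], [], ["/env"]))
print(runtime_search_order(["/opt/a"], ["/opt/b"], ["/env"]))
```

The second call shows why a binary linked with new dtags can be overridden by LD_LIBRARY_PATH while one with a plain RPATH cannot.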

Tools

  1. ldd -s lib_path
  2. LD_DEBUG=libs lib_path
  3. readelf -d lib_path
  4. CMake's rpath settings
  5. chrpath -r [new_rpath] [lib_name] Note: the new path cannot be longer than the original one.
  6. patchelf --set-rpath [new_rpath] [lib_name] Note: no limit on path length, but it modifies RUNPATH.
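To check which of the two tags a binary actually carries, the output of readelf -d can be scanned for the RPATH and RUNPATH entries. The sketch below parses text that has already been captured; the sample lines are hand-written in the layout readelf normally prints, and the helper name is an assumption.

```python
import re


def dynamic_paths(readelf_output):
    """Extract RPATH/RUNPATH directory lists from `readelf -d` text."""
    found = {}
    pattern = re.compile(r"\((RPATH|RUNPATH)\)\s+Library r(?:un)?path: \[(.*)\]")
    for line in readelf_output.splitlines():
        match = pattern.search(line)
        if match:
            found[match.group(1)] = match.group(2).split(":")
    return found


# Hand-written sample in the usual `readelf -d` layout (not real output).
sample = """\
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001d (RUNPATH)            Library runpath: [/opt/app/lib:$ORIGIN/../lib]
"""
print(dynamic_paths(sample))
```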

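Precedence

C operator precedence

Prec  Operator  Name                        Usage                                     Associativity  Notes
1     []        array subscript             array[expression]                         left to right  --
      ()        parentheses                 (expression) / function(arguments)                       --
      .         member selection (object)   object.member                                            --
      ->        member selection (pointer)  pointer->member                                          --
2     -         unary minus                 -expression                               right to left  unary operators
      ~         bitwise NOT                 ~expression
      ++        increment                   ++variable / variable++
      --        decrement                   --variable / variable--
      *         dereference                 *pointer
      &         address-of                  &variable
      !         logical NOT                 !expression
      (type)    cast                        (type)expression                                         --
      sizeof    size-of                     sizeof(expression)                                       --
3     /         division                    expression / expression                   left to right  binary operators
      *         multiplication              expression * expression
      %         remainder (modulo)          integer expression % integer expression
4     +         addition                    expression + expression                   left to right  binary operators
      -         subtraction                 expression - expression
5     <<        left shift                  variable << expression                    left to right  binary operators
      >>        right shift                 variable >> expression
6     >         greater than                expression > expression                   left to right  binary operators
      >=        greater than or equal       expression >= expression
      <         less than                   expression < expression
      <=        less than or equal          expression <= expression
7     ==        equal                       expression == expression                  left to right  binary operators
      !=        not equal                   expression != expression
8     &         bitwise AND                 expression & expression                   left to right  binary operator
9     ^         bitwise XOR                 expression ^ expression                   left to right  binary operator
10    |         bitwise OR                  expression | expression                   left to right  binary operator
11    &&        logical AND                 expression && expression                  left to right  binary operator
12    ||        logical OR                  expression || expression                  left to right  binary operator
13    ?:        conditional                 expression1 ? expression2 : expression3   right to left  ternary operator
14    =         assignment                  variable = expression                     right to left  --
      /=        divide and assign           variable /= expression                                   --
      *=        multiply and assign         variable *= expression                                   --
      %=        modulo and assign           variable %= expression                                   --
      +=        add and assign              variable += expression                                   --
      -=        subtract and assign         variable -= expression                                   --
      <<=       left shift and assign       variable <<= expression                                  --
      >>=       right shift and assign      variable >>= expression                                  --
      &=        bitwise AND and assign      variable &= expression                                   --
      ^=        bitwise XOR and assign      variable ^= expression                                   --
      |=        bitwise OR and assign       variable |= expression                                   --
15    ,         comma                       expression, expression, ...               left to right  --

Notes:

Operators with the same precedence are evaluated in the order given by their associativity.
A simple mnemonic: ! > arithmetic operators > relational operators > && > || > assignment operators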
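Windows

  1. Opening the current folder in File Explorer from the Windows 10 command line

After changing into the folder on the command line, run:

explorer.exe .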

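Shell for

FOR

The basic form of the for statement in a Windows batch script is:

In a cmd window: for %I in (command1) do command2
In a batch file: for %%I in (command1) do command2

The two environments are distinguished because the statement behaves almost, but not exactly, the same in each. The most visible difference: in a cmd window the loop variable I after for must be referenced with a single percent sign, %I, whereas in a batch file it must be referenced with a double percent sign, %%I. For convenience, unless stated otherwise, the explanations below assume a batch file.

First, the basic elements of a for statement:

  1. for, in and do are the keywords of the for statement; all three are required;
  2. %%I is the reference to the loop variable; it must appear even if the variable I takes no part in the statement after do;
  3. the parentheses after in and before do cannot be omitted;
  4. command1 stands for strings or variables; command2 stands for strings, variables or command statements.

  Here is a Windows batch script demo (call it demo1):

@echo off
for %%I in (ABC) do echo %%I
pause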

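Save this as a .bat file (a batch file) and run it; the batch window that opens prints ABC.

A for loop in a batch file is that simple. Next come some points to note about the for statement, followed by a more involved example.

  1. The loop variable I can be replaced by any of the 26 letters. The letters are case-sensitive, so %%I and %%i are different variables. Other characters can be used as well, but to avoid clashing with the ten batch parameters %0~%9, do not replace %%I with any of %%0~%%9;

  2. The strings or variables denoted by command1 between in and do can be one or many. Each string or variable is called an element, and elements are separated by spaces, tabs, commas, semicolons or equals signs;

  3. The for statement takes the elements of command1 one at a time, assigns the value to the loop variable I, and carries it into command2 after do. Exactly one element is taken per iteration and the statement after do runs once, whether or not command2 actually uses the element. After that, the next element is taken and command2 runs again, and so on until every element of command1 has been taken; only then does the for statement finish.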

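    With that groundwork, look at the next example, which changes part of demo1 (call it demo2) and gives a very different result:

@echo off
for %%I in (A,B,C) do echo %%I
pause

  Running it prints A, B and C on three separate lines.

If the commas in the string A,B,C are replaced by spaces, tabs or equals signs, the result is exactly the same as that of demo2.

Now, step by step, how the for statement in demo2 executes:

  1. The for statement uses the comma as the separator and splits the string A,B,C into three elements: A, B and C. This determines that the statement after do runs 3 times;

  2. The first iteration goes like this: the string A becomes the value of the loop variable I and is carried into the statement after do, that is, echo %%I is executed with I equal to A, so the first iteration displays the string A. The second iteration works the same way, except that I now holds the second element of command1, the string B. The loop continues like this; once the third echo has run, the whole for statement is finished and the next statement, the pause command, is executed.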
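The element splitting described above can be imitated in a few lines. This is a rough model of the rule stated in the notes (separators are space, tab, comma, semicolon and equals sign), not a reimplementation of cmd.exe; it ignores wildcards and quoting, and the function name is hypothetical.

```python
import re


def for_elements(set_text):
    """Split the text between the parentheses into elements.

    Models only the separator rule from the notes: space, tab, comma,
    semicolon and equals sign. Wildcards and quotes are not handled.
    """
    return [token for token in re.split(r"[ \t,;=]+", set_text) if token]


# Equivalent of: for %%I in (A,B,C) do echo %%I
for element in for_elements("A,B,C"):
    print(element)
```

With "ABC" as input the list has a single element, which matches demo1 running its echo only once.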

 Advanced usage:

1) Which files are in the current directory?

@echo off
for %%i in (*.*) do echo "%%i"
pause

2) Which text files are in the current directory?

@echo off
for %%i in (*.txt) do echo "%%i"
pause

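dd

Purpose

The dd command is mainly used to convert and copy files.
On Linux, hardware device drivers and special device files are files too, and dd can read from or write to them directly.
dd copies a file in blocks of a given size and applies the requested conversions while copying.

A block is the unit in which bytes are read, written and converted at one time. Command-line options can set different block sizes for input/reading (ibs) and output/writing (obs), although the bs option overrides both ibs and obs. The default block size for input and output is 512 bytes (the traditional disk block size and the size of a "block" in POSIX). The count option for copying is measured in blocks, as are skip and seek.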

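Parameters in detail

  • if=file: input file name, standard input by default. This is the source file. < if=input file >
  • of=file: output file name, standard output by default. This is the destination file. < of=output file >
  • ibs=bytes: read bytes bytes at a time, that is, use an input block size of bytes bytes.
  • obs=bytes: write bytes bytes at a time, that is, use an output block size of bytes bytes.
  • bs=bytes: set both the input and the output block size to bytes bytes.
  • cbs=bytes: convert bytes bytes at a time, that is, set the size of the conversion buffer.
  • skip=blocks: skip blocks blocks from the start of the input file before copying.
  • seek=blocks: skip blocks blocks from the start of the output file before writing.
    Note: usually this only takes effect when the output file is a disk or tape, that is, when backing up to disk or tape.
  • count=blocks: copy only blocks blocks; the block size is the number of bytes given by ibs.
  • conv=conversion: convert the file according to the given keywords:
    • ascii: convert EBCDIC to ASCII
    • ebcdic: convert ASCII to EBCDIC
    • ibm: convert ASCII to alternate EBCDIC
    • block: convert each line to a record of length cbs, padding short lines with spaces
    • unblock: convert each cbs-sized record to a line, replacing its trailing spaces with a newline
    • lcase: convert upper-case characters to lower case
    • ucase: convert lower-case characters to upper case
    • swab: swap every pair of input bytes
    • noerror: do not stop on errors
    • notrunc: do not truncate the output file
    • sync: pad every input block to ibs bytes, filling the missing part with NUL characters.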
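How bs, skip and count combine is easiest to see as byte offsets. The sketch below applies the same arithmetic to an in-memory buffer; it is an illustration only (the function name is made up, and real dd works on files and handles short reads, conv= and seek= as well).

```python
def dd_bytes(data, bs=512, skip=0, count=None):
    """Return the bytes dd would copy for the given bs/skip/count.

    skip and count are measured in blocks of bs bytes, as in the list above.
    """
    start = skip * bs
    end = len(data) if count is None else start + count * bs
    return data[start:end]


disk = bytes(range(256)) * 8                 # 2048 bytes of sample input
first_sector = dd_bytes(disk, bs=512, count=1)
print(len(first_sector))                     # one 512-byte block
print(dd_bytes(b"abcdefgh", bs=2, skip=1, count=2))
```

The first call mirrors the shape of the MBR backup further down (bs=512 count=1).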

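dd examples

1. Back up the whole local disk /dev/hdb to /dev/hdd

dd if=/dev/hdb of=/dev/hdd

2. Back up all data on /dev/hdb to an image file at a given path

dd if=/dev/hdb of=/root/image

3. Restore the backup file to a given disk

dd if=/root/image of=/dev/hdb

4. Back up all data on /dev/hdb, compress it with gzip and save it at a given path

dd if=/dev/hdb | gzip > /root/image.gz

5. Restore the compressed backup file to a given disk

gzip -dc /root/image.gz | dd of=/dev/hdb

6. Back up and restore the MBR
Back up the 512-byte MBR at the start of the disk to a given file:

dd if=/dev/hda of=/root/image count=1 bs=512
count=1 copies a single block; bs=512 sets the block size to 512 bytes.

Restore:

dd if=/root/image of=/dev/hda
This writes the backed-up MBR to the start of the disk.

7. Back up a floppy disk

dd if=/dev/fd0 of=disk.img count=1 bs=1440k
(that is, a block size of 1.44 MB)

8. Copy the contents of memory to the hard disk

dd if=/dev/mem of=/root/mem.bin bs=1024
(that is, a block size of 1 KB)

9. Copy the contents of an optical disc to a given folder and save it as cd.iso

dd if=/dev/cdrom of=/root/cd.iso
(use /dev/hdc instead if that is the device node of the optical drive)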

10. Add swap space with a swap file

Step 1: create a 256 MB file:
dd if=/dev/zero of=/swapfile bs=1024 count=262144
Step 2: turn the file into a swap file:
mkswap /swapfile
Step 3: enable the swap file:
swapon /swapfile
Step 4: edit /etc/fstab so that the swap file is enabled automatically at every boot:
/swapfile swap swap defaults 0 0

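11. Destroy the data on a disk

dd if=/dev/urandom of=/dev/hda1

Note: filling the disk with random data can be used to destroy data when that is required.

12. Measure the read and write speed of a disk

dd if=/dev/zero bs=1024 count=1000000 of=/root/1Gb.file
dd if=/root/1Gb.file bs=64k | dd of=/dev/null

The execution times printed by these two commands give the write speed and the read speed of the disk.

13. Find the best block size for a disk:

dd if=/dev/zero bs=1024 count=1000000 of=/root/1Gb.file
dd if=/dev/zero bs=2048 count=500000 of=/root/1Gb.file
dd if=/dev/zero bs=4096 count=250000 of=/root/1Gb.file
dd if=/dev/zero bs=8192 count=125000 of=/root/1Gb.file

Comparing the execution times printed by these commands shows which block size works best on the system.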

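14. Repair a hard disk:

dd if=/dev/sda of=/dev/sda
or
dd if=/dev/hda of=/dev/hda

When a hard disk has been left unused for a long time (a year or more), magnetic flux points develop on the platters. The head has trouble reading these areas, which can lead to I/O errors. If the first sector of the disk is affected, the disk may become unusable. The command above may bring such data back to life, and according to these notes the process is safe and efficient.

15. Change the byte at offset $i (counting from 0) of a very large video file to 0x41 (the ASCII value of the capital letter A)

echo A | dd of=bigfile seek=$i bs=1 count=1 conv=notrunc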
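The same in-place patch can be written without dd. The sketch below is an equivalent under the assumption that the offset is counted from 0, as seek= with bs=1 does; the function name is hypothetical and a temporary file stands in for the big video file.

```python
import os
import tempfile


def patch_byte(path, offset, value):
    """Overwrite a single byte in place; the file is not truncated."""
    with open(path, "r+b") as handle:      # r+b: update without truncating
        handle.seek(offset)
        handle.write(bytes([value]))


# A temporary file stands in for the large video file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
patch_byte(tmp.name, 3, 0x41)
with open(tmp.name, "rb") as handle:
    patched = handle.read()
os.remove(tmp.name)
print(patched)   # b'012A456789'
```

Opening with "r+b" plays the role of conv=notrunc: the rest of the file stays untouched.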

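Shell debugging options

The shell itself provides several debugging options:

  1. -n reads the commands in the script without executing them; it is used to check the script for syntax errors.
  2. -v prints each line of the script as the shell reads it, while the script runs.
  3. -x traces execution: each command is printed, after expansion, as it is executed.

These options can be used in three ways (note: avoid mixing several debugging options):

  1. As a command-line argument: sh -x script.sh or bash -n script.sh
  2. On the first line of the script: #!/bin/sh -x or #!/bin/bash -x
  3. Inside the script with the set command, where set -x turns the option on and set +x turns it off

For a full description of the set command, see the official Bash documentation.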

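Threads

GDB> show scheduler-locking //show the current scheduler-locking mode
GDB> set scheduler-locking on //only the current thread runs while debugging; all other threads stay stopped
GDB> set print pretty on //pretty-print structures
GDB> thread find [regexp] //find threads matching regexp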

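Symbol tables

How do I load multiple symbol files in gdb? I have an executable foo.out and a loadable module bar. I created two symbol files, foo.symbol and bar.symbol. How do I load both files into GDB?

# gdb --core core

(gdb)

(gdb) symbol-file foo.symbol

How do I load the second symbol file? Or is there a way to load all the files in a directory into gdb?

Recommended answer

To set the directory that contains the symbol files, use

set debug-file-directory

and use

show debug-file-directory

to display the directory currently set as containing symbol files.

Symbol files are read from this directory automatically if the binary refers to them through a debug link.

To add further symbols, you can use add-symbol-file.

(The gdb onlinedocs are quoted here:)

add-symbol-file filename address

add-symbol-file filename address [ -readnow ] [ -mapped ]

add-symbol-file filename -ssection address ...

The add-symbol-file command reads additional symbol table information from the file filename. You would use this command when filename has been dynamically loaded (by some other means) into the program that is running. address should be the memory address at which the file has been loaded; GDB cannot figure this out for itself. You can additionally specify an arbitrary number of "-ssection address" pairs, to give an explicit section name and base address for that section. You can specify any address as an expression.

The symbol table of the file filename is added to the symbol table originally read with the symbol-file command. You can use the add-symbol-file command any number of times; the new symbol data thus read keeps adding to the old. To discard all old symbol data instead, use the symbol-file command without any arguments.

Although filename is typically a shared library file, an executable file, or some other object file which has been fully relocated for loading into a process, you can also load symbolic information from relocatable .o files, as long as:

  • the file's symbolic information refers only to linker symbols defined in that file, not to symbols defined by other object files,
  • every section the file's symbolic information refers to has actually been loaded into the inferior, as it appears in the file, and
  • you can determine the address at which every section was loaded, and provide these to the add-symbol-file command.

Some embedded operating systems, like Sun Chorus and VxWorks, can load relocatable files into an already running program; such systems typically make the requirements above easy to meet. However, it is important to recognize that many native systems use complex link procedures (.linkonce section factoring and C++ constructor table assembly, for example) that make the requirements difficult to meet. In general, one cannot assume that using add-symbol-file to read a relocatable object file's symbolic information will have the same effect as linking the relocatable object file into the program in the normal way.

add-symbol-file does not repeat if you press RET after using it.

You can use the -mapped and -readnow options just as with the symbol-file command, to change how GDB manages the symbol table information for filename.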

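Another recommended answer

Additional symbols can be loaded into a gdb debugging session with:

add-symbol-file filename address

The address argument is the address of the .text section. It can be retrieved with:

readelf -WS path/to/file.elf | grep .text | awk '{ print "0x"$5 }'

This can be automated in gdb by adding the following to ~/.gdbinit: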

define add-symbol-file-auto
  # Parse .text address to temp file
  shell echo set \$text_address=$(readelf -WS $arg0 | grep .text | awk '{ print "0x"$5 }') >/tmp/temp_gdb_text_address.txt

  # Source .text address
  source /tmp/temp_gdb_text_address.txt

  # Clean tempfile
  shell rm -f /tmp/temp_gdb_text_address.txt

  # Load symbol table
  add-symbol-file $arg0 $text_address
end

With the function defined above, add-symbol-file-auto can be used to load additional symbols:

(gdb) add-symbol-file-auto path/to/bootloader.elf
add symbol table from file "path/to/bootloader.elf" at
.text_addr = 0x8010400
(gdb) add-symbol-file-auto path/to/application.elf
add symbol table from file "path/to/application.elf" at
.text_addr = 0x8000000
(gdb) break main
Breakpoint 1 at 0x8006cb0: main. (2 locations)
(gdb) info break
Num Type Disp Enb Address What
1 breakpoint keep y
1.1 y 0x08006cb0 in main() at ./source/main.cpp:114
1.2 y 0x080106a6 in main() at ./main.cpp:10
(gdb)
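The awk field number in the pipeline above depends on how readelf prints the section index: `[ 1]` splits into two whitespace-separated fields, while `[12]` is a single field, so `$5` appears to be the address only for indices below 10. A regular expression avoids that dependency. The sketch below parses captured readelf -WS text; the sample lines are hand-written for illustration and the helper name is hypothetical.

```python
import re


def text_address(readelf_ws_output):
    """Return the .text section address as a hex string, or None."""
    pattern = re.compile(r"^\s*\[\s*\d+\]\s+\.text\s+\S+\s+([0-9a-fA-F]+)\s")
    for line in readelf_ws_output.splitlines():
        match = pattern.match(line)
        if match:
            return hex(int(match.group(1), 16))
    return None


# Hand-written sample in the usual `readelf -WS` layout (not real output).
sample = """\
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 1] .isr_vector       PROGBITS        08010000 010000 000400 00   A  0   0  1
  [12] .text             PROGBITS        08010400 010400 006000 00  AX  0   0 16
"""
print(text_address(sample))   # 0x8010400
```

The printed value is what would be passed as the address argument of add-symbol-file.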

AES_NI

How to find out whether AES-NI (Advanced Encryption) is enabled on a Linux system

The lscpu command shows whether the processor has the AES/AES-NI instruction set:

lscpu

Type the following grep command to make sure that the processor has the AES instruction set and that it is enabled in the BIOS:

grep -o aes /proc/cpuinfo

OR

grep -m1 -o aes /proc/cpuinfo
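The same check can be done by parsing the text of /proc/cpuinfo. The sketch below takes the text as an argument so it can be tried on any sample; it assumes the x86 layout, where the capability list sits on a line starting with "flags", and the function name is hypothetical. Matching whole words also avoids counting other flags that merely contain the letters "aes".

```python
def has_aes_flag(cpuinfo_text):
    """True if a 'flags' line lists 'aes' as a whole word (x86 layout assumed)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


# Hand-written sample; on a real system pass open("/proc/cpuinfo").read().
sample = "processor\t: 0\nflags\t\t: fpu vme sse2 aes avx\n"
print(has_aes_flag(sample))
```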

Base64

Base64 - Wikipedia

In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits.


Common to all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web[1] where one of its uses is the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files.[2]

Base64 is also widely used for sending e-mail attachments. This is required because SMTP – in its original form – was designed to transport 7-bit ASCII characters only. This encoding causes an overhead of 33–37% (33% by the encoding itself; up to 4% more by the inserted line breaks).

Design

The particular set of 64 characters chosen to represent the 64-digit values for the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not 8-bit clean.[3] For example, MIME's Base64 implementation uses A–Z, a–z, and 0–9 for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is UTF-7.

The earliest instances of this type of encoding were created for dial-up communication between systems running the same OS, for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh), and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.[4][5][6][3]

Base64 table from RFC 4648

This is the Base64 alphabet defined in RFC 4648 §4. See also Variants summary (below).

Index  Binary  Char    Index  Binary  Char    Index  Binary  Char    Index  Binary  Char
0      000000  A       16     010000  Q       32     100000  g       48     110000  w
1      000001  B       17     010001  R       33     100001  h       49     110001  x
2      000010  C       18     010010  S       34     100010  i       50     110010  y
3      000011  D       19     010011  T       35     100011  j       51     110011  z
4      000100  E       20     010100  U       36     100100  k       52     110100  0
5      000101  F       21     010101  V       37     100101  l       53     110101  1
6      000110  G       22     010110  W       38     100110  m       54     110110  2
7      000111  H       23     010111  X       39     100111  n       55     110111  3
8      001000  I       24     011000  Y       40     101000  o       56     111000  4
9      001001  J       25     011001  Z       41     101001  p       57     111001  5
10     001010  K       26     011010  a       42     101010  q       58     111010  6
11     001011  L       27     011011  b       43     101011  r       59     111011  7
12     001100  M       28     011100  c       44     101100  s       60     111100  8
13     001101  N       29     011101  d       45     101101  t       61     111101  9
14     001110  O       30     011110  e       46     101110  u       62     111110  +
15     001111  P       31     011111  f       47     101111  v       63     111111  /
Padding        =

Examples

The example below uses ASCII text for simplicity, but this is not a typical use case, as it can already be safely transferred across all systems that can handle Base64. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.

Here is a well-known idiom from distributed computing:

Many hands make light work.

When the quote (without trailing whitespace) is encoded into Base64, it is represented as a byte sequence of 8-bit-padded ASCII characters encoded in MIME's Base64 scheme as follows (newlines and white spaces may be present anywhere but are to be ignored on decoding):

TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu

In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the byte values 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.

As this example illustrates, Base64 encoding converts three octets into four encoded characters.

Source          Text (ASCII)  M           a           n
                Octets        77 (0x4d)   97 (0x61)   110 (0x6e)
Bits                          010011      010110      000101      101110
Base64 encoded  Sextets       19          22          5           46
                Character     T           W           F           u
                Octets        84 (0x54)   87 (0x57)   70 (0x46)   117 (0x75)
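The three-octets-to-four-characters step can be reproduced by hand. The sketch below packs three bytes into a 24-bit integer and reads it back six bits at a time, using the RFC 4648 alphabet from the table above; the function name is made up for this illustration, and padding for shorter inputs is deliberately left out.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(three_bytes):
    """Encode exactly three octets into four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    buffer = int.from_bytes(three_bytes, "big")              # 24-bit buffer
    sextets = [(buffer >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return sextets, "".join(ALPHABET[s] for s in sextets)


print(encode_group(b"Man"))   # ([19, 22, 5, 46], 'TWFu')
```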

= padding characters might be added to make the last encoded block contain four Base64 characters.

Hexadecimal to octal transformation is useful to convert between binary and Base64. Such conversion is available for both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. The octal representation is 23260556. Those 8 octal digits can be split into pairs (23 26 05 56), and each pair is converted to decimal to yield 19 22 05 46. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are TWFu.

If there are only two significant input octets (e.g., ‘Ma’), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding = padding character):

Source          Text (ASCII)  M           a
                Octets        77 (0x4d)   97 (0x61)
Bits                          010011      010110      000100
Base64 encoded  Sextets       19          22          4           Padding
                Character     T           W           E           =
                Octets        84 (0x54)   87 (0x57)   69 (0x45)   61 (0x3D)

If there is only one significant input octet (e.g., ‘M’), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four least significant bits of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding two = padding characters):

Source          Text (ASCII)  M
                Octets        77 (0x4d)
Bits                          010011      010000
Base64 encoded  Sextets       19          16          Padding     Padding
                Character     T           Q           =           =
                Octets        84 (0x54)   81 (0x51)   61 (0x3D)   61 (0x3D)

Output padding

Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = 4 × 6 = 24 bits) represents three octets of unencoded text or data (3 octets = 3 × 8 = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is =, which indicates that no further bits are needed to fully encode the input. (This is different from A, which means that the remaining bits are all zeros.) The example below illustrates how truncating the input of the above quote changes the output padding:

Input text    Input length  Output text        Padding characters
light work.   11            bGlnaHQgd29yay4=   1
light work    10            bGlnaHQgd29yaw==   2
light wor     9             bGlnaHQgd29y       0
light wo      8             bGlnaHQgd28=       1
light w       7             bGlnaHQgdw==       2

The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated.
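The effect of input length on padding can be checked with Python's standard base64 module. This is just a convenience for reproducing the truncation experiment described above.

```python
import base64

quote = b"light work."
for length in range(len(quote), 6, -1):          # 11, 10, 9, 8, 7
    chunk = quote[:length]
    encoded = base64.b64encode(chunk).decode("ascii")
    # Columns: input text, input length, encoded text, number of '=' characters
    print(chunk.decode("ascii"), length, encoded, encoded.count("="))
```

The number of "=" characters cycles with the input length modulo 3: none for a multiple of three, two when one byte is left over, one when two are.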

Decoding Base64 with padding

When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single = indicates that the four characters will decode to only two bytes, while == indicates that the four characters will decode to only a single byte. For example:

Encoded        Padding  Length  Decoded
bGlnaHQgdw==   ==       1       light w
bGlnaHQgd28=   =        2       light wo
bGlnaHQgd29y   None     3       light wor

Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a = is encountered. For example, when `bGlnaHQgdw==` is decoded, we convert each character (except the trailing occurrences of =) into their corresponding 6-bit representation, and then discard 2 trailing bits for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d, and another 6 bits from the w for a bit string of length 12, but since we remove 2 bits for each = (for a total of 4 bits), the dw== ends up producing 8 bits (1 byte) when decoded.

Decoding Base64 without padding

Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: The first character contributes 6 bits, and the second character contributes its first 2 bits. For example:

Length (last encoded group)  Encoded        Length (last decoded group)  Decoded
2                            bGlnaHQgdw     1                            light w
3                            bGlnaHQgd28    2                            light wo
4                            bGlnaHQgd29y   3                            light wor
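Since the missing byte count follows from the encoded length, an unpadded string can be decoded by restoring the "=" characters first. The sketch below does this on top of Python's base64 module; the helper name is hypothetical, and it rejects the impossible case of a single leftover character.

```python
import base64


def decode_unpadded(text):
    """Decode Base64 text whose trailing '=' padding was stripped."""
    if len(text) % 4 == 1:
        raise ValueError("a single leftover Base64 character cannot encode a byte")
    padding = "=" * (-len(text) % 4)     # 0, 1 or 2 characters
    return base64.b64decode(text + padding)


for encoded in ("bGlnaHQgdw", "bGlnaHQgd28", "bGlnaHQgd29y"):
    print(encoded, decode_unpadded(encoded))
```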

Implementations and history

Variants summary table

Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding (which may be mandatory in some protocols or removed in others). The table below summarizes these known variants and provides links to the subsections below.

Encoding                                                      62nd  63rd  Pad
RFC 1421: Base64 for Privacy-Enhanced Mail (deprecated)       +     /     = mandatory
RFC 2045: Base64 transfer encoding for MIME                   +     /     = mandatory
RFC 2152: Base64 for UTF-7                                    +     /     No
RFC 3501: Base64 encoding for IMAP mailbox names              +     ,     No
RFC 4648 §4: base64 (standard)[a]                             +     /     = optional
RFC 4648 §5: base64url (URL- and filename-safe standard)[a]   -     _     = optional
RFC 4880: Radix-64 for OpenPGP                                +     /     = mandatory
Other variations: see Applications not compatible with RFC 4648 Base64 (below)

  [a] It is important to note that this variant is intended to provide common features where they are not desired to be specialized by implementations, ensuring robust engineering. This is particularly in light of separate line encodings and restrictions, which have not been considered when previous standards have been co-opted for use elsewhere. Thus, the features indicated here may be overridden.

Privacy-enhanced mail

The first known standardized use of the encoding now called MIME Base64 was in the Privacy-enhanced Electronic Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a “printable encoding” scheme that uses Base64 encoding to transform an arbitrary sequence of octets to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.[7]

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A–Z, a–z), the numerals (0–9), and the + and / symbols. The = symbol is also used as a padding suffix.[4] The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: “ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/“, and the indicated character is output.

The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two octets of the 24-bit buffer are padded-zeros, two = characters are appended to the output; if one octet of the 24-bit buffer is filled with padded-zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.

MIME

The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).[8] MIME’s Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the = symbol for output padding in the same way, as described at RFC 2045.

MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (For example CRLF sequences), must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.

Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4⁄3×78⁄76), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:

bytes = (string_length(encoded_string) − 814) / 1.37
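The 137% figure and the two size formulas can be checked with a little arithmetic. The sketch below only restates the approximations given above; the function names are hypothetical, and the 814-byte header allowance is the rough figure from the text, not a constant of the format.

```python
def mime_base64_size(raw_bytes, header_bytes=814):
    """Approximate size of MIME Base64 output for raw_bytes of input."""
    return raw_bytes * 1.37 + header_bytes


def decoded_size(encoded_length, header_bytes=814):
    """Inverse approximation: original size from the encoded length."""
    return (encoded_length - header_bytes) / 1.37


# 4/3 from the encoding itself, 78/76 from a CR/LF pair after every 76 characters
print(round(4 / 3 * 78 / 76, 3))   # 1.368
print(mime_base64_size(1_000_000))
```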

UTF-7

UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.[9][10]

The “Modified Base64” alphabet consists of the MIME Base64 alphabet, but does not use the “=“ padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the “=“ character is reserved in that context as the escape character for “quoted-printable” encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits leaving up to three unused bits in the last Base64 digit.

OpenPGP

OpenPGP, described in RFC 4880, describes Radix-64 encoding, also known as “ASCII armor“. Radix-64 is identical to the “Base64” encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the “=“ symbol as the separator, appended to the encoded output data.[11]

RFC 3548

RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings.

Unless implementations are written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.[6]

RFC 4648

This RFC obsoletes RFC 3548 and focuses on Base64/32/16:

This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.

URL applications

Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.

Using standard Base64 in a URL requires encoding of the '+', '/' and '=' characters into special percent-encoded hexadecimal sequences ('+' becomes '%2B', '/' becomes '%2F' and '=' becomes '%3D'), which makes the string unnecessarily longer.

For this reason, modified Base64 for URL variants exist (such as base64url in RFC 4648), where the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary and has no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. A popular site to make use of such is YouTube.[12] Some variants allow or require omitting the padding '=' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode '=' to '.', potentially exposing applications to relative path attacks when a folder name is encoded from user data.
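The difference between the standard and the URL-safe alphabet shows up only for inputs that produce the values 62 and 63. The sketch below uses three bytes chosen for this illustration so that every sextet is 62 or 63, and compares the two encoders from Python's base64 module with percent-encoding of the standard form.

```python
import base64
from urllib.parse import quote

data = bytes([0xFB, 0xFF, 0xFE])         # sextets 62, 63, 63, 62
standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)                   # +//+
print(quote(standard, safe=""))   # %2B%2F%2F%2B
print(url_safe)                   # -__-
```

The percent-encoded form is three times as long for these characters, while the base64url form keeps the original length.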

HTML

The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification,[13] provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.

Other applications

Example of an SVG containing embedded JPEG images encoded in Base64[14]

Base64 can be used in a variety of contexts:

  • Base64 can be used to transmit and store text that might otherwise cause delimiter collision
  • Base64 is used to encode character strings in LDAP Data Interchange Format files
  • Base64 is often used to embed binary data in an XML file, using a syntax similar to <data encoding="base64">…</data>, e.g. favicons in Firefox's exported bookmarks.html.
  • Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
  • The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
  • Although not part of the official specification for SVG, some viewers can interpret Base64 when used for embedded elements, such as images inside SVG.[15]
  • Base64 can be used to store/transmit relatively small amounts of binary data via a computer’s text clipboard functionality, especially in cases where the information doesn’t warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of cryptocurrency recipients as Base64 encoded text strings, which can be easily copied and pasted into users’ wallet software.
  • Binary data that must be quickly verified by humans as a safety mechanism, such as file checksums or key fingerprints, is often represented in Base64 for easy checking, sometimes with additional formattings, such as separating each group of four characters in the representation of a PGP key fingerprint with a space.
  • QR codes which contain binary data will sometimes store it encoded in Base64 rather than simply storing the raw binary data, as there is a stronger guarantee that all QR code readers will accurately decode text, as well as the fact that some devices will more readily save text from a QR code than potentially malicious binary data.

Applications not compatible with RFC 4648 Base64

Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see Variants summary table above).

  • The Uuencoding alphabet includes no lowercase characters, instead using ASCII codes 32 (“ “ (space)) through 95 (“_“), consecutively. Uuencoding uses the alphabet “ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_“. Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.
  • BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like ‘7‘, ‘O‘, ‘g‘ and ‘o‘. Its alphabet includes additional punctuation characters. It uses the alphabet “!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr“.
  • A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8#Self-synchronization.
  • Several other applications use alphabets similar to the common variations, but in a different order:
    • Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64. crypt’s alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet “./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz“. Padding is not used.
    • The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is “./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz“.[16]
    • bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt’s alphabet is in a different order than crypt’s. bcrypt uses the alphabet “./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789“.[17]
    • Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet “+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz“.
    • 6PACK, used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.[18]
    • Bash supports numeric literals in Base64. Bash uses the alphabet “0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_“.[19]

One issue with the RFC 4648 alphabet is that, when a sorted list of ASCII-encoded strings is Base64-transformed and sorted again, the order of elements changes. This is because the padding character and the characters in the substitution alphabet are not ordered by ASCII character value (which can be seen in the following sample table). Alphabets like (unpadded) B64 address this.

ASCII       Base64         Base64, no padding   B64
light w     bGlnaHQgdw==   bGlnaHQgdw           P4ZbO5EURk
light wo    bGlnaHQgd28=   bGlnaHQgd28          P4ZbO5EURqw
light wor   bGlnaHQgd29y   bGlnaHQgd29y         P4ZbO5EURqxm
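The ordering problem can be reproduced with the three strings from the table. The sketch below sorts the original strings by their standard (padded) Base64 form; it covers only the RFC 4648 column, since the B64 variant is not part of Python's standard library.

```python
import base64

words = ["light w", "light wo", "light wor"]          # already in ASCII order
encoded = {w: base64.b64encode(w.encode("ascii")).decode("ascii") for w in words}

resorted = sorted(words, key=lambda w: encoded[w])
print(resorted)   # ['light wo', 'light wor', 'light w']
```

"light w" moves to the end because its encoding continues with "w", which sorts after the digit "2" that begins the next group of the other two.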

References

  1. "Base64 encoding and decoding – Web APIs". MDN Web Docs.
  2. "When to base64 encode images (and when not to)". 28 August 2011.
  3. The Base16, Base32, and Base64 Data Encodings. IETF. October 2006. doi:10.17487/RFC4648. RFC 4648. Retrieved March 18, 2010.
  4. Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. IETF. February 1993. doi:10.17487/RFC1421. RFC 1421. Retrieved March 18, 2010.
  5. Multipurpose Internet Mail Extensions: (MIME) Part One: Format of Internet Message Bodies. IETF. November 1996. doi:10.17487/RFC2045. RFC 2045. Retrieved March 18, 2010.
  6. The Base16, Base32, and Base64 Data Encodings. IETF. July 2003. doi:10.17487/RFC3548. RFC 3548. Retrieved March 18, 2010.
  7. Privacy Enhancement for Internet Electronic Mail. IETF. February 1987. doi:10.17487/RFC0989. RFC 989. Retrieved March 18, 2010.
  8. RFC 2045 (see reference 5).
  9. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. July 1994. doi:10.17487/RFC1642. RFC 1642. Retrieved March 18, 2010.
  10. UTF-7 A Mail-Safe Transformation Format of Unicode. IETF. May 1997. doi:10.17487/RFC2152. RFC 2152. Retrieved March 18, 2010.
  11. OpenPGP Message Format. IETF. November 2007. doi:10.17487/RFC4880. RFC 4880. Retrieved March 18, 2010.
  12. "Here's Why YouTube Will Practically Never Run Out of Unique Video IDs". www.mentalfloss.com. 23 March 2016. Retrieved 27 December 2021.
  13. "7.3. Base64 utility methods". HTML 5.2 Editor's Draft. World Wide Web Consortium. Retrieved 2 January 2018. Introduced by changeset 5814, 2021-02-01.
  14. <image xlink:href="data:image/jpeg;base64,JPEG contents encoded in Base64" … />
  15. "Edit fiddle". jsfiddle.net.
  16. "The GEDCOM Standard Release 5.5". Homepages.rootsweb.ancestry.com. Retrieved 2012-06-21.
  17. Provos, Niels (1997-02-13). "src/lib/libc/crypt/bcrypt.c r1.1". Retrieved 2018-05-18.
  18. "6PACK a "real time" PC to TNC protocol". Retrieved 2013-05-19.
  19. "Shell Arithmetic". Bash Reference Manual. Retrieved 8 April 2020. Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.