io_fs_03
EXT2/3/4的layout
Super block, stoing:
size and location of bitmaps
number and location of inodes
number and location of data blocks
index of root inodes
Bitmap of indoes
Bitmap of data blocks
Inodes table
Data blocks(4kB each)
EXT2/3/4是以group进行分组的,
superblock记录文件系统的类型、block大小、block总数、free block数、inode大小、inode总数、free inode数、group的总数等,在多个group进行备份
group描述符记录:block bitmap位置、inode bitmap位置、inode表位置、free block、free inode数量
文件系统的一致性
对一个文件增加4k大小,需要对indoes Bitmap、data blocks Bitmap 、Inodes table、Data blocks进行修改。如果只修改了其中几项,突然断电了,文件系统就不一致。
例子:
## 创建一个4M的镜像文件
$ dd if=/dev/zero of=Image bs=1024 count=4096
## 将镜像文件格式化成ext4格式
$ mkfs.ext4 -b 4096 Image
## 查看inodes bitmap位于第x块
$ dumpe2fs Image
Inode 位图位于 18 (+18)
## 查看镜像文件对应的inodes bitmap数据
$ dd if=Image bs=4096 skip=18 | hexdump -C -n 32
00000000 ff 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020
## inodes bitmap数据为[ff 07],即前11个inode都被使用了
## 如果在此文件系统创建新文件,分配的inode number就是12,如下
$ sudo mount -o loop Image test/
$ cd test/
$ sudo touch yes
$ ls -li yes
12 -rw-r--r-- 1 root root 0 11月 13 19:13 yes
$ dd if=Image bs=4096 skip=18 | hexdump -C -n 32
00000000 ff 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020
## 下面制造一个"断电"事故,导致inodes bitmap没有正常修改
## 将ff 0f修改为ff 07
$ hexedit Image
$ dd if=Image bs=4096 skip=18 | hexdump -C -n 32
00000000 ff 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020
$ sudo mount -o loop Image test/
$ cd test/
$ ls -li
11 drwx------ 2 root root 16384 11月 13 19:07 lost+found
12 -rw-r--r-- 1 root root 0 11月 13 19:13 yes
## 因为此文件系统不一致性,导致fail
$ sudo touch no
touch: 无法创建 'no': 设备上没有空间
$ dmesg | tail
[ 604.710098] EXT4-fs (loop1): mounted filesystem without journal. Opts: (null)
[ 604.710113] ext4 filesystem being mounted at /home/vernon/workplaces/test/io-courses/test supports timestamps until 2038 (0x7fffffff)
[ 907.625269] EXT4-fs (loop1): mounted filesystem without journal. Opts: (null)
[ 907.625287] ext4 filesystem being mounted at /home/vernon/workplaces/test/io-courses/test supports timestamps until 2038 (0x7fffffff)
[ 1568.654614] EXT4-fs (loop1): mounted filesystem without journal. Opts: (null)
[ 1568.654630] ext4 filesystem being mounted at /home/vernon/workplaces/test/io-courses/test supports timestamps until 2038 (0x7fffffff)
[ 1582.104416] EXT4-fs error (device loop1): ext4_validate_inode_bitmap:99: comm touch: Corrupt inode bitmap - block_group = 0, inode_bitmap = 18
$ ls -li
11 drwx------ 2 root root 16384 11月 13 19:07 lost+found
12 -rw-r--r-- 1 root root 0 11月 13 19:13 yes
掉电与文件系统一致性
任何的软件技术都无法保证掉电不丟数据,只能保证一致性(元数据+数据的一致性 或者 仅元数据的一致性)
dirty_expire_centisecs、DIRECT_IO、SYNC IO的调整,不影响丟/不丟数据,只影响丟多少数据
fsck、日志、COW文件系统等技术,帮忙提供一致性
fsck
fsck, 全称: file system consistency check
针对早期文件系统,系统重新启动中,使用fsck提供一致性,修复前面的"断电"事故
例子:
$ dd if=Image bs=4096 skip=18 | hexdump -C -n 32
00000000 ff 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020
$ fsck.ext4 -f Image
e2fsck 1.45.5 (07-Jan-2020)
第 1 步:检查inode、块和大小
第 2 步:检查目录结构
第 3 步:检查目录连接性
第 4 步:检查引用计数
第 5 步:检查组概要信息
Inode位图的差异: +12
处理<y>? 是
Image:***** 文件系统已修改 *****
Image:12/1024 文件(0.0% 为非连续的), 42/1024 块
$ dd if=Image bs=4096 skip=18 | hexdump -C -n 32
00000000 ff 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020
日志
针对ext2/ext3/ext4文件系统,对文件系统的修改进行日志管理,提供一致性,从而修复突然断电事故
第一种方法:保存元数据+数据日志(data=journal) - 4个阶段:
jonrnal write
journal commit
jonrnal checkpoint
jonrnal free
第二种方法:只保存元数据日志(data=writeback or data=ordered) - 5个阶段:
data write
journal metadata write
journal commit
jonrnal checkpoint metadata
jonrnal free
Copy On Write
没有日志,用COW实现文件系统一致性
如:针对btrfs文件系统,每次写磁盘时,先将更新数据写入一个新的block,当新数据写入成功之后,再更新相关的数据结构指向新block
文件系统的debug和dump
常用工具:mkfs、dumpe2f、blkcat、dd、debugfs、blktrace
## block_number = 11591720
## inode_number = 2097400
$ debugfs -R 'stat /home/vernon/test.txt' /dev/sda1
Inode: 2097400 Type: regular Mode: 0644 Flags: 0x80000
Generation: 477007534 Version: 0x00000000:00000001
User: 1000 Group: 1000 Size: 14
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5fae92a4:360c2a44 -- Fri Nov 13 22:05:24 2020
atime: 0x5fae92a4:360c2a44 -- Fri Nov 13 22:05:24 2020
mtime: 0x5fae92a4:360c2a44 -- Fri Nov 13 22:05:24 2020
crtime: 0x5fae92a4:360c2a44 -- Fri Nov 13 22:05:24 2020
Size of extra inode fields: 28
EXTENTS:
(0): 11591720
## 打印/home/vernon/test.txt内容
## 第一种方法:通过/dev/sda1
$ blkcat /dev/sda1 11591720
or
$ dd if=/dev/sda1 of=out.txt skip=$((11591720*8)) bs=512c count=1
## 第二种方法:通过/dev/sda
$ fdisk /dev/sda ## 可知,/dev/sda1 offset = 2048
$ dd if=/dev/sda of=out.txt skip=$((11591720*8+2048)) bs=512c count=1
## block number look for inode number
$ debugfs -R 'icheck 11591720' /dev/sda1
debugfs 1.41.11 (14-Mar-2010)
Block Inode number
11591720 2097400
## inode number look for file name
$ debugfs -R 'ncheck 2097400' /dev/sda1
debugfs 1.41.11 (14-Mar-2010)
Inode Pathname
2097400 /home/vernon/test.txt
Last updated
Was this helpful?