mmc issue? #17

Closed
opened 2018-07-26 01:02:47 +02:00 by SolidEva · 39 comments
SolidEva commented 2018-07-26 01:02:47 +02:00 (Migrated from github.com)

Picked up another c201, which has a different emmc. The usual installation doesn't work on it, even with the emmc size corrected.

chromeos-3.14 config:

CONFIG_MMC=y
# CONFIG_MMC_DEBUG is not set
CONFIG_MMC_UNSAFE_RESUME=y
# CONFIG_MMC_CLKGATE is not set
# CONFIG_MMC_EMBEDDED_SDIO is not set
# CONFIG_MMC_PARANOID_SD_INIT is not set

#
# MMC/SD/SDIO Card Drivers
#
CONFIG_MMC_BLOCK=y
CONFIG_MMC_BLOCK_MINORS=16
CONFIG_MMC_BLOCK_BOUNCE=y
# CONFIG_SDIO_UART is not set
CONFIG_MMC_TEST=m
# CONFIG_MMC_FFU is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_ARMMMCI is not set
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PLTFM=y
# CONFIG_MMC_SDHCI_OF_ARASAN is not set
# CONFIG_MMC_SDHCI_PXAV3 is not set
# CONFIG_MMC_SDHCI_PXAV2 is not set
CONFIG_MMC_DW=y
CONFIG_MMC_DW_IDMAC=y
CONFIG_MMC_DW_PLTFM=y
# CONFIG_MMC_DW_EXYNOS is not set
# CONFIG_MMC_DW_K3 is not set
CONFIG_MMC_DW_ROCKCHIP=y
# CONFIG_MMC_VUB300 is not set
# CONFIG_MMC_USHC is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

4.9 config:


#
# USB Physical Layer drivers
#
# CONFIG_USB_PHY is not set
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_USB_GPIO_VBUS is not set
# CONFIG_USB_ISP1301 is not set
CONFIG_USB_ULPI=y
CONFIG_USB_ULPI_VIEWPORT=y
# CONFIG_USB_GADGET is not set
# CONFIG_USB_LED_TRIG is not set
# CONFIG_USB_ULPI_BUS is not set
# CONFIG_UWB is not set
CONFIG_MMC=y
# CONFIG_MMC_DEBUG is not set
CONFIG_PWRSEQ_EMMC=y
CONFIG_PWRSEQ_SIMPLE=y

#
# MMC/SD/SDIO Card Drivers
#
CONFIG_MMC_BLOCK=y
CONFIG_MMC_BLOCK_MINORS=256
CONFIG_MMC_BLOCK_BOUNCE=y
# CONFIG_SDIO_UART is not set
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
# CONFIG_MMC_ARMMMCI is not set
# CONFIG_MMC_SDHCI is not set
CONFIG_MMC_DW=y
CONFIG_MMC_DW_PLTFM=y
# CONFIG_MMC_DW_EXYNOS is not set
# CONFIG_MMC_DW_K3 is not set
CONFIG_MMC_DW_ROCKCHIP=y
# CONFIG_MMC_VUB300 is not set
# CONFIG_MMC_USHC is not set
# CONFIG_MMC_USDHI6ROL0 is not set
# CONFIG_MMC_MTK is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
# CONFIG_LEDS_CLASS_FLASH is not set

At a minimum, CONFIG_MMC_BLOCK_MINORS and CONFIG_MMC_DW_IDMAC differ between the two configs.
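Rather than eyeballing the two excerpts, the comparison can be sketched roughly as below. `parse_kconfig` and `diff_kconfig` are made-up helper names for illustration, not part of any kernel tooling, and the sketch only understands the `CONFIG_X=value` and `# CONFIG_X is not set` line forms shown above.

```python
import re


def parse_kconfig(text):
    """Map each CONFIG_ symbol to its value; '# CONFIG_X is not set' becomes 'n'."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            symbols[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            symbols[m.group(1)] = "n"
    return symbols


def diff_kconfig(old_text, new_text, prefix="CONFIG_MMC"):
    """Return {symbol: (old_value, new_value)} for symbols that differ.

    A symbol that is absent on one side is reported as None on that side,
    which is how an option removed from the kernel shows up.
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    changed = {}
    for sym in sorted(set(old) | set(new)):
        if sym.startswith(prefix) and old.get(sym) != new.get(sym):
            changed[sym] = (old.get(sym), new.get(sym))
    return changed
```

Feeding it the full 3.14 and 4.9 config files instead of the excerpts would also catch options that were dropped outside the MMC section; pass a different `prefix` for that.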

SolidEva commented 2018-07-28 22:36:38 +02:00 (Migrated from github.com)

built chromeos recovery image with bin found at:

ASUS Chromebook C201PA
version=10575.58.0
desc=
channel=stable-channel
hwidmatch=^SPEEDY .*
hwid=
md5=027d004fbdeea5362dca91ab311b3d1f
sha1=5603eed597b389d298763dac889b99a8c62d1e43
zipfilesize=1054069801
file=chromeos_10575.58.0_veyron-speedy_recovery_stable-channel_speedy-mp-v3.bin
filesize=2271232512
url=https://dl.google.com/dl/edgedl/chromeos/recovery/chromeos_10575.58.0_veyron-speedy_recovery_stable-channel_speedy-mp-v3.bin.zip

This was found in the linux recovery script.
The stock chromeos with kernel 3.14 can read the mmc correctly and creates /dev/ files for each partition, unlike the newer kernels (>=4.9).
Testing with debian and the 3.14 kernel.
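Before building from the downloaded image it is probably worth checking it against the md5/sha1 fields listed above. A minimal sketch follows; the helper names are made up, and whether those fields cover the .zip or the unpacked .bin is an assumption to confirm against the recovery script.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) of a file, read in chunks to keep memory flat."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_recovery_entry(path, expected_md5, expected_sha1):
    """True only if both digests match the values from the recovery entry."""
    md5, sha1 = file_digests(path)
    return md5 == expected_md5.lower() and sha1 == expected_sha1.lower()
```

Usage would be something like `matches_recovery_entry(downloaded_file, "027d004fbdeea5362dca91ab311b3d1f", "5603eed597b389d298763dac889b99a8c62d1e43")`, where `downloaded_file` is whichever file the digests turn out to describe.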

SolidEva commented 2018-07-28 22:37:58 +02:00 (Migrated from github.com)

arch linux has some good cgpt docs when combined with the man page https://wiki.archlinux.org/index.php/Talk:Chrome_OS_devices

SolidEva commented 2018-07-28 22:40:12 +02:00 (Migrated from github.com)

The scripts here may be a good example of how the mmc has to be partitioned correctly. Otherwise, the main script of the chromeos recovery binary would be a great reference, but I have yet to extract it.
https://github.com/altreact/archbk/issues/3

SolidEva commented 2018-07-28 22:40:38 +02:00 (Migrated from github.com)
linux chromeos recovery guide: https://github.com/raphael/linux-samus/wiki/How-do-I-restore-ChromeOS-after-it-has-been-removed%3F
SolidEva commented 2018-07-29 00:05:07 +02:00 (Migrated from github.com)
#!/bin/bash
# Recreate a chromeos-style gpt on the target disk passed as $1.

cgpt create "$1"
cgpt add -i 1 -t data -b 30781406 -s 32 -l "STATE" "$1"
cgpt add -i 2 -t kernel -b 20480 -s 32768 -l "KERN-A" -S 1 -T 5 -P 10 "$1"
cgpt add -i 3 -t rootfs -b 315424 -s 30465982 -l "ROOT-A" "$1"
# KERN-B starts right after KERN-A: 20480 + 32768 = 53248
cgpt add -i 4 -t kernel -b 53248 -s 32768 -l "KERN-B" -S 0 -T 0 -P 1 "$1"
cgpt add -i 5 -t rootfs -b 315392 -s 32 -l "ROOT-B" "$1"
#Are these needed??
cgpt add -i 6 -t kernel -b 16448 -s 1 -S 0 -T 0 -P 0 "$1"
cgpt add -i 7 -t rootfs -b 16449 -s 1 "$1"
cgpt add -i 8 -t data -b 86016 -s 32768 "$1"
cgpt add -i 9 -t reserved -b 16450 -s 1 "$1"
cgpt add -i 10 -t reserved -b 16451 -s 1 "$1"
cgpt add -i 11 -t data -b 64 -s 16384 "$1"
#Probably needed?
cgpt add -i 12 -t efi -b 249856 -s 65536 "$1"
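Since every start sector and size in the script is typed by hand, a quick overlap check on the `-b`/`-s` pairs before running it seems worthwhile. A rough sketch; `find_overlaps` is a made-up helper, and the partition 4 values in the usage lines are hypothetical inputs chosen to show both outcomes.

```python
def find_overlaps(partitions):
    """partitions: iterable of (index, begin, size) tuples, in sectors.

    Returns a list of (index_a, index_b) pairs whose sector ranges intersect.
    """
    ordered = sorted(partitions, key=lambda p: p[1])
    overlaps = []
    for i, (idx_a, begin_a, size_a) in enumerate(ordered):
        end_a = begin_a + size_a  # first sector after partition a
        for idx_b, begin_b, _size_b in ordered[i + 1:]:
            if begin_b >= end_a:
                break  # sorted by begin, so nothing later can overlap a
            overlaps.append((idx_a, idx_b))
    return overlaps


# Partitions 2, 8 and 12 use the -b/-s values from the script above.
layout = [(2, 20480, 32768), (8, 86016, 32768), (12, 249856, 65536)]

# Hypothetical partition 4 placed directly after partition 2: no overlap.
print(find_overlaps(layout + [(4, 20480 + 32768, 32768)]))

# Hypothetical partition 4 shifted a few thousand sectors later: runs into 8.
print(find_overlaps(layout + [(4, 60000, 32768)]))
```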
SolidEva commented 2018-07-30 00:46:07 +02:00 (Migrated from github.com)

dd'ing the debian install with the 3.14 chromeos kernel directly from the flash drive didn't work, but adjusting the partition map like the scripts in https://github.com/altreact/archbk/issues/3 do, then dd'ing sda1 to KERN-A and sda2 to ROOT-A, worked! Just need two scripts for the two standard gpt tables. Will probably find similar logic in the recovery installer.

TODO: Test making own gpt table in same style as expected.

SolidEva commented 2018-08-10 15:41:23 +02:00 (Migrated from github.com)

In the newer kernels, 'CONFIG_MMC_UNSAFE_RESUME' is removed.
Even if this wasn't set, the kernel command line parameter 'removable' could be used

This seems to have been replaced by the device tree parameter non-removable, which is set in
https://github.com/torvalds/linux/blob/8efcf34a263965e471e3999904f94d1f6799d42a/arch/arm/boot/dts/rk3288-veyron.dtsi

This commit shows the usage of the 'nonremovable' property, and the switch to allowing it to be read from the device tree: https://github.com/torvalds/linux/commit/73a47a9bb3e2c4a9c553c72456e63ab991b1a4d9

TODO: This is worth looking into further if nothing else works. I have yet to prove the 'non-removable' parameter provides the same function as 'CONFIG_MMC_UNSAFE_RESUME'

SolidEva commented 2018-08-10 15:50:10 +02:00 (Migrated from github.com)

'CONFIG_MMC_BLOCK_BOUNCE' was removed by this patch https://patchwork.kernel.org/patch/9732989/
as it was found to be default on, and disabled by kernel logic later if unneeded

TODO: If issue persists, ensure block bounce is still functioning properly
https://cateee.net/lkddb/web-lkddb/MMC_BLOCK_BOUNCE.html implies it was last found in kernel 4.12

SolidEva commented 2018-08-10 15:57:16 +02:00 (Migrated from github.com)

disabled 'CONFIG_PHY_ROCKCHIP_EMMC=y'

SolidEva commented 2018-08-10 16:23:56 +02:00 (Migrated from github.com)

'CONFIG_MMC_DW_IDMAC=y' was removed between 4.3 and 4.4 by this commit: https://github.com/torvalds/linux/commit/3fc7eaef44dbcbcd602b6bcd0ac6efba7a30b108#diff-913858dae5b244b2e358c501f5af7c73

It was replaced with logic that determines whether internal or external DMA is to be used, in static void dw_mci_init_dma(struct dw_mci *host):

	/*
	* Check tansfer mode from HCON[17:16]
	* Clear the ambiguous description of dw_mmc databook:
	* 2b'00: No DMA Interface -> Actually means using Internal DMA block
	* 2b'01: DesignWare DMA Interface -> Synopsys DW-DMA block
	* 2b'10: Generic DMA Interface -> non-Synopsys generic DMA block
	* 2b'11: Non DW DMA Interface -> pio only
	* Compared to DesignWare DMA Interface, Generic DMA Interface has a
	* simpler request/acknowledge handshake mechanism and both of them
	* are regarded as external dma master for dw_mmc.
	*/
	host->use_dma = SDMMC_GET_TRANS_MODE(mci_readl(host, HCON));
	if (host->use_dma == DMA_INTERFACE_IDMA) {
		host->use_dma = TRANS_MODE_IDMAC;
	} else if (host->use_dma == DMA_INTERFACE_DWDMA ||
		   host->use_dma == DMA_INTERFACE_GDMA) {
		host->use_dma = TRANS_MODE_EDMAC;
	} else {
		goto no_dma;
	}

SDMMC_GET_TRANS_MODE(x) is defined as

/* HCON register defines */
#define DMA_INTERFACE_IDMA		(0x0)
#define DMA_INTERFACE_DWDMA		(0x1)
#define DMA_INTERFACE_GDMA		(0x2)
#define DMA_INTERFACE_NODMA		(0x3)
#define SDMMC_GET_TRANS_MODE(x)		(((x)>>16) & 0x3)

mci_readl is defined as the following in /drivers/mmc/host/dw_mmc.h

/* Register access macros */
#define mci_readl(dev, reg)			\
	readl_relaxed((dev)->regs + SDMMC_##reg)

TODO: determining which mode this logic selects is a decent avenue, as the bug would exist in the two versions I've tested that have the issue: 4.9 and 4.17
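To work out which branch the driver takes once a raw HCON value is in hand (for example from a debug print), the selection quoted above can be mirrored in a few lines. This is only a sketch of that logic, not driver code. The string return values are labels made up for illustration, and treating the `no_dma` path as PIO follows the "pio only" note in the quoted comment.

```python
# Values copied from the HCON register defines quoted above.
DMA_INTERFACE_IDMA = 0x0
DMA_INTERFACE_DWDMA = 0x1
DMA_INTERFACE_GDMA = 0x2
DMA_INTERFACE_NODMA = 0x3


def sdmmc_get_trans_mode(hcon):
    """Extract HCON[17:16], same as the SDMMC_GET_TRANS_MODE macro."""
    return (hcon >> 16) & 0x3


def select_transfer_mode(hcon):
    """Mirror the branch in dw_mci_init_dma: 'idmac', 'edmac' or 'pio'."""
    mode = sdmmc_get_trans_mode(hcon)
    if mode == DMA_INTERFACE_IDMA:
        return "idmac"
    if mode in (DMA_INTERFACE_DWDMA, DMA_INTERFACE_GDMA):
        return "edmac"
    return "pio"  # corresponds to the goto no_dma path
```

If the value read on the c201 decodes to anything other than "idmac", that would be a difference from the 3.14 config, which had CONFIG_MMC_DW_IDMAC=y.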

SolidEva commented 2018-08-10 18:49:48 +02:00 (Migrated from github.com)

disabled CONFIG_SCSI_SCAN_ASYNC for debugging
TODO: re-enable if not the culprit, speeds up everything

SolidEva commented 2018-08-10 18:58:43 +02:00 (Migrated from github.com)

Building two images, one with the newly modified 4.17 config and another with the arch arm config found here
https://raw.githubusercontent.com/archlinuxarm/PKGBUILDs/master/core/linux-veyron/config

SolidEva commented 2018-08-10 20:44:37 +02:00 (Migrated from github.com)

Built a third image with the broken-cd parameter in the emmc and sdio sections of the veyron device tree, essentially reverting the following commit just for rk3288-veyron.dtsi:

https://github.com/torvalds/linux/commit/57375d88fa3f6bf9351051529464c708f72adb1d#diff-65dbac344b467c94b73bb22b81e93a9a

SolidEva commented 2018-08-11 17:37:10 +02:00 (Migrated from github.com)

None of the three images described above could see the internal mmc.
Testing linux-libre 3.14 to see if support for this mmc was broken sometime in mainline or specifically added to chromeos-3.14

SolidEva commented 2018-08-11 19:14:33 +02:00 (Migrated from github.com)

3.14 mainline has no support for veyron speedy, trying 4.3.6 as 4.3 is the first release with the dtsi

SolidEva commented 2018-08-13 15:44:55 +02:00 (Migrated from github.com)

4.3.6 boots to a white screen.

Found the chromeos kernel always includes the reset function in pinctrl-0:

pinctrl-0 = <&emmc_clk &emmc_cmd &emmc_bus8 &emmc_deassert_reset>;

but mainline only does if mmc-pwrseq is supported

	mmc-pwrseq = <&emmc_pwrseq>;
	non-removable;
	pinctrl-names = "default";
	pinctrl-0 = <&emmc_clk &emmc_cmd &emmc_bus8>;


	emmc_pwrseq: emmc-pwrseq {
		compatible = "mmc-pwrseq-emmc";
		pinctrl-0 = <&emmc_reset>;
		pinctrl-names = "default";
		reset-gpios = <&gpio2 RK_PB1 GPIO_ACTIVE_HIGH>;
	};

TODO: test including emmc_reset in pinctrl-0 by default, also test the emmc pwrseq config option.

SolidEva commented 2018-08-13 16:05:13 +02:00 (Migrated from github.com)
TODO: Try the mmc test config https://stackoverflow.com/questions/40882479/is-there-any-user-space-tool-available-for-emmc-to-perform-diagnostic-test-and-p
SolidEva commented 2018-08-13 16:13:29 +02:00 (Migrated from github.com)

Testing https://github.com/dimkr/devsus/tree/hybrid which has mmc-pwrseq enabled in config
Actually testing https://github.com/SolidHal/devsus/tree/hybrid_dev, which builds debian instead of devuan. It is much more stable to build, as the devuan servers can stop responding for seemingly no reason.

Since https://github.com/SolidHal/devsus/tree/master is able to see and use the emmc with debian stretch and the chromeos-3.14 kernel, we can be sure the OS is not at fault here, so testing with debian stretch is a non-issue.

SolidEva commented 2018-08-13 19:14:17 +02:00 (Migrated from github.com)

Also testing multiple older commits that use older configs and older versions of 4.17:
Librean master commit fbac1b5fe2
Librean master commit 23e05d6af6

SolidEva commented 2018-08-14 16:05:51 +02:00 (Migrated from github.com)

Strong lead: CONFIG_MMC_BLOCK_MINORS is set to 256 in the 4.17 kernel configs.
The answer at this post explains why that is a problem: https://unix.stackexchange.com/questions/217640/dev-mmcblk0-partitions-limit.

Since 4.17 would map the emmc to mmcblk2, CONFIG_MMC_BLOCK_MINORS=256 would leave no room for anything besides mmcblk0 and its partitions.

In devsus 3.14 CONFIG_MMC_BLOCK_MINORS is set to 16. This is a good compromise, as each mmcblk device can have a ton of partitions in default chromeos.
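The arithmetic behind this lead, as I understand the linked answer, can be sketched as below. It assumes a fixed budget of 256 minors shared by all mmcblk devices; that constant and the helper names are assumptions for illustration, and newer kernels may well size this differently, so this only restates the reasoning and proves nothing about 4.17.

```python
TOTAL_MINORS = 256  # assumption: shared minor budget, per the linked answer


def max_mmc_devices(minors_per_device, total_minors=TOTAL_MINORS):
    """How many mmcblk devices fit when each one reserves minors_per_device."""
    if minors_per_device <= 0:
        raise ValueError("minors_per_device must be positive")
    return total_minors // minors_per_device


def partitions_per_device(minors_per_device):
    """One minor goes to the whole-disk node, the rest are for partitions."""
    return minors_per_device - 1


def can_register(device_index, minors_per_device):
    """True if mmcblk<device_index> fits inside the assumed minor budget."""
    return device_index < max_mmc_devices(minors_per_device)
```

Under that assumption `can_register(2, 256)` is false while `can_register(2, 16)` is true, and 16 minors still leave 15 partition nodes per device, enough for the 12 partitions in the cgpt script above.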

TODO: Build image and test from this repo https://github.com/SolidHal/Librean/tree/mintest-emmc-4.17
Edit: didn't work, update here: https://github.com/SolidHal/Librean/issues/17#issuecomment-413933394

Image name: Librean-4.17.2-mintest-CONFIG_MMC_BLOCK_MINORS-16.img

SolidEva commented 2018-08-14 16:07:41 +02:00 (Migrated from github.com)

For future debugging, mmc attributes are described here https://www.kernel.org/doc/Documentation/mmc/mmc-dev-attrs.txt

and can be read at least partially using https://www.kernel.org/doc/Documentation/mmc/mmc-tools.txt

SolidEva commented 2018-08-14 19:16:54 +02:00 (Migrated from github.com)
Include this patch as well if the previous TODO doesn't work out: https://github.com/Miouyouyou/RockMyy/blob/master/patches/kernel/v4.17/DTS/0007-RK3288-DTSI-rk3288-Add-missing-SPI2-pinctrl.patch
SolidEva commented 2018-08-17 19:22:35 +02:00 (Migrated from github.com)

Strong lead: CONFIG_MMC_BLOCK_MINORS is set to 256 in the 4.17 kernel configs.
The answer at this post explains why that is a problem: https://unix.stackexchange.com/questions/217640/dev-mmcblk0-partitions-limit.

Since 4.17 would map the emmc to mmcblk2, CONFIG_MMC_BLOCK_MINORS=256 would leave no room for anything besides mmcblk0 and its partitions.

In devsus 3.14 CONFIG_MMC_BLOCK_MINORS is set to 16. This is a good compromise, as each mmcblk device can have a ton of partitions in default chromeos.

TODO: Build image and test from this repo https://github.com/SolidHal/Librean/tree/mintest-emmc-4.17

This didn't work, but I did notice that fdisk -l recognizes all of the partitions even though they don't appear in /dev/.

The boot partitions and rpmb partitions of the mmcblk device do show up in /dev/, however.
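fdisk reads the partition table straight off the disk, so the more telling check is what the kernel itself has registered, which is what /proc/partitions lists. A rough sketch for summarising that follows; the helper names and the sample text in the docstring are illustrative, not captured from the c201.

```python
import re


def parse_proc_partitions(text):
    """Return the block device names from /proc/partitions-style text.

    Expected shape (hypothetical sample):
        major minor  #blocks  name
         179        0   15267840 mmcblk2
         179       32       4096 mmcblk2boot0
    """
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns and a numeric major; the header does not.
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def mmc_partition_summary(names):
    """Group mmcblk entries into regular partitions and boot/rpmb nodes."""
    summary = {}
    for name in names:
        m = re.match(r"^(mmcblk\d+)(p\d+|boot\d+|rpmb)?$", name)
        if not m:
            continue
        entry = summary.setdefault(m.group(1), {"partitions": [], "special": []})
        suffix = m.group(2)
        if suffix is None:
            continue  # the whole-disk node itself
        key = "partitions" if suffix.startswith("p") else "special"
        entry[key].append(name)
    return summary
```

On the affected device I would expect something like `mmc_partition_summary(parse_proc_partitions(open("/proc/partitions").read()))` to show an empty `partitions` list next to populated `special` entries, matching what shows up in /dev/.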

SolidEva commented 2018-08-17 19:26:13 +02:00 (Migrated from github.com)

TODO: Try running partprobe ?

This guy had some issues with an sdcard that are similar:
https://unix.stackexchange.com/questions/198082/my-sd-card-has-a-partition-but-linux-doesnt-create-a-device-entry-for-the-parti

It looks like the posted dmesg logs have some errors that may be of interest

[3783981.471166] Buffer I/O error on device sdd, logical block 0
[3783981.482542] Dev sdd: unable to read RDB block 0
[3783981.482548]  sdd: unable to read partition table

Further down, one answer suggests:

when you call partprobe later, the kernel is able to read the partition table just fine. It looks like there is either a hardware error or a driver bug that causes the initial read to fail. It could be that the SD card or reader firmware needs some time to finish initializing and the driver attempts to read too early.

As a workaround they suggest creating a udev rule that runs partprobe

TODO: see if these errors are present on the device as well
They aren't, see here: https://github.com/SolidHal/Librean/issues/17#issuecomment-414337527

SolidEva commented 2018-08-17 19:52:08 +02:00 (Migrated from github.com)

It could be the chromeos 3.14 kernel has some customizations around reading the partition table that mainline never got

SolidEva commented 2018-08-20 06:06:22 +02:00 (Migrated from github.com)

partprobe is not a standard part of the debian install, need the parted package

SolidEva commented 2018-08-20 16:10:39 +02:00 (Migrated from github.com)

Built image with CONFIG_MMC_DEBUG enabled, as well as with the parted package for the partprobe command.
Additionally included this patch https://github.com/Miouyouyou/RockMyy/blob/master/patches/kernel/v4.17/DTS/0007-RK3288-DTSI-rk3288-Add-missing-SPI2-pinctrl.patch for kicks.

Follow the procedure referred to here for the mmc debugging process: https://github.com/SolidHal/Librean/issues/17#issuecomment-412529789

TODO: Try the mmc test config https://stackoverflow.com/questions/40882479/is-there-any-user-space-tool-available-for-emmc-to-perform-diagnostic-test-and-p

SolidEva commented 2018-08-20 16:33:52 +02:00 (Migrated from github.com)

This guy had some issues with an sdcard that are similar:
https://unix.stackexchange.com/questions/198082/my-sd-card-has-a-partition-but-linux-doesnt-create-a-device-entry-for-the-parti

It looks like the posted dmesg logs have some errors that may be of interest

[3783981.471166] Buffer I/O error on device sdd, logical block 0
[3783981.482542] Dev sdd: unable to read RDB block 0
[3783981.482548] sdd: unable to read partition table

Further down, one answer suggests:

when you call partprobe later, the kernel is able to read the partition table just fine. It looks like there is either a hardware error or a driver bug that causes the initial read to fail. It could be that the SD card or reader firmware needs some time to finish initializing and the driver attempts to read too early.

As a workaround they suggest creating a udev rule that runs partprobe

TODO: see if these errors are present on the device as well

These errors are NOT present, as the mmc driver is used not the sd (scsi device) driver.

Got logs from inserting and removing an sd card, which should have a similar init process in the mmc driver as the emmc.

mmc_host mmc0: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
mmc_host mmc0: Bus speed (slot 0) = 50000000Hz (slot req 50000000Hz, actual 50000000HZ div = 0)
mmc0: new high speed SDHC card at address 59b4
mmcblk0: mmc0:59b4 ND4GB 3.73 GiB
 mmcblk0: p1
.
.
.
mmc0: card 59b4 removed

Dissecting:

mmc0: new high speed SDHC card at address 59b4

is printed by https://github.com/torvalds/linux/blob/0a4b6e2f80aad46fb55a5cf7b1664c0aef030ee0/drivers/mmc/core/bus.c

I believe

mmcblk0: mmc0:59b4 ND4GB 3.73 GiB
 mmcblk0: p1

is printed by block.c, which is located at https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/chromeos-3.14/drivers/mmc/card/block.c for chrome os, and https://github.com/torvalds/linux/blob/master/drivers/mmc/core/block.c for mainline

And

mmc0: card 59b4 removed

is printed by https://elixir.bootlin.com/linux/latest/source/drivers/mmc/core/bus.c#L371

block.c diverges from mainline at commit f662ae48ae67dfd42739e65750274fe8de46240a
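For comparing boots, the sd card log lines dissected above can be pulled out of dmesg mechanically. The sketch below is written against the exact line shapes in that log; the regexes and the `scan_mmc_log` name are my own, and other kernels may format these messages differently.

```python
import re

TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# e.g. "mmcblk0: mmc0:59b4 ND4GB 3.73 GiB"
DISK_RE = re.compile(r"^(mmcblk\d+): (mmc\d+):([0-9a-f]{4}) (\S+) (.+)$")
# e.g. " mmcblk0: p1" (leading space is stripped before matching)
PART_RE = re.compile(r"^(mmcblk\d+):((?: p\d+)+)$")


def scan_mmc_log(lines):
    """Return {disk: info} where info['partitions'] lists what the kernel found.

    A disk whose announcement line appears without a following partition
    line ends up with an empty 'partitions' list.
    """
    disks = {}
    for raw in lines:
        line = TIMESTAMP_RE.sub("", raw).strip()
        m = DISK_RE.match(line)
        if m:
            disks[m.group(1)] = {
                "host": m.group(2),
                "address": m.group(3),
                "name": m.group(4),
                "size": m.group(5),
                "partitions": [],
            }
            continue
        m = PART_RE.match(line)
        if m:
            entry = disks.setdefault(m.group(1), {"partitions": []})
            entry["partitions"] = m.group(2).split()
    return disks
```

Running this over dmesg from the 3.14 and 4.17 boots and diffing the `partitions` lists for the emmc would show whether the partition line is ever printed on the newer kernel.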

SolidEva commented 2018-08-20 21:03:48 +02:00 (Migrated from github.com)

Tracking commit differences between chrome os and mainline
As mentioned,

block.c diverges from mainline near commit f662ae48ae67dfd42739e65750274fe8de46240a

f662ae4 mmc: fix host release issue after discard operation before or after seems to be missing in mainline, but likely not relevant
ce4421b mmc_block: Allow more than 8 partitions per card gets merged to both

HEAD of chrome os https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-3.14/drivers/mmc/card/block.c is very similar to torvalds/linux@f662ae48ae/drivers/mmc/card/block.c

SolidEva commented 2018-08-20 21:54:24 +02:00 (Migrated from github.com)

This change could be problematic, maybe it's timing out before it can read the partition table?
Edit: it wasn't

-#define MMC_BLK_TIMEOUT_MS  (10 * 60 * 1000)        /* 10 minute timeout */
+
+/*
+ * Set a 10 second timeout for polling write request busy state. Note, mmc core
+ * is setting a 3 second timeout for SD cards, and SDHCI has long had a 10
+ * second software timer to timeout the whole request, so 10 seconds should be
+ * ample.
+ */
+#define MMC_BLK_TIMEOUT_MS  (10 * 1000)

TODO: Test Librean-timeout-4.17.2-test.img Edit: No change

This patch disabled packed commands:
https://patchwork.kernel.org/patch/9439165/
A less urgent TODO in CHROME OS 3.14: test if the c201 emmc requires packed commands by changing

#define MMC_BLK_PACKED_CMD	(1 << 2)	/* MMC packed command support */

to

#define MMC_BLK_PACKED_CMD	(0 << 2)	/* MMC packed command support */

which should disable the feature in the chrome os kernel
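
A minimal userspace sketch of why that one-character change should switch the feature off, assuming the driver only ever sets and tests the flag with bitwise operations on a flags word. The helper names and the two PACKED_CMD_* macro names are made up for illustration; only MMC_BLK_PACKED_CMD itself comes from block.c.

```c
#include <assert.h>
#include <stdbool.h>

#define PACKED_CMD_ORIGINAL (1 << 2)	/* current value of MMC_BLK_PACKED_CMD */
#define PACKED_CMD_PROPOSED (0 << 2)	/* proposed value, evaluates to 0 */

/* Hypothetical stand-in for the driver doing `flags |= MMC_BLK_PACKED_CMD`. */
static unsigned int mark_packed(unsigned int flags, unsigned int bit)
{
	/* a flag macro should be zero or exactly one bit */
	assert((bit & (bit - 1)) == 0);
	return flags | bit;
}

/* Hypothetical stand-in for the driver testing `flags & MMC_BLK_PACKED_CMD`. */
static bool packed_enabled(unsigned int flags, unsigned int bit)
{
	return (flags & bit) != 0;
}
```

With the original value, packed_enabled(mark_packed(0, PACKED_CMD_ORIGINAL), PACKED_CMD_ORIGINAL) is true. With the proposed value the same call is false, because OR-ing in 0 sets nothing and AND-ing with 0 always yields 0.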

SolidEva commented 2018-08-20 22:09:41 +02:00 (Migrated from github.com)

The diff between mainline block.c and chrome os block.c: https://ghostbin.com/paste/w9u3g

SolidEva commented 2018-08-21 05:36:38 +02:00 (Migrated from github.com)

Some log analysis:
The 4.17 logs have these lines spammed at boot; they are missing from the 3.14 logs:

[    0.000000] rockchip_mmc_get_phase: invalid clk rate
[    0.000000] rockchip_mmc_get_phase: invalid clk rate
[    0.000000] rockchip_mmc_get_phase: invalid clk rate
[    0.000000] rockchip_mmc_get_phase: invalid clk rate
[    0.000000] rockchip_mmc_get_phase: invalid clk rate
[    0.000000] rockchip_mmc_get_phase: invalid clk rate
[    0.000000] rockchip_mmc_get_phase: invalid clk rate
[    0.000000] rockchip_mmc_get_phase: invalid clk rate

Seems to be from torvalds/linux@ce84eca927, which references torvalds/linux@c420c1e4db, the commit that started checking for invalid clock rates. Trying torvalds/linux@ce84eca927, as it's not in 4.17.2, to see if it removes these.

ADDING THE ABOVE PATCH DID NOT FIX THE ISSUE, IT MADE IT WORSE! Now between the dwmmc_rockchip ff------- lines there are more invalid clk rate errors.

Also:

In working:

[ 0.600088] rockchip-vop ff940000.vop: Attached to iommu domain

~~[ 0.614939] mmc0: BKOPS_EN bit is not set~~

[ 0.627145] mmc_host mmc0: Bus speed (slot 0) = 148500000Hz (slot req 150000000Hz, actual 148500000HZ div = 0)

(From core/mmc.c)

NOT in non working version, TODO: look into BKOPS??
Resolved: not a problem. The chromeos version prints a message when BKOPS is not enabled, and mainline prints one when it IS enabled. Since the mainline log has no message about BKOPS, we are good to go.

Try to find what prints:

[    1.010244] Primary GPT is being ignored, using alternate GPT.
[    1.010283]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12
[    1.015057]  mmcblk0boot1: unknown partition table
[    1.016743]  mmcblk0boot0: unknown partition table

As it is all missing from the 4.17.2 log

/drivers/mmc/core/block.c, specifically device_add_disk in mmc_add_disk should be getting the partition table...

static int mmc_add_disk(struct mmc_blk_data *md)
{
	int ret;
	struct mmc_card *card = md->queue.card;

	device_add_disk(md->parent, md->disk);

So add some prints around that call. mmc_blk_alloc_req and mmc_blk_alloc_rpmb_part both do a lot of setup as well, so add some prints there too.

I added some pr_info calls, but they don't seem to show up in the log. TODO: test multiple print methods.

SolidEva commented 2018-08-21 05:39:57 +02:00 (Migrated from github.com)

Tomorrow:

Try this:

This patch disabled packed commands:
https://patchwork.kernel.org/patch/9439165/
A less urgent TODO in CHROME OS 3.14: test if the c201 emmc requires packed commands by changing

#define MMC_BLK_PACKED_CMD (1 << 2) /* MMC packed command support */

to

#define MMC_BLK_PACKED_CMD (0 << 2) /* MMC packed command support */

which should disable the feature in the chrome os kernel

And add the above described print statements in order to find out what prints the partition list in the kernel log.

SolidEva commented 2018-08-22 18:52:23 +02:00 (Migrated from github.com)

Analysis of how block.c in the mmc driver finds partitions.

It starts here

static int mmc_add_disk(struct mmc_blk_data *md)
{
        int ret;
        struct mmc_card *card = md->queue.card;
        printk("HAL_DEBUG: SHOULD BE GOING TO FIND PARTITIONS");
        device_add_disk(md->parent, md->disk);
        printk("HAL_DEBUG: DID WE FIND ANY??");

When a new mmc device is attached, device_add_disk(md->parent, md->disk) is called and, judging by the HAL_DEBUG output, does not return until the mmc device is removed or the OS mounts a partition on it. In the meantime, however, device_add_disk(md->parent, md->disk) or something it calls prints

 mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12

If the user calls mount on one of the partitions, or if the device is removed, device_add_disk(md->parent, md->disk) returns and printk("HAL_DEBUG: DID WE FIND ANY??") is printed.
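
One thing worth ruling out before concluding that device_add_disk() blocks: neither printk() call above ends its string with "\n". A printk() message without a trailing newline may be held back as an unfinished line until a later message arrives, so the second HAL_DEBUG line might only become visible when something else logs (for example on mount or card removal). This is an assumption, not something confirmed on the c201. The userspace sketch below is a simplified model of that buffering, not kernel code; struct model_log, model_append() and model_printk() are made-up names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified model: `visible` is what a log reader sees, `pending` is an
 * unfinished line that is still waiting for its end. */
struct model_log {
	char visible[256];
	char pending[128];
};

static void model_append(struct model_log *log, const char *text)
{
	size_t used = strlen(log->visible);

	snprintf(log->visible + used, sizeof(log->visible) - used, "%s", text);
}

/* Hypothetical stand-in for printk(): a new message first flushes any
 * unfinished line, then is either shown (ends in '\n') or held back. */
static void model_printk(struct model_log *log, const char *msg)
{
	size_t len;

	assert(log != NULL && msg != NULL);
	if (log->pending[0] != '\0') {
		model_append(log, log->pending);
		model_append(log, "\n");
		log->pending[0] = '\0';
	}
	len = strlen(msg);
	if (len > 0 && msg[len - 1] == '\n')
		model_append(log, msg);
	else
		snprintf(log->pending, sizeof(log->pending), "%s", msg);
}
```

In the model, a message without "\n" only shows up once the next message is logged. In the real driver the cheap experiment is to end each debug string with "\n", e.g. pr_info("HAL_DEBUG: ...\n"), and check whether the second message then appears right after the partition list.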

void device_add_disk(struct device *parent, struct gendisk *disk)

calls

__device_add_disk(parent, disk, true)

Which is defined as

/**
 * __device_add_disk - add disk information to kernel list
 * @parent: parent device for the disk
 * @disk: per-device partitioning information
 * @register_queue: register the queue if set to true
 *
 * This function registers the partitioning information in @disk
 * with the kernel.
 *
 * FIXME: error handling
 */
static void __device_add_disk(struct device *parent, struct gendisk *disk,
			      bool register_queue)
{
	dev_t devt;
	int retval;

	/* minors == 0 indicates to use ext devt from part0 and should
	 * be accompanied with EXT_DEVT flag.  Make sure all
	 * parameters make sense.
	 */
	WARN_ON(disk->minors && !(disk->major || disk->first_minor));
	WARN_ON(!disk->minors &&
		!(disk->flags & (GENHD_FL_EXT_DEVT | GENHD_FL_HIDDEN)));

	disk->flags |= GENHD_FL_UP;

	retval = blk_alloc_devt(&disk->part0, &devt);
	if (retval) {
		WARN_ON(1);
		return;
	}
	disk->major = MAJOR(devt);
	disk->first_minor = MINOR(devt);

	disk_alloc_events(disk);

	if (disk->flags & GENHD_FL_HIDDEN) {
		/*
		 * Don't let hidden disks show up in /proc/partitions,
		 * and don't bother scanning for partitions either.
		 */
		disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
		disk->flags |= GENHD_FL_NO_PART_SCAN;
	} else {
		int ret;

		/* Register BDI before referencing it from bdev */
		disk_to_dev(disk)->devt = devt;
		ret = bdi_register_owner(disk->queue->backing_dev_info,
						disk_to_dev(disk));
		WARN_ON(ret);
		blk_register_region(disk_devt(disk), disk->minors, NULL,
				    exact_match, exact_lock, disk);
	}
	register_disk(parent, disk);
	if (register_queue)
		blk_register_queue(disk);

	/*
	 * Take an extra ref on queue which will be put on disk_release()
	 * so that it sticks around as long as @disk is there.
	 */
	WARN_ON_ONCE(!blk_get_queue(disk->queue));

	disk_add_events(disk);
	blk_integrity_add(disk);
}

This specifically bodes well... 🙄

 * FIXME: error handling

__device_add_disk(struct device *parent, struct gendisk *disk, bool register_queue)

calls

static void register_disk(struct device *parent, struct gendisk *disk)

which is defined as:

static void register_disk(struct device *parent, struct gendisk *disk)
{
	struct device *ddev = disk_to_dev(disk);
	struct block_device *bdev;
	struct disk_part_iter piter;
	struct hd_struct *part;
	int err;

	ddev->parent = parent;

	dev_set_name(ddev, "%s", disk->disk_name);

	/* delay uevents, until we scanned partition table */
	dev_set_uevent_suppress(ddev, 1);

	if (device_add(ddev))
		return;
	if (!sysfs_deprecated) {
		err = sysfs_create_link(block_depr, &ddev->kobj,
					kobject_name(&ddev->kobj));
		if (err) {
			device_del(ddev);
			return;
		}
	}

	/*
	 * avoid probable deadlock caused by allocating memory with
	 * GFP_KERNEL in runtime_resume callback of its all ancestor
	 * devices
	 */
	pm_runtime_set_memalloc_noio(ddev, true);

	disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
	disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);

	if (disk->flags & GENHD_FL_HIDDEN) {
		dev_set_uevent_suppress(ddev, 0);
		return;
	}

	/* No minors to use for partitions */
	if (!disk_part_scan_enabled(disk))
		goto exit;

	/* No such device (e.g., media were just removed) */
	if (!get_capacity(disk))
		goto exit;

	bdev = bdget_disk(disk, 0);
	if (!bdev)
		goto exit;

	bdev->bd_invalidated = 1;
	err = blkdev_get(bdev, FMODE_READ, NULL);
	if (err < 0)
		goto exit;
	blkdev_put(bdev, FMODE_READ);

exit:
	/* announce disk after possible partitions are created */
	dev_set_uevent_suppress(ddev, 0);
	kobject_uevent(&ddev->kobj, KOBJ_ADD);

	/* announce possible partitions */
	disk_part_iter_init(&piter, disk, 0);
	while ((part = disk_part_iter_next(&piter)))
		kobject_uevent(&part_to_dev(part)->kobj, KOBJ_ADD);
	disk_part_iter_exit(&piter);

	err = sysfs_create_link(&ddev->kobj,
				&disk->queue->backing_dev_info->dev->kobj,
				"bdi");
	WARN_ON(err);
}

To be continued: added comments to register_disk...

SolidEva commented 2018-08-23 02:51:58 +02:00 (Migrated from github.com)

Next,

register_disk()

calls

blkdev_get(bdev, FMODE_READ, NULL);

which is defined as

/**
 * blkdev_get - open a block device
 * @bdev: block_device to open
 * @mode: FMODE_* mask
 * @holder: exclusive holder identifier
 *
 * Open @bdev with @mode.  If @mode includes %FMODE_EXCL, @bdev is
 * open with exclusive access.  Specifying %FMODE_EXCL with %NULL
 * @holder is invalid.  Exclusive opens may nest for the same @holder.
 *
 * On success, the reference count of @bdev is unchanged.  On failure,
 * @bdev is put.
 *
 * CONTEXT:
 * Might sleep.
 *
 * RETURNS:
 * 0 on success, -errno on failure.
 */
int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
{
	struct block_device *whole = NULL;
	int res;

	WARN_ON_ONCE((mode & FMODE_EXCL) && !holder);

	if ((mode & FMODE_EXCL) && holder) {
		whole = bd_start_claiming(bdev, holder);
		if (IS_ERR(whole)) {
			bdput(bdev);
			return PTR_ERR(whole);
		}
	}

	res = __blkdev_get(bdev, mode, 0);

	if (whole) {
		struct gendisk *disk = whole->bd_disk;

		/* finish claiming */
		mutex_lock(&bdev->bd_mutex);
		spin_lock(&bdev_lock);

		if (!res) {
			BUG_ON(!bd_may_claim(bdev, whole, holder));
			/*
			 * Note that for a whole device bd_holders
			 * will be incremented twice, and bd_holder
			 * will be set to bd_may_claim before being
			 * set to holder
			 */
			whole->bd_holders++;
			whole->bd_holder = bd_may_claim;
			bdev->bd_holders++;
			bdev->bd_holder = holder;
		}

		/* tell others that we're done */
		BUG_ON(whole->bd_claiming != holder);
		whole->bd_claiming = NULL;
		wake_up_bit(&whole->bd_claiming, 0);

		spin_unlock(&bdev_lock);

		/*
		 * Block event polling for write claims if requested.  Any
		 * write holder makes the write_holder state stick until
		 * all are released.  This is good enough and tracking
		 * individual writeable reference is too fragile given the
		 * way @mode is used in blkdev_get/put().
		 */
		if (!res && (mode & FMODE_WRITE) && !bdev->bd_write_holder &&
		    (disk->flags & GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE)) {
			bdev->bd_write_holder = true;
			disk_block_events(disk);
		}

		mutex_unlock(&bdev->bd_mutex);
		bdput(whole);
	}

	return res;
}
EXPORT_SYMBOL(blkdev_get);

Rather unsurprisingly,

blkdev_get()

calls

__blkdev_get(bdev, mode, 0)

which is large, and defined as:

static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
{
	struct gendisk *disk;
	int ret;
	int partno;
	int perm = 0;
	bool first_open = false;

	if (mode & FMODE_READ)
		perm |= MAY_READ;
	if (mode & FMODE_WRITE)
		perm |= MAY_WRITE;
	/*
	 * hooks: /n/, see "layering violations".
	 */
	if (!for_part) {
		ret = devcgroup_inode_permission(bdev->bd_inode, perm);
		if (ret != 0) {
			bdput(bdev);
			return ret;
		}
	}

 restart:

	ret = -ENXIO;
	disk = bdev_get_gendisk(bdev, &partno);
	if (!disk)
		goto out;

	disk_block_events(disk);
	mutex_lock_nested(&bdev->bd_mutex, for_part);
	if (!bdev->bd_openers) {
		first_open = true;
		bdev->bd_disk = disk;
		bdev->bd_queue = disk->queue;
		bdev->bd_contains = bdev;
		bdev->bd_partno = partno;

		if (!partno) {
			ret = -ENXIO;
			bdev->bd_part = disk_get_part(disk, partno);
			if (!bdev->bd_part)
				goto out_clear;

			ret = 0;
			if (disk->fops->open) {
				ret = disk->fops->open(bdev, mode);
				if (ret == -ERESTARTSYS) {
					/* Lost a race with 'disk' being
					 * deleted, try again.
					 * See md.c
					 */
					disk_put_part(bdev->bd_part);
					bdev->bd_part = NULL;
					bdev->bd_disk = NULL;
					bdev->bd_queue = NULL;
					mutex_unlock(&bdev->bd_mutex);
					disk_unblock_events(disk);
					put_disk_and_module(disk);
					goto restart;
				}
			}

			if (!ret)
				bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);

			/*
			 * If the device is invalidated, rescan partition
			 * if open succeeded or failed with -ENOMEDIUM.
			 * The latter is necessary to prevent ghost
			 * partitions on a removed medium.
			 */
			if (bdev->bd_invalidated) {
				if (!ret)
					rescan_partitions(disk, bdev);
				else if (ret == -ENOMEDIUM)
					invalidate_partitions(disk, bdev);
			}

			if (ret)
				goto out_clear;
		} else {
			struct block_device *whole;
			whole = bdget_disk(disk, 0);
			ret = -ENOMEM;
			if (!whole)
				goto out_clear;
			BUG_ON(for_part);
			ret = __blkdev_get(whole, mode, 1);
			if (ret)
				goto out_clear;
			bdev->bd_contains = whole;
			bdev->bd_part = disk_get_part(disk, partno);
			if (!(disk->flags & GENHD_FL_UP) ||
			    !bdev->bd_part || !bdev->bd_part->nr_sects) {
				ret = -ENXIO;
				goto out_clear;
			}
			bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
		}

		if (bdev->bd_bdi == &noop_backing_dev_info)
			bdev->bd_bdi = bdi_get(disk->queue->backing_dev_info);
	} else {
		if (bdev->bd_contains == bdev) {
			ret = 0;
			if (bdev->bd_disk->fops->open)
				ret = bdev->bd_disk->fops->open(bdev, mode);
			/* the same as first opener case, read comment there */
			if (bdev->bd_invalidated) {
				if (!ret)
					rescan_partitions(bdev->bd_disk, bdev);
				else if (ret == -ENOMEDIUM)
					invalidate_partitions(bdev->bd_disk, bdev);
			}
			if (ret)
				goto out_unlock_bdev;
		}
	}
	bdev->bd_openers++;
	if (for_part)
		bdev->bd_part_count++;
	mutex_unlock(&bdev->bd_mutex);
	disk_unblock_events(disk);
	/* only one opener holds refs to the module and disk */
	if (!first_open)
		put_disk_and_module(disk);
	return 0;

 out_clear:
	disk_put_part(bdev->bd_part);
	bdev->bd_disk = NULL;
	bdev->bd_part = NULL;
	bdev->bd_queue = NULL;
	if (bdev != bdev->bd_contains)
		__blkdev_put(bdev->bd_contains, mode, 1);
	bdev->bd_contains = NULL;
 out_unlock_bdev:
	mutex_unlock(&bdev->bd_mutex);
	disk_unblock_events(disk);
	put_disk_and_module(disk);
 out:
	bdput(bdev);

	return ret;
}

TODO: bdev->bd_part = disk_get_part(disk, partno) could be of interest; it's called in both the functional and non-functional versions.
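
As an aside on the two bd_set_size() calls above: both turn a count of 512-byte sectors into bytes with << 9, widening to loff_t before the shift. A small userspace sketch of why the cast comes first, assuming a 32-bit sector count purely for illustration (the helper names are made up, and int64_t stands in for loff_t):

```c
#include <assert.h>
#include <stdint.h>

/* Widen first, then shift: mirrors `(loff_t)get_capacity(disk) << 9`. */
static int64_t sectors_to_bytes(uint32_t sectors)
{
	return (int64_t)sectors << 9;
}

/* Shift in 32 bits first: the result wraps once the size reaches 4 GiB. */
static uint32_t sectors_to_bytes_narrow(uint32_t sectors)
{
	return sectors << 9;
}

/* Usage: 8388608 sectors of 512 bytes are exactly 4 GiB. */
static void sectors_demo(void)
{
	assert(sectors_to_bytes(8388608u) == INT64_C(4294967296));
	assert(sectors_to_bytes_narrow(8388608u) == 0u);
}
```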

called next is

rescan_partitions(disk, bdev)

which is defined as

int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
{
	struct parsed_partitions *state = NULL;
	struct hd_struct *part;
	int p, highest, res;
rescan:
	if (state && !IS_ERR(state)) {
		free_partitions(state);
		state = NULL;
	}

	res = drop_partitions(disk, bdev);
	if (res)
		return res;

	if (disk->fops->revalidate_disk)
		disk->fops->revalidate_disk(disk);
	check_disk_size_change(disk, bdev);
	bdev->bd_invalidated = 0;
	if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
		return 0;
	if (IS_ERR(state)) {
		/*
		 * I/O error reading the partition table.  If any
		 * partition code tried to read beyond EOD, retry
		 * after unlocking native capacity.
		 */
		if (PTR_ERR(state) == -ENOSPC) {
			printk(KERN_WARNING "%s: partition table beyond EOD, ",
			       disk->disk_name);
			if (disk_unlock_native_capacity(disk))
				goto rescan;
		}
		return -EIO;
	}
	/*
	 * If any partition code tried to read beyond EOD, try
	 * unlocking native capacity even if partition table is
	 * successfully read as we could be missing some partitions.
	 */
	if (state->access_beyond_eod) {
		printk(KERN_WARNING
		       "%s: partition table partially beyond EOD, ",
		       disk->disk_name);
		if (disk_unlock_native_capacity(disk))
			goto rescan;
	}

	/* tell userspace that the media / partition table may have changed */
	kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);

	/* Detect the highest partition number and preallocate
	 * disk->part_tbl.  This is an optimization and not strictly
	 * necessary.
	 */
	for (p = 1, highest = 0; p < state->limit; p++)
		if (state->parts[p].size)
			highest = p;

	disk_expand_part_tbl(disk, highest);

	/* add partitions */
	for (p = 1; p < state->limit; p++) {
		sector_t size, from;

		size = state->parts[p].size;
		if (!size)
			continue;

		from = state->parts[p].from;
		if (from >= get_capacity(disk)) {
			printk(KERN_WARNING
			       "%s: p%d start %llu is beyond EOD, ",
			       disk->disk_name, p, (unsigned long long) from);
			if (disk_unlock_native_capacity(disk))
				goto rescan;
			continue;
		}

		if (from + size > get_capacity(disk)) {
			printk(KERN_WARNING
			       "%s: p%d size %llu extends beyond EOD, ",
			       disk->disk_name, p, (unsigned long long) size);

			if (disk_unlock_native_capacity(disk)) {
				/* free state and restart */
				goto rescan;
			} else {
				/*
				 * we can not ignore partitions of broken tables
				 * created by for example camera firmware, but
				 * we limit them to the end of the disk to avoid
				 * creating invalid block devices
				 */
				size = get_capacity(disk) - from;
			}
		}

		/*
		 * On a zoned block device, partitions should be aligned on the
		 * device zone size (i.e. zone boundary crossing not allowed).
		 * Otherwise, resetting the write pointer of the last zone of
		 * one partition may impact the following partition.
		 */
		if (bdev_is_zoned(bdev) &&
		    !part_zone_aligned(disk, bdev, from, size)) {
			printk(KERN_WARNING
			       "%s: p%d start %llu+%llu is not zone aligned\n",
			       disk->disk_name, p, (unsigned long long) from,
			       (unsigned long long) size);
			continue;
		}

		part = add_partition(disk, p, from, size,
				     state->parts[p].flags,
				     &state->parts[p].info);
		if (IS_ERR(part)) {
			printk(KERN_ERR " %s: p%d could not be added: %ld\n",
			       disk->disk_name, p, -PTR_ERR(part));
			continue;
		}
#ifdef CONFIG_BLK_DEV_MD
		if (state->parts[p].flags & ADDPART_FLAG_RAID)
			md_autodetect_dev(part_to_dev(part)->devt);
#endif
	}
	free_partitions(state);
	return 0;
}
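
The two EOD ("end of device") checks in the partition loop above can be modelled in userspace as below. This is a sketch under the assumption that disk_unlock_native_capacity() returns 0, so the goto rescan paths are never taken; the enum and function names are made up, and all values are in sectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum part_verdict { PART_OK, PART_CLAMPED, PART_SKIPPED };

/* Model of the bounds checks applied to each parsed partition. */
static enum part_verdict check_bounds(uint64_t capacity, uint64_t from,
				      uint64_t *size)
{
	assert(size != NULL);
	if (from >= capacity)
		return PART_SKIPPED;	/* start is beyond the end of the device */
	if (from + *size > capacity) {
		*size = capacity - from;	/* truncate to the end of the device */
		return PART_CLAMPED;
	}
	return PART_OK;
}
```

For a 1000-sector device, a partition starting at sector 900 with size 200 is clamped to 100 sectors, while a partition starting at sector 1000 is skipped entirely.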

next check_partition() is called by:

	if (!get_capacity(disk) || !(state = check_partition(disk, bdev))){

check_partition() is in block/partitions/check.c and is defined as

struct parsed_partitions *
check_partition(struct gendisk *hd, struct block_device *bdev)
{
	struct parsed_partitions *state;
	int i, res, err;

	state = allocate_partitions(hd);
	if (!state)
		return NULL;
	state->pp_buf = (char *)__get_free_page(GFP_KERNEL);
	if (!state->pp_buf) {
		free_partitions(state);
		return NULL;
	}
	state->pp_buf[0] = '\0';

	state->bdev = bdev;
	disk_name(hd, 0, state->name);
	snprintf(state->pp_buf, PAGE_SIZE, " %s:", state->name);
	if (isdigit(state->name[strlen(state->name)-1]))
		sprintf(state->name, "p");

	i = res = err = 0;
	while (!res && check_part[i]) {
		memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
		res = check_part[i++](state);
		if (res < 0) {
			/* We have hit an I/O error which we don't report now.
		 	* But record it, and let the others do their job.
		 	*/
			err = res;
			res = 0;
		}

	}
	if (res > 0) {
		printk(KERN_INFO "%s", state->pp_buf);

		free_page((unsigned long)state->pp_buf);
		return state;
	}
	if (state->access_beyond_eod)
		err = -ENOSPC;
	if (err)
	/* The partition is unrecognized. So report I/O errors if there were any */
		res = err;
	if (res) {
		if (warn_no_part)
			strlcat(state->pp_buf,
				" unable to read partition table\n", PAGE_SIZE);
		printk(KERN_INFO "%s", state->pp_buf);
	}

	free_page((unsigned long)state->pp_buf);
	free_partitions(state);
	return ERR_PTR(res);
}

When it's functioning correctly, it runs this while loop:

while (!res && check_part[i]) {
		memset(state->parts, 0, state->limit * sizeof(state->parts[0]));
		res = check_part[i++](state);
		if (res < 0) {
			/* We have hit an I/O error which we don't report now.
		 	* But record it, and let the others do their job.
		 	*/
			err = res;
			res = 0;
		}
	}

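It then enters the if (res > 0) block, because res = check_part[i++](state); sets res > 0. Each entry in check_part[] is a parser for one partition table format; a parser returns 1 if it recognizes the table, 0 if it does not, and a negative value if it cannot read it. check_part[] is defined as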

static int (*check_part[])(struct parsed_partitions *) = {
	/*
	 * Probe partition formats with tables at disk address 0
	 * that also have an ADFS boot block at 0xdc0.
	 */
#ifdef CONFIG_ACORN_PARTITION_ICS
	adfspart_check_ICS,
#endif
#ifdef CONFIG_ACORN_PARTITION_POWERTEC
	adfspart_check_POWERTEC,
#endif
#ifdef CONFIG_ACORN_PARTITION_EESOX
	adfspart_check_EESOX,
#endif

	/*
	 * Now move on to formats that only have partition info at
	 * disk address 0xdc0.  Since these may also have stale
	 * PC/BIOS partition tables, they need to come before
	 * the msdos entry.
	 */
#ifdef CONFIG_ACORN_PARTITION_CUMANA
	adfspart_check_CUMANA,
#endif
#ifdef CONFIG_ACORN_PARTITION_ADFS
	adfspart_check_ADFS,
#endif

#ifdef CONFIG_CMDLINE_PARTITION
	cmdline_partition,
#endif
#ifdef CONFIG_EFI_PARTITION
	efi_partition,		/* this must come before msdos */
#endif
#ifdef CONFIG_SGI_PARTITION
	sgi_partition,
#endif
#ifdef CONFIG_LDM_PARTITION
	ldm_partition,		/* this must come before msdos */
#endif
#ifdef CONFIG_MSDOS_PARTITION
	msdos_partition,
#endif
#ifdef CONFIG_OSF_PARTITION
	osf_partition,
#endif
#ifdef CONFIG_SUN_PARTITION
	sun_partition,
#endif
#ifdef CONFIG_AMIGA_PARTITION
	amiga_partition,
#endif
#ifdef CONFIG_ATARI_PARTITION
	atari_partition,
#endif
#ifdef CONFIG_MAC_PARTITION
	mac_partition,
#endif
#ifdef CONFIG_ULTRIX_PARTITION
	ultrix_partition,
#endif
#ifdef CONFIG_IBM_PARTITION
	ibm_partition,
#endif
#ifdef CONFIG_KARMA_PARTITION
	karma_partition,
#endif
#ifdef CONFIG_SYSV68_PARTITION
	sysv68_partition,
#endif
	NULL
};
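
Since every entry in check_part[] sits behind an #ifdef, a parser whose config option is disabled is never tried at all, and a disk in that format falls through the loop with res == 0. A minimal sketch of that effect, with hypothetical formats A and B instead of real partition formats and two hand-built tables standing in for two kernel configurations:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical on-disk formats; not real partition table types. */
enum mock_format { MOCK_FMT_NONE, MOCK_FMT_A, MOCK_FMT_B };

struct mock_disk {
	enum mock_format format;
};

typedef int (*mock_parser)(const struct mock_disk *);

static int mock_parse_a(const struct mock_disk *d)
{
	return d->format == MOCK_FMT_A;
}

static int mock_parse_b(const struct mock_disk *d)
{
	return d->format == MOCK_FMT_B;
}

/* Table as it would look with both (hypothetical) options enabled. */
static const mock_parser mock_table_full[] = {
	mock_parse_a, mock_parse_b, NULL
};

/* Table as it would look with the option for format A disabled. */
static const mock_parser mock_table_no_a[] = {
	mock_parse_b, NULL
};

/* Walk a NULL-terminated table; 1 on the first match, 0 if none match. */
static int mock_recognized(const mock_parser *table,
			   const struct mock_disk *d)
{
	assert(table != NULL && d != NULL);
	for (size_t i = 0; table[i] != NULL; i++)
		if (table[i](d) > 0)
			return 1;
	return 0;
}
```

A format A disk is recognized with mock_table_full and not with mock_table_no_a, which is why comparing the partition-related config options and the table itself between the two kernels is worth doing.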

Back in check_partition(), the printk of state->pp_buf in the following if block prints the partition list, which looks like:

mmcblk0: p1

and the referenced if block:

	if (res > 0) {
		printk(KERN_INFO "%s", state->pp_buf);

		free_page((unsigned long)state->pp_buf);
		printk("HAL_DEBUG: if res > 0, return state");
		return state;
	}

When the same code runs on the non-functioning mmc device, the while loop finishes with res still zero, which I believe means no parser recognized the partition table. With res == 0 and no recorded error, every if block is skipped and only the cleanup code runs before returning:

	free_page((unsigned long)state->pp_buf);
	free_partitions(state);
	return ERR_PTR(res);
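
One detail worth spelling out: with res == 0 the return value is ERR_PTR(0), which is just a NULL pointer, so the caller rescan_partitions() takes its !(state = check_partition(disk, bdev)) branch and returns 0 without adding partitions or reporting an error. A userspace sketch of that, with mock_ helpers modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() (the names and the outcome enum are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_ERRNO 4095

/* Modeled on ERR_PTR(): encode a negative errno (or 0) as a pointer. */
static void *mock_err_ptr(long error)
{
	assert(error <= 0 && error >= -MOCK_MAX_ERRNO);
	return (void *)error;
}

/* Modeled on PTR_ERR(): decode the errno again. */
static long mock_ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Modeled on IS_ERR(): true only for the top MOCK_MAX_ERRNO addresses. */
static int mock_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MOCK_MAX_ERRNO;
}

enum mock_outcome {
	MOCK_SILENT_RETURN,	/* rescan_partitions() returns 0 early */
	MOCK_IO_ERROR,		/* rescan_partitions() returns -EIO */
	MOCK_ADD_PARTITIONS	/* rescan_partitions() goes on to add them */
};

/* Mirrors the checks rescan_partitions() makes on the returned state. */
static enum mock_outcome mock_rescan(void *state)
{
	if (!state)
		return MOCK_SILENT_RETURN;
	if (mock_is_err(state))
		return MOCK_IO_ERROR;
	return MOCK_ADD_PARTITIONS;
}
```

So an unrecognized table is indistinguishable from "nothing to do" at this level, which matches the device showing up with no partitions and no error message.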

TODO (completed, see the next comment): Confirm that check_partition() is failing to identify the chromeos partitions. Do this by comparing check_part[] and the check_partition() function in block/partitions/check.c between the mainline and chromeos kernels.

SolidEva commented 2018-08-23 03:51:51 +02:00 (Migrated from github.com)

Compared the chromeos kernel to mainline. There are very few differences, but the comparison did confirm that res = 0 means the partition table is unrecognized. Additional tests also confirmed that res = 0 on the failing device.

The three commits labeled CHROMIUM here: https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/chromeos-3.14/block/partitions are all great candidates for testing.

Testing the two that aren't about a print function now.

SolidEva commented 2018-08-23 04:25:33 +02:00 (Migrated from github.com)

Those two commits fix the issue!

SolidEva commented 2018-08-23 20:52:20 +02:00 (Migrated from github.com)

Adding patches made from these commits:

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/abba28d0a1b7361da6e2023352e92687166ca30d
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a2b7b398404c665926a0e085523f40a51a419e29
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/bd0c62c7de0c8a63314b7955e5718d8f6192f9d2

to the master branch, also keeping the `CONFIG_MMC_BLOCK_MINORS=16` config option.

Will close this issue when fully tested.
SolidEva commented 2018-08-26 20:01:41 +02:00 (Migrated from github.com)

Patches fully tested and placed in /patches-tested, closing issue

Reference
ev4/PrawnOS#17