mmc issue? #17
ev4/PrawnOS#17
Picked up another c201, which has a different emmc. The usual installation isn't working either even if the size is corrected.
chromeos-3.14 config:
4.9 config:
At a minimum, CONFIG_MMC_BLOCK_MINORS and CONFIG_MMC_DW_IDMAC differ between the two configs.
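A rough sketch of how the two configs could be compared mechanically rather than by eye. The helper names (`parse_config`, `diff_configs`) are made up for illustration, and the default `CONFIG_MMC` prefix filter is just an assumption about which options matter here.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value} for a kernel .config; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old, new, prefix="CONFIG_MMC"):
    """Return {option: (old_value, new_value)} for options that differ.

    A value of None means the option does not appear in that config at all,
    which is what a removed Kconfig symbol looks like.
    """
    changes = {}
    for name in sorted(set(old) | set(new)):
        if name.startswith(prefix) and old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes
```

Feed it the text of the two config files (read with `open(...).read()`); an option that shows up with `None` on the new side was dropped from the newer kernel entirely rather than just switched off.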
Built a chromeos recovery image with the bin found at:
This was found in the linux recovery script.
The stock chromeos with kernel 3.14 can read the mmc correctly and creates /dev/ files for each partition, unlike the newer kernels (>=4.9).
Testing with debian and 3.14 kernel.
arch linux has some good cgpt docs when combined with the man page https://wiki.archlinux.org/index.php/Talk:Chrome_OS_devices
The scripts here may be a good example of how the mmc has to be partitioned correctly. The main script of the chromeos recovery binary would be an even better reference, but I have yet to extract it.
https://github.com/altreact/archbk/issues/3
linux chromeos recovery guide: https://github.com/raphael/linux-samus/wiki/How-do-I-restore-ChromeOS-after-it-has-been-removed%3F
dd'ing the debian install with the 3.14 chromeos kernel directly from the flash drive didn't work, but adjusting the partition map like in the https://github.com/altreact/archbk/issues/3 scripts, then dd'ing sda1 to kern a and sda2 to root a, worked! Just need two scripts for the two standard gpt tables. Probably will find similar logic in the recovery installer.
TODO: Test making own gpt table in same style as expected.
In the newer kernels, 'CONFIG_MMC_UNSAFE_RESUME' is removed.
Even if this wasn't set, the kernel command line parameter 'removable' could be used
This seems to have been replaced by the device tree parameter `non-removable`, which is set in torvalds/linux@8efcf34a26/arch/arm/boot/dts/rk3288-veyron.dtsi
This commit shows the usage of the 'nonremovable' property, and the switch to allowing it to be read from the device tree:
torvalds/linux@73a47a9bb3
TODO: This is worth looking into further if nothing else works. I have yet to prove the 'non-removable' parameter provides the same function as 'CONFIG_MMC_UNSAFE_RESUME'
'CONFIG_MMC_BLOCK_BOUNCE' was removed by this patch https://patchwork.kernel.org/patch/9732989/
as it was found to be default on, and disabled by kernel logic later if unneeded
TODO: If issue persists, ensure block bounce is still functioning properly
https://cateee.net/lkddb/web-lkddb/MMC_BLOCK_BOUNCE.html implies it was last found in kernel 4.12
disabled 'CONFIG_PHY_ROCKCHIP_EMMC=y'
'CONFIG_MMC_DW_IDMAC=y' was removed by this commit
torvalds/linux@3fc7eaef44 (diff-913858dae5) between 4.3 and 4.4
and was replaced with logic to determine whether internal or external dma is to be used, in `static void dw_mci_init_dma(struct dw_mci *host)`
`SDMMC_GET_TRANS_MODE(x)` is defined as
`mci_readl` is defined as the following in /drivers/mmc/host/dw_mmc.h
TODO: determining which mode this logic selects is a decent avenue, as the bug would exist in the two versions I've tested that have the issue: 4.9 and 4.17
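To make the TODO concrete, here is a small model of the mode selection, so a raw HCON register value dumped from the board can be decoded by hand. The bit position and the `DMA_INTERFACE_*` values are written from memory of mainline `dw_mmc.h`, not copied from the tree under test, so treat them as assumptions and verify them first. The function names are made up.

```python
# ASSUMPTION: values as recalled from mainline drivers/mmc/host/dw_mmc.h.
# Check them against the kernel tree actually being debugged.
DMA_INTERFACE_IDMA = 0x0
DMA_INTERFACE_DWDMA = 0x1
DMA_INTERFACE_GDMA = 0x2
DMA_INTERFACE_NODMA = 0x3


def get_trans_mode(hcon):
    """Model of SDMMC_GET_TRANS_MODE(x): assumed to be bits 17:16 of HCON."""
    return (hcon >> 16) & 0x3


def select_dma(hcon):
    """Model of the branch in dw_mci_init_dma(): which DMA path is picked."""
    mode = get_trans_mode(hcon)
    if mode == DMA_INTERFACE_IDMA:
        return "internal (IDMAC)"
    if mode in (DMA_INTERFACE_DWDMA, DMA_INTERFACE_GDMA):
        return "external (EDMAC)"
    # Anything else falls through to the no-DMA path.
    return "none (PIO)"
```

If this model is right, printing HCON once from the probe path on both 4.9 and 4.17 would be enough to tell which mode each kernel ends up in.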
Disabled `CONFIG_SCSI_SCAN_ASYNC` for debugging
TODO: re-enable if not the culprit, as it speeds up everything
Building two images, one with the newly modified 4.17 config and another with the arch arm config found here
https://raw.githubusercontent.com/archlinuxarm/PKGBUILDs/master/core/linux-veyron/config
Built a third image with the `broken-cd` parameter in the emmc and sdio sections of the veyron device tree. This essentially reverted the following commit just for rk3288-veyron.dtsi:
torvalds/linux@57375d88fa (diff-65dbac344b)
None of the three images described above could see the internal mmc.
Testing linux-libre 3.14 to see if support for this mmc was broken sometime in mainline or specifically added to chromeos-3.14
3.14 mainline has no support for veyron speedy, trying 4.3.6 as 4.3 is the first release with the dtsi
4.3.6 boots to a white screen.
Found the chromeos kernel always includes the reset function in pinctrl-0:
`pinctrl-0 = <&emmc_clk &emmc_cmd &emmc_bus8 &emmc_deassert_reset>;`
but mainline only does if `mmc-pwrseq` is supported
TODO: test including `emmc_reset` in pinctrl-0 by default, also test the emmc pwrseq config option.
TODO: Try the mmc test config https://stackoverflow.com/questions/40882479/is-there-any-user-space-tool-available-for-emmc-to-perform-diagnostic-test-and-p
Testing https://github.com/dimkr/devsus/tree/hybrid which has mmc-pwrseq enabled in config
Actually testing https://github.com/SolidHal/devsus/tree/hybrid_dev which builds debian instead of devuan. The debian build is much more stable, since the devuan servers can stop responding for seemingly no reason.
Since https://github.com/SolidHal/devsus/tree/master is able to see and use the emmc with debian stretch and the chromeos-3.14 kernel we can guarantee the OS is not at fault here, so testing with debian stretch is a non issue.
Also testing multiple older commits that use older configs and older versions of 4.17:
Librean master commit `fbac1b5fe2`
Librean master commit `23e05d6af6`
Strong lead, `CONFIG_MMC_BLOCK_MINORS` is set to 256 in the 4.17 kernel configs.
The answer at this post explains why that is a problem: https://unix.stackexchange.com/questions/217640/dev-mmcblk0-partitions-limit.
Since 4.17 would map the emmc to mmcblk2, `CONFIG_MMC_BLOCK_MINORS=256` would leave no room for anything besides mmcblk0 and its partitions.
In devsus 3.14 `CONFIG_MMC_BLOCK_MINORS` is set to 16. This is a good compromise, as each mmcblk device can have a ton of partitions in default chromeos.
TODO: Build image and test from this repo https://github.com/SolidHal/Librean/tree/mintest-emmc-4.17
Edit: didn't work, update here: https://github.com/SolidHal/Librean/issues/17#issuecomment-413933394
Image name: Librean-4.17.2-mintest-CONFIG_MMC_BLOCK_MINORS-16.img
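For reference, the arithmetic behind this lead, as a small sketch. It assumes, per the linked answer, that the mmc block driver hands out minors from a single pool of 256; that pool size is an assumption here, not something read out of the 4.17 source, and the function names are made up.

```python
def max_mmc_devices(minors_per_device, minor_pool=256):
    """How many mmcblk devices fit if each reserves `minors_per_device` minors.

    ASSUMPTION: minor_pool=256 is the pool size described in the linked
    answer; newer kernels may size the pool differently.
    """
    if minors_per_device <= 0:
        raise ValueError("minors_per_device must be positive")
    return minor_pool // minors_per_device


def partitions_per_device(minors_per_device):
    """One minor is the whole-disk node; the rest are left for partitions."""
    return minors_per_device - 1
```

Under that assumption, 256 minors per device leaves room for exactly one device, while 16 allows 16 devices with up to 15 partitions each.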
For future debugging, mmc attributes are described here https://www.kernel.org/doc/Documentation/mmc/mmc-dev-attrs.txt
and can be read at least partially using https://www.kernel.org/doc/Documentation/mmc/mmc-tools.txt
Include this patch as well if previous TODO doesn't work out: https://github.com/Miouyouyou/RockMyy/blob/master/patches/kernel/v4.17/DTS/0007-RK3288-DTSI-rk3288-Add-missing-SPI2-pinctrl.patch
This didn't work, but I did notice fdisk -l recognizes all of the partitions but they don't appear in /dev/
The boot partitions and rpmb partitions of the mmcblk device do show up in /dev/ however
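fdisk reads the partition table off the disk itself, so it says nothing about what the kernel registered. A quick way to check the kernel side is to look at /proc/partitions. Below is a minimal sketch of that check; `kernel_partitions` is a made-up helper name, and it only counts `<disk>p<N>` entries so the boot and rpmb nodes are ignored.

```python
def kernel_partitions(proc_partitions_text, disk):
    """Return the partition names the kernel registered for `disk`.

    `disk` is a whole-disk name such as 'mmcblk2'. Only '<disk>p<N>' rows are
    returned, so '<disk>boot0' and similar hardware partitions are skipped.
    """
    names = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows look like: "<major> <minor> <#blocks> <name>"
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        name = fields[3]
        suffix = name[len(disk) + 1:]
        if name.startswith(disk + "p") and suffix.isdigit():
            names.append(name)
    return names
```

Usage would be something like `kernel_partitions(open("/proc/partitions").read(), "mmcblk2")`. An empty list while fdisk -l still lists partitions matches the symptom described here.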
TODO: Try running `partprobe`?
This guy had some issues with an sdcard that are similar:
https://unix.stackexchange.com/questions/198082/my-sd-card-has-a-partition-but-linux-doesnt-create-a-device-entry-for-the-parti
It looks like the posted dmesg logs have some errors that may be of interest
Further down, one answer suggests:
As a workaround they suggest creating a udev rule that runs partprobe
TODO: see if these errors are present on the device as well
They aren't, see here: https://github.com/SolidHal/Librean/issues/17#issuecomment-414337527
It could be the chromeos 3.14 kernel has some customizations around reading the partition table that mainline never got
partprobe is not a standard part of the debian install, need the parted package
Built image with `CONFIG_MMC_DEBUG` enabled, as well as with the `parted` package for the `partprobe` command.
Additionally included this patch https://github.com/Miouyouyou/RockMyy/blob/master/patches/kernel/v4.17/DTS/0007-RK3288-DTSI-rk3288-Add-missing-SPI2-pinctrl.patch for kicks.
Follow the procedure referred to here: https://github.com/SolidHal/Librean/issues/17#issuecomment-412529789
for the mmc debugging process.
These errors are NOT present, as the mmc driver is used, not the sd (scsi device) driver.
Got logs from inserting and removing an sd card, which should have a similar init process in the mmc driver as the emmc.
Dissecting:
is printed by torvalds/linux@0a4b6e2f80/drivers/mmc/core/bus.c
I believe
is printed by `block.c`, which is located at https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/chromeos-3.14/drivers/mmc/card/block.c for `chrome os`, and https://github.com/torvalds/linux/blob/master/drivers/mmc/core/block.c for `mainline`
And
is printed by https://elixir.bootlin.com/linux/latest/source/drivers/mmc/core/bus.c#L371
`block.c` diverges from mainline at commit f662ae48ae67dfd42739e65750274fe8de46240a
Tracking commit differences between `chrome os` and `mainline`
As mentioned, `f662ae4 mmc: fix host release issue after discard operation` before or after seems to be missing in mainline, but likely not relevant
`ce4421b mmc_block: Allow more than 8 partitions per card` gets merged to both
`HEAD` of `chrome os` https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-3.14/drivers/mmc/card/block.c is very similar to torvalds/linux@f662ae48ae/drivers/mmc/card/block.c
This change could be problematic, maybe it's timing out before it can read the partition table?
Edit: it wasn't
TODO: Test Librean-timeout-4.17.2-test.img
Edit: No change
This patch disabled packaged commands:
https://patchwork.kernel.org/patch/9439165/
A less urgent TODO in CHROME OS 3.14: test if the c201 emmc requires packaged commands by changing
to
which should disable the feature in the `chrome os` kernel
The diff between mainline block.c and chrome os block.c https://ghostbin.com/paste/w9u3g
Some log analysis:
The 4.17 logs have these lines spammed at boot, they are missing from the 3.14 logs
Seems to be from torvalds/linux@ce84eca927 which references this commit torvalds/linux@c420c1e4db which started checking for invalid clock rates. Trying torvalds/linux@ce84eca927, as it's not in 4.17.2, to see if it removes these.
ADDING THE ABOVE PATCH DID NOT FIX THE ISSUE, IT MADE IT WORSE! Now between the `dwmmc_rockchip ff-------` lines there are more invalid clk rate errors.
Also:
In working:
(From core/mmc.c)
NOT in non working version, TODO: look into BKOPS??
Not a problem, the chromeos version prints if BKOPS is not enabled, and mainline prints if it IS enabled. Since there is no message about BKOPS, we are good to go.
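Comparing the two boot logs by eye gets tedious, so here is a rough sketch of doing it mechanically: strip the dmesg timestamps, then list the mmc-related lines that only one log has. The helper names and the `mmc` keyword filter are made up for illustration.

```python
import re

# dmesg prefixes each line with a "[  seconds.micros]" timestamp.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(log_text):
    """Strip timestamps and blank lines so two boots can be compared."""
    lines = []
    for line in log_text.splitlines():
        line = _TIMESTAMP.sub("", line).strip()
        if line:
            lines.append(line)
    return lines


def only_in(first_log, second_log, keyword="mmc"):
    """Lines mentioning `keyword` that appear in first_log but not second_log.

    Repeated (spammed) lines are reported once, in first-seen order.
    """
    seen = set(normalize(second_log))
    result = []
    for line in normalize(first_log):
        if keyword in line and line not in seen and line not in result:
            result.append(line)
    return result
```

Calling `only_in(log_417, log_314)` gives the lines new in 4.17, and swapping the arguments gives the lines that went missing. Lines that embed addresses or counters will still differ and need a manual look.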
Try to find what prints:
As it is all missing from the 4.17.2 log
/drivers/mmc/core/block.c, specifically `device_add_disk` in `mmc_add_disk`, should be getting the partition table... So add some prints around that, and `mmc_blk_alloc_req` as well as `mmc_blk_alloc_rpmb_part` both do a bunch, so add some prints there.
I added some pr_info calls but they don't seem to happen? Test multiple print methods.
Tomorrow:
Try this:
And add the above described print statements in order to find out what prints the partition list in the kernel log.
Analysis of how `block.c` in the mmc driver finds partitions
It starts here
When a new mmc device is attached, `device_add_disk(md->parent, md->disk)` is called, and does not return until the mmc device is removed or the os mounts a partition on it. In the meantime however, `device_add_disk(md->parent, md->disk)` or something it calls prints
If the user calls `mount` on one of the partitions, or if the device is removed, `device_add_disk(md->parent, md->disk)` returns and `printk("HAL_DEBUG: DID WE FIND ANY??")` is printed
calls
Which is defined as
This specifically bodes well... 🙄
calls
which is defined as:
To be continued: added comments to register_disk...
Next,
calls
which is defined as
Rather unsurprisingly,
calls
which is large, and defined as:
TODO: `bdev->bd_part = disk_get_part(disk, partno)` could be of interest, it's called in both the functional and non functional versions
Called next is `rescan_partitions(disk, bdev)` which is defined as
Next, `check_partition()` is called by:
`check_partition()` is in `block/partitions/check.c` and is defined as
When it's functioning correctly, it runs this while loop:
Then enters this if statement, as `res = check_part[i++](state);` sets `res > 0`, where `res` is the partition type as defined by
Back to `check_partition()`,
Something in the following if prints the partition list resembling:
and the referenced if block:
When the same chunk of code hits the non functioning mmc device, it exits the while loop, and then since res is zero as I believe it is unrecognized, bypasses all of the if statements and runs the cleanup code before returning:
TODO: Confirm that `check_partition` is incorrectly not identifying the chromeos partitions. Do this by comparing `check_part[]` in `block/partitions/check.c` and the `check_partition()` function in both mainline and chromeos kernel.
Compared chromeos to mainline, very few differences but it did confirm that if `res = 0` the partition is unrecognized. Also confirmed `res = 0` through additional tests.
The three commits labeled CHROMIUM here: https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/chromeos-3.14/block/partitions are all great candidates for testing.
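The control flow described above boils down to a loop over a table of probe functions. Here is a simplified model of it, not the kernel code: the function name and the toy probes in the usage note are made up, and the real function also manages the state buffer and log output, which are left out.

```python
def probe_partition_table(probes, disk):
    """Simplified model of the probe loop in check_partition().

    Each probe returns >0 (table recognized), 0 (not this format) or
    <0 (I/O error). Returns >0 on success, 0 if no probe recognized the
    table, or the last negative error code if a probe failed.
    """
    res = 0
    err = 0
    i = 0
    # Stop at the first probe that recognizes the table.
    while res == 0 and i < len(probes):
        res = probes[i](disk)
        i += 1
        if res < 0:
            # Remember the error but let the remaining probes try.
            err = res
            res = 0
    if res > 0:
        return res
    # Nothing matched: report an earlier I/O error if there was one,
    # otherwise 0, which is the "unrecognized" case seen on the c201 emmc.
    return err
```

With toy probes such as `lambda disk: 0` (not my format) and `lambda disk: 1` (recognized), a list where every probe returns 0 gives `res = 0`, which is the path the non functioning device takes. A working probe anywhere in the list makes the whole call succeed.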
Testing the two that aren't about a print function now.
Those two commits fix the issue!
Adding patches made from these commits:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/abba28d0a1b7361da6e2023352e92687166ca30d
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/a2b7b398404c665926a0e085523f40a51a419e29
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/bd0c62c7de0c8a63314b7955e5718d8f6192f9d2
To the master branch, also keeping the `CONFIG_MMC_BLOCK_MINORS=16` config option
Will close this issue when fully tested
Patches fully tested and placed in `/patches-tested`, closing issue