Re: Huge page support for PowerPC 32 bit and WIMG flexibility
From: Kumar Gala <hidden>
Date: 2007-01-31 22:30:32
On Jan 31, 2007, at 4:01 PM, Ilya Lipovsky wrote:
Hi,

I am not experienced in kernel development, so please be patient.

After exploring the latest (2.6.19.2) sources, it appears that there is no huge page support for the 32-bit powerpc platform. I deduced this by starting from 0x300 in head_32.S and comparing notes with head_64.S. It appears that the only sensible path for hashing in a huge page (on 64-bit ppc) is via:

0x300: data_access -> do_hash_page -> hash_page -> hash_huge_page

Unfortunately, on 32-bit, all paths that do anything useful end up in create_hpte(), found in hash_low_32.S. I noticed someone on this mailing list claiming huge page support for the IBM 44x core... Is it possible to make it general enough to encompass ppc32 as a whole?

Another issue I have is the absence of control over hardware-specific attributes of memory such as WIMG. More concretely, I am interested in having the ability to allocate off the heap in such a way as to explicitly set the M (coherency) bit off (independently of SMP or non-SMP mode). This is needed because some multicore PowerPC platforms (e.g. 745x) perform an extra address broadcast to guarantee cache coherency for each store miss on a cacheline. This degrades performance for store-bound programs.

I understand that hashing pages as non-cache-coherent makes the data contained therein a potential victim of cache coherency paradoxes. Nevertheless, since I am working on a high-performance library, I am prepared to shift coherency guarantees to the library, which is supposed to be the one managing the data flow between memory and CPU caches intelligently.

So, I have 2 main questions:

1) What's so special about ppc32 that it didn't get the matching feature of huge page support that ppc64 has? Who is responsible/willing to fix it?

The ppc32 HW doesn't support the same MMU features that ppc64 does. There's a possibility for something like hugetlbfs support using BATs, but the normal MMU path doesn't have any HW capable of doing large pages.

2) Is it appropriate to provide a syscall mechanism (parallel to sys_brk, sys_mmap, and sys_shmget) to add WIMG settings?

You can do some of this via mmap today. I think O_SYNC (passed to open()) is the flag you need (well, at least for mmap'ing /dev/mem).

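For reference, the mmap route mentioned here might look roughly like the sketch below. It is only an illustration: map_sync() is a hypothetical helper name, and whether opening with O_SYNC yields a cache-inhibited (or otherwise non-default WIMG) mapping depends on the kernel version, the platform, and the device being mapped. It does not by itself give control over the M bit. The offset is assumed to be page-aligned.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of `path` starting at `offset`,
 * opening the file with O_SYNC first. Returns the mapping, or MAP_FAILED
 * if either open() or mmap() fails.
 */
static void *map_sync(const char *path, off_t offset, size_t len)
{
    /* mmap() requires a page-aligned file offset. */
    assert(offset % sysconf(_SC_PAGESIZE) == 0);

    /* O_SYNC is given at open() time; it is not an mmap() flag. */
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);

    /* The mapping, if any, remains valid after the descriptor is closed. */
    close(fd);
    return p;
}
```

The case discussed in this thread would be something like map_sync("/dev/mem", phys_addr, len), which typically needs root. Ordinary files map the same way, just with normal caching.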
Overall, the vision here is to be able (from user-side, on powerpc32) to call:

shmid = shmget(2, LENGTH, SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W | POWERPC_NONCOHERENT);
shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);

And get a segment mapped with wimg=0bxx0x (actually, I assume all x's are 0). This would be very nice!

Thank you,
-Ilya

P.S. As a side note, it is pretty difficult to read kernel sources (especially assembly ones) because of the lack of comments for people who are not in the kernel hacker "circle." For example, what in the whole world is "paca??"

"paca" has to do with the IBM HV interface.

- k