
Server 2003 (all current patches except SP1) reboots with stop error IRQL_NOT_LESS_OR_EQUAL

Discussion in 'Windows Server System' started by NJITKehoe, 2005/04/11.

Thread Status:
Not open for further replies.
  1. 2005/04/11
    NJITKehoe

    NJITKehoe Inactive Thread Starter

    Joined:
    2005/04/11
    Messages:
    12
    Likes Received:
    0
    This server is a Dell PE4400 (dual 1 GHz Xeon, 4 GB RAM, 2 x PERC 3/DC) with the OS on an internal hardware RAID mirror. It runs Windows Server 2003 Enterprise with all current patches except SP1, SQL 2000 with current patches, McAfee VirusScan 8.0i patch 10 and ePO agent 3.5, and is primarily a file server serving many files under 1 MB to approximately 400-500 concurrent users.

    Roughly every 8-16 days the server reboots with the same error code and the same memory dump.

    Using poolmon, I've isolated and fixed several small memory leaks related to McAfee products. Weeks of monitoring have not shown any excessive leaks or resource hogs; memory management appears to be allocating and releasing memory based on demand, load, cache, etc.

    Based on what I've been reading, I believe the crash dump below could indicate a CPU problem, because of the line "Probably caused by : hardware ( nt!KiDeferredReadyThread+4b1 )" and because the address that referenced memory, e0b59030, lies in the address range used by ntkrnlmp.exe.
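    The address check described above can be reproduced by hand: an address belongs to a module if it falls between that module's start (inclusive) and end (exclusive) in the lmtn listing below; the debugger's own ln command answers the same question. A minimal Python sketch using three ranges copied from that listing; the MODULES table and find_module helper are illustrative, not a debugger API.

```python
# A few module ranges copied from the "lmtn" listing in the dump
# (start inclusive, end exclusive). Illustrative subset only.
MODULES = [
    ("win32k", 0xDE800000, 0xDE9C7000),
    ("hal", 0xE0B21000, 0xE0B49000),
    ("nt", 0xE0B49000, 0xE0DB0000),
]

def find_module(address):
    """Return (module_name, offset_into_module) for address, or None if unmapped."""
    for name, start, end in MODULES:
        if start <= address < end:
            return name, address - start
    return None

name, offset = find_module(0xE0B59030)  # Arg4 from the bugcheck
print(f"{name}+0x{offset:x}")  # nt+0x10030
```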

    As for the stop error's 00000051 (81 decimal), I do not have an IRQ 81 on this box:

    System Information report written at: 04/11/05 17:09:44
    System Name:
    [IRQs] Resource Device Status
    IRQ 9 Microsoft ACPI-Compliant System OK
    IRQ 17 3Com Gigabit NIC (3C2000) OK
    IRQ 13 Numeric data processor OK
    IRQ 0 System timer OK
    IRQ 6 Standard floppy disk controller OK
    IRQ 1 Standard 101/102-Key or Microsoft Natural PS/2 Keyboard OK
    IRQ 12 PS/2 Compatible Mouse OK
    IRQ 4 Communications Port (COM1) OK
    IRQ 3 Communications Port (COM2) OK
    IRQ 8 System CMOS/real time clock OK
    IRQ 15 System board OK
    IRQ 23 DELL PERC 3/DC & PERC 3/DCL RAID Controller OK
    IRQ 11 DELL PERC RAID Adapter Component OK
    IRQ 22 DELL PERC 3/DC & PERC 3/DCL RAID Controller OK
    IRQ 7 DELL PERC RAID Adapter Component OK
    IRQ 25 Adaptec AIC-7880 PCI Ultra SCSI OK
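    A note for anyone decoding this stop code: according to the parameter descriptions in the !analyze output below, Arg1 (0x51, 81 decimal) is the memory address that was referenced and Arg2 (0x1b, 27 decimal) is the IRQL at the time of the fault. Neither is an IRQ line number, so the device IRQ list above is not where these values would appear. A minimal Python sketch that labels the four parameters; ARGS and describe_bugcheck_a are illustrative names, not a debugger API.

```python
# Parameters of bug check 0xA (IRQL_NOT_LESS_OR_EQUAL) as printed by !analyze -v.
ARGS = (0x00000051, 0x0000001B, 0x00000000, 0xE0B59030)

# Page faults cannot be serviced at or above DISPATCH_LEVEL (2).
DISPATCH_LEVEL = 2

def describe_bugcheck_a(args):
    """Label the four 0xA parameters the way the !analyze -v text does."""
    referenced, irql, access, ip = args
    return {
        "memory_referenced": referenced,  # an address, not an IRQ number
        "irql": irql,                     # interrupt request level, not an IRQ line
        "operation": "write" if access == 1 else "read",
        "faulting_ip": ip,                # address of the instruction that faulted
    }

info = describe_bugcheck_a(ARGS)
print(f"{info['operation']} of address {info['memory_referenced']:#x} "
      f"at IRQL {info['irql']} from IP {info['faulting_ip']:#010x}")
# read of address 0x51 at IRQL 27 from IP 0xe0b59030
print("IRQL too high for paging:", info["irql"] >= DISPATCH_LEVEL)
# IRQL too high for paging: True
```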

    Does this sound accurate to anyone? Does anyone have any advice or suggestions? Thanks...

    -Mike


    0: kd> !analyze -v;r;kv;lmtn;
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 00000051, memory referenced
    Arg2: 0000001b, IRQL
    Arg3: 00000000, value 0 = read operation, 1 = write operation
    Arg4: e0b59030, address which referenced memory

    Debugging Details:
    ------------------


    READ_ADDRESS: 00000051

    CURRENT_IRQL: 1b

    FAULTING_IP:
    nt!KiDeferredReadyThread+4b1
    e0b59030 a051000000 mov al,[00000051]

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    BUGCHECK_STR: 0xA

    LAST_CONTROL_TRANSFER: from e0b5ac86 to e0b59030

    MISALIGNED_IP:
    nt!KiDeferredReadyThread+4b1
    e0b59030 a051000000 mov al,[00000051]

    TRAP_FRAME: e0bda430 -- (.trap ffffffffe0bda430)
    ErrCode = 00000000
    eax=00000000 ebx=ffdff9bc ecx=00000000 edx=ffdff9bc esi=fb4a1800 edi=f61c69bd
    eip=e0b59030 esp=e0bda4a4 ebp=e0bda4c0 iopl=0 nv up ei pl zr na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
    nt!KiDeferredReadyThread+0x4b1:
    e0b59030 a051000000 mov al,[00000051] ds:0023:00000051=??
    Resetting default scope

    STACK_TEXT:
    e0bda4c0 e0b5ac86 00000000 e0b5ab93 e0bda55c nt!KiDeferredReadyThread+0x4b1
    e0bda4c8 e0b5ab93 e0bda55c 00000000 f83a3b88 nt!KiProcessDeferredReadyList+0x16
    e0bda4e4 e0b5aac2 ffdff980 ffdff120 00000000 nt!KiExitDispatcher+0x23
    e0bda588 e0b5ad3c 00000000 00000000 02aefdb2 nt!KiTimerExpiration+0x207
    e0bda5e0 e0b511f7 00000000 0000000e 00000000 nt!KiRetireDpcList+0x63


    FOLLOWUP_IP:
    nt!KiDeferredReadyThread+4b1
    e0b59030 a051000000 mov al,[00000051]

    SYMBOL_STACK_INDEX: 0

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: nt!KiDeferredReadyThread+4b1

    IMAGE_NAME: hardware

    DEBUG_FLR_IMAGE_TIMESTAMP: 0

    STACK_COMMAND: .trap ffffffffe0bda430 ; kb

    MODULE_NAME: hardware

    BUCKET_ID: IP_MISALIGNED

    Followup: MachineOwner
    ---------

    eax=ffdff13c ebx=0000000a ecx=e0be4d00 edx=40000000 esi=ffdff120 edi=00000051
    eip=e0baeac9 esp=e0bda3fc ebp=e0bda414 iopl=0 nv up ei ng nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
    nt!KeBugCheckEx+0x19:
    e0baeac9 5d pop ebp
    ChildEBP RetAddr Args to Child
    e0bda414 e0b4df58 0000000a 00000051 0000001b nt!KeBugCheckEx+0x19 (FPO: [Non-Fpo])
    e0bda414 e0b59030 0000000a 00000051 0000001b nt!KiTrap0E+0x224 (FPO: [0,0] TrapFrame @ e0bda430)
    e0bda4c0 e0b5ac86 00000000 e0b5ab93 e0bda55c nt!KiDeferredReadyThread+0x4b1 (FPO: [Non-Fpo])
    e0bda4c8 e0b5ab93 e0bda55c 00000000 f83a3b88 nt!KiProcessDeferredReadyList+0x16 (FPO: [0,0,0])
    e0bda4e4 e0b5aac2 ffdff980 ffdff120 00000000 nt!KiExitDispatcher+0x23 (FPO: [Non-Fpo])
    e0bda588 e0b5ad3c 00000000 00000000 02aefdb2 nt!KiTimerExpiration+0x207 (FPO: [EBP 0xe0bda5e0] [4,36,0])
    e0bda5e0 e0b511f7 00000000 0000000e 00000000 nt!KiRetireDpcList+0x63 (FPO: [Non-Fpo])
    start end module name
    de800000 de9c7000 win32k win32k.sys Tue Dec 28 18:26:09 2004 (41D1EB91)
    de9c7000 de9dd000 dxg dxg.sys Tue Mar 25 04:46:23 2003 (3E80256F)
    de9dd000 de9f6c00 atiraged atiraged.dll Tue Mar 25 04:46:28 2003 (3E802574)
    e0b21000 e0b49000 hal halmacpi.dll Tue Mar 25 02:07:28 2003 (3E800030)
    e0b49000 e0db0000 nt ntkrnlmp.exe Wed May 26 20:32:57 2004 (40B53739)
    f1902000 f1924000 RDPWD RDPWD.SYS Tue Mar 25 02:03:00 2003 (3E7FFF24)
    f19c4000 f19de7c0 naiavf5x naiavf5x.sys Tue Nov 02 12:32:39 2004 (4187C4B7)
    f1ae7000 f1af1000 TDTCP TDTCP.SYS Tue Mar 25 02:02:52 2003 (3E7FFF1C)
    f1c37000 f1c64000 Fastfat Fastfat.SYS Tue Mar 25 03:00:16 2003 (3E800C90)
    f2024000 f2083000 srv srv.sys Tue Mar 25 03:49:51 2003 (3E80182F)
    f21c3000 f21f2000 afd afd.sys Tue Mar 25 02:40:50 2003 (3E800802)
    f2242000 f225a000 Cdfs Cdfs.SYS Tue Mar 25 03:17:19 2003 (3E80108F)
    f312a000 f3133000 spartadrv spartadrv.sys Thu Feb 17 00:50:28 2005 (421430A4)
    f3c87000 f3c91000 Dxapi Dxapi.sys Tue Mar 25 02:06:01 2003 (3E7FFFD9)
    f4eb9000 f4ec0000 parvdm parvdm.sys Tue Mar 25 02:03:49 2003 (3E7FFF55)
    f4ec9000 f4ed1000 EntDrv52 EntDrv52.sys Tue Oct 05 19:23:57 2004 (41632D0D)
    f51dd000 f51f0000 Fips Fips.SYS Tue Mar 25 03:54:59 2003 (3E801963)
    f51f0000 f525c000 mrxsmb mrxsmb.sys Tue Jan 18 20:35:18 2005 (41EDB956)
    f535b000 f5386000 rdbss rdbss.sys Mon Oct 11 20:38:09 2004 (416B2771)
    f53cb000 f5400000 netbt netbt.sys Fri Jul 18 13:16:03 2003 (3F182B53)
    f5422000 f5484000 tcpip tcpip.sys Tue Mar 25 03:04:01 2003 (3E800D71)
    f5484000 f54a0000 ipsec ipsec.sys Tue Mar 25 02:55:45 2003 (3E800B81)
    f5531000 f553d000 vga vga.sys Tue Mar 25 02:08:03 2003 (3E800053)
    f567e000 f5685000 Beep Beep.SYS Tue Mar 25 02:03:04 2003 (3E7FFF28)
    f5686000 f568d000 Null Null.SYS Tue Mar 25 02:03:05 2003 (3E7FFF29)
    f568e000 f5696000 Fs_Rec Fs_Rec.SYS Tue Mar 25 02:08:36 2003 (3E800074)
    f5787000 f57ba000 update update.sys Tue Mar 25 03:59:59 2003 (3E801A8F)
    f57ba000 f57e4000 ks ks.sys Tue Mar 25 03:47:36 2003 (3E8017A8)
    f57e4000 f5818000 rdpdr rdpdr.sys Tue Mar 25 02:09:30 2003 (3E8000AA)
    f5818000 f582d000 raspptp raspptp.sys Tue Mar 25 03:19:09 2003 (3E8010FD)
    f582d000 f5848000 ndiswan ndiswan.sys Tue Mar 25 03:48:19 2003 (3E8017D3)
    f5848000 f585f000 rasl2tp rasl2tp.sys Tue Mar 25 02:54:46 2003 (3E800B46)
    f585f000 f5873000 cdrom cdrom.sys Tue Mar 25 02:05:18 2003 (3E7FFFAE)
    f5873000 f588a000 parport parport.sys Tue Mar 25 02:03:56 2003 (3E7FFF5C)
    f588a000 f58a2000 serial serial.sys Tue Mar 25 02:40:08 2003 (3E8007D8)
    f58a2000 f58b8000 i8042prt i8042prt.sys Tue Mar 25 04:01:43 2003 (3E801AF7)
    f58b8000 f58dbf80 EL2K_XP EL2K_XP.sys Tue Jun 03 18:48:12 2003 (3EDD25AC)
    f58dc000 f58f5000 VIDEOPRT VIDEOPRT.SYS Tue Mar 25 02:08:02 2003 (3E800052)
    f58f5000 f5906400 atiragem atiragem.sys Thu Oct 03 21:22:15 2002 (3D9CED47)
    f59f5000 f5a04000 termdd termdd.sys Tue Mar 25 02:02:52 2003 (3E7FFF1C)
    f5a05000 f5a0e000 raspti raspti.sys Tue Mar 25 02:11:36 2003 (3E800128)
    f5a15000 f5a20000 ptilink ptilink.sys Tue Mar 25 02:03:51 2003 (3E7FFF57)
    f5a25000 f5a30000 TDI TDI.SYS Tue Mar 25 02:14:28 2003 (3E8001D4)
    f5a35000 f5a43000 raspppoe raspppoe.sys Tue Mar 25 02:11:37 2003 (3E800129)
    f5a45000 f5a4e000 ndistapi ndistapi.sys Tue Mar 25 02:11:28 2003 (3E800120)
    f5a55000 f5a5f000 serenum serenum.sys Tue Mar 25 02:04:01 2003 (3E7FFF61)
    f5a65000 f5a6f000 mouclass mouclass.sys Tue Mar 25 02:03:09 2003 (3E7FFF2D)
    f5a75000 f5a7f000 kbdclass kbdclass.sys Tue Mar 25 02:03:10 2003 (3E7FFF2E)
    f5b0d000 f5b10e40 aspi32 aspi32.sys Fri Sep 10 19:46:10 1999 (37D99842)
    f5b3e000 f5b49000 fdc fdc.sys Tue Mar 25 02:04:31 2003 (3E7FFF7F)
    f5b4e000 f5b57000 watchdog watchdog.sys Tue Mar 25 02:09:01 2003 (3E80008D)
    f5b5e000 f5b6c000 p3 p3.sys Tue Mar 25 02:07:36 2003 (3E800038)
    f5b6e000 f5b7b000 netbios netbios.sys Tue Mar 25 02:09:53 2003 (3E8000C1)
    f5b7e000 f5b87000 dump_diskdump dump_diskdump.sys Tue Mar 25 02:05:15 2003 (3E7FFFAB)
    f5b8e000 f5b98000 flpydisk flpydisk.sys Tue Mar 25 02:04:32 2003 (3E7FFF80)
    f5bbe000 f5bcd000 msgpc msgpc.sys Tue Mar 25 02:10:12 2003 (3E8000D4)
    f5bce000 f5bdc000 NDProxy NDProxy.SYS Tue Mar 25 02:11:30 2003 (3E800122)
    f5cae000 f5cd0000 Mup Mup.sys Tue Mar 25 03:55:58 2003 (3E80199E)
    f5cd0000 f5d11000 NDIS NDIS.sys Tue Mar 25 03:45:35 2003 (3E80172F)
    f5d11000 f5dae000 Ntfs Ntfs.sys Tue Mar 25 02:40:05 2003 (3E8007D5)
    f5dae000 f5dcf000 KSecDD KSecDD.sys Tue Mar 25 02:05:39 2003 (3E7FFFC3)
    f5dcf000 f5de8000 RSFilter RSFilter.sys Tue Mar 25 02:10:49 2003 (3E8000F9)
    f5de8000 f5dfe000 CLASSPNP CLASSPNP.SYS Tue Mar 25 02:38:14 2003 (3E800766)
    f5dfe000 f5e0f7c0 afamgt afamgt.sys Mon Nov 24 18:39:49 2003 (3FC296C5)
    f5e10000 f5e28e80 adpu160m adpu160m.sys Mon Sep 17 16:55:53 2001 (3BA66359)
    f5e29000 f5e4f000 SCSIPORT SCSIPORT.SYS Tue Mar 25 03:01:25 2003 (3E800CD5)
    f5e4f000 f5e70000 volsnap volsnap.sys Tue Mar 25 02:05:47 2003 (3E7FFFCB)
    f5e70000 f5e9a000 dmio dmio.sys Tue Mar 25 02:08:14 2003 (3E80005E)
    f5e9a000 f5ebf000 ftdisk ftdisk.sys Tue Mar 25 02:05:26 2003 (3E7FFFB6)
    f5ebf000 f5ed4000 pci pci.sys Tue Mar 25 02:16:40 2003 (3E800258)
    f5ed4000 f5f05000 ACPI ACPI.sys Tue Mar 25 02:16:21 2003 (3E800245)
    f5f26000 f5f2f000 WMILIB WMILIB.SYS Tue Mar 25 02:13:00 2003 (3E80017C)
    f5f36000 f5f45000 isapnp isapnp.sys Tue Mar 25 02:16:35 2003 (3E800253)
    f5f46000 f5f55000 MountMgr MountMgr.sys Tue Mar 25 02:03:05 2003 (3E7FFF29)
    f5f56000 f5f64000 PartMgr PartMgr.sys Tue Mar 25 03:04:02 2003 (3E800D72)
    f5f66000 f5f73f00 aic78xx aic78xx.sys Mon May 20 12:16:12 2002 (3CE9214C)
    f5f76000 f5f82000 mraid35x mraid35x.sys Fri Dec 12 01:13:16 2003 (3FD95C7C)
    f5f86000 f5f95000 disk disk.sys Tue Mar 25 02:05:20 2003 (3E7FFFB0)
    f5f96000 f5fa2000 Dfs Dfs.sys Tue Mar 25 02:09:52 2003 (3E8000C0)
    f5fa6000 f5faf000 crcdisk crcdisk.sys Tue Mar 25 02:07:23 2003 (3E80002B)
    f5fb6000 f5fc3000 wanarp wanarp.sys Tue Mar 25 02:11:22 2003 (3E80011A)
    f6026000 f6032000 dump_mraid35x dump_mraid35x.sys Fri Dec 12 01:13:16 2003 (3FD95C7C)
    f6036000 f6044460 mvstdi5x mvstdi5x.sys Thu Oct 07 12:35:49 2004 (41657065)
    f60a6000 f60b0000 Msfs Msfs.SYS Tue Mar 25 02:08:56 2003 (3E800088)
    f6176000 f6182000 Npfs Npfs.SYS Tue Mar 25 02:08:59 2003 (3E80008B)
    f61a6000 f61ae000 kdcom kdcom.dll Tue Mar 25 02:08:00 2003 (3E800050)
    f61ae000 f61b6000 BOOTVID BOOTVID.dll Tue Mar 25 02:07:58 2003 (3E80004E)
    f61b6000 f61bd000 dmload dmload.sys Tue Mar 25 02:08:08 2003 (3E800058)
    f61be000 f61c3c00 mraid2k mraid2k.sys Wed Jan 22 21:56:25 2003 (3E2F59D9)
    f6246000 f624e000 mnmdd mnmdd.SYS Tue Mar 25 02:07:53 2003 (3E800049)
    f625e000 f6266000 RDPCDD RDPCDD.sys Tue Mar 25 02:03:05 2003 (3E7FFF29)
    f6266000 f626e000 rasacd rasacd.sys Tue Mar 25 02:11:50 2003 (3E800136)
    f62b6000 f62bd000 dxgthk dxgthk.sys Tue Mar 25 02:05:52 2003 (3E7FFFD0)
    f62d6000 f62de000 audstub audstub.sys Tue Mar 25 02:09:12 2003 (3E800098)
    f648a000 f648b200 swenum swenum.sys Tue Mar 25 02:03:22 2003 (3E7FFF3A)

    Unloaded modules:
    f2024000 f2083000 srv.sys
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    f621e000 f6226000 EntDrv52.sys
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    f6106000 f6114000 imapi.sys
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    f5372000 f5386000 redbook.sys
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    f5696000 f569b000 Cdaudio.SYS
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    f6402000 f6406000 dell120dlt.s
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    f569e000 f56a6000 Sfloppy.SYS
    Timestamp: unavailable (00000000)
    Checksum: 00000000
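    The five bytes at the faulting IP are a perfectly valid instruction: opcode 0xA0 is MOV AL, moffs8, and in 32-bit code it is followed by a little-endian 4-byte absolute address, here 0x00000051. What the debugger flags with MISALIGNED_IP / IP_MISALIGNED is that this IP does not fall on an instruction boundary of KiDeferredReadyThread, so the CPU began decoding in the middle of other instructions. A minimal Python sketch that decodes only this one opcode; decode_mov_al_moffs8 is a hypothetical helper, not a general disassembler.

```python
import struct

# Bytes shown at the faulting IP in the dump: "a051000000 mov al,[00000051]"
FAULTING_BYTES = bytes.fromhex("a051000000")

def decode_mov_al_moffs8(code):
    """Decode x86 opcode 0xA0 (MOV AL, moffs8) with 32-bit addressing.

    Only this single opcode is handled; anything else raises ValueError.
    Returns (disassembly_text, instruction_length).
    """
    if len(code) < 5 or code[0] != 0xA0:
        raise ValueError("not a 32-bit MOV AL, moffs8 instruction")
    (offset,) = struct.unpack_from("<I", code, 1)  # little-endian 32-bit address
    return f"mov al,[{offset:08x}]", 5

text, length = decode_mov_al_moffs8(FAULTING_BYTES)
print(text, length)  # mov al,[00000051] 5
```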
     
  2. 2005/04/11
    BenMcDonald[MS]

    BenMcDonald[MS] Inactive

    Joined:
    2004/12/14
    Messages:
    228
    Likes Received:
    0
    Three possibilities:

    Don't ask how I figured this out. It involves unassembly and two's complement arithmetic. :D

    a) you have a bitflip in memory at e0b59078 (you can check with !chkimg nt)
    b) your NTKRNLMP.EXE file is corrupt (with a bitflip at +10078)
    c) you are hitting an erratum I don't recognize, causing a bit error on the JZ.

    solutions
    a) bad ram, motherboard or CPU
    b) bad disk/bad download
    c) update bios, call vendor for new CPUs

    Let us know what it turns out to be. My money is on !chkimg showing a bit error, meaning you have a bad stick of RAM.
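    The "two's complement arithmetic" above refers to how a short conditional jump such as JZ (opcode 0x74) encodes its target: a signed 8-bit displacement added to the address of the next instruction. Flipping a single bit in that byte moves the target by a power of two, which can land the CPU in the middle of an instruction. A minimal Python sketch; the layout is HYPOTHETICAL (a 2-byte short JZ at e0b59077, so its displacement byte is the e0b59078 byte named in the reply) and the real code should be checked with the debugger's u command, since the jump could also be a near JZ with a 32-bit displacement.

```python
NT_BASE = 0xE0B49000  # ntkrnlmp.exe load address from the module list

def short_jump_target(opcode_addr, rel8):
    """Target of a 2-byte short Jcc (e.g. JZ = 0x74 rel8) located at opcode_addr."""
    next_ip = opcode_addr + 2                      # relative to the next instruction
    disp = rel8 - 0x100 if rel8 & 0x80 else rel8   # sign-extend the 8-bit displacement
    return (next_ip + disp) & 0xFFFFFFFF

def rel8_for_target(opcode_addr, target):
    """Displacement byte that makes a short jump at opcode_addr land on target."""
    disp = target - (opcode_addr + 2)
    if not -128 <= disp <= 127:
        raise ValueError("target out of short-jump range")
    return disp & 0xFF  # two's complement encoding

# HYPOTHETICAL: JZ opcode at e0b59077, displacement byte at e0b59078.
JZ_ADDR = 0xE0B59077
OBSERVED_TARGET = 0xE0B59030  # the faulting IP

print(f"displacement byte offset in nt: +{JZ_ADDR + 1 - NT_BASE:x}")  # +10078
bad = rel8_for_target(JZ_ADDR, OBSERVED_TARGET)
print(f"rel8 that lands on the faulting IP: {bad:#04x}")  # 0xb7
for bit in range(8):
    candidate = bad ^ (1 << bit)  # undo a possible single-bit flip
    print(f"bit {bit}: rel8={candidate:#04x} -> {short_jump_target(JZ_ADDR, candidate):#010x}")
```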
     


  4. 2005/04/11
    NJITKehoe

    NJITKehoe Inactive Thread Starter

    Joined:
    2005/04/11
    Messages:
    12
    Likes Received:
    0
    Thanks for your help. Do you have any feeling about which would be a good place to start: A, B or C?

    Since I have another box completely identical to this one, and that box is working, would it make sense to copy NTKRNLMP.EXE from the working box to the problem box?

    Do you think applying sp1 would work?

    Would a repair install from the installation CD, followed by re-applying the patches, be another way out?

    The BIOS and all drivers are at the same current levels on the identical working box and the problem box. So what I'm asking is: where would you gamble to start, hardware (CPU) or software?
     
  5. 2005/04/11
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    Check for BIOS updates, since it's free and easy to do.

    I don't advocate doing a file copy.

    This is almost certainly bad RAM.

    If you use the debugwiz advanced options to include !chkimg nt in the command options (or just open the dump in the debugger and run it), you'll get a better track on it.
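    Conceptually, !chkimg compares a module's code bytes in the dump against the matching image file found on the symbol/image path and reports the bytes that differ; a mismatch of exactly one bit is the classic signature of a memory or bus bitflip. A minimal Python sketch of that comparison idea, not !chkimg's actual implementation; single_bit_differences and the sample bytes are hypothetical toy data, not taken from the real kernel image.

```python
def single_bit_differences(in_memory, on_disk):
    """Compare two equal-length byte strings and classify each mismatch.

    Returns a list of (offset, memory_byte, disk_byte, is_single_bit_flip).
    """
    if len(in_memory) != len(on_disk):
        raise ValueError("buffers must be the same length")
    diffs = []
    for offset, (m, d) in enumerate(zip(in_memory, on_disk)):
        if m != d:
            flipped = bin(m ^ d).count("1") == 1  # exactly one bit differs
            diffs.append((offset, m, d, flipped))
    return diffs

# Toy data only.
disk = bytes([0x74, 0xF7, 0x8B, 0x45])
memory = bytes([0x74, 0xB7, 0x8B, 0x46])
for offset, m, d, flipped in single_bit_differences(memory, disk):
    print(f"+{offset:#x}: memory {m:#04x} vs disk {d:#04x} single-bit={flipped}")
```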
     
  6. 2005/04/11
    NJITKehoe

    NJITKehoe Inactive Thread Starter

    Joined:
    2005/04/11
    Messages:
    12
    Likes Received:
    0
    OK, I ran !for_each_module !chkimg @#ModuleName (as per the help files). I'm taking this result to mean that ntkrnlmp.exe is not corrupted, but is there any information on the response I got for halmacpi.dll? The symbol path was SRV*c:\symbols*http://msdl.microsoft.com/download/symbols.

    Any other thoughts and suggestions are appreciated as well...

    ---------------
    @#ModuleIndex: 03
    @#ImageName: halmacpi.dll#
    @#ModuleName: # hal
    @#LoadedImageName: #
    @#SymbolFileName: halmacpi.dll#
    @#MappedImageName: #
    @#Base: e0b21000 @#Size: 00028000 @#End: e0b49000
    @#TimeDateStamp: 3e800030 @#Checksum: 0001d21b
    @#Flags: 00000000 @#SymbolType: 5
    @#ImageNameSize: 0000000d @#ModuleNameSize: 00000004 @#LoadedImageNameSize: 00000001
    @#SymbolFileNameSize: 0000000d @#MappedImageNameSize: 00000001
    @#FileDescription: #
    Error for hal: Could not find image file for the module. Make sure binaries are included in the symbol path.
    ---------------
    @#ModuleIndex: 04
    @#ImageName: ntkrnlmp.exe#
    @#ModuleName: # nt
    @#LoadedImageName: # ntkrnlmp.exe
    @#SymbolFileName: c:\symbols\ntkrnlmp.pdb\466B4165EAA84AF88D29D617E86A95982\ntkrnlmp.pdb#
    @#MappedImageName: #
    @#Base: e0b49000 @#Size: 00267000 @#End: e0db0000
    @#TimeDateStamp: 40b53739 @#Checksum: 00253c04
    @#Flags: 00000000 @#SymbolType: 3
    @#ImageNameSize: 0000000d @#ModuleNameSize: 00000003 @#LoadedImageNameSize: 0000000d
    @#SymbolFileNameSize: 00000047 @#MappedImageNameSize: 00000001
    @#FileDescription: NT Kernel & System#
    0 errors : nt
    ---------------

    Another oddity worth looking into is why the "MEGARAID SCSI Controller Driver for Windows 2000 PAE" (mraid2k.sys) is loaded on a 2003 system that also has the 2003 version of the driver loaded.

    ---------------
    @#ModuleIndex: 58
    @#ImageName: mraid2k.sys#
    @#ModuleName: # mraid2k
    @#LoadedImageName: #
    @#SymbolFileName: mraid2k.sys#
    @#MappedImageName: #
    @#Base: f61be000 @#Size: 00005c00 @#End: f61c3c00
    @#TimeDateStamp: 3e2f59d9 @#Checksum: 0000ec04
    @#Flags: 00000000 @#SymbolType: 5
    @#ImageNameSize: 0000000c @#ModuleNameSize: 00000008 @#LoadedImageNameSize: 00000001
    @#SymbolFileNameSize: 0000000c @#MappedImageNameSize: 00000001
    @#FileDescription: MEGARAID SCSI Controller Driver for Windows 2000 PAE#
    Error for mraid2k: Could not find image file for the module. Make sure binaries are included in the symbol path.
    ---------------
     
  7. 2005/04/11
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    ImageName: ntkrnlmp.exe
    0 errors : nt



    Hmm, the plot thickens! Well, I would have lost that bet.

    You are looking at option C. Something is not happy with your CPU or the supporting I/O subsystem. If the dump is solid, then it's not a RAM bitflip, which means the bit flipped in transit, in the CPU cache, or as part of a problem with the CPU processing the instruction.

    (and good for you for figuring out how to get the debugger working)

    HAL always does that, because it gets renamed to HAL.DLL even though it's really HALACPI.DLL or whatever. No sweat, and no worries about making it work; it's not involved here.
    As for the MegaRAID controller, I don't have a lot of exposure to that specific driver, but that's just the internal name of the driver. It's probably part of a matched set of supporting files to get that card working correctly. Either somebody forgot to update the name, or it's a shared library. I wouldn't give it any attention.
     
  8. 2005/04/11
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    Are you sure all the dumps look IDENTICAL to this? If they do, then you are probably dealing with an erratum. A !cpuid would be good for posterity, for the next poor soul who runs into this.
     
  9. 2005/04/12
    NJITKehoe

    NJITKehoe Inactive Thread Starter

    Joined:
    2005/04/11
    Messages:
    12
    Likes Received:
    0
    > You sure all the dumps look IDENTICAL to this?

    To me, yes, but maybe not to others, so here is some more info. I can track the Save Dump event log entries back to 9/5/2003 (ugh, almost 2 years tracking this), and there are a total of 29 dumps.

    The first 7 (9/5/2003 - 12/16/2003): The bugcheck was: 0x0000000a (0x00000051, 0x0000001b, 0x00000000, 0x804ee030)

    The next 10 (12/16/2003 - 10/17/2004): The bugcheck was: 0x0000000a (0x00000051, 0x0000001b, 0x00000000, 0xe0b5a030)

    The last 12 (10/26/2004 - 4/11/2005): The bugcheck was: 0x0000000a (0x00000051, 0x0000001b, 0x00000000, 0xe0b59030)
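    One pattern is visible in these three faulting addresses: they differ in their upper bits, and the last two differ by exactly one 4 KB page (0x1000), but all three end in 0x030, the same offset within a page. That is consistent with the fault recurring at the same spot across kernels loaded or laid out at different addresses, though the shared offset alone is suggestive, not proof. A minimal Python sketch that splits each address into page base and offset; faulting_ips simply copies the fourth parameter from the three groups above.

```python
PAGE_SIZE = 0x1000  # 4 KB x86 pages

# Fourth bugcheck parameter (faulting IP) from the three groups of dumps.
faulting_ips = [0x804EE030, 0xE0B5A030, 0xE0B59030]

for ip in faulting_ips:
    page_base = ip & ~(PAGE_SIZE - 1)  # clear the low 12 bits
    offset = ip & (PAGE_SIZE - 1)      # keep only the low 12 bits
    print(f"{ip:#010x}: page {page_base:#010x} + offset {offset:#05x}")
```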

    So this problem has survived multiple OS and software patches and application upgrades, as well as multiple firmware, BIOS, and driver upgrades, with the only difference being the faulting address in the fourth parameter. Neither IRQ 81 nor IRQ 27 exists in the problem box. I also have an identically configured box running right next to the problem box for the same length of time, at the same OS/application patch levels and the same firmware/BIOS level, serving similar applications under a similar load without a problem. Not being able to put my finger on this (or at least reliably reproduce or predict it) is maddening...

    CPU-Z seems to give more info about the CPU than !cpuid or !cpuinfo, so here it is in case something sticks out to someone.

    Both CPU's in the system are:
    Name Intel Xeon III EB
    Code name Cascades
    Specification Intel Xeon III EB 1000 MHz
    Family/Model/Stepping 686
    Extended Family/Model 0/0
    Brand ID 3
    Package Slot 2 SECC
    Core Stepping cC0
    Technology 0.18µ
    Instructions Sets MMX, SSE
    Clock Speed 993.4 MHz
    Clock multiplier x7.5
    Front Side Bus Frequency 132.4 MHz
    Bus Speed 132.4 MHz
    Stock frequency 1000 MHz
    L1 Data Cache 16 KBytes, 4-way set associative, 32 Bytes line size
    L1 Instruction Cache 16 KBytes, 4-way set associative, 32 Bytes line size
    L2 Cache 256 KBytes, 8-way set associative, 32 Bytes line size
    L2 Latency 0
    L2 Speed 993.4 MHz (Full)
    L2 Location On Chip
    L2 ECC Check enabled
    L2 Data Prefetch Logic no
    L2 Bus Width 256 bits
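    CPU-Z's "686" packs three fields of the CPUID signature: family 6, model 8, stepping 6, which CPU-Z reports as the cC0 core stepping and which is the 686h signature used to pick the errata document. A minimal Python sketch of the standard CPUID leaf 1 EAX layout (stepping in bits 3:0, model in 7:4, family in 11:8, extended model in 19:16, extended family in 27:20); decode_cpuid_signature is an illustrative helper.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    if family == 0xF:
        family += ext_family           # extended family only applies to family 15
    if family in (0x6, 0xF) or family > 0xF:
        model += ext_model << 4        # extended model applies to families 6 and 15
    return family, model, stepping

family, model, stepping = decode_cpuid_signature(0x686)
print(f"family {family}, model {model}, stepping {stepping}")  # family 6, model 8, stepping 6
```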



    I believe this translates into the following PDF with errata for CPU 686h C0.
    http://download.intel.com/design/PentiumIII/xeon/specupdt/24446039.pdf

    There seem to be a lot of errata that could apply. Does any one in particular jump out as a possible cause of the symptoms I'm seeing? At this point, would you say my problem comes down to server load or to a CPU erratum?

    As a final test, I'm going to swap the CPU's and memory from the problem box to the identical working box and see if the symptoms follow.

    Again, thanks for your time, help and suggestions.

    -Mike
     
  10. 2005/04/12
    BenMcDonald[MS]

    BenMcDonald[MS] Inactive

    Joined:
    2004/12/14
    Messages:
    228
    Likes Received:
    0
    It is not obvious to me that it's any of those. As Joe said, this could be an on-die cache consistency thing, not an erratum. Based on years of consistency, though, I believe it to be a problem with the chip, not the cache or motherboard.

    I do not see how this is load related, other than load==heat==badthings.

    I like your CPU swap test. If the problem follows the chips, it's a cache or CPU thing; if it stays, it's probably a motherboard thing. Good test.
     
  11. 2005/04/14
    BenMcDonald[MS]

    BenMcDonald[MS] Inactive

    Joined:
    2004/12/14
    Messages:
    228
    Likes Received:
    0
    I did a little research on this one. There was an old case with this exact phenomenon, using those same CPUs. It turned out to be a motherboard problem: "cache line pattern sensitivity".

    Based on that, I'm expecting the problem to stay with the same machine after your CPU swap test.

    If you haven't engaged Dell, you probably should. They should have records of this crash signature.
     
  12. 2005/04/18
    NJITKehoe

    NJITKehoe Inactive Thread Starter

    Joined:
    2005/04/11
    Messages:
    12
    Likes Received:
    0
    Been away on personal time. Thanks for the research. Can you give me any info on the case you found? Talking to first-level script monkeys at Dell support at this point is going to frustrate me to no end. I don't really see the need to waste 4 hours reseating everything according to their scripts.
     
