
DUMP DATA: BugCheck A on Windows Server 2003

Discussion in 'Windows Server System' started by valemon, 2004/12/07.

Thread Status:
Not open for further replies.
  1. 2004/12/07
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    Hi everyone,
    I am having trouble with a web/database server: it inexplicably reboots about 15 times over a period of 24 hours, and has been behaving like this for quite a few weeks now. I've gone through all the event logs and other logs with a microscope, but haven't been able to find a clue.

    The dump, created using the Debugging Tools from MS and debugwiz, is below. Is this as simple as faulty memory hardware? Unfortunately, I do not know the hardware history of this server, or whether there have been any memory chip configuration changes, etc.

    I'd be very happy for any input on this!

    ---------------------------------------------
    Opened log file 'c:\debuglog.txt'
    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

    Microsoft (R) Windows Debugger Version 6.3.0017.0
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\Documents and Settings\vikar-111\Skrivebord\MEMORY.DMP]
    Kernel Complete Dump File: Full address space is available

    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: C:\WINNT;C:\WINNT\system32;C:\WINNT\system32\drivers
    Windows Server 2003 Kernel Version 3790 MP (4 procs) Free x86 compatible
    Product: Server, suite: TerminalServer SingleUserTS
    Built by: 3790.srv03_gdr.040410-1234
    Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
    Debug session time: Tue Dec 07 10:45:27 2004
    System Uptime: 0 days 1:51:12.298
    Loading Kernel Symbols
    .............................................................................................................
    Loading unloaded module list
    ..
    Loading User Symbols
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck A, {80ffffe8, 2, 1, 804f1918}

    Probably caused by : memory_corruption ( nt!MiRemovePageByColor+af )

    Followup: MachineOwner
    ---------

    0: kd> !analyze -v;r;kv;lmtn;.logclose;q
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 80ffffe8, memory referenced
    Arg2: 00000002, IRQL
    Arg3: 00000001, value 0 = read operation, 1 = write operation
    Arg4: 804f1918, address which referenced memory

    Debugging Details:
    ------------------


    WRITE_ADDRESS: 80ffffe8

    CURRENT_IRQL: 2

    FAULTING_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    BUGCHECK_STR: 0xA

    LAST_CONTROL_TRANSFER: from 805030bd to 804f1918

    TRAP_FRAME: f789a788 -- (.trap fffffffff789a788)
    .trap fffffffff789a788
    ErrCode = 00000002
    eax=fffffffd ebx=00000008 ecx=0003faba edx=81000000 esi=8150a3c0 edi=00000000
    eip=804f1918 esp=f789a7fc ebp=00035c28 iopl=0 nv up ei pl nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
    nt!MiRemovePageByColor+0xaf:
    804f1918 890cc2 mov [edx+eax*8],ecx ds:0023:80ffffe8=????????
    .trap
    Resetting default scope

    STACK_TEXT:
    f789a814 805030bd 00000000 00008000 f789ad38 nt!MiRemovePageByColor+0xaf


    FOLLOWUP_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    SYMBOL_STACK_INDEX: 0

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: nt!MiRemovePageByColor+af

    MODULE_NAME: nt

    DEBUG_FLR_IMAGE_TIMESTAMP: 40b53739

    STACK_COMMAND: .trap fffffffff789a788 ; kb

    IMAGE_NAME: memory_corruption

    BUCKET_ID: 0xA_W_nt!MiRemovePageByColor+af

    Followup: MachineOwner
    ---------

    eax=ffdff13c ebx=0000000a ecx=865aa3c8 edx=40000000 esi=ffdff120 edi=80ffffe8
    eip=80543ac9 esp=f789a754 ebp=f789a76c iopl=0 nv up ei ng nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
    nt!KeBugCheckEx+0x19:
    80543ac9 5d pop ebp
    ChildEBP RetAddr Args to Child
    f789a76c 804e2f58 0000000a 80ffffe8 00000002 nt!KeBugCheckEx+0x19 (FPO: [Non-Fpo])
    f789a76c 804f1918 0000000a 80ffffe8 00000002 nt!KiTrap0E+0x224 (FPO: [0,0] TrapFrame @ f789a788)
    f789a814 805030bd 00000000 00008000 f789ad38 nt!MiRemovePageByColor+0xaf (FPO: [EBP 0x81115698] [0,3,0])
    start end module name
    804de000 80745000 nt ntkrnlmp.exe Thu May 27 02:32:57 2004 (40B53739)
    80745000 8076d000 hal halmacpi.dll Tue Mar 25 08:07:28 2003 (3E800030)
    baad0000 baae8000 Cdfs Cdfs.SYS Tue Mar 25 09:17:19 2003 (3E80108F)
    bab30000 bab39000 asyncmac asyncmac.sys Tue Mar 25 08:11:27 2003 (3E80011F)
    badb8000 bae17000 srv srv.sys Tue Mar 25 09:49:51 2003 (3E80182F)
    bae67000 baebf000 HTTP HTTP.sys Tue Mar 25 09:55:21 2003 (3E801979)
    baee7000 baf09000 RDPWD RDPWD.SYS Tue Mar 25 08:03:00 2003 (3E7FFF24)
    bafd1000 bb000000 afd afd.sys Tue Mar 25 08:40:50 2003 (3E800802)
    bf800000 bf9c7000 win32k win32k.sys Tue Aug 10 00:48:07 2004 (4117FF27)
    bf9c7000 bfa1d680 ati2drad ati2drad.dll Tue Mar 25 10:43:37 2003 (3E8024C9)
    bff60000 bff7b000 RDPDD RDPDD.dll Tue Mar 25 21:12:05 2003 (3E80B815)
    bff80000 bff96000 dxg dxg.sys Tue Mar 25 10:46:23 2003 (3E80256F)
    f2c45000 f2c58000 Fips Fips.SYS Tue Mar 25 09:54:59 2003 (3E801963)
    f2c58000 f2c84000 rdbss rdbss.sys Tue Mar 25 08:09:06 2003 (3E800092)
    f2ceb000 f2d18000 Fastfat Fastfat.SYS Tue Mar 25 09:00:16 2003 (3E800C90)
    f2d18000 f2d85000 mrxsmb mrxsmb.sys Tue Mar 25 08:09:09 2003 (3E800095)
    f2e79000 f2e83000 TDTCP TDTCP.SYS Tue Mar 25 08:02:52 2003 (3E7FFF1C)
    f36e6000 f36eb620 PcdrNt PcdrNt.sys Thu Mar 23 06:42:23 2000 (38D9AEBF)
    f3add000 f3b12000 netbt netbt.sys Fri Jul 18 19:16:03 2003 (3F182B53)
    f3dca000 f3dd4000 Dxapi Dxapi.sys Tue Mar 25 08:06:01 2003 (3E7FFFD9)
    f3dfa000 f3e03000 dump_diskdump dump_diskdump.sys Tue Mar 25 08:05:15 2003 (3E7FFFAB)
    f3e4f000 f3e54f00 dump_mraid35x dump_mraid35x.sys Fri May 30 19:24:54 2003 (3ED793E6)
    f3f44000 f3fa6000 tcpip tcpip.sys Tue Mar 25 09:04:01 2003 (3E800D71)
    f3fc6000 f3fe2000 ipsec ipsec.sys Tue Mar 25 08:55:45 2003 (3E800B81)
    f411e000 f412e760 naveng naveng.sys Fri Oct 01 03:59:17 2004 (415CB9F5)
    f412f000 f41c7680 navex15 navex15.sys Fri Oct 01 04:11:15 2004 (415CBCC3)
    f47dc000 f47eef00 SYMEVENT SYMEVENT.SYS Thu Jan 15 03:02:13 2004 (4005F4A5)
    f47ef000 f483e000 savrt savrt.sys Tue Feb 10 00:24:30 2004 (402816AE)
    f48be000 f48d1000 usbhub usbhub.sys Tue Mar 25 08:10:46 2003 (3E8000F6)
    f4973000 f4983000 Savrtpel Savrtpel.sys Tue Feb 10 00:24:34 2004 (402816B2)
    f4993000 f499f000 Npfs Npfs.SYS Tue Mar 25 08:08:59 2003 (3E80008B)
    f4dc0000 f4dc14e0 BASFND BASFND.sys Thu Oct 11 05:05:51 2001 (3BC50C8F)
    f5905000 f5938000 update update.sys Tue Mar 25 09:59:59 2003 (3E801A8F)
    f5938000 f596c000 rdpdr rdpdr.sys Tue Mar 25 08:09:30 2003 (3E8000AA)
    f596c000 f5979000 wanarp wanarp.sys Tue Mar 25 08:11:22 2003 (3E80011A)
    f59dc000 f59e6000 Msfs Msfs.SYS Tue Mar 25 08:08:56 2003 (3E800088)
    f5a0c000 f5a21000 raspptp raspptp.sys Tue Mar 25 09:19:09 2003 (3E8010FD)
    f5ce8000 f5d03000 ndiswan ndiswan.sys Tue Mar 25 09:48:19 2003 (3E8017D3)
    f5d03000 f5d1a000 rasl2tp rasl2tp.sys Tue Mar 25 08:54:46 2003 (3E800B46)
    f5d1a000 f5d44d00 b57xp32 b57xp32.sys Thu May 22 03:47:11 2003 (3ECC2C1F)
    f5d45000 f5d66e80 USBPORT USBPORT.SYS Tue Mar 25 08:10:43 2003 (3E8000F3)
    f5d67000 f5d91000 ks ks.sys Tue Mar 25 09:47:36 2003 (3E8017A8)
    f5d91000 f5da5000 redbook redbook.sys Tue Mar 25 08:04:38 2003 (3E7FFF86)
    f5da5000 f5db9000 cdrom cdrom.sys Tue Mar 25 08:05:18 2003 (3E7FFFAE)
    f5db9000 f5dd1000 serial serial.sys Tue Mar 25 08:40:08 2003 (3E8007D8)
    f5dd1000 f5de7000 i8042prt i8042prt.sys Tue Mar 25 10:01:43 2003 (3E801AF7)
    f5de7000 f5e00000 VIDEOPRT VIDEOPRT.SYS Tue Mar 25 08:08:02 2003 (3E800052)
    f5e00000 f5e53d80 ati2mpad ati2mpad.sys Fri Jul 19 03:13:20 2002 (3D3767B0)
    f61a1000 f61aa000 ndistapi ndistapi.sys Tue Mar 25 08:11:28 2003 (3E800120)
    f61b1000 f61be000 Modem Modem.SYS Tue Mar 25 08:14:40 2003 (3E8001E0)
    f61c1000 f61cd900 dcesmwdm dcesmwdm.sys Mon Jul 21 23:43:13 2003 (3F1C5E71)
    f61d1000 f61db000 serenum serenum.sys Tue Mar 25 08:04:01 2003 (3E7FFF61)
    f61e1000 f61eb000 mouclass mouclass.sys Tue Mar 25 08:03:09 2003 (3E7FFF2D)
    f61f1000 f61fb000 kbdclass kbdclass.sys Tue Mar 25 08:03:10 2003 (3E7FFF2E)
    f716a000 f716de40 ASPI32 ASPI32.SYS Sat Sep 11 01:46:10 1999 (37D99842)
    f7212000 f7234000 Mup Mup.sys Tue Mar 25 09:55:58 2003 (3E80199E)
    f7234000 f7275000 NDIS NDIS.sys Tue Mar 25 09:45:35 2003 (3E80172F)
    f7275000 f7312000 Ntfs Ntfs.sys Tue Mar 25 08:40:05 2003 (3E8007D5)
    f7312000 f7333000 KSecDD KSecDD.sys Tue Mar 25 08:05:39 2003 (3E7FFFC3)
    f7333000 f7345500 drvmcdb drvmcdb.sys Sat Dec 22 07:33:52 2001 (3C242950)
    f7346000 f735c000 CLASSPNP CLASSPNP.SYS Tue Mar 25 08:38:14 2003 (3E800766)
    f735c000 f736d5c0 afamgt afamgt.sys Wed Oct 30 04:00:12 2002 (3DBF4B3C)
    f736e000 f7394000 SCSIPORT SCSIPORT.SYS Tue Mar 25 09:01:25 2003 (3E800CD5)
    f7394000 f73b0000 atapi atapi.sys Tue Mar 25 08:04:48 2003 (3E7FFF90)
    f73b0000 f73d1000 volsnap volsnap.sys Tue Mar 25 08:05:47 2003 (3E7FFFCB)
    f73d1000 f73fb000 dmio dmio.sys Tue Mar 25 08:08:14 2003 (3E80005E)
    f73fb000 f7420000 ftdisk ftdisk.sys Tue Mar 25 08:05:26 2003 (3E7FFFB6)
    f7420000 f7435000 pci pci.sys Tue Mar 25 08:16:40 2003 (3E800258)
    f7435000 f7466000 ACPI ACPI.sys Tue Mar 25 08:16:21 2003 (3E800245)
    f7487000 f7490000 WMILIB WMILIB.SYS Tue Mar 25 08:13:00 2003 (3E80017C)
    f7497000 f74a6000 isapnp isapnp.sys Tue Mar 25 08:16:35 2003 (3E800253)
    f74a7000 f74b4000 PCIIDEX PCIIDEX.SYS Tue Mar 25 08:04:44 2003 (3E7FFF8C)
    f74b7000 f74c6000 MountMgr MountMgr.sys Tue Mar 25 08:03:05 2003 (3E7FFF29)
    f74c7000 f74d5000 PartMgr PartMgr.sys Tue Mar 25 09:04:02 2003 (3E800D72)
    f74d7000 f74e6000 disk disk.sys Tue Mar 25 08:05:20 2003 (3E7FFFB0)
    f74e7000 f74f1f20 vsp vsp.sys Fri Sep 05 18:34:41 2003 (3F58BB21)
    f74f7000 f7503000 Dfs Dfs.sys Tue Mar 25 08:09:52 2003 (3E8000C0)
    f7507000 f7510000 crcdisk crcdisk.sys Tue Mar 25 08:07:23 2003 (3E80002B)
    f7517000 f7520000 raspti raspti.sys Tue Mar 25 08:11:36 2003 (3E800128)
    f7527000 f7536000 termdd termdd.sys Tue Mar 25 08:02:52 2003 (3E7FFF1C)
    f7577000 f7585000 NDProxy NDProxy.SYS Tue Mar 25 08:11:30 2003 (3E800122)
    f7597000 f75a0000 ndisuio ndisuio.sys Tue Mar 25 08:09:47 2003 (3E8000BB)
    f75a7000 f75b6000 msgpc msgpc.sys Tue Mar 25 08:10:12 2003 (3E8000D4)
    f75d7000 f75e3000 vga vga.sys Tue Mar 25 08:08:03 2003 (3E800053)
    f7607000 f7611000 flpydisk flpydisk.sys Tue Mar 25 08:04:32 2003 (3E7FFF80)
    f7627000 f7634000 netbios netbios.sys Tue Mar 25 08:09:53 2003 (3E8000C1)
    f7657000 f7663000 processr processr.sys Tue Mar 25 08:07:36 2003 (3E800038)
    f7667000 f7675080 racser racser.sys Tue Feb 18 17:06:00 2003 (3E5259E8)
    f7677000 f7680000 watchdog watchdog.sys Tue Mar 25 08:09:01 2003 (3E80008D)
    f7687000 f7692000 fdc fdc.sys Tue Mar 25 08:04:31 2003 (3E7FFF7F)
    f76c7000 f76d5000 raspppoe raspppoe.sys Tue Mar 25 08:11:37 2003 (3E800129)
    f76d7000 f76e2000 TDI TDI.SYS Tue Mar 25 08:14:28 2003 (3E8001D4)
    f76e7000 f76f2000 ptilink ptilink.sys Tue Mar 25 08:03:51 2003 (3E7FFF57)
    f7707000 f770f000 kdcom kdcom.dll Tue Mar 25 08:08:00 2003 (3E800050)
    f770f000 f7717000 BOOTVID BOOTVID.dll Tue Mar 25 08:07:58 2003 (3E80004E)
    f7717000 f771e000 pciide pciide.sys Tue Mar 25 08:04:46 2003 (3E7FFF8E)
    f771f000 f7726000 dmload dmload.sys Tue Mar 25 08:08:08 2003 (3E800058)
    f7727000 f772cf00 mraid35x mraid35x.sys Fri May 30 19:24:54 2003 (3ED793E6)
    f775f000 f7767000 Fs_Rec Fs_Rec.SYS Tue Mar 25 08:08:36 2003 (3E800074)
    f7787000 f778f000 audstub audstub.sys Tue Mar 25 08:09:12 2003 (3E800098)
    f778f000 f7797000 RootMdm RootMdm.sys Tue Mar 25 08:14:42 2003 (3E8001E2)
    f7797000 f779e000 Null Null.SYS Tue Mar 25 08:03:05 2003 (3E7FFF29)
    f779f000 f77a6000 Beep Beep.SYS Tue Mar 25 08:03:04 2003 (3E7FFF28)
    f77a7000 f77af000 mnmdd mnmdd.SYS Tue Mar 25 08:07:53 2003 (3E800049)
    f77af000 f77b7000 RDPCDD RDPCDD.sys Tue Mar 25 08:03:05 2003 (3E7FFF29)
    f77b7000 f77bb200 usbohci usbohci.sys Tue Mar 25 08:10:41 2003 (3E8000F1)
    f77ff000 f7807000 rasacd rasacd.sys Tue Mar 25 08:11:50 2003 (3E800136)
    f786f000 f7876000 dxgthk dxgthk.sys Tue Mar 25 08:05:52 2003 (3E7FFFD0)
    f7a01000 f7a02200 swenum swenum.sys Tue Mar 25 08:03:22 2003 (3E7FFF3A)
    f7a23000 f7a24580 USBD USBD.SYS Tue Mar 25 08:10:39 2003 (3E8000EF)

    Unloaded modules:
    f59bc000 f59ca000 imapi.sys
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    f7767000 f776f000 Sfloppy.SYS
    Timestamp: unavailable (00000000)
    Checksum: 00000000
    Closing open log file c:\debuglog.txt
     
  2. 2004/12/07
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    Probable memory corruption. I need more dumps to determine whether it's a software or hardware problem. When you post the next sets, you can strip off the drivers; we only need to see the list once. Can you get 3 of them, and then we will see where we are at.
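
    Trimming the driver list from the follow-up logs can be done by hand, or with something like the minimal sketch below. It assumes the list always begins at the "start end module name" header seen in the log above; the function name strip_module_list is made up for illustration.

```python
def strip_module_list(log_text: str) -> str:
    """Return the debugger log with the loaded-module list removed.

    Assumption: the list starts at the 'start end module name' header,
    as in the log posted above. Everything from that header on is dropped.
    """
    kept = []
    for line in log_text.splitlines():
        # split() ignores the column padding around the header words.
        if line.split() == ["start", "end", "module", "name"]:
            break
        kept.append(line)
    return "\n".join(kept)


sample = """BugCheck A, {80ffffe8, 2, 1, 804f1918}
ChildEBP RetAddr Args to Child
start end module name
804de000 80745000 nt ntkrnlmp.exe Thu May 27 02:32:57 2004 (40B53739)
"""
print(strip_module_list(sample))
```

    The bugcheck summary, registers and stack stay in the log; only the module table is cut.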
     


  4. 2004/12/07
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    more dumps

    Dump number two:

    Opened log file 'c:\debuglog.txt'
    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

    Microsoft (R) Windows Debugger Version 6.3.0017.0
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\Documents and Settings\vikar-111\Skrivebord\MEMORY.DMP]
    Kernel Complete Dump File: Full address space is available

    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: C:\WINNT;C:\WINNT\system32;C:\WINNT\system32\drivers
    Windows Server 2003 Kernel Version 3790 MP (4 procs) Free x86 compatible
    Product: Server, suite: TerminalServer SingleUserTS
    Built by: 3790.srv03_gdr.040410-1234
    Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
    Debug session time: Tue Dec 07 12:28:10 2004
    System Uptime: 0 days 1:41:33.010
    Loading Kernel Symbols
    .............................................................................................................
    Loading unloaded module list
    ..
    Loading User Symbols
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck A, {80ffffe8, 2, 1, 804f1918}

    Probably caused by : memory_corruption ( nt!MiRemovePageByColor+af )

    Followup: MachineOwner
    ---------

    0: kd> !analyze -v;r;kv;lmtn;.logclose;q
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 80ffffe8, memory referenced
    Arg2: 00000002, IRQL
    Arg3: 00000001, value 0 = read operation, 1 = write operation
    Arg4: 804f1918, address which referenced memory

    Debugging Details:
    ------------------


    WRITE_ADDRESS: 80ffffe8

    CURRENT_IRQL: 2

    FAULTING_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    BUGCHECK_STR: 0xA

    LAST_CONTROL_TRANSFER: from 805030bd to 804f1918

    TRAP_FRAME: f789a788 -- (.trap fffffffff789a788)
    .trap fffffffff789a788
    ErrCode = 00000002
    eax=fffffffd ebx=00000004 ecx=0000d3d9 edx=81000000 esi=81134460 edi=00000000
    eip=804f1918 esp=f789a7fc ebp=0000cd84 iopl=0 nv up ei pl nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
    nt!MiRemovePageByColor+0xaf:
    804f1918 890cc2 mov [edx+eax*8],ecx ds:0023:80ffffe8=????????
    .trap
    Resetting default scope

    STACK_TEXT:
    f789a814 805030bd 00000000 00008000 f789ad38 nt!MiRemovePageByColor+0xaf


    FOLLOWUP_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    SYMBOL_STACK_INDEX: 0

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: nt!MiRemovePageByColor+af

    MODULE_NAME: nt

    DEBUG_FLR_IMAGE_TIMESTAMP: 40b53739

    STACK_COMMAND: .trap fffffffff789a788 ; kb

    IMAGE_NAME: memory_corruption

    BUCKET_ID: 0xA_W_nt!MiRemovePageByColor+af

    Followup: MachineOwner
    ---------

    eax=ffdff13c ebx=0000000a ecx=865aa3c8 edx=40000000 esi=ffdff120 edi=80ffffe8
    eip=80543ac9 esp=f789a754 ebp=f789a76c iopl=0 nv up ei ng nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
    nt!KeBugCheckEx+0x19:
    80543ac9 5d pop ebp
    ChildEBP RetAddr Args to Child
    f789a76c 804e2f58 0000000a 80ffffe8 00000002 nt!KeBugCheckEx+0x19 (FPO: [Non-Fpo])
    f789a76c 804f1918 0000000a 80ffffe8 00000002 nt!KiTrap0E+0x224 (FPO: [0,0] TrapFrame @ f789a788)
    f789a814 805030bd 00000000 00008000 f789ad38 nt!MiRemovePageByColor+0xaf (FPO: [EBP 0x812752c8] [0,3,0])
    start end module name
     
  5. 2004/12/07
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    the third dump

    Dump number three (I'll get you the fourth one later this evening):

    Opened log file 'c:\debuglog.txt'
    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

    Microsoft (R) Windows Debugger Version 6.3.0017.0
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\Documents and Settings\vikar-111\Skrivebord\MEMORY.DMP]
    Kernel Complete Dump File: Full address space is available

    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: C:\WINNT;C:\WINNT\system32;C:\WINNT\system32\drivers
    Windows Server 2003 Kernel Version 3790 MP (4 procs) Free x86 compatible
    Product: Server, suite: TerminalServer SingleUserTS
    Built by: 3790.srv03_gdr.040410-1234
    Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
    Debug session time: Tue Dec 07 13:15:53 2004
    System Uptime: 0 days 0:46:33.333
    Loading Kernel Symbols
    .............................................................................................................
    Loading unloaded module list
    ..
    Loading User Symbols
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck A, {80ffffe8, 2, 1, 804f1918}

    Probably caused by : memory_corruption ( nt!MiRemovePageByColor+af )

    Followup: MachineOwner
    ---------

    0: kd> !analyze -v;r;kv;lmtn;.logclose;q
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 80ffffe8, memory referenced
    Arg2: 00000002, IRQL
    Arg3: 00000001, value 0 = read operation, 1 = write operation
    Arg4: 804f1918, address which referenced memory

    Debugging Details:
    ------------------


    WRITE_ADDRESS: 80ffffe8

    CURRENT_IRQL: 2

    FAULTING_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    BUGCHECK_STR: 0xA

    LAST_CONTROL_TRANSFER: from 805030bd to 804f1918

    TRAP_FRAME: f789a788 -- (.trap fffffffff789a788)
    .trap fffffffff789a788
    ErrCode = 00000002
    eax=fffffffd ebx=00000003 ecx=00024e87 edx=81000000 esi=8137b048 edi=00000000
    eip=804f1918 esp=f789a7fc ebp=00025203 iopl=0 nv up ei pl nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
    nt!MiRemovePageByColor+0xaf:
    804f1918 890cc2 mov [edx+eax*8],ecx ds:0023:80ffffe8=????????
    .trap
    Resetting default scope

    STACK_TEXT:
    f789a814 805030bd 00000000 00008000 f789ad38 nt!MiRemovePageByColor+0xaf


    FOLLOWUP_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    SYMBOL_STACK_INDEX: 0

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: nt!MiRemovePageByColor+af

    MODULE_NAME: nt

    DEBUG_FLR_IMAGE_TIMESTAMP: 40b53739

    STACK_COMMAND: .trap fffffffff789a788 ; kb

    IMAGE_NAME: memory_corruption

    BUCKET_ID: 0xA_W_nt!MiRemovePageByColor+af

    Followup: MachineOwner
    ---------

    eax=ffdff13c ebx=0000000a ecx=865aa3c8 edx=40000000 esi=ffdff120 edi=80ffffe8
    eip=80543ac9 esp=f789a754 ebp=f789a76c iopl=0 nv up ei ng nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
    nt!KeBugCheckEx+0x19:
    80543ac9 5d pop ebp
    ChildEBP RetAddr Args to Child
    f789a76c 804e2f58 0000000a 80ffffe8 00000002 nt!KeBugCheckEx+0x19 (FPO: [Non-Fpo])
    f789a76c 804f1918 0000000a 80ffffe8 00000002 nt!KiTrap0E+0x224 (FPO: [0,0] TrapFrame @ f789a788)
    f789a814 805030bd 00000000 00008000 f789ad38 nt!MiRemovePageByColor+0xaf (FPO: [EBP 0x811e5d30] [0,3,0])
    start end module name
     
  6. 2004/12/07
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    thanks

    Thank you very much so far! This input is good to have when deciding our next action :)
     
  7. 2004/12/08
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    4th dump data

    Here it is!

    Does it make the situation any clearer to you?
    -------------------------------------

    Opened log file 'c:\debuglog.txt'
    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

    Microsoft (R) Windows Debugger Version 6.3.0017.0
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\Documents and Settings\vikar-111\Skrivebord\MEMORY.DMP]
    Kernel Complete Dump File: Full address space is available

    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: C:\WINNT;C:\WINNT\system32;C:\WINNT\system32\drivers
    Windows Server 2003 Kernel Version 3790 MP (4 procs) Free x86 compatible
    Product: Server, suite: TerminalServer SingleUserTS
    Built by: 3790.srv03_gdr.040410-1234
    Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
    Debug session time: Wed Dec 08 02:42:34 2004
    System Uptime: 0 days 11:01:40.144
    Loading Kernel Symbols
    ............................................................................................................
    Loading unloaded module list
    ..
    Loading User Symbols
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck A, {80ffffe8, 2, 1, 804f1918}

    Probably caused by : memory_corruption ( nt!MiRemovePageByColor+af )

    Followup: MachineOwner
    ---------

    0: kd> !analyze -v;r;kv;lmtn;.logclose;q
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: 80ffffe8, memory referenced
    Arg2: 00000002, IRQL
    Arg3: 00000001, value 0 = read operation, 1 = write operation
    Arg4: 804f1918, address which referenced memory

    Debugging Details:
    ------------------


    WRITE_ADDRESS: 80ffffe8

    CURRENT_IRQL: 2

    FAULTING_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    BUGCHECK_STR: 0xA

    LAST_CONTROL_TRANSFER: from 805030bd to 804f1918

    TRAP_FRAME: f789a788 -- (.trap fffffffff789a788)
    .trap fffffffff789a788
    ErrCode = 00000002
    eax=fffffffd ebx=0000000a ecx=0001a03b edx=81000000 esi=8129b770 edi=00000000
    eip=804f1918 esp=f789a7fc ebp=0001bcfa iopl=0 nv up ei pl nz na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
    nt!MiRemovePageByColor+0xaf:
    804f1918 890cc2 mov [edx+eax*8],ecx ds:0023:80ffffe8=????????
    .trap
    Resetting default scope

    STACK_TEXT:
    f789a814 805030bd 00000000 00008000 f789ad38 nt!MiRemovePageByColor+0xaf


    FOLLOWUP_IP:
    nt!MiRemovePageByColor+af
    804f1918 890cc2 mov [edx+eax*8],ecx

    SYMBOL_STACK_INDEX: 0

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: nt!MiRemovePageByColor+af

    MODULE_NAME: nt

    DEBUG_FLR_IMAGE_TIMESTAMP: 40b53739

    STACK_COMMAND: .trap fffffffff789a788 ; kb

    IMAGE_NAME: memory_corruption

    BUCKET_ID: 0xA_W_nt!MiRemovePageByColor+af

    Followup: MachineOwner
    ---------
     
  8. 2004/12/08
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    Another update

    I've also successfully completed a series of hardware tests using Dell's boot-disk-based utilities. They reported nothing wrong with the hardware...

    And the server is of course still crashing. I might turn to Microsoft support here, using the client's support programme :)
     
  9. 2004/12/08
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    valemon- thanks for posting the other dumps, it helps me be more confident in my analysis. The problem here is you are crashing in a bad place. Your PFN list is being corrupted, in exactly the same place every time. This is 'odd'. There are a couple of vectors for this to occur. Bad RAM is the most likely candidate. It could also be a driver operating on pages in some way, but that's a much lower likelihood.
    At this point, I'd start with some new memory sticks and see where we are after that.
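
    To see why the faulting address is identical in every dump, the write target of "mov [edx+eax*8],ecx" can be recomputed from the trap-frame registers. This is a minimal sketch using the eax and edx values shown in the dumps and assuming ordinary 32-bit wraparound; the helper names are only for illustration.

```python
MASK32 = 0xFFFFFFFF  # x86 effective addresses wrap at 32 bits


def effective_address(base: int, index: int, scale: int) -> int:
    """Compute base + index*scale with 32-bit wraparound."""
    return (base + index * scale) & MASK32


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# Trap-frame registers, identical in all four 0xA dumps.
edx = 0x81000000
eax = 0xFFFFFFFD

addr = effective_address(edx, eax, 8)
print(hex(addr))         # 0x80ffffe8, the same value as Arg1 / WRITE_ADDRESS
print(as_signed32(eax))  # -3
```

    Read as a signed value, eax is -3, so the write lands 24 bytes below 0x81000000, at the same spot each time. The dumps alone don't say where that bad value came from.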

    In this case, there isn't anything magical that Microsoft support could do, and I wouldn't recommend burning an incident with them for this problem unless you need some general guidance. This is crashing in a place that isn't exposed, guarded or tweakable. You pretty much have to use straight-up troubleshooting to track these down. Your two vectors are the hardware (memory, CPU, motherboard) or some malfunctioning kernel software (drivers, .sys files).

    You cannot test hardware with software. Your run time is all over the place: if you are getting bit errors once every 45 minutes to 11 hours, your chances of catching one with a few hours under a pattern writer aren't very good.
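
    To put numbers on the run times, the "System Uptime" lines from the four 0xA dumps above can be converted to minutes. This is a rough sketch; the regular expression assumes the "N days H:MM:SS.mmm" layout seen in these logs.

```python
import re

# Assumed layout, taken from the logs above: "System Uptime: 0 days 1:51:12.298"
UPTIME_RE = re.compile(r"System Uptime: (\d+) days (\d+):(\d+):(\d+(?:\.\d+)?)")


def uptime_minutes(line: str) -> float:
    """Convert a 'System Uptime' line from the debugger log to minutes."""
    match = UPTIME_RE.search(line)
    if match is None:
        raise ValueError(f"not an uptime line: {line!r}")
    days, hours, minutes = (int(match.group(i)) for i in (1, 2, 3))
    seconds = float(match.group(4))
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_seconds / 60


uptimes = [
    "System Uptime: 0 days 1:51:12.298",
    "System Uptime: 0 days 1:41:33.010",
    "System Uptime: 0 days 0:46:33.333",
    "System Uptime: 0 days 11:01:40.144",
]
for line in uptimes:
    print(f"{uptime_minutes(line):7.1f} min")
```

    That is a spread from roughly 47 minutes to about 11 hours between crashes.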
     
  10. 2004/12/08
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    Thanks for looking into this again!
    I think your conclusion is precise enough for me to work from. That being said, I've contacted Microsoft, and they have sent me an unreleased hotfix that might sort this out. I will try installing it tomorrow morning, when more complete backups are in place ;) If this doesn't work out, it's on to the "great hunt for the buggy driver".

    This is the KB article referring to the hotfix:
    http://support.microsoft.com/?kbid=836049

    Will keep you all posted.
     
  11. 2004/12/08
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    Hmm. I wouldn't expect an NTFS hotfix to have an effect on this problem.

    Run MSINFO32 and, under 'System Summary', grab the Processor line, which looks like "Processor x86 Family 15 Model 2 Stepping 9 GenuineIntel ~2593 Mhz", and paste it in here.

    Also, update your BIOS to the very latest for the machine. I ran across an erratum that has been patched in a microcode update (hidden inside BIOS updates) and that apparently could cause similar behavior.
     
  12. 2004/12/09
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    There are 4 lines (dual CPU box) reading:
    Processor x86 Family 15 Model 2 Stepping 9 GenuineIntel ~3052 Mhz
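
    For comparing against a processor errata list, the MSINFO32 line can be split into its fields. This is a small sketch; the pattern assumes the exact wording of the line above, and the function name is hypothetical.

```python
import re

# Assumed wording, taken from the MSINFO32 line quoted above.
CPU_RE = re.compile(
    r"(?P<arch>\S+) Family (?P<family>\d+) Model (?P<model>\d+) "
    r"Stepping (?P<stepping>\d+) (?P<vendor>\S+) ~(?P<mhz>\d+) Mhz",
    re.IGNORECASE,
)


def parse_processor_line(line: str) -> dict:
    """Split an MSINFO32 'Processor' line into named fields."""
    match = CPU_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised processor line: {line!r}")
    fields = match.groupdict()
    # Numeric fields are easier to compare as integers.
    for key in ("family", "model", "stepping", "mhz"):
        fields[key] = int(fields[key])
    return fields


info = parse_processor_line(
    "Processor x86 Family 15 Model 2 Stepping 9 GenuineIntel ~3052 Mhz"
)
print(info)
```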

    As for the BIOS, I just upgraded it. Funny that Dell Support didn't ask whether I'd done this :)

    Let's see how this turns out now... I just had another bluescreen, so I fear neither the hotfix nor the BIOS update helped at all. :eek:

    Update: I am working with Microsoft and Dell to resolve this issue now, and will keep you posted.
     
    Last edited: 2004/12/09
  13. 2004/12/10
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    A new bugcheck code

    Dell is sending a new motherboard and CPUs, since Microsoft staff say the memory dumps indicate a double-fault bit-flip error might be occurring, related to faulty CPUs from Intel.

    Anyway, this other bugcheck (7f) has started to occur about once for every 5 (ish) 0A bugchecks:

    Opened log file 'c:\debuglog.txt'
    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

    Microsoft (R) Windows Debugger Version 6.3.0017.0
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\Documents and Settings\vikar-111\Skrivebord\MEMORY-7F-2.DMP]
    Kernel Complete Dump File: Full address space is available

    Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
    Executable search path is: C:\WINNT;C:\WINNT\system32;C:\WINNT\system32\drivers
    Windows Server 2003 Kernel Version 3790 MP (4 procs) Free x86 compatible
    Product: Server, suite: TerminalServer SingleUserTS
    Built by: 3790.srv03_gdr.040410-1234
    Kernel base = 0x804de000 PsLoadedModuleList = 0x8057b6a8
    Debug session time: Fri Dec 10 03:37:17 2004
    System Uptime: 0 days 6:01:28.459
    WARNING: Process directory table base 3167F000 doesn't match CR3 00039000
    WARNING: Process directory table base 3167F000 doesn't match CR3 00039000
    Loading Kernel Symbols
    ............................................................................................................
    Loading unloaded module list
    ..
    Loading User Symbols
    ........................
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck 7F, {8, 80042000, 0, 0}

    *** ERROR: Symbol file could not be found. Defaulted to export symbols for SYMEVENT.SYS -
    *** WARNING: Unable to verify checksum for java.dll
    *** ERROR: Symbol file could not be found. Defaulted to export symbols for java.dll -
    Probably caused by : SYMEVENT.SYS ( SYMEVENT!SYMEvent_GetVMDataPtr+6834 )

    Followup: MachineOwner
    ---------

    0: kd> !analyze -v;r;kv;lmtn;.logclose;q
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    UNEXPECTED_KERNEL_MODE_TRAP (7f)
    This means a trap occurred in kernel mode, and it's a trap of a kind
    that the kernel isn't allowed to have/catch (bound trap) or that
    is always instant death (double fault). The first number in the
    bugcheck params is the number of the trap (8 = double fault, etc)
    Consult an Intel x86 family manual to learn more about what these
    traps are. Here is a *portion* of those codes:
    If kv shows a taskGate
    use .tss on the part before the colon, then kv.
    Else if kv shows a trapframe
    use .trap on that value
    Else
    .trap on the appropriate frame will show where the trap was taken
    (on x86, this will be the ebp that goes with the procedure KiTrap)
    Endif
    kb will then show the corrected stack.
    Arguments:
    Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
    Arg2: 80042000
    Arg3: 00000000
    Arg4: 00000000

    Debugging Details:
    ------------------


    BUGCHECK_STR: 0x7f_8

    TSS: 00000028 -- (.tss 28)
    .tss 28
    eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=864884ec edi=864884fc
    eip=f727965e esp=bbeba8b4 ebp=b9eba8c4 iopl=0 nv up ei pl zr na po nc
    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
    Ntfs!NtfsQueueClose+0xdc:
    f727965e 5f pop edi
    .trap
    Resetting default scope

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    CURRENT_IRQL: 0

    LAST_CONTROL_TRANSFER: from f72a212a to f727965e

    STACK_TEXT:
    b9eba8c4 f72a212a 864884c0 00000201 804f1d00 Ntfs!NtfsQueueClose+0xdc
    b9eba94c 804f0473 864fa718 8642e2a8 b9eba9a4 Ntfs!NtfsFsdClose+0x3b2
    b9eba95c f4a5efd4 00000000 b9eba9a4 86430090 nt!IofCallDriver+0x3f
    WARNING: Stack unwind information not available. Following frames may be wrong.
    b9ebaa08 805cf5c0 00f07238 86502e18 85f5ba84 SYMEVENT!SYMEvent_GetVMDataPtr+0x6834
    b9ebab04 8058e482 86502e30 00000000 85f5b9e0 nt!IopParseDevice+0xe89
    b9ebab80 8058dbb9 00000000 b9ebabc0 00000040 nt!ObpLookupObjectName+0x545
    b9ebabd4 8059d4b8 00000000 00000000 585af801 nt!ObOpenObjectByName+0xe8
    b9ebad54 804dfd24 03def9bc 03def994 00000000 nt!NtQueryAttributesFile+0xe6
    b9ebad54 7ffe0304 03def9bc 03def994 00000000 nt!KiSystemService+0xd0
    03def974 77f42cb7 77e426bd 03def9bc 03def994 SharedUserData!SystemCallStub+0x4
    03def978 77e426bd 03def9bc 03def994 02ce3f90 ntdll!ZwQueryAttributesFile+0xc
    03def9dc 005a7ab3 02ce3f90 02e82ce8 266d917a kernel32!GetFileAttributesW+0x58
    03def9f8 0098d59a 033d9ee8 03defa0c 03defa10 java_5a0000!Java_java_io_WinNTFileSystem_getBooleanAttributes+0x4d
    033d9ee8 00000000 00000000 00000000 00000000 0x98d59a


    FOLLOWUP_IP:
    SYMEVENT!SYMEvent_GetVMDataPtr+6834
    f4a5efd4 894618 mov [esi+0x18],eax

    SYMBOL_STACK_INDEX: 3

    FOLLOWUP_NAME: MachineOwner

    SYMBOL_NAME: SYMEVENT!SYMEvent_GetVMDataPtr+6834

    MODULE_NAME: SYMEVENT

    IMAGE_NAME: SYMEVENT.SYS

    DEBUG_FLR_IMAGE_TIMESTAMP: 4005f4a5

    STACK_COMMAND: .tss 28 ; kb

    BUCKET_ID: 0x7f_8_SYMEVENT!SYMEvent_GetVMDataPtr+6834

    Followup: MachineOwner
    ---------
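
    As a side note on reading the summary line of the dump above: the bugcheck code and its four arguments are plain hex, and for 7F the first argument is the x86 trap number. A rough sketch of decoding that line in Python follows. The helper names (parse_bugcheck, describe) are hypothetical, and the trap table lists only a handful of the vectors from the Intel manuals, so treat it as an illustration rather than a complete decoder.

```python
import re

# A few x86 exception vectors (trap numbers) from the Intel manuals.
# This table is deliberately incomplete.
X86_TRAPS = {
    0x0: "divide error",
    0x4: "overflow",
    0x5: "bound range exceeded",
    0x6: "invalid opcode",
    0x8: "double fault",
    0xD: "general protection fault",
    0xE: "page fault",
}

# Matches the debugger summary line, e.g. "BugCheck 7F, {8, 80042000, 0, 0}".
BUGCHECK_RE = re.compile(r"BugCheck ([0-9A-Fa-f]+), \{([^}]*)\}")


def parse_bugcheck(line):
    """Return (code, [arg1..arg4]) as integers, or None if the line does not match."""
    match = BUGCHECK_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    args = [int(arg.strip(), 16) for arg in match.group(2).split(",")]
    return code, args


def describe(line):
    """Give a one-line description; only bugcheck 0x7F is decoded further."""
    parsed = parse_bugcheck(line)
    if parsed is None:
        return "not a BugCheck line"
    code, args = parsed
    if code == 0x7F:
        trap = X86_TRAPS.get(args[0], "trap not in this table")
        return f"UNEXPECTED_KERNEL_MODE_TRAP (0x7F), trap 0x{args[0]:X}: {trap}"
    return f"bugcheck 0x{code:X}, args {[hex(a) for a in args]}"


print(describe("BugCheck 7F, {8, 80042000, 0, 0}"))
```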
     
  14. 2004/12/10
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    Nod. That was where I was headed as well; that's why I wanted to know the stepping (aka the version number) of your CPU. The 7F on pop edi kind of seals the deal on that. I have confidence you are on the path to recovery.

    Let us know how the story turns out.
     
  15. 2004/12/12
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    Yet more additional info

    Yeah, I'm really hoping this will sort things out. However, Dell is making a mess of this issue by not being able to ship the needed parts to us so that they arrive prior to the technician.. and so on, it's turned into a bit of a story :)

    My guess is it will be at least Tuesday before the CPUs are replaced.

    Anyway, additional info: the CPUs are revision #22 as well. According to Microsoft (referring to Intel), revisions up to #18 had the bitflip bug. But the BIOS updates I've run on the server also include microcode updates, so it's still possible that we do in fact have revision <18 CPUs. Anyway, the revision number is no guarantee that the bug cannot occur :)

    Will update you once there's something useful to tell ;)
     
  16. 2004/12/13
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    All right, Dell's just been here and replaced the CPUs and motherboard...
    And the new CPUs... are.. the same family, model, stepping, AND revision as the suspected faulty ones! :(

    That kinda lessens my hope that this will fix the problem. Hehe, anyway.. that's the last of it for now..
     
  17. 2004/12/13
    Scott Smith

    Scott Smith Inactive Alumni

    Joined:
    2002/01/12
    Messages:
    1,950
    Likes Received:
    4
    I bet it's the RAM. Did they change it out?

    Already been through this with Dell. You have to diagnose and make them fix it.
    Had an Optiplex that all of their tests showed fine. I swapped RAM and all was well. Then they shipped me RAM after I told them what I found.
     
  18. 2004/12/14
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    Well, 24 hours now, with new CPUs and motherboard. New record, the server hasn't been up for that "long" in at least two months ;)

    Looking good!

    Will need to let it run for another full day before I can finally conclude that this fixed the problem, but I am very positive it did :)
     
  19. 2004/12/14
    Scott Smith

    Scott Smith Inactive Alumni

    Joined:
    2002/01/12
    Messages:
    1,950
    Likes Received:
    4
    Cool!
     
  20. 2004/12/16
    valemon

    valemon Inactive Thread Starter

    Joined:
    2004/12/07
    Messages:
    13
    Likes Received:
    0
    Case closed :)

    The server's been running fine for 3 complete days now - so I conclude that the faulty CPUs were the issue.

    Along the way I've even picked up how to debug this down to the specific bitflip that occurred :rolleyes:
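
    For the 7F dump posted earlier in the thread, the idea can be sketched roughly like this. In the .tss 28 output, esp is bbeba8b4 while ebp and every frame in STACK_TEXT sit in the b9eba... range, so esp looks off in a single bit. The "expected" esp below is an assumption inferred from those neighbouring values, not something the debugger printed, and flipped_bits is just an illustrative helper name.

```python
def flipped_bits(observed, expected):
    """Return the bit positions (0 = least significant) where two 32-bit values differ."""
    diff = (observed ^ expected) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


esp_observed = 0xBBEBA8B4  # esp from the .tss 28 register dump
ebp_observed = 0xB9EBA8C4  # ebp from the same register dump

# ASSUMPTION: esp should lie just below ebp on the same kernel stack, so the
# expected value keeps the low bits of esp and takes the high bits from ebp.
# This value is inferred for illustration; it does not appear in the log.
esp_expected = (ebp_observed & 0xFFFF0000) | (esp_observed & 0x0000FFFF)

bits = flipped_bits(esp_observed, esp_expected)
print(f"expected esp: {esp_expected:08x}")
print(f"differing bit positions: {bits}")
```

    If the list holds exactly one position, the two values differ by a single bit, which is the pattern a bitflip would leave behind.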

    Thanks for all your help!!
     
  21. 2004/12/16
    JoeHobart

    JoeHobart Inactive Alumni

    Joined:
    2004/05/19
    Messages:
    919
    Likes Received:
    1
    Outstanding! I'm glad we could help.
     