PDT Failure and Slow Epicor

Awesome, so something is going on with your Host. At least we can narrow that down :slight_smile: If you go to any OS hosted on that Host and you get the same results, now you know to look at the Host.

Unlike Jose, I don't sneak in bitcoin miners. :cowboy_hat_face:

Got a full host reboot planned at noon today.
Hopefully that helps.

Ether is where it’s at these days :wink:

Can you just remote into the Host with an RDP session (assuming it's Hyper-V) and look at Task Manager? It should reveal 99% CPU usage or something. Or are you on vSphere?

Dis one.

If you log into vSphere, does it show no Alerts or Warnings? And if you look at the overall stats, there isn't one Guest using all the CPU, and your Host isn't ballooning? Just FYI, I did have HW11 bottlenecks, and VMware posted in a KB article that HW11 and Windows 2012 don't mix well together. But that would be at the Guest level.

No events, CPU is showing a max of 16% for the host, and memory ballooning shows zero across the board. We’re running Server 2008 R2 right now.

Lastly, what about your storage, is that being maxed? After all, your .vm images are just running on a hard drive and are at the mercy of that hard drive's performance. We had a QNAP bottlenecked (they have a separate UI to monitor that), and Fibre Channel lost 1 channel, or whatever it's called, in a Cisco blade.

Anyways - Was hoping to see the cause, before just a reboot :smiley:

Overall, our result was that we were not using even 10% CPU, and it was always capped at a certain speed… We logged into our Blade portal, updated all the Cisco firmware to the same version, and we broke out of that jail. But if you haven't changed anything and this just started, then yeah, perhaps a Host reboot. Who knows, sometimes a power cycle has powers.

Storage looks good. Archive is just a repository for SQL backups.
image

For sure. Root causes are always good to find, but I can only mess around looking for one for so long before people bust out the pitchforks, lol.

For the sake of this thread, here are our meeting minutes from a year ago, when Epicor reviewed our vSphere setup, if anyone is curious. #vSphere

Meeting Minutes:

9:10AM: 1 Issue, 8 Sockets on VMWare - Standard SQL can't see more than 4 sockets. (Sockets vs Cores)
Keep the Sockets under 4. (Issue, needs fix)

9:11AM: Launched vSphere for Sherman to show us what he means by Socket Configuration
- CPU tab (1 socket, 8 cores) - Change it to 4 sockets and 2 Cores.
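
To illustrate why the socket/core split matters, here is a minimal sketch that assumes the only limit in play is the 4-socket cap mentioned in the minutes. The function name `usable_vcpus` and its `socket_limit` parameter are made up for this example, and any per-edition core cap (which varies by SQL Server version) is deliberately not modeled.

```python
def usable_vcpus(sockets: int, cores_per_socket: int, socket_limit: int = 4) -> int:
    """Return how many vCPUs are usable when only `socket_limit` sockets are seen.

    Assumption: the 4-socket cap from the minutes is the only limit;
    edition-specific core caps are not modeled here.
    """
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be >= 1")
    # Sockets beyond the cap are ignored, along with every core on them.
    return min(sockets, socket_limit) * cores_per_socket


# Compare a few layouts that all present 8 vCPUs to the guest.
for sockets, cores in [(8, 1), (4, 2), (1, 8)]:
    total = sockets * cores
    print(f"{sockets} sockets x {cores} cores: {usable_vcpus(sockets, cores)} of {total} vCPUs usable")
```

Under that assumption, an 8-socket, 1-core-per-socket guest would leave half of its vCPUs unused, while the 4 x 2 layout from the minutes keeps all 8 available.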

9:15AM: Looked at the Network Adapter - VMXNET3 - The CM Servers should be Paravirtual SCSI (some are not, Issue)

9:22: There is some latency there, it will affect how the client protocol performs on the LAN - However E10 uses NET.TCP w/ good compression.
- Downloaded tcping.exe to run TCP Pings
- tcping.exe -t 192.1.10.106 445

PLUS: The pings, at 40ms, are really consistent, which is very helpful

9:32: Pinging SQL Server
  - tcping.exe -t SERVER-SQL1 1433
  Looking for the majority of the pings to be about 0.5ms; we are looking good, averaging 0.5ms as expected.
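
If tcping.exe isn't handy, the same kind of check can be roughly approximated in Python by timing TCP connects. This is only a sketch and not the tool used in the meeting: `tcp_ping` and `summarize` are hypothetical helper names, and the numbers won't match tcping.exe exactly because Python adds its own overhead to each connect.

```python
import socket
import statistics
import time


def tcp_ping(host, port, count=5, timeout=2.0):
    """Time `count` TCP connects to host:port; return milliseconds (None on failure)."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            # The connect completes once the TCP handshake is done.
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Covers refused connections, timeouts, and name-resolution errors.
            samples.append(None)
    return samples


def summarize(samples):
    """Summarize successful samples; a low stdev means the latency is consistent."""
    ok = [s for s in samples if s is not None]
    if not ok:
        return None
    return {
        "count": len(ok),
        "failed": len(samples) - len(ok),
        "mean_ms": statistics.mean(ok),
        "max_ms": max(ok),
        "stdev_ms": statistics.pstdev(ok),
    }
```

For example, `summarize(tcp_ping("SERVER-SQL1", 1433))` would time connects to the SQL Server port from the minutes. The mean can be compared against the roughly 0.5ms target, and the stdev gives a feel for how consistent the pings are.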

9:35AM: Run 32-bit Performance and Diagnostics Tool
  - We should be seeing about 1/2 second (0.5) to retrieve the packet -- we are at about 0.8
  - Sherman: "This could indicate that CPU is not running at full speed"
  - We are looking for <300ms, yet it's 554ms - BIOS settings are not set to MAX Performance
   - Let's prove this by going to vSphere and looking at the Power Scheme at the Host level

  We are currently running on "Balanced". Let's change it to "High Performance" (let's also do this to CPS later!) - It will BOOST our CPU performance to meet the right metrics!
  You have to change this at UCS BIOS Setting really. (Cisco UCS Manager at 25.25.254.100)

BIOS Policies...
     C1E: Disabled it!,
     C3,
     C7, and on CPU Performance set "Enterprise"; it will give us the faster performance
     [ Will Require a Later Reboot ]

  Power Technology: leave that at Performance. Energy Performance: leave that at Performance too.
  P-STATE Coordination: Let's set it to "SOFTWARE" all, and we will set it in vSphere to "High Performance"

  Package C State: C0
  Memory RAS Config: maximum-performance

  VT For Directed IO: Enabled
  NUMA: Enabled

- Disabled SpeedStep which underclocks the CPU
   Short for Enhanced Intel SpeedStep Technology, EIST allows Intel processors to run at a lower speed, which reduces the overall power consumption. This feature is very useful in portable computers that use battery power.

Have you rebooted yet? LoL

Everyone at the plant is saying “Aaron’s System is slow!” as if he built the system. He gets all the credit, however. :slight_smile:

Still sitting at 13k+ after the host reboot.

Something must have gone bad with the hardware. If you didn’t change anything, that’s all I can think of. Do you have additional Hosts, and can you move your images to another Host via vSphere so you can play with the bad Host? We have the ability to move Guests to other Hosts while they are running, without disrupting anything.

I’m not sure. I don’t think so.

Time to search your Host maker’s forums / websites :slight_smile: (HP or Dell) =) Worst case, you end up calling VMware Support.

Don’t suppose you have a faulty disk in your array?

I’m still learning hardware. The lack of AC and the temperature issues you went through are likely the culprit.

Apparently, I did what is called a “warm reboot” of the host yesterday. I guess I was supposed to do a “cold reboot”. Apparently with the Dell servers, when something like the heat increase happens, the sensors don’t reset properly unless you do a cold reboot.
Gonna give it a try in a few minutes.

Success! Cold reboot, everybody. Lesson learned.

image

By the way, the upper number, 5182, I got off the Epicor PDT test, but it’s not displayed anywhere. So when it says >= 1000 FAIL, I usually ignore the upper reading; I have never gotten that to pass. Perhaps Epicor just uses it as a “warmup”, because the 2nd number, 3546, is the one that is displayed in PDT :slight_smile: FYI if anyone is wondering: if you use the tool, just focus on the 2nd reading.