I finally made a support case on this and this. Yeah, separate cases so they don’t get confused. I gave neat repro steps and lots of screenshots. Well, confusion ensued, and eventually a screen recording was needed. More confusion and a voicemail later, they still couldn’t repro.
Eventually I answered his phone call and we talked it over. Wow.
The guy was nice and everything and clearly wants to help. This isn’t about him; it’s about the repro gate.
For an issue to pass front-line support, they must reproduce it, write up the steps, and create a PRB. That makes sense; repro is a logical gatekeeper.
Here’s the thing: I was blown away when he said they’re using an ON-PREM reference instance to try to repro cloud user issues.
So the best we could do was write some notes and pass them off to the cloud-perf team.
I mean, shouldn’t they be working in the cloud? Who’s even testing cloud stuff? How many Linux issues are going to pass this gate when they’re trying to repro on Windows? What a mess.
You’ve even got to feel for these folks, in between stressing that your company may shut down because your cloud platform is broken. So broken.
This depends on the issue, right? If it’s a problem with the application, it shouldn’t matter. If it’s environmental, that’s the next thing to check. Imagine the alternative. Would you let Epicor log into your system with full admin? If I were Epicor, I wouldn’t want to be accused of stealing data or bringing down a customer’s system.
The same is true for cloud. Why take on that liability? Even Microsoft has a very narrow way to access your cloud infrastructure, with all kinds of approvals and gatekeeping.
Moving to Linux will make support’s job even easier, since most of the environmental stuff is encapsulated in the container, instead of keeping VMs for every version and manually patching to the customer’s version for each ticket. The beauty of the container model is that it doesn’t matter where you run it.
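To make the “patched VM per version” contrast concrete, here’s a minimal sketch of the container approach: a support rep turns a ticket’s exact product version into a runnable image reference. The registry, image name, and environment variable are all made up for illustration, not a real Epicor setup:

```python
# Hypothetical sketch: map a ticket's exact customer version to a container
# image tag instead of keeping a manually patched VM per version.
# Registry/image/env names here are invented for illustration.

def repro_command(product: str, version: str, environment: str) -> str:
    """Build the one-liner a support rep would run to get an exact-version repro box."""
    image = f"registry.example.com/{product}:{version}"
    return (
        f"docker run --rm -it "
        f"-e DEPLOYMENT={environment} "  # same image whether cloud or on-prem
        f"{image}"
    )

print(repro_command("kinetic-app", "2024.1.12", "cloud-mt"))
```

The point of the sketch is the last argument: the same image serves a cloud-MT or on-prem repro, so the environment stops being the gate.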
Yes, it depends on the issue. I’m not saying they should jump on customer systems every time, but they should repro cloud multi-tenant customer issues in a cloud MT environment.
Besides it being an impossible job (trying to repro an infra issue there means we’re both wasting our time), it’s the wrong gate to get my issue into their pipeline.
Whether it’s app or infra, it’s on them, doesn’t matter to me.
Infrastructure issues shouldn’t go to application support. Of course, we need a reliable way to know what is an infra problem vs. an application problem. If a bad BPM or BAQ is eating up memory or CPU, is that an infra problem, an application problem, or user bad practice?
Frankly, I’d rather move away from problem reproduction and focus more on observability. If we can extract actual network traffic, actual logs, screenshots, and correlated environmental statistics (CPU, memory, network, settings, etc.), that is far more productive than trying to reproduce in another environment. It’s also far more favorable to me than letting someone have access to my data, either by sending the database or inviting others into the tenant.
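As a rough illustration of “logs correlated with environmental statistics,” here’s a stdlib-only sketch that bundles both into one shareable JSON document. The `collect_bundle` helper and its schema are invented for this post, not any real Epicor format:

```python
import json
import platform
import time

def collect_bundle(logs: list, settings: dict) -> str:
    """Bundle log lines with correlated environment details into one JSON doc.

    A support rep can inspect this instead of reproducing the issue in a
    mismatched environment. The field names are illustrative only.
    """
    bundle = {
        "captured_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "environment": {
            "os": platform.system(),          # e.g. Linux vs. Windows matters here
            "runtime": platform.python_version(),
            "settings": settings,             # app version, feature flags, etc.
        },
        "logs": logs,
    }
    return json.dumps(bundle, indent=2)

print(collect_bundle(["BAQ timeout after 30s"], {"app_version": "2024.1"}))
```

The value is that the environment travels with the evidence, so “can’t repro on our reference box” stops being the dead end.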
I agree. Hopefully they move in this direction. But for now it’s “send us your data” to reproduce every issue. I think they have to in order to create a problem record. Observing the issue isn’t sufficient.
But if I have the actual network traffic, logged to and from the client, that causes the error, isn’t that a better place to start than sending GBs of data and trying to match the environment?
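One practical wrinkle with sharing captured traffic is credentials in the headers. A small sketch of scrubbing a HAR-style capture before attaching it to a case; the entry shape and the service URL are made up for illustration:

```python
import json

# Header names that should never leave the building.
SENSITIVE = {"authorization", "cookie", "set-cookie"}

def redact_entries(entries: list) -> list:
    """Strip credentials from HAR-style request/response entries so the
    capture can be shared with support without exposing secrets."""
    out = []
    for e in entries:
        headers = {
            k: ("<redacted>" if k.lower() in SENSITIVE else v)
            for k, v in e.get("headers", {}).items()
        }
        out.append({**e, "headers": headers})
    return out

capture = [{
    "url": "https://host/api/Erp.BO.PartSvc/GetByID",  # placeholder endpoint
    "status": 500,
    "headers": {"Authorization": "Bearer abc123", "Accept": "application/json"},
}]
print(json.dumps(redact_entries(capture), indent=2))
```

With the secrets stripped, the exact failing request/response pair can go straight into the case instead of a multi-GB database upload.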
Yes, observability. Except let’s not wait for a million needles to pile up in the production haystack when customers can demo a problem that likely ties to a known story.
I won’t bore you with the details of these issues, as I gather SolBench is rather underutilized by folks on here (for good and obvious reasons). But in both cases, a known major change to SolBench was implemented in the latest patch.
Shouldn’t we raise the ‘potential’ issue with the owner of that story before adding it to a stack of needles that’ll never get found? I mean, they’re the ones who didn’t do the QA, after all. In this case they’d be like, ‘Ohh yeah, we forgot to test that silly SQL sproc that searches xxxdef cross-joined to solution install history. I bet there’s a missing index now that layers are imported this way instead of that way. Yup, that was it, all fixed.’
I meant not sufficient for Epicor today. Even if you provide them all of that, they still have to recreate the problem on their own to create a dev ticket.
SWB needs to be revisited completely. Why try to find all the needles in an outdated approach to moving customizations? Why are we moving the customizations in their persisted state? That introduces all kinds of coupling to specific versions, which means hashing logic to prevent manual altering of these records, and so on. This format is also not source-code friendly: you can’t diff these records easily. If we saved these customizations as JSON, and then used the exact same routines used by the GUI to import them, we’d have a source record that is compatible with source control. We could also import it and reject features not implemented at that time, and even automate the creation of the text with AI. None of this is possible with SWB and its multitude of Import/Export routines. Cut bait: SWB in its current state is not worth keeping.
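The diff-friendliness argument is easy to demonstrate with the standard library. A sketch, assuming customizations were serialized as canonical JSON (sorted keys, stable indentation); the record shape is invented, not a real SWB/Kinetic schema:

```python
import difflib
import json

def canonical(record: dict) -> str:
    """Serialize a customization deterministically so ordinary diff tools
    (and source control) can work on it."""
    return json.dumps(record, indent=2, sort_keys=True)

# Two hypothetical versions of one customization record.
v1 = {"control": "txtPartNum", "property": "ReadOnly", "value": False}
v2 = {"control": "txtPartNum", "property": "ReadOnly", "value": True}

diff = difflib.unified_diff(
    canonical(v1).splitlines(), canonical(v2).splitlines(),
    fromfile="before", tofile="after", lineterm="")
print("\n".join(diff))
```

One changed property shows up as a one-line diff, which is exactly what the opaque persisted-state format can’t give you.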
I’m just saying it’s there and it worked one day and not the next. Someone is responsible for the change and related QA. Fast track issues to them. Watch things get solved, watch QA improve.
I know I’ve harped on this before, but I think it’s relevant to the discussion here.
10 years ago, most of our support cases would very quickly lead to some kind of desktop-sharing meeting so we could show the issue. The reps would understand the problem, take it to their environments, repro the problem (or not), and go from there. Sometimes we needed to upload the database. Sometimes we’d get as deep as looking into SQL right away to troubleshoot.
These days, a new support case means days of back and forth with requests for BAQs, videos, and whatever else. A live meeting or phone call seems to be the last thing they’ll attempt, after you’ve escalated the situation. This obviously lengthens time to resolution and the effort you have to put in, and honestly I think it also increases the amount of time wasted by their own reps, so I don’t think it is efficient for them either.
To @Mark_Wonsil’s comment above about giving access to Epicor’s employees: a remote desktop-sharing meeting as a first point of contact doesn’t have many security issues. You can still give them nothing more than view access, and you can drive, following their instructions. Sometimes it’s easier if you do let them take control, but you can see everything they’re doing. Then there’s no complication of different environments and infra, cloud or on-prem. And some issues aren’t necessarily reproducible anyway (maybe data corruption of some sort), so they need to be investigated in your specific environment regardless.
Just to be clear, when I say observability, I’m not talking about somebody watching it live on a screen share or recreating it locally. I’m talking about modern observability. Here’s the definition from the open-source OpenTelemetry site:
What is Observability?
Observability lets you understand a system from the outside by letting you ask questions about that system without knowing its inner workings. Furthermore, it allows you to easily troubleshoot and handle novel problems, that is, “unknown unknowns”. It also helps you answer the question “Why is this happening?”
To ask those questions about your system, your application must be properly instrumented. That is, the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers don’t need to add more instrumentation to troubleshoot an issue, because they have all of the information they need.
This goes far beyond screen recording. For example, I created a Playwright trace of a session in the Education database. I toggled core logging, and it captures every screen, every console log, and every network event, with all the same information we see in a debug session. With all that, it was about 13 MB. I can send that to anyone, and they can watch the screenshots, the console log, and the network calls. You can see the Kinetic JSON in the result of the GetApp call. We don’t have to schedule time with anyone. We record it and put it in a place where the reader can get it. They go to https://trace.playwright.dev and the viewer runs locally to display the results. Here is the sample trace I did earlier today, just navigating around the Education database for a couple of minutes. Anyone can download it and see everything above.
Now, this is only the client side, which is more important right now because of the classic sunset. But Epicor could allow us to enable this type of capture at the server too. Having these signals recorded along with environmental settings and levels would be less intrusive than granting access to the whole database.
Why are you needing to do ANY of this? Isn’t the point of the cloud that it should just run and you don’t need to deal with these types of problems anymore?
All the cloud services run on what’s called the “Shared Responsibility Model.” Nobody is offering a service that allows users to run amok for a fixed price. For cloud services that have a platform aspect, like Salesforce, AWS, Azure, etc., this is even more true.
I think of cloud services as a utility. Can I use all the electricity I want during a heatwave? Can I use all the water I want during a drought? Should it “just run,” or will there be brownouts and low water pressure suffered by others?
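The utility analogy maps onto how multi-tenant platforms actually protect the shared supply: per-tenant throttling. A toy token-bucket limiter, purely illustrative (no real vendor’s limits or API implied):

```python
import time

class TokenBucket:
    """Toy per-tenant rate limiter: the cloud version of water pressure.

    Tokens refill at a fixed rate up to a capacity; requests beyond the
    bucket are refused, so one noisy tenant can't drain the shared supply."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2.0)
print([bucket.allow() for _ in range(4)])  # a burst of 4: the first two pass
```

In other words, there is always a meter somewhere; the only question is whether it shows up as a bill, a throttle, or a brownout.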
Then where is the saving, really? Cloud is supposed to reduce the need for your IT staff to manage infrastructure (I get that a bad BAQ can kill the service). Infrastructure issues should not be the customer’s problem. If you have a bad BAQ, etc., they should be telling you about it; the OP said “Infrastructure issues should go to support.”
Electricity vendors have almost no way of stopping you from using power, and the same goes for water providers, so it’s a bad example. And if they charged cloud prices, they would probably get told to jog on.