Recently I was brought into a project because of a strange issue. The team had their environments split into CM and CD servers. There were several environments, and every one of them had a problem with content propagation.
Let's start with the details. Imagine you change something in the Content Editor on the CM server, publish the changes, and then go to the CD server. You notice that the changes are not there. Strange, right?
Now imagine the same situation, except that sometimes, for no apparent reason, it works. Yep, in some cases your changes, such as new field values or new items, do appear on CD.
Nice?
Then imagine that, on top of all that, you can also reach the Sitecore dashboard from CD. Let's assume some intern didn't know to cut off access there. To clarify, CD and CM both use the same set of databases. To check the master database, we make some changes in the Content Editor and save them. Then, without publishing, we go to CD, open the Sitecore dashboard and launch the Content Editor. What do we see? Our changes are not there. What's more, sometimes it works, just like with the web database.
To summarize:
- we have a CM/CD separation
- both use the same set of databases
- the changes show up on each database only sometimes
Strange? Strange is a euphemism.
What did we think first? Come on guys, there must be some other database, or corrupted connection strings. Other ideas: maybe the load balancer sometimes switches to another version of the application? Or maybe there is an issue with the EventQueue: it's overloaded and doesn't notify the remote server as often as it should. Any other ideas?
After going through the points above, we came up with a few more, and after a while our list looked like this:
- checked database connections
- checked load balancer settings
- checked EventQueue and PublishQueue for overflow
- checked the sites' HTML definitions
- checked AWS RDS against a vanilla Sitecore installation
- checked dynamic caching in IIS
- checked whether a caching server sits in front of the application servers
After going through the whole list, anybody would feel disappointed. So the best assumption was to check data propagation once more, which brought us back to the EventQueue. As before, it wasn't blocked or overflowing, so it looked fine. I took the instance names from the InstanceName column and checked them against the machines, and what did I discover? Both had the same name!
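The check described above boils down to grouping machines by the instance name they run under and looking for collisions. Here is a minimal Python sketch, assuming you have already collected, for each machine, its instance name (for example by matching the InstanceName column of the EventQueue table against each server). The function name and the sample machine and instance names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_instance_names(instances: dict[str, str]) -> dict[str, list[str]]:
    """Return only the instance names that are used by more than one machine."""
    by_name: dict[str, list[str]] = defaultdict(list)
    for machine, instance_name in instances.items():
        by_name[instance_name].append(machine)
    # Keep only collisions, with machine names sorted for stable output.
    return {name: sorted(machines) for name, machines in by_name.items() if len(machines) > 1}


if __name__ == "__main__":
    # Hypothetical data: two servers report the same instance name.
    observed = {
        "cm-server": "WEB01-sitecore",
        "cd-server": "WEB01-sitecore",
        "cd-server-2": "WEB03-sitecore",
    }
    print(find_duplicate_instance_names(observed))
```

Any non-empty result means at least two servers are indistinguishable from the EventQueue's point of view.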
What does this mean?
It means there was a sort of race between the servers reading the EventQueue. Whichever server read the EventQueue first got the newest entries. The second server then saw that those entries had already been read by "itself" (the same instance name) and didn't raise any event to refresh.
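The race can be illustrated with a deliberately simplified Python model. It assumes that each instance remembers the last event it processed under a key derived from its instance name, stored in a database shared by all servers. It is not Sitecore's actual implementation; the class names, the `publish:end` event label and the instance names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDatabase:
    """Stands in for the databases both CM and CD point at (simplified, hypothetical model)."""
    event_queue: list[tuple[int, str]] = field(default_factory=list)  # (stamp, event name)
    last_processed: dict[str, int] = field(default_factory=dict)      # instance name -> last stamp read

    def raise_event(self, name: str) -> None:
        self.event_queue.append((len(self.event_queue) + 1, name))


@dataclass
class Server:
    instance_name: str
    refreshed: list[str] = field(default_factory=list)

    def poll(self, db: SharedDatabase) -> None:
        # The bookmark is keyed by instance name, so servers sharing a name share one bookmark.
        last = db.last_processed.get(self.instance_name, 0)
        for stamp, name in db.event_queue:
            if stamp > last:
                self.refreshed.append(name)  # stands in for "clear caches for the changed item"
                db.last_processed[self.instance_name] = stamp


def simulate(cm_name: str, cd_name: str) -> list[str]:
    """Raise one event, let CM poll first, then return what CD refreshed."""
    db = SharedDatabase()
    cm = Server(cm_name)
    cd = Server(cd_name)
    db.raise_event("publish:end")
    cm.poll(db)  # CM happens to poll first and advances the shared bookmark
    cd.poll(db)
    return cd.refreshed


if __name__ == "__main__":
    print(simulate("SRV01-sitecore", "SRV01-sitecore"))  # duplicate names: CD refreshes nothing
    print(simulate("cm-01", "cd-01"))                    # unique names: CD picks up the event
```

With duplicate names the outcome depends only on polling order: if CD happens to poll first, it picks up the change, which would match the "sometimes it works" behaviour.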
It was probably caused by the way DevOps created the servers: they just cloned the previous one :)
As an ad hoc fix, I set a unique InstanceName setting in the config for each application instance on the servers.
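Why cloning can produce the collision, and why an explicit value fixes it, can be sketched as follows. The sketch assumes that when InstanceName is left blank, the name is derived from the machine name and the IIS site name, so a clone that keeps both ends up with an identical name. The helper functions and the exact `machine-site` format are hypothetical and only mirror that assumed rule.

```python
def effective_instance_name(configured: str, machine_name: str, iis_site_name: str) -> str:
    """Hypothetical rule: use the configured value, else fall back to machine name + IIS site name."""
    if configured.strip():
        return configured.strip()
    # Assumed fallback format; the real separator may differ.
    return f"{machine_name}-{iis_site_name}"


def all_unique(names: list[str]) -> bool:
    """True when no two instances share a name."""
    return len(set(names)) == len(names)


if __name__ == "__main__":
    # Cloned servers: same machine name and IIS site, InstanceName left blank.
    cloned = [effective_instance_name("", "WEB01", "sitecore") for _ in ("cm", "cd")]
    print(cloned, all_unique(cloned))

    # Ad hoc fix: an explicit InstanceName per application instance.
    fixed = [
        effective_instance_name("prod-cm-01", "WEB01", "sitecore"),
        effective_instance_name("prod-cd-01", "WEB01", "sitecore"),
    ]
    print(fixed, all_unique(fixed))
```

Setting the name explicitly makes uniqueness independent of how the server was provisioned.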
I hope this helps someone, because sometimes such small things can go unnoticed.
Mystery solved!