Does anyone have experience in or led the development of a data infrastructure and processes? We’re a young startup and are currently using mostly read replicas with Periscopedata on top of it. The challenges I’m facing is two fold.
If I ask a question, two different analysts will come back with two different answers.
It always takes hours as the queries take quite long to execute.
I hope to get some insights on how we should start the journey to address these issues.
I suggest you start moving your data to redshift or big query, this will increase the speed of your queries massively. And in parallel you should build a simple data dictionary where you explain the overall data and table relationships and it also could document the definition of the most used business metrics.
The cost of having basic queries take hours is hard to quantify, but it is massive. it not only means you take a lot longer to respond to ideas / needs (especially taking into account that hard questions take multiple iterative questions), but beyond that that you implicitly make an order of magnitude less queries that you would otherwise make.