Post by account_disabled on Dec 6, 2023 10:37:05 GMT
Although simple to implement, this step is extremely important. For medium and large data sets, it can speed up work on them severalfold. If data analysis is performed on a flat table with several tens of thousands of records and several dozen columns, this step can shrink the table severalfold. This translates directly into analysis time: an operation takes one second instead of a dozen or so, and consequently, a full analytical script runs in a few minutes instead of a dozen or so. Although these are not large numbers, remember that building and testing a script involves running it many times.
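As a rough illustration of how much a change of data type alone can shrink a column, here is a minimal sketch using only Python's standard library. It is not the author's method: it assumes a hypothetical yes/no column read from CSV text, and the helper names (`to_bool_column`, `column_size_as_strings`, `column_size_as_bools`) are invented for this example. It compares the memory held by the column as a list of strings with the same data packed into a one-byte-per-value `array`.

```python
import csv
import io
import sys
from array import array


def to_bool_column(values, true_label="yes"):
    """Hypothetical helper: convert a yes/no text column into 0/1 flags, one byte each."""
    return array("b", (1 if v == true_label else 0 for v in values))


def column_size_as_strings(values):
    """Approximate bytes held by a list of str: the list's own pointer array
    plus every distinct string object it references."""
    distinct = {id(v): v for v in values}
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in distinct.values())


def column_size_as_bools(flags):
    """Bytes held by the array, including its small fixed header."""
    return sys.getsizeof(flags)


# Hypothetical yes/no column, parsed from CSV text as if it were read from a file.
raw = "\n".join("yes" if i % 3 else "no" for i in range(30_000))
text_column = [row[0] for row in csv.reader(io.StringIO(raw))]
flag_column = to_bool_column(text_column)

print("as strings:", column_size_as_strings(text_column), "bytes")
print("as 0/1 flags:", column_size_as_bools(flag_column), "bytes")
```

`sys.getsizeof` reports only an object's shallow size, which is why the string version adds up the referenced string objects separately; the figures are CPython-specific approximations. In pandas, the analogous conversion would typically be `Series.astype(bool)` or `Series.astype("category")`.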
At this stage, we transform the available data. This is a simple task and can be performed in most analytical tools. For this step, be mindful of the data types supported by our analytics environment and always strive to minimize the memory needed. The most memory-hungry types, such as string (words, sentences, strings of characters), should be avoided where possible. Instead, it is worth aiming for the most economical ones, such as boolean (true/false).

Manage missing data appropriately

The next step in the data preparation process is the appropriate management of missing data in the database.
Every production database contains cases where the value for a given record is empty. The classic mistake in such a case is to replace the missing value with zero. However, the lack of data in a cell does not mean that the value there is zero, and there is no basis for such a substitution. Besides, what if the column holds a date or a product description rather than a number? There, we cannot replace the gap with zero at all. The first step is to classify the missing values by their origin. This way.
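To make the zero-filling problem concrete, here is a minimal Python sketch on a hypothetical set of order records (the field names and the helpers `observed` and `missing_share` are invented for illustration). It compares the average order amount when gaps are filled with zero against the average of the values actually recorded, and reports the share of missing values per column as a starting point. It stops short of classifying the gaps by origin, since that depends on knowing how the data was collected.

```python
from datetime import date
from statistics import mean

# Hypothetical order records; None marks a missing value, which is not the same as zero.
orders = [
    {"amount": 120.0, "shipped": date(2023, 11, 2), "description": "Desk lamp"},
    {"amount": None, "shipped": None, "description": None},
    {"amount": 80.0, "shipped": date(2023, 11, 5), "description": "Chair"},
    {"amount": None, "shipped": date(2023, 11, 7), "description": "Shelf"},
]


def observed(records, column):
    """Values that were actually recorded in this column."""
    return [r[column] for r in records if r[column] is not None]


def missing_share(records, column):
    """Fraction of records with no value in this column."""
    return sum(r[column] is None for r in records) / len(records)


# The classic mistake: treating every gap in "amount" as 0.
zero_filled_mean = mean(r["amount"] if r["amount"] is not None else 0.0 for r in orders)
# Averaging only the recorded values instead.
observed_mean = mean(observed(orders, "amount"))

# Zero is not even a valid substitute for the date or text columns,
# so a per-column report of the gaps is a more useful first look.
for column in ("amount", "shipped", "description"):
    print(f"{column}: {missing_share(orders, column):.0%} missing")

print(zero_filled_mean, observed_mean)
```

Here, filling the two gaps with zero halves the average order amount (50.0 instead of 100.0), a distortion that comes purely from the substitution.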