What is the reason for having column families? Example:
Scenario 1:
Table Row-Key ColumnFamily1 ColumnFamily2 ColumnFamily3
Scenario 2:
Table1 Row-Key Column1...ColumnN
Table2 Row-Key Column1...ColumnN
Table3 Row-Key Column1...ColumnN
In scenario 1, although a table can have many column families, all column families are stored separately. Why, then, does the concept of column families exist at all? Why not simply use scenario 2? With scenario 2 I'm not giving up any feature HBase provides: you can still add columns dynamically later on (and use the other features).
My only concern is: if the column families are stored separately, why are they in the same table? I'm only interested in the intent behind column families (and what problem they solve).
A table, by definition, is a unit of organization for data which logically belongs together. Column families give you a way to create substructure within your table so that you can optimize performance for your access patterns (that's the problem they solve).
In practical terms, although column families within a table are stored "separately," in different files, they are also stored "nearby" in the sense that HBase stores all the values for a given row in the same Region. This includes the separate files for each column family: they are separate files, but they're all owned by the same Region Server.
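To make the layout concrete, here is a minimal toy model in Python. It is not the HBase API: `ToyRegion`, its methods, and the family names `profile` and `activity` are all made up for illustration. It only shows the idea that one Region holds a separate store per column family, and that a read restricted to one family consults only that family's store.

```python
from collections import defaultdict


class ToyRegion:
    """Hypothetical stand-in for one HBase Region: one store per column family."""

    def __init__(self, families):
        # Each column family gets its own store (separate files in real HBase).
        self.stores = {family: {} for family in families}
        # Counts how often each family's store was consulted by a read.
        self.store_reads = defaultdict(int)

    def put(self, row_key, family, qualifier, value):
        # A row's cells land in the store of the family they belong to.
        self.stores[family].setdefault(row_key, {})[qualifier] = value

    def get(self, row_key, families=None):
        # Only the stores of the requested families are consulted;
        # with no restriction, every family's store is read.
        wanted = families if families is not None else list(self.stores)
        result = {}
        for family in wanted:
            self.store_reads[family] += 1
            cells = self.stores[family].get(row_key)
            if cells:
                result[family] = dict(cells)
        return result


region = ToyRegion(["profile", "activity"])
region.put("user42", "profile", "name", "Ada")
region.put("user42", "activity", "last_login", "2024-01-01")

profile_only = region.get("user42", families=["profile"])  # touches one store
whole_row = region.get("user42")                           # touches both stores
```

In this sketch the whole row is still available from a single Region, while a read that only needs `profile` never touches the `activity` store. That is the access-pattern trade-off column families are meant to offer.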
By contrast, if you divided your data into different tables, parts of the same "row" would live in different HBase Regions, potentially on different Region Servers, and accessing them would cost you a separate lookup per table.
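A rough way to picture the difference is the toy Python sketch below. The assignment table, the server names `rs1` to `rs3`, and the helper functions are all invented for illustration; real HBase decides where Regions live, and Regions of different tables may well end up on the same server. The point is only that nothing ties the Regions of separate tables together.

```python
# Hypothetical region assignment: table -> list of (region start key, server),
# sorted by start key. These values are made up for the example.
ASSIGNMENT = {
    "users": [("", "rs1"), ("m", "rs2")],
    "profile": [("", "rs2"), ("m", "rs3")],
    "activity": [("", "rs3"), ("m", "rs1")],
}


def server_for(table, row_key):
    """Return the server hosting the region whose key range contains row_key."""
    server = None
    for start_key, candidate in ASSIGNMENT[table]:
        # The last region whose start key is <= row_key owns the row.
        if row_key >= start_key:
            server = candidate
    return server


def servers_contacted(tables, row_key):
    """Set of servers a client must talk to in order to read row_key from each table."""
    return {server_for(table, row_key) for table in tables}


# Scenario 1: one table, data split across column families.
one_table = servers_contacted(["users"], "user42")

# Scenario 2: the same logical row split across two tables.
split_tables = servers_contacted(["profile", "activity"], "user42")
```

With this made-up assignment, the single-table read goes to one server, while the split-table read goes to two. A single table with column families rules that case out by construction, because a row never spans Regions.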
So if you opt to put some of your data in a separate table rather than in a column family, not only are you organizing your data in a way which could become hard to manage, you're also forfeiting a lot of performance advantages from HBase.